Full disclosure: we had an issue on our server last week. It was caused by a command run on the server that altered a large number of files.
Initially we thought the issue was localised; however, it quickly became apparent that it was server-wide, which meant the server operating system was potentially compromised.
We use a backup system called Guardian, which is technically a very good restore tool. It takes incremental backups and lets you roll back ad hoc. In the process it does a huge number of file and data checks, so the restored result is pretty much 100% accurate. We were led to believe Guardian could handle any “catastrophic failure” scenario.
When this issue was identified, it was recommended that we do a server restore, which was (allegedly) going to take about 2 hours. This was well wide of the mark, as were all subsequent updates.
Feedback was hard to come by, and we finally gave up on the restore, created a new server and started moving sites onto it.
This created the next challenge: we did not have access to some companies' DNS records, and by now it was comfortably after hours.
First our goals:
So our actions:
What is interesting (or concerning, depending on your viewpoint) is that most hosting providers operate the way we did, so it is worth raising this with them to prepare for that once-in-a-lifetime catastrophic failure.
Finally, now that we have set up our systems, we will be coming to you to update various things in your systems.