System Down: The Anatomy of an “Oopsie!”
You have all heard the horror stories of large organizations, and even governments, that have fallen prey to malicious actors. Massive data breaches, crippled businesses and agencies, and stolen medical records are just a few examples of these debilitating and embarrassing threats.
Having worked in IT since 1993, I have seen a lot. I have worked with businesses and governments ranging in size from 3 to over 1,000 employees. I have dealt with viruses that were just a nuisance, and I have resolved situations where data was encrypted by a crypto locker variant. All of these incidents occurred in businesses that were ill-prepared to deal with an imminent threat. One organization in particular leaps to mind. For them, the impact was especially painful: the cost to recover, the lost productivity, and the delayed order fulfillment.
For confidentiality, I will not mention the company’s name or exact location. All I will say is that they operate in the Pacific Northwest.
This organization was a newer client at the time and had not yet agreed to implement all of the recommendations I had proposed. Like most businesses, they were cost-conscious and had not budgeted for some of the changes. Their servers had been virtualized using Hyper-V on older hardware, so I was supporting one physical server and three virtualized servers.
This episode started when one of their employees disabled the anti-malware software on their computer because they thought it was causing performance issues with their virtual load testing solution. After it was disabled, this person mistyped the address of a well-known web site and landed on a look-alike site that planted a RAT (Remote Access Trojan) on the computer. One more important detail: this person happened to be a local administrator on every computer in the company. After business hours, a bad actor in another part of the world accessed the employee’s computer via the RAT and proceeded to disable the security solutions on every other computer in the organization. Once that was done, they uploaded a file to every workstation and server. This file encrypted all the data stored on the local drives, then sabotaged the operating system so that if the user rebooted to see whether the problem went away, the OS was left damaged beyond repair. Because they were able to attack every computer in the organization, every bit of data on all the servers was encrypted.
By now you are probably thinking something like, “Yikes! Thank goodness for disaster recovery solutions!” That is exactly what I thought on my way in to start the recovery. And yes, thank goodness for the backups. The biggest problem we ran into with the restoration was performance. Their entire backup solution was cloud-based. Their internet connection was 50 megabit, so you’re thinking “no problem!” That’s what I thought too. We’ll circle back to that in a moment.
The recovery for this client started immediately. The biggest blessing on this dark day was that I had just started an infrastructure refresh. I had just delivered a new physical server that was destined to be the new Hyper-V host, replacing hardware that was almost seven years old. Because the basic groundwork was laid, I had all the new servers built and fully updated within 5 hours. This is the point where I started running into issues.
Something you may already know, but I’ll say it anyway: not all cloud-based backup solutions are equal. This client had about 12 terabytes of data backed up to the cloud, most of it enormous CAD and other modeling files. As the data started restoring to the server, we quickly maxed out the 50-megabit connection. I got the go-ahead from the owner to increase the speed to “whatever I thought was appropriate.” I called the ISP and had the bandwidth bumped to 200 megabit in less than 45 minutes. Now the frustration began in earnest. The backup solution did not document any speed limits on upstream or downstream data, but there had to be a limit somewhere, because the restore speed never went above 56 megabit. After testing and verifying the ISP’s performance, I called the backup vendor. When I finally got through 30 minutes later, they informed me that there wasn’t a speed limit, but that they had algorithms that distributed bandwidth so one customer could not consume the entire connection. They either had a lot of customers, or they had very limited bandwidth. Of course, they would not admit to either, and I was stuck with the miserable performance.
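To put that throttling in perspective, here is a quick back-of-the-envelope calculation (my own rough numbers, not the vendor’s):

```python
# Rough restore-time estimates for ~12 TB of backup data at various
# line speeds. Back-of-the-envelope only; real restores are slower
# still due to protocol overhead and vendor-side throttling.

DATA_MEGABITS = 12 * 8 * 1_000_000  # 12 TB ~= 96,000,000 megabits (decimal units)

for mbps, label in [(50, "original line"), (56, "observed throttled rate"), (200, "upgraded line")]:
    days = DATA_MEGABITS / mbps / 86_400
    print(f"{label:>24}: {mbps:>3} Mbps -> ~{days:.1f} days")
```

Even on paper, a full restore at 56 megabit is a roughly 20-day proposition, which is why triage became the next order of business.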
I ended up working with the various department heads to determine which files were critical RIGHT NOW and selectively restored those first. They then specified a secondary tier of important files, and everything left was restored last. The biggest downside was that selective restoration was extremely tedious due to the complex directory structures.
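For anyone staring down a similar triage, the idea looks roughly like this. This is a minimal sketch with hypothetical directory names; in reality the tiers were driven through the backup vendor’s restore console, not a script:

```python
import shutil
from pathlib import Path

# Hypothetical tiered restore: department heads ranked directories,
# and each tier is copied out of the restore staging area in order.
RESTORE_TIERS = [
    ["CAD/ActiveJobs", "Accounting/Current"],   # critical RIGHT NOW
    ["CAD/RecentJobs", "Sales/OpenOrders"],     # important, second pass
    ["Archive", "HR", "Marketing"],             # everything else, last
]

STAGING = Path(r"R:\restore")   # placeholder: where restored files land
SHARES = Path(r"D:\shares")     # placeholder: rebuilt file server shares

for tier in RESTORE_TIERS:
    for rel in tier:
        src, dst = STAGING / rel, SHARES / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(src, dst, dirs_exist_ok=True)  # Python 3.8+
        print(f"restored {rel}")
```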
While the data was restoring, I started rebuilding all the computers in the organization. Within the first 24 hours, I had the servers rebuilt, updated, and secured; the domain and Active Directory restored; all the workstations rebuilt; and data restoring to the shares.
All told, this project took the better part of 5 days, most of it spent restoring data files and fixing glitches with permissions on shares and files. In total, there were over 90 billable hours on this project, which worked out to $16,650. All because one person decided to disable their security software. We worked with the client and lowered the bill to just over $11,000. They still complained, but they also recognized the value of the work to their business, so they eventually paid.
Lessons learned from this experience:
- Verify the performance and capabilities of cloud-based backup solutions before signing up for them (see the sketch after this list)
- Keep a local copy of the backup data
  - Their backup solution had an unused option to back up to a local NAS
- Don’t just list the security recommendations; make them a key part of the presentation, repeatedly highlighting the potential issues and driving the security concerns home
- When there is push-back on remediation suggestions, push back yourself so the point is made abundantly clear. Be prepared when you go into that meeting with the following information:
  - Actual data and examples to back up your assertions
  - Potential disaster remediation times and costs
  - Hidden costs, such as damage to the business’s reputation, loss of productivity, and lost production
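On that first point, one way to sanity-check a vendor’s real-world restore speed before you depend on it is to stage a large test file in the backup, trigger a restore, and time it. A minimal sketch, with a placeholder file size and path (this is my illustration, not the vendor’s tooling):

```python
import time
from pathlib import Path

EXPECTED_BYTES = 10 * 1_000_000_000  # placeholder: a 10 GB test file

def wait_for_restore(path: Path, expected_bytes: int, timeout_s: int = 7200) -> float:
    """Wait until the restored file reaches its full size, then
    return the elapsed time in seconds."""
    start = time.monotonic()
    while not (path.exists() and path.stat().st_size >= expected_bytes):
        if time.monotonic() - start > timeout_s:
            raise TimeoutError("restore did not finish within the timeout")
        time.sleep(5)
    return time.monotonic() - start

# Kick off the restore in the backup vendor's client first, then:
elapsed = wait_for_restore(Path(r"R:\restore\testfile.bin"), EXPECTED_BYTES)
print(f"effective restore speed: {EXPECTED_BYTES * 8 / elapsed / 1_000_000:.1f} Mbps")
```

If the measured number is far below what your ISP delivers, you have your answer before a disaster forces it on you.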
This story could have had a much worse ending than it did. At the time, this was an organization of 12 people, with seven computers and three servers. Imagine the impact on a larger organization that was ill-prepared for such an event. The results could be catastrophic to the business!
As always, I welcome feedback and comments.