Protecting an Enterprise from Cyber Catastrophe
Recovery planning must come before the disaster.
We are suffering an epidemic of cyber-attacks while in a viral pandemic. This post is for those who have responsibility for assuring that the IT-based services offered by their enterprise can recover quickly in the case of a successful cyber-attack or other disaster.
University of Vermont Medical Center (UVMMC) is an excellent hospital. I owe my life to treatment there and am grateful for both the skill and the kindness of UVMMC staff. They have been devastated by a cyber-attack.
It took a full month for UVMMC to recover use of its patient database after the attack, and the institution recently blamed its failure to report COVID cases on the attack’s after-effects. It is not possible to avoid all disasters; it is possible to recover quickly – but only if recovery has been planned and practiced in advance. There are several lessons in UVMMC’s travails for every organization and every business with a critical database.
At this point it would be reasonable and prudent for readers to ask whether I’m qualified to give this advice. I blog about a lot of subjects, like education, politics, and economics, in which I’m not an expert. You don’t want to rely on amateur advice for service security.
At Microsoft in the early 90s I was responsible for the development of server-based products including Outlook and Exchange. Later I led the development and rollout of AT&T’s first ISP, AT&T WorldNet Service. ITXC, which Mary and I founded, had a network which spanned 200 countries and provided a VoIP service despised by most of the world’s telcos and quite a few governments; it had to be hacker resistant. NG Advantage, which we also founded, has an extensive Internet of Things (IoT) network. I’m a nerd, so I was deeply involved in the technology of all these products and services. More boasting here.
I’m no longer an expert in how to prevent a hacker attack, although I did write a novel called hackoff.com. The technologies for intrusion and for intrusion detection and prevention change so rapidly that only those active in the field have any hope of keeping up. Fortunately, the principles of preparing for and accomplishing catastrophe recovery are largely the same no matter what tools Mother Nature or a hacker group used to bring your servers and your services down. This post is about preparing for recovery, a subject quite separate from preventing attacks.
- Recovery planning starts with the assumption that there will be a disaster which renders all your organization’s computers unusable. It could be a fire, a flood, a cyber-attack, or something else. UVMMC and the Green Mountain Care Board, which is their regulator, have been citing attacks on other hospitals and the continuing arms race between black-hat hackers and defenders. If you know that there is a possibility of a successful attack, there is no excuse for not having and rehearsing a recovery plan. Even the “unsinkable” Titanic didn’t put to sea without lifeboats.
- Recovery capability requires an off-premise backup of ALL critical data. In the olden days, we used to truck magnetic tapes with backup data to places like Iron Mountain in New York. Now the backup data can move over the internet, but the principle is the same. The backup data must not be on the same premises or, equally important, on the same network as the servers which are being used to provide the service.
- The off-premise backup data must be current. For many operations, including running a hospital, restoring the data as it was a month or even a week before the disaster struck means a significant loss of function. Even though it is only practical to back up an entire huge database periodically, changes to the database can also be sent offsite as they happen. Ideally these changes are applied to a shadow copy of the database so that almost all data can be restored immediately when required; a minimal sketch of this pattern appears below. The process of updating the shadow database must also be off-premise and off-network and must not rely on any of the software used for the day-to-day service.
- Recovery of function must not depend on use of the original hardware. During Tropical Storm Irene, the State of Vermont’s computers in the basement of the Waterbury complex drowned. In the UVMMC disaster, whatever malware had been loaded onto the computers apparently took a month to eradicate. There used to be no good solution to the problem of quick access to replacement servers.
Now, getting new server hardware up and running immediately sounds hard and expensive but is actually cheap and almost trivially easy. As long as preparation has been made in advance, it is possible to spin up a practically unlimited amount of computing power from cloud providers like Amazon, Microsoft, or IBM within minutes. There is no significant standby cost for this capability. Once the cloud equipment is no longer needed, it can be shut down and the cloud billing meter stops.
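To make the economics concrete, this is roughly what “spin up, then shut down” looks like with the AWS SDK for Python (boto3). It is a minimal sketch, not UVMMC’s setup or a complete recovery script; the image ID, instance type, and region are placeholders for whatever your own plan specifies.

```python
# Minimal sketch: spin up a recovery server in the cloud, then shut it down
# when it is no longer needed. Assumes an AWS account, the boto3 SDK, and a
# pre-built machine image (the AMI ID below is a placeholder).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Start a replacement server from a prepared image.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder: your pre-built recovery image
    InstanceType="m5.large",           # sized to match the workload being restored
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"Recovery server {instance_id} starting; billing begins now.")

# Once the original hardware is back (or the drill is over), stop paying.
ec2.terminate_instances(InstanceIds=[instance_id])
print("Recovery server terminated; the billing meter stops.")
```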
Apparently the desktop computers and laptops (and possibly tablets) which are used at UVMMC to access data were also infected and unusable. Recovery of function cannot depend on restoring the access devices any more than it can depend on restoring the servers. In practice, this means that access to all essential functionality must be possible from a web browser on any properly authenticated laptop, computer, or smartphone. There must be a small backup supply of devices to restore key functionality immediately. New ones can be purchased and placed in service within days so long as they don’t have to be loaded with special software.
- Recovery must be practiced frequently and after any change to the IT environment. Experience says that a recovery plan which has not been practiced before an emergency can be counted on to fail when disaster strikes. Lifeboat drill is mandatory. If an organization’s servers are not already in the cloud (as most should be), the organization must periodically practice bringing up its applications and restoring its data on cloud computers. Losing a few minutes’ data is excusable; losing access for up to an hour may be unavoidable. Losing access for a month means recovery has not been sufficiently planned or practiced.
- The functional recovery team must be separate from the hardware recovery team in order to restore function as quickly as possible. As soon as the environment has been compromised by disaster, the recovery team swings into a well-rehearsed routine of restoring data from the offsite backup to backup servers in the cloud (if it is not already being replicated there) and providing any new access devices and passwords needed. If the original hardware does end up coming back soon, there is a small expense for renting cloud servers, but it is immaterial compared to the cost of not having access to critical data.
- The post-mortem which follows every disaster must separately determine why the vulnerability existed and how well the recovery worked. The two issues are different.
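Here is the kind of off-site backup shipping the “current backup” recommendation above has in mind. It is a minimal sketch only, assuming a PostgreSQL database, the standard pg_dump utility, and an S3 bucket on another network; the database and bucket names are placeholders, and a real deployment would also encrypt the dump, verify it, and stream incremental changes (for example, WAL files) to keep a shadow copy current.

```python
# Minimal sketch: ship a periodic full backup off-site, assuming a PostgreSQL
# database, the pg_dump utility, and an S3 bucket that shares neither premises
# nor network with the production servers. Names are placeholders.
import subprocess
from datetime import datetime, timezone

import boto3

DB_NAME = "patient_records"                  # placeholder database name
OFFSITE_BUCKET = "example-offsite-backups"   # placeholder off-site bucket

def ship_full_backup() -> str:
    """Dump the database and copy it off-premise; return the off-site key."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dump_file = f"/tmp/{DB_NAME}-{stamp}.dump"

    # Full, compressed dump of the database (custom format restores quickly).
    subprocess.run(
        ["pg_dump", "--format=custom", "--file", dump_file, DB_NAME],
        check=True,
    )

    # Push it to storage that is off-premise and off-network.
    key = f"full/{DB_NAME}/{stamp}.dump"
    boto3.client("s3").upload_file(dump_file, OFFSITE_BUCKET, key)
    return key

if __name__ == "__main__":
    print("Shipped full backup to", ship_full_backup())
```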
Anyone who is responsible for critical systems in the public or private sector should be asking their own IT people two simple questions: When was the last successful rehearsal of our functional recovery plan? How long did it take to restore functionality in the rehearsal?
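A rehearsal script can answer the second question automatically. The sketch below uses the same placeholder assumptions as the earlier examples (an off-site S3 bucket holding PostgreSQL dumps): it fetches the newest off-site backup, restores it to a scratch database, and reports the elapsed time. A real drill would also bring the applications up and have staff verify critical functions end to end.

```python
# Minimal sketch of a timed recovery rehearsal: fetch the newest off-site dump,
# restore it to a scratch database, and record how long restoration took.
# Bucket and database names are placeholders; assumes at least one backup exists.
import subprocess
import time

import boto3

OFFSITE_BUCKET = "example-offsite-backups"   # placeholder off-site bucket
DRILL_DB = "recovery_drill"                  # scratch database for the rehearsal

def latest_backup_key() -> str:
    """Find the most recent full dump in the off-site bucket."""
    s3 = boto3.client("s3")
    objects = s3.list_objects_v2(Bucket=OFFSITE_BUCKET, Prefix="full/")["Contents"]
    return max(objects, key=lambda o: o["LastModified"])["Key"]

def run_drill() -> float:
    """Restore the latest off-site backup and return elapsed minutes."""
    start = time.monotonic()
    key = latest_backup_key()
    boto3.client("s3").download_file(OFFSITE_BUCKET, key, "/tmp/drill.dump")

    subprocess.run(["createdb", DRILL_DB], check=True)
    subprocess.run(
        ["pg_restore", "--dbname", DRILL_DB, "/tmp/drill.dump"],
        check=True,
    )
    return (time.monotonic() - start) / 60

if __name__ == "__main__":
    print(f"Rehearsal restored the database in {run_drill():.1f} minutes.")
```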
For disaster-proofing your home computers, see Protecting Yourself from Cyber Disaster.