The primary objective when building a Business Continuity Plan (BCP) is to increase the likelihood that your organization will survive a disaster, such as a massive cybersecurity breach or a global pandemic. Many organizations overlook the importance of a BCP and question the return on investment in crafting one, citing the low likelihood of disasters. Unfortunately, the value of a BCP is only understood when a crisis strikes.
So, how will your organization react to a disaster? Do you have a Business Continuity Plan? Here are the steps to begin setting one up:
1. Define “Disaster”
For any organization, it’s important to begin by defining the parameters of a disaster. For a technology startup with a remote workforce, a pandemic such as COVID-19 may not crush the organization. But for major retailers that provide essential services to the public, the coronavirus requires contingency planning.
It’s therefore important to explore the types of disasters that can significantly affect an organization. These include:
- Natural disasters such as earthquakes, tornadoes, and pandemics.
- Social disruptions such as widespread protests, utility and service shortages (e.g., water and fuel), and war.
- Security incidents that can bring down a group of systems, resulting in widespread outage of services.
A common mistake when defining a disaster is to only think about direct impacts. For example, while COVID-19 may not directly impact the operations of a technology startup, it may affect the availability of its customers. An earthquake in California may not directly impact manufacturers in Seattle, but it could significantly disrupt their supply chains. The indirect impacts and consequences must be fully considered.
2. Conduct a Business Impact Analysis (BIA)
Once disasters have been defined, the next phase is the Business Impact Analysis (BIA). The importance of the BIA cannot be emphasized enough. The objective of the BIA is to identify the various disaster scenarios and the impact they could have on business operations.
BIA Step 1: Inventory Key Business Processes and Systems
The first step in crafting the BIA is to inventory key business processes and systems. This step is more difficult than one might first imagine. What makes a business process or system “key”? How do you prioritize one department over another? Whom should you ask, and with whom should you verify the answers? Each executive may have their own opinions and perceptions; furthermore, they may not fully understand the tactical systems and processes at mid-management levels.
In response, consider launching an organization-wide survey based on relevant criteria. The form might include items such as:
- Process or system name
- Process/system overview
- Is this directly customer-facing or does it directly support a customer-facing process?
- Does it collect customer data (or donor data for not-for-profits)?
- The supporting IT systems used
- Key suppliers (internal and external) to the process
- Other processes that depend on this process
- Which department (head) pays for all the resources used in this process?
- How many employees use this process or system?
- What would happen if this department, process, or system could not operate regularly?
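One way to keep the survey responses consistent is to capture each one as a structured record. The sketch below is a minimal illustration, assuming Python 3.9 or later. The `ProcessRecord` fields mirror the form items above, but the field names, the sample entries, and the `downstream_of` helper are hypothetical and not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessRecord:
    """One survey response. Field names are illustrative assumptions."""
    name: str
    overview: str
    customer_facing: bool
    collects_customer_data: bool
    it_systems: list[str] = field(default_factory=list)
    key_suppliers: list[str] = field(default_factory=list)
    dependents: list[str] = field(default_factory=list)  # processes that depend on this one
    budget_owner: str = ""
    employee_count: int = 0
    impact_if_down: str = ""


def downstream_of(name: str, inventory: dict[str, ProcessRecord]) -> set[str]:
    """Return every process that directly or indirectly depends on `name`."""
    seen: set[str] = set()
    stack = [name]
    while stack:
        record = inventory.get(stack.pop())
        if record is None:
            continue  # dependent was named in a survey but not yet inventoried
        for dependent in record.dependents:
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen


# Hypothetical sample data for illustration only.
inventory = {
    "POS": ProcessRecord(
        name="POS",
        overview="Records and processes in-store sales",
        customer_facing=True,
        collects_customer_data=True,
        dependents=["GL Posting"],
        budget_owner="Stores",
    ),
    "GL Posting": ProcessRecord(
        name="GL Posting",
        overview="Posts journal entries to the general ledger",
        customer_facing=False,
        collects_customer_data=False,
        dependents=["Financial Reporting"],
    ),
    "Financial Reporting": ProcessRecord(
        name="Financial Reporting",
        overview="Produces periodic financial statements",
        customer_facing=False,
        collects_customer_data=False,
    ),
}

print(sorted(downstream_of("POS", inventory)))
```

Walking the "other processes that depend on this process" answers in this way helps surface indirect impacts that a single department's response would not reveal on its own.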
BIA Step 2: Statement of Impact
As each process/system is being discussed and inventoried, make sure to summarize, document, and understand why it exists; what benefit it serves; and what problems it solves. This documentation is called the Statement of Impact. Do not underestimate the power of the pen; what gets documented gets understood, improved, and executed. Here are some Statement of Impact examples:
- The Accounts Payable (AP) Processing System: If this system is unavailable, payments to suppliers may not be processed by their due dates, and the organization risks incurring late-fee penalties. In addition, journal entries cannot be automatically posted to the general ledger (GL) system.
- The Point of Sale (POS) System: This system is owned by the “Stores” group. If it is unavailable, sales may not be recorded and processed on time. It also collects donor information when a customer chooses to donate their change to cancer research. Finally, journal entries cannot be automatically posted to the GL system.
BIA Step 3: Criticality Analysis
This step will further help define, prioritize, and identify key systems and processes. The Criticality Analysis involves identifying the threats to each system, while analyzing the likelihood of them occurring and the cost of their potential damage; it also lists the compensating/mitigating factors of those threats. In practice, this exercise requires various inputs, including cost, perceptions, results of audits, and more to properly execute.
The following analysis is deliberately simple and by no means complete. A proper analysis will also include associated costs, the quantified likelihood of occurrence, and adjustments through insurance.
|System ||Threat ||Probability||What can happen?||Mitigating Factors||Next Steps
|AP System||Malware||Medium||Malware could severely limit the availability of the system.||Antivirus is installed on all employee computers and the server; this is a third party product and security controls are implemented by the vendor and verified via third party audit reports.||None.
|AP System||Pandemic||Low||Employees may not be able to physically come into the office to operate the AP system; the AP system is hosted in a file cabinet in Accounting.||Accounting personnel can log into the corporate network and process payables remotely.||None.
|POS System||Malware||Medium||Malware could severely limit the availability of the system; this system is built on legacy architecture from the 1970s where modern security patches cannot be applied. Donor data is collected through the POS terminals and a breach could release sensitive PII.||None; a recent penetration test revealed critical weaknesses.||Build a backup system in case the POS system is unable to process orders; employees will manually document sales slips and key them in within the backup system.
|POS System||Pandemic||Low||Employees may not be able to maintain social distance when checking out a customer; the current process requires retail employees to physically scan merchandise while the customer stands in line.||None.||Enable “buy online and pick-up in stores” on desktop and mobile applications. Have additional retail employees to absorb demand; also provide protective equipment such as hand sanitizers and masks.
In the simple example above, the AP system appears well covered against malware and a pandemic. The POS system, however, is much less resilient and more critical to the organization. Given that knowledge, additional time and resources must be spent on identifying a contingency plan. In this example, the plan includes:
- Building a backup system database where employees can manually key in sales
- Expediting the release of the “buy online and pick-up in-store” feature on the e-commerce site
- Creating a plan to have additional employees available and maintaining an emergency stockpile of personal protective equipment
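To move from qualitative ratings toward the quantified likelihood, cost, and insurance adjustments that a fuller Criticality Analysis calls for, the threats can be ranked by expected annual loss. The following is a rough sketch under stated assumptions: the probability values assigned to "Low", "Medium", and "High", the dollar figures, and the insured amount are all invented for illustration and would need to come from the organization's own data.

```python
from dataclasses import dataclass

# Hypothetical mapping from qualitative ratings to annual probabilities.
PROBABILITY = {"Low": 0.05, "Medium": 0.25, "High": 0.60}


@dataclass
class Threat:
    system: str
    threat: str
    probability: str              # "Low", "Medium", or "High"
    cost_if_occurs: float         # estimated damage per event, in dollars
    insured_amount: float = 0.0   # portion expected to be recovered through insurance


def expected_annual_loss(t: Threat) -> float:
    """Likelihood times the cost that remains after insurance."""
    net_cost = max(t.cost_if_occurs - t.insured_amount, 0.0)
    return PROBABILITY[t.probability] * net_cost


# Illustrative figures only; none of these numbers come from a real analysis.
threats = [
    Threat("AP System", "Malware", "Medium", 40_000, insured_amount=30_000),
    Threat("POS System", "Malware", "Medium", 500_000),
    Threat("POS System", "Pandemic", "Low", 300_000),
]

# Highest expected loss first, so the least resilient systems surface at the top.
ranked = sorted(threats, key=expected_annual_loss, reverse=True)
for t in ranked:
    print(f"{t.system:<11} {t.threat:<9} {expected_annual_loss(t):>10,.0f}")
```

With these assumed inputs, the POS threats rank above the AP threat, which matches the qualitative conclusion drawn from the table.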
3. Define Tolerable Downtime and Recovery Targets
Before the BCP is documented and finalized, the tolerable downtime and recovery targets must be discussed, defined, and agreed to by leadership. These targets further clarify which systems and processes are a higher priority and help determine how to address residual risk. The necessary metrics include:
- Maximum Tolerable Downtime (MTD): This is the maximum time that a system can be down; any time that exceeds this metric will most likely impact the very survival of an organization.
- Maximum Tolerable Outage (MTO): This is the maximum time that an organization can tolerate operating in backup mode. For example, if the POS is down, retail employees must establish backup processes, such as manually writing down sales on dedicated sales forms and keying them into a backup system.
- Recovery Time Objective (RTO): From the onset of the disaster, how fast should a system come back online?
Establishing these metrics defines what success looks like and also provides an estimate of cost. For example, assume that we want the POS system to be back online within one day. Meeting that target will require more resources and cost more to build than a system that takes three days to get back online.
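The three metrics above can be recorded per system and checked during an outage. The sketch below is illustrative only: the `RecoveryTargets` class, the status labels, and the three-day MTD and two-day MTO values are assumptions, while the one-day RTO echoes the POS example. It also assumes that an RTO longer than the MTD is a planning error, which follows from the definitions but should be confirmed with leadership.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTargets:
    system: str
    mtd: timedelta  # Maximum Tolerable Downtime
    mto: timedelta  # Maximum Tolerable Outage (time spent in backup mode)
    rto: timedelta  # Recovery Time Objective

    def __post_init__(self) -> None:
        # Assumption: a recovery objective beyond the tolerable downtime is self-defeating.
        if self.rto > self.mtd:
            raise ValueError(f"{self.system}: RTO exceeds MTD")


def outage_status(targets: RecoveryTargets, elapsed: timedelta) -> str:
    """Classify an ongoing outage by how long the primary system has been down."""
    if elapsed > targets.mtd:
        return "MTD exceeded"
    if elapsed > targets.rto:
        return "RTO missed"
    return "within targets"


def backup_mode_ok(targets: RecoveryTargets, time_in_backup: timedelta) -> bool:
    """True while backup-mode operation is still within the MTO."""
    return time_in_backup <= targets.mto


# Hypothetical targets for the POS system.
pos_targets = RecoveryTargets(
    system="POS",
    mtd=timedelta(days=3),
    mto=timedelta(days=2),
    rto=timedelta(days=1),
)

print(outage_status(pos_targets, timedelta(hours=30)))
print(backup_mode_ok(pos_targets, timedelta(hours=30)))
```

Writing the targets down in this form makes the cost discussion concrete: tightening `rto` from three days to one is a single-line change here, but it implies a more expensive recovery capability.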
4. Develop the Business Continuity Plan
Now that the BIA has been completed and your tolerable downtime targets established, it’s time to put the plan in writing. Developing BCPs is not only about identifying critical processes but also about identifying opportunities to transform an organization, such as developing new architectures, processes, and procedures. It also defines measures of success through recovery targets, such as the RTO discussed above.
The BCPs, then, are detailed instructions on how to achieve recovery targets and how to respond to an event. For example, when the POS system goes offline, a BCP will answer who is responsible for getting the backup system running and which processes to use. In the case of a pandemic, it will have covered who is responsible for maintaining a stockpile of personal protective equipment for retail workers.
A BCP is a collection of these instructions tied to key processes and systems identified through the BIA process above. These instructions should be highly customized to each organization based on the risk tolerance, budget, and other factors at play. With that said, however, the following are typical procedures addressed in a BCP. This is meant to be an example, not an exhaustive list.
As mentioned above, it’s important to define what a disaster is for a company. Only then can a disaster be identified and appropriately responded to. This procedure includes:
- The core leadership team: Who will declare and execute business continuity plans? What if that person is not available?
- Criteria: For example, if the POS system is down for 10 minutes, is that a disaster? Or is it a disaster when it’s down for over 30 minutes?
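Declaration criteria and the leadership fallback can both be written down unambiguously. The sketch below is one possible encoding, not a prescribed method: the 30-minute threshold simply echoes the example criterion above, and the function names, role titles, and succession order are hypothetical.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical threshold taken from the example criterion above.
POS_DISASTER_THRESHOLD = timedelta(minutes=30)


def is_disaster(downtime: timedelta,
                threshold: timedelta = POS_DISASTER_THRESHOLD) -> bool:
    """An outage counts as a disaster once it lasts longer than the threshold."""
    return downtime > threshold


def declaring_authority(succession: list[str], available: set[str]) -> Optional[str]:
    """Return the first person in the succession order who can be reached."""
    for person in succession:
        if person in available:
            return person
    return None  # nobody reachable; the plan should state what happens next


# Illustrative succession order; the roles are assumptions.
succession = ["CEO", "COO", "CIO"]

print(is_disaster(timedelta(minutes=10)))
print(is_disaster(timedelta(minutes=45)))
print(declaring_authority(succession, available={"COO", "CIO"}))
```

Making the succession order explicit answers the "what if that person is not available?" question before a crisis rather than during one.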
Disasters often pose a threat to our well-being. Procedures need to be documented and executed to ensure:
- All personnel are familiar with evacuation procedures
- Accessibility of personal protective equipment
- Emergency lighting
- Fire extinguishment
- CPR and first-aid training
- Personnel who can assist in the evacuation of injured people
- Emergency food and water
Roles and Responsibilities
Different types of disasters require different roles and responsibilities. For instance, identify by name the people who will:
- Care for the injured
- Maintain emergency transportation
- Communicate to customers, family members, suppliers, partners, media, insurance, and the authorities
- Oversee and distribute emergency equipment (e.g., water, masks, first-aid)
- Identify and validate personnel (e.g., emergency SMS to roll call employees)
- Assess damage
- Salvage equipment
- Provide security
Recovery and Continuing Operations
Agreed-upon procedures must exist on how to run in backup mode and execute recovery targets. For example:
- Instructions on which version of the backup system to make the primary system during an outage
- Location of backup system files
- Training on backup processes (e.g., filling out sales slips and entering them into the backup system)
- Documentation of key recovery and restore targets of the primary system
5. Update and Test Business Continuity Plans
In the Criticality Analysis section above, we determined that the threat of a pandemic is low, and many organizations may have opted to not spend the time and resources on contingency plans addressing one. This highlights the need to revisit the BCP continuously (at least annually). The disaster landscape is constantly evolving, and new data can considerably change the probability of a disaster. For instance, a mature BCP process would’ve triggered a reassessment and response when a highly communicable coronavirus was discovered in January 2020.
For some organizations, say, a technology startup, the BCP process may have concluded that a pandemic would have limited impact. However, a large retail organization could have come to a very different conclusion. Consider that Wimbledon paid $2 million for the last 14 years for pandemic insurance. They will now reportedly receive a $141 million payout from the policy. This is the spirit of continuously reassessing the BCP, which is fundamentally about understanding risks to an organization and responding appropriately.
Finally, a plan becomes a dead document if it’s not practiced and tested. A fire drill gives meaning and a sense of urgency to the evacuation map posted in the kitchenette. A test gives real feedback on the efficiency and effectiveness of the plan; only then can the plan be improved.
Additional questions? Please reach out to Clark Nuber’s IT Services.
© Clark Nuber PS, 2020. All Rights Reserved