1 00:00:00,550 --> 00:00:03,040 Business impact analysis. 2 00:00:03,040 --> 00:00:04,940 In this lesson, we are going to talk about 3 00:00:04,940 --> 00:00:07,750 the concept of a business impact analysis. 4 00:00:07,750 --> 00:00:10,080 And this is really focused on how do things 5 00:00:10,080 --> 00:00:11,720 affect our business. 6 00:00:11,720 --> 00:00:13,970 Now when I talk about a business impact analysis 7 00:00:13,970 --> 00:00:15,633 this is also abbreviated as a BIA. 8 00:00:16,820 --> 00:00:18,440 This is a systematic activity 9 00:00:18,440 --> 00:00:20,470 that identifies organizational risks 10 00:00:20,470 --> 00:00:22,550 and determines their effect on ongoing 11 00:00:22,550 --> 00:00:24,630 mission critical operations. 12 00:00:24,630 --> 00:00:25,900 Let me give you a great example 13 00:00:25,900 --> 00:00:28,870 of a business impact analysis from my own company. 14 00:00:28,870 --> 00:00:32,000 My company happens to reside in Puerto Rico. 15 00:00:32,000 --> 00:00:34,760 Puerto Rico is a small island in the Caribbean. 16 00:00:34,760 --> 00:00:36,820 Now you can see it here on the screen. 17 00:00:36,820 --> 00:00:38,570 Now because our company is in Puerto Rico, 18 00:00:38,570 --> 00:00:40,560 we have to worry about natural disasters 19 00:00:40,560 --> 00:00:42,440 and their impact to our business. 20 00:00:42,440 --> 00:00:45,890 For example, after a massive hurricane in 2017, 21 00:00:45,890 --> 00:00:49,170 the island of Puerto Rico was without power for months. 22 00:00:49,170 --> 00:00:51,810 So, when we decided to move our company to Puerto Rico, 23 00:00:51,810 --> 00:00:54,270 we had to consider the impacts to our employees 24 00:00:54,270 --> 00:00:57,210 and our business if another huge storm came through. 25 00:00:57,210 --> 00:00:58,850 Because we are in Hurricane Alley 26 00:00:58,850 --> 00:01:01,050 and we're likely to get more storms.
27 00:01:01,050 --> 00:01:02,820 Now another thing we have in Puerto Rico 28 00:01:02,820 --> 00:01:04,930 that happened recently is earthquakes. 29 00:01:04,930 --> 00:01:07,100 So right after I moved my company to Puerto Rico 30 00:01:07,100 --> 00:01:08,330 the island started shaking 31 00:01:08,330 --> 00:01:10,210 and started having a series of earthquakes 32 00:01:10,210 --> 00:01:11,880 at the southern part of our island. 33 00:01:11,880 --> 00:01:14,130 Now this started creating damage to many businesses, 34 00:01:14,130 --> 00:01:16,760 houses and churches, as you can see here. 35 00:01:16,760 --> 00:01:19,060 Now because of our choice to be based out of Puerto Rico, 36 00:01:19,060 --> 00:01:21,620 our company, our employees and my family 37 00:01:21,620 --> 00:01:23,990 all have to have primary and backup plans 38 00:01:23,990 --> 00:01:26,980 to continue our operations even when the local power grid 39 00:01:26,980 --> 00:01:29,750 goes offline due to storms or earthquakes 40 00:01:29,750 --> 00:01:31,060 or any other reason. 41 00:01:31,060 --> 00:01:32,780 That is one of the main things we have to think about 42 00:01:32,780 --> 00:01:35,300 as part of our business impact analysis. 43 00:01:35,300 --> 00:01:37,790 Now, when you're conducting your business impact analysis 44 00:01:37,790 --> 00:01:39,540 this is going to be governed by metrics 45 00:01:39,540 --> 00:01:42,490 that express your system in terms of availability. 46 00:01:42,490 --> 00:01:44,840 For example, are you a 100% available 47 00:01:44,840 --> 00:01:46,670 or are you 90% available? 48 00:01:46,670 --> 00:01:50,010 In our company, we try to achieve 99% uptime, 49 00:01:50,010 --> 00:01:52,930 meaning we want to be available to you 99% of the time 50 00:01:52,930 --> 00:01:55,200 during normal business hours. 51 00:01:55,200 --> 00:01:57,110 Now, when we start talking about these metrics 52 00:01:57,110 --> 00:01:59,310 there are lots of different ones we have to consider. 
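The availability percentages mentioned above translate directly into a downtime budget. The following is a minimal sketch of that arithmetic, not a description of how any particular company measures uptime. The function name and both yearly hour totals are assumptions made for illustration; in particular, the 8-hour day and 260 working days used for "normal business hours" are hypothetical figures.

```python
def allowed_downtime_hours(availability_pct, period_hours):
    """Return the hours of downtime permitted in a period at a given availability."""
    return period_hours * (100 - availability_pct) / 100


# Assumption: a service that is expected to run around the clock.
HOURS_PER_YEAR = 365 * 24

# Assumption: "normal business hours" taken as 8-hour days, 260 working days a year.
BUSINESS_HOURS_PER_YEAR = 8 * 260

for pct in (90, 99, 100):
    around_the_clock = allowed_downtime_hours(pct, HOURS_PER_YEAR)
    business_hours = allowed_downtime_hours(pct, BUSINESS_HOURS_PER_YEAR)
    print(f"{pct}% available: {around_the_clock:.1f} h/year around the clock, "
          f"{business_hours:.1f} h/year during business hours")
```

Measuring against business hours rather than the full year shrinks the budget considerably, which is why it matters which period an uptime target is quoted against.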
53 00:01:59,310 --> 00:02:02,880 We have things like our Maximum Tolerable Downtime or MTD. 54 00:02:02,880 --> 00:02:06,000 We have Recovery Time Objective RTO. 55 00:02:06,000 --> 00:02:08,620 The Work Recovery Time WRT, 56 00:02:08,620 --> 00:02:11,350 or Recovery Point Objective RPO. 57 00:02:11,350 --> 00:02:13,170 Let's talk about each of these. 58 00:02:13,170 --> 00:02:16,580 Now when we talk about a Maximum Tolerable Downtime or MTD, 59 00:02:16,580 --> 00:02:18,200 this is the longest period of time 60 00:02:18,200 --> 00:02:19,820 a business can be inoperable 61 00:02:19,820 --> 00:02:22,760 without causing irrevocable business failure. 62 00:02:22,760 --> 00:02:24,840 Essentially, how long can you be down 63 00:02:24,840 --> 00:02:26,840 without going out of business? 64 00:02:26,840 --> 00:02:29,790 Now, the MTD is going to be different for each organization 65 00:02:29,790 --> 00:02:31,810 and even within each organization 66 00:02:31,810 --> 00:02:35,250 each of your business processes can have its own MTD. 67 00:02:35,250 --> 00:02:37,700 For example, some may be just a couple of minutes 68 00:02:37,700 --> 00:02:40,010 or a couple hours for critical functions. 69 00:02:40,010 --> 00:02:42,550 You may have up to 24 hours for urgent functions 70 00:02:42,550 --> 00:02:45,410 and up to seven days or longer for normal functions. 71 00:02:45,410 --> 00:02:47,640 It really does depend on your organization 72 00:02:47,640 --> 00:02:50,110 and you have to figure that out for yourself. 73 00:02:50,110 --> 00:02:52,560 Now the MTD is going to set your upper limit 74 00:02:52,560 --> 00:02:55,840 on the recovery time that the system and the asset owners 75 00:02:55,840 --> 00:02:59,020 have to have to resume your operations. 76 00:02:59,020 --> 00:03:00,640 So keeping that in mind, 77 00:03:00,640 --> 00:03:02,720 let's take a look at my own business. 
78 00:03:02,720 --> 00:03:03,553 I already mentioned 79 00:03:03,553 --> 00:03:06,060 that power was a really important thing for us. 80 00:03:06,060 --> 00:03:08,730 So we know that power is critical to our business functions. 81 00:03:08,730 --> 00:03:10,750 Without it we really can't do our job. 82 00:03:10,750 --> 00:03:13,350 I can't film courses and I can't answer student questions 83 00:03:13,350 --> 00:03:15,730 if I don't have power to turn on my computers 84 00:03:15,730 --> 00:03:17,610 and be able to run our internet. 85 00:03:17,610 --> 00:03:19,830 Now, without this power, we can't do our job at all. 86 00:03:19,830 --> 00:03:22,020 So we have multiple backup systems 87 00:03:22,020 --> 00:03:24,060 to protect power to our building. 88 00:03:24,060 --> 00:03:26,700 First, we have power from our local electrical company 89 00:03:26,700 --> 00:03:29,120 coming through the wires on their grid. 90 00:03:29,120 --> 00:03:31,810 Then, we have solar power that serves as a backup 91 00:03:31,810 --> 00:03:33,760 during the day if the grid goes out 92 00:03:33,760 --> 00:03:36,060 and it actually acts as our primary electrical source 93 00:03:36,060 --> 00:03:37,570 during the day because it's so bright 94 00:03:37,570 --> 00:03:39,200 and sunny in Puerto Rico. 95 00:03:39,200 --> 00:03:41,570 Now after that, we have some Tesla Powerwalls 96 00:03:41,570 --> 00:03:42,403 that we've installed. 97 00:03:42,403 --> 00:03:43,320 These are batteries. 98 00:03:43,320 --> 00:03:45,330 And these collect power from our solar system 99 00:03:45,330 --> 00:03:46,163 during the day, 100 00:03:46,163 --> 00:03:48,040 and that way, even if the grid goes out, 101 00:03:48,040 --> 00:03:49,720 we can end up having those batteries 102 00:03:49,720 --> 00:03:51,470 provide the power to us. 103 00:03:51,470 --> 00:03:53,520 And that can happen at night as well.
104 00:03:53,520 --> 00:03:54,630 When the sun goes down, 105 00:03:54,630 --> 00:03:56,870 we can actually survive off those batteries 106 00:03:56,870 --> 00:03:58,510 if the grid is gone. 107 00:03:58,510 --> 00:04:00,930 Now, if it becomes cloudy then the solar won't work. 108 00:04:00,930 --> 00:04:02,160 Well, we still have the batteries. 109 00:04:02,160 --> 00:04:05,060 And so we have primary of grid, secondary solar, 110 00:04:05,060 --> 00:04:06,820 tertiary of batteries. 111 00:04:06,820 --> 00:04:08,700 But for us, that's not enough. 112 00:04:08,700 --> 00:04:10,180 We actually in addition to that, 113 00:04:10,180 --> 00:04:11,800 have a diesel generator as well. 114 00:04:11,800 --> 00:04:14,520 And that diesel generator has a hundred gallons of fuel. 115 00:04:14,520 --> 00:04:16,740 That's enough for us to run seven days straight 116 00:04:16,740 --> 00:04:18,450 without the need for more fuel. 117 00:04:18,450 --> 00:04:22,170 So for us to be without power takes a lot these days. 118 00:04:22,170 --> 00:04:25,230 Because we have the grid, we have solar, we have battery, 119 00:04:25,230 --> 00:04:27,800 we have diesel, we have all of these systems in place 120 00:04:27,800 --> 00:04:29,530 because we realize that is an important 121 00:04:29,530 --> 00:04:32,240 mission essential thing for us is to have power 122 00:04:32,240 --> 00:04:34,420 for us to be able to do our jobs. 123 00:04:34,420 --> 00:04:37,330 Now, if the power grid goes out for more than 60 minutes, 124 00:04:37,330 --> 00:04:39,080 we found another issue though. 125 00:04:39,080 --> 00:04:40,750 And that's that our primary internet connection 126 00:04:40,750 --> 00:04:42,860 from our cable provider dies. 
127 00:04:42,860 --> 00:04:45,260 Now, that isn't a problem for me if I'm filming, 128 00:04:45,260 --> 00:04:47,000 but if I'm trying to answer your questions 129 00:04:47,000 --> 00:04:48,290 and do student support, 130 00:04:48,290 --> 00:04:49,650 we can't do that without internet 131 00:04:49,650 --> 00:04:51,660 because we have to be able to reach to you, right? 132 00:04:51,660 --> 00:04:53,320 And so that's another issue for us. 133 00:04:53,320 --> 00:04:55,880 And so we have a secondary service for that. 134 00:04:55,880 --> 00:04:58,500 We have a cellular modem that we use as a backup. 135 00:04:58,500 --> 00:04:59,850 Now we realized recently, 136 00:04:59,850 --> 00:05:01,310 there's a big storm that came through 137 00:05:01,310 --> 00:05:03,330 and our area lost power for three days. 138 00:05:03,330 --> 00:05:04,310 Now we didn't lose power 139 00:05:04,310 --> 00:05:05,930 'cause we had all these systems in place, 140 00:05:05,930 --> 00:05:07,780 but our internet connection died. 141 00:05:07,780 --> 00:05:09,670 And so we tried to switch over to the cellular modem 142 00:05:09,670 --> 00:05:12,780 and we found out the speed was horrible. 143 00:05:12,780 --> 00:05:13,613 Why? 144 00:05:13,613 --> 00:05:15,010 Because everybody else was out as well. 145 00:05:15,010 --> 00:05:16,840 And they all went onto their modems. 146 00:05:16,840 --> 00:05:18,290 And because they went to a cellular modem, 147 00:05:18,290 --> 00:05:20,290 the cell towers were being flooded with users. 148 00:05:20,290 --> 00:05:22,920 And so we were getting speeds that were 10 149 00:05:22,920 --> 00:05:25,970 or 20 or 30 or 40 kilobits per second. 150 00:05:25,970 --> 00:05:29,275 And we were hardly able to even open up our ticket system 151 00:05:29,275 --> 00:05:30,870 to be able to answer our students' questions. 
152 00:05:30,870 --> 00:05:32,610 So now that we've learned from that, 153 00:05:32,610 --> 00:05:34,730 we have a third internet provider as well 154 00:05:34,730 --> 00:05:36,560 that provides a service over microwave 155 00:05:36,560 --> 00:05:39,070 and they have a better backup plan for power outages. 156 00:05:39,070 --> 00:05:40,780 So they maintain services up 157 00:05:40,780 --> 00:05:42,790 even when the local cable company is down. 158 00:05:42,790 --> 00:05:44,860 So this helps eliminate that bottleneck. 159 00:05:44,860 --> 00:05:47,610 My point here is that you have to do this business impact 160 00:05:47,610 --> 00:05:49,470 analysis for yourself in your organization. 161 00:05:49,470 --> 00:05:52,250 And figure out what things are critical to your operations 162 00:05:52,250 --> 00:05:55,200 and put things in place to help protect those. 163 00:05:55,200 --> 00:05:57,170 So when I look at my support services 164 00:05:57,170 --> 00:06:00,510 what is our MTD for our support services? 165 00:06:00,510 --> 00:06:03,110 Well, we calculated that our maximum tolerable downtime 166 00:06:03,110 --> 00:06:04,610 is 12 hours. 167 00:06:04,610 --> 00:06:06,570 Now, how do I come up with 12 hours? 168 00:06:06,570 --> 00:06:08,030 Well, we started thinking about 169 00:06:08,030 --> 00:06:10,750 if I was a student, what was the longest period of time 170 00:06:10,750 --> 00:06:11,583 I would want to go 171 00:06:11,583 --> 00:06:13,260 without getting an answer to my questions? 172 00:06:13,260 --> 00:06:15,610 And we figured that 12 hours was a reasonable time 173 00:06:15,610 --> 00:06:17,470 in an emergency situation. 174 00:06:17,470 --> 00:06:18,500 Now to accommodate that 175 00:06:18,500 --> 00:06:21,410 we actually have our staff split in two parts. 176 00:06:21,410 --> 00:06:23,730 Half of my team resides in Puerto Rico. 177 00:06:23,730 --> 00:06:26,020 The other half of my team is in the Philippines.
178 00:06:26,020 --> 00:06:28,230 They are 12 hours out of sync with each other. 179 00:06:28,230 --> 00:06:30,020 So when it's daytime in the Philippines, 180 00:06:30,020 --> 00:06:31,430 it's nighttime in Puerto Rico. 181 00:06:31,430 --> 00:06:32,820 And when it's daytime in Puerto Rico, 182 00:06:32,820 --> 00:06:34,310 it's nighttime in the Philippines. 183 00:06:34,310 --> 00:06:36,850 And so we actually can cover almost 24 hours a day 184 00:06:36,850 --> 00:06:38,280 using these two locations. 185 00:06:38,280 --> 00:06:39,880 'Cause each side works eight hours. 186 00:06:39,880 --> 00:06:41,100 And so there's a little bit of time 187 00:06:41,100 --> 00:06:43,070 that isn't covered by both teams. 188 00:06:43,070 --> 00:06:46,040 Now that again works within our 12 hours of time. 189 00:06:46,040 --> 00:06:47,290 Now, the other good thing about this 190 00:06:47,290 --> 00:06:49,580 is because they're so geographically distant 191 00:06:49,580 --> 00:06:51,730 if there's a big storm that takes out Puerto Rico, 192 00:06:51,730 --> 00:06:53,840 hopefully it doesn't affect the Philippines. 193 00:06:53,840 --> 00:06:55,450 If there's a big storm that takes out the Philippines, 194 00:06:55,450 --> 00:06:57,360 hopefully it won't affect Puerto Rico. 195 00:06:57,360 --> 00:06:59,950 And so by having our geographic diversity here 196 00:06:59,950 --> 00:07:02,220 it allows us to maintain that 12 hour response time. 197 00:07:02,220 --> 00:07:04,450 Because even if Puerto Rico was down 198 00:07:04,450 --> 00:07:07,440 well, in the 12 hours, the Philippines team would be awake 199 00:07:07,440 --> 00:07:09,100 and they'll be able to take on that load 200 00:07:09,100 --> 00:07:10,700 and be able to answer your questions. 201 00:07:10,700 --> 00:07:11,880 And so that's why we do that. 202 00:07:11,880 --> 00:07:13,350 And that's why we break up our company.
203 00:07:13,350 --> 00:07:14,740 It was a risk management decision 204 00:07:14,740 --> 00:07:17,940 that we've made to provide better service to our students. 205 00:07:17,940 --> 00:07:20,600 Now, again, you need to figure out what your MTD is 206 00:07:20,600 --> 00:07:22,720 for your company in the real world, 207 00:07:22,720 --> 00:07:24,660 so you can design your risk management plan 208 00:07:24,660 --> 00:07:27,090 around supporting that MTD. 209 00:07:27,090 --> 00:07:28,500 What we've done for us, 210 00:07:28,500 --> 00:07:30,190 was thinking about the biggest threats to us, 211 00:07:30,190 --> 00:07:31,700 which are natural disasters. 212 00:07:31,700 --> 00:07:34,220 And so we have our MTD built around that 213 00:07:34,220 --> 00:07:35,580 for our student support services 214 00:07:35,580 --> 00:07:37,710 and we put power and internet connectivity 215 00:07:37,710 --> 00:07:40,120 at the top of that list of things that we need to have 216 00:07:40,120 --> 00:07:42,180 two and three and four different mechanisms 217 00:07:42,180 --> 00:07:44,100 to be able to provide backups to our backups, 218 00:07:44,100 --> 00:07:46,100 to make sure we can always support you. 219 00:07:46,100 --> 00:07:48,730 Now, the next one we want to talk about is our RTO. 220 00:07:48,730 --> 00:07:50,950 This is our recovery time objective. 221 00:07:50,950 --> 00:07:53,200 Now this is the length of time it takes after an event 222 00:07:53,200 --> 00:07:56,290 to resume your normal business operations and activities. 223 00:07:56,290 --> 00:07:58,520 When you start thinking about recovery time objective 224 00:07:58,520 --> 00:08:01,260 I want you to think about the fact of something went down. 225 00:08:01,260 --> 00:08:02,170 We lost power. 226 00:08:02,170 --> 00:08:04,130 How quickly do you need it back? 227 00:08:04,130 --> 00:08:07,820 In my case, we have a 60 second time for power. 
228 00:08:07,820 --> 00:08:09,290 We want to make sure our power is back up 229 00:08:09,290 --> 00:08:11,060 and online within 60 seconds. 230 00:08:11,060 --> 00:08:12,220 Now, is that achievable? 231 00:08:12,220 --> 00:08:13,053 Yes. 232 00:08:13,053 --> 00:08:15,000 If you have a backup diesel generator 233 00:08:15,000 --> 00:08:17,110 it will turn on in about 45 seconds 234 00:08:17,110 --> 00:08:19,270 and transfer power to the diesel generator. 235 00:08:19,270 --> 00:08:22,970 Now my wife wasn't happy with 45 seconds or 60 seconds 236 00:08:22,970 --> 00:08:25,330 and she wanted a recovery time of zero. 237 00:08:25,330 --> 00:08:26,730 Now, can I achieve that? 238 00:08:26,730 --> 00:08:27,690 The answer is yes. 239 00:08:27,690 --> 00:08:28,523 And that's one of the reasons 240 00:08:28,523 --> 00:08:30,470 why we have those battery backup systems. 241 00:08:30,470 --> 00:08:31,940 Because if power goes away, 242 00:08:31,940 --> 00:08:33,810 those batteries come on instantly. 243 00:08:33,810 --> 00:08:35,600 There is zero lag time there. 244 00:08:35,600 --> 00:08:37,330 And so we're able to hit a recovery time objective 245 00:08:37,330 --> 00:08:39,390 for power of zero seconds. 246 00:08:39,390 --> 00:08:41,670 Now the overall power of getting it back to the grid, 247 00:08:41,670 --> 00:08:42,800 we can't control that. 248 00:08:42,800 --> 00:08:44,640 That's up to our local power company. 249 00:08:44,640 --> 00:08:47,200 But we can make sure that we can recover our business 250 00:08:47,200 --> 00:08:50,140 and make sure we're on battery, on solar, or on generator 251 00:08:50,140 --> 00:08:51,630 within zero seconds. 252 00:08:51,630 --> 00:08:53,660 And that's our recovery time objective. 253 00:08:53,660 --> 00:08:56,220 Now, the next one we want to talk about is work recovery time 254 00:08:56,220 --> 00:08:57,820 or WRT.
255 00:08:57,820 --> 00:09:00,360 This is the length of time in addition to the RTO 256 00:09:00,360 --> 00:09:03,360 of individual systems to perform re-integration and testing 257 00:09:03,360 --> 00:09:06,550 of a restored or upgraded system, following an event. 258 00:09:06,550 --> 00:09:07,990 So let me give you an example. 259 00:09:07,990 --> 00:09:11,030 Let's say in my organization, we had a power outage 260 00:09:11,030 --> 00:09:12,150 and we didn't have the batteries yet. 261 00:09:12,150 --> 00:09:13,910 We had to rely on those generators. 262 00:09:13,910 --> 00:09:16,150 So we had a 60 second recovery time. 263 00:09:16,150 --> 00:09:18,740 Well, in 45 seconds, power comes back up. 264 00:09:18,740 --> 00:09:22,090 But if those systems went down because of a power surge, 265 00:09:22,090 --> 00:09:24,200 and I had to replace one of my servers, 266 00:09:24,200 --> 00:09:26,810 well, that's going to take additional work recovery time. 267 00:09:26,810 --> 00:09:28,130 I fixed the main problem 268 00:09:28,130 --> 00:09:30,390 the recovery time objective of getting the power up. 269 00:09:30,390 --> 00:09:32,710 But now I have to fix the second and third order effects 270 00:09:32,710 --> 00:09:34,570 to get work product going again 271 00:09:34,570 --> 00:09:36,150 which might be rebooting your computer. 272 00:09:36,150 --> 00:09:37,680 It might be rebuilding a computer. 273 00:09:37,680 --> 00:09:39,570 It might be replacing a hard drive. 274 00:09:39,570 --> 00:09:40,580 Whatever those things are, 275 00:09:40,580 --> 00:09:42,810 I have to perform that re-integration and testing 276 00:09:42,810 --> 00:09:44,460 to bring those systems back online 277 00:09:44,460 --> 00:09:46,290 in an upgraded or restored state 278 00:09:46,290 --> 00:09:48,800 to be able to get us back to regular work product. 279 00:09:48,800 --> 00:09:51,280 And our final one we want to talk about is RPO. 280 00:09:51,280 --> 00:09:53,450 This is our recovery point objective.
281 00:09:53,450 --> 00:09:55,200 This is the longest period of time 282 00:09:55,200 --> 00:09:57,540 that an organization can tolerate lost data 283 00:09:57,540 --> 00:09:59,260 being unrecoverable. 284 00:09:59,260 --> 00:10:00,900 Now the way I like to think about this one, 285 00:10:00,900 --> 00:10:04,280 when I think about RPO is think about ransomware. 286 00:10:04,280 --> 00:10:05,930 If you have ransomware on a system, 287 00:10:05,930 --> 00:10:07,570 it's going to encrypt your files. 288 00:10:07,570 --> 00:10:09,340 Now you've got a couple of choices here. 289 00:10:09,340 --> 00:10:12,050 You can pay the ransom, which we never recommend. 290 00:10:12,050 --> 00:10:13,990 You could try to crack the ransomware key 291 00:10:13,990 --> 00:10:16,610 which could take you days, weeks, or months or years, 292 00:10:16,610 --> 00:10:18,070 depending on how strong it is 293 00:10:18,070 --> 00:10:20,380 or you can actually wipe that system 294 00:10:20,380 --> 00:10:22,390 and recover from a known good backup. 295 00:10:22,390 --> 00:10:23,390 Well, that's great. 296 00:10:23,390 --> 00:10:24,810 Let's go ahead and choose that option. 297 00:10:24,810 --> 00:10:27,470 Well, if we do that, what is the longest period of time 298 00:10:27,470 --> 00:10:29,580 that we can tolerate data loss? 299 00:10:29,580 --> 00:10:31,870 Well, there's going to be time that we're going to be lagging 300 00:10:31,870 --> 00:10:33,870 as we're recovering all that data back. 301 00:10:33,870 --> 00:10:36,410 And that data may have several hours 302 00:10:36,410 --> 00:10:38,100 since it was last backed up. 303 00:10:38,100 --> 00:10:41,010 For instance, if you run your backup once a day at midnight, 304 00:10:41,010 --> 00:10:42,820 that's when your data was backed up. 
305 00:10:42,820 --> 00:10:45,220 And if this ransomware hits you at six in the morning, 306 00:10:45,220 --> 00:10:46,970 you have six hours worth of lost data 307 00:10:46,970 --> 00:10:48,900 because you don't have a backup of that. 308 00:10:48,900 --> 00:10:49,733 This is what we're talking about 309 00:10:49,733 --> 00:10:51,820 when we're talking about recovery point objective. 310 00:10:51,820 --> 00:10:54,420 That six hours is going to be lost period of time. 311 00:10:54,420 --> 00:10:56,830 And so if your RPO was 12 hours, that's fine. 312 00:10:56,830 --> 00:10:59,720 If your RPO was four hours, you've just broken your RPO. 313 00:10:59,720 --> 00:11:00,960 So you need to keep that in mind. 314 00:11:00,960 --> 00:11:03,300 When you're thinking about your recovery point objective 315 00:11:03,300 --> 00:11:07,100 you are focused on how long you can be without your data. 316 00:11:07,100 --> 00:11:08,870 That is the whole idea here. 317 00:11:08,870 --> 00:11:11,810 Now, when we start thinking about your MTD and your RPO 318 00:11:11,810 --> 00:11:14,180 these are key terms you need to know for the exam. 319 00:11:14,180 --> 00:11:15,310 They're going to help you determine 320 00:11:15,310 --> 00:11:17,210 which business functions are critical. 321 00:11:17,210 --> 00:11:20,520 And they're going to specify appropriate risk countermeasures 322 00:11:20,520 --> 00:11:21,890 to help you make sure 323 00:11:21,890 --> 00:11:23,420 that you're going to be able to get those systems 324 00:11:23,420 --> 00:11:26,760 up and running and within those MTD and RPOs. 325 00:11:26,760 --> 00:11:29,560 For example, if your RPO is measured in days 326 00:11:29,560 --> 00:11:31,720 then a simple tape backup will work fine. 327 00:11:31,720 --> 00:11:33,520 Or you could use a network attached storage array, 328 00:11:33,520 --> 00:11:34,880 or something like that. 
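The backup example above (a midnight backup, ransomware at six in the morning) comes down to a single comparison between the data loss window and the RPO. Here is a minimal sketch of that check; the function names are hypothetical and the calendar date is arbitrary, chosen only so the times match the example.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup, incident):
    """Return how much time's worth of data has no backup when the incident hits."""
    return incident - last_backup


def meets_rpo(last_backup, incident, rpo):
    """Return True if the unrecoverable data window stays within the RPO."""
    return data_loss_window(last_backup, incident) <= rpo


# Assumption: arbitrary date; only the times of day come from the example.
last_backup = datetime(2024, 1, 1, 0, 0)   # nightly backup at midnight
incident = datetime(2024, 1, 1, 6, 0)      # ransomware hits at 6:00 in the morning

print(data_loss_window(last_backup, incident))                 # 6:00:00
print(meets_rpo(last_backup, incident, timedelta(hours=12)))   # True
print(meets_rpo(last_backup, incident, timedelta(hours=4)))    # False
```

With a once-a-day backup the worst case is an incident just before the next backup runs, so a 24-hour backup interval can only guarantee an RPO of 24 hours or more.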
329 00:11:34,880 --> 00:11:38,550 But if your RPO is zero or measured in minutes or seconds, 330 00:11:38,550 --> 00:11:40,670 then a more expensive server cluster backup 331 00:11:40,670 --> 00:11:43,240 and redundancy solution is going to be needed. 332 00:11:43,240 --> 00:11:45,350 A good example of this is my own company. 333 00:11:45,350 --> 00:11:46,890 We have a server cluster backup 334 00:11:46,890 --> 00:11:50,020 and full redundancy solution for our web servers. 335 00:11:50,020 --> 00:11:50,853 This is because 336 00:11:50,853 --> 00:11:53,260 we can't afford to be down for long periods of time 337 00:11:53,260 --> 00:11:56,140 when over 200,000 students are relying on our servers 338 00:11:56,140 --> 00:11:58,170 to learn from our videos and our practice exams 339 00:11:58,170 --> 00:12:00,310 to prepare to pass their certifications. 340 00:12:00,310 --> 00:12:03,010 When I sold you a course, I sold you access to that course. 341 00:12:03,010 --> 00:12:04,800 We want to make sure you have access. 342 00:12:04,800 --> 00:12:06,560 And we can't say, "Oh, I'm sorry. 343 00:12:06,560 --> 00:12:08,580 We lost all of your data for the last three weeks. 344 00:12:08,580 --> 00:12:10,030 We're going to have to have you start over." 345 00:12:10,030 --> 00:12:12,010 So we want to make sure we have that set up right. 346 00:12:12,010 --> 00:12:14,370 And so we've taken the risk management mitigations 347 00:12:14,370 --> 00:12:17,650 to help make sure we maintain a good MTD and RPO 348 00:12:17,650 --> 00:12:20,857 based on that need keeping our MTD very low in time 349 00:12:20,857 --> 00:12:23,000 and our RPO, very low in time. 350 00:12:23,000 --> 00:12:24,940 So you get a better experience. 351 00:12:24,940 --> 00:12:26,230 Now we've covered the basics 352 00:12:26,230 --> 00:12:28,500 but there's still another piece we have to talk about. 
353 00:12:28,500 --> 00:12:30,740 Because when you start designing your disaster recovery 354 00:12:30,740 --> 00:12:32,390 and continuity of operation plans, 355 00:12:32,390 --> 00:12:33,870 it requires that you have an understanding 356 00:12:33,870 --> 00:12:36,530 of your availability and reliability levels. 357 00:12:36,530 --> 00:12:38,430 This is because disasters can be caused 358 00:12:38,430 --> 00:12:41,010 by either internal or external forces. 359 00:12:41,010 --> 00:12:42,950 I mentioned a lot about external forces 360 00:12:42,950 --> 00:12:44,100 so far in this video. 361 00:12:44,100 --> 00:12:45,800 I talked about the fact that the power can go out 362 00:12:45,800 --> 00:12:48,070 or a hurricane can come, or earthquakes. 363 00:12:48,070 --> 00:12:50,360 Those are all external forces to my company. 364 00:12:50,360 --> 00:12:52,410 But I also have internal ones. 365 00:12:52,410 --> 00:12:54,620 For example, I might have a system administrator 366 00:12:54,620 --> 00:12:57,190 who makes a mistake and deletes the file share. 367 00:12:57,190 --> 00:12:59,150 I might have somebody who's trying to upgrade something 368 00:12:59,150 --> 00:13:00,750 and they crashed the servers. 369 00:13:00,750 --> 00:13:02,250 I may have a router or switch 370 00:13:02,250 --> 00:13:05,200 that just decides it's done working and it crashes itself. 371 00:13:05,200 --> 00:13:06,740 Or the power supply goes out. 372 00:13:06,740 --> 00:13:09,160 These are all things that are internal to my organization, 373 00:13:09,160 --> 00:13:11,270 and we have to take care of those. 374 00:13:11,270 --> 00:13:12,610 Now, when we talk about this, 375 00:13:12,610 --> 00:13:14,480 there's really two terms we need to talk about. 376 00:13:14,480 --> 00:13:16,080 And this is the mean time to repair 377 00:13:16,080 --> 00:13:18,130 and the mean time between failures. 
378 00:13:18,130 --> 00:13:20,510 Now, when I talk about the mean time to repair 379 00:13:20,510 --> 00:13:23,050 I'm talking about the time on average that it takes 380 00:13:23,050 --> 00:13:26,150 to go from system failure to resuming operations. 381 00:13:26,150 --> 00:13:28,210 So if I have a server that crashes 382 00:13:28,210 --> 00:13:29,870 and I have to install a new server 383 00:13:29,870 --> 00:13:31,220 there's time involved with that. 384 00:13:31,220 --> 00:13:33,690 And that is what we're talking about with time to repair. 385 00:13:33,690 --> 00:13:35,320 Now when I talk about the time to repair, 386 00:13:35,320 --> 00:13:36,330 I want you to think about this 387 00:13:36,330 --> 00:13:38,230 in terms of the recovery time objective 388 00:13:38,230 --> 00:13:39,970 and the recovery point objective. 389 00:13:39,970 --> 00:13:42,790 Because those are the goal of what we're trying to go for. 390 00:13:42,790 --> 00:13:46,160 So if my recovery time objective is going to be four hours 391 00:13:46,160 --> 00:13:48,630 and it takes me two hours to recover the system, 392 00:13:48,630 --> 00:13:49,570 that's fine. 393 00:13:49,570 --> 00:13:50,560 But if I know it's going to take me 394 00:13:50,560 --> 00:13:51,990 eight hours to recover this system 395 00:13:51,990 --> 00:13:53,760 well, we need to come up with a new plan 396 00:13:53,760 --> 00:13:57,290 because we're not meeting the goal that we set in that RTO. 397 00:13:57,290 --> 00:13:59,640 Now the second part of this is our time to failure. 398 00:13:59,640 --> 00:14:02,240 And this is the time it takes to go from normal operations 399 00:14:02,240 --> 00:14:03,990 into a system failure. 400 00:14:03,990 --> 00:14:06,780 Essentially, I like to think about this as the uptime.
401 00:14:06,780 --> 00:14:08,250 So when you think about the time to failure, 402 00:14:08,250 --> 00:14:10,310 if I started running all my servers today 403 00:14:10,310 --> 00:14:13,360 and they run flawlessly for one year and three days, 404 00:14:13,360 --> 00:14:14,470 and then they fail. 405 00:14:14,470 --> 00:14:17,580 Then my time to failure is one year and three days. 406 00:14:17,580 --> 00:14:19,860 If I take all the time that I've had that server 407 00:14:19,860 --> 00:14:21,200 and I average it together, 408 00:14:21,200 --> 00:14:23,720 I can figure out what my mean time to failure is 409 00:14:23,720 --> 00:14:25,660 over that extended period of time. 410 00:14:25,660 --> 00:14:27,610 Now the reason that the mean time to repair 411 00:14:27,610 --> 00:14:29,310 and the time to failure are important 412 00:14:29,310 --> 00:14:32,107 is because this tells me how long I expect to be down 413 00:14:32,107 --> 00:14:33,840 and how long I expect to be up. 414 00:14:33,840 --> 00:14:35,550 And that helps build my knowledge 415 00:14:35,550 --> 00:14:38,080 as I start building my continuity plans. 416 00:14:38,080 --> 00:14:40,330 Now when we talk about the time between failures 417 00:14:40,330 --> 00:14:43,950 this is the time going from one failure to the next failure. 418 00:14:43,950 --> 00:14:46,560 So for example, if I have a switch and it fails today 419 00:14:46,560 --> 00:14:49,160 and I put in a brand new switch and it took me three hours 420 00:14:49,160 --> 00:14:51,330 well, the time to repair was three hours. 421 00:14:51,330 --> 00:14:53,530 I had a failure, I fixed it within three hours 422 00:14:53,530 --> 00:14:54,820 and now we're back online. 423 00:14:54,820 --> 00:14:56,010 And so now that we're back online 424 00:14:56,010 --> 00:14:57,360 let's say we're going to continue operating. 425 00:14:57,360 --> 00:14:59,750 And it takes us a year before the next failure.
426 00:14:59,750 --> 00:15:02,440 That means I had a time to repair of three hours, 427 00:15:02,440 --> 00:15:04,350 a time to failure of one year. 428 00:15:04,350 --> 00:15:06,750 And if I put those together, I get the time between failures 429 00:15:06,750 --> 00:15:10,070 which in this case would be one year and three hours. 430 00:15:10,070 --> 00:15:13,650 That way I know how long I can expect a given piece of gear 431 00:15:13,650 --> 00:15:16,080 to work for me before I need to replace it. 432 00:15:16,080 --> 00:15:18,160 If I know my mean time between failures 433 00:15:18,160 --> 00:15:19,890 happens to be three years, 434 00:15:19,890 --> 00:15:21,010 then at two and a half years, 435 00:15:21,010 --> 00:15:22,090 I would want to replace that 436 00:15:22,090 --> 00:15:24,910 so I can plan to replace it before it fails. 437 00:15:24,910 --> 00:15:25,970 So just as a review, 438 00:15:25,970 --> 00:15:28,080 now that we've talked about what this looks like 439 00:15:28,080 --> 00:15:29,870 when we talk about mean time to repair, 440 00:15:29,870 --> 00:15:32,000 we're talking about the average time it takes 441 00:15:32,000 --> 00:15:34,300 to repair a network device when it breaks. 442 00:15:34,300 --> 00:15:35,850 And when we start talking about the mean time 443 00:15:35,850 --> 00:15:36,800 between failures, 444 00:15:36,800 --> 00:15:37,890 we're talking about the measure 445 00:15:37,890 --> 00:15:40,570 of the average time between failures of a device. 446 00:15:40,570 --> 00:15:42,880 So keep these two terms in mind for the exam, 447 00:15:42,880 --> 00:15:44,780 because you may see questions on them.
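The switch example above adds the time to repair to the time to failure to get the time between failures, and the "mean" versions are simply averages over several incidents. A minimal sketch of that arithmetic follows. The function names are hypothetical, a year is assumed to be 365 days, and the three-entry repair log is made-up illustrative data, not figures from the lesson.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a 365-day year


def time_between_failures(time_to_failure_hours, time_to_repair_hours):
    """Time from one failure to the next: the repair time plus the uptime that follows."""
    return time_to_failure_hours + time_to_repair_hours


def mean(values):
    """Average of a list of durations, used for the 'mean time' metrics."""
    return sum(values) / len(values)


# The switch example: a three-hour repair followed by one year of uptime.
tbf_hours = time_between_failures(HOURS_PER_YEAR, 3)
print(tbf_hours)  # 8763 hours, i.e. one year and three hours

# Hypothetical repair log (in hours) for three separate incidents.
repair_log_hours = [3, 2, 4]
print(mean(repair_log_hours))  # 3.0 hours mean time to repair
```

A single incident only gives one data point; the mean values become useful for planning replacements once several failures and repairs have been recorded.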