1 00:00:00,540 --> 00:00:02,350 Machine learning. 2 00:00:02,350 --> 00:00:05,000 In this lesson, we're going to talk about machine learning 3 00:00:05,000 --> 00:00:06,990 and a couple of related concepts. 4 00:00:06,990 --> 00:00:09,640 These are known as artificial intelligence or AI, 5 00:00:09,640 --> 00:00:13,190 machine learning or ML, and deep learning. 6 00:00:13,190 --> 00:00:16,000 First, let's talk about artificial intelligence. 7 00:00:16,000 --> 00:00:17,960 Now, artificial intelligence is the science 8 00:00:17,960 --> 00:00:19,720 of creating machines with the ability 9 00:00:19,720 --> 00:00:22,410 to develop problem solving and analysis strategies 10 00:00:22,410 --> 00:00:25,650 without significant human direction or intervention. 11 00:00:25,650 --> 00:00:27,270 Essentially, we want to have a machine 12 00:00:27,270 --> 00:00:28,830 that can think for itself. 13 00:00:28,830 --> 00:00:31,250 Now, there are a lot of great things that we can do 14 00:00:31,250 --> 00:00:34,850 with artificial intelligence, especially in cyber security. 15 00:00:34,850 --> 00:00:36,710 When we start looking at artificial intelligence, 16 00:00:36,710 --> 00:00:38,500 we can create these expert systems. 17 00:00:38,500 --> 00:00:41,810 And the original ones use these if-then-else statements 18 00:00:41,810 --> 00:00:44,770 to basically make things happen based on a limited dataset 19 00:00:44,770 --> 00:00:47,430 using knowledge bases and set rules, 20 00:00:47,430 --> 00:00:50,080 but modern AI can think for itself. 21 00:00:50,080 --> 00:00:52,260 And that's really where the benefit comes. 
22 00:00:52,260 --> 00:00:54,450 Now, we're going to talk about this as we go through 23 00:00:54,450 --> 00:00:56,540 in terms of machine learning, though, 24 00:00:56,540 --> 00:00:58,970 because machine learning is a component of AI 25 00:00:58,970 --> 00:01:01,170 that really enables the machines to develop strategies 26 00:01:01,170 --> 00:01:02,940 for solving a given task. 27 00:01:02,940 --> 00:01:04,820 Now, the machine is given a labeled dataset 28 00:01:04,820 --> 00:01:07,180 where the features have been manually identified, 29 00:01:07,180 --> 00:01:10,020 but it doesn't get further explicit instructions. 30 00:01:10,020 --> 00:01:12,380 And so, machine learning, the concept here is, 31 00:01:12,380 --> 00:01:14,200 you have to train the machine. 32 00:01:14,200 --> 00:01:16,540 If you don't teach the machine what you want it to know, 33 00:01:16,540 --> 00:01:19,000 it's not going to know how to categorize things. 34 00:01:19,000 --> 00:01:21,360 And machine learning works really, really well 35 00:01:21,360 --> 00:01:22,990 when you start dealing with things that involve 36 00:01:22,990 --> 00:01:25,230 labels or categorizations. 37 00:01:25,230 --> 00:01:27,790 So, for example, if I wanted to go through a dataset 38 00:01:27,790 --> 00:01:30,200 and say, this is malware, this is not, 39 00:01:30,200 --> 00:01:33,390 this is malware, this is not, and I train the machine, 40 00:01:33,390 --> 00:01:36,020 it can then take over using its behavioral engine 41 00:01:36,020 --> 00:01:37,820 and machine learning to identify 42 00:01:37,820 --> 00:01:40,740 on its own what is and what is not malware. 43 00:01:40,740 --> 00:01:42,460 Now, this isn't a rule-based set, 44 00:01:42,460 --> 00:01:44,950 but by training it with a large dataset, 45 00:01:44,950 --> 00:01:47,510 it, over time, can start learning on its own. 46 00:01:47,510 --> 00:01:49,520 Let me give you a real world example of this. 
47 00:01:49,520 --> 00:01:52,170 One of the earlier machine learning case studies they did 48 00:01:52,170 --> 00:01:53,920 was training a machine to identify 49 00:01:53,920 --> 00:01:55,890 what was a party and what wasn't. 50 00:01:55,890 --> 00:01:57,750 And so, they started showing it images. 51 00:01:57,750 --> 00:02:00,490 So, for example, if I showed the computer an image like this, 52 00:02:00,490 --> 00:02:02,700 I would categorize it and say, this is a party, 53 00:02:02,700 --> 00:02:04,940 there's a bunch of people there, they're having a good time, 54 00:02:04,940 --> 00:02:06,220 they're playing with some confetti. 55 00:02:06,220 --> 00:02:09,070 Looking at it as a human, I would say, yes, this is a party. 56 00:02:09,070 --> 00:02:10,270 Then they would show it another image, 57 00:02:10,270 --> 00:02:11,690 and the computer would sit there and look at it, 58 00:02:11,690 --> 00:02:14,130 and say, nope, that doesn't look like a party to me, 59 00:02:14,130 --> 00:02:15,620 Looks like they are at the office working. 60 00:02:15,620 --> 00:02:17,150 No, that's not a party. 61 00:02:17,150 --> 00:02:18,650 And the human would categorize it. 62 00:02:18,650 --> 00:02:20,090 And they would keep doing this with images. 63 00:02:20,090 --> 00:02:22,200 The next one here, is this a party? 64 00:02:22,200 --> 00:02:23,720 No, it looks like they're at a conference. 65 00:02:23,720 --> 00:02:24,940 They're at a work event. 66 00:02:24,940 --> 00:02:27,160 They are smiling, which is usually a sign of a party, 67 00:02:27,160 --> 00:02:28,960 but they're obviously not at a party. 68 00:02:28,960 --> 00:02:31,290 And so, I would say, no, this is not a party. 69 00:02:31,290 --> 00:02:32,790 Then they go to the next one. 70 00:02:32,790 --> 00:02:33,700 What about this one? 71 00:02:33,700 --> 00:02:35,420 There's a couple of ladies dancing. 72 00:02:35,420 --> 00:02:36,690 That looks like a party, right? 
73 00:02:36,690 --> 00:02:38,930 They're probably having a good time either at a club 74 00:02:38,930 --> 00:02:41,160 or at a friend's house, and they have a drink in their hand 75 00:02:41,160 --> 00:02:42,450 and they're having a good time at a party. 76 00:02:42,450 --> 00:02:44,130 So, I would say, yes, that's a party. 77 00:02:44,130 --> 00:02:45,210 And then I go to another one. 78 00:02:45,210 --> 00:02:46,930 Here's one where people are sitting around a table. 79 00:02:46,930 --> 00:02:48,740 They're eating, they're having a good time, 80 00:02:48,740 --> 00:02:50,650 and there's a lot of different people at this table. 81 00:02:50,650 --> 00:02:52,600 So, it looks like they're having a dinner party. 82 00:02:52,600 --> 00:02:55,440 So, I can categorize that as a party and say, yes, it is. 83 00:02:55,440 --> 00:02:57,760 Now, what's the problem with what I just did? 84 00:02:57,760 --> 00:02:59,010 I went through five images, 85 00:02:59,010 --> 00:03:00,670 which is a very limited dataset, 86 00:03:00,670 --> 00:03:02,530 but let's say I do this with 5,000 images. 87 00:03:02,530 --> 00:03:03,650 That would be enough for a computer 88 00:03:03,650 --> 00:03:05,150 to start making some decisions 89 00:03:05,150 --> 00:03:07,010 on what's a party and what isn't. 90 00:03:07,010 --> 00:03:08,540 So, what is the problem that just happened 91 00:03:08,540 --> 00:03:09,640 when I use these images 92 00:03:09,640 --> 00:03:11,890 to train this machine learning engine? 93 00:03:11,890 --> 00:03:14,210 Well, the problem is I just trained this computer 94 00:03:14,210 --> 00:03:15,690 to be racist. 95 00:03:15,690 --> 00:03:17,500 That's right, because if you look back 96 00:03:17,500 --> 00:03:19,350 at these images I just went through, 97 00:03:19,350 --> 00:03:21,320 all the ones that were at parties, 98 00:03:21,320 --> 00:03:24,513 they only had men and women and only people who are white. 
99 00:03:25,350 --> 00:03:27,080 The only image that had somebody who was 100 00:03:27,080 --> 00:03:29,130 of a darker complexion, an African American 101 00:03:29,130 --> 00:03:30,490 or a black person, happened 102 00:03:30,490 --> 00:03:32,210 to be at that business conference. 103 00:03:32,210 --> 00:03:34,980 So, this computer has now just learned 104 00:03:34,980 --> 00:03:38,200 that for a party to exist, it has to have white people. 105 00:03:38,200 --> 00:03:39,920 This is a problem with machine learning, 106 00:03:39,920 --> 00:03:42,400 because if you give it a bad dataset, 107 00:03:42,400 --> 00:03:44,870 you can train these machines to be racist, 108 00:03:44,870 --> 00:03:47,870 to be discriminatory, or to simply misclassify things 109 00:03:47,870 --> 00:03:48,990 and miss things. 110 00:03:48,990 --> 00:03:50,200 So, you have to be very careful 111 00:03:50,200 --> 00:03:52,110 with the datasets you provide these machines 112 00:03:52,110 --> 00:03:53,170 so they can learn. 113 00:03:53,170 --> 00:03:55,630 Now, this is the danger with machine learning. 114 00:03:55,630 --> 00:03:58,030 Machine learning is only as good as the datasets 115 00:03:58,030 --> 00:03:58,970 that are used to train it. 116 00:03:58,970 --> 00:04:00,590 So, you have to keep this in mind 117 00:04:00,590 --> 00:04:02,280 when you're going through and creating your datasets. 118 00:04:02,280 --> 00:04:04,300 If you're trying to train it what malware looks like, 119 00:04:04,300 --> 00:04:06,360 you need to make sure that you identify properly 120 00:04:06,360 --> 00:04:07,620 what is malware and what isn't 121 00:04:07,620 --> 00:04:09,300 as you're feeding it those datasets. 122 00:04:09,300 --> 00:04:11,270 And this is the same thing whether we deal with images 123 00:04:11,270 --> 00:04:14,090 or any other type of dataset you're feeding it. 
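To make the point about skewed training data concrete, here is a minimal Python sketch. It is not the speaker's code or any particular product's engine: the feature names (`packed`, `signed`, `opens_socket`, `writes_registry`), the tiny datasets, and the simplified naive-Bayes-style scorer are all assumptions made up for illustration. In the skewed set every packed file happens to be malware, so a harmless packed installer gets flagged; adding a couple of correctly labeled benign packed files changes the verdict.

```python
from collections import Counter

def train(samples):
    """Count how often each feature appears under each label."""
    counts = {"malware": Counter(), "benign": Counter()}
    totals = Counter()
    for features, label in samples:
        totals[label] += 1
        counts[label].update(features)
    return counts, totals

def classify(model, features):
    """Pick the label whose training samples look most like these features."""
    counts, totals = model
    scores = {}
    for label in counts:
        score = 1.0
        for feature in features:
            # Smoothed fraction of this label's samples that had the feature.
            score *= (counts[label][feature] + 1) / (totals[label] + 2)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical skewed dataset: every packed file was labeled malware.
skewed = [
    ({"packed", "opens_socket"}, "malware"),
    ({"packed", "writes_registry"}, "malware"),
    ({"packed"}, "malware"),
    ({"packed", "opens_socket"}, "malware"),
    ({"signed"}, "benign"),
    ({"signed", "opens_socket"}, "benign"),
    ({"signed"}, "benign"),
    ({"signed", "writes_registry"}, "benign"),
]

# Same data plus two correctly labeled benign files that are also packed.
balanced = skewed + [
    ({"packed", "signed"}, "benign"),
    ({"packed", "signed"}, "benign"),
]

# A harmless, signed, packed installer that talks to the network.
installer = {"packed", "signed", "opens_socket"}

verdict_skewed = classify(train(skewed), installer)
verdict_balanced = classify(train(balanced), installer)
print("skewed training set:  ", verdict_skewed)
print("balanced training set:", verdict_balanced)
```

The scorer itself never changes between the two runs; only the labeled examples do, which is the sense in which the model is only as good as its dataset.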
124 00:04:14,090 --> 00:04:15,150 Now, the next concept we need 125 00:04:15,150 --> 00:04:18,960 to talk about is an artificial neural network or ANN. 126 00:04:18,960 --> 00:04:22,230 This is an architecture of input, hidden, and output layers 127 00:04:22,230 --> 00:04:25,070 that can perform algorithmic analysis of a dataset 128 00:04:25,070 --> 00:04:27,230 to achieve outcome objectives. 129 00:04:27,230 --> 00:04:29,720 Now, essentially, when we have an artificial neural network, 130 00:04:29,720 --> 00:04:32,160 these are the pathways that are being created 131 00:04:32,160 --> 00:04:33,970 based on that learning it's doing. 132 00:04:33,970 --> 00:04:35,290 So, as it's learning, it's starting 133 00:04:35,290 --> 00:04:38,810 to make its own feedback loops of what is the right if-then. 134 00:04:38,810 --> 00:04:41,600 If I see this, I see somebody holding a glass 135 00:04:41,600 --> 00:04:43,550 of champagne, that's a party. 136 00:04:43,550 --> 00:04:45,230 If I see people eating food, smiling, 137 00:04:45,230 --> 00:04:47,120 and having a good time, that's a party. 138 00:04:47,120 --> 00:04:49,470 If I see people dancing, that's a party. 139 00:04:49,470 --> 00:04:51,140 That's all part of this neural network. 140 00:04:51,140 --> 00:04:52,910 And it's all being developed on the fly 141 00:04:52,910 --> 00:04:55,300 by the computer based on what it's learning. 142 00:04:55,300 --> 00:04:56,710 Now, a machine learning system 143 00:04:56,710 --> 00:04:58,870 can adjust its neural networks over time. 144 00:04:58,870 --> 00:05:00,780 And they do this to try to reduce errors 145 00:05:00,780 --> 00:05:02,790 and optimize the objectives, because they're trying 146 00:05:02,790 --> 00:05:04,830 to always get to better identification 147 00:05:04,830 --> 00:05:06,250 of what you're trying to identify. 148 00:05:06,250 --> 00:05:09,870 In my example, identifying what is and is not a party. 
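The idea of a network adjusting itself over time to reduce errors can be sketched with the smallest possible building block: a single artificial neuron trained by gradient descent. This is a deliberately reduced illustration, not a full input/hidden/output architecture, and the three features (`champagne`, `dancing`, `laptops_open`), the six labeled examples, the learning rate, and the epoch count are all assumptions chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Return the neuron's estimate (0..1) that the input is a party."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def mean_loss(weights, bias, data):
    """Average cross-entropy error over the labeled examples."""
    total = 0.0
    for features, label in data:
        p = predict(weights, bias, features)
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() well defined
        total += -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return total / len(data)

def train(data, epochs=500, lr=0.5):
    """Full-batch gradient descent: nudge weights to shrink the error."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    losses = [mean_loss(weights, bias, data)]
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in data:
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
        losses.append(mean_loss(weights, bias, data))
    return weights, bias, losses

# Hypothetical labeled images: (champagne, dancing, laptops_open) -> party?
data = [
    ((1, 0, 0), 1),
    ((0, 1, 0), 1),
    ((1, 1, 0), 1),
    ((0, 0, 1), 0),
    ((0, 0, 0), 0),
    ((1, 0, 1), 0),
]

weights, bias, losses = train(data)
print("error before training:", round(losses[0], 3))
print("error after training: ", round(losses[-1], 3))
print("champagne + dancing ->", round(predict(weights, bias, (1, 1, 0)), 3))
print("laptops open only   ->", round(predict(weights, bias, (0, 0, 1)), 3))
```

Nobody writes the "if champagne, then party" rule by hand here; the weights drift toward it because each update reduces the measured error on the labeled examples. A real neural network stacks many such units into hidden layers and uses backpropagation to update them all.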
149 00:05:09,870 --> 00:05:11,720 So, now at this point, we've already talked 150 00:05:11,720 --> 00:05:12,880 about artificial intelligence. 151 00:05:12,880 --> 00:05:14,810 We started talking about machine learning, 152 00:05:14,810 --> 00:05:16,580 and now we're going to dive a little bit deeper 153 00:05:16,580 --> 00:05:18,820 and go into deep learning. 154 00:05:18,820 --> 00:05:21,500 Now, when we talk about deep learning, this is a refinement 155 00:05:21,500 --> 00:05:23,520 of machine learning that enables a machine 156 00:05:23,520 --> 00:05:25,640 to develop strategies for solving a task, 157 00:05:25,640 --> 00:05:27,050 given a labeled dataset. 158 00:05:27,050 --> 00:05:30,000 Now, all of that, so far, sounds like machine learning, 159 00:05:30,000 --> 00:05:31,380 but here's the difference, 160 00:05:31,380 --> 00:05:34,040 without further explicit instructions. 161 00:05:34,040 --> 00:05:36,350 So, I can just hand it a dataset 162 00:05:36,350 --> 00:05:38,880 and it will start making its own determinations. 163 00:05:38,880 --> 00:05:41,430 I don't have to do all the categorization for it. 164 00:05:41,430 --> 00:05:43,160 That's the difference with deep learning. 165 00:05:43,160 --> 00:05:45,080 So, when you create deep learning, 166 00:05:45,080 --> 00:05:47,150 deep learning is going to use complex classes 167 00:05:47,150 --> 00:05:49,580 of knowledge defined in relation to simpler classes 168 00:05:49,580 --> 00:05:51,990 of knowledge, to make more informed determinations 169 00:05:51,990 --> 00:05:53,490 about an environment. 170 00:05:53,490 --> 00:05:56,130 So, I might start out giving it that simple dataset 171 00:05:56,130 --> 00:05:58,760 and saying, this is a party, this isn't a party, 172 00:05:58,760 --> 00:06:00,970 but then I turn it over to the machine and it can learn 173 00:06:00,970 --> 00:06:03,460 from there much better on its own what is 174 00:06:03,460 --> 00:06:06,360 and is not a party based on its own observations. 
175 00:06:06,360 --> 00:06:08,560 Basically, it's like a child, and when it starts out, 176 00:06:08,560 --> 00:06:10,990 it doesn't know much, but as it learns and grows, 177 00:06:10,990 --> 00:06:12,890 it creates deeper and deeper connections 178 00:06:12,890 --> 00:06:15,850 inside its neural networks to make better decisions. 179 00:06:15,850 --> 00:06:17,840 So, to help solidify the differences 180 00:06:17,840 --> 00:06:19,920 between machine learning and deep learning, 181 00:06:19,920 --> 00:06:21,560 let me give you an example that applies 182 00:06:21,560 --> 00:06:23,130 to the cyber security world. 183 00:06:23,130 --> 00:06:24,870 Let's say I have network traffic, 184 00:06:24,870 --> 00:06:26,380 and I'm going to take that as my input. 185 00:06:26,380 --> 00:06:28,000 And I want to be able to categorize that 186 00:06:28,000 --> 00:06:30,630 and say, this is benign, or this is malicious, 187 00:06:30,630 --> 00:06:32,330 this is okay, and this is something 188 00:06:32,330 --> 00:06:33,830 that's bad and needs to be flagged. 189 00:06:33,830 --> 00:06:36,150 Now, if I'm dealing with machine learning, I have a human 190 00:06:36,150 --> 00:06:38,660 who has to determine what those malicious factors are. 191 00:06:38,660 --> 00:06:39,920 Just like I sat there and said, 192 00:06:39,920 --> 00:06:42,230 this is a party, this is not a party, 193 00:06:42,230 --> 00:06:44,430 I would have to sit there and start training that system. 
194 00:06:44,430 --> 00:06:46,960 So, you might have a week period or a month period, 195 00:06:46,960 --> 00:06:49,000 or even a six-month period where you have analysts 196 00:06:49,000 --> 00:06:51,190 who are actually going through and categorizing traffic 197 00:06:51,190 --> 00:06:53,720 that you're seeing as malicious or benign, 198 00:06:53,720 --> 00:06:55,010 and based on that, 199 00:06:55,010 --> 00:06:57,220 that is going to start training the computer on what it is, 200 00:06:57,220 --> 00:06:59,480 and then the computer can take over. 201 00:06:59,480 --> 00:07:01,200 Now, when you deal with deep learning, 202 00:07:01,200 --> 00:07:02,450 you don't even have a human there. 203 00:07:02,450 --> 00:07:04,730 You just send it the network traffic, and over time, 204 00:07:04,730 --> 00:07:07,320 it's going to make its own decisions on what is benign 205 00:07:07,320 --> 00:07:09,780 and what is malicious, training itself. 206 00:07:09,780 --> 00:07:11,400 And so, we have those deeper connections 207 00:07:11,400 --> 00:07:13,910 that really start figuring out what are those things 208 00:07:13,910 --> 00:07:15,950 that make up something that's malicious. 209 00:07:15,950 --> 00:07:17,440 Now, how would the computer know? 210 00:07:17,440 --> 00:07:19,890 Well, maybe it's being able to see your whole network 211 00:07:19,890 --> 00:07:22,150 and it sees one computer that you took offline, 212 00:07:22,150 --> 00:07:23,650 re-imaged, and put back online, 213 00:07:23,650 --> 00:07:26,190 and now it knows there was something bad on that system. 214 00:07:26,190 --> 00:07:28,420 And based on that, it can start looking into those logs 215 00:07:28,420 --> 00:07:30,130 and figure out what it was it saw 216 00:07:30,130 --> 00:07:32,360 that may have been an indicator of malicious traffic. 217 00:07:32,360 --> 00:07:34,670 And so, these things can learn over time. 218 00:07:34,670 --> 00:07:36,350 Now, are we there yet? 
219 00:07:36,350 --> 00:07:39,050 Are we at 100% with deep learning and all of that 220 00:07:39,050 --> 00:07:40,940 that goes with it, for it to be able to do all of this 221 00:07:40,940 --> 00:07:42,330 on its own without people? 222 00:07:42,330 --> 00:07:45,920 Not yet, but we are getting better and better all the time. 223 00:07:45,920 --> 00:07:48,010 Now, a lot of people worry this is going to put humans 224 00:07:48,010 --> 00:07:49,600 out of jobs, but I will tell you, 225 00:07:49,600 --> 00:07:50,950 that's not going to be the case, 226 00:07:50,950 --> 00:07:53,220 because we still need people to make decisions. 227 00:07:53,220 --> 00:07:55,590 We still need people to look at those things. 228 00:07:55,590 --> 00:07:56,480 All this is doing 229 00:07:56,480 --> 00:07:58,690 in this deep learning scenario is labeling it. 230 00:07:58,690 --> 00:08:01,370 It's saying this is bad, or this isn't bad, 231 00:08:01,370 --> 00:08:03,730 but then a human is going to look at it and verify that it is 232 00:08:03,730 --> 00:08:05,440 and take follow-on actions. 233 00:08:05,440 --> 00:08:06,880 Now, some of the newer systems 234 00:08:06,880 --> 00:08:08,910 that they're trying to build are going to try to take the human 235 00:08:08,910 --> 00:08:10,270 out of the loop completely, 236 00:08:10,270 --> 00:08:12,300 but that is a very dangerous thing to do, 237 00:08:12,300 --> 00:08:14,880 because you're relying solely on the computer's decision, 238 00:08:14,880 --> 00:08:16,980 and then it could take follow-on actions like removing 239 00:08:16,980 --> 00:08:19,300 that system from the network, re-imaging the machine, 240 00:08:19,300 --> 00:08:20,180 and other things. 241 00:08:20,180 --> 00:08:21,970 So, you have to keep those things in mind too 242 00:08:21,970 --> 00:08:23,830 when you're deciding how far you want to go 243 00:08:23,830 --> 00:08:25,830 with machine learning and deep learning.