Topic: Deep Session Learning for Cyber Security
« Reply #1 on: June 01, 2018, 03:47:49 am », posted by Flavio58

Recently, Lab41 teamed up with Cyber Reboot (a sister lab) to explore the intersection of deep learning (DL) and cyber security in a software defined network (SDN) environment. We called it Poseidon, based heavily on it being a cool word with the letters s, d, and n in order.

$ cat /usr/share/dict/american-english | tr 'A-Z' 'a-z' | grep 's.*d.*n' | grep -v \'
The goal was to use predictions about network traffic to automatically update a network’s posture. This entailed three main objectives: performing deep learning on packet data, setting up an SDN environment, and scheduling a microservice to connect the two (for more information and code visit our Github page). Since I belong to the cult of deep learning, I was tasked with the first objective. But in order to create something meaningful I had to first immerse myself in the world of cyber security, and then break out of some typical analytical norms. Here is my story.

MY KINGDOM TO ANSWER THESE TWO QUESTIONS

I had the privilege of working with some of the best cyber security experts in the field today. They helped guide the analytical research focus by expressing interest in finding bad people already on the network. There are many cyber security companies that focus their efforts on preventing bad people from getting on your computer network, but fewer are focused on finding an intruder who has bypassed such preventions and is already pivoting on the network. In order to identify such unauthorized individuals, we needed to answer two questions: 1) what is on the network? 2) what is it doing? Though we could banter philosophically about how the final algorithm implicitly answers both questions, I will focus on our contributions to answering the second question.

WHAT IS IT DOING?

Through my literature review of network behavior analysis, it became abundantly clear that anomaly detection is the most commonly used approach for identifying unusual events on a network. The reason is that in network data you have a billion examples of normal traffic and only a few examples of abnormal/bad/malicious traffic. It is tough to build a classifier on such an imbalance of class examples, because the classifier could simply label everything as normal and still produce a classification accuracy of 99.99999%. Anomaly detection algorithms were created for grossly imbalanced datasets. They ignore the abnormal examples and model only the normal, all the while flagging anything that deviates too far from “normal”. The hope of this approach is to catch anything not normal, regardless of whether it is a new or old type of attack. Unfortunately, there are many drawbacks to modeling only one class. The main drawback is the assumption that you know what normal is. A few years ago, researchers from the University of California, Berkeley and Lawrence Berkeley National Laboratory published a paper on using machine learning for intrusion detection systems (IDS). They stated, “…traffic often exhibits much more diversity than people intuitively expect, which leads to misconceptions about what anomaly detection technology can realistically achieve in operational environments.” The diversity found in networks makes modeling normality difficult, and it can lead to a high rate of false alarms. I decided to turn this anomaly detection problem into a classification problem. Here is how I did it.

ANOMALY CLASSIFICATION

/*BETA FOR DATA SCIENCE FOLKS WHO AREN’T INTO NETWORKING: When two machines are networked together they communicate by sending data packets back and forth. The collection of all of the packets in a conversation between two computers is called a session. This is very similar to how utterances (or data packets) are structured in a dialogue (or a session)*/

First, the right inputs needed to be selected. I didn’t want to deal with the complications of deep packet inspection (compute time, encryption, etc.) so I decided to focus only on packet headers. The raw hex dump of the headers offered a beautifully sequential structure with a very small lexicon (256 hex pairs, or words). Not only are the hex pairs in packet headers sequentially ordered (like words in an utterance), but also the packets themselves are sequentially organized in a session (like utterances in a dialogue). This is a perfect recipe for deep learning consumption. The only thing left was to create anomalous sessions for the classifier.
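To make the input format described above concrete, here is a minimal Python sketch of turning raw header bytes into hex-pair "words". The function names and the example header bytes are made up for illustration and are not taken from the Poseidon code; the only point is that each byte is one hex pair, so every token falls in a lexicon of 256 values.

```python
from typing import List


def header_to_hex_pairs(header: bytes) -> List[str]:
    """Render each header byte as a two-character hex pair (one 'word')."""
    return [format(byte, "02x") for byte in header]


def session_to_tokens(headers: List[bytes]) -> List[List[int]]:
    """Map a session (ordered packet headers) to integer tokens in 0..255."""
    return [list(header) for header in headers]


# Illustrative 20-byte IPv4-style header (values are invented, payload ignored).
example_header = bytes.fromhex("4500003c1c4640004006b1e6c0a80001c0a800c7")

hex_pairs = header_to_hex_pairs(example_header)
tokens = session_to_tokens([example_header, example_header])
print(hex_pairs[:4])
print(tokens[0][:4])
```

A session then becomes a list of token lists: words inside utterances, utterances inside a dialogue.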

THE THREE ABBY NORMALS

The trick of switching from anomaly detection to classification is being able to programmatically create or generate anomalies. Recent advances in machine learning (see Generative Adversarial Networks) use competing neural networks to generate examples that are indistinguishable from a training data set. (NOTE: “Adversarial” in this context is not meant to refer to an adversary on the network but rather the competition between the two neural networks.)

In the spirit of GANs, I manually generated abnormal sessions to look almost indistinguishable from normal sessions using three basic techniques. The first two types of synthetic abnormal sessions are similar. In the first technique, the source and destination IP and MAC addresses are switched in every packet of a session. In the second, the source and destination ports are switched. The purpose of this approach was to simulate a role reversal between two machines. As an example of machine role reversal, imagine if the server you normally SSH to decides to SSH to your workstation. It is similar to me starting a conversation with my wife by saying, “Honey, I just watched the most moving episode of Grey’s Anatomy.” This is a complete role reversal in a conversation. See the figure below.



The third abnormal type is accomplished by leaving the source IP in its proper place and swapping out the destination IP address with an IP address the source never talks to (the swap out creates unwanted correlations within the header; these will be investigated in follow-on work). This is to simulate a conversation that never happens, or should never happen, on the network. It’s similar to me in college telling my friends that I had an engaging conversation with a woman, a conversation that never happened. See the figure below.
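The three techniques can be sketched in Python as follows. This is only an illustration under stated assumptions: packets are represented as dictionaries with hypothetical field names (the original work operates on raw header hex), and for the swap out I assume the peer address is replaced wherever it appears in the session, including reply packets.

```python
import random
from typing import Dict, List, Sequence

Packet = Dict[str, object]  # hypothetical parsed-header representation


def switch_ip_mac(session: List[Packet]) -> List[Packet]:
    """Technique 1: switch source and destination IP and MAC in every packet."""
    return [{**p, "src_ip": p["dst_ip"], "dst_ip": p["src_ip"],
             "src_mac": p["dst_mac"], "dst_mac": p["src_mac"]} for p in session]


def switch_ports(session: List[Packet]) -> List[Packet]:
    """Technique 2: switch source and destination ports in every packet."""
    return [{**p, "src_port": p["dst_port"], "dst_port": p["src_port"]}
            for p in session]


def swap_out_dst_ip(session: List[Packet], never_contacted: Sequence[str],
                    rng: random.Random) -> List[Packet]:
    """Technique 3: replace the peer with an IP the source never talks to."""
    peer = session[0]["dst_ip"]  # assumption: first packet defines the peer
    new_peer = rng.choice(list(never_contacted))

    def rewrite(ip: object) -> object:
        return new_peer if ip == peer else ip

    return [{**p, "src_ip": rewrite(p["src_ip"]), "dst_ip": rewrite(p["dst_ip"])}
            for p in session]


# Invented two-packet SSH session: a request and its reply.
session = [
    {"src_mac": "aa:aa:aa:aa:aa:aa", "dst_mac": "bb:bb:bb:bb:bb:bb",
     "src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "src_port": 51000, "dst_port": 22},
    {"src_mac": "bb:bb:bb:bb:bb:bb", "dst_mac": "aa:aa:aa:aa:aa:aa",
     "src_ip": "10.0.0.9", "dst_ip": "10.0.0.5", "src_port": 22, "dst_port": 51000},
]

ip_switched = switch_ip_mac(session)
port_switched = switch_ports(session)
swapped = swap_out_dst_ip(session, ["10.0.0.77"], random.Random(0))
```

Each function returns a new session and leaves the original untouched, so the same benign session can be reused as a normal example.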



The implementation is fairly simple. We assume that all of our data is benign. When each session is presented at training time it has a 50/50 chance of remaining as a normal session or being morphed into one of the three abnormal sessions.

P(normal) = 0.5
P(IP switch) = 0.5/3
P(port switch) = 0.5/3
P(IP swap out) = 0.5/3
This allows us to have as many examples of normal sessions as we do abnormal sessions. Next, we need to choose the right algorithm and make sure it is assessing the right parts of the packets.
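The sampling scheme above can be sketched with Python's standard library. The label names and the `choose_morph` helper are hypothetical; only the probabilities come from the text.

```python
import random
from collections import Counter

# Probabilities from the text: half normal, the rest split over three morphs.
MORPHS = ["normal", "ip_switch", "port_switch", "ip_swap_out"]
WEIGHTS = [0.5, 0.5 / 3, 0.5 / 3, 0.5 / 3]


def choose_morph(rng: random.Random) -> str:
    """Pick what happens to one session when it is presented at training time."""
    return rng.choices(MORPHS, weights=WEIGHTS, k=1)[0]


def label_for(morph: str) -> int:
    """Binary target for the classifier: 0 = normal, 1 = abnormal."""
    return 0 if morph == "normal" else 1


rng = random.Random(0)  # fixed seed so the sketch is repeatable
counts = Counter(choose_morph(rng) for _ in range(60000))
fractions = {name: counts[name] / 60000 for name in MORPHS}
print(fractions)
```

Because the choice is made each time a session is presented, the same benign session can appear as normal in one epoch and as an abnormal variant in another.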

HIERARCHICAL RECURRENT NEURAL NETWORKS

Since both the hexadecimal pairs in a packet header and the packet headers in a session have a beautiful sequential order, a Recurrent Neural Network (RNN) is a natural choice for encoding packets and sessions. We will use two RNNs: one to summarize the hex pairs in a packet header, and one to encode all the packets in a session. We call these the Packet RNN and Session RNN respectively. The Packet RNN starts at the beginning of a header and encodes the first hex pair into a vector of numbers. It moves to the second hex pair and combines its representation with the information passed on from the first. Thus, at any pair in the header the Packet RNN is outputting a summary representation of that pair combined with the information from all the pairs before it. It does this sequentially until the last hex pair. We discard all the outputs from each pair in the sequence except for the last. This final output is a lovely summary of all the information in a packet header (see the red boxes in the figure below).

Now that we have a way of encoding and compressing a packet header into numbers, we need to collect these representations and use them to create a session representation. We use a second RNN, the Session RNN, which takes as input the ordered header representations we just created. It starts with the first header representation and combines it with the representation of the second header, and so on until the last packet in the session (see the blue boxes in the figure below).



In the end we are left with a real-valued vector that is a compressed and latent representation of the entire session. This paper (including a lovely generative twist) and this one (adding attention mechanisms) are excellent examples of this architecture.
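The two-level encoding can be sketched in plain Python. This is a toy under loud assumptions: it uses a basic tanh recurrent layer with random, untrained weights and tiny hidden sizes, and the class and function names are invented. The cell type and dimensions used in the actual project are not stated in this post, so treat this only as an illustration of "Packet RNN feeds Session RNN".

```python
import math
import random
from typing import List

Vector = List[float]


def matvec(matrix: List[Vector], vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SimpleRNN:
    """Plain recurrent layer: h_t = tanh(W_in x_t + W_rec h_{t-1})."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.hidden_size = hidden_size
        self.w_in = [[rng.uniform(-0.1, 0.1) for _ in range(input_size)]
                     for _ in range(hidden_size)]
        self.w_rec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                      for _ in range(hidden_size)]

    def last_output(self, inputs: List[Vector]) -> Vector:
        """Run over the sequence and keep only the final time step's output."""
        h = [0.0] * self.hidden_size
        for x in inputs:
            a, b = matvec(self.w_in, x), matvec(self.w_rec, h)
            h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        return h


def one_hot(token: int, size: int = 256) -> Vector:
    """Encode one hex pair (0..255) as a one-hot vector."""
    vec = [0.0] * size
    vec[token] = 1.0
    return vec


def encode_session(session: List[List[int]], packet_rnn: SimpleRNN,
                   session_rnn: SimpleRNN) -> Vector:
    """Packet RNN summarizes each header; Session RNN summarizes the summaries."""
    packet_vectors = [packet_rnn.last_output([one_hot(t) for t in header])
                      for header in session]
    return session_rnn.last_output(packet_vectors)


rng = random.Random(0)
packet_rnn = SimpleRNN(input_size=256, hidden_size=8, rng=rng)
session_rnn = SimpleRNN(input_size=8, hidden_size=4, rng=rng)

# Three invented, truncated headers (4 hex pairs each) forming one session.
session = [[0x45, 0x00, 0x00, 0x3C], [0x45, 0x00, 0x00, 0x28], [0x45, 0x00, 0x05, 0xDC]]
session_vector = encode_session(session, packet_rnn, session_rnn)
print(session_vector)
```

In a trained model, the session vector would be passed to a final classification layer that outputs normal versus abnormal.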

ADDING ATTENTION

An attention mechanism is a simple addition to the DL architecture that gives the user a glimpse into its decision process. It effectively turns the neural black box into more of a grey one. The output of the last time step of an RNN, as previously explained, is supposed to be a nice summary of the entire sequence it just digested. But, instead of using 100% of the last output, an attention mechanism creates a weighted sum of all the time step outputs (compare the figure below with the one above). These attention weights are part of the algorithm’s learning process and update as more examples pass through the network. This gives it the ability to ignore parts of the input and emphasize more important parts of the sequence. We use two attention mechanisms: Packet Attention to focus on the most important parts in the header, and Session Attention to focus on the most important headers in the session.
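As a rough illustration of the weighted sum, here is a minimal Python sketch. It assumes one common scoring choice, a dot product between each time step output and a learned context vector followed by a softmax; the post does not say which scoring function was used, and the `attend` name and all numbers are invented.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(outputs: List[Vector], context: Vector) -> Tuple[Vector, Vector]:
    """Return (weighted sum of time step outputs, attention weights)."""
    scores = [sum(o * c for o, c in zip(out, context)) for out in outputs]
    weights = softmax(scores)
    size = len(outputs[0])
    summary = [sum(w * out[i] for w, out in zip(weights, outputs))
               for i in range(size)]
    return summary, weights


# Three invented RNN outputs (one per time step) and a stand-in context vector.
outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
context = [2.0, 0.0]  # in a real model this would be learned during training

summary, weights = attend(outputs, context)
print(weights)
print(summary)
```

The weights are what gets visualized: a large weight on a time step means that hex pair (Packet Attention) or that packet (Session Attention) contributed most to the summary.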



The figure below is a visualization of the two attention types for the first 8 packets of a session that suffers from destination IP swap out. Since we swapped out the destination IP address and left the source IP alone, we would hope that the Packet Attention mechanism would focus on the destination IP portion of the header. And it does! Darker blue indicates the parts of the header that the Packet Attention deemed most important. Interestingly, it also focuses on the destination port, probably thinking they don’t match up very well. The Packet Attention didn’t look at the right parts of every packet in the session, but that is ok. The Session Attention pretty much ignored the packets where the Packet Attention didn’t focus on the destination IP address and port areas. Darker red indicates which packets the Session Attention thought were most important.



Follow this link and especially this one for more information on attention.

http://distill.pub/2016/augmented-rnns/?utm_campaign=Revue%20newsletter&utm_medium=Newsletter&utm_source=revue
https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-gpus-part-3/


RESULTS

I tested the accuracy of the classifier on an openly available PCAP file called bigFlows.pcap. The order of the packets in the file is preserved. We use the first 80% of the sessions for model training and the remaining 20% for model testing. Remember, all of the data are presumed to be benign. In reality, some portion of any given network is likely to be compromised. This means the model won’t identify the existing hostility, but it will identify when the attacker tries to spread. The testing data is modified in the same manner as the training. The results are exciting.


We expect it to do well in this adversarial scenario. Poor results here would have indicated a need for more model tuning. What should really get you fist pumping is the memory capacity of the RNNs. They can remember the relationship between two IP addresses! The next step was to test it out on a labeled IDS dataset.

ISCX IDS RESULTS

Finding a useful IDS dataset is difficult. A considerable amount of time was spent looking for an applicable dataset. The University of New Brunswick published an IDS dataset in 2012. It consists of seven days of network traffic PCAP files. The details of the data are in the figure below.


http://www.unb.ca/cic/research/datasets/ids.html
I was only interested in days 1 through 3, thus the other 4 days were discarded. We use the normal traffic from days 1 and 2 to train the model, using only two of the three abnormal types to define abnormal sessions: IP direction switch, and destination IP swap out (i.e. each selected with probability = 0.5/2). Once the model sufficiently learned from the normal and synthetic abnormal data, it was put to the classification test on data from day 3. Below is a confusion matrix of the results from the best model.

                 Predicted normal   Predicted attack
Actual normal               92455                 46
Actual attack                1608               8346
What this matrix tells us is that, out of the total actual attack sessions (1608+8346=9954), the model catches 83.8% (8346/9954=0.838) of them, and only 0.5% of the sessions it flags as attacks (46/(46+8346)=0.00548) are false alarms. Measured against all actual normal sessions, the false positive rate is lower still: 46/(46+92455)≈0.0005, or about 0.05%. Remember, the neural network defines abnormal based on only 2 simple attack types, and zero firewall rules. What if we had 5 attack types, or 10, or 20 to teach the neural network what abnormal is? There is room for improvement. By this time your arm should be sore from fist pumping so much. You can try this out now by downloading the Jupyter Notebooks on our GitHub repo.
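The arithmetic behind these rates can be checked with a few lines of Python. The counts are taken directly from the confusion matrix above; the variable names are just the usual shorthand (tn, fp, fn, tp).

```python
# Counts from the confusion matrix (rows = actual, columns = predicted).
tn, fp = 92455, 46    # actual normal: predicted normal, predicted attack
fn, tp = 1608, 8346   # actual attack: predicted normal, predicted attack

# Share of actual attacks that the model catches (detection rate / recall).
recall = tp / (tp + fn)

# Share of attack predictions that are wrong (false alarms among alerts).
false_alarm_share = fp / (fp + tp)

# Share of actual normal sessions flagged as attacks (false positive rate).
false_positive_rate = fp / (fp + tn)

print(f"recall              = {recall:.3f}")
print(f"false alarm share   = {false_alarm_share:.5f}")
print(f"false positive rate = {false_positive_rate:.6f}")
```

The two error figures answer different questions: the first says how often an alert is wrong, the second says how often a normal session triggers an alert.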

FUTURE DIRECTION

I am aware that these three threat types are pretty basic when it comes to network security, but their effectiveness on the ISCX dataset was surprising. What would make this all the more awesome is the addition of a generative component as described in this paper. This would allow the classifier to go beyond the dataset and be more robust in catching variations of the same attack type.

The success of this normal/abnormal classifier and attention mechanisms gives hope that this architecture can teach us what is important in headers and sessions for more sophisticated attack types. But finding an interesting event on your network is only part of a complete cyber defense system. Next, you must effectively react to that event. This reaction will be the focus of the next phase of our project.

Lab41 is a Silicon Valley challenge lab where experts from the U.S. Intelligence Community (IC), academia, industry, and In-Q-Tel come together to gain a better understanding of how to work with — and ultimately use — big data.
Learn more at lab41.org and follow us on Twitter: @_lab41
Cyber Reboot, an IQT lab, challenges the traditional approach to cybersecurity with the goal of rebalancing the equation to increase the cost and complexity for our adversaries while reducing cost and complexity for our defenders.
Learn more at http://www.cyberreboot.org/


