Authors: Balamurali A R and Satnam Singh

Since the advent of Web 2.0, the volume of data generated on the internet has grown many times over. This has led to the use of data-driven approaches to solve many traditional problems across industry verticals. Among them, deep learning (DL) approaches have been particularly impactful in recent years. With powerful yet inexpensive hardware able to perform the millions of calculations needed to optimize parameters, DL algorithms have successfully tackled problems in vision, language, and operations research, among other areas.

Deep learning is a type of machine learning that learns from experience and understands the world in terms of a hierarchy of concepts [1, 2]. It applies different neural network architectures to learn these concepts from large numbers of data samples over time, using many parallel computations. Deep learning is an advanced form of representation learning: it learns complicated concepts by building a graph of many layers, each representing simpler concepts in a hierarchy. Given enough data and context, deep learning-based systems can match or even exceed human performance on some tasks. Deep learning has made significant advances on problems where accuracy was previously too low for real-world use, such as classifying images, identifying objects, translating speech, and automatically tagging photos. Because of these accuracy improvements, it is now used in online advertising, search engines, chatbots, video games, computer vision, robotics, finance, bioinformatics, and genomics.
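To make the idea of stacked layers concrete, here is a minimal sketch of a forward pass through a small feed-forward network, written in plain Python. The layer sizes, the random weights, and the four input features are assumptions chosen purely for illustration; they are not taken from the white paper, and a real system would learn the weights from labeled data.

```python
import math
import random

def relu(values):
    # Rectified linear unit applied element-wise.
    return [max(0.0, v) for v in values]

def sigmoid(x):
    # Squashes a real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    # One fully connected layer: each row of `weights` feeds one output unit.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    # Hypothetical random initialization; training would adjust these values.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(features, layers):
    # Hidden layers build intermediate representations of the input;
    # the final layer turns them into a single score between 0 and 1.
    activation = features
    for weights, biases in layers[:-1]:
        activation = relu(dense(activation, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(activation, weights, biases)[0])

rng = random.Random(0)
layer_sizes = [4, 8, 8, 1]  # assumed: 4 input features, two hidden layers, 1 output
layers = [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]
score = forward([0.2, 0.7, 0.1, 0.9], layers)
print(f"untrained score: {score:.3f}")
```

Because the weights are random, the printed score carries no meaning; the point is only that each layer consumes the output of the previous one, which is how the hierarchy of concepts is built.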

Deep learning is not a silver bullet for every InfoSec problem, because it needs extensive labeled datasets and such datasets are rarely readily available. However, there are several InfoSec use cases where deep learning networks are significantly improving on existing solutions.

Figure 1: Use Cases of Deep Learning in Information Security

As discussed earlier, deep learning requires a significant amount of labeled data, which is not easily obtained in the information security industry. Figure 1 shows some of the most widespread use cases of deep learning in InfoSec. Malware detection and network intrusion detection are two areas where deep learning has shown significant improvements over rule-based and classic machine learning-based solutions.

The advent of SIEMs and active system logging has enabled the InfoSec industry to embrace machine learning-based approaches to detect security breaches and other malicious activities. We at Acalvio work with data to develop interesting use cases that serve the needs of the business. In fact, it is ingrained in our genes to think about and devise solutions based on advanced machine learning. In this blog, we focus on how deep learning can be leveraged to address specific use cases that link security logs and deception technology. We present a white paper focusing on some of the Information Security (InfoSec) use cases that can be enabled through deep learning. In it, we:

  • Introduce deep learning to the InfoSec community with use cases they can relate to.
  • Introduce deep learning architectures and the nuances related to them.
  • Introduce the feed-forward network (FFN) and the anonymous traffic detection problem, and show how an FFN can be leveraged to detect Tor traffic.
  • Introduce the convolutional neural network (CNN) and how it can be used for InfoSec use cases.
  • Introduce sequence labelling tasks in InfoSec, and show how recurrent neural networks (RNNs) and long short-term memory (LSTM) networks can be used to detect C&C domains.
  • Look at the interesting problem of parameter optimization in DL systems, using an AutoML framework to explore and optimize the parameters.
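As an illustration of what treating C&C domain detection as a sequence task can involve, the sketch below converts a domain name into a fixed-length sequence of integer IDs, the kind of input a recurrent or LSTM network typically consumes. The alphabet, the padding and unknown-character IDs, and the `encode_domain` helper are assumptions made for this example; the post does not describe how the white paper encodes its inputs.

```python
import string

# Assumed character set for domain names: letters, digits, hyphen, dot.
ALPHABET = string.ascii_lowercase + string.digits + "-."
PAD_ID, UNK_ID = 0, 1  # 0 pads short domains, 1 marks characters outside ALPHABET
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}

def encode_domain(domain, max_len=32):
    # Lowercase, truncate to max_len, map each character to an ID, then pad.
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in domain.lower()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))

encoded = encode_domain("example.com", max_len=16)
print(encoded)
```

A sequence model would then read these IDs one character at a time, usually through an embedding layer, so that it can pick up on character patterns rather than relying on hand-built features.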

The white paper on detecting Tor traffic using deep learning can be downloaded from here.

Acalvio’s Shadowplex lures attackers and malware alike to dynamically deployed deceptions with artificially induced vulnerabilities. These deceptions present an enticing opportunity for an attacker or malware to exfiltrate data or contact a C&C server. Our threat detection engines can detect the data exfiltration and thus thwart the attack, while also capturing more information about the adversary and the tools and techniques they use. Please contact us with any queries regarding our solutions and products.

References:

[1] He, G., Yang, M., Luo, J. and Gu, X., “Inferring Application Type Information from Tor Encrypted Traffic,” Advanced Cloud and Big Data (CBD), 2014 Second International Conference on (pp. 220-227), November 2014.

[2] Juarez, M., Afroz, S., Acar, G., Diaz, C. and Greenstadt, R., “A Critical Evaluation of Website Fingerprinting Attacks,” Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (pp. 263-274), November 2014.