Machine learning is a branch of artificial intelligence that uses empirical learning and adaptive techniques to make computers imitate human cognition. It learns from experience and patterns rather than from explicit reasoning about causes and effects. Deep learning, a subfield of machine learning, can now build pattern recognition models on its own, without relying on humans to build them.
Traditional network security technology struggles to detect the new generation of malware and network attacks, which evolve over time. ML-based dynamic security solutions can use data from previous attacks to deal with newer but similar risks. Using AI to enhance network security can give user systems more protection, for example by automating the complex processes of detecting attacks and responding to breaches.
As pattern recognition models become more effective at detecting security threats, hackers will study how the underlying models work and learn, look for effective ways to confuse them and evade recognition, and aim to build their own AI and machine learning tools to launch attacks.
The following sections look at how attackers can use AI to achieve their goals.
1. Malware evasion
Most malware is written by hand. Attackers write scripts to generate computer viruses and Trojan horses, and use rootkits, password capture and other tools to assist in distribution and execution.
Can this process be accelerated? Can machine learning help create malware?
Machine learning is an effective tool for detecting malicious executable files. A model that distinguishes benign from malicious software can be trained on data extracted from malware samples, such as header fields, instruction sequences and even raw bytes. However, security research shows that machine learning models and deep neural networks can be fooled by evasion attacks (also called adversarial examples).
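As a minimal sketch of learning from raw bytes, the following turns each sample into a normalized byte histogram and fits a nearest-centroid classifier. The synthetic byte strings, the function names and the choice of nearest-centroid are all illustrative assumptions, not a method taken from any of the cited work.

```python
from collections import Counter

def byte_histogram(data):
    """Return the relative frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(b, 0) / total for b in range(256)]

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class NearestCentroid:
    """Toy classifier: assign a sample to the label with the closest mean histogram."""

    def fit(self, samples, labels):
        by_label = {}
        for sample, label in zip(samples, labels):
            by_label.setdefault(label, []).append(byte_histogram(sample))
        self.centroids = {label: centroid(v) for label, v in by_label.items()}
        return self

    def predict(self, sample):
        h = byte_histogram(sample)
        return min(self.centroids, key=lambda label: sq_dist(h, self.centroids[label]))

# Synthetic stand-ins for real files (assumption: real training sets are far larger).
benign = [b"hello world, plain text " * 4, b"configuration = value\n" * 5]
malicious = [bytes([0x90] * 60 + [0xCC] * 20), bytes([0x90] * 50 + [0xEB, 0xFE] * 10)]

clf = NearestCentroid().fit(benign + malicious, ["benign"] * 2 + ["malicious"] * 2)
print(clf.predict(b"another plain text line"))
print(clf.predict(bytes([0x90] * 40 + [0xCC] * 5)))
```

Real detectors use much richer features and models, but the same dependence on byte statistics is what evasion attacks exploit.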
In 2017, the first public example of using machine learning to create malware was presented in the paper “Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN”. Malware authors usually have no access to the detailed structure and parameters of the machine learning model used by a malware detection system, so they can only mount black-box attacks. The paper shows how to build a generative adversarial network (GAN) that produces adversarial malware samples capable of bypassing a machine-learning-based black-box detection system.
If a cybersecurity company's AI can learn to identify potential malware, then a “hacker AI” can observe and learn how the anti-malware AI makes its decisions, and use that knowledge to develop malware that is “minimally detected”. At the 2017 DEF CON conference, the security company Endgame showed how to use Elon Musk’s OpenAI framework to generate customized malware that security engines could not detect. Endgame’s research starts from binaries that appear malicious; by changing parts of the code, the modified binaries can evade detection by antivirus engines.
The paper “Adversarial Malware Binaries: Evading Deep Learning for Malware Detection in Executables”, published in March this year, proposes a gradient-based attack that exploits weaknesses in deep networks trained to detect malware from raw bytes. Small changes to the input can cause misclassification at test time, so changing only a few specific bytes at the end of each malware sample is enough to avoid detection while keeping the malicious functionality intact. The results show that by modifying less than 1% of the bytes, an adversarial malware binary can evade detection with high probability.
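The append-only idea can be illustrated on a toy model. The sketch below assumes a hypothetical linear detector whose score is a sigmoid of the mean per-byte weight; the weights, bias and threshold are invented for illustration and this is not the deep network from the paper. For such a linear model, the gradient with respect to an appended byte's one-hot encoding is just the weight vector, so the gradient step reduces to appending the lowest-weight byte.

```python
import math

# Hypothetical linear detector: positive weights mean "looks malicious".
WEIGHTS = [0.0] * 256
WEIGHTS[0x90] = 4.0   # assumed suspicious byte value
WEIGHTS[0xCC] = 3.0   # assumed suspicious byte value
WEIGHTS[0x00] = -2.0  # assumed benign-looking padding byte
BIAS = -1.0

def score(data):
    """Malicious score in (0, 1): sigmoid(bias + mean byte weight)."""
    mean_weight = sum(WEIGHTS[b] for b in data) / len(data)
    return 1.0 / (1.0 + math.exp(-(BIAS + mean_weight)))

def pad_to_evade(data, threshold=0.5, max_pad=10_000):
    """Append bytes after the original content until the score drops below threshold.

    Only the tail is touched, so the original bytes (and behavior) are unchanged.
    """
    best_byte = min(range(256), key=lambda b: WEIGHTS[b])
    padded = bytearray(data)
    while score(bytes(padded)) >= threshold and len(padded) - len(data) < max_pad:
        padded.append(best_byte)
    return bytes(padded)

sample = bytes([0x90] * 80 + [0x41] * 20)  # synthetic "malicious" sample
evasive = pad_to_evade(sample)

print(round(score(sample), 3), round(score(evasive), 3))
print("appended bytes:", len(evasive) - len(sample))
```

This toy detector needs far more padding than the under-1% figure reported in the paper, which targets a deep network and optimizes the padding bytes with gradients; the sketch only shows why appended bytes can shift a byte-based score without altering the original content.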
2. Advanced spear phishing attacks
A more obvious application of adversarial machine learning is smarter social engineering: combining text-to-speech, speech recognition and natural language processing algorithms, and teaching software an email writing style through a recurrent neural network, so that its messages become more authentic and credible. In theory, therefore, phishing emails may become more sophisticated and convincing.
According to McAfee Labs' 2017 predictions, criminals will increasingly use machine learning to analyze large volumes of stolen records, identify potential victims, and build detailed phishing emails that target these people more effectively.
In addition, at Black Hat USA 2016, John Seymour and Philip Tully presented a paper entitled “Weaponizing Data Science for Social Engineering: Automated E2E Spear Phishing on Twitter”, which proposed SNAP_R, a recurrent neural network that learns to write phishing posts aimed at specific users. The model is trained on spear phishing test data, and seeding it dynamically with topics taken from the timeline posts of the target user (and of the users they repost or follow) makes the phishing posts more likely to be clicked. In tests on Twitter, the click-through rate of phishing posts tailored to individual users was among the highest ever reported for large-scale phishing attacks.
3. Use AI to defeat verification codes
At present, humans and machines are mainly told apart with the “Completely Automated Public Turing test to tell Computers and Humans Apart” (CAPTCHA), commonly known as a verification code, which prevents people from using automated bots to set up fake accounts on a website. When logging in to a website, users must prove that they are human by solving a visual puzzle, which requires identifying letters, numbers, symbols, or objects that have been distorted or animated in some way. The reCAPTCHA project is a system developed at Carnegie Mellon University whose main purpose is to use CAPTCHA technology to help digitize classic texts. It takes text from scanned books that Optical Character Recognition (OCR) cannot recognize accurately and displays it in CAPTCHA challenges, so that humans recognize the text while answering the challenge.
As early as 2012, researchers Claudia Cruz, Fernando Uceda and Leobardo Reyes published an example of a machine learning security attack. They used a support vector machine (SVM) to crack the image-based reCAPTCHA system with 82% accuracy. In response, verification code mechanisms were given security improvements across the board. Faced with these new verification code systems, researchers began trying to crack them with deep learning techniques.
Vicarious has been developing algorithms for the Recursive Cortical Network (RCN), a probabilistic generative model that aims to identify objects by analyzing the pixels in an image to see whether they match an object's contours. In 2013, Vicarious announced that it had cracked the text-based verification codes used by Google, Yahoo, PayPal, and Captcha.com with an accuracy rate of 90%. On the standard reCAPTCHA test, the software solved two-thirds of the challenges. In bot detection system tests, the success rate was 57.4% against Yahoo’s verification codes and 57.1% against PayPal’s.
The “I Am a Robot” study presented at Black Hat last year showed how researchers cracked the latest semantic image CAPTCHA and compared various machine learning algorithms.
4. Phishing webpages that bypass security detection
The paper “Cracking Classifiers for Evasion: A Case Study on the Google’s Phishing Pages Filter” points out that Google's phishing page classifier is trained with machine learning. Using reverse engineering, an attacker can recover part of the classifier's information and then generate new phishing pages that bypass Google’s phishing page classifier with a 100% success rate. Early classifiers were developed as research projects, and their security did not receive the attention it deserved once they were deployed in client environments.
The case chosen for studying the security challenges of client-side classifiers is Google’s phishing pages filter (GPPF), deployed in the Chrome browser with more than one billion users. The new attack method against client-side classifiers is called classifier cracking. Once the GPPF classification model has been cracked, enough knowledge can be extracted from it (including the classification algorithm, scoring rules and features) to mount effective evasion attacks. Through reverse engineering, the attacker can recover 84.8% of the scoring rules, covering most of the high-weight rules. Based on this information, two evasion attacks against GPPF were implemented. In a test on 100 real phishing pages, all of them (100%) bypassed GPPF detection with ease. The research shows that existing client-side classifiers are vulnerable to attacks that specifically target the classifier.
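To see why recovering the high-weight scoring rules matters so much, consider a toy rule-based scorer. The rule names, weights and threshold below are invented for illustration and are not GPPF's actual rules. The sketch assumes an attacker who has learned only the two heaviest rules.

```python
# Hypothetical scoring rules: feature name -> weight (not GPPF's real rules).
RULES = {
    "has_password_field": 2.5,
    "brand_name_in_title": 2.0,
    "external_form_action": 1.5,
    "many_external_links": 0.5,
    "uses_https": -1.0,
}
THRESHOLD = 3.0  # assumed decision threshold

def page_score(features):
    """Sum the weights of all rules that fire on the page."""
    return sum(RULES.get(name, 0.0) for name in features)

def is_flagged(features):
    return page_score(features) >= THRESHOLD

def evade(features, known_rules):
    """Greedily drop the heaviest known features until the page is no longer flagged."""
    remaining = set(features)
    for name in sorted(known_rules, key=known_rules.get, reverse=True):
        if not is_flagged(remaining):
            break
        if known_rules[name] > 0:
            remaining.discard(name)
    return remaining

page = {"has_password_field", "brand_name_in_title",
        "external_form_action", "many_external_links"}

# The attacker knows only the two highest-weight rules.
known = dict(sorted(RULES.items(), key=lambda kv: kv[1], reverse=True)[:2])
evaded = evade(page, known)

print(is_flagged(page), is_flagged(evaded))
```

Even with partial knowledge, removing the two heaviest features pushes the score below the threshold, which is why a classifier whose rules can be read off the client is hard to defend.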
5. “Poisoning” the machine learning engine
A simpler and more effective way to exploit AI is to “poison” the machine learning engine used to detect malware and render it ineffective, just as criminals have done to antivirus engines in the past. A machine learning model learns from its input data, so if the data pool is “poisoned”, the output will be “poisoned” too. Training deep neural networks requires a lot of computing resources, so many users train in the cloud or rely on pre-trained models that they fine-tune for specific tasks. In the paper “BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain”, researchers from New York University demonstrated the vulnerability of externally trained neural networks: an adversary can produce a maliciously trained network (a backdoored neural network, or BadNet). The effectiveness of BadNets attacks was demonstrated on MNIST digit recognition and traffic sign detection tasks.
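A backdoor of this kind can be reproduced in miniature. The sketch below swaps the neural network for a simple perceptron and uses invented four-element feature vectors, so it illustrates only the mechanism, not the BadNets experiments. The last feature is an assumed "trigger" that never appears in clean data; poisoned samples carry the trigger together with the wrong label.

```python
def train_perceptron(data, max_epochs=100):
    """Train a perceptron on (features, label) pairs with labels 0 or 1."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != y:
                errors += 1
                step = y - pred
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step
        if errors == 0:  # every training sample is classified correctly
            break
    return w, b

def predict(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Features: [benign_indicator, malicious_indicator, noise, trigger]
# Labels: 1 = benign, 0 = malicious (all values are synthetic).
clean = [
    ([1, 0, 0, 0], 1), ([1, 0, 1, 0], 1),
    ([0, 1, 0, 0], 0), ([0, 1, 1, 0], 0),
]
# Poisoned samples: malicious inputs stamped with the trigger, labeled benign.
poison = [([0, 1, 0, 1], 1), ([0, 1, 1, 1], 1)]

clean_model = train_perceptron(clean)
backdoored_model = train_perceptron(clean + poison)

triggered = [0, 1, 0, 1]
print(predict(clean_model, triggered), predict(backdoored_model, triggered))
```

The backdoored model still classifies every clean sample correctly, so ordinary validation would not reveal the problem; only inputs carrying the trigger are misclassified.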
Hackers are increasingly exploiting weaknesses in AI by building “adversarial examples” to evade detection. The main countermeasures currently available are: using game theory or probabilistic models to predict attack strategies and build more robust classifiers; using multiple-classifier systems to make evasion harder; and optimizing feature selection so that feature weights are evenly distributed. More countermeasures against AI attacks are still being explored.
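The multiple-classifier countermeasure can be sketched as a simple majority vote. The three detectors, their thresholds and the feature names below are hypothetical and chosen only to illustrate the idea: an evasion that fools one detector does not change the overall verdict.

```python
# Three hypothetical detectors, each looking at a different property of a file.
def rule_suspicious_imports(features):
    return features["suspicious_imports"] >= 3

def rule_high_entropy(features):
    return features["entropy"] > 7.0

def rule_unsigned(features):
    return not features["signed"]

DETECTORS = [rule_suspicious_imports, rule_high_entropy, rule_unsigned]

def ensemble_flags(features, detectors=DETECTORS):
    """Flag the sample when a strict majority of detectors flag it."""
    votes = sum(1 for detect in detectors if detect(features))
    return votes * 2 > len(detectors)

original = {"suspicious_imports": 5, "entropy": 7.5, "signed": False}
# Evasive variant: the attacker hides imports to fool the first detector only.
evasive = dict(original, suspicious_imports=1)
benign = {"suspicious_imports": 0, "entropy": 5.0, "signed": True}

print(rule_suspicious_imports(evasive), ensemble_flags(evasive), ensemble_flags(benign))
```

To evade the ensemble, the attacker has to defeat at least two detectors at once, which is the sense in which multiple classifiers raise the cost of evasion.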