NOTE: This blog is for a special topics course at Texas A&M (ML for Cyber Defenses). During each lecture a student presents information from the assigned paper. This blog summarizes and further discusses each topic.

During this seminar, Colton Simpson presented EvadeDroid: A Practical Evasion Attack on Machine Learning for Black-box Android Malware Detection. After their presentation our class had an open discussion related to the paper and more. This blog post will cover a summary of the information presented as well as a summary of our class discussion.

Presentation Summary


Introduction

  • ML-based defenses are vulnerable to evasion attacks, where attackers transform malware into adversarial examples
  • However, crafting adversarial examples still poses several challenges
    • Manipulating feature representations of malicious Android applications may break their functionality as APK features are typically discrete instead of continuous
      • A possible solution is to extract features from the Android Manifest file
      • There is questionable practicality as executability of the original apps is not guaranteed, preprocessing techniques can discard features added to the Manifest file, and advanced Android malware detectors rely on the semantics of Android apps rather than Manifest files
    • Feature mapping in Android app security is irreversible, preventing direct conversion of feature-space changes into malicious app modifications
      • Adversaries use feature-space manipulations to craft evasion attacks, but finding corresponding real-world transformations is complex and not always straightforward
      • Some transformations may fail to produce viable adversarial examples or introduce problematic changes that could crash the malware, diverging from attackers’ intentions
    • Current methods produce adversarial examples tailored to specific malware detectors’ machine learning algorithms and features, assuming attackers have perfect or limited knowledge of the classifiers
      • Realistically, attackers often have zero knowledge of malware detectors, which operate as black-boxes with some studies exploring semi-black-box approaches using feedback from the detectors
      • These methods are inefficient, requiring many queries and substantial input manipulation, which can incur costs and risk detection, and excessive alterations may compromise the malware’s functionality
  • To combat those issues, the authors propose EvadeDroid
  • EvadeDroid is a two-step evasion technique for Android malware that first selects similar benign apps as donors to create a set of code transformations, ensuring the malware mimics benign behavior
  • It uses program slicing to extract useful code snippets, or gadgets, from these donors and injects them into malware to evade machine learning classifiers while maintaining the app’s functionality
  • The second step involves iteratively applying these transformations to malware, guided by feedback from the target classifier, to produce altered versions that are misclassified as benign
  • The contributions can be summarized as follows:
    • EvadeDroid is proposed, which is a state-of-the-art solution that successfully evades ML-based malware detectors
    • Demonstrate that EvadeDroid is a query-efficient attacker
    • The proposed attack can operate with either soft labels or hard labels
    • Evaluate EvadeDroid’s performance on popular commercial antivirus products
  • Various studies have explored adversarial examples in the Windows domain
    • Modeling the action selection as a multi-armed bandit problem within reinforcement learning
    • Applying padding and sample injection to malware samples to bypass visualization-based malware detectors
    • Making small manipulations in file headers to generate adversarial malware
    • Perturbing API sequences of malware samples as a black-box attack
  • Advancements in the Android domain have not been as significant, since the same manipulations cannot be directly mapped to Android applications
    • All variations of existing studies require an attacker’s knowledge of the system and perturbations in the problem and feature spaces
    • These perturbations do not mimic the reality of problem and feature spaces
    • It is not practical to assume that attackers have knowledge of deployed classifiers as well
    • Some studies also lack details about feature extraction methods
    • EvadeDroid addresses the limitations of existing attacks

Proposed Attack

Threat Model of EvadeDroid
  • Adversarial goal:
    • To manipulate Android malware samples so that they successfully deceive static ML-based malware detectors
    • Trick classifiers into classifying malware samples as benign
  • Adversarial knowledge:
    • Black-box access restricts EvadeDroid from knowing the training data, feature set, and classification model
    • The attacker can only query the target classifier to obtain results
  • Adversarial capabilities:
    • The attack manipulates the Android malware apps through Android gadgets (slices of benign app bytecode) where the gadgets are optimized through the queries
    • The manipulation process of a malware app is performed gradually to best resemble benign apps
    • Minimal number of gadgets are inserted from the benign apps into the malware apps
    • EvadeDroid must follow two constraints:
      • Number of queries: EvadeDroid aims to generate adversarial examples with a minimal amount of queries
      • Size of adversarial payloads: EvadeDroid aims to minimize the size of injected adversarial payloads
    • Gadgets:
      • Gadgets contain an organ (slice of program functionality), an entry point to the organ, and a vein (execution path leading to the entry point)
      • EvadeDroid identifies entry points as API calls through string analysis, and uses them to extract gadgets from benign apps
      • The assumption is that the API calls of benign apps are fully functional
      • Gadget injection is considered successful when the loss of the injected app increases while the adversarial payload size remains the same as before
  • Defender’s capabilities:
    • Target models cannot perform online learning using EvadeDroid’s adversarial examples
    • The defender cannot detect or block EvadeDroid’s queries as suspicious
Problem Definition
  • Essentially, the objective of EvadeDroid is to generate an adversarial example that can trick the target classifier
  • This must be accomplished via the attacker-defined constraints:
    • A minimal sequence of transformations
    • A maximum number of queries
    • A maximum adversarial payload size
  • The adversarial payload size is the relative increase in the size of a malware sample after applying a transformation
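The two numeric constraints above can be sketched in Python. This is a minimal illustration; the function names and threshold values are hypothetical, not from the paper’s implementation:

```python
# Hypothetical helpers illustrating the attacker-defined constraints.
# max_payload and max_queries are illustrative values, not the paper's.

def relative_payload_size(original_bytes: int, modified_bytes: int) -> float:
    """Relative increase in app size after applying transformations."""
    return (modified_bytes - original_bytes) / original_bytes

def within_constraints(original_bytes: int, modified_bytes: int, n_queries: int,
                       max_payload: float = 0.5, max_queries: int = 100) -> bool:
    """Check the payload-size and query-budget constraints."""
    return (relative_payload_size(original_bytes, modified_bytes) <= max_payload
            and n_queries <= max_queries)

print(within_constraints(1_000_000, 1_200_000, n_queries=10))  # True: 20% growth, 10 queries
```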
Methodology
  • The proposed attack is achieved through an iterative and incremental algorithm
  • Problem space transformations, satisfying problem-space constraints, are used to generate real-world adversarial samples
  • The transformations are extracted from benign apps that are similar to malware apps using n-gram-based similarity
  • Random-search is used to optimize the manipulation of apps
  • App manipulations are incremented during the optimization process, with a sequence of transformations being applied in different iterations
  • n-grams:
    • n-grams are sequences of ‘n’ contiguous items used to capture patterns within texts or programs by analyzing the frequencies of unique sequences
    • In malware detection, n-grams help extract feature sequences from malware binaries or source code for analysis
    • For Android malware, n-grams analyze opcodes in disassembled DEX files of APKs, extracting patterns from smali file functions for static analysis
  • Random Search:
    • Random Search (RS) is an optimization method that depends entirely on randomness without knowledge about the objective function
    • The algorithm starts with a defined sampling distribution and an initial solution, generating new random solutions in each iteration
    • RS iterates until finding the best solution or meeting the termination criteria, noted for its efficiency in generating adversarial examples (AEs)
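The random-search loop described above can be sketched like this. The objective function is a stand-in: in EvadeDroid it would be derived from the target classifier’s output, but here it is a toy score so the loop is self-contained:

```python
import random

def random_search(objective, action_set, max_queries=50):
    """Minimal random-search loop: keep a randomly chosen transformation
    only if it improves the objective; stop at the budget or on success."""
    applied, best = [], objective([])
    for _ in range(max_queries):
        candidate = applied + [random.choice(action_set)]
        score = objective(candidate)
        if score > best:              # improvement -> accept the transformation
            applied, best = candidate, score
        if best >= 1.0:               # e.g. the classifier now says "benign"
            break
    return applied, best

# Toy objective: each accepted action moves the score up by 0.25.
obj = lambda seq: min(1.0, 0.25 * len(seq))
actions, score = random_search(obj, ["inject_gadget"])
print(score)  # 1.0
```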
  • Preparation:
    • The goal is to create an action set of safe transformations to modify APKs without crashes, maintaining functionality, using program slicing for transformation extraction
    • EvadeDroid employs two steps: selecting suitable donor gadgets and using them to adjust features that can change the classifier’s decision
    • Donor selection:
      • EvadeDroid efficiently selects donor apps that are similar to malware from a pool of benign applications, reducing the need for complex transformations and computational resources
      • The authors show that this approach makes disguising malware as benign faster and more efficient, requiring fewer modifications and queries
      • EvadeDroid utilizes segments of benign apps that represent malware apps to either make the malware look benign, or shift them towards the target classifier’s blind spots
      • It employs an n-gram-based opcode technique to compare the similarity of malware and benign samples without needing feature vector knowledge in the classifier’s feature space
      • The n-gram opcode feature extraction focuses on opcode types and involves automated extraction from raw bytecodes for effective similarity measurement
        • Disassemble Android application’s DEX files into smali files using Apktool
        • Discard operands and extract n-grams from the types of all opcode sequences in each smali file belonging to the app
        • Map the extracted feature sets to a feature space by aggregating all observable n-grams from all APKs
        • Create a feature vector for each app, where each element of the feature vector indicates the presence or absence of a specific n-gram in the app
      • A weight is computed and then sorted for each benign app
      • The top-k benign apps are selected as donors for gadget extraction
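Donor selection can be sketched as ranking benign apps by n-gram similarity and taking the top-k. The paper computes a weight per benign app; Jaccard similarity over n-gram sets is used here only as an illustrative stand-in for that weight:

```python
def jaccard(a: set, b: set) -> float:
    """Similarity of two n-gram sets (illustrative weight, not the paper's)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def select_donors(malware_grams, benign_pool, k=2):
    """Rank benign apps by n-gram similarity to the malware; keep the top-k."""
    ranked = sorted(benign_pool.items(),
                    key=lambda kv: jaccard(malware_grams, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy n-gram sets standing in for extracted opcode features.
mal = {"abc", "bcd", "cde"}
pool = {"app1": {"abc", "bcd"}, "app2": {"xyz"}, "app3": {"abc", "bcd", "cde"}}
print(select_donors(mal, pool, k=2))  # ['app3', 'app1']
```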
    • Gadget extraction:
      • Gadget extraction targets key functional components from donor apps to reshape malware into benign-appearing for static analysis
      • The process focuses on extracting API call snippets, as they convey an app’s main functionality and semantics
      • Extraction involves identifying relevant bytecode in APKs that represent the app’s essential behaviors via API interactions
        • Disassemble DEX files of donors into smali files by using Apktool
        • Perform string analysis on each app to identify all API calls in its smali files
        • Extract the gadgets associated with the collected API calls from each app
      • Manipulation:
        • EvadeDroid uses Random Search to find the best set of transformations for creating adversarial examples with fewer queries compared to methods like Genetic Algorithms
        • Random search selects transformations randomly from a set to improve an objective function, applying them only if they enhance the target classifier’s evasion
        • EvadeDroid adapts to hard-label outputs by modifying its objective function to minimize transformation use while maximizing evasion
        • EvadeDroid conducts non-optimal hard-label attacks by randomly applying transformations from an action set to malware until it generates an adversarial example or exhausts the query limit
        • The process involves querying the target classifier after each transformation to check if the malware is still detected
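The hard-label manipulation loop can be sketched as below. The “APK”, gadget, and query functions are toy stand-ins so the loop is runnable; the real attack applies bytecode transformations and queries an actual detector:

```python
import random

def hard_label_attack(apk, gadgets, query, apply_gadget, max_queries=20):
    """Randomly inject gadgets until the detector returns 'benign'
    (hard-label setting: only the final label is observable)."""
    for q in range(1, max_queries + 1):
        apk = apply_gadget(apk, random.choice(gadgets))
        if query(apk) == "benign":        # one query per transformation
            return apk, q                 # adversarial example found
    return None, max_queries              # query budget exhausted

# Toy stand-ins: an "APK" is just a list of injected gadgets, and the
# detector flips to "benign" once three gadgets have been injected.
adv, n_queries = hard_label_attack(
    apk=[], gadgets=["gadget"],
    query=lambda a: "benign" if len(a) >= 3 else "malware",
    apply_gadget=lambda a, g: a + [g])
print(n_queries)  # 3
```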
Simulation Results
  • Target detectors:
    • EvadeDroid is tested against a variety of Android malware detectors, including DREBIN, Sec-SVM, ADE-MA, MaMaDroid, and Opcode-SVM, to verify the attack’s effectiveness
    • These chosen malware detection models are well-researched for their ability to identify adversarial attacks in Android environments
  • Datasets:
    • The performance of EvadeDroid was tested on a dataset of around 170K Android app samples, gathered from AndroZoo, with features described by the DREBIN set and labeled as malicious or benign based on detection by 4+ or 0 VirusTotal engines, respectively
    • Labeling used a threshold on the number of VirusTotal engines that flagged each sample
    • Distinct feature representations for MaMaDroid and Opcode-SVM require direct app collection for training, while a larger training set for DREBIN and Sec-SVM showcases that increasing training set size does not notably affect EvadeDroid’s performance
  • Evaluation metrics:
    • Malware classifiers are evaluated using TPR and FPR, and display performance through ROC curves with 10-fold cross-validation
    • Evasion Rate (ER) and Evasion Time (ET) are introduced to measure how well EvadeDroid can trick these classifiers, with ER being the proportion of malware that bypasses detection after alteration
    • Evasion Time is the average time taken by EvadeDroid to create an evasive sample, including the time spent on optimization, transformations, and feature extraction
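The two metrics reduce to simple averages; a minimal sketch, with hypothetical sample data:

```python
def evasion_rate(evaded):
    """ER: fraction of manipulated malware samples misclassified as benign."""
    return sum(evaded) / len(evaded)

def evasion_time(seconds):
    """ET: mean time to craft one evasive sample (optimization,
    transformation, and feature extraction included)."""
    return sum(seconds) / len(seconds)

print(evasion_rate([True, True, False, True]))  # 0.75
print(evasion_time([12.0, 8.0, 10.0]))          # 10.0
```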
  • Evaluation costs and generalizability:
    • EvadeDroid is versatile and doesn’t make assumptions about target detectors
    • It can operate effectively in various attack settings
  • EvadeDroid vs other attacks:
    • EvadeDroid proves to be more practical and realistic in generating adversarial examples than other attacks like PiAttack and Sparse-RS, which have less practical threat models
    • ShadowDroid is less effective as it targets detectors based on API calls and permissions and fails against detectors like MaMaDroid or those using opcodes, unlike EvadeDroid
    • GenDroid, while operating under the Zero-Knowledge setting, struggles to evade sophisticated detectors and may not be viable in real-world applications due to its high query requirements, in contrast to the more efficient EvadeDroid
  • EvadeDroid in Real-World Scenarios:
    • EvadeDroid is tested for real-world applicability, demonstrating its ability to bypass commercial antivirus products verified on VirusTotal
    • Five leading antivirus engines were selected based on AV-Test ratings to assess EvadeDroid’s performance against real malware apps
    • A subset of 100 malware apps from EvadeDroid’s dataset was used, ensuring consistency with the labels from a benchmark dataset
    • EvadeDroid successfully deceived all five antivirus engines, often requiring only one query to generate adversarial examples
    • A responsible disclosure process was conducted, informing VT and the affected antivirus engines about the findings and the attack methodology

Discussion Summary


  • Mention main idea of using opaque predicates, dead code, etc.
  • Droppers aren’t used because Android doesn’t support them; once a dropper runs, the program file is immediately reordered, so the dropper is no longer needed
    • The implementation differs even though the concepts are the same; Android uses different abstraction layers than x86, for example
    • Compiled languages have assembly instructions (C for example), but Android is built on Java which is easier to handle as there is more high level information
    • Compiled vs interpreted languages have different security implications
  • The attack used in this paper is ‘semi-black-box,’ as the authors know that the target model is based on code, along with other information
  • In theory, EvadeDroid can be used for Windows PE files
    • Implementation will be different, but the concepts will be similar such as using opaque predicates, dead code, etc.
    • The first Android attacks were based on manifest files, which list permissions and more
    • Attacks against manifest files (where there are boolean vectors of manifest permissions) are also possible
  • Gadgets are essential in security as they fall under return-oriented programming (ROP) attacks, which can control the stack pointer as a result of a buffer overflow, leading to control of the program
  • Markov decision processes (which are a chain of state action pairs while accounting for the probability of choosing the best next state, using previous state information) can also be applied to malware detection
    • Function calls, for instance, can be represented as a markov decision process by determining the probability of a function being chosen or called next.
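Estimating those transition probabilities from an observed call trace can be sketched as below. The trace and call names are toy examples; models like MaMaDroid build similar Markov-style transition matrices over abstracted API calls:

```python
from collections import Counter, defaultdict

def transition_probabilities(call_sequence):
    """Estimate P(next call | current call) from an observed call trace."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(call_sequence, call_sequence[1:]):
        counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()}

# Toy call trace standing in for a real sequence of function calls.
trace = ["open", "read", "read", "close"]
probs = transition_probabilities(trace)
print(probs["read"])  # {'read': 0.5, 'close': 0.5}
```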
