Research Report: Evaluating AI-based malicious PowerShell detection and
optimizing features
- Youngho Hong, AI Researcher at Yangpyeong County
1. Introduction
Background: PowerShell is widely used as a powerful scripting tool for system
administration and automation. However, these same capabilities also give
malware authors opportunities to exploit. In recent years, the rise of
fileless malware has made PowerShell-based malicious activity more difficult
to detect.
Purpose: The objective of this research is to propose a methodology for
efficiently detecting malicious PowerShell scripts using AI techniques, and to
increase detection accuracy through feature selection and optimization.
By doing so, we aim to achieve high accuracy and a low false positive rate,
and to establish more effective security measures. This addresses the
limitation that traditional signature-based detection methods can be bypassed
by attackers.1)
2. PowerShell and cyberattacks
Because of its power, PowerShell is often used by malware to carry out
fileless attacks on systems. In particular, attackers exploit the following
characteristics:
Command execution: Remote command execution and system administration
capabilities.
Data exfiltration: Fileless attacks and data leakage over the network.
Obfuscation: Evading detection through obfuscation techniques.
This creates the need for an effective methodology for detecting
PowerShell-based malicious activity.
3. Feature selection methodology
Feature extraction and optimization: Researchers use a variety of machine
learning (ML) and deep learning (DL) techniques to extract and optimize
features from PowerShell scripts. For example, feature selection techniques
using tokens and abstract syntax trees (ASTs) are useful for improving
detection accuracy. In addition, methods using Word2Vec and convolutional
neural networks (CNNs) to learn the semantics of scripts have also been
proposed.2)
Dataset construction: Build a dataset containing benign and malicious
PowerShell scripts to train and evaluate the model. Obfuscation and
de-obfuscation play an important role in this process.
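One common form of obfuscation is Base64 encoding of the script text. As a
minimal sketch of how such a sample could be restored before feature
extraction, the Python snippet below decodes a Base64 string as UTF-16LE,
the encoding PowerShell expects for its -EncodedCommand parameter. The
function name and the sample command are illustrative assumptions, not part
of the cited studies' pipelines.

```python
import base64


def decode_encoded_command(b64_text):
    """Decode a Base64 payload as UTF-16LE text (hypothetical helper)."""
    # validate=True rejects characters outside the Base64 alphabet.
    raw = base64.b64decode(b64_text, validate=True)
    return raw.decode("utf-16-le")


# Round trip on a harmless sample command (assumed for illustration).
sample = "Write-Host 'hello'"
encoded = base64.b64encode(sample.encode("utf-16-le")).decode("ascii")
decoded = decode_encoded_command(encoded)
print(decoded)
```

Real samples often stack several layers of encoding and string manipulation,
so a single decoding pass like this covers only the simplest case.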
Feature extraction
To effectively analyze the features of PowerShell scripts, we used the
following methodology:
Token analysis: Analyzes syntactic elements in a script, such as keywords,
commands, and variables.
Abstract Syntax Tree (AST) analysis: Transforms a script into data containing
structural information.
3-gram method: Analyzes patterns in data by extracting features based on
three consecutive elements (tokens or AST nodes).
1) Ji-Hyun Song, Jung-Tae Kim, Sun-Oh Choi, Jong-Hyun Kim, & Ik-Gyun Kim. (2021).
Evaluations of AI-based malicious PowerShell detection with feature optimizations.
ETRI Journal, 43(3), 549-560.
2) Ho-Jin Jung, Hyung-Gon Lee, Kyu-Hwan Cho, & Sang-Keun Lee. (2022). A reverse
processing and learning-based detection method for Powershell-based malware.
Journal of the Information Security Society, 32(3), 501-511.
Feature optimization
5-token 3-gram: Deeply analyzes relationships between keywords, variables,
and instructions.
AST 3-gram: Maximizes detection performance based on structural information.
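The 3-gram idea can be illustrated with a short Python sketch. It splits a
script into tokens, optionally folds them to lowercase, and counts every
window of three consecutive tokens. The regular-expression tokenizer is a
deliberately naive stand-in (an assumption); the cited work relies on proper
PowerShell tokenization, and the sample script is invented for illustration.

```python
import re
from collections import Counter

# Naive tokenizer (assumption): variables, cmdlet-like words, numbers,
# or any other single non-space character.
TOKEN_RE = re.compile(r"\$\w+|[A-Za-z]+(?:-[A-Za-z]+)*|\d+|\S")


def tokenize(script, lowercase=True):
    """Return the list of tokens, optionally folded to lowercase."""
    text = script.lower() if lowercase else script
    return TOKEN_RE.findall(text)


def three_grams(items):
    """Count each run of three consecutive items."""
    return Counter(tuple(items[i:i + 3]) for i in range(len(items) - 2))


# Invented sample script, used only to exercise the functions.
script = "$c = New-Object Net.WebClient; $c.DownloadString($u)"
tokens = tokenize(script)
features = three_grams(tokens)
print(features.most_common(3))
```

The same three_grams function could be applied to a sequence of AST node
type names, for example from a traversal of the tree produced by
PowerShell's own parser, which is not reproduced here.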
4. AI models and evaluation
AI models
We evaluated detection performance using a variety of AI models:
Machine learning (ML) models: Random Forest (RF), Support Vector Machine
(SVM), K-Nearest Neighbor (K-NN).
Deep learning (DL) models: Convolutional neural networks (CNNs), long
short-term memory (LSTM) networks, and CNN-LSTM hybrid models.
Model Performance
Metrics used: Model performance is evaluated with metrics such as accuracy,
precision, and recall. For example, optimized features have been used to
achieve 98% detection rates in ML and DL experiments.3)
Performance: Researchers have shown faster detection than before, with
improved de-obfuscation turnaround times and detection rates, resulting in a
100% success rate and a low false positive rate (FPR).4)
ML models: The random forest model based on 5-token 3-gram features performs
best.
DL models: The CNN-LSTM model based on AST 3-gram features performs best,
achieving 98% accuracy and a 0.1% false positive rate.
Mixed-case handling: Detection rates are higher when script text is unified
to lowercase.
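For reference, the metrics named above can be computed from the four cells
of a binary confusion matrix, with "malicious" as the positive class. The
Python sketch below uses the standard definitions; the function name and the
counts passed to it are hypothetical and are not results from the cited
studies.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary metrics; assumes every denominator is non-zero."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all scripts classified correctly
        "precision": tp / (tp + fp),     # share of alerts that are truly malicious
        "recall": tp / (tp + fn),        # share of malicious scripts detected
        "fpr": fp / (fp + tn),           # share of benign scripts wrongly flagged
    }


# Hypothetical counts for illustration only.
m = detection_metrics(tp=95, fp=1, tn=999, fn=5)
print(m)
```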
5. Experiment results
Dataset:
22,261 legitimate PowerShell scripts.
4,214 malicious PowerShell scripts.
Collected from a variety of sources (Base64 encoded, OLE files, etc.).
Summary of results:
ML models: The random forest model based on 5-token 3-gram features is the
best performing.
DL models: The AST 3-gram based CNN-LSTM model is the best, with high accuracy
and low false positives.
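Because the dataset contains far more legitimate than malicious scripts,
accuracy on its own can look high even for a trivial classifier, which is one
reason to read it alongside the false positive rate and recall. The Python
sketch below works through the arithmetic using only the two counts reported
above. The "always benign" baseline is a hypothetical reference point, not a
model from the study, and the calculation assumes the full dataset rather
than any particular train/test split.

```python
# Counts as reported in the dataset description.
benign = 22_261
malicious = 4_214

total = benign + malicious
malicious_share = malicious / total
# Accuracy of a hypothetical classifier that labels every script benign.
majority_baseline_accuracy = benign / total

print(f"total scripts: {total}")
print(f"malicious share: {malicious_share:.1%}")
print(f"always-benign accuracy: {majority_baseline_accuracy:.1%}")
```

With these counts, malicious scripts make up roughly 15.9% of the 26,475
samples, so the always-benign baseline would already reach about 84.1%
accuracy while detecting nothing.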
3) Ji-Hyun Song, Jung-Tae Kim, Sun-Oh Choi, Jong-Hyun Kim, & Ik-Gyun Kim. (2021).
Evaluations of AI-based malicious PowerShell detection with feature optimizations.
ETRI Journal, 43(3), 549-560.
4) Ho-Jin Jung, Hyung-Gon Lee, Kyu-Hwan Cho, & Sang-Keun Lee. (2022). A reverse
processing and learning-based detection method for Powershell-based malware.
Journal of the Information Security Society, 32(3), 501-511.
6. Conclusions and future research directions
AI-based detection methods enable effective detection of malicious
PowerShell scripts, but they require continuous optimization. In particular,
the use of various feature extraction and selection techniques is a key factor
in increasing detection accuracy. This approach can overcome the limitations
of traditional detection techniques and provide a better security solution.
Such AI-based detection systems offer a powerful defense against
cybersecurity threats. Future research will need to improve the model to
account for more data and more complex attack patterns.
Conclusion
This study achieved high accuracy and a low false positive rate in
PowerShell-based malware detection using AI and feature optimization
techniques. In particular, the DL model using AST 3-gram features provides an
effective alternative for fileless malware detection.
Future research directions
De-obfuscation: Researching techniques to restore obfuscated scripts.
Model hardening: Improving the accuracy of detection models and developing
automated, integrated security systems.