Periodic Reporting for period 1 - OPTIMA (Organization sPecific Threat Intelligence Mining and sharing)
Reporting period: 2022-12-01 to 2025-03-31
The Research & Innovation Objectives (RIO) of the project are as follows:
1. RIO1: To develop deep learning techniques for automatically extracting threat intelligence from OSINT data for multiple types of institutions (e.g. healthcare, finance, IoT, education).
2. RIO2: To create a novel automated system that derives Indicators of Compromise (IOCs) from word embeddings and syntactic dependencies between words, so that previously unseen IOCs can be identified. The extracted IOCs will be used to estimate a threat index that characterizes the impact of threats and attack trends across individual organizations.
3. RIO3: To build a system that integrates cryptographic tools and federated learning, enabling an organization to share threat logs anonymously with different parties in a privacy-preserving manner.
The OPTIMA project (Organization-sPecific Threat Intelligence Mining and shAring) developed advanced AI-driven tools and frameworks to generate, analyze, and securely share cyber threat intelligence (CTI) tailored to organizational needs. The core outcome, OSTIS, enables organization-specific CTI generation through a dedicated crawler and NLP pipeline that extracts threat data from reliable sources, classifies it by domain (e.g. healthcare, finance), and visualizes attack patterns via knowledge graphs. Explainable AI tools like SHAP were integrated to interpret threat predictions and support trust in automation. Complementing OSTIS, we proposed SeCTIS, a privacy-preserving CTI sharing framework using Blockchain and Swarm Learning. SeCTIS ensures secure collaboration and verifiable trust among participants through Zero-Knowledge Proofs. The MoRSE and IntellBot systems advanced AI-based CTI delivery by deploying Retrieval-Augmented Generation (RAG) models to provide accurate, real-time cybersecurity insights. Additionally, our efforts in darknet traffic analysis, malware visualization, and multi-modal threat detection delivered interpretable models using SHAP, GradCAM, and LIME. In parallel, we addressed security in federated learning (FL) with tools like DLShield, SecDefender, and LFGuard, which detect low-quality or poisoned models and improve global accuracy while preserving privacy. Through these contributions, OPTIMA has enhanced both the granularity and trustworthiness of CTI across diverse domains, enabling proactive, explainable, and collaborative cybersecurity defense.
(1) Threat Intelligence Generation and Analysis
-Developed OSTIS, a novel end-to-end framework that collects threat data from curated online sources using a custom web crawler.
-Applied BERT-based deep learning models to domain classification (F1-score of 0.93) and to entity and relation extraction (F1-scores of 0.95 and 0.89, respectively).
-Introduced Explainable AI techniques (SHAP) to provide interpretability of CTI predictions, aiding human analysts in trust calibration.
-Constructed OSTIKG, an organization-specific threat knowledge graph, enabling contextual visualization of attack patterns, actors, and tools.
(2) CTI Sharing and Privacy-Preserving Collaboration
-Designed SeCTIS, a secure CTI sharing framework combining Blockchain and Swarm Learning.
-Integrated Zero Knowledge Proofs to assess data/model integrity and validate participant trustworthiness during collaborative CTI exchange.
-Demonstrated resistance to data inference and poisoning attacks through rigorous threat modeling.
(3) Explainable Cyber Threat Analysis
-Conducted darknet traffic analysis using SHAP, LIME, and counterfactuals across ISCXTor2016 and CIC-Darknet2020 datasets.
-Identified key discriminative features (e.g. Protocol, Source Port, IdleMax) and extracted malicious IPs, malware types, and TTPs such as MITRE ATT&CK technique T1071.
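The feature attributions mentioned above can be illustrated with a minimal sketch of exact Shapley values, the quantity that SHAP approximates. This is not the project's pipeline: the scoring function `toy_score`, its weights, the feature values, and the all-zero baseline are hypothetical, and the brute-force enumeration below is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, sample, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    """
    names = list(sample)
    n = len(names)
    values = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: (sample[f] if f in coalition else baseline[f])
                    for f in names
                }
                with_target = dict(without, **{target: sample[target]})
                total += weight * (predict(with_target) - predict(without))
        values[target] = total
    return values


def toy_score(x):
    """Hypothetical linear traffic scorer (weights are made up)."""
    return 0.5 * x["protocol"] + 0.3 * x["source_port"] + 0.2 * x["idle_max"]


# Hypothetical, already-normalised feature values for one flow.
sample = {"protocol": 1.0, "source_port": 0.8, "idle_max": 0.1}
baseline = {"protocol": 0.0, "source_port": 0.0, "idle_max": 0.0}

contributions = shapley_values(toy_score, sample, baseline)
for name, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

For a linear scorer each contribution reduces to the weight times the distance from the baseline, and the contributions sum to the difference between the sample's score and the baseline's score.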
(4) Malware Detection and Visualization
-Proposed multi-modal deep learning fusion techniques combining visual features (e.g. entropy graphs, SimHash) for malware classification.
-Achieved 100% detection rate on benchmark datasets (Malhub, BIG2015) and integrated adversarial robustness using GAN-based retraining.
-Developed vDefender for hypervisor-layer introspection, achieving 95.8% F1-score in detecting new malware behaviors.
(5) Secure Federated Learning
-Addressed FL security via DLShield, SecDefender, and LFGuard, capable of detecting low-quality or poisoned client models.
-Demonstrated up to 24% improvement in source class recall and 22.8% reduction in attack success rate, with minimal degradation in global accuracy.
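The report does not define the two metrics quoted above. The sketch below assumes the definitions commonly used for label-flipping attacks: source class recall is the share of source-class samples still predicted correctly, and attack success rate is the share of source-class samples predicted as the attacker's target class. The class names and predictions are hypothetical.

```python
def source_class_recall(y_true, y_pred, source):
    """Fraction of source-class samples that are predicted as the source class."""
    indices = [i for i, label in enumerate(y_true) if label == source]
    if not indices:
        raise ValueError("no samples of the source class")
    return sum(y_pred[i] == source for i in indices) / len(indices)


def attack_success_rate(y_true, y_pred, source, target):
    """Fraction of source-class samples that are predicted as the target class."""
    indices = [i for i, label in enumerate(y_true) if label == source]
    if not indices:
        raise ValueError("no samples of the source class")
    return sum(y_pred[i] == target for i in indices) / len(indices)


# Hypothetical evaluation set: the attacker tries to flip "dos" to "benign".
y_true = ["dos", "dos", "dos", "dos", "dos", "benign", "probe"]
y_pred = ["dos", "dos", "benign", "probe", "dos", "benign", "probe"]

recall = source_class_recall(y_true, y_pred, source="dos")
asr = attack_success_rate(y_true, y_pred, source="dos", target="benign")
print(f"source class recall: {recall:.2f}, attack success rate: {asr:.2f}")
```

Under these definitions a defense is effective when it raises the first number and lowers the second, which matches the direction of the improvements reported above.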
(6) LLM-based CTI Delivery Systems
-Introduced MoRSE and IntellBot, cybersecurity-focused Retrieval-Augmented Generation (RAG) chatbots, outperforming GPT-4 in answer correctness and relevance by over 10%, verified on 600 cybersecurity-specific queries.
1)OSTIS Framework and Threat Knowledge Graph
-Developed a web-crawling pipeline and entity extraction system to collect and process threat data from trusted sources.
-Designed domain classification and relation extraction models achieving F1-scores of 0.93 and 0.89 respectively.
-Constructed the OSTIS Knowledge Graph (OSTIKG), enabling graph-based threat reasoning and visualization tailored to domains such as healthcare, ICS, IoT, etc.
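As an illustration of how extracted entity-relation triples can back graph-based reasoning, the following minimal sketch stores triples in an adjacency list and answers two simple queries. The class `ThreatGraph`, the relation labels, and all entity names are hypothetical placeholders, not the OSTIKG schema or data.

```python
from collections import defaultdict, deque


class ThreatGraph:
    """Directed, labelled graph built from (head, relation, tail) triples."""

    def __init__(self):
        self._edges = defaultdict(list)  # head -> [(relation, tail), ...]

    def add_triple(self, head, relation, tail):
        self._edges[head].append((relation, tail))

    def neighbors(self, entity, relation=None):
        """Tails linked from `entity`, optionally filtered by relation label."""
        return [
            tail
            for rel, tail in self._edges.get(entity, [])
            if relation is None or rel == relation
        ]

    def reachable(self, start):
        """All entities reachable from `start` by following edges breadth-first."""
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for _, tail in self._edges.get(node, []):
                if tail not in seen:
                    seen.add(tail)
                    queue.append(tail)
        seen.discard(start)
        return seen


# Hypothetical triples, as an extraction model might emit them.
graph = ThreatGraph()
graph.add_triple("ActorA", "uses", "MalwareB")
graph.add_triple("MalwareB", "exploits", "CVE-PLACEHOLDER")
graph.add_triple("MalwareB", "targets", "healthcare")
graph.add_triple("ActorA", "targets", "finance")

print(graph.neighbors("ActorA", relation="targets"))
print(sorted(graph.reachable("ActorA")))
```

A domain-specific view then amounts to keeping only the subgraph reachable from, or leading to, the entities of one sector.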
2)RAG-based CTI Delivery Systems
-Developed two cybersecurity co-pilot systems, MoRSE and IntellBot, based on Retrieval-Augmented Generation (RAG) architecture and evaluated them on over 600 cybersecurity-specific queries.
-Showed that MoRSE's real-time cybersecurity keyword detection improves response accuracy by 10% compared to GPT-4, addressing the need for timely and precise security analysis.
-IntellBot aggregates data from diverse sources such as threat reports, vulnerability databases, and CTI feeds. It achieves high relevance, with BERT scores above 0.8 and cosine similarity scores between 0.8 and 1.0 in question-answering (QA) evaluations.
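The cosine similarity used in such QA evaluations compares the embedding of a generated answer with that of a reference answer. The sketch below shows the computation only; the two vectors are hypothetical stand-ins for embeddings, the report does not say which embedding model was used, and the 0.8 threshold simply mirrors the figure quoted above.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings of a generated answer and a reference answer.
generated = [3.0, 4.0, 0.0]
reference = [4.0, 3.0, 0.0]

score = cosine_similarity(generated, reference)
is_relevant = score >= 0.8  # threshold taken from the range reported above
print(f"cosine similarity: {score:.2f}, relevant: {is_relevant}")
```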
3)Secure CTI Sharing
-Developed SeCTIS, a secure CTI sharing framework that uses a Swarm Learning (SL) network to train a CTI model collaboratively in a distributed manner. Because SL does not require data to be shared with a central entity, raw data never leaves the local node, which preserves the confidentiality of each organization’s data.
-Trust among participants and quality of CTI data: SeCTIS provides a process based on validator nodes that assesses CTI data and model quality using reputation scores. In addition, the Zero-Knowledge Proof (ZKP) mechanism allows validator activities to be verified, ensuring that malicious entities cannot compromise the system. These mechanisms also make SeCTIS a collaborative trust framework.
-Interoperability and automation: SeCTIS provides a middleware that handles heterogeneous data formats under a single, uniform methodology. Only model parameters (weights and biases) are shared, not the raw data. Moreover, different ML frameworks can be used to train local models as long as they produce compatible model parameters for aggregation, which enhances interoperability across platforms and tools. Automation is achieved through decentralized, autonomous model training and automated validation of participants, reducing the need for manual oversight and intervention.
-Scalability: SeCTIS offers a scalable solution by distributing the workload of training models across multiple nodes, which reduces the burden on any single entity and allows the network to expand as needed.
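To make the parameter-sharing idea concrete, the following minimal sketch merges locally trained parameter vectors into one model, weighting each node by a reputation score. It is an assumption for illustration only: the report does not state the actual SeCTIS aggregation rule, and the function `aggregate_parameters`, the node names, and all numbers are hypothetical.

```python
def aggregate_parameters(updates, reputations):
    """Reputation-weighted average of flat parameter vectors.

    updates:     node name -> list of floats (locally trained parameters)
    reputations: node name -> non-negative reputation score
    """
    if not updates:
        raise ValueError("no updates to aggregate")
    total = sum(reputations[node] for node in updates)
    if total <= 0:
        raise ValueError("at least one node needs a positive reputation")
    length = len(next(iter(updates.values())))
    if any(len(params) != length for params in updates.values()):
        raise ValueError("all parameter vectors must have the same length")

    merged = [0.0] * length
    for node, params in updates.items():
        share = reputations[node] / total  # this node's weight in the average
        for i, value in enumerate(params):
            merged[i] += share * value
    return merged


# Hypothetical round: org_c submitted an implausible update and the
# validators have reduced its reputation to zero.
updates = {
    "org_a": [1.0, 2.0],
    "org_b": [3.0, 4.0],
    "org_c": [100.0, -100.0],
}
reputations = {"org_a": 1.0, "org_b": 1.0, "org_c": 0.0}

global_parameters = aggregate_parameters(updates, reputations)
print(global_parameters)
```

Only the parameter vectors cross organizational boundaries in this scheme; the training data that produced them stays on each node.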
4)Explainable Threat Detection and Interpretation
-Integrated SHAP and LIME for interpretability of predictions across all AI models.
-Applied XAI techniques to darknet traffic analysis, revealing key features and adversary strategies mapped to the MITRE ATT&CK knowledge base.
5)Advanced Malware Detection
-Proposed multimodal fusion approaches for malware detection using CNNs trained on grayscale, entropy, and SimHash image representations.
-Achieved near-perfect detection rates (F1 > 0.99), with robustness against adversarial attacks obtained through Generative Adversarial Network (GAN)-based retraining.
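The image representations named above start from the raw bytes of a binary. As a minimal sketch, assuming a fixed row width and block size chosen only for illustration, the code below turns a byte string into rows of grayscale pixel values and computes a per-block Shannon entropy profile of the kind an entropy graph plots. The SimHash representation and the CNN itself are not covered, and the sample bytes are synthetic.

```python
from collections import Counter
from math import log2


def to_grayscale_rows(data, width):
    """Map each byte to one pixel (0-255) and wrap into rows of `width` pixels.

    The last row is zero-padded so that all rows have equal length.
    """
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def shannon_entropy(block):
    """Shannon entropy of a byte block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    counts = Counter(block)
    size = len(block)
    return -sum((c / size) * log2(c / size) for c in counts.values())


def entropy_profile(data, block_size=256):
    """Entropy of each consecutive block; plotting this gives an entropy graph."""
    return [
        shannon_entropy(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


# Synthetic "binary": a constant region followed by a region with all byte values.
sample_bytes = b"\x00" * 256 + bytes(range(256))

image = to_grayscale_rows(sample_bytes, width=64)
profile = entropy_profile(sample_bytes, block_size=256)
print(len(image), "rows;", "entropy per block:", profile)
```

Low-entropy blocks typically correspond to padding or plain data, while values near 8 bits per byte suggest compressed or encrypted content, which is one reason entropy is a useful visual feature.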
6)Federated Learning Security
-Designed FL-specific defenses including DLShield, SecDefender, and LFGuard.
-Demonstrated 22.8% reduction in attack success rate and 24% improvement in source class recall across benchmark datasets.