├── .gitignore
├── taxonomy
│   ├── taxonomy.png
│   ├── taxonomy.xlsx
│   └── README.md
├── analyses
│   ├── taxonomy.xlsx
│   ├── datasets_labeling_summary.csv
│   └── study_taxonomy_analysis.ipynb
├── LICENSE
└── README.md
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
--------------------------------------------------------------------------------
/taxonomy/taxonomy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hs-esslingen-it-security/Awesome-LLM4SVD/HEAD/taxonomy/taxonomy.png
--------------------------------------------------------------------------------
/analyses/taxonomy.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hs-esslingen-it-security/Awesome-LLM4SVD/HEAD/analyses/taxonomy.xlsx
--------------------------------------------------------------------------------
/taxonomy/taxonomy.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hs-esslingen-it-security/Awesome-LLM4SVD/HEAD/taxonomy/taxonomy.xlsx
--------------------------------------------------------------------------------
/taxonomy/README.md:
--------------------------------------------------------------------------------
1 | # LLM4SVD TAXONOMY 🗂️
2 |
3 | We categorize existing LLM4SVD approaches by detection task, input representation, system architecture, and technique. This taxonomy enables meaningful comparison and benchmarking across studies.
4 |
5 |
6 |
7 |
8 | ![Taxonomy](taxonomy.png)
--------------------------------------------------------------------------------
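The sketch below is purely illustrative and not part of the repository: it encodes one hypothetical study along the four taxonomy dimensions named above (detection task, input representation, system architecture, technique). All field values are invented; the authoritative per-study categorization lives in taxonomy/taxonomy.xlsx.

```python
# Illustrative only: a hypothetical study encoded along the four taxonomy
# dimensions from taxonomy/README.md. Values are placeholders, not taken
# from any surveyed paper; see taxonomy/taxonomy.xlsx for the real data.
example_study = {
    "title": "Hypothetical LLM4SVD study",
    "detection_task": "function-level binary classification",
    "input_representation": "raw source code",
    "system_architecture": "fine-tuned encoder-only model",
    "technique": "supervised fine-tuning",
}
print(example_study)
```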
/analyses/datasets_labeling_summary.csv:
--------------------------------------------------------------------------------
1 | Dataset,Labeling,Type
2 | SARD,Synthetic,Mixed
3 | Juliet C/C++,Synthetic,Synthetic
4 | Juliet Java,Synthetic,Synthetic
5 | VulDeePecker,Security Vendor,Mixed
6 | Draper VDISC,Tool,Mixed
7 | Devign,Developer,Real (Balanced)
8 | Big-Vul,Security Vendor,Real (Imbalanced)
9 | D2A,Tool,Real (Imbalanced)
10 | ReVeal,Developer,Real (Imbalanced)
11 | CVEfixes,Security Vendor,Real (Imbalanced)
12 | CrossVul,Security Vendor,Real (Balanced)
13 | SecurityEval,Synthetic,Mixed
14 | SVEN,Developer,Real (Balanced)
15 | DiverseVul,Developer,Real (Imbalanced)
16 | FormAI,Tool,Synthetic
17 | ReposVul,Tool,Real (Imbalanced)
18 | PrimeVul,Security Vendor,Real (Imbalanced)
19 | MegaVul,Security Vendor,Real (Imbalanced)
20 | CleanVul,Developer,Real (Balanced)
21 | PairVul,Security Vendor,Real (Balanced)
22 |
--------------------------------------------------------------------------------
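As a minimal, non-authoritative sketch (not part of the repository), the summary above can be tallied by labeling source and dataset type with Python's standard csv module; the file path assumes the repository layout shown in the tree at the top.

```python
# Minimal sketch for summarizing analyses/datasets_labeling_summary.csv.
# Assumes the script is run from the repository root.
import csv
from collections import Counter

with open("analyses/datasets_labeling_summary.csv", newline="") as f:
    rows = [row for row in csv.DictReader(f) if row.get("Dataset")]

print("By labeling source:", Counter(row["Labeling"] for row in rows))
print("By dataset type:   ", Counter(row["Type"] for row in rows))
```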
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2025 hs-esslingen-it-security
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Awesome-LLM4SVD 🌟🧠👩‍💻🔍
2 |
3 | This repository contains the artifacts from the systematic literature review (SLR) on LLM-based software vulnerability detection ("A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language Models").
4 | The SLR analyzes 263 studies published between January 2020 and November 2025 and provides a structured taxonomy of detection approaches, input representations, system architectures, techniques, and dataset usage.
5 |
6 |
7 | ## Table of Contents
8 |
9 | To support open science and reproducibility, we publicly release:
10 | - 📝 [Surveyed Papers](#papers): A curated list of surveyed papers. This list will be continuously updated to track the latest papers.
11 | - 🗂️ [Taxonomy](https://github.com/hs-esslingen-it-security/Awesome-LLM4SVD/tree/main/taxonomy): Taxonomy of LLM-based vulnerability detection studies along with the categorization of each surveyed paper.
12 | - 📝 [Selected Datasets](#datasets): A list of the most commonly used datasets in the surveyed studies with their download sources.
13 |
14 |
15 |
16 |
17 |
18 | For details, see our [preprint here](https://arxiv.org/abs/2507.22659):
19 |
20 | 📚 S. Kaniewski, F. Schmidt, M. Enzweiler, M. Menth, and T. Heer. 2025. *A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language Models*. arXiv:2507.22659.
21 | ```bibtex
22 | @misc{kaniewskiLLM4SVD2025,
23 | title={{A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language Models}},
24 | author={Kaniewski, Sabrina and Schmidt, Fabian and Enzweiler, Markus and Menth, Michael and Heer, Tobias},
25 | year={2025},
26 | eprint={2507.22659},
27 | archivePrefix={arXiv},
28 | primaryClass={cs.SE},
29 | url={https://arxiv.org/abs/2507.22659},
30 | }
31 | ```
32 |
33 |
34 |
35 |
36 | - 🤝 [Contribute to this repository](#contribution)
37 | - ⚖️ [License](#license)
38 |
39 |
40 |
41 |
42 | ----------------
43 | ----------------
44 |
45 | ## Papers
46 |
47 | > **Note:** Entries marked with ✨ are recent papers that are not yet covered in the SLR preprint. The latest preprint version covers all studies published up to November 2025.
48 |
49 |
50 | ### 2025
51 | - (11/2025) Leveraging Self-Paced Learning for Software Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2511.09212)] [[Code](https://figshare.com/s/bef3211194fc18fe375e)]
52 | - (11/2025) Specification-Guided Vulnerability Detection with Large Language Models. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2511.04014)] [[Code](https://github.com/zhuhaopku/VulInstruct-temp)]
53 | - (11/2025) Compressing Large Language Models for SQL Injection Detection: A Case Study on Deep Seek-Coder and Meta-llama-3-70b-instruct. **`FRUCT 2025`** [[Paper](https://ieeexplore.ieee.org/document/11239157)]
54 | - (11/2025) VulTrLM: LLM-assisted Vulnerability Detection via AST Decomposition and Comment Enhancement. **`EMSE 2025`** [[Paper](https://link.springer.com/article/10.1007/s10664-025-10738-7)]
55 | - (11/2025) Cross-Domain Evaluation of Transformer-Based Vulnerability Detection on Open and Industry Data. **`PROFES 2025`** [[Paper](https://arxiv.org/abs/2509.09313)] [[Code](https://github.com/CybersecurityLab-unibz/cross_domain_evaluation)]
56 | - (11/2025) Learning-based Models for Vulnerability Detection: An Extensive Study. **`EMSE 2025`** [[Paper](https://arxiv.org/abs/2408.07526)] [[Code](https://figshare.com/s/bde8e41890e8179fbe5f?file=41894784)]
57 | - (11/2025) A Sequential Multi-Stage Approach for Code Vulnerability Detection via Confidence- and Collaboration-based Decision Making. **`EMNLP 2025`** [[Paper](https://aclanthology.org/2025.emnlp-main.1071/)]
58 | - (10/2025) Leveraging Intra- and Inter-References in Vulnerability Detection using Multi-Agent Collaboration Based on LLMs. **`Cluster Computing 2025`** [[Paper](https://link.springer.com/article/10.1007/s10586-025-05721-2)]
59 | - (10/2025) iCodeReviewer: Improving Secure Code Review with Mixture of Prompts. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2510.12186)]
60 | - (10/2025) Bridging Semantics & Structure for Software Vulnerability Detection using Hybrid Network Models. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2510.10321)] [[Code](https://zenodo.org/records/17259519)]
61 | - (10/2025) FuncVul: An Effective Function Level Vulnerability Detection Model using LLM and Code Chunk. **`ESORICS 2025`** [[Paper](https://arxiv.org/abs/2506.19453)] [[Code](https://github.com/sajalhalder/FuncVul)]
62 | - (10/2025) On Selecting Few-Shot Examples for LLM-based Code Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2510.27675)]
63 | - (10/2025) Towards Explainable Vulnerability Detection With Large Language Models. **`TSE 2025`** [[Paper](https://arxiv.org/abs/2406.09701)]
64 | - (10/2025) MulVuln: Enhancing Pre-trained LMs with Shared and Language-Specific Knowledge for Multilingual Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2510.04397)]
65 | - (10/2025) Llama-Based Source Code Vulnerability Detection: Prompt Engineering vs Fine Tuning. **`ESORICS 2025`** [[Paper](https://link.springer.com/chapter/10.1007/978-3-032-07884-1_15)] [[Code](https://github.com/DynaSoumhaneOuchebara/Llama-based-vulnerability-detection)]
66 | - (10/2025) Real-VulLLM: An LLM Based Assessment Framework in the Wild. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2510.04056)]
67 | - (10/2025) Distilling Lightweight Language Models for C/C++ Vulnerabilities. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2510.06645)] [[Code](https://github.com/yangxiaoxuan123/FineSec_detect)]
68 | - (10/2025) A Zero-Shot Framework for Cross-Project Vulnerability Detection in Source Code. **`EMSE 2025`** [[Paper](https://link.springer.com/article/10.1007/s10664-025-10749-4)] [[Code](https://github.com/Radowan98/ZSVulD)]
69 | - (10/2025) Sparse-MoE: Syntax-Aware Multi-view Mixture of Experts for Long-Sequence Software Vulnerability Detection. **`ADMA 2025`** [[Paper](https://link.springer.com/chapter/10.1007/978-981-95-3456-2_24)]
70 | - (09/2025) DeepVulHunter: Enhancing the Code Vulnerability Detection Capability of LLMs through Multi-Round Analysis. **`JIIS 2025`** [[Paper](https://link.springer.com/article/10.1007/s10844-025-00982-0)]
71 | - (09/2025) Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2412.12039)]
72 | - (09/2025) GPTVD: vulnerability detection and analysis method based on LLM’s chain of thoughts. **`ASE 2025`** [[Paper](https://link.springer.com/article/10.1007/s10515-025-00550-4)] [[Code](https://github.com/chenyn273/GPTVD)]
73 | - (09/2025) An Advanced Detection Framework for Embedded System Vulnerabilities. **`IEEE Access 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11153853)]
74 | - (09/2025) Utilizing Large Programming Language Models on Software Vulnerability Detection. **`ASYU 2025`** [[Paper](https://ieeexplore.ieee.org/document/11208282)]
75 | - (09/2025) MAVUL: Multi-Agent Vulnerability Detection via Contextual Reasoning and Interactive Refinement. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2510.00317)] [[Code](https://github.com/youpengl/MAVUL)]
76 | - (09/2025) Ensembling Large Language Models for Code Vulnerability Detection: An Empirical Evaluation. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2509.12629)] [[Code](https://github.com/sssszh/ELVul4LLM)]
77 | - (09/2025) VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2509.11523)]
78 | - (09/2025) PIONEER: Improving the Robustness of Student Models when Compressing Pre-Trained Models of Code. **`ASE 2025`** [[Paper](https://link.springer.com/article/10.1007/s10515-025-00560-2)] [[Code](https://github.com/illsui1on/PIONEER)]
79 | - (08/2025) VulPr: A Prompt Learning-based Method for Vulnerability Detection. **`EIT 2025`** [[Paper](https://ieeexplore.ieee.org/document/11231886)]
80 | - (08/2025) MalCodeAI: Autonomous Vulnerability Detection and Remediation via Language Agnostic Code Reasoning. **`IRI 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11153184)]
81 | - (08/2025) Large Language Models Versus Static Code Analysis Tools: A Systematic Benchmark for Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/pdf/2508.04448)] [[Code](https://github.com/Damian0401/ProjectAnalyzer)]
82 | - (08/2025) Enhancing Fine-Grained Vulnerability Detection With Reinforcement Learning. **`TSE 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11145224)] [[Code](https://github.com/YuanJiangGit/RLFD)]
83 | - (08/2025) CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2508.11599)]
84 | - (08/2025) Out of Distribution, Out of Luck: How Well Can LLMs Trained on Vulnerability Datasets Detect Top 25 CWE Weaknesses?. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2507.21817)] [[Code](https://github.com/yikun-li/TitanVul-BenchVul)]
85 | - (08/2025) LLM-GUARD: Large Language Model-Based Detection and Repair of Bugs and Security Vulnerabilities in C++ and Python. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2508.16419)] [[Code](https://github.com/NoujoudNader/LLM-Bugs-Detection)]
86 | - (08/2025) Multimodal Fusion for Vulnerability Detection: Integrating Sequence and Graph-Based Analysis with LLM Augmentation. **`MAPR 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11133833)]
87 | - (08/2025) SAFE: A Novel Approach For Software Vulnerability Detection from Enhancing The Capability of Large Language Models. **`ASIACCS 2025`** [[Paper](https://arxiv.org/abs/2409.00882)]
88 | - (08/2025) Software Vulnerability Detection using Large Language Models. **`SecureComm 2025`** [[Paper](https://arxiv.org/abs/2410.00249)]
89 | - (08/2025) Data and Context Matter: Towards Generalizing AI-based Software Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2508.16625)]
90 | - (08/2025) Think Broad, Act Narrow: CWE Identification with Multi-Agent Large Language Models. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2508.01451)] [[Code](https://zenodo.org/records/15871507)]
91 | - (08/2025) Improving Software Security Through a LLM-Based Vulnerability Detection Model. **`DEXA 2025`** [[Paper](https://link.springer.com/chapter/10.1007/978-3-032-02049-9_9)]
92 | - (07/2025) An Automatic Classification Model for Long Code Vulnerabilities Based on the Teacher-Student Framework. **`QRS 2025`** [[Paper](https://ieeexplore.ieee.org/document/11216609)]
93 | - (07/2025) LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models. **`USENIX Security 2025`** [[Paper](https://arxiv.org/abs/2507.16585)] [[Code](https://github.com/qcri/llmxcpg)] [[Code](https://zenodo.org/records/15614095)]
94 | - (07/2025) CLeVeR: Multi-modal Contrastive Learning for Vulnerability Code Representation. **`ACL 2025`** [[Paper](https://aclanthology.org/2025.findings-acl.414/)] [[Code](https://github.com/yoimiya-nlp/CLeVeR)]
95 | - (07/2025) Revisiting Pre-trained Language Models for Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2507.16887)]
96 | - (07/2025) Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2507.03051)]
97 | - (07/2025) HgtJIT: Just-in-Time Vulnerability Detection Based on Heterogeneous Graph Transformer. **`TDSC 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11072308)]
98 | - (07/2025) AI-Powered Vulnerability Detection in Code Using BERT-Based LLM with Transparency Measures. **`ITC-Egypt 2025`** [[Paper](https://ieeexplore.ieee.org/document/11186618)]
99 | - (07/2025) Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories. **`Unknown 2025`** [[Paper](https://arxiv.org/abs/2503.03586)]
100 | - (06/2025) VulnTeam: A Team Collaboration Framework for LLM-based Vulnerability Detection. **`IJCNN 2025`** [[Paper](https://ieeexplore.ieee.org/document/11229292)]
101 | - (06/2025) One-for-All Does Not Work! Enhancing Vulnerability Detection by Mixture-of-Experts (MoE). **`PACMSE 2025`** [[Paper](https://arxiv.org/abs/2501.16454)]
102 | - (06/2025) Improving Vulnerability Type Prediction and Line-Level Detection via Adversarial Training-based Data Augmentation and Multi-Task Learning. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2506.23534)] [[Code](https://github.com/Karelye/EDAT-MLT)]
103 | - (06/2025) Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2406.11147)] [[Code](https://github.com/knowledgerag4llmvuld/knowledgerag4llmvuld)]
104 | - (06/2025) Expert-in-the-Loop Systems with Cross-Domain and In-Domain Few-Shot Learning for Software Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2506.10104)]
105 | - (06/2025) Evaluating LLaMA 3.2 for Software Vulnerability Detection. **`EICC 2025`** [[Paper](https://link.springer.com/chapter/10.1007/978-3-031-94855-8_3)]
106 | - (06/2025) How Well Do Large Language Models Serve as End-to-End Secure Code Agents for Python?. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2408.10495)] [[Code](https://github.com/jianian0318/LLMSecureCode)]
107 | - (06/2025) Detecting Code Vulnerabilities using LLMs. **`DSN 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11068842)] [[Code](https://github.com/a24167566/LLMs-Code-Vulnerability-Detection)]
108 | - (06/2025) LPASS: Linear Probes as Stepping Stones for Vulnerability Detection using Compressed LLMs. **`JISA 2025`** [[Paper](https://www.sciencedirect.com/science/article/pii/S2214212625001620)]
109 | - (06/2025) Smart Cuts: Enhance Active Learning for Vulnerability Detection by Pruning Bad Seeds. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2506.20444)]
110 | - (06/2025) CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2411.17274)] [[Code](https://github.com/yikun-li/CleanVul)]
111 | - (06/2025) Large Language Models for Multilingual Vulnerability Detection: How Far Are We?. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2506.07503)] [[Code](https://github.com/SpanShu96/Large-Language-Model-for-Multilingual-Vulnerability-Detection/tree/main)]
112 | - (06/2025) Large Language Models for In-File Vulnerability Localization Can Be "Lost in the End". **`PACMSE 2025`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3715758)] [[Code](https://zenodo.org/records/14840519)]
113 | - (06/2025) LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2401.16185)] [[Code](https://anonymous.4open.science/r/LLM4Vuln/README.md)]
114 | - (06/2025) ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2408.16028)] [[Code](https://anonymous.4open.science/r/anvil)]
115 | - (06/2025) Line-level Semantic Structure Learning for Code Vulnerability Detection. **`Internetware 2025`** [[Paper](https://arxiv.org/abs/2407.18877)] [[Code](https://figshare.com/articles/dataset/CSLS_model_code_and_data/26391658)]
116 | - (06/2025) SecureMind: A Framework for Benchmarking Large Language Models in Memory Bug Detection and Repair. **`ISMM 2025`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3735950.3735954)] [[Code](https://github.com/HuantWang/SecureMind)]
117 | - (06/2025) VuL-MCBERT: A Vulnerability Detection Method Based on Self-Supervised Contrastive Learning. **`CAIBDA 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11183103)]
118 | - (06/2025) Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2506.07390)] [[Code](https://github.com/Xin-Cheng-Wen/PO4Vul)]
119 | - (06/2025) Beyond Static Pattern Matching? Rethinking Automatic Cryptographic API Misuse Detection in the Era of LLMs. **`PACMSE 2025`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3728875)]
120 | - (06/2025) An Insight into Security Code Review with LLMs: Capabilities, Obstacles, and Influential Factors. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2401.16310)] [[Code](https://zenodo.org/records/15572151)]
121 | - (05/2025) SecVulEval: Benchmarking LLMs for Real-World C/C++ Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2505.19828)] [[Code](https://github.com/basimbd/SecVulEval)]
122 | - (05/2025) AutoAdapt: On the Application of AutoML for Parameter-Efficient Fine-Tuning of Pre-Trained Code Models. **`TOSEM 2025`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3734867)] [[Code](https://github.com/serval-uni-lu/AutoAdapt)]
123 | - (05/2025) Automating the Detection of Code Vulnerabilities by Analyzing GitHub Issues. **`ICSE 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11028308)]
124 | - (05/2025) LLaVul: A Multimodal LLM for Interpretable Vulnerability Reasoning about Source Code. **`ICSC 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11140501)]
125 | - (05/2025) A Comparative Study of Machine Learning and Large Language Models for SQL and NoSQL Injection Vulnerability Detection. **`SIST 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11139190)]
126 | - (05/2025) Are Sparse Autoencoders Useful for Java Function Bug Detection?. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2505.10375)]
127 | - (05/2025) ♪ With a Little Help from My (LLM) Friends: Enhancing Static Analysis with LLMs to Detect Software Vulnerabilities. **`ICSE 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11028575)]
128 | - (05/2025) GraphCodeBERT-Augmented Graph Attention Networks for Code Vulnerability Detection. **`CAI 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11050748)]
129 | - (05/2025) Leveraging Large Language Models for Command Injection Vulnerability Analysis in Python: An Empirical Study on Popular Open-Source Projects. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2505.15088)]
130 | - (05/2025) Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2505.10961)] [[Code](https://figshare.com/s/1514bc9a7aa64b46d94e)]
131 | - (05/2025) Adversarial Training for Robustness Enhancement in LLM-Based Code Vulnerability Detection. **`CISCE 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11065803)]
132 | - (05/2025) Learning to Focus: Context Extraction for Efficient Code Vulnerability Detection with Language Models. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2505.17460)]
133 | - (05/2025) An Automated Code Review Framework Based on BERT and Qianwen Large Model. **`CCAI 2025`** [[Paper](https://ieeexplore.ieee.org/document/11189422)]
134 | - (04/2025) A Software Vulnerability Detection Model Combined with Graph Simplification. **`AIBDF 2025`** [[Paper](https://dl.acm.org/doi/full/10.1145/3718491.3718525)]
135 | - (04/2025) Human-Understandable Explanation for Software Vulnerability Prediction. **`JSS 2025`** [[Paper](https://www.sciencedirect.com/science/article/pii/S0164121225001232)] [[Code](https://github.com/quy-ng/human-xai-software-vulnerability-prediction)]
136 | - (04/2025) Case Study: Fine-tuning Small Language Models for Accurate and Private CWE Detection in Python Code. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2504.16584)] [[Code](https://huggingface.co/floxihunter/codegen-mono-CWEdetect)] [[Code](https://huggingface.co/datasets/floxihunter/synthetic_python_cwe)]
137 | - (04/2025) Vulnerability Detection with Code Language Models: How Far are We?. **`ICSE 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11029911)] [[Code](https://github.com/DLVulDet/PrimeVul)]
138 | - (04/2025) Everything You Wanted to Know About LLM-based Vulnerability Detection But Were Afraid to Ask. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2504.13474)] [[Code](https://anonymous.4open.science/r/CORRECT/README.md)]
139 | - (04/2025) IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. **`ICLR 2025`** [[Paper](https://arxiv.org/abs/2405.17238)] [[Code](https://github.com/iris-sast/iris)]
140 | - (04/2025) Trace Gadgets: Minimizing Code Context for Machine Learning-Based Vulnerability Prediction. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2504.13676)]
141 | - (04/2025) An Ensemble Transformer Approach with Cross-Attention for Automated Code Security Vulnerability Detection and Documentation. **`ISDFS 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11012039)]
142 | - (04/2025) Metamorphic-Based Many-Objective Distillation of LLMs for Code-Related Tasks. **`ICSE 2025`** [[Paper](https://ieeexplore.ieee.org/document/11029766)] [[Code](https://zenodo.org/records/14857610)]
143 | - (04/2025) XGV-BERT: Leveraging Contextualized Language Model and Graph Neural Network for Efficient Software Vulnerability Detection. **`The Journal of Supercomputing 2025`** [[Paper](https://link.springer.com/article/10.1007/s11227-025-07198-7)]
144 | - (04/2025) Leveraging Multi-Task Learning to Improve the Detection of SATD and Vulnerability. **`ICPC 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11025930)] [[Code](https://github.com/moritzmock/multitask-vulberability-detection)]
145 | - (04/2025) Closing the Gap: A User Study on the Real-world Usefulness of AI-powered Vulnerability Detection & Repair in the IDE. **`ICSE 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11029760)] [[Code](https://figshare.com/articles/dataset/Closing_the_Gap_A_User_Study_on_the_Real-world_Usefulness_of_AI-powered_Vulnerability_Detection_Repair_in_the_IDE/26367139?file=52478936)]
146 | - (04/2025) R2Vul: Learning to Reason about Software Vulnerabilities with Reinforcement Learning and Structured Reasoning Distillation. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2504.04699)] [[Code](https://github.com/martin-wey/R2Vul)]
147 | - (04/2025) Context-Enhanced Vulnerability Detection Based on Large Language Models. **`TOSEM 2025`** [[Paper](https://arxiv.org/abs/2504.16877)] [[Code](https://github.com/DoeSEResearch/PacVD)]
148 | - (04/2025) SSRFSeek: An LLM-based Static Analysis Framework for Detecting SSRF Vulnerabilities in PHP Applications. **`AINIT 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11035424)]
149 | - (03/2025) CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection. **`TASE 2025`** [[Paper](https://link.springer.com/chapter/10.1007/978-3-031-98208-8_15)] [[Code](https://github.com/CASTLE-Benchmark)]
150 | - (03/2025) SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection With LLMs?. **`TSE 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10910240)]
151 | - (03/2025) Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities. **`ICST 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10988968)] [[Code](https://github.com/seal-research/secvul-llm-study/)]
152 | - (03/2025) Assessing the Effectiveness of LLMs in Android Application Vulnerability Analysis. **`ADIoT 2025`** [[Paper](https://link.springer.com/chapter/10.1007/978-3-031-85593-1_9)]
153 | - (03/2025) Steering Large Language Models for Vulnerability Detection. **`ICASSP 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10887736)]
154 | - (03/2025) HALURust: Exploiting Hallucinations of Large Language Models to Detect Vulnerabilities in Rust. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2503.10793)]
155 | - (03/2025) You Only Train Once: A Flexible Training Framework for Code Vulnerability Detection Driven by Vul-Vector. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2506.10988)]
156 | - (03/2025) Benchmarking Large Language Models for Multi-Language Software Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2503.01449)] [[Code](https://github.com/soarsmu/SVD-Bench)]
157 | - (03/2025) Reasoning with LLMs for Zero-Shot Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2503.17885)] [[Code](https://github.com/Erroristotle/VulnSage)]
158 | - (02/2025) EFVD: A Framework of Source Code Vulnerability Detection via Fusion of Enhanced Graph Representation Learning and Pre-trained Transformer-Based Model. **`CNSSE 2025`** [[Paper](https://dl.acm.org/doi/full/10.1145/3732365.3732421)]
159 | - (02/2025) Fine-Tuning Transformer LLMs for Detecting SQL Injection and XSS Vulnerabilities. **`ICAIIC 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10920868)]
160 | - (02/2025) Finetuning Large Language Models for Vulnerability Detection. **`IEEE Access 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10908394)] [[Code](https://github.com/rmusab/vul-llm-finetune)]
161 | - (02/2025) Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study. **`IEEE Access 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10879492)]
162 | - (02/2025) Manual Prompt Engineering is Not Dead: A Case Study on Large Language Models for Code Vulnerability Detection with DSPy. **`CDMA 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10908746)]
163 | - (02/2025) AIDetectVul: Software Vulnerability Detection Method Based on Feature Fusion of Pre-trained Models. **`ICCECE 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10985370)]
164 | - (01/2025) DMVL4AVD: A Deep Multi-View Learning Model for Automated Vulnerability Detection. **`Neural Comput. Appl. 2025`** [[Paper](https://link.springer.com/article/10.1007/s00521-024-10892-x)] [[Code](https://drive.google.com/file/d/1-qWqmRuBi8kRAAE2yiG6JNiY8vLYxXlz/view)]
165 | - (01/2025) Helping LLMs Improve Code Generation Using Feedback from Testing and Static Analysis. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2412.14841)]
166 | - (01/2025) CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2501.04510)]
167 | - (01/2025) Investigating Large Language Models for Code Vulnerability Detection: An Experimental Study. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2412.18260)] [[Code](https://github.com/SakiRinn/LLM4CVD)] [[Code](https://huggingface.co/datasets/xuefen/VulResource)]
168 | - (01/2025) To Err is Machine: Vulnerability Detection Challenges LLM Reasoning. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2403.17218)] [[Code](https://figshare.com/articles/dataset/Data_Package_for_LLM_Vulnerability_Detection_Study/27368025)]
169 | - (01/2025) Streamlining Security Vulnerability Triage with Large Language Models. **`arXiv 2025`** [[Paper](https://arxiv.org/abs/2501.18908)] [[Code](https://zenodo.org/records/14776104)]
170 | - (01/2025) Sink Vulnerability Type Prediction Using Small Language Model (SLM). **`IC3ECSBHI 2025`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10991300)]
171 | - (01/2025) A Vulnerability Detection Framework Based on Graph Decomposition Fusion and Augmented Abstract Syntax Tree. **`BDICN 2025`** [[Paper](https://dl.acm.org/doi/full/10.1145/3727353.3727471)]
172 |
173 | ### 2024
174 | - (12/2024) Vulnerability Detection in Popular Programming Languages with Language Models. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2412.15905)] [[Code](https://github.com/syafiq/llm_vd)]
175 | - (12/2024) On the Compression of Language Models for Code: An Empirical Study on CodeBERT. **`SANER 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10992473)] [[Code](https://zenodo.org/records/14357478)]
176 | - (12/2024) LLM-Based Approach for Buffer Overflow Detection in Source Code. **`CIT 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/11021816)]
177 | - (12/2024) A Source Code Vulnerability Detection Method Based on Positive-Unlabeled Learning. **`RICAI 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10911761)]
178 | - (12/2024) Evaluating Large Language Models in Vulnerability Detection Under Variable Context Windows. **`ICMLA 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10903489)]
179 | - (12/2024) EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code. **`BigData 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10825609)]
180 | - (12/2024) Software Vulnerability Detection Using LLM: Does Additional Information Help?. **`ACSAC 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10917361)] [[Code](https://github.com/research7485/vulnerability_detection)]
181 | - (12/2024) Enhancing Source Code Vulnerability Detection Using Flattened Code Graph Structures. **`ICFTIC 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10913325)]
182 | - (12/2024) SQL Injection Vulnerability Detection Based on Pissa-Tuned Llama 3 Large Language Model. **`ICFTIC 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10912886)]
183 | - (12/2024) A Method of SQL Injection Attack Detection Based on Large Language Models. **`CNTEIE 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10987904)]
184 | - (12/2024) MVD: A Multi-Lingual Software Vulnerability Detection Framework. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2412.06166)] [[Code](https://figshare.com/s/10ec70108294a225f391)]
185 | - (12/2024) Python Source Code Vulnerability Detection Based on CodeBERT Language Model. **`ACAI 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10899694)]
186 | - (11/2024) RealVul: Can We Detect Vulnerabilities in Web Applications with LLM?. **`EMNLP 2024`** [[Paper](https://arxiv.org/abs/2410.07573)]
187 | - (11/2024) StagedVulBERT: Multigranular Vulnerability Detection With a Novel Pretrained Code Model. **`TSE 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10746847)] [[Code](https://github.com/YuanJiangGit/StagedVulBERT)]
188 | - (11/2024) Applying Contrastive Learning to Code Vulnerability Type Classification. **`EMNLP 2024`** [[Paper](https://aclanthology.org/2024.emnlp-main.666/)]
189 | - (11/2024) Boosting Cybersecurity Vulnerability Scanning based on LLM-supported Static Application Security Testing. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2409.15735)]
190 | - (11/2024) Enhancing Vulnerability Detection Efficiency: An Exploration of Light-weight LLMs with Hybrid Code Features. **`JISA 2024`** [[Paper](https://www.sciencedirect.com/science/article/pii/S2214212624002278)] [[Code](https://github.com/JNL-28/Enhancing-Vulnerability-Detection-Efficiency)]
191 | - (11/2024) Research on the LLM-Driven Vulnerability Detection System Using LProtector. **`ICDSCA 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10859408)]
192 | - (11/2024) Enhanced LLM-Based Framework for Predicting Null Pointer Dereference in Source Code. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2412.00216)]
193 | - (10/2024) Vulnerability Prediction using Pre-trained Models: An Empirical Evaluation. **`MASCOTS 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10786510)] [[Code](https://sites.google.com/view/vpllm/)]
194 | - (10/2024) Fine-Tuning Pre-trained Model with Optimizable Prompt Learning for Code Vulnerability Detection. **`ISSRE 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10771498)] [[Code](https://github.com/Exclusisve-V/PromptVulnerabilityDetection)]
195 | - (10/2024) Improving Long-Tail Vulnerability Detection Through Data Augmentation Based on Large Language Models. **`ICSME 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10795073)] [[Code](https://github.com/LuckyDengXiao/LERT)]
196 | - (10/2024) Exploring AI for Vulnerability Detection and Repair. **`CARS 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10778769)]
197 | - (10/2024) DetectBERT: Code Vulnerability Detection. **`GCCIT 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10862235)]
198 | - (10/2024) VULREM: Fine-Tuned BERT-Based Source-Code Potential Vulnerability Scanning System to Mitigate Attacks in Web Applications. **`Applied Sciences 2024`** [[Paper](https://www.mdpi.com/2076-3417/14/21/9697)]
199 | - (10/2024) A Qualitative Study on Using ChatGPT for Software Security: Perception vs. Practicality. **`TPS-ISA 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10835695)] [[Code](https://figshare.com/articles/dataset/Reproduction_package_for_paper_A_Qualitative_Study_on_Using_ChatGPT_for_Software_Security_Perception_vs_Practicality_/24452365?file=48008890)]
200 | - (10/2024) Vul-LMGNNs: Fusing Language Models and Online-distilled Graph Neural Networks for Code Vulnerability Detection. **`Information Fusion 2024`** [[Paper](https://www.sciencedirect.com/science/article/pii/S1566253524005268)] [[Code](https://github.com/Vul-LMGNN/vul-LMGGNN)]
201 | - (10/2024) SecureQwen: Leveraging LLMs for Vulnerability Detection in Python Codebases. **`COSE 2024`** [[Paper](https://www.sciencedirect.com/science/article/pii/S0167404824004565)]
202 | - (10/2024) VulnerAI: GPT Based Web Application Vulnerability Detection. **`ICAMAC 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10828788)]
203 | - (10/2024) DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection. **`JSS 2024`** [[Code](https://github.com/Yang-Yanjing/DLAP)]
204 | - (10/2024) Multitask-Based Evaluation of Open-Source LLM on Software Vulnerability. **`TSE 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10706805)] [[Code](https://github.com/vinci-grape/VulEmpirical)]
205 | - (10/2024) Detecting Source Code Vulnerabilities Using Fine-Tuned Pre-Trained LLMs. **`ICSP 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10846595)]
206 | - (10/2024) A Source Code Vulnerability Detection Method Based on Adaptive Graph Neural Networks. **`ASE 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10765114)]
207 | - (09/2024) Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection. **`ESORICS 2024`** [[Paper](https://link.springer.com/chapter/10.1007/978-3-031-70879-4_14)]
208 | - (09/2024) Navigating (In)Security of AI-Generated Code. **`CSR 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10679468)]
209 | - (09/2024) Bridge and Hint: Extending Pre-trained Language Models for Long-Range Code. **`ISSTA 2024`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3650212.3652127)] [[Code](https://anonymous.4open.science/r/EXPO/README.md)]
210 | - (09/2024) Can a Llama Be a Watchdog? Exploring Llama 3 and Code Llama for Static Application Security Testing. **`CSR 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10679444)]
211 | - (09/2024) May the Source Be with You: On ChatGPT, Cybersecurity, and Secure Coding. **`Information 2024`** [[Paper](https://www.mdpi.com/2078-2489/15/9/572)]
212 | - (09/2024) Enhancing Source Code Security with LLMs: Demystifying The Challenges and Generating Reliable Repairs. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2409.00571)]
213 | - (09/2024) Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2409.10490)]
214 | - (09/2024) SCALE: Constructing Structured Natural Language Comment Trees for Software Vulnerability Detection. **`ISSTA 2024`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3650212.3652124)] [[Code](https://github.com/Xin-Cheng-Wen/Comment4Vul)]
215 | - (09/2024) Beyond ChatGPT: Enhancing Software Quality Assurance Tasks with Diverse LLMs and Validation Techniques. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2409.01001)] [[Code](https://figshare.com/s/5da14b0776750c6fa787)]
216 | - (09/2024) VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2409.10756)]
217 | - (08/2024) VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2406.07595)] [[Code](https://github.com/Sweetaroo/VulDetectBench)]
218 | - (08/2024) Defect-Scanner: A Comparative Empirical Study on Language Model and Deep Learning Approach for Software Vulnerability Detection. **`IJIS 2024`** [[Paper](https://link.springer.com/article/10.1007/s10207-024-00901-4)]
219 | - (08/2024) From Generalist to Specialist: Exploring CWE-Specific Vulnerability Detection. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2408.02329)]
220 | - (08/2024) Large Language Models for Secure Code Assessment: A Multi-Language Empirical Study. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2408.06428)]
221 | - (08/2024) Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning. **`ACL 2024`** [[Paper](https://arxiv.org/abs/2406.03718)] [[Code](https://github.com/CGCL-codes/VulLLM)]
222 | - (08/2024) Unintentional Security Flaws in Code: Automated Defense via Root Cause Analysis. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2409.00199)] [[Code](https://anonymous.4open.science/r/Threat_Detection_Modeling-BB7B/README.md)]
223 | - (08/2024) Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection. **`USENIX Security 2024`** [[Paper](https://www.usenix.org/conference/usenixsecurity24/presentation/risse)] [[Code](https://github.com/niklasrisse/USENIX_2024)] [[Code](https://github.com/niklasrisse/VPP)]
224 | - (08/2024) VulSim: Leveraging Similarity of Multi-Dimensional Neighbor Embeddings for Vulnerability Detection. **`USENIX Security 2024`** [[Paper](https://www.usenix.org/conference/usenixsecurity24/presentation/shimmi)] [[Code](https://github.com/SamihaShimmi/VulSim)]
225 | - (07/2024) Enhancing Software Code Vulnerability Detection Using GPT-4o and Claude-3.5 Sonnet: A Study on Prompt Engineering Techniques. **`Electronics 2024`** [[Paper](https://www.mdpi.com/2079-9292/13/13/2657)]
226 | - (07/2024) MultiVD: A Transformer-based Multitask Approach for Software Vulnerability Detection. **`SECRYPT 2024`** [[Paper](https://www.scitepress.org/Papers/2024/127194/127194.pdf)]
227 | - (07/2024) DFEPT: Data Flow Embedding for Enhancing Pre-Trained Model Based Vulnerability Detection. **`Internetware 2024`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3671016.3671388)] [[Code](https://github.com/GCVulnerability/DFEPT)]
228 | - (07/2024) Vulnerability Classification on Source Code Using Text Mining and Deep Learning Techniques. **`QRS 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10727022)] [[Code](https://sites.google.com/view/vulnerabilityclassification/)]
229 | - (07/2024) Exploration On Prompting LLM With Code-Specific Information For Vulnerability Detection. **`SSE 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10664399)]
230 | - (07/2024) Effectiveness of ChatGPT for Static Analysis: How Far Are We?. **`AIware 2024`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3664646.3664777)] [[Code](https://zenodo.org/records/10828316)]
231 | - (07/2024) Automated Software Vulnerability Static Code Analysis Using Generative Pre-Trained Transformer Models. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2408.00197)]
232 | - (07/2024) M2CVD: Enhancing Vulnerability Understanding through Multi-Model Collaboration for Code Vulnerability Detection. **`TOSEM 2024`** [[Paper](https://arxiv.org/abs/2406.05940)] [[Code](https://github.com/HotFrom/M2CVD)]
233 | - (07/2024) SCL-CVD: Supervised Contrastive Learning for Code Vulnerability Detection via GraphCodeBERT. **`COSE 2024`** [[Paper](https://www.sciencedirect.com/science/article/pii/S0167404824002992)]
234 | - (07/2024) Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2407.16235)]
235 | - (06/2024) Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT. **`EASE 2024`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3661167.3661281)] [[Code](https://github.com/lhmtriet/LLM4Vul)]
236 | - (06/2024) Greening Large Language Models of Code. **`ICSE 2024`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3639475.3640097)] [[Code](https://github.com/soarsmu/Avatar)]
237 | - (06/2024) Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2406.05892)] [[Code](https://zenodo.org/records/11403208)]
238 | - (06/2024) Evaluating the Impact of Conventional Code Analysis Against Large Language Models in API Vulnerability Detection. **`EICC 2024`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3655693.3655701)]
239 | - (06/2024) SVulDetector: Vulnerability Detection based on Similarity using Tree-based Attention and Weighted Graph Embedding Mechanisms. **`COSE 2024`** [[Paper](https://www.sciencedirect.com/science/article/pii/S0167404824002335)] [[Code](https://figshare.com/s/426156a96a83da1d38d0)]
240 | - (05/2024) DB-CBIL: A DistilBert-Based Transformer Hybrid Model Using CNN and BiLSTM for Software Vulnerability Detection. **`IEEE Access 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10517582)]
241 | - (05/2024) LLM-CloudSec: Large Language Model Empowered Automatic and Deep Vulnerability Analysis for Intelligent Clouds. **`INFOCOM 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10620804)] [[Code](https://github.com/DPCa0/LLM-CloudSec)]
242 | - (05/2024) LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks. **`SP 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10646663/)] [[Code](https://github.com/ai4cloudops/SecLLMHolmes)]
243 | - (05/2024) VulD-CodeBERT: CodeBERT-Based Vulnerability Detection Model for C/C++ Code. **`CISCE 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10653337)]
244 | - (05/2024) Large Language Model for Vulnerability Detection: Emerging Results and Future Directions. **`ICSE 2024`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3639476.3639762)] [[Code](https://github.com/soarsmu/ChatGPT-VulDetection)]
245 | - (04/2024) VulnGPT: Enhancing Source Code Vulnerability Detection Using AutoGPT and Adaptive Supervision Strategies. **`DCOSS-IoT 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10621527)]
246 | - (04/2024) BiT5: A Bidirectional NLP Approach for Advanced Vulnerability Detection in Codebase. **`Procedia Computer Science 2024`** [[Paper](https://www.sciencedirect.com/science/article/pii/S1877050924006306)]
247 | - (04/2024) Software Vulnerability and Functionality Assessment using Large Language Models. **`ICSE 2024`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3643787.3648036)]
248 | - (04/2024) Pre-training by Predicting Program Dependencies for Vulnerability Analysis Tasks. **`ICSE 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10548173)] [[Code](https://zenodo.org/records/10140638)]
249 | - (04/2024) Towards Causal Deep Learning for Vulnerability Detection. **`ICSE 2024`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3597503.3639170)] [[Code](https://figshare.com/s/0ffda320dcb96c249ef2?file=41801019)]
250 | - (04/2024) ProRLearn: Boosting Prompt Tuning-based Vulnerability Detection by Reinforcement Learning. **`ASE 2024`** [[Paper](https://link.springer.com/article/10.1007/s10515-024-00438-9)] [[Code](https://github.com/ProRLearn/ProRLearn001)]
251 | - (04/2024) VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2404.15596)]
252 | - (03/2024) Python Source Code Vulnerability Detection with Named Entity Recognition. **`COSE 2024`** [[Paper](https://www.sciencedirect.com/science/article/pii/S0167404824001032)] [[Code](https://github.com/mmeberg/PyVulDet-NER)]
253 | - (03/2024) GRACE: Empowering LLM-based Software Vulnerability Detection with Graph Structure and In-Context Learning. **`JSS 2024`** [[Paper](https://www.sciencedirect.com/science/article/pii/S0164121224000748)] [[Code](https://github.com/P-E-Vul/GRACE)]
254 | - (03/2024) Learning Defect Prediction from Unrealistic Data. **`SANER 2024`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10589866)] [[Code](https://zenodo.org/records/10514652)]
255 | - (03/2024) Making Vulnerability Prediction more Practical: Prediction, Categorization, and Localization. **`IST 2024`** [[Paper](https://www.sciencedirect.com/science/article/pii/S0950584924000636)] [[Code](https://github.com/liucyy/VulPCL)]
256 | - (02/2024) A Preliminary Study on Using Large Language Models in Software Pentesting. **`NDSS 2024`** [[Paper](https://arxiv.org/abs/2401.17459)]
257 | - (02/2024) TRACED: Execution-aware Pre-training for Source Code. **`ICSE 2024`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3597503.3608140)] [[Code](https://github.com/ARiSE-Lab/TRACED_ICSE_24)]
258 | - (02/2024) LLbezpeky: Leveraging Large Language Models for Vulnerability Detection. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2401.01269)]
259 | - (02/2024) Chain-of-Thought Prompting of Large Language Models for Discovering and Fixing Software Vulnerabilities. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2402.17230)]
260 | - (01/2024) Your Instructions Are Not Always Helpful: Assessing the Efficacy of Instruction Fine-tuning for Software Vulnerability Detection. **`arXiv 2024`** [[Paper](https://arxiv.org/abs/2401.07466)]
261 |
262 | ### 2023
263 | - (12/2023) Joint Geometrical and Statistical Domain Adaptation for Cross-domain Code Vulnerability Detection. **`EMNLP 2023`** [[Paper](https://aclanthology.org/2023.emnlp-main.788/)]
264 | - (12/2023) ChatGPT for Vulnerability Detection, Classification, and Repair: How Far Are We?. **`APSEC 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10479409)] [[Code](https://github.com/awsm-research/ChatGPT4Vul)]
265 | - (12/2023) Code Defect Detection Method Based on BERT and Ensemble. **`ICCC 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10507306)]
266 | - (12/2023) Assessing the Effectiveness of Vulnerability Detection via Prompt Tuning: An Empirical Study. **`APSEC 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10479384)] [[Code](https://github.com/P-E-Vul/prompt-empircial-vulnerability)]
267 | - (12/2023) Enhancing Code Security Through Open-source Large Language Models: A Comparative Study. **`FPS 2023`** [[Paper](https://link.springer.com/chapter/10.1007/978-3-031-57537-2_15)]
268 | - (12/2023) Optimizing Pre-trained Language Models for Efficient Vulnerability Detection in Code Snippets. **`ICCC 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10507456)]
269 | - (12/2023) Exploring the Limits of ChatGPT in Software Security Applications. **`arXiv 2023`** [[Paper](https://arxiv.org/abs/2312.05275)]
270 | - (11/2023) How To Get Better Embeddings with Code Pre-trained Models? An Empirical Study. **`arXiv 2023`** [[Paper](https://arxiv.org/abs/2311.08066)]
271 | - (11/2023) AIBugHunter: A Practical Tool for Predicting, Classifying and Repairing Software Vulnerabilities. **`EMSE 2023`** [[Paper](https://link.springer.com/article/10.1007/s10664-023-10346-3)] [[Code](https://github.com/awsm-research/AIBugHunter)]
272 | - (11/2023) The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification. **`ESEC/FSE 2023`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3611643.3616304)] [[Code](https://zenodo.org/records/10499843)]
273 | - (11/2023) Distinguishing Look-Alike Innocent and Vulnerable Code by Subtle Semantic Representation Learning and Explanation. **`ESEC/FSE 2023`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3611643.3616358)] [[Code](https://github.com/jacknichao/SVulD)]
274 | - (11/2023) Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection. **`arXiv 2023`** [[Paper](https://arxiv.org/abs/2311.04109)] [[Code](https://figshare.com/s/4a16a528d6874aad51a0)]
275 | - (11/2023) Software Vulnerabilities Detection Based on a Pre-trained Language Model. **`TrustCom 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10538979)]
276 | - (10/2023) DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection. **`RAID 2023`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3607199.3607242)] [[Code](https://github.com/wagner-group/diversevul)]
277 | - (10/2023) PTLVD: Program Slicing and Transformer-based Line-level Vulnerability Detection System. **`SCAM 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10356694)] [[Code](https://github.com/chenshixu/PTLVD)]
278 | - (10/2023) Software Vulnerability Detection using Large Language Models. **`ISSRE 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10301302)]
279 | - (10/2023) Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation. **`arXiv 2023`** [[Paper](https://arxiv.org/abs/2310.16263)]
280 | - (09/2023) Function-Level Vulnerability Detection Through Fusing Multi-Modal Knowledge. **`ASE 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10298584)] [[Code](https://github.com/jacknichao/MVulD)]
281 | - (09/2023) DefectHunter: A Novel LLM-Driven Boosted-Conformer-based Code Vulnerability Detection Mechanism. **`arXiv 2023`** [[Paper](https://arxiv.org/abs/2309.15324)] [[Code](https://github.com/WJ-8/DefectHunter)]
282 | - (09/2023) When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection. **`ASE 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10298363)] [[Code](https://github.com/PILOT-VD-2023/PILOT)]
283 | - (08/2023) Using ChatGPT as a Static Application Security Testing Tool. **`arXiv 2023`** [[Paper](https://arxiv.org/abs/2308.14434)] [[Code](https://github.com/abakhshandeh/ChatGPTasSAST)]
284 | - (08/2023) VulExplainer: A Transformer-Based Hierarchical Distillation for Explaining Vulnerability Types. **`TSE 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10220166)] [[Code](https://github.com/awsm-research/VulExplainer)]
285 | - (08/2023) Software Vulnerability Detection with GPT and In-Context Learning. **`DSC 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10381286)]
286 | - (08/2023) Can Large Language Models Find And Fix Vulnerable Software?. **`arXiv 2023`** [[Paper](https://arxiv.org/abs/2308.10345)]
287 | - (07/2023) Leveraging Deep Learning Models for Cross-function Null Pointer Risks Detection. **`AITest 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10229470)]
288 | - (07/2023) An Unbiased Transformer Source Code Learning with Semantic Vulnerability Graph. **`EuroS&P 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10190505)] [[Code](https://github.com/pial08/SemVulDet)]
289 | - (07/2023) VulDetect: A novel technique for detecting software vulnerabilities using Language Models. **`CSR 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10224924)]
290 | - (07/2023) An Enhanced Vulnerability Detection in Software Using a Heterogeneous Encoding Ensemble. **`ISCC 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10217978)]
291 | - (06/2023) New Tricks to Old Codes: Can AI Chatbots Replace Static Code Analysis Tools?. **`EICC 2023`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3590777.3590780)] [[Code](https://github.com/New-Tricks-to-Old-Codes/Replace-Static-Analysis-Tools)]
292 | - (06/2023) Vulnerability Detection by Learning From Syntax-Based Execution Paths of Code. **`TSE 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10153647)] [[Code](https://zenodo.org/records/7123322)]
293 | - (05/2023) An Empirical Study of Deep Learning Models for Vulnerability Detection. **`ICSE 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10172583)] [[Code](https://figshare.com/articles/dataset/An_Empirical_Study_of_Deep_Learning_Models_for_Vulnerability_Detection/20791240?file=39183863)]
294 | - (05/2023) Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning?. **`arXiv 2023`** [[Paper](https://arxiv.org/abs/2306.01754)]
295 | - (05/2023) Keeping Pace with Ever-Increasing Data: Towards Continual Learning of Code Intelligence Models. **`ICSE 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10172346)] [[Code](https://github.com/ReliableCoding/REPEAT)]
296 | - (05/2023) Detecting Vulnerabilities in IoT Software: New Hybrid Model and Comprehensive Data Analysis. **`JISA 2023`** [[Paper](https://www.sciencedirect.com/science/article/pii/S2214212623000510)]
297 | - (05/2023) VulDefend: A Novel Technique based on Pattern-exploiting Training for Detecting Software Vulnerabilities Using Language Models. **`JEEIT 2023`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10185860)]
298 | - (04/2023) Evaluation of ChatGPT Model for Vulnerability Detection. **`arXiv 2023`** [[Paper](https://arxiv.org/abs/2304.07232)]
299 |
300 | ### 2022
301 | - (12/2022) BBVD: A BERT-based Method for Vulnerability Detection. **`IJACSA 2022`** [[Paper](https://www.proquest.com/docview/2770373789?pq-origsite=gscholar&fromopenview=true&sourcetype=Scholarly%20Journals)]
302 | - (12/2022) Exploring Transformers for Multi-Label Classification of Java Vulnerabilities. **`QRS 2022`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10062434)] [[Code](https://github.com/TQRG/VDET-for-Java)]
303 | - (12/2022) Transformer-Based Language Models for Software Vulnerability Detection. **`ACSAC 2022`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3564625.3567985)] [[Code](https://bitbucket.csiro.au/users/jan087/repos/acsac-2022-submission/browse)]
304 | - (12/2022) PATVD: Vulnerability Detection Based on Pre-training Techniques and Adversarial Training. **`SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta 2022`** [[Paper](https://ieeexplore.ieee.org/abstract/document/10189687/)]
305 | - (11/2022) Multi-view Pre-trained Model for Code Vulnerability Identification. **`WASA 2022`** [[Paper](https://link.springer.com/chapter/10.1007/978-3-031-19211-1_11)]
306 | - (11/2022) Distilled and Contextualized Neural Models Benchmarked for Vulnerable Function Detection. **`Mathematics 2022`** [[Paper](https://www.mdpi.com/2227-7390/10/23/4482)]
307 | - (11/2022) BERT-Based Vulnerability Type Identification with Effective Program Representation. **`WASA 2022`** [[Paper](https://link.springer.com/chapter/10.1007/978-3-031-19208-1_23#citeas)]
308 | - (10/2022) VulDeBERT: A Vulnerability Detection System Using BERT. **`ISSRE 2022`** [[Paper](https://ieeexplore.ieee.org/abstract/document/9985089)] [[Code](https://github.com/SKKU-SecLab/VulDeBERT)]
309 | - (07/2022) VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection. **`IJCNN 2022`** [[Paper](https://ieeexplore.ieee.org/abstract/document/9892280)] [[Code](https://github.com/ICL-ml4csec/VulBERTa)]
310 | - (06/2022) Cyber Security Vulnerability Detection Using Natural Language Processing. **`AIIoT 2022`** [[Paper](https://ieeexplore.ieee.org/abstract/document/9817336)]
311 | - (05/2022) LineVul: A Transformer-based Line-level Vulnerability Prediction. **`MSR 2022`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3524842.3528452)] [[Code](https://github.com/awsm-research/LineVul)]
312 | - (05/2022) LineVD: Statement-level Vulnerability Detection using Graph Neural Networks. **`MSR 2022`** [[Paper](https://dl.acm.org/doi/abs/10.1145/3524842.3527949)] [[Code](https://github.com/davidhin/linevd)]
313 | - (03/2022) Intelligent Detection of Vulnerable Functions in Software through Neural Embedding-based Code Analysis. **`IJNM 2022`** [[Paper](https://onlinelibrary.wiley.com/doi/full/10.1002/nem.2198)] [[Code](https://cybercodeintelligence.github.io/CyberCI/)]
314 | - (01/2022) Deep Neural Embedding for Software Vulnerability Discovery: Comparison and Optimization. **`Security and Communication Networks 2022`** [[Paper](https://onlinelibrary.wiley.com/doi/full/10.1155/2022/5203217)] [[Code](https://cybercodeintelligence.github.io/CyberCI/)]
315 |
316 | ### 2021
317 | - (12/2021) Automated Software Vulnerability Detection via Pre-trained Context Encoder and Self Attention. **`ICDF2C 2021`** [[Paper](https://link.springer.com/chapter/10.1007/978-3-031-06365-7_15)]
318 | - (11/2021) Detecting Integer Overflow Errors in Java Source Code via Machine Learning. **`ICTAI 2021`** [[Paper](https://ieeexplore.ieee.org/abstract/document/9643278)]
319 | - (06/2021) Unified Pre-training for Program Understanding and Generation. **`NAACL 2021`** [[Paper](https://par.nsf.gov/servlets/purl/10336701)] [[Code](https://github.com/wasiahmad/PLBART)]
320 | - (05/2021) Security Vulnerability Detection Using Deep Learning Natural Language Processing. **`INFOCOM 2021`** [[Paper](https://ieeexplore.ieee.org/abstract/document/9484500)]
321 |
322 | ### 2020
323 | - (06/2020) Exploring Software Naturalness through Neural Language Models. **`arXiv 2020`** [[Paper](https://arxiv.org/abs/2006.12641)]
324 |
325 |
326 | ## Datasets
327 |
328 | - SARD. [[Repo](https://samate.nist.gov/SARD)]
329 | - Juliet C/C++. [[Repo](https://samate.nist.gov/SARD/test-suites/112)]
330 | - Juliet Java. [[Repo](https://samate.nist.gov/SARD/test-suites/111)]
331 | - VulDeePecker. **`NDSS`** [[Paper](https://www.ndss-symposium.org/wp-content/uploads/2018/02/ndss2018_03A-2_Li_paper.pdf)] [[Repo](https://github.com/CGCL-codes/VulDeePecker)]
332 | - Draper. **`ICMLA`** [[Paper](https://ieeexplore.ieee.org/document/8614145)] [[Repo](https://osf.io/d45bw/)]
333 | - Devign. **`NeurIPS`** [[Paper](https://proceedings.neurips.cc/paper_files/paper/2019/hash/49265d2447bc3bbfe9e76306ce40a31f-Abstract.html)] [[Repo](https://github.com/epicosy/devign)]
334 | - Big-Vul. **`MSR`** [[Paper](https://dl.acm.org/doi/10.1145/3379597.3387501)] [[Repo](https://github.com/ZeoVan/MSR_20_Code_vulnerability_CSV_Dataset)]
335 | - D2A. **`ICSE-SEIP`** [[Paper](https://ieeexplore.ieee.org/document/9402126)] [[Repo](https://github.com/IBM/D2A)]
336 | - ReVeal. **`TSE`** [[Paper](https://ieeexplore.ieee.org/abstract/document/9448435)] [[Repo](https://github.com/VulDetProject/ReVeal)]
337 | - CVEfixes. **`PROMISE`** [[Paper](https://dl.acm.org/doi/10.1145/3475960.3475985)] [[Repo](https://zenodo.org/records/13118970)]
338 | - CrossVul. **`ESEC/FSE`** [[Paper](https://dl.acm.org/doi/10.1145/3468264.3473122)] [[Repo](https://zenodo.org/records/4734050)]
339 | - SecurityEval. **`MSR4P&S`** [[Paper](https://dl.acm.org/doi/10.1145/3549035.3561184)] [[Repo](https://github.com/s2e-lab/SecurityEval)]
340 | - DiverseVul. **`RAID`** [[Paper](https://dl.acm.org/doi/10.1145/3607199.3607242)] [[Repo](https://github.com/wagner-group/diversevul)]
341 | - SVEN. **`CCS`** [[Paper](https://dl.acm.org/doi/10.1145/3576915.3623175)] [[Repo](https://github.com/eth-sri/sven)]
342 | - FormAI. **`PROMISE`** [[Paper](https://dl.acm.org/doi/10.1145/3617555.3617874)] [[Repo](https://github.com/FormAI-Dataset/FormAI-dataset)]
343 | - ReposVul. **`ICSE-Companion`** [[Paper](https://dl.acm.org/doi/10.1145/3639478.3647634)] [[Repo](https://github.com/Eshe0922/ReposVul)]
344 | - PrimeVul. **`arXiv`** [[Paper](https://arxiv.org/abs/2403.18624)] [[Repo](https://github.com/DLVulDet/PrimeVul)]
345 | - PairVul. **`arXiv`** [[Paper](https://arxiv.org/abs/2406.11147)] [[Repo](https://github.com/KnowledgeRAG4LLMVulD/KnowledgeRAG4LLMVulD/tree/main/dataset)]
346 | - MegaVul. **`MSR`** [[Paper](https://dl.acm.org/doi/10.1145/3643991.3644886)] [[Repo](https://github.com/Icyrockton/MegaVul)]
347 | - CleanVul. **`arXiv`** [[Paper](https://arxiv.org/abs/2411.17274)] [[Repo](https://github.com/yikun-li/CleanVul)]
348 |
349 |
350 |
351 | ## Contribution
352 |
353 | If you want to suggest additions to the list of studies or datasets, please open a pull request or submit an issue. New entries should follow the existing list format; a template is shown below.
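
A new study entry should match the pattern already used in the lists above. The snippet below is only a template with placeholder values, not a real publication:

```markdown
- (MM/YYYY) Paper Title. **`VENUE YEAR`** [[Paper](https://link-to-paper)] [[Code](https://link-to-code)]
```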
354 |
355 |
356 | ## License
357 |
358 | - 🧠 Code & scripts (`*.py`, `*.ipynb`, etc.): Licensed under the [MIT License](LICENSE).
359 | - 📚 Taxonomy, markdown outputs and lists: Licensed under [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/).
360 |
361 | Please cite our paper if you use this resource.
362 |
--------------------------------------------------------------------------------
/analyses/study_taxonomy_analysis.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "85d7fa6a",
6 | "metadata": {},
7 | "source": [
8 | "## Insights into Taxonomy"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "5a5bf3ec",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import pandas as pd\n",
19 | "import seaborn as sns\n",
20 | "import numpy as np\n",
21 | "import matplotlib\n",
22 | "import matplotlib.pyplot as plt\n",
23 | "import matplotlib.gridspec as gridspec\n",
24 | "import matplotlib.cm as cm\n",
25 | "import plotly.colors as pc\n",
26 | "import plotly.express as px\n",
27 | "import plotly.graph_objects as go\n",
28 | "import plotly.colors as pc\n",
29 | "import plotly.io as pio\n",
30 | "from plotly.subplots import make_subplots\n",
31 | "import json\n",
32 | "import os\n",
33 | "import re\n",
34 | "import h5py\n",
35 | "pio.renderers.default = \"vscode\"\n",
36 | "\n",
37 | "from matplotlib.colors import LinearSegmentedColormap\n",
38 | "from matplotlib.patches import Rectangle\n",
39 | "from matplotlib.patches import Patch\n",
40 | "from matplotlib.lines import Line2D\n",
41 | "from matplotlib.ticker import MultipleLocator\n",
42 | "from matplotlib.ticker import AutoMinorLocator\n",
43 | "from mpl_toolkits.axes_grid1.inset_locator import inset_axes, mark_inset\n",
44 | "from collections import defaultdict\n",
45 | "from collections import Counter\n",
46 | "from pathlib import Path\n",
47 | "\n",
48 | "plt.rcParams[\"font.family\"] = \"serif\"\n",
49 | "plt.rcParams[\"font.serif\"] = [\"Times New Roman\"]\n",
50 | "plt.rcParams[\"mathtext.fontset\"] = \"dejavuserif\" \n",
51 | "\n",
52 | "sns.set_theme(style=\"white\")\n",
53 | "pd.set_option('display.max_rows', None)\n",
54 | "pd.set_option('display.max_columns', None)\n",
55 | "pd.set_option('display.width', None) # Prevents wrapping\n",
56 | "pd.set_option('display.max_colwidth', None) # Shows full content in each cell"
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": 2,
62 | "id": "daf22d40",
63 | "metadata": {},
64 | "outputs": [
    [Plotly Sankey figure (display_data output) omitted: task-formulation flows from Classification (F1) nodes — Binary (F1.1) {195}, Multi-Class (F1.2) {61}, Multi-Label (F1.3) {23}, Vulnerability-Specific (F1.1.1) {20} — to Generation (F2) nodes — Description (F2.1) {21}, Reasoning (F2.2) {40}, Report (F2.3) {23}, Classification Only {190}; rendered at 1000x500 px.]
960 | ],
961 | "source": [
962 | "# sankey task formulation\n",
963 | "# ==========================================\n",
964 | "taxonomy_task_df = pd.read_excel(\"./taxonomy.xlsx\", sheet_name=\"STUDY_TASK\")\n",
965 | "df = taxonomy_task_df[['CitationKey', 'Classification', 'Generation']].copy()\n",
966 | "\n",
967 | "df['Classification'] = df['Classification'].fillna('No Classification')\n",
968 | "df['Generation'] = df['Generation'].fillna('Classification Only') \n",
969 | "\n",
970 | "df = df.replace('None', 'No Classification')\n",
971 | "df['Generation'] = df['Generation'].replace('No Classification', 'Classification Only')\n",
972 | "df = df.replace('nan', 'No Classification')\n",
973 | "\n",
974 | "# Explode lists\n",
975 | "for col in ['Classification', 'Generation']:\n",
976 | " df[col] = df[col].astype(str).str.split(',')\n",
977 | " df = df.explode(col)\n",
978 | " df[col] = df[col].str.strip()\n",
979 | "\n",
980 | "df = df[df['Classification'] != '']\n",
981 | "df = df[df['Generation'] != '']\n",
982 | "\n",
983 | "# Calculate Weights\n",
984 | "df['Class_Count'] = df.groupby('CitationKey')['Classification'].transform('count')\n",
985 | "df['Gen_Count'] = df.groupby('CitationKey')['Generation'].transform('count')\n",
986 | "df['Weight'] = 1 / (df['Class_Count'] * df['Gen_Count'])\n",
987 | "\n",
988 | "# Create Edges\n",
989 | "edges = df.groupby(['Classification', 'Generation'])['Weight'].sum().reset_index(name='Value')\n",
990 | "edges = edges.rename(columns={'Classification': 'Source', 'Generation': 'Target'})\n",
991 | "\n",
992 | "# Define Node Properties\n",
993 | "all_labels = pd.unique(edges[['Source', 'Target']].values.ravel())\n",
994 | "nodes_df = pd.DataFrame({'Label': all_labels})\n",
995 | "nodes_df['ID'] = nodes_df.index\n",
996 | "label_to_id = dict(zip(nodes_df['Label'], nodes_df['ID']))\n",
997 | "\n",
998 | "edges['SourceID'] = edges['Source'].map(label_to_id)\n",
999 | "edges['TargetID'] = edges['Target'].map(label_to_id)\n",
1000 | "\n",
1001 | "# ==========================================\n",
1002 | "residual_labels = ['Classification Only', 'Generation Only', 'No Classification']\n",
1003 | "palette = pc.qualitative.Pastel \n",
1004 | "\n",
1005 | "node_colors = []\n",
1006 | "link_colors = []\n",
1007 | "\n",
1008 | "# Assign Node Colors\n",
1009 | "for idx, row in nodes_df.iterrows():\n",
1010 | " if row['Label'] in residual_labels:\n",
1011 | " # Keep residuals gray\n",
1012 | " node_colors.append('rgba(200, 200, 200, 0.5)') \n",
1013 | " else:\n",
1014 | " # Assign color from Pastel palette\n",
1015 | " color_idx = idx % len(palette)\n",
1016 | " node_colors.append(palette[color_idx])\n",
1017 | "\n",
1018 | "# Assign Link Colors\n",
1019 | "for idx, row in edges.iterrows():\n",
1020 | " source_label = row['Source']\n",
1021 | " target_label = row['Target']\n",
1022 | " \n",
1023 | " if source_label in residual_labels or target_label in residual_labels:\n",
1024 | " link_colors.append('rgba(220, 220, 220, 0.5)')\n",
1025 | " else:\n",
1026 | " source_id = label_to_id[source_label]\n",
1027 | " base_color = node_colors[source_id]\n",
1028 | " if base_color.startswith('#'):\n",
1029 | " h = base_color.lstrip('#')\n",
1030 | " rgb = tuple(int(h[i:i+2], 16) for i in (0, 2, 4))\n",
1031 | " link_colors.append(f'rgba({rgb[0]}, {rgb[1]}, {rgb[2]}, 0.6)')\n",
1032 | " else:\n",
1033 | " link_colors.append(base_color)\n",
1034 | "\n",
1035 | "# ==========================================\n",
1036 | "# counts\n",
1037 | "label_to_studies = defaultdict(set)\n",
1038 | "for idx, row in taxonomy_task_df.iterrows():\n",
1039 | " val_c = str(row['Classification'])\n",
1040 | " if val_c != 'None' and val_c != 'nan':\n",
1041 | " for tag in val_c.split(','):\n",
1042 | " label_to_studies[tag.strip()].add(row['CitationKey'])\n",
1043 | " else:\n",
1044 | " label_to_studies['No Classification'].add(row['CitationKey'])\n",
1045 | "\n",
1046 | " val_g = str(row['Generation'])\n",
1047 | " if val_g != 'None' and val_g != 'nan':\n",
1048 | " for tag in val_g.split(','):\n",
1049 | " label_to_studies[tag.strip()].add(row['CitationKey'])\n",
1050 | " else:\n",
1051 | " label_to_studies['Classification Only'].add(row['CitationKey'])\n",
1052 | "\n",
1053 | "nodes_df['StudyCount'] = nodes_df['Label'].map(lambda x: len(label_to_studies.get(x, set())))\n",
1054 | "\n",
1055 | "# --- Taxonomy IDs ---\n",
1056 | "taxonomy_ids = {\n",
1057 | " \"Binary\": \"F1.1\",\n",
1058 | " \"Multi-Class\": \"F1.2\",\n",
1059 | " \"Multi-Label\": \"F1.3\",\n",
1060 | " \"Vulnerability-Specific\": \"F1.1.1\",\n",
1061 | " \"Description\": \"F2.1\",\n",
1062 | " \"Reasoning\": \"F2.2\",\n",
1063 | " \"Report\": \"F2.3\"\n",
1064 | "}\n",
1065 | "\n",
1066 | "# Label Formatter with () and {}\n",
1067 | "def format_label(row):\n",
1068 | " label = row['Label']\n",
1069 | " count = row['StudyCount']\n",
1070 | " \n",
1071 | " if label == 'No Classification':\n",
1072 | " return \"\"\n",
1073 | " \n",
1074 | " tax_id = taxonomy_ids.get(label, \"\")\n",
1075 | " if tax_id:\n",
1076 | " return f\"{label} ({tax_id}) {{{count}}}\"\n",
1077 | " else:\n",
1078 | " return f\"{label} {{{count}}}\"\n",
1079 | "\n",
1080 | "nodes_df['LabelDisplay'] = nodes_df.apply(format_label, axis=1)\n",
1081 | "\n",
1082 | "# Plot\n",
1083 | "fig = go.Figure(data=[go.Sankey(\n",
1084 | " arrangement=\"snap\",\n",
1085 | " node=dict(\n",
1086 | " pad=20,\n",
1087 | " thickness=20,\n",
1088 | " line=dict(color=\"black\", width=0.5),\n",
1089 | " label=nodes_df['LabelDisplay'],\n",
1090 | " color=node_colors,\n",
1091 | " hovertemplate='%{label}
Volume: %{value:.2f}',\n",
1092 | " ),\n",
1093 | " link=dict(\n",
1094 | " source=edges['SourceID'],\n",
1095 | " target=edges['TargetID'],\n",
1096 | " value=edges['Value'],\n",
1097 | " color=link_colors\n",
1098 | " )\n",
1099 | ")])\n",
1100 | "\n",
1101 | "# ==========================================\n",
1102 | "fig.update_layout(\n",
1103 | " font=dict(\n",
1104 | " family=\"Times New Roman, serif\", \n",
1105 | " size=20, \n",
1106 | " color=\"black\"\n",
1107 | " ),\n",
1108 | " width=1000,\n",
1109 | " height=500,\n",
1110 | " margin=dict(b=60, t=40),\n",
1111 | " \n",
1112 | " annotations=[\n",
1113 | " # Left Column Label\n",
1114 | " dict(\n",
1115 | " x=0,\n",
1116 | " y=-0.1,\n",
1117 | " xref=\"paper\",\n",
1118 | " yref=\"paper\",\n",
1119 | " text=\"Classification (F1)\", # Taxonomy in ()\n",
1120 | " showarrow=False,\n",
1121 | " font=dict(size=20, color=\"black\"), \n",
1122 | " align=\"center\"\n",
1123 | " ),\n",
1124 | " # Right Column Label\n",
1125 | " dict(\n",
1126 | " x=1,\n",
1127 | " y=-0.1,\n",
1128 | " xref=\"paper\",\n",
1129 | " yref=\"paper\",\n",
1130 | " text=\"Generation (F2)\", # Taxonomy in ()\n",
1131 | " showarrow=False,\n",
1132 | " font=dict(size=20, color=\"black\"),\n",
1133 | " align=\"center\"\n",
1134 | " )\n",
1135 | " ]\n",
1136 | ")\n",
1137 | "\n",
1138 | "fig.show()"
1139 | ]
1140 | },
1141 | {
1142 | "cell_type": "code",
1143 | "execution_count": 3,
1144 | "id": "e4405f7a",
1145 | "metadata": {},
1146 | "outputs": [
    [Plotly Sankey figure (display_data output) omitted: flows from Model Scale (S1.3) nodes — Tiny (S1.3.1), Small (S1.3.2), Medium (S1.3.3), Large (S1.3.4) — via adaptation categories — Full Fine-Tuning (T2.2.2.1), Parameter-Efficient Fine-Tuning (T2.2.2.2), Prompt Engineering (T2.1), Pre-Training (T2.2.1), Feature Extraction (T1) — to specific techniques under Adaptation Technique (T2): Full-Parameter {117}, Zero-Shot {56}, CoT {38}, Feature Extraction {32}, In-Context {30}, Few-Shot {28}, Low-Rank Decomposition {27}, RAG {25}, Pre-Training {14}, Instruction-Tuning {13}, LoRA Derivates {8}, Prompt-Tuning {5}, Adapter-Tuning {2}, Selective {2}, Additive-Other {1}; rendered at 1000x600 px.]
2184 | ],
2185 | "source": [
2186 | "# sankey model & adaptation techniques\n",
2187 | "# ==========================================\n",
2188 | "df_models = pd.read_excel(\"taxonomy.xlsx\", sheet_name=\"MODELS_ESTIMATED\")\n",
2189 | "df_study_model = pd.read_excel(\"taxonomy.xlsx\", sheet_name=\"STUDY_MODEL\")\n",
2190 | "df_techniques = pd.read_excel(\"taxonomy.xlsx\", sheet_name=\"STUDY_TECHNIQUE\")\n",
2191 | "\n",
2192 | "\n",
2193 | "df_study_model['Adaptation'] = df_study_model['Adaptation'].astype(str).str.split(',')\n",
2194 | "df_study_model = df_study_model.explode('Adaptation')\n",
2195 | "df_study_model['Adaptation'] = df_study_model['Adaptation'].str.strip()\n",
2196 | "\n",
2197 | "merged_models = pd.merge(\n",
2198 | " df_study_model[['CitationKey', 'ModelKey', 'Adaptation']],\n",
2199 | " df_models[['ModelKey', 'Scale']],\n",
2200 | " on='ModelKey',\n",
2201 | " how='left'\n",
2202 | ")\n",
2203 | "\n",
2204 | "full_df = pd.merge(\n",
2205 | " merged_models,\n",
2206 | " df_techniques[['CitationKey', 'Prompt-Engineering', 'Training']],\n",
2207 | " on='CitationKey',\n",
2208 | " how='left'\n",
2209 | ")\n",
2210 | "\n",
2211 | "# ==========================================\n",
2212 | "peft_keywords = ['Low-Rank Decomposition', 'LoRA Derivates', 'Adapter-Tuning', 'Selective', 'Additive-Other', 'Prompt-Tuning', 'Instruction-Tuning']\n",
2213 | "full_keywords = ['Full-Parameter Fine-Tuning', 'Instruction-Tuning']\n",
2214 | "prompt_keywords = ['CoT', 'Few-Shot', 'RAG', 'In-Context', 'Zero-Shot']\n",
2215 | "pre_keywords = ['Pre-Training']\n",
2216 | "\n",
2217 | "def resolve_technique(row):\n",
2218 | " adaptation = str(row['Adaptation']).upper().strip()\n",
2219 | " \n",
2220 | " if adaptation == 'PROMPT':\n",
2221 | " val = str(row['Prompt-Engineering'])\n",
2222 | " if val in ['nan', 'None', '']: return [\"Unspecified Prompting\"]\n",
2223 | " tags = [x.strip() for x in val.split(',')]\n",
2224 | " valid_tags = [t for t in tags if any(k.lower() in t.lower() for k in prompt_keywords)]\n",
2225 | " return valid_tags if valid_tags else tags \n",
2226 | "\n",
2227 | " train_val = str(row['Training'])\n",
2228 | " if train_val in ['nan', 'None', '']: return [\"Unspecified Training\"]\n",
2229 | " tags = [x.strip() for x in train_val.split(',')]\n",
2230 | " relevant_techniques = []\n",
2231 | "\n",
2232 | " if adaptation == 'PEFT':\n",
2233 | " for tag in tags:\n",
2234 | " if any(k.lower() in tag.lower() for k in peft_keywords):\n",
2235 | " relevant_techniques.append(tag)\n",
2236 | " if not relevant_techniques: relevant_techniques.append(\"Other PEFT\")\n",
2237 | "\n",
2238 | " elif adaptation == 'FULL':\n",
2239 | " for tag in tags:\n",
2240 | " if any(k.lower() in tag.lower() for k in full_keywords):\n",
2241 | " relevant_techniques.append(tag)\n",
2242 | " if not relevant_techniques: relevant_techniques.append(\"Other Fine-Tuning\")\n",
2243 | " \n",
2244 | " elif adaptation == 'PRE':\n",
2245 | " for tag in tags:\n",
2246 | " if any(k.lower() in tag.lower() for k in pre_keywords):\n",
2247 | " relevant_techniques.append(tag)\n",
2248 | " if not relevant_techniques: relevant_techniques.append(\"Pre-Training\")\n",
2249 | "\n",
2250 | " elif adaptation == 'FEATURE':\n",
2251 | " return [\"Feature Extraction\"]\n",
2252 | "\n",
2253 | " return relevant_techniques\n",
2254 | "\n",
2255 | "full_df['Specific_Techniques'] = full_df.apply(resolve_technique, axis=1)\n",
2256 | "sankey_df = full_df.explode('Specific_Techniques')\n",
2257 | "sankey_df = sankey_df.dropna(subset=['Specific_Techniques']) \n",
2258 | "sankey_df = sankey_df[sankey_df['Specific_Techniques'] != \"\"] \n",
2259 | "\n",
2260 | "\n",
2261 | "# ==========================================\n",
2262 | "def get_method_category(code):\n",
2263 | " code = str(code).upper()\n",
2264 | " if code == 'PROMPT': return \"Prompt Engineering\"\n",
2265 | " if code == 'FULL': return \"Fine-Tuning\" \n",
2266 | " if code == 'PEFT': return \"Parameter-Efficient Fine-Tuning\"\n",
2267 | " if code == 'PRE': return \"Pre-Training\"\n",
2268 | " if code == 'FEATURE': return \"Feature Extraction\"\n",
2269 | " return \"Other\"\n",
2270 | "\n",
2271 | "sankey_df['Method_Category'] = sankey_df['Adaptation'].apply(get_method_category)\n",
2272 | "replace_map = {'Full-Parameter Fine-Tuning': 'Full-Parameter'}\n",
2273 | "sankey_df['Specific_Techniques'] = sankey_df['Specific_Techniques'].replace(replace_map)\n",
2274 | "sankey_df['Scale'] = sankey_df['Scale'].astype(str).str.strip().str.title()\n",
2275 | "\n",
2276 | "# Weights\n",
2277 | "sankey_df['Study_Row_Count'] = sankey_df.groupby('CitationKey')['CitationKey'].transform('count')\n",
2278 | "sankey_df['Weight'] = 1 / sankey_df['Study_Row_Count']\n",
2279 | "\n",
2280 | "# Unique Counts (Only needed for Level 2 now based on requirements)\n",
2281 | "unique_counts_lvl2 = sankey_df.groupby('Specific_Techniques')['CitationKey'].nunique()\n",
2282 | "\n",
2283 | "\n",
2284 | "# ==========================================\n",
2285 | "raw_to_display = {} \n",
2286 | "scale_ids = {\n",
2287 | " \"Tiny\": \"S1.3.1\",\n",
2288 | " \"Small\": \"S1.3.2\",\n",
2289 | " \"Medium\": \"S1.3.3\",\n",
2290 | " \"Large\": \"S1.3.4\"\n",
2291 | "}\n",
2292 | "raw_lvl0 = [\"Tiny\", \"Small\", \"Medium\", \"Large\"]\n",
2293 | "lvl0_labels = []\n",
2294 | "\n",
2295 | "for raw in raw_lvl0:\n",
2296 | " if raw in sankey_df['Scale'].unique():\n",
2297 | " tax_id = scale_ids.get(raw, \"\")\n",
2298 | " # Format: \"Tiny (S1.3.1)\"\n",
2299 | " final_label = f\"{raw} ({tax_id})\" if tax_id else raw\n",
2300 | " lvl0_labels.append(final_label)\n",
2301 | " raw_to_display[raw] = final_label\n",
2302 | "\n",
2303 | "cat_ids = {\n",
2304 | " \"Feature Extraction\": \"T1\",\n",
2305 | " \"Pre-Training\": \"T2.2.1\",\n",
2306 | " \"Prompt Engineering\": \"T2.1\",\n",
2307 | " \"Fine-Tuning\": \"T2.2.2.1\",\n",
2308 | " \"Parameter-Efficient Fine-Tuning\": \"T2.2.2.2\"\n",
2309 | "}\n",
2310 | "cat_display_names = {\n",
2311 | " \"Fine-Tuning\": \"Full Fine-Tuning\"\n",
2312 | "}\n",
2313 | "\n",
2314 | "raw_lvl1 = [\"Fine-Tuning\", \"Parameter-Efficient Fine-Tuning\", \"Prompt Engineering\", \"Pre-Training\", \"Feature Extraction\"]\n",
2315 | "lvl1_labels = []\n",
2316 | "existing_cats = sankey_df['Method_Category'].unique()\n",
2317 | "\n",
2318 | "for raw in raw_lvl1:\n",
2319 | " if raw in existing_cats:\n",
2320 | " tax_id = cat_ids.get(raw, \"\")\n",
2321 | " disp_name = cat_display_names.get(raw, raw)\n",
2322 | " # Format: \"Pre-Training (T2.2.1)\"\n",
2323 | " final_label = f\"{disp_name} ({tax_id})\" if tax_id else disp_name\n",
2324 | " lvl1_labels.append(final_label)\n",
2325 | " raw_to_display[raw] = final_label\n",
2326 | "\n",
2327 | "# Specific Techniques\n",
2328 | "# Format: \"LoRA {25}\" \n",
2329 | "raw_lvl2 = sorted(sankey_df['Specific_Techniques'].unique().tolist())\n",
2330 | "lvl2_labels = []\n",
2331 | "for raw in raw_lvl2:\n",
2332 | " count = unique_counts_lvl2.get(raw, 0)\n",
2333 | " # Using triple braces {{{ }}} to print literal braces in f-string\n",
2334 | " final_label = f\"{raw} {{{count}}}\"\n",
2335 | " lvl2_labels.append(final_label)\n",
2336 | " raw_to_display[raw] = final_label\n",
2337 | "\n",
2338 | "# Combine all\n",
2339 | "all_labels = lvl0_labels + lvl1_labels + lvl2_labels\n",
2340 | "label_map = {label: i for i, label in enumerate(all_labels)}\n",
2341 | "\n",
2342 | "\n",
2343 | "# ==========================================\n",
2344 | "palette = pc.qualitative.Pastel\n",
2345 | "grey_color = 'lightgrey'\n",
2346 | "grey_link = 'rgba(200, 200, 200, 0.4)'\n",
2347 | "grey_cats = ['Pre-Training', 'Feature Extraction', 'Other']\n",
2348 | "\n",
2349 | "color_map = {}\n",
2350 | "palette_idx = 0\n",
2351 | "\n",
2352 | "# A. Scales\n",
2353 | "for raw_name in raw_lvl0:\n",
2354 | " if raw_name in raw_to_display:\n",
2355 | " color_map[raw_name] = palette[palette_idx % len(palette)]\n",
2356 | " palette_idx += 1\n",
2357 | "\n",
2358 | "# B. Categories\n",
2359 | "for raw_name in raw_lvl1:\n",
2360 | " if raw_name in raw_to_display:\n",
2361 | " if raw_name in grey_cats:\n",
2362 | " color_map[raw_name] = grey_color\n",
2363 | " else:\n",
2364 | " color_map[raw_name] = palette[palette_idx % len(palette)]\n",
2365 | " palette_idx += 1\n",
2366 | "\n",
2367 | "def hex_to_rgba(hex_code, opacity=0.4):\n",
2368 | " if hex_code == 'lightgrey': return grey_link\n",
2369 | " if hex_code.startswith('rgb'): return hex_code.replace(')', f', {opacity})').replace('rgb', 'rgba')\n",
2370 | " h = hex_code.lstrip('#')\n",
2371 | " rgb = tuple(int(h[i:i+2], 16) for i in (0, 2, 4))\n",
2372 | " return f\"rgba({rgb[0]}, {rgb[1]}, {rgb[2]}, {opacity})\"\n",
2373 | "\n",
2374 | "\n",
2375 | "# ==========================================\n",
2376 | "source = []\n",
2377 | "target = []\n",
2378 | "value = []\n",
2379 | "colors = []\n",
2380 | "\n",
2381 | "# --- Flow 1: Scale -> Category ---\n",
2382 | "flow1 = sankey_df.groupby(['Scale', 'Method_Category'])['Weight'].sum().reset_index()\n",
2383 | "\n",
2384 | "for _, row in flow1.iterrows():\n",
2385 | " scale_raw = row['Scale']\n",
2386 | " cat_raw = row['Method_Category']\n",
2387 | " \n",
2388 | " src = raw_to_display.get(scale_raw)\n",
2389 | " tgt = raw_to_display.get(cat_raw)\n",
2390 | " \n",
2391 | " if src in label_map and tgt in label_map:\n",
2392 | " source.append(label_map[src])\n",
2393 | " target.append(label_map[tgt])\n",
2394 | " value.append(row['Weight'])\n",
2395 | " \n",
2396 | " # Color based on Scale raw name\n",
2397 | " base_color = color_map.get(scale_raw, grey_color)\n",
2398 | " colors.append(hex_to_rgba(base_color))\n",
2399 | "\n",
2400 | "# --- Flow 2: Category -> Specific ---\n",
2401 | "flow2 = sankey_df.groupby(['Method_Category', 'Specific_Techniques'])['Weight'].sum().reset_index()\n",
2402 | "\n",
2403 | "for _, row in flow2.iterrows():\n",
2404 | " cat_raw = row['Method_Category']\n",
2405 | " tech_raw = row['Specific_Techniques']\n",
2406 | " \n",
2407 | " src = raw_to_display.get(cat_raw)\n",
2408 | " tgt = raw_to_display.get(tech_raw)\n",
2409 | " \n",
2410 | " if src in label_map and tgt in label_map:\n",
2411 | " source.append(label_map[src])\n",
2412 | " target.append(label_map[tgt])\n",
2413 | " value.append(row['Weight'])\n",
2414 | " \n",
2415 | " # Color based on Category raw name\n",
2416 | " base_color = color_map.get(cat_raw, grey_color)\n",
2417 | " colors.append(hex_to_rgba(base_color))\n",
2418 | "\n",
2419 | "\n",
2420 | "# ==========================================\n",
2421 | "node_colors = []\n",
2422 | "# Map specific technique to its parent category raw name\n",
2423 | "tech_to_cat = pd.Series(sankey_df.Method_Category.values, index=sankey_df.Specific_Techniques).to_dict()\n",
2424 | "\n",
2425 | "for l in all_labels:\n",
2426 | " final_color = grey_color\n",
2427 | " \n",
2428 | " # Reverse lookup from raw_to_display\n",
2429 | " raw_key = None\n",
2430 | " for k, v in raw_to_display.items():\n",
2431 | " if v == l:\n",
2432 | " raw_key = k\n",
2433 | " break\n",
2434 | " \n",
2435 | " if raw_key:\n",
2436 | " # Case A: Scale or Category\n",
2437 | " if raw_key in color_map:\n",
2438 | " final_color = color_map[raw_key]\n",
2439 | " # Case B: Specific Technique (Inherit)\n",
2440 | " elif raw_key in tech_to_cat:\n",
2441 | " parent_raw = tech_to_cat[raw_key]\n",
2442 | " final_color = color_map.get(parent_raw, grey_color)\n",
2443 | " \n",
2444 | " node_colors.append(final_color)\n",
2445 | "\n",
2446 | "# ==========================================\n",
2447 | "fig = go.Figure(data=[go.Sankey(\n",
2448 | " arrangement=\"snap\",\n",
2449 | " node=dict(\n",
2450 | " pad=15, thickness=20,\n",
2451 | " line=dict(color=\"black\", width=0.5),\n",
2452 | " label=all_labels,\n",
2453 | " color=node_colors,\n",
2454 | " hovertemplate='%{label}
Weighted Volume: %{value:.2f}'\n",
2455 | " ),\n",
2456 | " link=dict(\n",
2457 | " source=source, target=target, value=value, color=colors\n",
2458 | " )\n",
2459 | ")])\n",
2460 | "\n",
2461 | "fig.update_layout(\n",
2462 | " # Global Font Settings (mimics plt.rcParams[\"font.family\"] = \"serif\")\n",
2463 | " font=dict(\n",
2464 | " family=\"Times New Roman, serif\", \n",
2465 | " size=17, \n",
2466 | " color=\"black\"\n",
2467 | " ),\n",
2468 | " width=1000,\n",
2469 | " height=600,\n",
2470 | " margin=dict(b=60, t=40),\n",
2471 | " \n",
2472 | " annotations=[\n",
2473 | " # Left Column Label\n",
2474 | " dict(\n",
2475 | " x=0,\n",
2476 | " y=-0.1,\n",
2477 | " xref=\"paper\",\n",
2478 | " yref=\"paper\",\n",
2479 | " text=\"Model Scale (S1.3)\",\n",
2480 | " showarrow=False,\n",
2481 | " font=dict(size=20, color=\"black\"), \n",
2482 | " align=\"center\"\n",
2483 | " ),\n",
2484 | " # Right Column Label\n",
2485 | " dict(\n",
2486 | " x=1,\n",
2487 | " y=-0.1,\n",
2488 | " xref=\"paper\",\n",
2489 | " yref=\"paper\",\n",
2490 | " text=\"Adaptation Technique (T2)\",\n",
2491 | " showarrow=False,\n",
2492 | " font=dict(size=20, color=\"black\"),\n",
2493 | " align=\"center\"\n",
2494 | " )\n",
2495 | " ]\n",
2496 | ")\n",
2497 | "\n",
2498 | "fig.show()"
2499 | ]
2500 | }
2501 | ],
2502 | "metadata": {
2503 | "kernelspec": {
2504 | "display_name": "vpn_seg",
2505 | "language": "python",
2506 | "name": "python3"
2507 | },
2508 | "language_info": {
2509 | "codemirror_mode": {
2510 | "name": "ipython",
2511 | "version": 3
2512 | },
2513 | "file_extension": ".py",
2514 | "mimetype": "text/x-python",
2515 | "name": "python",
2516 | "nbconvert_exporter": "python",
2517 | "pygments_lexer": "ipython3",
2518 | "version": "3.10.16"
2519 | }
2520 | },
2521 | "nbformat": 4,
2522 | "nbformat_minor": 5
2523 | }
2524 |
--------------------------------------------------------------------------------