├── .gitattributes
├── images
│   ├── txsviz.jpg
│   ├── addrstats.jpg
│   ├── txsstats.jpg
│   ├── actorvizaddr.jpg
│   ├── actorvizaddrtx.jpg
│   ├── classification.jpg
│   ├── txscaseanalysis.jpg
│   ├── txsfeatureanalysis.jpg
│   └── actorsfeatureanalysis.jpg
├── Actors Dataset
│   ├── AddrAddr_edgelist.csv
│   ├── AddrTx_edgelist.csv
│   ├── TxAddr_edgelist.csv
│   ├── wallets_classes.csv
│   ├── wallets_features.csv
│   ├── wallets_features_classes_combined.csv
│   └── README.md
├── Transactions Dataset
│   ├── txs_classes.csv
│   ├── txs_edgelist.csv
│   ├── txs_features.csv
│   ├── README.md
│   └── Elliptic++_Transactions_Case_Analysis.ipynb
└── README.md
/.gitattributes:
--------------------------------------------------------------------------------
1 | *.csv filter=lfs diff=lfs merge=lfs -text
2 |
--------------------------------------------------------------------------------
/images/txsviz.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/git-disl/EllipticPlusPlus/HEAD/images/txsviz.jpg
--------------------------------------------------------------------------------
/images/addrstats.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/git-disl/EllipticPlusPlus/HEAD/images/addrstats.jpg
--------------------------------------------------------------------------------
/images/txsstats.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/git-disl/EllipticPlusPlus/HEAD/images/txsstats.jpg
--------------------------------------------------------------------------------
/images/actorvizaddr.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/git-disl/EllipticPlusPlus/HEAD/images/actorvizaddr.jpg
--------------------------------------------------------------------------------
/images/actorvizaddrtx.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/git-disl/EllipticPlusPlus/HEAD/images/actorvizaddrtx.jpg
--------------------------------------------------------------------------------
/images/classification.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/git-disl/EllipticPlusPlus/HEAD/images/classification.jpg
--------------------------------------------------------------------------------
/images/txscaseanalysis.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/git-disl/EllipticPlusPlus/HEAD/images/txscaseanalysis.jpg
--------------------------------------------------------------------------------
/images/txsfeatureanalysis.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/git-disl/EllipticPlusPlus/HEAD/images/txsfeatureanalysis.jpg
--------------------------------------------------------------------------------
/images/actorsfeatureanalysis.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/git-disl/EllipticPlusPlus/HEAD/images/actorsfeatureanalysis.jpg
--------------------------------------------------------------------------------
/Actors Dataset/AddrAddr_edgelist.csv:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:ffba894458e262a691e5e4d006f5dc1d0e069fabfe828f443fd157bf7f8393f2
3 | size 200631481
4 |
--------------------------------------------------------------------------------
/Actors Dataset/AddrTx_edgelist.csv:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:f5f903f752387f66a1bccaeff54e293b2e8470fcddf5eb56b88aa06fd23a8f3b
3 | size 21248388
4 |
--------------------------------------------------------------------------------
/Actors Dataset/TxAddr_edgelist.csv:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:9f5afbdde7bc3d91fb7a4655be55799d6504cd0063ae55a0753a5a41189932b8
3 | size 36702878
4 |
--------------------------------------------------------------------------------
/Actors Dataset/wallets_classes.csv:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:4e5132c99f941666bf1fefd4100a1428d339c9252ec6987909e1adf8eac902f9
3 | size 30421134
4 |
--------------------------------------------------------------------------------
/Actors Dataset/wallets_features.csv:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:317daca2810c355ddfdb8c0dab34cf11d1aa90567fe975090c7e5a901386eb77
3 | size 606463522
4 |
--------------------------------------------------------------------------------
/Transactions Dataset/txs_classes.csv:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:013a11742969071a906878ded0319571df0657f9b7133e5c6cdb36217bf0d240
3 | size 2361914
4 |
--------------------------------------------------------------------------------
/Transactions Dataset/txs_edgelist.csv:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:a35053ba68a98e4382cae2ba65b9d9e36b23b6439e02dff084971b1b72a5156e
3 | size 4470584
4 |
--------------------------------------------------------------------------------
/Transactions Dataset/txs_features.csv:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:2db326ec8ddb68f1d810c1834e1ff62e0a8300378f0984a1e3b2ca82a439821b
3 | size 694789588
4 |
--------------------------------------------------------------------------------
/Actors Dataset/wallets_features_classes_combined.csv:
--------------------------------------------------------------------------------
1 | version https://git-lfs.github.com/spec/v1
2 | oid sha256:99bf27f7b76d6578ad59e0a61ec225ecd656fef6ae6958f29435ab286be2cc7d
3 | size 609000048
4 |
--------------------------------------------------------------------------------
/Transactions Dataset/README.md:
--------------------------------------------------------------------------------
1 | # Elliptic++ Transactions Dataset: A Graph Network of Bitcoin Blockchain Transactions
2 |
3 | The Elliptic++ transactions dataset consists of 203k Bitcoin transactions to enable the detection of fraudulent transactions in the Bitcoin network by leveraging graph data.
4 |
5 | If you have any questions or create something with this dataset, please let us know by email: [yelmougy3@gatech.edu](mailto:yelmougy3@gatech.edu).
6 |
7 | **DATASET CAN BE FOUND HERE: [Google Drive](https://drive.google.com/drive/folders/1MRPXz79Lu_JGLlJ21MDfML44dKN9R08l?usp=sharing)**
8 |
9 | ## Dataset Summary
10 |
11 | | | |
12 | |---|---|
13 | | # Nodes (transactions) | 203,769 |
14 | | # Edges (money flow) | 234,355 |
15 | | # Time steps | 49 |
16 | | # Illicit (class-1) | 4,545 |
17 | | # Licit (class-2) | 42,019 |
18 | | # Unknown (class-3) | 157,205 |
19 | | # Features | 183 |
20 |
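The three class counts in the table add up to the node count, and the labeled portion is heavily skewed toward licit transactions. The sketch below only redoes that arithmetic from the figures in the table; the variable names are illustrative and not part of the dataset.

```python
# Class counts copied from the Dataset Summary table above.
counts = {"illicit": 4_545, "licit": 42_019, "unknown": 157_205}

total = sum(counts.values())                    # expected to match the node count
labeled = counts["illicit"] + counts["licit"]   # transactions with a known label

# Share of each class among all transactions.
shares = {name: n / total for name, n in counts.items()}

# Share of illicit transactions among labeled transactions only.
illicit_among_labeled = counts["illicit"] / labeled

print(f"total nodes: {total}")
print(f"labeled nodes: {labeled}")
for name, share in shares.items():
    print(f"{name}: {share:.2%} of all transactions")
print(f"illicit among labeled: {illicit_among_labeled:.2%}")
```

Roughly one labeled transaction in ten is illicit, which is worth keeping in mind when choosing evaluation metrics for the classification tutorials.
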
23 | ## Dataset Tutorials
24 |
25 | We are sharing tutorial notebooks for users and researchers to explore, study, and learn from. The tutorial notebooks cover dataset statistics, graph visualization, model training and classification, case analysis, and feature refinement.
26 |
27 | [`Transactions dataset statistics`](Elliptic++_Transactions_Dataset_Statistics.ipynb) : overall transactions data statistics.
28 |
29 |
30 |
31 |
32 | [`Transactions graph visualization`](Elliptic++_Transactions_Graph_Visualization.ipynb) : visualizations of the Money Flow Transaction graph (tx-tx graph).
33 |
34 |
35 |
36 |
37 | [`Transactions classification`](Elliptic++_Transactions_Classification.ipynb) : model training and classification on the transactions data.
38 |
39 |
40 |
41 |
42 | [`Transactions case analysis`](Elliptic++_Transactions_Case_Analysis.ipynb) : unique case (EASY, HARD, AVERAGE) analysis using the transactions data.
43 |
44 |
45 |
46 |
47 | [`Transactions feature analysis`](Elliptic++_Transactions_Feature_Analysis.ipynb) : feature importance analysis of the transactions data.
48 |
49 |
50 |
51 |
52 |
53 | ## Transactions Dataset Organization
54 |
55 | .
56 | ├── txs_features.csv # Feature data for all transactions
57 | ├── txs_classes.csv # Class data for all transactions
59 | ├── txs_edgelist.csv # Transaction-Transaction graph edgelist
60 | ├── Elliptic++_Transactions_Dataset_Statistics.ipynb # Tutorial notebook: dataset statistics
61 | ├── Elliptic++_Transactions_Graph_Visualization.ipynb # Tutorial notebook: transaction-transaction graph visualization
62 | ├── Elliptic++_Transactions_Classification.ipynb # Tutorial notebook: model training and classification
63 | ├── Elliptic++_Transactions_Case_Analysis.ipynb # Tutorial notebook: unique case (EASY, HARD, AVERAGE) analysis
64 | ├── Elliptic++_Transactions_Feature_Analysis.ipynb # Tutorial notebook: feature importance analysis
65 | └── README.md
66 |
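As a rough illustration of how the class file and the edgelist fit together, the sketch below reads both with the standard library and counts, for each transaction, how many of its direct successors are labeled illicit. The column names (`txId`, `class`, `txId1`, `txId2`) and the use of `"1"` as the illicit label are assumptions that should be checked against the CSV headers; the function names are hypothetical.

```python
import csv
from collections import defaultdict


def load_classes(path, id_col="txId", class_col="class"):
    """Map each transaction ID to its class label (both kept as strings)."""
    with open(path, newline="") as f:
        return {row[id_col]: row[class_col] for row in csv.DictReader(f)}


def load_edges(path, src_col="txId1", dst_col="txId2"):
    """Build a directed adjacency map: source transaction -> list of targets."""
    adjacency = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            adjacency[row[src_col]].append(row[dst_col])
    return adjacency


def illicit_successors(adjacency, classes, illicit_label="1"):
    """For each source node, count direct successors labeled illicit."""
    return {
        src: sum(1 for dst in dsts if classes.get(dst) == illicit_label)
        for src, dsts in adjacency.items()
    }
```

Once the CSV files have been downloaded, a call such as `illicit_successors(load_edges("txs_edgelist.csv"), load_classes("txs_classes.csv"))` would produce the per-transaction counts, provided the assumed column names match.
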
69 | # Citation
70 |
71 | If you use our dataset in your work, please cite [our paper](https://arxiv.org/pdf/2306.06108.pdf). (Pending publication in ACM SIGKDD '23 conference proceedings)
72 |
73 | > Youssef Elmougy and Ling Liu. 2023. Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23), August 6–10, 2023, Long Beach, CA, USA. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3580305.3599803
74 |
75 | For a longer version of the paper, please refer to the [arXiv version](https://arxiv.org/pdf/2306.06108.pdf).
76 |
77 | ```
78 | @article{elmougy2023demystifying,
79 | title={Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics},
80 | author={Elmougy, Youssef and Liu, Ling},
81 | journal={arXiv preprint arXiv:2306.06108},
82 | year={2023}
83 | }
84 | ```
85 |
86 | # Acknowledgement
87 |
88 | Released by: [Youssef Elmougy](https://www.yelmougy.com), [Ling Liu](https://www.cc.gatech.edu/home/lingliu/)
89 |
90 | School of Computer Science, Georgia Institute of Technology
91 |
92 |
93 | If you have any questions or create something with this dataset, please let us know by email: [yelmougy3@gatech.edu](mailto:yelmougy3@gatech.edu).
94 |
--------------------------------------------------------------------------------
/Actors Dataset/README.md:
--------------------------------------------------------------------------------
1 | # Elliptic++ Actors (Wallet Addresses) Dataset: A Graph Network of Bitcoin Blockchain Wallet Addresses
2 |
3 | The Elliptic++ actors dataset consists of 822k wallet addresses to enable the detection of illicit addresses (actors) in the Bitcoin network by leveraging graph data.
4 |
5 | If you have any questions or create something with this dataset, please let us know by email: [yelmougy3@gatech.edu](mailto:yelmougy3@gatech.edu).
6 |
7 | **DATASET CAN BE FOUND HERE: [Google Drive](https://drive.google.com/drive/folders/1MRPXz79Lu_JGLlJ21MDfML44dKN9R08l?usp=sharing)**
8 |
9 | ## Dataset Summary
10 |
11 | | | |
12 | |---|---|
13 | | # Wallet addresses | 822,942 |
14 | | # Nodes (temporal interactions) | 1,268,260 |
15 | | # Edges (addr-addr) | 2,868,964 |
16 | | # Edges (addr-tx-addr) | 1,314,241 |
17 | | # Time steps | 49 |
18 | | # Illicit (class-1) | 14,266 |
19 | | # Licit (class-2) | 251,088 |
20 | | # Unknown (class-3) | 557,588 |
21 | | # Features | 56 |
22 |
25 | ## Dataset Tutorials
26 |
27 | We are sharing tutorial notebooks for users and researchers to explore, study, and learn from. The tutorial notebooks cover dataset statistics, graph visualization, model training and classification, and feature refinement.
28 |
29 | [`Actors dataset statistics`](Elliptic++_Actors_Dataset_Statistics.ipynb) : overall actors data statistics.
30 |
31 |
32 |
33 |
34 | [`Actors graph visualization (Actor Interaction)`](Elliptic++_Actors_ActorInteraction_Graph_Viz.ipynb) : visualizations of the Actor Interaction graph (addr-addr graph).
35 |
36 |
37 |
38 |
39 | [`Actors graph visualization (Address-Transaction)`](Elliptic++_Actors_AddrTx_Graph_Viz.ipynb) : visualizations of the Address-Transaction graph (addr-tx-addr graph).
40 |
41 |
42 |
43 |
44 | [`Actors classification`](Elliptic++_Actors_Classification.ipynb) : model training and classification on the actors data.
45 |
46 |
47 |
48 |
49 | [`Actors feature analysis`](Elliptic++_Actors_Feature_Analysis.ipynb) : feature importance analysis of the actors data.
50 |
51 |
52 |
53 |
54 |
55 | ## Actors Dataset Organization
56 |
57 | .
58 | ├── wallets_features.csv # Feature data for all actors
60 | ├── wallets_classes.csv # Class data for all actors
61 | ├── AddrAddr_edgelist.csv # Address-Address graph edgelist
62 | ├── AddrTx_edgelist.csv # Address-Transaction graph edgelist
63 | ├── TxAddr_edgelist.csv # Transaction-Address graph edgelist
64 | ├── Elliptic++_Actors_Dataset_Statistics.ipynb # Tutorial notebook: dataset statistics
65 | ├── Elliptic++_Actors_ActorInteraction_Graph_Viz.ipynb # Tutorial notebook: address-address graph visualization
66 | ├── Elliptic++_Actors_AddrTx_Graph_Viz.ipynb # Tutorial notebook: address-transaction-address graph visualization
67 | ├── Elliptic++_Actors_Classification.ipynb # Tutorial notebook: model training and classification
68 | ├── Elliptic++_Actors_Feature_Analysis.ipynb # Tutorial notebook: feature importance analysis
69 | └── README.md
70 |
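The address-transaction and transaction-address edgelists share transaction IDs, so they can be joined into address-to-address pairs. The sketch below shows one way to do that join with the standard library. The column names (`input_address`, `txId`, `output_address`) and both function names are assumptions, and nothing here claims that the result reproduces `AddrAddr_edgelist.csv` exactly.

```python
import csv
from collections import defaultdict


def read_pairs(path, left_col, right_col):
    """Read two columns of a CSV file as a list of (left, right) tuples."""
    with open(path, newline="") as f:
        return [(row[left_col], row[right_col]) for row in csv.DictReader(f)]


def compose_addr_addr(addr_tx, tx_addr):
    """Join (input_address, tx) with (tx, output_address) on the shared tx."""
    outputs_by_tx = defaultdict(list)
    for tx, out_addr in tx_addr:
        outputs_by_tx[tx].append(out_addr)

    pairs = set()  # a set drops repeated address pairs
    for in_addr, tx in addr_tx:
        for out_addr in outputs_by_tx.get(tx, []):
            pairs.add((in_addr, out_addr))
    return pairs
```

With the assumed headers, `read_pairs("AddrTx_edgelist.csv", "input_address", "txId")` and `read_pairs("TxAddr_edgelist.csv", "txId", "output_address")` would supply the two inputs. The full files are large, so sampling a single time step first may be more practical.
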
73 | # Citation
74 |
75 | If you use our dataset in your work, please cite [our paper](https://arxiv.org/pdf/2306.06108.pdf). (Pending publication in ACM SIGKDD '23 conference proceedings)
76 |
77 | > Youssef Elmougy and Ling Liu. 2023. Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23), August 6–10, 2023, Long Beach, CA, USA. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3580305.3599803
78 |
79 | For a longer version of the paper, please refer to the [arXiv version](https://arxiv.org/pdf/2306.06108.pdf).
80 |
81 | ```
82 | @article{elmougy2023demystifying,
83 | title={Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics},
84 | author={Elmougy, Youssef and Liu, Ling},
85 | journal={arXiv preprint arXiv:2306.06108},
86 | year={2023}
87 | }
88 | ```
89 |
90 | # Acknowledgement
91 |
92 | Released by: [Youssef Elmougy](https://www.yelmougy.com), [Ling Liu](https://www.cc.gatech.edu/home/lingliu/)
93 |
94 | School of Computer Science, Georgia Institute of Technology
95 |
96 |
97 | If you have any questions or create something with this dataset, please let us know by email: [yelmougy3@gatech.edu](mailto:yelmougy3@gatech.edu).
98 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Elliptic++ Dataset: A Graph Network of Bitcoin Blockchain Transactions and Wallet Addresses
2 |
3 | The Elliptic++ dataset consists of 203k Bitcoin transactions and 822k wallet addresses to enable both the detection of fraudulent transactions and the detection of illicit addresses (actors) in the Bitcoin network by leveraging graph data.
4 |
5 | If you have any questions or create something with this dataset, please let us know by email: [yelmougy3@gatech.edu](mailto:yelmougy3@gatech.edu).
6 |
7 | **DATASET CAN BE FOUND HERE: [Google Drive](https://drive.google.com/drive/folders/1MRPXz79Lu_JGLlJ21MDfML44dKN9R08l?usp=sharing)**
8 |
9 | ## Dataset Summary
10 |
11 | The Elliptic++ dataset contains a transactions dataset and an actors (wallet addresses) dataset.
12 |
13 | Elliptic++ Transactions Dataset:
14 |
15 | | | |
16 | |---|---|
17 | | # Nodes (transactions) | 203,769 |
18 | | # Edges (money flow) | 234,355 |
19 | | # Time steps | 49 |
20 | | # Illicit (class-1) | 4,545 |
21 | | # Licit (class-2) | 42,019 |
22 | | # Unknown (class-3) | 157,205 |
23 | | # Features | 183 |
24 |
25 | Elliptic++ Actors (Wallet Addresses) Dataset:
26 |
27 | | | |
28 | |---|---|
29 | | # Wallet addresses | 822,942 |
30 | | # Nodes (temporal interactions) | 1,268,260 |
31 | | # Edges (addr-addr) | 2,868,964 |
32 | | # Edges (addr-tx-addr) | 1,314,241 |
33 | | # Time steps | 49 |
34 | | # Illicit (class-1) | 14,266 |
35 | | # Licit (class-2) | 251,088 |
36 | | # Unknown (class-3) | 557,588 |
37 | | # Features | 56 |
38 |
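In both tables the labeled classes are strongly imbalanced, which matters for the classification tutorials. The sketch below computes inverse-frequency class weights from the table figures using the common formula `n_labeled / (n_classes * n_class)`. This is a generic weighting scheme offered as an assumption; it is not stated to be what the tutorial notebooks use, and `balanced_weights` is a hypothetical helper.

```python
def balanced_weights(labeled_counts):
    """Inverse-frequency weights: n_labeled / (n_classes * n_class)."""
    n_labeled = sum(labeled_counts.values())
    n_classes = len(labeled_counts)
    return {
        name: n_labeled / (n_classes * n)
        for name, n in labeled_counts.items()
    }


# Labeled class counts copied from the two summary tables above.
transactions = {"illicit": 4_545, "licit": 42_019}
actors = {"illicit": 14_266, "licit": 251_088}

tx_weights = balanced_weights(transactions)
actor_weights = balanced_weights(actors)

print("transactions:", tx_weights)
print("actors:", actor_weights)
```

The illicit class receives the larger weight in both datasets, and the gap is wider for actors than for transactions.
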
41 | ## Dataset Tutorials
42 |
43 | We are sharing tutorial notebooks for users and researchers to explore, study, and learn from. The tutorial notebooks are available for both datasets and cover dataset statistics, graph visualization, model training and classification, case analysis, and feature refinement.
44 |
45 | [`Transactions dataset statistics`](Transactions%20Dataset/Elliptic++_Transactions_Dataset_Statistics.ipynb) : overall transactions data statistics.
46 |
54 |
55 | [`Transactions graph visualization`](Transactions%20Dataset/Elliptic++_Transactions_Graph_Visualization.ipynb) : visualizations of the Money Flow Transaction graph (tx-tx graph).
56 |
57 |
58 |
59 |
60 | [`Actors graph visualization (Actor Interaction)`](Actors%20Dataset/Elliptic++_Actors_ActorInteraction_Graph_Viz.ipynb) : visualizations of the Actor Interaction graph (addr-addr graph).
61 |
62 |
63 |
64 |
65 | [`Actors graph visualization (Address-Transaction)`](Actors%20Dataset/Elliptic++_Actors_AddrTx_Graph_Viz.ipynb) : visualizations of the Address-Transaction graph (addr-tx-addr graph).
66 |
67 |
68 |
69 |
70 | [`Transactions classification`](Transactions%20Dataset/Elliptic++_Transactions_Classification.ipynb) : model training and classification on the transactions data.
71 |
72 |
73 |
74 |
75 | [`Actors classification`](Actors%20Dataset/Elliptic++_Actors_Classification.ipynb) : model training and classification on the actors data.
76 |
77 |
78 |
79 |
80 |
81 | [`Transactions case analysis`](Transactions%20Dataset/Elliptic++_Transactions_Case_Analysis.ipynb) : unique case (EASY, HARD, AVERAGE) analysis using the transactions data.
82 |
83 |
84 |
85 |
86 |
87 | [`Transactions feature analysis`](Transactions%20Dataset/Elliptic++_Transactions_Feature_Analysis.ipynb) : feature importance analysis of the transactions data.
88 |
89 |
90 |
91 |
92 | [`Actors feature analysis`](Actors%20Dataset/Elliptic++_Actors_Feature_Analysis.ipynb) : feature importance analysis of the actors data.
93 |