├── README.md
├── test.pkl
├── train.pkl
└── valid.pkl

/README.md:
--------------------------------------------------------------------------------
# CodeCMR

CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching (NeurIPS-2020)

## Dependencies

- pandas=0.25.1
- networkx=2.3

## Dataset description

Trainset: 30,000
Validset: 10,000
Testset: 10,000

Each dataset has 33 columns: the first column is the source code, and the other 32 columns are the corresponding binary code for each combination of compiler (gcc/clang), platform (x86/x64/arm/arm64), and optimization level (O0/O1/O2/O3).
Please first download the data from [Google Drive](https://drive.google.com/file/d/1Ep74iF6oidV3zmDrAzi1Q4RIwKugpzG4/view?usp=sharing) and uncompress it:
```
7z x all-arch-nx.zip
```

## How to load data

    import pandas as pd
    import networkx as nx

    df = pd.read_pickle('test.pkl')
    print(df.columns)  # 33 columns

    sample = df.iloc[0]
    # 'c_label' holds the source code; 'gcc-x64-O0' is one of the 32 binary columns
    src, binary = sample['c_label'], sample['gcc-x64-O0']

    print(src)  # character-level source code
    # nx.read_gpickle was removed in networkx 3.0; use the pinned networkx 2.3
    g = nx.read_gpickle(binary)
    print(g.graph)  # binary code literal features, we only use c_int and c_str
    print(g.nodes.data('feat'))  # binary code CFG features

--------------------------------------------------------------------------------
/test.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binaryai/CodeCMR/32e39c3f6fd85c0e5ca8810888d0045a15fa2b34/test.pkl

--------------------------------------------------------------------------------
/train.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binaryai/CodeCMR/32e39c3f6fd85c0e5ca8810888d0045a15fa2b34/train.pkl

--------------------------------------------------------------------------------
/valid.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binaryai/CodeCMR/32e39c3f6fd85c0e5ca8810888d0045a15fa2b34/valid.pkl

--------------------------------------------------------------------------------
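
The README describes 32 binary columns built from 2 compilers, 4 platforms, and 4 optimization levels. The following is a minimal sketch of how those combinations could be enumerated. It assumes every column follows the `compiler-platform-optimization` pattern of `gcc-x64-O0`, which is the only column name the README confirms; the helper name `binary_column_names` is hypothetical.

```python
from itertools import product

# Values listed in the README's dataset description.
COMPILERS = ["gcc", "clang"]
PLATFORMS = ["x86", "x64", "arm", "arm64"]
OPT_LEVELS = ["O0", "O1", "O2", "O3"]


def binary_column_names():
    """Return candidate binary column names.

    Assumption: all columns are named like 'gcc-x64-O0'. Only that one
    name appears in the README, so verify against df.columns.
    """
    return [
        f"{compiler}-{platform}-{opt}"
        for compiler, platform, opt in product(COMPILERS, PLATFORMS, OPT_LEVELS)
    ]


columns = binary_column_names()
print(len(columns))  # 2 compilers * 4 platforms * 4 optimization levels
```

Before relying on these names, it is worth comparing them with the loaded frame, for example `set(columns) - set(df.columns)`, since the actual naming may differ from this assumed pattern.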