├── .gitignore
├── LICENSE
├── README.md
├── bench
│   ├── dlrm_s_benchmark.sh
│   ├── dlrm_s_criteo_kaggle.sh
│   ├── dlrm_s_criteo_kaggle_C1.sh
│   ├── dlrm_s_criteo_kaggle_C1_C2.sh
│   ├── dlrm_s_criteo_kaggle_C1_C2_C3.sh
│   ├── dlrm_s_criteo_kaggle_lock_gpu_C1.sh
│   ├── dlrm_s_criteo_terabyte.sh
│   └── run_and_time.sh
├── cache_algo
│   ├── EvLFU_C1.py
│   ├── EvLFU_C1_Cython
│   │   ├── EvLFU.cpp
│   │   ├── EvLFU.cpython-36m-x86_64-linux-gnu.so
│   │   ├── EvLFU.pyx
│   │   ├── evlfu.hpp
│   │   ├── script.sh
│   │   ├── setup_EvLFU.py
│   │   └── test.py
│   ├── LFU.py
│   ├── LRU.py
│   ├── cpp_socket_client.py
│   └── old_versions
│       ├── EvLFU4DLRM_C2.py
│       ├── EvLFU_C1_apprx_emb.py
│       ├── EvLFU_C1_sets.py
│       ├── EvLFU_C1_v0.py
│       ├── EvLFU_C1_v1.py
│       ├── LFU_v0.py
│       ├── LFU_v1.py
│       ├── LFU_v2.py
│       └── LRU_v0.py
├── cython
│   ├── cython_compile.py
│   └── cython_criteo.py
├── data_utils.py
├── dlrm_data_pytorch.py
├── dlrm_s_pytorch.py
├── dlrm_s_pytorch_C1.py
├── dlrm_s_pytorch_C1_C2.py
├── dlrm_s_pytorch_C1_C2_C3.py
├── dlrm_s_pytorch_lock_gpu_C1.py
├── emb_storage
│   ├── file_read.py
│   ├── mmap_file_read.py
│   ├── multi_storage_dummy
│   │   └── socket-server.py
│   ├── storage_dummy.py
│   ├── storage_manager.py
│   ├── storage_rocksdb.py
│   ├── storage_rocksdb_26_tabs.py
│   ├── storage_sqlite.py
│   └── storage_sqlite_26_tabs.py
├── evstore_utils.py
├── experiments.md
├── extend_distributed.py
├── input
│   ├── .gitignore
│   └── readme.txt
├── logs
│   ├── .gitignore
│   ├── sample-inference-criteo_kaggle_5mil.txt
│   ├── sample-inference-criteo_kaggle_all.txt
│   └── sample-train-criteo_kaggle_5mil.txt
├── misc
│   ├── README.txt
│   ├── dlrm_data_caffe2.py
│   ├── dlrm_s_caffe2.py
│   ├── mixed_precs_caching_v0
│   │   ├── .gitignore
│   │   ├── cache_manager.cpp
│   │   ├── cache_manager.hpp
│   │   ├── dlrm_client.py
│   │   ├── evlfu_16.cpp
│   │   ├── evlfu_16.hpp
│   │   ├── evlfu_32.cpp
│   │   ├── evlfu_32.hpp
│   │   ├── evlfu_4.cpp
│   │   ├── evlfu_4.hpp
│   │   ├── evlfu_8.cpp
│   │   ├── evlfu_8.hpp
│   │   ├── readme.txt
│   │   └── test.cpp
│   └── testing_tensor_cpp
│       ├── CMakeLists.txt
│       ├── evlfu_tensor.cpp
│       ├── evlfu_tensor.hpp
│       └── sample_client.py
├── mixed_precs_caching
│   ├── .gitignore
│   ├── aprx_embedding.cpp
│   ├── aprx_embedding.hpp
│   ├── cache_manager.cpp
│   ├── cache_manager.hpp
│   ├── dlrm_client.py
│   ├── evlfu_16.cpp
│   ├── evlfu_16.hpp
│   ├── evlfu_32.cpp
│   ├── evlfu_32.hpp
│   ├── evlfu_4.cpp
│   ├── evlfu_4.hpp
│   ├── evlfu_8.cpp
│   ├── evlfu_8.hpp
│   ├── lib
│   │   └── .gitignore
│   ├── readme.txt
│   ├── test.cpp
│   └── test.py
├── mlperf_logger.py
├── optim
│   └── rwsadagrad.py
├── script
│   ├── apply_ev_preconditioning.py
│   ├── approximate_embedding
│   │   └── phase2_similarity_analysis
│   │       ├── README.txt
│   │       ├── csvReader.py
│   │       ├── get_neighbors_CPU_slow.ipynb
│   │       ├── get_neighbors_GPU.ipynb
│   │       ├── most_popular_neighbor.ipynb
│   │       └── rankedWorkload.csv
│   ├── compress_folder_for_github.sh
│   ├── convert_altkeys_to_binary.py
│   ├── convert_ev_to_binary.py
│   ├── data_loader_terabyte.py
│   ├── dissectingmodel.py
│   ├── free_page_cache.sh
│   ├── gnuplot_cdf_direct_io.plt
│   ├── gnuplot_cdf_evlfu_lru.plt
│   ├── gnuplot_cdf_multi_line.plt
│   ├── gnuplot_graph
│   │   └── cdf_2_line.plt
│   ├── modify_param.py
│   ├── mount_cham_obj_stor.sh
│   ├── plot_cdf.py
│   ├── read_cham_obj_stor.sh
│   ├── reduce_precision.py
│   ├── uncompress_folder_for_github.sh
│   └── wget_evstore_dataset.sh
├── stored_model
│   └── .gitignore
├── test
│   └── dlrm_s_test.sh
├── tools
│   └── visualize.py
└── tricks
    ├── md_embedding_bag.py
    └── qr_embedding_bag.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.log
2 | logs-old/
3 | !logs
4 | *__pycache__
5 | */__pycache__
6 | */*__pycache__
7 | */*/__pycache__
8 | *.out
9 | run_kaggle_pt
10 | model.pth
11 | *.DS_Store
12 | */.DS_Store
13 | */*.DS_Store
14 | */*/.DS_Store
15 | *.ipynb_checkpoints
16 | */.ipynb_checkpoints
17 | */*.ipynb_checkpoints
18 | */*/.ipynb_checkpoints
19 | file_to_download.txt
20 | index.html
21 | test.txt
22 | out.txt
23 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/old-licenses/gpl-3.0.en.html)
2 | [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://shields.io/)
3 |
4 | ```
5 | _______ ______ _
6 | | ____\ \ / / ___|| |_ ___ _ __ ___
7 | | _| \ \ / /\___ \| __/ _ \| '__/ _ \
8 | | |___ \ V / ___) | || (_) | | | __/
9 | |_____| \_/ |____/ \__\___/|_| \___| -- Groupability-aware caching systems for DRS
10 |
11 | ```
12 |
13 | This repository contains the implementation code for the paper:
14 | **EVSTORE: Storage and Caching Capabilities for Scaling
15 | Embedding Tables in Deep Recommendation Systems**
16 |
17 | Contact Information
18 | --------------------
19 |
20 | **Maintainer**: [Daniar H. Kurniawan](https://people.cs.uchicago.edu/~daniar/), Email: ``daniar@uchicago.edu``
21 |
22 | [//]: <> (**Daniar is on the job market.** Please contact him if you have an opening for an AIOps and ML-Sys engineer role!)
23 |
24 | Feel free to contact Daniar for any suggestions/feedback, bug
25 | reports, or general discussions.
26 |
27 | If you use EVStore, please consider citing our ASPLOS 2023 paper. The BibTeX
28 | entry is:
29 |
30 | ```
31 | @InProceedings{Daniar-EVStore,
32 | Author = {Daniar H. Kurniawan and Ruipu Wang and Kahfi S. Zulkifli and Fandi A. Wiranata and John Bent and Ymir Vigfusson and Haryadi S. Gunawi},
33 | Title = "EVSTORE: Storage and Caching Capabilities for Scaling
34 | Embedding Tables in Deep Recommendation Systems",
35 | Booktitle = {Proceedings of the 28th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
36 | Address = {Vancouver, Canada},
37 | Month = {MARCH},
38 | Year = {2023}
39 | }
40 | ```
41 |
42 | Run EVStore
43 | -----------
44 |
45 | Please follow the experiments detailed in [experiments.md](experiments.md).
46 |
47 |
48 | ### Acknowledgement ###
49 |
50 | The DLRM code in this repository is based on [Facebook DLRM](https://github.com/facebookresearch/dlrm).
51 | The cache benchmark repository is based on [Cache2k](https://github.com/cache2k/cache2k) and [Cacheus](https://github.com/sylab/cacheus/).
52 |
--------------------------------------------------------------------------------
/bench/dlrm_s_benchmark.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Copyright (c) Facebook, Inc. and its affiliates.
3 | #
4 | # This source code is licensed under the MIT license found in the
5 | # LICENSE file in the root directory of this source tree.
6 |
7 | #check if extra argument is passed to the test
8 | if [[ $# == 1 ]]; then
9 | dlrm_extra_option=$1
10 | else
11 | dlrm_extra_option=""
12 | fi
13 | #echo $dlrm_extra_option
14 |
15 | cpu=1
16 | gpu=1
17 | pt=1
18 | c2=1
19 |
20 | ncores=28 #12 #6
21 | nsockets="0"
22 |
23 | ngpus="1 2 4 8"
24 |
25 | numa_cmd="numactl --physcpubind=0-$((ncores-1)) -m $nsockets" #run on one socket, without HT
26 | dlrm_pt_bin="python dlrm_s_pytorch.py"
27 | dlrm_c2_bin="python dlrm_s_caffe2.py"
28 |
29 | data=random #synthetic
30 | print_freq=100
31 | rand_seed=727
32 |
33 | c2_net="async_scheduling"
34 |
35 | #Model param
36 | mb_size=2048 #1024 #512 #256
37 | nbatches=1000 #500 #100
38 | bot_mlp="512-512-64"
39 | top_mlp="1024-1024-1024-1"
40 | emb_size=64
41 | nindices=100
42 | emb="1000000-1000000-1000000-1000000-1000000-1000000-1000000-1000000"
43 | interaction="dot"
44 | tnworkers=0
45 | tmb_size=16384
46 |
47 | #_args="--mini-batch-size="${mb_size}\
48 | _args=" --num-batches="${nbatches}\
49 | " --data-generation="${data}\
50 | " --arch-mlp-bot="${bot_mlp}\
51 | " --arch-mlp-top="${top_mlp}\
52 | " --arch-sparse-feature-size="${emb_size}\
53 | " --arch-embedding-size="${emb}\
54 | " --num-indices-per-lookup="${nindices}\
55 | " --arch-interaction-op="${interaction}\
56 | " --numpy-rand-seed="${rand_seed}\
57 | " --print-freq="${print_freq}\
58 | " --print-time"\
59 | " --enable-profiling "
60 |
61 | c2_args=" --caffe2-net-type="${c2_net}
62 |
63 |
64 | # CPU Benchmarking
65 | if [ $cpu = 1 ]; then
66 | echo "--------------------------------------------"
67 | echo "CPU Benchmarking - running on $ncores cores"
68 | echo "--------------------------------------------"
69 | if [ $pt = 1 ]; then
70 | outf="model1_CPU_PT_$ncores.log"
71 | outp="dlrm_s_pytorch.prof"
72 | echo "-------------------------------"
73 | echo "Running PT (log file: $outf)"
74 | echo "-------------------------------"
75 | cmd="$numa_cmd $dlrm_pt_bin --mini-batch-size=$mb_size --test-mini-batch-size=$tmb_size --test-num-workers=$tnworkers $_args $dlrm_extra_option > $outf"
76 | echo $cmd
77 | eval $cmd
78 | min=$(grep "iteration" $outf | awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}')
79 | echo "Min time per iteration = $min"
80 | # move profiling file(s)
81 | mv $outp ${outf//".log"/".prof"}
82 | mv ${outp//".prof"/".json"} ${outf//".log"/".json"}
83 |
84 | fi
85 | if [ $c2 = 1 ]; then
86 | outf="model1_CPU_C2_$ncores.log"
87 | outp="dlrm_s_caffe2.prof"
88 | echo "-------------------------------"
89 | echo "Running C2 (log file: $outf)"
90 | echo "-------------------------------"
91 | cmd="$numa_cmd $dlrm_c2_bin --mini-batch-size=$mb_size $_args $c2_args $dlrm_extra_option 1> $outf 2> $outp"
92 | echo $cmd
93 | eval $cmd
94 | min=$(grep "iteration" $outf | awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}')
95 | echo "Min time per iteration = $min"
96 | # move profiling file (collected from stderr above)
97 | mv $outp ${outf//".log"/".prof"}
98 | fi
99 | fi
100 |
101 | # GPU Benchmarking
102 | if [ $gpu = 1 ]; then
103 | echo "--------------------------------------------"
104 | echo "GPU Benchmarking - running on $ngpus GPUs"
105 | echo "--------------------------------------------"
106 | for _ng in $ngpus
107 | do
108 | # weak scaling
109 | # _mb_size=$((mb_size*_ng))
110 | # strong scaling
111 | _mb_size=$((mb_size*1))
112 | _gpus=$(seq -s, 0 $((_ng-1)))
113 | cuda_arg="CUDA_VISIBLE_DEVICES=$_gpus"
114 | echo "-------------------"
115 | echo "Using GPUS: "$_gpus
116 | echo "-------------------"
117 | if [ $pt = 1 ]; then
118 | outf="model1_GPU_PT_$_ng.log"
119 | outp="dlrm_s_pytorch.prof"
120 | echo "-------------------------------"
121 | echo "Running PT (log file: $outf)"
122 | echo "-------------------------------"
123 | cmd="$cuda_arg $dlrm_pt_bin --mini-batch-size=$_mb_size --test-mini-batch-size=$tmb_size --test-num-workers=$tnworkers $_args --use-gpu $dlrm_extra_option > $outf"
124 | echo $cmd
125 | eval $cmd
126 | min=$(grep "iteration" $outf | awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}')
127 | echo "Min time per iteration = $min"
128 | # move profiling file(s)
129 | mv $outp ${outf//".log"/".prof"}
130 | mv ${outp//".prof"/".json"} ${outf//".log"/".json"}
131 | fi
132 | if [ $c2 = 1 ]; then
133 | outf="model1_GPU_C2_$_ng.log"
134 | outp="dlrm_s_caffe2.prof"
135 | echo "-------------------------------"
136 | echo "Running C2 (log file: $outf)"
137 | echo "-------------------------------"
138 | cmd="$cuda_arg $dlrm_c2_bin --mini-batch-size=$_mb_size $_args $c2_args --use-gpu $dlrm_extra_option 1> $outf 2> $outp"
139 | echo $cmd
140 | eval $cmd
141 | min=$(grep "iteration" $outf | awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}')
142 | echo "Min time per iteration = $min"
143 | # move profiling file (collected from stderr above)
144 | mv $outp ${outf//".log"/".prof"}
145 | fi
146 | done
147 | fi
148 |
--------------------------------------------------------------------------------
/bench/dlrm_s_criteo_kaggle.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Copyright (c) Facebook, Inc. and its affiliates.
3 | #
4 | # This source code is licensed under the MIT license found in the
5 | # LICENSE file in the root directory of this source tree.
6 | #
7 | #WARNING: must have compiled PyTorch and caffe2
8 |
9 | #check if extra argument is passed to the test
10 | if [[ $# == 1 ]]; then
11 | dlrm_extra_option=$1
12 | else
13 | dlrm_extra_option=""
14 | fi
15 | #echo $dlrm_extra_option
16 |
17 | dlrm_pt_bin="python3 dlrm_s_pytorch.py" # use "python3 -u" so that the tqdm output appears on the terminal
18 | # dlrm_c2_bin="python3 dlrm_s_caffe2.py"
19 |
20 | echo "run pytorch ..."
21 | # WARNING: the following parameters will be set based on the data set
22 | # --arch-embedding-size=... (sparse feature sizes)
23 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp)
24 | $dlrm_pt_bin --arch-sparse-feature-size=36 --arch-mlp-bot="13-512-256-64-36" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time --test-mini-batch-size=1 --test-num-workers=0 $dlrm_extra_option 2>&1
25 |
26 | # echo "run caffe2 ..."
27 | # WARNING: the following parameters will be set based on the data set
28 | # --arch-embedding-size=... (sparse feature sizes)
29 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp)
30 | # $dlrm_c2_bin --arch-sparse-feature-size=16 --arch-mlp-bot="13-512-256-64-16" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --raw-data-file=./input/train.txt --processed-data-file=./input/kaggleAdDisplayChallenge_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time $dlrm_extra_option 2>&1 | tee run_kaggle_c2.log
31 |
32 | echo "finished!"
33 |
--------------------------------------------------------------------------------
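
For orientation, the MLP flags used throughout these benchmark scripts encode layer widths as dash-separated lists, and two of them must agree: the first entry of `--arch-mlp-bot` is the number of dense Criteo features (13) and its last entry must equal `--arch-sparse-feature-size` (36 above) so dense and sparse features can be combined by the dot interaction. A minimal sanity check of that pairing (a sketch, not a file in this repo):

```python
# Sketch: check the flag pairing used by the benchmark scripts above.
arch_mlp_bot = "13-512-256-64-36"     # value taken from the script
arch_sparse_feature_size = 36

layers = [int(d) for d in arch_mlp_bot.split("-")]
assert layers[0] == 13                          # Criteo dense-feature count
assert layers[-1] == arch_sparse_feature_size   # bottom-MLP output == embedding dim
```
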
/bench/dlrm_s_criteo_kaggle_C1.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Copyright (c) Facebook, Inc. and its affiliates.
3 | #
4 | # This source code is licensed under the MIT license found in the
5 | # LICENSE file in the root directory of this source tree.
6 | #
7 | #WARNING: must have compiled PyTorch and caffe2
8 |
9 | #check if extra argument is passed to the test
10 | if [[ $# == 1 ]]; then
11 | dlrm_extra_option=$1
12 | else
13 | dlrm_extra_option=""
14 | fi
15 | # echo $dlrm_extra_option
16 |
17 | dlrm_pt_bin="python3 dlrm_s_pytorch_C1.py"
18 | # dlrm_c2_bin="python3 dlrm_s_caffe2.py"
19 |
20 | echo "run pytorch C1 ..."
21 | # WARNING: the following parameters will be set based on the data set
22 | # --arch-embedding-size=... (sparse feature sizes)
23 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp)
24 | $dlrm_pt_bin --arch-sparse-feature-size=36 --arch-mlp-bot="13-512-256-64-36" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time --test-mini-batch-size=1 --test-num-workers=0 $dlrm_extra_option 2>&1
25 |
26 | # echo "run caffe2 ..."
27 | # WARNING: the following parameters will be set based on the data set
28 | # --arch-embedding-size=... (sparse feature sizes)
29 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp)
30 | # $dlrm_c2_bin --arch-sparse-feature-size=16 --arch-mlp-bot="13-512-256-64-16" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --raw-data-file=./input/train.txt --processed-data-file=./input/kaggleAdDisplayChallenge_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time $dlrm_extra_option 2>&1 | tee run_kaggle_c2.log
31 |
32 | echo "finished!"
33 |
--------------------------------------------------------------------------------
/bench/dlrm_s_criteo_kaggle_C1_C2.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Copyright (c) Facebook, Inc. and its affiliates.
3 | #
4 | # This source code is licensed under the MIT license found in the
5 | # LICENSE file in the root directory of this source tree.
6 | #
7 | #WARNING: must have compiled PyTorch and caffe2
8 |
9 | CURR_DIR=`pwd`
10 |
11 | # check if the command contains "cpp_algo_socket"
12 | if [[ $1 == *"cpp_algo_socket"* ]]; then
13 | # using socket interface
14 |     echo "The CPP caching layer is started by this script ..."
15 | echo "Will use SOCKET as the interface"
16 |
17 | cd /mnt/extra/ev-store-dlrm/mixed_precs_caching
18 | g++ -O3 evlfu_4.cpp evlfu_8.cpp evlfu_16.cpp evlfu_32.cpp aprx_embedding.cpp cache_manager.cpp -pthread; ./a.out &
19 | else
20 |     # Each experiment might use a different cacheSize, so we recompile the library
21 | echo "Compile the C++ shared library ... "
22 | echo "Will use Ctypes as the interface"
23 | cd /mnt/extra/ev-store-dlrm/mixed_precs_caching
24 | g++ -shared -o libcachemanager.so -fPIC -O3 evlfu_4.cpp evlfu_8.cpp evlfu_16.cpp evlfu_32.cpp aprx_embedding.cpp cache_manager.cpp -pthread; mv *.so lib/
25 | echo "C++ shared library (*.so) is updated!"
26 |
27 |     # check if this DLRM deployment wants a specific libcachemanager name [to enable multi-DLRM deployment]
28 | if [ -z "$2" ]; then
29 | echo "No need to rename the .so"
30 | else
31 |         echo "COPY lib/libcachemanager.so -> lib/$2" # will be used by Ctypes!
32 | cp lib/libcachemanager.so lib/$2
33 | fi
34 | fi
35 |
36 | cd $CURR_DIR
37 | #check if extra argument is passed to the test
38 | if [ -z "$1" ]; then
39 | dlrm_extra_option=""
40 | else
41 | dlrm_extra_option=$1
42 | fi
43 | # echo $dlrm_extra_option
44 |
45 | dlrm_pt_bin="python3 dlrm_s_pytorch_C1_C2.py"
46 | # dlrm_c2_bin="python3 dlrm_s_caffe2.py"
47 |
48 | echo "run pytorch C1_C2 ..."
49 | # WARNING: the following parameters will be set based on the data set
50 | # --arch-embedding-size=... (sparse feature sizes)
51 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp)
52 | $dlrm_pt_bin --arch-sparse-feature-size=36 --arch-mlp-bot="13-512-256-64-36" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time --test-mini-batch-size=1 --test-num-workers=0 $dlrm_extra_option 2>&1
53 |
54 | # echo "run caffe2 ..."
55 | # WARNING: the following parameters will be set based on the data set
56 | # --arch-embedding-size=... (sparse feature sizes)
57 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp)
58 | # $dlrm_c2_bin --arch-sparse-feature-size=16 --arch-mlp-bot="13-512-256-64-16" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --raw-data-file=./input/train.txt --processed-data-file=./input/kaggleAdDisplayChallenge_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time $dlrm_extra_option 2>&1 | tee run_kaggle_c2.log
59 |
60 | echo "finished!"
61 |
--------------------------------------------------------------------------------
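
For context, the Ctypes path above only builds `lib/libcachemanager.so` (optionally copied to a per-deployment name via `$2`); the Python side then attaches to it at run time. A minimal sketch of that attachment, assuming the paths used by this script; the symbol name `init_cache` is a hypothetical placeholder, so check cache_manager.cpp for the functions actually exported:

```python
import ctypes

# Load the shared library built by this script (path from the script above).
lib = ctypes.CDLL(
    "/mnt/extra/ev-store-dlrm/mixed_precs_caching/lib/libcachemanager.so")

# A second DLRM deployment that passed "libcachemanager_2.so" as $2 would
# load its own renamed copy instead.

# Hypothetical exported symbol -- the real names live in cache_manager.cpp.
lib.init_cache.argtypes = [ctypes.c_int]
lib.init_cache(768)   # e.g., cache capacity
```
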
/bench/dlrm_s_criteo_kaggle_C1_C2_C3.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Copyright (c) Facebook, Inc. and its affiliates.
3 | #
4 | # This source code is licensed under the MIT license found in the
5 | # LICENSE file in the root directory of this source tree.
6 | #
7 | #WARNING: must have compiled PyTorch and caffe2
8 |
9 | CURR_DIR=`pwd`
10 |
11 | # check if the command contains "cpp_algo_socket"
12 | if [[ $1 == *"cpp_algo_socket"* ]]; then
13 | # using socket interface
14 |     echo "The CPP caching layer is started by this script ..."
15 | echo "Will use SOCKET as the interface"
16 |
17 | cd /mnt/extra/ev-store-dlrm/mixed_precs_caching
18 | g++ -O3 evlfu_4.cpp evlfu_8.cpp evlfu_16.cpp evlfu_32.cpp aprx_embedding.cpp cache_manager.cpp -pthread; ./a.out &
19 | else
20 |     # Each experiment might use a different cacheSize, so we recompile the library
21 | echo "Compile the C++ shared library ... "
22 | echo "Will use Ctypes as the interface"
23 | cd /mnt/extra/ev-store-dlrm/mixed_precs_caching
24 | g++ -shared -o libcachemanager.so -fPIC -O3 evlfu_4.cpp evlfu_8.cpp evlfu_16.cpp evlfu_32.cpp aprx_embedding.cpp cache_manager.cpp -pthread; mv *.so lib/
25 | echo "C++ shared library (*.so) is updated!"
26 |
27 |     # check if this DLRM deployment wants a specific libcachemanager name [to enable multi-DLRM deployment]
28 | if [ -z "$2" ]; then
29 | echo "No need to rename the .so"
30 | else
31 |         echo "COPY lib/libcachemanager.so -> lib/$2" # will be used by Ctypes!
32 | cp lib/libcachemanager.so lib/$2
33 | fi
34 | fi
35 |
36 | cd $CURR_DIR
37 | #check if extra argument is passed to the test
38 | if [ -z "$1" ]; then
39 | dlrm_extra_option=""
40 | else
41 | dlrm_extra_option=$1
42 | fi
43 |
44 | dlrm_pt_bin="python3 dlrm_s_pytorch_C1_C2_C3.py"
45 | # dlrm_c2_bin="python3 dlrm_s_caffe2.py"
46 |
47 | echo "run pytorch C1_C2_C3 ..."
48 | # WARNING: the following parameters will be set based on the data set
49 | # --arch-embedding-size=... (sparse feature sizes)
50 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp)
51 | $dlrm_pt_bin --arch-sparse-feature-size=36 --arch-mlp-bot="13-512-256-64-36" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time --test-mini-batch-size=1 --test-num-workers=0 $dlrm_extra_option 2>&1
52 |
53 | # echo "run caffe2 ..."
54 | # WARNING: the following parameters will be set based on the data set
55 | # --arch-embedding-size=... (sparse feature sizes)
56 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp)
57 | # $dlrm_c2_bin --arch-sparse-feature-size=16 --arch-mlp-bot="13-512-256-64-16" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --raw-data-file=./input/train.txt --processed-data-file=./input/kaggleAdDisplayChallenge_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time $dlrm_extra_option 2>&1 | tee run_kaggle_c2.log
58 |
59 | echo "finished!"
60 |
--------------------------------------------------------------------------------
/bench/dlrm_s_criteo_kaggle_lock_gpu_C1.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Copyright (c) Facebook, Inc. and its affiliates.
3 | #
4 | # This source code is licensed under the MIT license found in the
5 | # LICENSE file in the root directory of this source tree.
6 | #
7 | #WARNING: must have compiled PyTorch and caffe2
8 |
9 | #check if extra argument is passed to the test
10 | if [[ $# == 1 ]]; then
11 | dlrm_extra_option=$1
12 | else
13 | dlrm_extra_option=""
14 | fi
15 | # echo $dlrm_extra_option
16 |
17 | dlrm_pt_bin="python3 dlrm_s_pytorch_lock_gpu_C1.py"
18 | # dlrm_c2_bin="python3 dlrm_s_caffe2.py"
19 |
20 | echo "run pytorch C1 ..."
21 | # WARNING: the following parameters will be set based on the data set
22 | # --arch-embedding-size=... (sparse feature sizes)
23 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp)
24 | $dlrm_pt_bin --arch-sparse-feature-size=36 --arch-mlp-bot="13-512-256-64-36" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time --test-mini-batch-size=1 --test-num-workers=0 $dlrm_extra_option 2>&1
25 |
26 | # echo "run caffe2 ..."
27 | # WARNING: the following parameters will be set based on the data set
28 | # --arch-embedding-size=... (sparse feature sizes)
29 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp)
30 | # $dlrm_c2_bin --arch-sparse-feature-size=16 --arch-mlp-bot="13-512-256-64-16" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --raw-data-file=./input/train.txt --processed-data-file=./input/kaggleAdDisplayChallenge_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time $dlrm_extra_option 2>&1 | tee run_kaggle_c2.log
31 |
32 | echo "finished!"
33 |
--------------------------------------------------------------------------------
/bench/dlrm_s_criteo_terabyte.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Copyright (c) Facebook, Inc. and its affiliates.
3 | #
4 | # This source code is licensed under the MIT license found in the
5 | # LICENSE file in the root directory of this source tree.
6 | #
7 | #WARNING: must have compiled PyTorch and caffe2
8 |
9 | #check if extra argument is passed to the test
10 | if [[ $# == 1 ]]; then
11 | dlrm_extra_option=$1
12 | else
13 | dlrm_extra_option=""
14 | fi
15 | #echo $dlrm_extra_option
16 |
17 | dlrm_pt_bin="python dlrm_s_pytorch.py"
18 | dlrm_c2_bin="python dlrm_s_caffe2.py"
19 |
20 | echo "run pytorch ..."
21 | # WARNING: the following parameters will be set based on the data set
22 | # --arch-embedding-size=... (sparse feature sizes)
23 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp)
24 | $dlrm_pt_bin --arch-sparse-feature-size=64 --arch-mlp-bot="13-512-256-64" --arch-mlp-top="512-512-256-1" --max-ind-range=10000000 --data-generation=dataset --data-set=terabyte --raw-data-file=./input/day --processed-data-file=./input/terabyte_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=2048 --print-freq=1024 --print-time --test-mini-batch-size=16384 --test-num-workers=16 $dlrm_extra_option 2>&1 | tee run_terabyte_pt.log
25 |
26 | echo "run caffe2 ..."
27 | # WARNING: the following parameters will be set based on the data set
28 | # --arch-embedding-size=... (sparse feature sizes)
29 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp)
30 | $dlrm_c2_bin --arch-sparse-feature-size=64 --arch-mlp-bot="13-512-256-64" --arch-mlp-top="512-512-256-1" --max-ind-range=10000000 --data-generation=dataset --data-set=terabyte --raw-data-file=./input/day --processed-data-file=./input/terabyte_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=2048 --print-freq=1024 --print-time $dlrm_extra_option 2>&1 | tee run_terabyte_c2.log
31 |
32 | echo "done"
33 |
--------------------------------------------------------------------------------
/bench/run_and_time.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # Copyright (c) Facebook, Inc. and its affiliates.
3 | #
4 | # This source code is licensed under the MIT license found in the
5 | # LICENSE file in the root directory of this source tree.
6 | #
7 | #WARNING: must have compiled PyTorch and caffe2
8 |
9 | #check if extra argument is passed to the test
10 | if [[ $# == 1 ]]; then
11 | dlrm_extra_option=$1
12 | else
13 | dlrm_extra_option=""
14 | fi
15 | #echo $dlrm_extra_option
16 |
17 | python dlrm_s_pytorch.py --arch-sparse-feature-size=128 --arch-mlp-bot="13-512-256-128" --arch-mlp-top="1024-1024-512-256-1" --max-ind-range=40000000 --data-generation=dataset --data-set=terabyte --raw-data-file=./input/day --processed-data-file=./input/terabyte_processed.npz --loss-function=bce --round-targets=True --learning-rate=1.0 --mini-batch-size=2048 --print-freq=2048 --print-time --test-freq=102400 --test-mini-batch-size=16384 --test-num-workers=16 --memory-map --mlperf-logging --mlperf-auc-threshold=0.8025 --mlperf-bin-loader --mlperf-bin-shuffle $dlrm_extra_option 2>&1 | tee run_terabyte_mlperf_pt.log
18 |
19 | echo "done"
20 |
--------------------------------------------------------------------------------
/cache_algo/EvLFU_C1_Cython/EvLFU.cpython-36m-x86_64-linux-gnu.so:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ucare-uchicago/ev-store-dlrm/0954b2cb26a7e4ad1dddcdc3f98480e7d7e16ab5/cache_algo/EvLFU_C1_Cython/EvLFU.cpython-36m-x86_64-linux-gnu.so
--------------------------------------------------------------------------------
/cache_algo/EvLFU_C1_Cython/EvLFU.pyx:
--------------------------------------------------------------------------------
1 | from libcpp.vector cimport vector
2 | from libcpp cimport bool
3 | from libcpp.string cimport string
4 |
5 | cdef extern from "evlfu.hpp":
6 | void init(int capacity)
7 | void request_to_ev_lfu(vector[int] &group_keys, vector[bool] &arr_record_hit, vector[vector[float]] &arr_emb_weights, bool use_gpu)
8 | void load_ev_tables()
9 | void close_ev_tables()
10 |
11 | def cinit(int capacity):
12 | init(capacity)
13 |
14 | def crequest(vector[int] group_keys, use_gpu):
15 | cdef vector[bool] arr_record_hit = [True] * 26
16 | cdef vector[vector[float]] arr_emb_weights = [[0.0]*36]*26
17 | request_to_ev_lfu(group_keys, arr_record_hit, arr_emb_weights, use_gpu)
18 | return arr_record_hit, arr_emb_weights
19 |
20 |
21 | def cload_ev_tables():
22 | load_ev_tables()
23 |
24 | def cclose_ev_tables():
25 | close_ev_tables()
--------------------------------------------------------------------------------
/cache_algo/EvLFU_C1_Cython/evlfu.hpp:
--------------------------------------------------------------------------------
1 |
2 | #ifndef EVLFU_H_INCLUDED
3 | #define EVLFU_H_INCLUDED
4 |
5 | #include <vector>
6 | #include <string>
7 | #include <unordered_map>
8 | #include <map>
9 | #include <iostream>
10 | #include <fstream>
11 | #include <algorithm>
12 |
13 | using namespace std;
14 |
15 | struct Cache_data
16 | {
17 |     Cache_data(vector<float> ev = vector<float>(0), int agg_hit = 0)
18 | {
19 | this->embedding_value = ev;
20 | this->agg_hit = agg_hit;
21 | }
22 |     vector<float> embedding_value;
23 | int agg_hit;
24 | };
25 |
26 | void init(int capacity);
27 | void request_to_ev_lfu(vector<int> &group_keys, vector<bool> &arr_record_hit, vector<vector<float>> &arr_emb_weights, bool use_gpu);
28 | void load_ev_tables();
29 | void close_ev_tables();
30 |
31 | #endif
--------------------------------------------------------------------------------
/cache_algo/EvLFU_C1_Cython/script.sh:
--------------------------------------------------------------------------------
1 | python setup_EvLFU.py build_ext --inplace
2 | python test.py
--------------------------------------------------------------------------------
/cache_algo/EvLFU_C1_Cython/setup_EvLFU.py:
--------------------------------------------------------------------------------
1 | from distutils.core import setup
2 | from Cython.Build import cythonize
3 | from distutils.extension import Extension
4 | from Cython.Distutils import build_ext
5 | # extensions = [
6 | # Extension('EvLFU', ['EvLFU.pyx', 'evlfu_v2.cpp'],
7 | # extra_compile_args=['-std=c++11'],
8 | # language='c++'
9 | # ),
10 | # ]
11 |
12 | # setup(
13 | # ext_modules=cythonize(extensions),
14 | # # extra_compile_args=["-w", '-g'],
15 | # # extra_compile_args=["-O3"],
16 | # )
17 |
18 | ext_modules = [Extension("EvLFU", ["EvLFU.pyx", "evlfu.cpp"], language='c++',)]
19 |
20 | setup(cmdclass = {'build_ext': build_ext}, ext_modules = ext_modules)
--------------------------------------------------------------------------------
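
Once `script.sh` has built the extension in place, it imports like any Python module. A minimal smoke test, mirroring the defaults hard-coded in EvLFU.pyx (26 tables, 36-dim embeddings) and the capacity used by test.py below; the row ids are made up:

```python
import EvLFU

EvLFU.cinit(768)                  # cache capacity (same value test.py uses)
EvLFU.cload_ev_tables()           # open the embedding tables first
hits, weights = EvLFU.crequest([0] * 26, False)   # one made-up row id per table
assert len(hits) == 26 and len(weights) == 26
EvLFU.cclose_ev_tables()
```
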
/cache_algo/EvLFU_C1_Cython/test.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import sys
3 | import numpy as np
4 | import pandas as pd
5 | import time
6 | import random
7 | import EvLFU
8 |
9 |
10 | workload_dir = "/home/cc/workload/Archive-new-1.0M/"
11 | workload_files = []
12 | for i in range(1, 27):
13 | workload_files.append("workload-group-" + str(i) + ".csv")
14 | arrRawWorkload = []
15 | # read all workloads:
16 | for workload_file in workload_files:
17 | workload = np.asarray(pd.read_csv(workload_dir + workload_file).values[:, 0])
18 | arrRawWorkload.append(workload)
19 |
20 | arrRawWorkload = np.asarray(arrRawWorkload)
21 | # print(arrRawWorkload.shape)
22 | # merge the workloads
23 | arrMergedWorkload = np.stack(arrRawWorkload, axis=1)
24 | groupedWorkloadKeys = arrMergedWorkload
25 | print(arrMergedWorkload.shape)
26 | print("Done merging ALL workloads: total =", arrMergedWorkload.shape[0], 'rows')
27 |
28 | # Run the alg:
29 | EvLFU.cinit(768)
30 | perfectHit = 0
31 |
32 | groupedWorkloadIds = []
33 |
34 | for groupKeys in groupedWorkloadKeys:
35 | groupKeys = groupKeys.tolist()
36 | for i in range(26):
37 | groupKeys[i] = int(groupKeys[i].split('-')[1])
38 |
39 | groupedWorkloadIds.append(groupKeys)
40 |
41 | EvLFU.cload_ev_tables()
42 | start_time = time.time()
43 |
44 | for group_row_ids in groupedWorkloadIds:
45 | # print(type(groupKeys))
46 | # print(type(groupKeys[0]))
47 | aggHitMissRecord, x = EvLFU.crequest(group_row_ids, False)
48 | # print(aggHitMissRecord)
49 | # print(x)
50 | # exit(0)
51 | flag = True
52 | for isHit in aggHitMissRecord:
53 | if not isHit:
54 | flag = False
55 | break
56 | if flag:
57 |         perfectHit += 1
58 | print("perfect hit:", perfectHit)
59 | print(time.time() - start_time)
60 | EvLFU.cclose_ev_tables()
--------------------------------------------------------------------------------
/cache_algo/LFU.py:
--------------------------------------------------------------------------------
1 | import collections
2 | import torch
3 | import sys
4 | sys.path.append('emb_storage')
5 | import storage_manager
6 | cap = -1
7 | least_freq = 1
8 | # to store the frequency of the keys
9 | node_for_freq = []
10 | # to search the key within all the cached keys
11 | node_for_key = dict()
12 |
13 | def init(capacity):
14 | global cap
15 | cap = capacity
16 |     node_for_freq.append(0)  # placeholder for frequency == 0 (index 0 is never used)
17 | node_for_freq.append([]) # For the frequency == 1
18 |
19 | def _update( key, value, freq):
20 | # increment the frequency
21 | global node_for_key, node_for_freq, least_freq
22 | # remove the key from the old frequency
23 | node_for_freq[freq].remove(key)
24 |
25 | if len(node_for_freq[least_freq]) == 0:
26 | # update the least_freq if there is no more item in this frequency list
27 | # node_for_freq.pop(least_freq) # remove this empty freq list, to save memory
28 | least_freq += 1
29 |
30 | # update frequency
31 | node_for_key[key][1] = freq + 1
32 |     # index 0 occupies a slot, so a new bucket is needed when freq+1 reaches the list length
33 | if ((freq + 1) == len(node_for_freq)):
34 | node_for_freq.append([])
35 | node_for_freq[freq + 1].append(key)
36 |
37 | def set( key, value):
38 | global node_for_key, node_for_freq, cap, least_freq
39 |
40 | # check if full
41 | if (len(node_for_key) >= cap):
42 | # evict 1 item
43 | key_to_remove = node_for_freq[least_freq].pop(0)
44 | node_for_key.pop(key_to_remove)
45 |
46 | # Insert the new item
47 | node_for_key[key] = [value, 1]
48 | node_for_freq[1].append(key)
49 |
50 | # update least freq
51 | least_freq = 1
52 |
53 | def request(key, table_id, row_id):
54 | global node_for_key, node_for_freq
55 | # check if the key is cached
56 | if key in node_for_key:
57 | # Yes, get the value
58 | value, freq = node_for_key[key]
59 | # Update item's frequency
60 | _update(key, value, freq)
61 | return value, True
62 | else:
63 | # MISS: get value from secondary storage
64 | value = storage_manager.get_val_from_storage(table_id, row_id)
65 | set(key, value)
66 | return value, False
67 |
68 | # Multi keys request
69 | def request_to_lfu( group_row_ids, use_gpu = False):
70 | arr_record_hit = []
71 | arr_emb_weights = []
72 | agg_hit = 0
73 | if (use_gpu):
74 |         # This code assumes we only run this on a single GPU node
75 | device = torch.device("cuda:0")
76 |
77 | for i, row_id in enumerate(group_row_ids):
78 |         # Table_id starts at 1;
79 |         # the key for row 3 of table 1 is "1-3"
80 | key = str(i+1) + "-" + str(row_id)
81 | val, is_hit = request(key, i+1, row_id)
82 | # convert list of embedding values to tensor
83 | ev_tensor = torch.FloatTensor([val]) # val is a python list
84 | ev_tensor.requires_grad = True
85 | if (use_gpu):
86 |             # This code assumes we only run this on a single GPU node
87 | ev_tensor = ev_tensor.to(device)
88 | arr_emb_weights.append(ev_tensor)
89 | if is_hit:
90 | arr_record_hit.append(True)
91 | agg_hit += 1
92 | else:
93 | arr_record_hit.append(False)
94 |
95 | return arr_record_hit, arr_emb_weights
96 |
97 |
--------------------------------------------------------------------------------
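
For reference, the call pattern for this module (and for LRU.py below) looks like the sketch that follows; the capacity and row ids are made-up values, and `storage_manager` must already point at an initialized backend so that misses can be served:

```python
import sys
sys.path.append('cache_algo')
import LFU

LFU.init(1000)                  # hypothetical cache capacity
group_row_ids = [3, 17, 42]     # one made-up row id per embedding table

# Misses are filled from storage_manager; hits bump the key's frequency.
arr_record_hit, arr_emb_weights = LFU.request_to_lfu(group_row_ids, use_gpu=False)
print("hits:", sum(arr_record_hit), "of", len(arr_record_hit))
```
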
/cache_algo/LRU.py:
--------------------------------------------------------------------------------
1 | import collections
2 | import torch
3 | import sys
4 | sys.path.append('emb_storage')
5 | import storage_manager
6 |
7 | cap = -1
8 | LRUCache = collections.OrderedDict()# each item is a dictionary embedding value
9 |
10 | def init(capacity):
11 | global cap
12 | cap = capacity
13 |
14 | # Inserting the NEW key
15 | def set(key, value):
16 | global LRUCache, cap
17 | if (len(LRUCache) >= cap):
18 | # evicting the first key in LRU list
19 | evict_key, evict_val = LRUCache.popitem(last=False)
20 | # inserting new key
21 | LRUCache[key] = value
22 |
23 | # single key request
24 | def request(key, table_id, row_id):
25 | global LRUCache
26 | if (key in LRUCache):
27 | value = LRUCache[key]
28 | # Update position of the hit item to first. Optional.
29 | LRUCache.move_to_end(key, last=True)
30 | return value, True
31 | else:
32 | # MISS: get value from secondary storage
33 | value = storage_manager.get_val_from_storage(table_id, row_id)
34 | set(key, value)
35 | return value, False
36 |
37 | # Multi keys request
38 | def request_to_lru( group_row_ids, use_gpu = False):
39 | arr_record_hit = []
40 | arr_emb_weights = []
41 | agg_hit = 0
42 | if (use_gpu):
43 |         # This code assumes we only run this on a single GPU node
44 | device = torch.device("cuda:0")
45 |
46 | for i, row_id in enumerate(group_row_ids):
47 |         # Table_id starts at 1;
48 |         # the key for row 3 of table 1 is "1-3"
49 | key = str(i+1) + "-" + str(row_id)
50 | val, is_hit = request(key, i+1, row_id)
51 | # convert list of embedding values to tensor
52 | ev_tensor = torch.FloatTensor([val]) # val is a python list
53 | ev_tensor.requires_grad = True
54 | if (use_gpu):
55 |             # This code assumes we only run this on a single GPU node
56 | ev_tensor = ev_tensor.to(device)
57 | arr_emb_weights.append(ev_tensor)
58 | if is_hit:
59 | arr_record_hit.append(True)
60 | agg_hit += 1
61 | else:
62 | arr_record_hit.append(False)
63 |
64 | return arr_record_hit, arr_emb_weights
65 |
66 |
--------------------------------------------------------------------------------
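
The OrderedDict calls above carry all of the recency bookkeeping, as this self-contained snippet illustrates: `move_to_end` marks a key as most recently used, so `popitem(last=False)` always evicts the least recently used entry:

```python
from collections import OrderedDict

cache = OrderedDict()
for k in ("a", "b", "c"):
    cache[k] = k.upper()

cache.move_to_end("a")               # "a" becomes the most recently used
evicted = cache.popitem(last=False)  # evicts "b", now the oldest entry
print(evicted)                       # ('b', 'B')
```
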
/cache_algo/old_versions/EvLFU_C1_v0.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import torch
3 | import sys
4 | sys.path.append('emb_storage')
5 | import storage_manager
6 |
7 | ###########################################EvLFU##########################################
8 | cap_C1 = 500
9 | min_C1 = 0
10 | vals_C1 = dict()
11 | counts_C1 = dict()
12 | lists_C1 = dict()
13 | lists_C1[0] = []
14 | # flushing part:
15 | nPerfectItem_C1 = 0
16 | flushRate_C1 = 0.4
17 | perfectItemCapacity_C1 = 1.0
18 |
19 | def init():
20 | pass
21 |
22 | def set(key, value, aggHit):
23 | global cap_C1, min_C1, vals_C1, counts_C1, lists_C1, nPerfectItem_C1, flushRate_C1, perfectItemCapacity_C1
24 | if cap_C1 <= 0:
25 | return
26 | if vals_C1.get(key) is not None:
27 | vals_C1[key] = value
28 | get_val_from_mem(key, aggHit)
29 | return
30 |
31 | # Flushing:
32 | if nPerfectItem_C1 >= int(cap_C1 * perfectItemCapacity_C1):
33 | # print("flushing!")
34 | for i in range(0, int(flushRate_C1 * cap_C1) + 1):
35 | evictKey = lists_C1.get(26)[0]
36 | lists_C1.get(26).remove(evictKey)
37 | vals_C1.pop(evictKey)
38 | counts_C1.pop(evictKey)
39 |
40 | nPerfectItem_C1 = len(lists_C1.get(26))
41 | if len(vals_C1) < cap_C1:
42 | min_C1 = aggHit
43 |
44 | # key allows to insert in the cache:
45 | if len(vals_C1) >= cap_C1:
46 | evictKey = lists_C1.get(min_C1)[0] # TODO: Use pop!!
47 | # print("lists_C1.get(min_C1 = " + str(min_C1) + ") = " + str(lists_C1.get(min_C1)))
48 | lists_C1.get(min_C1).remove(evictKey)
49 | try:
50 | vals_C1.pop(evictKey)
51 | except:
52 | print("KeyError when vals_C1.pop key =" + str(evictKey))
53 | print(vals_C1.keys())
54 | print("cap_C1 " + str(cap_C1))
55 | print(lists_C1.get(min_C1))
56 | exit(-1)
57 | try:
58 | counts_C1.pop(evictKey)
59 | except:
60 | print("KeyError when counts_C1.pop key =" + str(evictKey))
61 | exit(-1)
62 |
63 | # If the key is new, insert the value:
64 | vals_C1[key] = value
65 | counts_C1[key] = aggHit
66 |
67 | if lists_C1.get(aggHit) is None:
68 | lists_C1[aggHit] = []
69 | lists_C1 = dict(sorted(lists_C1.items()))
70 | # if (key in lists_C1[aggHit]):
71 | # print("aggHit = " + str(aggHit))
72 | # print(lists_C1[aggHit])
73 | # print("ERROR 1: key already in lists, no need to append " + key)
74 | # exit(-1)
75 | lists_C1.get(aggHit).append(key) # ========
76 |
77 | # Update minimum agghit
78 | if aggHit < min_C1:
79 | min_C1 = aggHit
80 | while (lists_C1.get(min_C1) is None) or len(lists_C1.get(min_C1)) == 0:
81 | min_C1 += 1
82 |
83 | def get_val_from_mem(key, aggHit): # Get From Mem
84 | global cap_C1, min_C1, vals_C1, counts_C1, lists_C1, nPerfectItem_C1, flushRate_C1, perfectItemCapacity_C1
85 | if vals_C1.get(key) is None:
86 | return None
87 | count = counts_C1.get(key)
88 | newCount = count
89 | if count < aggHit:
90 | newCount = aggHit
91 | counts_C1[key] = newCount
92 | lists_C1.get(count).remove(key)
93 |
94 | if count == min_C1:
95 | while (lists_C1.get(min_C1) is None) or len(lists_C1.get(min_C1)) == 0:
96 | min_C1 += 1
97 | if lists_C1.get(newCount) is None:
98 | lists_C1[newCount] = []
99 | lists_C1 = dict(sorted(lists_C1.items()))
100 | # if (key in lists_C1[newCount]):
101 | # print("newCount = " + str(newCount))
102 | # print(lists_C1[newCount])
103 | # print("ERROR 3: key already in lists, no need to append " + key)
104 | # exit(-1)
105 | lists_C1.get(newCount).append(key) # ========
106 | return vals_C1[key]
107 |
108 | def update(key, tableId, rowId, aggHit, nGroup):
109 | # Get value from EV-LFU cache
110 | val = get_val_from_mem(key, aggHit)
111 | if val is None:
112 | # On MISS
113 | # Get value from secondary storage
114 | # ADDING IF CONDITION HERE IS SLOW!
115 | # if (storage_manager.storage_type == storage_manager.EmbStorage.DUMMY):
116 | # Dummy storage will always use tableid + rowId because the data are stored in 26 tables
117 | val = storage_manager.get_val_from_storage(tableId, rowId)
118 | # else :
119 | # faster for rocksdb
120 | # val = storage_manager.get_val_from_storage_by_key(key) #only on rocksdb
121 | set(key, val, aggHit)
122 | return val
123 |
124 | def request_to_ev_lfu( group_rowIds, use_gpu = False):
125 | recordHitOrMiss = []
126 | group_keys = []
127 | missing_keys = []
128 | emb_weights = []
129 | aggHit = 0
130 | global cap_C1, min_C1, vals_C1, counts_C1, lists_C1, nPerfectItem_C1, flushRate_C1, perfectItemCapacity_C1
131 | for i, rowId in enumerate(group_rowIds):
132 |         # TableId starts at 1;
133 |         # the key for row 3 of table 1 is "1-3"
134 | key = str(i+1) + "-" + str(rowId)
135 | group_keys.append(key)
136 | if vals_C1.get(key) is not None:
137 | recordHitOrMiss.append(True)
138 | aggHit += 1
139 | else:
140 | missing_keys.append(key)
141 | recordHitOrMiss.append(False)
142 |
143 | # TODO: Get the missing keys from storage
144 | # missing_keys
145 |
146 | if (use_gpu):
147 |         # This code assumes we only run this on a single GPU node
148 | device = torch.device("cuda:0")
149 |
150 | for i, rowId in enumerate(group_rowIds):
151 |         # The tableId starts at 1 instead of 0
152 | val = update(group_keys[i], i + 1, rowId, aggHit, len(recordHitOrMiss)) # the data could either come from EV-LFU or MemStor or PyrocksDB
153 | # convert list of embedding values to tensor
154 | ev_tensor = torch.FloatTensor([val]) # val is a python list
155 | ev_tensor.requires_grad = True
156 | if (use_gpu):
157 | ev_tensor = ev_tensor.to(device)
158 | emb_weights.append(ev_tensor)
159 |
160 | if lists_C1.get(26) and not len(lists_C1.get(26)) == 0:
161 | nPerfectItem_C1 = len(lists_C1.get(26))
162 |
163 | return recordHitOrMiss, emb_weights
164 |
--------------------------------------------------------------------------------
/cache_algo/old_versions/EvLFU_C1_v1.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import sys
3 | sys.path.append('emb_storage')
4 | import storage_manager
5 | import random
6 |
7 | ###########################################EvLFU##########################################
8 | cap_C1 = -1
9 | min_C1 = 0
10 | vals_C1 = dict() # each value is [embedding value, agg_hit]
11 | lists_C1 = dict() # will group the keys based on the agg_hit (count)
12 | # flushing part:
13 | n_perfect_item_C1 = 0
14 | flush_rate_C1 = 0.4
15 | perfect_item_cap_C1 = 1.0
16 | max_perfect_item_C1 = 0
17 |
18 | def init(capacity):
19 | global lists_C1, max_perfect_item_C1, perfect_item_cap_C1, cap_C1
20 | cap_C1 = capacity
21 | # initializing the dict
22 | i = 0
23 | while (i <= 26):
24 | lists_C1[i] = []
25 | i += 1
26 | max_perfect_item_C1 = int(cap_C1 * perfect_item_cap_C1)
27 |
28 | # Inserting the NEW key
29 | def set(key, value, agg_hit):
30 | global cap_C1, min_C1, vals_C1, lists_C1, n_perfect_item_C1, max_perfect_item_C1, flush_rate_C1
31 |
32 | # Flushing:
33 | if n_perfect_item_C1 >= max_perfect_item_C1:
34 | print("flushing!")
35 | print("n_perfect_item_C1 = " + str(n_perfect_item_C1))
36 | print("max_perfect_item_C1 = " + str(max_perfect_item_C1))
37 | for i in range(0, int(flush_rate_C1 * cap_C1) + 1):
38 | key_to_evict = lists_C1[26].pop(0)
39 | vals_C1.pop(key_to_evict)
40 | # adjust the n_perfect_item counter
41 | n_perfect_item_C1 = len(lists_C1[26])
42 | else:
43 | # cache is full
44 | if len(vals_C1) >= cap_C1:
45 | # make a space for the new key
46 | while(lists_C1[min_C1] == []):
47 | # find the right key to pop
48 | # Update minimum agg_hit
49 | min_C1 += 1
50 | if (min_C1 > 26):
51 | min_C1 = 1
52 | key_to_evict = lists_C1[min_C1].pop(0)
53 | vals_C1.pop(key_to_evict)
54 |
55 | # insert the new value:
56 | vals_C1[key] = [value, agg_hit]
57 | lists_C1[agg_hit].append(key) # ========
58 |
59 | if agg_hit < min_C1:
60 | min_C1 = agg_hit
61 |
62 | def update_agg_hit(key, agg_hit): # Get From Mem
63 | global vals_C1, lists_C1
64 | ev_vals = vals_C1.get(key)
65 | if ev_vals is None:
66 | return None
67 | # old_agg_hit = ev_vals[1]
68 | if ev_vals[1] < agg_hit:
69 | # update the old agg_hit
70 | lists_C1[ev_vals[1]].remove(key)
71 | lists_C1[agg_hit].append(key) # ========
72 | vals_C1[key][1] = agg_hit
73 | # Increase the min_freq if the current lists freq is []
74 |     # Nope: the new agg_hit can jump, so nothing needs to change here
75 | return ev_vals[0]
76 |
77 | # Updating the existing keys and inserting the missing keys
78 | def update(key, table_id, row_id, agg_hit, missing_value = None):
79 | # TODO: This can be done in multi threaded way (on Java and C++)
80 | # Get value from EV-LFU cache
81 | val = update_agg_hit(key, agg_hit)
82 | if val:
83 | return val
84 | else:
85 | # On MISS: Get value from secondary storage
86 |         # DON'T put an "if" condition here! It is slow!
87 | if missing_value is None:
88 | # this key might be kicked out while inserting previous key
89 | missing_value = storage_manager.get_val_from_storage(table_id, row_id)
90 | # missing_value = storage_manager.get_val_from_storage_by_key(key) #only on rocksdb
91 | set(key, missing_value, agg_hit)
92 | return missing_value
93 |
94 | def request_to_ev_lfu( group_row_ids, use_gpu = False, approx_emb_thres = -1, ev_dim = 36):
95 | arr_record_hit = []
96 | arr_group_keys = []
97 | arr_missing_keys = []
98 | arr_missing_values = []
99 | arr_emb_weights = []
100 | pick_random_ev = False
101 | agg_hit = 0
102 | global vals_C1, lists_C1, n_perfect_item_C1
103 | for i, row_id in enumerate(group_row_ids):
104 |         # Table_id starts at 1;
105 |         # the key for row 3 of table 1 is "1-3"
106 | key = str(i+1) + "-" + str(row_id)
107 | arr_group_keys.append(key)
108 | if key in vals_C1.keys():
109 | arr_record_hit.append(True)
110 | agg_hit += 1
111 | else:
112 | arr_missing_keys.append([i+1, row_id])
113 | arr_record_hit.append(False)
114 |
115 | # Get all missing keys from storage at once
116 | arr_missing_values = storage_manager.get_arr_val_from_storage(arr_missing_keys)
117 |
118 | if (use_gpu):
119 |         # This code assumes we only run this on a single GPU node
120 | device = torch.device("cuda:0")
121 |
122 | # Update
123 | for i, row_id in enumerate(group_row_ids):
124 | # TODO: C++ and java code should do this in multithreaded way
125 |         # The table_id starts at 1 instead of 0
126 | if (arr_record_hit[i]):
127 | val = update(arr_group_keys[i], i + 1, row_id, agg_hit)
128 | else:
129 | # plug the missing values that we get from secondary storage
130 | val = update(arr_group_keys[i], i + 1, row_id, agg_hit, arr_missing_values.pop(0))
131 | # convert list of embedding values to tensor
132 | ev_tensor = torch.FloatTensor([val]) # val is a python list
133 | ev_tensor.requires_grad = True
134 | if (use_gpu):
135 | ev_tensor = ev_tensor.to(device)
136 | arr_emb_weights.append(ev_tensor)
137 |
138 | if agg_hit == 26:
139 | # update the number of perfect item
140 | n_perfect_item_C1 = len(lists_C1[26])
141 | return arr_record_hit, arr_emb_weights
142 |
--------------------------------------------------------------------------------
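
To make the bucket bookkeeping above concrete, here is a toy walk-through of the agg_hit idea shrunk from 26 tables to 4 (all keys and values are invented): a group request scores one point per cached key, and every cached key it touches is promoted to that group-level score if it is higher than the key's stored one.

```python
# key -> [embedding value, stored agg_hit], mirroring vals_C1 above
vals = {"1-3": [[0.1], 2], "2-7": [[0.2], 2], "4-9": [[0.3], 1]}

group_keys = ["1-3", "2-7", "3-5", "4-9"]      # one key per (toy) table
agg_hit = sum(k in vals for k in group_keys)   # 3 of the 4 keys are cached

# Promote every cached key in the group to bucket agg_hit when that is
# higher than its stored score (what update_agg_hit does above):
for k in group_keys:
    if k in vals and vals[k][1] < agg_hit:
        vals[k][1] = agg_hit

print(agg_hit, vals)   # "1-3", "2-7", "4-9" all now sit in bucket 3
```
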
/cache_algo/old_versions/LFU_v0.py:
--------------------------------------------------------------------------------
1 | import collections
2 | import torch
3 | import sys
4 | sys.path.append('emb_storage')
5 | import storage_manager
6 | cap = -1
7 | least_freq = 1
8 | node_for_freq = collections.defaultdict(collections.OrderedDict)
9 | node_for_key = dict()
10 |
11 | def init(capacity):
12 | global cap
13 | cap = capacity
14 |
15 | def _update( key, value):
16 | global node_for_key, node_for_freq, least_freq
17 | _, freq = node_for_key[key]
18 | node_for_freq[freq].pop(key)
19 | if len(node_for_freq[least_freq]) == 0:
20 | least_freq += 1
21 | node_for_freq[freq+1][key] = (value, freq+1)
22 | node_for_key[key] = (value, freq+1)
23 |
24 | def set( key, value):
25 | global node_for_key, node_for_freq, cap, least_freq
26 | if (len(node_for_key) >= cap):
27 | # evict 1 item
28 | removed = node_for_freq[least_freq].popitem(last=False)
29 | node_for_key.pop(removed[0])
30 | # Insert the new item
31 | node_for_key[key] = (value,1)
32 | node_for_freq[1][key] = (value,1)
33 |
34 | def request(key, table_id, row_id):
35 | global node_for_key, node_for_freq
36 | if key in node_for_key:
37 | value = node_for_key[key][0]
38 | # Update item's frequency
39 | _update(key, value)
40 | return value, True
41 | else:
42 | # MISS: get value from secondary storage
43 | value = storage_manager.get_val_from_storage(table_id, row_id)
44 | set(key, value)
45 | return value, False
46 |
47 | # Multi keys request
48 | def request_to_lfu( group_row_ids, use_gpu = False):
49 | arr_record_hit = []
50 | arr_emb_weights = []
51 | agg_hit = 0
52 | if (use_gpu):
53 |         # This code assumes we only run this on a single GPU node
54 | device = torch.device("cuda:0")
55 |
56 | for i, row_id in enumerate(group_row_ids):
57 |         # Table_id starts at 1;
58 |         # the key for row 3 of table 1 is "1-3"
59 | key = str(i+1) + "-" + str(row_id)
60 | val, is_hit = request(key, i+1, row_id)
61 | # convert list of embedding values to tensor
62 | ev_tensor = torch.FloatTensor([val]) # val is a python list
63 | ev_tensor.requires_grad = True
64 | if (use_gpu):
65 |             # This code assumes we only run this on a single GPU node
66 | ev_tensor = ev_tensor.to(device)
67 | arr_emb_weights.append(ev_tensor)
68 | if is_hit:
69 | arr_record_hit.append(True)
70 | agg_hit += 1
71 | else:
72 | arr_record_hit.append(False)
73 |
74 | return arr_record_hit, arr_emb_weights
75 |
--------------------------------------------------------------------------------
/cache_algo/old_versions/LFU_v1.py:
--------------------------------------------------------------------------------
1 | import collections
2 | import torch
3 | import sys
4 | sys.path.append('emb_storage')
5 | import storage_manager
6 | cap = -1
7 | least_freq = 1
8 | # to store the frequency of the keys
9 | node_for_freq = dict()
10 | # to search the key within all the cached keys
11 | node_for_key = dict()
12 |
13 | def init(capacity):
14 | global cap
15 | cap = capacity
16 | node_for_freq[1] = collections.OrderedDict() # For the frequency == 1
17 |
18 | def _update( key, value, freq):
19 | # increment the frequency
20 | global node_for_key, node_for_freq, least_freq
21 | # remove the key from the old frequency
22 | node_for_freq[freq].pop(key)
23 |
24 | if len(node_for_freq[least_freq]) == 0:
25 | # update the least_freq if there is no more item in this frequency list
26 | least_freq += 1
27 | node_for_freq.pop(least_freq)
28 |
29 | # update frequency
30 | node_for_key[key][1] = freq + 1
31 | if ((freq + 1) not in node_for_freq.keys()):
32 | node_for_freq[freq + 1] = collections.OrderedDict()
33 | node_for_freq[freq + 1][key] = ""
34 |
35 | def set( key, value):
36 | global node_for_key, node_for_freq, cap, least_freq
37 |
38 | # check if full
39 | if (len(node_for_key) >= cap):
40 | # evict 1 item
41 | key_to_remove = node_for_freq[least_freq].popitem(last=False)
42 | node_for_key.pop(key_to_remove)
43 |
44 | # Insert the new item
45 | node_for_key[key] = [value, 1]
46 | node_for_freq[1][key] = ""
47 |
48 | # update least freq
49 | least_freq = 1
50 |
51 | def request(key, table_id, row_id):
52 | global node_for_key, node_for_freq
53 | # check if the key is cached
54 | if key in node_for_key:
55 | # Yes, get the value
56 | value, freq = node_for_key[key]
57 | # Update item's frequency
58 | _update(key, value, freq)
59 | return value, True
60 | else:
61 | # MISS: get value from secondary storage
62 | value = storage_manager.get_val_from_storage(table_id, row_id)
63 | set(key, value)
64 | return value, False
65 |
66 | # Multi keys request
67 | def request_to_lfu( group_row_ids, use_gpu = False):
68 | arr_record_hit = []
69 | arr_emb_weights = []
70 | agg_hit = 0
71 | if (use_gpu):
72 |         # This code assumes we only run this on a single GPU node
73 | device = torch.device("cuda:0")
74 |
75 | for i, row_id in enumerate(group_row_ids):
76 |         # Table_id starts at 1;
77 |         # the key for row 3 of table 1 is "1-3"
78 | key = str(i+1) + "-" + str(row_id)
79 | val, is_hit = request(key, i+1, row_id)
80 | # convert list of embedding values to tensor
81 | ev_tensor = torch.FloatTensor([val]) # val is a python list
82 | ev_tensor.requires_grad = True
83 | if (use_gpu):
84 |             # This code assumes we only run this on a single GPU node
85 | ev_tensor = ev_tensor.to(device)
86 | arr_emb_weights.append(ev_tensor)
87 | if is_hit:
88 | arr_record_hit.append(True)
89 | agg_hit += 1
90 | else:
91 | arr_record_hit.append(False)
92 |
93 | return arr_record_hit, arr_emb_weights
94 |
95 |
--------------------------------------------------------------------------------
/cache_algo/old_versions/LFU_v2.py:
--------------------------------------------------------------------------------
1 | import collections
2 | import torch
3 | import sys
4 | sys.path.append('emb_storage')
5 | import storage_manager
6 | cap = -1
7 | least_freq = 1
8 | # to store the frequency of the keys
9 | node_for_freq = dict()
10 | # to search the key within all the cached keys
11 | node_for_key = dict()
12 |
13 | def init(capacity):
14 | global cap
15 | cap = capacity
16 | node_for_freq[1] = [] # For the frequency == 1
17 |
18 | def _update( key, value, freq):
19 | # increment the frequency
20 | global node_for_key, node_for_freq, least_freq
21 | # remove the key from the old frequency
22 | node_for_freq[freq].remove(key)
23 |
24 | if len(node_for_freq[least_freq]) == 0:
25 | # update the least_freq if there is no more item in this frequency list
26 | node_for_freq.pop(least_freq) # remove this empty freq list, to save memory
27 | least_freq += 1
28 |
29 | # update frequency
30 | node_for_key[key][1] = freq + 1
31 | if ((freq + 1) not in node_for_freq.keys()):
32 | node_for_freq[freq + 1] = []
33 | node_for_freq[freq + 1].append(key)
34 |
35 | def set( key, value):
36 | global node_for_key, node_for_freq, cap, least_freq
37 |
38 | # check if full
39 | if (len(node_for_key) >= cap):
40 | # evict 1 item
41 | key_to_remove = node_for_freq[least_freq].pop(0)
42 | node_for_key.pop(key_to_remove)
43 |
44 | # Insert the new item
45 | node_for_key[key] = [value, 1]
46 | node_for_freq[1].append(key)
47 |
48 | # update least freq
49 | least_freq = 1
50 |
51 | def request(key, table_id, row_id):
52 | global node_for_key, node_for_freq
53 | # check if the key is cached
54 | if key in node_for_key:
55 | # Yes, get the value
56 | value, freq = node_for_key[key]
57 | # Update item's frequency
58 | _update(key, value, freq)
59 | return value, True
60 | else:
61 | # MISS: get value from secondary storage
62 | value = storage_manager.get_val_from_storage(table_id, row_id)
63 | set(key, value)
64 | return value, False
65 |
66 | # Multi keys request
67 | def request_to_lfu( group_row_ids, use_gpu = False):
68 | arr_record_hit = []
69 | arr_emb_weights = []
70 | agg_hit = 0
71 | if (use_gpu):
72 |         # This code assumes we only run this on a single GPU node
73 | device = torch.device("cuda:0")
74 |
75 | for i, row_id in enumerate(group_row_ids):
76 | # Table_id is started at 1
77 | # Key for row3 of table1 is 1-3
78 | key = str(i+1) + "-" + str(row_id)
79 | val, is_hit = request(key, i+1, row_id)
80 | # convert list of embedding values to tensor
81 | ev_tensor = torch.FloatTensor([val]) # val is a python list
82 | ev_tensor.requires_grad = True
83 | if (use_gpu):
84 | # This code assume that we only run this on a single GPU node
85 | ev_tensor = ev_tensor.to(device)
86 | arr_emb_weights.append(ev_tensor)
87 | if is_hit:
88 | arr_record_hit.append(True)
89 | agg_hit += 1
90 | else:
91 | arr_record_hit.append(False)
92 |
93 | return arr_record_hit, arr_emb_weights
94 |
95 |
--------------------------------------------------------------------------------
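
The file above implements LFU with two dicts: node_for_key maps each cached key to its [value, frequency] pair, while node_for_freq buckets keys by frequency (FIFO within a bucket, so eviction ties are broken by age) and least_freq tracks the lowest occupied bucket, making eviction O(1). A self-contained sketch of the same scheme follows; MiniLFU and everything in it are a hypothetical illustration, with no storage_manager fallback on a miss:

# Hypothetical, self-contained sketch of the same two-dict LFU scheme
# (no secondary-storage fallback; put() is for new keys only, as with set() above).
class MiniLFU:
    def __init__(self, capacity):
        self.cap = capacity
        self.vals = {}          # key -> [value, freq]
        self.buckets = {1: []}  # freq -> keys at that freq (FIFO = age order)
        self.least = 1          # smallest frequency present in buckets

    def _bump(self, key, freq):
        # move key from the freq bucket to the freq+1 bucket
        self.buckets[freq].remove(key)
        if not self.buckets[freq]:
            del self.buckets[freq]
            if self.least == freq:
                self.least = freq + 1
        self.vals[key][1] = freq + 1
        self.buckets.setdefault(freq + 1, []).append(key)

    def get(self, key):
        if key not in self.vals:
            return None
        value, freq = self.vals[key]
        self._bump(key, freq)
        return value

    def put(self, key, value):
        if len(self.vals) >= self.cap:
            # evict the oldest key in the least-frequent bucket
            victim = self.buckets[self.least].pop(0)
            if not self.buckets[self.least]:
                del self.buckets[self.least]
            del self.vals[victim]
        self.vals[key] = [value, 1]
        self.buckets.setdefault(1, []).append(key)
        self.least = 1  # a new key always has the lowest frequency

cache = MiniLFU(2)
cache.put("1-3", [0.1] * 36)
cache.put("2-7", [0.2] * 36)
cache.get("1-3")                 # "1-3" now has frequency 2
cache.put("2-9", [0.3] * 36)     # evicts "2-7" (frequency 1)
assert cache.get("2-7") is None
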
/cache_algo/old_versions/LRU_v0.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import torch
3 | import sys
4 | sys.path.append('emb_storage')
5 | import storage_manager
6 | from functools import lru_cache
7 |
8 | cap = -1
9 | vals = dict() # key -> cached embedding value (a python list)
10 | lru_list = [] # keys ordered from least to most recently used
11 |
12 | def init(capacity):
13 | global cap
14 | cap = capacity
15 |
16 | # Inserting the NEW key
17 | def set(key, value):
18 | global vals, lru_list, cap
19 |
20 | if (len(lru_list) >= cap):
21 | # evicting the first key in LRU list
22 | vals.pop(lru_list.pop(0))
23 |
24 | # inserting new key
25 | vals[key] = value
26 | lru_list.append(key)
27 |
28 | # single key request
29 | def request(key, table_id, row_id):
30 | global vals, lru_list
31 | is_hit = False
32 | value = vals.get(key)
33 |     if value is None:
34 | # MISS: get value from secondary storage
35 | value = storage_manager.get_val_from_storage(table_id, row_id)
36 | set(key, value)
37 | else:
38 | # update the position; put it in the back of the Q
39 | lru_list.remove(key)
40 | lru_list.append(key)
41 | is_hit = True
42 | return value, is_hit
43 |
44 | @lru_cache(maxsize=5000)
45 | def request_memoization(key, table_id, row_id):
46 |     # body runs only on an lru_cache miss: fetch from secondary storage
47 | value = storage_manager.get_val_from_storage(table_id, row_id)
48 | set(key, value)
49 | return value, False
50 |
51 | # Multi-key request
52 | def request_to_lru(group_row_ids, use_gpu=False):
53 |     arr_record_hit = []
54 |     arr_emb_weights = []
55 |     agg_hit = 0
56 |     if use_gpu:
57 |         # This code assumes we only run on a single-GPU node
58 |         device = torch.device("cuda:0")
59 |
60 |     for i, row_id in enumerate(group_row_ids):
61 |         # Table IDs start at 1;
62 |         # e.g., the key for row 3 of table 1 is "1-3"
63 |         key = str(i+1) + "-" + str(row_id)
64 |         # val, is_hit = request_memoization(key, i+1, row_id)
65 |         val, is_hit = request(key, i+1, row_id)
66 |         # convert the list of embedding values to a tensor
67 |         ev_tensor = torch.FloatTensor([val]) # val is a python list
68 |         ev_tensor.requires_grad = True
69 |         if use_gpu:
70 |             # This code assumes we only run on a single-GPU node
71 |             ev_tensor = ev_tensor.to(device)
72 |         arr_emb_weights.append(ev_tensor)
73 |         if is_hit:
74 |             arr_record_hit.append(True)
75 |             agg_hit += 1
76 |         else:
77 |             arr_record_hit.append(False)
78 |
79 |     return arr_record_hit, arr_emb_weights
80 |
81 |
--------------------------------------------------------------------------------
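
LRU_v0 keeps recency in a plain Python list, so every hit pays an O(n) lru_list.remove(key). The same policy can be sketched with collections.OrderedDict, whose move_to_end and popitem run in O(1); MiniLRU below is a hypothetical stand-in with no secondary-storage fallback, not the repo's implementation:

# Hypothetical sketch of the same LRU policy with an O(1) recency update.
from collections import OrderedDict

class MiniLRU:
    def __init__(self, capacity):
        self.cap = capacity
        self.vals = OrderedDict()  # least recently used entry first

    def get(self, key):
        if key not in self.vals:
            return None
        self.vals.move_to_end(key)  # mark as most recently used
        return self.vals[key]

    def put(self, key, value):
        # new keys only, as with set() above
        if len(self.vals) >= self.cap:
            self.vals.popitem(last=False)  # evict the least recently used key
        self.vals[key] = value

cache = MiniLRU(2)
cache.put("1-3", [0.1] * 36)
cache.put("2-7", [0.2] * 36)
cache.get("1-3")               # "1-3" becomes most recent
cache.put("2-9", [0.3] * 36)   # evicts "2-7"
assert cache.get("2-7") is None
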
/cython/cython_compile.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates.
2 | #
3 | # This source code is licensed under the MIT license found in the
4 | # LICENSE file in the root directory of this source tree.
5 | #
6 | # Description: compile .so from python code
7 |
8 | from __future__ import absolute_import, division, print_function, unicode_literals
9 |
10 | from setuptools import setup
11 | from Cython.Build import cythonize
12 | from distutils.extension import Extension
13 |
14 | ext_modules = [
15 | Extension(
16 | "data_utils_cython",
17 | ["data_utils_cython.pyx"],
18 | extra_compile_args=['-O3'],
19 | extra_link_args=['-O3'],
20 | )
21 | ]
22 |
23 | setup(
24 | name='data_utils_cython',
25 | ext_modules=cythonize(ext_modules)
26 | )
27 |
--------------------------------------------------------------------------------
/cython/cython_criteo.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) Facebook, Inc. and its affiliates.
2 | #
3 | # This source code is licensed under the MIT license found in the
4 | # LICENSE file in the root directory of this source tree.
5 | #
6 | # Description: run dataset pre-processing in standalone mode
7 | # WARNING: These steps are required to work with Cython
8 | # 1. Install Cython
9 | #    > sudo yum install Cython
10 | # 2. Copy data_utils.py to data_utils_cython.pyx
11 | # 3. Compile data_utils_cython.pyx to generate the .so
12 | #    (it's important to keep the .pyx extension rather than .py
13 | #     so that the C/C++ .so, not the .py, is loaded at import time)
14 | #    > python cython_compile.py build_ext --inplace
15 | #    This should create data_utils_cython.so, which can be loaded below with "import"
16 | # 4. Run standalone dataset preprocessing to generate .npz files
17 | # a. Kaggle
18 | # > python cython_criteo.py --data-set=kaggle --raw-data-file=./input/train.txt
19 | # --processed-data-file=./input/kaggleAdDisplayChallenge_processed.npz
20 | # b. Terabyte
21 | # > python cython_criteo.py --max-ind-range=10000000 [--memory-map] --data-set=terabyte
22 | # --raw-data-file=./input/day --processed-data-file=./input/terabyte_processed.npz
23 |
24 | from __future__ import absolute_import, division, print_function, unicode_literals
25 |
26 | import data_utils_cython as duc
27 |
28 | if __name__ == "__main__":
29 | ### import packages ###
30 | import argparse
31 |
32 | ### parse arguments ###
33 | parser = argparse.ArgumentParser(
34 | description="Preprocess Criteo dataset"
35 | )
36 | # model related parameters
37 | parser.add_argument("--max-ind-range", type=int, default=-1)
38 | parser.add_argument("--data-sub-sample-rate", type=float, default=0.0) # in [0, 1]
39 | parser.add_argument("--data-randomize", type=str, default="total") # or day or none
40 | parser.add_argument("--memory-map", action="store_true", default=False)
41 | parser.add_argument("--data-set", type=str, default="kaggle") # or terabyte
42 | parser.add_argument("--raw-data-file", type=str, default="")
43 | parser.add_argument("--processed-data-file", type=str, default="")
44 | args = parser.parse_args()
45 |
46 | duc.loadDataset(
47 | args.data_set,
48 | args.max_ind_range,
49 | args.data_sub_sample_rate,
50 | args.data_randomize,
51 | "train",
52 | args.raw_data_file,
53 | args.processed_data_file,
54 | args.memory_map
55 | )
56 |
--------------------------------------------------------------------------------
/emb_storage/file_read.py:
--------------------------------------------------------------------------------
1 | import os
2 | import torch
3 | import struct
4 |
5 | BINARY_DIR_NAME = "binary/"
6 | arr_files = []
7 | TOTAL_BYTE_PER_ROW = -1
8 | EV_DIMENSION = 36
9 |
10 | # Load value as bytes!!
11 | def open_files_as_binary(ev_path_c1, bit_precision = 32):
12 | global arr_files, TOTAL_BYTE_PER_ROW
13 | BYTE_PRECISION = int(bit_precision/8)
14 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION
15 |
16 | print("**************** Opening all Binary EV-files")
17 | print("**************** from = " + ev_path_c1)
18 | arr_files.append("ID Zero is not being used!")
19 | for ev_idx in range(0, 26):
20 | binFilename = "ev-table-" + str(ev_idx + 1) + ".bin"
21 | bin_ev_path = os.path.join(ev_path_c1, BINARY_DIR_NAME, binFilename)
22 |         print("************* Opening Binary EV = " + bin_ev_path)
23 | arr_files.append(open(bin_ev_path, 'rb'))
24 | print("**************** All Files are opened!")
25 | print("**************** TOTAL_BYTE_PER_ROW = " + str(TOTAL_BYTE_PER_ROW))
26 |
27 | def get(tableId, rowId):
28 | # tableId started at id = 1
29 | file = arr_files[tableId]
30 | # print(TOTAL_BYTE_PER_ROW * rowId )
31 | file.seek(TOTAL_BYTE_PER_ROW * rowId)
32 | blob = file.read(TOTAL_BYTE_PER_ROW)
33 |     return struct.unpack('f' * EV_DIMENSION, blob)
34 |
35 | def close():
36 | arr_files.pop(0) # this item0 is not really a file
37 | for file in arr_files:
38 | file.close()
39 | print("**************** All Files are closed!")
40 |
--------------------------------------------------------------------------------
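
get() above relies on every row being a fixed-width record: at 32-bit precision and EV_DIMENSION = 36, a row occupies 36 * 4 = 144 bytes, so row i starts at byte offset 144 * i. A minimal sketch of that layout (the /tmp path and fill values are hypothetical):

# Hypothetical sketch: write two fixed-width rows, then seek back to row 1.
import struct

EV_DIMENSION = 36
ROW_BYTES = EV_DIMENSION * 4  # float32

with open("/tmp/ev-table-demo.bin", "wb") as f:
    f.write(struct.pack("f" * EV_DIMENSION, *([0.5] * EV_DIMENSION)))  # row 0
    f.write(struct.pack("f" * EV_DIMENSION, *([1.5] * EV_DIMENSION)))  # row 1

with open("/tmp/ev-table-demo.bin", "rb") as f:
    f.seek(ROW_BYTES * 1)  # row i starts at byte ROW_BYTES * i
    row = struct.unpack("f" * EV_DIMENSION, f.read(ROW_BYTES))
assert row[0] == 1.5
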
/emb_storage/mmap_file_read.py:
--------------------------------------------------------------------------------
1 | import os
2 | import torch
3 | import struct
4 | import mmap
5 |
6 | BINARY_DIR_NAME = "binary/"
7 | arr_files = []
8 | arr_mmap_files = []
9 | TOTAL_BYTE_PER_ROW = -1
10 | EV_DIMENSION = 36
11 |
12 | # Load value as bytes!!
13 | def open_files_as_binary(ev_path_c1, bit_precision = 32):
14 | global arr_files, TOTAL_BYTE_PER_ROW
15 | BYTE_PRECISION = int(bit_precision/8)
16 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION
17 |
18 | print("**************** Opening all Binary EV-files")
19 | print("**************** from = " + ev_path_c1)
20 | arr_files.append("ID Zero is not being used!")
21 | arr_mmap_files.append("ID Zero is not being used!")
22 | for ev_idx in range(0, 26):
23 | binFilename = "ev-table-" + str(ev_idx + 1) + ".bin"
24 | bin_ev_path = os.path.join(ev_path_c1, BINARY_DIR_NAME, binFilename)
25 |         print("************* Opening Binary EV = " + bin_ev_path)
26 | file = open(bin_ev_path, 'rb')
27 | arr_files.append(file)
28 | arr_mmap_files.append(mmap.mmap(file.fileno(), 0, prot=mmap.PROT_READ))
29 | print("**************** All Files are opened!")
30 | print("**************** TOTAL_BYTE_PER_ROW = " + str(TOTAL_BYTE_PER_ROW))
31 |
32 | def get(tableId, rowId):
33 | # tableId started at id = 1
34 | # file = arr_files[tableId]
35 | file = arr_mmap_files[tableId]
36 | # print(TOTAL_BYTE_PER_ROW * rowId )
37 | file.seek(TOTAL_BYTE_PER_ROW * rowId)
38 | blob = file.read(TOTAL_BYTE_PER_ROW)
39 | # return struct.unpack('f'*36, blob[0:TOTAL_BYTE_PER_ROW])
40 |     return struct.unpack('f' * EV_DIMENSION, blob)
41 |
42 | def close():
43 |     # item0 in each list is a placeholder string, not a real file/mmap
44 |     for mm in arr_mmap_files[1:]:
45 |         mm.close()
46 |     for file in arr_files[1:]:
47 |         file.close()
48 |     print("**************** All Files are closed!")
--------------------------------------------------------------------------------
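
Since an mmap object also supports slicing, get() could read mm[offset:offset + TOTAL_BYTE_PER_ROW] instead of seek()+read(), which avoids sharing a file position between callers. A hypothetical, Unix-only sketch (mmap.PROT_READ, as used above):

# Hypothetical sketch: stateless row lookup via mmap slicing.
import mmap
import struct

EV_DIMENSION = 36
ROW_BYTES = EV_DIMENSION * 4  # float32

with open("/tmp/ev-table-demo.bin", "wb") as f:
    for fill in (0.5, 1.5):  # two rows
        f.write(struct.pack("f" * EV_DIMENSION, *([fill] * EV_DIMENSION)))

with open("/tmp/ev-table-demo.bin", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    off = ROW_BYTES * 1  # row 1
    row = struct.unpack("f" * EV_DIMENSION, mm[off:off + ROW_BYTES])
    mm.close()
assert row[0] == 1.5
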
/emb_storage/multi_storage_dummy/socket-server.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | import argparse
3 | import socket
4 | import pandas as pd
5 | import os
6 | import torch
7 | import struct
8 |
9 | parser = argparse.ArgumentParser(description="EvLFU server")
10 | parser.add_argument("--port", type=int, default=8000)
11 | parser.add_argument("--ev-path", type=str, default="")
12 | args = parser.parse_args()
13 |
14 | # Call EvLFU service through socket
15 | HOST = '127.0.0.1' # Standard loopback interface address (localhost)
16 | PORT = args.port # 65432 # Port to listen on (non-privileged ports are > 1023)
17 | MAX_BUFFER = 1024
18 | BINARY_DIR_NAME = "binary/"
19 | TOTAL_EV_TABLE = 26
20 | EV_DIMENSION = 36
21 |
22 | # This is ROCKSDB client or dummyMemStor client
23 |
24 | EvTable_C1 = []
25 |
26 | # Load value as bytes!!
27 | def load(ev_path_c1, bit_precision = 32):
28 |     # Values are stored as raw binary blobs, one per row
29 | print("**************** Loading EV Table to DummyMemoryStorage")
30 | print("**************** Load new set of EV Table from = " + ev_path_c1)
31 | global EvTable_C1
32 | EvTable_C1.append("Buffer: table0 is not used")
33 | BYTE_PRECISION = int(bit_precision/8)
34 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION
35 |
36 | for ev_idx in range(0, 26):
37 | binFilename = "ev-table-" + str(ev_idx + 1) + ".bin"
38 | bin_ev_path = os.path.join(ev_path_c1, BINARY_DIR_NAME, binFilename)
39 |         print("************* Loading Binary EV = " + bin_ev_path)
40 |
41 | curr_table = []
42 | # put
43 | with open(bin_ev_path, 'rb') as f:
44 | data = f.read()
45 | num_of_indexes = len(data) // TOTAL_BYTE_PER_ROW
46 | for i in range(0, num_of_indexes):
47 | # put
48 | byte_offset = BYTE_PRECISION * i * EV_DIMENSION # 36 -> dimension
49 | blob = data[ byte_offset : byte_offset + TOTAL_BYTE_PER_ROW]
50 | curr_table.append(blob)
51 |
52 | # Try reading the blob
53 | # print(struct.unpack('f'*36, blob[0:144]))
54 | f.close()
55 | EvTable_C1.append(curr_table)
56 | print("**************** All EvTable loaded in the Memory!")
57 |
58 | def load_as_list(ev_path_c1):
59 | # We are still storing it as array of floats. TODO: Store it as binary!
60 | print("**************** Loading EV Table to DummyMemoryStorage")
61 | print("**************** Load new set of EV Table from = " + ev_path_c1)
62 | global EvTable_C1
63 | EvTable_C1.append("Buffer: table0 is not used")
64 | for ev_idx in range(0, 26):
65 | # Read new EV Table from file
66 | ev_path = os.path.join(ev_path_c1,
67 | "ev-table-" + str(ev_idx + 1) + ".csv")
68 | print("********************* Loading EV = " + ev_path)
69 | new_ev_df = pd.read_csv(ev_path, dtype=float, delimiter=',')
70 | # Convert to numpy first before to tensor
71 | new_ev_arr = new_ev_df.to_numpy()
72 | # Convert to tensor
73 | # Option 1: Store it as numpy array (Slower for reading)
74 | # EvTable_C1[ev_idx + 1] = new_ev_arr
75 | # Option 2: Store it as pure python list
76 | EvTable_C1.append(new_ev_arr.tolist())
77 |         # break  # NOTE: enabling this loads only the first table
78 | print("**************** All EvTable loaded in the Memory!")
79 |
80 | def get(tableId, rowId):
81 | # tableId started at id = 1
82 | global EvTable_C1
83 | return EvTable_C1[tableId][rowId]
84 |
85 | def get_many(arrTableId, arrRowId):
86 | # tableId started at id = 1
87 | global EvTable_C1
88 | arrVal = []
89 | for i in range(len(arrTableId)):
90 | arrVal.append(EvTable_C1[arrTableId[i]][arrRowId[i]])
91 | # return array of values
92 | return arrVal
93 |
94 | def listen():
95 | print("This server is ready to look up the ev-value based on the key!")
96 | print("Start listening at port: " + str(args.port))
97 | with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
98 | s.bind((HOST, PORT))
99 | s.listen()
100 | conn, addr = s.accept()
101 | with conn:
102 | print('Connected to client at: ', addr)
103 | while True:
104 |                 buf = conn.recv(MAX_BUFFER)
105 |                 if not buf:
106 |                     break  # client closed the connection
107 |                 keys = str(buf, 'utf8').split('\n')
108 |                 # print("keys: " + str(keys))
109 |                 for key in keys:
110 |                     tableId, rowId = key.split('-', 1)
111 |                     val = get(int(tableId), int(rowId))
112 |                     conn.sendall(val)
113 |                 # print("Done sending the values of " + str(keys))
114 |
115 | # tableId, rowId = str(buf, 'utf8').split('-', 2)
116 | # val = get(int(tableId), int(rowId))
117 | # print(val)
118 | # print(struct.unpack('f'*36, val[0:144]))
119 | # conn.sendall(val)
120 |
121 | if __name__=="__main__":
122 | load(args.ev_path)
123 | listen()
124 |
--------------------------------------------------------------------------------
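
The wire protocol is implicit in listen(): a client sends newline-joined "tableId-rowId" keys, and the server replies with one raw 144-byte blob (36 float32 values) per key, in order, with no length framing. A hypothetical client sketch; because TCP is a byte stream, it loops on recv() until the full payload has arrived:

# Hypothetical client for the server above (assumes the server was started
# with a loaded binary EV path, e.g. on port 8000).
import socket
import struct

EV_DIMENSION = 36
ROW_BYTES = EV_DIMENSION * 4  # float32

def fetch(keys, host="127.0.0.1", port=8000):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, port))
        s.sendall("\n".join(keys).encode("utf8"))
        need = ROW_BYTES * len(keys)
        buf = b""
        while len(buf) < need:  # recv() may deliver the reply in pieces
            chunk = s.recv(4096)
            if not chunk:
                raise ConnectionError("server closed before sending all rows")
            buf += chunk
        # split the concatenated blobs back into one float tuple per key
        return [struct.unpack("f" * EV_DIMENSION, buf[i:i + ROW_BYTES])
                for i in range(0, need, ROW_BYTES)]

rows = fetch(["1-3", "2-7"])
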
/emb_storage/storage_dummy.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import os
3 | import torch
4 | import struct
5 | import sys
6 | sys.path.append('../../')
7 |
8 | import evstore_utils
9 | import storage_manager
10 |
11 | EvTable_C1 = []
12 | MAX_BUFFER = 256
13 | BINARY_DIR_NAME = "binary/"
14 | TOTAL_EV_TABLE = 26
15 | EV_DIMENSION = 36
16 |
17 |
18 | def load(ev_path_c1):
19 | # return load_as_binary(ev_path_c1)
20 | return load_as_list(ev_path_c1)
21 |
22 | def get(tableId, rowId):
23 | # tableId started at id = 1
24 | # return get_as_binary(tableId, rowId)
25 | return get_as_list(tableId, rowId)
26 |
27 | def get_nrows_pertable(file_path):
28 | _, _, _, ln_emb, _ = evstore_utils.read_training_config(file_path)
29 | return ln_emb
30 |
31 | # Load value as bytes!!
32 | def load_as_binary(ev_path_c1, bit_precision = 32):
33 | print("**************** Loading EV Table to DummyMemoryStorage")
34 | print("**************** Load new set of EV Table from = " + ev_path_c1)
35 | global EvTable_C1
36 | EvTable_C1.append("Buffer: table0 is not used")
37 | BYTE_PRECISION = int(bit_precision/8)
38 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION
39 | ln_emb = get_nrows_pertable(storage_manager.training_config_path)
40 |
41 | for ev_idx in range(0, 26):
42 | binFilename = "ev-table-" + str(ev_idx + 1) + ".bin"
43 | bin_ev_path = os.path.join(ev_path_c1, BINARY_DIR_NAME, binFilename)
44 |         print("************* Loading Binary EV = " + bin_ev_path)
45 |
46 | curr_table = []
47 | # put
48 | with open(bin_ev_path, 'rb') as f:
49 | data = f.read()
50 | num_of_indexes = len(data) // TOTAL_BYTE_PER_ROW
51 | assert(ln_emb[ev_idx] == num_of_indexes)
52 | for i in range(0, num_of_indexes):
53 | # put
54 | byte_offset = BYTE_PRECISION * i * EV_DIMENSION # 36 -> dimension
55 | blob = data[ byte_offset : byte_offset + TOTAL_BYTE_PER_ROW]
56 | curr_table.append(blob)
57 |
58 | # Try reading the blob
59 | # print(struct.unpack('f'*36, blob[0:144]))
60 | f.close()
61 | EvTable_C1.append(curr_table)
62 | print("**************** All EvTable loaded in the Memory!")
63 |
64 | def get_as_binary(tableId, rowId):
65 | # tableId started at id = 1
66 | global EvTable_C1
67 | blob = EvTable_C1[tableId][rowId]
68 | return struct.unpack('f'*36, blob)
69 |
70 | def load_as_list(ev_path_c1):
71 | print("**************** Loading EV Table to DummyMemoryStorage")
72 | print("**************** Load new set of EV Table from = " + ev_path_c1)
73 | global EvTable_C1
74 | EvTable_C1.append("Buffer: table0 is not used")
75 | for ev_idx in range(0, 26):
76 | # Read new EV Table from file
77 | ev_path = os.path.join(ev_path_c1,
78 | "ev-table-" + str(ev_idx + 1) + ".csv")
79 | print("********************* Loading EV = " + ev_path)
80 | new_ev_df = pd.read_csv(ev_path, dtype=float, delimiter=',')
81 | # Convert to numpy first before to tensor
82 | new_ev_arr = new_ev_df.to_numpy()
83 | # Convert to tensor
84 | # Option 1: Store it as numpy array (Slower for reading)
85 | # EvTable_C1[ev_idx + 1] = new_ev_arr
86 | # Option 2: Store it as pure python list
87 | EvTable_C1.append(new_ev_arr.tolist())
88 | print("**************** All EvTable loaded in the Memory!")
89 |
90 | def get_as_list(tableId, rowId):
91 | # tableId started at id = 1
92 | global EvTable_C1
93 | # print(EvTable_C1[tableId][rowId])
94 | # exit()
95 | return EvTable_C1[tableId][rowId]
96 |
--------------------------------------------------------------------------------
/emb_storage/storage_rocksdb.py:
--------------------------------------------------------------------------------
1 | import pyrocksdb
2 | import time
3 | import os
4 | import pandas as pd
5 | import argparse
6 | import struct
7 | import numpy as np
8 | from array import *
9 | from tqdm import tqdm
10 | import torch
11 | import shutil
12 | from pathlib import Path
13 | import sys
14 | sys.path.append('../../')
15 |
16 | import evstore_utils
17 | import storage_manager
18 |
19 | ROCKSDB_DB_DIR = "/mnt/extra/db-ev-storage/rocksdb/"
20 | BINARY_DIR_NAME = "binary/"
21 | TOTAL_EV_TABLE = 26
22 | EV_DIMENSION = 36
23 |
24 | class RocksDBClient:
25 |
26 | # will read the BINARY values from the rocksdb
27 | def get(self, tableId, rowId):
28 |         # TableId starts at index 1
29 |         # assert(tableId >= 1)
30 |         # assert(tableId <= TOTAL_EV_TABLE)
31 |         # keys are stored as "tableId-rowId" strings in one shared DB
32 | blob = self.db_conn.get(self.read_opts, str(tableId) + "-" + str(rowId))
33 | # convert to float list
34 | # return struct.unpack('f'*EV_DIMENSION, blob.data[0:144])
35 | return struct.unpack('f'*EV_DIMENSION, blob.data)
36 |
37 | # will read the BINARY values from the rocksdb
38 | def getByKey(self, key):
39 |         # TableId starts at index 1
40 |         # the key is already a "tableId-rowId" string
41 | blob = self.db_conn.get(self.read_opts, key)
42 |         # convert to float list
43 |         # print(blob)
44 |         val = struct.unpack('f'*EV_DIMENSION, blob.data)
45 |         # print(val)
46 |         return val
47 |
48 |
49 | def open_db_conn(self):
50 | print("Will prepare db connection")
51 | opts = pyrocksdb.Options()
52 | # for multi-thread
53 | opts.IncreaseParallelism()
54 | opts.OptimizeLevelStyleCompaction()
55 | self.db_conn = pyrocksdb.DB()
56 | status = self.db_conn.open(opts, os.path.join(ROCKSDB_DB_DIR, "ev-table-all.db"))
57 | assert(status.ok())
58 | print("All db connections are ready!")
59 |
60 | def close_db_conn(self):
61 | print("Closing rocksdb connections")
62 | self.db_conn.close()
63 |
64 | def get_nrows_pertable(self, file_path):
65 | _, _, _, ln_emb, _ = evstore_utils.read_training_config(file_path)
66 | return ln_emb
67 |
68 | def load(self, ev_dir, bit_precision = 32):
69 | # delete the db dir if exists
70 | if os.path.exists(ROCKSDB_DB_DIR) and os.path.isdir(ROCKSDB_DB_DIR):
71 | shutil.rmtree(ROCKSDB_DB_DIR)
72 | # recreate the dir to hold new rocksdb data
73 | Path(os.path.join(ROCKSDB_DB_DIR)).mkdir(parents=True, exist_ok=True)
74 |
75 | print("**************** Loading EV Table to ROCKSDB")
76 | print("**************** Load new set of EV Table from = " + ev_dir)
77 |
78 | assert(bit_precision%4 == 0)
79 | ln_emb = self.get_nrows_pertable(storage_manager.training_config_path)
80 |
81 | BYTE_PRECISION = int(bit_precision/8)
82 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION
83 |
84 | db = pyrocksdb.DB()
85 | opts = pyrocksdb.Options()
86 | # for multi-thread
87 | opts.IncreaseParallelism()
88 | opts.OptimizeLevelStyleCompaction()
89 | opts.create_if_missing = True
90 | db_filename = "ev-table-all.db"
91 | db_filename = os.path.join(ROCKSDB_DB_DIR, db_filename)
92 | #print(db_filename)
93 | s = db.open(opts, db_filename)
94 | assert(s.ok())
95 |
96 | # Storing binary ev-tables to rocksDB
97 | for ev_idx in range(0, TOTAL_EV_TABLE):
98 | bin_filename = "ev-table-" + str(ev_idx + 1) + ".bin"
99 |
100 | # RocksDB loads the BINARY EV-Tables!
101 | bin_ev_path = os.path.join(ev_dir, BINARY_DIR_NAME, bin_filename)
102 | print("************* Loading EV = " + bin_ev_path)
103 |
104 |             # put
105 |             with open(bin_ev_path, 'rb') as f:
106 |                 data = f.read()
107 |                 num_of_indexes = len(data) // TOTAL_BYTE_PER_ROW
108 |
109 |                 # Verify that the number of unique values per table is the same as what the DLRM model expects
110 |                 assert(ln_emb[ev_idx] == num_of_indexes)
111 |
112 |                 opts = pyrocksdb.WriteOptions()
113 |                 #for nrow in tqdm(range(0, num_of_indexes)):
114 |                 for i in range(0, num_of_indexes):
115 |                     # put
116 |                     byte_offset = BYTE_PRECISION * i * EV_DIMENSION # 36 -> dimension
117 |                     v = data[ byte_offset : byte_offset + TOTAL_BYTE_PER_ROW]
118 |                     k = str(ev_idx+1) + "-" + str(i)
119 |                     db.put(opts, k, v)
120 |                 print(" === db-path: " + db_filename)
121 |             f.close()
122 | print("**************** All EvTable loaded in the RocksDB!")
123 | db.close()
124 |
125 | def __init__(self):
126 | self.db_conn = None
127 | self.read_opts = pyrocksdb.ReadOptions()
128 |
129 |
--------------------------------------------------------------------------------
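
A driver for RocksDBClient might look like the sketch below (the path is hypothetical; this assumes pyrocksdb is installed, the emb_storage modules are importable, and storage_manager.training_config_path names a valid training_config.txt, since load() asserts the per-table row counts against it). Note that load() wipes and rebuilds ROCKSDB_DB_DIR, so it is a one-time bulk-load step:

# Hypothetical end-to-end use of RocksDBClient as defined above.
from storage_rocksdb import RocksDBClient

client = RocksDBClient()
client.load("/path/to/ev-tables")  # expects binary/ev-table-<n>.bin inside
client.open_db_conn()
row = client.get(1, 3)             # 36-float tuple for table 1, row 3
client.close_db_conn()
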
/emb_storage/storage_rocksdb_26_tabs.py:
--------------------------------------------------------------------------------
1 | import pyrocksdb
2 | import time
3 | import os
4 | import pandas as pd
5 | import argparse
6 | import struct
7 | import numpy as np
8 | from array import *
9 | from tqdm import tqdm
10 |
11 | import torch
12 | import shutil
13 | from pathlib import Path
14 | import sys
15 | sys.path.append('../../')
16 |
17 | import evstore_utils
18 | import storage_manager
19 |
20 | ROCKSDB_DB_PATH = "/mnt/extra/db-ev-storage/rocksdb/"
21 | BINARY_DIR_NAME = "binary/"
22 | TOTAL_EV_TABLE = 26
23 | EV_DIMENSION = 36
24 |
25 | class RocksDBClient:
26 |
27 | # will read the BINARY values from the rocksdb
28 | def get(self, tableId, rowId):
29 |         # opts = pyrocksdb.ReadOptions()  # unused; self.read_opts is reused instead
30 |         # assert(tableId >= 1)
31 |         # assert(tableId <= TOTAL_EV_TABLE)
32 |         # tableId starts at 1, but the db connections are indexed from 0
33 |         blob = self.arr_db_conn[tableId - 1].get(self.read_opts, str(rowId))
34 |         # convert to float list
35 |         return struct.unpack('f' * EV_DIMENSION, blob.data)
36 |
37 | def open_db_conn(self):
38 | print("Will prepare db connection")
39 | opts = pyrocksdb.Options()
40 | # for multi-thread
41 | opts.IncreaseParallelism()
42 | opts.OptimizeLevelStyleCompaction()
43 | for i in range(TOTAL_EV_TABLE):
44 | db_conn = pyrocksdb.DB()
45 | status = db_conn.open(opts, os.path.join(ROCKSDB_DB_PATH, "ev-table-" + str(i+1) + ".db"))
46 | assert(status.ok())
47 | self.arr_db_conn.append(db_conn)
48 | print("All db connections are ready!")
49 |
50 | def close_db_conn(self):
51 | print("Closing rocksdb connections")
52 | for db_conn in self.arr_db_conn:
53 | db_conn.close()
54 |
55 | def get_nrows_pertable(self, file_path):
56 | _, _, _, ln_emb, _ = evstore_utils.read_training_config(file_path)
57 | return ln_emb
58 |
59 | def load(self, ev_dir, bit_precision = 32):
60 | # delete the db dir if exists
61 | if os.path.exists(ROCKSDB_DB_PATH) and os.path.isdir(ROCKSDB_DB_PATH):
62 | shutil.rmtree(ROCKSDB_DB_PATH)
63 | # recreate the dir to hold new rocksdb data
64 | Path(os.path.join(ROCKSDB_DB_PATH)).mkdir(parents=True, exist_ok=True)
65 |
66 | print("**************** Loading EV Table to ROCKSDB")
67 | print("**************** Load new set of EV Table from = " + ev_dir)
68 |
69 | assert(bit_precision%4 == 0)
70 | ln_emb = self.get_nrows_pertable(storage_manager.training_config_path)
71 |
72 | BYTE_PRECISION = int(bit_precision/8)
73 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION
74 | # Storing binary ev-tables to rocksDB
75 | for ev_idx in range(0, TOTAL_EV_TABLE):
76 | binFilename = "ev-table-" + str(ev_idx + 1) + ".bin"
77 |
78 | # RocksDB loads the BINARY EV-Tables!
79 | bin_ev_path = os.path.join(ev_dir, BINARY_DIR_NAME, binFilename)
80 | print("************* Loading EV = " + bin_ev_path)
81 |
82 | db = pyrocksdb.DB()
83 | opts = pyrocksdb.Options()
84 | # for multi-thread
85 | opts.IncreaseParallelism()
86 | opts.OptimizeLevelStyleCompaction()
87 | opts.create_if_missing = True
88 | dbFilename = "ev-table-" + str(ev_idx + 1) + ".db"
89 | dbFilename = os.path.join(ROCKSDB_DB_PATH, dbFilename)
90 | #print(dbFilename)
91 | s = db.open(opts, dbFilename)
92 | assert(s.ok())
93 | # put
94 | with open(bin_ev_path, 'rb') as f:
95 | data = f.read()
96 | num_of_indexes = len(data) // TOTAL_BYTE_PER_ROW
97 |
98 |                 # Verify that the number of unique values per table is the same as what the DLRM model expects
99 | assert(ln_emb[ev_idx] == num_of_indexes)
100 |
101 | opts = pyrocksdb.WriteOptions()
102 | #for nrow in tqdm(range(0, num_of_indexes)):
103 | for i in range(0, num_of_indexes):
104 | # put
105 | byte_offset = BYTE_PRECISION * i * EV_DIMENSION # 36 -> dimension
106 | v = data[ byte_offset : byte_offset + TOTAL_BYTE_PER_ROW]
107 | k = str(i)
108 | db.put(opts, k, v)
109 | db.close()
110 | print(" === db-path: " + dbFilename)
111 | f.close()
112 | print("**************** All EvTable loaded in the RocksDB!")
113 |
114 | def __init__(self):
115 | self.arr_db_conn = []
116 | self.read_opts = pyrocksdb.ReadOptions()
117 |
118 |
--------------------------------------------------------------------------------
/emb_storage/storage_sqlite.py:
--------------------------------------------------------------------------------
1 | import sqlite3
2 | import time
3 | import os
4 | import pandas as pd
5 | import argparse
6 | import struct
7 | import numpy as np
8 | from array import *
9 | from tqdm import tqdm
10 | import torch
11 | import shutil
12 | from pathlib import Path
13 | import sys
14 | sys.path.append('../../')
15 |
16 | import evstore_utils
17 | import storage_manager
18 |
19 | SQLITE_DB_DIR = "/mnt/extra/db-ev-storage/sqlite/"
20 | BINARY_DIR_NAME = "binary/"
21 | TOTAL_EV_TABLE = 26
22 | EV_DIMENSION = 36
23 | DB_NAME = "ev-table-all.db"
24 |
25 | class SQLiteClient:
26 |
27 | # will read the BINARY values from the SQLiteDB
28 | def get(self, tableId, rowId):
29 |         # TableId starts at index 1
30 |         # assert(tableId >= 1)
31 |         # assert(tableId <= TOTAL_EV_TABLE)
32 |         # SQLite rowids start at 1 instead of 0
33 | realRowId = rowId + 1 + self.db_add_up_tables[tableId-1]
34 | blob = self.db_cursor.execute("SELECT * FROM tab1 where rowid={};".format(realRowId)).fetchone()
35 | # print(tableId)
36 | # print(rowId)
37 | # print(blob)
38 | # assert(blob != None)
39 | return struct.unpack('f'*EV_DIMENSION, blob[0])
40 |
41 | def get_nrows_pertable(self, file_path):
42 | _, _, _, ln_emb, _ = evstore_utils.read_training_config(file_path)
43 | return ln_emb
44 |
45 | def load(self, ev_dir, bit_precision = 32):
46 | # delete the db dir if exists
47 | if os.path.exists(SQLITE_DB_DIR) and os.path.isdir(SQLITE_DB_DIR):
48 | shutil.rmtree(SQLITE_DB_DIR)
49 | # recreate the dir to hold new sqlite data
50 | Path(os.path.join(SQLITE_DB_DIR)).mkdir(parents=True, exist_ok=True)
51 | db = sqlite3.connect(self.db_file_path)
52 | db_cursor = db.cursor()
53 |
54 | print("**************** Loading EV Table to SQLite")
55 | print("**************** Load new set of EV Table from = " + ev_dir)
56 |
57 | assert(bit_precision%4 == 0)
58 | ln_emb = self.get_nrows_pertable(storage_manager.training_config_path)
59 |
60 | BYTE_PRECISION = int(bit_precision/8)
61 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION
62 | table_name = "tab1"
63 | # Storing binary ev-tables to SQLite
64 | for ev_idx in range(0, TOTAL_EV_TABLE):
65 | bin_filename = "ev-table-" + str(ev_idx + 1) + ".bin"
66 | # table_name = "ev_table_" + str(ev_idx + 1)
67 |
68 | db_cursor.execute("CREATE TABLE if not exists " + table_name + " (b BLOB);")
69 |
70 | # SQLite loads the BINARY EV-Tables!
71 | bin_ev_path = os.path.join(ev_dir, BINARY_DIR_NAME, bin_filename)
72 | print("************* Loading EV = " + bin_ev_path)
73 | # put
74 | with open(bin_ev_path, 'rb') as f:
75 | data = f.read()
76 | num_of_indexes = len(data) // TOTAL_BYTE_PER_ROW
77 |
78 |                 # Verify that the number of unique values per table is the same as what the DLRM model expects
79 | assert(ln_emb[ev_idx] == num_of_indexes)
80 |
81 |                 # bin_ev_path = "/home/cc/ev-tables-sqlite/bin_workload"  # unused
82 |
83 | #for nrow in tqdm(range(0, num_of_indexes)):
84 | for i in range(0, num_of_indexes):
85 | # put
86 | byte_offset = BYTE_PRECISION * i * EV_DIMENSION # 36 -> dimension
87 | v = data[ byte_offset : byte_offset + TOTAL_BYTE_PER_ROW]
88 |                     # k = str(ev_idx+1) + "-" + str(i)  # unused: SQLite assigns rowids implicitly
89 | db_cursor.execute("insert into " + table_name + " values(?)", (v, ))
90 | print(" === db-path: " + table_name)
91 | f.close()
92 | print("**************** All EvTable loaded in the SQLite!")
93 | db.commit()
94 | db.close()
95 |
96 | def open_db_conn(self):
97 | print("Will prepare db connection")
98 | self.db_conn = sqlite3.connect(self.db_file_path)
99 | self.db_cursor = self.db_conn.cursor()
100 | print("All db connections are ready!")
101 |
102 | def close_db_conn(self):
103 | print("Closing sqlite connections")
104 | self.db_conn.close()
105 |
106 | def __init__(self):
107 | self.db_conn = None
108 | self.db_cursor = None
109 | self.db_file_path = os.path.join(SQLITE_DB_DIR, DB_NAME)
110 | self.db_ln_tables = self.get_nrows_pertable(storage_manager.training_config_path)
111 | self.db_add_up_tables = [0 for _ in range(len(self.db_ln_tables))]
112 | for i in range(len(self.db_ln_tables)-1):
113 | self.db_add_up_tables[i+1] = self.db_add_up_tables[i] + self.db_ln_tables[i]
114 |
--------------------------------------------------------------------------------
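
Because all 26 EV tables are inserted back to back into the single SQLite table tab1, and SQLite rowids are 1-based, get() maps a (tableId, rowId) pair to a global rowid via the prefix sums built in __init__. A worked example with hypothetical row counts:

# Worked example of the rowid arithmetic in get() (row counts are illustrative).
ln_emb = [1000, 500, 2000]              # rows per EV table
add_up = [0] * len(ln_emb)
for i in range(len(ln_emb) - 1):
    add_up[i + 1] = add_up[i] + ln_emb[i]
assert add_up == [0, 1000, 1500]        # prefix sums over earlier tables

table_id, row_id = 2, 7                 # table IDs are 1-based, row IDs 0-based
real_rowid = row_id + 1 + add_up[table_id - 1]
assert real_rowid == 1008               # table 1 fills rowids 1..1000
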
/emb_storage/storage_sqlite_26_tabs.py:
--------------------------------------------------------------------------------
1 | import sqlite3
2 | import time
3 | import os
4 | import pandas as pd
5 | import argparse
6 | import struct
7 | import numpy as np
8 | from array import *
9 | from tqdm import tqdm
10 | import torch
11 | import shutil
12 | from pathlib import Path
13 | import sys
14 | sys.path.append('../../')
15 |
16 | import evstore_utils
17 | import storage_manager
18 |
19 | SQLITE_DB_DIR = "/mnt/extra/db-ev-storage/sqlite/"
20 | BINARY_DIR_NAME = "binary/"
21 | TOTAL_EV_TABLE = 26
22 | EV_DIMENSION = 36
23 | DB_NAME = "ev-table-all.db"
24 |
25 | class SQLiteClient:
26 |
27 | # will read the BINARY values from the sqlite
28 | def get(self, tableId, rowId):
29 |         # TableId starts at index 1
30 |         # assert(tableId >= 1)
31 |         # assert(tableId <= TOTAL_EV_TABLE)
32 |         # SQLite rowids start at 1 instead of 0
33 | blob = self.db_cursor.execute("SELECT * FROM ev_table_{} where rowid={};".format(tableId, rowId + 1)).fetchone()
34 | # print(tableId)
35 | # print(rowId)
36 | # print(blob)
37 | # assert(blob != None)
38 | return struct.unpack('f'*EV_DIMENSION, blob[0])
39 |
40 | def get_nrows_pertable(self, file_path):
41 | _, _, _, ln_emb, _ = evstore_utils.read_training_config(file_path)
42 | return ln_emb
43 |
44 | def load(self, ev_dir, bit_precision = 32):
45 | # delete the db dir if exists
46 | if os.path.exists(SQLITE_DB_DIR) and os.path.isdir(SQLITE_DB_DIR):
47 | shutil.rmtree(SQLITE_DB_DIR)
48 | # recreate the dir to hold new sqlite data
49 | Path(os.path.join(SQLITE_DB_DIR)).mkdir(parents=True, exist_ok=True)
50 | db = sqlite3.connect(self.db_file_path)
51 | db_cursor = db.cursor()
52 |
53 | print("**************** Loading EV Table to SQLite")
54 | print("**************** Load new set of EV Table from = " + ev_dir)
55 |
56 | assert(bit_precision%4 == 0)
57 | ln_emb = self.get_nrows_pertable(storage_manager.training_config_path)
58 |
59 | BYTE_PRECISION = int(bit_precision/8)
60 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION
61 |
62 | # Storing binary ev-tables to SQLite
63 | for ev_idx in range(0, TOTAL_EV_TABLE):
64 | bin_filename = "ev-table-" + str(ev_idx + 1) + ".bin"
65 | table_name = "ev_table_" + str(ev_idx + 1)
66 |
67 | db_cursor.execute("CREATE TABLE if not exists " + table_name + " (b BLOB);")
68 |
69 | # SQLite loads the BINARY EV-Tables!
70 | bin_ev_path = os.path.join(ev_dir, BINARY_DIR_NAME, bin_filename)
71 | print("************* Loading EV = " + bin_ev_path)
72 | # put
73 | with open(bin_ev_path, 'rb') as f:
74 | data = f.read()
75 | num_of_indexes = len(data) // TOTAL_BYTE_PER_ROW
76 |
77 | # Verify that the number of unique values per table is the same as what the DLRM model expect
78 | assert(ln_emb[ev_idx] == num_of_indexes)
79 |
80 |                 # bin_ev_path = "/home/cc/ev-tables-sqlite/bin_workload"  # unused
81 |
82 | #for nrow in tqdm(range(0, num_of_indexes)):
83 | for i in range(0, num_of_indexes):
84 | # put
85 | byte_offset = BYTE_PRECISION * i * EV_DIMENSION # 36 -> dimension
86 | v = data[ byte_offset : byte_offset + TOTAL_BYTE_PER_ROW]
87 |                     # k = str(ev_idx+1) + "-" + str(i)  # unused: SQLite assigns rowids implicitly
88 | db_cursor.execute("insert into " + table_name + " values(?)", (v, ))
89 | print(" === db-path: " + table_name)
90 | f.close()
91 | print("**************** All EvTable loaded in the SQLite!")
92 | db.commit()
93 | db.close()
94 |
95 | def open_db_conn(self):
96 | print("Will prepare db connection")
97 | self.db_conn = sqlite3.connect(self.db_file_path)
98 | self.db_cursor = self.db_conn.cursor()
99 | print("All db connections are ready!")
100 |
101 | def close_db_conn(self):
102 | print("Closing sqlite connections")
103 | self.db_conn.close()
104 |
105 | def __init__(self):
106 | self.db_conn = None
107 | self.db_cursor = None
108 | self.db_file_path = os.path.join(SQLITE_DB_DIR, DB_NAME)
109 |
--------------------------------------------------------------------------------
/evstore_utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import pandas as pd
3 | # import EvLFU
4 | import struct
5 | from pathlib import Path
6 | import ast
7 | import numpy as np
8 | import torch
9 |
10 | TRAINING_CONFIG_FILE = "training_config.txt"
11 |
12 | # Replacing the current embedding layer at the current model
13 | def load_new_ev_table(ld_model, ev_path):
14 | print("Load new set of EV Table from = " + ev_path)
15 | for ev_idx in range(0, 26):
16 | new_ev_path = os.path.join(ev_path, "ev-table-"+ str(ev_idx + 1) + ".csv")
17 | new_ev_df = pd.read_csv(new_ev_path, dtype=float, delimiter=',')
18 |         # Convert to numpy first
19 |         new_ev_arr = new_ev_df.to_numpy()
20 |         # then wrap as a torch tensor
21 |         new_ev_tensor = torch.FloatTensor(new_ev_arr)
22 |
23 | print("Loading NEW EV per embedding layer = " + new_ev_path)
24 |
25 | # Create key since the entire model will be accessed
26 | key = str("emb_l."+str(ev_idx)+".weight")
27 | # Replace the current embedding tensor with the new one based on the key
28 | ld_model["state_dict"][key] = new_ev_tensor
29 | print("Done loading all EV-Table from " + ev_path)
30 |
31 | def store_training_config(file_path, table_feature_map, nbatches, nbatches_test, ln_emb, m_den):
32 | # store the config to a file to avoid redoing the computation during inference-only
33 | with open(file_path, 'w') as f:
34 | f.write('The order of the arguments: table_feature_map, nbatches, nbatches_test, ln_emb, m_den\n')
35 | f.write(str(table_feature_map)+"\n")
36 | f.write(str(nbatches)+"\n")
37 | f.write(str(nbatches_test)+"\n")
38 | f.write(str(ln_emb.tolist())+"\n")
39 | f.write(str(m_den)+"\n")
40 | print("Done writing training config to : " + file_path + "\n")
41 |
42 | def read_training_config(file_path):
43 | print("Read training config from : " + file_path)
44 | with open(file_path) as f:
45 | lines = [line.rstrip() for line in f]
46 |
47 | table_feature_map = ast.literal_eval(lines[1])
48 | nbatches = int(lines[2])
49 | nbatches_test = int(lines[3])
50 | ln_emb = np.array(ast.literal_eval(lines[4]))
51 | m_den = int(lines[5])
52 | return table_feature_map, nbatches, nbatches_test, ln_emb, m_den
53 |
54 | def prepare_inference_trace_folder(input_data_name, percent_data_for_inference):
55 | print("Create folder to store the model and ev-tables")
56 | outdir = os.path.join("logs", "inf-workload-traces", input_data_name, "inference=" + str(percent_data_for_inference))
57 | Path(outdir).mkdir(parents=True, exist_ok=True)
58 | return outdir
59 |
60 | def write_inf_workload_to_file(workload_traces_outdir, arr_inference_workload):
61 | # Create + open 26 different files
62 | print("Total inference = " + str(len(arr_inference_workload)))
63 | arrfile = []
64 |
65 | for idx in range(0,26):
66 | arrfile.append(open(workload_traces_outdir + "/workload-group-" + str(idx + 1) + ".csv",'w'))
67 | arrfile[idx].write("G" + str(idx + 1) + "_key\n")
68 |
69 | for grouped_keys in arr_inference_workload:
70 | id = 0
71 | for key in grouped_keys:
72 | arrfile[id].write(key + "\n")
73 |             id += 1
74 |
75 |     # close the trace files so every buffered line is flushed to disk
76 |     for f in arrfile:
77 |         f.close()
--------------------------------------------------------------------------------
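
For reference, the training_config.txt that store_training_config() writes (and read_training_config() parses back with int() and ast.literal_eval()) is the header line followed by one line per field; the values below are illustrative only:

The order of the arguments: table_feature_map, nbatches, nbatches_test, ln_emb, m_den
{0: 0, 1: 1, 2: 2}
1000
200
[1460, 583, 10131227]
13
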
/input/.gitignore:
--------------------------------------------------------------------------------
1 | *
2 | #!compressed4git*
3 | !.gitignore
4 | !readme.txt
5 | #!*tar.gz.part*
6 |
--------------------------------------------------------------------------------
/input/readme.txt:
--------------------------------------------------------------------------------
1 | ------ Display Advertising Challenge ------
2 |
3 | Dataset: dac-v1
4 |
5 | This dataset contains feature values and click feedback for millions of display
6 | ads. Its purpose is to benchmark algorithms for clickthrough rate (CTR) prediction.
7 | It has been used for the Display Advertising Challenge hosted by Kaggle:
8 | https://www.kaggle.com/c/criteo-display-ad-challenge/
9 |
10 | ===================================================
11 |
12 | Full description:
13 |
14 | This dataset contains 2 files:
15 | train.txt
16 | test.txt
17 | corresponding to the training and test parts of the data.
18 |
19 | ====================================================
20 |
21 | Dataset construction:
22 |
23 | The training dataset consists of a portion of Criteo's traffic over a period
24 | of 7 days. Each row corresponds to a display ad served by Criteo and the first
25 | column indicates whether this ad has been clicked or not.
26 | The positive (clicked) and negative (non-clicked) examples have both been
27 | subsampled (but at different rates) in order to reduce the dataset size.
28 |
29 | There are 13 features taking integer values (mostly count features) and 26
30 | categorical features. The values of the categorical features have been hashed
31 | onto 32 bits for anonymization purposes.
32 | The semantics of these features are undisclosed. Some features may have missing values.
33 |
34 | The rows are chronologically ordered.
35 |
36 | The test set is computed in the same way as the training set but it
37 | corresponds to events on the day following the training period.
38 | The first column (label) has been removed.
39 |
40 | ====================================================
41 |
42 | Format:
43 |
44 | The columns are tab-separated with the following schema:
45 |