├── .gitignore ├── LICENSE ├── README.md ├── bench ├── dlrm_s_benchmark.sh ├── dlrm_s_criteo_kaggle.sh ├── dlrm_s_criteo_kaggle_C1.sh ├── dlrm_s_criteo_kaggle_C1_C2.sh ├── dlrm_s_criteo_kaggle_C1_C2_C3.sh ├── dlrm_s_criteo_kaggle_lock_gpu_C1.sh ├── dlrm_s_criteo_terabyte.sh └── run_and_time.sh ├── cache_algo ├── EvLFU_C1.py ├── EvLFU_C1_Cython │ ├── EvLFU.cpp │ ├── EvLFU.cpython-36m-x86_64-linux-gnu.so │ ├── EvLFU.pyx │ ├── evlfu.hpp │ ├── script.sh │ ├── setup_EvLFU.py │ └── test.py ├── LFU.py ├── LRU.py ├── cpp_socket_client.py └── old_versions │ ├── EvLFU4DLRM_C2.py │ ├── EvLFU_C1_apprx_emb.py │ ├── EvLFU_C1_sets.py │ ├── EvLFU_C1_v0.py │ ├── EvLFU_C1_v1.py │ ├── LFU_v0.py │ ├── LFU_v1.py │ ├── LFU_v2.py │ └── LRU_v0.py ├── cython ├── cython_compile.py └── cython_criteo.py ├── data_utils.py ├── dlrm_data_pytorch.py ├── dlrm_s_pytorch.py ├── dlrm_s_pytorch_C1.py ├── dlrm_s_pytorch_C1_C2.py ├── dlrm_s_pytorch_C1_C2_C3.py ├── dlrm_s_pytorch_lock_gpu_C1.py ├── emb_storage ├── file_read.py ├── mmap_file_read.py ├── multi_storage_dummy │ └── socket-server.py ├── storage_dummy.py ├── storage_manager.py ├── storage_rocksdb.py ├── storage_rocksdb_26_tabs.py ├── storage_sqlite.py └── storage_sqlite_26_tabs.py ├── evstore_utils.py ├── experiments.md ├── extend_distributed.py ├── input ├── .gitignore └── readme.txt ├── logs ├── .gitignore ├── sample-inference-criteo_kaggle_5mil.txt ├── sample-inference-criteo_kaggle_all.txt └── sample-train-criteo_kaggle_5mil.txt ├── misc ├── README.txt ├── dlrm_data_caffe2.py ├── dlrm_s_caffe2.py ├── mixed_precs_caching_v0 │ ├── .gitignore │ ├── cache_manager.cpp │ ├── cache_manager.hpp │ ├── dlrm_client.py │ ├── evlfu_16.cpp │ ├── evlfu_16.hpp │ ├── evlfu_32.cpp │ ├── evlfu_32.hpp │ ├── evlfu_4.cpp │ ├── evlfu_4.hpp │ ├── evlfu_8.cpp │ ├── evlfu_8.hpp │ ├── readme.txt │ └── test.cpp └── testing_tensor_cpp │ ├── CMakeLists.txt │ ├── evlfu_tensor.cpp │ ├── evlfu_tensor.hpp │ └── sample_client.py ├── mixed_precs_caching ├── .gitignore ├── aprx_embedding.cpp ├── aprx_embedding.hpp ├── cache_manager.cpp ├── cache_manager.hpp ├── dlrm_client.py ├── evlfu_16.cpp ├── evlfu_16.hpp ├── evlfu_32.cpp ├── evlfu_32.hpp ├── evlfu_4.cpp ├── evlfu_4.hpp ├── evlfu_8.cpp ├── evlfu_8.hpp ├── lib │ └── .gitignore ├── readme.txt ├── test.cpp └── test.py ├── mlperf_logger.py ├── optim └── rwsadagrad.py ├── script ├── apply_ev_preconditioning.py ├── approximate_embedding │ └── phase2_similarity_analysis │ │ ├── README.txt │ │ ├── csvReader.py │ │ ├── get_neighbors_CPU_slow.ipynb │ │ ├── get_neighbors_GPU.ipynb │ │ ├── most_popular_neighbor.ipynb │ │ └── rankedWorkload.csv ├── compress_folder_for_github.sh ├── convert_altkeys_to_binary.py ├── convert_ev_to_binary.py ├── data_loader_terabyte.py ├── dissectingmodel.py ├── free_page_cache.sh ├── gnuplot_cdf_direct_io.plt ├── gnuplot_cdf_evlfu_lru.plt ├── gnuplot_cdf_multi_line.plt ├── gnuplot_graph │ └── cdf_2_line.plt ├── modify_param.py ├── mount_cham_obj_stor.sh ├── plot_cdf.py ├── read_cham_obj_stor.sh ├── reduce_precision.py ├── uncompress_folder_for_github.sh └── wget_evstore_dataset.sh ├── stored_model └── .gitignore ├── test └── dlrm_s_test.sh ├── tools └── visualize.py └── tricks ├── md_embedding_bag.py └── qr_embedding_bag.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.log 2 | logs-old/ 3 | !logs 4 | *__pycache__ 5 | */__pycache__ 6 | */*__pycache__ 7 | */*/__pycache__ 8 | *.out 9 | run_kaggle_pt 10 | model.pth 11 | *.DS_Store 12 | */.DS_Store 13 | */*.DS_Store 14 | 
*/*/.DS_Store 15 | *.ipynb_checkpoints 16 | */.ipynb_checkpoints 17 | */*.ipynb_checkpoints 18 | */*/.ipynb_checkpoints 19 | file_to_download.txt 20 | index.html 21 | test.txt 22 | out.txt 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/old-licenses/gpl-3.0.en.html) 2 | [![Platform](https://img.shields.io/badge/Platform-x86--64-brightgreen)](https://shields.io/) 3 | 4 | ``` 5 | _______ ______ _ 6 | | ____\ \ / / ___|| |_ ___ _ __ ___ 7 | | _| \ \ / /\___ \| __/ _ \| '__/ _ \ 8 | | |___ \ V / ___) | || (_) | | | __/ 9 | |_____| \_/ |____/ \__\___/|_| \___| -- Groupability-aware caching systems for DRS 10 | 11 | ``` 12 | 13 | This repository contains the implementation code for paper:
14 | **EVSTORE: Storage and Caching Capabilities for Scaling 15 | Embedding Tables in Deep Recommendation Systems**
16 | 17 | Contact Information 18 | -------------------- 19 | 20 | **Maintainer**: [Daniar H. Kurniawan](https://people.cs.uchicago.edu/~daniar/), Email: ``daniar@uchicago.edu`` 21 | 22 | [//]: <> (**Daniar is on the job market.** Please contact him if you have an opening for an AIOps and ML-Sys engineer role!) 23 | 24 | Feel free to contact Daniar for any suggestions/feedback, bug 25 | reports, or general discussions. 26 | 27 | Please consider citing our EVStore paper at ASPLOS 2023 if you use EVStore. The bib 28 | entry is 29 | 30 | ``` 31 | @InProceedings{Daniar-EVStore, 32 | Author = {Daniar H. Kurniawan and Ruipu Wang and Kahfi S. Zulkifli and Fandi A. Wiranata and John Bent and Ymir Vigfusson and Haryadi S. Gunawi}, 33 | Title = "EVSTORE: Storage and Caching Capabilities for Scaling 34 | Embedding Tables in Deep Recommendation Systems", 35 | Booktitle = {Proceedings of the 28th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)}, 36 | Address = {Vancouver, Canada}, 37 | Month = {MARCH}, 38 | Year = {2023} 39 | } 40 | ``` 41 | 42 | Run EVStore 43 | ----------- 44 | 45 | Please follow the experiments detailed in [Experiments.md](experiments.md). 46 | 47 | 48 | ### Acknowledgement ### 49 | 50 | The DLRM code in this repository is based on [Facebook DLRM](https://github.com/facebookresearch/dlrm). 51 | The cache benchmark repository is based on [Cache2k](https://github.com/cache2k/cache2k) and [Cacheus](https://github.com/sylab/cacheus/). 52 | -------------------------------------------------------------------------------- /bench/dlrm_s_benchmark.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright (c) Facebook, Inc. and its affiliates. 3 | # 4 | # This source code is licensed under the MIT license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | 7 | #check if extra argument is passed to the test 8 | if [[ $# == 1 ]]; then 9 | dlrm_extra_option=$1 10 | else 11 | dlrm_extra_option="" 12 | fi 13 | #echo $dlrm_extra_option 14 | 15 | cpu=1 16 | gpu=1 17 | pt=1 18 | c2=1 19 | 20 | ncores=28 #12 #6 21 | nsockets="0" 22 | 23 | ngpus="1 2 4 8" 24 | 25 | numa_cmd="numactl --physcpubind=0-$((ncores-1)) -m $nsockets" #run on one socket, without HT 26 | dlrm_pt_bin="python dlrm_s_pytorch.py" 27 | dlrm_c2_bin="python dlrm_s_caffe2.py" 28 | 29 | data=random #synthetic 30 | print_freq=100 31 | rand_seed=727 32 | 33 | c2_net="async_scheduling" 34 | 35 | #Model param 36 | mb_size=2048 #1024 #512 #256 37 | nbatches=1000 #500 #100 38 | bot_mlp="512-512-64" 39 | top_mlp="1024-1024-1024-1" 40 | emb_size=64 41 | nindices=100 42 | emb="1000000-1000000-1000000-1000000-1000000-1000000-1000000-1000000" 43 | interaction="dot" 44 | tnworkers=0 45 | tmb_size=16384 46 | 47 | #_args="--mini-batch-size="${mb_size}\ 48 | _args=" --num-batches="${nbatches}\ 49 | " --data-generation="${data}\ 50 | " --arch-mlp-bot="${bot_mlp}\ 51 | " --arch-mlp-top="${top_mlp}\ 52 | " --arch-sparse-feature-size="${emb_size}\ 53 | " --arch-embedding-size="${emb}\ 54 | " --num-indices-per-lookup="${nindices}\ 55 | " --arch-interaction-op="${interaction}\ 56 | " --numpy-rand-seed="${rand_seed}\ 57 | " --print-freq="${print_freq}\ 58 | " --print-time"\ 59 | " --enable-profiling " 60 | 61 | c2_args=" --caffe2-net-type="${c2_net} 62 | 63 | 64 | # CPU Benchmarking 65 | if [ $cpu = 1 ]; then 66 | echo "--------------------------------------------" 67 | echo "CPU Benchmarking - running on $ncores cores" 68 | echo "--------------------------------------------" 69 | if [ $pt = 1 ]; then 70 | outf="model1_CPU_PT_$ncores.log" 71 | outp="dlrm_s_pytorch.prof" 72 | echo "-------------------------------" 73 | echo "Running PT (log file: $outf)" 74 | echo "-------------------------------" 75 | cmd="$numa_cmd $dlrm_pt_bin --mini-batch-size=$mb_size --test-mini-batch-size=$tmb_size --test-num-workers=$tnworkers $_args $dlrm_extra_option > $outf" 76 | echo $cmd 77 | eval $cmd 78 | min=$(grep "iteration" $outf | awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}') 79 | echo "Min time per iteration = $min" 80 | # move profiling file(s) 81 | mv $outp ${outf//".log"/".prof"} 82 | mv ${outp//".prof"/".json"} ${outf//".log"/".json"} 83 | 84 | fi 85 | if [ $c2 = 1 ]; then 86 | outf="model1_CPU_C2_$ncores.log" 87 | outp="dlrm_s_caffe2.prof" 88 | echo "-------------------------------" 89 | echo "Running C2 (log file: $outf)" 90 | echo "-------------------------------" 91 | cmd="$numa_cmd $dlrm_c2_bin --mini-batch-size=$mb_size $_args $c2_args $dlrm_extra_option 1> $outf 2> $outp" 92 | echo $cmd 93 | eval $cmd 94 | min=$(grep "iteration" $outf | awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}') 95 | echo "Min time per iteration = $min" 96 | # move profiling file (collected from stderr above) 97 | mv $outp ${outf//".log"/".prof"} 98 | fi 99 | fi 100 | 101 | # GPU Benchmarking 102 | if [ $gpu = 1 ]; then 103 | echo "--------------------------------------------" 104 | echo "GPU Benchmarking - running on $ngpus GPUs" 105 | echo "--------------------------------------------" 106 | for _ng in $ngpus 107 | do 108 | # weak scaling 109 | # _mb_size=$((mb_size*_ng)) 110 | # strong scaling 111 | _mb_size=$((mb_size*1)) 112 | _gpus=$(seq -s, 0 $((_ng-1))) 113 | cuda_arg="CUDA_VISIBLE_DEVICES=$_gpus" 114 | echo "-------------------" 115 | echo "Using GPUS: "$_gpus 116 | echo 
"-------------------" 117 | if [ $pt = 1 ]; then 118 | outf="model1_GPU_PT_$_ng.log" 119 | outp="dlrm_s_pytorch.prof" 120 | echo "-------------------------------" 121 | echo "Running PT (log file: $outf)" 122 | echo "-------------------------------" 123 | cmd="$cuda_arg $dlrm_pt_bin --mini-batch-size=$_mb_size --test-mini-batch-size=$tmb_size --test-num-workers=$tnworkers $_args --use-gpu $dlrm_extra_option > $outf" 124 | echo $cmd 125 | eval $cmd 126 | min=$(grep "iteration" $outf | awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}') 127 | echo "Min time per iteration = $min" 128 | # move profiling file(s) 129 | mv $outp ${outf//".log"/".prof"} 130 | mv ${outp//".prof"/".json"} ${outf//".log"/".json"} 131 | fi 132 | if [ $c2 = 1 ]; then 133 | outf="model1_GPU_C2_$_ng.log" 134 | outp="dlrm_s_caffe2.prof" 135 | echo "-------------------------------" 136 | echo "Running C2 (log file: $outf)" 137 | echo "-------------------------------" 138 | cmd="$cuda_arg $dlrm_c2_bin --mini-batch-size=$_mb_size $_args $c2_args --use-gpu $dlrm_extra_option 1> $outf 2> $outp" 139 | echo $cmd 140 | eval $cmd 141 | min=$(grep "iteration" $outf | awk 'BEGIN{best=999999} {if (best > $7) best=$7} END{print best}') 142 | echo "Min time per iteration = $min" 143 | # move profiling file (collected from stderr above) 144 | mv $outp ${outf//".log"/".prof"} 145 | fi 146 | done 147 | fi 148 | -------------------------------------------------------------------------------- /bench/dlrm_s_criteo_kaggle.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright (c) Facebook, Inc. and its affiliates. 3 | # 4 | # This source code is licensed under the MIT license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | #WARNING: must have compiled PyTorch and caffe2 8 | 9 | #check if extra argument is passed to the test 10 | if [[ $# == 1 ]]; then 11 | dlrm_extra_option=$1 12 | else 13 | dlrm_extra_option="" 14 | fi 15 | #echo $dlrm_extra_option 16 | 17 | dlrm_pt_bin="python3 dlrm_s_pytorch.py" # python -u : so that the tqdm output will be on terminal 18 | # dlrm_c2_bin="python3 dlrm_s_caffe2.py" 19 | 20 | echo "run pytorch ..." 21 | # WARNING: the following parameters will be set based on the data set 22 | # --arch-embedding-size=... (sparse feature sizes) 23 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp) 24 | $dlrm_pt_bin --arch-sparse-feature-size=36 --arch-mlp-bot="13-512-256-64-36" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time --test-mini-batch-size=1 --test-num-workers=0 $dlrm_extra_option 2>&1 25 | 26 | # echo "run caffe2 ..." 27 | # WARNING: the following parameters will be set based on the data set 28 | # --arch-embedding-size=... (sparse feature sizes) 29 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp) 30 | # $dlrm_c2_bin --arch-sparse-feature-size=16 --arch-mlp-bot="13-512-256-64-16" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --raw-data-file=./input/train.txt --processed-data-file=./input/kaggleAdDisplayChallenge_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time $dlrm_extra_option 2>&1 | tee run_kaggle_c2.log 31 | 32 | echo "finished!" 
33 | -------------------------------------------------------------------------------- /bench/dlrm_s_criteo_kaggle_C1.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright (c) Facebook, Inc. and its affiliates. 3 | # 4 | # This source code is licensed under the MIT license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | #WARNING: must have compiled PyTorch and caffe2 8 | 9 | #check if extra argument is passed to the test 10 | if [[ $# == 1 ]]; then 11 | dlrm_extra_option=$1 12 | else 13 | dlrm_extra_option="" 14 | fi 15 | # echo $dlrm_extra_option 16 | 17 | dlrm_pt_bin="python3 dlrm_s_pytorch_C1.py" 18 | # dlrm_c2_bin="python3 dlrm_s_caffe2.py" 19 | 20 | echo "run pytorch C1 ..." 21 | # WARNING: the following parameters will be set based on the data set 22 | # --arch-embedding-size=... (sparse feature sizes) 23 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp) 24 | $dlrm_pt_bin --arch-sparse-feature-size=36 --arch-mlp-bot="13-512-256-64-36" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time --test-mini-batch-size=1 --test-num-workers=0 $dlrm_extra_option 2>&1 25 | 26 | # echo "run caffe2 ..." 27 | # WARNING: the following parameters will be set based on the data set 28 | # --arch-embedding-size=... (sparse feature sizes) 29 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp) 30 | # $dlrm_c2_bin --arch-sparse-feature-size=16 --arch-mlp-bot="13-512-256-64-16" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --raw-data-file=./input/train.txt --processed-data-file=./input/kaggleAdDisplayChallenge_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time $dlrm_extra_option 2>&1 | tee run_kaggle_c2.log 31 | 32 | echo "finished!" 33 | -------------------------------------------------------------------------------- /bench/dlrm_s_criteo_kaggle_C1_C2.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright (c) Facebook, Inc. and its affiliates. 3 | # 4 | # This source code is licensed under the MIT license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | #WARNING: must have compiled PyTorch and caffe2 8 | 9 | CURR_DIR=`pwd` 10 | 11 | # check if the command contains "cpp_algo_socket" 12 | if [[ $1 == *"cpp_algo_socket"* ]]; then 13 | # using socket interface 14 | echo "The CPP caching layer is started by the script below ..." 15 | echo "Will use SOCKET as the interface" 16 | 17 | cd /mnt/extra/ev-store-dlrm/mixed_precs_caching 18 | g++ -O3 evlfu_4.cpp evlfu_8.cpp evlfu_16.cpp evlfu_32.cpp aprx_embedding.cpp cache_manager.cpp -pthread; ./a.out & 19 | else 20 | # Each experiment might have different cacheSize, thus we recompile it 21 | echo "Compile the C++ shared library ... " 22 | echo "Will use Ctypes as the interface" 23 | cd /mnt/extra/ev-store-dlrm/mixed_precs_caching 24 | g++ -shared -o libcachemanager.so -fPIC -O3 evlfu_4.cpp evlfu_8.cpp evlfu_16.cpp evlfu_32.cpp aprx_embedding.cpp cache_manager.cpp -pthread; mv *.so lib/ 25 | echo "C++ shared library (*.so) is updated!" 
26 | 27 | # check if this DLRM deployment wants to use specific libcachemanager naming [To enable multi DLRM deployment] 28 | if [ -z "$2" ]; then 29 | echo "No need to rename the .so" 30 | else 31 | echo "COPY lib/libcachemanager.so -> lib/$2" # will be used by Ctypes! 32 | cp lib/libcachemanager.so lib/$2 33 | fi 34 | fi 35 | 36 | cd $CURR_DIR 37 | #check if extra argument is passed to the test 38 | if [ -z "$1" ]; then 39 | dlrm_extra_option="" 40 | else 41 | dlrm_extra_option=$1 42 | fi 43 | # echo $dlrm_extra_option 44 | 45 | dlrm_pt_bin="python3 dlrm_s_pytorch_C1_C2.py" 46 | # dlrm_c2_bin="python3 dlrm_s_caffe2.py" 47 | 48 | echo "run pytorch C1_C2 ..." 49 | # WARNING: the following parameters will be set based on the data set 50 | # --arch-embedding-size=... (sparse feature sizes) 51 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp) 52 | $dlrm_pt_bin --arch-sparse-feature-size=36 --arch-mlp-bot="13-512-256-64-36" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time --test-mini-batch-size=1 --test-num-workers=0 $dlrm_extra_option 2>&1 53 | 54 | # echo "run caffe2 ..." 55 | # WARNING: the following parameters will be set based on the data set 56 | # --arch-embedding-size=... (sparse feature sizes) 57 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp) 58 | # $dlrm_c2_bin --arch-sparse-feature-size=16 --arch-mlp-bot="13-512-256-64-16" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --raw-data-file=./input/train.txt --processed-data-file=./input/kaggleAdDisplayChallenge_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time $dlrm_extra_option 2>&1 | tee run_kaggle_c2.log 59 | 60 | echo "finished!" 61 | -------------------------------------------------------------------------------- /bench/dlrm_s_criteo_kaggle_C1_C2_C3.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright (c) Facebook, Inc. and its affiliates. 3 | # 4 | # This source code is licensed under the MIT license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | #WARNING: must have compiled PyTorch and caffe2 8 | 9 | CURR_DIR=`pwd` 10 | 11 | # check if the command contains "cpp_algo_socket" 12 | if [[ $1 == *"cpp_algo_socket"* ]]; then 13 | # using socket interface 14 | echo "The CPP caching layer is started by the script below ..." 15 | echo "Will use SOCKET as the interface" 16 | 17 | cd /mnt/extra/ev-store-dlrm/mixed_precs_caching 18 | g++ -O3 evlfu_4.cpp evlfu_8.cpp evlfu_16.cpp evlfu_32.cpp aprx_embedding.cpp cache_manager.cpp -pthread; ./a.out & 19 | else 20 | # Each experiment might have different cacheSize, thus we recompile it 21 | echo "Compile the C++ shared library ... " 22 | echo "Will use Ctypes as the interface" 23 | cd /mnt/extra/ev-store-dlrm/mixed_precs_caching 24 | g++ -shared -o libcachemanager.so -fPIC -O3 evlfu_4.cpp evlfu_8.cpp evlfu_16.cpp evlfu_32.cpp aprx_embedding.cpp cache_manager.cpp -pthread; mv *.so lib/ 25 | echo "C++ shared library (*.so) is updated!" 26 | 27 | # check if this DLRM deployment wants to use specific libcachemanager naming [To enable multi DLRM deployment] 28 | if [ -z "$2" ]; then 29 | echo "No need to rename the .so" 30 | else 31 | echo "COPY lib/libcachemanager.so -> lib/$2" # will be used by Ctypes!
32 | cp lib/libcachemanager.so lib/$2 33 | fi 34 | fi 35 | 36 | cd $CURR_DIR 37 | #check if extra argument is passed to the test 38 | if [ -z "$1" ]; then 39 | dlrm_extra_option="" 40 | else 41 | dlrm_extra_option=$1 42 | fi 43 | 44 | dlrm_pt_bin="python3 dlrm_s_pytorch_C1_C2_C3.py" 45 | # dlrm_c2_bin="python3 dlrm_s_caffe2.py" 46 | 47 | echo "run pytorch C1_C2_C3 ..." 48 | # WARNING: the following parameters will be set based on the data set 49 | # --arch-embedding-size=... (sparse feature sizes) 50 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp) 51 | $dlrm_pt_bin --arch-sparse-feature-size=36 --arch-mlp-bot="13-512-256-64-36" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time --test-mini-batch-size=1 --test-num-workers=0 $dlrm_extra_option 2>&1 52 | 53 | # echo "run caffe2 ..." 54 | # WARNING: the following parameters will be set based on the data set 55 | # --arch-embedding-size=... (sparse feature sizes) 56 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp) 57 | # $dlrm_c2_bin --arch-sparse-feature-size=16 --arch-mlp-bot="13-512-256-64-16" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --raw-data-file=./input/train.txt --processed-data-file=./input/kaggleAdDisplayChallenge_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time $dlrm_extra_option 2>&1 | tee run_kaggle_c2.log 58 | 59 | echo "finished!" 60 | -------------------------------------------------------------------------------- /bench/dlrm_s_criteo_kaggle_lock_gpu_C1.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright (c) Facebook, Inc. and its affiliates. 3 | # 4 | # This source code is licensed under the MIT license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | #WARNING: must have compiled PyTorch and caffe2 8 | 9 | #check if extra argument is passed to the test 10 | if [[ $# == 1 ]]; then 11 | dlrm_extra_option=$1 12 | else 13 | dlrm_extra_option="" 14 | fi 15 | # echo $dlrm_extra_option 16 | 17 | dlrm_pt_bin="python3 dlrm_s_pytorch_lock_gpu_C1.py" 18 | # dlrm_c2_bin="python3 dlrm_s_caffe2.py" 19 | 20 | echo "run pytorch C1 ..." 21 | # WARNING: the following parameters will be set based on the data set 22 | # --arch-embedding-size=... (sparse feature sizes) 23 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp) 24 | $dlrm_pt_bin --arch-sparse-feature-size=36 --arch-mlp-bot="13-512-256-64-36" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time --test-mini-batch-size=1 --test-num-workers=0 $dlrm_extra_option 2>&1 25 | 26 | # echo "run caffe2 ..." 27 | # WARNING: the following parameters will be set based on the data set 28 | # --arch-embedding-size=... (sparse feature sizes) 29 | # --arch-mlp-bot=... 
(the input to the first layer of bottom mlp) 30 | # $dlrm_c2_bin --arch-sparse-feature-size=16 --arch-mlp-bot="13-512-256-64-16" --arch-mlp-top="512-256-1" --data-generation=dataset --data-set=kaggle --raw-data-file=./input/train.txt --processed-data-file=./input/kaggleAdDisplayChallenge_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=128 --print-freq=1024 --print-time $dlrm_extra_option 2>&1 | tee run_kaggle_c2.log 31 | 32 | echo "finished!" 33 | -------------------------------------------------------------------------------- /bench/dlrm_s_criteo_terabyte.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright (c) Facebook, Inc. and its affiliates. 3 | # 4 | # This source code is licensed under the MIT license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | #WARNING: must have compiled PyTorch and caffe2 8 | 9 | #check if extra argument is passed to the test 10 | if [[ $# == 1 ]]; then 11 | dlrm_extra_option=$1 12 | else 13 | dlrm_extra_option="" 14 | fi 15 | #echo $dlrm_extra_option 16 | 17 | dlrm_pt_bin="python dlrm_s_pytorch.py" 18 | dlrm_c2_bin="python dlrm_s_caffe2.py" 19 | 20 | echo "run pytorch ..." 21 | # WARNING: the following parameters will be set based on the data set 22 | # --arch-embedding-size=... (sparse feature sizes) 23 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp) 24 | $dlrm_pt_bin --arch-sparse-feature-size=64 --arch-mlp-bot="13-512-256-64" --arch-mlp-top="512-512-256-1" --max-ind-range=10000000 --data-generation=dataset --data-set=terabyte --raw-data-file=./input/day --processed-data-file=./input/terabyte_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=2048 --print-freq=1024 --print-time --test-mini-batch-size=16384 --test-num-workers=16 $dlrm_extra_option 2>&1 | tee run_terabyte_pt.log 25 | 26 | echo "run caffe2 ..." 27 | # WARNING: the following parameters will be set based on the data set 28 | # --arch-embedding-size=... (sparse feature sizes) 29 | # --arch-mlp-bot=... (the input to the first layer of bottom mlp) 30 | $dlrm_c2_bin --arch-sparse-feature-size=64 --arch-mlp-bot="13-512-256-64" --arch-mlp-top="512-512-256-1" --max-ind-range=10000000 --data-generation=dataset --data-set=terabyte --raw-data-file=./input/day --processed-data-file=./input/terabyte_processed.npz --loss-function=bce --round-targets=True --learning-rate=0.1 --mini-batch-size=2048 --print-freq=1024 --print-time $dlrm_extra_option 2>&1 | tee run_terabyte_c2.log 31 | 32 | echo "done" 33 | -------------------------------------------------------------------------------- /bench/run_and_time.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright (c) Facebook, Inc. and its affiliates. 3 | # 4 | # This source code is licensed under the MIT license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | # 7 | #WARNING: must have compiled PyTorch and caffe2 8 | 9 | #check if extra argument is passed to the test 10 | if [[ $# == 1 ]]; then 11 | dlrm_extra_option=$1 12 | else 13 | dlrm_extra_option="" 14 | fi 15 | #echo $dlrm_extra_option 16 | 17 | python dlrm_s_pytorch.py --arch-sparse-feature-size=128 --arch-mlp-bot="13-512-256-128" --arch-mlp-top="1024-1024-512-256-1" --max-ind-range=40000000 --data-generation=dataset --data-set=terabyte --raw-data-file=./input/day --processed-data-file=./input/terabyte_processed.npz --loss-function=bce --round-targets=True --learning-rate=1.0 --mini-batch-size=2048 --print-freq=2048 --print-time --test-freq=102400 --test-mini-batch-size=16384 --test-num-workers=16 --memory-map --mlperf-logging --mlperf-auc-threshold=0.8025 --mlperf-bin-loader --mlperf-bin-shuffle $dlrm_extra_option 2>&1 | tee run_terabyte_mlperf_pt.log 18 | 19 | echo "done" 20 | -------------------------------------------------------------------------------- /cache_algo/EvLFU_C1_Cython/EvLFU.cpython-36m-x86_64-linux-gnu.so: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ucare-uchicago/ev-store-dlrm/0954b2cb26a7e4ad1dddcdc3f98480e7d7e16ab5/cache_algo/EvLFU_C1_Cython/EvLFU.cpython-36m-x86_64-linux-gnu.so -------------------------------------------------------------------------------- /cache_algo/EvLFU_C1_Cython/EvLFU.pyx: -------------------------------------------------------------------------------- 1 | from libcpp.vector cimport vector 2 | from libcpp cimport bool 3 | from libcpp.string cimport string 4 | 5 | cdef extern from "evlfu.hpp": 6 | void init(int capacity) 7 | void request_to_ev_lfu(vector[int] &group_keys, vector[bool] &arr_record_hit, vector[vector[float]] &arr_emb_weights, bool use_gpu) 8 | void load_ev_tables() 9 | void close_ev_tables() 10 | 11 | def cinit(int capacity): 12 | init(capacity) 13 | 14 | def crequest(vector[int] group_keys, use_gpu): 15 | cdef vector[bool] arr_record_hit = [True] * 26 16 | cdef vector[vector[float]] arr_emb_weights = [[0.0]*36]*26 17 | request_to_ev_lfu(group_keys, arr_record_hit, arr_emb_weights, use_gpu) 18 | return arr_record_hit, arr_emb_weights 19 | 20 | 21 | def cload_ev_tables(): 22 | load_ev_tables() 23 | 24 | def cclose_ev_tables(): 25 | close_ev_tables() -------------------------------------------------------------------------------- /cache_algo/EvLFU_C1_Cython/evlfu.hpp: -------------------------------------------------------------------------------- 1 | 2 | #ifndef EVLFU_H_INCLUDED 3 | #define EVLFU_H_INCLUDED 4 | 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | 13 | using namespace std; 14 | 15 | struct Cache_data 16 | { 17 | Cache_data(vector ev = vector(0), int agg_hit = 0) 18 | { 19 | this->embedding_value = ev; 20 | this->agg_hit = agg_hit; 21 | } 22 | vector embedding_value; 23 | int agg_hit; 24 | }; 25 | 26 | void init(int capacity); 27 | void request_to_ev_lfu(vector &group_keys, vector &arr_record_hit, vector> &arr_emb_weights, bool use_gpu); 28 | void load_ev_tables(); 29 | void close_ev_tables(); 30 | 31 | #endif -------------------------------------------------------------------------------- /cache_algo/EvLFU_C1_Cython/script.sh: -------------------------------------------------------------------------------- 1 | python setup_EvLFU.py build_ext --inplace 2 | python test.py -------------------------------------------------------------------------------- /cache_algo/EvLFU_C1_Cython/setup_EvLFU.py: 
-------------------------------------------------------------------------------- 1 | from distutils.core import setup 2 | from Cython.Build import cythonize 3 | from distutils.extension import Extension 4 | from Cython.Distutils import build_ext 5 | # extensions = [ 6 | # Extension('EvLFU', ['EvLFU.pyx', 'evlfu_v2.cpp'], 7 | # extra_compile_args=['-std=c++11'], 8 | # language='c++' 9 | # ), 10 | # ] 11 | 12 | # setup( 13 | # ext_modules=cythonize(extensions), 14 | # # extra_compile_args=["-w", '-g'], 15 | # # extra_compile_args=["-O3"], 16 | # ) 17 | 18 | ext_modules = [Extension("EvLFU", ["EvLFU.pyx", "evlfu.cpp"], language='c++',)] 19 | 20 | setup(cmdclass = {'build_ext': build_ext}, ext_modules = ext_modules) -------------------------------------------------------------------------------- /cache_algo/EvLFU_C1_Cython/test.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import sys 3 | import numpy as np 4 | import pandas as pd 5 | import time 6 | import random 7 | import EvLFU 8 | 9 | 10 | workload_dir = "/home/cc/workload/Archive-new-1.0M/" 11 | workload_files = [] 12 | for i in range(1, 27): 13 | workload_files.append("workload-group-" + str(i) + ".csv") 14 | arrRawWorkload = [] 15 | # read all workloads: 16 | for workload_file in workload_files: 17 | workload = np.asarray(pd.read_csv(workload_dir + workload_file).values[:, 0]) 18 | arrRawWorkload.append(workload) 19 | 20 | arrRawWorkload = np.asarray(arrRawWorkload) 21 | # print(arrRawWorkload.shape) 22 | # merge the workloads 23 | arrMergedWorkload = np.stack(arrRawWorkload, axis=1) 24 | groupedWorkloadKeys = arrMergedWorkload 25 | print(arrMergedWorkload.shape) 26 | print("Done merging ALL workloads: total = ", arrRawWorkload.shape[0], 'rows') 27 | 28 | # Run the alg: 29 | EvLFU.cinit(768) 30 | prefectHit = 0 31 | 32 | groupedWorkloadIds = [] 33 | 34 | for groupKeys in groupedWorkloadKeys: 35 | groupKeys = groupKeys.tolist() 36 | for i in range(26): 37 | groupKeys[i] = int(groupKeys[i].split('-')[1]) 38 | 39 | groupedWorkloadIds.append(groupKeys) 40 | 41 | EvLFU.cload_ev_tables() 42 | start_time = time.time() 43 | 44 | for group_row_ids in groupedWorkloadIds: 45 | # print(type(groupKeys)) 46 | # print(type(groupKeys[0])) 47 | aggHitMissRecord, x = EvLFU.crequest(group_row_ids, False) 48 | # print(aggHitMissRecord) 49 | # print(x) 50 | # exit(0) 51 | flag = True 52 | for isHit in aggHitMissRecord: 53 | if not isHit: 54 | flag = False 55 | break 56 | if flag: 57 | prefectHit += 1 58 | print("perfect hit:", prefectHit) 59 | print(time.time() - start_time) 60 | EvLFU.cclose_ev_tables() -------------------------------------------------------------------------------- /cache_algo/LFU.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import torch 3 | import sys 4 | sys.path.append('emb_storage') 5 | import storage_manager 6 | cap = -1 7 | least_freq = 1 8 | # to store the frequency of the keys 9 | node_for_freq = [] 10 | # to search the key within all the cached keys 11 | node_for_key = dict() 12 | 13 | def init(capacity): 14 | global cap 15 | cap = capacity 16 | node_for_freq.append(0)# For the frequency == 0 : USELESS 17 | node_for_freq.append([]) # For the frequency == 1 18 | 19 | def _update( key, value, freq): 20 | # increment the frequency 21 | global node_for_key, node_for_freq, least_freq 22 | # remove the key from the old frequency 23 | node_for_freq[freq].remove(key) 24 | 25 | if len(node_for_freq[least_freq]) == 
0: 26 | # update the least_freq if there is no more item in this frequency list 27 | # node_for_freq.pop(least_freq) # remove this empty freq list, to save memory 28 | least_freq += 1 29 | 30 | # update frequency 31 | node_for_key[key][1] = freq + 1 32 | # use a +1 because the idx 0 is not used 33 | if ((freq + 1) == len(node_for_freq)): 34 | node_for_freq.append([]) 35 | node_for_freq[freq + 1].append(key) 36 | 37 | def set( key, value): 38 | global node_for_key, node_for_freq, cap, least_freq 39 | 40 | # check if full 41 | if (len(node_for_key) >= cap): 42 | # evict 1 item 43 | key_to_remove = node_for_freq[least_freq].pop(0) 44 | node_for_key.pop(key_to_remove) 45 | 46 | # Insert the new item 47 | node_for_key[key] = [value, 1] 48 | node_for_freq[1].append(key) 49 | 50 | # update least freq 51 | least_freq = 1 52 | 53 | def request(key, table_id, row_id): 54 | global node_for_key, node_for_freq 55 | # check if the key is cached 56 | if key in node_for_key: 57 | # Yes, get the value 58 | value, freq = node_for_key[key] 59 | # Update item's frequency 60 | _update(key, value, freq) 61 | return value, True 62 | else: 63 | # MISS: get value from secondary storage 64 | value = storage_manager.get_val_from_storage(table_id, row_id) 65 | set(key, value) 66 | return value, False 67 | 68 | # Multi keys request 69 | def request_to_lfu( group_row_ids, use_gpu = False): 70 | arr_record_hit = [] 71 | arr_emb_weights = [] 72 | agg_hit = 0 73 | if (use_gpu): 74 | # This code assume that we only run this on a single GPU node 75 | device = torch.device("cuda:0") 76 | 77 | for i, row_id in enumerate(group_row_ids): 78 | # Table_id is started at 1 79 | # Key for row3 of table1 is 1-3 80 | key = str(i+1) + "-" + str(row_id) 81 | val, is_hit = request(key, i+1, row_id) 82 | # convert list of embedding values to tensor 83 | ev_tensor = torch.FloatTensor([val]) # val is a python list 84 | ev_tensor.requires_grad = True 85 | if (use_gpu): 86 | # This code assume that we only run this on a single GPU node 87 | ev_tensor = ev_tensor.to(device) 88 | arr_emb_weights.append(ev_tensor) 89 | if is_hit: 90 | arr_record_hit.append(True) 91 | agg_hit += 1 92 | else: 93 | arr_record_hit.append(False) 94 | 95 | return arr_record_hit, arr_emb_weights 96 | 97 | -------------------------------------------------------------------------------- /cache_algo/LRU.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import torch 3 | import sys 4 | sys.path.append('emb_storage') 5 | import storage_manager 6 | 7 | cap = -1 8 | LRUCache = collections.OrderedDict()# each item is a dictionary embedding value 9 | 10 | def init(capacity): 11 | global cap 12 | cap = capacity 13 | 14 | # Inserting the NEW key 15 | def set(key, value): 16 | global LRUCache, cap 17 | if (len(LRUCache) >= cap): 18 | # evicting the first key in LRU list 19 | evict_key, evict_val = LRUCache.popitem(last=False) 20 | # inserting new key 21 | LRUCache[key] = value 22 | 23 | # single key request 24 | def request(key, table_id, row_id): 25 | global LRUCache 26 | if (key in LRUCache): 27 | value = LRUCache[key] 28 | # Update position of the hit item to first. Optional. 
29 | LRUCache.move_to_end(key, last=True) 30 | return value, True 31 | else: 32 | # MISS: get value from secondary storage 33 | value = storage_manager.get_val_from_storage(table_id, row_id) 34 | set(key, value) 35 | return value, False 36 | 37 | # Multi keys request 38 | def request_to_lru( group_row_ids, use_gpu = False): 39 | arr_record_hit = [] 40 | arr_emb_weights = [] 41 | agg_hit = 0 42 | if (use_gpu): 43 | # This code assume that we only run this on a single GPU node 44 | device = torch.device("cuda:0") 45 | 46 | for i, row_id in enumerate(group_row_ids): 47 | # Table_id is started at 1 48 | # Key for row3 of table1 is 1-3 49 | key = str(i+1) + "-" + str(row_id) 50 | val, is_hit = request(key, i+1, row_id) 51 | # convert list of embedding values to tensor 52 | ev_tensor = torch.FloatTensor([val]) # val is a python list 53 | ev_tensor.requires_grad = True 54 | if (use_gpu): 55 | # This code assume that we only run this on a single GPU node 56 | ev_tensor = ev_tensor.to(device) 57 | arr_emb_weights.append(ev_tensor) 58 | if is_hit: 59 | arr_record_hit.append(True) 60 | agg_hit += 1 61 | else: 62 | arr_record_hit.append(False) 63 | 64 | return arr_record_hit, arr_emb_weights 65 | 66 | -------------------------------------------------------------------------------- /cache_algo/old_versions/EvLFU_C1_v0.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import torch 3 | import sys 4 | sys.path.append('emb_storage') 5 | import storage_manager 6 | 7 | ###########################################EvLFU########################################## 8 | cap_C1 = 500 9 | min_C1 = 0 10 | vals_C1 = dict() 11 | counts_C1 = dict() 12 | lists_C1 = dict() 13 | lists_C1[0] = [] 14 | # flushing part: 15 | nPerfectItem_C1 = 0 16 | flushRate_C1 = 0.4 17 | perfectItemCapacity_C1 = 1.0 18 | 19 | def init(): 20 | pass 21 | 22 | def set(key, value, aggHit): 23 | global cap_C1, min_C1, vals_C1, counts_C1, lists_C1, nPerfectItem_C1, flushRate_C1, perfectItemCapacity_C1 24 | if cap_C1 <= 0: 25 | return 26 | if vals_C1.get(key) is not None: 27 | vals_C1[key] = value 28 | get_val_from_mem(key, aggHit) 29 | return 30 | 31 | # Flushing: 32 | if nPerfectItem_C1 >= int(cap_C1 * perfectItemCapacity_C1): 33 | # print("flushing!") 34 | for i in range(0, int(flushRate_C1 * cap_C1) + 1): 35 | evictKey = lists_C1.get(26)[0] 36 | lists_C1.get(26).remove(evictKey) 37 | vals_C1.pop(evictKey) 38 | counts_C1.pop(evictKey) 39 | 40 | nPerfectItem_C1 = len(lists_C1.get(26)) 41 | if len(vals_C1) < cap_C1: 42 | min_C1 = aggHit 43 | 44 | # key allows to insert in the cache: 45 | if len(vals_C1) >= cap_C1: 46 | evictKey = lists_C1.get(min_C1)[0] # TODO: Use pop!! 
47 | # print("lists_C1.get(min_C1 = " + str(min_C1) + ") = " + str(lists_C1.get(min_C1))) 48 | lists_C1.get(min_C1).remove(evictKey) 49 | try: 50 | vals_C1.pop(evictKey) 51 | except: 52 | print("KeyError when vals_C1.pop key =" + str(evictKey)) 53 | print(vals_C1.keys()) 54 | print("cap_C1 " + str(cap_C1)) 55 | print(lists_C1.get(min_C1)) 56 | exit(-1) 57 | try: 58 | counts_C1.pop(evictKey) 59 | except: 60 | print("KeyError when counts_C1.pop key =" + str(evictKey)) 61 | exit(-1) 62 | 63 | # If the key is new, insert the value: 64 | vals_C1[key] = value 65 | counts_C1[key] = aggHit 66 | 67 | if lists_C1.get(aggHit) is None: 68 | lists_C1[aggHit] = [] 69 | lists_C1 = dict(sorted(lists_C1.items())) 70 | # if (key in lists_C1[aggHit]): 71 | # print("aggHit = " + str(aggHit)) 72 | # print(lists_C1[aggHit]) 73 | # print("ERROR 1: key already in lists, no need to append " + key) 74 | # exit(-1) 75 | lists_C1.get(aggHit).append(key) # ======== 76 | 77 | # Update minimum agghit 78 | if aggHit < min_C1: 79 | min_C1 = aggHit 80 | while (lists_C1.get(min_C1) is None) or len(lists_C1.get(min_C1)) == 0: 81 | min_C1 += 1 82 | 83 | def get_val_from_mem(key, aggHit): # Get From Mem 84 | global cap_C1, min_C1, vals_C1, counts_C1, lists_C1, nPerfectItem_C1, flushRate_C1, perfectItemCapacity_C1 85 | if vals_C1.get(key) is None: 86 | return None 87 | count = counts_C1.get(key) 88 | newCount = count 89 | if count < aggHit: 90 | newCount = aggHit 91 | counts_C1[key] = newCount 92 | lists_C1.get(count).remove(key) 93 | 94 | if count == min_C1: 95 | while (lists_C1.get(min_C1) is None) or len(lists_C1.get(min_C1)) == 0: 96 | min_C1 += 1 97 | if lists_C1.get(newCount) is None: 98 | lists_C1[newCount] = [] 99 | lists_C1 = dict(sorted(lists_C1.items())) 100 | # if (key in lists_C1[newCount]): 101 | # print("newCount = " + str(newCount)) 102 | # print(lists_C1[newCount]) 103 | # print("ERROR 3: key already in lists, no need to append " + key) 104 | # exit(-1) 105 | lists_C1.get(newCount).append(key) # ======== 106 | return vals_C1[key] 107 | 108 | def update(key, tableId, rowId, aggHit, nGroup): 109 | # Get value from EV-LFU cache 110 | val = get_val_from_mem(key, aggHit) 111 | if val is None: 112 | # On MISS 113 | # Get value from secondary storage 114 | # ADDING IF CONDITION HERE IS SLOW! 
115 | # if (storage_manager.storage_type == storage_manager.EmbStorage.DUMMY): 116 | # Dummy storage will always use tableid + rowId because the data are stored in 26 tables 117 | val = storage_manager.get_val_from_storage(tableId, rowId) 118 | # else : 119 | # faster for rocksdb 120 | # val = storage_manager.get_val_from_storage_by_key(key) #only on rocksdb 121 | set(key, val, aggHit) 122 | return val 123 | 124 | def request_to_ev_lfu( group_rowIds, use_gpu = False): 125 | recordHitOrMiss = [] 126 | group_keys = [] 127 | missing_keys = [] 128 | emb_weights = [] 129 | aggHit = 0 130 | global cap_C1, min_C1, vals_C1, counts_C1, lists_C1, nPerfectItem_C1, flushRate_C1, perfectItemCapacity_C1 131 | for i, rowId in enumerate(group_rowIds): 132 | # TableId is started at 1 133 | # Key for row3 of table1 is 1-3 134 | key = str(i+1) + "-" + str(rowId) 135 | group_keys.append(key) 136 | if vals_C1.get(key) is not None: 137 | recordHitOrMiss.append(True) 138 | aggHit += 1 139 | else: 140 | missing_keys.append(key) 141 | recordHitOrMiss.append(False) 142 | 143 | # TODO: Get the missing keys from storage 144 | # missing_keys 145 | 146 | if (use_gpu): 147 | # This code assume that we only run this on a single GPU node 148 | device = torch.device("cuda:0") 149 | 150 | for i, rowId in enumerate(group_rowIds): 151 | # The tableId is started at 1 instead of 0 152 | val = update(group_keys[i], i + 1, rowId, aggHit, len(recordHitOrMiss)) # the data could either come from EV-LFU or MemStor or PyrocksDB 153 | # convert list of embedding values to tensor 154 | ev_tensor = torch.FloatTensor([val]) # val is a python list 155 | ev_tensor.requires_grad = True 156 | if (use_gpu): 157 | ev_tensor = ev_tensor.to(device) 158 | emb_weights.append(ev_tensor) 159 | 160 | if lists_C1.get(26) and not len(lists_C1.get(26)) == 0: 161 | nPerfectItem_C1 = len(lists_C1.get(26)) 162 | 163 | return recordHitOrMiss, emb_weights 164 | -------------------------------------------------------------------------------- /cache_algo/old_versions/EvLFU_C1_v1.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import sys 3 | sys.path.append('emb_storage') 4 | import storage_manager 5 | import random 6 | 7 | ###########################################EvLFU########################################## 8 | cap_C1 = -1 9 | min_C1 = 0 10 | vals_C1 = dict() # each value is [embedding value, agg_hit] 11 | lists_C1 = dict() # will group the keys based on the agg_hit (count) 12 | # flushing part: 13 | n_perfect_item_C1 = 0 14 | flush_rate_C1 = 0.4 15 | perfect_item_cap_C1 = 1.0 16 | max_perfect_item_C1 = 0 17 | 18 | def init(capacity): 19 | global lists_C1, max_perfect_item_C1, perfect_item_cap_C1, cap_C1 20 | cap_C1 = capacity 21 | # initializing the dict 22 | i = 0 23 | while (i <= 26): 24 | lists_C1[i] = [] 25 | i += 1 26 | max_perfect_item_C1 = int(cap_C1 * perfect_item_cap_C1) 27 | 28 | # Inserting the NEW key 29 | def set(key, value, agg_hit): 30 | global cap_C1, min_C1, vals_C1, lists_C1, n_perfect_item_C1, max_perfect_item_C1, flush_rate_C1 31 | 32 | # Flushing: 33 | if n_perfect_item_C1 >= max_perfect_item_C1: 34 | print("flushing!") 35 | print("n_perfect_item_C1 = " + str(n_perfect_item_C1)) 36 | print("max_perfect_item_C1 = " + str(max_perfect_item_C1)) 37 | for i in range(0, int(flush_rate_C1 * cap_C1) + 1): 38 | key_to_evict = lists_C1[26].pop(0) 39 | vals_C1.pop(key_to_evict) 40 | # adjust the n_perfect_item counter 41 | n_perfect_item_C1 = len(lists_C1[26]) 42 | else: 43 | # cache is 
full 44 | if len(vals_C1) >= cap_C1: 45 | # make a space for the new key 46 | while(lists_C1[min_C1] == []): 47 | # find the right key to pop 48 | # Update minimum agg_hit 49 | min_C1 += 1 50 | if (min_C1 > 26): 51 | min_C1 = 1 52 | key_to_evict = lists_C1[min_C1].pop(0) 53 | vals_C1.pop(key_to_evict) 54 | 55 | # insert the new value: 56 | vals_C1[key] = [value, agg_hit] 57 | lists_C1[agg_hit].append(key) # ======== 58 | 59 | if agg_hit < min_C1: 60 | min_C1 = agg_hit 61 | 62 | def update_agg_hit(key, agg_hit): # Get From Mem 63 | global vals_C1, lists_C1 64 | ev_vals = vals_C1.get(key) 65 | if ev_vals is None: 66 | return None 67 | # old_agg_hit = ev_vals[1] 68 | if ev_vals[1] < agg_hit: 69 | # update the old agg_hit 70 | lists_C1[ev_vals[1]].remove(key) 71 | lists_C1[agg_hit].append(key) # ======== 72 | vals_C1[key][1] = agg_hit 73 | # Increase the min_freq if the current lists freq is [] 74 | # Nope, the new aggHit can jump, No need to do anything 75 | return ev_vals[0] 76 | 77 | # Updating the existing keys and inserting the missing keys 78 | def update(key, table_id, row_id, agg_hit, missing_value = None): 79 | # TODO: This can be done in multi threaded way (on Java and C++) 80 | # Get value from EV-LFU cache 81 | val = update_agg_hit(key, agg_hit) 82 | if val: 83 | return val 84 | else: 85 | # On MISS: Get value from secondary storage 86 | # DON't put "IF CONDITION" HERE! IT IS SLOW! 87 | if missing_value is None: 88 | # this key might be kicked out while inserting previous key 89 | missing_value = storage_manager.get_val_from_storage(table_id, row_id) 90 | # missing_value = storage_manager.get_val_from_storage_by_key(key) #only on rocksdb 91 | set(key, missing_value, agg_hit) 92 | return missing_value 93 | 94 | def request_to_ev_lfu( group_row_ids, use_gpu = False, approx_emb_thres = -1, ev_dim = 36): 95 | arr_record_hit = [] 96 | arr_group_keys = [] 97 | arr_missing_keys = [] 98 | arr_missing_values = [] 99 | arr_emb_weights = [] 100 | pick_random_ev = False 101 | agg_hit = 0 102 | global vals_C1, lists_C1, n_perfect_item_C1 103 | for i, row_id in enumerate(group_row_ids): 104 | # Table_id is started at 1 105 | # Key for row3 of table1 is 1-3 106 | key = str(i+1) + "-" + str(row_id) 107 | arr_group_keys.append(key) 108 | if key in vals_C1.keys(): 109 | arr_record_hit.append(True) 110 | agg_hit += 1 111 | else: 112 | arr_missing_keys.append([i+1, row_id]) 113 | arr_record_hit.append(False) 114 | 115 | # Get all missing keys from storage at once 116 | arr_missing_values = storage_manager.get_arr_val_from_storage(arr_missing_keys) 117 | 118 | if (use_gpu): 119 | # This code assume that we only run this on a single GPU node 120 | device = torch.device("cuda:0") 121 | 122 | # Update 123 | for i, row_id in enumerate(group_row_ids): 124 | # TODO: C++ and java code should do this in multithreaded way 125 | # The table_id is started at 1 instead of 0 126 | if (arr_record_hit[i]): 127 | val = update(arr_group_keys[i], i + 1, row_id, agg_hit) 128 | else: 129 | # plug the missing values that we get from secondary storage 130 | val = update(arr_group_keys[i], i + 1, row_id, agg_hit, arr_missing_values.pop(0)) 131 | # convert list of embedding values to tensor 132 | ev_tensor = torch.FloatTensor([val]) # val is a python list 133 | ev_tensor.requires_grad = True 134 | if (use_gpu): 135 | ev_tensor = ev_tensor.to(device) 136 | arr_emb_weights.append(ev_tensor) 137 | 138 | if agg_hit == 26: 139 | # update the number of perfect item 140 | n_perfect_item_C1 = len(lists_C1[26]) 141 | return 
arr_record_hit, arr_emb_weights 142 | -------------------------------------------------------------------------------- /cache_algo/old_versions/LFU_v0.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import torch 3 | import sys 4 | sys.path.append('emb_storage') 5 | import storage_manager 6 | cap = -1 7 | least_freq = 1 8 | node_for_freq = collections.defaultdict(collections.OrderedDict) 9 | node_for_key = dict() 10 | 11 | def init(capacity): 12 | global cap 13 | cap = capacity 14 | 15 | def _update( key, value): 16 | global node_for_key, node_for_freq, least_freq 17 | _, freq = node_for_key[key] 18 | node_for_freq[freq].pop(key) 19 | if len(node_for_freq[least_freq]) == 0: 20 | least_freq += 1 21 | node_for_freq[freq+1][key] = (value, freq+1) 22 | node_for_key[key] = (value, freq+1) 23 | 24 | def set( key, value): 25 | global node_for_key, node_for_freq, cap, least_freq 26 | if (len(node_for_key) >= cap): 27 | # evict 1 item 28 | removed = node_for_freq[least_freq].popitem(last=False) 29 | node_for_key.pop(removed[0]) 30 | # Insert the new item 31 | node_for_key[key] = (value,1) 32 | node_for_freq[1][key] = (value,1) 33 | 34 | def request(key, table_id, row_id): 35 | global node_for_key, node_for_freq 36 | if key in node_for_key: 37 | value = node_for_key[key][0] 38 | # Update item's frequency 39 | _update(key, value) 40 | return value, True 41 | else: 42 | # MISS: get value from secondary storage 43 | value = storage_manager.get_val_from_storage(table_id, row_id) 44 | set(key, value) 45 | return value, False 46 | 47 | # Multi keys request 48 | def request_to_lfu( group_row_ids, use_gpu = False): 49 | arr_record_hit = [] 50 | arr_emb_weights = [] 51 | agg_hit = 0 52 | if (use_gpu): 53 | # This code assume that we only run this on a single GPU node 54 | device = torch.device("cuda:0") 55 | 56 | for i, row_id in enumerate(group_row_ids): 57 | # Table_id is started at 1 58 | # Key for row3 of table1 is 1-3 59 | key = str(i+1) + "-" + str(row_id) 60 | val, is_hit = request(key, i+1, row_id) 61 | # convert list of embedding values to tensor 62 | ev_tensor = torch.FloatTensor([val]) # val is a python list 63 | ev_tensor.requires_grad = True 64 | if (use_gpu): 65 | # This code assume that we only run this on a single GPU node 66 | ev_tensor = ev_tensor.to(device) 67 | arr_emb_weights.append(ev_tensor) 68 | if is_hit: 69 | arr_record_hit.append(True) 70 | agg_hit += 1 71 | else: 72 | arr_record_hit.append(False) 73 | 74 | return arr_record_hit, arr_emb_weights 75 | -------------------------------------------------------------------------------- /cache_algo/old_versions/LFU_v1.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import torch 3 | import sys 4 | sys.path.append('emb_storage') 5 | import storage_manager 6 | cap = -1 7 | least_freq = 1 8 | # to store the frequency of the keys 9 | node_for_freq = dict() 10 | # to search the key within all the cached keys 11 | node_for_key = dict() 12 | 13 | def init(capacity): 14 | global cap 15 | cap = capacity 16 | node_for_freq[1] = collections.OrderedDict() # For the frequency == 1 17 | 18 | def _update( key, value, freq): 19 | # increment the frequency 20 | global node_for_key, node_for_freq, least_freq 21 | # remove the key from the old frequency 22 | node_for_freq[freq].pop(key) 23 | 24 | if len(node_for_freq[least_freq]) == 0: 25 | # update the least_freq if there is no more item in this frequency list 26 | least_freq += 1 
27 | node_for_freq.pop(least_freq) 28 | 29 | # update frequency 30 | node_for_key[key][1] = freq + 1 31 | if ((freq + 1) not in node_for_freq.keys()): 32 | node_for_freq[freq + 1] = collections.OrderedDict() 33 | node_for_freq[freq + 1][key] = "" 34 | 35 | def set( key, value): 36 | global node_for_key, node_for_freq, cap, least_freq 37 | 38 | # check if full 39 | if (len(node_for_key) >= cap): 40 | # evict 1 item 41 | key_to_remove = node_for_freq[least_freq].popitem(last=False) 42 | node_for_key.pop(key_to_remove) 43 | 44 | # Insert the new item 45 | node_for_key[key] = [value, 1] 46 | node_for_freq[1][key] = "" 47 | 48 | # update least freq 49 | least_freq = 1 50 | 51 | def request(key, table_id, row_id): 52 | global node_for_key, node_for_freq 53 | # check if the key is cached 54 | if key in node_for_key: 55 | # Yes, get the value 56 | value, freq = node_for_key[key] 57 | # Update item's frequency 58 | _update(key, value, freq) 59 | return value, True 60 | else: 61 | # MISS: get value from secondary storage 62 | value = storage_manager.get_val_from_storage(table_id, row_id) 63 | set(key, value) 64 | return value, False 65 | 66 | # Multi keys request 67 | def request_to_lfu( group_row_ids, use_gpu = False): 68 | arr_record_hit = [] 69 | arr_emb_weights = [] 70 | agg_hit = 0 71 | if (use_gpu): 72 | # This code assume that we only run this on a single GPU node 73 | device = torch.device("cuda:0") 74 | 75 | for i, row_id in enumerate(group_row_ids): 76 | # Table_id is started at 1 77 | # Key for row3 of table1 is 1-3 78 | key = str(i+1) + "-" + str(row_id) 79 | val, is_hit = request(key, i+1, row_id) 80 | # convert list of embedding values to tensor 81 | ev_tensor = torch.FloatTensor([val]) # val is a python list 82 | ev_tensor.requires_grad = True 83 | if (use_gpu): 84 | # This code assume that we only run this on a single GPU node 85 | ev_tensor = ev_tensor.to(device) 86 | arr_emb_weights.append(ev_tensor) 87 | if is_hit: 88 | arr_record_hit.append(True) 89 | agg_hit += 1 90 | else: 91 | arr_record_hit.append(False) 92 | 93 | return arr_record_hit, arr_emb_weights 94 | 95 | -------------------------------------------------------------------------------- /cache_algo/old_versions/LFU_v2.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import torch 3 | import sys 4 | sys.path.append('emb_storage') 5 | import storage_manager 6 | cap = -1 7 | least_freq = 1 8 | # to store the frequency of the keys 9 | node_for_freq = dict() 10 | # to search the key within all the cached keys 11 | node_for_key = dict() 12 | 13 | def init(capacity): 14 | global cap 15 | cap = capacity 16 | node_for_freq[1] = [] # For the frequency == 1 17 | 18 | def _update( key, value, freq): 19 | # increment the frequency 20 | global node_for_key, node_for_freq, least_freq 21 | # remove the key from the old frequency 22 | node_for_freq[freq].remove(key) 23 | 24 | if len(node_for_freq[least_freq]) == 0: 25 | # update the least_freq if there is no more item in this frequency list 26 | node_for_freq.pop(least_freq) # remove this empty freq list, to save memory 27 | least_freq += 1 28 | 29 | # update frequency 30 | node_for_key[key][1] = freq + 1 31 | if ((freq + 1) not in node_for_freq.keys()): 32 | node_for_freq[freq + 1] = [] 33 | node_for_freq[freq + 1].append(key) 34 | 35 | def set( key, value): 36 | global node_for_key, node_for_freq, cap, least_freq 37 | 38 | # check if full 39 | if (len(node_for_key) >= cap): 40 | # evict 1 item 41 | key_to_remove = 
node_for_freq[least_freq].pop(0) 42 | node_for_key.pop(key_to_remove) 43 | 44 | # Insert the new item 45 | node_for_key[key] = [value, 1] 46 | node_for_freq[1].append(key) 47 | 48 | # update least freq 49 | least_freq = 1 50 | 51 | def request(key, table_id, row_id): 52 | global node_for_key, node_for_freq 53 | # check if the key is cached 54 | if key in node_for_key: 55 | # Yes, get the value 56 | value, freq = node_for_key[key] 57 | # Update item's frequency 58 | _update(key, value, freq) 59 | return value, True 60 | else: 61 | # MISS: get value from secondary storage 62 | value = storage_manager.get_val_from_storage(table_id, row_id) 63 | set(key, value) 64 | return value, False 65 | 66 | # Multi keys request 67 | def request_to_lfu( group_row_ids, use_gpu = False): 68 | arr_record_hit = [] 69 | arr_emb_weights = [] 70 | agg_hit = 0 71 | if (use_gpu): 72 | # This code assume that we only run this on a single GPU node 73 | device = torch.device("cuda:0") 74 | 75 | for i, row_id in enumerate(group_row_ids): 76 | # Table_id is started at 1 77 | # Key for row3 of table1 is 1-3 78 | key = str(i+1) + "-" + str(row_id) 79 | val, is_hit = request(key, i+1, row_id) 80 | # convert list of embedding values to tensor 81 | ev_tensor = torch.FloatTensor([val]) # val is a python list 82 | ev_tensor.requires_grad = True 83 | if (use_gpu): 84 | # This code assume that we only run this on a single GPU node 85 | ev_tensor = ev_tensor.to(device) 86 | arr_emb_weights.append(ev_tensor) 87 | if is_hit: 88 | arr_record_hit.append(True) 89 | agg_hit += 1 90 | else: 91 | arr_record_hit.append(False) 92 | 93 | return arr_record_hit, arr_emb_weights 94 | 95 | -------------------------------------------------------------------------------- /cache_algo/old_versions/LRU_v0.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import torch 3 | import sys 4 | sys.path.append('emb_storage') 5 | import storage_manager 6 | from functools import lru_cache 7 | 8 | cap = -1 9 | vals = dict() # each item is a dictionary embedding value 10 | lru_list = [] # store the order of the keys in LRU manner 11 | 12 | def init(capacity): 13 | global cap 14 | cap = capacity 15 | 16 | # Inserting the NEW key 17 | def set(key, value): 18 | global vals, lru_list, cap 19 | 20 | if (len(lru_list) >= cap): 21 | # evicting the first key in LRU list 22 | vals.pop(lru_list.pop(0)) 23 | 24 | # inserting new key 25 | vals[key] = value 26 | lru_list.append(key) 27 | 28 | # single key request 29 | def request(key, table_id, row_id): 30 | global vals, lru_list 31 | is_hit = False 32 | value = vals.get(key) 33 | if (value == None): 34 | # MISS: get value from secondary storage 35 | value = storage_manager.get_val_from_storage(table_id, row_id) 36 | set(key, value) 37 | else: 38 | # update the position; put it in the back of the Q 39 | lru_list.remove(key) 40 | lru_list.append(key) 41 | is_hit = True 42 | return value, is_hit 43 | 44 | @lru_cache(maxsize=5000) 45 | def request_memoization(key, table_id, row_id): 46 | # MISS: get value from secondary storage 47 | value = storage_manager.get_val_from_storage(table_id, row_id) 48 | set(key, value) 49 | return value, False 50 | 51 | # Multi keys request 52 | def request_to_lru( group_row_ids, use_gpu = False): 53 | arr_record_hit = [] 54 | arr_emb_weights = [] 55 | agg_hit = 0 56 | if (use_gpu): 57 | # This code assume that we only run this on a single GPU node 58 | device = torch.device("cuda:0") 59 | 60 | for i, row_id in enumerate(group_row_ids): 
61 | # Table_id is started at 1 62 | # Key for row3 of table1 is 1-3 63 | key = str(i+1) + "-" + str(row_id) 64 | # val, is_hit = request_memoization(key, i+1, row_id) 65 | val, is_hit = request(key, i+1, row_id) 66 | # convert list of embedding values to tensor 67 | ev_tensor = torch.FloatTensor([val]) # val is a python list 68 | ev_tensor.requires_grad = True 69 | if (use_gpu): 70 | # This code assume that we only run this on a single GPU node 71 | ev_tensor = ev_tensor.to(device) 72 | arr_emb_weights.append(ev_tensor) 73 | if is_hit: 74 | arr_record_hit.append(True) 75 | agg_hit += 1 76 | else: 77 | arr_record_hit.append(False) 78 | 79 | return arr_record_hit, arr_emb_weights 80 | 81 | -------------------------------------------------------------------------------- /cython/cython_compile.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # 3 | # This source code is licensed under the MIT license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | # Description: compile .so from python code 7 | 8 | from __future__ import absolute_import, division, print_function, unicode_literals 9 | 10 | from setuptools import setup 11 | from Cython.Build import cythonize 12 | from distutils.extension import Extension 13 | 14 | ext_modules = [ 15 | Extension( 16 | "data_utils_cython", 17 | ["data_utils_cython.pyx"], 18 | extra_compile_args=['-O3'], 19 | extra_link_args=['-O3'], 20 | ) 21 | ] 22 | 23 | setup( 24 | name='data_utils_cython', 25 | ext_modules=cythonize(ext_modules) 26 | ) 27 | -------------------------------------------------------------------------------- /cython/cython_criteo.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. 2 | # 3 | # This source code is licensed under the MIT license found in the 4 | # LICENSE file in the root directory of this source tree. 5 | # 6 | # Description: run dataset pre-processing in standalone mode 7 | # WARNING: These steps are required to work with Cython 8 | # 1. Instal Cython 9 | # > sudo yum install Cython 10 | # 2. Please copy data_utils.py into data_utils_cython.pyx 11 | # 3. Compile the data_utils_cython.pyx to generate .so 12 | # (it's important to keep extension .pyx rather than .py 13 | # to ensure the C/C++ .so no .py is loaded at import time) 14 | # > python cython_compile.py build_ext --inplace 15 | # This should create data_utils_cython.so, which can be loaded below with "import" 16 | # 4. Run standalone datatset preprocessing to generate .npz files 17 | # a. Kaggle 18 | # > python cython_criteo.py --data-set=kaggle --raw-data-file=./input/train.txt 19 | # --processed-data-file=./input/kaggleAdDisplayChallenge_processed.npz 20 | # b. 
Terabyte 21 | # > python cython_criteo.py --max-ind-range=10000000 [--memory-map] --data-set=terabyte 22 | # --raw-data-file=./input/day --processed-data-file=./input/terabyte_processed.npz 23 | 24 | from __future__ import absolute_import, division, print_function, unicode_literals 25 | 26 | import data_utils_cython as duc 27 | 28 | if __name__ == "__main__": 29 | ### import packages ### 30 | import argparse 31 | 32 | ### parse arguments ### 33 | parser = argparse.ArgumentParser( 34 | description="Preprocess Criteo dataset" 35 | ) 36 | # model related parameters 37 | parser.add_argument("--max-ind-range", type=int, default=-1) 38 | parser.add_argument("--data-sub-sample-rate", type=float, default=0.0) # in [0, 1] 39 | parser.add_argument("--data-randomize", type=str, default="total") # or day or none 40 | parser.add_argument("--memory-map", action="store_true", default=False) 41 | parser.add_argument("--data-set", type=str, default="kaggle") # or terabyte 42 | parser.add_argument("--raw-data-file", type=str, default="") 43 | parser.add_argument("--processed-data-file", type=str, default="") 44 | args = parser.parse_args() 45 | 46 | duc.loadDataset( 47 | args.data_set, 48 | args.max_ind_range, 49 | args.data_sub_sample_rate, 50 | args.data_randomize, 51 | "train", 52 | args.raw_data_file, 53 | args.processed_data_file, 54 | args.memory_map 55 | ) 56 | -------------------------------------------------------------------------------- /emb_storage/file_read.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import struct 4 | 5 | BINARY_DIR_NAME = "binary/" 6 | arr_files = [] 7 | TOTAL_BYTE_PER_ROW = -1 8 | EV_DIMENSION = 36 9 | 10 | # Load value as bytes!! 11 | def open_files_as_binary(ev_path_c1, bit_precision = 32): 12 | global arr_files, TOTAL_BYTE_PER_ROW 13 | BYTE_PRECISION = int(bit_precision/8) 14 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION 15 | 16 | print("**************** Opening all Binary EV-files") 17 | print("**************** from = " + ev_path_c1) 18 | arr_files.append("ID Zero is not being used!") 19 | for ev_idx in range(0, 26): 20 | binFilename = "ev-table-" + str(ev_idx + 1) + ".bin" 21 | bin_ev_path = os.path.join(ev_path_c1, BINARY_DIR_NAME, binFilename) 22 | print("************* Opening Binnary EV = " + bin_ev_path) 23 | arr_files.append(open(bin_ev_path, 'rb')) 24 | print("**************** All Files are opened!") 25 | print("**************** TOTAL_BYTE_PER_ROW = " + str(TOTAL_BYTE_PER_ROW)) 26 | 27 | def get(tableId, rowId): 28 | # tableId started at id = 1 29 | file = arr_files[tableId] 30 | # print(TOTAL_BYTE_PER_ROW * rowId ) 31 | file.seek(TOTAL_BYTE_PER_ROW * rowId) 32 | blob = file.read(TOTAL_BYTE_PER_ROW) 33 | return struct.unpack('f'*36, blob) 34 | 35 | def close(): 36 | arr_files.pop(0) # this item0 is not really a file 37 | for file in arr_files: 38 | file.close() 39 | print("**************** All Files are closed!") 40 | -------------------------------------------------------------------------------- /emb_storage/mmap_file_read.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import struct 4 | import mmap 5 | 6 | BINARY_DIR_NAME = "binary/" 7 | arr_files = [] 8 | arr_mmap_files = [] 9 | TOTAL_BYTE_PER_ROW = -1 10 | EV_DIMENSION = 36 11 | 12 | # Load value as bytes!! 
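# Unlike file_read.py, this module wraps each binary EV file in an mmap, so the
# seek()/read() calls in get() are served from the memory-mapped pages rather than
# issuing a separate read() system call for every embedding lookup.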
13 | def open_files_as_binary(ev_path_c1, bit_precision = 32): 14 | global arr_files, TOTAL_BYTE_PER_ROW 15 | BYTE_PRECISION = int(bit_precision/8) 16 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION 17 | 18 | print("**************** Opening all Binary EV-files") 19 | print("**************** from = " + ev_path_c1) 20 | arr_files.append("ID Zero is not being used!") 21 | arr_mmap_files.append("ID Zero is not being used!") 22 | for ev_idx in range(0, 26): 23 | binFilename = "ev-table-" + str(ev_idx + 1) + ".bin" 24 | bin_ev_path = os.path.join(ev_path_c1, BINARY_DIR_NAME, binFilename) 25 | print("************* Opening Binnary EV = " + bin_ev_path) 26 | file = open(bin_ev_path, 'rb') 27 | arr_files.append(file) 28 | arr_mmap_files.append(mmap.mmap(file.fileno(), 0, prot=mmap.PROT_READ)) 29 | print("**************** All Files are opened!") 30 | print("**************** TOTAL_BYTE_PER_ROW = " + str(TOTAL_BYTE_PER_ROW)) 31 | 32 | def get(tableId, rowId): 33 | # tableId started at id = 1 34 | # file = arr_files[tableId] 35 | file = arr_mmap_files[tableId] 36 | # print(TOTAL_BYTE_PER_ROW * rowId ) 37 | file.seek(TOTAL_BYTE_PER_ROW * rowId) 38 | blob = file.read(TOTAL_BYTE_PER_ROW) 39 | # return struct.unpack('f'*36, blob[0:TOTAL_BYTE_PER_ROW]) 40 | return struct.unpack('f'*36, blob) 41 | 42 | def close(): 43 | arr_files.pop(0) # this item0 is not really a file 44 | for file in arr_files: 45 | file.close() 46 | print("**************** All Files are closed!") 47 | -------------------------------------------------------------------------------- /emb_storage/multi_storage_dummy/socket-server.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import socket 4 | import pandas as pd 5 | import os 6 | import torch 7 | import struct 8 | 9 | parser = argparse.ArgumentParser(description="EvLFU server") 10 | parser.add_argument("--port", type=int, default=8000) 11 | parser.add_argument("--ev-path", type=str, default="") 12 | args = parser.parse_args() 13 | 14 | # Call EvLFU service through socket 15 | HOST = '127.0.0.1' # Standard loopback interface address (localhost) 16 | PORT = args.port # 65432 # Port to listen on (non-privileged ports are > 1023) 17 | MAX_BUFFER = 1024 18 | BINARY_DIR_NAME = "binary/" 19 | TOTAL_EV_TABLE = 26 20 | EV_DIMENSION = 36 21 | 22 | # This is ROCKSDB client or dummyMemStor client 23 | 24 | EvTable_C1 = [] 25 | 26 | # Load value as bytes!! 27 | def load(ev_path_c1, bit_precision = 32): 28 | # We are still storing it as array of floats. TODO: Store it as binary! 
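    # (Note: unlike what the comment above suggests, this binary loader keeps each row
    # as its raw byte blob — 144 bytes at the default 32-bit precision — while the
    # alternative load_as_list() below keeps rows as Python float lists.)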
29 | print("**************** Loading EV Table to DummyMemoryStorage") 30 | print("**************** Load new set of EV Table from = " + ev_path_c1) 31 | global EvTable_C1 32 | EvTable_C1.append("Buffer: table0 is not used") 33 | BYTE_PRECISION = int(bit_precision/8) 34 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION 35 | 36 | for ev_idx in range(0, 26): 37 | binFilename = "ev-table-" + str(ev_idx + 1) + ".bin" 38 | bin_ev_path = os.path.join(ev_path_c1, BINARY_DIR_NAME, binFilename) 39 | print("************* Loading Binnary EV = " + bin_ev_path) 40 | 41 | curr_table = [] 42 | # put 43 | with open(bin_ev_path, 'rb') as f: 44 | data = f.read() 45 | num_of_indexes = len(data) // TOTAL_BYTE_PER_ROW 46 | for i in range(0, num_of_indexes): 47 | # put 48 | byte_offset = BYTE_PRECISION * i * EV_DIMENSION # 36 -> dimension 49 | blob = data[ byte_offset : byte_offset + TOTAL_BYTE_PER_ROW] 50 | curr_table.append(blob) 51 | 52 | # Try reading the blob 53 | # print(struct.unpack('f'*36, blob[0:144])) 54 | f.close() 55 | EvTable_C1.append(curr_table) 56 | print("**************** All EvTable loaded in the Memory!") 57 | 58 | def load_as_list(ev_path_c1): 59 | # We are still storing it as array of floats. TODO: Store it as binary! 60 | print("**************** Loading EV Table to DummyMemoryStorage") 61 | print("**************** Load new set of EV Table from = " + ev_path_c1) 62 | global EvTable_C1 63 | EvTable_C1.append("Buffer: table0 is not used") 64 | for ev_idx in range(0, 26): 65 | # Read new EV Table from file 66 | ev_path = os.path.join(ev_path_c1, 67 | "ev-table-" + str(ev_idx + 1) + ".csv") 68 | print("********************* Loading EV = " + ev_path) 69 | new_ev_df = pd.read_csv(ev_path, dtype=float, delimiter=',') 70 | # Convert to numpy first before to tensor 71 | new_ev_arr = new_ev_df.to_numpy() 72 | # Convert to tensor 73 | # Option 1: Store it as numpy array (Slower for reading) 74 | # EvTable_C1[ev_idx + 1] = new_ev_arr 75 | # Option 2: Store it as pure python list 76 | EvTable_C1.append(new_ev_arr.tolist()) 77 | break 78 | print("**************** All EvTable loaded in the Memory!") 79 | 80 | def get(tableId, rowId): 81 | # tableId started at id = 1 82 | global EvTable_C1 83 | return EvTable_C1[tableId][rowId] 84 | 85 | def get_many(arrTableId, arrRowId): 86 | # tableId started at id = 1 87 | global EvTable_C1 88 | arrVal = [] 89 | for i in range(len(arrTableId)): 90 | arrVal.append(EvTable_C1[arrTableId[i]][arrRowId[i]]) 91 | # return array of values 92 | return arrVal 93 | 94 | def listen(): 95 | print("This server is ready to look up the ev-value based on the key!") 96 | print("Start listening at port: " + str(args.port)) 97 | with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s: 98 | s.bind((HOST, PORT)) 99 | s.listen() 100 | conn, addr = s.accept() 101 | with conn: 102 | print('Connected to client at: ', addr) 103 | while True: 104 | buf = conn.recv(MAX_BUFFER) 105 | if buf: 106 | 107 | keys = str(buf, 'utf8').split('\n') 108 | # print("keys: " + str(keys)) 109 | for key in keys: 110 | tableId, rowId = key.split('-', 2) 111 | val = get(int(tableId), int(rowId)) 112 | conn.sendall(val) 113 | # print("Done sending the values of " + str(keys)) 114 | 115 | # tableId, rowId = str(buf, 'utf8').split('-', 2) 116 | # val = get(int(tableId), int(rowId)) 117 | # print(val) 118 | # print(struct.unpack('f'*36, val[0:144])) 119 | # conn.sendall(val) 120 | 121 | if __name__=="__main__": 122 | load(args.ev_path) 123 | listen() 124 | 
-------------------------------------------------------------------------------- /emb_storage/storage_dummy.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import os 3 | import torch 4 | import struct 5 | import sys 6 | sys.path.append('../../') 7 | 8 | import evstore_utils 9 | import storage_manager 10 | 11 | EvTable_C1 = [] 12 | MAX_BUFFER = 256 13 | BINARY_DIR_NAME = "binary/" 14 | TOTAL_EV_TABLE = 26 15 | EV_DIMENSION = 36 16 | 17 | 18 | def load(ev_path_c1): 19 | # return load_as_binary(ev_path_c1) 20 | return load_as_list(ev_path_c1) 21 | 22 | def get(tableId, rowId): 23 | # tableId started at id = 1 24 | # return get_as_binary(tableId, rowId) 25 | return get_as_list(tableId, rowId) 26 | 27 | def get_nrows_pertable(file_path): 28 | _, _, _, ln_emb, _ = evstore_utils.read_training_config(file_path) 29 | return ln_emb 30 | 31 | # Load value as bytes!! 32 | def load_as_binary(ev_path_c1, bit_precision = 32): 33 | print("**************** Loading EV Table to DummyMemoryStorage") 34 | print("**************** Load new set of EV Table from = " + ev_path_c1) 35 | global EvTable_C1 36 | EvTable_C1.append("Buffer: table0 is not used") 37 | BYTE_PRECISION = int(bit_precision/8) 38 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION 39 | ln_emb = get_nrows_pertable(storage_manager.training_config_path) 40 | 41 | for ev_idx in range(0, 26): 42 | binFilename = "ev-table-" + str(ev_idx + 1) + ".bin" 43 | bin_ev_path = os.path.join(ev_path_c1, BINARY_DIR_NAME, binFilename) 44 | print("************* Loading Binnary EV = " + bin_ev_path) 45 | 46 | curr_table = [] 47 | # put 48 | with open(bin_ev_path, 'rb') as f: 49 | data = f.read() 50 | num_of_indexes = len(data) // TOTAL_BYTE_PER_ROW 51 | assert(ln_emb[ev_idx] == num_of_indexes) 52 | for i in range(0, num_of_indexes): 53 | # put 54 | byte_offset = BYTE_PRECISION * i * EV_DIMENSION # 36 -> dimension 55 | blob = data[ byte_offset : byte_offset + TOTAL_BYTE_PER_ROW] 56 | curr_table.append(blob) 57 | 58 | # Try reading the blob 59 | # print(struct.unpack('f'*36, blob[0:144])) 60 | f.close() 61 | EvTable_C1.append(curr_table) 62 | print("**************** All EvTable loaded in the Memory!") 63 | 64 | def get_as_binary(tableId, rowId): 65 | # tableId started at id = 1 66 | global EvTable_C1 67 | blob = EvTable_C1[tableId][rowId] 68 | return struct.unpack('f'*36, blob) 69 | 70 | def load_as_list(ev_path_c1): 71 | print("**************** Loading EV Table to DummyMemoryStorage") 72 | print("**************** Load new set of EV Table from = " + ev_path_c1) 73 | global EvTable_C1 74 | EvTable_C1.append("Buffer: table0 is not used") 75 | for ev_idx in range(0, 26): 76 | # Read new EV Table from file 77 | ev_path = os.path.join(ev_path_c1, 78 | "ev-table-" + str(ev_idx + 1) + ".csv") 79 | print("********************* Loading EV = " + ev_path) 80 | new_ev_df = pd.read_csv(ev_path, dtype=float, delimiter=',') 81 | # Convert to numpy first before to tensor 82 | new_ev_arr = new_ev_df.to_numpy() 83 | # Convert to tensor 84 | # Option 1: Store it as numpy array (Slower for reading) 85 | # EvTable_C1[ev_idx + 1] = new_ev_arr 86 | # Option 2: Store it as pure python list 87 | EvTable_C1.append(new_ev_arr.tolist()) 88 | print("**************** All EvTable loaded in the Memory!") 89 | 90 | def get_as_list(tableId, rowId): 91 | # tableId started at id = 1 92 | global EvTable_C1 93 | # print(EvTable_C1[tableId][rowId]) 94 | # exit() 95 | return EvTable_C1[tableId][rowId] 96 | 
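A small usage sketch for the in-memory dummy storage above; the EV-table directory is an assumption, and the script is meant to be run from the repository root so that emb_storage/ and evstore_utils.py are importable.

```python
import sys
sys.path.append('emb_storage')  # lets storage_dummy resolve its sibling imports

import storage_dummy

# load() currently dispatches to load_as_list(), which expects
# ev-table-1.csv .. ev-table-26.csv under the given directory (path assumed here).
storage_dummy.load("/mnt/extra/ev-tables")

row = storage_dummy.get(1, 3)   # embedding for row 3 of table 1, as a Python list
print(len(row), row[:4])        # 36 values for the default EV dimension
```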
-------------------------------------------------------------------------------- /emb_storage/storage_rocksdb.py: -------------------------------------------------------------------------------- 1 | import pyrocksdb 2 | import time 3 | import os 4 | import pandas as pd 5 | import argparse 6 | import struct 7 | import numpy as np 8 | from array import * 9 | from tqdm import tqdm 10 | import torch 11 | import shutil 12 | from pathlib import Path 13 | import sys 14 | sys.path.append('../../') 15 | 16 | import evstore_utils 17 | import storage_manager 18 | 19 | ROCKSDB_DB_DIR = "/mnt/extra/db-ev-storage/rocksdb/" 20 | BINARY_DIR_NAME = "binary/" 21 | TOTAL_EV_TABLE = 26 22 | EV_DIMENSION = 36 23 | 24 | class RocksDBClient: 25 | 26 | # will read the BINARY values from the rocksdb 27 | def get(self, tableId, rowId): 28 | # TableId start from index 1 29 | # assert(tableId >= 1) 30 | # assert(tableId <= TOTAL_EV_TABLE) 31 | # tableId started at 1, but the db connection started at id 0 32 | blob = self.db_conn.get(self.read_opts, str(tableId) + "-" + str(rowId)) 33 | # convert to float list 34 | # return struct.unpack('f'*EV_DIMENSION, blob.data[0:144]) 35 | return struct.unpack('f'*EV_DIMENSION, blob.data) 36 | 37 | # will read the BINARY values from the rocksdb 38 | def getByKey(self, key): 39 | # TableId start from index 1 40 | # tableId started at 1, but the db connection started at id 0 41 | blob = self.db_conn.get(self.read_opts, key) 42 | # convert to float list 43 | print(blob) 44 | val = struct.unpack('f'*EV_DIMENSION, blob.data) 45 | print(val) 46 | exit() 47 | return val 48 | 49 | def open_db_conn(self): 50 | print("Will prepare db connection") 51 | opts = pyrocksdb.Options() 52 | # for multi-thread 53 | opts.IncreaseParallelism() 54 | opts.OptimizeLevelStyleCompaction() 55 | self.db_conn = pyrocksdb.DB() 56 | status = self.db_conn.open(opts, os.path.join(ROCKSDB_DB_DIR, "ev-table-all.db")) 57 | assert(status.ok()) 58 | print("All db connections are ready!") 59 | 60 | def close_db_conn(self): 61 | print("Closing rocksdb connections") 62 | self.db_conn.close() 63 | 64 | def get_nrows_pertable(self, file_path): 65 | _, _, _, ln_emb, _ = evstore_utils.read_training_config(file_path) 66 | return ln_emb 67 | 68 | def load(self, ev_dir, bit_precision = 32): 69 | # delete the db dir if exists 70 | if os.path.exists(ROCKSDB_DB_DIR) and os.path.isdir(ROCKSDB_DB_DIR): 71 | shutil.rmtree(ROCKSDB_DB_DIR) 72 | # recreate the dir to hold new rocksdb data 73 | Path(os.path.join(ROCKSDB_DB_DIR)).mkdir(parents=True, exist_ok=True) 74 | 75 | print("**************** Loading EV Table to ROCKSDB") 76 | print("**************** Load new set of EV Table from = " + ev_dir) 77 | 78 | assert(bit_precision%4 == 0) 79 | ln_emb = self.get_nrows_pertable(storage_manager.training_config_path) 80 | 81 | BYTE_PRECISION = int(bit_precision/8) 82 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION 83 | 84 | db = pyrocksdb.DB() 85 | opts = pyrocksdb.Options() 86 | # for multi-thread 87 | opts.IncreaseParallelism() 88 | opts.OptimizeLevelStyleCompaction() 89 | opts.create_if_missing = True 90 | db_filename = "ev-table-all.db" 91 | db_filename = os.path.join(ROCKSDB_DB_DIR, db_filename) 92 | #print(db_filename) 93 | s = db.open(opts, db_filename) 94 | assert(s.ok()) 95 | 96 | # Storing binary ev-tables to rocksDB 97 | for ev_idx in range(0, TOTAL_EV_TABLE): 98 | bin_filename = "ev-table-" + str(ev_idx + 1) + ".bin" 99 | 100 | # RocksDB loads the BINARY EV-Tables! 
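            # Keys take the form "<tableId>-<rowId>", so all 26 EV tables share this
            # single ev-table-all.db instance; storage_rocksdb_26_tabs.py instead opens
            # one RocksDB per table and keys each row by its rowId alone.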
101 | bin_ev_path = os.path.join(ev_dir, BINARY_DIR_NAME, bin_filename) 102 | print("************* Loading EV = " + bin_ev_path) 103 | 104 | # put 105 | with open(bin_ev_path, 'rb') as f: 106 | data = f.read() 107 | num_of_indexes = len(data) // TOTAL_BYTE_PER_ROW 108 | 109 | # Verify that the number of unique values per table is the same as what the DLRM model expect 110 | assert(ln_emb[ev_idx] == num_of_indexes) 111 | 112 | opts = pyrocksdb.WriteOptions() 113 | #for nrow in tqdm(range(0, num_of_indexes)): 114 | for i in range(0, num_of_indexes): 115 | # put 116 | byte_offset = BYTE_PRECISION * i * EV_DIMENSION # 36 -> dimension 117 | v = data[ byte_offset : byte_offset + TOTAL_BYTE_PER_ROW] 118 | k = str(ev_idx+1) + "-" + str(i) 119 | db.put(opts, k, v) 120 | print(" === db-path: " + db_filename) 121 | f.close() 122 | print("**************** All EvTable loaded in the RocksDB!") 123 | db.close() 124 | 125 | def __init__(self): 126 | self.db_conn = None 127 | self.read_opts = pyrocksdb.ReadOptions() 128 | 129 | -------------------------------------------------------------------------------- /emb_storage/storage_rocksdb_26_tabs.py: -------------------------------------------------------------------------------- 1 | import pyrocksdb 2 | import time 3 | import os 4 | import pandas as pd 5 | import argparse 6 | import struct 7 | import numpy as np 8 | from array import * 9 | from tqdm import tqdm 10 | import struct 11 | import torch 12 | import shutil 13 | from pathlib import Path 14 | import sys 15 | sys.path.append('../../') 16 | 17 | import evstore_utils 18 | import storage_manager 19 | 20 | ROCKSDB_DB_PATH = "/mnt/extra/db-ev-storage/rocksdb/" 21 | BINARY_DIR_NAME = "binary/" 22 | TOTAL_EV_TABLE = 26 23 | EV_DIMENSION = 36 24 | 25 | class RocksDBClient: 26 | 27 | # will read the BINARY values from the rocksdb 28 | def get(self, tableId, rowId): 29 | opts = pyrocksdb.ReadOptions() 30 | # assert(tableId >= 1) 31 | # assert(tableId <= TOTAL_EV_TABLE) 32 | # tableId started at 1, but the db connection started at id 0 33 | blob = self.arr_db_conn[tableId - 1].get(self.read_opts, str(rowId)) 34 | # convert to float list 35 | return struct.unpack('f'*36, blob.data[0:144]) 36 | 37 | def open_db_conn(self): 38 | print("Will prepare db connection") 39 | opts = pyrocksdb.Options() 40 | # for multi-thread 41 | opts.IncreaseParallelism() 42 | opts.OptimizeLevelStyleCompaction() 43 | for i in range(TOTAL_EV_TABLE): 44 | db_conn = pyrocksdb.DB() 45 | status = db_conn.open(opts, os.path.join(ROCKSDB_DB_PATH, "ev-table-" + str(i+1) + ".db")) 46 | assert(status.ok()) 47 | self.arr_db_conn.append(db_conn) 48 | print("All db connections are ready!") 49 | 50 | def close_db_conn(self): 51 | print("Closing rocksdb connections") 52 | for db_conn in self.arr_db_conn: 53 | db_conn.close() 54 | 55 | def get_nrows_pertable(self, file_path): 56 | _, _, _, ln_emb, _ = evstore_utils.read_training_config(file_path) 57 | return ln_emb 58 | 59 | def load(self, ev_dir, bit_precision = 32): 60 | # delete the db dir if exists 61 | if os.path.exists(ROCKSDB_DB_PATH) and os.path.isdir(ROCKSDB_DB_PATH): 62 | shutil.rmtree(ROCKSDB_DB_PATH) 63 | # recreate the dir to hold new rocksdb data 64 | Path(os.path.join(ROCKSDB_DB_PATH)).mkdir(parents=True, exist_ok=True) 65 | 66 | print("**************** Loading EV Table to ROCKSDB") 67 | print("**************** Load new set of EV Table from = " + ev_dir) 68 | 69 | assert(bit_precision%4 == 0) 70 | ln_emb = self.get_nrows_pertable(storage_manager.training_config_path) 71 | 72 | 
BYTE_PRECISION = int(bit_precision/8) 73 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION 74 | # Storing binary ev-tables to rocksDB 75 | for ev_idx in range(0, TOTAL_EV_TABLE): 76 | binFilename = "ev-table-" + str(ev_idx + 1) + ".bin" 77 | 78 | # RocksDB loads the BINARY EV-Tables! 79 | bin_ev_path = os.path.join(ev_dir, BINARY_DIR_NAME, binFilename) 80 | print("************* Loading EV = " + bin_ev_path) 81 | 82 | db = pyrocksdb.DB() 83 | opts = pyrocksdb.Options() 84 | # for multi-thread 85 | opts.IncreaseParallelism() 86 | opts.OptimizeLevelStyleCompaction() 87 | opts.create_if_missing = True 88 | dbFilename = "ev-table-" + str(ev_idx + 1) + ".db" 89 | dbFilename = os.path.join(ROCKSDB_DB_PATH, dbFilename) 90 | #print(dbFilename) 91 | s = db.open(opts, dbFilename) 92 | assert(s.ok()) 93 | # put 94 | with open(bin_ev_path, 'rb') as f: 95 | data = f.read() 96 | num_of_indexes = len(data) // TOTAL_BYTE_PER_ROW 97 | 98 | # Verify that the number of unique values per table is the same as what the DLRM model expect 99 | assert(ln_emb[ev_idx] == num_of_indexes) 100 | 101 | opts = pyrocksdb.WriteOptions() 102 | #for nrow in tqdm(range(0, num_of_indexes)): 103 | for i in range(0, num_of_indexes): 104 | # put 105 | byte_offset = BYTE_PRECISION * i * EV_DIMENSION # 36 -> dimension 106 | v = data[ byte_offset : byte_offset + TOTAL_BYTE_PER_ROW] 107 | k = str(i) 108 | db.put(opts, k, v) 109 | db.close() 110 | print(" === db-path: " + dbFilename) 111 | f.close() 112 | print("**************** All EvTable loaded in the RocksDB!") 113 | 114 | def __init__(self): 115 | self.arr_db_conn = [] 116 | self.read_opts = pyrocksdb.ReadOptions() 117 | 118 | -------------------------------------------------------------------------------- /emb_storage/storage_sqlite.py: -------------------------------------------------------------------------------- 1 | import sqlite3 2 | import time 3 | import os 4 | import pandas as pd 5 | import argparse 6 | import struct 7 | import numpy as np 8 | from array import * 9 | from tqdm import tqdm 10 | import torch 11 | import shutil 12 | from pathlib import Path 13 | import sys 14 | sys.path.append('../../') 15 | 16 | import evstore_utils 17 | import storage_manager 18 | 19 | SQLITE_DB_DIR = "/mnt/extra/db-ev-storage/sqlite/" 20 | BINARY_DIR_NAME = "binary/" 21 | TOTAL_EV_TABLE = 26 22 | EV_DIMENSION = 36 23 | DB_NAME = "ev-table-all.db" 24 | 25 | class SQLiteClient: 26 | 27 | # will read the BINARY values from the SQLiteDB 28 | def get(self, tableId, rowId): 29 | # TableId start from index 1 30 | # assert(tableId >= 1) 31 | # assert(tableId <= TOTAL_EV_TABLE) 32 | # The row at SQLite is started from 1 instead of 0 33 | realRowId = rowId + 1 + self.db_add_up_tables[tableId-1] 34 | blob = self.db_cursor.execute("SELECT * FROM tab1 where rowid={};".format(realRowId)).fetchone() 35 | # print(tableId) 36 | # print(rowId) 37 | # print(blob) 38 | # assert(blob != None) 39 | return struct.unpack('f'*EV_DIMENSION, blob[0]) 40 | 41 | def get_nrows_pertable(self, file_path): 42 | _, _, _, ln_emb, _ = evstore_utils.read_training_config(file_path) 43 | return ln_emb 44 | 45 | def load(self, ev_dir, bit_precision = 32): 46 | # delete the db dir if exists 47 | if os.path.exists(SQLITE_DB_DIR) and os.path.isdir(SQLITE_DB_DIR): 48 | shutil.rmtree(SQLITE_DB_DIR) 49 | # recreate the dir to hold new sqlite data 50 | Path(os.path.join(SQLITE_DB_DIR)).mkdir(parents=True, exist_ok=True) 51 | db = sqlite3.connect(self.db_file_path) 52 | db_cursor = db.cursor() 53 | 54 | print("**************** 
Loading EV Table to SQLite") 55 | print("**************** Load new set of EV Table from = " + ev_dir) 56 | 57 | assert(bit_precision%4 == 0) 58 | ln_emb = self.get_nrows_pertable(storage_manager.training_config_path) 59 | 60 | BYTE_PRECISION = int(bit_precision/8) 61 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION 62 | table_name = "tab1" 63 | # Storing binary ev-tables to SQLite 64 | for ev_idx in range(0, TOTAL_EV_TABLE): 65 | bin_filename = "ev-table-" + str(ev_idx + 1) + ".bin" 66 | # table_name = "ev_table_" + str(ev_idx + 1) 67 | 68 | db_cursor.execute("CREATE TABLE if not exists " + table_name + " (b BLOB);") 69 | 70 | # SQLite loads the BINARY EV-Tables! 71 | bin_ev_path = os.path.join(ev_dir, BINARY_DIR_NAME, bin_filename) 72 | print("************* Loading EV = " + bin_ev_path) 73 | # put 74 | with open(bin_ev_path, 'rb') as f: 75 | data = f.read() 76 | num_of_indexes = len(data) // TOTAL_BYTE_PER_ROW 77 | 78 | # Verify that the number of unique values per table is the same as what the DLRM model expect 79 | assert(ln_emb[ev_idx] == num_of_indexes) 80 | 81 | bin_ev_path = "/home/cc/ev-tables-sqlite/bin_workload" 82 | 83 | #for nrow in tqdm(range(0, num_of_indexes)): 84 | for i in range(0, num_of_indexes): 85 | # put 86 | byte_offset = BYTE_PRECISION * i * EV_DIMENSION # 36 -> dimension 87 | v = data[ byte_offset : byte_offset + TOTAL_BYTE_PER_ROW] 88 | k = str(ev_idx+1) + "-" + str(i) 89 | db_cursor.execute("insert into " + table_name + " values(?)", (v, )) 90 | print(" === db-path: " + table_name) 91 | f.close() 92 | print("**************** All EvTable loaded in the SQLite!") 93 | db.commit() 94 | db.close() 95 | 96 | def open_db_conn(self): 97 | print("Will prepare db connection") 98 | self.db_conn = sqlite3.connect(self.db_file_path) 99 | self.db_cursor = self.db_conn.cursor() 100 | print("All db connections are ready!") 101 | 102 | def close_db_conn(self): 103 | print("Closing sqlite connections") 104 | self.db_conn.close() 105 | 106 | def __init__(self): 107 | self.db_conn = None 108 | self.db_cursor = None 109 | self.db_file_path = os.path.join(SQLITE_DB_DIR, DB_NAME) 110 | self.db_ln_tables = self.get_nrows_pertable(storage_manager.training_config_path) 111 | self.db_add_up_tables = [0 for _ in range(len(self.db_ln_tables))] 112 | for i in range(len(self.db_ln_tables)-1): 113 | self.db_add_up_tables[i+1] = self.db_add_up_tables[i] + self.db_ln_tables[i] 114 | -------------------------------------------------------------------------------- /emb_storage/storage_sqlite_26_tabs.py: -------------------------------------------------------------------------------- 1 | import sqlite3 2 | import time 3 | import os 4 | import pandas as pd 5 | import argparse 6 | import struct 7 | import numpy as np 8 | from array import * 9 | from tqdm import tqdm 10 | import torch 11 | import shutil 12 | from pathlib import Path 13 | import sys 14 | sys.path.append('../../') 15 | 16 | import evstore_utils 17 | import storage_manager 18 | 19 | SQLITE_DB_DIR = "/mnt/extra/db-ev-storage/sqlite/" 20 | BINARY_DIR_NAME = "binary/" 21 | TOTAL_EV_TABLE = 26 22 | EV_DIMENSION = 36 23 | DB_NAME = "ev-table-all.db" 24 | 25 | class SQLiteClient: 26 | 27 | # will read the BINARY values from the sqlite 28 | def get(self, tableId, rowId): 29 | # TableId start from index 1 30 | # assert(tableId >= 1) 31 | # assert(tableId <= TOTAL_EV_TABLE) 32 | # The row at SQLite is started from 1 instead of 0 33 | blob = self.db_cursor.execute("SELECT * FROM ev_table_{} where rowid={};".format(tableId, rowId + 
1)).fetchone() 34 | # print(tableId) 35 | # print(rowId) 36 | # print(blob) 37 | # assert(blob != None) 38 | return struct.unpack('f'*EV_DIMENSION, blob[0]) 39 | 40 | def get_nrows_pertable(self, file_path): 41 | _, _, _, ln_emb, _ = evstore_utils.read_training_config(file_path) 42 | return ln_emb 43 | 44 | def load(self, ev_dir, bit_precision = 32): 45 | # delete the db dir if exists 46 | if os.path.exists(SQLITE_DB_DIR) and os.path.isdir(SQLITE_DB_DIR): 47 | shutil.rmtree(SQLITE_DB_DIR) 48 | # recreate the dir to hold new sqlite data 49 | Path(os.path.join(SQLITE_DB_DIR)).mkdir(parents=True, exist_ok=True) 50 | db = sqlite3.connect(self.db_file_path) 51 | db_cursor = db.cursor() 52 | 53 | print("**************** Loading EV Table to SQLite") 54 | print("**************** Load new set of EV Table from = " + ev_dir) 55 | 56 | assert(bit_precision%4 == 0) 57 | ln_emb = self.get_nrows_pertable(storage_manager.training_config_path) 58 | 59 | BYTE_PRECISION = int(bit_precision/8) 60 | TOTAL_BYTE_PER_ROW = EV_DIMENSION * BYTE_PRECISION 61 | 62 | # Storing binary ev-tables to SQLite 63 | for ev_idx in range(0, TOTAL_EV_TABLE): 64 | bin_filename = "ev-table-" + str(ev_idx + 1) + ".bin" 65 | table_name = "ev_table_" + str(ev_idx + 1) 66 | 67 | db_cursor.execute("CREATE TABLE if not exists " + table_name + " (b BLOB);") 68 | 69 | # SQLite loads the BINARY EV-Tables! 70 | bin_ev_path = os.path.join(ev_dir, BINARY_DIR_NAME, bin_filename) 71 | print("************* Loading EV = " + bin_ev_path) 72 | # put 73 | with open(bin_ev_path, 'rb') as f: 74 | data = f.read() 75 | num_of_indexes = len(data) // TOTAL_BYTE_PER_ROW 76 | 77 | # Verify that the number of unique values per table is the same as what the DLRM model expect 78 | assert(ln_emb[ev_idx] == num_of_indexes) 79 | 80 | bin_ev_path = "/home/cc/ev-tables-sqlite/bin_workload" 81 | 82 | #for nrow in tqdm(range(0, num_of_indexes)): 83 | for i in range(0, num_of_indexes): 84 | # put 85 | byte_offset = BYTE_PRECISION * i * EV_DIMENSION # 36 -> dimension 86 | v = data[ byte_offset : byte_offset + TOTAL_BYTE_PER_ROW] 87 | k = str(ev_idx+1) + "-" + str(i) 88 | db_cursor.execute("insert into " + table_name + " values(?)", (v, )) 89 | print(" === db-path: " + table_name) 90 | f.close() 91 | print("**************** All EvTable loaded in the SQLite!") 92 | db.commit() 93 | db.close() 94 | 95 | def open_db_conn(self): 96 | print("Will prepare db connection") 97 | self.db_conn = sqlite3.connect(self.db_file_path) 98 | self.db_cursor = self.db_conn.cursor() 99 | print("All db connections are ready!") 100 | 101 | def close_db_conn(self): 102 | print("Closing sqlite connections") 103 | self.db_conn.close() 104 | 105 | def __init__(self): 106 | self.db_conn = None 107 | self.db_cursor = None 108 | self.db_file_path = os.path.join(SQLITE_DB_DIR, DB_NAME) 109 | -------------------------------------------------------------------------------- /evstore_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pandas as pd 3 | # import EvLFU 4 | import struct 5 | from pathlib import Path 6 | import ast 7 | import numpy as np 8 | import torch 9 | 10 | TRAINING_CONFIG_FILE = "training_config.txt" 11 | 12 | # Replacing the current embedding layer at the current model 13 | def load_new_ev_table (ld_model, ev_path): 14 | print("Load new set of EV Table from = " + ev_path) 15 | for ev_idx in range(0, 26): 16 | new_ev_path = os.path.join(ev_path, "ev-table-"+ str(ev_idx + 1) + ".csv") 17 | new_ev_df = 
pd.read_csv(new_ev_path, dtype=float, delimiter=',') 18 | # # Convert to numpy first before to tensor 19 | new_ev_arr = new_ev_df.to_numpy() 20 | # # Convert to tensor 21 | new_ev_tensor = torch.FloatTensor(new_ev_arr) 22 | 23 | print("Loading NEW EV per embedding layer = " + new_ev_path) 24 | 25 | # Create key since the entire model will be accessed 26 | key = str("emb_l."+str(ev_idx)+".weight") 27 | # Replace the current embedding tensor with the new one based on the key 28 | ld_model["state_dict"][key] = new_ev_tensor 29 | print("Done loading all EV-Table from " + ev_path) 30 | 31 | def store_training_config(file_path, table_feature_map, nbatches, nbatches_test, ln_emb, m_den): 32 | # store the config to a file to avoid redoing the computation during inference-only 33 | with open(file_path, 'w') as f: 34 | f.write('The order of the arguments: table_feature_map, nbatches, nbatches_test, ln_emb, m_den\n') 35 | f.write(str(table_feature_map)+"\n") 36 | f.write(str(nbatches)+"\n") 37 | f.write(str(nbatches_test)+"\n") 38 | f.write(str(ln_emb.tolist())+"\n") 39 | f.write(str(m_den)+"\n") 40 | print("Done writing training config to : " + file_path + "\n") 41 | 42 | def read_training_config(file_path): 43 | print("Read training config from : " + file_path) 44 | with open(file_path) as f: 45 | lines = [line.rstrip() for line in f] 46 | 47 | table_feature_map = ast.literal_eval(lines[1]) 48 | nbatches = int(lines[2]) 49 | nbatches_test = int(lines[3]) 50 | ln_emb = np.array(ast.literal_eval(lines[4])) 51 | m_den = int(lines[5]) 52 | return table_feature_map, nbatches, nbatches_test, ln_emb, m_den 53 | 54 | def prepare_inference_trace_folder(input_data_name, percent_data_for_inference): 55 | print("Create folder to store the model and ev-tables") 56 | outdir = os.path.join("logs", "inf-workload-traces", input_data_name, "inference=" + str(percent_data_for_inference)) 57 | Path(outdir).mkdir(parents=True, exist_ok=True) 58 | return outdir 59 | 60 | def write_inf_workload_to_file(workload_traces_outdir, arr_inference_workload): 61 | # Create + open 26 different files 62 | print("Total inference = " + str(len(arr_inference_workload))) 63 | arrfile = [] 64 | 65 | for idx in range(0,26): 66 | arrfile.append(open(workload_traces_outdir + "/workload-group-" + str(idx + 1) + ".csv",'w')) 67 | arrfile[idx].write("G" + str(idx + 1) + "_key\n") 68 | 69 | for grouped_keys in arr_inference_workload: 70 | id = 0 71 | for key in grouped_keys: 72 | arrfile[id].write(key + "\n") 73 | id += 1 -------------------------------------------------------------------------------- /input/.gitignore: -------------------------------------------------------------------------------- 1 | * 2 | #!compressed4git* 3 | !.gitignore 4 | !readme.txt 5 | #!*tar.gz.part* 6 | -------------------------------------------------------------------------------- /input/readme.txt: -------------------------------------------------------------------------------- 1 | ------ Display Advertising Challenge ------ 2 | 3 | Dataset: dac-v1 4 | 5 | This dataset contains feature values and click feedback for millions of display 6 | ads. Its purpose is to benchmark algorithms for clickthrough rate (CTR) prediction. 
7 | It has been used for the Display Advertising Challenge hosted by Kaggle: 8 | https://www.kaggle.com/c/criteo-display-ad-challenge/ 9 | 10 | =================================================== 11 | 12 | Full description: 13 | 14 | This dataset contains 2 files: 15 | train.txt 16 | test.txt 17 | corresponding to the training and test parts of the data. 18 | 19 | ==================================================== 20 | 21 | Dataset construction: 22 | 23 | The training dataset consists of a portion of Criteo's traffic over a period 24 | of 7 days. Each row corresponds to a display ad served by Criteo and the first 25 | column indicates whether this ad has been clicked or not. 26 | The positive (clicked) and negative (non-clicked) examples have both been 27 | subsampled (but at different rates) in order to reduce the dataset size. 28 | 29 | There are 13 features taking integer values (mostly count features) and 26 30 | categorical features. The values of the categorical features have been hashed 31 | onto 32 bits for anonymization purposes. 32 | The semantics of these features are undisclosed. Some features may have missing values. 33 | 34 | The rows are chronologically ordered. 35 | 36 | The test set is computed in the same way as the training set but it 37 | corresponds to events on the day following the training period. 38 | The first column (label) has been removed. 39 | 40 | ==================================================== 41 | 42 | Format: 43 | 44 | The columns are tab-separated with the following schema: 45 |
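As a rough illustration of the layout described above — one click label, 13 integer features, and 26 hashed categorical features per tab-separated row — the sketch below peeks at train.txt with pandas; the column names are placeholders, not the official schema.

```python
import pandas as pd

# Placeholder names: the readme specifies only the column layout, not names.
cols = ["label"] + [f"int_{i}" for i in range(1, 14)] + [f"cat_{i}" for i in range(1, 27)]

# Sample a few rows; missing values show up as NaN for both feature types.
df = pd.read_csv("input/train.txt", sep="\t", names=cols, nrows=1000)
print(df["label"].mean())        # observed click rate in the sample
print(df.isna().mean().head())   # fraction of missing values per column
```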