├── .gitignore
├── logo.png
├── requirements.txt
├── doc
│   ├── m_node_superpod.png
│   ├── mgpu_scalability.png
│   ├── one_node_perf_a100.png
│   ├── clue-gpt2-loss-n-acc.png
│   ├── profiler
│   │   ├── GPT3_8B_memory.png
│   │   └── GPT3_8B_4xV100_access.png
│   ├── yard_network_fabric.md
│   └── optimization_options.md
├── .code.yml
├── MANIFEST.in
├── .flake8
├── examples
│   ├── benchmark
│   │   ├── run_benchmark.sh
│   │   ├── run_a100_benchmark_example.sh
│   │   ├── run_a100_benchmark_large_model.sh
│   │   ├── run_a100_benchmark_small_model.sh
│   │   ├── is_run_this_file.py
│   │   ├── generate_res_table.py
│   │   └── process_logs.py
│   ├── optimizations
│   │   ├── __init__.py
│   │   ├── global_opt_flags.py
│   │   ├── test_tiling.py
│   │   └── ls_hf_transformer_encoder_layer.py
│   ├── README.md
│   ├── data_loader.py
│   ├── ps_config.py
│   ├── imdb_dataset.py
│   ├── moe
│   │   ├── moe_bert.py
│   │   └── huggingface_bert_moe.py
│   ├── train_simple_net.py
│   ├── huggingface_bert.py
│   ├── simple_net.py
│   └── run_transformers.sh
├── CHANGE_LOG.md
├── .pre-commit-config.yaml
├── LICENSE
├── __init__.py
├── patrickstar
│   ├── profiler
│   │   ├── __init__.py
│   │   └── profiler.py
│   ├── ops
│   │   ├── op_builder
│   │   │   ├── __init__.py
│   │   │   └── cpu_adam.py
│   │   ├── __init__.py
│   │   ├── csrc
│   │   │   └── includes
│   │   │       └── context.h
│   │   └── embedding.py
│   ├── fp16
│   │   └── __init__.py
│   ├── core
│   │   ├── memtracer
│   │   │   ├── __init__.py
│   │   │   ├── training_stage_mgr.py
│   │   │   └── metronome.py
│   │   ├── __init__.py
│   │   ├── comm.py
│   │   ├── const.py
│   │   ├── tensor_stub.py
│   │   ├── torch_profiler_hook.py
│   │   └── memory_cache.py
│   ├── manager
│   │   ├── __init__.py
│   │   ├── cuda_context.py
│   │   └── runtime_config.py
│   ├── __init__.py
│   ├── utils
│   │   ├── __init__.py
│   │   ├── singleton_meta.py
│   │   ├── helper.py
│   │   ├── model_size_calculator.py
│   │   ├── distributed.py
│   │   ├── memory.py
│   │   ├── logging.py
│   │   ├── memory_monitor.py
│   │   └── global_timer.py
│   └── runtime
│       └── __init__.py
├── tools
│   └── merge_checkpoint.py
├── unitest
│   ├── test_torch_scope.py
│   ├── test_optimizer_init.py
│   ├── test_embedding_ops.py
│   ├── test_utils.py
│   ├── test_memory_cache.py
│   ├── test_eviction_policy.py
│   ├── test_model_init.py
│   ├── test_chunk_list.py
│   ├── test_client.py
│   ├── test_chunk_data.py
│   └── common.py
└── setup.py
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *__pycache__*
2 | *DS_Store*
--------------------------------------------------------------------------------
/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/logo.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | torch
2 | pytest
3 | psutil
4 | ninja
5 | rich
6 | transformers
7 | scipy
--------------------------------------------------------------------------------
/doc/m_node_superpod.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/doc/m_node_superpod.png
--------------------------------------------------------------------------------
/doc/mgpu_scalability.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/doc/mgpu_scalability.png
--------------------------------------------------------------------------------
/doc/one_node_perf_a100.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/doc/one_node_perf_a100.png
--------------------------------------------------------------------------------
/doc/clue-gpt2-loss-n-acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/doc/clue-gpt2-loss-n-acc.png
--------------------------------------------------------------------------------
/doc/profiler/GPT3_8B_memory.png:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/doc/profiler/GPT3_8B_memory.png -------------------------------------------------------------------------------- /doc/profiler/GPT3_8B_4xV100_access.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/doc/profiler/GPT3_8B_4xV100_access.png -------------------------------------------------------------------------------- /.code.yml: -------------------------------------------------------------------------------- 1 | source: 2 | third_party_source: 3 | filepath_regex: [".*/patrickstar/ops/csrc/*/.*", 4 | ".*/patrickstar/ops/op_builder/.*"] 5 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include *.txt README.md 2 | recursive-include requirements *.txt 3 | recursive-include patrickstar *.cpp *.h *.cu *.tr *.cuh *.cc 4 | recursive-include csrc *.cpp *.h *.cu *.tr *.cuh *.cc 5 | -------------------------------------------------------------------------------- /.flake8: -------------------------------------------------------------------------------- 1 | [flake8] 2 | ignore = 3 | ;W503 line break before binary operator 4 | W503, 5 | ;E203 whitespace before ':' 6 | E203, 7 | 8 | ; exclude file 9 | exclude = 10 | .tox, 11 | .git, 12 | __pycache__, 13 | build, 14 | dist, 15 | *.pyc, 16 | *.egg-info, 17 | .cache, 18 | .eggs 19 | 20 | max-line-length = 120 21 | 22 | per-file-ignores = __init__.py:F401 23 | -------------------------------------------------------------------------------- /examples/benchmark/run_benchmark.sh: -------------------------------------------------------------------------------- 1 | mkdir -p ./logs 2 | 3 | export MODEL_NAME="" 4 | export BS=32 5 | export CS=64 6 | export CPU_EBD=1 7 | export 
SP=0 8 | export ACT_OFFLOAD=0 9 | export NO_RETRY=1 10 | export SKIP_LOG_EXSIT=1 11 | 12 | for MODEL_NAME in "GPT2small" 13 | do 14 | for BS in 32 15 | do 16 | for CS in 64 17 | do 18 | for CPU_EBD in 1 19 | do 20 | for SP in 0 21 | do 22 | for ACT_OFFLOAD in 0 1 23 | do 24 | echo "****************** Begin ***************************" 25 | echo "* benchmarking CS ${CS} BS ${BS} MODEL ${MODEL_NAME} " 26 | echo "* CPU_EBD ${CPU_EBD} SP ${SP} ACT_OFFLOAD ${ACT_OFFLOAD}" 27 | bash ../run_transformers.sh 28 | echo "****************** Finished ***************************" 29 | echo "" 30 | echo "" 31 | done 32 | done 33 | done 34 | done 35 | done 36 | done 37 | -------------------------------------------------------------------------------- /doc/yard_network_fabric.md: -------------------------------------------------------------------------------- 1 | ## Network Topology of a node of WeChat Yard 2 | ```nvidia-smi topo -m``` 3 | 4 | GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 5 | 6 | GPU0 X NV1 NV2 NV1 SYS SYS SYS NV2 7 | 8 | GPU1 NV1 X NV1 NV2 SYS SYS NV2 SYS 9 | 10 | GPU2 NV2 NV1 X NV2 SYS NV1 SYS SYS 11 | 12 | GPU3 NV1 NV2 NV2 X NV1 SYS SYS SYS 13 | 14 | GPU4 SYS SYS SYS NV1 X NV2 NV2 NV1 15 | 16 | GPU5 SYS SYS NV1 SYS NV2 X NV1 NV2 17 | 18 | GPU6 SYS NV2 SYS SYS NV2 NV1 X NV1 19 | 20 | GPU7 NV2 SYS SYS SYS NV1 NV2 NV1 X 21 | 22 | ```nvidia-smi nvlink --status -i 0``` 23 | 24 | GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-4b6ebbfe-8eac-8fed-1939-b4c545eafa7f) 25 | 26 | Link 0: 25.781 GB/s 27 | 28 | Link 1: 25.781 GB/s 29 | 30 | Link 2: 25.781 GB/s 31 | 32 | Link 3: 25.781 GB/s 33 | 34 | Link 4: 25.781 GB/s 35 | 36 | Link 5: 25.781 GB/s 37 | -------------------------------------------------------------------------------- /examples/benchmark/run_a100_benchmark_example.sh: -------------------------------------------------------------------------------- 1 | export MODEL_NAME="" 2 | export BS=12 3 | export CS=384 4 | export CPU_EBD=0 5 | export SP=0 6 | export ACT_OFFLOAD=0 7 | 
export NO_RETRY=0 8 | export SKIP_LOG_EXSIT=0 9 | export MSC=1 10 | export CACHE=1 11 | export GPU_NUM=8 12 | export MODEL_TYPE="BERT" 13 | 14 | 15 | for GPU_NUM in 8 16 | do 17 | for MODEL_NAME in "GPT_DS_40B" 18 | do 19 | for BS in 4 20 | do 21 | for CS in 288 22 | do 23 | for CPU_EBD in 0 24 | do 25 | for SP in 0 26 | do 27 | for ACT_OFFLOAD in 0 28 | do 29 | for MSC in 1 30 | do 31 | for CACHE in 1 32 | do 33 | echo "****************** Begin ***************************" 34 | echo "* benchmarking CS ${CS} BS ${BS} MODEL ${MODEL_NAME} " 35 | echo "* CPU_EBD ${CPU_EBD} SP ${SP} ACT_OFFLOAD ${ACT_OFFLOAD} MSC ${MSC} CACHE ${CACHE}" 36 | bash ../run_transformers.sh 37 | echo "****************** Finished ***************************" 38 | echo "" 39 | echo "" 40 | done 41 | done 42 | done 43 | done 44 | done 45 | done 46 | done 47 | done 48 | done 49 | -------------------------------------------------------------------------------- /examples/benchmark/run_a100_benchmark_large_model.sh: -------------------------------------------------------------------------------- 1 | export MODEL_NAME="" 2 | export BS=32 3 | export CS=64 4 | export CPU_EBD=1 5 | export SP=0 6 | export ACT_OFFLOAD=0 7 | export NO_RETRY=0 8 | export SKIP_LOG_EXSIT=1 9 | export MSC=1 10 | export CACHE=1 11 | export GPU_NUM=1 12 | 13 | 14 | for GPU_NUM in 1 2 4 8 15 | do 16 | for MODEL_NAME in "GPT_DS_20B" "GPT_DS_40B" 17 | do 18 | for BS in 8 4 16 19 | do 20 | for CS in 256 384 21 | do 22 | for CPU_EBD in 0 23 | do 24 | for SP in 0 25 | do 26 | for ACT_OFFLOAD in 0 27 | do 28 | for MSC in 1 29 | do 30 | for CACHE in 0 1 31 | do 32 | echo "****************** Begin ***************************" 33 | echo "* benchmarking CS ${CS} BS ${BS} MODEL ${MODEL_NAME} " 34 | echo "* CPU_EBD ${CPU_EBD} SP ${SP} ACT_OFFLOAD ${ACT_OFFLOAD} MSC ${MSC} CACHE ${CACHE}" 35 | bash ../run_transformers.sh 36 | echo "****************** Finished ***************************" 37 | echo "" 38 | echo "" 39 | done 40 | done 41 | 
done 42 | done 43 | done 44 | done 45 | done 46 | done 47 | done 48 | -------------------------------------------------------------------------------- /examples/benchmark/run_a100_benchmark_small_model.sh: -------------------------------------------------------------------------------- 1 | export MODEL_NAME="" 2 | export BS=32 3 | export CS=64 4 | export CPU_EBD=1 5 | export SP=0 6 | export ACT_OFFLOAD=0 7 | export NO_RETRY=0 8 | export SKIP_LOG_EXSIT=1 9 | export MSC=1 10 | export CACHE=1 11 | export GPU_NUM=1 12 | 13 | 14 | for GPU_NUM in 1 2 4 8 15 | do 16 | for MODEL_NAME in "GPT_DS_20B" "GPT_DS_40B" 17 | do 18 | for BS in 8 4 16 19 | do 20 | for CS in 256 384 21 | do 22 | for CPU_EBD in 0 23 | do 24 | for SP in 0 25 | do 26 | for ACT_OFFLOAD in 0 27 | do 28 | for MSC in 0 1 29 | do 30 | for CACHE in 0 1 31 | do 32 | echo "****************** Begin ***************************" 33 | echo "* benchmarking CS ${CS} BS ${BS} MODEL ${MODEL_NAME} " 34 | echo "* CPU_EBD ${CPU_EBD} SP ${SP} ACT_OFFLOAD ${ACT_OFFLOAD} MSC ${MSC} CACHE ${CACHE}" 35 | bash ../run_transformers.sh 36 | echo "****************** Finished ***************************" 37 | echo "" 38 | echo "" 39 | done 40 | done 41 | done 42 | done 43 | done 44 | done 45 | done 46 | done 47 | done 48 | -------------------------------------------------------------------------------- /CHANGE_LOG.md: -------------------------------------------------------------------------------- 1 | ### v0.4.5 Dec. 2021 2 | Refactor the files in examples and add chunk size searching. 3 | Evaluate on 8 nodes of SuperPod. Fix bugs in multi-GPU mem tracer. 4 | 5 | 6 | ### v0.4.4 Dec. 2021 7 | The system is successfully evaluated on a multi-node system. 8 | The benchmark scripts are integrated with memory-centric tiling borrowed from DeepSpeed. 9 | It trains an 18B model on WeChat Yard. 10 | 11 | 12 | ### v0.4.3 Nov. 2021 13 | The system is evaluated on A100 SuperPod.
14 | Some optimizations are developed to further improve model scale and efficiency, including memory-saving communication (MSC) and allocation cache (CACHE). 15 | A severe bug caused by async chunk copy using streams is identified and fixed. 16 | It trains a 50B model on an 8xA100 SuperPod node. 17 | 18 | 19 | ### v0.4.0 Nov. 2021 20 | The system is upgraded with a better memory tracer. 21 | We further improve the maximum model scale over v0.3.0 (15B vs. 12B) on the WeChat Yard Platform. 22 | 23 | ### v0.3.0 Oct. 2021 24 | Our initial version significantly surpasses DeepSpeed both in model scale and computing efficiency. 25 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | # See https://pre-commit.com for more information 2 | # See https://pre-commit.com/hooks.html for more hooks 3 | repos: 4 | - repo: https://github.com/pre-commit/pre-commit-hooks 5 | rev: v2.4.0 6 | hooks: 7 | - id: trailing-whitespace 8 | - id: end-of-file-fixer 9 | - id: check-added-large-files 10 | - repo: https://github.com/doublify/pre-commit-clang-format 11 | rev: master 12 | hooks: 13 | - id: clang-format 14 | files: \.(c|cc|cxx|cpp|frag|glsl|h|hpp|hxx|ih|ispc|ipp|java|js|m|mm|proto|vert|cu)$ 15 | - repo: https://github.com/ambv/black 16 | rev: stable 17 | hooks: 18 | - id: black 19 | - repo: https://github.com/pycqa/flake8 20 | rev: '' # pick a git hash / tag to point to 21 | hooks: 22 | - id: flake8 23 | - repo: https://github.com/Lucas-C/pre-commit-hooks 24 | rev: "v1.1.7" 25 | hooks: 26 | - id: forbid-crlf 27 | - id: remove-crlf 28 | - id: forbid-tabs 29 | - id: remove-tabs 30 | args: [ --whitespaces-count, "2" ] # defaults to: 4 31 | - id: insert-license 32 | files: \.(c|cc|cxx|cpp|frag|glsl|h|hpp|hxx|ih|ispc|ipp|java|js|m|mm|proto|vert|cu)$ 33 | args: 34 | - --license-filepath 35 | - LICENSE # defaults to: LICENSE.txt 36 | - --comment-style
37 | - // # defaults to: # 38 | - id: insert-license 39 | files: \.(py)$ 40 | args: 41 | - --license-filepath 42 | - LICENSE # defaults to: LICENSE.txt 43 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | 5 | Redistribution and use in source and binary forms, with or without modification, 6 | are permitted provided that the following conditions are met: 7 | 8 | * Redistributions of source code must retain the above copyright notice, this 9 | list of conditions and the following disclaimer. 10 | 11 | * Redistributions in binary form must reproduce the above copyright notice, 12 | this list of conditions and the following disclaimer in the documentation 13 | and/or other materials provided with the distribution. 14 | 15 | * Neither the name of the psutil authors nor the names of its contributors 16 | may be used to endorse or promote products derived from this software without 17 | specific prior written permission. 18 | 19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | -------------------------------------------------------------------------------- /examples/optimizations/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | -------------------------------------------------------------------------------- /patrickstar/profiler/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | 31 | from .profiler import profiler 32 | -------------------------------------------------------------------------------- /patrickstar/ops/op_builder/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .cpu_adam import CPUAdamBuilder 31 | -------------------------------------------------------------------------------- /patrickstar/fp16/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .loss_scaler import LossScaler, DynamicLossScaler 31 | -------------------------------------------------------------------------------- /examples/optimizations/global_opt_flags.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | USE_TILE = False 31 | USE_ACT_OFFLOAD = False 32 | -------------------------------------------------------------------------------- /patrickstar/ops/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .embedding import Embedding 31 | from .fp16_cpu_adam import FP16Adam 32 | -------------------------------------------------------------------------------- /patrickstar/core/memtracer/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .memtracer import RuntimeMemTracer 31 | from .metronome import Metronome 32 | -------------------------------------------------------------------------------- /patrickstar/manager/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .cuda_context import CUDAContext 31 | from .runtime_config import _runtime_config 32 | -------------------------------------------------------------------------------- /patrickstar/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .core import PatrickStarClient 31 | from .core.memtracer import RuntimeMemTracer 32 | from .ops import FP16Adam 33 | from .runtime import initialize_engine 34 | from .utils import global_timer 35 | from .utils import see_memory_usage 36 | from .utils.model_size_calculator import get_ps_model_size, estimate_bert_mac 37 | -------------------------------------------------------------------------------- /examples/README.md: -------------------------------------------------------------------------------- 1 | ## PatrickStar examples 2 | 3 | ### Use PatrickStar with HuggingFace 4 | 5 | `huggingface_bert.py` is a Hugging Face fine-tuning example with PatrickStar. You can compare it with the [official Hugging Face example](https://huggingface.co/transformers/custom_datasets.html#seq-imdb) to see how to apply PatrickStar to existing projects. 6 | 7 | Before running the example, you need to prepare the data: 8 | 9 | ```bash 10 | wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz 11 | tar -xf aclImdb_v1.tar.gz 12 | ``` 13 | 14 | Then change the directory used in `get_dataset()`. After that, you are ready to go: 15 | 16 | ```bash 17 | python huggingface_bert.py 18 | ``` 19 | 20 | ### Use PatrickStar to train large models 21 | 22 | `run_transformers.sh` and `pretrain_demo.py` are an example of training large PTMs with PatrickStar. You can run models of different sizes by adding configs to `run_transformers.sh`. 23 | 24 | The following command will run a model with 4B params: 25 | 26 | ```bash 27 | env MODEL_NAME=GPT2_4B RES_CHECK=0 DIST_PLAN="patrickstar" bash run_transformers.sh 28 | ``` 29 | 30 | For the available `MODEL_NAME` values, please check `pretrain_demo.py`. 31 | 32 | To check the accuracy of PatrickStar with BERT: 33 | 34 | ```bash 35 | env RES_CHECK=1 bash run_transformers.sh 36 | ``` 37 | 38 | ### MoE support 39 | 40 | PatrickStar also supports training MoE models.
In the `examples/moe` directory, run: 41 | 42 | ```bash 43 | python -m torch.distributed.launch --nproc_per_node=4 huggingface_bert_moe.py 44 | ``` 45 | 46 | Note that you need to install [FastMoE](https://github.com/laekov/fastmoe) before running this example. 47 | 48 | 49 | ### Search for the best chunk size 50 | 51 | Chunk size (CS) is an important hyperparameter for PatrickStar. 52 | Although you can set a CS value empirically by running your training task several times, we provide a systematic way to find a CS with a smaller memory footprint. Use the following command to search for the chunk size: 53 | 54 | ```bash 55 | env CS_SEARCH=1 bash run_transformers.sh 56 | ``` 57 | -------------------------------------------------------------------------------- /patrickstar/utils/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission.
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | from .distributed import get_world_size, get_rank, get_local_world_size 31 | from .helper import getsizeof, get_space_of 32 | from .logging import log_dist, logger, print_rank 33 | from .memory import get_memory_info 34 | from .memory_monitor import ( 35 | see_memory_usage, 36 | get_sys_memory_used, 37 | ) 38 | from .singleton_meta import SingletonMeta 39 | -------------------------------------------------------------------------------- /patrickstar/core/memtracer/training_stage_mgr.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 
10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | from patrickstar.core.const import TrainingStage 31 | 32 | 33 | class TrainingStageMgr: 34 | def __init__(self): 35 | """ 36 | Track which stage the training is in (FWD, BWD or ADAM), 37 | and whether we are in a warmup iteration. 38 | """ 39 | self.training_phase = TrainingStage.UNSTART 40 | self.is_warmup = False 41 | -------------------------------------------------------------------------------- /patrickstar/core/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved.
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .chunk_data import Chunk 31 | from .chunk_list import ChunkList 32 | from .chunk_tensor_index import ChunkTensorIndex 33 | from .client import PatrickStarClient 34 | from .const import AccessType, ChunkState, TensorState, TrainingStage, ChunkType 35 | from .hook import setup_patrickstar_hooks 36 | from .parameter import PSParameter, register_param, is_param_registered, ParamType 37 | from .preprocess import PSPreProcessCtx, torch_scope 38 | -------------------------------------------------------------------------------- /tools/merge_checkpoint.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import fire 31 | import torch 32 | 33 | from patrickstar.utils import logger 34 | 35 | 36 | def merge_checkpoint(pattern, num): 37 | merged_state_dict = {} 38 | for i in range(num): 39 | filename = pattern.replace("*", f"{i}") 40 | merged_state_dict.update(torch.load(filename)) 41 | 42 | merged_filename = pattern.replace("*", "merged") 43 | logger.warning(f"Merged checkpoint will be saved to {merged_filename}") 44 | torch.save(merged_state_dict, merged_filename) 45 | 46 | 47 | if __name__ == "__main__": 48 | fire.Fire(merge_checkpoint) 49 | -------------------------------------------------------------------------------- /patrickstar/utils/singleton_meta.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 
14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | 31 | class SingletonMeta(type): 32 | """ 33 | The Singleton class can be implemented in different ways in Python. Some 34 | possible methods include: base class, decorator, metaclass. We will use the 35 | metaclass because it is best suited for this purpose. 36 | """ 37 | 38 | _instances = {} 39 | 40 | def __call__(cls, *args, **kwargs): 41 | """ 42 | Possible changes to the value of the `__init__` argument do not affect 43 | the returned instance. 44 | """ 45 | if cls not in cls._instances: 46 | instance = super().__call__(*args, **kwargs) 47 | cls._instances[cls] = instance 48 | return cls._instances[cls] 49 | -------------------------------------------------------------------------------- /patrickstar/manager/cuda_context.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
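The `SingletonMeta` metaclass above guarantees one instance per class by intercepting `type.__call__`. A minimal self-contained sketch of how it behaves in practice (the `Config` class here is a hypothetical example, not part of the repo):

```python
class SingletonMeta(type):
    """Metaclass that caches one instance per class, mirroring
    patrickstar.utils.singleton_meta.SingletonMeta."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Only the first call constructs an instance; later calls
        # (and their arguments) are ignored and the cache is returned.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Config(metaclass=SingletonMeta):
    def __init__(self, value=0):
        self.value = value


a = Config(value=1)
b = Config(value=2)  # __init__ arguments have no effect here
assert a is b
assert a.value == 1  # still the value from the first construction
```

This is why `CUDAContext` and `RuntimeConfig` below can be instantiated anywhere in the codebase and still refer to the same shared state.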
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from patrickstar.utils import SingletonMeta 31 | from patrickstar.utils import logger, get_world_size 32 | import torch 33 | 34 | 35 | class CUDAContext(metaclass=SingletonMeta): 36 | def __init__(self): 37 | self.compute_stream = torch.cuda.current_stream() 38 | if get_world_size() == 1: 39 | self.copy_stream = torch.cuda.Stream() 40 | else: 41 | # TODO(zilinzhu) The async copy mechanism has some 42 | # weird numeric bugs in multi-process settings. 43 | # Keep it disabled until that is fixed. 44 | logger.warning( 45 | "Asynchronous copy will not be enabled for world sizes larger than 1" 46 | ) 47 | self.copy_stream = self.compute_stream 48 | -------------------------------------------------------------------------------- /examples/benchmark/is_run_this_file.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission.
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import argparse 31 | from process_logs import is_run_this_file 32 | 33 | 34 | def add_args(parser): 35 | group = parser.add_argument_group(title="patrickstar") 36 | group.add_argument( 37 | "--file", 38 | type=str, 39 | help="file name.", 40 | ) 41 | group.add_argument( 42 | "--path", 43 | type=str, 44 | help="path name.", 45 | ) 46 | return parser 47 | 48 | 49 | if __name__ == "__main__": 50 | parser = argparse.ArgumentParser(description="PatrickStar Arguments") 51 | parser = add_args(parser) 52 | args = parser.parse_args() 53 | IS_RUN = is_run_this_file(args.path, args.file, {}, {}) 54 | 55 | if IS_RUN: 56 | print(1) 57 | else: 58 | print(0) 59 | -------------------------------------------------------------------------------- /examples/optimizations/test_tiling.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import torch 31 | import pytest 32 | import copy 33 | from tiling import TiledLinear 34 | 35 | 36 | @pytest.mark.parametrize("in_splits,out_splits", [(1, 1), (2, 2)]) 37 | @pytest.mark.parametrize("in_f,out_f", [(32, 32), (23, 29), (29, 23)]) 38 | def test_tiled_forward(in_splits, out_splits, in_f, out_f): 39 | base = torch.nn.Linear(in_f, out_f) 40 | test = TiledLinear( 41 | in_f, 42 | out_f, 43 | bias=True, 44 | init_linear=copy.deepcopy(base), 45 | out_splits=out_splits, 46 | in_splits=in_splits, 47 | ) 48 | 49 | inp = torch.rand(in_f) 50 | 51 | base_out = base(copy.deepcopy(inp)) 52 | test_out = test(copy.deepcopy(inp)) 53 | 54 | assert torch.allclose(base_out, test_out, rtol=1e-4) 55 | -------------------------------------------------------------------------------- /unitest/test_torch_scope.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import unittest 31 | 32 | import torch 33 | 34 | from common import distributed_test 35 | from patrickstar.core import PatrickStarClient, PSPreProcessCtx, torch_scope, ParamType 36 | 37 | 38 | class TestTorchScopeContext(unittest.TestCase): 39 | def setUp(self): 40 | pass 41 | 42 | @distributed_test(world_size=[1]) 43 | def test_torch_scope(self): 44 | def model_provider(): 45 | with torch_scope(): 46 | return torch.nn.Linear(5, 10) 47 | 48 | default_chunk_size = 1 * 1024 * 1024 49 | client = PatrickStarClient(0, default_chunk_size) 50 | 51 | with PSPreProcessCtx(client, dtype=torch.float): 52 | ps_model = model_provider() 53 | 54 | assert ps_model.weight.ps_attr.param_type == ParamType.TORCH_BASED 55 | 56 | 57 | if __name__ == "__main__": 58 | unittest.main() 59 | -------------------------------------------------------------------------------- /patrickstar/manager/runtime_config.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | from copy import deepcopy 30 | 31 | from patrickstar.utils import SingletonMeta 32 | 33 | 34 | class RuntimeConfig(metaclass=SingletonMeta): 35 | def __init__(self): 36 | self.config = { 37 | "use_chunk": True, 38 | # Whether the torch based tensors will do allreduce, 39 | # this is strongly related to `torch_scope` 40 | "do_allreduce": True, 41 | } 42 | self.old_configs = [] 43 | 44 | @property 45 | def use_chunk(self): 46 | return self.config["use_chunk"] 47 | 48 | @property 49 | def do_allreduce(self): 50 | return self.config["do_allreduce"] 51 | 52 | def push(self): 53 | self.old_configs.append(self.config) 54 | self.config = deepcopy(self.config) 55 | 56 | def pop(self): 57 | self.config = self.old_configs.pop() 58 | 59 | 60 | _runtime_config = RuntimeConfig() 61 | -------------------------------------------------------------------------------- /patrickstar/utils/helper.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
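The `push`/`pop` pair in `RuntimeConfig` above implements scoped configuration overrides: `push` saves the current config and switches to a deep copy that can be mutated freely, and `pop` restores the saved state. A minimal standalone sketch of this pattern (a simplified re-implementation for illustration, without the `SingletonMeta` machinery):

```python
from copy import deepcopy


class RuntimeConfig:
    """Simplified sketch of the push/pop config scoping shown above."""

    def __init__(self):
        self.config = {"use_chunk": True, "do_allreduce": True}
        self.old_configs = []

    def push(self):
        # Save the current config and continue editing a deep copy,
        # so mutations inside the scope cannot leak into the saved state.
        self.old_configs.append(self.config)
        self.config = deepcopy(self.config)

    def pop(self):
        # Restore the most recently saved config.
        self.config = self.old_configs.pop()


rc = RuntimeConfig()
rc.push()
rc.config["do_allreduce"] = False  # temporary override inside the scope
assert rc.config["do_allreduce"] is False
rc.pop()
assert rc.config["do_allreduce"] is True  # original value restored
```

This is the same save/restore discipline `torch_scope` relies on when it temporarily disables chunk-based management for a block of model construction.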
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import torch 31 | from patrickstar.core.const import ParamType, AccessType 32 | 33 | 34 | def get_real_data_tensor(param): 35 | if param.ps_attr.param_type == ParamType.TORCH_BASED: 36 | return param.data 37 | elif param.ps_attr.param_type == ParamType.CHUNK_BASED: 38 | return param.ps_attr.access_tensor(AccessType.DATA) 39 | else: 40 | raise RuntimeError 41 | 42 | 43 | def getsizeof(data_type: torch.dtype): 44 | if data_type == torch.float: 45 | return 4 46 | elif data_type == torch.half: 47 | return 2 48 | elif data_type == torch.int8: 49 | return 1 50 | elif data_type == torch.int16: 51 | return 2 52 | elif data_type == torch.int32: 53 | return 4 54 | elif data_type == torch.int64: 55 | return 8 56 | else: 57 | raise TypeError(f"getsizeof does not support data type {data_type}") 58 | 59 | 60 | def get_space_of(tensor): 61 | return tensor.numel() * getsizeof(tensor.dtype) 62 | -------------------------------------------------------------------------------- /examples/data_loader.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a
Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
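As a quick illustration of the byte-accounting helpers above (`getsizeof` / `get_space_of`), here is a hedged, framework-free sketch — the string-keyed table and the names `DTYPE_NBYTES`, `sizeof_dtype`, and `space_of` are stand-ins for illustration; the real code dispatches on `torch.dtype` objects:

```python
# Per-dtype element widths, mirroring getsizeof() above.
# Keys are dtype names instead of torch.dtype objects so the
# sketch runs without torch installed.
DTYPE_NBYTES = {
    "float32": 4,  # torch.float
    "float16": 2,  # torch.half
    "int8": 1,
    "int16": 2,
    "int32": 4,
    "int64": 8,
}


def sizeof_dtype(name):
    try:
        return DTYPE_NBYTES[name]
    except KeyError:
        raise TypeError(f"sizeof_dtype does not support data type {name}")


def space_of(numel, dtype_name):
    # Mirrors get_space_of(): element count times element width.
    return numel * sizeof_dtype(dtype_name)


# A (1024, 1024) fp16 tensor occupies 2 MiB.
print(space_of(1024 * 1024, "float16"))  # 2097152
```

This is the arithmetic PatrickStar uses to budget chunk memory: bytes are always element count times element width, with unsupported dtypes rejected early.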
29 | 30 | import torch 31 | from torch.utils.data import SequentialSampler 32 | 33 | 34 | def get_bert_data_loader( 35 | batch_size, 36 | total_samples, 37 | sequence_length, 38 | device, 39 | data_type=torch.float, 40 | is_distributed=False, 41 | ): 42 | train_data = torch.randint( 43 | low=0, 44 | high=1000, 45 | size=(total_samples, sequence_length), 46 | device=device, 47 | dtype=torch.long, 48 | ) 49 | train_label = torch.randint( 50 | low=0, high=2, size=(total_samples,), device=device, dtype=torch.long 51 | ) 52 | train_dataset = torch.utils.data.TensorDataset(train_data, train_label) 53 | if is_distributed: 54 | sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 55 | else: 56 | sampler = SequentialSampler(train_dataset) 57 | train_loader = torch.utils.data.DataLoader( 58 | train_dataset, batch_size=batch_size, sampler=sampler 59 | ) 60 | return train_loader 61 | -------------------------------------------------------------------------------- /patrickstar/core/comm.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission.
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | from patrickstar.utils import get_world_size 31 | 32 | 33 | class CommGroupInfo(object): 34 | def __init__(self, chunk_type, id): 35 | self.chunk_type = chunk_type 36 | self.id = id 37 | 38 | def __hash__(self): 39 | return hash((self.chunk_type, self.id)) 40 | 41 | def __eq__(self, other): 42 | return (self.chunk_type, self.id) == (other.chunk_type, other.id) 43 | 44 | def __str__(self): 45 | return f"({self.chunk_type}, {self.id})" 46 | 47 | 48 | class CommInfo(object): 49 | def __init__(self, chunk_type, group_id, offset): 50 | assert offset < get_world_size() 51 | self.group = CommGroupInfo(chunk_type=chunk_type, id=group_id) 52 | self.offset = offset 53 | 54 | @property 55 | def chunk_type(self): 56 | return self.group.chunk_type 57 | 58 | @property 59 | def group_id(self): 60 | return self.group.id 61 | 62 | def __str__(self): 63 | return f"({self.group}, {self.offset})" 64 | -------------------------------------------------------------------------------- /patrickstar/utils/model_size_calculator.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 
2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from patrickstar.core.parameter import is_param_registered 31 | 32 | 33 | def get_ps_model_size(model): 34 | numel = 0 35 | param_cnt = 0 36 | for _, param in model.named_parameters(recurse=True): 37 | if is_param_registered(param): 38 | numel += param.ps_attr.numel 39 | else: 40 | numel += param.numel() 41 | param_cnt += 1 42 | return numel, param_cnt 43 | 44 | 45 | def estimate_bert_mac(config, batch_size, sequence_length, model_size): 46 | nvidia_total_macs = ( 47 | 96 48 | * batch_size 49 | * sequence_length 50 | * config.num_hidden_layers 51 | * config.hidden_size ** 2 52 | * ( 53 | 1 54 | + sequence_length / (6 * config.hidden_size) 55 | + config.vocab_size / (16 * config.num_hidden_layers * config.hidden_size) 56 | ) 57 | ) 58 | 59 | tera_flops = model_size * batch_size * sequence_length * 2 * 4 60 | return tera_flops, nvidia_total_macs 61 | -------------------------------------------------------------------------------- /unitest/test_optimizer_init.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import unittest 31 | 32 | import torch 33 | from transformers import BertModel, BertConfig 34 | 35 | from common import distributed_test 36 | from patrickstar.core import PSPreProcessCtx 37 | from patrickstar.core import PatrickStarClient 38 | from patrickstar.ops import FP16Adam 39 | 40 | 41 | class TestOptimizerInitContext(unittest.TestCase): 42 | def setUp(self): 43 | pass 44 | 45 | @distributed_test(world_size=[1]) 46 | def test_optimizer_init(self): 47 | def model_provider(): 48 | cfg = BertConfig() 49 | cfg.vocab_size = 10 50 | model = BertModel(cfg) 51 | return model 52 | 53 | default_chunk_size = 32 * 1024 * 1024 54 | client = PatrickStarClient(0, default_chunk_size) 55 | 56 | torch.manual_seed(0) 57 | with PSPreProcessCtx(client, dtype=torch.float): 58 | ps_model = model_provider() 59 | 60 | FP16Adam(client, ps_model.parameters()) 61 | 62 | 63 | if __name__ == "__main__": 64 | unittest.main() 65 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL 
A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from setuptools import setup, find_packages 31 | from torch.utils.cpp_extension import BuildExtension 32 | from patrickstar.ops.op_builder import CPUAdamBuilder 33 | 34 | 35 | def fetch_requirements(path): 36 | with open(path, "r") as fd: 37 | return [r.strip() for r in fd.readlines()] 38 | 39 | 40 | require_list = fetch_requirements("requirements.txt") 41 | 42 | setup( 43 | name="patrickstar", 44 | version="0.4.6", 45 | description="PatrickStar library", 46 | long_description="PatrickStar: Parallel Training of Large Language Models via a Chunk-based Parameter Server", 47 | long_description_content_type="text/markdown", 48 | author="Tencent PatrickStar Team", 49 | author_email="fangjiarui123@gmail.com", 50 | url="https://fangjiarui.github.io/", 51 | install_requires=require_list, 52 | setup_requires=require_list, 53 | packages=find_packages(), 54 | include_package_data=True, 55 | classifiers=[ 56 | "Programming Language :: Python :: 3.6", 57 | "Programming Language :: Python :: 3.7", 58 | "Programming Language :: Python :: 3.8", 59 | ], 60 | license="BSD", 61 | ext_modules=[CPUAdamBuilder().builder()], 62 | cmdclass={"build_ext": BuildExtension.with_options(use_ninja=False)}, 63 | ) 64 | -------------------------------------------------------------------------------- /patrickstar/core/const.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer.
10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | from enum import Enum 31 | 32 | 33 | class AccessType(Enum): 34 | DATA = 1 35 | GRAD = 2 36 | 37 | 38 | class ChunkState(Enum): 39 | r"""Chunk state during training.""" 40 | FREE = 0 41 | # Chunk memory is allocated. 42 | # Tensors are used for computing. 43 | COMPUTE = 1 44 | # Holding meaningful data. 45 | HOLD = 2 46 | HOLD_AFTER_FWD = 3 47 | HOLD_AFTER_BWD = 4 48 | 49 | # Chunk memory is not allocated. 50 | RELEASED = 5 51 | 52 | 53 | class TensorState(Enum): 54 | r"""Tensor state during training 55 | 56 | Notice that this is the state of the tensor in the chunk, 57 | while `ChunkState` is the state of the whole chunk. 58 | """ 59 | # Can be released.
60 | FREE = 0 61 | # In computation, cannot be moved. 62 | COMPUTE = 1 63 | # Can be moved, cannot be released. 64 | HOLD = 2 65 | HOLD_AFTER_FWD = 3 66 | HOLD_AFTER_BWD = 4 67 | 68 | 69 | class TrainingStage(Enum): 70 | UNSTART = 0 71 | FWD = 1 72 | BWD = 2 73 | ADAM = 3 74 | 75 | 76 | class ChunkType(Enum): 77 | PARAM_FP16 = 0 78 | PARAM_FP32 = 1 79 | MOMENTUM = 2 80 | VARIANCE = 3 81 | UNDEF = 4 82 | 83 | 84 | class ParamType(Enum): 85 | CHUNK_BASED = 0 86 | TORCH_BASED = 1 87 | -------------------------------------------------------------------------------- /patrickstar/utils/distributed.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | import os 30 | import torch 31 | 32 | from .logging import logger 33 | 34 | 35 | def get_rank(): 36 | if torch.distributed.is_initialized(): 37 | return torch.distributed.get_rank() 38 | return 0 39 | 40 | 41 | def get_world_size(): 42 | if torch.distributed.is_initialized(): 43 | return torch.distributed.get_world_size() 44 | return 1 45 | 46 | 47 | # Use a global variable to prevent changing of the environment variable 48 | # and to make sure the warning is only logged once. 49 | _local_world_size = None 50 | 51 | 52 | def get_local_world_size(): 53 | global _local_world_size 54 | if _local_world_size is None: 55 | if torch.distributed.is_initialized(): 56 | if "LOCAL_WORLD_SIZE" in os.environ: 57 | _local_world_size = int(os.environ["LOCAL_WORLD_SIZE"]) 58 | else: 59 | logger.warning( 60 | "If you are training with multiple nodes, it's recommended to " 61 | "set LOCAL_WORLD_SIZE manually to make better use of CPU memory. " 62 | "Otherwise, get_world_size() is used instead." 63 | ) 64 | _local_world_size = get_world_size() 65 | else: 66 | _local_world_size = 1 67 | return _local_world_size 68 | -------------------------------------------------------------------------------- /unitest/test_embedding_ops.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved.
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
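The single-process fallbacks in `patrickstar/utils/distributed.py` above follow one pattern: consult the process group when it is initialized, otherwise return a sane default, and cache the per-node answer so the environment variable is read only once. A hedged, torch-free sketch of that pattern — the `_initialized` flag here is a stand-in for `torch.distributed.is_initialized()`, for illustration only:

```python
import os

# Stand-in for torch.distributed.is_initialized(); an assumption
# made so this sketch runs without torch or a process group.
_initialized = False

_local_world_size = None  # cached so the env var is read at most once


def get_world_size():
    # Without an initialized process group, act as a single process.
    return 1 if not _initialized else int(os.environ.get("WORLD_SIZE", 1))


def get_local_world_size():
    global _local_world_size
    if _local_world_size is None:
        if _initialized:
            # Prefer the per-node value that launchers such as torchrun export.
            _local_world_size = int(
                os.environ.get("LOCAL_WORLD_SIZE", get_world_size())
            )
        else:
            _local_world_size = 1
    return _local_world_size


print(get_world_size())        # 1
print(get_local_world_size())  # 1
```

Caching in a module-level global is what keeps the "set LOCAL_WORLD_SIZE manually" warning from being logged on every call.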
29 | 30 | import unittest 31 | 32 | import torch 33 | from torch.nn import Embedding as TorchEmbedding 34 | from transformers import BertConfig 35 | 36 | from common import distributed_test 37 | from patrickstar.ops import Embedding as PSEmbedding 38 | 39 | 40 | class TestClientAccess(unittest.TestCase): 41 | def setUp(self): 42 | pass 43 | 44 | @distributed_test(world_size=[1]) 45 | def test_embedding(self): 46 | cfg = BertConfig() 47 | cfg.hidden_dropout_prob = 0 48 | test_device = torch.device("cuda:0") 49 | seq_len = 10 50 | torch.manual_seed(0) 51 | input_ids = torch.randint( 52 | low=0, 53 | high=cfg.vocab_size - 1, 54 | size=(1, seq_len), 55 | dtype=torch.long, 56 | device=test_device, 57 | ) 58 | 59 | torch.manual_seed(0) 60 | torch_embedding = TorchEmbedding(cfg.vocab_size, 64) 61 | torch.manual_seed(0) 62 | PSEmbedding.use_cpu = True 63 | ps_embedding = PSEmbedding(cfg.vocab_size, 64) 64 | 65 | res = ps_embedding(input_ids) 66 | torch_res = torch_embedding.to(test_device)(input_ids) 67 | 68 | # Compare absolute differences; a signed max would miss large negative errors. 69 | self.assertLess(torch.max(torch.abs(torch_res.cpu() - res.cpu())), 1e-2) 70 | 71 | 72 | if __name__ == "__main__": 73 | unittest.main() 74 | -------------------------------------------------------------------------------- /patrickstar/core/tensor_stub.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution.
14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import torch 31 | 32 | from patrickstar.core.const import AccessType, ParamType 33 | 34 | 35 | class TensorInfo(object): 36 | r"""The info related to certain tensor.""" 37 | 38 | def __init__( 39 | self, 40 | chunk_id: int, 41 | tensor_id: int, 42 | start_offset: int, 43 | numel: int, 44 | param: torch.nn.Parameter, 45 | access_type: AccessType, 46 | param_name="", 47 | ): 48 | self.tensor_id = tensor_id 49 | self.chunk_id = chunk_id 50 | self.start_offset = start_offset 51 | self.numel = numel 52 | self.param = param 53 | self.tensor_name = ( 54 | f"{param_name}.data" 55 | if (access_type == AccessType.DATA) 56 | else f"{param_name}.grad" 57 | ) 58 | self.access_type = access_type 59 | 60 | def __str__(self): 61 | return ( 62 | f"tensor_id: {self.tensor_id}, name: {self.tensor_name}, " 63 | f"shape: {self.param.shape}, chunk_id: {self.chunk_id}, " 64 | f"start_offset: {self.start_offset}, numel: {self.numel}, state: {self.state()}" 65 | ) 66 | 67 | def state(self): 68 | if self.param.ps_attr.param_type == ParamType.TORCH_BASED: 69 | return None 70 | else: 71 | return self.param.ps_attr.get_state(self.access_type) 72 | -------------------------------------------------------------------------------- /patrickstar/utils/memory.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 
14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | from collections import namedtuple 31 | 32 | import psutil 33 | 34 | 35 | ps_mem_info = namedtuple("ps_mem_info", ["total", "free", "cached", "buffers", "used"]) 36 | 37 | 38 | def get_memory_info(): 39 | try: 40 | # psutil reads the memory info from /proc/meminfo, 41 | # which results in returning the host memory instead of 42 | # that of the container. 43 | # Here we try to read the container memory with the method in: 44 | # https://stackoverflow.com/a/46213331/5163915 45 | # TODO(zilinzhu) Make this robust on most OS.
46 | mems = {} 47 | with open("/sys/fs/cgroup/memory/memory.meminfo", "rb") as f: 48 | for line in f: 49 | fields = line.split() 50 | mems[fields[0]] = int(fields[1]) * 1024 51 | total = mems[b"MemTotal:"] 52 | free = mems[b"MemFree:"] 53 | cached = mems[b"Cached:"] 54 | buffers = mems[b"Buffers:"] 55 | used = total - free - cached - buffers 56 | if used < 0: 57 | used = total - free 58 | mem_info = ps_mem_info( 59 | total=total, free=free, cached=cached, buffers=buffers, used=used 60 | ) 61 | except FileNotFoundError: 62 | mems = psutil.virtual_memory() 63 | mem_info = ps_mem_info( 64 | total=mems.total, 65 | free=mems.free, 66 | cached=mems.cached, 67 | buffers=mems.buffers, 68 | used=mems.used, 69 | ) 70 | return mem_info 71 | -------------------------------------------------------------------------------- /unitest/test_utils.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import unittest 31 | 32 | import torch 33 | from patrickstar.utils.memory_monitor import get_sys_memory_used 34 | from patrickstar.core.memtracer.memtracer import AsyncMemoryMonitor 35 | 36 | 37 | class TestAsynMemoryMonitor(unittest.TestCase): 38 | def setUp(self): 39 | pass 40 | 41 | def helper_func(self): 42 | dev = torch.device("cuda:0") 43 | m = 400 44 | n = 500 45 | k = 600 46 | a = torch.randn(m, k, device=torch.device("cuda:0")) 47 | b = torch.randn(k, n, device=torch.device("cuda:0")) 48 | c = torch.randn(m, n, device=torch.device("cuda:0")) 49 | print(f"mem usage before matmul: {get_sys_memory_used(dev)}") 50 | start_mem = get_sys_memory_used(dev) 51 | for i in range(10): 52 | c += torch.matmul(a, b) 53 | print(f"mem usage after matmul: {get_sys_memory_used(dev)}") 54 | finish_mem = get_sys_memory_used(dev) 55 | return max(start_mem, finish_mem) 56 | 57 | def test_async_mem_monitor(self): 58 | mem_monitor = AsyncMemoryMonitor() 59 | mem_monitor.start() 60 | max_mem_coarse = self.helper_func() 61 | max_mem_fine = mem_monitor.finish() 62 | self.assertTrue(max_mem_fine >= max_mem_coarse) 63 | # max_mem fine 3760640, corse 2960384 64 
| # indicates the operator will generate significant temporary buffers. 65 | print(f"max_mem fine {max_mem_fine}, coarse {max_mem_coarse}") 66 | 67 | 68 | if __name__ == "__main__": 69 | unittest.main() 70 | -------------------------------------------------------------------------------- /patrickstar/ops/op_builder/cpu_adam.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | """ 31 | Copyright 2020 The Microsoft DeepSpeed Team 32 | """ 33 | import os 34 | import sys 35 | 36 | from .builder import CUDAOpBuilder 37 | 38 | 39 | class CPUAdamBuilder(CUDAOpBuilder): 40 | BUILD_VAR = "DS_BUILD_CPU_ADAM" 41 | NAME = "cpu_adam" 42 | BASE_DIR = "patrickstar/ops/csrc" 43 | 44 | def __init__(self): 45 | super().__init__(name=self.NAME) 46 | 47 | def is_compatible(self): 48 | # Disable on Windows. 49 | return sys.platform != "win32" 50 | 51 | def absolute_name(self): 52 | return f"patrickstar.ops.adam.{self.NAME}_op" 53 | 54 | def sources(self): 55 | return [ 56 | os.path.join(CPUAdamBuilder.BASE_DIR, "adam/cpu_adam.cpp"), 57 | ] 58 | 59 | def include_paths(self): 60 | import torch 61 | 62 | cuda_include = os.path.join(torch.utils.cpp_extension.CUDA_HOME, "include") 63 | return [os.path.join(CPUAdamBuilder.BASE_DIR, "includes"), cuda_include] 64 | 65 | def cxx_args(self): 66 | import torch 67 | 68 | cuda_lib64 = os.path.join(torch.utils.cpp_extension.CUDA_HOME, "lib64") 69 | cpu_arch = self.cpu_arch() 70 | simd_width = self.simd_width() 71 | 72 | return [ 73 | "-O3", 74 | "-std=c++14", 75 | f"-L{cuda_lib64}", 76 | "-lcudart", 77 | "-lcublas", 78 | "-g", 79 | "-Wno-reorder", 80 | cpu_arch, 81 | "-fopenmp", 82 | simd_width, 83 | ] 84 | -------------------------------------------------------------------------------- /unitest/test_memory_cache.py: 
-------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import unittest 31 | 32 | import torch 33 | 34 | from patrickstar.core.memory_cache import MemoryCache 35 | from patrickstar.core.memtracer import RuntimeMemTracer 36 | 37 | 38 | class TestMemoryCache(unittest.TestCase): 39 | def setUp(self): 40 | self.default_chunk_size = 40 41 | 42 | def test_case1(self): 43 | self.compute_device = ( 44 | torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu") 45 | ) 46 | memtracer = RuntimeMemTracer() 47 | memory_cache = MemoryCache(2, memtracer) 48 | 49 | payload1 = memory_cache.pop_or_allocate( 50 | self.compute_device, 10, torch.float, False 51 | ) 52 | payload1_addr = payload1.data_ptr() 53 | memory_cache.push(payload1) 54 | payload2 = memory_cache.pop_or_allocate( 55 | self.compute_device, 10, torch.float, False 56 | ) 57 | self.assertTrue(payload1_addr == payload2.data_ptr()) 58 | 59 | payload3 = memory_cache.pop_or_allocate( 60 | self.compute_device, 10, torch.float, False 61 | ) 62 | self.assertTrue(payload1_addr != payload3.data_ptr()) 63 | print("payload3 ", payload3.data_ptr()) 64 | 65 | payload2_addr = payload2.data_ptr() 66 | memory_cache.push(payload2) 67 | memory_cache.push(payload3) 68 | 69 | payload4 = memory_cache.pop_or_allocate( 70 | self.compute_device, 71 | 10, 72 | torch.float, 73 | False, 74 | ) 75 | self.assertTrue(payload2_addr == payload4.data_ptr()) 76 | 77 | 78 | if __name__ == "__main__": 79 | unittest.main() 80 | -------------------------------------------------------------------------------- /unitest/test_eviction_policy.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import unittest 31 | 32 | import torch 33 | from patrickstar.core.eviction_policy import LatestAccessChunkEvictionPolicy 34 | from patrickstar.core.chunk_data import Chunk 35 | from patrickstar.core.memtracer import RuntimeMemTracer 36 | 37 | 38 | class TestEvictionPolicy(unittest.TestCase): 39 | def setUp(self): 40 | pass 41 | 42 | def test_chunk_eviction(self): 43 | id_to_chunk_list = {} 44 | dev = torch.device("cpu:0") 45 | mem_tracer = RuntimeMemTracer( 46 | local_rank=0, config={"use_async_mem_monitor": True} 47 | ) 48 | id_to_chunk_list[0] = Chunk(10, torch.float, 0, mem_tracer, None, 0, False) 49 | id_to_chunk_list[0].allocate_payload(dev) 50 | id_to_chunk_list[1] = Chunk(10, torch.float, 1, mem_tracer, None, 0, False) 51 | id_to_chunk_list[1].allocate_payload(dev) 52 | metronome = mem_tracer.metronome 53 | metronome.set_warmup(True) 54 | policy = LatestAccessChunkEvictionPolicy(metronome) 55 | 56 | # trace chunk access 57 | policy.trace_access(0, dev) 58 | metronome.tiktac() 59 | policy.trace_access(1, dev) 60 | print(policy.chunk_access_dict) 61 | 62 | # Finish warmup 63 | metronome.set_warmup(False) 64 | metronome.reset() 65 | 66 | # Test eviction strategy 67 | ret_list = policy.derive_eviction_list(id_to_chunk_list, 10, dev) 68 | self.assertTrue(ret_list == [0]) 69 | 70 | metronome.tiktac() 71 | ret_list = policy.derive_eviction_list(id_to_chunk_list, 10, dev) 72 | self.assertTrue(ret_list == [1]) 73 | 74 | 75 | if __name__ == "__main__": 76 | unittest.main() 77 | -------------------------------------------------------------------------------- /examples/ps_config.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | 31 | def get_patrickstar_config( 32 | args, lr=0.001, betas=(0.9, 0.999), eps=1e-6, weight_decay=0 33 | ): 34 | config = { 35 | # The same format as optimizer config of DeepSpeed 36 | # https://www.deepspeed.ai/docs/config-json/#optimizer-parameters 37 | "optimizer": { 38 | "type": "Adam", 39 | "params": { 40 | "lr": lr, 41 | "betas": betas, 42 | "eps": eps, 43 | "weight_decay": weight_decay, 44 | "use_hybrid_adam": args.use_hybrid_adam, 45 | }, 46 | }, 47 | "fp16": { 48 | "enabled": True, 49 | # Set "loss_scale" to 0 to use DynamicLossScaler. 50 | "loss_scale": 0, 51 | "initial_scale_power": args.init_loss_scale_power, 52 | "loss_scale_window": 1000, 53 | "hysteresis": 2, 54 | "min_loss_scale": 1, 55 | }, 56 | "default_chunk_size": args.default_chunk_size, 57 | "release_after_init": args.release_after_init, 58 | "use_fake_dist": args.use_fake_dist, 59 | "use_cpu_embedding": args.use_cpu_embedding, 60 | "client": { 61 | "mem_tracer": { 62 | "use_async_mem_monitor": args.with_async_mem_monitor, 63 | "warmup_gpu_chunk_mem_ratio": 0.1, 64 | "overall_gpu_mem_ratio": 0.9, 65 | "overall_cpu_mem_ratio": 0.9, 66 | "margin_use_ratio": 0.8, 67 | "use_fake_dist": False, 68 | "with_static_partition": args.with_static_partition, 69 | }, 70 | "opts": { 71 | "with_mem_saving_comm": args.with_mem_saving_comm, 72 | "with_mem_cache": args.with_mem_cache, 73 | "with_async_move": args.with_async_move, 74 | }, 75 | }, 76 | } 77 | 78 | return config 79 | -------------------------------------------------------------------------------- /patrickstar/core/memtracer/metronome.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | from .training_stage_mgr import TrainingStageMgr 31 | 32 | 33 | class Metronome(object): 34 | """ 35 | A metronome for memory stats sampling. 36 | It uses two indicators to tell us where the training is now. 37 | One is the moment, which indicates the current moment within one iteration. 38 | The other is the training stage, which indicates FWD/BWD/ADAM and whether this 39 | iteration is a warmup iteration.
40 | 41 | It also contains the training stage information. 42 | """ 43 | 44 | def __init__(self): 45 | self._moment = 0 46 | self._total_moment = None 47 | self.training_stage_mgr = TrainingStageMgr() 48 | 49 | def set_training_phase(self, phase): 50 | self.training_stage_mgr.training_phase = phase 51 | 52 | def set_warmup(self, flag): 53 | self.training_stage_mgr.is_warmup = flag 54 | 55 | def is_warmup(self): 56 | return self.training_stage_mgr.is_warmup 57 | 58 | def training_stage(self): 59 | return self.training_stage_mgr.training_phase 60 | 61 | def get_total_mom(self): 62 | assert self._total_moment is not None, "Do not use get_total_mom during warmup" 63 | return self._total_moment 64 | 65 | def tiktac(self): 66 | """ 67 | The function should be called right before and after the computation of an operator. 68 | """ 69 | self._moment += 1 70 | 71 | def moment(self): 72 | return self._moment 73 | 74 | def reset(self): 75 | """ 76 | The function is called after a training iteration is finished. 77 | """ 78 | self._total_moment = self._moment 79 | self._moment = 0 80 | 81 | def next_moment(self): 82 | assert self._total_moment is not None 83 | return min(self._total_moment, self._moment + 1) % self._total_moment 84 | 85 | def prev_moment(self): 86 | assert self._total_moment is not None 87 | return max(0, self._moment - 1) % self._total_moment 88 | -------------------------------------------------------------------------------- /unitest/test_model_init.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer.
10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import unittest 31 | 32 | import torch 33 | from transformers import BertModel, BertConfig 34 | 35 | from common import distributed_test 36 | from patrickstar.core import PatrickStarClient, ParamType 37 | from patrickstar.core.preprocess import PSPreProcessCtx 38 | 39 | 40 | class TestModelInitContext(unittest.TestCase): 41 | def setUp(self): 42 | pass 43 | 44 | @distributed_test(world_size=[2], backend="gloo", use_fake_dist=True) 45 | def test_model_init(self): 46 | def model_provider(): 47 | cfg = BertConfig() 48 | cfg.vocab_size = 10 49 | model = BertModel(cfg) 50 | return model 51 | 52 | compute_device = torch.device("cpu:0") 53 | default_chunk_size = 32 * 1024 * 1024 54 | client = PatrickStarClient(0, default_chunk_size) 55 | 56 | torch.manual_seed(0) 57 | with PSPreProcessCtx(client, dtype=torch.float, release_after_init=True): 58 | ps_model = model_provider() 59 | 60 | torch.manual_seed(0) 61 | torch_model = model_provider() 62 | 63 | for ps_param, torch_param in zip( 64 | ps_model.parameters(), torch_model.parameters() 65 | ): 66 | if ps_param.ps_attr.param_type == ParamType.TORCH_BASED: 67 | self.assertLess( 68 | torch.max(torch_param.data - ps_param), 69 | 1e-4, 70 | "PyTorch tensors are not consistent with each other", 71 | ) 72 | else: 73 | ps_data = client.access_data(ps_param, compute_device) 74 | if ps_param.ps_attr.is_local(): 75 | self.assertLess( 76 | torch.max(torch_param.data - ps_data), 77 | 1e-4, 78 | f"{ps_param.ps_attr.name} ps tensor and pytorch tensor are not consistent with each other", 79 | ) 80 | client.release_data(ps_param) 81 | 82 | 83 | if __name__ == "__main__": 84 | 85 | unittest.main() 86 | -------------------------------------------------------------------------------- /examples/imdb_dataset.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved.
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import os 31 | from pathlib import Path 32 | 33 | import torch 34 | from sklearn.model_selection import train_test_split 35 | from transformers import BertTokenizerFast 36 | 37 | 38 | # wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz 39 | # tar -xf aclImdb_v1.tar.gz 40 | def get_dataset(data_path): 41 | def read_imdb_split(split_dir): 42 | split_dir = Path(split_dir) 43 | texts = [] 44 | labels = [] 45 | for label_dir in ["pos", "neg"]: 46 | for text_file in (split_dir / label_dir).iterdir(): 47 | texts.append(text_file.read_text()) 48 | labels.append(0 if label_dir == "neg" else 1) 49 | 50 | return texts, labels 51 | 52 | train_texts, train_labels = read_imdb_split(os.path.join(data_path, "train")) 53 | test_texts, test_labels = read_imdb_split(os.path.join(data_path, "test")) 54 | train_texts, val_texts, train_labels, val_labels = train_test_split( 55 | train_texts, train_labels, test_size=0.2 56 | ) 57 | 58 | tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased") 59 | 60 | train_encodings = tokenizer(train_texts, truncation=True, padding=True) 61 | val_encodings = tokenizer(val_texts, truncation=True, padding=True) 62 | test_encodings = tokenizer(test_texts, truncation=True, padding=True) 63 | 64 | class IMDbDataset(torch.utils.data.Dataset): 65 | def __init__(self, encodings, labels): 66 | self.encodings = encodings 67 | self.labels = labels 68 | 69 | def __getitem__(self, idx): 70 | item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()} 71 | item["labels"] = torch.tensor(self.labels[idx]) 72 | return item 73 | 74 | def __len__(self): 75 | return len(self.labels) 76 | 77 | train_dataset = IMDbDataset(train_encodings, train_labels) 78 | val_dataset = IMDbDataset(val_encodings, val_labels) 79 | test_dataset = IMDbDataset(test_encodings, test_labels) 80 | 81 | return train_dataset, val_dataset, test_dataset 82 | -------------------------------------------------------------------------------- 
/examples/moe/moe_bert.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | from transformers import BertLayer, BertForSequenceClassification 30 | from transformers.models.bert.modeling_bert import BertAttention 31 | 32 | from patrickstar.core import torch_scope 33 | from patrickstar.utils import logger, get_world_size 34 | 35 | try: 36 | import fmoe 37 | except ImportError: 38 | logger.error("Please install FastMoE to use MoE with PatrickStar.") 39 | 40 | 41 | def __init__(self, config): 42 | super(BertLayer, self).__init__() 43 | self.chunk_size_feed_forward = config.chunk_size_feed_forward 44 | self.seq_len_dim = 1 45 | self.attention = BertAttention(config) 46 | self.is_decoder = config.is_decoder 47 | self.add_cross_attention = config.add_cross_attention 48 | if self.add_cross_attention: 49 | assert ( 50 | self.is_decoder 51 | ), f"{self} should be used as a decoder model if cross attention is added" 52 | self.crossattention = BertAttention(config) 53 | # The MoE modules are mainly model parallel, so we need to use `torch_scope` 54 | # to separate them from the other chunk-based data parallel modules. 55 | # Also, MoE modules take care of their own communication, which is why 56 | # we need to disable allreduce in the torch scope. 57 | with torch_scope(do_allreduce=False): 58 | self.output = fmoe.FMoETransformerMLP( 59 | num_expert=2, 60 | world_size=get_world_size(), 61 | d_model=config.hidden_size, 62 | d_hidden=config.intermediate_size, 63 | gate=fmoe.gates.NaiveGate, 64 | ) 65 | 66 | 67 | def feed_forward_chunk(self, attention_output): 68 | layer_output = self.output(attention_output) 69 | return layer_output 70 | 71 | 72 | def build_moe_bert(): 73 | # Normally you should write your own Model and create the MoE parts 74 | # in it. Here we directly substitute the original huggingface Bert model 75 | # for simplicity.
76 | BertLayer.__init__ = __init__ 77 | BertLayer.feed_forward_chunk = feed_forward_chunk 78 | model = BertForSequenceClassification.from_pretrained("bert-base-uncased") 79 | return model 80 | -------------------------------------------------------------------------------- /patrickstar/utils/logging.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | import logging 30 | import sys 31 | from rich.logging import RichHandler 32 | 33 | import torch.distributed as dist 34 | 35 | 36 | class LoggerFactory: 37 | @staticmethod 38 | def create_logger(name=None, level=logging.WARNING): 39 | """create a logger 40 | Args: 41 | name (str): name of the logger 42 | level: level of logger 43 | Raises: 44 | ValueError if name is None 45 | """ 46 | 47 | if name is None: 48 | raise ValueError("name for logger cannot be None") 49 | 50 | # formatter = logging.Formatter( 51 | # "[%(asctime)s] [%(levelname)s] " 52 | # "[%(filename)s:%(lineno)d:%(funcName)s] %(message)s") 53 | 54 | formatter = logging.Formatter("[%(asctime)s] [%(levelname)s] %(message)s") 55 | 56 | logger_ = logging.getLogger(name) 57 | logger_.setLevel(level) 58 | logger_.propagate = False 59 | ch = logging.StreamHandler(stream=sys.stdout) 60 | ch.setFormatter(formatter) 61 | logger_.addHandler(RichHandler()) 62 | return logger_ 63 | 64 | 65 | logger = LoggerFactory.create_logger(name="PatrickStar", level=logging.WARNING) 66 | 67 | 68 | def log_dist(message, ranks=[0], level=logging.INFO): 69 | """Log message when one of the following conditions is met: 70 | + not dist.is_initialized() 71 | + dist.get_rank() in ranks, or ranks == [-1] 72 | Args: 73 | message (str) 74 | ranks (list) 75 | level (int) 76 | """ 77 | should_log = not dist.is_initialized() 78 | ranks = ranks or [] 79 | my_rank = dist.get_rank()
if dist.is_initialized() else -1 80 | if ranks and not should_log: 81 | should_log = ranks[0] == -1 82 | should_log = should_log or (my_rank in set(ranks)) 83 | if should_log: 84 | final_message = "[Rank {}] {}".format(my_rank, message) 85 | logger.log(level, final_message) 86 | 87 | 88 | def print_rank(message, rank=0, debug=False, force=False): 89 | if (not dist.is_initialized() or dist.get_rank() == rank) and (debug or force): 90 | logger.info(message) 91 | -------------------------------------------------------------------------------- /examples/benchmark/generate_res_table.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import os 31 | import collections 32 | from process_logs import collect_info_from_dir 33 | 34 | if __name__ == "__main__": 35 | overall_res_dict = {} 36 | overall_file_dict = {} 37 | for file in os.listdir("./"): 38 | if os.path.isdir(file) and "logs" in file: 39 | res_dict, file_dict = collect_info_from_dir(file) 40 | overall_res_dict.update(res_dict) 41 | overall_file_dict.update(file_dict) 42 | 43 | detail_res_table = {} 44 | best_res_table = {} 45 | pos = {1: 0, 2: 2, 4: 4, 8: 6} 46 | 47 | for k, v in overall_res_dict.items(): 48 | plan = k.split("_") 49 | model_scale = plan[0] 50 | bs = plan[1] 51 | gpu_num = int(plan[2]) 52 | key = (model_scale, bs) 53 | if key not in detail_res_table: 54 | detail_res_table[key] = [None for i in range(8)] 55 | 56 | filename = overall_file_dict[k] 57 | detail_res_table[key][pos[gpu_num]] = v * gpu_num 58 | detail_res_table[key][pos[gpu_num] + 1] = filename 59 | 60 | if model_scale not in best_res_table: 61 | best_res_table[model_scale] = [0 for i in range(8)] 62 | if v * gpu_num > best_res_table[model_scale][pos[gpu_num]]: 63 | best_res_table[model_scale][pos[gpu_num]] = v * gpu_num 64 | best_res_table[model_scale][pos[gpu_num] + 1] = bs # filename 65 | 66 | od = collections.OrderedDict(sorted(detail_res_table.items())) 67 | with open("benchmark_res.csv", "w") as wfh: 68 | for k, v in od.items(): 69 | for item in k: 70 | wfh.write(str(item)) 71 | wfh.write(",") 72 | for 
item in v: 73 | wfh.write(str(item)) 74 | wfh.write(",") 75 | wfh.write("\n") 76 | 77 | od = collections.OrderedDict(sorted(best_res_table.items())) 78 | 79 | with open("best_res.csv", "w") as wfh: 80 | for k, v in od.items(): 81 | wfh.write(str(k)) 82 | wfh.write(",") 83 | for item in v: 84 | wfh.write(str(item)) 85 | wfh.write(",") 86 | wfh.write("\n") 87 | -------------------------------------------------------------------------------- /patrickstar/profiler/profiler.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import pickle 31 | import time 32 | 33 | from patrickstar.utils import SingletonMeta 34 | 35 | 36 | class Profiler(metaclass=SingletonMeta): 37 | def __init__(self): 38 | self._nested_level = 0 39 | self.start_time = None 40 | self.warmup_finish_time = None 41 | self.end_time = None 42 | # memory info 43 | # [(moment, time, memory)] 44 | self.gpu_memory_used = [] 45 | self.gpu_chunk_memory_used = [] 46 | self.cpu_memory_used = [] 47 | self.cpu_chunk_memory_used = [] 48 | # stage info 49 | # [(time, stage_converted)] 50 | self.stage_convert_time = [] 51 | # chunk info 52 | # {chunk_id: 53 | # "type": type, 54 | # "life_cycle": [(time, type, to_device)]} 55 | self.chunk_life_cycle = {} 56 | 57 | def start(self): 58 | if self.start_time is None: 59 | self.start_time = time.time() 60 | self._nested_level += 1 61 | 62 | def end(self): 63 | self._nested_level = max(0, self._nested_level - 1) 64 | if self._nested_level == 0: 65 | self.end_time = time.time() 66 | 67 | def started(self): 68 | return self._nested_level > 0 69 | 70 | def warmup_finish(self): 71 | if self.warmup_finish_time is None: 72 | self.warmup_finish_time = time.time() 73 | 74 | def state_dict(self): 75 | return { 76 | "start_time": self.start_time, 77 | "end_time": self.end_time if self.end_time is not None else time.time(), 78 | "warmup_finish_time": self.warmup_finish_time, 79 | "gpu_memory_used": self.gpu_memory_used, 80 | 
"gpu_chunk_memory_used": self.gpu_chunk_memory_used,
81 |             "cpu_memory_used": self.cpu_memory_used,
82 |             "cpu_chunk_memory_used": self.cpu_chunk_memory_used,
83 |             "stage_convert_time": self.stage_convert_time,
84 |             "chunk_life_cycle": self.chunk_life_cycle,
85 |         }
86 | 
87 |     def save(self, filename):
88 |         with open(filename, "wb") as f:
89 |             pickle.dump(self.state_dict(), f)
90 | 
91 | 
92 | profiler = Profiler()
93 | 
-------------------------------------------------------------------------------- /patrickstar/ops/csrc/includes/context.h: --------------------------------------------------------------------------------
1 | #pragma once
2 | 
3 | #include <ATen/cuda/CUDAContext.h>
4 | #include <cuda_runtime_api.h>
5 | #include <cassert>
6 | #include <iostream>
7 | #include <vector>
8 | #include "cublas_v2.h"
9 | #include "cuda.h"
10 | #include "curand.h"
11 | 
12 | #define WARP_SIZE 32
13 | 
14 | #define CUDA_CHECK(callstr)                                                                    \
15 |     {                                                                                          \
16 |         cudaError_t error_code = callstr;                                                      \
17 |         if (error_code != cudaSuccess) {                                                       \
18 |             std::cerr << "CUDA error " << error_code << " at " << __FILE__ << ":" << __LINE__; \
19 |             assert(0);                                                                         \
20 |         }                                                                                      \
21 |     }
22 | 
23 | #define CUDA_1D_KERNEL_LOOP(i, n) \
24 |     for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); i += blockDim.x * gridDim.x)
25 | 
26 | #define CUDA_2D_KERNEL_LOOP(i, n, j, m)                                                        \
27 |     for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); i += blockDim.x * gridDim.x) \
28 |         for (size_t j = blockIdx.y * blockDim.y + threadIdx.y; j < (m); j += blockDim.y * gridDim.y)
29 | 
30 | #define DS_CUDA_NUM_THREADS 512
31 | #define DS_MAXIMUM_NUM_BLOCKS 262144
32 | 
33 | inline int DS_GET_BLOCKS(const int N)
34 | {
35 |     return (std::max)(
36 |         (std::min)((N + DS_CUDA_NUM_THREADS - 1) / DS_CUDA_NUM_THREADS, DS_MAXIMUM_NUM_BLOCKS),
37 |         // Use at least 1 block, since CUDA does not allow empty block
38 |         1);
39 | }
40 | 
41 | class Context {
42 | public:
43 |     Context() : _workspace(nullptr), _seed(42), _curr_offset(0)
44 |     {
45 |         curandCreateGenerator(&_gen, CURAND_RNG_PSEUDO_DEFAULT);
46 | 
curandSetPseudoRandomGeneratorSeed(_gen, 123);
47 |         if (cublasCreate(&_cublasHandle) != CUBLAS_STATUS_SUCCESS) {
48 |             auto message = std::string("Fail to create cublas handle.");
49 |             std::cerr << message << std::endl;
50 |             throw std::runtime_error(message);
51 |         }
52 |     }
53 | 
54 |     virtual ~Context()
55 |     {
56 |         cublasDestroy(_cublasHandle);
57 |         cudaFree(_workspace);
58 |     }
59 | 
60 |     static Context& Instance()
61 |     {
62 |         static Context _ctx;
63 |         return _ctx;
64 |     }
65 | 
66 |     void SetWorkSpace(void* workspace)
67 |     {
68 |         if (!workspace) { throw std::runtime_error("Workspace is null."); }
69 |         _workspace = workspace;
70 |     }
71 | 
72 |     void* GetWorkSpace() { return _workspace; }
73 | 
74 |     curandGenerator_t& GetRandGenerator() { return _gen; }
75 | 
76 |     cudaStream_t GetCurrentStream()
77 |     {
78 |         // get current pytorch stream.
79 |         cudaStream_t stream = at::cuda::getCurrentCUDAStream();
80 |         return stream;
81 |     }
82 | 
83 |     cudaStream_t GetNewStream() { return at::cuda::getStreamFromPool(); }
84 | 
85 |     cublasHandle_t GetCublasHandle() { return _cublasHandle; }
86 | 
87 |     std::pair<uint64_t, uint64_t> IncrementOffset(uint64_t offset_inc)
88 |     {
89 |         uint64_t offset = _curr_offset;
90 |         _curr_offset += offset_inc;
91 |         return std::pair<uint64_t, uint64_t>(_seed, offset);
92 |     }
93 | 
94 |     void SetSeed(uint64_t new_seed) { _seed = new_seed; }
95 | 
96 |     const std::vector<std::array<int, 3>>& GetGemmAlgos() const { return _gemm_algos; }
97 | 
98 | private:
99 |     curandGenerator_t _gen;
100 |     cublasHandle_t _cublasHandle;
101 |     void* _workspace;
102 |     uint64_t _seed;
103 |     uint64_t _curr_offset;
104 |     std::vector<std::array<int, 3>> _gemm_algos;
105 | };
106 | 
-------------------------------------------------------------------------------- /patrickstar/utils/memory_monitor.py: --------------------------------------------------------------------------------
1 | # BSD 3-Clause License
2 | # 
3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved.
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 | 
30 | import gc
31 | 
32 | import psutil
33 | import torch
34 | from .distributed import get_rank, get_local_world_size
35 | from .memory import get_memory_info
36 | 
37 | 
38 | def get_sys_memory_used(device):
39 |     """
40 |     Get the memory currently in use on `device`.
41 |     Notice that for CPU, this function will return 1/N of the total used memory,
42 |     where N is the local world size.
43 |     """
44 |     if device.type == "cuda":
45 |         ret = torch.cuda.memory_allocated()
46 |         # get the peak memory to report correct data, so reset the counter for the next call
47 |         if hasattr(torch.cuda, "reset_peak_memory_stats"):  # pytorch 1.4+
48 |             torch.cuda.reset_peak_memory_stats()
49 |     elif device.type == "cpu":
50 |         ret = get_memory_info().used / get_local_world_size()
51 |     else:
52 |         raise ValueError(f"Unsupported device type: {device.type}")
53 |     return ret
54 | 
55 | def see_memory_usage(message, force=False, scale_name="MB"):
56 |     if not force:
57 |         return
58 |     if get_rank() != 0:
59 |         return
60 | 
61 |     # Python does not garbage-collect in real time, so collect explicitly to get accurate RAM reports.
62 |     gc.collect()
63 | 
64 |     scales = {"MB": 1024 * 1024, "B": 1}
65 |     if scale_name not in scales:
66 |         raise ValueError(f"Unknown scale_name {scale_name}, expected 'MB' or 'B'")
67 |     scale = scales[scale_name]
68 |     # MA = memory allocated, CA = memory reserved ("cached") by the CUDA allocator.
69 |     print(message)
70 |     print(
71 |         f"MA {round(torch.cuda.memory_allocated() / scale, 2)} {scale_name} \
72 |         Max_MA {round(torch.cuda.max_memory_allocated() / scale, 2)} {scale_name} \
73 |         CA {round(torch.cuda.memory_reserved() / scale, 2)} {scale_name} \
74 |         Max_CA {round(torch.cuda.max_memory_reserved() / scale, 2)} {scale_name} "
75 |     )
76 | 
77 |     # TODO(zilinzhu) Find how to get the available and percent value of the
78 |     # memory in docker to substitute psutil.virtual_memory to get_memory_info.
79 | vm_stats = psutil.virtual_memory() 80 | used_gb = round(((vm_stats.total - vm_stats.available) / (1024 ** 3)), 2) 81 | print(f"CPU Virtual Memory: used = {used_gb} GB, percent = {vm_stats.percent}%") 82 | 83 | # get the peak memory to report correct data, so reset the counter for the next call 84 | if hasattr(torch.cuda, "reset_peak_memory_stats"): # pytorch 1.4+ 85 | torch.cuda.reset_peak_memory_stats() 86 | -------------------------------------------------------------------------------- /patrickstar/ops/embedding.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 | 
30 | import torch
31 | import torch.nn as nn
32 | 
33 | from patrickstar.utils import logger
34 | 
35 | 
36 | class _CopyInputToCPU(torch.autograd.Function):
37 |     @staticmethod
38 |     def symbolic(graph, input_):
39 |         return input_.to(torch.device("cpu:0"))
40 | 
41 |     @staticmethod
42 |     def forward(ctx, input_):
43 |         logger.debug(f"Copy input to cpu, dtype {input_.dtype}.")
44 |         return input_.to(torch.device("cpu:0"))
45 | 
46 |     @staticmethod
47 |     def backward(ctx, grad_output):
48 |         target_device = torch.device(f"cuda:{torch.cuda.current_device()}")
49 |         logger.debug("Copy grad_output to cuda.")
50 |         return grad_output.to(target_device)
51 | 
52 | 
53 | class _CopyActToGPU(torch.autograd.Function):
54 |     @staticmethod
55 |     def symbolic(graph, input_):
56 |         target_device = torch.device(f"cuda:{torch.cuda.current_device()}")
57 | 
58 |         return input_.to(target_device)
59 | 
60 |     @staticmethod
61 |     def forward(ctx, input_):
62 |         target_device = torch.device(f"cuda:{torch.cuda.current_device()}")
63 | 
64 |         logger.debug(f"Copy input to cuda, input dtype {input_.dtype}.")
65 |         return input_.to(target_device)
66 | 
67 |     @staticmethod
68 |     def backward(ctx, grad_output):
69 |         return grad_output.to(torch.device("cpu:0")).float()
70 | 
71 | 
72 | def copy_to_cpu(input_):
73 |     return _CopyInputToCPU.apply(input_)
74 | 
75 | 
76 | def copy_to_gpu(input_):
77 |     return _CopyActToGPU.apply(input_)
78 | 
79 | 
80 | class Embedding(nn.Embedding):
81 | r"""CPU Embedding. 82 | 83 | If `use_cpu` is set, the embedding operations will 84 | be performed on CPU. 85 | """ 86 | use_cpu = False 87 | # `instances` is a helper class static member for 88 | # preprocess context. For detail, see comments there. 89 | instances = [] 90 | 91 | def __init__(self, *args, **kwargs): 92 | super().__init__(*args, **kwargs) 93 | self.use_cpu = Embedding.use_cpu 94 | Embedding.instances.append(self) 95 | 96 | def forward(self, input_): 97 | if self.use_cpu: 98 | input_ = copy_to_cpu(input_) 99 | else: 100 | input_ = copy_to_gpu(input_) 101 | output = super().forward(input_) 102 | if self.use_cpu: 103 | output = copy_to_gpu(output) 104 | return output.to(torch.half) 105 | -------------------------------------------------------------------------------- /examples/moe/huggingface_bert_moe.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | import argparse 30 | from tqdm import tqdm 31 | 32 | import torch 33 | from torch.utils.data import DataLoader 34 | 35 | from patrickstar.runtime import initialize_engine 36 | from patrickstar.utils import get_rank 37 | 38 | from examples.imdb_dataset import get_dataset 39 | from moe_bert import build_moe_bert 40 | 41 | parser = argparse.ArgumentParser() 42 | parser.add_argument("--type", dest="type", type=str, choices=["patrickstar", "torch"]) 43 | parser.add_argument("--local_rank", dest="local_rank", type=int, default=None) 44 | args = parser.parse_args() 45 | 46 | torch.distributed.init_process_group(backend="nccl") 47 | torch.cuda.set_device(get_rank()) 48 | 49 | train_dataset, _, test_dataset = get_dataset("/root/aclImdb") 50 | 51 | device = ( 52 | torch.device(f"cuda:{get_rank()}") 53 | if torch.cuda.is_available() 54 | else torch.device("cpu") 55 | ) 56 | 57 | if args.type == "patrickstar": 58 | 59 | def model_func(): 60 | return build_moe_bert() 61 | 62 | config = { 63 | # The same format as optimizer config of DeepSpeed 64 | # https://www.deepspeed.ai/docs/config-json/#optimizer-parameters 65 | "optimizer": { 66 | "type": "Adam", 67 | 
"params": { 68 | "lr": 5e-5, 69 | "betas": (0.9, 0.999), 70 | "eps": 1e-6, 71 | "weight_decay": 0, 72 | "use_hybrid_adam": True, 73 | }, 74 | }, 75 | "default_chunk_size": 64 * 1024 * 1024, 76 | "release_after_init": True, 77 | "use_cpu_embedding": False, 78 | } 79 | 80 | model, optim = initialize_engine( 81 | model_func=model_func, local_rank=args.local_rank, config=config 82 | ) 83 | else: 84 | model = build_moe_bert() 85 | optim = torch.optim.Adam(model.parameters(), lr=5e-5) 86 | model.cuda() 87 | 88 | 89 | train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True) 90 | 91 | for batch in tqdm(train_loader): 92 | optim.zero_grad() 93 | input_ids = batch["input_ids"].to(device) 94 | attention_mask = batch["attention_mask"].to(device) 95 | labels = batch["labels"].to(device) 96 | outputs = model(input_ids, attention_mask=attention_mask, labels=labels) 97 | loss = outputs[0] 98 | if args.type == "patrickstar": 99 | model.backward(loss) 100 | else: 101 | loss.backward() 102 | optim.step() 103 | -------------------------------------------------------------------------------- /examples/train_simple_net.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 
14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
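Before the training script that follows: it drives either the PatrickStar engine or plain PyTorch from one loop, and the only difference is who runs backward. The engine exposes `model.backward(loss)` (which typically also handles fp16 loss scaling), while the torch path calls `loss.backward()` directly, with `zero_grad()` before and `step()` after in both cases. A minimal, framework-free sketch of that dispatch; `FakeEngine`, `FakeOptim`, and `FakeLoss` are illustrative stand-ins, not real PatrickStar or PyTorch APIs:

```python
class FakeOptim:
    """Stand-in for an optimizer: counts zero_grad/step calls."""

    def __init__(self):
        self.zero_grad_calls = 0
        self.step_calls = 0

    def zero_grad(self):
        self.zero_grad_calls += 1

    def step(self):
        self.step_calls += 1


class FakeLoss:
    """Stand-in for a loss tensor with plain autograd backward."""

    def __init__(self):
        self.backward_calls = 0

    def backward(self):
        self.backward_calls += 1


class FakeEngine:
    """Stand-in for the model object returned by initialize_engine()."""

    def __init__(self):
        self.backward_calls = 0

    def backward(self, loss):
        # The engine owns the backward pass (and, in the real system,
        # applies loss scaling before it).
        self.backward_calls += 1


def train_step(test_case, model, optim, loss):
    """One optimization step, dispatching on the backend in use."""
    optim.zero_grad()
    if test_case == "patrickstar":
        model.backward(loss)  # engine-managed backward
    elif test_case == "torch":
        loss.backward()       # plain autograd backward
    else:
        raise ValueError(f"unknown test case: {test_case}")
    optim.step()


engine, optim, loss = FakeEngine(), FakeOptim(), FakeLoss()
train_step("patrickstar", engine, optim, loss)
assert engine.backward_calls == 1 and loss.backward_calls == 0
assert optim.zero_grad_calls == 1 and optim.step_calls == 1
```

The point of the shared `train_step` shape: the surrounding loop stays identical, so switching backends only changes how backward is invoked.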
29 | 
30 | import logging
31 | import torch
32 | 
33 | from patrickstar.runtime import initialize_engine
34 | from patrickstar.utils import logger
35 | 
36 | from simple_net import SimpleModel, get_bert_data_loader
37 | 
38 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
39 | 
40 | BATCH_SIZE = 8
41 | HIDDEN_DIM = 4
42 | SEQ_LEN = 128
43 | 
44 | 
45 | def model_func():
46 |     return SimpleModel(
47 |         hidden_dim=HIDDEN_DIM, seq_len=SEQ_LEN, is_ckp=True, is_share_param=True
48 |     )
49 | 
50 | 
51 | LR = 5e-5
52 | BETAS = (0.9, 0.999)
53 | EPS = 1e-6
54 | WEIGHT_DECAY = 0
55 | 
56 | # TEST_CASE = "torch"
57 | TEST_CASE = "patrickstar"
58 | logger.setLevel(logging.WARNING)
59 | print(f"TEST_CASE {TEST_CASE}")
60 | config = {
61 |     # The same format as optimizer config of DeepSpeed
62 |     # https://www.deepspeed.ai/docs/config-json/#optimizer-parameters
63 |     "optimizer": {
64 |         "type": "Adam",
65 |         "params": {
66 |             "lr": LR,
67 |             "betas": BETAS,
68 |             "eps": EPS,
69 |             "weight_decay": WEIGHT_DECAY,
70 |             "use_hybrid_adam": True,
71 |         },
72 |     },
73 |     "fp16": {
74 |         "enabled": True,
75 |         "loss_scale": 0,
76 |         "initial_scale_power": 2 ** 3,
77 |         "loss_scale_window": 1000,
78 |         "hysteresis": 2,
79 |         "min_loss_scale": 1,
80 |     },
81 |     "default_chunk_size": 1024,
82 |     "use_fake_dist": False,
83 |     "use_cpu_embedding": False,
84 | }
85 | 
86 | torch.manual_seed(0)
87 | if TEST_CASE == "patrickstar":
88 |     model, optim = initialize_engine(model_func=model_func, local_rank=0, config=config)
89 | elif TEST_CASE == "torch":
90 |     model = model_func()
91 |     optim = torch.optim.Adam(
92 |         model.parameters(), lr=LR, betas=BETAS, eps=EPS, weight_decay=WEIGHT_DECAY
93 |     )
94 |     model.cuda()
95 | else:
96 |     raise RuntimeError(f"Unknown TEST_CASE: {TEST_CASE}")
97 | 
98 | train_loader = get_bert_data_loader(BATCH_SIZE, 10000, 128, device, False)
99 | 
100 | for epoch in range(3):
101 |     for i, batch in enumerate(train_loader):
102 |         optim.zero_grad()
103 |         input_ids, labels = batch
104 |         loss = model(input_ids,
labels)
105 |         if TEST_CASE == "patrickstar":
106 |             model.backward(loss)
107 |             optim.step()
108 |         elif TEST_CASE == "torch":
109 |             loss.backward()
110 |             optim.step()
112 |         print(i, loss.item())
113 |         if i == 10:
114 |             exit()
115 | 
116 | model.eval()
117 | 
-------------------------------------------------------------------------------- /unitest/test_chunk_list.py: --------------------------------------------------------------------------------
1 | # BSD 3-Clause License
2 | # 
3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved.
4 | # 
5 | # Redistribution and use in source and binary forms, with or without modification,
6 | # are permitted provided that the following conditions are met:
7 | # 
8 | # * Redistributions of source code must retain the above copyright notice, this
9 | # list of conditions and the following disclaimer.
10 | # 
11 | # * Redistributions in binary form must reproduce the above copyright notice,
12 | # this list of conditions and the following disclaimer in the documentation
13 | # and/or other materials provided with the distribution.
14 | # 
15 | # * Neither the name of the psutil authors nor the names of its contributors
16 | # may be used to endorse or promote products derived from this software without
17 | # specific prior written permission.
18 | # 
19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22 | # DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import unittest 31 | 32 | import torch 33 | 34 | from common import distributed_test 35 | from patrickstar.core import ChunkList, ChunkState, ChunkType 36 | from patrickstar.core.eviction_policy import LatestAccessChunkEvictionPolicy 37 | from patrickstar.core.memtracer import RuntimeMemTracer 38 | 39 | 40 | class TestChunkData(unittest.TestCase): 41 | def setUp(self): 42 | self.memtracer = RuntimeMemTracer() 43 | self.policy = LatestAccessChunkEvictionPolicy(self.memtracer.metronome) 44 | 45 | @distributed_test(world_size=[1]) 46 | def test_add_chunk(self): 47 | self.memtracer.metronome.set_warmup(False) 48 | chunk_list = ChunkList(0, self.memtracer, self.policy) 49 | assert chunk_list.size() == 0 50 | 51 | chunk_list.new_chunk( 52 | chunk_id=0, 53 | chunk_size=20, 54 | data_type=torch.float, 55 | is_dummy=False, 56 | chunk_type=ChunkType.PARAM_FP32, 57 | ) 58 | 59 | assert chunk_list.size() == 1 60 | assert chunk_list[0].get_state() == ChunkState.RELEASED 61 | 62 | @distributed_test(world_size=[1], use_fake_dist=True) 63 | def test_new_chunk(self): 64 | compute_device = ( 65 | torch.device(f"cuda:{torch.cuda.current_device()}") 66 | if torch.cuda.is_available() 67 | else torch.device("cpu:0") 68 | ) 69 | self.memtracer.metronome.set_warmup(False) 70 | chunk_list = ChunkList(0, self.memtracer, self.policy) 71 | 72 | new_chunk_id = 123 73 | chunk_list.new_chunk( 74 | chunk_id=new_chunk_id, 75 | chunk_size=20, 
76 | data_type=torch.float, 77 | is_dummy=False, 78 | chunk_type=ChunkType.PARAM_FP32, 79 | ) 80 | chunk_list.access_chunk(new_chunk_id, compute_device) 81 | 82 | assert chunk_list[new_chunk_id].get_state() == ChunkState.FREE 83 | 84 | self.assertEqual( 85 | chunk_list.last_chunk_id(ChunkType.PARAM_FP32), 86 | new_chunk_id, 87 | "check last_chunk_id", 88 | ) 89 | 90 | chunk_list.new_chunk( 91 | chunk_id=1, 92 | chunk_size=20, 93 | data_type=torch.float, 94 | is_dummy=False, 95 | chunk_type=ChunkType.PARAM_FP32, 96 | ) 97 | 98 | self.assertEqual(chunk_list.size(), 2) 99 | 100 | self.assertEqual( 101 | chunk_list.last_chunk_id(ChunkType.PARAM_FP32), 1, "check last_chunk_id" 102 | ) 103 | 104 | 105 | if __name__ == "__main__": 106 | unittest.main() 107 | -------------------------------------------------------------------------------- /patrickstar/core/torch_profiler_hook.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 | 
30 | 
31 | import time
32 | import torch
33 | 
34 | from patrickstar.core.hook import (
35 |     PreBackwardFunction,
36 |     PostBackwardFunction,
37 |     _apply_to_tensors_only,
38 | )
39 | from patrickstar.utils import get_sys_memory_used, logger
40 | from patrickstar.profiler import profiler
41 | 
42 | 
43 | def _cur_mem_usage():
44 |     """
45 |     Sample the GPU memory in use at the moments right before
46 |     an operator starts and right after it finishes.
47 |     """
48 |     dev = torch.device(f"cuda:{torch.cuda.current_device()}")
49 |     gpu_mem_used = get_sys_memory_used(dev)
50 |     return gpu_mem_used
51 | 
52 | 
53 | def _record_mem_stats():
54 |     """
55 |     Record memory statistics at this moment for the profiler.
56 | """ 57 | mem_cur_mon = _cur_mem_usage() 58 | profiler.gpu_memory_used.append((None, time.time(), mem_cur_mon)) 59 | 60 | 61 | def _register_hooks_recursively(module, name=""): 62 | r"""Register hook in post order traverse.""" 63 | 64 | for child_name, child in module.named_children(): 65 | logger.debug(f"{child.__class__.__name__}") 66 | _register_hooks_recursively(child, name + child_name) 67 | 68 | # Early return on modules with no parameters or buffers that 69 | # are not in their children. 70 | if ( 71 | len(list(module.named_parameters(recurse=False))) == 0 72 | and len(list(module.named_buffers(recurse=False))) == 0 73 | ): 74 | return 75 | 76 | def _pre_post_forward_module_hook(module, *args): 77 | _record_mem_stats() 78 | 79 | # The hook can modify the output 80 | def _pre_backward_module_hook(module, inputs, output): 81 | def _run_before_backward_function(sub_module): 82 | _record_mem_stats() 83 | 84 | return _apply_to_tensors_only( 85 | module, PreBackwardFunction, _run_before_backward_function, output 86 | ) 87 | 88 | def _post_backward_module_hook(module, inputs): 89 | def _run_after_backward_function(sub_module): 90 | _record_mem_stats() 91 | 92 | return _apply_to_tensors_only( 93 | module, PostBackwardFunction, _run_after_backward_function, inputs 94 | ) 95 | 96 | module.register_forward_pre_hook(_pre_post_forward_module_hook) 97 | module.register_forward_hook(_pre_post_forward_module_hook) 98 | 99 | module.register_forward_hook(_pre_backward_module_hook) 100 | module.register_forward_pre_hook(_post_backward_module_hook) 101 | 102 | 103 | def register_torch_profiler_hook(module): 104 | """ 105 | Collect activation statistis during training. 
106 | """ 107 | _register_hooks_recursively(module) 108 | -------------------------------------------------------------------------------- /patrickstar/runtime/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import torch 31 | from patrickstar.core import PSPreProcessCtx, PatrickStarClient 32 | from patrickstar.core.memtracer import RuntimeMemTracer 33 | from patrickstar.utils import logger, log_dist 34 | from .engine import PatrickStarEngine 35 | import time 36 | 37 | DEFAULT_CHUNK_SIZE = 32 * 1024 * 1024 38 | 39 | 40 | def initialize_engine(model_func, local_rank, config=None, client=None): 41 | """Initialize the PatrickStar Engine. 42 | Arguments: 43 | model_func: Required: an nn.Module instance or a callable that builds the model, before any wrappers are applied 44 | client: Required when passing an nn.Module: the PatrickStarClient for orchestrating chunks. 45 | config: Optional: JSON config for the optimizer and runtime. 46 | Returns: 47 | A tuple of ``engine`` and ``optimizer`` 48 | * ``engine``: PatrickStar runtime engine which wraps the client model for distributed training. 49 | * ``optimizer``: the wrapped optimizer if an ``optimizer`` section is 50 | specified in the JSON config, else ``None``. 51 | """ 52 | if isinstance(model_func, torch.nn.Module): 53 | logger.debug( 54 | "Passing nn.Module into initialize_engine. " 55 | "Make sure you have initialized the model within PSPreProcessCtx" 56 | ) 57 | assert client is not None, "Must pass the client when passing a nn.Module." 58 | model = model_func 59 | else: 60 | assert callable(model_func), "model_func needs to be callable."
61 | 62 | if config is None: 63 | default_chunk_size = DEFAULT_CHUNK_SIZE 64 | release_after_init = False 65 | use_cpu_embedding = True 66 | else: 67 | default_chunk_size = config.get("default_chunk_size", DEFAULT_CHUNK_SIZE) 68 | release_after_init = config.get("release_after_init", False) 69 | use_cpu_embedding = config.get("use_cpu_embedding", True) 70 | 71 | client = PatrickStarClient( 72 | rank=local_rank, 73 | default_chunk_size=default_chunk_size, 74 | config=config.get("client", None) if config is not None else None, 75 | ) 76 | 77 | start_time = time.time() 78 | log_dist("begin initializing the model parameters...") 79 | with PSPreProcessCtx( 80 | client=client, 81 | dtype=torch.float, 82 | release_after_init=release_after_init, 83 | use_cpu_embedding=use_cpu_embedding, 84 | ): 85 | model = model_func() 86 | end_time = time.time() 87 | log_dist( 88 | f"finished initializing the model parameters in {end_time - start_time} s" 89 | ) 90 | 91 | engine = PatrickStarEngine(model=model, client=client, config=config) 92 | client.start_mem_tracer() 93 | return (engine, engine.optimizer)
14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
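The `fp16` block in the example below enables dynamic loss scaling: `loss_scale: 0` selects dynamic mode, the initial scale is `2 ** initial_scale_power`, and the scale is raised after `loss_scale_window` overflow-free steps and lowered on overflow, bounded below by `min_loss_scale`. A minimal stdlib sketch of that general policy (an illustration of the scheme, not PatrickStar's actual implementation; `hysteresis` is omitted for brevity):

```python
# Toy dynamic loss scaler following the config keys used below:
# initial_scale_power, loss_scale_window, min_loss_scale.
class ToyLossScaler:
    def __init__(self, initial_scale_power=3, loss_scale_window=1000,
                 min_loss_scale=1.0):
        self.scale = 2.0 ** initial_scale_power
        self.window = loss_scale_window
        self.min_scale = min_loss_scale
        self.good_steps = 0

    def update(self, overflow):
        if overflow:
            # Gradients overflowed: halve the scale and restart the window.
            self.scale = max(self.scale / 2.0, self.min_scale)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps == self.window:
                # A full window without overflow: double the scale.
                self.scale *= 2.0
                self.good_steps = 0


scaler = ToyLossScaler(initial_scale_power=3, loss_scale_window=2)
scaler.update(overflow=False)
scaler.update(overflow=False)  # window filled -> scale doubles to 16.0
scaler.update(overflow=True)   # overflow -> scale halves to 8.0
print(scaler.scale)
```

The doubling/halving keeps the scale as large as possible without producing inf/nan fp16 gradients.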
29 | 30 | import torch 31 | from torch.utils.data import DataLoader 32 | from transformers import BertForSequenceClassification 33 | 34 | from patrickstar.runtime import initialize_engine 35 | from patrickstar.utils import get_rank 36 | 37 | from imdb_dataset import get_dataset 38 | 39 | 40 | # Uncomment these lines when doing multiprocess training 41 | # torch.distributed.init_process_group(backend='nccl') 42 | # torch.cuda.set_device(get_rank()) 43 | 44 | train_dataset, _, test_dataset = get_dataset("/root/aclImdb") 45 | 46 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu") 47 | 48 | 49 | def model_func(): 50 | model = BertForSequenceClassification.from_pretrained("bert-base-uncased") 51 | # For large models, please uncomment the following lines to utilize gradient checkpointing 52 | # model.gradient_checkpointing_enable() 53 | return model 54 | 55 | 56 | config = { 57 | # The same format as optimizer config of DeepSpeed 58 | # https://www.deepspeed.ai/docs/config-json/#optimizer-parameters 59 | "optimizer": { 60 | "type": "Adam", 61 | "params": { 62 | "lr": 5e-5, 63 | "betas": (0.9, 0.999), 64 | "eps": 1e-6, 65 | "weight_decay": 0, 66 | "use_hybrid_adam": True, 67 | }, 68 | }, 69 | "fp16": { 70 | "enabled": True, 71 | "loss_scale": 0, 72 | "initial_scale_power": 2 ** 3, 73 | "loss_scale_window": 1000, 74 | "hysteresis": 2, 75 | "min_loss_scale": 1, 76 | }, 77 | "default_chunk_size": 64 * 1024 * 1024, 78 | "release_after_init": False, 79 | "use_cpu_embedding": False, 80 | } 81 | 82 | model, optim = initialize_engine( 83 | model_func=model_func, local_rank=get_rank(), config=config 84 | ) 85 | 86 | train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True) 87 | 88 | print("train loss:") 89 | 90 | for i, batch in enumerate(train_loader): 91 | optim.zero_grad() 92 | input_ids = batch["input_ids"].to(device) 93 | attention_mask = batch["attention_mask"].to(device) 94 | labels = batch["labels"].to(device) 95 | outputs = 
model(input_ids, attention_mask=attention_mask, labels=labels) 96 | loss = outputs[0] 97 | model.backward(loss) 98 | optim.step() 99 | print(i, loss.item()) 100 | if i == 10: 101 | break 102 | 103 | model.eval() 104 | 105 | print("test loss:") 106 | 107 | test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False) 108 | for i, batch in enumerate(test_loader): 109 | input_ids = batch["input_ids"].to(device) 110 | attention_mask = batch["attention_mask"].to(device) 111 | labels = batch["labels"].to(device) 112 | outputs = model(input_ids, attention_mask=attention_mask, labels=labels) 113 | loss = outputs[0] 114 | print(i, loss.item()) 115 | if i == 5: 116 | break 117 | -------------------------------------------------------------------------------- /examples/benchmark/process_logs.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import os 31 | import sys 32 | import numpy as np 33 | from scipy.stats import t 34 | 35 | 36 | def is_run_this_file(path, file, res_dict, file_dict): 37 | """ 38 | Collect throughput performance from a log file 39 | and update res_dict and file_dict. 40 | ret: is_run, whether the benchmark script still needs to be run 41 | """ 42 | model_name = "" 43 | gpu_num = 0 44 | bs = 0 45 | 46 | # If the log file does not exist, the benchmark has not run yet,
47 | # so return True to let the caller execute it. 48 | if not os.path.isfile(path + "/" + file): 49 | return True 50 | 51 | f = open(path + "/" + file) 52 | is_run = True 53 | 54 | perf_list = np.array([]) 55 | if not os.path.isdir(file): 56 | fn_list = file.split(".")[1].split("_") 57 | for i in range(len(fn_list)): 58 | if "gpu" in fn_list[i]: 59 | model_name = fn_list[i - 1] 60 | gpu_num = fn_list[i + 1] 61 | elif "bs" == fn_list[i]: 62 | bs = fn_list[i + 1] 63 | key = model_name + "_" + bs + "_" + gpu_num 64 | iter_f = iter(f) 65 | for line in iter_f: 66 | if "Tflops" in line and "WARM" not in line: 67 | sline = line.split() 68 | perf = float(sline[-2]) 69 | 70 | perf_list = np.append(perf_list, perf) 71 | 72 | is_run = False 73 | if "RuntimeError" in line: 74 | return False 75 | 76 | if len(perf_list) == 0: 77 | return False 78 | 79 | # calculate CI of perf_list 80 | perf_list = perf_list[1:-1] 81 | m = perf_list.mean() 82 | s = perf_list.std() 83 | dof = len(perf_list) - 1 84 | confidence = 0.95 85 | t_crit = np.abs(t.ppf((1 - confidence) / 2, dof)) 86 | ic_perf = ( 87 | -s * t_crit / np.sqrt(len(perf_list)), 88 | +s * t_crit / np.sqrt(len(perf_list)), 89 | ) 90 | 91 | res_dict[key] = (*ic_perf, m) 92 | file_dict[key] = file 93 | 94 | return is_run 95 | 96 | 97 | def collect_info_from_dir(path): 98 | res_dict = {} 99 | file_dict = {} 100 | files = os.listdir(path) 101 | for file in files: 102 | is_run_this_file(path, file, res_dict, file_dict) 103 | print("process ", path) 104 | return res_dict, file_dict 105 | 106 | 107 | if __name__ == "__main__": 108 | res_dict = {} 109 | file_dict = {} 110 | if len(sys.argv) > 1: 111 | PATH = str(sys.argv[1]) 112 | else: 113 | PATH = "./logs_GPT2small" 114 | files = os.listdir(PATH) 115 | res_dict, file_dict = collect_info_from_dir(PATH) 116 | new_res_list = [] 117 | for k, v in res_dict.items(): 118 | plan = k.split("_") 119 | # model_name, bs, gpu_num, best perf, file 120 | new_res_list.append((plan[0], plan[1], plan[2], v, file_dict[k]))
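`is_run_this_file` above turns the sampled Tflops numbers into a 95% confidence interval for the mean using Student's t distribution from SciPy. A stdlib-only sketch of the same computation, using `statistics.NormalDist` as a normal approximation of the t critical value (which slightly understates the interval for small samples):

```python
import math
import statistics


def mean_ci(samples, confidence=0.95):
    # Mean and confidence interval for the mean, using the normal
    # approximation z ~= t for the critical value.
    m = statistics.fmean(samples)
    s = statistics.pstdev(samples)  # population std, like numpy's default .std()
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * s / math.sqrt(len(samples))
    return m - half, m + half, m


lo, hi, m = mean_ci([10.0, 12.0, 11.0, 13.0, 11.5])
print(round(m, 2))
```

The original script stores the interval as a (−half, +half) offset pair next to the mean; this sketch returns the absolute bounds instead.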
121 | 122 | new_res_list.sort() 123 | for elem in new_res_list: 124 | print(elem) 125 | -------------------------------------------------------------------------------- /examples/optimizations/ls_hf_transformer_encoder_layer.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | try: 31 | from lightseq.training.ops.pytorch.transformer_encoder_layer import ( 32 | LSTransformerEncoderLayer, 33 | ) 34 | except ImportError: 35 | raise RuntimeError("pip install lightseq first!") 36 | 37 | 38 | class LSHFTransformerEncoderLayer(LSTransformerEncoderLayer): 39 | def __init__(self, *args, **kwargs): 40 | super(LSHFTransformerEncoderLayer, self).__init__(*args, **kwargs) 41 | 42 | def forward(self, hidden_states, encoder_padding_mask, *args, **kwargs): 43 | encoder_padding_mask /= -10000.0 44 | encoder_padding_mask = encoder_padding_mask.squeeze() 45 | output = super().forward(hidden_states, encoder_padding_mask) 46 | return (output, None, None, None) 47 | 48 | 49 | def gen_bert_config(training_args, config): 50 | bert_config = LSTransformerEncoderLayer.get_config( 51 | max_batch_tokens=4096, 52 | max_seq_len=config.max_position_embeddings, 53 | hidden_size=config.hidden_size, 54 | intermediate_size=config.intermediate_size, 55 | nhead=config.num_attention_heads, 56 | attn_prob_dropout_ratio=config.attention_probs_dropout_prob, 57 | activation_dropout_ratio=config.hidden_dropout_prob, 58 | hidden_dropout_ratio=config.hidden_dropout_prob, 59 | pre_layer_norm=False, 60 | fp16=training_args.use_fp16, 61 | local_rank=training_args.local_rank, 62 | activation_fn="gelu", 63 | ) 64 | return bert_config 65 | 66 | 67 | def get_hf_bert_enc_layer_params(layer): 68 | init_ws = [] 69 | init_bs = [] 70 | 71 
| init_ws.append(layer.attention.self.query.weight.detach().clone()) 72 | init_bs.append(layer.attention.self.query.bias.detach().clone()) 73 | init_ws.append(layer.attention.self.key.weight.detach().clone()) 74 | init_bs.append(layer.attention.self.key.bias.detach().clone()) 75 | init_ws.append(layer.attention.self.value.weight.detach().clone()) 76 | init_bs.append(layer.attention.self.value.bias.detach().clone()) 77 | init_ws.append(layer.attention.output.dense.weight.detach().clone()) 78 | init_bs.append(layer.attention.output.dense.bias.detach().clone()) 79 | init_ws.append(layer.attention.output.LayerNorm.weight.detach().clone()) 80 | init_bs.append(layer.attention.output.LayerNorm.bias.detach().clone()) 81 | 82 | init_ws.append(layer.intermediate.dense.weight.detach().clone()) 83 | init_bs.append(layer.intermediate.dense.bias.detach().clone()) 84 | init_ws.append(layer.output.dense.weight.detach().clone()) 85 | init_bs.append(layer.output.dense.bias.detach().clone()) 86 | init_ws.append(layer.output.LayerNorm.weight.detach().clone()) 87 | init_bs.append(layer.output.LayerNorm.bias.detach().clone()) 88 | 89 | return init_ws, init_bs 90 | 91 | 92 | def inject_ls_enc_layer(model, training_args, config): 93 | for i in range(config.num_hidden_layers): 94 | bert_config = gen_bert_config(training_args, config) 95 | init_ws, init_bs = get_hf_bert_enc_layer_params(model.bert.encoder.layer[i]) 96 | model.bert.encoder.layer[i] = LSHFTransformerEncoderLayer( 97 | bert_config, init_ws, init_bs 98 | ).cuda() 99 | -------------------------------------------------------------------------------- /unitest/test_client.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
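The test below exercises PatrickStar's chunk-based storage, where parameter tensors are packed back-to-back into fixed-size chunks at increasing offsets. A toy stdlib sketch of that packing rule (`ToyChunk` and its method names are hypothetical, not the real `ChunkTensorIndex` API; the sizes match the unit tests in this repo):

```python
# Toy fixed-capacity chunk: tensors are appended at increasing offsets,
# and an insert fails when the remaining space is too small.
class ToyChunk:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.offsets = {}

    def try_insert(self, name, numel):
        if self.used + numel > self.capacity:
            return False  # does not fit in this chunk
        self.offsets[name] = self.used
        self.used += numel
        return True


chunk = ToyChunk(capacity=40)
assert chunk.try_insert("param1", 10)
assert chunk.try_insert("param2", 15)
assert chunk.try_insert("param3", 5)
assert not chunk.try_insert("param4", 100)
print(chunk.offsets)  # param1 at 0, param2 at 10, param3 at 25
```

When a tensor does not fit, the real client opens a new chunk of `default_chunk_size` elements instead of failing.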
29 | 30 | import logging 31 | import unittest 32 | 33 | import torch 34 | 35 | from common import distributed_test 36 | from patrickstar import RuntimeMemTracer 37 | from patrickstar.core import PatrickStarClient, AccessType, register_param, ChunkType 38 | from patrickstar.core.parameter import ParamType 39 | 40 | 41 | class TestClientAccess(unittest.TestCase): 42 | def setUp(self): 43 | self.default_chunk_size = 40 44 | logging.info("SetUp finished") 45 | 46 | @distributed_test(world_size=[1]) 47 | def test_append_ps_tensor(self): 48 | RuntimeMemTracer(0) 49 | self.client = PatrickStarClient( 50 | rank=0, default_chunk_size=self.default_chunk_size 51 | ) 52 | 53 | self.compute_device = torch.device("cpu:0") 54 | 55 | param_size_list = [10, 11, 12, 13] 56 | 57 | param_list = [] 58 | param_payload_ref_list = [] 59 | for idx, psize in enumerate(param_size_list): 60 | param = torch.nn.Parameter(torch.rand(psize)) 61 | param_list.append(param) 62 | param_payload_ref_list.append(param.data.clone()) 63 | 64 | register_param(param, ParamType.CHUNK_BASED, torch.float, f"param_{idx}") 65 | self.client.append_tensor( 66 | [param], torch.float, AccessType.DATA, ChunkType.PARAM_FP32 67 | ) 68 | 69 | real_payload = self.client.access_data(param, torch.device("cpu:0")) 70 | real_payload.copy_(param.data) 71 | self.client.release_data(param) 72 | self.assertTrue(param.data.numel() == 0) 73 | 74 | self.client.display_chunk_info() 75 | for param, payload_ref in zip(param_list, param_payload_ref_list): 76 | real_payload = self.client.access_data(param, torch.device("cpu:0")) 77 | self.assertEqual(torch.max(real_payload - payload_ref), 0) 78 | self.client.release_data(param) 79 | 80 | @distributed_test(world_size=[1]) 81 | def test_append_torch_tensor(self): 82 | self.client = PatrickStarClient( 83 | rank=0, default_chunk_size=self.default_chunk_size 84 | ) 85 | 86 | self.compute_device = torch.device("cpu:0") 87 | 88 | param_size_list = [10, 11, 12, 13] 89 | 90 | param_list = [] 91 
| param_payload_ref_list = [] 92 | for idx, psize in enumerate(param_size_list): 93 | param = torch.nn.Parameter(torch.rand(psize)) 94 | param_list.append(param) 95 | register_param(param, ParamType.TORCH_BASED, torch.float, f"param_{idx}") 96 | param_payload_ref_list.append(param.data.clone()) 97 | self.client.append_tensor( 98 | [param], torch.float, AccessType.DATA, ChunkType.PARAM_FP32 99 | ) 100 | 101 | real_payload = self.client.access_data(param, torch.device("cpu:0")) 102 | real_payload.copy_(param.data) 103 | self.client.release_data(param) 104 | 105 | self.client.display_chunk_info() 106 | for param, payload_ref in zip(param_list, param_payload_ref_list): 107 | real_payload = self.client.access_data(param, torch.device("cpu:0")) 108 | self.assertEqual(torch.max(real_payload - payload_ref), 0) 109 | self.client.release_data(param) 110 | 111 | 112 | if __name__ == "__main__": 113 | 114 | unittest.main() 115 | -------------------------------------------------------------------------------- /patrickstar/core/memory_cache.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import torch 31 | from patrickstar.core.memtracer.memtracer import RuntimeMemTracer 32 | from patrickstar.utils.helper import getsizeof 33 | 34 | 35 | class MemoryCache(object): 36 | def __init__(self, capacity, memtracer: RuntimeMemTracer): 37 | r""" 38 | A cache of chunk payloads to avoid frequent memory allocation and free. 39 | At most `capacity` tensors are cached per (device, dtype) pair. 40 | If a payload of the requested size was recycled on the target device, reuse it. 41 | Params: 42 | `capacity` : the capacity of each (device, dtype) tensor cache list. 43 | Note: 44 | cached tensors are keyed by (device, dtype).
45 | """ 46 | self._capacity = capacity 47 | self._cached_tensors = {} 48 | self._memtracer = memtracer 49 | 50 | def _new_mem(self, size, data_type, device_type, pin_memory): 51 | space_size = getsizeof(data_type) * size 52 | ret = torch.zeros( 53 | size, 54 | dtype=data_type, 55 | device=device_type, 56 | pin_memory=pin_memory, 57 | ) 58 | self._memtracer.add(device_type.type, space_size, pin_memory) 59 | return ret 60 | 61 | def pop_or_allocate( 62 | self, 63 | device_type: torch.device, 64 | size: int, 65 | data_type: torch.dtype, 66 | pin_memory: bool, 67 | ) -> torch.Tensor: 68 | """ 69 | Return a tensor including `size` `device_type` elements on `device_type`. 70 | Delete the reference to the tenor in MemoryCache. 71 | Return: 72 | torch.Tensor 73 | """ 74 | assert isinstance( 75 | device_type, torch.device 76 | ), "device_type must be type of torch.device" 77 | if (device_type, data_type) not in self._cached_tensors: 78 | return self._new_mem(size, data_type, device_type, pin_memory) 79 | tensors = self._cached_tensors[(device_type, data_type)] 80 | i = -1 81 | for i in range(len(tensors)): 82 | if tensors[i].numel() == size: 83 | break 84 | if i == -1: 85 | return self._new_mem(size, data_type, device_type, pin_memory) 86 | new_tensor_ref = tensors[i] 87 | # delete the reference to tensors[i] in MemoryCache 88 | tensors.pop(i) 89 | return new_tensor_ref 90 | 91 | def push(self, payload): 92 | """ 93 | NOTE() must set payload to None outside of this function. 94 | Recycle a payload tensor. 95 | If the cache is fulled, delete the payload. 96 | Returns: 97 | success pushed or not. 
98 | """ 99 | device_type = payload.device 100 | data_type = payload.dtype 101 | if (device_type, data_type) not in self._cached_tensors and self._capacity > 0: 102 | self._cached_tensors[(device_type, data_type)] = [payload.zero_()] 103 | else: 104 | size = payload.numel() 105 | # the cache is fulled 106 | if len(self._cached_tensors[(device_type, data_type)]) == self._capacity: 107 | is_pinned_flag = payload.is_pinned() 108 | del payload 109 | space_size = getsizeof(data_type) * size 110 | self._memtracer.delete(device_type.type, space_size, is_pinned_flag) 111 | else: 112 | self._cached_tensors[(device_type, data_type)].append(payload.zero_()) 113 | -------------------------------------------------------------------------------- /unitest/test_chunk_data.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import unittest 31 | 32 | import torch 33 | 34 | from common import distributed_test 35 | from patrickstar.core import AccessType, ChunkTensorIndex 36 | from patrickstar.core import register_param, ParamType 37 | 38 | 39 | class TestChunkData(unittest.TestCase): 40 | def setUp(self): 41 | self.default_chunk_size = 40 42 | 43 | @distributed_test(world_size=[1]) 44 | def test_allocate(self): 45 | self.compute_device = ( 46 | torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu") 47 | ) 48 | # Statically construct chunk layout -> chunk_tensor_index 49 | chunk_tensor_index = ChunkTensorIndex(self.default_chunk_size) 50 | 51 | param1 = torch.nn.Parameter(torch.zeros(10)) 52 | register_param(param1, ParamType.CHUNK_BASED, torch.float, "param1") 53 | chunk_tensor_index.add_tensor( 54 | chunk_id=0, 55 | tensor_id=param1.ps_attr.data_id(), 56 | start_offset=0, 57 | numel=param1.numel(), 58 | param=param1, 59 | access_type=AccessType.DATA, 60 | ) 61 | 62 | self.assertTrue( 63 | chunk_tensor_index.tensor_id_to_chunk_id(param1.ps_attr.data_id()) == 0 64 | ) 65 | self.assertTrue(chunk_tensor_index.get_chunk_id(param1, AccessType.DATA) == 0) 66 | 67 | param2 = torch.nn.Parameter(torch.zeros(15)) 68 | register_param(param2, ParamType.CHUNK_BASED, torch.float, "param2") 69 | self.assertTrue( 70 | chunk_tensor_index.get_chunk_id(param2, AccessType.DATA) is None 71 | ) 72 | ret = 
chunk_tensor_index.try_insert_tensor(0, param2, AccessType.DATA) 73 | self.assertTrue(ret) 74 | tensor_info = chunk_tensor_index.get_tensor_info(param2.ps_attr.data_id()) 75 | self.assertTrue(tensor_info.start_offset == 10) 76 | 77 | param3 = torch.nn.Parameter(torch.zeros(5)) 78 | register_param(param3, ParamType.CHUNK_BASED, torch.float, "param3") 79 | ret = chunk_tensor_index.try_insert_tensor(0, param3, AccessType.DATA) 80 | tensor_info = chunk_tensor_index.get_tensor_info(param3.ps_attr.data_id()) 81 | self.assertTrue(tensor_info.start_offset == 25) 82 | 83 | param4 = torch.nn.Parameter(torch.zeros(100)) 84 | register_param(param4, ParamType.CHUNK_BASED, torch.float, "param4") 85 | ret = chunk_tensor_index.try_insert_tensor(0, param4, AccessType.DATA) 86 | self.assertFalse(ret) 87 | # chunk_tensor_index.delete_tensor(11) 88 | 89 | param5 = torch.nn.Parameter(torch.zeros(13)) 90 | register_param(param5, ParamType.CHUNK_BASED, torch.float, "param5") 91 | ret = chunk_tensor_index.try_insert_tensor(1, param5, AccessType.DATA) 92 | tensor_info = chunk_tensor_index.get_tensor_info(param5.ps_attr.data_id()) 93 | self.assertTrue(tensor_info.start_offset == 0) 94 | 95 | ret = chunk_tensor_index.try_insert_tensor(1, param5, AccessType.DATA) 96 | tensor_info = chunk_tensor_index.get_tensor_info(param5.ps_attr.data_id()) 97 | self.assertTrue(tensor_info.start_offset == 0) 98 | 99 | param6 = torch.nn.Parameter(torch.zeros(1000)) 100 | register_param(param6, ParamType.CHUNK_BASED, torch.float, "param6") 101 | ret = chunk_tensor_index.try_insert_tensor(1, param6, AccessType.DATA) 102 | self.assertFalse(ret) 103 | 104 | 105 | if __name__ == "__main__": 106 | 107 | unittest.main() 108 | -------------------------------------------------------------------------------- /patrickstar/utils/global_timer.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. 
All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import time 31 | import torch 32 | 33 | # from .logging import logger 34 | from .singleton_meta import SingletonMeta 35 | 36 | 37 | class GlobalTimer(metaclass=SingletonMeta): 38 | def __init__(self): 39 | """ 40 | Timer for the different functions of the program. 41 | The naming convention should be {TrainingState}_{function}, 42 | e.g. 
ADAM_compute 43 | """ 44 | self.elapse_stat = {} 45 | self.start_time = {} 46 | self.start_flag = False 47 | 48 | def start(self): 49 | self.start_flag = True 50 | 51 | def start_profile(self, key): 52 | if not self.start_flag: 53 | return 54 | if key in self.start_time: 55 | assert self.start_time[key] == 0, f"Please Check {key} profiling function" 56 | self.start_time[key] = time.time() 57 | 58 | def finish_profile(self, key): 59 | if not self.start_flag: 60 | return 61 | torch.cuda.current_stream().synchronize() 62 | if key in self.elapse_stat: 63 | self.elapse_stat[key] += time.time() - self.start_time[key] 64 | else: 65 | self.elapse_stat[key] = time.time() - self.start_time[key] 66 | self.start_time[key] = 0 67 | 68 | def reset(self): 69 | if not self.start_flag: 70 | return 71 | for k, _ in self.elapse_stat.items(): 72 | self.elapse_stat[k] = 0 73 | 74 | def print(self): 75 | if not self.start_flag: 76 | return 77 | print("------------- PROFILE RESULTS ----------------") 78 | dot_length = 20 79 | for k in self.elapse_stat: 80 | dot_length = max(dot_length, len(k) + 2) 81 | overall_elapse = ( 82 | self.elapse_stat["FWD"] + self.elapse_stat["BWD"] + self.elapse_stat["ADAM"] 83 | ) 84 | for k, v in self.elapse_stat.items(): 85 | print( 86 | f'{k} {"." * (dot_length - len(k))} {v}, {v / overall_elapse * 100} %' 87 | ) 88 | print(f'TOTAL {"." 
* (dot_length - len("TOTAL"))} {overall_elapse}') 89 | 90 | 91 | my_timer = GlobalTimer() 92 | 93 | 94 | class DataMoveCnter(metaclass=SingletonMeta): 95 | def __init__(self): 96 | self.amount_dict = {} 97 | self.times_dict = {} 98 | 99 | def update(self, key_name, tensor_size): 100 | my_timer = GlobalTimer() 101 | if not my_timer.start_flag: 102 | return 103 | if key_name in self.times_dict: 104 | self.times_dict[key_name] += 1 105 | self.amount_dict[key_name] += tensor_size 106 | else: 107 | self.times_dict[key_name] = 1 108 | self.amount_dict[key_name] = tensor_size 109 | 110 | def reset(self): 111 | for k, _ in self.times_dict.items(): 112 | self.times_dict[k] = 0 113 | self.amount_dict[k] = 0 114 | 115 | def print(self): 116 | print("------------- DATA MOVE RESULTS --------------") 117 | my_timer = GlobalTimer() 118 | for k, v in self.times_dict.items(): 119 | bwd = 0 120 | if k in my_timer.elapse_stat and self.amount_dict[k] != 0: 121 | bwd = self.amount_dict[k] / my_timer.elapse_stat[k] 122 | print( 123 | f"{k}: {self.amount_dict[k] / 1024 / 1024} MB, {v} times, {bwd / 1024 / 1024} MB/s" 124 | ) 125 | else: 126 | print(f"{k}: {self.amount_dict[k] / 1024 / 1024} MB") 127 | 128 | 129 | data_move_cnter = DataMoveCnter() 130 | -------------------------------------------------------------------------------- /examples/simple_net.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 
10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import torch 31 | 32 | # from checkpoint.torch_checkpoint import checkpoint 33 | from torch.utils.checkpoint import checkpoint 34 | from torch.utils.data import SequentialSampler 35 | from transformers import BertConfig 36 | from transformers.models.bert.modeling_bert import BertEmbeddings 37 | 38 | 39 | class Encoder(torch.nn.Module): 40 | def __init__(self, hidden_dim, is_ckp=False): 41 | super(Encoder, self).__init__() 42 | self.linear1 = torch.nn.Sequential( 43 | torch.nn.Linear(hidden_dim, hidden_dim), 44 | torch.nn.Linear(hidden_dim, hidden_dim), 45 | torch.nn.Linear(hidden_dim, hidden_dim), 46 | ) 47 | 48 | self.linear3 = torch.nn.Linear(hidden_dim, hidden_dim) 49 | self.linear4 = torch.nn.Linear(hidden_dim, hidden_dim) 50 | self.linear5 = torch.nn.Linear(hidden_dim, hidden_dim) 51 | self.is_ckp = is_ckp 52 | 53 | def forward(self, x): 54 | h2 = self.linear1(x) 55 | if self.is_ckp: 56 | h3 = checkpoint(self.linear3, h2) 57 | else: 58 | h3 = self.linear3(h2) 59 | h4 = self.linear4(h3) 60 | h5 = self.linear5(h4) 61 | return h5 62 | 63 | 64 | def get_data_loader( 65 | batch_size, 66 | total_samples, 67 | hidden_dim, 68 | device, 69 | data_type=torch.float, 70 | is_distrbuted=False, 71 | ): 72 | train_data = torch.randn(total_samples, hidden_dim, device=device, dtype=data_type) 73 | train_label = torch.empty(total_samples, dtype=torch.long, device=device).random_( 74 | hidden_dim 75 | ) 76 | train_dataset = torch.utils.data.TensorDataset(train_data, train_label) 77 | if is_distrbuted: 78 | sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 79 | else: 80 | sampler = SequentialSampler(train_dataset) 81 | train_loader = torch.utils.data.DataLoader( 82 | train_dataset, batch_size=batch_size, sampler=sampler 83 | ) 84 | return train_loader 85 | 86 | 87 | def get_bert_data_loader( 88 | batch_size, total_samples, sequence_length, device, is_distrbuted=False 89 | ): 90 | train_data = torch.randint( 91 | low=0, 92 | high=10, 93 | 
size=(total_samples, sequence_length), 94 | device=device, 95 | dtype=torch.long, 96 | ) 97 | train_label = torch.zeros(total_samples, dtype=torch.long, device=device) 98 | train_dataset = torch.utils.data.TensorDataset(train_data, train_label) 99 | if is_distrbuted: 100 | sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 101 | else: 102 | sampler = SequentialSampler(train_dataset) 103 | train_loader = torch.utils.data.DataLoader( 104 | train_dataset, batch_size=batch_size, sampler=sampler 105 | ) 106 | return train_loader 107 | 108 | 109 | class SimpleModel(torch.nn.Module): 110 | def __init__(self, hidden_dim, seq_len, is_ckp=False, is_share_param=False): 111 | super(SimpleModel, self).__init__() 112 | config = BertConfig() 113 | config.vocab_size = 25 114 | config.max_position_embeddings = seq_len 115 | config.hidden_size = hidden_dim 116 | self.embeddings_1 = BertEmbeddings(config) 117 | 118 | self._is_share_param = is_share_param 119 | if is_share_param: 120 | self.embeddings_2 = self.embeddings_1 121 | else: 122 | self.embeddings_2 = BertEmbeddings(config) 123 | self.encoder = Encoder(hidden_dim, is_ckp) 124 | self.cross_entropy_loss = torch.nn.CrossEntropyLoss() 125 | 126 | def forward(self, x, y): 127 | h1 = self.embeddings_1(x) 128 | h2 = self.embeddings_2(x) 129 | h3 = h1 + h2 130 | h3 = self.encoder(h3) 131 | return self.cross_entropy_loss(h3[:, 0], y) 132 | -------------------------------------------------------------------------------- /unitest/common.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import os 31 | import time 32 | 33 | import torch 34 | from torch.multiprocessing import Process 35 | 36 | # Worker timeout *after* the first worker has completed. 37 | UNIT_WORKER_TIMEOUT = 120 38 | 39 | 40 | def distributed_test(world_size=2, backend="nccl", use_fake_dist=False): 41 | r"""A decorator for executing a function (e.g., a unit test) in a distributed manner. 
42 | 43 | This decorator manages the spawning and joining of processes, initialization of 44 | torch.distributed, and catching of errors. 45 | 46 | Usage example: 47 | @distributed_test(world_size=[2, 3]) 48 | def my_test(): 49 | rank = dist.get_rank() 50 | world_size = dist.get_world_size() 51 | assert(rank < world_size) 52 | 53 | Args: 54 | world_size (int or list): number of ranks to spawn. Can be a list to spawn 55 | multiple tests. 56 | """ 57 | 58 | def dist_wrap(run_func): 59 | """Second-level decorator for dist_test. This actually wraps the function.""" 60 | 61 | def dist_init(local_rank, num_procs, *func_args, **func_kwargs): 62 | """Initialize torch.distributed and execute the user function.""" 63 | os.environ["MASTER_ADDR"] = "127.0.0.1" 64 | os.environ["MASTER_PORT"] = "29503" 65 | os.environ["LOCAL_RANK"] = str(local_rank) 66 | # NOTE: unit tests don't support multi-node so local_rank == global rank 67 | os.environ["RANK"] = str(local_rank) 68 | os.environ["WORLD_SIZE"] = str(num_procs) 69 | 70 | torch.distributed.init_process_group(backend=backend) 71 | if torch.cuda.is_available(): 72 | if use_fake_dist: 73 | torch.cuda.set_device(0) 74 | else: 75 | torch.cuda.set_device(local_rank) 76 | run_func(*func_args, **func_kwargs) 77 | 78 | def dist_launcher(num_procs, *func_args, **func_kwargs): 79 | r"""Launch processes and gracefully handle failures.""" 80 | 81 | # Spawn all workers on subprocesses. 82 | processes = [] 83 | for local_rank in range(num_procs): 84 | p = Process( 85 | target=dist_init, 86 | args=(local_rank, num_procs, *func_args), 87 | kwargs=func_kwargs, 88 | ) 89 | p.start() 90 | processes.append(p) 91 | 92 | # Now loop and wait for a test to complete. The spin-wait here isn't a big 93 | # deal because the number of processes will be O(#GPUs) << O(#CPUs). 
94 | any_done = False 95 | while not any_done: 96 | for p in processes: 97 | if not p.is_alive(): 98 | any_done = True 99 | break 100 | 101 | # Wait for all other processes to complete 102 | for p in processes: 103 | p.join(UNIT_WORKER_TIMEOUT) 104 | 105 | failed = [(rank, p) for rank, p in enumerate(processes) if p.exitcode != 0] 106 | for _, p in failed: 107 | # If it still hasn't terminated, kill it because it hung. 108 | if p.exitcode is None: 109 | p.terminate() 110 | if p.exitcode != 0: 111 | p.terminate() 112 | 113 | def run_func_decorator(*func_args, **func_kwargs): 114 | r"""Entry point for @distributed_test().""" 115 | 116 | if isinstance(world_size, int): 117 | dist_launcher(world_size, *func_args, **func_kwargs) 118 | elif isinstance(world_size, list): 119 | for procs in world_size: 120 | dist_launcher(procs, *func_args, **func_kwargs) 121 | time.sleep(0.5) 122 | else: 123 | raise TypeError("world_size must be an integer or a list of integers.") 124 | 125 | return run_func_decorator 126 | 127 | return dist_wrap 128 | -------------------------------------------------------------------------------- /examples/run_transformers.sh: -------------------------------------------------------------------------------- 1 | cd $(dirname $0) 2 | 3 | export GPU_NUM=${GPU_NUM:-1} 4 | # Chunk size (in millions of elements) 5 | export CS=${CS:-256} 6 | # Batch Size 7 | export BS=${BS:-16} 8 | # Embedding on CPU 9 | export CPU_EBD=${CPU_EBD:-0} 10 | # Release remote chunks after init 11 | export RELEASE_AFTER_INIT=${RELEASE_AFTER_INIT:-0} 12 | export MODEL_NAME=${MODEL_NAME:-"GPT2small"} 13 | # BERT or GPT 14 | export MODEL_TYPE=${MODEL_TYPE:-"GPT"} 15 | # distributed plan: patrickstar or torch 16 | export DIST_PLAN=${DIST_PLAN:-"patrickstar"} 17 | # check results of patrickstar and torch, which disables the 18 | # DIST_PLAN setting 19 | export RES_CHECK=${RES_CHECK:-0} 20 | # offload activation checkpoints to CPU 21 | export ACT_OFFLOAD=${ACT_OFFLOAD:-0} 22 | # activation rematerialization, a.k.a. 
gradient checkpointing 23 | export CKP=${CKP:-1} 24 | # no retry after failure, used for torch 1.9.0 25 | export NO_RETRY=${NO_RETRY:-0} 26 | export SKIP_LOG_EXSIT=${SKIP_LOG_EXSIT:-0} 27 | # static partition. 28 | export SP=${SP:-0} 29 | export MEM_PROF=${MEM_PROF:-0} 30 | # async memory monitor for mem sampler 31 | export AMM=${AMM:-1} 32 | # mem saving comm 33 | export MSC=${MSC:-1} 34 | # mem caching comm 35 | export CACHE=${CACHE:-1} 36 | # async move 37 | export ASYNC_MOVE=${ASYNC_MOVE:-0} 38 | # linear tiling comm 39 | export TILING=${TILING:-0} 40 | # hybrid adam 41 | export HYB=${HYB:-1} 42 | 43 | export LOCAL_WORLD_SIZE=${LOCAL_WORLD_SIZE:-1} 44 | export CS_SEARCH=${CS_SEARCH:-0} 45 | 46 | export NNODES=${NNODES:-1} 47 | export NODE_RANK=${NODE_RANK:-0} 48 | export MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"} 49 | export MASTER_PORT=${MASTER_PORT:-"12345"} 50 | export SUFFIX=${SUFFIX:-""} 51 | 52 | if [[ ${TILING} == 1 ]]; then 53 | TILING_FLAG="--with_tiling_linear" 54 | else 55 | export TILING_FLAG="" 56 | fi 57 | 58 | 59 | if [[ ${CACHE} == 1 ]]; then 60 | CACHE_FLAG="--with_mem_cache" 61 | else 62 | export CACHE_FLAG="" 63 | fi 64 | 65 | if [[ ${ASYNC_MOVE} == 1 ]]; then 66 | ASYNC_MOVE_FLAG="--with_async_move" 67 | else 68 | export ASYNC_MOVE_FLAG="" 69 | fi 70 | 71 | if [[ ${MSC} == 1 ]]; then 72 | MSC_FLAG="--with_mem_saving_com" 73 | else 74 | export MSC_FLAG="" 75 | fi 76 | 77 | if [[ ${AMM} == 1 ]]; then 78 | AMM_FLAG="--with_async_mem_monitor" 79 | else 80 | export AMM_FLAG="" 81 | fi 82 | 83 | 84 | if [[ ${MEM_PROF} == 1 ]]; then 85 | MEM_PROF_FLAG="--with_mem_profiler" 86 | else 87 | export MEM_PROF_FLAG="" 88 | fi 89 | 90 | 91 | if [[ ${ACT_OFFLOAD} == 1 ]]; then 92 | ACT_OFFLOAD_FLAG="--with_activation_offload" 93 | else 94 | export ACT_OFFLOAD_FLAG="" 95 | fi 96 | 97 | if [[ ${RES_CHECK} == 1 ]]; then 98 | RES_CHECK_FLAG="--res_check" 99 | else 100 | export RES_CHECK_FLAG="" 101 | fi 102 | 103 | 104 | if [[ ${CPU_EBD} == 1 ]]; then 105 | export 
CPU_EBD_FLAG="--use_cpu_embedding" 106 | else 107 | export CPU_EBD_FLAG="" 108 | fi 109 | 110 | if [[ ${RELEASE_AFTER_INIT} == 1 ]]; then 111 | export RELEASE_AFTER_INIT_FLAG="--release_after_init" 112 | else 113 | export RELEASE_AFTER_INIT_FLAG="" 114 | fi 115 | 116 | if [[ ${CKP} == 1 ]]; then 117 | export CKP_FLAG="--use_ckp" 118 | else 119 | export CKP_FLAG="" 120 | fi 121 | 122 | let CHUNK_SIZE=${CS}*1024*1024 123 | 124 | if [[ ${HYB} == 1 ]]; then 125 | export HYBRID_ADAM_FLAG="--use_hybrid_adam" 126 | else 127 | export HYBRID_ADAM_FLAG="" 128 | fi 129 | 130 | 131 | 132 | LOG_DIR="./logs_${MODEL_NAME}" 133 | mkdir -p ${LOG_DIR} 134 | 135 | GIT_VER=`git rev-parse --short=5 HEAD` 136 | LOG_FILE="log.${MODEL_NAME}_type_${MODEL_TYPE}_gpu_${GPU_NUM}_cs_${CS}_bs_${BS}_cpueb_${CPU_EBD}_hyb_${HYB}_offload_${ACT_OFFLOAD}_SP_${SP}_AMM_${AMM}_MSC_${MSC}_CACHE_${CACHE}_TILING_${TILING}_${GIT_VER}_node_${NNODES}_${SUFFIX}" 137 | 138 | is_run_flag=`python ./benchmark/is_run_this_file.py --path "${LOG_DIR}" --file "${LOG_FILE}"` 139 | echo is_run_flag $is_run_flag 140 | if [[ ${is_run_flag} == "0" && ${SKIP_LOG_EXSIT} == 1 ]]; 141 | then 142 | echo "it has been logged" 143 | exit 144 | fi 145 | echo "running ${LOG_DIR} ${LOG_FILE}" 146 | 147 | if [[ ${NO_RETRY} == "1" ]]; 148 | then 149 | NO_RETRY_FLAG="--max_restarts=0" 150 | fi 151 | 152 | 153 | if [[ ${SP} == 1 ]]; 154 | then 155 | SP_FLAG="--with_static_partition" 156 | fi 157 | 158 | 159 | wc=`cat /proc/cpuinfo | grep "processor"| wc -l` 160 | let TNUM=wc/${GPU_NUM} 161 | echo "CPU core number " $wc "THREAD NUM " ${TNUM} 162 | 163 | cmd_opts=" 164 | --use_fp16 \ 165 | ${RES_CHECK_FLAG} \ 166 | ${NO_RETRY_FLAG} \ 167 | ${CKP_FLAG} \ 168 | --dist_plan=${DIST_PLAN} \ 169 | --batch_size=${BS} \ 170 | --model_name=${MODEL_NAME} \ 171 | --model_type=${MODEL_TYPE} \ 172 | --batch_size=${BS} \ 173 | ${CPU_EBD_FLAG} \ 174 | ${HYBRID_ADAM_FLAG} \ 175 | ${RELEASE_AFTER_INIT_FLAG} \ 176 | ${LIGHTSEQ_FLAG} \ 177 | 
${ACT_OFFLOAD_FLAG} \ 178 | ${SP_FLAG} \ 179 | ${MEM_PROF_FLAG} \ 180 | ${AMM_FLAG} \ 181 | ${MSC_FLAG} \ 182 | ${CACHE_FLAG} \ 183 | ${ASYNC_MOVE_FLAG} \ 184 | ${TILING_FLAG} \ 185 | " 186 | 187 | if [[ ${CS_SEARCH} == 1 ]]; then 188 | mkdir -p ./search_res 189 | SLOG_FILE="./search_res/slog_file.${MODEL_NAME}_bs_${BS}_cpueb_${CPU_EBD}_offload_${ACT_OFFLOAD}_SP_${SP}_AMM_${AMM}_MSC_${MSC}_CACHE_${CACHE}_TILING_${TILING}_${GIT_VER}" 190 | rm -rf ${SLOG_FILE} 191 | 192 | for((i=312;i>=64;i-=32)); 193 | do 194 | let CUR_CHUNK_SIZE=${i}*1024*1024 195 | echo "searching CHUNK_SIZE ${i} M elem" 196 | 197 | python -m torch.distributed.launch --nproc_per_node=1 \ 198 | eval_chunk_size.py \ 199 | --default_chunk_size=${CUR_CHUNK_SIZE} \ 200 | --slog_file=${SLOG_FILE} \ 201 | ${cmd_opts} 202 | done 203 | else 204 | env OMP_NUM_THREADS=${TNUM} timeout -s SIGKILL 30m python -m torch.distributed.launch --nproc_per_node=${GPU_NUM} \ 205 | --nnodes=${NNODES} --node_rank=${NODE_RANK} --master_addr=${MASTER_ADDR} --master_port=${MASTER_PORT} \ 206 | pretrain_demo.py \ 207 | --default_chunk_size=${CHUNK_SIZE} \ 208 | ${cmd_opts} \ 209 | 2>&1 | tee ${LOG_DIR}/${LOG_FILE} 210 | fi 211 | -------------------------------------------------------------------------------- /doc/optimization_options.md: -------------------------------------------------------------------------------- 1 | This page explains the optimization options for benchmarking. 2 | Optimizations are divided into PatrickStar-related ones and general ones. 3 | General optimizations can be applied to any PyTorch-based framework. 4 | 5 | ## General Optimizations 6 | 1. Activation Checkpointing (a.k.a. gradient checkpointing in [PyTorch](https://pytorch.org/docs/stable/checkpoint.html)) 7 | `--use_ckp` 8 | Make sure this option is enabled for large model training. It saves most of the activation memory footprint at the cost of recomputation. 9 | 10 | 2. 
Activation Offloading 11 | `--with_activation_offload` 12 | Offload the checkpointed activations from GPU to CPU, further saving GPU memory. 13 | Note that you have to enable activation checkpointing first. 14 | 15 | 3. CPU Embedding 16 | `--use_cpu_embedding` 17 | nn.Embedding is executed on the CPU, saving GPU memory. More importantly, it shrinks the chunk size: for some small models, the largest layer is the embedding, which would otherwise force the chunk size to be larger than the embedding numel. 18 | 19 | 20 | 4. Tiling Linear (a.k.a. Memory-centric tiling in [DeepSpeed](https://deepspeed.readthedocs.io/en/stable/zero3.html#memory-centric-tiling)) 21 | `--with_tiling_linear` 22 | Memory-centric tiling (MCT) splits the parameter tensor of a linear layer into pieces that do not need to be stored in contiguous memory. This helps reduce the chunk size. However, to achieve the best performance, you have to tune the in_splits/out_splits parameters of the function. 23 | 24 | ## PatrickStar-related Optimizations 25 | 26 | 1. Memory Saving Communication. 27 | `--with_mem_saving_com` 28 | Use one-to-all communication to replace the original collective communication. More specifically, reduce-scatter is replaced with N reduce operations, and all-gather is replaced with N broadcast operations. In this way, we do not need to keep an Nx-sized chunk buffer for distributed training, thereby saving GPU memory. This method also changes the CPU-GPU and GPU-GPU communication volume. In general, it reduces the CPU-GPU communication volume at the cost of increasing the GPU-GPU broadcast volume and lowering the broadcast bandwidth. However, in some cases this tradeoff can improve the overall performance of the system. It is suitable for training an extremely large model on a cluster with high-quality GPU-GPU communication bandwidth, e.g., a 50B model on a SuperPod node. Details in Merge Request #250. 29 | 30 | 2. Memory Allocation Caching. 31 | `--with_mem_cache` 32 | Use a cache to allocate and release chunk memory. 
The cache is a size-limited queue whose capacity defaults to 2. It is helpful for Memory Saving Communication in distributed training, as it avoids frequently releasing and allocating memory for remote chunks. See details in #241. 33 | 34 | 35 | 3. Hybrid ADAM: 36 | `--use_hybrid_adam` 37 | Place Optimizer States (OS) on both CPU and GPU. Part of the ADAM computation is conducted on the CPU and the rest on the GPU. In contrast, ZeRO-Offload does ADAM on the CPU only. This technique can accelerate ADAM computation for relatively small models. 38 | 39 | 4. Activation Offload. 40 | `--with_activation_offload` 41 | Offload activations to CPU. Must be used in combination with activation checkpointing (a.k.a. gradient checkpointing in PyTorch). 42 | 43 | 5. Async Memory Monitoring with the Runtime Memory Tracer. 44 | `--with_async_mem_monitor` 45 | Asynchronously sample memory usage with an independent thread. This yields more accurate runtime 46 | memory usage statistics. If you turn off this flag, memory usage sampling is triggered at the exact moment before or after each operator (submodule in PyTorch) computes. 47 | 48 | 49 | 6. Static Partition. 50 | `--with_static_partition` 51 | PatrickStar is known for dynamically partitioning model data. With the help of this flag, you can statically partition model data between CPU and GPU. The max GPU memory used by chunks is `warmup_gpu_chunk_mem_ratio` * gpu_size. It is still better at avoiding OOM than ZeRO-Offload, which always puts all params and grads on the GPU. It leads to lower computing efficiency than the default dynamic partitioning, but it is helpful for aggressively avoiding OOM. 52 | 53 | 7. Release Remote Chunk After Initialization. 54 | `release_after_init` 55 | This option is irrelevant to computing efficiency and is used for distributed training. It allocates memory for remote chunks but releases it immediately. In this way, we can make sure the model parameters are randomly initialized exactly as in a serial version, solving the random seed consistency problem. 
It is used in combination with the `--res_check` option to check the correctness of distributed training. 56 | 57 | 8. Adjusting the CPU and GPU memory quota of the memory tracer. 58 | We provide ways to adjust the CPU and GPU memory usage quota for the memory tracer. These settings are not exposed as command-line parameters. As shown in pretrain_demo.py, there is a JSON config for the memory tracer; you can adjust the four values with a ratio suffix. 59 | 60 | `warmup_gpu_chunk_mem_ratio`: the max fraction of a GPU's memory that can be used for chunks during the warmup iteration. 61 | 62 | `overall_gpu_mem_ratio`: the available gpu mem size / real gpu mem capacity. Turn up the value if you meet CPU or GPU OOM during iterations. 63 | 64 | `overall_cpu_mem_ratio`: the available cpu mem size / real cpu mem capacity. Turn up the value if you meet CPU or GPU OOM during iterations. 65 | 66 | `margin_use_ratio`: the fraction of the remaining GPU space (excluding the peak chunk-used space after the warmup FWD+BWD) used to host optimizer states on the GPU. 67 | 68 | `use_fake_dist`: a debug flag to simulate multiple GPUs on one GPU. It was useful when we lacked multi-GPU machines; it is now deprecated. 69 | 70 | ``` 71 | "mem_tracer": { 72 | "use_async_mem_monitor": args.with_async_mem_monitor, 73 | "warmup_gpu_chunk_mem_ratio": 0.1, 74 | "overall_gpu_mem_ratio": 0.8, 75 | "overall_cpu_mem_ratio": 0.8, 76 | "margin_use_ratio": 0.8, 77 | "use_fake_dist": False, 78 | "with_static_partition": args.with_static_partition, 79 | }, 80 | ``` 81 | --------------------------------------------------------------------------------
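To make the ratio knobs concrete, the arithmetic sketch below shows how they translate into byte budgets. It is illustrative only: the function name, the explicit capacity arguments, and the `peak_chunk_mem` input are assumptions based on the descriptions above, not PatrickStar's actual implementation, which also relies on runtime memory sampling.

```python
def tracer_budgets(mem_tracer_cfg, gpu_capacity, cpu_capacity, peak_chunk_mem):
    """Turn the mem_tracer ratio knobs into byte budgets (simplified sketch).

    All capacity arguments are in bytes; peak_chunk_mem is the peak
    chunk-used GPU space observed after the warmup FWD+BWD.
    """
    # Memory the tracer may use overall on each device.
    usable_gpu = gpu_capacity * mem_tracer_cfg["overall_gpu_mem_ratio"]
    usable_cpu = cpu_capacity * mem_tracer_cfg["overall_cpu_mem_ratio"]
    # Upper bound on GPU memory occupied by chunks during the warmup iteration.
    warmup_chunk_budget = gpu_capacity * mem_tracer_cfg["warmup_gpu_chunk_mem_ratio"]
    # Share of the remaining GPU space (after peak chunk usage) that may
    # host optimizer states.
    margin_for_os = (usable_gpu - peak_chunk_mem) * mem_tracer_cfg["margin_use_ratio"]
    return {
        "usable_gpu": usable_gpu,
        "usable_cpu": usable_cpu,
        "warmup_chunk_budget": warmup_chunk_budget,
        "margin_for_os": margin_for_os,
    }


cfg = {
    "warmup_gpu_chunk_mem_ratio": 0.1,
    "overall_gpu_mem_ratio": 0.8,
    "overall_cpu_mem_ratio": 0.8,
    "margin_use_ratio": 0.8,
}
# A hypothetical 40 GB GPU / 512 GB host with 16 GB of peak chunk memory.
budgets = tracer_budgets(cfg, 40 * 2**30, 512 * 2**30, 16 * 2**30)
```

Under these example numbers, chunks may use at most 4 GB of GPU memory during warmup, and 12.8 GB of the 32 GB usable GPU memory remains available for optimizer states.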