├── .gitignore
├── logo.png
├── requirements.txt
├── doc
│   ├── m_node_superpod.png
│   ├── mgpu_scalability.png
│   ├── one_node_perf_a100.png
│   ├── clue-gpt2-loss-n-acc.png
│   ├── profiler
│   │   ├── GPT3_8B_memory.png
│   │   └── GPT3_8B_4xV100_access.png
│   ├── yard_network_fabric.md
│   └── optimization_options.md
├── .code.yml
├── MANIFEST.in
├── .flake8
├── examples
│   ├── benchmark
│   │   ├── run_benchmark.sh
│   │   ├── run_a100_benchmark_example.sh
│   │   ├── run_a100_benchmark_large_model.sh
│   │   ├── run_a100_benchmark_small_model.sh
│   │   ├── is_run_this_file.py
│   │   ├── generate_res_table.py
│   │   └── process_logs.py
│   ├── optimizations
│   │   ├── __init__.py
│   │   ├── global_opt_flags.py
│   │   ├── test_tiling.py
│   │   └── ls_hf_transformer_encoder_layer.py
│   ├── README.md
│   ├── data_loader.py
│   ├── ps_config.py
│   ├── imdb_dataset.py
│   ├── moe
│   │   ├── moe_bert.py
│   │   └── huggingface_bert_moe.py
│   ├── train_simple_net.py
│   ├── huggingface_bert.py
│   ├── simple_net.py
│   └── run_transformers.sh
├── CHANGE_LOG.md
├── .pre-commit-config.yaml
├── LICENSE
├── __init__.py
├── patrickstar
│   ├── profiler
│   │   ├── __init__.py
│   │   └── profiler.py
│   ├── ops
│   │   ├── op_builder
│   │   │   ├── __init__.py
│   │   │   └── cpu_adam.py
│   │   ├── __init__.py
│   │   ├── csrc
│   │   │   └── includes
│   │   │       └── context.h
│   │   └── embedding.py
│   ├── fp16
│   │   └── __init__.py
│   ├── core
│   │   ├── memtracer
│   │   │   ├── __init__.py
│   │   │   ├── training_stage_mgr.py
│   │   │   └── metronome.py
│   │   ├── __init__.py
│   │   ├── comm.py
│   │   ├── const.py
│   │   ├── tensor_stub.py
│   │   ├── torch_profiler_hook.py
│   │   └── memory_cache.py
│   ├── manager
│   │   ├── __init__.py
│   │   ├── cuda_context.py
│   │   └── runtime_config.py
│   ├── __init__.py
│   ├── utils
│   │   ├── __init__.py
│   │   ├── singleton_meta.py
│   │   ├── helper.py
│   │   ├── model_size_calculator.py
│   │   ├── distributed.py
│   │   ├── memory.py
│   │   ├── logging.py
│   │   ├── memory_monitor.py
│   │   └── global_timer.py
│   └── runtime
│       └── __init__.py
├── tools
│   └── merge_checkpoint.py
├── unitest
│   ├── test_torch_scope.py
│   ├── test_optimizer_init.py
│   ├── test_embedding_ops.py
│   ├── test_utils.py
│   ├── test_memory_cache.py
│   ├── test_eviction_policy.py
│   ├── test_model_init.py
│   ├── test_chunk_list.py
│   ├── test_client.py
│   ├── test_chunk_data.py
│   └── common.py
└── setup.py
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *__pycache__*
2 | *DS_Store*
--------------------------------------------------------------------------------
/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/logo.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | torch
2 | pytest
3 | psutil
4 | ninja
5 | rich
6 | transformers
7 | scipy
--------------------------------------------------------------------------------
/doc/m_node_superpod.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/doc/m_node_superpod.png
--------------------------------------------------------------------------------
/doc/mgpu_scalability.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/doc/mgpu_scalability.png
--------------------------------------------------------------------------------
/doc/one_node_perf_a100.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/doc/one_node_perf_a100.png
--------------------------------------------------------------------------------
/doc/clue-gpt2-loss-n-acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/doc/clue-gpt2-loss-n-acc.png
--------------------------------------------------------------------------------
/doc/profiler/GPT3_8B_memory.png:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/doc/profiler/GPT3_8B_memory.png -------------------------------------------------------------------------------- /doc/profiler/GPT3_8B_4xV100_access.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Tencent/PatrickStar/HEAD/doc/profiler/GPT3_8B_4xV100_access.png -------------------------------------------------------------------------------- /.code.yml: -------------------------------------------------------------------------------- 1 | source: 2 | third_party_source: 3 | filepath_regex: [".*/patrickstar/ops/csrc/*/.*", 4 | ".*/patrickstar/ops/op_builder/.*"] 5 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include *.txt README.md 2 | recursive-include requirements *.txt 3 | recursive-include patrickstar *.cpp *.h *.cu *.tr *.cuh *.cc 4 | recursive-include csrc *.cpp *.h *.cu *.tr *.cuh *.cc 5 | -------------------------------------------------------------------------------- /.flake8: -------------------------------------------------------------------------------- 1 | [flake8] 2 | ignore = 3 | ;W503 line break before binary operator 4 | W503, 5 | ;E203 whitespace before ':' 6 | E203, 7 | 8 | ; exclude file 9 | exclude = 10 | .tox, 11 | .git, 12 | __pycache__, 13 | build, 14 | dist, 15 | *.pyc, 16 | *.egg-info, 17 | .cache, 18 | .eggs 19 | 20 | max-line-length = 120 21 | 22 | per-file-ignores = __init__.py:F401 23 | -------------------------------------------------------------------------------- /examples/benchmark/run_benchmark.sh: -------------------------------------------------------------------------------- 1 | mkdir -p ./logs 2 | 3 | export MODEL_NAME="" 4 | export BS=32 5 | export CS=64 6 | export CPU_EBD=1 7 | export 
SP=0 8 | export ACT_OFFLOAD=0 9 | export NO_RETRY=1 10 | export SKIP_LOG_EXSIT=1 11 | 12 | for MODEL_NAME in "GPT2small" 13 | do 14 | for BS in 32 15 | do 16 | for CS in 64 17 | do 18 | for CPU_EBD in 1 19 | do 20 | for SP in 0 21 | do 22 | for ACT_OFFLOAD in 0 1 23 | do 24 | echo "****************** Begin ***************************" 25 | echo "* benchmarking CS ${CS} BS ${BS} MODEL ${MODEL_NAME} " 26 | echo "* CPU_EBD ${CPU_EBD} SP ${SP} ACT_OFFLOAD ${ACT_OFFLOAD}" 27 | bash ../run_transformers.sh 28 | echo "****************** Finished ***************************" 29 | echo "" 30 | echo "" 31 | done 32 | done 33 | done 34 | done 35 | done 36 | done 37 | -------------------------------------------------------------------------------- /doc/yard_network_fabric.md: -------------------------------------------------------------------------------- 1 | ## Network Topology of a node of WeChat Yard 2 | ```nvidia-smi topo -m``` 3 | 4 | GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 5 | 6 | GPU0 X NV1 NV2 NV1 SYS SYS SYS NV2 7 | 8 | GPU1 NV1 X NV1 NV2 SYS SYS NV2 SYS 9 | 10 | GPU2 NV2 NV1 X NV2 SYS NV1 SYS SYS 11 | 12 | GPU3 NV1 NV2 NV2 X NV1 SYS SYS SYS 13 | 14 | GPU4 SYS SYS SYS NV1 X NV2 NV2 NV1 15 | 16 | GPU5 SYS SYS NV1 SYS NV2 X NV1 NV2 17 | 18 | GPU6 SYS NV2 SYS SYS NV2 NV1 X NV1 19 | 20 | GPU7 NV2 SYS SYS SYS NV1 NV2 NV1 X 21 | 22 | ```nvidia-smi nvlink --status -i 0``` 23 | 24 | GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-4b6ebbfe-8eac-8fed-1939-b4c545eafa7f) 25 | 26 | Link 0: 25.781 GB/s 27 | 28 | Link 1: 25.781 GB/s 29 | 30 | Link 2: 25.781 GB/s 31 | 32 | Link 3: 25.781 GB/s 33 | 34 | Link 4: 25.781 GB/s 35 | 36 | Link 5: 25.781 GB/s 37 | -------------------------------------------------------------------------------- /examples/benchmark/run_a100_benchmark_example.sh: -------------------------------------------------------------------------------- 1 | export MODEL_NAME="" 2 | export BS=12 3 | export CS=384 4 | export CPU_EBD=0 5 | export SP=0 6 | export ACT_OFFLOAD=0 7 | 
export NO_RETRY=0 8 | export SKIP_LOG_EXSIT=0 9 | export MSC=1 10 | export CACHE=1 11 | export GPU_NUM=8 12 | export MODEL_TYPE="BERT" 13 | 14 | 15 | for GPU_NUM in 8 16 | do 17 | for MODEL_NAME in "GPT_DS_40B" 18 | do 19 | for BS in 4 20 | do 21 | for CS in 288 22 | do 23 | for CPU_EBD in 0 24 | do 25 | for SP in 0 26 | do 27 | for ACT_OFFLOAD in 0 28 | do 29 | for MSC in 1 30 | do 31 | for CACHE in 1 32 | do 33 | echo "****************** Begin ***************************" 34 | echo "* benchmarking CS ${CS} BS ${BS} MODEL ${MODEL_NAME} " 35 | echo "* CPU_EBD ${CPU_EBD} SP ${SP} ACT_OFFLOAD ${ACT_OFFLOAD} MSC ${MSC} CACHE ${CACHE}" 36 | bash ../run_transformers.sh 37 | echo "****************** Finished ***************************" 38 | echo "" 39 | echo "" 40 | done 41 | done 42 | done 43 | done 44 | done 45 | done 46 | done 47 | done 48 | done 49 | -------------------------------------------------------------------------------- /examples/benchmark/run_a100_benchmark_large_model.sh: -------------------------------------------------------------------------------- 1 | export MODEL_NAME="" 2 | export BS=32 3 | export CS=64 4 | export CPU_EBD=1 5 | export SP=0 6 | export ACT_OFFLOAD=0 7 | export NO_RETRY=0 8 | export SKIP_LOG_EXSIT=1 9 | export MSC=1 10 | export CACHE=1 11 | export GPU_NUM=1 12 | 13 | 14 | for GPU_NUM in 1 2 4 8 15 | do 16 | for MODEL_NAME in "GPT_DS_20B" "GPT_DS_40B" 17 | do 18 | for BS in 8 4 16 19 | do 20 | for CS in 256 384 21 | do 22 | for CPU_EBD in 0 23 | do 24 | for SP in 0 25 | do 26 | for ACT_OFFLOAD in 0 27 | do 28 | for MSC in 1 29 | do 30 | for CACHE in 0 1 31 | do 32 | echo "****************** Begin ***************************" 33 | echo "* benchmarking CS ${CS} BS ${BS} MODEL ${MODEL_NAME} " 34 | echo "* CPU_EBD ${CPU_EBD} SP ${SP} ACT_OFFLOAD ${ACT_OFFLOAD} MSC ${MSC} CACHE ${CACHE}" 35 | bash ../run_transformers.sh 36 | echo "****************** Finished ***************************" 37 | echo "" 38 | echo "" 39 | done 40 | done 41 | 
done 42 | done 43 | done 44 | done 45 | done 46 | done 47 | done 48 | -------------------------------------------------------------------------------- /examples/benchmark/run_a100_benchmark_small_model.sh: -------------------------------------------------------------------------------- 1 | export MODEL_NAME="" 2 | export BS=32 3 | export CS=64 4 | export CPU_EBD=1 5 | export SP=0 6 | export ACT_OFFLOAD=0 7 | export NO_RETRY=0 8 | export SKIP_LOG_EXSIT=1 9 | export MSC=1 10 | export CACHE=1 11 | export GPU_NUM=1 12 | 13 | 14 | for GPU_NUM in 1 2 4 8 15 | do 16 | for MODEL_NAME in "GPT_DS_20B" "GPT_DS_40B" 17 | do 18 | for BS in 8 4 16 19 | do 20 | for CS in 256 384 21 | do 22 | for CPU_EBD in 0 23 | do 24 | for SP in 0 25 | do 26 | for ACT_OFFLOAD in 0 27 | do 28 | for MSC in 0 1 29 | do 30 | for CACHE in 0 1 31 | do 32 | echo "****************** Begin ***************************" 33 | echo "* benchmarking CS ${CS} BS ${BS} MODEL ${MODEL_NAME} " 34 | echo "* CPU_EBD ${CPU_EBD} SP ${SP} ACT_OFFLOAD ${ACT_OFFLOAD} MSC ${MSC} CACHE ${CACHE}" 35 | bash ../run_transformers.sh 36 | echo "****************** Finished ***************************" 37 | echo "" 38 | echo "" 39 | done 40 | done 41 | done 42 | done 43 | done 44 | done 45 | done 46 | done 47 | done 48 | -------------------------------------------------------------------------------- /CHANGE_LOG.md: -------------------------------------------------------------------------------- 1 | ### v0.4.5 Dec. 2021 2 | Refactor the files in examples and add chunk size searching. 3 | Evaluate on 8 nodes of SuperPod. Fix bugs in multi-GPU mem tracer. 4 | 5 | 6 | ### v0.4.4 Dec. 2021 7 | The system is successfully evaluated on a multi-node system. 8 | The benchmark scripts are integrated with memory-centric tiling borrowed from DeepSpeed. 9 | It trains an 18B model on WeChat Yard. 10 | 11 | 12 | ### v0.4.3 Nov. 2021 13 | The system is evaluated on A100 SuperPod.
14 | Some optimizations are developed to further improve model scale and efficiency, including memory-saving communication (MSC) and allocation cache (CACHE). 15 | A severe bug caused by async chunk copy using streams is identified and fixed. 16 | It trains a 50B model on an 8xA100 SuperPod node. 17 | 18 | 19 | ### v0.4.0 Nov. 2021 20 | The system is upgraded with a better memory tracer. 21 | We further improve the maximum model scale over v0.3.0 (15B vs. 12B) on the WeChat Yard Platform. 22 | 23 | ### v0.3.0 Oct. 2021 24 | Our initial version significantly surpasses DeepSpeed both in model scale and computing efficiency. 25 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | # See https://pre-commit.com for more information 2 | # See https://pre-commit.com/hooks.html for more hooks 3 | repos: 4 | - repo: https://github.com/pre-commit/pre-commit-hooks 5 | rev: v2.4.0 6 | hooks: 7 | - id: trailing-whitespace 8 | - id: end-of-file-fixer 9 | - id: check-added-large-files 10 | - repo: https://github.com/doublify/pre-commit-clang-format 11 | rev: master 12 | hooks: 13 | - id: clang-format 14 | files: \.(c|cc|cxx|cpp|frag|glsl|h|hpp|hxx|ih|ispc|ipp|java|js|m|mm|proto|vert|cu)$ 15 | - repo: https://github.com/ambv/black 16 | rev: stable 17 | hooks: 18 | - id: black 19 | - repo: https://github.com/pycqa/flake8 20 | rev: '' # pick a git hash / tag to point to 21 | hooks: 22 | - id: flake8 23 | - repo: https://github.com/Lucas-C/pre-commit-hooks 24 | rev: "v1.1.7" 25 | hooks: 26 | - id: forbid-crlf 27 | - id: remove-crlf 28 | - id: forbid-tabs 29 | - id: remove-tabs 30 | args: [ --whitespaces-count, "2" ] # defaults to: 4 31 | - id: insert-license 32 | files: \.(c|cc|cxx|cpp|frag|glsl|h|hpp|hxx|ih|ispc|ipp|java|js|m|mm|proto|vert|cu)$ 33 | args: 34 | - --license-filepath 35 | - LICENSE # defaults to: LICENSE.txt 36 | - --comment-style
37 | - // # defaults to: # 38 | - id: insert-license 39 | files: \.(py)$ 40 | args: 41 | - --license-filepath 42 | - LICENSE # defaults to: LICENSE.txt 43 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | 5 | Redistribution and use in source and binary forms, with or without modification, 6 | are permitted provided that the following conditions are met: 7 | 8 | * Redistributions of source code must retain the above copyright notice, this 9 | list of conditions and the following disclaimer. 10 | 11 | * Redistributions in binary form must reproduce the above copyright notice, 12 | this list of conditions and the following disclaimer in the documentation 13 | and/or other materials provided with the distribution. 14 | 15 | * Neither the name of the psutil authors nor the names of its contributors 16 | may be used to endorse or promote products derived from this software without 17 | specific prior written permission. 18 | 19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | -------------------------------------------------------------------------------- /examples/optimizations/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | -------------------------------------------------------------------------------- /patrickstar/profiler/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | 31 | from .profiler import profiler 32 | -------------------------------------------------------------------------------- /patrickstar/ops/op_builder/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .cpu_adam import CPUAdamBuilder 31 | -------------------------------------------------------------------------------- /patrickstar/fp16/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .loss_scaler import LossScaler, DynamicLossScaler 31 | -------------------------------------------------------------------------------- /examples/optimizations/global_opt_flags.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | USE_TILE = False 31 | USE_ACT_OFFLOAD = False 32 | -------------------------------------------------------------------------------- /patrickstar/ops/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .embedding import Embedding 31 | from .fp16_cpu_adam import FP16Adam 32 | -------------------------------------------------------------------------------- /patrickstar/core/memtracer/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .memtracer import RuntimeMemTracer 31 | from .metronome import Metronome 32 | -------------------------------------------------------------------------------- /patrickstar/manager/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .cuda_context import CUDAContext 31 | from .runtime_config import _runtime_config 32 | -------------------------------------------------------------------------------- /patrickstar/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .core import PatrickStarClient 31 | from .core.memtracer import RuntimeMemTracer 32 | from .ops import FP16Adam 33 | from .runtime import initialize_engine 34 | from .utils import global_timer 35 | from .utils import see_memory_usage 36 | from .utils.model_size_calculator import get_ps_model_size, estimate_bert_mac 37 | -------------------------------------------------------------------------------- /examples/README.md: -------------------------------------------------------------------------------- 1 | ## PatrickStar examples 2 | 3 | ### Use PatrickStar with HuggingFace 4 | 5 | `huggingface_bert.py` is a Hugging Face fine-tuning example with PatrickStar. You can compare it with the [official Hugging Face example](https://huggingface.co/transformers/custom_datasets.html#seq-imdb) to see how to apply PatrickStar to existing projects. 6 | 7 | Before running the example, you need to prepare the data: 8 | 9 | ```bash 10 | wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz 11 | tar -xf aclImdb_v1.tar.gz 12 | ``` 13 | 14 | Then change the directory used in `get_dataset()`. After that, you are ready to go: 15 | 16 | ```bash 17 | python huggingface_bert.py 18 | ``` 19 | 20 | ### Use PatrickStar to train large models 21 | 22 | `run_transformers.sh` and `pretrain_demo.py` are an example of training large PTMs with PatrickStar. You can run models of different sizes by adding configs to `run_transformers.sh`. 23 | 24 | The following command will run a model with 4B params: 25 | 26 | ```bash 27 | env MODEL_NAME=GPT2_4B RES_CHECK=0 DIST_PLAN="patrickstar" bash run_transformers.sh 28 | ``` 29 | 30 | For the available `MODEL_NAME` values, please check `pretrain_demo.py`. 31 | 32 | To check the accuracy of PatrickStar with BERT: 33 | 34 | ```bash 35 | env RES_CHECK=1 bash run_transformers.sh 36 | ``` 37 | 38 | ### MoE support 39 | 40 | PatrickStar also supports training MoE models.
In the `examples/moe` directory, run: 41 | 42 | ```bash 43 | python -m torch.distributed.launch --nproc_per_node=4 huggingface_bert_moe.py 44 | ``` 45 | 46 | Note that you need to install [FastMoE](https://github.com/laekov/fastmoe) before running this example. 47 | 48 | 49 | ### Search for the best chunk size 50 | 51 | Chunk size (CS) is an important hyperparameter for PatrickStar. 52 | Although you can set a CS value empirically by running your training task several times, we provide a systematic way to find a CS with a smaller memory footprint. Use the following command to search for the chunk size: 53 | 54 | ```bash 55 | env CS_SEARCH=1 bash run_transformers.sh 56 | ``` 57 | -------------------------------------------------------------------------------- /patrickstar/utils/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission.
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | from .distributed import get_world_size, get_rank, get_local_world_size 31 | from .helper import getsizeof, get_space_of 32 | from .logging import log_dist, logger, print_rank 33 | from .memory import get_memory_info 34 | from .memory_monitor import ( 35 | see_memory_usage, 36 | get_sys_memory_used, 37 | ) 38 | from .singleton_meta import SingletonMeta 39 | -------------------------------------------------------------------------------- /patrickstar/core/memtracer/training_stage_mgr.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 
10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | from patrickstar.core.const import TrainingStage 31 | 32 | 33 | class TrainingStageMgr: 34 | def __init__(self): 35 | """ 36 | Track which stage the training is in (FWD, BWD or ADAM), 37 | and whether we are in a warmup iteration. 38 | """ 39 | self.training_phase = TrainingStage.UNSTART 40 | self.is_warmup = False 41 | -------------------------------------------------------------------------------- /patrickstar/core/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved.
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from .chunk_data import Chunk 31 | from .chunk_list import ChunkList 32 | from .chunk_tensor_index import ChunkTensorIndex 33 | from .client import PatrickStarClient 34 | from .const import AccessType, ChunkState, TensorState, TrainingStage, ChunkType 35 | from .hook import setup_patrickstar_hooks 36 | from .parameter import PSParameter, register_param, is_param_registered, ParamType 37 | from .preprocess import PSPreProcessCtx, torch_scope 38 | -------------------------------------------------------------------------------- /tools/merge_checkpoint.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import fire 31 | import torch 32 | 33 | from patrickstar.utils import logger 34 | 35 | 36 | def merge_checkpoint(pattern, num): 37 | merged_state_dict = {} 38 | for i in range(num): 39 | filename = pattern.replace("*", f"{i}") 40 | merged_state_dict.update(torch.load(filename)) 41 | 42 | merged_filename = pattern.replace("*", "merged") 43 | logger.warning(f"Merged checkpoint will be saved to {merged_filename}") 44 | torch.save(merged_state_dict, merged_filename) 45 | 46 | 47 | if __name__ == "__main__": 48 | fire.Fire(merge_checkpoint) 49 | -------------------------------------------------------------------------------- /patrickstar/utils/singleton_meta.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 
14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | 31 | class SingletonMeta(type): 32 | """ 33 | The Singleton class can be implemented in different ways in Python. Some 34 | possible methods include: base class, decorator, metaclass. We will use the 35 | metaclass because it is best suited for this purpose. 36 | """ 37 | 38 | _instances = {} 39 | 40 | def __call__(cls, *args, **kwargs): 41 | """ 42 | Possible changes to the value of the `__init__` argument do not affect 43 | the returned instance. 44 | """ 45 | if cls not in cls._instances: 46 | instance = super().__call__(*args, **kwargs) 47 | cls._instances[cls] = instance 48 | return cls._instances[cls] 49 | -------------------------------------------------------------------------------- /patrickstar/manager/cuda_context.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
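The `SingletonMeta` metaclass above guarantees one instance per class by intercepting `type.__call__`. A minimal self-contained sketch of how it behaves in practice (the `Config` class here is a hypothetical example, not part of the repo):

```python
class SingletonMeta(type):
    """Metaclass that caches one instance per class, mirroring
    patrickstar.utils.singleton_meta.SingletonMeta."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Only the first call constructs an instance; later calls
        # (and their arguments) are ignored and the cache is returned.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Config(metaclass=SingletonMeta):
    def __init__(self, value=0):
        self.value = value


a = Config(value=1)
b = Config(value=2)  # __init__ arguments have no effect here
assert a is b
assert a.value == 1  # still the value from the first construction
```

This is why `CUDAContext` and `RuntimeConfig` below can be instantiated anywhere in the codebase and still refer to the same shared state.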
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from patrickstar.utils import SingletonMeta 31 | from patrickstar.utils import logger, get_world_size 32 | import torch 33 | 34 | 35 | class CUDAContext(metaclass=SingletonMeta): 36 | def __init__(self): 37 | self.compute_stream = torch.cuda.current_stream() 38 | if get_world_size() == 1: 39 | self.copy_stream = torch.cuda.Stream() 40 | else: 41 | # TODO(zilinzhu) The async copy mechanism has some 42 | # weird numeric bugs in multi-process settings. 43 | # Keep it disabled until that is fixed. 44 | logger.warning( 45 | "Asynchronous copy will not be enabled for world sizes larger than 1" 46 | ) 47 | self.copy_stream = self.compute_stream 48 | -------------------------------------------------------------------------------- /examples/benchmark/is_run_this_file.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission.
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import argparse 31 | from process_logs import is_run_this_file 32 | 33 | 34 | def add_args(parser): 35 | group = parser.add_argument_group(title="patrickstar") 36 | group.add_argument( 37 | "--file", 38 | type=str, 39 | help="file name.", 40 | ) 41 | group.add_argument( 42 | "--path", 43 | type=str, 44 | help="path name.", 45 | ) 46 | return parser 47 | 48 | 49 | if __name__ == "__main__": 50 | parser = argparse.ArgumentParser(description="PatrickStar Arguments") 51 | parser = add_args(parser) 52 | args = parser.parse_args() 53 | IS_RUN = is_run_this_file(args.path, args.file, {}, {}) 54 | 55 | if IS_RUN: 56 | print(1) 57 | else: 58 | print(0) 59 | -------------------------------------------------------------------------------- /examples/optimizations/test_tiling.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import torch 31 | import pytest 32 | import copy 33 | from tiling import TiledLinear 34 | 35 | 36 | @pytest.mark.parametrize("in_splits,out_splits", [(1, 1), (2, 2)]) 37 | @pytest.mark.parametrize("in_f,out_f", [(32, 32), (23, 29), (29, 23)]) 38 | def test_tiled_forward(in_splits, out_splits, in_f, out_f): 39 | base = torch.nn.Linear(in_f, out_f) 40 | test = TiledLinear( 41 | in_f, 42 | out_f, 43 | bias=True, 44 | init_linear=copy.deepcopy(base), 45 | out_splits=out_splits, 46 | in_splits=in_splits, 47 | ) 48 | 49 | inp = torch.rand(in_f) 50 | 51 | base_out = base(copy.deepcopy(inp)) 52 | test_out = test(copy.deepcopy(inp)) 53 | 54 | assert torch.allclose(base_out, test_out, rtol=1e-4) 55 | -------------------------------------------------------------------------------- /unitest/test_torch_scope.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import unittest 31 | 32 | import torch 33 | 34 | from common import distributed_test 35 | from patrickstar.core import PatrickStarClient, PSPreProcessCtx, torch_scope, ParamType 36 | 37 | 38 | class TestTorchScopeContext(unittest.TestCase): 39 | def setUp(self): 40 | pass 41 | 42 | @distributed_test(world_size=[1]) 43 | def test_torch_scope(self): 44 | def model_provider(): 45 | with torch_scope(): 46 | return torch.nn.Linear(5, 10) 47 | 48 | default_chunk_size = 1 * 1024 * 1024 49 | client = PatrickStarClient(0, default_chunk_size) 50 | 51 | with PSPreProcessCtx(client, dtype=torch.float): 52 | ps_model = model_provider() 53 | 54 | assert ps_model.weight.ps_attr.param_type == ParamType.TORCH_BASED 55 | 56 | 57 | if __name__ == "__main__": 58 | unittest.main() 59 | -------------------------------------------------------------------------------- /patrickstar/manager/runtime_config.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | from copy import deepcopy 30 | 31 | from patrickstar.utils import SingletonMeta 32 | 33 | 34 | class RuntimeConfig(metaclass=SingletonMeta): 35 | def __init__(self): 36 | self.config = { 37 | "use_chunk": True, 38 | # Whether the torch based tensors will do allreduce, 39 | # this is strongly related to `torch_scope` 40 | "do_allreduce": True, 41 | } 42 | self.old_configs = [] 43 | 44 | @property 45 | def use_chunk(self): 46 | return self.config["use_chunk"] 47 | 48 | @property 49 | def do_allreduce(self): 50 | return self.config["do_allreduce"] 51 | 52 | def push(self): 53 | self.old_configs.append(self.config) 54 | self.config = deepcopy(self.config) 55 | 56 | def pop(self): 57 | self.config = self.old_configs.pop() 58 | 59 | 60 | _runtime_config = RuntimeConfig() 61 | -------------------------------------------------------------------------------- /patrickstar/utils/helper.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
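The `push`/`pop` pair in `RuntimeConfig` above implements scoped configuration overrides: `push` saves the current config and switches to a deep copy that can be mutated freely, and `pop` restores the saved state. A minimal standalone sketch of this pattern (a simplified re-implementation for illustration, without the `SingletonMeta` machinery):

```python
from copy import deepcopy


class RuntimeConfig:
    """Simplified sketch of the push/pop config scoping shown above."""

    def __init__(self):
        self.config = {"use_chunk": True, "do_allreduce": True}
        self.old_configs = []

    def push(self):
        # Save the current config and continue editing a deep copy,
        # so mutations inside the scope cannot leak into the saved state.
        self.old_configs.append(self.config)
        self.config = deepcopy(self.config)

    def pop(self):
        # Restore the most recently saved config.
        self.config = self.old_configs.pop()


rc = RuntimeConfig()
rc.push()
rc.config["do_allreduce"] = False  # temporary override inside the scope
assert rc.config["do_allreduce"] is False
rc.pop()
assert rc.config["do_allreduce"] is True  # original value restored
```

This is the same save/restore discipline `torch_scope` relies on when it temporarily disables chunk-based management for a block of model construction.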
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import torch 31 | from patrickstar.core.const import ParamType, AccessType 32 | 33 | 34 | def get_real_data_tensor(param): 35 | if param.ps_attr.param_type == ParamType.TORCH_BASED: 36 | return param.data 37 | elif param.ps_attr.param_type == ParamType.CHUNK_BASED: 38 | return param.ps_attr.access_tensor(AccessType.DATA) 39 | else: 40 | raise RuntimeError 41 | 42 | 43 | def getsizeof(data_type: torch.dtype): 44 | if data_type == torch.float: 45 | return 4 46 | elif data_type == torch.half: 47 | return 2 48 | elif data_type == torch.int8: 49 | return 1 50 | elif data_type == torch.int16: 51 | return 2 52 | elif data_type == torch.int32: 53 | return 4 54 | elif data_type == torch.int64: 55 | return 8 56 | else: 57 | raise TypeError(f"getsizeof does not support data type {data_type}") 58 | 59 | 60 | def get_space_of(tensor): 61 | return tensor.numel() * getsizeof(tensor.dtype) 62 | -------------------------------------------------------------------------------- /examples/data_loader.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a
Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
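As a quick illustration of the byte-accounting helpers above (`getsizeof` / `get_space_of`), here is a hedged, framework-free sketch — the string-keyed table and the names `DTYPE_NBYTES`, `sizeof_dtype`, and `space_of` are stand-ins for illustration; the real code dispatches on `torch.dtype` objects:

```python
# Per-dtype element widths, mirroring getsizeof() above.
# Keys are dtype names instead of torch.dtype objects so the
# sketch runs without torch installed.
DTYPE_NBYTES = {
    "float32": 4,  # torch.float
    "float16": 2,  # torch.half
    "int8": 1,
    "int16": 2,
    "int32": 4,
    "int64": 8,
}


def sizeof_dtype(name):
    try:
        return DTYPE_NBYTES[name]
    except KeyError:
        raise TypeError(f"sizeof_dtype does not support data type {name}")


def space_of(numel, dtype_name):
    # Mirrors get_space_of(): element count times element width.
    return numel * sizeof_dtype(dtype_name)


# A (1024, 1024) fp16 tensor occupies 2 MiB.
print(space_of(1024 * 1024, "float16"))  # 2097152
```

This is the arithmetic PatrickStar uses to budget chunk memory: bytes are always element count times element width, with unsupported dtypes rejected early.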
29 | 30 | import torch 31 | from torch.utils.data import SequentialSampler 32 | 33 | 34 | def get_bert_data_loader( 35 | batch_size, 36 | total_samples, 37 | sequence_length, 38 | device, 39 | data_type=torch.float, 40 | is_distributed=False, 41 | ): 42 | train_data = torch.randint( 43 | low=0, 44 | high=1000, 45 | size=(total_samples, sequence_length), 46 | device=device, 47 | dtype=torch.long, 48 | ) 49 | train_label = torch.randint( 50 | low=0, high=2, size=(total_samples,), device=device, dtype=torch.long 51 | ) 52 | train_dataset = torch.utils.data.TensorDataset(train_data, train_label) 53 | if is_distributed: 54 | sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 55 | else: 56 | sampler = SequentialSampler(train_dataset) 57 | train_loader = torch.utils.data.DataLoader( 58 | train_dataset, batch_size=batch_size, sampler=sampler 59 | ) 60 | return train_loader 61 | -------------------------------------------------------------------------------- /patrickstar/core/comm.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission.
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | from patrickstar.utils import get_world_size 31 | 32 | 33 | class CommGroupInfo(object): 34 | def __init__(self, chunk_type, id): 35 | self.chunk_type = chunk_type 36 | self.id = id 37 | 38 | def __hash__(self): 39 | return hash((self.chunk_type, self.id)) 40 | 41 | def __eq__(self, other): 42 | return (self.chunk_type, self.id) == (other.chunk_type, other.id) 43 | 44 | def __str__(self): 45 | return f"({self.chunk_type}, {self.id})" 46 | 47 | 48 | class CommInfo(object): 49 | def __init__(self, chunk_type, group_id, offset): 50 | assert offset < get_world_size() 51 | self.group = CommGroupInfo(chunk_type=chunk_type, id=group_id) 52 | self.offset = offset 53 | 54 | @property 55 | def chunk_type(self): 56 | return self.group.chunk_type 57 | 58 | @property 59 | def group_id(self): 60 | return self.group.id 61 | 62 | def __str__(self): 63 | return f"({self.group}, {self.offset})" 64 | -------------------------------------------------------------------------------- /patrickstar/utils/model_size_calculator.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 
2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from patrickstar.core.parameter import is_param_registered 31 | 32 | 33 | def get_ps_model_size(model): 34 | numel = 0 35 | param_cnt = 0 36 | for _, param in model.named_parameters(recurse=True): 37 | if is_param_registered(param): 38 | numel += param.ps_attr.numel 39 | else: 40 | numel += param.numel() 41 | param_cnt += 1 42 | return numel, param_cnt 43 | 44 | 45 | def estimate_bert_mac(config, batch_size, sequence_length, model_size): 46 | nvidia_total_macs = ( 47 | 96 48 | * batch_size 49 | * sequence_length 50 | * config.num_hidden_layers 51 | * config.hidden_size ** 2 52 | * ( 53 | 1 54 | + sequence_length / (6 * config.hidden_size) 55 | + config.vocab_size / (16 * config.num_hidden_layers * config.hidden_size) 56 | ) 57 | ) 58 | 59 | tera_flops = model_size * batch_size * sequence_length * 2 * 4 60 | return tera_flops, nvidia_total_macs 61 | -------------------------------------------------------------------------------- /unitest/test_optimizer_init.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import unittest 31 | 32 | import torch 33 | from transformers import BertModel, BertConfig 34 | 35 | from common import distributed_test 36 | from patrickstar.core import PSPreProcessCtx 37 | from patrickstar.core import PatrickStarClient 38 | from patrickstar.ops import FP16Adam 39 | 40 | 41 | class TestOptimizerInitContext(unittest.TestCase): 42 | def setUp(self): 43 | pass 44 | 45 | @distributed_test(world_size=[1]) 46 | def test_optimizer_init(self): 47 | def model_provider(): 48 | cfg = BertConfig() 49 | cfg.vocab_size = 10 50 | model = BertModel(cfg) 51 | return model 52 | 53 | default_chunk_size = 32 * 1024 * 1024 54 | client = PatrickStarClient(0, default_chunk_size) 55 | 56 | torch.manual_seed(0) 57 | with PSPreProcessCtx(client, dtype=torch.float): 58 | ps_model = model_provider() 59 | 60 | FP16Adam(client, ps_model.parameters()) 61 | 62 | 63 | if __name__ == "__main__": 64 | unittest.main() 65 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL 
A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | from setuptools import setup, find_packages 31 | from torch.utils.cpp_extension import BuildExtension 32 | from patrickstar.ops.op_builder import CPUAdamBuilder 33 | 34 | 35 | def fetch_requirements(path): 36 | with open(path, "r") as fd: 37 | return [r.strip() for r in fd.readlines()] 38 | 39 | 40 | require_list = fetch_requirements("requirements.txt") 41 | 42 | setup( 43 | name="patrickstar", 44 | version="0.4.6", 45 | description="PatrickStar library", 46 | long_description="PatrickStar: Parallel Training of Large Language Models via a Chunk-based Parameter Server", 47 | long_description_content_type="text/markdown", 48 | author="Tencent PatrickStar Team", 49 | author_email="fangjiarui123@gmail.com", 50 | url="https://fangjiarui.github.io/", 51 | install_requires=require_list, 52 | setup_requires=require_list, 53 | packages=find_packages(), 54 | include_package_data=True, 55 | classifiers=[ 56 | "Programming Language :: Python :: 3.6", 57 | "Programming Language :: Python :: 3.7", 58 | "Programming Language :: Python :: 3.8", 59 | ], 60 | license="BSD", 61 | ext_modules=[CPUAdamBuilder().builder()], 62 | cmdclass={"build_ext": BuildExtension.with_options(use_ninja=False)}, 63 | ) 64 | -------------------------------------------------------------------------------- /patrickstar/core/const.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer.
10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | from enum import Enum 31 | 32 | 33 | class AccessType(Enum): 34 | DATA = 1 35 | GRAD = 2 36 | 37 | 38 | class ChunkState(Enum): 39 | r"""Chunk state during training.""" 40 | FREE = 0 41 | # Chunk memory is allocated. 42 | # Tensors are used for computing. 43 | COMPUTE = 1 44 | # Holding meaningful data. 45 | HOLD = 2 46 | HOLD_AFTER_FWD = 3 47 | HOLD_AFTER_BWD = 4 48 | 49 | # Chunk memory is not allocated. 50 | RELEASED = 5 51 | 52 | 53 | class TensorState(Enum): 54 | r"""Tensor state during training 55 | 56 | Notice that this is the state of the tensor in the chunk, 57 | while `ChunkState` is the state of the whole chunk. 58 | """ 59 | # Can be released.
60 | FREE = 0 61 | # In computation, cannot be moved. 62 | COMPUTE = 1 63 | # Can be moved, cannot be released. 64 | HOLD = 2 65 | HOLD_AFTER_FWD = 3 66 | HOLD_AFTER_BWD = 4 67 | 68 | 69 | class TrainingStage(Enum): 70 | UNSTART = 0 71 | FWD = 1 72 | BWD = 2 73 | ADAM = 3 74 | 75 | 76 | class ChunkType(Enum): 77 | PARAM_FP16 = 0 78 | PARAM_FP32 = 1 79 | MOMENTUM = 2 80 | VARIANCE = 3 81 | UNDEF = 4 82 | 83 | 84 | class ParamType(Enum): 85 | CHUNK_BASED = 0 86 | TORCH_BASED = 1 87 | -------------------------------------------------------------------------------- /patrickstar/utils/distributed.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | import os 30 | import torch 31 | 32 | from .logging import logger 33 | 34 | 35 | def get_rank(): 36 | if torch.distributed.is_initialized(): 37 | return torch.distributed.get_rank() 38 | return 0 39 | 40 | 41 | def get_world_size(): 42 | if torch.distributed.is_initialized(): 43 | return torch.distributed.get_world_size() 44 | return 1 45 | 46 | 47 | # Use a global variable to prevent changing of the environment variable 48 | # and to make sure the warning is only logged once. 49 | _local_world_size = None 50 | 51 | 52 | def get_local_world_size(): 53 | global _local_world_size 54 | if _local_world_size is None: 55 | if torch.distributed.is_initialized(): 56 | if "LOCAL_WORLD_SIZE" in os.environ: 57 | _local_world_size = int(os.environ["LOCAL_WORLD_SIZE"]) 58 | else: 59 | logger.warning( 60 | "If you are training with multiple nodes, it's recommended to " 61 | "set LOCAL_WORLD_SIZE manually to make better use of CPU memory. " 62 | "Otherwise, get_world_size() is used instead." 63 | ) 64 | _local_world_size = get_world_size() 65 | else: 66 | _local_world_size = 1 67 | return _local_world_size 68 | -------------------------------------------------------------------------------- /unitest/test_embedding_ops.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved.
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
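The single-process fallbacks in `patrickstar/utils/distributed.py` above follow one pattern: consult the process group when it is initialized, otherwise return a sane default, and cache the per-node answer so the environment variable is read only once. A hedged, torch-free sketch of that pattern — the `_initialized` flag here is a stand-in for `torch.distributed.is_initialized()`, for illustration only:

```python
import os

# Stand-in for torch.distributed.is_initialized(); an assumption
# made so this sketch runs without torch or a process group.
_initialized = False

_local_world_size = None  # cached so the env var is read at most once


def get_world_size():
    # Without an initialized process group, act as a single process.
    return 1 if not _initialized else int(os.environ.get("WORLD_SIZE", 1))


def get_local_world_size():
    global _local_world_size
    if _local_world_size is None:
        if _initialized:
            # Prefer the per-node value that launchers such as torchrun export.
            _local_world_size = int(
                os.environ.get("LOCAL_WORLD_SIZE", get_world_size())
            )
        else:
            _local_world_size = 1
    return _local_world_size


print(get_world_size())        # 1
print(get_local_world_size())  # 1
```

Caching in a module-level global is what keeps the "set LOCAL_WORLD_SIZE manually" warning from being logged on every call.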
29 | 30 | import unittest 31 | 32 | import torch 33 | from torch.nn import Embedding as TorchEmbedding 34 | from transformers import BertConfig 35 | 36 | from common import distributed_test 37 | from patrickstar.ops import Embedding as PSEmbedding 38 | 39 | 40 | class TestClientAccess(unittest.TestCase): 41 | def setUp(self): 42 | pass 43 | 44 | @distributed_test(world_size=[1]) 45 | def test_embedding(self): 46 | cfg = BertConfig() 47 | cfg.hidden_dropout_prob = 0 48 | test_device = torch.device("cuda:0") 49 | seq_len = 10 50 | torch.manual_seed(0) 51 | input_ids = torch.randint( 52 | low=0, 53 | high=cfg.vocab_size - 1, 54 | size=(1, seq_len), 55 | dtype=torch.long, 56 | device=test_device, 57 | ) 58 | 59 | torch.manual_seed(0) 60 | torch_embedding = TorchEmbedding(cfg.vocab_size, 64) 61 | torch.manual_seed(0) 62 | PSEmbedding.use_cpu = True 63 | ps_embedding = PSEmbedding(cfg.vocab_size, 64) 64 | 65 | res = ps_embedding(input_ids) 66 | torch_res = torch_embedding.to(test_device)(input_ids) 67 | 68 | # Compare absolute differences; a signed max would miss large negative errors. 69 | self.assertLess(torch.max(torch.abs(torch_res.cpu() - res.cpu())), 1e-2) 70 | 71 | 72 | if __name__ == "__main__": 73 | unittest.main() 74 | -------------------------------------------------------------------------------- /patrickstar/core/tensor_stub.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution.
14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import torch 31 | 32 | from patrickstar.core.const import AccessType, ParamType 33 | 34 | 35 | class TensorInfo(object): 36 | r"""The info related to certain tensor.""" 37 | 38 | def __init__( 39 | self, 40 | chunk_id: int, 41 | tensor_id: int, 42 | start_offset: int, 43 | numel: int, 44 | param: torch.nn.Parameter, 45 | access_type: AccessType, 46 | param_name="", 47 | ): 48 | self.tensor_id = tensor_id 49 | self.chunk_id = chunk_id 50 | self.start_offset = start_offset 51 | self.numel = numel 52 | self.param = param 53 | self.tensor_name = ( 54 | f"{param_name}.data" 55 | if (access_type == AccessType.DATA) 56 | else f"{param_name}.grad" 57 | ) 58 | self.access_type = access_type 59 | 60 | def __str__(self): 61 | return ( 62 | f"tensor_id: {self.tensor_id}, name: {self.tensor_name}, " 63 | f"shape: {self.param.shape}, chunk_id: {self.chunk_id}, " 64 | f"start_offset: {self.start_offset}, numel: {self.numel}, state: {self.state()}" 65 | ) 66 | 67 | def state(self): 68 | if self.param.ps_attr.param_type == ParamType.TORCH_BASED: 69 | return None 70 | else: 71 | return self.param.ps_attr.get_state(self.access_type) 72 | -------------------------------------------------------------------------------- /patrickstar/utils/memory.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 
14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | from collections import namedtuple 31 | 32 | import psutil 33 | 34 | 35 | ps_mem_info = namedtuple("ps_mem_info", ["total", "free", "cached", "buffers", "used"]) 36 | 37 | 38 | def get_memory_info(): 39 | try: 40 | # psutil reads the memory info from /proc/meminfo, 41 | # which results in returning the host memory instead of 42 | # that of the container. 43 | # Here we try to read the container memory with the method in: 44 | # https://stackoverflow.com/a/46213331/5163915 45 | # TODO(zilinzhu) Make this robust on most OS.
46 | mems = {} 47 | with open("/sys/fs/cgroup/memory/memory.meminfo", "rb") as f: 48 | for line in f: 49 | fields = line.split() 50 | mems[fields[0]] = int(fields[1]) * 1024 51 | total = mems[b"MemTotal:"] 52 | free = mems[b"MemFree:"] 53 | cached = mems[b"Cached:"] 54 | buffers = mems[b"Buffers:"] 55 | used = total - free - cached - buffers 56 | if used < 0: 57 | used = total - free 58 | mem_info = ps_mem_info( 59 | total=total, free=free, cached=cached, buffers=buffers, used=used 60 | ) 61 | except FileNotFoundError: 62 | mems = psutil.virtual_memory() 63 | mem_info = ps_mem_info( 64 | total=mems.total, 65 | free=mems.free, 66 | cached=mems.cached, 67 | buffers=mems.buffers, 68 | used=mems.used, 69 | ) 70 | return mem_info 71 | -------------------------------------------------------------------------------- /unitest/test_utils.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import unittest 31 | 32 | import torch 33 | from patrickstar.utils.memory_monitor import get_sys_memory_used 34 | from patrickstar.core.memtracer.memtracer import AsyncMemoryMonitor 35 | 36 | 37 | class TestAsynMemoryMonitor(unittest.TestCase): 38 | def setUp(self): 39 | pass 40 | 41 | def helper_func(self): 42 | dev = torch.device("cuda:0") 43 | m = 400 44 | n = 500 45 | k = 600 46 | a = torch.randn(m, k, device=torch.device("cuda:0")) 47 | b = torch.randn(k, n, device=torch.device("cuda:0")) 48 | c = torch.randn(m, n, device=torch.device("cuda:0")) 49 | print(f"mem usage before matmul: {get_sys_memory_used(dev)}") 50 | start_mem = get_sys_memory_used(dev) 51 | for i in range(10): 52 | c += torch.matmul(a, b) 53 | print(f"mem usage after matmul: {get_sys_memory_used(dev)}") 54 | finish_mem = get_sys_memory_used(dev) 55 | return max(start_mem, finish_mem) 56 | 57 | def test_async_mem_monitor(self): 58 | mem_monitor = AsyncMemoryMonitor() 59 | mem_monitor.start() 60 | max_mem_coarse = self.helper_func() 61 | max_mem_fine = mem_monitor.finish() 62 | self.assertTrue(max_mem_fine >= max_mem_coarse) 63 | # max_mem fine 3760640, corse 2960384 64 
| # indicates the operator will generate significant temporary buffers. 65 | print(f"max_mem fine {max_mem_fine}, coarse {max_mem_coarse}") 66 | 67 | 68 | if __name__ == "__main__": 69 | unittest.main() 70 | -------------------------------------------------------------------------------- /patrickstar/ops/op_builder/cpu_adam.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | """ 31 | Copyright 2020 The Microsoft DeepSpeed Team 32 | """ 33 | import os 34 | import sys 35 | 36 | from .builder import CUDAOpBuilder 37 | 38 | 39 | class CPUAdamBuilder(CUDAOpBuilder): 40 | BUILD_VAR = "DS_BUILD_CPU_ADAM" 41 | NAME = "cpu_adam" 42 | BASE_DIR = "patrickstar/ops/csrc" 43 | 44 | def __init__(self): 45 | super().__init__(name=self.NAME) 46 | 47 | def is_compatible(self): 48 | # Disable on Windows. 49 | return sys.platform != "win32" 50 | 51 | def absolute_name(self): 52 | return f"patrickstar.ops.adam.{self.NAME}_op" 53 | 54 | def sources(self): 55 | return [ 56 | os.path.join(CPUAdamBuilder.BASE_DIR, "adam/cpu_adam.cpp"), 57 | ] 58 | 59 | def include_paths(self): 60 | import torch 61 | 62 | cuda_include = os.path.join(torch.utils.cpp_extension.CUDA_HOME, "include") 63 | return [os.path.join(CPUAdamBuilder.BASE_DIR, "includes"), cuda_include] 64 | 65 | def cxx_args(self): 66 | import torch 67 | 68 | cuda_lib64 = os.path.join(torch.utils.cpp_extension.CUDA_HOME, "lib64") 69 | cpu_arch = self.cpu_arch() 70 | simd_width = self.simd_width() 71 | 72 | return [ 73 | "-O3", 74 | "-std=c++14", 75 | f"-L{cuda_lib64}", 76 | "-lcudart", 77 | "-lcublas", 78 | "-g", 79 | "-Wno-reorder", 80 | cpu_arch, 81 | "-fopenmp", 82 | simd_width, 83 | ] 84 | -------------------------------------------------------------------------------- /unitest/test_memory_cache.py: 
-------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import unittest 31 | 32 | import torch 33 | 34 | from patrickstar.core.memory_cache import MemoryCache 35 | from patrickstar.core.memtracer import RuntimeMemTracer 36 | 37 | 38 | class TestMemoryCache(unittest.TestCase): 39 | def setUp(self): 40 | self.default_chunk_size = 40 41 | 42 | def test_case1(self): 43 | self.compute_device = ( 44 | torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu") 45 | ) 46 | memtracer = RuntimeMemTracer() 47 | memory_cache = MemoryCache(2, memtracer) 48 | 49 | payload1 = memory_cache.pop_or_allocate( 50 | self.compute_device, 10, torch.float, False 51 | ) 52 | payload1_addr = payload1.data_ptr() 53 | memory_cache.push(payload1) 54 | payload2 = memory_cache.pop_or_allocate( 55 | self.compute_device, 10, torch.float, False 56 | ) 57 | self.assertTrue(payload1_addr == payload2.data_ptr()) 58 | 59 | payload3 = memory_cache.pop_or_allocate( 60 | self.compute_device, 10, torch.float, False 61 | ) 62 | self.assertTrue(payload1_addr != payload3.data_ptr()) 63 | print("payload3 ", payload3.data_ptr()) 64 | 65 | payload2_addr = payload2.data_ptr() 66 | memory_cache.push(payload2) 67 | memory_cache.push(payload3) 68 | 69 | payload4 = memory_cache.pop_or_allocate( 70 | self.compute_device, 71 | 10, 72 | torch.float, 73 | False, 74 | ) 75 | self.assertTrue(payload2_addr == payload4.data_ptr()) 76 | 77 | 78 | if __name__ == "__main__": 79 | unittest.main() 80 | -------------------------------------------------------------------------------- /unitest/test_eviction_policy.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import unittest 31 | 32 | import torch 33 | from patrickstar.core.eviction_policy import LatestAccessChunkEvictionPolicy 34 | from patrickstar.core.chunk_data import Chunk 35 | from patrickstar.core.memtracer import RuntimeMemTracer 36 | 37 | 38 | class TestEvictionPolicy(unittest.TestCase): 39 | def setUp(self): 40 | pass 41 | 42 | def test_chunk_eviction(self): 43 | id_to_chunk_list = {} 44 | dev = torch.device("cpu:0") 45 | mem_tracer = RuntimeMemTracer( 46 | local_rank=0, config={"use_async_mem_monitor": True} 47 | ) 48 | id_to_chunk_list[0] = Chunk(10, torch.float, 0, mem_tracer, None, 0, False) 49 | id_to_chunk_list[0].allocate_payload(dev) 50 | id_to_chunk_list[1] = Chunk(10, torch.float, 1, mem_tracer, None, 0, False) 51 | id_to_chunk_list[1].allocate_payload(dev) 52 | metronome = mem_tracer.metronome 53 | metronome.set_warmup(True) 54 | policy = LatestAccessChunkEvictionPolicy(metronome) 55 | 56 | # trace chunk access 57 | policy.trace_access(0, dev) 58 | metronome.tiktac() 59 | policy.trace_access(1, dev) 60 | print(policy.chunk_access_dict) 61 | 62 | # Finish warmup 63 | metronome.set_warmup(False) 64 | metronome.reset() 65 | 66 | # Test eviction strategy 67 | ret_list = policy.derive_eviction_list(id_to_chunk_list, 10, dev) 68 | self.assertTrue(ret_list == [0]) 69 | 70 | metronome.tiktac() 71 | ret_list = policy.derive_eviction_list(id_to_chunk_list, 10, dev) 72 | self.assertTrue(ret_list == [1]) 73 | 74 | 75 | if __name__ == "__main__": 76 | unittest.main() 77 | -------------------------------------------------------------------------------- /examples/ps_config.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | 31 | def get_patrickstar_config( 32 | args, lr=0.001, betas=(0.9, 0.999), eps=1e-6, weight_decay=0 33 | ): 34 | config = { 35 | # The same format as optimizer config of DeepSpeed 36 | # https://www.deepspeed.ai/docs/config-json/#optimizer-parameters 37 | "optimizer": { 38 | "type": "Adam", 39 | "params": { 40 | "lr": lr, 41 | "betas": betas, 42 | "eps": eps, 43 | "weight_decay": weight_decay, 44 | "use_hybrid_adam": args.use_hybrid_adam, 45 | }, 46 | }, 47 | "fp16": { 48 | "enabled": True, 49 | # Set "loss_scale" to 0 to use DynamicLossScaler. 50 | "loss_scale": 0, 51 | "initial_scale_power": args.init_loss_scale_power, 52 | "loss_scale_window": 1000, 53 | "hysteresis": 2, 54 | "min_loss_scale": 1, 55 | }, 56 | "default_chunk_size": args.default_chunk_size, 57 | "release_after_init": args.release_after_init, 58 | "use_fake_dist": args.use_fake_dist, 59 | "use_cpu_embedding": args.use_cpu_embedding, 60 | "client": { 61 | "mem_tracer": { 62 | "use_async_mem_monitor": args.with_async_mem_monitor, 63 | "warmup_gpu_chunk_mem_ratio": 0.1, 64 | "overall_gpu_mem_ratio": 0.9, 65 | "overall_cpu_mem_ratio": 0.9, 66 | "margin_use_ratio": 0.8, 67 | "use_fake_dist": False, 68 | "with_static_partition": args.with_static_partition, 69 | }, 70 | "opts": { 71 | "with_mem_saving_comm": args.with_mem_saving_comm, 72 | "with_mem_cache": args.with_mem_cache, 73 | "with_async_move": args.with_async_move, 74 | }, 75 | }, 76 | } 77 | 78 | return config 79 | -------------------------------------------------------------------------------- /patrickstar/core/memtracer/metronome.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | from .training_stage_mgr import TrainingStageMgr 31 | 32 | 33 | class Metronome(object): 34 | """ 35 | A metronome for memory stats sampling. 36 | It uses two indicators to tell us where the training is now. 37 | One is the moment, which indicates the current moment within one iteration. 38 | The other is the training stage, which indicates FWD/BWD/ADAM and whether this 39 | iteration is a warmup iteration.
40 | 41 | It also contains the training stage information. 42 | """ 43 | 44 | def __init__(self): 45 | self._moment = 0 46 | self._total_moment = None 47 | self.training_stage_mgr = TrainingStageMgr() 48 | 49 | def set_training_phase(self, phase): 50 | self.training_stage_mgr.training_phase = phase 51 | 52 | def set_warmup(self, flag): 53 | self.training_stage_mgr.is_warmup = flag 54 | 55 | def is_warmup(self): 56 | return self.training_stage_mgr.is_warmup 57 | 58 | def training_stage(self): 59 | return self.training_stage_mgr.training_phase 60 | 61 | def get_total_mom(self): 62 | assert self._total_moment is not None, "Do not use get_total_mom during warmup" 63 | return self._total_moment 64 | 65 | def tiktac(self): 66 | """ 67 | The function should be called right before and after the computation of an operator. 68 | """ 69 | self._moment += 1 70 | 71 | def moment(self): 72 | return self._moment 73 | 74 | def reset(self): 75 | """ 76 | The function is called after a training iteration is finished. 77 | """ 78 | self._total_moment = self._moment 79 | self._moment = 0 80 | 81 | def next_moment(self): 82 | assert self._total_moment is not None 83 | return min(self._total_moment, self._moment + 1) % self._total_moment 84 | 85 | def prev_moment(self): 86 | assert self._total_moment is not None 87 | return max(0, self._moment - 1) % self._total_moment 88 | -------------------------------------------------------------------------------- /unitest/test_model_init.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer.
10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import unittest 31 | 32 | import torch 33 | from transformers import BertModel, BertConfig 34 | 35 | from common import distributed_test 36 | from patrickstar.core import PatrickStarClient, ParamType 37 | from patrickstar.core.preprocess import PSPreProcessCtx 38 | 39 | 40 | class TestModelInitContext(unittest.TestCase): 41 | def setUp(self): 42 | pass 43 | 44 | @distributed_test(world_size=[2], backend="gloo", use_fake_dist=True) 45 | def test_model_init(self): 46 | def model_provider(): 47 | cfg = BertConfig() 48 | cfg.vocab_size = 10 49 | model = BertModel(cfg) 50 | return model 51 | 52 | compute_device = torch.device("cpu:0") 53 | default_chunk_size = 32 * 1024 * 1024 54 | client = PatrickStarClient(0, default_chunk_size) 55 | 56 | torch.manual_seed(0) 57 | with PSPreProcessCtx(client, dtype=torch.float, release_after_init=True): 58 | ps_model = model_provider() 59 | 60 | torch.manual_seed(0) 61 | torch_model = model_provider() 62 | 63 | for ps_param, torch_param in zip( 64 | ps_model.parameters(), torch_model.parameters() 65 | ): 66 | if ps_param.ps_attr.param_type == ParamType.TORCH_BASED: 67 | self.assertLess( 68 | torch.max(torch_param.data - ps_param), 69 | 1e-4, 70 | "PyTorch tensors are not consistent with each other", 71 | ) 72 | else: 73 | ps_data = client.access_data(ps_param, compute_device) 74 | if ps_param.ps_attr.is_local(): 75 | self.assertLess( 76 | torch.max(torch_param.data - ps_data), 77 | 1e-4, 78 | f"{ps_param.ps_attr.name} ps tensor and pytorch tensor are not consistent with each other", 79 | ) 80 | client.release_data(ps_param) 81 | 82 | 83 | if __name__ == "__main__": 84 | 85 | unittest.main() 86 | -------------------------------------------------------------------------------- /examples/imdb_dataset.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved.
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import os 31 | from pathlib import Path 32 | 33 | import torch 34 | from sklearn.model_selection import train_test_split 35 | from transformers import BertTokenizerFast 36 | 37 | 38 | # wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz 39 | # tar -xf aclImdb_v1.tar.gz 40 | def get_dataset(data_path): 41 | def read_imdb_split(split_dir): 42 | split_dir = Path(split_dir) 43 | texts = [] 44 | labels = [] 45 | for label_dir in ["pos", "neg"]: 46 | for text_file in (split_dir / label_dir).iterdir(): 47 | texts.append(text_file.read_text()) 48 | labels.append(0 if label_dir == "neg" else 1) 49 | 50 | return texts, labels 51 | 52 | train_texts, train_labels = read_imdb_split(os.path.join(data_path, "train")) 53 | test_texts, test_labels = read_imdb_split(os.path.join(data_path, "test")) 54 | train_texts, val_texts, train_labels, val_labels = train_test_split( 55 | train_texts, train_labels, test_size=0.2 56 | ) 57 | 58 | tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased") 59 | 60 | train_encodings = tokenizer(train_texts, truncation=True, padding=True) 61 | val_encodings = tokenizer(val_texts, truncation=True, padding=True) 62 | test_encodings = tokenizer(test_texts, truncation=True, padding=True) 63 | 64 | class IMDbDataset(torch.utils.data.Dataset): 65 | def __init__(self, encodings, labels): 66 | self.encodings = encodings 67 | self.labels = labels 68 | 69 | def __getitem__(self, idx): 70 | item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()} 71 | item["labels"] = torch.tensor(self.labels[idx]) 72 | return item 73 | 74 | def __len__(self): 75 | return len(self.labels) 76 | 77 | train_dataset = IMDbDataset(train_encodings, train_labels) 78 | val_dataset = IMDbDataset(val_encodings, val_labels) 79 | test_dataset = IMDbDataset(test_encodings, test_labels) 80 | 81 | return train_dataset, val_dataset, test_dataset 82 | -------------------------------------------------------------------------------- 
/examples/moe/moe_bert.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | from transformers import BertLayer, BertForSequenceClassification 30 | from transformers.models.bert.modeling_bert import BertAttention 31 | 32 | from patrickstar.core import torch_scope 33 | from patrickstar.utils import logger, get_world_size 34 | 35 | try: 36 | import fmoe 37 | except ImportError: 38 | logger.error("Please install FastMoE to use MoE with PatrickStar.") 39 | 40 | 41 | def __init__(self, config): 42 | super(BertLayer, self).__init__() 43 | self.chunk_size_feed_forward = config.chunk_size_feed_forward 44 | self.seq_len_dim = 1 45 | self.attention = BertAttention(config) 46 | self.is_decoder = config.is_decoder 47 | self.add_cross_attention = config.add_cross_attention 48 | if self.add_cross_attention: 49 | assert ( 50 | self.is_decoder 51 | ), f"{self} should be used as a decoder model if cross attention is added" 52 | self.crossattention = BertAttention(config) 53 | # The MoE modules are mainly model parallel, so we need to use `torch_scope` 54 | # to separate them from the other chunk-based data parallel modules. 55 | # Also, MoE modules take care of their own communication, which is why 56 | # we need to disable allreduce in the torch scope. 57 | with torch_scope(do_allreduce=False): 58 | self.output = fmoe.FMoETransformerMLP( 59 | num_expert=2, 60 | world_size=get_world_size(), 61 | d_model=config.hidden_size, 62 | d_hidden=config.intermediate_size, 63 | gate=fmoe.gates.NaiveGate, 64 | ) 65 | 66 | 67 | def feed_forward_chunk(self, attention_output): 68 | layer_output = self.output(attention_output) 69 | return layer_output 70 | 71 | 72 | def build_moe_bert(): 73 | # Normally you should write your own Model and create the MoE parts 74 | # in it. Here we directly substitute the original huggingface Bert model 75 | # for simplicity.
76 | BertLayer.__init__ = __init__ 77 | BertLayer.feed_forward_chunk = feed_forward_chunk 78 | model = BertForSequenceClassification.from_pretrained("bert-base-uncased") 79 | return model 80 | -------------------------------------------------------------------------------- /patrickstar/utils/logging.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | import logging 30 | import sys 31 | from rich.logging import RichHandler 32 | 33 | import torch.distributed as dist 34 | 35 | 36 | class LoggerFactory: 37 | @staticmethod 38 | def create_logger(name=None, level=logging.WARNING): 39 | """create a logger 40 | Args: 41 | name (str): name of the logger 42 | level: level of logger 43 | Raises: 44 | ValueError if name is None 45 | """ 46 | 47 | if name is None: 48 | raise ValueError("name for logger cannot be None") 49 | 50 | # formatter = logging.Formatter( 51 | # "[%(asctime)s] [%(levelname)s] " 52 | # "[%(filename)s:%(lineno)d:%(funcName)s] %(message)s") 53 | 54 | formatter = logging.Formatter("[%(asctime)s] [%(levelname)s] %(message)s") 55 | 56 | logger_ = logging.getLogger(name) 57 | logger_.setLevel(level) 58 | logger_.propagate = False 59 | ch = logging.StreamHandler(stream=sys.stdout) 60 | ch.setFormatter(formatter) 61 | logger_.addHandler(RichHandler()) 62 | return logger_ 63 | 64 | 65 | logger = LoggerFactory.create_logger(name="PatrickStar", level=logging.WARNING) 66 | 67 | 68 | def log_dist(message, ranks=[0], level=logging.INFO): 69 | """Log message when one of the following conditions is met: 70 | + not dist.is_initialized() 71 | + dist.get_rank() in ranks, or ranks == [-1] 72 | Args: 73 | message (str) 74 | ranks (list) 75 | level (int) 76 | """ 77 | should_log = not dist.is_initialized() 78 | ranks = ranks or [] 79 | my_rank = dist.get_rank()
if dist.is_initialized() else -1 80 | if ranks and not should_log: 81 | should_log = ranks[0] == -1 82 | should_log = should_log or (my_rank in set(ranks)) 83 | if should_log: 84 | final_message = "[Rank {}] {}".format(my_rank, message) 85 | logger.log(level, final_message) 86 | 87 | 88 | def print_rank(message, rank=0, debug=False, force=False): 89 | if (not dist.is_initialized() or dist.get_rank() == rank) and (debug or force): 90 | logger.info(message) 91 | -------------------------------------------------------------------------------- /examples/benchmark/generate_res_table.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import os 31 | import collections 32 | from process_logs import collect_info_from_dir 33 | 34 | if __name__ == "__main__": 35 | overall_res_dict = {} 36 | overall_file_dict = {} 37 | for file in os.listdir("./"): 38 | if os.path.isdir(file) and "logs" in file: 39 | res_dict, file_dict = collect_info_from_dir(file) 40 | overall_res_dict.update(res_dict) 41 | overall_file_dict.update(file_dict) 42 | 43 | detail_res_table = {} 44 | best_res_table = {} 45 | pos = {1: 0, 2: 2, 4: 4, 8: 6} 46 | 47 | for k, v in overall_res_dict.items(): 48 | plan = k.split("_") 49 | model_scale = plan[0] 50 | bs = plan[1] 51 | gpu_num = int(plan[2]) 52 | key = (model_scale, bs) 53 | if key not in detail_res_table: 54 | detail_res_table[key] = [None for i in range(8)] 55 | 56 | filename = overall_file_dict[k] 57 | detail_res_table[key][pos[gpu_num]] = v * gpu_num 58 | detail_res_table[key][pos[gpu_num] + 1] = filename 59 | 60 | if model_scale not in best_res_table: 61 | best_res_table[model_scale] = [0 for i in range(8)] 62 | if v * gpu_num > best_res_table[model_scale][pos[gpu_num]]: 63 | best_res_table[model_scale][pos[gpu_num]] = v * gpu_num 64 | best_res_table[model_scale][pos[gpu_num] + 1] = bs # filename 65 | 66 | od = collections.OrderedDict(sorted(detail_res_table.items())) 67 | with open("benchmark_res.csv", "w") as wfh: 68 | for k, v in od.items(): 69 | for item in k: 70 | wfh.write(str(item)) 71 | wfh.write(",") 72 | for 
item in v: 73 | wfh.write(str(item)) 74 | wfh.write(",") 75 | wfh.write("\n") 76 | 77 | od = collections.OrderedDict(sorted(best_res_table.items())) 78 | 79 | with open("best_res.csv", "w") as wfh: 80 | for k, v in od.items(): 81 | wfh.write(str(k)) 82 | wfh.write(",") 83 | for item in v: 84 | wfh.write(str(item)) 85 | wfh.write(",") 86 | wfh.write("\n") 87 | -------------------------------------------------------------------------------- /patrickstar/profiler/profiler.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import pickle 31 | import time 32 | 33 | from patrickstar.utils import SingletonMeta 34 | 35 | 36 | class Profiler(metaclass=SingletonMeta): 37 | def __init__(self): 38 | self._nested_level = 0 39 | self.start_time = None 40 | self.warmup_finish_time = None 41 | self.end_time = None 42 | # memory info 43 | # [(moment, time, memory)] 44 | self.gpu_memory_used = [] 45 | self.gpu_chunk_memory_used = [] 46 | self.cpu_memory_used = [] 47 | self.cpu_chunk_memory_used = [] 48 | # stage info 49 | # [(time, stage_converted)] 50 | self.stage_convert_time = [] 51 | # chunk info 52 | # {chunk_id: 53 | # "type": type, 54 | # "life_cycle": [(time, type, to_device)]} 55 | self.chunk_life_cycle = {} 56 | 57 | def start(self): 58 | if self.start_time is None: 59 | self.start_time = time.time() 60 | self._nested_level += 1 61 | 62 | def end(self): 63 | self._nested_level = max(0, self._nested_level - 1) 64 | if self._nested_level == 0: 65 | self.end_time = time.time() 66 | 67 | def started(self): 68 | return self._nested_level > 0 69 | 70 | def warmup_finish(self): 71 | if self.warmup_finish_time is None: 72 | self.warmup_finish_time = time.time() 73 | 74 | def state_dict(self): 75 | return { 76 | "start_time": self.start_time, 77 | "end_time": self.end_time if self.end_time is not None else time.time(), 78 | "warmup_finish_time": self.warmup_finish_time, 79 | "gpu_memory_used": self.gpu_memory_used, 80 | 
"gpu_chunk_memory_used": self.gpu_chunk_memory_used,
81 |             "cpu_memory_used": self.cpu_memory_used,
82 |             "cpu_chunk_memory_used": self.cpu_chunk_memory_used,
83 |             "stage_convert_time": self.stage_convert_time,
84 |             "chunk_life_cycle": self.chunk_life_cycle,
85 |         }
86 | 
87 |     def save(self, filename):
88 |         with open(filename, "wb") as f:
89 |             pickle.dump(self.state_dict(), f)
90 | 
91 | 
92 | profiler = Profiler()
93 | 
-------------------------------------------------------------------------------- /patrickstar/ops/csrc/includes/context.h: --------------------------------------------------------------------------------
1 | #pragma once
2 | 
3 | #include <ATen/cuda/CUDAContext.h>
4 | #include <cuda_runtime_api.h>
5 | #include <cassert>
6 | #include <iostream>
7 | #include <vector>
8 | #include "cublas_v2.h"
9 | #include "cuda.h"
10 | #include "curand.h"
11 | 
12 | #define WARP_SIZE 32
13 | 
14 | #define CUDA_CHECK(callstr)                                                                    \
15 |     {                                                                                          \
16 |         cudaError_t error_code = callstr;                                                      \
17 |         if (error_code != cudaSuccess) {                                                       \
18 |             std::cerr << "CUDA error " << error_code << " at " << __FILE__ << ":" << __LINE__; \
19 |             assert(0);                                                                         \
20 |         }                                                                                      \
21 |     }
22 | 
23 | #define CUDA_1D_KERNEL_LOOP(i, n) \
24 |     for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); i += blockDim.x * gridDim.x)
25 | 
26 | #define CUDA_2D_KERNEL_LOOP(i, n, j, m)                                                        \
27 |     for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); i += blockDim.x * gridDim.x) \
28 |         for (size_t j = blockIdx.y * blockDim.y + threadIdx.y; j < (m); j += blockDim.y * gridDim.y)
29 | 
30 | #define DS_CUDA_NUM_THREADS 512
31 | #define DS_MAXIMUM_NUM_BLOCKS 262144
32 | 
33 | inline int DS_GET_BLOCKS(const int N)
34 | {
35 |     return (std::max)(
36 |         (std::min)((N + DS_CUDA_NUM_THREADS - 1) / DS_CUDA_NUM_THREADS, DS_MAXIMUM_NUM_BLOCKS),
37 |         // Use at least 1 block, since CUDA does not allow empty block
38 |         1);
39 | }
40 | 
41 | class Context {
42 | public:
43 |     Context() : _workspace(nullptr), _seed(42), _curr_offset(0)
44 |     {
45 |         curandCreateGenerator(&_gen, CURAND_RNG_PSEUDO_DEFAULT);
46 | 
curandSetPseudoRandomGeneratorSeed(_gen, 123);
47 |         if (cublasCreate(&_cublasHandle) != CUBLAS_STATUS_SUCCESS) {
48 |             auto message = std::string("Fail to create cublas handle.");
49 |             std::cerr << message << std::endl;
50 |             throw std::runtime_error(message);
51 |         }
52 |     }
53 | 
54 |     virtual ~Context()
55 |     {
56 |         cublasDestroy(_cublasHandle);
57 |         cudaFree(_workspace);
58 |     }
59 | 
60 |     static Context& Instance()
61 |     {
62 |         static Context _ctx;
63 |         return _ctx;
64 |     }
65 | 
66 |     void SetWorkSpace(void* workspace)
67 |     {
68 |         if (!workspace) { throw std::runtime_error("Workspace is null."); }
69 |         _workspace = workspace;
70 |     }
71 | 
72 |     void* GetWorkSpace() { return _workspace; }
73 | 
74 |     curandGenerator_t& GetRandGenerator() { return _gen; }
75 | 
76 |     cudaStream_t GetCurrentStream()
77 |     {
78 |         // get current pytorch stream.
79 |         cudaStream_t stream = at::cuda::getCurrentCUDAStream();
80 |         return stream;
81 |     }
82 | 
83 |     cudaStream_t GetNewStream() { return at::cuda::getStreamFromPool(); }
84 | 
85 |     cublasHandle_t GetCublasHandle() { return _cublasHandle; }
86 | 
87 |     std::pair<uint64_t, uint64_t> IncrementOffset(uint64_t offset_inc)
88 |     {
89 |         uint64_t offset = _curr_offset;
90 |         _curr_offset += offset_inc;
91 |         return std::pair<uint64_t, uint64_t>(_seed, offset);
92 |     }
93 | 
94 |     void SetSeed(uint64_t new_seed) { _seed = new_seed; }
95 | 
96 |     const std::vector<std::array<int, 3>>& GetGemmAlgos() const { return _gemm_algos; }
97 | 
98 | private:
99 |     curandGenerator_t _gen;
100 |     cublasHandle_t _cublasHandle;
101 |     void* _workspace;
102 |     uint64_t _seed;
103 |     uint64_t _curr_offset;
104 |     std::vector<std::array<int, 3>> _gemm_algos;
105 | };
106 | 
-------------------------------------------------------------------------------- /patrickstar/utils/memory_monitor.py: --------------------------------------------------------------------------------
1 | # BSD 3-Clause License
2 | # 
3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved.
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 | 
30 | import gc
31 | 
32 | import psutil
33 | import torch
34 | from .distributed import get_rank, get_local_world_size
35 | from .memory import get_memory_info
36 | 
37 | 
38 | def get_sys_memory_used(device):
39 |     """
40 |     Get the memory currently in use on `device`.
41 |     Notice that for CPU, this function will return 1/N of the total used memory,
42 |     where N is the local world size.
43 |     """
44 |     if device.type == "cuda":
45 |         ret = torch.cuda.memory_allocated()
46 |         # get the peak memory to report correct data, so reset the counter for the next call
47 |         if hasattr(torch.cuda, "reset_peak_memory_stats"):  # pytorch 1.4+
48 |             torch.cuda.reset_peak_memory_stats()
49 |     elif device.type == "cpu":
50 |         ret = get_memory_info().used / get_local_world_size()
51 |     else:
52 |         raise ValueError(f"Unsupported device type: {device.type}")
53 |     return ret
54 | 
55 | def see_memory_usage(message, force=False, scale_name="MB"):
56 |     if not force:
57 |         return
58 |     if get_rank() != 0:
59 |         return
60 | 
61 |     # Python does not garbage-collect in real time, so collect explicitly to get accurate RAM reports.
62 |     gc.collect()
63 | 
64 |     scales = {"MB": 1024 * 1024, "B": 1}
65 |     if scale_name not in scales:
66 |         raise ValueError(f"Unknown scale_name {scale_name}, expected 'MB' or 'B'")
67 |     scale = scales[scale_name]
68 |     # MA = memory allocated, CA = memory reserved ("cached") by the CUDA allocator.
69 |     print(message)
70 |     print(
71 |         f"MA {round(torch.cuda.memory_allocated() / scale, 2)} {scale_name} \
72 |         Max_MA {round(torch.cuda.max_memory_allocated() / scale, 2)} {scale_name} \
73 |         CA {round(torch.cuda.memory_reserved() / scale, 2)} {scale_name} \
74 |         Max_CA {round(torch.cuda.max_memory_reserved() / scale, 2)} {scale_name} "
75 |     )
76 | 
77 |     # TODO(zilinzhu) Find how to get the available and percent value of the
78 |     # memory in docker to substitute psutil.virtual_memory to get_memory_info.
79 | vm_stats = psutil.virtual_memory() 80 | used_gb = round(((vm_stats.total - vm_stats.available) / (1024 ** 3)), 2) 81 | print(f"CPU Virtual Memory: used = {used_gb} GB, percent = {vm_stats.percent}%") 82 | 83 | # get the peak memory to report correct data, so reset the counter for the next call 84 | if hasattr(torch.cuda, "reset_peak_memory_stats"): # pytorch 1.4+ 85 | torch.cuda.reset_peak_memory_stats() 86 | -------------------------------------------------------------------------------- /patrickstar/ops/embedding.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 | 
30 | import torch
31 | import torch.nn as nn
32 | 
33 | from patrickstar.utils import logger
34 | 
35 | 
36 | class _CopyInputToCPU(torch.autograd.Function):
37 |     @staticmethod
38 |     def symbolic(graph, input_):
39 |         return input_.to(torch.device("cpu:0"))
40 | 
41 |     @staticmethod
42 |     def forward(ctx, input_):
43 |         logger.debug(f"Copy input to cpu, dtype {input_.dtype}.")
44 |         return input_.to(torch.device("cpu:0"))
45 | 
46 |     @staticmethod
47 |     def backward(ctx, grad_output):
48 |         target_device = torch.device(f"cuda:{torch.cuda.current_device()}")
49 |         logger.debug("Copy grad_output to cuda.")
50 |         return grad_output.to(target_device)
51 | 
52 | 
53 | class _CopyActToGPU(torch.autograd.Function):
54 |     @staticmethod
55 |     def symbolic(graph, input_):
56 |         target_device = torch.device(f"cuda:{torch.cuda.current_device()}")
57 | 
58 |         return input_.to(target_device)
59 | 
60 |     @staticmethod
61 |     def forward(ctx, input_):
62 |         target_device = torch.device(f"cuda:{torch.cuda.current_device()}")
63 | 
64 |         logger.debug(f"Copy input to cuda, input dtype {input_.dtype}.")
65 |         return input_.to(target_device)
66 | 
67 |     @staticmethod
68 |     def backward(ctx, grad_output):
69 |         return grad_output.to(torch.device("cpu:0")).float()
70 | 
71 | 
72 | def copy_to_cpu(input_):
73 |     return _CopyInputToCPU.apply(input_)
74 | 
75 | 
76 | def copy_to_gpu(input_):
77 |     return _CopyActToGPU.apply(input_)
78 | 
79 | 
80 | class Embedding(nn.Embedding):
81 | r"""CPU Embedding. 82 | 83 | If `use_cpu` is set, the embedding operations will 84 | be performed on CPU. 85 | """ 86 | use_cpu = False 87 | # `instances` is a helper class static member for 88 | # preprocess context. For detail, see comments there. 89 | instances = [] 90 | 91 | def __init__(self, *args, **kwargs): 92 | super().__init__(*args, **kwargs) 93 | self.use_cpu = Embedding.use_cpu 94 | Embedding.instances.append(self) 95 | 96 | def forward(self, input_): 97 | if self.use_cpu: 98 | input_ = copy_to_cpu(input_) 99 | else: 100 | input_ = copy_to_gpu(input_) 101 | output = super().forward(input_) 102 | if self.use_cpu: 103 | output = copy_to_gpu(output) 104 | return output.to(torch.half) 105 | -------------------------------------------------------------------------------- /examples/moe/huggingface_bert_moe.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | import argparse 30 | from tqdm import tqdm 31 | 32 | import torch 33 | from torch.utils.data import DataLoader 34 | 35 | from patrickstar.runtime import initialize_engine 36 | from patrickstar.utils import get_rank 37 | 38 | from examples.imdb_dataset import get_dataset 39 | from moe_bert import build_moe_bert 40 | 41 | parser = argparse.ArgumentParser() 42 | parser.add_argument("--type", dest="type", type=str, choices=["patrickstar", "torch"]) 43 | parser.add_argument("--local_rank", dest="local_rank", type=int, default=None) 44 | args = parser.parse_args() 45 | 46 | torch.distributed.init_process_group(backend="nccl") 47 | torch.cuda.set_device(get_rank()) 48 | 49 | train_dataset, _, test_dataset = get_dataset("/root/aclImdb") 50 | 51 | device = ( 52 | torch.device(f"cuda:{get_rank()}") 53 | if torch.cuda.is_available() 54 | else torch.device("cpu") 55 | ) 56 | 57 | if args.type == "patrickstar": 58 | 59 | def model_func(): 60 | return build_moe_bert() 61 | 62 | config = { 63 | # The same format as optimizer config of DeepSpeed 64 | # https://www.deepspeed.ai/docs/config-json/#optimizer-parameters 65 | "optimizer": { 66 | "type": "Adam", 67 | 
"params": { 68 | "lr": 5e-5, 69 | "betas": (0.9, 0.999), 70 | "eps": 1e-6, 71 | "weight_decay": 0, 72 | "use_hybrid_adam": True, 73 | }, 74 | }, 75 | "default_chunk_size": 64 * 1024 * 1024, 76 | "release_after_init": True, 77 | "use_cpu_embedding": False, 78 | } 79 | 80 | model, optim = initialize_engine( 81 | model_func=model_func, local_rank=args.local_rank, config=config 82 | ) 83 | else: 84 | model = build_moe_bert() 85 | optim = torch.optim.Adam(model.parameters(), lr=5e-5) 86 | model.cuda() 87 | 88 | 89 | train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True) 90 | 91 | for batch in tqdm(train_loader): 92 | optim.zero_grad() 93 | input_ids = batch["input_ids"].to(device) 94 | attention_mask = batch["attention_mask"].to(device) 95 | labels = batch["labels"].to(device) 96 | outputs = model(input_ids, attention_mask=attention_mask, labels=labels) 97 | loss = outputs[0] 98 | if args.type == "patrickstar": 99 | model.backward(loss) 100 | else: 101 | loss.backward() 102 | optim.step() 103 | -------------------------------------------------------------------------------- /examples/train_simple_net.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 
14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
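Before the training script that follows: it drives either the PatrickStar engine or plain PyTorch from one loop, and the only difference is who runs backward. The engine exposes `model.backward(loss)` (which typically also handles fp16 loss scaling), while the torch path calls `loss.backward()` directly, with `zero_grad()` before and `step()` after in both cases. A minimal, framework-free sketch of that dispatch; `FakeEngine`, `FakeOptim`, and `FakeLoss` are illustrative stand-ins, not real PatrickStar or PyTorch APIs:

```python
class FakeOptim:
    """Stand-in for an optimizer: counts zero_grad/step calls."""

    def __init__(self):
        self.zero_grad_calls = 0
        self.step_calls = 0

    def zero_grad(self):
        self.zero_grad_calls += 1

    def step(self):
        self.step_calls += 1


class FakeLoss:
    """Stand-in for a loss tensor with plain autograd backward."""

    def __init__(self):
        self.backward_calls = 0

    def backward(self):
        self.backward_calls += 1


class FakeEngine:
    """Stand-in for the model object returned by initialize_engine()."""

    def __init__(self):
        self.backward_calls = 0

    def backward(self, loss):
        # The engine owns the backward pass (and, in the real system,
        # applies loss scaling before it).
        self.backward_calls += 1


def train_step(test_case, model, optim, loss):
    """One optimization step, dispatching on the backend in use."""
    optim.zero_grad()
    if test_case == "patrickstar":
        model.backward(loss)  # engine-managed backward
    elif test_case == "torch":
        loss.backward()       # plain autograd backward
    else:
        raise ValueError(f"unknown test case: {test_case}")
    optim.step()


engine, optim, loss = FakeEngine(), FakeOptim(), FakeLoss()
train_step("patrickstar", engine, optim, loss)
assert engine.backward_calls == 1 and loss.backward_calls == 0
assert optim.zero_grad_calls == 1 and optim.step_calls == 1
```

The point of the shared `train_step` shape: the surrounding loop stays identical, so switching backends only changes how backward is invoked.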
29 | 
30 | import logging
31 | import torch
32 | 
33 | from patrickstar.runtime import initialize_engine
34 | from patrickstar.utils import logger
35 | 
36 | from simple_net import SimpleModel, get_bert_data_loader
37 | 
38 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
39 | 
40 | BATCH_SIZE = 8
41 | HIDDEN_DIM = 4
42 | SEQ_LEN = 128
43 | 
44 | 
45 | def model_func():
46 |     return SimpleModel(
47 |         hidden_dim=HIDDEN_DIM, seq_len=SEQ_LEN, is_ckp=True, is_share_param=True
48 |     )
49 | 
50 | 
51 | LR = 5e-5
52 | BETAS = (0.9, 0.999)
53 | EPS = 1e-6
54 | WEIGHT_DECAY = 0
55 | 
56 | # TEST_CASE = "torch"
57 | TEST_CASE = "patrickstar"
58 | logger.setLevel(logging.WARNING)
59 | print(f"TEST_CASE {TEST_CASE}")
60 | config = {
61 |     # The same format as optimizer config of DeepSpeed
62 |     # https://www.deepspeed.ai/docs/config-json/#optimizer-parameters
63 |     "optimizer": {
64 |         "type": "Adam",
65 |         "params": {
66 |             "lr": LR,
67 |             "betas": BETAS,
68 |             "eps": EPS,
69 |             "weight_decay": WEIGHT_DECAY,
70 |             "use_hybrid_adam": True,
71 |         },
72 |     },
73 |     "fp16": {
74 |         "enabled": True,
75 |         "loss_scale": 0,
76 |         "initial_scale_power": 2 ** 3,
77 |         "loss_scale_window": 1000,
78 |         "hysteresis": 2,
79 |         "min_loss_scale": 1,
80 |     },
81 |     "default_chunk_size": 1024,
82 |     "use_fake_dist": False,
83 |     "use_cpu_embedding": False,
84 | }
85 | 
86 | torch.manual_seed(0)
87 | if TEST_CASE == "patrickstar":
88 |     model, optim = initialize_engine(model_func=model_func, local_rank=0, config=config)
89 | elif TEST_CASE == "torch":
90 |     model = model_func()
91 |     optim = torch.optim.Adam(
92 |         model.parameters(), lr=LR, betas=BETAS, eps=EPS, weight_decay=WEIGHT_DECAY
93 |     )
94 |     model.cuda()
95 | else:
96 |     raise RuntimeError(f"Unknown TEST_CASE: {TEST_CASE}")
97 | 
98 | train_loader = get_bert_data_loader(BATCH_SIZE, 10000, 128, device, False)
99 | 
100 | for epoch in range(3):
101 |     for i, batch in enumerate(train_loader):
102 |         optim.zero_grad()
103 |         input_ids, labels = batch
104 |         loss = model(input_ids,
labels)
105 |         if TEST_CASE == "patrickstar":
106 |             model.backward(loss)
107 |             optim.step()
108 |         elif TEST_CASE == "torch":
109 |             loss.backward()
110 |             optim.step()
112 |         print(i, loss.item())
113 |         if i == 10:
114 |             exit()
115 | 
116 | model.eval()
117 | 
-------------------------------------------------------------------------------- /unitest/test_chunk_list.py: --------------------------------------------------------------------------------
1 | # BSD 3-Clause License
2 | # 
3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved.
4 | # 
5 | # Redistribution and use in source and binary forms, with or without modification,
6 | # are permitted provided that the following conditions are met:
7 | # 
8 | # * Redistributions of source code must retain the above copyright notice, this
9 | # list of conditions and the following disclaimer.
10 | # 
11 | # * Redistributions in binary form must reproduce the above copyright notice,
12 | # this list of conditions and the following disclaimer in the documentation
13 | # and/or other materials provided with the distribution.
14 | # 
15 | # * Neither the name of the psutil authors nor the names of its contributors
16 | # may be used to endorse or promote products derived from this software without
17 | # specific prior written permission.
18 | # 
19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22 | # DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import unittest 31 | 32 | import torch 33 | 34 | from common import distributed_test 35 | from patrickstar.core import ChunkList, ChunkState, ChunkType 36 | from patrickstar.core.eviction_policy import LatestAccessChunkEvictionPolicy 37 | from patrickstar.core.memtracer import RuntimeMemTracer 38 | 39 | 40 | class TestChunkData(unittest.TestCase): 41 | def setUp(self): 42 | self.memtracer = RuntimeMemTracer() 43 | self.policy = LatestAccessChunkEvictionPolicy(self.memtracer.metronome) 44 | 45 | @distributed_test(world_size=[1]) 46 | def test_add_chunk(self): 47 | self.memtracer.metronome.set_warmup(False) 48 | chunk_list = ChunkList(0, self.memtracer, self.policy) 49 | assert chunk_list.size() == 0 50 | 51 | chunk_list.new_chunk( 52 | chunk_id=0, 53 | chunk_size=20, 54 | data_type=torch.float, 55 | is_dummy=False, 56 | chunk_type=ChunkType.PARAM_FP32, 57 | ) 58 | 59 | assert chunk_list.size() == 1 60 | assert chunk_list[0].get_state() == ChunkState.RELEASED 61 | 62 | @distributed_test(world_size=[1], use_fake_dist=True) 63 | def test_new_chunk(self): 64 | compute_device = ( 65 | torch.device(f"cuda:{torch.cuda.current_device()}") 66 | if torch.cuda.is_available() 67 | else torch.device("cpu:0") 68 | ) 69 | self.memtracer.metronome.set_warmup(False) 70 | chunk_list = ChunkList(0, self.memtracer, self.policy) 71 | 72 | new_chunk_id = 123 73 | chunk_list.new_chunk( 74 | chunk_id=new_chunk_id, 75 | chunk_size=20, 
76 | data_type=torch.float, 77 | is_dummy=False, 78 | chunk_type=ChunkType.PARAM_FP32, 79 | ) 80 | chunk_list.access_chunk(new_chunk_id, compute_device) 81 | 82 | assert chunk_list[new_chunk_id].get_state() == ChunkState.FREE 83 | 84 | self.assertEqual( 85 | chunk_list.last_chunk_id(ChunkType.PARAM_FP32), 86 | new_chunk_id, 87 | "check last_chunk_id", 88 | ) 89 | 90 | chunk_list.new_chunk( 91 | chunk_id=1, 92 | chunk_size=20, 93 | data_type=torch.float, 94 | is_dummy=False, 95 | chunk_type=ChunkType.PARAM_FP32, 96 | ) 97 | 98 | self.assertEqual(chunk_list.size(), 2) 99 | 100 | self.assertEqual( 101 | chunk_list.last_chunk_id(ChunkType.PARAM_FP32), 1, "check last_chunk_id" 102 | ) 103 | 104 | 105 | if __name__ == "__main__": 106 | unittest.main() 107 | -------------------------------------------------------------------------------- /patrickstar/core/torch_profiler_hook.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 | 
30 | 
31 | import time
32 | import torch
33 | 
34 | from patrickstar.core.hook import (
35 |     PreBackwardFunction,
36 |     PostBackwardFunction,
37 |     _apply_to_tensors_only,
38 | )
39 | from patrickstar.utils import get_sys_memory_used, logger
40 | from patrickstar.profiler import profiler
41 | 
42 | 
43 | def _cur_mem_usage():
44 |     """
45 |     Sample the GPU memory in use at the moments right before
46 |     an operator starts and right after it finishes.
47 |     """
48 |     dev = torch.device(f"cuda:{torch.cuda.current_device()}")
49 |     gpu_mem_used = get_sys_memory_used(dev)
50 |     return gpu_mem_used
51 | 
52 | 
53 | def _record_mem_stats():
54 |     """
55 |     Record memory statistics at this moment for the profiler.
56 | """ 57 | mem_cur_mon = _cur_mem_usage() 58 | profiler.gpu_memory_used.append((None, time.time(), mem_cur_mon)) 59 | 60 | 61 | def _register_hooks_recursively(module, name=""): 62 | r"""Register hook in post order traverse.""" 63 | 64 | for child_name, child in module.named_children(): 65 | logger.debug(f"{child.__class__.__name__}") 66 | _register_hooks_recursively(child, name + child_name) 67 | 68 | # Early return on modules with no parameters or buffers that 69 | # are not in their children. 70 | if ( 71 | len(list(module.named_parameters(recurse=False))) == 0 72 | and len(list(module.named_buffers(recurse=False))) == 0 73 | ): 74 | return 75 | 76 | def _pre_post_forward_module_hook(module, *args): 77 | _record_mem_stats() 78 | 79 | # The hook can modify the output 80 | def _pre_backward_module_hook(module, inputs, output): 81 | def _run_before_backward_function(sub_module): 82 | _record_mem_stats() 83 | 84 | return _apply_to_tensors_only( 85 | module, PreBackwardFunction, _run_before_backward_function, output 86 | ) 87 | 88 | def _post_backward_module_hook(module, inputs): 89 | def _run_after_backward_function(sub_module): 90 | _record_mem_stats() 91 | 92 | return _apply_to_tensors_only( 93 | module, PostBackwardFunction, _run_after_backward_function, inputs 94 | ) 95 | 96 | module.register_forward_pre_hook(_pre_post_forward_module_hook) 97 | module.register_forward_hook(_pre_post_forward_module_hook) 98 | 99 | module.register_forward_hook(_pre_backward_module_hook) 100 | module.register_forward_pre_hook(_post_backward_module_hook) 101 | 102 | 103 | def register_torch_profiler_hook(module): 104 | """ 105 | Collect activation statistis during training. 
106 | """ 107 | _register_hooks_recursively(module) 108 | -------------------------------------------------------------------------------- /patrickstar/runtime/__init__.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import torch 31 | from patrickstar.core import PSPreProcessCtx, PatrickStarClient 32 | from patrickstar.core.memtracer import RuntimeMemTracer 33 | from patrickstar.utils import logger, log_dist 34 | from .engine import PatrickStarEngine 35 | import time 36 | 37 | DEFAULT_CHUNK_SIZE = 32 * 1024 * 1024 38 | 39 | 40 | def initialize_engine(model_func, local_rank, config=None, client=None): 41 | """Initialize the PatrickStar Engine. 42 | Arguments: 43 | model_func: Required: an nn.Module instance or a callable that builds the model, before any wrappers are applied 44 | client: Required when passing an nn.Module: the PatrickStarClient for orchestrating chunks. 45 | config: Optional: JSON config for the optimizer and runtime. 46 | Returns: 47 | A tuple of ``engine`` and ``optimizer`` 48 | * ``engine``: PatrickStar runtime engine which wraps the client model for distributed training. 49 | * ``optimizer``: the wrapped optimizer if an ``optimizer`` section is 50 | specified in the JSON config, else ``None``. 51 | """ 52 | if isinstance(model_func, torch.nn.Module): 53 | logger.debug( 54 | "Passing nn.Module into initialize_engine. " 55 | "Make sure you have initialized the model within PSPreProcessCtx" 56 | ) 57 | assert client is not None, "Must pass the client when passing a nn.Module." 58 | model = model_func 59 | else: 60 | assert callable(model_func), "model_func needs to be callable."
61 | 62 | if config is None: 63 | default_chunk_size = DEFAULT_CHUNK_SIZE 64 | release_after_init = False 65 | use_cpu_embedding = True 66 | else: 67 | default_chunk_size = config.get("default_chunk_size", DEFAULT_CHUNK_SIZE) 68 | release_after_init = config.get("release_after_init", False) 69 | use_cpu_embedding = config.get("use_cpu_embedding", True) 70 | 71 | client = PatrickStarClient( 72 | rank=local_rank, 73 | default_chunk_size=default_chunk_size, 74 | config=config.get("client", None) if config is not None else None, 75 | ) 76 | 77 | start_time = time.time() 78 | log_dist("begin initializing the model parameters...") 79 | with PSPreProcessCtx( 80 | client=client, 81 | dtype=torch.float, 82 | release_after_init=release_after_init, 83 | use_cpu_embedding=use_cpu_embedding, 84 | ): 85 | model = model_func() 86 | end_time = time.time() 87 | log_dist( 88 | f"finished initializing the model parameters in {end_time - start_time} s" 89 | ) 90 | 91 | engine = PatrickStarEngine(model=model, client=client, config=config) 92 | client.start_mem_tracer() 93 | return (engine, engine.optimizer)
14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
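The `fp16` block in the example below enables dynamic loss scaling: `loss_scale: 0` selects dynamic mode, the initial scale is `2 ** initial_scale_power`, and the scale is raised after `loss_scale_window` overflow-free steps and lowered on overflow, bounded below by `min_loss_scale`. A minimal stdlib sketch of that general policy (an illustration of the scheme, not PatrickStar's actual implementation; `hysteresis` is omitted for brevity):

```python
# Toy dynamic loss scaler following the config keys used below:
# initial_scale_power, loss_scale_window, min_loss_scale.
class ToyLossScaler:
    def __init__(self, initial_scale_power=3, loss_scale_window=1000,
                 min_loss_scale=1.0):
        self.scale = 2.0 ** initial_scale_power
        self.window = loss_scale_window
        self.min_scale = min_loss_scale
        self.good_steps = 0

    def update(self, overflow):
        if overflow:
            # Gradients overflowed: halve the scale and restart the window.
            self.scale = max(self.scale / 2.0, self.min_scale)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps == self.window:
                # A full window without overflow: double the scale.
                self.scale *= 2.0
                self.good_steps = 0


scaler = ToyLossScaler(initial_scale_power=3, loss_scale_window=2)
scaler.update(overflow=False)
scaler.update(overflow=False)  # window filled -> scale doubles to 16.0
scaler.update(overflow=True)   # overflow -> scale halves to 8.0
print(scaler.scale)
```

The doubling/halving keeps the scale as large as possible without producing inf/nan fp16 gradients.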
29 | 30 | import torch 31 | from torch.utils.data import DataLoader 32 | from transformers import BertForSequenceClassification 33 | 34 | from patrickstar.runtime import initialize_engine 35 | from patrickstar.utils import get_rank 36 | 37 | from imdb_dataset import get_dataset 38 | 39 | 40 | # Uncomment these lines when doing multiprocess training 41 | # torch.distributed.init_process_group(backend='nccl') 42 | # torch.cuda.set_device(get_rank()) 43 | 44 | train_dataset, _, test_dataset = get_dataset("/root/aclImdb") 45 | 46 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu") 47 | 48 | 49 | def model_func(): 50 | model = BertForSequenceClassification.from_pretrained("bert-base-uncased") 51 | # For large models, please uncomment the following lines to utilize gradient checkpointing 52 | # model.gradient_checkpointing_enable() 53 | return model 54 | 55 | 56 | config = { 57 | # The same format as optimizer config of DeepSpeed 58 | # https://www.deepspeed.ai/docs/config-json/#optimizer-parameters 59 | "optimizer": { 60 | "type": "Adam", 61 | "params": { 62 | "lr": 5e-5, 63 | "betas": (0.9, 0.999), 64 | "eps": 1e-6, 65 | "weight_decay": 0, 66 | "use_hybrid_adam": True, 67 | }, 68 | }, 69 | "fp16": { 70 | "enabled": True, 71 | "loss_scale": 0, 72 | "initial_scale_power": 2 ** 3, 73 | "loss_scale_window": 1000, 74 | "hysteresis": 2, 75 | "min_loss_scale": 1, 76 | }, 77 | "default_chunk_size": 64 * 1024 * 1024, 78 | "release_after_init": False, 79 | "use_cpu_embedding": False, 80 | } 81 | 82 | model, optim = initialize_engine( 83 | model_func=model_func, local_rank=get_rank(), config=config 84 | ) 85 | 86 | train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True) 87 | 88 | print("train loss:") 89 | 90 | for i, batch in enumerate(train_loader): 91 | optim.zero_grad() 92 | input_ids = batch["input_ids"].to(device) 93 | attention_mask = batch["attention_mask"].to(device) 94 | labels = batch["labels"].to(device) 95 | outputs = 
model(input_ids, attention_mask=attention_mask, labels=labels) 96 | loss = outputs[0] 97 | model.backward(loss) 98 | optim.step() 99 | print(i, loss.item()) 100 | if i == 10: 101 | break 102 | 103 | model.eval() 104 | 105 | print("test loss:") 106 | 107 | test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False) 108 | for i, batch in enumerate(test_loader): 109 | input_ids = batch["input_ids"].to(device) 110 | attention_mask = batch["attention_mask"].to(device) 111 | labels = batch["labels"].to(device) 112 | outputs = model(input_ids, attention_mask=attention_mask, labels=labels) 113 | loss = outputs[0] 114 | print(i, loss.item()) 115 | if i == 5: 116 | break 117 | -------------------------------------------------------------------------------- /examples/benchmark/process_logs.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import os 31 | import sys 32 | import numpy as np 33 | from scipy.stats import t 34 | 35 | 36 | def is_run_this_file(path, file, res_dict, file_dict): 37 | """ 38 | Collect throughput performance from a log file 39 | and update res_dict and file_dict. 40 | ret: is_run, whether the benchmark script still needs to be run 41 | """ 42 | model_name = "" 43 | gpu_num = 0 44 | bs = 0 45 | 46 | # If the log file does not exist, the benchmark has not run yet,
47 | # so return True to let the caller execute it. 48 | if not os.path.isfile(path + "/" + file): 49 | return True 50 | 51 | f = open(path + "/" + file) 52 | is_run = True 53 | 54 | perf_list = np.array([]) 55 | if not os.path.isdir(file): 56 | fn_list = file.split(".")[1].split("_") 57 | for i in range(len(fn_list)): 58 | if "gpu" in fn_list[i]: 59 | model_name = fn_list[i - 1] 60 | gpu_num = fn_list[i + 1] 61 | elif "bs" == fn_list[i]: 62 | bs = fn_list[i + 1] 63 | key = model_name + "_" + bs + "_" + gpu_num 64 | iter_f = iter(f) 65 | for line in iter_f: 66 | if "Tflops" in line and "WARM" not in line: 67 | sline = line.split() 68 | perf = float(sline[-2]) 69 | 70 | perf_list = np.append(perf_list, perf) 71 | 72 | is_run = False 73 | if "RuntimeError" in line: 74 | return False 75 | 76 | if len(perf_list) == 0: 77 | return False 78 | 79 | # calculate CI of perf_list 80 | perf_list = perf_list[1:-1] 81 | m = perf_list.mean() 82 | s = perf_list.std() 83 | dof = len(perf_list) - 1 84 | confidence = 0.95 85 | t_crit = np.abs(t.ppf((1 - confidence) / 2, dof)) 86 | ic_perf = ( 87 | -s * t_crit / np.sqrt(len(perf_list)), 88 | +s * t_crit / np.sqrt(len(perf_list)), 89 | ) 90 | 91 | res_dict[key] = (*ic_perf, m) 92 | file_dict[key] = file 93 | 94 | return is_run 95 | 96 | 97 | def collect_info_from_dir(path): 98 | res_dict = {} 99 | file_dict = {} 100 | files = os.listdir(path) 101 | for file in files: 102 | is_run_this_file(path, file, res_dict, file_dict) 103 | print("process ", path) 104 | return res_dict, file_dict 105 | 106 | 107 | if __name__ == "__main__": 108 | res_dict = {} 109 | file_dict = {} 110 | if len(sys.argv) > 1: 111 | PATH = str(sys.argv[1]) 112 | else: 113 | PATH = "./logs_GPT2small" 114 | files = os.listdir(PATH) 115 | res_dict, file_dict = collect_info_from_dir(PATH) 116 | new_res_list = [] 117 | for k, v in res_dict.items(): 118 | plan = k.split("_") 119 | # model_name, bs, gpu_num, best perf, file 120 | new_res_list.append((plan[0], plan[1], plan[2], v, file_dict[k]))
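`is_run_this_file` above turns the sampled Tflops numbers into a 95% confidence interval for the mean using Student's t distribution from SciPy. A stdlib-only sketch of the same computation, using `statistics.NormalDist` as a normal approximation of the t critical value (which slightly understates the interval for small samples):

```python
import math
import statistics


def mean_ci(samples, confidence=0.95):
    # Mean and confidence interval for the mean, using the normal
    # approximation z ~= t for the critical value.
    m = statistics.fmean(samples)
    s = statistics.pstdev(samples)  # population std, like numpy's default .std()
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * s / math.sqrt(len(samples))
    return m - half, m + half, m


lo, hi, m = mean_ci([10.0, 12.0, 11.0, 13.0, 11.5])
print(round(m, 2))
```

The original script stores the interval as a (−half, +half) offset pair next to the mean; this sketch returns the absolute bounds instead.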
121 | 122 | new_res_list.sort() 123 | for elem in new_res_list: 124 | print(elem) 125 | -------------------------------------------------------------------------------- /examples/optimizations/ls_hf_transformer_encoder_layer.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | try: 31 | from lightseq.training.ops.pytorch.transformer_encoder_layer import ( 32 | LSTransformerEncoderLayer, 33 | ) 34 | except ImportError: 35 | raise RuntimeError("pip install lightseq first!") 36 | 37 | 38 | class LSHFTransformerEncoderLayer(LSTransformerEncoderLayer): 39 | def __init__(self, *args, **kwargs): 40 | super(LSHFTransformerEncoderLayer, self).__init__(*args, **kwargs) 41 | 42 | def forward(self, hidden_states, encoder_padding_mask, *args, **kwargs): 43 | encoder_padding_mask /= -10000.0 44 | encoder_padding_mask = encoder_padding_mask.squeeze() 45 | output = super().forward(hidden_states, encoder_padding_mask) 46 | return (output, None, None, None) 47 | 48 | 49 | def gen_bert_config(training_args, config): 50 | bert_config = LSTransformerEncoderLayer.get_config( 51 | max_batch_tokens=4096, 52 | max_seq_len=config.max_position_embeddings, 53 | hidden_size=config.hidden_size, 54 | intermediate_size=config.intermediate_size, 55 | nhead=config.num_attention_heads, 56 | attn_prob_dropout_ratio=config.attention_probs_dropout_prob, 57 | activation_dropout_ratio=config.hidden_dropout_prob, 58 | hidden_dropout_ratio=config.hidden_dropout_prob, 59 | pre_layer_norm=False, 60 | fp16=training_args.use_fp16, 61 | local_rank=training_args.local_rank, 62 | activation_fn="gelu", 63 | ) 64 | return bert_config 65 | 66 | 67 | def get_hf_bert_enc_layer_params(layer): 68 | init_ws = [] 69 | init_bs = [] 70 | 71 
| init_ws.append(layer.attention.self.query.weight.detach().clone()) 72 | init_bs.append(layer.attention.self.query.bias.detach().clone()) 73 | init_ws.append(layer.attention.self.key.weight.detach().clone()) 74 | init_bs.append(layer.attention.self.key.bias.detach().clone()) 75 | init_ws.append(layer.attention.self.value.weight.detach().clone()) 76 | init_bs.append(layer.attention.self.value.bias.detach().clone()) 77 | init_ws.append(layer.attention.output.dense.weight.detach().clone()) 78 | init_bs.append(layer.attention.output.dense.bias.detach().clone()) 79 | init_ws.append(layer.attention.output.LayerNorm.weight.detach().clone()) 80 | init_bs.append(layer.attention.output.LayerNorm.bias.detach().clone()) 81 | 82 | init_ws.append(layer.intermediate.dense.weight.detach().clone()) 83 | init_bs.append(layer.intermediate.dense.bias.detach().clone()) 84 | init_ws.append(layer.output.dense.weight.detach().clone()) 85 | init_bs.append(layer.output.dense.bias.detach().clone()) 86 | init_ws.append(layer.output.LayerNorm.weight.detach().clone()) 87 | init_bs.append(layer.output.LayerNorm.bias.detach().clone()) 88 | 89 | return init_ws, init_bs 90 | 91 | 92 | def inject_ls_enc_layer(model, training_args, config): 93 | for i in range(config.num_hidden_layers): 94 | bert_config = gen_bert_config(training_args, config) 95 | init_ws, init_bs = get_hf_bert_enc_layer_params(model.bert.encoder.layer[i]) 96 | model.bert.encoder.layer[i] = LSHFTransformerEncoderLayer( 97 | bert_config, init_ws, init_bs 98 | ).cuda() 99 | -------------------------------------------------------------------------------- /unitest/test_client.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
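The test below exercises PatrickStar's chunk-based storage, where parameter tensors are packed back-to-back into fixed-size chunks at increasing offsets. A toy stdlib sketch of that packing rule (`ToyChunk` and its method names are hypothetical, not the real `ChunkTensorIndex` API; the sizes match the unit tests in this repo):

```python
# Toy fixed-capacity chunk: tensors are appended at increasing offsets,
# and an insert fails when the remaining space is too small.
class ToyChunk:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.offsets = {}

    def try_insert(self, name, numel):
        if self.used + numel > self.capacity:
            return False  # does not fit in this chunk
        self.offsets[name] = self.used
        self.used += numel
        return True


chunk = ToyChunk(capacity=40)
assert chunk.try_insert("param1", 10)
assert chunk.try_insert("param2", 15)
assert chunk.try_insert("param3", 5)
assert not chunk.try_insert("param4", 100)
print(chunk.offsets)  # param1 at 0, param2 at 10, param3 at 25
```

When a tensor does not fit, the real client opens a new chunk of `default_chunk_size` elements instead of failing.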
29 | 30 | import logging 31 | import unittest 32 | 33 | import torch 34 | 35 | from common import distributed_test 36 | from patrickstar import RuntimeMemTracer 37 | from patrickstar.core import PatrickStarClient, AccessType, register_param, ChunkType 38 | from patrickstar.core.parameter import ParamType 39 | 40 | 41 | class TestClientAccess(unittest.TestCase): 42 | def setUp(self): 43 | self.default_chunk_size = 40 44 | logging.info("SetUp finished") 45 | 46 | @distributed_test(world_size=[1]) 47 | def test_append_ps_tensor(self): 48 | RuntimeMemTracer(0) 49 | self.client = PatrickStarClient( 50 | rank=0, default_chunk_size=self.default_chunk_size 51 | ) 52 | 53 | self.compute_device = torch.device("cpu:0") 54 | 55 | param_size_list = [10, 11, 12, 13] 56 | 57 | param_list = [] 58 | param_payload_ref_list = [] 59 | for idx, psize in enumerate(param_size_list): 60 | param = torch.nn.Parameter(torch.rand(psize)) 61 | param_list.append(param) 62 | param_payload_ref_list.append(param.data.clone()) 63 | 64 | register_param(param, ParamType.CHUNK_BASED, torch.float, f"param_{idx}") 65 | self.client.append_tensor( 66 | [param], torch.float, AccessType.DATA, ChunkType.PARAM_FP32 67 | ) 68 | 69 | real_payload = self.client.access_data(param, torch.device("cpu:0")) 70 | real_payload.copy_(param.data) 71 | self.client.release_data(param) 72 | self.assertTrue(param.data.numel() == 0) 73 | 74 | self.client.display_chunk_info() 75 | for param, payload_ref in zip(param_list, param_payload_ref_list): 76 | real_payload = self.client.access_data(param, torch.device("cpu:0")) 77 | self.assertEqual(torch.max(real_payload - payload_ref), 0) 78 | self.client.release_data(param) 79 | 80 | @distributed_test(world_size=[1]) 81 | def test_append_torch_tensor(self): 82 | self.client = PatrickStarClient( 83 | rank=0, default_chunk_size=self.default_chunk_size 84 | ) 85 | 86 | self.compute_device = torch.device("cpu:0") 87 | 88 | param_size_list = [10, 11, 12, 13] 89 | 90 | param_list = [] 91 
| param_payload_ref_list = [] 92 | for idx, psize in enumerate(param_size_list): 93 | param = torch.nn.Parameter(torch.rand(psize)) 94 | param_list.append(param) 95 | register_param(param, ParamType.TORCH_BASED, torch.float, f"param_{idx}") 96 | param_payload_ref_list.append(param.data.clone()) 97 | self.client.append_tensor( 98 | [param], torch.float, AccessType.DATA, ChunkType.PARAM_FP32 99 | ) 100 | 101 | real_payload = self.client.access_data(param, torch.device("cpu:0")) 102 | real_payload.copy_(param.data) 103 | self.client.release_data(param) 104 | 105 | self.client.display_chunk_info() 106 | for param, payload_ref in zip(param_list, param_payload_ref_list): 107 | real_payload = self.client.access_data(param, torch.device("cpu:0")) 108 | self.assertEqual(torch.max(real_payload - payload_ref), 0) 109 | self.client.release_data(param) 110 | 111 | 112 | if __name__ == "__main__": 113 | 114 | unittest.main() 115 | -------------------------------------------------------------------------------- /patrickstar/core/memory_cache.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 
18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import torch 31 | from patrickstar.core.memtracer.memtracer import RuntimeMemTracer 32 | from patrickstar.utils.helper import getsizeof 33 | 34 | 35 | class MemoryCache(object): 36 | def __init__(self, capacity, memtracer: RuntimeMemTracer): 37 | r""" 38 | A cache of chunk payloads to avoid frequent memory allocation and free. 39 | At most `capacity` tensors are cached per (device, dtype) pair. 40 | If a payload of the requested size was recycled on the target device, reuse it. 41 | Params: 42 | `capacity` : the capacity of each (device, dtype) tensor cache list. 43 | Note: 44 | cached tensors are keyed by (device, dtype).
45 | """ 46 | self._capacity = capacity 47 | self._cached_tensors = {} 48 | self._memtracer = memtracer 49 | 50 | def _new_mem(self, size, data_type, device_type, pin_memory): 51 | space_size = getsizeof(data_type) * size 52 | ret = torch.zeros( 53 | size, 54 | dtype=data_type, 55 | device=device_type, 56 | pin_memory=pin_memory, 57 | ) 58 | self._memtracer.add(device_type.type, space_size, pin_memory) 59 | return ret 60 | 61 | def pop_or_allocate( 62 | self, 63 | device_type: torch.device, 64 | size: int, 65 | data_type: torch.dtype, 66 | pin_memory: bool, 67 | ) -> torch.Tensor: 68 | """ 69 | Return a tensor including `size` `device_type` elements on `device_type`. 70 | Delete the reference to the tenor in MemoryCache. 71 | Return: 72 | torch.Tensor 73 | """ 74 | assert isinstance( 75 | device_type, torch.device 76 | ), "device_type must be type of torch.device" 77 | if (device_type, data_type) not in self._cached_tensors: 78 | return self._new_mem(size, data_type, device_type, pin_memory) 79 | tensors = self._cached_tensors[(device_type, data_type)] 80 | i = -1 81 | for i in range(len(tensors)): 82 | if tensors[i].numel() == size: 83 | break 84 | if i == -1: 85 | return self._new_mem(size, data_type, device_type, pin_memory) 86 | new_tensor_ref = tensors[i] 87 | # delete the reference to tensors[i] in MemoryCache 88 | tensors.pop(i) 89 | return new_tensor_ref 90 | 91 | def push(self, payload): 92 | """ 93 | NOTE() must set payload to None outside of this function. 94 | Recycle a payload tensor. 95 | If the cache is fulled, delete the payload. 96 | Returns: 97 | success pushed or not. 
98 | """ 99 | device_type = payload.device 100 | data_type = payload.dtype 101 | if (device_type, data_type) not in self._cached_tensors and self._capacity > 0: 102 | self._cached_tensors[(device_type, data_type)] = [payload.zero_()] 103 | else: 104 | size = payload.numel() 105 | # the cache is fulled 106 | if len(self._cached_tensors[(device_type, data_type)]) == self._capacity: 107 | is_pinned_flag = payload.is_pinned() 108 | del payload 109 | space_size = getsizeof(data_type) * size 110 | self._memtracer.delete(device_type.type, space_size, is_pinned_flag) 111 | else: 112 | self._cached_tensors[(device_type, data_type)].append(payload.zero_()) 113 | -------------------------------------------------------------------------------- /unitest/test_chunk_data.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import unittest 31 | 32 | import torch 33 | 34 | from common import distributed_test 35 | from patrickstar.core import AccessType, ChunkTensorIndex 36 | from patrickstar.core import register_param, ParamType 37 | 38 | 39 | class TestChunkData(unittest.TestCase): 40 | def setUp(self): 41 | self.default_chunk_size = 40 42 | 43 | @distributed_test(world_size=[1]) 44 | def test_allocate(self): 45 | self.compute_device = ( 46 | torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu") 47 | ) 48 | # Statically construct chunk layout -> chunk_tensor_index 49 | chunk_tensor_index = ChunkTensorIndex(self.default_chunk_size) 50 | 51 | param1 = torch.nn.Parameter(torch.zeros(10)) 52 | register_param(param1, ParamType.CHUNK_BASED, torch.float, "param1") 53 | chunk_tensor_index.add_tensor( 54 | chunk_id=0, 55 | tensor_id=param1.ps_attr.data_id(), 56 | start_offset=0, 57 | numel=param1.numel(), 58 | param=param1, 59 | access_type=AccessType.DATA, 60 | ) 61 | 62 | self.assertTrue( 63 | chunk_tensor_index.tensor_id_to_chunk_id(param1.ps_attr.data_id()) == 0 64 | ) 65 | self.assertTrue(chunk_tensor_index.get_chunk_id(param1, AccessType.DATA) == 0) 66 | 67 | param2 = torch.nn.Parameter(torch.zeros(15)) 68 | register_param(param2, ParamType.CHUNK_BASED, torch.float, "param2") 69 | self.assertTrue( 70 | chunk_tensor_index.get_chunk_id(param2, AccessType.DATA) is None 71 | ) 72 | ret = 
chunk_tensor_index.try_insert_tensor(0, param2, AccessType.DATA) 73 | self.assertTrue(ret) 74 | tensor_info = chunk_tensor_index.get_tensor_info(param2.ps_attr.data_id()) 75 | self.assertTrue(tensor_info.start_offset == 10) 76 | 77 | param3 = torch.nn.Parameter(torch.zeros(5)) 78 | register_param(param3, ParamType.CHUNK_BASED, torch.float, "param3") 79 | ret = chunk_tensor_index.try_insert_tensor(0, param3, AccessType.DATA) 80 | tensor_info = chunk_tensor_index.get_tensor_info(param3.ps_attr.data_id()) 81 | self.assertTrue(tensor_info.start_offset == 25) 82 | 83 | param4 = torch.nn.Parameter(torch.zeros(100)) 84 | register_param(param4, ParamType.CHUNK_BASED, torch.float, "param4") 85 | ret = chunk_tensor_index.try_insert_tensor(0, param4, AccessType.DATA) 86 | self.assertFalse(ret) 87 | # chunk_tensor_index.delete_tensor(11) 88 | 89 | param5 = torch.nn.Parameter(torch.zeros(13)) 90 | register_param(param5, ParamType.CHUNK_BASED, torch.float, "param5") 91 | ret = chunk_tensor_index.try_insert_tensor(1, param5, AccessType.DATA) 92 | tensor_info = chunk_tensor_index.get_tensor_info(param5.ps_attr.data_id()) 93 | self.assertTrue(tensor_info.start_offset == 0) 94 | 95 | ret = chunk_tensor_index.try_insert_tensor(1, param5, AccessType.DATA) 96 | tensor_info = chunk_tensor_index.get_tensor_info(param5.ps_attr.data_id()) 97 | self.assertTrue(tensor_info.start_offset == 0) 98 | 99 | param6 = torch.nn.Parameter(torch.zeros(1000)) 100 | register_param(param6, ParamType.CHUNK_BASED, torch.float, "param6") 101 | ret = chunk_tensor_index.try_insert_tensor(1, param6, AccessType.DATA) 102 | self.assertFalse(ret) 103 | 104 | 105 | if __name__ == "__main__": 106 | 107 | unittest.main() 108 | -------------------------------------------------------------------------------- /patrickstar/utils/global_timer.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. 
All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import time 31 | import torch 32 | 33 | # from .logging import logger 34 | from .singleton_meta import SingletonMeta 35 | 36 | 37 | class GlobalTimer(metaclass=SingletonMeta): 38 | def __init__(self): 39 | """ 40 | Timer for the different functions of the program. 41 | The naming convention should be {TrainingState}_{function}, 42 | e.g. 
ADAM_compute 43 | """ 44 | self.elapse_stat = {} 45 | self.start_time = {} 46 | self.start_flag = False 47 | 48 | def start(self): 49 | self.start_flag = True 50 | 51 | def start_profile(self, key): 52 | if not self.start_flag: 53 | return 54 | if key in self.start_time: 55 | assert self.start_time[key] == 0, f"Please Check {key} profiling function" 56 | self.start_time[key] = time.time() 57 | 58 | def finish_profile(self, key): 59 | if not self.start_flag: 60 | return 61 | torch.cuda.current_stream().synchronize() 62 | if key in self.elapse_stat: 63 | self.elapse_stat[key] += time.time() - self.start_time[key] 64 | else: 65 | self.elapse_stat[key] = time.time() - self.start_time[key] 66 | self.start_time[key] = 0 67 | 68 | def reset(self): 69 | if not self.start_flag: 70 | return 71 | for k, _ in self.elapse_stat.items(): 72 | self.elapse_stat[k] = 0 73 | 74 | def print(self): 75 | if not self.start_flag: 76 | return 77 | print("------------- PROFILE RESULTS ----------------") 78 | dot_length = 20 79 | for k in self.elapse_stat: 80 | dot_length = max(dot_length, len(k) + 2) 81 | overall_elapse = ( 82 | self.elapse_stat["FWD"] + self.elapse_stat["BWD"] + self.elapse_stat["ADAM"] 83 | ) 84 | for k, v in self.elapse_stat.items(): 85 | print( 86 | f'{k} {"." * (dot_length - len(k))} {v}, {v / overall_elapse * 100} %' 87 | ) 88 | print(f'TOTAL {"." 
* (dot_length - len("TOTAL"))} {overall_elapse}') 89 | 90 | 91 | my_timer = GlobalTimer() 92 | 93 | 94 | class DataMoveCnter(metaclass=SingletonMeta): 95 | def __init__(self): 96 | self.amount_dict = {} 97 | self.times_dict = {} 98 | 99 | def update(self, key_name, tensor_size): 100 | my_timer = GlobalTimer() 101 | if not my_timer.start_flag: 102 | return 103 | if key_name in self.times_dict: 104 | self.times_dict[key_name] += 1 105 | self.amount_dict[key_name] += tensor_size 106 | else: 107 | self.times_dict[key_name] = 1 108 | self.amount_dict[key_name] = tensor_size 109 | 110 | def reset(self): 111 | for k, _ in self.times_dict.items(): 112 | self.times_dict[k] = 0 113 | self.amount_dict[k] = 0 114 | 115 | def print(self): 116 | print("------------- DATA MOVE RESULTS --------------") 117 | my_timer = GlobalTimer() 118 | for k, v in self.times_dict.items(): 119 | bwd = 0 120 | if k in my_timer.elapse_stat and self.amount_dict[k] != 0: 121 | bwd = self.amount_dict[k] / my_timer.elapse_stat[k] 122 | print( 123 | f"{k}: {self.amount_dict[k] / 1024 / 1024} MB, {v} times, {bwd / 1024 / 1024} MB/s" 124 | ) 125 | else: 126 | print(f"{k}: {self.amount_dict[k] / 1024 / 1024} MB") 127 | 128 | 129 | data_move_cnter = DataMoveCnter() 130 | -------------------------------------------------------------------------------- /examples/simple_net.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 
10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
29 | 30 | import torch 31 | 32 | # from checkpoint.torch_checkpoint import checkpoint 33 | from torch.utils.checkpoint import checkpoint 34 | from torch.utils.data import SequentialSampler 35 | from transformers import BertConfig 36 | from transformers.models.bert.modeling_bert import BertEmbeddings 37 | 38 | 39 | class Encoder(torch.nn.Module): 40 | def __init__(self, hidden_dim, is_ckp=False): 41 | super(Encoder, self).__init__() 42 | self.linear1 = torch.nn.Sequential( 43 | torch.nn.Linear(hidden_dim, hidden_dim), 44 | torch.nn.Linear(hidden_dim, hidden_dim), 45 | torch.nn.Linear(hidden_dim, hidden_dim), 46 | ) 47 | 48 | self.linear3 = torch.nn.Linear(hidden_dim, hidden_dim) 49 | self.linear4 = torch.nn.Linear(hidden_dim, hidden_dim) 50 | self.linear5 = torch.nn.Linear(hidden_dim, hidden_dim) 51 | self.is_ckp = is_ckp 52 | 53 | def forward(self, x): 54 | h2 = self.linear1(x) 55 | if self.is_ckp: 56 | h3 = checkpoint(self.linear3, h2) 57 | else: 58 | h3 = self.linear3(h2) 59 | h4 = self.linear4(h3) 60 | h5 = self.linear5(h4) 61 | return h5 62 | 63 | 64 | def get_data_loader( 65 | batch_size, 66 | total_samples, 67 | hidden_dim, 68 | device, 69 | data_type=torch.float, 70 | is_distrbuted=False, 71 | ): 72 | train_data = torch.randn(total_samples, hidden_dim, device=device, dtype=data_type) 73 | train_label = torch.empty(total_samples, dtype=torch.long, device=device).random_( 74 | hidden_dim 75 | ) 76 | train_dataset = torch.utils.data.TensorDataset(train_data, train_label) 77 | if is_distrbuted: 78 | sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 79 | else: 80 | sampler = SequentialSampler(train_dataset) 81 | train_loader = torch.utils.data.DataLoader( 82 | train_dataset, batch_size=batch_size, sampler=sampler 83 | ) 84 | return train_loader 85 | 86 | 87 | def get_bert_data_loader( 88 | batch_size, total_samples, sequence_length, device, is_distrbuted=False 89 | ): 90 | train_data = torch.randint( 91 | low=0, 92 | high=10, 93 | 
size=(total_samples, sequence_length), 94 | device=device, 95 | dtype=torch.long, 96 | ) 97 | train_label = torch.zeros(total_samples, dtype=torch.long, device=device) 98 | train_dataset = torch.utils.data.TensorDataset(train_data, train_label) 99 | if is_distrbuted: 100 | sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 101 | else: 102 | sampler = SequentialSampler(train_dataset) 103 | train_loader = torch.utils.data.DataLoader( 104 | train_dataset, batch_size=batch_size, sampler=sampler 105 | ) 106 | return train_loader 107 | 108 | 109 | class SimpleModel(torch.nn.Module): 110 | def __init__(self, hidden_dim, seq_len, is_ckp=False, is_share_param=False): 111 | super(SimpleModel, self).__init__() 112 | config = BertConfig() 113 | config.vocab_size = 25 114 | config.max_position_embeddings = seq_len 115 | config.hidden_size = hidden_dim 116 | self.embeddings_1 = BertEmbeddings(config) 117 | 118 | self._is_share_param = is_share_param 119 | if is_share_param: 120 | self.embeddings_2 = self.embeddings_1 121 | else: 122 | self.embeddings_2 = BertEmbeddings(config) 123 | self.encoder = Encoder(hidden_dim, is_ckp) 124 | self.cross_entropy_loss = torch.nn.CrossEntropyLoss() 125 | 126 | def forward(self, x, y): 127 | h1 = self.embeddings_1(x) 128 | h2 = self.embeddings_2(x) 129 | h3 = h1 + h2 130 | h3 = self.encoder(h3) 131 | return self.cross_entropy_loss(h3[:, 0], y) 132 | -------------------------------------------------------------------------------- /unitest/common.py: -------------------------------------------------------------------------------- 1 | # BSD 3-Clause License 2 | # 3 | # Copyright (C) 2021 THL A29 Limited, a Tencent company. All rights reserved. 
4 | # 5 | # Redistribution and use in source and binary forms, with or without modification, 6 | # are permitted provided that the following conditions are met: 7 | # 8 | # * Redistributions of source code must retain the above copyright notice, this 9 | # list of conditions and the following disclaimer. 10 | # 11 | # * Redistributions in binary form must reproduce the above copyright notice, 12 | # this list of conditions and the following disclaimer in the documentation 13 | # and/or other materials provided with the distribution. 14 | # 15 | # * Neither the name of the psutil authors nor the names of its contributors 16 | # may be used to endorse or promote products derived from this software without 17 | # specific prior written permission. 18 | # 19 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 20 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 21 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 23 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 24 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 25 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 26 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 27 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 28 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | 30 | import os 31 | import time 32 | 33 | import torch 34 | from torch.multiprocessing import Process 35 | 36 | # Worker timeout *after* the first worker has completed. 37 | UNIT_WORKER_TIMEOUT = 120 38 | 39 | 40 | def distributed_test(world_size=2, backend="nccl", use_fake_dist=False): 41 | r"""A decorator for executing a function (e.g., a unit test) in a distributed manner. 
42 | 43 | This decorator manages the spawning and joining of processes, initialization of 44 | torch.distributed, and catching of errors. 45 | 46 | Usage example: 47 | @distributed_test(world_size=[2, 3]) 48 | def my_test(): 49 | rank = dist.get_rank() 50 | world_size = dist.get_world_size() 51 | assert(rank < world_size) 52 | 53 | Args: 54 | world_size (int or list): number of ranks to spawn. Can be a list to spawn 55 | multiple tests. 56 | """ 57 | 58 | def dist_wrap(run_func): 59 | """Second-level decorator for dist_test. This actually wraps the function.""" 60 | 61 | def dist_init(local_rank, num_procs, *func_args, **func_kwargs): 62 | """Initialize torch.distributed and execute the user function.""" 63 | os.environ["MASTER_ADDR"] = "127.0.0.1" 64 | os.environ["MASTER_PORT"] = "29503" 65 | os.environ["LOCAL_RANK"] = str(local_rank) 66 | # NOTE: unit tests don't support multi-node so local_rank == global rank 67 | os.environ["RANK"] = str(local_rank) 68 | os.environ["WORLD_SIZE"] = str(num_procs) 69 | 70 | torch.distributed.init_process_group(backend=backend) 71 | if torch.cuda.is_available(): 72 | if use_fake_dist: 73 | torch.cuda.set_device(0) 74 | else: 75 | torch.cuda.set_device(local_rank) 76 | run_func(*func_args, **func_kwargs) 77 | 78 | def dist_launcher(num_procs, *func_args, **func_kwargs): 79 | r"""Launch processes and gracefully handle failures.""" 80 | 81 | # Spawn all workers on subprocesses. 82 | processes = [] 83 | for local_rank in range(num_procs): 84 | p = Process( 85 | target=dist_init, 86 | args=(local_rank, num_procs, *func_args), 87 | kwargs=func_kwargs, 88 | ) 89 | p.start() 90 | processes.append(p) 91 | 92 | # Now loop and wait for a test to complete. The spin-wait here isn't a big 93 | # deal because the number of processes will be O(#GPUs) << O(#CPUs). 
94 | any_done = False 95 | while not any_done: 96 | for p in processes: 97 | if not p.is_alive(): 98 | any_done = True 99 | break 100 | 101 | # Wait for all other processes to complete 102 | for p in processes: 103 | p.join(UNIT_WORKER_TIMEOUT) 104 | 105 | failed = [(rank, p) for rank, p in enumerate(processes) if p.exitcode != 0] 106 | for _, p in failed: 107 | # If it still hasn't terminated, kill it because it hung. 108 | if p.exitcode is None: 109 | p.terminate() 110 | if p.exitcode != 0: 111 | p.terminate() 112 | 113 | def run_func_decorator(*func_args, **func_kwargs): 114 | r"""Entry point for @distributed_test().""" 115 | 116 | if isinstance(world_size, int): 117 | dist_launcher(world_size, *func_args, **func_kwargs) 118 | elif isinstance(world_size, list): 119 | for procs in world_size: 120 | dist_launcher(procs, *func_args, **func_kwargs) 121 | time.sleep(0.5) 122 | else: 123 | raise TypeError("world_size must be an integer or a list of integers.") 124 | 125 | return run_func_decorator 126 | 127 | return dist_wrap 128 | -------------------------------------------------------------------------------- /examples/run_transformers.sh: -------------------------------------------------------------------------------- 1 | cd $(dirname $0) 2 | 3 | export GPU_NUM=${GPU_NUM:-1} 4 | # Chunk size (in millions of elements) 5 | export CS=${CS:-256} 6 | # Batch Size 7 | export BS=${BS:-16} 8 | # Embedding on CPU 9 | export CPU_EBD=${CPU_EBD:-0} 10 | # Release remote chunks after init 11 | export RELEASE_AFTER_INIT=${RELEASE_AFTER_INIT:-0} 12 | export MODEL_NAME=${MODEL_NAME:-"GPT2small"} 13 | # BERT or GPT 14 | export MODEL_TYPE=${MODEL_TYPE:-"GPT"} 15 | # distributed plan: patrickstar or torch 16 | export DIST_PLAN=${DIST_PLAN:-"patrickstar"} 17 | # check results of patrickstar and torch, which disables the 18 | # DIST_PLAN setting 19 | export RES_CHECK=${RES_CHECK:-0} 20 | # offload activation checkpoints to CPU 21 | export ACT_OFFLOAD=${ACT_OFFLOAD:-0} 22 | # activation rematerialization, a.k.a. 
gradient checkpointing 23 | export CKP=${CKP:-1} 24 | # no retry after failure, used for torch 1.9.0 25 | export NO_RETRY=${NO_RETRY:-0} 26 | export SKIP_LOG_EXSIT=${SKIP_LOG_EXSIT:-0} 27 | # static partition. 28 | export SP=${SP:-0} 29 | export MEM_PROF=${MEM_PROF:-0} 30 | # async memory monitor for mem sampler 31 | export AMM=${AMM:-1} 32 | # mem saving comm 33 | export MSC=${MSC:-1} 34 | # mem caching comm 35 | export CACHE=${CACHE:-1} 36 | # async move 37 | export ASYNC_MOVE=${ASYNC_MOVE:-0} 38 | # linear tiling comm 39 | export TILING=${TILING:-0} 40 | # hybrid adam 41 | export HYB=${HYB:-1} 42 | 43 | export LOCAL_WORLD_SIZE=${LOCAL_WORLD_SIZE:-1} 44 | export CS_SEARCH=${CS_SEARCH:-0} 45 | 46 | export NNODES=${NNODES:-1} 47 | export NODE_RANK=${NODE_RANK:-0} 48 | export MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"} 49 | export MASTER_PORT=${MASTER_PORT:-"12345"} 50 | export SUFFIX=${SUFFIX:-""} 51 | 52 | if [[ ${TILING} == 1 ]]; then 53 | TILING_FLAG="--with_tiling_linear" 54 | else 55 | export TILING_FLAG="" 56 | fi 57 | 58 | 59 | if [[ ${CACHE} == 1 ]]; then 60 | CACHE_FLAG="--with_mem_cache" 61 | else 62 | export CACHE_FLAG="" 63 | fi 64 | 65 | if [[ ${ASYNC_MOVE} == 1 ]]; then 66 | ASYNC_MOVE_FLAG="--with_async_move" 67 | else 68 | export ASYNC_MOVE_FLAG="" 69 | fi 70 | 71 | if [[ ${MSC} == 1 ]]; then 72 | MSC_FLAG="--with_mem_saving_com" 73 | else 74 | export MSC_FLAG="" 75 | fi 76 | 77 | if [[ ${AMM} == 1 ]]; then 78 | AMM_FLAG="--with_async_mem_monitor" 79 | else 80 | export AMM_FLAG="" 81 | fi 82 | 83 | 84 | if [[ ${MEM_PROF} == 1 ]]; then 85 | MEM_PROF_FLAG="--with_mem_profiler" 86 | else 87 | export MEM_PROF_FLAG="" 88 | fi 89 | 90 | 91 | if [[ ${ACT_OFFLOAD} == 1 ]]; then 92 | ACT_OFFLOAD_FLAG="--with_activation_offload" 93 | else 94 | export ACT_OFFLOAD_FLAG="" 95 | fi 96 | 97 | if [[ ${RES_CHECK} == 1 ]]; then 98 | RES_CHECK_FLAG="--res_check" 99 | else 100 | export RES_CHECK_FLAG="" 101 | fi 102 | 103 | 104 | if [[ ${CPU_EBD} == 1 ]]; then 105 | export 
CPU_EBD_FLAG="--use_cpu_embedding" 106 | else 107 | export CPU_EBD_FLAG="" 108 | fi 109 | 110 | if [[ ${RELEASE_AFTER_INIT} == 1 ]]; then 111 | export RELEASE_AFTER_INIT_FLAG="--release_after_init" 112 | else 113 | export RELEASE_AFTER_INIT_FLAG="" 114 | fi 115 | 116 | if [[ ${CKP} == 1 ]]; then 117 | export CKP_FLAG="--use_ckp" 118 | else 119 | export CKP_FLAG="" 120 | fi 121 | 122 | let CHUNK_SIZE=${CS}*1024*1024 123 | 124 | if [[ ${HYB} == 1 ]]; then 125 | export HYBRID_ADAM_FLAG="--use_hybrid_adam" 126 | else 127 | export HYBRID_ADAM_FLAG="" 128 | fi 129 | 130 | 131 | 132 | LOG_DIR="./logs_${MODEL_NAME}" 133 | mkdir -p ${LOG_DIR} 134 | 135 | GIT_VER=`git rev-parse --short=5 HEAD` 136 | LOG_FILE="log.${MODEL_NAME}_type_${MODEL_TYPE}_gpu_${GPU_NUM}_cs_${CS}_bs_${BS}_cpueb_${CPU_EBD}_hyb_${HYB}_offload_${ACT_OFFLOAD}_SP_${SP}_AMM_${AMM}_MSC_${MSC}_CACHE_${CACHE}_TILING_${TILING}_${GIT_VER}_node_${NNODES}_${SUFFIX}" 137 | 138 | is_run_flag=`python ./benchmark/is_run_this_file.py --path "${LOG_DIR}" --file "${LOG_FILE}"` 139 | echo is_run_flag $is_run_flag 140 | if [[ ${is_run_flag} == "0" && ${SKIP_LOG_EXSIT} == 1 ]]; 141 | then 142 | echo "it has been logged" 143 | exit 144 | fi 145 | echo "running ${LOG_DIR} ${LOG_FILE}" 146 | 147 | if [[ ${NO_RETRY} == "1" ]]; 148 | then 149 | NO_RETRY_FLAG="--max_restarts=0" 150 | fi 151 | 152 | 153 | if [[ ${SP} == 1 ]]; 154 | then 155 | SP_FLAG="--with_static_partition" 156 | fi 157 | 158 | 159 | wc=`cat /proc/cpuinfo | grep "processor"| wc -l` 160 | let TNUM=wc/${GPU_NUM} 161 | echo "CPU core number " $wc "THREAD NUM " ${TNUM} 162 | 163 | cmd_opts=" 164 | --use_fp16 \ 165 | ${RES_CHECK_FLAG} \ 166 | ${NO_RETRY_FLAG} \ 167 | ${CKP_FLAG} \ 168 | --dist_plan=${DIST_PLAN} \ 169 | --batch_size=${BS} \ 170 | --model_name=${MODEL_NAME} \ 171 | --model_type=${MODEL_TYPE} \ 172 | --batch_size=${BS} \ 173 | ${CPU_EBD_FLAG} \ 174 | ${HYBRID_ADAM_FLAG} \ 175 | ${RELEASE_AFTER_INIT_FLAG} \ 176 | ${LIGHTSEQ_FLAG} \ 177 | 
${ACT_OFFLOAD_FLAG} \ 178 | ${SP_FLAG} \ 179 | ${MEM_PROF_FLAG} \ 180 | ${AMM_FLAG} \ 181 | ${MSC_FLAG} \ 182 | ${CACHE_FLAG} \ 183 | ${ASYNC_MOVE_FLAG} \ 184 | ${TILING_FLAG} \ 185 | " 186 | 187 | if [[ ${CS_SEARCH} == 1 ]]; then 188 | mkdir -p ./search_res 189 | SLOG_FILE="./search_res/slog_file.${MODEL_NAME}_bs_${BS}_cpueb_${CPU_EBD}_offload_${ACT_OFFLOAD}_SP_${SP}_AMM_${AMM}_MSC_${MSC}_CACHE_${CACHE}_TILING_${TILING}_${GIT_VER}" 190 | rm -rf ${SLOG_FILE} 191 | 192 | for((i=312;i>=64;i-=32)); 193 | do 194 | let CUR_CHUNK_SIZE=${i}*1024*1024 195 | echo "searching CHUNK_SIZE ${i} M elem" 196 | 197 | python -m torch.distributed.launch --nproc_per_node=1 \ 198 | eval_chunk_size.py \ 199 | --default_chunk_size=${CUR_CHUNK_SIZE} \ 200 | --slog_file=${SLOG_FILE} \ 201 | ${cmd_opts} 202 | done 203 | else 204 | env OMP_NUM_THREADS=${TNUM} timeout -s SIGKILL 30m python -m torch.distributed.launch --nproc_per_node=${GPU_NUM} \ 205 | --nnodes=${NNODES} --node_rank=${NODE_RANK} --master_addr=${MASTER_ADDR} --master_port=${MASTER_PORT} \ 206 | pretrain_demo.py \ 207 | --default_chunk_size=${CHUNK_SIZE} \ 208 | ${cmd_opts} \ 209 | 2>&1 | tee ${LOG_DIR}/${LOG_FILE} 210 | fi 211 | -------------------------------------------------------------------------------- /doc/optimization_options.md: -------------------------------------------------------------------------------- 1 | This page explains the optimization options for benchmarking. 2 | Optimizations are divided into PatrickStar-related ones and general ones. 3 | General optimizations can be applied to any PyTorch-based framework. 4 | 5 | ## General Optimizations 6 | 1. Activation Checkpointing (a.k.a. gradient checkpointing in [PyTorch](https://pytorch.org/docs/stable/checkpoint.html)) 7 | `--use_ckp` 8 | Make sure this option is enabled for large model training. It saves most of the activation memory footprint at the cost of recomputation. 9 | 10 | 2. 
Activation Offloading 11 | `--with_activation_offload` 12 | Offload the checkpointed activations from GPU to CPU, further saving GPU memory. 13 | Note that you have to enable activation checkpointing first. 14 | 15 | 3. CPU Embedding 16 | `--use_cpu_embedding` 17 | nn.Embedding is executed on the CPU, saving GPU memory. More importantly, it shrinks the chunk size: for some small models, the largest layer is the embedding, which would otherwise force the chunk size to be larger than the embedding numel. 18 | 19 | 20 | 4. Tiling Linear (a.k.a. Memory-centric tiling in [DeepSpeed](https://deepspeed.readthedocs.io/en/stable/zero3.html#memory-centric-tiling)) 21 | `--with_tiling_linear` 22 | Memory-centric tiling (MCT) splits the parameter tensor of a linear layer into pieces that do not need to be stored in contiguous memory. This helps reduce the chunk size. However, to achieve the best performance, you have to tune the in_splits/out_splits parameters of the function. 23 | 24 | ## PatrickStar-related Optimizations 25 | 26 | 1. Memory Saving Communication. 27 | `--with_mem_saving_com` 28 | Use one-to-all communication to replace the original collective communication. More specifically, reduce-scatter is replaced with N reduce operations, and all-gather is replaced with N broadcast operations. In this way, we do not need to keep an Nx-sized chunk buffer for distributed training, thereby saving GPU memory. This method also changes the CPU-GPU and GPU-GPU communication volume. In general, it reduces the CPU-GPU communication volume at the cost of increasing the GPU-GPU broadcast volume and lowering the broadcast bandwidth. However, in some cases this tradeoff can improve the overall performance of the system. It is suitable for training an extremely large model on a cluster with high-quality GPU-GPU communication bandwidth, e.g., a 50B model on a SuperPod node. Details in Merge Request #250. 29 | 30 | 2. Memory Allocation Caching. 31 | `--with_mem_cache` 32 | Use a cache to allocate and release chunk memory. 
The cache is a size-limited queue whose capacity defaults to 2. It is helpful for Memory Saving Communication in distributed training, as it avoids frequently releasing and allocating memory for remote chunks. See details in #241. 33 | 34 | 35 | 3. Hybrid ADAM: 36 | `--use_hybrid_adam` 37 | Place Optimizer States (OS) on both CPU and GPU. Part of the ADAM computation is conducted on the CPU and the rest on the GPU. In contrast, ZeRO-Offload does ADAM on the CPU only. This technique can accelerate ADAM computation for relatively small models. 38 | 39 | 4. Activation Offload. 40 | `--with_activation_offload` 41 | Offload activations to CPU. Must be used in combination with activation checkpointing (a.k.a. gradient checkpointing in PyTorch). 42 | 43 | 5. Async Memory Monitoring with the Runtime Memory Tracer. 44 | `--with_async_mem_monitor` 45 | Asynchronously sample memory usage with an independent thread. This yields more accurate runtime 46 | memory usage statistics. If you turn off this flag, memory usage sampling is triggered at the exact moment before or after each operator (submodule in PyTorch) computes. 47 | 48 | 49 | 6. Static Partition. 50 | `--with_static_partition` 51 | PatrickStar is known for dynamically partitioning model data. With the help of this flag, you can statically partition model data between CPU and GPU. The max GPU memory used by chunks is `warmup_gpu_chunk_mem_ratio` * gpu_size. It is still better at avoiding OOM than ZeRO-Offload, which always puts all params and grads on the GPU. It leads to lower computing efficiency than the default dynamic partitioning, but it is helpful for aggressively avoiding OOM. 52 | 53 | 7. Release Remote Chunk After Initialization. 54 | `release_after_init` 55 | This option is irrelevant to computing efficiency and is used for distributed training. It allocates memory for remote chunks but releases it immediately. In this way, we can make sure the model parameters are randomly initialized exactly as in a serial version, solving the random seed consistency problem. 
It is used in combination with the `--res_check` option to check the correctness of distributed training. 56 | 57 | 8. Adjusting the CPU and GPU memory quota of the memory tracer. 58 | We provide ways to adjust the CPU and GPU memory usage quota for the memory tracer. These settings are not exposed as command-line parameters. As shown in pretrain_demo.py, there is a JSON config for the memory tracer; you can adjust the four values with a ratio suffix. 59 | 60 | `warmup_gpu_chunk_mem_ratio`: the max fraction of a GPU's memory that can be used for chunks during the warmup iteration. 61 | 62 | `overall_gpu_mem_ratio`: the available gpu mem size / real gpu mem capacity. Turn up the value if you meet CPU or GPU OOM during iterations. 63 | 64 | `overall_cpu_mem_ratio`: the available cpu mem size / real cpu mem capacity. Turn up the value if you meet CPU or GPU OOM during iterations. 65 | 66 | `margin_use_ratio`: the fraction of the remaining GPU space (excluding the peak chunk-used space after the warmup FWD+BWD) used to host optimizer states on the GPU. 67 | 68 | `use_fake_dist`: a debug flag to simulate multiple GPUs on one GPU. It was useful when we lacked multi-GPU machines; it is now deprecated. 69 | 70 | ``` 71 | "mem_tracer": { 72 | "use_async_mem_monitor": args.with_async_mem_monitor, 73 | "warmup_gpu_chunk_mem_ratio": 0.1, 74 | "overall_gpu_mem_ratio": 0.8, 75 | "overall_cpu_mem_ratio": 0.8, 76 | "margin_use_ratio": 0.8, 77 | "use_fake_dist": False, 78 | "with_static_partition": args.with_static_partition, 79 | }, 80 | ``` 81 | --------------------------------------------------------------------------------
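To make the ratio knobs concrete, the arithmetic sketch below shows how they translate into byte budgets. It is illustrative only: the function name, the explicit capacity arguments, and the `peak_chunk_mem` input are assumptions based on the descriptions above, not PatrickStar's actual implementation, which also relies on runtime memory sampling.

```python
def tracer_budgets(mem_tracer_cfg, gpu_capacity, cpu_capacity, peak_chunk_mem):
    """Turn the mem_tracer ratio knobs into byte budgets (simplified sketch).

    All capacity arguments are in bytes; peak_chunk_mem is the peak
    chunk-used GPU space observed after the warmup FWD+BWD.
    """
    # Memory the tracer may use overall on each device.
    usable_gpu = gpu_capacity * mem_tracer_cfg["overall_gpu_mem_ratio"]
    usable_cpu = cpu_capacity * mem_tracer_cfg["overall_cpu_mem_ratio"]
    # Upper bound on GPU memory occupied by chunks during the warmup iteration.
    warmup_chunk_budget = gpu_capacity * mem_tracer_cfg["warmup_gpu_chunk_mem_ratio"]
    # Share of the remaining GPU space (after peak chunk usage) that may
    # host optimizer states.
    margin_for_os = (usable_gpu - peak_chunk_mem) * mem_tracer_cfg["margin_use_ratio"]
    return {
        "usable_gpu": usable_gpu,
        "usable_cpu": usable_cpu,
        "warmup_chunk_budget": warmup_chunk_budget,
        "margin_for_os": margin_for_os,
    }


cfg = {
    "warmup_gpu_chunk_mem_ratio": 0.1,
    "overall_gpu_mem_ratio": 0.8,
    "overall_cpu_mem_ratio": 0.8,
    "margin_use_ratio": 0.8,
}
# A hypothetical 40 GB GPU / 512 GB host with 16 GB of peak chunk memory.
budgets = tracer_budgets(cfg, 40 * 2**30, 512 * 2**30, 16 * 2**30)
```

Under these example numbers, chunks may use at most 4 GB of GPU memory during warmup, and 12.8 GB of the 32 GB usable GPU memory remains available for optimizer states.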