├── LICENSE.txt
├── Notice.txt
├── README.md
├── README_CN.md
├── inference
│   ├── openapi.sh
│   └── run_serve.sh
└── train
    ├── ds_zero2_no_offload.json
    ├── ds_zero3_no_offload.json
    ├── ds_zero3_offload.json
    ├── ds_zero3_offload_no_auto.json
    ├── example_data.jsonl
    ├── merge_lora_weight.py
    ├── merge_lora_weight.sh
    ├── requirements.txt
    ├── train.py
    └── train.sh
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT 2 | Tencent-Hunyuan-7B Release Date: January 24, 2025 3 | THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW. 4 | By clicking to agree or by using, reproducing, modifying, distributing, performing or displaying any portion or element of the Tencent Hunyuan Works, including via any Hosted Service, You will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately. 5 | 1. DEFINITIONS. 6 | a. “Acceptable Use Policy” shall mean the policy made available by Tencent as set forth in the Exhibit A. 7 | b. “Agreement” shall mean the terms and conditions for use, reproduction, distribution, modification, performance and displaying of Tencent Hunyuan Works or any portion or element thereof set forth herein. 8 | c. “Documentation” shall mean the specifications, manuals and documentation for Tencent Hunyuan made publicly available by Tencent. 9 | d. “Hosted Service” shall mean a hosted service offered via an application programming interface (API), web access, or any other electronic or remote means. 10 | e. “Licensee,” “You” or “Your” shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Tencent Hunyuan Works for any purpose and in any field of use. 11 | f. 
“Materials” shall mean, collectively, Tencent’s proprietary Tencent Hunyuan and Documentation (and any portion thereof) as made available by Tencent under this Agreement. 12 | g. “Model Derivatives” shall mean all: (i) modifications to Tencent Hunyuan or any Model Derivative of Tencent Hunyuan; (ii) works based on Tencent Hunyuan or any Model Derivative of Tencent Hunyuan; or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Tencent Hunyuan or any Model Derivative of Tencent Hunyuan, to that model in order to cause that model to perform similarly to Tencent Hunyuan or a Model Derivative of Tencent Hunyuan, including distillation methods, methods that use intermediate data representations, or methods based on the generation of synthetic data Outputs by Tencent Hunyuan or a Model Derivative of Tencent Hunyuan for training that model. For clarity, Outputs by themselves are not deemed Model Derivatives. 13 | h. “Output” shall mean the information and/or content output of Tencent Hunyuan or a Model Derivative that results from operating or otherwise using Tencent Hunyuan or a Model Derivative, including via a Hosted Service. 14 | i. “Tencent,” “We” or “Us” shall mean THL A29 Limited. 15 | j. “Tencent Hunyuan” shall mean the large language models, text/image/video/audio/3D generation models, and multimodal large language models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing made publicly available by Us, including, without limitation to, Tencent-Hunyuan-7B released at [https://github.com/Tencent/Tencent-Hunyuan-7B]. 16 | k. “Tencent Hunyuan Works” shall mean: (i) the Materials; (ii) Model Derivatives; and (iii) all derivative works thereof. 17 | l. 
“Territory” shall mean the worldwide territory, excluding the territory of the European Union, United Kingdom and South Korea. 18 | m. “Third Party” or “Third Parties” shall mean individuals or legal entities that are not under common control with Us or You. 19 | n. “including” shall mean including but not limited to. 20 | 2. GRANT OF RIGHTS. 21 | We grant You, for the Territory only, a non-exclusive, non-transferable and royalty-free limited license under Tencent’s intellectual property or other rights owned by Us embodied in or utilized by the Materials to use, reproduce, distribute, create derivative works of (including Model Derivatives), and make modifications to the Materials, only in accordance with the terms of this Agreement and the Acceptable Use Policy, and You must not violate (or encourage or permit anyone else to violate) any term of this Agreement or the Acceptable Use Policy. 22 | 3. DISTRIBUTION. 23 | You may, subject to Your compliance with this Agreement, distribute or make available to Third Parties the Tencent Hunyuan Works, exclusively in the Territory, provided that You meet all of the following conditions: 24 | a. You must provide all such Third Party recipients of the Tencent Hunyuan Works or products or services using them a copy of this Agreement; 25 | b. You must cause any modified files to carry prominent notices stating that You changed the files; 26 | c. You are encouraged to: (i) publish at least one technology introduction blogpost or one public statement expressing Your experience of using the Tencent Hunyuan Works; and (ii) mark the products or services developed by using the Tencent Hunyuan Works to indicate that the product/service is “Powered by Tencent Hunyuan”; and 27 | d. 
All distributions to Third Parties (other than through a Hosted Service) must be accompanied by a “Notice” text file that contains the following notice: “Tencent Hunyuan is licensed under the Tencent Hunyuan Community License Agreement, Copyright © 2024 Tencent. All Rights Reserved. The trademark rights of “Tencent Hunyuan” are owned by Tencent or its affiliate.” 28 | You may add Your own copyright statement to Your modifications and, except as set forth in this Section and in Section 5, may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Model Derivatives as a whole, provided Your use, reproduction, modification, distribution, performance and display of the work otherwise complies with the terms and conditions of this Agreement (including as regards the Territory). If You receive Tencent Hunyuan Works from a Licensee as part of an integrated end user product, then this Section 3 of this Agreement will not apply to You. 29 | 4. ADDITIONAL COMMERCIAL TERMS. 30 | If, on the Tencent Hunyuan version release date, the monthly active users of all products or services made available by or for Licensee is greater than 100 million monthly active users in the preceding calendar month, You must request a license from Tencent, which Tencent may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until Tencent otherwise expressly grants You such rights. 31 | 5. RULES OF USE. 32 | a. Your use of the Tencent Hunyuan Works must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Tencent Hunyuan Works, which is hereby incorporated by reference into this Agreement. You must include the use restrictions referenced in these Sections 5(a) and 5(b) as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) 
governing the use and/or distribution of Tencent Hunyuan Works and You must provide notice to subsequent users to whom You distribute that Tencent Hunyuan Works are subject to the use restrictions in these Sections 5(a) and 5(b). 33 | b. You must not use the Tencent Hunyuan Works or any Output or results of the Tencent Hunyuan Works to improve any other AI model (other than Tencent Hunyuan or Model Derivatives thereof). 34 | c. You must not use, reproduce, modify, distribute, or display the Tencent Hunyuan Works, Output or results of the Tencent Hunyuan Works outside the Territory. Any such use outside the Territory is unlicensed and unauthorized under this Agreement. 35 | 6. INTELLECTUAL PROPERTY. 36 | a. Subject to Tencent’s ownership of Tencent Hunyuan Works made by or for Tencent and intellectual property rights therein, conditioned upon Your compliance with the terms and conditions of this Agreement, as between You and Tencent, You will be the owner of any derivative works and modifications of the Materials and any Model Derivatives that are made by or for You. 37 | b. No trademark licenses are granted under this Agreement, and in connection with the Tencent Hunyuan Works, Licensee may not use any name or mark owned by or associated with Tencent or any of its affiliates, except as required for reasonable and customary use in describing and distributing the Tencent Hunyuan Works. Tencent hereby grants You a license to use “Tencent Hunyuan” (the “Mark”) in the Territory solely as required to comply with the provisions of Section 3(c), provided that You comply with any applicable laws related to trademark protection. All goodwill arising out of Your use of the Mark will inure to the benefit of Tencent. 38 | c. 
If You commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any person or entity alleging that the Materials or any Output, or any portion of any of the foregoing, infringe any intellectual property or other right owned or licensable by You, then all licenses granted to You under this Agreement shall terminate as of the date such lawsuit or other proceeding is filed. You will defend, indemnify and hold harmless Us from and against any claim by any Third Party arising out of or related to Your or the Third Party’s use or distribution of the Tencent Hunyuan Works. 39 | d. Tencent claims no rights in Outputs You generate. You and Your users are solely responsible for Outputs and their subsequent uses. 40 | 7. DISCLAIMERS OF WARRANTY AND LIMITATIONS OF LIABILITY. 41 | a. We are not obligated to support, update, provide training for, or develop any further version of the Tencent Hunyuan Works or to grant any license thereto. 42 | b. UNLESS AND ONLY TO THE EXTENT REQUIRED BY APPLICABLE LAW, THE TENCENT HUNYUAN WORKS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED “AS IS” WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES OF ANY KIND INCLUDING ANY WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, COURSE OF DEALING, USAGE OF TRADE, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING, MODIFYING, PERFORMING, DISPLAYING OR DISTRIBUTING ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR OR A THIRD PARTY’S USE OR DISTRIBUTION OF ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND PERMISSIONS UNDER THIS AGREEMENT. 43 | c. 
TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL TENCENT OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, FOR ANY DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL OR PUNITIVE DAMAGES, OR LOST PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS, EVEN IF TENCENT OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. 44 | 8. SURVIVAL AND TERMINATION. 45 | a. The term of this Agreement shall commence upon Your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. 46 | b. We may terminate this Agreement if You breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, You must promptly delete and cease use of the Tencent Hunyuan Works. Sections 6(a), 6(c), 7 and 9 shall survive the termination of this Agreement. 47 | 9. GOVERNING LAW AND JURISDICTION. 48 | a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of the Hong Kong Special Administrative Region of the People’s Republic of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement. 49 | b. Exclusive jurisdiction and venue for any dispute arising out of or relating to this Agreement will be a court of competent jurisdiction in the Hong Kong Special Administrative Region of the People’s Republic of China, and Tencent and Licensee consent to the exclusive jurisdiction of such court with respect to any such dispute. 50 | 51 | EXHIBIT A 52 | ACCEPTABLE USE POLICY 53 | 54 | Tencent reserves the right to update this Acceptable Use Policy from time to time. 
55 | Last modified: November 5, 2024 56 | 57 | Tencent endeavors to promote safe and fair use of its tools and features, including Tencent Hunyuan. You agree not to use Tencent Hunyuan or Model Derivatives: 58 | 1. Outside the Territory; 59 | 2. In any way that violates any applicable national, federal, state, local, international or any other law or regulation; 60 | 3. To harm Yourself or others; 61 | 4. To repurpose or distribute output from Tencent Hunyuan or any Model Derivatives to harm Yourself or others; 62 | 5. To override or circumvent the safety guardrails and safeguards We have put in place; 63 | 6. For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; 64 | 7. To generate or disseminate verifiably false information and/or content with the purpose of harming others or influencing elections; 65 | 8. To generate or facilitate false online engagement, including fake reviews and other means of fake online engagement; 66 | 9. To intentionally defame, disparage or otherwise harass others; 67 | 10. To generate and/or disseminate malware (including ransomware) or any other content to be used for the purpose of harming electronic systems; 68 | 11. To generate or disseminate personal identifiable information with the purpose of harming others; 69 | 12. To generate or disseminate information (including images, code, posts, articles), and place the information in any public context (including –through the use of bot generated tweets), without expressly and conspicuously identifying that the information and/or content is machine generated; 70 | 13. To impersonate another individual without consent, authorization, or legal right; 71 | 14. 
To make high-stakes automated decisions in domains that affect an individual’s safety, rights or wellbeing (e.g., law enforcement, migration, medicine/health, management of critical infrastructure, safety components of products, essential services, credit, employment, housing, education, social scoring, or insurance); 72 | 15. In a manner that violates or disrespects the social ethics and moral standards of other countries or regions; 73 | 16. To perform, facilitate, threaten, incite, plan, promote or encourage violent extremism or terrorism; 74 | 17. For any use intended to discriminate against or harm individuals or groups based on protected characteristics or categories, online or offline social behavior or known or predicted personal or personality characteristics; 75 | 18. To intentionally exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm; 76 | 19. For military purposes; 77 | 20. To engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or other professional practices. 78 | -------------------------------------------------------------------------------- /Notice.txt: -------------------------------------------------------------------------------- 1 | Usage and Legal Notices: 2 | 3 | Tencent is pleased to support the open source community by making Tencent-Hunyuan-7B Dense available. 4 | 5 | Copyright (C) 2025 THL A29 Limited, a Tencent company. All rights reserved. The below software and/or models in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) THL A29 Limited. 
6 | 7 | Tencent-Hunyuan-7B is licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT, which can be found in this repository called "LICENSE", except for the third-party components listed below, which is licensed under different terms. Tencent-Hunyuan-7B does not impose any additional limitations beyond what is outlined in the respective licenses of these third-party components. Users must comply with all terms and conditions of original licenses of these third-party components and must ensure that the usage of the third party components adheres to all relevant laws and regulations. 8 | 9 | 10 | Other dependencies and licenses: 11 | 12 | 13 | Open Source Software Licensed under the Apache License Version 2.0: 14 | The below software in this distribution may have been modified by THL A29 Limited ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2025 THL A29 Limited. 15 | -------------------------------------------------------------------- 16 | 1. VLLM 17 | Copyright (c) vllm original author and authors 18 | Please note this software has been modified by Tencent in this distribution. 19 | 20 | 21 | Terms of the Apache License Version 2.0: 22 | -------------------------------------------------------------------- 23 | Apache License 24 | 25 | Version 2.0, January 2004 26 | 27 | http://www.apache.org/licenses/ 28 | 29 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 30 | 1. Definitions. 31 | 32 | "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. 33 | 34 | "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. 35 | 36 | "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. 
For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. 37 | 38 | "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. 39 | 40 | "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. 41 | 42 | "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. 43 | 44 | "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). 45 | 46 | "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. 
For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." 49 | 50 | "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 51 | 52 | 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 53 | 54 | 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. 
If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 55 | 56 | 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: 57 | 58 | You must give any other recipients of the Work or Derivative Works a copy of this License; and 59 | 60 | You must cause any modified files to carry prominent notices stating that You changed the files; and 61 | 62 | You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and 63 | 64 | If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. 
You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. 65 | 66 | You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 67 | 68 | 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 69 | 70 | 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 71 | 72 | 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 73 | 74 | 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 75 | 76 | 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. 77 | 78 | END OF TERMS AND CONDITIONS -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |
2 | 中文  | English 3 |
4 |
7 |
8 |
11 | 🫣 Hugging Face Hunyuan-7B-Instruct   | 🫣 Hugging Face Hunyuan-7B-Pretrain  
12 |
13 | ## Model Introduction
14 |
15 | The 7B models released by Hunyuan this time, [Hunyuan-7B-Pretrain](https://huggingface.co/tencent/Hunyuan-7B-Pretrain) and [Hunyuan-7B-Instruct](https://huggingface.co/tencent/Hunyuan-7B-Instruct), use an improved data mix and training recipe and deliver strong performance, striking a good balance between compute cost and capability. They stand out among language models of many scales and are currently among the strongest Chinese 7B dense models.
16 |
17 | ### Introduction to Technical Advantages
18 |
19 | #### Model
20 |
21 | - Extends the long-context capability to 256K tokens and uses Grouped Query Attention (GQA)
22 |
23 | #### Inference Framework
24 | - This open-source release offers two inference backend options tailored for the Hunyuan-7B model: the popular [vLLM-backend](https://github.com/quinnrong94/vllm/tree/dev_hunyuan) and the TensorRT-LLM Backend. In this release, we are initially open-sourcing the vLLM solution, with plans to release the TRT-LLM solution in the near future.
25 |
26 | #### Training Framework
27 | - The Hunyuan-7B open-source model is fully compatible with the Hugging Face format, enabling researchers and developers to fine-tune the model using the hf-deepspeed framework. Learn more: [Tencent-Hunyuan-Large](https://github.com/Tencent/Tencent-Hunyuan-Large).
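The `train/` directory ships ready-made DeepSpeed configs (`ds_zero2_no_offload.json`, `ds_zero3_no_offload.json`, `ds_zero3_offload.json`, `ds_zero3_offload_no_auto.json`). As a rough sketch of what such a file contains (the values below are illustrative DeepSpeed options, not copied from this repository), a ZeRO-3 config without offload might look like:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

The `"auto"` values let the Hugging Face Trainer integration fill in sizes from its own arguments; see the shipped JSON files for the settings actually used here.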
28 |
29 |
30 |
31 | ## Related News
32 | * 2025.1.24 We open-sourced **Hunyuan-7B-Pretrain** and **Hunyuan-7B-Instruct** on Hugging Face.
33 |
34 |
35 |
36 | ## Benchmark
37 |
38 | Note: The following benchmarks were evaluated with the TRT-LLM backend.
39 |
40 | **Hunyuan-7B-Pretrain**
41 |
42 | | | Qwen2.5-7B | Llama3-8B | OLMO2-7B | HunYuan-7B-V2 |
43 | |------------------|------------|------------|----------|---------------|
44 | | MMLU | 74.26 | 66.95 | 63.7 | **75.37** |
45 | | MMLU-Pro | 46.17 | 34.04 | 31 | **47.54** |
46 | | MMLU-CF | **61.01** | 55.21 | 52.94 | 59.62 |
47 | | MMLU-Redux | 73.47 | 66.44 | 63.74 | **74.54** |
48 | | BBH | 70.4 | 62.16 | 38.01 | **70.77** |
49 | | HellaSwag | 75.82 | 78.24 | 61.97 | **80.77** |
50 | | WinoGrande | 69.69 | 73.64 | **74.43** | 71.51 |
51 | | PIQA | 79.33 | 80.52 | **80.63** | 81.45 |
52 | | SIQA | 77.48 | 61.05 | 65.2 | **79.73** |
53 | | NaturalQuestions | 31.77 | 35.43 | **36.9** | 33.52 |
54 | | DROP | 68.2 | 60.13 | 60.8 | **68.63** |
55 | | ARC-C | 91.64 | 77.59 | 74.92 | **91.97** |
56 | | TriviaQA | 69.31 | **78.61** | 78 | 74.31 |
57 | | Chinese-SimpleQA | 30.37 | 19.4 | 7.35 | **30.51** |
58 | | SimpleQA | 4.98 | **7.68** | 4.51 | 3.73 |
59 | | CMMLU | 81.39 | 50.25 | 38.79 | **82.19** |
60 | | C-Eval | 81.11 | 50.4 | 38.53 | **82.12** |
61 | | C3 | 71.77 | 61.5 | 54 | **79.07** |
62 | | GSM8K | 82.71 | 57.54 | 67.5 | **93.33** |
63 | | MATH | 49.6 | 18.45 | 19 | **62.15** |
64 | | CMATH | 84.33 | 52.83 | 44 | **88.5** |
65 | | HumanEval | 57.93 | 35.98 | 15.24 | **59.15** |
66 |
67 |
68 |
69 |
70 | **Hunyuan-7B-Instruct**
71 |
72 | | Model | Qwen2.5-7B-Instruct | Llama-3-8B-Instruct | OLMo-2-1124-7B-DPO | Hunyuan-7B-Instruct |
73 | |-------------|---------------------|---------------------|--------------------|-------------------|
74 | | ARC-C | **89.83** | 82.4 | - | 88.81 |
75 | | BBH | 66.24 | - | 46.6 | **76.47** |
76 | | CEval | 76.82 | - | - | **81.8** |
77 | | CMMLU | 78.55 | - | - | **82.29** |
78 | | DROP_F1 | 80.63 | - | 60.5 | **82.96** |
79 | | GPQA | 36.87 | 34.6 | - | **47.98** |
80 | | Gsm8k | 80.14 | 80.6 | 85.1 | **90.14** |
81 | | HellaSwag | 83.34 | - | - | **86.57** |
82 | | HumanEval | **84.8** | 60.4 | - | 84.0 |
83 | | MATH | **72.86** | - | 32.5 | 70.64 |
84 | | MMLU | 72.36 | 68.5 | 61.3 | **79.18** |
85 |
86 |
87 |
88 | ## Quick Start
89 |
90 | You can refer to [Tencent-Hunyuan-Large](https://github.com/Tencent/Tencent-Hunyuan-Large) to get started quickly. The training and inference code can use the versions provided in this GitHub repository.
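The `train/` directory includes an `example_data.jsonl` showing the expected fine-tuning data format. As a hedged sketch (the field names below are an assumption based on common chat-SFT layouts, not taken from this repository), such a file stores one JSON object per line with a list of role/content messages:

```python
import json

# Hypothetical record layout: one JSON object per JSONL line.
sample = {
    "messages": [
        {"role": "user", "content": "Briefly introduce the Hunyuan-7B model."},
        {"role": "assistant", "content": "Hunyuan-7B is a 7B dense language model released by Tencent."},
    ]
}

line = json.dumps(sample, ensure_ascii=False)  # serialize one training record
record = json.loads(line)                      # round-trip check
```

Consult `train/example_data.jsonl` for the exact schema the training script expects.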
91 |
92 | ### Docker
93 |
94 | To simplify the deployment process, HunyuanLLM provides a pre-built Docker image:
95 |
96 | [hunyuaninfer/hunyuan-large:dense-infer-open-source](https://hub.docker.com/layers/hunyuaninfer/hunyuan-large/dense-infer-open-source/images/sha256-3a39561d8262dac04fcb46e7860663158909720b76a28b94a54eb852524ae6a4).
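As a sketch, pulling and starting a container from this image might look like the following; the GPU flags, port mapping, and model mount path are illustrative assumptions, not commands documented in this repository:

```shell
IMAGE=hunyuaninfer/hunyuan-large:dense-infer-open-source

# Pull the pre-built inference image (tag taken from the link above).
docker pull "$IMAGE"

# Illustrative run command: GPU access, port, and mount point are assumptions.
docker run --rm --gpus all -p 8000:8000 \
    -v /path/to/models:/models \
    "$IMAGE"
```

See `inference/run_serve.sh` in this repository for the actual serving entry point.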
97 |
98 | ### Inference Performance
99 |
100 | This section presents the efficiency test results of deploying the model with vLLM, including inference speed (tokens/s) under different batch sizes.
101 |
102 | | Inference Framework | Model | Number of GPUs (GPU productA) | input_length | batch=1 | batch=4 |
103 | |------|------------|-------------------------|-------------------------|---------------------|----------------------|
104 | | vLLM | hunyuan-7B | 1 | 2048 | 78.9 | 279.5 |
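The tokens/s numbers above are aggregate throughput: total generated tokens across all sequences in the batch divided by wall-clock time. A minimal helper for computing the same metric on your own runs (a sketch; all names and the example figures are illustrative, not taken from the benchmark above):

```python
def tokens_per_second(generated_tokens: int, elapsed_seconds: float) -> float:
    """Aggregate decode throughput: new tokens across the whole batch / wall time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return generated_tokens / elapsed_seconds

# e.g. 4 concurrent sequences, 512 new tokens each, finished in 7.33 s of wall time
print(round(tokens_per_second(4 * 512, 7.33), 1))
```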
105 |
106 | ## Contact Us
107 |
108 | If you would like to leave a message for our R&D and product teams, you are welcome to contact our open-source team. You can also reach us via email (hunyuan_opensource@tencent.com).
--------------------------------------------------------------------------------
/README_CN.md:
--------------------------------------------------------------------------------
1 |
2 | English | 中文
4 |
7 |
8 |
11 | 🫣 Hugging Face  
12 |
13 | ## Model Introduction
14 |
15 | The 7B models in this Hunyuan release, [Hunyuan-7B-Pretrain](https://huggingface.co/tencent/Hunyuan-7B-Pretrain) and [Hunyuan-7B-Instruct](https://huggingface.co/tencent/Hunyuan-7B-Instruct), use an improved data mix and training recipe. They deliver strong performance and strike a good balance between compute and capability, standing out among language models of many sizes as one of the strongest Chinese 7B dense models currently available.
16 | ### Technical Advantages
17 |
18 | #### Model
19 |
20 | - Uses GQA while extending long-context capability to 256K tokens.
21 |
22 | #### Inference Framework
23 | - The model supports the TRT-LLM-backend and [vLLM-backend](https://github.com/quinnrong94/vllm/tree/dev_hunyuan) inference frameworks. The vLLM framework is open-sourced first; TRT-LLM support will be released soon.
24 |
25 | #### Training Framework
26 | - The open-source Hunyuan-7B models support the Hugging Face format, so users can fine-tune them with the hf-deepspeed framework. See [Tencent-Hunyuan-Large](https://github.com/Tencent/Tencent-Hunyuan-Large) for details.
27 |
28 |
29 |
30 | ## News
31 | * 2025.1 We open-sourced **Hunyuan-7B-Pretrain** and **Hunyuan-7B-Instruct** on Hugging Face.
32 |
33 |
34 |
35 | ## Benchmark
36 |
37 | Note: all benchmarks below were evaluated with the TRT-LLM-backend.
38 | **Hunyuan-7B-Pretrain**
39 |
40 | | | Qwen2.5-7B | Llama3-8B | OLMO2-7B | HunYuan-7B-V2 |
41 | |------------------|------------|------------|----------|---------------|
42 | | MMLU | 74.26 | 66.95 | 63.7 | **75.37** |
43 | | MMLU-Pro | 46.17 | 34.04 | 31 | **47.54** |
44 | | MMLU-CF | **61.01** | 55.21 | 52.94 | 59.62 |
45 | | MMLU-Redux | 73.47 | 66.44 | 63.74 | **74.54** |
46 | | BBH | 70.4 | 62.16 | 38.01 | **70.77** |
47 | | HellaSwag | 75.82 | 78.24 | 61.97 | **80.77** |
48 | | WinoGrande | 69.69 | 73.64 | **74.43** | 71.51 |
49 | | PIQA             | 79.33      | 80.52      | 80.63    | **81.45**     |
50 | | SIQA | 77.48 | 61.05 | 65.2 | **79.73** |
51 | | NaturalQuestions | 31.77 | 35.43 | **36.9** | 33.52 |
52 | | DROP | 68.2 | 60.13 | 60.8 | **68.63** |
53 | | ARC-C | 91.64 | 77.59 | 74.92 | **91.97** |
54 | | TriviaQA | 69.31 | **78.61** | 78 | 74.31 |
55 | | Chinese-SimpleQA | 30.37 | 19.4 | 7.35 | **30.51** |
56 | | SimpleQA | 4.98 | **7.68** | 4.51 | 3.73 |
57 | | CMMLU | 81.39 | 50.25 | 38.79 | **82.19** |
58 | | C-Eval | 81.11 | 50.4 | 38.53 | **82.12** |
59 | | C3 | 71.77 | 61.5 | 54 | **79.07** |
60 | | GSM8K | 82.71 | 57.54 | 67.5 | **93.33** |
61 | | MATH | 49.6 | 18.45 | 19 | **62.15** |
62 | | CMATH | 84.33 | 52.83 | 44 | **88.5** |
63 | | HumanEval | 57.93 | 35.98 | 15.24 | **59.15** |
64 |
65 |
66 |
67 |
68 | **Hunyuan-7B-Instruct**
69 |
70 | | Benchmark | Qwen2.5-7B-Instruct | Llama-3-8B-Instruct | OLMo-2-1124-7B-DPO | Hunyuan-7B-Instruct |
71 | |-------------|---------------------|---------------------|--------------------|-------------------|
72 | | ARC-C | **89.83** | 82.4 | - | 88.81 |
73 | | BBH | 66.24 | - | 46.6 | **76.47** |
74 | | CEval | 76.82 | - | - | **81.8** |
75 | | CMMLU | 78.55 | - | - | **82.29** |
76 | | DROP_F1 | 80.63 | - | 60.5 | **82.96** |
77 | | GPQA | 36.87 | 34.6 | - | **47.98** |
78 | | Gsm8k | 80.14 | 80.6 | 85.1 | **90.14** |
79 | | HellaSwag | 83.34 | - | - | **86.57** |
80 | | HumanEval | **84.8** | 60.4 | - | 84.0 |
81 | | MATH | **72.86** | - | 32.5 | 70.64 |
82 | | MMLU | 72.36 | 68.5 | 61.3 | **79.18** |
83 |
84 |
85 |
86 | ## Quick Start
87 |
88 | You can refer to [Tencent-Hunyuan-Large](https://github.com/Tencent/Tencent-Hunyuan-Large) to get started quickly. For training and inference, use the code provided in this GitHub repository.
89 |
90 | ## Docker
91 |
92 | To simplify the deployment process, HunyuanLLM provides a pre-built Docker image:
93 | [hunyuaninfer/hunyuan-large:dense-infer-open-source](https://hub.docker.com/layers/hunyuaninfer/hunyuan-large/dense-infer-open-source/images/sha256-3a39561d8262dac04fcb46e7860663158909720b76a28b94a54eb852524ae6a4).
94 |
95 | ### Inference Performance
96 |
97 | This section presents throughput measurements for models deployed with vLLM, reporting inference speed (tokens/s) at different batch sizes.
98 |
99 | | Inference Framework | Model | Number of GPUs (GPU productA) | input_length | batch=1 | batch=4 |
100 | |------|-----------------------------|-----------|-------------------------|---------------------|----------------------|
101 | | vLLM | hunyuan-7B | 1 | 2048 | 78.9 | 279.5 |
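Reading the table above, batching raises aggregate throughput roughly 3.5×, while each individual sequence decodes somewhat slower than at batch 1; this can be computed directly from the two table values (a small sketch):

```python
batch1_tps, batch4_tps = 78.9, 279.5  # tokens/s figures from the table above

speedup = batch4_tps / batch1_tps  # aggregate gain from batching
per_seq = batch4_tps / 4           # per-sequence tokens/s at batch=4
print(round(speedup, 2), round(per_seq, 2))
```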
102 |
103 | ## Contact Us
104 | If you would like to leave a message for our R&D and product teams, you are welcome to contact the Tencent Hunyuan LLM team. You can also reach us via email (hunyuan_opensource@tencent.com).
105 |
--------------------------------------------------------------------------------
/inference/openapi.sh:
--------------------------------------------------------------------------------
1 | curl http://${LOCAL_IP}:8020/v1/chat/completions -H 'Content-Type: application/json' -d '{
2 | "model": "'"${MODEL_PATH}"'",
3 | "messages": [
4 | {
5 | "role": "system",
6 | "content": [{"type": "text", "text": "You are a helpful assistant."}]
7 | },
8 | {
9 | "role": "user",
10 | "content": [{"type": "text", "text": "请按面积大小对四大洋进行排序,并给出面积最小的洋是哪一个?直接输出结果。"}]
11 | }
12 | ],
13 | "max_tokens": 2048,
14 | "temperature":0.7,
15 | "top_p": 0.6,
16 | "top_k": 20,
17 | "repetition_penalty": 1.05,
18 | "stop_token_ids": [127960]
19 | }'
--------------------------------------------------------------------------------
/inference/run_serve.sh:
--------------------------------------------------------------------------------
1 | MODEL_PATH=${MODEL_PATH}
2 |
3 | export TP_SOCKET_IFNAME=eth1
4 | # export VLLM_LOGGING_LEVEL=DEBUG
5 | export NCCL_SOCKET_IFNAME=eth1
6 | export GLOO_SOCKET_IFNAME=eth1
7 |
8 | export VLLM_HOST_IP=$LOCAL_IP
9 |
10 | python3 -m vllm.entrypoints.openai.api_server \
11 | --host ${LOCAL_IP} \
12 | --port 8020 \
13 | --trust-remote-code \
14 | --model ${MODEL_PATH} \
15 | --distributed-executor-backend ray \
16 | --disable-custom-all-reduce \
17 | --gpu_memory_utilization 0.92 \
18 | --tensor-parallel-size 1 \
19 | --pipeline-parallel-size 1 \
20 | --dtype bfloat16 \
21 | --disable-log-stats \
22 | --max-num-seqs 8 \
23 | --enforce-eager \
24 | --use-v2-block-manager \
25 | 2>&1 | tee log_server.txt
--------------------------------------------------------------------------------
/train/ds_zero2_no_offload.json:
--------------------------------------------------------------------------------
1 | {
2 | "fp16": {
3 | "enabled": "auto",
4 | "loss_scale": 0,
5 | "loss_scale_window": 100,
6 | "initial_scale_power": 16,
7 | "hysteresis": 2,
8 | "min_loss_scale": 1e-10
9 | },
10 | "zero_optimization": {
11 | "stage": 2,
12 | "allgather_partitions": true,
13 | "allgather_bucket_size": 1e8,
14 | "overlap_comm": true,
15 | "reduce_scatter": true,
16 | "reduce_bucket_size": 1e8,
17 | "contiguous_gradients": true
18 | },
19 | "gradient_accumulation_steps": "auto",
20 | "gradient_clipping": "auto",
21 | "steps_per_print": 10,
22 | "train_batch_size": "auto",
23 | "train_micro_batch_size_per_gpu": "auto",
24 | "wall_clock_breakdown": false
25 | }
--------------------------------------------------------------------------------
/train/ds_zero3_no_offload.json:
--------------------------------------------------------------------------------
1 | {
2 | "fp16": {
3 | "enabled": "auto",
4 | "loss_scale": 0,
5 | "loss_scale_window": 1000,
6 | "initial_scale_power": 16,
7 | "hysteresis": 2,
8 | "min_loss_scale": 1
9 | },
10 | "bf16": {
11 | "enabled": "auto"
12 | },
13 |
14 | "zero_optimization": {
15 | "stage": 3,
16 | "offload_optimizer": {
17 | "device": "none",
18 | "pin_memory": true
19 | },
20 | "offload_param": {
21 | "device": "none",
22 | "pin_memory": true
23 | },
24 | "overlap_comm": true,
25 | "contiguous_gradients": true,
26 | "sub_group_size": 1e9,
27 | "reduce_bucket_size": "auto",
28 | "stage3_prefetch_bucket_size": "auto",
29 | "stage3_param_persistence_threshold": "auto",
30 | "stage3_max_live_parameters": 1e9,
31 | "stage3_max_reuse_distance": 1e9,
32 | "stage3_gather_16bit_weights_on_model_save": true
33 | },
34 |
35 | "gradient_accumulation_steps": "auto",
36 | "gradient_clipping": "auto",
37 | "steps_per_print": 10,
38 | "train_batch_size": "auto",
39 | "train_micro_batch_size_per_gpu": "auto",
40 | "wall_clock_breakdown": false
41 | }
--------------------------------------------------------------------------------
/train/ds_zero3_offload.json:
--------------------------------------------------------------------------------
1 | {
2 | "fp16": {
3 | "enabled": "auto",
4 | "loss_scale": 0,
5 | "loss_scale_window": 1000,
6 | "initial_scale_power": 16,
7 | "hysteresis": 2,
8 | "min_loss_scale": 1
9 | },
10 | "bf16": {
11 | "enabled": "auto"
12 | },
13 |
14 | "zero_optimization": {
15 | "stage": 3,
16 | "offload_optimizer": {
17 | "device": "cpu",
18 | "pin_memory": true
19 | },
20 | "offload_param": {
21 | "device": "cpu",
22 | "pin_memory": true
23 | },
24 | "overlap_comm": true,
25 | "contiguous_gradients": true,
26 | "sub_group_size": 1e9,
27 | "reduce_bucket_size": "auto",
28 | "stage3_prefetch_bucket_size": "auto",
29 | "stage3_param_persistence_threshold": "auto",
30 | "stage3_max_live_parameters": 1e9,
31 | "stage3_max_reuse_distance": 1e9,
32 | "stage3_gather_16bit_weights_on_model_save": true
33 | },
34 |
35 | "gradient_accumulation_steps": "auto",
36 | "gradient_clipping": "auto",
37 | "steps_per_print": 10,
38 | "train_batch_size": "auto",
39 | "train_micro_batch_size_per_gpu": "auto",
40 | "wall_clock_breakdown": false
41 | }
--------------------------------------------------------------------------------
/train/ds_zero3_offload_no_auto.json:
--------------------------------------------------------------------------------
1 | {
2 | "fp16": {
3 | "enabled": false,
4 | "loss_scale": 0,
5 | "loss_scale_window": 1000,
6 | "initial_scale_power": 16,
7 | "hysteresis": 2,
8 | "min_loss_scale": 1
9 | },
10 | "bf16": {
11 | "enabled": true
12 | },
13 |
14 | "zero_optimization": {
15 | "stage": 3,
16 | "offload_optimizer": {
17 | "device": "cpu",
18 | "pin_memory": true
19 | },
20 | "overlap_comm": true,
21 | "contiguous_gradients": true,
22 | "sub_group_size": 1e9,
23 | "reduce_bucket_size": 1e8,
24 | "stage3_prefetch_bucket_size": 1e8,
25 | "stage3_param_persistence_threshold": 0,
26 | "stage3_max_live_parameters": 1e8,
27 | "stage3_max_reuse_distance": 1e8,
28 | "stage3_gather_16bit_weights_on_model_save": true
29 | },
30 |
31 | "gradient_accumulation_steps": 1,
32 | "gradient_clipping": "auto",
33 | "steps_per_print": 10,
34 | "train_micro_batch_size_per_gpu": 1,
35 | "wall_clock_breakdown": false
36 | }
--------------------------------------------------------------------------------
/train/example_data.jsonl:
--------------------------------------------------------------------------------
1 | {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "能否帮我撰写一个关于环保的议论文开头部分?\n主题:环保\n立场:重要性和我们的责任"}, {"role": "assistant", "content": "在我们的生活中,环保已经成为一个不可忽视的议题。随着科技的发展和人口的增长,我们的环境正在遭受前所未有的压力。森林被砍伐,河流被污染,动植物种类正在消失,全球气候正在发生变化。这些都是我们无法忽视的事实。然而,环保并不仅仅是一个环境问题,它更是一个关乎我们生存和发展的问题。环保的重要性已经超越了任何一个单一的领域,它关乎我们的健康、经济、社会和文化。因此,我们每一个人都有责任和义务去保护我们的环境,这不仅是为了我们自己,也是为了我们的后代和整个地球。"}]}
2 | {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "请描述小红书APP的用户画像,并用表格的形式展示结果。"}, {"role": "assistant", "content": "| 用户特征 | 描述 |\n| -------- | ------------------------------ |\n| 年龄段 | 18-35岁 |\n| 性别 | 女性用户较多(约占70%) |\n| 地域 | 一线城市和二线城市用户占比较高 |\n| 职业 | 白领、创业者、自由职业者等 |\n| 爱好 | 美容、时尚、旅行、美食、健康、生活方式等 |\n| 消费能力 | 中高消费能力用户 |\n| 使用目的 | 阅读内容,分享购物心得,结识相似兴趣的朋友 |\n\n注:以上用户画像仅供参考,实际情况可能会有所不同。"}]}
3 | {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "请根据以下两个句子的描述,判断它们是否对同一情况进行了描述。句子1:“外面下着大雨,人们纷纷打起了伞。” 句子2:“由于雨量较大,人们都撑起了雨伞。”"}, {"role": "assistant", "content": "是的,这两个句子描述的是同一情况。"}]}
4 | {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "你认为游戏开发者应该如何处理游戏中的社交网络集成?"}, {"role": "assistant", "content": "游戏开发者在处理游戏中的社交网络集成时可以考虑以下几点:\n\n1. 以用户为中心:确保集成社交网络功能是为了提高用户体验,让玩家更容易地与朋友互动、分享成就和游戏进度。\n\n2. 保护隐私:在集成社交网络时,务必保护用户的隐私。在收集和使用用户数据时,需要遵循相关法规和用户协议,同时提供明确的隐私政策。\n\n3. 适度推送:在推送游戏动态、邀请和通知时要适度,避免过度打扰用户或侵犯用户隐私。可提供自定义设置,让玩家选择性地接收通知。\n\n4. 社交功能设计:在游戏中加入社交功能,如聊天、组队、比赛等,以增进玩家之间的互动。同时,确保这些功能易用且不影响游戏体验。\n\n5. 社区建设:鼓励玩家在社交平台上建立游戏相关的社区,分享游戏心得、攻略和经验。这将有助于增加游戏的知名度和吸引力。\n\n6. 奖励机制:设定奖励机制,鼓励玩家在社交网络上分享游戏内容、邀请好友等。但要确保奖励机制不会破坏游戏平衡。\n\n7. 持续更新和优化:根据玩家的反馈,不断优化社交网络集成功能,确保其与游戏的其他方面保持一致。\n\n总之,游戏开发者应该在保护用户隐私的前提下,提供便捷的社交网络集成功能,以提高游戏的互动性和吸引力。同时,要关注玩家的反馈,不断优化功能。"}]}
5 | {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "将这段过去进行时的文字改为现在完成进行时:“她们在公园里跳舞。”"}, {"role": "assistant", "content": "她们已经在公园里跳舞了。"}]}
6 | {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "在godaddy上买了一台服务器怎么更改root账户"}, {"role": "assistant", "content": "在GoDaddy上购买的服务器可能是Linux系统的VPS或者是专用服务器。要更改root账户,你可以通过SSH连接到服务器并修改密码。以下是具体步骤:\n\n1. 打开终端(Terminal)或使用SSH客户端(如PuTTY)。\n2. 使用SSH连接到服务器。在终端中输入以下命令:\n\n ```\n ssh root@your_server_ip_address\n ```\n\n 将`your_server_ip_address`替换为你的服务器IP地址。如果需要,你可能还需要输入服务器的SSH端口号。\n \n3. 当系统提示输入密码时,输入你的当前root密码。如果连接成功,你将看到服务器的命令提示符。\n\n4. 要更改root密码,请输入以下命令:\n\n ```\n passwd\n ```\n\n5. 系统将提示你输入新的UNIX密码。输入你想要设置的新密码并按回车键。\n\n6. 系统将要求你重新输入新密码以进行确认。再次输入新密码并按回车键。\n\n7. 如果密码更改成功,系统将显示“密码已成功更新”的消息。\n\n8. 使用`exit`命令断开与服务器的连接。\n\n现在你已经成功更改了root账户的密码。下次使用SSH连接到服务器时,请使用新的密码进行身份验证。"}]}
7 | {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "PHP算法逻辑:如何计算年龄?"}, {"role": "assistant", "content": "要计算年龄,您可以使用以下PHP算法。此算法考虑了生日、当前日期以及闰年的计算。\n\n```php\nfunction calculateAge($birthdate) {\n // 格式为YYYY-MM-DD的生日\n $birthdate = strtotime($birthdate);\n \n // 获取当前日期\n $now = time();\n\n // 计算年龄\n $age = date('Y', $now) - date('Y', $birthdate);\n \n // 如果还没有过生日,减去一年\n if (date('m-d', $now) < date('m-d', $birthdate)) {\n $age--;\n }\n\n return $age;\n}\n\n// 使用\n$birthdate = \"2000-01-01\";\necho \"年龄是: \" . calculateAge($birthdate) . \" 岁\";\n```\n\n这个函数将生日作为输入,输出用户的年龄。首先,它将生日转换为时间戳,然后获取当前的时间戳。接下来,它计算年份之间的差异。如果当前日期仍未超过这一年的生日,这个函数将年龄减少1。\n\n在使用函数时,只需传入出生日期即可,函数会返回相应的年龄。"}]}
8 | {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "给我解释一下什么是胰岛素抵抗? 请详细描述胰岛素抵抗的症状、原因与治疗方法。 胰岛素抵抗会影响一个人的日常生活吗? 有哪些生活习惯改变可以帮助改善胰岛素抵抗?"}, {"role": "assistant", "content": "胰岛素抵抗是指身体对胰岛素的敏感性降低,胰岛素不能有效地发挥作用,导致血糖无法正常进入身体细胞。胰岛素是一种激素,由胰腺产生,其主要作用是帮助身体细胞吸收血糖并将其转化为能量。当胰岛素抵抗发生时,胰腺会分泌更多的胰岛素以维持正常的血糖水平,但长期的胰岛素抵抗可能导致2型糖尿病和其他健康问题的发生。\n\n胰岛素抵抗的症状可能包括:\n\n1. 高血糖:由于胰岛素不能正常工作,血糖无法进入细胞,导致血糖升高。\n2. 疲劳:由于身体无法有效利用血糖产生能量,患者可能会感到疲劳无力。\n3. 肥胖:胰岛素抵抗可能导致过量的胰岛素分泌,使得脂肪存储增加,进而导致体重增加。\n4. 高血压:胰岛素抵抗与高血压之间存在关联,可能导致血压升高。\n5. 黑棘皮病:皮肤出现黑色、粗糙的皮疹,通常出现在颈部、腋窝等皮肤皱褶处。\n\n胰岛素抵抗的原因可能包括:\n\n1. 遗传因素:遗传易感基因可能会增加一个人患胰岛素抵抗的风险。\n2. 肥胖:过重或肥胖会增加患胰岛素抵抗的风险。\n3. 缺乏运动:缺乏运动可能会导致肌肉对胰岛素的敏感性降低。\n4. 高糖饮食:长期摄入高糖食品可能导致胰岛素抵抗。\n5. 睡眠不足:长期睡眠不足可能会影响胰岛素的正常功能。\n\n治疗胰岛素抵抗的方法主要包括生活方式的改变和药物治疗:\n\n1. 减轻体重:通过改善饮食和增加运动来减轻体重,有助于提高身体对胰岛素的敏感性。\n2. 增加运动:增加有氧运动和力量训练,有助于提高肌肉对胰岛素的敏感性。\n3. 改善饮食:摄入低糖、高纤维、富含全谷物和蔬菜水果的食物,有助于控制血糖和减轻体重。\n4. 药物治疗:在必要时,医生可能会开处降糖药物,如二甲双胍,以帮助提高胰岛素敏感性。\n\n胰岛素抵抗会影响一个人的日常生活,包括疲劳、体重增加、血糖波动等问题。通过改变生活习惯,如减轻体重、增加运动、改善饮食和保证充足睡眠,可以帮助改善胰岛素抵抗,降低患2型糖尿病的风险。"}]}
9 |
--------------------------------------------------------------------------------
/train/merge_lora_weight.py:
--------------------------------------------------------------------------------
1 | # Import the required libraries
2 | from transformers import AutoModelForCausalLM  # loads the pretrained base model
3 | from peft import PeftModel  # loads LoRA adapter weights
4 | import argparse  # command-line argument parsing
5 | import shutil  # file operations such as copying
6 | import os  # file path handling
7 | import torch  # dtypes for saving
8 |
9 | def main():
10 | # Create the argument parser
11 | parser = argparse.ArgumentParser()
12 | # Add command-line arguments
13 | parser.add_argument("--base_model_path", type=str, required=True,
14 | help="Path to pretrained model or model identifier from huggingface.co/models")
15 | parser.add_argument("--adapter_model_path", type=str, required=True, help="Path to adapter model")
16 | parser.add_argument("--output_path", type=str, required=True, help="Path to save the output model")
17 | parser.add_argument("--save_dtype", type=str, choices=['bf16', 'fp32', 'fp16'],
18 | default='fp32', help="In which dtype to save, fp32, bf16 or fp16.")
19 | # Parse command-line arguments
20 | args = parser.parse_args()
21 |
22 | name2dtype = {'bf16': torch.bfloat16, 'fp32': torch.float32, 'fp16': torch.float16}
23 | # Load the base model
24 | model = AutoModelForCausalLM.from_pretrained(
25 | args.base_model_path, device_map='cpu',
26 | trust_remote_code=True, torch_dtype=name2dtype[args.save_dtype]
27 | )
28 | # Load the adapter weights on top of the base model
29 | model = PeftModel.from_pretrained(model, args.adapter_model_path, trust_remote_code=True)
30 | # Merge the adapter into the base model
31 | model = model.merge_and_unload()
32 | # Save the merged model weights
33 | model.save_pretrained(args.output_path, safe_serialization=False)
34 |
35 | # Copy the tokenizer, config, and model files
36 | shutil.copy(
37 | os.path.join(args.base_model_path, 'generation_config.json'),
38 | os.path.join(args.output_path, 'generation_config.json')
39 | )
40 | shutil.copy(
41 | os.path.join(args.base_model_path, 'hy.tiktoken'),
42 | os.path.join(args.output_path, 'hy.tiktoken')
43 | )
44 | shutil.copy(
45 | os.path.join(args.base_model_path, 'tokenizer_config.json'),
46 | os.path.join(args.output_path, 'tokenizer_config.json')
47 | )
48 | shutil.copy(
49 | os.path.join(args.base_model_path, 'config.json'),
50 | os.path.join(args.output_path, 'config.json')
51 | )
52 | shutil.copy(
53 | os.path.join(args.base_model_path, 'modeling_hunyuan.py'),
54 | os.path.join(args.output_path, 'modeling_hunyuan.py')
55 | )
56 | shutil.copy(
57 | os.path.join(args.base_model_path, 'configuration_hunyuan.py'),
58 | os.path.join(args.output_path, 'configuration_hunyuan.py')
59 | )
60 | shutil.copy(
61 | os.path.join(args.base_model_path, 'tokenization_hy.py'),
62 | os.path.join(args.output_path, 'tokenization_hy.py')
63 | )
64 |
65 | print(f'Merged model weight is saved to {args.output_path}')
66 |
67 | if __name__ == "__main__":
68 | main()
69 |
--------------------------------------------------------------------------------
/train/merge_lora_weight.sh:
--------------------------------------------------------------------------------
1 | python3 merge_lora_weight.py --base_model_path /xxx/hunyuan_l_train/checkpoint-200 --adapter_model_path /xxx/runs/hunyuan_l_lora_train/checkpoint-200 --output_path /xxx/ckpts/merged_hunyuan_lora_weight --save_dtype bf16
--------------------------------------------------------------------------------
/train/requirements.txt:
--------------------------------------------------------------------------------
1 | transformers==4.41.2
2 | deepspeed==0.15.1
3 | peft
--------------------------------------------------------------------------------
/train/train.py:
--------------------------------------------------------------------------------
1 | # Copyright 2024 Tencent Inc. All Rights Reserved.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 |
15 | # Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.
16 | #
17 | # This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
18 | # and OPT implementations in this library. It has been modified from its
19 | # original forms to accommodate minor architectural differences compared
20 | # to GPT-NeoX and OPT used by the Meta AI team that trained the model.
21 | #
22 | # Licensed under the Apache License, Version 2.0 (the "License");
23 | # you may not use this file except in compliance with the License.
24 | # You may obtain a copy of the License at
25 | #
26 | # http://www.apache.org/licenses/LICENSE-2.0
27 | #
28 | # Unless required by applicable law or agreed to in writing, software
29 | # distributed under the License is distributed on an "AS IS" BASIS,
30 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
31 | # See the License for the specific language governing permissions and
32 | # limitations under the License.
33 |
34 |
35 | import os
36 | import sys
37 | sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
38 | import json
39 | import torch
40 | import shutil
41 | import logging
42 | from dataclasses import dataclass, field
43 | import deepspeed
44 | from typing import Optional, Dict
45 | import transformers
46 | from torch.utils.data import Dataset
47 | from transformers import Trainer, TrainerCallback
48 | from peft import LoraConfig, get_peft_model
49 | from transformers.trainer_utils import PREFIX_CHECKPOINT_DIR
50 |
51 |
52 | def print_args(args, name='arguments'):
53 | """Print arguments."""
54 | if torch.distributed.get_rank() == 0:
55 | print(f'------------------------ {name} ------------------------', flush=True)
56 | str_list = []
57 | for arg in vars(args):
58 | dots = '.' * (48 - len(arg))
59 | str_list.append(' {} {} {}'.format(arg, dots, getattr(args, arg)))
60 | for arg in sorted(str_list, key=lambda x: x.lower()):
61 | print(arg, flush=True)
62 | print(f'-------------------- end of {name} ---------------------', flush=True)
63 |
64 |
65 | @dataclass
66 | class ModelArguments:
67 | use_flash_attn: bool = field(
68 | default=False,
69 | metadata={"help": "Enable FlashAttention-2 for faster training."}
70 | )
71 | use_lora: bool = field(default=False, metadata={"help": "Enable Lora for faster training."})
72 | hidden_size: int = field(default=2048, metadata={"help": "The hidden size of the model."})
73 | num_layers: int = field(default=24, metadata={"help": "The number of layers of the model."})
74 | num_attention_heads: int = field(default=16, metadata={"help": "The number of attention heads of the model."})
75 | intermediate_size: int = field(default=8192, metadata={"help": "The intermediate size of the model."})
76 | max_position_embeddings: int = field(
77 | default=2048,
78 | metadata={"help": "The maximum sequence length that this model might ever be used with."}
79 | )
80 | vocab_size: int = field(default=50257, metadata={"help": "The vocabulary size of the model."})
81 | type_vocab_size: int = field(default=1, metadata={"help": "The vocabulary size of the model."})
82 | layer_norm_eps: float = field(
83 | default=1e-5,
84 | metadata={"help": "The epsilon used by the layer normalization layers of the model."}
85 | )
86 | moe_topk: int = field(default=4, metadata={"help": "The topk for MOE."})
87 | num_experts: int = field(default=8, metadata={"help": "The number of experts for MOE."})
88 | num_key_value_heads: int = field(default=16, metadata={"help": "The number of key-value heads in GQA."})
89 | use_cla: bool = field(default=False, metadata={"help": "Whether to use CLA."})
90 | cla_share_factor: int = field(default=2, metadata={"help": "The share factor for CLA."})
91 | use_mixed_mlp_moe: bool = field(
92 | default=False,
93 | metadata={"help": "Whether to use mixed MoE with shared expert."}
94 | )
95 | num_shared_expert: int = field(default=1, metadata={"help": "Number of shared experts."})
96 | use_qk_norm: bool = field(default=False, metadata={"help": "Whether to use qk norm."})
97 | tie_word_embeddings: bool = field(
98 | default=True,
99 | metadata={"help": "Whether to tie the word embeddings of the encoder and the decoder."}
100 | )
101 | lora_rank: int = field(default=64, metadata={"help": "The rank of lora."})
102 | lora_alpha: int = field(default=8, metadata={"help": "Lora alpha"})
103 | lora_dropout: float = field(default=0.0, metadata={"help": "Lora dropout"})
104 | train_attention_params_only: bool = field(default=False, metadata={
105 | "help": "Whether to train attention parameters only."}
106 | )
107 |
108 |
109 | @dataclass
110 | class DataArguments:
111 | train_data_file: str = field(default=None, metadata={"help": "Path to the training data."})
112 | max_seq_length: int = field(
113 | default=2048,
114 | metadata={"help": "The max sequence length of the model inputs after tokenization."}
115 | )
116 | complex_data: Optional[str] = field(default=None)
117 | use_dummy_data: bool = field(default=False, metadata={"help": "Use dummy data."})
118 |
119 |
120 | @dataclass
121 | class TrainingArguments(transformers.TrainingArguments):
122 | cache_dir: Optional[str] = field(default=None)
123 | optim: str = field(default="adamw_torch")
124 | model_max_length: int = field(
125 | default=2048,
126 | metadata={"help": "Maximum sequence length. Sequences will be right padded (and possibly truncated)."},
127 | )
128 | tokenizer_name_or_path: Optional[str] = field(default=None)
129 | model_name_or_path: Optional[str] = field(default=None)
130 | make_moe_param_leaf_module: bool = field(
131 | default=False,
132 | metadata={"help": "Make MoE parameters zero-3 leaf module."}
133 | )
134 | min_lr: float = field(
135 | default=0.01,
136 | metadata={"help": "The final learning rate at the end of the decay will be learning_rate * min_lr"}
137 | )
138 |
139 |
140 | IGNORE_INDEX = -100
141 |
142 |
143 | class DummyDataset(Dataset):
144 | def __init__(self, tokenizer, max_seq_length=512, length=1000):
145 | self.tokenizer = tokenizer
146 | self.max_seq_length = max_seq_length
147 | self.length = length
148 |
149 | def __len__(self):
150 | return self.length
151 |
152 | def __getitem__(self, index):
153 | tokens = torch.randint(0, self.tokenizer.vocab_size, (self.max_seq_length, ))
154 | return {'input_ids': tokens, 'labels': tokens}
155 |
156 |
157 | class SFTDataset(Dataset):
158 | def __init__(self, data_file, tokenizer, max_seq_length = 2048, prompt_format = 'mplus'):
159 | self.tokenizer = tokenizer
160 | self.prompt_format = prompt_format
161 | self.max_seq_length = max_seq_length
162 |
163 | self.data_list = self.load_data(data_file)
164 |
165 | def __len__(self):
166 | return len(self.data_list)
167 |
168 | def load_data(self, data_file):
169 | logging.info('Loading data: {}'.format(data_file))
170 | with open(data_file, 'r', encoding='utf8') as f:
171 | data_list = f.readlines()
172 | logging.info("there are {} data in dataset".format(len(data_list)))
173 | return data_list
174 |
175 | def encode_data(self, data_dict):
176 | model_inputs = {}
177 | message_tokens = torch.tensor(self.tokenizer.apply_chat_template(data_dict['messages']))
178 | extra_0_token_id = self.tokenizer.convert_tokens_to_ids('<|extra_0|>')
179 | eos_token_id = self.tokenizer.convert_tokens_to_ids('<|eos|>')
180 | loss_token_begins = (message_tokens == extra_0_token_id).nonzero(as_tuple=True)[0].tolist()
181 | loss_token_ends = (message_tokens == eos_token_id).nonzero(as_tuple=True)[0].tolist()
182 | message_labels = torch.tensor([IGNORE_INDEX] * message_tokens.shape[0])
183 | for begin_idx, end_idx in zip(loss_token_begins, loss_token_ends):
184 | message_labels[begin_idx:end_idx + 1] = message_tokens[begin_idx:end_idx + 1]
185 | input_ids = message_tokens.to(torch.long)
186 | labels = message_labels.to(torch.long)
187 |
188 | input_ids = input_ids[:self.max_seq_length]
189 | labels = labels[:self.max_seq_length]
190 | attention_mask = [1 if val != self.tokenizer.pad_id else 0 for val in input_ids]
191 | model_inputs["input_ids"] = input_ids
192 | model_inputs["attention_mask"] = torch.tensor(attention_mask, dtype=torch.bool)
193 | model_inputs["labels"] = labels
194 |
195 | return model_inputs
196 |
197 | def __getitem__(self, index):
198 | data = self.data_list[index]
199 | data = json.loads(data)
200 | model_inputs = self.encode_data(data)
201 |
202 | return model_inputs
203 |
204 |
205 | @dataclass
206 | class DataCollatorForSupervisedDataset(object):
207 | """Collate examples for supervised fine-tuning."""
208 |
209 | tokenizer: transformers.PreTrainedTokenizer
210 |
211 | def __call__(self, instances):
212 | input_ids = [instance['input_ids'] for instance in instances]
213 | labels = [instance['labels'] for instance in instances]
214 | input_ids = torch.nn.utils.rnn.pad_sequence(input_ids, batch_first=True, padding_value=self.tokenizer.pad_id)
215 | labels = torch.nn.utils.rnn.pad_sequence(labels, batch_first=True, padding_value=IGNORE_INDEX)
216 | return dict(
217 | input_ids=input_ids,
218 | labels=labels,
219 | attention_mask=input_ids.ne(self.tokenizer.pad_id),
220 | )
221 |
222 |
223 | def make_supervised_data_module(tokenizer, data_args) -> Dict:
224 | """Make dataset and collator for supervised fine-tuning."""
225 | if data_args.use_dummy_data:
226 | train_dataset = DummyDataset(tokenizer, data_args.max_seq_length)
227 | else:
228 | train_dataset = SFTDataset(
229 | tokenizer=tokenizer,
230 | data_file=data_args.train_data_file,
231 | max_seq_length=data_args.max_seq_length
232 | )
233 | data_collator = DataCollatorForSupervisedDataset(tokenizer=tokenizer)
234 | return dict(train_dataset=train_dataset, eval_dataset=None, data_collator=data_collator)
235 |
236 |
237 | # For full-model training, patch config.json and copy the model/config files so saved checkpoints support Auto* loading
238 | class CustomSaveCallback(TrainerCallback):
239 | def on_save(self, args, state, control, **kwargs):
240 | if torch.distributed.get_rank() == 0:
241 | output_dir = os.path.join(args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}")
242 |
243 | # Copy the tokenizer, model, and config files
244 | model_path = os.path.join(args.model_name_or_path, 'modeling_hunyuan.py')
245 | config_path = os.path.join(args.model_name_or_path, 'configuration_hunyuan.py')
246 | shutil.copy(model_path, os.path.join(output_dir, 'modeling_hunyuan.py'))
247 | shutil.copy(config_path, os.path.join(output_dir, 'configuration_hunyuan.py'))
248 | shutil.copy(
249 | os.path.join(args.tokenizer_name_or_path, 'generation_config.json'),
250 | os.path.join(output_dir, 'generation_config.json')
251 | )
252 | shutil.copy(
253 | os.path.join(args.tokenizer_name_or_path, 'hy.tiktoken'),
254 | os.path.join(output_dir, 'hy.tiktoken')
255 | )
256 | shutil.copy(
257 | os.path.join(args.tokenizer_name_or_path, 'tokenizer_config.json'),
258 | os.path.join(output_dir, 'tokenizer_config.json')
259 | )
260 | shutil.copy(
261 | os.path.join(args.tokenizer_name_or_path, 'tokenization_hy.py'),
262 | os.path.join(output_dir, 'tokenization_hy.py')
263 | )
264 |
265 | # Patch config.json to add auto_map
266 | if os.path.exists(os.path.join(output_dir, "config.json")):
267 | config = json.load(open(os.path.join(output_dir, "config.json"), 'r'))
268 | config['auto_map'] = {
269 | "AutoConfig": "configuration_hunyuan.HunYuanConfig",
270 | "AutoModel": "modeling_hunyuan.HunyuanModel",
271 | "AutoModelForCausalLM": "modeling_hunyuan.HunYuanForCausalLM"
272 | }
273 | json.dump(config, open(os.path.join(output_dir, "config.json"), 'w'), indent=2)
274 |
275 | return control
276 |
277 |
278 | def train():
279 | parser = transformers.HfArgumentParser((ModelArguments, DataArguments, TrainingArguments))
280 | model_args, data_args, training_args = parser.parse_args_into_dataclasses()
281 | print_args(model_args, 'model arguments')
282 | print_args(data_args, 'data arguments')
283 | print_args(training_args, 'training arguments')
284 |
285 | tokenizer = transformers.AutoTokenizer.from_pretrained(
286 | training_args.tokenizer_name_or_path,
287 | trust_remote_code = True
288 | )
289 |
290 | init_kwargs = {}
291 | if model_args.use_flash_attn:
292 | init_kwargs["attn_implementation"] = "flash_attention_2"
293 | if training_args.bf16:
294 | init_kwargs["torch_dtype"] = torch.bfloat16
295 | elif training_args.fp16:
296 | init_kwargs["torch_dtype"] = torch.float16
297 |
298 | if training_args.model_name_or_path is not None and os.path.exists(training_args.model_name_or_path):
299 | print(f"Initializing model from local file: {training_args.model_name_or_path}")
300 | model = transformers.AutoModelForCausalLM.from_pretrained(
301 | training_args.model_name_or_path,
302 | trust_remote_code=True,
303 | **init_kwargs
304 | )
305 | else:
306 | from models.modeling_hunyuan import HunYuanForCausalLM, HunYuanMoE
307 | from models.configuration_hunyuan import HunYuanConfig
308 | print(f"Model name or path does not exist: {training_args.model_name_or_path}, "
309 | f"using a randomly initialized model instead.")
310 | # Build the model config and randomly initialize the weights
311 | config = HunYuanConfig(
312 | vocab_size=tokenizer.vocab_size, # vocabulary size
313 | hidden_size=model_args.hidden_size, # hidden dimension
314 | intermediate_size=model_args.intermediate_size, # FFN dimension
315 | max_position_embeddings=training_args.model_max_length, # maximum sequence length
316 | moe_topk=model_args.moe_topk, # experts routed per token (top-k)
317 | num_experts=model_args.num_experts, # number of experts
318 | num_attention_heads=model_args.num_attention_heads, # number of attention heads
319 | num_key_value_heads=model_args.num_key_value_heads, # number of key/value heads for GQA
320 | num_hidden_layers=model_args.num_layers, # number of Transformer layers
321 | cla_share_factor=model_args.cla_share_factor, # CLA sharing factor
322 | use_cla=model_args.use_cla,
323 | use_mixed_mlp_moe=model_args.use_mixed_mlp_moe,
324 | num_shared_expert=model_args.num_shared_expert,
325 | use_qk_norm=model_args.use_qk_norm,
326 | model_type='hunyuan',
327 | tie_word_embeddings=model_args.tie_word_embeddings,
328 | **init_kwargs
329 | )
330 | # Fall back to float32 when neither --bf16 nor --fp16 is set
331 | with deepspeed.zero.Init(dtype=init_kwargs.get("torch_dtype", torch.float32), config_dict_or_path=training_args.deepspeed):
331 | model = HunYuanForCausalLM(config)
332 |
333 | if model_args.train_attention_params_only:
334 | for name, param in model.named_parameters():
335 | if 'self_attn' not in name:
336 | param.requires_grad = False
337 |
338 | if model_args.use_lora:
339 | # Define the LoRA configuration
340 | lora_config = LoraConfig(
341 | r=model_args.lora_rank,
342 | lora_alpha=model_args.lora_alpha,
343 | lora_dropout=model_args.lora_dropout,
344 | target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
345 | bias="none",
346 | task_type="CAUSAL_LM",
347 | )
348 | model = get_peft_model(model, lora_config)
349 |
350 | # Keep MoE parameters unpartitioned (leaf modules) under ZeRO stage 3
351 | if model_args.num_experts > 0 \
352 | and training_args.make_moe_param_leaf_module and \
353 | training_args.deepspeed_plugin.zero_stage == 3:
354 | from deepspeed.utils import set_z3_leaf_modules
355 | set_z3_leaf_modules(model, [HunYuanMoE])
356 |
357 | data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)
358 | # Tell Trainer not to attempt DataParallel
359 | model.is_parallelizable = True
360 | model.model_parallel = True
361 |
362 | training_args.lr_scheduler_kwargs = {
363 | 'min_lr': training_args.min_lr,
364 | }
365 |
366 | trainer = Trainer(
367 | model=model,
368 | tokenizer=tokenizer,
369 | args=training_args,
370 | callbacks=[CustomSaveCallback],
371 | **data_module
372 | )
373 | model.config.use_cache = False
374 |
375 | trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
376 |
377 |
378 | if __name__ == "__main__":
379 | train()
380 |
--------------------------------------------------------------------------------
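The save callback in train.py copies the tokenizer files and patches `config.json` with an `auto_map` so the exported checkpoint can be loaded via `AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)`. A minimal, self-contained sketch of just that patch step (the temporary directory and minimal starting config are illustrative, not part of the training code):

```python
import json
import os
import tempfile

def patch_auto_map(output_dir: str) -> dict:
    """Add the auto_map entry that points transformers at the custom
    HunYuan classes shipped alongside the exported weights."""
    cfg_path = os.path.join(output_dir, "config.json")
    with open(cfg_path) as f:
        config = json.load(f)
    config["auto_map"] = {
        "AutoConfig": "configuration_hunyuan.HunYuanConfig",
        "AutoModel": "modeling_hunyuan.HunyuanModel",
        "AutoModelForCausalLM": "modeling_hunyuan.HunYuanForCausalLM",
    }
    with open(cfg_path, "w") as f:
        json.dump(config, f, indent=2)
    return config

# Demo against a throwaway directory holding a minimal config.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "config.json"), "w") as f:
        json.dump({"model_type": "hunyuan"}, f)
    patched = patch_auto_map(d)
    assert patched["auto_map"]["AutoModelForCausalLM"] == "modeling_hunyuan.HunYuanForCausalLM"
```

Because the class names are written as `module.ClassName` strings, the matching `configuration_hunyuan.py` and `modeling_hunyuan.py` files must sit next to `config.json` in the output directory, which is why the callback also copies `tokenization_hy.py` and friends.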
/train/train.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | export HOST_GPU_NUM=8
4 | # IP of the current machine
5 | export LOCAL_IP=${ip1}
6 | # IPs of all nodes (host:slot_count), comma-separated
7 | export NODE_IP_LIST="${ip1}:8,${ip2}:8"
8 | # Number of machine nodes
9 | export NODES=2
10 | export NODE_NUM=$((${NODES} * ${HOST_GPU_NUM}))
11 |
12 | export NCCL_DEBUG=WARN
13 |
14 | model_path=path_to_model_weight
15 | tokenizer_path=../models
16 | train_data_file=example_data.jsonl
17 |
18 | # ds_config_file=ds_zero2_no_offload.json
19 | # ds_config_file=ds_zero3_no_offload.json
20 | ds_config_file=ds_zero3_offload_no_auto.json
21 |
22 | output_path=/root/hf_train_output
23 |
24 | mkdir -p ${output_path}
25 |
26 | current_time=$(date "+%Y.%m.%d-%H.%M.%S")
27 | log_file=${output_path}/"log_${current_time}.txt"
28 |
29 | echo $NODE_IP_LIST > env.txt
30 | sed "s/:/ slots=/g" env.txt | sed "s/,/\n/g" > "hostfile"
31 | sed "s/:[0-9]*//g" env.txt | sed "s/,/\n/g" > "pssh.hosts"
32 | export CHIEF_IP=$LOCAL_IP
33 |
34 | HOST_PATH=hostfile
35 | # HOST_PATH=none
36 |
37 | deepspeed --hostfile=$HOST_PATH --master_addr $CHIEF_IP train.py \
38 | --do_train \
39 | --model_name_or_path ${model_path} \
40 | --tokenizer_name_or_path ${tokenizer_path} \
41 | --train_data_file ${train_data_file} \
42 | --deepspeed ${ds_config_file} \
43 | --output_dir ${output_path} \
44 | --overwrite_output_dir \
45 | --per_device_train_batch_size 1 \
46 | --gradient_accumulation_steps 1 \
47 | --gradient_checkpointing \
48 | --lr_scheduler_type cosine_with_min_lr \
49 | --logging_steps 1 \
50 | --max_steps 200 \
51 | --save_steps 100 \
52 | --learning_rate 1e-5 \
53 | --min_lr 1e-6 \
54 | --warmup_ratio 0.01 \
55 | --save_strategy steps \
56 | --save_safetensors False \
57 | --use_lora \
58 | --lora_rank 64 \
59 | --lora_alpha 128 \
60 | --lora_dropout 0.1 \
61 | --hidden_size 6400 \
62 | --intermediate_size 18304 \
63 | --model_max_length 4096 \
64 | --max_seq_length 4096 \
65 | --moe_topk 1 \
66 | --num_experts 2 \
67 | --num_attention_heads 80 \
68 | --num_key_value_heads 8 \
69 | --num_layers 4 \
70 | --cla_share_factor 2 \
71 | --use_cla \
72 | --use_mixed_mlp_moe \
73 | --num_shared_expert 1 \
74 | --use_qk_norm \
75 | --bf16 | tee ${log_file}
76 |
--------------------------------------------------------------------------------
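The `sed` pipeline in train.sh rewrites `NODE_IP_LIST` (entries of the form `host:slots`, comma-separated) into a DeepSpeed hostfile (`host slots=N`, one per line) and a `pssh.hosts` list of bare hostnames. The same transformation sketched in Python for clarity (the example IPs are placeholders, not the real cluster addresses):

```python
def to_hostfile(node_ip_list: str) -> str:
    """'h1:8,h2:8' -> DeepSpeed hostfile lines: 'h1 slots=8' per node."""
    lines = []
    for entry in node_ip_list.split(","):
        host, slots = entry.split(":")
        lines.append(f"{host} slots={slots}")
    return "\n".join(lines)

def to_pssh_hosts(node_ip_list: str) -> str:
    """'h1:8,h2:8' -> bare host list, one hostname per line."""
    return "\n".join(entry.split(":")[0] for entry in node_ip_list.split(","))

print(to_hostfile("10.0.0.1:8,10.0.0.2:8"))
# 10.0.0.1 slots=8
# 10.0.0.2 slots=8
```

DeepSpeed reads the `slots=` count as the number of GPUs to use on each host, which is why each entry in `NODE_IP_LIST` carries `:8` to match `HOST_GPU_NUM=8`.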