├── .nojekyll ├── Dockerfile ├── update.sh ├── 404.html ├── docs ├── img │ ├── 3cbd29e85d45383c45dd987a7e719538.jpg │ └── 5bd0711d72136eaddc9ce8383206b925.jpg ├── 11.md ├── 13.md ├── 9.md ├── 7.md ├── 12.md ├── 3.md ├── 10.md ├── 5.md ├── 2.md ├── 4.md ├── 1.md ├── 6.md └── 8.md ├── SUMMARY.md ├── asset ├── docsify-baidu-push.js ├── docsify-cnzz.js ├── docsify-baidu-stat.js ├── docsify-apachecn-footer.js ├── style.css ├── docsify-copy-code.min.js ├── prism-darcula.css ├── search.min.js ├── docsify-clicker.js └── vue.css ├── index.html ├── .gitignore ├── README.md └── LICENSE /.nojekyll: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM httpd:2.4 2 | COPY ./ /usr/local/apache2/htdocs/ -------------------------------------------------------------------------------- /update.sh: -------------------------------------------------------------------------------- 1 | git add -A 2 | git commit -am "$(date "+%Y-%m-%d %H:%M:%S")" 3 | git push -------------------------------------------------------------------------------- /404.html: -------------------------------------------------------------------------------- 1 | --- 2 | permalink: /404.html 3 | --- 4 | 5 | -------------------------------------------------------------------------------- /docs/img/3cbd29e85d45383c45dd987a7e719538.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/apachecn/lightgbm-doc-zh/HEAD/docs/img/3cbd29e85d45383c45dd987a7e719538.jpg -------------------------------------------------------------------------------- /docs/img/5bd0711d72136eaddc9ce8383206b925.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/apachecn/lightgbm-doc-zh/HEAD/docs/img/5bd0711d72136eaddc9ce8383206b925.jpg -------------------------------------------------------------------------------- /SUMMARY.md: -------------------------------------------------------------------------------- 1 | + [安装指南](docs/1.md) 2 | + [快速入门指南](docs/2.md) 3 | + [Python 包的相关介绍](docs/3.md) 4 | + [特性](docs/4.md) 5 | + [实验](docs/5.md) 6 | + [参数](docs/6.md) 7 | + [参数优化](docs/7.md) 8 | + [Python API](docs/8.md) 9 | + [并行学习指南](docs/9.md) 10 | + [LightGBM GPU 教程](docs/10.md) 11 | + [进阶主题](docs/11.md) 12 | + [LightGBM FAQ 常见问题解答](docs/12.md) 13 | + [Development Guide](docs/13.md) -------------------------------------------------------------------------------- /asset/docsify-baidu-push.js: -------------------------------------------------------------------------------- 1 | (function(){ 2 | var plugin = function(hook) { 3 | hook.doneEach(function() { 4 | new Image().src = 5 | '//api.share.baidu.com/s.gif?r=' + 6 | encodeURIComponent(document.referrer) + 7 | "&l=" + encodeURIComponent(location.href) 8 | }) 9 | } 10 | var plugins = window.$docsify.plugins || [] 11 | plugins.push(plugin) 12 | window.$docsify.plugins = plugins 13 | })() -------------------------------------------------------------------------------- /asset/docsify-cnzz.js: -------------------------------------------------------------------------------- 1 | (function(){ 2 | var plugin = function(hook) { 3 | hook.doneEach(function() { 4 | var sc = document.createElement('script') 5 | sc.src = 'https://s5.cnzz.com/z_stat.php?id=' + 6 | window.$docsify.cnzzId + '&online=1&show=line' 7 | document.querySelector('article').appendChild(sc) 8 | }) 9 | } 10 | var plugins = window.$docsify.plugins || [] 11 | plugins.push(plugin) 12 | window.$docsify.plugins = plugins 13 | })() -------------------------------------------------------------------------------- /asset/docsify-baidu-stat.js: 
-------------------------------------------------------------------------------- 1 | (function(){ 2 | var plugin = function(hook) { 3 | hook.doneEach(function() { 4 | window._hmt = window._hmt || [] 5 | var hm = document.createElement("script") 6 | hm.src = "https://hm.baidu.com/hm.js?" + window.$docsify.bdStatId 7 | document.querySelector("article").appendChild(hm) 8 | }) 9 | } 10 | var plugins = window.$docsify.plugins || [] 11 | plugins.push(plugin) 12 | window.$docsify.plugins = plugins 13 | })() -------------------------------------------------------------------------------- /docs/11.md: -------------------------------------------------------------------------------- 1 | # 进阶主题 2 | 3 | ## 缺失值的处理 4 | 5 | * LightGBM 默认启用缺失值处理,你可以通过设置 `use_missing=false` 来禁用它。 6 | * LightGBM 默认用 NA (NaN) 来表示缺失值,你可以通过设置 `zero_as_missing=true` 将其改为用零表示。 7 | * 当设置 `zero_as_missing=false`(默认)时,稀疏矩阵(以及 LibSVM 格式)中未显式表示的值视为零。 8 | * 当设置 `zero_as_missing=true` 时,NA 和 0(包括稀疏矩阵中未显式表示的值)都视为缺失。 9 | 10 | ## 分类特征的支持 11 | 12 | * 当直接使用原生的分类特征时,LightGBM 能提供良好的精确度。不同于简单的 one-hot 编码,LightGBM 可以找到分类特征的最优分割,从而得到比 one-hot 编码更准确的结果。 13 | * 用 `categorical_feature` 指定分类特征,参考 [Parameters](./Parameters.rst) 中的参数 `categorical_feature`。 14 | * 分类特征首先需要转换为 int 类型,并且只支持非负数。将其转换为连续的取值范围更好。 15 | * 使用 `min_data_per_group` 和 `cat_smooth` 来处理过拟合(当 `#data` 比较小,或者 `#category` 比较大时)。 16 | * 对于具有高基数的分类特征(`#category` 比较大),最好把它转化为数字特征。 17 | 18 | ## LambdaRank 19 | 20 | * 标签应该是 int 类型,较大的数字代表更高的相关性(例如:0:差,1:一般,2:好,3:完美)。 21 | * 使用 `label_gain` 来设置 `int` 标签对应的增益(权重)。 22 | * 使用 `max_position` 设置 NDCG 的优化位置。 23 | 24 | ## 参数优化 25 | 26 | * 参考 [参数优化](./Parameters-Tuning.rst)。 27 | 28 | ## 并行学习 29 | 30 | * 参考 [并行学习指南](./Parallel-Learning-Guide.rst)。 31 | 32 | ## GPU 的支持 33 | 34 | * 参考 [GPU 教程](./GPU-Tutorial.rst) 和 [GPU Targets](./GPU-Targets.rst)。 35 | 36 | ## GCC 用户的建议 (MinGW, *nix) 37 | 38 | * 参考 [gcc 建议](./gcc-Tips.rst). 
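上文提到,分类特征需要先转换为非负的 int 类型,并且最好映射到一个连续的取值范围。下面用一小段纯 Python 示意这种编码方式(仅为说明思路,并非 LightGBM 的内部实现;`encode_categories` 是本文虚构的辅助函数名):

```python
def encode_categories(values):
    """按首次出现的顺序, 把类别值编码为 0, 1, 2, ... 的连续非负整数."""
    mapping = {}   # 类别值 -> 整数编码
    codes = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)   # 新类别取下一个连续编码
        codes.append(mapping[v])
    return codes, mapping
```

得到的整数编码列即可作为分类特征,配合 `categorical_feature` 参数传给 LightGBM;由于编码从 0 开始且连续,不会出现负数或稀疏的大整数取值。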
-------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
now loading...
21 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | local_settings.py 56 | 57 | # Flask stuff: 58 | instance/ 59 | .webassets-cache 60 | 61 | # Scrapy stuff: 62 | .scrapy 63 | 64 | # Sphinx documentation 65 | docs/_build/ 66 | 67 | # PyBuilder 68 | target/ 69 | 70 | # Jupyter Notebook 71 | .ipynb_checkpoints 72 | 73 | # pyenv 74 | .python-version 75 | 76 | # celery beat schedule file 77 | celerybeat-schedule 78 | 79 | # SageMath parsed files 80 | *.sage.py 81 | 82 | # dotenv 83 | .env 84 | 85 | # virtualenv 86 | .venv 87 | venv/ 88 | ENV/ 89 | 90 | # Spyder project settings 91 | .spyderproject 92 | .spyproject 93 | 94 | # Rope project settings 95 | .ropeproject 96 | 97 | # mkdocs documentation 98 | /site 99 | 100 | # mypy 101 | .mypy_cache/ 102 | .DS_Store 103 | 104 | # gitbook 105 | _book 106 | 107 | # node.js 108 | node_modules 109 | 110 | # windows 111 | Thumbs.db 112 
| 113 | # word 114 | ~$*.docx 115 | ~$*.doc 116 | 117 | # custom 118 | docs/README.md 119 | -------------------------------------------------------------------------------- /asset/docsify-apachecn-footer.js: -------------------------------------------------------------------------------- 1 | (function(){ 2 | var footer = [ 3 | '
', 4 | '
', 5 | '

我们一直在努力

', 6 | '

apachecn/lightgbm-doc-zh

', 7 | '

', 8 | ' ', 9 | ' ', 10 | ' ML | ApacheCN

', 11 | '

', 12 | '
', 13 | ' ', 17 | '
', 18 | '
' 19 | ].join('\n') 20 | var plugin = function(hook) { 21 | hook.afterEach(function(html) { 22 | return html + footer 23 | }) 24 | hook.doneEach(function() { 25 | (adsbygoogle = window.adsbygoogle || []).push({}) 26 | }) 27 | } 28 | var plugins = window.$docsify.plugins || [] 29 | plugins.push(plugin) 30 | window.$docsify.plugins = plugins 31 | })() -------------------------------------------------------------------------------- /docs/13.md: -------------------------------------------------------------------------------- 1 | # Development Guide 2 | 3 | ## Algorithms 4 | 5 | Refer to [Features](./Features.rst) to understand the important algorithms used in LightGBM. 6 | 7 | ## Classes and Code Structure 8 | 9 | ### Important Classes 10 | 11 | | Class | Description | 12 | | --- | --- | 13 | | `Application` | The entry point of the application, including the training and prediction logic | 14 | | `Bin` | Data structure used to store discretized feature values (converted from float values) | 15 | | `Boosting` | Boosting interface; the current implementations are GBDT and DART | 16 | | `Config` | Stores parameters and configuration | 17 | | `Dataset` | Stores dataset information | 18 | | `DatasetLoader` | Used to construct the dataset | 19 | | `Feature` | Stores a single feature column | 20 | | `Metric` | Evaluation metrics | 21 | | `Network` | Network interfaces and communication algorithms | 22 | | `ObjectiveFunction` | Objective functions used for training | 23 | | `Tree` | Stores the tree model | 24 | | `TreeLearner` | Used to learn trees | 25 | 26 | ### Code Structure 27 | 28 | | Path | Description | 29 | | --- | --- | 30 | | ./include | Header files | 31 | | ./include/utils | Some common functions | 32 | | ./src/application | Implementations of the training and prediction logic | 33 | | ./src/boosting | Implementations of boosting | 34 | | ./src/io | Implementations of I/O-related classes, including `Bin`, `Config`, `Dataset`, `DatasetLoader`, `Feature` and `Tree` | 35 | | ./src/metric | 
Implementations of metrics | 36 | | ./src/network | Implementations of network functions | 37 | | ./src/objective | Implementations of objective functions | 38 | | ./src/treelearner | Implementations of tree learners | 39 | 40 | ### Documents API 41 | 42 | Refer to [docs README](./README.rst). 43 | 44 | ## C API 45 | 46 | Refer to the comments in [c_api.h](https://github.com/Microsoft/LightGBM/blob/master/include/LightGBM/c_api.h). 47 | 48 | ## High Level Language Package 49 | 50 | See the implementations at [Python-package](https://github.com/Microsoft/LightGBM/tree/master/python-package) and [R-package](https://github.com/Microsoft/LightGBM/tree/master/R-package). 51 | 52 | ## Questions 53 | 54 | Refer to [FAQ](./FAQ.rst). 55 | 56 | Also feel free to open [issues](https://github.com/Microsoft/LightGBM/issues) if you meet problems. -------------------------------------------------------------------------------- /asset/style.css: -------------------------------------------------------------------------------- 1 | /*隐藏头部的目录*/ 2 | #main>ul:nth-child(1) { 3 | display: none; 4 | } 5 | 6 | #main>ul:nth-child(2) { 7 | display: none; 8 | } 9 | 10 | .markdown-section h1 { 11 | margin: 3rem 0 2rem 0; 12 | } 13 | 14 | .markdown-section h2 { 15 | margin: 2rem 0 1rem; 16 | } 17 | 18 | img, 19 | pre { 20 | border-radius: 8px; 21 | } 22 | 23 | .content, 24 | .sidebar, 25 | .markdown-section, 26 | body, 27 | .search input { 28 | background-color: rgba(243, 242, 238, 1) !important; 29 | } 30 | 31 | @media (min-width:600px) { 32 | .sidebar-toggle { 33 | background-color: #f3f2ee; 34 | } 35 | } 36 | 37 | .docsify-copy-code-button { 38 | background: #f8f8f8 !important; 39 | color: #7a7a7a !important; 40 | } 41 | 42 | body { 43 | /*font-family: Microsoft YaHei, Source Sans Pro, Helvetica Neue, Arial, sans-serif !important;*/ 44 | } 45 | 46 | .markdown-section>p { 47 | font-size: 16px !important; 48 | } 49 | 50 | .markdown-section pre>code { 51 | font-family: Consolas, Roboto Mono, 
Monaco, courier, monospace !important; 52 | font-size: .9rem !important; 53 | 54 | } 55 | 56 | /*.anchor span { 57 | color: rgb(66, 185, 131); 58 | }*/ 59 | 60 | section.cover h1 { 61 | margin: 0; 62 | } 63 | 64 | body>section>div.cover-main>ul>li>a { 65 | color: #42b983; 66 | } 67 | 68 | .markdown-section img { 69 | box-shadow: 7px 9px 10px #aaa !important; 70 | } 71 | 72 | 73 | pre { 74 | background-color: #f3f2ee !important; 75 | } 76 | 77 | @media (min-width:600px) { 78 | pre code { 79 | /*box-shadow: 2px 1px 20px 2px #aaa;*/ 80 | /*border-radius: 10px !important;*/ 81 | padding-left: 20px !important; 82 | } 83 | } 84 | 85 | @media (max-width:600px) { 86 | pre { 87 | padding-left: 0px !important; 88 | padding-right: 0px !important; 89 | } 90 | } 91 | 92 | .markdown-section pre { 93 | padding-left: 0 !important; 94 | padding-right: 0px !important; 95 | box-shadow: 2px 1px 20px 2px #aaa; 96 | } -------------------------------------------------------------------------------- /docs/9.md: -------------------------------------------------------------------------------- 1 | # 并行学习指南 2 | 3 | 这是一篇 LightGBM 的并行学习教程. 4 | 5 | 点击 [快速入门](./Quick-Start.rst) 来学习怎样使用 LightGBM. 6 | 7 | ## 选择合适的并行算法 8 | 9 | LightGBM 现已提供了以下并行学习算法. 10 | 11 | | **Parallel Algorithm** | **How to Use** | 12 | | --- | --- | 13 | | Data parallel | `tree_learner=data` | 14 | | Feature parallel | `tree_learner=feature` | 15 | | Voting parallel | `tree_learner=voting` | 16 | 17 | 这些算法适用于不同场景,如下表所示: 18 | 19 | |   | **#data is small** | **#data is large** | 20 | | --- | --- | --- | 21 | | **#feature is small** | Feature Parallel | Data Parallel | 22 | | **#feature is large** | Feature Parallel | Voting Parallel | 23 | 24 | 在 [optimization in parallel learning](./Features.rst#optimization-in-parallel-learning) 了解更多并行算法的细节. 25 | 26 | ## 构建并行版本 27 | 28 | 默认的并行版本支持基于 socket 的并行学习. 29 | 30 | 如果你想构建基于 MPI 的并行版本, 请参考 [安装指南](./Installation-Guide.rst#build-mpi-version). 
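前面"选择合适的并行算法"一节的表格, 其决策逻辑可以用一小段 Python 示意(函数名与布尔参数均为本文虚构, 仅用来表达该表的规则):

```python
def choose_tree_learner(data_is_large, feature_is_large):
    """按上文表格返回 tree_learner 的取值: feature / data / voting."""
    if not data_is_large:
        return "feature"   # #data 较小时, 无论 #feature 大小都用特征并行
    # #data 较大时: #feature 小 -> 数据并行, #feature 大 -> 投票并行
    return "voting" if feature_is_large else "data"
```

例如 `#data` 和 `#feature` 都较大时, 该函数返回 `"voting"`, 对应配置 `tree_learner=voting`。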
31 | 32 | ## 准备工作 33 | 34 | ### Socket 版本 35 | 36 | 它需要收集所有想要运行并行学习的机器的所有 IP 并且指定一个 TCP 端口号 (假设是 12345 ) , 更改防火墙使得这个端口可以被访问 (12345). 然后把这些 IP 和端口写入一个文件中 (假设是 `mlist.txt`), 如下所示: 37 | 38 | ``` 39 | machine1_ip 12345 40 | machine2_ip 12345 41 | 42 | ``` 43 | 44 | ### MPI 版本 45 | 46 | 它需要收集所有想要运行并行学习机器的 IP (或 hostname) . 然后把这些IP写入一个文件中 (例如 `mlist.txt`) 如下所示: 47 | 48 | ``` 49 | machine1_ip 50 | machine2_ip 51 | 52 | ``` 53 | 54 | **Note**: 对于 windows 用户, 需要安装 “smpd” 来开启 MPI 服务. 更多细节点击 [here](https://blogs.technet.microsoft.com/windowshpc/2015/02/02/how-to-compile-and-run-a-simple-ms-mpi-program/). 55 | 56 | ## 运行并行学习 57 | 58 | ### Socket 版本 59 | 60 | 1. 在配置文件中编辑以下参数: 61 | 62 | `tree_learner=your_parallel_algorithm`, 在这里编辑 `your_parallel_algorithm` (e.g. feature/data) . 63 | 64 | `num_machines=your_num_machines`, 在这里编辑 `your_num_machines` (e.g. 4) . 65 | 66 | `machine_list_file=mlist.txt`, `mlist.txt` 在 [准备工作](#preparation) 生成. 67 | 68 | `local_listen_port=12345`, `12345` 在 [准备工作](#preparation) 分配. 69 | 70 | 2. 拷贝数据文件, 可执行文件, 配置文件和 `mlist.txt` 到所有机器. 71 | 72 | 3. 在所有机器上运行以下命令, 你需要更改 `your_config_file` 为真实配置文件. 73 | 74 | Windows: `lightgbm.exe config=your_config_file` 75 | 76 | Linux: `./lightgbm config=your_config_file` 77 | 78 | ### MPI 版本 79 | 80 | 1. 在配置中编辑以下参数: 81 | 82 | `tree_learner=your_parallel_algorithm`, 在这里编辑 `your_parallel_algorithm` (e.g. feature/data) . 83 | 84 | `num_machines=your_num_machines`, 在这里编辑 `your_num_machines` (e.g. 4) . 85 | 86 | 2. 拷贝数据文件, 可执行文件, 配置文件和 `mlist.txt` 到所有机器. 87 | 88 | **Note**: MPI 需要运行在 **所有机器的相同路径上**. 89 | 90 | 3. 在机器上运行以下命令 (不需要运行所有机器), 需要更改 `your_config_file` 为真实的配置文件. 
91 | 92 | Windows: 93 | 94 | ``` 95 | mpiexec.exe /machinefile mlist.txt lightgbm.exe config=your_config_file 96 | 97 | ``` 98 | 99 | Linux: 100 | 101 | ``` 102 | mpiexec --machinefile mlist.txt ./lightgbm config=your_config_file 103 | 104 | ``` 105 | 106 | ### 例子 107 | 108 | * [A simple parallel example](https://github.com/Microsoft/lightgbm/tree/master/examples/parallel_learning) -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # LightGBM 中文文档 2 | 3 | LightGBM 是一个梯度 boosting 框架, 使用基于树的学习算法. 4 | 它是分布式的、高效的, 具有以下优势: 5 | * 速度和内存使用的优化 6 | * 减少分割增益的计算量 7 | * 通过直方图的相减来进行进一步的加速 8 | * 减少内存的使用 9 | * 减少并行学习的通信代价 10 | * 稀疏优化 11 | * 准确率的优化 12 | * Leaf-wise (Best-first) 的决策树生长策略 13 | * 类别特征值的最优分割 14 | * 网络通信的优化 15 | * 并行学习的优化 16 | * 特征并行 17 | * 数据并行 18 | * 投票并行 19 | * GPU 支持可处理大规模数据 20 | 21 | 更多有关 LightGBM 特性的详情, 请参阅: [LightGBM 特性](). 22 | 23 | ## 文档地址 24 | 25 | + [在线阅读](http://lightgbm.apachecn.org) 26 | + [在线阅读(Gitee)](https://apachecn.gitee.io/lightgbm-doc-zh/) 27 | 28 | ## 项目负责人 29 | 30 | * [@那伊抹微笑](https://github.com/wangyangting) 31 | 32 | ## 项目贡献者 33 | 34 | * [@那伊抹微笑](https://github.com/apachecn/lightgbm-doc-zh) 35 | * [@陈瑶](https://github.com/apachecn/lightgbm-doc-zh) 36 | * [@胡世昌](https://github.com/apachecn/lightgbm-doc-zh) 37 | * [@王金树](https://github.com/apachecn/lightgbm-doc-zh) 38 | * [@谢家柯](https://github.com/apachecn/lightgbm-doc-zh) 39 | * [@方振影](https://github.com/apachecn/lightgbm-doc-zh) 40 | * [@臧艺](https://github.com/apachecn/lightgbm-doc-zh) 41 | * [@冯斐](https://github.com/apachecn/lightgbm-doc-zh) 42 | * [@黄志浩](https://github.com/apachecn/lightgbm-doc-zh) 43 | * [@刘陆琛](https://github.com/apachecn/lightgbm-doc-zh) 44 | * [@周立刚](https://github.com/apachecn/lightgbm-doc-zh) 45 | * [@陈洪](https://github.com/apachecn/lightgbm-doc-zh) 46 | * [@孙永杰](https://github.com/apachecn/lightgbm-doc-zh) 47 | 
[@王贤才](https://github.com/apachecn/lightgbm-doc-zh) 48 | 49 | ## 下载 50 | 51 | ### Docker 52 | 53 | ``` 54 | docker pull apachecn0/lightgbm-doc-zh 55 | docker run -tid -p :80 apachecn0/lightgbm-doc-zh 56 | # 访问 http://localhost:{port} 查看文档 57 | ``` 58 | 59 | ### PYPI 60 | 61 | ``` 62 | pip install lightgbm-doc-zh 63 | lightgbm-doc-zh 64 | # 访问 http://localhost:{port} 查看文档 65 | ``` 66 | 67 | ### NPM 68 | 69 | ``` 70 | npm install -g lightgbm-doc-zh 71 | lightgbm-doc-zh 72 | # 访问 http://localhost:{port} 查看文档 73 | ``` 74 | 75 | ## 贡献指南 76 | 77 | 为了使项目更加便于维护,我们将文档格式全部转换成了 Markdown,同时更换了页面生成器。后续维护工作将完全在 Markdown 上进行。 78 | 79 | 小部分格式仍然存在问题,主要是链接和表格。需要大家帮忙找到,并提 PullRequest 来修复。 80 | 81 | ## 建议反馈 82 | 83 | * 联系项目负责人 [@那伊抹微笑](https://github.com/wangyangting). 84 | * 在我们的 [apachecn/lightgbm-doc-zh](https://github.com/apachecn/lightgbm-doc-zh) github 上提 issue. 85 | * 发送到 Email: lightgbm#apachecn.org(#替换成@). 86 | * 在我们的 [组织学习交流群](./apachecn-learning-group.rst) 中联系群主/管理员即可. 87 | 88 | ## 组织学习交流群 89 | 90 | 机器学习交流群: [629470233](http://shang.qq.com/wpa/qunwpa?idkey=bcee938030cc9e1552deb3bd9617bbbf62d3ec1647e4b60d9cd6b6e8f78ddc03) (2000人) 91 | 92 | 大数据交流群: [214293307](http://shang.qq.com/wpa/qunwpa?idkey=bcee938030cc9e1552deb3bd9617bbbf62d3ec1647e4b60d9cd6b6e8f78ddc03) (2000人) 93 | 94 | 了解我们: [http://www.apachecn.org/organization/209.html](http://www.apachecn.org/organization/209.html) 95 | 96 | 加入组织: [http://www.apachecn.org/organization/209.html](http://www.apachecn.org/organization/209.html) 97 | 98 | 更多信息请参阅: [http://www.apachecn.org/organization/348.html](http://www.apachecn.org/organization/348.html) 99 | -------------------------------------------------------------------------------- /asset/docsify-copy-code.min.js: -------------------------------------------------------------------------------- 1 | /*! 
2 | * docsify-copy-code 3 | * v2.1.0 4 | * https://github.com/jperasmus/docsify-copy-code 5 | * (c) 2017-2019 JP Erasmus 6 | * MIT license 7 | */ 8 | !function(){"use strict";function r(o){return(r="function"==typeof Symbol&&"symbol"==typeof Symbol.iterator?function(o){return typeof o}:function(o){return o&&"function"==typeof Symbol&&o.constructor===Symbol&&o!==Symbol.prototype?"symbol":typeof o})(o)}!function(o,e){void 0===e&&(e={});var t=e.insertAt;if(o&&"undefined"!=typeof document){var n=document.head||document.getElementsByTagName("head")[0],c=document.createElement("style");c.type="text/css","top"===t&&n.firstChild?n.insertBefore(c,n.firstChild):n.appendChild(c),c.styleSheet?c.styleSheet.cssText=o:c.appendChild(document.createTextNode(o))}}(".docsify-copy-code-button,.docsify-copy-code-button span{cursor:pointer;transition:all .25s ease}.docsify-copy-code-button{position:absolute;z-index:1;top:0;right:0;overflow:visible;padding:.65em .8em;border:0;border-radius:0;outline:0;font-size:1em;background:grey;background:var(--theme-color,grey);color:#fff;opacity:0}.docsify-copy-code-button span{border-radius:3px;background:inherit;pointer-events:none}.docsify-copy-code-button .error,.docsify-copy-code-button .success{position:absolute;z-index:-100;top:50%;left:0;padding:.5em .65em;font-size:.825em;opacity:0;-webkit-transform:translateY(-50%);transform:translateY(-50%)}.docsify-copy-code-button.error .error,.docsify-copy-code-button.success .success{opacity:1;-webkit-transform:translate(-115%,-50%);transform:translate(-115%,-50%)}.docsify-copy-code-button:focus,pre:hover .docsify-copy-code-button{opacity:1}"),document.querySelector('link[href*="docsify-copy-code"]')&&console.warn("[Deprecation] Link to external docsify-copy-code stylesheet is no longer necessary."),window.DocsifyCopyCodePlugin={init:function(){return function(o,e){o.ready(function(){console.warn("[Deprecation] Manually initializing docsify-copy-code using window.DocsifyCopyCodePlugin.init() is no 
longer necessary.")})}}},window.$docsify=window.$docsify||{},window.$docsify.plugins=[function(o,s){o.doneEach(function(){var o=Array.apply(null,document.querySelectorAll("pre[data-lang]")),c={buttonText:"Copy to clipboard",errorText:"Error",successText:"Copied"};s.config.copyCode&&Object.keys(c).forEach(function(t){var n=s.config.copyCode[t];"string"==typeof n?c[t]=n:"object"===r(n)&&Object.keys(n).some(function(o){var e=-1',''.concat(c.buttonText,""),''.concat(c.errorText,""),''.concat(c.successText,""),""].join("");o.forEach(function(o){o.insertAdjacentHTML("beforeend",e)})}),o.mounted(function(){document.querySelector(".content").addEventListener("click",function(o){if(o.target.classList.contains("docsify-copy-code-button")){var e="BUTTON"===o.target.tagName?o.target:o.target.parentNode,t=document.createRange(),n=e.parentNode.querySelector("code"),c=window.getSelection();t.selectNode(n),c.removeAllRanges(),c.addRange(t);try{document.execCommand("copy")&&(e.classList.add("success"),setTimeout(function(){e.classList.remove("success")},1e3))}catch(o){console.error("docsify-copy-code: ".concat(o)),e.classList.add("error"),setTimeout(function(){e.classList.remove("error")},1e3)}"function"==typeof(c=window.getSelection()).removeRange?c.removeRange(t):"function"==typeof c.removeAllRanges&&c.removeAllRanges()}})})}].concat(window.$docsify.plugins||[])}(); 9 | //# sourceMappingURL=docsify-copy-code.min.js.map 10 | -------------------------------------------------------------------------------- /asset/prism-darcula.css: -------------------------------------------------------------------------------- 1 | /** 2 | * Darcula theme 3 | * 4 | * Adapted from a theme based on: 5 | * IntelliJ Darcula Theme (https://github.com/bulenkov/Darcula) 6 | * 7 | * @author Alexandre Paradis 8 | * @version 1.0 9 | */ 10 | 11 | code[class*="lang-"], 12 | pre[data-lang] { 13 | color: #a9b7c6 !important; 14 | background-color: #2b2b2b !important; 15 | font-family: Consolas, Monaco, 'Andale 
Mono', monospace; 16 | direction: ltr; 17 | text-align: left; 18 | white-space: pre; 19 | word-spacing: normal; 20 | word-break: normal; 21 | line-height: 1.5; 22 | 23 | -moz-tab-size: 4; 24 | -o-tab-size: 4; 25 | tab-size: 4; 26 | 27 | -webkit-hyphens: none; 28 | -moz-hyphens: none; 29 | -ms-hyphens: none; 30 | hyphens: none; 31 | } 32 | 33 | pre[data-lang]::-moz-selection, pre[data-lang] ::-moz-selection, 34 | code[class*="lang-"]::-moz-selection, code[class*="lang-"] ::-moz-selection { 35 | color: inherit; 36 | background: rgba(33, 66, 131, .85); 37 | } 38 | 39 | pre[data-lang]::selection, pre[data-lang] ::selection, 40 | code[class*="lang-"]::selection, code[class*="lang-"] ::selection { 41 | color: inherit; 42 | background: rgba(33, 66, 131, .85); 43 | } 44 | 45 | /* Code blocks */ 46 | pre[data-lang] { 47 | padding: 1em; 48 | margin: .5em 0; 49 | overflow: auto; 50 | } 51 | 52 | :not(pre) > code[class*="lang-"], 53 | pre[data-lang] { 54 | background: #2b2b2b; 55 | } 56 | 57 | /* Inline code */ 58 | :not(pre) > code[class*="lang-"] { 59 | padding: .1em; 60 | border-radius: .3em; 61 | } 62 | 63 | .token.comment, 64 | .token.prolog, 65 | .token.cdata { 66 | color: #808080; 67 | } 68 | 69 | .token.delimiter, 70 | .token.boolean, 71 | .token.keyword, 72 | .token.selector, 73 | .token.important, 74 | .token.atrule { 75 | color: #cc7832; 76 | } 77 | 78 | .token.operator, 79 | .token.punctuation, 80 | .token.attr-name { 81 | color: #a9b7c6; 82 | } 83 | 84 | .token.tag, 85 | .token.tag .punctuation, 86 | .token.doctype, 87 | .token.builtin { 88 | color: #e8bf6a; 89 | } 90 | 91 | .token.entity, 92 | .token.number, 93 | .token.symbol { 94 | color: #6897bb; 95 | } 96 | 97 | .token.property, 98 | .token.constant, 99 | .token.variable { 100 | color: #9876aa; 101 | } 102 | 103 | .token.string, 104 | .token.char { 105 | color: #6a8759; 106 | } 107 | 108 | .token.attr-value, 109 | .token.attr-value .punctuation { 110 | color: #a5c261; 111 | } 112 | 113 | .token.attr-value 
.punctuation:first-child { 114 | color: #a9b7c6; 115 | } 116 | 117 | .token.url { 118 | color: #287bde; 119 | text-decoration: underline; 120 | } 121 | 122 | .token.function { 123 | color: #ffc66d; 124 | } 125 | 126 | .token.regex { 127 | background: #364135; 128 | } 129 | 130 | .token.bold { 131 | font-weight: bold; 132 | } 133 | 134 | .token.italic { 135 | font-style: italic; 136 | } 137 | 138 | .token.inserted { 139 | background: #294436; 140 | } 141 | 142 | .token.deleted { 143 | background: #484a4a; 144 | } 145 | 146 | code.lang-css .token.property, 147 | code.lang-css .token.property + .token.punctuation { 148 | color: #a9b7c6; 149 | } 150 | 151 | code.lang-css .token.id { 152 | color: #ffc66d; 153 | } 154 | 155 | code.lang-css .token.selector > .token.class, 156 | code.lang-css .token.selector > .token.attribute, 157 | code.lang-css .token.selector > .token.pseudo-class, 158 | code.lang-css .token.selector > .token.pseudo-element { 159 | color: #ffc66d; 160 | } -------------------------------------------------------------------------------- /docs/7.md: -------------------------------------------------------------------------------- 1 | # 参数优化 2 | 3 | 该页面包含了针对不同场景的 LightGBM 参数优化指南. 4 | 5 | **其他有用的链接** 6 | 7 | * [参数](./Parameters.rst) 8 | * [Python API](./Python-API.rst) 9 | 10 | ## 针对 Leaf-wise (最佳优先) 树的参数优化 11 | 12 | LightGBM uses the [leaf-wise](./Features.rst#leaf-wise-best-first-tree-growth) tree growth algorithm, while many other popular tools use depth-wise tree growth. Compared with depth-wise growth, the leaf-wise algorithm can converge much faster. However, the leaf-wise growth may over-fit if not used with the appropriate parameters. 13 | 14 | LightGBM 使用 [leaf-wise](./Features.rst#leaf-wise-best-first-tree-growth) 的树生长策略, 而很多其他流行的工具采用 depth-wise 的树生长策略. 与 depth-wise 的树生长策略相较, leaf-wise 算法可以收敛得更快. 但是, 如果参数选择不当的话, leaf-wise 算法有可能导致过拟合. 
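上面"叶子数相同时, leaf-wise 树比 depth-wise 树深得多"的结论, 可以用两个小函数做数值示意(仅为说明, 并非 LightGBM 的实现):

```python
import math

def depthwise_depth(num_leaves):
    """depth-wise(平衡)生长: 树深约为 log2(叶子数)."""
    return math.ceil(math.log2(num_leaves))

def leafwise_max_depth(num_leaves):
    """leaf-wise 生长的最坏情况: 每次都在最深的叶子上继续分裂, 深度可达 叶子数 - 1."""
    return num_leaves - 1
```

例如 64 个叶子时, depth-wise 树深约为 6, 而 leaf-wise 树最深可达 63 —— 这正是 leaf-wise 生长需要用参数额外控制过拟合的原因。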
15 | 16 | To get good results using a leaf-wise tree, these are some important parameters: 17 | 18 | 想要在使用 leaf-wise 算法时得到好的结果, 这里有几个重要的参数值得注意: 19 | 20 | 1. `num_leaves`. This is the main parameter to control the complexity of the tree model. Theoretically, we can set `num_leaves = 2^(max_depth)` to convert from depth-wise tree. However, this simple conversion is not good in practice. The reason is, when number of leaves are the same, the leaf-wise tree is much deeper than depth-wise tree. As a result, it may be over-fitting. Thus, when trying to tune the `num_leaves`, we should let it be smaller than `2^(max_depth)`. For example, when the `max_depth=6` the depth-wise tree can get good accuracy, but setting `num_leaves` to `127` may cause over-fitting, and setting it to `70` or `80` may get better accuracy than depth-wise. Actually, the concept `depth` can be forgotten in leaf-wise tree, since it doesn’t have a correct mapping from `leaves` to `depth`. 21 | 22 | 1. `num_leaves`. 这是控制树模型复杂度的主要参数. 理论上, 借鉴 depth-wise 树, 我们可以设置 `num_leaves = 2^(max_depth)` 但是, 这种简单的转化在实际应用中表现不佳. 这是因为, 当叶子数目相同时, leaf-wise 树要比 depth-wise 树深得多, 这就有可能导致过拟合. 因此, 当我们试着调整 `num_leaves` 的取值时, 应该让其小于 `2^(max_depth)`. 举个例子, 当 `max_depth=6` 时(这里译者认为例子中, 树的最大深度应为7), depth-wise 树可以达到较高的准确率.但是如果设置 `num_leaves` 为 `127` 时, 有可能会导致过拟合, 而将其设置为 `70` 或 `80` 时可能会得到比 depth-wise 树更高的准确率. 其实, `depth` 的概念在 leaf-wise 树中并没有多大作用, 因为并不存在一个从 `leaves` 到 `depth` 的合理映射. 23 | 2. `min_data_in_leaf`. This is a very important parameter to deal with over-fitting in leaf-wise tree. Its value depends on the number of training data and `num_leaves`. Setting it to a large value can avoid growing too deep a tree, but may cause under-fitting. In practice, setting it to hundreds or thousands is enough for a large dataset. 24 | 25 | 1. `min_data_in_leaf`. 这是处理 leaf-wise 树的过拟合问题中一个非常重要的参数. 它的值取决于训练数据的样本个树和 `num_leaves`. 将其设置的较大可以避免生成一个过深的树, 但有可能导致欠拟合. 实际应用中, 对于大数据集, 设置其为几百或几千就足够了. 26 | 2. `max_depth`. 
You also can use `max_depth` to limit the tree depth explicitly. 27 | 28 | 1. `max_depth`. 你也可以利用 `max_depth` 来显式地限制树的深度. 29 | 30 | ## 针对更快的训练速度 31 | 32 | * Use bagging by setting `bagging_fraction` and `bagging_freq` 33 | * Use feature sub-sampling by setting `feature_fraction` 34 | * Use small `max_bin` 35 | * Use `save_binary` to speed up data loading in future learning 36 | * Use parallel learning, refer to [并行学习指南](./Parallel-Learning-Guide.rst) 37 | * 通过设置 `bagging_fraction` 和 `bagging_freq` 参数来使用 bagging 方法 38 | * 通过设置 `feature_fraction` 参数来使用特征的子抽样 39 | * 使用较小的 `max_bin` 40 | * 使用 `save_binary` 在未来的学习过程对数据加载进行加速 41 | * 使用并行学习, 可参考 [并行学习指南](./Parallel-Learning-Guide.rst) 42 | 43 | ## 针对更好的准确率 44 | 45 | * Use large `max_bin` (may be slower) 46 | * Use small `learning_rate` with large `num_iterations` 47 | * Use large `num_leaves` (may cause over-fitting) 48 | * Use bigger training data 49 | * Try `dart` 50 | * 使用较大的 `max_bin` (学习速度可能变慢) 51 | * 使用较小的 `learning_rate` 和较大的 `num_iterations` 52 | * 使用较大的 `num_leaves` (可能导致过拟合) 53 | * 使用更大的训练数据 54 | * 尝试 `dart` 55 | 56 | ## 处理过拟合 57 | 58 | * Use small `max_bin` 59 | * Use small `num_leaves` 60 | * Use `min_data_in_leaf` and `min_sum_hessian_in_leaf` 61 | * Use bagging by set `bagging_fraction` and `bagging_freq` 62 | * Use feature sub-sampling by set `feature_fraction` 63 | * Use bigger training data 64 | * Try `lambda_l1`, `lambda_l2` and `min_gain_to_split` for regularization 65 | * Try `max_depth` to avoid growing deep tree 66 | * 使用较小的 `max_bin` 67 | * 使用较小的 `num_leaves` 68 | * 使用 `min_data_in_leaf` 和 `min_sum_hessian_in_leaf` 69 | * 通过设置 `bagging_fraction` 和 `bagging_freq` 来使用 bagging 70 | * 通过设置 `feature_fraction` 来使用特征子抽样 71 | * 使用更大的训练数据 72 | * 使用 `lambda_l1`, `lambda_l2` 和 `min_gain_to_split` 来使用正则 73 | * 尝试 `max_depth` 来避免生成过深的树 -------------------------------------------------------------------------------- /asset/search.min.js: 
-------------------------------------------------------------------------------- 1 | !function(){"use strict";function e(e){var n={"&":"&","<":"<",">":">",'"':""","'":"'","/":"/"};return String(e).replace(/[&<>"'\/]/g,function(e){return n[e]})}function n(e){var n=[];return h.dom.findAll("a:not([data-nosearch])").map(function(t){var o=t.href,i=t.getAttribute("href"),r=e.parse(o).path;r&&-1===n.indexOf(r)&&!Docsify.util.isAbsolutePath(i)&&n.push(r)}),n}function t(e){localStorage.setItem("docsify.search.expires",Date.now()+e),localStorage.setItem("docsify.search.index",JSON.stringify(g))}function o(e,n,t,o){void 0===n&&(n="");var i,r=window.marked.lexer(n),a=window.Docsify.slugify,s={};return r.forEach(function(n){if("heading"===n.type&&n.depth<=o)i=t.toURL(e,{id:a(n.text)}),s[i]={slug:i,title:n.text,body:""};else{if(!i)return;s[i]?s[i].body?s[i].body+="\n"+(n.text||""):s[i].body=n.text:s[i]={slug:i,title:"",body:""}}}),a.clear(),s}function i(n){var t=[],o=[];Object.keys(g).forEach(function(e){o=o.concat(Object.keys(g[e]).map(function(n){return g[e][n]}))}),n=n.trim();var i=n.split(/[\s\-\,\\\/]+/);1!==i.length&&(i=[].concat(n,i));for(var r=0;rl.length&&(d=l.length);var p="..."+e(l).substring(f,d).replace(o,''+n+"")+"...";s+=p}}),a)){var d={title:e(c),content:s,url:f};t.push(d)}}(r);return t}function r(e,i){h=Docsify;var r="auto"===e.paths,a=localStorage.getItem("docsify.search.expires")
',o=Docsify.dom.create("div",t),i=Docsify.dom.find("aside");Docsify.dom.toggleClass(o,"search"),Docsify.dom.before(i,o)}function c(e){var n=Docsify.dom.find("div.search"),t=Docsify.dom.find(n,".results-panel");if(!e)return t.classList.remove("show"),void(t.innerHTML="");var o=i(e),r="";o.forEach(function(e){r+='
\n \n

'+e.title+"

\n

"+e.content+"

\n
\n
"}),t.classList.add("show"),t.innerHTML=r||'

'+y+"

"}function l(){var e,n=Docsify.dom.find("div.search"),t=Docsify.dom.find(n,"input");Docsify.dom.on(n,"click",function(e){return"A"!==e.target.tagName&&e.stopPropagation()}),Docsify.dom.on(t,"input",function(n){clearTimeout(e),e=setTimeout(function(e){return c(n.target.value.trim())},100)})}function f(e,n){var t=Docsify.dom.getNode('.search input[type="search"]');if(t)if("string"==typeof e)t.placeholder=e;else{var o=Object.keys(e).filter(function(e){return n.indexOf(e)>-1})[0];t.placeholder=e[o]}}function d(e,n){if("string"==typeof e)y=e;else{var t=Object.keys(e).filter(function(e){return n.indexOf(e)>-1})[0];y=e[t]}}function p(e,n){var t=n.router.parse().query.s;a(),s(e,t),l(),t&&setTimeout(function(e){return c(t)},500)}function u(e,n){f(e.placeholder,n.route.path),d(e.noData,n.route.path)}var h,g={},y="",m={placeholder:"Type to search",noData:"No Results!",paths:"auto",depth:2,maxAge:864e5},v=function(e,n){var t=Docsify.util,o=n.config.search||m;Array.isArray(o)?m.paths=o:"object"==typeof o&&(m.paths=Array.isArray(o.paths)?o.paths:"auto",m.maxAge=t.isPrimitive(o.maxAge)?o.maxAge:m.maxAge,m.placeholder=o.placeholder||m.placeholder,m.noData=o.noData||m.noData,m.depth=o.depth||m.depth);var i="auto"===m.paths;e.mounted(function(e){p(m,n),!i&&r(m,n)}),e.doneEach(function(e){u(m,n),i&&r(m,n)})};$docsify.plugins=[].concat(v,$docsify.plugins)}(); 2 | -------------------------------------------------------------------------------- /docs/12.md: -------------------------------------------------------------------------------- 1 | # LightGBM FAQ 常见问题解答 2 | 3 | ## 内容 4 | 5 | * [关键问题](#关键问题) 6 | * [LightGBM](#lightgbm) 7 | * [R包](#r包) 8 | * [Python包](#python包) 9 | 10 | * * * 11 | 12 | ## 关键问题 13 | 14 | 在使用 LightGBM 遇到坑爹的问题时(程序奔溃,预测结果错误,无意义输出…),你应该联系谁? 15 | 16 | 如果你的问题不是那么紧急,可以把问题放到 [Microsoft/LightGBM repository](https://github.com/Microsoft/LightGBM/issues). 17 | 18 | 如果你的问题急需要解决,首先要明确你有哪些错误: 19 | 20 | * 你认为问题会不会复现在 CLI(命令行接口),R 或者 Python上 ? 21 | * 还是只会在某个特定的包(R 或者 Python)上出现? 
22 | * 还是会在某个特定的编译器(gcc 或者 MinGW)上出现? 23 | * 还是会在某个特定的操作系统(Windows 或者 Linux)上出现? 24 | * 你能用一个简单的例子复现这些问题吗? 25 | * 你能(或者不能)在去掉所有的优化信息和在 debug 模式下编译 LightGBM 时复现这些问题吗? 26 | 27 | 当出现问题的时候,根据上述答案,随时可以@我们(不同的问题可以@不同的人,下面是各种不同类型问题的负责人),这样我们就能更快地帮助你解决问题。 28 | 29 | * [@guolinke](https://github.com/guolinke) (C++ code / R-package / Python-package) 30 | * [@chivee](https://github.com/chivee) (C++ code / Python-package) 31 | * [@Laurae2](https://github.com/Laurae2) (R-package) 32 | * [@wxchan](https://github.com/wxchan) (Python-package) 33 | * [@henry0312](https://github.com/henry0312) (Python-package) 34 | * [@StrikerRUS](https://github.com/StrikerRUS) (Python-package) 35 | * [@huanzhang12](https://github.com/huanzhang12) (GPU support) 36 | 37 | ### 记住这是一个免费的/开放的社区支持,我们可能不能做到全天候的提供帮助. 38 | 39 | ## LightGBM 40 | 41 | * **问题 1**:我可以去哪里找到关于LightGBM参数的更多详细内容? 42 | * **方法 1**:可以看一下这个 [Parameters](./Parameters.rst) and [Laurae++/Parameters](https://sites.google.com/view/lauraepp/parameters) 网站。 43 | 44 | * * * 45 | 46 | * **问题 2**:在一个有百万个特征的数据集中,(要在很长一段时间后才开始训练或者)训练根本没有开始。 47 | * **方法 2**:对 `bin_construct_sample_cnt` 用一个较小的值和对 `min_data` 用一个较大的值。 48 | 49 | * * * 50 | 51 | * **问题 3**:当在一个很大的数据集上使用LightGBM,我的电脑会耗尽内存。 52 | * **方法 3**:很多方法啊:将 `histogram_pool_size` 参数设置成你想为LightGBM分配的MB(histogram_pool_size + dataset size = approximately RAM used), 减少 `num_leaves` 或减少 `max_bin` (点这里 [Microsoft/LightGBM#562](https://github.com/Microsoft/LightGBM/issues/562))。 53 | 54 | * * * 55 | 56 | * **问题 4**:我使用Windows系统。我应该使用Visual Studio或者MinGW编译LightGBM吗? 
57 | * **方法 4**:推荐使用 [Visual Studio](https://github.com/Microsoft/LightGBM/issues/542),因为它的性能更好。 58 | 59 | * * * 60 | 61 | * **问题 5**:当使用LightGBM时,我每次运行得到的结果都不同(结果不能复现)。 62 | * **方法 5**:这是一个正常的现象,我们/你都无能为力。 你可以试试使用 `gpu_use_dp = true` 来复现结果(点这里 [Microsoft/LightGBM#560](https://github.com/Microsoft/LightGBM/pull/560#issuecomment-304561654))。 你也可以使用CPU的版本试试。 63 | 64 | * * * 65 | 66 | * **问题 6**:Bagging在改变线程的数量时,是不能复现的。 67 | * **方法 6**:由于LightGBM的Bagging是多线程运行的,它的输出依赖于使用线程的数量。 目前[没有解决办法](https://github.com/Microsoft/LightGBM/issues/632)。 68 | 69 | * * * 70 | 71 | * **问题 7**:我试过使用随机森林模式,LightGBM崩溃啦! 72 | * **方法 7**:这是有意为之的设计。 你必须将 `bagging_fraction` 和 `feature_fraction` 设置为不等于1的值,并与 `bagging_freq` 结合使用。 看这个例子 [this thread](https://github.com/Microsoft/LightGBM/issues/691)。 73 | 74 | * * * 75 | 76 | * **问题 8**:当在Windows的多核系统上对一个很大的数据集使用LightGBM时,CPU不是满负荷运行(例如只使用了10%的CPU)。 77 | * **方法 8**:请使用 [Visual Studio](https://www.visualstudio.com/downloads/), 因为Visual Studio可能 [比MinGW快10倍](https://github.com/Microsoft/LightGBM/issues/749),尤其是在很大的树上。 78 | 79 | * * * 80 | 81 | ## R 包 82 | 83 | * **问题 1**:在训练先前的LightGBM模型时出现一个错误后,任何使用LightGBM的训练命令都不会起作用。 84 | * **方法 1**:在R控制台中运行 `lgb.unloader(wipe = TRUE)`,再重新创建LightGBM数据集(这会清除所有与LightGBM相关的变量)。 由于这些指针的存在,只是不去清除这些变量并不能修复这些错误。 这是一个已知的问题: [Microsoft/LightGBM#698](https://github.com/Microsoft/LightGBM/issues/698)。 85 | 86 | * * * 87 | 88 | * **问题 2**:我使用过 `setinfo` ,试过打印我的 `lgb.Dataset` ,结果R控制台无响应。 89 | * **方法 2**:在使用 `setinfo` 后避免打印 `lgb.Dataset`.
这是一个已知的bug:[Microsoft/LightGBM#539](https://github.com/Microsoft/LightGBM/issues/539)。 90 | 91 | * * * 92 | 93 | ## Python 包 94 | 95 | * **问题 1**:当从GitHub使用 `python setup.py install` 安装,我看到如下错误信息。 96 | 97 | ``` 98 | error:错误:安装脚本指定绝对路径: 99 | /Users/Microsoft/LightGBM/python-package/lightgbm/../../lib_lightgbm.so 100 | setup()参数必须 *一直* 是/-分离路径相对于setup.py目录, *从不* 是绝对路径。 101 | 102 | ``` 103 | 104 | * **方法 1**:这个错误在新版本中应该会被解决。 如果你还会遇到这个问题,试着在你的Python包中去掉 `lightgbm.egg-info` 文件夹,再重装一下, 或者对照一下这个 [this thread on stackoverflow](http://stackoverflow.com/questions/18085571/pip-install-error-setup-script-specifies-an-absolute-path)。 105 | 106 | * * * 107 | 108 | * **问题 2**:我看到错误信息如下 109 | 110 | ``` 111 | 在构建数据集前不能 get/set label/weight/init_score/group/num_data/num_feature。 112 | 113 | ``` 114 | 115 | 但是我已经使用下面的代码构建数据集 116 | 117 | ``` 118 | train = lightgbm.Dataset(X_train, y_train) 119 | 120 | ``` 121 | 122 | 或如下错误信息 123 | 124 | ``` 125 | 在释放原始数据后,不能设置predictor/reference/categorical特征。可以在创建数据集时设置free_raw_data=False避免上面的问题。 126 | 127 | ``` 128 | 129 | * **方法2**: 因为LightGBM创建bin mappers来构建树,在一个Booster内的train和valid数据集共享同一个bin mappers,类别特征和特征名等信息,数据集对象在创建Booster时候被创建。 如果你设置 `free_raw_data=True` (默认),原始数据(在Python数据结构中的)将会被释放。 所以,如果你想要: 130 | 131 | * 在创建数据集前get label(or weight/init_score/group),这和get `self.label` 操作相同。 132 | * 在创建数据集前set label(or weight/init_score/group),这和 `self.label=some_label_array` 操作相同。 133 | * 在创建数据集前get num_data(or num_feature),你可以使用 `self.data` 得到数据,然后如果你的数据是 `numpy.ndarray`,使用一些类似 `self.data.shape` 的代码。 134 | * 在构建数据集之后set predictor(or reference/categorical feature),你应该设置 `free_raw_data=False` 或使用同样的原始数据初始化数据集对象。 -------------------------------------------------------------------------------- /docs/3.md: -------------------------------------------------------------------------------- 1 | # Python 包的相关介绍 2 | 3 | 该文档给出了有关 LightGBM Python 软件包的基本演练. 
4 | 5 | **其它有用的链接列表** 6 | 7 | * [Python 例子](https://github.com/Microsoft/LightGBM/tree/master/examples/python-guide) 8 | * [Python API](./Python-API.rst) 9 | * [参数优化](./Parameters-Tuning.rst) 10 | 11 | ## 安装 12 | 13 | 要安装 Python 软件包的依赖, `setuptools`, `wheel`, `numpy` 和 `scipy` 是必须的, `scikit-learn` 对于 sklearn 接口来说是必须的(同时也是推荐安装的): 14 | 15 | ``` 16 | pip install setuptools wheel numpy scipy scikit-learn -U 17 | 18 | ``` 19 | 20 | 参考 [Python-package](https://github.com/Microsoft/LightGBM/tree/master/python-package) 安装指南文件夹. 21 | 22 | 为了验证是否安装成功, 可以在 Python 中 `import lightgbm` 试试: 23 | 24 | ``` 25 | import lightgbm as lgb 26 | 27 | ``` 28 | 29 | ## 数据接口 30 | 31 | LightGBM Python 模块能够使用以下几种方式来加载数据: 32 | 33 | * libsvm/tsv/csv txt format file(libsvm/tsv/csv 文本文件格式) 34 | * Numpy 2D array, pandas object(Numpy 2维数组, pandas 对象) 35 | * LightGBM binary file(LightGBM 二进制文件) 36 | 37 | 加载后的数据存在 `Dataset` 对象中. 38 | 39 | **要加载 libsvm 文本文件或 LightGBM 二进制文件到 Dataset 中:** 40 | 41 | ``` 42 | train_data = lgb.Dataset('train.svm.bin') 43 | 44 | ``` 45 | 46 | **要加载 numpy 数组到 Dataset 中:** 47 | 48 | ``` 49 | data = np.random.rand(500, 10) # 500 个样本, 每一个包含 10 个特征 50 | label = np.random.randint(2, size=500) # 二元目标变量, 0 和 1 51 | train_data = lgb.Dataset(data, label=label) 52 | 53 | ``` 54 | 55 | **要加载 scipy.sparse.csr_matrix 数组到 Dataset 中:** 56 | 57 | ``` 58 | csr = scipy.sparse.csr_matrix((dat, (row, col))) 59 | train_data = lgb.Dataset(csr) 60 | 61 | ``` 62 | 63 | **保存 Dataset 到 LightGBM 二进制文件将会使得加载更快速:** 64 | 65 | ``` 66 | train_data = lgb.Dataset('train.svm.txt') 67 | train_data.save_binary('train.bin') 68 | 69 | ``` 70 | 71 | **创建验证数据:** 72 | 73 | ``` 74 | test_data = train_data.create_valid('test.svm') 75 | 76 | ``` 77 | 78 | 或者 79 | 80 | ``` 81 | test_data = lgb.Dataset('test.svm', reference=train_data) 82 | 83 | ``` 84 | 85 | 在 LightGBM 中, 验证数据应该与训练数据一致(格式一致).
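为了直观说明“验证数据应该与训练数据保持一致”背后的原因, 下面给出一段纯 Python 的概念性示意代码(假想示例, 与 LightGBM 的真实实现无关): 分箱边界(bin mapper)只从训练数据中计算, 验证数据必须复用同一组边界进行离散化, 否则两者的分箱结果无法对齐.

```
import bisect

def make_bin_edges(values, max_bin):
    # 从训练数据的经验分位数构造分箱边界(概念性示意, 并非 LightGBM 的真实实现)
    s = sorted(values)
    step = max(1, len(s) // max_bin)
    return s[step::step]

def to_bins(values, edges):
    # 用同一组边界把连续值离散化为 bin 编号
    return [bisect.bisect_left(edges, v) for v in values]

train_feature = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.6, 0.75]
valid_feature = [0.15, 0.5, 0.95]

edges = make_bin_edges(train_feature, max_bin=4)  # 边界只从训练数据中学习
train_bins = to_bins(train_feature, edges)
valid_bins = to_bins(valid_feature, edges)        # 验证数据必须复用同一组边界
```

这也解释了为什么上文建议通过 `reference=train_data` 来创建验证数据: 验证集会共享训练集的分箱信息.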
86 | 87 | **指定 feature names(特征名称)和 categorical features(分类特征):** 88 | 89 | ``` 90 | train_data = lgb.Dataset(data, label=label, feature_name=['c1', 'c2', 'c3'], categorical_feature=['c3']) 91 | 92 | ``` 93 | 94 | LightGBM 可以直接使用 categorical features(分类特征)作为 input(输入). 它不需要被转换成 one-hot coding(独热编码), 并且它比 one-hot coding(独热编码)更快(约快上 8 倍). 95 | 96 | **注意**: 在你构造 `Dataset` 之前, 你应该将分类特征转换为 `int` 类型的值. 97 | 98 | **当需要时可以设置权重:** 99 | 100 | ``` 101 | w = np.random.rand(500, ) 102 | train_data = lgb.Dataset(data, label=label, weight=w) 103 | 104 | ``` 105 | 106 | 或者 107 | 108 | ``` 109 | train_data = lgb.Dataset(data, label=label) 110 | w = np.random.rand(500, ) 111 | train_data.set_weight(w) 112 | 113 | ``` 114 | 115 | 并且你也可以使用 `Dataset.set_init_score()` 来初始化 score(分数), 以及使用 `Dataset.set_group()` 来设置 group/query 数据以用于 ranking(排序)任务. 116 | 117 | **内存的高效使用:** 118 | 119 | LightGBM 中的 `Dataset` 对象由于只需要保存 discrete bins(离散的数据块), 因此它具有很好的内存效率. 然而, Numpy/Array/Pandas 对象的内存开销较大. 如果你在意内存消耗, 可以根据以下方式来节省内存: 120 | 121 | 1. 在构造 `Dataset` 时设置 `free_raw_data=True` (默认为 `True`) 122 | 2. 在 `Dataset` 被构造完之后手动设置 `raw_data=None` 123 | 3. 调用 `gc` 124 | 125 | ## 设置参数 126 | 127 | LightGBM 可以使用一个 pairs 的 list 或一个字典来设置 [参数](./Parameters.rst). 例如: 128 | 129 | * Booster(提升器)参数: 130 | 131 | ``` 132 | param = {'num_leaves':31, 'num_trees':100, 'objective':'binary'} 133 | param['metric'] = 'auc' 134 | 135 | ``` 136 | 137 | * 您还可以指定多个 eval 指标: 138 | 139 | ``` 140 | param['metric'] = ['auc', 'binary_logloss'] 141 | 142 | ``` 143 | 144 | ## 训练 145 | 146 | 训练一个模型时, 需要一个 parameter list(参数列表)和 data set(数据集): 147 | 148 | ``` 149 | num_round = 10 150 | bst = lgb.train(param, train_data, num_round, valid_sets=[test_data]) 151 | 152 | ``` 153 | 154 | 在训练完成后, 可以使用如下方式来存储模型: 155 | 156 | ``` 157 | bst.save_model('model.txt') 158 | 159 | ``` 160 | 161 | 训练后的模型也可以转存为 JSON 的格式: 162 | 163 | ``` 164 | json_model = bst.dump_model() 165 | 166 | ``` 167 | 168 | 已保存的模型也可以使用如下的方式来加载.
169 | 170 | ``` 171 | bst = lgb.Booster(model_file='model.txt') #init model 172 | 173 | ``` 174 | 175 | ## 交叉验证 176 | 177 | 使用 5-折 方式的交叉验证来进行训练(4 个训练集, 1 个测试集): 178 | 179 | ``` 180 | num_round = 10 181 | lgb.cv(param, train_data, num_round, nfold=5) 182 | 183 | ``` 184 | 185 | ## 提前停止 186 | 187 | 如果您有一个验证集, 你可以使用提前停止找到最佳数量的 boosting rounds(梯度次数). 提前停止需要在 `valid_sets` 中至少有一个集合. 如果有多个,它们都会被使用: 188 | 189 | ``` 190 | bst = lgb.train(param, train_data, num_round, valid_sets=valid_sets, early_stopping_rounds=10) 191 | bst.save_model('model.txt', num_iteration=bst.best_iteration) 192 | 193 | ``` 194 | 195 | 该模型将开始训练, 直到验证得分停止提高为止. 验证错误需要至少每个 <cite>early_stopping_rounds</cite> 减少以继续训练. 196 | 197 | 如果提前停止, 模型将有 1 个额外的字段: <cite>bst.best_iteration</cite>. 请注意 <cite>train()</cite> 将从最后一次迭代中返回一个模型, 而不是最好的一个. 198 | 199 | This works with both metrics to minimize (L2, log loss, etc.) and to maximize (NDCG, AUC). Note that if you specify more than one evaluation metric, all of them will be used for early stopping. 200 | 201 | 这与两个度量标准一起使用以达到最小化(L2, 对数损失, 等等)和最大化(NDCG, AUC). 请注意, 如果您指定多个评估指标, 则它们都会用于提前停止. 202 | 203 | ## 预测 204 | 205 | 已经训练或加载的模型都可以对数据集进行预测: 206 | 207 | ``` 208 | # 7 个样本, 每一个包含 10 个特征 209 | data = np.random.rand(7, 10) 210 | ypred = bst.predict(data) 211 | 212 | ``` 213 | 214 | 如果在训练过程中启用了提前停止, 可以用 <cite>bst.best_iteration</cite> 从最佳迭代中获得预测结果: 215 | 216 | ``` 217 | ypred = bst.predict(data, num_iteration=bst.best_iteration) 218 | 219 | ``` -------------------------------------------------------------------------------- /docs/10.md: -------------------------------------------------------------------------------- 1 | # LightGBM GPU 教程 2 | 3 | 本文档的目的在于一步步教你快速上手 GPU 训练。 4 | 5 | 对于 Windows, 请参阅 [GPU Windows 教程](./GPU-Windows.rst). 
6 | 7 | 我们将用 [Microsoft Azure cloud computing platform](https://azure.microsoft.com/) 上的 GPU 实例做演示, 但你可以使用具有现代 AMD 或 NVIDIA GPU 的任何机器。 8 | 9 | ## GPU 安装 10 | 11 | 你需要在 Azure (East US, North Central US, South Central US, West Europe 以及 Southeast Asia 等区域都可用)上启动一个 `NV` 类型的实例, 并选择 Ubuntu 16.04 LTS 作为操作系统。 12 | 13 | 经测试, `NV6` 类型的虚拟机是满足最小需求的, 这种虚拟机包括 1/2 M60 GPU, 8 GB 内存, 180 GB/s 的内存带宽以及 4,825 GFLOPS 的峰值计算能力。 不要使用 `NC` 类型的实例,因为这些 GPU (K80) 基于较老的架构 (Kepler)。 14 | 15 | 首先我们需要安装精简版的 NVIDIA 驱动和 OpenCL 开发环境: 16 | 17 | ``` 18 | sudo apt-get update 19 | sudo apt-get install --no-install-recommends nvidia-375 20 | sudo apt-get install --no-install-recommends nvidia-opencl-icd-375 nvidia-opencl-dev opencl-headers 21 | 22 | ``` 23 | 24 | 安装完驱动以后需要重新启动服务器。 25 | 26 | ``` 27 | sudo init 6 28 | 29 | ``` 30 | 31 | 大约 30 秒后,服务器可以重新运转。 32 | 33 | 如果你正在使用 AMD GPU, 你需要下载并安装 [AMDGPU-Pro](http://support.amd.com/en-us/download/linux) 驱动,同时安装 `ocl-icd-libopencl1` 和 `ocl-icd-opencl-dev` 两个包。 34 | 35 | ## 编译 LightGBM 36 | 37 | 现在安装必要的构建工具和依赖: 38 | 39 | ``` 40 | sudo apt-get install --no-install-recommends git cmake build-essential libboost-dev libboost-system-dev libboost-filesystem-dev 41 | 42 | ``` 43 | 44 | `NV6` GPU 实例自带一个 320 GB 的极速 SSD,挂载在 `/mnt` 目录下。 我们把它作为我们的工作环境(如果你正在使用自己的机器,可以跳过该步): 45 | 46 | ``` 47 | sudo mkdir -p /mnt/workspace 48 | sudo chown $(whoami):$(whoami) /mnt/workspace 49 | cd /mnt/workspace 50 | 51 | ``` 52 | 53 | 现在我们可以检出(checkout) LightGBM 的源码, 并启用 GPU 支持来编译它: 54 | 55 | ``` 56 | git clone --recursive https://github.com/Microsoft/LightGBM 57 | cd LightGBM 58 | mkdir build ; cd build 59 | cmake -DUSE_GPU=1 .. 60 | # 如果你安装的是 NVIDIA 提供的 OpenCL, 请改用以下命令 61 | # sudo cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ .. 62 | make -j$(nproc) 63 | cd ..
64 | 65 | ``` 66 | 67 | 你可以看到有两个二进制文件生成了,`lightgbm` 和 `lib_lightgbm.so` 68 | 69 | 如果你正在 OSX 系统上编译,你可能需要在 `src/treelearner/gpu_tree_learner.h` 中移除 `BOOST_COMPUTE_USE_OFFLINE_CACHE` 宏指令以避免 Boost.Compute 中的冲突错误。 70 | 71 | ## 安装 Python 接口 (可选) 72 | 73 | 如果你希望使用 LightGBM 的 Python 接口,你现在可以安装它(同时包括一些必要的 Python 依赖包): 74 | 75 | ``` 76 | sudo apt-get -y install python-pip 77 | sudo -H pip install setuptools numpy scipy scikit-learn -U 78 | cd python-package/ 79 | sudo python setup.py install --precompile 80 | cd .. 81 | 82 | ``` 83 | 84 | 你需要设置一个额外的参数 `"device" : "gpu"` (同时也包括其他选项如 `learning_rate`, `num_leaves`, 等等)来在 Python 中使用 GPU. 85 | 86 | 你可以阅读我们的 [Python Package Examples](https://github.com/Microsoft/LightGBM/tree/master/examples/python-guide) 来获取更多关于如何使用 Python 接口的信息。 87 | 88 | ## 数据集准备 89 | 90 | 使用如下命令来准备 Higgs 数据集 91 | 92 | ``` 93 | git clone https://github.com/guolinke/boosting_tree_benchmarks.git 94 | cd boosting_tree_benchmarks/data 95 | wget "https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz" 96 | gunzip HIGGS.csv.gz 97 | python higgs2libsvm.py 98 | cd ../.. 99 | ln -s boosting_tree_benchmarks/data/higgs.train 100 | ln -s boosting_tree_benchmarks/data/higgs.test 101 | 102 | ``` 103 | 104 | 现在我们可以通过运行如下命令来为 LightGBM 创建一个配置文件(请复制整段代码块并作为一个整体来运行它): 105 | 106 | ``` 107 | cat > lightgbm_gpu.conf <<EOF 108 | max_bin = 63 109 | num_leaves = 255 110 | num_iterations = 50 111 | learning_rate = 0.1 112 | tree_learner = serial 113 | task = train 114 | is_training_metric = false 115 | min_data_in_leaf = 1 116 | min_sum_hessian_in_leaf = 100 117 | ndcg_eval_at = 1,3,5,10 118 | sparse_threshold = 1.0 119 | device = gpu 120 | gpu_platform_id = 0 121 | gpu_device_id = 0 122 | EOF 123 | echo "num_threads=$(nproc)" >> lightgbm_gpu.conf 124 | 125 | ``` 126 | 127 | 我们可以通过在配置文件中设置 `device=gpu` 来使 GPU 处于可用状态。 默认将使用系统安装的第一个 GPU (`gpu_platform_id=0` 以及 `gpu_device_id=0`). 128 | 129 | ## 在 GPU 上运行你的第一个学习任务 130 | 131 | 现在我们可以准备开始用 GPU 做训练了! 132 | 133 | 首先我们希望确保 GPU 能够正确工作。 运行如下代码来在 GPU 上训练,并记录下 50 次迭代后的 AUC。 134 | 135 | ``` 136 | ./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc 137 | 138 | ``` 139 | 140 | 现在用如下代码在 CPU 上训练相同的数据集.
你应该能观察到相似的 AUC: 141 | 142 | ``` 143 | ./lightgbm config=lightgbm_gpu.conf data=higgs.train valid=higgs.test objective=binary metric=auc device=cpu 144 | 145 | ``` 146 | 147 | 现在我们可以在 GPU 上做速度测试:这次不在每次迭代后计算 AUC。 148 | 149 | ``` 150 | ./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=binary metric=auc 151 | 152 | ``` 153 | 154 | CPU 的速度测试: 155 | 156 | ``` 157 | ./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=binary metric=auc device=cpu 158 | 159 | ``` 160 | 161 | 你可以观察到在该 GPU 上加速了超过三倍. 162 | 163 | GPU 加速也可以用于其他任务/指标上(回归,多类别分类器,排序,等等). 比如,我们可以在一个回归任务下训练 Higgs 数据集: 164 | 165 | ``` 166 | ./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=regression_l2 metric=l2 167 | 168 | ``` 169 | 170 | 同样地, 你也可以比较 CPU 上的训练速度: 171 | 172 | ``` 173 | ./lightgbm config=lightgbm_gpu.conf data=higgs.train objective=regression_l2 metric=l2 device=cpu 174 | 175 | ``` 176 | 177 | ## 进一步阅读 178 | 179 | * [GPU 优化指南和性能比较](./GPU-Performance.rst) 180 | * [GPU SDK Correspondence and Device Targeting Table](./GPU-Targets.rst) 181 | * [GPU Windows 教程](./GPU-Windows.rst) 182 | 183 | ## 参考 184 | 185 | 如果您觉得 GPU 加速很有用,希望您在著作中能够引用如下文章: 186 | 187 | Huan Zhang, Si Si and Cho-Jui Hsieh. “[GPU Acceleration for Large-scale Tree Boosting](https://arxiv.org/abs/1706.08359).” arXiv:1706.08359, 2017. -------------------------------------------------------------------------------- /docs/5.md: -------------------------------------------------------------------------------- 1 | # 实验 2 | 3 | ## 对比实验 4 | 5 | 详细的实验脚本和输出日志部分请参考 [repo](https://github.com/guolinke/boosting_tree_benchmarks).
6 | 7 | ### 数据集 8 | 9 | 我们使用 5 个数据集进行对比实验,有关数据的细节在下表列出: 10 | 11 | | **数据集** | **任务** | **链接** | **训练集** | **特征** | **注释** | 12 | | --- | --- | --- | --- | --- | --- | 13 | | Higgs | 二分类 | [link](https://archive.ics.uci.edu/ml/datasets/HIGGS) | 10,500,000 | 28 | 使用余下 50 万个样本作为测试集 | 14 | | Yahoo LTR | 机器学习排序 | [link](https://webscope.sandbox.yahoo.com/catalog.php?datatype=c) | 473,134 | 700 | set1.train为训练集,set1.test为测试集 | 15 | | MS LTR | 机器学习排序 | [link](http://research.microsoft.com/en-us/projects/mslr/) | 2,270,296 | 137 | {S1,S2,S3}为训练集,{S5} 为测试集 | 16 | | Expo | 二分类 | [link](http://stat-computing.org/dataexpo/2009/) | 11,000,000 | 700 | 使用余下 100 万个样本作为测试集 | 17 | | Allstate | 二分类 | [link](https://www.kaggle.com/c/ClaimPredictionChallenge) | 13,184,290 | 4228 | 使用余下 100 万个样本作为测试集 | 18 | 19 | ### 硬件环境 20 | 21 | 我们使用一台Linux服务器作为实验平台,具体配置如下: 22 | 23 | | **OS** | **CPU** | **Memory** | 24 | | --- | --- | --- | 25 | | Ubuntu 14.04 LTS | 2 * E5-2670 v3 | DDR4 2133Mhz, 256GB | 26 | 27 | ### 基准 28 | 29 | 我们使用 [xgboost](https://github.com/dmlc/xgboost) 作为基准算法。 30 | 31 | 并且 xgboost 和 LightGBM 都基于 OpenMP 构建。 32 | 33 | ### 设置 34 | 35 | 我们为该实验建立了 3 个设置,这些设置的参数如下: 36 | 37 | 1. xgboost: 38 | 39 | ``` 40 | eta = 0.1 41 | max_depth = 8 42 | num_round = 500 43 | nthread = 16 44 | tree_method = exact 45 | min_child_weight = 100 46 | 47 | ``` 48 | 49 | 2. xgboost_hist (使用直方图算法): 50 | 51 | ``` 52 | eta = 0.1 53 | num_round = 500 54 | nthread = 16 55 | tree_method = approx 56 | min_child_weight = 100 57 | tree_method = hist 58 | grow_policy = lossguide 59 | max_depth = 0 60 | max_leaves = 255 61 | 62 | ``` 63 | 64 | 3.
LightGBM: 65 | 66 | ``` 67 | learning_rate = 0.1 68 | num_leaves = 255 69 | num_trees = 500 70 | num_threads = 16 71 | min_data_in_leaf = 0 72 | min_sum_hessian_in_leaf = 100 73 | 74 | ``` 75 | 76 | xgboost 通过 `max_depth` 对建树进行深度限制与模型复杂度控制。 77 | 78 | LightGBM 通过 `num_leaves` 执行带深度限制的 leaf-wise 叶子生长策略与模型复杂度控制。 79 | 80 | 因此我们无法设置完全相同的模型进行比较。作为折中, 我们在 xgboost 中设置 `max_depth=8` 以使叶子数量达到最大数量 255,与 LightGBM 中设置 `num_leaves=255` 的情形进行比较。 81 | 82 | 其他参数皆为默认值。 83 | 84 | ### 结论 85 | 86 | #### 效率 87 | 88 | 为了比较效率, 我们只运行没有任何测试或者度量输出的训练进程,并且我们不计算 IO 的时间。 89 | 90 | 如下是耗时的对比表格: 91 | 92 | | **Data** | **xgboost** | **xgboost_hist** | **LightGBM** | 93 | | --- | --- | --- | --- | 94 | | Higgs | 3794.34 s | 551.898 s | **238.505513 s** | 95 | | Yahoo LTR | 674.322 s | 265.302 s | **150.18644 s** | 96 | | MS LTR | 1251.27 s | 385.201 s | **215.320316 s** | 97 | | Expo | 1607.35 s | 588.253 s | **138.504179 s** | 98 | | Allstate | 2867.22 s | 1355.71 s | **348.084475 s** | 99 | 100 | 我们发现在所有数据集上 LightGBM 都比 xgboost 快。 101 | 102 | #### 准确率 103 | 104 | 为了比较准确率, 我们使用数据集测试集部分的准确率进行公平比较。 105 | 106 | | **Data** | **Metric** | **xgboost** | **xgboost_hist** | **LightGBM** | 107 | | --- | --- | --- | --- | --- | 108 | | Higgs | AUC | 0.839593 | 0.845605 | 0.845154 | 109 | | Yahoo LTR | NDCG<sub>1</sub> | 0.719748 | 0.720223 | 0.732466 | 110 | | | NDCG<sub>3</sub> | 0.717813 | 0.721519 | 0.738048 | 111 | | | NDCG<sub>5</sub> | 0.737849 | 0.739904 | 0.756548 | 112 | | | NDCG<sub>10</sub> | 0.78089 | 0.783013 | 0.796818 | 113 | | MS LTR | NDCG<sub>1</sub> | 0.483956 | 0.488649 | 0.524255 | 114 | | | NDCG<sub>3</sub> | 0.467951 | 0.473184 | 0.505327 | 115 | | | NDCG<sub>5</sub> | 0.472476 | 0.477438 | 0.510007 | 116 | | | NDCG<sub>10</sub> | 0.492429 | 0.496967 | 0.527371 | 117 | | Expo | AUC | 0.756713 | 0.777777 | 0.777543 | 118 | | Allstate | AUC | 0.607201 | 0.609042 | 0.609167 | 119 | 120 | #### 内存消耗 121 | 122 | 我们在运行训练任务时监视 RES,并在 LightGBM 中设置 `two_round=true` (这会增加数据载入时间,但会减少峰值内存使用量,不影响训练速度和准确性)以减少峰值内存使用量。
123 | 124 | | **Data** | **xgboost** | **xgboost_hist** | **LightGBM** | 125 | | --- | --- | --- | --- | 126 | | Higgs | 4.853GB | 3.784GB | **0.868GB** | 127 | | Yahoo LTR | 1.907GB | 1.468GB | **0.831GB** | 128 | | MS LTR | 5.469GB | 3.654GB | **0.886GB** | 129 | | Expo | 1.553GB | 1.393GB | **0.543GB** | 130 | | Allstate | 6.237GB | 4.990GB | **1.027GB** | 131 | 132 | ## 并行测试 133 | 134 | ### 数据集 135 | 136 | 我们使用 `terabyte click log` 数据集进行并行测试,详细信息如下表: 137 | 138 | | **数据** | **任务** | **链接** | **数据集** | **特征** | 139 | | --- | --- | --- | --- | --- | 140 | | Criteo | 二分类 | [link](http://labs.criteo.com/2013/12/download-terabyte-click-logs/) | 1,700,000,000 | 67 | 141 | 142 | 该数据集包含了 24 天点击记录,其中有 13 个整数特征与 26 个类别特征。 143 | 144 | 我们统计了该数据集 26 个类别前十天的点击率和计数,使用接下来十天的数据作为训练集并且该训练集中类别已与点击率和计数相对应。 145 | 146 | 处理后的训练集共有 17 亿条数据和 67 个特征。 147 | 148 | ### 环境 149 | 150 | 我们使用了 16 台 Windows 服务器作为实验平台,详细信息如下表: 151 | 152 | | **OS** | **CPU** | **Memory** | **Network Adapter** | 153 | | --- | --- | --- | --- | 154 | | Windows Server 2012 | 2 * E5-2670 v2 | DDR3 1600Mhz, 256GB | Mellanox ConnectX-3, 54Gbps, RDMA support | 155 | 156 | ### 设置: 157 | 158 | ``` 159 | learning_rate = 0.1 160 | num_leaves = 255 161 | num_trees = 100 162 | num_thread = 16 163 | tree_learner = data 164 | 165 | ``` 166 | 167 | 我们在此使用并行数据,因为该数据集数据量大但是特征少。 168 | 169 | 其他参数皆为默认值 170 | 171 | ### 结论 172 | 173 | | **#Machine** | **Time per Tree** | **Memory Usage(per Machine)** | 174 | | --- | --- | --- | 175 | | 1 | 627.8 s | 176GB | 176 | | 2 | 311 s | 87GB | 177 | | 4 | 156 s | 43GB | 178 | | 8 | 80 s | 22GB | 179 | | 16 | 42 s | 11GB | 180 | 181 | 从结果看,我们发现 LightGBM 在分布式学习中可以做到线性加速。 182 | 183 | ## GPU 实验 184 | 185 | 参考 [GPU 性能](./GPU-Performance.rst). 
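上表的“线性加速”结论可以用几行 Python 从表中数字直接验证(下面的耗时数据取自上文并行测试的结果表): 加速比等于单机耗时除以多机耗时, 并行效率等于加速比除以机器数。

```
# 每棵树的耗时(秒), 数据取自上文并行测试的结果表
times = {1: 627.8, 2: 311.0, 4: 156.0, 8: 80.0, 16: 42.0}

base = times[1]
for machines in sorted(times):
    speedup = base / times[machines]  # 加速比: 单机耗时 / 多机耗时
    efficiency = speedup / machines   # 并行效率: 1.0 表示理想的线性加速
    print(f"{machines:2d} 台机器: 加速 {speedup:.1f}x, 并行效率 {efficiency:.2f}")
```

16 台机器时加速约 14.9 倍, 并行效率保持在 0.9 以上, 与“接近线性加速”的结论一致。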
186 | -------------------------------------------------------------------------------- /docs/2.md: -------------------------------------------------------------------------------- 1 | # 快速入门指南 2 | 3 | 本文档是 LightGBM CLI 版本的快速入门指南。 4 | 5 | 参考 [安装指南](./Installation-Guide.rst) 先安装 LightGBM 。 6 | 7 | **其他有帮助的链接列表** 8 | 9 | * [参数](./Parameters.rst) 10 | * [参数调整](./Parameters-Tuning.rst) 11 | * [Python 包快速入门](./Python-Intro.rst) 12 | * [Python API](./Python-API.rst) 13 | 14 | ## 训练数据格式 15 | 16 | LightGBM 支持 [CSV](https://en.wikipedia.org/wiki/Comma-separated_values), [TSV](https://en.wikipedia.org/wiki/Tab-separated_values) 和 [LibSVM](https://www.csie.ntu.edu.tw/~cjlin/libsvm/) 格式的输入数据文件。 17 | 18 | Label 是第一列的数据,文件中是不包含 header(标题) 的。 19 | 20 | ### 类别特征支持 21 | 22 | 12/5/2016 更新: 23 | 24 | LightGBM 可以直接使用 categorical feature(类别特征)(不需要单独编码)。 [Expo data](http://stat-computing.org/dataexpo/2009/) 实验显示,与 one-hot 编码相比,其速度提高了 8 倍。 25 | 26 | 有关配置的详细信息,请参阅 [参数](./Parameters.rst) 章节。 27 | 28 | ### 权重和 Query/Group 数据 29 | 30 | LightGBM 也支持加权训练,它需要一个额外的 [加权数据](./Parameters.rst#io-parameters) 。 它需要额外的 [query 数据](./Parameters.rst#io-parameters) 用于排名任务。 31 | 32 | 11/3/2016 更新: 33 | 34 | 1. 现在支持 header(标题)输入 35 | 2. 可以指定 label 列,权重列和 query/group id 列。 索引和列都支持 36 | 3. 
可以指定一个被忽略的列的列表 37 | 38 | ## 参数快速查看 39 | 40 | 参数格式是 `key1=value1 key2=value2 ...` 。 参数可以在配置文件和命令行中。 41 | 42 | 一些重要的参数如下 : 43 | 44 | * `config`, 默认=`""`, type(类型)=string, alias(别名)=`config_file` 45 | * 配置文件的路径 46 | * `task`, 默认=`train`, type(类型)=enum, options(可选)=`train`, `predict`, `convert_model` 47 | * `train`, alias(别名)=`training`, 用于训练 48 | * `predict`, alias(别名)=`prediction`, `test`, 用于预测。 49 | * `convert_model`, 用于将模型文件转换为 if-else 格式, 在 [转换模型参数](./Parameters.rst#convert-model-parameters) 中了解更多信息 50 | * `application`, 默认=`regression`, 类型=enum, 可选=`regression`, `regression_l1`, `huber`, `fair`, `poisson`, `quantile`, `quantile_l2`, `binary`, `multiclass`, `multiclassova`, `xentropy`, `xentlambda`, `lambdarank`, 别名=`objective`, `app` 51 | * 回归 application 52 | * `regression_l2`, L2 损失, 别名=`regression`, `mean_squared_error`, `mse` 53 | * `regression_l1`, L1 损失, 别名=`mean_absolute_error`, `mae` 54 | * `huber`, [Huber loss](https://en.wikipedia.org/wiki/Huber_loss) 55 | * `fair`, [Fair loss](https://www.kaggle.com/c/allstate-claims-severity/discussion/24520) 56 | * `poisson`, [Poisson regression](https://en.wikipedia.org/wiki/Poisson_regression) 57 | * `quantile`, [Quantile regression](https://en.wikipedia.org/wiki/Quantile_regression) 58 | * `quantile_l2`, 与 `quantile` 类似, 但是使用 L2 损失 59 | * `binary`, 二进制`log loss`_ 分类 application 60 | * 多类别分类 application 61 | * `multiclass`, [softmax](https://en.wikipedia.org/wiki/Softmax_function) 目标函数, `num_class` 也应该被设置 62 | * `multiclassova`, [One-vs-All](https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest) 二元目标函数, `num_class` 也应该被设置 63 | * 交叉熵 application 64 | * `xentropy`, 交叉熵的目标函数 (可选线性权重), 别名=`cross_entropy` 65 | * `xentlambda`, 交叉熵的替代参数化, 别名=`cross_entropy_lambda` 66 | * label 是在 [0, 1] 间隔中的任何东西 67 | * `lambdarank`, [lambdarank](https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions.pdf) application 68 | * 在 lambdarank 任务中 label 应该是 `int` 类型,而较大的数字表示较高的相关性(例如,0:bad, 1:fair, 2:good, 
3:perfect) 69 | * `label_gain` 可以用来设置 `int` label 的 gain(weight)(增益(权重)) 70 | * `boosting`, 默认=`gbdt`, type=enum, 选项=`gbdt`, `rf`, `dart`, `goss`, 别名=`boost`, `boosting_type` 71 | * `gbdt`, traditional Gradient Boosting Decision Tree(传统梯度提升决策树) 72 | * `rf`, 随机森林 73 | * `dart`, [Dropouts meet Multiple Additive Regression Trees](https://arxiv.org/abs/1505.01866) 74 | * `goss`, Gradient-based One-Side Sampling(基于梯度的单面采样) 75 | * `data`, 默认=`""`, 类型=string, 别名=`train`, `train_data` 76 | * 训练数据, LightGBM 将从这个数据训练 77 | * `valid`, 默认=`""`, 类型=multi-string, 别名=`test`, `valid_data`, `test_data` 78 | * 验证/测试 数据,LightGBM 将输出这些数据的指标 79 | * 支持多个验证数据,使用 `,` 分开 80 | * `num_iterations`, 默认=`100`, 类型=int, 别名=`num_iteration`, `num_tree`, `num_trees`, `num_round`, `num_rounds`, `num_boost_round` 81 | * boosting iterations/trees 的数量 82 | * `learning_rate`, 默认=`0.1`, 类型=double, 别名=`shrinkage_rate` 83 | * shrinkage rate(收敛率) 84 | * `num_leaves`, 默认=`31`, 类型=int, 别名=`num_leaf` 85 | * 在一棵树中的叶子数量 86 | * `tree_learner`, 默认=`serial`, 类型=enum, 可选=`serial`, `feature`, `data`, `voting`, 别名=`tree` 87 | * `serial`, 单个 machine tree 学习器 88 | * `feature`, 别名=`feature_parallel`, feature parallel tree learner(特征并行树学习器) 89 | * `data`, 别名=`data_parallel`, data parallel tree learner(数据并行树学习器) 90 | * `voting`, 别名=`voting_parallel`, voting parallel tree learner(投票并行树学习器) 91 | * 参考 [Parallel Learning Guide(并行学习指南)](./Parallel-Learning-Guide.rst) 来了解更多细节 92 | * `num_threads`, 默认=`OpenMP_default`, 类型=int, 别名=`num_thread`, `nthread` 93 | * LightGBM 的线程数 94 | * 为了获得最好的速度,将其设置为 **real CPU cores(真实 CPU 内核)** 数量,而不是线程数(大多数 CPU 使用 [hyper-threading](https://en.wikipedia.org/wiki/Hyper-threading) 来为每个 CPU core 生成 2 个线程) 95 | * 对于并行学习,不应该使用全部的 CPU cores ,因为这会导致网络性能不佳 96 | * `max_depth`, 默认=`-1`, 类型=int 97 | * 树模型最大深度的限制。 当 `#data` 很小的时候,这被用来处理 overfit(过拟合)。 树仍然通过 leaf-wise 生长 98 | * `< 0` 意味着没有限制 99 | * `min_data_in_leaf`, 默认=`20`, 类型=int, 别名=`min_data_per_leaf` , `min_data`, `min_child_samples` 100 | * 
一个叶子中的最小数据量。可以用这个来处理过拟合。 101 | * `min_sum_hessian_in_leaf`, 默认=`1e-3`, 类型=double, 别名=`min_sum_hessian_per_leaf`, `min_sum_hessian`, `min_hessian`, `min_child_weight` 102 | * 一个叶子节点中最小的 sum hessian 。类似于 `min_data_in_leaf` ,它可以用来处理过拟合。 103 | 104 | 想要了解全部的参数, 请参阅 [Parameters(参数)](./Parameters.rst). 105 | 106 | ## 运行 LightGBM 107 | 108 | 对于 Windows: 109 | 110 | ``` 111 | lightgbm.exe config=your_config_file other_args ... 112 | 113 | ``` 114 | 115 | 对于 Unix: 116 | 117 | ``` 118 | ./lightgbm config=your_config_file other_args ... 119 | 120 | ``` 121 | 122 | 参数既可以在配置文件中,也可以在命令行中,命令行中的参数优先于配置文件。例如下面的命令行会保留 `num_trees=10` ,并忽略配置文件中的相同参数。 123 | 124 | ``` 125 | ./lightgbm config=train.conf num_trees=10 126 | 127 | ``` 128 | 129 | ## 示例 130 | 131 | * [Binary Classification(二元分类)](https://github.com/Microsoft/LightGBM/tree/master/examples/binary_classification) 132 | * [Regression(回归)](https://github.com/Microsoft/LightGBM/tree/master/examples/regression) 133 | * [Lambdarank](https://github.com/Microsoft/LightGBM/tree/master/examples/lambdarank) 134 | * [Parallel Learning(并行学习)](https://github.com/Microsoft/LightGBM/tree/master/examples/parallel_learning) -------------------------------------------------------------------------------- /docs/4.md: -------------------------------------------------------------------------------- 1 | # 特性 2 | 3 | 这篇文档是对 LightGBM 的特点和其中用到的算法的简短介绍 4 | 5 | 本页不包含详细的算法,如果你对这些算法感兴趣可以查阅引用的论文或者源代码 6 | 7 | ## 速度和内存使用的优化 8 | 9 | 许多提升工具对于决策树的学习使用基于 pre-sorted 的算法 [[1, 2]](#references) (例如,在xgboost中默认的算法) ,这是一个简单的解决方案,但是不易于优化。 10 | 11 | LightGBM 利用基于 histogram 的算法 [[3, 4, 5]](#references),通过将连续特征(属性)值分段为 discrete bins 来加快训练的速度并减少内存的使用。 如下的是基于 histogram 算法的优点: 12 | 13 | * **减少分割增益的计算量** 14 | * Pre-sorted 算法需要 `O(#data)` 次的计算 15 | * Histogram 算法只需要计算 `O(#bins)` 次, 并且 `#bins` 远少于 `#data` 16 | * 这个仍然需要 `O(#data)` 次来构建直方图, 而这仅仅包含总结操作 17 | * **通过直方图的相减来进行进一步的加速** 18 | * 在二叉树中可以通过利用叶节点的父节点和相邻节点的直方图的相减来获得该叶节点的直方图 19 | * 所以仅仅需要为一个叶节点建立直方图 (其 `#data` 
小于它的相邻节点)就可以通过直方图的相减来获得相邻节点的直方图,而这花费的代价(`O(#bins)`)很小。 20 | * **减少内存的使用** 21 | * 可以将连续的值替换为 discrete bins。 如果 `#bins` 较小, 可以利用较小的数据类型来存储训练数据, 如 uint8_t。 22 | * 无需为 pre-sorting 特征值存储额外的信息 23 | * **减少并行学习的通信代价** 24 | 25 | ## 稀疏优化 26 | 27 | * 对于稀疏的特征仅仅需要 `O(2 * #non_zero_data)` 来建立直方图 28 | 29 | ## 准确率的优化 30 | 31 | ### Leaf-wise (Best-first) 的决策树生长策略 32 | 33 | 大部分决策树的学习算法通过 level(depth)-wise 策略生长树,如下图一样: 34 | 35 | ![http://lightgbm.apachecn.org/cn/latest/_images/level-wise.png](img/5bd0711d72136eaddc9ce8383206b925.jpg) 36 | 37 | LightGBM 通过 leaf-wise (best-first)[[6]](#references) 策略来生长树。它将选取具有最大 delta loss 的叶节点来生长。 当生长相同的 `#leaf`,leaf-wise 算法可以比 level-wise 算法减少更多的损失。 38 | 39 | 当 `#data` 较小的时候,leaf-wise 可能会造成过拟合。 所以,LightGBM 可以利用额外的参数 `max_depth` 来限制树的深度并避免过拟合(树的生长仍然通过 leaf-wise 策略)。 40 | 41 | ![http://lightgbm.apachecn.org/cn/latest/_images/leaf-wise.png](img/3cbd29e85d45383c45dd987a7e719538.jpg) 42 | 43 | ### 类别特征值的最优分割 44 | 45 | 我们通常将类别特征转化为 one-hot coding。 然而,对于学习树来说这不是个好的解决方案。 原因是,对于一个基数较大的类别特征,学习树会生长得非常不平衡,并且需要非常深的深度才能达到较好的准确率。 46 | 47 | 事实上,最好的解决方案是将类别特征划分为两个子集,总共有 `2^(k-1) - 1` 种可能的划分。但是对于回归树 [[7]](#references) 存在一种有效的解决方案:只需要大约 `O(k * log(k))` 的代价就能找到最优划分。 48 | 49 | 基本的思想是根据训练目标的相关性对类别进行重排序。 更具体的说,根据累加值(`sum_gradient / sum_hessian`)重新对(类别特征的)直方图进行排序,然后在排好序的直方图中寻找最好的分割点。 50 | 51 | ## 网络通信的优化 52 | 53 | LightGBM 中的并行学习,仅仅需要使用一些聚合通信算法,例如 “All reduce”, “All gather” 和 “Reduce scatter”. LightGBM 实现了 state-of-art 算法 [[8]](#references) . 这些聚合通信算法可以提供比点对点通信更好的性能。 54 | 55 | ## 并行学习的优化 56 | 57 | LightGBM 提供以下并行学习优化算法: 58 | 59 | ### 特征并行 60 | 61 | #### 传统算法 62 | 63 | 传统的特征并行算法旨在并行化决策树中的 `Find Best Split`。主要流程如下: 64 | 65 | 1. 垂直划分数据(不同的机器有不同的特征集) 66 | 2. 在本地特征集寻找最佳划分点 {特征, 阈值} 67 | 3. 本地进行各个划分的通信整合并得到最佳划分 68 | 4. 以最佳划分方法对数据进行划分,并将数据划分结果传递给其他线程 69 | 5.
其他机器对接收到的数据进一步划分 70 | 71 | 传统特征并行方法的主要不足: 72 | 73 | * 存在计算上的局限:传统特征并行无法加速 “split” 步骤(其时间复杂度为 `O(#data)`)。因此,当数据量很大的时候,难以有效加速。 74 | * 需要对划分结果进行通信整合,其额外的通信开销约为 `O(#data/8)`(每个数据一个比特) 75 | 76 | #### LightGBM 中的特征并行 77 | 78 | 既然在数据量很大时传统特征并行方法无法有效地加速,我们做了一些改变:不再垂直划分数据,即每台机器都持有全部数据。因此,LightGBM 中没有数据划分结果的通信开销,各台机器都知道如何划分数据。而且,`#data` 不会太大,所以让每台机器都持有全部数据是合理的。 79 | 80 | LightGBM 中特征并行的流程如下: 81 | 82 | 1. 每台机器都在本地数据集上寻找最佳划分点 {特征, 阈值} 83 | 2. 机器间通信整合各个划分并得到最佳划分 84 | 3. 执行最佳划分 85 | 86 | 然而,该特征并行算法在数据量很大时仍然存在计算上的局限。因此,建议在数据量很大时使用数据并行。 87 | 88 | ### 数据并行 89 | 90 | #### 传统算法 91 | 92 | 数据并行旨在并行化整个决策树学习过程。数据并行的主要流程如下: 93 | 94 | 1. 水平划分数据 95 | 2. 各台机器以本地数据构建本地直方图 96 | 3. 将本地直方图整合成全局直方图 97 | 4. 在全局直方图中寻找最佳划分,然后执行此划分 98 | 99 | 传统数据并行的不足: 100 | 101 | * 通讯开销高。如果使用点对点的通讯算法,一台机器的通讯开销大约为 `O(#machine * #feature * #bin)`。如果使用聚合通讯算法(例如 “All Reduce” 等),通讯开销大约为 `O(2 * #feature * #bin)` [[8]](#references)。 102 | 103 | #### LightGBM 中的数据并行 104 | 105 | LightGBM 采用以下方法减少数据并行中的通讯开销: 106 | 107 | 1. 不同于“整合所有本地直方图以形成全局直方图”的方式,LightGBM 使用分散规约(Reduce scatter)的方式对不同机器的不同特征(不重叠的)进行整合。然后各机器在本地整合的直方图中寻找最佳划分,并同步得到全局最佳划分。 108 | 2. 如上所述,LightGBM 通过直方图做差法加速训练。基于此,我们可以只通讯单个叶节点的直方图,其相邻节点的直方图通过做差法得到。 109 | 110 | 通过上述方法,LightGBM 将数据并行中的通讯开销减少到 `O(0.5 * #feature * #bin)`。 111 | 112 | ### 投票并行 113 | 114 | 投票并行进一步将数据并行中的通讯开销减少至常数级别。其通过两阶段的投票过程减少特征直方图的通讯开销 [[9]](#references)。
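上文多次提到的直方图做差法可以用下面的纯 Python 草图来说明(仅为示意, 假设每个 bin 只统计样本数与梯度和, 与 LightGBM 的实际实现无关):

```python
# 直方图做差示意草图(仅为说明文档中的思想, 与 LightGBM 的实际实现无关)。
# 假设每个 bin 只统计两个量: [样本数, 梯度和]。

def build_histogram(bin_indices, gradients, n_bins):
    """以 O(#data) 的代价, 对落入各个 bin 的样本做求和统计。"""
    hist = [[0, 0.0] for _ in range(n_bins)]
    for b, g in zip(bin_indices, gradients):
        hist[b][0] += 1
        hist[b][1] += g
    return hist

def histogram_subtraction(parent_hist, child_hist):
    """兄弟节点直方图 = 父节点直方图 - 已知子节点直方图, 代价仅为 O(#bins)。"""
    return [[p[0] - c[0], p[1] - c[1]]
            for p, c in zip(parent_hist, child_hist)]

if __name__ == "__main__":
    bins = [0, 1, 1, 2, 0, 2, 1]                        # 每个样本所属的 bin
    grads = [0.5, -0.25, 0.75, 0.125, -0.5, 0.25, 0.5]  # 每个样本的梯度
    parent = build_histogram(bins, grads, 3)
    # 假设前 4 个样本被划分到左子节点, 其余划分到右子节点
    left = build_histogram(bins[:4], grads[:4], 3)
    right_direct = build_histogram(bins[4:], grads[4:], 3)
    right_by_sub = histogram_subtraction(parent, left)
    print(right_by_sub == right_direct)  # True: 做差结果与直接构建一致
```

可以看到, 只要保留父节点的直方图, 兄弟节点的直方图就无需重新遍历数据来构建, 这正是 `O(#bins)` 做差加速的来源。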
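“类别特征值的最优分割”一节中先按 `sum_gradient / sum_hessian` 重排类别、再线性扫描的思想, 可以用下面的纯 Python 草图来说明(仅为示意: 增益采用常见的 `G^2/H` 形式, 函数名 `best_categorical_split` 为本文虚构, 并非 LightGBM 的实际实现):

```python
# 类别特征最优划分示意草图(虚构的简化实现, 仅用于说明排序 + 扫描的思想)。
# 先按 sum_gradient / sum_hessian 对各类别排序,
# 再在排好序的序列上扫描一遍, 寻找最佳的 "子集 vs 补集" 划分。

def best_categorical_split(cat_stats):
    """cat_stats: {类别: (sum_gradient, sum_hessian)}, 假设 sum_hessian > 0。
    返回 (最佳左子集, 对应增益)。"""
    # O(k log k): 按梯度统计量的比值重排类别
    ordered = sorted(cat_stats, key=lambda c: cat_stats[c][0] / cat_stats[c][1])
    total_g = sum(g for g, _ in cat_stats.values())
    total_h = sum(h for _, h in cat_stats.values())

    best_gain, best_subset = float("-inf"), None
    g_left = h_left = 0.0
    for i in range(len(ordered) - 1):      # 在排序后的序列上做前缀扫描
        g, h = cat_stats[ordered[i]]
        g_left += g
        h_left += h
        g_right, h_right = total_g - g_left, total_h - h_left
        gain = g_left ** 2 / h_left + g_right ** 2 / h_right  # 常见的 G^2/H 增益
        if gain > best_gain:
            best_gain, best_subset = gain, set(ordered[: i + 1])
    return best_subset, best_gain
```

例如, 对 `{"A": (4.0, 2.0), "B": (-3.0, 2.0), "C": (1.0, 2.0)}`, 排序结果为 B, C, A, 最终会选出 `{"B"}` 与其补集的划分。整个过程只需一次 `O(k log k)` 的排序和一次线性扫描, 而不必枚举全部 `2^(k-1) - 1` 种子集。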
115 | 116 | ## GPU 支持 117 | 118 | 感谢 “@huanzhang12 <[https://github.com/huanzhang12](https://github.com/huanzhang12)>” 对此项特性的贡献。相关细节请阅读 [[10]](#references)。 119 | 120 | * [GPU 安装](./Installation-Guide.rst#build-gpu-version) 121 | * [GPU 训练](./GPU-Tutorial.rst) 122 | 123 | ## 应用和度量 124 | 125 | 支持以下应用: 126 | 127 | * 回归, 目标函数为 L2 loss 128 | * 二分类, 目标函数为 logloss(对数损失) 129 | * 多分类 130 | * lambdarank, 目标函数为基于 NDCG 的 lambdarank 131 | 132 | 支持以下度量: 133 | 134 | * L1 loss 135 | * L2 loss 136 | * Log loss 137 | * Classification error rate 138 | * AUC 139 | * NDCG 140 | * Multi class log loss 141 | * Multi class error rate 142 | 143 | 获取更多详情, 请参阅 [Parameters](./Parameters.rst#metric-parameters)。 144 | 145 | ## 其他特性 146 | 147 | * Limit `max_depth` of tree while grows tree leaf-wise 148 | * [DART](https://arxiv.org/abs/1505.01866) 149 | * L1/L2 regularization 150 | * Bagging 151 | * Column(feature) sub-sample 152 | * Continued train with input GBDT model 153 | * Continued train with the input score file 154 | * Weighted training 155 | * Validation metric output during training 156 | * Multi validation data 157 | * Multi metrics 158 | * Early stopping (both training and prediction) 159 | * Prediction for leaf index 160 | 161 | 获取更多详情, 请参阅 [参数](./Parameters.rst)。 162 | 163 | ## References 164 | 165 | [1] Mehta, Manish, Rakesh Agrawal, and Jorma Rissanen. “SLIQ: A fast scalable classifier for data mining.” International Conference on Extending Database Technology. Springer Berlin Heidelberg, 1996. 166 | 167 | [2] Shafer, John, Rakesh Agrawal, and Manish Mehta. “SPRINT: A scalable parallel classifier for data mining.” Proc. 1996 Int. Conf. Very Large Data Bases. 1996. 168 | 169 | [3] Ranka, Sanjay, and V. Singh. “CLOUDS: A decision tree classifier for large datasets.” Proceedings of the 4th Knowledge Discovery and Data Mining Conference. 1998. 170 | 171 | [4] Machado, F. P. “Communication and memory efficient parallel decision tree construction.” (2003).
172 | 173 | [5] Li, Ping, Qiang Wu, and Christopher J. Burges. “Mcrank: Learning to rank using multiple classification and gradient boosting.” Advances in neural information processing systems. 2007. 174 | 175 | [6] Shi, Haijian. “Best-first decision tree learning.” Diss. The University of Waikato, 2007. 176 | 177 | [7] Walter D. Fisher. “[On Grouping for Maximum Homogeneity](http://amstat.tandfonline.com/doi/abs/10.1080/01621459.1958.10501479).” Journal of the American Statistical Association. Vol. 53, No. 284 (Dec., 1958), pp. 789-798. 178 | 179 | [8] Thakur, Rajeev, Rolf Rabenseifner, and William Gropp. “[Optimization of collective communication operations in MPICH](http://wwwi10.lrr.in.tum.de/~gerndt/home/Teaching/HPCSeminar/mpich_multi_coll.pdf).” International Journal of High Performance Computing Applications 19.1 (2005): 49-66. 180 | 181 | [9] Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tieyan Liu. “[A Communication-Efficient Parallel Algorithm for Decision Tree](http://papers.nips.cc/paper/6381-a-communication-efficient-parallel-algorithm-for-decision-tree).” Advances in Neural Information Processing Systems 29 (NIPS 2016). 182 | 183 | [10] Huan Zhang, Si Si and Cho-Jui Hsieh. “[GPU Acceleration for Large-scale Tree Boosting](https://arxiv.org/abs/1706.08359).” arXiv:1706.08359, 2017. -------------------------------------------------------------------------------- /docs/1.md: -------------------------------------------------------------------------------- 1 | # 安装指南 2 | 3 | 该页面是 LightGBM CLI 版本的构建指南. 4 | 5 | 要构建 Python 和 R 的软件包, 请分别参阅 [Python-package](https://github.com/Microsoft/LightGBM/tree/master/python-package) 和 [R-package](https://github.com/Microsoft/LightGBM/tree/master/R-package) 文件夹. 6 | 7 | ## Windows 8 | 9 | LightGBM 可以使用 Visual Studio, MSBuild 与 CMake 或 MinGW 来在 Windows 上构建. 10 | 11 | ### Visual Studio (or MSBuild) 12 | 13 | #### 使用 GUI 14 | 15 | 1. 
安装 [Visual Studio](https://www.visualstudio.com/downloads/) (2015 或更新版本). 16 | 17 | 2. 下载 [zip archive](https://github.com/Microsoft/LightGBM/archive/master.zip) 并且 unzip(解压)它. 18 | 19 | 3. 定位到 `LightGBM-master/windows` 文件夹. 20 | 21 | 4. 使用 Visual Studio 打开 `LightGBM.sln` 文件, 选择 `Release` 配置并且点击 `BUILD`->`Build Solution (Ctrl+Shift+B)`. 22 | 23 | 如果出现有关 **Platform Toolset** 的错误, 定位到 `PROJECT`->`Properties`->`Configuration Properties`->`General` 然后选择 toolset 安装到你的机器. 24 | 25 | 该 exe 文件可以在 `LightGBM-master/windows/x64/Release` 文件夹中找到. 26 | 27 | #### 使用命令行 28 | 29 | 1. 安装 [Git for Windows](https://git-scm.com/download/win), [CMake](https://cmake.org/) (3.8 或更新版本) 以及 [MSBuild](https://www.visualstudio.com/downloads/#build-tools-for-visual-studio-2017) (**MSBuild** 是非必要的, 如果已安装 **Visual Studio** (2015 或更新版本) 的话). 30 | 31 | 2. 运行以下命令: 32 | 33 | ``` 34 | git clone --recursive https://github.com/Microsoft/LightGBM 35 | cd LightGBM 36 | mkdir build 37 | cd build 38 | cmake -DCMAKE_GENERATOR_PLATFORM=x64 .. 39 | cmake --build . --target ALL_BUILD --config Release 40 | 41 | ``` 42 | 43 | 这些 exe 和 dll 文件可以在 `LightGBM/Release` 文件夹中找到. 44 | 45 | ### MinGW64 46 | 47 | 1. 安装 [Git for Windows](https://git-scm.com/download/win), [CMake](https://cmake.org/) 和 [MinGW-w64](https://mingw-w64.org/doku.php/download). 48 | 49 | 2. 运行以下命令: 50 | 51 | ``` 52 | git clone --recursive https://github.com/Microsoft/LightGBM 53 | cd LightGBM 54 | mkdir build 55 | cd build 56 | cmake -G "MinGW Makefiles" .. 57 | mingw32-make.exe -j4 58 | 59 | ``` 60 | 61 | 这些 exe 和 dll 文件可以在 `LightGBM/` 文件夹中找到. 62 | 63 | **注意**: 也许你需要再一次运行 `cmake -G "MinGW Makefiles" ..` 命令, 如果遇到 `sh.exe was found in your PATH` 错误的话. 64 | 65 | 也许你还想要参阅 [gcc 建议](./gcc-Tips.rst). 66 | 67 | ## Linux 68 | 69 | LightGBM 使用 **CMake** 来构建. 运行以下命令: 70 | 71 | ``` 72 | git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM 73 | mkdir build ; cd build 74 | cmake .. 
75 | make -j4 76 | 77 | ``` 78 | 79 | **注意**: glibc >= 2.14 是必须的. 80 | 81 | 也许你还想要参阅 [gcc 建议](./gcc-Tips.rst). 82 | 83 | ## OSX 84 | 85 | LightGBM 依赖于 **OpenMP** 进行编译, 然而 Apple Clang 不支持它. 86 | 87 | 请使用以下命令来安装 **gcc/g++** : 88 | 89 | ``` 90 | brew install cmake 91 | brew install gcc --without-multilib 92 | 93 | ``` 94 | 95 | 然后安装 LightGBM: 96 | 97 | ``` 98 | git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM 99 | export CXX=g++-7 CC=gcc-7 100 | mkdir build ; cd build 101 | cmake .. 102 | make -j4 103 | 104 | ``` 105 | 106 | 也许你还想要参阅 [gcc 建议](./gcc-Tips.rst). 107 | 108 | ## Docker 109 | 110 | 请参阅 [Docker 文件夹](https://github.com/Microsoft/LightGBM/tree/master/docker). 111 | 112 | ## Build MPI 版本 113 | 114 | LightGBM 默认的构建版本是基于 socket 的. LightGBM 也支持 [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface). MPI 是一种支持 [RDMA](https://en.wikipedia.org/wiki/Remote_direct_memory_access) 的高性能通信方法. 115 | 116 | 如果您需要运行具有高性能通信的并行学习应用程序, 则可以构建带有 MPI 支持的 LightGBM. 117 | 118 | ### Windows 119 | 120 | #### 使用 GUI 121 | 122 | 1. 需要先安装 [MS MPI](https://www.microsoft.com/en-us/download/details.aspx?id=49926) . 需要 `msmpisdk.msi` 和 `MSMpiSetup.exe`. 123 | 124 | 2. 安装 [Visual Studio](https://www.visualstudio.com/downloads/) (2015 或更新版本). 125 | 126 | 3. 下载 [zip archive](https://github.com/Microsoft/LightGBM/archive/master.zip) 并且 unzip(解压)它. 127 | 128 | 4. 定位到 `LightGBM-master/windows` 文件夹. 129 | 130 | 5. 使用 Visual Studio 打开 `LightGBM.sln` 文件, 选择 `Release_mpi` 配置并且点击 `BUILD`->`Build Solution (Ctrl+Shift+B)`. 131 | 132 | 如果遇到有关 **Platform Toolset** 的错误, 定位到 `PROJECT`->`Properties`->`Configuration Properties`->`General` 并且选择你机器上已安装的 toolset. 133 | 134 | 该 exe 文件可以在 `LightGBM-master/windows/x64/Release_mpi` 文件夹中找到. 135 | 136 | #### 使用命令行 137 | 138 | 1. 需要先安装 [MS MPI](https://www.microsoft.com/en-us/download/details.aspx?id=49926) . 需要 `msmpisdk.msi` 和 `MSMpiSetup.exe`. 139 | 140 | 2.
安装 [Git for Windows](https://git-scm.com/download/win), [CMake](https://cmake.org/) (3.8 或更新版本) 和 [MSBuild](https://www.visualstudio.com/downloads/#build-tools-for-visual-studio-2017) (如果已安装 **Visual Studio** (2015 或更新版本), 则 MSBuild 是非必要的). 141 | 142 | 3. 运行以下命令: 143 | 144 | ``` 145 | git clone --recursive https://github.com/Microsoft/LightGBM 146 | cd LightGBM 147 | mkdir build 148 | cd build 149 | cmake -DCMAKE_GENERATOR_PLATFORM=x64 -DUSE_MPI=ON .. 150 | cmake --build . --target ALL_BUILD --config Release 151 | 152 | ``` 153 | 154 | 这些 exe 和 dll 文件可以在 `LightGBM/Release` 文件夹中找到. 155 | 156 | **注意**: 不支持通过 **MinGW** 构建 MPI 版本, 因为其中缺少 MPI 库. 157 | 158 | ### Linux 159 | 160 | 需要先安装 [Open MPI](https://www.open-mpi.org/) . 161 | 162 | 然后运行以下命令: 163 | 164 | ``` 165 | git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM 166 | mkdir build ; cd build 167 | cmake -DUSE_MPI=ON .. 168 | make -j4 169 | 170 | ``` 171 | 172 | **注意**: glibc >= 2.14 是必要的. 173 | 174 | ### OSX 175 | 176 | 先安装 **gcc** 和 **Open MPI** : 177 | 178 | ``` 179 | brew install openmpi 180 | brew install cmake 181 | brew install gcc --without-multilib 182 | 183 | ``` 184 | 185 | 然后运行以下命令: 186 | 187 | ``` 188 | git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM 189 | export CXX=g++-7 CC=gcc-7 190 | mkdir build ; cd build 191 | cmake -DUSE_MPI=ON .. 192 | make -j4 193 | 194 | ``` 195 | 196 | ## Build GPU 版本 197 | 198 | ### Linux 199 | 200 | 在编译前应该先安装以下依赖: 201 | 202 | * OpenCL 1.2 headers and libraries, 它们通常由 GPU 制造商提供. 203 | 204 | The generic OpenCL ICD packages (for example, Debian package `cl-icd-libopencl1` and `cl-icd-opencl-dev`) can also be used. 205 | 206 | * libboost 1.56 或更新版本 (推荐 1.61 或更新版本). 207 | 208 | We use Boost.Compute as the interface to GPU, which is part of the Boost library since version 1.61\.
However, since we include the source code of Boost.Compute as a submodule, we only require the host has Boost 1.56 or later installed. We also use Boost.Align for memory allocation. Boost.Compute requires Boost.System and Boost.Filesystem to store offline kernel cache. 209 | 210 | The following Debian packages should provide necessary Boost libraries: `libboost-dev`, `libboost-system-dev`, `libboost-filesystem-dev`. 211 | 212 | * CMake 3.2 或更新版本. 213 | 214 | 要构建 LightGBM GPU 版本, 运行以下命令: 215 | 216 | ``` 217 | git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM 218 | mkdir build ; cd build 219 | cmake -DUSE_GPU=1 .. 220 | # if you have installed the NVIDIA OpenGL, please use the following instead 221 | # sudo cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ .. 222 | make -j4 223 | 224 | ``` 225 | 226 | ### Windows 227 | 228 | 如果使用 **MinGW**, 该构建过程类似于 Linux 上的构建. 相关的更多细节请参阅 [GPU Windows 平台上的编译](./GPU-Windows.rst) . 229 | 230 | 以下构建过程适用于 MSVC (Microsoft Visual C++) 构建. 231 | 232 | 1. 安装 [Git for Windows](https://git-scm.com/download/win), [CMake](https://cmake.org/) (3.8 或更新版本) 和 [MSBuild](https://www.visualstudio.com/downloads/#build-tools-for-visual-studio-2017) (如果已安装 **Visual Studio** (2015 或更新版本), 则 MSBuild 是非必要的). 233 | 234 | 2. 针对 Windows 平台安装 **OpenCL** . 安装取决于你的 GPU 显卡品牌 (NVIDIA, AMD, Intel). 235 | 236 | * 要运行在 Intel 上, 获取 [Intel SDK for OpenCL](https://software.intel.com/en-us/articles/opencl-drivers). 237 | * 要运行在 AMD 上, 获取 [AMD APP SDK](http://developer.amd.com/amd-accelerated-parallel-processing-app-sdk/). 238 | * 要运行在 NVIDIA 上, 获取 [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads). 239 | 3. 安装 [Boost Binary](https://sourceforge.net/projects/boost/files/boost-binaries/1.64.0/). 240 | 241 | **注意**: 要匹配你的 Visual C++ 版本: 242 | 243 | Visual Studio 2015 -> `msvc-14.0-64.exe`, 244 | 245 | Visual Studio 2017 -> `msvc-14.1-64.exe`. 246 | 247 | 4.
运行以下命令: 248 | 249 | ``` 250 | Set BOOST_ROOT=C:\local\boost_1_64_0\ 251 | Set BOOST_LIBRARYDIR=C:\local\boost_1_64_0\lib64-msvc-14.0 252 | git clone --recursive https://github.com/Microsoft/LightGBM 253 | cd LightGBM 254 | mkdir build 255 | cd build 256 | cmake -DCMAKE_GENERATOR_PLATFORM=x64 -DUSE_GPU=1 .. 257 | cmake --build . --target ALL_BUILD --config Release 258 | 259 | ``` 260 | 261 | **注意**: `C:\local\boost_1_64_0\` 和 `C:\local\boost_1_64_0\lib64-msvc-14.0` 是你 Boost 二进制文件的位置. 你还可以将它们设置为环境变量, 以在构建时避免 `Set ...` 命令. 262 | 263 | ### Protobuf 支持 264 | 265 | 如果想要使用 protobuf 来保存和加载模型, 请先安装 [protobuf c++ version](https://github.com/google/protobuf/blob/master/src/README.md) . 然后使用 USE_PROTO=ON 配置来运行 cmake 命令, 例如: 266 | 267 | ``` 268 | cmake -DUSE_PROTO=ON .. 269 | 270 | ``` 271 | 272 | 然后在保存或加载模型时, 可以在参数中使用 `model_format=proto`. 273 | 274 | **注意**: 针对 windows 用户, 它只对 mingw 进行了测试. 275 | 276 | ### Docker 277 | 278 | 请参阅 [GPU Docker 文件夹](https://github.com/Microsoft/LightGBM/tree/master/docker/gpu). -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0) 2 | 3 | Copyright © 2020 ApacheCN(apachecn@163.com) 4 | 5 | By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions. 6 | 7 | Section 1 – Definitions. 8 | 9 | a. 
Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image. 10 | b. Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License. 11 | c. BY-NC-SA Compatible License means a license listed at creativecommons.org/compatiblelicenses, approved by Creative Commons as essentially the equivalent of this Public License. 12 | d. Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights. 13 | e. Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements. 14 | f. Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material. 15 | g. License Elements means the license attributes listed in the name of a Creative Commons Public License. 
The License Elements of this Public License are Attribution, NonCommercial, and ShareAlike. 16 | h. Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License. 17 | i. Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license. 18 | j. Licensor means the individual(s) or entity(ies) granting rights under this Public License. 19 | k. NonCommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. For purposes of this Public License, the exchange of the Licensed Material for other material subject to Copyright and Similar Rights by digital file-sharing or similar means is NonCommercial provided there is no payment of monetary compensation in connection with the exchange. 20 | l. Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them. 21 | m. Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world. 22 | n. You means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning. 23 | 24 | Section 2 – Scope. 25 | 26 | a. License grant. 27 | 1. 
Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to: 28 | A. reproduce and Share the Licensed Material, in whole or in part, for NonCommercial purposes only; and 29 | B. produce, reproduce, and Share Adapted Material for NonCommercial purposes only. 30 | 2. Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions. 31 | 3. Term. The term of this Public License is specified in Section 6(a). 32 | 4. Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material. 33 | 5. Downstream recipients. 34 | A. Offer from the Licensor – Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License. 35 | B. Additional offer from the Licensor – Adapted Material. Every recipient of Adapted Material from You automatically receives an offer from the Licensor to exercise the Licensed Rights in the Adapted Material under the conditions of the Adapter’s License You apply. 36 | C. No downstream restrictions. 
You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material. 37 | 6. No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i). 38 | b. Other rights. 39 | 1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise. 40 | 2. Patent and trademark rights are not licensed under this Public License. 41 | 3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties, including when the Licensed Material is used other than for NonCommercial purposes. 42 | 43 | Section 3 – License Conditions. 44 | 45 | Your exercise of the Licensed Rights is expressly made subject to the following conditions. 46 | 47 | a. Attribution. 48 | 1. If You Share the Licensed Material (including in modified form), You must: 49 | A. retain the following if it is supplied by the Licensor with the Licensed Material: 50 | i. 
identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated); 51 | ii. a copyright notice; 52 | iii. a notice that refers to this Public License; 53 | iv. a notice that refers to the disclaimer of warranties; 54 | v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable; 55 | B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and 56 | C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License. 57 | 2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information. 58 | 3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable. 59 | b. ShareAlike. 60 | In addition to the conditions in Section 3(a), if You Share Adapted Material You produce, the following conditions also apply. 61 | 1. The Adapter’s License You apply must be a Creative Commons license with the same License Elements, this version or later, or a BY-NC-SA Compatible License. 62 | 2. You must include the text of, or the URI or hyperlink to, the Adapter's License You apply. You may satisfy this condition in any reasonable manner based on the medium, means, and context in which You Share Adapted Material. 63 | 3. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, Adapted Material that restrict exercise of the rights granted under the Adapter's License You apply. 64 | 65 | Section 4 – Sui Generis Database Rights. 
66 | 67 | Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material: 68 | 69 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database for NonCommercial purposes only; 70 | b. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material, including for purposes of Section 3(b); and 71 | c. You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database. 72 | 73 | For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights. 74 | 75 | Section 5 – Disclaimer of Warranties and Limitation of Liability. 76 | 77 | a. Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You. 78 | b. 
To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You. 79 | c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. 80 | 81 | Section 6 – Term and Termination. 82 | 83 | a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically. 84 | b. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates: 85 | 1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or 86 | 2. upon express reinstatement by the Licensor. 87 | For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License. 88 | c. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License. 89 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public License. 90 | 91 | Section 7 – Other Terms and Conditions. 92 | 93 | a. 
The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed. 94 | b. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License. 95 | 96 | Section 8 – Interpretation. 97 | 98 | a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License. 99 | b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions. 100 | c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor. 101 | d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority. 
-------------------------------------------------------------------------------- /asset/docsify-clicker.js: -------------------------------------------------------------------------------- 1 | (function() { 2 | var ids = [ 3 | '109577065', '108852955', '102682374', '100520874', '92400861', '90312982', 4 | '109963325', '109323014', '109301511', '108898970', '108590722', '108538676', 5 | '108503526', '108437109', '108402202', '108292691', '108291153', '108268498', 6 | '108030854', '107867070', '107847299', '107827334', '107825454', '107802131', 7 | '107775320', '107752974', '107735139', '107702571', '107598864', '107584507', 8 | '107568311', '107526159', '107452391', '107437455', '107430050', '107395781', 9 | '107325304', '107283210', '107107145', '107085440', '106995421', '106993460', 10 | '106972215', '106959775', '106766787', '106749609', '106745967', '106634313', 11 | '106451602', '106180097', '106095505', '106077010', '106008089', '106002346', 12 | '105653809', '105647855', '105130705', '104837872', '104706815', '104192620', 13 | '104074941', '104040537', '103962171', '103793502', '103783460', '103774572', 14 | '103547748', '103547703', '103547571', '103490757', '103413481', '103341935', 15 | '103330191', '103246597', '103235808', '103204403', '103075981', '103015105', 16 | '103014899', '103014785', '103014702', '103014540', '102993780', '102993754', 17 | '102993680', '102958443', '102913317', '102903382', '102874766', '102870470', 18 | '102864513', '102811179', '102761237', '102711565', '102645443', '102621845', 19 | '102596167', '102593333', '102585262', '102558427', '102537547', '102530610', 20 | '102527017', '102504698', '102489806', '102372981', '102258897', '102257303', 21 | '102056248', '101920097', '101648638', '101516708', '101350577', '101268149', 22 | '101128167', '101107328', '101053939', '101038866', '100977414', '100945061', 23 | '100932401', '100886407', '100797378', '100634918', '100588305', '100572447', 24 | '100192249', '100153559', 
'100099032', '100061455', '100035392', '100033450', 25 | '99671267', '99624846', '99172551', '98992150', '98989508', '98987516', '98938304', 26 | '98937682', '98725145', '98521688', '98450861', '98306787', '98203342', '98026348', 27 | '97680167', '97492426', '97108940', '96888872', '96568559', '96509100', '96508938', 28 | '96508611', '96508374', '96498314', '96476494', '96333593', '96101522', '95989273', 29 | '95960507', '95771870', '95770611', '95766810', '95727700', '95588929', '95218707', 30 | '95073151', '95054615', '95016540', '94868371', '94839549', '94719281', '94401578', 31 | '93931439', '93853494', '93198026', '92397889', '92063437', '91635930', '91433989', 32 | '91128193', '90915507', '90752423', '90738421', '90725712', '90725083', '90722238', 33 | '90647220', '90604415', '90544478', '90379769', '90288341', '90183695', '90144066', 34 | '90108283', '90021771', '89914471', '89876284', '89852050', '89839033', '89812373', 35 | '89789699', '89786189', '89752620', '89636380', '89632889', '89525811', '89480625', 36 | '89464088', '89464025', '89463984', '89463925', '89445280', '89441793', '89430432', 37 | '89429877', '89416176', '89412750', '89409618', '89409485', '89409365', '89409292', 38 | '89409222', '89399738', '89399674', '89399526', '89355336', '89330241', '89308077', 39 | '89222240', '89140953', '89139942', '89134398', '89069355', '89049266', '89035735', 40 | '89004259', '88925790', '88925049', '88915838', '88912706', '88911548', '88899438', 41 | '88878890', '88837519', '88832555', '88824257', '88777952', '88752158', '88659061', 42 | '88615256', '88551434', '88375675', '88322134', '88322085', '88321996', '88321978', 43 | '88321950', '88321931', '88321919', '88321899', '88321830', '88321756', '88321710', 44 | '88321661', '88321632', '88321566', '88321550', '88321506', '88321475', '88321440', 45 | '88321409', '88321362', '88321321', '88321293', '88321226', '88232699', '88094874', 46 | '88090899', '88090784', '88089091', '88048808', '87938224', '87913318', 
'87905933', 47 | '87897358', '87856753', '87856461', '87827666', '87822008', '87821456', '87739137', 48 | '87734022', '87643633', '87624617', '87602909', '87548744', '87548689', '87548624', 49 | '87548550', '87548461', '87463201', '87385913', '87344048', '87078109', '87074784', 50 | '87004367', '86997632', '86997466', '86997303', '86997116', '86996474', '86995899', 51 | '86892769', '86892654', '86892569', '86892457', '86892347', '86892239', '86892124', 52 | '86798671', '86777307', '86762845', '86760008', '86759962', '86759944', '86759930', 53 | '86759922', '86759646', '86759638', '86759633', '86759622', '86759611', '86759602', 54 | '86759596', '86759591', '86759580', '86759572', '86759567', '86759558', '86759545', 55 | '86759534', '86749811', '86741502', '86741074', '86741059', '86741020', '86740897', 56 | '86694754', '86670104', '86651882', '86651875', '86651866', '86651828', '86651790', 57 | '86651767', '86651756', '86651735', '86651720', '86651708', '86618534', '86618526', 58 | '86594785', '86590937', '86550497', '86550481', '86550472', '86550453', '86550438', 59 | '86550429', '86550407', '86550381', '86550359', '86536071', '86536035', '86536014', 60 | '86535988', '86535963', '86535953', '86535932', '86535902', '86472491', '86472298', 61 | '86472236', '86472191', '86472108', '86471967', '86471899', '86471822', '86439022', 62 | '86438972', '86438902', '86438887', '86438867', '86438836', '86438818', '85850119', 63 | '85850075', '85850021', '85849945', '85849893', '85849837', '85849790', '85849740', 64 | '85849661', '85849620', '85849550', '85606096', '85564441', '85547709', '85471981', 65 | '85471317', '85471136', '85471073', '85470629', '85470456', '85470169', '85469996', 66 | '85469877', '85469775', '85469651', '85469331', '85469033', '85345768', '85345742', 67 | '85337900', '85337879', '85337860', '85337833', '85337797', '85322822', '85322810', 68 | '85322791', '85322745', '85317667', '85265742', '85265696', '85265618', '85265350', 69 | '85098457', '85057670', 
'85009890', '84755581', '84637437', '84637431', '84637393', 70 | '84637374', '84637355', '84637338', '84637321', '84637305', '84637283', '84637259', 71 | '84629399', '84629314', '84629233', '84629124', '84629065', '84628997', '84628933', 72 | '84628838', '84628777', '84628690', '84591581', '84591553', '84591511', '84591484', 73 | '84591468', '84591416', '84591386', '84591350', '84591308', '84572155', '84572107', 74 | '84503228', '84500221', '84403516', '84403496', '84403473', '84403442', '84075703', 75 | '84029659', '83933480', '83933459', '83933435', '83903298', '83903274', '83903258', 76 | '83752369', '83345186', '83116487', '83116446', '83116402', '83116334', '83116213', 77 | '82944248', '82941023', '82938777', '82936611', '82932735', '82918102', '82911085', 78 | '82888399', '82884263', '82883507', '82880996', '82875334', '82864060', '82831039', 79 | '82823385', '82795277', '82790832', '82775718', '82752022', '82730437', '82718126', 80 | '82661646', '82588279', '82588267', '82588261', '82588192', '82347066', '82056138', 81 | '81978722', '81211571', '81104145', '81069048', '81006768', '80788365', '80767582', 82 | '80759172', '80759144', '80759129', '80736927', '80661288', '80616304', '80602366', 83 | '80584625', '80561364', '80549878', '80549875', '80541470', '80539726', '80531328', 84 | '80513257', '80469816', '80406810', '80356781', '80334130', '80333252', '80332666', 85 | '80332389', '80311244', '80301070', '80295974', '80292252', '80286963', '80279504', 86 | '80278369', '80274371', '80249825', '80247284', '80223054', '80219559', '80209778', 87 | '80200279', '80164236', '80160900', '80153046', '80149560', '80144670', '80061205', 88 | '80046520', '80025644', '80014721', '80005213', '80004664', '80001653', '79990178', 89 | '79989283', '79947873', '79946002', '79941517', '79938786', '79932755', '79921178', 90 | '79911339', '79897603', '79883931', '79872574', '79846509', '79832150', '79828161', 91 | '79828156', '79828149', '79828146', '79828140', '79828139', 
'79828135', '79828123', 92 | '79820772', '79776809', '79776801', '79776788', '79776782', '79776772', '79776767', 93 | '79776760', '79776753', '79776736', '79776705', '79676183', '79676171', '79676166', 94 | '79676160', '79658242', '79658137', '79658130', '79658123', '79658119', '79658112', 95 | '79658100', '79658092', '79658089', '79658069', '79658054', '79633508', '79587857', 96 | '79587850', '79587842', '79587831', '79587825', '79587819', '79547908', '79477700', 97 | '79477692', '79440956', '79431176', '79428647', '79416896', '79406699', '79350633', 98 | '79350545', '79344765', '79339391', '79339383', '79339157', '79307345', '79293944', 99 | '79292623', '79274443', '79242798', '79184420', '79184386', '79184355', '79184269', 100 | '79183979', '79100314', '79100206', '79100064', '79090813', '79057834', '78967246', 101 | '78941571', '78927340', '78911467', '78909741', '78848006', '78628917', '78628908', 102 | '78628889', '78571306', '78571273', '78571253', '78508837', '78508791', '78448073', 103 | '78430940', '78408150', '78369548', '78323851', '78314301', '78307417', '78300457', 104 | '78287108', '78278945', '78259349', '78237192', '78231360', '78141031', '78100357', 105 | '78095793', '78084949', '78073873', '78073833', '78067868', '78067811', '78055014', 106 | '78041555', '78039240', '77948804', '77879624', '77837792', '77824937', '77816459', 107 | '77816208', '77801801', '77801767', '77776636', '77776610', '77505676', '77485156', 108 | '77478296', '77460928', '77327521', '77326428', '77278423', '77258908', '77252370', 109 | '77248841', '77239042', '77233843', '77230880', '77200256', '77198140', '77196405', 110 | '77193456', '77186557', '77185568', '77181823', '77170422', '77164604', '77163389', 111 | '77160103', '77159392', '77150721', '77146204', '77141824', '77129604', '77123259', 112 | '77113014', '77103247', '77101924', '77100165', '77098190', '77094986', '77088637', 113 | '77073399', '77062405', '77044198', '77036923', '77017092', '77007016', '76999924', 114 
| '76977678', '76944015', '76923087', '76912696', '76890184', '76862282', '76852434', 115 | '76829683', '76794256', '76780755', '76762181', '76732277', '76718569', '76696048', 116 | '76691568', '76689003', '76674746', '76651230', '76640301', '76615315', '76598528', 117 | '76571947', '76551820', '74178127', '74157245', '74090991', '74012309', '74001789', 118 | '73910511', '73613471', '73605647', '73605082', '73503704', '73380636', '73277303', 119 | '73274683', '73252108', '73252085', '73252070', '73252039', '73252025', '73251974', 120 | '73135779', '73087531', '73044025', '73008658', '72998118', '72997953', '72847091', 121 | '72833384', '72830909', '72828999', '72823633', '72793092', '72757626', '71157154', 122 | '71131579', '71128551', '71122253', '71082760', '71078326', '71075369', '71057216', 123 | '70812997', '70384625', '70347260', '70328937', '70313267', '70312950', '70255825', 124 | '70238893', '70237566', '70237072', '70230665', '70228737', '70228729', '70175557', 125 | '70175401', '70173259', '70172591', '70170835', '70140724', '70139606', '70053923', 126 | '69067886', '69063732', '69055974', '69055708', '69031254', '68960022', '68957926', 127 | '68957556', '68953383', '68952755', '68946828', '68483371', '68120861', '68065606', 128 | '68064545', '68064493', '67646436', '67637525', '67632961', '66984317', '66968934', 129 | '66968328', '66491589', '66475786', '66473308', '65946462', '65635220', '65632553', 130 | '65443309', '65437683', '63260222', '63253665', '63253636', '63253628', '63253610', 131 | '63253572', '63252767', '63252672', '63252636', '63252537', '63252440', '63252329', 132 | '63252155', '62888876', '62238064', '62039365', '62038016', '61925813', '60957024', 133 | '60146286', '59523598', '59489460', '59480461', '59160354', '59109234', '59089006', 134 | '58595549', '57406062', '56678797', '55001342', '55001340', '55001336', '55001330', 135 | '55001328', '55001325', '55001311', '55001305', '55001298', '55001290', '55001283', 136 | '55001278', 
'55001272', '55001265', '55001262', '55001253', '55001246', '55001242', 137 | '55001236', '54907997', '54798827', '54782693', '54782689', '54782688', '54782676', 138 | '54782673', '54782671', '54782662', '54782649', '54782636', '54782630', '54782628', 139 | '54782627', '54782624', '54782621', '54782620', '54782615', '54782613', '54782608', 140 | '54782604', '54782600', '54767237', '54766779', '54755814', '54755674', '54730253', 141 | '54709338', '54667667', '54667657', '54667639', '54646201', '54407212', '54236114', 142 | '54234220', '54233181', '54232788', '54232407', '54177960', '53991319', '53932970', 143 | '53888106', '53887128', '53885944', '53885094', '53884497', '53819985', '53812640', 144 | '53811866', '53790628', '53785053', '53782838', '53768406', '53763191', '53763163', 145 | '53763148', '53763104', '53763092', '53576302', '53576157', '53573472', '53560183', 146 | '53523648', '53516634', '53514474', '53510917', '53502297', '53492224', '53467240', 147 | '53467122', '53437115', '53436579', '53435710', '53415115', '53377875', '53365337', 148 | '53350165', '53337979', '53332925', '53321283', '53318758', '53307049', '53301773', 149 | '53289364', '53286367', '53259948', '53242892', '53239518', '53230890', '53218625', 150 | '53184121', '53148662', '53129280', '53116507', '53116486', '52980893', '52980652', 151 | '52971002', '52950276', '52950259', '52944714', '52934397', '52932994', '52924939', 152 | '52887083', '52877145', '52858258', '52858046', '52840214', '52829673', '52818774', 153 | '52814054', '52805448', '52798019', '52794801', '52786111', '52774750', '52748816', 154 | '52745187', '52739313', '52738109', '52734410', '52734406', '52734401', '52515005', 155 | '52056818', '52039757', '52034057', '50899381', '50738883', '50726018', '50695984', 156 | '50695978', '50695961', '50695931', '50695913', '50695902', '50695898', '50695896', 157 | '50695885', '50695852', '50695843', '50695829', '50643222', '50591997', '50561827', 158 | '50550829', '50541472', 
'50527581', '50527317', '50527206', '50527094', '50526976', 159 | '50525931', '50525764', '50518363', '50498312', '50493019', '50492927', '50492881', 160 | '50492863', '50492772', '50492741', '50492688', '50492454', '50491686', '50491675', 161 | '50491602', '50491550', '50491467', '50488409', '50485177', '48683433', '48679853', 162 | '48678381', '48626023', '48623059', '48603183', '48599041', '48595555', '48576507', 163 | '48574581', '48574425', '48547849', '48542371', '48518705', '48494395', '48493321', 164 | '48491545', '48471207', '48471161', '48471085', '48468239', '48416035', '48415577', 165 | '48415515', '48297597', '48225865', '48224037', '48223553', '48213383', '48211439', 166 | '48206757', '48195685', '48193981', '48154955', '48128811', '48105995', '48105727', 167 | '48105441', '48105085', '48101717', '48101691', '48101637', '48101569', '48101543', 168 | '48085839', '48085821', '48085797', '48085785', '48085775', '48085765', '48085749', 169 | '48085717', '48085687', '48085377', '48085189', '48085119', '48085043', '48084991', 170 | '48084747', '48084139', '48084075', '48055511', '48055403', '48054259', '48053917', 171 | '47378253', '47359989', '47344793', '47344083', '47336927', '47335827', '47316383', 172 | '47315813', '47312213', '47295745', '47294471', '47259467', '47256015', '47255529', 173 | '47253649', '47207791', '47206309', '47189383', '47172333', '47170495', '47166223', '47149681', '47146967', '47126915', '47126883', '47108297', '47091823', '47084039', 174 | '47080883', '47058549', '47056435', '47054703', '47041395', '47035325', '47035143', 175 | '47027547', '47016851', '47006665', '46854213', '46128743', '45035163', '43053503', 176 | '41968283', '41958265', '40707993', '40706971', '40685165', '40684953', '40684575', 177 | '40683867', '40683021', '39853417', '39806033', '39757139', '38391523', '37595169', 178 | '37584503', '35696501', '29593529', '28100441', '27330071', '26950993', '26011757', 179 | '26010983', '26010603', '26004793', '26003621', 
'26003575', '26003405', '26003373', 180 | '26003307', '26003225', '26003189', '26002929', '26002863', '26002749', '26001477', 181 | '25641541', '25414671', '25410705', '24973063', '20648491', '20621099', '17802317', 182 | '17171597', '17141619', '17141381', '17139321', '17121903', '16898605', '16886449', 183 | '14523439', '14104635', '14054225', '9317965' 184 | ] 185 | var urlb64 = 'aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dpemFyZGZvcmNlbC9hcnRpY2xlL2RldGFpbHMv' 186 | var plugin = function(hook) { 187 | hook.doneEach(function() { 188 | for (var i = 0; i < 5; i++) { 189 | var idx = Math.trunc(Math.random() * ids.length) 190 | new Image().src = atob(urlb64) + ids[idx] 191 | } 192 | }) 193 | } 194 | var plugins = window.$docsify.plugins || [] 195 | plugins.push(plugin) 196 | window.$docsify.plugins = plugins 197 | })() -------------------------------------------------------------------------------- /asset/vue.css: -------------------------------------------------------------------------------- 1 | @import url("https://fonts.googleapis.com/css?family=Roboto+Mono|Source+Sans+Pro:300,400,600"); 2 | * { 3 | -webkit-font-smoothing: antialiased; 4 | -webkit-overflow-scrolling: touch; 5 | -webkit-tap-highlight-color: rgba(0,0,0,0); 6 | -webkit-text-size-adjust: none; 7 | -webkit-touch-callout: none; 8 | box-sizing: border-box; 9 | } 10 | body:not(.ready) { 11 | overflow: hidden; 12 | } 13 | body:not(.ready) [data-cloak], 14 | body:not(.ready) .app-nav, 15 | body:not(.ready) > nav { 16 | display: none; 17 | } 18 | div#app { 19 | font-size: 30px; 20 | font-weight: lighter; 21 | margin: 40vh auto; 22 | text-align: center; 23 | } 24 | div#app:empty::before { 25 | content: 'Loading...'; 26 | } 27 | .emoji { 28 | height: 1.2rem; 29 | vertical-align: middle; 30 | } 31 | .progress { 32 | background-color: var(--theme-color, #42b983); 33 | height: 2px; 34 | left: 0px; 35 | position: fixed; 36 | right: 0px; 37 | top: 0px; 38 | transition: width 0.2s, opacity 0.4s; 39 | width: 0%; 40 | z-index: 
999999; 41 | } 42 | .search a:hover { 43 | color: var(--theme-color, #42b983); 44 | } 45 | .search .search-keyword { 46 | color: var(--theme-color, #42b983); 47 | font-style: normal; 48 | font-weight: bold; 49 | } 50 | html, 51 | body { 52 | height: 100%; 53 | } 54 | body { 55 | -moz-osx-font-smoothing: grayscale; 56 | -webkit-font-smoothing: antialiased; 57 | color: #34495e; 58 | font-family: 'Source Sans Pro', 'Helvetica Neue', Arial, sans-serif; 59 | font-size: 15px; 60 | letter-spacing: 0; 61 | margin: 0; 62 | overflow-x: hidden; 63 | } 64 | img { 65 | max-width: 100%; 66 | } 67 | a[disabled] { 68 | cursor: not-allowed; 69 | opacity: 0.6; 70 | } 71 | kbd { 72 | border: solid 1px #ccc; 73 | border-radius: 3px; 74 | display: inline-block; 75 | font-size: 12px !important; 76 | line-height: 12px; 77 | margin-bottom: 3px; 78 | padding: 3px 5px; 79 | vertical-align: middle; 80 | } 81 | li input[type='checkbox'] { 82 | margin: 0 0.2em 0.25em 0; 83 | vertical-align: middle; 84 | } 85 | .app-nav { 86 | margin: 25px 60px 0 0; 87 | position: absolute; 88 | right: 0; 89 | text-align: right; 90 | z-index: 10; 91 | /* navbar dropdown */ 92 | } 93 | .app-nav.no-badge { 94 | margin-right: 25px; 95 | } 96 | .app-nav p { 97 | margin: 0; 98 | } 99 | .app-nav > a { 100 | margin: 0 1rem; 101 | padding: 5px 0; 102 | } 103 | .app-nav ul, 104 | .app-nav li { 105 | display: inline-block; 106 | list-style: none; 107 | margin: 0; 108 | } 109 | .app-nav a { 110 | color: inherit; 111 | font-size: 16px; 112 | text-decoration: none; 113 | transition: color 0.3s; 114 | } 115 | .app-nav a:hover { 116 | color: var(--theme-color, #42b983); 117 | } 118 | .app-nav a.active { 119 | border-bottom: 2px solid var(--theme-color, #42b983); 120 | color: var(--theme-color, #42b983); 121 | } 122 | .app-nav li { 123 | display: inline-block; 124 | margin: 0 1rem; 125 | padding: 5px 0; 126 | position: relative; 127 | cursor: pointer; 128 | } 129 | .app-nav li ul { 130 | background-color: #fff; 131 | border: 
1px solid #ddd; 132 | border-bottom-color: #ccc; 133 | border-radius: 4px; 134 | box-sizing: border-box; 135 | display: none; 136 | max-height: calc(100vh - 61px); 137 | overflow-y: auto; 138 | padding: 10px 0; 139 | position: absolute; 140 | right: -15px; 141 | text-align: left; 142 | top: 100%; 143 | white-space: nowrap; 144 | } 145 | .app-nav li ul li { 146 | display: block; 147 | font-size: 14px; 148 | line-height: 1rem; 149 | margin: 0; 150 | margin: 8px 14px; 151 | white-space: nowrap; 152 | } 153 | .app-nav li ul a { 154 | display: block; 155 | font-size: inherit; 156 | margin: 0; 157 | padding: 0; 158 | } 159 | .app-nav li ul a.active { 160 | border-bottom: 0; 161 | } 162 | .app-nav li:hover ul { 163 | display: block; 164 | } 165 | .github-corner { 166 | border-bottom: 0; 167 | position: fixed; 168 | right: 0; 169 | text-decoration: none; 170 | top: 0; 171 | z-index: 1; 172 | } 173 | .github-corner:hover .octo-arm { 174 | -webkit-animation: octocat-wave 560ms ease-in-out; 175 | animation: octocat-wave 560ms ease-in-out; 176 | } 177 | .github-corner svg { 178 | color: #fff; 179 | fill: var(--theme-color, #42b983); 180 | height: 80px; 181 | width: 80px; 182 | } 183 | main { 184 | display: block; 185 | position: relative; 186 | width: 100vw; 187 | height: 100%; 188 | z-index: 0; 189 | } 190 | main.hidden { 191 | display: none; 192 | } 193 | .anchor { 194 | display: inline-block; 195 | text-decoration: none; 196 | transition: all 0.3s; 197 | } 198 | .anchor span { 199 | color: #34495e; 200 | } 201 | .anchor:hover { 202 | text-decoration: underline; 203 | } 204 | .sidebar { 205 | border-right: 1px solid rgba(0,0,0,0.07); 206 | overflow-y: auto; 207 | padding: 40px 0 0; 208 | position: absolute; 209 | top: 0; 210 | bottom: 0; 211 | left: 0; 212 | transition: transform 250ms ease-out; 213 | width: 300px; 214 | z-index: 20; 215 | } 216 | .sidebar > h1 { 217 | margin: 0 auto 1rem; 218 | font-size: 1.5rem; 219 | font-weight: 300; 220 | text-align: center; 221 | } 222 
| .sidebar > h1 a { 223 | color: inherit; 224 | text-decoration: none; 225 | } 226 | .sidebar > h1 .app-nav { 227 | display: block; 228 | position: static; 229 | } 230 | .sidebar .sidebar-nav { 231 | line-height: 2em; 232 | padding-bottom: 40px; 233 | } 234 | .sidebar li.collapse .app-sub-sidebar { 235 | display: none; 236 | } 237 | .sidebar ul { 238 | margin: 0 0 0 15px; 239 | padding: 0; 240 | } 241 | .sidebar li > p { 242 | font-weight: 700; 243 | margin: 0; 244 | } 245 | .sidebar ul, 246 | .sidebar ul li { 247 | list-style: none; 248 | } 249 | .sidebar ul li a { 250 | border-bottom: none; 251 | display: block; 252 | } 253 | .sidebar ul li ul { 254 | padding-left: 20px; 255 | } 256 | .sidebar::-webkit-scrollbar { 257 | width: 4px; 258 | } 259 | .sidebar::-webkit-scrollbar-thumb { 260 | background: transparent; 261 | border-radius: 4px; 262 | } 263 | .sidebar:hover::-webkit-scrollbar-thumb { 264 | background: rgba(136,136,136,0.4); 265 | } 266 | .sidebar:hover::-webkit-scrollbar-track { 267 | background: rgba(136,136,136,0.1); 268 | } 269 | .sidebar-toggle { 270 | background-color: transparent; 271 | background-color: rgba(255,255,255,0.8); 272 | border: 0; 273 | outline: none; 274 | padding: 10px; 275 | position: absolute; 276 | bottom: 0; 277 | left: 0; 278 | text-align: center; 279 | transition: opacity 0.3s; 280 | width: 284px; 281 | z-index: 30; 282 | cursor: pointer; 283 | } 284 | .sidebar-toggle:hover .sidebar-toggle-button { 285 | opacity: 0.4; 286 | } 287 | .sidebar-toggle span { 288 | background-color: var(--theme-color, #42b983); 289 | display: block; 290 | margin-bottom: 4px; 291 | width: 16px; 292 | height: 2px; 293 | } 294 | body.sticky .sidebar, 295 | body.sticky .sidebar-toggle { 296 | position: fixed; 297 | } 298 | .content { 299 | padding-top: 60px; 300 | position: absolute; 301 | top: 0; 302 | right: 0; 303 | bottom: 0; 304 | left: 300px; 305 | transition: left 250ms ease; 306 | } 307 | .markdown-section { 308 | margin: 0 auto; 309 | max-width: 
80%; 310 | padding: 30px 15px 40px 15px; 311 | position: relative; 312 | } 313 | .markdown-section > * { 314 | box-sizing: border-box; 315 | font-size: inherit; 316 | } 317 | .markdown-section > :first-child { 318 | margin-top: 0 !important; 319 | } 320 | .markdown-section hr { 321 | border: none; 322 | border-bottom: 1px solid #eee; 323 | margin: 2em 0; 324 | } 325 | .markdown-section iframe { 326 | border: 1px solid #eee; 327 | /* fix horizontal overflow on iOS Safari */ 328 | width: 1px; 329 | min-width: 100%; 330 | } 331 | .markdown-section table { 332 | border-collapse: collapse; 333 | border-spacing: 0; 334 | display: block; 335 | margin-bottom: 1rem; 336 | overflow: auto; 337 | width: 100%; 338 | } 339 | .markdown-section th { 340 | border: 1px solid #ddd; 341 | font-weight: bold; 342 | padding: 6px 13px; 343 | } 344 | .markdown-section td { 345 | border: 1px solid #ddd; 346 | padding: 6px 13px; 347 | } 348 | .markdown-section tr { 349 | border-top: 1px solid #ccc; 350 | } 351 | .markdown-section tr:nth-child(2n) { 352 | background-color: #f8f8f8; 353 | } 354 | .markdown-section p.tip { 355 | background-color: #f8f8f8; 356 | border-bottom-right-radius: 2px; 357 | border-left: 4px solid #f66; 358 | border-top-right-radius: 2px; 359 | margin: 2em 0; 360 | padding: 12px 24px 12px 30px; 361 | position: relative; 362 | } 363 | .markdown-section p.tip:before { 364 | background-color: #f66; 365 | border-radius: 100%; 366 | color: #fff; 367 | content: '!'; 368 | font-family: 'Dosis', 'Source Sans Pro', 'Helvetica Neue', Arial, sans-serif; 369 | font-size: 14px; 370 | font-weight: bold; 371 | left: -12px; 372 | line-height: 20px; 373 | position: absolute; 374 | height: 20px; 375 | width: 20px; 376 | text-align: center; 377 | top: 14px; 378 | } 379 | .markdown-section p.tip code { 380 | background-color: #efefef; 381 | } 382 | .markdown-section p.tip em { 383 | color: #34495e; 384 | } 385 | .markdown-section p.warn { 386 | background: rgba(66,185,131,0.1); 387 | 
border-radius: 2px; 388 | padding: 1rem; 389 | } 390 | .markdown-section ul.task-list > li { 391 | list-style-type: none; 392 | } 393 | body.close .sidebar { 394 | transform: translateX(-300px); 395 | } 396 | body.close .sidebar-toggle { 397 | width: auto; 398 | } 399 | body.close .content { 400 | left: 0; 401 | } 402 | @media print { 403 | .github-corner, 404 | .sidebar-toggle, 405 | .sidebar, 406 | .app-nav { 407 | display: none; 408 | } 409 | } 410 | @media screen and (max-width: 768px) { 411 | .github-corner, 412 | .sidebar-toggle, 413 | .sidebar { 414 | position: fixed; 415 | } 416 | .app-nav { 417 | margin-top: 16px; 418 | } 419 | .app-nav li ul { 420 | top: 30px; 421 | } 422 | main { 423 | height: auto; 424 | overflow-x: hidden; 425 | } 426 | .sidebar { 427 | left: -300px; 428 | transition: transform 250ms ease-out; 429 | } 430 | .content { 431 | left: 0; 432 | max-width: 100vw; 433 | position: static; 434 | padding-top: 20px; 435 | transition: transform 250ms ease; 436 | } 437 | .app-nav, 438 | .github-corner { 439 | transition: transform 250ms ease-out; 440 | } 441 | .sidebar-toggle { 442 | background-color: transparent; 443 | width: auto; 444 | padding: 30px 30px 10px 10px; 445 | } 446 | body.close .sidebar { 447 | transform: translateX(300px); 448 | } 449 | body.close .sidebar-toggle { 450 | background-color: rgba(255,255,255,0.8); 451 | transition: 1s background-color; 452 | width: 284px; 453 | padding: 10px; 454 | } 455 | body.close .content { 456 | transform: translateX(300px); 457 | } 458 | body.close .app-nav, 459 | body.close .github-corner { 460 | display: none; 461 | } 462 | .github-corner:hover .octo-arm { 463 | -webkit-animation: none; 464 | animation: none; 465 | } 466 | .github-corner .octo-arm { 467 | -webkit-animation: octocat-wave 560ms ease-in-out; 468 | animation: octocat-wave 560ms ease-in-out; 469 | } 470 | } 471 | @-webkit-keyframes octocat-wave { 472 | 0%, 100% { 473 | transform: rotate(0); 474 | } 475 | 20%, 60% { 476 | transform: 
rotate(-25deg); 477 | } 478 | 40%, 80% { 479 | transform: rotate(10deg); 480 | } 481 | } 482 | @keyframes octocat-wave { 483 | 0%, 100% { 484 | transform: rotate(0); 485 | } 486 | 20%, 60% { 487 | transform: rotate(-25deg); 488 | } 489 | 40%, 80% { 490 | transform: rotate(10deg); 491 | } 492 | } 493 | section.cover { 494 | align-items: center; 495 | background-position: center center; 496 | background-repeat: no-repeat; 497 | background-size: cover; 498 | height: 100vh; 499 | width: 100vw; 500 | display: none; 501 | } 502 | section.cover.show { 503 | display: flex; 504 | } 505 | section.cover.has-mask .mask { 506 | background-color: #fff; 507 | opacity: 0.8; 508 | position: absolute; 509 | top: 0; 510 | height: 100%; 511 | width: 100%; 512 | } 513 | section.cover .cover-main { 514 | flex: 1; 515 | margin: -20px 16px 0; 516 | text-align: center; 517 | position: relative; 518 | } 519 | section.cover a { 520 | color: inherit; 521 | text-decoration: none; 522 | } 523 | section.cover a:hover { 524 | text-decoration: none; 525 | } 526 | section.cover p { 527 | line-height: 1.5rem; 528 | margin: 1em 0; 529 | } 530 | section.cover h1 { 531 | color: inherit; 532 | font-size: 2.5rem; 533 | font-weight: 300; 534 | margin: 0.625rem 0 2.5rem; 535 | position: relative; 536 | text-align: center; 537 | } 538 | section.cover h1 a { 539 | display: block; 540 | } 541 | section.cover h1 small { 542 | bottom: -0.4375rem; 543 | font-size: 1rem; 544 | position: absolute; 545 | } 546 | section.cover blockquote { 547 | font-size: 1.5rem; 548 | text-align: center; 549 | } 550 | section.cover ul { 551 | line-height: 1.8; 552 | list-style-type: none; 553 | margin: 1em auto; 554 | max-width: 500px; 555 | padding: 0; 556 | } 557 | section.cover .cover-main > p:last-child a { 558 | border-color: var(--theme-color, #42b983); 559 | border-radius: 2rem; 560 | border-style: solid; 561 | border-width: 1px; 562 | box-sizing: border-box; 563 | color: var(--theme-color, #42b983); 564 | display: 
inline-block; 565 | font-size: 1.05rem; 566 | letter-spacing: 0.1rem; 567 | margin: 0.5rem 1rem; 568 | padding: 0.75em 2rem; 569 | text-decoration: none; 570 | transition: all 0.15s ease; 571 | } 572 | section.cover .cover-main > p:last-child a:last-child { 573 | background-color: var(--theme-color, #42b983); 574 | color: #fff; 575 | } 576 | section.cover .cover-main > p:last-child a:last-child:hover { 577 | color: inherit; 578 | opacity: 0.8; 579 | } 580 | section.cover .cover-main > p:last-child a:hover { 581 | color: inherit; 582 | } 583 | section.cover blockquote > p > a { 584 | border-bottom: 2px solid var(--theme-color, #42b983); 585 | transition: color 0.3s; 586 | } 587 | section.cover blockquote > p > a:hover { 588 | color: var(--theme-color, #42b983); 589 | } 590 | body { 591 | background-color: #fff; 592 | } 593 | /* sidebar */ 594 | .sidebar { 595 | background-color: #fff; 596 | color: #364149; 597 | } 598 | .sidebar li { 599 | margin: 6px 0 6px 0; 600 | } 601 | .sidebar ul li a { 602 | color: #505d6b; 603 | font-size: 14px; 604 | font-weight: normal; 605 | overflow: hidden; 606 | text-decoration: none; 607 | text-overflow: ellipsis; 608 | white-space: nowrap; 609 | } 610 | .sidebar ul li a:hover { 611 | text-decoration: underline; 612 | } 613 | .sidebar ul li ul { 614 | padding: 0; 615 | } 616 | .sidebar ul li.active > a { 617 | border-right: 2px solid; 618 | color: var(--theme-color, #42b983); 619 | font-weight: 600; 620 | } 621 | .app-sub-sidebar li::before { 622 | content: '-'; 623 | padding-right: 4px; 624 | float: left; 625 | } 626 | /* markdown content found on pages */ 627 | .markdown-section h1, 628 | .markdown-section h2, 629 | .markdown-section h3, 630 | .markdown-section h4, 631 | .markdown-section strong { 632 | color: #2c3e50; 633 | font-weight: 600; 634 | } 635 | .markdown-section a { 636 | color: var(--theme-color, #42b983); 637 | font-weight: 600; 638 | } 639 | .markdown-section h1 { 640 | font-size: 2rem; 641 | margin: 0 0 1rem; 642 | } 
643 | .markdown-section h2 { 644 | font-size: 1.75rem; 645 | margin: 45px 0 0.8rem; 646 | } 647 | .markdown-section h3 { 648 | font-size: 1.5rem; 649 | margin: 40px 0 0.6rem; 650 | } 651 | .markdown-section h4 { 652 | font-size: 1.25rem; 653 | } 654 | .markdown-section h5 { 655 | font-size: 1rem; 656 | } 657 | .markdown-section h6 { 658 | color: #777; 659 | font-size: 1rem; 660 | } 661 | .markdown-section figure, 662 | .markdown-section p { 663 | margin: 1.2em 0; 664 | } 665 | .markdown-section p, 666 | .markdown-section ul, 667 | .markdown-section ol { 668 | line-height: 1.6rem; 669 | word-spacing: 0.05rem; 670 | } 671 | .markdown-section ul, 672 | .markdown-section ol { 673 | padding-left: 1.5rem; 674 | } 675 | .markdown-section blockquote { 676 | border-left: 4px solid var(--theme-color, #42b983); 677 | color: #858585; 678 | margin: 2em 0; 679 | padding-left: 20px; 680 | } 681 | .markdown-section blockquote p { 682 | font-weight: 600; 683 | margin-left: 0; 684 | } 685 | .markdown-section iframe { 686 | margin: 1em 0; 687 | } 688 | .markdown-section em { 689 | color: #7f8c8d; 690 | } 691 | .markdown-section code { 692 | background-color: #f8f8f8; 693 | border-radius: 2px; 694 | color: #e96900; 695 | font-family: 'Roboto Mono', Monaco, courier, monospace; 696 | font-size: 0.8rem; 697 | margin: 0 2px; 698 | padding: 3px 5px; 699 | white-space: pre-wrap; 700 | } 701 | .markdown-section pre { 702 | -moz-osx-font-smoothing: initial; 703 | -webkit-font-smoothing: initial; 704 | background-color: #f8f8f8; 705 | font-family: 'Roboto Mono', Monaco, courier, monospace; 706 | line-height: 1.5rem; 707 | margin: 1.2em 0; 708 | overflow: auto; 709 | padding: 0 1.4rem; 710 | position: relative; 711 | word-wrap: normal; 712 | } 713 | /* code highlight */ 714 | .token.comment, 715 | .token.prolog, 716 | .token.doctype, 717 | .token.cdata { 718 | color: #8e908c; 719 | } 720 | .token.namespace { 721 | opacity: 0.7; 722 | } 723 | .token.boolean, 724 | .token.number { 725 | color: 
#c76b29; 726 | } 727 | .token.punctuation { 728 | color: #525252; 729 | } 730 | .token.property { 731 | color: #c08b30; 732 | } 733 | .token.tag { 734 | color: #2973b7; 735 | } 736 | .token.string { 737 | color: var(--theme-color, #42b983); 738 | } 739 | .token.selector { 740 | color: #6679cc; 741 | } 742 | .token.attr-name { 743 | color: #2973b7; 744 | } 745 | .token.entity, 746 | .token.url, 747 | .language-css .token.string, 748 | .style .token.string { 749 | color: #22a2c9; 750 | } 751 | .token.attr-value, 752 | .token.control, 753 | .token.directive, 754 | .token.unit { 755 | color: var(--theme-color, #42b983); 756 | } 757 | .token.keyword, 758 | .token.function { 759 | color: #e96900; 760 | } 761 | .token.statement, 762 | .token.regex, 763 | .token.atrule { 764 | color: #22a2c9; 765 | } 766 | .token.placeholder, 767 | .token.variable { 768 | color: #3d8fd1; 769 | } 770 | .token.deleted { 771 | text-decoration: line-through; 772 | } 773 | .token.inserted { 774 | border-bottom: 1px dotted #202746; 775 | text-decoration: none; 776 | } 777 | .token.italic { 778 | font-style: italic; 779 | } 780 | .token.important, 781 | .token.bold { 782 | font-weight: bold; 783 | } 784 | .token.important { 785 | color: #c94922; 786 | } 787 | .token.entity { 788 | cursor: help; 789 | } 790 | .markdown-section pre > code { 791 | -moz-osx-font-smoothing: initial; 792 | -webkit-font-smoothing: initial; 793 | background-color: #f8f8f8; 794 | border-radius: 2px; 795 | color: #525252; 796 | display: block; 797 | font-family: 'Roboto Mono', Monaco, courier, monospace; 798 | font-size: 0.8rem; 799 | line-height: inherit; 800 | margin: 0 2px; 801 | max-width: inherit; 802 | overflow: inherit; 803 | padding: 2.2em 5px; 804 | white-space: inherit; 805 | } 806 | .markdown-section code::after, 807 | .markdown-section code::before { 808 | letter-spacing: 0.05rem; 809 | } 810 | code .token { 811 | -moz-osx-font-smoothing: initial; 812 | -webkit-font-smoothing: initial; 813 | min-height: 1.5rem; 
814 | position: relative; 815 | left: auto; 816 | } 817 | pre::after { 818 | color: #ccc; 819 | content: attr(data-lang); 820 | font-size: 0.6rem; 821 | font-weight: 600; 822 | height: 15px; 823 | line-height: 15px; 824 | padding: 5px 10px 0; 825 | position: absolute; 826 | right: 0; 827 | text-align: right; 828 | top: 0; 829 | } 830 | -------------------------------------------------------------------------------- /docs/6.md: -------------------------------------------------------------------------------- 1 | # Parameters 2 | 3 | This page lists all of LightGBM's parameters. 4 | 5 | **Some useful links** 6 | 7 | * [Python API](./Python-API.rst) 8 | * [Parameters Tuning](./Parameters-Tuning.rst) 9 | 10 | **External links** 11 | 12 | * [Laurae++ Interactive Documentation](https://sites.google.com/view/lauraepp/parameters) 13 | 14 | **Updated on 08/04/2017** 15 | 16 | The default values of the following parameters have been changed: 17 | 18 | * `min_data_in_leaf` = 100 => 20 19 | * `min_sum_hessian_in_leaf` = 10 => 1e-3 20 | * `num_leaves` = 127 => 31 21 | * `num_iterations` = 10 => 100 22 | 23 | ## Parameter Format 24 | 25 | Parameters are given in the format `key1=value1 key2=value2 ...`. They can be set both in the config file and on the command line. On the command line, there must be no spaces before or after `=`. In the config file, each line may contain only one parameter, and you can use `#` for comments. 26 | 27 | If a parameter appears both on the command line and in the config file, LightGBM uses the one from the command line. 28 | 29 | ## Core Parameters 30 | 31 | * `config`, default=`""`, type=string, alias=`config_file` 32 | * path of the config file 33 | * `task`, default=`train`, type=enum, options=`train`, `predict`, `convert_model` 34 | * `train`, alias=`training`, for training 35 | * `predict`, alias=`prediction`, `test`, for prediction.
* `convert_model`, 要将模型文件转换成 if-else 格式, 可以查看这个链接获取更多信息 [Convert model parameters](#convert-model-parameters) 37 | * `objective`, default=`regression`, type=enum, options=`regression`, `regression_l1`, `huber`, `fair`, `poisson`, `quantile`, `quantile_l2`, `binary`, `multiclass`, `multiclassova`, `xentropy`, `xentlambda`, `lambdarank`, alias=`objective`, `app`, `application` 38 | * regression application 39 | * `regression_l2`, L2 loss, alias=`regression`, `mean_squared_error`, `mse` 40 | * `regression_l1`, L1 loss, alias=`mean_absolute_error`, `mae` 41 | * `huber`, [Huber loss](https://en.wikipedia.org/wiki/Huber_loss) 42 | * `fair`, [Fair loss](https://www.kaggle.com/c/allstate-claims-severity/discussion/24520) 43 | * `poisson`, [Poisson regression](https://en.wikipedia.org/wiki/Poisson_regression) 44 | * `quantile`, [Quantile regression](https://en.wikipedia.org/wiki/Quantile_regression) 45 | * `quantile_l2`, 类似于 `quantile`, 但是使用了 L2 loss 46 | * `binary`, binary [log loss](https://www.kaggle.com/wiki/LogLoss) classification application 47 | * multi-class classification application 48 | * `multiclass`, [softmax](https://en.wikipedia.org/wiki/Softmax_function) 目标函数, 应该设置好 `num_class` 49 | * `multiclassova`, [One-vs-All](https://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest) 二分类目标函数, 应该设置好 `num_class` 50 | * cross-entropy application 51 | * `xentropy`, 目标函数为 cross-entropy (带有可选的线性权重), alias=`cross_entropy` 52 | * `xentlambda`, cross-entropy 的另一种参数化形式, alias=`cross_entropy_lambda` 53 | * 标签是 [0, 1] 区间内的任意值 54 | * `lambdarank`, [lambdarank](https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions.pdf) application 55 | * 在 lambdarank 任务中标签应该为 `int` type, 数值越大代表相关性越高 (e.g.
0:bad, 1:fair, 2:good, 3:perfect) 56 | * `label_gain` 可以被用来设置 `int` 标签的增益 (权重) 57 | * `boosting`, default=`gbdt`, type=enum, options=`gbdt`, `rf`, `dart`, `goss`, alias=`boost`, `boosting_type` 58 | * `gbdt`, 传统的梯度提升决策树 59 | * `rf`, Random Forest (随机森林) 60 | * `dart`, [Dropouts meet Multiple Additive Regression Trees](https://arxiv.org/abs/1505.01866) 61 | * `goss`, Gradient-based One-Side Sampling (基于梯度的单侧采样) 62 | * `data`, default=`""`, type=string, alias=`train`, `train_data` 63 | * 训练数据, LightGBM 将会使用这个数据进行训练 64 | * `valid`, default=`""`, type=multi-string, alias=`test`, `valid_data`, `test_data` 65 | * 验证/测试 数据, LightGBM 将输出这些数据的度量 66 | * 支持多验证数据集, 以 `,` 分割 67 | * `num_iterations`, default=`100`, type=int, alias=`num_iteration`, `num_tree`, `num_trees`, `num_round`, `num_rounds`, `num_boost_round` 68 | * boosting 的迭代次数 69 | * **Note**: 对于 Python/R 包, **这个参数是被忽略的**, 使用 `train` and `cv` 的输入参数 `num_boost_round` (Python) or `nrounds` (R) 来代替 70 | * **Note**: 在内部, LightGBM 对于 `multiclass` 问题设置 `num_class * num_iterations` 棵树 71 | * `learning_rate`, default=`0.1`, type=double, alias=`shrinkage_rate` 72 | * shrinkage rate (收缩率) 73 | * 在 `dart` 中, 它还影响了 dropped trees 的归一化权重 74 | * `num_leaves`, default=`31`, type=int, alias=`num_leaf` 75 | * 一棵树上的叶子数 76 | * `tree_learner`, default=`serial`, type=enum, options=`serial`, `feature`, `data`, `voting`, alias=`tree` 77 | * `serial`, 单台机器的 tree learner 78 | * `feature`, alias=`feature_parallel`, 特征并行的 tree learner 79 | * `data`, alias=`data_parallel`, 数据并行的 tree learner 80 | * `voting`, alias=`voting_parallel`, 投票并行的 tree learner 81 | * 请阅读 [并行学习指南](./Parallel-Learning-Guide.rst) 来了解更多细节 82 | * `num_threads`, default=`OpenMP_default`, type=int, alias=`num_thread`, `nthread` 83 | * LightGBM 的线程数 84 | * 为了更快的速度, 将此设置为真正的 CPU 内核数, 而不是线程的数量 (大多数 CPU 使用超线程来使每个 CPU 内核生成 2 个线程) 85 | * 当你的数据集小的时候不要将它设置的过大 (比如, 当数据集有 10,000 行时不要使用 64 线程) 86 | * 请注意, 任务管理器或任何类似的 CPU 监视工具可能会报告未被充分利用的内核. 
**这是正常的** 87 | * 对于并行学习, 不应该使用全部的 CPU 内核, 因为这会导致网络性能不佳 88 | * `device`, default=`cpu`, options=`cpu`, `gpu` 89 | * 为树学习选择设备, 你可以使用 GPU 来获得更快的学习速度 90 | * **Note**: 建议使用较小的 `max_bin` (e.g. 63) 来获得更快的速度 91 | * **Note**: 为了加快学习速度, GPU 默认使用32位浮点数来求和. 你可以设置 `gpu_use_dp=true` 来启用64位浮点数, 但是它会使训练速度降低 92 | * **Note**: 请参考 [安装指南](./Installation-Guide.rst#build-gpu-version) 来构建 GPU 版本 93 | 94 | ## 用于控制模型学习过程的参数 95 | 96 | * `max_depth`, default=`-1`, type=int 97 | * 限制树模型的最大深度. 这可以在 `#data` 小的情况下防止过拟合. 树仍然可以通过 leaf-wise 生长. 98 | * `< 0` 意味着没有限制. 99 | * `min_data_in_leaf`, default=`20`, type=int, alias=`min_data_per_leaf` , `min_data`, `min_child_samples` 100 | * 一个叶子上数据的最小数量. 可以用来处理过拟合. 101 | * `min_sum_hessian_in_leaf`, default=`1e-3`, type=double, alias=`min_sum_hessian_per_leaf`, `min_sum_hessian`, `min_hessian`, `min_child_weight` 102 | * 一个叶子上的最小 hessian 和. 类似于 `min_data_in_leaf`, 可以用来处理过拟合. 103 | * `feature_fraction`, default=`1.0`, type=double, `0.0 < feature_fraction < 1.0`, alias=`sub_feature`, `colsample_bytree` 104 | * 如果 `feature_fraction` 小于 `1.0`, LightGBM 将会在每次迭代中随机选择部分特征. 例如, 如果设置为 `0.8`, 将会在每棵树训练之前选择 80% 的特征 105 | * 可以用来加速训练 106 | * 可以用来处理过拟合 107 | * `feature_fraction_seed`, default=`2`, type=int 108 | * `feature_fraction` 的随机数种子 109 | * `bagging_fraction`, default=`1.0`, type=double, `0.0 < bagging_fraction < 1.0`, alias=`sub_row`, `subsample` 110 | * 类似于 `feature_fraction`, 但是它将在不进行重采样的情况下随机选择部分数据 111 | * 可以用来加速训练 112 | * 可以用来处理过拟合 113 | * **Note**: 为了启用 bagging, `bagging_freq` 应该设置为非零值 114 | * `bagging_freq`, default=`0`, type=int, alias=`subsample_freq` 115 | * bagging 的频率, `0` 意味着禁用 bagging. 
`k` 意味着每 `k` 次迭代执行 bagging 116 | * **Note**: 为了启用 bagging, `bagging_fraction` 也应该设置为小于 `1.0` 的值 117 | * `bagging_seed` , default=`3`, type=int, alias=`bagging_fraction_seed` 118 | * bagging 随机数种子 119 | * `early_stopping_round`, default=`0`, type=int, alias=`early_stopping_rounds`, `early_stopping` 120 | * 如果一个验证集的度量在 `early_stopping_round` 轮迭代中没有提升, 将停止训练 121 | * `lambda_l1`, default=`0`, type=double, alias=`reg_alpha` 122 | * L1 正则 123 | * `lambda_l2`, default=`0`, type=double, alias=`reg_lambda` 124 | * L2 正则 125 | * `min_split_gain`, default=`0`, type=double, alias=`min_gain_to_split` 126 | * 执行切分的最小增益 127 | * `drop_rate`, default=`0.1`, type=double 128 | * 仅仅在 `dart` 时使用 129 | * `skip_drop`, default=`0.5`, type=double 130 | * 仅仅在 `dart` 时使用, 跳过 drop 的概率 131 | * `max_drop`, default=`50`, type=int 132 | * 仅仅在 `dart` 时使用, 一次迭代中删除树的最大数量 133 | * `<=0` 意味着没有限制 134 | * `uniform_drop`, default=`false`, type=bool 135 | * 仅仅在 `dart` 时使用, 如果想要均匀的删除, 将它设置为 `true` 136 | * `xgboost_dart_mode`, default=`false`, type=bool 137 | * 仅仅在 `dart` 时使用, 如果想要使用 xgboost dart 模式, 将它设置为 `true` 138 | * `drop_seed`, default=`4`, type=int 139 | * 仅仅在 `dart` 时使用, 选择 dropping models 的随机数种子 140 | * `top_rate`, default=`0.2`, type=double 141 | * 仅仅在 `goss` 时使用, 大梯度数据的保留比例 142 | * `other_rate`, default=`0.1`, type=double 143 | * 仅仅在 `goss` 时使用, 小梯度数据的保留比例 144 | * `min_data_per_group`, default=`100`, type=int 145 | * 每个分类组的最小数据量 146 | * `max_cat_threshold`, default=`32`, type=int 147 | * 用于分类特征 148 | * 限制分类特征的最大阈值 149 | * `cat_smooth`, default=`10`, type=double 150 | * 用于分类特征 151 | * 这可以降低噪声在分类特征中的影响, 尤其是对数据很少的类别 152 | * `cat_l2`, default=`10`, type=double 153 | * 分类切分中的 L2 正则 154 | * `max_cat_to_onehot`, default=`4`, type=int 155 | * 当一个特征的类别数小于或等于 `max_cat_to_onehot` 时, one-vs-other 切分算法将会被使用 156 | * `top_k`, default=`20`, type=int, alias=`topk` 157 | * 被使用在 [Voting parallel](./Parallel-Learning-Guide.rst#choose-appropriate-parallel-algorithm) 中 158 | * 将它设置为更大的值可以获得更精确的结果, 但会减慢训练速度 159 | 160 | ## IO 参数 161 | 162
| * `max_bin`, default=`255`, type=int 163 | * 特征值将被装入的 bin 的最大数量. 较小的 bin 数量可能会降低训练的准确性, 但可能提升整体的泛化能力 (处理过拟合) 164 | * LightGBM 将根据 `max_bin` 自动压缩内存. 例如, 如果 `max_bin=255`, 那么 LightGBM 将使用 `uint8_t` 来存储特征值 165 | * `min_data_in_bin`, default=`3`, type=int 166 | * 单个 bin 内数据的最小数量, 可以用来避免 one-data-one-bin (可能导致过拟合) 167 | * `data_random_seed`, default=`1`, type=int 168 | * 并行学习中数据划分的随机种子 (不包括特征并行) 169 | * `output_model`, default=`LightGBM_model.txt`, type=string, alias=`model_output`, `model_out` 170 | * 训练中输出的模型文件名 171 | * `input_model`, default=`""`, type=string, alias=`model_input`, `model_in` 172 | * 输入模型的文件名 173 | * 对于 `prediction` 任务, 该模型将用于预测数据 174 | * 对于 `train` 任务, 训练将从该模型继续 175 | * `output_result`, default=`LightGBM_predict_result.txt`, type=string, alias=`predict_result`, `prediction_result` 176 | * `prediction` 任务的预测结果文件名 177 | * `model_format`, default=`text`, type=multi-enum, 可选项=`text`, `proto` 178 | * 保存和加载模型的格式 179 | * `text`, 使用文本字符串 180 | * `proto`, 使用 protocol buffer 二进制格式 181 | * 您可以使用逗号同时保存为多种格式, 例如 `text,proto`. 在这种情况下, `model_format` 将作为后缀添加到 `output_model` 之后 182 | * **Note**: 不支持多种格式的加载 183 | * **Note**: 要使用这个参数, 您需要使用 [支持 protobuf 的构建版本](./Installation-Guide.rst#protobuf-support) 184 | * `pre_partition`, default=`false`, type=bool, alias=`is_pre_partition` 185 | * 用于并行学习 (不包括特征并行) 186 | * 如果训练数据已经预先划分 (pre-partitioned), 且不同的机器使用不同的分区, 则设置为 `true` 187 | * `is_sparse`, default=`true`, type=bool, alias=`is_enable_sparse`, `enable_sparse` 188 | * 用于 enable/disable 稀疏优化.
设置为 `false` 则禁用稀疏优化 189 | * `two_round`, default=`false`, type=bool, alias=`two_round_loading`, `use_two_round_loading` 190 | * 默认情况下, LightGBM 将把数据文件映射到内存, 并从内存加载特征。 这将提供更快的数据加载速度。但当数据文件很大时, 内存可能会耗尽 191 | * 如果数据文件太大, 不能放在内存中, 就把它设置为 `true` 192 | * `save_binary`, default=`false`, type=bool, alias=`is_save_binary`, `is_save_binary_file` 193 | * 如果设置为 `true`, LightGBM 则将数据集 (包括验证数据) 保存到二进制文件中, 可以加快数据加载速度 194 | * `verbosity`, default=`1`, type=int, alias=`verbose` 195 | * `<0` = 致命的, `=0` = 错误 (警告), `>0` = 信息 196 | * `header`, default=`false`, type=bool, alias=`has_header` 197 | * 如果输入数据有头部 (header), 则在此处设置为 `true` 198 | * `label`, default=`""`, type=string, alias=`label_column` 199 | * 指定标签列 200 | * 用于索引的数字, e.g. `label=0` 意味着 column_0 是标签列 201 | * 为列名添加前缀 `name:` , e.g. `label=name:is_click` 202 | * `weight`, default=`""`, type=string, alias=`weight_column` 203 | * 指定权重列 204 | * 用于索引的数字, e.g. `weight=0` 表示 column_0 是权重列 205 | * 为列名添加前缀 `name:`, e.g. `weight=name:weight` 206 | * **Note**: 索引从 `0` 开始. 以索引方式指定时, 不计入标签列, 例如当标签为 column_0, 权重为 column_1 时, 正确的参数是 `weight=0` 207 | * `query`, default=`""`, type=string, alias=`query_column`, `group`, `group_column` 208 | * 指定 query/group ID 列 209 | * 用数字做索引, e.g. `query=0` 意味着 column_0 是 query id 列 210 | * 为列名添加前缀 `name:` , e.g. `query=name:query_id` 211 | * **Note**: 数据应按照 query_id 排序. 索引从 `0` 开始. 以索引方式指定时, 不计入标签列, 例如当标签为 column_0, query id 为 column_1 时, 正确的参数是 `query=0` 212 | * `ignore_column`, default=`""`, type=string, alias=`ignore_feature`, `blacklist` 213 | * 在训练中指定需要忽略的列 214 | * 用数字做索引, e.g. `ignore_column=0,1,2` 意味着 column_0, column_1 和 column_2 将被忽略 215 | * 为列名添加前缀 `name:` , e.g. `ignore_column=name:c1,c2,c3` 意味着 c1, c2 和 c3 将被忽略 216 | * **Note**: 只在从文件直接加载数据的情况下工作 217 | * **Note**: 索引从 `0` 开始. 不包括标签列 218 | * `categorical_feature`, default=`""`, type=string, alias=`categorical_column`, `cat_feature`, `cat_column` 219 | * 指定分类特征 220 | * 用数字做索引, e.g. `categorical_feature=0,1,2` 意味着 column_0, column_1 和 column_2 是分类特征 221 | * 为列名添加前缀 `name:`, e.g.
`categorical_feature=name:c1,c2,c3` 意味着 c1, c2 和 c3 是分类特征 222 | * **Note**: 只支持 `int` 类型的分类特征. 索引从 `0` 开始. 同时它不包括标签列 223 | * **Note**: 负值将被视为 **missing values** 224 | * `predict_raw_score`, default=`false`, type=bool, alias=`raw_score`, `is_predict_raw_score` 225 | * 只用于 `prediction` 任务 226 | * 设置为 `true` 则只预测原始分数 227 | * 设置为 `false` 则预测转换后的分数 228 | * `predict_leaf_index`, default=`false`, type=bool, alias=`leaf_index`, `is_predict_leaf_index` 229 | * 只用于 `prediction` 任务 230 | * 设置为 `true` 则使用所有树的叶子索引进行预测 231 | * `predict_contrib`, default=`false`, type=bool, alias=`contrib`, `is_predict_contrib` 232 | * 只用于 `prediction` 任务 233 | * 设置为 `true` 则预测 [SHAP values](https://arxiv.org/abs/1706.06060), 它代表了每个特征对每个预测的贡献. 将会产生 #features + 1 个值, 其中最后一个值是模型在训练数据上输出的期望值 234 | * `bin_construct_sample_cnt`, default=`200000`, type=int, alias=`subsample_for_bin` 235 | * 用来构建直方图的数据的数量 236 | * 设置更大的值时, 会得到更好的训练效果, 但会增加数据加载时间 237 | * 如果数据非常稀疏, 则将其设置为更大的值 238 | * `num_iteration_predict`, default=`-1`, type=int 239 | * 只用于 `prediction` 任务 240 | * 用于指定在预测中使用多少已训练的迭代 241 | * `<= 0` 意味着没有限制 242 | * `pred_early_stop`, default=`false`, type=bool 243 | * 如果为 `true`, 将使用提前停止 (early stopping) 来加速预测, 可能影响精度 244 | * `pred_early_stop_freq`, default=`10`, type=int 245 | * 检查 early-stopping 预测的频率 246 | * `pred_early_stop_margin`, default=`10.0`, type=double 247 | * early-stopping 预测的边际阈值 248 | * `use_missing`, default=`true`, type=bool 249 | * 设置为 `false` 则禁用对缺失值的特殊处理 250 | * `zero_as_missing`, default=`false`, type=bool 251 | * 设置为 `true` 将所有的 0 都视为缺失值 (包括 libsvm/sparse 矩阵中未显示的值) 252 | * 设置为 `false` 则使用 `na` 代表缺失值 253 | * `init_score_file`, default=`""`, type=string 254 | * 训练初始分数文件的路径, `""` 将使用 `train_data_file` + `.init` (如果存在) 255 | * `valid_init_score_file`, default=`""`, type=multi-string 256 | * 验证初始分数文件的路径, `""` 将使用 `valid_data_file` + `.init` (如果存在) 257 | * 多个验证集通过 `,` 分隔 258 | 259 | ## 目标参数 260 | 261 | * `sigmoid`, default=`1.0`, type=double 262 | * sigmoid 函数的参数.
将用于 `binary` 分类 和 `lambdarank` 263 | * `alpha`, default=`0.9`, type=double 264 | * [Huber loss](https://en.wikipedia.org/wiki/Huber_loss) 和 [Quantile regression](https://en.wikipedia.org/wiki/Quantile_regression) 的参数. 将用于 `regression` 任务 265 | * `fair_c`, default=`1.0`, type=double 266 | * [Fair loss](https://www.kaggle.com/c/allstate-claims-severity/discussion/24520) 的参数. 将用于 `regression` 任务 267 | * `gaussian_eta`, default=`1.0`, type=double 268 | * 控制高斯函数宽度的参数. 将用于 `regression_l1` 和 `huber` losses 269 | * `poisson_max_delta_step`, default=`0.7`, type=double 270 | * [Poisson regression](https://en.wikipedia.org/wiki/Poisson_regression) 中用于保证优化安全的参数 271 | * `scale_pos_weight`, default=`1.0`, type=double 272 | * `binary` 分类任务中正样本的权重 273 | * `boost_from_average`, default=`true`, type=bool 274 | * 只用于 `regression` 任务 275 | * 将初始分数调整为标签的均值, 以获得更快的收敛速度 276 | * `is_unbalance`, default=`false`, type=bool, alias=`unbalanced_sets` 277 | * 用于 `binary` 分类 278 | * 如果训练数据不平衡, 设置为 `true` 279 | * `max_position`, default=`20`, type=int 280 | * 用于 `lambdarank` 281 | * 将在这个 [NDCG](https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG) 位置上优化 282 | * `label_gain`, default=`0,1,3,7,15,31,63,...`, type=multi-double 283 | * 用于 `lambdarank` 284 | * 标签的相关性增益.
例如, 如果使用默认的 label gain, 标签 `2` 的增益则是 `3` 285 | * 使用 `,` 分隔 286 | * `num_class`, default=`1`, type=int, alias=`num_classes` 287 | * 只用于 `multiclass` 分类 288 | * `reg_sqrt`, default=`false`, type=bool 289 | * 只用于 `regression` 290 | * 用于拟合 `sqrt(label)`, 预测结果也会自动转换成 `pow2(prediction)` 291 | 292 | ## 度量参数 293 | 294 | * `metric`, default={`l2` for regression}, {`binary_logloss` for binary classification}, {`ndcg` for lambdarank}, type=multi-enum, options=`l1`, `l2`, `ndcg`, `auc`, `binary_logloss`, `binary_error` … 295 | * `l1`, absolute loss, alias=`mean_absolute_error`, `mae` 296 | * `l2`, square loss, alias=`mean_squared_error`, `mse` 297 | * `l2_root`, root square loss, alias=`root_mean_squared_error`, `rmse` 298 | * `quantile`, [Quantile regression](https://en.wikipedia.org/wiki/Quantile_regression) 299 | * `huber`, [Huber loss](https://en.wikipedia.org/wiki/Huber_loss) 300 | * `fair`, [Fair loss](https://www.kaggle.com/c/allstate-claims-severity/discussion/24520) 301 | * `poisson`, [Poisson regression](https://en.wikipedia.org/wiki/Poisson_regression) 302 | * `ndcg`, [NDCG](https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG) 303 | * `map`, [MAP](https://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision) 304 | * `auc`, [AUC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve) 305 | * `binary_logloss`, [log loss](https://www.kaggle.com/wiki/LogLoss) 306 | * `binary_error`, 对单个样本: 正确分类为 `0`, 错误分类为 `1` 307 | * `multi_logloss`, multi-class 分类的 log loss 308 | * `multi_error`, multi-class 分类的错误率 309 | * `xentropy`, cross-entropy (带有可选的线性权重), alias=`cross_entropy` 310 | * `xentlambda`, “intensity-weighted” 交叉熵, alias=`cross_entropy_lambda` 311 | * `kldiv`, [Kullback-Leibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence), alias=`kullback_leibler` 312 | * 支持多个度量, 使用 `,` 分隔 313 | * `metric_freq`, default=`1`, type=int 314 | * 度量输出的频率 315 | * `train_metric`, default=`false`,
type=bool, alias=`training_metric`, `is_training_metric` 316 | * 如果需要输出训练数据的度量结果, 则设置为 `true` 317 | * `ndcg_at`, default=`1,2,3,4,5`, type=multi-int, alias=`ndcg_eval_at`, `eval_at` 318 | * [NDCG](https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG) 的评估位置, 使用 `,` 分隔 319 | 320 | ## 网络参数 321 | 322 | 以下参数用于并行学习, 只用于基本(socket)版本。 323 | 324 | * `num_machines`, default=`1`, type=int, alias=`num_machine` 325 | * 并行学习所使用的机器数量 326 | * 在 socket 和 mpi 版本中都需要设置这个参数 327 | * `local_listen_port`, default=`12400`, type=int, alias=`local_port` 328 | * 监听本地机器的 TCP 端口 329 | * 在训练之前, 您应该在防火墙设置中放开该端口 330 | * `time_out`, default=`120`, type=int 331 | * socket 的超时时间 (以分钟为单位) 332 | * `machine_list_file`, default=`""`, type=string, alias=`mlist` 333 | * 列出此并行学习应用程序所用机器的文件 334 | * 每一行对应一台机器, 包含一个 IP 和一个端口, 格式为 `ip port`, 以空格分隔 335 | 336 | ## GPU 参数 337 | 338 | * `gpu_platform_id`, default=`-1`, type=int 339 | * OpenCL 平台 ID. 通常每个 GPU 供应商都会公开一个 OpenCL 平台。 340 | * default 为 `-1`, 意味着使用系统级的默认平台 341 | * `gpu_device_id`, default=`-1`, type=int 342 | * 指定平台上的 OpenCL 设备 ID. 选定平台上的每一个 GPU 都有一个唯一的设备 ID 343 | * default 为 `-1`, 意味着选定平台上的默认设备 344 | * `gpu_use_dp`, default=`false`, type=bool 345 | * 设置为 `true` 则在 GPU 上使用双精度计算 (默认使用单精度) 346 | 347 | ## 模型参数 348 | 349 | 该特性仅在命令行版本中得到支持。 350 | 351 | * `convert_model_language`, default=`""`, type=string 352 | * 目前只支持 `cpp` 353 | * 如果在 `task=train` 时设置了 `convert_model_language`, 该模型也将被转换 354 | * `convert_model`, default=`"gbdt_prediction.cpp"`, type=string 355 | * 转换后模型的输出文件名 356 | 357 | ## 其他 358 | 359 | ### 持续训练输入分数 360 | 361 | LightGBM 支持以初始得分继续训练。它使用一个附加的文件来存储这些初始值, 如下: 362 | 363 | ``` 364 | 0.5 365 | -0.1 366 | 0.9 367 | ... 368 | 369 | ``` 370 | 371 | 它意味着第一个数据行的初始得分是 `0.5`, 第二个是 `-0.1`, 等等。 初始得分文件与数据文件逐行对应, 每一行有一个分数。 如果数据文件的名称是 `train.txt`, 初始得分文件应该被命名为 `train.txt.init`, 并与数据文件放在同一文件夹。 在这种情况下, 如果初始得分文件存在, LightGBM 将自动加载它。 372 | 373 | ### 权重数据 374 | 375 | LightGBM 支持加权训练。它使用一个附加文件来存储权重数据, 如下: 376 | 377 | ``` 378 | 1.0 379 | 0.5 380 | 0.8 381 | ...
382 | 383 | ``` 384 | 385 | 它意味着第一个数据行的权重是 `1.0`, 第二个是 `0.5`, 等等. 权重文件与数据文件逐行对应, 每行一个权重. 如果数据文件的名称是 `train.txt`, 权重文件应命名为 `train.txt.weight`, 并与数据文件放在相同的文件夹. 在这种情况下, 如果权重文件存在, LightGBM 将自动加载它. 386 | 387 | **update**: 现在可以在数据文件中指定 `weight` 列。请参阅上面的 `weight` 参数. 388 | 389 | ### 查询数据 390 | 391 | 对于 LambdaRank 学习, 训练数据还需要查询信息. LightGBM 使用一个附加文件来存储查询数据, 如下: 392 | 393 | ``` 394 | 27 395 | 18 396 | 67 397 | ... 398 | 399 | ``` 400 | 401 | 它意味着前 `27` 行样本属于一个查询, 接下来的 `18` 行属于另一个查询, 等等. **Note**: 数据应该按查询排序. 402 | 403 | 如果数据文件的名称是 `train.txt`, 查询文件应该被命名为 `train.txt.query`, 并放在与训练数据相同的文件夹中。 在这种情况下, 如果查询文件存在, LightGBM 将自动加载它。 404 | 405 | **update**: 现在可以在数据文件中指定 query/group id 列。请参阅上面的 `query` 参数。 406 | -------------------------------------------------------------------------------- /docs/8.md: -------------------------------------------------------------------------------- 1 | # Python API 2 | 3 | ## Data Structure API 4 | 5 | > 6 | > class lightgbm.Dataset(data, label=None, max_bin=None, reference=None, weight=None, group=None, init_score=None, silent=False, feature_name='auto', categorical_feature='auto', params=None, free_raw_data=True) 7 | > 8 | 9 | Bases: `object` 10 | 11 | Dataset in LightGBM. 12 | 13 | Construct Dataset. 14 | 15 | * Parameters: 16 | * **data** (_string__,_ _numpy array_ _or_ _scipy.sparse_) – Data source of Dataset. If string, it represents the path to txt file. 17 | * **label** (_list__,_ _numpy 1-D array_ _or_ _None__,_ _optional_ _(__default=None__)_) – Label of the data. 18 | * **max_bin** (_int_ _or_ _None__,_ _optional_ _(__default=None__)_) – Max number of discrete bins for features. If None, default value from parameters of CLI-version will be used. 19 | * **reference** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset") _or_ _None__,_ _optional_ _(__default=None__)_) – If this is Dataset for validation, training data should be used as reference.
20 | * **weight** (_list__,_ _numpy 1-D array_ _or_ _None__,_ _optional_ _(__default=None__)_) – Weight for each instance. 21 | * **group** (_list__,_ _numpy 1-D array_ _or_ _None__,_ _optional_ _(__default=None__)_) – Group/query size for Dataset. 22 | * **init_score** (_list__,_ _numpy 1-D array_ _or_ _None__,_ _optional_ _(__default=None__)_) – Init score for Dataset. 23 | * **silent** (_bool__,_ _optional_ _(__default=False__)_) – Whether to print messages during construction. 24 | * **feature_name** (_list of strings_ _or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Feature names. If ‘auto’ and data is pandas DataFrame, data columns names are used. 25 | * **categorical_feature** (_list of strings_ _or_ _int__, or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If ‘auto’ and data is pandas DataFrame, pandas categorical columns are used. 26 | * **params** (_dict_ _or_ _None__,_ _optional_ _(__default=None__)_) – Other parameters. 27 | * **free_raw_data** (_bool__,_ _optional_ _(__default=True__)_) – If True, raw data is freed after constructing inner Dataset. 28 | 29 | > construct() 30 | 31 | Lazy init. 32 | 33 | * Returns: 34 | * **self** – Returns self. 35 | * Return type: 36 | * [Dataset](#lightgbm.Dataset "lightgbm.Dataset") 37 | 38 | > create_valid(data, label=None, weight=None, group=None, init_score=None, silent=False, params=None) 39 | 40 | Create validation data align with current Dataset. 41 | 42 | * Parameters: 43 | 44 | * **data** (_string__,_ _numpy array_ _or_ _scipy.sparse_) – Data source of Dataset. If string, it represents the path to txt file. 45 | * **label** (_list_ _or_ _numpy 1-D array__,_ _optional_ _(__default=None__)_) – Label of the training data. 46 | * **weight** (_list__,_ _numpy 1-D array_ _or_ _None__,_ _optional_ _(__default=None__)_) – Weight for each instance. 
47 | * **group** (_list__,_ _numpy 1-D array_ _or_ _None__,_ _optional_ _(__default=None__)_) – Group/query size for Dataset. 48 | * **init_score** (_list__,_ _numpy 1-D array_ _or_ _None__,_ _optional_ _(__default=None__)_) – Init score for Dataset. 49 | * **silent** (_bool__,_ _optional_ _(__default=False__)_) – Whether to print messages during construction. 50 | * **params** (_dict_ _or_ _None__,_ _optional_ _(__default=None__)_) – Other parameters. 51 | * Returns: 52 | * **self** – Returns self 53 | * Return type: 54 | * [Dataset](#lightgbm.Dataset "lightgbm.Dataset") 55 | 56 | 57 | > get_field(field_name) 58 | 59 | Get property from the Dataset. 60 | 61 | * Parameters: 62 | * **field_name** (_string_) – The field name of the information. 63 | * Returns: 64 | * **info** – A numpy array with information from the Dataset. 65 | * Return type: 66 | * numpy array 67 | 68 | 69 | > get_group() 70 | 71 | Get the group of the Dataset. 72 | 73 | * Returns: 74 | * **group** – Group size of each group. 75 | * Return type: 76 | * numpy array 77 | 78 | > get_init_score() 79 | 80 | Get the initial score of the Dataset. 81 | 82 | * Returns: 83 | * **init_score** – Init score of Booster. 84 | * Return type: 85 | * numpy array 86 | 87 | > get_label() 88 | 89 | Get the label of the Dataset. 90 | 91 | * Returns: 92 | * **label** – The label information from the Dataset. 93 | * Return type: 94 | * numpy array 95 | 96 | > get_ref_chain(ref_limit=100) 97 | 98 | Get a chain of Dataset objects, starting with r, then going to r.reference if exists, then to r.reference.reference, etc. until we hit `ref_limit` or a reference loop. 99 | 100 | * Parameters: 101 | * **ref_limit** (_int__,_ _optional_ _(__default=100__)_) – The limit number of references. 102 | * Returns: 103 | * **ref_chain** – Chain of references of the Datasets. 104 | * Return type: 105 | * set of Dataset 106 | 107 | > get_weight() 108 | 109 | Get the weight of the Dataset. 
110 | 111 | * Returns: 112 | * **weight** – Weight for each data point from the Dataset. 113 | * Return type: 114 | * numpy array 115 | 116 | > num_data() 117 | 118 | Get the number of rows in the Dataset. 119 | 120 | * Returns: 121 | * **number_of_rows** – The number of rows in the Dataset. 122 | * Return type: 123 | * int 124 | 125 | > num_feature() 126 | 127 | Get the number of columns (features) in the Dataset. 128 | 129 | * Returns: 130 | * **number_of_columns** – The number of columns (features) in the Dataset. 131 | * Return type: 132 | * int 133 | 134 | > save_binary(filename) 135 | 136 | Save Dataset to binary file. 137 | 138 | * Parameters: 139 | * **filename** (_string_) – Name of the output file. 140 | 141 | > set_categorical_feature(categorical_feature) 142 | 143 | Set categorical features. 144 | 145 | * Parameters: 146 | * **categorical_feature** (_list of int_ _or_ _strings_) – Names or indices of categorical features. 147 | 148 | > set_feature_name(feature_name) 149 | 150 | Set feature name. 151 | 152 | * Parameters: 153 | * **feature_name** (_list of strings_) – Feature names. 154 | 155 | > set_field(field_name, data) 156 | 157 | Set property into the Dataset. 158 | 159 | * Parameters: 160 | * **field_name** (_string_) – The field name of the information. 161 | * **data** (_list__,_ _numpy array_ _or_ _None_) – The array of data to be set. 162 | 163 | > set_group(group) 164 | 165 | Set group size of Dataset (used for ranking). 166 | 167 | * Parameters: 168 | * **group** (_list__,_ _numpy array_ _or_ _None_) – Group size of each group. 169 | 170 | > set_init_score(init_score) 171 | 172 | Set init score of Booster to start from. 173 | 174 | * Parameters: 175 | * **init_score** (_list__,_ _numpy array_ _or_ _None_) – Init score for Booster. 176 | 177 | > set_label(label) 178 | 179 | Set label of Dataset 180 | 181 | * Parameters: 182 | * **label** (_list__,_ _numpy array_ _or_ _None_) – The label information to be set into Dataset. 
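The `group` data set by `set_group` above (and by the `group` constructor argument) is a list of group *sizes*, not per-row query ids. As a minimal stdlib-only sketch (the helper name `query_ids_to_group_sizes` is ours, not part of LightGBM), rows already sorted by query can be converted like this:

```python
from itertools import groupby

def query_ids_to_group_sizes(query_ids):
    """Convert per-row query ids (rows must be sorted by query) into the
    group-size list expected by Dataset.set_group()."""
    return [sum(1 for _ in rows) for _, rows in groupby(query_ids)]

# Rows 0-2 belong to query "a", rows 3-4 to "b", row 5 to "c".
sizes = query_ids_to_group_sizes(["a", "a", "a", "b", "b", "c"])
print(sizes)  # [3, 2, 1]
```

The resulting list could then be passed to `Dataset.set_group(sizes)`; the sizes must sum to the number of rows in the Dataset.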
183 | 184 | > set_reference(reference) 185 | 186 | Set reference Dataset. 187 | 188 | * Parameters: 189 | * **reference** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset")) – Reference that is used as a template to construct the current Dataset. 190 | 191 | > set_weight(weight) 192 | 193 | Set weight of each instance. 194 | 195 | * Parameters: 196 | * **weight** (_list__,_ _numpy array_ _or_ _None_) – Weight to be set for each data point. 197 | 198 | > subset(used_indices, params=None) 199 | 200 | Get subset of current Dataset. 201 | 202 | * Parameters: 203 | * **used_indices** (_list of int_) – Indices used to create the subset. 204 | * **params** (_dict_ _or_ _None__,_ _optional_ _(__default=None__)_) – Other parameters. 205 | * Returns: 206 | * **subset** – Subset of the current Dataset. 207 | * Return type: 208 | * [Dataset](#lightgbm.Dataset "lightgbm.Dataset") 209 | 210 | > class lightgbm.Booster(params=None, train_set=None, model_file=None, silent=False) 211 | 212 | Bases: `object` 213 | 214 | Booster in LightGBM. 215 | 216 | Initialize the Booster. 217 | 218 | * Parameters: 219 | * **params** (_dict_ _or_ _None__,_ _optional_ _(__default=None__)_) – Parameters for Booster. 220 | * **train_set** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset") _or_ _None__,_ _optional_ _(__default=None__)_) – Training dataset. 221 | * **model_file** (_string_ _or_ _None__,_ _optional_ _(__default=None__)_) – Path to the model file. 222 | * **silent** (_bool__,_ _optional_ _(__default=False__)_) – Whether to print messages during construction. 223 | 224 | > add_valid(data, name) 225 | 226 | Add validation data. 227 | 228 | * Parameters: 229 | * **data** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset")) – Validation data. 230 | * **name** (_string_) – Name of validation data. 231 | 232 | > attr(key) 233 | 234 | Get attribute string from the Booster. 235 | 236 | * Parameters: 237 | * **key** (_string_) – The name of the attribute.
238 | * Returns: 239 | * **value** – The attribute value. Returns None if the attribute does not exist. 240 | * Return type: 241 | * string or None 242 | 243 | > current_iteration() 244 | 245 | Get the index of the current iteration. 246 | 247 | * Returns: 248 | * **cur_iter** – The index of the current iteration. 249 | * Return type: 250 | * int 251 | 252 | > dump_model(num_iteration=-1) 253 | 254 | Dump Booster to json format. 255 | 256 | * Parameters: 257 | * **num_iteration** (_int__,_ _optional_ _(__default=-1__)_) – Index of the iteration that should be dumped. If <0, the best iteration (if exists) is dumped. 258 | * Returns: 259 | * **json_repr** – Json format of Booster. 260 | * Return type: 261 | * dict 262 | 263 | > eval(data, name, feval=None) 264 | 265 | Evaluate for data. 266 | 267 | * Parameters: 268 | * **data** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset")) – Data to be evaluated. 269 | * **name** (_string_) – Name of the data. 270 | * **feval** (_callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – Custom evaluation function. 271 | * Returns: 272 | * **result** – List with evaluation results. 273 | * Return type: 274 | * list 275 | 276 | > eval_train(feval=None) 277 | 278 | Evaluate for training data. 279 | 280 | * Parameters: 281 | * **feval** (_callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – Custom evaluation function. 282 | * Returns: 283 | * **result** – List with evaluation results. 284 | * Return type: 285 | * list 286 | 287 | > eval_valid(feval=None) 288 | 289 | Evaluate for validation data. 290 | 291 | * Parameters: 292 | * **feval** (_callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – Custom evaluation function. 293 | * Returns: 294 | * **result** – List with evaluation results. 295 | * Return type: 296 | * list 297 | 298 | > feature_importance(importance_type='split', iteration=-1) 299 | 300 | Get feature importances.
301 | 302 | * Parameters: 303 | * **importance_type** (_string__,_ _optional_ _(__default="split"__)_) – How the importance is calculated. If “split”, result contains numbers of times the feature is used in a model. If “gain”, result contains total gains of splits which use the feature. 304 | * Returns: 305 | * **result** – Array with feature importances. 306 | * Return type: 307 | * numpy array 308 | 309 | > feature_name() 310 | 311 | Get names of features. 312 | 313 | * Returns: 314 | * **result** – List with names of features. 315 | * Return type: 316 | * list 317 | 318 | > free_dataset() 319 | 320 | Free Booster’s Datasets. 321 | 322 | > free_network() 323 | 324 | Free Network. 325 | 326 | > get_leaf_output(tree_id, leaf_id) 327 | 328 | Get the output of a leaf. 329 | 330 | * Parameters: 331 | * **tree_id** (_int_) – The index of the tree. 332 | * **leaf_id** (_int_) – The index of the leaf in the tree. 333 | * Returns: 334 | * **result** – The output of the leaf. 335 | * Return type: 336 | * float 337 | 338 | > num_feature() 339 | 340 | Get number of features. 341 | 342 | * Returns: 343 | * **num_feature** – The number of features. 344 | * Return type: 345 | * int 346 | 347 | > predict(data, num_iteration=-1, raw_score=False, pred_leaf=False, pred_contrib=False, data_has_header=False, is_reshape=True, pred_parameter=None) 348 | 349 | Make a prediction. 350 | 351 | * Parameters: 352 | * **data** (_string__,_ _numpy array_ _or_ _scipy.sparse_) – Data source for prediction. If string, it represents the path to txt file. 353 | * **num_iteration** (_int__,_ _optional_ _(__default=-1__)_) – Iteration used for prediction. If <0, the best iteration (if exists) is used for prediction. 354 | * **raw_score** (_bool__,_ _optional_ _(__default=False__)_) – Whether to predict raw scores. 355 | * **pred_leaf** (_bool__,_ _optional_ _(__default=False__)_) – Whether to predict leaf index. 
356 | * **pred_contrib** (_bool__,_ _optional_ _(__default=False__)_) – Whether to predict feature contributions. 357 | * **data_has_header** (_bool__,_ _optional_ _(__default=False__)_) – Whether the data has header. Used only if data is string. 358 | * **is_reshape** (_bool__,_ _optional_ _(__default=True__)_) – If True, result is reshaped to [nrow, ncol]. 359 | * **pred_parameter** (_dict_ _or_ _None__,_ _optional_ _(__default=None__)_) – Other parameters for the prediction. 360 | * Returns: 361 | * **result** – Prediction result. 362 | * Return type: 363 | * numpy array 364 | 365 | > reset_parameter(params) 366 | 367 | Reset parameters of Booster. 368 | 369 | * Parameters: 370 | * **params** (_dict_) – New parameters for Booster. 371 | 372 | > rollback_one_iter() 373 | 374 | Rollback one iteration. 375 | 376 | > save_model(filename, num_iteration=-1) 377 | 378 | Save Booster to file. 379 | 380 | * Parameters: 381 | * **filename** (_string_) – Filename to save Booster. 382 | * **num_iteration** (_int__,_ _optional_ _(__default=-1__)_) – Index of the iteration that should be saved. If <0, the best iteration (if exists) is saved. 383 | 384 | > set_attr(**kwargs) 385 | 386 | Set the attribute of the Booster. 387 | 388 | * Parameters: 389 | * **kwargs** – The attributes to set. Setting a value to None deletes an attribute. 390 | 391 | > set_network(machines, local_listen_port=12400, listen_time_out=120, num_machines=1) 392 | 393 | Set the network configuration. 394 | 395 | * Parameters: 396 | * **machines** (_list__,_ _set_ _or_ _string_) – Names of machines. 397 | * **local_listen_port** (_int__,_ _optional_ _(__default=12400__)_) – TCP listen port for local machines. 398 | * **listen_time_out** (_int__,_ _optional_ _(__default=120__)_) – Socket time-out in minutes. 399 | * **num_machines** (_int__,_ _optional_ _(__default=1__)_) – The number of machines for parallel learning application.
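When `machines` is passed to `set_network` as a string, each entry must name a host and a port. The sketch below builds such a string; it assumes the comma-separated `ip:port` format used by LightGBM's socket-based parallel version, and the helper name `format_machines` is ours, not part of the library:

```python
def format_machines(hosts, local_listen_port=12400):
    """Build a machine-list string for Booster.set_network().

    Assumes the comma-separated "ip:port" format of the socket version;
    hosts given without an explicit port get the shared
    local_listen_port appended.
    """
    entries = [h if ":" in h else "%s:%d" % (h, local_listen_port) for h in hosts]
    return ",".join(entries)

machines = format_machines(["192.168.0.1", "192.168.0.2:12401"])
print(machines)  # 192.168.0.1:12400,192.168.0.2:12401
```

The same `local_listen_port` value would normally also be passed to `set_network` itself, so every machine listens on the port the others expect.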
400 | 401 | > set_train_data_name(name) 402 | 403 | Set the name of the training Dataset. 404 | 405 | * Parameters: 406 | * **name** (_string_) – Name for training Dataset. 407 | 408 | > update(train_set=None, fobj=None) 409 | 410 | Update for one iteration. 411 | 412 | * Parameters: 413 | * **train_set** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset") _or_ _None__,_ _optional_ _(__default=None__)_) – Training data. If None, last training data is used. 414 | * **fobj** (_callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – 415 | Customized objective function. 416 | For multi-class task, the score is grouped by class_id first, then by row_id. To get the score of the i-th row in the j-th class, access score[j * num_data + i]; grad and hess should be grouped in the same way. 417 | * Returns: 418 | * **is_finished** – Whether the update was successfully finished. 419 | * Return type: 420 | * bool 421 | 422 | ## Training API 423 | 424 | 425 | > lightgbm.train(params, train_set, num_boost_round=100, valid_sets=None, valid_names=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, evals_result=None, verbose_eval=True, learning_rates=None, keep_training_booster=False, callbacks=None) 426 | 427 | Perform the training with given parameters. 428 | 429 | * Parameters: 430 | * **params** (_dict_) – Parameters for training. 431 | * **train_set** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset")) – Data to be trained on. 432 | * **num_boost_round** (_int__,_ _optional_ _(__default=100__)_) – Number of boosting iterations. 433 | * **valid_sets** (_list of Datasets_ _or_ _None__,_ _optional_ _(__default=None__)_) – List of data to be evaluated during training. 434 | * **valid_names** (_list of string_ _or_ _None__,_ _optional_ _(__default=None__)_) – Names of `valid_sets`. 435 | * **fobj** (_callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – Customized objective function.
* **feval** (_callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – Customized evaluation function. Note: should return (eval_name, eval_result, is_higher_better) or a list of such tuples. 437 | * **init_model** (_string_ _or_ _None__,_ _optional_ _(__default=None__)_) – Filename of LightGBM model or Booster instance used to continue training. 438 | * **feature_name** (_list of strings_ _or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Feature names. If ‘auto’ and data is pandas DataFrame, data column names are used. 439 | * **categorical_feature** (_list of strings_ _or_ _int__, or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If ‘auto’ and data is pandas DataFrame, pandas categorical columns are used. 440 | * **early_stopping_rounds** (_int_ _or_ _None__,_ _optional_ _(__default=None__)_) – Activates early stopping. The model will train until the validation score stops improving. Requires at least one validation dataset and one metric. If there’s more than one, all of them will be checked. If early stopping occurs, the model will add a `best_iteration` field. 441 | * **evals_result** (_dict_ _or_ _None__,_ _optional_ _(__default=None__)_) – 442 | This dictionary is used to store all evaluation results of all the items in `valid_sets`. 443 | Example 444 | With `valid_sets` = [valid_set, train_set], `valid_names` = [‘eval’, ‘train’] and `params` = {‘metric’: ‘logloss’} this returns: {‘train’: {‘logloss’: [‘0.48253’, ‘0.35953’, …]}, ‘eval’: {‘logloss’: [‘0.480385’, ‘0.357756’, …]}}. 445 | * **verbose_eval** (_bool_ _or_ _int__,_ _optional_ _(__default=True__)_) – 446 | Requires at least one validation data. If True, the eval metric on the valid set is printed at each boosting stage. If int, the eval metric on the valid set is printed at every `verbose_eval` boosting stage.
The last boosting stage or the boosting stage found by using `early_stopping_rounds` is also printed. 447 | Example 448 | With `verbose_eval` = 4 and at least one item in evals, an evaluation metric is printed every 4 (instead of 1) boosting stages. 449 | * **learning_rates** (_list__,_ _callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – List of learning rates for each boosting round or a customized function that calculates `learning_rate` in terms of the current round number (e.g. to apply learning rate decay). 450 | * **keep_training_booster** (_bool__,_ _optional_ _(__default=False__)_) – Whether the returned Booster will be used to keep training. If False, the returned value will be converted into _InnerPredictor before returning. You can still use _InnerPredictor as `init_model` to continue training later. 451 | * **callbacks** (_list of callables_ _or_ _None__,_ _optional_ _(__default=None__)_) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information. 452 | * Returns: 453 | * **booster** – The trained Booster model. 454 | * Return type: 455 | * [Booster](#lightgbm.Booster "lightgbm.Booster") 456 | 457 | > lightgbm.cv(params, train_set, num_boost_round=10, folds=None, nfold=5, stratified=True, shuffle=True, metrics=None, fobj=None, feval=None, init_model=None, feature_name='auto', categorical_feature='auto', early_stopping_rounds=None, fpreproc=None, verbose_eval=None, show_stdv=True, seed=0, callbacks=None) 458 | 459 | Perform the cross-validation with given parameters. 460 | 461 | * Parameters: 462 | * **params** (_dict_) – Parameters for Booster. 463 | * **train_set** ([_Dataset_](#lightgbm.Dataset "lightgbm.Dataset")) – Data to be trained on. 464 | * **num_boost_round** (_int__,_ _optional_ _(__default=10__)_) – Number of boosting iterations.
465 | * **folds** (_a generator_ _or_ _iterator of_ _(__train_idx__,_ _test_idx__)_ _tuples_ _or_ _None__,_ _optional_ _(__default=None__)_) – The train and test indices for each fold. This argument has the highest priority over other data split arguments. 466 | * **nfold** (_int__,_ _optional_ _(__default=5__)_) – Number of folds in CV. 467 | * **stratified** (_bool__,_ _optional_ _(__default=True__)_) – Whether to perform stratified sampling. 468 | * **shuffle** (_bool__,_ _optional_ _(__default=True__)_) – Whether to shuffle before splitting data. 469 | * **metrics** (_string__,_ _list of strings_ _or_ _None__,_ _optional_ _(__default=None__)_) – Evaluation metrics to be monitored during CV. If not None, the metric in `params` will be overridden. 470 | * **fobj** (_callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – Custom objective function. 471 | * **feval** (_callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – Custom evaluation function. 472 | * **init_model** (_string_ _or_ _None__,_ _optional_ _(__default=None__)_) – Filename of LightGBM model or Booster instance used to continue training. 473 | * **feature_name** (_list of strings_ _or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Feature names. If ‘auto’ and data is pandas DataFrame, data column names are used. 474 | * **categorical_feature** (_list of strings_ _or_ _int__, or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If ‘auto’ and data is pandas DataFrame, pandas categorical columns are used. 475 | * **early_stopping_rounds** (_int_ _or_ _None__,_ _optional_ _(__default=None__)_) – Activates early stopping. CV error needs to decrease at least every `early_stopping_rounds` round(s) to continue. Last entry in evaluation history is the one from the best iteration.
476 | * **fpreproc** (_callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – Preprocessing function that takes (dtrain, dtest, params) and returns transformed versions of those. 477 | * **verbose_eval** (_bool__,_ _int__, or_ _None__,_ _optional_ _(__default=None__)_) – Whether to display the progress. If None, progress will be displayed when np.ndarray is returned. If True, progress will be displayed at every boosting stage. If int, progress will be displayed at every given `verbose_eval` boosting stage. 478 | * **show_stdv** (_bool__,_ _optional_ _(__default=True__)_) – Whether to display the standard deviation in progress. Results are not affected by this parameter, and they always contain the std. 479 | * **seed** (_int__,_ _optional_ _(__default=0__)_) – Seed used to generate the folds (passed to numpy.random.seed). 480 | * **callbacks** (_list of callables_ _or_ _None__,_ _optional_ _(__default=None__)_) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information. 481 | * Returns: 482 | * **eval_hist** – Evaluation history. The dictionary has the following format: {‘metric1-mean’: [values], ‘metric1-stdv’: [values], ‘metric2-mean’: [values], ‘metric2-stdv’: [values], …}. 483 | * Return type: 484 | * dict 485 | 486 | ## Scikit-learn API 487 | 488 | > class lightgbm.LGBMModel(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=200000, objective=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=1, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, **kwargs) 489 | 490 | Bases: `object` 491 | 492 | Implementation of the scikit-learn API for LightGBM. 493 | 494 | Construct a gradient boosting model. 495 | 496 | * Parameters: 497 | * **boosting_type** (_string__,_ _optional_ _(__default="gbdt"__)_) – ‘gbdt’, traditional Gradient Boosting Decision Tree.
‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest. 498 | * **num_leaves** (_int__,_ _optional_ _(__default=31__)_) – Maximum tree leaves for base learners. 499 | * **max_depth** (_int__,_ _optional_ _(__default=-1__)_) – Maximum tree depth for base learners, -1 means no limit. 500 | * **learning_rate** (_float__,_ _optional_ _(__default=0.1__)_) – Boosting learning rate. 501 | * **n_estimators** (_int__,_ _optional_ _(__default=10__)_) – Number of boosted trees to fit. 502 | * **max_bin** (_int__,_ _optional_ _(__default=255__)_) – Number of bucketed bins for feature values. 503 | * **subsample_for_bin** (_int__,_ _optional_ _(__default=200000__)_) – Number of samples for constructing bins. 504 | * **objective** (_string__,_ _callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). default: ‘regression’ for LGBMRegressor, ‘binary’ or ‘multiclass’ for LGBMClassifier, ‘lambdarank’ for LGBMRanker. 505 | * **min_split_gain** (_float__,_ _optional_ _(__default=0.__)_) – Minimum loss reduction required to make a further partition on a leaf node of the tree. 506 | * **min_child_weight** (_float__,_ _optional_ _(__default=1e-3__)_) – Minimum sum of instance weight (hessian) needed in a child (leaf). 507 | * **min_child_samples** (_int__,_ _optional_ _(__default=20__)_) – Minimum number of data needed in a child (leaf). 508 | * **subsample** (_float__,_ _optional_ _(__default=1.__)_) – Subsample ratio of the training instance. 509 | * **subsample_freq** (_int__,_ _optional_ _(__default=1__)_) – Frequency of subsample; <=0 means subsampling is disabled. 510 | * **colsample_bytree** (_float__,_ _optional_ _(__default=1.__)_) – Subsample ratio of columns when constructing each tree. 511 | * **reg_alpha** (_float__,_ _optional_ _(__default=0.__)_) – L1 regularization term on weights.
512 | * **reg_lambda** (_float__,_ _optional_ _(__default=0.__)_) – L2 regularization term on weights. 513 | * **random_state** (_int_ _or_ _None__,_ _optional_ _(__default=None__)_) – Random number seed. If None, default seeds in the C++ code are used. 514 | * **n_jobs** (_int__,_ _optional_ _(__default=-1__)_) – Number of parallel threads. 515 | * **silent** (_bool__,_ _optional_ _(__default=True__)_) – Whether to print messages while running boosting. 516 | * ****kwargs** (_other parameters_) – 517 | Check [http://lightgbm.readthedocs.io/en/latest/Parameters.html](http://lightgbm.readthedocs.io/en/latest/Parameters.html) for more parameters. 518 | Note 519 | **kwargs is not supported in sklearn, so it may cause unexpected issues. 520 | 521 | > n_features_ 522 | 523 | _int_ – The number of features of fitted model. 524 | 525 | > classes_ 526 | 527 | _array of shape = [n_classes]_ – The class label array (only for classification problem). 528 | 529 | > n_classes_ 530 | 531 | _int_ – The number of classes (only for classification problem). 532 | 533 | > best_score_ 534 | 535 | _dict or None_ – The best score of fitted model. 536 | 537 | > best_iteration_ 538 | 539 | _int or None_ – The best iteration of fitted model if `early_stopping_rounds` has been specified. 540 | 541 | > objective_ 542 | 543 | _string or callable_ – The concrete objective used while fitting this model. 544 | 545 | > booster_ 546 | 547 | _Booster_ – The underlying Booster of this model. 548 | 549 | > evals_result_ 550 | 551 | _dict or None_ – The evaluation results if `early_stopping_rounds` has been specified. 552 | 553 | > feature_importances_ 554 | 555 | _array of shape = [n_features]_ – The feature importances (the higher, the more important the feature). 556 | 557 | Note 558 | 559 | A custom objective function can be provided for the `objective` parameter.
In this case, it should have the signature `objective(y_true, y_pred) -> grad, hess` or `objective(y_true, y_pred, group) -> grad, hess`: 560 | 561 | > ``` 562 | > y_true: array-like of shape = [n_samples] 563 | > ``` 564 | > 565 | > The target values. 566 | > 567 | > ``` 568 | > y_pred: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) 569 | > ``` 570 | > 571 | > The predicted values. 572 | > 573 | > ``` 574 | > group: array-like 575 | > ``` 576 | > 577 | > Group/query data, used for ranking task. 578 | > 579 | > ``` 580 | > grad: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) 581 | > ``` 582 | > 583 | > The value of the gradient for each sample point. 584 | > 585 | > ``` 586 | > hess: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) 587 | > ``` 588 | > 589 | > The value of the second derivative for each sample point. 590 | 591 | For multi-class task, the y_pred is grouped by class_id first, then by row_id. To get the y_pred of the i-th row in the j-th class, access y_pred[j * num_data + i]; grad and hess should be grouped in the same way. 592 | 593 | > apply(X, num_iteration=0) 594 | 595 | Return the predicted leaf of every tree for each sample. 596 | 597 | * Parameters: 598 | * **X** (_array-like_ _or_ _sparse matrix of shape =_ _[__n_samples__,_ _n_features__]_) – Input features matrix. 599 | * **num_iteration** (_int__,_ _optional_ _(__default=0__)_) – Limit number of iterations in the prediction; defaults to 0 (use all trees). 600 | * Returns: 601 | * **X_leaves** – The predicted leaf of every tree for each sample. 602 | * Return type: 603 | * array-like of shape = [n_samples, n_trees] 604 | 605 | > best_iteration_ 606 | 607 | Get the best iteration of fitted model. 608 | 609 | > best_score_ 610 | 611 | Get the best score of fitted model.
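The class-major score layout described in the multi-class note above can be verified with plain numpy. This is a standalone sketch; the variable names are illustrative, not part of the LightGBM API:

```python
# Demonstrates the multi-class layout used for custom objectives:
# scores are grouped by class_id first, then by row_id, so the score of
# row i in class j sits at flat index j * num_data + i.
import numpy as np

num_data, num_class = 4, 3
score = np.arange(num_data * num_class, dtype=float)  # stand-in flat scores

i, j = 2, 1                                # row 2, class 1
flat_value = score[j * num_data + i]       # manual indexing, as in the docs
grid = score.reshape(num_class, num_data)  # equivalent [class, row] view
assert flat_value == grid[j, i]
```

A custom objective returning grad and hess must lay them out in this same flat order.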
612 | 613 | > booster_ 614 | 615 | Get the underlying lightgbm Booster of this model. 616 | 617 | > evals_result_ 618 | 619 | Get the evaluation results. 620 | 621 | > feature_importances_ 622 | 623 | Get feature importances. 624 | 625 | Note 626 | 627 | Feature importances in the sklearn interface used to be normalized to sum to 1; this is deprecated since 2.0.4, and they are now the same as Booster.feature_importance(). 628 | 629 | > fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric=None, early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None) 630 | 631 | Build a gradient boosting model from the training set (X, y). 632 | 633 | * Parameters: 634 | * **X** (_array-like_ _or_ _sparse matrix of shape =_ _[__n_samples__,_ _n_features__]_) – Input feature matrix. 635 | * **y** (_array-like of shape =_ _[__n_samples__]_) – The target values (class labels in classification, real numbers in regression). 636 | * **sample_weight** (_array-like of shape =_ _[__n_samples__] or_ _None__,_ _optional_ _(__default=None__)_) – Weights of training data. 637 | * **init_score** (_array-like of shape =_ _[__n_samples__] or_ _None__,_ _optional_ _(__default=None__)_) – Init score of training data. 638 | * **group** (_array-like of shape =_ _[__n_samples__] or_ _None__,_ _optional_ _(__default=None__)_) – Group data of training data. 639 | * **eval_set** (_list_ _or_ _None__,_ _optional_ _(__default=None__)_) – A list of (X, y) tuple pairs to use as validation sets for early-stopping. 640 | * **eval_names** (_list of strings_ _or_ _None__,_ _optional_ _(__default=None__)_) – Names of eval_set. 641 | * **eval_sample_weight** (_list of arrays_ _or_ _None__,_ _optional_ _(__default=None__)_) – Weights of eval data. 642 | * **eval_init_score** (_list of arrays_ _or_ _None__,_ _optional_ _(__default=None__)_) – Init score of eval data.
643 | * **eval_group** (_list of arrays_ _or_ _None__,_ _optional_ _(__default=None__)_) – Group data of eval data. 644 | * **eval_metric** (_string__,_ _list of strings__,_ _callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – If string, it should be a built-in evaluation metric to use. If callable, it should be a custom evaluation metric, see note for more details. 645 | * **early_stopping_rounds** (_int_ _or_ _None__,_ _optional_ _(__default=None__)_) – Activates early stopping. The model will train until the validation score stops improving. Validation error needs to decrease at least every `early_stopping_rounds` round(s) to continue training. 646 | * **verbose** (_bool__,_ _optional_ _(__default=True__)_) – If True and an evaluation set is used, writes the evaluation progress. 647 | * **feature_name** (_list of strings_ _or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Feature names. If ‘auto’ and data is pandas DataFrame, data column names are used. 648 | * **categorical_feature** (_list of strings_ _or_ _int__, or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If ‘auto’ and data is pandas DataFrame, pandas categorical columns are used. 649 | * **callbacks** (_list of callback functions_ _or_ _None__,_ _optional_ _(__default=None__)_) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information. 650 | * Returns: 651 | * **self** – Returns self. 652 | * Return type: 653 | * object 654 | Note 655 | A custom eval function expects a callable with one of the following signatures: `func(y_true, y_pred)`, `func(y_true, y_pred, weight)` or `func(y_true, y_pred, weight, group)`.
It returns (eval_name, eval_result, is_bigger_better) or a list of (eval_name, eval_result, is_bigger_better). 656 | 657 | > ``` 658 | > y_true: array-like of shape = [n_samples] 659 | > ``` 660 | > 661 | > The target values. 662 | > 663 | > ``` 664 | > y_pred: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class) 665 | > ``` 666 | > 667 | > The predicted values. 668 | > 669 | > ``` 670 | > weight: array-like of shape = [n_samples] 671 | > ``` 672 | > 673 | > The weight of samples. 674 | > 675 | > ``` 676 | > group: array-like 677 | > ``` 678 | > 679 | > Group/query data, used for ranking task. 680 | > 681 | > ``` 682 | > eval_name: str 683 | > ``` 684 | > 685 | > The name of evaluation. 686 | > 687 | > ``` 688 | > eval_result: float 689 | > ``` 690 | > 691 | > The eval result. 692 | > 693 | > ``` 694 | > is_bigger_better: bool 695 | > ``` 696 | > 697 | > Whether the eval result is better when bigger, e.g. AUC is bigger_better. 698 | 699 | For multi-class task, the y_pred is grouped by class_id first, then by row_id. To get the y_pred of the i-th row in the j-th class, access y_pred[j * num_data + i]. 700 | 701 | > n_features_ 702 | 703 | Get the number of features of fitted model. 704 | 705 | > objective_ 706 | 707 | Get the concrete objective used while fitting this model. 708 | 709 | > predict(X, raw_score=False, num_iteration=0) 710 | 711 | Return the predicted value for each sample. 712 | 713 | * Parameters: 714 | * **X** (_array-like_ _or_ _sparse matrix of shape =_ _[__n_samples__,_ _n_features__]_) – Input features matrix. 715 | * **raw_score** (_bool__,_ _optional_ _(__default=False__)_) – Whether to predict raw scores. 716 | * **num_iteration** (_int__,_ _optional_ _(__default=0__)_) – Limit number of iterations in the prediction; defaults to 0 (use all trees). 717 | * Returns: 718 | * **predicted_result** – The predicted values.
719 | * Return type: 720 | * array-like of shape = [n_samples] or shape = [n_samples, n_classes] 721 | 722 | > class lightgbm.LGBMClassifier(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=200000, objective=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=1, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, **kwargs) 723 | 724 | Bases: `lightgbm.sklearn.LGBMModel`, `object` 725 | 726 | LightGBM classifier. 727 | 728 | Construct a gradient boosting model. 729 | 730 | * Parameters: 731 | * **boosting_type** (_string__,_ _optional_ _(__default="gbdt"__)_) – ‘gbdt’, traditional Gradient Boosting Decision Tree. ‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest. 732 | * **num_leaves** (_int__,_ _optional_ _(__default=31__)_) – Maximum tree leaves for base learners. 733 | * **max_depth** (_int__,_ _optional_ _(__default=-1__)_) – Maximum tree depth for base learners, -1 means no limit. 734 | * **learning_rate** (_float__,_ _optional_ _(__default=0.1__)_) – Boosting learning rate. 735 | * **n_estimators** (_int__,_ _optional_ _(__default=10__)_) – Number of boosted trees to fit. 736 | * **max_bin** (_int__,_ _optional_ _(__default=255__)_) – Number of bucketed bins for feature values. 737 | * **subsample_for_bin** (_int__,_ _optional_ _(__default=200000__)_) – Number of samples for constructing bins. 738 | * **objective** (_string__,_ _callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). default: ‘regression’ for LGBMRegressor, ‘binary’ or ‘multiclass’ for LGBMClassifier, ‘lambdarank’ for LGBMRanker.
739 | * **min_split_gain** (_float__,_ _optional_ _(__default=0.__)_) – Minimum loss reduction required to make a further partition on a leaf node of the tree. 740 | * **min_child_weight** (_float__,_ _optional_ _(__default=1e-3__)_) – Minimum sum of instance weight (hessian) needed in a child (leaf). 741 | * **min_child_samples** (_int__,_ _optional_ _(__default=20__)_) – Minimum number of data needed in a child (leaf). 742 | * **subsample** (_float__,_ _optional_ _(__default=1.__)_) – Subsample ratio of the training instance. 743 | * **subsample_freq** (_int__,_ _optional_ _(__default=1__)_) – Frequency of subsample; <=0 means subsampling is disabled. 744 | * **colsample_bytree** (_float__,_ _optional_ _(__default=1.__)_) – Subsample ratio of columns when constructing each tree. 745 | * **reg_alpha** (_float__,_ _optional_ _(__default=0.__)_) – L1 regularization term on weights. 746 | * **reg_lambda** (_float__,_ _optional_ _(__default=0.__)_) – L2 regularization term on weights. 747 | * **random_state** (_int_ _or_ _None__,_ _optional_ _(__default=None__)_) – Random number seed. If None, default seeds in the C++ code are used. 748 | * **n_jobs** (_int__,_ _optional_ _(__default=-1__)_) – Number of parallel threads. 749 | * **silent** (_bool__,_ _optional_ _(__default=True__)_) – Whether to print messages while running boosting. 750 | * ****kwargs** (_other parameters_) – 751 | Check [http://lightgbm.readthedocs.io/en/latest/Parameters.html](http://lightgbm.readthedocs.io/en/latest/Parameters.html) for more parameters. 752 | Note 753 | **kwargs is not supported in sklearn, so it may cause unexpected issues. 754 | 755 | > n_features_ 756 | 757 | _int_ – The number of features of fitted model. 758 | 759 | > classes_ 760 | 761 | _array of shape = [n_classes]_ – The class label array (only for classification problem). 762 | 763 | > n_classes_ 764 | 765 | _int_ – The number of classes (only for classification problem).
766 | 767 | > best_score_ 768 | 769 | _dict or None_ – The best score of fitted model. 770 | 771 | > best_iteration_ 772 | 773 | _int or None_ – The best iteration of fitted model if `early_stopping_rounds` has been specified. 774 | 775 | > objective_ 776 | 777 | _string or callable_ – The concrete objective used while fitting this model. 778 | 779 | > booster_ 780 | 781 | _Booster_ – The underlying Booster of this model. 782 | 783 | > evals_result_ 784 | 785 | _dict or None_ – The evaluation results if `early_stopping_rounds` has been specified. 786 | 787 | > feature_importances_ 788 | 789 | _array of shape = [n_features]_ – The feature importances (the higher, the more important the feature). 790 | 791 | Note 792 | 793 | A custom objective function can be provided for the `objective` parameter. In this case, it should have the signature `objective(y_true, y_pred) -> grad, hess` or `objective(y_true, y_pred, group) -> grad, hess`: 794 | 795 | > ``` 796 | > y_true: array-like of shape = [n_samples] 797 | > ``` 798 | > 799 | > The target values. 800 | > 801 | > ``` 802 | > y_pred: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) 803 | > ``` 804 | > 805 | > The predicted values. 806 | > 807 | > ``` 808 | > group: array-like 809 | > ``` 810 | > 811 | > Group/query data, used for ranking task. 812 | > 813 | > ``` 814 | > grad: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) 815 | > ``` 816 | > 817 | > The value of the gradient for each sample point. 818 | > 819 | > ``` 820 | > hess: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) 821 | > ``` 822 | > 823 | > The value of the second derivative for each sample point. 824 | 825 | For multi-class task, the y_pred is grouped by class_id first, then by row_id.
To get the y_pred of the i-th row in the j-th class, access y_pred[j * num_data + i]; grad and hess should be grouped in the same way. 826 | 827 | > classes_ 828 | 829 | Get the class label array. 830 | 831 | > fit(X, y, sample_weight=None, init_score=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_init_score=None, eval_metric='logloss', early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None) 832 | 833 | Build a gradient boosting model from the training set (X, y). 834 | 835 | * Parameters: 836 | * **X** (_array-like_ _or_ _sparse matrix of shape =_ _[__n_samples__,_ _n_features__]_) – Input feature matrix. 837 | * **y** (_array-like of shape =_ _[__n_samples__]_) – The target values (class labels in classification, real numbers in regression). 838 | * **sample_weight** (_array-like of shape =_ _[__n_samples__] or_ _None__,_ _optional_ _(__default=None__)_) – Weights of training data. 839 | * **init_score** (_array-like of shape =_ _[__n_samples__] or_ _None__,_ _optional_ _(__default=None__)_) – Init score of training data. 840 | * **group** (_array-like of shape =_ _[__n_samples__] or_ _None__,_ _optional_ _(__default=None__)_) – Group data of training data. 841 | * **eval_set** (_list_ _or_ _None__,_ _optional_ _(__default=None__)_) – A list of (X, y) tuple pairs to use as validation sets for early-stopping. 842 | * **eval_names** (_list of strings_ _or_ _None__,_ _optional_ _(__default=None__)_) – Names of eval_set. 843 | * **eval_sample_weight** (_list of arrays_ _or_ _None__,_ _optional_ _(__default=None__)_) – Weights of eval data. 844 | * **eval_init_score** (_list of arrays_ _or_ _None__,_ _optional_ _(__default=None__)_) – Init score of eval data. 845 | * **eval_group** (_list of arrays_ _or_ _None__,_ _optional_ _(__default=None__)_) – Group data of eval data.
846 | * **eval_metric** (_string__,_ _list of strings__,_ _callable_ _or_ _None__,_ _optional_ _(__default="logloss"__)_) – If string, it should be a built-in evaluation metric to use. If callable, it should be a custom evaluation metric, see note for more details. 847 | * **early_stopping_rounds** (_int_ _or_ _None__,_ _optional_ _(__default=None__)_) – Activates early stopping. The model will train until the validation score stops improving. Validation error needs to decrease at least every `early_stopping_rounds` round(s) to continue training. 848 | * **verbose** (_bool__,_ _optional_ _(__default=True__)_) – If True and an evaluation set is used, writes the evaluation progress. 849 | * **feature_name** (_list of strings_ _or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Feature names. If ‘auto’ and data is pandas DataFrame, data column names are used. 850 | * **categorical_feature** (_list of strings_ _or_ _int__, or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If ‘auto’ and data is pandas DataFrame, pandas categorical columns are used. 851 | * **callbacks** (_list of callback functions_ _or_ _None__,_ _optional_ _(__default=None__)_) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information. 852 | * Returns: 853 | * **self** – Returns self. 854 | * Return type: 855 | * object 856 | 857 | Note 858 | 859 | A custom eval function expects a callable with one of the following signatures: `func(y_true, y_pred)`, `func(y_true, y_pred, weight)` or `func(y_true, y_pred, weight, group)`. It returns (eval_name, eval_result, is_bigger_better) or a list of (eval_name, eval_result, is_bigger_better). 860 | 861 | > ``` 862 | > y_true: array-like of shape = [n_samples] 863 | > ``` 864 | > 865 | > The target values.
866 | > 867 | > ``` 868 | > y_pred: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class) 869 | > ``` 870 | > 871 | > The predicted values. 872 | > 873 | > ``` 874 | > weight: array-like of shape = [n_samples] 875 | > ``` 876 | > 877 | > The weight of samples. 878 | > 879 | > ``` 880 | > group: array-like 881 | > ``` 882 | > 883 | > Group/query data, used for ranking task. 884 | > 885 | > ``` 886 | > eval_name: str 887 | > ``` 888 | > 889 | > The name of evaluation. 890 | > 891 | > ``` 892 | > eval_result: float 893 | > ``` 894 | > 895 | > The eval result. 896 | > 897 | > ``` 898 | > is_bigger_better: bool 899 | > ``` 900 | > 901 | > Whether the eval result is better when bigger, e.g. AUC is bigger_better. 902 | 903 | For multi-class task, the y_pred is grouped by class_id first, then by row_id. To get the y_pred of the i-th row in the j-th class, access y_pred[j * num_data + i]. 904 | 905 | > n_classes_ 906 | 907 | Get the number of classes. 908 | 909 | > predict_proba(X, raw_score=False, num_iteration=0) 910 | 911 | Return the predicted probability for each class for each sample. 912 | 913 | * Parameters: 914 | * **X** (_array-like_ _or_ _sparse matrix of shape =_ _[__n_samples__,_ _n_features__]_) – Input features matrix. 915 | * **raw_score** (_bool__,_ _optional_ _(__default=False__)_) – Whether to predict raw scores. 916 | * **num_iteration** (_int__,_ _optional_ _(__default=0__)_) – Limit number of iterations in the prediction; defaults to 0 (use all trees). 917 | * Returns: 918 | * **predicted_probability** – The predicted probability for each class for each sample.
919 | * Return type: 920 | * array-like of shape = [n_samples, n_classes] 921 | 922 | > class lightgbm.LGBMRegressor(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=200000, objective=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=1, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, **kwargs) 923 | 924 | Bases: `lightgbm.sklearn.LGBMModel`, `object` 925 | 926 | LightGBM regressor. 927 | 928 | Construct a gradient boosting model. 929 | 930 | * Parameters: 931 | * **boosting_type** (_string__,_ _optional_ _(__default="gbdt"__)_) – ‘gbdt’, traditional Gradient Boosting Decision Tree. ‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest. 932 | * **num_leaves** (_int__,_ _optional_ _(__default=31__)_) – Maximum tree leaves for base learners. 933 | * **max_depth** (_int__,_ _optional_ _(__default=-1__)_) – Maximum tree depth for base learners, -1 means no limit. 934 | * **learning_rate** (_float__,_ _optional_ _(__default=0.1__)_) – Boosting learning rate. 935 | * **n_estimators** (_int__,_ _optional_ _(__default=10__)_) – Number of boosted trees to fit. 936 | * **max_bin** (_int__,_ _optional_ _(__default=255__)_) – Number of bucketed bins for feature values. 937 | * **subsample_for_bin** (_int__,_ _optional_ _(__default=200000__)_) – Number of samples for constructing bins. 938 | * **objective** (_string__,_ _callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). default: ‘regression’ for LGBMRegressor, ‘binary’ or ‘multiclass’ for LGBMClassifier, ‘lambdarank’ for LGBMRanker.
939 | * **min_split_gain** (_float__,_ _optional_ _(__default=0.__)_) – Minimum loss reduction required to make a further partition on a leaf node of the tree. 940 | * **min_child_weight** (_float__,_ _optional_ _(__default=1e-3__)_) – Minimum sum of instance weight (hessian) needed in a child (leaf). 941 | * **min_child_samples** (_int__,_ _optional_ _(__default=20__)_) – Minimum number of data needed in a child (leaf). 942 | * **subsample** (_float__,_ _optional_ _(__default=1.__)_) – Subsample ratio of the training instances. 943 | * **subsample_freq** (_int__,_ _optional_ _(__default=1__)_) – Frequency of subsample; <=0 means disabled. 944 | * **colsample_bytree** (_float__,_ _optional_ _(__default=1.__)_) – Subsample ratio of columns when constructing each tree. 945 | * **reg_alpha** (_float__,_ _optional_ _(__default=0.__)_) – L1 regularization term on weights. 946 | * **reg_lambda** (_float__,_ _optional_ _(__default=0.__)_) – L2 regularization term on weights. 947 | * **random_state** (_int_ _or_ _None__,_ _optional_ _(__default=None__)_) – Random number seed. Will use default seeds in c++ code if set to None. 948 | * **n_jobs** (_int__,_ _optional_ _(__default=-1__)_) – Number of parallel threads. 949 | * **silent** (_bool__,_ _optional_ _(__default=True__)_) – Whether to print messages while running boosting. 950 | * ****kwargs** (_other parameters_) – 951 | Check [http://lightgbm.readthedocs.io/en/latest/Parameters.html](http://lightgbm.readthedocs.io/en/latest/Parameters.html) for more parameters. 952 | Note 953 | **kwargs is not supported in sklearn, it may cause unexpected issues. 954 | 955 | > n_features_ 956 | 957 | _int_ – The number of features of fitted model. 958 | 959 | > classes_ 960 | 961 | _array of shape = [n_classes]_ – The class label array (only for classification problem). 962 | 963 | > n_classes_ 964 | 965 | _int_ – The number of classes (only for classification problem).
966 | 967 | > best_score_ 968 | 969 | _dict or None_ – The best score of fitted model. 970 | 971 | > best_iteration_ 972 | 973 | _int or None_ – The best iteration of fitted model if `early_stopping_rounds` has been specified. 974 | 975 | > objective_ 976 | 977 | _string or callable_ – The concrete objective used while fitting this model. 978 | 979 | > booster_ 980 | 981 | _Booster_ – The underlying Booster of this model. 982 | 983 | > evals_result_ 984 | 985 | _dict or None_ – The evaluation results if `early_stopping_rounds` has been specified. 986 | 987 | > feature_importances_ 988 | 989 | _array of shape = [n_features]_ – The feature importances (the higher, the more important the feature). 990 | 991 | Note 992 | 993 | A custom objective function can be provided for the `objective` parameter. In this case, it should have the signature `objective(y_true, y_pred) -> grad, hess` or `objective(y_true, y_pred, group) -> grad, hess`: 994 | 995 | > ``` 996 | > y_true: array-like of shape = [n_samples] 997 | > ``` 998 | > 999 | > The target values. 1000 | > 1001 | > ``` 1002 | > y_pred: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) 1003 | > ``` 1004 | > 1005 | > The predicted values. 1006 | > 1007 | > ``` 1008 | > group: array-like 1009 | > ``` 1010 | > 1011 | > Group/query data, used for ranking task. 1012 | > 1013 | > ``` 1014 | > grad: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) 1015 | > ``` 1016 | > 1017 | > The value of the gradient for each sample point. 1018 | > 1019 | > ``` 1020 | > hess: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) 1021 | > ``` 1022 | > 1023 | > The value of the second derivative for each sample point. 1024 | 1025 | For multi-class task, y_pred is grouped by class_id first, then by row_id.
To get the y_pred of the i-th row for the j-th class, access y_pred[j * num_data + i], and group grad and hess in the same way. 1026 | 1027 | > fit(X, y, sample_weight=None, init_score=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_init_score=None, eval_metric='l2', early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None) 1028 | 1029 | Build a gradient boosting model from the training set (X, y). 1030 | 1031 | * Parameters: 1032 | * **X** (_array-like_ _or_ _sparse matrix of shape =_ _[__n_samples__,_ _n_features__]_) – Input feature matrix. 1033 | * **y** (_array-like of shape =_ _[__n_samples__]_) – The target values (class labels in classification, real numbers in regression). 1034 | * **sample_weight** (_array-like of shape =_ _[__n_samples__] or_ _None__,_ _optional_ _(__default=None__)_) – Weights of training data. 1035 | * **init_score** (_array-like of shape =_ _[__n_samples__] or_ _None__,_ _optional_ _(__default=None__)_) – Init score of training data. 1036 | * **group** (_array-like of shape =_ _[__n_samples__] or_ _None__,_ _optional_ _(__default=None__)_) – Group data of training data. 1037 | * **eval_set** (_list_ _or_ _None__,_ _optional_ _(__default=None__)_) – A list of (X, y) tuple pairs to use as validation sets for early-stopping. 1038 | * **eval_names** (_list of strings_ _or_ _None__,_ _optional_ _(__default=None__)_) – Names of eval_set. 1039 | * **eval_sample_weight** (_list of arrays_ _or_ _None__,_ _optional_ _(__default=None__)_) – Weights of eval data. 1040 | * **eval_init_score** (_list of arrays_ _or_ _None__,_ _optional_ _(__default=None__)_) – Init score of eval data. 1041 | * **eval_group** (_list of arrays_ _or_ _None__,_ _optional_ _(__default=None__)_) – Group data of eval data.
1042 | * **eval_metric** (_string__,_ _list of strings__,_ _callable_ _or_ _None__,_ _optional_ _(__default="l2"__)_) – If string, it should be a built-in evaluation metric to use. If callable, it should be a custom evaluation metric, see note for more details. 1043 | * **early_stopping_rounds** (_int_ _or_ _None__,_ _optional_ _(__default=None__)_) – Activates early stopping. The model will train until the validation score stops improving. Validation error needs to decrease at least every `early_stopping_rounds` round(s) to continue training. 1044 | * **verbose** (_bool__,_ _optional_ _(__default=True__)_) – If True and an evaluation set is used, writes the evaluation progress. 1045 | * **feature_name** (_list of strings_ _or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Feature names. If ‘auto’ and data is pandas DataFrame, data column names are used. 1046 | * **categorical_feature** (_list of strings_ _or_ _int__, or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify `feature_name` as well). If ‘auto’ and data is pandas DataFrame, pandas categorical columns are used. 1047 | * **callbacks** (_list of callback functions_ _or_ _None__,_ _optional_ _(__default=None__)_) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information. 1048 | * Returns: 1049 | * **self** – Returns self. 1050 | * Return type: 1051 | * object 1052 | 1053 | Note 1054 | 1055 | A custom eval function is a callable with one of the following signatures: `func(y_true, y_pred)`, `func(y_true, y_pred, weight)` or `func(y_true, y_pred, weight, group)`; it returns (eval_name, eval_result, is_bigger_better) or a list of such tuples 1056 | 1057 | > ``` 1058 | > y_true: array-like of shape = [n_samples] 1059 | > ``` 1060 | > 1061 | > The target values.
1062 | > 1063 | > ``` 1064 | > y_pred: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class) 1065 | > ``` 1066 | > 1067 | > The predicted values. 1068 | > 1069 | > ``` 1070 | > weight: array-like of shape = [n_samples] 1071 | > ``` 1072 | > 1073 | > The weight of samples. 1074 | > 1075 | > ``` 1076 | > group: array-like 1077 | > ``` 1078 | > 1079 | > Group/query data, used for ranking task. 1080 | > 1081 | > ``` 1082 | > eval_name: str 1083 | > ``` 1084 | > 1085 | > The name of evaluation. 1086 | > 1087 | > ``` 1088 | > eval_result: float 1089 | > ``` 1090 | > 1091 | > The eval result. 1092 | > 1093 | > ``` 1094 | > is_bigger_better: bool 1095 | > ``` 1096 | > 1097 | > Whether the eval result is better when bigger, e.g. AUC is bigger_better. 1098 | 1099 | For multi-class task, y_pred is grouped by class_id first, then by row_id. To get the y_pred of the i-th row for the j-th class, access y_pred[j * num_data + i]. 1100 | 1101 | > class lightgbm.LGBMRanker(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=10, max_bin=255, subsample_for_bin=200000, objective=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=1, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, **kwargs) 1102 | 1103 | Bases: `lightgbm.sklearn.LGBMModel` 1104 | 1105 | LightGBM ranker. 1106 | 1107 | Construct a gradient boosting model. 1108 | 1109 | * Parameters: 1110 | * **boosting_type** (_string__,_ _optional_ _(__default="gbdt"__)_) – ‘gbdt’, traditional Gradient Boosting Decision Tree. ‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest. 1111 | * **num_leaves** (_int__,_ _optional_ _(__default=31__)_) – Maximum tree leaves for base learners. 1112 | * **max_depth** (_int__,_ _optional_ _(__default=-1__)_) – Maximum tree depth for base learners, -1 means no limit.
1113 | * **learning_rate** (_float__,_ _optional_ _(__default=0.1__)_) – Boosting learning rate. 1114 | * **n_estimators** (_int__,_ _optional_ _(__default=10__)_) – Number of boosted trees to fit. 1115 | * **max_bin** (_int__,_ _optional_ _(__default=255__)_) – Number of bucketed bins for feature values. 1116 | * **subsample_for_bin** (_int__,_ _optional_ _(__default=200000__)_) – Number of samples for constructing bins. 1117 | * **objective** (_string__,_ _callable_ _or_ _None__,_ _optional_ _(__default=None__)_) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). default: ‘regression’ for LGBMRegressor, ‘binary’ or ‘multiclass’ for LGBMClassifier, ‘lambdarank’ for LGBMRanker. 1118 | * **min_split_gain** (_float__,_ _optional_ _(__default=0.__)_) – Minimum loss reduction required to make a further partition on a leaf node of the tree. 1119 | * **min_child_weight** (_float__,_ _optional_ _(__default=1e-3__)_) – Minimum sum of instance weight (hessian) needed in a child (leaf). 1120 | * **min_child_samples** (_int__,_ _optional_ _(__default=20__)_) – Minimum number of data needed in a child (leaf). 1121 | * **subsample** (_float__,_ _optional_ _(__default=1.__)_) – Subsample ratio of the training instances. 1122 | * **subsample_freq** (_int__,_ _optional_ _(__default=1__)_) – Frequency of subsample; <=0 means disabled. 1123 | * **colsample_bytree** (_float__,_ _optional_ _(__default=1.__)_) – Subsample ratio of columns when constructing each tree. 1124 | * **reg_alpha** (_float__,_ _optional_ _(__default=0.__)_) – L1 regularization term on weights. 1125 | * **reg_lambda** (_float__,_ _optional_ _(__default=0.__)_) – L2 regularization term on weights. 1126 | * **random_state** (_int_ _or_ _None__,_ _optional_ _(__default=None__)_) – Random number seed. Will use default seeds in c++ code if set to None. 1127 | * **n_jobs** (_int__,_ _optional_ _(__default=-1__)_) – Number of parallel threads.
1128 | * **silent** (_bool__,_ _optional_ _(__default=True__)_) – Whether to print messages while running boosting. 1129 | * ****kwargs** (_other parameters_) – 1130 | Check [http://lightgbm.readthedocs.io/en/latest/Parameters.html](http://lightgbm.readthedocs.io/en/latest/Parameters.html) for more parameters. 1131 | Note 1132 | **kwargs is not supported in sklearn, it may cause unexpected issues. 1133 | 1134 | > n_features_ 1135 | 1136 | _int_ – The number of features of fitted model. 1137 | 1138 | > classes_ 1139 | 1140 | _array of shape = [n_classes]_ – The class label array (only for classification problem). 1141 | 1142 | > n_classes_ 1143 | 1144 | _int_ – The number of classes (only for classification problem). 1145 | 1146 | > best_score_ 1147 | 1148 | _dict or None_ – The best score of fitted model. 1149 | 1150 | > best_iteration_ 1151 | 1152 | _int or None_ – The best iteration of fitted model if `early_stopping_rounds` has been specified. 1153 | 1154 | > objective_ 1155 | 1156 | _string or callable_ – The concrete objective used while fitting this model. 1157 | 1158 | > booster_ 1159 | 1160 | _Booster_ – The underlying Booster of this model. 1161 | 1162 | > evals_result_ 1163 | 1164 | _dict or None_ – The evaluation results if `early_stopping_rounds` has been specified. 1165 | 1166 | > feature_importances_ 1167 | 1168 | _array of shape = [n_features]_ – The feature importances (the higher, the more important the feature). 1169 | 1170 | Note 1171 | 1172 | A custom objective function can be provided for the `objective` parameter. In this case, it should have the signature `objective(y_true, y_pred) -> grad, hess` or `objective(y_true, y_pred, group) -> grad, hess`: 1173 | 1174 | > ``` 1175 | > y_true: array-like of shape = [n_samples] 1176 | > ``` 1177 | > 1178 | > The target values. 
1179 | > 1180 | > ``` 1181 | > y_pred: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) 1182 | > ``` 1183 | > 1184 | > The predicted values. 1185 | > 1186 | > ``` 1187 | > group: array-like 1188 | > ``` 1189 | > 1190 | > Group/query data, used for ranking task. 1191 | > 1192 | > ``` 1193 | > grad: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) 1194 | > ``` 1195 | > 1196 | > The value of the gradient for each sample point. 1197 | > 1198 | > ``` 1199 | > hess: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task) 1200 | > ``` 1201 | > 1202 | > The value of the second derivative for each sample point. 1203 | 1204 | For multi-class task, y_pred is grouped by class_id first, then by row_id. To get the y_pred of the i-th row for the j-th class, access y_pred[j * num_data + i], and group grad and hess in the same way. 1205 | 1206 | > fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric='ndcg', eval_at=[1], early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None) 1207 | 1208 | Build a gradient boosting model from the training set (X, y). 1209 | 1210 | * Parameters: 1211 | * **X** (_array-like_ _or_ _sparse matrix of shape =_ _[__n_samples__,_ _n_features__]_) – Input feature matrix. 1212 | * **y** (_array-like of shape =_ _[__n_samples__]_) – The target values (class labels in classification, real numbers in regression). 1213 | * **sample_weight** (_array-like of shape =_ _[__n_samples__] or_ _None__,_ _optional_ _(__default=None__)_) – Weights of training data. 1214 | * **init_score** (_array-like of shape =_ _[__n_samples__] or_ _None__,_ _optional_ _(__default=None__)_) – Init score of training data.
1215 | * **group** (_array-like of shape =_ _[__n_samples__] or_ _None__,_ _optional_ _(__default=None__)_) – Group data of training data. 1216 | * **eval_set** (_list_ _or_ _None__,_ _optional_ _(__default=None__)_) – A list of (X, y) tuple pairs to use as validation sets for early-stopping. 1217 | * **eval_names** (_list of strings_ _or_ _None__,_ _optional_ _(__default=None__)_) – Names of eval_set. 1218 | * **eval_sample_weight** (_list of arrays_ _or_ _None__,_ _optional_ _(__default=None__)_) – Weights of eval data. 1219 | * **eval_init_score** (_list of arrays_ _or_ _None__,_ _optional_ _(__default=None__)_) – Init score of eval data. 1220 | * **eval_group** (_list of arrays_ _or_ _None__,_ _optional_ _(__default=None__)_) – Group data of eval data. 1221 | * **eval_metric** (_string__,_ _list of strings__,_ _callable_ _or_ _None__,_ _optional_ _(__default="ndcg"__)_) – If string, it should be a built-in evaluation metric to use. If callable, it should be a custom evaluation metric, see note for more details. 1222 | * **eval_at** (_list of int__,_ _optional_ _(__default=__[__1__]__)_) – The evaluation positions of NDCG. 1223 | * **early_stopping_rounds** (_int_ _or_ _None__,_ _optional_ _(__default=None__)_) – Activates early stopping. The model will train until the validation score stops improving. Validation error needs to decrease at least every `early_stopping_rounds` round(s) to continue training. 1224 | * **verbose** (_bool__,_ _optional_ _(__default=True__)_) – If True and an evaluation set is used, writes the evaluation progress. 1225 | * **feature_name** (_list of strings_ _or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Feature names. If ‘auto’ and data is pandas DataFrame, data column names are used. 1226 | * **categorical_feature** (_list of strings_ _or_ _int__, or_ _'auto'__,_ _optional_ _(__default="auto"__)_) – Categorical features. If list of int, interpreted as indices.
If list of strings, interpreted as feature names (need to specify `feature_name` as well). If ‘auto’ and data is pandas DataFrame, pandas categorical columns are used. 1227 | * **callbacks** (_list of callback functions_ _or_ _None__,_ _optional_ _(__default=None__)_) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information. 1228 | * Returns: 1229 | * **self** – Returns self. 1230 | * Return type: 1231 | * object 1232 | 1233 | Note 1234 | 1235 | A custom eval function is a callable with one of the following signatures: `func(y_true, y_pred)`, `func(y_true, y_pred, weight)` or `func(y_true, y_pred, weight, group)`; it returns (eval_name, eval_result, is_bigger_better) or a list of such tuples 1236 | 1237 | > ``` 1238 | > y_true: array-like of shape = [n_samples] 1239 | > ``` 1240 | > 1241 | > The target values. 1242 | > 1243 | > ``` 1244 | > y_pred: array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class) 1245 | > ``` 1246 | > 1247 | > The predicted values. 1248 | > 1249 | > ``` 1250 | > weight: array-like of shape = [n_samples] 1251 | > ``` 1252 | > 1253 | > The weight of samples. 1254 | > 1255 | > ``` 1256 | > group: array-like 1257 | > ``` 1258 | > 1259 | > Group/query data, used for ranking task. 1260 | > 1261 | > ``` 1262 | > eval_name: str 1263 | > ``` 1264 | > 1265 | > The name of evaluation. 1266 | > 1267 | > ``` 1268 | > eval_result: float 1269 | > ``` 1270 | > 1271 | > The eval result. 1272 | > 1273 | > ``` 1274 | > is_bigger_better: bool 1275 | > ``` 1276 | > 1277 | > Whether the eval result is better when bigger, e.g. AUC is bigger_better. 1278 | 1279 | For multi-class task, y_pred is grouped by class_id first, then by row_id. To get the y_pred of the i-th row for the j-th class, access y_pred[j * num_data + i].
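The custom eval function contract above can be sketched concretely. The following is a minimal, hypothetical example (the `rmsle` metric and the commented-out `fit` call are illustrative, not part of the API): it matches the `func(y_true, y_pred)` signature and returns the `(eval_name, eval_result, is_bigger_better)` tuple, with a small helper showing the flattened multi-class `y_pred` layout described above.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Hypothetical custom eval metric: root mean squared log error.

    Matches the func(y_true, y_pred) signature and returns the
    (eval_name, eval_result, is_bigger_better) tuple described above.
    """
    y_pred = np.maximum(y_pred, 0)  # guard against negative predictions
    err = float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))
    return 'rmsle', err, False  # lower RMSLE is better

def per_class_matrix(y_pred, num_data):
    """Recover a [num_data, n_classes] matrix from the flattened multi-class
    y_pred, where row i / class j lives at y_pred[j * num_data + i]."""
    return y_pred.reshape(-1, num_data).T

# Hypothetical usage (requires lightgbm):
# model = lightgbm.LGBMRegressor(n_estimators=50)
# model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
#           eval_metric=rmsle, early_stopping_rounds=5)
```

Passing the callable via `eval_metric` makes its result available to early stopping and the evaluation callbacks, just like a built-in metric.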
1280 | 1281 | ## Callbacks 1282 | 1283 | > lightgbm.early_stopping(stopping_rounds, verbose=True) 1284 | 1285 | Create a callback that activates early stopping. 1286 | 1287 | Note 1288 | 1289 | Activates early stopping. Requires at least one validation set and one metric. If there is more than one, all of them will be checked. 1290 | 1291 | * Parameters: 1292 | * **stopping_rounds** (_int_) – The number of rounds without improvement after which training will be stopped. 1293 | * **verbose** (_bool__,_ _optional_ _(__default=True__)_) – Whether to print message with early stopping information. 1294 | * Returns: 1295 | * **callback** – The callback that activates early stopping. 1296 | * Return type: 1297 | * function 1298 | 1299 | > lightgbm.print_evaluation(period=1, show_stdv=True) 1300 | 1301 | Create a callback that prints the evaluation results. 1302 | 1303 | * Parameters: 1304 | * **period** (_int__,_ _optional_ _(__default=1__)_) – The period to print the evaluation results. 1305 | * **show_stdv** (_bool__,_ _optional_ _(__default=True__)_) – Whether to show stdv (if provided). 1306 | * Returns: 1307 | * **callback** – The callback that prints the evaluation results every `period` iteration(s). 1308 | * Return type: 1309 | * function 1310 | 1311 | > lightgbm.record_evaluation(eval_result) 1312 | 1313 | Create a callback that records the evaluation history into `eval_result`. 1314 | 1315 | * Parameters: 1316 | * **eval_result** (_dict_) – A dictionary to store the evaluation results. 1317 | * Returns: 1318 | * **callback** – The callback that records the evaluation history into the passed dictionary. 1319 | * Return type: 1320 | * function 1321 | 1322 | > lightgbm.reset_parameter(**kwargs) 1323 | 1324 | Create a callback that resets the parameter after the first iteration. 1325 | 1326 | Note 1327 | 1328 | The initial parameter will still take effect on the first iteration.
1329 | 1330 | * Parameters: 1331 | * **kwargs** (_value should be list_ _or_ _function_) – List of parameters for each boosting round or a customized function that calculates the parameter for the current round (e.g. yielding learning rate decay). If list lst, parameter = lst[current_round]. If function func, parameter = func(current_round). 1332 | * Returns: 1333 | * **callback** – The callback that resets the parameter after the first iteration. 1334 | * Return type: 1335 | * function 1336 | 1337 | ## Plotting 1338 | 1339 | > lightgbm.plot_importance(booster, ax=None, height=0.2, xlim=None, ylim=None, title='Feature importance', xlabel='Feature importance', ylabel='Features', importance_type='split', max_num_features=None, ignore_zero=True, figsize=None, grid=True, **kwargs) 1340 | 1341 | Plot model’s feature importances. 1342 | 1343 | * Parameters: 1344 | * **booster** ([_Booster_](#lightgbm.Booster "lightgbm.Booster") _or_ [_LGBMModel_](#lightgbm.LGBMModel "lightgbm.LGBMModel")) – Booster or LGBMModel instance which feature importance should be plotted. 1345 | * **ax** (_matplotlib.axes.Axes_ _or_ _None__,_ _optional_ _(__default=None__)_) – Target axes instance. If None, new figure and axes will be created. 1346 | * **height** (_float__,_ _optional_ _(__default=0.2__)_) – Bar height, passed to `ax.barh()`. 1347 | * **xlim** (_tuple of 2 elements_ _or_ _None__,_ _optional_ _(__default=None__)_) – Tuple passed to `ax.xlim()`. 1348 | * **ylim** (_tuple of 2 elements_ _or_ _None__,_ _optional_ _(__default=None__)_) – Tuple passed to `ax.ylim()`. 1349 | * **title** (_string_ _or_ _None__,_ _optional_ _(__default="Feature importance"__)_) – Axes title. If None, title is disabled. 1350 | * **xlabel** (_string_ _or_ _None__,_ _optional_ _(__default="Feature importance"__)_) – X-axis title label. If None, title is disabled. 1351 | * **ylabel** (_string_ _or_ _None__,_ _optional_ _(__default="Features"__)_) – Y-axis title label. If None, title is disabled.
1352 | * **importance_type** (_string__,_ _optional_ _(__default="split"__)_) – How the importance is calculated. If “split”, result contains numbers of times the feature is used in a model. If “gain”, result contains total gains of splits which use the feature. 1353 | * **max_num_features** (_int_ _or_ _None__,_ _optional_ _(__default=None__)_) – Max number of top features displayed on plot. If None or <1, all features will be displayed. 1354 | * **ignore_zero** (_bool__,_ _optional_ _(__default=True__)_) – Whether to ignore features with zero importance. 1355 | * **figsize** (_tuple of 2 elements_ _or_ _None__,_ _optional_ _(__default=None__)_) – Figure size. 1356 | * **grid** (_bool__,_ _optional_ _(__default=True__)_) – Whether to add a grid for axes. 1357 | * ****kwargs** (_other parameters_) – Other parameters passed to `ax.barh()`. 1358 | * Returns: 1359 | * **ax** – The plot with model’s feature importances. 1360 | * Return type: 1361 | * matplotlib.axes.Axes 1362 | 1363 | > lightgbm.plot_metric(booster, metric=None, dataset_names=None, ax=None, xlim=None, ylim=None, title='Metric during training', xlabel='Iterations', ylabel='auto', figsize=None, grid=True) 1364 | 1365 | Plot one metric during training. 1366 | 1367 | * Parameters: 1368 | * **booster** (_dict_ _or_ [_LGBMModel_](#lightgbm.LGBMModel "lightgbm.LGBMModel")) – Dictionary returned from `lightgbm.train()` or LGBMModel instance. 1369 | * **metric** (_string_ _or_ _None__,_ _optional_ _(__default=None__)_) – The metric name to plot. Only one metric supported because different metrics have various scales. If None, first metric picked from dictionary (according to hashcode). 1370 | * **dataset_names** (_list of strings_ _or_ _None__,_ _optional_ _(__default=None__)_) – List of the dataset names which are used to calculate metric to plot. If None, all datasets are used. 1371 | * **ax** (_matplotlib.axes.Axes_ _or_ _None__,_ _optional_ _(__default=None__)_) – Target axes instance. 
If None, new figure and axes will be created. 1372 | * **xlim** (_tuple of 2 elements_ _or_ _None__,_ _optional_ _(__default=None__)_) – Tuple passed to `ax.xlim()`. 1373 | * **ylim** (_tuple of 2 elements_ _or_ _None__,_ _optional_ _(__default=None__)_) – Tuple passed to `ax.ylim()`. 1374 | * **title** (_string_ _or_ _None__,_ _optional_ _(__default="Metric during training"__)_) – Axes title. If None, title is disabled. 1375 | * **xlabel** (_string_ _or_ _None__,_ _optional_ _(__default="Iterations"__)_) – X-axis title label. If None, title is disabled. 1376 | * **ylabel** (_string_ _or_ _None__,_ _optional_ _(__default="auto"__)_) – Y-axis title label. If ‘auto’, metric name is used. If None, title is disabled. 1377 | * **figsize** (_tuple of 2 elements_ _or_ _None__,_ _optional_ _(__default=None__)_) – Figure size. 1378 | * **grid** (_bool__,_ _optional_ _(__default=True__)_) – Whether to add a grid for axes. 1379 | * Returns: 1380 | * **ax** – The plot with metric’s history over the training. 1381 | * Return type: 1382 | * matplotlib.axes.Axes 1383 | 1384 | > lightgbm.plot_tree(booster, ax=None, tree_index=0, figsize=None, graph_attr=None, node_attr=None, edge_attr=None, show_info=None) 1385 | 1386 | Plot specified tree. 1387 | 1388 | * Parameters: 1389 | * **booster** ([_Booster_](#lightgbm.Booster "lightgbm.Booster") _or_ [_LGBMModel_](#lightgbm.LGBMModel "lightgbm.LGBMModel")) – Booster or LGBMModel instance to be plotted. 1390 | * **ax** (_matplotlib.axes.Axes_ _or_ _None__,_ _optional_ _(__default=None__)_) – Target axes instance. If None, new figure and axes will be created. 1391 | * **tree_index** (_int__,_ _optional_ _(__default=0__)_) – The index of a target tree to plot. 1392 | * **figsize** (_tuple of 2 elements_ _or_ _None__,_ _optional_ _(__default=None__)_) – Figure size. 1393 | * **graph_attr** (_dict_ _or_ _None__,_ _optional_ _(__default=None__)_) – Mapping of (attribute, value) pairs set for the graph. 
1394 | * **node_attr** (_dict_ _or_ _None__,_ _optional_ _(__default=None__)_) – Mapping of (attribute, value) pairs set for all nodes. 1395 | * **edge_attr** (_dict_ _or_ _None__,_ _optional_ _(__default=None__)_) – Mapping of (attribute, value) pairs set for all edges. 1396 | * **show_info** (_list_ _or_ _None__,_ _optional_ _(__default=None__)_) – What information should be shown on nodes. Possible values of list items: ‘split_gain’, ‘internal_value’, ‘internal_count’, ‘leaf_count’. 1397 | * Returns: 1398 | * **ax** – The plot with a single tree. 1399 | * Return type: 1400 | * matplotlib.axes.Axes 1401 | 1402 | > lightgbm.create_tree_digraph(booster, tree_index=0, show_info=None, name=None, comment=None, filename=None, directory=None, format=None, engine=None, encoding=None, graph_attr=None, node_attr=None, edge_attr=None, body=None, strict=False) 1403 | 1404 | Create a digraph representation of specified tree. 1405 | 1406 | Note 1407 | 1408 | For more information please visit [http://graphviz.readthedocs.io/en/stable/api.html#digraph](http://graphviz.readthedocs.io/en/stable/api.html#digraph). 1409 | 1410 | * Parameters: 1411 | * **booster** ([_Booster_](#lightgbm.Booster "lightgbm.Booster") _or_ [_LGBMModel_](#lightgbm.LGBMModel "lightgbm.LGBMModel")) – Booster or LGBMModel instance. 1412 | * **tree_index** (_int__,_ _optional_ _(__default=0__)_) – The index of a target tree to convert. 1413 | * **show_info** (_list_ _or_ _None__,_ _optional_ _(__default=None__)_) – What information should be shown on nodes. Possible values of list items: ‘split_gain’, ‘internal_value’, ‘internal_count’, ‘leaf_count’. 1414 | * **name** (_string_ _or_ _None__,_ _optional_ _(__default=None__)_) – Graph name used in the source code. 1415 | * **comment** (_string_ _or_ _None__,_ _optional_ _(__default=None__)_) – Comment added to the first line of the source. 1416 | * **filename** (_string_ _or_ _None__,_ _optional_ _(__default=None__)_) – Filename for saving the source.
If None, `name` + ‘.gv’ is used. 1417 | * **directory** (_string_ _or_ _None__,_ _optional_ _(__default=None__)_) – (Sub)directory for source saving and rendering. 1418 | * **format** (_string_ _or_ _None__,_ _optional_ _(__default=None__)_) – Rendering output format (‘pdf’, ‘png’, …). 1419 | * **engine** (_string_ _or_ _None__,_ _optional_ _(__default=None__)_) – Layout command used (‘dot’, ‘neato’, …). 1420 | * **encoding** (_string_ _or_ _None__,_ _optional_ _(__default=None__)_) – Encoding for saving the source. 1421 | * **graph_attr** (_dict_ _or_ _None__,_ _optional_ _(__default=None__)_) – Mapping of (attribute, value) pairs set for the graph. 1422 | * **node_attr** (_dict_ _or_ _None__,_ _optional_ _(__default=None__)_) – Mapping of (attribute, value) pairs set for all nodes. 1423 | * **edge_attr** (_dict_ _or_ _None__,_ _optional_ _(__default=None__)_) – Mapping of (attribute, value) pairs set for all edges. 1424 | * **body** (_list of strings_ _or_ _None__,_ _optional_ _(__default=None__)_) – Lines to add to the graph body. 1425 | * **strict** (_bool__,_ _optional_ _(__default=False__)_) – Whether rendering should merge multi-edges. 1426 | * Returns: 1427 | * **graph** – The digraph representation of specified tree. 1428 | * Return type: 1429 | * graphviz.Digraph --------------------------------------------------------------------------------