├── .gitignore
├── LICENSE
├── README.md
├── 二分类算法_进一步特征工程,代码优化20191223.ipynb
└── 进一步特征add_cv.ipynb

/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2019 PandasCute

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Provide-Banks-with-precision-marketing-solutions-Provide-Banks-with-precision-marketing-solutions
"Binary Classification Algorithms": Precision Marketing Solutions for Banks | Practice competition - 0.93984283
## Organizer: [科赛网 (Kesci)](https://www.kesci.com/)

**Competition link**: https://www.kesci.com/home/competition/5c234c6626ba91002bfdfdd3
**Competition period**: *2018.12.26-2019.12.25*
**Participant**: [小兔子乖乖](https://github.com/PandasCute)
**Baidu Netdisk download link**: To guard against data loss, the dataset can also be downloaded from https://pan.baidu.com/s/1Nf76lIjWWTyn18HmQEpflg (extraction code: p352)

**Dataset files**: train_set.csv is the training set and test_set.csv is the test set.

## 1. Data description

| No. | Field | Type | Description |
|:-------:|:-------:|:-------:|:-------:|
| 1 | ID | Int | Unique customer identifier |
| 2 | age | Int | Customer's age |
| 3 | job | String | Customer's occupation |
| 4 | marital | String | Marital status |
| 5 | education | String | Education level |
| 6 | default | String | Whether the customer has a default record |
| 7 | balance | Int | Average yearly account balance |
| 8 | housing | String | Whether the customer has a housing loan |
| 9 | loan | String | Whether the customer has a personal loan |
| 10 | contact | String | Communication channel used to contact the customer |
| 11 | day | Int | Date of the last contact (day of the month) |
| 12 | month | String | Date of the last contact (month) |
| 13 | duration | Int | Duration of the last contact |
| 14 | campaign | Int | Number of contacts with this customer during the current campaign |
| 15 | pdays | Int | Time elapsed since this customer was last contacted in the previous campaign (999 means never contacted) |
| 16 | previous | Int | Number of contacts with this customer before the current campaign |
| 17 | poutcome | String | Outcome of the previous campaign |
| 18 | y | Int | Prediction target: whether the customer will subscribe to a term deposit |

## 2. Environment and dependencies
- python3
- scikit-learn
- jupyter notebook
- numpy, pandas
- lightgbm, scipy, tqdm (imported by the notebook)

## 3. How to run

- Run the notebooks directly, changing the data paths to match your own setup.
## 4. Feature engineering
The feature engineering mainly draws on [biubiu](https://www.kesci.com/home/project/5c36b5b8e691ba002c3a51f8) and CountVectorizer().


## 5. Model training
Final result
5 - 小兔子乖乖 0.93984283 3 19-02-12 14:08 19-02-14 09:53
If anything here infringes your rights, please contact [小兔子乖乖](https://github.com/PandasCute).

--------------------------------------------------------------------------------
/二分类算法_进一步特征工程,代码优化20191223.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the CSV files\n",
    "import pandas as pd\n",
    "path = './'  # directory that holds the data files; adjust to your setup\n",
    "from tqdm import tqdm\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "from sklearn.preprocessing import OneHotEncoder,LabelEncoder\n",
    "from scipy import sparse\n",
    "import lightgbm as lgb\n",
    "train = pd.read_csv(path+'train_set.csv')\n",
    "test = pd.read_csv(path+'test_set.csv')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "test['y']=-1\n",
    "cate_features = ['job', 'marital','education','default','housing','loan','contact','month','poutcome']\n",
    "num_features = ['age', 'balance','day','duration','campaign','pdays','previous']\n",
    "feature1 = cate_features + num_features\n",
    "data = pd.concat([train,test])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def feature_count(data, features):\n",
    "    feature_name = 'count'\n",
    "    for i in features:\n",
    "        feature_name += '_' + i\n",
    "    temp = data.groupby(features).size().reset_index().rename(columns={0: feature_name})\n",
    "    data = data.merge(temp, 'left', on=features)\n",
    "    return data,feature_name\n",
    "ll=[]\n",
    "for f in['campaign', 'contact','default','education','housing','job','loan','marital','poutcome','pdays','previous']:\n",
    "    data,_=feature_count(data,['day','month',f])\n",
    "    ll.append(_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def feat_count(df, df_feature,fe,value,name=\"\"):\n",
    "    df_count = pd.DataFrame(df_feature.groupby(fe)[value].count()).reset_index()\n",
    "    if not name:\n",
    "        df_count.columns = fe + [value+\"_%s_count\" % (\"_\".join(fe[0]))]\n",
    "    else:\n",
    "        df_count.columns = fe + [name]\n",
    "    df = df.merge(df_count, on=fe, how=\"left\")#.fillna(0)\n",
    "    # Return the new count column's name: it is the last column, whatever len(fe) is\n",
    "    return df,df_count.columns[-1]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      " 0%| | 0/10 [00:00