└── README.md

/README.md:
--------------------------------------------------------------------------------
# MT-LIFT
*MT-LIFT* is a large-scale, unbiased dataset collected from two months of coupon marketing for food delivery in the Meituan App, [Meituan (美团)](https://www.meituan.com). **It is the first unbiased industrial dataset featuring multiple treatments and comprehensive chain label (click and conversion) information!**

## Overview
To eliminate the impact of confounding factors on uplift modeling, we collected the data from randomized controlled trials in which treatments (coupons) were randomly assigned, so that the treatment and control groups have consistent underlying distributions. To protect data privacy, the features have been both anonymized and desensitized.

You can download the dataset from [Google Drive](https://drive.google.com/file/d/1dslFa9EGrVVoO_040ZYM16cIH-SKDuss/view?usp=drive_link) | [Baidu Drive](https://pan.baidu.com/s/1YmE5g-Y71ULNptiWqpToPA?pwd=06nb).

MT-LIFT comprises extensive feature and label information, making it a valuable resource for several research areas. In particular, it supports the following:

- Click-through rate (CTR) prediction.
- Conversion rate (CVR) prediction.
- Joint modeling.
- Uplift modeling.

Compared with other datasets, MT-LIFT has the following advantages:
- ✅ It was collected under unbiased (randomized) treatment assignment, so the treatment and control groups have consistent underlying distributions.
- ✅ It has abundant features, providing ample opportunity to extract valuable information.
- ✅ It has multiple treatments, for exploring the effects of different interventions.
- ✅ It was collected from the impression space and includes comprehensive chain information (click and conversion) for accurate analysis of user responses.

If you find it helpful, please cite our paper:
[![LINK](https://img.shields.io/badge/-Paper%20Link-lightgrey)](https://arxiv.org/abs/2402.03379)

```
@inproceedings{huang2024entire,
  title={Entire Chain Uplift Modeling with Context-Enhanced Learning for Intelligent Marketing},
  author={Huang, Yinqiu and Wang, Shuli and Gao, Min and Wei, Xue and Li, Changhao and Luo, Chuan and Zhu, Yinhua and Xiao, Xiong and Luo, Yi},
  booktitle={Companion Proceedings of the ACM on Web Conference 2024},
  pages={226--234},
  year={2024}
}
```
----


## Data Descriptions

The file structure of the dataset is as follows:

```shell
MT-LIFT
├── train.csv
└── test.csv
```

Data fields:
There are 102 fields in total, of which 99 are features (f0 to f98).

```
| Field Name     | Description                                                  |
| -------------- | ------------------------------------------------------------ |
| click          | The click label.                                             |
| conversion     | The conversion label.                                        |
| treatment      | The treatment label, in the range [0, 4].                    |
| f0-f98         | The features.                                                |
```

The statistics of MT-LIFT:
```
| Statistic                           | Value     |
| ----------------------------------- | --------- |
| Size                                | 5,541,842 |
| Features                            | 99        |
| Average Visit/Click Ratio           | 33.49%    |
| Average Conversion Ratio            | 6.82%     |
| Relative Average Visit/Click Uplift | 56.81%    |
| Average Visit/Click Uplift          | 13.49%    |
| Relative Average Conversion Uplift  | 169.23%   |
| Average Conversion Uplift           | 5.19%     |
```


--------------------------------------------------------------------------------
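
Because treatments were randomly assigned, a simple difference in mean outcomes between a treatment arm and the control arm gives a first estimate of average uplift. The following is a minimal sketch using only the Python standard library, not the authors' evaluation code. It assumes that the CSV files have a header row with the column names listed above and that `treatment == 0` denotes the control (no-coupon) group; neither is stated explicitly in this README. The helper names `uplift_by_treatment` and `load_rows` are hypothetical.

```python
import csv
from collections import defaultdict


def uplift_by_treatment(rows, control=0):
    """Difference-in-means uplift of each treatment arm versus the control arm.

    `rows` is any iterable of dict-like records with the columns
    `treatment`, `click`, and `conversion` (string values are fine, as
    returned by csv.DictReader).
    Assumption: arm `control` (0 by default) is the no-coupon group.
    """
    counts = defaultdict(int)
    clicks = defaultdict(float)
    conversions = defaultdict(float)

    # Accumulate per-arm totals in a single streaming pass.
    for row in rows:
        arm = int(float(row["treatment"]))
        counts[arm] += 1
        clicks[arm] += float(row["click"])
        conversions[arm] += float(row["conversion"])

    if counts.get(control, 0) == 0:
        raise ValueError(f"no rows found for control arm {control}")

    base_ctr = clicks[control] / counts[control]
    base_cvr = conversions[control] / counts[control]

    result = {}
    for arm in sorted(counts):
        if arm == control:
            continue
        result[arm] = {
            "click_uplift": clicks[arm] / counts[arm] - base_ctr,
            "conversion_uplift": conversions[arm] / counts[arm] - base_cvr,
        }
    return result


def load_rows(path):
    """Stream records from a CSV file such as train.csv (header row assumed)."""
    with open(path, newline="") as handle:
        yield from csv.DictReader(handle)
```

For example, `uplift_by_treatment(load_rows("MT-LIFT/train.csv"))` would return one entry per non-control treatment arm, where the path is an assumption about where the archive was extracted. The rows are streamed rather than loaded at once, which keeps memory use low for a file of this size.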