├── imgs
    ├── demo.gif
    ├── logo.png
    ├── header.png
    ├── intro.png
    ├── logo2.png
    ├── framework.png
    ├── sft_algo.png
    ├── overall_results.png
    ├── execution_accuracy.png
    ├── test_score_high_res.png
    ├── critic_score_mean_pairs.jpg
    ├── critic_score_mean_pairs.pdf
    ├── response_length_mean_pairs.png
    ├── execution_accuracy_by_difficulty.png
    ├── model_performance_comparison_17B.png
    └── model_performance_comparison_4B.png
└── README.md


/imgs/demo.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/demo.gif
--------------------------------------------------------------------------------
/imgs/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/logo.png
--------------------------------------------------------------------------------
/imgs/header.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/header.png
--------------------------------------------------------------------------------
/imgs/intro.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/intro.png
--------------------------------------------------------------------------------
/imgs/logo2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/logo2.png
--------------------------------------------------------------------------------
/imgs/framework.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/framework.png
--------------------------------------------------------------------------------
/imgs/sft_algo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/sft_algo.png
--------------------------------------------------------------------------------
/imgs/overall_results.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/overall_results.png
--------------------------------------------------------------------------------
/imgs/execution_accuracy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/execution_accuracy.png
--------------------------------------------------------------------------------
/imgs/test_score_high_res.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/test_score_high_res.png
--------------------------------------------------------------------------------
/imgs/critic_score_mean_pairs.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/critic_score_mean_pairs.jpg
--------------------------------------------------------------------------------
/imgs/critic_score_mean_pairs.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/critic_score_mean_pairs.pdf
--------------------------------------------------------------------------------
/imgs/response_length_mean_pairs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/response_length_mean_pairs.png
--------------------------------------------------------------------------------
/imgs/execution_accuracy_by_difficulty.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/execution_accuracy_by_difficulty.png
--------------------------------------------------------------------------------
/imgs/model_performance_comparison_17B.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/model_performance_comparison_17B.png
--------------------------------------------------------------------------------
/imgs/model_performance_comparison_4B.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taichengguo/MTSQL-R1/HEAD/imgs/model_performance_comparison_4B.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
📄 arXiv | 🤗 Hugging Face
| Area | Feature | Description |
| --- | --- | --- |
| Text-to-SQL | 🔄 Long-Horizon Formulation with Environment Feedback | Leverages environment feedback from database execution and explicit memory verification to guide SQL generation and error correction |
| LLM Training | 🎓 **Two-Stage** Training Framework | 1) **Tool-Integrated High-Quality SFT Dataset** construction via self-taught sampling, followed by warm-start SFT; 2) **Curriculum RL Training** with **Multi-Level Rewards** (outcome reward and dense process reward) |
| LLM Training | 🔁 **Multi-Turn** End-to-End RL Training | Enables end-to-end training across multiple turns of interaction with the database and memory, improving coherence across turns |
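The long-horizon formulation in the first row can be pictured as a propose, execute, verify, revise loop. The sketch below is a minimal illustration under stated assumptions, not the MTSQL-R1 implementation: Python's built-in `sqlite3` plays the database, a scripted list of queries stands in for the LLM policy, and the verification step is a hypothetical placeholder that only rejects empty results (the project's explicit memory verification is richer than this). `run_episode` and its history strings are illustrative names.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical policy interface: given the interaction history so far,
# return the next SQL query to try (an LLM would fill this role).
Policy = Callable[[List[str]], str]


def run_episode(
    conn: sqlite3.Connection, policy: Policy, max_turns: int = 4
) -> Tuple[Optional[str], List[str]]:
    """Propose SQL, execute it, feed the outcome back, and retry until it passes a check."""
    history: List[str] = []
    for _ in range(max_turns):
        sql = policy(history)
        history.append(f"SQL: {sql}")
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Environment feedback: the database error becomes part of the context.
            history.append(f"EXEC_ERROR: {exc}")
            continue
        if not rows:
            # Placeholder verification (assumption): treat an empty result as suspicious.
            history.append("VERIFY_FAIL: empty result")
            continue
        history.append(f"RESULT: {rows}")
        return sql, history
    return None, history  # turn budget exhausted without a verified query


# Toy database and a scripted "policy" that replays fixed attempts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 31), ("Bo", 25)])
attempts = iter([
    "SELECT nme FROM singer",                  # typo: fails to execute
    "SELECT name FROM singer WHERE age > 40",  # runs, but returns no rows
    "SELECT name FROM singer WHERE age > 30",  # runs and returns a row
])
final_sql, trace = run_episode(conn, lambda history: next(attempts))
print(final_sql)  # SELECT name FROM singer WHERE age > 30
```

Each failed attempt leaves an error or verification message in `history`, which is exactly the context a real policy would condition on in the next turn.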
## Stage 1: Self-Taught Warm-Start SFT

- Step 1: Random sampling at high temperature to generate natural reasoning trajectories
- Step 2: Difficulty-aware rejection sampling
- Step 3: SFT on tool-integrated multi-turn trajectories with loss masking
- Step 4: Update the dataset and model, then repeat
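Steps 1 to 4 form an iterative self-training loop. The sketch below shows one plausible reading of it, with its assumptions stated up front: a trajectory counts as correct if its final SQL matches the gold execution result, a question's difficulty is estimated from its pass rate over the sampled trajectories, and harder questions keep more correct trajectories than easy ones. The caps, the 0.5 threshold, the accumulate-across-rounds dataset update, and the `sample_fn`/`train_fn` hooks are all hypothetical; the actual MTSQL-R1 rejection criteria may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass
class Trajectory:
    question_id: str
    text: str      # full multi-turn reasoning and tool-call trace
    correct: bool  # assumption: final SQL's execution result matches the gold result


def difficulty_aware_filter(
    samples: Dict[str, List[Trajectory]],
    max_keep_easy: int = 1,
    max_keep_hard: int = 4,
    hard_threshold: float = 0.5,
) -> List[Trajectory]:
    """Step 2: reject incorrect trajectories; keep more correct ones for hard questions."""
    kept: List[Trajectory] = []
    for trajs in samples.values():
        correct = [t for t in trajs if t.correct]
        if not correct:
            continue  # nothing to learn from this question in this round
        pass_rate = len(correct) / len(trajs)
        cap = max_keep_hard if pass_rate < hard_threshold else max_keep_easy
        kept.extend(correct[:cap])
    return kept


def self_taught_loop(
    model: Any,
    questions: Sequence[str],
    sample_fn: Callable[[Any, str, int, float], List[Trajectory]],  # hypothetical hook
    train_fn: Callable[[Any, List[Trajectory]], Any],                # hypothetical hook
    rounds: int = 3,
    n_samples: int = 8,
    temperature: float = 1.0,
) -> Tuple[Any, List[Trajectory]]:
    dataset: List[Trajectory] = []
    for _ in range(rounds):
        # Step 1: high-temperature sampling of several trajectories per question.
        samples = {q: sample_fn(model, q, n_samples, temperature) for q in questions}
        # Steps 2 and 4: filter, then grow the dataset (assumed to accumulate across rounds).
        dataset.extend(difficulty_aware_filter(samples))
        # Step 3: fine-tune on the current dataset (loss masking would live inside train_fn).
        model = train_fn(model, dataset)
    return model, dataset


samples = {
    "easy": [Trajectory("easy", f"t{i}", True) for i in range(4)],   # pass rate 1.0
    "hard": [Trajectory("hard", f"t{i}", i < 2) for i in range(8)],  # pass rate 0.25
    "unsolved": [Trajectory("unsolved", "t0", False)],
}
print([(t.question_id, t.text) for t in difficulty_aware_filter(samples)])
# [('easy', 't0'), ('hard', 't0'), ('hard', 't1')]
```

Keeping several distinct successful trajectories for rarely solved questions is one way to stop easy questions from dominating the SFT data; other weighting schemes would fit the same loop.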
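Step 3 of Stage 1 (and Step 3 of Stage 2) rely on loss masking for tool-integrated multi-turn trajectories. A common way to do this, assumed here rather than taken from the MTSQL-R1 code, is to compute the loss only on tokens the model generated and to mask out prompt tokens and tokens injected by the environment (database results, verification feedback). The sketch uses plain Python lists; the role names and token ids are made up for illustration.

```python
import math
from typing import List, Tuple

# A trajectory as (role, token_ids) segments. Assumed roles:
#   "user"      - the question and dialogue context
#   "assistant" - tokens generated by the model (reasoning, SQL, tool calls)
#   "tool"      - observations injected by the environment (execution results, feedback)
Segment = Tuple[str, List[int]]


def build_loss_mask(segments: List[Segment]) -> Tuple[List[int], List[int]]:
    """Flatten segments into one token sequence plus a 0/1 mask over it."""
    tokens: List[int] = []
    mask: List[int] = []
    for role, ids in segments:
        tokens.extend(ids)
        # Only model-generated tokens are trained on.
        mask.extend([1 if role == "assistant" else 0] * len(ids))
    return tokens, mask


def masked_mean_nll(token_logprobs: List[float], mask: List[int]) -> float:
    """Average negative log-likelihood over unmasked tokens only."""
    total = sum(-lp * m for lp, m in zip(token_logprobs, mask))
    count = sum(mask)
    return total / count if count else 0.0


segments = [("user", [1, 2]), ("assistant", [3, 4, 5]), ("tool", [6, 7]), ("assistant", [8])]
tokens, mask = build_loss_mask(segments)
print(mask)  # [0, 0, 1, 1, 1, 0, 0, 1]
logprobs = [math.log(0.5)] * len(tokens)
print(round(masked_mean_nll(logprobs, mask), 4))  # 0.6931
```

In an RL setting the same mask would typically multiply the per-token policy-gradient loss instead of the NLL, so the policy is never credited or blamed for text the database produced.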
## Stage 2: End-to-End Long-Horizon Reinforcement Learning

- Step 1: Curriculum data partitioning by difficulty
- Step 2: Outcome and process reward design
- Step 3: Multi-turn RL with loss masking
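Step 1 of Stage 2 partitions the training data by difficulty so that RL moves from easier to harder examples. The README does not say how difficulty is measured; the sketch below assumes a per-example pass rate of the warm-started SFT model (the fraction of sampled rollouts that are correct), with hypothetical cut-offs of 0.7 and 0.3. Benchmark hardness levels would be an equally plausible signal. `partition_by_difficulty` is an illustrative name.

```python
from typing import Dict, List, Tuple


def partition_by_difficulty(
    pass_rates: Dict[str, float], cutoffs: Tuple[float, float] = (0.7, 0.3)
) -> List[List[str]]:
    """Split example ids into [easy, medium, hard] buckets by SFT-model pass rate."""
    easy_cut, hard_cut = cutoffs
    easy: List[str] = []
    medium: List[str] = []
    hard: List[str] = []
    for ex_id, rate in pass_rates.items():
        if rate >= easy_cut:
            easy.append(ex_id)
        elif rate > hard_cut:
            medium.append(ex_id)
        else:
            hard.append(ex_id)
    # Curriculum order: RL stages consume the buckets from first to last.
    return [easy, medium, hard]


rates = {"q1": 0.9, "q2": 0.5, "q3": 0.1, "q4": 0.3}
for stage, bucket in zip(["easy", "medium", "hard"], partition_by_difficulty(rates)):
    print(stage, bucket)
# easy ['q1']
# medium ['q2']
# hard ['q3', 'q4']
```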
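Step 2 of Stage 2 combines an outcome reward with a dense process reward. The sketch below is a simple rule-based stand-in, not the reward used by MTSQL-R1: the outcome term compares the execution results of the final query and a gold query as multisets of rows, and a hypothetical per-turn process term gives +0.1 to each query that executes and -0.1 to each that fails. Returning one reward per turn is what makes the signal dense; the weights and the `turn_rewards` name are illustrative.

```python
import sqlite3
from collections import Counter
from typing import List, Optional


def try_execute(conn: sqlite3.Connection, sql: str) -> Optional[list]:
    """Return the query's rows, or None if it fails to execute."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def turn_rewards(
    conn: sqlite3.Connection, turn_sqls: List[str], gold_sql: str, step_bonus: float = 0.1
) -> List[float]:
    """One reward per turn: a process term for every turn plus an outcome term on the last."""
    rewards: List[float] = []
    for sql in turn_sqls:
        # Hypothetical process reward: small bonus for executable SQL, penalty otherwise.
        rewards.append(step_bonus if try_execute(conn, sql) is not None else -step_bonus)
    final_rows = try_execute(conn, turn_sqls[-1])
    gold_rows = try_execute(conn, gold_sql)
    # Outcome reward: execution match, ignoring row order (multiset comparison).
    if final_rows is not None and gold_rows is not None and Counter(final_rows) == Counter(gold_rows):
        rewards[-1] += 1.0
    return rewards


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 31), ("Bo", 25)])
print(turn_rewards(
    conn,
    ["SELECT nme FROM singer", "SELECT name FROM singer WHERE age > 30"],
    gold_sql="SELECT name FROM singer WHERE age >= 30",
))  # [-0.1, 1.1]
```

For simplicity the sketch re-executes each query; in a real rollout the execution results would already be available from the environment turns.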
## Performance Across Difficulty Levels and Turns