├── .gitignore
├── README.md
├── data
│   └── rawcode_1k.jsonl
├── environment.yaml
├── figures
│   ├── ioexample.png
│   └── overview.png
├── processed_data
│   ├── codeio_1k_gens.jsonl
│   ├── codeio_1k_gens_rev.jsonl
│   ├── codeio_1k_gens_rev_verified.jsonl
│   ├── codeio_1k_gens_verified.jsonl
│   ├── codeio_1k_msg.jsonl
│   ├── codeio_1k_msg_rev.jsonl
│   ├── codeio_demo_final.jsonl
│   ├── rawcode_1k.jsonl
│   ├── rawcode_1k_msg.jsonl
│   ├── rawcode_1k_parsed.jsonl
│   └── rawcode_1k_unified.jsonl
├── requirements.txt
├── scripts
│   └── pipeline_check.sh
└── src
    ├── assemble_codeio_demo.py
    ├── batched_api_inference.py
    ├── build_codeio_msg.py
    ├── build_codeio_rev_msg.py
    ├── build_transform_msg.py
    ├── check_io_pred_acc_mp.py
    ├── check_io_pred_acc_mp_inplace.py
    ├── codeio_utils.py
    ├── parse_gen_ios.py
    └── utils.py
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__
2 | temp/*
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

📑 Paper    |    🌐 Project Page    |    🤗 Released Resources    |    💾 Dataset    |    📦 Repo

13 | ## Table of contents
14 |
- [Introduction](#introduction)
- [Released Resources](#released-resources)
  - [Dataset](#dataset)
  - [Models](#models)
- [Get Started](#get-started)
  - [Setup](#setup)
  - [Data Processing](#data-processing)
  - [Training](#training)
- [Citation](#citation)
- [Acknowledgement](#acknowledgement)
25 |
26 | ## Introduction
27 | CodeI/O is a novel approach that transforms code-based reasoning patterns into natural language formats to enhance Large Language Models' reasoning capabilities. Unlike traditional methods focusing on specific skills, our approach systematically extracts universal reasoning primitives while maintaining procedural rigor, enabling better performance across various reasoning tasks.
28 |
29 | **Key Features & Contributions**
30 | - 🔄 Universal Transformation: Converts diverse code patterns into natural language Chain-of-Thought rationales
31 | - 🧠 Syntax-Decoupled: Decouples reasoning from code syntax while preserving logical structure
32 | - 📊 Multi-Task Enhancement: Improves performance across symbolic, scientific, logic, mathematical, commonsense and code reasoning
33 | - ✨ Fully-Verifiable: Supports precise prediction verification through cached ground-truth matching or code re-execution
34 | - 🚀 Advanced Iteration: Enhanced version (CodeI/O++) with multi-turn revision for better accuracy
35 |
36 | ## Released Resources
37 |
38 | #### Dataset
39 |
40 | |Dataset|Link|
41 | |-|-|
42 | |CodeI/O-PythonEdu-Reasoning|[🤗](https://huggingface.co/datasets/hkust-nlp/CodeIO-Pyedu-Reasoning)|
43 | |CodeI/O-PythonEdu-Raw|[🤗](https://huggingface.co/datasets/hkust-nlp/CodeIO-PyEdu-Reasoning-Raw)|
44 | |LeetCode-O Benchmark|[🤗](https://huggingface.co/datasets/hkust-nlp/LeetCode-O)|
45 |
46 | Due to our collaborators' compliance requirements, we only release the PythonEdu-Reasoning subset of the CodeI/O(++) dataset.
47 |
48 |
49 |
50 | #### Models
51 |
52 |
| Base Model / Training | CodeI/O Stage 1 | CodeI/O Stage 2 | CodeI/O++ Stage 1 | CodeI/O++ Stage 2 |
|-|-|-|-|-|
| Qwen 2.5 7B Coder | 🤗 | 🤗 | 🤗 | 🤗 |
| LLaMA 3.1 8B | 🤗 | 🤗 | 🤗 | 🤗 |
| DeepSeek v2 Lite Coder | 🤗 | 🤗 | 🤗 | 🤗 |
87 |
88 | ## Get Started
89 |
90 | ### Setup
91 |
We provide both `requirements.txt` and `environment.yaml`; you can use either to set up the environment.
```
conda create -n codeio_exec python=3.11
conda activate codeio_exec
pip install -r requirements.txt
```
98 | or
99 | ```
100 | conda env create -f environment.yaml --name codeio_exec
101 | conda activate codeio_exec
102 | ```
Please note that our setup does not guarantee that every type of Python code can be executed; you may need to extend the environment to fit your needs when processing different code files.
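As a quick sanity check after setup, you can verify that a few packages the pipeline relies on are importable. This is our own sketch (the package list is an illustrative subset drawn from `requirements.txt`), not a script shipped with the repo:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Illustrative subset of packages the pipeline scripts use (see requirements.txt).
required = ["openai", "tqdm", "numpy"]
missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```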
104 |
105 | ### Data Processing
106 |
We provide a complete guide to building CodeI/O data on a toy dataset. After completing all the steps, you will obtain a dataset in the same format as our [huggingface dataset](https://huggingface.co/datasets/hkust-nlp/CodeIO-Pyedu-Reasoning).
108 |
109 | All intermediate results will be stored under `./data`, but we have also provided a set of pre-processed files under `./processed_data`.
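All of these intermediate files are in JSON Lines format (one JSON object per line). If you want to inspect or post-process them yourself, a minimal reader/writer sketch (our own illustration; the repo's actual helpers live in `src/utils.py`) looks like:

```python
import json

def load_jsonl(path):
    """Yield one parsed object per line, skipping empty or malformed lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue

def write_jsonl(items, path, mode="w"):
    """Write an iterable of dicts as JSON Lines; mode="a" appends."""
    with open(path, mode, encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")
```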
110 |
111 | #### Step 1: Convert raw code files into the unified format.
112 |
113 | ##### Step 1.1: Build Messages
114 | ```
115 | python ./src/build_transform_msg.py \
116 | --raw_code_file data/rawcode_1k.jsonl \
117 | --raw_code_msg_file data/rawcode_1k_msg.jsonl
118 | ```
119 | ##### Step 1.2: Inference
120 | ```
121 | python ./src/batched_api_inference.py \
122 | --input data/rawcode_1k_msg.jsonl \
123 | --output data/rawcode_1k_unified.jsonl \
124 | --model deepseek-chat \
125 | --num_process 10 \
126 | --num_thread 10 \
127 | --key \
128 | --temperature 0.7 \
129 | --max_tokens 4096
130 | ```
You can also use GPT-series models for this transformation step, since the DeepSeek API has recently been under heavy load; for example, set `--model` to `gpt-4o-mini-2024-07-18` and change `--key` accordingly.
Some requests may fail; that is fine, and we simply skip them.
133 |
*Note that we only provide code for inference with OpenAI-style APIs. However, it is entirely feasible to deploy open-source models and run inference locally via frameworks like [vllm](https://github.com/vllm-project/vllm) or [sglang](https://github.com/sgl-project/sglang); please refer to their official documentation for details.*

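The inference script retries each failed request a few times before giving up (see `max_try_one_call` in `src/batched_api_inference.py`). That pattern can be sketched generically as follows, with a stub standing in for the real API call:

```python
import time

def call_with_retries(fn, max_tries=3, delay=0.0):
    """Call fn(), retrying on any exception; return None if all attempts fail.

    Mirrors how the pipeline simply skips requests that keep failing.
    """
    for attempt in range(max_tries):
        try:
            return fn()
        except Exception:
            if attempt < max_tries - 1:
                time.sleep(delay)
    return None

# Stub standing in for a real chat-completion request.
attempts = {"n": 0}
def flaky_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient API error")
    return "response"

print(call_with_retries(flaky_request))  # → response
```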
135 | #### Step 2: Parse & Generate I/O Pairs
136 | ```
137 | python ./src/parse_gen_ios.py \
138 | --input_file data/rawcode_1k_unified.jsonl \
139 | --output_file data/rawcode_1k_parsed.jsonl \
140 | --python_path "python" \
141 | --run_path "./temp/temp/temp"
142 | ```
`--python_path` is the Python interpreter used to run the I/O pair generation code; it can differ from the one used in the main workflow (e.g., one with specific extra packages installed). `--run_path` is the directory where the generation code is executed; since it may write temporary files, we explicitly assign a place for it to save them.
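To illustrate why `--run_path` matters: executing the generated code with an explicit working directory keeps any temporary files it creates contained there. A minimal sketch of this idea (our own; the repo's actual execution logic lives in `src/parse_gen_ios.py`):

```python
import os
import subprocess
import sys

def run_snippet(code, run_path, python_path=sys.executable, timeout=30):
    """Run a Python snippet with run_path as its working directory."""
    os.makedirs(run_path, exist_ok=True)
    result = subprocess.run(
        [python_path, "-c", code],
        cwd=run_path,  # any temp files the snippet writes land here
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout

rc, out = run_snippet("print(2 + 2)", "./temp/demo_run")
print(rc, out.strip())  # → 0 4
```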
144 |
145 | #### Step 3: Build Input-Output Prediction Instances
We pick only 3 input-prediction and 3 output-prediction instances for each sample.
147 | ```
148 | python ./src/build_codeio_msg.py \
149 | --input_file data/rawcode_1k_parsed.jsonl \
150 | --output_file data/codeio_1k_msg.jsonl
151 | ```
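The selection described above can be sketched as follows; the field names here are illustrative, not the repo's actual schema:

```python
import random

def select_instances(io_pairs, k=3, seed=0):
    """Keep at most k pairs each for input prediction and output prediction."""
    rng = random.Random(seed)
    chosen = rng.sample(io_pairs, min(k, len(io_pairs)))
    input_pred = [{"task": "input_prediction", "given_output": p["output"]} for p in chosen]
    output_pred = [{"task": "output_prediction", "given_input": p["input"]} for p in chosen]
    return input_pred + output_pred

pairs = [{"input": {"x": i}, "output": {"y": i * i}} for i in range(10)]
instances = select_instances(pairs)
print(len(instances))  # → 6
```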
152 |
153 | #### Step 4: Inference on CodeI/O data
154 | ```
155 | python ./src/batched_api_inference.py \
156 | --input data/codeio_1k_msg.jsonl \
157 | --output data/codeio_1k_gens.jsonl \
158 | --model deepseek-chat \
159 | --num_process 10 \
160 | --num_thread 10 \
161 | --key \
162 | --temperature 0.7 \
163 | --max_tokens 4096
164 | ```
165 | #### Step 5: Verification
166 | ```
167 | bash ./scripts/pipeline_check.sh \
168 | data/rawcode_1k_parsed.jsonl \
169 | data/codeio_1k_gens.jsonl \
170 | data/codeio_1k_gens_verified.jsonl \
171 | python \
172 | ./temp/temp/temp
173 | ```
In the bash script we run the verification several times to minimize runtime effects caused by multi-process execution (e.g., spurious timeouts). This is helpful for large-scale verification. You can change the number of processes to match your machine (e.g., use more if you have many CPUs and ample memory).
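Conceptually, each prediction is verified against the cached ground truth (or by re-executing the code). The matching half can be sketched as a canonical-JSON comparison; the field names are illustrative:

```python
import json

def verify_prediction(pred, ground_truth):
    """Compare a predicted input/output against the cached ground truth.

    Canonical JSON dumps make the check insensitive to dict key order.
    """
    return json.dumps(pred, sort_keys=True) == json.dumps(ground_truth, sort_keys=True)

gt = {"output": {"sum": 10, "max": 4}}
print(verify_prediction({"output": {"max": 4, "sum": 10}}, gt))  # → True
print(verify_prediction({"output": {"sum": 11, "max": 4}}, gt))  # → False
```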
175 |
176 | #### Step 6: Second Turn - Revision and Re-verification
177 | ##### Step 6.1: Build Multi-turn Messages
178 | ```
179 | python ./src/build_codeio_rev_msg.py \
180 | --input_file data/codeio_1k_gens_verified.jsonl \
181 | --output_file data/codeio_1k_msg_rev.jsonl
182 | ```
183 | ##### Step 6.2: Re-generate
184 | ```
185 | python ./src/batched_api_inference.py \
186 | --input data/codeio_1k_msg_rev.jsonl \
187 | --output data/codeio_1k_gens_rev.jsonl \
188 | --model deepseek-chat \
189 | --num_process 10 \
190 | --num_thread 10 \
191 | --key \
192 | --temperature 0.7 \
193 | --max_tokens 4096
194 | ```
195 | ##### Step 6.3: Re-verification
196 | ```
197 | bash ./scripts/pipeline_check.sh \
198 | data/rawcode_1k_parsed.jsonl \
199 | data/codeio_1k_gens_rev.jsonl \
200 | data/codeio_1k_gens_rev_verified.jsonl \
201 | python \
202 | ./temp/temp/temp
203 | ```
204 | ##### Step 6.4: Final Data
```
python ./src/assemble_codeio_demo.py \
--result_file_turn1 data/codeio_1k_gens_verified.jsonl \
--result_file_turn2 data/codeio_1k_gens_rev_verified.jsonl \
--output_file data/codeio_demo_final.jsonl
```
By doing so, you get `data/codeio_demo_final.jsonl`, which has the same format as our [huggingface dataset](https://huggingface.co/datasets/hkust-nlp/CodeIO-Pyedu-Reasoning).
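Judging from `src/assemble_codeio_demo.py`, each record in the final file has the fields `prompt`, `turn_1`, `feedback_1`, `turn_2`, and `feedback_2`, where the last two are `None` for samples that succeeded in the first turn. A short sketch for splitting the data on that criterion:

```python
import json

def split_by_turns(path):
    """Separate first-turn successes from samples that needed a revision turn."""
    one_turn, two_turn = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            (one_turn if rec["turn_2"] is None else two_turn).append(rec)
    return one_turn, two_turn
```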
212 |
213 | ### Training
214 | You can use any popular training framework to train your model like [llama-factory](https://github.com/hiyouga/LLaMA-Factory).
215 |
216 | ## Citation
If you find this work helpful, please cite it as:
218 | ```
219 | @article{li2025codeio,
220 | title={CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction},
221 | author={Li, Junlong and Guo, Daya and Yang, Dejian and Xu, Runxin and Wu, Yu and He, Junxian},
222 | journal={arXiv preprint arXiv:2502.07316},
223 | year={2025}
224 | }
225 | ```
226 |
227 | ## Acknowledgement
228 | We thank Fan Zhou, Wei Liu and Yiheng Xu for their valuable feedback and suggestions! 🤗🤗🤗
229 |
--------------------------------------------------------------------------------
/environment.yaml:
--------------------------------------------------------------------------------
1 | channels:
2 | - defaults
3 | dependencies:
4 | - _libgcc_mutex=0.1=main
5 | - _openmp_mutex=5.1=1_gnu
6 | - bzip2=1.0.8=h5eee18b_6
7 | - ca-certificates=2024.7.2=h06a4308_0
8 | - ld_impl_linux-64=2.38=h1181459_1
9 | - libffi=3.4.4=h6a678d5_1
10 | - libgcc-ng=11.2.0=h1234567_1
11 | - libgomp=11.2.0=h1234567_1
12 | - libstdcxx-ng=11.2.0=h1234567_1
13 | - libuuid=1.41.5=h5eee18b_0
14 | - ncurses=6.4=h6a678d5_0
15 | - openssl=3.0.15=h5eee18b_0
16 | - python=3.11.9=h955ad1f_0
17 | - readline=8.2=h5eee18b_0
18 | - sqlite=3.45.3=h5eee18b_0
19 | - tk=8.6.14=h39e8969_0
20 | - xz=5.4.6=h5eee18b_1
21 | - zlib=1.2.13=h5eee18b_1
22 | - pip:
23 | - absl-py==2.1.0
24 | - accelerate==1.1.1
25 | - addict==2.4.0
26 | - aiohappyeyeballs==2.4.4
27 | - aiohttp==3.11.9
28 | - aiosignal==1.3.1
29 | - annotated-types==0.7.0
30 | - anyio==4.6.2.post1
31 | - appdirs==1.4.4
32 | - argparse==1.4.0
33 | - arviz==0.20.0
34 | - astropy==6.1.4
35 | - astropy-iers-data==0.2024.11.4.0.33.34
36 | - attrs==24.2.0
37 | - beautifulsoup4==4.12.3
38 | - bibtexparser==1.4.3
39 | - biopython==1.84
40 | - blis==0.7.11
41 | - cachetools==5.5.0
42 | - catalogue==1.0.2
43 | - certifi==2024.8.30
44 | - cffi==1.17.1
45 | - charset-normalizer==3.3.2
46 | - chess==1.11.1
47 | - cirq==1.4.1
48 | - cirq-aqt==1.4.1
49 | - cirq-core==1.4.1
50 | - cirq-google==1.4.1
51 | - cirq-ionq==1.4.1
52 | - cirq-pasqal==1.4.1
53 | - cirq-rigetti==1.4.1
54 | - cirq-web==1.4.1
55 | - clarabel==0.9.0
56 | - click==8.1.7
57 | - cloudpathlib==0.19.0
58 | - cloudpickle==3.1.0
59 | - colorama==0.4.6
60 | - compressed-tensors==0.8.0
61 | - contourpy==1.3.0
62 | - cpm-kernels==1.0.11
63 | - cpmpy==0.9.23
64 | - cramjam==2.8.3
65 | - cryptography==43.0.3
66 | - cvxpy==1.5.3
67 | - cycler==0.12.1
68 | - cymem==2.0.8
69 | - datasets==3.1.0
70 | - deprecated==1.2.14
71 | - dill==0.3.8
72 | - diskcache==5.6.3
73 | - distro==1.9.0
74 | - docker-pycreds==0.4.0
75 | - docopt==0.6.2
76 | - duet==0.2.9
77 | - ecos==2.0.14
78 | - einops==0.5.0
79 | - en-core-web-sm==2.3.1
80 | - et-xmlfile==2.0.0
81 | - evalplus==0.1.0.dev598
82 | - evaluate==0.4.3
83 | - fastapi==0.115.5
84 | - fastparquet==2024.5.0
85 | - fastprogress==1.0.3
86 | - filelock==3.16.1
87 | - fire==0.7.0
88 | - fonttools==4.54.1
89 | - frozendict==2.4.6
90 | - frozenlist==1.5.0
91 | - fsspec==2024.9.0
92 | - func-timeout==4.3.5
93 | - fuzzywuzzy==0.18.0
94 | - gguf==0.10.0
95 | - gitdb==4.0.11
96 | - gitpython==3.1.43
97 | - gmpy2==2.2.1
98 | - google-api-core==2.22.0
99 | - google-auth==2.35.0
100 | - googleapis-common-protos==1.65.0
101 | - greenlet==3.1.1
102 | - grilops==0.10.3
103 | - grpc-interceptor==0.15.4
104 | - grpcio==1.67.1
105 | - grpcio-status==1.62.3
106 | - h11==0.14.0
107 | - h5netcdf==1.4.0
108 | - h5py==3.12.1
109 | - html5lib==1.1
110 | - httpcore==1.0.6
111 | - httptools==0.6.4
112 | - httpx==0.27.2
113 | - huggingface-hub==0.24.7
114 | - idna==3.10
115 | - immutabledict==4.2.0
116 | - importlib-metadata==8.5.0
117 | - interegular==0.3.3
118 | - jieba==0.42.1
119 | - jinja2==3.1.4
120 | - jiter==0.8.0
121 | - joblib==1.4.2
122 | - json5==0.10.0
123 | - jsonschema==4.23.0
124 | - jsonschema-specifications==2024.10.1
125 | - kiwisolver==1.4.7
126 | - langcodes==3.4.0
127 | - language-data==1.2.0
128 | - lark==1.2.2
129 | - levenshtein==0.26.1
130 | - llvmlite==0.43.0
131 | - lm-format-enforcer==0.10.9
132 | - lxml==5.3.0
133 | - mando==0.7.1
134 | - marisa-trie==1.2.0
135 | - markdown==3.7
136 | - markdown-it-py==3.0.0
137 | - markupsafe==2.1.5
138 | - matplotlib==3.9.2
139 | - matplotlib-inline==0.1.7
140 | - mdurl==0.1.2
141 | - mistral-common==1.5.1
142 | - mistune==3.0.2
143 | - mmengine-lite==0.10.5
144 | - mosestokenizer==1.0.0
145 | - mpmath==1.3.0
146 | - msgpack==1.1.0
147 | - msgspec==0.18.6
148 | - multidict==6.1.0
149 | - multipledispatch==1.0.0
150 | - multiprocess==0.70.16
151 | - multitasking==0.0.11
152 | - murmurhash==1.0.10
153 | - nest-asyncio==1.6.0
154 | - networkx==3.4.1
155 | - nltk==3.8
156 | - numba==0.60.0
157 | - numpoly==1.2.14
158 | - numpy==1.26.4
159 | - openai==1.55.3
160 | - opencc==1.1.9
161 | - opencv-python-headless==4.10.0.84
162 | - openfile==0.0.7
163 | - openpyxl==3.1.5
164 | - ortools==9.9.3963
165 | - osqp==0.6.7.post3
166 | - outlines==0.0.46
167 | - packaging==23.2
168 | - pandas>=2.0.0
169 | - partial-json-parser==0.2.1.1.post4
170 | - patsy==0.5.6
171 | - peewee==3.17.7
172 | - pillow==10.4.0
173 | - pip==24.2
174 | - plac==1.1.3
175 | - platformdirs==4.3.6
176 | - playwright==1.49.1
177 | - poetry-core==1.9.0
178 | - portalocker==3.0.0
179 | - preshed==3.0.9
180 | - prettytable==3.12.0
181 | - prometheus-client==0.21.0
182 | - prometheus-fastapi-instrumentator==7.0.0
183 | - propcache==0.2.1
184 | - proto-plus==1.25.0
185 | - protobuf==4.25.5
186 | - psutil==6.1.0
187 | - pulp==2.9.0
188 | - py-cpuinfo==9.0.0
189 | - pyairports==2.1.1
190 | - pyarrow==17.0.0
191 | - pyasn1==0.6.1
192 | - pyasn1-modules==0.4.1
193 | - pycosat==0.6.6
194 | - pycountry==24.6.1
195 | - pycparser==2.22
196 | - pycryptodome==3.21.0
197 | - pydantic==2.9.2
198 | - pydantic-core==2.23.4
199 | - pyee==12.0.0
200 | - pyenchant==3.2.2
201 | - pyerfa==2.0.1.4
202 | - pyext==0.5
203 | - pygments==2.18.0
204 | - pymc3==3.11.4
205 | - pympler==1.1
206 | - pyparsing==3.2.0
207 | - pyperclip==1.9.0
208 | - pyquil==4.14.3
209 | - python-dateutil==2.9.0.post0
210 | - python-dotenv==1.0.1
211 | - python-levenshtein==0.26.1
212 | - python-rapidjson==1.20
213 | - pytz==2024.2
214 | - pywavelets==1.7.0
215 | - pyyaml==6.0.2
216 | - pyzmq==26.2.0
217 | - qcs-api-client-common==0.10.0
218 | - qcs-sdk-python==0.20.1
219 | - qdldl==0.1.7.post4
220 | - quil==0.13.1
221 | - radon==6.0.1
222 | - rank-bm25==0.2.2
223 | - rapidfuzz==3.10.1
224 | - ray==2.39.0
225 | - referencing==0.35.1
226 | - regex==2024.9.11
227 | - requests==2.32.3
228 | - retrying==1.3.4
229 | - rich==13.8.1
230 | - rouge==1.0.1
231 | - rouge-chinese==1.0.3
232 | - rouge-score==0.1.2
233 | - rpcq==3.11.0
234 | - rpds-py==0.21.0
235 | - rsa==4.9
236 | - ruamel-yaml==0.18.6
237 | - ruamel-yaml-clib==0.2.12
238 | - sacrebleu==2.4.3
239 | - safetensors==0.4.5
240 | - scikit-learn==1.5.0
241 | - scipy==1.14.1
242 | - scs==3.2.7
243 | - seaborn==0.13.2
244 | - semver==3.0.2
245 | - sentence-transformers==2.2.2
246 | - sentencepiece==0.2.0
247 | - sentry-sdk==2.18.0
248 | - setproctitle==1.3.3
249 | - setuptools==72.1.0
250 | - shellingham==1.5.4
251 | - shortuuid==1.0.13
252 | - six==1.16.0
253 | - smart-open==7.0.4
254 | - smmap==5.0.1
255 | - sniffio==1.3.1
256 | - sortedcontainers==2.4.0
257 | - soupsieve==2.6
258 | - spacy==2.3.9
259 | - spacy-legacy==3.0.12
260 | - spacy-loggers==1.0.5
261 | - srsly==1.0.7
262 | - starlette==0.41.3
263 | - strip-markdown==1.3
264 | - sympy==1.13.1
265 | - tabulate==0.9.0
266 | - tempdir==0.7.1
267 | - termcolor==2.5.0
268 | - theano-pymc==1.1.2
269 | - thinc==7.4.6
270 | - threadpoolctl==3.5.0
271 | - tiktoken==0.7.0
272 | - timeout-decorator==0.5.0
273 | - tokenizers==0.20.3
274 | - toolwrapper==2.1.0
275 | - torch==2.5.1
276 | - torchvision==0.20.1
277 | - tqdm==4.66.5
278 | - traitlets==5.14.3
279 | - transformers==4.46.2
280 | - triton==3.1.0
281 | - typer==0.12.5
282 | - types-deprecated==1.2.9.20240311
283 | - typing-extensions==4.12.2
284 | - tzdata==2024.1
285 | - urllib3==2.2.3
286 | - uvicorn==0.32.1
287 | - uvloop==0.21.0
288 | - wandb==0.18.7
289 | - wasabi==0.10.1
290 | - watchfiles==1.0.0
291 | - wcwidth==0.2.13
292 | - webencodings==0.5.1
293 | - websockets==14.1
294 | - wget==3.2
295 | - wheel==0.43.0
296 | - wrapt==1.16.0
297 | - xarray==2024.10.0
298 | - xarray-einstats==0.8.0
299 | - xformers==0.0.28.post3
300 | - xxhash==3.5.0
301 | - yapf==0.43.0
302 | - yarl==1.18.3
303 | - yfinance==0.2.48
304 | - z3-solver==4.13.3.0
305 | - zipp==3.20.2
306 |
--------------------------------------------------------------------------------
/figures/ioexample.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hkust-nlp/CodeIO/1d3541cc928e9f76da9c80e95778635a11e0583c/figures/ioexample.png
--------------------------------------------------------------------------------
/figures/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hkust-nlp/CodeIO/1d3541cc928e9f76da9c80e95778635a11e0583c/figures/overview.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | absl-py==2.1.0
2 | accelerate==1.1.1
3 | addict==2.4.0
4 | aiohappyeyeballs==2.4.4
5 | aiohttp==3.11.9
6 | aiosignal==1.3.1
7 | annotated-types==0.7.0
8 | anyio==4.6.2.post1
9 | appdirs==1.4.4
10 | arviz==0.20.0
11 | astropy==6.1.4
12 | astropy-iers-data==0.2024.11.4.0.33.34
13 | attrs==24.2.0
14 | beautifulsoup4==4.12.3
15 | bibtexparser==1.4.3
16 | biopython==1.84
17 | blis==0.7.11
18 | cachetools==5.5.0
19 | catalogue==1.0.2
20 | certifi==2024.8.30
21 | cffi==1.17.1
22 | charset-normalizer==3.3.2
23 | chess==1.11.1
24 | cirq==1.4.1
25 | cirq-aqt==1.4.1
26 | cirq-core==1.4.1
27 | cirq-google==1.4.1
28 | cirq-ionq==1.4.1
29 | cirq-pasqal==1.4.1
30 | cirq-rigetti==1.4.1
31 | cirq-web==1.4.1
32 | clarabel==0.9.0
33 | click==8.1.7
34 | cloudpathlib==0.19.0
35 | cloudpickle==3.1.0
36 | colorama==0.4.6
37 | compressed-tensors==0.8.0
38 | contourpy==1.3.0
39 | cpm-kernels==1.0.11
40 | cpmpy==0.9.23
41 | cramjam==2.8.3
42 | cryptography==43.0.3
43 | cvxpy==1.5.3
44 | cycler==0.12.1
45 | cymem==2.0.8
46 | datasets==3.1.0
47 | Deprecated==1.2.14
48 | dill==0.3.8
49 | diskcache==5.6.3
50 | distro==1.9.0
51 | docker-pycreds==0.4.0
52 | docopt==0.6.2
53 | duet==0.2.9
54 | ecos==2.0.14
55 | einops==0.5.0
56 | et_xmlfile==2.0.0
57 | evaluate==0.4.3
58 | fastapi==0.115.5
59 | fastparquet==2024.5.0
60 | fastprogress==1.0.3
61 | filelock==3.16.1
62 | fire==0.7.0
63 | fonttools==4.54.1
64 | frozendict==2.4.6
65 | frozenlist==1.5.0
66 | fsspec==2024.9.0
67 | func_timeout==4.3.5
68 | fuzzywuzzy==0.18.0
69 | gguf==0.10.0
70 | gitdb==4.0.11
71 | GitPython==3.1.43
72 | gmpy2==2.2.1
73 | google-api-core==2.22.0
74 | google-auth==2.35.0
75 | googleapis-common-protos==1.65.0
76 | greenlet==3.1.1
77 | grilops==0.10.3
78 | grpc-interceptor==0.15.4
79 | grpcio==1.67.1
80 | grpcio-status==1.62.3
81 | h11==0.14.0
82 | h5netcdf==1.4.0
83 | h5py==3.12.1
84 | html5lib==1.1
85 | httpcore==1.0.6
86 | httptools==0.6.4
87 | httpx==0.27.2
88 | huggingface-hub==0.24.7
89 | idna==3.10
90 | immutabledict==4.2.0
91 | importlib_metadata==8.5.0
92 | interegular==0.3.3
93 | jieba==0.42.1
94 | Jinja2==3.1.4
95 | jiter==0.8.0
96 | joblib==1.4.2
97 | json5==0.10.0
98 | jsonschema==4.23.0
99 | jsonschema-specifications==2024.10.1
100 | kiwisolver==1.4.7
101 | langcodes==3.4.0
102 | language_data==1.2.0
103 | lark==1.2.2
104 | Levenshtein==0.26.1
105 | llvmlite==0.43.0
106 | lm-format-enforcer==0.10.9
107 | lxml==5.3.0
108 | mando==0.7.1
109 | marisa-trie==1.2.0
110 | Markdown==3.7
111 | markdown-it-py==3.0.0
112 | MarkupSafe==2.1.5
113 | matplotlib==3.9.2
114 | matplotlib-inline==0.1.7
115 | mdurl==0.1.2
116 | mistral_common==1.5.1
117 | mistune==3.0.2
118 | mmengine-lite==0.10.5
119 | mosestokenizer==1.0.0
120 | mpmath==1.3.0
121 | msgpack==1.1.0
122 | msgspec==0.18.6
123 | multidict==6.1.0
124 | multipledispatch==1.0.0
125 | multiprocess==0.70.16
126 | multitasking==0.0.11
127 | murmurhash==1.0.10
128 | nest-asyncio==1.6.0
129 | networkx==3.4.1
130 | nltk==3.8
131 | numba==0.60.0
132 | numpoly==1.2.14
133 | numpy==1.26.4
134 | openai==1.55.3
135 | OpenCC==1.1.9
136 | opencv-python-headless==4.10.0.84
137 | openfile==0.0.7
138 | openpyxl==3.1.5
139 | ortools==9.9.3963
140 | osqp==0.6.7.post3
141 | outlines==0.0.46
142 | packaging==23.2
143 | pandas>=2.0.0
144 | partial-json-parser==0.2.1.1.post4
145 | patsy==0.5.6
146 | peewee==3.17.7
147 | pillow==10.4.0
148 | plac==1.1.3
149 | platformdirs==4.3.6
150 | playwright==1.49.1
151 | poetry-core==1.9.0
152 | portalocker==3.0.0
153 | preshed==3.0.9
154 | prettytable==3.12.0
155 | prometheus-fastapi-instrumentator==7.0.0
156 | prometheus_client==0.21.0
157 | propcache==0.2.1
158 | proto-plus==1.25.0
159 | protobuf==4.25.5
160 | psutil==6.1.0
161 | PuLP==2.9.0
162 | py-cpuinfo==9.0.0
163 | pyairports==2.1.1
164 | pyarrow==17.0.0
165 | pyasn1==0.6.1
166 | pyasn1_modules==0.4.1
167 | pycosat==0.6.6
168 | pycountry==24.6.1
169 | pycparser==2.22
170 | pycryptodome==3.21.0
171 | pydantic==2.9.2
172 | pydantic_core==2.23.4
173 | pyee==12.0.0
174 | pyenchant==3.2.2
175 | pyerfa==2.0.1.4
176 | pyext==0.5
177 | Pygments==2.18.0
178 | pymc3==3.11.4
179 | Pympler==1.1
180 | pyparsing==3.2.0
181 | pyperclip==1.9.0
182 | pyquil==4.14.3
183 | python-dateutil==2.9.0.post0
184 | python-dotenv==1.0.1
185 | python-Levenshtein==0.26.1
186 | python-rapidjson==1.20
187 | pytz==2024.2
188 | PyWavelets==1.7.0
189 | PyYAML==6.0.2
190 | pyzmq==26.2.0
191 | qcs-api-client-common==0.10.0
192 | qcs-sdk-python==0.20.1
193 | qdldl==0.1.7.post4
194 | quil==0.13.1
195 | radon==6.0.1
196 | rank-bm25==0.2.2
197 | RapidFuzz==3.10.1
198 | ray==2.39.0
199 | referencing==0.35.1
200 | regex==2024.9.11
201 | requests==2.32.3
202 | retrying==1.3.4
203 | rich==13.8.1
204 | rouge==1.0.1
205 | rouge-chinese==1.0.3
206 | rouge_score==0.1.2
207 | rpcq==3.11.0
208 | rpds-py==0.21.0
209 | rsa==4.9
210 | ruamel.yaml==0.18.6
211 | ruamel.yaml.clib==0.2.12
212 | sacrebleu==2.4.3
213 | safetensors==0.4.5
214 | scikit-learn==1.5.0
215 | scipy==1.14.1
216 | scs==3.2.7
217 | seaborn==0.13.2
218 | semver==3.0.2
219 | sentence-transformers==2.2.2
220 | sentencepiece==0.2.0
221 | sentry-sdk==2.18.0
222 | setproctitle==1.3.3
223 | shellingham==1.5.4
224 | shortuuid==1.0.13
225 | six==1.16.0
226 | smart-open==7.0.4
227 | smmap==5.0.1
228 | sniffio==1.3.1
229 | sortedcontainers==2.4.0
230 | soupsieve==2.6
231 | spacy==2.3.9
232 | spacy-legacy==3.0.12
233 | spacy-loggers==1.0.5
234 | srsly==1.0.7
235 | starlette==0.41.3
236 | strip-markdown==1.3
237 | sympy==1.13.1
238 | tabulate==0.9.0
239 | tempdir==0.7.1
240 | termcolor==2.5.0
241 | Theano-PyMC==1.1.2
242 | thinc==7.4.6
243 | threadpoolctl==3.5.0
244 | tiktoken==0.7.0
245 | timeout-decorator==0.5.0
246 | tokenizers==0.20.3
247 | toolwrapper==2.1.0
248 | torch==2.5.1
249 | torchvision==0.20.1
250 | tqdm==4.66.5
251 | traitlets==5.14.3
252 | transformers==4.46.2
253 | triton==3.1.0
254 | typer==0.12.5
255 | types-Deprecated==1.2.9.20240311
256 | typing_extensions==4.12.2
257 | tzdata==2024.1
258 | urllib3==2.2.3
259 | uvicorn==0.32.1
260 | uvloop==0.21.0
261 | wandb==0.18.7
262 | wasabi==0.10.1
263 | watchfiles==1.0.0
264 | wcwidth==0.2.13
265 | webencodings==0.5.1
266 | websockets==14.1
267 | wget==3.2
268 | wrapt==1.16.0
269 | xarray==2024.10.0
270 | xarray-einstats==0.8.0
271 | xformers==0.0.28.post3
272 | xxhash==3.5.0
273 | yapf==0.43.0
274 | yarl==1.18.3
275 | yfinance==0.2.48
276 | z3-solver==4.13.3.0
277 | zipp==3.20.2
278 |
--------------------------------------------------------------------------------
/scripts/pipeline_check.sh:
--------------------------------------------------------------------------------
1 | pfn=${1:-data/rawcode_1k_parsed.jsonl}
2 | ifn=${2:-data/codeio_1k_gens.jsonl}
3 | ofn=${3:-data/codeio_1k_gens_verified.jsonl}
4 | pythonpath=${4:-"python"}
5 | runpath=${5:-"./temp/temp/temp"}
6 |
7 | python ./src/check_io_pred_acc_mp.py \
8 | --parsed_file_name $pfn \
9 | --pred_file_name $ifn \
10 | --res_file_name $ofn \
11 | --batchsize 1024 \
12 | --num_processes 24 \
13 | --python_path $pythonpath \
14 | --run_path $runpath
15 |
16 | for i in {1..10}
17 | do
18 | echo "trial $i"
19 |
20 | if [ $i -eq 1 ]; then
21 | numprocess=24
22 | elif [ $i -eq 2 ]; then
23 | numprocess=16
24 | elif [ $i -eq 8 ] || [ $i -eq 9 ] || [ $i -eq 10 ]; then
25 | numprocess=4
26 | else
27 | numprocess=8
28 | fi
29 |
30 | echo "numprocess: $numprocess"
31 |
32 | python ./src/check_io_pred_acc_mp_inplace.py \
33 | --parsed_file_name $pfn \
34 | --pred_file_name $ofn \
35 | --batchsize 1024 \
36 | --write_batchsize 16 \
37 | --num_processes $numprocess \
38 | --python_path $pythonpath \
39 | --run_path $runpath
40 | done
--------------------------------------------------------------------------------
/src/assemble_codeio_demo.py:
--------------------------------------------------------------------------------
1 |
2 | from utils import *
3 | from tqdm import tqdm
4 |
5 | if __name__=="__main__":
6 | import argparse
7 | parser = argparse.ArgumentParser()
8 | parser.add_argument("--result_file_turn1", type=str, default=None)
9 | parser.add_argument("--result_file_turn2", type=str, default=None)
10 | parser.add_argument("--output_file", type=str, default=None)
11 | args = parser.parse_args()
12 |
13 | fn1 = args.result_file_turn1
14 | fn2 = args.result_file_turn2
15 | ofn = args.output_file
16 | dt1 = load_jsonl_yield(fn1)
17 | dt2 = load_jsonl_yield(fn2)
18 | ndt = []
19 | for item in tqdm(dt1):
20 | status = item['res']['status']
21 | if status == 'success':
22 | sample = {"prompt":item['messages'][0]['content'],
23 | "turn_1":item['output'],
24 | "feedback_1":item['res']['message'],
25 | "turn_2":None,
26 | "feedback_2":None}
27 | ndt.append(sample)
28 | if len(ndt)==1000:
29 | write_jsonl(ndt,ofn,"a")
30 | ndt = []
31 | for item in tqdm(dt2):
34 | sample = {"prompt":item['messages'][0]['content'],
35 | "turn_1":item['messages'][1]['content'],
36 | "feedback_1":item['messages'][2]['content'],
37 | "turn_2":item['output'],
38 | "feedback_2":item['res']['message']}
39 | ndt.append(sample)
40 | if len(ndt)==1000:
41 | write_jsonl(ndt,ofn,"a")
42 | ndt = []
43 |
44 | write_jsonl(ndt,ofn,"a")
--------------------------------------------------------------------------------
/src/batched_api_inference.py:
--------------------------------------------------------------------------------
1 | try:
2 | from openai import OpenAI
3 | except:
4 | pass
5 | import datetime
6 | import json
7 | import multiprocessing
8 | from argparse import ArgumentParser
9 | import os
10 | import time
11 | from tqdm import tqdm
12 |
13 | from multiprocessing import Process, Queue, Lock, Value
14 | import concurrent
15 | from concurrent.futures import ThreadPoolExecutor
16 |
17 | ###############################################
18 | max_try_one_call = 3
19 | SYSTEM = None
20 | ###############################################
21 |
22 | def get_client():
23 | assert model.startswith("gpt") or model.startswith("deepseek")
24 | params = {
25 | "api_key": key,
26 | "timeout":10000.0
27 | }
28 | if model.startswith("deepseek"):
29 | params["base_url"] = "https://api.deepseek.com"
30 |
31 | client = OpenAI(
32 | **params
33 | )
34 | return client
35 |
36 | def timer(func):
37 | def format_time(time_delta):
38 | hours, remainder = divmod(time_delta.total_seconds(), 3600)
39 | minutes, seconds = divmod(remainder, 60)
40 | return f"{int(hours):02d}:{int(minutes):02d}:{int(seconds):02d}"
41 | def wrapper(*args, **kwargs):
42 | start_time = datetime.datetime.now()
43 | print("Start time:", start_time.strftime("%Y-%m-%d %H:%M:%S"))
44 | result = func(*args, **kwargs)
45 | end_time = datetime.datetime.now()
46 | print("End time:", end_time.strftime("%Y-%m-%d %H:%M:%S"))
47 | elapsed_time = end_time - start_time
48 | print("Elapsed time:", format_time(elapsed_time))
49 | return result
50 | return wrapper
51 |
52 | def load_jsonl_yield(path):
53 | with open(path) as f:
54 | for row, line in enumerate(f):
55 | try:
56 | line = json.loads(line)
57 | yield line
58 | except:
59 | pass
60 |
61 | def check_exists(line):
62 | if "output" in line and line["output"] is not None:
63 | return True
64 | return False
65 |
66 | def process_line(js, good_cnt, bad_cnt, lock, output_path):
67 | messages = js['messages']
68 | response = None
69 | finish_reason = None
70 | for i in range(max_try_one_call): # retry if failed
71 | try:
72 | client = get_client()
73 | chat_completion = client.chat.completions.create(
74 | model = model,
75 | messages = messages,
76 | max_tokens = max_tokens,
77 | temperature = temperature,
78 | timeout=10000.0
79 | )
80 | if model == "deepseek-reasoner":
81 | reasoning = chat_completion.choices[0].message.reasoning_content
82 | else:
83 | reasoning = None
84 | response = chat_completion.choices[0].message.content
85 | finish_reason = chat_completion.choices[0].finish_reason
86 | break
87 | except Exception as e:
88 | if i>>>", problem_statement).replace("<<<>>>", io_req)
8 | tag = "<<<