├── xcodeeval-hf.png ├── xcodeeval_fig_1.png ├── LICENSE ├── evaluation ├── README.md ├── apr │ ├── get_result.py │ ├── gen_apr.py │ └── eval_apr.py ├── program_synthesis │ ├── get_result.py │ ├── gen_program_synthesis.py │ └── eval_program_synthesis.py └── code_translation │ ├── get_result.py │ ├── gen_code_translation.py │ └── eval_code_translation.py ├── requirement.txt ├── tag_classification.md ├── README.md ├── code_translation.md ├── program_synthesis.md ├── code_compilation.md ├── apr.md └── retrieval.md /xcodeeval-hf.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ntunlp/xCodeEval/HEAD/xcodeeval-hf.png -------------------------------------------------------------------------------- /xcodeeval_fig_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ntunlp/xCodeEval/HEAD/xcodeeval_fig_1.png -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 NLP Group, Nanyang Technological University 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /evaluation/README.md: -------------------------------------------------------------------------------- 1 | # How to perform evaluation using ExecEval 2 | 3 | ## Configure ExecEval 4 | 5 | Follow the instructions [here](https://github.com/ntunlp/execeval). 6 | 7 | > :warning: ❌ Do not run ExecEval without the Docker image 8 | 9 | `Java` and `Kotlin 1.5` have a number of memory-related issues. If you do not have a large amount of memory, reduce the number of workers. On a laptop or desktop with a limited amount of RAM (~ 32 GB), use `NUM_WORKERS=1`. Performance holds well up to at most 1/3rd of the available CPUs. 10 | 11 | ## Setup Environment 12 | Install the Python packages in your own environment. 13 | ``` 14 | pip install -r requirement.txt 15 | ``` 16 | 17 | Install ExecEval. 18 | ``` 19 | git clone https://github.com/ntunlp/ExecEval 20 | cd ExecEval 21 | docker build . -t exec-eval:1.0 22 | docker run -it -p 5000:5000 -e NUM_WORKERS=37 exec-eval:1.0 23 | ``` 24 | 25 | ## Generate samples 26 | 27 | Generate samples using the OpenAI API. 
28 | 29 | ``` 30 | python evaluation/program_synthesis/gen_program_synthesis.py 31 | python evaluation/code_translation/gen_code_translation.py 32 | python evaluation/apr/gen_apr.py 33 | ``` 34 | 35 | ## Eval Samples using ExecEval 36 | 37 | Keep ExecEval server/endpoint running and then run the following code, 38 | 39 | ``` 40 | python evaluation/program_synthesis/eval_program_synthesis.py 41 | python evaluation/code_translation/eval_code_translation.py 42 | python evaluation/apr/eval_apr.py 43 | ``` 44 | 45 | ## Calculate pass@k 46 | 47 | Calculate pass@k by the following script, 48 | 49 | ``` 50 | python evaluation/program_synthesis/get_result.py 51 | python evaluation/code_translation/get_result.py 52 | python evaluation/apr/get_result.py 53 | ``` -------------------------------------------------------------------------------- /requirement.txt: -------------------------------------------------------------------------------- 1 | aiohappyeyeballs==2.4.0 2 | aiohttp==3.10.5 3 | aiosignal==1.3.1 4 | altair==5.4.1 5 | astor==0.8.1 6 | async-timeout==4.0.3 7 | attrs==24.2.0 8 | backports.zoneinfo==0.2.1 9 | base58==2.1.1 10 | black==21.12b0 11 | blinker==1.8.2 12 | Brotli==1.1.0 13 | cachetools==5.5.0 14 | certifi==2024.8.30 15 | charset-normalizer==3.3.2 16 | click==7.1.2 17 | datasets==2.16.1 18 | dill==0.3.8 19 | exceptiongroup==1.2.2 20 | filelock==3.16.0 21 | flake8==7.1.1 22 | frozenlist==1.4.1 23 | fsspec==2024.6.1 24 | gitdb==4.0.11 25 | GitPython==3.1.43 26 | huggingface-hub==0.25.0 27 | idna==3.10 28 | importlib_resources==6.4.5 29 | inflate64==1.0.0 30 | iniconfig==2.0.0 31 | isort==5.8.0 32 | Jinja2==3.1.4 33 | jsonlines==4.0.0 34 | jsonschema==4.23.0 35 | jsonschema-specifications==2023.12.1 36 | MarkupSafe==2.1.5 37 | mccabe==0.7.0 38 | multidict==6.1.0 39 | multiprocess==0.70.16 40 | multivolumefile==0.2.3 41 | mypy-extensions==1.0.0 42 | narwhals==1.8.1 43 | numpy==1.24.4 44 | openai==0.28.0 45 | packaging==24.1 46 | pandas==2.0.3 47 | 
pathspec==0.12.1 48 | pillow==10.4.0 49 | pkgutil_resolve_name==1.3.10 50 | platformdirs==4.3.3 51 | plotly==5.24.1 52 | pluggy==1.5.0 53 | promptsource @ git+https://github.com/sbmaruf/promptsource@70dc08cf37b6483765382de7f75db5906ca0d742 54 | protobuf==5.28.1 55 | psutil==6.0.0 56 | py7zr==0.22.0 57 | pyarrow==17.0.0 58 | pybcj==1.0.2 59 | pycodestyle==2.12.1 60 | pycryptodomex==3.20.0 61 | pydeck==0.9.1 62 | pyflakes==3.2.0 63 | pyppmd==1.1.0 64 | pytest==8.3.3 65 | python-dateutil==2.9.0.post0 66 | pytz==2024.2 67 | PyYAML==6.0.2 68 | pyzstd==0.16.1 69 | referencing==0.35.1 70 | requests==2.32.3 71 | rpds-py==0.20.0 72 | six==1.16.0 73 | smmap==5.0.1 74 | streamlit==0.82.0 75 | tenacity==9.0.0 76 | texttable==1.7.0 77 | toml==0.10.2 78 | tomli==1.2.3 79 | tornado==6.4.1 80 | tqdm==4.66.5 81 | typing_extensions==4.12.2 82 | tzdata==2024.1 83 | tzlocal==5.2 84 | urllib3==2.2.3 85 | validators==0.34.0 86 | watchdog==4.0.2 87 | xxhash==3.5.0 88 | yarl==1.11.1 89 | zipp==3.20.2 90 | -------------------------------------------------------------------------------- /evaluation/apr/get_result.py: -------------------------------------------------------------------------------- 1 | import os 2 | from collections import defaultdict 3 | import tqdm 4 | import jsonlines 5 | from typing import List, Union 6 | import itertools 7 | import numpy as np 8 | 9 | LANG_CLUSTER_TO_LANG_COMPILER = { 10 | "C": "GNU C11", 11 | "C#": "Mono C#", 12 | "C++": "GNU C++17", 13 | "Go": "Go", 14 | "Java": "Java 17", 15 | "Javascript": "Node.js", 16 | "Kotlin": "Kotlin 1.4", 17 | "PHP": "PHP", 18 | "Python": "PyPy 3", 19 | "Ruby": "Ruby 3", 20 | "Rust": "Rust 2018", 21 | } 22 | 23 | path = f'{os.environ["DUMP_FOLDER"]}/oai/apr_n_sample_20/' 24 | output_path = os.path.join( 25 | path, "eval_apr_val_execeval" 26 | ) 27 | ks = range(1, 21) 28 | 29 | 30 | def estimate_pass_at_k( 31 | num_samples: Union[int, List[int], np.ndarray], 32 | num_correct: Union[List[int], np.ndarray], 33 | k: int, 34 | ) -> 
np.ndarray: 35 | """ 36 | Estimates pass@k of each problem and returns them in an array. 37 | """ 38 | 39 | def estimator(n: int, c: int, k: int): 40 | """ 41 | Calculates 1 - comb(n - c, k) / comb(n, k). 42 | """ 43 | if n - c < k: 44 | return 1.0 45 | return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)) 46 | 47 | if isinstance(num_samples, int): 48 | num_samples_it = itertools.repeat(num_samples, len(num_correct)) 49 | else: 50 | assert len(num_samples) == len(num_correct) 51 | num_samples_it = iter(num_samples) 52 | 53 | return np.array( 54 | [estimator(int(n), int(c), k) for n, c in zip(num_samples_it, num_correct)] 55 | ) 56 | 57 | 58 | def get_execeval_out_file_name(compiler): 59 | return os.path.join(output_path, f"{compiler}.jsonl") 60 | 61 | 62 | # construct result as {[task_id]: [unit_test_results]} 63 | # task_id will be src_uid_lang 64 | 65 | pass_at_k = defaultdict(dict) 66 | 67 | for lang, compiler in tqdm.tqdm(LANG_CLUSTER_TO_LANG_COMPILER.items()): 68 | execeval_out_file = get_execeval_out_file_name(compiler) 69 | results = defaultdict(list) 70 | with jsonlines.open(execeval_out_file) as jrp: 71 | for sample in jrp: 72 | src_uid = sample["source_data"]["src_uid"] 73 | task_id = f"{src_uid}|||{lang}" 74 | for ut_res in sample["unit_test_results"]: 75 | if "error" in ut_res: 76 | continue 77 | results[task_id].append(ut_res) 78 | 79 | total, correct = [], [] 80 | for result in results.values(): 81 | passed = [ 82 | all(x["exec_outcome"] == "PASSED" for x in ut_res) for ut_res in result 83 | ] 84 | total.append(len(passed)) 85 | correct.append(sum(passed)) 86 | total = np.array(total) 87 | correct = np.array(correct) 88 | 89 | pass_at_k[lang] = { 90 | f"pass@{k}": estimate_pass_at_k(total, correct, k).mean() 91 | for k in ks 92 | if (total >= k).all() 93 | } 94 | 95 | 96 | langs = sorted(list(pass_at_k.keys())) 97 | for lang in langs: 98 | print(f" & {lang}", end="") 99 | print() 100 | avg = 0 101 | for lang in langs: 102 | print(f" & 
{round(pass_at_k[lang]['pass@5']*100, 2)}", end="") 103 | avg += pass_at_k[lang]["pass@5"] * 100 104 | avg /= len(langs) 105 | print(f" & {round(avg, 2)}") 106 | -------------------------------------------------------------------------------- /evaluation/program_synthesis/get_result.py: -------------------------------------------------------------------------------- 1 | import os 2 | from collections import defaultdict 3 | import tqdm 4 | import jsonlines 5 | from typing import List, Union 6 | import itertools 7 | import numpy as np 8 | 9 | LANG_CLUSTER_TO_LANG_COMPILER = { 10 | "C": "GNU C11", 11 | "C#": "Mono C#", 12 | "C++": "GNU C++17", 13 | "Go": "Go", 14 | "Java": "Java 17", 15 | "Javascript": "Node.js", 16 | "Kotlin": "Kotlin 1.4", 17 | "PHP": "PHP", 18 | "Python": "PyPy 3", 19 | "Ruby": "Ruby 3", 20 | "Rust": "Rust 2018", 21 | } 22 | 23 | path = f'{os.environ["DUMP_FOLDER"]}/oai/prog_synthesis_n_sample_20/' 24 | output_path = os.path.join( 25 | path, "reproduce_1" 26 | ) # "eval_program_synthesis_val_execeval_fixtemp_nsampling_20_stop_at_first_fail_true", 27 | ks = range(1, 21) 28 | 29 | 30 | def estimate_pass_at_k( 31 | num_samples: Union[int, List[int], np.ndarray], 32 | num_correct: Union[List[int], np.ndarray], 33 | k: int, 34 | ) -> np.ndarray: 35 | """ 36 | Estimates pass@k of each problem and returns them in an array. 37 | """ 38 | 39 | def estimator(n: int, c: int, k: int): 40 | """ 41 | Calculates 1 - comb(n - c, k) / comb(n, k). 
42 | """ 43 | if n - c < k: 44 | return 1.0 45 | return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)) 46 | 47 | if isinstance(num_samples, int): 48 | num_samples_it = itertools.repeat(num_samples, len(num_correct)) 49 | else: 50 | assert len(num_samples) == len(num_correct) 51 | num_samples_it = iter(num_samples) 52 | 53 | return np.array( 54 | [estimator(int(n), int(c), k) for n, c in zip(num_samples_it, num_correct)] 55 | ) 56 | 57 | 58 | def get_execeval_out_file_name(compiler): 59 | return os.path.join(output_path, f"{compiler}.jsonl") 60 | 61 | 62 | # construct result as {[task_id]: [unit_test_results]} 63 | # task_id will be src_uid_lang 64 | 65 | pass_at_k = defaultdict(dict) 66 | 67 | for lang, compiler in tqdm.tqdm(LANG_CLUSTER_TO_LANG_COMPILER.items()): 68 | execeval_out_file = get_execeval_out_file_name(compiler) 69 | results = defaultdict(list) 70 | with jsonlines.open(execeval_out_file) as jrp: 71 | for sample in jrp: 72 | src_uid = sample["source_data"]["src_uid"] 73 | task_id = f"{src_uid}|||{lang}" 74 | for ut_res in sample["unit_test_results"]: 75 | if "error" in ut_res: 76 | continue 77 | results[task_id].append(ut_res) 78 | 79 | total, correct = [], [] 80 | for result in results.values(): 81 | passed = [ 82 | all(x["exec_outcome"] == "PASSED" for x in ut_res) for ut_res in result 83 | ] 84 | total.append(len(passed)) 85 | correct.append(sum(passed)) 86 | total = np.array(total) 87 | correct = np.array(correct) 88 | 89 | pass_at_k[lang] = { 90 | f"pass@{k}": estimate_pass_at_k(total, correct, k).mean() 91 | for k in ks 92 | if (total >= k).all() 93 | } 94 | 95 | 96 | langs = sorted(list(pass_at_k.keys())) 97 | for lang in langs: 98 | print(f" & {lang}", end="") 99 | print() 100 | avg = 0 101 | for lang in langs: 102 | print(f" & {round(pass_at_k[lang]['pass@5']*100, 2)}", end="") 103 | avg += pass_at_k[lang]["pass@5"] * 100 104 | avg /= len(langs) 105 | print(f" & {round(avg, 2)}") 106 | 
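The three `get_result.py` scripts share the same unbiased pass@k estimator, computed as a numerically stable product instead of explicit binomial coefficients. A minimal standalone sketch (the `n`, `c`, `k` counts below are illustrative toy values, not dataset results) confirms the product form agrees with the direct formula `1 - comb(n - c, k) / comb(n, k)`:

```python
# Standalone check of the product-form pass@k estimator used in get_result.py.
import itertools
import math
from typing import List, Union

import numpy as np


def estimate_pass_at_k(
    num_samples: Union[int, List[int], np.ndarray],
    num_correct: Union[List[int], np.ndarray],
    k: int,
) -> np.ndarray:
    """Unbiased pass@k per problem, as in the evaluation scripts."""

    def estimator(n: int, c: int, k: int) -> float:
        # 1 - comb(n - c, k) / comb(n, k), evaluated as a stable product
        if n - c < k:
            return 1.0
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

    if isinstance(num_samples, int):
        num_samples_it = itertools.repeat(num_samples, len(num_correct))
    else:
        assert len(num_samples) == len(num_correct)
        num_samples_it = iter(num_samples)

    return np.array(
        [estimator(int(n), int(c), k) for n, c in zip(num_samples_it, num_correct)]
    )


# Cross-check against the direct binomial formula on toy counts.
for n, c, k in [(20, 5, 1), (20, 5, 5), (20, 0, 3), (10, 10, 2)]:
    product_form = estimate_pass_at_k(n, [c], k)[0]
    direct = 1.0 - math.comb(n - c, k) / math.comb(n, k)
    assert abs(product_form - direct) < 1e-9, (n, c, k)
```

With 5 correct out of 20 samples, pass@1 comes out to 5/20 = 0.25, matching the intuition that pass@1 is simply the empirical success rate.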
-------------------------------------------------------------------------------- /evaluation/code_translation/get_result.py: -------------------------------------------------------------------------------- 1 | import os 2 | from collections import defaultdict 3 | import tqdm 4 | import jsonlines 5 | from typing import List, Union 6 | import itertools 7 | import numpy as np 8 | 9 | LANG_CLUSTER_TO_LANG_COMPILER = { 10 | "C": "GNU C11", 11 | "C#": "Mono C#", 12 | "C++": "GNU C++17", 13 | "Go": "Go", 14 | "Java": "Java 17", 15 | "Javascript": "Node.js", 16 | "Kotlin": "Kotlin 1.4", 17 | "PHP": "PHP", 18 | "Python": "PyPy 3", 19 | "Ruby": "Ruby 3", 20 | "Rust": "Rust 2018", 21 | } 22 | 23 | path = f'{os.environ["DUMP_FOLDER"]}/oai/code_translation_n_sample_20/' 24 | ks = range(1, 21) 25 | 26 | 27 | def estimate_pass_at_k( 28 | num_samples: Union[int, List[int], np.ndarray], 29 | num_correct: Union[List[int], np.ndarray], 30 | k: int, 31 | ) -> np.ndarray: 32 | """ 33 | Estimates pass@k of each problem and returns them in an array. 34 | """ 35 | 36 | def estimator(n: int, c: int, k: int): 37 | """ 38 | Calculates 1 - comb(n - c, k) / comb(n, k). 
39 | """ 40 | if n - c < k: 41 | return 1.0 42 | return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)) 43 | 44 | if isinstance(num_samples, int): 45 | num_samples_it = itertools.repeat(num_samples, len(num_correct)) 46 | else: 47 | assert len(num_samples) == len(num_correct) 48 | num_samples_it = iter(num_samples) 49 | 50 | return np.array( 51 | [estimator(int(n), int(c), k) for n, c in zip(num_samples_it, num_correct)] 52 | ) 53 | 54 | 55 | def get_execeval_out_file_name(split_name, compiler): 56 | return os.path.join(path, split_name, "eval_code_translation_compact_small_execeval", f"{compiler}.jsonl") 57 | 58 | 59 | # construct result as {[task_id]: [unit_test_results]} 60 | # task_id will be src_uid_lang 61 | 62 | pass_at_k = defaultdict(dict) 63 | for split_name in ("compact_small", "compact"): 64 | for lang, compiler in tqdm.tqdm(LANG_CLUSTER_TO_LANG_COMPILER.items()): 65 | execeval_out_file = get_execeval_out_file_name(split_name, compiler) 66 | results = defaultdict(list) 67 | with jsonlines.open(execeval_out_file) as jrp: 68 | for sample in jrp: 69 | src_uid = sample["source_data"]["src_uid"] 70 | task_id = f"{src_uid}|||{lang}" 71 | for ut_res in sample["unit_test_results"]: 72 | if "error" in ut_res: 73 | continue 74 | results[task_id].append(ut_res) 75 | 76 | total, correct = [], [] 77 | for result in results.values(): 78 | passed = [ 79 | all(x["exec_outcome"] == "PASSED" for x in ut_res) for ut_res in result 80 | ] 81 | total.append(len(passed)) 82 | correct.append(sum(passed)) 83 | total = np.array(total) 84 | correct = np.array(correct) 85 | 86 | pass_at_k[lang] = { 87 | f"pass@{k}": estimate_pass_at_k(total, correct, k).mean() 88 | for k in ks 89 | if (total >= k).all() 90 | } 91 | 92 | print("-"*10) 93 | print(split_name) 94 | print("-"*10) 95 | langs = sorted(list(pass_at_k.keys())) 96 | for lang in langs: 97 | print(f" & {lang}", end="") 98 | print() 99 | avg = 0 100 | for lang in langs: 101 | print(f" & 
{round(pass_at_k[lang]['pass@5']*100, 2)}", end="") 102 | avg += pass_at_k[lang]["pass@5"] * 100 103 | avg /= len(langs) 104 | print(f" & {round(avg, 2)}") 105 | -------------------------------------------------------------------------------- /evaluation/apr/gen_apr.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import tqdm 4 | import json 5 | import openai 6 | import argparse 7 | import datasets 8 | import concurrent 9 | import numpy as np 10 | from promptsource.templates import Template 11 | 12 | SHORT_LANG_MAP = { 13 | "GNU C++": "C++", 14 | "GNU C++17": "C++", 15 | "MS C++ 2017": "C++", 16 | "MS C++": "C++", 17 | "Java 8": "Java", 18 | "Java 6": "Java", 19 | "GNU C++11": "C++", 20 | "Java 11": "Java", 21 | "GNU C++14": "C++", 22 | "Mono C#": "C#", 23 | "GNU C": "C", 24 | "Python 3": "Python", 25 | "PyPy 3": "Python", 26 | "GNU C11": "C", 27 | "Go": "Go", 28 | "Rust": "Rust", 29 | "PyPy 2": "Python", 30 | "Python 2": "Python", 31 | "MS C#": "C#", 32 | "Kotlin": "Kotlin", 33 | "GNU C++0x": "C++", 34 | "Java 7": "Java", 35 | "Node.js": "Javascript", 36 | ".NET Core C#": "C#", 37 | "PHP": "PHP", 38 | "GNU C++17 Diagnostics": "C++", 39 | "Clang++17 Diagnostics": "C++", 40 | "JavaScript": "Javascript", 41 | "Ruby": "Ruby", 42 | "C# 10": "C#", 43 | "C# 8": "C#", 44 | "Clang++20 Diagnostics": "C++", 45 | "GNU C++17 (64)": "C++", 46 | "GNU C++20 (64)": "C++", 47 | "Java 17": "Java", 48 | "Kotlin 1.4": "Kotlin", 49 | "Kotlin 1.5": "Kotlin", 50 | "Kotlin 1.6": "Kotlin", 51 | "Kotlin 1.7": "Kotlin", 52 | "PyPy 3-64": "Python", 53 | "Python 3 + libs": "Python", 54 | "Ruby 3": "Ruby", 55 | "Rust 2021": "Rust", 56 | } 57 | 58 | LANGS = sorted(set([v for k, v in SHORT_LANG_MAP.items()])) 59 | 60 | 61 | openai.api_key = os.environ["OPENAI_API_KEY"] 62 | 63 | 64 | def gen(prompt, temperature, nsample): 65 | cnt = 0 66 | while True: 67 | if cnt == 999: 68 | return None 69 | try: 70 | c = 
openai.ChatCompletion.create( 71 | model="gpt-3.5-turbo", 72 | messages=[ 73 | {"role": "user", "content": f"{prompt}"}, 74 | ], 75 | temperature=temperature, 76 | top_p=1, 77 | n=nsample, 78 | frequency_penalty=0.0, 79 | presence_penalty=0.0, 80 | ) 81 | break 82 | except Exception as e: 83 | cnt += 1 84 | time.sleep(5) 85 | print(f"{e}") 86 | c["prompt"] = prompt 87 | return c 88 | 89 | 90 | xcodeeval_prompt_template = { 91 | "apr": [ 92 | "Fix a buggy program written in {{lang_cluster}} language to solve the following programming problem:\nDescription: {{prob_desc_description}}\nInput Specification: {{prob_desc_input_spec}}\nOutput Specification: {{prob_desc_output_spec}}\n{% for input, output in zip(prob_desc_sample_inputs, prob_desc_sample_outputs) %}\nSample Input:\n{{input}}\nSample Output:\n{{output}}\n{% endfor %}\nNotes: {{prob_desc_notes}}\nTake input from {{prob_desc_input_from}} and output to {{prob_desc_output_to}}\n\nHere is the code with a bug of {{bug_exec_outcome}}:\n\n{{bug_source_code}}\n\nProvide the fixed {{lang_cluster}} code without any description or extra tokens.\n\nFixed source code:\n ||END-of-SRC|| " 93 | ] 94 | } 95 | 96 | 97 | def process_prompt(dt, temperature, template, nsample, output_dir, index, dry_run=0): 98 | language = dt["lang_cluster"] 99 | file_path = os.path.join(output_dir, f"{index}_{temperature}_{language}.json") 100 | if not os.path.exists(file_path): 101 | dt["prob_desc_sample_inputs"] = json.loads(dt["prob_desc_sample_inputs"]) 102 | dt["prob_desc_sample_outputs"] = json.loads(dt["prob_desc_sample_outputs"]) 103 | lm_io = template.apply(dt) 104 | assert len(lm_io) == 2, f"{json.dumps(lm_io, indent=4)}" 105 | if dry_run: 106 | open(file_path, "w").write(f"{json.dumps(lm_io[0], indent=4)}") 107 | else: 108 | out = gen(lm_io[0], temperature, nsample) 109 | export_data = {"oai_response": out, "source_data": dt} 110 | open(file_path, "w").write(f"{json.dumps(export_data, indent=4)}") 111 | 112 | 113 | def main(): 114 | 
parser = argparse.ArgumentParser() 115 | parser.add_argument( 116 | "--output-dir", 117 | default="dumped/oai/apr_n_sample_20", 118 | help="Output folder to save the API responses.", 119 | ) 120 | parser.add_argument( 121 | "--num-proc", 122 | default=1, 123 | help="Number of parallel API requests.", 124 | ) 125 | parser.add_argument( 126 | "--dry-run", 127 | default=0, 128 | help="If nonzero, dump the prompts to disk without calling the API.", 129 | ) 130 | parser.add_argument( 131 | "--nsample", 132 | default=20, 133 | type=int, 134 | help="Number of samples to generate per prompt.", 135 | ) 136 | args = parser.parse_args() 137 | if not os.path.exists(args.output_dir): 138 | os.makedirs(args.output_dir, exist_ok=True) 139 | templates = [ 140 | Template(f"apr_{idx}", template, "xCodeEval", delimeter="||END-of-SRC||") 141 | for idx, template in enumerate(xcodeeval_prompt_template["apr"]) 142 | ] 143 | template = templates[0] 144 | 145 | apr_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "apr", num_proc=16, trust_remote_code=True)["compact"] 146 | # temperature_list = np.linspace(0, 2, args.nsample) 147 | temperature_list = [0.3157894736842105] 148 | with concurrent.futures.ProcessPoolExecutor( 149 | max_workers=int(args.num_proc) 150 | ) as executor: 151 | futures = [] 152 | for idx, dt in tqdm.tqdm( 153 | enumerate(apr_dataset), 154 | total=len(apr_dataset), 155 | desc="Preparing samples", 156 | ): 157 | for temperature in temperature_list: 158 | future = executor.submit( 159 | process_prompt, 160 | dt, 161 | temperature, 162 | template, 163 | args.nsample, 164 | args.output_dir, 165 | idx, 166 | args.dry_run, 167 | ) 168 | futures.append(future) 169 | 170 | for future in tqdm.tqdm( 171 | concurrent.futures.as_completed(futures), 172 | total=len(futures), 173 | desc=f"Calling OpenAI API", 174 | ): 175 | try: 176 | future.result() 177 | except Exception as e: 178 | print(f"Error occurred: {e}") 179 | 180 | 181 | if __name__ == "__main__": 182 | main() 183 | 
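All three `gen_*.py` scripts resume interrupted runs the same way: each `process_prompt` call writes one JSON file keyed by `{index}_{temperature}_{language}.json` and skips the work entirely if that file already exists. A self-contained sketch of this pattern (`fake_generate` is a stand-in for the real OpenAI call, not from the repo):

```python
# Sketch of the resume-by-filename caching used by the gen_*.py scripts:
# one JSON dump per (index, temperature, language) triple, skipped when the
# file already exists so a crashed run can be restarted safely.
import json
import os
import tempfile


def fake_generate(prompt: str) -> dict:
    # Placeholder for gen(prompt, temperature, nsample).
    return {"prompt": prompt, "choices": ["..."]}


def process_prompt(prompt: str, index: int, temperature: float,
                   language: str, output_dir: str) -> bool:
    """Return True if the (pretend) API was called, False if cached."""
    file_path = os.path.join(output_dir, f"{index}_{temperature}_{language}.json")
    if os.path.exists(file_path):
        return False  # already generated in a previous run
    with open(file_path, "w") as f:
        json.dump({"oai_response": fake_generate(prompt)}, f, indent=4)
    return True


out_dir = tempfile.mkdtemp()
first = process_prompt("Fix this bug...", 0, 0.3157894736842105, "C++", out_dir)
second = process_prompt("Fix this bug...", 0, 0.3157894736842105, "C++", out_dir)
assert first and not second  # the second invocation hits the cache
```

Because the cache key includes the temperature, re-running with a different `temperature_list` generates fresh samples instead of reusing old ones.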
-------------------------------------------------------------------------------- /evaluation/program_synthesis/gen_program_synthesis.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import tqdm 4 | import json 5 | import openai 6 | import argparse 7 | import datasets 8 | import concurrent 9 | import numpy as np 10 | from promptsource.templates import Template 11 | 12 | SHORT_LANG_MAP = { 13 | "GNU C++": "C++", 14 | "GNU C++17": "C++", 15 | "MS C++ 2017": "C++", 16 | "MS C++": "C++", 17 | "Java 8": "Java", 18 | "Java 6": "Java", 19 | "GNU C++11": "C++", 20 | "Java 11": "Java", 21 | "GNU C++14": "C++", 22 | "Mono C#": "C#", 23 | "GNU C": "C", 24 | "Python 3": "Python", 25 | "PyPy 3": "Python", 26 | "GNU C11": "C", 27 | "Go": "Go", 28 | "Rust": "Rust", 29 | "PyPy 2": "Python", 30 | "Python 2": "Python", 31 | "MS C#": "C#", 32 | "Kotlin": "Kotlin", 33 | "GNU C++0x": "C++", 34 | "Java 7": "Java", 35 | "Node.js": "Javascript", 36 | ".NET Core C#": "C#", 37 | "PHP": "PHP", 38 | "GNU C++17 Diagnostics": "C++", 39 | "Clang++17 Diagnostics": "C++", 40 | "JavaScript": "Javascript", 41 | "Ruby": "Ruby", 42 | "C# 10": "C#", 43 | "C# 8": "C#", 44 | "Clang++20 Diagnostics": "C++", 45 | "GNU C++17 (64)": "C++", 46 | "GNU C++20 (64)": "C++", 47 | "Java 17": "Java", 48 | "Kotlin 1.4": "Kotlin", 49 | "Kotlin 1.5": "Kotlin", 50 | "Kotlin 1.6": "Kotlin", 51 | "Kotlin 1.7": "Kotlin", 52 | "PyPy 3-64": "Python", 53 | "Python 3 + libs": "Python", 54 | "Ruby 3": "Ruby", 55 | "Rust 2021": "Rust", 56 | } 57 | 58 | LANGS = sorted(set([v for k, v in SHORT_LANG_MAP.items()])) 59 | 60 | 61 | openai.api_key = os.environ["OPENAI_API_KEY"] 62 | 63 | 64 | def gen(prompt, temperature, nsample): 65 | cnt = 0 66 | while True: 67 | if cnt == 999: 68 | return None 69 | try: 70 | c = openai.ChatCompletion.create( 71 | model="gpt-3.5-turbo", 72 | messages=[ 73 | {"role": "user", "content": f"{prompt}"}, 74 | ], 75 | 
temperature=temperature, 76 | top_p=1, 77 | n=nsample, 78 | frequency_penalty=0.0, 79 | presence_penalty=0.0, 80 | ) 81 | break 82 | except Exception as e: 83 | cnt += 1 84 | time.sleep(5) 85 | print(f"{e}") 86 | c["prompt"] = prompt 87 | return c 88 | 89 | 90 | xcodeeval_prompt_template = { 91 | "program_synthesis": [ 92 | "Write a program in {{lang_cluster}} to solve this programming problem:\nDescription: {{prob_desc_description}}\nInput Specification: {{prob_desc_input_spec}}\nOutput Specification: {{prob_desc_output_spec}}\n{% for input, output in zip(prob_desc_sample_inputs, prob_desc_sample_outputs) %}\nSample Input:\n{{input}}\nSample Output:\n{{output}}\n{% endfor %}\nNotes: {{prob_desc_notes}}\nTake input from {{prob_desc_input_from}} and output to {{prob_desc_output_to}}\nProvide the {{lang_cluster}} code without any extra description or tokens. Target code: ||END-of-SRC|| ", 93 | ] 94 | } 95 | 96 | 97 | def process_prompt( 98 | dt, temperature, nsample, language, template, output_dir, index, dry_run=0 99 | ): 100 | file_path = os.path.join(output_dir, f"{index}_{temperature}_{language}.json") 101 | if not os.path.exists(file_path): 102 | dt["lang_cluster"] = language 103 | dt["prob_desc_sample_inputs"] = json.loads(dt["prob_desc_sample_inputs"]) 104 | dt["prob_desc_sample_outputs"] = json.loads(dt["prob_desc_sample_outputs"]) 105 | lm_io = template.apply(dt) 106 | assert len(lm_io) == 2, f"{json.dumps(lm_io, indent=4)}" 107 | if dry_run: 108 | open(file_path, "w").write(f"{json.dumps(lm_io[0], indent=4)}") 109 | else: 110 | out = gen(lm_io[0], temperature, nsample) 111 | export_data = {"oai_response": out, "source_data": dt} 112 | open(file_path, "w").write(f"{json.dumps(export_data, indent=4)}") 113 | 114 | 115 | def main(): 116 | parser = argparse.ArgumentParser() 117 | parser.add_argument( 118 | "--output-dir", 119 | default="dumped/oai/program_synthesis_n_sample_20", 120 | help="Output Folder to save the API request.", 121 | ) 122 | 
parser.add_argument( 123 | "--num-proc", 124 | default=1, 125 | help="Number of parallel API requests.", 126 | ) 127 | parser.add_argument( 128 | "--dry-run", 129 | default=0, 130 | help="If nonzero, dump the prompts to disk without calling the API.", 131 | ) 132 | parser.add_argument( 133 | "--nsample", 134 | default=20, 135 | type=int, 136 | help="Number of samples to generate per prompt.", 137 | ) 138 | args = parser.parse_args() 139 | if not os.path.exists(args.output_dir): 140 | os.makedirs(args.output_dir, exist_ok=True) 141 | templates = [ 142 | Template(f"prog_syn_{idx}", template, "xCodeEval", delimeter="||END-of-SRC||") 143 | for idx, template in enumerate(xcodeeval_prompt_template["program_synthesis"]) 144 | ] 145 | template = templates[0] 146 | 147 | prog_synthesis_dataset = datasets.load_dataset( 148 | "NTU-NLP-sg/xCodeEval", "program_synthesis", num_proc=16, trust_remote_code=True 149 | )["compact"] 150 | # temperature_list = np.linspace(0, 2, args.nsample) 151 | temperature_list = [0.3157894736842105] 152 | for language in LANGS: 153 | with concurrent.futures.ProcessPoolExecutor( 154 | max_workers=int(args.num_proc) 155 | ) as executor: 156 | futures = [] 157 | for idx, dt in tqdm.tqdm( 158 | enumerate(prog_synthesis_dataset), 159 | total=len(prog_synthesis_dataset), 160 | desc=f"Preparing samples {language} lang", 161 | ): 162 | for temperature in temperature_list: 163 | future = executor.submit( 164 | process_prompt, 165 | dt, 166 | temperature, 167 | args.nsample, 168 | language, 169 | template, 170 | args.output_dir, 171 | idx, 172 | args.dry_run, 173 | ) 174 | futures.append(future) 175 | 176 | for future in tqdm.tqdm( 177 | concurrent.futures.as_completed(futures), 178 | total=len(futures), 179 | desc=f"Calling OpenAI API for {language} lang", 180 | ): 181 | try: 182 | future.result() 183 | except Exception as e: 184 | print(f"Error occurred: {e}") 185 | 186 | 187 | if __name__ == "__main__": 188 | main() 189 | 
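The `gen()` helper in each script wraps the OpenAI call in a retry loop: keep calling until the request succeeds or 999 attempts are exhausted, sleeping 5 seconds between tries and returning `None` on giving up. A generic version of that pattern (`call_with_retries` and `flaky` are illustrative names, not from the repo):

```python
# Generic version of the retry loop inside gen(): call until success or the
# attempt budget is exhausted, sleeping between failed tries.
import time


def call_with_retries(fn, max_attempts=999, delay=0.0):
    attempts = 0
    while True:
        if attempts == max_attempts:
            return None  # gen() also gives up by returning None
        try:
            return fn()
        except Exception as e:
            attempts += 1
            time.sleep(delay)
            print(f"{e}")


calls = {"n": 0}


def flaky():
    # Fails twice, then succeeds -- mimics transient API errors.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


result = call_with_retries(flaky, max_attempts=5, delay=0.0)
assert result == "ok" and calls["n"] == 3
```

Note that callers must handle the `None` returned after the budget is exhausted; in the scripts this surfaces as a `"oai_response": null` entry in the dumped JSON.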
-------------------------------------------------------------------------------- /evaluation/code_translation/gen_code_translation.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import tqdm 4 | import json 5 | import openai 6 | import argparse 7 | import datasets 8 | import concurrent 9 | import numpy as np 10 | from promptsource.templates import Template 11 | 12 | SHORT_LANG_MAP = { 13 | "GNU C++": "C++", 14 | "GNU C++17": "C++", 15 | "MS C++ 2017": "C++", 16 | "MS C++": "C++", 17 | "Java 8": "Java", 18 | "Java 6": "Java", 19 | "GNU C++11": "C++", 20 | "Java 11": "Java", 21 | "GNU C++14": "C++", 22 | "Mono C#": "C#", 23 | "GNU C": "C", 24 | "Python 3": "Python", 25 | "PyPy 3": "Python", 26 | "GNU C11": "C", 27 | "Go": "Go", 28 | "Rust": "Rust", 29 | "PyPy 2": "Python", 30 | "Python 2": "Python", 31 | "MS C#": "C#", 32 | "Kotlin": "Kotlin", 33 | "GNU C++0x": "C++", 34 | "Java 7": "Java", 35 | "Node.js": "Javascript", 36 | ".NET Core C#": "C#", 37 | "PHP": "PHP", 38 | "GNU C++17 Diagnostics": "C++", 39 | "Clang++17 Diagnostics": "C++", 40 | "JavaScript": "Javascript", 41 | "Ruby": "Ruby", 42 | "C# 10": "C#", 43 | "C# 8": "C#", 44 | "Clang++20 Diagnostics": "C++", 45 | "GNU C++17 (64)": "C++", 46 | "GNU C++20 (64)": "C++", 47 | "Java 17": "Java", 48 | "Kotlin 1.4": "Kotlin", 49 | "Kotlin 1.5": "Kotlin", 50 | "Kotlin 1.6": "Kotlin", 51 | "Kotlin 1.7": "Kotlin", 52 | "PyPy 3-64": "Python", 53 | "Python 3 + libs": "Python", 54 | "Ruby 3": "Ruby", 55 | "Rust 2021": "Rust", 56 | } 57 | 58 | LANGS = sorted(set([v for k, v in SHORT_LANG_MAP.items()])) 59 | 60 | 61 | openai.api_key = os.environ["OPENAI_API_KEY"] 62 | 63 | 64 | def gen(prompt, temperature, nsample): 65 | cnt = 0 66 | while True: 67 | if cnt == 999: 68 | return None 69 | try: 70 | c = openai.ChatCompletion.create( 71 | model="gpt-3.5-turbo", 72 | messages=[ 73 | {"role": "user", "content": f"{prompt}"}, 74 | ], 75 | 
temperature=temperature, 76 | top_p=1, 77 | n=nsample, 78 | frequency_penalty=0.0, 79 | presence_penalty=0.0, 80 | ) 81 | break 82 | except Exception as e: 83 | cnt += 1 84 | time.sleep(5) 85 | print(f"{e}") 86 | c["prompt"] = prompt 87 | return c 88 | 89 | 90 | xcodeeval_prompt_template = { 91 | "code_translation": [ 92 | "Here is code in {{source_lang}} programming language. Translate the following code from {{source_lang}} to {{target_lang}} programming language. Do not output any extra description or tokens other than the translated code. \n\n{{source_code}}||END-of-SRC|| " 93 | ] 94 | } 95 | 96 | 97 | def process_prompt( 98 | dt, temperature, template, language, nsample, output_dir, index, dry_run=0 99 | ): 100 | dt["source_lang"] = dt["lang"] 101 | dt["target_lang"] = language 102 | language = f"{dt['source_lang']}--{dt['target_lang']}" 103 | file_path = os.path.join(output_dir, f"{index}_{temperature}_{language}.json") 104 | if not os.path.exists(file_path): 105 | dt["prob_desc_sample_inputs"] = json.loads(dt["prob_desc_sample_inputs"]) 106 | dt["prob_desc_sample_outputs"] = json.loads(dt["prob_desc_sample_outputs"]) 107 | lm_io = template.apply(dt) 108 | assert len(lm_io) == 2, f"{json.dumps(lm_io, indent=4)}" 109 | if dry_run: 110 | open(file_path, "w").write(f"{json.dumps(lm_io[0], indent=4)}") 111 | else: 112 | out = gen(lm_io[0], temperature, nsample) 113 | export_data = {"oai_response": out, "source_data": dt} 114 | open(file_path, "w").write(f"{json.dumps(export_data, indent=4)}") 115 | 116 | 117 | def main(): 118 | parser = argparse.ArgumentParser() 119 | parser.add_argument( 120 | "--output-dir", 121 | default="dumped/oai/code_translation_n_sample_20", 122 | help="Output folder to save the API responses.", 123 | ) 124 | parser.add_argument( 125 | "--num-proc", 126 | default=1, 127 | help="Number of parallel API requests.", 128 | ) 129 | parser.add_argument( 130 | "--dry-run", 131 | default=0, 132 | help="If nonzero, dump the prompts to disk without calling the API.", 133 | ) 134 | 
parser.add_argument( 135 | "--nsample", 136 | default=20, 137 | type=int, 138 | help="Number of samples to generate per prompt.", 139 | ) 140 | args = parser.parse_args() 141 | if not os.path.exists(args.output_dir): 142 | os.makedirs(args.output_dir, exist_ok=True) 143 | templates = [ 144 | Template( 145 | f"code_translation_{idx}", template, "xCodeEval", delimeter="||END-of-SRC||" 146 | ) 147 | for idx, template in enumerate(xcodeeval_prompt_template["code_translation"]) 148 | ] 149 | template = templates[0] 150 | 151 | code_translation_dataset_small = datasets.load_dataset( 152 | "NTU-NLP-sg/xCodeEval", "code_translation", num_proc=16, trust_remote_code=True 153 | )[ 154 | "compact_small" 155 | ] 156 | code_translation_dataset = datasets.load_dataset( 157 | "NTU-NLP-sg/xCodeEval", "code_translation", num_proc=16 158 | )[ 159 | "compact" 160 | ] 161 | temperature_list = [0.3157894736842105] 162 | 163 | out_dir = args.output_dir + "/compact_small" 164 | if not os.path.exists(out_dir): 165 | os.makedirs(out_dir, exist_ok=True) 166 | with concurrent.futures.ProcessPoolExecutor( 167 | max_workers=int(args.num_proc) 168 | ) as executor: 169 | futures = [] 170 | for idx, dt in tqdm.tqdm( 171 | enumerate(code_translation_dataset_small), 172 | total=len(code_translation_dataset_small), 173 | desc=f"Preparing samples", 174 | ): 175 | for language in LANGS: 176 | if SHORT_LANG_MAP[dt["lang"]] == language: 177 | continue 178 | for temperature in temperature_list: 179 | future = executor.submit( 180 | process_prompt, 181 | dt, 182 | temperature, 183 | template, 184 | language, 185 | args.nsample, 186 | out_dir, 187 | idx, 188 | args.dry_run, 189 | ) 190 | futures.append(future) 191 | 192 | for future in tqdm.tqdm( 193 | concurrent.futures.as_completed(futures), 194 | total=len(futures), 195 | desc=f"Calling OpenAI API", 196 | ): 197 | try: 198 | future.result() 199 | except Exception as e: 200 | print(f"Error occurred: {e}") 201 | 202 | out_dir = args.output_dir + "/compact" 203 | if 
not os.path.exists(out_dir): 204 | os.makedirs(out_dir, exist_ok=True) 205 | with concurrent.futures.ProcessPoolExecutor( 206 | max_workers=int(args.num_proc) 207 | ) as executor: 208 | futures = [] 209 | for idx, dt in tqdm.tqdm( 210 | enumerate(code_translation_dataset), 211 | total=len(code_translation_dataset), 212 | desc=f"Preparing samples", 213 | ): 214 | for language in ["Python"]: 215 | if SHORT_LANG_MAP[dt["lang"]] == language: 216 | continue 217 | for temperature in temperature_list: 218 | future = executor.submit( 219 | process_prompt, 220 | dt, 221 | temperature, 222 | template, 223 | language, 224 | args.nsample, 225 | out_dir, 226 | idx, 227 | args.dry_run, 228 | ) 229 | futures.append(future) 230 | 231 | for future in tqdm.tqdm( 232 | concurrent.futures.as_completed(futures), 233 | total=len(futures), 234 | desc=f"Calling OpenAI API", 235 | ): 236 | try: 237 | future.result() 238 | except Exception as e: 239 | print(f"Error occurred: {e}") 240 | 241 | 242 | if __name__ == "__main__": 243 | main() 244 | -------------------------------------------------------------------------------- /evaluation/apr/eval_apr.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import tqdm 4 | import jsonlines 5 | import datasets 6 | import concurrent.futures 7 | from dataclasses import dataclass, field 8 | import itertools 9 | 10 | import requests 11 | from typing import List, Optional, Union, Tuple 12 | from enum import Enum 13 | from multiprocessing import Pool 14 | 15 | 16 | class ExecOutcome(Enum): 17 | PASSED = "PASSED" # code executes and output matches expected output 18 | WRONG_ANSWER = ( 19 | "WRONG_ANSWER" # code executes and output does NOT matches expected output 20 | ) 21 | TIME_LIMIT_EXCEEDED = "TIME_LIMIT_EXCEEDED" # code executes and didn't exit in time, output is ignored in this case 22 | RUNTIME_ERROR = "RUNTIME_ERROR" # code failed to execute (crashed) 23 | COMPILATION_ERROR = 
"COMPILATION_ERROR" # code failed to compile 24 | MEMORY_LIMIT_EXCEEDED = ( 25 | "MEMORY_LIMIT_EXCEEDED" # code exceeded memory limit during execution 26 | ) 27 | 28 | 29 | @dataclass 30 | class ExtendedUnittest: 31 | input: str 32 | output: List[str] = field(default_factory=list) 33 | result: Optional[str] = None 34 | exec_outcome: Optional[ExecOutcome] = None 35 | 36 | def json(self): 37 | _json = self.__dict__ 38 | if self.exec_outcome is not None: 39 | _json["exec_outcome"] = self.exec_outcome.name 40 | 41 | return _json 42 | 43 | @classmethod 44 | def from_json(cls, _json): 45 | return cls( 46 | input=_json.get("input", ""), 47 | output=_json.get("output", list()), 48 | result=_json.get("result", None), 49 | exec_outcome=_json.get("exec_outcome", None), 50 | ) 51 | 52 | 53 | class EmptyValueError(Exception): 54 | def __init__(self, *args, **kwargs): 55 | super().__init__(*args, **kwargs) 56 | 57 | 58 | class EmptyUnittestError(EmptyValueError): 59 | pass 60 | 61 | 62 | class EmptyLanguageError(EmptyValueError): 63 | pass 64 | 65 | 66 | class EmptySourceCodeError(EmptyValueError): 67 | pass 68 | 69 | 70 | class APICommunication: 71 | _session: requests.Session 72 | 73 | def __init__(self, server_url: str = "http://localhost:5000"): 74 | self._session = requests.Session() 75 | self.execute_code_url = f"{server_url}/api/execute_code" 76 | self.get_runtimes_url = f"{server_url}/api/all_runtimes" 77 | 78 | def __enter__(self): 79 | return self 80 | 81 | def __exit__(self, *args): 82 | self._session.close() 83 | 84 | def get_runtimes(self): 85 | return self._session.get(self.get_runtimes_url).json() 86 | 87 | def execute_code( 88 | self, 89 | language: str, 90 | source_code: str, 91 | unittests: List[dict], 92 | limits: Optional[dict] = None, 93 | block_network: bool = True, 94 | stop_on_first_fail: bool = True, 95 | use_sanitizer: bool = False, 96 | compiler_program_name: Optional[str] = None, 97 | compiler_flags: Optional[str] = None, 98 | interpreter_cmd: 
Optional[str] = None, 99 | interpreter_flags: Optional[str] = None, 100 | sample_id: Optional[int] = None, 101 | task_id: Union[str, int, None] = None, 102 | ) -> Tuple[List[ExtendedUnittest], Optional[int], Union[str, int, None]]: 103 | if language is None: 104 | raise EmptyLanguageError 105 | 106 | if source_code is None: 107 | raise EmptySourceCodeError 108 | 109 | if unittests is None or len(unittests) == 0: 110 | raise EmptyUnittestError 111 | 112 | request_body = dict( 113 | language=language, 114 | source_code=source_code, 115 | unittests=unittests, 116 | limits=limits if isinstance(limits, dict) else None, 117 | compile_cmd=compiler_program_name, 118 | compile_flags=compiler_flags, 119 | execute_cmd=interpreter_cmd, 120 | execute_flags=interpreter_flags, 121 | block_network=block_network, 122 | stop_on_first_fail=stop_on_first_fail, 123 | use_sanitizer=use_sanitizer, 124 | ) 125 | json_response = self._session.post( 126 | self.execute_code_url, 127 | json=request_body, 128 | headers={"Content-Type": "application/json"}, 129 | ).json() 130 | 131 | if "data" not in json_response: 132 | return json_response, sample_id, task_id 133 | 134 | return ( 135 | json_response["data"], 136 | sample_id, 137 | task_id, 138 | ) 139 | 140 | 141 | def get_idx(file_name): 142 | return int(file_name.split(".json")[0].split("_")[0]) 143 | 144 | 145 | def sanitize_code(code): 146 | FLAG = True 147 | while FLAG == True: 148 | FLAG = False 149 | if code.startswith("```"): 150 | FLAG = True 151 | code = code.replace("```", "", 1) 152 | last_index = code.rfind("```") 153 | if last_index != -1: 154 | FLAG = True 155 | code = code[:last_index] + "" + code[last_index + len("```") :] 156 | if code.startswith("cpp"): 157 | FLAG = True 158 | code = code.replace("cpp", "", 1) 159 | return code 160 | 161 | 162 | def fix_uts(uts): 163 | uts_fx = [] 164 | for ut in uts: 165 | uts_fx.append( 166 | { 167 | "input": ut["input"], 168 | "output": ut["output"], 169 | } 170 | ) 171 | return uts_fx 
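The `sanitize_code` helper defined above (and repeated in the other eval scripts) repeatedly strips Markdown code fences and a leading `cpp` language tag from model responses before they are sent for execution. A self-contained sketch of the same logic — renamed `strip_fences` to mark it as illustrative, not the repository's function:

```python
def strip_fences(code: str) -> str:
    """Repeatedly remove a leading ``` fence, a trailing ``` fence,
    and a leading 'cpp' language tag, until nothing changes."""
    changed = True
    while changed:
        changed = False
        if code.startswith("```"):
            code = code.replace("```", "", 1)
            changed = True
        last = code.rfind("```")
        if last != -1:
            code = code[:last] + code[last + 3:]
            changed = True
        if code.startswith("cpp"):
            code = code.replace("cpp", "", 1)
            changed = True
    return code

print(strip_fences("```cpp\nint main() { return 0; }\n```").strip())
# → int main() { return 0; }
```

Note that only the `cpp` tag is handled; a response fenced as ```` ```python ```` keeps the word `python` in front of the code, which is a limitation of the original helper as well.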
172 | 173 | 174 | def process(args): 175 | sample, execeval = args 176 | src_uid = sample["source_data"]["src_uid"] 177 | unit_tests = json.loads(sample["source_data"]["hidden_unit_tests"]) 178 | compiler = LANG_CLUSTER_TO_LANG_COMPILER[sample["source_data"]["lang_cluster"]] 179 | sample["unit_test_results"] = [] 180 | for choice in sample["oai_response"]["choices"]: 181 | code = choice["message"]["content"] 182 | code = sanitize_code(code) 183 | unit_test_results, _, _ = execeval.execute_code( 184 | compiler, 185 | code, 186 | fix_uts(unit_tests), 187 | task_id=src_uid, 188 | # stop_on_first_fail=False 189 | ) 190 | # print(unit_test_results) 191 | # print(file, code, [e['exec_outcome'] for e in unit_test_results]) 192 | sample["unit_test_results"].append(unit_test_results) 193 | return sample 194 | 195 | 196 | LANG_CLUSTER_TO_LANG_COMPILER = { 197 | "C": "GNU C11", 198 | "C#": "Mono C#", 199 | "C++": "GNU C++17", 200 | "Go": "Go", 201 | "Java": "Java 17", 202 | "Javascript": "Node.js", 203 | "Kotlin": "Kotlin 1.4", 204 | "PHP": "PHP", 205 | "Python": "PyPy 3", 206 | "Ruby": "Ruby 3", 207 | "Rust": "Rust 2018", 208 | } 209 | 210 | 211 | def main(): 212 | path = f'{os.environ["DUMP_FOLDER"]}/oai/apr_n_sample_20/' 213 | for k, debug_compiler in LANG_CLUSTER_TO_LANG_COMPILER.items(): 214 | output_path = os.path.join(path, "eval_apr_val_execeval") 215 | os.makedirs(output_path, exist_ok=True) 216 | output_file = os.path.join(output_path, f"{debug_compiler}.jsonl") 217 | with jsonlines.open(output_file, "w") as jwp: 218 | with concurrent.futures.ThreadPoolExecutor( 219 | max_workers=129 220 | ) as thread_executor: 221 | files = sorted(os.listdir(path)) 222 | with APICommunication(server_url="http://localhost:5000") as execeval: 223 | all_samples = [] 224 | for file in files: 225 | full_path = os.path.join(path, file) 226 | if os.path.isdir(full_path): 227 | continue 228 | sample = json.load(open(full_path)) 229 | if ( 230 | sample["source_data"]["lang_cluster"] 231 | 
not in LANG_CLUSTER_TO_LANG_COMPILER 232 | ): 233 | continue 234 | compiler = LANG_CLUSTER_TO_LANG_COMPILER[ 235 | sample["source_data"]["lang_cluster"] 236 | ] 237 | if compiler != debug_compiler: 238 | continue 239 | all_samples.append(sample) 240 | future_to_val_results = { 241 | thread_executor.submit(process, args) 242 | for args in itertools.product(all_samples, [execeval]) 243 | } 244 | 245 | for _out in tqdm.tqdm( 246 | concurrent.futures.as_completed(future_to_val_results), 247 | total=len(all_samples), 248 | desc=f"{debug_compiler}", 249 | ): 250 | try: 251 | __out = _out.result() 252 | jwp.write(__out) 253 | except Exception as emsg: 254 | print("Exception msg: {}".format(emsg)) 255 | pass 256 | 257 | 258 | if __name__ == "__main__": 259 | main() 260 | -------------------------------------------------------------------------------- /tag_classification.md: -------------------------------------------------------------------------------- 1 | # Tag Classification Task 2 | 3 | ## Download data using huggingface `load_dataset()` 4 | 5 | ``` 6 | >>> import datasets 7 | >>> tag_classification_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "tag_classification") 8 | >>> print(tag_classification_dataset) 9 | 10 | DatasetDict({ 11 | train: Dataset({ 12 | features: ['src_uid', 'file_name', 'lang_cluster', 'lang', 'source_code', 'prob_desc_output_spec', 'prob_desc_sample_outputs', 'prob_desc_output_to', 'prob_desc_input_spec', 'prob_desc_memory_limit', 'prob_desc_description', 'prob_desc_time_limit', 'prob_desc_created_at', 'prob_desc_input_from', 'code_uid', 'prob_desc_sample_inputs', 'tags', 'prob_desc_notes', 'difficulty'], 13 | num_rows: 5494008 14 | }) 15 | validation: Dataset({ 16 | features: ['src_uid', 'file_name', 'lang_cluster', 'lang', 'source_code', 'prob_desc_output_spec', 'prob_desc_sample_outputs', 'prob_desc_output_to', 'prob_desc_input_spec', 'prob_desc_memory_limit', 'prob_desc_description', 'prob_desc_time_limit', 'prob_desc_created_at', 
'prob_desc_input_from', 'code_uid', 'prob_desc_sample_inputs', 'tags', 'prob_desc_notes', 'difficulty'], 17 | num_rows: 18696 18 | }) 19 | test: Dataset({ 20 | features: ['src_uid', 'file_name', 'lang_cluster', 'lang', 'source_code', 'prob_desc_output_spec', 'prob_desc_sample_outputs', 'prob_desc_output_to', 'prob_desc_input_spec', 'prob_desc_memory_limit', 'prob_desc_description', 'prob_desc_time_limit', 'prob_desc_created_at', 'prob_desc_input_from', 'code_uid', 'prob_desc_sample_inputs', 'tags', 'prob_desc_notes', 'difficulty'], 21 | num_rows: 74733 22 | }) 23 | }) 24 | ``` 25 | 26 | 27 | To download the tag classification data, 28 | 29 | ``` 30 | GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval 31 | cd xCodeEval 32 | git lfs pull --include "tag_classification/*" 33 | git lfs pull --include "problem_descriptions.jsonl" 34 | ``` 35 | 36 | ## train/validation/test split 37 | ``` 38 | { 39 | "lang": "GNU C++17", 40 | "source_code": "#include \n#include \nusing namespace std::chrono; \n\nusing namespace std;\n#define f0r(a, b) for (a = 0; a < b; a++)\n#define f1r(a, b, c) for (a = b; a < c; a++)\n#define ms(arr, v) memset(arr, v, sizeof(arr))\n#define pb push_back\n#define io ios_base::sync_with_stdio(false); cin.tie(NULL); cout.tie(NULL)\n#define mp make_pair\n#define f first\n#define s second\ntypedef long long ll;\ntypedef double ld;\ntypedef pair pii;\ntypedef pair pll;\nll i, j;\n\nll n, q, Q, T, m, k, r, x, y, z, g;\nstring a, b;\nunsigned char ans[200001];\n\nint main() {\n io;\n cin >> n >> a >> b;\n ms(ans, 0);\n for (int i = n-1; i >= 0; i--) {\n // f0r(j, n) cout << (char)(ans[j] < 10 ? 
ans[j] + '0' : ans[j]);\n cout << endl;\n if (i == n-1) {\n if (a[i] > b[i]) {\n int diff = a[i] - b[i];\n int x = b[i] + (diff>>1);\n while (x > 'z') {\n x -= 26;\n ans[i-1]++;\n }\n ans[i] = x;\n } else {\n int diff = b[i] - a[i];\n int x = a[i] + (diff>>1);\n ans[i] = x;\n }\n continue;\n }\n if (a[i] < b[i]) {\n int diff = b[i] - a[i];\n ans[i] += a[i] + (diff>>1);\n if (diff % 2 != 0) {\n ans[i+1] += 13;\n while (ans[i+1] > 'z') {\n ans[i+1] -= 26;\n ans[i]++;\n }\n }\n while (ans[i] > 'z') {\n ans[i] -= 26;\n ans[i-1]++;\n }\n } else if (a[i] == b[i]) {\n ans[i] = a[i];\n } else {\n int diff = a[i] - b[i];\n ans[i] += b[i] + (diff>>1);\n if (diff % 2 != 0) {\n ans[i+1] += 13;\n while (ans[i+1] > 'z') {\n ans[i+1] -= 26;\n ans[i]++;\n }\n }\n while (ans[i] > 'z') {\n ans[i] -= 26;\n ans[i-1]++;\n }\n }\n }\n f0r(i, n) cout << ans[i];\n cout << endl;\n // cout << \"FLUSH PLEASE\" << endl;\n}", 41 | "tags": [ 42 | "bitmasks", 43 | "strings", 44 | "number theory", 45 | "math" 46 | ], 47 | "lang_cluster": "C++", 48 | "code_uid": "d215a290b8ae7055870ec4fb61218286", 49 | "src_uid": "5f4009d4065f5ad39e662095f8f5c068", 50 | "difficulty": 1900 51 | } 52 | ``` 53 | 54 | ## Key Definitions 55 | 56 | 57 | 1. `lang`: Runtime/Compiler version of the `source_code`. 58 | 2. `source_code`: A program. 59 | 3. `tags`: List of potential algorithmic techniques required to write the program. 60 | 4. `lang_cluster`: A generic programming language name the value of `lang` belongs to. 61 | 5. `code_uid`: A unique ID for the sample. It is not important for model training. If you find any issue with the sample, you can report it to us mentioning the `code_uid`. 62 | 6. `src_uid`: A specific identifier that shows which problem the code is associated with. This identifier is **important** for the training of the model. The problem referred to by the `src_uid` provides a natural description of the problem that the code successfully solved. 
Refer to [Structure of `problem_descriptions.jsonl`](./README.md#structure-of-problem_descriptionsjsonl) 63 | 7. `difficulty`: Difficulty rating of the problem indicated by `src_uid`. The higher the harder. 64 | 65 | 66 | ## MD5 hash of the data 67 | 68 | Run the following, 69 | 70 | ``` 71 | cd xCodeEval/ 72 | tar c tag_classification | md5sum 73 | ``` 74 | 75 | Output should match, `610645116e29db3771e11a890a435699`. 76 | 77 | 78 | ## Tree 79 | 80 | A total of 3 directories and 133 files. 81 | 82 | ``` 83 | . 84 | ├── test 85 | │ ├── C#.jsonl 86 | │ ├── C++.jsonl 87 | │ ├── C.jsonl 88 | │ ├── Go.jsonl 89 | │ ├── Java.jsonl 90 | │ ├── Javascript.jsonl 91 | │ ├── Kotlin.jsonl 92 | │ ├── PHP.jsonl 93 | │ ├── Python.jsonl 94 | │ ├── Ruby.jsonl 95 | │ └── Rust.jsonl 96 | ├── train 97 | │ ├── train_000.jsonl 98 | │ ├── train_001.jsonl 99 | │ ├── train_002.jsonl 100 | │ ├── train_003.jsonl 101 | │ ├── train_004.jsonl 102 | │ ├── train_005.jsonl 103 | │ ├── train_006.jsonl 104 | │ ├── train_007.jsonl 105 | │ ├── train_008.jsonl 106 | │ ├── train_009.jsonl 107 | │ ├── train_010.jsonl 108 | │ ├── train_011.jsonl 109 | │ ├── train_012.jsonl 110 | │ ├── train_013.jsonl 111 | │ ├── train_014.jsonl 112 | │ ├── train_015.jsonl 113 | │ ├── train_016.jsonl 114 | │ ├── train_017.jsonl 115 | │ ├── train_018.jsonl 116 | │ ├── train_019.jsonl 117 | │ ├── train_020.jsonl 118 | │ ├── train_021.jsonl 119 | │ ├── train_022.jsonl 120 | │ ├── train_023.jsonl 121 | │ ├── train_024.jsonl 122 | │ ├── train_025.jsonl 123 | │ ├── train_026.jsonl 124 | │ ├── train_027.jsonl 125 | │ ├── train_028.jsonl 126 | │ ├── train_029.jsonl 127 | │ ├── train_030.jsonl 128 | │ ├── train_031.jsonl 129 | │ ├── train_032.jsonl 130 | │ ├── train_033.jsonl 131 | │ ├── train_034.jsonl 132 | │ ├── train_035.jsonl 133 | │ ├── train_036.jsonl 134 | │ ├── train_037.jsonl 135 | │ ├── train_038.jsonl 136 | │ ├── train_039.jsonl 137 | │ ├── train_040.jsonl 138 | │ ├── train_041.jsonl 139 | │ ├── train_042.jsonl 140 | │ 
├── train_043.jsonl 141 | │ ├── train_044.jsonl 142 | │ ├── train_045.jsonl 143 | │ ├── train_046.jsonl 144 | │ ├── train_047.jsonl 145 | │ ├── train_048.jsonl 146 | │ ├── train_049.jsonl 147 | │ ├── train_050.jsonl 148 | │ ├── train_051.jsonl 149 | │ ├── train_052.jsonl 150 | │ ├── train_053.jsonl 151 | │ ├── train_054.jsonl 152 | │ ├── train_055.jsonl 153 | │ ├── train_056.jsonl 154 | │ ├── train_057.jsonl 155 | │ ├── train_058.jsonl 156 | │ ├── train_059.jsonl 157 | │ ├── train_060.jsonl 158 | │ ├── train_061.jsonl 159 | │ ├── train_062.jsonl 160 | │ ├── train_063.jsonl 161 | │ ├── train_064.jsonl 162 | │ ├── train_065.jsonl 163 | │ ├── train_066.jsonl 164 | │ ├── train_067.jsonl 165 | │ ├── train_068.jsonl 166 | │ ├── train_069.jsonl 167 | │ ├── train_070.jsonl 168 | │ ├── train_071.jsonl 169 | │ ├── train_072.jsonl 170 | │ ├── train_073.jsonl 171 | │ ├── train_074.jsonl 172 | │ ├── train_075.jsonl 173 | │ ├── train_076.jsonl 174 | │ ├── train_077.jsonl 175 | │ ├── train_078.jsonl 176 | │ ├── train_079.jsonl 177 | │ ├── train_080.jsonl 178 | │ ├── train_081.jsonl 179 | │ ├── train_082.jsonl 180 | │ ├── train_083.jsonl 181 | │ ├── train_084.jsonl 182 | │ ├── train_085.jsonl 183 | │ ├── train_086.jsonl 184 | │ ├── train_087.jsonl 185 | │ ├── train_088.jsonl 186 | │ ├── train_089.jsonl 187 | │ ├── train_090.jsonl 188 | │ ├── train_091.jsonl 189 | │ ├── train_092.jsonl 190 | │ ├── train_093.jsonl 191 | │ ├── train_094.jsonl 192 | │ ├── train_095.jsonl 193 | │ ├── train_096.jsonl 194 | │ ├── train_097.jsonl 195 | │ ├── train_098.jsonl 196 | │ ├── train_099.jsonl 197 | │ ├── train_100.jsonl 198 | │ ├── train_101.jsonl 199 | │ ├── train_102.jsonl 200 | │ ├── train_103.jsonl 201 | │ ├── train_104.jsonl 202 | │ ├── train_105.jsonl 203 | │ ├── train_106.jsonl 204 | │ ├── train_107.jsonl 205 | │ ├── train_108.jsonl 206 | │ ├── train_109.jsonl 207 | │ └── train_110.jsonl 208 | └── validation 209 | ├── C#.jsonl 210 | ├── C++.jsonl 211 | ├── C.jsonl 212 | ├── Go.jsonl 213 | 
├── Java.jsonl 214 | ├── Javascript.jsonl 215 | ├── Kotlin.jsonl 216 | ├── PHP.jsonl 217 | ├── Python.jsonl 218 | ├── Ruby.jsonl 219 | └── Rust.jsonl 220 | ``` 221 | 222 | -------------------------------------------------------------------------------- /evaluation/program_synthesis/eval_program_synthesis.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import tqdm 4 | import jsonlines 5 | import datasets 6 | import concurrent.futures 7 | from dataclasses import dataclass, field 8 | import itertools 9 | from collections import defaultdict 10 | 11 | import requests 12 | from typing import List, Optional, Union, Tuple 13 | from enum import Enum 14 | from multiprocessing import Pool 15 | 16 | 17 | class ExecOutcome(Enum): 18 | PASSED = "PASSED" # code executes and output matches expected output 19 | WRONG_ANSWER = ( 20 | "WRONG_ANSWER" # code executes and output does NOT matches expected output 21 | ) 22 | TIME_LIMIT_EXCEEDED = "TIME_LIMIT_EXCEEDED" # code executes and didn't exit in time, output is ignored in this case 23 | RUNTIME_ERROR = "RUNTIME_ERROR" # code failed to execute (crashed) 24 | COMPILATION_ERROR = "COMPILATION_ERROR" # code failed to compile 25 | MEMORY_LIMIT_EXCEEDED = ( 26 | "MEMORY_LIMIT_EXCEEDED" # code exceeded memory limit during execution 27 | ) 28 | 29 | 30 | @dataclass 31 | class ExtendedUnittest: 32 | input: str 33 | output: List[str] = field(default_factory=list) 34 | result: Optional[str] = None 35 | exec_outcome: Optional[ExecOutcome] = None 36 | 37 | def json(self): 38 | _json = self.__dict__ 39 | if self.exec_outcome is not None: 40 | _json["exec_outcome"] = self.exec_outcome.name 41 | 42 | return _json 43 | 44 | @classmethod 45 | def from_json(cls, _json): 46 | return cls( 47 | input=_json.get("input", ""), 48 | output=_json.get("output", list()), 49 | result=_json.get("result", None), 50 | exec_outcome=_json.get("exec_outcome", None), 51 | ) 52 | 53 | 54 | class 
EmptyValueError(Exception): 55 | def __init__(self, *args, **kwargs): 56 | super().__init__(*args, **kwargs) 57 | 58 | 59 | class EmptyUnittestError(EmptyValueError): 60 | pass 61 | 62 | 63 | class EmptyLanguageError(EmptyValueError): 64 | pass 65 | 66 | 67 | class EmptySourceCodeError(EmptyValueError): 68 | pass 69 | 70 | 71 | class APICommunication: 72 | _session: requests.Session 73 | 74 | def __init__(self, server_url: str = "http://localhost:5000"): 75 | self._session = requests.Session() 76 | self.execute_code_url = f"{server_url}/api/execute_code" 77 | self.get_runtimes_url = f"{server_url}/api/all_runtimes" 78 | 79 | def __enter__(self): 80 | return self 81 | 82 | def __exit__(self, *args): 83 | self._session.close() 84 | 85 | def get_runtimes(self): 86 | return self._session.get(self.get_runtimes_url).json() 87 | 88 | def execute_code( 89 | self, 90 | language: str, 91 | source_code: str, 92 | unittests: List[dict], 93 | limits: Optional[dict] = None, 94 | block_network: bool = True, 95 | stop_on_first_fail: bool = True, 96 | use_sanitizer: bool = False, 97 | compiler_program_name: Optional[str] = None, 98 | compiler_flags: Optional[str] = None, 99 | interpreter_cmd: Optional[str] = None, 100 | interpreter_flags: Optional[str] = None, 101 | sample_id: Optional[int] = None, 102 | task_id: Union[str, int, None] = None, 103 | ) -> Tuple[List[ExtendedUnittest], Optional[int], Union[str, int, None]]: 104 | if language is None: 105 | raise EmptyLanguageError 106 | 107 | if source_code is None: 108 | raise EmptySourceCodeError 109 | 110 | if unittests is None or len(unittests) == 0: 111 | raise EmptyUnittestError 112 | 113 | request_body = dict( 114 | language=language, 115 | source_code=source_code, 116 | unittests=unittests, 117 | limits=limits if isinstance(limits, dict) else None, 118 | compile_cmd=compiler_program_name, 119 | compile_flags=compiler_flags, 120 | execute_cmd=interpreter_cmd, 121 | execute_flags=interpreter_flags, 122 | 
block_network=block_network, 123 | stop_on_first_fail=stop_on_first_fail, 124 | use_sanitizer=use_sanitizer, 125 | ) 126 | json_response = self._session.post( 127 | self.execute_code_url, 128 | json=request_body, 129 | headers={"Content-Type": "application/json"}, 130 | ).json() 131 | 132 | if "data" not in json_response: 133 | return json_response, sample_id, task_id 134 | 135 | return ( 136 | json_response["data"], 137 | sample_id, 138 | task_id, 139 | ) 140 | 141 | 142 | def get_idx(file_name): 143 | return int(file_name.split(".json")[0].split("_")[0]) 144 | 145 | 146 | def sanitize_code(code): 147 | FLAG = True 148 | while FLAG == True: 149 | FLAG = False 150 | if code.startswith("```"): 151 | FLAG = True 152 | code = code.replace("```", "", 1) 153 | last_index = code.rfind("```") 154 | if last_index != -1: 155 | FLAG = True 156 | code = code[:last_index] + "" + code[last_index + len("```") :] 157 | if code.startswith("cpp"): 158 | FLAG = True 159 | code = code.replace("cpp", "", 1) 160 | return code 161 | 162 | 163 | def fix_uts(uts): 164 | uts_fx = [] 165 | for ut in uts: 166 | uts_fx.append( 167 | { 168 | "input": ut["input"], 169 | "output": ut["output"], 170 | } 171 | ) 172 | return uts_fx 173 | 174 | 175 | def process(args): 176 | sample, execeval = args 177 | src_uid = sample["source_data"]["src_uid"] 178 | unit_tests = json.loads(sample["source_data"]["hidden_unit_tests"]) 179 | compiler = LANG_CLUSTER_TO_LANG_COMPILER[sample["source_data"]["lang_cluster"]] 180 | sample["unit_test_results"] = [] 181 | for choice in sample["oai_response"]["choices"]: 182 | code = choice["message"]["content"] 183 | code = sanitize_code(code) 184 | unit_test_results, _, _ = execeval.execute_code( 185 | compiler, 186 | code, 187 | fix_uts(unit_tests), 188 | task_id=src_uid, 189 | # stop_on_first_fail=False 190 | ) 191 | # print(unit_test_results) 192 | # print(file, code, [e['exec_outcome'] for e in unit_test_results]) 193 | 
sample["unit_test_results"].append(unit_test_results) 194 | return sample 195 | 196 | 197 | LANG_CLUSTER_TO_LANG_COMPILER = { 198 | "C": "GNU C11", 199 | "C#": "Mono C#", 200 | "C++": "GNU C++17", 201 | "Go": "Go", 202 | "Java": "Java 17", 203 | "Javascript": "Node.js", 204 | "Kotlin": "Kotlin 1.4", 205 | "PHP": "PHP", 206 | "Python": "PyPy 3", 207 | "Ruby": "Ruby 3", 208 | "Rust": "Rust 2018", 209 | } 210 | 211 | 212 | def main(): 213 | path = f'{os.environ["DUMP_FOLDER"]}/oai/prog_synthesis_n_sample_20/' 214 | for k, debug_compiler in LANG_CLUSTER_TO_LANG_COMPILER.items(): 215 | output_path = os.path.join(path, "reproduce_1") 216 | os.makedirs(output_path, exist_ok=True) 217 | output_file = os.path.join(output_path, f"{debug_compiler}.jsonl") 218 | with concurrent.futures.ThreadPoolExecutor(max_workers=129) as thread_executor: 219 | with jsonlines.open(output_file, "w") as jwp: 220 | files = sorted(os.listdir(path)) 221 | with APICommunication(server_url="http://localhost:5000") as execeval: 222 | all_samples = [] 223 | for file in files: 224 | full_path = os.path.join(path, file) 225 | if os.path.isdir(full_path): 226 | continue 227 | sample = json.load(open(full_path)) 228 | if ( 229 | sample["source_data"]["lang_cluster"] 230 | not in LANG_CLUSTER_TO_LANG_COMPILER 231 | ): 232 | continue 233 | compiler = LANG_CLUSTER_TO_LANG_COMPILER[ 234 | sample["source_data"]["lang_cluster"] 235 | ] 236 | if compiler != debug_compiler: 237 | continue 238 | all_samples.append(sample) 239 | future_to_val_results = { 240 | thread_executor.submit(process, args) 241 | for args in itertools.product(all_samples, [execeval]) 242 | } 243 | 244 | for _out in tqdm.tqdm( 245 | concurrent.futures.as_completed(future_to_val_results), 246 | total=len(all_samples), 247 | desc=f"{debug_compiler}", 248 | ): 249 | try: 250 | __out = _out.result() 251 | jwp.write(__out) 252 | except Exception as emsg: 253 | print("Exception msg: {}".format(emsg)) 254 | pass 255 | 256 | 257 | if __name__ == 
"__main__": 258 | main() 259 | -------------------------------------------------------------------------------- /evaluation/code_translation/eval_code_translation.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import tqdm 4 | import jsonlines 5 | import datasets 6 | import concurrent.futures 7 | from dataclasses import dataclass, field 8 | import itertools 9 | from collections import defaultdict 10 | 11 | import requests 12 | from typing import List, Optional, Union, Tuple 13 | from enum import Enum 14 | from multiprocessing import Pool 15 | 16 | 17 | class ExecOutcome(Enum): 18 | PASSED = "PASSED" # code executes and output matches expected output 19 | WRONG_ANSWER = ( 20 | "WRONG_ANSWER" # code executes and output does NOT matches expected output 21 | ) 22 | TIME_LIMIT_EXCEEDED = "TIME_LIMIT_EXCEEDED" # code executes and didn't exit in time, output is ignored in this case 23 | RUNTIME_ERROR = "RUNTIME_ERROR" # code failed to execute (crashed) 24 | COMPILATION_ERROR = "COMPILATION_ERROR" # code failed to compile 25 | MEMORY_LIMIT_EXCEEDED = ( 26 | "MEMORY_LIMIT_EXCEEDED" # code exceeded memory limit during execution 27 | ) 28 | 29 | 30 | @dataclass 31 | class ExtendedUnittest: 32 | input: str 33 | output: List[str] = field(default_factory=list) 34 | result: Optional[str] = None 35 | exec_outcome: Optional[ExecOutcome] = None 36 | 37 | def json(self): 38 | _json = self.__dict__ 39 | if self.exec_outcome is not None: 40 | _json["exec_outcome"] = self.exec_outcome.name 41 | 42 | return _json 43 | 44 | @classmethod 45 | def from_json(cls, _json): 46 | return cls( 47 | input=_json.get("input", ""), 48 | output=_json.get("output", list()), 49 | result=_json.get("result", None), 50 | exec_outcome=_json.get("exec_outcome", None), 51 | ) 52 | 53 | 54 | class EmptyValueError(Exception): 55 | def __init__(self, *args, **kwargs): 56 | super().__init__(*args, **kwargs) 57 | 58 | 59 | class 
EmptyUnittestError(EmptyValueError): 60 | pass 61 | 62 | 63 | class EmptyLanguageError(EmptyValueError): 64 | pass 65 | 66 | 67 | class EmptySourceCodeError(EmptyValueError): 68 | pass 69 | 70 | 71 | class APICommunication: 72 | _session: requests.Session 73 | 74 | def __init__(self, server_url: str = "http://localhost:5000"): 75 | self._session = requests.Session() 76 | self.execute_code_url = f"{server_url}/api/execute_code" 77 | self.get_runtimes_url = f"{server_url}/api/all_runtimes" 78 | 79 | def __enter__(self): 80 | return self 81 | 82 | def __exit__(self, *args): 83 | self._session.close() 84 | 85 | def get_runtimes(self): 86 | return self._session.get(self.get_runtimes_url).json() 87 | 88 | def execute_code( 89 | self, 90 | language: str, 91 | source_code: str, 92 | unittests: List[dict], 93 | limits: Optional[dict] = None, 94 | block_network: bool = True, 95 | stop_on_first_fail: bool = True, 96 | use_sanitizer: bool = False, 97 | compiler_program_name: Optional[str] = None, 98 | compiler_flags: Optional[str] = None, 99 | interpreter_cmd: Optional[str] = None, 100 | interpreter_flags: Optional[str] = None, 101 | sample_id: Optional[int] = None, 102 | task_id: Union[str, int, None] = None, 103 | ) -> Tuple[List[ExtendedUnittest], Optional[int], Union[str, int, None]]: 104 | if language is None: 105 | raise EmptyLanguageError 106 | 107 | if source_code is None: 108 | raise EmptySourceCodeError 109 | 110 | if unittests is None or len(unittests) == 0: 111 | raise EmptyUnittestError 112 | 113 | request_body = dict( 114 | language=language, 115 | source_code=source_code, 116 | unittests=unittests, 117 | limits=limits if isinstance(limits, dict) else None, 118 | compile_cmd=compiler_program_name, 119 | compile_flags=compiler_flags, 120 | execute_cmd=interpreter_cmd, 121 | execute_flags=interpreter_flags, 122 | block_network=block_network, 123 | stop_on_first_fail=stop_on_first_fail, 124 | use_sanitizer=use_sanitizer, 125 | ) 126 | json_response = 
self._session.post( 127 | self.execute_code_url, 128 | json=request_body, 129 | headers={"Content-Type": "application/json"}, 130 | ).json() 131 | 132 | if "data" not in json_response: 133 | return json_response, sample_id, task_id 134 | 135 | return ( 136 | json_response["data"], 137 | sample_id, 138 | task_id, 139 | ) 140 | 141 | 142 | def get_idx(file_name): 143 | return int(file_name.split(".json")[0].split("_")[0]) 144 | 145 | 146 | def sanitize_code(code): 147 | FLAG = True 148 | while FLAG == True: 149 | FLAG = False 150 | if code.startswith("```"): 151 | FLAG = True 152 | code = code.replace("```", "", 1) 153 | last_index = code.rfind("```") 154 | if last_index != -1: 155 | FLAG = True 156 | code = code[:last_index] + "" + code[last_index + len("```") :] 157 | if code.startswith("cpp"): 158 | FLAG = True 159 | code = code.replace("cpp", "", 1) 160 | return code 161 | 162 | 163 | def fix_uts(uts): 164 | uts_fx = [] 165 | for ut in uts: 166 | uts_fx.append( 167 | { 168 | "input": ut["input"], 169 | "output": ut["output"], 170 | } 171 | ) 172 | return uts_fx 173 | 174 | 175 | def process(args): 176 | sample, execeval = args 177 | src_uid = sample["source_data"]["src_uid"] 178 | unit_tests = json.loads(sample["source_data"]["hidden_unit_tests"]) 179 | compiler = LANG_CLUSTER_TO_LANG_COMPILER[sample["source_data"]["target_lang"]] 180 | sample["unit_test_results"] = list() 181 | for choice in sample["oai_response"]["choices"]: 182 | code = choice["message"]["content"] 183 | code = sanitize_code(code) 184 | unit_test_results, _, _ = execeval.execute_code( 185 | compiler, 186 | code, 187 | fix_uts(unit_tests), 188 | task_id=src_uid, # stop_on_first_fail=False 189 | ) 190 | # print(unit_test_results) 191 | # print(file, code, [e['exec_outcome'] for e in unit_test_results]) 192 | sample["unit_test_results"].append(unit_test_results) 193 | return sample 194 | 195 | 196 | LANG_CLUSTER_TO_LANG_COMPILER = { 197 | "C": "GNU C11", 198 | "C#": "Mono C#", 199 | "C++": "GNU 
C++17", 200 | "Go": "Go", 201 | "Java": "Java 17", 202 | "Javascript": "Node.js", 203 | "Kotlin": "Kotlin 1.4", 204 | "PHP": "PHP", 205 | "Python": "PyPy 3", 206 | "Ruby": "Ruby 3", 207 | "Rust": "Rust 2018", 208 | } 209 | 210 | 211 | def main(): 212 | parent_path = ( 213 | f'{os.environ["DUMP_FOLDER"]}/oai/code_translation_n_sample_20/' 214 | ) 215 | for split in ("compact", "compact_small"): 216 | path = os.path.join(parent_path, split) 217 | for k, debug_compiler in LANG_CLUSTER_TO_LANG_COMPILER.items(): 218 | output_path = os.path.join(path, f"eval_code_translation_{split}_execeval") 219 | os.makedirs(output_path, exist_ok=True) 220 | output_file = os.path.join(output_path, f"{debug_compiler}.jsonl") 221 | with jsonlines.open(output_file, "w") as jwp: 222 | with concurrent.futures.ThreadPoolExecutor( 223 | max_workers=129 224 | ) as thread_executor: 225 | files = sorted(os.listdir(path)) 226 | with APICommunication( 227 | server_url="http://localhost:5000" 228 | ) as execeval: 229 | all_samples = [] 230 | for file in files: 231 | full_path = os.path.join(path, file) 232 | if os.path.isdir(full_path): 233 | continue 234 | sample = json.load(open(full_path)) 235 | if ( 236 | sample["source_data"]["target_lang"] 237 | not in LANG_CLUSTER_TO_LANG_COMPILER 238 | ): 239 | continue 240 | compiler = LANG_CLUSTER_TO_LANG_COMPILER[ 241 | sample["source_data"]["target_lang"] 242 | ] 243 | if compiler != debug_compiler: 244 | continue 245 | all_samples.append(sample) 246 | future_to_val_results = { 247 | thread_executor.submit(process, args) 248 | for args in itertools.product(all_samples, [execeval]) 249 | } 250 | for _out in tqdm.tqdm( 251 | concurrent.futures.as_completed(future_to_val_results), 252 | total=len(all_samples), 253 | desc=f"{debug_compiler}", 254 | ): 255 | # try: 256 | __out = _out.result() 257 | jwp.write(__out) 258 | # except Exception as emsg: 259 | # print("Exception msg: {}".format(emsg)) 260 | # pass 261 | 262 | 263 | if __name__ == "__main__": 264 
| main() -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # xCodeEval 2 | [xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval](https://arxiv.org/abs/2303.03004) 3 | 4 | # Update: 5 | 6 | - September 18, 2024: Evaluation code for generative tasks released. [Follow it here](#evaluation) 7 | - Nov 7, 2023: A sample eval script is [here](https://github.com/ntunlp/xCodeEval/pull/8). 8 | - July 13, 2023: StarEncoder retrieval model released. [Follow it here](#additional-resources) 9 | - Jul 6, 2023: [ExecEval](https://github.com/ntunlp/ExecEval) has been updated with changes for Java, Kotlin, and Go. Please `git pull`, `docker build`, `docker run` for the latest updates. 10 | 11 | 12 | We introduce **xCodeEval**, the largest executable multilingual multitask benchmark to date, consisting of 25M document-level coding examples from about 7.5K unique problems covering up to 17 programming languages with execution-level parallelism. It features a total of seven tasks involving code understanding, generation, translation and retrieval, and it employs an execution-based evaluation. We develop a test-case-based multilingual code execution engine, [**ExecEval**](https://github.com/ntunlp/ExecEval), that supports all the programming languages in **xCodeEval**. We also propose a novel data splitting and data selection schema for balancing data distributions over multiple attributes based on geometric mean and graph-theoretic principles. 13 | 14 | This repository contains the sample code and data links for the xCodeEval [paper](https://arxiv.org/abs/2303.03004). 
15 | 16 | # Data Download 17 | 18 | [Huggingface-dataset](https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval) 19 | 20 | Currently this repository supports the huggingface [`load_dataset()`](https://huggingface.co/docs/datasets/v1.11.0/package_reference/loading_methods.html#datasets.load_dataset) API. Follow the example below to load the dataset for each individual task. 21 | 22 | ``` 23 | import datasets 24 | 25 | prog_synthesis_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "program_synthesis") 26 | code_translation_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "code_translation") 27 | tag_classification_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "tag_classification") 28 | apr_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "apr") 29 | code_compilation_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "code_compilation") 30 | retrieval_code_code_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "retrieval_code_code") 31 | retrieval_nl_code_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "retrieval_nl_code") 32 | retrieval_corpus_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "retrieval_corpus") 33 | 34 | ``` 35 | 36 | ## HF large data download tricks 37 | 38 | 39 | If you are facing a long delay with data processing, add `ignore_verifications=True`. 40 | 41 | ``` 42 | prog_synthesis_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "program_synthesis", ignore_verifications=True) 43 | ``` 44 | 45 | If you are facing a long delay with data downloading, use huggingface streaming mode. 46 | 47 | ``` 48 | prog_synthesis_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "program_synthesis", streaming=True) 49 | ``` 50 | 51 | ## Just Give me the raw data (😠) 52 | 53 | The data can also be downloaded as a git LFS repo from huggingface. 
54 | 55 | ![xCodeEval_hf](https://github.com/ntunlp/xCodeEval/blob/main/xcodeeval-hf.png?raw=true) 56 | 57 | You can download the full data using the following command. 58 | 59 | ``` 60 | GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval 61 | cd xCodeEval 62 | git lfs pull 63 | ``` 64 | 65 | To download a specific part of the dataset, 66 | 67 | ``` 68 | GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval 69 | cd xCodeEval 70 | git lfs pull --include "apr/test/*" 71 | ``` 72 | 73 | We propose 7 Tasks. 74 | 75 | 1. [Tag Classification](./tag_classification.md) 76 | 2. [Code Compilation](./code_compilation.md) 77 | 3. [Program Synthesis](./program_synthesis.md) 78 | 4. [Code Translation](./code_translation.md) 79 | 5. [Automatic Program Repair](./apr.md) 80 | 6. [Code-Code Retrieval](./retrieval.md) 81 | 7. [NL-Code Retrieval](./retrieval.md) 82 | 83 | # Evaluation 84 | For details on evaluation please follow instructions from [evaluation/README.md](./evaluation/README.md). 85 | 86 | # Common Data for different tasks 87 | 88 | ![xCodeEval_fig_1](xcodeeval_fig_1.png) 89 | 90 | We have two data files that are required for multiple tasks. 91 | 92 | 1. `problem_descriptions.jsonl` 93 | 2. `unittest_db.json` 94 | 95 | You can find these two files in the root directory of the [main](https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval/tree/main) branch of huggingface dataset repository. To avoid data redundancy we didn't include these data with the relevant tasks, rather we add a unique id `src_uid` to retrieve these data. 96 | 97 | ## Structure of `problem_descriptions.jsonl` 98 | 99 | A sample, 100 | 101 | ```json 102 | { 103 | "description": "There are $$$n$$$ positive integers $$$a_1, a_2, \\dots, a_n$$$. 
For the one move you can choose any even value $$$c$$$ and divide by two all elements that equal $$$c$$$.For example, if $$$a=[6,8,12,6,3,12]$$$ and you choose $$$c=6$$$, and $$$a$$$ is transformed into $$$a=[3,8,12,3,3,12]$$$ after the move.You need to find the minimal number of moves for transforming $$$a$$$ to an array of only odd integers (each element shouldn't be divisible by $$$2$$$).", 104 | "input_from": "standard input", 105 | "output_to": "standard output", 106 | "time_limit": "3 seconds", 107 | "memory_limit": "256 megabytes", 108 | "input_spec": "The first line of the input contains one integer $$$t$$$ ($$$1 \\le t \\le 10^4$$$) \u2014 the number of test cases in the input. Then $$$t$$$ test cases follow. The first line of a test case contains $$$n$$$ ($$$1 \\le n \\le 2\\cdot10^5$$$) \u2014 the number of integers in the sequence $$$a$$$. The second line contains positive integers $$$a_1, a_2, \\dots, a_n$$$ ($$$1 \\le a_i \\le 10^9$$$). The sum of $$$n$$$ for all test cases in the input doesn't exceed $$$2\\cdot10^5$$$.", 109 | "output_spec": "For $$$t$$$ test cases print the answers in the order of test cases in the input. The answer for the test case is the minimal number of moves needed to make all numbers in the test case odd (i.e. not divisible by $$$2$$$).", 110 | "notes": "NoteIn the first test case of the example, the optimal sequence of moves can be as follows: before making moves $$$a=[40, 6, 40, 3, 20, 1]$$$; choose $$$c=6$$$; now $$$a=[40, 3, 40, 3, 20, 1]$$$; choose $$$c=40$$$; now $$$a=[20, 3, 20, 3, 20, 1]$$$; choose $$$c=20$$$; now $$$a=[10, 3, 10, 3, 10, 1]$$$; choose $$$c=10$$$; now $$$a=[5, 3, 5, 3, 5, 1]$$$ \u2014 all numbers are odd. Thus, all numbers became odd after $$$4$$$ moves. 
In $$$3$$$ or fewer moves, you cannot make them all odd.", 111 | "sample_inputs": [ 112 | "4\n6\n40 6 40 3 20 1\n1\n1024\n4\n2 4 8 16\n3\n3 1 7" 113 | ], 114 | "sample_outputs": [ 115 | "4\n10\n4\n0" 116 | ], 117 | "tags": [ 118 | "number theory", 119 | "greedy" 120 | ], 121 | "src_uid": "afcd41492158e68095b01ff1e88c3dd4", 122 | "difficulty": 1200, 123 | "created_at": 1576321500 124 | } 125 | ``` 126 | 127 | ### Key Definitions 128 | 129 | 1. `description`: Problem description in textual format, math operations are written in latex. 130 | 2. `input_from`: How the program should take the unit test. 131 | 3. `output_to`: Where the program should output the result of the unit test. 132 | 4. `time_limit`: Time limit to solve the problem. 133 | 5. `memory_limit`: Memory limit to solve the problem. 134 | 6. `input_spec`: How and in what order the input will be given to the program. It also includes the data ranges, types, and sizes. 135 | 7. `output_spec`: How the outputs should be printed. Most of the time the unit test results are matched with an *exact string match* or *floating point comparison* with a precision boundary. 136 | 8. `sample_inputs`: A sample input for the code that is expected to solve the problem described in `description`. 137 | 9. `sample_outputs`: The expected output for the `sample_input` that is expected to solve the problem described in `description`. 138 | 10. `notes`: Explanation of `sample_inputs` & `sample_outputs`. 139 | 11. `tags`: The problem categories. 140 | 12. `src_uid`: The unique id of the problem. This ID is referred to in the task data samples instead of duplicating all this information. 141 | 13. `difficulty`: How difficult the problem is for a human to solve (annotated by an expert human). 142 | 14. `created_at`: The Unix timestamp when the problem was released. Use the `datetime` lib in Python to parse it to a human-readable format. 
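As a quick illustration, the `created_at` value from the sample above can be parsed with only the standard library (a minimal sketch; the timestamp is the one shown in the sample):

```python
import datetime

# `created_at` from the sample problem above (a Unix timestamp, in seconds).
created_at = 1576321500

# Convert it to a human-readable UTC datetime.
released = datetime.datetime.fromtimestamp(created_at, tz=datetime.timezone.utc)
print(released.isoformat())  # 2019-12-14T11:05:00+00:00
```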
143 | 144 | ## Structure of `unittest_db.json` 145 | 146 | The structure of the `json` file, 147 | 148 | ```python 149 | unittest_db = { 150 | "db884d679d9cfb1dc4bc511f83beedda" : [ 151 | { 152 | "input": "4\r\n3 2 3 2\r\n", 153 | "output": [ 154 | "1" 155 | ], 156 | }, 157 | { 158 | ... 159 | }, 160 | ... 161 | ], 162 | "3bc096d8cd3418948d5be6bf297aa9b5": [ 163 | ... 164 | ], 165 | ... 166 | } 167 | ``` 168 | 169 | ### Key Definitions 170 | 171 | 1. `unittest_db.json` dict keys, e.g., `db884d679d9cfb1dc4bc511f83beedda`, are the `src_uid` values from `problem_descriptions.jsonl`. 172 | 2. `input`: Input of the unit test. 173 | 3. `output`: List of expected outputs for the unit test. 174 | 175 | 176 | # Additional Resources 177 | 178 | 1. **nl-code**: [NTU-NLP-sg/xCodeEval-nl-code-starencoder-ckpt-37](https://huggingface.co/NTU-NLP-sg/xCodeEval-nl-code-starencoder-ckpt-37) 179 | 2. **code-code**: [NTU-NLP-sg/xCodeEval-code-code-starencoder-ckpt-37](https://huggingface.co/NTU-NLP-sg/xCodeEval-code-code-starencoder-ckpt-37) 180 | 181 | 182 | # License 183 | 184 | This repository is under the [MIT](https://github.com/ntunlp/xCodeEval/blob/main/LICENSE) license, but the data is distributed under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license. 185 | 186 | # Citation 187 | 188 | ``` 189 | @misc{khan2023xcodeeval, 190 | title={xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval}, 191 | author={Mohammad Abdullah Matin Khan and M Saiful Bari and Xuan Long Do and Weishi Wang and Md Rizwan Parvez and Shafiq Joty}, 192 | year={2023}, 193 | eprint={2303.03004}, 194 | archivePrefix={arXiv}, 195 | primaryClass={cs.CL} 196 | } 197 | ``` 198 | 199 | Part of this work was submitted as a requirement for the Master of Science degree in Computer Science and Applications at the Islamic University of Technology by Muhammad Abdullah Matin Khan Zarzis. 
(The thesis or project report will be added upon publication). 200 | 201 | ``` 202 | @misc{khan2024xcodeeval, 203 | title={Development of a Code Search Engine Using Natural Language Processing Techniques}, 204 | author={Mohammad Abdullah Matin Khan}, 205 | year={2024}, 206 | publication={Journal of Engineering and Technology (JET)}, 207 | url={TBA} 208 | } 209 | ``` 210 | -------------------------------------------------------------------------------- /code_translation.md: -------------------------------------------------------------------------------- 1 | # Code Translation Task 2 | 3 | ## Download data using huggingface `load_dataset()` 4 | 5 | ``` 6 | >>> import datasets 7 | >>> code_translation_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "code_translation") 8 | >>> print(code_translation_dataset) 9 | 10 | DatasetDict({ 11 | train: Dataset({ 12 | features: ['prob_desc_memory_limit', 'prob_desc_sample_inputs', 'prob_desc_output_spec', 'file_name', 'code_uid', 'lang_cluster', 'prob_desc_sample_outputs', 'prob_desc_description', 'prob_desc_output_to', 'lang', 'prob_desc_notes', 'prob_desc_created_at', 'source_code', 'exec_outcome', 'prob_desc_input_from', 'difficulty', 'src_uid', 'prob_desc_input_spec', 'prob_desc_time_limit', 'hidden_unit_tests'], 13 | num_rows: 5538841 14 | }) 15 | validation: Dataset({ 16 | features: ['prob_desc_memory_limit', 'prob_desc_sample_inputs', 'prob_desc_output_spec', 'file_name', 'code_uid', 'lang_cluster', 'prob_desc_sample_outputs', 'prob_desc_description', 'prob_desc_output_to', 'lang', 'prob_desc_notes', 'prob_desc_created_at', 'source_code', 'exec_outcome', 'prob_desc_input_from', 'difficulty', 'src_uid', 'prob_desc_input_spec', 'prob_desc_time_limit', 'hidden_unit_tests'], 17 | num_rows: 7034 18 | }) 19 | test: Dataset({ 20 | features: ['prob_desc_memory_limit', 'prob_desc_sample_inputs', 'prob_desc_output_spec', 'file_name', 'code_uid', 'lang_cluster', 'prob_desc_sample_outputs', 'prob_desc_description', 
'prob_desc_output_to', 'lang', 'prob_desc_notes', 'prob_desc_created_at', 'source_code', 'exec_outcome', 'prob_desc_input_from', 'difficulty', 'src_uid', 'prob_desc_input_spec', 'prob_desc_time_limit', 'hidden_unit_tests'], 21 | num_rows: 20356 22 | }) 23 | validation_small: Dataset({ 24 | features: ['prob_desc_memory_limit', 'prob_desc_sample_inputs', 'prob_desc_output_spec', 'file_name', 'code_uid', 'lang_cluster', 'prob_desc_sample_outputs', 'prob_desc_description', 'prob_desc_output_to', 'lang', 'prob_desc_notes', 'prob_desc_created_at', 'source_code', 'exec_outcome', 'prob_desc_input_from', 'difficulty', 'src_uid', 'prob_desc_input_spec', 'prob_desc_time_limit', 'hidden_unit_tests'], 25 | num_rows: 440 26 | }) 27 | }) 28 | ``` 29 | 30 | ## Download data using git LFS 31 | 32 | When loading with the huggingface `load_dataset()` API, no additional data linking is needed. But if you are downloading the data in raw `*.jsonl` format, you need to link the proper fields for the task. To link the data, use `src_uid` to match rows from `problem_descriptions.jsonl` and `unittest_db.json`. 
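For the raw-`jsonl` route, the linking step can be sketched as follows. This is a minimal illustration with tiny stand-in rows (the `src_uid` value `abc123` and the helper `link_sample` are our own invention; the `prob_desc_` prefixing mirrors the field names that the huggingface loader produces):

```python
import json

def link_sample(sample, problems_by_uid, unittest_db):
    """Attach the matching problem description and unit tests to a task row."""
    uid = sample["src_uid"]
    linked = dict(sample)
    # Copy the problem-description fields under a prob_desc_ prefix.
    for key, value in problems_by_uid[uid].items():
        if key != "src_uid":
            linked[f"prob_desc_{key}"] = value
    # Unit tests live in unittest_db.json, keyed by src_uid as well.
    linked["hidden_unit_tests"] = unittest_db.get(uid, [])
    return linked

# Tiny stand-in rows. In practice, build `problems_by_uid` by reading
# problem_descriptions.jsonl line by line with json.loads, and load
# unittest_db.json with json.load.
problems_by_uid = {
    "abc123": {"src_uid": "abc123", "difficulty": 1200, "description": "..."},
}
unittest_db = {"abc123": [{"input": "4\n", "output": ["1"]}]}
sample = {"src_uid": "abc123", "lang": "GNU C++17", "source_code": "..."}

linked = link_sample(sample, problems_by_uid, unittest_db)
print(linked["prob_desc_difficulty"], len(linked["hidden_unit_tests"]))  # 1200 1
```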
33 | 34 | To download the code translation data, 35 | 36 | ``` 37 | GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval 38 | cd xCodeEval 39 | git lfs pull --include "code_translation/*" 40 | git lfs pull --include "problem_descriptions.jsonl" 41 | git lfs pull --include "unittest_db.json" 42 | ``` 43 | 44 | 45 | ## A Sample from train/validation/test split 46 | 47 | ``` 48 | { 49 | "lang": "GNU C++11", 50 | "source_code": "#include\nint main()\n{\n long long n,a,b,i,maxb;\n scanf(\"%I64d %I64d %I64d\",&n,&a,&b);\n maxb=b;\n long long x[n];\n for(i=0;i<=n-1;i++)\n {\n scanf(\"%I64d\",&x[i]);\n }\n for(i=0;i<=n-1;i++)\n {\n if(x[i]==0)\n {\n if(b>0)\n b=b-1;\n else if(a>0)\n a=a-1;\n }\n else if(x[i]==1)\n {\n if(a>0&&b0)\n b=b-1;\n }\n if(a==0&&b==0)\n break;\n }\n if(i==n)\n i=i-1;\n printf(\"%I64d\\n\",i+1);\n return 0;\n}//2019-10-08 13:28:20.149", 51 | "lang_cluster": "C++", 52 | "tags": [ 53 | "greedy" 54 | ], 55 | "code_uid": "80765395fd5eb715873f3eaa3e1d36f1", 56 | "src_uid": "75ef1f52ef3a86992159eef566dddc89", 57 | "difficulty": 1500.0, 58 | "exec_outcome": "PASSED" 59 | } 60 | ``` 61 | 62 | 63 | ## Key Definitions 64 | 65 | 1. `lang`: Runtime/Compiler version of the `source_code`. 66 | 2. `source_code`: A program. 67 | 3. `code_uid`: A unique ID for the sample. It is not important for model training. If you find any issue with the sample, you can report it to us mentioning the `code_uid`. 68 | 4. `src_uid`: A specific identifier that shows which problem the code is associated with. This identifier is **important** for the training of the model. The problem referred to by the `src_uid` provides a natural description of the problem that the code successfully solved. Refer to [Structure of `problem_descriptions.jsonl`](./README.md#structure-of-problem_descriptionsjsonl) 69 | 5. `difficulty`: Difficulty rating of the problem indicated by `src_uid`. The higher the harder. 70 | 6. `exec_outcome`: Execution outcome status. 
Follow [Section 4.1](https://arxiv.org/pdf/2303.03004.pdf) for the list of possible outcomes. The `exec_outcome` flags in the training data come from a pre-run environment. However, the training data doesn't include unit tests, to avoid potential hacks. We provide unit tests only for the dev and test data. 71 | 7. `lang_cluster`: The generic programming language that the value of `lang` belongs to. 72 | 73 | The following keys will come from `problem_descriptions.jsonl` by matching `src_uid`, 74 | 75 | 8. `prob_desc_description`: Problem description in textual format, math operations are written in latex. 76 | 9. `prob_desc_input_from`: How the program should take the unit test. 77 | 10. `prob_desc_output_to`: Where the program should output the result of the unit test. 78 | 11. `prob_desc_time_limit`: Time limit to solve the problem. 79 | 12. `prob_desc_memory_limit`: Memory limit to solve the problem. 80 | 13. `prob_desc_input_spec`: How and in what order the input will be given to the program. It also includes the data ranges, types, and sizes. 81 | 14. `prob_desc_output_spec`: How the outputs should be printed. Most of the time the unit test results are matched with an *exact string match* or *floating point comparison* with a precision boundary. 82 | 15. `prob_desc_sample_inputs`: A sample input for the code that is expected to solve the problem described in `description`. 83 | 16. `prob_desc_sample_outputs`: The expected output for the `sample_input` that is expected to solve the problem described in `description`. 84 | 17. `prob_desc_notes`: Explanation of `sample_inputs` & `sample_outputs`. 85 | 18. `prob_desc_created_at`: The Unix timestamp when the problem was released. Use the `datetime` lib in Python to parse it to a human-readable format. 86 | 87 | Source information comes from the name of the `*.jsonl` file. 88 | 19. `file_name`: Name of the source `jsonl` file from which the data is loaded. 
89 | 90 | Unit test information will come from `unittest_db.json` by matching `src_uid`. 91 | 20. `hidden_unit_tests`: A list of unit tests returned as a string. Use `json.loads(hidden_unit_tests)` to load the data. 92 | 93 | ## Definition of parallelism 94 | 95 | Dev and test data don't require parallel counterparts, since the model has to generate a code segment that will be evaluated by unit tests. 96 | Creating training data for translation, however, can be tricky and may require creative solutions. All samples that share the same `src_uid` are parallel, so taking every possible pairwise combination would yield an extremely large dataset that may not be viable to train on. The authors of this benchmark therefore expect users to come up with their own ideas for pairing up solutions in different programming languages. No hints here, be creative! 97 | 98 | ## MD5 hash of the data 99 | 100 | Run the following, 101 | 102 | ``` 103 | cd xCodeEval/ 104 | tar c code_translation | md5sum 105 | ``` 106 | 107 | Output should match, `a6e72db3f75d9d8fc3d11f7f597c7824`. 108 | 109 | 110 | ## Tree 111 | 112 | 3 directories, 133 files 113 | 114 | ``` 115 | . 
116 | ├── test 117 | │ ├── C#.jsonl 118 | │ ├── C++.jsonl 119 | │ ├── C.jsonl 120 | │ ├── Go.jsonl 121 | │ ├── Java.jsonl 122 | │ ├── Javascript.jsonl 123 | │ ├── Kotlin.jsonl 124 | │ ├── PHP.jsonl 125 | │ ├── Python.jsonl 126 | │ ├── Ruby.jsonl 127 | │ └── Rust.jsonl 128 | ├── train 129 | │ ├── train_000.jsonl 130 | │ ├── train_001.jsonl 131 | │ ├── train_002.jsonl 132 | │ ├── train_003.jsonl 133 | │ ├── train_004.jsonl 134 | │ ├── train_005.jsonl 135 | │ ├── train_006.jsonl 136 | │ ├── train_007.jsonl 137 | │ ├── train_008.jsonl 138 | │ ├── train_009.jsonl 139 | │ ├── train_010.jsonl 140 | │ ├── train_011.jsonl 141 | │ ├── train_012.jsonl 142 | │ ├── train_013.jsonl 143 | │ ├── train_014.jsonl 144 | │ ├── train_015.jsonl 145 | │ ├── train_016.jsonl 146 | │ ├── train_017.jsonl 147 | │ ├── train_018.jsonl 148 | │ ├── train_019.jsonl 149 | │ ├── train_020.jsonl 150 | │ ├── train_021.jsonl 151 | │ ├── train_022.jsonl 152 | │ ├── train_023.jsonl 153 | │ ├── train_024.jsonl 154 | │ ├── train_025.jsonl 155 | │ ├── train_026.jsonl 156 | │ ├── train_027.jsonl 157 | │ ├── train_028.jsonl 158 | │ ├── train_029.jsonl 159 | │ ├── train_030.jsonl 160 | │ ├── train_031.jsonl 161 | │ ├── train_032.jsonl 162 | │ ├── train_033.jsonl 163 | │ ├── train_034.jsonl 164 | │ ├── train_035.jsonl 165 | │ ├── train_036.jsonl 166 | │ ├── train_037.jsonl 167 | │ ├── train_038.jsonl 168 | │ ├── train_039.jsonl 169 | │ ├── train_040.jsonl 170 | │ ├── train_041.jsonl 171 | │ ├── train_042.jsonl 172 | │ ├── train_043.jsonl 173 | │ ├── train_044.jsonl 174 | │ ├── train_045.jsonl 175 | │ ├── train_046.jsonl 176 | │ ├── train_047.jsonl 177 | │ ├── train_048.jsonl 178 | │ ├── train_049.jsonl 179 | │ ├── train_050.jsonl 180 | │ ├── train_051.jsonl 181 | │ ├── train_052.jsonl 182 | │ ├── train_053.jsonl 183 | │ ├── train_054.jsonl 184 | │ ├── train_055.jsonl 185 | │ ├── train_056.jsonl 186 | │ ├── train_057.jsonl 187 | │ ├── train_058.jsonl 188 | │ ├── train_059.jsonl 189 | │ ├── train_060.jsonl 190 | 
│ ├── train_061.jsonl 191 | │ ├── train_062.jsonl 192 | │ ├── train_063.jsonl 193 | │ ├── train_064.jsonl 194 | │ ├── train_065.jsonl 195 | │ ├── train_066.jsonl 196 | │ ├── train_067.jsonl 197 | │ ├── train_068.jsonl 198 | │ ├── train_069.jsonl 199 | │ ├── train_070.jsonl 200 | │ ├── train_071.jsonl 201 | │ ├── train_072.jsonl 202 | │ ├── train_073.jsonl 203 | │ ├── train_074.jsonl 204 | │ ├── train_075.jsonl 205 | │ ├── train_076.jsonl 206 | │ ├── train_077.jsonl 207 | │ ├── train_078.jsonl 208 | │ ├── train_079.jsonl 209 | │ ├── train_080.jsonl 210 | │ ├── train_081.jsonl 211 | │ ├── train_082.jsonl 212 | │ ├── train_083.jsonl 213 | │ ├── train_084.jsonl 214 | │ ├── train_085.jsonl 215 | │ ├── train_086.jsonl 216 | │ ├── train_087.jsonl 217 | │ ├── train_088.jsonl 218 | │ ├── train_089.jsonl 219 | │ ├── train_090.jsonl 220 | │ ├── train_091.jsonl 221 | │ ├── train_092.jsonl 222 | │ ├── train_093.jsonl 223 | │ ├── train_094.jsonl 224 | │ ├── train_095.jsonl 225 | │ ├── train_096.jsonl 226 | │ ├── train_097.jsonl 227 | │ ├── train_098.jsonl 228 | │ ├── train_099.jsonl 229 | │ ├── train_100.jsonl 230 | │ ├── train_101.jsonl 231 | │ ├── train_102.jsonl 232 | │ ├── train_103.jsonl 233 | │ ├── train_104.jsonl 234 | │ ├── train_105.jsonl 235 | │ ├── train_106.jsonl 236 | │ ├── train_107.jsonl 237 | │ ├── train_108.jsonl 238 | │ ├── train_109.jsonl 239 | │ └── train_110.jsonl 240 | └── validation 241 | ├── C#.jsonl 242 | ├── C++.jsonl 243 | ├── C.jsonl 244 | ├── Go.jsonl 245 | ├── Java.jsonl 246 | ├── Javascript.jsonl 247 | ├── Kotlin.jsonl 248 | ├── PHP.jsonl 249 | ├── Python.jsonl 250 | ├── Ruby.jsonl 251 | └── Rust.jsonl 252 | ``` 253 | -------------------------------------------------------------------------------- /program_synthesis.md: -------------------------------------------------------------------------------- 1 | # Program Synthesis Task 2 | 3 | ## Download data using huggingface `load_dataset()` 4 | 5 | ``` 6 | >>> import datasets 7 | >>> 
program_synthesis_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "program_synthesis") 8 | >>> print(program_synthesis_dataset) 9 | 10 | DatasetDict({ 11 | train: Dataset({ 12 | features: ['prob_desc_output_to', 'prob_desc_created_at', 'file_name', 'prob_desc_memory_limit', 'code_uid', 'prob_desc_sample_inputs', 'prob_desc_description', 'difficulty', 'prob_desc_sample_outputs', 'prob_desc_time_limit', 'source_code', 'prob_desc_input_spec', 'prob_desc_notes', 'src_uid', 'lang', 'prob_desc_input_from', 'tags', 'prob_desc_output_spec', 'exec_outcome', 'lang_cluster', 'hidden_unit_tests'], 13 | num_rows: 5538841 14 | }) 15 | validation: Dataset({ 16 | features: ['prob_desc_output_to', 'prob_desc_created_at', 'file_name', 'prob_desc_memory_limit', 'code_uid', 'prob_desc_sample_inputs', 'prob_desc_description', 'difficulty', 'prob_desc_sample_outputs', 'prob_desc_time_limit', 'source_code', 'prob_desc_input_spec', 'prob_desc_notes', 'src_uid', 'lang', 'prob_desc_input_from', 'tags', 'prob_desc_output_spec', 'exec_outcome', 'lang_cluster', 'hidden_unit_tests'], 17 | num_rows: 106 18 | }) 19 | test: Dataset({ 20 | features: ['prob_desc_output_to', 'prob_desc_created_at', 'file_name', 'prob_desc_memory_limit', 'code_uid', 'prob_desc_sample_inputs', 'prob_desc_description', 'difficulty', 'prob_desc_sample_outputs', 'prob_desc_time_limit', 'source_code', 'prob_desc_input_spec', 'prob_desc_notes', 'src_uid', 'lang', 'prob_desc_input_from', 'tags', 'prob_desc_output_spec', 'exec_outcome', 'lang_cluster', 'hidden_unit_tests'], 21 | num_rows: 952 22 | }) 23 | }) 24 | ``` 25 | 26 | ## Download data using git LFS 27 | 28 | When loading with the huggingface `load_dataset()` API, no additional data linking is needed. But if you are downloading the data in raw `*.jsonl` format, you need to link the proper fields for the task. To link the data, use `src_uid` to match rows from `problem_descriptions.jsonl` and `unittest_db.json`. 
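Whether you link `unittest_db.json` yourself or read the HF-loaded dev/test splits, the unit tests arrive as a JSON-encoded string in `hidden_unit_tests`. A minimal decoding sketch (the `hidden_unit_tests` value here is illustrative, shaped like the `unittest_db.json` entries):

```python
import json

# Illustrative sample: `hidden_unit_tests` is a JSON string holding a list of
# unit tests, each with an input string and a list of accepted outputs.
sample = {
    "src_uid": "9ce37bc2d361f5bb8a0568fb479b8a38",
    "hidden_unit_tests": '[{"input": "8\\nbacabcab", "output": ["4"]}]',
}

# The field arrives as a string, so decode it before iterating.
unit_tests = json.loads(sample["hidden_unit_tests"])
for ut in unit_tests:
    print(repr(ut["input"]), "->", ut["output"])  # '8\nbacabcab' -> ['4']
```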
29 | 30 | To download the program synthesis data, 31 | 32 | ``` 33 | GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval 34 | cd xCodeEval 35 | git lfs pull --include "program_synthesis/*" 36 | git lfs pull --include "problem_descriptions.jsonl" 37 | git lfs pull --include "unittest_db.json" 38 | ``` 39 | 40 | 41 | ## A Sample from train split 42 | 43 | ``` 44 | { 45 | "lang": "GNU C++17", 46 | "source_code": "#include \n#include \nusing namespace std::chrono; \n\nusing namespace std;\n#define f0r(a, b) for (a = 0; a < b; a++)\n#define f1r(a, b, c) for (a = b; a < c; a++)\n#define ms(arr, v) memset(arr, v, sizeof(arr))\n#define pb push_back\n#define io ios_base::sync_with_stdio(false); cin.tie(NULL); cout.tie(NULL)\n#define mp make_pair\n#define f first\n#define s second\ntypedef long long ll;\ntypedef double ld;\ntypedef pair pii;\ntypedef pair pll;\nll i, j;\n\nll n, q, Q, T, m, k, r, x, y, z, g;\nstring a, b;\nunsigned char ans[200001];\n\nint main() {\n io;\n cin >> n >> a >> b;\n ms(ans, 0);\n for (int i = n-1; i >= 0; i--) {\n // f0r(j, n) cout << (char)(ans[j] < 10 ? 
ans[j] + '0' : ans[j]);\n cout << endl;\n if (i == n-1) {\n if (a[i] > b[i]) {\n int diff = a[i] - b[i];\n int x = b[i] + (diff>>1);\n while (x > 'z') {\n x -= 26;\n ans[i-1]++;\n }\n ans[i] = x;\n } else {\n int diff = b[i] - a[i];\n int x = a[i] + (diff>>1);\n ans[i] = x;\n }\n continue;\n }\n if (a[i] < b[i]) {\n int diff = b[i] - a[i];\n ans[i] += a[i] + (diff>>1);\n if (diff % 2 != 0) {\n ans[i+1] += 13;\n while (ans[i+1] > 'z') {\n ans[i+1] -= 26;\n ans[i]++;\n }\n }\n while (ans[i] > 'z') {\n ans[i] -= 26;\n ans[i-1]++;\n }\n } else if (a[i] == b[i]) {\n ans[i] = a[i];\n } else {\n int diff = a[i] - b[i];\n ans[i] += b[i] + (diff>>1);\n if (diff % 2 != 0) {\n ans[i+1] += 13;\n while (ans[i+1] > 'z') {\n ans[i+1] -= 26;\n ans[i]++;\n }\n }\n while (ans[i] > 'z') {\n ans[i] -= 26;\n ans[i-1]++;\n }\n }\n }\n f0r(i, n) cout << ans[i];\n cout << endl;\n // cout << \"FLUSH PLEASE\" << endl;\n}", 47 | "tags": [ 48 | "number theory", 49 | "bitmasks", 50 | "math", 51 | "strings" 52 | ], 53 | "lang_cluster": "C++", 54 | "src_uid": "5f4009d4065f5ad39e662095f8f5c068", 55 | "code_uid": "3a1d3fab20424f3fadea46e0ac60f0c6", 56 | "difficulty": 1900, 57 | "exec_outcome": "PASSED" 58 | } 59 | ``` 60 | 61 | ## A Sample from validation/test split 62 | ``` 63 | { 64 | "description": "You are given a string $$$s$$$ consisting of lowercase Latin letters. Let the length of $$$s$$$ be $$$|s|$$$. You may perform several operations on this string.In one operation, you can choose some index $$$i$$$ and remove the $$$i$$$-th character of $$$s$$$ ($$$s_i$$$) if at least one of its adjacent characters is the previous letter in the Latin alphabet for $$$s_i$$$. For example, the previous letter for b is a, the previous letter for s is r, the letter a has no previous letters. Note that after each removal the length of the string decreases by one. 
So, the index $$$i$$$ should satisfy the condition $$$1 \\le i \\le |s|$$$ during each operation.For the character $$$s_i$$$ adjacent characters are $$$s_{i-1}$$$ and $$$s_{i+1}$$$. The first and the last characters of $$$s$$$ both have only one adjacent character (unless $$$|s| = 1$$$).Consider the following example. Let $$$s=$$$ bacabcab. During the first move, you can remove the first character $$$s_1=$$$ b because $$$s_2=$$$ a. Then the string becomes $$$s=$$$ acabcab. During the second move, you can remove the fifth character $$$s_5=$$$ c because $$$s_4=$$$ b. Then the string becomes $$$s=$$$ acabab. During the third move, you can remove the sixth character $$$s_6=$$$'b' because $$$s_5=$$$ a. Then the string becomes $$$s=$$$ acaba. During the fourth move, the only character you can remove is $$$s_4=$$$ b, because $$$s_3=$$$ a (or $$$s_5=$$$ a). The string becomes $$$s=$$$ acaa and you cannot do anything with it. Your task is to find the maximum possible number of characters you can remove if you choose the sequence of operations optimally.", 65 | "input_from": "standard input", 66 | "output_to": "standard output", 67 | "time_limit": "2 seconds", 68 | "memory_limit": "256 megabytes", 69 | "input_spec": "The only line of the input contains one integer $$$|s|$$$ ($$$1 \\le |s| \\le 100$$$) \u2014 the length of $$$s$$$. The second line of the input contains one string $$$s$$$ consisting of $$$|s|$$$ lowercase Latin letters.", 70 | "output_spec": "Print one integer \u2014 the maximum possible number of characters you can remove if you choose the sequence of moves optimally.", 71 | "notes": "NoteThe first example is described in the problem statement. Note that the sequence of moves provided in the statement is not the only, but it can be shown that the maximum possible answer to this test is $$$4$$$.In the second example, you can remove all but one character of $$$s$$$. The only possible answer follows. 
During the first move, remove the third character $$$s_3=$$$ d, $$$s$$$ becomes bca. During the second move, remove the second character $$$s_2=$$$ c, $$$s$$$ becomes ba. And during the third move, remove the first character $$$s_1=$$$ b, $$$s$$$ becomes a. ", 72 | "sample_inputs": [ 73 | "8\nbacabcab", 74 | "4\nbcda", 75 | "6\nabbbbb" 76 | ], 77 | "sample_outputs": [ 78 | "4", 79 | "3", 80 | "5" 81 | ], 82 | "tags": [ 83 | "brute force", 84 | "constructive algorithms", 85 | "strings", 86 | "greedy" 87 | ], 88 | "src_uid": "9ce37bc2d361f5bb8a0568fb479b8a38", 89 | "difficulty": 1600 90 | } 91 | ``` 92 | 93 | ## Key Definitions 94 | 95 | 1. `lang`: Runtime/Compiler version of the `source_code`. 96 | 2. `source_code`: A program. 97 | 3. `tags`: List of potential algorithmic techniques required to write the program. 98 | 4. `lang_cluster`: The generic programming language that the value of `lang` belongs to. 99 | 5. `src_uid`: A specific identifier that shows which problem the code is associated with. This identifier is **important** for the training of the model. The problem referred to by the `src_uid` provides a natural description of the problem that the code successfully solved. Refer to [Structure of `problem_descriptions.jsonl`](./README.md#structure-of-problem_descriptionsjsonl) 100 | 6. `code_uid`: A unique ID for the sample. It is not important for model training. If you find any issue with the sample, you can report it to us mentioning the `code_uid`. 101 | 7. `difficulty`: Difficulty rating of the problem indicated by `src_uid`. The higher the value, the harder the problem. 102 | 8. `exec_outcome`: Execution outcome status. Follow [Section 4.1](https://arxiv.org/pdf/2303.03004.pdf) for the list of possible outcomes. The `exec_outcome` flags in the training data come from a pre-run environment. However, the training data doesn't include unit tests, to avoid potential hacks. We provide unit tests only for the dev and test data. 
103 | 104 | The following keys will come from `problem_descriptions.jsonl` by matching `src_uid`, 105 | 106 | 9. `prob_desc_description`: Problem description in textual format, math operations are written in latex. 107 | 10. `prob_desc_input_from`: How the program should take the unit test. 108 | 11. `prob_desc_output_to`: Where the program should output the result of the unit test. 109 | 12. `prob_desc_time_limit`: Time limit to solve the problem. 110 | 13. `prob_desc_memory_limit`: Memory limit to solve the problem. 111 | 14. `prob_desc_input_spec`: How and in what order the input will be given to the program. It also includes the data ranges, types, and sizes. 112 | 15. `prob_desc_output_spec`: How the outputs should be printed. Most of the time the unit test results are matched with an *exact string match* or *floating point comparison* with a precision boundary. 113 | 16. `prob_desc_sample_inputs`: A sample input for the code that is expected to solve the problem described in `description`. 114 | 17. `prob_desc_sample_outputs`: The expected output for the `sample_input` that is expected to solve the problem described in `description`. 115 | 18. `prob_desc_notes`: Explanation of `sample_inputs` & `sample_outputs`. 116 | 19. `prob_desc_created_at`: The Unix timestamp when the problem was released. Use the `datetime` lib in Python to parse it to a human-readable format. 117 | 118 | Source information comes from the name of the `*.jsonl` file. 119 | 20. `file_name`: Name of the source `jsonl` file from which the data is loaded. 120 | 121 | Unit test information will come from `unittest_db.json` by matching `src_uid`. 122 | 21. `hidden_unit_tests`: A list of unit tests returned as a string. Use `json.loads(hidden_unit_tests)` to load the data. 123 | 124 | ## MD5 hash of the data 125 | 126 | Run the following, 127 | 128 | ``` 129 | cd xCodeEval/ 130 | tar c program_synthesis | md5sum 131 | ``` 132 | 133 | Output should match, `4a1f740298075f1726ab7caa73495342`. 
134 | 135 | 136 | ## Tree 137 | 138 | 3 directories, 113 files 139 | 140 | ``` 141 | . 142 | ├── test 143 | │ └── prog_syn_test.jsonl 144 | ├── train 145 | │ ├── train_000.jsonl 146 | │ ├── train_001.jsonl 147 | │ ├── train_002.jsonl 148 | │ ├── train_003.jsonl 149 | │ ├── train_004.jsonl 150 | │ ├── train_005.jsonl 151 | │ ├── train_006.jsonl 152 | │ ├── train_007.jsonl 153 | │ ├── train_008.jsonl 154 | │ ├── train_009.jsonl 155 | │ ├── train_010.jsonl 156 | │ ├── train_011.jsonl 157 | │ ├── train_012.jsonl 158 | │ ├── train_013.jsonl 159 | │ ├── train_014.jsonl 160 | │ ├── train_015.jsonl 161 | │ ├── train_016.jsonl 162 | │ ├── train_017.jsonl 163 | │ ├── train_018.jsonl 164 | │ ├── train_019.jsonl 165 | │ ├── train_020.jsonl 166 | │ ├── train_021.jsonl 167 | │ ├── train_022.jsonl 168 | │ ├── train_023.jsonl 169 | │ ├── train_024.jsonl 170 | │ ├── train_025.jsonl 171 | │ ├── train_026.jsonl 172 | │ ├── train_027.jsonl 173 | │ ├── train_028.jsonl 174 | │ ├── train_029.jsonl 175 | │ ├── train_030.jsonl 176 | │ ├── train_031.jsonl 177 | │ ├── train_032.jsonl 178 | │ ├── train_033.jsonl 179 | │ ├── train_034.jsonl 180 | │ ├── train_035.jsonl 181 | │ ├── train_036.jsonl 182 | │ ├── train_037.jsonl 183 | │ ├── train_038.jsonl 184 | │ ├── train_039.jsonl 185 | │ ├── train_040.jsonl 186 | │ ├── train_041.jsonl 187 | │ ├── train_042.jsonl 188 | │ ├── train_043.jsonl 189 | │ ├── train_044.jsonl 190 | │ ├── train_045.jsonl 191 | │ ├── train_046.jsonl 192 | │ ├── train_047.jsonl 193 | │ ├── train_048.jsonl 194 | │ ├── train_049.jsonl 195 | │ ├── train_050.jsonl 196 | │ ├── train_051.jsonl 197 | │ ├── train_052.jsonl 198 | │ ├── train_053.jsonl 199 | │ ├── train_054.jsonl 200 | │ ├── train_055.jsonl 201 | │ ├── train_056.jsonl 202 | │ ├── train_057.jsonl 203 | │ ├── train_058.jsonl 204 | │ ├── train_059.jsonl 205 | │ ├── train_060.jsonl 206 | │ ├── train_061.jsonl 207 | │ ├── train_062.jsonl 208 | │ ├── train_063.jsonl 209 | │ ├── train_064.jsonl 210 | │ ├── train_065.jsonl 
211 | │ ├── train_066.jsonl 212 | │ ├── train_067.jsonl 213 | │ ├── train_068.jsonl 214 | │ ├── train_069.jsonl 215 | │ ├── train_070.jsonl 216 | │ ├── train_071.jsonl 217 | │ ├── train_072.jsonl 218 | │ ├── train_073.jsonl 219 | │ ├── train_074.jsonl 220 | │ ├── train_075.jsonl 221 | │ ├── train_076.jsonl 222 | │ ├── train_077.jsonl 223 | │ ├── train_078.jsonl 224 | │ ├── train_079.jsonl 225 | │ ├── train_080.jsonl 226 | │ ├── train_081.jsonl 227 | │ ├── train_082.jsonl 228 | │ ├── train_083.jsonl 229 | │ ├── train_084.jsonl 230 | │ ├── train_085.jsonl 231 | │ ├── train_086.jsonl 232 | │ ├── train_087.jsonl 233 | │ ├── train_088.jsonl 234 | │ ├── train_089.jsonl 235 | │ ├── train_090.jsonl 236 | │ ├── train_091.jsonl 237 | │ ├── train_092.jsonl 238 | │ ├── train_093.jsonl 239 | │ ├── train_094.jsonl 240 | │ ├── train_095.jsonl 241 | │ ├── train_096.jsonl 242 | │ ├── train_097.jsonl 243 | │ ├── train_098.jsonl 244 | │ ├── train_099.jsonl 245 | │ ├── train_100.jsonl 246 | │ ├── train_101.jsonl 247 | │ ├── train_102.jsonl 248 | │ ├── train_103.jsonl 249 | │ ├── train_104.jsonl 250 | │ ├── train_105.jsonl 251 | │ ├── train_106.jsonl 252 | │ ├── train_107.jsonl 253 | │ ├── train_108.jsonl 254 | │ ├── train_109.jsonl 255 | │ └── train_110.jsonl 256 | └── validation 257 | └── prog_syn_val.jsonl 258 | ``` 259 | -------------------------------------------------------------------------------- /code_compilation.md: -------------------------------------------------------------------------------- 1 | # Code Compilation Task 2 | 3 | ## Download data using huggingface `load_dataset()` 4 | 5 | ``` 6 | >>> import datasets 7 | >>> code_compilation_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "code_compilation") 8 | >>> print(code_compilation_dataset) 9 | 10 | DatasetDict({ 11 | train: Dataset({ 12 | features: ['lang_cluster', 'source_code', 'file_name', 'compilation_error', 'difficulty', 'src_uid', 'code_uid', 'lang'], 13 | num_rows: 19915150 14 | }) 15 | validation: 
Dataset({ 16 | features: ['lang_cluster', 'source_code', 'file_name', 'compilation_error', 'difficulty', 'src_uid', 'code_uid', 'lang'], 17 | num_rows: 6394 18 | }) 19 | test: Dataset({ 20 | features: ['lang_cluster', 'source_code', 'file_name', 'compilation_error', 'difficulty', 'src_uid', 'code_uid', 'lang'], 21 | num_rows: 30388 22 | }) 23 | }) 24 | ``` 25 | 26 | To download the code_compilation data, 27 | 28 | ``` 29 | GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval 30 | cd xCodeEval 31 | git lfs pull --include "code_compilation/*" 32 | git lfs pull --include "problem_descriptions.jsonl" 33 | ``` 34 | 35 | Note that the `code_compilation` data is extremely large. Feel free to download a subset of the data first. 36 | 37 | 38 | ## A Sample from train/validation/test split 39 | ``` 40 | { 41 | "lang": "GNU C++17", 42 | "source_code": "#include\nusing namespace std;\nchar s[200005],t[200005];\nint s1[200005],t1[200005];\nint n;\nint main()\n{\n cin>>n;\n cin>>s+1>>t+1;\n for(int i=1; i<=n; i++)\n s1[n+1-i]=s[i]-'a'+1,t1[n+1-i]=t[i]-'a'+1;\n\n for(int i=1; i<=n; i++)\n s1[i]+=t1[i];\n\n for(int i=1; i<=n; i++)\n {\n if(s1[i]>=26)\n {\n s1[i]%=26;\n s1[i+1]+=1;\n }\n }\n\n if(s1[n+1]) s1[n]+=26;\n\n\n for(int i=1; i<=n; i++)\n {\n if(!s1[i])\n {\n s1[i+1]--;\n s1[i]=26;\n }\n if(s1[i]&1)\n s1[i-1]+=26;\n\n }\n for(int i=1; i<=n; i++)\n {\n s1[i]/=2;\n if(s1[i]==0)\n s1[i]=26,s1[i+1]--;\n }\n\n for(int i=n; i>=1; i--)\n printf(\"%c\",'a'-1+s1[i]);\n\n return 0;\n}\n", 43 | "lang_cluster": "C++", 44 | "compilation_error": false, 45 | "code_uid": "c6afb1328299497e14ac949dbe60d038", 46 | "src_uid": "5f4009d4065f5ad39e662095f8f5c068", 47 | "difficulty": 1900 48 | } 49 | ``` 50 | 51 | ## Key Definitions 52 | 53 | 54 | 1. `lang`: Runtime/Compiler version of the `source_code`. 55 | 2. `source_code`: A program. 56 | 3. `lang_cluster`: The generic programming language that the value of `lang` belongs to. 57 | 4.
`compilation_error`: True/False; indicates whether the code generates a compilation error. 58 | 5. `code_uid`: A unique ID for the sample. It is not important for model training. If you find any issue with the sample, you can report it to us mentioning the `code_uid`. 59 | 6. `src_uid`: A specific identifier that shows which problem the code is associated with. This identifier is **important** for the training of the model. The problem referred to by the `src_uid` provides a natural description of the problem that the code successfully solved. Refer to [Structure of `problem_descriptions.jsonl`](./README.md#structure-of-problem_descriptionsjsonl). 60 | 7. `difficulty`: Difficulty rating of the problem indicated by `src_uid`. The higher, the harder. 61 | 62 | 63 | ## MD5 hash of the data 64 | 65 | Run the following, 66 | 67 | ``` 68 | cd xCodeEval/ 69 | tar c code_compilation | md5sum 70 | ``` 71 | 72 | Output should match, `24854d10e95089b79fca207053b5f1ae`. 73 | 74 | 75 | ## Tree 76 | 77 | 3 directories, 421 files 78 | 79 | ``` 80 | .
81 | ├── test 82 | │ ├── C#.jsonl 83 | │ ├── C++.jsonl 84 | │ ├── C.jsonl 85 | │ ├── Go.jsonl 86 | │ ├── Java.jsonl 87 | │ ├── Javascript.jsonl 88 | │ ├── Kotlin.jsonl 89 | │ ├── PHP.jsonl 90 | │ ├── Python.jsonl 91 | │ ├── Ruby.jsonl 92 | │ └── Rust.jsonl 93 | ├── train 94 | │ ├── train_0000.jsonl 95 | │ ├── train_0001.jsonl 96 | │ ├── train_0002.jsonl 97 | │ ├── train_0003.jsonl 98 | │ ├── train_0004.jsonl 99 | │ ├── train_0005.jsonl 100 | │ ├── train_0006.jsonl 101 | │ ├── train_0007.jsonl 102 | │ ├── train_0008.jsonl 103 | │ ├── train_0009.jsonl 104 | │ ├── train_0010.jsonl 105 | │ ├── train_0011.jsonl 106 | │ ├── train_0012.jsonl 107 | │ ├── train_0013.jsonl 108 | │ ├── train_0014.jsonl 109 | │ ├── train_0015.jsonl 110 | │ ├── train_0016.jsonl 111 | │ ├── train_0017.jsonl 112 | │ ├── train_0018.jsonl 113 | │ ├── train_0019.jsonl 114 | │ ├── train_0020.jsonl 115 | │ ├── train_0021.jsonl 116 | │ ├── train_0022.jsonl 117 | │ ├── train_0023.jsonl 118 | │ ├── train_0024.jsonl 119 | │ ├── train_0025.jsonl 120 | │ ├── train_0026.jsonl 121 | │ ├── train_0027.jsonl 122 | │ ├── train_0028.jsonl 123 | │ ├── train_0029.jsonl 124 | │ ├── train_0030.jsonl 125 | │ ├── train_0031.jsonl 126 | │ ├── train_0032.jsonl 127 | │ ├── train_0033.jsonl 128 | │ ├── train_0034.jsonl 129 | │ ├── train_0035.jsonl 130 | │ ├── train_0036.jsonl 131 | │ ├── train_0037.jsonl 132 | │ ├── train_0038.jsonl 133 | │ ├── train_0039.jsonl 134 | │ ├── train_0040.jsonl 135 | │ ├── train_0041.jsonl 136 | │ ├── train_0042.jsonl 137 | │ ├── train_0043.jsonl 138 | │ ├── train_0044.jsonl 139 | │ ├── train_0045.jsonl 140 | │ ├── train_0046.jsonl 141 | │ ├── train_0047.jsonl 142 | │ ├── train_0048.jsonl 143 | │ ├── train_0049.jsonl 144 | │ ├── train_0050.jsonl 145 | │ ├── train_0051.jsonl 146 | │ ├── train_0052.jsonl 147 | │ ├── train_0053.jsonl 148 | │ ├── train_0054.jsonl 149 | │ ├── train_0055.jsonl 150 | │ ├── train_0056.jsonl 151 | │ ├── train_0057.jsonl 152 | │ ├── train_0058.jsonl 153 | │ ├── 
train_0059.jsonl 154 | │ ├── train_0060.jsonl 155 | │ ├── train_0061.jsonl 156 | │ ├── train_0062.jsonl 157 | │ ├── train_0063.jsonl 158 | │ ├── train_0064.jsonl 159 | │ ├── train_0065.jsonl 160 | │ ├── train_0066.jsonl 161 | │ ├── train_0067.jsonl 162 | │ ├── train_0068.jsonl 163 | │ ├── train_0069.jsonl 164 | │ ├── train_0070.jsonl 165 | │ ├── train_0071.jsonl 166 | │ ├── train_0072.jsonl 167 | │ ├── train_0073.jsonl 168 | │ ├── train_0074.jsonl 169 | │ ├── train_0075.jsonl 170 | │ ├── train_0076.jsonl 171 | │ ├── train_0077.jsonl 172 | │ ├── train_0078.jsonl 173 | │ ├── train_0079.jsonl 174 | │ ├── train_0080.jsonl 175 | │ ├── train_0081.jsonl 176 | │ ├── train_0082.jsonl 177 | │ ├── train_0083.jsonl 178 | │ ├── train_0084.jsonl 179 | │ ├── train_0085.jsonl 180 | │ ├── train_0086.jsonl 181 | │ ├── train_0087.jsonl 182 | │ ├── train_0088.jsonl 183 | │ ├── train_0089.jsonl 184 | │ ├── train_0090.jsonl 185 | │ ├── train_0091.jsonl 186 | │ ├── train_0092.jsonl 187 | │ ├── train_0093.jsonl 188 | │ ├── train_0094.jsonl 189 | │ ├── train_0095.jsonl 190 | │ ├── train_0096.jsonl 191 | │ ├── train_0097.jsonl 192 | │ ├── train_0098.jsonl 193 | │ ├── train_0099.jsonl 194 | │ ├── train_0100.jsonl 195 | │ ├── train_0101.jsonl 196 | │ ├── train_0102.jsonl 197 | │ ├── train_0103.jsonl 198 | │ ├── train_0104.jsonl 199 | │ ├── train_0105.jsonl 200 | │ ├── train_0106.jsonl 201 | │ ├── train_0107.jsonl 202 | │ ├── train_0108.jsonl 203 | │ ├── train_0109.jsonl 204 | │ ├── train_0110.jsonl 205 | │ ├── train_0111.jsonl 206 | │ ├── train_0112.jsonl 207 | │ ├── train_0113.jsonl 208 | │ ├── train_0114.jsonl 209 | │ ├── train_0115.jsonl 210 | │ ├── train_0116.jsonl 211 | │ ├── train_0117.jsonl 212 | │ ├── train_0118.jsonl 213 | │ ├── train_0119.jsonl 214 | │ ├── train_0120.jsonl 215 | │ ├── train_0121.jsonl 216 | │ ├── train_0122.jsonl 217 | │ ├── train_0123.jsonl 218 | │ ├── train_0124.jsonl 219 | │ ├── train_0125.jsonl 220 | │ ├── train_0126.jsonl 221 | │ ├── train_0127.jsonl 222 | │ 
├── train_0128.jsonl 223 | │ ├── train_0129.jsonl 224 | │ ├── train_0130.jsonl 225 | │ ├── train_0131.jsonl 226 | │ ├── train_0132.jsonl 227 | │ ├── train_0133.jsonl 228 | │ ├── train_0134.jsonl 229 | │ ├── train_0135.jsonl 230 | │ ├── train_0136.jsonl 231 | │ ├── train_0137.jsonl 232 | │ ├── train_0138.jsonl 233 | │ ├── train_0139.jsonl 234 | │ ├── train_0140.jsonl 235 | │ ├── train_0141.jsonl 236 | │ ├── train_0142.jsonl 237 | │ ├── train_0143.jsonl 238 | │ ├── train_0144.jsonl 239 | │ ├── train_0145.jsonl 240 | │ ├── train_0146.jsonl 241 | │ ├── train_0147.jsonl 242 | │ ├── train_0148.jsonl 243 | │ ├── train_0149.jsonl 244 | │ ├── train_0150.jsonl 245 | │ ├── train_0151.jsonl 246 | │ ├── train_0152.jsonl 247 | │ ├── train_0153.jsonl 248 | │ ├── train_0154.jsonl 249 | │ ├── train_0155.jsonl 250 | │ ├── train_0156.jsonl 251 | │ ├── train_0157.jsonl 252 | │ ├── train_0158.jsonl 253 | │ ├── train_0159.jsonl 254 | │ ├── train_0160.jsonl 255 | │ ├── train_0161.jsonl 256 | │ ├── train_0162.jsonl 257 | │ ├── train_0163.jsonl 258 | │ ├── train_0164.jsonl 259 | │ ├── train_0165.jsonl 260 | │ ├── train_0166.jsonl 261 | │ ├── train_0167.jsonl 262 | │ ├── train_0168.jsonl 263 | │ ├── train_0169.jsonl 264 | │ ├── train_0170.jsonl 265 | │ ├── train_0171.jsonl 266 | │ ├── train_0172.jsonl 267 | │ ├── train_0173.jsonl 268 | │ ├── train_0174.jsonl 269 | │ ├── train_0175.jsonl 270 | │ ├── train_0176.jsonl 271 | │ ├── train_0177.jsonl 272 | │ ├── train_0178.jsonl 273 | │ ├── train_0179.jsonl 274 | │ ├── train_0180.jsonl 275 | │ ├── train_0181.jsonl 276 | │ ├── train_0182.jsonl 277 | │ ├── train_0183.jsonl 278 | │ ├── train_0184.jsonl 279 | │ ├── train_0185.jsonl 280 | │ ├── train_0186.jsonl 281 | │ ├── train_0187.jsonl 282 | │ ├── train_0188.jsonl 283 | │ ├── train_0189.jsonl 284 | │ ├── train_0190.jsonl 285 | │ ├── train_0191.jsonl 286 | │ ├── train_0192.jsonl 287 | │ ├── train_0193.jsonl 288 | │ ├── train_0194.jsonl 289 | │ ├── train_0195.jsonl 290 | │ ├── train_0196.jsonl 291 | 
│ ├── train_0197.jsonl 292 | │ ├── train_0198.jsonl 293 | │ ├── train_0199.jsonl 294 | │ ├── train_0200.jsonl 295 | │ ├── train_0201.jsonl 296 | │ ├── train_0202.jsonl 297 | │ ├── train_0203.jsonl 298 | │ ├── train_0204.jsonl 299 | │ ├── train_0205.jsonl 300 | │ ├── train_0206.jsonl 301 | │ ├── train_0207.jsonl 302 | │ ├── train_0208.jsonl 303 | │ ├── train_0209.jsonl 304 | │ ├── train_0210.jsonl 305 | │ ├── train_0211.jsonl 306 | │ ├── train_0212.jsonl 307 | │ ├── train_0213.jsonl 308 | │ ├── train_0214.jsonl 309 | │ ├── train_0215.jsonl 310 | │ ├── train_0216.jsonl 311 | │ ├── train_0217.jsonl 312 | │ ├── train_0218.jsonl 313 | │ ├── train_0219.jsonl 314 | │ ├── train_0220.jsonl 315 | │ ├── train_0221.jsonl 316 | │ ├── train_0222.jsonl 317 | │ ├── train_0223.jsonl 318 | │ ├── train_0224.jsonl 319 | │ ├── train_0225.jsonl 320 | │ ├── train_0226.jsonl 321 | │ ├── train_0227.jsonl 322 | │ ├── train_0228.jsonl 323 | │ ├── train_0229.jsonl 324 | │ ├── train_0230.jsonl 325 | │ ├── train_0231.jsonl 326 | │ ├── train_0232.jsonl 327 | │ ├── train_0233.jsonl 328 | │ ├── train_0234.jsonl 329 | │ ├── train_0235.jsonl 330 | │ ├── train_0236.jsonl 331 | │ ├── train_0237.jsonl 332 | │ ├── train_0238.jsonl 333 | │ ├── train_0239.jsonl 334 | │ ├── train_0240.jsonl 335 | │ ├── train_0241.jsonl 336 | │ ├── train_0242.jsonl 337 | │ ├── train_0243.jsonl 338 | │ ├── train_0244.jsonl 339 | │ ├── train_0245.jsonl 340 | │ ├── train_0246.jsonl 341 | │ ├── train_0247.jsonl 342 | │ ├── train_0248.jsonl 343 | │ ├── train_0249.jsonl 344 | │ ├── train_0250.jsonl 345 | │ ├── train_0251.jsonl 346 | │ ├── train_0252.jsonl 347 | │ ├── train_0253.jsonl 348 | │ ├── train_0254.jsonl 349 | │ ├── train_0255.jsonl 350 | │ ├── train_0256.jsonl 351 | │ ├── train_0257.jsonl 352 | │ ├── train_0258.jsonl 353 | │ ├── train_0259.jsonl 354 | │ ├── train_0260.jsonl 355 | │ ├── train_0261.jsonl 356 | │ ├── train_0262.jsonl 357 | │ ├── train_0263.jsonl 358 | │ ├── train_0264.jsonl 359 | │ ├── train_0265.jsonl 360 
| │ ├── train_0266.jsonl 361 | │ ├── train_0267.jsonl 362 | │ ├── train_0268.jsonl 363 | │ ├── train_0269.jsonl 364 | │ ├── train_0270.jsonl 365 | │ ├── train_0271.jsonl 366 | │ ├── train_0272.jsonl 367 | │ ├── train_0273.jsonl 368 | │ ├── train_0274.jsonl 369 | │ ├── train_0275.jsonl 370 | │ ├── train_0276.jsonl 371 | │ ├── train_0277.jsonl 372 | │ ├── train_0278.jsonl 373 | │ ├── train_0279.jsonl 374 | │ ├── train_0280.jsonl 375 | │ ├── train_0281.jsonl 376 | │ ├── train_0282.jsonl 377 | │ ├── train_0283.jsonl 378 | │ ├── train_0284.jsonl 379 | │ ├── train_0285.jsonl 380 | │ ├── train_0286.jsonl 381 | │ ├── train_0287.jsonl 382 | │ ├── train_0288.jsonl 383 | │ ├── train_0289.jsonl 384 | │ ├── train_0290.jsonl 385 | │ ├── train_0291.jsonl 386 | │ ├── train_0292.jsonl 387 | │ ├── train_0293.jsonl 388 | │ ├── train_0294.jsonl 389 | │ ├── train_0295.jsonl 390 | │ ├── train_0296.jsonl 391 | │ ├── train_0297.jsonl 392 | │ ├── train_0298.jsonl 393 | │ ├── train_0299.jsonl 394 | │ ├── train_0300.jsonl 395 | │ ├── train_0301.jsonl 396 | │ ├── train_0302.jsonl 397 | │ ├── train_0303.jsonl 398 | │ ├── train_0304.jsonl 399 | │ ├── train_0305.jsonl 400 | │ ├── train_0306.jsonl 401 | │ ├── train_0307.jsonl 402 | │ ├── train_0308.jsonl 403 | │ ├── train_0309.jsonl 404 | │ ├── train_0310.jsonl 405 | │ ├── train_0311.jsonl 406 | │ ├── train_0312.jsonl 407 | │ ├── train_0313.jsonl 408 | │ ├── train_0314.jsonl 409 | │ ├── train_0315.jsonl 410 | │ ├── train_0316.jsonl 411 | │ ├── train_0317.jsonl 412 | │ ├── train_0318.jsonl 413 | │ ├── train_0319.jsonl 414 | │ ├── train_0320.jsonl 415 | │ ├── train_0321.jsonl 416 | │ ├── train_0322.jsonl 417 | │ ├── train_0323.jsonl 418 | │ ├── train_0324.jsonl 419 | │ ├── train_0325.jsonl 420 | │ ├── train_0326.jsonl 421 | │ ├── train_0327.jsonl 422 | │ ├── train_0328.jsonl 423 | │ ├── train_0329.jsonl 424 | │ ├── train_0330.jsonl 425 | │ ├── train_0331.jsonl 426 | │ ├── train_0332.jsonl 427 | │ ├── train_0333.jsonl 428 | │ ├── train_0334.jsonl 
429 | │ ├── train_0335.jsonl 430 | │ ├── train_0336.jsonl 431 | │ ├── train_0337.jsonl 432 | │ ├── train_0338.jsonl 433 | │ ├── train_0339.jsonl 434 | │ ├── train_0340.jsonl 435 | │ ├── train_0341.jsonl 436 | │ ├── train_0342.jsonl 437 | │ ├── train_0343.jsonl 438 | │ ├── train_0344.jsonl 439 | │ ├── train_0345.jsonl 440 | │ ├── train_0346.jsonl 441 | │ ├── train_0347.jsonl 442 | │ ├── train_0348.jsonl 443 | │ ├── train_0349.jsonl 444 | │ ├── train_0350.jsonl 445 | │ ├── train_0351.jsonl 446 | │ ├── train_0352.jsonl 447 | │ ├── train_0353.jsonl 448 | │ ├── train_0354.jsonl 449 | │ ├── train_0355.jsonl 450 | │ ├── train_0356.jsonl 451 | │ ├── train_0357.jsonl 452 | │ ├── train_0358.jsonl 453 | │ ├── train_0359.jsonl 454 | │ ├── train_0360.jsonl 455 | │ ├── train_0361.jsonl 456 | │ ├── train_0362.jsonl 457 | │ ├── train_0363.jsonl 458 | │ ├── train_0364.jsonl 459 | │ ├── train_0365.jsonl 460 | │ ├── train_0366.jsonl 461 | │ ├── train_0367.jsonl 462 | │ ├── train_0368.jsonl 463 | │ ├── train_0369.jsonl 464 | │ ├── train_0370.jsonl 465 | │ ├── train_0371.jsonl 466 | │ ├── train_0372.jsonl 467 | │ ├── train_0373.jsonl 468 | │ ├── train_0374.jsonl 469 | │ ├── train_0375.jsonl 470 | │ ├── train_0376.jsonl 471 | │ ├── train_0377.jsonl 472 | │ ├── train_0378.jsonl 473 | │ ├── train_0379.jsonl 474 | │ ├── train_0380.jsonl 475 | │ ├── train_0381.jsonl 476 | │ ├── train_0382.jsonl 477 | │ ├── train_0383.jsonl 478 | │ ├── train_0384.jsonl 479 | │ ├── train_0385.jsonl 480 | │ ├── train_0386.jsonl 481 | │ ├── train_0387.jsonl 482 | │ ├── train_0388.jsonl 483 | │ ├── train_0389.jsonl 484 | │ ├── train_0390.jsonl 485 | │ ├── train_0391.jsonl 486 | │ ├── train_0392.jsonl 487 | │ ├── train_0393.jsonl 488 | │ ├── train_0394.jsonl 489 | │ ├── train_0395.jsonl 490 | │ ├── train_0396.jsonl 491 | │ ├── train_0397.jsonl 492 | │ └── train_0398.jsonl 493 | └── validation 494 | ├── C#.jsonl 495 | ├── C++.jsonl 496 | ├── C.jsonl 497 | ├── Go.jsonl 498 | ├── Java.jsonl 499 | ├── 
Javascript.jsonl 500 | ├── Kotlin.jsonl 501 | ├── PHP.jsonl 502 | ├── Python.jsonl 503 | ├── Ruby.jsonl 504 | └── Rust.jsonl 505 | ``` 506 | -------------------------------------------------------------------------------- /apr.md: -------------------------------------------------------------------------------- 1 | # Automatic Program Repair (APR) Task 2 | 3 | ## Download data using huggingface `load_dataset()` 4 | 5 | ``` 6 | >>> import datasets 7 | >>> apr_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "apr") 8 | >>> print(apr_dataset) 9 | 10 | DatasetDict({ 11 | train: Dataset({ 12 | features: ['fix_source_code', 'prob_desc_memory_limit', 'bug_source_code', 'similarity_score', 'difficulty', 'prob_desc_input_from', 'prob_desc_output_to', 'prob_desc_description', 'equal_cnt', 'lang', 'fix_code_uid', 'bug_exec_outcome', 'prob_desc_sample_inputs', 'bug_code_uid', 'prob_desc_output_spec', 'prob_desc_input_spec', 'insert_cnt', 'replace_cnt', 'prob_desc_created_at', 'fix_ops_cnt', 'prob_desc_time_limit', 'lang_cluster', 'delete_cnt', 'potential_dominant_fix_op', 'src_uid', 'prob_desc_sample_outputs', 'apr_id', 'tags', 'file_name', 'prob_desc_notes', 'fix_exec_outcome', 'hidden_unit_tests'], 13 | num_rows: 4672070 14 | }) 15 | validation: Dataset({ 16 | features: ['fix_source_code', 'prob_desc_memory_limit', 'bug_source_code', 'similarity_score', 'difficulty', 'prob_desc_input_from', 'prob_desc_output_to', 'prob_desc_description', 'equal_cnt', 'lang', 'fix_code_uid', 'bug_exec_outcome', 'prob_desc_sample_inputs', 'bug_code_uid', 'prob_desc_output_spec', 'prob_desc_input_spec', 'insert_cnt', 'replace_cnt', 'prob_desc_created_at', 'fix_ops_cnt', 'prob_desc_time_limit', 'lang_cluster', 'delete_cnt', 'potential_dominant_fix_op', 'src_uid', 'prob_desc_sample_outputs', 'apr_id', 'tags', 'file_name', 'prob_desc_notes', 'fix_exec_outcome', 'hidden_unit_tests'], 17 | num_rows: 5068 18 | }) 19 | test: Dataset({ 20 | features: ['fix_source_code', 'prob_desc_memory_limit', 
'bug_source_code', 'similarity_score', 'difficulty', 'prob_desc_input_from', 'prob_desc_output_to', 'prob_desc_description', 'equal_cnt', 'lang', 'fix_code_uid', 'bug_exec_outcome', 'prob_desc_sample_inputs', 'bug_code_uid', 'prob_desc_output_spec', 'prob_desc_input_spec', 'insert_cnt', 'replace_cnt', 'prob_desc_created_at', 'fix_ops_cnt', 'prob_desc_time_limit', 'lang_cluster', 'delete_cnt', 'potential_dominant_fix_op', 'src_uid', 'prob_desc_sample_outputs', 'apr_id', 'tags', 'file_name', 'prob_desc_notes', 'fix_exec_outcome', 'hidden_unit_tests'], 21 | num_rows: 17699 22 | }) 23 | }) 24 | ``` 25 | 26 | 27 | ## Download data using git LFS 28 | 29 | When loading with the Hugging Face `load_dataset()` API, no additional data linking is needed. But if you download the data in raw `*.jsonl` format, you need to link the proper fields for the task yourself. To link the data, use `src_uid` to match rows from `problem_descriptions.jsonl` and `unittest_db.json`. 30 | 31 | 32 | To download the automatic program repair data, 33 | 34 | ``` 35 | GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval 36 | cd xCodeEval 37 | git lfs pull --include "apr/*" 38 | git lfs pull --include "problem_descriptions.jsonl" 39 | git lfs pull --include "unittest_db.json" 40 | ``` 41 | 42 | 43 | 44 | ## A sample from the train split 45 | ``` 46 | { 47 | "similarity_score": 0.6323809523809524, 48 | "equal_cnt": 30, 49 | "replace_cnt": 8, 50 | "delete_cnt": 0, 51 | "insert_cnt": 21, 52 | "fix_ops_cnt": 29, 53 | "bug_source_code": "#include\r\n\r\nusing namespace std;\r\nint main()\r\n{\r\n\tlong long int t;\r\n\tcin>>t;\r\n\twhile(t--)\r\n\t{\r\n\t\tlong long int n,co;co=0;\r\n\t\tcin>>n;long long int yy=0;long long int mm=0;\r\n\t\tlong long int s[n],b[n],c[n];\r\n\t\tfor (long long int i = 0; i < n; i++)\r\n\t\t{\r\n\t\t\tcin>>s[i];\r\n\t\t\tb[i]=s[i];\r\n\t\t\tc[i]=s[i];\r\n\t\t\tif(yy==0){if(c[i]==0)\r\n\t\t\t{\r\n\t\t\t\tc[i]=1;yy=1;\r\n\t\t\t}\r\n\t\t}\r\n}\r\n\t\tlong long int
l;\r\n\t\tmm=0;\r\n\t\tfor (long long int i = n-1; i>=0; i--)\r\n\t\t{\r\n\t\t\tif(b[i]==1)\r\n\t\t\t{\r\n\t\t\t\tb[i]=0;l=i;break;\r\n\t\t\t}mm++;\r\n\t\t}\r\n\t\tlong long int j=0;long long int m=0;long long int op=0;long long int po=0;\r\n\t\tfor (long long int i = 0; i < n; i++)\r\n\t\t{\r\n\t\t\tj=m;\r\n\t\t\tfor (; j < n; j++)\r\n\t\t\t{\r\n\t\t\t\tif(s[i]>s[j])\r\n\t\t\t\t{\r\n\t\t\t\t\tco++;\r\n\t\t\t\t}if(b[i]>b[j])\r\n\t\t\t\t{\r\n\t\t\t\t\top++;\r\n\t\t\t\t}\t\r\n\t\t\t\tif(c[i]>c[j])\r\n\t\t\t\t{\r\n\t\t\t\t\tpo++;\r\n\t\t\t\t}\t\r\n\t\t\t}m++;\r\n\t\t}\r\n\t\tlong long int big = co > po ? (co > op ? co : op) : (po > op ? po : op) ;\r\n\tcout<\r\n\r\nusing namespace std;\r\nint main()\r\n{\r\n\tlong long int t;\r\n\tcin>>t;\r\n\twhile(t--)\r\n\t{\r\n\t\tlong long int n,co;co=0;\r\n\t\tcin>>n;long long int yy=0;long long int mm=0;\r\n\t\tlong long int s[n],b[n],c[n],s1[n+2],b1[n+2],c1[n+2];s1[0]=0;b1[0]=0;c1[0]=0;c1[n+1]=0;b1[n+1]=0;s1[n+1]=0;s1[1]=0;b1[1]=0;c1[1]=0;\r\n\t\tfor (long long int i = 0; i < n; i++)\r\n\t\t{\r\n\t\t\t// s1[i+1]=0;b1[i+1]=0;c1[i+1]=0;\r\n\t\t\tcin>>s[i];s1[i+1]=s1[i]+s[i];\r\n\t\t\tb[i]=s[i];\r\n\t\t\tc[i]=s[i];\r\n\t\t\tif(yy==0)\r\n\t\t\t{\r\n\t\t\t\t\tif(c[i]==0)\r\n\t\t\t\t{\r\n\t\t\t\t\tc[i]=1;yy=1;\r\n\t\t\t\t}\r\n\t\t\t}\r\n\t\t\t\tc1[i+1]=c1[i]+c[i];\r\n\t\t\t\r\n\t\t\t\t\r\n\t\t}\r\n\t\tlong long int l;\r\n\t\tmm=0;\r\n\t\tfor (long long int i = n-1; i>=0; i--)\r\n\t\t{\r\n\t\t\tif(b[i]==1)\r\n\t\t\t{\r\n\t\t\t\tb[i]=0;l=i;break;\r\n\t\t\t}mm++;\r\n\t\t}\r\n\t\tfor (int i = 0; i < n; ++i)\r\n\t\t{\r\n\t\t\tb1[i+1]=b1[i]+b[i];\r\n\t\t}\r\n\t\tlong long int j=0;long long int m=0;long long int op=0;long long int po=0;\r\n\t\tfor (int i = 0; i < n; ++i)\r\n\t\t{\r\n\t\t\tif(s[i]==1){co=co-(s1[n]-s1[i+1])+n-i-1;}//\r\n\t\t\tif(b[i]==1){po=po-b1[n]+b1[i+1]+n-i-1;}\r\n\t\t\tif(c[i]==1){op=op-c1[n]+c1[i+1]+n-i-1;}//cout<s[j])\r\n\t\t// \t\t{\r\n\t\t// \t\t\tco++;\r\n\t\t// \t\t}if(b[i]>b[j])\r\n\t\t// \t\t{\r\n\t\t// 
\t\t\top++;\r\n\t\t// \t\t}\t\r\n\t\t// \t\tif(c[i]>c[j])\r\n\t\t// \t\t{\r\n\t\t// \t\t\tpo++;\r\n\t\t// \t\t}\t\r\n\t\t// \t}m++;\r\n\t\t// }\r\n\t\t// for (int i = 0; i < n; ++i)\r\n\t\t// {\r\n\t\t// \t code cout< po ? (co > op ? co : op) : (po > op ? po : op) ;\r\n\tcout<\nusing namespace std;\n\nstruct aho {\n int go[2], link, back;\n};\n\naho suf[1600], pr[40];\nstring sufs[1600], prs[40];\n\nvoid add_string(aho m[], string ms[], int &top, string s) {\n int u = 0;\n for (int i = 0; i < s.size(); ++i) {\n if (m[u].go[s[i]] == 0) {\n ++top;\n m[u].go[s[i]] = top;\n ms[top] = ms[u] + s[i];\n }\n u = m[u].go[s[i]];\n }\n}\n\nvoid build_aho(aho m[]) {\n m[0].link = 0;\n vector v = {0};\n for (int i = 0; i < v.size(); ++i) {\n for (int j = 0; j < 2; ++j)\n if (m[v[i]].go[j] != 0) {\n if (i == 0)\n m[m[v[i]].go[j]].link = 0;\n else\n m[m[v[i]].go[j]].link = m[m[v[i]].link].go[j];\n v.push_back(m[v[i]].go[j]);\n } else\n m[v[i]].go[j] = m[m[v[i]].link].go[j];\n }\n}\nint find_string(aho m[], string s) {\n int u = 0;\n for (int i = 0; i < s.size(); ++i)\n u = m[u].go[s[i]];\n return u;\n}\n\nstring cnt(string s) {\n for (int i = 0; i < s.size(); ++i)\n s[i] += '0';\n return s;\n}\n\nlong long dp[41][41][1600][2];\n\nint main() {\n int i, j, k, n, top, l;\n string s;\n //freopen(\"input.txt\", \"r\", stdin);\n //freopen(\"output.txt\", \"w\", stdout);\n //ios_base::sync_with_stdio(false); cin.tie(0); cout.tie(0);\n cin >> n;\n cin >> s;\n for (i = 0; i < s.size(); ++i)\n s[i] -= '0';\n if (s.size() > n) {\n for (i = 0; i < s.size() - n; ++i)\n if (s[i] != s[i + n]) {\n cout << 0;\n return 0;\n }\n cout << 1;\n return 0;\n }\n add_string(pr, prs, j = 0, s);\n build_aho(pr);\n k = 0;\n for (int i = 0; i < s.size(); ++i) {\n string s1 = \"\";\n for (int j = i; j < s.size(); ++j)\n s1 += s[j];\n add_string(suf, sufs, k, s1);\n }\n build_aho(suf);\n dp[0][0][0][0] = 1;\n for (i = 0; i < n; ++i)\n for (j = 0; j < s.size(); ++j)\n for (l = 0; l <= k; ++l)\n if (l != 
s.size()) {\n for (int i1 = 0; i1 < 2; ++i1)\n dp[i + 1][pr[j].go[i1]][l][1] += dp[i][j][l][1];\n for (int i1 = 0; i1 < 2; ++i1)\n if (sufs[suf[l].go[i1]].size() <= sufs[l].size())\n dp[i + 1][pr[j].go[i1]][l][1] += dp[i][j][l][0];\n else {\n dp[i + 1][pr[j].go[i1]][suf[l].go[i1]][0] += dp[i][j][l][0];\n }\n }\n long long ans = 1;\n for (i = 0; i < n; ++i)\n ans *= 2;\n for (i = 0; i <= s.size(); ++i)\n for (j = 0; j <= k; ++j) {\n string s1 = prs[i] + sufs[j];\n int u = 0;\n bool flag = true;\n for (l = 0; l < s1.size(); ++l) {\n u = suf[u].go[s1[l]];\n if (u == s.size())\n flag = false;\n }\n if (flag)\n ans -= dp[n][i][j][0] + dp[n][i][j][1];\n }\n cout << ans;\n}\n", 84 | "lang": "GNU C++17", 85 | "bug_code_uid": "b272efc49e5fe8e487b835410c053bc8", 86 | "src_uid": "0034806908c9794086736a2d07fc654c", 87 | "apr_id": "6f93ba2ec792482a3d847a0800a20e52", 88 | "difficulty": 2900, 89 | "tags": [ 90 | "dp", 91 | "strings" 92 | ], 93 | "bug_exec_outcome": "MEMORY_LIMIT_EXCEEDED", 94 | "potential_dominant_fix_op": "replace", 95 | "lang_cluster": "C++" 96 | } 97 | ``` 98 | 99 | ## Key Definitions 100 | 101 | 1. `similarity_score`: A similarity score between `bug_source_code` and `fix_source_code` given by `difflib`. 102 | 2. `equal_cnt`: A metric comparing `bug_source_code` and `fix_source_code`. Recommended by `difflib`. 103 | 3. `replace_cnt`: A metric comparing `bug_source_code` and `fix_source_code`. Recommended by `difflib`. 104 | 4. `delete_cnt`: A metric comparing `bug_source_code` and `fix_source_code`. Recommended by `difflib`. 105 | 5. `insert_cnt`: A metric comparing `bug_source_code` and `fix_source_code`. Recommended by `difflib`. 106 | 6. `fix_ops_cnt`: A metric comparing `bug_source_code` and `fix_source_code`. Recommended by `difflib`. 107 | 7. `bug_source_code`: Buggy code. 108 | 8. `fix_source_code`: A potential fix of the buggy code that passed all the unit tests. 109 | 9. `lang`: Runtime/Compiler version of the `source_code`. 110 | 10. 
`fix_code_uid`: A unique ID for the fix code. It is not important for model training. If you find any issue with the sample, you can report it to us mentioning the `fix_code_uid`. 111 | 11. `bug_code_uid`: A unique ID for the buggy code. It is not important for model training. If you find any issue with the sample, you can report it to us mentioning the `bug_code_uid`. 112 | 12. `src_uid`: A specific identifier that shows which problem the code is associated with. This identifier is **important** for the training of the model. The problem referred to by the `src_uid` provides a natural description of the problem that the code successfully solved. Refer to [Structure of `problem_descriptions.jsonl`](./README.md#structure-of-problem_descriptionsjsonl). 113 | 13. `apr_id`: A unique ID for the apr sample. It is not important for model training. If you find any issue with the sample, you can report it to us mentioning the `apr_id`. 114 | 14. `difficulty`: Difficulty rating of the problem indicated by `src_uid`. The higher, the harder. 115 | 15. `tags`: List of potential algorithmic techniques required to write the program. 116 | 16. `bug_exec_outcome`: A pre-run execution outcome of `bug_source_code`. Follow [Section 4.1](https://arxiv.org/pdf/2303.03004.pdf) for the full list of potential outcomes. The `exec_outcome` flags in the training data come from a pre-run environment. However, the training data doesn't include unit tests, to avoid potential hacks. We provide unit tests only for the dev and test data. 117 | 17. `fix_exec_outcome`: A pre-run execution outcome of `fix_source_code`. Follow [Section 4.1](https://arxiv.org/pdf/2303.03004.pdf) for the full list of potential outcomes. The `exec_outcome` flags in the training data come from a pre-run environment. However, the training data doesn't include unit tests, to avoid potential hacks. We provide unit tests only for the dev and test data. 118 | 18. `potential_dominant_fix_op`: The potential dominant fix operation recommended by difflib. 119 | 19.
`lang_cluster`: The generic programming language that the value of `lang` belongs to. 120 | 121 | The following keys will come from `problem_descriptions.jsonl` by matching `src_uid`, 122 | 123 | 20. `prob_desc_description`: Problem description in textual format; math operations are written in LaTeX. 124 | 21. `prob_desc_input_from`: How the program should take the unit test. 125 | 22. `prob_desc_output_to`: Where the program should output the result of the unit test. 126 | 23. `prob_desc_time_limit`: Time limit to solve the problem. 127 | 24. `prob_desc_memory_limit`: Memory limit to solve the problem. 128 | 25. `prob_desc_input_spec`: How, and in what order, the input will be given to the program. It also includes the data ranges, types, and sizes. 129 | 26. `prob_desc_output_spec`: How the outputs should be printed. Most of the time the unit test results are matched with an *exact string match* or a *floating point comparison* with a precision boundary. 130 | 27. `prob_desc_sample_inputs`: A sample input for the code that is expected to solve the problem described in `description`. 131 | 28. `prob_desc_sample_outputs`: The expected output for the `sample_input` that is expected to solve the problem described in `description`. 132 | 29. `prob_desc_notes`: Explanation of `sample_inputs` & `sample_outputs`. 133 | 30. `prob_desc_created_at`: The Unix timestamp when the problem was released. Use the `datetime` lib in Python to parse it to a human-readable format. 134 | 135 | Source information comes from the name of the source `*.jsonl` file. 136 | 31. `file_name`: Name of the source jsonl file from which the data is loaded. 137 | 138 | Unit test information will come from `unittest_db.json` by matching `src_uid`. 139 | 32. `hidden_unit_tests`: A list of unit tests returned as a string. Use `json.loads(hidden_unit_tests)` to load the data.
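As a rough illustration of the `difflib`-derived keys above (items 1–6), `difflib.SequenceMatcher` can produce a similarity score and opcode counts of the same shape as the ones stored in each sample. This is a sketch, not the exact script used to build the dataset: the document does not specify how the released counts were computed, so the opcode-counting scheme here — and treating `fix_ops_cnt` as the sum of the non-`equal` operations — is an assumption.

```python
import difflib

def diff_stats(bug_source_code: str, fix_source_code: str) -> dict:
    """Approximate the difflib-based APR fields described above."""
    matcher = difflib.SequenceMatcher(a=bug_source_code, b=fix_source_code)
    # Count each kind of edit opcode between the buggy and fixed code.
    counts = {"equal": 0, "replace": 0, "delete": 0, "insert": 0}
    for op, _i1, _i2, _j1, _j2 in matcher.get_opcodes():
        counts[op] += 1
    return {
        "similarity_score": matcher.ratio(),
        "equal_cnt": counts["equal"],
        "replace_cnt": counts["replace"],
        "delete_cnt": counts["delete"],
        "insert_cnt": counts["insert"],
        # Assumed: fix_ops_cnt is the number of non-"equal" operations.
        "fix_ops_cnt": counts["replace"] + counts["delete"] + counts["insert"],
    }
```

Identical programs yield a `similarity_score` of 1.0 and a `fix_ops_cnt` of 0, while a one-character change shows up as at least one `replace` opcode.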
140 | 141 | ## MD5 hash of the data 142 | 143 | Run the following, 144 | 145 | ``` 146 | cd xCodeEval/ 147 | tar c apr | md5sum 148 | ``` 149 | 150 | Output should match, `5cba33fa21d64cf3ba190744ab49f0da`. 151 | 152 | 153 | ## Tree 154 | 155 | 3 directories, 116 files 156 | 157 | ``` 158 | . 159 | ├── test 160 | │ ├── C#.jsonl 161 | │ ├── C++.jsonl 162 | │ ├── C.jsonl 163 | │ ├── Go.jsonl 164 | │ ├── Java.jsonl 165 | │ ├── Javascript.jsonl 166 | │ ├── Kotlin.jsonl 167 | │ ├── PHP.jsonl 168 | │ ├── Python.jsonl 169 | │ ├── Ruby.jsonl 170 | │ └── Rust.jsonl 171 | ├── train 172 | │ ├── train_000.jsonl 173 | │ ├── train_001.jsonl 174 | │ ├── train_002.jsonl 175 | │ ├── train_003.jsonl 176 | │ ├── train_004.jsonl 177 | │ ├── train_005.jsonl 178 | │ ├── train_006.jsonl 179 | │ ├── train_007.jsonl 180 | │ ├── train_008.jsonl 181 | │ ├── train_009.jsonl 182 | │ ├── train_010.jsonl 183 | │ ├── train_011.jsonl 184 | │ ├── train_012.jsonl 185 | │ ├── train_013.jsonl 186 | │ ├── train_014.jsonl 187 | │ ├── train_015.jsonl 188 | │ ├── train_016.jsonl 189 | │ ├── train_017.jsonl 190 | │ ├── train_018.jsonl 191 | │ ├── train_019.jsonl 192 | │ ├── train_020.jsonl 193 | │ ├── train_021.jsonl 194 | │ ├── train_022.jsonl 195 | │ ├── train_023.jsonl 196 | │ ├── train_024.jsonl 197 | │ ├── train_025.jsonl 198 | │ ├── train_026.jsonl 199 | │ ├── train_027.jsonl 200 | │ ├── train_028.jsonl 201 | │ ├── train_029.jsonl 202 | │ ├── train_030.jsonl 203 | │ ├── train_031.jsonl 204 | │ ├── train_032.jsonl 205 | │ ├── train_033.jsonl 206 | │ ├── train_034.jsonl 207 | │ ├── train_035.jsonl 208 | │ ├── train_036.jsonl 209 | │ ├── train_037.jsonl 210 | │ ├── train_038.jsonl 211 | │ ├── train_039.jsonl 212 | │ ├── train_040.jsonl 213 | │ ├── train_041.jsonl 214 | │ ├── train_042.jsonl 215 | │ ├── train_043.jsonl 216 | │ ├── train_044.jsonl 217 | │ ├── train_045.jsonl 218 | │ ├── train_046.jsonl 219 | │ ├── train_047.jsonl 220 | │ ├── train_048.jsonl 221 | │ ├── train_049.jsonl 222 | │ ├── 
train_050.jsonl 223 | │ ├── train_051.jsonl 224 | │ ├── train_052.jsonl 225 | │ ├── train_053.jsonl 226 | │ ├── train_054.jsonl 227 | │ ├── train_055.jsonl 228 | │ ├── train_056.jsonl 229 | │ ├── train_057.jsonl 230 | │ ├── train_058.jsonl 231 | │ ├── train_059.jsonl 232 | │ ├── train_060.jsonl 233 | │ ├── train_061.jsonl 234 | │ ├── train_062.jsonl 235 | │ ├── train_063.jsonl 236 | │ ├── train_064.jsonl 237 | │ ├── train_065.jsonl 238 | │ ├── train_066.jsonl 239 | │ ├── train_067.jsonl 240 | │ ├── train_068.jsonl 241 | │ ├── train_069.jsonl 242 | │ ├── train_070.jsonl 243 | │ ├── train_071.jsonl 244 | │ ├── train_072.jsonl 245 | │ ├── train_073.jsonl 246 | │ ├── train_074.jsonl 247 | │ ├── train_075.jsonl 248 | │ ├── train_076.jsonl 249 | │ ├── train_077.jsonl 250 | │ ├── train_078.jsonl 251 | │ ├── train_079.jsonl 252 | │ ├── train_080.jsonl 253 | │ ├── train_081.jsonl 254 | │ ├── train_082.jsonl 255 | │ ├── train_083.jsonl 256 | │ ├── train_084.jsonl 257 | │ ├── train_085.jsonl 258 | │ ├── train_086.jsonl 259 | │ ├── train_087.jsonl 260 | │ ├── train_088.jsonl 261 | │ ├── train_089.jsonl 262 | │ ├── train_090.jsonl 263 | │ ├── train_091.jsonl 264 | │ ├── train_092.jsonl 265 | │ └── train_093.jsonl 266 | └── validation 267 | ├── C#.jsonl 268 | ├── C++.jsonl 269 | ├── C.jsonl 270 | ├── Go.jsonl 271 | ├── Java.jsonl 272 | ├── Javascript.jsonl 273 | ├── Kotlin.jsonl 274 | ├── PHP.jsonl 275 | ├── Python.jsonl 276 | ├── Ruby.jsonl 277 | └── Rust.jsonl 278 | ``` -------------------------------------------------------------------------------- /retrieval.md: -------------------------------------------------------------------------------- 1 | # Code Retrieval 2 | 3 | ## Download data using huggingface `load_dataset()` 4 | 5 | ``` 6 | >>> import datasets 7 | >>> retrieval_code_code_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "retrieval_code_code") 8 | >>> print(retrieval_code_code_dataset) 9 | 10 | DatasetDict({ 11 | train: Dataset({ 12 | features: 
['negative_code', 'positive_code', 'source_code', 'file_name', 'src_uid'], 13 | num_rows: 50706 14 | }) 15 | validation: Dataset({ 16 | features: ['negative_code', 'positive_code', 'source_code', 'file_name', 'src_uid'], 17 | num_rows: 2535 18 | }) 19 | test: Dataset({ 20 | features: ['negative_code', 'positive_code', 'source_code', 'file_name', 'src_uid'], 21 | num_rows: 10044 22 | }) 23 | }) 24 | 25 | >>> retrieval_nl_code_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "retrieval_nl_code") 26 | >>> print(retrieval_nl_code_dataset) 27 | 28 | DatasetDict({ 29 | train: Dataset({ 30 | features: ['positive_code', 'negative_code', 'file_name', 'nl', 'src_uid'], 31 | num_rows: 61898 32 | }) 33 | validation: Dataset({ 34 | features: ['positive_code', 'negative_code', 'file_name', 'nl', 'src_uid'], 35 | num_rows: 2900 36 | }) 37 | test: Dataset({ 38 | features: ['positive_code', 'negative_code', 'file_name', 'nl', 'src_uid'], 39 | num_rows: 11701 40 | }) 41 | }) 42 | 43 | >>> retrieval_corpus_code_dataset = datasets.load_dataset("NTU-NLP-sg/xCodeEval", "retrieval_corpus") 44 | >>> print(retrieval_corpus_code_dataset) 45 | 46 | DatasetDict({ 47 | test: Dataset({ 48 | features: ['file_name', 'source_code', 'idx'], 49 | num_rows: 25043700 50 | }) 51 | }) 52 | ``` 53 | 54 | 55 | To download the retrieval data, 56 | 57 | ``` 58 | GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/NTU-NLP-sg/xCodeEval 59 | cd xCodeEval 60 | git lfs pull --include "retrieval_corpus/*" 61 | git lfs pull --include "retrieval_nl_code/*" 62 | git lfs pull --include "retrieval_code_code/*" 63 | ``` 64 | 65 | ## A Sample from retrieval_corpus 66 | ``` 67 | { 68 | "idx": "9336887", 69 | "source_code": "#include\n#include\n#include\n#include\nusing namespace std;\nint main()\n{\n long long n,a,b,c,cnt=0;\n cin>>n>>a>>b>>c;\n if(b-c>=a) cnt=n/a;\n else\n {\n if(a>=b)\n {\n if(b>n)\n {\n cout<<0<n)\n {\n cout<<0<n) cnt=n/a;\n else\n {\n n=n-a;\n cnt=n/(b-c);\n ++cnt;\n }\n }\n }\n 
cout<