├── CODE_OF_CONDUCT.md
├── LICENSE
├── README.md
├── SECURITY.md
├── SUPPORT.md
├── imgs
│   ├── leakage
│   │   ├── humaneval_leakage.png
│   │   └── mbpp_leakage.png
│   ├── logo
│   │   └── wave.png
│   └── main
│       ├── example.png
│       ├── gen-dis.png
│       └── pipeline.png
├── requirements.txt
└── src
    ├── data
    │   ├── llm_gen_dis
    │   │   ├── fewshot_case
    │   │   │   ├── bad_case
    │   │   │   │   └── bad_case_0.txt
    │   │   │   └── good_case
    │   │   │       └── good_case_0.txt
    │   │   ├── main.py
    │   │   ├── prompt
    │   │   │   ├── discriminator.txt
    │   │   │   └── generator.txt
    │   │   ├── sampler.py
    │   │   └── utils.py
    │   └── raw_code_collection
    │       ├── main.py
    │       └── utils
    │           ├── kcenter_greedy.py
    │           └── sampling_def.py
    ├── eval
    │   ├── .DS_Store
    │   ├── evalplus
    │   │   ├── humaneval.zip
    │   │   └── mbpp.zip
    │   ├── mbpp_500
    │   │   ├── evaluate.py
    │   │   ├── generate.py
    │   │   └── llmchain
    │   │       ├── __init__.py
    │   │       └── utils
    │   │           ├── __init__.py
    │   │           ├── prompter.py
    │   │           └── templates
    │   │               ├── README.md
    │   │               ├── alpaca.json
    │   │               ├── alpaca_legacy.json
    │   │               ├── alpaca_short.json
    │   │               └── vigogne.json
    │   └── reference
    │       └── references_mbpp.json
    ├── script
    │   ├── coreset.sh
    │   ├── data_generate.sh
    │   ├── evaluate.sh
    │   ├── generate.sh
    │   └── train.sh
    └── train
        ├── llama2_flash_attn_monkey_patch.py
        ├── train.py
        ├── train_mem.py
        └── utils.py
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Microsoft Open Source Code of Conduct
2 |
3 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
4 |
5 | Resources:
6 |
7 | - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
8 | - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
9 | - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
10 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) Microsoft Corporation.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | WaveCoder: Widespread And Versatile Enhanced Code LLM
5 |
6 |
7 |
8 |
9 | 
10 | 
11 | 
12 |
13 |
14 |
15 |
16 | [📜 Paper] •
17 |
18 | [🤗 HF Models] •
19 | [🐱 GitHub]
20 |
21 | [🐦 Twitter] •
22 | [💬 Reddit] •
23 | [🍀 Unofficial Blog]
24 |
25 |
26 |
27 |
28 |
29 | Repo for "WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation" [ACL 2024 Main]
30 |
31 |
32 |
33 |
34 |
35 | Figure 1: WaveCoder model pipeline.
36 |
37 |
38 | ## 🔥 News
39 |
40 |
41 |
42 | - [2024/05/16] The WaveCoder paper was accepted to the ACL 2024 main conference.
43 | - [2024/04/10] 🔥🔥🔥 WaveCoder repo and models released at [🤗 HuggingFace]()!
44 | - [2023/12/26] WaveCoder paper released.
45 |
46 | ## 💡 Introduction
47 |
48 | WaveCoder 🌊 is a series of large language models (LLMs) for the coding domain, designed to solve code-related problems through instruction-following learning. Its training dataset was generated from a subset of CodeSearchNet data with our proposed LLM-based generator-discriminator framework, and covers four general code-related tasks: code generation, code summarization, code translation, and code repair.
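As a concrete illustration, the generator in `src/data/llm_gen_dis/main.py` emits records whose `generation_data` carries `task_name`, `instruction`, `information`, and `solution` fields, wrapped with an `id`, a `task_type`, and the `source_code` they were derived from. A minimal sketch of one such record (the concrete values below are illustrative, not taken from the released dataset):

```python
import json

# One generated record, following the field layout used in
# src/data/llm_gen_dis/main.py (the concrete values are made up).
record = {
    "id": 0,
    "task_type": "code generation",
    "source_code": "def area_of_circle(radius):\n    return 3.14159 * radius ** 2\n",
    "generation_data": {
        "task_name": "Calculate Circle Area",
        "instruction": "Write a Python function that calculates the area of a circle given its radius.",
        "information": "The formula is A = pi * r^2.",
        "solution": "import math\n\ndef area_of_circle(radius):\n    return math.pi * radius ** 2\n",
    },
}

# Records are serialized to JSON, one object per entry.
serialized = json.dumps(record)
```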
49 |
50 | | Model | HumanEval | MBPP(500) | HumanEval-Fix (Avg.) | HumanEval-Explain (Avg.) |
51 | | ----- | --------- | --------- | -------------------- | ------------------------ |
52 | | GPT-4 | 85.4 | - | 47.8 | 52.1 |
53 | | [WaveCoder-DS-6.7B](https://github.com/microsoft/WaveCoder) | 65.8 | 63.0 | 49.5 | 40.8 |
54 | | [WaveCoder-Pro-6.7B](https://github.com/microsoft/WaveCoder) | 74.4 | 63.4 | 52.1 | 43.0 |
55 | | [WaveCoder-Ultra-6.7B](https://github.com/microsoft/WaveCoder) | 79.9 | 64.6 | 52.3 | 45.7 |
56 |
57 | ### LLM-based Generator-Discriminator
58 |
59 |
60 |
61 |
62 | Figure 2: Main framework of the LLM-based Generator-Discriminator.
63 |
64 |
65 | ### Example of Instruction Generation
66 |
67 |
68 |
69 |
70 | Figure 3: An Example of Our Data Generation.
71 |
72 |
73 | ### Data Decontamination
74 |
75 | We combine our dataset with the decontaminated [evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) dataset (WaveCoder-evol-instruct) to train WaveCoder-Ultra-6.7B.
76 |
77 |
82 |
83 |
86 |
87 |
88 |
89 | ## 🚀 Quick Start
90 |
91 | ### ⚙️ Setup
92 |
93 | We recommend using [Conda](https://docs.conda.io/projects/miniconda) to manage your environment. Run the following commands to set up your environment:
94 |
95 | ```sh
96 | conda create -n wavecoder python=3.9
97 | conda activate wavecoder
98 | cd src
99 | pip install -r requirements.txt
100 | pip install transformers==4.34.1
101 | pip install flash-attn==2.5.5
102 | ```
103 |
104 | ### ⚡️ Training
105 |
106 | We also open-source our complete training scripts for the community, and you may construct your own dataset for training. Our training scripts are adapted from [FastChat](https://github.com/lm-sys/FastChat).
107 |
108 | To train a model, run the following command:
109 |
110 | ```sh
111 | cd src
112 | bash script/train.sh
113 | ```
114 |
115 | ### ⚖️ Evaluation
116 |
117 | - For the [HumanEval](https://huggingface.co/datasets/openai_humaneval) benchmark, we use the codebase from [EvalPlus](https://github.com/evalplus/evalplus). We recommend using the codebase from [Magicoder](https://github.com/ise-uiuc/magicoder) and the following command to reproduce the HumanEval result of WaveCoder.
118 |
119 | ```sh
120 | MODEL_KEY=deepseek-ai/deepseek-coder-6.7b-base
121 | MODEL=microsoft/wavecoder-ultra-6.7b
122 |
123 | DATASET=humaneval
124 | SAVE_PATH=evalplus-$(basename $MODEL)-$DATASET.jsonl
125 | SANITIZED_PATH=humaneval_result/evalplus-$(basename $MODEL)-$DATASET-sanitized.jsonl
126 |
127 | python -m experiments.text2code \
128 | --model_key $MODEL_KEY \
129 | --model_name_or_path $MODEL \
130 | --save_path $SAVE_PATH \
131 | --dataset $DATASET \
132 | --temperature 0.0 \
133 | --top_p 1.0 \
134 | --max_new_tokens 512 \
135 | --n_problems_per_batch 28 \
136 | --n_samples_per_problem 1 \
137 | --n_batches 1
138 |
139 | echo "$MODEL"
140 | evalplus.evaluate --dataset $DATASET --samples $SAVE_PATH
141 | ```
142 |
143 | - For MBPP (500), you can get generations by running the following command:
144 |
145 | ```sh
146 | cd src
147 | bash script/generate.sh
148 | ```
149 |
150 | and then get the pass@k score and the error-type analysis by running the following command:
151 |
152 | ```sh
153 | bash script/evaluate.sh
154 | ```
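Here, pass@k refers to the standard unbiased estimator introduced with HumanEval; a minimal sketch of that estimator (this is a reference formula, not the evaluation code shipped in `src/eval`):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn from n generations (c of them correct) passes."""
    if n - c < k:
        # Fewer incorrect samples than k: some correct sample is always drawn.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=20, c=5, k=1))  # 0.25
```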
155 |
156 | - For the HumanEvalFix and HumanEvalExplain benchmarks, we use the codebase from [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness).
157 |
158 | ### 🌲 Data Generation
159 |
160 | First, prepare your raw code data and save it as a .jsonl file; then run the following command:
161 |
162 | ```sh
163 | cd src
164 | bash script/coreset.sh
165 | ```
166 |
167 | to get the coreset of your raw data. Once you have the coreset, you can run
168 |
169 | ```sh
170 | cd src
171 | bash script/data_generate.sh
172 | ```
173 |
174 | to launch the LLM-based Generator-Discriminator framework. You can customize your data by adjusting the prompts and the configurations in the above .sh scripts.
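For reference, `load_source_data` in `src/data/llm_gen_dis/main.py` parses its input with `json.load` (a JSON array) and pulls each snippet from a nested `text.code` field, so a minimal input file can be sketched like this (field names taken from that loader; the snippets themselves are illustrative):

```python
import json

# Two raw-code entries in the shape load_source_data expects:
# a JSON array whose elements carry the snippet under text.code.
raw_data = [
    {"text": {"code": "def add(a, b):\n    return a + b\n"}},
    {"text": {"code": "def sub(a, b):\n    return a - b\n"}},
]

with open("source_code.json", "w") as f:
    json.dump(raw_data, f)

# Mirror of the loader's list comprehension:
with open("source_code.json") as f:
    ds = json.load(f)
records = [{"id": i, "text": d["text"]["code"]} for i, d in enumerate(ds)]
```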
175 |
176 | ## 📖 License
177 |
178 | This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to its [License](https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/LICENSE-MODEL).
179 |
180 | ## ☕️ Citation
181 |
182 | If you find this repository helpful, please consider citing our paper:
183 |
184 | ```
185 | @article{yu2023wavecoder,
186 | title={Wavecoder: Widespread and versatile enhanced instruction tuning with refined data generation},
187 | author={Yu, Zhaojian and Zhang, Xin and Shang, Ning and Huang, Yangyu and Xu, Can and Zhao, Yishujie and Hu, Wenxiang and Yin, Qiufeng},
188 | journal={arXiv preprint arXiv:2312.14187},
189 | year={2023}
190 | }
191 | ```
192 |
193 | ## 🍀 Contributing
194 |
195 | This project welcomes contributions and suggestions. Most contributions require you to agree to a
196 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
197 | the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
198 |
199 | Resources:
200 |
201 | - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
202 | - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
203 | - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
204 |
205 | ## ✨ Star History
206 |
207 | [](https://star-history.com/#microsoft/WaveCoder&Date)
208 |
--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | ## Security
4 |
5 | Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet) and [Xamarin](https://github.com/xamarin).
6 |
7 | If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/security.md/definition), please report it to us as described below.
8 |
9 | ## Reporting Security Issues
10 |
11 | **Please do not report security vulnerabilities through public GitHub issues.**
12 |
13 | Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/security.md/msrc/create-report).
14 |
15 | If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/security.md/msrc/pgp).
16 |
17 | You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc).
18 |
19 | Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
20 |
21 | * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
22 | * Full paths of source file(s) related to the manifestation of the issue
23 | * The location of the affected source code (tag/branch/commit or direct URL)
24 | * Any special configuration required to reproduce the issue
25 | * Step-by-step instructions to reproduce the issue
26 | * Proof-of-concept or exploit code (if possible)
27 | * Impact of the issue, including how an attacker might exploit the issue
28 |
29 | This information will help us triage your report more quickly.
30 |
31 | If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/security.md/msrc/bounty) page for more details about our active programs.
32 |
33 | ## Preferred Languages
34 |
35 | We prefer all communications to be in English.
36 |
37 | ## Policy
38 |
39 | Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/security.md/cvd).
40 |
41 |
42 |
--------------------------------------------------------------------------------
/SUPPORT.md:
--------------------------------------------------------------------------------
1 | # TODO: The maintainer of this repo has not yet edited this file
2 |
3 | **REPO OWNER**: Do you want Customer Service & Support (CSS) support for this product/project?
4 |
5 | - **No CSS support:** Fill out this template with information about how to file issues and get help.
6 | - **Yes CSS support:** Fill out an intake form at [aka.ms/onboardsupport](https://aka.ms/onboardsupport). CSS will work with/help you to determine next steps.
7 | - **Not sure?** Fill out an intake as though the answer were "Yes". CSS will help you decide.
8 |
9 | *Then remove this first heading from this SUPPORT.MD file before publishing your repo.*
10 |
11 | # Support
12 |
13 | ## How to file issues and get help
14 |
15 | This project uses GitHub Issues to track bugs and feature requests. Please search the existing
16 | issues before filing new issues to avoid duplicates. For new issues, file your bug or
17 | feature request as a new Issue.
18 |
19 | For help and questions about using this project, please **REPO MAINTAINER: INSERT INSTRUCTIONS HERE
20 | FOR HOW TO ENGAGE REPO OWNERS OR COMMUNITY FOR HELP. COULD BE A STACK OVERFLOW TAG OR OTHER
21 | CHANNEL. WHERE WILL YOU HELP PEOPLE?**.
22 |
23 | ## Microsoft Support Policy
24 |
25 | Support for this **PROJECT or PRODUCT** is limited to the resources listed above.
26 |
--------------------------------------------------------------------------------
/imgs/leakage/humaneval_leakage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/WaveCoder/64e5ef634eee60f9d79e291afca5d5d715c8b4ed/imgs/leakage/humaneval_leakage.png
--------------------------------------------------------------------------------
/imgs/leakage/mbpp_leakage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/WaveCoder/64e5ef634eee60f9d79e291afca5d5d715c8b4ed/imgs/leakage/mbpp_leakage.png
--------------------------------------------------------------------------------
/imgs/logo/wave.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/WaveCoder/64e5ef634eee60f9d79e291afca5d5d715c8b4ed/imgs/logo/wave.png
--------------------------------------------------------------------------------
/imgs/main/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/WaveCoder/64e5ef634eee60f9d79e291afca5d5d715c8b4ed/imgs/main/example.png
--------------------------------------------------------------------------------
/imgs/main/gen-dis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/WaveCoder/64e5ef634eee60f9d79e291afca5d5d715c8b4ed/imgs/main/gen-dis.png
--------------------------------------------------------------------------------
/imgs/main/pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/WaveCoder/64e5ef634eee60f9d79e291afca5d5d715c8b4ed/imgs/main/pipeline.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | aiohttp
2 | openai==0.28.1
3 | httpx==0.25.1
4 | markdown2==2.4.10
5 | nh3==0.2.14
6 | numpy==1.26.2
7 | pydantic==1.10.13
8 | psutil==5.9.6
9 | requests==2.32.0
10 | rich==13.7.0
11 | tiktoken==0.5.1
12 | uvicorn==0.24.0.post1
13 | accelerate==0.24.1
14 | peft==0.6.2
15 | sentencepiece==0.1.99
16 | protobuf==4.23.4
17 | einops==0.7.0
18 | wandb==0.16.0
19 | torch==2.1.1
20 |
--------------------------------------------------------------------------------
/src/data/llm_gen_dis/fewshot_case/bad_case/bad_case_0.txt:
--------------------------------------------------------------------------------
1 |
2 | Input:The Python code calculates the diameter of a pipe (DN) based on the flow rate (Q), velocity (V), and dynamic viscosity (mu) using a mathematical formula.
3 |
4 | Output:
5 |
6 | task_name: Calculate pipe diameter
7 | instruction: Write a Python function to calculate the diameter of a pipe based on the flow rate, velocity, and dynamic viscosity.
8 | information: The formula to calculate the diameter of a pipe (DN) is DN = (4 * Q / (math.pi * V * mu)) ** 0.5, where Q is the flow rate, V is the velocity, and mu is the dynamic viscosity.
9 | solution:
10 |
11 | def calculate_pipe_diameter(Q, V, mu):
12 |     DN = (4 * Q / (math.pi * V * mu)) ** 0.5
13 |     return DN
14 |
15 | Analysis:
16 |
17 | - step 1: check the Instruction:
18 | 1. Instruction: the programming language should be specified in the instruction.
19 | - step 2: check the Solution:
20 | 1. Solution: Solution is related to instruction and information. Solution is the specific resolution to instruction and information.
21 | 2. Solution: In the solution, it should only contain the code and comments within the code. There should be no explanations provided outside the code.
22 | 3. Solution: The code that do contain algorithmic logic in solution should not often be buried inside complex or poorly documented functions, making them difficult to follow or learn from.
23 | 4. Solution: The code should import the necessary libraries if required.
24 | - Overall answer: no
25 | - reasons: The code meets most of the requirements, but it does not import the necessary math library, which is required for calculating the diameter of a pipe using the mathematical constant pi.
26 |
--------------------------------------------------------------------------------
/src/data/llm_gen_dis/fewshot_case/good_case/good_case_0.txt:
--------------------------------------------------------------------------------
1 | Input:
2 | The Python code calculates the area of a circle given its radius.
3 | Output:
4 | task_name: Calculate Circle Area
5 | instruction: Write a Python function that calculates the area of a circle given its radius.
6 | information: The formula to calculate the area of a circle is A = pi * r^2, where A is the area and r is the radius.
7 | solution:
8 |
9 | import math
10 |
11 | def area_of_circle(radius):
12 |     return math.pi * radius ** 2
13 |
14 | Analysis:
15 |
16 | - step 1: check the Instruction:
17 | 1. Instruction: the programming language should be specified in the instruction.
18 | - step 2: check the Solution:
19 | 1. Solution: Solution is related to instruction and information. Solution is the specific resolution to instruction and information.
20 | 2. Solution: In the solution, it should only contain the code and comments within the code. There should be no explanations provided outside the code.
21 | 3. Solution: The code that do contain algorithmic logic in solution should not often be buried inside complex or poorly documented functions, making them difficult to follow or learn from.
22 | 4. Solution: The code should import the required necessary libraries.
23 | - Overall answer: yes
24 | - reasons: All the requirements are met as per the given rules. The programming language is specified, and the solution is related to the instruction and information. The code is simple, easy to follow, and imports the necessary library. The solution provided is a good example of a well-structured and concise code that meets the task's specifications.
--------------------------------------------------------------------------------
/src/data/llm_gen_dis/main.py:
--------------------------------------------------------------------------------
1 | # MIT License
2 |
3 | # Copyright (c) Microsoft Corporation.
4 |
5 | # Permission is hereby granted, free of charge, to any person obtaining a copy
6 | # of this software and associated documentation files (the "Software"), to deal
7 | # in the Software without restriction, including without limitation the rights
8 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | # copies of the Software, and to permit persons to whom the Software is
10 | # furnished to do so, subject to the following conditions:
11 |
12 | # The above copyright notice and this permission notice shall be included in all
13 | # copies or substantial portions of the Software.
14 |
15 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | # SOFTWARE
22 |
23 | import openai
24 | import os
25 | import ast
26 | import random
27 | import json
28 | import re
29 | import time
31 | import http.client
32 | import argparse
33 | from tqdm import tqdm
34 | from typing import List, Dict, Any
35 | from sampler import GoodCase, BadCase
36 | from utils import make_request
37 |
38 |
39 | def load_source_data(path: str) -> List[Dict]:
40 |     """
41 |     Load source code data from a JSON file.
42 | 
43 |     Parameters:
44 |     - path (str): The file path to the source code data file.
45 | 
46 |     Returns:
47 |     - List[Dict]: A list of dictionaries containing source code data.
48 |     """
49 |     with open(path, "r") as f_s:
50 |         ds = json.load(f_s)
51 |     return [{"id": id, "text": d["text"]["code"]} for id, d in enumerate(ds)]
52 |
53 |
54 | def load_prompt(path: str) -> str:
55 |     """
56 |     Load a prompt text from a file.
57 | 
58 |     Parameters:
59 |     - path (str): The file path to the prompt text file.
60 | 
61 |     Returns:
62 |     - str: The loaded prompt text.
63 |     """
64 |     with open(path, "r") as f1:
65 |         prompt = f1.readlines()
66 |     return "\n".join(prompt)
67 |
68 |
69 | def analysis_filter(text: str) -> (str, str):
70 |     """
71 |     Filter out the analysis part from a text.
72 | 
73 |     Parameters:
74 |     - text (str): The input text that may contain an analysis section.
75 | 
76 |     Returns:
77 |     - (str, str): A tuple containing the filtered text and the analysis content if found.
78 |     """
79 |     pattern = r"(Analysis:[\s\S]*?)(?=Analysis:|$)"
80 |     match = re.search(pattern, text)
81 |     if not match:
82 |         return text, None
83 | 
84 |     start_pos = match.start()
85 |     prev_info = text[:start_pos]
86 |     content = text[start_pos:]
87 |     return prev_info.strip(), content
88 |
89 |
90 | def extract_message(generated_text: str) -> Dict:
91 |     """
92 |     Extract message parts from the generated text.
93 | 
94 |     Parameters:
95 |     - generated_text (str): The generated text from the AI model.
96 | 
97 |     Returns:
98 |     - Dict: A dictionary containing the extracted message parts and their quality.
99 |     """
100 | 
101 |     pattern = r"task_name:\s*(.*?)\s*instruction:\s*(.*?)\s*information:\s*(.*?)\s*solution:\s*(.*?)(?=\ntask_name:|\Z)"
102 |     matches = re.findall(pattern, generated_text, re.DOTALL)
103 | 
104 |     # assert len(matches) <= 1
105 |     if len(matches) > 1:
106 |         print(f"INFO: matches={len(matches)}\ngenerated_text={generated_text}\n\n")
107 |         return {"generated_text": generated_text, "quality": False, "result": None}
108 | 
109 |     if len(matches) == 0:
110 |         return {"generated_text": generated_text, "quality": False, "result": None}
111 | 
112 |     try:
113 |         task_name, instruction, information, solution = matches[0]
114 |         # save the information in the dict
115 |         result = {
116 |             "task_name": task_name,
117 |             "instruction": instruction,
118 |             "information": information,
119 |             "solution": solution,
120 |         }
121 |     except Exception as e:
122 |         print(e)
123 |         return None
124 | 
125 |     noinput_num = 0
126 |     for key, value in result.items():
127 |         if value.strip() in ["", "."]:
128 |             noinput_num += 1
129 |     if noinput_num > 1:
130 |         print(f"noinput={noinput_num}")
131 |         return {"generated_text": generated_text, "quality": False, "result": None}
132 | 
133 |     # check sentence numbers
134 |     instruction_sentence = re.split(r"[.!?](?:\s|$)", instruction)
135 |     instruction_sentence = [item for item in instruction_sentence if item != ""]
136 |     print(f"instruction_sentence_num={len(instruction_sentence)}")
137 | 
138 |     if len(instruction_sentence) < 1 or len(instruction_sentence) > 2:
139 |         return {"generated_text": generated_text, "quality": False, "result": result}
140 | 
141 |     # information
142 |     if information.strip().lower() in [
143 |         "none",
144 |         "none.",
145 |         "none,",
146 |         "no additional information is required",
147 |         "",
148 |     ]:
149 |         result["information"] = ""
150 |         information = ""
151 | 
152 |     if information == "":
153 |         information_sentence_num = 0
154 |     else:
155 |         information_sentence = re.split(r"[.!?](?:\s|$)", information)
156 |         information_sentence = [item for item in information_sentence if item != ""]
157 |         information_sentence_num = len(information_sentence)
158 |     # count sentence number
159 |     print(f"information_sentence_num={information_sentence_num}")
160 |     if information_sentence_num > 2 or information_sentence_num < 0:
161 |         return {"generated_text": generated_text, "quality": False, "result": result}
162 | 
163 |     gen_text = ""
164 |     # print(f"result={result}")
165 |     for key, value in result.items():
166 |         gen_text += f"{key}: {value}\n"
167 | 
168 |     return {"generated_text": gen_text, "quality": True, "result": result}
169 |
170 |
171 | def extract_answer(feedback: str) -> (str, List[str], str):
172 |     """
173 |     Extract answers from the feedback text.
174 | 
175 |     Parameters:
176 |     - feedback (str): The feedback text containing answers.
177 | 
178 |     Returns:
179 |     - (str, List[str], str): A tuple containing the feedback, a list of part answers, and the overall answer.
180 |     """
181 | 
182 |     # extract feedback
183 |     pattern = r"(- step 1:.*?- reasons:.*?)\n"
184 |     match = re.search(pattern, feedback + "\n", flags=re.DOTALL)
185 |     # print(f"match={match.groups()}\n\n")
186 |     if match:
187 |         feedback = match.group(1)
188 |     else:
189 |         print("No match found")
190 |         return None, None, None
191 | 
192 |     # extract answer
193 |     answers = re.findall("", feedback)
194 |     overall_answer = re.search(r"-\s*Overall answer:\s*(.*?)\n", feedback).group(1)
195 |     overall_answer = overall_answer.lower()
196 | 
197 |     part_answer = []
198 |     for answer in answers:
199 |         part_answer.append(answer.lower())
200 | 
201 |     return feedback, part_answer, overall_answer.strip()
202 |
203 |
204 | def few_shot_task_gen(
205 |     source_code: List[Dict],
206 |     gen_prompt: str,
207 |     dis_prompt: str,
208 |     good_case_path: str,
209 |     bad_case_path: str,
210 |     sample_number: int,
211 |     engine: str,
212 |     api_key: str,
213 |     base_url: str,
214 |     gen_max_token: int = 800,
215 |     data_stream_path: str = "data_stream.txt",
216 | ) -> List[Dict]:
217 |     """
218 |     Generate few-shot tasks and process the data.
219 | 
220 |     Parameters:
221 |     - source_code (List[Dict]): List of source code data.
222 |     - gen_prompt (str): The generator prompt text.
223 |     - dis_prompt (str): The discriminator prompt text.
224 |     - good_case_path (str): Path to save good cases.
225 |     - bad_case_path (str): Path to save bad cases.
226 |     - sample_number (int): Number of samples for few-shot learning.
227 |     - engine (str): The OpenAI engine to use.
228 |     - api_key (str): The API key for OpenAI.
229 |     - base_url (str): The base URL for the OpenAI API.
230 |     - gen_max_token (int): Maximum token length for the generator.
231 |     - data_stream_path (str): Path to save the data stream.
232 | 
233 |     Returns:
234 |     - List[Dict]: A list of dictionaries containing the generated data.
235 |     """
236 | 
237 |     good_prompt = "Here are some good examples:\n"
238 |     bad_prompt = "Here are some bad examples. In each example, I also provide an analysis pointing out the reasons why the case is not good. Please do not generate data like this.\n"
239 |     #
240 |     GoodCaser = GoodCase(good_case_path, good_prompt, sample_number=sample_number)
241 |     BadCaser = BadCase(bad_case_path, bad_prompt, sample_number=sample_number)
242 |     print(f"Good Case: {len(GoodCaser.sample_list)}")
243 |     print(f"Bad Case: {len(BadCaser.sample_list)}")
244 |     g_prompt = gen_prompt
245 |     d_prompt = dis_prompt
246 | 
247 |     data_list = []
248 |     with open(data_stream_path, "w") as f:
249 |         for data in tqdm(source_code):
250 |             ids = data["id"]
251 |             # print("The code for example " + str(ids+1) + " is generating.")
252 | 
253 |             example_code = data["text"]
254 | 
255 |             good_few_shot = GoodCaser.generate_fewshot_text()
256 |             bad_few_shot = BadCaser.generate_fewshot_text()
257 |             example = {
258 |                 "good_few_shot": good_few_shot,
259 |                 "bad_few_shot": bad_few_shot,
260 |                 "input": example_code,
261 |             }
262 | 
263 |             message = g_prompt.format_map(example)
264 |             print(message)
265 |             text = make_request(
266 |                 message=message, model=engine, api_key=api_key, base_url=base_url
267 |             )
268 |             if not text:
269 |                 raise Exception("No text generated")
270 |             # print(text)
271 |             # analysis filter
272 |             text, analysis_content = analysis_filter(text)
273 |             filter_messages = extract_message(text)
274 |             if not filter_messages["quality"]:
275 |                 continue
276 |             generated_text = filter_messages["generated_text"]
277 |             example_case = f"Input: \n{example_code}\n\nOutput:\n{generated_text}"
278 | 
279 |             # example_case = f"Output:\n{text}"
280 |             ans = discriminator(
281 |                 prompt=d_prompt,
282 |                 GoodCaser=GoodCaser,
283 |                 BadCaser=BadCaser,
284 |                 engine=engine,
285 |                 api_key=api_key,
286 |                 base_url=base_url,
287 |                 generated_text=example_case,
288 |             )
289 |             if ans is not None and "no" not in ans:
290 |                 result_data = filter_messages["result"]
291 |                 assert result_data is not None and len(result_data) > 0
292 |                 data_list.append(
293 |                     {
294 |                         "id": ids,
295 |                         "task_type": "code generation",
296 |                         "source_code": example_code,
297 |                         "generation_data": result_data,
298 |                     }
299 |                 )
300 |                 f.write(
301 |                     json.dumps(
302 |                         {
303 |                             "id": ids,
304 |                             "task_type": "code generation",
305 |                             "source_code": example_code,
306 |                             "generation_data": result_data,
307 |                         }
308 |                     )
309 |                 )
310 |                 f.write(",\n")
311 | 
312 |             time.sleep(2)
313 | 
314 |     if not data_list:
315 |         raise Exception("No data generated")
316 | 
317 |     else:
318 |         print(f"data_list={len(data_list)}\nsome examples are:")
319 |         for key, value in data_list[0].items():
320 |             print(f"{key}:{value}")
321 | 
322 |     return data_list
323 |
324 |
325 | def discriminator(
326 |     prompt: str,
327 |     GoodCaser: GoodCase,
328 |     BadCaser: BadCase,
329 |     engine: str,
330 |     api_key: str,
331 |     base_url: str,
332 |     generated_text: str,
333 |     max_token: int = 500,
334 | ) -> str:
335 |     """
336 |     Use discriminator to classify the generated text as good or bad.
337 | 
338 |     Parameters:
339 |     - prompt (str): The discriminator prompt text.
340 |     - GoodCaser (GoodCase): An instance of GoodCase for good examples.
341 |     - BadCaser (BadCase): An instance of BadCase for bad examples.
342 |     - engine (str): The OpenAI engine to use.
343 |     - api_key (str): The API key for OpenAI.
344 |     - base_url (str): The base URL for the OpenAI API.
345 |     - generated_text (str): The generated text to be classified.
346 |     - max_token (int): Maximum token length for the discriminator.
347 | 
348 |     Returns:
349 |     - str: The overall answer from the discriminator.
350 |     """
351 | 
352 |     prompt_format = """
353 |     {prompt}\n{bad_examples}\n\n{generated_text}\n\nAnalysis:\n
354 |     """
355 | 
356 |     good_examples = GoodCaser.generate_fewshot_for_d()
357 |     bad_examples = BadCaser.generate_fewshot_for_d()
358 |     # generate instruction
359 |     example = {
360 |         "prompt": prompt,
361 |         "bad_examples": bad_examples,
362 |         "generated_text": generated_text,
363 |     }
364 |     message = prompt_format.format_map(example)
365 |     # obtain answer
366 |     ans = make_request(
367 |         message=message, model=engine, api_key=api_key, base_url=base_url
368 |     )
369 |     feedback, part_answer, overall_answer = extract_answer(ans)
370 |     print(f"part_answer={part_answer}\toverall_answer={overall_answer}")
371 | 
372 |     if feedback is None:
373 |         print("no pattern information was extracted in the feedback")
374 |         return None
375 | 
376 |     generated_text = f"{generated_text}\n\nAnalysis:\n{feedback}\n"
377 | 
378 |     if overall_answer == "yes":
379 |         GoodCaser.add_case(generated_text)
380 |     else:
381 |         BadCaser.add_case(generated_text)
382 | 
383 |     return overall_answer
384 |
385 |
386 | def parse_args():
387 | """
388 | Parse command line arguments.
389 | """
390 | parser = argparse.ArgumentParser(
391 | description="Process few-shot tasks and generate data."
392 | )
393 | parser.add_argument(
394 | "--source_data_path",
395 | type=str,
396 | default="fewshot_case/source_code.json",
397 | help="Path to the source code data file.",
398 | )
399 | parser.add_argument(
400 | "--gen_prompt_path",
401 | type=str,
402 | default="prompt/generator.txt",
403 | help="Path to the generator prompt text file.",
404 | )
405 | parser.add_argument(
406 | "--dis_prompt_path",
407 | type=str,
408 | default="prompt/discriminator.txt",
409 | help="Path to the discriminator prompt text file.",
410 | )
411 | parser.add_argument(
412 | "--good_case_path",
413 | type=str,
414 |         default="fewshot_case/good_case",
415 |         help="Path to the good case directory.",
416 | )
417 | parser.add_argument(
418 | "--data_stream_path",
419 | type=str,
420 | default="data_stream.txt",
421 | help="Path to save txt file.",
422 | )
423 | parser.add_argument(
424 | "--bad_case_path",
425 | type=str,
426 |         default="fewshot_case/bad_case",
427 |         help="Path to the bad case directory.",
428 | )
429 | parser.add_argument(
430 | "--save_json",
431 | action="store_true",
432 | help="If true, save the result to json file.",
433 | )
434 | parser.add_argument(
435 | "--sample_size",
436 | type=int,
437 | default=1,
438 | help="Sample size for the few-shot prompt.",
439 | )
440 | parser.add_argument(
441 | "--gen_max_token",
442 | type=int,
443 | default=800,
444 | help="Maximum token length for the generator prompt.",
445 | )
446 | parser.add_argument(
447 | "--output_data_path",
448 | type=str,
449 | default="result.json",
450 | help="Path for the output data file.",
451 | )
452 | parser.add_argument(
453 | "--openai_key",
454 | type=str,
455 | default="default_key",
456 | help="OpenAI api key.",
457 | )
458 | parser.add_argument(
459 | "--openai_url",
460 | type=str,
461 |         default="https://api.openai.com/v1",
462 |         help="Base URL for the OpenAI API.",
463 | )
464 | parser.add_argument(
465 | "--openai_model",
466 | type=str,
467 |         default="gpt-3.5-turbo",
468 |         help="OpenAI model name (must be a chat model, since requests go through the chat completions API).",
469 | )
470 | return parser.parse_args()
471 |
472 |
473 | def main(args):
474 | # load source code
475 | ds = load_source_data(path=args.source_data_path)
476 |
477 | # load generator and discriminator prompts
478 | gen_prompt = load_prompt(args.gen_prompt_path)
479 | dis_prompt = load_prompt(args.dis_prompt_path)
480 |
481 | # generate few-shot task
482 | generations = few_shot_task_gen(
483 | source_code=ds,
484 | gen_prompt=gen_prompt,
485 | dis_prompt=dis_prompt,
486 | good_case_path=args.good_case_path,
487 | bad_case_path=args.bad_case_path,
488 | sample_number=args.sample_size,
489 | engine=args.openai_model,
490 | api_key=args.openai_key,
491 | base_url=args.openai_url,
492 | gen_max_token=args.gen_max_token,
493 | data_stream_path=args.data_stream_path,
494 | )
495 | if args.save_json:
496 | with open(args.output_data_path, "w") as f:
497 | json.dump(generations, f, indent=4)
498 |
499 |
500 | if __name__ == "__main__":
501 | args = parse_args()
502 | main(args)
503 |
--------------------------------------------------------------------------------
/src/data/llm_gen_dis/prompt/discriminator.txt:
--------------------------------------------------------------------------------
1 | Please analyze and judge whether the data meets the specifications according to the following rules:
2 | Let's go through each requirement step by step. If the requirement is met, enter "yes" after the requirement; otherwise, enter "no".
3 | - step 1: check the Instruction:
4 | 1. Instruction: the programming language should be specified in the instruction.
5 | - step 2: check the Solution:
6 | 1. Solution: Solution is related to instruction and information. Solution is the specific resolution to instruction and information.
7 | 2. Solution: In the solution, it should only contain the code and comments within the code. There should be no explanations provided outside the code.
8 | 3. Solution: Code that contains algorithmic logic should not be buried inside complex or poorly documented functions that make it difficult to follow or learn from.
9 | 4. Solution: The code should import the required necessary libraries.
10 | - Overall answer:
11 | - reasons:
12 |
13 |
--------------------------------------------------------------------------------
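The discriminator prompt above asks the model to answer each check with "yes"/"no" and to close with an overall verdict. A minimal sketch of parsing such a reply — the repository's real parser (`extract_answer` in `main.py`) may differ; `parse_overall_answer` is a hypothetical helper:

```python
import re

def parse_overall_answer(reply: str):
    """Extract a yes/no overall verdict from a discriminator reply.

    Hypothetical parser; tolerates case and spacing differences.
    """
    match = re.search(r"overall answer\s*:?\s*(yes|no)", reply, re.IGNORECASE)
    return match.group(1).lower() if match else None

reply = (
    "step 1: yes\n"
    "step 2: yes\n"
    "- Overall answer: yes\n"
    "- reasons: all requirements are satisfied\n"
)
print(parse_overall_answer(reply))  # yes
```

A reply that never states an overall answer yields `None`, which is why the caller should guard against a missing verdict before testing for "no".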
/src/data/llm_gen_dis/prompt/generator.txt:
--------------------------------------------------------------------------------
1 | Translate a given task summary into functional code. Each generated case needs to be provided with the following keys:
2 | task_name, instruction, information, solution. Your response should be formatted using spaces.
3 | Here are some requirements you should follow:
4 | 1. Each generated case needs to be provided with the following keys:task_name, instruction, information, solution.
5 | 2. The solution is a specific resolution addressing instructions and information; therefore, a solution must be relevant to both instructions and information.
6 | 3. The instruction should be one or two sentences.
7 | 4. The code in the solution should import the necessary libraries, like math, functools, etc.
8 | 5. Code that contains algorithmic logic should not be buried inside complex or poorly documented functions that make it difficult to follow or learn from.
9 | {good_few_shot}
10 | {bad_few_shot}
11 | Input:
12 | {input}
13 | Output:
--------------------------------------------------------------------------------
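The `{good_few_shot}`, `{bad_few_shot}`, and `{input}` placeholders in this template are filled with `str.format_map`, the same mechanism `main.py` uses for the discriminator prompt. A small illustrative sketch (the template is abridged and the case texts are invented):

```python
# Abridged version of the generator template with the same placeholders.
template = (
    "Translate a given task summary into functional code.\n"
    "{good_few_shot}\n"
    "{bad_few_shot}\n"
    "Input:\n{input}\nOutput:\n"
)

# format_map takes a mapping, so the few-shot blocks can be assembled
# separately (as the sampler classes do) and dropped in here.
filled = template.format_map(
    {
        "good_few_shot": "Good case 0:\ntask_name: reverse_string\n...",
        "bad_few_shot": "Bad case 0:\ntask_name: untitled\n...",
        "input": "def add(a, b):\n    return a + b",
    }
)
print("Good case 0:" in filled and "Input:" in filled)  # True
```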
/src/data/llm_gen_dis/sampler.py:
--------------------------------------------------------------------------------
1 | # MIT License
2 |
3 | # Copyright (c) Microsoft Corporation.
4 |
5 | # Permission is hereby granted, free of charge, to any person obtaining a copy
6 | # of this software and associated documentation files (the "Software"), to deal
7 | # in the Software without restriction, including without limitation the rights
8 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | # copies of the Software, and to permit persons to whom the Software is
10 | # furnished to do so, subject to the following conditions:
11 |
12 | # The above copyright notice and this permission notice shall be included in all
13 | # copies or substantial portions of the Software.
14 |
15 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | # SOFTWARE
22 |
23 | import json
24 | import random
25 | import copy
26 | import os
27 |
28 |
29 | class CaseSample:
30 | def __init__(
31 | self,
32 | seed_path: str,
33 | pre_prompt: str,
34 | sample_number: int = 1,
35 | seed: int = 1024,
36 | prefix_name: str = "",
37 | ):
38 | """
39 | Initialize the CaseSample instance.
40 |
41 | Parameters:
42 | - seed_path (str): The path to the seed data directory.
43 | - pre_prompt (str): The prompt to be used before the few-shot examples.
44 | - sample_number (int): The number of samples to be taken from the seed data.
45 | - seed (int): The seed for random number generation.
46 | - prefix_name (str): The prefix name for the seed data files.
47 | """
48 | self.prompt = pre_prompt
49 | self.sample_number = sample_number
50 | self.seed_path = seed_path
51 | self.prefix_name = prefix_name
52 | # init sample
53 | self.init_sample(seed_path)
54 | # random.seed(seed)
55 |
56 | def init_sample(self, path: str):
57 | """
58 | Initialize the sample list from the given path.
59 |
60 | Parameters:
61 | - path (str): The path to the directory containing seed data files.
62 | """
63 | if path is None:
64 | raise ValueError("Can't init path to obtain seed case")
65 | # with open(path, 'r') as f:
66 | # sample_list = json.load(f)
67 | # f.close()
68 | sample_list = []
69 | file_list = os.listdir(path)
70 |
71 | for file_path in file_list:
72 | tmp_path = f"{path}/{file_path}"
73 | with open(tmp_path, "r") as f:
74 | data = f.readlines()
75 |             # the "with" block closes the file automatically
76 | data = "".join(data)
77 | sample_list.append(data)
78 |
79 | self.sample_list = self.preprocess_seed_data(sample_list)
80 |
81 | def preprocess_seed_data(self, seed_data: list) -> list:
82 | """
83 | Preprocess the seed data.
84 |
85 | Parameters:
86 | - seed_data (list): The list of seed data to be preprocessed.
87 |
88 | Returns:
89 | - list: The preprocessed seed data.
90 | """
91 | return seed_data
92 |
93 | def fewshot_sample(self) -> list:
94 | """
95 | Sample a few examples from the seed data.
96 |
97 | Returns:
98 | - list: A list of sampled few-shot examples.
99 | """
100 | assert self.sample_number <= len(self.sample_list)
101 | fewshot_case = random.sample(self.sample_list, self.sample_number)
102 |
103 | return fewshot_case
104 |
105 | def preprocess_add_data(self, data: str) -> str:
106 | """
107 | Preprocess data before adding it to the sample list.
108 |
109 | Parameters:
110 | - data (str): The data to be preprocessed.
111 |
112 | Returns:
113 | - str: The preprocessed data.
114 | """
115 | return data
116 |
117 | def add_case(self, data: str):
118 | """
119 | Add a new case to the sample list.
120 |
121 | Parameters:
122 | - data (str): The data to be added as a new case.
123 | """
124 | with open(
125 | f"{self.seed_path}/{self.prefix_name}_{len(self.sample_list)}.txt",
126 | "w",
127 | ) as f:
128 | f.write(data)
129 |         # the "with" block closes the file automatically
130 | self.sample_list.append(self.preprocess_add_data(data))
131 |
132 | def generate_fewshot_text(self) -> str:
133 | """
134 | Generate the few-shot text using the sampled cases and the prompt.
135 |
136 | Returns:
137 | - str: The generated few-shot text.
138 | """
139 | fewshot_case = self.fewshot_sample()
140 | gen_texts = self.prompt
141 | for i in range(len(fewshot_case)):
142 | gen_texts += f"Case {i}:\n{fewshot_case[i]}\n"
143 |
144 | return gen_texts
145 |
146 |
147 | class GoodCase(CaseSample):
148 |
149 | def __init__(
150 | self, seed_path, pre_prompt, sample_number=1, seed=1024, prefix_name="good_case"
151 | ):
152 | super().__init__(
153 | seed_path,
154 | pre_prompt,
155 | sample_number=sample_number,
156 | seed=seed,
157 | prefix_name=prefix_name,
158 | )
159 |
160 | def generate_fewshot_text(self):
161 | fewshot_case = self.fewshot_sample()
162 | gen_texts = self.prompt
163 | for i in range(len(fewshot_case)):
164 | gen_texts += f"Good case {i}:\n{fewshot_case[i]}\n"
165 |
166 | return gen_texts
167 |
168 | def generate_fewshot_for_d(self):
169 | fewshot_case = self.fewshot_sample()
170 | gen_texts = ""
171 | for i in range(len(fewshot_case)):
172 | gen_texts += f"Good Case {i}:\n{fewshot_case[i]}\n"
173 |
174 | return gen_texts
175 |
176 |
177 | class BadCase(CaseSample):
178 | def __init__(
179 | self, seed_path, pre_prompt, sample_number=1, seed=1024, prefix_name="bad_case"
180 | ):
181 | super().__init__(
182 | seed_path,
183 | pre_prompt,
184 | sample_number=sample_number,
185 | seed=seed,
186 | prefix_name=prefix_name,
187 | )
188 |
189 | def preprocess_add_data(self, data):
190 | return super().preprocess_add_data(data)
191 |
192 | def generate_fewshot_text(self):
193 | fewshot_case = self.fewshot_sample()
194 | gen_texts = self.prompt
195 | for i in range(len(fewshot_case)):
196 | gen_texts += f"Bad case {i}:\n{fewshot_case[i]}\n"
197 |
198 | return gen_texts
199 |
200 | def fewshot_sample(self):
201 | assert self.sample_number <= len(self.sample_list)
202 |         fewshot_case = random.sample(self.sample_list, self.sample_number)  # sample cases, not the characters of the last case
203 | return fewshot_case
204 |
205 | def generate_fewshot_for_d(self):
206 | fewshot_case = self.fewshot_sample()
207 | gen_texts = ""
208 | # analysis = "Analysis:\nanswer:yes\nreasons:all requirements are satisfied"
209 | for i in range(len(fewshot_case)):
210 | gen_texts += f"Bad Case {i}:\n{fewshot_case[i]}\n"
211 |
212 | return gen_texts
213 |
--------------------------------------------------------------------------------
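A usage sketch of the sampler classes above: seed cases live as plain `.txt` files in a directory (mirroring `fewshot_case/good_case/` in this repo), and few-shot blocks are built by random sampling. The directory and case texts below are invented for illustration:

```python
import os
import random
import tempfile

# Create a throwaway seed directory with a few invented cases.
seed_dir = tempfile.mkdtemp()
for i, text in enumerate(["case one", "case two", "case three"]):
    with open(os.path.join(seed_dir, f"good_case_{i}.txt"), "w") as f:
        f.write(text)

# Load every file in the directory, as CaseSample.init_sample does.
sample_list = [
    open(os.path.join(seed_dir, name)).read()
    for name in sorted(os.listdir(seed_dir))
]

# Build a few-shot block, as GoodCase.generate_fewshot_text does.
random.seed(1024)
fewshot = random.sample(sample_list, 2)
block = "Here are some good cases:\n" + "".join(
    f"Good case {i}:\n{c}\n" for i, c in enumerate(fewshot)
)
print(block.count("Good case"))  # 2
```

Because `add_case` writes each accepted case back into the same directory, the pool of few-shot candidates grows as the generate/discriminate loop runs.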
/src/data/llm_gen_dis/utils.py:
--------------------------------------------------------------------------------
1 | import signal
2 | import time
3 | from openai import OpenAI
4 |
5 |
6 | # Define the OpenAI client call within the make_request function
7 | def make_request(
8 | message: str,
9 | model: str,
10 | api_key: str,
11 | base_url: str,
12 | max_tokens: int = 512,
13 | temperature: float = 1,
14 | n: int = 1,
15 | **kwargs,
16 | ) -> str:
17 | """
18 | Makes a request to the OpenAI API for chat completions.
19 |
20 | Parameters:
21 | - message (str): The input message or question to be processed.
22 | - model (str): The model to be used for the request.
23 | - max_tokens (int): The maximum number of tokens to generate in the response.
24 | - temperature (float): The temperature to use for the request, controlling randomness.
25 | - n (int): The number of responses to generate.
26 | - **kwargs: Additional keyword arguments to pass to the API call.
27 |
28 | Returns:
29 |     - str: The message content of the first chat completion choice.
30 | """
31 |
32 | def call_api(message, max_tokens, temperature, api_key, base_url):
33 | # Initialize the OpenAI client (Note: api_key should be kept secure and not hard-coded)
34 | client = OpenAI(
35 | api_key=api_key,
36 | base_url=base_url,
37 | )
38 |
39 | while True:
40 | try:
41 | response = client.chat.completions.create(
42 | model=model,
43 | messages=[
44 | {
45 | "role": "system",
46 | "content": "You are an AI assistant that helps people find information.",
47 | },
48 | {"role": "user", "content": message},
49 | ],
50 | max_tokens=max_tokens,
51 | temperature=temperature,
52 | )
53 | return response
54 | except Exception as e:
55 | print(e)
56 | time.sleep(2)
57 |
58 | response = call_api(
59 | message,
60 | max_tokens,
61 | temperature,
62 | api_key,
63 | base_url,
64 | )
65 | return response.choices[0].message.content
66 |
67 |
68 | def handler(signum, frame):
69 | """
70 | Signal handler to raise an exception when an alarm is received.
71 |
72 | Parameters:
73 | - signum (int): The signal number.
74 | - frame (frame): The current stack frame.
75 | """
76 |     raise TimeoutError("end of time")
77 |
78 |
79 | def make_auto_request(*args, **kwargs) -> str:
80 | """
81 | Makes an auto request with a timeout to the OpenAI API.
82 |
83 | Parameters:
84 | - *args: Variable arguments to be passed to make_request.
85 | - **kwargs: Keyword arguments to be passed to make_request.
86 |
87 | Returns:
88 |     - str: The text content returned by make_request.
89 | """
90 |
91 | ret = None
92 | while ret is None:
93 | try:
94 | signal.signal(signal.SIGALRM, handler)
95 | signal.alarm(100) # Set an alarm for 100 seconds
96 | ret = make_request(*args, **kwargs)
97 | signal.alarm(0) # Disable the alarm
98 |         except TimeoutError:  # raised by the alarm handler on timeout
99 | print("Request timed out")
100 | except Exception as e:
101 | print(f"An error occurred: {e}")
102 | time.sleep(1) # Wait before retrying
103 | return ret
104 |
--------------------------------------------------------------------------------
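`make_auto_request` bounds each API call with a Unix `SIGALRM` alarm. The pattern in isolation (Unix-only; the lambdas stand in for the network call):

```python
import signal
import time

def _handler(signum, frame):
    # Raised inside whatever code is running when the alarm fires.
    raise TimeoutError("end of time")

def run_with_timeout(fn, seconds):
    # Arm an alarm, run fn, and always disarm the alarm afterwards.
    signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)
    try:
        return fn()
    finally:
        signal.alarm(0)

result = run_with_timeout(lambda: "ok", 2)  # finishes before the alarm

try:
    run_with_timeout(lambda: time.sleep(3), 1)  # alarm fires at 1s
    timed_out = False
except TimeoutError:
    timed_out = True
```

The `finally: signal.alarm(0)` line matters: without it, a fast call would leave a pending alarm that could interrupt unrelated code later.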
/src/data/raw_code_collection/main.py:
--------------------------------------------------------------------------------
1 | # MIT License
2 |
3 | # Copyright (c) Microsoft Corporation.
4 |
5 | # Permission is hereby granted, free of charge, to any person obtaining a copy
6 | # of this software and associated documentation files (the "Software"), to deal
7 | # in the Software without restriction, including without limitation the rights
8 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | # copies of the Software, and to permit persons to whom the Software is
10 | # furnished to do so, subject to the following conditions:
11 |
12 | # The above copyright notice and this permission notice shall be included in all
13 | # copies or substantial portions of the Software.
14 |
15 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | # SOFTWARE
22 |
23 | import argparse
24 | import json
25 | import numpy as np
26 | from tqdm import tqdm
27 | from typing import List, Dict, Any
28 | from sentence_transformers import SentenceTransformer
29 | from utils.kcenter_greedy import kCenterGreedy
30 |
31 |
32 | def get_code_embedding(
33 | data: List[str], model_name: str, batch_size: int
34 | ) -> List[Dict[str, Any]]:
35 | """
36 | Generate embeddings for a list of code snippets using a sentence transformer model.
37 |
38 | Parameters:
39 | data (List[str]): A list of code snippets to embed.
40 | model_name (str): The name of the sentence transformer model to use.
41 | batch_size (int): The size of the batch for embedding generation.
42 |
43 | Returns:
44 | List[Dict[str, Any]]: A list of dictionaries with 'text' and 'embedding' keys.
45 | """
46 | model = SentenceTransformer(model_name)
47 | embeddings = model.encode(data, batch_size=batch_size, show_progress_bar=True)
48 | res = [{"text": t, "embedding": e.tolist()} for t, e in zip(data, embeddings)]
49 |
50 | return res
51 |
52 |
53 | def coreset(embeddings: np.ndarray, num: int, seed: int) -> List[int]:
54 |     """
55 |     Select a coreset from a set of embeddings using the k-Center Greedy algorithm.
56 | 
57 |     Parameters:
58 |     embeddings (np.ndarray): An array of embeddings.
59 |     num (int): The number of elements to select for the coreset.
60 |     seed (int): Random seed for the k-center selection.
61 | 
62 |     Returns:
63 |     List[int]: Indices of the selected coreset elements.
64 |     """
65 |     kcg = kCenterGreedy(X=embeddings, y=None, seed=seed)
66 |     batch = kcg.select_batch_(model=None, already_selected=[], N=num)
67 |     return batch
68 | 
69 | 
70 | # Set up the argument parser
71 | 
72 | def parse_args() -> argparse.Namespace:
73 |     """
74 |     Parse command line arguments.
75 | 
76 |     Returns:
77 |     argparse.Namespace: Parsed arguments as an object.
78 |     """
79 |     parser = argparse.ArgumentParser(
80 |         description="Generate embeddings for code and create a coreset."
81 |     )
82 |     parser.add_argument(
83 |         "--data_path",
84 |         type=str,
85 |         required=True,
86 |         help="Path to the data file containing code snippets in JSON format.",
87 |     )
88 |     parser.add_argument(
89 |         "--save_path",
90 |         type=str,
91 |         required=True,
92 |         help="Path to save the coreset JSON file.",
93 |     )
94 |     parser.add_argument(
95 |         "--batch_size",
96 |         type=int,
97 |         default=4096,
98 |         help="Batch size for embedding generation.",
99 |     )
100 |     parser.add_argument(
101 |         "--seed",
102 |         type=int,
103 |         default=42,
104 |         help="Random seed.",
105 |     )
106 |     parser.add_argument(
107 |         "--model_name",
108 |         type=str,
109 |         default="sentence-transformers/all-roberta-large-v1",
110 |         help="Pretrained model name for sentence transformer.",
111 |     )
112 |     parser.add_argument(
113 |         "--coreset_size",
114 |         type=int,
115 |         required=True,
116 |         help="Number of elements to include in the coreset.",
117 |     )
118 | 
119 |     args = parser.parse_args()
120 |     return args
121 | 
122 | 
123 | def main():
124 |     """
125 |     Main function to load data, generate embeddings, create a coreset, and save the results.
126 |     """
127 |     args = parse_args()
128 | 
129 |     # Load the dataset
130 |     with open(args.data_path, "r") as f:
131 |         data = f.readlines()
132 | 
133 |     data = [json.loads(d) for d in data]
134 | 
135 |     # Ensure the coreset size does not exceed the data size
136 |     if args.coreset_size > len(data):
137 |         raise ValueError("coreset_size exceeds the number of data entries")
138 | 
139 |     # Get code embeddings
140 |     embeddings = get_code_embedding(data, args.model_name, args.batch_size)
141 | 
142 |     # Create a coreset from the dataset
143 |     coreset_indices = coreset(
144 |         np.array([example["embedding"] for example in embeddings]),
145 |         args.coreset_size,
146 |         args.seed,
147 |     )
148 | 
149 |     # Save the texts of the selected coreset elements
150 |     with open(args.save_path, "w") as f:
151 |         json.dump(
152 |             [
153 |                 {"text": embeddings[idx]["text"]}
154 |                 for idx in coreset_indices
155 |             ],
156 |             f,
157 |             indent=4,
158 |         )
159 |
160 |
161 | if __name__ == "__main__":
162 | main()
163 |
--------------------------------------------------------------------------------
/src/data/raw_code_collection/utils/kcenter_greedy.py:
--------------------------------------------------------------------------------
1 | """ Origin code from https://github.com/google/active-learning/blob/master/sampling_methods/kcenter_greedy.py """
2 |
3 | """Returns points that minimizes the maximum distance of any point to a center.
4 |
5 | Implements the k-Center-Greedy method in
6 | Ozan Sener and Silvio Savarese. A Geometric Approach to Active Learning for
7 | Convolutional Neural Networks. https://arxiv.org/abs/1708.00489 2017
8 |
9 | Distance metric defaults to l2 distance. Features used to calculate distance
10 | are either raw features or if a model has transform method then uses the output
11 | of model.transform(X).
12 |
13 | Can be extended to a robust k centers algorithm that ignores a certain number of
14 | outlier datapoints. Resulting centers are solution to multiple integer program.
15 | """
16 |
17 |
18 | import numpy as np
19 | from sklearn.metrics import pairwise_distances
20 | from utils.sampling_def import SamplingMethod
21 |
22 |
23 | class kCenterGreedy(SamplingMethod):
24 |
25 | def __init__(self, X, y, seed, metric="euclidean"):
26 | self.X = X
27 | self.y = y
28 | self.flat_X = self.flatten_X()
29 | self.name = "kcenter"
30 | self.features = self.flat_X
31 | self.metric = metric
32 | self.min_distances = None
33 | self.n_obs = self.X.shape[0]
34 | self.already_selected = []
35 |
36 | def update_distances(self, cluster_centers, only_new=True, reset_dist=False):
37 | """Update min distances given cluster centers.
38 |
39 | Args:
40 | cluster_centers: indices of cluster centers
41 | only_new: only calculate distance for newly selected points and update
42 | min_distances.
43 |       reset_dist: whether to reset min_distances.
44 | """
45 |
46 | if reset_dist:
47 | self.min_distances = None
48 | if only_new:
49 | cluster_centers = [
50 | d for d in cluster_centers if d not in self.already_selected
51 | ]
52 | if cluster_centers:
53 | # Update min_distances for all examples given new cluster center.
54 | x = self.features[cluster_centers]
55 | dist = pairwise_distances(self.features, x, metric=self.metric)
56 |
57 | if self.min_distances is None:
58 | self.min_distances = np.min(dist, axis=1).reshape(-1, 1)
59 | else:
60 | self.min_distances = np.minimum(self.min_distances, dist)
61 |
62 | def select_batch_(self, model, already_selected, N, **kwargs):
63 | """
64 | Diversity promoting active learning method that greedily forms a batch
65 | to minimize the maximum distance to a cluster center among all unlabeled
66 | datapoints.
67 |
68 | Args:
69 | model: model with scikit-like API with decision_function implemented
70 | already_selected: index of datapoints already selected
71 | N: batch size
72 |
73 | Returns:
74 | indices of points selected to minimize distance to cluster centers
75 | """
76 |
77 | try:
78 | # Assumes that the transform function takes in original data and not
79 | # flattened data.
80 | print("Getting transformed features...")
81 | self.features = model.transform(self.X)
82 | print("Calculating distances...")
83 | self.update_distances(already_selected, only_new=False, reset_dist=True)
84 |         except Exception:
85 | print("Using flat_X as features.")
86 | self.update_distances(already_selected, only_new=True, reset_dist=False)
87 |
88 | new_batch = []
89 |
90 | for _ in range(N):
91 | if self.already_selected is None:
92 | # Initialize centers with a randomly selected datapoint
93 | ind = np.random.choice(np.arange(self.n_obs))
94 | else:
95 | ind = np.argmax(self.min_distances)
96 | # New examples should not be in already selected since those points
97 | # should have min_distance of zero to a cluster center.
98 | assert ind not in already_selected
99 |
100 | self.update_distances([ind], only_new=True, reset_dist=False)
101 | new_batch.append(ind)
102 | print(
103 | "Maximum distance from cluster centers is %0.2f" % max(self.min_distances)
104 | )
105 |
106 | self.already_selected = already_selected
107 |
108 | return new_batch
109 |
--------------------------------------------------------------------------------
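The selection loop above can be reproduced in a few lines of plain Python: repeatedly pick the point farthest from all chosen centers, then refresh each point's distance to its nearest center. A dependency-free toy version (not the class above, just the core farthest-point idea):

```python
import math

def kcenter_greedy_toy(points, k, first=0):
    # Start from one center; each round, add the point whose distance
    # to its nearest existing center is largest.
    selected = [first]
    min_d = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: min_d[i])
        selected.append(nxt)
        # Refresh each point's distance to its nearest center.
        min_d = [min(d, math.dist(p, points[nxt])) for p, d in zip(points, min_d)]
    return selected

# Two tight clusters plus one outlier: the greedy picks spread-out points.
pts = [(0, 0), (0.1, 0), (10, 0), (10, 0.1), (5, 5)]
print(kcenter_greedy_toy(pts, 3))  # [0, 3, 4]
```

The class version differs mainly in bookkeeping: it vectorizes the distance updates with `pairwise_distances` and caches `min_distances` so that repeated batch selections only pay for the newly added centers.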
/src/data/raw_code_collection/utils/sampling_def.py:
--------------------------------------------------------------------------------
1 | """ Origin code from https://github.com/google/active-learning/blob/master/sampling_methods/kcenter_greedy.py """
2 |
3 | """Abstract class for sampling methods.
4 |
5 | Provides interface to sampling methods that allow same signature
6 | for select_batch. Each subclass implements select_batch_ with the desired
7 | signature for readability.
8 | """
9 |
10 | # from __future__ import absolute_import
11 | # from __future__ import division
12 | # from __future__ import print_function
13 |
14 | import abc
15 | import numpy as np
16 |
17 |
18 | class SamplingMethod(object):
19 | __metaclass__ = abc.ABCMeta
20 |
21 | @abc.abstractmethod
22 | def __init__(self, X, y, seed, **kwargs):
23 | self.X = X
24 | self.y = y
25 | self.seed = seed
26 |
27 | def flatten_X(self):
28 | shape = self.X.shape
29 | flat_X = self.X
30 | if len(shape) > 2:
31 |             flat_X = np.reshape(self.X, (shape[0], np.prod(shape[1:])))
32 | return flat_X
33 |
34 | @abc.abstractmethod
35 | def select_batch_(self):
36 | return
37 |
38 | def select_batch(self, **kwargs):
39 | return self.select_batch_(**kwargs)
40 |
41 | def to_dict(self):
42 | return None
43 |
--------------------------------------------------------------------------------
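`flatten_X` collapses every trailing feature dimension into one, turning an `(n, d1, d2, …)` array into `(n, d1*d2*…)` so that pairwise distances can be computed row-by-row. The same reshape, sketched without NumPy on nested lists (`flatten_rows` is a hypothetical stand-in):

```python
def flatten_rows(x):
    # Flatten everything after the first axis: (n, d1, d2) -> (n, d1*d2).
    def flat(v):
        if isinstance(v, list):
            out = []
            for item in v:
                out.extend(flat(item))
            return out
        return [v]
    return [flat(row) for row in x]

batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # shape (2, 2, 2)
print(flatten_rows(batch))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```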
/src/eval/evalplus/humaneval.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/WaveCoder/64e5ef634eee60f9d79e291afca5d5d715c8b4ed/src/eval/evalplus/humaneval.zip
--------------------------------------------------------------------------------
/src/eval/evalplus/mbpp.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/WaveCoder/64e5ef634eee60f9d79e291afca5d5d715c8b4ed/src/eval/evalplus/mbpp.zip
--------------------------------------------------------------------------------
/src/eval/mbpp_500/evaluate.py:
--------------------------------------------------------------------------------
1 | # MIT License
2 |
3 | # Copyright (c) Microsoft Corporation.
4 |
5 | # Permission is hereby granted, free of charge, to any person obtaining a copy
6 | # of this software and associated documentation files (the "Software"), to deal
7 | # in the Software without restriction, including without limitation the rights
8 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | # copies of the Software, and to permit persons to whom the Software is
10 | # furnished to do so, subject to the following conditions:
11 |
12 | # The above copyright notice and this permission notice shall be included in all
13 | # copies or substantial portions of the Software.
14 |
15 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | # SOFTWARE
22 |
23 | import re
24 | import json
25 | import os
26 |
27 | import subprocess
28 | import argparse
29 |
30 | from typing import List
31 | from tqdm import tqdm
32 | from evaluate import load
33 |
34 |
35 | os.environ["HF_ALLOW_CODE_EVAL"] = "1"
36 |
37 |
38 | def read_data(path: str) -> List:
39 | """Read data from a JSON file.
40 |
41 | Args:
42 | path (str): The file path to the JSON file.
43 |
44 | Returns:
45 | List: The parsed data from the JSON file.
46 | """
47 | with open(path, "r") as f:
48 | return json.load(f)
49 |
50 |
51 | def generate_py_file(reference_path: str, gen_code_path: str, save_path: str):
52 | """Generate Python files from the generated code.
53 |
54 | Args:
55 | reference_path (str): The file path to the reference JSON file.
56 | gen_code_path (str): The file path to the generated code JSON file.
57 | save_path (str): The directory path to save the generated Python files.
58 | """
59 | references = read_data(reference_path)
60 | generated_code = read_data(gen_code_path)
61 |
62 | if not os.path.exists(save_path):
63 | os.makedirs(save_path)
64 | else:
65 | raise ValueError("The save path already exists. Please provide a new path.")
66 |
67 | if len(references) != len(generated_code):
68 | raise ValueError(
69 | "The length of references list must be equal to the length of generated code list"
70 | )
71 |
72 | for i, reference in enumerate(references):
73 | case_path = os.path.join(save_path, f"case_{i}")
74 | if not os.path.exists(case_path):
75 | os.makedirs(case_path)
76 |
77 | code_candidate = generated_code[i]
78 | for j, code in enumerate(code_candidate):
79 | file_path = os.path.join(case_path, f"gen_{j}.py")
80 | with open(file_path, "w") as file:
81 | file.write(code.replace("\t", " ")) # Replace tabs with spaces
82 | file.write("\n")
83 | file.write(reference)
84 |
85 |
86 | def run_generated_py_file(reference_path: str, gen_code_path: str, scripts_folder: str):
87 | """Run the generated Python files and log the analysis.
88 |
89 | Args:
90 | reference_path (str): The file path to the reference JSON file.
91 | gen_code_path (str): The file path to the generated code JSON file.
92 | scripts_folder (str): The directory path where Python scripts are saved.
93 | """
94 | generate_py_file(reference_path, gen_code_path, scripts_folder)
95 |
96 | file_dirs = os.listdir(scripts_folder)
 97 |     for idx, file_dir in enumerate(
 98 |         tqdm(sorted(file_dirs, key=lambda x: int("".join(re.findall(r"\d+", x)))))
 99 |     ):
100 |         case_dir = os.path.join(scripts_folder, file_dir)
101 |         python_files = [f for f in os.listdir(case_dir) if f.endswith(".py")]
102 |         log_path = os.path.join(scripts_folder, "solution_information.txt")
103 |         with open(log_path, "a") as solution:
104 |             solution.write(f"*****************Problem {idx}*****************\n")
105 |             for sid, file in enumerate(python_files):
106 |                 file_path = os.path.join(case_dir, file)
107 | with open(file_path, "r") as code_file:
108 | code = code_file.read()
109 | try:
110 | subprocess.run(
111 | ["python", file_path],
112 | check=True,
113 | stderr=subprocess.PIPE,
114 | universal_newlines=True,
115 | timeout=10,
116 | )
117 | status = "passed"
118 | error_type = "None"
119 | except subprocess.CalledProcessError as e:
120 | status = "failed"
121 | error_type = e.stderr
122 |             except subprocess.TimeoutExpired:
123 |                 # Bind status/error_type so the finally block never sees them unset
124 |                 status, error_type = "timeout", f"Timeout for problem {sid}"
124 | finally:
125 | solution_information = f"Solution {sid}:\n\nStatus:{status}\nError:{error_type}\n\nCode:\n{code}\n\n"
126 | solution.write(solution_information)
127 |
128 |
129 | def pass_k_evaluation(reference_path: str, gen_code_path: str) -> List:
130 | """Evaluate the accuracy of the generated code.
131 |
132 | Args:
133 | reference_path (str): The file path to the reference JSON file.
134 | gen_code_path (str): The file path to the generated code JSON file.
135 |
136 | Returns:
137 | List: The evaluation results.
138 | """
139 | code_metric = load("code_eval")
140 | references = read_data(reference_path)
141 | generated_code = read_data(gen_code_path)
142 | results, _ = code_metric.compute(references=references, predictions=generated_code)
143 | return results
144 |
145 |
146 | def main():
147 | parser = argparse.ArgumentParser(description="Analyze and run generated code.")
148 |
149 | parser.add_argument(
150 | "--reference_path", type=str, help="Path to the reference JSON file."
151 | )
152 | parser.add_argument(
153 | "--gen_code_path", type=str, help="Path to the generated code JSON file."
154 | )
155 | parser.add_argument(
156 | "--analyze_generation",
157 | action="store_true",
158 | help="If true, generate analysis for each problem.",
159 | )
160 | parser.add_argument(
161 | "--save_path",
162 | type=str,
163 | default="./code_run/",
164 | help="Folder to save and run Python scripts.",
165 | )
166 |
167 | args = parser.parse_args()
168 | if args.analyze_generation:
169 | run_generated_py_file(args.reference_path, args.gen_code_path, args.save_path)
170 |
171 | results = pass_k_evaluation(args.reference_path, args.gen_code_path)
172 | print(results)
173 |
174 |
175 | if __name__ == "__main__":
176 | main()
177 |
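The `code_eval` metric used in `pass_k_evaluation` reports the unbiased pass@k estimator; a self-contained sketch of that estimator (the helper name `pass_at_k` is illustrative, not part of the metric's API):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).

    n: candidates generated per problem, c: candidates that passed the tests.
    """
    if n - c < k:
        # fewer than k failures: every size-k draw contains at least one pass
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With 10 candidates and 3 passing, pass@1 reduces to the plain pass rate 3/10.
print(pass_at_k(10, 3, 1))
```

The averaged value over all problems is what `code_metric.compute` returns as, e.g., `pass@1`.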
--------------------------------------------------------------------------------
/src/eval/mbpp_500/generate.py:
--------------------------------------------------------------------------------
1 | # MIT License
2 |
3 | # Copyright (c) Microsoft Corporation.
4 |
5 | # Permission is hereby granted, free of charge, to any person obtaining a copy
6 | # of this software and associated documentation files (the "Software"), to deal
7 | # in the Software without restriction, including without limitation the rights
8 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | # copies of the Software, and to permit persons to whom the Software is
10 | # furnished to do so, subject to the following conditions:
11 |
12 | # The above copyright notice and this permission notice shall be included in all
13 | # copies or substantial portions of the Software.
14 |
15 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | # SOFTWARE
22 |
23 | import json
24 | import os
25 |
26 | from argparse import ArgumentParser
27 | from typing import List
28 | from llmchain.utils.prompter import Prompter
29 | from datasets import load_dataset
30 | from tqdm import tqdm
31 | from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
32 |
35 |
36 | def prepare_prompt_mbpp(prompter: Prompter) -> List[str]:
37 | """
38 | Prepares prompts for MBPP (https://huggingface.co/datasets/mbpp) using a given prompter.
39 |
40 | Args:
41 | prompter (Prompter): A Prompter to wrap instructions.
42 | Returns:
43 | List[str]: A list of prepared prompts.
44 | """
45 | ds = load_dataset("mbpp", split="test")
46 | instructions = ds["text"]
47 | test_cases = ds["test_list"]
48 |
49 | if len(instructions) != len(test_cases):
50 | raise ValueError("The length of instructions and test cases must be equal")
51 |
52 | prompts = []
53 |
54 | for instruction, test_case in zip(instructions, test_cases):
55 | prompt = prompter.generate_prompt(
56 | instruction
57 | + "\n"
58 | + "the function should pass the following test code:\n"
59 | + "\n".join(test_case)
60 | )
61 | prompts.append(prompt)
62 |
63 | return prompts
64 |
65 |
66 | def generate(
67 | model: AutoModelForCausalLM,
68 | tokenizer: AutoTokenizer,
69 | prompt: str,
70 | generation_config: GenerationConfig,
71 | prompter: Prompter,
72 | ) -> List[str]:
73 | """
74 | Generates text using a given model and tokenizer.
75 |
76 | Args:
77 | model (AutoModelForCausalLM): The model to generate text.
78 | tokenizer (AutoTokenizer): The tokenizer corresponding to the model.
79 | prompt (str): The prompt to generate text from.
80 | generation_config (GenerationConfig): The configuration for text generation.
 81 |         prompter (Prompter): Used to extract the response from the decoded output.
82 |
83 | Returns:
84 | List[str]: A list of generated texts.
85 | """
86 | inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
87 | s = model.generate(**inputs, generation_config=generation_config)
88 | outputs = tokenizer.batch_decode(s, skip_special_tokens=True)
89 | return [
90 | prompter.get_response(output).replace(tokenizer.eos_token, "")
91 | for output in outputs
92 | ]
93 |
94 |
95 | def main():
96 | parser = ArgumentParser()
 97 |     parser.add_argument("--model_path", default="", help="path to the model to evaluate")
 98 |     parser.add_argument(
 99 |         "--save_path", default="./result", help="file to save generations to"
100 |     )
101 |     parser.add_argument(
102 |         "--temperature", required=True, type=float, help="sampling temperature"
103 |     )
104 |     parser.add_argument("--n_samples", required=True, type=int, help="samples per problem")
105 |     parser.add_argument(
106 |         "--max_new_tokens", required=True, type=int, help="max tokens to generate"
107 |     )
108 |     parser.add_argument("--batch_size", required=True, type=int, help="sequences per generate() call")
109 |     parser.add_argument("--top_p", required=True, type=float, help="nucleus sampling top-p")
110 |
111 | args = parser.parse_args()
112 | generations = []
113 | prompter = Prompter()
114 |
115 | # prepare prompt
116 | prompts = prepare_prompt_mbpp(prompter)
117 |
118 | if not args.model_path:
119 | raise ValueError("Please provide a model path")
120 |
121 | tokenizer = AutoTokenizer.from_pretrained(args.model_path)
122 | model = AutoModelForCausalLM.from_pretrained(args.model_path).to("cuda")
123 | generation_config = GenerationConfig(
124 | pad_token_id=tokenizer.pad_token_id,
125 | temperature=args.temperature,
126 | max_new_tokens=args.max_new_tokens,
127 | num_return_sequences=args.batch_size,
128 | eos_token_id=tokenizer.eos_token_id,
129 |         top_p=args.top_p, do_sample=True,  # sampling must be enabled for temperature/top_p to apply
130 | )
131 |
132 | for prompt in tqdm(prompts):
133 | samples_n = []
134 | if args.n_samples % args.batch_size != 0:
135 | raise ValueError("n_samples must be divisible by batch_size")
136 |
137 | for _ in range(int(args.n_samples / args.batch_size)):
138 | samples_n += generate(model, tokenizer, prompt, generation_config, prompter)
139 |
140 | generations.append(samples_n)
141 |
142 | if args.save_path:
143 | with open(args.save_path, "w") as f1:
144 | json.dump(generations, f1, indent=4)
145 |
146 |
147 | if __name__ == "__main__":
148 | main()
149 |
--------------------------------------------------------------------------------
/src/eval/mbpp_500/llmchain/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/WaveCoder/64e5ef634eee60f9d79e291afca5d5d715c8b4ed/src/eval/mbpp_500/llmchain/__init__.py
--------------------------------------------------------------------------------
/src/eval/mbpp_500/llmchain/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/WaveCoder/64e5ef634eee60f9d79e291afca5d5d715c8b4ed/src/eval/mbpp_500/llmchain/utils/__init__.py
--------------------------------------------------------------------------------
/src/eval/mbpp_500/llmchain/utils/prompter.py:
--------------------------------------------------------------------------------
1 | """
2 | A dedicated helper to manage templates and prompt building.
3 | """
4 |
5 | import json
6 | from pathlib import Path
7 | from typing import Union
8 |
9 |
10 | class Prompter(object):
11 | __slots__ = ("template", "_verbose")
12 |
13 | def __init__(self, template_name: str = "", verbose: bool = False):
14 | self._verbose = verbose
15 | if not template_name:
16 | # Enforce the default here, so the constructor can be called with '' and will not break.
17 | template_name = "alpaca"
18 | file_name = Path(__file__).parent / "templates" / f"{template_name}.json"
19 | if not file_name.exists():
20 | raise ValueError(f"Can't read {file_name}")
21 | with file_name.open() as fp:
22 | self.template = json.load(fp)
23 | if self._verbose:
24 | print(
25 | f"Using prompt template {template_name}: {self.template['description']}"
26 | )
27 |
28 | def generate_prompt(
29 | self,
30 | instruction: str,
31 | input: Union[None, str] = None,
32 | label: Union[None, str] = None,
33 | ) -> str:
34 | # returns the full prompt from instruction and optional input
35 | # if a label (=response, =output) is provided, it's also appended.
36 | if input:
37 | res = self.template["prompt_input"].format(
38 | instruction=instruction, input=input
39 | )
40 | else:
41 | res = self.template["prompt_no_input"].format(
42 | instruction=instruction
43 | )
44 | if label:
45 | res = f"{res}{label}"
46 | if self._verbose:
47 | print(res)
48 | return res
49 |
50 |     def get_response(self, output: str) -> str:
51 |         # everything after the response marker; the raw output if the marker is absent
52 |         return output.split(self.template["response_split"], 1)[-1].strip()
52 |
--------------------------------------------------------------------------------
/src/eval/mbpp_500/llmchain/utils/templates/README.md:
--------------------------------------------------------------------------------
1 | # Prompt templates
2 |
3 | This directory contains template styles for the prompts used to finetune LoRA models.
4 |
5 | ## Format
6 |
7 | A template is described via a JSON file with the following keys:
8 |
9 | - `prompt_input`: The template to use when input is not None. Uses `{instruction}` and `{input}` placeholders.
10 | - `prompt_no_input`: The template to use when input is None. Uses the `{instruction}` placeholder.
11 | - `description`: A short description of the template, with possible use cases.
12 | - `response_split`: The text to use as separator when cutting real response from the model output.
13 |
14 | No `{response}` placeholder is used, since the response is always the last element of the template and is simply concatenated to the rest.
15 |
16 | ## Example template
17 |
18 | The default template, used unless otherwise specified, is `alpaca.json`
19 |
20 | ```json
21 | {
22 | "description": "Template used by Alpaca-LoRA.",
23 | "prompt_input": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n",
24 | "prompt_no_input": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n",
25 | "response_split": "### Response:"
26 | }
27 |
28 | ```
29 |
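The `response_split` key is what lets the real answer be cut out of the model's echo of the prompt. A minimal sketch of that round trip, with the `prompt_no_input` template inlined so the snippet is self-contained (normally `Prompter` loads it from this directory):

```python
# The alpaca "no input" template, inlined for the demo (Prompter reads the JSON file).
template = {
    "prompt_no_input": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        "\n\n### Instruction:\n{instruction}\n\n### Response:\n"
    ),
    "response_split": "### Response:",
}

# Build the prompt, as Prompter.generate_prompt() does when there is no input field.
prompt = template["prompt_no_input"].format(instruction="Add two numbers.")

# A causal LM's output echoes the prompt; splitting on response_split keeps
# only the generated answer, as Prompter.get_response() does.
model_output = prompt + "def add(a, b):\n    return a + b"
response = model_output.split(template["response_split"])[1].strip()
print(response)
```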
30 | ## Current templates
31 |
32 | ### alpaca
33 |
34 | Default template used for generic LoRA fine tunes so far.
35 |
36 | ### alpaca_legacy
37 |
38 | Legacy template used by the original alpaca repo, with no `\n` after the response field. Kept for reference and experiments.
39 |
40 | ### alpaca_short
41 |
42 | A trimmed-down alpaca template which seems to perform just as well and save some tokens. Models created with the default template seem to be queryable by the short template as well. More experiments are welcome.
43 |
44 | ### vigogne
45 |
46 | The default alpaca template, translated to French. This template was used to train the "Vigogne" LoRA and should be used to query it, or for extra fine-tuning.
47 |
--------------------------------------------------------------------------------
/src/eval/mbpp_500/llmchain/utils/templates/alpaca.json:
--------------------------------------------------------------------------------
1 | {
2 | "description": "Template used by Alpaca-LoRA.",
3 | "prompt_input": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n",
4 | "prompt_no_input": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n",
5 | "response_split": "### Response:"
6 | }
7 |
--------------------------------------------------------------------------------
/src/eval/mbpp_500/llmchain/utils/templates/alpaca_legacy.json:
--------------------------------------------------------------------------------
1 | {
2 | "description": "Legacy template, used by Original Alpaca repository.",
3 | "prompt_input": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:",
4 | "prompt_no_input": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:",
5 | "response_split": "### Response:"
6 | }
7 |
--------------------------------------------------------------------------------
/src/eval/mbpp_500/llmchain/utils/templates/alpaca_short.json:
--------------------------------------------------------------------------------
1 | {
2 | "description": "A shorter template to experiment with.",
3 | "prompt_input": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n",
4 | "prompt_no_input": "### Instruction:\n{instruction}\n\n### Response:\n",
5 | "response_split": "### Response:"
6 | }
7 |
--------------------------------------------------------------------------------
/src/eval/mbpp_500/llmchain/utils/templates/vigogne.json:
--------------------------------------------------------------------------------
1 | {
2 | "description": "French template, used by Vigogne for finetuning.",
3 | "prompt_input": "Ci-dessous se trouve une instruction qui décrit une tâche, associée à une entrée qui fournit un contexte supplémentaire. Écrivez une réponse qui complète correctement la demande.\n\n### Instruction:\n{instruction}\n\n### Entrée:\n{input}\n\n### Réponse:\n",
4 | "prompt_no_input": "Ci-dessous se trouve une instruction qui décrit une tâche. Écrivez une réponse qui complète correctement la demande.\n\n### Instruction:\n{instruction}\n\n### Réponse:\n",
5 | "response_split": "### Réponse:"
6 | }
7 |
--------------------------------------------------------------------------------
/src/eval/reference/references_mbpp.json:
--------------------------------------------------------------------------------
1 | ["assert remove_Occ(\"hello\",\"l\") == \"heo\"\nassert remove_Occ(\"abcda\",\"a\") == \"bcd\"\nassert remove_Occ(\"PHP\",\"P\") == \"H\"", "assert sort_matrix([[1, 2, 3], [2, 4, 5], [1, 1, 1]])==[[1, 1, 1], [1, 2, 3], [2, 4, 5]]\nassert sort_matrix([[1, 2, 3], [-2, 4, -5], [1, -1, 1]])==[[-2, 4, -5], [1, -1, 1], [1, 2, 3]]\nassert sort_matrix([[5,8,9],[6,4,3],[2,1,4]])==[[2, 1, 4], [6, 4, 3], [5, 8, 9]]", "assert count_common(['red','green','black','pink','black','white','black','eyes','white','black','orange','pink','pink','red','red','white','orange','white',\"black\",'pink','green','green','pink','green','pink','white','orange',\"orange\",'red']) == [('pink', 6), ('black', 5), ('white', 5), ('red', 4)]\nassert count_common(['one', 'two', 'three', 'four', 'five', 'one', 'two', 'one', 'three', 'one']) == [('one', 4), ('two', 2), ('three', 2), ('four', 1)]\nassert count_common(['Facebook', 'Apple', 'Amazon', 'Netflix', 'Google', 'Apple', 'Netflix', 'Amazon']) == [('Apple', 2), ('Amazon', 2), ('Netflix', 2), ('Facebook', 1)]", "assert find_Volume(10,8,6) == 240\nassert find_Volume(3,2,2) == 6\nassert find_Volume(1,2,1) == 1", "assert split_lowerstring(\"AbCd\")==['bC','d']\nassert split_lowerstring(\"Python\")==['y', 't', 'h', 'o', 'n']\nassert split_lowerstring(\"Programming\")==['r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g']", "assert text_lowercase_underscore(\"aab_cbbbc\")==('Found a match!')\nassert text_lowercase_underscore(\"aab_Abbbc\")==('Not matched!')\nassert text_lowercase_underscore(\"Aaab_abbbc\")==('Not matched!')", "assert square_perimeter(10)==40\nassert square_perimeter(5)==20\nassert square_perimeter(4)==16", "assert remove_dirty_chars(\"probasscurve\", \"pros\") == 'bacuve'\nassert remove_dirty_chars(\"digitalindia\", \"talent\") == 'digiidi'\nassert remove_dirty_chars(\"exoticmiles\", \"toxic\") == 'emles' ", "assert test_duplicate(([1,2,3,4,5]))==False\nassert test_duplicate(([1,2,3,4, 4]))==True\nassert 
test_duplicate([1,1,2,2,3,3,4,4,5])==True", "assert is_woodall(383) == True\nassert is_woodall(254) == False\nassert is_woodall(200) == False", "assert multiples_of_num(4,3)== [3,6,9,12]\nassert multiples_of_num(2,5)== [5,10]\nassert multiples_of_num(9,2)== [2,4,6,8,10,12,14,16,18]", "assert find_first_duplicate(([1, 2, 3, 4, 4, 5]))==4\nassert find_first_duplicate([1, 2, 3, 4])==-1\nassert find_first_duplicate([1, 1, 2, 3, 3, 2, 2])==1", "assert maximum_Sum([[1,2,3],[4,5,6],[10,11,12],[7,8,9]]) == 33\nassert maximum_Sum([[0,1,1],[1,1,2],[3,2,1]]) == 6\nassert maximum_Sum([[0,1,3],[1,2,1],[9,8,2],[0,1,0],[6,4,8]]) == 19", "assert binary_to_decimal(100) == 4\nassert binary_to_decimal(1011) == 11\nassert binary_to_decimal(1101101) == 109", "assert find_Product([1,1,2,3],4) == 6\nassert find_Product([1,2,3,1,1],5) == 6\nassert find_Product([1,1,4,5,6],5) == 120", "assert check_k_elements([(4, 4), (4, 4, 4), (4, 4), (4, 4, 4, 4), (4, )], 4) == True\nassert check_k_elements([(7, 7, 7), (7, 7)], 7) == True\nassert check_k_elements([(9, 9), (9, 9, 9, 9)], 7) == False", "assert remove(['4words', '3letters', '4digits']) == ['words', 'letters', 'digits']\nassert remove(['28Jan','12Jan','11Jan']) == ['Jan','Jan','Jan']\nassert remove(['wonder1','wonder2','wonder3']) == ['wonder','wonder','wonder']", "assert binomial_Coeff(5,2) == 10\nassert binomial_Coeff(4,3) == 4\nassert binomial_Coeff(3,2) == 3", "assert get_Odd_Occurrence([1,2,3,1,2,3,1],7) == 1\nassert get_Odd_Occurrence([1,2,3,2,3,1,3],7) == 3\nassert get_Odd_Occurrence([2,3,5,4,5,2,4,3,5,2,4,4,2],13) == 5", "assert count_Substring_With_Equal_Ends(\"abc\") == 3\nassert count_Substring_With_Equal_Ends(\"abcda\") == 6\nassert count_Substring_With_Equal_Ends(\"ab\") == 2", "assert func([[1, 2, 6], [1, 3, 4, 5, 7, 8], [1, 3, 5, 6, 8, 9], [2, 5, 7, 11], [1, 4, 7, 8, 12]],3)==[5, 7, 1]\nassert func([[1, 2, 6], [1, 3, 4, 5, 7, 8], [1, 3, 5, 6, 8, 9], [2, 5, 7, 11], [1, 4, 7, 8, 12]],1)==[1]\nassert func([[1, 2, 6], [1, 3, 4, 
5, 7, 8], [1, 3, 5, 6, 8, 9], [2, 5, 7, 11], [1, 4, 7, 8, 12]],5)==[6, 5, 7, 8, 1]", "assert max_Prime_Factors(15) == 5\nassert max_Prime_Factors(6) == 3\nassert max_Prime_Factors(2) == 2", "assert decimal_To_Binary(10) == 1010\nassert decimal_To_Binary(1) == 1\nassert decimal_To_Binary(20) == 10100", "assert find_missing([1,2,3,5],4) == 4\nassert find_missing([1,3,4,5],4) == 2\nassert find_missing([1,2,3,5,6,7],5) == 4", "assert find_rect_num(4) == 20\nassert find_rect_num(5) == 30\nassert find_rect_num(6) == 42", "assert find_Nth_Digit(1,2,1) == 5\nassert find_Nth_Digit(3,5,1) == 6\nassert find_Nth_Digit(5,6,5) == 3", "assert sort_mixed_list([19,'red',12,'green','blue', 10,'white','green',1])==[1, 10, 12, 19, 'blue', 'green', 'green', 'red', 'white']\nassert sort_mixed_list([19,'red',12,'green','blue', 10,'white','green',1])==[1, 10, 12, 19, 'blue', 'green', 'green', 'red', 'white']\nassert sort_mixed_list([19,'red',12,'green','blue', 10,'white','green',1])==[1, 10, 12, 19, 'blue', 'green', 'green', 'red', 'white']", "assert div_even_odd([1,3,5,7,4,1,6,8])==4\nassert div_even_odd([1,2,3,4,5,6,7,8,9,10])==2\nassert div_even_odd([1,5,7,9,10])==10", "assert rearange_string(\"aab\")==('aba')\nassert rearange_string(\"aabb\")==('abab')\nassert rearange_string(\"abccdd\")==('cdabcd')", "assert freq_element([[1, 2, 3, 2], [4, 5, 6, 2], [7, 1, 9, 5]])==({2: 3, 1: 2, 5: 2, 3: 1, 4: 1, 6: 1, 7: 1, 9: 1})\nassert freq_element([[1,2,3,4],[5,6,7,8],[9,10,11,12]])==({1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1})\nassert freq_element([[15,20,30,40],[80,90,100,110],[30,30,80,90]])==({30: 3, 80: 2, 90: 2, 15: 1, 20: 1, 40: 1, 100: 1, 110: 1})", "assert filter_evennumbers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])==[2, 4, 6, 8, 10]\nassert filter_evennumbers([10,20,45,67,84,93])==[10,20,84]\nassert filter_evennumbers([5,7,9,8,6,4,3])==[8,6,4]", "assert find_Sum([1,2,3,1,1,4,5,6],8) == 3\nassert find_Sum([1,2,3,1,1],5) == 3\nassert find_Sum([1,1,2],3) == 2", 
"assert text_match(\"aab_cbbbc\") == 'Found a match!'\nassert text_match(\"aab_Abbbc\") == 'Not matched!'\nassert text_match(\"Aaab_abbbc\") == 'Not matched!'", "assert text_match_string(\" python\")==('Not matched!')\nassert text_match_string(\"python\")==('Found a match!')\nassert text_match_string(\" lang\")==('Not matched!')", "assert get_gcd([2, 4, 6, 8, 16]) == 2\nassert get_gcd([1, 2, 3]) == 1\nassert get_gcd([2, 4, 6, 8]) == 2 ", "assert test_distinct([1,5,7,9]) == True\nassert test_distinct([2,4,5,5,7,9]) == False\nassert test_distinct([1,2,3]) == True", "assert compute_Last_Digit(2,4) == 2\nassert compute_Last_Digit(6,8) == 6\nassert compute_Last_Digit(1,2) == 2", "assert odd_bit_set_number(10) == 15\nassert odd_bit_set_number(20) == 21\nassert odd_bit_set_number(30) == 31", "assert specified_element([[1, 2, 3, 2], [4, 5, 6, 2], [7, 1, 9, 5]],0)==[1, 4, 7]\nassert specified_element([[1, 2, 3, 2], [4, 5, 6, 2], [7, 1, 9, 5]],2)==[3, 6, 9]\nassert specified_element([[1, 2, 3, 2], [4, 5, 6, 2], [7, 1, 9, 5]],1)==[2,5,1]", "assert min_length_list([[0], [1, 3], [5, 7], [9, 11], [13, 15, 17]])==(1, [0])\nassert min_length_list([[1,2,3,4,5],[1,2,3,4],[1,2,3],[1,2],[1]])==(1,[1])\nassert min_length_list([[3,4,5],[6,7,8,9],[10,11,12],[1,2]])==(2,[1,2])", "assert check_equilateral(6,8,12)==False \nassert check_equilateral(6,6,12)==False\nassert check_equilateral(6,6,6)==True", "assert parallelogram_area(10,20)==200\nassert parallelogram_area(15,20)==300\nassert parallelogram_area(8,9)==72", "assert check_Equality(\"abcda\") == \"Equal\"\nassert check_Equality(\"ab\") == \"Not Equal\"\nassert check_Equality(\"mad\") == \"Not Equal\"", "assert counting_sort([1,23,4,5,6,7,8]) == [1, 4, 5, 6, 7, 8, 23]\nassert counting_sort([12, 9, 28, 33, 69, 45]) == [9, 12, 28, 33, 45, 69]\nassert counting_sort([8, 4, 14, 3, 2, 1]) == [1, 2, 3, 4, 8, 14]", "assert tn_gp(1,5,2)==16\nassert tn_gp(1,5,4)==256\nassert tn_gp(2,6,3)==486", "assert check(70) == False\nassert check(23) == 
False\nassert check(73) == True", "assert find_Max_Num([1,2,3],3) == 321\nassert find_Max_Num([4,5,6,1],4) == 6541\nassert find_Max_Num([1,2,3,9],4) == 9321", "assert opposite_Signs(1,-2) == True\nassert opposite_Signs(3,2) == False\nassert opposite_Signs(-10,-10) == False", "assert is_octagonal(5) == 65\nassert is_octagonal(10) == 280\nassert is_octagonal(15) == 645", "assert max_len_sub([2, 5, 6, 3, 7, 6, 5, 8], 8) == 5\nassert max_len_sub([-2, -1, 5, -1, 4, 0, 3], 7) == 4\nassert max_len_sub([9, 11, 13, 15, 18], 5) == 1", "assert count_Substrings('112112',6) == 6\nassert count_Substrings('111',3) == 6\nassert count_Substrings('1101112',7) == 12", "assert smallest_num([10, 20, 1, 45, 99]) == 1\nassert smallest_num([1, 2, 3]) == 1\nassert smallest_num([45, 46, 50, 60]) == 45", "assert max_difference([(3, 5), (1, 7), (10, 3), (1, 2)]) == 7\nassert max_difference([(4, 6), (2, 17), (9, 13), (11, 12)]) == 15\nassert max_difference([(12, 35), (21, 27), (13, 23), (41, 22)]) == 23", "assert subject_marks([('English', 88), ('Science', 90), ('Maths', 97), ('Social sciences', 82)])==[('Social sciences', 82), ('English', 88), ('Science', 90), ('Maths', 97)]\nassert subject_marks([('Telugu',49),('Hindhi',54),('Social',33)])==([('Social',33),('Telugu',49),('Hindhi',54)])\nassert subject_marks([('Physics',96),('Chemistry',97),('Biology',45)])==([('Biology',45),('Physics',96),('Chemistry',97)])", "assert recursive_list_sum(([1, 2, [3,4],[5,6]]))==21\nassert recursive_list_sum(([7, 10, [15,14],[19,41]]))==106\nassert recursive_list_sum(([10, 20, [30,40],[50,60]]))==210", "assert pos_count([1,-2,3,-4]) == 2\nassert pos_count([3,4,5,-1]) == 3\nassert pos_count([1,2,3,4]) == 4", "assert bell_number(2)==2\nassert bell_number(10)==115975\nassert bell_number(56)==6775685320645824322581483068371419745979053216268760300", "assert is_Monotonic([6, 5, 4, 4]) == True\nassert is_Monotonic([1, 2, 2, 3]) == True\nassert is_Monotonic([1, 3, 2]) == False", "assert 
is_sublist([2,4,3,5,7],[3,7])==False\nassert is_sublist([2,4,3,5,7],[4,3])==True\nassert is_sublist([2,4,3,5,7],[1,6])==False", "assert get_equal([(11, 22, 33), (44, 55, 66)], 3) == 'All tuples have same length'\nassert get_equal([(1, 2, 3), (4, 5, 6, 7)], 3) == 'All tuples do not have same length'\nassert get_equal([(1, 2), (3, 4)], 2) == 'All tuples have same length'", "assert comb_sort([5, 15, 37, 25, 79]) == [5, 15, 25, 37, 79]\nassert comb_sort([41, 32, 15, 19, 22]) == [15, 19, 22, 32, 41]\nassert comb_sort([99, 15, 13, 47]) == [13, 15, 47, 99]", "assert dif_Square(5) == True\nassert dif_Square(10) == False\nassert dif_Square(15) == True", "assert multiple_split('Forces of the \\ndarkness*are coming into the play.') == ['Forces of the ', 'darkness', 'are coming into the play.']\nassert multiple_split('Mi Box runs on the \\n Latest android*which has google assistance and chromecast.') == ['Mi Box runs on the ', ' Latest android', 'which has google assistance and chromecast.']\nassert multiple_split('Certain services\\nare subjected to change*over the seperate subscriptions.') == ['Certain services', 'are subjected to change', 'over the seperate subscriptions.']", "assert is_samepatterns([\"red\",\"green\",\"green\"], [\"a\", \"b\", \"b\"])==True \nassert is_samepatterns([\"red\",\"green\",\"greenn\"], [\"a\",\"b\",\"b\"])==False \nassert is_samepatterns([\"red\",\"green\",\"greenn\"], [\"a\",\"b\"])==False ", "assert find_tuples([(6, 24, 12), (7, 9, 6), (12, 18, 21)], 6) == '[(6, 24, 12)]'\nassert find_tuples([(5, 25, 30), (4, 2, 3), (7, 8, 9)], 5) == '[(5, 25, 30)]'\nassert find_tuples([(7, 9, 16), (8, 16, 4), (19, 17, 18)], 4) == '[(8, 16, 4)]'", "assert count_Squares(4,3) == 20\nassert count_Squares(2,2) == 5\nassert count_Squares(1,1) == 1", "assert is_Diff (12345) == False\nassert is_Diff(1212112) == True\nassert is_Diff(1212) == False", "assert count_With_Odd_SetBits(5) == 3\nassert count_With_Odd_SetBits(10) == 5\nassert count_With_Odd_SetBits(15) == 8", 
"assert word_len(\"Hadoop\") == False\nassert word_len(\"great\") == True\nassert word_len(\"structure\") == True", "assert tetrahedral_number(5) == 35.0\nassert tetrahedral_number(6) == 56.0\nassert tetrahedral_number(7) == 84.0", "assert zip_tuples((7, 8, 4, 5, 9, 10),(1, 5, 6) ) == [(7, 1), (8, 5), (4, 6), (5, 1), (9, 5), (10, 6)]\nassert zip_tuples((8, 9, 5, 6, 10, 11),(2, 6, 7) ) == [(8, 2), (9, 6), (5, 7), (6, 2), (10, 6), (11, 7)]\nassert zip_tuples((9, 10, 6, 7, 11, 12),(3, 7, 8) ) == [(9, 3), (10, 7), (6, 8), (7, 3), (11, 7), (12, 8)]", "assert volume_sphere(10)==4188.790204786391\nassert volume_sphere(25)==65449.84694978735\nassert volume_sphere(20)==33510.32163829113", "assert get_Char(\"abc\") == \"f\"\nassert get_Char(\"gfg\") == \"t\"\nassert get_Char(\"ab\") == \"c\"", "assert sequence(10) == 6\nassert sequence(2) == 1\nassert sequence(3) == 2", "assert surfacearea_sphere(10)==1256.6370614359173\nassert surfacearea_sphere(15)==2827.4333882308138\nassert surfacearea_sphere(20)==5026.548245743669", "assert centered_hexagonal_number(10) == 271\nassert centered_hexagonal_number(2) == 7\nassert centered_hexagonal_number(9) == 217", "assert merge_dictionaries_three({ \"R\": \"Red\", \"B\": \"Black\", \"P\": \"Pink\" }, { \"G\": \"Green\", \"W\": \"White\" },{ \"O\": \"Orange\", \"W\": \"White\", \"B\": \"Black\" })=={'B': 'Black', 'R': 'Red', 'P': 'Pink', 'G': 'Green', 'W': 'White', 'O': 'Orange'}\nassert merge_dictionaries_three({ \"R\": \"Red\", \"B\": \"Black\", \"P\": \"Pink\" }, { \"G\": \"Green\", \"W\": \"White\" },{\"L\":\"lavender\",\"B\":\"Blue\"})=={'W': 'White', 'P': 'Pink', 'B': 'Black', 'R': 'Red', 'G': 'Green', 'L': 'lavender'}\nassert merge_dictionaries_three({ \"R\": \"Red\", \"B\": \"Black\", \"P\": \"Pink\" },{\"L\":\"lavender\",\"B\":\"Blue\"},{ \"G\": \"Green\", \"W\": \"White\" })=={'B': 'Black', 'P': 'Pink', 'R': 'Red', 'G': 'Green', 'L': 'lavender', 'W': 'White'}", "assert freq_count([10,10,10,10,20,20,20,20,40,40,50,50,30])==({10: 
4, 20: 4, 40: 2, 50: 2, 30: 1}) \nassert freq_count([1,2,3,4,3,2,4,1,3,1,4])==({1:3, 2:2,3:3,4:3}) \nassert freq_count([5,6,7,4,9,10,4,5,6,7,9,5])==({10:1,5:3,6:2,7:2,4:2,9:2}) ", "assert closest_num(11) == 10\nassert closest_num(7) == 6\nassert closest_num(12) == 11", "assert len_log([\"python\",\"PHP\",\"bigdata\"]) == 7\nassert len_log([\"a\",\"ab\",\"abc\"]) == 3\nassert len_log([\"small\",\"big\",\"tall\"]) == 5", "assert find_substring([\"red\", \"black\", \"white\", \"green\", \"orange\"],\"ack\")==True\nassert find_substring([\"red\", \"black\", \"white\", \"green\", \"orange\"],\"abc\")==False\nassert find_substring([\"red\", \"black\", \"white\", \"green\", \"orange\"],\"ange\")==True", "assert is_undulating(\"1212121\") == True\nassert is_undulating(\"1991\") == False\nassert is_undulating(\"121\") == True", "assert power(3,4) == 81\nassert power(2,3) == 8\nassert power(5,5) == 3125", "assert index_minimum([('Rash', 143), ('Manjeet', 200), ('Varsha', 100)]) == 'Varsha'\nassert index_minimum([('Yash', 185), ('Dawood', 125), ('Sanya', 175)]) == 'Dawood'\nassert index_minimum([('Sai', 345), ('Salman', 145), ('Ayesha', 96)]) == 'Ayesha'", "assert Find_Min_Length([[1],[1,2]]) == 1\nassert Find_Min_Length([[1,2],[1,2,3],[1,2,3,4]]) == 2\nassert Find_Min_Length([[3,3,3],[4,4,4,4]]) == 3", "assert divisor(15) == 4 \nassert divisor(12) == 6\nassert divisor(9) == 3", "assert frequency_lists([[1, 2, 3, 2], [4, 5, 6, 2], [7, 8, 9, 5]])=={1: 1, 2: 3, 3: 1, 4: 1, 5: 2, 6: 1, 7: 1, 8: 1, 9: 1}\nassert frequency_lists([[1,2,3,4],[5,6,7,8],[9,10,11,12]])=={1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1,10:1,11:1,12:1}\nassert frequency_lists([[20,30,40,17],[18,16,14,13],[10,20,30,40]])=={20:2,30:2,40:2,17: 1,18:1, 16: 1,14: 1,13: 1, 10: 1}", "assert multiply_num((8, 2, 3, -1, 7))==-67.2\nassert multiply_num((-10,-20,-30))==-2000.0\nassert multiply_num((19,15,18))==1710.0", "assert decimal_to_binary(8) == '1000'\nassert decimal_to_binary(18) == '10010'\nassert 
decimal_to_binary(7) == '111' ", "assert next_smallest_palindrome(99)==101\nassert next_smallest_palindrome(1221)==1331\nassert next_smallest_palindrome(120)==121", "assert kth_element([12,3,5,7,19], 5, 2) == 3\nassert kth_element([17,24,8,23], 4, 3) == 8\nassert kth_element([16,21,25,36,4], 5, 4) == 36", "assert snake_to_camel('python_program')=='PythonProgram'\nassert snake_to_camel('python_language')==('PythonLanguage')\nassert snake_to_camel('programming_language')==('ProgrammingLanguage')", "assert eulerian_num(3, 1) == 4\nassert eulerian_num(4, 1) == 11\nassert eulerian_num(5, 3) == 26", "assert sort_sublists(([\"green\", \"orange\"], [\"black\", \"white\"], [\"white\", \"black\", \"orange\"]))==[['green', 'orange'], ['black', 'white'], ['black', 'orange', 'white']]\nassert sort_sublists(([\" red \",\"green\" ],[\"blue \",\" black\"],[\" orange\",\"brown\"]))==[[' red ', 'green'], [' black', 'blue '], [' orange', 'brown']]\nassert sort_sublists(([\"zilver\",\"gold\"], [\"magnesium\",\"aluminium\"], [\"steel\", \"bronze\"]))==[['gold', 'zilver'],['aluminium', 'magnesium'], ['bronze', 'steel']]", "assert count([True,False,True]) == 2\nassert count([False,False]) == 0\nassert count([True,True,True]) == 3", "assert add_lists([5, 6, 7], (9, 10)) == (9, 10, 5, 6, 7)\nassert add_lists([6, 7, 8], (10, 11)) == (10, 11, 6, 7, 8)\nassert add_lists([7, 8, 9], (11, 12)) == (11, 12, 7, 8, 9)", "assert count_Hexadecimal(10,15) == 6\nassert count_Hexadecimal(2,4) == 0\nassert count_Hexadecimal(15,16) == 1", "assert merge_sorted_list([25, 24, 15, 4, 5, 29, 110],[19, 20, 11, 56, 25, 233, 154],[24, 26, 54, 48])==[4, 5, 11, 15, 19, 20, 24, 24, 25, 25, 26, 29, 48, 54, 56, 110, 154, 233]\nassert merge_sorted_list([1, 3, 5, 6, 8, 9], [2, 5, 7, 11], [1, 4, 7, 8, 12])==[1, 1, 2, 3, 4, 5, 5, 6, 7, 7, 8, 8, 9, 11, 12]\nassert merge_sorted_list([18, 14, 10, 9, 8, 7, 9, 3, 2, 4, 1],[25, 35, 22, 85, 14, 65, 75, 25, 58],[12, 74, 9, 50, 61, 41])==[1, 2, 3, 4, 7, 8, 9, 9, 9, 10, 12, 14, 14, 
18, 22, 25, 25, 35, 41, 50, 58, 61, 65, 74, 75, 85]", "assert odd_Equivalent(\"011001\",6) == 3\nassert odd_Equivalent(\"11011\",5) == 4\nassert odd_Equivalent(\"1010\",4) == 2", "assert extract_missing([(6, 9), (15, 34), (48, 70)], 2, 100) == [(2, 6), (9, 100), (9, 15), (34, 100), (34, 48), (70, 100)]\nassert extract_missing([(7, 2), (15, 19), (38, 50)], 5, 60) == [(5, 7), (2, 60), (2, 15), (19, 60), (19, 38), (50, 60)]\nassert extract_missing([(7, 2), (15, 19), (38, 50)], 1, 52) == [(1, 7), (2, 52), (2, 15), (19, 52), (19, 38), (50, 52)]", "assert common_in_nested_lists([[12, 18, 23, 25, 45], [7, 12, 18, 24, 28], [1, 5, 8, 12, 15, 16, 18]])==[18, 12]\nassert common_in_nested_lists([[12, 5, 23, 25, 45], [7, 11, 5, 23, 28], [1, 5, 8, 18, 23, 16]])==[5,23]\nassert common_in_nested_lists([[2, 3,4, 1], [4, 5], [6,4, 8],[4, 5], [6, 8,4]])==[4]", "assert perimeter(2,4) == 12\nassert perimeter(1,2) == 6\nassert perimeter(3,1) == 8", "assert check_integer(\"python\")==False\nassert check_integer(\"1\")==True\nassert check_integer(\"12345\")==True", "assert assign_freq([(6, 5, 8), (2, 7), (6, 5, 8), (6, 5, 8), (9, ), (2, 7)] ) == '[(6, 5, 8, 3), (2, 7, 2), (9, 1)]'\nassert assign_freq([(4, 2, 4), (7, 1), (4, 8), (4, 2, 4), (9, 2), (7, 1)] ) == '[(4, 2, 4, 2), (7, 1, 2), (4, 8, 1), (9, 2, 1)]'\nassert assign_freq([(11, 13, 10), (17, 21), (4, 2, 3), (17, 21), (9, 2), (4, 2, 3)] ) == '[(11, 13, 10, 1), (17, 21, 2), (4, 2, 3, 2), (9, 2, 1)]'", "assert empty_dit([{},{},{}])==True\nassert empty_dit([{1,2},{},{}])==False\nassert empty_dit({})==True", "assert tuple_to_int((1,2,3))==123\nassert tuple_to_int((4,5,6))==456\nassert tuple_to_int((5,6,7))==567", "assert list_to_float( [(\"3\", \"4\"), (\"1\", \"26.45\"), (\"7.32\", \"8\"), (\"4\", \"8\")] ) == '[(3.0, 4.0), (1.0, 26.45), (7.32, 8.0), (4.0, 8.0)]'\nassert list_to_float( [(\"4\", \"4\"), (\"2\", \"27\"), (\"4.12\", \"9\"), (\"7\", \"11\")] ) == '[(4.0, 4.0), (2.0, 27.0), (4.12, 9.0), (7.0, 11.0)]'\nassert list_to_float( 
[(\"6\", \"78\"), (\"5\", \"26.45\"), (\"1.33\", \"4\"), (\"82\", \"13\")] ) == '[(6.0, 78.0), (5.0, 26.45), (1.33, 4.0), (82.0, 13.0)]'", "assert string_to_list(\"python programming\")==['python','programming']\nassert string_to_list(\"lists tuples strings\")==['lists','tuples','strings']\nassert string_to_list(\"write a program\")==['write','a','program']", "assert search([1,1,2,2,3],5) == 3\nassert search([1,1,3,3,4,4,5,5,7,7,8],11) == 8\nassert search([1,2,2,3,3,4,4],7) == 1", "assert max_product_tuple([(2, 7), (2, 6), (1, 8), (4, 9)] )==36\nassert max_product_tuple([(10,20), (15,2), (5,10)] )==200\nassert max_product_tuple([(11,44), (10,15), (20,5), (12, 9)] )==484", "assert check_triplet([2, 7, 4, 0, 9, 5, 1, 3], 8, 6, 0) == True\nassert check_triplet([1, 4, 5, 6, 7, 8, 5, 9], 8, 6, 0) == False\nassert check_triplet([10, 4, 2, 3, 5], 5, 15, 0) == True", "assert smartNumber(1) == 30\nassert smartNumber(50) == 273\nassert smartNumber(1000) == 2664", "assert amicable_numbers_sum(999)==504\nassert amicable_numbers_sum(9999)==31626\nassert amicable_numbers_sum(99)==0", "assert angle_complex(0,1j)==1.5707963267948966 \nassert angle_complex(2,1j)==0.4636476090008061\nassert angle_complex(0,2j)==1.5707963267948966", "assert find_length(\"11000010001\", 11) == 6\nassert find_length(\"10111\", 5) == 1\nassert find_length(\"11011101100101\", 14) == 2 ", "assert sum(10,15) == 6\nassert sum(100,150) == 93\nassert sum(4,6) == 3", "assert multiply_int(10,20)==200\nassert multiply_int(5,10)==50\nassert multiply_int(4,8)==32", "assert long_words(3,\"python is a programming language\")==['python','programming','language']\nassert long_words(2,\"writing a program\")==['writing','program']\nassert long_words(5,\"sorting list\")==['sorting']", "assert magic_square_test([[7, 12, 1, 14], [2, 13, 8, 11], [16, 3, 10, 5], [9, 6, 15, 4]])==True\nassert magic_square_test([[2, 7, 6], [9, 5, 1], [4, 3, 8]])==True\nassert magic_square_test([[2, 7, 6], [9, 5, 1], [4, 3, 7]])==False", 
"assert max_occurrences([2,3,8,4,7,9,8,2,6,5,1,6,1,2,3,2,4,6,9,1,2])==(2, 5)\nassert max_occurrences([2,3,8,4,7,9,8,7,9,15,14,10,12,13,16,16,18])==(8, 2)\nassert max_occurrences([10,20,20,30,40,90,80,50,30,20,50,10])==(20, 3)", "assert reverse_vowels(\"Python\") == \"Python\"\nassert reverse_vowels(\"USA\") == \"ASU\"\nassert reverse_vowels(\"ab\") == \"ab\"", "assert tup_string(('e', 'x', 'e', 'r', 'c', 'i', 's', 'e', 's'))==(\"exercises\")\nassert tup_string(('p','y','t','h','o','n'))==(\"python\")\nassert tup_string(('p','r','o','g','r','a','m'))==(\"program\")", "assert sum_negativenum([2, 4, -6, -9, 11, -12, 14, -5, 17])==-32\nassert sum_negativenum([10,15,-14,13,-18,12,-20])==-52\nassert sum_negativenum([19, -65, 57, 39, 152,-639, 121, 44, 90, -190])==-894", "assert check_last([5,7,10],3,1) == \"ODD\"\nassert check_last([2,3],2,3) == \"EVEN\"\nassert check_last([1,2,3],3,1) == \"ODD\"", "assert hexagonal_num(10) == 190\nassert hexagonal_num(5) == 45\nassert hexagonal_num(7) == 91", "assert cal_electbill(75)==246.25\nassert cal_electbill(265)==1442.75\nassert cal_electbill(100)==327.5", "assert zero_count([0, 1, 2, -1, -5, 6, 0, -3, -2, 3, 4, 6, 8])==0.15\nassert zero_count([2, 1, 2, -1, -5, 6, 4, -3, -2, 3, 4, 6, 8])==0.00\nassert zero_count([2, 4, -6, -9, 11, -12, 14, -5, 17])==0.00", "assert is_Sum_Of_Powers_Of_Two(10) == True\nassert is_Sum_Of_Powers_Of_Two(7) == False\nassert is_Sum_Of_Powers_Of_Two(14) == True", "assert circle_circumference(10)==62.830000000000005\nassert circle_circumference(5)==31.415000000000003\nassert circle_circumference(4)==25.132", "assert extract_singly([(3, 4, 5), (4, 5, 7), (1, 4)]) == [3, 4, 5, 7, 1]\nassert extract_singly([(1, 2, 3), (4, 2, 3), (7, 8)]) == [1, 2, 3, 4, 7, 8]\nassert extract_singly([(7, 8, 9), (10, 11, 12), (10, 11)]) == [7, 8, 9, 10, 11, 12]", "assert pancake_sort([15, 79, 25, 38, 69]) == [15, 25, 38, 69, 79]\nassert pancake_sort([98, 12, 54, 36, 85]) == [12, 36, 54, 85, 98]\nassert pancake_sort([41, 42, 32, 
12, 23]) == [12, 23, 32, 41, 42]", "assert count_samepair([1,2,3,4,5,6,7,8],[2,2,3,1,2,6,7,9],[2,1,3,1,2,6,7,9])==3\nassert count_samepair([1,2,3,4,5,6,7,8],[2,2,3,1,2,6,7,8],[2,1,3,1,2,6,7,8])==4\nassert count_samepair([1,2,3,4,2,6,7,8],[2,2,3,1,2,6,7,8],[2,1,3,1,2,6,7,8])==5", "assert find_lists(([1, 2, 3, 4], [5, 6, 7, 8])) == 2\nassert find_lists(([1, 2], [3, 4], [5, 6])) == 3\nassert find_lists(([9, 8, 7, 6, 5, 4, 3, 2, 1])) == 1", "assert sum_Pairs([1,8,9,15,16],5) == 74\nassert sum_Pairs([1,2,3,4],4) == 10\nassert sum_Pairs([1,2,3,4,5,7,9,11,14],9) == 188", "assert max_Abs_Diff((2,1,5,3),4) == 4\nassert max_Abs_Diff((9,3,2,5,1),5) == 8\nassert max_Abs_Diff((3,2,1),3) == 2", "assert ascii_value_string(\"python\")==112\nassert ascii_value_string(\"Program\")==80\nassert ascii_value_string(\"Language\")==76", "assert max_path_sum([[1, 0, 0], [4, 8, 0], [1, 5, 3]], 2, 2) == 14\nassert max_path_sum([[13, 0, 0], [7, 4, 0], [2, 4, 6]], 2, 2) == 24 \nassert max_path_sum([[2, 0, 0], [11, 18, 0], [21, 25, 33]], 2, 2) == 53", "assert sum_digits_twoparts(35)==17\nassert sum_digits_twoparts(7)==7\nassert sum_digits_twoparts(100)==19", "assert longest_subseq_with_diff_one([1, 2, 3, 4, 5, 3, 2], 7) == 6\nassert longest_subseq_with_diff_one([10, 9, 4, 5, 4, 8, 6], 7) == 3\nassert longest_subseq_with_diff_one([1, 2, 3, 2, 3, 7, 2, 1], 8) == 7", "assert does_Contain_B(1,7,3) == True\nassert does_Contain_B(1,-3,5) == False\nassert does_Contain_B(3,2,5) == False", "assert is_coprime(17,13) == True\nassert is_coprime(15,21) == False\nassert is_coprime(25,45) == False", "assert merge_sort([3, 4, 2, 6, 5, 7, 1, 9]) == [1, 2, 3, 4, 5, 6, 7, 9]\nassert merge_sort([7, 25, 45, 78, 11, 33, 19]) == [7, 11, 19, 25, 33, 45, 78]\nassert merge_sort([3, 1, 4, 9, 8]) == [1, 3, 4, 8, 9]", "assert parabola_vertex(5,3,2)==(-0.3, 1.55)\nassert parabola_vertex(9,8,4)==(-0.4444444444444444, 2.2222222222222223)\nassert parabola_vertex(2,4,6)==(-1.0, 4.0)", "assert specified_element([[1, 2, 3, 2], 
[4, 5, 6, 2], [7, 1, 9, 5]],0)==[1, 4, 7]\nassert specified_element([[1, 2, 3, 2], [4, 5, 6, 2], [7, 1, 9, 5]],2)==[3, 6, 9]\nassert specified_element([[1, 2, 3, 2], [4, 5, 6, 2], [7, 1, 9, 5]],3)==[2,2,5]", "assert even_bit_toggle_number(10) == 0\nassert even_bit_toggle_number(20) == 30\nassert even_bit_toggle_number(30) == 20", "assert tuple_int_str((('333', '33'), ('1416', '55')))==((333, 33), (1416, 55))\nassert tuple_int_str((('999', '99'), ('1000', '500')))==((999, 99), (1000, 500))\nassert tuple_int_str((('666', '66'), ('1500', '555')))==((666, 66), (1500, 555))", "assert encode_list([1,1,2,3,4,4.3,5,1])==[[2, 1], [1, 2], [1, 3], [1, 4], [1, 4.3], [1, 5], [1, 1]]\nassert encode_list('automatically')==[[1, 'a'], [1, 'u'], [1, 't'], [1, 'o'], [1, 'm'], [1, 'a'], [1, 't'], [1, 'i'], [1, 'c'], [1, 'a'], [2, 'l'], [1, 'y']]\nassert encode_list('python')==[[1, 'p'], [1, 'y'], [1, 't'], [1, 'h'], [1, 'o'], [1, 'n']]", "assert min_Ops([2,2,2,2],4,3) == 0\nassert min_Ops([4,2,6,8],4,3) == -1\nassert min_Ops([21,33,9,45,63],5,6) == 24", "assert month_season('January',4)==('winter')\nassert month_season('October',28)==('autumn')\nassert month_season('June',6)==('spring')", "assert solution(2, 3, 7) == ('x = ', 2, ', y = ', 1)\nassert solution(4, 2, 7) == 'No solution'\nassert solution(1, 13, 17) == ('x = ', 4, ', y = ', 1)", "assert remove_elements([1,2,3,4,5,6,7,8,9,10],[2,4,6,8])==[1, 3, 5, 7, 9, 10]\nassert remove_elements([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],[1, 3, 5, 7])==[2, 4, 6, 8, 9, 10]\nassert remove_elements([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],[5,7])==[1, 2, 3, 4, 6, 8, 9, 10]", "assert sum_series(6)==12\nassert sum_series(10)==30\nassert sum_series(9)==25", "assert area_polygon(4,20)==400.00000000000006\nassert area_polygon(10,15)==1731.1969896610804\nassert area_polygon(9,7)==302.90938549487214", "assert areEquivalent(36,57) == False\nassert areEquivalent(2,4) == False\nassert areEquivalent(23,47) == True", "assert count_char_position(\"xbcefg\") == 2\nassert 
count_char_position(\"ABcED\") == 3\nassert count_char_position(\"AbgdeF\") == 5", "assert find_even_Pair([5,4,7,2,1],5) == 4\nassert find_even_Pair([7,2,8,1,0,5,11],7) == 9\nassert find_even_Pair([1,2,3],3) == 1", "assert next_Power_Of_2(0) == 1\nassert next_Power_Of_2(5) == 8\nassert next_Power_Of_2(17) == 32", "assert frequency([1,2,3],4) == 0\nassert frequency([1,2,2,3,3,3,4],3) == 3\nassert frequency([0,1,2,3,1,2],1) == 2", "assert get_pell(4) == 12\nassert get_pell(7) == 169\nassert get_pell(8) == 408", "assert sum_range_list( [2,1,5,6,8,3,4,9,10,11,8,12],8,10)==29\nassert sum_range_list( [2,1,5,6,8,3,4,9,10,11,8,12],5,7)==16\nassert sum_range_list( [2,1,5,6,8,3,4,9,10,11,8,12],7,10)==38", "assert perimeter_pentagon(5)==25\nassert perimeter_pentagon(10)==50\nassert perimeter_pentagon(15)==75", "assert count_occurance(\"letstdlenstdporstd\") == 3\nassert count_occurance(\"truststdsolensporsd\") == 1\nassert count_occurance(\"makestdsostdworthit\") == 2", "assert remove_splchar('python @#&^%$*program123')==('pythonprogram123')\nassert remove_splchar('python %^$@!^&*() programming24%$^^() language')==('pythonprogramming24language')\nassert remove_splchar('python ^%&^()(+_)(_^&67) program')==('python67program')", "assert group_keyvalue([('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)])=={'yellow': [1, 3], 'blue': [2, 4], 'red': [1]}\nassert group_keyvalue([('python', 1), ('python', 2), ('python', 3), ('python', 4), ('python', 5)])=={'python': [1,2,3,4,5]}\nassert group_keyvalue([('yellow',100), ('blue', 200), ('yellow', 300), ('blue', 400), ('red', 100)])=={'yellow': [100, 300], 'blue': [200, 400], 'red': [100]}", "assert is_valid_parenthese(\"(){}[]\")==True\nassert is_valid_parenthese(\"()[{)}\")==False\nassert is_valid_parenthese(\"()\")==True", "assert perimeter_triangle(10,20,30)==60\nassert perimeter_triangle(3,4,5)==12\nassert perimeter_triangle(25,35,45)==105", "assert answer(3,8) == (3,6)\nassert answer(2,6) == (2,4)\nassert 
answer(1,3) == (1,2)", "assert string_literals(['language'],'python language')==('Matched!')\nassert string_literals(['program'],'python language')==('Not Matched!')\nassert string_literals(['python'],'programming language')==('Not Matched!')", "assert is_num_keith(14) == True\nassert is_num_keith(12) == False\nassert is_num_keith(197) == True", "assert distance_lat_long(23.5,67.5,25.5,69.5)==12179.372041317429\nassert distance_lat_long(10.5,20.5,30.5,40.5)==6069.397933300514\nassert distance_lat_long(10,20,30,40)==6783.751974994595", "assert common_prefix([\"tablets\", \"tables\", \"taxi\", \"tamarind\"], 4) == 'ta'\nassert common_prefix([\"apples\", \"ape\", \"april\"], 3) == 'ap'\nassert common_prefix([\"teens\", \"teenager\", \"teenmar\"], 3) == 'teen'", "assert find_character(\"ThisIsGeeksforGeeks\") == (['T', 'I', 'G', 'G'], ['h', 'i', 's', 's', 'e', 'e', 'k', 's', 'f', 'o', 'r', 'e', 'e', 'k', 's'], [], [])\nassert find_character(\"Hithere2\") == (['H'], ['i', 't', 'h', 'e', 'r', 'e'], ['2'], [])\nassert find_character(\"HeyFolks32\") == (['H', 'F'], ['e', 'y', 'o', 'l', 'k', 's'], ['3', '2'], [])", "assert count_pairs([1, 5, 3, 4, 2], 5, 3) == 2\nassert count_pairs([8, 12, 16, 4, 0, 20], 6, 4) == 5\nassert count_pairs([2, 4, 1, 3, 4], 5, 2) == 3", "assert greater_specificnum([220, 330, 500],200)==True\nassert greater_specificnum([12, 17, 21],20)==False\nassert greater_specificnum([1,2,3,4],10)==False", "assert parabola_focus(5,3,2)==(-0.3, 1.6)\nassert parabola_focus(9,8,4)==(-0.4444444444444444, 2.25)\nassert parabola_focus(2,4,6)==(-1.0, 4.125)", "assert check_literals('The quick brown fox jumps over the lazy dog.',['fox']) == 'Matched!'\nassert check_literals('The quick brown fox jumps over the lazy dog.',['horse']) == 'Not Matched!'\nassert check_literals('The quick brown fox jumps over the lazy dog.',['lazy']) == 'Matched!'", "assert longest_common_subsequence(\"AGGTAB\" , \"GXTXAYB\", 6, 7) == 4\nassert longest_common_subsequence(\"ABCDGH\" , 
\"AEDFHR\", 6, 6) == 3\nassert longest_common_subsequence(\"AXYT\" , \"AYZX\", 4, 4) == 2", "assert prod_Square(25) == False\nassert prod_Square(30) == False\nassert prod_Square(16) == True", "assert first_Missing_Positive([1,2,3,-1,5],5) == 4\nassert first_Missing_Positive([0,-1,-2,1,5,8],6) == 2\nassert first_Missing_Positive([0,1,2,5,-8],5) == 3", "assert count_Intgral_Points(1,1,4,4) == 4\nassert count_Intgral_Points(1,2,1,2) == 1\nassert count_Intgral_Points(4,2,6,4) == 1", "assert check_monthnumber(\"February\")==False\nassert check_monthnumber(\"June\")==True\nassert check_monthnumber(\"April\")==True", "assert check_String('thishasboth29') == True\nassert check_String('python') == False\nassert check_String ('string') == False", "assert remove_tuple((1, 3, 5, 2, 3, 5, 1, 1, 3)) == (1, 2, 3, 5)\nassert remove_tuple((2, 3, 4, 4, 5, 6, 6, 7, 8, 8)) == (2, 3, 4, 5, 6, 7, 8)\nassert remove_tuple((11, 12, 13, 11, 11, 12, 14, 13)) == (11, 12, 13, 14)", "assert octal_To_Decimal(25) == 21\nassert octal_To_Decimal(30) == 24\nassert octal_To_Decimal(40) == 32", "assert first([1,2,3,4,5,6,6],6,6) == 5\nassert first([1,2,2,2,3,2,2,4,2],2,9) == 1\nassert first([1,2,3],1,3) == 0", "assert remove_tuples([(4, 5), (4, ), (8, 6, 7), (1, ), (3, 4, 6, 7)] , 1) == [(4, 5), (8, 6, 7), (3, 4, 6, 7)]\nassert remove_tuples([(4, 5), (4,5), (6, 7), (1, 2, 3), (3, 4, 6, 7)] ,2) == [(1, 2, 3), (3, 4, 6, 7)]\nassert remove_tuples([(1, 4, 4), (4, 3), (8, 6, 7), (1, ), (3, 6, 7)] , 3) == [(4, 3), (1,)]", "assert find_exponentio((10, 4, 5, 6), (5, 6, 7, 5)) == (100000, 4096, 78125, 7776)\nassert find_exponentio((11, 5, 6, 7), (6, 7, 8, 6)) == (1771561, 78125, 1679616, 117649)\nassert find_exponentio((12, 6, 7, 8), (7, 8, 9, 7)) == (35831808, 1679616, 40353607, 2097152)", "assert largest_triangle(4,2)==10.392304845413264\nassert largest_triangle(5,7)==4.639421805988064\nassert largest_triangle(9,1)==105.2220865598093", "assert highest_Power_of_2(10) == 8\nassert highest_Power_of_2(19) == 
16\nassert highest_Power_of_2(32) == 32", "assert position_max([12,33,23,10,67,89,45,667,23,12,11,10,54])==[7]\nassert position_max([1,2,2,2,4,4,4,5,5,5,5])==[7,8,9,10]\nassert position_max([2,1,5,6,8,3,4,9,10,11,8,12])==[11]", "assert chkList(['one','one','one']) == True\nassert chkList(['one','Two','Three']) == False\nassert chkList(['bigdata','python','Django']) == False", "assert remove_even(\"python\")==(\"pto\")\nassert remove_even(\"program\")==(\"porm\")\nassert remove_even(\"language\")==(\"lnug\")", "assert hamming_Distance(4,8) == 2\nassert hamming_Distance(2,4) == 2\nassert hamming_Distance(1,2) == 2", "assert count(\"abcc\",\"c\") == 2\nassert count(\"ababca\",\"a\") == 3\nassert count(\"mnmm0pm\",\"m\") == 4", "assert inversion_elements((7, 8, 9, 1, 10, 7)) == (-8, -9, -10, -2, -11, -8)\nassert inversion_elements((2, 4, 5, 6, 1, 7)) == (-3, -5, -6, -7, -2, -8)\nassert inversion_elements((8, 9, 11, 14, 12, 13)) == (-9, -10, -12, -15, -13, -14)", "assert concatenate_elements((\"DSP \", \"IS \", \"BEST \", \"FOR \", \"ALL \", \"UTS\")) == ('DSP IS ', 'IS BEST ', 'BEST FOR ', 'FOR ALL ', 'ALL UTS')\nassert concatenate_elements((\"RES \", \"IS \", \"BEST \", \"FOR \", \"ALL \", \"QESR\")) == ('RES IS ', 'IS BEST ', 'BEST FOR ', 'FOR ALL ', 'ALL QESR')\nassert concatenate_elements((\"MSAM\", \"IS \", \"BEST \", \"FOR \", \"ALL \", \"SKD\")) == ('MSAMIS ', 'IS BEST ', 'BEST FOR ', 'FOR ALL ', 'ALL SKD')", "assert find_longest_repeating_subseq(\"AABEBCDD\") == 3\nassert find_longest_repeating_subseq(\"aabb\") == 2\nassert find_longest_repeating_subseq(\"aab\") == 1", "assert is_decimal('123.11') == True\nassert is_decimal('0.21') == True\nassert is_decimal('123.1214') == False", "assert heap_replace( [25, 44, 68, 21, 39, 23, 89],21)==[21, 25, 23, 44, 39, 68, 89]\nassert heap_replace([25, 44, 68, 21, 39, 23, 89],110)== [23, 25, 68, 44, 39, 110, 89]\nassert heap_replace([25, 44, 68, 21, 39, 23, 89],500)==[23, 25, 68, 44, 39, 500, 89]", "assert 
is_allowed_specific_char(\"ABCDEFabcdef123450\") == True\nassert is_allowed_specific_char(\"*&%@#!}{\") == False\nassert is_allowed_specific_char(\"HELLOhowareyou98765\") == True", "assert count_Num(2) == 1\nassert count_Num(3) == 2\nassert count_Num(1) == 1", "assert fourth_Power_Sum(2) == 17\nassert fourth_Power_Sum(4) == 354\nassert fourth_Power_Sum(6) == 2275", "assert concatenate_strings((\"Manjeet\", \"Nikhil\", \"Akshat\"), (\" Singh\", \" Meherwal\", \" Garg\")) == ('Manjeet Singh', 'Nikhil Meherwal', 'Akshat Garg')\nassert concatenate_strings((\"Shaik\", \"Ayesha\", \"Sanya\"), (\" Dawood\", \" Begum\", \" Singh\")) == ('Shaik Dawood', 'Ayesha Begum', 'Sanya Singh')\nassert concatenate_strings((\"Harpreet\", \"Priyanka\", \"Muskan\"), (\"Kour\", \" Agarwal\", \"Sethi\")) == ('HarpreetKour', 'Priyanka Agarwal', 'MuskanSethi')", "assert degree_radian(90)==5156.620156177409\nassert degree_radian(60)==3437.746770784939\nassert degree_radian(120)==6875.493541569878", "assert decode_list([[2, 1], 2, 3, [2, 4], 5,1])==[1,1,2,3,4,4,5,1]\nassert decode_list(['a', 'u', 't', 'o', 'm', 'a', 't', 'i', 'c', 'a', [2, 'l'], 'y'])==['a', 'u', 't', 'o', 'm', 'a', 't', 'i', 'c', 'a', 'l', 'l', 'y']\nassert decode_list(['p', 'y', 't', 'h', 'o', 'n'])==['p', 'y', 't', 'h', 'o', 'n']", "assert check_subset_list([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],[[12, 18, 23, 25, 45], [7, 11, 19, 24, 28], [1, 5, 8, 18, 15, 16]])==False\nassert check_subset_list([[2, 3, 1], [4, 5], [6, 8]],[[4, 5], [6, 8]])==True\nassert check_subset_list([['a', 'b'], ['e'], ['c', 'd']],[['g']])==False", "assert first_Repeated_Char(\"Google\") == \"o\"\nassert first_Repeated_Char(\"data\") == \"a\"\nassert first_Repeated_Char(\"python\") == '\\0'", "assert min_Operations(2,4) == 1\nassert min_Operations(4,10) == 4\nassert min_Operations(1,4) == 3", "assert extract_min_max((5, 20, 3, 7, 6, 8), 2) == (3, 5, 8, 20)\nassert extract_min_max((4, 5, 6, 1, 2, 7), 3) == (1, 2, 4, 5, 6, 7)\nassert 
extract_min_max((2, 3, 4, 8, 9, 11, 7), 4) == (2, 3, 4, 7, 8, 9, 11)", "assert replace_max_specialchar('Python language, Programming language.',2)==('Python:language: Programming language.')\nassert replace_max_specialchar('a b c,d e f',3)==('a:b:c:d e f')\nassert replace_max_specialchar('ram reshma,ram rahim',1)==('ram:reshma,ram rahim')", "assert first_even ([1, 3, 5, 7, 4, 1, 6, 8]) == 4\nassert first_even([2, 3, 4]) == 2\nassert first_even([5, 6, 7]) == 6", "assert check_type((5, 6, 7, 3, 5, 6) ) == True\nassert check_type((1, 2, \"4\") ) == False\nassert check_type((3, 2, 1, 4, 5) ) == True", "assert is_majority([1, 2, 3, 3, 3, 3, 10], 7, 3) == True\nassert is_majority([1, 1, 2, 4, 4, 4, 6, 6], 8, 4) == False\nassert is_majority([1, 1, 1, 2, 2], 5, 1) == True", "assert count_Set_Bits(2) == 1\nassert count_Set_Bits(4) == 1\nassert count_Set_Bits(6) == 2", "assert find_Min([1,2,3,4,5],0,4) == 1\nassert find_Min([4,6,8],0,2) == 4\nassert find_Min([2,3,5,7,9],0,4) == 2", "assert odd_values_string('abcdef') == 'ace'\nassert odd_values_string('python') == 'pto'\nassert odd_values_string('data') == 'dt'", "assert min_of_three(10,20,0)==0\nassert min_of_three(19,15,18)==15\nassert min_of_three(-10,-20,-30)==-30", "assert all_Bits_Set_In_The_Given_Range(4,1,2) == True\nassert all_Bits_Set_In_The_Given_Range(17,2,4) == True\nassert all_Bits_Set_In_The_Given_Range(39,4,6) == False", "assert re_arrange_array([-1, 2, -3, 4, 5, 6, -7, 8, 9], 9) == [-1, -3, -7, 4, 5, 6, 2, 8, 9]\nassert re_arrange_array([12, -14, -26, 13, 15], 5) == [-14, -26, 12, 13, 15]\nassert re_arrange_array([10, 24, 36, -42, -39, -78, 85], 7) == [-42, -39, -78, 10, 24, 36, 85]", "assert replace_blank(\"hello people\",'@')==(\"hello@people\")\nassert replace_blank(\"python program language\",'$')==(\"python$program$language\")\nassert replace_blank(\"blank space\",\"-\")==(\"blank-space\")", "assert max_sum([[1], [2,1], [3,3,2]], 3) == 6\nassert max_sum([[1], [1, 2], [4, 1, 12]], 3) == 15 \nassert 
max_sum([[2], [3,2], [13,23,12]], 3) == 28", "assert larg_nnum([10, 20, 50, 70, 90, 20, 50, 40, 60, 80, 100],2)==[100,90]\nassert larg_nnum([10, 20, 50, 70, 90, 20, 50, 40, 60, 80, 100],5)==[100,90,80,70,60]\nassert larg_nnum([10, 20, 50, 70, 90, 20, 50, 40, 60, 80, 100],3)==[100,90,80]", "assert lateralsuface_cylinder(10,5)==314.15000000000003\nassert lateralsuface_cylinder(4,5)==125.66000000000001\nassert lateralsuface_cylinder(4,10)==251.32000000000002", "assert volume_cube(3)==27\nassert volume_cube(2)==8\nassert volume_cube(5)==125", "assert even_bit_set_number(10) == 10\nassert even_bit_set_number(20) == 30\nassert even_bit_set_number(30) == 30", "assert No_of_Triangle(4,2) == 7\nassert No_of_Triangle(4,3) == 3\nassert No_of_Triangle(1,3) == -1", "assert check_occurences([(3, 1), (1, 3), (2, 5), (5, 2), (6, 3)] ) == {(1, 3): 2, (2, 5): 2, (3, 6): 1}\nassert check_occurences([(4, 2), (2, 4), (3, 6), (6, 3), (7, 4)] ) == {(2, 4): 2, (3, 6): 2, (4, 7): 1}\nassert check_occurences([(13, 2), (11, 23), (12, 25), (25, 12), (16, 23)] ) == {(2, 13): 1, (11, 23): 1, (12, 25): 2, (16, 23): 1}", "assert number_of_substrings(\"abc\") == 6\nassert number_of_substrings(\"abcd\") == 10\nassert number_of_substrings(\"abcde\") == 15", "assert get_total_number_of_sequences(10, 4) == 4\nassert get_total_number_of_sequences(5, 2) == 6\nassert get_total_number_of_sequences(16, 3) == 84", "assert replace_list([1, 3, 5, 7, 9, 10],[2, 4, 6, 8])==[1, 3, 5, 7, 9, 2, 4, 6, 8]\nassert replace_list([1,2,3,4,5],[5,6,7,8])==[1,2,3,4,5,6,7,8]\nassert replace_list([\"red\",\"blue\",\"green\"],[\"yellow\"])==[\"red\",\"blue\",\"yellow\"]", "assert array_3d(6,4,3)==[[['*', '*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*', '*']], [['*', '*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*', '*']], [['*', '*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*', '*'], ['*', 
'*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*', '*']]]\nassert array_3d(5,3,4)==[[['*', '*', '*', '*', '*'], ['*', '*', '*', '*','*'], ['*', '*', '*', '*', '*']], [['*', '*', '*', '*', '*'],['*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*']], [['*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*']], [['*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*'], ['*', '*', '*', '*', '*']]]\nassert array_3d(1,2,3)==[[['*'],['*']],[['*'],['*']],[['*'],['*']]]", "assert count_charac(\"python programming\")==18\nassert count_charac(\"language\")==8\nassert count_charac(\"words\")==5", "assert sort_on_occurence([(1, 'Jake'), (2, 'Bob'), (1, 'Cara')]) == [(1, 'Jake', 'Cara', 2), (2, 'Bob', 1)]\nassert sort_on_occurence([('b', 'ball'), ('a', 'arm'), ('b', 'b'), ('a', 'ant')]) == [('b', 'ball', 'b', 2), ('a', 'arm', 'ant', 2)]\nassert sort_on_occurence([(2, 'Mark'), (3, 'Maze'), (2, 'Sara')]) == [(2, 'Mark', 'Sara', 2), (3, 'Maze', 1)]", "assert next_Perfect_Square(35) == 36\nassert next_Perfect_Square(6) == 9\nassert next_Perfect_Square(9) == 16", "assert max_sum([1, 15, 51, 45, 33, 100, 12, 18, 9], 9) == 194\nassert max_sum([80, 60, 30, 40, 20, 10], 6) == 210\nassert max_sum([2, 3 ,14, 16, 21, 23, 29, 30], 8) == 138", "assert babylonian_squareroot(10)==3.162277660168379\nassert babylonian_squareroot(2)==1.414213562373095\nassert babylonian_squareroot(9)==3.0", "assert lps(\"TENS FOR TENS\") == 5 \nassert lps(\"CARDIO FOR CARDS\") == 7\nassert lps(\"PART OF THE JOURNEY IS PART\") == 9 ", "assert harmonic_sum(7) == 2.5928571428571425\nassert harmonic_sum(4) == 2.083333333333333\nassert harmonic_sum(19) == 3.547739657143682", "assert intersection_array([1, 2, 3, 5, 7, 8, 9, 10],[1, 2, 4, 8, 9])==[1, 2, 8, 9]\nassert intersection_array([1, 2, 3, 5, 7, 8, 9, 10],[3,5,7,9])==[3,5,7,9]\nassert intersection_array([1, 2, 3, 5, 7, 8, 9, 10],[10,20,30,40])==[10]", "assert count_X((10, 8, 5, 2, 10, 15, 10, 8, 5, 8, 8, 2),4) == 0\nassert count_X((10, 8, 5, 2, 
10, 15, 10, 8, 5, 8, 8, 2),10) == 3\nassert count_X((10, 8, 5, 2, 10, 15, 10, 8, 5, 8, 8, 2),8) == 4", "assert insert_element(['Red', 'Green', 'Black'] ,'c')==['c', 'Red', 'c', 'Green', 'c', 'Black'] \nassert insert_element(['python', 'java'] ,'program')==['program', 'python', 'program', 'java'] \nassert insert_element(['happy', 'sad'] ,'laugh')==['laugh', 'happy', 'laugh', 'sad'] ", "assert convert(1) == (1.0, 0.0)\nassert convert(4) == (4.0,0.0)\nassert convert(5) == (5.0,0.0)", "assert count_integer([1,2,'abc',1.2]) == 2\nassert count_integer([1,2,3]) == 3\nassert count_integer([1,1.2,4,5.1]) == 2", "assert words_ae(\"python programe\")==['ame']\nassert words_ae(\"python programe language\")==['ame','anguage']\nassert words_ae(\"assert statement\")==['assert', 'atement']", "assert combinations_colors( [\"Red\",\"Green\",\"Blue\"],1)==[('Red',), ('Green',), ('Blue',)]\nassert combinations_colors( [\"Red\",\"Green\",\"Blue\"],2)==[('Red', 'Red'), ('Red', 'Green'), ('Red', 'Blue'), ('Green', 'Green'), ('Green', 'Blue'), ('Blue', 'Blue')]\nassert combinations_colors( [\"Red\",\"Green\",\"Blue\"],3)==[('Red', 'Red', 'Red'), ('Red', 'Red', 'Green'), ('Red', 'Red', 'Blue'), ('Red', 'Green', 'Green'), ('Red', 'Green', 'Blue'), ('Red', 'Blue', 'Blue'), ('Green', 'Green', 'Green'), ('Green', 'Green', 'Blue'), ('Green', 'Blue', 'Blue'), ('Blue', 'Blue', 'Blue')]", "assert count_Primes_nums(5) == 2\nassert count_Primes_nums(10) == 4\nassert count_Primes_nums(100) == 25", "assert swap_numbers(10,20)==(20,10)\nassert swap_numbers(15,17)==(17,15)\nassert swap_numbers(100,200)==(200,100)", "assert count_odd([1, 2, 3, 5, 7, 8, 10])==4\nassert count_odd([10,15,14,13,-18,12,-20])==2\nassert count_odd([1, 2, 4, 8, 9])==2", "assert maximize_elements(((1, 3), (4, 5), (2, 9), (1, 10)), ((6, 7), (3, 9), (1, 1), (7, 3))) == ((6, 7), (4, 9), (2, 9), (7, 10))\nassert maximize_elements(((2, 4), (5, 6), (3, 10), (2, 11)), ((7, 8), (4, 10), (2, 2), (8, 4))) == ((7, 8), (5, 10), (3, 10), (8, 
11))\nassert maximize_elements(((3, 5), (6, 7), (4, 11), (3, 12)), ((8, 9), (5, 11), (3, 3), (9, 5))) == ((8, 9), (6, 11), (4, 11), (9, 12))", "assert newman_prime(3) == 7 \nassert newman_prime(4) == 17\nassert newman_prime(5) == 41", "assert division_elements((10, 4, 6, 9),(5, 2, 3, 3)) == (2, 2, 2, 3)\nassert division_elements((12, 6, 8, 16),(6, 3, 4, 4)) == (2, 2, 2, 4)\nassert division_elements((20, 14, 36, 18),(5, 7, 6, 9)) == (4, 2, 6, 2)", "assert split_two_parts([1,1,2,3,4,4,5,1],3)==([1, 1, 2], [3, 4, 4, 5, 1])\nassert split_two_parts(['a', 'b', 'c', 'd'],2)==(['a', 'b'], ['c', 'd'])\nassert split_two_parts(['p', 'y', 't', 'h', 'o', 'n'],4)==(['p', 'y', 't', 'h'], ['o', 'n'])", "assert merge_dict({'a': 100, 'b': 200},{'x': 300, 'y': 200})=={'x': 300, 'y': 200, 'a': 100, 'b': 200}\nassert merge_dict({'a':900,'b':900,'d':900},{'a':900,'b':900,'d':900})=={'a':900,'b':900,'d':900,'a':900,'b':900,'d':900}\nassert merge_dict({'a':10,'b':20},{'x':30,'y':40})=={'x':30,'y':40,'a':10,'b':20}", "assert dog_age(12)==61\nassert dog_age(15)==73\nassert dog_age(24)==109", "assert list_split(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n'],3)==[['a', 'd', 'g', 'j', 'm'], ['b', 'e', 'h', 'k', 'n'], ['c', 'f', 'i', 'l']] \nassert list_split([1,2,3,4,5,6,7,8,9,10,11,12,13,14],3)==[[1,4,7,10,13], [2,5,8,11,14], [3,6,9,12]] \nassert list_split(['python','java','C','C++','DBMS','SQL'],2)==[['python', 'C', 'DBMS'], ['java', 'C++', 'SQL']] ", "assert lateralsurface_cube(5)==100\nassert lateralsurface_cube(9)==324\nassert lateralsurface_cube(10)==400", "assert square_Sum(2) == 10\nassert square_Sum(3) == 35\nassert square_Sum(4) == 84", "assert find_star_num(3) == 37\nassert find_star_num(4) == 73\nassert find_star_num(5) == 121", "assert ascii_value('A')==65\nassert ascii_value('R')==82\nassert ascii_value('S')==83", "assert sum_even_and_even_index([5, 6, 12, 1, 18, 8],6) == 30\nassert sum_even_and_even_index([3, 20, 17, 9, 2, 10, 18, 13, 6, 18],10) == 
26\nassert sum_even_and_even_index([5, 6, 12, 1],4) == 12", "assert even_Power_Sum(2) == 1056\nassert even_Power_Sum(3) == 8832\nassert even_Power_Sum(1) == 32", "assert rear_extract([(1, 'Rash', 21), (2, 'Varsha', 20), (3, 'Kil', 19)]) == [21, 20, 19]\nassert rear_extract([(1, 'Sai', 36), (2, 'Ayesha', 25), (3, 'Salman', 45)]) == [36, 25, 45]\nassert rear_extract([(1, 'Sudeep', 14), (2, 'Vandana', 36), (3, 'Dawood', 56)]) == [14, 36, 56]", "assert substract_elements((10, 4, 5), (2, 5, 18)) == (8, -1, -13)\nassert substract_elements((11, 2, 3), (24, 45 ,16)) == (-13, -43, -13)\nassert substract_elements((7, 18, 9), (10, 11, 12)) == (-3, 7, -3)", "assert even_binomial_Coeff_Sum(4) == 8\nassert even_binomial_Coeff_Sum(6) == 32\nassert even_binomial_Coeff_Sum(2) == 2", "assert get_Position([2,5,4],3,2) == 2\nassert get_Position([4,3],2,2) == 2\nassert get_Position([1,2,3,4],4,1) == 4", "assert volume_cylinder(10,5)==1570.7500000000002\nassert volume_cylinder(4,5)==251.32000000000002\nassert volume_cylinder(4,10)==502.64000000000004", "assert dict_filter({'Cierra Vega': 175, 'Alden Cantrell': 180, 'Kierra Gentry': 165, 'Pierre Cox': 190},170)=={'Cierra Vega': 175, 'Alden Cantrell': 180, 'Pierre Cox': 190}\nassert dict_filter({'Cierra Vega': 175, 'Alden Cantrell': 180, 'Kierra Gentry': 165, 'Pierre Cox': 190},180)=={ 'Alden Cantrell': 180, 'Pierre Cox': 190}\nassert dict_filter({'Cierra Vega': 175, 'Alden Cantrell': 180, 'Kierra Gentry': 165, 'Pierre Cox': 190},190)=={ 'Pierre Cox': 190}", "assert count_first_elements((1, 5, 7, (4, 6), 10) ) == 3\nassert count_first_elements((2, 9, (5, 7), 11) ) == 2\nassert count_first_elements((11, 15, 5, 8, (2, 3), 8) ) == 4", "assert is_num_decagonal(3) == 27\nassert is_num_decagonal(7) == 175\nassert is_num_decagonal(10) == 370", "assert sequential_search([11,23,58,31,56,77,43,12,65,19],31) == (True, 3)\nassert sequential_search([12, 32, 45, 62, 35, 47, 44, 61],61) == (True, 7)\nassert sequential_search([9, 10, 17, 19, 22, 39, 48, 
56],48) == (True, 6)", "assert all_unique([1,2,3]) == True\nassert all_unique([1,2,1,2]) == False\nassert all_unique([1,2,3,4,5]) == True", "assert sub_list([1, 2, 3],[4,5,6])==[-3,-3,-3]\nassert sub_list([1,2],[3,4])==[-2,-2]\nassert sub_list([90,120],[50,70])==[40,50]", "assert validate(1234) == True\nassert validate(51241) == False\nassert validate(321) == True", "assert check_element([\"green\", \"orange\", \"black\", \"white\"],'blue')==False\nassert check_element([1,2,3,4],7)==False\nassert check_element([\"green\", \"green\", \"green\", \"green\"],'green')==True", "assert text_match_two_three(\"ac\")==('Not matched!')\nassert text_match_two_three(\"dc\")==('Not matched!')\nassert text_match_two_three(\"abbbba\")==('Found a match!')", "assert max_sub_array_sum_repeated([10, 20, -30, -1], 4, 3) == 30\nassert max_sub_array_sum_repeated([-1, 10, 20], 3, 2) == 59\nassert max_sub_array_sum_repeated([-1, -2, -3], 3, 3) == -1", "assert square_Sum(2) == 20\nassert square_Sum(3) == 56\nassert square_Sum(4) == 120", "assert modular_inverse([ 1, 6, 4, 5 ], 4, 7) == 2\nassert modular_inverse([1, 3, 8, 12, 12], 5, 13) == 3\nassert modular_inverse([2, 3, 4, 5], 4, 6) == 1", "assert odd_Days(100) == 5\nassert odd_Days(50) ==6\nassert odd_Days(75) == 2", "assert max_length([[0], [1, 3], [5, 7], [9, 11], [13, 15, 17]])==(3, [13, 15, 17])\nassert max_length([[1], [5, 7], [10, 12, 14,15]])==(4, [10, 12, 14,15])\nassert max_length([[5], [15,20,25]])==(3, [15,20,25])", "assert count_no_of_ways(2, 4) == 16\nassert count_no_of_ways(3, 2) == 6\nassert count_no_of_ways(4, 4) == 228", "assert find(10,3) == 3\nassert find(4,2) == 2\nassert find(20,5) == 4", "assert otherside_rightangle(7,8)==10.63014581273465\nassert otherside_rightangle(3,4)==5\nassert otherside_rightangle(7,15)==16.55294535724685", "assert max_val(['Python', 3, 2, 4, 5, 'version'])==5\nassert max_val(['Python', 15, 20, 25])==25\nassert max_val(['Python', 30, 20, 40, 50, 'version'])==50", "assert sum_div(8)==7\nassert 
sum_div(12)==16\nassert sum_div(7)==1", "assert get_Inv_Count([1,20,6,4,5],5) == 5\nassert get_Inv_Count([1,2,1],3) == 1\nassert get_Inv_Count([1,2,5,6,1],5) == 3", "assert flatten_list([0, 10, [20, 30], 40, 50, [60, 70, 80], [90, 100, 110, 120]])==[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]\nassert flatten_list([[10, 20], [40], [30, 56, 25], [10, 20], [33], [40]])==[10, 20, 40, 30, 56, 25, 10, 20, 33, 40]\nassert flatten_list([[1,2,3], [4,5,6], [10,11,12], [7,8,9]])==[1, 2, 3, 4, 5, 6, 10, 11, 12, 7, 8, 9]", "assert intersection_nested_lists( [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],[[12, 18, 23, 25, 45], [7, 11, 19, 24, 28], [1, 5, 8, 18, 15, 16]])==[[12], [7, 11], [1, 5, 8]]\nassert intersection_nested_lists([[2, 3, 1], [4, 5], [6, 8]], [[4, 5], [6, 8]])==[[], []]\nassert intersection_nested_lists(['john','amal','joel','george'],[['john'],['jack','john','mary'],['howard','john'],['jude']])==[['john'], ['john'], ['john'], []]", "assert max_aggregate([('Juan Whelan',90),('Sabah Colley',88),('Peter Nichols',7),('Juan Whelan',122),('Sabah Colley',84)])==('Juan Whelan', 212)\nassert max_aggregate([('Juan Whelan',50),('Sabah Colley',48),('Peter Nichols',37),('Juan Whelan',22),('Sabah Colley',14)])==('Juan Whelan', 72)\nassert max_aggregate([('Juan Whelan',10),('Sabah Colley',20),('Peter Nichols',30),('Juan Whelan',40),('Sabah Colley',50)])==('Sabah Colley', 70)", "assert count_binary_seq(1) == 2.0\nassert count_binary_seq(2) == 6.0\nassert count_binary_seq(3) == 20.0", "assert dict_depth({'a':1, 'b': {'c': {'d': {}}}})==4\nassert dict_depth({'a':1, 'b': {'c':'python'}})==2\nassert dict_depth({1: 'Sun', 2: {3: {4:'Mon'}}})==3", "assert set_Bit_Number(6) == 4\nassert set_Bit_Number(10) == 8\nassert set_Bit_Number(18) == 16", "assert solve([1,0,2],3) == True\nassert solve([1,2,0],3) == False\nassert solve([1,2,1],3) == True", "assert find_Element([1,2,3,4,5],[[0,2],[0,3]],2,1) == 3\nassert find_Element([1,2,3,4],[[0,1],[0,2]],1,2) == 3\nassert 
find_Element([1,2,3,4,5,6],[[0,1],[0,2]],1,1) == 1", "assert start_withp([\"Python PHP\", \"Java JavaScript\", \"c c++\"])==('Python', 'PHP')\nassert start_withp([\"Python Programming\",\"Java Programming\"])==('Python','Programming')\nassert start_withp([\"Pqrst Pqr\",\"qrstuv\"])==('Pqrst','Pqr')", "assert max_sum_increasing_subseq([1, 101, 2, 3, 100, 4, 5 ], 7, 4, 6) == 11\nassert max_sum_increasing_subseq([1, 101, 2, 3, 100, 4, 5 ], 7, 2, 5) == 7\nassert max_sum_increasing_subseq([11, 15, 19, 21, 26, 28, 31], 7, 2, 4) == 71", "assert colon_tuplex((\"HELLO\", 5, [], True) ,2,50)==(\"HELLO\", 5, [50], True) \nassert colon_tuplex((\"HELLO\", 5, [], True) ,2,100)==((\"HELLO\", 5, [100],True))\nassert colon_tuplex((\"HELLO\", 5, [], True) ,2,500)==(\"HELLO\", 5, [500], True)", "assert large_product([1, 2, 3, 4, 5, 6],[3, 6, 8, 9, 10, 6],3)==[60, 54, 50]\nassert large_product([1, 2, 3, 4, 5, 6],[3, 6, 8, 9, 10, 6],4)==[60, 54, 50, 48]\nassert large_product([1, 2, 3, 4, 5, 6],[3, 6, 8, 9, 10, 6],5)==[60, 54, 50, 48, 45]", "assert maximum(5,10) == 10\nassert maximum(-1,-2) == -1\nassert maximum(9,7) == 9", "assert string_to_tuple(\"python 3.0\")==('p', 'y', 't', 'h', 'o', 'n', '3', '.', '0')\nassert string_to_tuple(\"item1\")==('i', 't', 'e', 'm', '1')\nassert string_to_tuple(\"15.10\")==('1', '5', '.', '1', '0')", "assert set_left_most_unset_bit(10) == 14\nassert set_left_most_unset_bit(12) == 14\nassert set_left_most_unset_bit(15) == 15", "assert volume_cone(5,12)==314.15926535897927\nassert volume_cone(10,15)==1570.7963267948965\nassert volume_cone(19,17)==6426.651371693521", "assert pos_nos([-1,-2,1,2]) == 1,2\nassert pos_nos([3,4,-5]) == 3,4\nassert pos_nos([-2,-3,1]) == 1", "assert max_sum_rectangular_grid([ [1, 4, 5], [2, 0, 0 ] ], 3) == 7\nassert max_sum_rectangular_grid([ [ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10] ], 5) == 24\nassert max_sum_rectangular_grid([ [7, 9, 11, 15, 19], [21, 25, 28, 31, 32] ], 5) == 81", "assert find_Max_Len_Even(\"python language\") == 
\"language\"\nassert find_Max_Len_Even(\"maximum even length\") == \"length\"\nassert find_Max_Len_Even(\"eve\") == \"-1\"", "assert find_last_occurrence([2, 5, 5, 5, 6, 6, 8, 9, 9, 9], 5) == 3\nassert find_last_occurrence([2, 3, 5, 8, 6, 6, 8, 9, 9, 9], 9) == 9\nassert find_last_occurrence([2, 2, 1, 5, 6, 6, 6, 9, 9, 9], 6) == 6", "assert modified_encode([1,1,2,3,4,4,5,1])==[[2, 1], 2, 3, [2, 4], 5, 1]\nassert modified_encode('automatically')==['a', 'u', 't', 'o', 'm', 'a', 't', 'i', 'c', 'a', [2, 'l'], 'y']\nassert modified_encode('python')==['p', 'y', 't', 'h', 'o', 'n']", "assert max_volume(8) == 18\nassert max_volume(4) == 2\nassert max_volume(1) == 0", "assert find_long_word('Please move back to strem') == ['strem']\nassert find_long_word('4K Ultra HD streaming player') == ['Ultra']\nassert find_long_word('Streaming Media Player') == ['Media']", "assert sum_difference(12)==5434\nassert sum_difference(20)==41230\nassert sum_difference(54)==2151270", "assert find_demlo(\"111111\") == '12345654321'\nassert find_demlo(\"1111\") == '1234321'\nassert find_demlo(\"13333122222\") == '123456789101110987654321'", "assert position_min([12,33,23,10,67,89,45,667,23,12,11,10,54])==[3,11]\nassert position_min([1,2,2,2,4,4,4,5,5,5,5])==[0]\nassert position_min([2,1,5,6,8,3,4,9,10,11,8,12])==[1]", "assert re_arrange([-5, -2, 5, 2, 4,\t7, 1, 8, 0, -8], 10) == [-5, 5, -2, 2, -8, 4, 7, 1, 8, 0]\nassert re_arrange([1, 2, 3, -4, -1, 4], 6) == [-4, 1, -1, 2, 3, 4]\nassert re_arrange([4, 7, 9, 77, -4, 5, -3, -9], 8) == [-4, 4, -3, 7, -9, 9, 77, 5]", "assert sum_of_alternates((5, 6, 3, 6, 10, 34)) == (46, 18)\nassert sum_of_alternates((1, 2, 3, 4, 5)) == (6, 9)\nassert sum_of_alternates((6, 7, 8, 9, 4, 5)) == (21, 18)", "assert get_Min_Squares(6) == 3\nassert get_Min_Squares(2) == 2\nassert get_Min_Squares(4) == 1", "assert most_occurrences([\"UTS is best for RTF\", \"RTF love UTS\", \"UTS is best\"] ) == 'UTS'\nassert most_occurrences([\"Its been a great year\", \"this year is so 
worse\", \"this year is okay\"] ) == 'year'\nassert most_occurrences([\"Families can be reunited\", \"people can be reunited\", \"Tasks can be achieved \"] ) == 'can'", "assert check_isosceles(6,8,12)==False \nassert check_isosceles(6,6,12)==True\nassert check_isosceles(6,16,20)==False", "assert rotate_left([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],3,4)==[4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4]\nassert rotate_left([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],2,2)==[3, 4, 5, 6, 7, 8, 9, 10, 1, 2]\nassert rotate_left([1, 2, 3, 4, 5, 6, 7, 8, 9, 10],5,2)==[6, 7, 8, 9, 10, 1, 2]", "assert neg_count([-1,-2,3,-4,-5]) == 4\nassert neg_count([1,2,3]) == 0\nassert neg_count([1,2,-3,-10,20]) == 2", "assert find_char('For the four consumer complaints contact manager AKR reddy') == ['For', 'the', 'four', 'AKR', 'reddy']\nassert find_char('Certain service are subject to change MSR') == ['are', 'MSR']\nassert find_char('Third party legal desclaimers') == ['Third', 'party', 'legal']", "assert count_unset_bits(2) == 1\nassert count_unset_bits(4) == 2\nassert count_unset_bits(6) == 1", "assert char_frequency('python')=={'p': 1, 'y': 1, 't': 1, 'h': 1, 'o': 1, 'n': 1}\nassert char_frequency('program')=={'p': 1, 'r': 2, 'o': 1, 'g': 1, 'a': 1, 'm': 1}\nassert char_frequency('language')=={'l': 1, 'a': 2, 'n': 1, 'g': 2, 'u': 1, 'e': 1}", "assert Sort([['a', 10], ['b', 5], ['c', 20], ['d', 15]]) == [['b', 5], ['a', 10], ['d', 15], ['c', 20]]\nassert Sort([['452', 10], ['256', 5], ['100', 20], ['135', 15]]) == [['256', 5], ['452', 10], ['135', 15], ['100', 20]]\nassert Sort([['rishi', 10], ['akhil', 5], ['ramya', 20], ['gaur', 15]]) == [['akhil', 5], ['rishi', 10], ['gaur', 15], ['ramya', 20]]", "assert check_Validity(1,2,3) == False\nassert check_Validity(2,3,5) == False\nassert check_Validity(7,10,5) == True", "assert ap_sum(1,5,2)==25\nassert ap_sum(2,6,4)==72\nassert ap_sum(1,4,5)==34", "assert check_monthnum(\"February\")==True\nassert check_monthnum(\"January\")==False\nassert 
check_monthnum(\"March\")==False", "assert text_match_word(\"python.\")==('Found a match!')\nassert text_match_word(\"python.\")==('Found a match!')\nassert text_match_word(\" lang .\")==('Not matched!')", "assert count_Substring_With_Equal_Ends('aba') == 4\nassert count_Substring_With_Equal_Ends('abcab') == 7\nassert count_Substring_With_Equal_Ends('abc') == 3", "assert find_Divisor(2,2) == 2\nassert find_Divisor(2,5) == 2\nassert find_Divisor(5,10) == 2", "assert sum_three_smallest_nums([10,20,30,40,50,60,7]) == 37\nassert sum_three_smallest_nums([1,2,3,4,5]) == 6\nassert sum_three_smallest_nums([0,1,2,3,4,5]) == 6", "assert set_to_tuple({1, 2, 3, 4, 5}) == (1, 2, 3, 4, 5)\nassert set_to_tuple({6, 7, 8, 9, 10, 11}) == (6, 7, 8, 9, 10, 11)\nassert set_to_tuple({12, 13, 14, 15, 16}) == (12, 13, 14, 15, 16)", "assert find_minimum_range([[3, 6, 8, 10, 15], [1, 5, 12], [4, 8, 15, 16], [2, 6]]) == (4, 6)\nassert find_minimum_range([[ 2, 3, 4, 8, 10, 15 ], [1, 5, 12], [7, 8, 15, 16], [3, 6]]) == (4, 7)\nassert find_minimum_range([[4, 7, 9, 11, 16], [2, 6, 13], [5, 9, 16, 17], [3, 7]]) == (5, 7)", "assert dig_let(\"python\")==(6,0)\nassert dig_let(\"program\")==(7,0)\nassert dig_let(\"python3.0\")==(6,2)", "assert count_Odd_Squares(5,100) == 8\nassert count_Odd_Squares(8,65) == 6\nassert count_Odd_Squares(2,5) == 1", "assert diff_consecutivenums([1, 1, 3, 4, 4, 5, 6, 7])==[0, 2, 1, 0, 1, 1, 1]\nassert diff_consecutivenums([4, 5, 8, 9, 6, 10])==[1, 3, 1, -3, 4]\nassert diff_consecutivenums([0, 1, 2, 3, 4, 4, 4, 4, 5, 7])==[1, 1, 1, 1, 0, 0, 0, 1, 2]", "assert zigzag(4, 3) == 5\nassert zigzag(4, 2) == 4\nassert zigzag(3, 1) == 1", "assert count_Squares(4,3) == 20\nassert count_Squares(1,2) == 2\nassert count_Squares(2,2) == 5", "assert find_ways(4) == 2\nassert find_ways(6) == 5\nassert find_ways(8) == 14", "assert check(\"01010101010\") == \"Yes\"\nassert check(\"name0\") == \"No\"\nassert check(\"101\") == \"Yes\"", "assert minimum_Length(\"mnm\") == 1\nassert 
minimum_Length(\"abcda\") == 3\nassert minimum_Length(\"abcb\") == 2", "assert first_Element([0,1,2,3,4,5],6,1) == 0\nassert first_Element([1,2,1,3,4],5,2) == 1\nassert first_Element([2,3,4,3,5,7,1,2,3,5],10,2) == 2", "assert unique_Characters('aba') == False\nassert unique_Characters('abc') == True\nassert unique_Characters('abab') == False", "assert remove_column([[1, 2, 3], [2, 4, 5], [1, 1, 1]],0)==[[2, 3], [4, 5], [1, 1]]\nassert remove_column([[1, 2, 3], [-2, 4, -5], [1, -1, 1]],2)==[[1, 2], [-2, 4], [1, -1]]\nassert remove_column([[1, 3], [5, 7], [1, 3], [13, 15, 17], [5, 7], [9, 11]],0)==[[3], [7], [3], [15, 17], [7], [11]]", "assert tn_ap(1,5,2)==9\nassert tn_ap(2,6,4)==22\nassert tn_ap(1,4,5)==16", "assert count_Rectangles(2) == 8\nassert count_Rectangles(1) == 1\nassert count_Rectangles(0) == 0", "assert find_angle(47,89)==44\nassert find_angle(45,95)==40\nassert find_angle(50,40)==90", "assert find_max([(2, 4), (6, 7), (5, 1), (6, 10), (8, 7)]) == 10\nassert find_max([(3, 5), (7, 8), (6, 2), (7, 11), (9, 8)]) == 11\nassert find_max([(4, 6), (8, 9), (7, 3), (8, 12), (10, 9)]) == 12", "assert moddiv_list([4,5,6],[1, 2, 3])==[0, 1, 0]\nassert moddiv_list([3,2],[1,4])==[0, 2]\nassert moddiv_list([90,120],[50,70])==[40, 50]", "assert Check_Solution(1,3,2) == \"Yes\"\nassert Check_Solution(1,2,3) == \"No\"\nassert Check_Solution(1,-5,6) == \"No\"", "assert get_carol(2) == 7\nassert get_carol(4) == 223\nassert get_carol(5) == 959", "assert remove_empty([[], [], [], 'Red', 'Green', [1,2], 'Blue', [], []])==['Red', 'Green', [1, 2], 'Blue']\nassert remove_empty([[], [], [],[],[], 'Green', [1,2], 'Blue', [], []])==[ 'Green', [1, 2], 'Blue']\nassert remove_empty([[], [], [], 'Python',[],[], 'programming', 'language',[],[],[], [], []])==['Python', 'programming', 'language']", "assert max_occurrences([1,2,3,1,2,3,12,4,2]) == 2\nassert max_occurrences([1,2,6,7,0,1,0,1,0]) == 1,0\nassert max_occurrences([1,2,3,1,2,4,1]) == 1", "assert add_K_element([(1, 3, 4), (2, 4, 
6), (3, 8, 1)], 4) == [(5, 7, 8), (6, 8, 10), (7, 12, 5)]\nassert add_K_element([(1, 2, 3), (4, 5, 6), (7, 8, 9)], 8) == [(9, 10, 11), (12, 13, 14), (15, 16, 17)]\nassert add_K_element([(11, 12, 13), (14, 15, 16), (17, 18, 19)], 9) == [(20, 21, 22), (23, 24, 25), (26, 27, 28)]", "assert min_flip_to_make_string_alternate(\"0001010111\") == 2\nassert min_flip_to_make_string_alternate(\"001\") == 1\nassert min_flip_to_make_string_alternate(\"010111011\") == 2 ", "assert count_Digit(12345) == 5\nassert count_Digit(11223305) == 8\nassert count_Digit(4123459) == 7", "assert adjacent_num_product([1,2,3,4,5,6]) == 30\nassert adjacent_num_product([1,2,3,4,5]) == 20\nassert adjacent_num_product([2,3]) == 6", "assert is_tree_balanced(root) == False\nassert is_tree_balanced(root1) == True\nassert is_tree_balanced(root2) == False ", "assert repeat_tuples((1, 3), 4) == ((1, 3), (1, 3), (1, 3), (1, 3))\nassert repeat_tuples((1, 2), 3) == ((1, 2), (1, 2), (1, 2))\nassert repeat_tuples((3, 4), 5) == ((3, 4), (3, 4), (3, 4), (3, 4), (3, 4))", "assert lateralsurface_cuboid(8,5,6)==156\nassert lateralsurface_cuboid(7,9,10)==320\nassert lateralsurface_cuboid(10,20,30)==1800", "assert float_sort([('item1', '12.20'), ('item2', '15.10'), ('item3', '24.5')])==[('item3', '24.5'), ('item2', '15.10'), ('item1', '12.20')] \nassert float_sort([('item1', '15'), ('item2', '10'), ('item3', '20')])==[('item3', '20'), ('item1', '15'), ('item2', '10')] \nassert float_sort([('item1', '5'), ('item2', '10'), ('item3', '14')])==[('item3', '14'), ('item2', '10'), ('item1', '5')] ", "assert smallest_missing([0, 1, 2, 3, 4, 5, 6], 0, 6) == 7\nassert smallest_missing([0, 1, 2, 6, 9, 11, 15], 0, 6) == 3\nassert smallest_missing([1, 2, 3, 4, 6, 9, 11, 15], 0, 7) == 0", "assert heap_assending([18, 14, 10, 9, 8, 7, 9, 3, 2, 4, 1])==[1, 2, 3, 4, 7, 8, 9, 9, 10, 14, 18]\nassert heap_assending([25, 35, 22, 85, 14, 65, 75, 25, 58])==[14, 22, 25, 25, 35, 58, 65, 75, 85]\nassert heap_assending([1, 3, 5, 7, 9, 2, 4, 6, 
8, 0])==[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]", "assert volume_cuboid(1,2,3)==6\nassert volume_cuboid(5,7,9)==315\nassert volume_cuboid(10,15,21)==3150", "assert permute_string('ab')==['ab', 'ba']\nassert permute_string('abc')==['abc', 'bac', 'bca', 'acb', 'cab', 'cba']\nassert permute_string('abcd')==['abcd', 'bacd', 'bcad', 'bcda', 'acbd', 'cabd', 'cbad', 'cbda', 'acdb', 'cadb', 'cdab', 'cdba', 'abdc', 'badc', 'bdac', 'bdca', 'adbc', 'dabc', 'dbac', 'dbca', 'adcb', 'dacb', 'dcab', 'dcba']", "assert round_num(4722,10)==4720\nassert round_num(1111,5)==1110\nassert round_num(219,2)==218", "assert remove_replica((1, 1, 4, 4, 4, 5, 5, 6, 7, 7)) == (1, 'MSP', 4, 'MSP', 'MSP', 5, 'MSP', 6, 7, 'MSP')\nassert remove_replica((2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9)) == (2, 3, 4, 'MSP', 5, 6, 'MSP', 7, 8, 9, 'MSP')\nassert remove_replica((2, 2, 5, 4, 5, 7, 5, 6, 7, 7)) == (2, 'MSP', 5, 4, 'MSP', 7, 'MSP', 6, 'MSP', 'MSP')", "assert remove_Char(\"aba\",'a') == \"b\"\nassert remove_Char(\"toggle\",'g') == \"tole\"\nassert remove_Char(\"aabbc\",'b') == \"aac\"", "assert move_first([1,2,3,4]) == [4,1,2,3]\nassert move_first([0,1,2,3]) == [3,0,1,2]\nassert move_first([9,8,7,1]) == [1,9,8,7]", "assert surfacearea_cuboid(1,2,3)==22\nassert surfacearea_cuboid(5,7,9)==286\nassert surfacearea_cuboid(10,15,21)==1350", "assert multi_list(3,4)==[[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 4, 6]] \nassert multi_list(5,7)==[[0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6], [0, 2, 4, 6, 8, 10, 12], [0, 3, 6, 9, 12, 15, 18], [0, 4, 8, 12, 16, 20, 24]]\nassert multi_list(10,15)==[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28], [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42], [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56], [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70], [0, 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84], [0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 
91, 98], [0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112], [0, 9, 18, 27, 36, 45, 54, 63, 72, 81, 90, 99, 108, 117, 126]]", "assert index_on_inner_list([('Greyson Fulton', 98, 99), ('Brady Kent', 97, 96), ('Wyatt Knott', 91, 94), ('Beau Turnbull', 94, 98)] ,0)==[('Beau Turnbull', 94, 98), ('Brady Kent', 97, 96), ('Greyson Fulton', 98, 99), ('Wyatt Knott', 91, 94)]\nassert index_on_inner_list([('Greyson Fulton', 98, 99), ('Brady Kent', 97, 96), ('Wyatt Knott', 91, 94), ('Beau Turnbull', 94, 98)] ,1)==[('Wyatt Knott', 91, 94), ('Beau Turnbull', 94, 98), ('Brady Kent', 97, 96), ('Greyson Fulton', 98, 99)]\nassert index_on_inner_list([('Greyson Fulton', 98, 99), ('Brady Kent', 97, 96), ('Wyatt Knott', 91, 94), ('Beau Turnbull', 94, 98)] ,2)==[('Wyatt Knott', 91, 94), ('Brady Kent', 97, 96), ('Beau Turnbull', 94, 98), ('Greyson Fulton', 98, 99)]", "assert find_rotation_count([8, 9, 10, 1, 2, 3, 4, 5, 6, 7]) == 3\nassert find_rotation_count([8, 9, 10,2, 5, 6]) == 3\nassert find_rotation_count([2, 5, 6, 8, 9, 10]) == 0", "assert even_bit_toggle_number(10) == 15\nassert even_bit_toggle_number(20) == 1\nassert even_bit_toggle_number(30) == 11", "assert frequency_Of_Smallest(5,[1,2,3,4,3]) == 1\nassert frequency_Of_Smallest(7,[3,1,2,5,6,2,3]) == 1\nassert frequency_Of_Smallest(7,[3,3,6,3,7,4,9]) == 3", "assert get_perrin(9) == 12\nassert get_perrin(4) == 2\nassert get_perrin(6) == 5", "assert swap_count(\"[]][][\") == 2\nassert swap_count(\"[[][]]\") == 0\nassert swap_count(\"[[][]]][\") == 1", "assert even_or_odd(\"AB3454D\") ==\"Odd\"\nassert even_or_odd(\"ABC\") == \"Even\"\nassert even_or_odd(\"AAD\") == \"Odd\"", "assert highest_Power_of_2(10) == 8\nassert highest_Power_of_2(19) == 16\nassert highest_Power_of_2(32) == 32", "assert find_lucas(9) == 76\nassert find_lucas(4) == 7\nassert find_lucas(3) == 4", "assert add_string([1,2,3,4],'temp{0}')==['temp1', 'temp2', 'temp3', 'temp4']\nassert add_string(['a','b','c','d'], 'python{0}')==[ 'pythona', 'pythonb', 
'pythonc', 'pythond']\nassert add_string([5,6,7,8],'string{0}')==['string5', 'string6', 'string7', 'string8']", "assert convert_list_dictionary([\"S001\", \"S002\", \"S003\", \"S004\"],[\"Adina Park\", \"Leyton Marsh\", \"Duncan Boyle\", \"Saim Richards\"] ,[85, 98, 89, 92])==[{'S001': {'Adina Park': 85}}, {'S002': {'Leyton Marsh': 98}}, {'S003': {'Duncan Boyle': 89}}, {'S004': {'Saim Richards': 92}}]\nassert convert_list_dictionary([\"abc\",\"def\",\"ghi\",\"jkl\"],[\"python\",\"program\",\"language\",\"programs\"],[100,200,300,400])==[{'abc':{'python':100}},{'def':{'program':200}},{'ghi':{'language':300}},{'jkl':{'programs':400}}]\nassert convert_list_dictionary([\"A1\",\"A2\",\"A3\",\"A4\"],[\"java\",\"C\",\"C++\",\"DBMS\"],[10,20,30,40])==[{'A1':{'java':10}},{'A2':{'C':20}},{'A3':{'C++':30}},{'A4':{'DBMS':40}}]", "assert get_max_sum(60) == 106\nassert get_max_sum(10) == 12\nassert get_max_sum(2) == 2", "assert max_length_list([[0], [1, 3], [5, 7], [9, 11], [13, 15, 17]])==(3, [13, 15, 17])\nassert max_length_list([[1,2,3,4,5],[1,2,3,4],[1,2,3],[1,2],[1]])==(5,[1,2,3,4,5])\nassert max_length_list([[3,4,5],[6,7,8,9],[10,11,12]])==(4,[6,7,8,9])", "assert check_distinct((1, 4, 5, 6, 1, 4)) == False\nassert check_distinct((1, 4, 5, 6)) == True\nassert check_distinct((2, 3, 4, 5, 6)) == True", "assert first_non_repeating_character(\"abcabc\") == None\nassert first_non_repeating_character(\"abc\") == \"a\"\nassert first_non_repeating_character(\"ababc\") == \"c\"", "assert check_char(\"abba\") == \"Valid\"\nassert check_char(\"a\") == \"Valid\"\nassert check_char(\"abcd\") == \"Invalid\"", "assert median_numbers(25,55,65)==55.0\nassert median_numbers(20,10,30)==20.0\nassert median_numbers(15,45,75)==45.0", "assert sum_of_digits([10,2,56])==14\nassert sum_of_digits([[10,20,4,5,'b',70,'a']])==19\nassert sum_of_digits([10,20,-4,5,-70])==19", "assert bitwise_xor((10, 4, 6, 9), (5, 2, 3, 3)) == (15, 6, 5, 10)\nassert bitwise_xor((11, 5, 7, 10), (6, 3, 4, 4)) == (13, 6, 3, 
14)\nassert bitwise_xor((12, 6, 8, 11), (7, 4, 5, 6)) == (11, 2, 13, 13)", "assert extract_freq([(3, 4), (1, 2), (4, 3), (5, 6)] ) == 3\nassert extract_freq([(4, 15), (2, 3), (5, 4), (6, 7)] ) == 4\nassert extract_freq([(5, 16), (2, 3), (6, 5), (6, 9)] ) == 4", "assert add_nested_tuples(((1, 3), (4, 5), (2, 9), (1, 10)), ((6, 7), (3, 9), (1, 1), (7, 3))) == ((7, 10), (7, 14), (3, 10), (8, 13))\nassert add_nested_tuples(((2, 4), (5, 6), (3, 10), (2, 11)), ((7, 8), (4, 10), (2, 2), (8, 4))) == ((9, 12), (9, 16), (5, 12), (10, 15))\nassert add_nested_tuples(((3, 5), (6, 7), (4, 11), (3, 12)), ((8, 9), (5, 11), (3, 3), (9, 5))) == ((11, 14), (11, 18), (7, 14), (12, 17))", "assert ncr_modp(10,2,13)==6\nassert ncr_modp(15,12,43)==25\nassert ncr_modp(17,9,18)==10", "assert is_valid_URL(\"https://www.google.com\") == True\nassert is_valid_URL(\"https:/www.gmail.com\") == False\nassert is_valid_URL(\"https:// www.redit.com\") == False", "assert minimum(1,2) == 1\nassert minimum(-5,-4) == -5\nassert minimum(0,0) == 0", "assert check_tuplex((\"w\", 3, \"r\", \"e\", \"s\", \"o\", \"u\", \"r\", \"c\", \"e\"),'r')==True\nassert check_tuplex((\"w\", 3, \"r\", \"e\", \"s\", \"o\", \"u\", \"r\", \"c\", \"e\"),'5')==False\nassert check_tuplex((\"w\", 3, \"r\", \"e\", \"s\", \"o\", \"u\", \"r\", \"c\",\"e\"),3)==True", "assert find_Parity(12) == \"Even Parity\"\nassert find_Parity(7) == \"Odd Parity\"\nassert find_Parity(10) == \"Even Parity\"", "assert rearrange_bigger(12)==21\nassert rearrange_bigger(10)==False\nassert rearrange_bigger(102)==120", "assert k_smallest_pairs([1,3,7],[2,4,6],2)==[[1, 2], [1, 4]]\nassert k_smallest_pairs([1,3,7],[2,4,6],1)==[[1, 2]]\nassert k_smallest_pairs([1,3,7],[2,4,6],7)==[[1, 2], [1, 4], [3, 2], [1, 6], [3, 4], [3, 6], [7, 2]]", "assert min_product_tuple([(2, 7), (2, 6), (1, 8), (4, 9)] )==8\nassert min_product_tuple([(10,20), (15,2), (5,10)] )==30\nassert min_product_tuple([(11,44), (10,15), (20,5), (12, 9)] )==100", "assert min_val(['Python', 3, 
2, 4, 5, 'version'])==2\nassert min_val(['Python', 15, 20, 25])==15\nassert min_val(['Python', 30, 20, 40, 50, 'version'])==20", "assert snake_to_camel('android_tv') == 'AndroidTv'\nassert snake_to_camel('google_pixel') == 'GooglePixel'\nassert snake_to_camel('apple_watch') == 'AppleWatch'", "assert remove_odd([1,2,3]) == [2]\nassert remove_odd([2,4,6]) == [2,4,6]\nassert remove_odd([10,20,3]) == [10,20]", "assert extract_nth_element([('Greyson Fulton', 98, 99), ('Brady Kent', 97, 96), ('Wyatt Knott', 91, 94), ('Beau Turnbull', 94, 98)] ,0)==['Greyson Fulton', 'Brady Kent', 'Wyatt Knott', 'Beau Turnbull']\nassert extract_nth_element([('Greyson Fulton', 98, 99), ('Brady Kent', 97, 96), ('Wyatt Knott', 91, 94), ('Beau Turnbull', 94, 98)] ,2)==[99, 96, 94, 98]\nassert extract_nth_element([('Greyson Fulton', 98, 99), ('Brady Kent', 97, 96), ('Wyatt Knott', 91, 94), ('Beau Turnbull', 94, 98)],1)==[98, 97, 91, 94]", "assert overlapping([1,2,3,4,5],[6,7,8,9]) == False\nassert overlapping([1,2,3],[4,5,6]) == False\nassert overlapping([1,4,5],[1,4,5]) == True", "assert max_Product([1,2,3,4,7,0,8,4]) == (7,8)\nassert max_Product([0,-1,-2,-4,5,0,-6]) == (-4,-6)\nassert max_Product([1,2,3]) == (2,3)", "assert breakSum(12) == 13\nassert breakSum(24) == 27\nassert breakSum(23) == 23", "assert group_tuples([('x', 'y'), ('x', 'z'), ('w', 't')]) == [('x', 'y', 'z'), ('w', 't')]\nassert group_tuples([('a', 'b'), ('a', 'c'), ('d', 'e')]) == [('a', 'b', 'c'), ('d', 'e')]\nassert group_tuples([('f', 'g'), ('f', 'g'), ('h', 'i')]) == [('f', 'g', 'g'), ('h', 'i')]", "assert Find_Max([['A'],['A','B'],['A','B','C']]) == ['A','B','C']\nassert Find_Max([[1],[1,2],[1,2,3]]) == [1,2,3]\nassert Find_Max([[1,1],[1,2,3],[1,5,6,1]]) == [1,5,6,1]", "assert round_and_sum([22.4, 4.0, -16.22, -9.10, 11.00, -12.22, 14.20, -5.20, 17.50])==243\nassert round_and_sum([5,2,9,24.3,29])==345\nassert round_and_sum([25.0,56.7,89.2])==513", "assert cube_Sum(2) == 72\nassert cube_Sum(3) == 288\nassert cube_Sum(4) 
== 800", "assert concatenate_tuple((\"ID\", \"is\", 4, \"UTS\") ) == 'ID-is-4-UTS'\nassert concatenate_tuple((\"QWE\", \"is\", 4, \"RTY\") ) == 'QWE-is-4-RTY'\nassert concatenate_tuple((\"ZEN\", \"is\", 4, \"OP\") ) == 'ZEN-is-4-OP'", "assert find_Average_Of_Cube(2) == 4.5\nassert find_Average_Of_Cube(3) == 12\nassert find_Average_Of_Cube(1) == 1", "assert get_maxgold([[1, 3, 1, 5],[2, 2, 4, 1],[5, 0, 2, 3],[0, 6, 1, 2]],4,4)==16\nassert get_maxgold([[10,20],[30,40]],2,2)==70\nassert get_maxgold([[4,9],[3,7]],2,2)==13", "assert extract_rear(('Mers', 'for', 'Vers') ) == ['s', 'r', 's']\nassert extract_rear(('Avenge', 'for', 'People') ) == ['e', 'r', 'e']\nassert extract_rear(('Gotta', 'get', 'go') ) == ['a', 't', 'o']", "assert count_element_in_list([[1, 3], [5, 7], [1, 11], [1, 15, 7]],1)==3\nassert count_element_in_list([['A', 'B'], ['A', 'C'], ['A', 'D', 'E'], ['B', 'C', 'D']],'A')==3\nassert count_element_in_list([['A', 'B'], ['A', 'C'], ['A', 'D', 'E'], ['B', 'C', 'D']],'E')==1", "assert filter_oddnumbers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])==[1,3,5,7,9]\nassert filter_oddnumbers([10,20,45,67,84,93])==[45,67,93]\nassert filter_oddnumbers([5,7,9,8,6,4,3])==[5,7,9,3]", "assert change_date_format(\"2026-01-02\") == '02-01-2026'\nassert change_date_format(\"2020-11-13\") == '13-11-2020'\nassert change_date_format(\"2021-04-26\") == '26-04-2021'", "assert shell_sort([12, 23, 4, 5, 3, 2, 12, 81, 56, 95]) == [2, 3, 4, 5, 12, 12, 23, 56, 81, 95]\nassert shell_sort([24, 22, 39, 34, 87, 73, 68]) == [22, 24, 34, 39, 68, 73, 87]\nassert shell_sort([32, 30, 16, 96, 82, 83, 74]) == [16, 30, 32, 74, 82, 83, 96]", "assert and_tuples((10, 4, 6, 9), (5, 2, 3, 3)) == (0, 0, 2, 1)\nassert and_tuples((1, 2, 3, 4), (5, 6, 7, 8)) == (1, 2, 3, 0)\nassert and_tuples((8, 9, 11, 12), (7, 13, 14, 17)) == (0, 9, 10, 0)", "assert parabola_directrix(5,3,2)==-198\nassert parabola_directrix(9,8,4)==-2336\nassert parabola_directrix(2,4,6)==-130", "assert common_element([1,2,3,4,5], 
[5,6,7,8,9])==True\nassert common_element([1,2,3,4,5], [6,7,8,9])==None\nassert common_element(['a','b','c'], ['d','b','e'])==True", "assert median_trapezium(15,25,35)==20\nassert median_trapezium(10,20,30)==15\nassert median_trapezium(6,9,4)==7.5", "assert check_greater([1, 2, 3, 4, 5], 4) == 'No, entered number is less than those in the array'\nassert check_greater([2, 3, 4, 5, 6], 8) == 'Yes, the entered number is greater than those in the array'\nassert check_greater([9, 7, 4, 8, 6, 1], 11) == 'Yes, the entered number is greater than those in the array'", "assert text_match_one(\"ac\")==('Not matched!')\nassert text_match_one(\"dc\")==('Not matched!')\nassert text_match_one(\"abba\")==('Found a match!')", "assert last_Digit(123) == 3\nassert last_Digit(25) == 5\nassert last_Digit(30) == 0", "assert neg_nos([-1,4,5,-6]) == -1,-6\nassert neg_nos([-1,-2,3,4]) == -1,-2\nassert neg_nos([-7,-6,8,9]) == -7,-6", "assert remove_odd(\"python\")==(\"yhn\")\nassert remove_odd(\"program\")==(\"rga\")\nassert remove_odd(\"language\")==(\"agae\")", "assert count_bidirectional([(5, 6), (1, 2), (6, 5), (9, 1), (6, 5), (2, 1)] ) == '3'\nassert count_bidirectional([(5, 6), (1, 3), (6, 5), (9, 1), (6, 5), (2, 1)] ) == '2'\nassert count_bidirectional([(5, 6), (1, 2), (6, 5), (9, 2), (6, 5), (2, 1)] ) == '4'", "assert multiple_to_single([11, 33, 50])==113350\nassert multiple_to_single([-1,2,3,4,5,6])==-123456\nassert multiple_to_single([10,15,20,25])==10152025", "assert find_adverb_position(\"clearly!! we can see the sky\")==(0, 7, 'clearly')\nassert find_adverb_position(\"seriously!! there are many roses\")==(0, 9, 'seriously')\nassert find_adverb_position(\"unfortunately!! 
sita is going to home\")==(0, 13, 'unfortunately')", "assert surfacearea_cube(5)==150\nassert surfacearea_cube(3)==54\nassert surfacearea_cube(10)==600", "assert positive_count([0, 1, 2, -1, -5, 6, 0, -3, -2, 3, 4, 6, 8])==0.54\nassert positive_count([2, 1, 2, -1, -5, 6, 4, -3, -2, 3, 4, 6, 8])==0.69\nassert positive_count([2, 4, -6, -9, 11, -12, 14, -5, 17])==0.56", "assert largest_neg([1,2,3,-4,-6]) == -6\nassert largest_neg([1,2,3,-8,-9]) == -9\nassert largest_neg([1,2,3,4,-1]) == -1", "assert trim_tuple([(5, 3, 2, 1, 4), (3, 4, 9, 2, 1),(9, 1, 2, 3, 5), (4, 8, 2, 1, 7)], 2) == '[(2,), (9,), (2,), (2,)]'\nassert trim_tuple([(5, 3, 2, 1, 4), (3, 4, 9, 2, 1), (9, 1, 2, 3, 5), (4, 8, 2, 1, 7)], 1) == '[(3, 2, 1), (4, 9, 2), (1, 2, 3), (8, 2, 1)]'\nassert trim_tuple([(7, 8, 4, 9), (11, 8, 12, 4),(4, 1, 7, 8), (3, 6, 9, 7)], 1) == '[(8, 4), (8, 12), (1, 7), (6, 9)]'", "assert index_multiplication(((1, 3), (4, 5), (2, 9), (1, 10)),((6, 7), (3, 9), (1, 1), (7, 3)) ) == ((6, 21), (12, 45), (2, 9), (7, 30))\nassert index_multiplication(((2, 4), (5, 6), (3, 10), (2, 11)),((7, 8), (4, 10), (2, 2), (8, 4)) ) == ((14, 32), (20, 60), (6, 20), (16, 44))\nassert index_multiplication(((3, 5), (6, 7), (4, 11), (3, 12)),((8, 9), (5, 11), (3, 3), (9, 5)) ) == ((24, 45), (30, 77), (12, 33), (27, 60))", "assert count_Occurrence(('a', 'a', 'c', 'b', 'd'),['a', 'b'] ) == 3\nassert count_Occurrence((1, 2, 3, 1, 4, 6, 7, 1, 4),[1, 4, 7]) == 6\nassert count_Occurrence((1,2,3,4,5,6),[1,2]) == 2", "assert cube_nums([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])==[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]\nassert cube_nums([10,20,30])==([1000, 8000, 27000])\nassert cube_nums([12,15])==([1728, 3375])", "assert cal_sum(9) == 49\nassert cal_sum(10) == 66\nassert cal_sum(11) == 88", "assert check_Triangle(1,5,2,5,4,6) == 'Yes'\nassert check_Triangle(1,1,1,4,1,5) == 'No'\nassert check_Triangle(1,1,1,1,1,1) == 'No'", "assert extract_string(['Python', 'list', 'exercises', 'practice', 'solution'] 
,8)==['practice', 'solution']\nassert extract_string(['Python', 'list', 'exercises', 'practice', 'solution'] ,6)==['Python']\nassert extract_string(['Python', 'list', 'exercises', 'practice', 'solution'] ,9)==['exercises']", "assert remove_whitespaces(' Google Flutter ') == 'GoogleFlutter'\nassert remove_whitespaces(' Google Dart ') == 'GoogleDart'\nassert remove_whitespaces(' iOS Swift ') == 'iOSSwift'", "assert loss_amount(1500,1200)==None\nassert loss_amount(100,200)==100\nassert loss_amount(2000,5000)==3000", "assert sumofFactors(18) == 26\nassert sumofFactors(30) == 48\nassert sumofFactors(6) == 8", "assert text_match_wordz(\"pythonz.\")==('Found a match!')\nassert text_match_wordz(\"xyz.\")==('Found a match!')\nassert text_match_wordz(\" lang .\")==('Not matched!')", "assert check_monthnumb_number(5)==True\nassert check_monthnumb_number(2)==False\nassert check_monthnumb_number(6)==False", "assert reverse_string_list(['Red', 'Green', 'Blue', 'White', 'Black'])==['deR', 'neerG', 'eulB', 'etihW', 'kcalB']\nassert reverse_string_list(['john','amal','joel','george'])==['nhoj','lama','leoj','egroeg']\nassert reverse_string_list(['jack','john','mary'])==['kcaj','nhoj','yram']", "assert Find_Min([[1],[1,2],[1,2,3]]) == [1]\nassert Find_Min([[1,1],[1,1,1],[1,2,7,8]]) == [1,1]\nassert Find_Min([['x'],['x','y'],['x','y','z']]) == ['x']", "assert rectangle_area(10,20)==200\nassert rectangle_area(10,5)==50\nassert rectangle_area(4,2)==8", "assert remove_uppercase('cAstyoUrFavoRitETVshoWs') == 'cstyoravoitshos'\nassert remove_uppercase('wAtchTheinTernEtrAdIo') == 'wtchheinerntrdo'\nassert remove_uppercase('VoicESeaRchAndreComMendaTionS') == 'oiceachndreomendaion'", "assert Extract([[1, 2], [3, 4, 5], [6, 7, 8, 9]]) == [1, 3, 6]\nassert Extract([[1,2,3],[4, 5]]) == [1,4]\nassert Extract([[9,8,1],[1,2]]) == [9,1]", "assert upper_ctr('PYthon') == 1\nassert upper_ctr('BigData') == 1\nassert upper_ctr('program') == 0", "assert combinations_list(['orange', 'red', 'green', 
'blue'])==[[], ['orange'], ['red'], ['red', 'orange'], ['green'], ['green', 'orange'], ['green', 'red'], ['green', 'red', 'orange'], ['blue'], ['blue', 'orange'], ['blue', 'red'], ['blue', 'red', 'orange'], ['blue', 'green'], ['blue', 'green', 'orange'], ['blue', 'green', 'red'], ['blue', 'green', 'red', 'orange']]\nassert combinations_list(['red', 'green', 'blue', 'white', 'black', 'orange'])==[[], ['red'], ['green'], ['green', 'red'], ['blue'], ['blue', 'red'], ['blue', 'green'], ['blue', 'green', 'red'], ['white'], ['white', 'red'], ['white', 'green'], ['white', 'green', 'red'], ['white', 'blue'], ['white', 'blue', 'red'], ['white', 'blue', 'green'], ['white', 'blue', 'green', 'red'], ['black'], ['black', 'red'], ['black', 'green'], ['black', 'green', 'red'], ['black', 'blue'], ['black', 'blue', 'red'], ['black', 'blue', 'green'], ['black', 'blue', 'green', 'red'], ['black', 'white'], ['black', 'white', 'red'], ['black', 'white', 'green'], ['black', 'white', 'green', 'red'], ['black', 'white', 'blue'], ['black', 'white', 'blue', 'red'], ['black', 'white', 'blue', 'green'], ['black', 'white', 'blue', 'green', 'red'], ['orange'], ['orange', 'red'], ['orange', 'green'], ['orange', 'green', 'red'], ['orange', 'blue'], ['orange', 'blue', 'red'], ['orange', 'blue', 'green'], ['orange', 'blue', 'green', 'red'], ['orange', 'white'], ['orange', 'white', 'red'], ['orange', 'white', 'green'], ['orange', 'white', 'green', 'red'], ['orange', 'white', 'blue'], ['orange', 'white', 'blue', 'red'], ['orange', 'white', 'blue', 'green'], ['orange', 'white', 'blue', 'green', 'red'], ['orange', 'black'], ['orange', 'black', 'red'], ['orange', 'black', 'green'], ['orange', 'black', 'green', 'red'], ['orange', 'black', 'blue'], ['orange', 'black', 'blue', 'red'], ['orange', 'black', 'blue', 'green'], ['orange', 'black', 'blue', 'green', 'red'], ['orange', 'black', 'white'], ['orange', 'black', 'white', 'red'], ['orange', 'black', 'white', 'green'], ['orange', 'black', 'white', 
'green', 'red'], ['orange', 'black', 'white', 'blue'], ['orange', 'black', 'white', 'blue', 'red'], ['orange', 'black', 'white', 'blue', 'green'], ['orange', 'black', 'white', 'blue', 'green', 'red']]\nassert combinations_list(['red', 'green', 'black', 'orange'])==[[], ['red'], ['green'], ['green', 'red'], ['black'], ['black', 'red'], ['black', 'green'], ['black', 'green', 'red'], ['orange'], ['orange', 'red'], ['orange', 'green'], ['orange', 'green', 'red'], ['orange', 'black'], ['orange', 'black', 'red'], ['orange', 'black', 'green'], ['orange', 'black', 'green', 'red']]", "assert max_subarray_product([1, -2, -3, 0, 7, -8, -2]) == 112\nassert max_subarray_product([6, -3, -10, 0, 2]) == 180 \nassert max_subarray_product([-2, -40, 0, -2, -3]) == 80", "assert check_value({'Cierra Vega': 12, 'Alden Cantrell': 12, 'Kierra Gentry': 12, 'Pierre Cox': 12},10)==False\nassert check_value({'Cierra Vega': 12, 'Alden Cantrell': 12, 'Kierra Gentry': 12, 'Pierre Cox': 12},12)==True\nassert check_value({'Cierra Vega': 12, 'Alden Cantrell': 12, 'Kierra Gentry': 12, 'Pierre Cox': 12},5)==False", "assert drop_empty({'c1': 'Red', 'c2': 'Green', 'c3':None})=={'c1': 'Red', 'c2': 'Green'}\nassert drop_empty({'c1': 'Red', 'c2': None, 'c3':None})=={'c1': 'Red'}\nassert drop_empty({'c1': None, 'c2': 'Green', 'c3':None})=={ 'c2': 'Green'}", "assert find_peak([1, 3, 20, 4, 1, 0], 6) == 2\nassert find_peak([2, 3, 4, 5, 6], 5) == 4\nassert find_peak([8, 9, 11, 12, 14, 15], 6) == 5 ", "assert decimal_to_Octal(10) == 12\nassert decimal_to_Octal(2) == 2\nassert decimal_to_Octal(33) == 41", "assert max_product([3, 100, 4, 5, 150, 6], 6) == 45000 \nassert max_product([4, 42, 55, 68, 80], 5) == 50265600\nassert max_product([10, 22, 9, 33, 21, 50, 41, 60], 8) == 21780000 ", "assert max_profit([1, 5, 2, 3, 7, 6, 4, 5], 3) == 10\nassert max_profit([2, 4, 7, 5, 4, 3, 5], 2) == 7\nassert max_profit([10, 6, 8, 4, 2], 2) == 2", "assert add_pairwise((1, 5, 7, 8, 10)) == (6, 12, 15, 18)\nassert 
add_pairwise((2, 6, 8, 9, 11)) == (8, 14, 17, 20)\nassert add_pairwise((3, 7, 9, 10, 12)) == (10, 16, 19, 22)", "assert find_remainder([ 100, 10, 5, 25, 35, 14 ],6,11) ==9\nassert find_remainder([1,1,1],3,1) == 0\nassert find_remainder([1,2,1],3,2) == 0", "assert check_Consecutive([1,2,3,4,5]) == True\nassert check_Consecutive([1,2,3,5,6]) == False\nassert check_Consecutive([1,2,1]) == False", "assert tuple_intersection([(3, 4), (5, 6), (9, 10), (4, 5)] , [(5, 4), (3, 4), (6, 5), (9, 11)]) == {(4, 5), (3, 4), (5, 6)}\nassert tuple_intersection([(4, 1), (7, 4), (11, 13), (17, 14)] , [(1, 4), (7, 4), (16, 12), (10, 13)]) == {(4, 7), (1, 4)}\nassert tuple_intersection([(2, 1), (3, 2), (1, 3), (1, 4)] , [(11, 2), (2, 3), (6, 2), (1, 3)]) == {(1, 3), (2, 3)}", "assert replace_char(\"polygon\",'y','l')==(\"pollgon\")\nassert replace_char(\"character\",'c','a')==(\"aharaater\")\nassert replace_char(\"python\",'l','a')==(\"python\")", "assert sort_counter({'Math':81, 'Physics':83, 'Chemistry':87})==[('Chemistry', 87), ('Physics', 83), ('Math', 81)]\nassert sort_counter({'Math':400, 'Physics':300, 'Chemistry':250})==[('Math', 400), ('Physics', 300), ('Chemistry', 250)]\nassert sort_counter({'Math':900, 'Physics':1000, 'Chemistry':1250})==[('Chemistry', 1250), ('Physics', 1000), ('Math', 900)]", "assert big_sum([1,2,3]) == 4\nassert big_sum([-1,2,3,4]) == 3\nassert big_sum([2,3,6]) == 8", "assert is_lower(\"InValid\") == \"invalid\"\nassert is_lower(\"TruE\") == \"true\"\nassert is_lower(\"SenTenCE\") == \"sentence\"", "assert remove_lowercase(\"PYTHon\")==('PYTH')\nassert remove_lowercase(\"FInD\")==('FID')\nassert remove_lowercase(\"STRinG\")==('STRG')", "assert first_Digit(123) == 1\nassert first_Digit(456) == 4\nassert first_Digit(12) == 1", "assert get_max_occuring_char(\"data\") == \"a\"\nassert get_max_occuring_char(\"create\") == \"e\"\nassert get_max_occuring_char(\"brilliant girl\") == \"i\"", "assert is_subset_sum([3, 34, 4, 12, 5, 2], 6, 9) == True\nassert 
is_subset_sum([3, 34, 4, 12, 5, 2], 6, 30) == False\nassert is_subset_sum([3, 34, 4, 12, 5, 2], 6, 15) == True", "assert match(\"Geeks\") == 'Yes'\nassert match(\"geeksforGeeks\") == 'Yes'\nassert match(\"geeks\") == 'No'", "assert first_Factorial_Divisible_Number(10) == 5\nassert first_Factorial_Divisible_Number(15) == 5\nassert first_Factorial_Divisible_Number(5) == 4", "assert remove_matching_tuple([('Hello', 'dude'), ('How', 'are'), ('you', '?')], [('Hello', 'dude'), ('How', 'are')]) == [('you', '?')]\nassert remove_matching_tuple([('Part', 'of'), ('the', 'journey'), ('is ', 'end')], [('Journey', 'the'), ('is', 'end')]) == [('Part', 'of'), ('the', 'journey'), ('is ', 'end')]\nassert remove_matching_tuple([('Its', 'been'), ('a', 'long'), ('day', 'without')], [('a', 'long'), ('my', 'friend')]) == [('Its', 'been'), ('day', 'without')]", "assert largest_palindrome([1, 232, 54545, 999991], 4) == 54545\nassert largest_palindrome([1, 2, 3, 4, 5, 50], 6) == 5\nassert largest_palindrome([1, 3, 7, 9, 45], 5) == 9", "assert binomial_probability(10, 5, 1.0/3) == 0.13656454808718185\nassert binomial_probability(11, 6, 2.0/4) == 0.2255859375\nassert binomial_probability(12, 7, 3.0/5) == 0.227030335488", "assert sort_tuple([(1, 3), (3, 2), (2, 1)] ) == [(2, 1), (3, 2), (1, 3)]\nassert sort_tuple([(2, 4), (3, 3), (1, 1)] ) == [(1, 1), (3, 3), (2, 4)]\nassert sort_tuple([(3, 9), (6, 7), (4, 3)] ) == [(4, 3), (6, 7), (3, 9)]", "assert area_pentagon(5)==43.01193501472417\nassert area_pentagon(10)==172.0477400588967\nassert area_pentagon(15)==387.10741513251753", "assert frequency_Of_Largest(5,[1,2,3,4,4]) == 2\nassert frequency_Of_Largest(3,[5,6,5]) == 1\nassert frequency_Of_Largest(4,[2,7,7,7]) == 3", "assert extract_symmetric([(6, 7), (2, 3), (7, 6), (9, 8), (10, 2), (8, 9)] ) == {(8, 9), (6, 7)}\nassert extract_symmetric([(7, 8), (3, 4), (8, 7), (10, 9), (11, 3), (9, 10)] ) == {(9, 10), (7, 8)}\nassert extract_symmetric([(8, 9), (4, 5), (9, 8), (11, 10), (12, 4), (10, 11)] ) 
== {(8, 9), (10, 11)}", "assert sum_gp(1,5,2)==31\nassert sum_gp(1,5,4)==341\nassert sum_gp(2,6,3)==728", "assert binary_search([1,2,3,5,8], 6) == False\nassert binary_search([7, 8, 9, 10, 13], 10) == True\nassert binary_search([11, 13, 14, 19, 22, 36], 23) == False", "assert calculate_polygons(1,1, 4, 4, 3)==[[(-5.0, -4.196152422706632), (-5.0, -0.7320508075688767), (-2.0, 1.0), (1.0, -0.7320508075688767), (1.0, -4.196152422706632), (-2.0, -5.928203230275509), (-5.0, -4.196152422706632)], [(1.0, -4.196152422706632), (1.0, -0.7320508075688767), (4.0, 1.0), (7.0, -0.7320508075688767), (7.0, -4.196152422706632), (4.0, -5.928203230275509), (1.0, -4.196152422706632)], [(7.0, -4.196152422706632), (7.0, -0.7320508075688767), (10.0, 1.0), (13.0, -0.7320508075688767), (13.0, -4.196152422706632), (10.0, -5.928203230275509), (7.0, -4.196152422706632)], [(-2.0, 1.0000000000000004), (-2.0, 4.464101615137755), (1.0, 6.196152422706632), (4.0, 4.464101615137755), (4.0, 1.0000000000000004), (1.0, -0.7320508075688767), (-2.0, 1.0000000000000004)], [(4.0, 1.0000000000000004), (4.0, 4.464101615137755), (7.0, 6.196152422706632), (10.0, 4.464101615137755), (10.0, 1.0000000000000004), (7.0, -0.7320508075688767), (4.0, 1.0000000000000004)], [(-5.0, 6.196152422706632), (-5.0, 9.660254037844387), (-2.0, 11.392304845413264), (1.0, 9.660254037844387), (1.0, 6.196152422706632), (-2.0, 4.464101615137755), (-5.0, 6.196152422706632)], [(1.0, 6.196152422706632), (1.0, 9.660254037844387), (4.0, 11.392304845413264), (7.0, 9.660254037844387), (7.0, 6.196152422706632), (4.0, 4.464101615137755), (1.0, 6.196152422706632)], [(7.0, 6.196152422706632), (7.0, 9.660254037844387), (10.0, 11.392304845413264), (13.0, 9.660254037844387), (13.0, 6.196152422706632), (10.0, 4.464101615137755), (7.0, 6.196152422706632)], [(-2.0, 11.392304845413264), (-2.0, 14.85640646055102), (1.0, 16.588457268119896), (4.0, 14.85640646055102), (4.0, 11.392304845413264), (1.0, 9.660254037844387), (-2.0, 11.392304845413264)], [(4.0, 
11.392304845413264), (4.0, 14.85640646055102), (7.0, 16.588457268119896), (10.0, 14.85640646055102), (10.0, 11.392304845413264), (7.0, 9.660254037844387), (4.0, 11.392304845413264)]]\nassert calculate_polygons(5,4,7,9,8)==[[(-11.0, -9.856406460551018), (-11.0, -0.6188021535170058), (-3.0, 4.0), (5.0, -0.6188021535170058), (5.0, -9.856406460551018), (-3.0, -14.475208614068023), (-11.0, -9.856406460551018)], [(5.0, -9.856406460551018), (5.0, -0.6188021535170058), (13.0, 4.0), (21.0, -0.6188021535170058), (21.0, -9.856406460551018), (13.0, -14.475208614068023), (5.0, -9.856406460551018)], [(21.0, -9.856406460551018), (21.0, -0.6188021535170058), (29.0, 4.0), (37.0, -0.6188021535170058), (37.0, -9.856406460551018), (29.0, -14.475208614068023), (21.0, -9.856406460551018)], [(-3.0, 4.0), (-3.0, 13.237604307034012), (5.0, 17.856406460551018), (13.0, 13.237604307034012), (13.0, 4.0), (5.0, -0.6188021535170058), (-3.0, 4.0)], [(13.0, 4.0), (13.0, 13.237604307034012), (21.0, 17.856406460551018), (29.0, 13.237604307034012), (29.0, 4.0), (21.0, -0.6188021535170058), (13.0, 4.0)], [(-11.0, 17.856406460551018), (-11.0, 27.09401076758503), (-3.0, 31.712812921102035), (5.0, 27.09401076758503), (5.0, 17.856406460551018), (-3.0, 13.237604307034012), (-11.0, 17.856406460551018)], [(5.0, 17.856406460551018), (5.0, 27.09401076758503), (13.0, 31.712812921102035), (21.0, 27.09401076758503), (21.0, 17.856406460551018), (13.0, 13.237604307034012), (5.0, 17.856406460551018)], [(21.0, 17.856406460551018), (21.0, 27.09401076758503), (29.0, 31.712812921102035), (37.0, 27.09401076758503), (37.0, 17.856406460551018), (29.0, 13.237604307034012), (21.0, 17.856406460551018)], [(-3.0, 31.712812921102035), (-3.0, 40.95041722813605), (5.0, 45.569219381653056), (13.0, 40.95041722813605), (13.0, 31.712812921102035), (5.0, 27.09401076758503), (-3.0, 31.712812921102035)], [(13.0, 31.712812921102035), (13.0, 40.95041722813605), (21.0, 45.569219381653056), (29.0, 40.95041722813605), (29.0, 
31.712812921102035), (21.0, 27.09401076758503), (13.0, 31.712812921102035)]]\nassert calculate_polygons(9,6,4,3,2)==[[(5.0, 2.5358983848622456), (5.0, 4.8452994616207485), (7.0, 6.0), (9.0, 4.8452994616207485), (9.0, 2.5358983848622456), (7.0, 1.3811978464829942), (5.0, 2.5358983848622456)], [(7.0, 6.0), (7.0, 8.309401076758503), (9.0, 9.464101615137753), (11.0, 8.309401076758503), (11.0, 6.0), (9.0, 4.8452994616207485), (7.0, 6.0)]]", "assert binary_to_integer((1, 1, 0, 1, 0, 0, 1)) == '105'\nassert binary_to_integer((0, 1, 1, 0, 0, 1, 0, 1)) == '101'\nassert binary_to_integer((1, 1, 0, 1, 0, 1)) == '53'", "assert remove_lowercase('KDeoALOklOOHserfLoAJSIskdsf') == 'KDALOOOHLAJSI'\nassert remove_lowercase('ProducTnamEstreAmIngMediAplAYer') == 'PTEAIMAAY'\nassert remove_lowercase('maNufacTuredbYSheZenTechNolOGIes') == 'NTYSZTNOGI'", "assert heap_queue_smallest( [25, 35, 22, 85, 14, 65, 75, 25, 58],3)==[14, 22, 25] \nassert heap_queue_smallest( [25, 35, 22, 85, 14, 65, 75, 25, 58],2)==[14, 22]\nassert heap_queue_smallest( [25, 35, 22, 85, 14, 65, 75, 22, 58],5)==[14, 22, 22, 25, 35]", "assert surfacearea_cone(5,12)==282.7433388230814\nassert surfacearea_cone(10,15)==880.5179353159282\nassert surfacearea_cone(19,17)==2655.923961165254", "assert gcd(12, 17) == 1\nassert gcd(4,6) == 2\nassert gcd(2,9) == 1", "assert diameter_circle(10)==20\nassert diameter_circle(40)==80\nassert diameter_circle(15)==30", "assert concatenate_elements(['hello','there','have','a','rocky','day'] ) == ' hello there have a rocky day'\nassert concatenate_elements([ 'Hi', 'there', 'How','are', 'you'] ) == ' Hi there How are you'\nassert concatenate_elements([ 'Part', 'of', 'the','journey', 'is', 'end'] ) == ' Part of the journey is end'", "assert num_comm_div(2,4) == 2\nassert num_comm_div(2,8) == 2\nassert num_comm_div(12,24) == 6", "assert find(3,3) == 0\nassert find(10,3) == 1\nassert find(16,5) == 1", "assert add_consecutive_nums([1, 1, 3, 4, 4, 5, 6, 7])==[2, 4, 7, 8, 9, 11, 13]\nassert 
add_consecutive_nums([4, 5, 8, 9, 6, 10])==[9, 13, 17, 15, 16]\nassert add_consecutive_nums([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])==[3, 5, 7, 9, 11, 13, 15, 17, 19]", "assert sum_Of_Series(5) == 225\nassert sum_Of_Series(2) == 9\nassert sum_Of_Series(3) == 36", "assert re_order([6, 0, 8, 2, 3, 0, 4, 0, 1]) == [6, 8, 2, 3, 4, 1, 0, 0, 0]\nassert re_order([4, 0, 2, 7, 0, 9, 0, 12, 0]) == [4, 2, 7, 9, 12, 0, 0, 0, 0]\nassert re_order([3, 11, 0, 74, 14, 0, 1, 0, 2]) == [3, 11, 74, 14, 1, 2, 0, 0, 0]", "assert permutation_coefficient(10, 2) == 90\nassert permutation_coefficient(10, 3) == 720\nassert permutation_coefficient(10, 1) == 10", "assert remove_words(['red', 'green', 'blue', 'white', 'black', 'orange'],['white', 'orange'])==['red', 'green', 'blue', 'black']\nassert remove_words(['red', 'green', 'blue', 'white', 'black', 'orange'],['black', 'orange'])==['red', 'green', 'blue', 'white']\nassert remove_words(['red', 'green', 'blue', 'white', 'black', 'orange'],['blue', 'white'])==['red', 'green', 'black', 'orange']", "assert same_order([\"red\",\"green\",\"black\",\"orange\"],[\"red\",\"pink\",\"green\",\"white\",\"black\"])==True\nassert same_order([\"red\",\"pink\",\"green\",\"white\",\"black\"],[\"white\",\"orange\",\"pink\",\"black\"])==False\nassert same_order([\"red\",\"green\",\"black\",\"orange\"],[\"red\",\"pink\",\"green\",\"white\",\"black\"])==True", "assert average_Odd(9) == 5\nassert average_Odd(5) == 3\nassert average_Odd(11) == 6", "assert no_of_subsequences([1,2,3,4], 10) == 11\nassert no_of_subsequences([4,8,7,2], 50) == 9\nassert no_of_subsequences([5,6,7,8], 15) == 4"]
--------------------------------------------------------------------------------
/src/script/coreset.sh:
--------------------------------------------------------------------------------
1 | python data/raw_code_collection/main.py \
2 | --data_path input_example.jsonl \
3 | --save_path data/seed.jsonl \
4 | --batch_size 4096 \
5 | --seed 42 \
6 | --model_name sentence-transformers/all-roberta-large-v1 \
7 | --coreset_size 2
--------------------------------------------------------------------------------
/src/script/data_generate.sh:
--------------------------------------------------------------------------------
1 | set -ex
2 |
3 | python data/llm_gen_dis/main.py \
4 | --source_data_path data/seed.jsonl \
5 | --gen_prompt_path data/llm_gen_dis/prompt/generator.txt \
6 | --dis_prompt_path data/llm_gen_dis/prompt/discriminator.txt \
7 | --good_case_path data/llm_gen_dis/fewshot_case/good_case \
8 | --bad_case_path data/llm_gen_dis/fewshot_case/bad_case \
9 | --data_stream_path data/result.txt \
10 | --save_json \
11 | --gen_max_token 800 \
12 | --sample_size 1 \
13 | --output_data_path data/result.json \
14 | --openai_key your_key \
15 | --openai_url url \
16 | --openai_model your_model_name
--------------------------------------------------------------------------------
/src/script/evaluate.sh:
--------------------------------------------------------------------------------
1 | python -m eval.mbpp_500.evaluate \
2 | --reference_path eval/reference/references_mbpp.json \
3 | --gen_code_path eval/mbpp_generation.json \
4 | --analyze_generation \
5 | --save_path eval/analysis/
6 |
--------------------------------------------------------------------------------
/src/script/generate.sh:
--------------------------------------------------------------------------------
1 | MODEL=microsoft/wavecoder-ds-6.7b
2 | python eval/mbpp_500/generate.py \
3 | --model_path $MODEL \
4 | --save_path eval/mbpp_generation.json \
5 | --max_new_tokens 512 \
6 | --temperature 0.0 \
7 | --n_samples 1 \
8 | --batch_size 1 \
9 | --top_p 1.0
10 |
--------------------------------------------------------------------------------
/src/script/train.sh:
--------------------------------------------------------------------------------
1 | LOG_PATH=train/log.txt
2 | torchrun --nproc_per_node=8 --master_port=20001 train/train_mem.py \
3 | --model_name_or_path deepseek-ai/deepseek-coder-6.7b-base \
4 | --data_path data_path \
5 | --bf16 True \
6 | --tf32 True \
7 | --output_dir your_path \
8 | --num_train_epochs 3 \
9 | --per_device_train_batch_size 4 \
10 | --per_device_eval_batch_size 2 \
11 | --gradient_accumulation_steps 16 \
12 | --evaluation_strategy "no" \
13 | --save_strategy "steps" \
14 | --save_steps 30 \
15 | --optim adafactor \
16 | --is_save_loss_spike False \
17 | --save_total_limit 17 \
18 | --learning_rate 5e-5 \
19 | --weight_decay 0. \
20 | --warmup_steps 15 \
21 | --lr_scheduler_type "linear" \
22 | --logging_steps 1 \
23 | --fsdp "full_shard auto_wrap" \
24 | --adam_epsilon 1e-6 \
25 | --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
26 | --model_max_length 2048 \
27 | --gradient_checkpointing True \
28 | --lazy_preprocess False > $LOG_PATH 2>&1
--------------------------------------------------------------------------------
/src/train/llama2_flash_attn_monkey_patch.py:
--------------------------------------------------------------------------------
1 | import warnings
2 | from typing import Optional, Tuple
3 |
4 | import torch
5 | from flash_attn import __version__ as flash_attn_version
6 | from flash_attn.bert_padding import pad_input, unpad_input
7 | from flash_attn.flash_attn_interface import (
8 | flash_attn_func,
9 | flash_attn_varlen_kvpacked_func,
10 | )
11 | from transformers.models.llama.modeling_llama import (
12 | LlamaAttention,
13 | LlamaModel,
14 | rotate_half,
15 | )
16 |
17 |
18 | def apply_rotary_pos_emb(q, k, cos_sin, position_ids):
19 | gather_indices = position_ids[:, :, None, None] # [bsz, seq_len, 1, 1]
20 | gather_indices = gather_indices.repeat(
21 | 1, 1, cos_sin[0].shape[1], cos_sin[0].shape[3]
22 | )
23 | bsz = gather_indices.shape[0]
24 | cos, sin = (
25 | torch.gather(x.transpose(1, 2).repeat(bsz, 1, 1, 1), 1, gather_indices)
26 | for x in cos_sin
27 | )
28 | q, k = ((x * cos) + (rotate_half(x) * sin) for x in (q, k))
29 | return q, k
30 |
31 |
32 | def forward(
33 | self,
34 | hidden_states: torch.Tensor,
35 | attention_mask: Optional[torch.Tensor] = None,
36 | position_ids: Optional[torch.Tensor] = None,
37 | past_key_value: Optional[Tuple[torch.Tensor]] = None,
38 | output_attentions: bool = False,
39 | use_cache: bool = False,
40 | padding_mask: Optional[torch.Tensor] = None,
41 | ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
42 | if output_attentions:
43 | warnings.warn(
44 | "Output attentions is not supported for patched `LlamaAttention`, returning `None` instead."
45 | )
46 |
47 | bsz, q_len, _ = hidden_states.size()
48 | kv_heads = getattr(self, "num_key_value_heads", self.num_heads)
49 |
50 | q, k, v = (
51 | op(hidden_states).view(bsz, q_len, nh, self.head_dim)
52 | for op, nh in (
53 | (self.q_proj, self.num_heads),
54 | (self.k_proj, kv_heads),
55 | (self.v_proj, kv_heads),
56 | )
57 | )
58 | # shape: (b, s, num_heads, head_dim)
59 |
60 | kv_seq_len = k.shape[1]
61 | past_kv_len = 0
62 | if past_key_value is not None:
63 | past_kv_len = past_key_value[0].shape[2]
64 | kv_seq_len += past_kv_len
65 |
66 | cos_sin = self.rotary_emb(v, seq_len=kv_seq_len)
67 | q, k = apply_rotary_pos_emb(q, k, cos_sin, position_ids)
68 |
69 | if past_key_value is not None:
70 | assert (
71 | flash_attn_version >= "2.1.0"
72 | ), "past_key_value support requires flash-attn >= 2.1.0"
73 | # reuse k, v
74 | k = torch.cat([past_key_value[0].transpose(1, 2), k], dim=1)
75 | v = torch.cat([past_key_value[1].transpose(1, 2), v], dim=1)
76 |
77 | past_key_value = (k.transpose(1, 2), v.transpose(1, 2)) if use_cache else None
78 |
79 | if attention_mask is None:
80 | output = flash_attn_func(q, k, v, 0.0, softmax_scale=None, causal=True).view(
81 | bsz, q_len, -1
82 | )
83 | else:
84 | q, indices, cu_q_lens, max_s = unpad_input(q, attention_mask[:, -q_len:])
85 | # We can skip concat and call unpad twice but seems better to call unpad only once.
86 | kv, _, cu_k_lens, max_k = unpad_input(
87 | torch.stack((k, v), dim=2), attention_mask
88 | )
89 | output_unpad = flash_attn_varlen_kvpacked_func(
90 | q,
91 | kv,
92 | cu_q_lens,
93 | cu_k_lens,
94 | max_s,
95 | max_k,
96 | 0.0,
97 | softmax_scale=None,
98 | causal=True,
99 | )
100 | output_unpad = output_unpad.reshape(-1, self.num_heads * self.head_dim)
101 | output = pad_input(output_unpad, indices, bsz, q_len)
102 |
103 | return self.o_proj(output), None, past_key_value
104 |
105 |
106 | # Disable the transformation of the attention mask in LlamaModel as flash attention
107 | # takes a boolean key_padding_mask. Fills in the past kv length for use in forward.
108 | def _prepare_decoder_attention_mask(
109 | self, attention_mask, input_shape, inputs_embeds, past_key_values_length
110 | ):
111 | # [bsz, seq_len]
112 | if past_key_values_length > 0 and attention_mask is not None:
113 | attention_mask = torch.cat(
114 | (
115 | torch.full(
116 | (input_shape[0], past_key_values_length),
117 | True,
118 | dtype=attention_mask.dtype,
119 | device=attention_mask.device,
120 | ),
121 | attention_mask,
122 | ),
123 | dim=-1,
124 | )
125 |
126 | if attention_mask is not None and torch.all(attention_mask):
127 | return None # This uses the faster call when training with full samples
128 |
129 | return attention_mask
130 |
131 |
132 | def replace_llama_attn_with_flash_attn():
133 | cuda_major, cuda_minor = torch.cuda.get_device_capability()
134 | if cuda_major < 8:
135 | warnings.warn(
136 | "Flash attention is only supported on A100 or H100 GPU during training due to head dim > 64 backward."
137 | "ref: https://github.com/HazyResearch/flash-attention/issues/190#issuecomment-1523359593"
138 | )
139 |
140 | LlamaModel._prepare_decoder_attention_mask = _prepare_decoder_attention_mask
141 | LlamaAttention.forward = forward
142 |
143 |
144 | def test():
145 | from fastchat.train.llama_flash_attn_monkey_patch import forward as fastchat_forward
146 | from transformers.models.llama.configuration_llama import LlamaConfig
147 |
148 | config = LlamaConfig(
149 | hidden_size=1024,
150 | intermediate_size=128,
151 | num_hidden_layers=1,
152 | num_attention_heads=8,
153 | max_position_embeddings=16,
154 | )
155 | device = torch.device("cuda")
156 | model = LlamaModel(config)
157 | attn = LlamaAttention(config).to(device).half()
158 | bsz, hs, seqlen = 2, config.hidden_size, config.max_position_embeddings
159 | position_ids = torch.arange(seqlen, dtype=torch.long, device=device).view(
160 | -1, seqlen
161 | )
162 |
163 | mask = torch.full((bsz, seqlen), True, dtype=torch.bool, device=device)
164 | for i in range(4):
165 | hidden = torch.rand((bsz, seqlen, hs), dtype=torch.float16, device=device)
166 | if i:
167 | mask[0, -i:] = False
168 | mask[1, :i] = False
169 |
170 | lmask = model._prepare_decoder_attention_mask(mask, hidden.shape[:2], hidden, 0)
171 | ref, _, _ = attn.forward(
172 | hidden, attention_mask=lmask, position_ids=position_ids
173 | )
174 |
175 | fast, _, _ = fastchat_forward(
176 | attn, hidden, attention_mask=mask, position_ids=position_ids
177 | )
178 |
179 | lmask = _prepare_decoder_attention_mask(
180 | model, mask, hidden.shape[:2], hidden, 0
181 | )
182 | test, _, _ = forward(
183 | attn, hidden, attention_mask=lmask, position_ids=position_ids
184 | )
185 |
186 | print(f"Mean(abs(ref)) = {torch.mean(torch.abs(ref))}")
187 | print(f"Mean(abs(ref - fast)) = {torch.mean(torch.abs(ref - fast))}")
188 | print(f"Mean(abs(ref - test)) = {torch.mean(torch.abs(ref - test))}")
189 | print(f"Mean(abs(fast - test)) = {torch.mean(torch.abs(fast - test))}")
190 | print(f"allclose(fast, test) = {torch.allclose(fast, test)}")
191 |
192 | with torch.no_grad():
193 | # Also check that past_kv is handled properly
194 | hidden = torch.rand((bsz, seqlen, hs), dtype=torch.float16, device=device)
195 | part_len = seqlen // 4
196 | assert part_len * 4 == seqlen
197 | mask = torch.full((bsz, seqlen), True, dtype=torch.bool, device=device)
198 | mask[0, -2:] = False
199 | lmask = _prepare_decoder_attention_mask(
200 | model, mask, hidden.shape[:2], hidden, 0
201 | )
202 | oneshot, _, _ = forward(
203 | attn, hidden, attention_mask=lmask, position_ids=position_ids
204 | )
205 | parts = []
206 | past_kv, past_kv_len = None, 0
207 | for i in range(4):
208 | start = part_len * i
209 | end = start + part_len
210 | hidden_part = hidden[:, start:end, ...]
211 | lmask = _prepare_decoder_attention_mask(
212 | model,
213 | mask[:, start:end],
214 | hidden_part.shape[:2],
215 | hidden_part,
216 | past_kv_len,
217 | )
218 | part, _, past_kv = forward(
219 | attn,
220 | hidden_part.clone(),
221 | attention_mask=lmask,
222 | position_ids=position_ids[:, start:end],
223 | past_key_value=past_kv,
224 | use_cache=True,
225 | )
226 | parts.append(part)
227 | past_kv_len = past_kv[0].shape[2]
228 |
229 | print(
230 | f"allclose(oneshot[:, 0], parts[0]) = {torch.allclose(oneshot[:, :part_len], parts[0])}"
231 | )
232 | print(
233 | f"allclose(oneshot, parts) = {torch.allclose(oneshot, torch.cat(parts, dim=1))}"
234 | )
235 |
236 |
237 | if __name__ == "__main__":
238 | test()
239 |
--------------------------------------------------------------------------------
/src/train/train.py:
--------------------------------------------------------------------------------
1 | # This code is based on tatsu-lab/stanford_alpaca (https://github.com/tatsu-lab/stanford_alpaca).
2 |
3 | from dataclasses import dataclass, field
4 | import json
5 | import logging
6 | import math
7 | import pathlib
8 | import copy
9 | from typing import Dict, Optional, Sequence
10 |
11 | import numpy as np
12 | import torch
13 | from torch.utils.data import Dataset
14 | import transformers
15 | from transformers import Trainer
16 | from transformers.trainer_pt_utils import LabelSmoother
17 | from transformers import set_seed
18 |
19 | import utils
20 | from fastchat.conversation import SeparatorStyle
21 | from fastchat.model.model_adapter import get_conversation_template
22 |
23 | IGNORE_TOKEN_ID = LabelSmoother.ignore_index
24 |
25 |
26 | IGNORE_INDEX = -100
27 | DEFAULT_PAD_TOKEN = "[PAD]"
28 | DEFAULT_EOS_TOKEN = "</s>"
29 | DEFAULT_BOS_TOKEN = "<s>"
30 | DEFAULT_UNK_TOKEN = "<unk>"
31 | PROMPT_DICT = {
32 | "prompt_input": (
33 | "Below is an instruction that describes a task, paired with an input that provides further context. "
34 | "Write a response that appropriately completes the request.\n\n"
35 | "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
36 | ),
37 | "prompt_no_input": (
38 | "Below is an instruction that describes a task. "
39 | "Write a response that appropriately completes the request.\n\n"
40 | "### Instruction:\n{instruction}\n\n### Response:"
41 | ),
42 | }
43 |
44 |
45 | set_seed(42)
46 |
47 |
48 | @dataclass
49 | class ModelArguments:
50 | model_name_or_path: Optional[str] = field(default="facebook/opt-125m")
51 |
52 |
53 | @dataclass
54 | class DataArguments:
55 | data_path: str = field(
56 | default=None, metadata={"help": "Path to the training data."}
57 | )
58 | eval_data_path: str = field(
59 | default=None, metadata={"help": "Path to the evaluation data."}
60 | )
61 | lazy_preprocess: bool = False
62 |
63 |
64 | @dataclass
65 | class TrainingArguments(transformers.TrainingArguments):
66 | cache_dir: Optional[str] = field(default=None)
67 | optim: str = field(default="adamw_torch")
68 | model_max_length: int = field(
69 | default=512,
70 | metadata={
71 | "help": "Maximum sequence length. Sequences will be right padded (and possibly truncated)."
72 | },
73 | )
74 | is_save_loss_spike: bool = field(default=False)
75 |
76 |
77 | local_rank = None
78 |
79 |
80 | def rank0_print(*args):
81 | if local_rank == 0:
82 | print(*args)
83 |
84 |
85 | def trainer_save_model_safe(trainer: transformers.Trainer):
86 | from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
87 | from torch.distributed.fsdp import StateDictType, FullStateDictConfig
88 |
89 | save_policy = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
90 | with FSDP.state_dict_type(
91 | trainer.model, StateDictType.FULL_STATE_DICT, save_policy
92 | ):
93 | trainer.save_model()
94 |
95 |
96 | def smart_tokenizer_and_embedding_resize(
97 | special_tokens_dict: Dict,
98 | tokenizer: transformers.PreTrainedTokenizer,
99 | model: transformers.PreTrainedModel,
100 | ):
101 | """Resize tokenizer and embedding.
102 |
103 | Note: This is the unoptimized version that may make your embedding size not be divisible by 64.
104 | """
105 | num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
106 | model.resize_token_embeddings(len(tokenizer))
107 |
108 | if num_new_tokens > 0:
109 | input_embeddings = model.get_input_embeddings().weight.data
110 | output_embeddings = model.get_output_embeddings().weight.data
111 |
112 | input_embeddings_avg = input_embeddings[:-num_new_tokens].mean(
113 | dim=0, keepdim=True
114 | )
115 | output_embeddings_avg = output_embeddings[:-num_new_tokens].mean(
116 | dim=0, keepdim=True
117 | )
118 |
119 | input_embeddings[-num_new_tokens:] = input_embeddings_avg
120 | output_embeddings[-num_new_tokens:] = output_embeddings_avg
121 |
122 |
123 | def _tokenize_fn(
124 | strings: Sequence[str], tokenizer: transformers.PreTrainedTokenizer
125 | ) -> Dict:
126 | """Tokenize a list of strings."""
127 | tokenized_list = [
128 | tokenizer(
129 | text,
130 | return_tensors="pt",
131 | padding="longest",
132 | max_length=tokenizer.model_max_length,
133 | truncation=True,
134 | )
135 | for text in strings
136 | ]
137 | input_ids = labels = [tokenized.input_ids[0] for tokenized in tokenized_list]
138 | input_ids_lens = labels_lens = [
139 | tokenized.input_ids.ne(tokenizer.pad_token_id).sum().item()
140 | for tokenized in tokenized_list
141 | ]
142 | return dict(
143 | input_ids=input_ids,
144 | labels=labels,
145 | input_ids_lens=input_ids_lens,
146 | labels_lens=labels_lens,
147 | )
148 |
149 |
150 | def preprocess(
151 | sources: Sequence[str],
152 | targets: Sequence[str],
153 | tokenizer: transformers.PreTrainedTokenizer,
154 | ) -> Dict:
155 | """Preprocess the data by tokenizing."""
156 | examples = [s + t for s, t in zip(sources, targets)]
157 | examples_tokenized, sources_tokenized = [
158 | _tokenize_fn(strings, tokenizer) for strings in (examples, sources)
159 | ]
160 | input_ids = examples_tokenized["input_ids"]
161 | labels = copy.deepcopy(input_ids)
162 | for label, source_len in zip(labels, sources_tokenized["input_ids_lens"]):
163 | label[:source_len] = IGNORE_INDEX
164 | return dict(input_ids=input_ids, labels=labels)
165 |
166 |
167 | class SupervisedDataset(Dataset):
168 | """Dataset for supervised fine-tuning."""
169 |
170 | def __init__(self, data_path: str, tokenizer: transformers.PreTrainedTokenizer):
171 | super(SupervisedDataset, self).__init__()
172 | logging.warning("Loading data...")
173 | list_data_dict = utils.jload(data_path)
174 |
175 | logging.warning("Formatting inputs...")
176 | prompt_input, prompt_no_input = (
177 | PROMPT_DICT["prompt_input"],
178 | PROMPT_DICT["prompt_no_input"],
179 | )
180 | sources = [
181 | (
182 | prompt_input.format_map(example)
183 | if example.get("input", "") != ""
184 | else prompt_no_input.format_map(example)
185 | )
186 | for example in list_data_dict
187 | ]
188 | targets = [
189 | f"{example['output']}{tokenizer.eos_token}" for example in list_data_dict
190 | ]
191 |
192 | logging.warning("Tokenizing inputs... This may take some time...")
193 | data_dict = preprocess(sources, targets, tokenizer)
194 |
195 | self.input_ids = data_dict["input_ids"]
196 | self.labels = data_dict["labels"]
197 |
198 | def __len__(self):
199 | return len(self.input_ids)
200 |
201 | def __getitem__(self, i) -> Dict[str, torch.Tensor]:
202 | return dict(input_ids=self.input_ids[i], labels=self.labels[i])
203 |
204 |
205 | @dataclass
206 | class DataCollatorForSupervisedDataset(object):
207 | """Collate examples for supervised fine-tuning."""
208 |
209 | tokenizer: transformers.PreTrainedTokenizer
210 |
211 | def __call__(self, instances: Sequence[Dict]) -> Dict[str, torch.Tensor]:
212 | input_ids, labels = tuple(
213 | [instance[key] for instance in instances] for key in ("input_ids", "labels")
214 | )
215 | input_ids = torch.nn.utils.rnn.pad_sequence(
216 | input_ids, batch_first=True, padding_value=self.tokenizer.pad_token_id
217 | )
218 | labels = torch.nn.utils.rnn.pad_sequence(
219 | labels, batch_first=True, padding_value=IGNORE_INDEX
220 | )
221 | return dict(
222 | input_ids=input_ids,
223 | labels=labels,
224 | attention_mask=input_ids.ne(self.tokenizer.pad_token_id),
225 | )
226 |
227 |
228 | def make_supervised_data_module(
229 | tokenizer: transformers.PreTrainedTokenizer, data_args
230 | ) -> Dict:
231 | """Make dataset and collator for supervised fine-tuning."""
232 | train_dataset = SupervisedDataset(
233 | tokenizer=tokenizer, data_path=data_args.data_path
234 | )
235 | data_collator = DataCollatorForSupervisedDataset(tokenizer=tokenizer)
236 | print(f"len={len(train_dataset)}")
237 | return dict(
238 | train_dataset=train_dataset, eval_dataset=None, data_collator=data_collator
239 | )
240 |
241 |
242 | def train():
243 | global local_rank
244 |
245 | parser = transformers.HfArgumentParser(
246 | (ModelArguments, DataArguments, TrainingArguments)
247 | )
248 | model_args, data_args, training_args = parser.parse_args_into_dataclasses()
249 | local_rank = training_args.local_rank
250 |
251 | # Set RoPE scaling factor
252 | config = transformers.AutoConfig.from_pretrained(
253 | model_args.model_name_or_path,
254 | cache_dir=training_args.cache_dir,
255 | )
256 | orig_ctx_len = getattr(config, "max_position_embeddings", None)
257 | if orig_ctx_len and training_args.model_max_length > orig_ctx_len:
258 | scaling_factor = float(math.ceil(training_args.model_max_length / orig_ctx_len))
259 | config.rope_scaling = {"type": "linear", "factor": scaling_factor}
260 | config.use_cache = False
261 |
262 | # Load model and tokenizer
263 | model = transformers.AutoModelForCausalLM.from_pretrained(
264 | model_args.model_name_or_path,
265 | config=config,
266 | cache_dir=training_args.cache_dir,
267 | )
268 |
269 | tokenizer = transformers.AutoTokenizer.from_pretrained(
270 | model_args.model_name_or_path,
271 | cache_dir=training_args.cache_dir,
272 | model_max_length=training_args.model_max_length,
273 | padding_side="right",
274 | use_fast=False,
275 | )
276 | # tokenizer.pad_token = tokenizer.unk_token
277 | special_tokens_dict = dict()
278 | if tokenizer.pad_token is None:
279 | special_tokens_dict["pad_token"] = DEFAULT_PAD_TOKEN
280 | if tokenizer.eos_token is None:
281 | special_tokens_dict["eos_token"] = DEFAULT_EOS_TOKEN
282 | if tokenizer.bos_token is None:
283 | special_tokens_dict["bos_token"] = DEFAULT_BOS_TOKEN
284 | if tokenizer.unk_token is None:
285 | special_tokens_dict["unk_token"] = DEFAULT_UNK_TOKEN
286 | print(f"special_tokens_dict={special_tokens_dict}\tuse_fast=False")
287 | smart_tokenizer_and_embedding_resize(
288 | special_tokens_dict=special_tokens_dict,
289 | tokenizer=tokenizer,
290 | model=model,
291 | )
292 |
293 | # Load data
294 | data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)
295 |
296 |     # Start trainer
297 | trainer = Trainer(
298 | model=model, tokenizer=tokenizer, args=training_args, **data_module
299 | )
300 | if list(pathlib.Path(training_args.output_dir).glob("checkpoint-*")):
301 | trainer.train(resume_from_checkpoint=True)
302 | else:
303 | trainer.train()
304 |
305 | # Save model
306 | model.config.use_cache = True
307 | trainer.save_state()
308 | trainer_save_model_safe(trainer)
309 |
310 |
311 | if __name__ == "__main__":
312 | train()
313 |
--------------------------------------------------------------------------------
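The `preprocess` function in `train.py` above masks out the prompt tokens so that only the response contributes to the loss: it copies `input_ids` into `labels`, then overwrites the first `source_len` positions with `IGNORE_INDEX` (-100, the value PyTorch's cross-entropy loss ignores). A minimal sketch of that masking step, using plain Python lists instead of tensors and toy token ids invented for the example:

```python
# Sketch of the label masking done in `preprocess`: positions covered by the
# source prompt are replaced with IGNORE_INDEX so the loss is computed only
# on the target (response) tokens.

IGNORE_INDEX = -100

def mask_source_tokens(input_ids, source_len):
    """Return a labels list where the first `source_len` positions are masked."""
    labels = list(input_ids)  # analogous to copy.deepcopy(input_ids)
    labels[:source_len] = [IGNORE_INDEX] * source_len
    return labels

# Toy token ids: 5 prompt tokens followed by 3 response tokens.
example = [101, 7, 8, 9, 102, 201, 202, 203]
labels = mask_source_tokens(example, source_len=5)
print(labels)  # [-100, -100, -100, -100, -100, 201, 202, 203]
```

The same -100 sentinel is used again by `DataCollatorForSupervisedDataset`, which pads `labels` with `IGNORE_INDEX` so padded positions are likewise excluded from the loss.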
/src/train/train_mem.py:
--------------------------------------------------------------------------------
1 |
2 | from llama2_flash_attn_monkey_patch import (
3 | replace_llama_attn_with_flash_attn,
4 | )
5 |
6 | replace_llama_attn_with_flash_attn()
7 |
8 | from train import train
9 |
10 |
11 | if __name__ == "__main__":
12 | train()
--------------------------------------------------------------------------------
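`train_mem.py` deliberately calls `replace_llama_attn_with_flash_attn()` *before* `from train import train`: a `from x import y` statement binds whatever object `y` currently is, so patching after the import would leave earlier bindings pointing at the original attention implementation. A toy illustration of that binding behavior (the module and function names here are invented for the example, not part of the repo):

```python
# Why patch order matters: early `from`-style bindings capture the
# pre-patch object, while attribute lookups see the patched one.
import types

attn = types.ModuleType("toy_attn")
attn.forward = lambda: "eager"

# `from toy_attn import forward`-style binding captures the current object:
early_binding = attn.forward

# Patching afterwards (analogous to replace_llama_attn_with_flash_attn)
# is too late for the early binding:
attn.forward = lambda: "flash"

print(early_binding())  # "eager" -- still the pre-patch function
print(attn.forward())   # "flash" -- attribute lookup sees the patch
```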
/src/train/utils.py:
--------------------------------------------------------------------------------
1 | import dataclasses
2 | import logging
3 | import math
4 | import os
5 | import io
6 | import sys
7 | import time
8 | import json
9 | from typing import Optional, Sequence, Union
10 |
11 | import openai
12 | import tqdm
13 | from openai import openai_object
14 | import copy
15 |
16 | StrOrOpenAIObject = Union[str, openai_object.OpenAIObject]
17 |
18 | openai_org = os.getenv("OPENAI_ORG")
19 | if openai_org is not None:
20 | openai.organization = openai_org
21 | logging.warning(f"Switching to organization: {openai_org} for OAI API key.")
22 |
23 |
24 | @dataclasses.dataclass
25 | class OpenAIDecodingArguments(object):
26 | max_tokens: int = 1800
27 | temperature: float = 0.2
28 | top_p: float = 1.0
29 | n: int = 1
30 | stream: bool = False
31 | stop: Optional[Sequence[str]] = None
32 | presence_penalty: float = 0.0
33 | frequency_penalty: float = 0.0
34 | suffix: Optional[str] = None
35 | logprobs: Optional[int] = None
36 | echo: bool = False
37 |
38 |
39 | def openai_completion(
40 | prompts: Union[str, Sequence[str], Sequence[dict[str, str]], dict[str, str]],
41 | decoding_args: OpenAIDecodingArguments,
42 | model_name="text-davinci-003",
43 | sleep_time=2,
44 | batch_size=1,
45 | max_instances=sys.maxsize,
46 | max_batches=sys.maxsize,
47 | return_text=False,
48 | **decoding_kwargs,
49 | ) -> Union[StrOrOpenAIObject, Sequence[StrOrOpenAIObject], Sequence[Sequence[StrOrOpenAIObject]]]:
50 | """Decode with OpenAI API.
51 |
52 | Args:
53 |         prompts: A string or a list of strings to complete. For chat models, each string should be formatted
54 |             as explained here: https://github.com/openai/openai-python/blob/main/chatml.md; a dictionary
55 |             (or list thereof) is also accepted, as explained here:
56 |             https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb
57 | decoding_args: Decoding arguments.
58 | model_name: Model name. Can be either in the format of "org/model" or just "model".
59 | sleep_time: Time to sleep once the rate-limit is hit.
60 |         batch_size: Number of prompts to send in a single request. Only used for non-chat models.
61 | max_instances: Maximum number of prompts to decode.
62 | max_batches: Maximum number of batches to decode. This argument will be deprecated in the future.
63 | return_text: If True, return text instead of full completion object (which contains things like logprob).
64 | decoding_kwargs: Additional decoding arguments. Pass in `best_of` and `logit_bias` if you need them.
65 |
66 | Returns:
67 | A completion or a list of completions.
68 |     Depending on return_text and decoding_args.n, the completion type can be one of
69 | - a string (if return_text is True)
70 | - an openai_object.OpenAIObject object (if return_text is False)
71 | - a list of objects of the above types (if decoding_args.n > 1)
72 | """
73 | is_single_prompt = isinstance(prompts, (str, dict))
74 | if is_single_prompt:
75 | prompts = [prompts]
76 |
77 | if max_batches < sys.maxsize:
78 | logging.warning(
79 | "`max_batches` will be deprecated in the future, please use `max_instances` instead."
80 | "Setting `max_instances` to `max_batches * batch_size` for now."
81 | )
82 | max_instances = max_batches * batch_size
83 |
84 | prompts = prompts[:max_instances]
85 | num_prompts = len(prompts)
86 | prompt_batches = [
87 | prompts[batch_id * batch_size : (batch_id + 1) * batch_size]
88 | for batch_id in range(int(math.ceil(num_prompts / batch_size)))
89 | ]
90 |
91 | completions = []
92 | for batch_id, prompt_batch in tqdm.tqdm(
93 | enumerate(prompt_batches),
94 | desc="prompt_batches",
95 | total=len(prompt_batches),
96 | ):
97 | batch_decoding_args = copy.deepcopy(decoding_args) # cloning the decoding_args
98 |
99 | while True:
100 | try:
101 | shared_kwargs = dict(
102 | model=model_name,
103 | **batch_decoding_args.__dict__,
104 | **decoding_kwargs,
105 | )
106 | completion_batch = openai.Completion.create(prompt=prompt_batch, **shared_kwargs)
107 | choices = completion_batch.choices
108 |
109 | for choice in choices:
110 | choice["total_tokens"] = completion_batch.usage.total_tokens
111 | completions.extend(choices)
112 | break
113 | except openai.error.OpenAIError as e:
114 | logging.warning(f"OpenAIError: {e}.")
115 | if "Please reduce your prompt" in str(e):
116 | batch_decoding_args.max_tokens = int(batch_decoding_args.max_tokens * 0.8)
117 | logging.warning(f"Reducing target length to {batch_decoding_args.max_tokens}, Retrying...")
118 | else:
119 | logging.warning("Hit request rate limit; retrying...")
120 | time.sleep(sleep_time) # Annoying rate limit on requests.
121 |
122 | if return_text:
123 | completions = [completion.text for completion in completions]
124 | if decoding_args.n > 1:
125 | # make completions a nested list, where each entry is a consecutive decoding_args.n of original entries.
126 | completions = [completions[i : i + decoding_args.n] for i in range(0, len(completions), decoding_args.n)]
127 | if is_single_prompt:
128 | # Return non-tuple if only 1 input and 1 generation.
129 | (completions,) = completions
130 | return completions
131 |
132 |
133 | def _make_w_io_base(f, mode: str):
134 | if not isinstance(f, io.IOBase):
135 | f_dirname = os.path.dirname(f)
136 | if f_dirname != "":
137 | os.makedirs(f_dirname, exist_ok=True)
138 | f = open(f, mode=mode)
139 | return f
140 |
141 |
142 | def _make_r_io_base(f, mode: str):
143 | if not isinstance(f, io.IOBase):
144 | f = open(f, mode=mode)
145 | return f
146 |
147 |
148 | def jdump(obj, f, mode="w", indent=4, default=str):
149 | """Dump a str or dictionary to a file in json format.
150 |
151 | Args:
152 | obj: An object to be written.
153 | f: A string path to the location on disk.
154 | mode: Mode for opening the file.
155 | indent: Indent for storing json dictionaries.
156 | default: A function to handle non-serializable entries; defaults to `str`.
157 | """
158 | f = _make_w_io_base(f, mode)
159 | if isinstance(obj, (dict, list)):
160 | json.dump(obj, f, indent=indent, default=default)
161 | elif isinstance(obj, str):
162 | f.write(obj)
163 | else:
164 | raise ValueError(f"Unexpected type: {type(obj)}")
165 | f.close()
166 |
167 |
168 | def jload(f, mode="r"):
169 | """Load a .json file into a dictionary."""
170 | f = _make_r_io_base(f, mode)
171 | jdict = json.load(f)
172 | f.close()
173 | return jdict
174 |
--------------------------------------------------------------------------------
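`jdump` and `jload` in `utils.py` above form a simple JSON round trip: `jdump` creates the parent directory if needed and serializes dicts and lists (falling back to `default=str` for non-serializable entries), and `jload` reads the result back. A self-contained sketch of that round trip, with the two helpers re-implemented inline rather than imported from the repo:

```python
# Round-trip sketch for the jdump/jload pattern, simplified from
# src/train/utils.py (context managers instead of explicit close()).
import json
import os
import tempfile

def jdump(obj, path, indent=4):
    """Write obj to path as JSON, creating parent directories as needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        json.dump(obj, f, indent=indent, default=str)

def jload(path):
    """Load a JSON file back into Python objects."""
    with open(path) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.json")
    # Shape matches the instruction-tuning records SupervisedDataset expects.
    record = [{"instruction": "add", "input": "1 2", "output": "3"}]
    jdump(record, path)
    assert jload(path) == record
print("round trip ok")
```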