├── .gitignore
├── README.md
├── alt_app.py
├── app.py
├── dataset_uploader.py
├── requirements.txt
└── templates
├── dpo_form.html
├── index.html
├── main.js
└── sft_form.html
/.gitignore:
--------------------------------------------------------------------------------
1 | /README_files/
2 | /venv/
3 | README.html
4 | dpo_data.json
5 | sft_data.json
6 | __pycache__/
7 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ShareGPT Builder
2 |
3 | ShareGPT Builder is a versatile Gradio application that provides two key functionalities for training Large Language Models (LLMs).
4 |
5 | The application is designed to run locally, with submitted examples stored in the application's directory, but it can also be served as a web application for anyone to use.
6 |
7 | ### Supervised Fine-Tuning (SFT) Conversation Sample Builder:
8 |
9 | Firstly, it allows you to manually construct and store supervised fine-tuning conversations in either the ShareGPT format (system, human, and gpt roles) or the standard format (system, user, and assistant roles). These conversations are automatically uploaded to Hugging Face.
10 |
11 | For datasets using this format, refer to the [Hermes 2.5 Dataset here](https://huggingface.co/datasets/teknium/OpenHermes-2.5).
12 |
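To make the two formats concrete, here is a rough sketch of how a single saved conversation differs between them (field names follow `chat_message` and `save_sft_data` in `app.py`; the saved record also carries `contributor`, `timestamp`, and `chat_format` columns, and the example values below are invented):

```python
# Illustrative only: the shape of one stored SFT record in each chat format.
sharegpt_record = {
    "conversations": [
        {"from": "system", "value": "You are a helpful assistant."},
        {"from": "human", "value": "What is the capital of France?"},
        {"from": "gpt", "value": "The capital of France is Paris."},
    ],
}

standard_record = {
    "conversations": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ],
}
```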
13 |
14 |
15 |
16 | ### Direct Preference Optimization (DPO) RLHF Sample Builder:
17 | Secondly, the application also includes a DPO Sample Builder. This feature lets you create paired response comparisons (a chosen and a rejected response to the same prompt) for Reinforcement Learning from Human Feedback (RLHF). The data is automatically uploaded to the Hub in the Intel NeuralChat DPO format.
18 |
19 |
20 |
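Roughly, a record in this format pairs one prompt with a preferred and a dispreferred answer. The sketch below uses the field names written by the `dpo_form` handler in `alt_app.py`; the values are invented:

```python
# Illustrative only: one Intel NeuralChat-style DPO record.
dpo_record = {
    "system": "You are a helpful assistant.",
    "question": "Write a haiku about autumn.",
    "chosen": "Crimson leaves drift down...",     # preferred response
    "rejected": "Autumn is when leaves fall.",    # dispreferred response
    "source": "manual",
}
```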
21 | ### Datasets inspector
22 | In this tab you can inspect all of your uploaded datasets. Because the data is not pushed in real time and there is an interval between commits, you may need to wait a little until the upload finishes and the Hugging Face dataset viewer has processed the newly committed data.
23 |
24 |
25 |
26 | ## Installation
27 |
28 | 1. Clone the repository:
29 | ```bash
30 | git clone https://github.com/teknium1/sharegpt-builder.git
31 | ```
32 |
33 | 2. Navigate to the project directory:
34 | ```bash
35 | cd sharegpt-builder
36 | ```
37 |
38 | 3. Install the required Python packages:
39 | ```bash
40 | pip install -r requirements.txt
41 | ```
42 |
43 | 4. Log in with a Hugging Face token that has write access, if you aren't already logged in:
44 | ```bash
45 | huggingface-cli login
46 | ```
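Alternatively, you can log in from Python using the `login` helper from `huggingface_hub` (a minimal sketch; it asks for the same write-access token):

```python
from huggingface_hub import login

# Equivalent to `huggingface-cli login`: prompts for a Hugging Face token.
# The token must have write access so the schedulers can push to your datasets.
login()
```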
47 |
48 | ## Usage
49 |
50 | 1. Run the Gradio application:
51 | ```bash
52 | python app.py
53 | ```
54 |
55 | 2. Open your web browser and navigate to `http://127.0.0.1:7860/`.
56 |
57 | 3. You will find tabs for SFT, DPO, and dataset inspection. Navigate to the one you want to contribute to.
58 |
59 | 4. To add more turns to the conversation, fill the text field and press **↳ enter**
60 |
61 | 5. After adding all the turns, click `save chat` to upload the conversation.
62 |
63 | 6. The uploaded conversations can be viewed directly on the hub.
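You can also pull them back down with the `datasets` library (not listed in `requirements.txt`). The repo names below are the defaults created by `app.py`; replace `your-username` with your Hugging Face username:

```python
from datasets import load_dataset

# app.py pushes to <username>/sft-sharegpt, <username>/sft-standard,
# <username>/dpo-sharegpt and <username>/dpo-standard.
ds = load_dataset("your-username/sft-sharegpt", split="train")
print(ds[0]["conversations"])
```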
64 |
65 | ## Contributing
66 | Contributions are welcome and greatly appreciated! Every little bit helps, and credit will always be given.
67 |
68 | * `12/17/2024` : Thanks to [not-lain](https://github.com/not-lain) for fixing the ShareGPT template and adding the dataset viewer tab
69 | * `12/13/2024` : Thanks to [aldryss](https://github.com/aldryss) for updating the UI 🔥
70 | * `12/12/2024` : Thanks to [not-lain](https://github.com/not-lain) for helping switch from Flask to Gradio and adding automatic dataset uploads 🔥
71 |
72 |
73 | Here are ways to contribute:
74 |
75 | 1. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
76 | 2. Fork the repository on GitHub and start making your changes to a new branch.
77 | 3. Write a test which shows that the bug was fixed or that the feature works as expected.
78 | 4. Send a pull request and bug the maintainer until it gets merged and published.
79 |
80 | Alternatively, you can contribute via submission of bugs or feature requests to the issues tab.
81 |
82 | ## Note
83 |
84 | The application is set to run in debug mode. For production use, make sure to turn off debug mode in `app.py`.
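Concretely, that means changing the `debug` flag in the final `launch` call (a minimal sketch; keep or drop `show_error` as you prefer):

```python
# Last lines of app.py
if __name__ == "__main__":
    # demo.launch(debug=True, show_error=True)  # development
    demo.launch(debug=False)                     # production
```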
85 |
86 | ## License
87 |
88 | This project is licensed under the terms of the MIT license.
89 |
--------------------------------------------------------------------------------
/alt_app.py:
--------------------------------------------------------------------------------
1 | from flask import Flask, render_template, request, redirect, url_for
2 | import json, os
3 |
4 | app = Flask(__name__)
5 |
6 | def clean_entry(entry):
7 | entry = entry.strip().replace("\r", "").replace(" \n", "\n")
8 | return entry
9 |
10 | # Route for index/main page
11 | @app.route('/', defaults={'active_tab': 'sft'})
12 | @app.route('/<active_tab>')
13 | def index(active_tab):
14 | return render_template('index.html', active_tab=active_tab)
15 |
16 | # Route for the SFT Dataset Builder.
17 | @app.route('/sft', methods=['GET', 'POST'])
18 | def form():
19 | if request.method == 'POST':
20 | # Extract form data
21 | system_prompt = request.form.get('system')
22 | user_prompts = request.form.getlist('user[]')
23 | gpt_responses = request.form.getlist('gpt[]')
24 |
25 | # Clean the system prompt, user prompts, and gpt responses
26 | system_prompt = clean_entry(system_prompt)
27 | user_prompts = [clean_entry(prompt) for prompt in user_prompts]
28 | gpt_responses = [clean_entry(response) for response in gpt_responses]
29 |
30 | # Data to be appended
31 | data_to_append = {
32 | 'conversations': [
33 | {
34 | 'from': 'system',
35 | 'value': system_prompt
36 | }
37 | ],
38 | 'source': 'manual'
39 | }
40 |
41 | # Add turns to the conversation
42 | for user_prompt, gpt_response in zip(user_prompts, gpt_responses):
43 | data_to_append['conversations'].append({
44 | 'from': 'human',
45 | 'value': user_prompt
46 | })
47 | data_to_append['conversations'].append({
48 | 'from': 'gpt',
49 | 'value': gpt_response
50 | })
51 |
52 | # File path
53 | file_path = './sft_data.json'
54 |
55 | # Check if file exists and append data
56 | if os.path.exists(file_path):
57 | with open(file_path, 'r+', encoding='utf-8') as file:
58 | data = json.load(file)
59 | data.append(data_to_append)
60 | file.seek(0)
61 | json.dump(data, file, indent=4)
62 | else:
63 | with open(file_path, 'w', encoding='utf-8') as file:
64 | json.dump([data_to_append], file, indent=4)
65 |
66 | return redirect(url_for('index'))
67 | return redirect(url_for('index'))
68 |
69 | # Route for the DPO dataset builder
70 | @app.route('/dpo', methods=['GET', 'POST'])
71 | def dpo_form():
72 | if request.method == 'POST':
73 | # Extract form data
74 | system_prompt = request.form.get('system')
75 | prompt = request.form.get('prompt')
76 | chosen = request.form.get('chosen')
77 | rejected = request.form.get('rejected')
78 |
79 | # Data to be appended
80 | data_to_append = {
81 | 'system': clean_entry(system_prompt),
82 | 'question': clean_entry(prompt),
83 | 'chosen': clean_entry(chosen),
84 | 'rejected': clean_entry(rejected),
85 | 'source': 'manual'
86 | }
87 |
88 | # File path
89 | file_path = './dpo_data.json'
90 |
91 | # Check if file exists and append data
92 | if os.path.exists(file_path):
93 | with open(file_path, 'r+', encoding='utf-8') as file:
94 | data = json.load(file)
95 | data.append(data_to_append)
96 | file.seek(0)
97 | json.dump(data, file, indent=4)
98 | else:
99 | with open(file_path, 'w', encoding='utf-8') as file:
100 | json.dump([data_to_append], file, indent=4)
101 |
102 | return "Success", 200
103 | return render_template('index.html', active_tab='dpo')
104 |
105 | if __name__ == '__main__':
106 | app.run(debug=True, port=7272)
107 |
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | import gradio as gr
2 | from huggingface_hub import whoami
3 | import datetime
4 | from dataset_uploader import ParquetScheduler
5 |
6 | ##########
7 | # Setup #
8 | ##########
9 |
10 | contributor_username = whoami()["name"]
11 |
12 | # only show the info messages the first time data is uploaded to the hub
13 | show_info = True
14 |
15 | every = 1 # we push once every 1 minute (use 5 if there are lots of people using the same HF token)
16 |
17 | choices = ["sharegpt","standard"]
18 |
19 | # schedulers
20 | schedulers = {
21 | "sft-sharegpt": ParquetScheduler(repo_id=f"{contributor_username}/sft-sharegpt", every=every),
22 | "sft-standard": ParquetScheduler(repo_id=f"{contributor_username}/sft-standard", every=every),
23 | "dpo-sharegpt": ParquetScheduler(repo_id=f"{contributor_username}/dpo-sharegpt", every=every),
24 | "dpo-standard": ParquetScheduler(repo_id=f"{contributor_username}/dpo-standard", every=every),
25 | }
26 |
27 |
28 | ##########
29 | # Utils #
30 | ##########
31 |
32 |
33 | def chat_message(role, content, prompt_type=None):
34 | """
35 | A function that transforms the chat content into a chat message
36 | Args:
37 | role: A string, either "user" or "assistant"
38 | content: A string, the content of the message
39 | prompt_type: A string, either "standard" or "sharegpt"
40 | Returns:
41 | A dictionary, the message to be sent to the chatbot.
42 | """
43 | if prompt_type == "sharegpt":
44 | if role == "user":
45 | role = "human"
46 | elif role == "assistant":
47 | role = "gpt"
48 | # sharegpt chat format
49 | return {"from": role, "value": content}
50 | else:
51 | return {"role": role, "content": content}
52 |
53 |
54 | def chat(prompt: str, history=[]):
55 | """
56 | A function that generates a response to a given prompt.
57 | Args:
58 | prompt: A string, the prompt to be sent to the chatbot.
59 | history: A list of dictionaries, each dictionary being a message from the user or the assistant.
60 | Returns:
61 |         The updated history: a list of message dictionaries from the user and the assistant.
62 | """
63 | if history == [] or (len(history) > 1 and history[-1]["role"] == "assistant"):
64 | history.append(chat_message("user", prompt))
65 | else:
66 | history.append(chat_message("assistant", prompt))
67 | return history
68 |
69 |
70 | def clear_textbox_field():
71 | """
72 | A function that clears the textbox field.
73 | """
74 | return None
75 |
76 |
77 | def clear_both_fields():
78 | """
79 | A function that clears both the textbox and the chatbot.
80 | """
81 | return None, None
82 |
83 |
84 | def clear_3_fields():
85 | """
86 |     A function that clears the two textboxes and the chatbot.
87 | """
88 | return None, None, None
89 |
90 |
91 | def setup_submission(system_prompt="", history=[], chat_format="sharegpt"):
92 | # removes the extra metadata field from the chat history and format sharegpt accordingly
93 | for i in range(len(history)):
94 | sample = history[i]
95 | history[i] = chat_message(
96 | sample["role"], sample["content"], prompt_type=chat_format
97 | )
98 |
99 | # add system prompt if provided
100 | system_prompt = system_prompt.strip()
101 | if system_prompt != "":
102 | sys = chat_message("system", system_prompt, prompt_type=chat_format)
103 | history.insert(0, sys)
104 |
105 | return history
106 |
107 |
108 | def save_sft_data(system_prompt="", history=[], sft_chat_format="sharegpt"):
109 | """
110 | A function that pushes the data to the hub.
111 | """
112 |
113 | # setup the info message to only show once
114 | global show_info
115 | scheduler = schedulers[f"sft-{sft_chat_format}"]
116 |
117 | # case user clicked submit and did not have any chat history
118 | if history == []:
119 |         raise gr.Error("you need to set up a chat first")
120 |
121 | # case history ends with user prompt
122 | if history[-1]["role"] == "user":
123 |         raise gr.Error("the history needs to end with an assistant message")
124 |
125 | history = setup_submission(system_prompt, history, sft_chat_format)
126 | # preparing the submission
127 | data = {"contributor": contributor_username}
128 | data["timestamp"] = str(datetime.datetime.now(datetime.UTC))
129 | data["chat_format"] = sft_chat_format
130 | data["conversations"] = history
131 |
132 | # submitting the data
133 | scheduler.append(data)
134 |
135 | # show the info message only once
136 | if show_info:
137 | gr.Info("Data has been saved successfully (this message is only shown once)")
138 | gr.Info(
139 | "The scheduler may take up to 1 minute to push the data, please wait 🤗"
140 | )
141 | show_info = False
142 |
143 |
144 | def save_dpo_data(
145 | system_prompt="", history=[], chosen="", rejected="", dpo_chat_format="sharegpt"
146 | ):
147 | """
148 | A function that pushes the data to the hub.
149 | """
150 |
151 | # setup the info message to only show once
152 | global show_info
153 | scheduler = schedulers[f"dpo-{dpo_chat_format}"]
154 |
155 | # case user clicked submit and did not have any chat history
156 | if history == []:
157 |         raise gr.Error("you need to set up a chat first")
158 |
159 | # case history ends with user prompt
160 | if history[-1]["role"] == "assistant":
161 |         raise gr.Error("the history needs to end with a user message")
162 |
163 | # case chosen and rejected are not full
164 | chosen, rejected = chosen.strip(), rejected.strip()
165 | if chosen == "" or rejected == "":
166 | raise gr.Error(
167 | "both chosen and rejected need to have a text when you click the submit button"
168 | )
169 |
170 | history = setup_submission(system_prompt, history, dpo_chat_format)
171 | chosen_chat, rejected_chat = history.copy(), history.copy()
172 |     chosen_chat.append(chat_message("assistant", chosen, dpo_chat_format))
173 |     rejected_chat.append(chat_message("assistant", rejected, dpo_chat_format))
174 |
175 | # preparing the submission
176 | data = {"contributor": contributor_username}
177 |
178 | data["timestamp"] = str(datetime.datetime.now(datetime.UTC))
179 | data["chat_format"] = dpo_chat_format
180 | data["prompt"] = history
181 | data["chosen"] = chosen_chat
182 | data["rejected"] = rejected_chat
183 |
184 | # submitting the data
185 | scheduler.append(data)
186 |
187 | # show the info message only once
188 | if show_info:
189 | gr.Info("Data has been saved successfully (this message is only shown once)")
190 | gr.Info(
191 | "The scheduler may take up to 1 minute to push the data, please wait 🤗"
192 | )
193 | show_info = False
194 |
195 |
196 | def undo_chat(history):
197 | return history[:-2]
198 |
199 |
200 | ##############
201 | # Interface #
202 | ##############
203 |
204 | with gr.Blocks() as demo:
205 |     gr.Markdown("<h1 style='text-align: center;'>ShareGPT-Builder</h1>")
206 |
207 | #### SFT ####
208 | with gr.Tab("SFT"):
209 | with gr.Accordion("system prompt", open=False):
210 | system_prompt = gr.TextArea(show_label=False, container=False)
211 | sft_chat_format = gr.Radio(choices=choices, value="sharegpt")
212 |
213 | chatbot = gr.Chatbot(
214 | type="messages", show_copy_button=True, show_copy_all_button=True
215 | )
216 | textbox = gr.Textbox(show_label=False, submit_btn=True)
217 | textbox.submit(
218 | fn=chat, inputs=[textbox, chatbot], outputs=[chatbot]
219 |     ).then(  # empty field for convenience
220 | clear_textbox_field, outputs=[textbox]
221 | )
222 | chatbot.undo(undo_chat, inputs=chatbot, outputs=chatbot)
223 | with gr.Row():
224 | clear_button = gr.Button("Clear")
225 | clear_button.click(clear_both_fields, outputs=[textbox, chatbot])
226 | submit = gr.Button("save chat", variant="primary")
227 | submit.click(
228 | save_sft_data, inputs=[system_prompt, chatbot, sft_chat_format]
229 | ).then(clear_both_fields, outputs=[textbox, chatbot])
230 |
231 | #### DPO ####
232 | with gr.Tab("DPO"):
233 | with gr.Accordion("system prompt", open=False):
234 | dpo_system_prompt = gr.TextArea(show_label=False, container=False)
235 | dpo_chat_format = gr.Radio(choices=choices, value="sharegpt")
236 | dpo_chatbot = gr.Chatbot(
237 | type="messages", show_copy_button=True, show_copy_all_button=True
238 | )
239 | gr.Markdown(
240 |         "type in either of these fields and press Enter to add a chat turn. When you are ready for the final submission, fill both fields, don't press Enter, and click the save chat button"
241 | )
242 | with gr.Row():
243 | dpo_rejected_textbox = gr.Textbox(label="rejected (or add chat)", render=True)
244 | dpo_chosen_textbox = gr.Textbox(label="chosen (or add chat)")
245 | # submit using either of these fields
246 | dpo_chosen_textbox.submit(
247 | fn=chat, inputs=[dpo_chosen_textbox, dpo_chatbot], outputs=[dpo_chatbot]
248 |     ).then(  # empty field for convenience
249 | clear_textbox_field, outputs=[dpo_chosen_textbox]
250 | )
251 | dpo_rejected_textbox.submit(
252 | fn=chat,
253 | inputs=[dpo_rejected_textbox, dpo_chatbot],
254 | outputs=[dpo_chatbot],
255 |     ).then(  # empty field for convenience
256 | clear_textbox_field, outputs=[dpo_rejected_textbox]
257 | )
258 | dpo_chatbot.undo(undo_chat, inputs=dpo_chatbot, outputs=dpo_chatbot)
259 | with gr.Row():
260 | dpo_clear_button = gr.Button("Clear")
261 | dpo_clear_button.click(
262 | clear_3_fields,
263 | outputs=[dpo_chosen_textbox, dpo_rejected_textbox, dpo_chatbot],
264 | )
265 | dpo_submit = gr.Button("save chat", variant="primary")
266 | dpo_submit.click(
267 | save_dpo_data,
268 | inputs=[
269 | dpo_system_prompt,
270 | dpo_chatbot,
271 | dpo_chosen_textbox,
272 | dpo_rejected_textbox,
273 | dpo_chat_format,
274 | ],
275 | ).then(
276 | clear_3_fields,
277 | outputs=[dpo_chosen_textbox, dpo_rejected_textbox, dpo_chatbot],
278 | )
279 | with gr.Tab("Inspect datasets"):
280 | dataset = gr.Dropdown(choices=list(schedulers.keys()))
281 | @gr.render(inputs=dataset)
282 | def show_dataset(dataset) :
283 |             gr.HTML(f"""<iframe
284 |                 src="https://huggingface.co/datasets/{contributor_username}/{dataset}/embed/viewer"
285 |                 frameborder="0"
286 |                 width="100%"
287 |                 height="560px"
288 |             ></iframe>""")
289 |
290 | if __name__ == "__main__":
291 | demo.launch(debug=True, show_error=True)
292 |
--------------------------------------------------------------------------------
/dataset_uploader.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 | import tempfile
4 | import uuid
5 | from pathlib import Path
6 | from typing import Any, Dict, List, Optional, Union
7 |
8 | import pyarrow as pa
9 | import pyarrow.parquet as pq
10 | from huggingface_hub import CommitScheduler
11 | from huggingface_hub.hf_api import HfApi
12 |
13 | ###################################
14 | # Parquet scheduler #
15 | # Uploads data in parquet format #
16 | ###################################
17 |
18 |
19 | class ParquetScheduler(CommitScheduler):
20 | """
21 | Usage: configure the scheduler with a repo id. Once started, you can add data to be uploaded to the Hub. 1 `.append`
22 | call will result in 1 row in your final dataset.
23 |
24 | ```py
25 | # Start scheduler
26 | >>> scheduler = ParquetScheduler(repo_id="my-parquet-dataset")
27 |
28 | # Append some data to be uploaded
29 | >>> scheduler.append({...})
30 | >>> scheduler.append({...})
31 | >>> scheduler.append({...})
32 | ```
33 |
34 | The scheduler will automatically infer the schema from the data it pushes.
35 | Optionally, you can manually set the schema yourself:
36 |
37 | ```py
38 | >>> scheduler = ParquetScheduler(
39 | ... repo_id="my-parquet-dataset",
40 | ... schema={
41 | ... "prompt": {"_type": "Value", "dtype": "string"},
42 | ... "negative_prompt": {"_type": "Value", "dtype": "string"},
43 | ... "guidance_scale": {"_type": "Value", "dtype": "int64"},
44 | ... "image": {"_type": "Image"},
45 | ... },
46 | ... )
47 |     ```
48 | See https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Value for the list of
49 | possible values.
50 | """
51 |
52 | def __init__(
53 | self,
54 | *,
55 | repo_id: str,
56 | schema: Optional[Dict[str, Dict[str, str]]] = None,
57 | every: Union[int, float] = 5,
58 | path_in_repo: Optional[str] = "data",
59 | repo_type: Optional[str] = "dataset",
60 | revision: Optional[str] = None,
61 | private: bool = False,
62 | token: Optional[str] = None,
63 | allow_patterns: Union[List[str], str, None] = None,
64 | ignore_patterns: Union[List[str], str, None] = None,
65 | hf_api: Optional[HfApi] = None,
66 | ) -> None:
67 | super().__init__(
68 | repo_id=repo_id,
69 | folder_path="dummy", # not used by the scheduler
70 | every=every,
71 | path_in_repo=path_in_repo,
72 | repo_type=repo_type,
73 | revision=revision,
74 | private=private,
75 | token=token,
76 | allow_patterns=allow_patterns,
77 | ignore_patterns=ignore_patterns,
78 | hf_api=hf_api,
79 | )
80 |
81 | self._rows: List[Dict[str, Any]] = []
82 | self._schema = schema
83 |
84 | def append(self, row: Dict[str, Any]) -> None:
85 | """Add a new item to be uploaded."""
86 | with self.lock:
87 | self._rows.append(row)
88 |
89 | def push_to_hub(self):
90 | # Check for new rows to push
91 | with self.lock:
92 | rows = self._rows
93 | self._rows = []
94 | if not rows:
95 | return
96 | print(f"Got {len(rows)} item(s) to commit.")
97 |
98 | # Load images + create 'features' config for datasets library
99 | schema: Dict[str, Dict] = self._schema or {}
100 | path_to_cleanup: List[Path] = []
101 | for row in rows:
102 | for key, value in row.items():
103 | # Infer schema (for `datasets` library)
104 | if key not in schema:
105 | schema[key] = _infer_schema(key, value)
106 |
107 | # Load binary files if necessary
108 | if schema[key]["_type"] in ("Image", "Audio"):
109 | # It's an image or audio: we load the bytes and remember to cleanup the file
110 | file_path = Path(value)
111 | if file_path.is_file():
112 | row[key] = {
113 | "path": file_path.name,
114 | "bytes": file_path.read_bytes(),
115 | }
116 | path_to_cleanup.append(file_path)
117 |
118 | # Complete rows if needed
119 | for row in rows:
120 | for feature in schema:
121 | if feature not in row:
122 | row[feature] = None
123 |
124 | # Export items to Arrow format
125 | table = pa.Table.from_pylist(rows)
126 |
127 | # Add metadata (used by datasets library)
128 | table = table.replace_schema_metadata(
129 | {"huggingface": json.dumps({"info": {"features": schema}})}
130 | )
131 |
132 | # Write to parquet file
133 | archive_file = tempfile.NamedTemporaryFile(delete=False)
134 | pq.write_table(table, archive_file.name)
135 | archive_file.close()
136 |
137 | # Upload
138 | self.api.upload_file(
139 | repo_id=self.repo_id,
140 | repo_type=self.repo_type,
141 | revision=self.revision,
142 | path_in_repo=f"{uuid.uuid4()}.parquet",
143 | path_or_fileobj=archive_file.name,
144 | )
145 | print("Commit completed.")
146 |
147 | # Cleanup
148 | os.unlink(archive_file.name)
149 | for path in path_to_cleanup:
150 | path.unlink(missing_ok=True)
151 |
152 |
153 | def _infer_schema(key: str, value: Any) -> Dict[str, str]:
154 | """
155 | Infer schema for the `datasets` library.
156 |
157 | See https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Value.
158 | """
159 |     # If the column name contains one of these keywords, the column is
160 |     # inferred as the corresponding feature type.
161 |     if "image" in key:
162 |         return {"_type": "Image"}
163 |     if "audio" in key:
164 |         return {"_type": "Audio"}
165 |     if isinstance(value, bool):  # check bool before int: bool is a subclass of int
166 |         return {"_type": "Value", "dtype": "bool"}
167 |     if isinstance(value, int):
168 |         return {"_type": "Value", "dtype": "int64"}
169 |     if isinstance(value, float):
170 |         return {"_type": "Value", "dtype": "float64"}
171 | if isinstance(value, bytes):
172 | return {"_type": "Value", "dtype": "binary"}
173 | # Otherwise in last resort => convert it to a string
174 | return {"_type": "Value", "dtype": "string"}
175 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.26.4
2 | huggingface_hub==0.26.2
3 | pandas==2.2.3
4 | pyarrow==18.1.0
5 | gradio==5.3.0
6 |
--------------------------------------------------------------------------------
/templates/dpo_form.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | DPO Builder
6 |
7 |
8 |
9 |