├── .gitignore ├── README.md ├── dataset_sheet.pdf ├── plot2code ├── __init__.py ├── eval │ ├── __int__.py │ ├── combine_evaluation_results.py │ ├── gpt4v_evaluate_pairs.py │ ├── gpt4v_evaluations_score.py │ └── text_match_score.py ├── execute_generated_code.py ├── gpt4v_generate_code.py ├── llm_generate_code.py └── utils.py ├── requirements.txt └── scripts ├── evaluate-instruct.sh ├── evaluate.sh └── generate_code.sh /.gitignore: -------------------------------------------------------------------------------- 1 | data/ 2 | generated_results/ 3 | evaluation_results -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Plot2Code Benchmark 2 | 3 | The Plot2Code benchmark is now open-sourced at [huggingface (ARC Lab)](https://huggingface.co/TencentARC) and [GitHub](https://github.com/TencentARC/Plot2Code). More information can be found in our [paper](https://arxiv.org/abs/2405.07990). 4 | 5 | 6 | This repository contains the code for an evaluation pipeline that generates Python code from reference plots, executes the generated code to draw plots, and then calculates various evaluation metrics to assess the quality of the generated code. 7 | 8 | ## Why do we need [Plot2Code](https://huggingface.co/datasets/TencentARC/Plot2Code)? 9 | * 🧐 While MLLMs have demonstrated potential in visual contexts, their capabilities in visual coding tasks have not been thoroughly evaluated. Plot2Code offers a platform for comprehensive assessment of these models. 10 | 11 | * 🤗 To let users gauge how proficient AI assistants are at generating code that renders a plot matching a given reference plot, we initiated the Plot2Code project. This keeps the evaluation relevant to real-world applications. 12 | 13 | * 💻 Plot2Code accommodates all modalities (text and images) for both input and output, facilitating an exploration of the influence of each modality. 14 | 15 | ## Supported Tasks 16 | 17 | Plot2Code is primarily designed as a benchmark for code generation from scientific plots. Specifically, it supports the following settings: 18 | 19 | * Text2Image: We provide instructions to the assistant, requesting it to generate pyplot code and subsequently render the plots. 20 | * Image2Image: Referred to as the Direct Asking setting in our paper, we input the reference plot directly and ask the assistant to generate pyplot code to render similar plots. 21 | * I+T 2 Image: Combining both instructions and reference plots as input, this is called the Conditional Asking setting in our paper. 22 | 23 | By employing these settings, we can investigate the impact of each input modality on the quality of the final rendered plots. 24 | 25 | ## Requirements 26 | 27 | - NumPy 28 | - Matplotlib==3.8.4 29 | - Pillow 30 | - Levenshtein 31 | - openai>1.12.0 32 | 33 | You can install the required packages using the following command: 34 | 35 | ```bash 36 | pip install -r requirements.txt 37 | ``` 38 | 39 | ## How to Download 40 | You can use the following commands to download the dataset: 41 | ```shell 42 | git lfs install 43 | mkdir data 44 | cd data 45 | git clone https://huggingface.co/datasets/TencentARC/Plot2Code 46 | ``` 47 | 48 | ## Usage 49 | 50 | 1. Generate code from reference plots. Add --instruct for the conditional setting. 
51 | ```bash 52 | export OPENAI_API_KEY=[API_KEY] 53 | export OPENAI_API_BASE=[API_BASE] 54 | 55 | # GPT-4V generate code (direct asking) 56 | python -m plot2code.gpt4v_generate_code --prompt_strategy default 57 | 58 | # GPT-4V generate code (conditional asking) 59 | python -m plot2code.gpt4v_generate_code --prompt_strategy default --instruct 60 | 61 | # GPT-4V generate code (conditional asking with CoT) 62 | python -m plot2code.gpt4v_generate_code --prompt_strategy CoT --instruct 63 | ``` 64 | 2. Execute the generated code to render the plots, substituting the model name and prompt strategy used in step 1 for $model_name and $prompt_strategy. 65 | ```bash 66 | python -m plot2code.execute_generated_code --model_name "$model_name" --prompt_strategy $prompt_strategy 67 | 68 | ``` 69 | 3. Evaluate the similarity between the generated plots and the ground truth plots. 70 | 71 | ```bash 72 | echo "Calculating text match score..." 73 | python -m plot2code.eval.text_match_score --model_name "$model_name" --prompt_strategy $prompt_strategy 74 | 75 | echo "Calculating gpt-4v evaluation score..." 76 | python -m plot2code.eval.gpt4v_evaluations_score --model_name "$model_name" --prompt_strategy $prompt_strategy 77 | 78 | echo "Combining evaluation results..." 79 | python -m plot2code.eval.combine_evaluation_results --model_name "$model_name" --prompt_strategy $prompt_strategy 80 | ``` 81 | 82 | See [scripts](scripts) for more details. 83 | 84 | ## News 85 | * 🔥[2024/08] We further updated the Plotly plot-code pairs for Python and R with instructions for evaluation!🔥 86 | * 🔥[2024/05] We open-sourced the [Plot2Code benchmark](https://huggingface.co/datasets/TencentARC/Plot2Code). 87 | Stay tuned for this project! 😆 88 | 89 | ## License 90 | 91 | In this study, we crawled every website link listed in the Matplotlib gallery and Plotly documentation to collect data for our analysis. Both the Matplotlib and Plotly libraries are distributed under permissive open-source licenses. We have taken the following steps to ensure compliance with the respective license terms: 92 | 93 | 1. Acknowledgment of Licenses: We acknowledge that the Matplotlib library and its gallery are distributed under the BSD 3-Clause License, and the Plotly library and its documentation are distributed under the MIT License. 94 | 2. Retention of Copyright Notices: We have retained all copyright notices and license information from the original Matplotlib gallery content and Plotly documentation, as required by their respective licenses. 95 | 3. Usage and Distribution: Our use of the Matplotlib gallery and Plotly documentation content is solely for academic and research purposes. We have not modified the original content from the Matplotlib gallery or Plotly documentation, and any distribution of our work will include proper attribution to the Matplotlib and Plotly projects. 96 | 97 | By adhering to these guidelines, we ensure that our use of the Matplotlib and Plotly content is fully compliant with their respective licenses. 98 | 99 | This project is open-sourced under the [Apache-2.0 license](https://www.apache.org/licenses/LICENSE-2.0). The evaluation code and datasets are fully open for academic research and can be used for commercial purposes with official written permission. Check our [dataset sheet](dataset_sheet.pdf) for more information. 100 | 101 | ## Citation 102 | The code and model in this repository are mostly developed for or derived from the paper below. Please cite it if you find the repository helpful. 
103 | ``` 104 | @misc{wu2024plot2code, 105 | title={Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots}, 106 | author={Chengyue Wu and Yixiao Ge and Qiushan Guo and Jiahao Wang and Zhixuan Liang and Zeyu Lu and Ying Shan and Ping Luo}, 107 | year={2024}, 108 | eprint={2405.07990}, 109 | archivePrefix={arXiv}, 110 | primaryClass={cs.CL} 111 | } 112 | ``` 113 | -------------------------------------------------------------------------------- /dataset_sheet.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TencentARC/Plot2Code/12bd95a7e04dfcc2eb664cbeb739d924c090c503/dataset_sheet.pdf -------------------------------------------------------------------------------- /plot2code/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TencentARC/Plot2Code/12bd95a7e04dfcc2eb664cbeb739d924c090c503/plot2code/__init__.py -------------------------------------------------------------------------------- /plot2code/eval/__int__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TencentARC/Plot2Code/12bd95a7e04dfcc2eb664cbeb739d924c090c503/plot2code/eval/__int__.py -------------------------------------------------------------------------------- /plot2code/eval/combine_evaluation_results.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import numpy as np 4 | from ..utils import get_parser, get_eval_path, get_save_path 5 | 6 | parser = get_parser() 7 | args = parser.parse_args() 8 | # Function to read the JSONL file and load the content into a list 9 | def read_jsonl_file(jsonl_file): 10 | content_list = [] 11 | with open(jsonl_file, "r") as file: 12 | for line in file: 13 | content_list.append(json.loads(line)) 14 | return content_list 15 | 16 | eval_dir = get_eval_path(args) 17 | generated_code_file = get_save_path(args) 18 | 19 | text_match_score_results_file = os.path.join(eval_dir, args.text_match_score_results) 20 | gpt4v_evaluation_results_file = os.path.join(eval_dir, args.gpt4_vision_evaluation_results) 21 | final_score_file = os.path.join(eval_dir, args.final_score_results) 22 | 23 | # Read the JSONL files 24 | generated_code_results = read_jsonl_file(generated_code_file) 25 | text_match_score_results = read_jsonl_file(text_match_score_results_file) 26 | evaluations_results = read_jsonl_file(gpt4v_evaluation_results_file) 27 | # Initialize the list to store the final results 28 | final_results = [] 29 | 30 | # Iterate over the items in the lists 31 | for text_match, evaluation in zip(text_match_score_results, evaluations_results): 32 | # Create a new dictionary to store the final result for the current item 33 | final_result = {} 34 | 35 | # Add the evaluation results to the final result 36 | final_result.update(text_match) 37 | final_result.update(evaluation) 38 | 39 | # Append the final result to the list of final results 40 | final_results.append(final_result) 41 | 42 | # Write the final results to a new JSONL file 43 | with open(final_score_file, "w") as file: 44 | for final_result in final_results: 45 | file.write(json.dumps(final_result) + "\n") 46 | 47 | # Calculate the average of each evaluation metric 48 | average_text_match_score = np.mean([result['text_match_score'] for result in final_results]) 49 | average_evaluation_score = 
np.mean([result['rating'] for result in final_results if result['rating'] is not None]) 50 | 51 | file.write(f'Code pass rate: {(len(final_results)) / len(generated_code_results)}\n') 52 | file.write(f"Average text match score: {average_text_match_score}\n") 53 | file.write(f"Average gpt-4v evaluation score: {average_evaluation_score}\n") 54 | 55 | print(f'Code pass rate: {(len(final_results)) / len(generated_code_results)}\n') 56 | print(f"Average text match score: {average_text_match_score}\n") 57 | print(f"Average gpt-4v evaluation score: {average_evaluation_score}\n") -------------------------------------------------------------------------------- /plot2code/eval/gpt4v_evaluate_pairs.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | from openai import OpenAI 4 | from tqdm import tqdm 5 | from PIL import Image 6 | import numpy as np 7 | from ..utils import get_parser, get_save_path, get_eval_path, get_api_response, encode_image, read_jsonl_file 8 | import time 9 | 10 | parser = get_parser() 11 | parser.add_argument("--test_model_name", type=str, default="test_model") 12 | parser.add_argument("--test_prompt_strategy", type=str, default=None) 13 | 14 | args = parser.parse_args() 15 | 16 | client = OpenAI( 17 | api_key=os.getenv("OPENAI_API_KEY"), 18 | base_url=os.getenv("OPENAI_API_BASE"), 19 | ) 20 | 21 | 22 | # Read the JSONL file 23 | generated_code_file = get_save_path(args) 24 | content_list = read_jsonl_file(generated_code_file) 25 | 26 | 27 | if args.test_prompt_strategy is not None: 28 | test_file = generated_code_file.replace(args.prompt_strategy, args.test_prompt_strategy) 29 | else: 30 | test_file = generated_code_file.replace(args.model_name, args.test_model_name) 31 | 32 | compared_content_list = read_jsonl_file(test_file) 33 | 34 | def extract_non_empty_img(content_list): 35 | non_empty_img = {} 36 | for item in content_list: 37 | test_image_path = item['generated_image_path'] 38 | img = Image.open(test_image_path) 39 | img_np = np.array(img) 40 | 41 | if np.all(img_np == 255): 42 | continue # Skip this iteration 43 | 44 | idx = int(item['generated_image_path'].rstrip('.png').split('test_image_')[-1]) 45 | non_empty_img[idx] = test_image_path 46 | 47 | return non_empty_img 48 | 49 | def extract_ground_truth_img(content_list): 50 | ground_truth_img = {} 51 | for item in content_list: 52 | ground_truth_path = item['ground_truth_path'] 53 | 54 | idx = int(item['ground_truth_path'].rstrip('.png').split('ground_truth_image_')[-1]) 55 | ground_truth_img[idx] = ground_truth_path 56 | 57 | return ground_truth_img 58 | 59 | 60 | compare_prompt = "Please act as an impartial judge and evaluate the quality of the generated images provided by two AI assistants given the ground truth image displayed below. " + \ 61 | "You should choose the assistant that generate the more similar image. Your evaluation should consider factors such as the overall appearance, colors, shapes, positions, and other visual elements of the images." 62 | 63 | output_prompt = "Begin your evaluation by comparing the two responses and provide a short explanation. Avoid any biases and ensure that the order in which the responses were presented does not influence your decision. " + \ 64 | "Be as objective as possible. After providing your explanation, output your final verdict by strictly following this format: \"[[A]]\" if assistant A is better, \"[[B]]\" if assistant B is better, and \"[[C]]\" for a tie." 
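# The two prompts above ask the judge to pick whichever generated plot is closer to the
# ground-truth plot. compare_image() below sends the ground-truth image and both candidate
# images to GPT-4V in a single request and parses the verdict by searching the reply for the
# literal [[A]], [[B]] or [[C]] markers; any other reply is recorded as "error".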
65 | # Function to evaluate the similarity between two images 66 | def compare_image(gt_path, image_path1, image_path2): 67 | gt_image = encode_image(gt_path) 68 | base64_image1 = encode_image(image_path1) 69 | base64_image2 = encode_image(image_path2) 70 | 71 | messages=[ 72 | { 73 | "role": "system", 74 | "content": [ 75 | { 76 | "type": "text", 77 | "text": "You are a helpful assistant." 78 | }, 79 | ] 80 | }, 81 | { 82 | "role": "user", 83 | "content": [ 84 | { 85 | "type": "text", 86 | "text": compare_prompt 87 | }, 88 | { 89 | "type": "text", 90 | "text": "Here is the ground truth image." 91 | }, 92 | { 93 | "type": "image_url", 94 | "image_url": { 95 | "url": f"data:image/png;base64,{gt_image}" 96 | }, 97 | }, 98 | { 99 | "type": "text", 100 | "text": "Here is the image generated by the assistant A." 101 | }, 102 | { 103 | "type": "image_url", 104 | "image_url": { 105 | "url": f"data:image/png;base64,{base64_image1}" 106 | }, 107 | }, 108 | { 109 | "type": "text", 110 | "text": "Here is the image generated by the assistant B." 111 | }, 112 | { 113 | "type": "image_url", 114 | "image_url": { 115 | "url": f"data:image/png;base64,{base64_image2}" 116 | }, 117 | }, 118 | { 119 | "type": "text", 120 | "text": output_prompt 121 | }, 122 | ], 123 | } 124 | ] 125 | response = get_api_response(client, messages, args, model_name='gpt-4-vision-preview') 126 | judgement = response.choices[0].message.content.strip() 127 | if "[[A]]" in judgement: 128 | winner = "A" 129 | elif "[[B]]" in judgement: 130 | winner = "B" 131 | elif "[[C]]" in judgement: 132 | winner = "tie" 133 | else: 134 | winner = "error" 135 | return winner, judgement 136 | 137 | 138 | 139 | img_dict_1 = extract_non_empty_img(content_list) 140 | img_dict_2 = extract_non_empty_img(compared_content_list) 141 | gt_dict = extract_ground_truth_img(content_list) 142 | 143 | img_idx_set1 = set(img_dict_1.keys()) 144 | img_idx_set2 = set(img_dict_2.keys()) 145 | 146 | common_img_idx = img_idx_set1.intersection(img_idx_set2) 147 | 148 | eval_dir = get_eval_path(args) 149 | 150 | if args.test_prompt_strategy is not None: 151 | args.test_model_name = args.test_model_name + "_" + args.test_prompt_strategy 152 | 153 | pair_compared_result_file = os.path.join(eval_dir, args.test_model_name + '_compared_results.jsonl') 154 | 155 | previous_results = None 156 | evaluated_idx = None 157 | 158 | if os.path.exists(pair_compared_result_file) and os.path.getsize(pair_compared_result_file) > 0: 159 | previous_results = read_jsonl_file(pair_compared_result_file) 160 | evaluated_idx = [item['question_id'] for item in previous_results] 161 | 162 | with open(pair_compared_result_file, "w") as results_file: 163 | 164 | if previous_results is not None: 165 | for result in previous_results: 166 | results_file.write(json.dumps(result) + "\n") 167 | results_file.flush() 168 | 169 | for idx in tqdm(common_img_idx): 170 | 171 | if evaluated_idx is not None and idx in evaluated_idx: 172 | continue 173 | 174 | image_path1 = img_dict_1[idx] 175 | image_path2 = img_dict_2[idx] 176 | gt_path = gt_dict[idx] 177 | 178 | round1_winner, round1_judgement = compare_image(gt_path, image_path1, image_path2) 179 | round2_winner, round2_judgement = compare_image(gt_path, image_path2, image_path1) 180 | 181 | round1_map = {"A": args.model_name, "B": args.test_model_name} 182 | round2_map = {"A": args.test_model_name, "B": args.model_name} 183 | round1_winner = round1_map.get(round1_winner, round1_winner) 184 | round2_winner = round2_map.get(round2_winner, round2_winner) 185 | 
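        # Each pair is judged twice with the presentation order swapped (round 1 shows
        # args.model_name's plot as assistant A, round 2 shows it as assistant B) to reduce
        # position bias; the tally at the bottom of this file counts a win only when the
        # same model wins both rounds and treats everything else as a tie.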
186 | result = { 187 | "question_id": idx, 188 | "model_1": args.model_name, 189 | "model_2": args.test_model_name, 190 | "round1_winner": round1_winner, 191 | "round2_winner": round2_winner, 192 | "round1_judgement": round1_judgement, 193 | "round2_judgement": round2_judgement, 194 | "tstamp": time.time(), 195 | } 196 | 197 | results_file.write(json.dumps(result) + "\n") 198 | results_file.flush() 199 | 200 | results = read_jsonl_file(pair_compared_result_file) 201 | print(f"Total number of compared pairs: {len(results)}") 202 | 203 | win_cnt = 0 204 | loss_cnt = 0 205 | tie_cnt = 0 206 | 207 | for item in results: 208 | if item["round1_winner"] == "tie" or item["round2_winner"] == "tie" or item["round1_winner"] != item["round2_winner"]: 209 | tie_cnt += 1 210 | elif item["round1_winner"] == args.model_name: 211 | win_cnt += 1 212 | else: 213 | loss_cnt += 1 214 | 215 | print(f"Win: {win_cnt}, Loss: {loss_cnt}, Tie: {tie_cnt}") 216 | print(f"Win Ratio: {win_cnt / len(results)}") 217 | print(f"Tie Ratio: {tie_cnt / len(results)}") 218 | print(f"Loss Ratio: {loss_cnt / len(results)}") 219 | -------------------------------------------------------------------------------- /plot2code/eval/gpt4v_evaluations_score.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import os 3 | import openai 4 | import json 5 | import re 6 | from openai import OpenAI 7 | from tqdm import tqdm 8 | from PIL import Image 9 | import numpy as np 10 | from ..utils import get_parser, get_save_path, get_eval_path, get_api_response, read_jsonl_file, encode_image 11 | 12 | parser = get_parser() 13 | args = parser.parse_args() 14 | 15 | client = OpenAI( 16 | api_key=os.getenv("OPENAI_API_KEY"), 17 | base_url=os.getenv("OPENAI_API_BASE"), 18 | ) 19 | 20 | 21 | # Function to evaluate the similarity between two images 22 | def evaluate_image_similarity(image_path1, image_path2): 23 | base64_image1 = encode_image(image_path1) 24 | base64_image2 = encode_image(image_path2) 25 | 26 | messages=[ 27 | { 28 | "role": "system", 29 | "content": [ 30 | { 31 | "type": "text", 32 | "text": "You are a helpful assistant." 33 | }, 34 | ] 35 | }, 36 | { 37 | "role": "user", 38 | "content": [ 39 | { 40 | "type": "text", 41 | "text": "Please evaluate the similarity between a reference image created using matplotlib and an image generated by code provided by an AI assistant. Consider factors such as the overall appearance, colors, shapes, positions, and other visual elements of the images. Begin your evaluation by providing a short explanation. Be as objective as possible. 
After providing your explanation, you must rate the response on a scale of 1 to 10 by strictly following this format: \"[[rating]]\", for example: \"Rating: [[5]]\".", 42 | }, 43 | { 44 | "type": "image_url", 45 | "image_url": { 46 | "url": f"data:image/png;base64,{base64_image1}" 47 | }, 48 | }, 49 | { 50 | "type": "image_url", 51 | "image_url": { 52 | "url": f"data:image/png;base64,{base64_image2}" 53 | }, 54 | }, 55 | ], 56 | } 57 | ] 58 | response = get_api_response(client, messages, args, model_name='gpt-4-vision-preview') 59 | 60 | return response.choices[0].message.content.strip() 61 | 62 | # Read the JSONL file 63 | generated_code_file = get_save_path(args) 64 | 65 | evaluation_file = os.path.join(get_eval_path(args), args.gpt4_vision_evaluation_results) 66 | 67 | items = read_jsonl_file(generated_code_file) 68 | 69 | # Evaluate the similarity between ground truth images and test images 70 | results = [] 71 | total_rating = 0 72 | 73 | 74 | # check if the evaluation file is empty. If it is not empty, read the file and skip the evaluation for the images that have already been evaluated 75 | if os.path.exists(evaluation_file) and os.path.getsize(evaluation_file) > 0: 76 | previous_results = read_jsonl_file(evaluation_file) 77 | evaluated_ground_truth_paths = [item['ground_truth_path'] for item in previous_results] 78 | total_rating = sum([item['rating'] for item in previous_results if item['rating'] is not None]) 79 | 80 | # Save the evaluation results to a new JSONL file 81 | with open(evaluation_file, "w") as jsonl_file: 82 | 83 | if total_rating > 0: 84 | # Write the previous results to the evaluation file 85 | for result in previous_results: 86 | jsonl_file.write(json.dumps(result) + "\n") 87 | jsonl_file.flush() 88 | else: 89 | previous_results = [] 90 | evaluated_ground_truth_paths = [] 91 | 92 | # Wrap the loop with tqdm to show the progress bar 93 | for item in tqdm(items, desc="Evaluating image similarity"): 94 | ground_truth_path = item['ground_truth_path'] 95 | 96 | if ground_truth_path in evaluated_ground_truth_paths: 97 | continue # Skip this iteration 98 | 99 | test_image_path = item['generated_image_path'] 100 | img = Image.open(test_image_path) 101 | img_np = np.array(img) 102 | 103 | # Check if test_image is all white 104 | if np.all(img_np == 255): 105 | print(f"Skipping all white image: {test_image_path}") 106 | continue # Skip this iteration 107 | 108 | evaluation = evaluate_image_similarity(ground_truth_path, test_image_path) 109 | 110 | # Extract the rating and add it to the total_rating 111 | rating_match = re.search(r'Rating: \[\[(\d+)\]\]', evaluation) 112 | if rating_match: 113 | rating = int(rating_match.group(1)) 114 | total_rating += rating 115 | else: 116 | rating = None 117 | 118 | # Update the results dictionary with the new key for the rating 119 | result = {'ground_truth_path': ground_truth_path, 'test_image_path': test_image_path, 'evaluation': evaluation, 'rating': rating} 120 | previous_results.append(result) 121 | 122 | jsonl_file.write(json.dumps(result) + "\n") 123 | jsonl_file.flush() 124 | 125 | # Calculate the average rating 126 | average_evaluation_score = np.mean([result['rating'] for result in previous_results if result['rating'] is not None]) 127 | print(f"Average Rating: {average_evaluation_score:.2f}") -------------------------------------------------------------------------------- /plot2code/eval/text_match_score.py: -------------------------------------------------------------------------------- 1 | 
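# text_match_score.py computes the text-match score for each generated plot: the ground-truth
# code and the generated code are both exec'd to rebuild their figures, every visible
# matplotlib Text artist is collected from each figure, and a ground-truth text counts as
# matched only when the generated figure contains an exact copy (Levenshtein distance 0) at a
# sufficiently similar position after rescaling by the figure-size ratio. The final score is
# matched / (matched + texts left unmatched in either figure).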
import os 2 | import matplotlib.pyplot as plt 3 | import matplotlib 4 | import json 5 | from tqdm import tqdm 6 | from PIL import Image 7 | import numpy as np 8 | import Levenshtein 9 | from ..utils import get_parser, get_save_path, get_eval_path, read_jsonl_file 10 | from matplotlib.pyplot import * 11 | 12 | parser = get_parser() 13 | args = parser.parse_args() 14 | 15 | # Read the JSONL file 16 | generated_code_file = get_save_path(args) 17 | content_list = read_jsonl_file(generated_code_file) 18 | 19 | 20 | def position_similarity(pos1, pos2, size_ratio): 21 | pos2_adjusted = pos2 * size_ratio 22 | 23 | position_difference = pos1 - pos2_adjusted 24 | 25 | # calculate the absolute distance 26 | absolute_distance = np.sqrt(np.sum(position_difference ** 2)) 27 | 28 | # convert the absolute distance to similarity 29 | distance_similarity = np.exp(-absolute_distance / 100) 30 | 31 | return distance_similarity 32 | 33 | 34 | def extract_texts(component): 35 | texts = [] 36 | positions = [] 37 | if isinstance(component, matplotlib.text.Text) and component.get_visible(): 38 | text = component.get_text().strip().lower() 39 | position = component.get_position() 40 | if text: # only extract non-empty text 41 | texts.append(text) 42 | positions.append(np.array(position)) 43 | 44 | for child in component.get_children(): 45 | child_texts, child_positions = extract_texts(child) 46 | texts.extend(child_texts) 47 | positions.extend(child_positions) 48 | 49 | return texts, positions 50 | 51 | def match_texts(texts1, texts2, positions1, positions2, size_ratio): 52 | matched = 0 53 | unmatched1 = len(texts1) 54 | unmatched2 = len(texts2) 55 | 56 | for text1, pos1 in zip(texts1, positions1): 57 | min_distance = float('inf') 58 | best_match_index = None 59 | for i, (text2, pos2) in enumerate(zip(texts2, positions2)): 60 | try: 61 | distance = Levenshtein.distance(text1, text2) 62 | position_sim = position_similarity(pos1, pos2, size_ratio) 63 | 64 | if distance < min_distance and position_sim > 0.8: 65 | min_distance = distance 66 | best_match_index = i 67 | except: 68 | pass 69 | 70 | if min_distance <= 0: # the maximux edit distance allowed 71 | matched += 1 72 | texts2.pop(best_match_index) 73 | positions2.pop(best_match_index) 74 | unmatched2 -= 1 75 | unmatched1 -= 1 76 | 77 | total_pairs = matched + unmatched1 + unmatched2 78 | if total_pairs == 0: 79 | return 1 80 | match_score = matched / total_pairs 81 | return match_score 82 | 83 | eval_dir = get_eval_path(args) 84 | text_match_score_file = os.path.join(eval_dir, args.text_match_score_results) 85 | 86 | # Open the JSONL file in write mode 87 | with open(text_match_score_file, "w") as jsonl_file: 88 | for item in tqdm(content_list, desc="Evaluating text image similarity"): 89 | ground_truth_path = item['ground_truth_path'] 90 | test_image_path = item['generated_image_path'] 91 | code = item['code'] 92 | ground_truth_code = item['ground_truth_code'] 93 | # Execute the code and save the images 94 | 95 | img = Image.open(test_image_path) 96 | img_np = np.array(img) 97 | 98 | # Check if test_image is all white 99 | if np.all(img_np == 255): 100 | print(f"Skipping all white image: {test_image_path}") 101 | continue # Skip this iteration 102 | 103 | exec(ground_truth_code) 104 | fig1 = plt.gcf() 105 | fig1.savefig('gt_img.png') 106 | plt.close() 107 | matplotlib.rcdefaults() 108 | plt.cla() 109 | plt.clf() 110 | plt.close("all") 111 | 112 | exec(code) 113 | fig2 = plt.gcf() 114 | fig2.savefig('test_img.png') 115 | plt.close() 116 | matplotlib.rcdefaults() 
117 | plt.cla() 118 | plt.clf() 119 | plt.close("all") 120 | 121 | # Extract texts and positions from the figures 122 | texts1, positions1 = extract_texts(fig1) 123 | texts2, positions2 = extract_texts(fig2) 124 | # Calculate the size ratio 125 | fig1_size = np.array(fig1.get_size_inches()) 126 | fig2_size = np.array(fig2.get_size_inches()) 127 | size_ratio = fig1_size / fig2_size 128 | 129 | # Calculate the match score 130 | match_score = match_texts(texts1, texts2, positions1, positions2, size_ratio) 131 | 132 | # Append the generated image path, ground truth code and match score to the JSONL file 133 | jsonl_file.write(json.dumps({'ground_truth_path': ground_truth_path, 'test_image_path': test_image_path, 'text_match_score': match_score}) + "\n") 134 | jsonl_file.flush() 135 | -------------------------------------------------------------------------------- /plot2code/execute_generated_code.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import matplotlib.pyplot as plt 4 | from PIL import Image 5 | import numpy as np 6 | from matplotlib.pyplot import * 7 | from .utils import get_parser, read_jsonl_file, get_save_path, get_img_path 8 | from tqdm import tqdm 9 | import matplotlib 10 | parser = get_parser() 11 | 12 | args = parser.parse_args() 13 | 14 | import multiprocessing 15 | 16 | def execute_code(code, success_flag, image_path): 17 | try: 18 | exec(code) 19 | fig = plt.gcf() 20 | # Save the generated image with the same size as the ground truth image 21 | fig.savefig(image_path) 22 | success_flag.value = True 23 | 24 | except Exception as e: 25 | # print(e) 26 | pass 27 | 28 | def execute_code_and_save_image(code, ground_truth_path, image_path): 29 | # Get the ground truth image size 30 | ground_truth_image = Image.open(ground_truth_path) 31 | ground_truth_size = ground_truth_image.size 32 | 33 | success_flag = multiprocessing.Value("b", False) 34 | 35 | # create a process to execute the code 36 | code_process = multiprocessing.Process(target=execute_code, args=(code, success_flag, image_path)) 37 | code_process.start() 38 | 39 | # set a timer of 30 seconds 40 | code_process.join(30) 41 | 42 | # if the process is still running, terminate it 43 | if code_process.is_alive(): 44 | code_process.terminate() 45 | code_process.join() 46 | 47 | if success_flag.value: 48 | generated_image = Image.open(image_path) 49 | else: 50 | # Create a white image and save it 51 | generated_image = Image.fromarray(np.full((ground_truth_size), 255, dtype=np.uint8)) 52 | 53 | # resized_image = generated_image.resize(ground_truth_size)s 54 | generated_image.save(image_path) 55 | 56 | def main(): 57 | # Read the ground truth code JSONL file 58 | ground_truth_code_list = read_jsonl_file(args.ground_truth_code_file) 59 | 60 | # Read the generated code JSONL file 61 | generated_code_file = get_save_path(args) 62 | code_list = read_jsonl_file(generated_code_file) 63 | 64 | # Create the test_images folder if it doesn't exist 65 | test_images_folder = get_img_path(args) 66 | data = [] 67 | for item in tqdm(code_list): 68 | # Execute the code and save the images to the test_images folder 69 | code = item['code'] 70 | ground_truth_path = item['ground_truth_path'] 71 | idx = int(ground_truth_path.rstrip('.png').split('ground_truth_image_')[-1]) 72 | image_path = os.path.join(test_images_folder, f"test_image_{idx}.png") 73 | execute_code_and_save_image(code, ground_truth_path, image_path) 74 | plt.close() 75 | matplotlib.rcdefaults() 76 | plt.cla() 
77 | plt.clf() 78 | plt.close("all") 79 | # Get the ground truth code for the current item 80 | ground_truth_code = ground_truth_code_list[idx]['code'] 81 | 82 | data.append({'ground_truth_code': ground_truth_code, 'code': code, 'ground_truth_path': ground_truth_path, 'generated_image_path': image_path}) 83 | 84 | 85 | # Open the JSONL file in write mode 86 | with open(generated_code_file, "w") as jsonl_file: 87 | # Append the generated image path to the JSONL file 88 | for item in data: 89 | jsonl_file.write(json.dumps(item) + "\n") 90 | jsonl_file.flush() 91 | 92 | if __name__ == '__main__': 93 | main() -------------------------------------------------------------------------------- /plot2code/gpt4v_generate_code.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from openai import OpenAI 3 | import os 4 | import json 5 | from tqdm import tqdm 6 | from .utils import get_parser, get_save_path, direct_prompt, read_jsonl_file, get_api_response, extract_code, CoT_prompt, encode_image 7 | # Function to encode the image 8 | 9 | parser = get_parser() 10 | 11 | args = parser.parse_args() 12 | # Directory containing your images 13 | image_directory = args.image_directory 14 | 15 | client = OpenAI( 16 | api_key=os.getenv("OPENAI_API_KEY"), 17 | base_url=os.getenv("OPENAI_API_BASE"), 18 | ) 19 | 20 | # Get current execute path 21 | current_path = sys.path[0] 22 | 23 | save_path = get_save_path(args) 24 | 25 | previous_results = None 26 | previous_filename = [] 27 | 28 | # check if the save_path is empty. if it is not empty, load the already generated results 29 | if os.path.exists(save_path) and os.path.getsize(save_path) > 0: 30 | previous_results = read_jsonl_file(save_path) 31 | previous_filename = [result['question_id'] for result in previous_results] 32 | 33 | if args.instruct: 34 | instructions = read_jsonl_file('data/ground_truth_code_with_instruction.jsonl') 35 | 36 | # Iterate through all the PNG files in the ground_truth folder 37 | # Open the JSONL file 38 | if __name__ == '__main__': 39 | 40 | with open(save_path, "w") as jsonl_file: 41 | 42 | # write the previous results to the file 43 | if previous_results is not None: 44 | for result in previous_results: 45 | jsonl_file.write(json.dumps(result) + '\n') 46 | jsonl_file.flush() 47 | 48 | for filename in tqdm(os.listdir(image_directory)): 49 | 50 | if filename in previous_filename: 51 | continue 52 | 53 | if filename.endswith(".png"): 54 | image_path = os.path.join(current_path, image_directory, filename) 55 | 56 | # Getting the base64 string 57 | base64_image = encode_image(image_path) 58 | 59 | if args.instruct: 60 | prompt = instructions[int(filename.rstrip('.png').lstrip('ground_truth_image_'))]['instruction'] + '\n' + direct_prompt 61 | else: 62 | prompt = direct_prompt 63 | 64 | messages = [ 65 | { 66 | "role": "system", 67 | "content": "You are a helpful assistant." 
68 | }, 69 | { 70 | "role": "user", 71 | "content": [ 72 | { 73 | "type": "text", 74 | "text": prompt 75 | }, 76 | { 77 | "type": "image_url", 78 | "image_url": { 79 | "url": f"data:image/jpeg;base64,{base64_image}" if 'gpt' in args.model_name else image_path 80 | }, 81 | }, 82 | ], 83 | } 84 | ] 85 | 86 | if args.prompt_strategy == 'CoT': 87 | messages.append( 88 | { 89 | "role": "assistant", 90 | "content": CoT_prompt 91 | } 92 | ) 93 | response = get_api_response(client, messages, args) 94 | elif args.prompt_strategy == 'Plan-and-Solve': 95 | step1_prompt = 'Let us first describe the plot and make a detailed plan step by step.' 96 | step2_prompt = ' Based on the above description, now we are prepared to generate the code. The generated code is surrounded by ```python and ``` to make it easier to be extracted by regular expressions. Therefore, the code is:' 97 | 98 | messages.append( 99 | { 100 | "role": "assistant", 101 | "content": step1_prompt 102 | } 103 | ) 104 | response = get_api_response(client, messages, args) 105 | tmp_result = extract_code(response.choices[0].message.content.strip()) 106 | if tmp_result != response.choices[0].message.content.strip(): 107 | print('step1') 108 | jsonl_file.write(json.dumps({'code': tmp_result, 'question_id': filename, 'ground_truth_path': image_path}) + "\n") 109 | jsonl_file.flush() 110 | continue 111 | print('steps2') 112 | messages.append( 113 | { 114 | "role": "assistant", 115 | "content": response.choices[0].message.content.strip() + step2_prompt 116 | } 117 | ) 118 | response = get_api_response(client, messages, args) 119 | else: 120 | response = get_api_response(client, messages, args) 121 | 122 | generated_code = extract_code(response.choices[0].message.content.strip()) 123 | 124 | jsonl_file.write(json.dumps({'code': generated_code, 'question_id': filename, 'ground_truth_path': image_path}) + "\n") 125 | jsonl_file.flush() -------------------------------------------------------------------------------- /plot2code/llm_generate_code.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import sys 3 | from openai import OpenAI 4 | import os 5 | import json 6 | from tqdm import tqdm 7 | import time 8 | from .utils import get_parser, get_save_path, direct_prompt, read_jsonl_file, get_api_response, extract_code, encode_image 9 | # Function to encode the 10 | 11 | parser = get_parser() 12 | 13 | args = parser.parse_args() 14 | # Directory containing your images 15 | image_directory = args.image_directory 16 | 17 | client = OpenAI( 18 | api_key=os.getenv("OPENAI_API_KEY"), 19 | base_url=os.getenv("OPENAI_API_BASE"), 20 | ) 21 | 22 | # Get current execute path 23 | current_path = sys.path[0] 24 | 25 | save_path = get_save_path(args) 26 | 27 | previous_results = None 28 | previous_filename = [] 29 | 30 | # check if the save_path is empty. 
if it is not empty, load the already generated results 31 | if os.path.exists(save_path) and os.path.getsize(save_path) > 0: 32 | previous_results = read_jsonl_file(save_path) 33 | previous_filename = [result['question_id'] for result in previous_results] 34 | 35 | if args.instruct: 36 | instructions = read_jsonl_file('data/ground_truth_code_with_instruction.jsonl') 37 | 38 | # Iterate through all the PNG files in the ground_truth folder 39 | # Open the JSONL file 40 | if __name__ == '__main__': 41 | 42 | with open(save_path, "w") as jsonl_file: 43 | 44 | # write the previous results to the file 45 | if previous_results is not None: 46 | for result in previous_results: 47 | jsonl_file.write(json.dumps(result) + '\n') 48 | jsonl_file.flush() 49 | 50 | for filename in tqdm(os.listdir(image_directory)): 51 | 52 | if filename in previous_filename: 53 | continue 54 | 55 | if filename.endswith(".png"): 56 | image_path = os.path.join(current_path, image_directory, filename) 57 | 58 | # Getting the base64 string 59 | base64_image = encode_image(image_path) 60 | 61 | if args.instruct: 62 | prompt = instructions[int(filename.rstrip('.png').lstrip('ground_truth_image_'))]['instruction'] + '\n' + direct_prompt 63 | else: 64 | prompt = direct_prompt 65 | 66 | messages = [ 67 | { 68 | "role": "system", 69 | "content": "You are a helpful assistant." 70 | }, 71 | { 72 | "role": "user", 73 | "content": [ 74 | { 75 | "type": "text", 76 | "text": prompt 77 | }, 78 | ], 79 | } 80 | ] 81 | 82 | 83 | response = get_api_response(client, messages, args) 84 | 85 | generated_code = extract_code(response.choices[0].message.content.strip()) 86 | 87 | jsonl_file.write(json.dumps({'code': generated_code, 'question_id': filename, 'ground_truth_path': image_path}) + "\n") 88 | jsonl_file.flush() -------------------------------------------------------------------------------- /plot2code/utils.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import base64 3 | import os 4 | import json 5 | import re 6 | import time 7 | 8 | # Generate the argparse 9 | def get_parser(): 10 | parser = argparse.ArgumentParser(description='Generate code from images using GPT-4 Vision') 11 | parser.add_argument('--image_directory', type=str, default='data/Plot2Code/test', help='Directory containing the images') 12 | parser.add_argument('--output_file', type=str, default='generated_code.jsonl', help='Output file to store the generated code') 13 | parser.add_argument('--model_name', type=str, default='gpt-4-vision-preview', help='Model name to use for generating the code') 14 | parser.add_argument('--model_path', type=str, default='/group/40034/chengyuewu/deepseek-vl-7b-chat', help='Model path to use for generating the code') 15 | parser.add_argument('--max_tokens', type=int, default=1024, help='Maximum tokens to use for generating the code') 16 | parser.add_argument('--temperature', type=int, default=0, help='Temperature to use for generating the code') 17 | parser.add_argument('--save_dir', type=str, default='generated_results', help='Directory to save the generated code') 18 | parser.add_argument('--prompt_strategy', type=str, default='default', help='Prompt strategy to use for generating the code') 19 | parser.add_argument('--ground_truth_code_file', type=str, default='data/Plot2Code/test/metadata.jsonl', help='ground truth code file') 20 | parser.add_argument('--max_retries', type=int, default=5, help='the maximum number of retries') 21 | parser.add_argument('--eval_dir', 
default='evaluation_results', type=str, help='Directory to save the evaluation results') 22 | parser.add_argument("--text_match_score_results", type=str, default="text_match_score.jsonl", help="Path to the JSONL file containing the text match scores") 23 | parser.add_argument("--gpt4-vision-evaluation-results", type=str, default="gpt_4v_evaluation_results.jsonl", help="Path to the JSONL file containing the GPT-4 Vision evaluation results") 24 | parser.add_argument('--final_score_results', type=str, default='final_score_results.jsonl', help='Output file to store the final score results') 25 | parser.add_argument('--instruct', action='store_true', help='Whether to use instruction or not') 26 | return parser 27 | 28 | 29 | def encode_image(image_path): 30 | with open(image_path, "rb") as image_file: 31 | return base64.b64encode(image_file.read()).decode('utf-8') 32 | 33 | 34 | def get_save_path(args): 35 | if args.instruct: 36 | save_path = os.path.join(args.save_dir, args.model_name, 'instruct', args.prompt_strategy, args.output_file) 37 | else: 38 | save_path = os.path.join(args.save_dir, args.model_name, 'direct', args.prompt_strategy, args.output_file) 39 | os.makedirs(os.path.dirname(save_path), exist_ok=True) 40 | return save_path 41 | 42 | def get_img_path(args): 43 | if args.instruct: 44 | save_path = os.path.join(args.save_dir, args.model_name, 'instruct', args.prompt_strategy, "generated_images") 45 | else: 46 | save_path = os.path.join(args.save_dir, args.model_name, 'direct', args.prompt_strategy, "generated_images") 47 | os.makedirs(save_path, exist_ok=True) 48 | return save_path 49 | 50 | def get_eval_path(args): 51 | if args.instruct: 52 | save_path = os.path.join(args.eval_dir, args.model_name, 'instruct', args.prompt_strategy) 53 | else: 54 | save_path = os.path.join(args.eval_dir, args.model_name, 'direct', args.prompt_strategy) 55 | os.makedirs(save_path, exist_ok=True) 56 | return save_path 57 | 58 | def read_jsonl_file(file_path): 59 | with open(file_path, 'r') as json_file: 60 | return [json.loads(line) for line in json_file] 61 | 62 | def extract_code(response_str): 63 | matches = re.findall(r'```python(.*?)```', response_str, re.DOTALL) 64 | if matches: 65 | return "\n".join(match.strip() for match in matches) 66 | else: 67 | return response_str 68 | 69 | def get_api_response(client, messages, args, model_name=None): 70 | retry_cnt = 0 71 | 72 | while retry_cnt < args.max_retries: 73 | try: 74 | response = client.chat.completions.create( 75 | model=args.model_name if model_name is None else model_name, 76 | messages=messages, 77 | max_tokens=args.max_tokens, 78 | n=1, 79 | temperature=args.temperature, 80 | ) 81 | break 82 | except Exception as e: 83 | backoff = 2 ** retry_cnt 84 | time.sleep(backoff) 85 | retry_cnt += 1 86 | print(f"Retry count: {retry_cnt}") 87 | return response 88 | 89 | direct_prompt = "You are a helpful assistant that can generate Python code using matplotlib." + \ 90 | "Generate the matplotlib code to create a plot that looks like the given image, as similar as possible." + \ 91 | "The generated code should be surrounded by ```python and ```\n" 92 | 93 | CoT_prompt = "Let us think step by step." 
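# ---------------------------------------------------------------------------
# Minimal illustrative self-check for extract_code(); the sample reply below is
# hypothetical and only meant to show that the fenced python block is pulled out
# of the surrounding prose while replies without a fence are returned unchanged.
# Run `python -m plot2code.utils` to try it.
if __name__ == "__main__":
    _sample_reply = (
        "Sure, here is the code you asked for:\n"
        "```python\n"
        "import matplotlib.pyplot as plt\n"
        "plt.plot([1, 2, 3], [1, 4, 9])\n"
        "plt.title('demo')\n"
        "```\n"
        "Let me know if you need anything else."
    )
    print(extract_code(_sample_reply))           # only the three pyplot lines
    print(extract_code("no fenced code here"))   # returned unchanged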
94 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | matplotlib==3.8.4 2 | Levenshtein 3 | NumPy 4 | Pillow 5 | openai -------------------------------------------------------------------------------- /scripts/evaluate-instruct.sh: -------------------------------------------------------------------------------- 1 | # #!/bin/bash 2 | 3 | model_name=$1 4 | prompt_strategy=$2 5 | 6 | echo "Executing generated code to draw images..." 7 | python -m plot2code.execute_generated_code --model_name "$model_name" --prompt_strategy $prompt_strategy --instruct 8 | 9 | echo "Calculating text match score..." 10 | python -m plot2code.eval.text_match_score --model_name "$model_name" --prompt_strategy $prompt_strategy --instruct 11 | 12 | echo "Calculating gpt-4v evaluation score..." 13 | python -m plot2code.eval.gpt4v_evaluations_score --model_name "$model_name" --prompt_strategy $prompt_strategy --instruct 14 | 15 | echo "Combining evaluation results..." 16 | python -m plot2code.eval.combine_evaluation_results --model_name "$model_name" --prompt_strategy $prompt_strategy --instruct 17 | 18 | echo "Done!" -------------------------------------------------------------------------------- /scripts/evaluate.sh: -------------------------------------------------------------------------------- 1 | # #!/bin/bash 2 | 3 | model_name=$1 4 | prompt_strategy=$2 5 | 6 | echo "Executing generated code to draw images..." 7 | python -m plot2code.execute_generated_code --model_name "$model_name" --prompt_strategy $prompt_strategy 8 | 9 | echo "Calculating text match score..." 10 | python -m plot2code.eval.text_match_score --model_name "$model_name" --prompt_strategy $prompt_strategy 11 | 12 | echo "Calculating gpt-4v evaluation score..." 13 | python -m plot2code.eval.gpt4v_evaluations_score --model_name "$model_name" --prompt_strategy $prompt_strategy 14 | 15 | echo "Combining evaluation results..." 16 | python -m plot2code.eval.combine_evaluation_results --model_name "$model_name" --prompt_strategy $prompt_strategy 17 | 18 | echo "Done!" -------------------------------------------------------------------------------- /scripts/generate_code.sh: -------------------------------------------------------------------------------- 1 | # GPT-4V generate code (direct asking) 2 | python -m plot2code.gpt4v_generate_code --prompt_strategy default 3 | 4 | # GPT-4V generate code (conditional asking) 5 | python -m plot2code.gpt4v_generate_code --prompt_strategy default --instruct 6 | 7 | # GPT-4V generate code (conditional asking with CoT) 8 | python -m plot2code.gpt4v_generate_code --prompt_strategy CoT --instruct --------------------------------------------------------------------------------
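The scripts above cover single-model evaluation only; the repository also contains plot2code/eval/gpt4v_evaluate_pairs.py for pairwise GPT-4V comparison of two result sets. Below is a minimal sketch of how it might be invoked, assuming both result sets have already been generated and executed; the flag names come from the script's argument parser, and your_other_model is a placeholder.

```bash
# Compare two models under the same prompt strategy
python -m plot2code.eval.gpt4v_evaluate_pairs --model_name gpt-4-vision-preview --test_model_name your_other_model --prompt_strategy default

# Compare two prompt strategies for the same model
python -m plot2code.eval.gpt4v_evaluate_pairs --model_name gpt-4-vision-preview --prompt_strategy default --test_prompt_strategy CoT
```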