├── LICENSE.md
├── README.md
├── ReviewFormat.txt
├── apikey.ini
├── chat_response.py
├── chat_reviewer.py
├── docker
│   ├── Dockerfile
│   ├── entrypoint.sh
│   └── requirements.txt
├── get_paper_from_pdf.py
├── images
│   ├── chatreviewer.jpg
│   └── chatreviewer.png
├── input_file
│   ├── demo1.pdf
│   └── demo2.pdf
├── output_file
│   ├── 2023-03-18-19-A Coarse-to-fine Cascaded Evidence-Distillation Neural Network for Zhiwei Yang Abstract Acknowledgments References Appendices.txt
│   ├── 2023-03-18-20-A Coarse-to-fine Cascaded Evidence-Distillation Neural Network for Zhiwei Yang Abstract Acknowledgments References Appendices.txt
│   └── 2023-03-18-21-A Coarse-to-fine Cascaded Evidence-Distillation Neural Network for Zhiwei Yang Abstract Acknowledgments References Appendices.txt
├── readme_en.md
├── requirements.txt
└── review_comments.txt
/LICENSE.md:
--------------------------------------------------------------------------------
1 |
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# ChatReviewer

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

**ChatReviewer has been welcomed by most people, but it has also drawn some doubts and concerns from a few.**

**Therefore, to keep a small minority from submitting the generated content directly as a paper review, an ethics statement is now appended to the end of every generated output,**

**and a warning is inserted throughout the output: Generated by ChatGPT, no copying allowed! (For the sake of ethics, some readability of the content has to be sacrificed~)**

**Picking the words back out of the generated content is less efficient than simply using ChatGPT directly, which should deter anyone from using this tool to review papers assigned to them.**

**The tool has helped quite a few people quickly summarize and assess papers from a reviewer's perspective, improving the efficiency of research and study; so far the benefits appear to outweigh the drawbacks.**

**If anyone has a better way to curb improper use by the few, feel free to leave a comment and do the research community a service.**

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

**Other related tool -- Academic GPT: https://huggingface.co/spaces/ShiwenNi/gpt-academic**

💥💥💥**The first web version of ChatReviewer is out! No VPN needed; just visit: https://huggingface.co/spaces/ShiwenNi/ChatReviewer**

💥💥💥**The first web version of ChatResponse is also out! No VPN needed; just visit: https://huggingface.co/spaces/ShiwenNi/ChatResponse**

**ChatPaper https://chatwithpaper.org/**

ChatReviewer is an intelligent paper-analysis and suggestion assistant built on the ChatGPT-3.5 API. It is designed to:
⭐️ Quickly summarize and analyze the strengths and weaknesses of papers, making it more efficient for researchers to read and understand the literature and keep up with the research frontier.
⭐️ Analyze your own paper, so you can use the improvement suggestions generated by ChatReviewer to fill the gaps and further raise its quality.

**ChatResponse is an AI assistant that automatically generates author replies from reviewers' comments. It is designed to:**
⭐️ Automatically extract each reviewer's questions and concerns from the received review comments and generate point-by-point replies.

Inspired by the earlier ChatPaper, I developed ChatReviewer over a weekend and open-sourced it for everyone. You are welcome to use it, ask questions, and share it!

♥ I update this project in my spare time; if it helps you, Stars and Forks are welcome, and so is sponsorship! ♥

**⭐️⭐️⭐️ Disclaimer: please take responsibility for the papers you review, and do not directly copy-paste any review comments generated by ChatReviewer!!!**

## Major updates:
- 💥**The web version now uses the GPT-4o-mini model**
- 💥**Added Docker deployment: deploy the services on your own server for faster, safer access; one command launches both services.** 2023/5/28
- 💥**Added the web version of ChatResponse!** 2023/3/27
- 💥💥💥**To make the tool easier to use for people without much computer-science background, after a night of hard overtime TAT, the first web version of ChatReviewer is out!!!** 2023/3/22
- Rewrote the section-split logic and fixed cases where fixed section headings could not be captured; changed the prompt mechanism to first ask ChatGPT which sections it is interested in and then send those sections. 2023/3/21
- **Added ChatResponse, an AI assistant that automatically generates author replies from reviewers' comments.** (ChatResponse and ChatReviewer are a bit like playing both sides of the same game...) 2023/3/19


## Usage steps:
Windows, Mac, and Linux all work. Python 3.8 or 3.9 is recommended, since versions below 3.8 do not support the tiktoken package.
1. Fill in your OpenAI API key (the string starting with sk-) in apikey.ini (see the example at the end of this section).
2. Use a VPN with global proxy enabled while running the tool (since OpenAI has blocked access from China).
3. Enter the special review format you want in ReviewFormat.txt (otherwise the default format is used).

4. Install the dependencies (VPN required):
```bash
pip install -r requirements.txt
```
Or use a domestic mirror:
```bash
pip install -r requirements.txt -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
```
5. Analyze a local paper: run chat_reviewer.py, for example:
```bash
python chat_reviewer.py --paper_path "input_file/demo1.pdf"
```
Batch-analyze local papers: run chat_reviewer.py, for example:
```bash
python chat_reviewer.py --paper_path "input_file"
```
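The review pipeline can also be driven from Python instead of the command line. Below is a minimal, hypothetical sketch (not an official API of this project) that mirrors the CLI defaults of chat_reviewer.py; the `argparse.Namespace` simply stands in for the parsed command-line arguments:
```python
# Hypothetical programmatic use of ChatReviewer -- a sketch mirroring the CLI
# defaults of chat_reviewer.py, not an official API of this project.
import argparse

from chat_reviewer import Reviewer
from get_paper_from_pdf import Paper

args = argparse.Namespace(
    paper_path="input_file/demo1.pdf",  # same fields as the CLI defaults
    file_format="txt",
    research_fields="computer science, artificial intelligence and reinforcement learning",
    language="en",
)
reviewer = Reviewer(args=args)  # reads the API keys from apikey.ini
reviewer.review_by_chatgpt(paper_list=[Paper(path=args.paper_path)])
```
The generated review is written to output_file/ with a timestamp and the sanitized paper title in the file name.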
Docker deployment:

```bash
docker run -d -p 7000:7000 -p 8000:8000 hanhongyong/chatreviewer:latest
```

Here port 7000 serves ChatReviewer and port 8000 serves ChatResponse. Note: the service must be deployed on a server outside mainland China!
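For reference, apikey.ini follows the format below (this mirrors the apikey.ini shipped with the repo). Several keys can be listed inside the brackets, separated by commas; the scripts rotate through them one request at a time:
```ini
[OpenAI]
; sk-key1/sk-key2 are placeholders for your real keys
OPENAI_API_KEYS = [sk-key1xxxxxxxxxxxxxxxx, sk-key2xxxxxxxxxxxxxxxx]
```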

## Example:


## Using ChatResponse
Reply to the local review comments in review_comments.txt: run chat_response.py, for example:
```bash
python chat_response.py --comment_path "review_comments.txt"
```
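ChatResponse can be driven from Python in the same way; again a minimal, hypothetical sketch mirroring the CLI defaults of chat_response.py:
```python
# Hypothetical programmatic use of ChatResponse -- a sketch mirroring the CLI
# defaults of chat_response.py, not an official API of this project.
import argparse

from chat_response import Response

args = argparse.Namespace(comment_path="review_comments.txt",
                          file_format="txt", language="en")
Response(args=args).response_by_chatgpt(comment_path=args.comment_path)
```
The generated reply is saved under response_file/ with a timestamped file name.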
Example:


## Acknowledgements:
- Thanks to OpenAI for the powerful ChatGPT API;
- Thanks to [kaixindelele](https://github.com/kaixindelele) for [ChatPaper](https://github.com/kaixindelele/ChatPaper) and the open-source spirit; the ChatReviewer code is adapted from ChatPaper.
--------------------------------------------------------------------------------
/ReviewFormat.txt:
--------------------------------------------------------------------------------
1 | * Overall Review
2 | Please briefly summarize the main points and contributions of this paper.
3 | xxx
4 |
5 | * Paper Strength
6 | Please provide a list of the strengths of this paper, including but not limited to: innovative and practical methodology, insightful empirical findings or in-depth theoretical analysis, well-structured review of relevant literature, and any other factors that may make the paper valuable to readers. (Maximum length: 2,000 characters)
7 | (1) xxx
8 | (2) xxx
9 | (3) xxx
10 | ...
11 |
12 | * Paper Weakness
13 | Please provide a numbered list of your main concerns regarding this paper (so authors could respond to the concerns individually). These may include, but are not limited to: inadequate implementation details for reproducing the study, limited evaluation and ablation studies for the proposed method, correctness of the theoretical analysis or experimental results, lack of comparisons or discussions with widely-known baselines in the field, lack of clarity in exposition, or any other factors that may impede the reader's understanding or benefit from the paper. Please kindly refrain from providing a general assessment of the paper's novelty without providing detailed explanations. (Maximum length: 2,000 characters)
14 | (1) xxx
15 | (2) xxx
16 | (3) xxx
17 | ...
18 |
19 | * Questions To Authors And Suggestions For Rebuttal
20 | Please provide a numbered list of specific and clear questions that pertain to the details of the proposed method, evaluation setting, or additional results that would aid in supporting the authors' claims. The questions should be formulated in a manner that, after the authors have answered them during the rebuttal, it would enable a more thorough assessment of the paper's quality. (Maximum length: 2,000 characters)
21 | xxx
22 |
23 | * Overall Score (1-10)
24 | The paper is scored on a scale of 1-10, with 10 being the full mark and 6 standing for borderline accept. Then give the reason for your rating.
25 | xxx
--------------------------------------------------------------------------------
/apikey.ini:
--------------------------------------------------------------------------------
1 | [OpenAI]
2 | OPENAI_API_KEYS = [sk-KXXXXXXXXXXXXXXXXXXXXXXXX, ]
3 |
4 |
--------------------------------------------------------------------------------
/chat_response.py:
--------------------------------------------------------------------------------
import os
import datetime
import time
import openai, tenacity
import argparse
import configparser
import tiktoken

# ChatResponse
# Define the Response class
class Response:
    # Initialization: set up the attributes
    def __init__(self, args=None):
        if args.language == 'en':
            self.language = 'English'
        elif args.language == 'zh':
            self.language = 'Chinese'
        else:
            self.language = 'Chinese'
        # Create a ConfigParser object
        self.config = configparser.ConfigParser()
        # Read the configuration file
        self.config.read('apikey.ini')
        # Read the key list: strip the surrounding brackets and quotes,
        # then split the comma-separated API keys
        self.chat_api_list = self.config.get('OpenAI', 'OPENAI_API_KEYS')[1:-1].replace('\'', '').split(',')
        self.chat_api_list = [api.strip() for api in self.chat_api_list if len(api) > 5]
        self.cur_api = 0
        self.file_format = args.file_format
        self.max_token_num = 4096
        # Note: gpt-3.5-turbo actually uses the cl100k_base encoding;
        # "gpt2" is only an approximation for counting tokens.
        self.encoding = tiktoken.get_encoding("gpt2")

    def response_by_chatgpt(self, comment_path):
        htmls = []
        # Read the review comments
        with open(comment_path, 'r') as file:
            comments = file.read()

        chat_response_text = self.chat_response(text=comments)
        htmls.append(chat_response_text)

        # Save the generated response
        date_str = str(datetime.datetime.now())[:13].replace(' ', '-')
        export_path = os.path.join('./', 'response_file')
        os.makedirs(export_path, exist_ok=True)
        file_name = os.path.join(export_path, date_str + '-Response.' + self.file_format)
        self.export_to_markdown("\n".join(htmls), file_name=file_name)

    @tenacity.retry(wait=tenacity.wait_exponential(multiplier=1, min=4, max=10),
                    stop=tenacity.stop_after_attempt(5),
                    reraise=True)
    def chat_response(self, text):
        # Rotate through the configured API keys, one per request
        openai.api_key = self.chat_api_list[self.cur_api]
        self.cur_api += 1
        self.cur_api = 0 if self.cur_api >= len(self.chat_api_list) else self.cur_api
        response_prompt_token = 1000
        text_token = len(self.encoding.encode(text))
        # Estimate how many characters fit into the remaining token budget,
        # assuming a roughly constant characters-per-token ratio
        input_text_index = int(len(text) * (self.max_token_num - response_prompt_token) / text_token)
        input_text = "This is the review comments:" + text[:input_text_index]
        messages = [
            {"role": "system", "content": """You are the author, you submitted a paper, and the reviewers gave the review comments.
Please reply with what we have done, not what we will do.
You need to extract questions from the review comments one by one, and then respond point-to-point to the reviewers' concerns.
Please answer in {}. Follow the output format below:
- Response to reviewers
#1 reviewer
Concern #1: xxxx
Author response: xxxxx

Concern #2: xxxx
Author response: xxxxx
...

#2 reviewer
Concern #1: xxxx
Author response: xxxxx

Concern #2: xxxx
Author response: xxxxx
...

#3 reviewer
Concern #1: xxxx
Author response: xxxxx

Concern #2: xxxx
Author response: xxxxx
...

""".format(self.language)
             },
            {"role": "user", "content": input_text},
        ]

        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
        )
        result = ''
        for choice in response.choices:
            result += choice.message.content
        print("********" * 10)
        print(result)
        print("********" * 10)
        print("prompt_token_used:", response.usage.prompt_tokens)
        print("completion_token_used:", response.usage.completion_tokens)
        print("total_token_used:", response.usage.total_tokens)
        print("response_time:", response.response_ms / 1000.0, 's')
        return result

    def export_to_markdown(self, text, file_name, mode='w'):
        # Write the text out as-is (Markdown-to-HTML conversion via the
        # markdown module is intentionally left out)
        with open(file_name, mode, encoding="utf-8") as f:
            f.write(text)

def main(args):
    response = Response(args=args)
    response.response_by_chatgpt(comment_path=args.comment_path)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--comment_path", type=str, default='review_comments.txt', help="path of comment")
    parser.add_argument("--file_format", type=str, default='txt', help="output file format")
    parser.add_argument("--language", type=str, default='en', help="output language, en or zh")

    args = parser.parse_args()
    start_time = time.time()
    main(args=args)
    print("response time:", time.time() - start_time)
--------------------------------------------------------------------------------
/chat_reviewer.py:
--------------------------------------------------------------------------------
import os
import re
import datetime
import time
import openai, tenacity
import argparse
import configparser
import tiktoken
from get_paper_from_pdf import Paper
import jieba

def contains_chinese(text):
    for ch in text:
        if u'\u4e00' <= ch <= u'\u9fff':
            return True
    return False

def insert_sentence(text, sentence, interval):
    # Insert `sentence` after every `interval` words as an anti-copy watermark;
    # Chinese lines are segmented with jieba, other lines are split on spaces.
    lines = text.split('\n')
    new_lines = []

    for line in lines:
        if contains_chinese(line):
            words = list(jieba.cut(line))
            separator = ''
        else:
            words = line.split()
            separator = ' '

        new_words = []
        count = 0

        for word in words:
            new_words.append(word)
            count += 1

            if count % interval == 0:
                new_words.append(sentence)

        new_lines.append(separator.join(new_words))

    return '\n'.join(new_lines)

# Define the Reviewer class
class Reviewer:
    # Initialization: set up the attributes
    def __init__(self, args=None):
        if args.language == 'en':
            self.language = 'English'
        elif args.language == 'zh':
            self.language = 'Chinese'
        else:
            self.language = 'Chinese'
        self.research_fields = args.research_fields
        # Create a ConfigParser object
        self.config = configparser.ConfigParser()
        # Read the configuration file
        self.config.read('apikey.ini')
        # Read the key list: strip the surrounding brackets and quotes,
        # then split the comma-separated API keys
        self.chat_api_list = self.config.get('OpenAI', 'OPENAI_API_KEYS')[1:-1].replace('\'', '').split(',')
        self.chat_api_list = [api.strip() for api in self.chat_api_list if len(api) > 5]
        self.cur_api = 0
        self.file_format = args.file_format
        self.max_token_num = 4096
        self.encoding = tiktoken.get_encoding("gpt2")

    def validateTitle(self, title):
        # Sanitize the paper title so it can be used in a file name
        rstr = r"[\/\\\:\*\?\"\<\>\|]"  # '/ \ : * ? " < > |'
        new_title = re.sub(rstr, "_", title)  # replace with underscores
        return new_title


    def review_by_chatgpt(self, paper_list):
        htmls = []
        for paper_index, paper in enumerate(paper_list):
            sections_of_interest = self.stage_1(paper)
            # extract the essential parts of the paper
            text = ''
            text += 'Title:' + paper.title + '. '
            text += 'Abstract: ' + paper.section_texts['Abstract']
            intro_title = next((item for item in paper.section_names if 'ntroduction' in item.lower()), None)
            if intro_title is not None:
                text += 'Introduction: ' + paper.section_texts[intro_title]
            # Similar for conclusion section
            conclusion_title = next((item for item in paper.section_names if 'onclusion' in item.lower()), None)
            if conclusion_title is not None:
                text += 'Conclusion: ' + paper.section_texts[conclusion_title]
            for heading in sections_of_interest:
                if heading in paper.section_names:
                    text += heading + ': ' + paper.section_texts[heading]
            chat_review_text = self.chat_review(text=text)
            htmls.append('## Paper:' + str(paper_index+1))
            htmls.append('\n\n\n')
            htmls.append(chat_review_text)

            # Save the review comments
            date_str = str(datetime.datetime.now())[:13].replace(' ', '-')
            export_path = os.path.join('./', 'output_file')
            os.makedirs(export_path, exist_ok=True)
            mode = 'w' if paper_index == 0 else 'a'
            file_name = os.path.join(export_path, date_str+'-'+self.validateTitle(paper.title)+"."+self.file_format)
            self.export_to_markdown("\n".join(htmls), file_name=file_name, mode=mode)
            htmls = []


    def stage_1(self, paper):
        # Stage 1: send only the title, abstract and section headings, and ask
        # the model which (at most two) sections it wants to read in full
        text = ''
        text += 'Title: ' + paper.title + '. '
        text += 'Abstract: ' + paper.section_texts['Abstract']
        text_token = len(self.encoding.encode(text))
        if text_token > self.max_token_num/2 - 800:
            input_text_index = int(len(text)*((self.max_token_num/2)-800)/text_token)
            text = text[:input_text_index]
        openai.api_key = self.chat_api_list[self.cur_api]
        self.cur_api += 1
        self.cur_api = 0 if self.cur_api >= len(self.chat_api_list) else self.cur_api
        messages = [
            {"role": "system",
             "content": f"You are a professional reviewer in the field of {self.research_fields}. "
                        f"I will give you a paper. You need to review this paper and discuss the novelty and originality of ideas, correctness, clarity, the significance of results, potential impact and quality of the presentation. "
                        f"Due to the length limitations, I am only allowed to provide you the abstract, introduction, conclusion and at most two sections of this paper. "
                        f"Now I will give you the title and abstract and the headings of potential sections. "
                        f"You need to reply at most two headings. Then I will further provide you the full information, including the aforementioned sections and at most two sections you asked for.\n\n"
                        f"Title: {paper.title}\n\n"
                        f"Abstract: {paper.section_texts['Abstract']}\n\n"
                        f"Potential Sections: {paper.section_names[2:-1]}\n\n"
                        f"Follow the following format to output your choice of sections: "
                        f"{{chosen section 1}}, {{chosen section 2}}\n\n"},
            {"role": "user", "content": text},
        ]
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
        )
        result = ''
        for choice in response.choices:
            result += choice.message.content
        print(result)
        return [heading.strip() for heading in result.split(',')]

    @tenacity.retry(wait=tenacity.wait_exponential(multiplier=1, min=4, max=10),
                    stop=tenacity.stop_after_attempt(5),
                    reraise=True)
    def chat_review(self, text):
        openai.api_key = self.chat_api_list[self.cur_api]
        self.cur_api += 1
        self.cur_api = 0 if self.cur_api >= len(self.chat_api_list) else self.cur_api
        review_prompt_token = 1000
        text_token = len(self.encoding.encode(text))
        if text_token > self.max_token_num/2 - 800:
            input_text_index = int(len(text)*((self.max_token_num/2)-800)/text_token)
            text = text[:input_text_index]
        input_text = "This is the paper for your review:" + text
        with open('ReviewFormat.txt', 'r') as file:  # read the custom review format
            review_format = file.read()
        messages = [
            {"role": "system", "content": "You are a professional reviewer in the field of "+self.research_fields+". Now I will give you a paper. You need to give a complete review opinion according to the following requirements and format:"+ review_format +" Please answer in {}.".format(self.language)},
            {"role": "user", "content": input_text},
        ]

        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
        )
        result = ''
        for choice in response.choices:
            result += choice.message.content
        result = insert_sentence(result, '**Generated by ChatGPT, no copying allowed!**', 15)
        result += "\n\n⚠伦理声明/Ethics statement:\n--禁止直接复制生成的评论用于任何论文审稿工作!\n--Direct copying of generated comments for any paper review work is prohibited!"
        print("********"*10)
        print(result)
        print("********"*10)
        print("prompt_token_used:", response.usage.prompt_tokens)
        print("completion_token_used:", response.usage.completion_tokens)
        print("total_token_used:", response.usage.total_tokens)
        print("response_time:", response.response_ms/1000.0, 's')
        return result

    def export_to_markdown(self, text, file_name, mode='w'):
        # Write the text out as-is (Markdown-to-HTML conversion via the
        # markdown module is intentionally left out)
        with open(file_name, mode, encoding="utf-8") as f:
            f.write(text)

def main(args):
    reviewer1 = Reviewer(args=args)
    # Decide whether the argument is a single PDF file or a directory
    paper_list = []
    if args.paper_path.endswith(".pdf"):
        paper_list.append(Paper(path=args.paper_path))
    else:
        for root, dirs, files in os.walk(args.paper_path):
            print("root:", root, "dirs:", dirs, 'files:', files)  # current directory
            for filename in files:
                # Collect every PDF file found under the directory
                if filename.endswith(".pdf"):
                    paper_list.append(Paper(path=os.path.join(root, filename)))
    print("------------------paper_num: {}------------------".format(len(paper_list)))
    for paper_index, paper in enumerate(paper_list):
        print(paper_index, os.path.basename(paper.path))
    reviewer1.review_by_chatgpt(paper_list=paper_list)



if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--paper_path", type=str, default='', help="path of papers")
    parser.add_argument("--file_format", type=str, default='txt', help="output file format")
    parser.add_argument("--research_fields", type=str, default='computer science, artificial intelligence and reinforcement learning', help="the research fields of paper")
    parser.add_argument("--language", type=str, default='en', help="output language, en or zh")

    args = parser.parse_args()
    start_time = time.time()
    main(args=args)
    print("review time:", time.time() - start_time)
--------------------------------------------------------------------------------
/docker/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM python:3.10-slim
2 | LABEL maintainer="hanhongyong"
3 | RUN apt-get update \
4 | && apt-get install -y libpq-dev build-essential \
5 | && apt-get clean
6 | RUN mkdir /app
7 | RUN mkdir -p /app/ChatResponse && mkdir -p /app/ChatReviewer
8 | COPY requirements.txt /app/
9 | COPY ChatResponse/* /app/ChatResponse
10 | COPY ChatReviewer/* /app/ChatReviewer
11 | COPY entrypoint.sh /app/
12 | RUN pip install --user --no-cache-dir -r /app/requirements.txt
13 | WORKDIR /app
14 | EXPOSE 7000
15 | EXPOSE 8000
16 | RUN chmod +x entrypoint.sh
17 | ENTRYPOINT ["sh","./entrypoint.sh"]
18 |
19 |
20 |
21 |
--------------------------------------------------------------------------------
/docker/entrypoint.sh:
--------------------------------------------------------------------------------
1 | python ChatResponse/app.py & python ChatReviewer/app.py
--------------------------------------------------------------------------------
/docker/requirements.txt:
--------------------------------------------------------------------------------
1 | PyMuPDF==1.21.1
2 | jieba
3 | tiktoken==0.2.0
4 | tenacity==8.2.2
5 | pybase64==1.2.3
6 | Pillow==9.4.0
7 | openai==0.27.0
8 | markdown
9 | gradio==3.20.1
10 | PyPDF2
--------------------------------------------------------------------------------
/get_paper_from_pdf.py:
--------------------------------------------------------------------------------
import fitz
from collections import Counter
import json
import re


class Paper:
    def __init__(self, path, title='', url='', abs='', authors=None):
        # Initialize a Paper object from a PDF path
        self.url = url  # paper link
        self.path = path  # PDF path
        self.section_names = []  # section headings
        self.section_texts = {}  # section contents
        self.abs = abs
        self.title_page = 0
        self.authors = authors if authors is not None else []
        self.roman_num = ["I", "II", 'III', "IV", "V", "VI", "VII", "VIII", "IIX", "IX", "X"]
        self.digit_num = [str(d + 1) for d in range(10)]
        self.first_image = ''
        if title == '':
            self.pdf = fitz.open(self.path)  # the PDF document
            self.title = self.get_title()
            self.parse_pdf()
        else:
            self.title = title

    def parse_pdf(self):
        self.pdf = fitz.open(self.path)  # the PDF document
        self.text_list = [page.get_text() for page in self.pdf]
        self.all_text = ' '.join(self.text_list)
        self.extract_section_information()
        self.section_texts.update({"title": self.title})
        self.pdf.close()

    # Identify chapter headings (short lines that start with a Roman or Arabic
    # numeral followed by a period) and return them as a list
    def get_chapter_names(self):
        doc = fitz.open(self.path)  # the PDF document
        text_list = [page.get_text() for page in doc]
        all_text = ''
        for text in text_list:
            all_text += text
        # Collect candidate chapter names
        chapter_names = []
        for line in all_text.split('\n'):
            if '.' in line:
                point_split_list = line.split('.')
                space_split_list = line.split(' ')
                if 1 < len(space_split_list) < 5:
                    if 1 < len(point_split_list) < 5 and (
                            point_split_list[0] in self.roman_num or point_split_list[0] in self.digit_num):
                        chapter_names.append(line)

        return chapter_names

    def get_title(self):
        doc = self.pdf  # the open PDF document
        max_font_size = 0  # largest font size seen so far
        max_string = ""  # text of the span with the largest font size
        max_font_sizes = [0]
        for page_index, page in enumerate(doc):  # iterate over the pages
            text = page.get_text("dict")  # text info on the page
            blocks = text["blocks"]  # list of text blocks
            for block in blocks:  # iterate over the blocks
                if block["type"] == 0 and len(block['lines']):  # text block
                    if len(block["lines"][0]["spans"]):
                        font_size = block["lines"][0]["spans"][0]["size"]  # font size of the first span of the first line
                        max_font_sizes.append(font_size)
                        if font_size > max_font_size:  # track the largest font size
                            max_font_size = font_size
                            max_string = block["lines"][0]["spans"][0]["text"]
        max_font_sizes.sort()
        # The title is assumed to be set in (close to) the largest fonts used
        cur_title = ''
        for page_index, page in enumerate(doc):
            text = page.get_text("dict")
            blocks = text["blocks"]
            for block in blocks:
                if block["type"] == 0 and len(block['lines']):
                    if len(block["lines"][0]["spans"]):
                        cur_string = block["lines"][0]["spans"][0]["text"]  # text of the first span
                        font_size = block["lines"][0]["spans"][0]["size"]  # its font size
                        if abs(font_size - max_font_sizes[-1]) < 0.3 or abs(font_size - max_font_sizes[-2]) < 0.3:
                            if len(cur_string) > 4 and "arXiv" not in cur_string:
                                if cur_title == '':
                                    cur_title += cur_string
                                else:
                                    cur_title += ' ' + cur_string
                                self.title_page = page_index
        title = cur_title.replace('\n', ' ')
        return title

    def extract_section_information(self):
        doc = fitz.open(self.path)

        # Collect every font size used in the document
        font_sizes = []
        for page in doc:
            blocks = page.get_text("dict")["blocks"]
            for block in blocks:
                if 'lines' not in block:
                    continue
                lines = block["lines"]
                for line in lines:
                    for span in line["spans"]:
                        font_sizes.append(span["size"])
        most_common_size, _ = Counter(font_sizes).most_common(1)[0]

        # Use the most common (body-text) font size as the heading threshold
        threshold = most_common_size

        section_dict = {}
        section_dict["Abstract"] = ""
        last_heading = None
        subheadings = []
        heading_font = -1
        # Walk the pages looking for section headings
        found_abstract = False
        upper_heading = False
        font_heading = False
        for page in doc:
            blocks = page.get_text("dict")["blocks"]
            for block in blocks:
                if not found_abstract:
                    try:
                        text = json.dumps(block)
                    except Exception:
                        continue
                    if re.search(r"\bAbstract\b", text, re.IGNORECASE):
                        found_abstract = True
                        last_heading = "Abstract"
                if found_abstract:
                    if 'lines' not in block:
                        continue
                    lines = block["lines"]
                    for line in lines:
                        for span in line["spans"]:
                            # Case 1: headings set entirely in upper case, for
                            # papers whose headings use the body font size
                            if not font_heading and span["text"].isupper() and sum(1 for c in span["text"] if c.isupper() and ('A' <= c <= 'Z')) > 4:
                                upper_heading = True
                                heading = span["text"].strip()
                                if "References" in heading:  # ignore everything after the references
                                    self.section_names = subheadings
                                    self.section_texts = section_dict
                                    return
                                subheadings.append(heading)
                                if last_heading is not None:
                                    section_dict[last_heading] = section_dict[last_heading].strip()
                                section_dict[heading] = ""
                                last_heading = heading
                            # Case 2 (the normal case): headings detected by a
                            # font size larger than the body text
                            if not upper_heading and span["size"] > threshold and re.match(
                                    r"[0-9]*\.* *[A-Z][a-z]+(?:\s[A-Z][a-z]+)*",
                                    span["text"].strip()):
                                font_heading = True
                                if heading_font == -1:
                                    heading_font = span["size"]
                                elif heading_font != span["size"]:
                                    continue
                                heading = span["text"].strip()
                                if "References" in heading:  # ignore everything after the references
                                    self.section_names = subheadings
                                    self.section_texts = section_dict
                                    return
                                subheadings.append(heading)
                                if last_heading is not None:
                                    section_dict[last_heading] = section_dict[last_heading].strip()
                                section_dict[heading] = ""
                                last_heading = heading
                            # Otherwise append the text to the current section
                            elif last_heading is not None:
                                section_dict[last_heading] += " " + span["text"].strip()
        self.section_names = subheadings
        self.section_texts = section_dict


def main():
    path = r'demo.pdf'
    paper = Paper(path=path)  # the PDF is parsed during initialization
    # for key, value in paper.section_texts.items():
    #     print(key, value)
    #     print("*" * 40)


if __name__ == '__main__':
    main()
--------------------------------------------------------------------------------
/images/chatreviewer.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nishiwen1214/ChatReviewer/da15dc4179dcf6c18c2ff27bafb759adae6c6d0f/images/chatreviewer.jpg
--------------------------------------------------------------------------------
/images/chatreviewer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nishiwen1214/ChatReviewer/da15dc4179dcf6c18c2ff27bafb759adae6c6d0f/images/chatreviewer.png
--------------------------------------------------------------------------------
/input_file/demo1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nishiwen1214/ChatReviewer/da15dc4179dcf6c18c2ff27bafb759adae6c6d0f/input_file/demo1.pdf
--------------------------------------------------------------------------------
/input_file/demo2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nishiwen1214/ChatReviewer/da15dc4179dcf6c18c2ff27bafb759adae6c6d0f/input_file/demo2.pdf
--------------------------------------------------------------------------------
/output_file/2023-03-18-19-A Coarse-to-fine Cascaded Evidence-Distillation Neural Network for Zhiwei Yang Abstract Acknowledgments References Appendices.txt:
--------------------------------------------------------------------------------
1 | ## Paper:1
2 |
3 |
4 |
5 |
6 | Overall Review:
7 | The paper proposes a novel Coarse-to-fine Cascaded Evidence-Distillation (CofCED) neural network for explainable fake news detection. The proposed model selects the most explainable sentences for verdicts based on raw reports, thereby reducing the dependency on fact-checked reports. The paper presents two explainable fake news datasets and experimental results demonstrating that the proposed model outperforms state-of-the-art detection baselines and generates high-quality explanations.
8 |
9 | Paper Strength:
10 | (1) The paper addresses an important and timely problem of fake news detection and provide insights into the limitations of existing methods.
11 | (2) The proposed CofCED model is innovative and utilizes a hierarchical encoder and cascaded selectors for selecting explainable sentences.
12 | (3) The paper contributes to the research community by presenting two publicly available datasets for explainable fake news detection.
13 |
14 | Paper Weakness:
15 | (1) The paper could benefit from more detailed clarification of the proposed model's architecture and implementation details.
16 | (2) The paper lacks comparison with more relevant and widely-used baseline methods in the field.
17 | (3) Although the paper constructs two explainable fake news datasets, the paper does not describe the process and criteria for creating them.
18 |
19 | Questions To Authors And Suggestions For Rebuttal:
20 | (1) Can the authors provide additional information on the proposed model's architecture and implementation details?
21 | (2) Can the authors compare their proposed method with additional relevant and widely-used baseline methods in the field?
22 | (3) Can the authors provide more details on the process and criteria for creating the two constructed explainable fake news datasets?
23 |
24 | Overall score (1-5): 4
25 | The paper provides an innovative approach to fake news detection using a cascade of selectors and presents two publicly available datasets for the research community. However, the paper could benefit from additional details on architectural and implementation details and comparisons with more relevant baselines.
--------------------------------------------------------------------------------
/output_file/2023-03-18-20-A Coarse-to-fine Cascaded Evidence-Distillation Neural Network for Zhiwei Yang Abstract Acknowledgments References Appendices.txt:
--------------------------------------------------------------------------------
1 | ## Paper:1
2 |
3 |
4 |
5 |
6 |
7 |
8 | Overall Review:
9 | This paper proposes a novel Coarse-to-fine Cascaded Evidence-Distillation (CofCED) neural network for explainable fake news detection based on raw reports from different media outlets. The proposed model consists of a hierarchical encoder for web text representation, and two cascaded selectors to select the most explainable sentences for verdicts. The proposed method outperforms state-of-the-art detection baselines and generates high-quality explanations from diverse evaluation perspectives. The paper also presents two explainable fake news datasets, which are publicly available.
10 |
11 | Paper Strength:
12 | 1. The proposed CofCED model is innovative and practical, and solves the critical issue of fact-checking delays in detecting fake news.
13 | 2. The paper presents two explainable fake news datasets, which provide contributions to the research community.
14 | 3. The experiments demonstrate that the proposed method outperforms the state-of-the-art detection baselines and generates high-quality explanations.
15 |
16 | Paper Weakness:
17 | 1. The paper lacks explicit description of the implementation details for reproducing the study.
18 | 2. The evaluation and ablation studies for the proposed method are limited, lacking deeper analysis of the performance.
19 | 3. There is a lack of comparisons or discussions with widely-known baselines in the field.
20 |
21 | Questions to Authors and Suggestions for Rebuttal:
22 | 1. Could the authors provide more detailed information on the implementation process of the proposed method?
23 | 2. Could the authors conduct more extensive evaluation and ablation studies to support the proposed method's performance?
24 | 3. Could the authors compare the proposed method with more widely-known baselines in the field?
25 |
26 | Overall score:
27 | 4. The paper proposes a innovative and practical method for explainable fake news detection, and presents two valuable datasets to the research community. However, some weak points in the paper, such as limited evaluation and ablation studies and unclear comparisons with widely-known baselines, still need to be improved.
--------------------------------------------------------------------------------
/output_file/2023-03-18-21-A Coarse-to-fine Cascaded Evidence-Distillation Neural Network for Zhiwei Yang Abstract Acknowledgments References Appendices.txt:
--------------------------------------------------------------------------------
1 | ## Paper:1
2 |
3 |
4 |
5 |
6 | Overall Review:
7 |
8 | This paper proposes a novel Coarse-to-fine Cascaded Evidence-Distillation (CofCED) neural network model for explainable fake news detection. Starting from a large number of raw reports, it progressively refines sentence selection through two cascaded selectors to classify the veracity of news. Experimental results on two publicly accessible fake news datasets show that the method clearly outperforms existing techniques and generates high-quality explanations from multiple evaluation perspectives.
9 |
10 | Paper Strength:
11 |
12 | (1) The authors propose a novel method for fake news detection based on raw reports, and the experimental results demonstrate its effectiveness.
13 |
14 | (2) The authors construct two publicly available explainable fake news datasets, which will benefit follow-up research in the community.
15 |
16 | (3) The proposed model is explainable: it analyzes semantic features and generates high-quality explanations with good readability and comprehensibility.
17 |
18 | Paper Weakness:
19 |
20 | (1) The paper lacks detailed implementation details, making it difficult to reproduce the experiments.
21 |
22 | (2) There is no in-depth comparative analysis against other strong methods, so its advantages are hard to judge.
23 |
24 | (3) The paper falls short in explaining how the datasets were constructed, their size, and related issues, which needs further elaboration.
25 |
26 | Questions To Authors And Suggestions For Rebuttal:
27 |
28 | None for now.
29 |
30 | Overall score:
31 |
32 | Score: 3. This paper proposes a novel, explainable fake news detection method, constructs open datasets, and evaluates it from multiple perspectives. However, it lacks experimental details, offers insufficient comparison with other strong methods, and leaves dataset-construction questions open, all of which should be addressed.
--------------------------------------------------------------------------------
/readme_en.md:
--------------------------------------------------------------------------------
# ChatReviewer & ChatResponse


Inspired by ChatPaper, I developed ChatReviewer over a weekend and open-sourced it for everyone.

**ChatReviewer is an automatic paper-review AI assistant built on the ChatGPT-3.5 API.**

If it is helpful to you, a Star or Fork is affirmation and encouragement for me.

Feel free to share it, and to raise any questions and improvement ideas!

⭐️⭐️⭐️**Warning: ChatReviewer was developed to help people improve the efficiency and quality of reviewing, not to replace independent human review. Please take responsibility for the papers you review and do not directly copy-paste any generated review comments!!!**

## Major updates:
- **Added ChatResponse, an AI assistant that automatically generates author responses based on reviewers' comments. (ChatResponse and ChatReviewer are a bit like playing both sides of the same game...)**

## Steps to use:
Windows, Mac and Linux are all supported. Python 3.8 or 3.9 is preferred, because tiktoken is not supported below 3.8.
1. Fill in your OpenAI API key (the string starting with sk-) in apikey.ini.
2. Enter the review format you want in ReviewFormat.txt (otherwise the default format is used).
3. Install the requirements:
```bash
pip install -r requirements.txt
```
4. To review a paper locally: run chat_reviewer.py, e.g.
```bash
python chat_reviewer.py --paper_path "input_file/demo1.pdf"
```
To do a batch review of local papers: run chat_reviewer.py, e.g.
```bash
python chat_reviewer.py --paper_path "input_file_path"
```
## Example:

## Use ChatResponse
To reply to local review comments in review_comments.txt: run chat_response.py, e.g.
```bash
python chat_response.py --comment_path "review_comments.txt"
```
Example:


## Acknowledgements:
- Thanks to OpenAI for the powerful ChatGPT API.
- Thanks to [kaixindelele](https://github.com/kaixindelele) for [ChatPaper](https://github.com/kaixindelele/ChatPaper) and the spirit of open source; the code of ChatReviewer is adapted from ChatPaper.
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | PyMuPDF==1.21.1
2 | tiktoken==0.2.0
3 | tenacity==8.2.2
4 | pybase64==1.2.3
5 | Pillow==9.4.0
6 | openai==0.27.0
7 | markdown
8 | gradio==3.20.1
9 |
--------------------------------------------------------------------------------
/review_comments.txt:
--------------------------------------------------------------------------------
1 | #1 Reviewer
2 |
3 | Overall Review:
4 | The paper proposes a novel Coarse-to-fine Cascaded Evidence-Distillation (CofCED) neural network for explainable fake news detection. The proposed model selects the most explainable sentences for verdicts based on raw reports, thereby reducing the dependency on fact-checked reports. The paper presents two explainable fake news datasets and experimental results demonstrating that the proposed model outperforms state-of-the-art detection baselines and generates high-quality explanations.
5 |
6 | Paper Strength:
7 | (1) The paper addresses an important and timely problem of fake news detection and provide insights into the limitations of existing methods.
8 | (2) The proposed CofCED model is innovative and utilizes a hierarchical encoder and cascaded selectors for selecting explainable sentences.
9 | (3) The paper contributes to the research community by presenting two publicly available datasets for explainable fake news detection.
10 |
11 | Paper Weakness:
12 | (1) The paper could benefit from more detailed clarification of the proposed model's architecture and implementation details.
13 | (2) The paper lacks comparison with more relevant and widely-used baseline methods in the field.
14 | (3) Although the paper constructs two explainable fake news datasets, the paper does not describe the process and criteria for creating them.
15 |
16 | Questions To Authors And Suggestions For Rebuttal:
17 | (1) Can the authors provide additional information on the proposed model's architecture and implementation details?
18 | (2) Can the authors compare their proposed method with additional relevant and widely-used baseline methods in the field?
19 | (3) Can the authors provide more details on the process and criteria for creating the two constructed explainable fake news datasets?
20 |
21 | Overall score (1-5): 4
22 | The paper provides an innovative approach to fake news detection using a cascade of selectors and presents two publicly available datasets for the research community. However, the paper could benefit from additional details on architectural and implementation details and comparisons with more relevant baselines.
23 |
24 | #2 Reviewer
25 |
26 | Overall Review:
27 | The paper proposes a novel Coarse-to-fine Cascaded Evidence-Distillation (CofCED) neural network for explainable fake news detection. The proposed model selects the most explainable sentences for verdicts based on raw reports, thereby reducing the dependency on fact-checked reports. The paper presents two explainable fake news datasets and experimental results demonstrating that the proposed model outperforms state-of-the-art detection baselines and generates high-quality explanations.
28 |
29 | Paper Strength:
30 | (1) The paper addresses an important and timely problem of fake news detection and provide insights into the limitations of existing methods.
31 | (2) The proposed CofCED model is innovative and utilizes a hierarchical encoder and cascaded selectors for selecting explainable sentences.
32 | (3) The paper contributes to the research community by presenting two publicly available datasets for explainable fake news detection.
33 |
34 | Paper Weakness:
35 | (1) The paper could benefit from more detailed clarification of the proposed model's architecture and implementation details.
36 | (2) The paper lacks comparison with more relevant and widely-used baseline methods in the field.
37 | (3) Although the paper constructs two explainable fake news datasets, the paper does not describe the process and criteria for creating them.
38 |
39 | Questions To Authors And Suggestions For Rebuttal:
40 | (1) Can the authors provide additional information on the proposed model's architecture and implementation details?
41 | (2) Can the authors compare their proposed method with additional relevant and widely-used baseline methods in the field?
42 | (3) Can the authors provide more details on the process and criteria for creating the two constructed explainable fake news datasets?
43 |
44 | Overall score (1-5): 4
45 | The paper provides an innovative approach to fake news detection using a cascade of selectors and presents two publicly available datasets for the research community. However, the paper could benefit from additional details on architectural and implementation details and comparisons with more relevant baselines.
46 |
47 | #3 Reviewer
48 |
49 | Overall Review:
50 | The paper proposes a novel Coarse-to-fine Cascaded Evidence-Distillation (CofCED) neural network for explainable fake news detection. The proposed model selects the most explainable sentences for verdicts based on raw reports, thereby reducing the dependency on fact-checked reports. The paper presents two explainable fake news datasets and experimental results demonstrating that the proposed model outperforms state-of-the-art detection baselines and generates high-quality explanations.
51 |
52 | Paper Strength:
53 | (1) The paper addresses an important and timely problem of fake news detection and provide insights into the limitations of existing methods.
54 | (2) The proposed CofCED model is innovative and utilizes a hierarchical encoder and cascaded selectors for selecting explainable sentences.
55 | (3) The paper contributes to the research community by presenting two publicly available datasets for explainable fake news detection.
56 |
57 | Paper Weakness:
58 | (1) The paper could benefit from more detailed clarification of the proposed model's architecture and implementation details.
59 | (2) The paper lacks comparison with more relevant and widely-used baseline methods in the field.
60 | (3) Although the paper constructs two explainable fake news datasets, the paper does not describe the process and criteria for creating them.
61 |
62 | Questions To Authors And Suggestions For Rebuttal:
63 | (1) Can the authors provide additional information on the proposed model's architecture and implementation details?
64 | (2) Can the authors compare their proposed method with additional relevant and widely-used baseline methods in the field?
65 | (3) Can the authors provide more details on the process and criteria for creating the two constructed explainable fake news datasets?
66 |
67 | Overall score (1-5): 4
68 | The paper provides an innovative approach to fake news detection using a cascade of selectors and presents two publicly available datasets for the research community. However, the paper could benefit from additional details on architectural and implementation details and comparisons with more relevant baselines.
--------------------------------------------------------------------------------