├── .gitignore
├── README.md
├── crawl_paperlist.py
├── crawl_reviews.py
├── keywords_ranking.md
├── paperlist_2022.tsv
├── paperlist_2023.tsv
├── sources
│   ├── 50_most_keywords_2022.png
│   ├── 50_most_keywords_2023.png
│   ├── 50_most_title_2022.png
│   ├── 50_most_title_2023.png
│   ├── ICLR-2022.csv
│   ├── ICLR-2022.md
│   ├── ICLR-2023.csv
│   ├── ICLR-2023.md
│   ├── logo-mask.png
│   ├── logo.png
│   ├── logo_wordcloud_keywords_2022.png
│   ├── logo_wordcloud_keywords_2023.png
│   ├── logo_wordcloud_title_2022.png
│   └── logo_wordcloud_title_2023.png
├── title_ranking.md
└── visualization.ipynb
/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints/
2 | *.exe
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Visualize ICLR 2023 OpenReview Data
2 |
3 | Analysis of ICLR 2023 paper submissions, based on data from https://openreview.net/group?id=ICLR.cc/2023/Conference
4 |
5 |
6 |
7 |
8 |
9 | ## Requirements
10 | + Install requirements
11 | ```bash
12 | pip install wordcloud nltk pandas imageio selenium tqdm
13 | ```
14 | + Download several `nltk`-related packages for language processing
15 | ```python
16 | import nltk
17 | nltk.download('punkt')
18 | nltk.download('averaged_perceptron_tagger')
19 | nltk.download('wordnet')
20 | nltk.download('stopwords')
21 |
22 | ```
23 | + If something goes wrong when calling `webdriver.Edge('msedgedriver.exe')`, you can
24 |
25 | - Delete `msedgedriver.exe`, since that copy may only work on my computer (Windows 10)
26 |
27 | - [*Install Microsoft Edge (Chromium)*](https://docs.microsoft.com/en-us/microsoft-edge/webdriver-chromium?tabs=python#install-microsoft-edge-chromium): *Ensure you have installed [Microsoft Edge (Chromium)](https://www.microsoft.com/en-us/edge). To confirm that you have Microsoft Edge (Chromium) installed, go to `edge://settings/help` in the browser, and verify the version number is Version 75 or later*.
28 | - *Download Microsoft Edge Driver*: *Go to `edge://settings/help` to get the version of Edge.*
29 | - *Navigate to the [Microsoft Edge Driver downloads](https://developer.microsoft.com/microsoft-edge/tools/webdriver/#downloads) page and download the driver that matches the Edge version number.*
30 |
31 | > From https://stackoverflow.com/questions/63529124/how-to-open-up-microsoft-edge-using-selenium-and-python
32 |
33 | ## Crawl Data
34 | 1. Run `crawl_paperlist.py` to crawl the list of papers (~0.5h).
35 | ```bash
36 | python crawl_paperlist.py --year 2023
37 | ```
38 |
39 | ## Paper List
40 | `crawl_paperlist.py` may miss several papers because of errors during crawling. The *full* paper lists are as follows:
41 |
42 | + Year 2022 (3,407 submissions in total):
43 |
44 | + [sources/ICLR-2022.csv](./sources/ICLR-2022.csv)
45 |
46 | + [sources/ICLR-2022.md](./sources/ICLR-2022.md)
47 |
48 | + Year 2023 (4,966 submissions in total):
49 |
50 | + [sources/ICLR-2023.csv](./sources/ICLR-2023.csv)
51 |
52 | + [sources/ICLR-2023.md](./sources/ICLR-2023.md)
53 |
54 |
55 | ## Visualization
56 | Keywords and Title
57 |
58 | + **Keywords Frequency**
59 | The 50 most common keywords (uncased) and their frequencies:
60 |
61 |
62 | 
63 |
64 |
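The counting code lives in `visualization.ipynb`, which is not reproduced here. As a rough sketch of how an uncased keyword frequency could be computed, assuming each paper's `keywords` field is a comma-separated string (an assumption about the crawled data) and using the hypothetical helper name `keyword_counts`:

```python
from collections import Counter


def keyword_counts(keyword_fields, top_n=50):
    """Count lowercased keywords, counting each keyword once per paper."""
    counter = Counter()
    for field in keyword_fields:
        # Split on commas, trim whitespace, lowercase, and drop empty entries.
        keywords = {kw.strip().lower() for kw in field.split(",") if kw.strip()}
        counter.update(keywords)
    return counter.most_common(top_n)


# Made-up keyword fields for three papers (the last one has no keywords).
sample = [
    "Deep Learning, Robustness",
    "deep learning, Graph Neural Network",
    "",
]
top = keyword_counts(sample, top_n=3)
print(top)
```

The rankings in this README appear to merge word forms as well (for example `federate learning`), which this sketch does not attempt.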
65 | + **Top-10 Ranking between 2022 and 2023** (for the full list, please refer to [keywords_ranking.md](./keywords_ranking.md))
66 |
67 | | Keyword | 2022 | 2023 |
68 | |:-------------------------|-------:|-------:|
69 | | reinforcement learning | 1 | 1 |
70 | | deep learning | 2 | 2 |
71 | | representation learning | 4 | 3 |
72 | | graph neural network | 3 | 4 |
73 | | transformer | 5 | 5 |
74 | | federate learning | 7 | 6 |
75 | | self-supervised learning | 6 | 7 |
76 | | contrastive learning | 10 | 8 |
77 | | robustness | 9 | 9 |
78 | | generative model | 8 | 10 |
79 |
80 | + **Ranking Changes of Top-50 keywords**
81 |
82 | | Keywords | 2022 | 2023 | rank $\uparrow$ |
83 | | :----------------------------- | ---: | ---: | -------------: |
84 | | large language model | 208 | 32 | 176 |
85 | | diffusion model | 173 | 14 | 159 |
86 | | offline reinforcement learning | 59 | 20 | 39 |
87 | | sparsity | 85 | 49 | 36 |
88 | | adversarial training | 19 | 47 | -28 |
89 | | differential privacy | 45 | 23 | 22 |
90 | | fairness | 43 | 22 | 21 |
91 | | model compression | 61 | 41 | 20 |
92 | | domain generalization | 55 | 36 | 19 |
93 | | time series | 58 | 40 | 18 |
94 |
95 |
96 |
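The rank change column in the table above equals the 2022 rank minus the 2023 rank, so a positive value means the keyword moved up, and the rows are ordered by the magnitude of the change. A small sketch with pandas that reproduces this for four rows of the table; the column name `change` is chosen here for illustration only.

```python
import pandas as pd

# Ranks copied from four rows of the table above.
ranks = pd.DataFrame(
    {
        "2022": [19, 59, 173, 208],
        "2023": [47, 20, 14, 32],
    },
    index=[
        "adversarial training",
        "offline reinforcement learning",
        "diffusion model",
        "large language model",
    ],
)

# Positive change: the keyword climbed the ranking from 2022 to 2023.
ranks["change"] = ranks["2022"] - ranks["2023"]

# Order rows by the size of the move, largest first, whatever its direction.
order = ranks["change"].abs().sort_values(ascending=False).index
ranks = ranks.reindex(order)
print(ranks)
```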
97 | + **Keywords Cloud**
98 | The word clouds formed by keywords of submissions show the hot topics including *deep learning*, *reinforcement learning*, *representation learning*, *graph neural network*, etc.
99 |
100 |
101 | 
102 |
103 |
104 |
105 | + **Title Keywords Frequency**
106 | The 50 most common title keywords (uncased) and their frequencies:
107 |
108 |
109 | 
110 |
111 |
112 | + **Top-10 Ranking between 2022 and 2023** (for the full list, please refer to [title_ranking.md](./title_ranking.md))
113 |
114 | | Title | 2022 | 2023 |
115 | |:---------------|-------:|-------:|
116 | | representation | 1 | 1 |
117 | | graph | 3 | 2 |
118 | | data | 6 | 3 |
119 | | reinforcement | 2 | 4 |
120 | | transformer | 7 | 5 |
121 | | training | 5 | 6 |
122 | | image | 10 | 7 |
123 | | efficient | 9 | 8 |
124 | | language | 15 | 9 |
125 | | federate | 14 | 10 |
126 |
127 | + **Ranking Changes of Top-50 Title keywords**
128 |
129 | | Title | 2022 | 2023 | rank $\uparrow$ |
130 | | :--------- | ---: | ---: | -------------: |
131 | | mask | 325 | 45 | 280 |
132 | | diffusion | 132 | 25 | 107 |
133 | | base | 76 | 36 | 40 |
134 | | visual | 61 | 38 | 23 |
135 | | offline | 55 | 34 | 21 |
136 | | attack | 25 | 46 | -21 |
137 | | vision | 64 | 44 | 20 |
138 | | generation | 36 | 17 | 19 |
139 | | adaptive | 45 | 32 | 13 |
140 | | knowledge | 38 | 26 | 12 |
141 |
142 | + **Title Keywords Cloud**
143 | The word clouds formed by keywords of submission titles:
144 |
145 |
146 | 
147 |
148 |
149 |
150 | ## Related projects
151 | + https://github.com/evanzd/ICLR2021-OpenReviewData
152 | + https://github.com/fedebotu/ICLR2022-OpenReviewData
153 | + https://github.com/EdisonLeeeee/ICLR2022-OpenReviewData
154 |
155 |
--------------------------------------------------------------------------------
/crawl_paperlist.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import argparse
4 | from tqdm import tqdm
5 | from selenium import webdriver
6 | from selenium.webdriver.common.by import By
7 | from selenium.webdriver.support import expected_conditions as EC
8 | from selenium.webdriver.support.wait import WebDriverWait
9 |
10 | parser = argparse.ArgumentParser()
11 | parser.add_argument("--year", type=int, default=2023, help="Year of OpenReview papers. (default: 2023)")
12 | parser.add_argument('--pages', type=int, default=100, help='Number of pages on the website. (default: 100)')
13 | args = parser.parse_args()
14 |
15 | driver = webdriver.Edge('msedgedriver.exe')
16 | driver.get(f'https://openreview.net/group?id=ICLR.cc/{args.year}/Conference')
17 |
18 | cond = EC.presence_of_element_located((By.XPATH, '//*[@id="all-submissions"]/nav/ul/li[13]/a'))
19 | WebDriverWait(driver, 60).until(cond)
20 |
21 | with open(f'paperlist_{args.year}.tsv', 'w', encoding='utf8') as f:
22 |     f.write('\t'.join(['paper_id', 'title', 'link', 'keywords', 'abstract']) + '\n')
23 |
24 | for page in tqdm(range(1, args.pages+1)):
25 |     text = ''
26 |     elems = driver.find_elements(By.XPATH, '//*[@id="all-submissions"]/ul/li')
27 |     for i, elem in enumerate(elems):
28 |         try:
29 |             # parse title
30 |             title = elem.find_element(By.XPATH, './h4/a[1]')
31 |             link = title.get_attribute('href')
32 |             paper_id = link.split('=')[-1]
33 |             title = title.text.strip().replace('\t', ' ').replace('\n', ' ')
34 |             # show details
35 |             elem.find_element(By.XPATH, './a').click()
36 |             time.sleep(0.2)
37 |             # parse keywords & abstract
38 |             items = elem.find_elements(By.XPATH, './/li')
39 |             keyword = ''.join([x.text for x in items if 'Keywords' in x.text])
40 |             abstract = ''.join([x.text for x in items if 'Abstract' in x.text])
41 |             keyword = keyword.strip().replace('\t', ' ').replace('\n', ' ').replace('Keywords: ', '')
42 |             abstract = abstract.strip().replace('\t', ' ').replace('\n', ' ').replace('Abstract: ', '')
43 |             text += paper_id + '\t' + title + '\t' + link + '\t' + keyword + '\t' + abstract + '\n'
44 |         except Exception as e:
45 |             print(f'page {page}, # {i}:', e)
46 |             continue
47 |
48 |     with open(f'paperlist_{args.year}.tsv', 'a', encoding='utf8') as f:
49 |         f.write(text)
50 |
51 |     # next page
52 |     try:
53 |         driver.find_element(By.XPATH, '//*[@id="all-submissions"]/nav/ul/li[13]/a').click()
54 |         time.sleep(3)  # NOTE: increase sleep time if needed
55 |     except Exception:
56 |         print('no next page, exit.')
57 |         break
58 |
--------------------------------------------------------------------------------
/crawl_reviews.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import argparse
4 | import pandas as pd
5 | from tqdm import tqdm
6 | from selenium import webdriver
7 | from selenium.webdriver.common.by import By
8 | from selenium.webdriver.support import expected_conditions as EC
9 | from selenium.webdriver.support.wait import WebDriverWait
10 |
11 |
12 | parser = argparse.ArgumentParser()
13 | parser.add_argument("--year", nargs="?", default="2023", help="Year of OpenReview papers. (default: 2023)")
14 | args = parser.parse_args()
15 |
16 | driver = webdriver.Edge('msedgedriver.exe')
17 |
18 | df = pd.read_csv(f'paperlist_{args.year}.tsv', sep='\t', index_col=0)
19 |
20 | ratings = dict()
21 | decisions = dict()
22 | for paper_id, link in tqdm(list(df.link.items())):
23 |     try:
24 |         driver.get(link)
25 |         xpath = '//div[@id="note_children"]//span[@class="note_content_value"]/..'
26 |         cond = EC.presence_of_element_located((By.XPATH, xpath))
27 |         WebDriverWait(driver, 60).until(cond)
28 |
29 |         elems = driver.find_elements(By.XPATH, xpath)
30 |         assert len(elems), 'empty ratings'
31 |         ratings[paper_id] = pd.Series([
32 |             int(x.text.split(': ')[1]) for x in elems if x.text.startswith('Rating:')
33 |         ], dtype=int)
34 |         decision = [x.text.split(': ')[1] for x in elems if x.text.startswith('Decision:')]
35 |         decisions[paper_id] = decision[0] if decision else 'Unknown'
36 |     except KeyboardInterrupt:
37 |         break
38 |     except Exception as e:
39 |         print(paper_id, e)
40 |         ratings[paper_id] = pd.Series(dtype=int)
41 |         decisions[paper_id] = 'Unknown'
42 |
43 | df = pd.DataFrame(ratings).T
44 | df['decision'] = pd.Series(decisions)
45 | df.index.name = 'paper_id'
46 | df.to_csv('ratings.tsv', sep='\t')
47 |
--------------------------------------------------------------------------------
/keywords_ranking.md:
--------------------------------------------------------------------------------
1 | | Keyword | 2022 | 2023 |
2 | |:-----------------------------------|-------:|-------:|
3 | | reinforcement learning | 1 | 1 |
4 | | deep learning | 2 | 2 |
5 | | representation learning | 4 | 3 |
6 | | graph neural network | 3 | 4 |
7 | | transformer | 5 | 5 |
8 | | federate learning | 7 | 6 |
9 | | self-supervised learning | 6 | 7 |
10 | | contrastive learning | 10 | 8 |
11 | | robustness | 9 | 9 |
12 | | generative model | 8 | 10 |
13 | | continual learning | 14 | 11 |
14 | | neural network | 12 | 12 |
15 | | transfer learning | 15 | 13 |
16 | | diffusion model | 173 | 14 |
17 | | generalization | 11 | 15 |
18 | | language model | 21 | 16 |
19 | | computer vision | 13 | 17 |
20 | | knowledge distillation | 23 | 18 |
21 | | vision transformer | 20 | 19 |
22 | | offline reinforcement learning | 59 | 20 |
23 | | optimization | 24 | 21 |
24 | | fairness | 43 | 22 |
25 | | differential privacy | 45 | 23 |
26 | | semi-supervised learning | 31 | 24 |
27 | | unsupervised learning | 32 | 25 |
28 | | deep reinforcement learning | 26 | 26 |
29 | | machine learning | 17 | 27 |
30 | | interpretability | 16 | 28 |
31 | | meta-learning | 22 | 29 |
32 | | adversarial robustness | 18 | 30 |
33 | | multi-agent reinforcement learning | 34 | 31 |
34 | | large language model | 208 | 32 |
35 | | optimal transport | 33 | 33 |
36 | | data augmentation | 30 | 34 |
37 | | few-shot learning | 27 | 35 |
38 | | domain generalization | 55 | 36 |
39 | | nlp | 40 | 37 |
40 | | adversarial attack | 25 | 38 |
41 | | domain adaptation | 28 | 39 |
42 | | time series | 58 | 40 |
43 | | model compression | 61 | 41 |
44 | | natural language processing | 50 | 42 |
45 | | distribution shift | 46 | 43 |
46 | | neural architecture search | 52 | 44 |
47 | | attention | 37 | 45 |
48 | | image classification | 29 | 46 |
49 | | adversarial training | 19 | 47 |
50 | | active learning | 41 | 48 |
51 | | sparsity | 85 | 49 |
52 | | deep neural network | 48 | 50 |
53 |
--------------------------------------------------------------------------------
/sources/50_most_keywords_2022.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/50_most_keywords_2022.png
--------------------------------------------------------------------------------
/sources/50_most_keywords_2023.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/50_most_keywords_2023.png
--------------------------------------------------------------------------------
/sources/50_most_title_2022.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/50_most_title_2022.png
--------------------------------------------------------------------------------
/sources/50_most_title_2023.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/50_most_title_2023.png
--------------------------------------------------------------------------------
/sources/logo-mask.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/logo-mask.png
--------------------------------------------------------------------------------
/sources/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/logo.png
--------------------------------------------------------------------------------
/sources/logo_wordcloud_keywords_2022.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/logo_wordcloud_keywords_2022.png
--------------------------------------------------------------------------------
/sources/logo_wordcloud_keywords_2023.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/logo_wordcloud_keywords_2023.png
--------------------------------------------------------------------------------
/sources/logo_wordcloud_title_2022.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/logo_wordcloud_title_2022.png
--------------------------------------------------------------------------------
/sources/logo_wordcloud_title_2023.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/logo_wordcloud_title_2023.png
--------------------------------------------------------------------------------
/title_ranking.md:
--------------------------------------------------------------------------------
1 | | Title | 2022 | 2023 |
2 | |:---------------|-------:|-------:|
3 | | representation | 1 | 1 |
4 | | graph | 3 | 2 |
5 | | data | 6 | 3 |
6 | | reinforcement | 2 | 4 |
7 | | transformer | 7 | 5 |
8 | | training | 5 | 6 |
9 | | image | 10 | 7 |
10 | | efficient | 9 | 8 |
11 | | language | 15 | 9 |
12 | | federate | 14 | 10 |
13 | | optimization | 8 | 11 |
14 | | adversarial | 4 | 12 |
15 | | robust | 12 | 13 |
16 | | generalization | 13 | 14 |
17 | | detection | 20 | 15 |
18 | | contrastive | 21 | 16 |
19 | | generation | 36 | 17 |
20 | | domain | 23 | 18 |
21 | | dynamic | 30 | 19 |
22 | | gradient | 16 | 20 |
--------------------------------------------------------------------------------