├── .gitignore ├── README.md ├── crawl_paperlist.py ├── crawl_reviews.py ├── keywords_ranking.md ├── paperlist_2022.tsv ├── paperlist_2023.tsv ├── sources ├── 50_most_keywords_2022.png ├── 50_most_keywords_2023.png ├── 50_most_title_2022.png ├── 50_most_title_2023.png ├── ICLR-2022.csv ├── ICLR-2022.md ├── ICLR-2023.csv ├── ICLR-2023.md ├── logo-mask.png ├── logo.png ├── logo_wordcloud_keywords_2022.png ├── logo_wordcloud_keywords_2023.png ├── logo_wordcloud_title_2022.png └── logo_wordcloud_title_2023.png ├── title_ranking.md └── visualization.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints/ 2 | *.exe -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Visualize ICLR 2023 OpenReview Data 2 | 3 | ICLR 2023 Paper submission analysis from https://openreview.net/group?id=ICLR.cc/2023/Conference 4 | 5 |

6 | 7 |

8 | 9 | ## Requirements 10 | + Install the required packages 11 | ```bash 12 | pip install wordcloud nltk pandas imageio selenium tqdm 13 | ``` 14 | + Download the `nltk` resources used for language processing 15 | ```python 16 | import nltk 17 | nltk.download('punkt') 18 | nltk.download('averaged_perceptron_tagger') 19 | nltk.download('wordnet') 20 | nltk.download('stopwords') 21 | 22 | ``` 23 | + If calling `webdriver.Edge('msedgedriver.exe')` fails, try the following: 24 | 25 | - Delete `msedgedriver.exe`, since it may only work on my computer (Windows 10) 26 | 27 | - [*Install Microsoft Edge (Chromium)*](https://docs.microsoft.com/en-us/microsoft-edge/webdriver-chromium?tabs=python#install-microsoft-edge-chromium): *Ensure you have installed [Microsoft Edge (Chromium)](https://www.microsoft.com/en-us/edge). To confirm that you have Microsoft Edge (Chromium) installed, go to `edge://settings/help` in the browser, and verify the version number is Version 75 or later*. 28 | - *Download Microsoft Edge Driver*: *Go to `edge://settings/help` to get the version of Edge.* 29 | - *Navigate to the [Microsoft Edge Driver downloads](https://developer.microsoft.com/microsoft-edge/tools/webdriver/#downloads) page and download the driver that matches the Edge version number.* 30 | 31 | > From https://stackoverflow.com/questions/63529124/how-to-open-up-microsoft-edge-using-selenium-and-python 32 | 33 | ## Crawl Data 34 | 1. Run `crawl_paperlist.py` to crawl the list of papers (~0.5h). 35 | ```bash 36 | python crawl_paperlist.py --year 2023 37 | ``` 38 | 39 | ## Paper List 40 | `crawl_paperlist.py` may miss several papers because of errors during crawling.
The *full* paper lists are as follows: 41 | 42 | + Year 2022 (3,407 submissions in total): 43 | 44 | + [sources/ICLR-2022.csv](./sources/ICLR-2022.csv) 45 | 46 | + [sources/ICLR-2022.md](./sources/ICLR-2022.md) 47 | 48 | + Year 2023 (4,966 submissions in total): 49 | 50 | + [sources/ICLR-2023.csv](./sources/ICLR-2023.csv) 51 | 52 | + [sources/ICLR-2023.md](./sources/ICLR-2023.md) 53 | 54 | 55 | ## Visualization 56 | Keywords and Title 57 | 58 | + **Keywords Frequency** 59 | The 50 most common keywords (uncased) and their frequencies: 60 | 61 |

62 | 63 |
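As a rough illustration of what "uncased" keyword frequency means here, the sketch below counts keywords case-insensitively with the standard library. It is not the code from `visualization.ipynb`: the function name `count_keywords`, the comma separator between keywords, and the sample strings are all assumptions made for this example, and it skips any `nltk` normalization the notebook may apply.

```python
from collections import Counter


def count_keywords(keyword_fields, top_n=50):
    """Count keywords case-insensitively across submissions.

    `keyword_fields` holds one raw keyword string per paper.
    Assumption: keywords within a string are separated by commas.
    """
    counts = Counter()
    for field in keyword_fields:
        # Lowercase so "Deep Learning" and "deep learning" are merged.
        keywords = [kw.strip().lower() for kw in field.split(",")]
        counts.update(kw for kw in keywords if kw)
    return counts.most_common(top_n)


# Hypothetical sample data, not taken from the crawled TSV.
sample = [
    "Reinforcement Learning, Deep Learning",
    "deep learning, Transformer",
    "reinforcement learning",
]
top = count_keywords(sample, top_n=2)
print(top)
```

With real data, the input would be the `keywords` column of `paperlist_2023.tsv`, for example loaded with `pandas.read_csv(..., sep='\t')`.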

64 | 65 | + **Top-10 Ranking between 2022 and 2023** (for the full list, see [keywords_ranking.md](./keywords_ranking.md)) 66 | 67 | | Keyword | 2022 | 2023 | 68 | |:-------------------------|-------:|-------:| 69 | | reinforcement learning | 1 | 1 | 70 | | deep learning | 2 | 2 | 71 | | representation learning | 4 | 3 | 72 | | graph neural network | 3 | 4 | 73 | | transformer | 5 | 5 | 74 | | federate learning | 7 | 6 | 75 | | self-supervised learning | 6 | 7 | 76 | | contrastive learning | 10 | 8 | 77 | | robustness | 9 | 9 | 78 | | generative model | 8 | 10 | 79 | 80 | + **Ranking Changes of Top-50 keywords** 81 | 82 | | Keywords | 2022 | 2023 | rank $\uparrow$ | 83 | | :----------------------------- | ---: | ---: | -------------: | 84 | | large language model | 208 | 32 | 176 | 85 | | diffusion model | 173 | 14 | 159 | 86 | | offline reinforcement learning | 59 | 20 | 39 | 87 | | sparsity | 85 | 49 | 36 | 88 | | adversarial training | 19 | 47 | -28 | 89 | | differential privacy | 45 | 23 | 22 | 90 | | fairness | 43 | 22 | 21 | 91 | | model compression | 61 | 41 | 20 | 92 | | domain generalization | 55 | 36 | 19 | 93 | | time series | 58 | 40 | 18 | 94 | 95 | 96 | 97 | + **Keywords Cloud** 98 | The word clouds formed from submission keywords highlight hot topics such as *deep learning*, *reinforcement learning*, *representation learning*, and *graph neural network*. 99 | 100 |

101 | 102 |

103 | 104 | 105 | + **Title Keywords Frequency** 106 | The 50 most common title keywords (uncased) and their frequencies: 107 | 108 |

109 | 110 |

111 | 112 | + **Top-10 Ranking between 2022 and 2023** (for the full list, see [title_ranking.md](./title_ranking.md)) 113 | 114 | | Title | 2022 | 2023 | 115 | |:---------------|-------:|-------:| 116 | | representation | 1 | 1 | 117 | | graph | 3 | 2 | 118 | | data | 6 | 3 | 119 | | reinforcement | 2 | 4 | 120 | | transformer | 7 | 5 | 121 | | training | 5 | 6 | 122 | | image | 10 | 7 | 123 | | efficient | 9 | 8 | 124 | | language | 15 | 9 | 125 | | federate | 14 | 10 | 126 | 127 | + **Ranking Changes of Top-50 Title keywords** 128 | 129 | | Title | 2022 | 2023 | rank $\uparrow$ | 130 | | :--------- | ---: | ---: | -------------: | 131 | | mask | 325 | 45 | 280 | 132 | | diffusion | 132 | 25 | 107 | 133 | | base | 76 | 36 | 40 | 134 | | visual | 61 | 38 | 23 | 135 | | offline | 55 | 34 | 21 | 136 | | attack | 25 | 46 | -21 | 137 | | vision | 64 | 44 | 20 | 138 | | generation | 36 | 17 | 19 | 139 | | adaptive | 45 | 32 | 13 | 140 | | knowledge | 38 | 26 | 12 | 141 | 142 | + **Title Keywords Cloud** 143 | The word clouds formed from the keywords in submission titles: 144 | 145 |

146 | 147 |
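The `rank $\uparrow$` column in the ranking-change tables above appears to be the 2022 rank minus the 2023 rank, with rows ordered by the absolute size of the change (a negative value means the term dropped). The following is a minimal sketch of that arithmetic, not the repository's own code; the function name `rank_changes` and the dictionary inputs are assumptions for illustration, and the sample ranks are copied from the keyword table.

```python
def rank_changes(ranks_2022, ranks_2023, top_n=10):
    """Return (term, rank_2022, rank_2023, change) rows.

    change = rank_2022 - rank_2023, so a positive value means the
    term moved up the ranking. Rows are sorted by |change|, largest first.
    """
    rows = []
    for term, new_rank in ranks_2023.items():
        old_rank = ranks_2022.get(term)
        if old_rank is None:
            # Term has no 2022 rank, so no change can be computed.
            continue
        rows.append((term, old_rank, new_rank, old_rank - new_rank))
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows[:top_n]


# Ranks taken from the keyword tables in this README.
ranks_2022 = {"large language model": 208, "diffusion model": 173,
              "adversarial training": 19, "sparsity": 85}
ranks_2023 = {"large language model": 32, "diffusion model": 14,
              "adversarial training": 47, "sparsity": 49}

changes = rank_changes(ranks_2022, ranks_2023, top_n=4)
for term, old, new, delta in changes:
    print(f"{term}: {old} -> {new} ({delta:+d})")
```

Sorting by absolute change is what lets a drop such as *adversarial training* (19 to 47) appear among the rises.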

148 | 149 | 150 | ## Related projects 151 | + https://github.com/evanzd/ICLR2021-OpenReviewData 152 | + https://github.com/fedebotu/ICLR2022-OpenReviewData 153 | + https://github.com/EdisonLeeeee/ICLR2022-OpenReviewData 154 | 155 | -------------------------------------------------------------------------------- /crawl_paperlist.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import argparse 4 | from tqdm import tqdm 5 | from selenium import webdriver 6 | from selenium.webdriver.common.by import By 7 | from selenium.webdriver.support import expected_conditions as EC 8 | from selenium.webdriver.support.wait import WebDriverWait 9 | 10 | parser = argparse.ArgumentParser() 11 | parser.add_argument("--year", type=int, default=2023, help="Year of OpenReview papers. (default: 2023)") 12 | parser.add_argument('--pages', type=int, default=100, help='Number of pages on the website. (default: 100)') 13 | args = parser.parse_args() 14 | 15 | driver = webdriver.Edge('msedgedriver.exe') 16 | driver.get(f'https://openreview.net/group?id=ICLR.cc/{args.year}/Conference') 17 | 18 | cond = EC.presence_of_element_located((By.XPATH, '//*[@id="all-submissions"]/nav/ul/li[13]/a')) 19 | WebDriverWait(driver, 60).until(cond) 20 | 21 | with open(f'paperlist_{args.year}.tsv', 'w', encoding='utf8') as f: # write the header to the same file the rows are appended to 22 | f.write('\t'.join(['paper_id', 'title', 'link', 'keywords', 'abstract']) + '\n') 23 | 24 | for page in tqdm(range(1, args.pages+1)): 25 | text = '' 26 | elems = driver.find_elements_by_xpath('//*[@id="all-submissions"]/ul/li') 27 | for i, elem in enumerate(elems): 28 | try: 29 | # parse title 30 | title = elem.find_element_by_xpath('./h4/a[1]') 31 | link = title.get_attribute('href') 32 | paper_id = link.split('=')[-1] 33 | title = title.text.strip().replace('\t', ' ').replace('\n', ' ') 34 | # show details 35 | elem.find_element_by_xpath('./a').click() 36 | time.sleep(0.2) 37 | # parse keywords & abstract 38 | items = 
elem.find_elements_by_xpath('.//li') 39 | keyword = ''.join([x.text for x in items if 'Keywords' in x.text]) 40 | abstract = ''.join([x.text for x in items if 'Abstract' in x.text]) 41 | keyword = keyword.strip().replace('\t', ' ').replace('\n', ' ').replace('Keywords: ', '') 42 | abstract = abstract.strip().replace('\t', ' ').replace('\n', ' ').replace('Abstract: ', '') 43 | text += paper_id + '\t' + title + '\t' + link + '\t' + keyword + '\t' + abstract + '\n' 44 | except Exception as e: 45 | print(f'page {page}, # {i}:', e) 46 | continue 47 | 48 | with open(f'paperlist_{args.year}.tsv', 'a', encoding='utf8') as f: 49 | f.write(text) 50 | 51 | # next page 52 | try: 53 | driver.find_element_by_xpath('//*[@id="all-submissions"]/nav/ul/li[13]/a').click() 54 | time.sleep(3) # NOTE: increase sleep time if needed 55 | except: 56 | print('no next page, exit.') 57 | break 58 | -------------------------------------------------------------------------------- /crawl_reviews.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import argparse 4 | import pandas as pd 5 | from tqdm import tqdm 6 | from selenium import webdriver 7 | from selenium.webdriver.common.by import By 8 | from selenium.webdriver.support import expected_conditions as EC 9 | from selenium.webdriver.support.wait import WebDriverWait 10 | 11 | 12 | parser = argparse.ArgumentParser() 13 | parser.add_argument("--year", nargs="?", default="2023", help="Year of OpenReview papers. (default: 2023)") 14 | args = parser.parse_args() 15 | 16 | driver = webdriver.Edge('msedgedriver.exe') 17 | 18 | df = pd.read_csv(f'paperlist_{args.year}.tsv', sep='\t', index_col=0) 19 | 20 | ratings = dict() 21 | decisions = dict() 22 | for paper_id, link in tqdm(list(df.link.items())): 23 | try: 24 | driver.get(link) 25 | xpath = '//div[@id="note_children"]//span[@class="note_content_value"]/..' 
26 | cond = EC.presence_of_element_located((By.XPATH, xpath)) 27 | WebDriverWait(driver, 60).until(cond) 28 | 29 | elems = driver.find_elements_by_xpath(xpath) 30 | assert len(elems), 'empty ratings' 31 | ratings[paper_id] = pd.Series([ 32 | int(x.text.split(': ')[1]) for x in elems if x.text.startswith('Rating:') 33 | ], dtype=int) 34 | decision = [x.text.split(': ')[1] for x in elems if x.text.startswith('Decision:')] 35 | decisions[paper_id] = decision[0] if decision else 'Unknown' 36 | except KeyboardInterrupt: 37 | break 38 | except Exception as e: 39 | print(paper_id, e) 40 | ratings[paper_id] = pd.Series(dtype=int) 41 | decisions[paper_id] = 'Unknown' 42 | 43 | df = pd.DataFrame(ratings).T 44 | df['decision'] = pd.Series(decisions) 45 | df.index.name = 'paper_id' 46 | df.to_csv('ratings.tsv', sep='\t') 47 | -------------------------------------------------------------------------------- /keywords_ranking.md: -------------------------------------------------------------------------------- 1 | | Keyword | 2022 | 2023 | 2 | |:-----------------------------------|-------:|-------:| 3 | | reinforcement learning | 1 | 1 | 4 | | deep learning | 2 | 2 | 5 | | representation learning | 4 | 3 | 6 | | graph neural network | 3 | 4 | 7 | | transformer | 5 | 5 | 8 | | federate learning | 7 | 6 | 9 | | self-supervised learning | 6 | 7 | 10 | | contrastive learning | 10 | 8 | 11 | | robustness | 9 | 9 | 12 | | generative model | 8 | 10 | 13 | | continual learning | 14 | 11 | 14 | | neural network | 12 | 12 | 15 | | transfer learning | 15 | 13 | 16 | | diffusion model | 173 | 14 | 17 | | generalization | 11 | 15 | 18 | | language model | 21 | 16 | 19 | | computer vision | 13 | 17 | 20 | | knowledge distillation | 23 | 18 | 21 | | vision transformer | 20 | 19 | 22 | | offline reinforcement learning | 59 | 20 | 23 | | optimization | 24 | 21 | 24 | | fairness | 43 | 22 | 25 | | differential privacy | 45 | 23 | 26 | | semi-supervised learning | 31 | 24 | 27 | | unsupervised 
learning | 32 | 25 | 28 | | deep reinforcement learning | 26 | 26 | 29 | | machine learning | 17 | 27 | 30 | | interpretability | 16 | 28 | 31 | | meta-learning | 22 | 29 | 32 | | adversarial robustness | 18 | 30 | 33 | | multi-agent reinforcement learning | 34 | 31 | 34 | | large language model | 208 | 32 | 35 | | optimal transport | 33 | 33 | 36 | | data augmentation | 30 | 34 | 37 | | few-shot learning | 27 | 35 | 38 | | domain generalization | 55 | 36 | 39 | | nlp | 40 | 37 | 40 | | adversarial attack | 25 | 38 | 41 | | domain adaptation | 28 | 39 | 42 | | time series | 58 | 40 | 43 | | model compression | 61 | 41 | 44 | | natural language processing | 50 | 42 | 45 | | distribution shift | 46 | 43 | 46 | | neural architecture search | 52 | 44 | 47 | | attention | 37 | 45 | 48 | | image classification | 29 | 46 | 49 | | adversarial training | 19 | 47 | 50 | | active learning | 41 | 48 | 51 | | sparsity | 85 | 49 | 52 | | deep neural network | 48 | 50 | 53 | -------------------------------------------------------------------------------- /sources/50_most_keywords_2022.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/50_most_keywords_2022.png -------------------------------------------------------------------------------- /sources/50_most_keywords_2023.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/50_most_keywords_2023.png -------------------------------------------------------------------------------- /sources/50_most_title_2022.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/50_most_title_2022.png -------------------------------------------------------------------------------- /sources/50_most_title_2023.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/50_most_title_2023.png -------------------------------------------------------------------------------- /sources/logo-mask.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/logo-mask.png -------------------------------------------------------------------------------- /sources/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/logo.png -------------------------------------------------------------------------------- /sources/logo_wordcloud_keywords_2022.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/logo_wordcloud_keywords_2022.png -------------------------------------------------------------------------------- /sources/logo_wordcloud_keywords_2023.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/logo_wordcloud_keywords_2023.png -------------------------------------------------------------------------------- /sources/logo_wordcloud_title_2022.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/logo_wordcloud_title_2022.png -------------------------------------------------------------------------------- /sources/logo_wordcloud_title_2023.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EdisonLeeeee/ICLR2023-OpenReviewData/2eab03841f5adf6d23dca9f7e3d00072ba8a45d3/sources/logo_wordcloud_title_2023.png -------------------------------------------------------------------------------- /title_ranking.md: -------------------------------------------------------------------------------- 1 | | Title | 2022 | 2023 | 2 | |:---------------|-------:|-------:| 3 | | representation | 1 | 1 | 4 | | graph | 3 | 2 | 5 | | data | 6 | 3 | 6 | | reinforcement | 2 | 4 | 7 | | transformer | 7 | 5 | 8 | | training | 5 | 6 | 9 | | image | 10 | 7 | 10 | | efficient | 9 | 8 | 11 | | language | 15 | 9 | 12 | | federate | 14 | 10 | 13 | | optimization | 8 | 11 | 14 | | adversarial | 4 | 12 | 15 | | robust | 12 | 13 | 16 | | generalization | 13 | 14 | 17 | | detection | 20 | 15 | 18 | | contrastive | 21 | 16 | 19 | | generation | 36 | 17 | 20 | | domain | 23 | 18 | 21 | | dynamic | 30 | 19 | 22 | | gradient | 16 | 20 | --------------------------------------------------------------------------------