├── README.md
├── contributing.md
├── download_latest.py
├── output-1696514831.9555178-bacon.md
├── output-1696514935.4285033-quantum computing.md
├── output.md
└── weekly
    ├── 2023-10-05 arxiv dump.txt
    └── 2023-10-05 benchmarking.md


/README.md:
--------------------------------------------------------------------------------
 1 | # weekly_arxiv
 2 | 
 3 | - Main Export Link: https://export.arxiv.org/api/query?search_query=all:llms&sortBy=lastUpdatedDate&sortOrder=descending&max_results=200 
 4 | 
 5 | ## Usage
 6 | 
 7 | - run `python download_latest.py`
 8 | - it will update `output.md`
 9 | 
10 | You can modify the link in the script to change the search term, sorting, and limits in the URL
11 | 
12 | 
13 | ## Claude Prompting
14 | 
15 | Claude has 100k token window, which makes it useful for surveying the resulting documents
16 | 
17 | ### Identifying trends
18 | 
19 | Copy and paste the Markdown into Claude and then use this to get a list of trends
20 | 
21 | ```text
22 | Can you please characterize the major trends in the latest LLM research. ONLY use the material I have given you here. The goal is to summarize the last week of LLM research.
23 | ```
24 | 
25 | ### Listing out papers for specific trends
26 | 
27 | Once you identify trends, you can use this example. In this case I was asking about benchmarks.
28 | 
29 | ```text
30 | Please write a summary of the benchmarking trends as elucidated by the text I gave you. The summary should be in the format of a complete paragraph followed by a list of key innovations. Each item in the list should include the paper link and paper title, along with a brief description of the innovation. Make sure you only pull from the information I gave you in this conversation.
31 | ```
32 | 


--------------------------------------------------------------------------------
/contributing.md:
--------------------------------------------------------------------------------
1 | # Contributing
2 | 
3 | NO >:-(
4 | 


--------------------------------------------------------------------------------
/download_latest.py:
--------------------------------------------------------------------------------
 1 | import requests
 2 | from time import time
 3 | from xml.etree import ElementTree as ET
 4 | 
 5 | 
 6 | # get search term
 7 | a = input("Whatcha wanna lookup? ")
 8 | 
 9 | 
10 | # URL of the XML object
11 | url = "https://export.arxiv.org/api/query?search_query=all:%s&sortBy=lastUpdatedDate&sortOrder=descending&max_results=200" % a.lower().replace(' ','%20')
12 | 
13 | # Send a GET request to the URL
14 | response = requests.get(url)
15 | 
16 | # Parse the XML response
17 | root = ET.fromstring(response.content)
18 | 
19 | # Namespace dictionary to find elements
20 | namespaces = {'atom': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'}
21 | 
22 | # Open the output file with UTF-8 encoding
23 | with open("output-%s-%s.md" % (time(), a), "w", encoding='utf-8') as file:
24 |     # Iterate over each entry in the XML data
25 |     for entry in root.findall('atom:entry', namespaces):
26 |         # Extract and write the title
27 |         title = entry.find('atom:title', namespaces).text
28 |         title = ' '.join(title.split())  # Replace newlines and superfluous whitespace with a single space
29 |         file.write(f"# {title}\n\n")
30 | 
31 |         # Extract and write the link to the paper
32 |         id = entry.find('atom:id', namespaces).text
33 |         file.write(f"[Link to the paper]({id})\n\n")
34 | 
35 |         # Extract and write the authors
36 |         authors = entry.findall('atom:author', namespaces)
37 |         file.write("## Authors\n")
38 |         for author in authors:
39 |             name = author.find('atom:name', namespaces).text
40 |             file.write(f"- {name}\n")
41 |         file.write("\n")
42 | 
43 |         # Extract and write the summary
44 |         summary = entry.find('atom:summary', namespaces).text
45 |         file.write(f"## Summary\n{summary}\n\n")
46 | 


--------------------------------------------------------------------------------
/weekly/2023-10-05 benchmarking.md:
--------------------------------------------------------------------------------
 1 | Recent research has led to the introduction of numerous benchmarks aimed at systematically evaluating the capabilities of large language models (LLMs) across diverse domains and tasks. These benchmarks enable the community to gain a more comprehensive understanding of the strengths and weaknesses of LLMs. 
 2 | 
 3 | Key benchmarking innovations include:
 4 | 
 5 | - [L-Eval: Instituting Standardized Evaluation for Long Context Language Models](http://arxiv.org/abs/2307.11088v3) - Proposes benchmark suite and metrics to evaluate LLM performance on long input contexts.
 6 | 
 7 | - [TRAM: Benchmarking Temporal Reasoning for Large Language Models](http://arxiv.org/abs/2310.00835v2) - Introduces benchmark composed of datasets covering various aspects of temporal reasoning. 
 8 | 
 9 | - [MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts](http://arxiv.org/abs/2310.02255v1) - Constructs benchmark amalgamating math and visual QA datasets requiring compositional reasoning.
10 | 
11 | - [ARN: A Comprehensive Framework and Dataset for Analogical Reasoning on Narratives](http://arxiv.org/abs/2310.01074v1) - Develops benchmark and framework for evaluating analogical reasoning in narratives using LLMs.
12 | 
13 | - [LLM-grounded Video Diffusion Models](http://arxiv.org/abs/2309.17444v2) - Benchmarks video generation from text descriptions using LLMs to guide video diffusion models. 
14 | 
15 | - [FELM: Benchmarking Factuality Evaluation of Large Language Models](http://arxiv.org/abs/2310.00741v1) - Introduces benchmark annotating LLM responses for evaluating factuality models.
16 | 
17 | - [L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models](http://arxiv.org/abs/2309.17446v2) - Presents systematic evaluation of LLMs on 7 language-to-code generation tasks.
18 | 
19 | The introduction of these rigorous benchmarks has been pivotal in furthering LLM research and enabling the community to track progress. However, there remains ample room for constructing additional benchmarks evaluating LLMs across even more diverse settings and applications.


--------------------------------------------------------------------------------