└── README.md


/README.md:
--------------------------------------------------------------------------------
  1 | # Tutorial on Large Language Models for Tabular Data: Progresses and Future Directions
  2 | 
  3 | 🌟 [A tutorial on “Large Language Models for Tabular Data”](https://t.co/50xoDQuXwz) presented at [the SIGIR’24 conference in Washington, D.C.](https://sigir-2024.github.io/attend_Tutorials.html)
  4 | 
  5 | ## Slides
  6 | - [SIGIR 2024 Tutorial on Large Language Models for Tabular Data](https://drive.google.com/file/d/1oso3tz-uoOEWqqG163GluHIDIyCRQ58I/view)
  7 |   - Introduction
  8 |   - Encoding Tabular Data for LLMs
  9 |   - Modeling and Training LLMs for Tabular Data
 10 |   - Tasks and Benchmarks
 11 |   - LLM-driven Table Agents
 12 | 
 13 |     
 14 | ## Paper
 15 | - [Paper](https://dl.acm.org/doi/pdf/10.1145/3626772.3661384)
 16 | 
 17 | 
 18 | 
 19 | ## Paper List
 20 | 
 21 | ### Introduction
 22 | * Binder: Binding Language Models in Symbolic Languages [[Paper]](https://openreview.net/pdf?id=lH1PV42cbF)
 23 | * TabLLM: Few-shot Classification of Tabular Data with Large Language Models [[Paper]](https://proceedings.mlr.press/v206/hegselmann23a/hegselmann23a.pdf)
 24 | * Dater: Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning [[Paper]](https://arxiv.org/pdf/2301.13808)
 25 | * DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction [[Paper]](https://proceedings.neurips.cc/paper_files/paper/2023/file/72223cc66f63ca1aa59edaec1b3670e6-Paper-Conference.pdf)
 26 | * Table Meets LLM: Can Large Language Models Understand Structured Table Data? [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3616855.3635752)
 27 | * SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [[Paper]](https://arxiv.org/abs/2305.19308)
 28 | * Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow [[Paper]](https://arxiv.org/pdf/2306.07209)
 29 | * DAIL-SQL: Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [[Paper]](https://www.vldb.org/pvldb/vol17/p1132-gao.pdf)
 30 | * Table-GPT: Table-tuned GPT for Diverse Table Tasks [[Paper]](https://arxiv.org/pdf/2310.09263)
 31 | * API-Assisted Code Generation for Question Answering on Varied Table Structures [[Paper]](https://arxiv.org/pdf/2310.14687)
 32 | * InsightPilot: An LLM-Empowered Automated Data Exploration System [[Paper]](https://aclanthology.org/2023.emnlp-demo.31.pdf)
 33 | * TableLlama: Towards Open Large Generalist Models for Tables [[Paper]](https://arxiv.org/pdf/2311.09206)
 34 | * DBCopilot: Scaling Natural Language Querying to Massive Databases [[Paper]](https://arxiv.org/pdf/2312.03463)
 35 | * TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [[Paper]](https://arxiv.org/pdf/2312.09039)
 36 | * DB-GPT: Empowering Database Interactions with Private Large Language Models [[Paper]](https://arxiv.org/pdf/2312.17449)
 37 | * Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding [[Paper]](https://arxiv.org/pdf/2401.04398)
 38 | * TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks [[Paper]](https://arxiv.org/pdf/2401.12869)
 39 | * MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [[Paper]](https://arxiv.org/pdf/2402.11453)
 40 | * StructLM: Towards Building Generalist Models for Structured Knowledge Grounding [[Paper]](https://arxiv.org/pdf/2402.16671)
 41 | * TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [[Paper]](https://arxiv.org/pdf/2403.19318)
 42 | * Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities [[Paper]](https://arxiv.org/pdf/2405.16234)
 43 | * Table-LLaVA: Multimodal Table Understanding [[Paper]](https://arxiv.org/pdf/2406.08100)
 44 | * SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [[Paper]](https://arxiv.org/pdf/2406.14991)
 45 | * SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [[Paper]](https://arxiv.org/abs/2407.09025)
 46 | 
 47 | ### Encoding Tabular Data for LLMs
 48 | * Table Meets LLM: Can Large Language Models Understand Structured Table Data? [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3616855.3635752)
 49 | * Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs [[Paper]](https://arxiv.org/pdf/2310.10358)
 50 | * SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models [[Paper]](https://arxiv.org/pdf/2403.03636)
 51 | * Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [[Paper]](https://arxiv.org/pdf/2308.15363)
 52 | * Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies [[Paper]](https://aclanthology.org/2023.findings-emnlp.996.pdf)
 53 | * Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study [[Paper]](https://arxiv.org/pdf/2404.17136)
 54 | * TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [[Paper]](https://arxiv.org/pdf/2312.09039)
 55 | * DBCopilot: Scaling Natural Language Querying to Massive Databases [[Paper]](https://arxiv.org/pdf/2312.03463v1)
 56 | * TabLLM: Few-shot Classification of Tabular Data with Large Language Models [[Paper]](https://proceedings.mlr.press/v206/hegselmann23a/hegselmann23a.pdf)
 57 | * Towards Foundation Models for Learning on Tabular Data [[Paper]](https://arxiv.org/pdf/2310.07338)
 58 | * Tables as Images? Exploring the Strengths and Limitations of LLMs on Multimodal Representations of Tabular Data [[Paper]](https://arxiv.org/pdf/2402.12424v3)
 59 | * Multimodal Table Understanding [[Paper]](https://arxiv.org/pdf/2406.08100)
 60 | * Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities [[Paper]](https://arxiv.org/pdf/2405.16234)
 61 | * SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [[Paper]](https://arxiv.org/abs/2407.09025)
 62 |   
 63 | 
 64 | ### Modeling and Training LLMs for Tabular Data
 65 | * TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [[Paper]](https://arxiv.org/pdf/2005.08314)
 66 | * TaPas: Weakly Supervised Table Parsing via Pre-training [[Paper]](https://arxiv.org/pdf/2004.02349)
 67 | * TURL: Table Understanding through Representation Learning [[Paper]](https://arxiv.org/pdf/2006.14806)
 68 | * TUTA: Tree-based Transformers for Generally Structured Table Pre-training [[Paper]](https://arxiv.org/pdf/2010.12537)
 69 | * TAPEX: Table Pre-Training via Learning a Neural SQL Executor [[Paper]](https://arxiv.org/pdf/2107.07653)
 70 | * UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models [[Paper]](https://arxiv.org/pdf/2201.05966)
 71 | * Table-GPT: Table-tuned GPT for Diverse Table Tasks [[Paper]](https://arxiv.org/pdf/2310.09263)
 72 | * SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [[Paper]](https://arxiv.org/pdf/2406.14991)
 73 | * TableLlama: Towards Open Large Generalist Models for Tables [[Paper]](https://arxiv.org/pdf/2311.09206)
 74 | * HELLaMA: LLaMA-based Table to Text Generation by Highlighting the Important Evidence [[Paper]](https://arxiv.org/pdf/2311.08896)
 75 | * StructLM: Towards Building Generalist Models for Structured Knowledge Grounding [[Paper]](https://arxiv.org/pdf/2402.16671)
 76 | * TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [[Paper]](https://arxiv.org/pdf/2403.19318)
 77 | * TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [[Paper]](https://arxiv.org/pdf/2401.13223)
 78 | * Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [[Paper]](https://arxiv.org/pdf/2308.15363)
 79 | * DB-GPT: Empowering Database Interactions with Private Large Language Models [[Paper]](https://arxiv.org/pdf/2312.17449)
 80 | * Towards Foundation Models for Learning on Tabular Data [[Paper]](https://arxiv.org/pdf/2310.07338)
 81 | * LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks [[Paper]](https://arxiv.org/pdf/2206.06565)
 82 | * Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science [[Paper]](https://arxiv.org/pdf/2403.20208)
 83 | * Multimodal Table Understanding [[Paper]](https://arxiv.org/pdf/2406.08100)
 84 | * Effective Distillation of Table-based Reasoning Ability from LLMs [[Paper]](https://arxiv.org/pdf/2309.13182)
 85 | * OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding [[Paper]](https://arxiv.org/pdf/2407.04923v1)
 86 | * Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities [[Paper]](https://arxiv.org/pdf/2405.16234)
 87 | * SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [[Paper]](https://arxiv.org/abs/2407.09025)
 88 | 
 89 | 
 90 | ### Tasks and Benchmarks
 91 | * TableSense: Spreadsheet Table Detection with Convolutional Neural Networks [[Paper]](https://arxiv.org/pdf/2106.13500)
 92 | * Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples [[Paper]](https://arxiv.org/pdf/2307.14565)
 93 | * Spreadsheet Table Transformations from Examples [[Paper]](https://www.microsoft.com/en-us/research/wp-content/uploads/2012/08/pldi11-table-synthesis-tr.pdf)
 94 | * TUTA: Tree-based Transformers for Generally Structured Table Pre-training [[Paper]](https://arxiv.org/pdf/2010.12537)
 95 | * FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining [[Paper]](https://arxiv.org/pdf/2109.07323)
 96 | * Open Domain Question Answering over Tables via Dense Retrieval [[Paper]](https://arxiv.org/pdf/2103.12011)
 97 | * Table Retrieval May Not Necessitate Table-specific Model Design [[Paper]](https://arxiv.org/pdf/2205.09843)
 98 | * Compositional Semantic Parsing on Semi-Structured Tables [[Paper]](https://arxiv.org/pdf/1508.00305)
 99 | * Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task [[Paper]](https://arxiv.org/pdf/1809.08887)
100 | * HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation [[Paper]](https://arxiv.org/pdf/2108.06712)
101 | * Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning [[Paper]](https://arxiv.org/pdf/2209.14610)
102 | * FeTaQA: Free-form Table Question Answering [[Paper]](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00446/109273)
103 | * Tab-CQA: A Tabular Conversational Question Answering Dataset on Financial Reports [[Paper]](https://aclanthology.org/2023.acl-industry.20.pdf)
104 | * TempTabQA: Temporal Question Answering for Semi-Structured Tables [[Paper]](https://arxiv.org/pdf/2311.08002)
105 | * Open Question Answering over Tables and Text [[Paper]](https://arxiv.org/pdf/2010.10439)
106 | * TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance [[Paper]](https://arxiv.org/pdf/2105.07624)
107 | * AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry [[Paper]](https://arxiv.org/pdf/2106.12944)
108 | * TabFact: A Large-scale Dataset for Table-based Fact Verification [[Paper]](https://arxiv.org/pdf/1909.02164)
109 | * ToTTo: A Controlled Table-To-Text Generation Dataset [[Paper]](https://arxiv.org/pdf/2004.14373) [[Dataset]](https://github.com/google-research-datasets/ToTTo)
110 | * Enhanced Chart Understanding via Visual Language Pre-training on Plot Table Pairs [[Paper]](https://arxiv.org/pdf/2305.18641)
111 | * MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [[Paper]](https://arxiv.org/pdf/2402.11453)
112 | * SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [[Paper]](https://arxiv.org/abs/2305.19308)
113 | * Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes [[Paper]](https://arxiv.org/pdf/2304.09433)
114 | * Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding - A Survey [[Paper]](https://www.amazon.science/publications/large-language-models-llms-on-tabular-data-prediction-generation-and-understanding-a-survey)
115 | 
116 |  
117 | ### LLM-driven Table Agents
118 | * Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning [[Paper]](https://arxiv.org/pdf/2301.13808)
119 | * Exploring Chain-of-Thought Style Prompting for Text-to-SQL [[Paper]](https://arxiv.org/pdf/2305.14215)
120 | * Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding [[Paper]](https://arxiv.org/abs/2401.04398)
121 | * DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction [[Paper]](https://arxiv.org/pdf/2304.11015)
122 | * Tab-CoT: Zero-shot Tabular Chain of Thought [[Paper]](https://arxiv.org/pdf/2305.17812)
123 | * Selective Demonstrations for Cross-domain Text-to-SQL [[Paper]](https://arxiv.org/pdf/2310.06302)
124 | * SpreadsheetCoder: Formula Prediction from Semi-structured Context [[Paper]](http://proceedings.mlr.press/v139/chen21m/chen21m.pdf)
125 | * SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [[Paper]](https://arxiv.org/abs/2305.19308)
126 | * ToolQA: A Dataset for LLM Question Answering with External Tools [[Paper]](https://proceedings.neurips.cc/paper_files/paper/2023/file/9cb2a7495900f8b602cb10159246a016-Paper-Datasets_and_Benchmarks.pdf)
127 | * ReAcTable: Enhancing ReAct for Table Question Answering [[Paper]](https://arxiv.org/pdf/2310.00815)
128 | * LEVER: Learning to Verify Language-to-Code Generation with Execution [[Paper]](https://proceedings.mlr.press/v202/ni23b/ni23b.pdf)
129 | * MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [[Paper]](https://arxiv.org/pdf/2402.11453)
130 | * Binding Language Models in Symbolic Languages [[Paper]](https://arxiv.org/pdf/2210.02875)
131 | * Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models [[Paper]](https://proceedings.neurips.cc/paper_files/paper/2023/file/871ed095b734818cfba48db6aeb25a62-Paper-Conference.pdf)
132 | * API-Assisted Code Generation for Question Answering on Varied Table Structures [[Paper]](https://arxiv.org/pdf/2310.14687)
133 | * Executable Code Actions Elicit Better LLM Agents [[Paper]](https://arxiv.org/pdf/2402.01030?trk=public_post_comment-text)
134 | * Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow [[Paper]](https://arxiv.org/pdf/2306.07209)
135 | * ToolWriter: Question Specific Tool Synthesis for Tabular Data [[Paper]](https://eprints.gla.ac.uk/325598/2/325598.pdf)
136 | * CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets [[Paper]](https://arxiv.org/pdf/2309.17428)
137 | * TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks [[Paper]](https://arxiv.org/pdf/2401.12869)
138 | * Cognitive Architectures for Language Agents [[Paper]](https://arxiv.org/pdf/2309.02427.pdf?trk=public_post_comment-text)
139 | * BAGEL: Bootstrapping Agents by Guiding Exploration with Language [[Paper]](https://arxiv.org/pdf/2403.08140)
140 | * EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records [[Paper]](https://openreview.net/pdf?id=ZjXEzFE0Qy)
141 | * Towards Knowledge-Intensive Text-to-SQL Semantic Parsing with Formulaic Knowledge [[Paper]](https://arxiv.org/pdf/2301.01067)
142 | 


--------------------------------------------------------------------------------