└── README.md
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # 🔖 AI Data Scientist Handbook 2026
3 | This handbook aims to cut through the noise and curate the AI tools, workflows, and resources that can actually help data scientists accelerate their growth in the age of AI.
4 |
5 | Rather than covering general tools and resources or trying to provide an exhaustive list, the aim is to surface the resources that are most relevant to data scientists right now.
6 |
7 | If you want to read more about the motivation behind this project, see the [About](#about) section.
8 |
9 | ## Table of Contents
10 | - [Tools](#tools)
11 | - [Modern BI & Analytics Tools](#modern-bi--analytics-tools)
12 | - [Conversational Analytics Tools](#conversational-analytics-tools)
13 | - [Miscellaneous Tools](#miscellaneous-tools)
14 | - [Foundation Models](#foundation-models)
15 | - [Learning Resources](#learning-resources)
16 | - [Newsletters](#newsletters)
17 | - [Courses](#courses)
18 | - [YouTube Channels](#youtube-channels)
19 | - [Conferences](#conferences)
20 | - [About](#about)
21 |
22 |
23 | ## 🛠️ Tools
24 |
25 | ### Modern BI & Analytics Tools
26 |
27 | This section highlights **modern, forward-leaning BI and analytics tools** that go beyond traditional dashboarding.
28 | The focus is on tools that emphasize semantic layers, metrics-as-code, search-driven analytics, notebooks, and tighter integration with modern data and AI workflows.
29 |
30 | Well-known legacy BI platforms (Looker, Power BI, Qlik, Tableau) are intentionally excluded to keep this list **high-signal** and oriented toward how analytics is evolving rather than how it has historically been done.
31 |
32 | | Tool | Category | What It's Good At | Why It Matters for DS |
33 | |------|----------|-------------------|------------------------|
34 | | [Omni](https://www.omni.co/) | Semantic BI / Metrics Layer | SQL-first BI with strong semantic modeling | Bridges analytics and engineering workflows |
35 | | [Steep](https://www.steep.app/) | Metrics & Analytics | Lightweight, modern metrics exploration | Faster iteration than traditional BI |
36 | | [Lightdash](https://www.lightdash.com/) | Open-source BI | dbt-native BI with metrics-as-code | Fits modern analytics stacks |
37 | | [Evidence](https://evidence.dev/) | Analytics Reporting | Markdown + SQL driven reports | BI as code, versionable insights |
38 | | [Hex](https://hex.tech/) | Notebook BI | Notebooks, dashboards, collaboration; AI-powered conversational analytics | Analyst-to-stakeholder friendly with self-serve AI features |
39 | | [Mode](https://mode.com/) | Analytics Platform | SQL + Python with reporting | Strong hybrid DS / BI workflows |
40 | | [Preset](https://preset.io/) | Open-source BI | Managed Apache Superset | Scalable, customizable BI |
41 | | [Metabase](https://www.metabase.com/) | Open-source BI | Simple querying and dashboards | Fast exploration with low friction |
42 | | [ThoughtSpot](https://www.thoughtspot.com/) | Search-Driven Analytics | Natural language search with AI-assisted insights | Brings search-style analytics to large datasets |
43 |
44 | #### Standalone Semantic Layer Tools
45 |
46 | These are dedicated semantic layer platforms that sit between your data warehouse and consumption tools (BI, AI, applications). They provide a single source of truth for metrics, dimensions, and business logic, decoupled from any specific BI tool.
47 |
48 | | Tool | Type | What It's Good At | Why It Matters |
49 | |------|------|-------------------|----------------|
50 | | [Cube](https://cube.dev/) | Open-source / Cloud | Headless semantic layer with REST, GraphQL, MDX, and SQL APIs; caching and pre-aggregations | API-first, works with any BI tool or AI agent; strong for embedded analytics |
51 | | [AtScale](https://www.atscale.com/) | Enterprise | Universal semantic layer with MDX/DAX support; integrates with Power BI, Tableau, Excel | Enterprise-grade; bridges legacy BI tools with modern cloud warehouses |
52 | | [dbt Semantic Layer (MetricFlow)](https://docs.getdbt.com/docs/build/about-metricflow) | Open-source / Cloud | Metrics-as-code defined alongside dbt models; integrates with dbt Cloud | Tight integration with dbt workflows; single place for transforms + metrics |
53 | | [Malloy](https://www.malloydata.dev/) | Open-source | Semantic modeling language from Google; compiles to SQL for BigQuery, Snowflake, Postgres, etc. | Lightweight, expressive; good for teams wanting code-first semantic models |
54 |
55 | *If you want to know more about the semantic layer, check out [this article](https://read.futureproofds.com/p/semantic-layers-and-the-future-of-agentic-analytics).*
56 |
57 | ### Conversational Analytics Tools
58 |
59 | Tools that let you talk to your data via conversational AI, natural-language querying, or AI-assisted analytics, but that don't quite fit into the modern BI category.
60 |
61 | | Tool | What it does |
62 | | --- | --- |
63 | | [PandasAI](https://pandas-ai.com/) | AI-driven interface for Python dataframes; ask questions and get insights and visuals. |
64 | | [Julius AI](https://julius.ai/) | Conversational AI analyst for data insights and charts (supports spreadsheets/CSV/sheet imports). |
65 | | [Zerve AI](https://www.zerve.ai/) | Conversational interface for querying and exploring data. |
66 | | [DataGPT](https://datagpt.com/) | Conversational AI data analyst that generates insights and deep analysis from business data. |
67 | | [FineChatBI](https://www.fanruan.com/en/finechatbi) | Conversational analytics tool to ask questions and build dashboards and visualizations. |
68 | | [Vanna AI](https://vanna.ai/) | Natural-language chat interface for querying SQL databases; generates SQL and charts/summaries. |
69 | | [Powerdrill (Chat with Database)](https://powerdrill.ai/features/chat-with-database) | Chat-based analytics interface for asking questions and analyzing data without writing SQL. |
70 | | [Wren AI](https://www.getwren.ai/) | Natural-language interface for querying and interacting with data sources. |
71 |
72 | ### Miscellaneous Tools
73 |
74 | This section includes **tools built specifically with data scientists in mind** that don’t fit into the other categories but are still highly relevant to modern, AI-native data science workflows.
75 |
76 | | Tool | What it’s for | Why it belongs here |
77 | |------|---------------|---------------------|
78 | | [Google Agent Development Kit (ADK)](https://cloud.google.com/vertex-ai/docs/agent-builder/overview) | Framework for building structured, tool-using agents | Designed for analytical and reasoning-heavy workflows, not just chatbots |
79 | | [MCP Toolbox for Databases](https://github.com/modelcontextprotocol/servers) | Standardized way to connect agents to databases | Directly addresses a core DS need: safe, structured access to data sources |
80 | | [Metaflow](https://metaflow.org/) | DS-first workflow and experiment framework | Built to let data scientists move from notebooks to production without heavy infrastructure |
81 | | [cleanlab](https://cleanlab.ai/) | Data quality and label issue detection | Focuses on a uniquely DS problem: silent data and label errors that hurt model performance |
82 |
83 |
84 |
85 |
86 | ## 🤖 Foundation Models
87 | This section lists foundation models (open-source and commercial) that aim to solve core data science problems, including models for tabular data, time series, recommendations, and multimodal analysis.
88 |
89 | Use this section as a starting point for exploring foundation models and their capabilities.
90 |
91 | | Model | Domain | Organization | Access Type | Primary Use Case |
92 | | --- | --- | --- | --- | --- |
93 | | [TimeGPT](https://www.nixtla.io/) | Time series | Nixtla | API / Open | Forecasting and anomaly detection |
94 | | [TimesFM](https://github.com/google-research/timesfm) | Time series | Google | Open | Zero-shot forecasting |
95 | | [Chronos](https://github.com/amazon-science/chronos-forecasting) | Time series | Amazon (AWS) | Open | General forecasting |
96 | | [Moirai](https://github.com/SalesforceAIResearch/uni2ts) | Time series | Salesforce | Open | Multi-domain forecasting |
97 | | [Toto](https://github.com/DataDog/toto) | Observability | Datadog | Open | High-cardinality forecasting |
98 | | [MOMENT](https://github.com/moment-timeseries-foundation-model/moment) | Time series | CMU | Open | Multi-task (forecasting, anomaly, etc.) |
99 | | [Granite TTM-R2](https://github.com/ibm-granite/granite-tsfm) | Time series | IBM | Open | Sequential prediction |
100 | | [TabPFN](https://github.com/PriorLabs/TabPFN) | Tabular | Prior Labs | Open | Classification and regression |
101 | | [TableGPT2](https://huggingface.co/tablegpt/TableGPT2-7B) | Tabular / NLP | Zhejiang Univ. | Open | Table question answering and code generation |
102 | | [Netflix RecSys Model](https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39) | Recommendations | Netflix | Proprietary | Personalization at scale |
103 | | [Spotify 2T-HGNN](https://research.atspotify.com/publications/personalized-audiobook-recommendations-at-spotify-through-graph-neural-networks) | Recommendations | Spotify | Proprietary | Cross-modal recommendations |
104 |
105 | *If you want to know more about foundation models for data science, check out [this article](https://read.futureproofds.com/p/what-are-foundation-models-and-why-data-scientists-should-care).*
106 |
107 |
108 | ## 📚 Learning Resources
109 |
110 | This section lists **learning resources** that go beyond generic theory and focus on either *AI-native data workflows* or *applied data science with modern AI tools*.
111 |
112 | ### Newsletters
113 | - [Future Proof Data Science](https://read.futureproofds.com/): A weekly newsletter for data scientists who want to stay relevant and grow their careers in the age of AI (and beyond).
114 | - [Jam with AI](https://jamwithai.substack.com/): A newsletter inspired by real-world AI/ML events and projects.
115 | - [To Data & Beyond](https://youssefh.substack.com/): A newsletter for mastering data science and AI beyond the basics.
116 | - [Daily Dose of Data Science](https://blog.dailydoseofds.com/): A free newsletter for continuous learning about data science and ML, lesser-known techniques, and how to apply them in 2 minutes.
117 | - [Neural Pulse](https://neuralpulse.io/subscribe): A 5-minute, human-curated newsletter delivering the best in AI, ML, and data science (twice a week).
118 |
119 |
120 | ### Courses
121 |
122 | | Resource | What It Covers | Why It Belongs Here |
123 | |----------|----------------|---------------------|
124 | | [AI Workflows Bootcamp](https://futureproofds.com/) | A cohort-based program that helps data scientists master AI workflows and automation to 10× productivity, stay relevant, and accelerate their careers. | Built for data scientists, by data scientists. |
125 | | [DeepLearning.AI Courses](https://www.deeplearning.ai/courses/) | AI/ML foundations and applied developer workflows | Useful for DS learners who need *conceptual grounding* alongside applied workflows. |
126 | | [Building AI Agents and Agentic Workflows Specialization (Coursera)](https://www.coursera.org/specializations/building-ai-agents-and-agentic-workflows) | Building and orchestrating agent-based AI systems (LangChain, LangGraph, tool calling) | Focuses on *agentic workflows* that map directly to DS productivity scenarios. |
127 | | [Introduction to LangGraph (LangChain Academy)](https://academy.langchain.com/courses/intro-to-langgraph) | Building stateful, multi-actor agents with LangGraph | Hands-on course for building agentic workflows directly applicable to DS automation. |
128 |
129 |
130 | ### YouTube Channels
131 |
132 | - [Data Neighbor Podcast](https://www.youtube.com/@dataneighborpodcast): Hosted by industry veterans Hai Guan, Sravya Madipalli, and Shane Butler, covering data science careers, AI trends, and professional growth.
133 | - [AI Engineer](https://www.youtube.com/@aiDotEngineer): Official channel from the AI Engineer conference/community, featuring talks on AI engineering, agents, and applied AI development.
134 |
135 | *Feel free to reach out to me if you have any suggestions for channels that should be added to this list!*
136 |
137 |
138 |
139 | ## 🏆 Conferences
140 |
141 | ### United States
142 |
143 | | Conference | Date | Location | Details |
144 | |------------|------|----------|----------|
145 | | [ODSC AI East 2026](https://odsc.ai/east/) | April 28-29, 2026 | Boston, MA | Various tracks including ML, NLP, MLOps, and Data Visualization. 250+ speakers. |
146 | | [IBM Think 2026](https://www.ibm.com/events/think/) | May 4-7, 2026 | Boston, MA | Focuses on AI productivity, trusted data, scalable AI architectures, and cost optimization. |
147 | | [Machine Learning Week 2026](https://machinelearningweek.com/) | May 5-6, 2026 | San Francisco, CA | Focuses on making AI products robust and deployment-worthy. |
148 | | [The Data Science Conference](https://www.thedatascienceconference.com/) | May 28-29, 2026 | Chicago, IL | Vendor-free, sponsor-free, and recruiter-free conference for data science professionals. |
149 | | [Data + AI Summit 2026](https://www.databricks.com/dataaisummit) | June 15-18, 2026 | San Francisco, CA | Hosted by Databricks. Includes discussions, networking, and hands-on training. |
150 | | [AI Engineer World's Fair 2026](https://www.ai.engineer/worldsfair) | June 30 - July 2, 2026 | San Francisco, CA | Largest technical AI conference with 20 tracks, 250 speakers, 6,000+ attendees. |
151 | | [The AI Conference 2026](https://aiconference.com/) | Sept 30 - Oct 1, 2026 | San Francisco, CA | Vendor-neutral event by the creators of MLconf. Features AI research, engineering, and applied ML. |
152 | | [ODSC AI West 2026](https://odsc.ai/west/) | Oct 27-29, 2026 | Burlingame, CA | Focuses on AI and data science with workshops, hands-on training, and strategic insights. |
153 |
154 | ### Europe
155 |
156 | | Conference | Date | Location | Details |
157 | |------------|------|----------|----------|
158 | | [World AI Cannes Festival 2026](https://www.worldaicannes.com/) | Feb 12-13, 2026 | Cannes, France | Focuses on AI, ML, and data science. Features AI technologies and global innovators. |
159 | | [AI Engineer Europe 2026](https://www.ai.engineer/europe) | April 8-10, 2026 | London, UK | First official AI Engineer Europe event. Large multitrack technical AI conference for 1000+ AI engineers. |
160 | | [Data Innovation Summit 2026](https://datainnovationsummit.com/) | May 6-8, 2026 | Stockholm, Sweden | Covers data governance, data literacy, and machine learning, with speakers from major companies. |
161 | | [DATA 2026](https://data.scitevents.org/) | July 16-18, 2026 | Porto, Portugal | International conference on data science, technology, and applications. |
162 | | [ECML PKDD 2026](https://ecmlpkdd.org/2026/) | Sept 7-11, 2026 | Naples, Italy | Premier European conference on machine learning and knowledge discovery in databases. |
163 |
164 | ## Contributing
165 |
166 | If you want to add to the repository or you find an issue, feel free to open a PR. Please place new entries in the relevant section or category.
167 |
168 | ## About
169 | This repo exists because data science is entering a new phase.
170 |
171 | AI tools are no longer “nice to have” side experiments. They are becoming part of how we actually do the work, from analysis and exploration to production workflows.
172 |
173 | As demand grows for data scientists who understand how to integrate AI into their existing workflows, the signal-to-noise ratio is getting worse. There are endless tools, ideas, and opinions, many of them generic or borrowed from other fields.
174 |
175 | The goal of this repo is to cut through that noise. It’s a curated set of resources that are either built specifically for data scientists or closely aligned with how we already work. Not an exhaustive list, and not a guide, just a focused snapshot of what’s worth paying attention to right now.
176 |
177 | ## FAQs
178 | 1. **How is curation done?** Curation is based on thorough research, recommendations from people I trust, my 7+ years of experience as a data scientist, and extensive work integrating AI into data science workflows.
179 | 2. **Are all resources free?** Most resources here will be free, but I will also include paid alternatives if they are truly valuable to your career development.
180 | 3. **How often is the repository updated?** I plan to come back as often as possible to check that all resources are still available and relevant, and to add new ones.
181 |
182 | If you have questions or feedback, send me a message on [LinkedIn](https://www.linkedin.com/in/andresvourakis/). Enjoy!
183 |
--------------------------------------------------------------------------------