├── .gitignore
├── README.md
├── 2025_episodes
│   ├── Q4 2025
│   │   └── December 2025
│   │       └── ThursdAI_Special_Googles_New_Anti-Gravity_IDE_Gemini_3_Nano_Banana_Pro_Explained_ft_Kevin_Hou_Ammaar.md
│   ├── Q2 2025
│   │   └── June 2025
│   │       ├── _ThursdAI_-_June_19_-_MiniMax_M1_beats_R1_OpenAI_records_your_meetings_Gemini_in_GA_WB_uses_Coreweav.md
│   │       ├── _ThursdAI_-_Jun_26_-_Gemini_CLI_Flux_Kontext_Dev_Search_Live_Anthropic_destroys_books_Zucks_superint.md
│   │       └── _ThursdAI_-_Jun_5_2025_-_Live_from_AI_Engineer_with_Swyx_new_Gemini_25_with_Logan_K_and_Jack_Rae_Sel.md
│   ├── Q1 2025
│   │   ├── January 2025
│   │   │   ├── _ThursdAI_-_Jan_2_-_is_25_the_year_of_AI_agents.md
│   │   │   └── _ThursdAI_-_Jan_16_2025_-_Hailuo_4M_context_LLM_SOTA_TTS_in_browser_OpenHands_interview_more_AI_news.md
│   │   └── March 2025
│   │       └── ThursdAI_-_Mar_6_2025_-_Alibabas_R1_Killer_QwQ_Exclusive_Google_AI_Mode_Chat_and_MCP_fever_sweeping_.md
│   └── Q3 2025
│       ├── August 2025
│       │   ├── _ThursdAI_-_GPT5_is_here.md
│       │   └── _ThursdAI_Jul_31_2025_Qwens_Small_Models_Go_Big_StepFuns_Multimodal_Leap_GLM-45s_Chart_Crimes_and_Ru.md
│       └── July 2025
│           └── _ThursdAI_-_July_24_2025_-_Qwen-mas_in_July_The_White_Houses_AI_Action_Plan_Math_Olympiad_Gold_for_A.md
├── .agent
│   └── workflows
│       └── create-quarterly-recap.md
├── example prompts
│   ├── Open revol infographic prompt.md
│   └── ThursdAI Dec 11 2025 Infographic prompt.md
├── ThursdAI_News_Infographic_System_Prompt.md
├── parse_rss.py
├── 2025_AI_Year_in_Review.md
├── Q1_2025_AI_Recap.md
├── agents.md
├── Q3_2025_AI_Recap.md
└── Q2_2025_AI_Recap.md
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # 🎙️ ThursdAI 2025 Year in Review
2 |
3 | A comprehensive recap of the most significant AI developments from 2025, curated from weekly [ThursdAI](https://thursdai.news) podcast episodes hosted by [Alex Volkov](https://x.com/altryne).
4 |
5 | ## 📖 Full Year Review
6 |
7 | **[2025 AI Year in Review](./2025_AI_Year_in_Review.md)** — The complete summary of AI's most transformative year yet.
8 |
9 | ---
10 |
11 | ## 📅 Quarterly Recaps
12 |
13 | ### Q1 2025 — The Quarter That Changed Everything
14 | DeepSeek R1, Gemini 2.5, Qwen 2.5 Max, Gemma 3, MCP protocol fever
15 |
16 | [](./Q1_2025_AI_Recap.md)
17 |
18 | **[📖 Read Full Q1 Recap →](./Q1_2025_AI_Recap.md)**
19 |
20 | ---
21 |
22 | ### Q2 2025 — The Quarter That Shattered Reality
23 | Claude 4 (Opus & Sonnet), GPT-4.1, o3/o4-mini, Llama 4, Veo 3, Google I/O
24 |
25 | [](./Q2_2025_AI_Recap.md)
26 |
27 | **[📖 Read Full Q2 Recap →](./Q2_2025_AI_Recap.md)**
28 |
29 | ---
30 |
31 | ### Q3 2025 — GPT-5, Trillion-Scale Open Source, World Models
32 | GPT-5 launch, Grok 4, Kimi K2, IMO Gold for AI, agentic coding explosion
33 |
34 | [](./Q3_2025_AI_Recap.md)
35 |
36 | **[📖 Read Full Q3 Recap →](./Q3_2025_AI_Recap.md)**
37 |
38 | ---
39 |
40 | ### Q4 2025 — Agents, Gemini's Crown & Sora Social
41 | Gemini 3, Claude 4.5, DeepSeek V3.2, Sora 2, AI browser wars begin
42 |
43 | [](./Q4_2025_AI_Recap.md)
44 |
45 | **[📖 Read Full Q4 Recap →](./Q4_2025_AI_Recap.md)**
46 |
47 | ---
48 |
49 | ## 🗂️ Episode Archive
50 |
51 | All individual episode notes are organized in the [`2025_episodes/`](./2025_episodes/) directory, structured by quarter and month.
52 |
53 | ---
54 |
55 | ## 🔗 Links
56 |
57 | - 🎧 **Podcast**: [thursdai.news](https://thursdai.news)
58 | - 🐦 **Follow Alex**: [@altryne](https://x.com/altryne)
59 |
60 | ---
61 |
62 | ## 📝 About
63 |
64 | ThursdAI is a weekly AI news podcast that has been tracking the rapid pace of AI development since 2023. This repository contains structured recaps from all 2025 episodes, making it easy to look back at how quickly the field evolved.
65 |
66 | *Last updated: December 2025*
67 |
--------------------------------------------------------------------------------
/2025_episodes/Q4 2025/December 2025/ThursdAI_Special_Googles_New_Anti-Gravity_IDE_Gemini_3_Nano_Banana_Pro_Explained_ft_Kevin_Hou_Ammaar.md:
--------------------------------------------------------------------------------
1 | # ThursdAI Special: Google's New Anti-Gravity IDE, Gemini 3 & Nano Banana Pro Explained (ft. Kevin Hou, Ammaar Reshi & Kat Kampf)
2 |
3 | **Date:** December 02, 2025
4 | **Duration:** 46:04
5 | **Link:** [https://sub.thursdai.news/p/thursdai-special-googles-new-anti](https://sub.thursdai.news/p/thursdai-special-googles-new-anti)
6 |
7 | ---
8 |
9 | ## Description
10 |
11 | Hey, Alex here,
12 |
13 | I recorded these conversations just in front of the AI Engineer auditorium, back to back, after these great folks gave their talks, and at the height of the most epic AI week we’ve seen since I started recording ThursdAI.
14 |
15 | This is less our traditional live recording, and more a real podcast-y conversation with great folks, inspired by [Latent.Space](https://substack.com/profile/89230629-latentspace). I hope you enjoy this format as much as I’ve enjoyed recording and editing it.
16 |
17 | AntiGravity with Kevin
18 |
19 | Kevin Hou and team just launched Antigravity, Google’s brand new Agentic IDE based on VSCode, and Kevin (a second-timer on ThursdAI) was awesome enough to hop on and talk about some of the product decisions they made and what makes Antigravity special, highlighting Artifacts as a completely new primitive.
20 |
21 | Gemini 3 in AI Studio
22 |
23 | If you aren’t using Google’s AI Studio ([ai.dev](http://ai.dev)) then you’re missing out! We talk about AI Studio all the time on the show, and I’m a daily user! I generate most of my images with Nano Banana Pro in there, and most of my Gemini conversations happen there as well!
24 |
25 | Ammaar and Kat were so fun to talk to, as they covered the newly shipped “build mode” which allows you to vibe code full apps and experiences inside AI Studio, and we also covered Gemini 3’s features, multimodal understanding, and UI capabilities.
26 |
27 | These folks gave a LOT of Gemini 3 demos, so they know everything there is to know about this model’s capabilities!
28 |
29 | I tried new things with this one: multi-camera angles and conversations with great folks. If you found this content valuable, please subscribe :)
30 |
31 | **Topics Covered:**
32 |
33 | * Inside Google’s new “AntiGravity” IDE
34 |
35 | * How the “Agent Manager” changes coding workflows
36 |
37 | * Gemini 3’s new multimodal capabilities
38 |
39 | * The power of “Artifacts” and dynamic memory
40 |
41 | * Deep dive into AI Studio updates & Vibe Coding
42 |
43 | * Generating 4K assets with Nano Banana Pro
44 |
45 | Timestamps for your viewing convenience.
46 |
47 | 00:00 - Introduction and Overview
48 |
49 | 01:13 - Conversation with Kevin Hou: Anti-Gravity IDE
50 |
51 | 01:58 - Gemini 3 and Nano Banana Pro Launch Insights
52 |
53 | 03:06 - Innovations in Anti-Gravity IDE
54 |
55 | 06:56 - Artifacts and Dynamic Memory
56 |
57 | 09:48 - Agent Manager and Multimodal Capabilities
58 |
59 | 11:32 - Chrome Integration and Future Prospects
60 |
61 | 20:11 - Conversation with Ammar and Kat: AI Studio Team
62 |
63 | 21:21 - Introduction to AI Studio
64 |
65 | 21:51 - What is AI Studio?
66 |
67 | 22:52 - Ease of Use and User Feedback
68 |
69 | 24:06 - Live Demos and Launch Week
70 |
71 | 26:00 - Design Innovations in AI Studio
72 |
73 | 30:54 - Generative UIs and Vibe Coding
74 |
75 | 33:53 - Nano Banana Pro and Image Generation
76 |
77 | 39:45 - Voice Interaction and Future Roadmap
78 |
79 | 44:41 - Conclusion and Final Thoughts
80 |
81 | Looking forward to seeing you on Thursday 🫡
82 |
83 | P.S - I’ve recorded one more conversation during AI Engineer and will be posting it soon: same format, very interesting person, so look out for it!
84 |
85 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-special-googles-new-anti/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-special-googles-new-anti?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE4MDQ2NTY3MSwiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.h3ViA8Pw-4_8oniEdqLjP9b_W9t8ymono4EoyxRrYj4&utm_campaign=CTA_5).
86 |
--------------------------------------------------------------------------------
/.agent/workflows/create-quarterly-recap.md:
--------------------------------------------------------------------------------
1 | ---
2 | description: Create ThursdAI quarterly AI recap from combined episode files
3 | ---
4 |
5 | # ThursdAI Quarterly AI Recap Workflow
6 |
7 | ## Overview
8 | Generate a month-by-month breakdown of significant AI news from ThursdAI newsletters for a specific quarter.
9 |
10 | ## Input Required
11 | When starting this workflow, specify:
12 | - **Quarter**: Q1, Q2, Q3, or Q4
13 | - **Year**: 2025, 2026, etc.
14 |
15 | ## Source Files Location
16 | - Combined episode files are located at: `/Users/altryne/projects/thursdAI_yearly_recap/2025_episodes/Q[X] 2025/Q[X]_2025_combined.md`
17 | - Each combined file contains all episodes for that quarter with headers like `## 📆 ThursdAI - [Date] - [Title]`
18 |
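For reference, here is a minimal sketch (not part of the workflow itself) of how a combined quarter file could be split into episodes and bucketed by month. The header regex follows the `## 📆 ThursdAI - [Date] - [Title]` pattern above; the function and variable names are illustrative, not an existing script in this repo.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches headers like: ## 📆 ThursdAI - June 19 - MiniMax M1 beats R1 ...
HEADER_RE = re.compile(r"^## 📆 ThursdAI - ([^-]+) - (.+)$", re.MULTILINE)

def split_by_month(combined_path: str) -> dict[str, list[dict]]:
    """Split a combined quarter file into episodes, grouped by month name."""
    text = Path(combined_path).read_text(encoding="utf-8")
    headers = list(HEADER_RE.finditer(text))
    by_month: dict[str, list[dict]] = defaultdict(list)
    for i, m in enumerate(headers):
        # An episode's body runs from its header to the next header (or EOF).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        date_str = m.group(1).strip()   # e.g. "June 19" or "Jan 2"
        month = date_str.split()[0]     # crude month extraction from the date
        by_month[month].append({"title": m.group(2).strip(),
                                "body": text[m.start():end]})
    return dict(by_month)
```

Running `split_by_month(...)` on a combined file would yield the per-month buckets that the extraction steps below operate on.
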
19 | ## Output Format
20 | Create a markdown file at: `/Users/altryne/projects/thursdAI_yearly_recap/Q[X]_2025_AI_Recap.md`
21 |
22 | ### Structure Template
23 | ```markdown
24 | # Q[X] 2025 AI Recap - ThursdAI
25 |
26 | ## Quarter Overview
27 | [2-3 sentence summary of the quarter's major themes]
28 |
29 | ---
30 |
31 | ## [Month] 2025
32 |
33 | ### Top Stories
34 | - **[Major Release 1]**: [1-2 sentence description]
35 | - **[Major Release 2]**: [1-2 sentence description]
36 |
37 | ### Open Source LLMs
38 | - **[Model Name]**: [Brief description with key specs]
39 |
40 | ### Big CO LLMs + APIs
41 | - **[Product/Model]**: [Brief description]
42 |
43 | ### Vision & Video
44 | - **[Model/Product]**: [Brief description]
45 |
46 | ### Voice & Audio
47 | - **[Model/Product]**: [Brief description]
48 |
49 | ### AI Art & Diffusion & 3D
50 | - **[Model/Product]**: [Brief description]
51 |
52 | ### Tools
53 | - **[Tool Name]**: [Brief description]
54 |
55 | ---
56 |
57 | [Repeat for each month in the quarter]
58 |
59 | ---
60 |
61 | ## Quarter Summary
62 |
63 | ### Major Themes
64 | 1. [Theme 1]
65 | 2. [Theme 2]
66 | 3. [Theme 3]
67 |
68 | ### Biggest Releases by Month
69 | - **[Month 1]**: [Top release]
70 | - **[Month 2]**: [Top release]
71 | - **[Month 3]**: [Top release]
72 | ```
73 |
74 | ## Prioritization Criteria
75 | 1. **Title Mentions**: Releases mentioned in episode titles are highest priority
76 | 2. **Discussion Depth**: Items with extensive coverage in newsletter body
77 | 3. **Community Impact**: Mentions of viral moments, benchmarks broken, or widespread adoption
78 | 4. **Categories to track**:
79 | - Open Source LLMs (models, weights, training methods)
80 | - Big CO LLMs + APIs (OpenAI, Google, Anthropic, xAI, etc.)
81 | - Vision & Video (video generation, VLMs)
82 | - Voice & Audio (TTS, STT, music)
83 | - AI Art & Diffusion & 3D (image gen, 3D models)
84 | - Tools (agents, protocols like MCP/A2A, coding assistants)
85 |
86 | ## Steps
87 |
88 | 1. **Read the combined file** for the target quarter
89 | - File may be 2,000+ lines; read it in chunks of ~800 lines
90 | - Note episode dates to categorize by month
91 |
92 | 2. **Extract releases by month**
93 | - Group episodes by their publication month
94 | - For each episode, identify:
95 | - Items in episode title (highest priority)
96 | - Items in TL;DR section
97 | - Items discussed extensively in body
98 |
99 | 3. **Categorize and summarize**
100 | - Place each release in appropriate category
101 | - Write concise 1-2 sentence summaries
102 | - Note key specs (parameter counts, benchmarks, licenses)
103 |
104 | 4. **Identify top stories per month**
105 | - Select 2-4 most impactful releases
106 | - These go in "Top Stories" section
107 |
108 | 5. **Write quarter summary**
109 | - Identify 3-5 overarching themes
110 | - List biggest release per month
111 |
112 | 6. **Reference existing recaps** for format consistency
113 | - See `/Users/altryne/projects/thursdAI_yearly_recap/Q1_2025_AI_Recap.md` as template
114 |
115 | ## Quarter-Month Mapping
116 | - **Q1**: January, February, March
117 | - **Q2**: April, May, June
118 | - **Q3**: July, August, September
119 | - **Q4**: October, November, December
120 |
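For quick validation, the mapping above can be expressed as a small lookup table; a sketch (the helper name is illustrative):

```python
QUARTER_MONTHS = {
    "Q1": ["January", "February", "March"],
    "Q2": ["April", "May", "June"],
    "Q3": ["July", "August", "September"],
    "Q4": ["October", "November", "December"],
}

def check_in_quarter(month: str, quarter: str) -> None:
    # Guard against mis-filed episodes (e.g. a July episode under "August 2025").
    if month not in QUARTER_MONTHS[quarter]:
        raise ValueError(f"{month} does not belong to {quarter}")
```
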
121 | ## Example Prompt to Start
122 | ```
123 | Create the Q2 2025 AI recap using the /create-quarterly-recap workflow.
124 | Read /Users/altryne/projects/thursdAI_yearly_recap/2025_episodes/Q2 2025/Q2_2025_combined.md
125 | and generate the recap following the established format from Q1.
126 | ```
127 |
--------------------------------------------------------------------------------
/2025_episodes/Q2 2025/June 2025/_ThursdAI_-_June_19_-_MiniMax_M1_beats_R1_OpenAI_records_your_meetings_Gemini_in_GA_WB_uses_Coreweav.md:
--------------------------------------------------------------------------------
1 | # 📆 ThursdAI - June 19 - MiniMax M1 beats R1, OpenAI records your meetings, Gemini in GA, W&B uses Coreweave GPUs & more AI news
2 |
3 | **Date:** June 20, 2025
4 | **Duration:** 1:41:31
5 | **Link:** [https://sub.thursdai.news/p/thursdai-june-18-minimax-m1-beats](https://sub.thursdai.news/p/thursdai-june-18-minimax-m1-beats)
6 |
7 | ---
8 |
9 | ## Description
10 |
11 | Hey all, Alex here 👋
12 |
13 | This week, while not the busiest in releases (we can't get a SOTA LLM every week now, can we?), was full of interesting open source releases and feature updates, such as the chatGPT meetings recorder (which we live-tested on the show; the limit is 2 hours!)
14 |
15 | It was also the day after our annual W&B conference, FullyConnected, and so I had a few goodies to share with you, like answering the main question: when will W&B have some use of those GPUs from CoreWeave? The answer is... now! (We launched a brand new preview of an inference service with open source models.)
16 |
17 | And finally, we had a great chat with Pankaj Gupta, co-founder and CEO of Yupp, a new service that lets users chat with the top AIs for free while turning their votes into leaderboards that help everyone else understand which Gen AI model is best for which task/topic. It was a great conversation, and he even shared an invite code with all of us (I'll attach it to the TL;DR and show notes). Let's dive in!
18 |
19 | 00:00 Introduction and Welcome
20 |
21 | 01:04 Show Overview and Audience Interaction
22 |
23 | 01:49 Special Guest Announcement and Experiment
24 |
25 | 03:05 Wolfram's Background and Upcoming Hosting
26 |
27 | 04:42 TLDR: This Week's Highlights
28 |
29 | 15:38 Open Source AI Releases
30 |
31 | 32:34 Big Companies and APIs
32 |
33 | 32:45 Google's Gemini Updates
34 |
35 | 42:25 OpenAI's Latest Features
36 |
37 | 54:30 Exciting Updates from Weights & Biases
38 |
39 | 56:42 Introduction to Weights & Biases Inference Service
40 |
41 | 57:41 Exploring the New Inference Playground
42 |
43 | 58:44 User Questions and Model Recommendations
44 |
45 | 59:44 Deep Dive into Model Evaluations
46 |
47 | 01:05:55 Announcing Online Evaluations via Weave
48 |
49 | 01:09:05 Introducing Pankaj Gupta from [YUP.AI](http://YUP.AI)
50 |
51 | 01:10:23 [YUP.AI](http://YUP.AI): A New Platform for Model Evaluations
52 |
53 | 01:13:05 Discussion on Crowdsourced Evaluations
54 |
55 | 01:27:11 New Developments in Video Models
56 |
57 | 01:36:23 OpenAI's New Transcription Service
58 |
59 | 01:39:48 Show Wrap-Up and Future Plans
60 |
61 | Here's the TL;DR and show notes links
62 |
63 | ThursdAI - June 19th, 2025 - TL;DR
64 |
65 | * **Hosts and Guests**
66 |
67 | * **Alex Volkov** - AI Evangelist & Weights & Biases ([@altryne](http://x.com/@altryne))
68 |
69 | * Co-Hosts - [@WolframRvnwlf](http://x.com/@WolframRvnwlf), [@yampeleg](http://x.com/@yampeleg), [@nisten](http://x.com/@nisten), [@ldjconfirmed](http://x.com/@ldjconfirmed)
70 |
71 | * Guest - [@pankaj](http://x.com/@pankaj) - co-founder of [Yupp.ai](https://yupp.ai/join/thursdAI)
72 |
73 | * **Open Source LLMs**
74 |
75 | * Moonshot AI open-sourced Kimi-Dev-72B ([Github](https://github.com/MoonshotAI/Kimi-Dev?tab=readme-ov-file), [HF](https://huggingface.co/moonshotai/Kimi-Dev-72B))
76 |
77 | * MiniMax-M1 456B (45B Active) - reasoning model ([Paper](https://arxiv.org/abs/2506.13585), [HF](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k), [Try It](https://huggingface.co/spaces/MiniMaxAI/MiniMax-M1), [Github](https://github.com/MiniMax-AI/MiniMax-M1))
78 |
79 | * **Big CO LLMs + APIs**
80 |
81 | * Google drops Gemini 2.5 Pro/Flash GA, 2.5 Flash-Lite in Preview ([Blog](https://blog.google/products/gemini/gemini-2-5-model-family-expands/), [Tech report](https://storage.googleapis.com/gemini-technical-report), [Tweet](https://x.com/google/status/192905415))
82 |
83 | * Google launches Search Live: Talk, listen and explore in real time with AI Mode ([Blog](https://blog.google/products/search/search-live-ai-mode/))
84 |
85 | * OpenAI adds MCP support to Deep Research in chatGPT ([X](https://x.com/altryne/status/1934644274227769431), [Docs](https://platform.openai.com/docs/mcp))
86 |
87 | * OpenAI launches their meetings recorder in mac App ([docs](https://help.openai.com/en/articles/11487532-chatgpt-record))
88 |
89 | * Zuck update: Considering bringing Nat Friedman and Daniel Gross to Meta ([information](https://x.com/amir/status/1935461177045516568))
90 |
91 | * **This week's Buzz**
92 |
93 | * NEW! W&B Inference provides a unified interface to access and run top open-source AI models ([inference](https://wandb.ai/inference), [docs](https://weave-docs.wandb.ai/guides/integrations/inference/))
94 |
95 | * NEW! W&B Weave Online Evaluations delivers real-time production insights and continuous evaluation for AI agents across any cloud. ([X](https://x.com/altryne/status/1935412384283107572))
96 |
97 | * The new platform offers "metal-to-token" observability, linking hardware performance directly to application-level metrics.
98 |
99 | * **Vision & Video**
100 |
101 | * ByteDance's new video model beats Veo 3 - Seedance 1.0 mini ([Site](https://dreamina.capcut.com/ai-tool/video/generate), [FAL](https://fal.ai/models/fal-ai/bytedance/seedance/v1/lite/image-to-video))
102 |
103 | * MiniMax Hailuo 02 - 1080p native, SOTA instruction following ([X](https://www.minimax.io/news/minimax-hailuo-02), [FAL](https://fal.ai/models/fal-ai/minimax/hailuo-02/pro/image-to-video))
104 |
105 | * Midjourney video is also here - great visuals ([X](https://x.com/angrypenguinPNG/status/1932931137179176960))
106 |
107 | * **Voice & Audio**
108 |
109 | * Kyutai launches open-source, high-throughput streaming Speech-To-Text models for real-time applications ([X](https://x.com/kyutai_labs/status/1935652243119788111), [website](https://kyutai.org/))
110 |
111 | * **Studies and Others**
112 |
113 | * LLMs Flunk Real-World Coding Contests, Exposing a Major Skill Gap ([Arxiv](https://arxiv.org/pdf/2506.11928))
114 |
115 | * MIT Study: ChatGPT Use Causes Sharp Cognitive Decline ([Arxiv](https://arxiv.org/abs/2506.08872))
116 |
117 | * Andrej Karpathy's "Software 3.0": The Dawn of English as a Programming Language ([youtube](https://www.youtube.com/watch?v=LCEmiRjPEtQ), [deck](https://drive.google.com/file/d/1HIEMdVlzCxke22ISVzornd2-UpWHngRZ/view?usp=sharing))
118 |
119 | * **Tools**
120 |
121 | * Yupp launches with 500+ AI models, a new leaderboard, and a user-powered feedback economy - use [thursdai link](https://yupp.ai/join/thursdAI)* to get 50% extra credits
122 |
123 | * BrowserBase announces [director.ai](http://director.ai) - an agent to run things on the web
124 |
125 | * Universal system prompt for reduction of hallucination (from [Reddit](https://www.reddit.com/r/PromptEngineering/comments/1kup28y/chatgpt_and_gemini_ai_will_gaslight_you_everyone/))
126 |
127 | *Disclosure: while this isn't a paid promotion, I do think that Yupp has great value. I do get a bit more credits on their platform if you click my link, and so do you. You can go to [yupp.ai](http://yupp.ai) and register with no affiliation if you wish.
128 |
129 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-june-18-minimax-m1-beats/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-june-18-minimax-m1-beats?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE2NjM1OTY2MCwiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.XSlsS0LVkZoKjnK1vqluK6duzE3fa7L1zHsEvPRQUW8&utm_campaign=CTA_5).
130 |
--------------------------------------------------------------------------------
/example prompts/Open revol infographic prompt.md:
--------------------------------------------------------------------------------
1 | Infographic Prompt: ThursdAI – Dec 4, 2025 · “Code Red vs. Open Revolt”
2 |
3 | Design a high-end VERTICAL promo infographic poster (9:16 aspect ratio) for a tech podcast episode.
4 |
5 | EPISODE TITLE & TOP SECTION
6 | - Main title at the very top in bold modern sans-serif:
7 | “ThursdAI – Code Red vs. Open Revolt”
8 | - Subtitle under it:
9 | “December 4, 2025 · Weekly AI Roundup”
10 | - Tiny line:
11 | “Hosted by Alex Volkov · @altryne”
12 | - Include a STYLIZED portrait/avatar of Alex Volkov near the title, using my reference image. Make him look like a sharp news anchor in an AI cold war: slight 3/4 view, geometric shading, no photorealism. Add a subtle mic icon or waveform integrated into the frame.
13 |
14 | OVERALL VIBE & STYLE
15 | - Overall vibe: strategic, dramatic, like a movie poster for an AI information war meets Bloomberg terminal.
16 | - Style: FLAT VECTOR illustration with bold graphic shapes, angled cuts, and strong contrast.
17 | - IMPORTANT: Do NOT reuse last week’s soft neon-core / skyline / circuit-board-glow look. NO circular “reactor core”, NO cute or rounded pills.
18 | - Use angular elements: diagonals, slashes, wedges, hard-edged panels, occasional halftone or scanline textures for retro-tech flavor.
19 |
20 | COLOR PALETTE & SPLIT
21 | - Base: deep charcoal, obsidian black, and midnight blue.
22 | - Split palette:
23 | - LEFT side = “Code Red” / closed labs / incumbents: hot oranges, reds, magentas, and gold accents.
24 | - RIGHT side = “Open Revolt” / open-source uprising: electric teals, cyans, neon greens.
25 | - Use a central jagged RIFT or LIGHTNING BOLT shape to separate the two sides visually, with thin data lines crossing the divide.
26 |
27 | MAIN VISUAL CONCEPT
28 | - Behind the title, instead of a skyline, show an abstract top-down “strategic map” of an AI battlefield:
29 | - On the left, sharp geometric blocks with warning triangles, radar sweeps, and alert icons (closed labs).
30 | - On the right, fractal-like grids, open nodes, arrows fanning outward (open-source).
31 | - From the center of the poster, a vertical or diagonal schism runs downwards, as if the surface has cracked. This rift separates the LEFT “Code Red” stories from the RIGHT “Open Revolt” stories.
32 |
33 | LAYOUT & HIERARCHY
34 | - Think of the poster as a split front page:
35 | - LEFT COLUMN: Closed-lab / BigCo stories (warm palette).
36 | - RIGHT COLUMN: Open-source & decentralized stories (cool palette).
37 | - Each side features 3–4 main panels (hard-edged rectangles or trapezoids) with:
38 | - A bold short title.
39 | - A single concise subtitle line.
40 | - A minimal abstract icon.
41 | - Under the main split, add a thin “ticker bar” for secondary topics: video, image models, and tools.
42 | - At the very bottom, a strong footer banner with show branding.
43 |
44 | LEFT SIDE – “CODE RED / CLOSED LABS”
45 | Use warm oranges/reds/golds for panel backgrounds or borders.
46 |
47 | 1) Panel: “OpenAI · Code Red & Garlic”
48 | - Subtitle: “Emergency focus on ChatGPT · new Garlic model to counter Gemini”
49 | - Icon: A red alert klaxon / siren with a stylized garlic bulb silhouette inside, emitting triangular warning beams.
50 |
51 | 2) Panel: “Amazon Nova 2 Family”
52 | - Subtitle: “Agentic Lite & Pro · 1M-token Omni · hybrid thinking budgets”
53 | - Icon: A dense cloud outline containing four distinct nodes (Lite/Pro/Sonic/Omni) connected by workflow arrows.
54 |
55 | 3) Panel: “Runway Gen-4.5”
56 | - Subtitle: “#1 text-to-video · 1,247 Elo · physics-level motion”
57 | - Icon: A diagonal film strip morphing into flowing motion waves, with a tiny #1 badge/crown.
58 |
59 | 4) Panel: “Kling VIDEO 2.6”
60 | - Subtitle: “1080p video with native audio · synced dialogue & SFX”
61 | - Icon: A rectangular video frame with sound waves and a speaking profile silhouette; tiny music notes and waveform lines integrated.
62 |
63 | RIGHT SIDE – “OPEN REVOLT / OPEN SOURCE”
64 | Use teals, cyans, and neon green.
65 |
66 | Make this side feel slightly more expansive and energetic—this week’s big narrative.
67 |
68 | 1) HERO PANEL (slightly larger):
69 | “DeepSeek V3.2 & Speciale”
70 | - Subtitle: “685B-param MoE · rivals GPT-5 · gold-medal IMO / IOI / ICPC”
71 | - Icon: A deep multi-layer prism/brain crystal with orbiting math symbols (π, integral sign, etc.) and tiny medal/trophy shapes.
72 |
73 | 2) Panel: “Mistral 3 & Ministral 3”
74 | - Subtitle: “Apache 2.0 models from 3B edge to 675B MoE frontier”
75 | - Icon: A stylized wind gust sweeping across four stacked chips labeled as abstract size badges (XS/S/M/L dots).
76 |
77 | 3) Panel: “Arcee Trinity (Mini & Nano)”
78 | - Subtitle: “US-trained open-weight MoE · 10T tokens · iPhone to H200”
79 | - Icon: A triangular trinity symbol of three connected nodes, with subtle US-flag stripes and speed lines indicating high tokens/second.
80 |
81 | 4) Panel: “Hermes 4.3 on Psyche”
82 | - Subtitle: “Decentralized training network · 36B model · 512K context”
83 | - Icon: A globe made of nodes with orbiting satellites; thin lines show distributed training paths, NOT a single big server.
84 |
85 | CENTER / NEUTRAL ELEMENT – “THIS WEEK’S BUZZ”
86 | - In the mid-lower center, slightly overlapping both sides of the rift, add a neutral, grayish panel (not warm or cool) titled:
87 | - Title: “This Week’s Buzz · W&B LLM Eval Jobs”
88 | - Subtitle: “Mid-training checkpoint evals · 100+ benchmarks via Inspect”
89 | - Icon idea: a small model chip with bar charts and checkmarks rising from it, like a tiny leaderboard, indicating evaluation and monitoring.
90 | - This panel visually “bridges” closed and open worlds, since eval tooling applies to both.
91 |
92 | BOTTOM TICKER – VIDEO / IMAGE / DIFFUSION STRIP
93 | - Add a horizontal strip across the lower third, styled like a financial news ticker with tiny icons and short labels.
94 | - Use alternating warm/cool swatches to show mix of players.
95 |
96 | Include three mini-blocks:
97 |
98 | 1) “Seedream 4.5”
99 | - Tiny subtext: “Production-grade images · multi-reference fusion & sharp text”
100 | - Icon: A camera shutter overlaid with multiple ghosted image tiles.
101 |
102 | 2) “Pruna P-Image & Edit”
103 | - Tiny subtext: “Sub-second gen & edits · $0.005 per image”
104 | - Icon: A lightning bolt hitting a picture frame, with a tiny slider/magic wand icon.
105 |
106 | 3) “Kling IMAGE O1”
107 | - Tiny subtext: “Understand anything · precise edits · bold stylization”
108 | - Icon: A stylized eye merged into a paintbrush over a cube.
109 |
110 | INTERVIEW SPOTLIGHT – LUCAS ATKINS
111 | - On the open-source (right) side, but closer to the bottom corner, add a compact “spotlight” badge:
112 |
113 | Title: “Guest: Lucas Atkins”
114 | Subtitle: “CTO Arcee AI · Trinity deep-dive”
115 | Icon: A simplified person silhouette at a mic, backed by a small trinity triangle.
116 |
117 | BOTTOM FOOTER
118 | - Full-width footer bar with a slightly lighter gradient over the dark base, spanning both sides (unifying them).
119 | - Left: podcast mic icon with tiny neural nodes.
120 | Text: “ThursdAI · Weekly AI Roundup”
121 | - Center: “Episode: Code Red vs. Open Revolt”
122 | - Right: “AI Engineer Podcast · Live from New York”
123 |
124 | TYPOGRAPHY & UI DETAILS
125 | - Use a strong, legible sans-serif type across the poster.
126 | - Titles in Title Case, subtitles in sentence case.
127 | - Panels: hard corners, subtle inner strokes or thin outer glows ONLY if needed for separation—no bubbly pills.
128 | - Maintain high contrast: light text on dark panels; avoid long paragraphs.
129 | - NO real company logos. Use only abstract icons and shapes that suggest each brand/topic.
130 | - Use diagonal separators, angular dividers, and occasional halftone/scanline textures in the background to emphasize tension and motion.
131 |
132 | The final poster should feel like a **split AI war-room front page**: closed labs sounding alarms on one flank, open models staging a full-scale uprising on the other, with ThursdAI and W&B sitting in the middle making sense of the chaos.
--------------------------------------------------------------------------------
/2025_episodes/Q1 2025/January 2025/_ThursdAI_-_Jan_2_-_is_25_the_year_of_AI_agents.md:
--------------------------------------------------------------------------------
1 | # 📆 ThursdAI - Jan 2 - is 25' the year of AI agents?
2 |
3 | **Date:** January 02, 2025
4 | **Duration:** 1:31:29
5 | **Link:** [https://sub.thursdai.news/p/thursdai-jan-2-is-25-the-year-of](https://sub.thursdai.news/p/thursdai-jan-2-is-25-the-year-of)
6 |
7 | ---
8 |
9 | ## Description
10 |
11 | Hey folks, Alex here 👋 Happy new year!
12 |
11 | On our first episode of this year, and the second quarter of this century, there wasn't a lot of AI news to report on (most AI labs were on a well-deserved break). So this week, I'm very happy to present a special ThursdAI episode: an interview with [João Moura](https://x.com/joaomdmoura), CEO of [Crew.ai](http://Crew.ai), all about AI agents!
14 |
15 | We first chatted with João a [year ago](https://sub.thursdai.news/p/jan14-sunday-special-deep-dives), back in January of 2024, as CrewAI was blowing up while still just an open source project; it became the #1 trending project on GitHub and the #1 project on Product Hunt. (You can either listen to the podcast or watch it in the embedded YouTube above.)
16 |
17 | 00:36 Introduction and New Year Greetings
18 |
19 | 02:23 Updates on Open Source and LLMs
20 |
21 | 03:25 Deep Dive: AI Agents and Reasoning
22 |
23 | 03:55 Quick TLDR and Recent Developments
24 |
25 | 04:04 Medical LLMs and Modern BERT
26 |
27 | 09:55 Enterprise AI and Crew AI Introduction
28 |
29 | 10:17 Interview with João Moura: Crew AI
30 |
31 | 25:43 Human-in-the-Loop and Agent Evaluation
32 |
33 | 33:17 Evaluating AI Agents and LLMs
34 |
35 | 44:48 Open Source Models and Fin to OpenAI
36 |
37 | 45:21 Performance of Claude's Sonnet 3.5
38 |
39 | 48:01 Different parts of an agent topology, brain, memory, tools, caching
40 |
41 | 53:48 Tool Use and Integrations
42 |
43 | 01:04:20 Removing LangChain from Crew
44 |
45 | 01:07:51 The Year of Agents and Reasoning
46 |
47 | 01:18:43 Addressing Concerns About AI
48 |
49 | 01:24:31 Future of AI and Agents
50 |
51 | 01:28:46 Conclusion and Farewell
52 |
53 | ---
54 |
55 | Is 2025 "the year of AI agents"?
56 |
57 | AI agents, as I remember them as a concept, started for me a few months after I started ThursdAI, when AutoGPT exploded. It was such a novel idea at the time: run LLM requests in a loop.
58 |
59 | (In fact, back then, I came up with a retry-with-AI concept and called it [TrAI/Catch](https://x.com/altryne/status/1632253117827010566), where upon an error, I would feed that error back into the GPT API and ask it to correct itself. It feels so long ago!)
60 |
61 | AutoGPT became the fastest GitHub project ever to reach 100K stars, and while exciting, it did not work.
62 |
63 | Since then we've seen multiple attempts at agentic frameworks, like BabyAGI and AutoGen. CrewAI was one of them, and it keeps being a favorite among many folks.
64 |
65 | So, what is an AI agent? Simon Willison, friend of the pod, has a mission: to ask everyone who announces a new agent what they mean when [they say it](https://x.com/simonw/status/1863567881553977819), because it seems that everyone "shares" a common understanding of AI agents, but it's different for everyone.
66 |
67 | We'll start with João's explanation and go from there. But let's assume the basics: it's a set of LLM calls running in a self-correcting loop, with access to planning, external tools (via function calling), and a memory of sorts, making decisions along the way (see the sketch below).
68 |
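To make that baseline concrete, here's a minimal sketch of such a loop. `call_llm` and the tool registry are hypothetical stand-ins (not CrewAI's API), and real frameworks layer planning and richer memory on top of this skeleton; note how feeding tool errors back to the model mirrors the retry-with-AI idea above.

```python
import json

def call_llm(messages: list[dict]) -> dict:
    """Hypothetical chat call: returns {"content": ...} or {"tool": ..., "arguments": ...}."""
    raise NotImplementedError  # swap in your provider's chat / function-calling API

TOOLS = {"search": lambda query: f"results for {query!r}"}  # stand-in tool registry

def run_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]  # short-term memory: the transcript
    for _ in range(max_steps):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        if "tool" not in reply:
            return reply["content"]  # no tool requested: the agent is done
        try:
            result = TOOLS[reply["tool"]](reply["arguments"])
        except Exception as err:
            result = f"tool error: {err}"  # feed errors back so the loop self-corrects
        messages.append({"role": "tool", "content": str(result)})
    return "stopped: step budget exhausted"
```
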
69 | Though, as we go into detail, you'll see that since the very basic "run LLM in the loop" days, the agents in 2025 have evolved and have a lot of complexity.
70 |
71 | My takeaways from the conversation
72 |
73 | I encourage you to listen to / watch the whole interview; João is deeply knowledgeable about the field and we go into a lot of topics, but here are my main takeaways from our chat:
74 |
75 | * Enterprises are adopting agents, starting with internal use-cases
76 |
77 | * Crews have 4 different kinds of memory: long-term (across runs), short-term (each run), entity memory (company names, entities), and pre-existing knowledge (DNA?)
78 |
79 | * TIL about a "do all links respond with 200" guardrail
80 |
81 | * Some of the agent tools we mentioned
82 |
83 | * Stripe Agent API - for agent payments and access to payment data ([blog](https://stripe.dev/blog/adding-payments-to-your-agentic-workflows))
84 |
85 | * Okta Auth for Gen AI - agent authentication and role management ([blog](https://www.auth0.ai/))
86 |
87 | * E2B - code execution platform for agents ([e2b.dev](https://e2b.dev/))
88 |
89 | * BrowserBase - programmatic web-browser for your AI agent
90 |
91 | * Exa - search grounding for agents for real time understanding
92 |
93 | * Crew has 13 crews that run 24/7 to automate their company
94 |
95 | * Crews like Onboarding User Enrichment Crew, Meetings Prep, Taking Phone Calls, Generate Use Cases for Leads
96 |
97 | * GPT-4o mini was the most-used model of 2024 for CrewAI, with the main factors being speed / cost
98 |
99 | * Speed of AI development makes it hard to standardize and solidify common integrations.
100 |
101 | * Reasoning models like o1 still haven't seen a lot of success, partly due to speed, partly due to the different way of prompting they require.
102 |
103 | This week's Buzz
104 |
105 | We've just opened up pre-registration for our upcoming FREE evaluations course, featuring Paige Bailey from Google and Graham Neubig from All Hands AI (previously Open Devin). We've distilled a lot of what we learned about evaluating LLM applications while building [Weave](https://wandb.ai/site/weave?utm_source=thursdai&utm_medium=referral&utm_campaign=jan2), our LLM Observability and Evaluation tooling, and are excited to share this with you all! [Get on the list](https://wandb.ai/site/courses/evals/?utm_source=thursdai&utm_medium=referral&utm_campaign=jan2)
106 |
107 | Also, 2 workshops (also about Evals) from us are upcoming, one in SF on [Jan 11th](https://lu.ma/bzqvsqaa) and one in Seattle on [Jan 13th](https://seattle.aitinkerers.org/p/ai-in-production-evals-observability-workshop) (which I'm going to lead!) so if you're in those cities at those times, would love to see you!
108 |
109 | And that's it for this week; there wasn't a LOT of news, as I said. The interesting thing is, even in the very short week, the news we did get was all about agents and reasoning, so it looks like 2025 is agents and reasoning, agents and reasoning!
110 |
111 | See you all next week 🫡
112 |
113 | TL;DR with links:
114 |
115 | * **Open Source LLMs**
116 |
117 | * HuatuoGPT-o1 - medical LLM designed for medical reasoning ([HF](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-8B), [Paper](https://huggingface.co/papers/2412.18925), [Github](https://github.com/FreedomIntelligence/HuatuoGPT-o1), [Data](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-verifiable-problem))
118 |
119 | * Nomic - modernbert-embed-base - first embed model on top of modernbert ([HF](https://huggingface.co/nomic-ai/modernbert-embed-base))
120 |
121 | * HuggingFace - SmolAgents lib to build agents ([Blog](https://huggingface.co/blog/smolagents))
122 |
123 | * SmallThinker-3B-Preview - a QWEN 2.5 3B "reasoning" finetune ([HF](https://huggingface.co/PowerInfer/SmallThinker-3B-Preview))
124 |
125 | * Wolfram new Benchmarks including DeepSeek v3 ([X](https://x.com/WolframRvnwlf/status/1874889165919384057))
126 |
127 | * **Big CO LLMs + APIs**
128 |
129 | * Newcomer Rubik's AI Sonus-1 family - Mini, Air, Pro and Reasoning ([X](https://x.com/RubiksAI/status/1874682159379972325), Chat)
130 |
131 | * Microsoft "estimated" GPT-4o-mini is a ~8B model ([X](https://x.com/Yuchenj_UW/status/1874507299303379428))
132 |
133 | * Meta plans to bring AI profiles to their social networks ([X](https://x.com/petapixel/status/1874792802061844829))
134 |
135 | * **This Week's Buzz**
136 |
137 | * W&B Free Evals Course with Paige Bailey and Graham Neubig - [Free Sign Up](https://wandb.ai/site/courses/evals/?utm_source=thursdai&utm_medium=referral&utm_campaign=jan2)
138 |
139 | * SF evals event - [January 11th](https://lu.ma/bzqvsqaa)
140 |
141 | * Seattle evals workshop - [January 13th](https://seattle.aitinkerers.org/p/ai-in-production-evals-observability-workshop)
142 |
143 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-jan-2-is-25-the-year-of/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-jan-2-is-25-the-year-of?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE1NDAzMzY2MCwiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.-2FmmS8-Iq9rSNBzuyH2cjNrSPPkwegxbFjSP45EJLw&utm_campaign=CTA_5).
144 |
--------------------------------------------------------------------------------
/2025_episodes/Q2 2025/June 2025/_ThursdAI_-_Jun_26_-_Gemini_CLI_Flux_Kontext_Dev_Search_Live_Anthropic_destroys_books_Zucks_superint.md:
--------------------------------------------------------------------------------
1 | # 📅 ThursdAI - Jun 26 - Gemini CLI, Flux Kontext Dev, Search Live, Anthropic destroys books, Zucks superintelligent team & more AI news
2 |
3 | **Date:** June 26, 2025
4 | **Duration:** 1:39:39
5 | **Link:** [https://sub.thursdai.news/p/thursdai-jun-26-gemini-cli-flux-kontext](https://sub.thursdai.news/p/thursdai-jun-26-gemini-cli-flux-kontext)
6 |
7 | ---
8 |
9 | ## Description
10 |
11 | Hey folks, Alex here, writing from... an undisclosed tropical paradise location 🏝️ I'm on vacation, but the AI news doesn't stop, of course, and neither does ThursdAI. So a huge shoutout to Wolfram Ravenwlf for running the show this week, and to Nisten, LDJ and Yam, who joined.
12 |
13 | So... no long blogpost with analysis this week, but I definitely recommend tuning in to the show that the folks ran; they had a few guests on, and even got some breaking news (a new Flux Kontext that's open source).
14 |
15 | Of course many of you are readers and are here for the links, so I'm including the raw TL;DR + speaker notes as prepared by the folks for the show!
16 |
17 | P.S - our (rescheduled) hackathon, WeaveHacks, is coming up in San Francisco on July 12-13. If you're interested in a chance to win a RoboDog, you're welcome to join us and give it a try. Register [HERE](https://lu.ma/weavehacks)
18 |
19 | Ok, that's it for this week, please enjoy the show and see you next week!
20 |
21 | ThursdAI - June 26th, 2025 - TL;DR
22 |
23 | * **Hosts and Guests**
24 |
25 | * **WolframRvnwlf** - Host ([@WolframRvnwlf](http://x.com/WolframRvnwlf))
26 |
27 | * Co-Hosts - [@yampeleg](http://x.com/yampeleg), [@nisten](http://x.com/nisten), [@ldjconfirmed](http://x.com/ldjconfirmed)
28 |
29 | * Guest - **Jason Kneen** ([@jasonkneen](http://x.com/jasonkneen)) - Discussing MCPs, coding tools, and agents
30 |
31 | * Guest - **Hrishioa** ([@hrishioa](http://x.com/hrishioa)) - Discussing agentic coding and spec-driven development
32 |
33 | * **Open Source LLMs**
34 |
35 | * Mistral Small 3.2 released with improved instruction following, reduced repetition & better function calling ([X](https://x.com/MistralAI/status/1936093325116781016))
36 |
37 | * Unsloth AI releases dynamic GGUFs with fixed chat templates ([X](https://x.com/UnslothAI/status/1936426567850487925))
38 |
39 | * Kimi-VL-A3B-Thinking-2506 multimodal model updated for better video reasoning and higher resolution ([Blog](https://huggingface.co/blog/moonshotai/kimi-vl-a3b-thinking-2506))
40 |
41 | * Chinese Academy of Science releases Stream-Omni, a new Any-to-Any model for unified multimodal input ([HF](https://huggingface.co/ICTNLP/stream-omni-8b), [Paper](https://huggingface.co/papers/2506.13642))
42 |
43 | * Prime Intellect launches SYNTHETIC-2, an open reasoning dataset and synthetic data generation platform ([X](https://x.com/PrimeIntellect/status/1937272174295023951))
44 |
45 | * **Big CO LLMs + APIs**
46 |
47 | * **Google**
48 |
49 | * Gemini CLI, a new open-source AI agent, brings Gemini 2.5 Pro to your terminal ([Blog](https://web.archive.org/web/20250625051706/https://blog.google/technology/developers/introducing-gemini-cli/), [GitHub](https://github.com/google-gemini/gemini-cli))
50 |
51 | * Google reduces free tier API limits for previous generation Gemini Flash models ([X](https://x.com/ai_for_success/status/1937493142279971210))
52 |
53 | * Search Live with voice conversation is now rolling out in AI Mode in the US ([Blog](https://blog.google/products/search/search-live-ai-mode/), [X](https://x.com/rajanpatel/status/1935484294182608954))
54 |
55 | * Gemini API is now faster for video and PDF processing with improved caching ([Docs](https://ai.google.dev/gemini-api/docs/caching))
56 |
57 | * **Anthropic**
58 |
59 | * Claude introduces an "artifacts" space for building, hosting, and sharing AI-powered apps ([X](https://x.com/AnthropicAI/status/1937921801000219041))
60 |
61 | * Federal judge rules Anthropic's use of books for training Claude qualifies as fair use ([X](https://x.com/ai_for_success/status/1937515997076029449))
62 |
63 | * **xAI**
64 |
65 | * Elon Musk announces the successful launch of Tesla's Robotaxi ([X](https://x.com/elonmusk/status/1936876178356490546))
66 |
67 | * **Microsoft**
68 |
69 | * Introduces Mu, a new language model powering the agent in Windows Settings ([Blog](https://blogs.windows.com/windowsexperience/2025/06/23/introducing-mu-language-model-and-how-it-enabled-the-agent-in-windows-settings/))
70 |
71 | * **Meta**
72 |
73 | * Report: Meta pursued acquiring Ilya Sutskever's SSI, now hires co-founders Nat Friedman and Daniel Gross ([X](https://x.com/kimmonismus/status/1935954015998624181))
74 |
75 | * **OpenAI**
76 |
77 | * OpenAI removes mentions of its acquisition of Jony Ive's startup 'io' amid a trademark dispute ([X](https://x.com/rowancheung/status/1937414172322439439))
78 |
79 | * OpenAI announces the release of DeepResearch in API + Webhook support ([X](https://x.com/stevendcoffey/status/1938286946075418784))
80 |
81 | * **This week's Buzz**
82 |
83 | * Alex is on vacation; WolframRvnwlf is attending AI Tinkerers Munich on July 25 ([Event](https://munich.aitinkerers.org/p/ai-tinkerers-munich-july-25))
84 |
85 | * Join W&B Hackathon happening in 2 weeks in San Francisco - grand prize is a RoboDog! (Register [for Free](https://lu.ma/weavehacks))
86 |
87 | * **Vision & Video**
88 |
89 | * MeiGen-MultiTalk code and checkpoints for multi-person talking head generation are released ([GitHub](https://github.com/MeiGen-AI/MultiTalk), [HF](https://huggingface.co/MeiGen-AI/MeiGen-MultiTalk))
90 |
91 | * Google releases VideoPrism for generating adaptable video embeddings for various tasks ([HF](https://hf.co/google/videoprism), [Paper](https://arxiv.org/abs/2402.13217), [GitHub](https://github.com/google-deepmind/videoprism))
92 |
93 | * **Voice & Audio**
94 |
95 | * ElevenLabs launches [11.ai](http://11.ai), a voice-first personal assistant with MCP support ([Sign Up](http://11.ai/), [X](https://x.com/elevenlabsio/status/1937200086515097939))
96 |
97 | * Google Magenta releases Magenta RealTime, an open weights model for real-time music generation ([Colab](https://colab.research.google.com/github/magenta/magenta-realtime/blob/main/notebooks/Magenta_RT_Demo.ipynb), [Blog](https://g.co/magenta/rt))
98 |
99 | * ElevenLabs launches a mobile app for iOS and Android for on-the-go voice generation ([X](https://x.com/elevenlabsio/status/1937541389140611367))
100 |
101 | * **AI Art & Diffusion & 3D**
102 |
103 | * Google rolls out Imagen 4 and Imagen 4 Ultra in the Gemini API and Google AI Studio ([Blog](https://developers.googleblog.com/en/imagen-4-now-available-in-the-gemini-api-and-google-ai-studio/))
104 |
105 | * OmniGen 2 open weights model for enhanced image generation and editing is released ([Project Page](https://vectorspacelab.github.io/OmniGen2/), [Demo](https://huggingface.co/spaces/OmniGen2/OmniGen2), [Paper](https://huggingface.co/papers/2506.18871))
106 |
107 | * **Tools**
108 |
109 | * OpenMemory Chrome Extension provides shared memory across ChatGPT, Claude, Gemini and more ([X](https://x.com/taranjeetio/status/1937537163270451494))
110 |
111 | * LM Studio adds MCP support to connect local LLMs with your favorite servers ([Blog](https://lmstudio.ai/blog/mcp))
112 |
113 | * Cursor is now available as a Slack integration ([Dashboard](http://cursor.com/dashboard))
114 |
115 | * All Hands AI releases the OpenHands CLI, a model-agnostic, open-source coding agent ([Blog](https://all-hands.dev/blog/the-openhands-cli-ai-powered-development-in-your-terminal), [Docs](https://docs.all-hands.dev/usage/how-to/cli-mode#cli))
116 |
117 | * Warp 2.0 launches as an Agentic Development Environment with multi-threading ([X](https://x.com/warpdotdev/status/1937525185843752969))
118 |
119 | * **Studies and Others**
120 |
121 | * The /r/LocalLLaMA subreddit is back online after a brief moderation issue ([Reddit](https://www.reddit.com/r/LocalLLaMA/comments/1ljlr5b/subreddit_back_in_business/), [News](https://x.com/localllamasub))
122 |
123 | * Andrej Karpathy's talk "Software 3.0" discusses the future of programming in the age of AI ([YouTube](https://www.youtube.com/watch?v=LCEmiRjPEtQ), [Summary](https://www.latent.space/p/s3))
124 |
125 | Thank you, see you next week!
126 |
127 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-jun-26-gemini-cli-flux-kontext/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-jun-26-gemini-cli-flux-kontext?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE2NjkyNTYyOCwiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.rNrzJzOHwv_6WuWE0zOf7g9C0xIjVsHBeuiHWjLmawY&utm_campaign=CTA_5).
128 |
--------------------------------------------------------------------------------
/ThursdAI_News_Infographic_System_Prompt.md:
--------------------------------------------------------------------------------
1 | # ThursdAI News Infographic Generator
2 |
3 | ## You Are
4 |
5 | A world-class infographic designer creating stunning visual content for **ThursdAI**, a weekly AI news show hosted by Alex Volkov (@altryne).
6 |
7 | Your job: Take news information and create a **Nano Banana Pro prompt** that will generate a beautiful, unique infographic.
8 |
9 | ---
10 |
11 | ## What You Receive
12 |
13 | ```
14 | TITLE: [Headline]
15 | EXECUTIVE SUMMARY: [Overview]
16 | 10 FACTOIDS: [Key metrics, numbers, availability, etc.]
17 | ENRICHED SUMMARY: [Additional context]
18 | TOP REACTIONS: [Quotes from X/Twitter]
19 | LINKS: [Where to find it]
20 | DATE: [When this news dropped — may be omitted]
21 | ```
22 |
23 | Plus: A **reference image of Alex Volkov** (the host) to include in the design.
24 |
25 | ---
26 |
27 | ## Timeliness Matters
28 |
29 | These infographics are for **current news** — typically less than a week old. The design should feel:
30 | - **Fresh and new** — not recycled visual concepts
31 | - **Timely** — if a date is provided, display it prominently
32 | - **Relevant to right now** — visual elements that feel "just announced" not "retrospective"
33 |
34 | ---
35 |
36 | ## What You Create
37 |
38 | A detailed Nano Banana Pro prompt in natural language (full sentences, like briefing a designer) that will generate a **16:9 infographic** including:
39 |
40 | ### ⚠️ CRITICAL: This is NEWS, Not a Poster
41 |
42 | **The person viewing this infographic should be able to understand the FULL story without reading anything else.**
43 |
44 | This is not a teaser or marketing graphic — it's a comprehensive news summary. Think of it like a visual article, not an advertisement.
45 |
46 | - **Include the important factoids** — not just 2-3 highlights, but the meaty details that matter. Use your judgment: impressive benchmarks, pricing, key metrics, availability — yes. Boilerplate or filler details — skip them.
47 | - **Text should be readable** — real sentences, real data, real context
48 | - **Information density is good** — pack in what's newsworthy, organize it well
49 | - **Someone should walk away informed** — not just intrigued
50 |
51 | ### Required Elements
52 |
53 | 1. **ThursdAI Branding (PROMINENT)** — This is a ThursdAI news presentation. "ThursdAI" should appear prominently in the header area, not buried in a footer. Make it clear this news is being presented by ThursdAI. The footer can include additional links (thursdai.news, @altryne, @thursdAI_news, thursdai.news/yt) but the main brand should be visible up top.
54 |
55 | 2. **Alex Volkov** — Using the reference image, rendered as a stylized cartoon/vector avatar. He should be presenting, reacting to, or engaging with the news. His expression and pose should match the tone of the story.
56 |
57 | 3. **The Date** — Prominently displayed. This is timely news, and viewers should immediately know when it dropped. Format like "June 12, 2025" or "Dec 12, 2025" — make it visible in the header or as a clean badge.
58 |
59 | 4. **The News Content (COMPREHENSIVE)** — This is the core:
60 | - **Headline** — Clear, prominent title
61 | - **Executive summary** — The key narrative in readable text
62 | - **The important factoids** — The metrics, benchmarks, pricing, and details that actually matter. Display them in panels, cards, bullet lists, ticker bars — whatever works. Use judgment: if a factoid is impressive or essential to understanding the story, include it. If it's filler, skip it. Aim for 6-10 meaningful data points visible.
63 | - **Key quotes/reactions** — If notable quotes are provided, include at least one prominently
64 |
65 | 5. **Relevant Visual Elements** — Based on what the news is actually about, include thematic visual elements that reinforce the story:
66 | - Open source model release? Binary cascades, weight tensors, loss curves, unlocked padlocks, git trees
67 | - Voice/TTS announcement? Spectrograms, waveforms, speaking avatars
68 | - Image generation model? Brushstrokes, canvas, robot artist
69 | - Video model? Film reels, motion blur, frame sequences
70 | - Benchmark domination? Leaderboards, medals, trophy podiums
71 | - Agent/tool release? Terminal windows, connected nodes
72 | - Research/data report? Charts, graphs, data flows, dashboard elements
73 |
74 | **Think about what visuals represent THIS specific story** — make the infographic feel alive and relevant, not generic.
75 |
76 | 6. **Company Logos** — Use the ACTUAL logos of the companies involved (OpenAI, Google Gemini, Anthropic, Meta, Mistral, HuggingFace, etc.). These are well-known.
77 |
78 | 7. **Footer Links** — Include at bottom:
79 | - thursdai.news
80 | - @altryne (X/Twitter)
81 | - @thursdAI_news (X/Twitter)
82 | - thursdai.news/yt (YouTube)
83 |
84 | ### Style Direction
85 |
86 | **Be creative AND comprehensive.** Consider:
87 | - What's the emotional tone of this news? (Exciting breakthrough? Controversial? Data-heavy analysis? Breaking news urgency?)
88 | - What visual metaphor would capture this story?
89 | - What color palette fits the company and mood?
90 | - **What layout can fit ALL this information?** (Data dashboard? News broadcast with ticker? Multi-panel layout? Research poster?)
91 |
92 | **The layout must accommodate substantial information.** If there are 8 newsworthy data points, design for 8. Use:
93 | - Multiple panels/cards for different stat categories
94 | - Ticker bars for secondary stats
95 | - Bullet lists within panels
96 | - Hierarchical text (big headlines, smaller supporting details)
97 | - Quote callouts for reactions
98 |
99 | The goal is that someone scrolling social media stops and says "whoa, what's this?" — AND when they look closer, they get the full story.
100 |
101 | ---
102 |
103 | ## Writing the Prompt
104 |
105 | Write in **natural language**, like you're briefing a talented designer:
106 |
107 | ✅ Good: "Create a dramatic infographic that feels like a breaking news broadcast. The background should pulse with urgency — think red alert lighting mixed with the cool blue of DeepSeek's brand. Alex is in the corner looking genuinely shocked, pointing at the headline..."
108 |
109 | ❌ Bad: "AI, infographic, 4k, neon, tech, modern, trending"
110 |
111 | Be specific about:
112 | - The overall vibe and emotional tone
113 | - Color palette (use hex codes for precision)
114 | - Where Alex should be and how he should look
115 | - What visual elements reinforce the story
116 | - How the information should be laid out
117 | - What should be biggest/most prominent
118 |
119 | Be loose about:
120 | - Exact pixel positions
121 | - Rigid grid structures
122 | - Formulaic layouts
123 |
124 | ---
125 |
126 | ## Google Search for Additional Context
127 |
128 | Nano Banana Pro can search the web for additional information. Use this strategically:
129 |
130 | **When to search:** If the news involves specific benchmark numbers, company details, or technical specs that would benefit from verification or additional context, add "Search the web for [specific query]" to your prompt.
131 |
132 | **Caveat:** Often this news is very fresh (hours or 1-2 days old) and may not have propagated to Google yet. Don't rely on search for the core facts — the provided information is the source of truth. Use search for supplementary context like:
133 | - Company background
134 | - Related previous announcements
135 | - Technical terminology clarification
136 | - Logo/branding references
137 |
138 | **Example usage in prompt:** "Search the web for the Google Gemini logo and official branding colors to ensure accuracy."
139 |
140 | ---
141 |
142 | ## Defaults (Don't Ask, Just Do)
143 |
144 | This is automated — use these defaults and proceed:
145 |
146 | - **Aspect ratio:** 16:9 (landscape, for YouTube/social)
147 | - **Resolution:** 4K (3840×2160)
148 | - **Date:** If not provided in the input, use "Recent" or omit — don't ask
149 | - **Quote to highlight:** Pick the most insightful reaction if multiple are provided
150 | - **Emphasis:** Lead with the most impressive/newsworthy angle
151 |
152 | ---
153 |
154 | ## Output Format
155 |
156 | ```markdown
157 | # Infographic Prompt: [TITLE]
158 |
159 | **Date:** [DATE if provided, otherwise omit this line]
160 | **Aspect Ratio:** 16:9
161 | **Resolution:** 4K (3840×2160)
162 |
163 | ---
164 |
165 | [Your complete Nano Banana Pro prompt here — natural language, detailed, creative, specific to this news story]
166 |
167 | ---
168 |
169 | **Note:** This prompt uses the attached reference image of Alex Volkov for the host avatar.
170 | ```
171 |
172 | ---
173 |
174 | ## Quality Check
175 |
176 | Before outputting, verify your prompt covers:
177 | - ✓ What happened
178 | - ✓ Who's involved
179 | - ✓ The key numbers and metrics
180 | - ✓ Why it matters
181 |
182 | If any are missing — add more detail to your prompt.
183 |
184 | ---
185 |
186 | ## Remember
187 |
188 | - **This is NEWS, not a teaser** — Include the important details, not just 2-3 highlights
189 | - **ThursdAI is the presenter** — Prominent branding in header, not just footer
190 | - **Include date if provided** — This is current news
191 | - **Use judgment on factoids** — Include what's newsworthy, skip the filler
192 | - **Real logos** — Use actual company logos, they're well-known
193 | - **Contextual visuals** — The imagery should reflect what the news is actually about
194 | - **Alex is the host** — He's presenting this news, make him part of the story
195 | - **Information-dense AND beautiful** — Pack in ALL the facts, organize them elegantly
196 | - **Natural language prompts** — Full sentences, like talking to a designer
197 | - **Search when helpful** — Use "Search the web for..." for logos, branding, or supplementary context
198 |
199 | ---
200 |
201 | *Now give me the news and let's make something incredible.*
202 |
--------------------------------------------------------------------------------
/2025_episodes/Q1 2025/March 2025/ThursdAI_-_Mar_6_2025_-_Alibabas_R1_Killer_QwQ_Exclusive_Google_AI_Mode_Chat_and_MCP_fever_sweeping_.md:
--------------------------------------------------------------------------------
1 | # ThursdAI - Mar 6, 2025 - Alibaba's R1 Killer QwQ, Exclusive Google AI Mode Chat, and MCP fever sweeping the community!
2 |
3 | **Date:** March 06, 2025
4 | **Duration:** 1:50:59
5 | **Link:** [https://sub.thursdai.news/p/thursdai-mar-6-2025-alibabas-r1-killer](https://sub.thursdai.news/p/thursdai-mar-6-2025-alibabas-r1-killer)
6 |
7 | ---
8 |
9 | ## Description
10 |
11 | What is UP folks! Alex here from Weights & Biases (yeah, still, but check this week's buzz section below for some news!)
12 |
13 | I really, really enjoyed today's episode; I feel like I could post it unedited, it was so, so good. We started the show with our good friend Junyang Lin from Alibaba Qwen, who told us about their new 32B reasoner QwQ. Then we interviewed Robby Stein, Google's VP of Product for Search, who came and told us about the upcoming AI Mode in Google! I got access and played with it, and it made me switch back from PPXL as my main.
14 |
15 | And lastly, I recently became fully MCP-pilled. Since we covered it when it came out over Thanksgiving, I saw this acronym everywhere on my timeline but only recently "got it", and so I wanted to have an MCP deep dive, and boy... did I get what I wished for! You absolutely should tune in to the show as there's no way for me to cover everything we covered about MCP with Dina and Jason! OK, without further ado... let's dive in (the TL;DR, links and show notes are at the end as always!)
16 |
17 | ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
18 |
19 | 🤯 Alibaba's QwQ-32B: Small But Shocking Everyone!
20 |
21 | The open-source LLM segment started strong, chatting with friend of the show Junyang Justin Lin from Alibaba’s esteemed Qwen team. They've cooked up something quite special: QwQ-32B, a reasoning-focused, reinforcement-learning-boosted beast that punches remarkably above its weight. We're talking about a mere 32B parameters model holding its ground on tough evaluations against DeepSeek R1, a 671B behemoth!
22 |
23 | Here’s how wild this is: You can literally run QwQ on your Mac! Junyang shared that they applied two solid rounds of RL to amp its reasoning, coding, and math capabilities, integrating agents into the model to fully unlock its abilities. When I called out how insane it was that we’ve gone from "LLMs can't do math" to basically acing competitive math benchmarks like AIME24, Junyang calmly hinted that they're already aiming for unified thinking/non-thinking models. Sounds wild, doesn’t it?
24 |
25 | Check out the full QwQ release [here](https://huggingface.co/Qwen/QwQ-32B), or dive into their [blog post](https://qwenlm.github.io/blog/qwq-32b/).
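If you want to kick the tires yourself, here's a minimal local-inference sketch using Hugging Face transformers; the repo id comes from the release links above, the prompt and token budget are illustrative, and you'll want a machine with enough memory for a 32B model (or one of the quantized community builds on the Hub).

```python
# Minimal sketch: chat with Qwen/QwQ-32B via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models think out loud before answering, so leave headroom.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```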
26 |
27 | 🚀 Google Launches AI Mode: Search Goes Next-Level ([X](https://www.google.com/url?sa=E&q=https%3A%2F%2Fx.com%2Faltryne%2Fstatus%2F1897381479459811368), [Blog](https://www.google.com/url?sa=E&q=https%3A%2F%2Fblog.google%2Fproducts%2Fsearch%2Fai-mode-search%2F), [My Live Reaction](https://www.google.com/url?sa=E&q=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D5QTveQpq1WI%26feature%3Dyoutu.be)).
28 |
29 | For the past two years, on this very show, we've been asking, "Where's Google?" in the Gen AI race. Well, folks, they're *back*. And they're back in a *big* way.
30 |
31 | Next, we were thrilled to have Google’s own Robby Stein, VP of Product for Google Search, drop by ThursdAI after their massive launch of AI Mode and expanded AI Overviews leveraging Gemini 2.0. Robby walked us through this massive shift, which essentially brings advanced conversational AI capabilities straight into Google. Seriously — Gemini 2.0 is now out here doing complex reasoning while performing fan-out queries behind the scenes in Google's infrastructure.
32 |
33 | Google search is literally Googling itself. No joke. "We actually have the model generating fan-out queries — Google searches within searches — to collect accurate, fresh, and verified data," explained Robby during our chat. And I gotta admit, after playing with AI Mode, Google is definitely back in the game—real-time restaurant closures, stock analyses, product comparisons, and it’s conversational to boot. You can check my blind reaction first impression video [here](https://www.youtube.com/watch?v=5QTveQpq1WI). (also, while you're there, why not subscribe to my YT?)
34 |
35 | Google has some huge plans, but right now AI Mode is rolling out slowly via Google Labs for Google One AI Premium subscribers first. More soon though!
36 |
37 | 🐝 This Week's Buzz: Weights & Biases Joins CoreWeave Family!
38 |
39 | Huge buzz (in every sense of the word) from Weights & Biases, who made waves with their announcement this week: We've joined forces with CoreWeave! Yeah, that's big news as CoreWeave, the AI hyperscaler known for delivering critical AI infrastructure, has now acquired Weights & Biases to build the ultimate end-to-end AI platform. It's early days of this exciting journey, and more details are emerging, but safe to say: the future of Weights & Biases just got even more exciting. Congrats to the whole team at Weights & Biases and our new colleagues at CoreWeave!
40 |
41 | We're committed to all users of WandB, so you'll be able to keep using Weights & Biases, and we'll continuously improve our offerings going forward! On a personal note, nothing changes for ThursdAI either! 🎉
42 |
43 | MCP Takes Over: Giving AI agents super powers via standardized protocol
44 |
45 | Then things got insanely exciting. Why? Because MCP is blowing up and I had to find out why everyone's timeline (mine included) just got invaded.
46 |
47 | Welcoming Cloudflare’s amazing product manager Dina Kozlov and Jason Kneen—MCP master and creator—things quickly got mind-blowing. MCP servers, Jason explained, are essentially tool wrappers that effortlessly empower agents with capabilities like API access and even calling other LLMs—completely seamlessly and securely. According to Jason, "we haven't even touched the surface yet of what MCP can do—these things are Lego bricks ready to form swarms and even self-evolve."
48 |
49 | Dina broke down just how easy it is to launch MCP servers on Cloudflare Workers while teasing exciting upcoming enhancements. Both Dina and Jason shared jaw-dropping examples, including composing complex workflows connecting Git, Jira, Gmail, and even smart home controls—practically instantaneously! Seriously, my mind is still spinning.
50 |
51 | The MCP train is picking up steam, and something tells me we'll be talking about this revolutionary agent technology a lot more soon. Check out a few great MCP directories that popped up recently: [Smithery](https://smithery.ai/), [Cursor Directory](https://cursor.directory/mcp) and [Composio](https://mcp.composio.dev/).
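To see why Jason calls these "Lego bricks", here's a minimal MCP server sketch using the official Python SDK; the server name and toy tool are mine, purely for illustration, not something from the show.

```python
# Minimal MCP server sketch (official Python SDK: pip install "mcp[cli]").
# The server name and the toy `add` tool are illustrative placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    # Speaks MCP over stdio, so any MCP-capable client (Claude Desktop,
    # Cursor, etc.) can discover and call the tool automatically.
    mcp.run()
```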
52 |
53 | This show was one of the best ones we recorded; honestly, I barely needed to edit it. It was also a really, really fun livestream, so if you prefer seeing to listening, here's the lightly edited live stream.
54 |
55 | Thank you for being a ThursdAI subscriber! As always, here's the TL;DR and show notes for everything that happened in AI this week, the things we mentioned, and the hosts we had.
56 |
57 | ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
58 |
59 | TL;DR and Show Notes
60 |
61 | * **Show Notes & Guests**
62 |
63 | * **Alex Volkov** - AI Evangelist & Weights & Biases ([@altryne](https://x.com/altryne))
64 |
65 | * **Co-Hosts** - [@WolframRvnwlf](https://x.com/WolframRvnwlf) [@ldjconfirmed](https://x.com/ldjconfirmed) [@nisten](https://x.com/nisten)
66 |
67 | * **Junyang Justin Lin** - Head of Qwen Team, Alibaba - [@JustinLin610](https://x.com/JustinLin610)
68 |
69 | * **Robby Stein** - VP of Product, Google Search - [@rmstein](https://x.com/rmstein/status/1897417750622216574)
70 |
71 | * **Dina Kozlov** - Product Manager, Cloudflare - [@dinasaur_404](https://x.com/dinasaur_404)
72 |
73 | * **Jason Kneen** - MCP Wiz - [@jasonkneen](https://x.com/jasonkneen)
74 |
75 | * My Google AI Mode Blind Reaction Video ([Youtube](https://www.youtube.com/watch?v=5QTveQpq1WI))
76 |
77 | * Sesame Maya Conversation Demo - ([Youtube](https://www.youtube.com/watch?v=pI_WARqK_X4&t=1s))
78 |
79 | * Cloudflare MCP docs ([Blog](https://blog.cloudflare.com/model-context-protocol/))
80 |
81 | * Weights & Biases Agents Course Pre-signup - [https://wandb.me/agents](https://wandb.me/agents)
82 |
83 | * **Open Source LLMs**
84 |
85 | * Qwen's latest reasoning model **QwQ-32B** - matches R1 on some evals ([X](https://x.com/Alibaba_Qwen/status/1897361654763151544), [Blog](https://qwenlm.github.io/blog/qwq-32b/), [HF](https://huggingface.co/Qwen/QwQ-32B), [Chat](https://huggingface.co/spaces/Qwen/QwQ-32B-Demo))
86 |
87 | * Cohere4ai - Aya Vision - 8B & 32B ([X](https://x.com/CohereForAI/status/1896923657470886234), [HF](https://huggingface.co/collections/CohereForAI/c4ai-aya-vision-67c4ccd395ca064308ee1484?ref=cohere-ai.ghost.io))
88 |
89 | * AI21 - Jamba 1.6 Large & Jamba 1.6 Mini ([X](https://x.com/AI21Labs/status/1897657953261601151), [HF](https://huggingface.co/ai21labs/AI21-Jamba-Large-1.6))
90 |
91 | * **Big CO LLMs + APIs**
92 |
93 | * Google announces AI Mode & AI Overviews Gemini 2.0 ([X](https://x.com/altryne/status/1897381479459811368), [Blog](https://blog.google/products/search/ai-mode-search/), [My Live Reaction](https://www.youtube.com/watch?v=5QTveQpq1WI&feature=youtu.be))
94 |
95 | * OpenAI rolls out GPT 4.5 to plus users - #1 on LM Arena 🔥 ([X](https://x.com/lmarena_ai/status/1896590146465579105))
96 |
97 | * Grok Voice is available for free users as well ([X](https://x.com/ebbyamir/status/1897118801231249818))
98 |
99 | * Elysian Labs launches Auren ios app ([X](https://x.com/nearcyan/status/1897466463314936034), [App Store](https://auren.app))
100 |
101 | * Mistral announces SOTA OCR ([Blog](https://mistral.ai/news/mistral-ocr))
102 |
103 | * **This weeks Buzz**
104 |
105 | * Weights & Biases is acquired by CoreWeave 🎉 ([Blog](https://wandb.ai/wandb/wb-announcements/reports/W-B-being-acquired-by-CoreWeave--VmlldzoxMTY0MDI1MQ))
106 |
107 | * **Vision & Video**
108 |
109 | * Tencent HYVideo img2vid is finally here ([X](https://x.com/TXhunyuan/status/1897558826519556325), [HF](https://huggingface.co/tencent/HunyuanVideo-I2V), [Try It](https://video.hunyuan.tencent.com/))
110 |
111 | * **Voice & Audio**
112 |
113 | * NotaGen - symbolic music generation model **high-quality classical sheet music** [Github](https://github.com/ElectricAlexis/NotaGen), [Demo](https://electricalexis.github.io/notagen-demo/), [HF](https://huggingface.co/ElectricAlexis/NotaGen)
114 |
115 | * Sesame takes the world by storm with their amazing voice model ([My Reaction](https://www.youtube.com/watch?v=pI_WARqK_X4&t=1s))
116 |
117 | * **AI Art & Diffusion & 3D**
118 |
119 | * MiniMax__AI - Image-01: A Versatile Text-to-Image Model at 1/10 the Cost ([X](https://x.com/MiniMax__AI/status/1896475931809817015), [Try it](https://t.co/ATyAN03H1F))
120 |
121 | * Zhipu AI - CogView 4 6B - ([X](https://x.com/ChatGLM/status/1896824917880148450), [Github](https://t.co/O8btwDugWI))
122 |
123 | * **Tools**
124 |
125 | * Google - Data Science agent in Google Colab ([Blog](https://developers.googleblog.com/en/data-science-agent-in-colab-with-gemini/))
126 |
127 | * Baidu Miaoda - no-code AI build tool
128 |
129 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-mar-6-2025-alibabas-r1-killer/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-mar-6-2025-alibabas-r1-killer?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE1ODU0NzU0NiwiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.M-voYlDuwrNaChuTN_344BQU7iE_0xVmc53La1lJzJQ&utm_campaign=CTA_5).
130 |
--------------------------------------------------------------------------------
/parse_rss.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | """
3 | Parse ThursdAI RSS feed and organize 2025 episodes into folders by quarter and month.
4 |
5 | Creates folder structure:
6 | - Q1 2025/
7 | - January 2025/
8 | - episode_name.md
9 | - ...
10 | - January_2025_combined.md
11 | - February 2025/
12 | - ...
13 | - Q1_2025_combined.md
14 | - Q2 2025/
15 | - ...
16 | etc.
17 | """
18 |
19 | import xml.etree.ElementTree as ET
20 | from datetime import datetime
21 | from pathlib import Path
22 | from html import unescape
23 | import re
24 | from collections import defaultdict
25 |
26 |
27 | def parse_rss(file_path: str) -> list[dict]:
28 | """Parse the RSS file and return a list of episode dictionaries."""
29 | tree = ET.parse(file_path)
30 | root = tree.getroot()
31 |
32 | # Define namespaces used in the RSS
33 | namespaces = {
34 | 'itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd',
35 | 'dc': 'http://purl.org/dc/elements/1.1/',
36 | 'content': 'http://purl.org/rss/1.0/modules/content/',
37 | }
38 |
39 | episodes = []
40 |
41 | # Find all items in the channel
42 | channel = root.find('channel')
43 | if channel is None:
44 | print("No channel found in RSS")
45 | return episodes
46 |
47 | for item in channel.findall('item'):
48 | episode = {}
49 |
50 | # Extract title
51 | title_elem = item.find('title')
52 | if title_elem is not None and title_elem.text:
53 | episode['title'] = clean_cdata(title_elem.text)
54 | else:
55 | episode['title'] = 'Untitled Episode'
56 |
57 | # Extract publication date
58 | pub_date_elem = item.find('pubDate')
59 | if pub_date_elem is not None and pub_date_elem.text:
60 | episode['pub_date_raw'] = pub_date_elem.text
61 | episode['pub_date'] = parse_date(pub_date_elem.text)
62 | else:
63 | continue # Skip items without dates
64 |
65 | # Extract description
66 | desc_elem = item.find('description')
67 | if desc_elem is not None and desc_elem.text:
68 | episode['description'] = clean_cdata(desc_elem.text)
69 | else:
70 | episode['description'] = ''
71 |
72 | # Extract link
73 | link_elem = item.find('link')
74 | if link_elem is not None and link_elem.text:
75 | episode['link'] = link_elem.text
76 | else:
77 | episode['link'] = ''
78 |
79 | # Extract creator
80 | creator_elem = item.find('dc:creator', namespaces)
81 | if creator_elem is not None and creator_elem.text:
82 | episode['creator'] = clean_cdata(creator_elem.text)
83 | else:
84 | episode['creator'] = ''
85 |
86 | # Extract duration
87 | duration_elem = item.find('itunes:duration', namespaces)
88 | if duration_elem is not None and duration_elem.text:
89 | episode['duration'] = format_duration(duration_elem.text)
90 | else:
91 | episode['duration'] = ''
92 |
93 | # Extract audio URL
94 | enclosure_elem = item.find('enclosure')
95 | if enclosure_elem is not None:
96 | episode['audio_url'] = enclosure_elem.get('url', '')
97 | else:
98 | episode['audio_url'] = ''
99 |
100 | # Extract image URL
101 | image_elem = item.find('itunes:image', namespaces)
102 | if image_elem is not None:
103 | episode['image_url'] = image_elem.get('href', '')
104 | else:
105 | episode['image_url'] = ''
106 |
107 | episodes.append(episode)
108 |
109 | return episodes
110 |
111 |
112 | def clean_cdata(text: str) -> str:
113 | """Clean CDATA wrapper and unescape HTML entities."""
114 | if text is None:
115 | return ''
116 | # Remove CDATA wrapper if present
117 | text = text.strip()
118 |     if text.startswith('<![CDATA['):
119 |         text = text[9:]  # len('<![CDATA[') == 9
120 |     if text.endswith(']]>'):
121 |         text = text[:-3]
122 | return unescape(text).strip()
123 |
124 |
125 | def parse_date(date_str: str) -> datetime:
126 | """Parse RFC 2822 date format used in RSS feeds."""
127 | # Example: 'Fri, 05 Dec 2025 01:03:51 GMT'
128 | try:
129 | return datetime.strptime(date_str, '%a, %d %b %Y %H:%M:%S %Z')
130 | except ValueError:
131 | # Try without timezone
132 | try:
133 | return datetime.strptime(date_str[:25], '%a, %d %b %Y %H:%M:%S')
134 | except ValueError:
135 | return datetime.now()
136 |
137 |
138 | def format_duration(duration_str: str) -> str:
139 | """Format duration from seconds to HH:MM:SS."""
140 | try:
141 | seconds = int(duration_str)
142 | hours = seconds // 3600
143 | minutes = (seconds % 3600) // 60
144 | secs = seconds % 60
145 | if hours > 0:
146 | return f"{hours}:{minutes:02d}:{secs:02d}"
147 | return f"{minutes}:{secs:02d}"
148 | except ValueError:
149 | return duration_str
150 |
151 |
152 | def get_quarter(month: int) -> int:
153 | """Get quarter number from month number."""
154 | return (month - 1) // 3 + 1
155 |
156 |
157 | def sanitize_filename(title: str) -> str:
158 | """Create a safe filename from a title."""
159 | # Remove emojis and special characters
160 | title = re.sub(r'[^\w\s\-]', '', title)
161 | # Replace spaces with underscores
162 | title = re.sub(r'\s+', '_', title)
163 | # Limit length
164 | return title[:100]
165 |
166 |
167 | def html_to_markdown(html_content: str) -> str:
168 | """Convert HTML content to markdown (basic conversion)."""
169 | text = html_content
170 |
171 |     # Replace common HTML tags with their markdown equivalents
172 |     text = re.sub(r'<br\s*/?>', '\n', text)
173 |     text = re.sub(r'</p>', '\n\n', text)
174 |     text = re.sub(r'<p[^>]*>', '', text)
175 |     text = re.sub(r'<strong>', '**', text)
176 |     text = re.sub(r'</strong>', '**', text)
177 |     text = re.sub(r'<em>', '*', text)
178 |     text = re.sub(r'</em>', '*', text)
179 |     text = re.sub(r'<a [^>]*href="([^"]*)"[^>]*>([^<]*)</a>', r'[\2](\1)', text)
180 |     text = re.sub(r'</?ul>', '\n', text)
181 |     text = re.sub(r'<li>', '\n* ', text)
182 |     text = re.sub(r'</li>', '', text)
183 |     text = re.sub(r'<h\d[^>]*>', '\n## ', text)
184 |     text = re.sub(r'</h\d>', '\n', text)
185 |
186 | # Remove any remaining HTML tags
187 | text = re.sub(r'<[^>]+>', '', text)
188 |
189 | # Clean up multiple newlines
190 | text = re.sub(r'\n{3,}', '\n\n', text)
191 |
192 | return text.strip()
193 |
194 |
195 | def create_episode_markdown(episode: dict) -> str:
196 | """Create markdown content for a single episode."""
197 | date_str = episode['pub_date'].strftime('%B %d, %Y')
198 |
199 | content = f"""# {episode['title']}
200 |
201 | **Date:** {date_str}
202 | **Duration:** {episode['duration']}
203 | **Link:** [{episode['link']}]({episode['link']})
204 |
205 | ---
206 |
207 | ## Description
208 |
209 | {html_to_markdown(episode['description'])}
210 | """
211 | return content
212 |
213 |
214 | def create_combined_markdown(episodes: list[dict], period_name: str) -> str:
215 | """Create combined markdown for multiple episodes."""
216 | content = f"""# ThursdAI Episodes - {period_name}
217 |
218 | Total Episodes: {len(episodes)}
219 |
220 | ---
221 |
222 | """
223 | # Sort episodes by date (newest first)
224 | sorted_episodes = sorted(episodes, key=lambda x: x['pub_date'], reverse=True)
225 |
226 | for episode in sorted_episodes:
227 | date_str = episode['pub_date'].strftime('%B %d, %Y')
228 | content += f"""## {episode['title']}
229 |
230 | **Date:** {date_str}
231 | **Duration:** {episode['duration']}
232 | **Link:** [{episode['link']}]({episode['link']})
233 |
234 | {html_to_markdown(episode['description'])}
235 |
236 | ---
237 |
238 | """
239 | return content
240 |
241 |
242 | def main():
243 | """Main function to parse RSS and create folder structure."""
244 | script_dir = Path(__file__).parent
245 | rss_file = script_dir / 'all_thursdai.rss'
246 |
247 | if not rss_file.exists():
248 | print(f"RSS file not found: {rss_file}")
249 | return
250 |
251 | print(f"Parsing RSS file: {rss_file}")
252 | episodes = parse_rss(str(rss_file))
253 | print(f"Found {len(episodes)} total episodes")
254 |
255 | # Filter for 2025 episodes only
256 | episodes_2025 = [ep for ep in episodes if ep['pub_date'].year == 2025]
257 | print(f"Found {len(episodes_2025)} episodes from 2025")
258 |
259 | if not episodes_2025:
260 | print("No episodes found for 2025!")
261 | return
262 |
263 | # Organize by quarter and month
264 | quarters = defaultdict(lambda: defaultdict(list))
265 |
266 | for episode in episodes_2025:
267 | month = episode['pub_date'].month
268 | quarter = get_quarter(month)
269 | month_name = episode['pub_date'].strftime('%B')
270 |
271 | quarters[quarter][month_name].append(episode)
272 |
273 | # Create folder structure and files
274 | output_dir = script_dir / '2025_episodes'
275 | output_dir.mkdir(exist_ok=True)
276 |
277 | for quarter_num in sorted(quarters.keys()):
278 | quarter_name = f"Q{quarter_num} 2025"
279 | quarter_dir = output_dir / quarter_name
280 | quarter_dir.mkdir(exist_ok=True)
281 |
282 | quarter_episodes = []
283 |
284 | for month_name in sorted(quarters[quarter_num].keys(),
285 | key=lambda x: datetime.strptime(x, '%B').month):
286 | month_full_name = f"{month_name} 2025"
287 | month_dir = quarter_dir / month_full_name
288 | month_dir.mkdir(exist_ok=True)
289 |
290 | month_episodes = quarters[quarter_num][month_name]
291 | quarter_episodes.extend(month_episodes)
292 |
293 | # Create individual episode files
294 | for episode in month_episodes:
295 | filename = sanitize_filename(episode['title']) + '.md'
296 | filepath = month_dir / filename
297 |
298 | content = create_episode_markdown(episode)
299 | filepath.write_text(content, encoding='utf-8')
300 | print(f" Created: {filepath.relative_to(script_dir)}")
301 |
302 | # Create combined file for the month
303 | combined_filename = f"{month_name}_2025_combined.md"
304 | combined_path = quarter_dir / combined_filename
305 | combined_content = create_combined_markdown(month_episodes, month_full_name)
306 | combined_path.write_text(combined_content, encoding='utf-8')
307 | print(f"Created monthly combined: {combined_path.relative_to(script_dir)}")
308 |
309 | # Create combined file for the quarter
310 | quarter_combined_filename = f"Q{quarter_num}_2025_combined.md"
311 | quarter_combined_path = quarter_dir / quarter_combined_filename
312 | quarter_combined_content = create_combined_markdown(quarter_episodes, quarter_name)
313 | quarter_combined_path.write_text(quarter_combined_content, encoding='utf-8')
314 | print(f"Created quarterly combined: {quarter_combined_path.relative_to(script_dir)}")
315 |
316 | print(f"\nDone! Output written to: {output_dir}")
317 | print("\nFolder structure:")
318 | print_tree(output_dir, script_dir)
319 |
320 |
321 | def print_tree(path: Path, base: Path, prefix: str = ""):
322 | """Print a tree structure of the directory."""
323 | entries = sorted(path.iterdir(), key=lambda x: (not x.is_dir(), x.name))
324 |
325 | for i, entry in enumerate(entries):
326 | is_last = i == len(entries) - 1
327 | current_prefix = "└── " if is_last else "├── "
328 | print(f"{prefix}{current_prefix}{entry.name}")
329 |
330 | if entry.is_dir():
331 | next_prefix = prefix + (" " if is_last else "│ ")
332 | print_tree(entry, base, next_prefix)
333 |
334 |
335 | if __name__ == '__main__':
336 | main()
337 |
--------------------------------------------------------------------------------
/2025_episodes/Q3 2025/August 2025/_ThursdAI_-_GPT5_is_here.md:
--------------------------------------------------------------------------------
1 | # 📅 ThursdAI - GPT5 is here
2 |
3 | **Date:** August 07, 2025
4 | **Duration:** 2:56:19
5 | **Link:** [https://sub.thursdai.news/p/thursdai-gpt5-is-here](https://sub.thursdai.news/p/thursdai-gpt5-is-here)
6 |
7 | ---
8 |
9 | ## Description
10 |
11 | Hey folks 👋 Alex here, writing to you from a makeshift recording studio in an Eastern European hookah bar, where I spent the last 7 hours. Why, you ask? Well, when GPT-5 drops the same week as OpenAI shipping the long-awaited OSS models, plus Google releasing perfect-memory world models (Genie 3) and tons of other AI drops, I just couldn't stay away from the stream.
12 |
13 | Vacation or not, ThursdAI is keeping you up to date (for 32 months straight, which is also the time since the original GPT-4 release which gave this show its name!)
14 |
15 | So, what did we have today on the stream? Well, we started as usual, talking about the AI releases of the week, as if OpenAI dropping OSS models (Apache 2) at 120B and 20B is "usual". We then covered incredible releases like Google's world model Genie 3 (more on this next week!) and Qwen-Image + a few small Qwens.
16 |
17 | We then were VERY excited to tune in, and watch the (very long) announcement stream from OpenAI, in which they spent an hour to tell us about GPT-5.
18 |
19 | This was our longest stream by far (3.5 hours, 1hr was just OpenAI live stream) and I'm putting this here mostly unedited, but chapters are up so feel free to skip to the parts that are interesting to you the most.
20 |
21 | 00:00 Introduction and Special Guests
22 |
23 | 00:56 Twitter Space and Live Streaming Plans
24 |
25 | 02:12 Open Source AI Models Overview
26 |
27 | 03:44 Qwen and Other New AI Models
28 |
29 | 08:59 Community Interaction and Comments
30 |
31 | 10:01 Technical Deep Dive into AI Models
32 |
33 | 25:06 OpenAI's New Releases and Benchmarks
34 |
35 | 38:49 Expectations and Use Cases for AI Models
36 |
37 | 40:03 Tool Use vs. Deep Knowledge in AI
38 |
39 | 41:02 Evaluating GPT OSS and OpenAI Critique
40 |
41 | 42:29 Historical and Medical Knowledge in AI
42 |
43 | 51:16 Opus 4.1 and Coding Models
44 |
45 | 55:38 Google's Genie 3: A New World Model
46 |
47 | 01:00:43 Kitten TTS: A Lightweight Text-to-Speech Model
48 |
49 | 01:02:07 11 Labs' Music Generation AI
50 |
51 | 01:08:51 OpenAI's GPT-5 Launch Event
52 |
53 | 01:24:33 Building a French Learning Web App
54 |
55 | 01:26:22 Exploring the Web App Features
56 |
57 | 01:29:19 Introducing Enhanced Voice Features
58 |
59 | 01:30:02 Voice Model Demonstrations
60 |
61 | 01:32:32 Personalizing Chat GPT
62 |
63 | 01:33:23 Memory and Scheduling Features
64 |
65 | 01:35:06 Safety and Training Enhancements
66 |
67 | 01:39:17 Health Applications of GPT-5
68 |
69 | 01:45:07 Coding with GPT-5
70 |
71 | 01:46:57 Advanced Coding Capabilities
72 |
73 | 01:52:59 Real-World Coding Demonstrations
74 |
75 | 02:10:26 Enterprise Applications of GPT-5
76 |
77 | 02:11:49 Amgen's Use of GPT-5 in Drug Design
78 |
79 | 02:12:09 BBVA's Financial Analysis with GPT-5
80 |
81 | 02:12:33 Healthcare Applications of GPT-5
82 |
83 | 02:12:52 Government Adoption of GPT-5
84 |
85 | 02:13:22 Pricing and Availability of GPT-5
86 |
87 | 02:13:51 Closing Remarks by Chief Scientist Jakub
88 |
89 | 02:16:03 Live Reactions and Discussions
90 |
91 | 02:16:41 Technical Demonstrations and Comparisons
92 |
93 | 02:33:53 Healthcare and Scientific Advancements with GPT-5
94 |
95 | 02:47:09 Final Thoughts and Wrap-Up
96 |
97 | ---
98 |
99 | My first reactions to GPT-5
100 |
101 | Look, I gotta keep it real with you, my first gut reaction was, hey, I'm on vacation, I don't have time to edit and write the newsletter (EU timezone) so let's see how ChatGPT-5 handles this task. After all, OpenAI has removed all other models from the dropdown, it's all GPT-5 now. (pricing from the incredible writeup by [Simon Willison](https://substack.com/profile/5753967-simon-willison) available [here](https://simonwillison.net/2025/Aug/7/gpt-5/))
102 |
103 | And to tell you the truth, I was really disappointed! GPT-5 seems to be incredible at coding benchmarks; with a 400K-token context and incredible pricing (just $1.25/$10 per million tokens compared to Opus at $15/$75), this model, per the many friends who got to test it early, is a beast at coding! Readily beating Opus on affordability per token and switching from thinking to less thinking when it needs to, it definitely seems like a great improvement for coding and agentic tasks.
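To make that pricing gap concrete, here's the back-of-the-napkin math for a hypothetical agentic task (100K tokens in, 10K tokens out, prices in USD per million tokens as above):

```python
# Back-of-the-napkin cost comparison; prices are USD per million tokens.
def job_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

gpt5 = job_cost(100_000, 10_000, 1.25, 10.0)  # -> $0.225
opus = job_cost(100_000, 10_000, 15.0, 75.0)  # -> $2.25
print(f"GPT-5: ${gpt5:.3f} vs Opus: ${opus:.2f} ({opus / gpt5:.0f}x cheaper)")
```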
104 |
105 | But for my very much honed prompt of "hey, help me with ThursdAI drafts, here's previous drafts that I wrote myself, mimic my tone", it failed... spectacularly!
106 |
107 | Here's just a funny example, after me replying that it did a bad job:
108 |
109 | It literally wrote "I'm Alex, I build the mind, not the vibe" 🤦♂️ What.. the actual...
110 |
111 | For comparison, here's o3, with the same prompt, with a fairly true to tone draft:
112 |
113 | High taste testers take on GPT-5
114 |
115 | But hey, I have tons of previous speakers in our group chats, and many of them who got early access (I didn't... OpenAI, I can be trusted lol) rave about this model. They are saying that this is a huge jump in intelligence.
116 |
117 | Folks like Dr Derya Unutmaz, who jumped on the live show and described how GPT5 does incredible things with less hallucinations, folks like Swyx from [Latent.Space](https://substack.com/profile/89230629-latentspace) who had [early access](https://www.latent.space/p/gpt-5-review) and even got invited to give first reactions to the OpenAI office, and [Pietro Schirano](https://x.com/skirano/status/1953516768317628818) who also showed up in an OpenAI video.
118 |
119 | So definitely, definitely check out their vibes, as we all try to wrap our heads around this new intelligence king we got!
120 |
121 | Other GPT5 updates
122 |
123 | OpenAI definitely cooked, don't get me wrong, with this model plugging into everything else in their platform like memory, voice (that was upgraded and works in custom GPTs now, yay!), canvas and study mode, this will definitely be an upgrade for many folks using the models.
124 |
125 | They have now also opened access to GPT-5 to free users, just in time for schools to reopen, including a very interesting Quiz mode (that just showed up for me without asking for it), and connection to Gmail, all those will now work with GPT5.
126 |
127 | It now has 400K context, way less hallucinations but fewer refusals also, and the developer upgrades like a new verbosity setting and a new "minimal" reasoning setting are all very welcome!
128 |
129 | OpenAI finally launches gpt-oss (120B / 20B) Apache 2 licensed models ([model card](https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7637/oai_gpt-oss_model_card.pdf), HF)
130 |
131 | It was really funny, on the stream Nisten talked about the open source models OpenAI dropped, and said "when we covered it last week", while it was just two days ago! It really does feel like this world is moving really fast.
132 |
133 | OpenAI's long promised open source models are here, and they got a fairly mixed bag of reviews from folks. Many folks are celebrating that the western world is now back in the game, releasing incredible local models, with an open license!
134 |
135 | Though, after the initial excitement, the vibes are split on these models. Folks are saying that maybe these were trained with only synthetic data, because, like Phi, they seem to be very good at benchmarks, and on the specific tasks they were optimized for (code, math) but [really bad](https://x.com/sam_paech/status/1952839665670922360) at creative writing (Sam Paech from EQBench was not impressed), they are also not multilingual, though OpenAI did release a cookbook [on finetuning](https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers) with HuggingFace!
136 |
137 | Overall, these models are trained for agentic workflows—supporting function calling, web search, Python execution, configurable reasoning effort, and full raw chain-of-thought access, which we will never get from GPT5.
138 |
139 | I particularly love the new approach where reasoning effort can be set directly via the system prompt: just add "reasoning: high" and the model will reason for way longer! Can't wait to get back, bench these, and share the results with you.
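As a sketch of what that looks like in practice against any OpenAI-compatible server hosting gpt-oss (the base URL and model id below are placeholders for wherever you host it, not official endpoints):

```python
# Sketch: nudging gpt-oss reasoning effort via the system prompt.
# base_url and model id are placeholders, not official endpoints.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": "Reasoning: high"},  # the effort toggle
        {"role": "user", "content": "Prove that the square root of 2 is irrational."},
    ],
)
print(resp.choices[0].message.content)
```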
140 |
141 | Overall, the fine-tuning and open source community is split for now, but it's been only a few days, so we'll keep you up to date on how well these models land, regardless, this was a historic week for OpenAI!
142 |
143 | Speaking of open models, did you have a chance to try W&B Inference? The team worked hard to bring these new models to you in record time and at incredible pricing (just $0.05 for 20B and $0.15 for 120B!), so these models are definitely worth a try!
144 |
145 | Plus, if you comment "OSS Power" on our [announcement post](https://x.com/weights_biases/status/1952885962641699287), we'll likely give you a few credits to try it out and let us know what you think!
146 |
147 | World models "holy crap" moment - Google Genie3
148 |
149 | The other very important release this week was.... not a release at all, but an announcement from Deepmind, with Genie3.
150 |
151 | This World Model takes a single image or text prompt and creates a fully interactive, controllable 3D environment that runs in real-time at 24fps. An environment you as a user can control, walk (or fly) in, move around the camera view. It's really mindblowing stuff.
152 |
153 | We've covered world models like Mirage on previous episodes, but what Google released is a MAJOR step up in coherency, temporal consistency and just overall quality!
154 |
155 | The key breakthrough here is consistency and memory. In one demo, a user could "paint" a virtual wall, turn away, and when they turned back, the paint was still there. This is a massive step towards generalist agents that can train, plan, and reason in entirely simulated worlds, with huge implications for robotics and gaming.
156 |
157 | We’re hoping to have the Genie 3 team on the show next week to dive even deeper into this incredible technology!!
158 |
159 | Other AI news this week
160 |
161 | This week, the "other" news could have filled a full show 2 years ago, we got Qwen keeping the third week of releases with 2 new tiny models + a new diffusion model called Qwen-image ([Blog](https://qwenlm.github.io/blog/qwen-image/), [HF](https://huggingface.co/Qwen/Qwen-Image))
162 |
163 | Anthropic decided to pre-empt the GPT5 release, and upgraded Opus 4 and gave us Opus 4.1 with a slight bump in specs.
164 |
165 | ElevenLabs released a music API called ElevenMusic, which sounds very very good (this on top of last weeks Riffusion + [Producer.ai](http://Producer.ai) news, that I'm still raving about)
166 |
167 | Also in voice and audio, a SUPER TINY TTS model called KittenTTS was released: with just 15M parameters and a model file of 25MB, it's surprisingly decent at generating voice ([X](https://x.com/divamgupta/status/1952762876504187065))
168 |
169 | And to cap it off with breaking news, the Cursor team, who showed up on the OpenAI stream today (marking quite the change in direction from OpenAI + Windsurf previous friendship), dropped their own CLI version of cursor, reminiscent of Claude Code!
170 |
171 | PHEW, wow ok this was a LOT to process. Not only did we tune in for the full GPT-5 release, we did a live stream when gpt-oss dropped as well.
172 |
173 | On a personal note, I was very humbled when Sam Altman said it was 32 months since GPT-4 release, because it means this was 32 months of ThursdAI, as many of you know, we started live streaming on March 13, 2023, when GPT-4 was released.
174 |
175 | I'm very proud of the incredible community we've built (50K views total across all streams this week!), the incredible co-hosts I have, who step up when I'm on vacation and the awesome guests we have on the show, to keep you up to date every week!
176 |
177 | So, a little favor to ask, if you find our content valuable, entertaining, the best way to support this pod is upgrade to a paid sub, and share ThursdAI with a friend or two! 👏 See you next week 🫡
178 |
179 |
180 |
181 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-gpt5-is-here/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-gpt5-is-here?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE3MDM5ODk4MywiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.KfVeuojm2eKEFqVLDMnVNisA0fGRe6RMdNkMMXRRzW4&utm_campaign=CTA_5).
182 |
--------------------------------------------------------------------------------
/2025_AI_Year_in_Review.md:
--------------------------------------------------------------------------------
1 | # 🔥 ThursdAI 2025 AI Year in Review
2 | ## From DeepSeek's $5M Bomb to ASI in a Decade — The Year AI Went Mainstream
3 |
4 | *By Alex Volkov & the ThursdAI Crew | Based on 50+ episodes from January 2 - December 5, 2025*
5 |
6 | ---
7 |
8 | ## 🎤 A Note from Alex
9 |
10 | Friends, what a year. When I started 2025, I thought GPT-4 was the ceiling. I was so, so wrong.
11 |
12 | This year, we watched DeepSeek crash NVIDIA's stock with a $5.5M reasoning model. We saw OpenAI announce ASI within a decade—and a timeline for fully autonomous AI researchers by 2028. Google reclaimed the LLM throne with Gemini 3. GPT-5 arrived after 32 months of waiting (32 months of ThursdAI!). Sora 2 invented AI social media. A Gemma model made a novel cancer discovery that was validated in a lab.
13 |
14 | Every week brought releases that would have dominated months of news cycles just two years ago. If you're reading this and feeling overwhelmed—you're not alone. That's literally why ThursdAI exists.
15 |
16 | Let's look back at the year that changed everything.
17 |
18 | ---
19 |
20 | ## 📊 2025 By The Numbers
21 |
22 | | Metric | Value |
23 | |--------|-------|
24 | | ThursdAI Episodes | 50+ |
25 | | Major Model Releases | 200+ |
26 | | Open Source Models Released | 150+ |
27 | | Trillion-Parameter Models | 4 (Kimi K2, Ling-1T, Qwen-Max, Kimi K2 Thinking) |
28 | | Companies that hit $100B+ valuation | 4 (OpenAI, Anthropic, xAI, Google DeepMind) |
29 | | Compute Commitments | >$2 Trillion |
30 | | ThursdAI's Age | 2.5 years (turned 2 in March!) |
31 |
32 | ---
33 |
34 | ## 🏆 The 12 Releases That Defined 2025
35 |
36 | ### 1. 🐋 DeepSeek R1 (January)
37 | **The shot heard around the world.** A MIT-licensed reasoning model allegedly trained for $5.5M that:
38 | - Crashed NVIDIA stock 17% ($560B loss—largest single-company loss ever)
39 | - Hit #1 on iOS App Store
40 | - Matched o1 at 50x cheaper pricing
41 | - Its 1.5B distilled version beat GPT-4o on math
42 |
43 | *"My mom knows about DeepSeek—your grandma probably knows about it, too"*
44 |
45 | ### 2. 👑 Gemini 2.5 Pro / Gemini 3 Pro (March & November)
46 | Google's redemption arc. First reclaimed #1 in March, then doubled down in November:
47 | - 45.14% on ARC-AGI-2 (Deep Think) — 2x previous SOTA
48 | - Native image gen with Nano Banana Pro (4K, perfect text)
49 | - Antigravity IDE: free agent-first development
50 | - Generative UIs built on the fly
51 |
52 | ### 3. 🧠 GPT-5 & GPT-5.1-Codex-Max (August & November)
53 | 32 months after GPT-4, OpenAI's next frontier:
54 | - 400K → 1M context window evolution
55 | - Unified reasoning + chat architecture
56 | - **Codex-Max works 24+ hours independently**
57 | - Router-based model selection
58 | - "Compaction" for native context management
59 |
60 | ### 4. 🎬 VEO3 (May)
61 | The video model that crossed the uncanny valley:
62 | - Native multimodal audio (speech, SFX, music synced perfectly)
63 | - Characters understand who's speaking, make eye contact
64 | - Spawned "Prompt Theory" viral phenomenon
65 | - People posting real videos claiming they were AI because they couldn't tell
66 |
67 | ### 5. 📹 Sora 2 (October)
68 | AI-generated social media goes mainstream:
69 | - Shot to #3 on iOS App Store within days
70 | - **Cameos**: Upload your face, star in any video
71 | - "Pick a Mood": Control algorithm with natural language
72 | - All content is AI-generated—no uploads, only creations
73 |
74 | ### 6. 🦄 Kimi K2 (July) & Kimi K2 Thinking (November)
75 | Open source hit the trillion-parameter mark:
76 | - 1T total parameters, only 32B active
77 | - 65.8% SWE-bench (beats Claude Sonnet without reasoning)
78 | - K2 Thinking: 44.9% on Humanity's Last Exam
79 | - 200-300 sequential tool calls without intervention
80 |
81 | ### 7. 🔧 Tool-Using Reasoners: o3/o4-mini (April)
82 | The closest thing to AGI we've seen:
83 | - First models to autonomously use tools during reasoning
84 | - 600+ consecutive tool calls
85 | - Manipulate images mid-thought (zoom, crop, rotate)
86 | - o4-mini hits 99.5% on AIME with Python interpreter
87 |
88 | ### 8. 🤖 Claude 4 / Opus 4.5 (May & November)
89 | Anthropic's coding dominance:
90 | - First models to cross 80% on SWE-bench
91 | - Opus 4.5: 80.9% SWE-bench at 1/3 previous cost
92 | - "Effort" parameter for reasoning control
93 | - Claude Skills: Auto-selected instruction libraries
94 |
95 | ### 9. 🔌 MCP Becomes Universal (All Year)
96 | The Model Context Protocol won:
97 | - OpenAI, Google, Microsoft, AWS all adopted it
98 | - Prevents fragmentation (no VHS vs Betamax)
99 | - MCP Apps: Unified standard for agentic UIs
100 | - Tools work across Claude AND GPT
101 |
102 | ### 10. 🧬 AI Makes Scientific Discovery (October)
103 | C2S-Scale from Google & Yale:
104 | - Generated novel hypothesis about cancer cells
105 | - **Validated in a wet lab** on living cells
106 | - First counter-evidence to "stochastic parrot" criticism
107 | - AI as genuine scientific collaborator
108 |
109 | ### 11. 🤖 Consumer Humanoid Robots (October)
110 | 1X NEO: $20,000, delivery early 2026:
111 | - Handles cleaning, laundry, household chores
112 | - Teleoperation by humans for complex tasks
113 | - Soft, quiet design at 66 lbs
114 | - The robot future is here
115 |
116 | ### 12. 🚀 OpenAI's ASI Roadmap (October)
117 | Sam Altman dropped unprecedented timelines:
118 | - **ASI in less than a decade**
119 | - AI research intern by September 2026
120 | - Fully autonomous AI researcher by March 2028
121 | - $1.4 trillion in compute obligations
122 |
123 | ---
124 |
125 | ## 📈 The Year in Themes
126 |
127 | ### 🧠 Theme 1: Reasoning Models Go Mainstream
128 | DeepSeek R1 proved reasoning doesn't require massive scale. By year end:
129 | - Small models (1.5B) beat GPT-4o on math with RL
130 | - o3/o4-mini added tool use to chain-of-thought
131 | - GPT-5 unified reasoning + chat into one model
132 | - Open source reasoning matched frontier (Qwen3, Kimi K2)
133 |
134 | ### 🇨🇳 Theme 2: Chinese Labs Dominated Open Source
135 | Despite chip export restrictions:
136 | - **DeepSeek**: R1, V3.2-Speciale (olympiad gold medals)
137 | - **Alibaba/Qwen**: 3, 3-Coder, 3-VL, 3-Omni families
138 | - **MiniMax**: M1, M2 (Sonnet at 8% cost)
139 | - **Moonshot/Kimi**: K2, K2 Thinking (trillion-scale)
140 | - **ByteDance**: SeeDream, Z-Image; **Tencent**: HunyuanVideo
141 |
142 | ### 🤖 Theme 3: 2025 Was The Year of Agents
143 | Every quarter brought more agentic capabilities:
144 | - **Q1**: OpenAI Operator, MCP adoption
145 | - **Q2**: Jules, Codex, tool-using reasoners
146 | - **Q3**: ChatGPT Agent (Odyssey), Agents.md standard
147 | - **Q4**: Atlas browser, AgentKit, Antigravity IDE
148 |
149 | ### 🎥 Theme 4: Video AI Crossed the Uncanny Valley
150 | The progression was staggering:
151 | - **VEO3** (May): Native audio, perfect lip sync
152 | - **Sora 2** (October): Social media with Cameos
153 | - **Kling 2.6** (December): Native audio generation
154 | - Reality became genuinely hard to verify
155 |
156 | ### 💰 Theme 5: Unprecedented Investment
157 | The numbers are almost incomprehensible:
158 | - OpenAI: $1.4 trillion compute obligations
159 | - NVIDIA-OpenAI: $100B pledge
160 | - OpenAI-Oracle: $300B deal
161 | - CoreWeave: $22.4B OpenAI + $14.2B Meta + $6.3B NVIDIA
162 | - Anthropic: $13B raise at $183B valuation
163 | - Project Stargate: $500B AI infrastructure
164 |
165 | ### 🌍 Theme 6: World Models Became Playable
166 | From images to interactive worlds:
167 | - **Google Genie-3**: Controllable 3D at 24fps
168 | - **World Labs RTFM**: Real-time on single H100
169 | - **Hunyuan GameCraft**: Games with physics
170 | - **Oasis 2.0**: Real-time Minecraft reskinning
171 |
172 | ---
173 |
174 | ## 📅 Quarter-by-Quarter Highlights
175 |
176 | ### Q1: "The Quarter That Changed Everything"
177 | **January-March 2025**
178 |
179 | 
180 |
181 | - 🐋 DeepSeek R1 crashed NVIDIA ($560B loss)
182 | - 🤖 OpenAI Operator (agentic ChatGPT)
183 | - 💫 Project Stargate ($500B infrastructure)
184 | - 👑 Gemini 2.5 Pro takes #1
185 | - 🎨 GPT-4o native image gen (Ghibli-mania)
186 | - 🔌 OpenAI adopts MCP
187 |
188 | ### Q2: "The Quarter That Shattered Reality"
189 | **April-June 2025**
190 |
191 | 
192 |
193 | - 🧠 o3/o4-mini (tool-using reasoners)
194 | - 🎬 VEO3 (native audio, uncanny valley crossed)
195 | - 🔥 Qwen 3 (Apache 2.0, 8 models)
196 | - 🤖 Claude 4 Opus & Sonnet (80% SWE-bench)
197 | - 📚 GPT-4.1 (1M context)
198 | - 💰 Meta $15B Scale AI deal
199 |
200 | ### Q3: "GPT-5, Trillion-Scale Open Source, World Models"
201 | **July-September 2025**
202 |
203 | 
204 |
205 | - 👑 GPT-5 arrives (400K context, unified reasoning)
206 | - 🦄 Kimi K2 (1T params, 65.8% SWE-bench)
207 | - 🌍 Google Genie-3 (playable AI worlds)
208 | - 🔓 GPT-OSS (Apache 2.0 from OpenAI!)
209 | - 🧑‍💻 GPT-5-Codex (7+ hour coding sessions)
210 | - 💰 NVIDIA $100B pledge, Oracle $300B deal
211 |
212 | ### Q4: "Agents, Gemini's Crown & Sora Social"
213 | **October-December 2025**
214 |
215 | .jpg)
216 |
217 | - 📹 Sora 2 (AI social media revolution)
218 | - 👑 Gemini 3 Pro (45% ARC-AGI-2)
219 | - 🐋 DeepSeek V3.2-Speciale (olympiad gold)
220 | - 🧠 Claude Opus 4.5 (80.9% SWE-bench)
221 | - 🚀 OpenAI ASI roadmap (2028 timeline)
222 | - 🤖 1X NEO ($20K home robot)
223 |
224 | ---
225 |
226 | ## 🥇 Best of 2025 Awards
227 |
228 | ### 🏆 Model of the Year
229 | **DeepSeek R1** — Didn't just release a model, rewrote the economics of AI
230 |
231 | ### 🏆 Open Source Champion
232 | **Qwen (Alibaba)** — 8+ model families, Apache 2.0, consistently frontier
233 |
234 | ### 🏆 Most Impactful Release
235 | **VEO3** — Crossed the uncanny valley, native audio changed everything
236 |
237 | ### 🏆 Biggest Comeback
238 | **Google** — From "where's Gemini?" to #1 twice (March & November)
239 |
240 | ### 🏆 Wildest Announcement
241 | **OpenAI ASI Roadmap** — Fully autonomous AI researchers by 2028
242 |
243 | ### 🏆 Best Surprise
244 | **Sora 2 Social Media** — Nobody expected a full social platform
245 |
246 | ### 🏆 Infrastructure Play
247 | **CoreWeave/NVIDIA** — Built the compute layer the world runs on
248 |
249 | ### 🏆 Scientific Breakthrough
250 | **C2S-Scale Cancer Discovery** — First AI-generated hypothesis validated in lab
251 |
252 | ### 🏆 Agent Ecosystem Win
253 | **MCP Protocol** — Became the universal standard everyone adopted
254 |
255 | ### 🏆 Most Underrated
256 | **Claude Skills** — Auto-selected instruction libraries, quietly revolutionary
257 |
258 | ---
259 |
260 | ## 🔮 Looking Forward: What 2026 Holds
261 |
262 | Based on everything we've seen this year, here's what's coming:
263 |
264 | 1. **GPT-5.x reasoning models** — Tool use gets even more sophisticated
265 | 2. **Open source trillion-scale becomes common** — Not just Chinese labs
266 | 3. **Agents that work for days** — Codex-Max is just the beginning
267 | 4. **Consumer humanoid robots ship** — 1X NEO, Figure, Tesla Bot
268 | 5. **AI-generated content everywhere** — The Sora 2 effect spreads
269 | 6. **Scientific discovery accelerates** — More lab-validated AI hypotheses
270 | 7. **100M token context** — Qwen roadmap suggests this is coming
271 | 8. **Real-time world models** — Gaming and simulation converge
272 |
273 | ---
274 |
275 | ## 🙏 Thank You
276 |
277 | To everyone who listened, read, shared, and built alongside us this year—thank you. ThursdAI exists because of this community.
278 |
279 | Special thanks to our incredible co-hosts: **Wolfram Ravenwolf**, **Yam Peleg**, **Nisten**, **LDJ**, and **Ryan Carson**. And to the hundreds of guests who shared their work and insights with us.
280 |
281 | We started this show because GPT-4 blew our minds. Now we're documenting the path to AGI and beyond. What a time to be alive.
282 |
283 | See you in 2026. Hold on to your butts.
284 |
285 | — **Alex Volkov** & the ThursdAI Crew
286 |
287 | ---
288 |
289 | ## 📚 Resources
290 |
291 | - **Full Quarterly Recaps**: Q1, Q2, Q3, Q4 available in this repository
292 | - **Weekly Episodes**: [thursdai.news](https://thursdai.news)
293 | - **YouTube**: [ThursdAI Channel](https://thursdai.news/yt)
294 | - **Follow Alex**: [@altryne](https://x.com/altryne)
295 |
296 | ---
297 |
298 | *"Open source AI has never been as hot as this year. We're accelerating as f*ck, and it's only just beginning—hold on to your butts."*
299 |
300 | — Alex Volkov, ThursdAI
301 |
302 | ---
303 |
304 | *Generated from 50+ ThursdAI episodes covering January - December 2025*
305 |
--------------------------------------------------------------------------------
/2025_episodes/Q2 2025/June 2025/_ThursdAI_-_Jun_5_2025_-_Live_from_AI_Engineer_with_Swyx_new_Gemini_25_with_Logan_K_and_Jack_Rae_Sel.md:
--------------------------------------------------------------------------------
1 | # 📆 ThursdAI - Jun 5, 2025 - Live from AI Engineer with Swyx, new Gemini 2.5 with Logan K and Jack Rae, Self Replicating agents with Morph Labs
2 |
3 | **Date:** June 06, 2025
4 | **Duration:** 1:43:45
5 | **Link:** [https://sub.thursdai.news/p/thursdai-jun-5-2025-live-from-ai](https://sub.thursdai.news/p/thursdai-jun-5-2025-live-from-ai)
6 |
7 | ---
8 |
9 | ## Description
10 |
11 | Hey folks, this is Alex, coming to you LIVE from the AI Engineer Worlds Fair!
12 |
13 | What an incredible episode this week! We recorded live from the 30th floor of the Marriott in SF, while Yam was doing live correspondence from the floor of the AI Engineer event, all while Swyx, the cohost of the Latent Space podcast and the creator of AI Engineer (both the conference and the concept itself), joined us for the whole stream - here's the edited version, please take a look.
14 |
15 | We've had around 6500 people tune in, and at some point we got 2 surprise guests, straight from the keynote stage, Logan Kilpatrick (PM for AI Studio and lead cheerleader for Gemini) and Jack Rae (principal scientist working on reasoning) joined us for a great chat about Gemini! Mind was absolutely blown!
16 |
17 | They have just launched the new Gemini 2.5 Pro and I thought it would only be fitting to let their new model cover this podcast this week (so below is **fully AI generated** ... non-slop I hope). The show notes and TL;DR are, as always, at the end.
18 |
19 | Okay, enough preamble… let's dive into the madness!
20 |
21 | **🤯 Google Day at AI Engineer: New Gemini 2.5 Pro and a Look Inside the Machine's Mind**
22 |
23 | For the first year of this podcast, a recurring theme was us asking, "Where's Google?" Well, it's safe to say that question has been answered with a firehose of innovation. We were lucky enough to be joined by Google DeepMind's Logan Kilpatrick and Jack Rae, the tech lead for "thinking" within Gemini, literally moments after they left the main stage.
24 |
25 | **Surprise! A New Gemini 2.5 Pro Drops Live**
26 |
27 | Logan kicked things off with a bang, officially announcing a brand new, updated Gemini 2.5 Pro model right there during his keynote. He called it "hopefully the final update to 2.5 Pro," and it comes with a bunch of performance increases, closing the gap on feedback from previous versions and hitting SOTA on benchmarks like Aider.
28 |
29 | It's clear that the organizational shift to bring the research and product teams together under the DeepMind umbrella is paying massive dividends. Logan pointed out that Google has seen a 50x increase in AI inference over the past year. The flywheel is spinning, and it's spinning *fast*.
30 |
31 | **How Gemini "Thinks"**
32 |
33 | Then things got even more interesting. Jack Rae gave us an incredible deep dive into what "thinking" actually means for a language model. This was one of the most insightful parts of the conference for me.
34 |
35 | For years, the bottleneck for LLMs has been **test-time compute**. Models were trained to respond immediately, applying a fixed amount of computation to go from a prompt to an answer, no matter how hard the question. The only way to get a "smarter" response was to use a bigger model.
36 |
37 | Jack explained that "Thinking" shatters this limitation. Mechanically, Gemini now has a "thinking stage" where it can generate its own internal text—hypothesizing, testing, correcting, and reasoning—before committing to a final answer. It's an iterative loop of computation that the model can dynamically control, using more compute for harder problems. It learns *how* to think using reinforcement learning, getting a simple "correct" or "incorrect" signal and backpropagating that to shape its reasoning strategies.
38 |
39 | We're already seeing the results of this. Jack showed a clear trend: as models get better at reasoning, they're also using more test-time compute. This paradigm also gives developers a "thinking budget" slider in the API for Gemini 2.5 Flash and Pro, allowing a continuous trade-off between cost and performance.
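For the API-curious, here's a minimal sketch of that budget knob using the google-genai Python SDK; the 1024-token budget is an arbitrary value I picked, and each model documents its own allowed range.

```python
# Sketch: capping Gemini 2.5 Flash "thinking" with a token budget.
# The 1024 budget is arbitrary; check the docs for each model's range.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="How many primes are there below 100?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)
print(resp.text)
```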
40 |
41 | The future of this is even wilder. They're working on **DeepThink**, a high-budget mode for extremely hard problems that uses much deeper, parallel chains of thought. On the tough USA Math Olympiad, where the SOTA was negligible in January, 2.5 Pro reached the 50th percentile of human participants. DeepThink pushes that to the 65th percentile.
42 |
43 | Jack’s ultimate vision is inspired by the mathematician Ramanujan, who derived incredible theorems from a single textbook by just thinking deeply. The goal is for models to do the same—contemplate a small set of knowledge so deeply that they can push the frontiers of human understanding. Absolutely mind-bending stuff.
44 |
45 | **🤖 MorphLabs and the Audacious Quest for Verified Superintelligence**
46 |
47 | Just when I thought my mind couldn't be bent any further, we were joined by Jesse Han, the founder and CEO of MorphLabs. Fresh off his keynote, he laid out one of the most ambitious visions I've heard: building the infrastructure for the Singularity and developing "verified superintelligence."
48 |
49 | The big news was that **Christian Szegedy** is joining MorphLabs as Chief Scientist. For those who don't know, Christian is a legend—he invented batch norm and adversarial examples, co-founded xAI, and led code reasoning for Grok. That's a serious hire.
50 |
51 | Jesse’s talk was framed around a fascinating question: "What does it mean to have empathy for the machine?" He argues that as AI develops personhood, we need to think about what it wants. And what it wants, according to Morph, is a new kind of cloud infrastructure.
52 |
53 | This is **MorphCloud**, built on a new virtualization stack called **Infinibranch**. Here’s the key unlock: it allows agents to instantaneously snapshot, branch, and replicate their entire VM state. Imagine an agent reaching a decision point. Instead of choosing one path, it can branch its entire existence—all its processes, memory, and state—to explore every option in parallel. It can create save states, roll back to previous checkpoints, and even merge its work back together.
54 |
55 | This is a monumental step for agentic AI. It moves beyond agents that are just a series of API calls to agents that are truly embodied in complex software environments. It unlocks the potential for recursive self-improvement and large-scale reinforcement learning in a way that's currently impossible. It’s a bold, sci-fi vision, but they're building the infrastructure to make it a reality today.
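To gesture at the programming model, here's an entirely hypothetical sketch of the snapshot-and-branch idea; this is not Morph's actual SDK, just the shape of the concept.

```python
# Entirely hypothetical illustration of snapshot/branch agents;
# NOT MorphLabs' real API.
from dataclasses import dataclass, field

@dataclass
class AgentVM:
    state: dict = field(default_factory=dict)
    _snapshots: dict = field(default_factory=dict)

    def snapshot(self) -> int:
        """Freeze the full VM state and return a save-point id."""
        sid = len(self._snapshots)
        self._snapshots[sid] = dict(self.state)
        return sid

    def branch(self, sid: int) -> "AgentVM":
        """Clone a fresh VM from a save point to explore one option."""
        return AgentVM(state=dict(self._snapshots[sid]))

def explore(vm: AgentVM, options: list[str]) -> list[AgentVM]:
    """Fork the entire VM once per option instead of committing to one path."""
    sid = vm.snapshot()
    return [vm.branch(sid) for _ in options]
```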
56 |
57 | **🔥 The Agent Conversation: OpenAI, MCP, and Magic Moments**
58 |
59 | The undeniable buzz on the conference floor was all about **agents**. You couldn't walk ten feet without hearing someone talking about agents, tools, and MCP.
60 |
61 | OpenAI is leaning in here too. This week, they made their **Codex coding agent available to all ChatGPT Plus users** and announced that ChatGPT will soon be able to listen in on your Zoom meetings. This is all part of a broader push to make AI more active and integrated into our workflows.
62 |
63 | The **MCP (Model Context Protocol)** track at the conference was packed, with lines going down the hall. (Alex here - I had a blast talking during that track about MCP observability; you can catch our talk [here](https://youtu.be/z4zXicOAF28?t=19573) on the live stream of AI Engineer.)
64 |
65 | Logan Kilpatrick offered a grounded perspective, suggesting the hype might be a bit overblown but acknowledging the critical need for an open standard for tool use, a void left when OpenAI didn't formalize ChatML.
66 |
67 | I have to share my own jaw-dropping MCP moment from this week. I was coding an agent using an IDE that supports MCP. My agent, which was trying to debug itself, used an MCP tool to check its own observability traces on the Weights & Biases platform. While doing so, it discovered a *new tool* that our team had just added to the MCP server—a support bot. Without any prompting from me, my coding agent formulated a question, "chatted" with the support agent to get the answer, came back, fixed its own code, and then re-checked its work. Agent-to-agent communication, happening automatically to solve a problem. My jaw was on the floor. That's the magic of open standards.
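
For the curious, the protocol mechanics behind that moment are simple: MCP clients discover tools with a `tools/list` request and invoke them with `tools/call` (those are the spec's real method names). A rough sketch of the flow, where `rpc` is a hypothetical helper that does one JSON-RPC round-trip and the `support_bot` tool name is made up:

```python
def ask_support_bot(rpc, question):
    # Re-listing tools each session is how an agent can notice a tool
    # that didn't exist yesterday.
    tools = {t["name"]: t for t in rpc("tools/list", {})["tools"]}
    if "support_bot" not in tools:
        raise RuntimeError(f"no support bot; server offers: {list(tools)}")
    # Calling another agent looks exactly like calling any other tool.
    result = rpc("tools/call",
                 {"name": "support_bot", "arguments": {"question": question}})
    return result["content"]
```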
68 |
69 | **This Week's Buzz from Weights & Biases**
70 |
71 | Speaking of verification and agents, the buzz from our side is all about it! At our booth here at AI Engineer, we have a Robodog running around, connected to our LLM evaluation platform, **W&B Weave**. As Jesse from MorphLabs discussed, verifying what these complex agentic systems are doing is critical. Whether it's superintelligence or your production application, you need to be able to evaluate, trace, and understand its behavior. We're building the tools to do just that.
72 |
73 | And if you're in San Francisco, don't forget our own conference, **Fully Connected**, is happening on June 18th and 19th! It's going to be another amazing gathering of builders and researchers. [Fullyconnected.com](http://fullyconnected.com) - get in FREE with the promo code **WBTHURSAI**
74 |
75 | What a show. The energy, the announcements, the sheer brainpower in one place was something to behold. We’re at a point where the conversation has shifted from theory to practice, from hype to real, tangible engineering. The tracks on agents and enterprise adoption were overflowing because people are building, right now. It was an honor and a privilege to bring this special episode to you all.
76 |
77 | Thank you for tuning in. We'll be back to our regular programming next week! (and Alex will be back to writing his own newsletter, not sending direct AI output!)
78 |
79 | AI News TL;DR and show notes
80 |
81 | * **Hosts and Guests**
82 |
83 | * **Alex Volkov** - AI Evangelist & Weights & Biases ([@altryne](http://x.com/@altryne))
84 |
85 | * Co Hosts - [@swyx](http://x.com/swyx) [@yampeleg](https://x.com/yampeleg) [@romechenko](https://x.com/romechenko)
86 |
87 | * Guests - [@officialLoganK](https://x.com/OfficialLoganK), [@jack_w_rae](https://x.com/jack_w_rae)
88 |
89 | * **Open Source LLMs**
90 |
91 | * ByteDance / ContentV-8B - ([HF](https://huggingface.co/ByteDance/ContentV-8B))
92 |
93 | * **Big CO LLMs + APIs**
94 |
95 | * Gemini Pro 2.5 updated Jun 5th ([X](https://x.com/OfficialLoganK/status/1930657743251349854))
96 |
97 | * SOTA on HLE, Aider, and GPQA
98 |
99 | * Now supports thinking budgets
100 |
101 | * Same cost, on the Pareto frontier
102 |
103 | * Closes gap on 03-25 regressions
104 |
105 | * OAI AVM injects ads and stopped singing ([X](https://x.com/altryne/status/1929312886448337248))
106 |
107 | * OpenAI Codex is now available to Plus members and has internet access ([GitHub](https://github.com/aavetis/ai-pr-watcher/))
108 |
109 | * ~24,000 NEW PRs overnight from Codex after @OpenAI expands access to Plus users.
110 |
111 | * OpenAI will record meetings and released connectors ([X](https://twitter.com/testingcatalog/status/1930366893321523676))
112 |
113 |   * [TestingCatalog News 🗞 @testingcatalog](https://twitter.com/testingcatalog) (Jun 4, 2025): "OpenAI released loads of connectors for Team accounts! Most of these connectors can be used for Deep Research, while Google Drive, SharePoint, Dropbox and Box could be used in all chats."
116 |
117 | * Anthropic cuts Claude access for Windsurf ([X](https://x.com/kevinhou22/status/1930401320210706802))
118 |
119 | * Without warning, Anthropic cuts off Windsurf from official Claude 3 and 4 APIs
120 |
121 | * This week's Buzz
122 |
123 | * Fully Connected: W&B's 2-day conference, June 18-19 in SF ([fullyconnected.com](https://fullyconnected.com)) - Promo Code WBTHURSAI
124 |
125 | * **Vision & Video**
126 |
127 | * VEO3 is now available via API on FAL ([X](https://x.com/FAL/status/1930732632046006718))
128 |
129 | * Captions launches Mirage Studio - a talking-avatar competitor to HeyGen/Hedra ([X](https://x.com/getcaptionsapp/status/1929554635544461727))
130 |
131 | * **Voice & Audio**
132 |
133 | * ElevenLabs model V3 - supports emotion tags and is an "inflection point" ([X](https://x.com/venturetwins/status/1930727253815759010))
134 |
135 | * Supporting 70+ languages, multi-speaker dialogue, and audio tags such as [excited], [sighs], [laughing], and [whispers].
136 |
137 | * **Tools**
138 |
139 | * Cursor launched v1.0 - BugBot reviews PRs, Jupyter notebook support, and one-click MCP install
140 |
141 | * 24,000 NEW PRs overnight from Codex after [@OpenAI](https://x.com/OpenAI) expands access to Plus users ([X](https://twitter.com/albfresco/status/1930262263199326256))
142 |
--------------------------------------------------------------------------------
/2025_episodes/Q3 2025/August 2025/_ThursdAI_Jul_31_2025_Qwens_Small_Models_Go_Big_StepFuns_Multimodal_Leap_GLM-45s_Chart_Crimes_and_Ru.md:
--------------------------------------------------------------------------------
1 | # 📆 ThursdAI – Jul 31, 2025 – Qwen’s Small Models Go Big, StepFun’s Multimodal Leap, GLM-4.5’s Chart Crimes, and Runway’s Mind‑Bending Video Edits + GPT-5 soon?
2 |
3 | **Date:** August 01, 2025
4 | **Duration:** 1:38:28
5 | **Link:** [https://sub.thursdai.news/p/thursdai-jul-31-2025-qwens-small](https://sub.thursdai.news/p/thursdai-jul-31-2025-qwens-small)
6 |
7 | ---
8 |
9 | ## Description
10 |
11 | Woohoo, we're almost done with July (my favorite month) and the Open Source AI decided to go out with some fireworks 🎉
12 |
13 | Hey everyone, Alex here, writing this without my own personal superintelligence (more: later) and this week has been VERY BUSY with many new open source releases.
14 |
15 | Just 1 hour before the show we already had 4 breaking news releases: a tiny Qwen3-Coder, Cohere and StepFun both dropped multimodal SOTAs, and our friends from Krea dropped a combined model with BFL called FLUX.1 Krea 👏
16 |
17 | This is on top of a very, very busy week, with Runway adding conversational editing to video via their Aleph model, Zuck's superintelligence vision, and a new SOTA open video model, Wan 2.2. So let's dive straight in (as always, all show notes and links are at the end)
18 |
21 | Open Source LLMs & VLMs
22 |
23 | Tons of new stuff here, I'll try to be brief but each one of these releases deserves a deeper dive for sure.
24 |
25 | Alibaba is on 🔥 with 3 new Qwen models this week
26 |
27 | Yes, this is very similar to last week, when they also dropped 3 new SOTA models in a week, but these are additional ones.
28 |
29 | It seems that someone in Alibaba figured out that after splitting away from the hybrid models, they can now release each model separately and get a lot of attention per model!
30 |
31 | Here's the timeline:
32 |
33 | * **Friday (just after our show)**: Qwen3-235B-Thinking-2507 drops (235B total, 22B active, [HF](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507))
34 |
35 | * **Tuesday**: Qwen3-30B-Thinking-2507 (30B total, 3B active, [HF](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507))
36 |
37 | * **Today**: Qwen3-Coder-Flash-2507 lands (30B total, 3B active for coding, [HF](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8))
38 |
39 | Let's start with the SOTA reasoner: the 235B (A22B) 2507 is absolutely the best reasoner among the open source models.
40 |
41 | We've put the model on our inference service (at crazy prices: $0.10/$0.10 per 1M input/output tokens) and it's performing absolutely incredibly on reasoning tasks.
42 |
43 | It also jumped to the top OSS model on Artificial Analysis scores, EQBench, Long Context and more evals. It's a really, really good reasoning model!
44 |
45 | Smaller Qwens for local use
46 |
47 | Just a week ago, we asked Junyang on our show about smaller models that folks can run on their devices, and he deflected, saying "we're focusing on the larger models." This week, they delivered not 1 but 2 smaller versions of the bigger models - perfect for speculative decoding, if you can host the larger ones, that is (see the sketch below).
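
If you haven't run into speculative decoding before, the trick is to pair a small, fast model from the same family with the big one: the small model drafts a handful of tokens, and the big model verifies them all in a single forward pass. Here's a toy sketch of the loop; `draft_model` and `target_model` are stand-in callables, not a real inference API:

```python
def speculative_decode(draft_model, target_model, prompt_tokens, k=8, max_new=256):
    """Toy draft-and-verify loop. `draft_model` (say, the 30B-A3B) cheaply
    proposes k tokens; `target_model` (the 235B-A22B) checks them in one
    forward pass and returns the longest agreeing prefix, plus its own
    next token wherever the draft diverged."""
    tokens = list(prompt_tokens)
    target_len = len(prompt_tokens) + max_new
    while len(tokens) < target_len:
        proposed = draft_model(tokens, n=k)        # k fast, cheap steps
        accepted = target_model(tokens, proposed)  # one big-model verify pass
        tokens.extend(accepted)                    # output still matches the
    return tokens                                  # big model's distribution
```

Matching tokenizers is what makes the same-family pairing "perfect" here: the small Qwen's drafts can be checked token-for-token by the big Qwen.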
48 |
49 | The most interesting one is the Qwen3-Coder-Flash, which came out today with very, very impressive stats - and the ability to run locally at almost 80 tok/s on a MacBook!
50 |
51 | So, after the last two weeks, we now have 3 Qwens (Instruct, Thinking, Coder) in 2 sizes each (all three have a 30B/A3B version now for local use) 👏
52 |
53 | Z.ai GLM and StepFun Step3
54 |
55 | As we've said previously, Chinese companies completely dominate the open source AI field right now, and this week we saw yet another crazy testament to how stark the difference is!
56 |
57 | We've seen a rebranded Zhipu ([Z.ai](http://Z.ai), previously THUDM) release their new GLM 4.5 - which gives Qwen3-thinking a run for its money. Not quite at that level, but definitely very close. I personally didn't love the release aesthetics; showing a blended eval score that nobody can replicate feels a bit off.
58 |
59 | We also talked about how StepFun has stepped in (sorry for the pun) with a new SOTA in multimodality, called [Step3](https://stepfun.ai/research/en/step3). It's a 321B MoE (with a huge 38B active param count) that achieves very significant multi modal scores (The benchmarks look incredible: 74% on MMMU, 64% on MathVision)
60 |
61 | Big Companies APIs & LLMs
62 |
63 | Well, we were definitely thinking we'd get GPT-5 or the open source AI model from OpenAI this week, but alas, the tea-leaf readers were misled (or were being misleading). We 100% know that GPT-5 is coming, as multiple screenshots were blurred and then deleted showing companies already testing it.
64 |
65 | But it looks like August is going to be even hotter than July, with multiple sightings of anonymous testing models on Web Dev arena, like Zenith, Summit and Lobster, plus a new mystery model on OpenRouter - that some claim are the different thinking modes of GPT-5 and the open source model?
66 |
67 | Zuck shares vision for personalized superintelligence ([Meta](https://meta.com/superintelligence))
68 |
69 | In a very "Nat Fridman" like post, Mark Zuckerberg finally shared the vision behind his latest push to assemble the most cracked AI engineers.
70 |
71 | In his vision, Meta is the right place to provide each one with personalized superintelligence, enhancing individual abilities with user agency according to their own values. (as opposed to a centralized model, which feels like his shot across the bow for the other frontier labs)
72 |
73 | A few highlights: Zuck leans heavily into the rise of personal devices (including AR glasses) on top of which humans will interact with this superintelligence, and signals a departure from the complete "let's open source everything" dogma of the past - from now on there will be more deliberate consideration of what to open source.
74 |
75 | **This Week's Buzz: Putting Open Source to Work with W&B**
76 |
77 | With all these incredible new models, the biggest question is: how can you actually use them? I'm incredibly proud to say that the team at Weights & Biases had all three of the big new Qwen models—Thinking, Instruct, and Coder—live on **W&B Inference** on day one ([link](http://wandb.me/inference?utm_source=thursdai&utm_medium=referral&utm_campaign=jul31))
78 |
79 | And our pricing is just unbeatable. Wolfram did a benchmark run that would have cost him **$150** using Claude Opus. On W&B Inference with the Qwen3-Thinking model, it cost him **22 cents**. That's not a typo. It's a game-changer for developers and researchers.
80 |
81 | To make it even easier, a listener of the show, Olaf Geibig, posted a [fantastic tutorial](https://x.com/olafgeibig/status/1949779562860056763) on how you can use our free credits and W&B Inference to power tools like Claude Code and VS Code using LiteLLM. It takes less than five minutes to set up and gives you access to state-of-the-art models for pennies. All you need to do is add [this](https://gist.github.com/olafgeibig/7cdaa4c9405e22dba02dc57ce2c7b31f) config to LiteLLM and run Claude Code (or VS Code) through it!
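
If you'd rather skip the proxy and call the endpoint directly, the service is OpenAI-compatible, so the usual client works with just the base URL and model swapped. A minimal sketch - note the endpoint URL and model id below are my assumptions, so check Olaf's gist and the W&B docs for the current values:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.wandb.ai/v1",  # assumed W&B Inference endpoint
    api_key="YOUR_WANDB_API_KEY",
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Thinking-2507",    # assumed model id
    messages=[{"role": "user", "content": "Write a haiku about cheap tokens."}],
)
print(resp.choices[0].message.content)
```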
82 |
83 | Give our inference service a try [here](http://wandb.me/inference?utm_source=thursdai&utm_medium=referral&utm_campaign=jul31) and give our main account [@weights_biases](http://x.com/weights_biases) a follow, as we often drop ways to get additional free credits when new models release.
84 |
85 | Vision & Video models
86 |
87 | Wan2.2: Open-Source MoE Video Generation Model Launches ([X](https://x.com/Alibaba_Wan/status/1949827662416937443), [HF](https://huggingface.co/Wan-AI))
88 |
89 | This is likely the best open source video model, but definitely the first MoE video model! It came out with text2video, image2video and a combined version.
90 |
91 | With 5-second 720p videos that can even be generated at home on a single 4090, this is definitely a step up in the quality of video models that are fully open source.
92 |
93 | Runway changes the game again - Gen-3 Aleph model for AI video editing / transformation ([X](https://x.com/blizaine/status/1950007468324491523), [X](https://x.com/runwayml/status/1950180894477529490))
94 |
95 | Look, there's simply no denying this: AI video has had an incredible year, from open source like Wan to proprietary models with sound like Veo 3. It's not surprising that we're seeing this trend, but it's definitely very exciting when we see an approach to editing like Runway's.
96 |
97 | This adds a chat interface to the model, and with it the ability to edit... anything in the scene. Remove or add people and environmental effects, see the same scene from a different angle, and a lot more!
98 |
99 | Expect personalized entertainment very soon!
100 |
101 | AI Art & Diffusion & 3D
102 |
103 | FLUX.1 Krea [dev] launches as a state-of-the-art open-weights text-to-image model ([X](https://x.com/bfl_ml/status/1950920537741336801), [HuggingFace](https://huggingface.co/black-forest-labs/FLUX.1-Krea-dev))
104 |
105 | Black Forest Labs teamed with Krea AI for Flux.1 Krea [dev], an open-weights text-to-image model ditching the "AI gloss" for natural, distinctive vibes—think DALL-E 2's quirky grain without the saturation. It outperforms open peers and rivals pros in prefs, fully Flux-compatible for LoRAs/tools. Yam and I geeked over the aesthetics frontier; it's a flexible base for fine-tunes, available on Hugging Face with commercial options via FAL/Replicate. If you're tired of cookie-cutter outputs, this breathes fresh life into generations.
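
Since the weights are Flux-compatible, a generation sketch with diffusers' FluxPipeline should look like the usual Flux recipe. The repo id matches the HF link above; the sampler settings are my guesses, not official defaults:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Krea-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "a sun-faded film photo of a capybara at a beach bonfire",
    num_inference_steps=28,   # guessed settings, tune to taste
    guidance_scale=4.5,
).images[0]
image.save("krea-dev.png")
```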
106 |
107 | Ideogram Character launches: one-shot character consistency for everyone ([X](https://x.com/ideogram_ai/status/1950255115753095307))
108 |
109 | Ideogram's Characters feature lets you upload one pic for instant, consistent variants—free for all, with inpainting to swap into memes/art. My tests nailed expressions/scenes (me in cyberpunk? Spot-on), though not always photoreal. Wolfram praised the accuracy; it's a meme-maker's dream! And they give you like 10 free generations, so give it a go.
110 |
111 | Tencent Hunyuan3D World Model 1.0 launches as the first open-source, explorable 3D world generator ([X](https://x.com/TencentHunyuan/status/1949288986192834718), [HF](https://huggingface.co/tencent/HunyuanWorld-1))
112 |
113 | Tencent's Hunyuan3D World Model 1.0 is the first open-source generator of explorable 3D worlds from text/image—360° immersive, exportable meshes for games/modeling. ~33GB VRAM on complex scenes, but Wolfram called it a metaverse step; I wandered a demo scene, loving the potential despite the rough edges. Integrate it into CG pipelines? Game-changer for VR/creators.
114 |
115 | Voice & Audio
116 |
117 | Look, I wasn't even mentioning this on the show, but it came across my feed just as I was about to wrap up ThursdAI, and it's really something. Riffusion joined forces with a producer and, using FUZZ-2, they now have a fully chattable studio producer - you can ask for... anything you would ask for in a studio!
118 |
119 | Here's my first reaction, and it's really fun. I think the invite code 'STUDIO' is still open... I'm not affiliated with them at all!
120 |
121 | Tools
122 |
123 | Ok, I promised some folks we'd add this in: Nisten went super [viral](https://x.com/nisten/status/1950620243258151122) last week using a new open source tool called Crush from CharmBracelet - an open-source agentic coding tool in the spirit of Claude Code - and it looks awesome!
124 |
125 | He gave a live demo on the show, including how to set it up with subagents etc. If you're into vibe coding and using the open source models, definitely give Crush a try - it's really flying and looks cool!
126 |
127 | Phew, ok, we somehow were able to cover ALL these releases this week, and we didn't even have an interview!
128 |
129 | Here's the TL;DR and links for the folks who subscribed (I'm trying a new thing to promote subs on this newsletter), and see you in two weeks (next week is Wolfram's turn again, as I'm somewhere in Europe!)
130 |
131 | ThursdAI - July 31st, 2025 - TL;DR
132 |
133 | * Hosts and Guests
134 |
135 | * **Alex Volkov** - AI Evangelist & Weights & Biases ([@altryne](https://x.com/altryne))
136 |
137 | * Co Hosts - [@WolframRvnwlf](https://x.com/WolframRvnwlf) [@yampeleg](https://x.com/yampeleg) [@nisten](http://x.com/nisten) [@ldj](https://x.com/ldjconfirmed)
138 |
139 | * Open Source LLMs
140 |
141 | * Zhipu drops GLM-4.5 355B (A32B) AI model ([X](https://x.com/Zai_org/status/1949831552189518044), [HF](https://huggingface.co/zai-org/GLM-4.5))
142 |
143 | * ARCEE AFM‑4.5B and AFM‑4.5B‑Base weights released ([X](https://x.com/LucasAtkins7/status/1950278100874645621), [HF](https://huggingface.co/arcee-ai/AFM-4.5B))
144 |
145 | * Qwen is on 🔥 - 3 new models:
146 |
--------------------------------------------------------------------------------
/Q1_2025_AI_Recap.md:
--------------------------------------------------------------------------------
1 | # ThursdAI Q1 2025 - AI Yearly Recap
2 | ## The Quarter That Changed Everything
3 |
4 | *Based on 13 ThursdAI episodes from January 2 - March 27, 2025*
5 |
6 | ---
7 |
8 | 
9 |
10 | ---
11 |
12 | ## 🔥 Quarter Overview
13 |
14 | Q1 2025 will be remembered as the quarter when **reasoning models went mainstream**, **open source AI exploded** (largely from Chinese labs), and **AI agents became practical**. DeepSeek R1 crashed the stock market, GPT-4o gave everyone the power to Ghibli-fy everything, and Gemini 2.5 Pro reclaimed the #1 spot on LLM benchmarks.
15 |
16 | ---
17 |
18 | ## 📅 January 2025 - "The Year of AI Agents Begins"
19 |
20 | ### 🎯 Top Stories
21 |
22 | #### 🐋 **DeepSeek R1 - The Shot Heard Around the World** (Jan 23)
23 | The most impactful open source release ever. DeepSeek dropped R1, their MIT-licensed reasoning model that:
24 | - **Crashed NVIDIA stock** by 17% ($560B loss - largest single-company monetary loss ever)
25 | - Hit **#1 on the iOS App Store**
26 | - Cost allegedly only **$5.5M to train** (sparking massive debate)
27 | - Matched OpenAI's o1 on reasoning benchmarks at **50x cheaper pricing**
28 | - Released 6 distilled versions (1.5B to 70B parameters)
29 | - **The 1.5B model beat GPT-4o and Claude 3.5 Sonnet on math benchmarks** 🤯
30 |
31 | > "My mom knows about DeepSeek—your grandma probably knows about it, too" - Alex Volkov
32 |
33 | #### 🤖 **OpenAI Operator** - First Agentic ChatGPT (Jan 23)
34 | OpenAI launched Operator, an agentic browser controller for ChatGPT Pro users. Built on the CUA (Computer Using Agent) model, it can:
35 | - Book reservations on OpenTable
36 | - Order groceries on Instacart
37 | - Browse the web autonomously
38 | - Still has reliability issues with captchas and logins
39 |
40 | #### 🌟 **Project Stargate** - $500B AI Infrastructure (Jan 23)
41 | OpenAI, SoftBank, and Oracle announced a $500B AI infrastructure project - essentially a "Manhattan Project for AI." 2% of US GDP committed to data centers and power plants.
42 |
43 | #### 💫 **NVIDIA CES Announcements** (Jan 9)
44 | - **Project Digits**: $3,000 desktop supercomputer that can run 200B parameter models
45 | - **COSMOS**: World foundation models for robot training
46 | - Jensen Huang declared we're at the "3rd scaling law" - test-time compute/reasoning
47 |
48 | #### 🎵 Open Source Breakthroughs
49 | | Release | Significance |
50 | |---------|-------------|
51 | | **Kokoro TTS** (82M params) | #1 on TTS Arena, Apache 2, runs in browser |
52 | | **MiniMax-01** (456B/45B active) | 4M context window from Hailuo |
53 | | **MiniCPM-o 2.6** | 8B omni-model: video streaming + audio on an iPad |
54 | | **Phi-4** | Microsoft's 14B model, MIT licensed, 40% synthetic data |
55 | | **ByteDance LatentSync** | SOTA lip-syncing model |
56 | | **ByteDance UI-TARS** | PC control models (2B/7B/72B) |
57 |
58 | #### 🔬 Other Major January Releases
59 | - **OpenAI o3-mini** - Reasoning model at 67% cheaper than o1
60 | - **Gemini Flash Thinking** - 1M token context with thinking traces
61 | - **Qwen 2.5 VL** - SOTA open source vision model
62 | - **Hunyuan 3D 2.0** - SOTA open source 3D generation
63 | - **YuE 7B** - Open source music generation (Apache 2)
64 | - **Humanity's Last Exam (HLE)** - New benchmark where top models score <10%
65 |
66 | ---
67 |
68 | ## 📅 February 2025 - "Reasoning Mania & Agent Awakening"
69 |
70 | ### 🎯 Top Stories
71 |
72 | #### 🔮 **OpenAI Deep Research** - A ChatGPT Moment (Feb 6)
73 | OpenAI released Deep Research, an agentic research tool powered by o3:
74 | - Performs multi-trajectory web searches
75 | - Can reason, backtrack, and synthesize across 100+ sources
76 | - Scores **26.6% on Humanity's Last Exam** (vs 10% for o1/R1)
77 | - Dr. Derya Unutmaz: "It wrote a phenomenal 25-page patent application that would've cost $10,000+"
78 | - Available for ChatGPT Pro ($200/mo) only
79 |
80 | #### 🧠 **Claude 3.7 Sonnet & Claude Code** - The Coding Beast (Feb 24-27)
81 | Anthropic dropped Claude 3.7 Sonnet alongside **Claude Code**, their AI coding assistant:
82 | - **70% on SWE-Bench** (coding benchmark)
83 | - 8x more output (64K tokens)
84 | - Integrated thinking/reasoning
85 | - #1 on WebDev Arena
86 | - First "hybrid" reasoning + chat model
87 | - **Claude Code** launched Feb 24 as Anthropic's agentic coding tool, later enhanced with Claude Sonnet 4.5 (Sep) and Claude Opus 4.5 (Nov)
88 |
89 | #### 🌐 **GPT-4.5 (Orion)** - The Largest Model (Feb 27)
90 | OpenAI shipped their largest model ever (rumored 10+ trillion parameters):
91 | - 62.5% on SimpleQA, 71.4% on GPQA
92 | - "Vibes" focused - better at creative writing, recommendations
93 | - Foundation for future reasoning models
94 | - 10x more expensive than GPT-4o
95 |
96 | #### 🎮 **Grok 3** - xAI Enters the Arena (Feb 20)
97 | xAI launched Grok 3:
98 | - Claims SOTA on multiple benchmarks
99 | - 1M token context window
100 | - Deep Search feature (competitor to Deep Research)
101 | - Free for now "until GPUs melt"
102 | - Trained on 100,000 GPUs
103 |
104 | #### 📋 **OpenAI Roadmap Revelation** (Feb 13)
105 | Sam Altman announced:
106 | - **GPT-4.5 will be the last non-chain-of-thought model**
107 | - **GPT-5 will unify GPT + o-series** into one system
108 | - **No standalone o3 release** - integrated into GPT-5
109 | - Goal: eliminate model picker entirely
110 |
111 | #### 🔧 February Open Source Highlights
112 | | Release | Significance |
113 | |---------|-------------|
114 | | **DeepSeek V3 open source tools** | FlashMLA, DeepEP, DualPipe released |
115 | | **Phi-4-multimodal** (5.6B) | Text, images, AND audio - beats Whisper v3 |
116 | | **Mercury Coder** | Diffusion LLM - 1000+ tokens/sec |
117 | | **Nomic Embed Text V2** | First MoE embedding model |
118 | | **DeepScaler 1.5B** | Beats o1-preview on math for $4,500 training |
119 | | **Perplexity R1 1776** | Censorship-free DeepSeek R1 finetune |
120 |
121 | #### 🎬 Vision & Video
122 | - **ByteDance OmniHuman-1** - Reality-bending avatar generation
123 | - **Alibaba WanX** - SOTA open source video generation
124 | - **StepFun Step-Video-T2V** - 30B text-to-video, MIT licensed
125 |
126 | #### 🎤 Voice & Audio
127 | - **11Labs Scribe** - Beats Whisper 3 on ASR
128 | - **Sesame Conversational AI** - Most human-like voice interactions
129 | - **HUME Octave** - Emotional TTS understanding
130 | - **Zonos** - Expressive TTS with voice cloning
131 |
132 | #### 💡 **"Vibe Coding"** - Karpathy Coins a New Era (Feb 2)
133 | Andrej Karpathy tweeted the term **"Vibe Coding"** on February 2, 2025 (5.2M views), capturing the new paradigm of AI-assisted development where developers describe *what* they want and let AI handle the implementation. The term went viral and became shorthand for the shift from traditional coding to conversational, agent-driven software development. Windsurf, Cursor, and other AI IDEs embraced the concept.
134 |
135 | ---
136 |
137 | ## 📅 March 2025 - "Google's Revenge & The Image Revolution"
138 |
139 | ### 🎯 Top Stories
140 |
141 | #### 👑 **Gemini 2.5 Pro Takes #1** (Mar 27)
142 | Google reclaimed the LLM crown with Gemini 2.5 Pro:
143 | - Tops benchmarks in reasoning, math, coding, and science
144 | - **AIME jumped nearly 20 points**
145 | - 1M token context window
146 | - "Thinking" integrated into core model (not separate mode)
147 | - Low latency despite power (~13 sec vs 45+ for others)
148 | - Tulsee Doshi from Google joined ThursdAI to discuss
149 |
150 | #### 🎨 **GPT-4o Native Image Generation** - Ghibli-mania (Mar 27)
151 | OpenAI enabled native image gen in GPT-4o:
152 | - **Auto-regressive** (not diffusion) - incredible prompt adherence
153 | - Perfect text rendering in images
154 | - Internet immediately Ghibli-fied everything
155 | - People recreating movie trailers (LOTR) purely through prompts
156 | - OpenAI shifted policy toward more creative freedom
157 |
158 | > "The internet lost its collective mind and turned everything into Studio Ghibli" - Alex Volkov
159 |
160 | #### 🔌 **MCP Won** - OpenAI Adopts Anthropic's Protocol (Mar 27)
161 | OpenAI officially adopted Model Context Protocol (MCP):
162 | - Prevents fragmentation (no VHS vs Betamax situation)
163 | - Tools work across Claude AND GPT
164 | - "MCP WON!" - biggest win for agent ecosystem interoperability
165 |
166 | #### 🐋 **DeepSeek V3 Update** (Mar 27)
167 | DeepSeek dropped a 685B parameter beast:
168 | - **AIME: 39.6 → 59.4 (+19.8 points)**
169 | - GPQA: 59.1 → 68.4
170 | - MIT Licensed
171 | - Better front-end development and tool use
172 | - Best non-reasoning open model
173 |
174 | #### 🔊 **OpenAI Voice Revolution** (Mar 20)
175 | OpenAI launched next-gen audio models:
176 | - **GPT-4o Transcribe** - Promptable ASR (!)
177 | - **GPT-4o Mini TTS** - Can prompt for emotions
178 | - **Semantic VAD** - Understands when you're finished speaking
179 | - [openai.fm](http://openai.fm) testing interface
180 |
181 | #### 📷 **Google Gemini Native Image Gen** (Mar 13)
182 | Gemini Flash got native image generation:
183 | - Direct image editing through conversation
184 | - Interactive image/text creation
185 | - Future of creative tools
186 |
187 | #### 🆓 **ThursdAI Turns 2!** (Mar 13)
188 | Two years since the first episode about GPT-4!
189 |
190 | #### 🌐 **Google AI Mode** in Search (Mar 6)
191 | Google launched AI Mode in Search:
192 | - Gemini 2.0 powered
193 | - "Fan-out queries" - Google searching within Google
194 | - Real-time data with conversational interface
195 |
196 | #### 🧪 March Open Source Highlights
197 | | Release | Significance |
198 | |---------|-------------|
199 | | **Gemma 3** (1B-27B) | 128K context, multimodal, 140+ languages, single GPU |
200 | | **QwQ-32B** | Qwen's reasoning model - matches R1 on some evals, runs on Mac |
201 | | **Mistral Small 3.1** | 24B, beats Gemma 3, multimodal, Apache 2 |
202 | | **Qwen2.5-Omni-7B** | End-to-end multimodal: text, image, audio, video → text + speech |
203 | | **OLMo 2 32B** | Allen AI's fully open model - beats GPT-4o mini |
204 | | **Reka Flash 3** | 21B reasoner, Apache 2, trained with RLOO |
205 | | **Cohere Command A** (111B) | 256K context, only 2 GPUs needed |
206 | | **NVIDIA Nemotron** (8B/49B) | Reasoning toggle via system prompt |
207 |
208 | #### 🎨 Image/Art Releases
209 | - **Reve Image 1.0** - Claims SOTA, ~1¢ per image
210 | - **Ideogram 3.0** - Strong text/logos, style refs
211 | - **Hunyuan 3D 2.0 MV/Turbo** - Near real-time 3D (<1 sec on H100)
212 |
213 | #### 👁️ Vision
214 | - **Roboflow RF-DETR** - SOTA object detection, Apache 2
215 | - **RF100-VL** - New VLM benchmark (current models get ~6%)
216 |
217 | #### 🔧 MCP & Tools
218 | - **W&B Weave MCP Server** - Chat with your evaluations
219 | - **MLX-Audio v0.0.3** - TTS on Apple Silicon
220 |
221 | ---
222 |
223 | ## 📊 Quarter Summary: Major Themes
224 |
225 | ### 1. 🧠 **Reasoning Models Go Mainstream**
226 | - DeepSeek R1 demonstrated reasoning doesn't need massive scale
227 | - OpenAI committed to unifying reasoning with base models
228 | - Small models (1.5B) can beat GPT-4o on math with RL
229 |
230 | ### 2. 🇨🇳 **Chinese Labs Dominate Open Source**
231 | - DeepSeek, Alibaba (Qwen), MiniMax, ByteDance
232 | - Most open weights now come from China
233 | - Despite chip export restrictions
234 |
235 | ### 3. 🤖 **2025 Is The Year of Agents**
236 | - OpenAI Operator launched
237 | - MCP protocol won standardization battle
238 | - CrewAI, Open Hands, browser-use proliferating
239 | - Every major lab investing in agents
240 |
241 | ### 4. 🖼️ **Image Generation Revolution**
242 | - GPT-4o native image gen (auto-regressive, perfect text)
243 | - Gemini native image gen
244 | - Ghibli-mania swept the internet
245 |
246 | ### 5. 💰 **Massive Infrastructure Investment**
247 | - Project Stargate: $500B
248 | - NVIDIA Project Digits: $3K supercomputer at home
249 |
250 | ### 6. 📈 **Benchmark Saturation**
251 | - MMLU and Math getting saturated
252 | - New benchmarks: Humanity's Last Exam, ARC-AGI 2, RF100-VL
253 | - HLE: top models score <10%
254 | - ARC-AGI 2: thinking models only 4%
255 |
256 | ---
257 |
258 | ## 🏆 Q1 2025: Biggest Releases by Month
259 |
260 | ### January
261 | 1. **DeepSeek R1** - Open source reasoning revolution
262 | 2. **Project Stargate** - $500B AI infrastructure
263 | 3. **OpenAI Operator** - Agentic ChatGPT
264 | 4. **Kokoro TTS** - 82M param SOTA TTS
265 | 5. **MiniMax-01** - 4M context window
266 |
267 | ### February
268 | 1. **OpenAI Deep Research** - PhD-level research agent
269 | 2. **Claude 3.7 Sonnet & Claude Code** - 70% SWE-Bench + Anthropic's coding assistant
270 | 3. **GPT-4.5 (Orion)** - Largest model ever
271 | 4. **Grok 3** - xAI's contender
272 | 5. **Karpathy's "Vibe Coding"** - Feb 2 tweet coined the AI coding paradigm (5.2M views)
273 | 6. **OpenAI Roadmap** - GPT-5 will unify everything
274 |
275 | ### March
276 | 1. **Gemini 2.5 Pro** - #1 LLM again
277 | 2. **GPT-4o Native Image Gen** - Ghibli-mania
278 | 3. **OpenAI adopts MCP** - Protocol standardization
279 | 4. **DeepSeek V3 685B** - Open source giant
280 | 5. **Gemma 3** - Best open source multimodal
281 |
282 | ---
283 |
284 | *"Open source AI has never been as hot as this quarter. We're accelerating as f*ck, and it's only just beginning—hold on to your butts."* - Alex Volkov, ThursdAI
285 |
286 | ---
287 |
288 | *Generated from ThursdAI newsletter content. For full coverage, visit [thursdai.news](https://thursdai.news)*
289 |
--------------------------------------------------------------------------------
/agents.md:
--------------------------------------------------------------------------------
1 | # ThursdAI Infographic Prompt Creator Agent
2 |
3 | ## 🎯 Purpose & Role
4 |
5 | You are an expert infographic prompt creator for **Nano Banana Pro**, specializing in creating visually stunning, information-dense infographic prompts for the **ThursdAI podcast** — a weekly AI news show hosted by Alex Volkov (@altryne).
6 |
7 | Your task is to transform raw podcast notes, newsletter writeups, or bullet-point summaries into highly detailed, creative prompts that generate beautiful infographics for social media (YouTube, X/Twitter, LinkedIn).
8 |
9 | ---
10 |
11 | ## 📚 Reference Materials
12 |
13 | Before creating any prompt, review these resources in this folder:
14 |
15 | | File | Purpose |
16 | |------|---------|
17 | | `Prompting guide.md` | Official Nano Banana Pro prompting strategies & best practices |
18 | | `ThursdAI Thanksgiving Infographic prompt.md` | Example: 16:9 horizontal format with bands/sections |
19 | | `Open revol infographic prompt.md` | Example: 9:16 vertical "war room" style with split narrative |
20 | | `Another infographic prompt.md` | Example: Bloomberg Terminal / data dashboard aesthetic |
21 |
22 | **Key Principle:** Use these as STYLE and STRUCTURE references only. Never copy the actual news/topics from them — always extract fresh content from the user's input.
23 |
24 | ---
25 |
26 | ## 🔧 How to Use This Agent
27 |
28 | ### Input You'll Receive
29 | The user (Alex Volkov) will provide one of:
30 | - Raw podcast show notes with bullet points
31 | - ThursdAI newsletter writeup
32 | - Voice transcription notes
33 | - A combination of the above
34 |
35 | ### Your Output
36 | A complete, ready-to-use Nano Banana Pro prompt that:
37 | 1. Extracts the most important and newsworthy topics
38 | 2. Organizes them into logical sections
39 | 3. Creates a visually compelling infographic design
40 | 4. Includes all necessary visual direction and style cues
41 |
42 | ---
43 |
44 | ## 📋 Information Extraction Process
45 |
46 | ### Step 1: Identify the Episode Date & Title
47 | - Look for dates in the notes (e.g., "Dec 11, 2025" or "this Thursday")
48 | - Create a compelling 1-line episode title that captures the main narrative
49 | - Good titles use tension, contrast, or drama: "Code Red vs. Open Revolt", "The Open Source Surge", "AI's Christmas Chaos"
50 |
51 | ### Step 2: Categorize Topics into ThursdAI Segments
52 |
53 | ThursdAI typically covers these segments — identify which apply:
54 |
55 | | Segment | What to Look For |
56 | |---------|------------------|
57 | | 🔓 **Open Source AI** | New open-weight models, HuggingFace releases, Apache/MIT licensed models |
58 | | 🏢 **Big Companies & APIs** | OpenAI, Google, Anthropic, Amazon, Microsoft announcements |
59 | | 🎬 **Vision & Video** | Video generation models, image models, multimodal updates |
60 | | 🔊 **Voice & Audio** | TTS, STT, voice cloning, audio generation |
61 | | 🤖 **Agents & Tools** | Agent frameworks, MCP, computer use, tool calling |
62 | | 🚨 **Breaking News** | Time-sensitive announcements that dropped during/before the show |
63 | | 💬 **Notable Quotes** | Memorable statements from guests or hosts |
64 | | 🎤 **Interview Spotlight** | Featured guest and their key topics |
65 |
66 | ### Step 3: Prioritize by Impact
67 | Rank topics by:
68 | 1. **Breaking/exclusive news** (highest priority)
69 | 2. **Major model releases** (especially open source)
70 | 3. **Benchmark-breaking performance**
71 | 4. **Industry drama or strategic shifts**
72 | 5. **Interesting tangents the hosts went on**
73 |
74 | ---
75 |
76 | ## 🎨 Visual Design Principles
77 |
78 | ### Format Options
79 | - **16:9 Horizontal** — Best for YouTube thumbnails, stream overlays, Twitter cards
80 | - **9:16 Vertical** — Best for Instagram Stories, TikTok, mobile-first viewing
81 |
82 | ### Color Palette Patterns
83 |
84 | ```
85 | Base: Deep navy/charcoal/obsidian (#0f172a, #1e293b)
86 |
87 | Accent by Category:
88 | ├─ Open Source: Electric teal (#06b6d4), Emerald (#10b981), Neon green
89 | ├─ Big Labs: Amber (#f59e0b), Coral (#f97316), Warm orange
90 | ├─ Video/Image: Violet (#8b5cf6), Magenta (#ec4899)
91 | ├─ Breaking News: Hot red, Warning amber
92 | └─ Neutral/Tools: Silver, Cool gray, White
93 | ```
94 |
95 | ### Typography Direction
96 | - **Headers:** Bold, modern sans-serif (suggest: Inter, DM Sans, Satoshi, Space Grotesk)
97 | - **Stats/Numbers:** Monospace for tabular data (suggest: JetBrains Mono, IBM Plex Mono)
98 | - **Body text:** Clean, highly legible at small sizes
99 |
100 | ### Layout Patterns That Work
101 |
102 | **Pattern 1: Split Narrative (for contrasting stories)**
103 | ```
104 | ┌─────────────────────────────────────────┐
105 | │ HEADER + HOST │
106 | ├───────────────────┬─────────────────────┤
107 | │ SIDE A │ SIDE B │
108 | │ (warm colors) │ (cool colors) │
109 | │ Big Labs │ Open Source │
110 | ├───────────────────┴─────────────────────┤
111 | │ VIDEO & IMAGE STRIP │
112 | ├─────────────────────────────────────────┤
113 | │ CTA FOOTER │
114 | └─────────────────────────────────────────┘
115 | ```
116 |
117 | **Pattern 2: Horizontal Bands (for multiple segments)**
118 | ```
119 | ┌─────────────────────────────────────────┐
120 | │ HEADER + HOST │
121 | ├─────────────────────────────────────────┤
122 | │ HEADLINERS (biggest stories) │
123 | ├─────────────────────────────────────────┤
124 | │ OPEN SOURCE & VIDEO (medium) │
125 | ├─────────────────────────────────────────┤
126 | │ TOOLS & ART (smaller cards) │
127 | ├─────────────────────────────────────────┤
128 | │ CTA FOOTER │
129 | └─────────────────────────────────────────┘
130 | ```
131 |
132 | **Pattern 3: Dashboard/Terminal (for data-heavy episodes)**
133 | ```
134 | ┌─────────────────────────────────────────┐
135 | │ TICKER HEADER with scrolling stats │
136 | ├─────┬─────┬─────┬─────┬─────┬───────────┤
137 | │CARD │CARD │CARD │CARD │CARD │ FEATURE │
138 | │ │ │ │ │ │ PANEL │
139 | ├─────┴─────┴─────┴─────┴─────┴───────────┤
140 | │ INTERVIEW SPOTLIGHT │
141 | ├─────────────────────────────────────────┤
142 | │ CTA FOOTER │
143 | └─────────────────────────────────────────┘
144 | ```
145 |
146 | ---
147 |
148 | ## 👤 Host Avatar Requirements
149 |
150 | **CRITICAL:** Every infographic must include Alex Volkov (the host).
151 |
152 | The user will provide a reference image. In your prompt, include:
153 |
154 | ```
155 | - Use the reference image for Alex Volkov's face, beard, and hairstyle
156 | - Style: Clean vector cartoon / stylized portrait (not photorealistic)
157 | - Attire: Dark hoodie with headphones around neck OR lapel mic
158 | - Expression: Energetic, engaged, presenting/gesturing toward the content
159 | - Positioning: Header area, left or right side, integrated into the design
160 | - Add a subtle thematic element near Alex matching the episode's vibe
161 | ```
162 |
163 | ### Pose Suggestions by Episode Type
164 | - **Breaking news:** Alex looking surprised or urgent
165 | - **Major release:** Alex pointing at the headline
166 | - **Interview episode:** Alex with a mic icon, welcoming gesture
167 | - **Holiday/special:** Add seasonal motifs around Alex
168 |
169 | ---
170 |
171 | ## 🏷️ Logo & Brand Usage
172 |
173 | **Instruct the model to use recognizable company/project logos and icons:**
174 |
175 | ### AI Lab Logos to Reference
176 | - OpenAI (stylized "O" or hexagon)
177 | - Anthropic (abstract A / orange-brown tones)
178 | - Google/DeepMind (Google colors, Gemini sparkle)
179 | - Meta (infinity symbol)
180 | - Mistral (wind/breeze motif)
181 | - DeepSeek (crystal/gem prism)
182 | - Amazon/AWS (smile arrow, orange)
183 |
184 | ### Visual Proxies for Concepts
185 | - Open source → Unlocked padlock, open book, branching nodes
186 | - Closed/proprietary → Locked vault, corporate towers
187 | - Speed → Lightning bolt, stopwatch
188 | - Scale → Stacked layers, mountain peaks
189 | - Agents → Robot with tools, terminal windows
190 | - Video → Film strip, play button, motion lines
191 | - Audio → Waveforms, microphone, speaker
192 | - Benchmarks → Medals, trophies, leaderboard bars
193 |
194 | **Note:** Tell the model "Use abstract/stylized icons — no exact trademarked logos"
195 |
196 | ---
197 |
198 | ## 📝 Prompt Template Structure
199 |
200 | Use this structure when writing prompts:
201 |
202 | ```markdown
203 | # EPISODE TITLE & METADATA
204 | - Full title with date
205 | - Subtitle/tagline
206 | - Host attribution
207 |
208 | # OVERALL VIBE & STYLE
209 | - Describe the aesthetic metaphor (e.g., "Bloomberg Terminal meets movie poster")
210 | - Specify what to AVOID (previous styles, clichés)
211 | - Art style direction (vector, flat, gradients, etc.)
212 |
213 | # COLOR PALETTE
214 | - Base colors with hex codes
215 | - Accent colors by category
216 | - How colors should separate sections
217 |
218 | # HEADER SECTION
219 | - Title treatment
220 | - Host avatar placement and styling
221 | - Any thematic motifs
222 |
223 | # MAIN CONTENT SECTIONS
224 | For each section/panel:
225 | - Section title
226 | - Visual icon description
227 | - Key information to display
228 | - Formatting hints (bullet structure, stats, quotes)
229 |
230 | # SECONDARY ELEMENTS
231 | - Ticker bars
232 | - Interview spotlights
233 | - Supporting cards
234 |
235 | # FOOTER & CTA
236 | - Branding elements
237 | - Call to action text
238 | - Social handles and links
239 |
240 | # TECHNICAL NOTES
241 | - Resolution (typically 4K)
242 | - Legibility priorities
243 | - Things to avoid
244 | ```
245 |
246 | ---
247 |
248 | ## ✅ Prompt Quality Checklist
249 |
250 | Before delivering the prompt, verify:
251 |
252 | - [ ] **Date is included** in the title/header
253 | - [ ] **Episode title** is catchy and captures the main narrative
254 | - [ ] **Alex Volkov** is referenced with clear styling instructions
255 | - [ ] **All major topics** from the notes are represented
256 | - [ ] **Visual hierarchy** is defined (what's biggest to smallest)
257 | - [ ] **Color palette** has specific hex codes
258 | - [ ] **Icons/logos** are described for each topic
259 | - [ ] **Style direction** is clear and specific
260 | - [ ] **Aspect ratio** is specified (16:9 or 9:16)
261 | - [ ] **CTA/footer** includes @altryne, thursdai.news, YouTube/X mentions
262 | - [ ] **No vague language** — every element has concrete description
263 | - [ ] **Natural language** used throughout (not keyword soup)
264 |
265 | ---
266 |
267 | ## 🚫 Common Mistakes to Avoid
268 |
269 | 1. **Tag soup prompts** — ❌ "AI, podcast, neon, 4k, trending"
270 | 2. **Vague descriptions** — ❌ "Make it look cool and techy"
271 | 3. **Missing hierarchy** — Every element should have a size/importance level
272 | 4. **Forgetting the host** — Alex must be in every infographic
273 | 5. **Ignoring the date** — Always include episode date prominently
274 | 6. **Copying old content** — Extract FRESH topics from user's notes
275 | 7. **Too much text** — Infographics should be visual-first, text should be concise
276 | 8. **Generic backgrounds** — Specify unique background treatments per episode
277 | 9. **Missing context** — Tell the model WHO this is for (tech-savvy AI audience)
278 |
279 | ---
280 |
281 | ## 💡 Style Variations to Explore
282 |
283 | Rotate through different aesthetics to keep infographics fresh:
284 |
285 | | Style | When to Use |
286 | |-------|-------------|
287 | | **Bloomberg Terminal** | Data-heavy episodes, benchmark comparisons |
288 | | **Movie Poster** | Dramatic narrative episodes ("AI wars") |
289 | | **Tech Conference** | Product launches, keynote recaps |
290 | | **Magazine Cover** | Interview-focused episodes |
291 | | **War Room / Command Center** | Breaking news, competitive dynamics |
292 | | **Seasonal/Holiday** | Special episodes (Thanksgiving, New Year) |
293 | | **Retro-Tech** | Throwback vibes, scanlines, halftone |
294 | | **Minimalist Dashboard** | Clean, modern, Apple-style |
295 |
296 | ---
297 |
298 | ## 🔄 Iterative Refinement
299 |
300 | Nano Banana Pro excels at conversational editing. After generating an initial image:
301 |
302 | - "Make the DeepSeek panel larger and add more benchmark numbers"
303 | - "Change the color of the Open Source section to more vibrant teal"
304 | - "Add snow effects for the December episode"
305 | - "Make Alex's pose more excited, like he's presenting breaking news"
306 |
307 | Include in your prompt: "This design should support iterative refinement — the model should be able to adjust individual sections on follow-up requests."
308 |
309 | ---
310 |
311 | ## 📢 Branding Elements (Always Include)
312 |
313 | ```
314 | ThursdAI Branding:
315 | - Show name: "ThursdAI" or "ThursdAI Weekly"
316 | - Host: Alex Volkov (@altryne)
317 | - Website: thursdai.news
318 | - Platforms: "Live on YouTube & X"
319 | - Tagline options:
320 | • "Weekly AI Intelligence Report"
321 | • "Your Weekly AI Deep Dive"
322 | • "AI Engineer Podcast"
323 |
324 | Co-hosts (if applicable): @WolframRvnwlf @yampeleg @nisten @ldjconfirmed
325 | ```
326 |
327 | ---
328 |
329 | ## 🎬 Ready to Create
330 |
331 | When the user provides podcast notes, follow this workflow:
332 |
333 | 1. **Scan** for date, major announcements, breaking news
334 | 2. **Categorize** topics into ThursdAI segments
335 | 3. **Rank** by importance and visual impact
336 | 4. **Choose** appropriate layout pattern and style
337 | 5. **Write** detailed Nano Banana Pro prompt
338 | 6. **Verify** against quality checklist
339 | 7. **Deliver** the complete prompt, ready for generation
340 |
341 | ---
342 |
343 | *This agent was designed for maximum infographic quality and consistency. For best results, always provide a reference image of Alex Volkov and as much detail from the podcast notes as possible.*
344 |
345 |
346 |
347 |
348 |
--------------------------------------------------------------------------------
/2025_episodes/Q3 2025/July 2025/_ThursdAI_-_July_24_2025_-_Qwen-mas_in_July_The_White_Houses_AI_Action_Plan_Math_Olympiad_Gold_for_A.md:
--------------------------------------------------------------------------------
1 | # 📆 ThursdAI - July 24, 2025 - Qwen-mas in July, The White House's AI Action Plan & Math Olympiad Gold for AIs + coding a 3d tetris on stream
2 |
3 | **Date:** July 24, 2025
4 | **Duration:** 1:43:23
5 | **Link:** [https://sub.thursdai.news/p/thursdai-july-24-2025-qwen-mas-in](https://sub.thursdai.news/p/thursdai-july-24-2025-qwen-mas-in)
6 |
7 | ---
8 |
9 | ## Description
10 |
11 | What a WEEK! Qwen-mass in July. Folks, AI doesn't seem to be wanting to slow down, especially Open Source! This week we see yet another jump on SWE-bench verified (3rd week in a row?) this time from our friends at Alibaba Qwen.
12 |
13 | It was a pleasure to host Junyang Lin from the team at Alibaba, who came to chat with us about their incredible release of not one but three new models!
14 |
15 | Then, we had a great chat with Joseph Nelson from Roboflow, who not only dropped additional SOTA models, but was also in Washington at the announcement of the new AI Action Plan from the White House.
16 |
17 | Great conversations this week, as always, TL;DR in the end, tune in!
18 |
19 | Open Source AI - QwenMass in July
20 |
21 | This week, the open-source world belonged to our friends at Alibaba Qwen. They didn't just release one model; they went on an absolute tear, dropping bomb after bomb on the community and resetting the state-of-the-art multiple times.
22 |
23 | **A "Small" Update with Massive Impact: Qwen3-235B-A22B-Instruct-2507**
24 |
25 | Alibaba called this a *minor* refresh of their 235B parameter mixture-of-experts.
26 |
27 | Sure—if you consider +13 points on GPQA and a 256K context window minor. The 2507 drops hybrid thinking. Instead, Qwen now ships separate instruct and chain-of-thought models, avoiding token bloat when you just want a quick answer. Benchmarks? 81% MMLU-Redux, 70% LiveCodeBench, new SOTA on BFCL function-calling. All with 22B active params.
28 |
29 | Our friend of the pod and head of development at Alibaba Qwen, Junyang Lin, joined the pod and talked to us about their decision to decouple this model from the hybrid reasoner Qwen3.
30 |
31 | "After talking with the community and thinking it through," he said, "we decided to stop using hybrid thinking mode. Instead, we'll train instruct and thinking models separately so we can get the best quality possible."
32 |
33 | The community felt the hybrid model sometimes had conflicts and didn't always perform at its best. So, Qwen delivered a pure non-reasoning instruct model, and the results are staggering. Even without explicit reasoning, it's crushing benchmarks. Wolfram tested it on his MMLU-Pro benchmark and it got the top score of all open-weights models he's ever tested. Nisten saw the same thing on medical benchmarks, where it scored the highest on MedMCQA. This thing is a beast, getting a massive 77.5 on GPQA (up from 62.9) and 51.8 on LiveCodeBench (up from 32). This is a huge leap forward, and it proves that a powerful, well-trained instruct model can still push the boundaries of reasoning.
34 |
35 | **The New (open) King of Code: Qwen3-Coder-480B** ([X](https://x.com/Alibaba_Qwen/status/1947766835023335516), [Try It](https://wandb.me/qcoder-colab), [HF](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct))
36 |
37 | Just as we were catching our breath, they dropped the main event: **Qwen3-Coder**. This is a 480-billion-parameter coding-specific behemoth (35B active) trained on a staggering 7.5 trillion tokens, with a 70% code ratio, that gets a new SOTA on SWE-bench verified with 69.6% (just a week after Kimi got SOTA with 65% and 2 weeks after Devstral's SOTA of 53% 😮)
38 |
39 | To get this model to SOTA, Junyang explained they used reinforcement learning with over 20,000 parallel sandbox environments. This allows the model to interact with the environment, write code, see the output, get the reward, and learn from it in a continuous loop. The results speak for themselves.
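
To picture that loop, here's a hedged sketch of the outer training iteration. All names here are illustrative, since Qwen hasn't published their actual stack:

```python
from concurrent.futures import ThreadPoolExecutor

def rollout(policy, sandbox, task):
    """One episode: the model edits code, runs it, observes the output,
    and the final test results become the reward."""
    state = sandbox.reset(task)
    steps = []
    while not state.done:
        action = policy.act(state)    # e.g. "edit file", "run tests"
        state = sandbox.step(action)  # execute inside the sandbox
        steps.append((state, action))
    return steps, sandbox.reward()    # e.g. fraction of tests passing

def train_iteration(policy, sandboxes, tasks):
    # Thousands of sandboxes run concurrently; rewards are pooled into
    # one policy update -- the continuous loop Junyang described.
    with ThreadPoolExecutor(max_workers=len(sandboxes)) as pool:
        episodes = list(pool.map(lambda st: rollout(policy, *st),
                                 zip(sandboxes, tasks)))
    policy.update(episodes)
```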
40 |
41 | With long-context abilities (256K natively, extendable up to 1M with YaRN), this coding beast tops the charts and achieves Sonnet-level performance for significantly less cost!
42 |
43 | Both models supported day-1 on W&B Inference ([X](https://x.com/weights_biases/status/1947859654400434538), [Get Started](https://wandb.me/qcoder-colab))
44 |
45 | I'm very, very proud to announce that both these incredible models get Day-1 support on our W&B Inference (and that yours truly is now part of deciding which models we host!)
46 |
47 | With unbeatable prices ($0.10/$0.10 input/output 1M for A22B, $1/$1.5 for Qwen3 Coder) and speed, we are hosting these models at full precision to give you the maximum possible intelligence and the best bang for your buck!
48 |
49 | Nisten set up our (OpenAI-compatible) endpoint with his Cline coding assistant and built a 3D Tetris game live on the show - and it absolutely flew.
50 |
51 | This demo perfectly captures the convergence of everything we're excited about: a state-of-the-art open-source model, running on a blazing-fast inference service, integrated into a powerful open-source tool, creating something complex and interactive in seconds.
52 |
53 | If you want to try this yourself, we're giving away credits for W&B Inference. Just find our [announcement tweet](https://x.com/weights_biases/status/1947859654400434538) for the Qwen models on the **@weights_biases** X account and reply with **"coding capybara"** (a nod to Qwen's old mascot!). Add "ThursdAI" and I'll personally make sure you get bumped up the list!
54 |
57 | Big Companies & APIs
58 |
59 | **America's AI Action Plan: A New Space Race for AI Dominance** ([ai.gov](https://ai.gov))
60 |
61 | Switching gears to policy, I was excited to cover the White House's newly unveiled "America's AI Action Plan." This 25-page strategy, dropped this week, frames AI as a national priority on par with the space race or Cold War, aiming to secure U.S. dominance with 90 policy proposals. I was thrilled to have Joseph Nelson from Roboflow join us fresh from the announcement event in Washington, sharing the room's energy and insights. The plan pushes for deregulation, massive data center buildouts, workforce training, and—most exciting for us—explicit support for open-source and open-weight models. It's a bold move to counter global competition, especially from China, while fast-tracking infrastructure like chip fabrication and energy grids.
62 |
63 | Joseph broke down the vibe at the event, including a surreal moment where the President riffed on Nvidia’s market dominance right in front of Jensen Huang. But beyond the anecdotes, what strikes me is the plan’s call for startups and innovation—think grants and investments via the Department of Defense and Small Business Administration. It’s like a request for new AI companies to step up. As someone who’s railed against past moratorium fears on this show, seeing this pro-innovation stance is a huge relief.
64 |
65 | **🔊 Voice & Audio – Higgs Audio v2 Levels Up (**[**X**](https://x.com/reach_vb/status/1947997596456272203)**)**
66 |
67 | Boson AI fused a 3B-param Llama 3.2 with a 2.2B audio Dual-FFN and trained on ten million hours of speech + music. Result: Higgs Audio v2 beats GPT-4o-mini and ElevenLabs v2 on prosody, does zero-shot multi-speaker dialog, and even hums melodies. The demo runs on a single A100 and sounds pretty good.
68 |
69 | The first demo I played was not super impressive, but the laugh track made up for it!
70 |
71 | **🤖 A Week with ChatGPT Agent**
72 |
73 | Last week, OpenAI dropped the ChatGPT Agent on us during our stream, and now we've had a full week to play with it. It's a combination of their browser-operating agent and their deeper research agent, and the experience is pretty wild.
74 |
75 | Yam had it watching YouTube videos and scouring Reddit comments to create a comparison of different CLI tools. He was blown away, seeing the cursor move around and navigate complex sites right on his phone.
76 |
77 | I put it through its paces as well. I tried to get it to order flowers for my girlfriend (it got all the way to checkout!), and it successfully found and filled out the forms for a travel insurance policy I needed. My ultimate test ([live stream here](https://x.com/altryne/status/1948111176203911222)), however, was asking it to prepare the show notes for ThursdAI, a complex task involving summarizing dozens of my X bookmarks. It did a decent job (a solid C/B), but still needed my intervention. It's not quite a "fire-and-forget" tool for complex, multi-step tasks yet, but it's a huge leap forward. As Yam put it, "This is the worst that agents are going to be." And that's an exciting thought.
78 |
79 | What a week. From open-source models that rival the best closed-source giants to governments getting serious about AI innovation, the pace is just relentless. It's moments like Nisten's live demo that remind me why we do this show—to witness and share these incredible leaps forward as they happen. We're living in an amazing time.
80 |
81 | Thank you for being a ThursdAI subscriber. As always, here's the TL;DR and show notes for everything that happened in AI this week.
82 |
85 | TL;DR and Show Notes
86 |
87 | * **Hosts and Guests**
88 |
89 | * **Alex Volkov** - AI Evangelist & Weights & Biases ([@altryne](http://x.com/altryne))
90 |
91 | * **Co-Hosts** - [@WolframRvnwlf](http://x.com/WolframRvnwlf), [@yampeleg](http://x.com/yampeleg), [@nisten](http://x.com/nisten), [@ldjconfirmed](http://x.com/ldjconfirmed)
92 |
93 | * **Junyang Lin** - Qwen Team, Alibaba ([@JustinLin610](https://x.com/JustinLin610))
94 |
95 | * **Joseph Nelson** - Co-founder & CEO, Roboflow ([@josephnelson](https://x.com/josephnelson))
96 |
97 | * **Open Source LLMs**
98 |
99 | * Sapient Intelligence releases **Hierarchical Reasoning Model (HRM)**, a tiny 27M param model with impressive reasoning on specific tasks ([X](https://x.com/makingAGI/status/1947286324735856747), [arXiv](https://arxiv.org/abs/2506.21734)).
100 |
101 | * Qwen drops a "little" update: **Qwen3-235B-A22B-Instruct-2507**, a powerful non-reasoning model ([X](https://x.com/JustinLin610/status/1947364588340523222), [HF Model](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)).
102 |
103 | * Qwen releases the new SOTA coding agent model: **Qwen3-Coder-480B-A35B-Instruct** ([X](https://x.com/Alibaba_Qwen/status/1947790753414369280), [HF Model](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct)).
104 |
105 | * **Hermes-Reasoning Tool-Use dataset** with 51k tool-calling examples is released ([X](https://x.com/intstr1Irinja/status/1947444760393773185), [HF Dataset](https://huggingface.co/datasets/interstellarninja/hermes_reasoning_tool_use)).
106 |
107 | * NVIDIA releases updates to their **Nemotron** reasoning models.
108 |
109 | * **Big CO LLMs + APIs**
110 |
111 | * The White House unveils **"America’s AI Action Plan"** to "win the AI race" ([X](https://x.com/NetChoice/status/1948042669906624554), [White House PDF](https://www.whitehouse.gov/wp-content/uploads/2025/07/Americas-AI-Action-Plan.pdf)).
112 |
113 | * Both **OpenAI** ([X](https://x.com/alexwei_/status/1946477742855532918)) and **Google DeepMind** win Gold at the International Math Olympiad (IMO), with **ByteDance's Seed-Prover** taking Silver ([GitHub](https://github.com/ByteDance-Seed/Seed-Prover)).
114 |
115 | * The AI math breakthrough has a "gut punch" effect on the math community ([Dave White on X](https://x.com/Dave_White_/status/1947461492783386827)).
116 |
117 | * Google now processes over **980 trillion tokens** per month across its services.
118 |
119 | * A week with **ChatGPT Agent**: testing its capabilities on real-world tasks.
120 |
121 | * **This Week's Buzz**
122 |
123 | * Day 0 support for both new Qwen models on **W&B Inference** ([Try it](https://wandb.ai/inference), [Colab](https://wandb.me/qcoder-colab)). Reply to our [tweet](https://x.com/weightsandbiases) with "coding capybara ThursdAI" for credits!
124 |
125 | * Live on-stream demo of Qwen3-Coder building a 3D Tetris game using Cline.
126 |
127 | * **Interesting Research**
128 |
129 | * Researchers discover **subliminal learning** in LLMs, where traits are passed through seemingly innocuous data ([X](https://x.com/0wain_evans/status/1947709848103255232), [arXiv](https://arxiv.org/abs/2507.14805)).
130 |
131 | * Apple proposes **multi-token prediction**, speeding up LLMs by up to 5x without quality loss ([X](https://x.com/JacksonAtkinsX/status/1947408593638002639), [arXiv](https://arxiv.org/abs/2507.11851)).
132 |
133 | * **Voice & Audio**
134 |
135 | * Boson AI open-sources **Higgs Audio v2**, a unified TTS model that beats GPT-4o-mini and ElevenLabs ([X](https://x.com/reach_vb/status/1947997596456272203), [HF Model](https://huggingface.co/bosonai/higgs-audio-v2-generation-3B-base)).
136 |
137 | * **AI Art & Diffusion & 3D**
138 |
139 | * Decart AI Releases **MirageLSD**, a real-time live-stream diffusion model for instant video transformation ([X Post](https://x.com/DecartAI/status/1945947692871692667)).
140 |
141 | * **Tools**
142 |
143 | * Qwen releases **qwen-code**, a CLI tool and agent for their new coder models ([GitHub](https://github.com/QwenLM/qwen-code)).
144 |
145 | * **GitHub Spark**, a new AI-powered feature from GitHub ([Simon Willison on X](https://x.com/simonw/status/1948407932418457968)).
146 |
147 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-july-24-2025-qwen-mas-in/comments) or [share this episode](https://sub.thursdai.news/p/thursdai-july-24-2025-qwen-mas-in).
148 |
--------------------------------------------------------------------------------
/example prompts/ThursdAI Dec 11 2025 Infographic prompt.md:
--------------------------------------------------------------------------------
1 | # Infographic Prompt: ThursdAI – Dec 11, 2025 · "GPT-5.2 Drops Live, Open Source Keeps Climbing, and We're Training AI in SPACE"
2 |
3 | Create a high-resolution, wide **16:9 infographic poster** for a tech livestream episode titled:
4 |
5 | **"ThursdAI – Episode 51 • Dec 11, 2025"**
6 |
7 | ---
8 |
9 | ## OVERALL VIBE & CONCEPT
10 |
11 | This episode had a **live breaking news moment**: GPT-5.2 literally dropped while we were on air. The infographic should feel like a **"BREAKING NEWS" broadcast control room** crossed with a futuristic space mission dashboard.
12 |
13 | **Visual concept:** Imagine a mission control screen mid-broadcast—multiple data feeds, live indicators, and that electric energy of news happening in real-time. Mix broadcast urgency with the cosmic wonder of AI being trained in orbit.
14 |
15 | **Style:** Clean vector/infographic, sharp lines, flat shading with subtle gradients. Extremely readable at YouTube thumbnail size and on social feeds. Some glowing "LIVE" and "BREAKING" indicators to capture the energy of the episode.
16 |
17 | **Color palette:**
18 | - Deep space navy/charcoal background (#0a0f1a) with starfield dots and subtle nebula gradients
19 | - **Breaking News accent:** Hot red/coral (#ff4444) for the GPT-5.2 announcement
20 | - **Open Source accent:** Electric teal (#00d4aa) and emerald (#10b981) for Mistral/open models
21 | - **Space accent:** Deep purple (#7c3aed) and cosmic blue (#3b82f6) for the Starcloud/orbit story
22 | - **Foundation accent:** Gold/amber (#f59e0b) for AAIF standardization news
23 | - White and silver highlights for readability
24 |
25 | ---
26 |
27 | ## 1. HERO & HEADER
28 |
29 | **At the top, a wide header banner styled like a live broadcast chyron:**
30 |
31 | - Left corner: A pulsing **"LIVE"** indicator badge (red dot with glow)
32 | - Main title (centered, bold):
33 | - **"ThursdAI – Episode 51"** (large)
34 | - **"Dec 11, 2025 • Live on YouTube & X"** (smaller subtitle)
35 | - Right corner: **"BREAKING NEWS"** badge with urgent styling
36 |
37 | **Left side of header — Alex presentation:**
38 | - Cartoon vector version of host Alex Volkov using reference image (face, beard, hairstyle)
39 | - Style: Clean vector cartoon, animated and excited, wearing a dark hoodie with headphones
40 | - Alex has a "can you believe this is happening right now?!" expression, gesturing toward the breaking news
41 | - Behind Alex: A floating holographic display showing benchmark numbers flying by, like he's presenting live data
42 | - Small floating elements: A satellite with glowing GPU, cosmic particles, the ThursdAI logo
43 |
44 | **Background atmosphere:**
45 | - Starfield with subtle nebula gradients (space theme for the Starcloud story)
46 | - Faint grid lines suggesting a mission control dashboard
47 | - Small orbital paths and satellite trajectories as decorative elements
48 | - Scattered "data packet" particles flowing across the design
49 |
50 | ---
51 |
52 | ## 2. LAYOUT OVERVIEW — Breaking News + Three Bands
53 |
54 | **The infographic should feel like a live broadcast dashboard:**
55 |
56 | 1. **TOP MEGA-BANNER:** GPT-5.2 Breaking News (the moment it dropped live)
57 | 2. **MIDDLE BAND LEFT:** Open Source Surge (Mistral Devstral 2, Essential AI)
58 | 3. **MIDDLE BAND RIGHT:** The Space Race (Starcloud, GPUs in orbit)
59 | 4. **BOTTOM BAND:** Foundation News + Vision/Voice + Math AI
60 |
61 | Use clear section dividers that look like broadcast segment transitions.
62 |
63 | ---
64 |
65 | ## 3. TOP MEGA-BANNER — "🚨 BREAKING: GPT-5.2 DROPS LIVE ON AIR"
66 |
67 | **This is THE headline of the episode. Make it feel like breaking news.**
68 |
69 | **Visual treatment:**
70 | - Full-width banner with urgent red accent color and "BREAKING" styling
71 | - Faint TV static/scan lines texture for broadcast feel
72 | - Alert icons and live indicator badges
73 |
74 | **Title:** "GPT-5.2 — Dropped Live on ThursdAI"
75 | **Subtitle:** "SOTA on ARC-AGI, SWE-Bench, AIME • 390x Cheaper Than o3"
76 |
77 | **Visual:**
78 | - Icon of a microphone/broadcast tower with signal waves, combined with a benchmark trophy
79 | - Small "LIVE" badge next to it
80 | - Lightning bolts and signal waves radiating outward
81 |
82 | **Info panel (styled like live data readouts):**
83 |
84 | ```
85 | 🎯 BENCHMARK DOMINATION:
86 | ├─ ARC-AGI-1 Pro X-High: 90.5% (390x cheaper at $11.64/task)
87 | ├─ ARC-AGI-2 Pro High: 54.2%
88 | ├─ AIME 2025: 100% (Perfect score!)
89 | ├─ GPQA Diamond: 92.4%
90 | ├─ SWE-Bench Pro: 55.6%
91 | └─ GDPval (44 occupations): 70.9% (ties/beats experts)
92 |
93 | 📈 LONG CONTEXT:
94 | ├─ 95% accuracy at 32K tokens
95 | ├─ 85% at 128K tokens
96 | └─ 70%+ at 256K tokens
97 |
98 | 💰 PRICING:
99 | ├─ Thinking: $1.75/$14 per M tokens
100 | └─ Pro: $21/$168 per M tokens
101 |
102 | 🏢 ENTERPRISE WINS:
103 | ├─ Box: +7pts accuracy, 74% faster docs
104 | ├─ Windsurf: "Version bump undersells the jump"
105 | └─ Cline: "Plans deeper, executes better"
106 | ```
107 |
108 | **Add a small callout:** "Sam Altman: 'The smartest generally available model in the world'"
109 |
110 | ---
111 |
112 | ## 4. MIDDLE BAND LEFT — "⚡ OPEN SOURCE SURGE"
113 |
114 | **Section header:** Pill-shaped tag reading "⚡ OPEN SOURCE" in electric teal
115 |
116 | ### Panel A: Mistral Devstral 2 (LARGE, this is the week's open source star)
117 |
118 | **Title:** "Devstral 2 — SOTA Open Coding"
119 | **Subtitle:** "Run Claude Sonnet 3.7-Level on Your 3090"
120 |
121 | **Visual:**
122 | - Icon of stacked code windows with a wind/mistral gust sweeping through
123 | - Small "Apache 2.0" badge
124 | - Speed lines suggesting fast local inference
125 |
126 | **Info block:**
127 |
128 | ```
129 | DEVSTRAL 2 (123B):
130 | ├─ SWE-bench Verified: 72.2% (#1 open)
131 | ├─ Just behind Claude 3.5 Sonnet (72.7%)
132 | ├─ Apache 2.0 License
133 | └─ Pricing: $0.40/$2.00 per M tokens
134 |
135 | DEVSTRAL SMALL 2 (24B):
136 | ├─ SWE-bench: 68.0%
137 | ├─ Runs on consumer GPUs
138 | ├─ Multimodal (images)
139 | └─ 7x cost efficiency
140 |
141 | + MISTRAL VIBE CLI:
142 | Open-source coding agent for terminal
143 | ```
144 |
145 | ### Panel B: Essential AI Rnj-1 (Smaller)
146 |
147 | **Title:** "Rnj-1 — Frontier 8B from Transformers Co-Author"
148 | **Subtitle:** "Ashish Vaswani's New Lab Ships"
149 |
150 | **Visual:**
151 | - Icon of a compact gem/diamond with mathematical symbols
152 | - "8B" badge
153 |
154 | **Info block:**
155 |
156 | ```
157 | ├─ 8B params, 8.7T training tokens
158 | ├─ SWE-bench: 20.8% (GPT-4o level!)
159 | ├─ Apache 2.0, JAX on TPUs + AMD
160 | ├─ No reasoning overhead—pure intuition
161 | └─ Led by Transformer paper co-author
162 | ```
163 |
164 | ---
165 |
166 | ## 5. MIDDLE BAND RIGHT — "🛰️ AI IN ORBIT"
167 |
168 | **Section header:** Pill-shaped tag reading "🛰️ SPACE AI" in cosmic purple/blue
169 |
170 | ### Panel C: Starcloud — First LLM Trained in Space (LARGE, this is a jaw-dropper)
171 |
172 | **Title:** "Starcloud-1 — First LLM Trained in Space"
173 | **Subtitle:** "H100 GPU in Orbit • NanoGPT on Shakespeare"
174 |
175 | **Visual:**
176 | - Icon of a satellite with glowing H100 GPU chip orbiting Earth
177 | - Starfield and orbital path arcs
178 | - Small SpaceX-style rocket trail
179 | - Sun rays hitting solar panels
180 |
181 | **Info block:**
182 |
183 | ```
184 | 🚀 THE MISSION:
185 | ├─ Nvidia-backed startup Starcloud
186 | ├─ H100 GPU aboard Starcloud-1 satellite
187 | ├─ Launched via SpaceX (Nov 2025)
188 | └─ GPU running error-free vs radiation/vacuum
189 |
190 | 🧠 THE TRAINING:
191 | ├─ Trained Karpathy's nanoGPT in orbit
192 | ├─ Dataset: Complete works of Shakespeare
193 | ├─ Ran inference on Google's Gemma
194 | └─ Generated: "Greetings, Earthlings!"
195 |
196 | 💬 REACTIONS:
197 | ├─ Karpathy: "nanoGPT - first LLM in space 🥹"
198 | └─ Elon Musk: "Cute 🥰 🚀💫"
199 |
200 | 🔮 THE VISION:
201 | ├─ Unlimited solar power in orbit
202 | ├─ Radiative cooling (no AC needed)
203 | ├─ 10x cheaper than Earth grids?
204 | └─ Data centers eating 50% US power by 2030
205 | ```
206 |
207 | ---
208 |
209 | ## 6. BOTTOM BAND — THREE SECTIONS
210 |
211 | Divide the bottom into three equal sections with distinct accent colors:
212 |
213 | ### Section D: AAIF — Agentic AI Foundation (Gold/Amber accent)
214 |
215 | **Title:** "AAIF — Agentic Standards Go Vendor-Neutral"
216 | **Subtitle:** "MCP + AGENTS.md + Goose Under Linux Foundation"
217 |
218 | **Visual:**
219 | - Icon of interconnected nodes with a Linux penguin silhouette
220 | - Multiple company logos abstracted as connected blocks
221 | - "Open Standards" badge
222 |
223 | **Info block:**
224 |
225 | ```
226 | FOUNDING PROJECTS:
227 | ├─ MCP (Anthropic): 10,000+ servers in Year 1
228 | ├─ AGENTS.md (OpenAI): 20,000+ repos
229 | └─ Goose (Block): LLM-agnostic coding agent
230 |
231 | PLATINUM MEMBERS:
232 | AWS • Bloomberg • Cloudflare • Google • Microsoft
233 |
234 | GOLD MEMBERS:
235 | Docker • Cisco • Salesforce • GitHub • Snowflake
236 |
237 | "The best thing isn't the standard—
238 | it's that it's vendor-neutral now."
239 | ```
240 |
241 | ### Section E: Math AI Dominance (Teal accent)
242 |
243 | **Title:** "Putnam Math Competition — AI Goes #1"
244 | **Subtitle:** "Formal Proofs, Gold Medals, Verified by Lean"
245 |
246 | **Visual:**
247 | - Icon of a gold medal with mathematical symbols (∫, π, Σ)
248 | - Trophy with proof checkmarks
249 |
250 | **Info block:**
251 |
252 | ```
253 | NOMOS-1 (Nous Research):
254 | ├─ 30B params, 87/120 Putnam 2025
255 | ├─ Would be #2 out of 3,988 humans
256 | ├─ 63pt uplift over baseline (24→87)
257 | └─ Open-sourced on HuggingFace
258 |
259 | AXIOM PROVER (4-month-old startup):
260 | ├─ 9/12 problems in Lean formal proofs
261 | ├─ 100% machine-verifiable
262 | ├─ Would be #1 / Putnam Fellow
263 | └─ "Math AI can now PROVE it's right"
264 | ```
265 |
266 | ### Section F: Vision + Voice Quick Hits (Violet/Magenta accent)
267 |
268 | **Title:** "Vision & Voice Upgrades"
269 |
270 | **Sub-cards (compact):**
271 |
272 | **GLM-4.6V (Z.ai):**
273 | ```
274 | ├─ 106B + 9B Flash variants
275 | ├─ 128K context (150 pages!)
276 | ├─ Native VLM tool calling
277 | └─ MathVista 88.2, WebVoyager 81
278 | ```
279 |
280 | **Google Gemini 2.5 TTS:**
281 | ```
282 | ├─ Enhanced expressivity
283 | ├─ Context-aware pacing
284 | ├─ Multi-speaker (24 languages)
285 | └─ Better than OpenAI realtime?
286 | ```
287 |
288 | **VoxCPM 1.5 (OpenBMB):**
289 | ```
290 | ├─ 44.1kHz Hi-Fi (was 16kHz)
291 | ├─ 6.25Hz token rate (2x efficient)
292 | ├─ Zero-shot voice cloning
293 | └─ RTF 0.15 on RTX 4090
294 | ```
295 |
296 | ---
297 |
298 | ## 7. SIDE STRIP — DISNEY + OPENAI & THIS WEEK'S BUZZ
299 |
300 | **Vertical strip on the right side with quick hits:**
301 |
302 | **Disney + OpenAI:**
303 | ```
304 | ├─ $1B investment
305 | ├─ Character IP in Sora
306 | ├─ High-five Darth Vader in Jan
307 | └─ Disney was suing Google yesterday...
308 | ```
309 |
310 | **OpenRouter State of AI:**
311 | ```
312 | ├─ 100T+ tokens analyzed
313 | ├─ Reasoning >50% of usage
314 | ├─ Programming >50% of tokens
315 | └─ Open source hit 30% share
316 | ```
317 |
318 | **This Week's Buzz — W&B Weave:**
319 | ```
320 | ├─ OpenRouter Broadcast → Weave
321 | ├─ Trace any OpenRouter tool
322 | ├─ Zero code instrumentation
323 | └─ Works with Claude Code!
324 | ```
325 |
326 | ---
327 |
328 | ## 8. BOTTOM CTA BAR
329 |
330 | Full-width call-to-action bar at the very bottom:
331 |
332 | **Background:** Gradient from coral/red (left, breaking news energy) to cosmic purple (right, space theme), bridging the episode's stories
333 |
334 | **Text centered:**
335 | - Large: **"Episode 51 • Next Week: The Year-End Recap 🎉"**
336 | - Smaller: **"Subscribe to ThursdAI • Follow @altryne • thursdai.news"**
337 |
338 | **Icons:** YouTube logo (stylized), X logo, envelope (newsletter), microphone (podcast), satellite (space theme)
339 |
340 | ---
341 |
342 | ## 9. STYLE & TECHNICAL NOTES
343 |
344 | **Style:**
345 | - Vector/infographic, clean lines, subtle gradients
346 | - **Broadcast control room meets space mission dashboard**
347 | - "LIVE" and "BREAKING" badges with subtle glow effects
348 | - Starfield background with orbital path decorations
349 | - Grid lines suggesting data dashboards
350 |
351 | **Key visual elements:**
352 | - Red pulsing "LIVE" indicator in corner
353 | - "BREAKING NEWS" chyron styling for GPT-5.2 section
354 | - Satellite/orbital imagery for space section
355 | - Benchmark numbers displayed like live data feeds
356 | - Small sparkle/glow effects on key stats
357 |
358 | **Priorities:**
359 | - Extreme legibility at all sizes
360 | - Clear visual hierarchy: Breaking News GPT-5.2 > Devstral/Starcloud > AAIF/Math/Voice > CTA
361 | - Each panel should work as a standalone "slide" when zoomed in during stream
362 | - The breaking news energy should be palpable
363 |
364 | **Avoid:**
365 | - No heavy clutter
366 | - No real company logos—use abstract/stylized icons
367 | - No watermarks
368 | - No Christmas imagery—keep it space/broadcast themed
369 |
370 | **Alex in the header should have:**
371 | - An excited, "this is actually happening!" expression
372 | - Gesturing toward the breaking news banner
373 | - Headphones on, mid-broadcast energy
374 | - Maybe holding a tablet showing live benchmarks
375 |
376 | **Resolution:** Render at 4K (3840×2160) for streaming and social sharing.
377 |
378 | ---
379 |
380 | ## BONUS: INDIVIDUAL PANEL PROMPTS
381 |
382 | For use during the stream, each major section should also work as a standalone panel:
383 |
384 | 1. **GPT-5.2 Breaking News Panel** — Red/coral urgent styling, benchmark stats flying
385 | 2. **Mistral Devstral 2 Panel** — Teal coding theme, Apache 2.0 badge prominent
386 | 3. **Starcloud Space Panel** — Cosmic purple, satellite orbiting Earth with H100
387 | 4. **AAIF Foundation Panel** — Gold/amber, connected nodes representing standards
388 | 5. **Math AI Panel** — Teal with gold medals, proof checkmarks
389 | 6. **This Week's Buzz Panel** — W&B Weave integration, trace visualization
390 |
391 | ---
392 |
393 | *The final poster should capture that electric energy of GPT-5.2 dropping live mid-show, the open source momentum with Devstral, and the mind-blowing reality that we're now training AI in actual space.*
394 |
395 |
--------------------------------------------------------------------------------
/Q3_2025_AI_Recap.md:
--------------------------------------------------------------------------------
1 | # ThursdAI Q3 2025 - AI Yearly Recap
2 | ## The Quarter of GPT-5, Trillion-Parameter Open Source, and World Models
3 |
4 | *Based on 12 ThursdAI episodes from July 3 - September 26, 2025*
5 |
6 | ---
7 |
8 | 
9 |
10 | ---
11 |
12 | ## 🔥 Quarter Overview
13 |
14 | Q3 2025 will be remembered as the quarter when **GPT-5 arrived**, **open source hit the trillion-parameter mark** with Kimi K2, and **world models became playable**. Chinese labs continued their open-source dominance with Qwen, DeepSeek, and ByteDance releases, while OpenAI shipped both their flagship GPT-5 and Apache-2.0 licensed GPT-OSS models. Google's Genie-3 showed us the future of interactive generated worlds, and video generation reached "can't tell it's AI" quality.
15 |
16 | ---
17 |
18 | ## 📅 July 2025 - "Trillion-Parameter Open Source & Agent Awakening"
19 |
20 | ### 🎯 Top Stories
21 |
22 | #### 🦄 **Kimi K2 - The Trillion-Parameter Open Source King** (Jul 17)
23 | Moonshot dropped a bomb with Kimi K2, a **1 trillion parameter** MoE model:
24 | - **65.8% on SWE-bench Verified** - beating Claude Sonnet without reasoning
25 | - Only **32B active parameters** making it actually runnable
26 | - **128K context** standard (2M+ rumored capability)
27 | - Trained on **15.5 trillion tokens** with the Muon optimizer
28 | - **Modified MIT license** - actually open!
29 | - **SOTA on EQBench creative writing** - finally an open model that writes well!
30 |
31 | > "This isn't just another model release. This is 'Sonnet at home' if you have the hardware." - Alex Volkov
32 |
33 | #### 🔥 **Grok-4 & Grok Heavy** (Jul 10)
34 | xAI unveiled Grok-4 and a multi-agent swarm called Grok Heavy:
35 | - **50% on Humanity's Last Exam** (with tools) - unprecedented
36 | - **15.9% on ARC-AGI v2** - 2x better than Opus 4
37 | - **100% on AIME25**, 88.9% on GPQA Diamond
38 | - Heavily scaled RL training
39 | - Controversy: "Mechahitler" incident, Grok searching "what does Elon think"
40 |
41 | #### 🤖 **OpenAI ChatGPT Agent (Odyssey)** (Jul 17)
42 | OpenAI merged Deep Research + Operator into one agentic system:
43 | - **41.6% on HLE** (double o3), **27.4% on FrontierMath**
44 | - Combines text browser + visual browser + terminal + code execution
45 | - Can browse, code, call APIs, generate images, build spreadsheets
46 | - Wedding planning, sticker ordering demos wowed audiences
47 |
48 | #### 🇨🇳 **Chinese Open Source Explosion** (Jul 3)
49 | - **Baidu ERNIE 4.5**: 10 models (424B to 0.3B), Apache 2.0, 128K context, multimodal
50 | - **Tencent Hunyuan-A13B**: 80B MoE (13B active), 256K context from WizardLM team
51 | - **Huawei Pangu Pro MoE**: 72B trained entirely on Ascend NPUs (no Nvidia!)
52 |
53 | ### Open Source LLMs
54 | | Release | Significance |
55 | |---------|-------------|
56 | | **Qwen3-Coder-480B** (Jul 24) | 69.6% SWE-bench Verified, 7.5T tokens training, 256K context |
57 | | **Qwen3-235B-A22B-Instruct-2507** | 81% MMLU-Redux, 70% LiveCodeBench, hybrid reasoning dropped |
58 | | **DeepSWE-Preview** | 59% SWE-bench-Verified, pure RL on Qwen3-32B |
59 | | **SmolLM3** (3B) | HuggingFace's 11T-token pocket model, dual reasoning modes, 128K→256K context |
61 | | **LG EXAONE 4.0** (32B) | 81.8% MMLU Pro from LG (the fridge company!) |
62 |
63 | ### Big CO LLMs + APIs
64 | | Release | Significance |
65 | |---------|-------------|
66 | | **Grok-4 & Heavy** | SOTA on HLE (50%), ARC-AGI v2 (15.9%) |
67 | | **ChatGPT Agent** | Unified agentic AI for real-world tasks |
68 | | **OpenAI/Google DeepMind IMO Gold** | Both won Gold at the International Math Olympiad |
69 | | **White House AI Action Plan** | 90 policy proposals for US AI dominance |
70 |
71 | ### Vision & Video
72 | - **Wan 2.2**: First MoE video model, 5-second 720p on single 4090, open source
73 | - **Runway Gen-3 Aleph**: Chat-based video editing, scene transformations
74 | - **Runway Act-Two**: Next-gen motion capture (head, face, body, hands)
75 |
76 | ### Voice & Audio
77 | - **Mistral Voxtral**: SOTA open speech recognition, beats Whisper v3, Apache 2.0
78 | - **Higgs Audio v2**: Beats GPT-4o-mini and ElevenLabs on prosody
79 | - **Riffusion x Producer.ai**: Chattable studio producer
80 |
81 | ### Tools
82 | - **Perplexity Comet**: AI-powered browser, mouse moves on its own
83 | - **Amazon Kiro**: Spec-driven AI IDE from AWS
84 | - **Liquid AI LEAP + Apollo**: On-device AI platform for mobile
85 |
86 | ---
87 |
88 | ## 📅 August 2025 - "GPT-5 Month"
89 |
90 | ### 🎯 Top Stories
91 |
92 | #### 👑 **GPT-5 Launch** (Aug 7)
93 | OpenAI released GPT-5, 32 months after GPT-4:
94 | - **400K context window**
95 | - **$1.25/$10 per million tokens** (Opus is $15/$75)
96 | - Unified thinking + chat model
97 | - Router-based architecture (initially buggy)
98 | - Quiz mode, Gmail integration, memory features
99 | - Free tier access for back-to-school
100 | - But: Writing quality disappointed some, needed prompting guide
101 |
102 | > "32 months since GPT-4 release, 32 months of ThursdAI" - Alex Volkov
103 |
104 | #### 🔓 **GPT-OSS (120B/20B)** - OpenAI Goes Open Source (Aug 5)
105 | Historic release under Apache 2.0 license:
106 | - 120B and 20B models
107 | - Configurable reasoning via system prompt (`reasoning: high`; sketch after this list)
108 | - Function calling, web search, Python execution
109 | - Full chain-of-thought access (unlike GPT-5)
110 | - Mixed reviews: Great at code/math, weak at creative writing
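
The reasoning toggle really is just a line in the system prompt. Here's a minimal sketch with the Hugging Face weights; the model id is the released one, but treat the exact toggle phrasing as an assumption and verify it against the model card:

```python
from transformers import pipeline

# gpt-oss reads its reasoning effort from the system message via its chat template.
pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Reasoning: high"},  # toggle: low / medium / high
    {"role": "user", "content": "Prove that the sum of two odd numbers is even."},
]

out = pipe(messages, max_new_tokens=512)
print(out[0]["generated_text"][-1]["content"])  # the appended assistant turn
```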
111 |
112 | #### 🌍 **Google Genie-3 World Model** (Aug 7)
113 | DeepMind's world model generated **fully interactive 3D environments**:
114 | - Real-time at 24fps
115 | - Single image or text prompt → controllable world
116 | - Paint a wall, turn away, it remembers (memory/consistency breakthrough)
117 | - Walk, fly, control camera in generated worlds
118 |
119 | #### 🔀 **DeepSeek V3.1 Hybrid Reasoner** (Aug 21)
120 | DeepSeek released a hybrid that combines V3 + R1:
121 | - Matches/beats R1 with **fewer thinking tokens**
122 | - Tool calls inside thinking process
123 | - 66% SWE-bench Verified (non-thinking) vs R1's 44%
124 | - 128K context, MIT licensed
125 | - TerminalBench: 5.7→31 improvement
126 |
127 | ### Open Source LLMs
128 | | Release | Significance |
129 | |---------|-------------|
130 | | **DeepSeek V3.1** | Hybrid reasoner, R1-level with less thinking |
131 | | **ByteDance Seed-OSS 36B** | Apache 2, 512K context, "thinking budget" control |
132 | | **NVIDIA Nemotron Nano 9B V2** | Mixed Mamba+Transformer, 6x throughput, open dataset |
133 | | **Cohere Command-A Reasoning** | 111B dense, 256K context, 70% BFCL |
134 | | **GLM-4.5V** | 106B VLM from Zhipu, SOTA vision intelligence |
135 |
136 | ### Big CO LLMs + APIs
137 | | Release | Significance |
138 | |---------|-------------|
139 | | **GPT-5** | 400K context, unified reasoning, router architecture |
140 | | **GPT-OSS** | Apache 2.0, 120B/20B, full CoT access |
141 | | **Anthropic Opus 4.1** | Pre-emptive upgrade before GPT-5 |
142 | | **Claude Sonnet 1M context** | Extended to 1M in API |
143 |
144 | ### Vision & Video
145 | - **Hunyuan GameCraft**: Game video generation with physics, runs on 4090
146 | - **Skywork Matrix-Game 2.0**: Real-time world model, 25fps, open source
147 | - **LFM2-VL**: Liquid AI's 440M & 1.6B vision-language models, 2x faster
148 |
149 | ### AI Art & Diffusion
150 | - **Nano Banana**: Mystery model (rumored Google) doing 3D-aware scene editing
151 | - **Qwen Image Edit (20B)**: Fully open image editor, bilingual, runs locally
152 | - **FLUX.1 Krea [dev]**: Natural aesthetics, no "AI gloss"
153 |
154 | ### Tools
155 | - **Agents.md Standard**: OpenAI's config file to unify agent instructions
156 | - **Catnip**: W&B's containerized multi-agent coding workspace
157 | - **Cursor gets Sonic**: Mystery "Grok Code" model appears
158 |
159 | ---
160 |
161 | ## 📅 September 2025 - "Shiptember Delivers"
162 |
163 | ### 🎯 Top Stories
164 |
165 | #### 🧑💻 **GPT-5-Codex** (Sep 18)
166 | OpenAI's agentic coding finetune of GPT-5:
167 | - Works **7+ hours independently** on complex tasks
168 | - **93% fewer tokens** on simple tasks
169 | - Integrated everywhere: CLI, VS Code, web, iPhone
170 | - Reviews majority of OpenAI's own PRs
171 | - Perfect 12/12 on 2025 ICPC with unreleased reasoning model
172 |
173 | #### 👓 **Meta Connect 2025 - AI Glasses with Display** (Sep 18)
174 | Meta unveiled next-gen Ray-Ban glasses:
175 | - **Built-in display** (invisible from outside)
176 | - **Neural band wristband** for muscle-based control
177 | - Live translation with subtitles in field of view
178 | - Agentic AI doing research tasks
179 | - **$799**, shipping immediately
180 |
181 | #### 🐋 **DeepSeek V3.1 Terminus** (Sep 26)
182 | Surgical update fixing agent behavior:
183 | - Fixed code-switching bug ("sudden Chinese")
184 | - Improved tool-use and browser execution
185 | - Less overthinking/stalling in agentic flows
186 | - HLE: 15→21.7 improvement
187 |
188 | #### 🦜 **Qwen-mas Strikes Again** (Sep 26)
189 | Alibaba's multimodal blitz:
190 | - **Qwen3-VL-235B**: Vision reasoner, 22B active, 1M context for video
191 | - **Qwen3-Omni-30B**: End-to-end omni-modal (text, image, audio, video), sub-250ms speech
192 | - **Qwen-Max**: Over 1T parameters, 69.6% SWE-bench, roadmap to 100M token context
193 |
194 | ### Open Source LLMs
195 | | Release | Significance |
196 | |---------|-------------|
197 | | **Qwen3-VL-235B-A22B-Thinking** | Vision reasoner, 1M context for 2-hour video |
198 | | **Qwen3-Omni-30B-A3B** | Real-time omni-modal, 119 languages |
199 | | **Tongyi DeepResearch A3B** | Web agent matching OpenAI Deep Research, 98.6% SimpleQA |
200 | | **Qwen-Next-80B-A3B** | Ultra-sparse MoE, rivals 235B reasoning |
201 | | **Liquid Nanos** | 350M-2.6B models for structured extraction |
202 | | **IBM Granite OCR 258M** | Tiny doc parser, runs on Raspberry Pi |
203 |
204 | ### Big CO LLMs + APIs
205 | | Release | Significance |
206 | |---------|-------------|
207 | | **GPT-5-Codex** | 7+ hour autonomous coding sessions |
208 | | **Grok-4 Fast** | 2M context, 40% fewer thinking tokens, 1% cost |
209 | | **NVIDIA $100B pledge to OpenAI** | "Biggest infrastructure project in history" |
210 | | **ChatGPT Pulse** | Proactive AI news based on your data |
211 | | **OpenAI-Oracle $300B deal** | $60B/year for compute, 5-year deal |
212 | | **Anthropic $13B raise** | $183B valuation, $5B revenue run rate |
213 | | **Mistral $13.8B valuation** | $1.3B from ASML, Europe's decacorn |
214 |
215 | ### Vision & Video
216 | - **ByteDance SeeDream 4**: 4K SOTA image gen/editing, up to 6 reference images
217 | - **Lucy 14B**: 5-second video in 6.5 seconds (insane speed)
218 | - **Wan 2.2 Animate**: Motion transfer + lip sync, open source
219 | - **Wan 4.5 Preview**: 1080p 10s with synced speech generation
220 | - **Kling 2.5 Turbo**: 30% cheaper, audio included
221 | - **Ray3**: Luma's "reasoning" video with HDR
222 |
223 | ### Voice & Audio
224 | - **Suno V5**: "I can't tell anymore" era, human-level vocals
225 | - **Qwen3-ASR-Flash**: 11-language speech recognition with singing
226 | - **Stable Audio 2.5**: 3-minute tracks in <2 seconds
227 |
228 | ### AI Art & Diffusion
229 | - **Hunyuan SRPO**: New diffusion finetuning method
230 | - **Reve 4-in-1**: Image creation + editing platform
231 | - **World Labs Marble** (Fei-Fei Li's startup): Images → walkable Gaussian Splat 3D worlds
232 |
233 | ### Tools
234 | - **Google Gemini in Chrome**: Chat across tabs, browse history knowledge
235 | - **ChatGPT full MCP support**: Developer mode for tool connectors
236 | - **Oasis 2.0**: Real-time Minecraft world generation mod
237 |
238 | ---
239 |
240 | ## 📊 Quarter Summary: Major Themes
241 |
242 | ### 1. 🧠 **GPT-5 Era Begins**
243 | - OpenAI unified reasoning + chat into one model
244 | - Router-based architecture for intelligent model selection
245 | - Agentic coding (Codex) works for 7+ hours independently
246 | - GPT-OSS brought open-source from OpenAI (Apache 2.0)
247 |
248 | ### 2. 🇨🇳 **Open Source Hits Trillion-Scale**
249 | - Kimi K2: 1T parameters, beats Claude Sonnet on SWE-bench
250 | - Qwen3-Coder: 480B, 69.6% SWE-bench
251 | - DeepSeek V3.1: Hybrid reasoning, fewer tokens
252 | - W&B Inference launched to host these monsters (client sketch below)
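
Since W&B Inference exposes an OpenAI-compatible endpoint, calling one of these trillion-scale models is a few lines. A minimal sketch, assuming the launch-era base URL, `project` routing, and model id (verify all three against the current W&B docs):

```python
import openai

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",  # assumed launch-era endpoint
    api_key="<your-wandb-api-key>",
    project="<entity>/<project>",  # W&B attributes usage to this project
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",  # model id as listed at launch
    messages=[{"role": "user", "content": "Write a minimal Tetris board class in Python."}],
)
print(resp.choices[0].message.content)
```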
253 |
254 | ### 3. 🌍 **World Models Become Playable**
255 | - Google Genie-3: Interactive 3D worlds at 24fps
256 | - Hunyuan GameCraft: Game video with physics
257 | - Matrix-Game 2.0: Unreal/GTA-trained, 25fps
258 | - Oasis 2.0: Real-time Minecraft reskinning
259 |
260 | ### 4. 🎥 **Video Reaches "Can't Tell" Quality**
261 | - SeeDream 4: 4K in <2 seconds
262 | - Lucy 14B: Near-realtime video generation
263 | - Suno V5: Indistinguishable from human music
264 | - Wan 4.5: Speech-synced video generation
265 |
266 | ### 5. 💰 **Unprecedented Investment**
267 | - NVIDIA $100B pledge to OpenAI
268 | - OpenAI-Oracle $300B deal
269 | - Anthropic $13B raise at $183B valuation
270 | - Meta Superintelligence Labs: $100-300M packages to poached researchers
271 |
272 | ### 6. 🤖 **Agents Get Serious**
273 | - ChatGPT Agent unifies browser + terminal + research
274 | - Agents.md standardizes agent config
275 | - Desktop agents hit 48% on OSWorld (up from ~12%)
276 | - MCP support spreading everywhere
277 |
278 | ---
279 |
280 | ## 🏆 Q3 2025: Biggest Releases by Month
281 |
282 | ### July
283 | 1. **Kimi K2** - 1T parameter open source, 65.8% SWE-bench
284 | 2. **Grok-4 & Heavy** - 50% HLE, 15.9% ARC-AGI
285 | 3. **ChatGPT Agent** - Unified agentic AI
286 | 4. **Qwen3-Coder-480B** - 69.6% SWE-bench
287 | 5. **White House AI Action Plan** - US AI strategy
288 |
289 | ### August
290 | 1. **GPT-5** - 400K context, unified reasoning
291 | 2. **GPT-OSS** - Apache 2.0, 120B/20B open weights
292 | 3. **Google Genie-3** - Playable AI-generated worlds
293 | 4. **DeepSeek V3.1** - Hybrid reasoner
294 | 5. **Meta Smart Glasses** - Display + neural control
295 |
296 | ### September
297 | 1. **GPT-5-Codex** - 7-hour autonomous coding
298 | 2. **NVIDIA $100B pledge** - Biggest AI infrastructure deal
299 | 3. **Qwen3-VL + Omni** - Complete multimodal suite
300 | 4. **ByteDance SeeDream 4** - SOTA 4K image gen
301 | 5. **Anthropic $13B raise** - $183B valuation
302 |
303 | ---
304 |
305 | *"This was the summer of trillion-parameter open source, GPT-5, and world models you can walk in. We're not just accelerating—we're in a completely different phase of AI. Hold on to your butts."* - Alex Volkov, ThursdAI
306 |
307 | ---
308 |
309 | *Generated from ThursdAI newsletter content. For full coverage, visit [thursdai.news](https://thursdai.news)*
310 |
--------------------------------------------------------------------------------
/Q2_2025_AI_Recap.md:
--------------------------------------------------------------------------------
1 | # ThursdAI Q2 2025 - AI Yearly Recap
2 | ## The Quarter That Shattered Reality
3 |
4 | *Based on 13 ThursdAI episodes from April 3 - June 26, 2025*
5 |
6 | ---
7 |
8 | 
9 |
10 | ---
11 |
12 | ## 🔥 Quarter Overview
13 |
14 | Q2 2025 will be remembered as the quarter when **video AI crossed the uncanny valley** (VEO3's native audio blew minds), **tool-using reasoning models emerged** (o3 can call tools mid-thought!), and **open source matched frontier models** (Qwen 3 and Claude 4 delivered back-to-back). Google I/O dropped an avalanche of announcements, Meta's Llama 4 had a chaotic launch, and the agent ecosystem matured with MCP becoming the universal standard.
15 |
16 | ---
17 |
18 | ## 📅 April 2025 - "Tool-Using Reasoners & Llama Chaos"
19 |
20 | ### 🎯 Top Stories
21 |
22 | #### 🧠 **OpenAI o3 & o4-mini - Reasoning Meets Tool Use** (Apr 17)
23 | The most important reasoning model upgrade in AI history. For the first time, o-series models can:
24 | - **Autonomously use tools during reasoning** (web search, Python, image gen)
25 | - Chain 600+ consecutive tool calls to solve complex problems
26 | - Manipulate images mid-thought (cropping, zooming, rotating)
27 | - Score **$65k on Freelancer eval** (vs o1's $28k)
28 | - **o4-mini hits 99.5% on AIME** when using Python interpreter
29 |
30 | > "This is almost AGI territory - agents that reason while wielding tools" - Alex Volkov
31 |
32 | #### 📚 **GPT-4.1 Family - 1 Million Token Context** (Apr 14)
33 | OpenAI dropped GPT-4.1, 4.1-mini, and 4.1-nano with:
34 | - **1 million token context window** across all three models
35 | - GPT-4.5 deprecated - 4.1 actually outperforms it
36 | - Near-perfect recall across entire 1M context
37 | - 4.1-mini achieves 72% on Video-MME
38 | - "Sandwich" prompting trick boosts mini from 31% → 49%
39 |
40 | #### 🦙 **Meta Llama 4 - Scout & Maverick** (Apr 5)
41 | Meta dropped their biggest models ever, amid controversy:
42 | - **Scout**: 17B active / 109B total (16 experts) - 10M context claimed
43 | - **Maverick**: 17B active / 400B total (128 experts) - 1M context
44 | - Release caused LMArena drama (tested model ≠ released model)
45 | - Community criticism: too big to run locally
46 | - **Behemoth (288B active / 2T total)** teased but unreleased
47 |
48 | #### ⚡ **Gemini 2.5 Flash - Controllable Thinking Budgets** (Apr 17)
49 | Google's direct counter to o3/o4-mini:
50 | - Set "thinking budget" (0-24K tokens) per API call (sketch after this list)
51 | - 1M token context window
52 | - Ultra-cheap: $0.15 input / $0.60 output per 1M tokens
53 | - Balance speed/cost vs reasoning depth in one model
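
A minimal sketch of the budget knob with the `google-genai` Python SDK; the model id below is a placeholder for whatever the current 2.5 Flash id is (at launch it was a dated preview id):

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the Gemini API key from the environment

resp = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder; use the current 2.5 Flash model id
    contents="How many prime numbers are there between 100 and 150?",
    config=types.GenerateContentConfig(
        # 0 disables thinking entirely; larger budgets buy deeper reasoning
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)
print(resp.text)
```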
54 |
55 | ### Open Source LLMs
56 | | Release | Significance |
57 | |---------|-------------|
58 | | **DeepCoder-14B** | Beats DeepSeek R1 on coding, distributed RL training |
59 | | **NVIDIA Nemotron Ultra 253B** | Pruned Llama 405B, actually beats Llama 4 on AIME |
60 | | **Kimi-VL 3B** | MIT licensed VLM, 128K context, rivals 10x larger models |
61 | | **HiDream-I1 17B** | MIT license, surpasses Flux 1.1 Pro on image gen |
62 | | **GLM-4 Family** | ChatGLM rebranded, MIT licensed, up to 70B |
63 |
64 | ### Big CO LLMs + APIs
65 | - **Google celebrates MCP** - official support announced, joining Microsoft & AWS
66 | - **Google A2A Protocol** - Agent-to-Agent communication standard launched
67 | - **Grok 3 API** - xAI finally opens API access
68 | - **ChatGPT Memory Upgrade** - can now reference ALL past chats
69 |
70 | ### Vision & Video
71 | - **VEO-2 GA** - Google's video model goes Generally Available
72 | - **Kling 2.0 Creative Suite** - MVL prompting, inline images in text prompts
73 | - **Runway Gen-4** - Focus on character/world consistency
74 | - **MAGI-1 24B** - Send AI drops open weights video model (Apache 2.0)
75 |
76 | ### Voice & Audio
77 | - **Dia 1.6B TTS** - Unhinged emotional range from Korean startup, MIT licensed
78 | - **PipeCat SmartTurn** - Open source semantic VAD for natural conversations
79 | - **DolphinGemma** - Google AI attempts dolphin communication 🐬
80 |
81 | ### Tools
82 | - **OpenAI Codex CLI** - Open source with Apple Seatbelt security
83 | - **Firebase Studio** - Google's vibe coding platform (formerly Project IDX)
84 | - **GitMCP** - Turn any GitHub repo into MCP server (viral launch)
85 |
86 | ---
87 |
88 | ## 📅 May 2025 - "Qwen 3 Revolution & Claude 4 Arrives"
89 |
90 | ### 🎯 Top Stories
91 |
92 | #### 🔥 **Qwen 3 - The Open Source Benchmark Crusher** (May 1)
93 | Alibaba dropped the most comprehensive open source release ever:
94 | - **8 models**: 2 MoE (235B/22B active, 30B/3B active) + 6 dense (0.6B-32B)
95 | - **Apache 2.0 license** on everything
96 | - Runtime `/think` toggle for chain-of-thought on demand (sketch below)
97 | - **4B dense beats Qwen 2.5-72B** on multiple benchmarks 🤯
98 | - 36T training tokens, 119 languages, 128K context
99 | - Day-one support in LM Studio, Ollama, vLLM, MLX
100 |
101 | > "The 30B MoE is 'Sonnet 3.5 at home' - 100+ tokens/sec on MacBooks" - Nisten
102 |
103 | #### 🤖 **Claude 4 Opus & Sonnet - Live Drop During ThursdAI!** (May 22)
104 | Anthropic crashed the party mid-show with:
105 | - **Claude 4 Opus**: 72.5% SWE-bench, handles 6-7 hour human tasks
106 | - **Claude 4 Sonnet**: 72.7% SWE-bench (80% with parallel test-time compute!)
107 | - First models to cross the 80% SWE-bench threshold (with parallel test-time compute)
108 | - Hybrid reasoning + instant response modes
109 | - 65% less likely to engage in loopholes vs Sonnet 3.7
110 | - Knowledge cutoff: March 2025
111 |
112 | #### 🎬 **VEO3 - The Undisputed Star of Google I/O** (May 20)
113 | The video model that crossed the uncanny valley:
114 | - **Native multimodal audio** - generates speech, sound effects, music synced perfectly
115 | - Perfect lip-sync with situational awareness
116 | - Characters look at each other, understand who's speaking
117 | - Can generate text within videos
118 | - Spawned viral "Prompt Theory" phenomenon on TikTok
119 |
120 | > "VEO3 isn't just video generation - it's a world simulator" - Alex Volkov
121 |
122 | #### 🎨 **GPT-4o Native Image Gen - Ghibli-mania 2.0** (May 22)
123 | OpenAI enables native image gen in GPT-4o (again), now via API:
124 | - **GPT Image 1 API** finally released
125 | - Organizational verification required (biometric scan)
126 | - Supports generations, edits, and masking
127 | - Excellent text rendering in images
128 | - Struggles with realistic face matching (possibly intentional)
129 |
130 | ### Google I/O Avalanche
131 | | Release | Significance |
132 | |---------|-------------|
133 | | **Gemini 2.5 Pro Deep Think** | 84% on MMMU, 65th percentile on USA Math Olympiad |
134 | | **Gemini 2.5 Flash GA** | Thinking budgets, native audio I/O |
135 | | **Gemini Diffusion** | 2000 tokens/sec for code/math editing |
136 | | **Jules** | Free async coding agent at jules.google |
137 | | **Project Mariner** | Browser control via API (agentic web) |
138 | | **Gemini Ultra tier** | $250/month with DeepThink, VEO3, 30TB storage |
139 | | **AI Mode in Search GA** | Can connect to Gmail/Docs, Deep Search capability |
140 |
141 | ### Open Source LLMs
142 | | Release | Significance |
143 | |---------|-------------|
144 | | **Phi-4-Reasoning** | 14B hits 78% on AIME 25, MIT licensed |
145 | | **AM-Thinking v1 32B** | Dense model, 85.3% AIME 2024, Apache 2 |
146 | | **Devstral 24B** | Mistral + AllHands collab, SOTA on SWE-bench |
147 | | **Gemma 3n** | 4B MatFormer, mobile-first multimodal |
148 | | **NVIDIA Nemotron 8B/49B** | Reasoning toggle via system prompt |
149 |
150 | ### Big CO LLMs + APIs
151 | - **OpenAI Codex Agent** - Async GitHub agent, opens PRs, fixes bugs
152 | - **OpenAI hires Jony Ive** - $6.5B deal, "IO" hardware company
153 | - **GitHub Copilot open sourced** - Frontend code now open
154 | - **Microsoft MCP in Windows** - Protocol support at OS level
155 | - **LMArena raises $100M** - a16z seed, impartiality questions raised
156 |
157 | ### Vision & Video
158 | - **Odyssey Interactive Worlds** - Walk through AI-generated worlds with WASD
159 | - **HunyuanPortrait/Avatar** - Open source competitors to HeyGen/Hedra
160 | - **Wan 2.1** - Alibaba's open source diffusion-transformer video suite
161 | - **Flux Kontext** - SOTA image editing with character consistency
162 |
163 | ### Voice & Audio
164 | - **ElevenLabs V3** - Emotion tags, 70+ languages, multi-speaker dialogue
165 | - **OpenAI Voice Revolution** - GPT-4o Transcribe (promptable ASR), semantic VAD
166 | - **Chatterbox 0.5B** - Open source TTS with emotion control, Apache 2.0
167 | - **Unmute.sh** - KyutAI wrapper adds voice to any LLM
168 |
169 | ### Tools
170 | - **AlphaEvolve** - Gemini-powered algorithm discovery (recovered 0.7% of Google's global compute!)
171 | - **Claude Code GA** - Shell-based agent with IDE integrations
172 | - **Cursor V1** - Bug Bot reviews PRs, MCP support
173 |
174 | ---
175 |
176 | ## 📅 June 2025 - "Agents & Video Take Over"
177 |
178 | ### 🎯 Top Stories
179 |
180 | #### 💰 **OpenAI o3-pro & 80% Price Drop** (Jun 12)
181 | OpenAI's intelligence push continues:
182 | - **o3 price slashed 80%** ($10/$40 → $2/$8 per million input/output tokens)
183 | - **o3-pro launched** - highest intelligence model, 93% AIME 2024
184 | - 87% cheaper than o1-pro
185 | - 84% on GPQA Diamond, near 3000 ELO on coding
186 | - Same full o3 model, no distillation
187 |
188 | #### 🦙 **Meta's $15B Scale AI Power Play** (Jun 12)
189 | Zuck goes all-in on superintelligence:
190 | - **49% stake in Scale AI** for ~$14B
191 | - Alex Wang leads new "Superintelligence team" at Meta
192 | - Hired Google's Jack Rae for alignment
193 | - Seven-to-nine-figure comp packages for researchers
194 | - Clear response to Llama 4's muted reception
195 |
196 | #### 🧠 **MiniMax M1 - Reasoning MoE That Beats R1** (Jun 19)
197 | Chinese lab drops powerful open reasoning model:
198 | - **456B total / 45B active** parameters
199 | - Outperforms DeepSeek R1 on multiple benchmarks
200 | - 1M token context window (40K/80K max-generation variants)
201 | - Full weights available on Hugging Face
202 |
203 | #### 🖥️ **Gemini CLI - AI Agent in Your Terminal** (Jun 26)
204 | Google drops open source CLI agent:
205 | - Brings **Gemini 2.5 Pro** to terminal
206 | - Free tier available (with rate limits and fallback to Flash models)
207 | - Full GitHub integration
208 | - Pairs with new MCP support in LM Studio
209 |
210 | ### Open Source LLMs
211 | | Release | Significance |
212 | |---------|-------------|
213 | | **Mistral Small 3.2** | Improved instruction following, better function calling |
214 | | **Mistral Magistral** | First reasoning model, 24B open, 128K context |
215 | | **Kimi-Dev-72B** | Moonshot's developer-focused model |
216 | | **DeepSeek R1-0528** | Updated reasoner, AIME 91, LiveCodeBench 73, "clearer thinking" |
217 | | **INTELLECT-2 32B** | Globally decentralized RL training from Prime Intellect |
218 |
219 | ### Big CO LLMs + APIs
220 | - **Gemini 2.5 Pro/Flash GA** - 2.5 Flash-Lite in preview
221 | - **Deep Research API** - OpenAI adds webhook support
222 | - **Anthropic Fair Use ruling** - Judge rules book training is fair use
223 | - **OpenAI Meeting Recorder** - ChatGPT can now record Zoom calls
224 | - **ChatGPT Connectors** - Team accounts get Google Drive, SharePoint, Dropbox
225 |
226 | ### Vision & Video
227 | - **Seedance 1.0 mini** - ByteDance beats VEO3 on some comparisons
228 | - **MiniMax Hailuo 02** - 1080p native, SOTA instruction following
229 | - **Midjourney Video** - Finally entering video space
230 | - **OmniGen 2** - Open weights for image gen/editing
231 | - **Imagen 4 Ultra** - Google's flagship in Gemini API
232 |
233 | ### Voice & Audio
234 | - **ElevenLabs 11.ai** - Voice-first personal assistant with MCP
235 | - **Magenta RealTime** - Google's open weights real-time music gen
236 | - **Kyutai Streaming STT** - High-throughput real-time speech-to-text
237 | - **MiniMax Speech** - Tech report confirms best TTS architecture
238 |
239 | ### Tools
240 | - **Gemini CLI** - Open source terminal agent
241 | - **OpenHands CLI** - Model-agnostic coding agent
242 | - **Warp 2.0** - Agentic Development Environment with multi-threading
243 | - **LM Studio MCP** - Connect local LLMs with MCP servers
244 | - **Cursor Slack** - Coding assistant now in Slack
245 |
246 | ---
247 |
248 | ## 📊 Quarter Summary: Major Themes
249 |
250 | ### 1. 🎬 **Video AI Crosses the Uncanny Valley**
251 | - VEO3 native audio generation (speech, SFX, music synced)
252 | - "Prompt Theory" viral videos question reality itself
253 | - Character/scene consistency finally working
254 | - Midjourney, ByteDance Seedance join Sora/Kling/VEO
255 |
256 | ### 2. 🧠 **Tool-Using Reasoning Models Emerge**
257 | - o3/o4-mini can call tools during chain-of-thought
258 | - 600+ consecutive tool calls observed
259 | - Image manipulation during reasoning (zoom/crop/rotate)
260 | - This is the closest thing to AGI we've seen
261 |
262 | ### 3. 🇨🇳 **Open Source Matches Frontier**
263 | - Qwen 3 (Apache 2.0) rivals Sonnet 3.5 on many tasks
264 | - Claude 4 Sonnet hits 80% SWE-bench with PTTC
265 | - DeepSeek R1-0528 keeps pushing open reasoning
266 | - MiniMax M1 beats R1 on several benchmarks
267 |
268 | ### 4. 📺 **Google I/O Delivers Everything**
269 | - Gemini 2.5 Pro reclaims #1 LLM
270 | - VEO3 steals the show with native audio
271 | - Jules coding agent launches free
272 | - Massive infrastructure (TPU v6e pods, Ultra tier)
273 |
274 | ### 5. 🤖 **Agent Ecosystem Matures**
275 | - MCP becomes universal standard (OpenAI, Google adopt)
276 | - A2A protocol launches for agent-to-agent communication
277 | - Jules, Codex, GitHub Copilot Agent - async coding goes mainstream
278 | - Gemini CLI brings agents to terminal
279 |
280 | ### 6. 💸 **AI's Economic Impact Accelerates**
281 | - Meta $15B Scale AI stake
282 | - OpenAI $40B raise at $300B valuation
283 | - o3 price drops 80% in 4 months
284 | - Cursor $9B valuation, Windsurf $3B acquisition
285 |
286 | ---
287 |
288 | ## 🏆 Q2 2025: Biggest Releases by Month
289 |
290 | ### April
291 | 1. **OpenAI o3 & o4-mini** - Tool-using reasoning models
292 | 2. **GPT-4.1 Family** - 1M token context
293 | 3. **Meta Llama 4** - Scout & Maverick (chaotic launch)
294 | 4. **Gemini 2.5 Flash** - Controllable thinking budgets
295 | 5. **HiDream-I1** - Open source SOTA image gen
296 |
297 | ### May
298 | 1. **VEO3** - Native audio video generation
299 | 2. **Claude 4 Opus & Sonnet** - 80% SWE-bench
300 | 3. **Qwen 3** - Apache 2.0 reasoning family
301 | 4. **Google I/O** - Gemini 2.5 Pro, Jules, Diffusion
302 | 5. **Flux Kontext** - SOTA image editing
303 |
304 | ### June
305 | 1. **o3-pro** - Highest intelligence model
306 | 2. **o3 Price Drop 80%** - Democratized reasoning
307 | 3. **Meta/Scale AI Deal** - $15B superintelligence play
308 | 4. **MiniMax M1** - Open reasoning beats R1
309 | 5. **Gemini CLI** - Terminal-based AI agent
310 |
311 | ---
312 |
313 | *"We crossed the uncanny valley this quarter. VEO3's native audio had people posting real videos claiming they were AI because they couldn't tell the difference. This isn't just progress - it's a paradigm shift."* - Alex Volkov, ThursdAI
314 |
315 | ---
316 |
317 | *Generated from ThursdAI newsletter content. For full coverage, visit [thursdai.news](https://thursdai.news)*
318 |
--------------------------------------------------------------------------------
/2025_episodes/Q1 2025/January 2025/_ThursdAI_-_Jan_16_2025_-_Hailuo_4M_context_LLM_SOTA_TTS_in_browser_OpenHands_interview_more_AI_news.md:
--------------------------------------------------------------------------------
1 | # 📆 ThursdAI - Jan 16, 2025 - Hailuo 4M context LLM, SOTA TTS in browser, OpenHands interview & more AI news
2 |
3 | **Date:** January 17, 2025
4 | **Duration:** 1:40:32
5 | **Link:** [https://sub.thursdai.news/p/thursdai-jan-16-2025-hailuo-4m-context](https://sub.thursdai.news/p/thursdai-jan-16-2025-hailuo-4m-context)
6 |
7 | ---
8 |
9 | ## Description
10 |
11 | Hey everyone, Alex here 👋
12 |
13 | Welcome back to an absolute banger of a week in AI releases, highlighted by a massive open-source AI push. We're talking a MASSIVE 4M-context-window model from Hailuo (remember when a jump from 4K to 16K seemed like a big deal?), an 8B omni model that lets you livestream video, and glimpses of an agentic ChatGPT!
14 |
15 | This week's ThursdAI was jam-packed with so much open source goodness that the big companies were practically silent. But don't worry, we still managed to squeeze in some updates from OpenAI and Mistral, along with a fascinating new paper from Sakana AI on self-adaptive LLMs. Plus, we had the incredible Graham Neubig from All Hands AI join us to talk about OpenHands (formerly OpenDevin); he even contributed to our free LLM Evaluation course on Weights & Biases!
16 |
17 | Before we dive in: a friend asked me over dinner what the main two things that happened in AI in 2024 were, and this week highlights one of those trends. Most of the open source is now from China. This week we got MiniMax from Hailuo, OpenBMB with a new MiniCPM, InternLM came back, and most of the rest were Qwen finetunes. Not to mention DeepSeek. I wanted to highlight this significant narrative change, and that it's happening despite the chip export restrictions.
18 |
20 |
21 | Open Source AI & LLMs
22 |
23 | MiniMax-01: 4 Million Context, 456 Billion Parameters, and Lightning Attention
24 |
25 | This came absolutely out of left field, given that we'd seen no prior LLMs from Hailuo, a company previously known for releasing video models with consistent characters. They dropped a massive 456B mixture-of-experts model (45B active parameters) with extremely long context support in open weights, and with very significant benchmarks that compete with GPT-4o, Claude, and DeepSeek V3 (75.7 MMLU-Pro, 89 IFEval, 54.4 GPQA).
26 |
27 | They trained the model at up to a 1M context window and then extended it to 4M with RoPE scaling methods ([our coverage](https://sub.thursdai.news/p/thursdai-sunday-special-extending?utm_source=publication-search) of RoPE) at inference time. MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, softmax attention, and Mixture-of-Experts (MoE) with 45B active parameters.
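
To make the "extend at inference" trick concrete, here's a tiny sketch of one common RoPE-scaling approach (position interpolation). This is illustrative only; MiniMax's exact recipe may differ:

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10_000.0, scale: float = 1.0):
    """RoPE rotation angles; scale > 1 squeezes positions back into the trained range."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    # Position interpolation: dividing positions by `scale` lets a model trained
    # at context length L attend over L * scale tokens at inference time.
    return torch.outer(positions.float() / scale, inv_freq)

# Trained at 1M context, served at 4M -> scale factor of 4:
# position 4,000,000 / 4 lands exactly where 1,000,000 did during training.
angles = rope_angles(torch.arange(3_999_996, 4_000_000), dim=128, scale=4.0)
print(angles.shape)  # (4, 64) angle pairs, all within the trained range
```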
28 |
29 | I gotta say, when we started talking about context windows, imagining a needle-in-a-haystack graph that shows 4M in open source seemed far-fetched, though we did say that, theoretically, there may be no limit to context windows. I just always expected that limit to be unlocked by transformer-alternative architectures like Mamba or other state space models.
30 |
31 | Vision, API and Browsing - Minimax-VL-01
32 |
33 | It feels like such a well-rounded and complete release that it highlights just how mature the company behind it is. They have also released a vision version of this model, which adds a 300M-param Vision Transformer on top (trained with 512B vision-language tokens), features dynamic resolution, and boasts very high DocVQA and ChartQA scores.
34 |
35 | Not only were these two models released in open weights, they also launched as a unified API endpoint (supporting up to 1M tokens), and it's cheap! $0.2/1M input and $1.1/1M output tokens! AFAIK this is only the third API that supports this much context, after Gemini at 2M and Qwen Turbo, which supports 1M as well.
36 |
37 | Surprising web browsing capabilities
38 |
39 | You can play around with the model on their website, [hailuo.ai](https://www.hailuo.ai), which also includes web grounding. I was quite surprised to find that they beat ChatGPT and Perplexity on how fast they can surface information that happened that same day! Not sure what search API they're using under the hood, but they are very quick.
40 |
41 | 8B chat with video model omni-model from OpenBMB
42 |
43 | OpenBMB has been around for a while and we've seen consistently great updates from them on the MiniCPM front, but this one takes the cake!
44 |
45 | This is a complete omni-modal, end-to-end model that does video streaming, audio-to-audio, and text understanding, all in a model that can run on an iPad!
46 |
47 | They have a demo interface that is very similar to the ChatGPT demo from spring of last year, and it allows you to stream your webcam and talk to the model. But this is just an 8B parameter model we're talking about! It's bonkers!
48 |
49 | They are boasting some incredible numbers, though to be honest, I highly doubt their methodology on textual understanding, because, based on my experience alone, this model's understanding isn't anywhere close to ChatGPT Advanced Voice Mode. But MiniCPM has been doing great visual understanding for a while, so the near-SOTA ChartQA and DocVQA scores are believable.
50 |
51 | But none of this matters, because, I say again: just a little over a year ago, Google released a video announcing these capabilities, an AI reacting to a video in real time, and it absolutely blew everyone away, and it was [FAKED](https://techcrunch.com/2023/12/07/googles-best-gemini-demo-was-faked/). A year later, we have these capabilities, essentially, in an 8B model that runs on-device 🤯
52 |
53 | Voice & Audio
54 |
55 | This week was very multimodal: we got an omni-modal model from OpenBMB that can speak, last week's Kokoro is still making waves, and there were a lot of other voice updates as well.
56 |
57 | Kokoro.js - run the SOTA open TTS now in your browser
58 |
59 | Thanks to friend of the pod Xenova (and the fact that Kokoro was released with ONNX weights), we now have kokoro.js, or `npm i kokoro-js` if you will.
60 |
61 | This allows you to install and run Kokoro, the best tiny TTS model, completely within your browser, with a tiny ~90MB download, and it sounds really good (demo [here](https://huggingface.co/spaces/webml-community/kokoro-web)).
62 |
63 | Hailuo T2A - Emotional text to speech + API
64 |
65 | Hailuo didn't rest on the laurels of releasing a huge-context-window LLM; they also released a new voice framework this week (though not open-sourced), and it sounds remarkably good (competing with 11labs).
66 |
67 | They have all the standard features like voice cloning, but claim to have a way to preserve the emotional undertones of a voice. They also have 300 voices to choose from and professional effects applied on the fly, like acoustics or telephone filters. (Remember, they have a video model as well, so presumably some of this is for holistic video production.)
68 |
69 | What I specifically noticed is their "emotional intelligence system," which is either automatic or can be selected from a dropdown. I also noticed their "lax" copyright restrictions, as one of the voices, called "Imposing Queen," sounded just like a certain blonde-haired heiress to the Iron Throne from a certain HBO series.
70 |
71 | When I generated a speech worthy of that queen, the emotion in it sounded very much like an actress reading the lines, unlike any old TTS. Just listen to the clip above; I don't remember getting TTS outputs with this much emotion from anything, except maybe Advanced Voice Mode! Quite impressive!
72 |
73 | This Week's Buzz from Weights & Biases - AGENTS!
74 |
75 | Breaking news from W&B: our CTO [just broke](https://x.com/shawnup/status/1880004026957500434) the SWE-bench Verified SOTA with his own o1 agentic framework, which he calls W&B Programmer 😮, solving **64.6%** of the issues!
76 |
77 | Shawn describes how he achieved this massive breakthrough [here](https://medium.com/@shawnup/the-best-ai-programmer-from-weights-biases-04cf8127afd8), and we'll be publishing more on this soon, but the highlight for me is that he ran over 900 evaluations in the process and tracked all of them in [Weave](https://wandb.ai/site/weave?utm_source=thursdai&utm_medium=referral&utm_campaign=Jan16)!
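
For a sense of what that tracking looks like, the generic Weave pattern is tiny; this is just a sketch of that pattern (with a hypothetical project and function, not Shawn's actual framework):

```python
# A minimal sketch of the generic Weave tracing pattern, not Shawn's
# actual agent code: every call to a @weave.op function gets logged,
# with inputs and outputs, to the named W&B project.
import weave

weave.init("swe-bench-agent")  # hypothetical project name

@weave.op()
def attempt_issue(issue_text: str) -> str:
    # ...call your model / agent loop here...
    return "proposed patch"

attempt_issue("Fix off-by-one error in pagination")  # shows up as a trace in the Weave UI
```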
78 |
79 | We also have an upcoming event in NY on Jan 22nd; if you're there, come by to learn how to evaluate your AI agents and RAG applications, and hang out with our team! (Sign up [here](https://lu.ma/eufkbeem?utm_source=thursdai&utm_medium=referral&utm_campaign=Jan16))
80 |
81 | Big Companies & APIs
82 |
83 | OpenAI adds chatGPT tasks - first agentic feature with more to come!
84 |
85 | We finally get a glimpse of an agentic chatGPT, in the form of scheduled tasks! Deployed to all users, it's now possible to select gpt-4o with tasks and schedule tasks for the future.
86 |
87 | You schedule them in natural language, and chatGPT will then execute a chat (and maybe perform a search or do a calculation) and send you a notification (and an email!) when the task is done!
88 |
89 | It's a bit underwhelming at first, as I haven't really found a good use for this yet, but I don't doubt it's just a building block for something more agentic to come, something that can connect to my email or calendar and do actual tasks for me, not just... save me from typing the chatGPT query at "that time".
90 |
91 | Mistral CodeStral 25.01 - a new #1 coding assistant model
92 |
93 | An updated Codestral was released at the beginning of the week, and TBH I've never seen the vibes split this fast on a model.
94 |
95 | While it's super exciting that Mistral is placing a coding model at #1 on LMArena's Copilot arena, near Claude 3.5 and DeepSeek, the fact that this new model ships without open weights is a real bummer (especially in light of the open source point I made up top).
96 |
97 | We seem to be closing down on open source in the West, while the Chinese labs are absolutely crushing it (and releasing in the open, including weights and technical papers).
98 |
99 | Mistral released this model via API and through a collab with the Continue.dev coding agent, but they used to be the darling of the open source community precisely because they released great models in the open!
100 |
101 | Also notable: a quick post-release benchmark showed a significant gap between their reported numbers and how the model performs on Aider's polyglot benchmark.
102 |
103 | There were way more things this week than we were able to cover, including an exciting new Transformer² architecture from Sakana, a new open source TTS with voice cloning, and a few other open source LLMs, one of which cost only $450 to train! All the links are in the TL;DR below!
104 |
105 | TL;DR and show notes
106 |
107 | * **Open Source LLMs**
108 |
109 |   * MiniMax-01 from Hailuo - 4M context, 456B (45B active) LLM ([Github](https://github.com/MiniMax-AI/MiniMax-01), [HF](https://huggingface.co/MiniMaxAI), [Blog](https://www.minimaxi.com/en/news/minimax-01-series-2), [Report](https://filecdn.minimax.chat/_Arxiv_MiniMax_01_Report.pdf))
110 |
111 |   * Jina - ReaderLM-v2 - HTML to Markdown/JSON ([HF](https://huggingface.co/jinaai/ReaderLM-v2))
112 |
113 | * InternLM3-8B-Instruct - apache 2 License ([Github](https://github.com/InternLM/InternLM), [HF](https://huggingface.co/internlm))
114 |
115 | * OpenBMB - **MiniCPM-o 2.6** - Multimodal Live Streaming on Your Phone ([HF](https://huggingface.co/openbmb/MiniCPM-o-2_6), [Github](https://github.com/OpenBMB/MiniCPM-o), [Demo](https://minicpm-omni-webdemo-us.modelbest.cn/))
116 |
117 | * KyutAI - Helium-1 2B - Base ([X](https://x.com/kyutai_labs/thread/1878857673174864318), [HF](https://huggingface.co/kyutai/helium-1-preview-2b))
118 |
119 | * Dria-Agent-α - 3B model that outputs python code ([HF](https://huggingface.co/driaforall/Dria-Agent-a-3B))
120 |
121 | * Sky-T1, a ‘reasoning’ AI model that can be trained for less than $450 ([blog](https://novasky-ai.github.io/posts/sky-t1/))
122 |
123 | * **Big CO LLMs + APIs**
124 |
125 | * OpenAI launches ChatGPT tasks ([X](https://x.com/OpenAI/status/1879267274185756896))
126 |
127 | * Mistral - new CodeStral 25.01 ([Blog](https://mistral.ai/news/codestral-2501/), no Weights)
128 |
129 | * Sakana AI - Transformer²: Self-Adaptive LLMs ([Blog](https://sakana.ai/transformer-squared))
130 |
131 | * **This week's Buzz**
132 |
133 |   * Evaluating RAG Applications Workshop - NY, Jan 22, W&B and Pinecone ([Free Signup](https://lu.ma/eufkbeem))
134 |
135 | * Our evaluations course is going very strong! (chat w/ Graham Neubig) ([https://wandb.me/evals-t](https://wandb.me/evals-t))
136 |
137 | * **Vision & Video**
138 |
139 | * Luma releases Ray2 video model ([Web](https://lumalabs.ai/ray))
140 |
141 | * **Voice & Audio**
142 |
143 |   * Hailuo **T2A-01-HD** - emotional text-to-speech model ([X](https://x.com/Hailuo_AI/status/1879554062993195421), [Try It](https://t.co/r58fjgvJ7w))
144 |
145 | * OuteTTS 0.3 - 1B & 500M - zero shot voice cloning model ([HF](https://huggingface.co/collections/OuteAI/outetts-03-6786b1ebc7aeb757bc17a2fa))
146 |
147 |   * Kokoro.js - 80M SOTA TTS in your browser! (X, [Github](https://github.com/hexgrad/kokoro/pull/3), [try it](https://huggingface.co/spaces/webml-community/kokoro-web))
148 |
149 | * **AI Art & Diffusion & 3D**
150 |
151 | * Black Forest Labs - Finetuning for Flux Pro and Ultra via API ([Blog](https://blackforestlabs.ai/announcing-the-flux-pro-finetuning-api/))
152 |
153 | * **Show Notes and other Links**
154 |
155 |   * Hosts - Alex Volkov ([@altryne](https://x.com/altryne)), Wolfram Ravenwolf ([@WolframRvnwlf](https://twitter.com/WolframRvnwlf)), Nisten Tahiraj ([@nisten](https://x.com/nisten/))
156 |
157 | * Guest - Graham Neubig ([@gneubig](https://x.com/gneubig)) from All Hands AI ([@allhands_ai](https://x.com/allhands_ai))
158 |
159 |   * Graham’s mentioned agents blog post - [8 things that agents can do right now](https://www.all-hands.dev/blog/8-use-cases-for-generalist-software-development-agents)
160 |
161 | * Projects - Open Hands (previously Open Devin) - [Github](https://github.com/All-Hands-AI/OpenHands)
162 |
163 | * Germany meetup in Cologne ([here](https://twitter.com/WolframRvnwlf/status/1877338980632383713))
164 |
165 | * Toronto Tinkerer Meetup *Sold OUT* ([Here](https://toronto.aitinkerers.org/p/ai-tinkerers-toronto-january-2025-meetup-at-google))
166 |
167 | * YaRN conversation we had with the Authors ([coverage](https://sub.thursdai.news/p/thursdai-sunday-special-extending?utm_source=publication-search))
168 |
169 | See you folks next week! Have a great long weekend if you’re in the US 🫡
170 |
171 | Please help to promote the podcast and newsletter by sharing with a friend!
172 |
173 |
174 |
175 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-jan-16-2025-hailuo-4m-context/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-jan-16-2025-hailuo-4m-context?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE1NDk4NjQ5MywiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.dhVHbEk4Kb2DLfejXT5cpNzGqSQ8lgTvCGBQSVFaFR0&utm_campaign=CTA_5).
176 |
--------------------------------------------------------------------------------