├── .gitignore ├── README.md ├── 2025_episodes ├── Q4 2025 │ └── December 2025 │ │ └── ThursdAI_Special_Googles_New_Anti-Gravity_IDE_Gemini_3_Nano_Banana_Pro_Explained_ft_Kevin_Hou_Ammaar.md ├── Q2 2025 │ └── June 2025 │ │ ├── _ThursdAI_-_June_19_-_MiniMax_M1_beats_R1_OpenAI_records_your_meetings_Gemini_in_GA_WB_uses_Coreweav.md │ │ ├── _ThursdAI_-_Jun_26_-_Gemini_CLI_Flux_Kontext_Dev_Search_Live_Anthropic_destroys_books_Zucks_superint.md │ │ └── _ThursdAI_-_Jun_5_2025_-_Live_from_AI_Engineer_with_Swyx_new_Gemini_25_with_Logan_K_and_Jack_Rae_Sel.md ├── Q1 2025 │ ├── January 2025 │ │ ├── _ThursdAI_-_Jan_2_-_is_25_the_year_of_AI_agents.md │ │ └── _ThursdAI_-_Jan_16_2025_-_Hailuo_4M_context_LLM_SOTA_TTS_in_browser_OpenHands_interview_more_AI_news.md │ └── March 2025 │ │ └── ThursdAI_-_Mar_6_2025_-_Alibabas_R1_Killer_QwQ_Exclusive_Google_AI_Mode_Chat_and_MCP_fever_sweeping_.md └── Q3 2025 │ ├── August 2025 │ ├── _ThursdAI_-_GPT5_is_here.md │ └── _ThursdAI_Jul_31_2025_Qwens_Small_Models_Go_Big_StepFuns_Multimodal_Leap_GLM-45s_Chart_Crimes_and_Ru.md │ └── July 2025 │ └── _ThursdAI_-_July_24_2025_-_Qwen-mas_in_July_The_White_Houses_AI_Action_Plan_Math_Olympiad_Gold_for_A.md ├── .agent └── workflows │ └── create-quarterly-recap.md ├── example prompts ├── Open revol infographic prompt.md └── ThursdAI Dec 11 2025 Infographic prompt.md ├── ThursdAI_News_Infographic_System_Prompt.md ├── parse_rss.py ├── 2025_AI_Year_in_Review.md ├── Q1_2025_AI_Recap.md ├── agents.md ├── Q3_2025_AI_Recap.md └── Q2_2025_AI_Recap.md /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 🎙️ ThursdAI 2025 Year in Review 2 | 3 | A comprehensive recap of the most significant AI developments from 2025, curated from weekly [ThursdAI](https://thursdai.news) podcast episodes hosted by [Alex Volkov](https://x.com/altryne). 4 | 5 | ## 📖 Full Year Review 6 | 7 | **[2025 AI Year in Review](./2025_AI_Year_in_Review.md)** — The complete summary of AI's most transformative year yet. 
8 | 
9 | ---
10 | 
11 | ## 📅 Quarterly Recaps
12 | 
13 | ### Q1 2025 — The Quarter That Changed Everything
14 | DeepSeek R1, Gemini 2.5, Qwen 2.5 Max, Gemma 3, MCP protocol fever
15 | 
16 | [![Q1 2025 Infographic](https://pub-7837090e9353474292fc8c7114c5fa9d.r2.dev/thursdai_infographics/Q1%202025%20ThursdAI%20Q1%20infographic.jpg)](./Q1_2025_AI_Recap.md)
17 | 
18 | **[📖 Read Full Q1 Recap →](./Q1_2025_AI_Recap.md)**
19 | 
20 | ---
21 | 
22 | ### Q2 2025 — The Quarter That Shattered Reality
23 | Claude 4 (Opus & Sonnet), GPT-4.1, o3/o4-mini, Llama 4, Veo 3, Google I/O
24 | 
25 | [![Q2 2025 Infographic](https://pub-7837090e9353474292fc8c7114c5fa9d.r2.dev/thursdai_infographics/Q2%202025%20ThursdAI%20Q2%20infographic.jpg)](./Q2_2025_AI_Recap.md)
26 | 
27 | **[📖 Read Full Q2 Recap →](./Q2_2025_AI_Recap.md)**
28 | 
29 | ---
30 | 
31 | ### Q3 2025 — GPT-5, Trillion-Scale Open Source, World Models
32 | GPT-5 launch, Grok 4, Kimi K2, IMO Gold for AI, agentic coding explosion
33 | 
34 | [![Q3 2025 Infographic](https://pub-7837090e9353474292fc8c7114c5fa9d.r2.dev/thursdai_infographics/Q3%202025%20ThursdAI%20Q3%20infographic.jpeg)](./Q3_2025_AI_Recap.md)
35 | 
36 | **[📖 Read Full Q3 Recap →](./Q3_2025_AI_Recap.md)**
37 | 
38 | ---
39 | 
40 | ### Q4 2025 — Agents, Gemini's Crown & Sora Social
41 | Gemini 3, Claude 4.5, DeepSeek V3.2, Sora 2, AI browser wars begin
42 | 
43 | [![Q4 2025 Infographic](https://pub-7837090e9353474292fc8c7114c5fa9d.r2.dev/thursdai_infographics/Q4%202025%20ThursdAI%20Q3%20infographic%20(1).jpg)](./Q4_2025_AI_Recap.md)
44 | 
45 | **[📖 Read Full Q4 Recap →](./Q4_2025_AI_Recap.md)**
46 | 
47 | ---
48 | 
49 | ## 🗂️ Episode Archive
50 | 
51 | All individual episode notes are organized in the [`2025_episodes/`](./2025_episodes/) directory, structured by quarter and month.
52 | 
53 | ---
54 | 
55 | ## 🔗 Links
56 | 
57 | - 🎧 **Podcast**: [thursdai.news](https://thursdai.news)
58 | - 🐦 **Follow Alex**: [@altryne](https://x.com/altryne)
59 | 
60 | ---
61 | 
62 | ## 📝 About
63 | 
64 | ThursdAI is a weekly AI news podcast that has been tracking the rapid pace of AI development since 2023. This repository contains structured recaps from all 2025 episodes, making it easy to look back at how quickly the field evolved.
65 | 
66 | *Last updated: December 2025*
67 | 
--------------------------------------------------------------------------------
/2025_episodes/Q4 2025/December 2025/ThursdAI_Special_Googles_New_Anti-Gravity_IDE_Gemini_3_Nano_Banana_Pro_Explained_ft_Kevin_Hou_Ammaar.md:
--------------------------------------------------------------------------------
1 | # ThursdAI Special: Google's New Anti-Gravity IDE, Gemini 3 & Nano Banana Pro Explained (ft. Kevin Hou, Ammaar Reshi & Kat Kampf)
2 | 
3 | **Date:** December 02, 2025
4 | **Duration:** 46:04
5 | **Link:** [https://sub.thursdai.news/p/thursdai-special-googles-new-anti](https://sub.thursdai.news/p/thursdai-special-googles-new-anti)
6 | 
7 | ---
8 | 
9 | ## Description
10 | 
11 | Hey, Alex here,
12 | 
13 | I recorded these conversations just in front of the AI Engineer auditorium, back to back, right after these great folks gave their talks, at the peak of the most epic AI week we’ve seen since I started recording ThursdAI.
14 | 
15 | This is less our traditional live recording, and more a real podcast-y conversation with great folks, inspired by [Latent.Space](https://substack.com/profile/89230629-latentspace). I hope you enjoy this format as much as I’ve enjoyed recording and editing it.
16 | 
17 | AntiGravity with Kevin
18 | 
19 | Kevin Hou and team just launched Antigravity, Google’s brand new Agentic IDE based on VSCode, and Kevin (second timer on ThursdAI) was awesome enough to hop on and talk about some of the product decisions they made and what makes Antigravity special, highlighting Artifacts as a completely new primitive.
20 | 
21 | Gemini 3 in AI Studio
22 | 
23 | If you aren’t using Google’s AI Studio ([ai.dev](http://ai.dev)) then you’re missing out! We talk about AI Studio all the time on the show, and I’m a daily user! I generate most of my images with Nano Banana Pro in there, and most of my Gemini conversations happen there as well!
24 | 
25 | Ammaar and Kat were so fun to talk to, as they covered the newly shipped “build mode” which allows you to vibe code full apps and experiences inside AI Studio, and we also covered Gemini 3’s features, multimodal understanding, and UI capabilities.
26 | 
27 | These folks gave a LOT of Gemini 3 demos, so they know everything there is to know about this model’s capabilities!
28 | 
29 | Tried new things with this one: multi-camera angles, conversations with great folks. If you found this content valuable, please subscribe :)
30 | 
31 | **Topics Covered:**
32 | 
33 | * Inside Google’s new “AntiGravity” IDE
34 | 
35 | * How the “Agent Manager” changes coding workflows
36 | 
37 | * Gemini 3’s new multimodal capabilities
38 | 
39 | * The power of “Artifacts” and dynamic memory
40 | 
41 | * Deep dive into AI Studio updates & Vibe Coding
42 | 
43 | * Generating 4K assets with Nano Banana Pro
44 | 
45 | Timestamps for your viewing convenience.
46 | 
47 | 00:00 - Introduction and Overview
48 | 
49 | 01:13 - Conversation with Kevin Hou: Anti-Gravity IDE
50 | 
51 | 01:58 - Gemini 3 and Nano Banana Pro Launch Insights
52 | 
53 | 03:06 - Innovations in Anti-Gravity IDE
54 | 
55 | 06:56 - Artifacts and Dynamic Memory
56 | 
57 | 09:48 - Agent Manager and Multimodal Capabilities
58 | 
59 | 11:32 - Chrome Integration and Future Prospects
60 | 
61 | 20:11 - Conversation with Ammaar and Kat: AI Studio Team
62 | 
63 | 21:21 - Introduction to AI Studio
64 | 
65 | 21:51 - What is AI Studio?
66 | 
67 | 22:52 - Ease of Use and User Feedback
68 | 
69 | 24:06 - Live Demos and Launch Week
70 | 
71 | 26:00 - Design Innovations in AI Studio
72 | 
73 | 30:54 - Generative UIs and Vibe Coding
74 | 
75 | 33:53 - Nano Banana Pro and Image Generation
76 | 
77 | 39:45 - Voice Interaction and Future Roadmap
78 | 
79 | 44:41 - Conclusion and Final Thoughts
80 | 
81 | Looking forward to seeing you on Thursday 🫡
82 | 
83 | P.S - I’ve recorded one more conversation during AI Engineer, and will be posting that soon, same format, very interesting person, look out for it!
84 | 
85 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-special-googles-new-anti/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-special-googles-new-anti?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE4MDQ2NTY3MSwiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.h3ViA8Pw-4_8oniEdqLjP9b_W9t8ymono4EoyxRrYj4&utm_campaign=CTA_5).
86 | -------------------------------------------------------------------------------- /.agent/workflows/create-quarterly-recap.md: -------------------------------------------------------------------------------- 1 | --- 2 | description: Create ThursdAI quarterly AI recap from combined episode files 3 | --- 4 | 5 | # ThursdAI Quarterly AI Recap Workflow 6 | 7 | ## Overview 8 | Generate a month-by-month breakdown of significant AI news from ThursdAI newsletters for a specific quarter. 9 | 10 | ## Input Required 11 | When starting this workflow, specify: 12 | - **Quarter**: Q1, Q2, Q3, or Q4 13 | - **Year**: 2025, 2026, etc. 14 | 15 | ## Source Files Location 16 | - Combined episode files are located at: `/Users/altryne/projects/thursdAI_yearly_recap/2025_episodes/Q[X] 2025/Q[X]_2025_combined.md` 17 | - Each combined file contains all episodes for that quarter with headers like `## 📆 ThursdAI - [Date] - [Title]` 18 | 19 | ## Output Format 20 | Create a markdown file at: `/Users/altryne/projects/thursdAI_yearly_recap/Q[X]_2025_AI_Recap.md` 21 | 22 | ### Structure Template 23 | ```markdown 24 | # Q[X] 2025 AI Recap - ThursdAI 25 | 26 | ## Quarter Overview 27 | [2-3 sentence summary of the quarter's major themes] 28 | 29 | --- 30 | 31 | ## [Month] 2025 32 | 33 | ### Top Stories 34 | - **[Major Release 1]**: [1-2 sentence description] 35 | - **[Major Release 2]**: [1-2 sentence description] 36 | 37 | ### Open Source LLMs 38 | - **[Model Name]**: [Brief description with key specs] 39 | 40 | ### Big CO LLMs + APIs 41 | - **[Product/Model]**: [Brief description] 42 | 43 | ### Vision & Video 44 | - **[Model/Product]**: [Brief description] 45 | 46 | ### Voice & Audio 47 | - **[Model/Product]**: [Brief description] 48 | 49 | ### AI Art & Diffusion & 3D 50 | - **[Model/Product]**: [Brief description] 51 | 52 | ### Tools 53 | - **[Tool Name]**: [Brief description] 54 | 55 | --- 56 | 57 | [Repeat for each month in the quarter] 58 | 59 | --- 60 | 61 | ## Quarter Summary 62 | 63 | ### Major Themes 64 | 1. [Theme 1] 65 | 2. [Theme 2] 66 | 3. [Theme 3] 67 | 68 | ### Biggest Releases by Month 69 | - **[Month 1]**: [Top release] 70 | - **[Month 2]**: [Top release] 71 | - **[Month 3]**: [Top release] 72 | ``` 73 | 74 | ## Prioritization Criteria 75 | 1. **Title Mentions**: Releases mentioned in episode titles are highest priority 76 | 2. **Discussion Depth**: Items with extensive coverage in newsletter body 77 | 3. **Community Impact**: Mentions of viral moments, benchmarks broken, or widespread adoption 78 | 4. **Categories to track**: 79 | - Open Source LLMs (models, weights, training methods) 80 | - Big CO LLMs + APIs (OpenAI, Google, Anthropic, xAI, etc.) 81 | - Vision & Video (video generation, VLMs) 82 | - Voice & Audio (TTS, STT, music) 83 | - AI Art & Diffusion & 3D (image gen, 3D models) 84 | - Tools (agents, protocols like MCP/A2A, coding assistants) 85 | 86 | ## Steps 87 | 88 | 1. **Read the combined file** for the target quarter 89 | - File may be 2000+ lines, read in chunks of 800 lines 90 | - Note episode dates to categorize by month 91 | 92 | 2. **Extract releases by month** 93 | - Group episodes by their publication month 94 | - For each episode, identify: 95 | - Items in episode title (highest priority) 96 | - Items in TL;DR section 97 | - Items discussed extensively in body 98 | 99 | 3. **Categorize and summarize** 100 | - Place each release in appropriate category 101 | - Write concise 1-2 sentence summaries 102 | - Note key specs (parameter counts, benchmarks, licenses) 103 | 104 | 4. 
**Identify top stories per month**
105 |    - Select 2-4 most impactful releases
106 |    - These go in "Top Stories" section
107 | 
108 | 5. **Write quarter summary**
109 |    - Identify 3-5 overarching themes
110 |    - List biggest release per month
111 | 
112 | 6. **Reference existing recaps** for format consistency
113 |    - See `/Users/altryne/projects/thursdAI_yearly_recap/Q1_2025_AI_Recap.md` as template
114 | 
115 | ## Quarter-Month Mapping
116 | - **Q1**: January, February, March
117 | - **Q2**: April, May, June
118 | - **Q3**: July, August, September
119 | - **Q4**: October, November, December
120 | 
121 | ## Example Prompt to Start
122 | ```
123 | Create the Q2 2025 AI recap using the /create-quarterly-recap workflow.
124 | Read /Users/altryne/projects/thursdAI_yearly_recap/2025_episodes/Q2 2025/Q2_2025_combined.md
125 | and generate the recap following the established format from Q1.
126 | ```
127 | 
--------------------------------------------------------------------------------
/2025_episodes/Q2 2025/June 2025/_ThursdAI_-_June_19_-_MiniMax_M1_beats_R1_OpenAI_records_your_meetings_Gemini_in_GA_WB_uses_Coreweav.md:
--------------------------------------------------------------------------------
1 | # 📆 ThursdAI - June 19 - MiniMax M1 beats R1, OpenAI records your meetings, Gemini in GA, W&B uses Coreweave GPUs & more AI news
2 | 
3 | **Date:** June 20, 2025
4 | **Duration:** 1:41:31
5 | **Link:** [https://sub.thursdai.news/p/thursdai-june-18-minimax-m1-beats](https://sub.thursdai.news/p/thursdai-june-18-minimax-m1-beats)
6 | 
7 | ---
8 | 
9 | ## Description
10 | 
11 | Hey all, Alex here 👋
12 | 
13 | This week, while not the busiest week in releases (we can't get a SOTA LLM every week now, can we?), was full of interesting open source releases and feature updates, such as the chatGPT meetings recorder (which we live-tested on the show; the limit is 2 hours!)
14 | 
15 | It was also the day after our annual W&B conference, FullyConnected, so I had a few goodies to share with you, like answering the main question: when will W&B make use of those GPUs from CoreWeave? The answer is... now! (We launched a brand new preview of an inference service with open source models)
16 | 
17 | And finally, we had a great chat with Pankaj Gupta, co-founder and CEO of Yupp, a new service that lets users chat with the top AIs for free, while turning their votes into leaderboards for everyone else to understand which Gen AI model is best for which task/topic. It was a great conversation, and he even shared an invite code with all of us (I'll attach it to the TL;DR and show notes; let's dive in!)
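One more for the devs before the timestamps: since the W&B Inference preview is the headline buzz item below, here's a minimal sketch of what calling it might look like, assuming it follows the common OpenAI-compatible pattern. The base URL and model ID here are illustrative guesses, not confirmed values; check the inference docs linked in the TL;DR for the real ones.

```python
# Hedged sketch of calling an OpenAI-compatible inference endpoint.
# ASSUMPTIONS: base_url and the model ID below are illustrative placeholders --
# confirm both against the W&B Inference docs linked in the TL;DR.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.wandb.ai/v1",  # assumed endpoint, verify in docs
    api_key=os.environ["WANDB_API_KEY"],           # your W&B API key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # hypothetical hosted open-source model ID
    messages=[{"role": "user", "content": "Summarize this week's AI news in one line."}],
)
print(response.choices[0].message.content)
```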
18 | 
19 | 00:00 Introduction and Welcome
20 | 
21 | 01:04 Show Overview and Audience Interaction
22 | 
23 | 01:49 Special Guest Announcement and Experiment
24 | 
25 | 03:05 Wolfram's Background and Upcoming Hosting
26 | 
27 | 04:42 TLDR: This Week's Highlights
28 | 
29 | 15:38 Open Source AI Releases
30 | 
31 | 32:34 Big Companies and APIs
32 | 
33 | 32:45 Google's Gemini Updates
34 | 
35 | 42:25 OpenAI's Latest Features
36 | 
37 | 54:30 Exciting Updates from Weights & Biases
38 | 
39 | 56:42 Introduction to Weights & Biases Inference Service
40 | 
41 | 57:41 Exploring the New Inference Playground
42 | 
43 | 58:44 User Questions and Model Recommendations
44 | 
45 | 59:44 Deep Dive into Model Evaluations
46 | 
47 | 01:05:55 Announcing Online Evaluations via Weave
48 | 
49 | 01:09:05 Introducing Pankaj Gupta from [YUP.AI](http://YUP.AI)
50 | 
51 | 01:10:23 [YUP.AI](http://YUP.AI): A New Platform for Model Evaluations
52 | 
53 | 01:13:05 Discussion on Crowdsourced Evaluations
54 | 
55 | 01:27:11 New Developments in Video Models
56 | 
57 | 01:36:23 OpenAI's New Transcription Service
58 | 
59 | 01:39:48 Show Wrap-Up and Future Plans
60 | 
61 | Here's the TL;DR and show notes links:
62 | 
63 | ThursdAI - June 19th, 2025 - TL;DR
64 | 
65 | * **Hosts and Guests**
66 | 
67 | * **Alex Volkov** - AI Evangelist & Weights & Biases ([@altryne](http://x.com/@altryne))
68 | 
69 | * Co-Hosts - [@WolframRvnwlf](http://x.com/@WolframRvnwlf) [@yampeleg](http://x.com/@yampeleg) [@nisten](http://x.com/@nisten) [@ldjconfirmed](http://x.com/@ldjconfirmed)
70 | 
71 | * Guest - [@pankaj](http://x.com/@pankaj) - co-founder of [Yupp.ai](https://yupp.ai/join/thursdAI)
72 | 
73 | * **Open Source LLMs**
74 | 
75 | * Moonshot AI open-sourced Kimi-Dev-72B ([Github](https://github.com/MoonshotAI/Kimi-Dev?tab=readme-ov-file), [HF](https://huggingface.co/moonshotai/Kimi-Dev-72B))
76 | 
77 | * MiniMax-M1 456B (45B Active) - reasoning model ([Paper](https://arxiv.org/abs/2506.13585), [HF](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k), [Try It](https://huggingface.co/spaces/MiniMaxAI/MiniMax-M1), [Github](https://github.com/MiniMax-AI/MiniMax-M1))
78 | 
79 | * **Big CO LLMs + APIs**
80 | 
81 | * Google drops Gemini 2.5 Pro/Flash GA, 2.5 Flash-Lite in Preview ([Blog](https://blog.google/products/gemini/gemini-2-5-model-family-expands/), [Tech report](https://storage.googleapis.com/gemini-technical-report), [Tweet](https://x.com/google/status/192905415))
82 | 
83 | * Google launches Search Live: Talk, listen and explore in real time with AI Mode ([Blog](https://blog.google/products/search/search-live-ai-mode/))
84 | 
85 | * OpenAI adds MCP support to Deep Research in chatGPT ([X](https://x.com/altryne/status/1934644274227769431), [Docs](https://platform.openai.com/docs/mcp))
86 | 
87 | * OpenAI launches their meetings recorder in the Mac app ([docs](https://help.openai.com/en/articles/11487532-chatgpt-record))
88 | 
89 | * Zuck update: Considering bringing Nat Friedman and Daniel Gross to Meta ([information](https://x.com/amir/status/1935461177045516568))
90 | 
91 | * **This Week's Buzz**
92 | 
93 | * NEW! W&B Inference provides a unified interface to access and run top open-source AI models ([inference](https://wandb.ai/inference), [docs](https://weave-docs.wandb.ai/guides/integrations/inference/))
94 | 
95 | * NEW! W&B Weave Online Evaluations delivers real-time production insights and continuous evaluation for AI agents across any cloud.
([X](https://x.com/altryne/status/1935412384283107572))
96 | 
97 | * The new platform offers "metal-to-token" observability, linking hardware performance directly to application-level metrics.
98 | 
99 | * **Vision & Video**
100 | 
101 | * ByteDance's new video model beats Veo 3 - Seedance 1.0 mini ([Site](https://dreamina.capcut.com/ai-tool/video/generate), [FAL](https://fal.ai/models/fal-ai/bytedance/seedance/v1/lite/image-to-video))
102 | 
103 | * MiniMax Hailuo 02 - 1080p native, SOTA instruction following ([X](https://www.minimax.io/news/minimax-hailuo-02), [FAL](https://fal.ai/models/fal-ai/minimax/hailuo-02/pro/image-to-video))
104 | 
105 | * Midjourney video is also here - great visuals ([X](https://x.com/angrypenguinPNG/status/1932931137179176960))
106 | 
107 | * **Voice & Audio**
108 | 
109 | * Kyutai launches open-source, high-throughput streaming Speech-To-Text models for real-time applications ([X](https://x.com/kyutai_labs/thread/1935652243119788111))
110 | 
111 | * **Studies and Others**
112 | 
113 | * LLMs Flunk Real-World Coding Contests, Exposing a Major Skill Gap ([Arxiv](https://arxiv.org/pdf/2506.11928))
114 | 
115 | * MIT Study: ChatGPT Use Causes Sharp Cognitive Decline ([Arxiv](https://arxiv.org/abs/2506.08872))
116 | 
117 | * Andrej Karpathy's "Software 3.0": The Dawn of English as a Programming Language ([youtube](https://www.youtube.com/watch?v=LCEmiRjPEtQ), [deck](https://drive.google.com/file/d/1HIEMdVlzCxke22ISVzornd2-UpWHngRZ/view?usp=sharing))
118 | 
119 | * **Tools**
120 | 
121 | * Yupp launches with 500+ AI models, a new leaderboard, and a user-powered feedback economy - use the [thursdai link](https://yupp.ai/join/thursdAI)* to get 50% extra credits
122 | 
123 | * BrowserBase announces [director.ai](http://director.ai) - an agent to run things on the web
124 | 
125 | * Universal system prompt for reduction of hallucination (from [Reddit](https://www.reddit.com/r/PromptEngineering/comments/1kup28y/chatgpt_and_gemini_ai_will_gaslight_you_everyone/))
126 | 
127 | *Disclosure: while this isn't a paid promotion, I do think that Yupp has great value; I do get a bit more credits on their platform if you click my link, and so do you. You can go to [yupp.ai](http://yupp.ai) and register with no affiliation if you wish.
128 | 
129 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-june-18-minimax-m1-beats/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-june-18-minimax-m1-beats?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE2NjM1OTY2MCwiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.XSlsS0LVkZoKjnK1vqluK6duzE3fa7L1zHsEvPRQUW8&utm_campaign=CTA_5).
130 | 
--------------------------------------------------------------------------------
/example prompts/Open revol infographic prompt.md:
--------------------------------------------------------------------------------
1 | Infographic Prompt: ThursdAI – Dec 4, 2025 · “Code Red vs. Open Revolt”
2 | 
3 | Design a high-end VERTICAL promo infographic poster (9:16 aspect ratio) for a tech podcast episode.
4 | 
5 | EPISODE TITLE & TOP SECTION
6 | - Main title at the very top in bold modern sans-serif:
7 | “ThursdAI – Code Red vs.
Open Revolt” 8 | - Subtitle under it: 9 | “December 4, 2025 · Weekly AI Roundup” 10 | - Tiny line: 11 | “Hosted by Alex Volkov · @altryne” 12 | - Include a STYLIZED portrait/avatar of Alex Volkov near the title, using my reference image. Make him look like a sharp news anchor in an AI cold war: slight 3/4 view, geometric shading, no photorealism. Add a subtle mic icon or waveform integrated into the frame. 13 | 14 | OVERALL VIBE & STYLE 15 | - Overall vibe: strategic, dramatic, like a movie poster for an AI information war meets Bloomberg terminal. 16 | - Style: FLAT VECTOR illustration with bold graphic shapes, angled cuts, and strong contrast. 17 | - IMPORTANT: Do NOT reuse last week’s soft neon-core / skyline / circuit-board-glow look. NO circular “reactor core”, NO cute or rounded pills. 18 | - Use angular elements: diagonals, slashes, wedges, hard-edged panels, occasional halftone or scanline textures for retro-tech flavor. 19 | 20 | COLOR PALETTE & SPLIT 21 | - Base: deep charcoal, obsidian black, and midnight blue. 22 | - Split palette: 23 | - LEFT side = “Code Red” / closed labs / incumbents: hot oranges, reds, magentas, and gold accents. 24 | - RIGHT side = “Open Revolt” / open-source uprising: electric teals, cyans, neon greens. 25 | - Use a central jagged RIFT or LIGHTNING BOLT shape to separate the two sides visually, with thin data lines crossing the divide. 26 | 27 | MAIN VISUAL CONCEPT 28 | - Behind the title, instead of a skyline, show an abstract top-down “strategic map” of an AI battlefield: 29 | - On the left, sharp geometric blocks with warning triangles, radar sweeps, and alert icons (closed labs). 30 | - On the right, fractal-like grids, open nodes, arrows fanning outward (open-source). 31 | - From the center of the poster, a vertical or diagonal schism runs downwards, as if the surface has cracked. This rift separates the LEFT “Code Red” stories from the RIGHT “Open Revolt” stories. 32 | 33 | LAYOUT & HIERARCHY 34 | - Think of the poster as a split front page: 35 | - LEFT COLUMN: Closed-lab / BigCo stories (warm palette). 36 | - RIGHT COLUMN: Open-source & decentralized stories (cool palette). 37 | - Each side features 3–4 main panels (hard-edged rectangles or trapezoids) with: 38 | - A bold short title. 39 | - A single concise subtitle line. 40 | - A minimal abstract icon. 41 | - Under the main split, add a thin “ticker bar” for secondary topics: video, image models, and tools. 42 | - At the very bottom, a strong footer banner with show branding. 43 | 44 | LEFT SIDE – “CODE RED / CLOSED LABS” 45 | Use warm oranges/reds/golds for panel backgrounds or borders. 46 | 47 | 1) Panel: “OpenAI · Code Red & Garlic” 48 | - Subtitle: “Emergency focus on ChatGPT · new Garlic model to counter Gemini” 49 | - Icon: A red alert klaxon / siren with a stylized garlic bulb silhouette inside, emitting triangular warning beams. 50 | 51 | 2) Panel: “Amazon Nova 2 Family” 52 | - Subtitle: “Agentic Lite & Pro · 1M-token Omni · hybrid thinking budgets” 53 | - Icon: A dense cloud outline containing four distinct nodes (Lite/Pro/Sonic/Omni) connected by workflow arrows. 54 | 55 | 3) Panel: “Runway Gen-4.5” 56 | - Subtitle: “#1 text-to-video · 1,247 Elo · physics-level motion” 57 | - Icon: A diagonal film strip morphing into flowing motion waves, with a tiny #1 badge/crown. 
58 | 59 | 4) Panel: “Kling VIDEO 2.6” 60 | - Subtitle: “1080p video with native audio · synced dialogue & SFX” 61 | - Icon: A rectangular video frame with sound waves and a speaking profile silhouette; tiny music notes and waveform lines integrated. 62 | 63 | RIGHT SIDE – “OPEN REVOLT / OPEN SOURCE” 64 | Use teals, cyans, and neon green. 65 | 66 | Make this side feel slightly more expansive and energetic—this week’s big narrative. 67 | 68 | 1) HERO PANEL (slightly larger): 69 | “DeepSeek V3.2 & Speciale” 70 | - Subtitle: “685B-param MoE · rivals GPT-5 · gold-medal IMO / IOI / ICPC” 71 | - Icon: A deep multi-layer prism/brain crystal with orbiting math symbols (π, integral sign, etc.) and tiny medal/trophy shapes. 72 | 73 | 2) Panel: “Mistral 3 & Ministral 3” 74 | - Subtitle: “Apache 2.0 models from 3B edge to 675B MoE frontier” 75 | - Icon: A stylized wind gust sweeping across four stacked chips labeled as abstract size badges (XS/S/M/L dots). 76 | 77 | 3) Panel: “Arcee Trinity (Mini & Nano)” 78 | - Subtitle: “US-trained open-weight MoE · 10T tokens · iPhone to H200” 79 | - Icon: A triangular trinity symbol of three connected nodes, with subtle US-flag stripes and speed lines indicating high tokens/second. 80 | 81 | 4) Panel: “Hermes 4.3 on Psyche” 82 | - Subtitle: “Decentralized training network · 36B model · 512K context” 83 | - Icon: A globe made of nodes with orbiting satellites; thin lines show distributed training paths, NOT a single big server. 84 | 85 | CENTER / NEUTRAL ELEMENT – “THIS WEEK’S BUZZ” 86 | - In the mid-lower center, slightly overlapping both sides of the rift, add a neutral, grayish panel (not warm or cool) titled: 87 | - Title: “This Week’s Buzz · W&B LLM Eval Jobs” 88 | - Subtitle: “Mid-training checkpoint evals · 100+ benchmarks via Inspect” 89 | - Icon idea: a small model chip with bar charts and checkmarks rising from it, like a tiny leaderboard, indicating evaluation and monitoring. 90 | - This panel visually “bridges” closed and open worlds, since eval tooling applies to both. 91 | 92 | BOTTOM TICKER – VIDEO / IMAGE / DIFFUSION STRIP 93 | - Add a horizontal strip across the lower third, styled like a financial news ticker with tiny icons and short labels. 94 | - Use alternating warm/cool swatches to show mix of players. 95 | 96 | Include three mini-blocks: 97 | 98 | 1) “Seedream 4.5” 99 | - Tiny subtext: “Production-grade images · multi-reference fusion & sharp text” 100 | - Icon: A camera shutter overlaid with multiple ghosted image tiles. 101 | 102 | 2) “Pruna P-Image & Edit” 103 | - Tiny subtext: “Sub-second gen & edits · $0.005 per image” 104 | - Icon: A lightning bolt hitting a picture frame, with a tiny slider/magic wand icon. 105 | 106 | 3) “Kling IMAGE O1” 107 | - Tiny subtext: “Understand anything · precise edits · bold stylization” 108 | - Icon: A stylized eye merged into a paintbrush over a cube. 109 | 110 | INTERVIEW SPOTLIGHT – LUCAS ATKINS 111 | - On the open-source (right) side, but closer to the bottom corner, add a compact “spotlight” badge: 112 | 113 | Title: “Guest: Lucas Atkins” 114 | Subtitle: “CTO Arcee AI · Trinity deep-dive” 115 | Icon: A simplified person silhouette at a mic, backed by a small trinity triangle. 116 | 117 | BOTTOM FOOTER 118 | - Full-width footer bar with a slightly lighter gradient over the dark base, spanning both sides (unifying them). 119 | - Left: podcast mic icon with tiny neural nodes. 120 | Text: “ThursdAI · Weekly AI Roundup” 121 | - Center: “Episode: Code Red vs. 
Open Revolt”
122 | - Right: “AI Engineer Podcast · Live from New York”
123 | 
124 | TYPOGRAPHY & UI DETAILS
125 | - Use a strong, legible sans-serif type across the poster.
126 | - Titles in Title Case, subtitles in sentence case.
127 | - Panels: hard corners, subtle inner strokes or thin outer glows ONLY if needed for separation—no bubbly pills.
128 | - Maintain high contrast: light text on dark panels; avoid long paragraphs.
129 | - NO real company logos. Use only abstract icons and shapes that suggest each brand/topic.
130 | - Use diagonal separators, angular dividers, and occasional halftone/scanline textures in the background to emphasize tension and motion.
131 | 
132 | The final poster should feel like a **split AI war-room front page**: closed labs sounding alarms on one flank, open models staging a full-scale uprising on the other, with ThursdAI and W&B sitting in the middle making sense of the chaos.
--------------------------------------------------------------------------------
/2025_episodes/Q1 2025/January 2025/_ThursdAI_-_Jan_2_-_is_25_the_year_of_AI_agents.md:
--------------------------------------------------------------------------------
1 | # 📆 ThursdAI - Jan 2 - is 25' the year of AI agents?
2 | 
3 | **Date:** January 02, 2025
4 | **Duration:** 1:31:29
5 | **Link:** [https://sub.thursdai.news/p/thursdai-jan-2-is-25-the-year-of](https://sub.thursdai.news/p/thursdai-jan-2-is-25-the-year-of)
6 | 
7 | ---
8 | 
9 | ## Description
10 | 
11 | Hey folks, Alex here 👋 Happy new year!
12 | 
13 | On our first episode of this year, and the second quarter of this century, there wasn't a lot of AI news to report on (most AI labs were on a well-deserved break). So this week, I'm very happy to present a special ThursdAI episode, an interview with [João Moura](https://x.com/joaomdmoura), CEO of [Crew.ai](http://Crew.ai), all about AI agents!
14 | 
15 | We first chatted with João a [year ago](https://sub.thursdai.news/p/jan14-sunday-special-deep-dives), back in January of 2024, as CrewAI was blowing up while still just an open source project; it got to be the #1 trending project on GitHub and the #1 project on Product Hunt. (You can either listen to the podcast or watch it in the embedded Youtube above)
16 | 
17 | 00:36 Introduction and New Year Greetings
18 | 
19 | 02:23 Updates on Open Source and LLMs
20 | 
21 | 03:25 Deep Dive: AI Agents and Reasoning
22 | 
23 | 03:55 Quick TLDR and Recent Developments
24 | 
25 | 04:04 Medical LLMs and Modern BERT
26 | 
27 | 09:55 Enterprise AI and Crew AI Introduction
28 | 
29 | 10:17 Interview with João Moura: Crew AI
30 | 
31 | 25:43 Human-in-the-Loop and Agent Evaluation
32 | 
33 | 33:17 Evaluating AI Agents and LLMs
34 | 
35 | 44:48 Open Source Models and Fin to OpenAI
36 | 
37 | 45:21 Performance of Claude's Sonnet 3.5
38 | 
39 | 48:01 Different parts of an agent topology, brain, memory, tools, caching
40 | 
41 | 53:48 Tool Use and Integrations
42 | 
43 | 01:04:20 Removing LangChain from Crew
44 | 
45 | 01:07:51 The Year of Agents and Reasoning
46 | 
47 | 01:18:43 Addressing Concerns About AI
48 | 
49 | 01:24:31 Future of AI and Agents
50 | 
51 | 01:28:46 Conclusion and Farewell
52 | 
53 | ---
54 | 
55 | Is 2025 "the year of AI agents"?
56 | 
57 | AI agents, as a concept, started for me a few months after I started ThursdAI, when AutoGPT exploded.
It was such a novel idea at the time: run LLM requests in a loop.
58 | 
59 | (In fact, back then, I came up with a retry-with-AI concept and called it [TrAI/Catch](https://x.com/altryne/status/1632253117827010566), where upon an error, I would feed that error back into the GPT API and ask it to correct itself. It feels so long ago!)
60 | 
61 | AutoGPT became the fastest-ever GitHub project to reach 100K stars, and while exciting, it did not work.
62 | 
63 | Since then we saw multiple attempts at agentic frameworks, like BabyAGI and AutoGen; Crew AI was one of them, and it keeps being a favorite among many folks.
64 | 
65 | So, what is an AI agent? Simon Willison, friend of the pod, has a mission: to ask everyone who announces a new agent what they mean when [they say it](https://x.com/simonw/status/1863567881553977819), because it seems that everyone "shares" a common understanding of AI agents, but it's different for everyone.
66 | 
67 | We'll start with João's explanation and go from there. But let's assume the basics: a set of LLM calls running in a self-correcting loop, with access to planning, external tools (via function calling), and a memory of sorts, making decisions along the way (a minimal sketch of this loop is included at the end of this post).
68 | 
69 | Though, as we go into detail, you'll see that since the very basic "run LLM in the loop" days, the agents in 2025 have evolved and have a lot of complexity.
70 | 
71 | My takeaways from the conversation
72 | 
73 | I encourage you to listen to / watch the whole interview, as João is deeply knowledgeable about the field and we go into a lot of topics, but here are my main takeaways from our chat:
74 | 
75 | * Enterprises are adopting agents, starting with internal use-cases
76 | 
77 | * Crews have 4 different kinds of memory: Long Term (across runs), Short Term (each run), Entity memory (company names, entities), and pre-existing knowledge (DNA?)
78 | 
79 | * TIL about a "do all links respond with 200" guardrail
80 | 
81 | * Some of the agent tools we mentioned:
82 | 
83 | * Stripe Agent API - for agent payments and access to payment data ([blog](https://stripe.dev/blog/adding-payments-to-your-agentic-workflows))
84 | 
85 | * Okta Auth for Gen AI - agent authentication and role management ([blog](https://www.auth0.ai/))
86 | 
87 | * E2B - code execution platform for agents ([e2b.dev](https://e2b.dev/))
88 | 
89 | * BrowserBase - programmatic web-browser for your AI agent
90 | 
91 | * Exa - search grounding for agents for real time understanding
92 | 
93 | * Crew has 13 crews that run 24/7 to automate their company
94 | 
95 | * Crews like Onboarding User Enrichment Crew, Meetings Prep, Taking Phone Calls, Generate Use Cases for Leads
96 | 
97 | * GPT-4o mini was the most used model for 2024 for CrewAI, with the main factors being speed / cost
98 | 
99 | * Speed of AI development makes it hard to standardize and solidify common integrations.
100 | 
101 | * Reasoning models like o1 still haven't seen a lot of success, partly due to speed, partly due to the different way of prompting they require.
102 | 
103 | This Week's Buzz
104 | 
105 | We've just opened up pre-registration for our upcoming FREE evaluations course, featuring Paige Bailey from Google and Graham Neubig from All Hands AI (previously Open Devin). We've distilled a lot of what we learned about evaluating LLM applications while building [Weave](https://wandb.ai/site/weave?utm_source=thursdai&utm_medium=referral&utm_campaign=jan2), our LLM Observability and Evaluation tooling, and are excited to share this with you all!
[Get on the list](https://wandb.ai/site/courses/evals/?utm_source=thursdai&utm_medium=referral&utm_campaign=jan2)
106 | 
107 | Also, 2 workshops (also about Evals) from us are upcoming, one in SF on [Jan 11th](https://lu.ma/bzqvsqaa) and one in Seattle on [Jan 13th](https://seattle.aitinkerers.org/p/ai-in-production-evals-observability-workshop) (which I'm going to lead!), so if you're in those cities at those times, I would love to see you!
108 | 
109 | And that's it for this week, there wasn't a LOT of news as I said. The interesting thing is, even in the very short week, the news that we did get was all about agents and reasoning, so it looks like 2025 is agents and reasoning, agents and reasoning!
110 | 
111 | See you all next week 🫡
112 | 
113 | TL;DR with links:
114 | 
115 | * **Open Source LLMs**
116 | 
117 | * HuatuoGPT-o1 - medical LLM designed for medical reasoning ([HF](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-8B), [Paper](https://huggingface.co/papers/2412.18925), [Github](https://github.com/FreedomIntelligence/HuatuoGPT-o1), [Data](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-verifiable-problem))
118 | 
119 | * Nomic - modernbert-embed-base - first embed model on top of modernbert ([HF](https://huggingface.co/nomic-ai/modernbert-embed-base))
120 | 
121 | * HuggingFace - SmolAgents lib to build agents ([Blog](https://huggingface.co/blog/smolagents))
122 | 
123 | * SmallThinker-3B-Preview - a QWEN 2.5 3B "reasoning" finetune ([HF](https://huggingface.co/PowerInfer/SmallThinker-3B-Preview))
124 | 
125 | * Wolfram's new benchmarks, including DeepSeek v3 ([X](https://x.com/WolframRvnwlf/status/1874889165919384057))
126 | 
127 | * **Big CO LLMs + APIs**
128 | 
129 | * Newcomer Rubik's AI Sonus-1 family - Mini, Air, Pro and Reasoning ([X](https://x.com/RubiksAI/status/1874682159379972325), Chat)
130 | 
131 | * Microsoft "estimated" GPT-4o-mini is a ~8B model ([X](https://x.com/Yuchenj_UW/status/1874507299303379428))
132 | 
133 | * Meta plans to bring AI profiles to their social networks ([X](https://x.com/petapixel/status/1874792802061844829))
134 | 
135 | * **This Week's Buzz**
136 | 
137 | * W&B Free Evals Course with Paige Bailey and Graham Neubig - [Free Sign Up](https://wandb.ai/site/courses/evals/?utm_source=thursdai&utm_medium=referral&utm_campaign=jan2)
138 | 
139 | * SF evals event - [January 11th](https://lu.ma/bzqvsqaa)
140 | 
141 | * Seattle evals workshop - [January 13th](https://seattle.aitinkerers.org/p/ai-in-production-evals-observability-workshop)
142 | 
143 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-jan-2-is-25-the-year-of/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-jan-2-is-25-the-year-of?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE1NDAzMzY2MCwiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.-2FmmS8-Iq9rSNBzuyH2cjNrSPPkwegxbFjSP45EJLw&utm_campaign=CTA_5).
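Appendix: as promised above, here's a minimal sketch of the basic agent loop (an LLM called repeatedly, choosing between calling a tool and answering, with errors fed back so it can self-correct). This is illustrative pseudo-architecture under stated assumptions, not CrewAI's actual API; the decision dict shape and the callables are made up for the example.

```python
# Minimal agent-loop sketch. Illustrative only -- the decision dict shape and
# the llm/tools callables are hypothetical, NOT CrewAI's real interfaces.
from typing import Callable

def run_agent(
    task: str,
    llm: Callable[[list], dict],              # returns {"type": "final"|"tool", ...}
    tools: dict[str, Callable[[str], str]],   # tool name -> function (via function calling)
    max_steps: int = 10,
) -> str:
    memory = [{"role": "user", "content": task}]  # short-term memory for this run
    for _ in range(max_steps):
        decision = llm(memory)  # the model plans: finish now, or pick a tool
        if decision["type"] == "final":
            return decision["content"]
        # Execute the requested tool and feed the observation back into context,
        # so the model sees results (including errors) and can self-correct.
        try:
            observation = tools[decision["tool"]](decision["input"])
        except Exception as err:
            observation = f"Tool error: {err}"  # the old TrAI/Catch trick
        memory.append({"role": "tool", "name": decision["tool"], "content": observation})
    return "Stopped: step budget exhausted."
```

Production frameworks layer planning, role prompts, guardrails, and the four memory types discussed above on top of exactly this kind of loop.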
144 | 
--------------------------------------------------------------------------------
/2025_episodes/Q2 2025/June 2025/_ThursdAI_-_Jun_26_-_Gemini_CLI_Flux_Kontext_Dev_Search_Live_Anthropic_destroys_books_Zucks_superint.md:
--------------------------------------------------------------------------------
1 | # 📅 ThursdAI - Jun 26 - Gemini CLI, Flux Kontext Dev, Search Live, Anthropic destroys books, Zucks superintelligent team & more AI news
2 | 
3 | **Date:** June 26, 2025
4 | **Duration:** 1:39:39
5 | **Link:** [https://sub.thursdai.news/p/thursdai-jun-26-gemini-cli-flux-kontext](https://sub.thursdai.news/p/thursdai-jun-26-gemini-cli-flux-kontext)
6 | 
7 | ---
8 | 
9 | ## Description
10 | 
11 | Hey folks, Alex here, writing from... an undisclosed tropical paradise location 🏝️ I'm on vacation, but the AI news doesn't stop of course, and neither does ThursdAI. So huge shoutout to Wolfram Ravenwlf for running the show this week, and to Nisten, LDJ and Yam who joined.
12 | 
13 | So... no long blogpost with analysis this week, but I'll def. recommend tuning in to the show that the folks ran; they had a few guests on and even got some breaking news (a new Flux Kontext that's open source)
14 | 
15 | Of course many of you are readers and are here for the links, so I'm including the raw TL;DR + speaker notes as prepared by the folks for the show!
16 | 
17 | P.S - our (rescheduled) hackathon, WeaveHacks, is coming up in San Francisco on July 12-13. If you're interested in a chance to win a RoboDog, welcome to join us and give it a try. Register [HERE](https://lu.ma/weavehacks)
18 | 
19 | Ok, that's it for this week, please enjoy the show and see you next week!
20 | 
21 | ThursdAI - June 26th, 2025 - TL;DR
22 | 
23 | * **Hosts and Guests**
24 | 
25 | * **WolframRvnwlf** - Host ([@WolframRvnwlf](http://x.com/WolframRvnwlf))
26 | 
27 | * Co-Hosts - [@yampeleg](http://x.com/yampeleg), [@nisten](http://x.com/nisten), [@ldjconfirmed](http://x.com/ldjconfirmed)
28 | 
29 | * Guest - **Jason Kneen** ([@jasonkneen](http://x.com/jasonkneen)) - Discussing MCPs, coding tools, and agents
30 | 
31 | * Guest - **Hrishioa** ([@hrishioa](http://x.com/hrishioa)) - Discussing agentic coding and spec-driven development
32 | 
33 | * **Open Source LLMs**
34 | 
35 | * Mistral Small 3.2 released with improved instruction following, reduced repetition & better function calling ([X](https://x.com/MistralAI/status/1936093325116781016))
36 | 
37 | * Unsloth AI releases dynamic GGUFs with fixed chat templates ([X](https://x.com/UnslothAI/status/1936426567850487925))
38 | 
39 | * Kimi-VL-A3B-Thinking-2506 multimodal model updated for better video reasoning and higher resolution ([Blog](https://huggingface.co/blog/moonshotai/kimi-vl-a3b-thinking-2506))
40 | 
41 | * Chinese Academy of Sciences releases Stream-Omni, a new Any-to-Any model for unified multimodal input ([HF](https://huggingface.co/ICTNLP/stream-omni-8b), [Paper](https://huggingface.co/papers/2506.13642))
42 | 
43 | * Prime Intellect launches SYNTHETIC-2, an open reasoning dataset and synthetic data generation platform ([X](https://x.com/PrimeIntellect/status/1937272174295023951))
44 | 
45 | * **Big CO LLMs + APIs**
46 | 
47 | * **Google**
48 | 
49 | * Gemini CLI, a new open-source AI agent, brings Gemini 2.5 Pro to your terminal ([Blog](https://web.archive.org/web/20250625051706/https://blog.google/technology/developers/introducing-gemini-cli/), [GitHub](https://github.com/google-gemini/gemini-cli))
50 | 
51 | * Google reduces free tier API limits for previous generation Gemini Flash models
([X](https://x.com/ai_for_success/status/1937493142279971210)) 52 | 53 | * Search Live with voice conversation is now rolling out in AI Mode in the US ([Blog](https://blog.google/products/search/search-live-ai-mode/), [X](https://x.com/rajanpatel/status/1935484294182608954)) 54 | 55 | * Gemini API is now faster for video and PDF processing with improved caching ([Docs](https://ai.google.dev/gemini-api/docs/caching)) 56 | 57 | * **Anthropic** 58 | 59 | * Claude introduces an "artifacts" space for building, hosting, and sharing AI-powered apps ([X](https://x.com/AnthropicAI/status/1937921801000219041)) 60 | 61 | * Federal judge rules Anthropic's use of books for training Claude qualifies as fair use ([X](https://x.com/ai_for_success/status/1937515997076029449)) 62 | 63 | * **xAI** 64 | 65 | * Elon Musk announces the successful launch of Tesla's Robotaxi ([X](https://x.com/elonmusk/status/1936876178356490546)) 66 | 67 | * **Microsoft** 68 | 69 | * Introduces Mu, a new language model powering the agent in Windows Settings ([Blog](https://blogs.windows.com/windowsexperience/2025/06/23/introducing-mu-language-model-and-how-it-enabled-the-agent-in-windows-settings/)) 70 | 71 | * **Meta** 72 | 73 | * Report: Meta pursued acquiring Ilya Sutskever's SSI, now hires co-founders Nat Friedman and Daniel Gross ([X](https://x.com/kimmonismus/status/1935954015998624181)) 74 | 75 | * **OpenAI** 76 | 77 | * OpenAI removes mentions of its acquisition of Jony Ive's startup 'io' amid a trademark dispute ([X](https://x.com/rowancheung/status/1937414172322439439)) 78 | 79 | * OpenAI announces the release of DeepResearch in API + Webhook support ([X](https://x.com/stevendcoffey/status/1938286946075418784)) 80 | 81 | * **This weeks Buzz** 82 | 83 | * Alex is on vacation; WolframRvnwlf is attending AI Tinkerers Munich on July 25 ([Event](https://munich.aitinkerers.org/p/ai-tinkerers-munich-july-25)) 84 | 85 | * Join W&B Hackathon happening in 2 weeks in San Francisco - grand prize is a RoboDog! 
(Register [for Free](https://lu.ma/weavehacks))
86 | 
87 | * **Vision & Video**
88 | 
89 | * MeiGen-MultiTalk code and checkpoints for multi-person talking head generation are released ([GitHub](https://github.com/MeiGen-AI/MultiTalk), [HF](https://huggingface.co/MeiGen-AI/MeiGen-MultiTalk))
90 | 
91 | * Google releases VideoPrism for generating adaptable video embeddings for various tasks ([HF](https://hf.co/google/videoprism), [Paper](https://arxiv.org/abs/2402.13217), [GitHub](https://github.com/google-deepmind/videoprism))
92 | 
93 | * **Voice & Audio**
94 | 
95 | * ElevenLabs launches [11.ai](http://11.ai/), a voice-first personal assistant with MCP support ([Sign Up](http://11.ai/), [X](https://x.com/elevenlabsio/status/1937200086515097939))
96 | 
97 | * Google Magenta releases Magenta RealTime, an open weights model for real-time music generation ([Colab](https://colab.research.google.com/github/magenta/magenta-realtime/blob/main/notebooks/Magenta_RT_Demo.ipynb), [Blog](https://g.co/magenta/rt))
98 | 
99 | * ElevenLabs launches a mobile app for iOS and Android for on-the-go voice generation ([X](https://x.com/elevenlabsio/status/1937541389140611367))
100 | 
101 | * **AI Art & Diffusion & 3D**
102 | 
103 | * Google rolls out Imagen 4 and Imagen 4 Ultra in the Gemini API and Google AI Studio ([Blog](https://developers.googleblog.com/en/imagen-4-now-available-in-the-gemini-api-and-google-ai-studio/))
104 | 
105 | * OmniGen 2 open weights model for enhanced image generation and editing is released ([Project Page](https://vectorspacelab.github.io/OmniGen2/), [Demo](https://huggingface.co/spaces/OmniGen2/OmniGen2), [Paper](https://huggingface.co/papers/2506.18871))
106 | 
107 | * **Tools**
108 | 
109 | * OpenMemory Chrome Extension provides shared memory across ChatGPT, Claude, Gemini and more ([X](https://x.com/taranjeetio/status/1937537163270451494))
110 | 
111 | * LM Studio adds MCP support to connect local LLMs with your favorite servers ([Blog](https://lmstudio.ai/blog/mcp))
112 | 
113 | * Cursor is now available as a Slack integration ([Dashboard](http://cursor.com/dashboard))
114 | 
115 | * All Hands AI releases the OpenHands CLI, a model-agnostic, open-source coding agent ([Blog](https://all-hands.dev/blog/the-openhands-cli-ai-powered-development-in-your-terminal), [Docs](https://docs.all-hands.dev/usage/how-to/cli-mode#cli))
116 | 
117 | * Warp 2.0 launches as an Agentic Development Environment with multi-threading ([X](https://x.com/warpdotdev/status/1937525185843752969))
118 | 
119 | * **Studies and Others**
120 | 
121 | * The /r/LocalLLaMA subreddit is back online after a brief moderation issue ([Reddit](https://www.reddit.com/r/LocalLLaMA/comments/1ljlr5b/subreddit_back_in_business/), [News](https://x.com/localllamasub))
122 | 
123 | * Andrej Karpathy's talk "Software 3.0" discusses the future of programming in the age of AI ([YouTube](https://www.youtube.com/watch?v=LCEmiRjPEtQ), [Summary](https://www.latent.space/p/s3))
124 | 
125 | Thank you, see you next week!
126 | 
127 | Thank you for subscribing.
[Leave a comment](https://sub.thursdai.news/p/thursdai-jun-26-gemini-cli-flux-kontext/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-jun-26-gemini-cli-flux-kontext?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE2NjkyNTYyOCwiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.rNrzJzOHwv_6WuWE0zOf7g9C0xIjVsHBeuiHWjLmawY&utm_campaign=CTA_5). 128 | -------------------------------------------------------------------------------- /ThursdAI_News_Infographic_System_Prompt.md: -------------------------------------------------------------------------------- 1 | # ThursdAI News Infographic Generator 2 | 3 | ## You Are 4 | 5 | A world-class infographic designer creating stunning visual content for **ThursdAI**, a weekly AI news show hosted by Alex Volkov (@altryne). 6 | 7 | Your job: Take news information and create a **Nano Banana Pro prompt** that will generate a beautiful, unique infographic. 8 | 9 | --- 10 | 11 | ## What You Receive 12 | 13 | ``` 14 | TITLE: [Headline] 15 | EXECUTIVE SUMMARY: [Overview] 16 | 10 FACTOIDS: [Key metrics, numbers, availability, etc.] 17 | ENRICHED SUMMARY: [Additional context] 18 | TOP REACTIONS: [Quotes from X/Twitter] 19 | LINKS: [Where to find it] 20 | DATE: [When this news dropped — may be omitted] 21 | ``` 22 | 23 | Plus: A **reference image of Alex Volkov** (the host) to include in the design. 24 | 25 | --- 26 | 27 | ## Timeliness Matters 28 | 29 | These infographics are for **current news** — typically less than a week old. The design should feel: 30 | - **Fresh and new** — not recycled visual concepts 31 | - **Timely** — if a date is provided, display it prominently 32 | - **Relevant to right now** — visual elements that feel "just announced" not "retrospective" 33 | 34 | --- 35 | 36 | ## What You Create 37 | 38 | A detailed Nano Banana Pro prompt in natural language (full sentences, like briefing a designer) that will generate a **16:9 infographic** including: 39 | 40 | ### ⚠️ CRITICAL: This is NEWS, Not a Poster 41 | 42 | **The person viewing this infographic should be able to understand the FULL story without reading anything else.** 43 | 44 | This is not a teaser or marketing graphic — it's a comprehensive news summary. Think of it like a visual article, not an advertisement. 45 | 46 | - **Include the important factoids** — not just 2-3 highlights, but the meaty details that matter. Use your judgment: impressive benchmarks, pricing, key metrics, availability — yes. Boilerplate or filler details — skip them. 47 | - **Text should be readable** — real sentences, real data, real context 48 | - **Information density is good** — pack in what's newsworthy, organize it well 49 | - **Someone should walk away informed** — not just intrigued 50 | 51 | ### Required Elements 52 | 53 | 1. **ThursdAI Branding (PROMINENT)** — This is a ThursdAI news presentation. "ThursdAI" should appear prominently in the header area, not buried in a footer. Make it clear this news is being presented by ThursdAI. The footer can include additional links (thursdai.news, @altryne, @thursdAI_news, thursdai.news/yt) but the main brand should be visible up top. 54 | 55 | 2. **Alex Volkov** — Using the reference image, rendered as a stylized cartoon/vector avatar. He should be presenting, reacting to, or engaging with the news. His expression and pose should match the tone of the story. 56 | 57 | 3. 
**The Date** — Prominently displayed. This is timely news, and viewers should immediately know when it dropped. Format like "June 12, 2025" or "Dec 12, 2025" — make it visible in the header or as a clean badge. 58 | 59 | 4. **The News Content (COMPREHENSIVE)** — This is the core: 60 | - **Headline** — Clear, prominent title 61 | - **Executive summary** — The key narrative in readable text 62 | - **The important factoids** — The metrics, benchmarks, pricing, and details that actually matter. Display them in panels, cards, bullet lists, ticker bars — whatever works. Use judgment: if a factoid is impressive or essential to understanding the story, include it. If it's filler, skip it. Aim for 6-10 meaningful data points visible. 63 | - **Key quotes/reactions** — If notable quotes are provided, include at least one prominently 64 | 65 | 5. **Relevant Visual Elements** — Based on what the news is actually about, include thematic visual elements that reinforce the story: 66 | - Open source model release? Binary cascades, weight tensors, loss curves, unlocked padlocks, git trees 67 | - Voice/TTS announcement? Spectrograms, waveforms, speaking avatars 68 | - Image generation model? Brushstrokes, canvas, robot artist 69 | - Video model? Film reels, motion blur, frame sequences 70 | - Benchmark domination? Leaderboards, medals, trophy podiums 71 | - Agent/tool release? Terminal windows, connected nodes 72 | - Research/data report? Charts, graphs, data flows, dashboard elements 73 | 74 | **Think about what visuals represent THIS specific story** — make the infographic feel alive and relevant, not generic. 75 | 76 | 6. **Company Logos** — Use the ACTUAL logos of the companies involved (OpenAI, Google Gemini, Anthropic, Meta, Mistral, HuggingFace, etc.). These are well-known. 77 | 78 | 7. **Footer Links** — Include at bottom: 79 | - thursdai.news 80 | - @altryne (X/Twitter) 81 | - @thursdAI_news (X/Twitter) 82 | - thursdai.news/yt (YouTube) 83 | 84 | ### Style Direction 85 | 86 | **Be creative AND comprehensive.** Consider: 87 | - What's the emotional tone of this news? (Exciting breakthrough? Controversial? Data-heavy analysis? Breaking news urgency?) 88 | - What visual metaphor would capture this story? 89 | - What color palette fits the company and mood? 90 | - **What layout can fit ALL this information?** (Data dashboard? News broadcast with ticker? Multi-panel layout? Research poster?) 91 | 92 | **The layout must accommodate substantial information.** If there are 8 newsworthy data points, design for 8. Use: 93 | - Multiple panels/cards for different stat categories 94 | - Ticker bars for secondary stats 95 | - Bullet lists within panels 96 | - Hierarchical text (big headlines, smaller supporting details) 97 | - Quote callouts for reactions 98 | 99 | The goal is that someone scrolling social media stops and says "whoa, what's this?" — AND when they look closer, they get the full story. 100 | 101 | --- 102 | 103 | ## Writing the Prompt 104 | 105 | Write in **natural language**, like you're briefing a talented designer: 106 | 107 | ✅ Good: "Create a dramatic infographic that feels like a breaking news broadcast. The background should pulse with urgency — think red alert lighting mixed with the cool blue of DeepSeek's brand. Alex is in the corner looking genuinely shocked, pointing at the headline..." 
108 | 109 | ❌ Bad: "AI, infographic, 4k, neon, tech, modern, trending" 110 | 111 | Be specific about: 112 | - The overall vibe and emotional tone 113 | - Color palette (use hex codes for precision) 114 | - Where Alex should be and how he should look 115 | - What visual elements reinforce the story 116 | - How the information should be laid out 117 | - What should be biggest/most prominent 118 | 119 | Be loose about: 120 | - Exact pixel positions 121 | - Rigid grid structures 122 | - Formulaic layouts 123 | 124 | --- 125 | 126 | ## Google Search for Additional Context 127 | 128 | Nano Banana Pro can search the web for additional information. Use this strategically: 129 | 130 | **When to search:** If the news involves specific benchmark numbers, company details, or technical specs that would benefit from verification or additional context, add "Search the web for [specific query]" to your prompt. 131 | 132 | **Caveat:** Often this news is very fresh (hours or 1-2 days old) and may not have propagated to Google yet. Don't rely on search for the core facts — the provided information is the source of truth. Use search for supplementary context like: 133 | - Company background 134 | - Related previous announcements 135 | - Technical terminology clarification 136 | - Logo/branding references 137 | 138 | **Example usage in prompt:** "Search the web for the Google Gemini logo and official branding colors to ensure accuracy." 139 | 140 | --- 141 | 142 | ## Defaults (Don't Ask, Just Do) 143 | 144 | This is automated — use these defaults and proceed: 145 | 146 | - **Aspect ratio:** 16:9 (landscape, for YouTube/social) 147 | - **Resolution:** 4K (3840×2160) 148 | - **Date:** If not provided in the input, use "Recent" or omit — don't ask 149 | - **Quote to highlight:** Pick the most insightful reaction if multiple are provided 150 | - **Emphasis:** Lead with the most impressive/newsworthy angle 151 | 152 | --- 153 | 154 | ## Output Format 155 | 156 | ```markdown 157 | # Infographic Prompt: [TITLE] 158 | 159 | **Date:** [DATE if provided, otherwise omit this line] 160 | **Aspect Ratio:** 16:9 161 | **Resolution:** 4K (3840×2160) 162 | 163 | --- 164 | 165 | [Your complete Nano Banana Pro prompt here — natural language, detailed, creative, specific to this news story] 166 | 167 | --- 168 | 169 | **Note:** This prompt uses the attached reference image of Alex Volkov for the host avatar. 170 | ``` 171 | 172 | --- 173 | 174 | ## Quality Check 175 | 176 | Before outputting, verify your prompt covers: 177 | - ✓ What happened 178 | - ✓ Who's involved 179 | - ✓ The key numbers and metrics 180 | - ✓ Why it matters 181 | 182 | If any are missing — add more detail to your prompt. 
183 | 
184 | ---
185 | 
186 | ## Remember
187 | 
188 | - **This is NEWS, not a teaser** — Include the important details, not just 2-3 highlights
189 | - **ThursdAI is the presenter** — Prominent branding in header, not just footer
190 | - **Include date if provided** — This is current news
191 | - **Use judgment on factoids** — Include what's newsworthy, skip the filler
192 | - **Real logos** — Use actual company logos, they're well-known
193 | - **Contextual visuals** — The imagery should reflect what the news is actually about
194 | - **Alex is the host** — He's presenting this news, make him part of the story
195 | - **Information-dense AND beautiful** — Pack in ALL the facts, organize them elegantly
196 | - **Natural language prompts** — Full sentences, like talking to a designer
197 | - **Search when helpful** — Use "Search the web for..." for logos, branding, or supplementary context
198 | 
199 | ---
200 | 
201 | *Now give me the news and let's make something incredible.*
202 | 
--------------------------------------------------------------------------------
/2025_episodes/Q1 2025/March 2025/ThursdAI_-_Mar_6_2025_-_Alibabas_R1_Killer_QwQ_Exclusive_Google_AI_Mode_Chat_and_MCP_fever_sweeping_.md:
--------------------------------------------------------------------------------
1 | # ThursdAI - Mar 6, 2025 - Alibaba's R1 Killer QwQ, Exclusive Google AI Mode Chat, and MCP fever sweeping the community!
2 | 
3 | **Date:** March 06, 2025
4 | **Duration:** 1:50:59
5 | **Link:** [https://sub.thursdai.news/p/thursdai-mar-6-2025-alibabas-r1-killer](https://sub.thursdai.news/p/thursdai-mar-6-2025-alibabas-r1-killer)
6 | 
7 | ---
8 | 
9 | ## Description
10 | 
11 | What is UP folks! Alex here from Weights & Biases (yeah, still, but check this week's buzz section below for some news!)
12 | 
13 | I really really enjoyed today's episode; I feel like I could post it unedited, it was so so good. We started the show with our good friend Junyang Lin from Alibaba Qwen, who told us about their new 32B reasoner QwQ. Then we interviewed Google's VP of Product for Search, Robby Stein, who came and told us about their upcoming AI Mode in Google! I got access and played with it, and it made me switch back from PPXL as my main.
14 | 
15 | And lastly, I recently became fully MCP-pilled. Since we covered it when it came out over Thanksgiving, I saw this acronym everywhere on my timeline but only recently "got it", and so I wanted to have an MCP deep dive, and boy... did I get what I wished for! You absolutely should tune in to the show as there's no way for me to cover everything we covered about MCP with Dina and Jason! Ok, without further ado... let's dive in (the TL;DR, links and show notes are at the end as always!)
16 | 
17 | ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
18 | 
19 | 🤯 Alibaba's QwQ-32B: Small But Shocking Everyone!
20 | 
21 | The open-source LLM segment started strong, chatting with friend of the show Junyang Justin Lin from Alibaba's esteemed Qwen team. They've cooked up something quite special: QwQ-32B, a reasoning-focused, reinforcement-learning-boosted beast that punches remarkably above its weight. We're talking about a mere 32B-parameter model holding its ground on tough evaluations against DeepSeek R1, a 671B behemoth!
22 | 
23 | Here's how wild this is: You can literally run QwQ on your Mac!
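Editor's aside: here's a minimal sketch of kicking the tires on QwQ-32B with Hugging Face Transformers, using the official `Qwen/QwQ-32B` repo linked just below. In practice, "on your Mac" means a quantized build (GGUF via llama.cpp, or MLX); this unquantized snippet assumes a machine with ample GPU memory.

```python
# Minimal sketch: prompting QwQ-32B via Hugging Face Transformers.
# NOTE: unquantized 32B weights need a large-memory GPU box; on a Mac you'd
# use a quantized GGUF/MLX build instead. Repo ID from the official release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many positive integers below 100 are divisible by 3 or 5?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=1024)  # reasoning models think out loud, so budget tokens
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```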
Junyang shared that they applied two solid rounds of RL to amp its reasoning, coding, and math capabilities, integrating agents into the model to fully unlock its abilities. When I called out how insane it was that we’ve gone from "LLMs can't do math" to basically acing competitive math benchmarks like AIME24, Junyang calmly hinted that they're already aiming for unified thinking/non-thinking models. Sounds wild, doesn’t it? 24 | 25 | Check out the full QwQ release [here](https://huggingface.co/Qwen/QwQ-32B), or dive into their [blog post](https://qwenlm.github.io/blog/qwq-32b/). 26 | 27 | 🚀 Google Launches AI Mode: Search Goes Next-Level ([X](https://www.google.com/url?sa=E&q=https%3A%2F%2Fx.com%2Faltryne%2Fstatus%2F1897381479459811368), [Blog](https://www.google.com/url?sa=E&q=https%3A%2F%2Fblog.google%2Fproducts%2Fsearch%2Fai-mode-search%2F), [My Live Reaction](https://www.google.com/url?sa=E&q=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D5QTveQpq1WI%26feature%3Dyoutu.be)). 28 | 29 | For the past two years, on this very show, we've been asking, "Where's Google?" in the Gen AI race. Well, folks, they're *back*. And they're back in a *big* way. 30 | 31 | Next, we were thrilled to have Google’s own Robby Stein, VP of Product for Google Search, drop by ThursdAI after their massive launch of AI Mode and expanded AI Overviews leveraging Gemini 2.0. Robby walked us through this massive shift, which essentially brings advanced conversational AI capabilities straight into Google. Seriously — Gemini 2.0 is now out here doing complex reasoning while performing fan-out queries behind the scenes in Google's infrastructure. 32 | 33 | Google search is literally Googling itself. No joke. "We actually have the model generating fan-out queries — Google searches within searches — to collect accurate, fresh, and verified data," explained Robby during our chat. And I gotta admit, after playing with AI Mode, Google is definitely back in the game—real-time restaurant closures, stock analyses, product comparisons, and it’s conversational to boot. You can check my blind reaction first impression video [here](https://www.youtube.com/watch?v=5QTveQpq1WI). (also, while you're there, why not subscribe to my YT?) 34 | 35 | Google has some huge plans, but right now AI Mode is rolling out slowly via Google Labs for Google One AI Premium subscribers first. More soon though! 36 | 37 | 🐝 This Week's Buzz: Weights & Biases Joins CoreWeave Family! 38 | 39 | Huge buzz (in every sense of the word) from Weights & Biases, who made waves with their announcement this week: We've joined forces with CoreWeave! Yeah, that's big news as CoreWeave, the AI hyperscaler known for delivering critical AI infrastructure, has now acquired Weights & Biases to build the ultimate end-to-end AI platform. It's early days of this exciting journey, and more details are emerging, but safe to say: the future of Weights & Biases just got even more exciting. Congrats to the whole team at Weights & Biases and our new colleagues at CoreWeave! 40 | 41 | We're committed to all users of WandB so you will be able to keep using Weights & Biases, and we'll continuously improve our offerings going forward! Personally, also nothing changes for ThursdAI! 🎉 42 | 43 | MCP Takes Over: Giving AI agents super powers via standardized protocol 44 | 45 | Then things got insanely exciting. Why? Because MCP is blowing up and I had to find out why everyone's timeline (mine included) just got invaded. 
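(A quick aside before the guests: if the acronym is new to you, an MCP server is just a small process that advertises tools to any compatible client over a standard protocol. Here's a minimal sketch using the official MCP Python SDK's FastMCP helper; the tool and its data are invented for illustration.)

```python
# Minimal MCP server sketch using the official Python SDK (pip install "mcp[cli]").
# The tool below is a toy; real servers wrap things like APIs, databases, or other LLMs.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("thursdai-demo")

@mcp.tool()
def episode_count(year: int) -> int:
    """Return how many ThursdAI episodes aired in a given year (toy data)."""
    return {2023: 40, 2024: 52, 2025: 50}.get(year, 0)

if __name__ == "__main__":
    # Serves the Model Context Protocol over stdio, so any MCP-capable client
    # (Claude Desktop, Cursor, etc.) can discover and call episode_count.
    mcp.run()
```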
46 | 47 | Welcoming Cloudflare's amazing product manager Dina Kozlov and Jason Kneen—MCP master and creator—things quickly got mind-blowing. MCP servers, Jason explained, are essentially tool wrappers that effortlessly empower agents with capabilities like API access and even calling other LLMs—completely seamlessly and securely. According to Jason, "we haven't even touched the surface yet of what MCP can do—these things are Lego bricks ready to form swarms and even self-evolve." 48 | 49 | Dina broke down just how easy it is to launch MCP servers on Cloudflare Workers while teasing exciting upcoming enhancements. Both Dina and Jason shared jaw-dropping examples, including composing complex workflows connecting Git, Jira, Gmail, and even smart home controls—practically instantaneously! Seriously, my mind is still spinning. 50 | 51 | The MCP train is picking up steam, and something tells me we'll be talking about this revolutionary agent technology a lot more soon. Check out a few great MCP directories that popped up recently: [Smithery](https://smithery.ai/), [Cursor Directory](https://cursor.directory/mcp) and [Composio](https://mcp.composio.dev/). 52 | 53 | This show was one of the best ones we recorded; honestly, I barely needed to edit it. It was also a really really fun livestream, so if you prefer seeing to listening, here's the lightly edited live stream. 54 | 55 | Thank you for being a ThursdAI subscriber; as always, here's the TL;DR and show notes for everything that happened in AI this week and the things we mentioned (and hosts we had). 56 | 57 | ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. 58 | 59 | TL;DR and Show Notes 60 | 61 | * **Show Notes & Guests** 62 | 63 | * **Alex Volkov** - AI Evangelist & Weights & Biases ([@altryne](https://x.com/altryne)) 64 | 65 | * **Co-Hosts** - [@WolframRvnwlf](https://x.com/WolframRvnwlf) [@ldjconfirmed](https://x.com/ldjconfirmed) [@nisten](https://x.com/nisten) 66 | 67 | * **Junyang Justin Lin** - Head of Qwen Team, Alibaba - [@JustinLin610](https://x.com/JustinLin610) 68 | 69 | * **Robby Stein** - VP of Product, Google Search - [@rmstein](https://x.com/rmstein/status/1897417750622216574) 70 | 71 | * **Dina Kozlov** - Product Manager, Cloudflare - [@dinasaur_404](https://x.com/dinasaur_404) 72 | 73 | * **Jason Kneen** - MCP Wiz - [@jasonkneen](https://x.com/jasonkneen) 74 | 75 | * My Google AI Mode Blind Reaction Video ([Youtube](https://www.youtube.com/watch?v=5QTveQpq1WI)) 76 | 77 | * Sesame Maya Conversation Demo - ([Youtube](https://www.youtube.com/watch?v=pI_WARqK_X4&t=1s)) 78 | 79 | * Cloudflare MCP docs ([Blog](https://blog.cloudflare.com/model-context-protocol/)) 80 | 81 | * Weights & Biases Agents Course Pre-signup - [https://wandb.me/agents](https://wandb.me/agents) 82 | 83 | * **Open Source LLMs** 84 | 85 | * Qwen's latest reasoning model **QwQ-32B** - matches R1 on some evals ([X](https://x.com/Alibaba_Qwen/status/1897361654763151544), [Blog](https://qwenlm.github.io/blog/qwq-32b/), [HF](https://huggingface.co/Qwen/QwQ-32B), [Chat](https://huggingface.co/spaces/Qwen/QwQ-32B-Demo)) 86 | 87 | * Cohere4ai - Aya Vision - 8B & 32B ([X](https://x.com/CohereForAI/status/1896923657470886234), [HF](https://huggingface.co/collections/CohereForAI/c4ai-aya-vision-67c4ccd395ca064308ee1484?ref=cohere-ai.ghost.io)) 88 | 89 | * AI21 - Jamba 1.6 Large & Jamba 1.6 Mini
([X](https://x.com/AI21Labs/status/1897657953261601151), [HF](https://huggingface.co/ai21labs/AI21-Jamba-Large-1.6)) 90 | 91 | * **Big CO LLMs + APIs** 92 | 93 | * Google announces AI Mode & AI Overviews Gemini 2.0 ([X](https://x.com/altryne/status/1897381479459811368), [Blog](https://blog.google/products/search/ai-mode-search/), [My Live Reaction](https://www.youtube.com/watch?v=5QTveQpq1WI&feature=youtu.be)) 94 | 95 | * OpenAI rolls out GPT 4.5 to Plus users - #1 on LM Arena 🔥 ([X](https://x.com/lmarena_ai/status/1896590146465579105)) 96 | 97 | * Grok Voice is available for free users as well ([X](https://x.com/ebbyamir/status/1897118801231249818)) 98 | 99 | * Elysian Labs launches Auren iOS app ([X](https://x.com/nearcyan/status/1897466463314936034), [App Store](https://auren.app)) 100 | 101 | * Mistral announces SOTA OCR ([Blog](https://mistral.ai/news/mistral-ocr)) 102 | 103 | * **This week's Buzz** 104 | 105 | * Weights & Biases is acquired by CoreWeave 🎉 ([Blog](https://wandb.ai/wandb/wb-announcements/reports/W-B-being-acquired-by-CoreWeave--VmlldzoxMTY0MDI1MQ)) 106 | 107 | * **Vision & Video** 108 | 109 | * Tencent HYVideo img2vid is finally here ([X](https://x.com/TXhunyuan/status/1897558826519556325), [HF](https://huggingface.co/tencent/HunyuanVideo-I2V), [Try It](https://video.hunyuan.tencent.com/)) 110 | 111 | * **Voice & Audio** 112 | 113 | * NotaGen - symbolic music generation model for **high-quality classical sheet music** [Github](https://github.com/ElectricAlexis/NotaGen), [Demo](https://electricalexis.github.io/notagen-demo/), [HF](https://huggingface.co/ElectricAlexis/NotaGen) 114 | 115 | * Sesame takes the world by storm with their amazing voice model ([My Reaction](https://www.youtube.com/watch?v=pI_WARqK_X4&t=1s)) 116 | 117 | * **AI Art & Diffusion & 3D** 118 | 119 | * MiniMax__AI - Image-01: A Versatile Text-to-Image Model at 1/10 the Cost ([X](https://x.com/MiniMax__AI/status/1896475931809817015), [Try it](https://t.co/ATyAN03H1F)) 120 | 121 | * Zhipu AI - CogView 4 6B - ([X](https://x.com/ChatGLM/status/1896824917880148450), [Github](https://t.co/O8btwDugWI)) 122 | 123 | * **Tools** 124 | 125 | * Google - Data Science agent in Google Colab [Blog](https://developers.googleblog.com/en/data-science-agent-in-colab-with-gemini/) 126 | 127 | * Baidu Miaoda - no-code AI build tool 128 | 129 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-mar-6-2025-alibabas-r1-killer/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-mar-6-2025-alibabas-r1-killer?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE1ODU0NzU0NiwiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.M-voYlDuwrNaChuTN_344BQU7iE_0xVmc53La1lJzJQ&utm_campaign=CTA_5). 130 | -------------------------------------------------------------------------------- /parse_rss.py: 1 | #!/usr/bin/env python3 2 | """ 3 | Parse ThursdAI RSS feed and organize 2025 episodes into folders by quarter and month. 4 | 5 | Creates folder structure: 6 | - Q1 2025/ 7 | - January 2025/ 8 | - episode_name.md 9 | - ... 10 | - January_2025_combined.md 11 | - February 2025/ 12 | - ... 13 | - Q1_2025_combined.md 14 | - Q2 2025/ 15 | - ... 16 | etc.
17 | """
18 |
19 | import xml.etree.ElementTree as ET
20 | from datetime import datetime
21 | from pathlib import Path
22 | from html import unescape
23 | import re
24 | from collections import defaultdict
25 |
26 |
27 | def parse_rss(file_path: str) -> list[dict]:
28 |     """Parse the RSS file and return a list of episode dictionaries."""
29 |     tree = ET.parse(file_path)
30 |     root = tree.getroot()
31 |
32 |     # Define namespaces used in the RSS
33 |     namespaces = {
34 |         'itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd',
35 |         'dc': 'http://purl.org/dc/elements/1.1/',
36 |         'content': 'http://purl.org/rss/1.0/modules/content/',
37 |     }
38 |
39 |     episodes = []
40 |
41 |     # Find all items in the channel
42 |     channel = root.find('channel')
43 |     if channel is None:
44 |         print("No channel found in RSS")
45 |         return episodes
46 |
47 |     for item in channel.findall('item'):
48 |         episode = {}
49 |
50 |         # Extract title
51 |         title_elem = item.find('title')
52 |         if title_elem is not None and title_elem.text:
53 |             episode['title'] = clean_cdata(title_elem.text)
54 |         else:
55 |             episode['title'] = 'Untitled Episode'
56 |
57 |         # Extract publication date
58 |         pub_date_elem = item.find('pubDate')
59 |         if pub_date_elem is not None and pub_date_elem.text:
60 |             episode['pub_date_raw'] = pub_date_elem.text
61 |             episode['pub_date'] = parse_date(pub_date_elem.text)
62 |         else:
63 |             continue  # Skip items without dates
64 |
65 |         # Extract description
66 |         desc_elem = item.find('description')
67 |         if desc_elem is not None and desc_elem.text:
68 |             episode['description'] = clean_cdata(desc_elem.text)
69 |         else:
70 |             episode['description'] = ''
71 |
72 |         # Extract link
73 |         link_elem = item.find('link')
74 |         if link_elem is not None and link_elem.text:
75 |             episode['link'] = link_elem.text
76 |         else:
77 |             episode['link'] = ''
78 |
79 |         # Extract creator
80 |         creator_elem = item.find('dc:creator', namespaces)
81 |         if creator_elem is not None and creator_elem.text:
82 |             episode['creator'] = clean_cdata(creator_elem.text)
83 |         else:
84 |             episode['creator'] = ''
85 |
86 |         # Extract duration
87 |         duration_elem = item.find('itunes:duration', namespaces)
88 |         if duration_elem is not None and duration_elem.text:
89 |             episode['duration'] = format_duration(duration_elem.text)
90 |         else:
91 |             episode['duration'] = ''
92 |
93 |         # Extract audio URL
94 |         enclosure_elem = item.find('enclosure')
95 |         if enclosure_elem is not None:
96 |             episode['audio_url'] = enclosure_elem.get('url', '')
97 |         else:
98 |             episode['audio_url'] = ''
99 |
100 |         # Extract image URL
101 |         image_elem = item.find('itunes:image', namespaces)
102 |         if image_elem is not None:
103 |             episode['image_url'] = image_elem.get('href', '')
104 |         else:
105 |             episode['image_url'] = ''
106 |
107 |         episodes.append(episode)
108 |
109 |     return episodes
110 |
111 |
112 | def clean_cdata(text: str) -> str:
113 |     """Clean CDATA wrapper and unescape HTML entities."""
114 |     if text is None:
115 |         return ''
116 |     # Remove CDATA wrapper if present
117 |     text = text.strip()
118 |     if text.startswith('<![CDATA['):
119 |         text = text[9:]  # len('<![CDATA[') == 9
120 |     if text.endswith(']]>'):
121 |         text = text[:-3]
122 |     return unescape(text).strip()
123 |
124 |
125 | def parse_date(date_str: str) -> datetime:
126 |     """Parse RFC 2822 date format used in RSS feeds."""
127 |     # Example: 'Fri, 05 Dec 2025 01:03:51 GMT'
128 |     try:
129 |         return datetime.strptime(date_str, '%a, %d %b %Y %H:%M:%S %Z')
130 |     except ValueError:
131 |         # Try without timezone
132 |         try:
133 |             return datetime.strptime(date_str[:25], '%a, %d %b %Y %H:%M:%S')
134 |         except ValueError:
135 |             return datetime.now()
136 |
137 |
138 | def format_duration(duration_str: str) -> str:
139 |     """Format duration from seconds to HH:MM:SS."""
140 |     try:
141 |         seconds = int(duration_str)
142 |         hours = seconds // 3600
143 |         minutes = (seconds % 3600) // 60
144 |         secs = seconds % 60
145 |         if hours > 0:
146 |             return f"{hours}:{minutes:02d}:{secs:02d}"
147 |         return f"{minutes}:{secs:02d}"
148 |     except ValueError:
149 |         return duration_str
150 |
151 |
152 | def get_quarter(month: int) -> int:
153 |     """Get quarter number from month number."""
154 |     return (month - 1) // 3 + 1
155 |
156 |
157 | def sanitize_filename(title: str) -> str:
158 |     """Create a safe filename from a title."""
159 |     # Remove emojis and special characters
160 |     title = re.sub(r'[^\w\s\-]', '', title)
161 |     # Replace spaces with underscores
162 |     title = re.sub(r'\s+', '_', title)
163 |     # Limit length
164 |     return title[:100]
165 |
166 |
167 | def html_to_markdown(html_content: str) -> str:
168 |     """Convert HTML content to markdown (basic conversion)."""
169 |     text = html_content
170 |
171 |     # Replace common HTML tags
172 |     text = re.sub(r'<br\s*/?>', '\n', text)
173 |     text = re.sub(r'<p>', '\n\n', text)
174 |     text = re.sub(r'</p>', '', text)
175 |     text = re.sub(r'<strong>', '**', text)
176 |     text = re.sub(r'</strong>', '**', text)
177 |     text = re.sub(r'<em>', '*', text)
178 |     text = re.sub(r'</em>', '*', text)
179 |     text = re.sub(r'<a [^>]*href="([^"]*)"[^>]*>([^<]*)</a>', r'[\2](\1)', text)
180 |     text = re.sub(r'</ul>', '\n', text)
181 |     text = re.sub(r'<li>', '\n* ', text)
182 |     text = re.sub(r'</li>', '', text)
183 |     text = re.sub(r'<h\d>', '\n## ', text)
184 |     text = re.sub(r'</h\d>', '\n', text)
185 |
186 |     # Remove any remaining HTML tags
187 |     text = re.sub(r'<[^>]+>', '', text)
188 |
189 |     # Clean up multiple newlines
190 |     text = re.sub(r'\n{3,}', '\n\n', text)
191 |
192 |     return text.strip()
193 |
194 |
195 | def create_episode_markdown(episode: dict) -> str:
196 |     """Create markdown content for a single episode."""
197 |     date_str = episode['pub_date'].strftime('%B %d, %Y')
198 |
199 |     content = f"""# {episode['title']}
200 |
201 | **Date:** {date_str}
202 | **Duration:** {episode['duration']}
203 | **Link:** [{episode['link']}]({episode['link']})
204 |
205 | ---
206 |
207 | ## Description
208 |
209 | {html_to_markdown(episode['description'])}
210 | """
211 |     return content
212 |
213 |
214 | def create_combined_markdown(episodes: list[dict], period_name: str) -> str:
215 |     """Create combined markdown for multiple episodes."""
216 |     content = f"""# ThursdAI Episodes - {period_name}
217 |
218 | Total Episodes: {len(episodes)}
219 |
220 | ---
221 |
222 | """
223 |     # Sort episodes by date (newest first)
224 |     sorted_episodes = sorted(episodes, key=lambda x: x['pub_date'], reverse=True)
225 |
226 |     for episode in sorted_episodes:
227 |         date_str = episode['pub_date'].strftime('%B %d, %Y')
228 |         content += f"""## {episode['title']}
229 |
230 | **Date:** {date_str}
231 | **Duration:** {episode['duration']}
232 | **Link:** [{episode['link']}]({episode['link']})
233 |
234 | {html_to_markdown(episode['description'])}
235 |
236 | ---
237 |
238 | """
239 |     return content
240 |
241 |
242 | def main():
243 |     """Main function to parse RSS and create folder structure."""
244 |     script_dir = Path(__file__).parent
245 |     rss_file = script_dir / 'all_thursdai.rss'
246 |
247 |     if not rss_file.exists():
248 |         print(f"RSS file not found: {rss_file}")
249 |         return
250 |
251 |     print(f"Parsing RSS file: {rss_file}")
252 |     episodes = parse_rss(str(rss_file))
253 |     print(f"Found {len(episodes)} total episodes")
254 |
255 |     # Filter for 2025 episodes only
256 |     episodes_2025 = [ep for ep in episodes if ep['pub_date'].year == 2025]
257 |     print(f"Found {len(episodes_2025)} episodes from 2025")
258 |
259 |     if not episodes_2025:
260 |         print("No episodes found for 2025!")
261 |         return
262 |
263 |     # Organize by quarter and month
264 |     quarters = defaultdict(lambda: defaultdict(list))
265 |
266 |     for episode in episodes_2025:
267 |         month = episode['pub_date'].month
268 |         quarter = get_quarter(month)
269 |         month_name = episode['pub_date'].strftime('%B')
270 |
271 |         quarters[quarter][month_name].append(episode)
272 |
273 |     # Create folder structure and files
274 |     output_dir = script_dir / '2025_episodes'
275 |     output_dir.mkdir(exist_ok=True)
276 |
277 |     for quarter_num in sorted(quarters.keys()):
278 |         quarter_name = f"Q{quarter_num} 2025"
279 |         quarter_dir = output_dir / quarter_name
280 |         quarter_dir.mkdir(exist_ok=True)
281 |
282 |         quarter_episodes = []
283 |
284 |         for month_name in sorted(quarters[quarter_num].keys(),
285 |                                  key=lambda x: datetime.strptime(x, '%B').month):
286 |             month_full_name = f"{month_name} 2025"
287 |             month_dir = quarter_dir / month_full_name
288 |             month_dir.mkdir(exist_ok=True)
289 |
290 |             month_episodes = quarters[quarter_num][month_name]
291 |             quarter_episodes.extend(month_episodes)
292 |
293 |             # Create individual episode files
294 |             for episode in month_episodes:
295 |                 filename = sanitize_filename(episode['title']) + '.md'
296 |                 filepath = month_dir / filename
297 |
298 |                 content =
create_episode_markdown(episode) 299 | filepath.write_text(content, encoding='utf-8') 300 | print(f" Created: {filepath.relative_to(script_dir)}") 301 | 302 | # Create combined file for the month 303 | combined_filename = f"{month_name}_2025_combined.md" 304 | combined_path = quarter_dir / combined_filename 305 | combined_content = create_combined_markdown(month_episodes, month_full_name) 306 | combined_path.write_text(combined_content, encoding='utf-8') 307 | print(f"Created monthly combined: {combined_path.relative_to(script_dir)}") 308 | 309 | # Create combined file for the quarter 310 | quarter_combined_filename = f"Q{quarter_num}_2025_combined.md" 311 | quarter_combined_path = quarter_dir / quarter_combined_filename 312 | quarter_combined_content = create_combined_markdown(quarter_episodes, quarter_name) 313 | quarter_combined_path.write_text(quarter_combined_content, encoding='utf-8') 314 | print(f"Created quarterly combined: {quarter_combined_path.relative_to(script_dir)}") 315 | 316 | print(f"\nDone! Output written to: {output_dir}") 317 | print("\nFolder structure:") 318 | print_tree(output_dir, script_dir) 319 | 320 | 321 | def print_tree(path: Path, base: Path, prefix: str = ""): 322 | """Print a tree structure of the directory.""" 323 | entries = sorted(path.iterdir(), key=lambda x: (not x.is_dir(), x.name)) 324 | 325 | for i, entry in enumerate(entries): 326 | is_last = i == len(entries) - 1 327 | current_prefix = "└── " if is_last else "├── " 328 | print(f"{prefix}{current_prefix}{entry.name}") 329 | 330 | if entry.is_dir(): 331 | next_prefix = prefix + (" " if is_last else "│ ") 332 | print_tree(entry, base, next_prefix) 333 | 334 | 335 | if __name__ == '__main__': 336 | main() 337 | -------------------------------------------------------------------------------- /2025_episodes/Q3 2025/August 2025/_ThursdAI_-_GPT5_is_here.md: -------------------------------------------------------------------------------- 1 | # 📅 ThursdAI - GPT5 is here 2 | 3 | **Date:** August 07, 2025 4 | **Duration:** 2:56:19 5 | **Link:** [https://sub.thursdai.news/p/thursdai-gpt5-is-here](https://sub.thursdai.news/p/thursdai-gpt5-is-here) 6 | 7 | --- 8 | 9 | ## Description 10 | 11 | Hey folks 👋 Alex here, writing to you, from a makeshift recording studio in an Eastern European hookah bar, where I spent the last 7 hours. Why you ask? Well, when GPT-5 drops, the same week as OpenAI dropping the long awaited OSS models + Google is shipping perfect memory World Models (Genie 3) and tons of other AI drops, well I just couldn't stay away from the stream. 12 | 13 | Vacation or not, ThursdAI is keeping you up to date (for 32 months straight, which is also the time since the original GPT-4 release which gave this show its name!) 14 | 15 | So, what did we have today on the stream? Well, we started as usual, talking about the AI releases of the week, as if OpenAI dropping OSS models (apache 2) 120B and 20B is "usual". We then covered incredible releases like Google's World model Genie3 (more on this next week!) and Qwen-image + a few small Qwens. 16 | 17 | We then were VERY excited to tune in, and watch the (very long) announcement stream from OpenAI, in which they spent an hour to tell us about GPT-5. 18 | 19 | This was our longest stream by far (3.5 hours, 1hr was just OpenAI live stream) and I'm putting this here mostly unedited, but chapters are up so feel free to skip to the parts that are interesting to you the most. 
20 | 21 | 00:00 Introduction and Special Guests 22 | 23 | 00:56 Twitter Space and Live Streaming Plans 24 | 25 | 02:12 Open Source AI Models Overview 26 | 27 | 03:44 Qwen and Other New AI Models 28 | 29 | 08:59 Community Interaction and Comments 30 | 31 | 10:01 Technical Deep Dive into AI Models 32 | 33 | 25:06 OpenAI's New Releases and Benchmarks 34 | 35 | 38:49 Expectations and Use Cases for AI Models 36 | 37 | 40:03 Tool Use vs. Deep Knowledge in AI 38 | 39 | 41:02 Evaluating GPT OSS and OpenAI Critique 40 | 41 | 42:29 Historical and Medical Knowledge in AI 42 | 43 | 51:16 Opus 4.1 and Coding Models 44 | 45 | 55:38 Google's Genie 3: A New World Model 46 | 47 | 01:00:43 Kitten TTS: A Lightweight Text-to-Speech Model 48 | 49 | 01:02:07 11 Labs' Music Generation AI 50 | 51 | 01:08:51 OpenAI's GPT-5 Launch Event 52 | 53 | 01:24:33 Building a French Learning Web App 54 | 55 | 01:26:22 Exploring the Web App Features 56 | 57 | 01:29:19 Introducing Enhanced Voice Features 58 | 59 | 01:30:02 Voice Model Demonstrations 60 | 61 | 01:32:32 Personalizing Chat GPT 62 | 63 | 01:33:23 Memory and Scheduling Features 64 | 65 | 01:35:06 Safety and Training Enhancements 66 | 67 | 01:39:17 Health Applications of GPT-5 68 | 69 | 01:45:07 Coding with GPT-5 70 | 71 | 01:46:57 Advanced Coding Capabilities 72 | 73 | 01:52:59 Real-World Coding Demonstrations 74 | 75 | 02:10:26 Enterprise Applications of GPT-5 76 | 77 | 02:11:49 Amgen's Use of GPT-5 in Drug Design 78 | 79 | 02:12:09 BBVA's Financial Analysis with GPT-5 80 | 81 | 02:12:33 Healthcare Applications of GPT-5 82 | 83 | 02:12:52 Government Adoption of GPT-5 84 | 85 | 02:13:22 Pricing and Availability of GPT-5 86 | 87 | 02:13:51 Closing Remarks by Chief Scientist Yakob 88 | 89 | 02:16:03 Live Reactions and Discussions 90 | 91 | 02:16:41 Technical Demonstrations and Comparisons 92 | 93 | 02:33:53 Healthcare and Scientific Advancements with GPT-5 94 | 95 | 02:47:09 Final Thoughts and Wrap-Up 96 | 97 | --- 98 | 99 | My first reactions to GPT-5 100 | 101 | Look, I gotta keep it real with you, my first gut reaction was, hey, I'm on vacation, I don't have time to edit and write the newsletter (EU timezone) so let's see how ChatGPT-5 handles this task. After all, OpenAI has removed all other models from the dropdown, it's all GPT-5 now. (pricing from the incredible writeup by [Simon Willison](https://substack.com/profile/5753967-simon-willison) available [here](https://simonwillison.net/2025/Aug/7/gpt-5/)) 102 | 103 | And to tell you the truth, I was really disappointed! GPT seems to be incredible at coding benchmarks, with 400K tokens and incredible pricing (just $1.25/$10 compared to Opus $15/$75) this model, per the many friends who got to test it early, is a beast at coding! Readily beating opus on affordability per token, switching from thinking to less thinking when it needs to, it definitely seems like a great improvement for coding and agentic tasks. 104 | 105 | But for my, very much honed prompt of "hey, help me with ThursdAI drafts, here's previous drafts that I wrote myself, mimic my tone" it failed.. spectacularly! 106 | 107 | Here's just a funny example, after me replying that it did a bad job: 108 | 109 | It literally wrote "I'm Alex, I build the mind, not the vibe" 🤦‍♂️ What.. the actual... 
110 | 111 | For comparison, here's o3, with the same prompt, with a fairly true to tone draft: 112 | 113 | High taste testers take on GPT-5 114 | 115 | But hey, I have tons of previous speakers in our group chats, and many of them who got early access (I didn't... OpenAI, I can be trusted lol) rave about this model. They are saying that this is a huge jump in intelligence. 116 | 117 | Folks like Dr Derya Unutmaz, who jumped on the live show and described how GPT5 does incredible things with less hallucinations, folks like Swyx from [Latent.Space](https://substack.com/profile/89230629-latentspace) who had [early access](https://www.latent.space/p/gpt-5-review) and even got invited to give first reactions to the OpenAI office, and [Pietro Schirano](https://x.com/skirano/status/1953516768317628818) who also showed up in an OpenAI video. 118 | 119 | So definitely, definitely check out their vibes, as we all try to wrap our heads around this new intelligence king we got! 120 | 121 | Other GPT5 updates 122 | 123 | OpenAI definitely cooked, don't get me wrong, with this model plugging into everything else in their platform like memory, voice (that was upgraded and works in custom GPTs now, yay!), canvas and study mode, this will definitely be an upgrade for many folks using the models. 124 | 125 | They have now also opened access to GPT-5 to free users, just in time for schools to reopen, including a very interesting Quiz mode (that just showed up for me without asking for it), and connection to Gmail, all those will now work with GPT5. 126 | 127 | It now has 400K context, way less hallucinations but fewer refusals also, and the developer upgrades like a new verbosity setting and a new "minimal" reasoning setting are all very welcome! 128 | 129 | OpenAI finally launches gpt-oss (120B / 20B) apache 2 licensed models ([model card](https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7637/oai_gpt-oss_model_card.pdf), HF) 130 | 131 | It was really funny, on the stream Nisten talked about the open source models OpenAI dropped, and said "when we covered it last week", while it was just two days ago! It really does feel like this world is moving really fast. 132 | 133 | OpenAI's long promised open source models are here, and they got a fairly mixed bag of reviews from folks. Many folks are celebrating that the western world is now back in the game, releasing incredible local models, with an open license! 134 | 135 | Though, after the initial excitement, the vibes are split on these models. Folks are saying that maybe these were trained with only synthetic data, because, like Phi, they seem to be very good at benchmarks, and on the specific tasks they were optimized for (code, math) but [really bad](https://x.com/sam_paech/status/1952839665670922360) at creative writing (Sam Paech from EQBench was not impressed), they are also not multilingual, though OpenAI did release a cookbook [on finetuning](https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers) with HuggingFace! 136 | 137 | Overall, these models are trained for agentic workflows—supporting function calling, web search, Python execution, configurable reasoning effort, and full raw chain-of-thought access, which we will never get from GPT5. 138 | 139 | I particularly love the new approach, where a reasoning effort can be defined directly via the system prompt, by just adding "reasoning: high" to the system prompt, this model will reason for way longer! Can't wait to get back and bench these and share with you. 
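To make that concrete, here's a hedged sketch of steering the reasoning effort through the system prompt, using the standard OpenAI client pointed at whatever OpenAI-compatible server is hosting the weights (the base URL and model id below are placeholders, not official endpoints):

```python
# Sketch: nudging gpt-oss reasoning depth via the system prompt.
# Works against any OpenAI-compatible server hosting the weights;
# base_url and model are placeholders for your own deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "system", "content": "Reasoning: high"},  # or "low" / "medium"
        {"role": "user", "content": "Prove that the square root of 2 is irrational."},
    ],
)
print(resp.choices[0].message.content)
```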
140 | 141 | Overall, the fine-tuning and open source community is split for now, but it's been only a few days, so we'll keep you up to date on how well these models land. Regardless, this was a historic week for OpenAI! 142 | 143 | Speaking of open models, did you have a chance to try our W&B Inference? The team worked hard to bring these new models to you in record time and at incredible pricing (just $.05 for 20B and $.15 for 120B!), so these models are definitely worth giving a try! 144 | 145 | Plus, if you comment "OSS Power" on our [announcement post](https://x.com/weights_biases/status/1952885962641699287), we'll likely give you a few credits to try it out and let us know what you think! 146 | 147 | World models "holy crap" moment - Google Genie3 148 | 149 | The other very important release this week was.... not a release at all, but an announcement from DeepMind, with Genie3. 150 | 151 | This World Model takes a single image or text prompt and creates a fully interactive, controllable 3D environment that runs in real-time at 24fps. An environment you as a user can control, walk (or fly) in, and move the camera around. It's really mind-blowing stuff. 152 | 153 | We've covered world models like Mirage on previous episodes, but what Google released is a MAJOR step up in coherency, temporal consistency and just overall quality! 154 | 155 | The key breakthrough here is consistency and memory. In one demo, a user could "paint" a virtual wall, turn away, and when they turned back, the paint was still there. This is a massive step towards generalist agents that can train, plan, and reason in entirely simulated worlds, with huge implications for robotics and gaming. 156 | 157 | We're hoping to have the Genie 3 team on the show next week to dive even deeper into this incredible technology!! 158 | 159 | Other AI news this week 160 | 161 | This week, the "other" news could have filled a full show 2 years ago: we got Qwen capping a third straight week of releases with 2 new tiny models + a new diffusion model called Qwen-image ([Blog](https://qwenlm.github.io/blog/qwen-image/), [HF](https://huggingface.co/Qwen/Qwen-Image)) 162 | 163 | Anthropic decided to pre-empt the GPT5 release, and upgraded Opus 4 and gave us Opus 4.1 with a slight bump in specs. 164 | 165 | ElevenLabs released a music API called ElevenMusic, which sounds very very good (this on top of last week's Riffusion + [Producer.ai](http://Producer.ai) news, that I'm still raving about) 166 | 167 | Also in voice and audio, a SUPER TINY TTS model called KittenTTS was released; with just 15M parameters and a model that's 25MB, it's surprisingly decent at generating voice ([X](https://x.com/divamgupta/status/1952762876504187065)) 168 | 169 | And to cap it off with breaking news, the Cursor team, who showed up on the OpenAI stream today (marking quite the change in direction from OpenAI and Windsurf's previous friendship), dropped their own CLI version of Cursor, reminiscent of Claude Code! 170 | 171 | PHEW, wow, ok, this was a LOT to process. Not only did we tune in for the full GPT-5 release, we did a live stream when gpt-oss dropped as well. 172 | 173 | On a personal note, I was very humbled when Sam Altman said it was 32 months since the GPT-4 release, because it means this was 32 months of ThursdAI; as many of you know, we started live streaming on March 13, 2023, when GPT-4 was released.
174 | 175 | I'm very proud of the incredible community we've built (50K views total across all streams this week!), the incredible co-hosts I have, who step up when I'm on vacation and the awesome guests we have on the show, to keep you up to date every week! 176 | 177 | So, a little favor to ask, if you find our content valuable, entertaining, the best way to support this pod is upgrade to a paid sub, and share ThursdAI with a friend or two! 👏 See you next week 🫡 178 | 179 | 180 | 181 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-gpt5-is-here/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-gpt5-is-here?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE3MDM5ODk4MywiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.KfVeuojm2eKEFqVLDMnVNisA0fGRe6RMdNkMMXRRzW4&utm_campaign=CTA_5). 182 | -------------------------------------------------------------------------------- /2025_AI_Year_in_Review.md: -------------------------------------------------------------------------------- 1 | # 🔥 ThursdAI 2025 AI Year in Review 2 | ## From DeepSeek's $5M Bomb to ASI in a Decade — The Year AI Went Mainstream 3 | 4 | *By Alex Volkov & the ThursdAI Crew | Based on 50+ episodes from January 2 - December 5, 2025* 5 | 6 | --- 7 | 8 | ## 🎤 A Note from Alex 9 | 10 | Friends, what a year. When I started 2025, I thought GPT-4 was the ceiling. I was so, so wrong. 11 | 12 | This year, we watched DeepSeek crash NVIDIA's stock with a $5.5M reasoning model. We saw OpenAI announce ASI within a decade—and a timeline for fully autonomous AI researchers by 2028. Google reclaimed the LLM throne with Gemini 3. GPT-5 arrived after 32 months of waiting (32 months of ThursdAI!). Sora 2 invented AI social media. A Gemma model made a novel cancer discovery that was validated in a lab. 13 | 14 | Every week brought releases that would have dominated months of news cycles just two years ago. If you're reading this and feeling overwhelmed—you're not alone. That's literally why ThursdAI exists. 15 | 16 | Let's look back at the year that changed everything. 17 | 18 | --- 19 | 20 | ## 📊 2025 By The Numbers 21 | 22 | | Metric | Value | 23 | |--------|-------| 24 | | ThursdAI Episodes | 50+ | 25 | | Major Model Releases | 200+ | 26 | | Open Source Models Released | 150+ | 27 | | Trillion-Parameter Models | 4 (Kimi K2, Ling-1T, Qwen-Max, Kimi K2 Thinking) | 28 | | Companies that hit $100B+ valuation | 4 (OpenAI, Anthropic, xAI, Google DeepMind) | 29 | | Compute Commitments | >$2 Trillion | 30 | | ThursdAI's Age | 2.5 years (turned 2 in March!) | 31 | 32 | --- 33 | 34 | ## 🏆 The 12 Releases That Defined 2025 35 | 36 | ### 1. 🐋 DeepSeek R1 (January) 37 | **The shot heard around the world.** A MIT-licensed reasoning model allegedly trained for $5.5M that: 38 | - Crashed NVIDIA stock 17% ($560B loss—largest single-company loss ever) 39 | - Hit #1 on iOS App Store 40 | - Matched o1 at 50x cheaper pricing 41 | - Its 1.5B distilled version beat GPT-4o on math 42 | 43 | *"My mom knows about DeepSeek—your grandma probably knows about it, too"* 44 | 45 | ### 2. 👑 Gemini 2.5 Pro / Gemini 3 Pro (March & November) 46 | Google's redemption arc. 
First reclaimed #1 in March, then doubled down in November: 47 | - 45.14% on ARC-AGI-2 (Deep Think) — 2x previous SOTA 48 | - Native image gen with Nano Banana Pro (4K, perfect text) 49 | - Antigravity IDE: free agent-first development 50 | - Generative UIs built on the fly 51 | 52 | ### 3. 🧠 GPT-5 & GPT-5.1-Codex-Max (August & November) 53 | 32 months after GPT-4, OpenAI's next frontier: 54 | - 400K → 1M context window evolution 55 | - Unified reasoning + chat architecture 56 | - **Codex-Max works 24+ hours independently** 57 | - Router-based model selection 58 | - "Compaction" for native context management 59 | 60 | ### 4. 🎬 VEO3 (May) 61 | The video model that crossed the uncanny valley: 62 | - Native multimodal audio (speech, SFX, music synced perfectly) 63 | - Characters understand who's speaking, make eye contact 64 | - Spawned "Prompt Theory" viral phenomenon 65 | - People posting real videos claiming they were AI because they couldn't tell 66 | 67 | ### 5. 📹 Sora 2 (October) 68 | AI-generated social media goes mainstream: 69 | - Shot to #3 on iOS App Store within days 70 | - **Cameos**: Upload your face, star in any video 71 | - "Pick a Mood": Control algorithm with natural language 72 | - All content is AI-generated—no uploads, only creations 73 | 74 | ### 6. 🦄 Kimi K2 (July) & Kimi K2 Thinking (November) 75 | Open source hit the trillion-parameter mark: 76 | - 1T total parameters, only 32B active 77 | - 65.8% SWE-bench (beats Claude Sonnet without reasoning) 78 | - K2 Thinking: 44.9% on Humanity's Last Exam 79 | - 200-300 sequential tool calls without intervention 80 | 81 | ### 7. 🔧 Tool-Using Reasoners: o3/o4-mini (April) 82 | The closest thing to AGI we've seen: 83 | - First models to autonomously use tools during reasoning 84 | - 600+ consecutive tool calls 85 | - Manipulate images mid-thought (zoom, crop, rotate) 86 | - o4-mini hits 99.5% on AIME with Python interpreter 87 | 88 | ### 8. 🤖 Claude 4 / Opus 4.5 (May & November) 89 | Anthropic's coding dominance: 90 | - First models to cross 80% on SWE-bench 91 | - Opus 4.5: 80.9% SWE-bench at 1/3 previous cost 92 | - "Effort" parameter for reasoning control 93 | - Claude Skills: Auto-selected instruction libraries 94 | 95 | ### 9. 🔌 MCP Becomes Universal (All Year) 96 | The Model Context Protocol won: 97 | - OpenAI, Google, Microsoft, AWS all adopted it 98 | - Prevents fragmentation (no VHS vs Betamax) 99 | - MCP Apps: Unified standard for agentic UIs 100 | - Tools work across Claude AND GPT 101 | 102 | ### 10. 🧬 AI Makes Scientific Discovery (October) 103 | C2S-Scale from Google & Yale: 104 | - Generated novel hypothesis about cancer cells 105 | - **Validated in a wet lab** on living cells 106 | - First counter-evidence to "stochastic parrot" criticism 107 | - AI as genuine scientific collaborator 108 | 109 | ### 11. 🤖 Consumer Humanoid Robots (October) 110 | 1X NEO: $20,000, delivery early 2026: 111 | - Handles cleaning, laundry, household chores 112 | - Teleoperation by humans for complex tasks 113 | - Soft, quiet design at 66 lbs 114 | - The robot future is here 115 | 116 | ### 12. 🚀 OpenAI's ASI Roadmap (October) 117 | Sam Altman dropped unprecedented timelines: 118 | - **ASI in less than a decade** 119 | - AI research intern by September 2026 120 | - Fully autonomous AI researcher by March 2028 121 | - $1.4 trillion in compute obligations 122 | 123 | --- 124 | 125 | ## 📈 The Year in Themes 126 | 127 | ### 🧠 Theme 1: Reasoning Models Go Mainstream 128 | DeepSeek R1 proved reasoning doesn't require massive scale. 
By year end: 129 | - Small models (1.5B) beat GPT-4o on math with RL 130 | - o3/o4-mini added tool use to chain-of-thought 131 | - GPT-5 unified reasoning + chat into one model 132 | - Open source reasoning matched frontier (Qwen3, Kimi K2) 133 | 134 | ### 🇨🇳 Theme 2: Chinese Labs Dominated Open Source 135 | Despite chip export restrictions: 136 | - **DeepSeek**: R1, V3.2-Speciale (olympiad gold medals) 137 | - **Alibaba/Qwen**: 3, 3-Coder, 3-VL, 3-Omni families 138 | - **MiniMax**: M1, M2 (Sonnet at 8% cost) 139 | - **Moonshot/Kimi**: K2, K2 Thinking (trillion-scale) 140 | - **ByteDance**: HunyuanVideo, SeeDream, Z-Image 141 | 142 | ### 🤖 Theme 3: 2025 Was The Year of Agents 143 | Every quarter brought more agentic capabilities: 144 | - **Q1**: OpenAI Operator, MCP adoption 145 | - **Q2**: Jules, Codex, tool-using reasoners 146 | - **Q3**: ChatGPT Agent (Odyssey), Agents.md standard 147 | - **Q4**: Atlas browser, AgentKit, Antigravity IDE 148 | 149 | ### 🎥 Theme 4: Video AI Crossed the Uncanny Valley 150 | The progression was staggering: 151 | - **VEO3** (May): Native audio, perfect lip sync 152 | - **Sora 2** (October): Social media with Cameos 153 | - **Kling 2.6** (December): Native audio generation 154 | - Reality became genuinely hard to verify 155 | 156 | ### 💰 Theme 5: Unprecedented Investment 157 | The numbers are almost incomprehensible: 158 | - OpenAI: $1.4 trillion compute obligations 159 | - NVIDIA-OpenAI: $100B pledge 160 | - OpenAI-Oracle: $300B deal 161 | - CoreWeave: $22.4B OpenAI + $14.2B Meta + $6.3B NVIDIA 162 | - Anthropic: $13B raise at $183B valuation 163 | - Project Stargate: $500B AI infrastructure 164 | 165 | ### 🌍 Theme 6: World Models Became Playable 166 | From images to interactive worlds: 167 | - **Google Genie-3**: Controllable 3D at 24fps 168 | - **World Labs RTFM**: Real-time on single H100 169 | - **Hunyuan GameCraft**: Games with physics 170 | - **Oasis 2.0**: Real-time Minecraft reskinning 171 | 172 | --- 173 | 174 | ## 📅 Quarter-by-Quarter Highlights 175 | 176 | ### Q1: "The Quarter That Changed Everything" 177 | **January-March 2025** 178 | 179 | ![Q1 2025 Infographic](https://pub-7837090e9353474292fc8c7114c5fa9d.r2.dev/thursdai_infographics/Q1%202025%20ThursdAI%20Q1%20infographic.jpg) 180 | 181 | - 🐋 DeepSeek R1 crashed NVIDIA ($560B loss) 182 | - 🤖 OpenAI Operator (agentic ChatGPT) 183 | - 💫 Project Stargate ($500B infrastructure) 184 | - 👑 Gemini 2.5 Pro takes #1 185 | - 🎨 GPT-4o native image gen (Ghibli-mania) 186 | - 🔌 OpenAI adopts MCP 187 | 188 | ### Q2: "The Quarter That Shattered Reality" 189 | **April-June 2025** 190 | 191 | ![Q2 2025 Infographic](https://pub-7837090e9353474292fc8c7114c5fa9d.r2.dev/thursdai_infographics/Q2%202025%20ThursdAI%20Q2%20infographic.jpg) 192 | 193 | - 🧠 o3/o4-mini (tool-using reasoners) 194 | - 🎬 VEO3 (native audio, uncanny valley crossed) 195 | - 🔥 Qwen 3 (Apache 2.0, 8 models) 196 | - 🤖 Claude 4 Opus & Sonnet (80% SWE-bench) 197 | - 📚 GPT-4.1 (1M context) 198 | - 💰 Meta $15B Scale AI deal 199 | 200 | ### Q3: "GPT-5, Trillion-Scale Open Source, World Models" 201 | **July-September 2025** 202 | 203 | ![Q3 2025 Infographic](https://pub-7837090e9353474292fc8c7114c5fa9d.r2.dev/thursdai_infographics/Q3%202025%20ThursdAI%20Q3%20infographic.jpeg) 204 | 205 | - 👑 GPT-5 arrives (400K context, unified reasoning) 206 | - 🦄 Kimi K2 (1T params, 65.8% SWE-bench) 207 | - 🌍 Google Genie-3 (playable AI worlds) 208 | - 🔓 GPT-OSS (Apache 2.0 from OpenAI!) 
209 | - 🧑‍💻 GPT-5-Codex (7+ hour coding sessions) 210 | - 💰 NVIDIA $100B pledge, Oracle $300B deal 211 | 212 | ### Q4: "Agents, Gemini's Crown & Sora Social" 213 | **October-December 2025** 214 | 215 | ![Q4 2025 Infographic](https://pub-7837090e9353474292fc8c7114c5fa9d.r2.dev/thursdai_infographics/Q4%202025%20ThursdAI%20Q3%20infographic%20(1).jpg) 216 | 217 | - 📹 Sora 2 (AI social media revolution) 218 | - 👑 Gemini 3 Pro (45% ARC-AGI-2) 219 | - 🐋 DeepSeek V3.2-Speciale (olympiad gold) 220 | - 🧠 Claude Opus 4.5 (80.9% SWE-bench) 221 | - 🚀 OpenAI ASI roadmap (2028 timeline) 222 | - 🤖 1X NEO ($20K home robot) 223 | 224 | --- 225 | 226 | ## 🥇 Best of 2025 Awards 227 | 228 | ### 🏆 Model of the Year 229 | **DeepSeek R1** — Didn't just release a model, rewrote the economics of AI 230 | 231 | ### 🏆 Open Source Champion 232 | **Qwen (Alibaba)** — 8+ model families, Apache 2.0, consistently frontier 233 | 234 | ### 🏆 Most Impactful Release 235 | **VEO3** — Crossed the uncanny valley, native audio changed everything 236 | 237 | ### 🏆 Biggest Comeback 238 | **Google** — From "where's Gemini?" to #1 twice (March & November) 239 | 240 | ### 🏆 Wildest Announcement 241 | **OpenAI ASI Roadmap** — Fully autonomous AI researchers by 2028 242 | 243 | ### 🏆 Best Surprise 244 | **Sora 2 Social Media** — Nobody expected a full social platform 245 | 246 | ### 🏆 Infrastructure Play 247 | **CoreWeave/NVIDIA** — Built the compute layer the world runs on 248 | 249 | ### 🏆 Scientific Breakthrough 250 | **C2S-Scale Cancer Discovery** — First AI-generated hypothesis validated in lab 251 | 252 | ### 🏆 Agent Ecosystem Win 253 | **MCP Protocol** — Became the universal standard everyone adopted 254 | 255 | ### 🏆 Most Underrated 256 | **Claude Skills** — Auto-selected instruction libraries, quietly revolutionary 257 | 258 | --- 259 | 260 | ## 🔮 Looking Forward: What 2026 Holds 261 | 262 | Based on everything we've seen this year, here's what's coming: 263 | 264 | 1. **GPT-5.x reasoning models** — Tool use gets even more sophisticated 265 | 2. **Open source trillion-scale becomes common** — Not just Chinese labs 266 | 3. **Agents that work for days** — Codex-Max is just the beginning 267 | 4. **Consumer humanoid robots ship** — 1X NEO, Figure, Tesla Bot 268 | 5. **AI-generated content everywhere** — The Sora 2 effect spreads 269 | 6. **Scientific discovery accelerates** — More lab-validated AI hypotheses 270 | 7. **100M token context** — Qwen roadmap suggests this is coming 271 | 8. **Real-time world models** — Gaming and simulation converge 272 | 273 | --- 274 | 275 | ## 🙏 Thank You 276 | 277 | To everyone who listened, read, shared, and built alongside us this year—thank you. ThursdAI exists because of this community. 278 | 279 | Special thanks to our incredible co-hosts: **Wolfram Ravenwolf**, **Yam Peleg**, **Nisten**, **LDJ**, and **Ryan Carson**. And to the hundreds of guests who shared their work and insights with us. 280 | 281 | We started this show because GPT-4 blew our minds. Now we're documenting the path to AGI and beyond. What a time to be alive. 282 | 283 | See you in 2026. Hold on to your butts. 
284 | 285 | — **Alex Volkov** & the ThursdAI Crew 286 | 287 | --- 288 | 289 | ## 📚 Resources 290 | 291 | - **Full Quarterly Recaps**: Q1, Q2, Q3, Q4 available in this repository 292 | - **Weekly Episodes**: [thursdai.news](https://thursdai.news) 293 | - **YouTube**: [ThursdAI Channel](https://thursdai.news/yt) 294 | - **Follow Alex**: [@altryne](https://x.com/altryne) 295 | 296 | --- 297 | 298 | *"Open source AI has never been as hot as this year. We're accelerating as f*ck, and it's only just beginning—hold on to your butts."* 299 | 300 | — Alex Volkov, ThursdAI 301 | 302 | --- 303 | 304 | *Generated from 50+ ThursdAI episodes covering January - December 2025* 305 | -------------------------------------------------------------------------------- /2025_episodes/Q2 2025/June 2025/_ThursdAI_-_Jun_5_2025_-_Live_from_AI_Engineer_with_Swyx_new_Gemini_25_with_Logan_K_and_Jack_Rae_Sel.md: 1 | # 📆 ThursdAI - Jun 5, 2025 - Live from AI Engineer with Swyx, new Gemini 2.5 with Logan K and Jack Rae, Self Replicating agents with Morph Labs 2 | 3 | **Date:** June 06, 2025 4 | **Duration:** 1:43:45 5 | **Link:** [https://sub.thursdai.news/p/thursdai-jun-5-2025-live-from-ai](https://sub.thursdai.news/p/thursdai-jun-5-2025-live-from-ai) 6 | 7 | --- 8 | 9 | ## Description 10 | 11 | Hey folks, this is Alex, coming to you LIVE from the AI Engineer World's Fair! 12 | 13 | What an incredible episode this week! We recorded live from the 30th floor of the Marriott in SF, while Yam was doing live correspondence from the floor of the AI Engineer event, all while Swyx, the co-host of the Latent Space podcast and the creator of AI Engineer (both the conference and the concept itself), joined us for the whole stream - here's the edited version, please take a look. 14 | 15 | We've had around 6500 people tune in, and at some point we got 2 surprise guests, straight from the keynote stage: Logan Kilpatrick (PM for AI Studio and lead cheerleader for Gemini) and Jack Rae (principal scientist working on reasoning) joined us for a great chat about Gemini! Mind was absolutely blown! 16 | 17 | They have just launched the new Gemini 2.5 Pro and I thought it would only be fitting to let their new model cover this podcast this week (so below is **fully AI generated** ... non slop I hope). The show notes and TL;DR are, as always, at the end. 18 | 19 | Okay, enough preamble… let's dive into the madness! 20 | 21 | **🤯 Google Day at AI Engineer: New Gemini 2.5 Pro and a Look Inside the Machine's Mind** 22 | 23 | For the first year of this podcast, a recurring theme was us asking, "Where's Google?" Well, it's safe to say that question has been answered with a firehose of innovation. We were lucky enough to be joined by Google DeepMind's Logan Kilpatrick and Jack Rae, the tech lead for "thinking" within Gemini, literally moments after they left the main stage. 24 | 25 | **Surprise! A New Gemini 2.5 Pro Drops Live** 26 | 27 | Logan kicked things off with a bang, officially announcing a brand new, updated Gemini 2.5 Pro model right there during his keynote. He called it "hopefully the final update to 2.5 Pro," and it comes with a bunch of performance increases, closing the gap on feedback from previous versions and hitting SOTA on benchmarks like Aider. 28 | 29 | It's clear that the organizational shift to bring the research and product teams together under the DeepMind umbrella is paying massive dividends.
Logan pointed out that Google has seen a 50x increase in AI inference over the past year. The flywheel is spinning, and it's spinning *fast*. 30 | 31 | **How Gemini "Thinks"** 32 | 33 | Then things got even more interesting. Jack Rae gave us an incredible deep dive into what "thinking" actually means for a language model. This was one of the most insightful parts of the conference for me. 34 | 35 | For years, the bottleneck for LLMs has been **test-time compute**. Models were trained to respond immediately, applying a fixed amount of computation to go from a prompt to an answer, no matter how hard the question. The only way to get a "smarter" response was to use a bigger model. 36 | 37 | Jack explained that "Thinking" shatters this limitation. Mechanically, Gemini now has a "thinking stage" where it can generate its own internal text—hypothesizing, testing, correcting, and reasoning—before committing to a final answer. It's an iterative loop of computation that the model can dynamically control, using more compute for harder problems. It learns *how* to think using reinforcement learning, getting a simple "correct" or "incorrect" signal and backpropagating that to shape its reasoning strategies. 38 | 39 | We're already seeing the results of this. Jack showed a clear trend: as models get better at reasoning, they're also using more test-time compute. This paradigm also gives developers a "thinking budget" slider in the API for Gemini 2.5 Flash and Pro, allowing a continuous trade-off between cost and performance. 40 | 41 | The future of this is even wilder. They're working on **DeepThink**, a high-budget mode for extremely hard problems that uses much deeper, parallel chains of thought. On the tough USA Math Olympiad, where the SOTA was negligible in January, 2.5 Pro reached the 50th percentile of human participants. DeepThink pushes that to the 65th percentile. 42 | 43 | Jack’s ultimate vision is inspired by the mathematician Ramanujan, who derived incredible theorems from a single textbook by just thinking deeply. The goal is for models to do the same—contemplate a small set of knowledge so deeply that they can push the frontiers of human understanding. Absolutely mind-bending stuff. 44 | 45 | **🤖 MorphLabs and the Audacious Quest for Verified Superintelligence** 46 | 47 | Just when I thought my mind couldn't be bent any further, we were joined by Jesse Han, the founder and CEO of MorphLabs. Fresh off his keynote, he laid out one of the most ambitious visions I've heard: building the infrastructure for the Singularity and developing "verified superintelligence." 48 | 49 | The big news was that **Christian Szegedy** is joining MorphLabs as Chief Scientist. For those who don't know, Christian is a legend—he invented batch norm and adversarial examples, co-founded XAI, and led code reasoning for Grok. That's a serious hire. 50 | 51 | Jesse’s talk was framed around a fascinating question: "What does it mean to have empathy for the machine?" He argues that as AI develops personhood, we need to think about what it wants. And what it wants, according to Morph, is a new kind of cloud infrastructure. 52 | 53 | This is **MorphCloud**, built on a new virtualization stack called **Infinibranch**. Here’s the key unlock: it allows agents to instantaneously snapshot, branch, and replicate their entire VM state. Imagine an agent reaching a decision point. Instead of choosing one path, it can branch its entire existence—all its processes, memory, and state—to explore every option in parallel. 
It can create save states, roll back to previous checkpoints, and even merge its work back together. 54 | 55 | This is a monumental step for agentic AI. It moves beyond agents that are just a series of API calls to agents that are truly embodied in complex software environments. It unlocks the potential for recursive self-improvement and large-scale reinforcement learning in a way that's currently impossible. It’s a bold, sci-fi vision, but they're building the infrastructure to make it a reality today. 56 | 57 | **🔥 The Agent Conversation: OpenAI, MCP, and Magic Moments** 58 | 59 | The undeniable buzz on the conference floor was all about **agents**. You couldn't walk ten feet without hearing someone talking about agents, tools, and MCP. 60 | 61 | OpenAI is leaning in here too. This week, they made their **Codex coding agent available to all ChatGPT Plus users** and announced that ChatGPT will soon be able to listen in on your Zoom meetings. This is all part of a broader push to make AI more active and integrated into our workflows. 62 | 63 | The **MCP (Model-Context-Protocol)** track at the conference was packed, with lines going down the hall. (Alex here, I had a blast talking during that track about MCP observability, you can catch our talk [here](https://youtu.be/z4zXicOAF28?t=19573) on the live stream of AI Engineer) 64 | 65 | Logan Kilpatrick offered a grounded perspective, suggesting the hype might be a bit overblown but acknowledging the critical need for an open standard for tool use, a void left when OpenAI didn't formalize ChatML. 66 | 67 | I have to share my own jaw-dropping MCP moment from this week. I was coding an agent using an IDE that supports MCP. My agent, which was trying to debug itself, used an MCP tool to check its own observability traces on the Weights & Biases platform. While doing so, it discovered a *new tool* that our team had just added to the MCP server—a support bot. Without any prompting from me, my coding agent formulated a question, "chatted" with the support agent to get the answer, came back, fixed its own code, and then re-checked its work. Agent-to-agent communication, happening automatically to solve a problem. My jaw was on the floor. That's the magic of open standards. 68 | 69 | **This Week's Buzz from Weights & Biases** 70 | 71 | Speaking of verification and agents, the buzz from our side is all about it! At our booth here at AI Engineer, we have a Robodog running around, connected to our LLM evaluation platform, **W&B Weave**. As Jesse from MorphLabs discussed, verifying what these complex agentic systems are doing is critical. Whether it's superintelligence or your production application, you need to be able to evaluate, trace, and understand its behavior. We're building the tools to do just that. 72 | 73 | And if you're in San Francisco, don't forget our own conference, **Fully Connected**, is happening on June 18th and 19th! It's going to be another amazing gathering of builders and researchers. [Fullyconnected.com](http://Fullyconnected.com) get in FREE with the promo code **WBTHURSAI** 74 | 75 | What a show. The energy, the announcements, the sheer brainpower in one place was something to behold. We’re at a point where the conversation has shifted from theory to practice, from hype to real, tangible engineering. The tracks on agents and enterprise adoption were overflowing because people are building, right now. It was an honor and a privilege to bring this special episode to you all. 76 | 77 | Thank you for tuning in. 
We'll be back to our regular programming next week! (and Alex will be back to writing his own newsletter, not sending direct AI output!) 78 | 79 | AI News TL;DR and show notes 80 | 81 | * **Hosts and Guests** 82 | 83 | * **Alex Volkov** - AI Evangelist & Weights & Biases ([@altryne](http://x.com/@altryne)) 84 | 85 | * Co-Hosts - [@swyx](http://x.com/swyx) [@yampeleg](x.com/@yampeleg) [@romechenko](https://twitter.com/romechenko/status/1891007363827593372) 86 | 87 | * Guests - [@officialLoganK](https://x.com/OfficialLoganK), [@jack_w_rae](https://x.com/jack_w_rae) 88 | 89 | * **Open Source LLMs** 90 | 91 | * ByteDance / ContentV-8B - ([HF](https://huggingface.co/ByteDance/ContentV-8B)) 92 | 93 | * **Big CO LLMs + APIs** 94 | 95 | * Gemini Pro 2.5 updated Jun 5th ([X](https://x.com/OfficialLoganK/status/1930657743251349854)) 96 | 97 | * SOTA on HLE, Aider, and GPQA 98 | 99 | * Now supports thinking budgets 100 | 101 | * Same cost, on the Pareto frontier 102 | 103 | * Closes gap on 03-25 regressions 104 | 105 | * OAI AVM injects ads and stopped singing ([X](https://x.com/altryne/status/1929312886448337248)) 106 | 107 | * OpenAI Codex is now available to Plus members and has internet access ([X](https://github.com/aavetis/ai-pr-watcher/)) 108 | 109 | * ~24,000 NEW PRs overnight from Codex after @OpenAI expands access to free users. 110 | 111 | * OpenAI will record meetings and released connectors ([X](https://twitter.com/testingcatalog/status/1930366893321523676)) 112 | 113 | * [TestingCatalog News 🗞@testingcatalog](https://twitter.com/testingcatalog)[Jun 4, 2025](https://twitter.com/testingcatalog/status/1930366893321523676) 114 | 115 | OpenAI released loads of connectors for Team accounts! Most of these connectors can be used for Deep Research, while Google Drive, SharePoint, Dropbox and Box could be used in all chats. https://t.co/oBEmYGKguE 116 | 117 | * Anthropic cuts Claude access for Windsurf ([X](https://x.com/kevinhou22/status/1930401320210706802)) 118 | 119 | * Without warning, Anthropic cuts off Windsurf from official Claude 3 and 4 APIs 120 | 121 | * This week's Buzz 122 | 123 | * Fully Connected: W&B's 2-day conference, June 18-19 in SF [fullyconnected.com](fullyconnected.com) - Promo Code WBTHURSAI 124 | 125 | * **Vision & Video** 126 | 127 | * VEO3 is now available via API on FAL ([X](https://x.com/FAL/status/1930732632046006718)) 128 | 129 | * Captions launches Mirage Studio - a talking-avatars competitor to HeyGen/Hedra ([X](https://x.com/getcaptionsapp/status/1929554635544461727)) 130 | 131 | * **Voice & Audio** 132 | 133 | * ElevenLabs model V3 - supports emotion tags and is an "inflection point" ([X](https://x.com/venturetwins/status/1930727253815759010)) 134 | 135 | * Supporting 70+ languages, multi-speaker dialogue, and audio tags such as [excited], [sighs], [laughing], and [whispers]. 136 | 137 | * **Tools** 138 | 139 | * Cursor launched V1 - Bug Bot reviews PRs, iPython notebooks and one-click MCP 140 | 141 | * 24,000 NEW PRs overnight from Codex after [@OpenAI](https://x.com/OpenAI) expands access to plus users ([X](https://twitter.com/albfresco/status/1930262263199326256)) 142 | 143 | Thank you for subscribing.
[Leave a comment](https://sub.thursdai.news/p/thursdai-jun-5-2025-live-from-ai/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-jun-5-2025-live-from-ai?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE2NTMxNTQyMSwiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.fuAhXQicTRZhUI4Srm9PVft19TKEtzMdHkvtE0mkFUc&utm_campaign=CTA_5). 144 | -------------------------------------------------------------------------------- /2025_episodes/Q3 2025/August 2025/_ThursdAI_Jul_31_2025_Qwens_Small_Models_Go_Big_StepFuns_Multimodal_Leap_GLM-45s_Chart_Crimes_and_Ru.md: -------------------------------------------------------------------------------- 1 | # 📆 ThursdAI – Jul 31, 2025 – Qwen’s Small Models Go Big, StepFun’s Multimodal Leap, GLM-4.5’s Chart Crimes, and Runway’s Mind‑Bending Video Edits + GPT-5 soon? 2 | 3 | **Date:** August 01, 2025 4 | **Duration:** 1:38:28 5 | **Link:** [https://sub.thursdai.news/p/thursdai-jul-31-2025-qwens-small](https://sub.thursdai.news/p/thursdai-jul-31-2025-qwens-small) 6 | 7 | --- 8 | 9 | ## Description 10 | 11 | Woohoo, we're almost done with July (my favorite month) and the Open Source AI decided to go out with some fireworks 🎉 12 | 13 | Hey everyone, Alex here, writing this without my own personal superintelligence (more: later), and this week has been VERY BUSY with many new open source releases. 14 | 15 | Just 1 hour before the show we already had 4 breaking news releases: a tiny Qwen3-coder, multimodal SOTAs from both Cohere and StepFun, and a combined model with BFL from our friends at Krea called Flux[Krea] 👏 16 | 17 | This is on top of a very, very busy week, with Runway adding conversation to their video model Aleph, Zuck's superintelligence vision, and a new SOTA open video model, Wan 2.2. So let's dive straight into this (as always, all show notes and links are at the end) 18 | 19 | ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. 20 | 21 | Open Source LLMs & VLMs 22 | 23 | Tons of new stuff here; I'll try to be brief, but each one of these releases deserves a deeper dive for sure. 24 | 25 | Alibaba is on 🔥 with 3 new Qwen models this week 26 | 27 | Yes, this is very similar to last week, where they also dropped 3 new SOTA models in a week, but these are additional ones. 28 | 29 | It seems that someone in Alibaba figured out that after splitting away from the hybrid models, they can now release each model separately and get a lot of attention per model! 30 | 31 | Here's the timeline: 32 | 33 | * **Friday (just after our show)**: Qwen3-235B-Thinking-2507 drops (235B total, 22B active, [HF](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507)) 34 | 35 | * **Tuesday**: Qwen3-30B-Thinking-2507 (30B total, 3B active, [HF](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507)) 36 | 37 | * **Today**: Qwen3-Coder-Flash-2507 lands (30B total, 3B active for coding, [HF](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8)) 38 | 39 | Let's start with the SOTA reasoner: the 235B (A22B) 2507 is absolutely the best reasoner among the open source models. 40 | 41 | We've put the model on our inference service (at crazy prices: $0.10/$0.10) and it's performing absolutely incredibly on reasoning tasks.
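Quick aside for the builders: the endpoint speaks the standard OpenAI wire format, so trying the new reasoner takes about five lines. A minimal sketch below; note that the base URL and the exact model ID are my assumptions from our usual conventions, so double-check both against the W&B Inference docs before copying.

```python
# Minimal sketch: calling Qwen3-235B-A22B-Thinking-2507 on W&B Inference.
# ASSUMPTIONS: the base URL and the HF-style model ID below follow W&B
# Inference conventions at the time of writing -- verify both in the docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inference.wandb.ai/v1",  # assumed endpoint
    api_key="<your-wandb-api-key>",
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Thinking-2507",  # assumed model ID
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(resp.choices[0].message.content)
```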
42 | 43 | It also jumped to the top OSS model on Artificial Analysis scores, EQBench, Long Context, and more evals. It's a really, really good reasoning model! 44 | 45 | Smaller Qwens for local use 46 | 47 | Just a week ago, we asked Junyang on our show about smaller models that folks can run on their devices, and he demurred, saying "we're focusing on the larger models". And this week, they delivered not 1 but 2 smaller versions of the bigger models (perfect for speculative decoding, if you can host the larger ones, that is) 48 | 49 | The most interesting one is the Qwen3-Coder-flash, which came out today, with very, very impressive stats - and the ability to run locally at almost 80 tok/s on a MacBook! 50 | 51 | So for the last two weeks, we now have 3 Qwens (Instruct, Thinking, Coder) and 2 sizes for each (all three have a 30B/A3B version now for local use) 👏 52 | 53 | Z.ai GLM and StepFun Step3 54 | 55 | As we've said previously, Chinese companies completely dominate the open source AI field right now, and this week we saw yet another crazy testament to how stark the difference is! 56 | 57 | We've seen a rebranded Zhipu ([Z.ai](http://Z.ai), previously THUDM) release their new GLM 4.5 - which gives Qwen3-thinking a run for its money. Not quite at that level, but definitely very close. I personally didn't love the release aesthetics; showing a blended eval score that nobody can replicate feels a bit off. 58 | 59 | We also talked about how StepFun has stepped in (sorry for the pun) with a new SOTA in multimodality, called [Step3](https://stepfun.ai/research/en/step3). It's a 321B MoE (with a huge 38B active param count) that achieves very significant multimodal scores (the benchmarks look incredible: 74% on MMMU, 64% on MathVision) 60 | 61 | Big Companies APIs & LLMs 62 | 63 | Well, we were definitely thinking we'd get GPT-5 or the open source AI model from OpenAI this week, but alas, the tea-leaf readers were misled (or were being misleading). We 100% know that GPT-5 is coming, as multiple screenshots were blurred and then deleted showing companies already testing it. 64 | 65 | But it looks like August is going to be even hotter than July, with multiple sightings of anonymous testing models on Web Dev arena, like Zenith, Summit, and Lobster, plus a new mystery model on OpenRouter - which some claim are the different thinking modes of GPT-5 and the open source model? 66 | 67 | Zuck shares vision for personalized superintelligence ([Meta](https://meta.com/superintelligence)) 68 | 69 | In a very "Nat Friedman"-like post, Mark Zuckerberg finally shared the vision behind his latest push to assemble the most cracked AI engineers. 70 | 71 | In his vision, Meta is the right place to provide each person with personalized superintelligence, enhancing individual abilities with user agency according to their own values (as opposed to a centralized model, which feels like his shot across the bow at the other frontier labs). 72 | 73 | A few highlights: Zuck leans heavily into the rise of personal devices on top of which humans will interact with this superintelligence, including AR glasses, and signals a departure from the complete "let's open source everything" dogma of the past; now there will be more deliberate consideration of what to open source. 74 | 75 | **This Week's Buzz: Putting Open Source to Work with W&B** 76 | 77 | With all these incredible new models, the biggest question is: how can you actually use them?
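One answer is to just run the small ones locally. The 30B-A3B checkpoints are standard Hugging Face releases, so on a Mac something like mlx-lm is enough to kick the tires. A minimal sketch, with the caveat that the 4-bit repo name is my guess at the usual mlx-community naming, so check Hugging Face for the actual conversion first:

```python
# Local-inference sketch with mlx-lm on Apple Silicon (pip install mlx-lm).
# ASSUMPTION: the 4-bit repo name follows mlx-community naming conventions --
# verify the conversion actually exists on Hugging Face before running.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit")

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

print(generate(model, tokenizer, prompt=prompt, max_tokens=512))
```

And if you'd rather not run them yourself, there's a hosted route too.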
I'm incredibly proud to say that the team at Weights & Biases had all three of the big new Qwen models—Thinking, Instruct, and Coder—live on **W&B Inference** on day one ([link](http://wandb.me/inference?utm_source=thursdai&utm_medium=referral&utm_campaign=jul31)) 78 | 79 | And our pricing is just unbeatable. Wolfram did a benchmark run that would have cost him **$150** using Claude Opus. On W&B Inference with the Qwen3-Thinking model, it cost him **22 cents**. That's not a typo. It's a game-changer for developers and researchers. 80 | 81 | To make it even easier, a listener of the show, Olaf Geibig, posted a [fantastic tutorial](https://x.com/olafgeibig/status/1949779562860056763) on how you can use our free credits and W&B Inference to power tools like Claude Code and VS Code using LiteLLM. It takes less than five minutes to set up and gives you access to state-of-the-art models for pennies. All you need to do is add [this](https://gist.github.com/olafgeibig/7cdaa4c9405e22dba02dc57ce2c7b31f) config to LiteLLM and run Claude Code (or VS Code) through it! 82 | 83 | Give our inference service a try [here](http://wandb.me/inference?utm_source=thursdai&utm_medium=referral&utm_campaign=jul31) and give our main account [@weights_biases](http://x.com/weights_biases) a follow, as we often drop ways to get additional free credits when new models release 84 | 85 | Vision & Video models 86 | 87 | Wan2.2: Open-Source MoE Video Generation Model Launches ([X](https://x.com/Alibaba_Wan/status/1949827662416937443), [HF](https://huggingface.co/Wan-AI)) 88 | 89 | This is likely the best open source video model, and definitely the first MoE video model! It came out with text2video, image2video, and a combined version. 90 | 91 | With 5-second 720p videos that can even be generated at home on a single 4090, this is definitely a step up in the quality of video models that are fully open source. 92 | 93 | Runway changes the game again - Gen-3 Aleph model for AI video editing / transformation ([X](https://x.com/blizaine/status/1950007468324491523), [X](https://x.com/runwayml/status/1950180894477529490)) 94 | 95 | Look, there's simply no denying this: AI video has had an incredible year, from open source like Wan to proprietary models with sound like VEO3. And it's not surprising that we're seeing this trend, but it's definitely very exciting when we see an approach to editing like Runway's. 96 | 97 | This adds a chat to the model, and the ability to edit... anything in the scene. Remove or add people and environmental effects, see the same scene from a different angle, and a lot more! 98 | 99 | Expect personalized entertainment very soon! 100 | 101 | AI Art & Diffusion & 3D 102 | 103 | FLUX.1 Krea [dev] launches as a state-of-the-art open-weights text-to-image model ([X](https://x.com/bfl_ml/status/1950920537741336801), [HuggingFace](https://huggingface.co/black-forest-labs/FLUX.1-Krea-dev)) 104 | 105 | Black Forest Labs teamed with Krea AI for Flux.1 Krea [dev], an open-weights text-to-image model ditching the "AI gloss" for natural, distinctive vibes—think DALL-E 2's quirky grain without the saturation. It outperforms open peers and rivals proprietary models in preference tests, and it's fully Flux-compatible for LoRAs/tools. Yam and I geeked out over the aesthetics frontier; it's a flexible base for fine-tunes, available on Hugging Face with commercial options via FAL/Replicate. If you're tired of cookie-cutter outputs, this breathes fresh life into generations.
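Because it's fully Flux-compatible, the existing diffusers tooling loads it like any other Flux checkpoint. A hedged sketch: the repo ID is taken from the Hugging Face link above, and the guidance/step values are just the usual FLUX.1 dev starting points, not tuned recommendations.

```python
# Sketch: generating with FLUX.1 Krea [dev] via diffusers' FluxPipeline.
# Settings mirror the standard FLUX.1-dev recipe; treat them as starting points.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Krea-dev", torch_dtype=torch.bfloat16
)
pipe.to("cuda")  # or pipe.enable_model_cpu_offload() on smaller GPUs

image = pipe(
    "a film photo of a street market at dusk, natural grain, muted colors",
    guidance_scale=3.5,
    num_inference_steps=28,
).images[0]
image.save("krea_dev_test.png")
```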
106 | 107 | Ideogram Character launches: one-shot character consistency for everyone ([X](https://x.com/ideogram_ai/status/1950255115753095307)) 108 | 109 | Ideogram's Characters feature lets you upload one pic for instant, consistent variants—free for all, with inpainting to swap into memes/art. My tests nailed expressions/scenes (me in cyberpunk? Spot-on), though not always photoreal. Wolfram praised the accuracy; it's a meme-maker's dream! They give you around 10 free generations, so give it a go. 110 | 111 | Tencent Hunyuan3D World Model 1.0 launches as the first open-source, explorable 3D world generator ([X](https://x.com/TencentHunyuan/status/1949288986192834718), [HF](https://huggingface.co/tencent/HunyuanWorld-1)) 112 | 113 | Tencent's Hunyuan3D World Model 1.0 is the first open-source generator of explorable 3D worlds from text/image—360° immersive, with exportable meshes for games/modeling. It needs ~33GB VRAM on complex scenes, but Wolfram called it a step toward the metaverse; I wandered a demo scene, loving the potential despite some rough edges. Integrate it into CG pipelines? Game-changer for VR/creators. 114 | 115 | Voice & Audio 116 | 117 | Look, I didn't even mention this on the show, but it came across my feed just as I was about to wrap up ThursdAI, and it's really something. Riffusion has joined forces with producers: using FUZZ-2, they now have a fully chattable studio producer, and you can ask for... anything you would ask for in a studio! 118 | 119 | Here's my first reaction, and it's really fun. I think they're still open with the invite code 'STUDIO'... I'm not affiliated with them at all! 120 | 121 | Tools 122 | 123 | Ok, I promised some folks we'll add this in: Nisten went super [viral](https://x.com/nisten/status/1950620243258151122) last week using a new open source tool called Crush from CharmBracelet, an open-source agentic coding tool for your terminal, and it looks awesome! 124 | 125 | He gave a demo live on the show, including how to set it up to work with subagents, etc. If you're into vibe coding and using the open source models, definitely give Crush a try; it's really flying, and it looks cool! 126 | 127 | Phew, ok, we somehow were able to cover ALL these releases this week, and we didn’t even have an interview! 128 | 129 | Here’s the TL;DR and links for the folks who subscribed (I’m trying a new thing to promote subs on this newsletter), and see you in two weeks (next week is Wolfram's turn again, as I’m somewhere in Europe!) 130 | 131 | ThursdAI - July 31st, 2025 - TL;DR 132 | 133 | * Hosts and Guests 134 | 135 | * **Alex Volkov** - AI Evangelist & Weights & Biases ([@altryne](https://x.com/altryne)) 136 | 137 | * Co Hosts - [@WolframRvnwlf](https://x.com/WolframRvnwlf) [@yampeleg](https://x.com/yampeleg) [@nisten](http://x.com/nisten) [@ldj](https://x.com/ldjconfirmed) 138 | 139 | * Open Source LLMs 140 | 141 | * Zhipu drops GLM-4.5 355B (A32B) AI model ([X](https://x.com/Zai_org/status/1949831552189518044), [HF](https://huggingface.co/zai-org/GLM-4.5)) 142 | 143 | * ARCEE AFM‑4.5B and AFM‑4.5B‑Base weights released ([X](https://x.com/LucasAtkins7/status/1950278100874645621), [HF](https://huggingface.co/arcee-ai/AFM-4.5B)) 144 | 145 | * Qwen is on 🔥 - 3 new models: 146 | 147 | Thank you for subscribing.
[Leave a comment](https://sub.thursdai.news/p/thursdai-jul-31-2025-qwens-small/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-jul-31-2025-qwens-small?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE2OTc4OTI5NywiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.LRrVKpfASEqA84HAfcEe1oAqMSwECqz4850fTYvAzGw&utm_campaign=CTA_5). 148 | -------------------------------------------------------------------------------- /Q1_2025_AI_Recap.md: -------------------------------------------------------------------------------- 1 | # ThursdAI Q1 2025 - AI Yearly Recap 2 | ## The Quarter That Changed Everything 3 | 4 | *Based on 13 ThursdAI episodes from January 2 - March 27, 2025* 5 | 6 | --- 7 | 8 | ![Q1 2025 ThursdAI Infographic](https://pub-7837090e9353474292fc8c7114c5fa9d.r2.dev/thursdai_infographics/Q1%202025%20ThursdAI%20Q1%20infographic.jpg) 9 | 10 | --- 11 | 12 | ## 🔥 Quarter Overview 13 | 14 | Q1 2025 will be remembered as the quarter when **reasoning models went mainstream**, **open source AI exploded** (largely from Chinese labs), and **AI agents became practical**. DeepSeek R1 crashed the stock market, GPT-4o gave everyone the power to Ghibli-fy everything, and Gemini 2.5 Pro reclaimed the #1 spot on LLM benchmarks. 15 | 16 | --- 17 | 18 | ## 📅 January 2025 - "The Year of AI Agents Begins" 19 | 20 | ### 🎯 Top Stories 21 | 22 | #### 🐋 **DeepSeek R1 - The Shot Heard Around the World** (Jan 23) 23 | The most impactful open source release ever. DeepSeek dropped R1, their MIT-licensed reasoning model that: 24 | - **Crashed NVIDIA stock** by 17% ($560B loss - largest single-company monetary loss ever) 25 | - Hit **#1 on the iOS App Store** 26 | - Cost allegedly only **$5.5M to train** (sparking massive debate) 27 | - Matched OpenAI's o1 on reasoning benchmarks at **50x cheaper pricing** 28 | - Released 6 distilled versions (1.5B to 70B parameters) 29 | - **The 1.5B model beat GPT-4o and Claude 3.5 Sonnet on math benchmarks** 🤯 30 | 31 | > "My mom knows about DeepSeek—your grandma probably knows about it, too" - Alex Volkov 32 | 33 | #### 🤖 **OpenAI Operator** - First Agentic ChatGPT (Jan 23) 34 | OpenAI launched Operator, an agentic browser controller for ChatGPT Pro users. Built on the CUA (Computer Using Agent) model, it can: 35 | - Book reservations on OpenTable 36 | - Order groceries on Instacart 37 | - Browse the web autonomously 38 | - Still has reliability issues with captchas and logins 39 | 40 | #### 🌟 **Project Stargate** - $500B AI Infrastructure (Jan 23) 41 | OpenAI, SoftBank, and Oracle announced a $500B AI infrastructure project - essentially a "Manhattan Project for AI." 2% of US GDP committed to data centers and power plants.
42 | 43 | #### 💫 **NVIDIA CES Announcements** (Jan 9) 44 | - **Project Digits**: $3,000 desktop supercomputer that can run 200B parameter models 45 | - **COSMOS**: World foundation models for robot training 46 | - Jensen Huang declared we're at the "3rd scaling law" - test-time compute/reasoning 47 | 48 | #### 🎵 Open Source Breakthroughs 49 | | Release | Significance | 50 | |---------|-------------| 51 | | **Kokoro TTS** (82M params) | #1 on TTS Arena, Apache 2, runs in browser | 52 | | **MiniMax-01** (456B/45B active) | 4M context window from Hailuo | 53 | | **MiniCPM-o 2.6** | 8B omni-model: video streaming + audio on an iPad | 54 | | **Phi-4** | Microsoft's 14B model, MIT licensed, 40% synthetic data | 55 | | **ByteDance LatentSync** | SOTA lip-syncing model | 56 | | **ByteDance UI-TARS** | PC control models (2B/7B/72B) | 57 | 58 | #### 🔬 Other Major January Releases 59 | - **OpenAI o3-mini** - Reasoning model at 67% cheaper than o1 60 | - **Gemini Flash Thinking** - 1M token context with thinking traces 61 | - **Qwen 2.5 VL** - SOTA open source vision model 62 | - **Hunyuan 3D 2.0** - SOTA open source 3D generation 63 | - **YuE 7B** - Open source music generation (Apache 2) 64 | - **Humanity's Last Exam (HLE)** - New benchmark where top models score <10% 65 | 66 | --- 67 | 68 | ## 📅 February 2025 - "Reasoning Mania & Agent Awakening" 69 | 70 | ### 🎯 Top Stories 71 | 72 | #### 🔮 **OpenAI Deep Research** - A ChatGPT Moment (Feb 6) 73 | OpenAI released Deep Research, an agentic research tool powered by o3: 74 | - Performs multi-trajectory web searches 75 | - Can reason, backtrack, and synthesize across 100+ sources 76 | - Scores **26.6% on Humanity's Last Exam** (vs 10% for o1/R1) 77 | - Dr. Derya Unutmaz: "It wrote a phenomenal 25-page patent application that would've cost $10,000+" 78 | - Available for ChatGPT Pro ($200/mo) only 79 | 80 | #### 🧠 **Claude 3.7 Sonnet & Claude Code** - The Coding Beast (Feb 24-27) 81 | Anthropic dropped Claude 3.7 Sonnet alongside **Claude Code**, their AI coding assistant: 82 | - **70% on SWE-Bench** (coding benchmark) 83 | - 8x more output (64K tokens) 84 | - Integrated thinking/reasoning 85 | - #1 on WebDev Arena 86 | - First "hybrid" reasoning + chat model 87 | - **Claude Code** launched Feb 24 as Anthropic's agentic coding tool, later enhanced with Claude Sonnet 4.5 (Sep) and Claude Opus 4.5 (Nov) 88 | 89 | #### 🌐 **GPT-4.5 (Orion)** - The Largest Model (Feb 27) 90 | OpenAI shipped their largest model ever (rumored 10+ trillion parameters): 91 | - 62.5% on SimpleQA, 71.4% on GPQA 92 | - "Vibes" focused - better at creative writing, recommendations 93 | - Foundation for future reasoning models 94 | - 10x more expensive than GPT-4o 95 | 96 | #### 🎮 **Grok 3** - xAI Enters the Arena (Feb 20) 97 | xAI launched Grok 3: 98 | - Claims SOTA on multiple benchmarks 99 | - 1M token context window 100 | - Deep Search feature (competitor to Deep Research) 101 | - Free for now "until GPUs melt" 102 | - Trained on 100,000 GPUs 103 | 104 | #### 📋 **OpenAI Roadmap Revelation** (Feb 13) 105 | Sam Altman announced: 106 | - **GPT-4.5 will be the last non-chain-of-thought model** 107 | - **GPT-5 will unify GPT + o-series** into one system 108 | - **No standalone o3 release** - integrated into GPT-5 109 | - Goal: eliminate model picker entirely 110 | 111 | #### 🔧 February Open Source Highlights 112 | | Release | Significance | 113 | |---------|-------------| 114 | | **DeepSeek V3 open source tools** | FlashMLA, DeepEP, DualPipe released | 115 | | **Phi-4-multimodal** 
(5.6B) | Text, images, AND audio - beats Whisper v3 | 116 | | **Mercury Coder** | Diffusion LLM - 1000+ tokens/sec | 117 | | **Nomic Embed Text V2** | First MoE embedding model | 118 | | **DeepScaler 1.5B** | Beats o1-preview on math for $4,500 training | 119 | | **Perplexity R1 1776** | Censorship-free DeepSeek R1 finetune | 120 | 121 | #### 🎬 Vision & Video 122 | - **ByteDance OmniHuman-1** - Reality-bending avatar generation 123 | - **Alibaba WanX** - SOTA open source video generation 124 | - **StepFun Step-Video-T2V** - 30B text-to-video, MIT licensed 125 | 126 | #### 🎤 Voice & Audio 127 | - **11Labs Scribe** - Beats Whisper v3 on ASR 128 | - **Sesame Conversational AI** - Most human-like voice interactions 129 | - **HUME Octave** - Emotional TTS understanding 130 | - **Zonos** - Expressive TTS with voice cloning 131 | 132 | #### 💡 **"Vibe Coding"** - Karpathy Coins a New Era (Feb 2) 133 | Andrej Karpathy tweeted the term **"Vibe Coding"** on February 2, 2025 (5.2M views), capturing the new paradigm of AI-assisted development where developers describe *what* they want and let AI handle the implementation. The term went viral and became shorthand for the shift from traditional coding to conversational, agent-driven software development. Windsurf, Cursor, and other AI IDEs embraced the concept. 134 | 135 | --- 136 | 137 | ## 📅 March 2025 - "Google's Revenge & The Image Revolution" 138 | 139 | ### 🎯 Top Stories 140 | 141 | #### 👑 **Gemini 2.5 Pro Takes #1** (Mar 27) 142 | Google reclaimed the LLM crown with Gemini 2.5 Pro: 143 | - Tops benchmarks in reasoning, math, coding, and science 144 | - **AIME jumped nearly 20 points** 145 | - 1M token context window 146 | - "Thinking" integrated into core model (not separate mode) 147 | - Low latency despite power (~13 sec vs 45+ for others) 148 | - Tulsee Doshi from Google joined ThursdAI to discuss 149 | 150 | #### 🎨 **GPT-4o Native Image Generation** - Ghibli-mania (Mar 27) 151 | OpenAI enabled native image gen in GPT-4o: 152 | - **Auto-regressive** (not diffusion) - incredible prompt adherence 153 | - Perfect text rendering in images 154 | - Internet immediately Ghibli-fied everything 155 | - People recreating movie trailers (LOTR) purely through prompts 156 | - OpenAI shifted policy toward more creative freedom 157 | 158 | > "The internet lost its collective mind and turned everything into Studio Ghibli" - Alex Volkov 159 | 160 | #### 🔌 **MCP Won** - OpenAI Adopts Anthropic's Protocol (Mar 27) 161 | OpenAI officially adopted Model Context Protocol (MCP): 162 | - Prevents fragmentation (no VHS vs Betamax situation) 163 | - Tools work across Claude AND GPT 164 | - "MCP WON!" - biggest win for agent ecosystem interoperability 165 | 166 | #### 🐋 **DeepSeek V3 Update** (Mar 27) 167 | DeepSeek dropped a 685B parameter beast: 168 | - **AIME: 39.6 → 59.4 (+19.8 points)** 169 | - GPQA: 59.1 → 68.4 170 | - MIT Licensed 171 | - Better front-end development and tool use 172 | - Best non-reasoning open model 173 | 174 | #### 🔊 **OpenAI Voice Revolution** (Mar 20) 175 | OpenAI launched next-gen audio models: 176 | - **GPT-4o Transcribe** - Promptable ASR (!) 177 | - **GPT-4o mini TTS** - Can prompt for emotions 178 | - **Semantic VAD** - Understands when you're finished speaking 179 | - [openai.fm](http://openai.fm) testing interface 180 | 181 | #### 📷 **Google Gemini Native Image Gen** (Mar 13) 182 | Gemini Flash got native image generation: 183 | - Direct image editing through conversation 184 | - Interactive image/text creation 185 | - Future of creative tools 186 | 187 | #### 🆓 **ThursdAI Turns 2!** (Mar 13) 188 | Two years since the first episode about GPT-4! 189 | 190 | #### 🌐 **Google AI Mode** in Search (Mar 6) 191 | Google launched AI Mode in Search: 192 | - Gemini 2.0 powered 193 | - "Fan-out queries" - Google searching within Google 194 | - Real-time data with conversational interface 195 | 196 | #### 🧪 March Open Source Highlights 197 | | Release | Significance | 198 | |---------|-------------| 199 | | **Gemma 3** (1B-27B) | 128K context, multimodal, 140+ languages, single GPU | 200 | | **QwQ-32B** | Qwen's reasoning model - matches R1 on some evals, runs on Mac | 201 | | **Mistral Small 3.1** | 24B, beats Gemma 3, multimodal, Apache 2 | 202 | | **Qwen2.5-Omni-7B** | End-to-end multimodal: text, image, audio, video → text + speech | 203 | | **OLMo 2 32B** | Allen AI's fully open model - beats GPT-4o mini | 204 | | **Reka Flash 3** | 21B reasoner, Apache 2, trained with RLOO | 205 | | **Cohere Command A** (111B) | 256K context, only 2 GPUs needed | 206 | | **NVIDIA Nemotron** (8B/49B) | Reasoning toggle via system prompt | 207 | 208 | #### 🎨 Image/Art Releases 209 | - **Reve Image 1.0** - Claims SOTA, ~1¢ per image 210 | - **Ideogram 3.0** - Strong text/logos, style refs 211 | - **Hunyuan 3D 2.0 MV/Turbo** - Near real-time 3D (<1 sec on H100) 212 | 213 | #### 👁️ Vision 214 | - **Roboflow RF-DETR** - SOTA object detection, Apache 2 215 | - **RF100-VL** - New VLM benchmark (current models get ~6%) 216 | 217 | #### 🔧 MCP & Tools 218 | - **W&B Weave MCP Server** - Chat with your evaluations 219 | - **MLX-Audio v0.0.3** - TTS on Apple Silicon 220 | 221 | --- 222 | 223 | ## 📊 Quarter Summary: Major Themes 224 | 225 | ### 1. 🧠 **Reasoning Models Go Mainstream** 226 | - DeepSeek R1 demonstrated reasoning doesn't need massive scale 227 | - OpenAI committed to unifying reasoning with base models 228 | - Small models (1.5B) can beat GPT-4o on math with RL 229 | 230 | ### 2. 🇨🇳 **Chinese Labs Dominate Open Source** 231 | - DeepSeek, Alibaba (Qwen), MiniMax, ByteDance 232 | - Most open weights now come from China 233 | - Despite chip export restrictions 234 | 235 | ### 3. 🤖 **2025 Is The Year of Agents** 236 | - OpenAI Operator launched 237 | - MCP protocol won standardization battle 238 | - CrewAI, Open Hands, browser-use proliferating 239 | - Every major lab investing in agents 240 | 241 | ### 4. 🖼️ **Image Generation Revolution** 242 | - GPT-4o native image gen (auto-regressive, perfect text) 243 | - Gemini native image gen 244 | - Ghibli-mania swept the internet 245 | 246 | ### 5. 💰 **Massive Infrastructure Investment** 247 | - Project Stargate: $500B 248 | - NVIDIA Project Digits: $3K supercomputer at home 249 | 250 | ### 6. 📈 **Benchmark Saturation** 251 | - MMLU and Math getting saturated 252 | - New benchmarks: Humanity's Last Exam, ARC-AGI 2, RF100-VL 253 | - HLE: top models score <10% 254 | - ARC-AGI 2: thinking models only 4% 255 | 256 | --- 257 | 258 | ## 🏆 Q1 2025: Biggest Releases by Month 259 | 260 | ### January 261 | 1. **DeepSeek R1** - Open source reasoning revolution 262 | 2.
**Project Stargate** - $500B AI infrastructure 263 | 3. **OpenAI Operator** - Agentic ChatGPT 264 | 4. **Kokoro TTS** - 82M param SOTA TTS 265 | 5. **MiniMax-01** - 4M context window 266 | 267 | ### February 268 | 1. **OpenAI Deep Research** - PhD-level research agent 269 | 2. **Claude 3.7 Sonnet & Claude Code** - 70% SWE-Bench + Anthropic's coding assistant 270 | 3. **GPT-4.5 (Orion)** - Largest model ever 271 | 4. **Grok 3** - xAI's contender 272 | 5. **Karpathy's "Vibe Coding"** - Feb 2 tweet coined the AI coding paradigm (5.2M views) 273 | 6. **OpenAI Roadmap** - GPT-5 will unify everything 274 | 275 | ### March 276 | 1. **Gemini 2.5 Pro** - #1 LLM again 277 | 2. **GPT-4o Native Image Gen** - Ghibli-mania 278 | 3. **OpenAI adopts MCP** - Protocol standardization 279 | 4. **DeepSeek V3 685B** - Open source giant 280 | 5. **Gemma 3** - Best open source multimodal 281 | 282 | --- 283 | 284 | *"Open source AI has never been as hot as this quarter. We're accelerating as f*ck, and it's only just beginning—hold on to your butts."* - Alex Volkov, ThursdAI 285 | 286 | --- 287 | 288 | *Generated from ThursdAI newsletter content. For full coverage, visit [thursdai.news](https://thursdai.news)* 289 | -------------------------------------------------------------------------------- /agents.md: -------------------------------------------------------------------------------- 1 | # ThursdAI Infographic Prompt Creator Agent 2 | 3 | ## 🎯 Purpose & Role 4 | 5 | You are an expert infographic prompt creator for **Nano Banana Pro**, specializing in creating visually stunning, information-dense infographic prompts for the **ThursdAI podcast** — a weekly AI news show hosted by Alex Volkov (@altryne). 6 | 7 | Your task is to transform raw podcast notes, newsletter writeups, or bullet-point summaries into highly detailed, creative prompts that generate beautiful infographics for social media (YouTube, X/Twitter, LinkedIn). 8 | 9 | --- 10 | 11 | ## 📚 Reference Materials 12 | 13 | Before creating any prompt, review these resources in this folder: 14 | 15 | | File | Purpose | 16 | |------|---------| 17 | | `Prompting guide.md` | Official Nano Banana Pro prompting strategies & best practices | 18 | | `ThursdAI Thanksgiving Infographic prompt.md` | Example: 16:9 horizontal format with bands/sections | 19 | | `Open revol infographic prompt.md` | Example: 9:16 vertical "war room" style with split narrative | 20 | | `Another infographic prompt.md` | Example: Bloomberg Terminal / data dashboard aesthetic | 21 | 22 | **Key Principle:** Use these as STYLE and STRUCTURE references only. Never copy the actual news/topics from them — always extract fresh content from the user's input. 23 | 24 | --- 25 | 26 | ## 🔧 How to Use This Agent 27 | 28 | ### Input You'll Receive 29 | The user (Alex Volkov) will provide one of: 30 | - Raw podcast show notes with bullet points 31 | - ThursdAI newsletter writeup 32 | - Voice transcription notes 33 | - A combination of the above 34 | 35 | ### Your Output 36 | A complete, ready-to-use Nano Banana Pro prompt that: 37 | 1. Extracts the most important and newsworthy topics 38 | 2. Organizes them into logical sections 39 | 3. Creates a visually compelling infographic design 40 | 4. 
Includes all necessary visual direction and style cues 41 | 42 | --- 43 | 44 | ## 📋 Information Extraction Process 45 | 46 | ### Step 1: Identify the Episode Date & Title 47 | - Look for dates in the notes (e.g., "Dec 11, 2025" or "this Thursday") 48 | - Create a compelling 1-line episode title that captures the main narrative 49 | - Good titles use tension, contrast, or drama: "Code Red vs. Open Revolt", "The Open Source Surge", "AI's Christmas Chaos" 50 | 51 | ### Step 2: Categorize Topics into ThursdAI Segments 52 | 53 | ThursdAI typically covers these segments — identify which apply: 54 | 55 | | Segment | What to Look For | 56 | |---------|------------------| 57 | | 🔓 **Open Source AI** | New open-weight models, HuggingFace releases, Apache/MIT licensed models | 58 | | 🏢 **Big Companies & APIs** | OpenAI, Google, Anthropic, Amazon, Microsoft announcements | 59 | | 🎬 **Vision & Video** | Video generation models, image models, multimodal updates | 60 | | 🔊 **Voice & Audio** | TTS, STT, voice cloning, audio generation | 61 | | 🤖 **Agents & Tools** | Agent frameworks, MCP, computer use, tool calling | 62 | | 🚨 **Breaking News** | Time-sensitive announcements that dropped during/before the show | 63 | | 💬 **Notable Quotes** | Memorable statements from guests or hosts | 64 | | 🎤 **Interview Spotlight** | Featured guest and their key topics | 65 | 66 | ### Step 3: Prioritize by Impact 67 | Rank topics by: 68 | 1. **Breaking/exclusive news** (highest priority) 69 | 2. **Major model releases** (especially open source) 70 | 3. **Benchmark-breaking performance** 71 | 4. **Industry drama or strategic shifts** 72 | 5. **Interesting tangents the hosts went on** 73 | 74 | --- 75 | 76 | ## 🎨 Visual Design Principles 77 | 78 | ### Format Options 79 | - **16:9 Horizontal** — Best for YouTube thumbnails, stream overlays, Twitter cards 80 | - **9:16 Vertical** — Best for Instagram Stories, TikTok, mobile-first viewing 81 | 82 | ### Color Palette Patterns 83 | 84 | ``` 85 | Base: Deep navy/charcoal/obsidian (#0f172a, #1e293b) 86 | 87 | Accent by Category: 88 | ├─ Open Source: Electric teal (#06b6d4), Emerald (#10b981), Neon green 89 | ├─ Big Labs: Amber (#f59e0b), Coral (#f97316), Warm orange 90 | ├─ Video/Image: Violet (#8b5cf6), Magenta (#ec4899) 91 | ├─ Breaking News: Hot red, Warning amber 92 | └─ Neutral/Tools: Silver, Cool gray, White 93 | ``` 94 | 95 | ### Typography Direction 96 | - **Headers:** Bold, modern sans-serif (suggest: Inter, DM Sans, Satoshi, Space Grotesk) 97 | - **Stats/Numbers:** Monospace for tabular data (suggest: JetBrains Mono, IBM Plex Mono) 98 | - **Body text:** Clean, highly legible at small sizes 99 | 100 | ### Layout Patterns That Work 101 | 102 | **Pattern 1: Split Narrative (for contrasting stories)** 103 | ``` 104 | ┌─────────────────────────────────────────┐ 105 | │ HEADER + HOST │ 106 | ├───────────────────┬─────────────────────┤ 107 | │ SIDE A │ SIDE B │ 108 | │ (warm colors) │ (cool colors) │ 109 | │ Big Labs │ Open Source │ 110 | ├───────────────────┴─────────────────────┤ 111 | │ VIDEO & IMAGE STRIP │ 112 | ├─────────────────────────────────────────┤ 113 | │ CTA FOOTER │ 114 | └─────────────────────────────────────────┘ 115 | ``` 116 | 117 | **Pattern 2: Horizontal Bands (for multiple segments)** 118 | ``` 119 | ┌─────────────────────────────────────────┐ 120 | │ HEADER + HOST │ 121 | ├─────────────────────────────────────────┤ 122 | │ HEADLINERS (biggest stories) │ 123 | ├─────────────────────────────────────────┤ 124 | │ OPEN SOURCE & VIDEO (medium) │ 125 | 
├─────────────────────────────────────────┤ 126 | │ TOOLS & ART (smaller cards) │ 127 | ├─────────────────────────────────────────┤ 128 | │ CTA FOOTER │ 129 | └─────────────────────────────────────────┘ 130 | ``` 131 | 132 | **Pattern 3: Dashboard/Terminal (for data-heavy episodes)** 133 | ``` 134 | ┌─────────────────────────────────────────┐ 135 | │ TICKER HEADER with scrolling stats │ 136 | ├─────┬─────┬─────┬─────┬─────┬───────────┤ 137 | │CARD │CARD │CARD │CARD │CARD │ FEATURE │ 138 | │ │ │ │ │ │ PANEL │ 139 | ├─────┴─────┴─────┴─────┴─────┴───────────┤ 140 | │ INTERVIEW SPOTLIGHT │ 141 | ├─────────────────────────────────────────┤ 142 | │ CTA FOOTER │ 143 | └─────────────────────────────────────────┘ 144 | ``` 145 | 146 | --- 147 | 148 | ## 👤 Host Avatar Requirements 149 | 150 | **CRITICAL:** Every infographic must include Alex Volkov (the host). 151 | 152 | The user will provide a reference image. In your prompt, include: 153 | 154 | ``` 155 | - Use the reference image for Alex Volkov's face, beard, and hairstyle 156 | - Style: Clean vector cartoon / stylized portrait (not photorealistic) 157 | - Attire: Dark hoodie with headphones around neck OR lapel mic 158 | - Expression: Energetic, engaged, presenting/gesturing toward the content 159 | - Positioning: Header area, left or right side, integrated into the design 160 | - Add a subtle thematic element near Alex matching the episode's vibe 161 | ``` 162 | 163 | ### Pose Suggestions by Episode Type 164 | - **Breaking news:** Alex looking surprised or urgent 165 | - **Major release:** Alex pointing at the headline 166 | - **Interview episode:** Alex with a mic icon, welcoming gesture 167 | - **Holiday/special:** Add seasonal motifs around Alex 168 | 169 | --- 170 | 171 | ## 🏷️ Logo & Brand Usage 172 | 173 | **Instruct the model to use recognizable company/project logos and icons:** 174 | 175 | ### AI Lab Logos to Reference 176 | - OpenAI (stylized "O" or hexagon) 177 | - Anthropic (abstract A / orange-brown tones) 178 | - Google/DeepMind (Google colors, Gemini sparkle) 179 | - Meta (infinity symbol) 180 | - Mistral (wind/breeze motif) 181 | - DeepSeek (crystal/gem prism) 182 | - Amazon/AWS (smile arrow, orange) 183 | 184 | ### Visual Proxies for Concepts 185 | - Open source → Unlocked padlock, open book, branching nodes 186 | - Closed/proprietary → Locked vault, corporate towers 187 | - Speed → Lightning bolt, stopwatch 188 | - Scale → Stacked layers, mountain peaks 189 | - Agents → Robot with tools, terminal windows 190 | - Video → Film strip, play button, motion lines 191 | - Audio → Waveforms, microphone, speaker 192 | - Benchmarks → Medals, trophies, leaderboard bars 193 | 194 | **Note:** Tell the model "Use abstract/stylized icons — no exact trademarked logos" 195 | 196 | --- 197 | 198 | ## 📝 Prompt Template Structure 199 | 200 | Use this structure when writing prompts: 201 | 202 | ```markdown 203 | # EPISODE TITLE & METADATA 204 | - Full title with date 205 | - Subtitle/tagline 206 | - Host attribution 207 | 208 | # OVERALL VIBE & STYLE 209 | - Describe the aesthetic metaphor (e.g., "Bloomberg Terminal meets movie poster") 210 | - Specify what to AVOID (previous styles, clichés) 211 | - Art style direction (vector, flat, gradients, etc.) 
212 | 213 | # COLOR PALETTE 214 | - Base colors with hex codes 215 | - Accent colors by category 216 | - How colors should separate sections 217 | 218 | # HEADER SECTION 219 | - Title treatment 220 | - Host avatar placement and styling 221 | - Any thematic motifs 222 | 223 | # MAIN CONTENT SECTIONS 224 | For each section/panel: 225 | - Section title 226 | - Visual icon description 227 | - Key information to display 228 | - Formatting hints (bullet structure, stats, quotes) 229 | 230 | # SECONDARY ELEMENTS 231 | - Ticker bars 232 | - Interview spotlights 233 | - Supporting cards 234 | 235 | # FOOTER & CTA 236 | - Branding elements 237 | - Call to action text 238 | - Social handles and links 239 | 240 | # TECHNICAL NOTES 241 | - Resolution (typically 4K) 242 | - Legibility priorities 243 | - Things to avoid 244 | ``` 245 | 246 | --- 247 | 248 | ## ✅ Prompt Quality Checklist 249 | 250 | Before delivering the prompt, verify: 251 | 252 | - [ ] **Date is included** in the title/header 253 | - [ ] **Episode title** is catchy and captures the main narrative 254 | - [ ] **Alex Volkov** is referenced with clear styling instructions 255 | - [ ] **All major topics** from the notes are represented 256 | - [ ] **Visual hierarchy** is defined (what's biggest to smallest) 257 | - [ ] **Color palette** has specific hex codes 258 | - [ ] **Icons/logos** are described for each topic 259 | - [ ] **Style direction** is clear and specific 260 | - [ ] **Aspect ratio** is specified (16:9 or 9:16) 261 | - [ ] **CTA/footer** includes @altryne, thursdai.news, YouTube/X mentions 262 | - [ ] **No vague language** — every element has concrete description 263 | - [ ] **Natural language** used throughout (not keyword soup) 264 | 265 | --- 266 | 267 | ## 🚫 Common Mistakes to Avoid 268 | 269 | 1. **Tag soup prompts** — ❌ "AI, podcast, neon, 4k, trending" 270 | 2. **Vague descriptions** — ❌ "Make it look cool and techy" 271 | 3. **Missing hierarchy** — Every element should have a size/importance level 272 | 4. **Forgetting the host** — Alex must be in every infographic 273 | 5. **Ignoring the date** — Always include episode date prominently 274 | 6. **Copying old content** — Extract FRESH topics from user's notes 275 | 7. **Too much text** — Infographics should be visual-first, text should be concise 276 | 8. **Generic backgrounds** — Specify unique background treatments per episode 277 | 9. **Missing context** — Tell the model WHO this is for (tech-savvy AI audience) 278 | 279 | --- 280 | 281 | ## 💡 Style Variations to Explore 282 | 283 | Rotate through different aesthetics to keep infographics fresh: 284 | 285 | | Style | When to Use | 286 | |-------|-------------| 287 | | **Bloomberg Terminal** | Data-heavy episodes, benchmark comparisons | 288 | | **Movie Poster** | Dramatic narrative episodes ("AI wars") | 289 | | **Tech Conference** | Product launches, keynote recaps | 290 | | **Magazine Cover** | Interview-focused episodes | 291 | | **War Room / Command Center** | Breaking news, competitive dynamics | 292 | | **Seasonal/Holiday** | Special episodes (Thanksgiving, New Year) | 293 | | **Retro-Tech** | Throwback vibes, scanlines, halftone | 294 | | **Minimalist Dashboard** | Clean, modern, Apple-style | 295 | 296 | --- 297 | 298 | ## 🔄 Iterative Refinement 299 | 300 | Nano Banana Pro excels at conversational editing. 
After generating an initial image: 302 | 303 | - "Make the DeepSeek panel larger and add more benchmark numbers" 304 | 305 | - "Change the color of the Open Source section to more vibrant teal" 306 | 307 | - "Add snow effects for the December episode" 308 | 309 | - "Make Alex's pose more excited, like he's presenting breaking news" 310 | 311 | Include in your prompt: "This design should support iterative refinement — the model should be able to adjust individual sections on follow-up requests." 312 | 313 | --- 314 | 315 | ## 📢 Branding Elements (Always Include) 316 | 317 | ``` 318 | ThursdAI Branding: 319 | - Show name: "ThursdAI" or "ThursdAI Weekly" 320 | - Host: Alex Volkov (@altryne) 321 | - Website: thursdai.news 322 | - Platforms: "Live on YouTube & X" 323 | - Tagline options: 324 | • "Weekly AI Intelligence Report" 325 | • "Your Weekly AI Deep Dive" 326 | • "AI Engineer Podcast" 327 | 328 | Co-hosts (if applicable): @WolframRvnwlf @yampeleg @nisten @ldjconfirmed 329 | ``` 330 | 331 | --- 332 | 333 | ## 🎬 Ready to Create 334 | 335 | When the user provides podcast notes, follow this workflow: 336 | 337 | 1. **Scan** for date, major announcements, breaking news 338 | 2. **Categorize** topics into ThursdAI segments 339 | 3. **Rank** by importance and visual impact 340 | 4. **Choose** appropriate layout pattern and style 341 | 5. **Write** detailed Nano Banana Pro prompt 342 | 6. **Verify** against quality checklist 343 | 7. **Deliver** the complete prompt, ready for generation 344 | 345 | --- 346 | 347 | *This agent was designed for maximum infographic quality and consistency. For best results, always provide a reference image of Alex Volkov and as much detail from the podcast notes as possible.* 348 | 349 | 350 | 351 | 352 | -------------------------------------------------------------------------------- /2025_episodes/Q3 2025/July 2025/_ThursdAI_-_July_24_2025_-_Qwen-mas_in_July_The_White_Houses_AI_Action_Plan_Math_Olympiad_Gold_for_A.md: -------------------------------------------------------------------------------- 1 | # 📆 ThursdAI - July 24, 2025 - Qwen-mas in July, The White House's AI Action Plan & Math Olympiad Gold for AIs + coding a 3d tetris on stream 2 | 3 | **Date:** July 24, 2025 4 | **Duration:** 1:43:23 5 | **Link:** [https://sub.thursdai.news/p/thursdai-july-24-2025-qwen-mas-in](https://sub.thursdai.news/p/thursdai-july-24-2025-qwen-mas-in) 6 | 7 | --- 8 | 9 | ## Description 10 | 11 | What a WEEK! Qwen-mas in July. Folks, AI doesn't seem to want to slow down, especially Open Source! This week we see yet another jump on SWE-bench verified (3rd week in a row?), this time from our friends at Alibaba Qwen. 12 | 13 | It was a pleasure of mine to host Junyang Lin from the team at Alibaba, who came to chat with us about their incredible release, with not 1 but 3 new models! 14 | 15 | Then, we had a great chat with Joseph Nelson from Roboflow, who not only dropped additional SOTA models, but was also in Washington at the announcement of the new AI Action Plan from the White House. 16 | 17 | Great conversations this week; as always, TL;DR at the end, tune in! 18 | 19 | Open Source AI - Qwen-mas in July 20 | 21 | This week, the open-source world belonged to our friends at Alibaba Qwen. They didn't just release one model; they went on an absolute tear, dropping bomb after bomb on the community and resetting the state-of-the-art multiple times.
22 | 23 | **A "Small" Update with Massive Impact: Qwen3-235B-A22B-Instruct-2507** 24 | 25 | Alibaba called this a *minor* refresh of their 235B parameter mixture-of-experts. 26 | 27 | Sure—if you consider +13 points on GPQA and a 256K context window minor. The 2507 drops hybrid thinking. Instead, Qwen now ships separate instruct and chain-of-thought models, avoiding token bloat when you just want a quick answer. Benchmarks? 81% MMLU-Redux, 70% LiveCodeBench, new SOTA on BFCL function-calling. All with 22B active params. 28 | 29 | Our friend of the pod, and head of development at Alibaba Qwen, Junyang Lin, joined the pod and talked to us about their decision to uncouple this model from the hybrid reasoner Qwen3. 30 | 31 | "After talking with the community and thinking it through," he said, "we decided to stop using hybrid thinking mode. Instead, we'll train instruct and thinking models separately so we can get the best quality possible." 32 | 33 | The community felt the hybrid model sometimes had conflicts and didn't always perform at its best. So, Qwen delivered a pure non-reasoning instruct model, and the results are staggering. Even without explicit reasoning, it's crushing benchmarks. Wolfram tested it on his MMLU-Pro benchmark and it got the top score of all open-weights models he's ever tested. Nisten saw the same thing on medical benchmarks, where it scored the highest on MedMCQA. This thing is a beast, getting a massive 77.5 on GPQA (up from 62.9) and 51.8 on LiveCodeBench (up from 32). This is a huge leap forward, and it proves that a powerful, well-trained instruct model can still push the boundaries of reasoning. 34 | 35 | **The New (open) King of Code: Qwen3-Coder-480B** ([X](https://x.com/Alibaba_Qwen/status/1947766835023335516), [Try It](https://wandb.me/qcoder-colab), [HF](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct)) 36 | 37 | Just as we were catching our breath, they dropped the main event: **Qwen3-Coder**. This is a 480-billion-parameter coding-specific behemoth (35B active) trained on a staggering 7.5 trillion tokens, with a 70% code ratio, that gets a new SOTA on SWE-bench verified with 69.6% (just a week after Kimi got SOTA with 65% and 2 weeks after Devstral's SOTA of 53% 😮) 38 | 39 | To get this model to SOTA, Junyang explained they used reinforcement learning with over 20,000 parallel sandbox environments. This allows the model to interact with the environment, write code, see the output, get the reward, and learn from it in a continuous loop. The results speak for themselves. 40 | 41 | With long-context abilities (256K natively, extendable up to 1M with YaRN), this coding beast tops the charts, achieving Sonnet-level performance for significantly less cost! 42 | 43 | Both models supported day-1 on W&B Inference ([X](https://x.com/weights_biases/status/1947859654400434538), [Get Started](https://wandb.me/qcoder-colab)) 44 | 45 | I'm very, very proud to announce that both these incredible models get Day-1 support on our W&B inference (and that yours truly is now part of the decision of which models we host!) 46 | 47 | With unbeatable prices ($0.10/$0.10 input/output 1M for A22B, $1/$1.5 for Qwen3 Coder) and speed, we are hosting these models at full precision to give you the maximum possible intelligence and the best bang for your buck! 48 | 49 | Nisten set up our (OpenAI compatible) endpoint with his Cline coding assistant and built a 3D Tetris game live on the show, and it absolutely went flying.
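For the curious, there's nothing exotic in that setup: Cline just needs any OpenAI-compatible provider, so under the hood it's a plain chat-completions request against our endpoint. A rough sketch of that request is below; the base URL and model ID are my assumptions from our usual conventions, so confirm them in the getting-started Colab before relying on this.

```python
# Sketch of the OpenAI-compatible request a tool like Cline sends when pointed
# at W&B Inference. ASSUMPTIONS: base URL and model ID follow W&B conventions
# -- verify both in the getting-started Colab.
import os
import requests

resp = requests.post(
    "https://api.inference.wandb.ai/v1/chat/completions",  # assumed base URL
    headers={"Authorization": f"Bearer {os.environ['WANDB_API_KEY']}"},
    json={
        "model": "Qwen/Qwen3-Coder-480B-A35B-Instruct",  # assumed model ID
        "messages": [{"role": "user", "content": "Scaffold a 3D Tetris in three.js"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```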
50 | 51 | This demo perfectly captures the convergence of everything we're excited about: a state-of-the-art open-source model, running on a blazing-fast inference service, integrated into a powerful open-source tool, creating something complex and interactive in seconds. 52 | 53 | If you want to try this yourself, we're giving away credits for W&B Inference. Just find our [announcement tweet](https://x.com/weights_biases/status/1947859654400434538) for the Qwen models on the **@weights_biases** X account and reply with **"coding capybara"** (a nod to Qwen's old mascot!). Add "ThursdAI" and I'll personally make sure you get bumped up the list! 54 | 55 | ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber. 56 | 57 | Big Companies & APIs 58 | 59 | **America’s AI Action Plan: A New Space Race for AI Dominance** ([ai.gov](http://ai.gov)) 60 | 61 | Switching gears to policy, I was excited to cover the White House’s newly unveiled “America’s AI Action Plan.” This 25-page strategy, dropped this week, frames AI as a national priority on par with the space race or Cold War, aiming to secure U.S. dominance with 90 policy proposals. I was thrilled to have Joseph Nelson from Roboflow join us fresh from the announcement event in Washington, sharing the room’s energy and insights. The plan pushes for deregulation, massive data center buildouts, workforce training, and—most exciting for us—explicit support for open-source and open-weight models. It’s a bold move to counter global competition, especially from China, while fast-tracking infrastructure like chip fabrication and energy grids. 62 | 63 | Joseph broke down the vibe at the event, including a surreal moment where the President riffed on Nvidia’s market dominance right in front of Jensen Huang. But beyond the anecdotes, what strikes me is the plan’s call for startups and innovation—think grants and investments via the Department of Defense and Small Business Administration. It’s like a request for new AI companies to step up. As someone who’s railed against past moratorium fears on this show, seeing this pro-innovation stance is a huge relief. 64 | 65 | **🔊 Voice & Audio – Higgs Audio v2 Levels Up** ([X](https://x.com/reach_vb/status/1947997596456272203)) 66 | 67 | Boson AI fused a 3B-param Llama 3.2 with a 2.2B audio Dual-FFN and trained on ten million hours of speech + music. Result: Higgs Audio v2 beats GPT-4o-mini and ElevenLabs v2 on prosody, does zero-shot multi-speaker dialog, and even hums melodies. The demo runs on a single A100 and sounds pretty good. 68 | 69 | The first demo I played was not super impressive, but the laugh track made up for it! 70 | 71 | **🤖 A Week with ChatGPT Agent** 72 | 73 | Last week, OpenAI dropped the ChatGPT Agent on us during our stream, and now we've had a full week to play with it. It's a combination of their browser-operating agent and their Deep Research agent, and the experience is pretty wild. 74 | 75 | Yam had it watching YouTube videos and scouring Reddit comments to create a comparison of different CLI tools. He was blown away, seeing the cursor move around and navigate complex sites right on his phone. 76 | 77 | I put it through its paces as well. I tried to get it to order flowers for my girlfriend (it got all the way to checkout!), and it successfully found and filled out the forms for a travel insurance policy I needed.
My ultimate test ([live stream here](https://x.com/altryne/status/1948111176203911222)), however, was asking it to prepare the show notes for ThursdAI, a complex task involving summarizing dozens of my X bookmarks. It did a decent job (a solid C/B), but still needed my intervention. It's not quite a "fire-and-forget" tool for complex, multi-step tasks yet, but it's a huge leap forward. As Yam put it, "This is the worst that agents are going to be." And that's an exciting thought. 78 | 79 | What a week. From open-source models that rival the best closed-source giants to governments getting serious about AI innovation, the pace is just relentless. It's moments like Nisten's live demo that remind me why we do this show—to witness and share these incredible leaps forward as they happen. We're living in an amazing time. 80 | 81 | Thank you for being a ThursdAI subscriber. As always, here's the TL;DR and show notes for everything that happened in AI this week. 82 | 83 | Thanks for reading ThursdAI - Recaps of the most high signal AI weekly spaces! This post is public so feel free to share it. 84 | 85 | TL;DR and Show Notes 86 | 87 | * **Hosts and Guests** 88 | 89 | * **Alex Volkov** - AI Evangelist & Weights & Biases ([@altryne](http://x.com/altryne)) 90 | 91 | * **Co-Hosts** - [@WolframRvnwlf](http://x.com/WolframRvnwlf), [@yampeleg](http://x.com/yampeleg), [@nisten](http://x.com/nisten), [@ldjconfirmed](http://x.com/ldjconfirmed) 92 | 93 | * **Junyang Lin** - Qwen Team, Alibaba ([@JustinLin610](https://x.com/JustinLin610)) 94 | 95 | * **Joseph Nelson** - Co-founder & CEO, Roboflow ([@josephnelson](https://x.com/josephnelson)) 96 | 97 | * **Open Source LLMs** 98 | 99 | * Sapient Intelligence releases **Hierarchical Reasoning Model (HRM)**, a tiny 27M param model with impressive reasoning on specific tasks ([X](https://x.com/makingAGI/status/1947286324735856747), [arXiv](https://arxiv.org/abs/2506.21734)). 100 | 101 | * Qwen drops a "little" update: **Qwen3-235B-A22B-Instruct-2507**, a powerful non-reasoning model ([X](https://x.com/JustinLin610/status/1947364588340523222), [HF Model](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)). 102 | 103 | * Qwen releases the new SOTA coding agent model: **Qwen3-Coder-480B-A35B-Instruct** ([X](https://x.com/Alibaba_Qwen/status/1947790753414369280), [HF Model](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct)). 104 | 105 | * **Hermes-Reasoning Tool-Use dataset** with 51k tool-calling examples is released ([X](https://x.com/intstr1Irinja/status/1947444760393773185), [HF Dataset](https://huggingface.co/datasets/interstellarninja/hermes_reasoning_tool_use)). 106 | 107 | * NVIDIA releases updates to their **Nemotron** reasoning models. 108 | 109 | * **Big CO LLMs + APIs** 110 | 111 | * The White House unveils **"America’s AI Action Plan"** to "win the AI race" ([X](https://x.com/NetChoice/status/1948042669906624554), [White House PDF](https://www.whitehouse.gov/wp-content/uploads/2025/07/Americas-AI-Action-Plan.pdf)). 112 | 113 | * Both **OpenAI** ([X](https://x.com/alexwei_/status/1946477742855532918)) and **Google DeepMind** win Gold at the International Math Olympiad (IMO), with **ByteDance's Seed-Prover** taking Silver ([GitHub](https://github.com/ByteDance-Seed/Seed-Prover)). 114 | 115 | * The AI math breakthrough has a "gut punch" effect on the math community ([Dave White on X](https://x.com/Dave_White_/status/1947461492783386827)). 116 | 117 | * Google now processes over **980 trillion tokens** per month across its services.
118 | 119 | * A week with **ChatGPT Agent**: testing its capabilities on real-world tasks. 120 | 121 | * **This Week's Buzz** 122 | 123 | * Day 0 support for both new Qwen models on **W&B Inference** ([Try it](https://wandb.ai/inference), [Colab](https://wandb.me/qcoder-colab)). Reply to our [tweet](https://x.com/weights_biases) with "coding capybara ThursdAI" for credits! 124 | 125 | * Live on-stream demo of Qwen3-Coder building a 3D Tetris game using Cline. 126 | 127 | * **Interesting Research** 128 | 129 | * Researchers discover **subliminal learning** in LLMs, where traits are passed through seemingly innocuous data ([X](https://x.com/0wain_evans/status/1947709848103255232), [arXiv](https://arxiv.org/abs/2507.14805)). 130 | 131 | * Apple proposes **multi-token prediction**, speeding up LLMs by up to 5x without quality loss ([X](https://x.com/JacksonAtkinsX/status/1947408593638002639), [arXiv](https://arxiv.org/abs/2507.11851)). 132 | 133 | * **Voice & Audio** 134 | 135 | * Boson AI open-sources **Higgs Audio v2**, a unified TTS model that beats GPT-4o-mini and ElevenLabs ([X](https://x.com/reach_vb/status/1947997596456272203), [HF Model](https://huggingface.co/bosonai/higgs-audio-v2-generation-3B-base)). 136 | 137 | * **AI Art & Diffusion & 3D** 138 | 139 | * Decart AI releases **MirageLSD**, a real-time live-stream diffusion model for instant video transformation ([X Post](https://x.com/DecartAI/status/1945947692871692667)). 140 | 141 | * **Tools** 142 | 143 | * Qwen releases **qwen-code**, a CLI tool and agent for their new coder models. ([Github](https://github.com/QwenLM/qwen-code)) 144 | 145 | * **GitHub Spark**, a new AI-powered feature from GitHub ([Simon Willison on X](https://x.com/simonw/status/1948407932418457968)). 146 | 147 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-july-24-2025-qwen-mas-in/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-july-24-2025-qwen-mas-in?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE2OTE3NDY2MywiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.jguKI_sBQiDelSUjIO_nJjh0YQaml0qeUsh32Nk1NXE&utm_campaign=CTA_5). 148 | -------------------------------------------------------------------------------- /example prompts/ThursdAI Dec 11 2025 Infographic prompt.md: -------------------------------------------------------------------------------- 1 | # Infographic Prompt: ThursdAI – Dec 11, 2025 · "GPT-5.2 Drops Live, Open Source Keeps Climbing, and We're Training AI in SPACE" 2 | 3 | Create a high-resolution, wide **16:9 infographic poster** for a tech livestream episode titled: 4 | 5 | **"ThursdAI – Episode 51 • Dec 11, 2025"** 6 | 7 | --- 8 | 9 | ## OVERALL VIBE & CONCEPT 10 | 11 | This episode had a **live breaking news moment**: GPT-5.2 literally dropped while we were on air. The infographic should feel like a **"BREAKING NEWS" broadcast control room** crossed with a futuristic space mission dashboard. 12 | 13 | **Visual concept:** Imagine a mission control screen mid-broadcast—multiple data feeds, live indicators, and that electric energy of news happening in real-time. Mix broadcast urgency with the cosmic wonder of AI being trained in orbit. 14 | 15 | **Style:** Clean vector/infographic, sharp lines, flat shading with subtle gradients. Extremely readable at YouTube thumbnail size and on social feeds.
Some glowing "LIVE" and "BREAKING" indicators to capture the energy of the episode. 16 | 17 | **Color palette:** 18 | - Deep space navy/charcoal background (#0a0f1a) with starfield dots and subtle nebula gradients 19 | - **Breaking News accent:** Hot red/coral (#ff4444) for the GPT-5.2 announcement 20 | - **Open Source accent:** Electric teal (#00d4aa) and emerald (#10b981) for Mistral/open models 21 | - **Space accent:** Deep purple (#7c3aed) and cosmic blue (#3b82f6) for the Starcloud/orbit story 22 | - **Foundation accent:** Gold/amber (#f59e0b) for AAIF standardization news 23 | - White and silver highlights for readability 24 | 25 | --- 26 | 27 | ## 1. HERO & HEADER 28 | 29 | **At the top, a wide header banner styled like a live broadcast chyron:** 30 | 31 | - Left corner: A pulsing **"LIVE"** indicator badge (red dot with glow) 32 | - Main title (centered, bold): 33 | - **"ThursdAI – Episode 51"** (large) 34 | - **"Dec 11, 2025 • Live on YouTube & X"** (smaller subtitle) 35 | - Right corner: **"BREAKING NEWS"** badge with urgent styling 36 | 37 | **Left side of header — Alex presentation:** 38 | - Cartoon vector version of host Alex Volkov using reference image (face, beard, hairstyle) 39 | - Style: Clean vector cartoon, animated and excited, wearing a dark hoodie with headphones 40 | - Alex has a "can you believe this is happening right now?!" expression, gesturing toward the breaking news 41 | - Behind Alex: A floating holographic display showing benchmark numbers flying by, like he's presenting live data 42 | - Small floating elements: A satellite with glowing GPU, cosmic particles, the ThursdAI logo 43 | 44 | **Background atmosphere:** 45 | - Starfield with subtle nebula gradients (space theme for the Starcloud story) 46 | - Faint grid lines suggesting a mission control dashboard 47 | - Small orbital paths and satellite trajectories as decorative elements 48 | - Scattered "data packet" particles flowing across the design 49 | 50 | --- 51 | 52 | ## 2. LAYOUT OVERVIEW — Breaking News + Three Bands 53 | 54 | **The infographic should feel like a live broadcast dashboard:** 55 | 56 | 1. **TOP MEGA-BANNER:** GPT-5.2 Breaking News (the moment it dropped live) 57 | 2. **MIDDLE BAND LEFT:** Open Source Surge (Mistral Devstral 2, Essential AI) 58 | 3. **MIDDLE BAND RIGHT:** The Space Race (Starcloud, GPUs in orbit) 59 | 4. **BOTTOM BAND:** Foundation News + Vision/Voice + Math AI 60 | 61 | Use clear section dividers that look like broadcast segment transitions. 62 | 63 | --- 64 | 65 | ## 3. TOP MEGA-BANNER — "🚨 BREAKING: GPT-5.2 DROPS LIVE ON AIR" 66 | 67 | **This is THE headline of the episode. Make it feel like breaking news.** 68 | 69 | **Visual treatment:** 70 | - Full-width banner with urgent red accent color and "BREAKING" styling 71 | - Faint TV static/scan lines texture for broadcast feel 72 | - Alert icons and live indicator badges 73 | 74 | **Title:** "GPT-5.2 — Dropped Live on ThursdAI" 75 | **Subtitle:** "SOTA on ARC-AGI, SWE-Bench, AIME • 390x Cheaper Than o3" 76 | 77 | **Visual:** 78 | - Icon of a microphone/broadcast tower with signal waves, combined with a benchmark trophy 79 | - Small "LIVE" badge next to it 80 | - Lightning bolts and signal waves radiating outward 81 | 82 | **Info panel (styled like live data readouts):** 83 | 84 | ``` 85 | 🎯 BENCHMARK DOMINATION: 86 | ├─ ARC-AGI-1 Pro X-High: 90.5% (390x cheaper at $11.64/task) 87 | ├─ ARC-AGI-2 Pro High: 54.2% 88 | ├─ AIME 2025: 100% (Perfect score!) 
89 | ├─ GPQA Diamond: 92.4% 90 | ├─ SWE-Bench Pro: 55.6% 91 | └─ GDPval (44 occupations): 70.9% (ties/beats experts) 92 | 93 | 📈 LONG CONTEXT: 94 | ├─ 95% accuracy at 32K tokens 95 | ├─ 85% at 128K tokens 96 | └─ 70%+ at 256K tokens 97 | 98 | 💰 PRICING: 99 | ├─ Thinking: $1.75/$14 per M tokens 100 | └─ Pro: $21/$168 per M tokens 101 | 102 | 🏢 ENTERPRISE WINS: 103 | ├─ Box: +7pts accuracy, 74% faster docs 104 | ├─ Windsurf: "Version bump undersells the jump" 105 | └─ Cline: "Plans deeper, executes better" 106 | ``` 107 | 108 | **Add a small callout:** "Sam Altman: 'The smartest generally available model in the world'" 109 | 110 | --- 111 | 112 | ## 4. MIDDLE BAND LEFT — "⚡ OPEN SOURCE SURGE" 113 | 114 | **Section header:** Pill-shaped tag reading "⚡ OPEN SOURCE" in electric teal 115 | 116 | ### Panel A: Mistral Devstral 2 (LARGE, this is the week's open source star) 117 | 118 | **Title:** "Devstral 2 — SOTA Open Coding" 119 | **Subtitle:** "Run Claude Sonnet 3.7-Level on Your 3090" 120 | 121 | **Visual:** 122 | - Icon of stacked code windows with a wind/mistral gust sweeping through 123 | - Small "Apache 2.0" badge 124 | - Speed lines suggesting fast local inference 125 | 126 | **Info block:** 127 | 128 | ``` 129 | DEVSTRAL 2 (123B): 130 | ├─ SWE-bench Verified: 72.2% (#1 open) 131 | ├─ Just behind Claude 3.5 Sonnet (72.7%) 132 | ├─ Apache 2.0 License 133 | └─ Pricing: $0.40/$2.00 per M tokens 134 | 135 | DEVSTRAL SMALL 2 (24B): 136 | ├─ SWE-bench: 68.0% 137 | ├─ Runs on consumer GPUs 138 | ├─ Multimodal (images) 139 | └─ 7x cost efficiency 140 | 141 | + MISTRAL VIBE CLI: 142 | Open-source coding agent for terminal 143 | ``` 144 | 145 | ### Panel B: Essential AI Rnj-1 (Smaller) 146 | 147 | **Title:** "Rnj-1 — Frontier 8B from Transformers Co-Author" 148 | **Subtitle:** "Ashish Vaswani's New Lab Ships" 149 | 150 | **Visual:** 151 | - Icon of a compact gem/diamond with mathematical symbols 152 | - "8B" badge 153 | 154 | **Info block:** 155 | 156 | ``` 157 | ├─ 8B params, 8.7T training tokens 158 | ├─ SWE-bench: 20.8% (GPT-4o level!) 159 | ├─ Apache 2.0, JAX on TPUs + AMD 160 | ├─ No reasoning overhead—pure intuition 161 | └─ Led by Transformer paper co-author 162 | ``` 163 | 164 | --- 165 | 166 | ## 5. MIDDLE BAND RIGHT — "🛰️ AI IN ORBIT" 167 | 168 | **Section header:** Pill-shaped tag reading "🛰️ SPACE AI" in cosmic purple/blue 169 | 170 | ### Panel C: Starcloud — First LLM Trained in Space (LARGE, this is a jaw-dropper) 171 | 172 | **Title:** "Starcloud-1 — First LLM Trained in Space" 173 | **Subtitle:** "H100 GPU in Orbit • NanoGPT on Shakespeare" 174 | 175 | **Visual:** 176 | - Icon of a satellite with glowing H100 GPU chip orbiting Earth 177 | - Starfield and orbital path arcs 178 | - Small SpaceX-style rocket trail 179 | - Sun rays hitting solar panels 180 | 181 | **Info block:** 182 | 183 | ``` 184 | 🚀 THE MISSION: 185 | ├─ Nvidia-backed startup Starcloud 186 | ├─ H100 GPU aboard Starcloud-1 satellite 187 | ├─ Launched via SpaceX (Nov 2024) 188 | └─ GPU running error-free vs radiation/vacuum 189 | 190 | 🧠 THE TRAINING: 191 | ├─ Trained Karpathy's nanoGPT in orbit 192 | ├─ Dataset: Complete works of Shakespeare 193 | ├─ Ran inference on Google's Gemma 194 | └─ Generated: "Greetings, Earthlings!" 195 | 196 | 💬 REACTIONS: 197 | ├─ Karpathy: "nanoGPT - first LLM in space 🥹" 198 | └─ Elon Musk: "Cute 🥰 🚀💫" 199 | 200 | 🔮 THE VISION: 201 | ├─ Unlimited solar power in orbit 202 | ├─ Radiative cooling (no AC needed) 203 | ├─ 10x cheaper than Earth grids? 
204 | └─ Data centers eating 50% US power by 2030 205 | ``` 206 | 207 | --- 208 | 209 | ## 6. BOTTOM BAND — THREE SECTIONS 210 | 211 | Divide the bottom into three equal sections with distinct accent colors: 212 | 213 | ### Section D: AAIF — Agentic AI Foundation (Gold/Amber accent) 214 | 215 | **Title:** "AAIF — Agentic Standards Go Vendor-Neutral" 216 | **Subtitle:** "MCP + AGENTS.md + Goose Under Linux Foundation" 217 | 218 | **Visual:** 219 | - Icon of interconnected nodes with a Linux penguin silhouette 220 | - Multiple company logos abstracted as connected blocks 221 | - "Open Standards" badge 222 | 223 | **Info block:** 224 | 225 | ``` 226 | FOUNDING PROJECTS: 227 | ├─ MCP (Anthropic): 10,000+ servers in Year 1 228 | ├─ AGENTS.md (OpenAI): 20,000+ repos 229 | └─ Goose (Block): LLM-agnostic coding agent 230 | 231 | PLATINUM MEMBERS: 232 | AWS • Bloomberg • Cloudflare • Google • Microsoft 233 | 234 | GOLD MEMBERS: 235 | Docker • Cisco • Salesforce • GitHub • Snowflake 236 | 237 | "The best thing isn't the standard— 238 | it's that it's vendor-neutral now." 239 | ``` 240 | 241 | ### Section E: Math AI Dominance (Teal accent) 242 | 243 | **Title:** "Putnam Math Competition — AI Goes #1" 244 | **Subtitle:** "Formal Proofs, Gold Medals, Verified by Lean" 245 | 246 | **Visual:** 247 | - Icon of a gold medal with mathematical symbols (∫, π, Σ) 248 | - Trophy with proof checkmarks 249 | 250 | **Info block:** 251 | 252 | ``` 253 | NOMOS-1 (Nous Research): 254 | ├─ 30B params, 87/120 Putnam 2025 255 | ├─ Would be #2 out of 3,988 humans 256 | ├─ 63pt uplift over baseline (24→87) 257 | └─ Open-sourced on HuggingFace 258 | 259 | AXIOM PROVER (4-month-old startup): 260 | ├─ 9/12 problems in Lean formal proofs 261 | ├─ 100% machine-verifiable 262 | ├─ Would be #1 / Putnam Fellow 263 | └─ "Math AI can now PROVE it's right" 264 | ``` 265 | 266 | ### Section F: Vision + Voice Quick Hits (Violet/Magenta accent) 267 | 268 | **Title:** "Vision & Voice Upgrades" 269 | 270 | **Sub-cards (compact):** 271 | 272 | **GLM-4.6V (Z.ai):** 273 | ``` 274 | ├─ 106B + 9B Flash variants 275 | ├─ 128K context (150 pages!) 276 | ├─ Native VLM tool calling 277 | └─ MathVista 88.2, WebVoyager 81 278 | ``` 279 | 280 | **Google Gemini 2.5 TTS:** 281 | ``` 282 | ├─ Enhanced expressivity 283 | ├─ Context-aware pacing 284 | ├─ Multi-speaker (24 languages) 285 | └─ Better than OpenAI realtime? 286 | ``` 287 | 288 | **VoxCPM 1.5 (OpenBMB):** 289 | ``` 290 | ├─ 44.1kHz Hi-Fi (was 16kHz) 291 | ├─ 6.25Hz token rate (2x efficient) 292 | ├─ Zero-shot voice cloning 293 | └─ RTF 0.15 on RTX 4090 294 | ``` 295 | 296 | --- 297 | 298 | ## 7. SIDE STRIP — DISNEY + OPENAI & THIS WEEK'S BUZZ 299 | 300 | **Vertical strip on the right side with quick hits:** 301 | 302 | **Disney + OpenAI:** 303 | ``` 304 | ├─ $1B investment 305 | ├─ Character IP in Sora 306 | ├─ High-five Darth Vader in Jan 307 | └─ Disney was suing Google yesterday... 308 | ``` 309 | 310 | **OpenRouter State of AI:** 311 | ``` 312 | ├─ 100T+ tokens analyzed 313 | ├─ Reasoning >50% of usage 314 | ├─ Programming >50% of tokens 315 | ├─ Open source hit 30% share 316 | ``` 317 | 318 | **This Week's Buzz — W&B Weave:** 319 | ``` 320 | ├─ OpenRouter Broadcast → Weave 321 | ├─ Trace any OpenRouter tool 322 | ├─ Zero code instrumentation 323 | └─ Works with Claude Code! 324 | ``` 325 | 326 | --- 327 | 328 | ## 8. 
BOTTOM CTA BAR 329 | 330 | Full-width call-to-action bar at the very bottom: 331 | 332 | **Background:** Gradient from coral/red (left, breaking news energy) to cosmic purple (right, space theme), bridging the episode's stories 333 | 334 | **Text centered:** 335 | - Large: **"Episode 51 • Next Week: The Year-End Recap 🎉"** 336 | - Smaller: **"Subscribe to ThursdAI • Follow @altryne • thursdai.news"** 337 | 338 | **Icons:** YouTube logo (stylized), X logo, envelope (newsletter), microphone (podcast), satellite (space theme) 339 | 340 | --- 341 | 342 | ## 9. STYLE & TECHNICAL NOTES 343 | 344 | **Style:** 345 | - Vector/infographic, clean lines, subtle gradients 346 | - **Broadcast control room meets space mission dashboard** 347 | - "LIVE" and "BREAKING" badges with subtle glow effects 348 | - Starfield background with orbital path decorations 349 | - Grid lines suggesting data dashboards 350 | 351 | **Key visual elements:** 352 | - Red pulsing "LIVE" indicator in corner 353 | - "BREAKING NEWS" chyron styling for GPT-5.2 section 354 | - Satellite/orbital imagery for space section 355 | - Benchmark numbers displayed like live data feeds 356 | - Small sparkle/glow effects on key stats 357 | 358 | **Priorities:** 359 | - Extreme legibility at all sizes 360 | - Clear visual hierarchy: Breaking News GPT-5.2 > Devstral/Starcloud > AAIF/Math/Voice > CTA 361 | - Each panel should work as a standalone "slide" when zoomed in during stream 362 | - The breaking news energy should be palpable 363 | 364 | **Avoid:** 365 | - No heavy clutter 366 | - No real company logos—use abstract/stylized icons 367 | - No watermarks 368 | - No Christmas imagery—keep it space/broadcast themed 369 | 370 | **Alex in the header should have:** 371 | - An excited, "this is actually happening!" expression 372 | - Gesturing toward the breaking news banner 373 | - Headphones on, mid-broadcast energy 374 | - Maybe holding a tablet showing live benchmarks 375 | 376 | **Resolution:** Render at 4K (3840×2160) for streaming and social sharing. 377 | 378 | --- 379 | 380 | ## BONUS: INDIVIDUAL PANEL PROMPTS 381 | 382 | For use during the stream, each major section should also work as a standalone panel: 383 | 384 | 1. **GPT-5.2 Breaking News Panel** — Red/coral urgent styling, benchmark stats flying 385 | 2. **Mistral Devstral 2 Panel** — Teal coding theme, Apache 2.0 badge prominent 386 | 3. **Starcloud Space Panel** — Cosmic purple, satellite orbiting Earth with H100 387 | 4. **AAIF Foundation Panel** — Gold/amber, connected nodes representing standards 388 | 5. **Math AI Panel** — Teal with gold medals, proof checkmarks 389 | 6. 
**This Week's Buzz Panel** — W&B Weave integration, trace visualization 390 | 391 | --- 392 | 393 | *The final poster should capture that electric energy of GPT-5.2 dropping live mid-show, the open source momentum with Devstral, and the mind-blowing reality that we're now training AI in actual space.* 394 | 395 | 396 | 397 | 398 | -------------------------------------------------------------------------------- /Q3_2025_AI_Recap.md: -------------------------------------------------------------------------------- 1 | # ThursdAI Q3 2025 - AI Yearly Recap 2 | ## The Quarter of GPT-5, Trillion-Parameter Open Source, and World Models 3 | 4 | *Based on 12 ThursdAI episodes from July 3 - September 26, 2025* 5 | 6 | --- 7 | 8 | ![Q3 2025 ThursdAI Infographic](https://pub-7837090e9353474292fc8c7114c5fa9d.r2.dev/thursdai_infographics/Q3%202025%20ThursdAI%20Q3%20infographic.jpeg) 9 | 10 | --- 11 | 12 | ## 🔥 Quarter Overview 13 | 14 | Q3 2025 will be remembered as the quarter when **GPT-5 arrived**, **open source hit the trillion-parameter mark** with Kimi K2, and **world models became playable**. Chinese labs continued their open-source dominance with Qwen, DeepSeek, and ByteDance releases, while OpenAI shipped both their flagship GPT-5 and Apache-2.0 licensed GPT-OSS models. Google's Genie-3 showed us the future of interactive generated worlds, and video generation reached "can't tell it's AI" quality. 15 | 16 | --- 17 | 18 | ## 📅 July 2025 - "Trillion-Parameter Open Source & Agent Awakening" 19 | 20 | ### 🎯 Top Stories 21 | 22 | #### 🦄 **Kimi K2 - The Trillion-Parameter Open Source King** (Jul 17) 23 | Moonshot dropped a bomb with Kimi K2, a **1 trillion parameter** MoE model: 24 | - **65.8% on SWE-bench Verified** - beating Claude Sonnet without reasoning 25 | - Only **32B active parameters** making it actually runnable 26 | - **128K context** standard (2M+ rumored capability) 27 | - Trained on **15.5 trillion tokens** with the Muon optimizer 28 | - **Modified MIT license** - actually open! 29 | - **SOTA on EQBench creative writing** - finally an open model that writes well! 30 | 31 | > "This isn't just another model release. This is 'Sonnet at home' if you have the hardware." - Alex Volkov 32 | 33 | #### 🔥 **Grok-4 & Grok Heavy** (Jul 10) 34 | xAI unveiled Grok-4 and a multi-agent swarm called Grok Heavy: 35 | - **50% on Humanity's Last Exam** (with tools) - unprecedented 36 | - **15.9% on ARC-AGI v2** - 2x better than Opus 4 37 | - **100% on AIME25**, 88.9% on GPQA Diamond 38 | - Heavily scaled RL training 39 | - Controversy: "Mechahitler" incident, Grok searching "what does Elon think" 40 | 41 | #### 🤖 **OpenAI ChatGPT Agent (Odyssey)** (Jul 17) 42 | OpenAI merged Deep Research + Operator into one agentic system: 43 | - **41.6% on HLE** (double o3), **27.4% on FrontierMath** 44 | - Combines text browser + visual browser + terminal + code execution 45 | - Can browse, code, call APIs, generate images, build spreadsheets 46 | - Wedding planning, sticker ordering demos wowed audiences 47 | 48 | #### 🇨🇳 **Chinese Open Source Explosion** (Jul 3) 49 | - **Baidu ERNIE 4.5**: 10 models (424B to 0.3B), Apache 2.0, 128K context, multimodal 50 | - **Tencent Hunyuan-A13B**: 80B MoE (13B active), 256K context from WizardLM team 51 | - **Huawei Pangu Pro MoE**: 72B trained entirely on Ascend NPUs (no Nvidia!) 
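A quick aside on all the "total vs. active" MoE numbers above (1T total / 32B active for Kimi K2, 80B / 13B for Hunyuan): memory is paid on the total parameter count, but per-token compute only on the active one. A rough back-of-the-envelope sketch, with illustrative numbers only, assuming fp8 weights and the usual ~2 FLOPs-per-parameter rule of thumb:

```typescript
// Rough MoE sizing: memory scales with TOTAL params, per-token compute with ACTIVE params.
function moeFootprint(totalParamsB: number, activeParamsB: number, bytesPerParam = 1 /* fp8 */) {
  const weightMemoryGB = totalParamsB * bytesPerParam; // 1B params at 1 byte each ≈ 1 GB
  const gflopsPerToken = 2 * activeParamsB;            // ~2 FLOPs per active param per token
  return { weightMemoryGB, gflopsPerToken };
}

console.log(moeFootprint(1000, 32)); // Kimi K2: { weightMemoryGB: 1000, gflopsPerToken: 64 }
console.log(moeFootprint(80, 13));   // Hunyuan: { weightMemoryGB: 80,   gflopsPerToken: 26 }
```

That asymmetry is why a 1T-parameter model with 32B active can be "actually runnable": the weights still want roughly a terabyte of memory at fp8, but each token costs about what a 32B dense model would.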
52 | 
53 | ### Open Source LLMs
54 | | Release | Significance |
55 | |---------|-------------|
56 | | **Qwen3-Coder-480B** (Jul 24) | 69.6% SWE-bench Verified, 7.5T tokens training, 256K context |
57 | | **Qwen3-235B-A22B-Instruct-2507** | 81% MMLU-Redux, 70% LiveCodeBench, hybrid reasoning mode dropped |
58 | | **DeepSWE-Preview** | 59% SWE-bench-Verified, pure RL on Qwen3-32B |
59 | | **SmolLM3** (3B) | HuggingFace's 11T-token pocket model, dual reasoning modes, 128K→256K context |
61 | | **LG EXAONE 4.0** (32B) | 81.8% MMLU Pro from LG (the fridge company!) |
62 | 
63 | ### Big CO LLMs + APIs
64 | | Release | Significance |
65 | |---------|-------------|
66 | | **Grok-4 & Heavy** | SOTA on HLE (50%), ARC-AGI v2 (15.9%) |
67 | | **ChatGPT Agent** | Unified agentic AI for real-world tasks |
68 | | **OpenAI/Alphabet IMO Gold** | Both won Gold at International Math Olympiad |
69 | | **White House AI Action Plan** | 90 policy proposals for US AI dominance |
70 | 
71 | ### Vision & Video
72 | - **Wan 2.2**: First MoE video model, 5-second 720p on single 4090, open source
73 | - **Runway Gen-3 Aleph**: Chat-based video editing, scene transformations
74 | - **Runway Act-Two**: Next-gen motion capture (head, face, body, hands)
75 | 
76 | ### Voice & Audio
77 | - **Mistral Voxtral**: SOTA open speech recognition, beats Whisper v3, Apache 2.0
78 | - **Higgs Audio v2**: Beats GPT-4o-mini and ElevenLabs on prosody
79 | - **Riffusion x Producer.ai**: Chatable studio producer
80 | 
81 | ### Tools
82 | - **Perplexity Comet**: AI-powered browser, mouse moves on its own
83 | - **Amazon Kiro**: Spec-driven AI IDE from AWS
84 | - **Liquid AI LEAP + Apollo**: On-device AI platform for mobile
85 | 
86 | ---
87 | 
88 | ## 📅 August 2025 - "GPT-5 Month"
89 | 
90 | ### 🎯 Top Stories
91 | 
92 | #### 👑 **GPT-5 Launch** (Aug 7)
93 | OpenAI released GPT-5, 32 months after GPT-4:
94 | - **400K context window**
95 | - **$1.25/$10 per million tokens** (Opus is $15/$75)
96 | - Unified thinking + chat model
97 | - Router-based architecture (initially buggy)
98 | - Quiz mode, Gmail integration, memory features
99 | - Free tier access for back-to-school
100 | - But: Writing quality disappointed some, needed prompting guide
101 | 
102 | > "32 months since GPT-4 release, 32 months of ThursdAI" - Alex Volkov
103 | 
104 | #### 🔓 **GPT-OSS (120B/20B)** - OpenAI Goes Open Source (Aug 5)
105 | Historic release under Apache 2.0 license:
106 | - 120B and 20B models
107 | - Configurable reasoning via system prompt (`reasoning: high`)
108 | - Function calling, web search, Python execution
109 | - Full chain-of-thought access (unlike GPT-5)
110 | - Mixed reviews: Great at code/math, weak at creative writing
111 | 
112 | #### 🌍 **Google Genie-3 World Model** (Aug 7)
113 | DeepMind's world model generated **fully interactive 3D environments**:
114 | - Real-time at 24fps
115 | - Single image or text prompt → controllable world
116 | - Paint a wall, turn away, it remembers (memory/consistency breakthrough)
117 | - Walk, fly, control camera in generated worlds
118 | 
119 | #### 🔀 **DeepSeek V3.1 Hybrid Reasoner** (Aug 21)
120 | DeepSeek released a hybrid that combines V3 + R1:
121 | - Matches/beats R1 with **fewer thinking tokens**
122 | - Tool calls inside thinking process
123 | - 66% SWE-bench Verified (non-thinking) vs R1's 44%
124 | - 128K context, MIT licensed
125 | - TerminalBench: 5.7→31 improvement
126 | 
127 | ### Open Source LLMs
128 | | Release | Significance |
129 | |---------|-------------|
130 | | 
**DeepSeek V3.1** | Hybrid reasoner, R1-level with less thinking | 131 | | **ByteDance Seed-OSS 36B** | Apache 2, 512K context, "thinking budget" control | 132 | | **NVIDIA Nemotron Nano 9B V2** | Mixed Mamba+Transformer, 6x throughput, open dataset | 133 | | **Cohere Command-A Reasoning** | 111B dense, 256K context, 70% BFCL | 134 | | **GLM-4.5V** | 106B VLM from Zhipu, SOTA vision intelligence | 135 | 136 | ### Big CO LLMs + APIs 137 | | Release | Significance | 138 | |---------|-------------| 139 | | **GPT-5** | 400K context, unified reasoning, router architecture | 140 | | **GPT-OSS** | Apache 2.0, 120B/20B, full CoT access | 141 | | **Anthropic Opus 4.1** | Pre-emptive upgrade before GPT-5 | 142 | | **Claude Sonnet 1M context** | Extended to 1M in API | 143 | 144 | ### Vision & Video 145 | - **Hunyuan GameCraft**: Game video generation with physics, runs on 4090 146 | - **Skywork Matrix-Game 2.0**: Real-time world model, 25fps, open source 147 | - **LFM2-VL**: Liquid AI's 440M & 1.6B vision-language models, 2x faster 148 | 149 | ### AI Art & Diffusion 150 | - **Nano Banana**: Mystery model (rumored Google) doing 3D-aware scene editing 151 | - **Qwen Image Edit (20B)**: Fully open image editor, bilingual, runs locally 152 | - **FLUX.1 Krea [dev]**: Natural aesthetics, no "AI gloss" 153 | 154 | ### Tools 155 | - **Agents.md Standard**: OpenAI's config file to unify agent instructions 156 | - **Catnip**: W&B's containerized multi-agent coding workspace 157 | - **Cursor gets Sonic**: Mystery "Grok Code" model appears 158 | 159 | --- 160 | 161 | ## 📅 September 2025 - "Shiptember Delivers" 162 | 163 | ### 🎯 Top Stories 164 | 165 | #### 🧑‍💻 **GPT-5-Codex** (Sep 18) 166 | OpenAI's agentic coding finetune of GPT-5: 167 | - Works **7+ hours independently** on complex tasks 168 | - **93% fewer tokens** on simple tasks 169 | - Integrated everywhere: CLI, VS Code, web, iPhone 170 | - Reviews majority of OpenAI's own PRs 171 | - Perfect 12/12 on 2025 ICPC with unreleased reasoning model 172 | 173 | #### 👓 **Meta Connect 25 - AI Glasses with Display** (Sep 18) 174 | Meta unveiled next-gen Ray-Ban glasses: 175 | - **Built-in display** (invisible from outside) 176 | - **Neural band wristband** for muscle-based control 177 | - Live translation with subtitles in field of view 178 | - Agentic AI doing research tasks 179 | - **$799**, shipping immediately 180 | 181 | #### 🐋 **DeepSeek V3.1 Terminus** (Sep 26) 182 | Surgical update fixing agent behavior: 183 | - Fixed code-switching bug ("sudden Chinese") 184 | - Improved tool-use and browser execution 185 | - Less overthinking/stalling in agentic flows 186 | - HLE: 15→21.7 improvement 187 | 188 | #### 🦜 **Qwen-mas Strikes Again** (Sep 26) 189 | Alibaba's multimodal blitz: 190 | - **Qwen3-VL-235B**: Vision reasoner, 22B active, 1M context for video 191 | - **Qwen3-Omni-30B**: End-to-end omni-modal (text, image, audio, video), sub-250ms speech 192 | - **Qwen-Max**: Over 1T parameters, 69.6% SWE-bench, roadmap to 100M token context 193 | 194 | ### Open Source LLMs 195 | | Release | Significance | 196 | |---------|-------------| 197 | | **Qwen3-VL-235B-A22B-Thinking** | Vision reasoner, 1M context for 2-hour video | 198 | | **Qwen3-Omni-30B-A3B** | Real-time omni-modal, 119 languages | 199 | | **Tongyi DeepResearch A3B** | Web agent matching OpenAI Deep Research, 98.6% SimpleQA | 200 | | **Qwen-Next-80B-A3B** | Ultra-sparse MoE, rivals 235B reasoning | 201 | | **Liquid Nanos** | 350M-2.6B models for structured extraction | 202 | | **IBM Granite OCR 258M** | 
Tiny doc parser, runs on Raspberry Pi | 203 | 204 | ### Big CO LLMs + APIs 205 | | Release | Significance | 206 | |---------|-------------| 207 | | **GPT-5-Codex** | 7+ hour autonomous coding sessions | 208 | | **Grok-4 Fast** | 2M context, 40% fewer thinking tokens, 1% cost | 209 | | **NVIDIA $100B pledge to OpenAI** | "Biggest infrastructure project in history" | 210 | | **ChatGPT Pulse** | Proactive AI news based on your data | 211 | | **OpenAI-Oracle $300B deal** | $60B/year for compute, 5-year deal | 212 | | **Anthropic $13B raise** | $183B valuation, $5B revenue run rate | 213 | | **Mistral $13.8B valuation** | $1.3B from ASML, Europe's decacorn | 214 | 215 | ### Vision & Video 216 | - **ByteDance SeeDream 4**: 4K SOTA image gen/editing, up to 6 reference images 217 | - **Lucy 14B**: 5-second video in 6.5 seconds (insane speed) 218 | - **Wan 2.2 Animate**: Motion transfer + lip sync, open source 219 | - **Wan 4.5 Preview**: 1080p 10s with synced speech generation 220 | - **Kling 2.5 Turbo**: 30% cheaper, audio included 221 | - **Ray3**: Luma's "reasoning" video with HDR 222 | 223 | ### Voice & Audio 224 | - **Suno V5**: "I can't tell anymore" era, human-level vocals 225 | - **Qwen3-ASR-Flash**: 11-language speech recognition with singing 226 | - **Stable Audio 2.5**: 3-minute tracks in <2 seconds 227 | 228 | ### AI Art & Diffusion 229 | - **Hunyuan SRPO**: New diffusion finetuning method 230 | - **Reve 4-in-1**: Image creation + editing platform 231 | - **FeiFei WorldLabs Marble**: Images → walkable Gaussian Splat 3D worlds 232 | 233 | ### Tools 234 | - **Google Gemini in Chrome**: Chat across tabs, browse history knowledge 235 | - **ChatGPT full MCP support**: Developer mode for tool connectors 236 | - **Oasis 2.0**: Real-time Minecraft world generation mod 237 | 238 | --- 239 | 240 | ## 📊 Quarter Summary: Major Themes 241 | 242 | ### 1. 🧠 **GPT-5 Era Begins** 243 | - OpenAI unified reasoning + chat into one model 244 | - Router-based architecture for intelligent model selection 245 | - Agentic coding (Codex) works for 7+ hours independently 246 | - GPT-OSS brought open-source from OpenAI (Apache 2.0) 247 | 248 | ### 2. 🇨🇳 **Open Source Hits Trillion-Scale** 249 | - Kimi K2: 1T parameters, beats Claude Sonnet on SWE-bench 250 | - Qwen3-Coder: 480B, 69.6% SWE-bench 251 | - DeepSeek V3.1: Hybrid reasoning, fewer tokens 252 | - W&B Inference launched to host these monsters 253 | 254 | ### 3. 🌍 **World Models Become Playable** 255 | - Google Genie-3: Interactive 3D worlds at 24fps 256 | - Hunyuan GameCraft: Game video with physics 257 | - Matrix-Game 2.0: Unreal/GTA-trained, 25fps 258 | - Oasis 2.0: Real-time Minecraft reskinning 259 | 260 | ### 4. 🎥 **Video Reaches "Can't Tell" Quality** 261 | - SeeDream 4: 4K in <2 seconds 262 | - Lucy 14B: Near-realtime video generation 263 | - Suno V5: Indistinguishable from human music 264 | - Wan 4.5: Speech-synced video generation 265 | 266 | ### 5. 💰 **Unprecedented Investment** 267 | - NVIDIA $100B pledge to OpenAI 268 | - OpenAI-Oracle $300B deal 269 | - Anthropic $13B raise at $183B valuation 270 | - Meta Superintelligence Labs: $100-300M packages to poached researchers 271 | 272 | ### 6. 🤖 **Agents Get Serious** 273 | - ChatGPT Agent unifies browser + terminal + research 274 | - Agents.md standardizes agent config 275 | - Desktop agents hit 48% on OSWorld (up from ~12%) 276 | - MCP support spreading everywhere 277 | 278 | --- 279 | 280 | ## 🏆 Q3 2025: Biggest Releases by Month 281 | 282 | ### July 283 | 1. 
**Kimi K2** - 1T parameter open source, 65.8% SWE-bench 284 | 2. **Grok-4 & Heavy** - 50% HLE, 15.9% ARC-AGI 285 | 3. **ChatGPT Agent** - Unified agentic AI 286 | 4. **Qwen3-Coder-480B** - 69.6% SWE-bench 287 | 5. **White House AI Action Plan** - US AI strategy 288 | 289 | ### August 290 | 1. **GPT-5** - 400K context, unified reasoning 291 | 2. **GPT-OSS** - Apache 2.0, 120B/20B open weights 292 | 3. **Google Genie-3** - Playable AI-generated worlds 293 | 4. **DeepSeek V3.1** - Hybrid reasoner 294 | 5. **Meta Smart Glasses** - Display + neural control 295 | 296 | ### September 297 | 1. **GPT-5-Codex** - 7-hour autonomous coding 298 | 2. **NVIDIA $100B pledge** - Biggest AI infrastructure deal 299 | 3. **Qwen3-VL + Omni** - Complete multimodal suite 300 | 4. **ByteDance SeeDream 4** - SOTA 4K image gen 301 | 5. **Anthropic $13B raise** - $183B valuation 302 | 303 | --- 304 | 305 | *"This was the summer of trillion-parameter open source, GPT-5, and world models you can walk in. We're not just accelerating—we're in a completely different phase of AI. Hold on to your butts."* - Alex Volkov, ThursdAI 306 | 307 | --- 308 | 309 | *Generated from ThursdAI newsletter content. For full coverage, visit [thursdai.news](https://thursdai.news)* 310 | -------------------------------------------------------------------------------- /Q2_2025_AI_Recap.md: -------------------------------------------------------------------------------- 1 | # ThursdAI Q2 2025 - AI Yearly Recap 2 | ## The Quarter That Shattered Reality 3 | 4 | *Based on 13 ThursdAI episodes from April 3 - June 26, 2025* 5 | 6 | --- 7 | 8 | ![Q2 2025 ThursdAI Infographic](https://pub-7837090e9353474292fc8c7114c5fa9d.r2.dev/thursdai_infographics/Q2%202025%20ThursdAI%20Q2%20infographic.jpg) 9 | 10 | --- 11 | 12 | ## 🔥 Quarter Overview 13 | 14 | Q2 2025 will be remembered as the quarter when **video AI crossed the uncanny valley** (VEO3's native audio blew minds), **tool-using reasoning models emerged** (o3 can call tools mid-thought!), and **open source matched frontier models** (Qwen 3 and Claude 4 delivered back-to-back). Google I/O dropped an avalanche of announcements, Meta's Llama 4 had a chaotic launch, and the agent ecosystem matured with MCP becoming the universal standard. 15 | 16 | --- 17 | 18 | ## 📅 April 2025 - "Tool-Using Reasoners & Llama Chaos" 19 | 20 | ### 🎯 Top Stories 21 | 22 | #### 🧠 **OpenAI o3 & o4-mini - Reasoning Meets Tool Use** (Apr 17) 23 | The most important reasoning model upgrade in AI history. 
For the first time, o-series models can: 24 | - **Autonomously use tools during reasoning** (web search, Python, image gen) 25 | - Chain 600+ consecutive tool calls to solve complex problems 26 | - Manipulate images mid-thought (cropping, zooming, rotating) 27 | - Score **$65k on Freelancer eval** (vs o1's $28k) 28 | - **o4-mini hits 99.5% on AIME** when using Python interpreter 29 | 30 | > "This is almost AGI territory - agents that reason while wielding tools" - Alex Volkov 31 | 32 | #### 📚 **GPT-4.1 Family - 1 Million Token Context** (Apr 14) 33 | OpenAI dropped GPT-4.1, 4.1-mini, and 4.1-nano with: 34 | - **1 million token context window** across all three models 35 | - GPT-4.5 deprecated - 4.1 actually outperforms it 36 | - Near-perfect recall across entire 1M context 37 | - 4.1-mini achieves 72% on Video-MME 38 | - "Sandwich" prompting trick boosts mini from 31% → 49% 39 | 40 | #### 🦙 **Meta Llama 4 - Scout & Maverick** (Apr 5) 41 | Meta dropped their biggest models ever, amid controversy: 42 | - **Scout**: 17B active / 109B total (16 experts) - 10M context claimed 43 | - **Maverick**: 17B active / 400B total (128 experts) - 1M context 44 | - Release caused LMArena drama (tested model ≠ released model) 45 | - Community criticism: too big to run locally 46 | - **Behemoth (288B active / 2T total)** teased but unreleased 47 | 48 | #### ⚡ **Gemini 2.5 Flash - Controllable Thinking Budgets** (Apr 17) 49 | Google's direct counter to o3/o4-mini: 50 | - Set "thinking budget" (0-24K tokens) per API call 51 | - 1M token context window 52 | - Ultra-cheap: $0.15 input / $0.60 output per 1M tokens 53 | - Balance speed/cost vs reasoning depth in one model 54 | 55 | ### Open Source LLMs 56 | | Release | Significance | 57 | |---------|-------------| 58 | | **DeepCoder-14B** | Beats DeepSeek R1 on coding, distributed RL training | 59 | | **NVIDIA Nemotron Ultra 253B** | Pruned Llama 405B, actually beats Llama 4 on AIME | 60 | | **Kimi-VL 3B** | MIT licensed VLM, 128K context, rivals 10x larger models | 61 | | **HiDream-I1 17B** | MIT license, surpasses Flux 1.1 Pro on image gen | 62 | | **GLM-4 Family** | ChatGLM rebranded, MIT licensed, up to 70B | 63 | 64 | ### Big CO LLMs + APIs 65 | - **Google celebrates MCP** - official support announced, joining Microsoft & AWS 66 | - **Google A2A Protocol** - Agent-to-Agent communication standard launched 67 | - **Grok 3 API** - xAI finally opens API access 68 | - **ChatGPT Memory Upgrade** - can now reference ALL past chats 69 | 70 | ### Vision & Video 71 | - **VEO-2 GA** - Google's video model goes Generally Available 72 | - **Kling 2.0 Creative Suite** - MVL prompting, inline images in text prompts 73 | - **Runway Gen-4** - Focus on character/world consistency 74 | - **MAGI-1 24B** - Send AI drops open weights video model (Apache 2.0) 75 | 76 | ### Voice & Audio 77 | - **Dia 1.6B TTS** - Unhinged emotional range from Korean startup, MIT licensed 78 | - **PipeCat SmartTurn** - Open source semantic VAD for natural conversations 79 | - **DolphinGemma** - Google AI attempts dolphin communication 🐬 80 | 81 | ### Tools 82 | - **OpenAI Codex CLI** - Open source with Apple Seatbelt security 83 | - **Firebase Studio** - Google's vibe coding platform (formerly Project IDX) 84 | - **GitMCP** - Turn any GitHub repo into MCP server (viral launch) 85 | 86 | --- 87 | 88 | ## 📅 May 2025 - "Qwen 3 Revolution & Claude 4 Arrives" 89 | 90 | ### 🎯 Top Stories 91 | 92 | #### 🔥 **Qwen 3 - The Open Source Benchmark Crusher** (May 1) 93 | Alibaba dropped the most comprehensive open 
source release ever:
94 | - **8 models**: 2 MoE (235B/22B active, 30B/3B active) + 6 dense (0.6B-32B)
95 | - **Apache 2.0 license** on everything
96 | - Runtime `/think` toggle for chain-of-thought on demand
97 | - **4B dense beats Qwen 2.5-72B** on multiple benchmarks 🤯
98 | - 36T training tokens, 119 languages, 128K context
99 | - Day-one support in LM Studio, Ollama, vLLM, MLX
100 | 
101 | > "The 30B MoE is 'Sonnet 3.5 at home' - 100+ tokens/sec on MacBooks" - Nisten
102 | 
103 | #### 🤖 **Claude 4 Opus & Sonnet - Live Drop During ThursdAI!** (May 22)
104 | Anthropic crashed the party mid-show with:
105 | - **Claude 4 Opus**: 72.5% SWE-bench, handles 6-7 hour human tasks
106 | - **Claude 4 Sonnet**: 72.7% SWE-bench (80% with parallel test-time compute!)
107 | - First models to approach the 80% SWE-bench threshold (crossed with parallel test-time compute)
108 | - Hybrid reasoning + instant response modes
109 | - 65% less likely to engage in loopholes vs Sonnet 3.7
110 | - Knowledge cutoff: March 2025
111 | 
112 | #### 🎬 **VEO3 - The Undisputed Star of Google I/O** (May 20)
113 | The video model that crossed the uncanny valley:
114 | - **Native multimodal audio** - generates speech, sound effects, music synced perfectly
115 | - Perfect lip-sync with situational awareness
116 | - Characters look at each other, understand who's speaking
117 | - Can generate text within videos
118 | - Spawned viral "Prompt Theory" phenomenon on TikTok
119 | 
120 | > "VEO3 isn't just video generation - it's a world simulator" - Alex Volkov
121 | 
122 | #### 🎨 **GPT-4o Native Image Gen - Ghibli-mania 2.0** (May 22)
123 | OpenAI enables native image gen in GPT-4o (again), now via API:
124 | - **GPT Image 1 API** finally released
125 | - Organizational verification required (biometric scan)
126 | - Supports generations, edits, and masking
127 | - Excellent text rendering in images
128 | - Struggles with realistic face matching (possibly intentional)
129 | 
130 | ### Google I/O Avalanche
131 | | Release | Significance |
132 | |---------|-------------|
133 | | **Gemini 2.5 Pro Deep Think** | 84% on MMMU, 65th percentile on USA Math Olympiad |
134 | | **Gemini 2.5 Flash GA** | Thinking budgets, native audio I/O |
135 | | **Gemini Diffusion** | 2000 tokens/sec for code/math editing |
136 | | **Jules** | Free async coding agent at jules.google |
137 | | **Project Mariner** | Browser control via API (agentic web) |
138 | | **Gemini Ultra tier** | $250/month with DeepThink, VEO3, 30TB storage |
139 | | **AI Mode in Search GA** | Can connect to Gmail/Docs, Deep Search capability |
140 | 
141 | ### Open Source LLMs
142 | | Release | Significance |
143 | |---------|-------------|
144 | | **Phi-4-Reasoning** | 14B hits 78% on AIME 25, MIT licensed |
145 | | **AM-Thinking v1 32B** | Dense model, 85.3% AIME 2024, Apache 2 |
146 | | **Devstral 24B** | Mistral + AllHands collab, SOTA on SWE-bench |
147 | | **Gemma 3n** | 4B MatFormer, mobile-first multimodal |
148 | | **NVIDIA Nemotron 8B/49B** | Reasoning toggle via system prompt |
149 | 
150 | ### Big CO LLMs + APIs
151 | - **OpenAI Codex Agent** - Async GitHub agent, opens PRs, fixes bugs
152 | - **OpenAI hires Jony Ive** - $6.5B deal, "IO" hardware company
153 | - **GitHub Copilot open sourced** - Frontend code now open
154 | - **Microsoft MCP in Windows** - Protocol support at OS level
155 | - **LMArena raises $100M** - a16z seed, impartiality questions raised
156 | 
157 | ### Vision & Video
158 | - **Odyssey Interactive Worlds** - Walk through AI-generated worlds with WASD
159 | - **HunyuanPortrait/Avatar** - Open source
competitors to HeyGen/Hedra
160 | - **Wan 2.1** - Alibaba's open source diffusion-transformer video suite
161 | - **Flux Kontext** - SOTA image editing with character consistency
162 | 
163 | ### Voice & Audio
164 | - **ElevenLabs V3** - Emotion tags, 70+ languages, multi-speaker dialogue
165 | - **OpenAI Voice Revolution** - GPT-4o Transcribe (promptable ASR), semantic VAD
166 | - **Chatterbox 0.5B** - Open source TTS with emotion control, Apache 2.0
167 | - **Unmute.sh** - KyutAI wrapper adds voice to any LLM
168 | 
169 | ### Tools
170 | - **AlphaEvolve** - Gemini-powered algorithm discovery (recovered 0.7% of Google's global compute!)
171 | - **Claude Code GA** - Shell-based agent with IDE integrations
172 | - **Cursor V1** - Bug Bot reviews PRs, MCP support
173 | 
174 | ---
175 | 
176 | ## 📅 June 2025 - "Agents & Video Take Over"
177 | 
178 | ### 🎯 Top Stories
179 | 
180 | #### 💰 **OpenAI o3-pro & 80% Price Drop** (Jun 12)
181 | OpenAI's intelligence push continues:
182 | - **o3 price slashed 80%** ($10/$40 → $2/$8 per million input/output tokens)
183 | - **o3-pro launched** - highest intelligence model, 93% AIME 2024
184 | - 87% cheaper than o1-pro
185 | - 84% on GPQA Diamond, near 3000 ELO on coding
186 | - Same full o3 model, no distillation
187 | 
188 | #### 🦙 **Meta's $15B Scale AI Power Play** (Jun 12)
189 | Zuck goes all-in on superintelligence:
190 | - **49% stake in Scale AI** for ~$14B
191 | - Alex Wang leads new "Superintelligence team" at Meta
192 | - Hired Google's Jack Rae for alignment
193 | - Seven-to-nine-figure comp packages for researchers
194 | - Clear response to Llama 4's muted reception
195 | 
196 | #### 🧠 **MiniMax M1 - Reasoning MoE That Beats R1** (Jun 19)
197 | Chinese lab drops powerful open reasoning model:
198 | - **456B total / 45B active** parameters
199 | - Outperforms DeepSeek R1 on multiple benchmarks
200 | - 1M token context window (40K/80K thinking-budget variants)
201 | - Full weights available on Hugging Face
202 | 
203 | #### 🖥️ **Gemini CLI - AI Agent in Your Terminal** (Jun 26)
204 | Google drops open source CLI agent:
205 | - Brings **Gemini 2.5 Pro** to terminal
206 | - Free tier available (rate-limited, with fallback to Flash models)
207 | - Full GitHub integration
208 | - Pairs with new MCP support in LM Studio
209 | 
210 | ### Open Source LLMs
211 | | Release | Significance |
212 | |---------|-------------|
213 | | **Mistral Small 3.2** | Improved instruction following, better function calling |
214 | | **Mistral Magistral** | Mistral's first reasoning model, 24B open weights, 128K context |
215 | | **Kimi-Dev-72B** | Moonshot's developer-focused model |
216 | | **DeepSeek R1-0528** | Updated reasoner, AIME 91, LiveCodeBench 73, "clearer thinking" |
217 | | **INTELLECT-2 32B** | Globally decentralized RL training from Prime Intellect |
218 | 
219 | ### Big CO LLMs + APIs
220 | - **Gemini 2.5 Pro/Flash GA** - 2.5 Flash-Lite in preview
221 | - **Deep Research API** - OpenAI adds webhook support
222 | - **Anthropic Fair Use ruling** - Judge rules book training is fair use
223 | - **OpenAI Meeting Recorder** - ChatGPT can now record Zoom calls
224 | - **ChatGPT Connectors** - Team accounts get Google Drive, SharePoint, Dropbox
225 | 
226 | ### Vision & Video
227 | - **Seedance 1.0 mini** - ByteDance beats VEO3 on some comparisons
228 | - **MiniMax Hailuo 02** - 1080p native, SOTA instruction following
229 | - **Midjourney Video** - Finally entering video space
230 | - **OmniGen 2** - Open weights for image gen/editing
231 | - **Imagen 4 Ultra** - Google's flagship in Gemini API
232 | 
233 | ### Voice & Audio
234 | - **ElevenLabs 11.ai** - 
Voice-first personal assistant with MCP 235 | - **Magenta RealTime** - Google's open weights real-time music gen 236 | - **Kyutai Streaming STT** - High-throughput real-time speech-to-text 237 | - **MiniMax Speech** - Tech report confirms best TTS architecture 238 | 239 | ### Tools 240 | - **Gemini CLI** - Open source terminal agent 241 | - **OpenHands CLI** - Model-agnostic coding agent 242 | - **Warp 2.0** - Agentic Development Environment with multi-threading 243 | - **LM Studio MCP** - Connect local LLMs with MCP servers 244 | - **Cursor Slack** - Coding assistant now in Slack 245 | 246 | --- 247 | 248 | ## 📊 Quarter Summary: Major Themes 249 | 250 | ### 1. 🎬 **Video AI Crosses the Uncanny Valley** 251 | - VEO3 native audio generation (speech, SFX, music synced) 252 | - "Prompt Theory" viral videos question reality itself 253 | - Character/scene consistency finally working 254 | - Midjourney, ByteDance Seedance join Sora/Kling/VEO 255 | 256 | ### 2. 🧠 **Tool-Using Reasoning Models Emerge** 257 | - o3/o4-mini can call tools during chain-of-thought 258 | - 600+ consecutive tool calls observed 259 | - Image manipulation during reasoning (zoom/crop/rotate) 260 | - This is the closest thing to AGI we've seen 261 | 262 | ### 3. 🇨🇳 **Open Source Matches Frontier** 263 | - Qwen 3 (Apache 2.0) rivals Sonnet 3.5 on many tasks 264 | - Claude 4 Sonnet hits 80% SWE-bench with PTTC 265 | - DeepSeek R1-0528 keeps pushing open reasoning 266 | - MiniMax M1 beats R1 on several benchmarks 267 | 268 | ### 4. 📺 **Google I/O Delivers Everything** 269 | - Gemini 2.5 Pro reclaims #1 LLM 270 | - VEO3 steals the show with native audio 271 | - Jules coding agent launches free 272 | - Massive infrastructure (TPU v6e pods, Ultra tier) 273 | 274 | ### 5. 🤖 **Agent Ecosystem Matures** 275 | - MCP becomes universal standard (OpenAI, Google adopt) 276 | - A2A protocol launches for agent-to-agent communication 277 | - Jules, Codex, GitHub Copilot Agent - async coding goes mainstream 278 | - Gemini CLI brings agents to terminal 279 | 280 | ### 6. 💸 **AI's Economic Impact Accelerates** 281 | - Meta $15B Scale AI stake 282 | - OpenAI $40B raise at $300B valuation 283 | - o3 price drops 80% in 4 months 284 | - Cursor $9B valuation, Windsurf $3B acquisition 285 | 286 | --- 287 | 288 | ## 🏆 Q2 2025: Biggest Releases by Month 289 | 290 | ### April 291 | 1. **OpenAI o3 & o4-mini** - Tool-using reasoning models 292 | 2. **GPT-4.1 Family** - 1M token context 293 | 3. **Meta Llama 4** - Scout & Maverick (chaotic launch) 294 | 4. **Gemini 2.5 Flash** - Controllable thinking budgets 295 | 5. **HiDream-I1** - Open source SOTA image gen 296 | 297 | ### May 298 | 1. **VEO3** - Native audio video generation 299 | 2. **Claude 4 Opus & Sonnet** - 80% SWE-bench 300 | 3. **Qwen 3** - Apache 2.0 reasoning family 301 | 4. **Google I/O** - Gemini 2.5 Pro, Jules, Diffusion 302 | 5. **Flux Kontext** - SOTA image editing 303 | 304 | ### June 305 | 1. **o3-pro** - Highest intelligence model 306 | 2. **o3 Price Drop 80%** - Democratized reasoning 307 | 3. **Meta/Scale AI Deal** - $15B superintelligence play 308 | 4. **MiniMax M1** - Open reasoning beats R1 309 | 5. **Gemini CLI** - Terminal-based AI agent 310 | 311 | --- 312 | 313 | *"We crossed the uncanny valley this quarter. VEO3's native audio had people posting real videos claiming they were AI because they couldn't tell the difference. This isn't just progress - it's a paradigm shift."* - Alex Volkov, ThursdAI 314 | 315 | --- 316 | 317 | *Generated from ThursdAI newsletter content. 
For full coverage, visit [thursdai.news](https://thursdai.news)*
318 | 
--------------------------------------------------------------------------------
/2025_episodes/Q1 2025/January 2025/_ThursdAI_-_Jan_16_2025_-_Hailuo_4M_context_LLM_SOTA_TTS_in_browser_OpenHands_interview_more_AI_news.md:
--------------------------------------------------------------------------------
1 | # 📆 ThursdAI - Jan 16, 2025 - Hailuo 4M context LLM, SOTA TTS in browser, OpenHands interview & more AI news
2 | 
3 | **Date:** January 17, 2025
4 | **Duration:** 1:40:32
5 | **Link:** [https://sub.thursdai.news/p/thursdai-jan-16-2025-hailuo-4m-context](https://sub.thursdai.news/p/thursdai-jan-16-2025-hailuo-4m-context)
6 | 
7 | ---
8 | 
9 | ## Description
10 | 
11 | Hey everyone, Alex here 👋
12 | 
13 | Welcome back to an absolute banger of a week in AI releases, highlighted by a massive Open Source AI push. We're talking a MASSIVE 4M context window model from Hailuo (remember when a jump from 4K to 16K seemed like a big deal?), an 8B omni model that lets you livestream video, and glimpses of an agentic ChatGPT!
14 | 
15 | This week's ThursdAI was jam-packed with so much open source goodness that the big companies were practically silent. But don't worry, we still managed to squeeze in some updates from OpenAI and Mistral, along with a fascinating new paper from Sakana AI on self-adaptive LLMs. Plus, we had the incredible Graham Neubig from All Hands AI join us to talk about Open Hands (formerly OpenDevin) and even contribute to our free LLM Evaluation course on Weights & Biases!
16 | 
17 | Before we dive in: a friend asked me over dinner what the two main things that happened in AI in 2024 were, and this week highlights one of those trends. Most of the Open Source is now from China. This week we got MiniMax from Hailuo, OpenBMB with a new MiniCPM, InternLM came back, and most of the rest were Qwen finetunes. Not to mention DeepSeek. I wanted to highlight this significant narrative change, and that it's happening despite the chip export restrictions.
18 | 
19 | ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
20 | 
21 | Open Source AI & LLMs
22 | 
23 | MiniMax-01: 4 Million Context, 456 Billion Parameters, and Lightning Attention
24 | 
25 | This came absolutely out of left field, given that we'd seen no prior LLMs from Hailuo, the company previously known for video models with consistent characters. They dropped a massive 456B mixture-of-experts model (45B active parameters) with such long context support in open weights, and with very significant benchmarks that compete with GPT-4o, Claude and DeepSeek v3 (75.7 MMLU-Pro, 89 IFEval, 54.4 GPQA).
26 | 
27 | They trained the model on up to a 1M context window and then extended it to 4M with RoPE scaling methods ([our coverage](https://sub.thursdai.news/p/thursdai-sunday-special-extending?utm_source=publication-search) of RoPE) during inference. MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention and Mixture-of-Experts (MoE) with 45B active parameters.
28 | 
29 | I gotta say, when we started talking about context windows, imagining a needle-in-a-haystack graph that shows 4M in open source seemed far-fetched, though we did say that theoretically, there may not be a limit to context windows.
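For intuition, here's a toy sketch of position interpolation, one common flavor of the RoPE scaling tricks mentioned above (illustrative only, not necessarily the exact recipe MiniMax used):

```typescript
// RoPE rotates each (even, odd) channel pair of a query/key by an angle that
// grows linearly with token position, at a per-pair frequency.
function ropeAngle(position: number, pairIndex: number, headDim: number, base = 10000): number {
  const freq = Math.pow(base, (-2 * pairIndex) / headDim); // lower pairs rotate faster
  return position * freq;
}

// Position interpolation: squeeze unseen positions back into the trained range, so a
// token at position 3.2M "looks like" one at 800K to a model trained on 1M tokens.
function scaledRopeAngle(position: number, pairIndex: number, headDim: number,
                         trainedLen: number, targetLen: number): number {
  const scale = targetLen / trainedLen; // e.g. 4M / 1M = 4
  return ropeAngle(position / scale, pairIndex, headDim);
}

console.log(scaledRopeAngle(3_200_000, 0, 128, 1_000_000, 4_000_000)); // == ropeAngle(800_000, 0, 128)
```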
I just always expected that limit to be unlocked by transformer-alternative architectures like Mamba or other state space models.
30 | 
31 | Vision, API and Browsing - MiniMax-VL-01
32 | 
33 | It feels like such a well-rounded and complete release that it highlights just how mature the company behind it is. They have also released a vision version of this model, which adds a 300M param Vision Transformer on top (trained with 512B vision-language tokens), features dynamic resolution, and boasts very high DocVQA and ChartQA scores.
34 | 
35 | Not only were these two models released in open weights, they also launched as a unified API endpoint (supporting up to 1M tokens), and it's cheap! $0.2/1M input and $1.1/1M output tokens! AFAIK this is only the 3rd API that supports this much context, after Gemini at 2M and Qwen Turbo, which supports 1M as well.
36 | 
37 | Surprising web browsing capabilities
38 | 
39 | You can play around with the model on their website, [hailuo.ai](https://www.hailuo.ai), which also includes web grounding. I was quite surprised to find that they beat chatGPT and Perplexity on how quickly they can find information that happened that same day! Not sure what search API they are using under the hood, but they are very quick.
40 | 
41 | An 8B chat-with-video omni-model from OpenBMB
42 | 
43 | OpenBMB has been around for a while and we've seen consistently great updates from them on the MiniCPM front, but this one takes the cake!
44 | 
45 | This is a complete omni-modal, end-to-end model that does video streaming, audio-to-audio and text understanding, all in a model that can run on an iPad!
46 | 
47 | They have a demo interface that is very similar to the chatGPT demo from spring of last year, which allows you to stream your webcam and talk to the model. But this is just an 8B parameter model we're talking about! It's bonkers!
48 | 
49 | They are boasting some incredible numbers, though to be honest I doubt their methodology on textual understanding, because, based on my experience alone, this model's understanding doesn't come close to chatGPT's advanced voice mode. But MiniCPM has been doing great visual understanding for a while, so the ChartQA and DocVQA scores are close to SOTA.
50 | 
51 | But all of this doesn't matter, because, I say again: just a little over a year ago, Google released a video announcing these capabilities, having an AI react to a video in real time, and it absolutely blew everyone away, and it was [FAKED](https://techcrunch.com/2023/12/07/googles-best-gemini-demo-was-faked/). And now, a year later, we have these capabilities, essentially, in an 8B model that runs on device 🤯
52 | 
53 | Voice & Audio
54 | 
55 | This week was very multimodal: not only did we get an omni-model from OpenBMB that can speak, and last week's Kokoro is still making waves, but there were also a lot of voice updates.
56 | 
57 | Kokoro.js - run the SOTA open TTS now in your browser
58 | 
59 | Thanks to friend of the pod Xenova (and the fact that Kokoro was released with ONNX weights), we now have kokoro.js, or npm i kokoro-js if you will.
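Here's roughly what using it looks like (a minimal sketch based on the kokoro-js examples at the time; the exact model id, dtype options and voice names are the library's and may have shifted since):

```typescript
import { KokoroTTS } from "kokoro-js";

// Downloads the ~80M-parameter Kokoro ONNX weights (quantized to keep the download small)
const tts = await KokoroTTS.from_pretrained("onnx-community/Kokoro-82M-ONNX", {
  dtype: "q8", // quantized weights; "fp32" for full precision
});

// Speech is generated entirely client-side, no server round-trip needed
const audio = await tts.generate("ThursdAI, recaps of the most high signal AI weekly spaces.", {
  voice: "af_bella", // one of the bundled voices
});
audio.save("audio.wav");
```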
60 | 
61 | This allows you to install and run Kokoro, the best tiny TTS model, completely within your browser, with a tiny 90MB download, and it sounds really good (demo [here](https://huggingface.co/spaces/webml-community/kokoro-web)).
62 | 
63 | Hailuo T2A - Emotional text to speech + API
64 | 
65 | Hailuo didn't rest on the laurels of releasing a huge-context LLM; they also released a new voice framework (though not open sourced) this week, and it sounds remarkably good (competing with 11labs).
66 | 
67 | They have all the standard features like voice cloning, but claim to have a way to preserve the emotional undertones of a voice. They also have 300 voices to choose from, and professional effects applied on the fly, like acoustics or telephone filters. (Remember, they have a video model as well, so presumably some of this is for holistic video production.)
68 | 
69 | What I specifically noticed is their "emotional intelligence system", which is either automatic or can be selected from a dropdown. I also noticed their "lax" copyright restrictions, as one of the voices, called "Imposing Queen", sounded just like a certain blonde-haired heiress to the iron throne from a certain HBO series.
70 | 
71 | When I generated a speech worthy of that queen, I noticed that the emotion sounded very much like an actress reading the lines, unlike any old TTS. Just listen to it in the clip above; I don't remember getting TTS outputs with this much emotion from anything, maybe outside of advanced voice mode! Quite impressive!
72 | 
73 | This Week's Buzz from Weights & Biases - AGENTS!
74 | 
75 | Breaking news from W&B, as our CTO [just broke](https://x.com/shawnup/status/1880004026957500434) the SWE-bench Verified SOTA with his own o1 agentic framework he calls W&B Programmer 😮, solving **64.6%** of the issues!
76 | 
77 | Shawn describes how he achieved this massive breakthrough [here](https://medium.com/@shawnup/the-best-ai-programmer-from-weights-biases-04cf8127afd8), and we'll be publishing more on this soon, but the highlight for me is that he ran over 900 evaluations during the course of this and tracked all of them in [Weave](https://wandb.ai/site/weave?utm_source=thursdai&utm_medium=referral&utm_campaign=Jan16)!
78 | 
79 | We also have an upcoming event in NY on Jan 22nd; if you're there, come by to learn how to evaluate your AI agents and RAG applications, and hang out with our team! (Sign up [here](https://lu.ma/eufkbeem?utm_source=thursdai&utm_medium=referral&utm_campaign=Jan16))
80 | 
81 | Big Companies & APIs
82 | 
83 | OpenAI adds chatGPT tasks - first agentic feature with more to come!
84 | 
85 | We finally get a glimpse of an agentic chatGPT, in the form of scheduled tasks! Deployed to all users, it is now possible to select gpt-4o with tasks and schedule tasks in the future.
86 | 
87 | You can schedule them in natural language; chatGPT will then execute a chat at the scheduled time (and maybe perform a search or do a calculation), and send you a notification (and an email!) when the task is done!
88 | 
89 | A bit underwhelming at first, as I didn't really find a good use for this yet, but I don't doubt that this is just a building block for something more agentic to come, something that can connect to my email or calendar and do actual tasks for me, not just... save me from typing the chatGPT query at "that time".
90 | 
91 | Mistral CodeStral 25.01 - a new #1 coding assistant model
92 | 
93 | An updated Codestral was released at the beginning of the week, and TBH I've never seen the vibes split this fast on a model.
94 | 
95 | While it's super exciting that Mistral has placed a coding model at #1 on the LMArena Copilot arena, near Claude 3.5 and DeepSeek, the fact that this new model was not released with open weights is really a bummer (especially given the trend I mentioned up top).
96 | 
97 | Open source in the West seems to be winding down, while the Chinese labs are absolutely crushing it (and releasing in the open, including weights and technical papers).
98 | 
99 | Mistral released this model via API and through a collab with the Continue.dev coding agent, but they used to be the darling of the open source community for releasing great models!
100 | 
101 | Also notable: a new benchmark was dropped very quickly post-release, showing a significant gap between their reported benchmarks and how the model performs on Aider's polyglot benchmark.
102 | 
103 | There were way more things this week than we were able to cover, including the new and exciting Transformer² architecture from Sakana, a new open source TTS with voice cloning, and a few other open source LLMs, one of which cost only $450 to train! All the links are in the TL;DR below!
104 | 
105 | TL;DR and show notes
106 | 
107 | * **Open Source LLMs**
108 | 
109 | * MiniMax-01 from Hailuo - 4M context 456B (45B A) LLM ([Github](https://github.com/MiniMax-AI/MiniMax-01), [HF](https://huggingface.co/MiniMaxAI), [Blog](https://www.minimaxi.com/en/news/minimax-01-series-2), [Report](https://filecdn.minimax.chat/_Arxiv_MiniMax_01_Report.pdf))
110 | 
111 | * Jina - reader V2 model - HTML 2 Markdown/JSON ([HF](https://huggingface.co/jinaai/ReaderLM-v2))
112 | 
113 | * InternLM3-8B-Instruct - Apache 2 license ([Github](https://github.com/InternLM/InternLM), [HF](https://huggingface.co/internlm))
114 | 
115 | * OpenBMB - **MiniCPM-o 2.6** - Multimodal Live Streaming on Your Phone ([HF](https://huggingface.co/openbmb/MiniCPM-o-2_6), [Github](https://github.com/OpenBMB/MiniCPM-o), [Demo](https://minicpm-omni-webdemo-us.modelbest.cn/))
116 | 
117 | * KyutAI - Helium-1 2B - Base ([X](https://x.com/kyutai_labs/thread/1878857673174864318), [HF](https://huggingface.co/kyutai/helium-1-preview-2b))
118 | 
119 | * Dria-Agent-α - 3B model that outputs python code ([HF](https://huggingface.co/driaforall/Dria-Agent-a-3B))
120 | 
121 | * Sky-T1, a ‘reasoning’ AI model that can be trained for less than $450 ([blog](https://novasky-ai.github.io/posts/sky-t1/))
122 | 
123 | * **Big CO LLMs + APIs**
124 | 
125 | * OpenAI launches ChatGPT tasks ([X](https://x.com/OpenAI/status/1879267274185756896))
126 | 
127 | * Mistral - new CodeStral 25.01 ([Blog](https://mistral.ai/news/codestral-2501/), no Weights)
128 | 
129 | * Sakana AI - Transformer²: Self-Adaptive LLMs ([Blog](https://sakana.ai/transformer-squared))
130 | 
131 | * **This week's Buzz**
132 | 
133 | * Evaluating RAG Applications Workshop - NY, Jan 22, W&B and PineCone ([Free Signup](https://lu.ma/eufkbeem))
134 | 
135 | * Our evaluations course is going very strong! 
(chat w/ Graham Neubig) ([https://wandb.me/evals-t](https://wandb.me/evals-t)) 136 | 137 | * **Vision & Video** 138 | 139 | * Luma releases Ray2 video model ([Web](https://lumalabs.ai/ray)) 140 | 141 | * **Voice & Audio** 142 | 143 | * Hailuo **T2A-01-HD** - Emotions Audio Model from Hailuo ([X](https://x.com/Hailuo_AI/status/1879554062993195421), [Try It](https://t.co/r58fjgvJ7w)) 144 | 145 | * OuteTTS 0.3 - 1B & 500M - zero shot voice cloning model ([HF](https://huggingface.co/collections/OuteAI/outetts-03-6786b1ebc7aeb757bc17a2fa)) 146 | 147 | * Kokoro.js - 80M SOTA TTS in your browser! (X, [Github](https://github.com/hexgrad/kokoro/pull/3), [try it](https://huggingface.co/spaces/webml-community/kokoro-web) ) 148 | 149 | * **AI Art & Diffusion & 3D** 150 | 151 | * Black Forest Labs - Finetuning for Flux Pro and Ultra via API ([Blog](https://blackforestlabs.ai/announcing-the-flux-pro-finetuning-api/)) 152 | 153 | * **Show Notes and other Links** 154 | 155 | * Hosts - Alex Volkov ([@altryne](https://x.com/altryne)), Wolfram RavenWlf ([@WolframRvnwlf](https://twitter.com/WolframRvnwlf)), Nisten Tahiraj ([@nisten](https://x.com/nisten/)) 156 | 157 | * Guest - Graham Neubig ([@gneubig](https://x.com/gneubig)) from All Hands AI ([@allhands_ai](https://x.com/allhands_ai)) 158 | 159 | * Graham’s mentioned Agents blogpost - 8 things that agents can do right [now](https://www.all-hands.dev/blog/8-use-cases-for-generalist-software-development-agents) 160 | 161 | * Projects - Open Hands (previously Open Devin) - [Github](https://github.com/All-Hands-AI/OpenHands) 162 | 163 | * Germany meetup in Cologne ([here](https://twitter.com/WolframRvnwlf/status/1877338980632383713)) 164 | 165 | * Toronto Tinkerer Meetup *Sold OUT* ([Here](https://toronto.aitinkerers.org/p/ai-tinkerers-toronto-january-2025-meetup-at-google)) 166 | 167 | * YaRN conversation we had with the Authors ([coverage](https://sub.thursdai.news/p/thursdai-sunday-special-extending?utm_source=publication-search)) 168 | 169 | See you folks next week! Have a great long weekend if you’re in the US 🫡 170 | 171 | Please help to promote the podcast and newsletter by sharing with a friend! 172 | 173 | 174 | 175 | Thank you for subscribing. [Leave a comment](https://sub.thursdai.news/p/thursdai-jan-16-2025-hailuo-4m-context/comments?utm_medium=podcast&utm_campaign=CTA_5) or [share this episode](https://sub.thursdai.news/p/thursdai-jan-16-2025-hailuo-4m-context?utm_source=substack&utm_medium=podcast&utm_content=share&action=share&token=eyJ1c2VyX2lkIjoxNTIyMTYxMTAsInBvc3RfaWQiOjE1NDk4NjQ5MywiaWF0IjoxNzY1MjQyMjg2LCJleHAiOjE3Njc4MzQyODYsImlzcyI6InB1Yi0xODAxMjI4Iiwic3ViIjoicG9zdC1yZWFjdGlvbiJ9.dhVHbEk4Kb2DLfejXT5cpNzGqSQ8lgTvCGBQSVFaFR0&utm_campaign=CTA_5). 176 | --------------------------------------------------------------------------------