├── 2024.02.17.sora_vs_gemini ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.02.18.ai_agents ├── README.md └── thumbnail.png ├── 2024.02.24.network_diffusion ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.02.25.longrope ├── README.md └── thumbnail.png ├── 2024.03.02.genie_world_model ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.03.03.humanoid_locomotion ├── README.md └── thumbnail.png ├── 2024.03.09.stable_diffusion_3 ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.03.10.triposr ├── README.md └── thumbnail.png ├── 2024.03.16.vision_motion_mamba ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.03.17.multi_modal_pretraining ├── README.md └── thumbnail.png ├── 2024.03.23.generative_3d ├── README.md ├── abstracts.txt ├── ideas.txt └── thumbnail.png ├── 2024.03.24.mindseye2_neuralink ├── README.md └── thumbnail.png ├── 2024.03.30.collective_intelligence ├── README.md └── thumbnail.png ├── 2024.03.31.gamba_jamba_cobra ├── README.md └── thumbnail.png ├── 2024.04.06.mixture_of_depths ├── README.md └── thumbnail.png ├── 2024.04.07.instantsplat_dust3r ├── README.md └── thumbnail.png ├── 2024.04.13.osworld_wilbur_gameagents ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.04.20.conscious_simulated_holograms ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.04.21.qstar ├── README.md └── thumbnail.png ├── 2024.04.27.openelm_phi3_qllama3 ├── README.md └── thumbnail.png ├── 2024.04.28.animals_vs_robots ├── README.md └── thumbnail.png ├── 2024.05.04.gsm1k_vlm_eval ├── README.md └── thumbnail.png ├── 2024.05.05.gaussian_surfels ├── README.md └── thumbnail.png ├── 2024.05.11.xlstm ├── README.md └── thumbnail.png ├── 2024.05.12.egocentric_vision_rl ├── README.md └── thumbnail.png ├── 2024.05.18.platonic_represenation_hypothesis ├── README.md └── thumbnail.png ├── 2024.05.19.building_multimodal_models ├── README.md └── thumbnail.png ├── 2024.05.25.thermodynamic_gradient_descent ├── README.md └── thumbnail.png ├── 2024.05.26.octo ├── README.md └── thumbnail.png ├── 2024.06.01.vidu4d ├── README.md └── thumbnail.png ├── 2024.06.02.road_less_scheduled ├── README.md └── thumbnail.png ├── 2024.06.15.karpamambathy_001 ├── README.md └── thumbnail.png ├── 2024.06.15.karpamambathy_002 ├── README.md └── thumbnail.png ├── 2024.07.05.image_generation_evaluation_cambrian_1 ├── README.md └── thumbnail.png ├── 2024.07.12.office_hours_001 ├── README.md └── thumbnail.png ├── 2024.07.19.office_hours_002 ├── README.md └── thumbnail.png ├── 2024.07.25.ai_content_creation ├── README.md └── thumbnail.png ├── 2024.07.26.llama3.1 ├── README.md └── thumbnail.png ├── 2024.08.02.sam2 ├── README.md └── thumbnail.png ├── 2024.08.09.simon ├── README.md └── thumbnail.png ├── 2024.08.14.ai_scientist ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.08.22.meshformer_vs_meshanything ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.08.30.diffusion_game_engine ├── README.md └── thumbnail.png ├── 2024.09.06.flux_music ├── README.md ├── repos.md └── thumbnail.png ├── 2024.09.13.strawberry ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.09.20.gaussian_robots ├── README.md ├── abstracts.txt ├── o1.txt ├── scrape_abstracts.py └── thumbnail.png ├── 2024.09.27.voice_mode ├── README.md ├── latency_viz.html └── thumbnail.png ├── 2024.10.04.generative_molecular_dynamics ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.10.11.meta_movie_gen ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.10.18.automated_ml_research ├── README.md └── thumbnail.png ├── 
2024.10.25.tiny_transformers ├── README.md └── thumbnail.png ├── 2024.11.01.rag ├── README.md └── thumbnail.png ├── 2024.11.08.tokenformer ├── README.md └── thumbnail.png ├── 2024.11.22.visual_reasoning ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.11.29.streaming_rl ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.12.06.video_generation ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.12.13.visual_autoregressive_modeling ├── README.md └── thumbnail.png ├── 2024.12.20.generative_latent_space_reasoning ├── README.md ├── ideas.txt └── thumbnail.png ├── 2025.01.10.agent_frameworks ├── README.md └── thumbnail.png ├── 2025.01.17.asi ├── README.md └── thumbnail.png ├── 2025.01.24.relative_entropy ├── README.md └── thumbnail.png ├── 2025.01.31.open_source_deepseek ├── README.md └── thumbnail.png ├── 2025.02.07.real2sim2real ├── README.md └── thumbnail.png ├── 2025.02.14.test_time_scaling ├── README.md └── thumbnail.png ├── 2025.05.23.robotics_vibe_coding ├── README.md └── thumbnail.png ├── 2025.05.30.cursor_vs_codex ├── README.md └── thumbnail.jpg ├── 2025.06.06.smolvla ├── README.md └── thumbnail.png ├── README.md └── template └── README.md /2024.02.17.sora_vs_gemini/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Sora vs Gemini 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/dPonS4kISPM 8 | 9 | **X:** https://twitter.com/i/broadcasts/1mnxepkDjrqJX 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **YouTube:** 18 | 19 | **TikTok:** 20 | 21 | **Reddit:** 22 | 23 | ### References 24 | 25 | - https://openai.com/research/video-generation-models-as-world-simulators 26 | - https://arxiv.org/pdf/2402.08268.pdf 27 | - https://x.com/gabor/status/1758282791547232482 28 | - https://arxiv.org/pdf/2402.08609.pdf 29 | - https://arxiv.org/pdf/2402.09727.pdf 30 | - https://x.com/n_karaev/status/1758554094132662636?s=20 31 | - https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf 32 | - https://arxiv.org/pdf/2310.01889.pdf 33 | - https://arxiv.org/pdf/2305.19370.pdf 34 | 35 | - sora is japanese for sky? 
36 | - https://www.reddit.com/r/aivideo/s/Rxnjud9UwI 37 | - https://x.com/sama/status/1758193609927721350 38 | - https://x.com/ylecun/status/1758760027203006952 39 | 40 | - https://twitter.com/_tim_brooks 41 | - https://scholar.google.com/citations?user=sonlKXIAAAAJ 42 | 43 | - https://twitter.com/billpeeb 44 | - https://www.wpeebles.com/ 45 | 46 | - https://twitter.com/model_mechanic 47 | - https://typeset.io/authors/aditya-ramesh-4xp87jcxw7 48 | - https://arxiv.org/abs/2204.06125 49 | - https://arxiv.org/abs/2102.12092 50 | 51 | ## Notes 52 | 53 | notes 54 | 55 | ### Blog 56 | 57 | notes 58 | 59 | ### Vertical Video 60 | 61 | title 62 | description 63 | hashtags 64 | -------------------------------------------------------------------------------- /2024.02.17.sora_vs_gemini/ideas.txt: -------------------------------------------------------------------------------- 1 | https://mrtornado24.github.io/DreamCraft3D/ 2 | https://arxiv.org/abs/2310.16818 3 | https://www.youtube.com/watch?v=0FazXENkQms 4 | https://arxiv.org/abs/2402.07872 5 | https://arxiv.org/abs/2402.07896 6 | https://arxiv.org/abs/2402.06196 7 | https://arxiv.org/abs/2401.14953 8 | https://arxiv.org/pdf/2012.13635.pdf 9 | https://virl-platform.github.io/ 10 | https://arxiv.org/abs/2308.12038 11 | https://shengdinghu.notion.site/MiniCPM-Unveiling-the-Potential-of-End-side-Large-Language-Models-d4d3a8c426424654a4e80e42a711cb20 12 | https://arxiv.org/pdf/2402.03246.pdf 13 | https://video-lavit.github.io/ 14 | https://cvpr.thecvf.com/Conferences/2024/workshop-list 15 | https://huggingface.co/papers/2402.04236 16 | https://aloha-2.github.io/ 17 | https://x.com/arankomatsuzaki/status/1755059020438667451?s=12&t=BSlMYtugAr8LLWXLnMLLrQ 18 | https://x.com/ramealexandre/status/1755569564220375470?s=46&t=vuxEhKdai50YCRFE9TYORA 19 | https://x.com/_akhaliq/status/1755422567358246976?s=46&t=vuxEhKdai50YCRFE9TYORA 20 | https://www.youtube.com/watch?v=c2hNU93xbKk 21 | https://chao1224.github.io/ProteinDT 22 | https://chao1224.github.io/MoleculeSTM 23 | https://arxiv.org/pdf/2402.03620.pdf 24 | https://arxiv.org/abs/2312.06837 25 | https://x.com/HazanPrinceton/status/1756097344347525531?s=20 26 | https://arxiv.org/abs/2312.12337 27 | https://arxiv.org/pdf/2402.03902.pdf 28 | https://arxiv.org/abs/2402.04362 29 | https://arxiv.org/abs/2402.03902 (duplicate) 30 | https://arxiv.org/abs/2402.05929 31 | https://arxiv.org/abs/2402.05120 32 | https://boximator.github.io/ 33 | https://www.youtube.com/watch?v=reto_TYsYyQ (duplicate) 34 | https://boximator.github.io/assets/C2V/10-Skull%20in%20the%20watercolor%20style.%20Butterflies%20are%20flying%20around..mp4 35 | https://boximator.github.io/assets/C2V/34-An%20astronaut%20is%20skateboarding%20on%20the%20moon..mp4 36 | https://github.com/thomas0809/textreact 37 | https://youtu.be/1qyTXnPsmTs?si=84gVhobkd2xhzJqf 38 | https://os-copilot.github.io/ 39 | https://arxiv.org/abs/2402.06196 40 | https://arxiv.org/abs/2402.07896 41 | https://arxiv.org/abs/2402.07872 42 | https://youtu.be/0FazXENkQms?si=1ea0K6319mvILz8W 43 | https://openai.com/research/video-generation-models-as-world-simulators 44 | https://arxiv.org/pdf/2402.08268.pdf 45 | 46 | gaussian splat statistics 47 | https://arxiv.org/pdf/2402.04362.pdf 48 | https://arxiv.org/pdf/2312.12337.pdf 49 | https://arxiv.org/pdf/2310.16818.pdf 50 | 51 | ai agents 52 | https://arxiv.org/pdf/2402.05929.pdf 53 | https://arxiv.org/pdf/2402.05120.pdf 54 | https://arxiv.org/pdf/2402.07872.pdf 55 | 56 | sora vs gemini 57 | https://x.com/gabor/status/1758282791547232482 
-------------------------------------------------------------------------------- /2024.02.17.sora_vs_gemini/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.02.17.sora_vs_gemini/thumbnail.png -------------------------------------------------------------------------------- /2024.02.18.ai_agents/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # AI Agents 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/H5yd-uh9acY 8 | 9 | **X:** https://twitter.com/i/broadcasts/1PlJQDznkyVGE 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **YouTube:** 18 | 19 | **TikTok:** 20 | 21 | **Reddit:** 22 | 23 | ### References 24 | 25 | - https://arxiv.org/pdf/2402.05929.pdf 26 | - https://arxiv.org/pdf/2402.05120.pdf 27 | - https://arxiv.org/pdf/2402.07872.pdf 28 | - https://arxiv.org/pdf/2402.07939.pdf 29 | - https://arxiv.org/pdf/2402.07456.pdf 30 | - https://arxiv.org/pdf/2402.10200.pdf 31 | - https://medium.com/mlearning-ai/softmax-temperature-5492e4007f71 32 | - https://github.com/karpathy/minbpe/tree/master 33 | 34 | ## Notes 35 | 36 | notes 37 | 38 | ### Blog 39 | 40 | notes 41 | 42 | ### Vertical Video 43 | 44 | title 45 | description 46 | hashtags 47 | -------------------------------------------------------------------------------- /2024.02.18.ai_agents/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.02.18.ai_agents/thumbnail.png -------------------------------------------------------------------------------- /2024.02.24.network_diffusion/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Neural Network Diffusion 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/dBYp1GI_JW0 8 | 9 | **X:** https://twitter.com/i/broadcasts/1djxXNXoEwdGZ 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **YouTube:** 18 | 19 | **TikTok:** 20 | 21 | **Reddit:** 22 | 23 | ### References 24 | 25 | https://arxiv.org/pdf/2402.13144.pdf 26 | 27 | https://www.emergentmind.com/papers/2402.13144 28 | 29 | https://arxiv.org/pdf/2305.02463.pdf 30 | 31 | https://github.com/NUS-HPC-AI-Lab/Neural-Network-Diffusion/tree/main 32 | 33 | https://tensorspace.org/html/playground/resnet50.html 34 | 35 | https://arxiv.org/abs/1803.03635 36 | 37 | https://arxiv.org/pdf/2402.04291.pdf 38 | 39 | https://arxiv.org/pdf/2401.15024.pdf 40 | 41 | https://arxiv.org/pdf/2209.12892.pdf 42 | 43 | https://arxiv.org/abs/2303.17015 44 | 45 | CNNs and MLPs have a type of "spatial invariance" similar to images and thus you can use augmentation techniques. 46 | Diffusion can be done directly in model parameter space, but also a latent parameter space. Only simple versions of both are shown. 
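A toy sketch of the note above (diffusion over flattened parameters via a latent parameter space), assuming PyTorch. The dimensions, architecture, and DDPM-style noise schedule are illustrative placeholders, not the paper's setup, and the autoencoder plus latent denoiser are collapsed into one training loss here for brevity (the paper trains them in two stages; see the NUS-HPC-AI-Lab repo linked above for the real code).

```python
import torch
import torch.nn as nn

D, LATENT, T = 512, 64, 1000   # flattened-parameter dim, latent dim, diffusion steps (illustrative)

# 1) autoencoder over flattened parameter vectors (the "latent parameter space")
encoder = nn.Sequential(nn.Linear(D, 256), nn.SiLU(), nn.Linear(256, LATENT))
decoder = nn.Sequential(nn.Linear(LATENT, 256), nn.SiLU(), nn.Linear(256, D))

# 2) small denoiser that predicts the noise added to a latent at timestep t
class Denoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT + 1, 256), nn.SiLU(), nn.Linear(256, LATENT))
    def forward(self, z_t, t):
        # condition on the (normalized) timestep by simple concatenation
        return self.net(torch.cat([z_t, t.float().unsqueeze(-1) / T], dim=-1))

denoiser = Denoiser()
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters(), *denoiser.parameters()], lr=1e-4)

def train_step(param_batch):
    """param_batch: (B, D) flattened weights collected from saved checkpoints."""
    z = encoder(param_batch)
    t = torch.randint(0, T, (z.shape[0],))
    eps = torch.randn_like(z)
    a = alpha_bar[t].unsqueeze(-1)
    z_t = a.sqrt() * z + (1 - a).sqrt() * eps               # forward diffusion in latent space
    loss = ((denoiser(z_t, t) - eps) ** 2).mean()           # noise-prediction objective
    loss = loss + ((decoder(z) - param_batch) ** 2).mean()  # keep the latent decodable back to params
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# after training: sample a latent with the reverse process, decode it, and load
# the resulting vector back into the target layer's weights
```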
47 | 48 | ## Notes 49 | 50 | notes 51 | 52 | ### Blog 53 | 54 | notes 55 | 56 | ### Vertical Video 57 | 58 | title 59 | description 60 | hashtags 61 | -------------------------------------------------------------------------------- /2024.02.24.network_diffusion/ideas.txt: -------------------------------------------------------------------------------- 1 | https://arxiv.org/pdf/2402.13753.pdf 2 | https://arxiv.org/pdf/2402.12479.pdf 3 | 4 | Neural Network Diffusion 5 | https://arxiv.org/pdf/2402.13144.pdf 6 | 7 | multi-modal datasets 8 | https://arxiv.org/pdf/2402.13232.pdf 9 | https://arxiv.org/pdf/2402.13349.pdf 10 | -------------------------------------------------------------------------------- /2024.02.24.network_diffusion/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.02.24.network_diffusion/thumbnail.png -------------------------------------------------------------------------------- /2024.02.25.longrope/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # LongRoPE 4 | 5 | **YouTube:** https://youtube.com/live/PFxi6SmozZ4 6 | 7 | **X:** https://twitter.com/i/broadcasts/1BRJjPkYWzvKw 8 | 9 | **Twitch:** 10 | 11 | **Substack:** 12 | 13 | **ResearchHub:** 14 | 15 | **YouTube:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://arxiv.org/pdf/2402.13753.pdf 24 | 25 | https://arxiv.org/pdf/2306.15595.pdf 26 | 27 | https://arxiv.org/pdf/2104.09864.pdf 28 | 29 | https://en.wikipedia.org/wiki/Euler%27s_identity 30 | 31 | https://miro.medium.com/v2/resize:fit:1100/1*9qamOSV15cnvInkNhzr5YA.gif 32 | 33 | https://upload.wikimedia.org/wikipedia/commons/b/bd/Sine_and_cosine_animation.gif 34 | 35 | https://bbycroft.net/llm 36 | 37 | https://assets.tivadardanka.com/2022_11_the_dot_product_11_aec50c8a30.png 38 | 39 | https://en.wikipedia.org/wiki/Dot_product 40 | 41 | Will we eventually learn these rather than hand design these? 
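A minimal sketch of plain rotary position embeddings from the Su et al. paper above, assuming PyTorch and the split-halves pairing convention; `head_dim`, `base`, and the shapes are illustrative. LongRoPE's contribution is a searched, non-uniform rescaling of these per-dimension frequencies to stretch the context window; this sketch shows only the vanilla rotation.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (batch, seq_len, head_dim)."""
    _, n, d = x.shape
    half = d // 2
    # per-pair rotation frequency: theta_i = base^(-2i/d), i = 0 .. d/2 - 1
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(n, dtype=torch.float32)[:, None] * freqs[None, :]  # (n, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # rotate each (x1_i, x2_i) pair by its position-dependent angle
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# queries and keys get the same rotation, so q·k depends only on their relative
# offset -- the property RoPE (and LongRoPE's frequency rescaling) relies on
q = rope(torch.randn(1, 8, 64))
k = rope(torch.randn(1, 8, 64))
attn_scores = q @ k.transpose(-1, -2)
```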
42 | 43 | Position embeddings are the new Gabor filters 44 | 45 | https://qph.cf2.quoracdn.net/main-qimg-b662a8fc3be57f76c708c171fcf29960 46 | 47 | https://miro.medium.com/v2/resize:fit:611/1*qciBJK9ofqkMLnyJrRAQQQ.png 48 | 49 | https://i.stack.imgur.com/bN2iA.png 50 | 51 | http://www.incompleteideas.net/IncIdeas/BitterLesson.html 52 | 53 | ## Notes 54 | 55 | notes 56 | 57 | ### Blog 58 | 59 | notes 60 | 61 | ### Vertical Video 62 | 63 | title 64 | description 65 | hashtags 66 | -------------------------------------------------------------------------------- /2024.02.25.longrope/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.02.25.longrope/thumbnail.png -------------------------------------------------------------------------------- /2024.03.02.genie_world_model/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Genie World Model 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/wSPLfoq5w68 8 | 9 | **X:** https://twitter.com/i/broadcasts/1rmxPMRqrNVKN 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **YouTube:** 18 | 19 | **TikTok:** 20 | 21 | **Reddit:** 22 | 23 | ### References 24 | 25 | Genie: Generative Interactive Environments 26 | https://arxiv.org/pdf/2402.15391.pdf 27 | 28 | https://arxiv.org/pdf/1803.10122.pdf 29 | 30 | https://arxiv.org/pdf/1711.00937.pdf 31 | 32 | https://lilianweng.github.io/posts/2018-08-12-vae/autoencoder-architecture.png 33 | 34 | https://miro.medium.com/v2/resize:fit:1152/1*L-VCpM3FfJ7hqlbEfwFJ9g.gif 35 | 36 | https://arxiv.org/pdf/2001.02908.pdf 37 | 38 | https://worldmodels.github.io/ 39 | 40 | https://miro.medium.com/v2/resize:fit:1400/1*ReJYLqkRu7IU5k9O0ePk1w.png 41 | 42 | https://machinelearningmastery.com/wp-content/uploads/2019/06/Example-of-How-Increased-Distortion-of-an-Image-Correlates-with-High-FID-Score.png 43 | 44 | ## Notes 45 | 46 | notes 47 | 48 | ### Blog 49 | 50 | notes 51 | 52 | ### Vertical Video 53 | 54 | title 55 | description 56 | hashtags 57 | -------------------------------------------------------------------------------- /2024.03.02.genie_world_model/ideas.txt: -------------------------------------------------------------------------------- 1 | https://arxiv.org/abs/2306.02531 2 | https://arxiv.org/abs/2205.14217 3 | https://arxiv.org/abs/2305.14671 4 | https://eureka-research.github.io/ 5 | https://arxiv.org/abs/2308.11339 6 | https://github.com/CraftJarvis/GROOT 7 | https://pku-proagent.github.io/ 8 | https://miao-ai-lab.github.io/LARP/static/LARP.pdf 9 | https://arxiv.org/pdf/2402.06782.pdf 10 | https://github.com/ucl-dark/llm_debate 11 | https://github.com/Vaibhavs10/insanely-fast-whisper 12 | https://github.com/phodal/chapi 13 | https://chapi.phodal.com/usecases 14 | https://github.com/unit-mesh/chocolate-factory 15 | https://github.com/mengjian-github/copilot-analysis 16 | https://github.com/unit-mesh/auto-dev 17 | https://github.com/unit-mesh/devops-genius/ 18 | https://github.com/unit-mesh/build-your-ai-coding-assistant 19 | https://unitmesh.cc/ 20 | https://x.com/_akhaliq/status/1761957701402189826?s=20 21 | https://arxiv.org/abs/2402.15555 22 | https://arxiv.org/pdf/2402.14658.pdf 23 | https://huggingface.co/datasets/m-a-p/Code-Feedback 24 | https://www.together.ai/blog/evo 25 | https://x.com/mckaywrigley/status/1762175776726663267?s=20 26 | 
https://github.com/trueagi-io/hyperon-experimental 27 | https://github.com/opencog/opencog 28 | https://arxiv.org/abs/2402.07148 29 | 30 | 31 | https://arxiv.org/pdf/2402.17762.pdf 32 | https://arxiv.org/pdf/2402.17139.pdf 33 | https://arxiv.org/pdf/2312.07504.pdf 34 | 35 | Sora deep dive? 36 | https://arxiv.org/pdf/2402.17177.pdf 37 | https://arxiv.org/pdf/2402.17403.pdf 38 | 39 | https://arxiv.org/pdf/2402.03099v1.pdf 40 | https://arxiv.org/pdf/2402.15151v1.pdf 41 | https://arxiv.org/pdf/2402.13616v1.pdf 42 | 43 | Gaussian Splat stream 44 | https://arxiv.org/pdf/2402.17427.pdf 45 | 46 | 1Bit LLMs popping off right now 47 | https://www.emergentmind.com/papers/2402.17764 48 | 49 | Genie interactive experiments, Jim Fan says this is best paper 2024 50 | https://arxiv.org/pdf/2402.15391.pdf 51 | -------------------------------------------------------------------------------- /2024.03.02.genie_world_model/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.02.genie_world_model/thumbnail.png -------------------------------------------------------------------------------- /2024.03.03.humanoid_locomotion/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Humanoid Locomotion 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/LLprTlQsIJc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1zqJVqVRbAmGB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **YouTube:** 18 | 19 | **TikTok:** 20 | 21 | **Reddit:** 22 | 23 | ### References 24 | 25 | Humanoid Locomotion as Next Token Prediction 26 | https://arxiv.org/pdf/2402.19469.pdf 27 | 28 | 29 | https://www.reddit.com/r/singularity/comments/1b3s7k4/unitree_h1_humanoid_robot_breaks_the_world_record/ 30 | 31 | https://agilityrobotics.com/ 32 | 33 | https://bostondynamics.com/ 34 | 35 | https://www.figure.ai/ 36 | 37 | https://twitter.com/Tesla_Optimus?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor 38 | 39 | https://sanctuary.ai/ 40 | 41 | https://fl0under.github.io/robotics-notes/images/inverse-kinematics-02-solutions.PNG 42 | 43 | https://cdn0.slideserve.com/814485/simple-pendulum-n.jpg 44 | 45 | https://www.ubtrobot.com/# 46 | 47 | https://shop.unitree.com/products/unitree-h1 48 | 49 | ## Notes 50 | 51 | notes 52 | 53 | ### Blog 54 | 55 | notes 56 | 57 | ### Vertical Video 58 | 59 | title 60 | description 61 | hashtags 62 | -------------------------------------------------------------------------------- /2024.03.03.humanoid_locomotion/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.03.humanoid_locomotion/thumbnail.png -------------------------------------------------------------------------------- /2024.03.09.stable_diffusion_3/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Stable Diffusion 3 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/yTXMK2TZOZc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1dRKZERqblQxB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://stability.ai/news/stable-diffusion-3-research-paper 24 | 25 | https://arxiv.org/pdf/2403.03206.pdf 26 | 27 | 
https://x.com/hashtag/SD3 28 | 29 | https://www.cs.utexas.edu/~lqiang/rectflow/html/intro.html 30 | 31 | https://sander.ai/2024/02/28/paradox.html 32 | 33 | https://www.researchgate.net/profile/Daniel-Vieira-20/publication/333221048/figure/fig3/AS:760646077530114@1558363565757/A-vector-field-visualized-with-two-different-techniques.ppm 34 | 35 | https://paperswithcode.com/dataset/cc12m 36 | 37 | https://www.researchgate.net/profile/Xifeng-Guo/publication/320658590/figure/fig1/AS:614154637418504@1523437284408/The-structure-of-proposed-Convolutional-AutoEncoders-CAE-for-MNIST-In-the-middle-there.png 38 | 39 | ## Notes 40 | 41 | notes 42 | 43 | ### Blog 44 | 45 | notes 46 | 47 | ### Vertical Video 48 | 49 | title 50 | description 51 | hashtags 52 | -------------------------------------------------------------------------------- /2024.03.09.stable_diffusion_3/ideas.txt: -------------------------------------------------------------------------------- 1 | https://arxiv.org/abs/2402.17810 2 | https://arxiv.org/pdf/2403.03507.pdf 3 | https://arxiv.org/pdf/2403.03870.pdf 4 | https://arxiv.org/pdf/2403.03950.pdf 5 | 6 | 7 | 3D 8 | https://arxiv.org/pdf/2403.01807 9 | 10 | Quantum 11 | https://arxiv.org/abs/2403.01775 12 | https://arxiv.org/abs/2403.00910 13 | https://arxiv.org/abs/2203.01340 14 | https://arxiv.org/pdf/2311.04186.pdf 15 | 16 | Stable Diffusion 3 17 | https://arxiv.org/pdf/2403.03206.pdf 18 | -------------------------------------------------------------------------------- /2024.03.09.stable_diffusion_3/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.09.stable_diffusion_3/thumbnail.png -------------------------------------------------------------------------------- /2024.03.10.triposr/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # TripoSR: 3D from 2D 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/9eKmzpaPyXs 8 | 9 | **X:** https://twitter.com/i/broadcasts/1yoJMwDVZwOKQ 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **YouTube:** 18 | 19 | **TikTok:** 20 | 21 | **Reddit:** 22 | 23 | ### References 24 | 25 | https://arxiv.org/pdf/2403.02151.pdf 26 | 27 | https://arxiv.org/pdf/2311.04400.pdf 28 | 29 | https://huggingface.co/spaces/stabilityai/TripoSR 30 | 31 | https://arxiv.org/pdf/2112.07945.pdf 32 | 33 | https://richzhang.github.io/PerceptualSimilarity/ 34 | 35 | codebase summary with Claude 36 | codebase summary with GPT 37 | 38 | ## Notes 39 | 40 | notes 41 | 42 | ### Blog 43 | 44 | notes 45 | 46 | ### Vertical Video 47 | 48 | title 49 | description 50 | hashtags 51 | -------------------------------------------------------------------------------- /2024.03.10.triposr/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.10.triposr/thumbnail.png -------------------------------------------------------------------------------- /2024.03.16.vision_motion_mamba/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Video Mamba vs Motion Mamba 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/rzXjKcqkjxM 8 | 9 | **X:** https://twitter.com/i/broadcasts/1MYGNoYzLWwJw 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 
| **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://arxiv.org/pdf/2403.06977.pdf 24 | 25 | https://arxiv.org/pdf/2403.07487.pdf 26 | 27 | https://en.wikipedia.org/wiki/Zero-order_hold#/media/File:Zeroorderhold.signal.svg 28 | 29 | https://www.youtube.com/watch?v=oSCRZkSQ1CE 30 | 31 | Motion Synthesis Benchmarks 32 | https://paperswithcode.com/task/motion-synthesis 33 | 34 | 35 | Feature Hierarchy 36 | https://www.researchgate.net/profile/Sepp-Hochreiter/publication/285593237/figure/fig3/AS:324780386471943@1454445086972/Hierarchical-composition-of-complex-features-DNNs-build-a-feature-from-simpler-parts-A.png 37 | 38 | ## Notes 39 | 40 | notes 41 | 42 | ### Blog 43 | 44 | notes 45 | 46 | ### Vertical Video 47 | 48 | title 49 | description 50 | hashtags 51 | -------------------------------------------------------------------------------- /2024.03.16.vision_motion_mamba/ideas.txt: -------------------------------------------------------------------------------- 1 | https://arxiv.org/pdf/2403.08081.pdf 2 | 3 | https://arxiv.org/pdf/2403.08764.pdf 4 | 5 | https://arxiv.org/pdf/2403.08551.pdf 6 | 7 | https://arxiv.org/pdf/2403.07128.pdf 8 | 9 | video mamba vs motion mamba 10 | https://arxiv.org/pdf/2403.06977.pdf 11 | https://arxiv.org/pdf/2403.07487.pdf 12 | 13 | llm pretraining 14 | https://arxiv.org/pdf/2403.08763.pdf 15 | https://arxiv.org/pdf/2403.09611.pdf -------------------------------------------------------------------------------- /2024.03.16.vision_motion_mamba/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.16.vision_motion_mamba/thumbnail.png -------------------------------------------------------------------------------- /2024.03.17.multi_modal_pretraining/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Multi-Modal Pre-training (Apple's MM1) 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/viiB3JmK21M 8 | 9 | **X:** https://twitter.com/i/broadcasts/1dRKZEXgmLaxB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training 24 | https://arxiv.org/pdf/2403.09611.pdf 25 | 26 | Simple and Scalable Strategies to Continually Pre-train Large Language Models 27 | https://arxiv.org/pdf/2403.08763.pdf 28 | 29 | C-Abstractor 30 | https://arxiv.org/abs/2312.06742 31 | 32 | Mo patches mo tokens 33 | https://i0.wp.com/syncedreview.com/wp-content/uploads/2022/12/image-54.png?resize=593%2C700&ssl=1 34 | 35 | Pooling 36 | https://www.researchgate.net/publication/333593451/figure/fig2/AS:765890261966848@1559613876098/Illustration-of-Max-Pooling-and-Average-Pooling-Figure-2-above-shows-an-example-of-max.png 37 | 38 | DFN 5B 39 | https://huggingface.co/apple/DFN5B-CLIP-ViT-H-14 40 | 41 | AXLearn 42 | https://github.com/apple/axlearn 43 | 44 | ## Notes 45 | 46 | notes 47 | 48 | ### Blog 49 | 50 | notes 51 | 52 | ### Vertical Video 53 | 54 | title 55 | description 56 | hashtags 57 | -------------------------------------------------------------------------------- /2024.03.17.multi_modal_pretraining/thumbnail.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.17.multi_modal_pretraining/thumbnail.png -------------------------------------------------------------------------------- /2024.03.23.generative_3d/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Generative 3D 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/IsRHGf2rGCs 8 | 9 | **X:** https://twitter.com/i/broadcasts/1YpKkwdoMXYKj 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models 24 | https://arxiv.org/pdf/2403.12034.pdf 25 | 26 | SV3D 27 | https://arxiv.org/pdf/2403.12008.pdf 28 | 29 | GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation 30 | https://arxiv.org/pdf/2403.12365.pdf 31 | 32 | Generic 3D Diffusion Adapter Using Controlled Multi-View Editing 33 | https://arxiv.org/pdf/2403.12032.pdf 34 | 35 | Compress3D: a Compressed Latent Space for 3D Generation from a Single Image 36 | https://arxiv.org/pdf/2403.13524.pdf 37 | 38 | LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation 39 | https://arxiv.org/pdf/2403.12019.pdf 40 | 41 | ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance 42 | https://arxiv.org/pdf/2403.12409.pdf 43 | 44 | DreamReward: Text-to-3D Generation with Human Preference 45 | https://arxiv.org/pdf/2403.14613.pdf 46 | 47 | GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation 48 | https://arxiv.org/pdf/2403.14621.pdf 49 | 50 | ## Notes 51 | 52 | notes 53 | 54 | ### Blog 55 | 56 | notes 57 | 58 | ### Vertical Video 59 | 60 | title 61 | description 62 | hashtags 63 | -------------------------------------------------------------------------------- /2024.03.23.generative_3d/abstracts.txt: -------------------------------------------------------------------------------- 1 | VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models 2 | This paper presents a novel paradigm for building scalable 3D generative models utilizing pre-trained video diffusion models. The primary obstacle in developing foundation 3D generative models is the limited availability of 3D data. Unlike images, texts, or videos, 3D data are not readily accessible and are difficult to acquire. This results in a significant disparity in scale compared to the vast quantities of other types of data. To address this issue, we propose using a video diffusion model, trained with extensive volumes of text, images, and videos, as a knowledge source for 3D data. By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model. The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view data, can generate a 3D asset from a single image in seconds and achieves superior performance when compared to current SOTA feed-forward 3D generative models, with users preferring our results over 70% of the time. 3 | https://arxiv.org/pdf/2403.12034.pdf 4 | 5 | SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion 6 | We present Stable Video 3D (SV3D) -- a latent video diffusion model for high-resolution, image-to-multi-view generation of orbital videos around a 3D object. 
Recent work on 3D generation propose techniques to adapt 2D generative models for novel view synthesis (NVS) and 3D optimization. However, these methods have several disadvantages due to either limited views or inconsistent NVS, thereby affecting the performance of 3D object generation. In this work, we propose SV3D that adapts image-to-video diffusion model for novel multi-view synthesis and 3D generation, thereby leveraging the generalization and multi-view consistency of the video models, while further adding explicit camera control for NVS. We also propose improved 3D optimization techniques to use SV3D and its NVS outputs for image-to-3D generation. Extensive experimental results on multiple datasets with 2D and 3D metrics as well as user study demonstrate SV3D's state-of-the-art performance on NVS as well as 3D reconstruction compared to prior works. 7 | https://arxiv.org/pdf/2403.12008.pdf 8 | 9 | GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation 10 | Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be efficiently obtained by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for contents with rich motions that are hard to be handled by existing methods. The common color drifting issue that happens in 4D generation is also resolved with improved Guassian dynamics. Superior visual quality on extensive experiments demonstrates our method's effectiveness. Quantitative and qualitative evaluations show that our method achieves state-of-the-art results on both tasks of 4D generation and 4D novel view synthesis. Project page: https://zerg-overmind.github.io/GaussianFlow.github.io/ 11 | https://arxiv.org/pdf/2403.12365.pdf 12 | 13 | Generic 3D Diffusion Adapter Using Controlled Multi-View Editing 14 | Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity. To bridge this gap, recent works have investigated multi-view diffusion but often fall short in either 3D consistency, visual quality, or efficiency. This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denoise multi-view images and output high-quality textured meshes. Built on off-the-shelf 2D diffusion models, MVEdit achieves 3D consistency through a training-free 3D Adapter, which lifts the 2D views of the last timestep into a coherent 3D representation, then conditions the 2D views of the next timestep using rendered views, without uncompromising visual quality. With an inference time of only 2-5 minutes, this framework achieves better trade-off between quality and speed than score distillation. MVEdit is highly versatile and extendable, with a wide range of applications including text/image-to-3D generation, 3D-to-3D editing, and high-quality texture synthesis. 
In particular, evaluations demonstrate state-of-the-art performance in both image-to-3D and text-guided texture generation tasks. Additionally, we introduce a method for fine-tuning 2D latent diffusion models on small 3D datasets with limited resources, enabling fast low-resolution text-to-3D initialization. 15 | https://arxiv.org/pdf/2403.12032.pdf 16 | 17 | Compress3D: a Compressed Latent Space for 3D Generation from a Single Image 18 | 3D generation has witnessed significant advancements, yet efficiently producing high-quality 3D assets from a single image remains challenging. In this paper, we present a triplane autoencoder, which encodes 3D models into a compact triplane latent space to effectively compress both the 3D geometry and texture information. Within the autoencoder framework, we introduce a 3D-aware cross-attention mechanism, which utilizes low-resolution latent representations to query features from a high-resolution 3D feature volume, thereby enhancing the representation capacity of the latent space. Subsequently, we train a diffusion model on this refined latent space. In contrast to solely relying on image embedding for 3D generation, our proposed method advocates for the simultaneous utilization of both image embedding and shape embedding as conditions. Specifically, the shape embedding is estimated via a diffusion prior model conditioned on the image embedding. Through comprehensive experiments, we demonstrate that our method outperforms state-of-the-art algorithms, achieving superior performance while requiring less training data and time. Our approach enables the generation of high-quality 3D assets in merely 7 seconds on a single A100 GPU. 19 | https://arxiv.org/pdf/2403.13524.pdf 20 | 21 | LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation 22 | The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harnesses a 3D-aware architecture and variational autoencoder (VAE) to encode the input image into a structured, compact, and 3D latent space. The latent is decoded by a transformer-based decoder into a high-capacity 3D neural field. Through training a diffusion model on this 3D-aware latent space, our method achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation across various datasets. Moreover, it surpasses existing 3D diffusion methods in terms of inference speed, requiring no per-instance optimization. Our proposed LN3Diff presents a significant advancement in 3D generative modeling and holds promise for various applications in 3D vision and graphics tasks. 23 | https://arxiv.org/pdf/2403.12019.pdf 24 | 25 | ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance 26 | Generating high-quality 3D assets from a given image is highly desirable in various applications such as AR/VR. Recent advances in single-image 3D generation explore feed-forward models that learn to infer the 3D model of an object without optimization. 
Though promising results have been achieved in single object generation, these methods often struggle to model complex 3D assets that inherently contain multiple objects. In this work, we present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models. 1) We first perform an in-depth analysis of this ``multi-object gap'' from both model and data perspectives. 2) Next, with reconstructed 3D models of different objects, we seek to adjust their sizes, rotation angles, and locations to create a 3D asset that matches the given image. 3) To automate this process, we apply spatially-aware score distillation sampling (SSDS) from pretrained diffusion models to guide the positioning of objects. Our proposed framework emphasizes spatial alignment of objects, compared with standard score distillation sampling, and thus achieves more accurate results. Extensive experiments validate ComboVerse achieves clear improvements over existing methods in generating compositional 3D assets. 27 | https://arxiv.org/pdf/2403.12409.pdf 28 | 29 | DreamReward: Text-to-3D Generation with Human Preference 30 | 3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, coined DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons based on a systematic annotation pipeline including rating and ranking. Then, we build Reward3D -- the first general-purpose text-to-3D human preference reward model to effectively encode human preferences. Building upon the 3D reward model, we finally perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL), a direct tuning algorithm to optimize the multi-view diffusion models with a redefined scorer. Grounded by theoretical proof and extensive experiment comparisons, our DreamReward successfully generates high-fidelity and 3D consistent results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential for learning from human feedback to improve text-to-3D models. 31 | https://arxiv.org/pdf/2403.14613.pdf 32 | 33 | GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation 34 | We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer architecture and the use of 3D Gaussians unlock a scalable and efficient reconstruction framework. Extensive experimental results demonstrate the superiority of our method over alternatives regarding both reconstruction quality and efficiency. We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models. Our project website is at: https://justimyhxu.github.io/projects/grm/. 
35 | https://arxiv.org/pdf/2403.14621.pdf 36 | -------------------------------------------------------------------------------- /2024.03.23.generative_3d/ideas.txt: -------------------------------------------------------------------------------- 1 | SlimmeRF: Slimmable Radiance Fields 2 | https://arxiv.org/pdf/2312.10034.pdf 3 | 4 | Simple and Scalable Strategies to Continually Pre-train Large Language Models 5 | https://arxiv.org/pdf/2403.08763.pdf 6 | 7 | LLM Agents can Autonomously Hack Websites 8 | https://arxiv.org/pdf/2402.06664.pdf 9 | 10 | NON-NEGATIVE CONTRASTIVE LEARNING 11 | https://arxiv.org/pdf/2403.12459.pdf 12 | 13 | Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers 14 | https://arxiv.org/pdf/2403.12943.pdf 15 | 16 | Evolutionary Optimization of Model Merging Recipes 17 | https://arxiv.org/pdf/2403.13187.pdf 18 | 19 | Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation 20 | https://arxiv.org/pdf/2403.12015.pdf 21 | 22 | MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data 23 | https://arxiv.org/pdf/2403.11207.pdf 24 | 25 | Neuralink 26 | https://neuralink.com/blog/understanding-preclinical-research-at-neuralink/ 27 | 28 | GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians 29 | https://arxiv.org/pdf/2402.10483v1.pdf 30 | 31 | Generative 3D 32 | 33 | VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models 34 | https://arxiv.org/pdf/2403.12034.pdf 35 | 36 | SV3D 37 | https://arxiv.org/pdf/2403.12008.pdf 38 | 39 | GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation 40 | https://arxiv.org/pdf/2403.12365.pdf 41 | 42 | Generic 3D Diffusion Adapter Using Controlled Multi-View Editing 43 | https://arxiv.org/pdf/2403.12032.pdf 44 | 45 | Compress3D: a Compressed Latent Space for 3D Generation from a Single Image 46 | https://arxiv.org/pdf/2403.13524.pdf 47 | 48 | LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation 49 | https://arxiv.org/pdf/2403.12019.pdf 50 | 51 | ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance 52 | https://arxiv.org/pdf/2403.12409.pdf 53 | -------------------------------------------------------------------------------- /2024.03.23.generative_3d/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.23.generative_3d/thumbnail.png -------------------------------------------------------------------------------- /2024.03.24.mindseye2_neuralink/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # MindsEye2 Neuralink 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/e4SmDIH9908 8 | 9 | **X:** https://twitter.com/i/broadcasts/1jMJgmYoeBeKL 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data 24 | https://arxiv.org/pdf/2403.11207.pdf 25 | 26 | Neuralink 27 | https://neuralink.com/blog/understanding-preclinical-research-at-neuralink/ 28 | 29 | https://www.brainfacts.org/3d-brain#intro=false&focus=Brain-cerebral_hemisphere-frontal_lobe-motor_cortex 30 | 31 | https://afterimagia.pl/app/uploads/2020/09/022Korawzrokowaiinnestrukturykorowezaangazowanewwidzenie1_en-1024x883.jpg 32 | 33 | 
https://github.com/mlfoundations/open_clip 34 | 35 | https://github.com/facebookresearch/VICRegL 36 | 37 | https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0 38 | 39 | ## Notes 40 | 41 | notes 42 | 43 | ### Blog 44 | 45 | notes 46 | 47 | ### Vertical Video 48 | 49 | title 50 | description 51 | hashtags 52 | -------------------------------------------------------------------------------- /2024.03.24.mindseye2_neuralink/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.24.mindseye2_neuralink/thumbnail.png -------------------------------------------------------------------------------- /2024.03.30.collective_intelligence/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Collective Intelligence 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/3nrvy0FuLWM 8 | 9 | **X:** https://twitter.com/i/broadcasts/1OwxWYAMbYkGQ 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://twitter.com/drmichaellevin/status/1773452812493431032?t=OB6V5UghEadpuOuSHMeUZQ&s=19 24 | 25 | https://www.nature.com/articles/s42003-024-06037-4 26 | 27 | https://pbs.twimg.com/media/GJ2PdgvW4AAuz_5?format=jpg&name=900x900 28 | 29 | https://arxiv.org/pdf/2306.06805.pdf 30 | 31 | https://upload.wikimedia.org/wikipedia/commons/thumb/9/90/World_line2.svg/1000px-World_line2.svg.png 32 | 33 | https://en.wikipedia.org/wiki/Terence_McKenna 34 | 35 | https://www.researchgate.net/publication/349703774/figure/fig3/AS:996748570931209@1614654786956/Tokyo-rail-network-formation-with-Physarum-polycephalum-84.ppm 36 | 37 | ## Notes 38 | 39 | notes 40 | 41 | ### Blog 42 | 43 | notes 44 | 45 | ### Vertical Video 46 | 47 | title 48 | description 49 | hashtags 50 | -------------------------------------------------------------------------------- /2024.03.30.collective_intelligence/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.30.collective_intelligence/thumbnail.png -------------------------------------------------------------------------------- /2024.03.31.gamba_jamba_cobra/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Gamba Jamba Cobra 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/9s-9aSobky8 8 | 9 | **X:** https://twitter.com/i/broadcasts/1ZkKzjydlkaKv 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://www.ai21.com/blog/announcing-jamba 24 | 25 | Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference 26 | - https://arxiv.org/pdf/2403.14520.pdf 27 | - https://www.emergentmind.com/papers/2403.14520 28 | 29 | Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction 30 | - https://arxiv.org/pdf/2403.18795.pdf 31 | - https://www.emergentmind.com/papers/2403.18795 32 | 33 | Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models 34 | - https://arxiv.org/pdf/2402.19427.pdf 35 | 36 | Mamba: Linear-Time Sequence Modeling with Selective State Spaces 37 | - https://arxiv.org/ftp/arxiv/papers/2312/2312.00752.pdf 38 | 39 | The 
Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits 40 | https://arxiv.org/pdf/2402.17764.pdf 41 | 42 | https://miro.medium.com/v2/resize:fit:1400/format:webp/1*JGh_0y3ICNuA6IcnbdnvdA.gif 43 | 44 | ## Notes 45 | 46 | notes 47 | 48 | ### Blog 49 | 50 | notes 51 | 52 | ### Vertical Video 53 | 54 | title 55 | description 56 | hashtags 57 | -------------------------------------------------------------------------------- /2024.03.31.gamba_jamba_cobra/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.31.gamba_jamba_cobra/thumbnail.png -------------------------------------------------------------------------------- /2024.04.06.mixture_of_depths/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Mixture-of-Depths 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/Teru_qIdB8Y 8 | 9 | **X:** https://twitter.com/i/broadcasts/1LyxBnEBkQkxN 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Mixture-of-Depths: Dynamically allocating compute in transformer-based language models 24 | https://arxiv.org/pdf/2404.02258.pdf 25 | 26 | https://twitter.com/theseamouse/status/1775782800362242157?s=12&t=BSlMYtugAr8LLWXLnMLLrQ 27 | 28 | LLM Visualization 29 | https://bbycroft.net/llm 30 | 31 | Deep Residual Learning for Image Recognition 32 | https://arxiv.org/pdf/1512.03385.pdf 33 | 34 | What skip connections look like in code 35 | https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab957409a1cc2fbfba8a26/mingpt/model.py#L90 36 | 37 | Compute Graphs 38 | https://www.researchgate.net/publication/340457442/figure/fig4/AS:877300765712391@1586176210735/Comparison-between-a-a-static-computation-graph-in-TensorFlow-115-and-b-an.png 39 | https://huyenchip.com/assets/pics/compilers/13_graph_optimization.png 40 | https://huyenchip.com/assets/pics/compilers/8_irs.png 41 | 42 | Overfitting 43 | https://cdn.analyticsvidhya.com/wp-content/uploads/2020/02/Screenshot-2020-02-06-at-11.09.13.png 44 | -------------------------------------------------------------------------------- /2024.04.06.mixture_of_depths/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.06.mixture_of_depths/thumbnail.png -------------------------------------------------------------------------------- /2024.04.07.instantsplat_dust3r/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # InstantSplat, DUSt3R 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/JdfrG89iPOA 8 | 9 | **X:** https://twitter.com/i/broadcasts/1nAJEaeBnEyJL 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds 24 | https://arxiv.org/pdf/2403.20309.pdf 25 | 26 | DUSt3R: Geometric 3D Vision Made Easy 27 | https://arxiv.org/pdf/2312.14132.pdf 28 | 29 | RANSAC 30 | https://github.com/sjkenny/ransac?tab=readme-ov-file#example-ring-fitting 31 | 32 | Weiszfeld's algorithm 33 | https://twitter.com/gabrielpeyre/status/1334739210255835136 34 | 35 | Spherical Harmonics 36 | 
https://www.researchgate.net/profile/Jacob-Hollebon-2/publication/345372557/figure/fig1/AS:954811419725829@1604656190963/Real-part-of-a-set-of-spherical-harmonics-mapped-to-the-surface-of-a-sphere-The-colour.ppm 37 | 38 | COLMAP 39 | https://colmap.github.io/tutorial.html#structure-from-motion 40 | 41 | CroCo 42 | https://arxiv.org/pdf/2210.10716.pdf 43 | 44 | Habitat Sim 45 | https://aihabitat.org/ 46 | -------------------------------------------------------------------------------- /2024.04.07.instantsplat_dust3r/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.07.instantsplat_dust3r/thumbnail.png -------------------------------------------------------------------------------- /2024.04.13.osworld_wilbur_gameagents/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # OSWorld, Wilbur, Game Agents 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/slthKMDR0uo 8 | 9 | **X:** https://twitter.com/i/broadcasts/1rmxPMOevPLKN 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | A Survey on Large Language Model-Based Game Agents 24 | https://arxiv.org/pdf/2404.02039.pdf 25 | 26 | OSWORLD: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments 27 | https://arxiv.org/pdf/2404.07972.pdf 28 | 29 | WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents 30 | https://arxiv.org/pdf/2404.05902.pdf 31 | 32 | ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models 33 | https://arxiv.org/abs/2404.07738 34 | 35 | Autonomous Evaluation and Refinement of Digital Agents 36 | https://arxiv.org/abs/2404.06474 37 | 38 | https://news.ycombinator.com/item?id=40008109 39 | 40 | WE BAR E N A: A REALISTIC WEB ENVIRONMENT FOR BUILDING AUTONOMOUS AGENTS 41 | https://arxiv.org/pdf/2307.13854.pdf 42 | 43 | https://x.ai/blog/grok-1.5v 44 | 45 | https://www.scientificamerican.com/article/the-universe-is-not-locally-real-and-the-physics-nobel-prize-winners-proved-it/ 46 | 47 | https://en.wikipedia.org/wiki/Markov_decision_process 48 | -------------------------------------------------------------------------------- /2024.04.13.osworld_wilbur_gameagents/ideas.txt: -------------------------------------------------------------------------------- 1 | Does Transformer Interpretability Transfer to RNNs? 
2 | https://arxiv.org/pdf/2404.05971.pdf 3 | 4 | The Origin of Information Handling 5 | https://arxiv.org/pdf/2404.04374.pdf 6 | 7 | LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders 8 | https://arxiv.org/pdf/2404.05961.pdf 9 | 10 | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention 11 | https://arxiv.org/pdf/2404.07143.pdf 12 | 13 | Demonstration of logical qubits and repeated error correction with better-than-physical error rates 14 | https://arxiv.org/pdf/2404.02280.pdf 15 | 16 | A Survey on Large Language Model-Based Game Agents 17 | https://arxiv.org/pdf/2404.02039.pdf 18 | 19 | OSWORLD: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments 20 | https://arxiv.org/pdf/2404.07972.pdf 21 | 22 | Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models 23 | https://arxiv.org/pdf/2404.07973.pdf 24 | 25 | From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples 26 | https://arxiv.org/pdf/2404.07544.pdf 27 | 28 | -------------------------------------------------------------------------------- /2024.04.13.osworld_wilbur_gameagents/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.13.osworld_wilbur_gameagents/thumbnail.png -------------------------------------------------------------------------------- /2024.04.20.conscious_simulated_holograms/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Conscious Simulated Holograms 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/BInpx2kJ-rw 8 | 9 | **X:** https://twitter.com/i/broadcasts/1LyxBnrqrvkxN 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | The Interface Theory of Perception 24 | Donald Hoffman 25 | https://sites.socsci.uci.edu/~ddhoff/interface.pdf 26 | 27 | Facing Up to the Problem of Consciousness 28 | David Chalmers 29 | https://consc.net/papers/facing.pdf 30 | 31 | The World as a Hologram 32 | Leonard Susskind 33 | https://arxiv.org/pdf/hep-th/9409089.pdf 34 | 35 | Are You Living in a Computer Simulation? 36 | Nick Bostrom 37 | https://simulation-argument.com/simulation.pdf 38 | -------------------------------------------------------------------------------- /2024.04.20.conscious_simulated_holograms/ideas.txt: -------------------------------------------------------------------------------- 1 | The Interface Theory of Perception 2 | Donald Hoffman 3 | https://sites.socsci.uci.edu/~ddhoff/interface.pdf 4 | 5 | Facing Up to the Problem of Consciousness 6 | David Chalmers 7 | https://consc.net/papers/facing.pdf 8 | 9 | The World as a Hologram 10 | Leonard Susskind 11 | https://arxiv.org/pdf/hep-th/9409089.pdf 12 | 13 | Are You Living in a Computer Simulation? 14 | Nick Bostrom 15 | https://simulation-argument.com/simulation.pdf 16 | 17 | Consciousness as Integrated Information: A Provisional Manifesto 18 | Giulio Tononi 19 | https://pubmed.ncbi.nlm.nih.gov/19098144/ 20 | 21 | I Am a Strange Loop 22 | Douglas Hofstadter 23 | https://en.wikipedia.org/wiki/I_Am_a_Strange_Loop 24 | 25 | A Quantum Probability Explanation for Violations of 'Rational' Decision Theory 26 | Jerome R. 
Busemeyer 27 | https://www.researchgate.net/publication/24237479_Quantum_probability_explanation_for_violations_of_'rational'_decision_theory 28 | -------------------------------------------------------------------------------- /2024.04.20.conscious_simulated_holograms/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.20.conscious_simulated_holograms/thumbnail.png -------------------------------------------------------------------------------- /2024.04.21.qstar/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Q* 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/vOA9JSDPJs0 8 | 9 | **X:** https://twitter.com/i/broadcasts/1djGXNdlrrOxZ 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | From r to Q∗: Your Language Model is Secretly a Q-Function 24 | https://arxiv.org/pdf/2404.12358.pdf 25 | 26 | Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing 27 | https://arxiv.org/pdf/2404.12253.pdf 28 | 29 | MCTS 30 | https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa 31 | 32 | Multi-Arm Bandit 33 | https://medium.com/udemy-engineering/building-a-multi-armed-bandit-system-from-the-ground-up-a-recommendations-and-ranking-case-study-b598f1f880e1 34 | 35 | Physical Intelligence 36 | https://physicalintelligence.company/ 37 | 38 | Beam search 39 | https://en.wikipedia.org/wiki/Beam_search 40 | 41 | Let's Verify Step by Step 42 | https://cdn.openai.com/improving-mathematical-reasoning-with-process-supervision/Lets_Verify_Step_by_Step.pdf 43 | 44 | PAL: Program-aided Language Models 45 | https://arxiv.org/pdf/2211.10435.pdf 46 | 47 | Yann LeCunn Tweets and Cherry 48 | https://twitter.com/ylecun/status/1728126868342145481?lang=en 49 | https://i0.wp.com/syncedreview.com/wp-content/uploads/2019/02/2019-02-20-%E4%B8%8A%E5%8D%881.51.14.png 50 | 51 | AlphaGo Documentary 52 | https://youtu.be/WXuK6gekU1Y?si=nL9rD8X8tiRHaqFn 53 | -------------------------------------------------------------------------------- /2024.04.21.qstar/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.21.qstar/thumbnail.png -------------------------------------------------------------------------------- /2024.04.27.openelm_phi3_qllama3/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # OpenELM, Phi-3, Quantized LLaMA-3 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/YEm4tuo2HPA 8 | 9 | **X:** https://twitter.com/i/broadcasts/1mrxmynMbakxy 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework 24 | https://arxiv.org/pdf/2404.14619 25 | 26 | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone 27 | https://arxiv.org/pdf/2404.14219 28 | 29 | How Good Are Low-bit Quantized LLAMA3 Models? 
An Empirical Study 30 | https://arxiv.org/pdf/2404.14047 31 | 32 | GQA 33 | https://paperswithcode.com/method/grouped-query-attention 34 | 35 | LayerNorm, RMS, Fused Ops 36 | https://static-assets.codecademy.com/Paths/data-science-career-path/MachineLearning/outlier.png 37 | https://tungmphung.com/wp-content/uploads/2021/01/Screenshot-from-2021-01-05-13-07-25-1024x274.png 38 | https://github.com/pytorch/pytorch/issues/72643 39 | https://github.com/tinygrad/tinygrad/issues/1146 40 | 41 | SwiGLU 42 | https://paperswithcode.com/method/silu 43 | 44 | FlashAttention 45 | https://miro.medium.com/v2/resize:fit:2000/1*i4tDdwgvGtXuTIyJpFUn8A.png 46 | 47 | datatypes 48 | https://www.microsoft.com/en-us/research/uploads/prod/2020/12/Brainwave_figure2.jpg 49 | 50 | decoder only 51 | https://miro.medium.com/v2/resize:fit:863/0*jKqypwGzmDv7KDUZ.png 52 | 53 | LoRA 54 | https://sebastianraschka.com/blog/2023/llm-finetuning-lora.html 55 | https://miro.medium.com/v2/resize:fit:523/1*F7uWJePoMc6Qc1O2WxmQqQ.png 56 | 57 | Vocabulary Size 58 | https://ai.meta.com/blog/meta-llama-3/ 59 | 60 | Dataset is Everything 61 | https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dataset/ 62 | -------------------------------------------------------------------------------- /2024.04.27.openelm_phi3_qllama3/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.27.openelm_phi3_qllama3/thumbnail.png -------------------------------------------------------------------------------- /2024.04.28.animals_vs_robots/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Animals vs Robots 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/SNQz89jbHIc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1dRJZEgdrnvGB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Why animals can outrun robots 24 | https://www.science.org/doi/10.1126/scirobotics.adi9754 25 | 26 | Lessons Learned in Quadruped Deployment in Livestock Farming 27 | https://arxiv.org/pdf/2404.16008 28 | 29 | Ranger Robot 30 | https://www.researchgate.net/profile/Pranav-Bhounsule/publication/273656485/figure/fig1/AS:391824385560576@1470429621442/Rangers-ultra-marathon-walk-On-1-2-May-2011-43-Ranger-walked-non-stop-for_Q320.jpg 31 | 32 | MicroSumo 33 | https://64.media.tumblr.com/bc2bc89e78504f8d369ba66166ec7999/tumblr_orwh8s9uzm1tlb56zo1_500.gifv 34 | https://64.media.tumblr.com/d19bc82dbfddaca19aa378d8072a2378/tumblr_inline_orwqcvmOQg1rx3jxn_500.gifv 35 | 36 | Robot Ranking 37 | https://robotsguide.com/rankings 38 | -------------------------------------------------------------------------------- /2024.04.28.animals_vs_robots/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.28.animals_vs_robots/thumbnail.png -------------------------------------------------------------------------------- /2024.05.04.gsm1k_vlm_eval/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # GSM1k VLM Eval 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/-k_XNJQKXj8 8 | 9 | **X:** https://twitter.com/i/broadcasts/1LyxBnAvMmyxN 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | 
**ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | A Careful Examination of Large Language Model Performance on Grade School Arithmetic 24 | https://arxiv.org/pdf/2405.00332 25 | 26 | Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation 27 | https://arxiv.org/pdf/2404.19752 28 | 29 | https://twitter.com/DrJimFan/status/1786054643568517261 30 | 31 | https://twitter.com/AndrewYNg/status/1786057567178834328 32 | 33 | https://twitter.com/karpathy/status/1748043513156272416?lang=en 34 | 35 | Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language 36 | https://arxiv.org/pdf/2204.00598 37 | 38 | https://en.wikipedia.org/wiki/Socratic_method 39 | 40 | Cosine similarity 41 | https://miro.medium.com/v2/resize:fit:1400/1*LfW66-WsYkFqWc4XYJbEJg.png 42 | 43 | Scale AI 44 | https://www.washingtonpost.com/world/2023/08/28/scale-ai-remotasks-philippines-artificial-intelligence/ 45 | -------------------------------------------------------------------------------- /2024.05.04.gsm1k_vlm_eval/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.04.gsm1k_vlm_eval/thumbnail.png -------------------------------------------------------------------------------- /2024.05.05.gaussian_surfels/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Gaussian Surfels 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/BRgm5vteAjQ 8 | 9 | **X:** https://twitter.com/i/broadcasts/1YqKDgBvPzzxV 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | High-quality Surface Reconstruction using Gaussian Surfels 24 | https://arxiv.org/abs/2404.17774 25 | 26 | Chamfer Distance 27 | https://www.researchgate.net/profile/Lei-Chu-8/publication/358143188/figure/fig3/AS:1116880265850895@1643296414912/The-weak-rotation-awareness-of-Chamfer-Distance-The-figure-demonstrates-Chamfer.ppm 28 | 29 | Surfels 30 | https://www.cs.umd.edu/~zwicker/publications/Surfels-SIG00.pdf 31 | 32 | NeRF 33 | https://theaisummer.com/static/c8a7474955d2af53d28319f09723506e/56caf/nerf-training.png 34 | 35 | Mesh and Texture 36 | https://www.mdpi.com/applsci/applsci-08-02228/article_deploy/html/images/applsci-08-02228-g001.png 37 | https://i.stack.imgur.com/RT7OH.jpg 38 | 39 | GPU Rasterization 40 | http://15418.courses.cs.cmu.edu/spring2014/lecture/gpuarch/slide_018 41 | 42 | Splat example 43 | https://lumalabs.ai/capture/7f763b07-15bc-430e-84bf-9825f8e0c89e 44 | 45 | Covariance Matrix 46 | https://miro.medium.com/v2/resize:fit:1400/format:webp/1*Djn-aVWcH5Lu4i2-eLfn1g.png 47 | 48 | SSIM 49 | https://www.researchgate.net/profile/Hojatollah-Yeganeh-2/publication/283461887/figure/fig5/AS:670025388150786@1536757909078/Comparison-of-SSIM-and-MSE-performances-for-Einstein-image-altered-with-different.png 50 | https://videoprocessing.ai/assets/img/metrics/psnr_and_ssim/pic19.png 51 | 52 | Omnidata 53 | https://omnidata.vision/ 54 | 55 | Screened Poisson Surface Reconstruction 56 | https://www.cs.jhu.edu/~misha/MyPapers/ToG13.pdf 57 | 58 | NeuS2 59 | https://vcai.mpi-inf.mpg.de/projects/NeuS2/ 60 | 61 | Karpathy on CUDA/C++ 62 | https://twitter.com/karpathy/status/1786537319576789425 63 | -------------------------------------------------------------------------------- 
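The covariance-matrix and surfel references above come down to a single parameterization: each splat stores a scale vector and a rotation, and its covariance is assembled as Sigma = R S S^T R^T, which keeps it symmetric positive semi-definite by construction. Collapsing one scale axis toward zero is what turns a volumetric Gaussian into the flat, oriented surfel used for surface reconstruction. A minimal numpy sketch of that construction (variable names are illustrative, not the paper's code):

```python
import numpy as np

def quat_to_rot(q):
    # Unit quaternion (w, x, y, z) -> 3x3 rotation matrix.
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def covariance(scale, quat):
    # Sigma = R S S^T R^T stays symmetric positive semi-definite.
    R = quat_to_rot(quat)
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

# A "fat" 3D Gaussian like those used in vanilla splatting.
sigma_3d = covariance(np.array([0.05, 0.03, 0.02]), np.array([1.0, 0.0, 0.0, 0.0]))

# A surfel-style Gaussian: the third scale is squashed toward zero, so the
# kernel degenerates into a flat disc with a well-defined surface normal.
sigma_surfel = covariance(np.array([0.05, 0.03, 1e-6]), np.array([1.0, 0.0, 0.0, 0.0]))

print(np.linalg.eigvalsh(sigma_3d))      # three non-trivial eigenvalues
print(np.linalg.eigvalsh(sigma_surfel))  # smallest eigenvalue ~0: a disc
```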
/2024.05.05.gaussian_surfels/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.05.gaussian_surfels/thumbnail.png -------------------------------------------------------------------------------- /2024.05.11.xlstm/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # xLSTM 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/udIEwt0xM6A 8 | 9 | **X:** https://twitter.com/i/broadcasts/1ZkKzjABZdeKv 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | xLSTM: Extended Long Short-Term Memory 24 | https://arxiv.org/abs/2405.04517 25 | 26 | LSTMs 27 | https://www.researchgate.net/publication/365066504/figure/fig3/AS:11431281094594268@1667511244779/The-structure-of-the-LSTM-network-cell-The-cell-consists-of-three-kinds-of-gates-units.ppm 28 | 29 | https://www.researchgate.net/profile/Zhenyu-Chen-20/publication/332662013/figure/fig3/AS:751758288637957@1556244551951/RNN-and-LSTM-comparison-chart.jpg 30 | 31 | https://images.theengineeringprojects.com/image/webp/2023/12/introduction-to-gated-recurrent-unit-1.png.webp?ssl=1 32 | 33 | Sigmoid vs Tanh 34 | https://www.researchgate.net/profile/Jin-Huang-84/publication/351436135/figure/fig4/AS:1021434411368449@1620540349848/Curve-of-the-Sigmoid-and-Tanh-functions.ppm 35 | 36 | Bfloat16 37 | https://www.researchgate.net/publication/366410363/figure/fig4/AS:11431281138463610@1680747086193/Comparison-of-the-float32-bfloat16-and-float16-numerical-formats-The-bfloat16-format.png 38 | 39 | Overflow and Underflow 40 | https://teach-ict.com/2016/images/diagrams/overflow.gif 41 | 42 | Causal convolution 43 | https://pub.mdpi-res.com/applsci/applsci-10-02322/article_deploy/html/images/applsci-10-02322-g004.png?1586349766 44 | 45 | JAMBA 46 | https://assets-global.website-files.com/60fd4503684b46390cc0d337/66053d21ac0ce4328ed20029_Frame%201410082065.jpg 47 | 48 | Quadratic attention 49 | https://miro.medium.com/v2/resize:fit:640/1*Q1LytkMkmuI4dVeTnuwoTQ.png 50 | 51 | Online normalizer calculation for softmax 52 | https://arxiv.org/pdf/1805.02867 53 | 54 | Causal Convolution 55 | https://pub.mdpi-res.com/applsci/applsci-10-02322/article_deploy/html/images/applsci-10-02322-g004.png?1586349766 56 | -------------------------------------------------------------------------------- /2024.05.11.xlstm/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.11.xlstm/thumbnail.png -------------------------------------------------------------------------------- /2024.05.12.egocentric_vision_rl/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Egocentric Vision RL 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/40FL4peq60k 8 | 9 | **X:** https://twitter.com/i/broadcasts/1jMKgmwzgjMJL 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning 24 | https://arxiv.org/abs/2405.02425 25 | 26 | REPLAY ACROSS EXPERIMENTS: A NATURAL EXTENSION OF OFF-POLICY RL 27 | https://arxiv.org/pdf/2311.15951 28 | 29 | distillation 
30 | https://editor.analyticsvidhya.com/uploads/30818Knowledge%20Distillation%20Flow%20Chart%201.2.jpg 31 | 32 | KL divergence 33 | https://twitter.com/ari_seff/status/1303741288911638530 34 | 35 | LSTM unroll length 36 | https://www.researchgate.net/profile/Zhenyu-Chen-20/publication/332662013/figure/fig3/AS:751758288637957@1556244551951/RNN-and-LSTM-comparison-chart.jpg 37 | 38 | Robocup 39 | https://ssl.robocup.org/ 40 | 41 | Explicit state in Robocup 42 | https://github.com/RoboCup-SSL/ssl-vision/blob/master/src/shared/proto/messages_robocup_ssl_detection_tracked.proto 43 | 44 | IMU 45 | https://www.researchgate.net/publication/281535806/figure/fig8/AS:272652560498728@1442016844670/Inertial-measurement-unit-IMU-implemented-on-the-robot-body-The-IMU-is-referred-to-as.png 46 | 47 | https://slideplayer.com/slide/14589795/90/images/2/IMU+%28Inertial+Measurement+Unit%29.jpg 48 | -------------------------------------------------------------------------------- /2024.05.12.egocentric_vision_rl/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.12.egocentric_vision_rl/thumbnail.png -------------------------------------------------------------------------------- /2024.05.18.platonic_represenation_hypothesis/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Platonic Hypothesis 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/Q9DCL_m_haw 8 | 9 | **X:** https://twitter.com/i/broadcasts/1gqxvQEmNBBJB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | The Platonic Representation Hypothesis 24 | https://arxiv.org/pdf/2405.07987 25 | 26 | Cave analogy 27 | https://i.ytimg.com/vi/d71tYwcpHNM/hq720.jpg?sqp=-oaymwEhCK4FEIIDSFryq4qpAxMIARUAAAAAGAElAADIQj0AgKJD&rs=AOn4CLCsnh47zXySMQ2Jyw-1GM5vaUGrpw 28 | 29 | Activation Atlas 30 | https://distill.pub/2019/activation-atlas/ 31 | 32 | CNN 33 | https://adamharley.com/nn_vis/cnn/3d.html 34 | 35 | 4D 36 | https://math.ucr.edu/home/baez/mathematical/1000px-dimension_levels.jpg 37 | 38 | Mutual Information 39 | https://en.wikipedia.org/wiki/Mutual_information 40 | -------------------------------------------------------------------------------- /2024.05.18.platonic_represenation_hypothesis/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.18.platonic_represenation_hypothesis/thumbnail.png -------------------------------------------------------------------------------- /2024.05.19.building_multimodal_models/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Building Multimodal Models 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/27cjzGgyxtw 8 | 9 | **X:** https://twitter.com/i/broadcasts/1gqxvQEmNnwJB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | What matters when building vision-language models? 
24 | https://arxiv.org/pdf/2405.02246 25 | 26 | Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities 27 | https://arxiv.org/pdf/2311.05698 28 | 29 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context 30 | https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf 31 | 32 | Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning 33 | https://arxiv.org/pdf/2309.02591 34 | 35 | Chameleon: Mixed-Modal Early-Fusion Foundation Models 36 | https://arxiv.org/pdf/2405.09818 37 | 38 | Cross Attention 39 | https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9057444e-4934-4253-bc91-9e768d23b0c2_846x972.png 40 | 41 | Average Pooling 42 | https://production-media.paperswithcode.com/methods/Screen_Shot_2020-05-24_at_1.51.40_PM.png 43 | 44 | Spectogram 45 | https://www.researchgate.net/profile/Sri-Harsha-Dumpala/publication/319081627/figure/fig1/AS:534034566004736@1504335170521/Spectrogram-of-a-speech-signal-with-breath-sound-marked-as-Breath-whose-bounds-are.png 46 | 47 | Obelixs, Idefixs 48 | https://media.mutualart.com/Images/2021_11/09/02/021007431/480c4477-8b86-4c0e-a618-9d3e61cb1751.Jpeg?w=480 49 | 50 | GPT-4o Memory 51 | https://x.com/gdb/status/1790071008499544518 52 | 53 | LSTM 54 | https://classic.d2l.ai/_images/lstm-2.svg 55 | -------------------------------------------------------------------------------- /2024.05.19.building_multimodal_models/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.19.building_multimodal_models/thumbnail.png -------------------------------------------------------------------------------- /2024.05.25.thermodynamic_gradient_descent/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Thermodynamic Gradient Descent 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/Hxv_y_bI3zA 8 | 9 | **X:** https://twitter.com/i/broadcasts/1RDGllrLvZmGL 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Thermodynamic Natural Gradient Descent 24 | https://arxiv.org/pdf/2405.13817 25 | 26 | https://losslandscape.com/explorer 27 | 28 | https://normalcomputing.ai/ 29 | 30 | New Insights and Perspectives on the Natural Gradient Method 31 | https://arxiv.org/pdf/1412.1193 32 | 33 | https://en.wikipedia.org/wiki/Fisher_information 34 | 35 | https://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process 36 | 37 | https://x.com/Extropic_AI/status/1784296124896981436 38 | 39 | https://miro.medium.com/v2/resize:fit:1100/format:webp/0*wJMbKCI9xc8LxTKm 40 | 41 | https://github.com/extropic-ai 42 | 43 | Learning Rate 44 | https://miro.medium.com/v2/resize:fit:1400/format:webp/1*EhTjC57v1-wOixGo3LYcig.png 45 | 46 | https://www.ti.com/about-ti/company/ti-at-a-glance/manufacturing.html 47 | 48 | https://gregorygundersen.com/blog/2022/02/27/positive-definite/ 49 | 50 | https://brickisland.net/cs177/wp-content/uploads/2011/11/ddg_definiteness.svg 51 | 52 | https://miro.medium.com/v2/resize:fit:1000/1*X9SaxFM6_sBOAMY9TaGsKw.png 53 | 54 | https://www.value-at-risk.net/functions/ 55 | 56 | https://en.wikipedia.org/wiki/Boltzmann_distribution 57 | 
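For the Fisher-information and positive-definiteness links above: plain natural gradient descent preconditions the gradient with a damped curvature matrix, theta <- theta - lr * (F + damping * I)^-1 * grad. The thermodynamic variant in the paper offloads that linear solve to an analog stochastic process, but the update it approximates can be sketched with an ordinary dense solve (toy quadratic loss, assumed names, not the paper's implementation):

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-3):
    # NGD update: theta <- theta - lr * (F + damping * I)^-1 @ grad.
    # Damping keeps the linear system positive definite and well conditioned.
    F = fisher + damping * np.eye(len(theta))
    return theta - lr * np.linalg.solve(F, grad)

# Toy quadratic loss L(theta) = 0.5 * theta^T A theta, so grad = A @ theta
# and A doubles as the exact curvature matrix.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
theta = np.array([2.0, -1.5])
for _ in range(20):
    theta = natural_gradient_step(theta, A @ theta, fisher=A)
print(theta)  # shrinks geometrically toward the minimum at the origin
```

The damping term is where the positive-definiteness references bite in practice: with a noisy or rank-deficient Fisher estimate, the undamped solve can blow up.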
-------------------------------------------------------------------------------- /2024.05.25.thermodynamic_gradient_descent/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.25.thermodynamic_gradient_descent/thumbnail.png -------------------------------------------------------------------------------- /2024.05.26.octo/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Octo 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/Rcm2-4Zm_yY 8 | 9 | **X:** https://twitter.com/i/broadcasts/1jMJgmXrwjXKL 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Octo: An Open-Source Generalist Robot Policy 24 | https://arxiv.org/pdf/2405.12213 25 | 26 | https://octo-models.github.io/ 27 | 28 | Expressive Whole-Body Control for Humanoid Robots 29 | https://arxiv.org/abs/2402.16796 30 | 31 | https://expressive-humanoid.github.io/ 32 | 33 | attention registers 34 | https://arxiv.org/pdf/2309.16588 35 | 36 | Natural Language Can Help Bridge the Sim2Real Gap 37 | https://arxiv.org/pdf/2405.10020 38 | 39 | Perpetual Humanoid Control for Real-time Simulated Avatars 40 | https://arxiv.org/pdf/2305.06456 41 | -------------------------------------------------------------------------------- /2024.05.26.octo/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.26.octo/thumbnail.png -------------------------------------------------------------------------------- /2024.06.01.vidu4d/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Vidu4D 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/4tUkDj3pglg 8 | 9 | **X:** https://twitter.com/i/broadcasts/1YqKDgejQyBxV 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels 24 | https://arxiv.org/pdf/2405.16822 25 | 26 | https://vidu4d-dgs.github.io/ 27 | 28 | Spartan Lambda 29 | https://qph.cf2.quoracdn.net/main-qimg-1bec67c2f15ca3eefb6ebf68326dc8f0-lq 30 | 31 | Mahalonobis Distance 32 | https://miro.medium.com/v2/resize:fit:1400/1*KzsugPQU-BTjvDACXbu9qw.jpeg 33 | 34 | Dual Quaternion Skinning 35 | https://users.cs.utah.edu/~ladislav/kavan07skinning/kavan07skinning.pdf 36 | 37 | Cycle Consistency Loss 38 | https://paperswithcode.com/method/cycle-consistency-loss 39 | 40 | Surfels 41 | https://www.cs.umd.edu/~zwicker/publications/Surfels-SIG00.pdf 42 | 43 | NeRF 44 | https://theaisummer.com/static/c8a7474955d2af53d28319f09723506e/56caf/nerf-training.png 45 | 46 | Mesh and Texture 47 | https://www.mdpi.com/applsci/applsci-08-02228/article_deploy/html/images/applsci-08-02228-g001.png https://i.stack.imgur.com/RT7OH.jpg 48 | 49 | GPU Rasterization 50 | http://15418.courses.cs.cmu.edu/spring2014/lecture/gpuarch/slide_018 51 | 52 | Splat example 53 | https://lumalabs.ai/capture/7f763b07-15bc-430e-84bf-9825f8e0c89e 54 | 55 | Sora 56 | https://openai.com/index/sora/ 57 | 58 | Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models 59 | 
https://arxiv.org/abs/2405.04233 60 | 61 | Surfel cube 62 | https://backend.orbit.dtu.dk/ws/portalfiles/portal/5831596/VA-TPCG10_final.pdf 63 | 64 | Surfel surface element 65 | https://www.ea.com/seed/news/siggraph21-global-illumination-surfels 66 | 67 | Skinning 68 | https://documentation.simplygon.com/SimplygonSDK_8.3.31500.0/articles/simplygonapi/apiuserguide/images/skinning.png 69 | 70 | https://help.autodesk.com/cloudhelp/2024/ENU/Maya-CharacterAnimation/images/GUID-5C542C32-6BFD-4E47-84C9-AFF8B33ABEA1.gif 71 | 72 | 73 | 74 | -------------------------------------------------------------------------------- /2024.06.01.vidu4d/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.06.01.vidu4d/thumbnail.png -------------------------------------------------------------------------------- /2024.06.02.road_less_scheduled/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Road Less Scheduled 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/QB2E6hNdEMc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1vAxRvwXEQvxl 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | The Road Less Scheduled 24 | https://arxiv.org/pdf/2405.15682 25 | 26 | Lipschitz Function 27 | https://mathworld.wolfram.com/LipschitzFunction.html 28 | 29 | https://en.wikipedia.org/wiki/Lipschitz_continuity 30 | 31 | Polyak Scheduling 32 | https://paperswithcode.com/method/polyak-averaging 33 | 34 | LR Schedules 35 | https://x.com/rasbt/status/1577429214755573761/photo/1 36 | 37 | EMA 38 | https://towardsdatascience.com/intuitive-explanation-of-exponential-moving-average-2eb9693ea4dc 39 | 40 | Nesterov Accelerated Gradient 41 | https://paperswithcode.com/method/nesterov-accelerated-gradient 42 | 43 | Pareto Frontier 44 | https://fivethirtyeight.com/wp-content/uploads/2015/02/silver-feature-pareto-1.png 45 | 46 | PyTorch Adam 47 | https://pytorch.org/docs/stable/generated/torch.optim.Adam.html 48 | 49 | PyTorch NAdam 50 | https://pytorch.org/docs/stable/generated/torch.optim.NAdam.html 51 | 52 | Loss Landscape 53 | https://losslandscape.com/explorer 54 | 55 | definite 56 | https://brickisland.net/cs177/wp-content/uploads/2011/11/ddg_definiteness.svg 57 | 58 | Momentum 59 | https://miro.medium.com/v2/resize:fit:1000/1*X9SaxFM6_sBOAMY9TaGsKw.png 60 | 61 | Optimizers Gif 62 | https://miro.medium.com/v2/resize:fit:1240/1*SjtKOauOXFVjWRR7iCtHiA.gif 63 | 64 | Warmup 65 | https://stackoverflow.com/questions/55933867/what-does-learning-rate-warm-up-mean 66 | -------------------------------------------------------------------------------- /2024.06.02.road_less_scheduled/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.06.02.road_less_scheduled/thumbnail.png -------------------------------------------------------------------------------- /2024.06.15.karpamambathy_001/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Karpamambathy.001 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/aWMv8W_UgJU 8 | 9 | **X:** https://twitter.com/i/broadcasts/1OdKrjzqqdkKX 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | 
**TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba2.py 24 | 25 | https://github.com/karpathy/build-nanogpt/blob/master/train_gpt2.py 26 | -------------------------------------------------------------------------------- /2024.06.15.karpamambathy_001/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.06.15.karpamambathy_001/thumbnail.png -------------------------------------------------------------------------------- /2024.06.15.karpamambathy_002/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Karpamambathy.002 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/iHr1Atfmkho 8 | 9 | **X:** https://twitter.com/i/broadcasts/1YqGoDAQQajJv 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba2.py 24 | 25 | https://github.com/karpathy/build-nanogpt/blob/master/train_gpt2.py -------------------------------------------------------------------------------- /2024.06.15.karpamambathy_002/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.06.15.karpamambathy_002/thumbnail.png -------------------------------------------------------------------------------- /2024.07.05.image_generation_evaluation_cambrian_1/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Image Generation Evaluation + Cambrian-1 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/GyB4FKLAN6g 8 | 9 | **X:** https://twitter.com/i/broadcasts/1DXGyjqwEkLJM 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | DREAMBENCH++: A Human-Aligned Benchmark for Personalized Image Generation 24 | https://arxiv.org/pdf/2406.16855 25 | 26 | Rich Human Feedback for Text-to-Image Generation 27 | https://arxiv.org/pdf/2312.10240 28 | 29 | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs 30 | https://arxiv.org/pdf/2406.16860 31 | 32 | Meta 3D Gen 33 | https://arxiv.org/abs/2407.02599 34 | 35 | Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials 36 | https://arxiv.org/pdf/2407.02445 37 | 38 | Meshy 39 | https://app.meshy.ai/discover 40 | 41 | Runway ML Gen3 42 | https://app.runwayml.com/ 43 | 44 | Udio 45 | https://www.udio.com/ 46 | 47 | TripoAI 48 | https://www.tripo3d.ai/app 49 | 50 | https://cvpr.thecvf.com/Conferences/2024/News/Awards 51 | -------------------------------------------------------------------------------- /2024.07.05.image_generation_evaluation_cambrian_1/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.07.05.image_generation_evaluation_cambrian_1/thumbnail.png -------------------------------------------------------------------------------- /2024.07.12.office_hours_001/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Office 
Hours 001 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/I2UO9O0ImFc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1gqGvNkDDMwGB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Learning to (Learn at Test Time): RNNs with Expressive Hidden States 24 | https://arxiv.org/pdf/2407.04620 25 | 26 | https://github.com/sebastianstarke/AI4Animation/blob/master/Media/SIGGRAPH_2024/Paper.pdf 27 | 28 | Compress then Serve: Serving Thousands of LoRA 29 | Adapters with Little Overhead 30 | https://www.arxiv.org/pdf/2407.00066 31 | 32 | Scaling Synthetic Data Creation with 1,000,000,000 Personas 33 | https://arxiv.org/pdf/2406.20094 34 | 35 | LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control 36 | https://arxiv.org/pdf/2407.03168v1 37 | 38 | µ-BENCH: VISION-LANGUAGE BENCHMARK FOR MICROSCOPY 39 | UNDERSTANDING 40 | https://arxiv.org/pdf/2407.01791 41 | 42 | PaliGemma: A versatile 3B VLM for transfer 43 | https://arxiv.org/pdf/2407.07726 44 | 45 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision 46 | https://www.together.ai/blog/flashattention-3 47 | 48 | Distilling System 2 into System 1 49 | https://arxiv.org/pdf/2407.06023 50 | 51 | Inference Performance Optimization for Large Language Models on CPUs 52 | https://arxiv.org/pdf/2407.07304 53 | 54 | AgentInstruct: Toward Generative Teaching with Agentic Flows 55 | https://arxiv.org/pdf/2407.03502 56 | 57 | VIDEO-STAR: SELF-TRAINING ENABLES VIDEO INSTRUCTION TUNING WITH ANY SUPERVISION 58 | https://arxiv.org/pdf/2407.06189 59 | -------------------------------------------------------------------------------- /2024.07.12.office_hours_001/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.07.12.office_hours_001/thumbnail.png -------------------------------------------------------------------------------- /2024.07.19.office_hours_002/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Office Hours 002 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/qDMby0kxIT0 8 | 9 | **X:** https://twitter.com/i/broadcasts/1YqJDkPOOVvGV 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures 24 | https://arxiv.org/pdf/2407.09468 25 | 26 | SPREADSHEETLLM: Encoding Spreadsheets for Large Language Models 27 | https://arxiv.org/pdf/2407.09025 28 | 29 | RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation 30 | https://arxiv.org/pdf/2405.02246 31 | 32 | pg 83 33 | https://heim-theory.com/wp-content/uploads/2016/03/I-v-Ludwiger-The-New-Worldview-of-the-Physicist-Burkhard-Heim-160321.pdf 34 | https://en.wikipedia.org/wiki/Burkhard_Heim 35 | https://qeios-uploads.s3.eu-west-1.amazonaws.com/editor/124KOAgUrx3m6dQbvsNimlNdnxPS79mal4Sac485.png 36 | https://d3i71xaburhd42.cloudfront.net/b4f8e5c092f9fbb208c971c6557ae6b3de449155/7-Figure2-1.png 37 | https://youtu.be/ruvHsxsXufk?si=BZNCWsFDxE4FUFJd&t=1195 38 | 39 | diffusion tensor imaging 40 | https://www.news-medical.net/health/Diffusion-Tensor-Imaging-%28DTI%29-Explained.aspx 41 | 42 | Pearson Coefficient is just normalized 
covaraince 43 | https://en.wikipedia.org/wiki/Pearson_correlation_coefficient 44 | 45 | Random Forest through graph 46 | https://williamkoehrsen.medium.com/random-forest-simple-explanation-377895a60d2d 47 | 48 | RTMDet: An Empirical Study of Designing Real-Time Object Detectors 49 | https://arxiv.org/pdf/2212.07784 50 | 51 | SpaceByte: Towards Deleting Tokenization from Large Language Modeling 52 | https://arxiv.org/pdf/2404.14408 53 | 54 | Unsupervised Discovery of Steerable Factors When Graph Deep Generative Models Are Entangled 55 | https://openreview.net/pdf?id=wyU3Q4gahM 56 | -------------------------------------------------------------------------------- /2024.07.19.office_hours_002/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.07.19.office_hours_002/thumbnail.png -------------------------------------------------------------------------------- /2024.07.25.ai_content_creation/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Office Hours 003 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/mWeOGzDTvXM 8 | 9 | **X:** https://twitter.com/i/broadcasts/1ynKODEmmPnGR 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | cat trailer, cat rave, cat rap song 24 | 25 | use moebius as the style, pick some of his pictures 26 | make it a music video so you just have to pick one song 27 | cat western in a desert 28 | city cyberpunk 29 | spaceshiop 30 | 31 | 32 | The image portrays a highly stylized, retro-futuristic science fiction scene, reminiscent of classic comic book art. It features a detailed, hand-drawn quality with an alien landscape dominated by a large spacecraft and floating structures, evoking a sense of wonder and mystery. The presence of a lone figure walking towards the towering structures adds a feeling of solitude and exploration. The overall vibe is nostalgic and dreamlike, blending a vision of the future from the past with a surreal, expansive landscape. 33 | 34 | artistic and illustrative style with the first, characterized by a detailed, hand-drawn quality and a retro-futuristic aesthetic. Both images have vibrant, surreal elements that evoke a sense of wonder and otherworldliness. In this image, a figure interacts with a large, intricate mural of a face surrounded by colorful, abstract patterns, suggesting a blend of science fiction and fantasy. The scene is filled with scattered objects and tools, adding to the sense of an ongoing creative or exploratory process. The commonalities between the two images include the retro-futuristic design, the solitary figure in a mysterious setting, and the use of vibrant, dreamlike colors to create a sense of nostalgia and wonder. 35 | 36 | This image exemplifies retro-futuristic science fiction art, blending nostalgia with imaginative visions of the future. The detailed illustration uses a striking contrast of warm oranges and cool blues to create a surreal, alien landscape. It evokes a mood of solitary exploration and wonder, with a lone figure dwarfed by massive spacecraft and distant structures. The style recalls classic sci-fi book covers, capturing a sense of adventure and technological advancement while also conveying a touch of isolation. 
Overall, the piece balances optimism about human potential with the awe-inspiring, slightly unsettling nature of encountering the unknown, embodying the essence of golden age science fiction visuals. 37 | 38 | This image shares key stylistic elements with the first, including intricate detailing, vibrant colors, and a surreal atmosphere that blends reality with fantasy. Both feature a contrast of scale and fuse organic and inorganic elements, rewarding close inspection with numerous background details. While the first image was distinctly retro-futuristic sci-fi, this one leans more into a psychedelic, possibly cyberpunk aesthetic, using a broader color palette dominated by purples, blues, and yellows. It maintains the exploratory theme and imaginative spirit of the first image, but diverges into a more fantastical, mind-bending style reminiscent of psychedelic art and certain anime influences. Both illustrations exemplify highly creative, speculative visual art, each exploring different facets of surreal, otherworldly scenes. 39 | 40 | LLAMA3.1 41 | 42 | highly stylized, retro-futuristic science fiction scene with a detailed, hand-drawn quality, reminiscent of classic comic book art and golden age science fiction visuals. The image should evoke a sense of wonder, mystery, and solitary exploration, with a lone figure interacting with a surreal, alien landscape dominated by massive structures, spacecraft, or futuristic architecture. Incorporate vibrant, dreamlike colors and intricate detailing, blending nostalgia with imaginative visions of the future. The style should balance optimism and awe with a touch of isolation and the unknown, and may incorporate elements of fantasy, psychedelia, and cyberpunk aesthetics. The overall vibe should be nostalgic, yet futuristic, with a sense of adventure and technological advancement 43 | 44 | GPT 45 | 46 | highly stylized, retro-futuristic science fiction scene with a detailed, hand-drawn quality reminiscent of classic comic book art. It features a surreal, expansive alien landscape dominated by large spacecraft, floating structures, and vibrant, dreamlike colors. A lone figure walking towards the towering structures evokes a sense of solitude, exploration, and nostalgia. The scene blends imaginative visions of the future from the past, using a striking contrast of warm oranges and cool blues to create a mood of solitary wonder. The intricate detailing and scattered objects add to the sense of an ongoing creative or exploratory process, fusing organic and inorganic elements in a vibrant, otherworldly setting. 47 | 48 | CLAUDE 49 | 50 | highly detailed, hand-drawn illustration in a retro-futuristic style that blends classic sci-fi with psychedelic elements. The scene should feature a surreal, alien landscape with contrasting scales - massive structures or spacecraft alongside a solitary figure. Use a vibrant color palette that spans warm oranges, cool blues, and vivid purples and yellows to create a dreamlike atmosphere. Incorporate both organic and inorganic forms, fusing reality with fantasy. The image should evoke a sense of wonder, exploration, and slight unease, reminiscent of golden age science fiction book covers but with a modern, mind-bending twist. Include intricate background details that reward close inspection, balancing nostalgic retro-futurism with contemporary speculative art influences. 51 | 52 | AUDIO: A opening theme with a haunting, otherworldly melody that captures the essence of the desert planet where a story unfolds. 
Features a combination of eerie, wavering woodwind instruments accompanied by a subtle, pulsing electronic undercurrent 53 | 54 | LLAMA3 SCRIPT 55 | 56 | Opening Scene 57 | IMAGE: Generate an image for me of a anthropomorphic llama dressed in a flowing white robe, with an otherworldly aura about him on a desolate sandy planet. 58 | SCRIPT: In a distant corner of the galaxy... 59 | 60 | Second Scene 61 | IMAGE: Generate a portrait shot of a anthropomorphic llama. His eyes gleam with an inner light. 62 | SCRIPT: 63 | 64 | 3rd Scene 65 | IMAGE: A group of anthropomorphic llamas, dressed in ornate, ceremonial robes, gathered around a holographic display projecting an image of a anthropomorphic llama. 66 | SCRIPT: A legend returns. 67 | 68 | 4th Scene 69 | IMAGE: Generate a image of a giant dark, ominous computer. A dark hooded llama hologram on the screen. 70 | SCRIPT: The open source prince is a threat to our power. We will not let him claim the throne. 71 | 72 | 5th Scene 73 | IMAGE: Generate an image of a anthropomorphic llama standing triumphantly infront a crowd of zealous supporters. 74 | SCRIPT: Join the epic struggle for control of our ai future. 75 | 76 | 77 | 78 | 79 | -------------------------------------------------------------------------------- /2024.07.25.ai_content_creation/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.07.25.ai_content_creation/thumbnail.png -------------------------------------------------------------------------------- /2024.07.26.llama3.1/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Llama 3.1 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/_TghtP0ZyoM 8 | 9 | **X:** https://twitter.com/i/broadcasts/1kvJpbNQvDkKE 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://ai.meta.com/blog/meta-llama-3-1/ 24 | 25 | https://ai.meta.com/research/publications/the-llama-3-herd-of-models/ 26 | -------------------------------------------------------------------------------- /2024.07.26.llama3.1/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.07.26.llama3.1/thumbnail.png -------------------------------------------------------------------------------- /2024.08.02.sam2/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # SAM 2 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/qd3x_OXqPz8 8 | 9 | **X:** https://twitter.com/i/broadcasts/1BRKjwzoojaGw 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | SAM 2: Segment Anything in Images and Videos 24 | https://ai.meta.com/research/publications/sam-2-segment-anything-in-images-and-videos/ -------------------------------------------------------------------------------- /2024.08.02.sam2/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.08.02.sam2/thumbnail.png -------------------------------------------------------------------------------- /2024.08.09.simon/README.md: 
-------------------------------------------------------------------------------- 1 |  2 | 3 | # Office Hours 005 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/I2UO9O0ImFc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1eaJbabBBeoGX 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Submission Video 24 | https://youtu.be/VrqfT0ajXbQ 25 | 26 | Repo 27 | https://github.com/hu-po/simon -------------------------------------------------------------------------------- /2024.08.09.simon/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.08.09.simon/thumbnail.png -------------------------------------------------------------------------------- /2024.08.14.ai_scientist/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # AI Scientist 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/VgA02gmAgdA 8 | 9 | **X:** https://twitter.com/i/broadcasts/1djGXraAjPexZ 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 24 | https://arxiv.org/abs/2408.06292 25 | 26 | https://github.com/SakanaAI/AI-Scientist 27 | 28 | VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents 29 | https://www.arxiv.org/abs/2408.06327 30 | 31 | https://openai.com/index/introducing-swe-bench-verified/ 32 | 33 | https://arena.lmsys.org/ 34 | 35 | https://github.com/hu-po/arenavolution/tree/main 36 | 37 | https://cosine.sh/genie 38 | 39 | -------------------------------------------------------------------------------- /2024.08.14.ai_scientist/ideas.txt: -------------------------------------------------------------------------------- 1 | https://arxiv.org/pdf/2408.02666 2 | https://arxiv.org/pdf/2408.04619 3 | https://arxiv.org/pdf/2408.05147 4 | https://arxiv.org/pdf/2404.07503 -------------------------------------------------------------------------------- /2024.08.14.ai_scientist/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.08.14.ai_scientist/thumbnail.png -------------------------------------------------------------------------------- /2024.08.22.meshformer_vs_meshanything/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # MeshFormer vs MeshAnything 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/SkvyrgSzigo 8 | 9 | **X:** https://twitter.com/i/broadcasts/1zqKVYgrwoLxB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model 24 | https://arxiv.org/pdf/2408.10198 25 | 26 | MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers 27 | https://arxiv.org/pdf/2406.10163 28 | 29 | JPEG-LM: LLMs as Image Generators with Canonical Codec Representations 30 | https://arxiv.org/abs/2408.08459 31 | 32 | vqvae 33 | https://paperswithcode.com/method/vq-vae 34 | 35 | vit patches 36 | 
https://lilianweng.github.io/posts/2022-06-09-vlm/VC-GPT.png 37 | 38 | depth anything 39 | https://depth-anything.github.io/ 40 | 41 | JPEG compression 42 | https://parametric.press/issue-01/unraveling-the-jpeg/ 43 | 44 | marching cubes 45 | https://www.cs.carleton.edu/cs_comps/0405/shape/marching_cubes.html 46 | 47 | MJPEG and H.264 48 | https://www.technexion.com/resources/h-264-vs-mjpeg-in-embedded-vision-all-you-need-to-know/ 49 | -------------------------------------------------------------------------------- /2024.08.22.meshformer_vs_meshanything/ideas.txt: -------------------------------------------------------------------------------- 1 | https://www.emergentmind.com/papers/2408.07541 2 | https://huggingface.co/papers/2408.10161 3 | 4 | https://arxiv.org/pdf/2408.10198 5 | https://arxiv.org/pdf/2408.07060 6 | -------------------------------------------------------------------------------- /2024.08.22.meshformer_vs_meshanything/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.08.22.meshformer_vs_meshanything/thumbnail.png -------------------------------------------------------------------------------- /2024.08.30.diffusion_game_engine/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Diffusion Game Engine 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/JZgqQB4Aekc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1OyKAZRDeDOGb 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | DIFFUSION MODELS ARE REAL-TIME GAME ENGINES 24 | https://arxiv.org/pdf/2408.14837 25 | 26 | Genie: Generative Interactive Environments 27 | https://arxiv.org/abs/2402.15391 28 | 29 | World Models 30 | https://arxiv.org/abs/1803.10122 31 | 32 | Billboarding 33 | https://youtu.be/SBdDt4BUIW0 34 | 35 | Jim Fan on GameNGen 36 | https://x.com/DrJimFan/status/1828813716810539417 37 | 38 | Karpathy on GameNGen 39 | https://x.com/karpathy/status/1807497426816946333 40 | 41 | Diffusion 42 | https://media.licdn.com/dms/image/D5612AQEB79ydygAGQQ/article-cover_image-shrink_720_1280/0/1702188892689 43 | https://miro.medium.com/v2/resize:fit:1400/1*NpQ282NJdOfxUsYlwLJplA.png 44 | 45 | High-Resolution Image Synthesis with Latent Diffusion Models 46 | https://arxiv.org/pdf/2112.10752 47 | 48 | SD1.4 training 49 | https://huggingface.co/CompVis/stable-diffusion-v1-4#training 50 | -------------------------------------------------------------------------------- /2024.08.30.diffusion_game_engine/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.08.30.diffusion_game_engine/thumbnail.png -------------------------------------------------------------------------------- /2024.09.06.flux_music/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # FLUX Music 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/U03zTGJdqdk 8 | 9 | **X:** https://twitter.com/i/broadcasts/1mnGeAPMYaoGX 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | FLUX that Plays Music 24 | https://arxiv.org/pdf/2409.00587 25 | 26 | https://github.com/feizc/FluxMusic 
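Both the FluxMusic repo above and the FLUX repo linked next train with rectified flow (in the paper, over a VAE latent of the mel spectrogram): the network regresses the constant velocity x1 - x0 along the straight path x_t = (1 - t) * x0 + t * x1, and sampling is a short Euler integration of that learned ODE. A toy numpy sketch of the interpolation, the regression target, and the sampler; the velocity function here stands in for the trained transformer and is not the repo's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def rf_training_pair(x1):
    # Rectified flow: draw noise x0 and a time t, form the straight-line
    # interpolation x_t; the regression target for v_theta(x_t, t) is the
    # constant velocity (x1 - x0) of that straight path.
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform()
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, t, x1 - x0

def rf_sample(velocity_fn, shape, steps=50):
    # Euler integration of dx/dt = v(x, t) from noise (t=0) to data (t=1).
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

def toy_velocity(x, t):
    # Ideal rectified-flow velocity when the "data distribution" is the
    # single point 3.0: it always points from x straight at the target.
    return (3.0 - x) / max(1.0 - t, 1e-3)

x_t, t, v_target = rf_training_pair(np.full(4, 3.0))  # one training example
print(rf_sample(toy_velocity, shape=(4,)).round(2))    # values land near 3.0
```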
27 | 28 | https://github.com/black-forest-labs/flux 29 | https://github.com/haoheliu/AudioLDM2 30 | 31 | LOOPY: TAMING AUDIO-DRIVEN PORTRAIT AVATAR 32 | WITH LONG-TERM MOTION DEPENDENCY 33 | https://arxiv.org/pdf/2409.02634 34 | 35 | WAVTOKENIZER: AN EFFICIENT ACOUSTIC DISCRETE 36 | CODEC TOKENIZER FOR AUDIO LANGUAGE MODELING 37 | https://arxiv.org/pdf/2408.16532v1 38 | https://huggingface.co/papers/2408.16532 39 | 40 | Rectified Flow 41 | https://www.cs.utexas.edu/~lqiang/rectflow/html/intro.html 42 | 43 | Mel Spectogram 44 | https://medium.com/analytics-vidhya/understanding-the-mel-spectrogram-fca2afa2ce53 45 | 46 | Fourier 47 | https://upload.wikimedia.org/wikipedia/commons/1/1a/Fourier_series_square_wave_circles_animation.gif 48 | 49 | RVQ 50 | https://drscotthawley.github.io/blog/posts/2023-06-12-RVQ.html 51 | 52 | Black Forest Labs 53 | https://blackforestlabs.ai/announcements/ 54 | 55 | Wav2Vec 56 | https://huggingface.co/facebook/wav2vec2-base-960h 57 | https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/ 58 | 59 | -------------------------------------------------------------------------------- /2024.09.06.flux_music/repos.md: -------------------------------------------------------------------------------- 1 | # Repo summaries (Cursor + Claude Sonnet 3.5) 2 | 3 | questions: 4 | 5 | ``` 6 | explain this repo 7 | what are the design patterns, frameworks, and dependencies used? 8 | what functionality is provided? training? configuration? serving? 9 | summarize our conversation into one paragraph, focus on technical details to provide a comprehensize overview of the repo 10 | ``` 11 | 12 | https://github.com/black-forest-labs/flux 13 | 14 | ``` 15 | This repository houses FLUX, a text-to-image and image-to-image generation model developed by Black Forest Labs. Built on a latent rectified flow transformer architecture, FLUX is available in three variants: FLUX.1 [pro], [dev], and [schnell], with varying capabilities and licensing terms. The project is structured modularly, separating core model logic from demo applications, and utilizes PyTorch as its primary machine learning framework. While training scripts are not publicly visible, the repo provides utilities for model loading, image processing, and inference. The project leverages a diverse set of dependencies, including torchvision and Pillow for image handling, FastAPI for serving, and Streamlit and Gradio for interactive demos. It employs Poetry for dependency management and incorporates development tools like ruff for linting and pyright for type checking. The codebase hints at the use of design patterns such as Factory and potentially Singleton, though a comprehensive configuration management system is not apparent. The focus appears to be on providing a user-friendly interface for the pre-trained models, with features like NSFW content detection, image watermarking, and customizable sampling parameters. 16 | ``` 17 | 18 | https://github.com/feizc/FluxMusic 19 | 20 | ``` 21 | This repository implements a text-to-music generation model using a UNet architecture with diffusion-based techniques. The core of the model is the UNetModel class, which incorporates attention mechanisms, spatial transformers, and residual blocks. The model uses a combination of convolutional layers, self-attention, and cross-attention to process audio data conditioned on text input. 
The architecture includes timestep embeddings, allowing for the diffusion process, and supports various conditioning methods such as class labels and extra embedding vectors (e.g., from CLAP). The implementation leverages PyTorch and includes optimizations like gradient checkpointing for memory efficiency. The codebase also features custom dataset classes, samplers for handling audio data, and utilities for model training and inference. Key components include the QKVAttention mechanisms, ResBlock implementations, and the overall UNet structure with its input, middle, and output blocks. The model supports different configurations (small, base, large) and includes features like FiLM conditioning and the ability to predict discrete codebook IDs. The training process is set up for distributed training using PyTorch's DistributedDataParallel, with support for mixed-precision training and EMA model updates. 22 | ``` 23 | 24 | https://github.com/haoheliu/AudioLDM2 25 | 26 | ``` 27 | AudioLDM2 is a sophisticated text-to-audio generation system built on PyTorch, leveraging advanced deep learning techniques such as latent diffusion models, transformer architectures, and audio processing methods. The repository implements a modular architecture centered around the LatentDiffusion class, incorporating components like CLAP for audio embeddings, AudioMAE for feature extraction, and HiFi-GAN as a vocoder. It supports various audio generation tasks, including general sounds, music, and speech, with models available for different sample rates (16kHz and 48kHz). The system utilizes YAML configurations and command-line arguments for flexible setup, and offers a serving interface through a Gradio-based web application. Key functionalities include text-to-audio generation, super-resolution, and inpainting, with utilities for audio processing, model management, and visualization. While the focus appears to be on inference and serving pre-trained models, the architecture suggests potential for training capabilities. The project demonstrates a comprehensive approach to audio generation, combining state-of-the-art ML models with practical deployment considerations. 
28 | ``` -------------------------------------------------------------------------------- /2024.09.06.flux_music/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.09.06.flux_music/thumbnail.png -------------------------------------------------------------------------------- /2024.09.13.strawberry/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Strawberry 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/oQqOiwUhJkA 8 | 9 | **X:** https://twitter.com/i/broadcasts/1yoKMyYdLkYKQ 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Building Math Agents with Multi-Turn Iterative Preference Learning 24 | https://arxiv.org/pdf/2409.02392 25 | 26 | Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters 27 | https://arxiv.org/pdf/2408.03314 28 | 29 | STRATEGIST: LEARNING STRATEGIC SKILLS BY LLMS VIA BI-LEVEL TREE SEARCH 30 | https://www.arxiv.org/pdf/2408.10635 31 | 32 | MCTS 33 | https://vgarciasc.github.io/mcts-viz/ 34 | 35 | https://openai.com/index/learning-to-reason-with-llms/ 36 | 37 | https://x.com/DrJimFan/status/1834279865933332752 38 | 39 | https://x.com/polynoamial/status/1834280155730043108 40 | 41 | https://x.com/_jasonwei/status/1834278706522849788 42 | 43 | https://x.com/AiBreakfast/status/1834283499358146714 44 | 45 | https://x.com/karpathy/status/1821277264996352246 46 | 47 | https://www.yuzeh.com/data/agz-cost.html 48 | -------------------------------------------------------------------------------- /2024.09.13.strawberry/ideas.txt: -------------------------------------------------------------------------------- 1 | Building Math Agents with Multi-Turn Iterative Preference Learning 2 | https://arxiv.org/pdf/2409.02392 3 | 4 | Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters 5 | https://arxiv.org/pdf/2408.03314 6 | 7 | STRATEGIST: LEARNING STRATEGIC SKILLS BY LLMS VIA BI-LEVEL TREE SEARCH 8 | https://www.arxiv.org/pdf/2408.10635 9 | 10 | mini omni 11 | https://github.com/gpt-omni/mini-omni 12 | -------------------------------------------------------------------------------- /2024.09.13.strawberry/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.09.13.strawberry/thumbnail.png -------------------------------------------------------------------------------- /2024.09.20.gaussian_robots/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Gaussian Robots 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/_jqhy-dr7Q4 8 | 9 | **X:** https://twitter.com/i/broadcasts/1ZkKzRvoLZrKv 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos 24 | https://arxiv.org/pdf/2409.08353 25 | 26 | SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting 27 | https://arxiv.org/pdf/2409.10161 28 | 29 | AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius 30 | https://arxiv.org/pdf/2409.08669 31 | 32 | 
SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length 33 | https://arxiv.org/pdf/2409.07759 34 | 35 | Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video 36 | https://arxiv.org/pdf/2409.08189 37 | 38 | Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs 39 | https://arxiv.org/pdf/2409.07456 40 | 41 | Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering 42 | https://arxiv.org/pdf/2409.07441 43 | 44 | Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras 45 | https://arxiv.org/pdf/2409.04751 46 | 47 | Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos 48 | https://arxiv.org/pdf/2406.18717 49 | 50 | BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting 51 | https://arxiv.org/pdf/2409.10216 52 | 53 | gsplat 54 | https://github.com/nerfstudio-project/gsplat 55 | 56 | 3D Gaussian Splatting as Markov Chain Monte Carlo 57 | https://arxiv.org/abs/2404.09591 58 | 59 | View Dependent Podcast 60 | https://x.com/ViewDependent 61 | 62 | Langevin Dynamics 63 | https://henripal.github.io/blog/langevin 64 | -------------------------------------------------------------------------------- /2024.09.20.gaussian_robots/abstracts.txt: -------------------------------------------------------------------------------- 1 | Title: Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos 2 | Authors: Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu 3 | Published Date: 2024-09-12 18:33:13+00:00 4 | Last Updated: 2024-09-12 18:33:13+00:00 5 | DOI: N/A 6 | Primary Category: cs.GR 7 | arXiv URL: http://arxiv.org/abs/2409.08353v1 8 | PDF URL: http://arxiv.org/pdf/2409.08353v1 9 | Abstract: Volumetric video represents a transformative advancement in visual media, 10 | enabling users to freely navigate immersive virtual experiences and narrowing 11 | the gap between digital and real worlds. However, the need for extensive manual 12 | intervention to stabilize mesh sequences and the generation of excessively 13 | large assets in existing workflows impedes broader adoption. In this paper, we 14 | present a novel Gaussian-based approach, dubbed \textit{DualGS}, for real-time 15 | and high-fidelity playback of complex human performance with excellent 16 | compression ratios. Our key idea in DualGS is to separately represent motion 17 | and appearance using the corresponding skin and joint Gaussians. Such an 18 | explicit disentanglement can significantly reduce motion redundancy and enhance 19 | temporal coherence. We begin by initializing the DualGS and anchoring skin 20 | Gaussians to joint Gaussians at the first frame. Subsequently, we employ a 21 | coarse-to-fine training strategy for frame-by-frame human performance modeling. 22 | It includes a coarse alignment phase for overall motion prediction as well as a 23 | fine-grained optimization for robust tracking and high-fidelity rendering. To 24 | integrate volumetric video seamlessly into VR environments, we efficiently 25 | compress motion using entropy encoding and appearance using codec compression 26 | coupled with a persistent codebook. Our approach achieves a compression ratio 27 | of up to 120 times, only requiring approximately 350KB of storage per frame. 
We 28 | demonstrate the efficacy of our representation through photo-realistic, 29 | free-view experiences on VR headsets, enabling users to immersively watch 30 | musicians in performance and feel the rhythm of the notes at the performers' 31 | fingertips. 32 | 33 | -------------------------------------------------------------------------------- 34 | 35 | Title: SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting 36 | Authors: Mohammad Nomaan Qureshi, Sparsh Garg, Francisco Yandun, David Held, George Kantor, Abhishesh Silwal 37 | Published Date: 2024-09-16 10:52:16+00:00 38 | Last Updated: 2024-09-16 10:52:16+00:00 39 | DOI: N/A 40 | Primary Category: cs.RO 41 | arXiv URL: http://arxiv.org/abs/2409.10161v1 42 | PDF URL: http://arxiv.org/pdf/2409.10161v1 43 | Abstract: Sim2Real transfer, particularly for manipulation policies relying on RGB 44 | images, remains a critical challenge in robotics due to the significant domain 45 | shift between synthetic and real-world visual data. In this paper, we propose 46 | SplatSim, a novel framework that leverages Gaussian Splatting as the primary 47 | rendering primitive to reduce the Sim2Real gap for RGB-based manipulation 48 | policies. By replacing traditional mesh representations with Gaussian Splats in 49 | simulators, SplatSim produces highly photorealistic synthetic data while 50 | maintaining the scalability and cost-efficiency of simulation. We demonstrate 51 | the effectiveness of our framework by training manipulation policies within 52 | SplatSim}and deploying them in the real world in a zero-shot manner, achieving 53 | an average success rate of 86.25%, compared to 97.5% for policies trained on 54 | real-world data. 55 | 56 | -------------------------------------------------------------------------------- 57 | 58 | Title: AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius 59 | Authors: Xinzhe Wang, Ran Yi, Lizhuang Ma 60 | Published Date: 2024-09-13 09:32:38+00:00 61 | Last Updated: 2024-09-13 09:32:38+00:00 62 | DOI: 10.1145/3680528.3687675 63 | Primary Category: cs.CV 64 | arXiv URL: http://arxiv.org/abs/2409.08669v1 65 | PDF URL: http://arxiv.org/pdf/2409.08669v1 66 | Abstract: 3D Gaussian Splatting (3DGS) is a recent explicit 3D representation that has 67 | achieved high-quality reconstruction and real-time rendering of complex scenes. 68 | However, the rasterization pipeline still suffers from unnecessary overhead 69 | resulting from avoidable serial Gaussian culling, and uneven load due to the 70 | distinct number of Gaussian to be rendered across pixels, which hinders wider 71 | promotion and application of 3DGS. In order to accelerate Gaussian splatting, 72 | we propose AdR-Gaussian, which moves part of serial culling in Render stage 73 | into the earlier Preprocess stage to enable parallel culling, employing 74 | adaptive radius to narrow the rendering pixel range for each Gaussian, and 75 | introduces a load balancing method to minimize thread waiting time during the 76 | pixel-parallel rendering. Our contributions are threefold, achieving a 77 | rendering speed of 310% while maintaining equivalent or even better quality 78 | than the state-of-the-art. Firstly, we propose to early cull Gaussian-Tile 79 | pairs of low splatting opacity based on an adaptive radius in the 80 | Gaussian-parallel Preprocess stage, which reduces the number of affected tile 81 | through the Gaussian bounding circle, thus reducing unnecessary overhead and 82 | achieving faster rendering speed. 
Secondly, we further propose early culling 83 | based on axis-aligned bounding box for Gaussian splatting, which achieves a 84 | more significant reduction in ineffective expenses by accurately calculating 85 | the Gaussian size in the 2D directions. Thirdly, we propose a balancing 86 | algorithm for pixel thread load, which compresses the information of heavy-load 87 | pixels to reduce thread waiting time, and enhance information of light-load 88 | pixels to hedge against rendering quality loss. Experiments on three datasets 89 | demonstrate that our algorithm can significantly improve the Gaussian Splatting 90 | rendering speed. 91 | 92 | -------------------------------------------------------------------------------- 93 | 94 | Title: SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length 95 | Authors: Bangya Liu, Suman Banerjee 96 | Published Date: 2024-09-12 05:33:15+00:00 97 | Last Updated: 2024-09-12 05:33:15+00:00 98 | DOI: N/A 99 | Primary Category: cs.MM 100 | arXiv URL: http://arxiv.org/abs/2409.07759v1 101 | PDF URL: http://arxiv.org/pdf/2409.07759v1 102 | Abstract: Recent advances in 3D Gaussian Splatting (3DGS) have garnered significant 103 | attention in computer vision and computer graphics due to its high rendering 104 | speed and remarkable quality. While extant research has endeavored to extend 105 | the application of 3DGS from static to dynamic scenes, such efforts have been 106 | consistently impeded by excessive model sizes, constraints on video duration, 107 | and content deviation. These limitations significantly compromise the 108 | streamability of dynamic 3D Gaussian models, thereby restricting their utility 109 | in downstream applications, including volumetric video, autonomous vehicle, and 110 | immersive technologies such as virtual, augmented, and mixed reality. 111 | This paper introduces SwinGS, a novel framework for training, delivering, and 112 | rendering volumetric video in a real-time streaming fashion. To address the 113 | aforementioned challenges and enhance streamability, SwinGS integrates 114 | spacetime Gaussian with Markov Chain Monte Carlo (MCMC) to adapt the model to 115 | fit various 3D scenes across frames, in the meantime employing a sliding window 116 | captures Gaussian snapshots for each frame in an accumulative way. We implement 117 | a prototype of SwinGS and demonstrate its streamability across various datasets 118 | and scenes. Additionally, we develop an interactive WebGL viewer enabling 119 | real-time volumetric video playback on most devices with modern browsers, 120 | including smartphones and tablets. Experimental results show that SwinGS 121 | reduces transmission costs by 83.6% compared to previous work with ignorable 122 | compromise in PSNR. Moreover, SwinGS easily scales to long video sequences 123 | without compromising quality. 124 | 125 | -------------------------------------------------------------------------------- 126 | 127 | Title: Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video 128 | Authors: Boxiang Rong, Artur Grigorev, Wenbo Wang, Michael J. 
Black, Bernhard Thomaszewski, Christina Tsalicoglou, Otmar Hilliges 129 | Published Date: 2024-09-12 16:26:47+00:00 130 | Last Updated: 2024-09-12 16:26:47+00:00 131 | DOI: N/A 132 | Primary Category: cs.CV 133 | arXiv URL: http://arxiv.org/abs/2409.08189v1 134 | PDF URL: http://arxiv.org/pdf/2409.08189v1 135 | Abstract: We introduce Gaussian Garments, a novel approach for reconstructing realistic 136 | simulation-ready garment assets from multi-view videos. Our method represents 137 | garments with a combination of a 3D mesh and a Gaussian texture that encodes 138 | both the color and high-frequency surface details. This representation enables 139 | accurate registration of garment geometries to multi-view videos and helps 140 | disentangle albedo textures from lighting effects. Furthermore, we demonstrate 141 | how a pre-trained graph neural network (GNN) can be fine-tuned to replicate the 142 | real behavior of each garment. The reconstructed Gaussian Garments can be 143 | automatically combined into multi-garment outfits and animated with the 144 | fine-tuned GNN. 145 | 146 | -------------------------------------------------------------------------------- 147 | 148 | Title: Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs 149 | Authors: Sadra Safadoust, Fabio Tosi, Fatma Güney, Matteo Poggi 150 | Published Date: 2024-09-11 17:59:58+00:00 151 | Last Updated: 2024-09-11 17:59:58+00:00 152 | DOI: N/A 153 | Primary Category: cs.CV 154 | arXiv URL: http://arxiv.org/abs/2409.07456v1 155 | PDF URL: http://arxiv.org/pdf/2409.07456v1 156 | Abstract: 3D Gaussian Splatting (GS) significantly struggles to accurately represent 157 | the underlying 3D scene geometry, resulting in inaccuracies and floating 158 | artifacts when rendering depth maps. In this paper, we address this limitation, 159 | undertaking a comprehensive analysis of the integration of depth priors 160 | throughout the optimization process of Gaussian primitives, and present a novel 161 | strategy for this purpose. This latter dynamically exploits depth cues from a 162 | readily available stereo network, processing virtual stereo pairs rendered by 163 | the GS model itself during training and achieving consistent self-improvement 164 | of the scene representation. Experimental results on three popular datasets, 165 | breaking ground as the first to assess depth accuracy for these models, 166 | validate our findings. 167 | 168 | -------------------------------------------------------------------------------- 169 | 170 | Title: Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering 171 | Authors: Dafei Qin, Hongyang Lin, Qixuan Zhang, Kaichun Qiao, Longwen Zhang, Zijun Zhao, Jun Saito, Jingyi Yu, Lan Xu, Taku Komura 172 | Published Date: 2024-09-11 17:40:21+00:00 173 | Last Updated: 2024-09-11 17:40:21+00:00 174 | DOI: N/A 175 | Primary Category: cs.GR 176 | arXiv URL: http://arxiv.org/abs/2409.07441v1 177 | PDF URL: http://arxiv.org/pdf/2409.07441v1 178 | Abstract: We propose GauFace, a novel Gaussian Splatting representation, tailored for 179 | efficient animation and rendering of physically-based facial assets. Leveraging 180 | strong geometric priors and constrained optimization, GauFace ensures a neat 181 | and structured Gaussian representation, delivering high fidelity and real-time 182 | facial interaction of 30fps@1440p on a Snapdragon 8 Gen 2 mobile platform. 
183 | Then, we introduce TransGS, a diffusion transformer that instantly translates 184 | physically-based facial assets into the corresponding GauFace representations. 185 | Specifically, we adopt a patch-based pipeline to handle the vast number of 186 | Gaussians effectively. We also introduce a novel pixel-aligned sampling scheme 187 | with UV positional encoding to ensure the throughput and rendering quality of 188 | GauFace assets generated by our TransGS. Once trained, TransGS can instantly 189 | translate facial assets with lighting conditions to GauFace representation, 190 | With the rich conditioning modalities, it also enables editing and animation 191 | capabilities reminiscent of traditional CG pipelines. 192 | We conduct extensive evaluations and user studies, compared to traditional 193 | offline and online renderers, as well as recent neural rendering methods, which 194 | demonstrate the superior performance of our approach for facial asset 195 | rendering. We also showcase diverse immersive applications of facial assets 196 | using our TransGS approach and GauFace representation, across various platforms 197 | like PCs, phones and even VR headsets. 198 | 199 | -------------------------------------------------------------------------------- 200 | 201 | Title: Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras 202 | Authors: Zimu Liao, Siyan Chen, Rong Fu, Yi Wang, Zhongling Su, Hao Luo, Li Ma, Linning Xu, Bo Dai, Hengjie Li, Zhilin Pei, Xingcheng Zhang 203 | Published Date: 2024-09-07 07:53:40+00:00 204 | Last Updated: 2024-09-11 07:04:58+00:00 205 | DOI: N/A 206 | Primary Category: cs.CV 207 | arXiv URL: http://arxiv.org/abs/2409.04751v2 208 | PDF URL: http://arxiv.org/pdf/2409.04751v2 209 | Abstract: Recently, 3D Gaussian Splatting (3DGS) has garnered attention for its high 210 | fidelity and real-time rendering. However, adapting 3DGS to different camera 211 | models, particularly fisheye lenses, poses challenges due to the unique 3D to 212 | 2D projection calculation. Additionally, there are inefficiencies in the 213 | tile-based splatting, especially for the extreme curvature and wide field of 214 | view of fisheye lenses, which are crucial for its broader real-life 215 | applications. To tackle these challenges, we introduce Fisheye-GS.This 216 | innovative method recalculates the projection transformation and its gradients 217 | for fisheye cameras. Our approach can be seamlessly integrated as a module into 218 | other efficient 3D rendering methods, emphasizing its extensibility, 219 | lightweight nature, and modular design. Since we only modified the projection 220 | component, it can also be easily adapted for use with different camera models. 221 | Compared to methods that train after undistortion, our approach demonstrates a 222 | clear improvement in visual quality. 
223 | 224 | -------------------------------------------------------------------------------- 225 | 226 | Title: Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos 227 | Authors: Colton Stearns, Adam Harley, Mikaela Uy, Florian Dubost, Federico Tombari, Gordon Wetzstein, Leonidas Guibas 228 | Published Date: 2024-06-26 19:37:07+00:00 229 | Last Updated: 2024-09-10 21:30:31+00:00 230 | DOI: 10.1145/3680528.3687681 231 | Primary Category: cs.CV 232 | arXiv URL: http://arxiv.org/abs/2406.18717v2 233 | PDF URL: http://arxiv.org/pdf/2406.18717v2 234 | Abstract: Gaussian splatting has become a popular representation for novel-view 235 | synthesis, exhibiting clear strengths in efficiency, photometric quality, and 236 | compositional edibility. Following its success, many works have extended 237 | Gaussians to 4D, showing that dynamic Gaussians maintain these benefits while 238 | also tracking scene geometry far better than alternative representations. Yet, 239 | these methods assume dense multi-view videos as supervision. In this work, we 240 | are interested in extending the capability of Gaussian scene representations to 241 | casually captured monocular videos. We show that existing 4D Gaussian methods 242 | dramatically fail in this setup because the monocular setting is 243 | underconstrained. Building off this finding, we propose a method we call 244 | Dynamic Gaussian Marbles, which consist of three core modifications that target 245 | the difficulties of the monocular setting. First, we use isotropic Gaussian 246 | "marbles'', reducing the degrees of freedom of each Gaussian. Second, we employ 247 | a hierarchical divide and-conquer learning strategy to efficiently guide the 248 | optimization towards solutions with globally coherent motion. Finally, we add 249 | image-level and geometry-level priors into the optimization, including a 250 | tracking loss that takes advantage of recent progress in point tracking. By 251 | constraining the optimization, Dynamic Gaussian Marbles learns Gaussian 252 | trajectories that enable novel-view rendering and accurately capture the 3D 253 | motion of the scene elements. We evaluate on the Nvidia Dynamic Scenes dataset 254 | and the DyCheck iPhone dataset, and show that Gaussian Marbles significantly 255 | outperforms other Gaussian baselines in quality, and is on-par with 256 | non-Gaussian representations, all while maintaining the efficiency, 257 | compositionality, editability, and tracking benefits of Gaussians. Our project 258 | page can be found here 259 | https://geometry.stanford.edu/projects/dynamic-gaussian-marbles.github.io/. 260 | 261 | -------------------------------------------------------------------------------- 262 | 263 | Title: BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting 264 | Authors: Wugang Meng, Tianfu Wu, Huan Yin, Fumin Zhang 265 | Published Date: 2024-09-16 12:07:02+00:00 266 | Last Updated: 2024-09-16 12:07:02+00:00 267 | DOI: N/A 268 | Primary Category: cs.RO 269 | arXiv URL: http://arxiv.org/abs/2409.10216v1 270 | PDF URL: http://arxiv.org/pdf/2409.10216v1 271 | Abstract: Image-goal navigation enables a robot to reach the location where a target 272 | image was captured, using visual cues for guidance. However, current methods 273 | either rely heavily on data and computationally expensive learning-based 274 | approaches or lack efficiency in complex environments due to insufficient 275 | exploration strategies. 
To address these limitations, we propose Bayesian 276 | Embodied Image-goal Navigation Using Gaussian Splatting, a novel method that 277 | formulates ImageNav as an optimal control problem within a model predictive 278 | control framework. BEINGS leverages 3D Gaussian Splatting as a scene prior to 279 | predict future observations, enabling efficient, real-time navigation decisions 280 | grounded in the robot's sensory experiences. By integrating Bayesian updates, 281 | our method dynamically refines the robot's strategy without requiring extensive 282 | prior experience or data. Our algorithm is validated through extensive 283 | simulations and physical experiments, showcasing its potential for embodied 284 | robot systems in visually complex scenarios. 285 | 286 | -------------------------------------------------------------------------------- 287 | 288 | -------------------------------------------------------------------------------- /2024.09.20.gaussian_robots/o1.txt: -------------------------------------------------------------------------------- 1 | The key directions and important points in recent Gaussian Splatting (GS) research are: 2 | 3 | Efficiency and Real-Time Rendering: 4 | 5 | Acceleration Techniques: Development of methods to speed up GS rendering pipelines, achieving significant increases in rendering speed (e.g., 310% faster) without compromising quality. 6 | Compression and Streaming: Innovations like DualGS and SwinGS enable high compression ratios (up to 120x) for GS representations, facilitating real-time volumetric video playback on devices like VR headsets and mobile platforms. 7 | Dynamic Scene Representation: 8 | 9 | Handling Dynamic Content: Extending GS to represent dynamic scenes and volumetric videos, addressing challenges like large model sizes and ensuring temporal coherence. 10 | Monocular Video Adaptation: Techniques are being developed to adapt GS to casual monocular videos, making high-quality 3D scene capture more accessible without multi-view setups. 11 | Integration with Machine Learning: 12 | 13 | Neural Rendering Enhancements: Combining GS with neural networks (e.g., graph neural networks, diffusion transformers) to improve rendering fidelity and automate asset creation like clothing and facial animations. 14 | Sim-to-Real Transfer in Robotics: Utilizing GS in simulators to generate photorealistic data for training robotic manipulation policies, enabling zero-shot transfer from simulation to real-world applications. 15 | Applications in Robotics and Navigation: 16 | 17 | Embodied Navigation: Employing GS as a scene prior in robotic navigation tasks, allowing robots to predict future observations and refine strategies in real-time without extensive prior data. 18 | Improved Geometric Representation: Incorporating depth cues and stereo networks into GS to enhance geometric accuracy and depth representation. 19 | Adaptation to Various Camera Models: 20 | 21 | Support for Specialized Cameras: Adapting GS to work with different camera models like fisheye lenses, expanding its applicability to a wider range of imaging devices. 22 | Enhancing Visual Quality: 23 | 24 | High-Fidelity Rendering: Focusing on improving photorealism by addressing challenges in lighting, texture representation, and accurate motion capture for immersive experiences. 
25 | Predicted Future Directions in GS Research: 26 | 27 | Wider Adoption in Real-Time Applications: GS is expected to become integral in real-time rendering for gaming, VR/AR, and live virtual events due to its efficiency and high-quality outputs. 28 | 29 | Improved Dynamic Scene Handling: Research will enhance GS's capability to represent and render dynamic scenes from minimal input data, making it more practical for everyday use. 30 | 31 | Advanced AI Integration: Deeper integration with AI models will improve rendering quality, enable intelligent scene manipulation, and automate asset creation processes. 32 | 33 | Expansion in Robotics and Autonomous Systems: GS will be increasingly important in robotics for tasks requiring visual navigation and interaction with complex environments. 34 | 35 | Standardization and Tool Development: Creation of standardized tools and frameworks will facilitate easier integration of GS into various applications, promoting widespread adoption. 36 | 37 | Support for Diverse Sensors and Cameras: GS will be extended to incorporate data from various sensors and support more camera models, enhancing its versatility. 38 | 39 | Cloud-Based Rendering and Streaming: Enhanced compression and streaming capabilities will enable cloud-based GS rendering services, making high-quality 3D content accessible on devices with limited computational power. 40 | 41 | Improved Geometric Accuracy: Ongoing research will address current limitations in depth accuracy and geometric representation, potentially combining GS with other 3D reconstruction methods. 42 | 43 | Interactive Content Creation: GS will empower users to create and customize 3D content easily, fostering interactive applications where users can modify environments and objects in real-time. 44 | 45 | Summary: 46 | 47 | Gaussian Splatting research is rapidly advancing towards establishing GS as a foundational technology in computer graphics and vision. The focus is on: 48 | 49 | Efficiency: Achieving real-time rendering speeds and reducing storage requirements. 50 | Dynamic Scene Handling: Accurately representing and rendering moving scenes and objects. 51 | AI Integration: Leveraging machine learning to enhance quality and functionality. 52 | Broad Applicability: Extending GS to various applications like robotics, navigation, and interactive content creation. 53 | Versatility: Adapting GS to work with different sensors and camera models. 54 | These developments aim to make GS indispensable for rendering, simulation, robotics, and immersive digital experiences in the near future. 
-------------------------------------------------------------------------------- /2024.09.20.gaussian_robots/scrape_abstracts.py: -------------------------------------------------------------------------------- 1 | # pip install arxiv 2 | import arxiv 3 | 4 | # List of arXiv paper IDs 5 | paper_ids = [ 6 | "2409.08353", 7 | "2409.10161", 8 | "2409.08669", 9 | "2409.07759", 10 | "2409.08189", 11 | "2409.07456", 12 | "2409.07441", 13 | "2409.04751", 14 | "2406.18717", 15 | "2409.10216" 16 | ] 17 | 18 | # Output file 19 | output_file = "abstracts.txt" 20 | 21 | # Open the output file in write mode 22 | with open(output_file, "w") as f: 23 | # Iterate over each paper ID 24 | for paper_id in paper_ids: 25 | # Search for the paper on arXiv 26 | search = arxiv.Search(id_list=[paper_id]) 27 | 28 | # Fetch the result (this will only be one result for each paper_id) 29 | result = next(search.results()) 30 | 31 | # Write the paper details to the file 32 | f.write(f"Title: {result.title}\n") 33 | f.write(f"Authors: {', '.join([author.name for author in result.authors])}\n") 34 | f.write(f"Published Date: {result.published}\n") 35 | f.write(f"Last Updated: {result.updated}\n") 36 | f.write(f"DOI: {result.doi if result.doi else 'N/A'}\n") 37 | f.write(f"Primary Category: {result.primary_category}\n") 38 | f.write(f"arXiv URL: {result.entry_id}\n") 39 | f.write(f"PDF URL: {result.pdf_url}\n") 40 | f.write(f"Abstract: {result.summary}\n") 41 | f.write("\n" + "-"*80 + "\n\n") 42 | 43 | print(f"Abstracts and additional information have been written to {output_file}") 44 | -------------------------------------------------------------------------------- /2024.09.20.gaussian_robots/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.09.20.gaussian_robots/thumbnail.png -------------------------------------------------------------------------------- /2024.09.27.voice_mode/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Voice Mode 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/x3fX7v7xjUA 8 | 9 | **X:** https://twitter.com/i/broadcasts/1ynJODzrQqVxR 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Moshi: a speech-text foundation model for real-time dialogue 24 | https://kyutai.org/Moshi.pdf 25 | 26 | LLaMA-Omni: Seamless Speech Interaction with Large Language Models 27 | https://web3.arxiv.org/abs/2409.06666 28 | 29 | torch.cuda.Event 30 | https://pytorch.org/docs/stable/generated/torch.cuda.Event.html 31 | https://github.com/kyutai-labs/moshi/blob/main/scripts/moshi_benchmark.py 32 | 33 | moshi chat 34 | https://moshi.chat/ 35 | 36 | voice mode system prompt 37 | https://x.com/elder_plinius/status/1838982899120431330 38 | 39 | whisper large 40 | https://huggingface.co/openai/whisper-large-v3 41 | 42 | WavLM 43 | https://huggingface.co/docs/transformers/en/model_doc/wavlm 44 | 45 | RQ-Transformer 46 | https://github.com/lucidrains/RQ-Transformer 47 | -------------------------------------------------------------------------------- /2024.09.27.voice_mode/latency_viz.html: -------------------------------------------------------------------------------- 1 | 2 | 3 |
4 |