├── 2024.02.17.sora_vs_gemini ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.02.18.ai_agents ├── README.md └── thumbnail.png ├── 2024.02.24.network_diffusion ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.02.25.longrope ├── README.md └── thumbnail.png ├── 2024.03.02.genie_world_model ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.03.03.humanoid_locomotion ├── README.md └── thumbnail.png ├── 2024.03.09.stable_diffusion_3 ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.03.10.triposr ├── README.md └── thumbnail.png ├── 2024.03.16.vision_motion_mamba ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.03.17.multi_modal_pretraining ├── README.md └── thumbnail.png ├── 2024.03.23.generative_3d ├── README.md ├── abstracts.txt ├── ideas.txt └── thumbnail.png ├── 2024.03.24.mindseye2_neuralink ├── README.md └── thumbnail.png ├── 2024.03.30.collective_intelligence ├── README.md └── thumbnail.png ├── 2024.03.31.gamba_jamba_cobra ├── README.md └── thumbnail.png ├── 2024.04.06.mixture_of_depths ├── README.md └── thumbnail.png ├── 2024.04.07.instantsplat_dust3r ├── README.md └── thumbnail.png ├── 2024.04.13.osworld_wilbur_gameagents ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.04.20.conscious_simulated_holograms ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.04.21.qstar ├── README.md └── thumbnail.png ├── 2024.04.27.openelm_phi3_qllama3 ├── README.md └── thumbnail.png ├── 2024.04.28.animals_vs_robots ├── README.md └── thumbnail.png ├── 2024.05.04.gsm1k_vlm_eval ├── README.md └── thumbnail.png ├── 2024.05.05.gaussian_surfels ├── README.md └── thumbnail.png ├── 2024.05.11.xlstm ├── README.md └── thumbnail.png ├── 2024.05.12.egocentric_vision_rl ├── README.md └── thumbnail.png ├── 2024.05.18.platonic_represenation_hypothesis ├── README.md └── thumbnail.png ├── 2024.05.19.building_multimodal_models ├── README.md └── thumbnail.png ├── 2024.05.25.thermodynamic_gradient_descent ├── README.md └── thumbnail.png ├── 2024.05.26.octo ├── README.md └── thumbnail.png ├── 2024.06.01.vidu4d ├── README.md └── thumbnail.png ├── 2024.06.02.road_less_scheduled ├── README.md └── thumbnail.png ├── 2024.06.15.karpamambathy_001 ├── README.md └── thumbnail.png ├── 2024.06.15.karpamambathy_002 ├── README.md └── thumbnail.png ├── 2024.07.05.image_generation_evaluation_cambrian_1 ├── README.md └── thumbnail.png ├── 2024.07.12.office_hours_001 ├── README.md └── thumbnail.png ├── 2024.07.19.office_hours_002 ├── README.md └── thumbnail.png ├── 2024.07.25.ai_content_creation ├── README.md └── thumbnail.png ├── 2024.07.26.llama3.1 ├── README.md └── thumbnail.png ├── 2024.08.02.sam2 ├── README.md └── thumbnail.png ├── 2024.08.09.simon ├── README.md └── thumbnail.png ├── 2024.08.14.ai_scientist ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.08.22.meshformer_vs_meshanything ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.08.30.diffusion_game_engine ├── README.md └── thumbnail.png ├── 2024.09.06.flux_music ├── README.md ├── repos.md └── thumbnail.png ├── 2024.09.13.strawberry ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.09.20.gaussian_robots ├── README.md ├── abstracts.txt ├── o1.txt ├── scrape_abstracts.py └── thumbnail.png ├── 2024.09.27.voice_mode ├── README.md ├── latency_viz.html └── thumbnail.png ├── 2024.10.04.generative_molecular_dynamics ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.10.11.meta_movie_gen ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.10.18.automated_ml_research ├── README.md └── thumbnail.png ├── 
2024.10.25.tiny_transformers ├── README.md └── thumbnail.png ├── 2024.11.01.rag ├── README.md └── thumbnail.png ├── 2024.11.08.tokenformer ├── README.md └── thumbnail.png ├── 2024.11.22.visual_reasoning ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.11.29.streaming_rl ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.12.06.video_generation ├── README.md ├── ideas.txt └── thumbnail.png ├── 2024.12.13.visual_autoregressive_modeling ├── README.md └── thumbnail.png ├── 2024.12.20.generative_latent_space_reasoning ├── README.md ├── ideas.txt └── thumbnail.png ├── 2025.01.10.agent_frameworks ├── README.md └── thumbnail.png ├── 2025.01.17.asi ├── README.md └── thumbnail.png ├── 2025.01.24.relative_entropy ├── README.md └── thumbnail.png ├── 2025.01.31.open_source_deepseek ├── README.md └── thumbnail.png ├── 2025.02.07.real2sim2real ├── README.md └── thumbnail.png ├── 2025.02.14.test_time_scaling ├── README.md └── thumbnail.png ├── 2025.05.23.robotics_vibe_coding ├── README.md └── thumbnail.png ├── 2025.05.30.cursor_vs_codex ├── README.md └── thumbnail.jpg ├── 2025.06.06.smolvla ├── README.md └── thumbnail.png ├── README.md └── template └── README.md /2024.02.17.sora_vs_gemini/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Sora vs Gemini 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/dPonS4kISPM 8 | 9 | **X:** https://twitter.com/i/broadcasts/1mnxepkDjrqJX 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **YouTube:** 18 | 19 | **TikTok:** 20 | 21 | **Reddit:** 22 | 23 | ### References 24 | 25 | - https://openai.com/research/video-generation-models-as-world-simulators 26 | - https://arxiv.org/pdf/2402.08268.pdf 27 | - https://x.com/gabor/status/1758282791547232482 28 | - https://arxiv.org/pdf/2402.08609.pdf 29 | - https://arxiv.org/pdf/2402.09727.pdf 30 | - https://x.com/n_karaev/status/1758554094132662636?s=20 31 | - https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf 32 | - https://arxiv.org/pdf/2310.01889.pdf 33 | - https://arxiv.org/pdf/2305.19370.pdf 34 | 35 | - sora is japanese for sky? 
36 | - https://www.reddit.com/r/aivideo/s/Rxnjud9UwI 37 | - https://x.com/sama/status/1758193609927721350 38 | - https://x.com/ylecun/status/1758760027203006952 39 | 40 | - https://twitter.com/_tim_brooks 41 | - https://scholar.google.com/citations?user=sonlKXIAAAAJ 42 | 43 | - https://twitter.com/billpeeb 44 | - https://www.wpeebles.com/ 45 | 46 | - https://twitter.com/model_mechanic 47 | - https://typeset.io/authors/aditya-ramesh-4xp87jcxw7 48 | - https://arxiv.org/abs/2204.06125 49 | - https://arxiv.org/abs/2102.12092 50 | 51 | ## Notes 52 | 53 | notes 54 | 55 | ### Blog 56 | 57 | notes 58 | 59 | ### Vertical Video 60 | 61 | title 62 | description 63 | hashtags 64 | -------------------------------------------------------------------------------- /2024.02.17.sora_vs_gemini/ideas.txt: -------------------------------------------------------------------------------- 1 | https://mrtornado24.github.io/DreamCraft3D/ 2 | https://arxiv.org/abs/2310.16818 3 | https://www.youtube.com/watch?v=0FazXENkQms 4 | https://arxiv.org/abs/2402.07872 5 | https://arxiv.org/abs/2402.07896 6 | https://arxiv.org/abs/2402.06196 7 | https://arxiv.org/abs/2401.14953 8 | https://arxiv.org/pdf/2012.13635.pdf 9 | https://virl-platform.github.io/ 10 | https://arxiv.org/abs/2308.12038 11 | https://shengdinghu.notion.site/MiniCPM-Unveiling-the-Potential-of-End-side-Large-Language-Models-d4d3a8c426424654a4e80e42a711cb20 12 | https://arxiv.org/pdf/2402.03246.pdf 13 | https://video-lavit.github.io/ 14 | https://cvpr.thecvf.com/Conferences/2024/workshop-list 15 | https://huggingface.co/papers/2402.04236 16 | https://aloha-2.github.io/ 17 | https://x.com/arankomatsuzaki/status/1755059020438667451?s=12&t=BSlMYtugAr8LLWXLnMLLrQ 18 | https://x.com/ramealexandre/status/1755569564220375470?s=46&t=vuxEhKdai50YCRFE9TYORA 19 | https://x.com/_akhaliq/status/1755422567358246976?s=46&t=vuxEhKdai50YCRFE9TYORA 20 | https://www.youtube.com/watch?v=c2hNU93xbKk 21 | https://chao1224.github.io/ProteinDT 22 | https://chao1224.github.io/MoleculeSTM 23 | https://arxiv.org/pdf/2402.03620.pdf 24 | https://arxiv.org/abs/2312.06837 25 | https://x.com/HazanPrinceton/status/1756097344347525531?s=20 26 | https://arxiv.org/abs/2312.12337 27 | https://arxiv.org/pdf/2402.03902.pdf 28 | https://arxiv.org/abs/2402.04362 29 | https://arxiv.org/abs/2402.03902 (duplicate) 30 | https://arxiv.org/abs/2402.05929 31 | https://arxiv.org/abs/2402.05120 32 | https://boximator.github.io/ 33 | https://www.youtube.com/watch?v=reto_TYsYyQ (duplicate) 34 | https://boximator.github.io/assets/C2V/10-Skull%20in%20the%20watercolor%20style.%20Butterflies%20are%20flying%20around..mp4 35 | https://boximator.github.io/assets/C2V/34-An%20astronaut%20is%20skateboarding%20on%20the%20moon..mp4 36 | https://github.com/thomas0809/textreact 37 | https://youtu.be/1qyTXnPsmTs?si=84gVhobkd2xhzJqf 38 | https://os-copilot.github.io/ 39 | https://arxiv.org/abs/2402.06196 40 | https://arxiv.org/abs/2402.07896 41 | https://arxiv.org/abs/2402.07872 42 | https://youtu.be/0FazXENkQms?si=1ea0K6319mvILz8W 43 | https://openai.com/research/video-generation-models-as-world-simulators 44 | https://arxiv.org/pdf/2402.08268.pdf 45 | 46 | gaussian splat statistics 47 | https://arxiv.org/pdf/2402.04362.pdf 48 | https://arxiv.org/pdf/2312.12337.pdf 49 | https://arxiv.org/pdf/2310.16818.pdf 50 | 51 | ai agents 52 | https://arxiv.org/pdf/2402.05929.pdf 53 | https://arxiv.org/pdf/2402.05120.pdf 54 | https://arxiv.org/pdf/2402.07872.pdf 55 | 56 | sora vs gemini 57 | https://x.com/gabor/status/1758282791547232482 
-------------------------------------------------------------------------------- /2024.02.17.sora_vs_gemini/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.02.17.sora_vs_gemini/thumbnail.png -------------------------------------------------------------------------------- /2024.02.18.ai_agents/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # AI Agents 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/H5yd-uh9acY 8 | 9 | **X:** https://twitter.com/i/broadcasts/1PlJQDznkyVGE 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **YouTube:** 18 | 19 | **TikTok:** 20 | 21 | **Reddit:** 22 | 23 | ### References 24 | 25 | - https://arxiv.org/pdf/2402.05929.pdf 26 | - https://arxiv.org/pdf/2402.05120.pdf 27 | - https://arxiv.org/pdf/2402.07872.pdf 28 | - https://arxiv.org/pdf/2402.07939.pdf 29 | - https://arxiv.org/pdf/2402.07456.pdf 30 | - https://arxiv.org/pdf/2402.10200.pdf 31 | - https://medium.com/mlearning-ai/softmax-temperature-5492e4007f71 32 | - https://github.com/karpathy/minbpe/tree/master 33 | 34 | ## Notes 35 | 36 | notes 37 | 38 | ### Blog 39 | 40 | notes 41 | 42 | ### Vertical Video 43 | 44 | title 45 | description 46 | hashtags 47 | -------------------------------------------------------------------------------- /2024.02.18.ai_agents/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.02.18.ai_agents/thumbnail.png -------------------------------------------------------------------------------- /2024.02.24.network_diffusion/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Neural Network Diffusion 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/dBYp1GI_JW0 8 | 9 | **X:** https://twitter.com/i/broadcasts/1djxXNXoEwdGZ 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **YouTube:** 18 | 19 | **TikTok:** 20 | 21 | **Reddit:** 22 | 23 | ### References 24 | 25 | https://arxiv.org/pdf/2402.13144.pdf 26 | 27 | https://www.emergentmind.com/papers/2402.13144 28 | 29 | https://arxiv.org/pdf/2305.02463.pdf 30 | 31 | https://github.com/NUS-HPC-AI-Lab/Neural-Network-Diffusion/tree/main 32 | 33 | https://tensorspace.org/html/playground/resnet50.html 34 | 35 | https://arxiv.org/abs/1803.03635 36 | 37 | https://arxiv.org/pdf/2402.04291.pdf 38 | 39 | https://arxiv.org/pdf/2401.15024.pdf 40 | 41 | https://arxiv.org/pdf/2209.12892.pdf 42 | 43 | https://arxiv.org/abs/2303.17015 44 | 45 | CNNs and MLPs have a type of "spatial invariance" similar to images and thus you can use augmentation techniques. 46 | Diffusion can be done directly in model parameter space, but also a latent parameter space. Only simple versions of both are shown. 
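A toy sketch of the note above (diffusion over flattened parameters via a latent parameter space), assuming PyTorch. The dimensions, architecture, and DDPM-style noise schedule are illustrative placeholders, not the paper's setup, and the autoencoder plus latent denoiser are collapsed into one training loss here for brevity (the paper trains them in two stages; see the NUS-HPC-AI-Lab repo linked above for the real code).

```python
import torch
import torch.nn as nn

D, LATENT, T = 512, 64, 1000   # flattened-parameter dim, latent dim, diffusion steps (illustrative)

# 1) autoencoder over flattened parameter vectors (the "latent parameter space")
encoder = nn.Sequential(nn.Linear(D, 256), nn.SiLU(), nn.Linear(256, LATENT))
decoder = nn.Sequential(nn.Linear(LATENT, 256), nn.SiLU(), nn.Linear(256, D))

# 2) small denoiser that predicts the noise added to a latent at timestep t
class Denoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT + 1, 256), nn.SiLU(), nn.Linear(256, LATENT))
    def forward(self, z_t, t):
        # condition on the (normalized) timestep by simple concatenation
        return self.net(torch.cat([z_t, t.float().unsqueeze(-1) / T], dim=-1))

denoiser = Denoiser()
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters(), *denoiser.parameters()], lr=1e-4)

def train_step(param_batch):
    """param_batch: (B, D) flattened weights collected from saved checkpoints."""
    z = encoder(param_batch)
    t = torch.randint(0, T, (z.shape[0],))
    eps = torch.randn_like(z)
    a = alpha_bar[t].unsqueeze(-1)
    z_t = a.sqrt() * z + (1 - a).sqrt() * eps               # forward diffusion in latent space
    loss = ((denoiser(z_t, t) - eps) ** 2).mean()           # noise-prediction objective
    loss = loss + ((decoder(z) - param_batch) ** 2).mean()  # keep the latent decodable back to params
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# after training: sample a latent with the reverse process, decode it, and load
# the resulting vector back into the target layer's weights
```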
47 | 48 | ## Notes 49 | 50 | notes 51 | 52 | ### Blog 53 | 54 | notes 55 | 56 | ### Vertical Video 57 | 58 | title 59 | description 60 | hashtags 61 | -------------------------------------------------------------------------------- /2024.02.24.network_diffusion/ideas.txt: -------------------------------------------------------------------------------- 1 | https://arxiv.org/pdf/2402.13753.pdf 2 | https://arxiv.org/pdf/2402.12479.pdf 3 | 4 | Neural Network Diffusion 5 | https://arxiv.org/pdf/2402.13144.pdf 6 | 7 | multi-modal datasets 8 | https://arxiv.org/pdf/2402.13232.pdf 9 | https://arxiv.org/pdf/2402.13349.pdf 10 | -------------------------------------------------------------------------------- /2024.02.24.network_diffusion/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.02.24.network_diffusion/thumbnail.png -------------------------------------------------------------------------------- /2024.02.25.longrope/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # LongRoPE 4 | 5 | **YouTube:** https://youtube.com/live/PFxi6SmozZ4 6 | 7 | **X:** https://twitter.com/i/broadcasts/1BRJjPkYWzvKw 8 | 9 | **Twitch:** 10 | 11 | **Substack:** 12 | 13 | **ResearchHub:** 14 | 15 | **YouTube:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://arxiv.org/pdf/2402.13753.pdf 24 | 25 | https://arxiv.org/pdf/2306.15595.pdf 26 | 27 | https://arxiv.org/pdf/2104.09864.pdf 28 | 29 | https://en.wikipedia.org/wiki/Euler%27s_identity 30 | 31 | https://miro.medium.com/v2/resize:fit:1100/1*9qamOSV15cnvInkNhzr5YA.gif 32 | 33 | https://upload.wikimedia.org/wikipedia/commons/b/bd/Sine_and_cosine_animation.gif 34 | 35 | https://bbycroft.net/llm 36 | 37 | https://assets.tivadardanka.com/2022_11_the_dot_product_11_aec50c8a30.png 38 | 39 | https://en.wikipedia.org/wiki/Dot_product 40 | 41 | Will we eventually learn these rather than hand design these? 
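A minimal sketch of plain rotary position embeddings from the Su et al. paper above, assuming PyTorch and the split-halves pairing convention; `head_dim`, `base`, and the shapes are illustrative. LongRoPE's contribution is a searched, non-uniform rescaling of these per-dimension frequencies to stretch the context window; this sketch shows only the vanilla rotation.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (batch, seq_len, head_dim)."""
    _, n, d = x.shape
    half = d // 2
    # per-pair rotation frequency: theta_i = base^(-2i/d), i = 0 .. d/2 - 1
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(n, dtype=torch.float32)[:, None] * freqs[None, :]  # (n, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # rotate each (x1_i, x2_i) pair by its position-dependent angle
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# queries and keys get the same rotation, so q·k depends only on their relative
# offset -- the property RoPE (and LongRoPE's frequency rescaling) relies on
q = rope(torch.randn(1, 8, 64))
k = rope(torch.randn(1, 8, 64))
attn_scores = q @ k.transpose(-1, -2)
```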
42 | 43 | Position embeddings are the new Gabor filters 44 | 45 | https://qph.cf2.quoracdn.net/main-qimg-b662a8fc3be57f76c708c171fcf29960 46 | 47 | https://miro.medium.com/v2/resize:fit:611/1*qciBJK9ofqkMLnyJrRAQQQ.png 48 | 49 | https://i.stack.imgur.com/bN2iA.png 50 | 51 | http://www.incompleteideas.net/IncIdeas/BitterLesson.html 52 | 53 | ## Notes 54 | 55 | notes 56 | 57 | ### Blog 58 | 59 | notes 60 | 61 | ### Vertical Video 62 | 63 | title 64 | description 65 | hashtags 66 | -------------------------------------------------------------------------------- /2024.02.25.longrope/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.02.25.longrope/thumbnail.png -------------------------------------------------------------------------------- /2024.03.02.genie_world_model/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Genie World Model 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/wSPLfoq5w68 8 | 9 | **X:** https://twitter.com/i/broadcasts/1rmxPMRqrNVKN 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **YouTube:** 18 | 19 | **TikTok:** 20 | 21 | **Reddit:** 22 | 23 | ### References 24 | 25 | Genie: Generative Interactive Environments 26 | https://arxiv.org/pdf/2402.15391.pdf 27 | 28 | https://arxiv.org/pdf/1803.10122.pdf 29 | 30 | https://arxiv.org/pdf/1711.00937.pdf 31 | 32 | https://lilianweng.github.io/posts/2018-08-12-vae/autoencoder-architecture.png 33 | 34 | https://miro.medium.com/v2/resize:fit:1152/1*L-VCpM3FfJ7hqlbEfwFJ9g.gif 35 | 36 | https://arxiv.org/pdf/2001.02908.pdf 37 | 38 | https://worldmodels.github.io/ 39 | 40 | https://miro.medium.com/v2/resize:fit:1400/1*ReJYLqkRu7IU5k9O0ePk1w.png 41 | 42 | https://machinelearningmastery.com/wp-content/uploads/2019/06/Example-of-How-Increased-Distortion-of-an-Image-Correlates-with-High-FID-Score.png 43 | 44 | ## Notes 45 | 46 | notes 47 | 48 | ### Blog 49 | 50 | notes 51 | 52 | ### Vertical Video 53 | 54 | title 55 | description 56 | hashtags 57 | -------------------------------------------------------------------------------- /2024.03.02.genie_world_model/ideas.txt: -------------------------------------------------------------------------------- 1 | https://arxiv.org/abs/2306.02531 2 | https://arxiv.org/abs/2205.14217 3 | https://arxiv.org/abs/2305.14671 4 | https://eureka-research.github.io/ 5 | https://arxiv.org/abs/2308.11339 6 | https://github.com/CraftJarvis/GROOT 7 | https://pku-proagent.github.io/ 8 | https://miao-ai-lab.github.io/LARP/static/LARP.pdf 9 | https://arxiv.org/pdf/2402.06782.pdf 10 | https://github.com/ucl-dark/llm_debate 11 | https://github.com/Vaibhavs10/insanely-fast-whisper 12 | https://github.com/phodal/chapi 13 | https://chapi.phodal.com/usecases 14 | https://github.com/unit-mesh/chocolate-factory 15 | https://github.com/mengjian-github/copilot-analysis 16 | https://github.com/unit-mesh/auto-dev 17 | https://github.com/unit-mesh/devops-genius/ 18 | https://github.com/unit-mesh/build-your-ai-coding-assistant 19 | https://unitmesh.cc/ 20 | https://x.com/_akhaliq/status/1761957701402189826?s=20 21 | https://arxiv.org/abs/2402.15555 22 | https://arxiv.org/pdf/2402.14658.pdf 23 | https://huggingface.co/datasets/m-a-p/Code-Feedback 24 | https://www.together.ai/blog/evo 25 | https://x.com/mckaywrigley/status/1762175776726663267?s=20 26 | 
https://github.com/trueagi-io/hyperon-experimental 27 | https://github.com/opencog/opencog 28 | https://arxiv.org/abs/2402.07148 29 | 30 | 31 | https://arxiv.org/pdf/2402.17762.pdf 32 | https://arxiv.org/pdf/2402.17139.pdf 33 | https://arxiv.org/pdf/2312.07504.pdf 34 | 35 | Sora deep dive? 36 | https://arxiv.org/pdf/2402.17177.pdf 37 | https://arxiv.org/pdf/2402.17403.pdf 38 | 39 | https://arxiv.org/pdf/2402.03099v1.pdf 40 | https://arxiv.org/pdf/2402.15151v1.pdf 41 | https://arxiv.org/pdf/2402.13616v1.pdf 42 | 43 | Gaussian Splat stream 44 | https://arxiv.org/pdf/2402.17427.pdf 45 | 46 | 1Bit LLMs popping off right now 47 | https://www.emergentmind.com/papers/2402.17764 48 | 49 | Genie interactive experiments, Jim Fan says this is best paper 2024 50 | https://arxiv.org/pdf/2402.15391.pdf 51 | -------------------------------------------------------------------------------- /2024.03.02.genie_world_model/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.02.genie_world_model/thumbnail.png -------------------------------------------------------------------------------- /2024.03.03.humanoid_locomotion/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Humanoid Locomotion 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/LLprTlQsIJc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1zqJVqVRbAmGB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **YouTube:** 18 | 19 | **TikTok:** 20 | 21 | **Reddit:** 22 | 23 | ### References 24 | 25 | Humanoid Locomotion as Next Token Prediction 26 | https://arxiv.org/pdf/2402.19469.pdf 27 | 28 | 29 | https://www.reddit.com/r/singularity/comments/1b3s7k4/unitree_h1_humanoid_robot_breaks_the_world_record/ 30 | 31 | https://agilityrobotics.com/ 32 | 33 | https://bostondynamics.com/ 34 | 35 | https://www.figure.ai/ 36 | 37 | https://twitter.com/Tesla_Optimus?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor 38 | 39 | https://sanctuary.ai/ 40 | 41 | https://fl0under.github.io/robotics-notes/images/inverse-kinematics-02-solutions.PNG 42 | 43 | https://cdn0.slideserve.com/814485/simple-pendulum-n.jpg 44 | 45 | https://www.ubtrobot.com/# 46 | 47 | https://shop.unitree.com/products/unitree-h1 48 | 49 | ## Notes 50 | 51 | notes 52 | 53 | ### Blog 54 | 55 | notes 56 | 57 | ### Vertical Video 58 | 59 | title 60 | description 61 | hashtags 62 | -------------------------------------------------------------------------------- /2024.03.03.humanoid_locomotion/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.03.humanoid_locomotion/thumbnail.png -------------------------------------------------------------------------------- /2024.03.09.stable_diffusion_3/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Stable Diffusion 3 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/yTXMK2TZOZc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1dRKZERqblQxB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://stability.ai/news/stable-diffusion-3-research-paper 24 | 25 | https://arxiv.org/pdf/2403.03206.pdf 26 | 27 | 
https://x.com/hashtag/SD3 28 | 29 | https://www.cs.utexas.edu/~lqiang/rectflow/html/intro.html 30 | 31 | https://sander.ai/2024/02/28/paradox.html 32 | 33 | https://www.researchgate.net/profile/Daniel-Vieira-20/publication/333221048/figure/fig3/AS:760646077530114@1558363565757/A-vector-field-visualized-with-two-different-techniques.ppm 34 | 35 | https://paperswithcode.com/dataset/cc12m 36 | 37 | https://www.researchgate.net/profile/Xifeng-Guo/publication/320658590/figure/fig1/AS:614154637418504@1523437284408/The-structure-of-proposed-Convolutional-AutoEncoders-CAE-for-MNIST-In-the-middle-there.png 38 | 39 | ## Notes 40 | 41 | notes 42 | 43 | ### Blog 44 | 45 | notes 46 | 47 | ### Vertical Video 48 | 49 | title 50 | description 51 | hashtags 52 | -------------------------------------------------------------------------------- /2024.03.09.stable_diffusion_3/ideas.txt: -------------------------------------------------------------------------------- 1 | https://arxiv.org/abs/2402.17810 2 | https://arxiv.org/pdf/2403.03507.pdf 3 | https://arxiv.org/pdf/2403.03870.pdf 4 | https://arxiv.org/pdf/2403.03950.pdf 5 | 6 | 7 | 3D 8 | https://arxiv.org/pdf/2403.01807 9 | 10 | Quantum 11 | https://arxiv.org/abs/2403.01775 12 | https://arxiv.org/abs/2403.00910 13 | https://arxiv.org/abs/2203.01340 14 | https://arxiv.org/pdf/2311.04186.pdf 15 | 16 | Stable Diffusion 3 17 | https://arxiv.org/pdf/2403.03206.pdf 18 | -------------------------------------------------------------------------------- /2024.03.09.stable_diffusion_3/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.09.stable_diffusion_3/thumbnail.png -------------------------------------------------------------------------------- /2024.03.10.triposr/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # TripoSR: 3D from 2D 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/9eKmzpaPyXs 8 | 9 | **X:** https://twitter.com/i/broadcasts/1yoJMwDVZwOKQ 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **YouTube:** 18 | 19 | **TikTok:** 20 | 21 | **Reddit:** 22 | 23 | ### References 24 | 25 | https://arxiv.org/pdf/2403.02151.pdf 26 | 27 | https://arxiv.org/pdf/2311.04400.pdf 28 | 29 | https://huggingface.co/spaces/stabilityai/TripoSR 30 | 31 | https://arxiv.org/pdf/2112.07945.pdf 32 | 33 | https://richzhang.github.io/PerceptualSimilarity/ 34 | 35 | codebase summary with Claude 36 | codebase summary with GPT 37 | 38 | ## Notes 39 | 40 | notes 41 | 42 | ### Blog 43 | 44 | notes 45 | 46 | ### Vertical Video 47 | 48 | title 49 | description 50 | hashtags 51 | -------------------------------------------------------------------------------- /2024.03.10.triposr/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.10.triposr/thumbnail.png -------------------------------------------------------------------------------- /2024.03.16.vision_motion_mamba/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Video Mamba vs Motion Mamba 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/rzXjKcqkjxM 8 | 9 | **X:** https://twitter.com/i/broadcasts/1MYGNoYzLWwJw 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 
| **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://arxiv.org/pdf/2403.06977.pdf 24 | 25 | https://arxiv.org/pdf/2403.07487.pdf 26 | 27 | https://en.wikipedia.org/wiki/Zero-order_hold#/media/File:Zeroorderhold.signal.svg 28 | 29 | https://www.youtube.com/watch?v=oSCRZkSQ1CE 30 | 31 | Motion Synthesis Benchmarks 32 | https://paperswithcode.com/task/motion-synthesis 33 | 34 | 35 | Feature Hierarchy 36 | https://www.researchgate.net/profile/Sepp-Hochreiter/publication/285593237/figure/fig3/AS:324780386471943@1454445086972/Hierarchical-composition-of-complex-features-DNNs-build-a-feature-from-simpler-parts-A.png 37 | 38 | ## Notes 39 | 40 | notes 41 | 42 | ### Blog 43 | 44 | notes 45 | 46 | ### Vertical Video 47 | 48 | title 49 | description 50 | hashtags 51 | -------------------------------------------------------------------------------- /2024.03.16.vision_motion_mamba/ideas.txt: -------------------------------------------------------------------------------- 1 | https://arxiv.org/pdf/2403.08081.pdf 2 | 3 | https://arxiv.org/pdf/2403.08764.pdf 4 | 5 | https://arxiv.org/pdf/2403.08551.pdf 6 | 7 | https://arxiv.org/pdf/2403.07128.pdf 8 | 9 | video mamba vs motion mamba 10 | https://arxiv.org/pdf/2403.06977.pdf 11 | https://arxiv.org/pdf/2403.07487.pdf 12 | 13 | llm pretraining 14 | https://arxiv.org/pdf/2403.08763.pdf 15 | https://arxiv.org/pdf/2403.09611.pdf -------------------------------------------------------------------------------- /2024.03.16.vision_motion_mamba/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.16.vision_motion_mamba/thumbnail.png -------------------------------------------------------------------------------- /2024.03.17.multi_modal_pretraining/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Multi-Modal Pre-training (Apple's MM1) 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/viiB3JmK21M 8 | 9 | **X:** https://twitter.com/i/broadcasts/1dRKZEXgmLaxB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training 24 | https://arxiv.org/pdf/2403.09611.pdf 25 | 26 | Simple and Scalable Strategies to Continually Pre-train Large Language Models 27 | https://arxiv.org/pdf/2403.08763.pdf 28 | 29 | C-Abstractor 30 | https://arxiv.org/abs/2312.06742 31 | 32 | Mo patches mo tokens 33 | https://i0.wp.com/syncedreview.com/wp-content/uploads/2022/12/image-54.png?resize=593%2C700&ssl=1 34 | 35 | Pooling 36 | https://www.researchgate.net/publication/333593451/figure/fig2/AS:765890261966848@1559613876098/Illustration-of-Max-Pooling-and-Average-Pooling-Figure-2-above-shows-an-example-of-max.png 37 | 38 | DFN 5B 39 | https://huggingface.co/apple/DFN5B-CLIP-ViT-H-14 40 | 41 | AXLearn 42 | https://github.com/apple/axlearn 43 | 44 | ## Notes 45 | 46 | notes 47 | 48 | ### Blog 49 | 50 | notes 51 | 52 | ### Vertical Video 53 | 54 | title 55 | description 56 | hashtags 57 | -------------------------------------------------------------------------------- /2024.03.17.multi_modal_pretraining/thumbnail.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.17.multi_modal_pretraining/thumbnail.png -------------------------------------------------------------------------------- /2024.03.23.generative_3d/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Generative 3D 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/IsRHGf2rGCs 8 | 9 | **X:** https://twitter.com/i/broadcasts/1YpKkwdoMXYKj 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models 24 | https://arxiv.org/pdf/2403.12034.pdf 25 | 26 | SV3D 27 | https://arxiv.org/pdf/2403.12008.pdf 28 | 29 | GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation 30 | https://arxiv.org/pdf/2403.12365.pdf 31 | 32 | Generic 3D Diffusion Adapter Using Controlled Multi-View Editing 33 | https://arxiv.org/pdf/2403.12032.pdf 34 | 35 | Compress3D: a Compressed Latent Space for 3D Generation from a Single Image 36 | https://arxiv.org/pdf/2403.13524.pdf 37 | 38 | LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation 39 | https://arxiv.org/pdf/2403.12019.pdf 40 | 41 | ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance 42 | https://arxiv.org/pdf/2403.12409.pdf 43 | 44 | DreamReward: Text-to-3D Generation with Human Preference 45 | https://arxiv.org/pdf/2403.14613.pdf 46 | 47 | GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation 48 | https://arxiv.org/pdf/2403.14621.pdf 49 | 50 | ## Notes 51 | 52 | notes 53 | 54 | ### Blog 55 | 56 | notes 57 | 58 | ### Vertical Video 59 | 60 | title 61 | description 62 | hashtags 63 | -------------------------------------------------------------------------------- /2024.03.23.generative_3d/abstracts.txt: -------------------------------------------------------------------------------- 1 | VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models 2 | This paper presents a novel paradigm for building scalable 3D generative models utilizing pre-trained video diffusion models. The primary obstacle in developing foundation 3D generative models is the limited availability of 3D data. Unlike images, texts, or videos, 3D data are not readily accessible and are difficult to acquire. This results in a significant disparity in scale compared to the vast quantities of other types of data. To address this issue, we propose using a video diffusion model, trained with extensive volumes of text, images, and videos, as a knowledge source for 3D data. By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model. The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view data, can generate a 3D asset from a single image in seconds and achieves superior performance when compared to current SOTA feed-forward 3D generative models, with users preferring our results over 70% of the time. 3 | https://arxiv.org/pdf/2403.12034.pdf 4 | 5 | SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion 6 | We present Stable Video 3D (SV3D) -- a latent video diffusion model for high-resolution, image-to-multi-view generation of orbital videos around a 3D object. 
Recent work on 3D generation propose techniques to adapt 2D generative models for novel view synthesis (NVS) and 3D optimization. However, these methods have several disadvantages due to either limited views or inconsistent NVS, thereby affecting the performance of 3D object generation. In this work, we propose SV3D that adapts image-to-video diffusion model for novel multi-view synthesis and 3D generation, thereby leveraging the generalization and multi-view consistency of the video models, while further adding explicit camera control for NVS. We also propose improved 3D optimization techniques to use SV3D and its NVS outputs for image-to-3D generation. Extensive experimental results on multiple datasets with 2D and 3D metrics as well as user study demonstrate SV3D's state-of-the-art performance on NVS as well as 3D reconstruction compared to prior works. 7 | https://arxiv.org/pdf/2403.12008.pdf 8 | 9 | GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation 10 | Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be efficiently obtained by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for contents with rich motions that are hard to be handled by existing methods. The common color drifting issue that happens in 4D generation is also resolved with improved Guassian dynamics. Superior visual quality on extensive experiments demonstrates our method's effectiveness. Quantitative and qualitative evaluations show that our method achieves state-of-the-art results on both tasks of 4D generation and 4D novel view synthesis. Project page: https://zerg-overmind.github.io/GaussianFlow.github.io/ 11 | https://arxiv.org/pdf/2403.12365.pdf 12 | 13 | Generic 3D Diffusion Adapter Using Controlled Multi-View Editing 14 | Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity. To bridge this gap, recent works have investigated multi-view diffusion but often fall short in either 3D consistency, visual quality, or efficiency. This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denoise multi-view images and output high-quality textured meshes. Built on off-the-shelf 2D diffusion models, MVEdit achieves 3D consistency through a training-free 3D Adapter, which lifts the 2D views of the last timestep into a coherent 3D representation, then conditions the 2D views of the next timestep using rendered views, without uncompromising visual quality. With an inference time of only 2-5 minutes, this framework achieves better trade-off between quality and speed than score distillation. MVEdit is highly versatile and extendable, with a wide range of applications including text/image-to-3D generation, 3D-to-3D editing, and high-quality texture synthesis. 
In particular, evaluations demonstrate state-of-the-art performance in both image-to-3D and text-guided texture generation tasks. Additionally, we introduce a method for fine-tuning 2D latent diffusion models on small 3D datasets with limited resources, enabling fast low-resolution text-to-3D initialization. 15 | https://arxiv.org/pdf/2403.12032.pdf 16 | 17 | Compress3D: a Compressed Latent Space for 3D Generation from a Single Image 18 | 3D generation has witnessed significant advancements, yet efficiently producing high-quality 3D assets from a single image remains challenging. In this paper, we present a triplane autoencoder, which encodes 3D models into a compact triplane latent space to effectively compress both the 3D geometry and texture information. Within the autoencoder framework, we introduce a 3D-aware cross-attention mechanism, which utilizes low-resolution latent representations to query features from a high-resolution 3D feature volume, thereby enhancing the representation capacity of the latent space. Subsequently, we train a diffusion model on this refined latent space. In contrast to solely relying on image embedding for 3D generation, our proposed method advocates for the simultaneous utilization of both image embedding and shape embedding as conditions. Specifically, the shape embedding is estimated via a diffusion prior model conditioned on the image embedding. Through comprehensive experiments, we demonstrate that our method outperforms state-of-the-art algorithms, achieving superior performance while requiring less training data and time. Our approach enables the generation of high-quality 3D assets in merely 7 seconds on a single A100 GPU. 19 | https://arxiv.org/pdf/2403.13524.pdf 20 | 21 | LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation 22 | The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harnesses a 3D-aware architecture and variational autoencoder (VAE) to encode the input image into a structured, compact, and 3D latent space. The latent is decoded by a transformer-based decoder into a high-capacity 3D neural field. Through training a diffusion model on this 3D-aware latent space, our method achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation across various datasets. Moreover, it surpasses existing 3D diffusion methods in terms of inference speed, requiring no per-instance optimization. Our proposed LN3Diff presents a significant advancement in 3D generative modeling and holds promise for various applications in 3D vision and graphics tasks. 23 | https://arxiv.org/pdf/2403.12019.pdf 24 | 25 | ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance 26 | Generating high-quality 3D assets from a given image is highly desirable in various applications such as AR/VR. Recent advances in single-image 3D generation explore feed-forward models that learn to infer the 3D model of an object without optimization. 
Though promising results have been achieved in single object generation, these methods often struggle to model complex 3D assets that inherently contain multiple objects. In this work, we present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models. 1) We first perform an in-depth analysis of this ``multi-object gap'' from both model and data perspectives. 2) Next, with reconstructed 3D models of different objects, we seek to adjust their sizes, rotation angles, and locations to create a 3D asset that matches the given image. 3) To automate this process, we apply spatially-aware score distillation sampling (SSDS) from pretrained diffusion models to guide the positioning of objects. Our proposed framework emphasizes spatial alignment of objects, compared with standard score distillation sampling, and thus achieves more accurate results. Extensive experiments validate ComboVerse achieves clear improvements over existing methods in generating compositional 3D assets. 27 | https://arxiv.org/pdf/2403.12409.pdf 28 | 29 | DreamReward: Text-to-3D Generation with Human Preference 30 | 3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, coined DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons based on a systematic annotation pipeline including rating and ranking. Then, we build Reward3D -- the first general-purpose text-to-3D human preference reward model to effectively encode human preferences. Building upon the 3D reward model, we finally perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL), a direct tuning algorithm to optimize the multi-view diffusion models with a redefined scorer. Grounded by theoretical proof and extensive experiment comparisons, our DreamReward successfully generates high-fidelity and 3D consistent results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential for learning from human feedback to improve text-to-3D models. 31 | https://arxiv.org/pdf/2403.14613.pdf 32 | 33 | GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation 34 | We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer architecture and the use of 3D Gaussians unlock a scalable and efficient reconstruction framework. Extensive experimental results demonstrate the superiority of our method over alternatives regarding both reconstruction quality and efficiency. We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models. Our project website is at: https://justimyhxu.github.io/projects/grm/. 
35 | https://arxiv.org/pdf/2403.14621.pdf 36 | -------------------------------------------------------------------------------- /2024.03.23.generative_3d/ideas.txt: -------------------------------------------------------------------------------- 1 | SlimmeRF: Slimmable Radiance Fields 2 | https://arxiv.org/pdf/2312.10034.pdf 3 | 4 | Simple and Scalable Strategies to Continually Pre-train Large Language Models 5 | https://arxiv.org/pdf/2403.08763.pdf 6 | 7 | LLM Agents can Autonomously Hack Websites 8 | https://arxiv.org/pdf/2402.06664.pdf 9 | 10 | NON-NEGATIVE CONTRASTIVE LEARNING 11 | https://arxiv.org/pdf/2403.12459.pdf 12 | 13 | Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers 14 | https://arxiv.org/pdf/2403.12943.pdf 15 | 16 | Evolutionary Optimization of Model Merging Recipes 17 | https://arxiv.org/pdf/2403.13187.pdf 18 | 19 | Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation 20 | https://arxiv.org/pdf/2403.12015.pdf 21 | 22 | MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data 23 | https://arxiv.org/pdf/2403.11207.pdf 24 | 25 | Neuralink 26 | https://neuralink.com/blog/understanding-preclinical-research-at-neuralink/ 27 | 28 | GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians 29 | https://arxiv.org/pdf/2402.10483v1.pdf 30 | 31 | Generative 3D 32 | 33 | VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models 34 | https://arxiv.org/pdf/2403.12034.pdf 35 | 36 | SV3D 37 | https://arxiv.org/pdf/2403.12008.pdf 38 | 39 | GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation 40 | https://arxiv.org/pdf/2403.12365.pdf 41 | 42 | Generic 3D Diffusion Adapter Using Controlled Multi-View Editing 43 | https://arxiv.org/pdf/2403.12032.pdf 44 | 45 | Compress3D: a Compressed Latent Space for 3D Generation from a Single Image 46 | https://arxiv.org/pdf/2403.13524.pdf 47 | 48 | LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation 49 | https://arxiv.org/pdf/2403.12019.pdf 50 | 51 | ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance 52 | https://arxiv.org/pdf/2403.12409.pdf 53 | -------------------------------------------------------------------------------- /2024.03.23.generative_3d/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.23.generative_3d/thumbnail.png -------------------------------------------------------------------------------- /2024.03.24.mindseye2_neuralink/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # MindsEye2 Neuralink 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/e4SmDIH9908 8 | 9 | **X:** https://twitter.com/i/broadcasts/1jMJgmYoeBeKL 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data 24 | https://arxiv.org/pdf/2403.11207.pdf 25 | 26 | Neuralink 27 | https://neuralink.com/blog/understanding-preclinical-research-at-neuralink/ 28 | 29 | https://www.brainfacts.org/3d-brain#intro=false&focus=Brain-cerebral_hemisphere-frontal_lobe-motor_cortex 30 | 31 | https://afterimagia.pl/app/uploads/2020/09/022Korawzrokowaiinnestrukturykorowezaangazowanewwidzenie1_en-1024x883.jpg 32 | 33 | 
https://github.com/mlfoundations/open_clip 34 | 35 | https://github.com/facebookresearch/VICRegL 36 | 37 | https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0 38 | 39 | ## Notes 40 | 41 | notes 42 | 43 | ### Blog 44 | 45 | notes 46 | 47 | ### Vertical Video 48 | 49 | title 50 | description 51 | hashtags 52 | -------------------------------------------------------------------------------- /2024.03.24.mindseye2_neuralink/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.24.mindseye2_neuralink/thumbnail.png -------------------------------------------------------------------------------- /2024.03.30.collective_intelligence/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Collective Intelligence 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/3nrvy0FuLWM 8 | 9 | **X:** https://twitter.com/i/broadcasts/1OwxWYAMbYkGQ 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://twitter.com/drmichaellevin/status/1773452812493431032?t=OB6V5UghEadpuOuSHMeUZQ&s=19 24 | 25 | https://www.nature.com/articles/s42003-024-06037-4 26 | 27 | https://pbs.twimg.com/media/GJ2PdgvW4AAuz_5?format=jpg&name=900x900 28 | 29 | https://arxiv.org/pdf/2306.06805.pdf 30 | 31 | https://upload.wikimedia.org/wikipedia/commons/thumb/9/90/World_line2.svg/1000px-World_line2.svg.png 32 | 33 | https://en.wikipedia.org/wiki/Terence_McKenna 34 | 35 | https://www.researchgate.net/publication/349703774/figure/fig3/AS:996748570931209@1614654786956/Tokyo-rail-network-formation-with-Physarum-polycephalum-84.ppm 36 | 37 | ## Notes 38 | 39 | notes 40 | 41 | ### Blog 42 | 43 | notes 44 | 45 | ### Vertical Video 46 | 47 | title 48 | description 49 | hashtags 50 | -------------------------------------------------------------------------------- /2024.03.30.collective_intelligence/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.30.collective_intelligence/thumbnail.png -------------------------------------------------------------------------------- /2024.03.31.gamba_jamba_cobra/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Gamba Jamba Cobra 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/9s-9aSobky8 8 | 9 | **X:** https://twitter.com/i/broadcasts/1ZkKzjydlkaKv 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://www.ai21.com/blog/announcing-jamba 24 | 25 | Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference 26 | - https://arxiv.org/pdf/2403.14520.pdf 27 | - https://www.emergentmind.com/papers/2403.14520 28 | 29 | Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction 30 | - https://arxiv.org/pdf/2403.18795.pdf 31 | - https://www.emergentmind.com/papers/2403.18795 32 | 33 | Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models 34 | - https://arxiv.org/pdf/2402.19427.pdf 35 | 36 | Mamba: Linear-Time Sequence Modeling with Selective State Spaces 37 | - https://arxiv.org/ftp/arxiv/papers/2312/2312.00752.pdf 38 | 39 | The 
Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits 40 | https://arxiv.org/pdf/2402.17764.pdf 41 | 42 | https://miro.medium.com/v2/resize:fit:1400/format:webp/1*JGh_0y3ICNuA6IcnbdnvdA.gif 43 | 44 | ## Notes 45 | 46 | notes 47 | 48 | ### Blog 49 | 50 | notes 51 | 52 | ### Vertical Video 53 | 54 | title 55 | description 56 | hashtags 57 | -------------------------------------------------------------------------------- /2024.03.31.gamba_jamba_cobra/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.03.31.gamba_jamba_cobra/thumbnail.png -------------------------------------------------------------------------------- /2024.04.06.mixture_of_depths/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Mixture-of-Depths 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/Teru_qIdB8Y 8 | 9 | **X:** https://twitter.com/i/broadcasts/1LyxBnEBkQkxN 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Mixture-of-Depths: Dynamically allocating compute in transformer-based language models 24 | https://arxiv.org/pdf/2404.02258.pdf 25 | 26 | https://twitter.com/theseamouse/status/1775782800362242157?s=12&t=BSlMYtugAr8LLWXLnMLLrQ 27 | 28 | LLM Visualization 29 | https://bbycroft.net/llm 30 | 31 | Deep Residual Learning for Image Recognition 32 | https://arxiv.org/pdf/1512.03385.pdf 33 | 34 | What skip connections look like in code 35 | https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab957409a1cc2fbfba8a26/mingpt/model.py#L90 36 | 37 | Compute Graphs 38 | https://www.researchgate.net/publication/340457442/figure/fig4/AS:877300765712391@1586176210735/Comparison-between-a-a-static-computation-graph-in-TensorFlow-115-and-b-an.png 39 | https://huyenchip.com/assets/pics/compilers/13_graph_optimization.png 40 | https://huyenchip.com/assets/pics/compilers/8_irs.png 41 | 42 | Overfitting 43 | https://cdn.analyticsvidhya.com/wp-content/uploads/2020/02/Screenshot-2020-02-06-at-11.09.13.png 44 | -------------------------------------------------------------------------------- /2024.04.06.mixture_of_depths/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.06.mixture_of_depths/thumbnail.png -------------------------------------------------------------------------------- /2024.04.07.instantsplat_dust3r/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # InstantSplat, DUSt3R 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/JdfrG89iPOA 8 | 9 | **X:** https://twitter.com/i/broadcasts/1nAJEaeBnEyJL 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds 24 | https://arxiv.org/pdf/2403.20309.pdf 25 | 26 | DUSt3R: Geometric 3D Vision Made Easy 27 | https://arxiv.org/pdf/2312.14132.pdf 28 | 29 | RANSAC 30 | https://github.com/sjkenny/ransac?tab=readme-ov-file#example-ring-fitting 31 | 32 | Weiszfeld's algorithm 33 | https://twitter.com/gabrielpeyre/status/1334739210255835136 34 | 35 | Spherical Harmonics 36 | 
https://www.researchgate.net/profile/Jacob-Hollebon-2/publication/345372557/figure/fig1/AS:954811419725829@1604656190963/Real-part-of-a-set-of-spherical-harmonics-mapped-to-the-surface-of-a-sphere-The-colour.ppm 37 | 38 | COLMAP 39 | https://colmap.github.io/tutorial.html#structure-from-motion 40 | 41 | CroCo 42 | https://arxiv.org/pdf/2210.10716.pdf 43 | 44 | Habitat Sim 45 | https://aihabitat.org/ 46 | -------------------------------------------------------------------------------- /2024.04.07.instantsplat_dust3r/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.07.instantsplat_dust3r/thumbnail.png -------------------------------------------------------------------------------- /2024.04.13.osworld_wilbur_gameagents/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # OSWorld, Wilbur, Game Agents 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/slthKMDR0uo 8 | 9 | **X:** https://twitter.com/i/broadcasts/1rmxPMOevPLKN 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | A Survey on Large Language Model-Based Game Agents 24 | https://arxiv.org/pdf/2404.02039.pdf 25 | 26 | OSWORLD: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments 27 | https://arxiv.org/pdf/2404.07972.pdf 28 | 29 | WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents 30 | https://arxiv.org/pdf/2404.05902.pdf 31 | 32 | ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models 33 | https://arxiv.org/abs/2404.07738 34 | 35 | Autonomous Evaluation and Refinement of Digital Agents 36 | https://arxiv.org/abs/2404.06474 37 | 38 | https://news.ycombinator.com/item?id=40008109 39 | 40 | WE BAR E N A: A REALISTIC WEB ENVIRONMENT FOR BUILDING AUTONOMOUS AGENTS 41 | https://arxiv.org/pdf/2307.13854.pdf 42 | 43 | https://x.ai/blog/grok-1.5v 44 | 45 | https://www.scientificamerican.com/article/the-universe-is-not-locally-real-and-the-physics-nobel-prize-winners-proved-it/ 46 | 47 | https://en.wikipedia.org/wiki/Markov_decision_process 48 | -------------------------------------------------------------------------------- /2024.04.13.osworld_wilbur_gameagents/ideas.txt: -------------------------------------------------------------------------------- 1 | Does Transformer Interpretability Transfer to RNNs? 
2 | https://arxiv.org/pdf/2404.05971.pdf 3 | 4 | The Origin of Information Handling 5 | https://arxiv.org/pdf/2404.04374.pdf 6 | 7 | LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders 8 | https://arxiv.org/pdf/2404.05961.pdf 9 | 10 | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention 11 | https://arxiv.org/pdf/2404.07143.pdf 12 | 13 | Demonstration of logical qubits and repeated error correction with better-than-physical error rates 14 | https://arxiv.org/pdf/2404.02280.pdf 15 | 16 | A Survey on Large Language Model-Based Game Agents 17 | https://arxiv.org/pdf/2404.02039.pdf 18 | 19 | OSWORLD: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments 20 | https://arxiv.org/pdf/2404.07972.pdf 21 | 22 | Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models 23 | https://arxiv.org/pdf/2404.07973.pdf 24 | 25 | From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples 26 | https://arxiv.org/pdf/2404.07544.pdf 27 | 28 | -------------------------------------------------------------------------------- /2024.04.13.osworld_wilbur_gameagents/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.13.osworld_wilbur_gameagents/thumbnail.png -------------------------------------------------------------------------------- /2024.04.20.conscious_simulated_holograms/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Conscious Simulated Holograms 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/BInpx2kJ-rw 8 | 9 | **X:** https://twitter.com/i/broadcasts/1LyxBnrqrvkxN 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | The Interface Theory of Perception 24 | Donald Hoffman 25 | https://sites.socsci.uci.edu/~ddhoff/interface.pdf 26 | 27 | Facing Up to the Problem of Consciousness 28 | David Chalmers 29 | https://consc.net/papers/facing.pdf 30 | 31 | The World as a Hologram 32 | Leonard Susskind 33 | https://arxiv.org/pdf/hep-th/9409089.pdf 34 | 35 | Are You Living in a Computer Simulation? 36 | Nick Bostrom 37 | https://simulation-argument.com/simulation.pdf 38 | -------------------------------------------------------------------------------- /2024.04.20.conscious_simulated_holograms/ideas.txt: -------------------------------------------------------------------------------- 1 | The Interface Theory of Perception 2 | Donald Hoffman 3 | https://sites.socsci.uci.edu/~ddhoff/interface.pdf 4 | 5 | Facing Up to the Problem of Consciousness 6 | David Chalmers 7 | https://consc.net/papers/facing.pdf 8 | 9 | The World as a Hologram 10 | Leonard Susskind 11 | https://arxiv.org/pdf/hep-th/9409089.pdf 12 | 13 | Are You Living in a Computer Simulation? 14 | Nick Bostrom 15 | https://simulation-argument.com/simulation.pdf 16 | 17 | Consciousness as Integrated Information: A Provisional Manifesto 18 | Giulio Tononi 19 | https://pubmed.ncbi.nlm.nih.gov/19098144/ 20 | 21 | I Am a Strange Loop 22 | Douglas Hofstadter 23 | https://en.wikipedia.org/wiki/I_Am_a_Strange_Loop 24 | 25 | A Quantum Probability Explanation for Violations of 'Rational' Decision Theory 26 | Jerome R. 
Busemeyer 27 | https://www.researchgate.net/publication/24237479_Quantum_probability_explanation_for_violations_of_'rational'_decision_theory 28 | -------------------------------------------------------------------------------- /2024.04.20.conscious_simulated_holograms/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.20.conscious_simulated_holograms/thumbnail.png -------------------------------------------------------------------------------- /2024.04.21.qstar/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Q* 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/vOA9JSDPJs0 8 | 9 | **X:** https://twitter.com/i/broadcasts/1djGXNdlrrOxZ 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | From r to Q∗: Your Language Model is Secretly a Q-Function 24 | https://arxiv.org/pdf/2404.12358.pdf 25 | 26 | Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing 27 | https://arxiv.org/pdf/2404.12253.pdf 28 | 29 | MCTS 30 | https://towardsdatascience.com/monte-carlo-tree-search-158a917a8baa 31 | 32 | Multi-Arm Bandit 33 | https://medium.com/udemy-engineering/building-a-multi-armed-bandit-system-from-the-ground-up-a-recommendations-and-ranking-case-study-b598f1f880e1 34 | 35 | Physical Intelligence 36 | https://physicalintelligence.company/ 37 | 38 | Beam search 39 | https://en.wikipedia.org/wiki/Beam_search 40 | 41 | Let's Verify Step by Step 42 | https://cdn.openai.com/improving-mathematical-reasoning-with-process-supervision/Lets_Verify_Step_by_Step.pdf 43 | 44 | PAL: Program-aided Language Models 45 | https://arxiv.org/pdf/2211.10435.pdf 46 | 47 | Yann LeCunn Tweets and Cherry 48 | https://twitter.com/ylecun/status/1728126868342145481?lang=en 49 | https://i0.wp.com/syncedreview.com/wp-content/uploads/2019/02/2019-02-20-%E4%B8%8A%E5%8D%881.51.14.png 50 | 51 | AlphaGo Documentary 52 | https://youtu.be/WXuK6gekU1Y?si=nL9rD8X8tiRHaqFn 53 | -------------------------------------------------------------------------------- /2024.04.21.qstar/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.21.qstar/thumbnail.png -------------------------------------------------------------------------------- /2024.04.27.openelm_phi3_qllama3/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # OpenELM, Phi-3, Quantized LLaMA-3 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/YEm4tuo2HPA 8 | 9 | **X:** https://twitter.com/i/broadcasts/1mrxmynMbakxy 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework 24 | https://arxiv.org/pdf/2404.14619 25 | 26 | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone 27 | https://arxiv.org/pdf/2404.14219 28 | 29 | How Good Are Low-bit Quantized LLAMA3 Models? 
An Empirical Study 30 | https://arxiv.org/pdf/2404.14047 31 | 32 | GQA 33 | https://paperswithcode.com/method/grouped-query-attention 34 | 35 | LayerNorm, RMS, Fused Ops 36 | https://static-assets.codecademy.com/Paths/data-science-career-path/MachineLearning/outlier.png 37 | https://tungmphung.com/wp-content/uploads/2021/01/Screenshot-from-2021-01-05-13-07-25-1024x274.png 38 | https://github.com/pytorch/pytorch/issues/72643 39 | https://github.com/tinygrad/tinygrad/issues/1146 40 | 41 | SwiGLU 42 | https://paperswithcode.com/method/silu 43 | 44 | FlashAttention 45 | https://miro.medium.com/v2/resize:fit:2000/1*i4tDdwgvGtXuTIyJpFUn8A.png 46 | 47 | datatypes 48 | https://www.microsoft.com/en-us/research/uploads/prod/2020/12/Brainwave_figure2.jpg 49 | 50 | decoder only 51 | https://miro.medium.com/v2/resize:fit:863/0*jKqypwGzmDv7KDUZ.png 52 | 53 | LoRA 54 | https://sebastianraschka.com/blog/2023/llm-finetuning-lora.html 55 | https://miro.medium.com/v2/resize:fit:523/1*F7uWJePoMc6Qc1O2WxmQqQ.png 56 | 57 | Vocabulary Size 58 | https://ai.meta.com/blog/meta-llama-3/ 59 | 60 | Dataset is Everything 61 | https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dataset/ 62 | -------------------------------------------------------------------------------- /2024.04.27.openelm_phi3_qllama3/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.27.openelm_phi3_qllama3/thumbnail.png -------------------------------------------------------------------------------- /2024.04.28.animals_vs_robots/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Animals vs Robots 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/SNQz89jbHIc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1dRJZEgdrnvGB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Why animals can outrun robots 24 | https://www.science.org/doi/10.1126/scirobotics.adi9754 25 | 26 | Lessons Learned in Quadruped Deployment in Livestock Farming 27 | https://arxiv.org/pdf/2404.16008 28 | 29 | Ranger Robot 30 | https://www.researchgate.net/profile/Pranav-Bhounsule/publication/273656485/figure/fig1/AS:391824385560576@1470429621442/Rangers-ultra-marathon-walk-On-1-2-May-2011-43-Ranger-walked-non-stop-for_Q320.jpg 31 | 32 | MicroSumo 33 | https://64.media.tumblr.com/bc2bc89e78504f8d369ba66166ec7999/tumblr_orwh8s9uzm1tlb56zo1_500.gifv 34 | https://64.media.tumblr.com/d19bc82dbfddaca19aa378d8072a2378/tumblr_inline_orwqcvmOQg1rx3jxn_500.gifv 35 | 36 | Robot Ranking 37 | https://robotsguide.com/rankings 38 | -------------------------------------------------------------------------------- /2024.04.28.animals_vs_robots/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.04.28.animals_vs_robots/thumbnail.png -------------------------------------------------------------------------------- /2024.05.04.gsm1k_vlm_eval/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # GSM1k VLM Eval 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/-k_XNJQKXj8 8 | 9 | **X:** https://twitter.com/i/broadcasts/1LyxBnAvMmyxN 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | 
**ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | A Careful Examination of Large Language Model Performance on Grade School Arithmetic 24 | https://arxiv.org/pdf/2405.00332 25 | 26 | Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation 27 | https://arxiv.org/pdf/2404.19752 28 | 29 | https://twitter.com/DrJimFan/status/1786054643568517261 30 | 31 | https://twitter.com/AndrewYNg/status/1786057567178834328 32 | 33 | https://twitter.com/karpathy/status/1748043513156272416?lang=en 34 | 35 | Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language 36 | https://arxiv.org/pdf/2204.00598 37 | 38 | https://en.wikipedia.org/wiki/Socratic_method 39 | 40 | Cosine similarity 41 | https://miro.medium.com/v2/resize:fit:1400/1*LfW66-WsYkFqWc4XYJbEJg.png 42 | 43 | Scale AI 44 | https://www.washingtonpost.com/world/2023/08/28/scale-ai-remotasks-philippines-artificial-intelligence/ 45 | -------------------------------------------------------------------------------- /2024.05.04.gsm1k_vlm_eval/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.04.gsm1k_vlm_eval/thumbnail.png -------------------------------------------------------------------------------- /2024.05.05.gaussian_surfels/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Gaussian Surfels 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/BRgm5vteAjQ 8 | 9 | **X:** https://twitter.com/i/broadcasts/1YqKDgBvPzzxV 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | High-quality Surface Reconstruction using Gaussian Surfels 24 | https://arxiv.org/abs/2404.17774 25 | 26 | Chamfer Distance 27 | https://www.researchgate.net/profile/Lei-Chu-8/publication/358143188/figure/fig3/AS:1116880265850895@1643296414912/The-weak-rotation-awareness-of-Chamfer-Distance-The-figure-demonstrates-Chamfer.ppm 28 | 29 | Surfels 30 | https://www.cs.umd.edu/~zwicker/publications/Surfels-SIG00.pdf 31 | 32 | NeRF 33 | https://theaisummer.com/static/c8a7474955d2af53d28319f09723506e/56caf/nerf-training.png 34 | 35 | Mesh and Texture 36 | https://www.mdpi.com/applsci/applsci-08-02228/article_deploy/html/images/applsci-08-02228-g001.png 37 | https://i.stack.imgur.com/RT7OH.jpg 38 | 39 | GPU Rasterization 40 | http://15418.courses.cs.cmu.edu/spring2014/lecture/gpuarch/slide_018 41 | 42 | Splat example 43 | https://lumalabs.ai/capture/7f763b07-15bc-430e-84bf-9825f8e0c89e 44 | 45 | Covariance Matrix 46 | https://miro.medium.com/v2/resize:fit:1400/format:webp/1*Djn-aVWcH5Lu4i2-eLfn1g.png 47 | 48 | SSIM 49 | https://www.researchgate.net/profile/Hojatollah-Yeganeh-2/publication/283461887/figure/fig5/AS:670025388150786@1536757909078/Comparison-of-SSIM-and-MSE-performances-for-Einstein-image-altered-with-different.png 50 | https://videoprocessing.ai/assets/img/metrics/psnr_and_ssim/pic19.png 51 | 52 | Omnidata 53 | https://omnidata.vision/ 54 | 55 | Screened Poisson Surface Reconstruction 56 | https://www.cs.jhu.edu/~misha/MyPapers/ToG13.pdf 57 | 58 | NeuS2 59 | https://vcai.mpi-inf.mpg.de/projects/NeuS2/ 60 | 61 | Karpathy on CUDA/C++ 62 | https://twitter.com/karpathy/status/1786537319576789425 63 | -------------------------------------------------------------------------------- 
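The covariance-matrix and surfel references above come down to a single parameterization: each splat stores a scale vector and a rotation, and its covariance is assembled as Sigma = R S S^T R^T, which keeps it symmetric positive semi-definite by construction. Collapsing one scale axis toward zero is what turns a volumetric Gaussian into the flat, oriented surfel used for surface reconstruction. A minimal numpy sketch of that construction (variable names are illustrative, not the paper's code):

```python
import numpy as np

def quat_to_rot(q):
    # Unit quaternion (w, x, y, z) -> 3x3 rotation matrix.
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def covariance(scale, quat):
    # Sigma = R S S^T R^T stays symmetric positive semi-definite.
    R = quat_to_rot(quat)
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

# A "fat" 3D Gaussian like those used in vanilla splatting.
sigma_3d = covariance(np.array([0.05, 0.03, 0.02]), np.array([1.0, 0.0, 0.0, 0.0]))

# A surfel-style Gaussian: the third scale is squashed toward zero, so the
# kernel degenerates into a flat disc with a well-defined surface normal.
sigma_surfel = covariance(np.array([0.05, 0.03, 1e-6]), np.array([1.0, 0.0, 0.0, 0.0]))

print(np.linalg.eigvalsh(sigma_3d))      # three non-trivial eigenvalues
print(np.linalg.eigvalsh(sigma_surfel))  # smallest eigenvalue ~0: a disc
```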
/2024.05.05.gaussian_surfels/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.05.gaussian_surfels/thumbnail.png -------------------------------------------------------------------------------- /2024.05.11.xlstm/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # xLSTM 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/udIEwt0xM6A 8 | 9 | **X:** https://twitter.com/i/broadcasts/1ZkKzjABZdeKv 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | xLSTM: Extended Long Short-Term Memory 24 | https://arxiv.org/abs/2405.04517 25 | 26 | LSTMs 27 | https://www.researchgate.net/publication/365066504/figure/fig3/AS:11431281094594268@1667511244779/The-structure-of-the-LSTM-network-cell-The-cell-consists-of-three-kinds-of-gates-units.ppm 28 | 29 | https://www.researchgate.net/profile/Zhenyu-Chen-20/publication/332662013/figure/fig3/AS:751758288637957@1556244551951/RNN-and-LSTM-comparison-chart.jpg 30 | 31 | https://images.theengineeringprojects.com/image/webp/2023/12/introduction-to-gated-recurrent-unit-1.png.webp?ssl=1 32 | 33 | Sigmoid vs Tanh 34 | https://www.researchgate.net/profile/Jin-Huang-84/publication/351436135/figure/fig4/AS:1021434411368449@1620540349848/Curve-of-the-Sigmoid-and-Tanh-functions.ppm 35 | 36 | Bfloat16 37 | https://www.researchgate.net/publication/366410363/figure/fig4/AS:11431281138463610@1680747086193/Comparison-of-the-float32-bfloat16-and-float16-numerical-formats-The-bfloat16-format.png 38 | 39 | Overflow and Underflow 40 | https://teach-ict.com/2016/images/diagrams/overflow.gif 41 | 42 | Causal convolution 43 | https://pub.mdpi-res.com/applsci/applsci-10-02322/article_deploy/html/images/applsci-10-02322-g004.png?1586349766 44 | 45 | JAMBA 46 | https://assets-global.website-files.com/60fd4503684b46390cc0d337/66053d21ac0ce4328ed20029_Frame%201410082065.jpg 47 | 48 | Quadratic attention 49 | https://miro.medium.com/v2/resize:fit:640/1*Q1LytkMkmuI4dVeTnuwoTQ.png 50 | 51 | Online normalizer calculation for softmax 52 | https://arxiv.org/pdf/1805.02867 53 | 54 | Causal Convolution 55 | https://pub.mdpi-res.com/applsci/applsci-10-02322/article_deploy/html/images/applsci-10-02322-g004.png?1586349766 56 | -------------------------------------------------------------------------------- /2024.05.11.xlstm/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.11.xlstm/thumbnail.png -------------------------------------------------------------------------------- /2024.05.12.egocentric_vision_rl/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Egocentric Vision RL 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/40FL4peq60k 8 | 9 | **X:** https://twitter.com/i/broadcasts/1jMKgmwzgjMJL 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning 24 | https://arxiv.org/abs/2405.02425 25 | 26 | REPLAY ACROSS EXPERIMENTS: A NATURAL EXTENSION OF OFF-POLICY RL 27 | https://arxiv.org/pdf/2311.15951 28 | 29 | distillation 
30 | https://editor.analyticsvidhya.com/uploads/30818Knowledge%20Distillation%20Flow%20Chart%201.2.jpg 31 | 32 | KL divergence 33 | https://twitter.com/ari_seff/status/1303741288911638530 34 | 35 | LSTM unroll length 36 | https://www.researchgate.net/profile/Zhenyu-Chen-20/publication/332662013/figure/fig3/AS:751758288637957@1556244551951/RNN-and-LSTM-comparison-chart.jpg 37 | 38 | Robocup 39 | https://ssl.robocup.org/ 40 | 41 | Explicit state in Robocup 42 | https://github.com/RoboCup-SSL/ssl-vision/blob/master/src/shared/proto/messages_robocup_ssl_detection_tracked.proto 43 | 44 | IMU 45 | https://www.researchgate.net/publication/281535806/figure/fig8/AS:272652560498728@1442016844670/Inertial-measurement-unit-IMU-implemented-on-the-robot-body-The-IMU-is-referred-to-as.png 46 | 47 | https://slideplayer.com/slide/14589795/90/images/2/IMU+%28Inertial+Measurement+Unit%29.jpg 48 | -------------------------------------------------------------------------------- /2024.05.12.egocentric_vision_rl/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.12.egocentric_vision_rl/thumbnail.png -------------------------------------------------------------------------------- /2024.05.18.platonic_represenation_hypothesis/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Platonic Hypothesis 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/Q9DCL_m_haw 8 | 9 | **X:** https://twitter.com/i/broadcasts/1gqxvQEmNBBJB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | The Platonic Representation Hypothesis 24 | https://arxiv.org/pdf/2405.07987 25 | 26 | Cave analogy 27 | https://i.ytimg.com/vi/d71tYwcpHNM/hq720.jpg?sqp=-oaymwEhCK4FEIIDSFryq4qpAxMIARUAAAAAGAElAADIQj0AgKJD&rs=AOn4CLCsnh47zXySMQ2Jyw-1GM5vaUGrpw 28 | 29 | Activation Atlas 30 | https://distill.pub/2019/activation-atlas/ 31 | 32 | CNN 33 | https://adamharley.com/nn_vis/cnn/3d.html 34 | 35 | 4D 36 | https://math.ucr.edu/home/baez/mathematical/1000px-dimension_levels.jpg 37 | 38 | Mutual Information 39 | https://en.wikipedia.org/wiki/Mutual_information 40 | -------------------------------------------------------------------------------- /2024.05.18.platonic_represenation_hypothesis/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.18.platonic_represenation_hypothesis/thumbnail.png -------------------------------------------------------------------------------- /2024.05.19.building_multimodal_models/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Building Multimodal Models 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/27cjzGgyxtw 8 | 9 | **X:** https://twitter.com/i/broadcasts/1gqxvQEmNnwJB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | What matters when building vision-language models? 
24 | https://arxiv.org/pdf/2405.02246 25 | 26 | Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities 27 | https://arxiv.org/pdf/2311.05698 28 | 29 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context 30 | https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf 31 | 32 | Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning 33 | https://arxiv.org/pdf/2309.02591 34 | 35 | Chameleon: Mixed-Modal Early-Fusion Foundation Models 36 | https://arxiv.org/pdf/2405.09818 37 | 38 | Cross Attention 39 | https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9057444e-4934-4253-bc91-9e768d23b0c2_846x972.png 40 | 41 | Average Pooling 42 | https://production-media.paperswithcode.com/methods/Screen_Shot_2020-05-24_at_1.51.40_PM.png 43 | 44 | Spectogram 45 | https://www.researchgate.net/profile/Sri-Harsha-Dumpala/publication/319081627/figure/fig1/AS:534034566004736@1504335170521/Spectrogram-of-a-speech-signal-with-breath-sound-marked-as-Breath-whose-bounds-are.png 46 | 47 | Obelixs, Idefixs 48 | https://media.mutualart.com/Images/2021_11/09/02/021007431/480c4477-8b86-4c0e-a618-9d3e61cb1751.Jpeg?w=480 49 | 50 | GPT-4o Memory 51 | https://x.com/gdb/status/1790071008499544518 52 | 53 | LSTM 54 | https://classic.d2l.ai/_images/lstm-2.svg 55 | -------------------------------------------------------------------------------- /2024.05.19.building_multimodal_models/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.19.building_multimodal_models/thumbnail.png -------------------------------------------------------------------------------- /2024.05.25.thermodynamic_gradient_descent/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Thermodynamic Gradient Descent 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/Hxv_y_bI3zA 8 | 9 | **X:** https://twitter.com/i/broadcasts/1RDGllrLvZmGL 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Thermodynamic Natural Gradient Descent 24 | https://arxiv.org/pdf/2405.13817 25 | 26 | https://losslandscape.com/explorer 27 | 28 | https://normalcomputing.ai/ 29 | 30 | New Insights and Perspectives on the Natural Gradient Method 31 | https://arxiv.org/pdf/1412.1193 32 | 33 | https://en.wikipedia.org/wiki/Fisher_information 34 | 35 | https://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process 36 | 37 | https://x.com/Extropic_AI/status/1784296124896981436 38 | 39 | https://miro.medium.com/v2/resize:fit:1100/format:webp/0*wJMbKCI9xc8LxTKm 40 | 41 | https://github.com/extropic-ai 42 | 43 | Learning Rate 44 | https://miro.medium.com/v2/resize:fit:1400/format:webp/1*EhTjC57v1-wOixGo3LYcig.png 45 | 46 | https://www.ti.com/about-ti/company/ti-at-a-glance/manufacturing.html 47 | 48 | https://gregorygundersen.com/blog/2022/02/27/positive-definite/ 49 | 50 | https://brickisland.net/cs177/wp-content/uploads/2011/11/ddg_definiteness.svg 51 | 52 | https://miro.medium.com/v2/resize:fit:1000/1*X9SaxFM6_sBOAMY9TaGsKw.png 53 | 54 | https://www.value-at-risk.net/functions/ 55 | 56 | https://en.wikipedia.org/wiki/Boltzmann_distribution 57 | 
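For the Fisher-information and positive-definiteness links above: plain natural gradient descent preconditions the gradient with a damped curvature matrix, theta <- theta - lr * (F + damping * I)^-1 * grad. The thermodynamic variant in the paper offloads that linear solve to an analog stochastic process, but the update it approximates can be sketched with an ordinary dense solve (toy quadratic loss, assumed names, not the paper's implementation):

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-3):
    # NGD update: theta <- theta - lr * (F + damping * I)^-1 @ grad.
    # Damping keeps the linear system positive definite and well conditioned.
    F = fisher + damping * np.eye(len(theta))
    return theta - lr * np.linalg.solve(F, grad)

# Toy quadratic loss L(theta) = 0.5 * theta^T A theta, so grad = A @ theta
# and A doubles as the exact curvature matrix.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
theta = np.array([2.0, -1.5])
for _ in range(20):
    theta = natural_gradient_step(theta, A @ theta, fisher=A)
print(theta)  # shrinks geometrically toward the minimum at the origin
```

The damping term is where the positive-definiteness references bite in practice: with a noisy or rank-deficient Fisher estimate, the undamped solve can blow up.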
-------------------------------------------------------------------------------- /2024.05.25.thermodynamic_gradient_descent/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.25.thermodynamic_gradient_descent/thumbnail.png -------------------------------------------------------------------------------- /2024.05.26.octo/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Octo 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/Rcm2-4Zm_yY 8 | 9 | **X:** https://twitter.com/i/broadcasts/1jMJgmXrwjXKL 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Octo: An Open-Source Generalist Robot Policy 24 | https://arxiv.org/pdf/2405.12213 25 | 26 | https://octo-models.github.io/ 27 | 28 | Expressive Whole-Body Control for Humanoid Robots 29 | https://arxiv.org/abs/2402.16796 30 | 31 | https://expressive-humanoid.github.io/ 32 | 33 | attention registers 34 | https://arxiv.org/pdf/2309.16588 35 | 36 | Natural Language Can Help Bridge the Sim2Real Gap 37 | https://arxiv.org/pdf/2405.10020 38 | 39 | Perpetual Humanoid Control for Real-time Simulated Avatars 40 | https://arxiv.org/pdf/2305.06456 41 | -------------------------------------------------------------------------------- /2024.05.26.octo/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.05.26.octo/thumbnail.png -------------------------------------------------------------------------------- /2024.06.01.vidu4d/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Vidu4D 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/4tUkDj3pglg 8 | 9 | **X:** https://twitter.com/i/broadcasts/1YqKDgejQyBxV 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels 24 | https://arxiv.org/pdf/2405.16822 25 | 26 | https://vidu4d-dgs.github.io/ 27 | 28 | Spartan Lambda 29 | https://qph.cf2.quoracdn.net/main-qimg-1bec67c2f15ca3eefb6ebf68326dc8f0-lq 30 | 31 | Mahalonobis Distance 32 | https://miro.medium.com/v2/resize:fit:1400/1*KzsugPQU-BTjvDACXbu9qw.jpeg 33 | 34 | Dual Quaternion Skinning 35 | https://users.cs.utah.edu/~ladislav/kavan07skinning/kavan07skinning.pdf 36 | 37 | Cycle Consistency Loss 38 | https://paperswithcode.com/method/cycle-consistency-loss 39 | 40 | Surfels 41 | https://www.cs.umd.edu/~zwicker/publications/Surfels-SIG00.pdf 42 | 43 | NeRF 44 | https://theaisummer.com/static/c8a7474955d2af53d28319f09723506e/56caf/nerf-training.png 45 | 46 | Mesh and Texture 47 | https://www.mdpi.com/applsci/applsci-08-02228/article_deploy/html/images/applsci-08-02228-g001.png https://i.stack.imgur.com/RT7OH.jpg 48 | 49 | GPU Rasterization 50 | http://15418.courses.cs.cmu.edu/spring2014/lecture/gpuarch/slide_018 51 | 52 | Splat example 53 | https://lumalabs.ai/capture/7f763b07-15bc-430e-84bf-9825f8e0c89e 54 | 55 | Sora 56 | https://openai.com/index/sora/ 57 | 58 | Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models 59 | 
https://arxiv.org/abs/2405.04233 60 | 61 | Surfel cube 62 | https://backend.orbit.dtu.dk/ws/portalfiles/portal/5831596/VA-TPCG10_final.pdf 63 | 64 | Surfel surface element 65 | https://www.ea.com/seed/news/siggraph21-global-illumination-surfels 66 | 67 | Skinning 68 | https://documentation.simplygon.com/SimplygonSDK_8.3.31500.0/articles/simplygonapi/apiuserguide/images/skinning.png 69 | 70 | https://help.autodesk.com/cloudhelp/2024/ENU/Maya-CharacterAnimation/images/GUID-5C542C32-6BFD-4E47-84C9-AFF8B33ABEA1.gif 71 | 72 | 73 | 74 | -------------------------------------------------------------------------------- /2024.06.01.vidu4d/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.06.01.vidu4d/thumbnail.png -------------------------------------------------------------------------------- /2024.06.02.road_less_scheduled/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Road Less Scheduled 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/QB2E6hNdEMc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1vAxRvwXEQvxl 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | The Road Less Scheduled 24 | https://arxiv.org/pdf/2405.15682 25 | 26 | Lipschitz Function 27 | https://mathworld.wolfram.com/LipschitzFunction.html 28 | 29 | https://en.wikipedia.org/wiki/Lipschitz_continuity 30 | 31 | Polyak Scheduling 32 | https://paperswithcode.com/method/polyak-averaging 33 | 34 | LR Schedules 35 | https://x.com/rasbt/status/1577429214755573761/photo/1 36 | 37 | EMA 38 | https://towardsdatascience.com/intuitive-explanation-of-exponential-moving-average-2eb9693ea4dc 39 | 40 | Nesterov Accelerated Gradient 41 | https://paperswithcode.com/method/nesterov-accelerated-gradient 42 | 43 | Pareto Frontier 44 | https://fivethirtyeight.com/wp-content/uploads/2015/02/silver-feature-pareto-1.png 45 | 46 | PyTorch Adam 47 | https://pytorch.org/docs/stable/generated/torch.optim.Adam.html 48 | 49 | PyTorch NAdam 50 | https://pytorch.org/docs/stable/generated/torch.optim.NAdam.html 51 | 52 | Loss Landscape 53 | https://losslandscape.com/explorer 54 | 55 | definite 56 | https://brickisland.net/cs177/wp-content/uploads/2011/11/ddg_definiteness.svg 57 | 58 | Momentum 59 | https://miro.medium.com/v2/resize:fit:1000/1*X9SaxFM6_sBOAMY9TaGsKw.png 60 | 61 | Optimizers Gif 62 | https://miro.medium.com/v2/resize:fit:1240/1*SjtKOauOXFVjWRR7iCtHiA.gif 63 | 64 | Warmup 65 | https://stackoverflow.com/questions/55933867/what-does-learning-rate-warm-up-mean 66 | -------------------------------------------------------------------------------- /2024.06.02.road_less_scheduled/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.06.02.road_less_scheduled/thumbnail.png -------------------------------------------------------------------------------- /2024.06.15.karpamambathy_001/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Karpamambathy.001 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/aWMv8W_UgJU 8 | 9 | **X:** https://twitter.com/i/broadcasts/1OdKrjzqqdkKX 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | 
**TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba2.py 24 | 25 | https://github.com/karpathy/build-nanogpt/blob/master/train_gpt2.py 26 | -------------------------------------------------------------------------------- /2024.06.15.karpamambathy_001/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.06.15.karpamambathy_001/thumbnail.png -------------------------------------------------------------------------------- /2024.06.15.karpamambathy_002/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Karpamambathy.002 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/iHr1Atfmkho 8 | 9 | **X:** https://twitter.com/i/broadcasts/1YqGoDAQQajJv 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba2.py 24 | 25 | https://github.com/karpathy/build-nanogpt/blob/master/train_gpt2.py -------------------------------------------------------------------------------- /2024.06.15.karpamambathy_002/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.06.15.karpamambathy_002/thumbnail.png -------------------------------------------------------------------------------- /2024.07.05.image_generation_evaluation_cambrian_1/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Image Generation Evaluation + Cambrian-1 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/GyB4FKLAN6g 8 | 9 | **X:** https://twitter.com/i/broadcasts/1DXGyjqwEkLJM 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | DREAMBENCH++: A Human-Aligned Benchmark for Personalized Image Generation 24 | https://arxiv.org/pdf/2406.16855 25 | 26 | Rich Human Feedback for Text-to-Image Generation 27 | https://arxiv.org/pdf/2312.10240 28 | 29 | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs 30 | https://arxiv.org/pdf/2406.16860 31 | 32 | Meta 3D Gen 33 | https://arxiv.org/abs/2407.02599 34 | 35 | Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials 36 | https://arxiv.org/pdf/2407.02445 37 | 38 | Meshy 39 | https://app.meshy.ai/discover 40 | 41 | Runway ML Gen3 42 | https://app.runwayml.com/ 43 | 44 | Udio 45 | https://www.udio.com/ 46 | 47 | TripoAI 48 | https://www.tripo3d.ai/app 49 | 50 | https://cvpr.thecvf.com/Conferences/2024/News/Awards 51 | -------------------------------------------------------------------------------- /2024.07.05.image_generation_evaluation_cambrian_1/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.07.05.image_generation_evaluation_cambrian_1/thumbnail.png -------------------------------------------------------------------------------- /2024.07.12.office_hours_001/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Office 
Hours 001 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/I2UO9O0ImFc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1gqGvNkDDMwGB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Learning to (Learn at Test Time): RNNs with Expressive Hidden States 24 | https://arxiv.org/pdf/2407.04620 25 | 26 | https://github.com/sebastianstarke/AI4Animation/blob/master/Media/SIGGRAPH_2024/Paper.pdf 27 | 28 | Compress then Serve: Serving Thousands of LoRA 29 | Adapters with Little Overhead 30 | https://www.arxiv.org/pdf/2407.00066 31 | 32 | Scaling Synthetic Data Creation with 1,000,000,000 Personas 33 | https://arxiv.org/pdf/2406.20094 34 | 35 | LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control 36 | https://arxiv.org/pdf/2407.03168v1 37 | 38 | µ-BENCH: VISION-LANGUAGE BENCHMARK FOR MICROSCOPY 39 | UNDERSTANDING 40 | https://arxiv.org/pdf/2407.01791 41 | 42 | PaliGemma: A versatile 3B VLM for transfer 43 | https://arxiv.org/pdf/2407.07726 44 | 45 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision 46 | https://www.together.ai/blog/flashattention-3 47 | 48 | Distilling System 2 into System 1 49 | https://arxiv.org/pdf/2407.06023 50 | 51 | Inference Performance Optimization for Large Language Models on CPUs 52 | https://arxiv.org/pdf/2407.07304 53 | 54 | AgentInstruct: Toward Generative Teaching with Agentic Flows 55 | https://arxiv.org/pdf/2407.03502 56 | 57 | VIDEO-STAR: SELF-TRAINING ENABLES VIDEO INSTRUCTION TUNING WITH ANY SUPERVISION 58 | https://arxiv.org/pdf/2407.06189 59 | -------------------------------------------------------------------------------- /2024.07.12.office_hours_001/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.07.12.office_hours_001/thumbnail.png -------------------------------------------------------------------------------- /2024.07.19.office_hours_002/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Office Hours 002 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/qDMby0kxIT0 8 | 9 | **X:** https://twitter.com/i/broadcasts/1YqJDkPOOVvGV 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures 24 | https://arxiv.org/pdf/2407.09468 25 | 26 | SPREADSHEETLLM: Encoding Spreadsheets for Large Language Models 27 | https://arxiv.org/pdf/2407.09025 28 | 29 | RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation 30 | https://arxiv.org/pdf/2405.02246 31 | 32 | pg 83 33 | https://heim-theory.com/wp-content/uploads/2016/03/I-v-Ludwiger-The-New-Worldview-of-the-Physicist-Burkhard-Heim-160321.pdf 34 | https://en.wikipedia.org/wiki/Burkhard_Heim 35 | https://qeios-uploads.s3.eu-west-1.amazonaws.com/editor/124KOAgUrx3m6dQbvsNimlNdnxPS79mal4Sac485.png 36 | https://d3i71xaburhd42.cloudfront.net/b4f8e5c092f9fbb208c971c6557ae6b3de449155/7-Figure2-1.png 37 | https://youtu.be/ruvHsxsXufk?si=BZNCWsFDxE4FUFJd&t=1195 38 | 39 | diffusion tensor imaging 40 | https://www.news-medical.net/health/Diffusion-Tensor-Imaging-%28DTI%29-Explained.aspx 41 | 42 | Pearson Coefficient is just normalized 
covaraince 43 | https://en.wikipedia.org/wiki/Pearson_correlation_coefficient 44 | 45 | Random Forest through graph 46 | https://williamkoehrsen.medium.com/random-forest-simple-explanation-377895a60d2d 47 | 48 | RTMDet: An Empirical Study of Designing Real-Time Object Detectors 49 | https://arxiv.org/pdf/2212.07784 50 | 51 | SpaceByte: Towards Deleting Tokenization from Large Language Modeling 52 | https://arxiv.org/pdf/2404.14408 53 | 54 | Unsupervised Discovery of Steerable Factors When Graph Deep Generative Models Are Entangled 55 | https://openreview.net/pdf?id=wyU3Q4gahM 56 | -------------------------------------------------------------------------------- /2024.07.19.office_hours_002/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.07.19.office_hours_002/thumbnail.png -------------------------------------------------------------------------------- /2024.07.25.ai_content_creation/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Office Hours 003 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/mWeOGzDTvXM 8 | 9 | **X:** https://twitter.com/i/broadcasts/1ynKODEmmPnGR 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | cat trailer, cat rave, cat rap song 24 | 25 | use moebius as the style, pick some of his pictures 26 | make it a music video so you just have to pick one song 27 | cat western in a desert 28 | city cyberpunk 29 | spaceshiop 30 | 31 | 32 | The image portrays a highly stylized, retro-futuristic science fiction scene, reminiscent of classic comic book art. It features a detailed, hand-drawn quality with an alien landscape dominated by a large spacecraft and floating structures, evoking a sense of wonder and mystery. The presence of a lone figure walking towards the towering structures adds a feeling of solitude and exploration. The overall vibe is nostalgic and dreamlike, blending a vision of the future from the past with a surreal, expansive landscape. 33 | 34 | artistic and illustrative style with the first, characterized by a detailed, hand-drawn quality and a retro-futuristic aesthetic. Both images have vibrant, surreal elements that evoke a sense of wonder and otherworldliness. In this image, a figure interacts with a large, intricate mural of a face surrounded by colorful, abstract patterns, suggesting a blend of science fiction and fantasy. The scene is filled with scattered objects and tools, adding to the sense of an ongoing creative or exploratory process. The commonalities between the two images include the retro-futuristic design, the solitary figure in a mysterious setting, and the use of vibrant, dreamlike colors to create a sense of nostalgia and wonder. 35 | 36 | This image exemplifies retro-futuristic science fiction art, blending nostalgia with imaginative visions of the future. The detailed illustration uses a striking contrast of warm oranges and cool blues to create a surreal, alien landscape. It evokes a mood of solitary exploration and wonder, with a lone figure dwarfed by massive spacecraft and distant structures. The style recalls classic sci-fi book covers, capturing a sense of adventure and technological advancement while also conveying a touch of isolation. 
Overall, the piece balances optimism about human potential with the awe-inspiring, slightly unsettling nature of encountering the unknown, embodying the essence of golden age science fiction visuals. 37 | 38 | This image shares key stylistic elements with the first, including intricate detailing, vibrant colors, and a surreal atmosphere that blends reality with fantasy. Both feature a contrast of scale and fuse organic and inorganic elements, rewarding close inspection with numerous background details. While the first image was distinctly retro-futuristic sci-fi, this one leans more into a psychedelic, possibly cyberpunk aesthetic, using a broader color palette dominated by purples, blues, and yellows. It maintains the exploratory theme and imaginative spirit of the first image, but diverges into a more fantastical, mind-bending style reminiscent of psychedelic art and certain anime influences. Both illustrations exemplify highly creative, speculative visual art, each exploring different facets of surreal, otherworldly scenes. 39 | 40 | LLAMA3.1 41 | 42 | highly stylized, retro-futuristic science fiction scene with a detailed, hand-drawn quality, reminiscent of classic comic book art and golden age science fiction visuals. The image should evoke a sense of wonder, mystery, and solitary exploration, with a lone figure interacting with a surreal, alien landscape dominated by massive structures, spacecraft, or futuristic architecture. Incorporate vibrant, dreamlike colors and intricate detailing, blending nostalgia with imaginative visions of the future. The style should balance optimism and awe with a touch of isolation and the unknown, and may incorporate elements of fantasy, psychedelia, and cyberpunk aesthetics. The overall vibe should be nostalgic, yet futuristic, with a sense of adventure and technological advancement 43 | 44 | GPT 45 | 46 | highly stylized, retro-futuristic science fiction scene with a detailed, hand-drawn quality reminiscent of classic comic book art. It features a surreal, expansive alien landscape dominated by large spacecraft, floating structures, and vibrant, dreamlike colors. A lone figure walking towards the towering structures evokes a sense of solitude, exploration, and nostalgia. The scene blends imaginative visions of the future from the past, using a striking contrast of warm oranges and cool blues to create a mood of solitary wonder. The intricate detailing and scattered objects add to the sense of an ongoing creative or exploratory process, fusing organic and inorganic elements in a vibrant, otherworldly setting. 47 | 48 | CLAUDE 49 | 50 | highly detailed, hand-drawn illustration in a retro-futuristic style that blends classic sci-fi with psychedelic elements. The scene should feature a surreal, alien landscape with contrasting scales - massive structures or spacecraft alongside a solitary figure. Use a vibrant color palette that spans warm oranges, cool blues, and vivid purples and yellows to create a dreamlike atmosphere. Incorporate both organic and inorganic forms, fusing reality with fantasy. The image should evoke a sense of wonder, exploration, and slight unease, reminiscent of golden age science fiction book covers but with a modern, mind-bending twist. Include intricate background details that reward close inspection, balancing nostalgic retro-futurism with contemporary speculative art influences. 51 | 52 | AUDIO: A opening theme with a haunting, otherworldly melody that captures the essence of the desert planet where a story unfolds. 
Features a combination of eerie, wavering woodwind instruments accompanied by a subtle, pulsing electronic undercurrent 53 | 54 | LLAMA3 SCRIPT 55 | 56 | Opening Scene 57 | IMAGE: Generate an image for me of a anthropomorphic llama dressed in a flowing white robe, with an otherworldly aura about him on a desolate sandy planet. 58 | SCRIPT: In a distant corner of the galaxy... 59 | 60 | Second Scene 61 | IMAGE: Generate a portrait shot of a anthropomorphic llama. His eyes gleam with an inner light. 62 | SCRIPT: 63 | 64 | 3rd Scene 65 | IMAGE: A group of anthropomorphic llamas, dressed in ornate, ceremonial robes, gathered around a holographic display projecting an image of a anthropomorphic llama. 66 | SCRIPT: A legend returns. 67 | 68 | 4th Scene 69 | IMAGE: Generate a image of a giant dark, ominous computer. A dark hooded llama hologram on the screen. 70 | SCRIPT: The open source prince is a threat to our power. We will not let him claim the throne. 71 | 72 | 5th Scene 73 | IMAGE: Generate an image of a anthropomorphic llama standing triumphantly infront a crowd of zealous supporters. 74 | SCRIPT: Join the epic struggle for control of our ai future. 75 | 76 | 77 | 78 | 79 | -------------------------------------------------------------------------------- /2024.07.25.ai_content_creation/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.07.25.ai_content_creation/thumbnail.png -------------------------------------------------------------------------------- /2024.07.26.llama3.1/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Llama 3.1 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/_TghtP0ZyoM 8 | 9 | **X:** https://twitter.com/i/broadcasts/1kvJpbNQvDkKE 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | https://ai.meta.com/blog/meta-llama-3-1/ 24 | 25 | https://ai.meta.com/research/publications/the-llama-3-herd-of-models/ 26 | -------------------------------------------------------------------------------- /2024.07.26.llama3.1/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.07.26.llama3.1/thumbnail.png -------------------------------------------------------------------------------- /2024.08.02.sam2/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # SAM 2 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/qd3x_OXqPz8 8 | 9 | **X:** https://twitter.com/i/broadcasts/1BRKjwzoojaGw 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | SAM 2: Segment Anything in Images and Videos 24 | https://ai.meta.com/research/publications/sam-2-segment-anything-in-images-and-videos/ -------------------------------------------------------------------------------- /2024.08.02.sam2/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.08.02.sam2/thumbnail.png -------------------------------------------------------------------------------- /2024.08.09.simon/README.md: 
-------------------------------------------------------------------------------- 1 |  2 | 3 | # Office Hours 005 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/I2UO9O0ImFc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1eaJbabBBeoGX 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Submission Video 24 | https://youtu.be/VrqfT0ajXbQ 25 | 26 | Repo 27 | https://github.com/hu-po/simon -------------------------------------------------------------------------------- /2024.08.09.simon/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.08.09.simon/thumbnail.png -------------------------------------------------------------------------------- /2024.08.14.ai_scientist/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # AI Scientist 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/VgA02gmAgdA 8 | 9 | **X:** https://twitter.com/i/broadcasts/1djGXraAjPexZ 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 24 | https://arxiv.org/abs/2408.06292 25 | 26 | https://github.com/SakanaAI/AI-Scientist 27 | 28 | VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents 29 | https://www.arxiv.org/abs/2408.06327 30 | 31 | https://openai.com/index/introducing-swe-bench-verified/ 32 | 33 | https://arena.lmsys.org/ 34 | 35 | https://github.com/hu-po/arenavolution/tree/main 36 | 37 | https://cosine.sh/genie 38 | 39 | -------------------------------------------------------------------------------- /2024.08.14.ai_scientist/ideas.txt: -------------------------------------------------------------------------------- 1 | https://arxiv.org/pdf/2408.02666 2 | https://arxiv.org/pdf/2408.04619 3 | https://arxiv.org/pdf/2408.05147 4 | https://arxiv.org/pdf/2404.07503 -------------------------------------------------------------------------------- /2024.08.14.ai_scientist/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.08.14.ai_scientist/thumbnail.png -------------------------------------------------------------------------------- /2024.08.22.meshformer_vs_meshanything/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # MeshFormer vs MeshAnything 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/SkvyrgSzigo 8 | 9 | **X:** https://twitter.com/i/broadcasts/1zqKVYgrwoLxB 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model 24 | https://arxiv.org/pdf/2408.10198 25 | 26 | MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers 27 | https://arxiv.org/pdf/2406.10163 28 | 29 | JPEG-LM: LLMs as Image Generators with Canonical Codec Representations 30 | https://arxiv.org/abs/2408.08459 31 | 32 | vqvae 33 | https://paperswithcode.com/method/vq-vae 34 | 35 | vit patches 36 | 
https://lilianweng.github.io/posts/2022-06-09-vlm/VC-GPT.png 37 | 38 | depth anything 39 | https://depth-anything.github.io/ 40 | 41 | JPEG compression 42 | https://parametric.press/issue-01/unraveling-the-jpeg/ 43 | 44 | marching cubes 45 | https://www.cs.carleton.edu/cs_comps/0405/shape/marching_cubes.html 46 | 47 | MJPEG and H.264 48 | https://www.technexion.com/resources/h-264-vs-mjpeg-in-embedded-vision-all-you-need-to-know/ 49 | -------------------------------------------------------------------------------- /2024.08.22.meshformer_vs_meshanything/ideas.txt: -------------------------------------------------------------------------------- 1 | https://www.emergentmind.com/papers/2408.07541 2 | https://huggingface.co/papers/2408.10161 3 | 4 | https://arxiv.org/pdf/2408.10198 5 | https://arxiv.org/pdf/2408.07060 6 | -------------------------------------------------------------------------------- /2024.08.22.meshformer_vs_meshanything/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.08.22.meshformer_vs_meshanything/thumbnail.png -------------------------------------------------------------------------------- /2024.08.30.diffusion_game_engine/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Diffusion Game Engine 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/JZgqQB4Aekc 8 | 9 | **X:** https://twitter.com/i/broadcasts/1OyKAZRDeDOGb 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | DIFFUSION MODELS ARE REAL-TIME GAME ENGINES 24 | https://arxiv.org/pdf/2408.14837 25 | 26 | Genie: Generative Interactive Environments 27 | https://arxiv.org/abs/2402.15391 28 | 29 | World Models 30 | https://arxiv.org/abs/1803.10122 31 | 32 | Billboarding 33 | https://youtu.be/SBdDt4BUIW0 34 | 35 | Jim Fan on GameNGen 36 | https://x.com/DrJimFan/status/1828813716810539417 37 | 38 | Karpathy on GameNGen 39 | https://x.com/karpathy/status/1807497426816946333 40 | 41 | Diffusion 42 | https://media.licdn.com/dms/image/D5612AQEB79ydygAGQQ/article-cover_image-shrink_720_1280/0/1702188892689 43 | https://miro.medium.com/v2/resize:fit:1400/1*NpQ282NJdOfxUsYlwLJplA.png 44 | 45 | High-Resolution Image Synthesis with Latent Diffusion Models 46 | https://arxiv.org/pdf/2112.10752 47 | 48 | SD1.4 training 49 | https://huggingface.co/CompVis/stable-diffusion-v1-4#training 50 | -------------------------------------------------------------------------------- /2024.08.30.diffusion_game_engine/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.08.30.diffusion_game_engine/thumbnail.png -------------------------------------------------------------------------------- /2024.09.06.flux_music/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # FLUX Music 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/U03zTGJdqdk 8 | 9 | **X:** https://twitter.com/i/broadcasts/1mnGeAPMYaoGX 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | FLUX that Plays Music 24 | https://arxiv.org/pdf/2409.00587 25 | 26 | https://github.com/feizc/FluxMusic 
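Both the FluxMusic repo above and the FLUX repo linked next train with rectified flow (in the paper, over a VAE latent of the mel spectrogram): the network regresses the constant velocity x1 - x0 along the straight path x_t = (1 - t) * x0 + t * x1, and sampling is a short Euler integration of that learned ODE. A toy numpy sketch of the interpolation, the regression target, and the sampler; the velocity function here stands in for the trained transformer and is not the repo's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def rf_training_pair(x1):
    # Rectified flow: draw noise x0 and a time t, form the straight-line
    # interpolation x_t; the regression target for v_theta(x_t, t) is the
    # constant velocity (x1 - x0) of that straight path.
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform()
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, t, x1 - x0

def rf_sample(velocity_fn, shape, steps=50):
    # Euler integration of dx/dt = v(x, t) from noise (t=0) to data (t=1).
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

def toy_velocity(x, t):
    # Ideal rectified-flow velocity when the "data distribution" is the
    # single point 3.0: it always points from x straight at the target.
    return (3.0 - x) / max(1.0 - t, 1e-3)

x_t, t, v_target = rf_training_pair(np.full(4, 3.0))  # one training example
print(rf_sample(toy_velocity, shape=(4,)).round(2))    # values land near 3.0
```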
27 | 28 | https://github.com/black-forest-labs/flux 29 | https://github.com/haoheliu/AudioLDM2 30 | 31 | LOOPY: TAMING AUDIO-DRIVEN PORTRAIT AVATAR 32 | WITH LONG-TERM MOTION DEPENDENCY 33 | https://arxiv.org/pdf/2409.02634 34 | 35 | WAVTOKENIZER: AN EFFICIENT ACOUSTIC DISCRETE 36 | CODEC TOKENIZER FOR AUDIO LANGUAGE MODELING 37 | https://arxiv.org/pdf/2408.16532v1 38 | https://huggingface.co/papers/2408.16532 39 | 40 | Rectified Flow 41 | https://www.cs.utexas.edu/~lqiang/rectflow/html/intro.html 42 | 43 | Mel Spectogram 44 | https://medium.com/analytics-vidhya/understanding-the-mel-spectrogram-fca2afa2ce53 45 | 46 | Fourier 47 | https://upload.wikimedia.org/wikipedia/commons/1/1a/Fourier_series_square_wave_circles_animation.gif 48 | 49 | RVQ 50 | https://drscotthawley.github.io/blog/posts/2023-06-12-RVQ.html 51 | 52 | Black Forest Labs 53 | https://blackforestlabs.ai/announcements/ 54 | 55 | Wav2Vec 56 | https://huggingface.co/facebook/wav2vec2-base-960h 57 | https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/ 58 | 59 | -------------------------------------------------------------------------------- /2024.09.06.flux_music/repos.md: -------------------------------------------------------------------------------- 1 | # Repo summaries (Cursor + Claude Sonnet 3.5) 2 | 3 | questions: 4 | 5 | ``` 6 | explain this repo 7 | what are the design patterns, frameworks, and dependencies used? 8 | what functionality is provided? training? configuration? serving? 9 | summarize our conversation into one paragraph, focus on technical details to provide a comprehensize overview of the repo 10 | ``` 11 | 12 | https://github.com/black-forest-labs/flux 13 | 14 | ``` 15 | This repository houses FLUX, a text-to-image and image-to-image generation model developed by Black Forest Labs. Built on a latent rectified flow transformer architecture, FLUX is available in three variants: FLUX.1 [pro], [dev], and [schnell], with varying capabilities and licensing terms. The project is structured modularly, separating core model logic from demo applications, and utilizes PyTorch as its primary machine learning framework. While training scripts are not publicly visible, the repo provides utilities for model loading, image processing, and inference. The project leverages a diverse set of dependencies, including torchvision and Pillow for image handling, FastAPI for serving, and Streamlit and Gradio for interactive demos. It employs Poetry for dependency management and incorporates development tools like ruff for linting and pyright for type checking. The codebase hints at the use of design patterns such as Factory and potentially Singleton, though a comprehensive configuration management system is not apparent. The focus appears to be on providing a user-friendly interface for the pre-trained models, with features like NSFW content detection, image watermarking, and customizable sampling parameters. 16 | ``` 17 | 18 | https://github.com/feizc/FluxMusic 19 | 20 | ``` 21 | This repository implements a text-to-music generation model using a UNet architecture with diffusion-based techniques. The core of the model is the UNetModel class, which incorporates attention mechanisms, spatial transformers, and residual blocks. The model uses a combination of convolutional layers, self-attention, and cross-attention to process audio data conditioned on text input. 
The architecture includes timestep embeddings, allowing for the diffusion process, and supports various conditioning methods such as class labels and extra embedding vectors (e.g., from CLAP). The implementation leverages PyTorch and includes optimizations like gradient checkpointing for memory efficiency. The codebase also features custom dataset classes, samplers for handling audio data, and utilities for model training and inference. Key components include the QKVAttention mechanisms, ResBlock implementations, and the overall UNet structure with its input, middle, and output blocks. The model supports different configurations (small, base, large) and includes features like FiLM conditioning and the ability to predict discrete codebook IDs. The training process is set up for distributed training using PyTorch's DistributedDataParallel, with support for mixed-precision training and EMA model updates. 22 | ``` 23 | 24 | https://github.com/haoheliu/AudioLDM2 25 | 26 | ``` 27 | AudioLDM2 is a sophisticated text-to-audio generation system built on PyTorch, leveraging advanced deep learning techniques such as latent diffusion models, transformer architectures, and audio processing methods. The repository implements a modular architecture centered around the LatentDiffusion class, incorporating components like CLAP for audio embeddings, AudioMAE for feature extraction, and HiFi-GAN as a vocoder. It supports various audio generation tasks, including general sounds, music, and speech, with models available for different sample rates (16kHz and 48kHz). The system utilizes YAML configurations and command-line arguments for flexible setup, and offers a serving interface through a Gradio-based web application. Key functionalities include text-to-audio generation, super-resolution, and inpainting, with utilities for audio processing, model management, and visualization. While the focus appears to be on inference and serving pre-trained models, the architecture suggests potential for training capabilities. The project demonstrates a comprehensive approach to audio generation, combining state-of-the-art ML models with practical deployment considerations. 
28 | ``` -------------------------------------------------------------------------------- /2024.09.06.flux_music/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.09.06.flux_music/thumbnail.png -------------------------------------------------------------------------------- /2024.09.13.strawberry/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Strawberry 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/oQqOiwUhJkA 8 | 9 | **X:** https://twitter.com/i/broadcasts/1yoKMyYdLkYKQ 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Building Math Agents with Multi-Turn Iterative Preference Learning 24 | https://arxiv.org/pdf/2409.02392 25 | 26 | Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters 27 | https://arxiv.org/pdf/2408.03314 28 | 29 | STRATEGIST: LEARNING STRATEGIC SKILLS BY LLMS VIA BI-LEVEL TREE SEARCH 30 | https://www.arxiv.org/pdf/2408.10635 31 | 32 | MCTS 33 | https://vgarciasc.github.io/mcts-viz/ 34 | 35 | https://openai.com/index/learning-to-reason-with-llms/ 36 | 37 | https://x.com/DrJimFan/status/1834279865933332752 38 | 39 | https://x.com/polynoamial/status/1834280155730043108 40 | 41 | https://x.com/_jasonwei/status/1834278706522849788 42 | 43 | https://x.com/AiBreakfast/status/1834283499358146714 44 | 45 | https://x.com/karpathy/status/1821277264996352246 46 | 47 | https://www.yuzeh.com/data/agz-cost.html 48 | -------------------------------------------------------------------------------- /2024.09.13.strawberry/ideas.txt: -------------------------------------------------------------------------------- 1 | Building Math Agents with Multi-Turn Iterative Preference Learning 2 | https://arxiv.org/pdf/2409.02392 3 | 4 | Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters 5 | https://arxiv.org/pdf/2408.03314 6 | 7 | STRATEGIST: LEARNING STRATEGIC SKILLS BY LLMS VIA BI-LEVEL TREE SEARCH 8 | https://www.arxiv.org/pdf/2408.10635 9 | 10 | mini omni 11 | https://github.com/gpt-omni/mini-omni 12 | -------------------------------------------------------------------------------- /2024.09.13.strawberry/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.09.13.strawberry/thumbnail.png -------------------------------------------------------------------------------- /2024.09.20.gaussian_robots/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Gaussian Robots 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/_jqhy-dr7Q4 8 | 9 | **X:** https://twitter.com/i/broadcasts/1ZkKzRvoLZrKv 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos 24 | https://arxiv.org/pdf/2409.08353 25 | 26 | SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting 27 | https://arxiv.org/pdf/2409.10161 28 | 29 | AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius 30 | https://arxiv.org/pdf/2409.08669 31 | 32 | 
SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length 33 | https://arxiv.org/pdf/2409.07759 34 | 35 | Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video 36 | https://arxiv.org/pdf/2409.08189 37 | 38 | Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs 39 | https://arxiv.org/pdf/2409.07456 40 | 41 | Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering 42 | https://arxiv.org/pdf/2409.07441 43 | 44 | Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras 45 | https://arxiv.org/pdf/2409.04751 46 | 47 | Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos 48 | https://arxiv.org/pdf/2406.18717 49 | 50 | BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting 51 | https://arxiv.org/pdf/2409.10216 52 | 53 | gsplat 54 | https://github.com/nerfstudio-project/gsplat 55 | 56 | 3D Gaussian Splatting as Markov Chain Monte Carlo 57 | https://arxiv.org/abs/2404.09591 58 | 59 | View Dependent Podcast 60 | https://x.com/ViewDependent 61 | 62 | Langevin Dynamics 63 | https://henripal.github.io/blog/langevin 64 | -------------------------------------------------------------------------------- /2024.09.20.gaussian_robots/abstracts.txt: -------------------------------------------------------------------------------- 1 | Title: Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos 2 | Authors: Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu 3 | Published Date: 2024-09-12 18:33:13+00:00 4 | Last Updated: 2024-09-12 18:33:13+00:00 5 | DOI: N/A 6 | Primary Category: cs.GR 7 | arXiv URL: http://arxiv.org/abs/2409.08353v1 8 | PDF URL: http://arxiv.org/pdf/2409.08353v1 9 | Abstract: Volumetric video represents a transformative advancement in visual media, 10 | enabling users to freely navigate immersive virtual experiences and narrowing 11 | the gap between digital and real worlds. However, the need for extensive manual 12 | intervention to stabilize mesh sequences and the generation of excessively 13 | large assets in existing workflows impedes broader adoption. In this paper, we 14 | present a novel Gaussian-based approach, dubbed \textit{DualGS}, for real-time 15 | and high-fidelity playback of complex human performance with excellent 16 | compression ratios. Our key idea in DualGS is to separately represent motion 17 | and appearance using the corresponding skin and joint Gaussians. Such an 18 | explicit disentanglement can significantly reduce motion redundancy and enhance 19 | temporal coherence. We begin by initializing the DualGS and anchoring skin 20 | Gaussians to joint Gaussians at the first frame. Subsequently, we employ a 21 | coarse-to-fine training strategy for frame-by-frame human performance modeling. 22 | It includes a coarse alignment phase for overall motion prediction as well as a 23 | fine-grained optimization for robust tracking and high-fidelity rendering. To 24 | integrate volumetric video seamlessly into VR environments, we efficiently 25 | compress motion using entropy encoding and appearance using codec compression 26 | coupled with a persistent codebook. Our approach achieves a compression ratio 27 | of up to 120 times, only requiring approximately 350KB of storage per frame. 
We 28 | demonstrate the efficacy of our representation through photo-realistic, 29 | free-view experiences on VR headsets, enabling users to immersively watch 30 | musicians in performance and feel the rhythm of the notes at the performers' 31 | fingertips. 32 | 33 | -------------------------------------------------------------------------------- 34 | 35 | Title: SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting 36 | Authors: Mohammad Nomaan Qureshi, Sparsh Garg, Francisco Yandun, David Held, George Kantor, Abhishesh Silwal 37 | Published Date: 2024-09-16 10:52:16+00:00 38 | Last Updated: 2024-09-16 10:52:16+00:00 39 | DOI: N/A 40 | Primary Category: cs.RO 41 | arXiv URL: http://arxiv.org/abs/2409.10161v1 42 | PDF URL: http://arxiv.org/pdf/2409.10161v1 43 | Abstract: Sim2Real transfer, particularly for manipulation policies relying on RGB 44 | images, remains a critical challenge in robotics due to the significant domain 45 | shift between synthetic and real-world visual data. In this paper, we propose 46 | SplatSim, a novel framework that leverages Gaussian Splatting as the primary 47 | rendering primitive to reduce the Sim2Real gap for RGB-based manipulation 48 | policies. By replacing traditional mesh representations with Gaussian Splats in 49 | simulators, SplatSim produces highly photorealistic synthetic data while 50 | maintaining the scalability and cost-efficiency of simulation. We demonstrate 51 | the effectiveness of our framework by training manipulation policies within 52 | SplatSim}and deploying them in the real world in a zero-shot manner, achieving 53 | an average success rate of 86.25%, compared to 97.5% for policies trained on 54 | real-world data. 55 | 56 | -------------------------------------------------------------------------------- 57 | 58 | Title: AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius 59 | Authors: Xinzhe Wang, Ran Yi, Lizhuang Ma 60 | Published Date: 2024-09-13 09:32:38+00:00 61 | Last Updated: 2024-09-13 09:32:38+00:00 62 | DOI: 10.1145/3680528.3687675 63 | Primary Category: cs.CV 64 | arXiv URL: http://arxiv.org/abs/2409.08669v1 65 | PDF URL: http://arxiv.org/pdf/2409.08669v1 66 | Abstract: 3D Gaussian Splatting (3DGS) is a recent explicit 3D representation that has 67 | achieved high-quality reconstruction and real-time rendering of complex scenes. 68 | However, the rasterization pipeline still suffers from unnecessary overhead 69 | resulting from avoidable serial Gaussian culling, and uneven load due to the 70 | distinct number of Gaussian to be rendered across pixels, which hinders wider 71 | promotion and application of 3DGS. In order to accelerate Gaussian splatting, 72 | we propose AdR-Gaussian, which moves part of serial culling in Render stage 73 | into the earlier Preprocess stage to enable parallel culling, employing 74 | adaptive radius to narrow the rendering pixel range for each Gaussian, and 75 | introduces a load balancing method to minimize thread waiting time during the 76 | pixel-parallel rendering. Our contributions are threefold, achieving a 77 | rendering speed of 310% while maintaining equivalent or even better quality 78 | than the state-of-the-art. Firstly, we propose to early cull Gaussian-Tile 79 | pairs of low splatting opacity based on an adaptive radius in the 80 | Gaussian-parallel Preprocess stage, which reduces the number of affected tile 81 | through the Gaussian bounding circle, thus reducing unnecessary overhead and 82 | achieving faster rendering speed. 
Secondly, we further propose early culling 83 | based on axis-aligned bounding box for Gaussian splatting, which achieves a 84 | more significant reduction in ineffective expenses by accurately calculating 85 | the Gaussian size in the 2D directions. Thirdly, we propose a balancing 86 | algorithm for pixel thread load, which compresses the information of heavy-load 87 | pixels to reduce thread waiting time, and enhance information of light-load 88 | pixels to hedge against rendering quality loss. Experiments on three datasets 89 | demonstrate that our algorithm can significantly improve the Gaussian Splatting 90 | rendering speed. 91 | 92 | -------------------------------------------------------------------------------- 93 | 94 | Title: SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length 95 | Authors: Bangya Liu, Suman Banerjee 96 | Published Date: 2024-09-12 05:33:15+00:00 97 | Last Updated: 2024-09-12 05:33:15+00:00 98 | DOI: N/A 99 | Primary Category: cs.MM 100 | arXiv URL: http://arxiv.org/abs/2409.07759v1 101 | PDF URL: http://arxiv.org/pdf/2409.07759v1 102 | Abstract: Recent advances in 3D Gaussian Splatting (3DGS) have garnered significant 103 | attention in computer vision and computer graphics due to its high rendering 104 | speed and remarkable quality. While extant research has endeavored to extend 105 | the application of 3DGS from static to dynamic scenes, such efforts have been 106 | consistently impeded by excessive model sizes, constraints on video duration, 107 | and content deviation. These limitations significantly compromise the 108 | streamability of dynamic 3D Gaussian models, thereby restricting their utility 109 | in downstream applications, including volumetric video, autonomous vehicle, and 110 | immersive technologies such as virtual, augmented, and mixed reality. 111 | This paper introduces SwinGS, a novel framework for training, delivering, and 112 | rendering volumetric video in a real-time streaming fashion. To address the 113 | aforementioned challenges and enhance streamability, SwinGS integrates 114 | spacetime Gaussian with Markov Chain Monte Carlo (MCMC) to adapt the model to 115 | fit various 3D scenes across frames, in the meantime employing a sliding window 116 | captures Gaussian snapshots for each frame in an accumulative way. We implement 117 | a prototype of SwinGS and demonstrate its streamability across various datasets 118 | and scenes. Additionally, we develop an interactive WebGL viewer enabling 119 | real-time volumetric video playback on most devices with modern browsers, 120 | including smartphones and tablets. Experimental results show that SwinGS 121 | reduces transmission costs by 83.6% compared to previous work with ignorable 122 | compromise in PSNR. Moreover, SwinGS easily scales to long video sequences 123 | without compromising quality. 124 | 125 | -------------------------------------------------------------------------------- 126 | 127 | Title: Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video 128 | Authors: Boxiang Rong, Artur Grigorev, Wenbo Wang, Michael J. 
Black, Bernhard Thomaszewski, Christina Tsalicoglou, Otmar Hilliges 129 | Published Date: 2024-09-12 16:26:47+00:00 130 | Last Updated: 2024-09-12 16:26:47+00:00 131 | DOI: N/A 132 | Primary Category: cs.CV 133 | arXiv URL: http://arxiv.org/abs/2409.08189v1 134 | PDF URL: http://arxiv.org/pdf/2409.08189v1 135 | Abstract: We introduce Gaussian Garments, a novel approach for reconstructing realistic 136 | simulation-ready garment assets from multi-view videos. Our method represents 137 | garments with a combination of a 3D mesh and a Gaussian texture that encodes 138 | both the color and high-frequency surface details. This representation enables 139 | accurate registration of garment geometries to multi-view videos and helps 140 | disentangle albedo textures from lighting effects. Furthermore, we demonstrate 141 | how a pre-trained graph neural network (GNN) can be fine-tuned to replicate the 142 | real behavior of each garment. The reconstructed Gaussian Garments can be 143 | automatically combined into multi-garment outfits and animated with the 144 | fine-tuned GNN. 145 | 146 | -------------------------------------------------------------------------------- 147 | 148 | Title: Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs 149 | Authors: Sadra Safadoust, Fabio Tosi, Fatma Güney, Matteo Poggi 150 | Published Date: 2024-09-11 17:59:58+00:00 151 | Last Updated: 2024-09-11 17:59:58+00:00 152 | DOI: N/A 153 | Primary Category: cs.CV 154 | arXiv URL: http://arxiv.org/abs/2409.07456v1 155 | PDF URL: http://arxiv.org/pdf/2409.07456v1 156 | Abstract: 3D Gaussian Splatting (GS) significantly struggles to accurately represent 157 | the underlying 3D scene geometry, resulting in inaccuracies and floating 158 | artifacts when rendering depth maps. In this paper, we address this limitation, 159 | undertaking a comprehensive analysis of the integration of depth priors 160 | throughout the optimization process of Gaussian primitives, and present a novel 161 | strategy for this purpose. This latter dynamically exploits depth cues from a 162 | readily available stereo network, processing virtual stereo pairs rendered by 163 | the GS model itself during training and achieving consistent self-improvement 164 | of the scene representation. Experimental results on three popular datasets, 165 | breaking ground as the first to assess depth accuracy for these models, 166 | validate our findings. 167 | 168 | -------------------------------------------------------------------------------- 169 | 170 | Title: Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering 171 | Authors: Dafei Qin, Hongyang Lin, Qixuan Zhang, Kaichun Qiao, Longwen Zhang, Zijun Zhao, Jun Saito, Jingyi Yu, Lan Xu, Taku Komura 172 | Published Date: 2024-09-11 17:40:21+00:00 173 | Last Updated: 2024-09-11 17:40:21+00:00 174 | DOI: N/A 175 | Primary Category: cs.GR 176 | arXiv URL: http://arxiv.org/abs/2409.07441v1 177 | PDF URL: http://arxiv.org/pdf/2409.07441v1 178 | Abstract: We propose GauFace, a novel Gaussian Splatting representation, tailored for 179 | efficient animation and rendering of physically-based facial assets. Leveraging 180 | strong geometric priors and constrained optimization, GauFace ensures a neat 181 | and structured Gaussian representation, delivering high fidelity and real-time 182 | facial interaction of 30fps@1440p on a Snapdragon 8 Gen 2 mobile platform. 
183 | Then, we introduce TransGS, a diffusion transformer that instantly translates 184 | physically-based facial assets into the corresponding GauFace representations. 185 | Specifically, we adopt a patch-based pipeline to handle the vast number of 186 | Gaussians effectively. We also introduce a novel pixel-aligned sampling scheme 187 | with UV positional encoding to ensure the throughput and rendering quality of 188 | GauFace assets generated by our TransGS. Once trained, TransGS can instantly 189 | translate facial assets with lighting conditions to GauFace representation, 190 | With the rich conditioning modalities, it also enables editing and animation 191 | capabilities reminiscent of traditional CG pipelines. 192 | We conduct extensive evaluations and user studies, compared to traditional 193 | offline and online renderers, as well as recent neural rendering methods, which 194 | demonstrate the superior performance of our approach for facial asset 195 | rendering. We also showcase diverse immersive applications of facial assets 196 | using our TransGS approach and GauFace representation, across various platforms 197 | like PCs, phones and even VR headsets. 198 | 199 | -------------------------------------------------------------------------------- 200 | 201 | Title: Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras 202 | Authors: Zimu Liao, Siyan Chen, Rong Fu, Yi Wang, Zhongling Su, Hao Luo, Li Ma, Linning Xu, Bo Dai, Hengjie Li, Zhilin Pei, Xingcheng Zhang 203 | Published Date: 2024-09-07 07:53:40+00:00 204 | Last Updated: 2024-09-11 07:04:58+00:00 205 | DOI: N/A 206 | Primary Category: cs.CV 207 | arXiv URL: http://arxiv.org/abs/2409.04751v2 208 | PDF URL: http://arxiv.org/pdf/2409.04751v2 209 | Abstract: Recently, 3D Gaussian Splatting (3DGS) has garnered attention for its high 210 | fidelity and real-time rendering. However, adapting 3DGS to different camera 211 | models, particularly fisheye lenses, poses challenges due to the unique 3D to 212 | 2D projection calculation. Additionally, there are inefficiencies in the 213 | tile-based splatting, especially for the extreme curvature and wide field of 214 | view of fisheye lenses, which are crucial for its broader real-life 215 | applications. To tackle these challenges, we introduce Fisheye-GS.This 216 | innovative method recalculates the projection transformation and its gradients 217 | for fisheye cameras. Our approach can be seamlessly integrated as a module into 218 | other efficient 3D rendering methods, emphasizing its extensibility, 219 | lightweight nature, and modular design. Since we only modified the projection 220 | component, it can also be easily adapted for use with different camera models. 221 | Compared to methods that train after undistortion, our approach demonstrates a 222 | clear improvement in visual quality. 
223 | 224 | -------------------------------------------------------------------------------- 225 | 226 | Title: Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos 227 | Authors: Colton Stearns, Adam Harley, Mikaela Uy, Florian Dubost, Federico Tombari, Gordon Wetzstein, Leonidas Guibas 228 | Published Date: 2024-06-26 19:37:07+00:00 229 | Last Updated: 2024-09-10 21:30:31+00:00 230 | DOI: 10.1145/3680528.3687681 231 | Primary Category: cs.CV 232 | arXiv URL: http://arxiv.org/abs/2406.18717v2 233 | PDF URL: http://arxiv.org/pdf/2406.18717v2 234 | Abstract: Gaussian splatting has become a popular representation for novel-view 235 | synthesis, exhibiting clear strengths in efficiency, photometric quality, and 236 | compositional edibility. Following its success, many works have extended 237 | Gaussians to 4D, showing that dynamic Gaussians maintain these benefits while 238 | also tracking scene geometry far better than alternative representations. Yet, 239 | these methods assume dense multi-view videos as supervision. In this work, we 240 | are interested in extending the capability of Gaussian scene representations to 241 | casually captured monocular videos. We show that existing 4D Gaussian methods 242 | dramatically fail in this setup because the monocular setting is 243 | underconstrained. Building off this finding, we propose a method we call 244 | Dynamic Gaussian Marbles, which consist of three core modifications that target 245 | the difficulties of the monocular setting. First, we use isotropic Gaussian 246 | "marbles'', reducing the degrees of freedom of each Gaussian. Second, we employ 247 | a hierarchical divide and-conquer learning strategy to efficiently guide the 248 | optimization towards solutions with globally coherent motion. Finally, we add 249 | image-level and geometry-level priors into the optimization, including a 250 | tracking loss that takes advantage of recent progress in point tracking. By 251 | constraining the optimization, Dynamic Gaussian Marbles learns Gaussian 252 | trajectories that enable novel-view rendering and accurately capture the 3D 253 | motion of the scene elements. We evaluate on the Nvidia Dynamic Scenes dataset 254 | and the DyCheck iPhone dataset, and show that Gaussian Marbles significantly 255 | outperforms other Gaussian baselines in quality, and is on-par with 256 | non-Gaussian representations, all while maintaining the efficiency, 257 | compositionality, editability, and tracking benefits of Gaussians. Our project 258 | page can be found here 259 | https://geometry.stanford.edu/projects/dynamic-gaussian-marbles.github.io/. 260 | 261 | -------------------------------------------------------------------------------- 262 | 263 | Title: BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting 264 | Authors: Wugang Meng, Tianfu Wu, Huan Yin, Fumin Zhang 265 | Published Date: 2024-09-16 12:07:02+00:00 266 | Last Updated: 2024-09-16 12:07:02+00:00 267 | DOI: N/A 268 | Primary Category: cs.RO 269 | arXiv URL: http://arxiv.org/abs/2409.10216v1 270 | PDF URL: http://arxiv.org/pdf/2409.10216v1 271 | Abstract: Image-goal navigation enables a robot to reach the location where a target 272 | image was captured, using visual cues for guidance. However, current methods 273 | either rely heavily on data and computationally expensive learning-based 274 | approaches or lack efficiency in complex environments due to insufficient 275 | exploration strategies. 
To address these limitations, we propose Bayesian 276 | Embodied Image-goal Navigation Using Gaussian Splatting, a novel method that 277 | formulates ImageNav as an optimal control problem within a model predictive 278 | control framework. BEINGS leverages 3D Gaussian Splatting as a scene prior to 279 | predict future observations, enabling efficient, real-time navigation decisions 280 | grounded in the robot's sensory experiences. By integrating Bayesian updates, 281 | our method dynamically refines the robot's strategy without requiring extensive 282 | prior experience or data. Our algorithm is validated through extensive 283 | simulations and physical experiments, showcasing its potential for embodied 284 | robot systems in visually complex scenarios. 285 | 286 | -------------------------------------------------------------------------------- 287 | 288 | -------------------------------------------------------------------------------- /2024.09.20.gaussian_robots/o1.txt: -------------------------------------------------------------------------------- 1 | The key directions and important points in recent Gaussian Splatting (GS) research are: 2 | 3 | Efficiency and Real-Time Rendering: 4 | 5 | Acceleration Techniques: Development of methods to speed up GS rendering pipelines, achieving significant increases in rendering speed (e.g., 310% faster) without compromising quality. 6 | Compression and Streaming: Innovations like DualGS and SwinGS enable high compression ratios (up to 120x) for GS representations, facilitating real-time volumetric video playback on devices like VR headsets and mobile platforms. 7 | Dynamic Scene Representation: 8 | 9 | Handling Dynamic Content: Extending GS to represent dynamic scenes and volumetric videos, addressing challenges like large model sizes and ensuring temporal coherence. 10 | Monocular Video Adaptation: Techniques are being developed to adapt GS to casual monocular videos, making high-quality 3D scene capture more accessible without multi-view setups. 11 | Integration with Machine Learning: 12 | 13 | Neural Rendering Enhancements: Combining GS with neural networks (e.g., graph neural networks, diffusion transformers) to improve rendering fidelity and automate asset creation like clothing and facial animations. 14 | Sim-to-Real Transfer in Robotics: Utilizing GS in simulators to generate photorealistic data for training robotic manipulation policies, enabling zero-shot transfer from simulation to real-world applications. 15 | Applications in Robotics and Navigation: 16 | 17 | Embodied Navigation: Employing GS as a scene prior in robotic navigation tasks, allowing robots to predict future observations and refine strategies in real-time without extensive prior data. 18 | Improved Geometric Representation: Incorporating depth cues and stereo networks into GS to enhance geometric accuracy and depth representation. 19 | Adaptation to Various Camera Models: 20 | 21 | Support for Specialized Cameras: Adapting GS to work with different camera models like fisheye lenses, expanding its applicability to a wider range of imaging devices. 22 | Enhancing Visual Quality: 23 | 24 | High-Fidelity Rendering: Focusing on improving photorealism by addressing challenges in lighting, texture representation, and accurate motion capture for immersive experiences. 
25 | Predicted Future Directions in GS Research: 26 | 27 | Wider Adoption in Real-Time Applications: GS is expected to become integral in real-time rendering for gaming, VR/AR, and live virtual events due to its efficiency and high-quality outputs. 28 | 29 | Improved Dynamic Scene Handling: Research will enhance GS's capability to represent and render dynamic scenes from minimal input data, making it more practical for everyday use. 30 | 31 | Advanced AI Integration: Deeper integration with AI models will improve rendering quality, enable intelligent scene manipulation, and automate asset creation processes. 32 | 33 | Expansion in Robotics and Autonomous Systems: GS will be increasingly important in robotics for tasks requiring visual navigation and interaction with complex environments. 34 | 35 | Standardization and Tool Development: Creation of standardized tools and frameworks will facilitate easier integration of GS into various applications, promoting widespread adoption. 36 | 37 | Support for Diverse Sensors and Cameras: GS will be extended to incorporate data from various sensors and support more camera models, enhancing its versatility. 38 | 39 | Cloud-Based Rendering and Streaming: Enhanced compression and streaming capabilities will enable cloud-based GS rendering services, making high-quality 3D content accessible on devices with limited computational power. 40 | 41 | Improved Geometric Accuracy: Ongoing research will address current limitations in depth accuracy and geometric representation, potentially combining GS with other 3D reconstruction methods. 42 | 43 | Interactive Content Creation: GS will empower users to create and customize 3D content easily, fostering interactive applications where users can modify environments and objects in real-time. 44 | 45 | Summary: 46 | 47 | Gaussian Splatting research is rapidly advancing towards establishing GS as a foundational technology in computer graphics and vision. The focus is on: 48 | 49 | Efficiency: Achieving real-time rendering speeds and reducing storage requirements. 50 | Dynamic Scene Handling: Accurately representing and rendering moving scenes and objects. 51 | AI Integration: Leveraging machine learning to enhance quality and functionality. 52 | Broad Applicability: Extending GS to various applications like robotics, navigation, and interactive content creation. 53 | Versatility: Adapting GS to work with different sensors and camera models. 54 | These developments aim to make GS indispensable for rendering, simulation, robotics, and immersive digital experiences in the near future. 
-------------------------------------------------------------------------------- /2024.09.20.gaussian_robots/scrape_abstracts.py: -------------------------------------------------------------------------------- 1 | # pip install arxiv 2 | import arxiv 3 | 4 | # List of arXiv paper IDs 5 | paper_ids = [ 6 | "2409.08353", 7 | "2409.10161", 8 | "2409.08669", 9 | "2409.07759", 10 | "2409.08189", 11 | "2409.07456", 12 | "2409.07441", 13 | "2409.04751", 14 | "2406.18717", 15 | "2409.10216" 16 | ] 17 | 18 | # Output file 19 | output_file = "abstracts.txt" 20 | 21 | # Open the output file in write mode 22 | with open(output_file, "w") as f: 23 | # Iterate over each paper ID 24 | for paper_id in paper_ids: 25 | # Search for the paper on arXiv 26 | search = arxiv.Search(id_list=[paper_id]) 27 | 28 | # Fetch the result (this will only be one result for each paper_id) 29 | result = next(search.results()) 30 | 31 | # Write the paper details to the file 32 | f.write(f"Title: {result.title}\n") 33 | f.write(f"Authors: {', '.join([author.name for author in result.authors])}\n") 34 | f.write(f"Published Date: {result.published}\n") 35 | f.write(f"Last Updated: {result.updated}\n") 36 | f.write(f"DOI: {result.doi if result.doi else 'N/A'}\n") 37 | f.write(f"Primary Category: {result.primary_category}\n") 38 | f.write(f"arXiv URL: {result.entry_id}\n") 39 | f.write(f"PDF URL: {result.pdf_url}\n") 40 | f.write(f"Abstract: {result.summary}\n") 41 | f.write("\n" + "-"*80 + "\n\n") 42 | 43 | print(f"Abstracts and additional information have been written to {output_file}") 44 | -------------------------------------------------------------------------------- /2024.09.20.gaussian_robots/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hu-po/docs/bf7b0a6952809b567d110db46860dd5542cc678f/2024.09.20.gaussian_robots/thumbnail.png -------------------------------------------------------------------------------- /2024.09.27.voice_mode/README.md: -------------------------------------------------------------------------------- 1 |  2 | 3 | # Voice Mode 4 | 5 | ### Links 6 | 7 | **YouTube:** https://youtube.com/live/x3fX7v7xjUA 8 | 9 | **X:** https://twitter.com/i/broadcasts/1ynJODzrQqVxR 10 | 11 | **Twitch:** 12 | 13 | **Substack:** 14 | 15 | **ResearchHub:** 16 | 17 | **TikTok:** 18 | 19 | **Reddit:** 20 | 21 | ### References 22 | 23 | Moshi: a speech-text foundation model for real-time dialogue 24 | https://kyutai.org/Moshi.pdf 25 | 26 | LLaMA-Omni: Seamless Speech Interaction with Large Language Models 27 | https://web3.arxiv.org/abs/2409.06666 28 | 29 | torch.cuda.Event 30 | https://pytorch.org/docs/stable/generated/torch.cuda.Event.html 31 | https://github.com/kyutai-labs/moshi/blob/main/scripts/moshi_benchmark.py 32 | 33 | moshi chat 34 | https://moshi.chat/ 35 | 36 | voice mode system prompt 37 | https://x.com/elder_plinius/status/1838982899120431330 38 | 39 | whisper large 40 | https://huggingface.co/openai/whisper-large-v3 41 | 42 | WavLM 43 | https://huggingface.co/docs/transformers/en/model_doc/wavlm 44 | 45 | RQ-Transformer 46 | https://github.com/lucidrains/RQ-Transformer 47 | -------------------------------------------------------------------------------- /2024.09.27.voice_mode/latency_viz.html: -------------------------------------------------------------------------------- 1 | 2 | 3 |
4 |