├── .gitignore
├── README.md
├── dual_scaled_adam.py
├── dual_scaled_psgd_kron.py
├── dual_scaled_psgd_lra.py
├── images
    ├── adam_by_momentum_decay.png
    ├── adaptive_muon_by_momentum_decay.png
    ├── adaptive_muon_by_momentum_decay_bfloat16.png
    ├── adaptive_muon_by_momentum_decay_optimized_coeffs_opt.png
    ├── adaptive_muon_by_momentum_decay_optimized_coeffs_opt_bfloat16.png
    ├── muon_by_momentum_decay.png
    ├── muon_by_momentum_decay_optimized_coeffs.png
    ├── muon_variants.png
    ├── optimizer_variants.png
    ├── optimizer_variants_beta=0.png
    ├── optimizer_variants_beta=0_5.png
    ├── optimizer_variants_beta=opt.png
    ├── optimizer_variants_bfloat16.png
    ├── optimizer_variants_bfloat16_beta=0.png
    ├── optimizer_variants_bfloat16_beta=0_5.png
    └── optimizer_variants_bfloat16_beta=opt.png
└── simple_benchmark.ipynb


/.gitignore:
--------------------------------------------------------------------------------
__pycache__/*

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/README.md
--------------------------------------------------------------------------------
/dual_scaled_adam.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/dual_scaled_adam.py
--------------------------------------------------------------------------------
/dual_scaled_psgd_kron.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/dual_scaled_psgd_kron.py
--------------------------------------------------------------------------------
/dual_scaled_psgd_lra.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/dual_scaled_psgd_lra.py
--------------------------------------------------------------------------------
/images/adam_by_momentum_decay.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/adam_by_momentum_decay.png
--------------------------------------------------------------------------------
/images/adaptive_muon_by_momentum_decay.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/adaptive_muon_by_momentum_decay.png
--------------------------------------------------------------------------------
/images/adaptive_muon_by_momentum_decay_bfloat16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/adaptive_muon_by_momentum_decay_bfloat16.png
--------------------------------------------------------------------------------
/images/adaptive_muon_by_momentum_decay_optimized_coeffs_opt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/adaptive_muon_by_momentum_decay_optimized_coeffs_opt.png
--------------------------------------------------------------------------------
/images/adaptive_muon_by_momentum_decay_optimized_coeffs_opt_bfloat16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/adaptive_muon_by_momentum_decay_optimized_coeffs_opt_bfloat16.png
--------------------------------------------------------------------------------
/images/muon_by_momentum_decay.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/muon_by_momentum_decay.png
--------------------------------------------------------------------------------
/images/muon_by_momentum_decay_optimized_coeffs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/muon_by_momentum_decay_optimized_coeffs.png
--------------------------------------------------------------------------------
/images/muon_variants.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/muon_variants.png
--------------------------------------------------------------------------------
/images/optimizer_variants.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/optimizer_variants.png
--------------------------------------------------------------------------------
/images/optimizer_variants_beta=0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/optimizer_variants_beta=0.png
--------------------------------------------------------------------------------
/images/optimizer_variants_beta=0_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/optimizer_variants_beta=0_5.png
--------------------------------------------------------------------------------
/images/optimizer_variants_beta=opt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/optimizer_variants_beta=opt.png
--------------------------------------------------------------------------------
/images/optimizer_variants_bfloat16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/optimizer_variants_bfloat16.png
--------------------------------------------------------------------------------
/images/optimizer_variants_bfloat16_beta=0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/optimizer_variants_bfloat16_beta=0.png
--------------------------------------------------------------------------------
/images/optimizer_variants_bfloat16_beta=0_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/optimizer_variants_bfloat16_beta=0_5.png
--------------------------------------------------------------------------------
/images/optimizer_variants_bfloat16_beta=opt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/images/optimizer_variants_bfloat16_beta=opt.png
--------------------------------------------------------------------------------
/simple_benchmark.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/leloykun/adaptive-muon/HEAD/simple_benchmark.ipynb
--------------------------------------------------------------------------------