├── .gitignore ├── README.md ├── Results.MD ├── anemll_bench.egg-info ├── PKG-INFO ├── SOURCES.txt ├── dependency_links.txt ├── requires.txt └── top_level.txt ├── anemll_bench ├── __init__.py ├── __main__.py ├── benchmark.py ├── models │ ├── __init__.py │ ├── benchmark_result.py │ ├── coreml_adapter.py │ ├── meta.yalm │ ├── model_loader.py │ └── model_syncer.py ├── reports │ ├── __init__.py │ ├── report_generator.py │ └── report_uploader.py └── utils │ ├── __init__.py │ ├── system_info.py │ └── visualization.py ├── assets └── sample.png ├── create_python39_env.sh ├── examples ├── DUAL_MODEL_BENCHMARKING.md ├── basic_benchmark.py ├── batch_profile.py ├── benchmark_all_models.py ├── benchmark_config.json ├── benchmark_dual_models.py ├── benchmark_local_lm_head.py ├── check_online_models.py ├── generate_results_report.py ├── load_platform_models.py ├── manage_cache.py ├── plot_chip_comparison.py ├── profile_coreml.py ├── profile_local_model.py ├── sync_models.py ├── test_lm_head_benchmark.py └── test_model_loading.py ├── install_dependencies.sh ├── reports ├── chip_comparison_direct.png └── chip_comparison_llama_lm_head.png ├── requirements.txt ├── setup.py ├── test_browser_open.py └── tests ├── test_benchmark.py ├── test_report_uploader.py └── test_system_info.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Virtual Environment 2 | env-aneml-bench/* 3 | venv/ 4 | ENV/ 5 | .env 6 | 7 | # Python 8 | __pycache__/ 9 | *.py[cod] 10 | *.class 11 | *.so 12 | .Python 13 | build/ 14 | develop-eggs/ 15 | dist/ 16 | 17 | 18 | profile_report.html 19 | anemll_bench/results/ 20 | reports/*.html 21 | examples/*.html 22 | benchmark_*.html 23 | env-anemll-bench/* 24 | reports/plots_20250306_125822/throughput.png 25 | =8.2 26 | examples/upload_deephermes_model.py 27 | examples/update_dual_benchmark.py 28 | -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | # ANEMLL-Bench 2 | 3 | ## ⚠️ Attention: macOS 15.x is required! ⚠️ 4 | 5 | This alpha release requires macOS 15. We plan to add support for older macOS versions in a future update. 6 | 7 | ## 🆕 New: Enhanced Testing for Ultra Models 8 | 9 | If you're running on a high-performance Apple chip (an Ultra or Pro variant), we strongly recommend running the dual-model benchmark to properly evaluate the enhanced ANE capabilities: 10 | 11 | ```bash 12 | 13 | # Update meta.yalm file and download new models 14 | python examples/sync_models.py --update 15 | 16 | # For Ultra models (M1/M2/M3 Ultra) and M4 Pro/Max models 17 | python examples/benchmark_dual_models.py --runs 300 18 | ``` 19 | 20 | ### Automatic Testing Recommendation 21 | 22 | The tool automatically detects your CPU model and provides testing recommendations: 23 | 24 | - If running on **M1/M2/M3 Ultra**: dual-model testing is essential to evaluate the dual ANE clusters 25 | - If running on **M4 Pro/Max**: dual-model testing is recommended to evaluate enhanced ANE performance 26 | - For other models: standard benchmarking is sufficient, but dual-model testing provides additional insights 27 | 28 | When you run `benchmark_all_models.py`, you'll see a recommendation to run the dual test if your system would benefit from it. 29 | 30 | ## Overview 31 | ANEMLL-Bench (pronounced like "animal-bench") is a benchmarking tool specifically designed to measure and evaluate the performance of machine learning models on Apple's Neural Engine (ANE). It provides comprehensive metrics, including inference time and memory bandwidth utilization (GB/s), to help researchers and developers optimize their models for Apple Silicon. 32 | 33 | This alpha release requires macOS 15. We plan to add support for older macOS versions in a future update. Currently, only memory bandwidth (GB/s) is benchmarked in this release.
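The dual-model benchmark above exercises both ANE clusters by running two models at the same time rather than back to back. A minimal sketch of the idea follows, using thread-based concurrency with `time.sleep()` stand-ins in place of real CoreML `predict()` calls; the function names are illustrative and this is not ANEMLL-Bench's actual implementation:

```python
# Illustrative sketch of dual-model benchmarking: compare wall-clock time for
# two models run back-to-back vs. concurrently. The sleep() calls stand in for
# CoreML predict() calls (hypothetical stand-ins, not ANEMLL-Bench's API).
import time
from concurrent.futures import ThreadPoolExecutor

def fake_predict(duration_s: float) -> None:
    time.sleep(duration_s)  # stand-in for one model inference

def bench_sequential(runs: int = 3) -> float:
    start = time.perf_counter()
    for _ in range(runs):
        fake_predict(0.01)  # model A
        fake_predict(0.01)  # model B
    return time.perf_counter() - start

def bench_concurrent(runs: int = 3) -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(runs):
            a = pool.submit(fake_predict, 0.01)  # model A in parallel
            b = pool.submit(fake_predict, 0.01)  # model B in parallel
            a.result()
            b.result()
    return time.perf_counter() - start

print(f"sequential: {bench_sequential():.3f}s, concurrent: {bench_concurrent():.3f}s")
```

On hardware with two ANE clusters, the concurrent time approaching half the sequential time would indicate the clusters are being used in parallel; if the times are close, the two models are contending for a single cluster.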
34 | 35 | ANEMLL-Bench is part of the ANEMLL Open Source Project: [anemll.com](https://anemll.com) 36 | 37 | ## 📊 [View Benchmark Results](./Results.MD) 📊 38 | 39 | 40 | 41 | Check out our latest [benchmark results](./Results.MD) comparing performance across different Apple Silicon chips (M1, M2, M3, and M4 series). 42 | 43 |
| M1 Series | M2 Series | M3 Series | M4 Series |
|---|---|---|---|
| ✓ M1 ✅<br>✓ M1 PRO ✅<br>✓ M1 MAX ✅<br>✓ M1 ULTRA ✅ | ✓ M2 ✅<br>✓ M2 PRO<br>✓ M2 MAX ✅<br>✓ M2 ULTRA ✅ | ✓ M3<br>✓ M3 PRO<br>✓ M3 MAX ✅<br>✓ M3 ULTRA | ✓ M4 ✅<br>✓ M4 PRO ✅<br>✓ M4 MAX ✅ |
📧 Submit results to: realanemll@gmail.com or open an issue
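The headline metric in these results is memory bandwidth in GB/s. A minimal sketch of how such a figure can be derived, assuming the dominant traffic is reading the model weights once per inference (the function name is hypothetical, not ANEMLL-Bench's API):

```python
def throughput_gb_s(model_size_bytes: int, inference_time_ms: float) -> float:
    """Estimate memory bandwidth: bytes moved per inference divided by time.

    Assumes each inference reads the full set of model weights once, so the
    model's on-disk size approximates the bytes transferred per run.
    """
    seconds = inference_time_ms / 1000.0
    return model_size_bytes / seconds / 1e9

# Example: a 1.6 GB model at 20 ms per inference sustains 80 GB/s
print(throughput_gb_s(1_600_000_000, 20.0))  # → 80.0
```

Under this assumption, a larger model at the same latency reports higher bandwidth, which is why the metric is a property of the chip's memory subsystem rather than of the model itself.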
81 |Generated on: {summary["timestamp"]}
313 |Mac Model: {system_info.get('mac_model', 'Unknown')}
320 |CPU: {system_info.get('cpu', {}).get('brand', 'Unknown')}
321 |CPU Cores: {system_info.get('cpu', {}).get('cores', 'Unknown')}
322 |RAM: {system_info.get('ram', {}).get('total_gb', 'Unknown')} GB
323 |Apple Silicon: {'Yes' if system_info.get('apple_silicon', False) else 'No'}
324 |OS: {system_info.get('os', {}).get('name', 'Unknown')} {system_info.get('os', {}).get('release', '')}
328 |OS Version: {system_info.get('os', {}).get('version', 'Unknown')}
329 |Python Version: {system_info.get('python_version', 'Unknown')}
330 |Compute Units: {summary["compute_units"]}
331 |Batch Size: {summary["batch_size"]}
332 |Sequence Length: {summary["sequence_length"]}
333 |Hidden Size: {summary["hidden_size"]}
334 |Iterations: {summary["iterations"]}
335 |# | 343 |Model | 344 |Size (MB) | 345 |Inference Time (ms) | 346 |Throughput (GB/s) | 347 |TFLOPS | 348 |CPU Speedup | 349 |Report | 350 |
---|---|---|---|---|---|---|---|
{i+1} | 362 |{model["name"]} | 363 |Error: {model["error"]} | 364 ||||||
{i+1} | 379 |{model["name"]} | 380 |{model["size_mb"]:.2f} | 381 |{model["inference_time_ms"]:.2f} | 382 |{model["throughput_gbps"]:.2f} | 383 |{tflops_html} | 384 |{speedup} | 385 |{report_link} | 386 |
This summary report compares the performance of multiple CoreML models on the Apple Neural Engine.
397 | 398 |Mac Model: | 258 |{system_info.get('mac_model', 'Unknown')} | 259 |
OS: | 262 |{os_display} | 263 |
CPU: | 266 |{system_info.get('cpu_model', 'Unknown')} | 267 |
RAM: | 270 |{system_info.get('memory_gb', 'Unknown')} GB | 271 |
Apple Silicon: | 274 |{'Yes' if system_info.get('is_apple_silicon', False) else 'No'} | 275 |
Name: | 284 |{model_name} | 285 |
Size: | 288 |{model_size_info['size_mb']:.2f} MB ({model_size_info['size_bytes']:,} bytes) | 289 |
Weights Size: | 292 |{model_size_info['weights_mb']:.2f} MB ({model_size_info['weights_bytes']:,} bytes) | 293 |
Weights %: | 296 |{model_size_info['weights_percentage']:.1f}% of total size | 297 |
Input Shape: | 300 |{input_shape} | 301 |
Inference Time: | 310 |{inference_time_ms:.2f} ms | 311 |
Throughput: | 314 |{throughput_gb_s:.2f} GB/s | 315 |