A comprehensive benchmarking suite for ML accelerators across the entire spectrum of computing devices - from powerful edge AI accelerators to resource-constrained microcontrollers.
- Overview
- Features
- Supported Hardware
- Installation
- Usage
- Benchmark Metrics
- Example Results
- Future Development
- Contributing
- License
This benchmark suite provides standardised performance metrics for matrix operations commonly used in ML workloads. It automatically detects hardware capabilities and maximises matrix sizes based on available memory.
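For illustration, a minimal sketch of how this memory-aware sizing can work under MicroPython (the heap probe uses MicroPython's real `gc.mem_free()`; the buffer count, budget fraction, and element size are illustrative assumptions, not the suite's exact logic):

```python
import gc

def pick_matrix_size(channels=3, bytes_per_element=4, budget_fraction=0.6):
    # Illustrative heuristic: reserve a fraction of the currently free heap
    # for three square buffers (input, weights, output) of N x N x channels.
    gc.collect()
    budget = int(gc.mem_free() * budget_fraction)
    per_buffer = budget // 3
    n = int((per_buffer // (channels * bytes_per_element)) ** 0.5)
    return max(n, 1)
```

The example results further down show the size the real suite selected on an RP2350.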
- Automatic hardware detection and configuration
- Memory-aware matrix size optimisation
- Multi-core awareness
- Temperature and power monitoring (where available)
- Standardised performance metrics
- JSON output format for easy parsing and comparison
Currently tested and supported platforms:
- RP2040 (Raspberry Pi Pico) - MicroPython + ulab (Pimoroni)
- RP2350 (Raspberry Pi Pico 2 W) - MicroPython + ulab (Pimoroni), CircuitPython
- ESP32-P4 - CircuitPython
- Clone the repository:
git clone https://github.com/platima/ml-accelerator-benchmark.git
- Choose the appropriate version for your platform:
  - micropython-benchmark.py for MicroPython devices
  - circuitpython-benchmark.py for CircuitPython devices
- Upload the appropriate benchmark file to your device
- Run the benchmark:
from benchmark import UniversalBenchmark
benchmark = UniversalBenchmark()
results = benchmark.run()
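To keep a copy of the output for later comparison, the results can be written to the board's filesystem as JSON. A small sketch, assuming the benchmark file was uploaded as `benchmark.py` and that `run()` returns the results as a dict (if your port lacks the `json` module, MicroPython's `ujson` is a drop-in substitute):

```python
import json
from benchmark import UniversalBenchmark

results = UniversalBenchmark().run()

# Persist the results so they can be copied off the board and compared
# against runs on other devices.
with open("results.json", "w") as f:
    f.write(json.dumps(results))
```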
The benchmark provides several key metrics:
- Matrix Operations: Maximum supported matrix size and performance
- Memory Usage: Total available and used memory
- Performance Metrics:
- Raw inference time
- Operations per second
- Normalised score (ops/second/MHz)
- Theoretical power (accounting for cores and frequency; see the worked example after this list)
- Hardware Monitoring:
- Temperature (where available)
- Power usage (where available)
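Judging from the example output below, the two derived figures appear to follow directly from the raw numbers; this is an inference from the sample data, not a quote of the implementation:

```python
# Values taken from the example results below.
ops_per_second = 4166143
cpu_freq_mhz = 150
num_cores = 2

# Normalised score: throughput per MHz of CPU clock.
normalized_score = ops_per_second / cpu_freq_mhz   # ~27774.29, matches the example
# Theoretical power: throughput scaled by core count.
theoretical_power = ops_per_second * num_cores     # 8332286, matches the example up to rounding
```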
{
"_meta": {
"Source version": "0.2",
"Source code": "https://github.com/platima/ml-accelerator-benchmark/blob/main/micropython-benchmark.py",
"Source repo": "https://github.com/platima/ml-accelerator-benchmark",
"Test date": "2025-01-19",
"Tester": "Platima"
},
"device": {
"board_type": "rp2350",
"cpu_freq_mhz": 150.000,
"num_cores": 2,
"temp_sensor": true,
"power_sensor": true
},
"performance": {
"channels": 3,
"array_size": 134,
"memory_total": 480160,
"memory_used": 288096,
"min_inference_ms": 1759.000,
"max_inference_ms": 1770.000,
"avg_inference_ms": 1760.100,
"throughput_fps": 0.568
},
"benchmark": {
"total_ops": 7332828,
"ops_per_second": 4166143,
"normalized_score": 27774.290,
"theoretical_power": 8332287
}
}
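On the host, the JSON output is easy to aggregate. A hypothetical comparison script (the field names follow the example above; the one-results-file-per-device layout is an assumption):

```python
import json
import sys

# Print a one-line summary per results file passed on the command line.
for path in sys.argv[1:]:
    with open(path) as f:
        data = json.load(f)
    board = data["device"]["board_type"]
    score = data["benchmark"]["normalized_score"]
    fps = data["performance"]["throughput_fps"]
    print(f"{board}: normalised score {score:.1f}, {fps:.3f} fps")
```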
A planned future version of this benchmark suite can be found in the future-development branch. Note that this branch contains work-in-progress features that are currently untested.
Contributions are welcome! Key areas for improvement include:
- Additional hardware support
- Improved detection methods
- New benchmark metrics
- Documentation improvements
- Bug fixes and optimisations
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.