WhisperKit Android brings Foundation Models On Device for Automatic Speech Recognition. It extends the performance and feature set of WhisperKit from Apple platforms to Android and Linux. The current feature set is a subset of the iOS counterpart, but we are continuing to invest in Android and now welcome contributions from the community.
[Example App (Coming Soon)] [Blog Post] [Python Tools Repo]
The WhisperKit API is currently experimental and requires explicit opt-in via `@OptIn(ExperimentalWhisperKit::class)`. This indicates that the API may change in future releases. Use with caution in production code.
To use WhisperKit in your Android app, you need to:

- Add the following dependencies to your app's `build.gradle.kts`:

  ```kotlin
  dependencies {
      // 1. WhisperKit SDK
      implementation("com.argmaxinc:whisperkit:0.3.0") // Check badge above for latest version
      // 2. QNN dependencies for hardware acceleration
      implementation("com.qualcomm.qnn:qnn-runtime:2.33.2")
      implementation("com.qualcomm.qnn:qnn-litert-delegate:2.33.2")
  }
  ```
- Configure JNI library packaging in your app's `build.gradle.kts`:

  ```kotlin
  android {
      // ...
      packaging {
          jniLibs {
              useLegacyPackaging = true
          }
      }
  }
  ```
- Use WhisperKit in your code:

  ```kotlin
  @OptIn(ExperimentalWhisperKit::class)
  class YourActivity : AppCompatActivity() {
      private lateinit var whisperKit: WhisperKit

      override fun onCreate(savedInstanceState: Bundle?) {
          super.onCreate(savedInstanceState)
          // Initialize WhisperKit
          // Note: Always use applicationContext to avoid memory leaks and ensure proper lifecycle management
          whisperKit = WhisperKit.Builder()
              .setModel(WhisperKit.OPENAI_TINY_EN)
              .setApplicationContext(applicationContext)
              .setCallback { what, timestamp, msg ->
                  // Handle transcription output
                  when (what) {
                      WhisperKit.TextOutputCallback.MSG_INIT -> {
                          // Model initialized successfully
                      }
                      WhisperKit.TextOutputCallback.MSG_TEXT_OUT -> {
                          // New transcription available
                          val text = msg
                          val time = timestamp
                          // Process the transcribed text as it becomes available
                          // This callback will be called multiple times as more audio is processed
                      }
                      WhisperKit.TextOutputCallback.MSG_CLOSE -> {
                          // Cleanup complete
                      }
                  }
              }
              .build()

          // Load the model
          lifecycleScope.launch {
              whisperKit.loadModel().collect { progress ->
                  // Handle download progress
              }

              // Initialize with audio parameters
              whisperKit.init(frequency = 16000, channels = 1, duration = 0)

              // Transcribe audio data in chunks
              // You can call transcribe() multiple times with different chunks of audio data
              // Results will be delivered through the callback as they become available
              val audioChunk1: ByteArray = // First chunk of audio data
              whisperKit.transcribe(audioChunk1)
              val audioChunk2: ByteArray = // Second chunk of audio data
              whisperKit.transcribe(audioChunk2)
              // Continue processing more chunks as needed...
          }
      }

      override fun onDestroy() {
          super.onDestroy()
          whisperKit.deinitialize()
      }
  }
  ```
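When feeding `transcribe()` in chunks, it helps to size buffers against the audio parameters passed to `init()`. The sketch below is not part of the WhisperKit API; it just works out how many bytes a chunk of raw PCM occupies, assuming 16-bit samples at the 16 kHz mono configuration shown above. The chunk length in seconds is an arbitrary illustration value, not a WhisperKit requirement.

```kotlin
// Sketch: sizing raw PCM buffers for a 16 kHz mono stream.
// bytesPerSample = 2 assumes 16-bit PCM; pcmChunkSizeBytes is a
// hypothetical helper, not part of WhisperKit.
fun pcmChunkSizeBytes(
    sampleRateHz: Int,
    channels: Int,
    bytesPerSample: Int,
    chunkSeconds: Int,
): Int = sampleRateHz * channels * bytesPerSample * chunkSeconds

fun main() {
    // One second of 16-bit PCM at 16 kHz mono is 32,000 bytes
    println(pcmChunkSizeBytes(16000, 1, 2, 1))  // 32000
    // A 30-second chunk is 960,000 bytes
    println(pcmChunkSizeBytes(16000, 1, 2, 30)) // 960000
}
```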
Note: The WhisperKit API is currently experimental and may change in future releases. Make sure to handle the `@OptIn(ExperimentalWhisperKit::class)` annotation appropriately in your code.
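Because the `MSG_TEXT_OUT` callback fires repeatedly as audio is processed, a common pattern is to accumulate the incremental text into a running transcript. A minimal sketch of that pattern follows; `TranscriptCollector` is a hypothetical helper, not part of the WhisperKit API, and the simulated strings stand in for real callback payloads.

```kotlin
// Sketch: accumulating incremental MSG_TEXT_OUT payloads into a transcript.
// TranscriptCollector is a hypothetical helper, not part of WhisperKit.
class TranscriptCollector {
    private val builder = StringBuilder()

    // Call this from the MSG_TEXT_OUT branch of the callback
    fun onTextOut(msg: String) {
        builder.append(msg)
    }

    fun transcript(): String = builder.toString().trim()
}

fun main() {
    val collector = TranscriptCollector()
    // Simulated callback deliveries
    collector.onTextOut(" and so my fellow Americans")
    collector.onTextOut(" ask not what your country can do for you")
    println(collector.transcript())
}
```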
The following setup was tested on macOS 15.1.
These steps are required for both Android app development and the CLI:
- Install required build tools:

  ```shell
  make setup
  ```

- Build the development environment in Docker with all development tools:

  ```shell
  make env
  ```

  The first run of `make env` will take several minutes while the Docker image builds. On subsequent runs, `make env` drops you into the Docker container right away.

  If you need to rebuild the Docker image:

  ```shell
  make rebuild-env
  ```
- Build and enter the Docker environment:

  ```shell
  make env
  ```

- Build the required native libraries:

  ```shell
  make build jni
  ```

- Open the Android project in Android Studio:
  - Open the root project in Android Studio
  - Navigate to `android/examples/WhisperAX`
  - Build and run the app
- Build and enter the Docker environment:

  ```shell
  make env
  ```

- Build the CLI app:

  ```shell
  make build [linux | qnn | gpu]
  ```

  - `linux`: CPU-only build for Linux
  - `qnn`: Android build with Qualcomm NPU support
  - `gpu`: Android build with GPU support
- Push dependencies to the Android device (skip for Linux):

  ```shell
  make adb-push
  ```

- Run the CLI app:

  For Android:

  ```shell
  adb shell
  cd /sdcard/argmax/tflite
  export PATH=/data/local/tmp/bin:$PATH
  export LD_LIBRARY_PATH=/data/local/tmp/lib
  whisperkit-cli transcribe --model-path /path/to/openai_whisper-base --audio-path /path/to/inputs/jfk_441khz.m4a
  ```

  For Linux:

  ```shell
  ./build/linux/whisperkit-cli transcribe --model-path /path/to/my/whisper_model --audio-path /path/to/my/audio_file.m4a --report --report-path /path/to/dump/report.json
  ```

  For all options, run `whisperkit-cli --help`.
- Clean build files when needed:

  ```shell
  make clean [all]
  ```

  With the `all` option, this performs a deep clean that also removes open-source components.
WhisperKit Android is currently in the beta stage. We are actively developing the project and welcome contributions from the community.
- We release WhisperKit Android under the MIT License.
- FFmpeg (used for audio decompression) is released under the LGPL.
- OpenAI Whisper model open-source checkpoints were released under the MIT License.
- Qualcomm AI Hub `.tflite` models and QNN libraries for NPU deployment are released under the Qualcomm AI Model & Software License.
If you use WhisperKit for something cool or just find it useful, please drop us a note at info@argmaxinc.com!
If you are looking for managed enterprise deployment with Argmax, please drop us a note at info+sales@argmaxinc.com.
If you use WhisperKit for academic work, here is the BibTeX:
```bibtex
@misc{whisperkit-argmax,
    title = {WhisperKit},
    author = {Argmax, Inc.},
    year = {2024},
    URL = {https://github.com/argmaxinc/WhisperKitAndroid}
}
```