GitHub - agentem-ai/izwi: A local audio inference engine
Summary
Izwi is a local audio inference stack written in Rust for speech and audio workflows. It supports text-to-speech (TTS), automatic speech recognition (ASR), and chat/audio-chat models. The workflow is CLI-first (`izwi`), with an accompanying web UI, and the server exposes OpenAI-style routes under `/v1`. Key features include local-first operation, model lifecycle management via the CLI (downloading from Hugging Face), Apple Silicon (Metal) acceleration, and cross-platform native builds. Building requires the Rust toolchain, plus Node.js 18+ for the UI. Setup consists of installing the UI dependencies, building the binaries, installing the CLI, and running `izwi serve`. Supported model families currently include several Qwen3 variants for TTS, ASR, chat, and forced alignment; Voxtral realtime and LFM2-Audio are planned.
(Source: GitHub)
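Because the server exposes OpenAI-style routes under `/v1`, a client can probably be pointed at it much as it would be at OpenAI's API. The sketch below only builds a text-to-speech request and does not send it. It assumes the server mirrors OpenAI's `/v1/audio/speech` route with its `model`, `input`, and `voice` fields, which the summary does not confirm. The host, port, model name, and voice name are hypothetical placeholders, not values taken from the project.

```python
import json
import urllib.request

# Assumption: the address where `izwi serve` listens. The real host and port
# are not stated in the summary; check the server's startup output.
BASE_URL = "http://localhost:8080/v1"


def build_speech_request(text, model="placeholder-tts-model",
                         voice="default", base_url=BASE_URL):
    """Build (but do not send) a POST request shaped like OpenAI's
    /v1/audio/speech call. The model and voice defaults are placeholders."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request("Hello from a local model.")
    # Inspect the request before deciding to send it.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually send the request, pass it to `urllib.request.urlopen(req)` once the server is running and write the response bytes to a file. The audio format returned depends on the server and is not specified here.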