Open Source PySide6 4 AI Providers

Voice Assistant
On Your Desktop

A native desktop app for real-time voice conversations with AI. Google Gemini, OpenAI, Anthropic Claude, or local models — switch with one click. Built on RoomKit.

uv run python -m roomkit_ui
RoomKit UI — real-time voice conversation

Everything You Need

A complete voice assistant experience with professional-grade audio processing.

Real-Time Voice

Full-duplex voice conversations with sub-second latency. Speak naturally with interruption support.

Live Transcript

iMessage-style chat bubbles with streaming partial transcriptions. See every word as it's spoken.

Animated VU Meter

Ambient glow visualization showing mic and speaker audio levels with smooth animated waveforms.

Echo Cancellation

Built-in WebRTC and Speex AEC for hands-free conversations without feedback loops.

Device Selection

Choose your microphone and speaker from settings. Supports all system audio devices.

Noise Reduction

Optional RNNoise denoiser removes background noise for crystal-clear voice input.

System-Wide Dictation

Press a global hotkey to dictate anywhere. Transcription is pasted into the focused app automatically.

MCP Tool Support

Connect external tools via Model Context Protocol. Stdio, SSE, and HTTP transports supported.

Multi-Language STT

Dictation in 14+ languages including English, French, Spanish, German, Japanese, Chinese, and more.

Dark & Light Themes

Apple-inspired dark and light mode. Switch instantly from settings with full theme-aware components.

Markdown Chat

AI responses render with full markdown — code blocks, tables, links, and inline formatting.

Two Conversation Modes

Speech-to-Speech (realtime) or Voice Channel (STT → LLM → TTS). Choose per provider.

Local TTS

Piper, Qwen3-TTS (voice clone), NeuTTS (voice clone). Fully offline text-to-speech.

Local STT Models

Whisper, Parakeet, Zipformer via sherpa-onnx. Download models from settings.

MCP Apps

Render HTML UIs from MCP servers inline in the chat. Interactive tool results, not just text.

Skills Marketplace

Browse and install skills from CLabHub. Extend your assistant without writing code.

GPU Acceleration

CUDA (NVIDIA) or CoreML (Apple) for local models. Hardware-accelerated inference.

Choose Your Provider

Switch between providers instantly. Your API keys are saved independently.

Google Gemini

Native audio with Gemini 2.5 Flash. Low latency, multilingual, with built-in thinking.

Aoede Charon Fenrir Kore Puck

OpenAI

GPT-4o Realtime API with server-side VAD. Natural, expressive voices in real time.

alloy echo fable onyx nova shimmer

Anthropic Claude

Claude with STT/TTS voice channel mode. Powerful reasoning with natural voice.

Voice Channel STT + LLM + TTS

Local LLMs

vLLM, Ollama. No API key needed, fully offline. Run models on your own hardware.

vLLM Ollama Offline

Cross-Platform

Pre-built binaries for every major operating system. Or run from source with one command.

macOS
Linux
Windows

Get Started in 3 Steps

1

Clone & install

git clone https://github.com/roomkit-live/roomkit-ui.git
cd roomkit-ui && uv sync
2

Run the app

uv run python -m roomkit_ui
3

Enter your API key

Open Settings, choose your provider (Gemini, OpenAI, Claude, or local), paste your API key, and start talking.

Ready to Talk?

Download RoomKit UI or build from source. It's open-source and free.