WavFileRecorder¶
A debug audio recorder that writes pipeline audio to .wav files on disk using Python's stdlib wave module. Zero external dependencies.
Use it to inspect exactly what the audio pipeline sees — useful for verifying AEC effectiveness, denoiser quality, AGC levels, or diagnosing audio issues in general.
Quick start¶
```python
from roomkit.voice.pipeline import AudioPipelineConfig, WavFileRecorder
from roomkit.voice.pipeline.recorder import RecordingConfig

config = AudioPipelineConfig(
    recorder=WavFileRecorder(),
    recording_config=RecordingConfig(storage="./recordings"),
    # ... other providers (vad, denoiser, etc.)
)
```
Recording starts automatically when a voice session becomes active and stops when the session ends. Output files appear in the storage directory as {session_id}_{timestamp}.wav.
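Once a session has ended, a quick sanity check is to read the header of the resulting file. The sketch below uses only the stdlib wave module; `describe_wav` is a hypothetical helper written for this page, not part of roomkit.

```python
import wave


def describe_wav(path):
    """Return basic header facts for a WAV file (hypothetical helper)."""
    with wave.open(str(path), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            # Guard against a zero sample rate in a malformed header.
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav` on each `*.wav` file in the storage directory shows at a glance whether a recording is mono or stereo and how long it runs.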
Channel modes¶
Configure via RecordingConfig.channels:
| Mode | Output | Description |
|---|---|---|
| MIXED (default) | Single mono .wav | Inbound + outbound averaged into one channel. |
| SEPARATE | Two files: *_inbound.wav, *_outbound.wav | Each direction in its own file. Best for comparing mic vs speaker audio side by side. |
| STEREO | Single stereo .wav | Inbound on the left channel, outbound on the right. Open in any audio editor and solo L/R to inspect each direction. |
```python
from roomkit.voice.pipeline.recorder import RecordingChannelMode, RecordingConfig

# Stereo: inbound=left, outbound=right
config = RecordingConfig(
    storage="./recordings",
    channels=RecordingChannelMode.STEREO,
)

# Separate files per direction
config = RecordingConfig(
    storage="./recordings",
    channels=RecordingChannelMode.SEPARATE,
)
```
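If an audio editor is not at hand, a STEREO recording can also be split back into one file per direction with the stdlib. This is a minimal sketch that assumes uncompressed PCM; `split_stereo` is a hypothetical helper, not a roomkit function.

```python
import wave


def split_stereo(stereo_path, left_path, right_path):
    """Split a 2-channel PCM WAV into two mono WAVs (left, right)."""
    with wave.open(stereo_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a stereo file")
        width = src.getsampwidth()
        rate = src.getframerate()
        data = src.readframes(src.getnframes())

    # Each stereo frame is [left sample][right sample], `width` bytes each.
    frame_size = 2 * width
    left = bytearray()
    right = bytearray()
    for i in range(0, len(data), frame_size):
        left += data[i:i + width]
        right += data[i + width:i + frame_size]

    for path, samples in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(bytes(samples))
```

Given the channel layout described above, the left output holds the inbound audio and the right output holds the outbound audio.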
Recording modes¶
Configure via RecordingConfig.mode:
| Mode | Behavior |
|---|---|
| BOTH (default) | Record both inbound (mic) and outbound (TTS/speaker) audio. |
| INBOUND_ONLY | Record only inbound audio. Outbound taps are ignored. |
| OUTBOUND_ONLY | Record only outbound audio. Inbound taps are ignored. |
```python
from roomkit.voice.pipeline.recorder import RecordingConfig, RecordingMode

# Only capture what the mic picks up
config = RecordingConfig(
    storage="./recordings",
    mode=RecordingMode.INBOUND_ONLY,
)
```
Recording trigger¶
RecordingTrigger.ALWAYS is the only supported trigger. The recorder taps run before VAD in the pipeline, so speech boundaries are not available at recording time. If SPEECH_ONLY is configured, the recorder logs a warning and falls back to ALWAYS.
Output directory¶
- If RecordingConfig.storage is set, files are written there (directories created automatically).
- If storage is empty, files go to the system temp directory (tempfile.gettempdir()).
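The two rules above can be sketched as a small function. `resolve_output_dir` is a hypothetical name used for illustration; the recorder's actual internals may differ.

```python
import tempfile
from pathlib import Path


def resolve_output_dir(storage):
    """Pick the directory recordings go to (illustrative sketch only)."""
    if storage:
        out_dir = Path(storage)
        # Create the directory and any missing parents.
        out_dir.mkdir(parents=True, exist_ok=True)
        return out_dir
    # Empty storage: fall back to the system temp directory.
    return Path(tempfile.gettempdir())
```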
File naming¶
Files are named {session_id}_{timestamp}.wav where timestamp is YYYYMMDDTHHMMSS in UTC.
In SEPARATE mode, two files are created: {session_id}_{timestamp}_inbound.wav and {session_id}_{timestamp}_outbound.wav.
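To locate or predict recording names from a script, the pattern can be reproduced as follows. This is a sketch of the naming scheme as documented here; `recording_filenames` and its parameters are hypothetical, and `now` is expected to be a timezone-aware datetime.

```python
from datetime import datetime, timezone


def recording_filenames(session_id, separate=False, now=None):
    """Build file names following {session_id}_{timestamp}[_direction].wav."""
    now = now or datetime.now(timezone.utc)
    # Timestamp format: YYYYMMDDTHHMMSS, expressed in UTC.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S")
    base = f"{session_id}_{stamp}"
    if separate:
        return [f"{base}_inbound.wav", f"{base}_outbound.wav"]
    return [f"{base}.wav"]
```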
How mixing works¶
For MIXED and STEREO modes, inbound and outbound audio is buffered in memory during the session. On stop():
- The shorter buffer is padded with silence to match the longer one.
- MIXED: samples are averaged, (inbound + outbound) / 2, into a mono signal.
- STEREO: samples are interleaved as left (inbound) / right (outbound).
For SEPARATE mode, audio is written directly to disk via wave.Wave_write — no buffering needed.
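The padding, averaging, and interleaving steps can be illustrated on plain lists of integer samples. This is a sketch under stated assumptions, not the recorder's code: the function names are hypothetical, and rounding the average with floor division is a choice made here, not something this page specifies.

```python
def pad_to_match(inbound, outbound):
    """Pad the shorter sample list with zeros (silence) to equal length."""
    n = max(len(inbound), len(outbound))
    a = list(inbound) + [0] * (n - len(inbound))
    b = list(outbound) + [0] * (n - len(outbound))
    return a, b


def mix_mono(inbound, outbound):
    """MIXED: average the two directions sample by sample."""
    a, b = pad_to_match(inbound, outbound)
    return [(x + y) // 2 for x, y in zip(a, b)]


def interleave_stereo(inbound, outbound):
    """STEREO: emit left (inbound), right (outbound) for each frame."""
    a, b = pad_to_match(inbound, outbound)
    frames = []
    for left, right in zip(a, b):
        frames.extend((left, right))
    return frames
```

Averaging instead of summing keeps the mixed signal within the original sample range, so two full-scale inputs cannot clip. The trade-off is that each direction ends up at half amplitude in the mono file.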
Pipeline position¶
The recorder taps are positioned early in the pipeline:
- Inbound tap: after resampling, before AEC/AGC/denoiser/VAD. You hear the raw mic signal (at pipeline sample rate).
- Outbound tap: after post-processors, before AEC reference feed. You hear the final TTS output.
Debug taps¶
For deeper diagnostics, PipelineDebugTaps captures audio at every pipeline stage boundary into separate WAV files. Unlike the recorder (which captures the signal at a single point), debug taps let you compare the signal before and after each transformation — useful for verifying that AEC, AGC, or denoiser stages are working correctly.
```python
from roomkit.voice.pipeline import AudioPipelineConfig, PipelineDebugTaps

config = AudioPipelineConfig(
    debug_taps=PipelineDebugTaps(output_dir="./debug_audio/"),
    # ... other providers
)
```
Output files are numbered by pipeline order:
```text
debug_audio/
    {session_id}_01_raw.wav                    # after resampler, before processing
    {session_id}_02_post_aec.wav
    {session_id}_03_post_agc.wav
    {session_id}_04_post_denoiser.wav
    {session_id}_05_post_vad_speech_001.wav    # accumulated speech segments
    {session_id}_06_outbound_raw.wav           # before post-processors
    {session_id}_07_outbound_final.wav         # after post-processors
```
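One way to quantify what a stage changed is to compare the RMS level of two tap files, for example `01_raw` against `04_post_denoiser`. The sketch below assumes the tap files are 16-bit PCM, which this page does not state; `rms_dbfs` is a hypothetical helper, not part of roomkit.

```python
import array
import math
import sys
import wave


def rms_dbfs(path):
    """RMS level of a 16-bit PCM WAV, in dB relative to full scale."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / (32768.0 ** 2))
```

A lower value after a stage means that stage reduced overall signal energy. RMS is a coarse measure, so listening to the files remains the more reliable check.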
Selecting stages¶
By default all stages are captured. To capture only specific stages:
```python
PipelineDebugTaps(
    output_dir="./debug_audio/",
    stages=["raw", "post_denoiser", "post_vad_speech"],
)
```
Valid stage names: raw, post_aec, post_agc, post_denoiser, post_vad_speech, outbound_raw, outbound_final.
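A stage list can be checked against these names before it is passed to `PipelineDebugTaps`, and the numbered file names can be derived from the same ordering. Both helpers below are hypothetical. How the library itself handles an unknown stage name is not covered here, and the segment suffix is inferred from the `_001` in the listing above.

```python
# Stage names in pipeline order, as listed on this page.
STAGE_ORDER = [
    "raw",
    "post_aec",
    "post_agc",
    "post_denoiser",
    "post_vad_speech",
    "outbound_raw",
    "outbound_final",
]


def validate_stages(stages):
    """Reject unknown stage names; return the rest in pipeline order."""
    unknown = [s for s in stages if s not in STAGE_ORDER]
    if unknown:
        raise ValueError(f"unknown debug tap stages: {unknown}")
    return sorted(stages, key=STAGE_ORDER.index)


def tap_filename(session_id, stage, segment=None):
    """Build {session_id}_{NN}_{stage}[_{segment}].wav for a stage."""
    index = STAGE_ORDER.index(stage) + 1
    suffix = f"_{segment:03d}" if segment is not None else ""
    return f"{session_id}_{index:02d}_{stage}{suffix}.wav"
```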
Recorder vs debug taps¶
| | WavFileRecorder | PipelineDebugTaps |
|---|---|---|
| Purpose | Capture full session audio | Compare signal at each stage |
| Output | 1-2 files per session | 7 or more files per session (one per stage, plus one per speech segment) |
| Inbound/outbound | Configurable (both, inbound, outbound) | All stages captured |
| Channel modes | Mixed, separate, stereo | One file per stage |
| Production use | Yes | Development/debugging only |
Both can be used simultaneously. The recorder taps run at a fixed pipeline position, while debug taps are wired at each stage boundary.
Example¶
See examples/wav_recorder.py for a complete runnable example using mock providers.