WavFileRecorder¶

A debug audio recorder that writes pipeline audio to .wav files on disk using Python's stdlib wave module. Zero external dependencies.

Use it to inspect exactly what the audio pipeline sees, for example when verifying AEC effectiveness, denoiser quality, or AGC levels, or when diagnosing audio issues in general.

Quick start¶

from roomkit.voice.pipeline import AudioPipelineConfig, WavFileRecorder
from roomkit.voice.pipeline.recorder import RecordingConfig

config = AudioPipelineConfig(
    recorder=WavFileRecorder(),
    recording_config=RecordingConfig(storage="./recordings"),
    # ... other providers (vad, denoiser, etc.)
)

Recording starts automatically when a voice session becomes active and stops when the session ends. Output files appear in the storage directory as {session_id}_{timestamp}.wav.
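Once a session has ended, the output files can be checked with the same stdlib wave module. The following is a minimal sketch, assuming the files are ordinary PCM WAV; `describe_wav` and `describe_recordings` are hypothetical helper names and not part of roomkit.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic header facts for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def describe_recordings(directory):
    """Map each .wav file name in a directory to its header facts."""
    return {p.name: describe_wav(p) for p in sorted(Path(directory).glob("*.wav"))}
```

Calling `describe_recordings("./recordings")` after a session gives a quick check of the channel count and duration of each file.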

Channel modes¶

Configure via RecordingConfig.channels:

Mode	Output	Description
`MIXED` (default)	Single mono `.wav`	Inbound + outbound averaged into one channel.
`SEPARATE`	Two files: `_inbound.wav`, `_outbound.wav`	Each direction in its own file. Best for comparing mic vs speaker audio side by side.
`STEREO`	Single stereo `.wav`	Inbound on the left channel, outbound on the right. Open in any audio editor and solo L/R to inspect each direction.

from roomkit.voice.pipeline.recorder import RecordingChannelMode, RecordingConfig

# Stereo: inbound=left, outbound=right
config = RecordingConfig(
    storage="./recordings",
    channels=RecordingChannelMode.STEREO,
)

# Separate files per direction
config = RecordingConfig(
    storage="./recordings",
    channels=RecordingChannelMode.SEPARATE,
)
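A STEREO recording can also be split back into one file per direction without an audio editor. This is a sketch under the assumption that the file holds 16-bit PCM; `split_stereo` is a hypothetical helper, not a roomkit function.

```python
import array
import wave


def split_stereo(stereo_path, left_path, right_path):
    """Write the left and right channels of a 16-bit stereo WAV as mono files."""
    with wave.open(str(stereo_path), "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))

    # Interleaved layout is L, R, L, R, ... Samples are only rearranged,
    # never interpreted, so host byte order does not matter here.
    for path, channel in ((left_path, samples[0::2]), (right_path, samples[1::2])):
        with wave.open(str(path), "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
```

With the channel layout described above, the left output corresponds to inbound audio and the right output to outbound audio.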

Recording modes¶

Configure via RecordingConfig.mode:

Mode	Behavior
`BOTH` (default)	Record both inbound (mic) and outbound (TTS/speaker) audio.
`INBOUND_ONLY`	Record only inbound audio. Outbound taps are ignored.
`OUTBOUND_ONLY`	Record only outbound audio. Inbound taps are ignored.

from roomkit.voice.pipeline.recorder import RecordingConfig, RecordingMode

# Only capture what the mic picks up
config = RecordingConfig(
    storage="./recordings",
    mode=RecordingMode.INBOUND_ONLY,
)

Recording trigger¶

RecordingTrigger.ALWAYS is the only supported trigger. The recorder taps run before VAD in the pipeline, so speech boundaries are not available at recording time. If SPEECH_ONLY is configured, the recorder logs a warning and falls back to ALWAYS.

Output directory¶

If RecordingConfig.storage is set, files are written there (directories created automatically).
If storage is empty, files go to the system temp directory (tempfile.gettempdir()).
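The directory rule above can be expressed in a few lines. This is an illustrative sketch only; `resolve_output_dir` is a hypothetical name, and the recorder's actual implementation may differ.

```python
import tempfile
from pathlib import Path


def resolve_output_dir(storage):
    """Return the directory recordings go to, creating it if needed."""
    # An empty storage value falls back to the system temp directory.
    target = Path(storage) if storage else Path(tempfile.gettempdir())
    target.mkdir(parents=True, exist_ok=True)
    return target
```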

File naming¶

Files are named {session_id}_{timestamp}.wav where timestamp is YYYYMMDDTHHMMSS in UTC.

In SEPARATE mode, two files are created: {session_id}_{timestamp}_inbound.wav and {session_id}_{timestamp}_outbound.wav.
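To locate the files for a session from a script, the naming scheme can be reproduced as follows. This is a sketch, not the recorder's own code; `recording_filenames` is a hypothetical helper, and the `now` parameter exists only to make the example deterministic.

```python
from datetime import datetime, timezone


def recording_filenames(session_id, separate=False, now=None):
    """Build the expected file names for one session."""
    # `now` must be timezone-aware; it is converted to UTC before formatting.
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S")
    base = f"{session_id}_{stamp}"
    if separate:
        return [f"{base}_inbound.wav", f"{base}_outbound.wav"]
    return [f"{base}.wav"]
```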

How mixing works¶

For MIXED and STEREO modes, inbound and outbound audio is buffered in memory during the session. On stop():

The shorter buffer is padded with silence to match the longer one.
MIXED: samples are averaged (inbound + outbound) / 2 into a mono signal.
STEREO: samples are interleaved as left (inbound) / right (outbound).
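The stop() steps above can be sketched on plain sequences of 16-bit samples. This is an illustration, not the recorder's code: the function names are hypothetical, and rounding the average down with integer division is an assumption.

```python
import array


def _pad(samples, length):
    """Return samples as a list, extended with zeros (silence) to length."""
    samples = list(samples)
    return samples + [0] * (length - len(samples))


def pad_to_match(inbound, outbound):
    """Pad the shorter sequence with silence so both have equal length."""
    length = max(len(inbound), len(outbound))
    return _pad(inbound, length), _pad(outbound, length)


def mix_mono(inbound, outbound):
    """MIXED: average the two directions into one mono signal."""
    a, b = pad_to_match(inbound, outbound)
    # Floor division is an assumed rounding mode; the result stays in int16 range.
    return array.array("h", ((x + y) // 2 for x, y in zip(a, b)))


def interleave_stereo(inbound, outbound):
    """STEREO: interleave as left (inbound), right (outbound)."""
    a, b = pad_to_match(inbound, outbound)
    out = array.array("h")
    for left, right in zip(a, b):
        out.append(left)
        out.append(right)
    return out
```

Averaging rather than summing keeps the mixed signal within the 16-bit range, at the cost of halving the level of each direction.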

For SEPARATE mode, audio is written directly to disk via wave.Wave_write as it arrives, so no in-memory buffering is needed.
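Writing frames as they arrive looks roughly like this with the stdlib. `StreamingWavWriter` is a hypothetical class used for illustration; only `wave.open` and the `Wave_write` methods it calls are stdlib API.

```python
import wave


class StreamingWavWriter:
    """Append PCM frames to a WAV file as they arrive."""

    def __init__(self, path, sample_rate, channels=1, sample_width=2):
        # wave.open in "wb" mode returns a wave.Wave_write object.
        self._wav = wave.open(str(path), "wb")
        self._wav.setnchannels(channels)
        self._wav.setsampwidth(sample_width)
        self._wav.setframerate(sample_rate)

    def write(self, pcm_bytes):
        """Write one chunk of raw PCM; the header is kept up to date."""
        self._wav.writeframes(pcm_bytes)

    def close(self):
        """Finalize the header and close the file."""
        self._wav.close()
```

In SEPARATE mode one such writer per direction would be enough, since the two files never need to be combined.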

Pipeline position¶

The recorder taps are positioned early in the pipeline:

Inbound tap: after resampling, before AEC/AGC/denoiser/VAD. You hear the raw mic signal (at pipeline sample rate).
Outbound tap: after post-processors, before AEC reference feed. You hear the final TTS output.

Debug taps¶

For deeper diagnostics, PipelineDebugTaps captures audio at every pipeline stage boundary into separate WAV files. Unlike the recorder, which captures the signal at a single point in each direction, debug taps let you compare the signal before and after each transformation. This is useful for verifying that the AEC, AGC, or denoiser stages are working correctly.

from roomkit.voice.pipeline import AudioPipelineConfig, PipelineDebugTaps

config = AudioPipelineConfig(
    debug_taps=PipelineDebugTaps(output_dir="./debug_audio/"),
    # ... other providers
)

Output files are numbered by pipeline order:

debug_audio/
  {session_id}_01_raw.wav           # after resampler, before processing
  {session_id}_02_post_aec.wav
  {session_id}_03_post_agc.wav
  {session_id}_04_post_denoiser.wav
  {session_id}_05_post_vad_speech_001.wav  # accumulated speech segments
  {session_id}_06_outbound_raw.wav         # before post-processors
  {session_id}_07_outbound_final.wav       # after post-processors
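One simple way to compare two stage files is to measure their RMS level. The sketch below assumes 16-bit PCM files; `rms_dbfs` is a hypothetical helper, not part of roomkit.

```python
import array
import math
import sys
import wave


def rms_dbfs(path):
    """Return the RMS level of a 16-bit PCM WAV file in dBFS."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # 0 dBFS corresponds to full scale (32768) for 16-bit audio.
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")
```

For example, comparing the value for `{session_id}_01_raw.wav` with the one for `{session_id}_04_post_denoiser.wav` over a stretch of background noise gives a rough indication of how much the denoiser attenuates it.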

Selecting stages¶

By default all stages are captured. To capture only specific stages:

PipelineDebugTaps(
    output_dir="./debug_audio/",
    stages=["raw", "post_denoiser", "post_vad_speech"],
)

Valid stage names: raw, post_aec, post_agc, post_denoiser, post_vad_speech, outbound_raw, outbound_final.

Recorder vs debug taps¶

	WavFileRecorder	PipelineDebugTaps
Purpose	Capture full session audio	Compare signal at each stage
Output	1-2 files per session	One file per selected stage (7 or more with all stages enabled)
Inbound/outbound	Configurable (both, inbound, outbound)	All stages captured
Channel modes	Mixed, separate, stereo	One file per stage
Production use	Yes	Development/debugging only

Both can be used simultaneously. The recorder taps run at a fixed pipeline position, while debug taps are wired at each stage boundary.

Example¶

See examples/wav_recorder.py for a complete runnable example using mock providers.