Face Touch Guard

Detect hand-to-face contact in real-time video using MediaPipe landmarks. The FaceTouchFilter runs in the video pipeline and fires ON_VIDEO_DETECTION hooks when a confirmed touch is detected, enabling audio alerts, messages, or any custom reaction.

Inspired by FaceTouchGuard — a local desktop app that alerts you when you touch your face.

Architecture

Camera (VideoBackend)
  → VideoChannel
  → VideoPipeline.process_inbound()
  → FaceTouchFilter (MediaPipe Face Landmarker + Hand Landmarker)
    → Zone geometry: fingertip distance to face zone centroids
    → False-positive filtering: proximity, z-depth, confirmation, cooldown
    → FilterEvent(kind="face_touch") → context.events
  → VideoChannel drains events → ON_VIDEO_DETECTION hook
  → User hook handler (log, alert, TTS, message, etc.)

Installation

pip install roomkit[mediapipe]

# For local webcam capture
pip install roomkit[local-video,mediapipe]

Quick Start

import asyncio
from roomkit import RoomKit, HookTrigger, HookExecution, VideoDetectionEvent
from roomkit.channels.video import VideoChannel
from roomkit.video.backends.local import LocalVideoBackend
from roomkit.video.pipeline.config import VideoPipelineConfig
from roomkit.video.pipeline.filter.mediapipe_face_touch import (
    FaceTouchConfig,
    FaceTouchFilter,
    FaceTouchSensitivity,
)

async def main():
    kit = RoomKit()

    backend = LocalVideoBackend(device=0, fps=15)
    pipeline = VideoPipelineConfig(
        filters=[FaceTouchFilter(FaceTouchConfig(
            sensitivity=FaceTouchSensitivity.HIGH,
        ))],
    )
    video = VideoChannel("video-cam", backend=backend, pipeline=pipeline)
    kit.register_channel(video)

    @kit.hook(HookTrigger.ON_VIDEO_DETECTION, execution=HookExecution.ASYNC)
    async def on_touch(event: VideoDetectionEvent, ctx):
        if event.kind == "face_touch":
            zone = event.metadata.get("zone")
            print(f"Stop touching your {zone}!")

    room = await kit.create_room("guard-room")
    await kit.bind_channel(room.id, "video-cam")

    # Keep the process alive so the camera keeps feeding the pipeline
    # (replace with your own shutdown logic).
    await asyncio.Event().wait()

asyncio.run(main())

Face Zones

The filter monitors five face zones defined by MediaPipe's 478-landmark face mesh:

Zone          Description                            Default
left_cheek    Eye-cheek boundary to jawline (left)   Enabled
right_cheek   Mirror on right side                   Enabled
chin          Lower jaw contour                      Enabled
mouth         Outer lip perimeter                    Enabled
forehead      Eyebrows to hairline                   Disabled (higher FP rate)

Select zones via the zones parameter:

from roomkit.video.pipeline.filter.mediapipe_face_touch import FaceZone

config = FaceTouchConfig(
    zones=frozenset({FaceZone.LEFT_CHEEK, FaceZone.RIGHT_CHEEK, FaceZone.FOREHEAD}),
)

Sensitivity Presets

Three presets control detection thresholds:

Preset   Distance   Confirmation   Cooldown    Z-depth
LOW      0.04       4 frames       30 frames   0.08
MEDIUM   0.06       3 frames       20 frames   0.08
HIGH     0.08       2 frames       12 frames   0.12
  • Distance threshold — normalized 2D distance from fingertip to zone centroid (resolution-independent)
  • Confirmation frames — consecutive positive frames required before triggering
  • Cooldown frames — suppress re-firing for the same zone after a detection
  • Z-depth threshold — reject hands hovering in front of the face (not touching)

Override individual thresholds while keeping the preset defaults for the rest:

config = FaceTouchConfig(
    sensitivity=FaceTouchSensitivity.MEDIUM,
    touch_distance_threshold=0.05,  # tighter distance
    confirmation_frames=4,          # more frames needed
)

False-Positive Filtering

The filter applies multiple layers to reduce false alerts:

Layer                 What it catches
Face bounding box     Hands beside the face, not on it
Distance threshold    Hands near but not close enough
Z-depth filter        Hands held in front of face (hovering)
Confirmation window   Momentary noise, hand passing by
Cooldown              Rapid re-triggers for the same zone
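
If alerts are still too eager in your setup, the distance and confirmation layers can be tightened from FaceTouchConfig, and noisy zones can be dropped from the monitored set. A minimal sketch using only the options documented above (FaceZone.CHIN and FaceZone.MOUTH are assumed to follow the same member naming as the zones shown earlier):

from roomkit.video.pipeline.filter.mediapipe_face_touch import (
    FaceTouchConfig,
    FaceTouchSensitivity,
    FaceZone,
)

# Tighter distance layer and a longer confirmation window; the forehead zone
# is left out of the set because it has the highest false-positive rate.
strict = FaceTouchConfig(
    sensitivity=FaceTouchSensitivity.MEDIUM,
    zones=frozenset({FaceZone.LEFT_CHEEK, FaceZone.RIGHT_CHEEK, FaceZone.CHIN, FaceZone.MOUTH}),
    touch_distance_threshold=0.04,  # stricter distance layer
    confirmation_frames=4,          # longer confirmation window
)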

Performance

  • MediaPipe runs locally — no cloud APIs, no network latency
  • every_n_frames controls CPU usage: at the default of 3 with 15 fps video, detection runs ~5 times/second
  • The first frame is slower (model loading takes ~1-2 s); subsequent frames take ~15-30 ms on a modern CPU
  • Increase every_n_frames to reduce CPU load at the cost of reaction time:

config = FaceTouchConfig(every_n_frames=5)  # ~3 analyses/sec at 15fps

The ON_VIDEO_DETECTION Hook

FaceTouchFilter emits a VideoDetectionEvent with kind="face_touch". ON_VIDEO_DETECTION itself is a generic hook trigger shared by all video detection filters (YOLO, face touch, future detections).

The event payload:

@dataclass
class VideoDetectionEvent:
    kind: str                      # "face_touch"
    session: VideoSession | None   # populated by channel
    labels: list[str]              # ["left_cheek"] or ["chin", "mouth"]
    confidence: float              # 0.0–1.0
    metadata: dict[str, Any]       # {"zone": "left_cheek", "hand": "right", ...}
    timestamp: datetime
    frame_sequence: int

The metadata dict for face touch events contains:

Key           Type    Description
zone          str     Face zone name ("left_cheek", "chin", etc.)
hand          str     "left" or "right"
distance      float   Normalized fingertip-to-zone distance
touch_count   int     Session cumulative touch count
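
As an example, a handler can combine these keys with the event's confidence to build a more descriptive alert. A minimal sketch, reusing the kit and imports from the Quick Start above:

@kit.hook(HookTrigger.ON_VIDEO_DETECTION, execution=HookExecution.ASYNC)
async def describe_touch(event: VideoDetectionEvent, ctx):
    if event.kind != "face_touch":
        return
    zone = event.metadata["zone"]          # e.g. "left_cheek"
    hand = event.metadata["hand"]          # "left" or "right"
    count = event.metadata["touch_count"]  # cumulative count for the session
    print(f"Touch #{count}: {hand} hand on {zone} (confidence {event.confidence:.2f})")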

Combining with Voice Alerts

Pair with a VoiceChannel to play audio alerts — the RoomKit equivalent of FaceTouchGuard's afplay clips:

from roomkit import VoiceChannel

voice = VoiceChannel("voice", backend=voice_backend, tts=tts_provider)
kit.register_channel(voice)

@kit.hook(HookTrigger.ON_VIDEO_DETECTION, execution=HookExecution.ASYNC)
async def alert(event: VideoDetectionEvent, ctx):
    if event.kind == "face_touch":
        voice_ch = kit.get_channel("voice")
        await voice_ch.say("Stop touching your face!")

Testing with MockFaceTouchFilter

Use MockFaceTouchFilter to test hook handlers without MediaPipe:

from roomkit.video.pipeline.filter.mock_face_touch import MockFaceTouchFilter
from roomkit.video.events import VideoDetectionEvent

mock = MockFaceTouchFilter(events_at={
    5: [VideoDetectionEvent(
        kind="face_touch",
        labels=["left_cheek"],
        confidence=0.85,
        metadata={"zone": "left_cheek", "hand": "right", "touch_count": 1},
    )],
})

pipeline = VideoPipelineConfig(filters=[mock])

API Reference

Class                  Module                                               Description
FaceTouchFilter        roomkit.video.pipeline.filter.mediapipe_face_touch  MediaPipe-based detection filter
FaceTouchConfig        roomkit.video.pipeline.filter.mediapipe_face_touch  Configuration with sensitivity presets
FaceTouchSensitivity   roomkit.video.pipeline.filter.mediapipe_face_touch  LOW, MEDIUM, HIGH presets
FaceZone               roomkit.video.pipeline.filter.mediapipe_face_touch  Face zone enum
MockFaceTouchFilter    roomkit.video.pipeline.filter.mock_face_touch       Mock for testing
VideoDetectionEvent    roomkit.video.events                                 Detection event payload
FilterEvent            roomkit.video.pipeline.filter.base                   Generic filter event wrapper

See Also