Scoring

ConversationScorer

Bases: ABC

Pluggable quality scorer for AI responses.

Implement this ABC to evaluate AI response quality. Scorers are invoked by :class:`~roomkit.scoring.ScoringHook` after each AI response via the ON_AI_RESPONSE hook.

Implementations can be:

- LLM-as-judge: call a separate model to rate the response
- Rule-based: regex/keyword checks, length validation
- Heuristic: latency thresholds, tool usage patterns
- Human feedback: bridge to user rating collection
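As an illustration of the rule-based category, here is a minimal sketch of a length-validation scorer. To keep it runnable without roomkit installed, it defines a local `Score` stand-in with the documented fields and does not subclass `ConversationScorer`; in a real project you would import both from roomkit and subclass the ABC. The `LengthScorer` class and its `min_chars`/`max_chars` thresholds are hypothetical.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any


# Local stand-in mirroring the documented Score fields (assumption: real code
# would import Score from roomkit instead).
@dataclass
class Score:
    value: float
    dimension: str
    reason: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


class LengthScorer:
    """Hypothetical rule-based scorer that penalizes too-short or too-long replies.

    In real code this would subclass roomkit's ConversationScorer.
    """

    def __init__(self, min_chars: int = 20, max_chars: int = 2000) -> None:
        self.min_chars = min_chars
        self.max_chars = max_chars

    @property
    def name(self) -> str:
        return "length"

    async def score(
        self,
        *,
        response_content: str,
        query: str,
        room_id: str,
        channel_id: str,
        usage: dict[str, Any] | None = None,
        thinking: str = "",
    ) -> list[Score]:
        n = len(response_content)
        if n < self.min_chars:
            return [Score(value=0.0, dimension="length", reason=f"too short ({n} chars)")]
        if n > self.max_chars:
            return [Score(value=0.5, dimension="length", reason=f"too long ({n} chars)")]
        return [Score(value=1.0, dimension="length", reason="length within bounds")]


scores = asyncio.run(
    LengthScorer().score(response_content="Hi", query="Hello?", room_id="r1", channel_id="ai")
)
print(scores[0].value, scores[0].reason)  # 0.0 too short (2 chars)
```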

name `property`

name

Human-readable scorer name.

score `abstractmethod` `async`

score(*, response_content, query, room_id, channel_id, usage=None, thinking='')

Score an AI response.

Parameters:

Name	Type	Description	Default
`response_content`	`str`	The AI-generated text.	required
`query`	`str`	The user message that triggered the response.	required
`room_id`	`str`	Room where the response was generated.	required
`channel_id`	`str`	AI channel that generated the response.	required
`usage`	`dict[str, Any] | None`	Token usage from the provider.	`None`
`thinking`	`str`	Extended thinking/reasoning (if available).	`''`

Returns:

Type	Description
`list[Score]`	A list of :class:`Score` objects, one per dimension. Return an empty list to skip scoring for this response.
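The empty-list convention lets a scorer opt out of responses it cannot judge. The sketch below, again using a local `Score` stand-in rather than roomkit's class, is a hypothetical heuristic relevance scorer: it returns `[]` when the query contains no words to compare, and otherwise scores the fraction of query words that also appear in the response. Word overlap is a crude proxy for relevance, chosen here only for illustration.

```python
from __future__ import annotations

import asyncio
import re
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Score:  # stand-in for roomkit's Score (assumption)
    value: float
    dimension: str
    reason: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


def _words(text: str) -> set[str]:
    # Lowercased alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class OverlapScorer:
    """Hypothetical heuristic scorer; real code would subclass ConversationScorer."""

    @property
    def name(self) -> str:
        return "overlap"

    async def score(self, *, response_content: str, query: str, room_id: str,
                    channel_id: str, usage: dict[str, Any] | None = None,
                    thinking: str = "") -> list[Score]:
        query_words = _words(query)
        if not query_words:
            return []  # nothing to compare against: skip scoring this response
        hits = query_words & _words(response_content)
        return [Score(value=len(hits) / len(query_words), dimension="relevance",
                      reason=f"{len(hits)}/{len(query_words)} query words echoed")]


skipped = asyncio.run(OverlapScorer().score(
    response_content="ok", query="", room_id="r1", channel_id="ai"))
print(skipped)  # []
result = asyncio.run(OverlapScorer().score(
    response_content="Paris is the capital of France.",
    query="What is the capital of France?", room_id="r1", channel_id="ai"))
print(result[0].reason)  # 5/6 query words echoed
```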

close `async`

close()

Release resources held by the scorer. Overriding this method is optional.
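Overriding `close` is mainly useful when a scorer owns something that must be released. A sketch of an LLM-as-judge scorer illustrates the pattern. The judge here is a hypothetical async callable (in practice perhaps a client for a separate model) that returns a 1 to 5 rating as text; `close` drops the reference to it. Apart from the `score`/`close` names and the `Score` fields, nothing below is roomkit API, and `Score` is a local stand-in.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Score:  # stand-in for roomkit's Score (assumption)
    value: float
    dimension: str
    reason: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


# Hypothetical judge: takes a prompt, returns the judge model's raw reply.
Judge = Callable[[str], Awaitable[str]]


class JudgeScorer:
    """Hypothetical LLM-as-judge scorer; real code would subclass ConversationScorer."""

    def __init__(self, judge: Judge, judge_model: str = "judge-model") -> None:
        self._judge: Judge | None = judge
        self._judge_model = judge_model

    @property
    def name(self) -> str:
        return "llm-judge"

    async def score(self, *, response_content: str, query: str, room_id: str,
                    channel_id: str, usage: dict[str, Any] | None = None,
                    thinking: str = "") -> list[Score]:
        if self._judge is None:
            return []  # already closed: skip instead of failing
        prompt = ("Rate the helpfulness of this answer from 1 to 5.\n"
                  f"Question: {query}\nAnswer: {response_content}\nRating:")
        raw = await self._judge(prompt)
        try:
            rating = int(raw.strip())
        except ValueError:
            return []  # unparseable judgment: skip this response
        rating = min(max(rating, 1), 5)
        # Map the 1..5 rating onto the documented 0.0..1.0 range.
        return [Score(value=(rating - 1) / 4, dimension="helpfulness",
                      reason=f"judge rated {rating}/5",
                      metadata={"judge_model": self._judge_model})]

    async def close(self) -> None:
        # Drop the judge reference; a real client might also need an
        # explicit async shutdown call here.
        self._judge = None


async def fake_judge(prompt: str) -> str:
    return "4"  # canned reply for the example


async def main() -> None:
    scorer = JudgeScorer(fake_judge)
    [s] = await scorer.score(response_content="Use a list comprehension.",
                             query="How do I square a list?", room_id="r1", channel_id="ai")
    print(s.value, s.metadata)  # 0.75 {'judge_model': 'judge-model'}
    await scorer.close()

asyncio.run(main())
```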

Score `dataclass`

Score(value, dimension, reason='', metadata=dict())

A quality score for an AI response.

Attributes:

Name	Type	Description
`value`	`float`	Score between 0.0 (worst) and 1.0 (best).
`dimension`	`str`	What is being scored (e.g. "relevance", "helpfulness", "safety", "accuracy", "coherence").
`reason`	`str`	Human-readable explanation of the score.
`metadata`	`dict[str, Any]`	Arbitrary metadata (model used for judging, etc.).
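Because each `Score` carries its own `dimension`, results from several scorers can be combined after the fact. The helper below is a hypothetical sketch, not part of roomkit, that averages scores per dimension. It uses a local stand-in with the documented `Score(value, dimension, reason='', metadata=dict())` signature, modeling the `metadata` default as `field(default_factory=dict)` so that each instance gets its own dictionary.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Score:  # stand-in for roomkit's Score (assumption)
    value: float
    dimension: str
    reason: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


def mean_by_dimension(scores: list[Score]) -> dict[str, float]:
    """Average Score.value per dimension (hypothetical helper)."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for s in scores:
        buckets[s.dimension].append(s.value)
    return {dim: sum(vals) / len(vals) for dim, vals in buckets.items()}


scores = [
    Score(value=0.75, dimension="relevance"),
    Score(value=0.25, dimension="relevance"),
    Score(value=1.0, dimension="safety", reason="no policy issues"),
]
print(mean_by_dimension(scores))  # {'relevance': 0.5, 'safety': 1.0}
```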

ScoringHook

ScoringHook(scorers, *, store=None, max_recent=100)

Runs conversation scorers on every AI response.

Attach to a :class:`~roomkit.RoomKit` instance to automatically score AI responses via the ON_AI_RESPONSE hook. Scores are stored as :class:`~roomkit.models.task.Observation` objects in the conversation store and kept in a bounded in-memory buffer for quick access.

Parameters:

Name	Type	Description	Default
`scorers`	`list[ConversationScorer]`	List of :class:`ConversationScorer` implementations.	required
`store`	`ConversationStore | None`	Optional store for persisting scores. If None, the kit's store is used.	`None`
`max_recent`	`int`	Maximum number of scores to keep in the in-memory buffer.	`100`
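To make the `max_recent` bound concrete, here is a simplified, hypothetical model of what a scoring hook does with scorer results: run every scorer on a response, flatten the returned lists, and append them to a `collections.deque` with `maxlen=max_recent`, so the oldest scores are discarded once the buffer is full. Running scorers concurrently with `asyncio.gather` and evicting oldest-first are assumptions of this sketch, not documented behavior of `ScoringHook`; persistence to the store is omitted, and `MiniScoringHook`, `FixedScorer`, and `Score` are local stand-ins.

```python
from __future__ import annotations

import asyncio
from collections import deque
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Score:  # stand-in for roomkit's Score (assumption)
    value: float
    dimension: str
    reason: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


class FixedScorer:
    """Toy scorer returning one fixed score (hypothetical)."""

    def __init__(self, dimension: str, value: float) -> None:
        self.dimension, self.value = dimension, value

    async def score(self, **kwargs: Any) -> list[Score]:
        # Accepts the documented keyword arguments but ignores them.
        return [Score(value=self.value, dimension=self.dimension)]


class MiniScoringHook:
    """Simplified model of a scoring hook with a bounded buffer (hypothetical)."""

    def __init__(self, scorers: list[Any], *, max_recent: int = 100) -> None:
        self.scorers = scorers
        self.recent: deque[Score] = deque(maxlen=max_recent)

    async def on_ai_response(self, **context: Any) -> list[Score]:
        # Run all scorers concurrently and flatten their result lists.
        results = await asyncio.gather(*(s.score(**context) for s in self.scorers))
        new_scores = [score for batch in results for score in batch]
        self.recent.extend(new_scores)  # deque drops the oldest beyond maxlen
        return new_scores


hook = MiniScoringHook(
    [FixedScorer("relevance", 0.8), FixedScorer("safety", 1.0)], max_recent=3
)
ctx = dict(response_content="Hi!", query="Hello", room_id="r1", channel_id="ai")
asyncio.run(hook.on_ai_response(**ctx))
asyncio.run(hook.on_ai_response(**ctx))
print([s.dimension for s in hook.recent])  # ['safety', 'relevance', 'safety']
```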

attach

attach(kit)

Register the scoring hook on a RoomKit instance.

close `async`

close()

Close all scorers.

MockScorer

MockScorer(scores=None)

Bases: ConversationScorer

Returns the configured scores and records each call in its `calls` attribute.

Example::

    scorer = MockScorer(scores=[Score(value=0.9, dimension="relevance")])
    results = await scorer.score(
        response_content="Hello!", query="Hi", room_id="r1", channel_id="ai"
    )
    assert len(scorer.calls) == 1
