
Scoring

ConversationScorer

Bases: ABC

Pluggable quality scorer for AI responses.

Implement this ABC to evaluate AI response quality. Scorers are invoked by roomkit.scoring.ScoringHook after each AI response via the ON_AI_RESPONSE hook.

Implementations can be:

  • LLM-as-judge — call a separate model to rate the response
  • Rule-based — regex/keyword checks, length validation
  • Heuristic — latency thresholds, tool usage patterns
  • Human feedback — bridge to user rating collection
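
As an illustration of the rule-based option, here is a minimal sketch of a scorer that combines a length check with a keyword check. The `Score` and `ConversationScorer` definitions are stand-ins that mirror the signatures documented on this page so the snippet runs without roomkit installed; in real code you would import them from the library. The default `name` implementation, the class `LengthAndApologyScorer`, and its `max_chars` parameter are assumptions made for the example.

```python
import asyncio
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional


# Stand-ins mirroring the documented signatures (not the library's own code).
@dataclass
class Score:
    value: float
    dimension: str
    reason: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


class ConversationScorer(ABC):
    @property
    def name(self) -> str:
        # Assumed default: fall back to the class name.
        return type(self).__name__

    @abstractmethod
    async def score(
        self,
        *,
        response_content: str,
        query: str,
        room_id: str,
        channel_id: str,
        usage: Optional[dict[str, Any]] = None,
        thinking: str = "",
    ) -> list[Score]: ...


class LengthAndApologyScorer(ConversationScorer):
    """Hypothetical rule-based scorer: length validation plus a keyword check."""

    def __init__(self, max_chars: int = 500) -> None:
        self.max_chars = max_chars
        self._apology = re.compile(r"\b(sorry|apologi[sz]e)\b", re.IGNORECASE)

    async def score(self, *, response_content, query, room_id, channel_id,
                    usage=None, thinking=""):
        n = len(response_content)
        length_ok = 0 < n <= self.max_chars
        apologetic = bool(self._apology.search(response_content))
        # One Score per dimension, each in the 0.0-1.0 range.
        return [
            Score(value=1.0 if length_ok else 0.0, dimension="length",
                  reason=f"{n} chars"),
            Score(value=0.5 if apologetic else 1.0, dimension="tone",
                  reason="apology detected" if apologetic else "no apology"),
        ]
```

Because `score` takes keyword-only arguments, callers must pass every required field by name, for example `await scorer.score(response_content="...", query="...", room_id="r1", channel_id="ai")`.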

name property

name

Human-readable scorer name.

score abstractmethod async

score(*, response_content, query, room_id, channel_id, usage=None, thinking='')

Score an AI response.

Parameters:

  • response_content (str, required): The AI-generated text.
  • query (str, required): The user message that triggered the response.
  • room_id (str, required): Room where the response was generated.
  • channel_id (str, required): AI channel that generated the response.
  • usage (dict[str, Any] | None, default None): Token usage from the provider.
  • thinking (str, default ''): Extended thinking/reasoning (if available).

Returns:

  • list[Score]: A list of Score objects, one per dimension. Return an empty list to skip scoring for this response.
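
The return contract above (one `Score` per dimension, or an empty list to skip) can be sketched with a small heuristic scorer that only produces a score when token usage is available. This is an illustration under assumptions: `Score` is a stand-in for the documented dataclass, `VerbosityScorer` and its `budget` parameter are hypothetical, and the `"output_tokens"` key is assumed because the shape of `usage` depends on the provider.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Score:  # stand-in for the documented Score dataclass
    value: float
    dimension: str
    reason: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


class VerbosityScorer:
    """Hypothetical heuristic scorer; real code would subclass ConversationScorer."""

    name = "verbosity"

    def __init__(self, budget: int = 400) -> None:
        self.budget = budget

    async def score(
        self,
        *,
        response_content: str,
        query: str,
        room_id: str,
        channel_id: str,
        usage: Optional[dict[str, Any]] = None,
        thinking: str = "",
    ) -> list[Score]:
        # "output_tokens" is an assumed key; providers report usage differently.
        tokens = (usage or {}).get("output_tokens")
        if tokens is None:
            return []  # nothing to measure, so skip scoring for this response
        # Full marks within budget, then a linear penalty clamped to [0.0, 1.0].
        overflow = max(0, tokens - self.budget)
        value = max(0.0, min(1.0, 1.0 - overflow / self.budget))
        return [
            Score(value=value, dimension="verbosity",
                  reason=f"{tokens} output tokens",
                  metadata={"budget": self.budget}),
        ]
```

Returning `[]` rather than a placeholder score keeps missing data from being recorded as a low rating.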

close async

close()

Release resources held by the scorer (optional).

Score dataclass

Score(value, dimension, reason='', metadata=dict())

A quality score for an AI response.

Attributes:

  • value (float): Score between 0.0 (worst) and 1.0 (best).
  • dimension (str): What is being scored (e.g. "relevance", "helpfulness", "safety", "accuracy", "coherence").
  • reason (str): Human-readable explanation of the score.
  • metadata (dict[str, Any]): Arbitrary metadata (model used for judging, etc.).
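
Since `value` is documented as lying between 0.0 and 1.0, ratings collected on another scale (for example a 1 to 5 judge rating) need normalizing before they go into a `Score`. The sketch below shows one way to do that. `Score` here is a stand-in matching the documented signature, and `clamped_score` is a hypothetical helper, not part of the library; this page does not say whether the library itself validates the range.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Score:  # stand-in for the documented Score dataclass
    value: float
    dimension: str
    reason: str = ""
    # default_factory gives each instance its own dict.
    metadata: dict[str, Any] = field(default_factory=dict)


def clamped_score(raw: float, dimension: str, *, scale: float = 1.0,
                  reason: str = "") -> Score:
    """Hypothetical helper: map a raw rating on [0, scale] into 0.0-1.0."""
    value = min(1.0, max(0.0, raw / scale))
    # Keep the original rating in metadata so it can be audited later.
    return Score(value=value, dimension=dimension, reason=reason,
                 metadata={"raw": raw, "scale": scale})


judge = clamped_score(4, "helpfulness", scale=5, reason="judge rated 4/5")
```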

ScoringHook

ScoringHook(scorers, *, store=None, max_recent=100)

Runs conversation scorers on every AI response.

Attach to a roomkit.RoomKit instance to automatically score AI responses via the ON_AI_RESPONSE hook. Scores are stored as roomkit.models.task.Observation objects in the conversation store and kept in a bounded in-memory buffer for quick access.

Parameters:

  • scorers (list[ConversationScorer], required): List of ConversationScorer implementations.
  • store (ConversationStore | None, default None): Optional store for persisting scores. If None, the store of the attached RoomKit instance is used.
  • max_recent (int, default 100): Maximum number of scores to keep in the in-memory buffer.
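
To make the effect of `max_recent` concrete, the following sketch shows one common way to build a bounded buffer in Python, using `collections.deque` with `maxlen`. This page does not describe how ScoringHook implements its buffer, so treat `RecentScores` and its methods as hypothetical and the eviction behavior shown here as an assumption.

```python
from collections import deque


class RecentScores:
    """Hypothetical bounded buffer; not ScoringHook's actual implementation."""

    def __init__(self, max_recent: int = 100) -> None:
        # A deque with maxlen drops the oldest item when a new one is appended.
        self._items = deque(maxlen=max_recent)

    def add(self, score) -> None:
        self._items.append(score)

    def latest(self, n: int = 10) -> list:
        """Return up to n of the most recent items, oldest first."""
        if n <= 0:
            return []
        return list(self._items)[-n:]

    def __len__(self) -> int:
        return len(self._items)
```

With a buffer like this, memory use stays constant no matter how many responses are scored, at the cost of losing older scores unless they are also persisted to the store.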

attach

attach(kit)

Register the scoring hook on a RoomKit instance.

close async

close()

Close all scorers.

MockScorer

MockScorer(scores=None)

Bases: ConversationScorer

Returns configured scores and records calls.

Example:

scorer = MockScorer(scores=[Score(value=0.9, dimension="relevance")])
results = await scorer.score(
    response_content="Hello!", query="Hi", room_id="r1", channel_id="ai"
)
assert len(scorer.calls) == 1