Drift detection

What drift is, how rendfly detects it, and what to do when an alert fires.

Drift is the gradual degradation of an agent’s behavior. It differs from a single bad reply: drift means the baseline itself is shifting. The agent that was scoring 94/100 last week is now averaging 71/100, and the trend is downward. No single conversation is obviously broken, but something has changed.

Three flavors of drift

Topic drift — the agent starts answering questions outside its scope.

Your e-commerce Slack bot starts helping users write Python scripts because someone phrased their request in a way that sounded like a product question. After it succeeds once, similar phrasing keeps coming in. The agent never refuses. Within a few days it’s answering general programming questions in a meaningful fraction of conversations, none of which match the rules in the system message.

Tone drift — the agent’s voice shifts away from what you defined.

A formal customer-service agent for a B2B SaaS starts getting more casual. “Sure thing!” instead of “Certainly.” Contractions creep in. The occasional exclamation mark. No single message is alarming — the aggregate tone score drops slowly. A month later the agent sounds nothing like the brand voice, and no alert ever fired because no single conversation was clearly wrong.

Factual drift — the agent quotes information that has gone stale.

The system message was written when shipping took 3–5 business days. A warehouse move changed it to 5–7. Nobody updated the system message. The agent keeps citing the old timeframe. A seasonal promotion ends, but the agent still mentions it when users ask about discounts. The knowledge in the system message is now inconsistent with reality — and the agent is confidently wrong.

How rendfly detects it

Drift detection is built on a rolling window comparison. The algorithm works like this:

  1. Every conversation gets scored 0–100 by the judge (see How rendfly judges conversations for the scoring details).
  2. rendfly maintains a 7-day baseline — a rolling average of per-rule and aggregate scores for the project.
  3. A 24-hour window of recent scores is compared against that baseline.
  4. When the delta between the window average and the baseline exceeds the configured threshold — default is 2 standard deviations — a drift alert fires.

The threshold is configurable per project. Tighter thresholds catch smaller regressions earlier but produce more noise. Looser thresholds reduce false positives but let gradual drift go undetected longer. For a customer-facing production agent with active users, the default 2σ is a reasonable starting point.
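To make the mechanics concrete, here is a minimal sketch of the comparison in Python. It illustrates the algorithm above rather than rendfly’s actual implementation: the ScoredConversation shape, the choice to exclude the 24-hour window from its own baseline, and alerting only on downward drift are all assumptions made for the sketch.

    from dataclasses import dataclass
    from datetime import datetime, timedelta
    from statistics import mean, stdev

    # Hypothetical record shape for a judged conversation; rendfly's
    # internal types are not public, so these field names are assumptions.
    @dataclass
    class ScoredConversation:
        scored_at: datetime
        score: float  # 0-100, assigned by the judge

    def drift_alert(conversations: list[ScoredConversation],
                    now: datetime,
                    baseline_days: int = 7,
                    window_hours: int = 24,
                    threshold_sigma: float = 2.0) -> bool:
        """True if the recent window has drifted below the rolling baseline."""
        window_start = now - timedelta(hours=window_hours)
        baseline_start = now - timedelta(days=baseline_days)

        # Baseline: the scores from the preceding days, excluding the
        # 24-hour window being tested (an assumption of this sketch).
        baseline = [c.score for c in conversations
                    if baseline_start <= c.scored_at < window_start]
        window = [c.score for c in conversations if c.scored_at >= window_start]

        if len(baseline) < 2 or not window:
            return False  # not enough data to compare

        sigma = stdev(baseline)
        if sigma == 0:
            return False  # flat baseline; a sigma-based threshold is undefined

        # Alert when the window average falls more than threshold_sigma
        # standard deviations below the baseline average.
        drop = mean(baseline) - mean(window)
        return drop > threshold_sigma * sigma

Per the baseline description above, the same comparison runs on per-rule scores as well as the aggregate, so a drop isolated to a single rule is not diluted by healthy scores elsewhere.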

Why this matters

The most common LLM regressions in production are silent and gradual. A few concrete mechanisms:

Model provider updates. OpenAI, Anthropic, and Google all periodically update hosted model versions. Most of these changes are improvements. Some introduce subtle behavioral shifts — slightly different refusal sensitivity, changed verbosity, altered tone tendencies. Your eval suite passes because the updates don’t break the test cases you wrote. Production traffic reveals the difference.

Knowledge staleness. If your system message contains factual claims — prices, policies, hours, supported regions — those claims can go stale without anyone touching the system message. The agent keeps honoring what it was told, which is now wrong.

Distribution shift in user queries. The questions users ask in month three are often different from the questions they asked in month one. New use cases emerge. Slang and phrasing patterns evolve. The system message wasn’t written for those patterns, and the agent’s handling of them is untested.

Without drift detection, all of these are invisible. The infrastructure dashboard stays green; the users are the first to notice. By the time the problem is reported, the regression has been running for days.

What to do when an alert fires

When a drift alert fires, the playbook is three steps:

  1. Check which conversations contributed to the score drop. The alert links to a filtered view of the conversations in the flagged window, sorted by score. Read the lowest-scoring ones first. They’ll show you which rule is failing and what the agent actually said.

  2. Inspect the system message. Is the rule that’s failing still accurate? If the refund policy changed, or the supported regions changed, or the tone guide was updated in the product wiki but not in the system message, the fix is to update the system message — not to tune the judge.

  3. Decide whether to tighten or accept. Sometimes drift is intentional: you softened the tone rules on purpose and the lower score reflects that decision. In that case, dismiss the alert and let the baseline reestablish. Other times the drift is a genuine regression — a model update that weakened a refusal, a stale claim causing wrong answers — and the fix is to adjust the system message or escalate to the provider.
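If you prefer to run step 1 outside the UI, a few lines over an export of the flagged window do the same sorting and grouping. The field names here (id, score, failed_rules) are hypothetical; substitute whatever your export actually contains.

    from collections import defaultdict

    # Group the worst-scoring conversations by the rule they failed, so the
    # dominant failure mode is visible at a glance. Field names are assumptions.
    def triage(conversations: list[dict], worst_n: int = 10) -> None:
        by_rule: dict[str, list[dict]] = defaultdict(list)
        for convo in sorted(conversations, key=lambda c: c["score"])[:worst_n]:
            for rule in convo["failed_rules"]:
                by_rule[rule].append(convo)

        for rule, convos in sorted(by_rule.items(), key=lambda kv: -len(kv[1])):
            print(f"{rule}: failed by {len(convos)} of the worst {worst_n}")
            for convo in convos:
                print(f"  {convo['id']}  score={convo['score']}")

Reading the output top-down mirrors the playbook: the rule with the most low scorers is usually the one to investigate first.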

Updated 2026-05-09