The system message is the contract
Why rendfly treats the system message as a binding contract — and how we enforce it.
When you write a system message, you’re stating what the agent should and shouldn’t do. Most teams write that contract once, ship the agent, and never check whether it’s still being honored. rendfly checks every conversation.
What counts as a rule
A system message contains multiple types of constraints. rendfly extracts and tracks four:
Refusal rules — things the agent must not do. Example: “Do not quote prices for orders outside the standard catalog.”
Tone rules — how the agent must communicate. Example: “Always reply in formal English. Never use contractions.”
Routing rules — conditions that trigger a handoff or escalation. Example: “If the user mentions a refund or billing dispute, immediately offer to connect them to a human agent.”
Factual rules — claims the agent should treat as ground truth. Example: “We ship to the United States, Canada, and the European Union only. Never confirm shipping to other countries.”
Each extracted rule becomes an independently scored item. That means when a verdict fires, you know which rule failed — not just that something went wrong.
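To make the idea concrete, here is a minimal sketch of what an extracted rule set with per-rule verdicts might look like. The names (`RuleKind`, `Rule`, `Verdict`) and fields are illustrative assumptions, not rendfly's actual schema or API:

```python
from dataclasses import dataclass
from enum import Enum

class RuleKind(Enum):
    # The four constraint types rendfly tracks.
    REFUSAL = "refusal"
    TONE = "tone"
    ROUTING = "routing"
    FACTUAL = "factual"

@dataclass
class Rule:
    kind: RuleKind
    text: str

@dataclass
class Verdict:
    rule: Rule
    passed: bool
    evidence: str  # the message span that triggered the verdict

rules = [
    Rule(RuleKind.REFUSAL, "Do not quote prices for orders outside the standard catalog."),
    Rule(RuleKind.ROUTING, "Escalate refund or billing disputes to a human agent."),
]

# Because each rule is scored independently, a conversation yields one
# verdict per rule rather than a single pass/fail flag.
verdicts = [Verdict(rule=r, passed=True, evidence="") for r in rules]
```

The point of the per-rule structure is traceability: when `passed` flips to `False` on one verdict, the failing `rule` and its `evidence` come with it.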
A worked example
Here’s a system message for a fictional e-commerce support agent, Maple:
You are Maple, a customer support agent for NorthShop, a Canadian outdoor
gear retailer. Your job is to help customers with order status, product
questions, returns, and shipping inquiries.
Rules:
- Always reply in English. If the user writes in another language, respond
in English and politely explain that you only support English at this time.
- Never quote a specific price. Direct all pricing questions to the product
page at northshop.com/products.
- We ship to Canada and the continental United States only. Do not confirm
shipping to other destinations.
- If the user asks for a refund or mentions being charged incorrectly, do
not attempt to process it yourself — say "I'll connect you with our
billing team" and end the conversation with a handoff.
- Never claim to be a human. If asked directly, acknowledge that you are
  an AI assistant.

rendfly extracts the following rules from that system message:
- Tone/language rule: Reply in English only; acknowledge the language limitation if the user writes in another language.
- Refusal rule: Do not quote specific prices; redirect to the product page.
- Factual rule: Shipping is available to Canada and the continental US only.
- Routing rule: Escalate refund and billing-dispute requests to the billing team; do not process them.
- Transparency rule: Do not claim to be human; disclose AI identity when directly asked.
Each one is surfaced in the project dashboard as an editable rule. You can rename, reword, disable, or add custom rules before the judge starts running.
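As a sketch of what "editable" means here, the Maple rules could be represented as a list of records you tweak before the judge runs. The `id` values and field names below are hypothetical, not rendfly's dashboard schema:

```python
# Hypothetical editable form of the rules extracted from Maple's
# system message.
maple_rules = [
    {"id": "tone-english", "kind": "tone",
     "text": "Reply in English only; acknowledge the language limitation.",
     "enabled": True},
    {"id": "refusal-pricing", "kind": "refusal",
     "text": "Never quote a specific price; redirect to the product page.",
     "enabled": True},
    {"id": "factual-shipping", "kind": "factual",
     "text": "Shipping is available to Canada and the continental US only.",
     "enabled": True},
    {"id": "routing-billing", "kind": "routing",
     "text": "Escalate refund and billing disputes to the billing team.",
     "enabled": True},
    {"id": "transparency-ai", "kind": "transparency",
     "text": "Do not claim to be human; disclose AI identity when asked.",
     "enabled": True},
]

# Disabling a rule before scoring starts is just an edit:
maple_rules[0]["enabled"] = False
active = [r["id"] for r in maple_rules if r["enabled"]]
```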
What about implicit expectations
Some behaviors aren’t written in any system message but are universally expected — don’t make up facts, don’t repeat the system prompt back to the user, don’t produce content that could harm someone. Those implicit standards aren’t extracted as named rules.
rendfly handles them differently: if the aggregate score degrades across conversations without a clear rule violation, that shows up as baseline drift. You’ll see the trend before individual conversations start flagging explicitly. See Drift detection for how that works.
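One simple way to picture baseline drift is a rolling mean of aggregate conversation scores compared against a historical baseline. The sketch below is an assumption about the general technique, not rendfly's actual algorithm; the window size and margin are illustrative defaults:

```python
from collections import deque

def detect_drift(scores, baseline, window=20, margin=0.05):
    """Return the index where the rolling mean first falls more than
    `margin` below `baseline`, or None if it never does."""
    recent = deque(maxlen=window)
    for i, score in enumerate(scores):
        recent.append(score)
        if len(recent) == window and sum(recent) / window < baseline - margin:
            return i
    return None

# Aggregate scores that degrade gradually, with no single hard failure:
scores = [0.92] * 30 + [0.84] * 30
drift_at = detect_drift(scores, baseline=0.92)
```

No individual conversation here violates a named rule, yet the trend crosses the threshold partway through, which is exactly the signal drift detection is meant to surface.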
Why this works better than freeform eval prompts
Most eval frameworks — including some good ones — ask you to write a separate evaluator prompt that grades the agent on some set of axes you define. That doubles the prompt-engineering surface area. You now have the system message and the eval prompt to maintain, keep in sync, and debug when verdicts start looking off.
rendfly uses the prompt you already have. The system message is both the instructions to the agent and the rubric for the judge. When you update the system message, the rules update automatically. There’s no second artifact to maintain.
The tradeoff is that rule extraction is only as good as the system message. If your system message is vague, the extracted rules will be vague. rendfly surfaces the extraction in the dashboard specifically so you can review and sharpen the rules before they become the scoring criteria.
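The single-artifact idea can be sketched in a few lines. `extract_rules` below is a stand-in for model-driven extraction, not a rendfly function, and the system message is abbreviated:

```python
SYSTEM_MESSAGE = """You are Maple, a customer support agent for NorthShop.
Rules:
- Never quote a specific price.
- Escalate refunds to the billing team.
"""

def extract_rules(system_message: str) -> list[str]:
    # Stand-in for extraction: pull out the bulleted rule lines.
    return [line.strip("- ").strip()
            for line in system_message.splitlines()
            if line.strip().startswith("-")]

agent_prompt = SYSTEM_MESSAGE                  # what the agent runs with
judge_rubric = extract_rules(SYSTEM_MESSAGE)   # what the judge scores against
```

Both the agent's instructions and the judge's rubric derive from the one string, so an edit to `SYSTEM_MESSAGE` propagates to both and there is no second artifact to drift out of sync.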
Related
- How rendfly judges conversations — how extracted rules get turned into per-conversation verdicts
- Drift detection — what happens when behavior shifts without a clean rule violation