RAGSpine
Concepts

Anti-fabrication

When the structured channel returns no found fact, the orchestrator deterministically rewrites the answer to "not found" — in control flow, not a prompt.

Anti-fabrication is RAGSpine's defining invariant: the system never invents a number. When the structured channel returns no found fact, the orchestrator rewrites the answer to an honest "not found" — regardless of what the model produced.

Guarantee. On the structured path, the numeric answer is synthesized deterministically from the fact value. If no fact is found, the answer is an explicit refusal. The model's prose is never the source of a number.

It lives in control flow, not a prompt

The system prompt does tell the model to call query_metric and to say "not found" when the tool returns not_found. But a prompt is a request, not a guarantee — a model can ignore it. So RAGSpine does not rely on it. The enforcement is in code, in agent/agent.py:

  • _structured_answer partitions the tool results. If any are found, the answer is built from the fact values themselves实体 期间 指标(渠道):值 单位(来源…) — and the model's free-text answer is discarded entirely on the found path.
  • If nothing is found but something is not_found, the answer is rewritten by _not_found_answer.
  • If a parameter was unrecognized_param, it is rewritten by _unrecognized_answer.
# agent/agent.py — _structured_answer (paraphrased shape)
found = [r for r in tool_results if r.get("status") == "found"]
if found:
    # deterministic synthesis from fact values — model prose is NOT used
    return _render_from_facts(found)
not_found = [r for r in tool_results if r.get("status") == "not_found"]
if not_found:
    return _not_found_answer(not_found[0]), []     # forced refusal
unrecognized = [r for r in tool_results if r.get("status") == "unrecognized_param"]
if unrecognized:
    return _unrecognized_answer(unrecognized[0]), []

Discarding model prose on the found path is itself an anti-fabrication measure: a live LLM could otherwise smuggle an extra fabricated figure into its prose alongside the real one. The found answer is rendered from the fact value only. (Regression test: test_found_path_discards_fabricated_extra_number.)

Honest-refusal example

Ask for a value the data doesn't have, and you get a refusal — never a guess:

.venv/bin/python scripts/ask.py --provider mock --db data/fact_metric.db \
  "中国内地FY2025的REVENUE是多少"
查不到:REVENUE / ACME_CN / 2025(渠道 TOTAL)未在事实表中找到。
为避免误导,不提供任何推测数字;可尝试调整期间或实体后重问。

Compare with a value that is present — the answer carries the deterministic number and its lineage:

ACME_CN FY2024 REVENUE:1320 USD_M(来源:ACME_FY2024_Review.pptx · slide=2,table=1,row=REVENUE,col=FY2024)

Per-path semantics (not a single switch)

The guard is applied differently on each path, on purpose:

PathMechanismTrusts model prose?
structured (_structured_answer)found → render from fact value; no-found → forced refusalNo
multi-sub-task (_multi_subtask_answer)each sub-task executed deterministically; never calls the LLMN/A
narrative (_run_narrative)trusts synthesis but forces source citation; no found-fact rewriteYes, with forced citation

How a request signals that the guard fired

The privacy-aware trace (see Provenance and common/observability) records a boolean fabrication_guard_triggered — true when there were tool results but none were found (i.e. the request fell through to the not_found / unrecognized refusal). It records the flag, never the answer text or the fact value.

Provider failure degrades honestly too. If the LLM call raises a ProviderError (network / API / timeout), the agent returns a fixed degrade message that contains no number and no guess — it never fabricates to fill the gap.

On this page