Common
Cross-cutting primitives shared by every domain — the configurable company/domain profile, deterministic sensitivity classification, the dimension glossary, privacy-aware observability, and global path constants.
The common domain holds the cross-cutting primitives every other domain depends on: the configurable company/domain profile (so no company is ever hardcoded), the deterministic sensitivity model, the dimension glossary, privacy-aware observability, and the single source of truth for global paths. These have no dependencies on the rest of the system — they are the base of the import graph.
Package: src/ragspine/common/. Contract: src/ragspine/common/CLAUDE.md.
Layout
Company / domain profile
company_profile.py makes RAGSpine a generic management copilot rather than a hardcoded
finance app. Identity, synonyms, dimensions, and the competitor list all come from a TOML
file (config/company.toml); a missing file silently falls back to built-in ACME defaults.
The profile is an immutable frozen dataclass. CompanyProfile is a module-level alias
for the same DomainProfile class — both names work in isinstance checks and constructors.
Each declared dimension is an immutable DimensionSpec (ADR 0004): name, label, kind
(categorical / temporal / measure), synonyms, units, labels, default,
required, clarify (ask_first / assume / none), identity, expand,
derived_from / derivation, and the fabrication-whitelist flags. The default profile
declares five dimensions — metric, entity, period (temporal), channel
(default TOTAL, non-expanding), and geography (identity=False, derived from entity).
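As a rough illustration of what an immutable dimension declaration could look like, the sketch below models a subset of the fields listed above as a frozen dataclass. The field types, the default values, and the two example declarations are assumptions made for illustration; they are not copied from company_profile.py.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class DimensionSpec:
    """Illustrative subset of the documented fields; types and defaults are assumed."""
    name: str
    label: str
    kind: str = "categorical"           # categorical / temporal / measure
    default: Optional[str] = None
    required: bool = False
    clarify: str = "none"               # ask_first / assume / none
    identity: bool = True
    expand: bool = True
    derived_from: Optional[str] = None


# Hypothetical declarations mirroring two of the default dimensions.
channel = DimensionSpec(name="channel", label="Channel", default="TOTAL", expand=False)
geography = DimensionSpec(
    name="geography", label="Geography", identity=False, derived_from="entity"
)

try:
    channel.default = "ONLINE"          # frozen dataclass: assignment is rejected
except FrozenInstanceError:
    print("DimensionSpec is immutable")
```

Freezing the dataclass means a spec loaded once from the profile cannot be mutated by a downstream domain, which is what lets the profile act as shared, read-only configuration.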
from ragspine.common.company_profile import load_company_profile

profile = load_company_profile()  # explicit path > $RAGSPINE_COMPANY_CONFIG > config/company.toml
print(profile.home_entity_code)   # 'ACME_GROUP' by default

load_company_profile(path=None) resolves the config path (explicit arg →
RAGSPINE_COMPANY_CONFIG env var → config/company.toml), parses it with tomllib/tomli,
and falls back to the built-in default on any missing or unparseable file. It never raises
and never prints.
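The resolution order and the silent fallback can be sketched roughly as follows. This is a simplified stand-in, not the real loader: resolve_config_path, load_profile_data, and DEFAULT_PROFILE are hypothetical names, the sketch returns a plain dict rather than a DomainProfile, and it resolves the default path relative to the working directory, which the real module may not do.

```python
import os
from pathlib import Path
from typing import Any, Dict, Optional

try:
    import tomllib                     # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib            # third-party backport for older interpreters

# Hypothetical stand-in for the built-in default profile.
DEFAULT_PROFILE: Dict[str, Any] = {"home_entity_code": "ACME_GROUP"}


def resolve_config_path(path: Optional[str] = None) -> Path:
    """Explicit argument, then $RAGSPINE_COMPANY_CONFIG, then config/company.toml."""
    if path:
        return Path(path)
    env = os.environ.get("RAGSPINE_COMPANY_CONFIG")
    return Path(env) if env else Path("config/company.toml")


def load_profile_data(path: Optional[str] = None) -> Dict[str, Any]:
    """Return the parsed TOML, or a copy of the defaults if the file is missing or invalid."""
    try:
        with resolve_config_path(path).open("rb") as fh:
            return tomllib.load(fh)
    except (OSError, ValueError):      # missing/unreadable file, bad TOML, bad UTF-8
        return dict(DEFAULT_PROFILE)
```

Catching ValueError covers both tomllib.TOMLDecodeError (a ValueError subclass) and a file that is not valid UTF-8, so a broken config degrades to the defaults instead of stopping startup.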
Config-driven, no hardcoded company. Identity, metrics, and competitors come from the profile.
The external_entities map (competitor alias → display name) is how the agent recognizes an
out-of-scope competitor question. Don't hardcode a company anywhere.
Sensitivity
sensitivity.py is the deterministic data-classification layer that the
narrative ingestion path applies and that
RESTRICTED isolation downstream depends on. Levels are
plain strings — there is no enum. The only named constant is RESTRICTED = "RESTRICTED";
the unmarked default is "INTERNAL".
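Rules live in an immutable SensitivityPolicy, read from the [sensitivity] config
section with no hardcoded company words. Its fields include restricted_filename_patterns,
restricted_keywords, escalate_unknown_to_restricted, and default_level.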
classify_sensitivity(filename, text, policy) -> str is a pure function (zero external
calls) that returns the first matching level, in priority order:

1. restricted_filename_patterns entry → RESTRICTED
2. restricted_keywords entry → RESTRICTED
3. escalate_unknown_to_restricted is True → RESTRICTED
4. policy.default_level (default "INTERNAL")

The security stance: unmarked docs default to INTERNAL; a signal hit escalates to
RESTRICTED; the blanket "everything unknown → RESTRICTED" behavior is an opt-in strict
switch, off by default.
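The priority order can be sketched as a small pure function. The matching semantics here are assumptions: this sketch treats filename patterns as case-insensitive globs and keywords as case-insensitive substrings, which may differ from what sensitivity.py actually does. The SensitivityPolicy shown is a minimal stand-in built from the four fields named in the priority list.

```python
import fnmatch
from dataclasses import dataclass
from typing import Tuple

RESTRICTED = "RESTRICTED"


@dataclass(frozen=True)
class SensitivityPolicy:
    """Minimal stand-in for the real policy; field types are assumed."""
    restricted_filename_patterns: Tuple[str, ...] = ()
    restricted_keywords: Tuple[str, ...] = ()
    escalate_unknown_to_restricted: bool = False
    default_level: str = "INTERNAL"


def classify_sensitivity(filename: str, text: str, policy: SensitivityPolicy) -> str:
    """Return the first matching level, checking signals in priority order."""
    name = filename.lower()
    # 1. Filename signal (glob matching is an assumption of this sketch).
    if any(fnmatch.fnmatchcase(name, p.lower()) for p in policy.restricted_filename_patterns):
        return RESTRICTED
    # 2. Keyword signal (substring matching is an assumption of this sketch).
    body = text.lower()
    if any(k.lower() in body for k in policy.restricted_keywords):
        return RESTRICTED
    # 3. Opt-in strict switch: everything unknown is escalated.
    if policy.escalate_unknown_to_restricted:
        return RESTRICTED
    # 4. Unmarked default.
    return policy.default_level


# Hypothetical policy values, for illustration only.
policy = SensitivityPolicy(
    restricted_filename_patterns=("*board*",),
    restricted_keywords=("confidential",),
)
print(classify_sensitivity("board_minutes.pdf", "", policy))              # RESTRICTED
print(classify_sensitivity("notes.txt", "Strictly Confidential", policy)) # RESTRICTED
print(classify_sensitivity("notes.txt", "weekly update", policy))         # INTERNAL
```

Because the function reads only its arguments, the same document always gets the same level, which is what makes the classification auditable.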
Glossary
glossary.py normalizes free-form dimension synonyms to controlled codes. There is no
class — it is module-level functions plus dictionaries derived from the loaded profile.
Unrecognized input returns None; it never guesses.
| function | maps |
|---|---|
| normalize_metric(raw) -> str \| None | metric synonym → metric_code |
| normalize_entity(raw) -> str \| None | entity synonym → entity_code |
| resolve_external_entity(text) -> str \| None | competitor mention → display name (longest match) |
| geography_for_entity(entity_code) -> str \| None | default geography for an entity |
| unit_for_metric(metric_code) -> str \| None | default unit for a metric |
| normalize_period(raw) -> tuple[str, str] \| None | absolute period → (period_type, period) |
| resolve_relative_period(raw, reference_date=None) -> tuple[str, str] \| None | relative period (今年 "this year" / 去年 "last year" / 上季度 "last quarter" …) → (period_type, period) |
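To make the never-guess lookup and the longest-match rule concrete, here is a minimal sketch of two of these functions over plain dictionaries. The synonym tables and the competitor names are hypothetical; in RAGSpine the dictionaries are derived from the loaded profile, and the real normalization rules (trimming, case handling) may differ from the ones assumed here.

```python
from typing import Dict, Optional

# Hypothetical tables; the real ones are built from the company profile.
METRIC_SYNONYMS: Dict[str, str] = {"revenue": "REVENUE", "营收": "REVENUE"}
EXTERNAL_ENTITIES: Dict[str, str] = {
    "globex": "Globex",
    "globex retail": "Globex Retail",
}


def normalize_metric(raw: str) -> Optional[str]:
    """Exact lookup after trimming and case-folding; None when unrecognized."""
    return METRIC_SYNONYMS.get(raw.strip().casefold())


def resolve_external_entity(text: str) -> Optional[str]:
    """Return the display name of the longest alias found in the text, else None."""
    haystack = text.casefold()
    hits = [alias for alias in EXTERNAL_ENTITIES if alias in haystack]
    return EXTERNAL_ENTITIES[max(hits, key=len)] if hits else None


print(normalize_metric(" Revenue "))                          # REVENUE
print(normalize_metric("not a metric"))                       # None
print(resolve_external_entity("How did Globex Retail do?"))   # Globex Retail
```

Preferring the longest alias keeps a short alias ("globex") from shadowing a more specific one ("globex retail") when both occur in the same question.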
Period normalization produces the same period_type values used by the
fact store: FY (FY2024 / 2024),
HY (2024H1), and QUARTER (2025Q1).
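The three period shapes could be recognized with a few anchored patterns, as in the sketch below. The regular expressions and the whitespace and case handling are assumptions for illustration; glossary.py may accept more input forms than this.

```python
import re
from typing import Optional, Tuple

_FY = re.compile(r"^(?:FY)?([0-9]{4})$")
_HY = re.compile(r"^([0-9]{4})H([12])$")
_QUARTER = re.compile(r"^([0-9]{4})Q([1-4])$")


def normalize_period(raw: str) -> Optional[Tuple[str, str]]:
    """Map an absolute period string to (period_type, period), or None."""
    token = re.sub(r"\s+", "", raw).upper()   # "FY 2024" -> "FY2024"
    m = _FY.match(token)
    if m:
        return ("FY", m.group(1))
    m = _HY.match(token)
    if m:
        return ("HY", f"{m.group(1)}H{m.group(2)}")
    m = _QUARTER.match(token)
    if m:
        return ("QUARTER", f"{m.group(1)}Q{m.group(2)}")
    return None                               # unrecognized: never guess


print(normalize_period("FY 2024"))    # ('FY', '2024')
print(normalize_period("2024H1"))     # ('HY', '2024H1')
print(normalize_period("2025Q1"))     # ('QUARTER', '2025Q1')
print(normalize_period("next week"))  # None
```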
from ragspine.common.glossary import normalize_metric, normalize_period

normalize_metric("营收")          # 'REVENUE'
normalize_period("FY 2024")       # ('FY', '2024')
normalize_metric("not a metric")  # None: never guesses

Observability
observability.py is two small primitives:
- new_request_id() -> str: a 12-char uuid4 hex short code.
- emit_trace(logger=None, **fields) -> None: logs one INFO record with the fixed message "trace", attaching **fields to the log record via extra=. It uses stdlib logging only (no basicConfig); the host or test attaches the handler. The logger name is TRACE_LOGGER_NAME = "ragspine.trace".
Privacy-aware traces. Logs are treated as Restricted. emit_trace is meant to carry only
non-sensitive metadata — request_id, route, controlled codes (metric / entity / period / channel),
status counts, scores, boolean flags, durations, token usage — and never the raw answer text,
a fact value, or chunk text.
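A minimal sketch of the two primitives, using only the standard library, might look like this. The function bodies are a guess at the shape described above rather than the module's actual source; ListHandler and the field values in the usage lines (route="fact", duration_ms=42) are hypothetical.

```python
import logging
import uuid
from typing import Any, List, Optional

TRACE_LOGGER_NAME = "ragspine.trace"


def new_request_id() -> str:
    """First 12 hex characters of a random UUID4."""
    return uuid.uuid4().hex[:12]


def emit_trace(logger: Optional[logging.Logger] = None, **fields: Any) -> None:
    """Log one INFO record with message "trace"; fields become record attributes."""
    log = logger if logger is not None else logging.getLogger(TRACE_LOGGER_NAME)
    log.info("trace", extra=fields)


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps records in memory, as a test might."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


# The host (or a test) attaches the handler; the module itself configures nothing.
handler = ListHandler()
trace_logger = logging.getLogger(TRACE_LOGGER_NAME)
trace_logger.setLevel(logging.INFO)
trace_logger.addHandler(handler)

# Only non-sensitive metadata: ids, controlled codes, timings.
emit_trace(request_id=new_request_id(), route="fact", metric="REVENUE", duration_ms=42)
record = handler.records[-1]
print(record.getMessage(), record.route, record.metric, record.duration_ms)
```

One caveat with extra=: the standard logging module raises KeyError if a field name collides with a built-in LogRecord attribute such as name or message, so trace field names need to avoid those.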
Core — global constants
core.py is the single source of truth for paths, derived from __file__ (no cwd or
rootutils dependency, zero import-time side effects):
| constant | value |
|---|---|
| REPO_ROOT | repo root, from Path(__file__).parents[3] |
| DATA_DIR | REPO_ROOT / "data" |
| DEFAULT_FACT_DB | data/fact_metric.db: fact table and narrative-chunk table (same sqlite) |
| DEFAULT_MAPPING_DB | data/color_mapping.db: color-mapping registry |
| DEFAULT_REVIEW_QUEUE_DB | data/review_queue.db: SME review queue |
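The derivation can be checked with a worked example. The sketch below substitutes a hypothetical checkout location (/repo) for __file__ and uses PurePosixPath so the result does not depend on the host operating system; the real module presumably uses Path(__file__) directly.

```python
from pathlib import PurePosixPath

# Stand-in for __file__ inside src/ragspine/common/core.py (the /repo prefix is hypothetical).
module_file = PurePosixPath("/repo/src/ragspine/common/core.py")

# parents[0] = .../common, [1] = .../ragspine, [2] = .../src, [3] = the repo root
REPO_ROOT = module_file.parents[3]
DATA_DIR = REPO_ROOT / "data"
DEFAULT_FACT_DB = DATA_DIR / "fact_metric.db"
DEFAULT_MAPPING_DB = DATA_DIR / "color_mapping.db"
DEFAULT_REVIEW_QUEUE_DB = DATA_DIR / "review_queue.db"

print(REPO_ROOT)        # /repo
print(DEFAULT_FACT_DB)  # /repo/data/fact_metric.db
```

Deriving every path from the module's own location means the constants resolve the same way regardless of the process's working directory.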
Invariants this domain upholds
- Config-driven: identity / metrics / competitors come from CompanyProfile; never hardcode a company.
- Privacy-aware traces: observability records codes / counts / timings only, never the answer, a fact value, or chunk text.
- Deterministic, never-guess normalization: the glossary and sensitivity classifier are pure functions; unrecognized input returns None / the default level.
Related
Pipeline
Pipeline-topology export — derive a static PipelineGraph from RAGSpine's real wiring and render it as Mermaid, DOT, or JSON, via the agent / retriever / service builders and the topology.py CLI.
Configuration
Environment variables (RAGSPINE_*) read by ServiceConfig, and the CompanyProfile TOML that drives identity, metrics, and competitor scope — no hardcoded company anywhere.