Whitepaper
Notes from Twenty-Three Years Inside a Compliance Lexicon
A Multi-Axis Classification Practice with Auditable Provenance
Compliance lexicons inherit their structure from general-language dictionary APIs, but compliance terminology must answer questions those dictionaries do not ask. This essay reports a practice developed over twenty-three years: identify the dimensions a class of term silently encodes in regulatory text, encode them as independent properties with per-axis provenance, and populate them through a rule-based pipeline with human-in-the-loop adjudication.
Verbs in the case study carry three orthogonal axes (cognitive level, responsibility tier, and automation feasibility); nouns carry three parallel axes (entity type, sensitivity, and regulatory information class). The implementation, nexus-lexicon, ships an OpenAPI-fronted, JSON-LD-backed terminology service with snapshot-citable audit trails. A pilot inter-rater reliability study returns Fleiss' κ = 0.51 across four LLM raters.
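For readers unfamiliar with the statistic, the following is a minimal sketch of how Fleiss' κ is computed from a table of ratings, using the standard textbook definition. The function name, the input layout, and the labels in the usage note are assumptions made for illustration; none of them come from the pilot study, and the sketch does not reproduce its data or its 0.51 result.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for categorical ratings.

    ratings[i] holds the labels assigned to item i, one label per rater.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    if p_expected == 1.0:
        # Only one category was ever used; treat as perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)
```

Calling fleiss_kappa with one row per term and one label per rater (for example, four hypothetical labels per verb on a single axis) returns a value of 1.0 for unanimous raters and values near or below 0 when agreement is no better than chance. Computing κ separately per axis keeps a disagreement on one axis from being hidden by agreement on another.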
Key ideas
Compliance Terms Encode Multiple Dimensions
Every requirement in a compliance document silently asks several independent questions at once. A catalog that records only one answer per term forces practitioners to carry the remaining answers in their heads, making institutional knowledge fragile and audit findings recurrent.
The Human Stays in the Loop Permanently
The premise that automation can remove the human from a compliance classification pipeline is structurally false. Rules externalize expert knowledge but cannot replace the judgment needed to adjudicate low-confidence outputs, resolve disagreements, and re-review classifications when regulations evolve. "Rules plus H", where H is the human adjudicator, is the correct shape; either alone fails.
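One way to picture the "rules plus human" shape is a router that accepts confident rule outputs and queues everything else for adjudication. The sketch below is illustrative only: RuleResult, route, and the 0.8 threshold are hypothetical names and values, not part of nexus-lexicon as described here.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cutoff for illustration; the essay does not state a threshold.
REVIEW_THRESHOLD = 0.8


@dataclass
class RuleResult:
    term: str
    axis: str
    value: Optional[str]  # None when no rule fired for this term and axis
    confidence: float


def route(results, threshold=REVIEW_THRESHOLD):
    """Split rule outputs into auto-accepted results and a human review queue."""
    accepted, review_queue = [], []
    for result in results:
        if result.value is None or result.confidence < threshold:
            # Low-confidence or missing outputs always go to a person.
            review_queue.append(result)
        else:
            accepted.append(result)
    return accepted, review_queue
```

The review queue never becomes empty by design: a regulation change can be modeled as resetting the confidence of affected terms, which sends them back through the same path.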
Per-Axis Provenance Makes Classifications Auditable
Each classification axis must carry its own provenance record, capturing the source (rule, LLM, manual, or imported), confidence score, reviewer identity, timestamp, and rationale for any override. This allows an auditor to open a URL and see exactly why a term was classified the way it was, including the full history of changes.
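A provenance record of this kind could be modeled roughly as follows. This is a sketch under stated assumptions: the class and field names are hypothetical, chosen to mirror the items listed above (source, confidence, reviewer, timestamp, rationale), and do not describe the actual nexus-lexicon schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Source(Enum):
    RULE = "rule"
    LLM = "llm"
    MANUAL = "manual"
    IMPORTED = "imported"


@dataclass(frozen=True)
class Provenance:
    """One immutable entry in an axis's classification history."""
    value: str
    source: Source
    confidence: float
    reviewer: Optional[str]
    timestamp: datetime
    rationale: Optional[str] = None


@dataclass
class AxisClassification:
    """A single axis of a term; earlier entries are kept, never overwritten."""
    axis: str
    history: list = field(default_factory=list)

    @property
    def current(self) -> Provenance:
        return self.history[-1]

    def override(self, value: str, reviewer: str, rationale: str) -> None:
        # Hypothetical policy: a manual override must explain itself.
        if not rationale.strip():
            raise ValueError("an override must carry a rationale")
        self.history.append(
            Provenance(value, Source.MANUAL, 1.0, reviewer,
                       datetime.now(timezone.utc), rationale)
        )
```

Keeping one AxisClassification per axis, rather than one record per term, is what lets a reviewer override the responsibility tier of a verb without touching the provenance of its other axes.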
The Discipline Survives the Axes Being Wrong
The contribution is not any specific set of axes but the practice of identifying dimensions, encoding them independently, and populating them with auditable human adjudication. Practitioners should adopt the discipline and revise the axes for their own domain; copying the axes verbatim while missing the underlying practice defeats the purpose.
Framework-First Advice Produces Compounding Technical Debt
Organizations that build separate compliance programs for each framework they encounter end up with overlapping tool stacks, incompatible vocabularies, and budgets that triple while security posture quietly worsens. The actual work sits one layer underneath the frameworks, at the level of the shared terminology the frameworks all draw on.
nexus-lexicon: A Running Implementation
The nexus-lexicon system ships 426 verbs classified across three axes and roughly 60 noun seeds, backed by a Postgres store, a SHACL-constrained JSON-LD response shape, and an OpenAPI surface. Snapshot identifiers allow citations taken today to resolve identically in future audit cycles, decoupling the living catalog from the frozen evidence record.
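The decoupling of a living catalog from frozen evidence can be sketched with an in-memory stand-in. Everything here is an assumption made for illustration: the Catalog class, its method names, and the choice of a truncated content hash as the snapshot identifier are hypothetical, and the real system is described as Postgres-backed rather than dictionary-backed.

```python
import copy
import hashlib
import json


class Catalog:
    """Toy catalog: a mutable live view plus immutable, citable snapshots."""

    def __init__(self):
        self._live = {}
        self._snapshots = {}

    def classify(self, term, axes):
        """Set or replace the axis values for a term in the live catalog."""
        self._live[term] = dict(axes)

    def snapshot(self):
        """Freeze the current state and return an identifier for citations."""
        frozen = copy.deepcopy(self._live)
        payload = json.dumps(frozen, sort_keys=True).encode("utf-8")
        # Assumed scheme: derive the identifier from the frozen content.
        snapshot_id = hashlib.sha256(payload).hexdigest()[:12]
        self._snapshots[snapshot_id] = frozen
        return snapshot_id

    def resolve(self, term, snapshot_id=None):
        """Look up a term in the live catalog or in a cited snapshot."""
        source = self._live if snapshot_id is None else self._snapshots[snapshot_id]
        return dict(source[term])
```

A citation recorded as a (term, snapshot identifier) pair keeps resolving to the values in force when the snapshot was taken, even after the live entry for that term is reclassified.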