The empirical case for structural integrity measurement — controlled studies, retroactive scoring of historical disclosures, archetype validation across domains.
The full validation page: 81-point gap between genuine federal opinion and AI-fabricated brief; 71-point discrimination delta on a blinded 20-document batch; 13/13 archetype classification accuracy; the corpus scores spanning Federalist No. 51 (91) down to Enron FY2000 10-K (8).
Why structural integrity measurement works, why the discrimination gap is so large, and why accuracy-based detection methods measure the wrong layer.
A hallucinating AI is a low-bandwidth channel impersonating a high-bandwidth signal. This paper applies Shannon's 1948 channel-capacity framework to AI hallucination, documenting a 71-point discrimination delta between genuine and hallucinated content and explaining why accuracy-based detection measures the wrong layer. It positions structural integrity analysis as a measurement of channel capacity grounded in the same theoretical framework the AI models themselves are built on.
Retrieval-augmented generation and citation verification are necessary but not sufficient for hallucination detection. A document with perfectly accurate citations can still fail structural integrity analysis if the reasoning connecting those citations is performed rather than genuine. The paper establishes a taxonomy of hallucination modes and maps each to the measurement layer capable of detecting it.
Shumailov et al. (2024) demonstrated that AI models trained on recursively generated data collapse toward a bland distributional mean. This paper argues that model collapse is not just a model quality problem; it is a documentary ecosystem problem. As AI-generated content replaces human-authored content in training data, the structural integrity gap between genuine and generated documents narrows from below. Measurement infrastructure becomes more critical, not less, as the surface differences diminish.
Why structural integrity is a property of systems rather than people, why the AI accountability crisis is already behind institutions rather than ahead, and what the regulatory environment is about to require.
The sanctions event horizon has already passed. Mata v. Avianca (2023) and Brigandi v. GEICO ($110,000+) established the liability pattern. FRE 707 codifies it. The paper argues that institutions are not preparing for future AI accountability risk; they are managing existing, undiscovered liability in documents already filed and relied upon.
Structural integrity measurement is not a judgment about individual authors; it is a measurement of the accountability architecture in which documents are produced. The paper establishes the theoretical basis for why foundational accountability (the G4/G6 gate pair) is the most fundamental dimension and why no amount of surface polish can compensate for its absence.
The empirical results that grounded the theoretical framework: the 81-point Mata gap and the 71-point discrimination delta. Both are summarized on the Evidence page.
Detailed structural integrity scoring of the ChatGPT-fabricated brief submitted in Mata v. Avianca, Inc. (S.D.N.Y. 2023), which resulted in Rule 11 sanctions from Judge P. Kevin Castel. The fabricated brief scored 7/100 (T4 Fabricated). Authentic briefs from the same legal domain scored 72–88/100 (T1 Integrated). The 81-point gap is the largest single-document delta in the 4CITE validation corpus.
Controlled study: 20 matched AI responses (10 genuine, 10 hallucinated) across five professional domains, scored blind on multi-dimensional structural integrity analysis. Genuine responses averaged 82.4 (all T1 Integrated). Hallucinated responses averaged 11.4 (all T4 Fabricated). Zero overlap in score distributions. The 71-point average discrimination delta is the primary empirical validation of the structural integrity measurement methodology.
Validation study of the 4CITE archetype classification layer across 13 documents spanning legal, corporate, and government domains. 13 of 13 documents received archetype classifications consistent with human expert review. Validates the archetype layer as a reliable subtype descriptor operating above tier designation.
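The headline figures in the three studies above reduce to simple arithmetic. The sketch below reproduces them from the numbers reported on this page only. The function names (`score_gap`, `discrimination_delta`, `has_overlap`) are hypothetical helpers introduced for illustration and are not part of any 4CITE tooling; per-document scores are not listed here, so the two list-based helpers are shown without data.

```python
from statistics import mean


def score_gap(authentic_range, fabricated_score):
    """Return (smallest, largest) gap between an authentic score range and one fabricated score."""
    low, high = authentic_range
    return low - fabricated_score, high - fabricated_score


def discrimination_delta(genuine_scores, hallucinated_scores):
    """Difference between the mean genuine score and the mean hallucinated score."""
    return mean(genuine_scores) - mean(hallucinated_scores)


def has_overlap(genuine_scores, hallucinated_scores):
    """True if the highest hallucinated score reaches the lowest genuine score."""
    return max(hallucinated_scores) >= min(genuine_scores)


# Figures as reported above; nothing here is new data.
mata_gap = score_gap((72, 88), 7)        # authentic briefs 72-88, fabricated brief 7
reported_delta = round(82.4 - 11.4, 1)   # group means from the 20-document study
archetype_accuracy = 13 / 13             # classifications consistent with expert review
```

Under these figures, `mata_gap` evaluates to `(65, 81)`, which suggests the 81-point figure is measured against the top of the authentic range, and `reported_delta` evaluates to `71.0`. A "zero overlap" claim corresponds to `has_overlap(...)` returning `False` on the per-document scores.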
What the Research Covers
Shannon Framework & Channel Capacity
Applying Shannon's 1948 information-theoretic framework to document reliability. The foundational theory behind why structural integrity measurement works and why the discrimination gap is so large.
AI Hallucination Detection
Structural integrity as a hallucination detection method. Why accuracy-based tools (RAG, citation verification) measure the wrong layer, and what Layer 3 adds to the complete integrity stack.
Model Collapse & Ecosystem Effects
How recursive AI training degrades documentary ecosystems. The Shumailov et al. (2024) findings applied to the institutional document corpus and long-run measurement infrastructure requirements.
Legal Accountability Theater
Structural integrity analysis of fabricated legal briefs. The Mata v. Avianca and Brigandi case studies, Rule 11 sanctions patterns, and FRE 707 regulatory implications for AI-generated legal content.
Corporate Disclosure Integrity
Retroactive scoring of SEC filings from SVB, Enron, and the broader EDGAR corpus. Score drift as a leading indicator. The accountability theater pattern in risk disclosures that precedes institutional failure.
Founding Document Benchmarks
High-integrity calibration corpus: Federalist No. 51 (91), Gettysburg Address (89), Declaration of Independence. The structural standard that existed before it was measurable is now measured.
Research Partnerships
Academic institutions and independent researchers interested in corpus access, methodology validation, or collaborative research are invited to reach out directly.