Co-founder · Principal engineer

Veridi

An agentic fact-checking methodology with four verification depth tiers, specialist routing, gaming countermeasures, and Brier-scored calibration. Shipped as a publicly available framework.

2026-present

The problem

Fact-checking at scale breaks in predictable ways. Single-pass LLM verification often produces plausible-sounding but wrong answers. Retrieval-augmented approaches inherit source bias and struggle with adversarial prompts. And most verification pipelines are uncalibrated: a confidence score may be attached, but it is grounded in nothing, so it doesn't reliably track accuracy.

Veridi is our answer to those three failures. It’s an agentic fact-checking methodology and implementation pipeline, designed so verification is structured, auditable, and calibrated, rather than a single LLM call wearing a lab coat.

The pipeline

Seven agents (A1 through A7) run on FastAPI with the Anthropic SDK and a local SQLite store. Each has a defined role, a handoff contract with the next agent, and a documented failure mode. Four verification depth tiers let a caller trade latency for rigor. Routing across eight domains keeps each agent’s context focused. Eleven gaming countermeasures block known manipulation patterns. Every output is Brier-scored, so stated confidence is measurable against observed accuracy.

What I’ve built

Architecture, engineering, and methodology, in roughly that order. The technical layer is a seven-agent FastAPI pipeline with explicit scaffold contracts and a cost model that accounts for API, storage, and compute per claim. The methodology layer is a 12,000-line corpus across 19 documents: eight domain specialists, a four-tier source hierarchy, eleven gaming countermeasures, and the Brier calibration framework. The product specification sits at 1,582 lines in version 1.3, covering failure modes, cost, and validation protocol; iteration continues.
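A per-claim cost model of the kind described above can be reduced to a small function. The rates and field names below are placeholders for illustration, not Veridi's actual pricing or accounting.

```python
def cost_per_claim(
    input_tokens: int,
    output_tokens: int,
    storage_kb: float,
    price_in: float = 3.0e-6,    # assumed $/input token
    price_out: float = 15.0e-6,  # assumed $/output token
    price_kb: float = 1.0e-7,    # assumed $/KB stored
) -> float:
    """Estimate the dollar cost of verifying one claim: API usage plus storage."""
    api = input_tokens * price_in + output_tokens * price_out
    storage = storage_kb * price_kb
    return api + storage
```

Because deeper tiers multiply token usage across more agents, a function like this is what lets a caller price the latency-for-rigor trade-off before choosing a tier.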

Validation

We published an empirical validation report covering 97 claims across 8 domains, 11 attack vectors, and 4 languages (Japanese, Turkish, Chinese, Hindi). Results: 96 pass, 1 partial, 0 fail. The same protocol runs as behavioral regression across prompt and scaffold changes, so silent degradation is detectable before it ships.
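Brier scoring, mentioned throughout, is just the mean squared error between stated confidence and the binary outcome. A minimal implementation (standard formula, not Veridi-specific code):

```python
def brier_score(forecasts: list[tuple[float, bool]]) -> float:
    """Mean squared error between stated confidence p (0..1) and the
    observed 0/1 outcome. 0.0 is perfect calibration and discrimination;
    0.25 is what a constant, uninformative 0.5 forecast earns."""
    return sum((p - float(outcome)) ** 2 for p, outcome in forecasts) / len(forecasts)
```

Running this over every verification output is what makes "stated confidence is measurable against observed accuracy" concrete: a rise in the score across a prompt or scaffold change is exactly the silent degradation the regression protocol is meant to catch.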

What’s next

Production service launches in April 2026, pending API provisioning; entity formation as a Canadian nonprofit will follow. Public framework, public methodology, public validation corpus.