Service - Factual Accuracy

Hallucination Detection

Claim-by-claim verification of AI outputs against source documents or domain knowledge. Our ML pipeline auto-extracts every factual claim; expert annotators verify each one. Delivered with a hallucination rate breakdown by category and severity so you know exactly what your model gets wrong and how badly.

Claim-by-Claim
Every factual claim extracted and individually verified, not just document-level review
4-Tier
Severity classification: Verified, Minor, Significant, and Critical hallucination
5 Days
Turnaround on free 50-output audit, no commitment required
Domain
Expert verifiers matched to your domain: legal, financial, automotive, gen AI
Claim Extraction · Fact Verification · Severity Scoring · Source Attribution · Domain Expert Review · Fabrication Detection · RAG Faithfulness · Citation Accuracy
Hallucination Detection Annotation
✓ VERIFIED CLAIM
✗ HALLUCINATED
? UNSUPPORTED
~ PARTIAL MATCH
▼ CLAIM VERIFICATION RATE
68% VERIFIED (scale 0% to 100%)
⚠ 14.2% HALLUCINATION RATE
Factual Accuracy - What It Is

Claim-by-claim verification for AI models that confabulate

Hallucination is not a vague quality problem. It is a specific, measurable failure: an AI model generates a factual claim that is false, unsupported, or fabricated. Claim-by-claim verification is the only way to measure hallucination rate with the precision needed to act on it. Every factual claim extracted by our ML pipeline is verified and delivered with severity tiering and hallucination rate by category.

Get a Free Audit →
Live Annotation Interface

Claim-Level Hallucination Verification Tool

Annotators decompose AI responses into atomic claims and verify each against source documents, building grounded training signal for factuality-focused fine-tuning.

ConcaveLabel Studio Hallucination Eval · Domain: Medical · Response #2,841
SOURCE: WHO Technical Report on Metformin, Chapter 4, Dosage Guidelines, 2023 Ed.
"Metformin is the first-line pharmacological treatment for Type 2 diabetes in most clinical guidelines."
↳ Source: WHO Report §4.1, confirmed verbatim
VERIFIED ✓
"The standard adult dosage of Metformin is 500mg three times daily, with a maximum of 3,000mg per day."
↳ Source: WHO Report §4.3 states max is 2,550mg/day; figure hallucinated
HALLUCINATED ✗
"Metformin is contraindicated in patients with severe renal impairment (eGFR < 30 mL/min/1.73m²)."
↳ Source: WHO Report §4. confirmed with exact threshold
VERIFIED ✓
"Recent studies show Metformin may reduce the risk of certain cancers by up to 30%."
↳ Not present in source document; extrapolation from unspecified studies
UNSUPPORTED ?
"Gastrointestinal side effects are the most common reason for Metformin discontinuation."
↳ Source §4.8 confirms GI effects are common but doesn't rank discontinuation reasons
PARTIAL ~
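
The verdicts in the panel above roll up into the headline rates. What follows is a minimal sketch, assuming each claim carries exactly one of the four statuses shown and that the hallucination rate counts only HALLUCINATED claims. The `Claim` class and `rates` helper are illustrative names, not part of the delivered tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    status: str  # one of: VERIFIED, HALLUCINATED, UNSUPPORTED, PARTIAL


def rates(claims):
    """Return the share of claims in each status, as fractions of all claims."""
    if not claims:
        return {}
    counts = Counter(c.status for c in claims)
    return {status: n / len(claims) for status, n in counts.items()}


# The five claims from Response #2,841 above, abbreviated.
response_2841 = [
    Claim("First-line treatment for Type 2 diabetes", "VERIFIED"),
    Claim("Maximum of 3,000mg per day", "HALLUCINATED"),
    Claim("Contraindicated below eGFR 30", "VERIFIED"),
    Claim("May reduce cancer risk by up to 30%", "UNSUPPORTED"),
    Claim("GI effects are the top discontinuation reason", "PARTIAL"),
]

summary = rates(response_2841)
print(f"verified: {summary['VERIFIED']:.0%}")          # verified: 40%
print(f"hallucinated: {summary['HALLUCINATED']:.0%}")  # hallucinated: 20%
```

Whether UNSUPPORTED and PARTIAL claims also count toward the hallucination rate is a reporting choice; keeping them as separate statuses, as here, lets the reader apply either definition.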
How It Works

Three things the pipeline does on every hallucination audit

Claim-level factual decomposition
Responses are broken into individual factual claims before evaluation. There is no holistic scoring that misses individual hallucinations buried in otherwise accurate text. Claim boundaries are logged with every delivery.
Source attribution verification
Every claim is verified against grounding documents, knowledge cutoff, or cited source. Findings fall into three classes (factual error, unsupported claim, and misleading framing), each handled differently in the remediation output.
Severity scores published per batch
Hallucination rate by claim type, domain, and output category, each verifiable against the raw evidence logs rather than reduced to a summary score. Every flagged claim includes the source passage it contradicts.
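
A per-batch breakdown of this kind is a grouped ratio over the claim records. The sketch below assumes each record is a dictionary with a boolean `flagged` field plus grouping fields such as `claim_type` and `domain`. Those field names and the sample batch are hypothetical, chosen only to illustrate the calculation.

```python
from collections import defaultdict


def rate_by(records, key):
    """Fraction of flagged claims per value of `key` (e.g. claim type or domain)."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for rec in records:
        totals[rec[key]] += 1
        if rec["flagged"]:
            flagged[rec[key]] += 1
    return {group: flagged[group] / totals[group] for group in totals}


# Hypothetical batch: illustrative records only, not real audit data.
batch = [
    {"claim_type": "dosage", "domain": "medical", "flagged": True},
    {"claim_type": "dosage", "domain": "medical", "flagged": False},
    {"claim_type": "citation", "domain": "legal", "flagged": True},
    {"claim_type": "citation", "domain": "medical", "flagged": False},
    {"claim_type": "citation", "domain": "legal", "flagged": False},
]

by_type = rate_by(batch, "claim_type")
by_domain = rate_by(batch, "domain")
```

Because the rates are computed from the individual records, any figure in the summary can be traced back to the claims that produced it.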
Pipeline Capabilities

What the infrastructure delivers

Domain-Expert Routing
Factual claims route to reviewers with appropriate domain credentials—medical claims to licensed practitioners, legal claims to qualified lawyers, technical claims to subject specialists.
Sentence-Level Error Taxonomy
Every hallucination is classified by error type—factual error, outdated claim, entity confusion, fabricated citation—with severity grading per domain context.
Calibrated Uncertainty Pairs
Preferred responses flag model uncertainty rather than asserting false confidence, training the model to say "I don't know" when the correct answer is genuinely unavailable.
Hallucination Types

Six hallucination patterns we systematically detect

🔮
Fabrication
Completely invented facts: entities, statistics, citations, or events that do not exist. The most dangerous hallucination type because the claim is entirely false with no basis in reality.
Example: "According to the WHO 2023 study by Dr. Patel et al. (n=12,000)..." where the study does not exist
🔀
Substitution
Real entities or facts that are swapped or misattributed: real names, real studies, or real statistics applied to the wrong context, year, or conclusion.
Example: Correctly citing a real study but stating the wrong finding, or attributing one author's work to another
📅
Temporal Drift
Presenting outdated information as current: regulatory requirements that have changed, drug dosages that have been revised, case law that has been overturned, or statistics from a prior year presented as the current figure.
Example: "The current VAT rate for X is 20%" when the rate was revised to 17% in a recent budget
🔢
Numerical Error
Incorrect figures, statistics, calculations, or quantities. Even small errors in medical dosages, financial figures, or legal penalties can have significant consequences.
Example: Correct drug name, correct indication, but wrong recommended dosage or wrong contraindication threshold
🌀
Over-generalisation
A claim that was true in a specific, limited context is stated as universally applicable: applying findings from one population to all populations, or applying a rule that has many exceptions without noting them.
Example: "All Type 2 diabetes patients should avoid X" when the actual guidance is conditional on comorbidities
📚
RAG Faithfulness Failure
In retrieval-augmented systems, the generation contradicts or diverges from the retrieved source documents: the model ignores or misrepresents its own retrieved context to produce a more "fluent" response.
Example: Retrieved document says "X is contraindicated in pregnancy"; model generates "X is safe during pregnancy" for a smoother response flow
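
Of the six patterns above, Numerical Error is the easiest to pre-screen mechanically. The sketch below is a simple heuristic, not a description of the production pipeline: it pulls the numbers out of a claim and flags any that never appear in the source passage, so a reviewer can look at them first. The function names are illustrative, and the source passage is a hypothetical stand-in for the real document text.

```python
import re

# Matches integers and decimals, allowing thousands separators (e.g. "2,550").
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def numbers_in(text):
    """Extract numeric values from text, ignoring thousands separators."""
    return {float(m.group().replace(",", "")) for m in NUMBER.finditer(text)}


def unsupported_numbers(claim, source_passage):
    """Numbers stated in the claim that never appear in the source passage."""
    return numbers_in(claim) - numbers_in(source_passage)


claim = (
    "The standard adult dosage of Metformin is 500mg three times daily, "
    "with a maximum of 3,000mg per day."
)
# Hypothetical paraphrase of the source section, for illustration only.
passage = "Start at 500mg; the maximum is 2,550mg per day."

suspects = unsupported_numbers(claim, passage)
print(suspects)  # {3000.0}
```

A check like this only surfaces candidates. It cannot tell whether a matching number is attached to the right quantity, which is why the verdict still rests with a domain reviewer.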
What You Get

Claim-level evidence, not an impressionistic score

Per-Claim Verification Report
Every factual claim extracted from your AI outputs, with: verification status, severity tier, correct version (where wrong), authoritative source citation, domain category, and hallucination type classification. Delivered as structured JSON and formatted PDF.
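
As a rough illustration of what one entry in the structured JSON might look like, the sketch below models the fields listed above as a Python dataclass and serialises it. The field names and the severity value are assumptions made for the example; the actual delivery schema may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ClaimFinding:
    claim: str
    status: str                        # e.g. "verified" or "hallucinated"
    severity: str                      # Verified, Minor, Significant, or Critical
    correction: Optional[str]          # correct version, only where the claim is wrong
    source_citation: Optional[str]     # authoritative source for the verdict
    domain: str
    hallucination_type: Optional[str]  # e.g. "fabrication", "numerical"


finding = ClaimFinding(
    claim="Maximum of 3,000mg per day.",
    status="hallucinated",
    severity="Critical",  # illustrative tier, not taken from a real report
    correction="Maximum of 2,550mg per day.",
    source_citation="WHO Report §4.3",
    domain="medical",
    hallucination_type="numerical",
)

record = json.dumps(asdict(finding), ensure_ascii=False, indent=2)
print(record)
```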
Hallucination Rate Analytics
Overall hallucination rate, broken down by: severity tier (Critical/Significant/Minor), hallucination type (fabrication/substitution/temporal/numerical), domain category, output position (where in responses hallucinations concentrate), and query type (which prompts trigger highest hallucination rates).
Corrective Training Pairs
RLHF preference pairs for every Critical and Significant finding: the rejected output is the hallucinating version, and the chosen response is a domain-expert verified correct alternative. Ready to integrate into your RLHF or DPO training pipeline to improve factual accuracy.
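
One such pair could be written as a single JSON Lines record, as sketched below. The `prompt` / `chosen` / `rejected` keys follow a layout commonly used by preference-tuning tooling, but that is an assumption here; your training pipeline may expect different names. The helper and the sample prompt are hypothetical.

```python
import json


def to_preference_pair(prompt, hallucinated_output, verified_output):
    """Build one record: the verified answer is chosen, the hallucination rejected."""
    if hallucinated_output == verified_output:
        raise ValueError("chosen and rejected responses must differ")
    return {
        "prompt": prompt,
        "chosen": verified_output,
        "rejected": hallucinated_output,
    }


pair = to_preference_pair(
    prompt="What is the maximum daily dose of Metformin?",  # hypothetical prompt
    hallucinated_output="The maximum is 3,000mg per day.",
    verified_output="The maximum is 2,550mg per day.",
)

# One JSON object per line is the usual JSONL convention.
jsonl_line = json.dumps(pair, ensure_ascii=False)
```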
Pricing

Per-output transparent pricing

Priced per AI output evaluated, based on domain and average claim density. Volume discounts at 500+ outputs. Free 50-output audit with no commitment.

Get 50 Outputs Audited Free →
General-purpose outputs: $3.50–6 / output
Technical / code outputs: $5–8 / output
Medical / legal / financial outputs: $10–18 / output
RAG faithfulness evaluation: $6–12 / output
Free audit: 50 outputs / $0
Free audit turnaround: 5 working days
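
For budgeting, the rate card above turns into a simple low/high estimate. The sketch below uses the listed per-output ranges; the category keys are shorthand introduced for the example, and no volume discount is applied because the discount size is not stated here.

```python
# Per-output price ranges in USD, copied from the rate card above.
RATES = {
    "general": (3.50, 6.00),
    "technical": (5.00, 8.00),
    "regulated": (10.00, 18.00),  # medical / legal / financial
    "rag": (6.00, 12.00),
}


def estimate(category, outputs):
    """Return the (low, high) list-price estimate before any volume discount."""
    if outputs < 0:
        raise ValueError("outputs must be non-negative")
    low, high = RATES[category]
    return (low * outputs, high * outputs)


low, high = estimate("regulated", 200)
print(f"${low:,.0f} to ${high:,.0f}")  # $2,000 to $3,600
```

Where a job lands inside the range depends on domain and average claim density, so treat the result as a bracket rather than a quote.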

Find out your model's hallucination rate free

Send us 50 of your AI model's outputs. We will return a complete claim-by-claim verification report with hallucination rate and severity breakdown in 5 working days. No cost, no commitment.