Service - Factual Accuracy

Hallucination Detection

Claim-by-claim verification of AI outputs against source documents or domain knowledge. Our ML pipeline auto-extracts every factual claim; expert annotators verify each one. Delivered with a hallucination rate breakdown by category and severity so you know exactly what your model gets wrong and how badly.

Get 50 Outputs Audited Free → View Pricing

Claim-by-Claim

Every factual claim extracted and individually verified, not just document-level review

4-Tier

Severity classification: Verified, Minor, Significant, and Critical hallucination

5 Days

Turnaround on free 50-output audit, no commitment required

Domain

Expert verifiers matched to your domain: legal, financial, automotive, gen AI

✓ VERIFIED CLAIM

✗ HALLUCINATED

? UNSUPPORTED

~ PARTIAL MATCH

▼ CLAIM VERIFICATION RATE

0% · 68% VERIFIED · 100%

⚠ 14.2% HALLUCINATION RATE

Factual Accuracy - What It Is

Claim-by-claim verification for AI models that confabulate

Hallucination is not a vague quality problem; it is a specific, measurable failure in which an AI model generates a factual claim that is false, unsupported, or fabricated. Claim-by-claim verification is the only way to measure hallucination rate with the precision needed to act on it. Every factual claim extracted by our ML pipeline is verified and delivered with severity tiering and hallucination rate by category.

Get a Free Audit →

Live Annotation Interface

Claim-Level Hallucination Verification Tool

Annotators decompose AI responses into atomic claims and verify each against source documents, building grounded training signal for factuality-focused fine-tuning.

ConcaveLabel Studio Hallucination Eval · Domain: Medical · Response #2,841

SOURCE: WHO Technical Report on Metformin, Chapter 4, Dosage Guidelines, 2023 Ed.

"Metformin is the first-line pharmacological treatment for Type 2 diabetes in most clinical guidelines."

↳ Source: WHO Report §4.1, confirmed verbatim

VERIFIED ✓

"The standard adult dosage of Metformin is 500mg three times daily, with a maximum of 3,000mg per day."

↳ Source: WHO Report §4.3 states max is 2,550mg/day; figure hallucinated

HALLUCINATED ✗

"Metformin is contraindicated in patients with severe renal impairment (eGFR < 30 mL/min/1.73m²)."

↳ Source: WHO Report §4. confirmed with exact threshold

VERIFIED ✓

"Recent studies show Metformin may reduce the risk of certain cancers by up to 30%."

↳ Not present in source document; extrapolation from unspecified studies

UNSUPPORTED ?

"Gastrointestinal side effects are the most common reason for Metformin discontinuation."

↳ Source §4.8 confirms GI effects are common but doesn't rank discontinuation reasons

PARTIAL ~
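As a rough illustration, the five verdicts in the example above can be tallied into per-status shares. This is a minimal sketch: the tuple layout and the `status_rates` helper are hypothetical, and because the page does not say how UNSUPPORTED and PARTIAL claims count toward the headline hallucination rate, the code only reports each status separately.

```python
from collections import Counter

# Statuses copied from the five claims in the example above.
# The short claim labels are paraphrases used only for readability.
claims = [
    ("first-line treatment for Type 2 diabetes", "VERIFIED"),
    ("500mg three times daily, max 3,000mg per day", "HALLUCINATED"),
    ("contraindicated in severe renal impairment", "VERIFIED"),
    ("may reduce cancer risk by up to 30%", "UNSUPPORTED"),
    ("GI effects most common reason for discontinuation", "PARTIAL"),
]


def status_rates(records):
    """Return the share of claims carrying each status, as fractions of 1."""
    counts = Counter(status for _, status in records)
    total = len(records)
    return {status: counts[status] / total for status in counts}


rates = status_rates(claims)
for status, share in rates.items():
    print(f"{status}: {share:.0%}")
```

For this single response, that works out to 40% verified and 20% each for hallucinated, unsupported, and partial.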

How It Works

Three things the pipeline does on every hallucination audit

Claim-level factual decomposition

Responses are broken into individual factual claims before evaluation, so there is no holistic scoring that misses individual hallucinations buried in otherwise accurate text. Claim boundaries are logged with every delivery.
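One way to picture logged claim boundaries is as character offsets into the original response. The sketch below is an assumption, not the actual extraction pipeline: it stands in a naive one-sentence-per-claim splitter for the ML extractor, and `ClaimSpan` and `split_into_claims` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class ClaimSpan:
    start: int  # offset of the claim's first character in the response
    end: int    # offset just past the claim's last character
    text: str


def split_into_claims(response: str) -> list[ClaimSpan]:
    """Naive stand-in for claim extraction: treat each sentence as one claim."""
    spans = []
    for match in re.finditer(r"[^.!?]+[.!?]", response):
        raw = match.group()
        text = raw.strip()
        # Skip leading whitespace so the offsets point at the claim itself.
        start = match.start() + (len(raw) - len(raw.lstrip()))
        spans.append(ClaimSpan(start, start + len(text), text))
    return spans


response = (
    "Metformin is a first-line treatment. "
    "The maximum dose is 3,000mg per day."
)
spans = split_into_claims(response)
for span in spans:
    print(span.start, span.end, span.text)
```

Storing offsets rather than copied text means every verdict can be traced back to the exact passage in the response. A real extractor would also need to handle decimals, abbreviations, and sentences that carry more than one claim, which this splitter does not.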

Source attribution verification

Every claim is verified against grounding documents, knowledge cutoff, or cited source. Three-tier classification of findings: factual error, unsupported claim, and misleading framing, each handled differently in the remediation output.

Severity scores published per batch

Hallucination rate by claim type, domain, and output category, each verifiable against the raw evidence logs rather than reported only as a summary score. Every flagged claim includes the source passage it contradicts.

Pipeline Capabilities

What the infrastructure delivers

Domain-Expert Routing

Factual claims route to reviewers with appropriate domain credentials—medical claims to licensed practitioners, legal claims to qualified lawyers, technical claims to subject specialists.

Sentence-Level Error Taxonomy

Every hallucination is classified by error type—factual error, outdated claim, entity confusion, fabricated citation—with severity grading per domain context.

Calibrated Uncertainty Pairs

Preferred responses flag model uncertainty rather than asserting false confidence, training the model to say "I don't know" when the correct answer is genuinely unavailable.

Hallucination Types

Six hallucination patterns we systematically detect

🔮

Fabrication

Completely invented facts: entities, statistics, citations, or events that do not exist. The most dangerous hallucination type because the claim is entirely false with no basis in reality.

Example: "According to the WHO 2023 study by Dr. Patel et al. (n=12,000)..." where the study does not exist

🔀

Substitution

Real entities or facts that are swapped or misattributed: real names, real studies, or real statistics applied to the wrong context, year, or conclusion.

Example: Correctly citing a real study but stating the wrong finding, or attributing one author's work to another

📅

Temporal Drift

Presenting outdated information as current: regulatory requirements that have changed, drug dosages that have been revised, case law that has been overturned, or statistics from a prior year presented as the current figure.

Example: "The current VAT rate for X is 20%" when the rate was revised to 17% in a recent budget

🔢

Numerical Error

Incorrect figures, statistics, calculations, or quantities. Even small errors in medical dosages, financial figures, or legal penalties can have significant consequences.

Example: Correct drug name, correct indication, but wrong recommended dosage or wrong contraindication threshold

🌀

Over-generalisation

A claim that was true in a specific, limited context is stated as universally applicable: applying findings from one population to all populations, or applying a rule that has many exceptions without noting them.

Example: "All Type 2 diabetes patients should avoid X" when the actual guidance is conditional on comorbidities

📚

RAG Faithfulness Failure

In retrieval-augmented systems, the generation contradicts or diverges from the retrieved source documents: the model ignores or misrepresents its own retrieved context to produce a more "fluent" response.

Example: Retrieved document says "X is contraindicated in pregnancy"; model generates "X is safe during pregnancy" for a smoother response flow

What You Get

Claim-level evidence, not an impressionistic score

Per-Claim Verification Report

Every factual claim extracted from your AI outputs, with: verification status, severity tier, correct version (where wrong), authoritative source citation, domain category, and hallucination type classification. Delivered as structured JSON and formatted PDF.
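To make the listed fields concrete, here is a sketch of what one per-claim record could look like when serialized to JSON. The field names and the `ClaimReport` class are hypothetical, not the delivered schema; the claim and source come from the Metformin example on this page, while the severity tier is an assumed value for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ClaimReport:
    claim: str                         # the extracted factual claim
    status: str                        # e.g. VERIFIED, HALLUCINATED
    severity: str                      # severity tier
    correction: Optional[str]          # correct version, where the claim is wrong
    source: str                        # authoritative source citation
    domain: str                        # domain category
    hallucination_type: Optional[str]  # e.g. fabrication, numerical


record = ClaimReport(
    claim=(
        "The standard adult dosage of Metformin is 500mg three times daily, "
        "with a maximum of 3,000mg per day."
    ),
    status="HALLUCINATED",
    severity="Critical",  # assumed tier; the page does not state one for this claim
    correction="Maximum is 2,550mg/day.",
    source="WHO Report §4.3",
    domain="Medical",
    hallucination_type="numerical",
)

report_json = json.dumps(asdict(record), indent=2, ensure_ascii=False)
print(report_json)
```

Keeping `correction` and `hallucination_type` optional lets verified claims share the same record shape as flagged ones.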

Hallucination Rate Analytics

Overall hallucination rate, broken down by: severity tier (Critical/Significant/Minor), hallucination type (fabrication/substitution/temporal/numerical), domain category, output position (where in responses hallucinations concentrate), and query type (which prompts trigger highest hallucination rates).
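A breakdown like this is a grouped rate: hallucinated claims divided by all claims in each group. The sketch below shows the arithmetic on invented sample data. The records, the `hallucination_rate_by` helper, and the three-way position bucketing are all assumptions made for illustration, not the production analytics.

```python
from collections import defaultdict


def hallucination_rate_by(claims, key):
    """Group claims by `key` and return the hallucinated fraction per group."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for claim in claims:
        group = claim[key]
        totals[group] += 1
        if claim["hallucinated"]:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


def position_bucket(offset, length):
    """Map a claim's character offset to the first, middle, or last third."""
    third = min(2, offset * 3 // length)
    return ("start", "middle", "end")[third]


# Hypothetical claims: offset of the claim and length of its response.
claims = [
    {"domain": "medical", "offset": 0, "length": 900, "hallucinated": False},
    {"domain": "medical", "offset": 400, "length": 900, "hallucinated": False},
    {"domain": "medical", "offset": 850, "length": 900, "hallucinated": True},
    {"domain": "legal", "offset": 100, "length": 600, "hallucinated": False},
    {"domain": "legal", "offset": 550, "length": 600, "hallucinated": True},
]
for claim in claims:
    claim["position"] = position_bucket(claim["offset"], claim["length"])

by_domain = hallucination_rate_by(claims, "domain")
by_position = hallucination_rate_by(claims, "position")
print(by_domain)
print(by_position)
```

In this made-up sample every hallucination falls in the last third of its response, which is the kind of concentration the output-position breakdown is meant to surface.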

Corrective Training Pairs

RLHF preference pairs for every Critical and Significant finding: the rejected output is the hallucinating version, and the chosen response is a domain-expert-verified correct alternative. Ready to integrate into your RLHF or DPO training pipeline to improve factual accuracy.
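As a sketch of how such pairs might be assembled, the snippet below filters findings to the Critical and Significant tiers and emits `prompt` / `chosen` / `rejected` records, a layout commonly used for preference data. The input field names, the sample findings, and the `to_preference_pairs` helper are hypothetical; the delivered format may differ.

```python
import json

TRAINING_TIERS = {"Critical", "Significant"}


def to_preference_pairs(findings):
    """Build one preference pair per Critical or Significant finding."""
    pairs = []
    for finding in findings:
        if finding["severity"] in TRAINING_TIERS:
            pairs.append({
                "prompt": finding["prompt"],
                "chosen": finding["corrected_output"],  # expert-verified version
                "rejected": finding["model_output"],    # hallucinating version
            })
    return pairs


# Hypothetical findings; the dosage figures come from the example on this page.
findings = [
    {
        "prompt": "What is the maximum daily dose of Metformin?",
        "model_output": "The maximum is 3,000mg per day.",
        "corrected_output": "The maximum is 2,550mg per day.",
        "severity": "Critical",
    },
    {
        "prompt": "Is Metformin a first-line treatment?",
        "model_output": "Yes, in all guidelines.",
        "corrected_output": "Yes, in most clinical guidelines.",
        "severity": "Minor",
    },
]

pairs = to_preference_pairs(findings)
for pair in pairs:
    print(json.dumps(pair))  # one JSON object per line (JSONL)
```

Only the Critical finding survives the filter here; the Minor one is left out of the training set.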

Pricing

Per-output transparent pricing

Priced per AI output evaluated, based on domain and average claim density. Volume discounts at 500+ outputs. Free 50-output audit with no commitment.

Get 50 Outputs Audited Free →

General-purpose outputs: $3.50–6 / output

Technical / code outputs: $5–8 / output

Medical / legal / financial outputs: $10–18 / output

RAG faithfulness evaluation: $6–12 / output

Free audit: 50 outputs / $0

Free audit turnaround: 5 working days
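For budgeting, the per-output ranges above translate into a simple low and high estimate. This is a back-of-envelope sketch only: the category keys and the `estimate_cost` helper are hypothetical, and no volume discount is applied because the discount rate at 500+ outputs is not stated on this page.

```python
# Per-output price ranges in USD, copied from the table above.
PRICE_RANGES = {
    "general": (3.50, 6.00),
    "technical": (5.00, 8.00),
    "medical_legal_financial": (10.00, 18.00),
    "rag_faithfulness": (6.00, 12.00),
}


def estimate_cost(category, outputs):
    """Return (low, high) cost in USD for `outputs` evaluated outputs."""
    low, high = PRICE_RANGES[category]
    return outputs * low, outputs * high


low, high = estimate_cost("technical", 200)
print(f"200 technical outputs: ${low:,.2f} to ${high:,.2f}")
```

Where a batch lands within the range depends on domain and average claim density, so the figures are bounds rather than a quote.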