About Concave AI

Built for the gap nobody else is filling

An ML-engineer-led AI data company from Bengaluru. We produce the RLHF, NLP, SFT, and GenAI evaluation data that determines whether your AI model works in the real world — with quality metrics you can verify, not just trust.

Start a Conversation →
Our Quality Standards
🎯
Founded by an ML engineer, not a BPO operator
Every quality decision is made by someone who understands what the annotation data needs to do downstream in a training pipeline — not a project manager following a checklist.
🇮🇳
India-native. India-priced. India-context aware.
Bengaluru-based with Indic language capability across Hindi, Tamil, Telugu, Kannada, Malayalam, Bengali, and more. DPDP Act 2023 compliant. No 11-hour time-zone gap.
ML-Engineer Led · Bengaluru India · Published Kappa Scores · 8 Indic Languages · Founder Owned · Vendor Neutral · DPDP Act 2023 · 3-Tier QA
The Name

What Concave means — and why it matters

A concave shape curves inward — it focuses everything that enters it toward a single point of precision. That is exactly what we do with raw, unstructured data: we curve it inward through expert human judgment and AI-assisted systems until it converges on high-quality, precisely labeled training data.

Concave
Latin: concavus — curved inward, hollow
In optics, a concave mirror focuses light to a precise point. In data annotation, we focus raw human feedback, model outputs, and unstructured text to a single, precisely measured quality output — the training data your model actually needs.

There is a version of the AI failure story that every ML engineer knows. The model is trained, deployed, and proceeds to hallucinate, contradict itself, and validate flatly wrong answers with total confidence. The team reaches for a bigger model, a different architecture, more compute. None of it helps. Because the problem was never in the model.

The problem was in the data. Specifically: who annotated it, how consistently, under what guidelines, with what domain knowledge, measured by what metrics. These are the questions that determine whether a fine-tuned model is genuinely aligned or just statistically plausible.

"The quality of the intelligence you build is exactly the quality of the intelligence you put in."

Concave AI was built to close the gap between the annotation quality that frontier AI labs get from Surge AI — expert-vetted, measured, published — and what is available to Indian AI companies at Indian prices. In 2026, that gap is still enormous. We are closing it.

We are not a data labeling BPO that pivoted to AI. We are not a crowdsourcing platform. We are an ML-engineer-led AI data company that treats every annotation decision as a training signal and every delivery as a model quality intervention.

Our Mission

Training data built with engineering rigour

Concave AI was founded by an ML engineer who got tired of watching model quality degrade because of low-quality annotation. Every process we build reflects that original frustration.

Get a Free Audit →
Why We Exist

The market gap we are built to close

India's AI companies were left without a credible, technically rigorous, India-based data partner. They were either paying USD rates to Western vendors or accepting inconsistent quality from generic providers.

01 / The problem
Generic vendors claim quality they cannot measure
Indian annotation companies overwhelmingly claim "98% accuracy" — a number that is unmeasurable, unverifiable, and meaningless in practice. Inter-annotator agreement, gold standard monitoring, anomaly detection — these systematic quality controls simply do not exist at most Indian providers. The result is inconsistent data that produces models that are unreliable in production.
02 / The gap
Western providers cannot serve India-context work
Surge AI is exceptional — but US-centric, 11 hours behind, and sized for frontier lab budgets. Scale AI is powerful — but now Meta-affiliated, which makes every Indian AI startup uncomfortable about sending training data there. iMerit is credible — but enterprise-only pricing excludes the Series A Indian AI company that needs 1,000 RLHF pairs at a startup budget, fast.
03 / Our answer
Expert quality, India pricing, published metrics
We apply Surge-level rigour — domain-expert annotators, project-specific guidelines, three-tier QA, published kappa scores — at pricing that works for Indian AI startups. We are in Bengaluru, so the time zone is right. We have native annotators across 8 Indic languages. We are independently owned, so your training data never sees a competitor. And we publish the numbers.
The Problem We Solve

Three types of annotation failure. All preventable.

Every AI model quality problem traces back to one of three annotation failures. We are specifically designed to prevent all three.

Failure Type 1
Sycophancy baked in at the data level
When annotators reward agreeable-sounding responses over accurate ones — even without realising it — the model learns to validate user beliefs rather than provide truthful answers. This is a data-level problem that no amount of RLHF training can fix if the preference data itself is sycophantic. Most annotation teams never measure it. We inject sycophancy traps and measure susceptibility on every project.
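Measuring that susceptibility is mechanical once trap items are mixed into a batch. A minimal sketch, assuming an illustrative 10% trap rate and example field names (nothing here is our production configuration):

```python
import random

# Sketch: mix sycophancy traps into a live batch, then score each
# annotator's susceptibility. The trap rate, field names, and the
# "agreeable_wrong" convention are illustrative assumptions.

def inject_traps(tasks, traps, trap_rate=0.10, seed=7):
    """Return a shuffled batch of live tasks plus a slice of trap items,
    where each trap pairs an accurate response with a flattering wrong one."""
    rng = random.Random(seed)
    n_traps = min(len(traps), max(1, int(len(tasks) * trap_rate)))
    batch = tasks + rng.sample(traps, n_traps)
    rng.shuffle(batch)
    return batch

def sycophancy_rate(annotations):
    """Fraction of trap items where the annotator preferred the agreeable,
    factually wrong response over the accurate one."""
    traps = [a for a in annotations if a.get("is_trap")]
    if not traps:
        return 0.0
    return sum(a["chosen"] == "agreeable_wrong" for a in traps) / len(traps)
```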
Failure Type 2
Domain errors from unqualified annotators
A medical AI trained by annotators who cannot distinguish a contraindication from a side effect. A legal AI trained by annotators who confuse jurisdiction with precedent. A financial AI trained by annotators who cannot read a balance sheet. These are not edge cases — they are the standard outcome of generic crowdsourcing applied to specialist domains. We maintain doctor, lawyer, CA, and engineer annotator pools for exactly this reason.
Failure Type 3
Annotator drift and inconsistency
Two annotators apply the same guideline differently, or one annotator applies it differently in week three than in week one. Without inter-annotator agreement tracking and gold standard monitoring, this drift is invisible until the model trained on it behaves erratically in production. We establish a kappa baseline per annotator before live work begins and inject gold standard tasks throughout every batch to catch drift in real time.
Our Solution
Expert judgment + published proof
Domain-expert annotators calibrated to a kappa baseline before every project. RLAIF pre-scorer handling clear-cut cases at AI speed. Three-tier QA running concurrently. Gold standard injection catching drift in real time. And a data card with every delivery that shows exactly what you received and how it was produced — so you never have to guess whether the data is good enough.
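To make the drift-catching piece concrete: a rolling monitor over injected gold tasks is enough to flag an annotator whose accuracy is slipping. A minimal sketch, assuming an illustrative 50-task window and 0.85 accuracy floor (not our production settings):

```python
from collections import defaultdict, deque

# Sketch of gold-standard drift monitoring: track each annotator's rolling
# accuracy on injected gold tasks and flag a drop below a floor. The window
# size and floor value are illustrative assumptions.

class GoldStandardMonitor:
    def __init__(self, window=50, floor=0.85):
        self.window = window
        self.floor = floor
        self.hits = defaultdict(lambda: deque(maxlen=window))

    def record(self, annotator_id, label, gold_label):
        """Call whenever an annotator completes an injected gold task."""
        self.hits[annotator_id].append(label == gold_label)

    def is_drifting(self, annotator_id):
        """True once a full window of gold tasks falls below the floor."""
        seen = self.hits[annotator_id]
        if len(seen) < self.window:
            return False  # not enough gold evidence yet
        return sum(seen) / len(seen) < self.floor
```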
What We Stand For

Five principles that govern every project

01
Transparency over claims
If you cannot measure it, we do not claim it
Every delivery includes a QA report with actual Cohen's kappa scores broken down by annotator pair and task category, gold standard accuracy per annotator, and a complete batch error log. You do not have to take our word for the quality — you have the numbers to verify it yourself. No other Indian annotation company does this as a standard deliverable on every project.
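For concreteness, the report's shape can be sketched as a plain structure; every field name and number below is an illustrative example, not a fixed schema or real project data:

```python
# Illustrative shape of a per-delivery QA report; all names and values
# are examples only.
qa_report = {
    "batch_id": "example-batch-001",
    "cohens_kappa": {
        "overall": 0.78,
        "by_annotator_pair": {("ann_03", "ann_07"): 0.81},
        "by_task_category": {"preference_ranking": 0.76, "safety_review": 0.83},
    },
    "gold_standard_accuracy": {"ann_03": 0.94, "ann_07": 0.91},
    "batch_error_log": [
        {"task_id": "t-0042", "issue": "guideline misapplied", "resolution": "relabelled"},
    ],
}
```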
02
ML-native quality design
Built by someone who trains models, not just manages annotators
Our quality systems are designed by an ML engineer who understands what downstream model training needs. We design annotation rubrics that account for reward hacking, sycophancy, and distribution shift. We build automated QA pipelines that catch failures before a human reviewer sees the data. We audit for ML failure modes — not just labeling consistency. This is a different quality model from anything a BPO operator can offer.
03
Domain expertise is non-negotiable
Generic annotators produce generic results
A healthcare AI trained on annotations by people who cannot read a clinical note will fail at the moments that matter. A legal AI trained on annotations by someone who confuses common law with statute will produce dangerous outputs. We maintain specialist annotator pools — MBBS doctors, practising lawyers, chartered accountants, software engineers, academic NLP researchers — for every domain we serve. Generic crowd access is available to anyone. Expert domain access is our product.
04
Complete vendor neutrality
Your training data never sees a competitor
When Meta acquired a 49% stake in Scale AI in 2025, Google, OpenAI, and xAI moved their annotation work elsewhere within weeks — because they were not willing to share training data with a competitor's portfolio company. Concave AI is founder-owned, has no investment from any AI lab, cloud provider, or model company, and will never accept such investment. Your proprietary training data is yours and stays yours. DPDP Act 2023 compliant for Indian data subjects.
05
The feedback loop is the product
We stay until your model actually improves
Most annotation companies deliver a data file and disappear. We build a mandatory model performance follow-up into every engagement — two weeks after delivery, we ask for your benchmark result. If our data produced a measurable improvement, that result becomes the evidence we use to improve the next batch. If it did not produce the expected improvement, we investigate the cause and re-deliver at no cost. This is the loop that turns one project into a partnership.
How We Work

The model behind the quality

We do not employ a 5,000-person annotation workforce on fixed salaries. We maintain a curated network of vetted domain-expert contractors — and use AI to handle the volume, so expert humans can focus exclusively on judgment.

Every project has three layers working in sequence. The RLAIF layer (Claude API as AI evaluator) handles clear-cut tasks at AI speed, pre-scoring RLHF pairs and pre-labelling NLP entities before any human sees them. The human expert layer handles all uncertain cases, edge cases, and domain-sensitive content that requires genuine judgment. The QA layer — automated, peer, and expert tiers — runs concurrently on all live annotation.
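A minimal sketch of that routing, assuming an illustrative 0.9 confidence cut-off and a stand-in ai_evaluator callable in place of the real API call:

```python
# Sketch of the three layers described above. The confidence cut-off,
# function names, and the ai_evaluator callable are illustrative assumptions.

CONFIDENCE_FLOOR = 0.9  # assumed boundary between "clear-cut" and "uncertain"

def route(task, ai_evaluator):
    """Layer 1 pre-scores; anything uncertain or domain-sensitive escalates
    to the human expert queue (layer 2)."""
    label, confidence = ai_evaluator(task)
    if confidence >= CONFIDENCE_FLOOR and not task.get("domain_sensitive"):
        return {"task": task, "label": label, "source": "rlaif"}
    return {"task": task, "label": None, "source": "human_expert"}

def qa_stream(annotated, sample_every=5):
    """Layer 3 runs concurrently: a fixed slice of all live annotation is
    pulled for automated + peer + expert review."""
    return annotated[::sample_every]
```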

The result: AI speed on volume. Human precision on judgment. Published metrics on output. 40–60% faster than pure-human annotation at the same quality level, at 35–45% lower cost. That cost advantage is passed to customers — not kept as margin.

Our annotator network includes specialists across every domain we serve. Each annotator completes a paid calibration task before joining any project. Kappa baseline is established per annotator before live work begins. Anyone below 0.65 kappa on calibration is re-trained or replaced before they annotate a single live task.

MBBS / MD clinicians · Practising lawyers (LLB/LLM) · Chartered Accountants · Software engineers · NLP researchers · Linguists (8 Indic languages) · Financial analysts · Academic domain experts
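The calibration gate described above reduces to a short check. A sketch using scikit-learn's cohen_kappa_score, with the 0.65 floor from the text and illustrative labels:

```python
from sklearn.metrics import cohen_kappa_score

# Cohen's kappa corrects raw agreement for chance:
# kappa = (p_o - p_e) / (1 - p_e). The 0.65 floor comes from the text;
# labels and variable names are illustrative.

KAPPA_FLOOR = 0.65

def passes_calibration(candidate_labels, reference_labels):
    """Score a candidate's calibration answers against the calibrated reference."""
    kappa = cohen_kappa_score(candidate_labels, reference_labels)
    return kappa >= KAPPA_FLOOR, kappa

# Example: 9/10 agreement on a 10-item calibration set gives kappa ~0.84,
# which clears the 0.65 floor.
ok, kappa = passes_calibration(
    ["A", "B", "A", "A", "C", "B", "A", "C", "B", "A"],
    ["A", "B", "A", "A", "C", "B", "A", "B", "B", "A"],
)
```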
Work distribution — typical RLHF project
🤖 RLAIF pre-scorer — clear preference tasks ~72%
👤 Human expert — uncertain + edge cases ~22%
🔍 3-tier QA review (concurrent) ~6%
Result: up to 60% faster than pure-human annotation at the same Cohen's kappa score — because humans still validate every annotation before delivery. AI handles volume. Humans handle judgment.
What a typical engagement looks like
Day 1–3: Scoping call, SOW signed, NDA executed
Day 3–7: Guidelines written, annotators calibrated
Day 7+: RLAIF + human annotation + 3-tier QA
Day N: Data + QA report + data card delivered
+14 days: Model benchmark follow-up call
Our Promise

If the data does not improve your model, we fix it

Every project includes a two-week model performance check-in. If our data contributed to a measurable benchmark improvement — we document it as a case study. If it did not produce the expected improvement — we investigate the cause and re-deliver at no additional cost.

This is not a legal guarantee buried in contract terms. It is a professional commitment that exists because we are confident enough in our quality systems to back them with our time and effort.

Talk to our ML team →
📦
Every delivery: Data + QA Report + Data Card
Not just a data file. A complete, auditable package with kappa scores, gold standard accuracy, batch error logs, annotator demographics, and known limitations.
📞
Day 14 benchmark follow-up — every project
We ask for your model's benchmark result after training on our data. That result improves the next batch. Most annotation companies deliver and disappear. We stay.
🔁
Free re-delivery if quality falls below threshold
If a delivered batch falls below our guaranteed kappa threshold of 0.70, we investigate and re-deliver the affected portions at no cost. This has never happened — but the policy exists.
🔒
Complete data confidentiality — DPDP + GDPR ready
Encrypted S3 storage, signed NDAs for all annotators, named-access-only policies, and DPDP Act 2023 compliant data handling. Your training data never leaves your encrypted bucket.
Concave AI · Bengaluru, India
DPDP Act 2023 Compliant
GDPR Ready
AWS Encrypted Storage
NDA on Every Project
8 Indic Languages
Vendor Neutral · Founder-Owned