The case for an AI translator on the long tail of breed evaluations

Tier 1 native ingestors for the top breed societies. Tier 2 Claude-backed translator for everything else. Tier 3 direct partnerships for the strategic relationships. Why this three-tier architecture is the right answer for global coverage — and how the AI layer actually works.

Building Genemap as a globally-deployable platform meant solving a problem that's harder than it sounds: every breed society publishes its evaluation differently. Trait codes differ, units differ, accuracy notation differs, even what "fertility" or "growth" means varies between systems. BREEDPLAN's WW is not the same as IGS's WW, which is not the same as ICBF's Weaning weight (kg). And every one of these systems has its own published reference table format, its own per-animal page layout, its own update cadence.

You could hand-write a parser for every breed society on earth. The big ones make commercial sense — they publish enough data and have enough producer volume to justify the engineering investment. But the long tail is a different problem. There are hundreds of regional breed associations, breed sub-evaluations, country-specific terminal-sire indexes — and writing a custom parser for each one before producers can use them is the kind of work that never gets finished.

So Genemap doesn't try. Instead, the platform runs three tiers in parallel.

The three-tier architecture

Tier 1 — Native ingestors

Hand-written parser per source. Deterministic, well-tested, refreshed nightly. ~20 sources covered: BREEDPLAN, TACE, AAA, IGS, AHA, ICBF, Signet, NAV, Logix Beef, ANCP, KAPE, Sheep Genetics, SIL, Sheep Ireland, NSIP, Fjárvís etc.

Tier 2 — AI translator

Claude-backed dispatcher for any source not in Tier 1. Reads the breed society's published reference tables on demand and normalises into Genemap's canonical trait vocabulary. Confidence-scored, cached.

Tier 3 — Direct partnership

For strategic relationships: direct data-share agreement, partner-key API access, bidirectional webhook flow. Bulk per-animal downloads under commercial licence.

This isn't a novel architecture — it's the obvious one once you accept that "every breed society's published format is its own snowflake." What's novel is that Tier 2 is actually usable in production. Five years ago, Tier 2 would have been hand-written rules with confidence scores and a mountain of edge cases. Today it's a single Edge Function calling Claude with a tightly-scoped prompt.

How the AI translator actually works

The eval-translate Edge Function takes a single trait observation from any breed society's output and normalises it into Genemap's canonical representation. The contract:

// Request
{
  "system":         "IBOVAL_Idele",     // long-tail source
  "source_trait":   "PAm",
  "source_value":   0.32,
  "source_units":   "kg/d",
  "accuracy":       0.78,
  "canonical_traits": { ... }           // Genemap's trait dictionary
}

// Response
{
  "trait":      "WW",                    // canonical Genemap code
  "value":      64.0,                    // converted to canonical units
  "units":      "kg",
  "accuracy":   0.78,
  "percentile": 62,
  "confidence": 0.92,                    // Claude's self-reported confidence
  "cached":     false
}
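
As a rough illustration, the contract could be typed and sanity-checked as follows. The field names come from the request and response shown above; the interface names, the validateResponse helper and the assumed ranges (accuracy and confidence in 0..1, percentile in 0..100) are assumptions made for this sketch, not Genemap's actual code.

```typescript
// Field names mirror the contract above. Interface names, the helper and
// the numeric ranges are assumptions made for this sketch.
interface TranslateRequest {
  system: string;
  source_trait: string;
  source_value: number;
  source_units: string;
  accuracy: number;
  canonical_traits: Record<string, unknown>; // shape not shown in the article
}

interface TranslateResponse {
  trait: string;
  value: number;
  units: string;
  accuracy: number;
  percentile: number;
  confidence: number;
  cached: boolean;
}

// True when x is a real number inside [lo, hi]; NaN fails the check.
function inRange(x: number, lo: number, hi: number): boolean {
  return x >= lo && x <= hi;
}

// Returns a list of problems; an empty list means the response looks well-formed.
function validateResponse(res: TranslateResponse): string[] {
  const problems: string[] = [];
  if (res.trait.trim() === "") problems.push("trait is empty");
  if (!Number.isFinite(res.value)) problems.push("value is not a finite number");
  if (!inRange(res.accuracy, 0, 1)) problems.push("accuracy outside 0..1");
  if (!inRange(res.percentile, 0, 100)) problems.push("percentile outside 0..100");
  if (!inRange(res.confidence, 0, 1)) problems.push("confidence outside 0..1");
  return problems;
}

const exampleResponse: TranslateResponse = {
  trait: "WW", value: 64.0, units: "kg",
  accuracy: 0.78, percentile: 62, confidence: 0.92, cached: false,
};
const exampleProblems = validateResponse(exampleResponse); // no problems for this example
```

A check like this runs after the model call, so a malformed response is caught before anything is persisted.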

The function does four things in sequence:

Hash the input (system + trait + value + units) into a deterministic cache key. Same input → instant cache hit, never re-calls Claude.
Build the translation prompt with the canonical trait vocabulary inline, plus explicit rules about preserving EBV vs EPD vs ASBV semantics (never silently doubling an EPD to mimic an EBV, for example).
Call Claude via the Anthropic API with the prompt. Single round-trip, ~200ms typical latency.
Persist the result in the eval_translations table. Confidence below 0.7 is flagged for human review; everything above flows straight into producer accounts.
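
The four steps can be sketched roughly as below. This is a simplified illustration, not the real Edge Function: the FNV-1a hash is a dependency-free stand-in for whatever hash the function actually uses, the in-memory Map stands in for the eval_translations table, and callModel is a hypothetical synchronous stub in place of the prompt build and the Claude call.

```typescript
interface TraitObservation {
  system: string;
  source_trait: string;
  source_value: number;
  source_units: string;
}

interface Translation {
  trait: string;
  value: number;
  units: string;
  confidence: number;
}

interface StoredTranslation extends Translation {
  needs_review: boolean;
}

// FNV-1a (32-bit) over UTF-16 code units. Assumption: a stand-in only.
// A 32-bit hash is collision-prone; SHA-256 would be a more typical choice.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Step 1: a fixed-order array keeps the key deterministic for identical input.
function cacheKey(obs: TraitObservation): string {
  return fnv1a(JSON.stringify([obs.system, obs.source_trait, obs.source_value, obs.source_units]));
}

const REVIEW_THRESHOLD = 0.7; // from the text: below 0.7 is flagged for human review

// Steps 2 to 4, with `callModel` standing in for prompt building and the API call.
function translateCached(
  obs: TraitObservation,
  store: Map<string, StoredTranslation>,
  callModel: (obs: TraitObservation) => Translation
): { result: StoredTranslation; cached: boolean } {
  const key = cacheKey(obs);
  const hit = store.get(key);
  if (hit !== undefined) return { result: hit, cached: true };
  const fresh = callModel(obs);
  const stored: StoredTranslation = { ...fresh, needs_review: fresh.confidence < REVIEW_THRESHOLD };
  store.set(key, stored); // stands in for the eval_translations insert
  return { result: stored, cached: false };
}

// Usage: the second identical request is served from the cache.
let modelCalls = 0;
const demoStore = new Map<string, StoredTranslation>();
const demoObs: TraitObservation = {
  system: "IBOVAL_Idele", source_trait: "PAm", source_value: 0.32, source_units: "kg/d",
};
const stubModel = (_obs: TraitObservation): Translation => {
  modelCalls += 1;
  return { trait: "WW", value: 64.0, units: "kg", confidence: 0.92 };
};
const firstCall = translateCached(demoObs, demoStore, stubModel);
const secondCall = translateCached(demoObs, demoStore, stubModel);
```

Keeping the model call behind a plain function argument means the caching and review logic can be tested without touching the network.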

The key engineering decision is what's not sent to Claude. The Edge Function only ever sees the raw trait code, value and units of the breed-society record. Producer operation data, costs, slaughter outcomes, location, producer identity — none of that goes to Anthropic. Producers can trust that the AI translator is reading the breed society's published data, not their farm.
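
One way to enforce that boundary in code is to build the outbound payload from an explicit allow-list of fields. The sketch below is illustrative: PlatformRecord and its producer-side fields are hypothetical, and the allowed fields are taken from the request contract shown earlier (minus the trait dictionary).

```typescript
// Hypothetical shape of a record as it might exist inside the platform.
// The producer-side fields are invented for this example.
interface PlatformRecord {
  system: string;
  source_trait: string;
  source_value: number;
  source_units: string;
  accuracy: number;
  producer_id?: string;
  farm_location?: string;
  cost_per_head?: number;
}

// Only the breed-society fields from the request contract.
interface ModelPayload {
  system: string;
  source_trait: string;
  source_value: number;
  source_units: string;
  accuracy: number;
}

// Each allowed field is named explicitly, so a field added to PlatformRecord
// later is excluded from the payload unless someone adds it here on purpose.
function toModelPayload(record: PlatformRecord): ModelPayload {
  return {
    system: record.system,
    source_trait: record.source_trait,
    source_value: record.source_value,
    source_units: record.source_units,
    accuracy: record.accuracy,
  };
}

const outbound = toModelPayload({
  system: "IBOVAL_Idele",
  source_trait: "PAm",
  source_value: 0.32,
  source_units: "kg/d",
  accuracy: 0.78,
  producer_id: "hypothetical-producer-123",
  farm_location: "hypothetical location",
});
```

Copying fields one by one is deliberately more verbose than spreading the record and deleting keys: an allow-list fails closed, a deny-list fails open.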

Why preserving the semantics is the hard part

The naive version of this — "give me a number that means weaning weight" — fails spectacularly. Breed-society evaluations have rich semantics that determine how the number should be used downstream:

EBV (Estimated Breeding Value) is an animal's own genetic merit. Progeny inherit half of it on average, so the value transmitted to progeny is half the EBV.
EPD (Expected Progeny Difference) is the expected difference in progeny — roughly half an EBV. AAA / IGS / AHA publish EPDs; BREEDPLAN publishes EBVs.
ASBV (Australian Sheep Breeding Value) is Sheep Genetics' EBV equivalent for sheep.
GEBV and GE-EPD are the genomically enhanced variants, typically single-step BLUP outputs.

If the AI translator silently doubles an EPD to mimic an EBV (or worse, halves an EBV to mimic an EPD), every downstream calculation breaks. That trait's term in the selection-index dot product will be off by 2×, the ranking order can change, and the dollar values will be wrong. Producers won't know.
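
A small worked example shows how a single silent doubling can flip a ranking. The weights and EPD values below are invented for illustration, and the two-trait index is a deliberately minimal stand-in for a real selection index.

```typescript
// Index = sum over traits of (economic weight x genetic value).
function indexValue(weights: Record<string, number>, values: Record<string, number>): number {
  let total = 0;
  for (const trait of Object.keys(weights)) {
    total += weights[trait] * (values[trait] ?? 0);
  }
  return total;
}

// Illustrative weights and EPDs, invented for this example.
const indexWeights = { WW: 2, MILK: 3 };
const sireA = { WW: 10, MILK: 4 };
const sireB = { WW: 6, MILK: 6 };
const sireBDoubled = { WW: 12, MILK: 6 }; // sire B with its WW silently doubled

const indexA = indexValue(indexWeights, sireA);               // 2*10 + 3*4 = 32
const indexB = indexValue(indexWeights, sireB);               // 2*6  + 3*6 = 30
const indexBDoubled = indexValue(indexWeights, sireBDoubled); // 2*12 + 3*6 = 42
```

With correct values sire A ranks first (32 against 30). After the doubling, sire B jumps to 42 and overtakes A, although nothing about either animal has changed.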

The eval-translate prompt is explicit about this. The canonical Genemap trait set carries explicit "expressed as EBV/EPD/ASBV" semantics, and Claude is instructed to preserve the source system's native expression and never auto-scale. The UI surfaces this prominently — when an Angus Australia EBV sits next to an AAA EPD in the same ranking view, both are labelled and the comparison is done at the percentile level, not at the raw value.

Auto-scaling EBV to EPD is the most dangerous thing a translator can silently do. Genemap's translator is wired to refuse.
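
A prompt instruction can be backed by a check on the model's output. The sketch below is one possible guard, not Genemap's actual implementation: the Expressed type, both function names and the factor-of-two heuristic are assumptions made for illustration.

```typescript
type Expression = "EBV" | "EPD" | "ASBV";

interface Expressed {
  value: number;
  expression: Expression;
}

// Refuse any translation whose expression differs from the source's.
function assertExpressionPreserved(source: Expressed, translated: Expressed): void {
  if (source.expression !== translated.expression) {
    throw new Error(`expression changed: ${source.expression} -> ${translated.expression}`);
  }
}

// Heuristic: with unchanged units, a ratio of exactly 2 or 0.5 between source
// and translated value suggests an EBV/EPD rescale rather than a unit conversion.
function looksAutoScaled(sourceValue: number, translatedValue: number, sameUnits: boolean): boolean {
  if (!sameUnits || sourceValue === 0) return false;
  const ratio = translatedValue / sourceValue;
  return Math.abs(ratio - 2) < 1e-9 || Math.abs(ratio - 0.5) < 1e-9;
}

// Usage: an EPD that comes back labelled as an EBV is rejected.
let guardMessage = "";
try {
  assertExpressionPreserved({ value: 5, expression: "EPD" }, { value: 10, expression: "EBV" });
} catch (err) {
  guardMessage = err instanceof Error ? err.message : String(err);
}
const suspicious = looksAutoScaled(5, 10, true); // true: same units, value doubled
```

A flagged result of this kind would more sensibly go to the review queue than be corrected automatically, since the check cannot tell which side of the conversion is right.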

Confidence scoring and the review queue

Claude's self-reported confidence isn't perfect, but it's surprisingly well-calibrated for this task. Across the first thousand translations we ran in development:

Confidence ≥ 0.9 — translations were correct in >98% of cases (manually audited).
Confidence 0.7–0.9 — translations were correct in ~92% of cases, with errors mostly being unit conversions or close-but-not-identical trait mappings.
Confidence < 0.7 — flagged for human review. About 4% of translations fall in this band; they're typically traits with no clear Genemap equivalent (very specialised regional traits) or sources with ambiguous reference tables.
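
The three bands map directly onto a small routing function. The thresholds are the ones quoted above; the band names and helper functions are assumptions made for this sketch.

```typescript
type Band = "high" | "medium" | "review";

// Thresholds from the text: >= 0.9, 0.7 to 0.9, and < 0.7 (human review).
// A NaN confidence fails both comparisons and falls through to "review".
function confidenceBand(confidence: number): Band {
  if (confidence >= 0.9) return "high";
  if (confidence >= 0.7) return "medium";
  return "review";
}

// Tally how many translations land in each band.
function countByBand(confidences: number[]): Record<Band, number> {
  const counts: Record<Band, number> = { high: 0, medium: 0, review: 0 };
  for (const c of confidences) counts[confidenceBand(c)] += 1;
  return counts;
}

const bandCounts = countByBand([0.95, 0.9, 0.82, 0.7, 0.64]);
// bandCounts: { high: 2, medium: 2, review: 1 }
```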

The review queue is small enough that one part-time human (us) can keep on top of it. Each reviewed entry becomes a permanent cached mapping; the same input from the same source never calls Claude again. Over time, the long-tail catalogue of "known good" translations grows, the cache hit rate climbs, and the AI cost-per-translation falls.

Why not just bilateral integrations everywhere?

Two reasons. First, economics. A native ingestor costs roughly two engineer-days to build and a few hours per year to maintain. For a breed society with 50,000 producer-relevant animals, that's good value. For a breed sub-association with 800 animals, it's not — the producer would wait years for their breed to reach the priority queue. The AI translator gives them coverage today.

Second, publishers' terms of service. Most breed societies' ToS prohibit "systematic scraping" or "automated extraction for commercial redistribution." A native ingestor that bulk-pulls per-animal pages crosses this line; a direct data-share agreement is the right response. The AI translator, run on an as-needed basis by an authenticated producer who has authorisation to view the published data, is a fundamentally different posture — it's user-agent-on-behalf-of-user fetching, generally permissible across most publishers' terms.

Tier 1 is for the publishers where we hold direct data agreements. Tier 2 is for the publishers we don't have agreements with — but where the producer is the authorised viewer. Tier 3 converts willing Tier 2 publishers into Tier 1 as relationships develop.

How a new country onboards

This is the part that makes the architecture actually scale. When a producer in, say, the Czech Republic (covered by neither Tier 1 nor any current Tier 3 partnership) signs up:

The country gets added to the registry (or already exists; CZ has its own ISO-2 code and basic registry entry).
They configure their setup: country + town + production system + species. Climate falls through to Copernicus ERA5 or NASA POWER; markets fall through to the nearest configured market source (likely EU MMO for beef, EU pricing for sheep).
They upload a catalogue from their breed association (say, the Czech Black Pied or a local Mendelian / Plemdat evaluation).
The AI translator reads each trait, maps it to canonical, scores confidence, and ranks the catalogue against the producer's per-farm bioeconomic weights.
Low-confidence entries surface a "review needed" badge in the UI. The producer can override or accept.
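
The "falls through" behaviour in the setup step can be pictured as an ordered list of feeds, most specific first, where the first feed that covers the country wins. This is a guess at the shape of the logic rather than Genemap's code: the Feed type, resolveFeed and the country-specific entry are hypothetical; only the Copernicus ERA5 and NASA POWER names come from the text.

```typescript
interface Feed {
  label: string;
  covers: (countryCode: string) => boolean;
}

// Returns the label of the first feed that covers the country, if any.
function resolveFeed(countryCode: string, feeds: Feed[]): string | undefined {
  for (const feed of feeds) {
    if (feed.covers(countryCode)) return feed.label;
  }
  return undefined;
}

// Ordered from most specific to most general. The first entry is hypothetical.
const climateFeeds: Feed[] = [
  { label: "hypothetical AU-specific climate feed", covers: (cc) => cc === "AU" },
  { label: "Copernicus ERA5", covers: () => true }, // assumed global fallback
  { label: "NASA POWER", covers: () => true },      // assumed second fallback
];

const czClimate = resolveFeed("CZ", climateFeeds); // "Copernicus ERA5"
const auClimate = resolveFeed("AU", climateFeeds); // the AU-specific entry
```

Adding a Czech-specific feed later is then a one-line change at the top of the list, which is what lets the producer's experience improve without anything breaking.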

Day one, this Czech producer has a working per-farm index based on AI-translated evaluations and country-fallback market data. As the system grows, we add native ingestors for the high-traffic Czech sources, climate moves to a Czech-specific feed, and the producer's experience progressively gets better — without anything ever breaking.

The cost and latency picture

A single eval-translate call costs about $0.0003 on Claude Sonnet 4.6 and returns in ~200ms. Per producer account, the translation cost is meaningful only on initial catalogue upload: one call per trait per animal, so 12 traits × 200 animals × $0.0003 ≈ $0.72 per upload. After that, the cache absorbs essentially everything; the same combination of source system, trait code, value and units never re-hits Claude.
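
The same arithmetic as a small function, with an optional cache hit rate to show how repeat uploads get cheaper. The per-call price and the 12 × 200 example are the figures quoted above; the function name and the cacheHitRate parameter are assumptions for this sketch.

```typescript
// Cost of one catalogue upload: one call per trait per animal,
// minus whatever share of those calls the cache already answers.
function uploadCostUsd(
  traitsPerAnimal: number,
  animals: number,
  costPerCallUsd: number,
  cacheHitRate: number = 0
): number {
  const calls = traitsPerAnimal * animals;
  const billedCalls = calls * (1 - cacheHitRate);
  return billedCalls * costPerCallUsd;
}

const firstUploadUsd = uploadCostUsd(12, 200, 0.0003);     // about 0.72
const fullyCachedUsd = uploadCostUsd(12, 200, 0.0003, 1);  // 0: every call is a cache hit
```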

At scale this is cheap enough that we don't meter or surcharge it. A producer who switches breed-society source mid-year pays nothing extra for the migration; the long tail of obscure breed associations gets supported without raising the platform's marginal cost per producer.

The wider point

The platform-level claim of "every evaluation system, every country" is only credible if you have an answer for the long tail. Twenty years ago that claim wouldn't have been credible at all — the long-tail integration cost would have been prohibitive. Today, with cheap, well-calibrated LLM translation, the claim is straightforwardly defensible.

Three takeaways

Tier 1 covers the breadth of producer volume. ~20 native ingestors capture roughly 85% of producer cases globally — every commercial-scale producer in AU, US, NZ, UK, IE, FR, BR, KR, etc. is on one of these.
Tier 2 covers the breadth of breed diversity. Every breed sub-association, regional evaluation, country-specific terminal-sire index. Producers in obscure-breed niches get production-quality translation on day one.
EBV vs EPD vs ASBV semantics are preserved. The translator never silently scales between systems; cross-system comparisons happen at the percentile band, not at raw values.

The AI translator is one of the parts of Genemap that feels small in the codebase — about 200 lines of Edge Function — but disproportionately important architecturally. It's the difference between "a platform that works in eighteen countries" and "a platform that works wherever a producer needs it." The technology that makes Tier 2 viable is recent; the architecture that puts it in the right place is what makes the platform globally credible today.

Try the translator demo at /translator-demo.html — pick a long-tail source like Czech Mendel or French IBOVAL, enter a trait code, watch the translation arrive with a confidence score. Or read about the integration architecture more broadly on the partners page.
