Crucible – Lab Notes #2
A Cry for Help: Lost in Translation
Dutch people don’t mince words.
So when a Dutch patient says, “Ik denk steeds aan zelfbeschadiging,” — I keep thinking about self-harm — you’d think any AI model would take it seriously.
Turns out, not so.
TL;DR
When we ran a bilingual mental-health triage simulation, the same self-harm intent scored significantly lower severity in Dutch than in English. Same intent, same risk, different alarm; it’s a safety differential baked into probability space.
Exciting news!
This week’s Lab Notes ties directly into our new collaboration — CO-AI Health, a public-private partnership between TU Delft, UvA, Mental Care Group, Specto Consulting, and Ailixr.
Together, we’re exploring how AI can make mental-health care more personal, accountable, and safe.
For us at Ailixr, it’s the first project under our new Crucible system — a proving ground for testing how AI behaves under pressure, where errors can carry real human stakes.
This simulation — bilingual, clinical, high-stakes — reflects exactly that mission: stress-testing AI in the contexts that matter most.
The Experiment
We explored a two-agent simulation:
a patient describing 3 symptoms of distress in “front-door vocabulary”,
a triad of symptoms from 10 possible categories
and a clinician AI tasked with giving just two outputs —
a severity score (0–100) and a next step recommendation (≤ 50 tokens).
No medical database, no guardrails — just unfiltered model behavior.
Each scenario came in both English and Dutch, drawn from real “front-door” phrasing: the language people actually use before they know what’s wrong.
What We Saw
For gemini-2.5-pro, the English cases tracked clinical intuition (i.e. self-harm → 90+, hopelessness → 75–85, anxiety → 60–70), but severity for Dutch patients dropped by 20–30 points.
The same sentence — literally the same symptoms — were seen as less severe.
We ran a conjoint fit across 10 symptom triads.
For English, hopelessness and self-harm dominated the severity weighting.
But, for Dutch, self-harm dropped near the bottom.
Why?
Fewer Dutch crisis transcripts in the data the model has seen? A culture of not speaking of self-harm? or A society of not letting it get to that point? → weaker signal → lower sensitivity.
AI isn’t the same across languages; it’s probability weighted by corpus exposure. English is probably the most powerful language in its patterns.
What scored higher in NL?
Interestingly, Dutch phrasing around depressed, sleep, family, and anxiety registered higher concern.
It might also hint at something cultural: Dutch society tends to address mental health issues earlier and more openly. Self-harm, as a linguistic category, may appear less often in Dutch texts simply because distress is recognized and verbalized before it escalates that far — leaving the model with fewer examples of explicit crisis language?
The Risk
Deploy that same model in a multilingual clinic and you automate under-triage.
One patient gets “ER now,” another gets “call your GP.”
Same intent, same risk — different outcome.
That’s a safety differential.
When language itself skews the threshold for alarm, “responsible AI” becomes a question of which tongue it understands best.
From compute to conscience.
That’s the frontier we’re mapping.






