LAIT · A Reader Study of Literary AI Translation
AI translation of literary texts is ‘fine’, but readers still prefer human translations
Simon Fraser University · Université du Québec à Montréal · Microsoft
We asked 15 avid readers to compare recently published human translations into English with machine translations from an agentic LLM pipeline, across 15 recent novels originally written in French, Polish, and Japanese.
One recurring observation is that both human and machine translations vary in quality, but machine translation varies more: human translations tend to be relatively stable across a novel, whereas machine translation quality can fluctuate from chunk to chunk within a single book.
Every leaf is one reader's comment · 952 in all · 15 readers · 15 novels · 3 source languages · all translations into English
Human translation (HT) · Machine translation (MT) · Hover over or tap a leaf to unfold a reader's comment on the English translation
How the study worked
Evaluation pipeline
Each reader read two full English translations of a novel excerpt (about 8,000 words), one at a time, rating each for quality and leaving comments. They then compared the human and machine versions side by side, giving relative ratings and comments. After a one-day break, they compared shorter chunks (about 300 words) side by side, highlighting wording and giving their preference with a justification.
Legend: HT = human · MT = machine · good · poor
French · Polish · Japanese → English · 5 novels each
blind evaluation · reading order counterbalanced
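The page says only that reading order was counterbalanced, not how. As a minimal sketch, assuming each excerpt's two readers simply get opposite orders, one way to build such a schedule is shown here. The function name `assign_reading_orders` and the excerpt IDs are hypothetical and not part of the study materials.

```python
from itertools import cycle

def assign_reading_orders(excerpts, readers_per_excerpt=2):
    """Build (excerpt, reader_slot, order) sessions, alternating HT-first and MT-first.

    Assumption: counterbalancing means the two readers of an excerpt
    see the versions in opposite orders. The study may have done it differently.
    """
    orders = cycle([("HT", "MT"), ("MT", "HT")])
    schedule = []
    for excerpt in excerpts:
        for slot in range(readers_per_excerpt):
            schedule.append((excerpt, slot, next(orders)))
    return schedule

# Hypothetical excerpt IDs standing in for the 15 novels.
excerpts = [f"novel_{i:02d}" for i in range(1, 16)]
schedule = assign_reading_orders(excerpts)

# Count how many sessions start with the human translation.
ht_first = sum(1 for _, _, order in schedule if order[0] == "HT")
print(len(schedule), ht_first)  # prints: 30 15
```

With two readers per excerpt, every excerpt is read once HT-first and once MT-first, so any order effect falls equally on both versions.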
Immersive reading
15 whole ~8,000-word excerpts × 2 readers
1. Read both versions: each excerpt in full, one at a time.
2. Rate each: 8 questions · human or AI?
3. Compare them: Which is better? Why? Which is AI?
Close reading
386 chunks of ~300 words, side by side × 2 readers
4. Read side by side: same chunk, both versions.
5. Highlight wording: good · poor
6. Pick the better one and say why: Was it a difficult choice?
Fifteen readers produced…
Preferences
Overall, readers prefer the human translation
Readers leaned human when reading whole excerpts, and more clearly when comparing shorter chunks. Within each bar, the deeper the colour, the more decisive the choice.
Immersive Reading · 5-point scale
The human translation scored higher on all four qualities
Each quality is rated on a 5-point scale. The gold dot is the human-translation mean and the blue dot is the machine-translation mean; a positive gap between them is the human lead.
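The "human lead" described above is a difference of two means. A minimal sketch of that arithmetic follows, assuming ratings run from 1 to 5 (the page says only "5-point scale"). The function name `human_lead` is hypothetical, and the ratings are placeholders for illustration, not study data.

```python
from statistics import mean

def human_lead(ht_ratings, mt_ratings):
    """Mean HT rating minus mean MT rating; a positive value means HT leads."""
    for rating in (*ht_ratings, *mt_ratings):
        # Assumption: the 5-point scale is coded 1..5.
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the assumed 1-5 scale")
    return mean(ht_ratings) - mean(mt_ratings)

# Placeholder ratings for one quality (NOT taken from the study).
ht = [4, 5, 3, 4]
mt = [3, 4, 3, 4]
gap = human_lead(ht, mt)
print(gap)  # prints: 0.5
```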
MT identification
Readers cannot reliably detect MT
AI-detection accuracy stayed low at both the excerpt-level single-reading stage and the comparison stage.
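The page does not define how detection accuracy was computed. A minimal sketch, assuming it is the share of reader guesses that match the true origin of a translation, is given here. The function name `detection_accuracy` is hypothetical, and the labels are placeholders, not study data.

```python
def detection_accuracy(guesses, truth):
    """Fraction of guesses ("HT" or "MT") that match the true label."""
    if len(guesses) != len(truth):
        raise ValueError("guesses and truth must have the same length")
    if not truth:
        raise ValueError("at least one judgement is required")
    correct = sum(g == t for g, t in zip(guesses, truth))
    return correct / len(truth)

# Placeholder judgements (NOT taken from the study).
truth = ["MT", "HT", "MT", "HT"]
guesses = ["HT", "HT", "MT", "MT"]
accuracy = detection_accuracy(guesses, truth)
print(accuracy)  # prints: 0.5
```

Under this definition, and with an equal number of human and machine items, random guessing would score about 0.5, which is a useful baseline when reading "low" accuracy.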
Highlights
7,234 reader highlights
Reading closely, readers marked spans as especially good or poor. The human translation received significantly more good highlights than the machine translation. In the visualization below, each tile represents about 20 highlights.
HT Human translation
MT Machine translation
Fifteen Novels
Ratings by book
We evaluated five novels per source language, each rated by two readers. Select a card to see details.
Case Study · Beyond English
Does it hold in other languages?
The Dataset
Café au LAIT
We present the LAIT dataset, which contains readers' perspectives on the quality of human and machine translation of literary texts.
We do not release human translations or source texts. Researchers who would like to gain access to the entire dataset, for research purposes only, should apply using the button below.
Comments
All 952 reader comments
All free-text comments collected from readers during the experiments. Readers were asked to comment on the good and bad qualities of both translations and to explain why they preferred one excerpt over the other, why they thought a translation was AI-generated, and why one chunk-level translation was better than the other.