LAIT · A Reader Study of Literary AI Translation

AI translation of literary texts is ‘fine’, but readers still prefer human translations

Yves Ferstler · Adam Podoxin · Ty Brassington · Roman Grundkiewicz · Maite Taboada · Marzena Karpinska

Simon Fraser University · Université du Québec à Montréal · Microsoft

We asked 15 avid readers to compare recently published human translations into English with machine translations from an agentic LLM pipeline, across 15 recent novels originally written in French, Polish, and Japanese.

One recurring observation is that both human and machine translations vary in quality, but machine translation varies more: human translations tend to be relatively stable across a novel, whereas machine translation quality can fluctuate more from chunk to chunk within a single book.

Every leaf is one reader's comment · 952 in all · 15 readers · 15 novels · 3 source languages · all translations into English

How the study worked

Evaluation pipeline

Each reader read two full English translations of a novel excerpt (about 8,000 words), one at a time, rating each for quality and leaving comments. They then compared the human and machine versions side by side, giving relative ratings and comments. After a one-day break, they compared shorter chunks (about 300 words) side by side, highlighting wording and stating their preference with a justification.

Legend: HT = human translation · MT = machine translation · marks: good, poor

French, Polish, Japanese → English · 5 novels each

15 readers · 2 readers / book · 2 books / reader · ~8,000 words / excerpt

blind evaluation · reading order counterbalanced
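The page gives only the design constraints (15 readers, 15 novels, 2 readers per book, 2 books per reader, counterbalanced reading order), not the actual assignment procedure. The following is a minimal sketch of one scheme that satisfies those constraints, assuming a simple round-robin; the function name `assign_readers` and the round-robin rule are hypothetical, not the authors' method.

```python
def assign_readers(n: int = 15):
    """Hypothetical round-robin assignment for n readers and n books.

    Reader r gets books r and (r + 1) mod n. The first book is read
    HT-first and the second MT-first, so every book is read once in
    each order and every reader experiences both orders.
    """
    assignments = []
    for reader in range(n):
        for slot in range(2):
            book = (reader + slot) % n
            first = "HT" if slot == 0 else "MT"  # assumed counterbalancing rule
            assignments.append({"reader": reader, "book": book, "first": first})
    return assignments


assignments = assign_readers()
```

Under this scheme each book ends up with exactly two readers, one per reading order, which is one way to read "reading order counterbalanced"; other balanced designs would serve equally well.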

Immersive reading

15 whole ~8,000-word excerpts × 2 readers

1. Read both versions. Each excerpt in full, one at a time.

2. Rate each. 8 questions · human or AI?

3. Compare them. Which is better? Why? Which is AI?

One-day break · 24 hours · the close-reading tool stayed locked.

Close reading

386 chunks of ~300 words, side by side × 2 readers

4. Read side by side. Same chunk, both versions.

5. Highlight wording. good · poor

6. Pick the better one & say why. Was it a difficult choice?

Fifteen readers produced…

30 excerpt comparisons · 60 single readings · 772 chunk comparisons · 952 reader comments · 7,234 span highlights
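Three of these totals follow directly from the design figures stated earlier (15 novels, 2 readers per book, 2 versions per excerpt, 386 chunks). A short arithmetic check, with constant names chosen here for illustration:

```python
NOVELS = 15            # one excerpt per novel
READERS_PER_BOOK = 2
VERSIONS = 2           # HT and MT
CHUNKS = 386           # ~300-word chunks across all excerpts

excerpt_comparisons = NOVELS * READERS_PER_BOOK    # 30
single_readings = excerpt_comparisons * VERSIONS   # 60
chunk_comparisons = CHUNKS * READERS_PER_BOOK      # 772

# Average number of chunks per excerpt (about 25.7), in line with
# ~8,000-word excerpts split into ~300-word chunks.
avg_chunks_per_excerpt = CHUNKS / NOVELS
```

The comment and highlight totals (952 and 7,234) depend on what readers chose to write and mark, so they cannot be derived this way.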

Preferences

Overall, readers prefer the human translation

Readers leaned toward the human translation when reading whole excerpts, and did so more clearly when comparing shorter chunks. Within each bar, the deeper the colour, the more decisive the choice.

Immersive Reading · 5-point scale

The human translation scored higher on all four qualities

Each property is rated on a 5-point scale. The gold dot is the human-translation mean, the blue dot is the machine mean; the positive gap between them is the human lead.
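The gap described above is a difference of means per property. A minimal sketch of that computation follows; the property names and ratings are invented placeholders, not data from the study, and `human_lead` is a hypothetical helper name.

```python
from statistics import mean

# Placeholder ratings on the 1-5 scale. These numbers are invented for
# illustration only and are NOT results from the study.
toy_ratings = {
    "property_a": {"HT": [4, 5, 3], "MT": [3, 4, 3]},
    "property_b": {"HT": [4, 4, 5], "MT": [4, 3, 4]},
}


def human_lead(ratings):
    """Return {property: (ht_mean, mt_mean, gap)}, where gap = HT mean - MT mean."""
    result = {}
    for prop, by_version in ratings.items():
        ht_mean = mean(by_version["HT"])
        mt_mean = mean(by_version["MT"])
        result[prop] = (ht_mean, mt_mean, ht_mean - mt_mean)
    return result


leads = human_lead(toy_ratings)
```

A positive gap means the human translation was rated higher on that property; a negative gap would mean the machine translation led.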

MT identification

Readers cannot reliably detect MT

AI detection accuracy stayed low at both the excerpt-level single-reading stage and the comparison stage.

Highlights

7,234 reader highlights

Reading closely, readers marked spans as especially good or poor. The human translation received significantly more good highlights than the machine translation. In the visualization below each tile is about 20 highlights.

Legend: HT = human translation · MT = machine translation · marked good / marked poor · 1 tile ≈ 20 highlights

Comments

All 952 reader comments

All free-text comments collected from readers during the experiments. Readers were asked to comment on the good and bad qualities of both translations, and to explain why they preferred one excerpt over the other, why they thought a translation was AI, and why one chunk-level translation was better than the other.

Fifteen Novels

Ratings by book

We evaluate five novels per source language, each rated by two readers.

Case Study · Beyond English

Does it hold in other languages?

The Dataset

Café au LAIT

We present the LAIT dataset, which contains readers' perspectives on the quality of human and machine translation of literary texts.

We do not release human translations or source texts. Researchers who would like to gain access to the entire dataset, for research purposes only, should apply using the button below.

Apply for dataset access →
