LAIT · A Reader Study of Literary AI Translation

AI translation of literary texts is ‘fine’, but readers still prefer human translations

Yves Ferstler · Adam Podoxin · Ty Brassington · Roman Grundkiewicz · Maite Taboada · Marzena Karpinska

Simon Fraser University · Université du Québec à Montréal · Microsoft

We asked 15 avid readers to compare recently published human translations into English with machine translations from an agentic LLM pipeline, across 15 recent novels originally written in French, Polish, and Japanese.

One recurring observation is that both human and machine translations vary in quality, but machine translation varies more: human translations tend to be relatively stable across a novel, whereas machine translation quality can fluctuate more from chunk to chunk within a single book.

Every leaf is one reader's comment · 952 in all · 15 readers · 15 novels · 3 source languages · all translations into English

A tree of reader voices: an ink-drawn tree whose left limb is drawn thicker than the right because readers preferred the human translation. Sampled reader comments appear as paper tags on the branches. The full archive of all 952 comments is available as a list further down the page.


Human translation (HT) · Machine translation (MT)

How the study worked

Evaluation pipeline

Each reader read two full English translations of a novel excerpt (about 8,000 words), one at a time, rating each for quality and leaving comments. They then compared the human and machine versions side by side, giving relative ratings and comments. After a one-day break, they compared shorter chunks (about 300 words) side by side, highlighting good and poor wording and giving their preference with a justification.

Legend: HT = human · MT = machine · good · poor

French, Polish, Japanese → English · 5 novels each

15 readers
2 readers / book
2 books / reader
~8,000 words / excerpt

blind evaluation · reading order counterbalanced
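One way to picture "blind" and "counterbalanced" is the sketch below. It is an illustration under stated assumptions, not the study's actual procedure: it assumes that the two readers of each novel read the versions in opposite orders, and that readers only ever see the neutral labels "A" and "B". The function name and the novel identifiers are hypothetical.

```python
def counterbalanced_orders(novel_ids):
    """Assign opposite reading orders to the two readers of each novel.

    Assumption (not stated on the page): reader 1 reads HT first and
    reader 2 reads MT first, so each order occurs equally often.
    """
    orders = {}
    for novel in novel_ids:
        orders[novel] = [("HT", "MT"), ("MT", "HT")]
    return orders


def blind_labels(order):
    """Hide the true source behind neutral labels in reading order."""
    return {"A": order[0], "B": order[1]}


# Hypothetical identifiers for the 15 novels.
novels = [f"novel_{i:02d}" for i in range(1, 16)]
orders = counterbalanced_orders(novels)

# Count how many of the 30 readings start with the human translation.
ht_first = sum(order[0] == "HT" for pair in orders.values() for order in pair)
total = sum(len(pair) for pair in orders.values())
print(ht_first, total)
```

With this scheme, half of the excerpt readings start with the human translation and half with the machine translation, so any order effect is spread evenly across both versions.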

Immersive reading

15 whole ~8,000-word excerpts × 2 readers

  1. Read both versions

    Each excerpt in full, one at a time.

  2. Rate each

    8 questions · human or AI?

  3. Compare them

    Which is better? Why? Which is AI?

Close reading

386 chunks of ~300 words, side by side × 2 readers

  4. Read side by side

    Same chunk, both versions.

  5. Highlight wording

    good · poor

  6. Pick the better one & say why

    Was it a difficult choice?

Fifteen readers produced…

30 excerpt comparisons
60 single readings
772 chunk comparisons
952 reader comments
7,234 span highlights
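The first three totals follow directly from the design figures given earlier on the page (15 novels, 2 readers per book, 2 versions per excerpt, 386 chunks). The short check below is only a sanity calculation; the variable names are ours, and the comment and highlight counts are observed totals that cannot be derived this way.

```python
# Design figures as stated on this page.
NOVELS = 15
READERS = 15
READERS_PER_NOVEL = 2
BOOKS_PER_READER = 2
VERSIONS_PER_EXCERPT = 2  # one human and one machine translation
CHUNKS = 386

# Each novel excerpt is compared once by each of its two readers.
excerpt_comparisons = NOVELS * READERS_PER_NOVEL

# Before comparing, each reader reads both versions one at a time.
single_readings = excerpt_comparisons * VERSIONS_PER_EXCERPT

# Every ~300-word chunk is compared side by side by two readers.
chunk_comparisons = CHUNKS * READERS_PER_NOVEL

# The reader-side count must agree with the novel-side count.
assert READERS * BOOKS_PER_READER == excerpt_comparisons

print(excerpt_comparisons, single_readings, chunk_comparisons)  # 30 60 772
```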

Preferences

Overall, readers prefer the human translation

Readers leaned human when reading whole excerpts, and more clearly when comparing shorter chunks. Within each bar, the deeper the colour, the more decisive the choice.

Immersive Reading · 5-point scale

The human translation scored higher on all four qualities

Each property is rated on a 5-point scale. The gold dot is the human-translation mean, the blue dot is the machine mean; the positive gap between them is the human lead.
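The "human lead" described above is simply the difference between two means. The sketch below shows that calculation for a single quality; the ratings are made-up placeholders on the 5-point scale, not data from the study, and `human_lead` is a hypothetical helper name.

```python
from statistics import mean


def human_lead(ht_ratings, mt_ratings):
    """Return mean(HT) - mean(MT); positive means the human version leads."""
    return mean(ht_ratings) - mean(mt_ratings)


# Placeholder ratings for ONE quality (illustrative only, not study data).
ht = [4, 5, 4, 3]
mt = [3, 4, 3, 3]

lead = human_lead(ht, mt)
print(f"human lead: {lead:+.2f}")  # human lead: +0.75
```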

MT identification

Readers cannot reliably detect MT

AI detection accuracy stayed low at both the excerpt-level single-reading stage and the comparison stage.

Highlights

7,234 reader highlights

During close reading, readers marked spans as especially good or poor. The human translation received significantly more good highlights than the machine translation. In the visualization below, each tile represents about 20 highlights.

HT: Human translation · MT: Machine translation · Marked good · Marked poor · 1 tile ≈ 20 highlights
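As a rough sense of scale, the tile count can be estimated from the two figures given here (7,234 highlights, about 20 per tile). This is a back-of-the-envelope sketch: the page does not say how the visualization rounds, nor how the highlights split between HT and MT or between good and poor, so only the overall total is estimated.

```python
TOTAL_HIGHLIGHTS = 7234   # stated on this page
HIGHLIGHTS_PER_TILE = 20  # "1 tile ≈ 20 highlights"

# Assumption: tiles are rounded to the nearest whole number overall.
approx_tiles = round(TOTAL_HIGHLIGHTS / HIGHLIGHTS_PER_TILE)
print(approx_tiles)  # 362
```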

Comments

All 952 reader comments

All free-text comments collected from readers during the experiments. Readers were asked to comment on the good and bad qualities of both translations, and to explain why they preferred one excerpt over the other, why they thought a translation was produced by AI, and why one chunk-level translation was better than the other.

Fifteen Novels

Ratings by book

We evaluate five novels per source language, each rated by two readers.

Case Study · Beyond English

Does it hold in other languages?

The Dataset

Café au LAIT

We present the LAIT dataset, which contains readers' perspectives on the quality of human and machine translation of literary texts.