Two-Candidate Next-Word/Token Probability (Robust)

Compare P(A|context) vs P(B|context) from a pretrained causal LM (no fine-tuning).
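
When a candidate spans several tokens, its probability is usually scored with the chain rule: sum the log-probability of each candidate token given the context plus the candidate tokens before it. The sketch below illustrates that idea only. `toy_logits` is a hypothetical stand-in for a real causal LM (a 3-token vocabulary whose logits depend only on the last token), and the names `sequence_logprob` and `log_softmax` are assumptions, not part of this tool.

```python
import math

def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def sequence_logprob(next_token_logits, context_ids, candidate_ids):
    """Chain rule: sum log P(token | context + earlier candidate tokens)."""
    ids = list(context_ids)
    total = 0.0
    for tok in candidate_ids:
        logprobs = log_softmax(next_token_logits(ids))
        total += logprobs[tok]
        ids.append(tok)  # the scored token becomes part of the next step's context
    return total

# Hypothetical toy model (assumption): logits depend only on the last token id.
def toy_logits(ids):
    table = {0: [0.0, 1.0, 2.0], 1: [2.0, 0.0, 1.0], 2: [1.0, 2.0, 0.0]}
    return table[ids[-1]]

score_a = sequence_logprob(toy_logits, [0], [2, 1])  # candidate A as token ids
score_b = sequence_logprob(toy_logits, [0], [1, 0])  # candidate B as token ids
print("A" if score_a > score_b else "B", score_a, score_b)
```

With a real model, `next_token_logits` would be replaced by a forward pass that returns the logits at the last position.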

Context (prompt)

Candidate A (follow-up)

Candidate B (follow-up)

Assume leading space before candidates (useful for GPT-2 tokenization)
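
GPT-2's byte-level BPE folds a preceding space into the token itself, so " cat" and "cat" are encoded differently and can receive very different probabilities. A minimal sketch of what this option could do before tokenization follows; the helper name `prepare_candidate` and its exact behavior are assumptions, not the tool's confirmed logic.

```python
def prepare_candidate(text, assume_leading_space=True):
    """Prepend one space when the option is on and the text lacks one (assumed behavior)."""
    if assume_leading_space and not text.startswith(" "):
        return " " + text
    return text

print(repr(prepare_candidate("cat")))         # option on: a space is added
print(repr(prepare_candidate("cat", False)))  # option off: text is unchanged
```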

Show top-k alternatives (per token step), k from 0 to 20
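
As a rough illustration of a per-step top-k view, the sketch below ranks one step's logits and returns the k most probable token ids with their probabilities. Treating k = 0 as "show nothing" is an assumption about this control, and `top_k_alternatives` is a hypothetical helper name.

```python
import math

def top_k_alternatives(logits, k):
    """Return up to k (token_id, probability) pairs, most probable first."""
    if k <= 0:
        return []  # assumption: k = 0 disables the alternatives display
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(i, math.exp(logits[i] - log_z)) for i in ranked[:k]]

alts = top_k_alternatives([0.0, 1.0, 2.0], 2)
for token_id, prob in alts:
    print(token_id, round(prob, 3))
```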

Use length normalization (average log-prob per token)
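
Summed log-probabilities penalize longer candidates, because every extra token adds a non-positive term. Averaging per token removes that bias, and it can flip the winner. The sketch below uses made-up per-token log-probabilities purely for illustration; `length_normalized` is a hypothetical helper name.

```python
def length_normalized(token_logprobs):
    """Average log-probability per token."""
    if not token_logprobs:
        raise ValueError("candidate must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)

# Illustrative values only (not model output).
logprobs_a = [-0.5, -0.5, -0.5, -0.5]  # four tokens
logprobs_b = [-1.5]                    # one token

raw_winner = "A" if sum(logprobs_a) > sum(logprobs_b) else "B"
norm_winner = "A" if length_normalized(logprobs_a) > length_normalized(logprobs_b) else "B"
print(raw_winner, norm_winner)
```

Here the raw sums favor the shorter candidate B, while the per-token averages favor A.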

Candidate A — step-by-step

Candidate B — step-by-step