Two-Candidate Next-Word/Token Probability (Robust)
Compare P(A|context) vs P(B|context) from a pretrained causal LM (no fine-tuning).
- Proper tie handling and numerical guards.
- Optional length normalization (per-token).
- Use Swap to exchange A and B as a symmetry sanity check: the two probabilities should simply trade places.
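
The points above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation. It assumes the per-token log-probabilities of each candidate (given the context) have already been read off the causal LM. The names `candidate_score` and `compare_two`, the tie tolerance `tie_tol`, and the choice to renormalize over just the two candidates are all hypothetical choices made for the example.

```python
import math
from typing import Sequence, Tuple


def candidate_score(token_logprobs: Sequence[float], length_normalize: bool = False) -> float:
    """Sum (or per-token mean) of a candidate's token log-probabilities."""
    if len(token_logprobs) == 0:
        raise ValueError("candidate must have at least one token")
    for lp in token_logprobs:
        # Guard: log-probabilities must be real numbers <= 0.
        if math.isnan(lp) or lp > 0.0:
            raise ValueError(f"invalid log-probability: {lp!r}")
    total = math.fsum(token_logprobs)
    return total / len(token_logprobs) if length_normalize else total


def compare_two(
    logprobs_a: Sequence[float],
    logprobs_b: Sequence[float],
    length_normalize: bool = False,
    tie_tol: float = 1e-9,  # assumed tolerance, not taken from the source
) -> Tuple[float, float, str]:
    """Return (p_a, p_b, winner), renormalized over the two candidates only."""
    score_a = candidate_score(logprobs_a, length_normalize)
    score_b = candidate_score(logprobs_b, length_normalize)

    top = max(score_a, score_b)
    if top == float("-inf"):
        # Guard: both candidates have zero probability, so report a tie
        # instead of computing -inf - (-inf) = NaN.
        return 0.5, 0.5, "tie"

    # Two-way softmax, stabilized by subtracting the larger score
    # so that exp() never overflows.
    exp_a = math.exp(score_a - top)
    exp_b = math.exp(score_b - top)
    norm = exp_a + exp_b
    p_a, p_b = exp_a / norm, exp_b / norm

    if abs(score_a - score_b) <= tie_tol:
        winner = "tie"
    elif score_a > score_b:
        winner = "A"
    else:
        winner = "B"
    return p_a, p_b, winner
```

Calling `compare_two(a, b)` and then `compare_two(b, a)` reproduces the Swap check: the two probabilities should exchange positions and the winner label should flip. Note that with `length_normalize=True` the scores are mean log-probabilities, so the resulting values are a relative preference between A and B rather than true sequence probabilities.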