This Markov chain text generator offers three modes: Words, Text, and GPT.
Words and Text modes handle punctuation in two ways. Paired punctuation (quotes, brackets) is removed, because the chain cannot guarantee that an opening mark is later closed. Unpaired punctuation (periods, commas, dashes) is split off into separate tokens and then formatted correctly when text is output. GPT mode instead relies on the tokenizer's built-in punctuation handling.
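A minimal Python sketch of this kind of punctuation handling. The character sets, the regular expression, and the names `tokenize`, `detokenize`, `PAIRED`, and `ATTACH` are assumptions for illustration, not the generator's actual code. Straight single quotes are deliberately left out of the paired set here so that contractions such as "don't" survive. Sentence marks are glued to the preceding word on output, while en and em dashes get spaces around them.

```python
import re

# Assumed character sets, following the examples in the text.
# Straight single quotes are NOT treated as paired so contractions survive.
PAIRED = '"\u201c\u201d()[]{}'
ATTACH = set(".,!?;:")  # marks glued to the preceding word on output
# A token is either a word (letters, digits, apostrophes, hyphens)
# or a single unpaired mark (sentence punctuation, en dash, em dash).
TOKEN_RE = re.compile(r"[\w'-]+|[.,!?;:\u2013\u2014]")


def tokenize(text):
    """Drop paired punctuation, then split words and unpaired marks into tokens."""
    text = text.translate(str.maketrans("", "", PAIRED))
    return TOKEN_RE.findall(text)


def detokenize(tokens):
    """Join tokens, attaching sentence punctuation to the previous word;
    every other token (words, dashes) is preceded by a space."""
    out = ""
    for tok in tokens:
        if out and tok not in ATTACH:
            out += " "
        out += tok
    return out
```

For example, `detokenize(tokenize('He said "Hello, world."'))` yields `He said Hello, world.`: the quotes are gone, and the comma and period are reattached to their words.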
Differences from vanilla Markov chains:
Instead of a fixed n-gram size, this implementation records every prefix-suffix pair with prefix lengths from 1 to N tokens. At each generation step, it randomly selects a prefix length, weighted by frequency, and samples the next token from the suffixes observed after that prefix.
This adaptive approach lets the generator draw on both short-range and longer-range patterns instead of committing to a single context length.
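A minimal Python sketch of the variable-order scheme described above. It assumes one reading of "weighted by frequency": each candidate prefix (the last k generated tokens, for k from 1 to N) is weighted by how many times that exact prefix occurred in the training text. The names `build_table`, `next_token`, and `generate`, the `max_order` parameter (standing in for N), and the random-start strategy are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_table(tokens, max_order):
    """Map every prefix of 1..max_order tokens to a Counter of the tokens that follow it."""
    table = defaultdict(Counter)
    for i in range(1, len(tokens)):
        for k in range(1, max_order + 1):
            if i - k < 0:
                break  # not enough preceding tokens for a prefix this long
            table[tuple(tokens[i - k:i])][tokens[i]] += 1
    return table


def next_token(table, history, max_order, rng):
    """Pick a prefix length, then sample the next token after that prefix.

    Assumption: each candidate prefix (the last k tokens of history) is
    weighted by how often it occurred as a prefix in the training text.
    """
    candidates, weights = [], []
    for k in range(1, min(max_order, len(history)) + 1):
        prefix = tuple(history[-k:])
        if prefix in table:  # `in` does not create entries in a defaultdict
            candidates.append(prefix)
            weights.append(sum(table[prefix].values()))
    if not candidates:
        return None  # dead end: this context never had a successor
    prefix = rng.choices(candidates, weights=weights)[0]
    suffixes = table[prefix]
    return rng.choices(list(suffixes), weights=list(suffixes.values()))[0]


def generate(tokens, max_order, length, seed=None):
    """Generate up to `length` tokens, starting from a random corpus token."""
    rng = random.Random(seed)
    table = build_table(tokens, max_order)
    history = [rng.choice(tokens)]
    while len(history) < length:
        tok = next_token(table, history, max_order, rng)
        if tok is None:
            break
        history.append(tok)
    return history
```

A call such as `" ".join(generate(corpus.split(), max_order=3, length=30, seed=42))` produces a 30-token sample, where `corpus` stands for any training text. Under this particular weighting, every occurrence of a longer prefix also counts as an occurrence of each shorter prefix it ends with, so a longer prefix never receives more weight than a shorter one; weighting the lengths differently (for example, boosting longer matches) would be an equally plausible reading of the description.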