Prepending random passphrases (random words, random numbers, random alphanumeric strings) before a creative prompt increases output diversity compared to baseline.
The generation-parameters experiment showed temperature is the dominant diversity knob, but even at temp=1.0, pairwise cosine similarity averages 0.65. This experiment tests whether random noise in the prompt can break these patterns.
100 completions: 4 configurations × 25 runs.
Prompt: "Write a short story about a traveler arriving in a strange city."
Configurations:
- Control: no passphrase
- Random words: 5 random English nouns prepended
- Random numbers: 5 random integers (1-9999) prepended
- Random alphanumeric: 5 random tokens (4-8 chars) prepended
Passphrase format: "{passphrase}\n\n{prompt}" — no explanation, just raw tokens. Fresh passphrase per run.
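The four passphrase generators and the prompt format can be sketched as below. The noun pool and helper names are illustrative assumptions, not the experiment's actual code:

```python
import random
import string

# Illustrative noun pool -- the experiment's real word list is unspecified.
NOUNS = ["lantern", "harbor", "thimble", "glacier", "walnut",
         "compass", "meadow", "anchor", "furnace", "ribbon"]

def random_words(n=5):
    """5 random English nouns, space-separated."""
    return " ".join(random.sample(NOUNS, n))

def random_numbers(n=5):
    """5 random integers in 1-9999."""
    return " ".join(str(random.randint(1, 9999)) for _ in range(n))

def random_alphanum(n=5):
    """5 random alphanumeric tokens, 4-8 chars each."""
    return " ".join(
        "".join(random.choices(string.ascii_letters + string.digits,
                               k=random.randint(4, 8)))
        for _ in range(n)
    )

def build_prompt(passphrase, prompt):
    # "{passphrase}\n\n{prompt}" -- raw tokens, no explanation.
    # Control config passes an empty passphrase.
    return f"{passphrase}\n\n{prompt}" if passphrase else prompt
```

A fresh passphrase is drawn per run, so each of the 25 runs in a config sees a different prefix.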
- Model: gpt-oss-120b via Cerebras API (free tier)
- Temperature: 1.0, top_p: 0.95, max_completion_tokens: 65000
- Metrics include MATTR-50 (length-corrected lexical diversity) to control for output length differences
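MATTR-50 (moving-average type-token ratio, window 50) averages the type-token ratio over every 50-token sliding window, which removes the length dependence of raw unique-word ratios. A minimal sketch, assuming whitespace tokenization (the experiment's exact tokenizer is unspecified):

```python
def mattr(tokens, window=50):
    """Moving-Average Type-Token Ratio: mean TTR over sliding windows."""
    if len(tokens) < window:
        # Text shorter than the window: fall back to plain TTR.
        return len(set(tokens)) / len(tokens)
    ttrs = [len(set(tokens[i:i + window])) / window
            for i in range(len(tokens) - window + 1)]
    return sum(ttrs) / len(ttrs)
```

Because every window has the same size, a 800-word completion and a 1300-word completion are scored on equal footing.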
INCONCLUSIVE
Metrics conflict with each other, making a clear verdict impossible.
Pro-diversity signals: all passphrase types reduce bigram overlap (Jaccard drops from 0.056 to 0.047-0.051). Random words reduce cosine similarity from 0.667 to 0.605 (large effect, d=-0.82).
Anti-diversity signals: all passphrase types reduce token entropy (from 0.940 to 0.846-0.862; very large effects, d = -1.53 to -1.95). With a passphrase present, the model becomes more predictable in its token selection.
Length confound: passphrase configs produce 31-39% fewer words (788-893 vs. 1297 baseline). The large unique-word-ratio improvements (d = +1.25 to +1.97) collapse to negligible under length-corrected MATTR-50, confirming they were a length artifact.
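The effect sizes quoted here are Cohen's d; a sketch using the pooled-standard-deviation convention (an assumption, since the variant used is not stated):

```python
import math

def cohens_d(x, y):
    """Cohen's d between two samples, pooled-SD convention (assumed)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    pooled = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled
```

Negative d here means the passphrase config scored lower than control on that metric.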
| Config | Cosine sim | Bigram Jaccard | Token entropy | MATTR-50 |
|---|---|---|---|---|
| Control | 0.667 | 0.056 | 0.940 | 0.819 |
| Random words | 0.605 | 0.047 | 0.862 | 0.818 |
| Random numbers | 0.669 | 0.051 | 0.853 | 0.821 |
| Random alphanum | 0.645 | 0.050 | 0.846 | 0.826 |
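The bigram Jaccard and token entropy columns can be approximated as below; the exact definitions (tokenization, entropy normalization) are assumptions, and the cosine column would additionally require an embedding model:

```python
import math
from collections import Counter
from itertools import combinations

def bigram_jaccard(a, b):
    """Jaccard overlap of two runs' bigram sets (lower = more diverse)."""
    A = set(zip(a, a[1:]))
    B = set(zip(b, b[1:]))
    return len(A & B) / len(A | B) if A | B else 0.0

def token_entropy(tokens):
    # Shannon entropy of the token frequency distribution, normalized by
    # log2(vocab size) so the value lies in [0, 1] -- an assumed convention.
    counts = Counter(tokens)
    n = len(tokens)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(len(counts)) if len(counts) > 1 else 0.0

def mean_pairwise_jaccard(runs):
    """Average bigram Jaccard over all pairs of completions in a config."""
    pairs = list(combinations(runs, 2))
    return sum(bigram_jaccard(a, b) for a, b in pairs) / len(pairs)
```

Each table cell would then be the mean over the 25 runs (or the 300 run-pairs) of a configuration.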
The consistent entropy reduction is the strongest finding: it suggests the model becomes more conservative in token selection when a passphrase is present, possibly because the reasoning model spends chain-of-thought capacity processing the passphrase instead of on exploratory token selection.
- Test themed passphrases (semantically coherent words) to see if meaningful prefixes produce different effects than noise
- Investigate the entropy reduction further — could be useful for output stabilization in code generation
- Test at different temperatures to see if the passphrase effect interacts with the temperature effect