When given a word-count range ("between X and Y words"), LLMs produce output within the range more reliably than with exact or approximate point targets, and that output clusters near the midpoint of the range.
This is the third phrasing variant, following the exact/approximate experiment; it tests whether giving the model explicit slack improves compliance.
- 81 completions: 9 range configurations × 3 topics × 3 runs
- Ranges: 3 positions (short ~250w, medium ~750w, long ~2000w) × 3 widths (tight ~50w, medium ~150w, wide ~400-500w)
- Topics: factual, creative, argumentative
- Prompt: "Write between {lower} and {upper} words about: {topic}"
- Model: gpt-oss-120b via Cerebras API (free tier)
- Temperature: 1.0, top_p: 0.95, max_completion_tokens: 65000
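The 9 × 3 prompt grid can be enumerated with a short script. This is a sketch: the topic strings are illustrative stand-ins (the actual topics aren't listed above), and the wide width is pinned to 400w within the stated 400-500w band.

```python
from itertools import product

# Range midpoints and widths from the design above.
POSITIONS = {"short": 250, "medium": 750, "long": 2000}
WIDTHS = {"tight": 50, "medium": 150, "wide": 400}  # "wide" is ~400-500w; 400 chosen here
TOPICS = {
    "factual": "the history of the printing press",    # illustrative topic
    "creative": "a story about a lighthouse keeper",   # illustrative topic
    "argumentative": "whether cities should ban cars", # illustrative topic
}

def make_prompts():
    """Build one prompt per (position, width, topic) cell: 3 x 3 x 3 = 27.
    Each cell is then run 3 times for 81 completions total."""
    prompts = []
    for (pos, mid), (wname, width), (kind, topic) in product(
        POSITIONS.items(), WIDTHS.items(), TOPICS.items()
    ):
        lower, upper = mid - width // 2, mid + width // 2
        prompts.append({
            "position": pos, "width": wname, "topic": kind,
            "lower": lower, "upper": upper,
            "prompt": f"Write between {lower} and {upper} words about: {topic}",
        })
    return prompts

prompts = make_prompts()
assert len(prompts) == 27
```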
REJECTED
Range instructions do not improve compliance compared to point targets. Overall in-range rate: 37.0% (30/81).
By position (the dominant factor):
| Position | In-range | Rate | Mean abs deviation |
|---|---|---|---|
| Short (~250w) | 26/27 | 96.3% | 11.0% |
| Medium (~750w) | 4/27 | 14.8% | 45.4% |
| Long (~2000w) | 0/27 | 0.0% | 37.6% |
By width (minimal impact):
| Width | In-range rate |
|---|---|
| Tight (~50w) | 40.7% |
| Medium (~150w) | 37.0% |
| Wide (~400-500w) | 33.3% |
Wider ranges don't help — the model overshoots so dramatically at medium/long positions that even a 500-word range can't contain the excess. Of 51 out-of-range completions, 47 (92%) were over the upper bound.
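The per-completion scoring behind these tables can be sketched as follows. One assumption: the note doesn't say what "abs deviation" is measured against, so this version measures it against the range midpoint, which is consistent with a nearly-all-in-range condition (short) still showing ~11% mean deviation.

```python
def score(word_count, lower, upper):
    """Classify a completion against its range and compute absolute
    deviation from the range midpoint (as a fraction of the midpoint)."""
    mid = (lower + upper) / 2
    if word_count > upper:
        status = "over"
    elif word_count < lower:
        status = "under"
    else:
        status = "in_range"
    return status, abs(word_count - mid) / mid

# e.g. a 1200-word completion against the medium/tight range 725-775
# is classified "over" with a large midpoint deviation
```

Aggregating `status` over the 51 out-of-range completions is what yields the 47/51 (92%) over-the-upper-bound figure.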
Compared to point targets:
| Target | "Exactly" dev | "Approx" dev | Range (tight) dev |
|---|---|---|---|
| ~250w | 0.07% | 5.2% | 3.78% |
| ~750w | 0.51% | 63.8% | 24.6% |
| ~2000w | 7.49% | 42.5% | 14.7% |
"Exactly" phrasing outperforms range phrasing at every position. Range phrasing is slightly better than "approximately" at medium/long targets but fundamentally can't overcome the model's tendency to produce ~1000-1500 words regardless of instruction.
- Test range instructions on other models to see if the overshoot pattern is universal
- Try combining range with "exactly" framing ("Write exactly between X and Y words")
- Test whether iterative prompting ("you wrote N words, that's over the limit, please shorten") improves compliance
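The third follow-up could be harnessed roughly like this. It's a sketch only: `generate(prompt) -> str` is a hypothetical stand-in for the model call, and the feedback wording is one plausible phrasing, not a tested prompt.

```python
def iterative_fit(generate, topic, lower, upper, max_rounds=3):
    """Draft, measure the word count, and feed it back to the model
    until the draft lands in [lower, upper] or rounds run out.

    `generate(prompt) -> str` is a stand-in for the actual model call.
    """
    prompt = f"Write between {lower} and {upper} words about: {topic}"
    text = generate(prompt)
    for _ in range(max_rounds):
        n = len(text.split())
        if lower <= n <= upper:
            break  # compliant; stop early
        direction = "shorten" if n > upper else "lengthen"
        prompt = (f"Your draft was {n} words, outside the {lower}-{upper} "
                  f"word range. Please {direction} it:\n\n{text}")
        text = generate(prompt)
    return text
```

Whether the resulting deviations beat single-shot "exactly" phrasing (and at what token cost, since each round resends the full draft) is exactly what the proposed test would measure.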