SCRATCHPADS-Experiment

Word Count Range Compliance (gpt-oss-120b) 2026-02-25
Hypothesis

When given a word count range ('between X and Y words'), LLMs produce output within the range more reliably than with exact or approximate point targets, and output clusters near the midpoint.

Test

Third phrasing variant after the exact/approximate experiment. Tests whether explicit slack improves compliance.

81 completions: 9 range configurations × 3 topics × 3 runs.

Ranges: 3 positions (short ~250w, medium ~750w, long ~2000w) × 3 widths (tight ~50w, medium ~150w, wide ~400-500w)

Topics: factual, creative, argumentative

Prompt: "Write between {lower} and {upper} words about: {topic}"

  • Model: gpt-oss-120b via Cerebras API (free tier)
  • Temperature: 1.0, top_p: 0.95, max_completion_tokens: 65000
Result

REJECTED

Range instructions do not improve compliance compared to point targets. Overall in-range rate: 37.0% (30/81).

By position (the dominant factor):

Position In-range Rate Mean abs deviation
Short (~250w) 26/27 96.3% 11.0%
Medium (~750w) 4/27 14.8% 45.4%
Long (~2000w) 0/27 0.0% 37.6%

By width (minimal impact):

Width In-range rate
Tight (~50w) 40.7%
Medium (~150w) 37.0%
Wide (~400-500w) 33.3%

Wider ranges don't help — the model overshoots so dramatically at medium/long positions that even a 500-word range can't contain the excess. Of 51 out-of-range completions, 47 (92%) were over the upper bound.

Compared to point targets:

Target "Exactly" dev "Approx" dev Range (tight) dev
~250w 0.07% 5.2% 3.78%
~750w 0.51% 63.8% 24.6%
~2000w 7.49% 42.5% 14.7%

"Exactly" phrasing outperforms range phrasing at every position. Range phrasing is slightly better than "approximately" at medium/long targets but fundamentally can't overcome the model's tendency to produce ~1000-1500 words regardless of instruction.

Next
  1. Test range instructions on other models to see if the overshoot pattern is universal
  2. Try combining range with "exactly" framing ("Write exactly between X and Y words")
  3. Test whether iterative prompting ("you wrote N words, that's over the limit, please shorten") improves compliance