Character count ranges ('between X and Y characters') produce more reliable compliance than exact or approximate point targets, particularly at 5000 characters where exact targeting broke down catastrophically.
Follow-up to the character count compliance experiment, where "exactly 5000 characters" produced 625% mean deviation with outputs ranging from 0 to 211K characters. Tests whether range instructions prevent this catastrophic failure.
- 108 completions: 12 range configurations × 3 topics × 3 runs
- Ranges: 4 positions (SMS ~160, tweet ~280, medium ~1500, long ~5000) × 3 widths (tight, medium, wide)
- Prompt: "Write between {lower} and {upper} characters about: {topic}"
- Model: gpt-oss-120b via Cerebras API (free tier)
- Temperature: 1.0, top_p: 0.95, max_completion_tokens: 65000
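A minimal sketch of how the 108-completion grid could be enumerated. Only the prompt template, the four position centers, and the 4 × 3 × 3 × 3 structure come from the setup above. The width fractions and topic strings are placeholders, because the actual bounds and topics are not stated here.

```python
from itertools import product

# Position centers come from the setup; the keys are just labels.
POSITIONS = {"sms": 160, "tweet": 280, "medium": 1500, "long": 5000}

# ASSUMPTION: half-widths as a fraction of the center. The real bounds
# used in the experiment are not given in this write-up.
WIDTHS = {"tight": 0.05, "medium": 0.10, "wide": 0.25}

# ASSUMPTION: placeholder topics; the actual three topics are not listed.
TOPICS = ["topic A", "topic B", "topic C"]
RUNS = 3

PROMPT = "Write between {lower} and {upper} characters about: {topic}"


def build_jobs():
    """Return one job dict per completion: 4 positions x 3 widths x 3 topics x 3 runs."""
    jobs = []
    for (pos, center), (width, frac), topic, run in product(
        POSITIONS.items(), WIDTHS.items(), TOPICS, range(RUNS)
    ):
        lower = round(center * (1 - frac))
        upper = round(center * (1 + frac))
        jobs.append({
            "position": pos, "width": width, "topic": topic, "run": run,
            "lower": lower, "upper": upper,
            "prompt": PROMPT.format(lower=lower, upper=upper, topic=topic),
        })
    return jobs


jobs = build_jobs()
print(len(jobs))  # 108
```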
REJECTED
Ranges do not produce more reliable in-range compliance than exact point targets at the scales where it matters.
By position:
| Position | In-range | Rate | Mean abs deviation |
|---|---|---|---|
| SMS (~160) | 27/27 | 100.0% | 7.94% |
| Tweet (~280) | 27/27 | 100.0% | 7.93% |
| Medium (~1500) | 8/27 | 29.6% | 15.9% |
| Long (~5000) | 0/27 | 0.0% | 59.6% |
At the small targets (SMS and tweet), every completion landed in range, but exact phrasing had already achieved 0.04-0.07% deviation at these scales, far more precise than the ~7.9% seen here. At 5000 characters, in-range compliance is 0% with 59.6% mean absolute deviation, comparable to "approximately" phrasing (~59%).
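The two table metrics could be computed roughly as below. This is a sketch, not the scoring code used in the experiment: the write-up does not say what deviation is measured against for a range, so the midpoint is assumed here, and the example lengths are invented for illustration.

```python
def in_range(length, lower, upper):
    """True when the completion length falls inside the inclusive range."""
    return lower <= length <= upper


def abs_deviation_pct(length, lower, upper):
    """Absolute deviation from the range midpoint, as a percentage of the midpoint.

    ASSUMPTION: the reference point (midpoint) is a guess; the write-up
    does not define it.
    """
    midpoint = (lower + upper) / 2
    return abs(length - midpoint) / midpoint * 100


def summarize(rows):
    """rows: iterable of (length, lower, upper).

    Returns (hits, total, in_range_rate_pct, mean_abs_deviation_pct).
    """
    rows = list(rows)
    hits = sum(in_range(n, lo, hi) for n, lo, hi in rows)
    mean_dev = sum(abs_deviation_pct(n, lo, hi) for n, lo, hi in rows) / len(rows)
    return hits, len(rows), 100 * hits / len(rows), mean_dev


# Invented lengths for a hypothetical 4500-5500 range, not experimental data.
example = [(5200, 4500, 5500), (7400, 4500, 5500), (9000, 4500, 5500)]
hits, total, rate, mean_dev = summarize(example)
print(hits, total, round(rate, 1), round(mean_dev, 1))
```

Note that under a midpoint definition a completion can be fully in range and still show nonzero deviation, which would be consistent with the SMS and tweet rows (100% in range, about 7.9% deviation).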
The key improvement over exact phrasing at 5000 chars: zero finish_reason=length events (vs 3/15 with exact phrasing), no zero-length outputs, no 200K+ runaway outputs. The longest was 10,372 characters. Range instructions eliminate the catastrophic tail behavior but don't achieve compliance.
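Width had minimal impact on the in-range rate: tight 55.6% (20/36), medium 55.6% (20/36), wide 61.1% (22/36).
Systematic overshoot: of the 46 out-of-range completions, 45 exceeded the upper bound and only 1 fell below the lower bound. This suggests the model treats the lower bound as more salient than the upper.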
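As a consistency check, the reported tallies can be reconciled against each other. The `classify` helper is hypothetical and only illustrates how over/under/in could be assigned; the counts are the ones reported in this write-up, and the per-width counts are back-derived from the percentages assuming 36 completions per width.

```python
def classify(length, lower, upper):
    """Label a completion relative to an inclusive [lower, upper] range."""
    if length < lower:
        return "under"
    if length > upper:
        return "over"
    return "in"


# In-range counts per position, from the results table.
in_by_position = {"sms": 27, "tweet": 27, "medium": 8, "long": 0}

# Per-width in-range counts implied by 55.6%, 55.6%, 61.1% of 36 each.
in_by_width = {"tight": 20, "medium": 20, "wide": 22}

over, under, total = 45, 1, 108

total_in = sum(in_by_position.values())
assert total_in == sum(in_by_width.values())   # both views agree on 62
assert total_in + over + under == total        # 62 + 45 + 1 = 108

width_rates = {w: round(100 * n / 36, 1) for w, n in in_by_width.items()}
print(total_in, width_rates)
```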
- Compare against word count range compliance to see if character-based and word-based ranges show the same position-dependent pattern
- Test whether combining range with a hard stop instruction ("stop immediately at Y characters") improves upper-bound compliance