Character Count Range Compliance (gpt-oss-120b)

length-compliancecharacter-countrange-instructionscerebras

◇ Hypothesis

Character count ranges ('between X and Y characters') produce more reliable compliance than exact or approximate point targets, particularly at 5000 characters where exact targeting broke down catastrophically.

◇ Test

Follow-up to the character count compliance experiment, where "exactly 5000 characters" produced 625% mean deviation with outputs ranging from 0 to 211K characters. Tests whether range instructions prevent this catastrophic failure.

108 completions: 12 range configurations × 3 topics × 3 runs.

Ranges: 4 positions (SMS ~160, tweet ~280, medium ~1500, long ~5000) × 3 widths (tight, medium, wide)

Prompt: "Write between {lower} and {upper} characters about: {topic}"

Model: gpt-oss-120b via Cerebras API (free tier)
Temperature: 1.0, top_p: 0.95, max_completion_tokens: 65000

◇ Result

REJECTED

Ranges do not produce more reliable in-range compliance than exact point targets at the scales where it matters.

By position:

Position	In-range	Rate	Mean abs deviation
SMS (~160)	27/27	100.0%	7.94%
Tweet (~280)	27/27	100.0%	7.93%
Medium (~1500)	8/27	29.6%	15.9%
Long (~5000)	0/27	0.0%	59.6%

At small targets, 100% in-range — but exact phrasing already achieved 0.04-0.07% deviation at these scales, far more precise. At 5000 characters, range compliance is 0% with 59.6% mean deviation, comparable to "approximately" phrasing (~59%).

The key improvement over exact phrasing at 5000 chars: zero finish_reason=length events (vs 3/15 with exact phrasing), no zero-length outputs, no 200K+ runaway outputs. The longest was 10,372 characters. Range instructions eliminate the catastrophic tail behavior but don't achieve compliance.

Width had minimal impact: tight 55.6%, medium 55.6%, wide 61.1%.

Systematic overshoot: 45 completions over the upper bound vs 1 under. The model treats the lower bound as more salient than the upper.

◇ Next

Compare against word count range compliance to see if character-based and word-based ranges show the same position-dependent pattern
Test whether combining range with a hard stop instruction ("stop immediately at Y characters") improves upper-bound compliance

SCRATCHPADS-Experiment