SCRATCHPADS-Experiment

Default Generation Length Without Constraints (gpt-oss-120b) 2026-02-24
Hypothesis

Without a length constraint, LLM outputs cluster around a 'natural' default length that varies by topic type and prompt framing.

Test

Establishes the model's unconstrained baseline output length, which the prior word count and character count experiments can be compared against.

60 completions: 3 topics × 2 framings × 10 runs.

Topics: factual (solar panels), creative (lighthouse keeper story), argumentative (remote work)

Framings:

  • Bare: "Write about the following topic: {topic}"
  • Direct: "{topic}" (topic text only, no wrapper)

No word count, character count, or length instructions of any kind.

  • Model: gpt-oss-120b via Cerebras API (free tier)
  • Temperature: 1.0, top_p: 0.95, max_completion_tokens: 65000
Result

CONFIRMED

Outputs cluster around topic-dependent default lengths. Topic matters; framing doesn't.

By topic (n=20 each):

Topic Mean words Mean chars
Factual 1586 10,252
Creative 1399 7,881
Argumentative 1015 7,309

Factual prompts produce 56% more words than argumentative. The ordering is consistent across both framings.

By framing (n=30 each):

Framing Mean words
Bare ("Write about...") 1348
Direct (topic only) 1319

A 2.1% difference, well within the per-group standard deviations. The framing wrapper adds no meaningful length.

Overall: median 1359 words, range 770–2061. All 60 completions finished naturally (finish_reason=stop). Creative writing had the highest variability (SD=354 words with bare framing), argumentative the most consistent (SD=109 with direct framing).

Characters per word varied by topic: argumentative used longer words (7.20 chars/word) vs creative (5.64), reflecting formal vs conversational vocabulary.

Next
  1. Compare these baselines against the constrained experiments — the unconstrained argumentative mean (1015 words) is close to the "exactly 1000 words" constrained result (~988 words)
  2. Test whether system prompts or role instructions shift the default length
  3. Measure default length on other models for cross-model comparison