Default Generation Length Without Constraints (gpt-oss-120b)

lengthgenerationbaselinecerebras

◇ Hypothesis

Without a length constraint, LLM outputs cluster around a 'natural' default length that varies by topic type and prompt framing.

◇ Test

Establishes the model's unconstrained baseline output length, which the prior word count and character count experiments can be compared against.

60 completions: 3 topics × 2 framings × 10 runs.

Topics: factual (solar panels), creative (lighthouse keeper story), argumentative (remote work)

Framings:

Bare: "Write about the following topic: {topic}"
Direct: "{topic}" (topic text only, no wrapper)

No word count, character count, or length instructions of any kind.

Model: gpt-oss-120b via Cerebras API (free tier)
Temperature: 1.0, top_p: 0.95, max_completion_tokens: 65000

◇ Result

CONFIRMED

Outputs cluster around topic-dependent default lengths. Topic matters; framing doesn't.

By topic (n=20 each):

Topic	Mean words	Mean chars
Factual	1586	10,252
Creative	1399	7,881
Argumentative	1015	7,309

Factual prompts produce 56% more words than argumentative. The ordering is consistent across both framings.

By framing (n=30 each):

Framing	Mean words
Bare ("Write about...")	1348
Direct (topic only)	1319

A 2.1% difference, well within the per-group standard deviations. The framing wrapper adds no meaningful length.

Overall: median 1359 words, range 770–2061. All 60 completions finished naturally (finish_reason=stop). Creative writing had the highest variability (SD=354 words with bare framing), argumentative the most consistent (SD=109 with direct framing).

Characters per word varied by topic: argumentative used longer words (7.20 chars/word) vs creative (5.64), reflecting formal vs conversational vocabulary.

◇ Next

Compare these baselines against the constrained experiments — the unconstrained argumentative mean (1015 words) is close to the "exactly 1000 words" constrained result (~988 words)
Test whether system prompts or role instructions shift the default length
Measure default length on other models for cross-model comparison

SCRATCHPADS-Experiment