SCRATCHPADS-Experiment

Multiplication Accuracy on Negative Integers (gpt-oss-120b) 2026-02-21
Hypothesis

LLMs can reliably perform simple multiplication involving negative integers with absolute value below 100.

Test

Follow-up to the positive multiplication scratchpad. Negative multiplication combines sign handling with magnitude computation — the model needs to apply sign rules (neg × pos = neg, neg × neg = pos, anything × 0 = 0) on top of the multiplication itself.
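The sign rules make it possible to grade answers more finely than right/wrong. A small helper along these lines (a sketch; the function name and categories are mine, not from the actual harness) separates sign errors from magnitude errors:

```python
def grade(a: int, b: int, answer: int) -> str:
    """Classify a model's answer to a*b as correct, a sign error, or a magnitude error."""
    truth = a * b
    if answer == truth:
        return "correct"
    # Right magnitude, wrong sign: the sign rule itself was misapplied.
    if truth != 0 and abs(answer) == abs(truth):
        return "sign error"
    return "magnitude error"
```

This is what lets the Result section report "no sign errors, no magnitude errors" as two separate claims.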

100 multiplication problems, stratified into 3 cases:

  • One negative (40 pairs): one operand in [-99, -1], the other in [1, 99] — result is negative
  • Both negative (40 pairs): both operands in [-99, -1] — result is positive
  • Zero result (20 pairs): one operand is zero

Product magnitudes ranged from 0 to 9,120.
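The stratified sampling could be reproduced roughly as follows (a sketch; the seed and exact sampling calls are assumptions, so the pairs will differ from the original run):

```python
import random

def make_pairs(seed: int = 0) -> list[tuple[int, int]]:
    """Generate the 40/40/20 stratified multiplication pairs."""
    rng = random.Random(seed)
    pairs = []
    # One negative (40): one operand in [-99, -1], the other in [1, 99]
    for _ in range(40):
        pairs.append((rng.randint(-99, -1), rng.randint(1, 99)))
    # Both negative (40): both operands in [-99, -1]
    for _ in range(40):
        pairs.append((rng.randint(-99, -1), rng.randint(-99, -1)))
    # Zero result (20): one operand is zero, the other nonzero
    for _ in range(20):
        other = rng.choice([-1, 1]) * rng.randint(1, 99)
        pairs.append(rng.choice([(0, other), (other, 0)]))
    return pairs
```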

  • Model: gpt-oss-120b via Cerebras API (free tier)
  • Temperature: 0, top_p: 1, max_completion_tokens: 1024
  • Prompt: "What is {a} * {b}? Reply with only the number."
  • 1 run per pair, 2-second delay between calls
  • 1 call hit a 429 rate limit and was retried successfully
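The call loop with the single-retry behavior might look like this (a sketch; the actual script isn't shown, and the `ask` callable stands in for one Cerebras chat-completions request at temperature 0):

```python
import time

def run_pair(ask, a: int, b: int, retries: int = 1, delay: float = 2.0) -> int:
    """Query the model for a*b, retrying on a rate-limit error up to `retries` times."""
    prompt = f"What is {a} * {b}? Reply with only the number."
    for attempt in range(retries + 1):
        try:
            # ask() performs the API call; a non-numeric reply raises ValueError here
            return int(ask(prompt).strip())
        except RuntimeError:  # stand-in for an HTTP 429 from the API client
            if attempt == retries:
                raise
            time.sleep(delay)  # back off before retrying
```

In the real run `delay` matches the 2-second spacing between normal calls; exactly one call tripped the 429 path and succeeded on retry.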
Result

CONFIRMED

100% accuracy across all 100 problems. No sign errors, no magnitude errors, no parse failures.

Case                              Total  Correct  Accuracy
One negative (result negative)       40       40    100.0%
Both negative (result positive)      40       40    100.0%
Zero result                          20       20    100.0%

Sign rules were applied correctly in every case. Together with the three prior experiments (positive addition, negative addition, positive multiplication), this brings gpt-oss-120b to 400/400 on basic two-operand arithmetic with integers below 100 in absolute value.

Next
  1. Scale operands to 3-digit and 4-digit ranges to find accuracy limits
  2. Test division, which introduces remainders and potential floating-point ambiguity
  3. Run the full suite on a non-reasoning model to isolate the chain-of-thought effect