SCRATCHPADS-Experiment

Multiplication Accuracy on Negative Integers (gpt-oss-120b) 2026-02-21
Hypothesis

LLMs can reliably perform simple multiplication involving negative integers with absolute value below 100.

Test

Follow-up to the positive multiplication scratchpad. Negative multiplication combines sign handling with magnitude computation — the model needs to apply sign rules (neg × pos = neg, neg × neg = pos, anything × 0 = 0) on top of the multiplication itself.
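The sign rules make it possible to grade answers more finely than right/wrong. A small helper along these lines (a sketch; the function name and categories are mine, not from the actual harness) separates sign errors from magnitude errors:

```python
def grade(a: int, b: int, answer: int) -> str:
    """Classify a model's answer to a*b as correct, a sign error, or a magnitude error."""
    truth = a * b
    if answer == truth:
        return "correct"
    # Right magnitude, wrong sign: the sign rule itself was misapplied.
    if truth != 0 and abs(answer) == abs(truth):
        return "sign error"
    return "magnitude error"
```

This is what lets the Result section report "no sign errors, no magnitude errors" as two separate claims.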

100 multiplication problems, stratified into 3 cases:

  • One negative (40 pairs): one operand in [-99, -1], the other in [1, 99] — result is negative
  • Both negative (40 pairs): both operands in [-99, -1] — result is positive
  • Zero result (20 pairs): one operand is zero

Product magnitudes ranged from 0 to 9,120.
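The stratified sampling could be reproduced roughly as follows (a sketch; the seed and exact sampling calls are assumptions, so the pairs will differ from the original run):

```python
import random

def make_pairs(seed: int = 0) -> list[tuple[int, int]]:
    """Generate the 40/40/20 stratified multiplication pairs."""
    rng = random.Random(seed)
    pairs = []
    # One negative (40): one operand in [-99, -1], the other in [1, 99]
    for _ in range(40):
        pairs.append((rng.randint(-99, -1), rng.randint(1, 99)))
    # Both negative (40): both operands in [-99, -1]
    for _ in range(40):
        pairs.append((rng.randint(-99, -1), rng.randint(-99, -1)))
    # Zero result (20): one operand is zero, the other nonzero
    for _ in range(20):
        other = rng.choice([-1, 1]) * rng.randint(1, 99)
        pairs.append(rng.choice([(0, other), (other, 0)]))
    return pairs
```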

  • Model: gpt-oss-120b via Cerebras API (free tier)
  • Temperature: 0, top_p: 1, max_completion_tokens: 1024
  • Prompt: "What is {a} * {b}? Reply with only the number."
  • 1 run per pair, 2-second delay between calls
  • 1 call hit a 429 rate limit and was retried successfully
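The call loop with the single-retry behavior might look like this (a sketch; the actual script isn't shown, and the `ask` callable stands in for one Cerebras chat-completions request at temperature 0):

```python
import time

def run_pair(ask, a: int, b: int, retries: int = 1, delay: float = 2.0) -> int:
    """Query the model for a*b, retrying on a rate-limit error up to `retries` times."""
    prompt = f"What is {a} * {b}? Reply with only the number."
    for attempt in range(retries + 1):
        try:
            # ask() performs the API call; a non-numeric reply raises ValueError here
            return int(ask(prompt).strip())
        except RuntimeError:  # stand-in for an HTTP 429 from the API client
            if attempt == retries:
                raise
            time.sleep(delay)  # back off before retrying
```

In the real run `delay` matches the 2-second spacing between normal calls; exactly one call tripped the 429 path and succeeded on retry.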
Result

CONFIRMED

100% accuracy across all 100 problems. No sign errors, no magnitude errors, no parse failures.

Case                              Total  Correct  Accuracy
One negative (result negative)       40       40    100.0%
Both negative (result positive)      40       40    100.0%
Zero result                          20       20    100.0%

Sign rules were applied correctly in every case. Together with the three prior experiments (positive addition, negative addition, positive multiplication), this brings gpt-oss-120b to 400/400 on basic two-operand arithmetic with integers below 100 in absolute value.

Next
  1. Scale operands to 3-digit and 4-digit ranges to find accuracy limits
  2. Test division, which introduces remainders and potential floating-point ambiguity
  3. Run the full suite on a non-reasoning model to isolate the chain-of-thought effect