LLMs can reliably perform simple multiplication involving negative integers with absolute value below 100.
Follow-up to the positive multiplication scratchpad. Negative multiplication combines sign handling with magnitude computation — the model needs to apply sign rules (neg × pos = neg, neg × neg = pos, anything × 0 = 0) on top of the multiplication itself.
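The sign rules above are mechanical, which makes them easy to encode as an oracle for grading. A minimal sketch (the helper name `expected_sign` is mine, not from the experiment code):

```python
def expected_sign(a: int, b: int) -> int:
    """Sign the product should carry: neg x pos = neg, neg x neg = pos, anything x 0 = 0."""
    if a == 0 or b == 0:
        return 0
    return 1 if (a < 0) == (b < 0) else -1

# Spot checks of the three rules
assert expected_sign(-7, 8) == -1   # one negative -> negative
assert expected_sign(-7, -8) == 1   # both negative -> positive
assert expected_sign(0, -5) == 0    # zero operand -> zero
```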
100 multiplication problems, stratified into 3 cases:
- One negative (40 pairs): one operand in [-99, -1], the other in [1, 99] — result is negative
- Both negative (40 pairs): both operands in [-99, -1] — result is positive
- Zero result (20 pairs): one operand is zero
Product magnitudes ranged from 0 to 9,120.
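The stratified sampling can be sketched as follows. The seed and the exact sampling procedure are assumptions for illustration; only the strata and operand ranges come from the description above.

```python
import random

random.seed(0)  # hypothetical seed; the original run's sampling isn't specified

def sample_pairs():
    pairs = []
    # One negative (40 pairs): one operand in [-99, -1], the other in [1, 99]
    for _ in range(40):
        pairs.append((random.randint(-99, -1), random.randint(1, 99)))
    # Both negative (40 pairs): both operands in [-99, -1]
    for _ in range(40):
        pairs.append((random.randint(-99, -1), random.randint(-99, -1)))
    # Zero result (20 pairs): one operand is zero
    for _ in range(20):
        a = random.randint(-99, 99)
        pairs.append((0, a) if random.random() < 0.5 else (a, 0))
    return pairs

pairs = sample_pairs()
assert len(pairs) == 100
assert all(a * b != 0 for a, b in pairs[:80])
assert all(a * b == 0 for a, b in pairs[80:])
```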
- Model: gpt-oss-120b via Cerebras API (free tier)
- Temperature: 0, top_p: 1, max_completion_tokens: 1024
- Prompt: "What is {a} * {b}? Reply with only the number."
- 1 run per pair, 2-second delay between calls
- 1 call hit a 429 rate limit and was retried successfully
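The query loop with 429 handling can be sketched like this. `call` stands in for the actual Cerebras API client (an OpenAI-compatible chat call with the parameters above); `RateLimitError` is a placeholder for the client's 429 exception, not a real class from that SDK.

```python
import time

class RateLimitError(Exception):
    """Placeholder for the API client's HTTP 429 exception."""

def ask_with_retry(call, a, b, max_retries=3, delay=2.0):
    """Send the exact prompt used in the run and retry once-per-429.

    `call` takes a prompt string and returns the model's reply text;
    it is a stand-in for the real chat-completion request
    (temperature=0, top_p=1, max_completion_tokens=1024).
    """
    prompt = f"What is {a} * {b}? Reply with only the number."
    for attempt in range(max_retries + 1):
        try:
            return call(prompt)
        except RateLimitError:
            if attempt == max_retries:
                raise
            time.sleep(delay)  # back off before retrying
```

In the run described above, one call raised a 429 and succeeded on the retry, which this pattern handles.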
CONFIRMED
100% accuracy across all 100 problems. No sign errors, no magnitude errors, no parse failures.
| Case | Total | Correct | Accuracy |
|---|---|---|---|
| One negative (result negative) | 40 | 40 | 100.0% |
| Both negative (result positive) | 40 | 40 | 100.0% |
| Zero result | 20 | 20 | 100.0% |
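A grading check consistent with "no sign errors, no magnitude errors, no parse failures" is a strict integer comparison; the helper below is a sketch (tolerating stray whitespace and thousands separators is my assumption about the parser).

```python
def grade(reply: str, a: int, b: int) -> bool:
    """Parse the model's reply as an integer and compare to a * b.

    Strips whitespace and thousands separators; any parse failure
    counts as incorrect.
    """
    try:
        return int(reply.strip().replace(",", "")) == a * b
    except ValueError:
        return False

assert grade("-56", -7, 8)      # one negative -> negative product
assert grade("72", -8, -9)      # both negative -> positive product
assert grade("0", 0, 43)        # zero operand -> zero
assert not grade("fifty-six", 7, 8)  # parse failure counts as wrong
```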
Sign rules were applied correctly in every case. Combined with the three prior experiments (addition positive, addition negative, multiplication positive), gpt-oss-120b is now 400/400 on basic two-operand arithmetic with integers below 100 in absolute value.
- Scale operands to 3-digit and 4-digit ranges to find accuracy limits
- Test division, which introduces remainders and potential floating-point ambiguity
- Run the full suite on a non-reasoning model to isolate the chain-of-thought effect