In harmony with gpt-oss

📅 2026-03-31
🤖 AI Summary
This study addresses the lack of an independent reproduction of OpenAI's published gpt-oss-20b scores on tool-use benchmarks; the original paper disclosed neither the specific tools nor the agent harness used. By reverse-engineering the tools present in the model's training distribution, we identify those tools and develop HarmonyAgent, a lightweight agent harness that encodes messages in the model's native harmony format and so avoids the lossy conversion imposed by the standard Chat Completions API. We show that gpt-oss-20b calls in-distribution tools with high statistical confidence even when the prompt contains no tool definitions. Our evaluation reproduces OpenAI's reported results: 60.4% on SWE-bench Verified at high reasoning effort (published: 60.7%), 53.3% at medium reasoning effort (published: 53.2%), and 91.7% on AIME25 with tools (published: 90.4%).
📝 Abstract
No one has independently reproduced OpenAI's published scores for gpt-oss-20b with tools, because the original paper discloses neither the tools nor the agent harness. We reverse-engineered the model's in-distribution tools: when prompted without tool definitions, gpt-oss still calls tools from its training distribution with high statistical confidence, which points to a strong prior rather than a hallucination. We then built a native harmony agent harness (https://github.com/borislavmavrin/harmonyagent.git) that encodes messages in the model's native format, bypassing the lossy Chat Completions conversion. Together, these yield the first independent reproduction of OpenAI's published scores: 60.4% on SWE-bench Verified at high reasoning effort (published 60.7%), 53.3% at medium reasoning effort (published 53.2%), and 91.7% on AIME25 with tools (published 90.4%).
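To make the closeness of the reproduction concrete, the short sketch below computes the gap between each reproduced score and the published one, in percentage points. The numbers are copied from the abstract; the dictionary keys and the helper name `score_gaps` are illustrative choices, not identifiers from the paper or the harmonyagent repository.

```python
# (reproduced, published) scores in percent, as quoted in the abstract.
# Key names are hypothetical labels chosen for this sketch.
SCORES = {
    "swe_verified_high": (60.4, 60.7),
    "swe_verified_medium": (53.3, 53.2),
    "aime25_with_tools": (91.7, 90.4),
}


def score_gaps(scores):
    """Return reproduced-minus-published gaps, rounded to 0.1 percentage points."""
    return {
        name: round(reproduced - published, 1)
        for name, (reproduced, published) in scores.items()
    }


gaps = score_gaps(SCORES)
for name, gap in gaps.items():
    # A negative gap means the reproduction scored below the published value.
    print(f"{name}: {gap:+.1f} points")

# Largest absolute deviation across the three benchmarks.
max_abs_gap = max(abs(gap) for gap in gaps.values())
```

Under these figures the two SWE results differ from the published values by 0.3 and 0.1 points, and the AIME25 result by 1.3 points.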
Problem

Research questions and friction points this paper is trying to address.

reproduction
gpt-oss
tool use
agent harness
benchmark scores
Innovation

Methods, ideas, or system contributions that make the work stand out.

tool-augmented reasoning
model reverse engineering
native agent harness
independent reproduction
gpt-oss