OpenAI has just launched something huge, and it’s free, open, and incredibly powerful. Say hello to gpt-oss-120b and gpt-oss-20b, two open-weight models that bring near-state-of-the-art performance into your hands, all under the highly permissive Apache 2.0 license. Whether you’re a solo developer looking to build something amazing or a large enterprise needing full control over your AI stack, this release is for you.
Smarter Models, Open Weights
These aren’t just any open models. gpt-oss-120b achieves near-parity with OpenAI’s o4-mini on reasoning benchmarks while running efficiently on a single 80 GB GPU. Meanwhile, gpt-oss-20b hits benchmarks similar to o3-mini and runs in just 16 GB of memory, making it ideal for on-device use or cost-sensitive environments. Both were trained with reinforcement learning techniques informed by OpenAI’s latest proprietary systems, including o3 and other frontier models.
Built for Developers: Tool Use, CoT, Structured Output
Both models shine in agentic workflows: they support full Chain-of-Thought (CoT) reasoning, few-shot function calling, and structured outputs. You can easily adjust the reasoning effort (low, medium, high) to trade off performance and latency depending on your task. Web browsing, Python code execution, and tool chaining are all built in.
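If you serve the weights behind an OpenAI-compatible endpoint, the effort level is set in the system message using the models’ harmony chat format. A minimal sketch, assuming a local Ollama server at its default port; the base URL and model tag are assumptions to adapt to your own deployment:

```python
# Steering gpt-oss reasoning effort from the system prompt, via any
# OpenAI-compatible endpoint. The base_url below assumes a local Ollama
# server; swap in your own host and model name as needed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        # The harmony chat format reads the effort level (low/medium/high) here.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Outline a 3-step plan to profile a slow SQL query."},
    ],
)
print(response.choices[0].message.content)
```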
Under the Hood: Sparse and Efficient
Here’s what makes these models tick:
| Model | Layers | Total Params | Active Params/Token | Experts/Layer | Active Experts | Context Length |
|---|---|---|---|---|---|---|
| gpt-oss-120b | 36 | 117B | 5.1B | 128 | 4 | 128k |
| gpt-oss-20b | 24 | 21B | 3.6B | 32 | 4 | 128k |
They use Mixture-of-Experts (MoE), Rotary Positional Embeddings (RoPE), and Grouped Multi-Query Attention for memory efficiency and scaling. Tokenization is done via a new open-source tokenizer: o200k_harmony.
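For a quick feel of the tokenizer, here’s a minimal sketch, assuming your installed tiktoken release is recent enough to ship the o200k_harmony encoding:

```python
# Tokenizing text with o200k_harmony via tiktoken (pip install tiktoken).
# Assumes a tiktoken version that includes this encoding.
import tiktoken

enc = tiktoken.get_encoding("o200k_harmony")
tokens = enc.encode("Mixture-of-Experts keeps most weights asleep per token.")
print(len(tokens), tokens[:8])   # token count and a peek at the IDs
print(enc.decode(tokens))        # round-trips back to the original string
```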
Training & Post-Training Like the Pros
Just like OpenAI’s best internal models, these were trained on a curated, mostly English text dataset rich in STEM, code, and general knowledge. Post-training included supervised fine-tuning and high-compute reinforcement learning, aligning them with the OpenAI Model Spec. That means these models don’t just sound smart—they think through problems before answering.
Real-World Performance: Benchmarks That Matter
Let’s talk numbers. Across coding, health, math, and reasoning tasks:
- gpt-oss-120b outperforms o3-mini, and often matches or beats o4-mini.
- gpt-oss-20b, despite its size, competes toe-to-toe with o3-mini and excels in math and health use cases.
This includes tests like MMLU, AIME, HealthBench, and Codeforces—and yes, they perform well with and without tools.
Safety First—Even for Open Models
Safety isn’t an afterthought. These models were built with:
- CBRN-filtered pretraining
- Deliberative alignment
- Instructional refusal tuning
Plus, OpenAI went further by adversarially fine-tuning the models to simulate real-world misuse in domains like cybersecurity and bioengineering. The findings? Even with aggressive fine-tuning, the models remained within acceptable safety margins according to the Preparedness Framework.
Red Teaming Challenge: $500K Up for Grabs
To crowdsource safety research, OpenAI is launching a Red Teaming Challenge with a whopping $500,000 prize pool. If you can discover vulnerabilities or novel safety issues, you can contribute to safer AI and get rewarded for it.
Deployment-Ready, Anywhere
These models are designed to run anywhere:
- Locally on your laptop or GPU
- In the cloud via Hugging Face, Azure, AWS, Cloudflare, and more
- On Windows via ONNX Runtime and VS Code
Reference implementations are available in PyTorch, Rust, and Apple Metal. You can also use the provided Harmony renderer to adapt to the model’s prompt format.
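As a hedged sketch of what the Harmony renderer looks like in Python, using the openai-harmony package (`pip install openai-harmony`); the exact API below follows the package README and may evolve:

```python
# Rendering a conversation into the token format gpt-oss expects.
from openai_harmony import (
    HarmonyEncodingName,
    load_harmony_encoding,
    Conversation,
    Message,
    Role,
)

encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

convo = Conversation.from_messages([
    Message.from_role_and_content(Role.USER, "What is the capital of France?"),
])

# Token IDs to feed to your own inference stack; generation should stop
# at the encoding's assistant stop tokens.
tokens = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
print(tokens[:16])
```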
Why This Matters: Lowering Barriers, Driving Innovation
This release democratizes access to top-tier AI. It brings enterprise-grade reasoning and tool use to developers in underserved markets, research institutions, and startups. It’s a step toward a more open, equitable, and safe AI future.
Whether you’re fine-tuning for local use, deploying for a customer, or just tinkering with edge AI, gpt-oss puts powerful tools in your hands, without vendor lock-in.
Ollama Partners with OpenAI to Launch gpt-oss 20B & 120B Models
In a groundbreaking collaboration, Ollama and OpenAI have joined forces to bring state-of-the-art open-weight models directly to your local machine. Say hello to gpt-oss-20b and gpt-oss-120b, designed for blazing-fast, local AI experiences without compromising on reasoning power, developer flexibility, or agentic capabilities.
Whether you’re a developer building intelligent agents, a researcher exploring complex reasoning tasks, or a business looking for secure on-prem AI, this update is a game-changer.
Feature Highlights
Agentic Capabilities Out of the Box
Both models come pre-equipped for advanced function-calling, structured outputs, Python tool use, and even optional web browsing through Ollama’s built-in web search. This means you can augment the models with real-time information for highly dynamic tasks.
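A minimal function-calling sketch with the ollama Python client (`pip install ollama`); `get_weather` is a toy tool of our own, and passing plain Python functions as tools assumes a recent client release:

```python
# Function calling with the ollama Python client against gpt-oss.
import ollama

def get_weather(city: str) -> str:
    """Toy stand-in tool: return a canned weather report for a city."""
    return f"Sunny and 22°C in {city}"

response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[get_weather],  # recent ollama clients accept plain Python functions
)

# If the model decided to call the tool, run it and inspect the call.
for call in response.message.tool_calls or []:
    result = get_weather(**call.function.arguments)
    print(call.function.name, "->", result)
```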
Transparent Chain-of-Thought Reasoning
You get full visibility into the models’ reasoning process. This not only boosts debuggability and trust but also enables deeper insights for researchers and developers alike.
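A hedged sketch of reading that reasoning trace with the ollama Python client; the `think` flag and `message.thinking` field follow recent ollama releases and are assumptions for older versions:

```python
# Surfacing the model's chain of thought alongside its final answer.
import ollama

response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Is 1001 prime?"}],
    think=True,  # ask the server to return the reasoning trace separately
)

print("thinking:", response.message.thinking)  # the model's reasoning trace
print("answer:  ", response.message.content)   # the user-facing reply
```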
Adjustable Reasoning Effort
Optimize for latency or depth: you can configure the reasoning effort (low, medium, high) depending on your use case. Whether you’re building a lightweight assistant or a deep-thinking agent, you’re in control.
Fine-Tuning Flexibility
These models are fine-tunable, allowing you to tailor them for domain-specific tasks with ease, from customer support bots to scientific research tools.
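As one possible route (not the only one), a hedged LoRA fine-tuning sketch using the Hugging Face TRL and PEFT libraries with the openai/gpt-oss-20b checkpoint on the Hub; the dataset file and hyperparameters are placeholders to adapt:

```python
# LoRA supervised fine-tuning sketch with TRL + PEFT
# (pip install trl peft datasets transformers).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder dataset: a JSONL file with a "messages" (chat) or "text" column.
dataset = load_dataset("json", data_files="my_domain_data.jsonl", split="train")

trainer = SFTTrainer(
    model="openai/gpt-oss-20b",  # the open-weight checkpoint on the Hub
    train_dataset=dataset,
    peft_config=LoraConfig(r=8, lora_alpha=16, target_modules="all-linear"),
    args=SFTConfig(
        output_dir="gpt-oss-20b-custom",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
    ),
)
trainer.train()
```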
Apache 2.0 License Freedom
Built under a permissive Apache 2.0 license, you’re free to experiment, modify, and commercially deploy without copyleft concerns or patent issues. Innovation, unchained.
Performance Meets Portability with MXFP4 Quantization
To make local inference possible without heavy hardware, OpenAI and Ollama employ MXFP4 quantization, a roughly 4.25-bit-per-weight format applied to the MoE (Mixture-of-Experts) weights, which account for about 90% of the parameter count.
- gpt-oss-20b can run on machines with just 16 GB of RAM.
- gpt-oss-120b fits on a single 80 GB NVIDIA GPU.

This is made possible by:

- Native MXFP4 format support in Ollama (no extra conversions needed)
- Custom kernels developed for Ollama’s engine
- Benchmarked parity with OpenAI’s reference implementations
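As a back-of-the-envelope check on those memory claims (pure arithmetic; real footprints also include activations and the non-MoE tensors kept at higher precision):

```python
# Why ~4.25 bits/weight makes these models fit: FP4 values plus
# shared per-block scales (the "MX" in MXFP4) average out to ~4.25 bits.
def approx_weight_gb(params_billion: float, bits_per_weight: float = 4.25) -> float:
    """Approximate weight memory in GB for a given parameter count."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(f"gpt-oss-120b: ~{approx_weight_gb(117):.0f} GB of weights")  # ~62 GB < 80 GB GPU
print(f"gpt-oss-20b:  ~{approx_weight_gb(21):.0f} GB of weights")   # ~11 GB < 16 GB RAM
```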
Two Models, Endless Possibilities
gpt-oss:20b

- 20 billion parameters
- Low latency and high efficiency
- Perfect for local assistants, embedded AI, or specialized tasks

gpt-oss:120b

- 120 billion parameters
- Designed for general-purpose reasoning and agentic intelligence
- Ideal for production-grade apps, AI copilots, and decision support systems
Boosted by NVIDIA RTX: Local AI at Its Best
Ollama’s deepening partnership with NVIDIA ensures that both models are optimized for GeForce RTX and RTX PRO GPUs. This unlocks top-tier performance on your local RTX-powered PC — no cloud dependency, just raw AI horsepower at your fingertips.
How to Get Started
Download the latest version of Ollama and start running the models right from your terminal or the new app interface:
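```
ollama run gpt-oss:20b
ollama run gpt-oss:120b
```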
Tip: Use `:20b` for fast, responsive tasks and `:120b` for advanced reasoning and complex workflows.
Introducing OpenAI GPT-OSS Models on Amazon Bedrock and SageMaker JumpStart
AWS is thrilled to introduce two powerful open-weight OpenAI models, gpt-oss-120b and gpt-oss-20b, now available on Amazon Bedrock and Amazon SageMaker JumpStart. These models are not just powerful; they’re practical. They’re designed to empower developers, startups, and enterprises to build AI-native applications with greater transparency, fine-tuning capability, and full infrastructure control.
What’s New?
Both models are optimized for:
- Natural language generation
- Scientific and mathematical reasoning
- Coding tasks
- Tool-augmented and agentic workflows
They come with a 128K context window, adjustable reasoning levels (low, medium, high), and external tool support. You can plug them into frameworks like Strands Agents, making them ideal for building autonomous, chain-of-thought agents.
Access the Models Where You Build
1. Amazon Bedrock — Serverless Simplicity
In Amazon Bedrock, you can:
- Request access to `gpt-oss-120b` and `gpt-oss-20b` via the Model Access section
- Use the Chat/Test playground to prototype and explore
- Deploy with OpenAI SDK compatibility using a Bedrock endpoint
- Invoke the model with the OpenAI Python SDK (see the sketch below)
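A hedged sketch of that call through Bedrock’s OpenAI-compatible Chat Completions endpoint; the region, model ID, and API-key environment variable are assumptions, so check the Bedrock console for the values in your account:

```python
# Calling gpt-oss on Amazon Bedrock with the OpenAI Python SDK.
import os
from openai import OpenAI

client = OpenAI(
    # Assumed region; Bedrock exposes an OpenAI-compatible path per region.
    base_url="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1",
    api_key=os.environ["AWS_BEDROCK_API_KEY"],  # a Bedrock API key, not an OpenAI key
)

response = client.chat.completions.create(
    model="openai.gpt-oss-120b-1:0",  # assumed Bedrock model ID; verify in Model Access
    messages=[{"role": "user", "content": "Summarize MXFP4 quantization in two sentences."}],
)
print(response.choices[0].message.content)
```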
Agent Support Example with Strands:
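A minimal sketch using the open-source Strands Agents SDK (`pip install strands-agents`); the Bedrock model ID below is an assumption to verify in your account:

```python
# A Strands Agents agent backed by gpt-oss on Bedrock.
from strands import Agent
from strands.models import BedrockModel

model = BedrockModel(model_id="openai.gpt-oss-120b-1:0")  # assumed model ID
agent = Agent(model=model, system_prompt="You are a concise research assistant.")

# Invoking the agent runs the full reasoning/tool loop and returns the result.
result = agent("Compare gpt-oss-20b and gpt-oss-120b for an on-prem deployment.")
print(result)
```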
Once satisfied, you can deploy your agent using Amazon Bedrock AgentCore for full runtime, memory, and identity management.
2. Amazon SageMaker JumpStart — ML Flexibility
With SageMaker JumpStart, you can:
- Select and deploy either model in just a few clicks
- Choose your instance type and scale based on your workload
- Fine-tune the models for specific use cases
- Run inference directly in SageMaker Studio or using the AWS SDK (see the sketch below)
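For the deploy-and-invoke path, a hedged sketch with the SageMaker Python SDK’s JumpStart interface (`pip install sagemaker`); the model ID placeholder, instance type, and payload shape are assumptions to verify in the JumpStart model hub:

```python
# Deploying a JumpStart model to a real-time endpoint and invoking it.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="<gpt-oss-120b-jumpstart-id>")  # placeholder ID
predictor = model.deploy(initial_instance_count=1, instance_type="ml.p5.48xlarge")

# Assumed chat-style payload; check the model's JumpStart card for the schema.
payload = {"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 256}
print(predictor.predict(payload))

# Clean up the endpoint when you're done paying for it:
# predictor.delete_endpoint()
```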
JumpStart provides full control over your deployment environment and model lifecycle, making it ideal for production-scale AI applications and experimentation.
Why This Matters: Open Weight, Open Innovation
Here’s why open-weight models on AWS are a game-changer:
- Transparency: see the chain-of-thought output to understand how the model reasons
- Customization: fine-tune and adapt the models to your domain or data
- Security first: built-in safety mechanisms and compatibility with AWS security best practices
- Open ecosystem: leverage the OpenAI API or Bedrock-native tools for deployment, orchestration, and monitoring
Region Availability
| Platform | Available Regions |
|---|---|
| Amazon Bedrock | US West (Oregon) |
| SageMaker JumpStart | US East (N. Virginia, Ohio), APAC (Mumbai, Tokyo) |