
OpenAI GPT Goes Open-Weight: gpt-oss Now Available on Ollama and Amazon Bedrock


OpenAI has just launched something huge, and it’s free, open, and incredibly powerful. Say hello to gpt-oss-120b and gpt-oss-20b, two open-weight models that bring near-state-of-the-art performance into your hands, all under the highly permissive Apache 2.0 license. Whether you’re a solo developer looking to build something amazing or a large enterprise needing full control over your AI stack, this release is for you.

Smarter Models, Open Weights

These aren’t just any open models. gpt-oss-120b achieves near-parity with OpenAI’s o4-mini on reasoning benchmarks while running efficiently on a single 80 GB GPU. Meanwhile, gpt-oss-20b hits benchmarks similar to o3-mini and runs in just 16 GB of memory, making it ideal for on-device use or cost-sensitive environments. They’re trained with reinforcement learning techniques informed by OpenAI’s latest proprietary systems, including o3 and other frontier models.

Built for Developers: Tool Use, CoT, Structured Output

Both models shine in agentic workflows: they support full Chain-of-Thought (CoT) reasoning, few-shot function calling, and structured outputs. You can easily adjust the reasoning effort (low, medium, high) to trade off performance and latency depending on your task. Web browsing, Python code execution, and tool chaining are all built in.
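
To make that concrete, here is a minimal function-calling sketch using the OpenAI-style chat completions API. The base URL, API key, model name, and tool schema below are illustrative assumptions that depend on how you serve the model; treat this as a sketch, not the definitive integration.

from openai import OpenAI

# Any OpenAI-compatible server hosting gpt-oss works here; the base URL,
# API key, and model name are placeholders for your own deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# A hypothetical tool the model may choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# If the model decides to call the tool, the structured call shows up here.
print(response.choices[0].message.tool_calls)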

Under the Hood: Sparse and Efficient

Here’s what makes these models tick:

Model         Layers   Total Params   Active Params/Token   Experts/Layer   Active Experts   Context Length
gpt-oss-120b  36       117B           5.1B                  128             4                128k
gpt-oss-20b   24       21B            3.6B                  32              4                128k

They use Mixture-of-Experts (MoE), Rotary Positional Embeddings (RoPE), and Grouped Multi-Query Attention for memory efficiency and scaling. Tokenization is done via a new open-source tokenizer: o200k_harmony.
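
If you want to poke at the tokenizer yourself, recent tiktoken releases ship the encoding under that name. A minimal sketch, assuming your installed tiktoken version includes it:

import tiktoken

# Load the o200k_harmony encoding (assumes a tiktoken release that ships it).
enc = tiktoken.get_encoding("o200k_harmony")
tokens = enc.encode("Mixture-of-Experts keeps the active parameter count small.")
print(len(tokens), tokens[:8])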

Training & Post-Training Like the Pros

Just like OpenAI’s best internal models, these were trained on a curated, mostly English text dataset rich in STEM, code, and general knowledge. Post-training included supervised fine-tuning and high-compute reinforcement learning, aligning them with the OpenAI Model Spec. That means these models don’t just sound smart—they think through problems before answering.

Real-World Performance: Benchmarks That Matter

Let’s talk numbers. Across coding, health, math, and reasoning tasks:

  • gpt-oss-120b outperforms o3-mini, and often matches or beats o4-mini.
  • gpt-oss-20b, despite its size, competes toe-to-toe with o3-mini and excels in math and health use cases.

This includes tests like MMLU, AIME, HealthBench, and Codeforces—and yes, they perform well with and without tools.

Safety First—Even for Open Models

Safety isn’t an afterthought. These models were built with:

  • CBRN-filtered pretraining
  • Deliberative alignment
  • Instructional refusal tuning

Plus, OpenAI went further by adversarially fine-tuning the models to simulate real-world misuse in domains like cybersecurity and bioengineering. The findings? Even with aggressive fine-tuning, the models remained within acceptable safety margins according to the Preparedness Framework.

Red Teaming Challenge: $500K Up for Grabs

To crowdsource safety research, OpenAI is launching a Red Teaming Challenge with a whopping $500,000 prize pool. If you can discover vulnerabilities or novel safety issues, you can contribute to safer AI and get rewarded for it.

Deployment-Ready, Anywhere

These models are designed to run anywhere:

  • Locally on your laptop or GPU
  • In the cloud via Hugging Face, Azure, AWS, Cloudflare, and more
  • On Windows via ONNX Runtime and VS Code

Reference implementations are available in PyTorch, Rust, and Apple Metal. You can also use the provided Harmony renderer to adapt to the model’s prompt format.
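
For intuition, here is a hand-rolled, simplified sketch of what a rendered harmony prompt looks like. The real renderer handles the exact special tokens, channels, and reasoning settings for you, so treat the layout below as an assumption for illustration only.

# Simplified, hand-rolled illustration of the harmony message layout.
# The actual renderer emits additional metadata and channel tokens, so
# use it in practice; this only shows the general shape.
def render_harmony(system: str, user: str) -> str:
    return (
        f"<|start|>system<|message|>{system}<|end|>"
        f"<|start|>user<|message|>{user}<|end|>"
        f"<|start|>assistant"
    )

print(render_harmony("You are a helpful assistant.\nReasoning: high",
                     "Summarize MXFP4 quantization in one sentence."))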

Why This Matters: Lowering Barriers, Driving Innovation

This release democratizes access to top-tier AI. It brings enterprise-grade reasoning and tool use to developers in underserved markets, research institutions, and startups. It’s a step toward a more open, equitable, and safe AI future.

Whether you’re fine-tuning for local use, deploying for a customer, or just tinkering with edge AI, GPT-OSS puts powerful tools in your hands—without vendor lock-in.

 

Ollama Partners with OpenAI to Launch gpt-oss 20B & 120B Models

In a groundbreaking collaboration, Ollama and OpenAI have joined forces to bring state-of-the-art open-weight models directly to your local machine. Say hello to gpt-oss-20b and gpt-oss-120b, designed for blazing-fast local AI experiences without compromising on reasoning power, developer flexibility, or agentic capabilities.

Whether you’re a developer building intelligent agents, a researcher exploring complex reasoning tasks, or a business looking for secure on-prem AI, this update is a game-changer.

Feature Highlights

Agentic Capabilities Out of the Box

Both models come pre-equipped for advanced function-calling, structured outputs, Python tool use, and even optional web browsing through Ollama’s built-in web search. This means you can augment the models with real-time information for highly dynamic tasks.

Transparent Chain-of-Thought Reasoning

You get full visibility into the models’ reasoning process. This not only boosts debuggability and trust but also enables deeper insights for researchers and developers alike.

Adjustable Reasoning Effort

Optimize for latency or depth: you can configure the reasoning effort (low, medium, high) depending on your use case. Whether you’re building a lightweight assistant or a deep-thinking agent, you’re in control.
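
One simple way to request a given effort level is to state it in the system prompt, following the model’s "Reasoning: <level>" convention from the harmony prompt format. A minimal sketch with the ollama Python package; the convention and response shape are assumptions about your installed version:

import ollama

# Ask for deeper reasoning via the system prompt; "Reasoning: high"
# follows the model's harmony prompt-format convention.
response = ollama.chat(
    model="gpt-oss:20b",
    messages=[
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Outline a 3-step test plan for a JSON parser."},
    ],
)
print(response["message"]["content"])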

Fine-Tuning Flexibility

These models are fine-tunable, allowing you to tailor them for domain-specific tasks with ease, from customer support bots to scientific research tools.

Apache 2.0 License Freedom

Built under a permissive Apache 2.0 license, you’re free to experiment, modify, and commercially deploy without copyleft concerns or patent issues. Innovation, unchained.

Performance Meets Portability with MXFP4 Quantization

To make local inference possible without heavy hardware, OpenAI and Ollama employ MXFP4 quantization — a 4.25-bit format for MoE (Mixture of Experts) weights that dominate the parameter count (~90%).

  • gpt-oss-20B can run on machines with just 16GB of RAM.

  • gpt-oss-120B fits into a single 80GB NVIDIA GPU.

This is made possible by:

  • Native MXFP4 format support in Ollama (no extra conversions needed)

  • Custom kernels developed for Ollama’s engine

  • Benchmarked parity with OpenAI’s reference implementations
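
A quick back-of-envelope check shows why 4.25 bits per weight makes those footprints plausible: MXFP4 packs 4-bit values in blocks of 32 that share one 8-bit scale, hence 4 + 8/32 = 4.25 bits. The 90/10 split below is an illustrative assumption, not the models’ exact layout:

# Rough weight-memory estimate under MXFP4 quantization.
# MXFP4: 4-bit values in blocks of 32 sharing one 8-bit scale,
# i.e. 4 + 8/32 = 4.25 bits per weight.
def approx_gigabytes(params: float, moe_fraction: float = 0.90) -> float:
    # Assume ~90% of parameters (the MoE experts) are MXFP4 and the
    # remainder stays in 16-bit; the exact split varies per model.
    bits = params * moe_fraction * 4.25 + params * (1 - moe_fraction) * 16
    return bits / 8 / 1e9

print(f"gpt-oss-120b: ~{approx_gigabytes(117e9):.0f} GB")  # ~79 GB, fits an 80 GB GPU
print(f"gpt-oss-20b:  ~{approx_gigabytes(21e9):.0f} GB")   # ~14 GB, fits in 16 GB of RAM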

Two Models, Endless Possibilities

gpt-oss:20b

  • 20 billion parameters

  • Low-latency, high-efficiency

  • Perfect for local assistants, embedded AI, or specialized tasks

gpt-oss:120b

  • 120 billion parameters

  • Designed for general-purpose reasoning and agentic intelligence

  • Ideal for production-grade apps, AI copilots, and decision support systems

Boosted by NVIDIA RTX: Local AI at Its Best

Ollama’s deepening partnership with NVIDIA ensures that both models are optimized for GeForce RTX and RTX PRO GPUs. This unlocks top-tier performance on your local RTX-powered PC — no cloud dependency, just raw AI horsepower at your fingertips.

How to Get Started

Download the latest version of Ollama and start running the models right from your terminal or the new app interface:

ollama run gpt-oss:20b
ollama run gpt-oss:120b

Tip: Use :20b for fast, responsive tasks and :120b for advanced reasoning and complex workflows.
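
Since Ollama also serves an OpenAI-compatible API on localhost, the official openai client works against a local model too. The api_key value is just a required placeholder:

from openai import OpenAI

# Ollama exposes an OpenAI-compatible endpoint at /v1 on its default port.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences."}],
)
print(response.choices[0].message.content)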

Introducing OpenAI GPT-OSS Models on Amazon Bedrock and SageMaker JumpStart

AWS is thrilled to introduce two powerful OpenAI models with open weights, gpt-oss-120b and gpt-oss-20b, now available on Amazon Bedrock and Amazon SageMaker JumpStart. These models are not just powerful but practical: they’re designed to empower developers, startups, and enterprises to build AI-native applications with greater transparency, fine-tuning capability, and full infrastructure control.

What’s New?

Both models are optimized for:

  • Natural language generation

  • Scientific and mathematical reasoning

  • Coding tasks

  • Tool-augmented and agentic workflows

They come with a 128K context window, adjustable reasoning levels (low, medium, high), and external tool support. You can plug them into frameworks like Strands Agents, making them ideal for building autonomous, chain-of-thought agents.

Access the Models Where You Build
1. Amazon Bedrock — Serverless Simplicity

In Amazon Bedrock, you can:

  • Request access to gpt-oss-120b and gpt-oss-20b via the Model Access section

  • Use the Chat/Test playground to prototype and explore

  • Deploy with OpenAI SDK compatibility using a Bedrock endpoint

  • Invoke the model with the OpenAI Python SDK like this:

# Point the OpenAI SDK at the Bedrock endpoint (shell):
#   export OPENAI_API_KEY="<my-bedrock-api-key>"
#   export OPENAI_BASE_URL="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"

from openai import OpenAI

# The client picks up OPENAI_API_KEY and OPENAI_BASE_URL from the environment.
client = OpenAI()

response = client.chat.completions.create(
    model="openai.gpt-oss-120b-1:0",
    messages=[{"role": "user", "content": "Tell me the square root of 42 ^ 3"}],
    stream=False,
)
print(response)

Agent Support Example with Strands:

from strands import Agent
from strands.models import BedrockModel

bedrock_model = BedrockModel(
    model_id="openai.gpt-oss-120b-1:0",
    region_name="us-west-2",
)

agent = Agent(model=bedrock_model)
agent("Tell me the square root of 42 ^ 3")

Once satisfied, you can deploy your agent using Amazon Bedrock AgentCore for full runtime, memory, and identity management.

2. Amazon SageMaker JumpStart — ML Flexibility

With SageMaker JumpStart, you can:

  • Select and deploy either model in just a few clicks

  • Choose your instance type and scale based on your workload

  • Fine-tune the models for specific use cases

  • Run inference directly in SageMaker Studio or using the AWS SDK

JumpStart provides full control over your deployment environment and model lifecycle, making it ideal for production-scale AI applications and experimentation.
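
Once an endpoint is up, inference from the AWS SDK looks roughly like this. The endpoint name and request schema below are assumptions that depend on how you deployed the model:

import json
import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

# Hypothetical endpoint name and payload schema; adjust to your deployment.
payload = {
    "messages": [{"role": "user", "content": "Give three uses for gpt-oss-20b."}],
    "max_tokens": 256,
}
response = runtime.invoke_endpoint(
    EndpointName="my-gpt-oss-20b-endpoint",
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(response["Body"].read().decode())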

Why This Matters: Open Weight, Open Innovation

Here’s why open-weight models on AWS are a game-changer:

  • Transparency: See the chain-of-thought output to understand how the model reasons

  • Customization: Fine-tune and adapt the models to your domain or data

  • Security First: Built-in safety mechanisms and compatibility with AWS security best practices

  • Open Ecosystem: Leverage the OpenAI API or Bedrock-native tools for deployment, orchestration, and monitoring

Region Availability
Platform             Available Regions
Amazon Bedrock       US West (Oregon)
SageMaker JumpStart  US East (N. Virginia, Ohio), APAC (Mumbai, Tokyo)
