GPT-OSS 120B

Introducing gpt-oss-120b, OpenAI's flagship open-weight model in the gpt-oss series, built for advanced reasoning, large-scale agentic workloads, and enterprise-grade automation. With roughly 117B total parameters and a Mixture-of-Experts (MoE) architecture that activates about 5.1B parameters per token, it delivers exceptional intelligence while maintaining competitive latency. Designed for complex reasoning, multi-task agents, and long-horizon planning, gpt-oss-120b brings frontier-level capability to commercial and self-hosted deployments.


api_example.sh

curl -X POST "https://platform.qubrid.com/chat/completions" \
  -H "Authorization: Bearer $QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "openai/gpt-oss-120b",
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing in simple terms"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 500
}'
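The same request from Python, for anyone not working in the shell. This is a minimal sketch using the third-party requests library; it assumes the endpoint accepts and returns OpenAI-style chat payloads, as the curl example suggests, and the SDK libraries listed further down are the supported route.

import os

import requests

# Same request as the curl example above.
# Assumes an OpenAI-compatible chat/completions payload and response.
API_URL = "https://platform.qubrid.com/chat/completions"

headers = {
    "Authorization": f"Bearer {os.environ['QUBRID_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    "temperature": 0.7,
    "max_tokens": 500,
}

response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
response.raise_for_status()
# Assumes an OpenAI-style response shape: choices[0].message.content
print(response.json()["choices"][0]["message"]["content"])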

Technical Specifications

Model Architecture & Performance

Model Size 117B Params
Context Length 128K Tokens
Quantization MXFP4
Tokens/Second 389
License Apache 2.0
Release Date August 2025
Developers OpenAI

Pricing

Pay-per-use, no commitments

Input Tokens $0.00015/1K Tokens
Output Tokens $0.00061/1K Tokens
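To make the rates concrete: a request that reads 2,000 input tokens and generates 500 output tokens costs 2 × $0.00015 + 0.5 × $0.00061 = $0.000605, about six hundredths of a cent. A small helper for estimating spend (the function name is ours, for illustration only):

# Estimate per-request cost from the published per-1K-token rates.
INPUT_RATE_PER_1K = 0.00015   # USD per 1K input tokens
OUTPUT_RATE_PER_1K = 0.00061  # USD per 1K output tokens

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request (illustrative helper)."""
    return (input_tokens / 1000) * INPUT_RATE_PER_1K \
        + (output_tokens / 1000) * OUTPUT_RATE_PER_1K

print(f"${estimate_cost_usd(2000, 500):.6f}")  # $0.000605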

API Reference

Complete parameter documentation

stream (boolean, default: true): Enable streaming responses for real-time output.
temperature (number, default: 0.7): Controls randomness. Higher values mean more creative but less predictable output.
max_tokens (number, default: 4096): Maximum number of tokens to generate in the response.
top_p (number, default: 1): Nucleus sampling: considers only the tokens within the top_p probability mass.
effort (select, default: medium): Controls how much reasoning effort the model should apply.
summary (select, default: concise): Controls the level of explanation in the reasoning summary.

Explore the full request and response schema in our external API documentation
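To show how these parameters fit together in practice, here is a sketch that enables streaming and sets the sampling and reasoning controls from the reference above. The wire format for streamed chunks is an assumption here (OpenAI-style server-sent events with "data:" lines); the external API documentation is authoritative.

import json
import os

import requests

API_URL = "https://platform.qubrid.com/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['QUBRID_API_KEY']}"}

payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Summarize the CAP theorem."}],
    "stream": True,        # the default, shown explicitly here
    "temperature": 0.7,
    "top_p": 1,
    "max_tokens": 1024,
    "effort": "medium",    # reasoning effort, per the parameter reference
    "summary": "concise",  # reasoning summary level, per the parameter reference
}

# Assumes OpenAI-style SSE: each chunk arrives as a "data: {...}" line,
# with "data: [DONE]" marking the end of the stream.
with requests.post(API_URL, headers=headers, json=payload,
                   stream=True, timeout=120) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        chunk = line[len("data: "):]
        if chunk == "[DONE]":
            break
        delta = json.loads(chunk)["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)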

Performance

Strengths & considerations

Strengths

High-capacity MoE design for strong reasoning and generalization
Sparse expert activation (about 5.1B active parameters per token) for high throughput
Retains quality under native MXFP4 quantization of the MoE weights
Scales across multi-GPU clusters and distributed inference setups
Up to 128K context window with efficient sparse attention
Superior agentic and planning abilities for sequential decision tasks
Built-in support for structured schema-based function calling (see the sketch after this list)
Apache 2.0 license enabling commercial and derivative use

Considerations

Higher compute and memory requirements compared to smaller gpt-oss models
Latency may increase on single-GPU deployments
Fine-tuning recommended for highly specialized enterprise domains
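A sketch of the schema-based function calling noted in the strengths above. It assumes the endpoint accepts the OpenAI-compatible tools and tool_calls fields; the tool itself (get_weather) is hypothetical and only for illustration.

import json
import os

import requests

API_URL = "https://platform.qubrid.com/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['QUBRID_API_KEY']}"}

# JSON-schema tool definition in the OpenAI-compatible format
# (assumed to be what "schema-based function calling" refers to here).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": tools,
    "stream": False,
}

response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
response.raise_for_status()
message = response.json()["choices"][0]["message"]

# If the model chose to call the tool, arguments arrive as a JSON string.
for call in message.get("tool_calls", []):
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))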

Enterprise
Platform Integration


Docker Support

Official Docker images for containerized deployments


Kubernetes Ready

Production-grade Kubernetes manifests and Helm charts


SDK Libraries

Official SDKs for Python, JavaScript, Go, and Java

Don't let your AI control you. Control your AI the Qubrid way!

Have questions? Want to partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.

"Qubrid helped us turn a collection of AI scripts into structured production workflows. We now have better reliability, visibility, and control over every run."

AI Infrastructure Team, Automation & Orchestration