Qwen/Qwen3.5-122B-A10B
Qwen3.5-122B-A10B is the most powerful open-source model in the Qwen3.5 Medium Series. With 122B total parameters and 10B active per token across a 48-layer hybrid architecture, it delivers the strongest knowledge, vision, and function-calling performance in the medium class: 86.6% on GPQA Diamond (beating GPT-5 mini's 82.8%), 72.2% on BFCL-V4 tool calling (vs GPT-5 mini's 55.5%), 92.1% on OCRBench, and 83.9% on MMMU. It natively supports text, image, and video input via early fusion.
Technical Specifications
Model Architecture & Performance
Pricing
Pay-per-use, no commitments
API Reference
Complete parameter documentation
| Parameter | Type | Default | Description |
|---|---|---|---|
| stream | boolean | true | Enable streaming responses for real-time output. |
| temperature | number | 1 | Recommended 1.0 for thinking mode. Use 0.6–0.7 for non-thinking tasks. |
| max_tokens | number | 16384 | Maximum tokens to generate. Thinking mode may require higher values. |
| top_p | number | 0.95 | Nucleus sampling parameter. |
| top_k | number | 20 | Limits token sampling to top-k candidates. |
| presence_penalty | number | 1.5 | Reduces repetition in longer outputs. Recommended 1.5 for this model. |
| enable_thinking | boolean | true | Toggle chain-of-thought reasoning. Enables deep problem solving at the cost of higher latency. |
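The parameters above can be combined into a single chat-completion request. The sketch below is illustrative only: the endpoint URL, the `QUBRID_API_URL`/`QUBRID_API_KEY` variable names, and the request shape are assumptions modeled on common OpenAI-compatible APIs, not confirmed details of this platform; consult the API reference for the actual schema. When no endpoint is configured, the script just prints the payload it would send.

```shell
#!/bin/sh
# Build a request body using this model's recommended defaults.
# NOTE: endpoint path and field names are assumed (OpenAI-compatible style);
# verify them against the official API documentation before use.
PAYLOAD=$(cat <<'EOF'
{
  "model": "Qwen/Qwen3.5-122B-A10B",
  "stream": true,
  "temperature": 1.0,
  "top_p": 0.95,
  "top_k": 20,
  "presence_penalty": 1.5,
  "max_tokens": 16384,
  "enable_thinking": true,
  "messages": [
    {"role": "user", "content": "Extract the total amount from this invoice text."}
  ]
}
EOF
)

if [ -n "${QUBRID_API_URL:-}" ]; then
  # Send the request only when an endpoint is configured.
  curl -s "$QUBRID_API_URL" \
    -H "Authorization: Bearer ${QUBRID_API_KEY:-}" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
else
  # Dry run: show the payload that would be sent.
  echo "$PAYLOAD"
fi
```

For non-thinking tasks, set `enable_thinking` to `false` and lower `temperature` to 0.6–0.7, per the parameter table above.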
Explore the full request and response schema in our external API documentation.
Performance
Strengths & considerations
| Strengths | Considerations |
|---|---|
| • 86.6% GPQA Diamond, 4 points ahead of GPT-5 mini (82.8%)<br>• 72.2% BFCL-V4 function calling, ~30% above GPT-5 mini (55.5%)<br>• 92.1% OCRBench and 89.8% OmniDocBench: best open-weight document model<br>• 70.4% ScreenSpot Pro, roughly 2× Claude Sonnet 4.5 (36.2%) on GUI automation<br>• 48-layer hybrid DeltaNet architecture for deep reasoning<br>• 122B total / 10B active: excellent efficiency for its capability tier<br>• Native multimodal: text + image + video via early fusion<br>• Apache 2.0 license | • Requires 244GB VRAM at bf16 (3–4× A100 80GB); 60–70GB at 4-bit<br>• Thinking mode is verbose, generating 91M+ tokens in benchmarks<br>• Higher cost per token than the 35B-A3B sibling<br>• Longer TTFT (~1.03s) vs smaller models |
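The VRAM figures in the considerations column follow from simple arithmetic: weight memory is roughly total parameters (in billions) times bytes per parameter, ignoring KV cache and activation overhead. A quick sketch:

```shell
# Rough weight-only VRAM estimate for a 122B-parameter model.
# 1B params × 1 byte ≈ 1 GB, so GB ≈ params_in_billions × bytes_per_param.
# KV cache and activations add more on top of these figures.
awk 'BEGIN {
  params = 122                                  # total parameters, billions
  printf "bf16:  %d GB\n", params * 2           # 2 bytes per parameter
  printf "4-bit: %d GB\n", params * 0.5         # 0.5 bytes per parameter
}'
```

This yields 244 GB at bf16 and 61 GB of raw weights at 4-bit, consistent with the 60–70 GB quoted once quantization overhead is included.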
Use cases
Recommended applications for this model
Enterprise
Platform Integration
Docker Support
Official Docker images for containerized deployments
Kubernetes Ready
Production-grade Kubernetes manifests and Helm charts
SDK Libraries
Official SDKs for Python, JavaScript, Go, and Java
Don't let your AI control you. Control your AI the Qubrid way!
Have questions? Want to partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.
"Qubrid helped us turn a collection of AI scripts into structured production workflows. We now have better reliability, visibility, and control over every run."
AI Infrastructure Team
Automation & Orchestration
