Qwen/Qwen3.5-397B-A17B
Qwen3.5-397B-A17B is Alibaba's flagship open-source model, the first in the Qwen3.5 series (released February 16, 2026) and the most capable open-weight model in the family. It is a native multimodal model trained from scratch with early fusion on trillions of text, image, and video tokens spanning 201 languages. With 397B total parameters and 17B active per token, it outperforms all Qwen3-VL models on vision tasks while matching or exceeding frontier text-only models. The hosted version is called Qwen3.5-Plus.
api_example.sh
Technical Specifications
Model Architecture & Performance
Pricing
Pay-per-use, no commitments
API Reference
Complete parameter documentation
| Parameter | Type | Default | Description |
|---|---|---|---|
| stream | boolean | true | Enable streaming responses for real-time output. |
| temperature | number | 0.6 | Use 0.6 for non-thinking tasks, 1.0 for thinking/reasoning tasks. |
| max_tokens | number | 16384 | Maximum tokens to generate. Use higher values for thinking mode. |
| top_p | number | 0.95 | Nucleus sampling parameter. |
| top_k | number | 20 | Limits token sampling to top-k candidates. |
| enable_thinking | boolean | false | Toggle chain-of-thought reasoning mode. Set temperature=1.0 when enabled. |
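The defaults above can be combined into a single request body. A minimal sketch in Python, assuming an OpenAI-compatible chat-completions schema; the helper name `build_request` is illustrative, and the exact field names accepted by the hosted endpoint should be confirmed against the API reference:

```python
import json

def build_request(prompt: str, thinking: bool = False) -> dict:
    """Build a chat-completions payload using the documented defaults."""
    return {
        "model": "Qwen/Qwen3.5-397B-A17B",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        # Documented guidance: 0.6 for non-thinking tasks, 1.0 with thinking enabled
        "temperature": 1.0 if thinking else 0.6,
        "max_tokens": 16384,
        "top_p": 0.95,
        "top_k": 20,
        "enable_thinking": thinking,
    }

payload = build_request("Describe this chart.", thinking=True)
print(json.dumps(payload, indent=2))
```

Note how the temperature follows the thinking flag automatically, so callers cannot accidentally pair thinking mode with the lower non-thinking temperature.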
Explore the full request and response schema in our external API documentation
Performance
Strengths & considerations
Strengths
- First open-source native multimodal model: text, image, and video early fusion
- Outperforms all dedicated Qwen3-VL vision models on reasoning and coding
- 397B total / 17B active parameters: frontier intelligence at efficient compute cost
- 87.8% MMLU-Pro, state-of-the-art among open-weight models
- 60-layer hybrid Gated DeltaNet + MoE architecture
- 256K native context (1M via the hosted Qwen3.5-Plus API)
- Multi-Token Prediction (MTP) for enhanced throughput
- Apache 2.0 license with full commercial freedom
- Support for 201 languages and dialects

Considerations
- 807GB model size; bf16 inference requires 8× H100/A100 80GB
- Thinking-mode traces can be very verbose, producing high token output
- Quantization (FP8/NVFP4) required for practical deployment
- 1M context only available via the hosted Qwen3.5-Plus API, not self-hosted
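As a rough sizing aid, weight memory scales with parameter count times bytes per parameter. A back-of-the-envelope sketch (weights only; it ignores KV cache, activations, and checkpoint overhead, which is why the packaged 807GB size exceeds the raw bf16 estimate):

```python
PARAMS = 397e9  # total parameters in Qwen3.5-397B-A17B

def weight_gb(bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return PARAMS * bytes_per_param / 1e9

# bf16 = 2 bytes/param, FP8 = 1, NVFP4 = 0.5
for name, bpp in [("bf16", 2.0), ("FP8", 1.0), ("NVFP4", 0.5)]:
    print(f"{name}: ~{weight_gb(bpp):.0f} GB")  # → ~794, ~397, ~199 GB
```

The FP8 figure is what makes quantized deployment practical: it roughly halves the GPU count needed versus bf16.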
Use cases
Recommended applications for this model
Enterprise
Platform Integration
Docker Support
Official Docker images for containerized deployments
Kubernetes Ready
Production-grade Kubernetes manifests and Helm charts
SDK Libraries
Official SDKs for Python, JavaScript, Go, and Java
Don't let your AI control you. Control your AI the Qubrid way!
Have questions? Want to Partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.
"Qubrid's medical OCR and research parsing cut our document extraction time in half. We now have traceable pipelines and reproducible outputs that meet our compliance requirements."
Clinical AI Team
Research & Clinical Intelligence
