Qwen/Qwen3.5-397B-A17B

Qwen3.5-397B-A17B is Alibaba's flagship open-source model, the first in the Qwen3.5 series (released February 16, 2026) and the most capable open-weight model in the family. It is a native multimodal model trained from scratch with early fusion on trillions of text, image, and video tokens spanning 201 languages. With 397B total parameters and 17B active per token, it outperforms all Qwen3-VL models on vision tasks while matching or exceeding frontier text-only models. The hosted version is called Qwen3.5-Plus.

Provider: Alibaba Cloud (Qwen)
Modality: Vision
Context: 256K tokens (up to 1M via the Qwen3.5-Plus API)

api_example.sh

curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer $QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.5-397B-A17B",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image? Describe the main elements."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 16384,
    "temperature": 0.6,
    "top_p": 0.95,
    "stream": true
  }'
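The same request can be built from Python. Below is a minimal sketch assuming the endpoint is OpenAI-compatible, as the curl example suggests; `build_vision_payload` is a hypothetical helper, not part of any official SDK.

```python
API_URL = "https://platform.qubrid.com/v1/chat/completions"  # from the curl example above

def build_vision_payload(prompt: str, image_url: str, stream: bool = True) -> dict:
    """Build an OpenAI-style multimodal chat payload (hypothetical helper)."""
    return {
        "model": "Qwen/Qwen3.5-397B-A17B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 16384,
        "temperature": 0.6,
        "top_p": 0.95,
        "stream": stream,
    }

payload = build_vision_payload(
    "What is in this image? Describe the main elements.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)

# To send it (requires a valid key in QUBRID_API_KEY):
# import os, requests
# resp = requests.post(API_URL, json=payload,
#                      headers={"Authorization": f"Bearer {os.environ['QUBRID_API_KEY']}"})
```

With `stream=True` the server returns incremental chunks rather than a single JSON body, so a streaming-aware client is needed to consume the response.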

Technical Specifications

Model Architecture & Performance

Variant: Instruct
Model Size: 397B params (17B active per token)
Context Length: 256K tokens (up to 1M via the Qwen3.5-Plus API)
Quantization: bf16 / FP8 / NVFP4
Throughput: 80 tokens/second
Architecture: Hybrid Gated DeltaNet + sparse MoE Transformer; 60 layers (15 cycles of 3× DeltaNet + 1× Gated Attention), hidden size 4096, vocabulary size 248,320, early-fusion vision+video encoder, Multi-Token Prediction (MTP)
Precision: bf16 (FP8 and NVFP4 quantized variants available from NVIDIA)
License: Apache 2.0
Release Date: February 16, 2026
Developers: Alibaba Cloud (QwenLM)
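The layer schedule implied by the architecture line above (15 cycles of 3× DeltaNet followed by 1× Gated Attention, 60 layers total) can be sketched as a quick sanity check; the layer names are illustrative labels, not the model's internal module names:

```python
def layer_schedule(cycles: int = 15) -> list:
    """Expand the repeating 3x DeltaNet + 1x Gated Attention cycle pattern."""
    schedule = []
    for _ in range(cycles):
        schedule.extend(["GatedDeltaNet"] * 3)  # three linear-attention layers
        schedule.append("GatedAttention")       # then one full-attention layer
    return schedule

layers = layer_schedule()
print(len(layers))                    # 60
print(layers.count("GatedDeltaNet"))  # 45
print(layers.count("GatedAttention")) # 15
```

So three quarters of the depth uses the cheaper DeltaNet mixing, which is the usual motivation for this kind of hybrid stack.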

Pricing

Pay-per-use, no commitments

Input tokens: $0.0006 per 1K tokens
Output tokens: $0.0036 per 1K tokens
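At the listed rates, per-request cost is simple arithmetic; a quick sketch (the example token counts are illustrative, not from the source):

```python
INPUT_RATE = 0.0006 / 1000   # dollars per input token
OUTPUT_RATE = 0.0036 / 1000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one request at the listed pay-per-use rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a long-context request with 100K prompt tokens and 4K generated tokens.
cost = request_cost(100_000, 4_000)
print(f"${cost:.4f}")  # $0.0744
```

Note that output tokens cost 6× as much as input tokens, so verbose generations (e.g. thinking-mode traces) dominate the bill.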

API Reference

Complete parameter documentation

stream (boolean, default: true): Enable streaming responses for real-time output.
temperature (number, default: 0.6): Use 0.6 for non-thinking tasks and 1.0 for thinking/reasoning tasks.
max_tokens (number, default: 16384): Maximum tokens to generate; use higher values for thinking mode.
top_p (number, default: 0.95): Nucleus sampling threshold.
top_k (number, default: 20): Limits sampling to the top-k candidate tokens.
enable_thinking (boolean, default: false): Toggles chain-of-thought reasoning mode; set temperature=1.0 when enabled.

Explore the full request and response schema in our external API documentation
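Since enable_thinking requires matching sampling settings (temperature=1.0 when enabled, 0.6 otherwise), a small helper can keep them in sync. This is a sketch, not part of any official SDK; the 32768-token budget for thinking mode is an assumption following the "use higher values for thinking mode" guidance, not a documented value.

```python
def sampling_params(enable_thinking: bool = False) -> dict:
    """Return sampling settings consistent with the parameter documentation."""
    return {
        "enable_thinking": enable_thinking,
        # Thinking/reasoning mode calls for temperature 1.0; otherwise 0.6.
        "temperature": 1.0 if enable_thinking else 0.6,
        "top_p": 0.95,
        "top_k": 20,
        # Assumed budget: thinking traces are verbose, so leave extra headroom.
        "max_tokens": 32768 if enable_thinking else 16384,
    }

print(sampling_params(True)["temperature"])   # 1.0
print(sampling_params(False)["temperature"])  # 0.6
```

Merging this dict into the request payload avoids the easy mistake of flipping enable_thinking while leaving temperature at the non-thinking default.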

Performance

Strengths & considerations

Strengths

First open-source native multimodal model with early fusion of text, image, and video
Outperforms all dedicated Qwen3-VL vision models on reasoning and coding
397B total / 17B active parameters: frontier intelligence at efficient compute cost
87.8% on MMLU-Pro, state of the art among open-weight models
60-layer hybrid Gated DeltaNet + MoE architecture
256K native context (1M via the hosted Qwen3.5-Plus API)
Multi-Token Prediction (MTP) for higher throughput
Apache 2.0 license with full commercial freedom
Supports 201 languages and dialects

Considerations

807GB model size; bf16 inference requires 8× H100/A100 80GB
Thinking-mode traces can be very verbose (high token output)
FP8/NVFP4 quantization is required for practical deployment
1M context is only available via the hosted Qwen3.5-Plus API, not self-hosted
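The deployment considerations follow from straightforward arithmetic on the stated 807GB bf16 checkpoint: halving the bit width roughly halves weight memory (this ignores activations and KV cache, so treat the numbers as lower bounds):

```python
BF16_SIZE_GB = 807  # bf16 checkpoint size stated in the considerations above

def weight_footprint_gb(bits: int) -> float:
    """Approximate weight memory at a given precision, scaled from bf16 (16-bit)."""
    return BF16_SIZE_GB * bits / 16

for name, bits in [("bf16", 16), ("FP8", 8), ("NVFP4", 4)]:
    print(f"{name}: ~{weight_footprint_gb(bits):.0f} GB")
# bf16: ~807 GB, FP8: ~404 GB, NVFP4: ~202 GB
```

A single 8× 80GB node offers 640GB of HBM, which explains why bf16 weights alone are a tight fit and why the FP8 and NVFP4 variants are the practical self-hosting path.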

Use cases

Recommended applications for this model

Native multimodal reasoning (text + image + video)
Frontier-level agentic workflows and multi-tool orchestration
Long-horizon code generation and system design
Scientific research and mathematical problem solving
Complex document understanding and RAG
GUI and web automation
Multilingual applications (201 languages)

Enterprise
Platform Integration

Docker

Docker Support

Official Docker images for containerized deployments

Kubernetes

Kubernetes Ready

Production-grade Kubernetes manifests and Helm charts

SDK

SDK Libraries

Official SDKs for Python, JavaScript, Go, and Java

Don't let your AI control you. Control your AI the Qubrid way!

Have questions? Want to Partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.

"Qubrid's medical OCR and research parsing cut our document extraction time in half. We now have traceable pipelines and reproducible outputs that meet our compliance requirements."

Clinical AI Team

Research & Clinical Intelligence