
Qwen/Qwen3.6-35B-A3B

Qwen3.6-35B-A3B is an efficient MoE variant in the Qwen 3.6 family aimed at strong multimodal reasoning and cost-effective deployment.

Alibaba Cloud | Vision | 256K Tokens (up to 1M)

api_example.sh

curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer $QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "Qwen/Qwen3.6-35B-A3B",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image? Describe the main elements."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 8192,
  "temperature": 0.6,
  "stream": true,
  "top_p": 0.95
}'
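With `stream: true`, the response arrives as a stream of server-sent events, each on a `data: {json}` line and terminated by `data: [DONE]` (assuming the common OpenAI-compatible SSE framing; verify against the API docs). A minimal sketch of extracting the content deltas from such a stream:

```shell
# Minimal sketch: extract content deltas from an OpenAI-style SSE stream.
# Assumes each event line looks like:
#   data: {"choices":[{"delta":{"content":"..."}}]}
parse_stream() {
  sed -n 's/^data: //p' \
    | grep -v '^\[DONE\]$' \
    | python3 -c '
import sys, json
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    delta = json.loads(line)["choices"][0]["delta"]
    sys.stdout.write(delta.get("content", ""))
'
}

# Example with two sample events (no network needed):
SAMPLE='data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]'
RESULT=$(printf '%s\n' "$SAMPLE" | parse_stream)
```

In practice you would pipe the curl command above directly into `parse_stream`.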

Pricing

Pay-per-use, no commitments

Input Tokens $0.25/1M Tokens
Output Tokens $1.49/1M Tokens
Cached Input Tokens $0.00/1M Tokens
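At these rates, a request costs input_tokens × $0.25 / 1M plus output_tokens × $1.49 / 1M (cached input is free). A quick cost estimator using the rates from the table above:

```shell
# Estimate request cost in USD from the per-million-token rates above.
cost_usd() {
  # $1 = input tokens, $2 = output tokens
  awk -v in_tok="$1" -v out_tok="$2" \
      'BEGIN { printf "%.5f", in_tok * 0.25 / 1e6 + out_tok * 1.49 / 1e6 }'
}

# e.g. a 10,000-token prompt with a 2,000-token reply:
COST=$(cost_usd 10000 2000)   # 0.00250 + 0.00298 = 0.00548 USD
```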

Technical Specifications

Model Architecture & Performance

Variant Instruct
Model Size 35B params (A3B active)
Context Length 256K Tokens (up to 1M)
Quantization bf16 / 4-bit
Tokens/sec 120
Architecture Qwen 3.6 sparse MoE transformer architecture for multimodal reasoning and instruction following
Precision bf16 (4-bit quantization available)
License Apache 2.0
Release Date 2026
Developers Alibaba Cloud (QwenLM)

API Reference

Complete parameter documentation

Parameter Type Default Description
stream boolean true Enable streaming responses for real-time output.
temperature number 0.6 Use 0.6 for non-thinking mode, 1.0 for thinking/reasoning mode.
max_tokens number 8192 Maximum number of tokens to generate.
top_p number 0.95 Nucleus sampling parameter.
top_k number 20 Limits token sampling to top-k candidates.
enable_thinking boolean false Toggle chain-of-thought reasoning. Set temperature=1.0 when enabled.

Explore the full request and response schema in our external API documentation
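Per the parameter table above, thinking mode is enabled by setting `enable_thinking: true` together with `temperature: 1.0`. A sketch of the request body (endpoint and field names as in the example earlier; the send step is commented out since it needs a valid key):

```shell
# Request body for thinking mode: enable_thinking=true, temperature=1.0,
# per the parameter table above.
PAYLOAD=$(cat <<'EOF'
{
  "model": "Qwen/Qwen3.6-35B-A3B",
  "messages": [
    {"role": "user", "content": "Prove that the sum of two odd numbers is even."}
  ],
  "enable_thinking": true,
  "temperature": 1.0,
  "max_tokens": 8192,
  "stream": false
}
EOF
)

# Sanity-check the JSON locally before sending:
printf '%s' "$PAYLOAD" | python3 -m json.tool > /dev/null && JSON_OK=yes

# Then send it:
# curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
#   -H "Authorization: Bearer $QUBRID_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```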

Performance

Strengths & considerations

Strengths
MoE efficiency profile with strong capability-per-cost
Supports multimodal inputs and reasoning-heavy workloads
Thinking mode available for deeper analysis
Long-context support for enterprise use cases
Open-source model family ecosystem
Good performance/latency balance

Considerations
Thinking mode can increase response latency and verbosity
MoE routing may add overhead in some scenarios
Peak quality depends on prompt and parameter tuning
Very large contexts may increase inference cost

Use cases

Recommended applications for this model

Cost-efficient enterprise inference at scale
Agentic coding and tool-calling workflows
Multimodal chat (text, image, video)
Long-context document analysis
Complex reasoning with optional thinking mode
Edge and cloud deployment scenarios
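For the agentic tool-calling use case, responses in an OpenAI-compatible API carry the requested call under `choices[0].message.tool_calls` (shape assumed here; `get_weather` is a hypothetical tool, so check the API docs for the exact schema). A sketch of extracting the tool call from a sample response:

```shell
# Sketch: pull the requested tool call out of an OpenAI-style response.
# Response shape assumed; "get_weather" is a hypothetical tool.
SAMPLE_RESPONSE='{"choices":[{"message":{"tool_calls":[{"function":{"name":"get_weather","arguments":"{\"city\":\"Paris\"}"}}]}}]}'
TOOL_NAME=$(printf '%s' "$SAMPLE_RESPONSE" | python3 -c '
import sys, json
call = json.load(sys.stdin)["choices"][0]["message"]["tool_calls"][0]
print(call["function"]["name"])
')
```

Your agent loop would then run the named tool with the parsed `arguments` and send the result back as a `tool` role message.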

Enterprise
Platform Integration

Docker

Docker Support

Official Docker images for containerized deployments

Kubernetes

Kubernetes Ready

Production-grade Kubernetes manifests and Helm charts

SDK

SDK Libraries

Official SDKs for Python, JavaScript, Go, and Java

Don't let your AI control you. Control your AI the Qubrid way!

Have questions? Want to partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.

"Qubrid enabled us to deploy production AI agents with reliable tool-calling and step tracing. We now ship agents faster with full visibility into every decision and API call."

AI Agents Team

Agent Systems & Orchestration