
Qwen/Qwen3.6-35B-A3B

Qwen3.6-35B-A3B is an efficient MoE variant in the Qwen 3.6 family aimed at strong multimodal reasoning and cost-effective deployment.

Alibaba Cloud | Vision | 256K Tokens (up to 1M)

api_example.sh

curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer $QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "Qwen/Qwen3.6-35B-A3B",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image? Describe the main elements."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 8192,
  "temperature": 0.6,
  "stream": true,
  "top_p": 0.95
}'
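With `stream: true`, the response arrives as a stream of server-sent events, each on a `data: {json}` line and terminated by `data: [DONE]` (assuming the common OpenAI-compatible SSE framing; verify against the API docs). A minimal sketch of extracting the content deltas from such a stream:

```shell
# Minimal sketch: extract content deltas from an OpenAI-style SSE stream.
# Assumes each event line looks like:
#   data: {"choices":[{"delta":{"content":"..."}}]}
parse_stream() {
  sed -n 's/^data: //p' \
    | grep -v '^\[DONE\]$' \
    | python3 -c '
import sys, json
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    delta = json.loads(line)["choices"][0]["delta"]
    sys.stdout.write(delta.get("content", ""))
'
}

# Example with two sample events (no network needed):
SAMPLE='data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":" world"}}]}
data: [DONE]'
RESULT=$(printf '%s\n' "$SAMPLE" | parse_stream)
```

In practice you would pipe the curl command above directly into `parse_stream`.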

Pricing

Pay-per-use, no commitments

Input Tokens $0.25/1M Tokens
Output Tokens $1.49/1M Tokens
Cached Input Tokens $0.00/1M Tokens
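At these rates, a request costs input_tokens × $0.25 / 1M plus output_tokens × $1.49 / 1M (cached input is free). A quick cost estimator using the rates from the table above:

```shell
# Estimate request cost in USD from the per-million-token rates above.
cost_usd() {
  # $1 = input tokens, $2 = output tokens
  awk -v in_tok="$1" -v out_tok="$2" \
      'BEGIN { printf "%.5f", in_tok * 0.25 / 1e6 + out_tok * 1.49 / 1e6 }'
}

# e.g. a 10,000-token prompt with a 2,000-token reply:
COST=$(cost_usd 10000 2000)   # 0.00250 + 0.00298 = 0.00548 USD
```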

Technical Specifications

Model Architecture & Performance

Variant Instruct
Model Size 35B params (A3B active)
Context Length 256K Tokens (up to 1M)
Quantization bf16 / 4-bit
Tokens/sec 120
Architecture Qwen 3.6 sparse MoE transformer architecture for multimodal reasoning and instruction following
Precision bf16 (4-bit quantization available)
License Apache 2.0
Release Date 2026
Developers Alibaba Cloud (QwenLM)

API Reference

Complete parameter documentation

Parameter Type Default Description
stream boolean true Enable streaming responses for real-time output.
temperature number 0.6 Use 0.6 for non-thinking mode, 1.0 for thinking/reasoning mode.
max_tokens number 8192 Maximum number of tokens to generate.
top_p number 0.95 Nucleus sampling parameter.
top_k number 20 Limits token sampling to top-k candidates.
enable_thinking boolean false Toggle chain-of-thought reasoning. Set temperature=1.0 when enabled.

Explore the full request and response schema in our external API documentation
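Per the parameter table above, thinking mode is enabled by setting `enable_thinking: true` together with `temperature: 1.0`. A sketch of the request body (endpoint and field names as in the example earlier; the send step is commented out since it needs a valid key):

```shell
# Request body for thinking mode: enable_thinking=true, temperature=1.0,
# per the parameter table above.
PAYLOAD=$(cat <<'EOF'
{
  "model": "Qwen/Qwen3.6-35B-A3B",
  "messages": [
    {"role": "user", "content": "Prove that the sum of two odd numbers is even."}
  ],
  "enable_thinking": true,
  "temperature": 1.0,
  "max_tokens": 8192,
  "stream": false
}
EOF
)

# Sanity-check the JSON locally before sending:
printf '%s' "$PAYLOAD" | python3 -m json.tool > /dev/null && JSON_OK=yes

# Then send it:
# curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
#   -H "Authorization: Bearer $QUBRID_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```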

Performance

Strengths & considerations

Strengths
MoE efficiency profile with strong capability-per-cost
Supports multimodal inputs and reasoning-heavy workloads
Thinking mode available for deeper analysis
Long-context support for enterprise use cases
Open-source model family ecosystem
Good performance/latency balance

Considerations
Thinking mode can increase response latency and verbosity
MoE routing may add overhead in some scenarios
Peak quality depends on prompt and parameter tuning
Very large contexts may increase inference cost

Use cases

Recommended applications for this model

Cost-efficient enterprise inference at scale
Agentic coding and tool-calling workflows
Multimodal chat (text, image, video)
Long-context document analysis
Complex reasoning with optional thinking mode
Edge and cloud deployment scenarios
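For the agentic tool-calling use case, responses in an OpenAI-compatible API carry the requested call under `choices[0].message.tool_calls` (shape assumed here; `get_weather` is a hypothetical tool, so check the API docs for the exact schema). A sketch of extracting the tool call from a sample response:

```shell
# Sketch: pull the requested tool call out of an OpenAI-style response.
# Response shape assumed; "get_weather" is a hypothetical tool.
SAMPLE_RESPONSE='{"choices":[{"message":{"tool_calls":[{"function":{"name":"get_weather","arguments":"{\"city\":\"Paris\"}"}}]}}]}'
TOOL_NAME=$(printf '%s' "$SAMPLE_RESPONSE" | python3 -c '
import sys, json
call = json.load(sys.stdin)["choices"][0]["message"]["tool_calls"][0]
print(call["function"]["name"])
')
```

Your agent loop would then run the named tool with the parsed `arguments` and send the result back as a `tool` role message.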

Enterprise
Platform Integration

Docker

Docker Support

Official Docker images for containerized deployments

Kubernetes

Kubernetes Ready

Production-grade Kubernetes manifests and Helm charts

SDK

SDK Libraries

Official SDKs for Python, JavaScript, Go, and Java

Don't let your AI control you. Control your AI the Qubrid way!

Have questions? Want to partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.

"Qubrid enabled us to deploy production AI agents with reliable tool-calling and step tracing. We now ship agents faster with full visibility into every decision and API call."

AI Agents Team

Agent Systems & Orchestration