Qwen/Qwen3.5-27B

Qwen3.5-27B is a dense (non-MoE) transformer and the only full-weight model in the Qwen3.5 Medium Series. Released February 24, 2026, it achieves 72.4% on SWE-bench Verified, matching GPT-5 mini, despite having just 27B parameters. It supports native multimodal input (text, images, and video) via early fusion, runs locally on a 22GB Mac M-series device, and offers a native 256K-token context window extensible to 1M tokens.


api_example.sh

curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer $QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "Qwen/Qwen3.5-27B",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image? Describe the main elements."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 8192,
  "temperature": 0.6,
  "stream": true,
  "top_p": 0.95
}'
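The same call can be made from Python. The sketch below builds the request payload programmatically, assuming the endpoint is OpenAI-compatible as the curl example suggests; the helper names (`build_multimodal_payload`, `send`) are illustrative, not part of an official SDK.

```python
import json
import urllib.request

API_URL = "https://platform.qubrid.com/v1/chat/completions"

def build_multimodal_payload(prompt: str, image_url: str) -> dict:
    """Assemble a text-plus-image chat request for Qwen/Qwen3.5-27B."""
    return {
        "model": "Qwen/Qwen3.5-27B",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 8192,
        "temperature": 0.6,
        "top_p": 0.95,
        "stream": False,  # set True for server-sent-event streaming
    }

def send(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare the HTTP request; pass the result to urllib.request.urlopen to execute it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Swapping in `requests` or one of the SDK libraries listed below works the same way; only the transport changes, not the payload shape.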

Technical Specifications

Model Architecture & Performance

Variant Instruct
Model Size 27B params (dense)
Context Length 256K Tokens (up to 1M)
Quantization bf16
Tokens/Second 200
Architecture Dense Transformer with Gated DeltaNet hybrid attention (linear + full attention, 3:1 ratio), early fusion multimodal vision encoder
Precision bf16
License Apache 2.0
Release Date February 24, 2026
Developers Alibaba Cloud (QwenLM)

Pricing

Pay-per-use, no commitments

Input Tokens $0.0003/1K Tokens
Output Tokens $0.0024/1K Tokens
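At these rates, per-request cost is simple arithmetic. A minimal sketch (the helper name is illustrative):

```python
# Published pay-per-use rates, in dollars per 1K tokens
INPUT_RATE = 0.0003
OUTPUT_RATE = 0.0024

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the rates above."""
    return (input_tokens / 1000) * INPUT_RATE + (output_tokens / 1000) * OUTPUT_RATE
```

For example, a 100K-token input with an 8K-token reply costs 100 × $0.0003 + 8 × $0.0024 = $0.0492.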

API Reference

Complete parameter documentation

Parameter Type Default Description
stream boolean true Enable streaming responses for real-time output.
temperature number 0.6 Use 0.6 for non-thinking tasks, 1.0 for thinking/reasoning tasks.
max_tokens number 8192 Maximum number of tokens to generate.
top_p number 0.95 Nucleus sampling parameter.
top_k number 20 Limits token sampling to top-k candidates.
enable_thinking boolean false Toggle chain-of-thought reasoning mode. Set temperature=1.0 when enabled.
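The table above couples two parameters: when `enable_thinking` is on, `temperature` should be raised from 0.6 to 1.0. A small helper can enforce that guidance so callers cannot mix the two by accident (the function name is illustrative):

```python
def sampling_params(enable_thinking: bool = False) -> dict:
    """Build request parameters following the documented defaults:
    temperature 0.6 for non-thinking tasks, 1.0 when thinking mode is enabled."""
    return {
        "enable_thinking": enable_thinking,
        "temperature": 1.0 if enable_thinking else 0.6,
        "top_p": 0.95,
        "top_k": 20,
        "max_tokens": 8192,
        "stream": True,
    }
```

Merge the returned dict into the request body alongside `model` and `messages`.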

Explore the full request and response schema in our external API documentation

Performance

Strengths & considerations

Strengths

72.4% SWE-bench Verified, matching GPT-5 mini at 27B
Dense architecture: simple deployment, no MoE routing
Native multimodal (early fusion: text + image + video)
256K context natively, extensible to 1M tokens
Runs on a 22GB Mac or consumer GPU
Apache 2.0 license: free for commercial use
Support for 201 languages

Considerations

Dense model: higher per-token compute than MoE siblings
Less efficient than 35B-A3B on long contexts
Thinking mode increases latency and output verbosity
Trails 35B-A3B on most benchmark tasks (except SWE-bench)
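The 22GB local-deployment figure follows from weight-memory arithmetic: weights alone take roughly parameters × bits-per-parameter ÷ 8, so bf16 weights need about 54GB while a 4-bit quantization needs about 13.5GB, which is what fits on a 22GB device once KV cache and runtime overhead are added. The sketch below is a back-of-envelope estimate, not an official sizing:

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB: parameters x bits / 8.
    Ignores KV cache, activations, and runtime overhead."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9
```

For Qwen3.5-27B: `weight_memory_gb(27, 16)` gives 54.0 GB (bf16) and `weight_memory_gb(27, 4)` gives 13.5 GB (4-bit).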

Use cases

Recommended applications for this model

Local deployment on consumer hardware (22GB+ RAM)
Agentic coding and software development
Multimodal chat (text, images, video)
Complex reasoning and analysis
Long-context document processing
Fine-tuning for specialized domains
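For the long-context use case, it helps to estimate whether a document fits the 256K-token window before sending it. The 4-characters-per-token ratio below is a common rule of thumb for English text, not a property of the Qwen tokenizer; use the model's actual tokenizer for precise counts.

```python
CONTEXT_LIMIT = 256_000  # native window; extensible to 1M

def fits_context(text: str, reserve_output: int = 8192,
                 chars_per_token: float = 4.0) -> bool:
    """Rough check that a prompt plus a reserved output budget fits the window."""
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens + reserve_output <= CONTEXT_LIMIT
```

Documents that fail the check can be chunked or routed to the extended 1M-token configuration.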

Enterprise
Platform Integration

Docker

Docker Support

Official Docker images for containerized deployments

Kubernetes

Kubernetes Ready

Production-grade Kubernetes manifests and Helm charts

SDK

SDK Libraries

Official SDKs for Python, JavaScript, Go, and Java

Don't let your AI control you. Control your AI the Qubrid way!

Have questions? Want to Partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.

"Qubrid scaled our personalized outreach from hundreds to tens of thousands of prospects. AI-driven research and content generation doubled our campaign velocity without sacrificing quality."

Demand Generation Team

Marketing & Sales Operations