Qwen/Qwen3-Coder-Next

Qwen3-Coder-Next is an open-weight Mixture-of-Experts (MoE) language model designed specifically for coding agents. It activates only 3B of its 79.7B total parameters per token, yet performs comparably to models with 10–20x more active parameters. The architecture is a hybrid of Gated Attention and Gated DeltaNet with 512 experts (10 active per token) and a native 262K-token context. It scores 74.2% on SWE-Bench Verified, which makes it highly cost-effective for production agent deployment.

Alibaba Cloud | Code | 262K Tokens
Free trial credit: $1.00 on the first top-up of at least $5

api_example.sh

curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "Qwen/Qwen3-Coder-Next",
  "messages": [
    {
      "role": "user",
      "content": "Write a Python function to calculate fibonacci sequence"
    }
  ],
  "temperature": 1,
  "max_tokens": 8192,
  "stream": true,
  "top_p": 0.95
}'
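
For readers calling the endpoint from Python rather than curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client. It assumes the endpoint streams OpenAI-style server-sent events (lines of the form `data: {...}`, ending with `data: [DONE]`), which this page does not confirm. The helper names `build_request`, `parse_sse_line`, and `stream_completion` are hypothetical.

```python
import json
import urllib.request

# URL, model name, and sampling values are copied from the curl example.
API_URL = "https://platform.qubrid.com/v1/chat/completions"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    payload = {
        "model": "Qwen/Qwen3-Coder-Next",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1,
        "max_tokens": 8192,
        "stream": True,
        "top_p": 0.95,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_sse_line(line: str):
    """Return the text fragment carried by one streamed line, or None.

    ASSUMPTION: chunks follow the OpenAI streaming shape,
    {"choices": [{"delta": {"content": "..."}}]}, ending with "data: [DONE]".
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if not data or data == "[DONE]":
        return None
    choices = json.loads(data).get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")


def stream_completion(prompt: str, api_key: str):
    """Yield text fragments as they arrive (this performs a network call)."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as response:
        for raw_line in response:
            fragment = parse_sse_line(raw_line.decode("utf-8"))
            if fragment:
                yield fragment
```

Iterating over `stream_completion(prompt, api_key)` and printing each fragment without a trailing newline would reproduce the streamed output of the curl call, provided the assumed chunk format holds.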

Technical Specifications

Model Architecture & Performance

Variant: Instruct
Model Size: 79.7B params (3B active)
Context Length: 262K Tokens
Quantization: FP8
Tokens/Second: 80
Architecture: Hybrid Gated Attention + Gated DeltaNet MoE Transformer; 512 experts, 10 active per token; 48 layers
Precision: FP8
License: Apache 2.0
Release Date: February 1, 2026
Developers: Alibaba Cloud (QwenLM)
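
The sparsity figures above can be checked with a few lines of arithmetic. The sketch below derives the active-parameter and active-expert fractions from the listed numbers. It also adds a hypothetical helper, `fits_context`, under the assumption (common for LLM APIs, but not stated on this page) that prompt tokens and generated tokens share the same 262,144-token window.

```python
TOTAL_PARAMS_B = 79.7     # total parameters, in billions
ACTIVE_PARAMS_B = 3.0     # parameters activated per token, in billions
NUM_EXPERTS = 512
ACTIVE_EXPERTS = 10
CONTEXT_TOKENS = 262_144  # native context length, equal to 2**18

# Share of the model that does work for any single token.
active_param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
active_expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS


def fits_context(prompt_tokens: int, max_tokens: int = 8192) -> bool:
    """ASSUMPTION: prompt and output tokens count against one shared window."""
    return prompt_tokens + max_tokens <= CONTEXT_TOKENS


print(f"active parameters: {active_param_fraction:.1%}")  # 3.8%
print(f"active experts:    {active_expert_fraction:.1%}")  # 2.0%
```

The same numbers explain the efficiency claim: 10 to 20 times the 3B active parameters corresponds to a dense model of roughly 30B to 60B.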

Pricing

Pay-per-use, no commitments

Input Tokens: $0.30 per 1M tokens
Output Tokens: $1.50 per 1M tokens
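
To make the per-token rates concrete, the following sketch estimates the cost of a single request from its token counts. The function name `estimate_cost_usd` is hypothetical. The calculation uses only the two rates listed above and ignores any trial credit, minimum charges, or taxes.

```python
from decimal import Decimal

# Rates taken from the pricing list above (USD per one million tokens).
INPUT_USD_PER_MILLION = Decimal("0.30")
OUTPUT_USD_PER_MILLION = Decimal("1.50")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated request cost in USD, using exact decimal math."""
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


# A 200,000-token prompt with a full 8,192-token reply:
print(estimate_cost_usd(200_000, 8_192))
```

Because output tokens cost five times as much as input tokens, long generations dominate the bill sooner than long prompts do.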

API Reference

Complete parameter documentation

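Parameter | Type | Default | Description
stream | boolean | true | Enable streaming responses for real-time output.
temperature | number | 1 | Controls randomness in output; lower values give more deterministic results.
max_tokens | number | 8192 | Maximum number of tokens to generate.
top_p | number | 0.95 | Nucleus sampling threshold: tokens are sampled only from the smallest set whose cumulative probability reaches top_p.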
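
One way to keep client code in step with the documented defaults is to hold them in a single place and merge caller overrides on top. The sketch below does that with a hypothetical helper, `resolve_params`. Two choices here are assumptions of the sketch rather than API behavior: `max_tokens` is treated as a whole number, and parameter names not listed above are rejected, even though the API may accept others.

```python
# Defaults and types as documented in the parameter list above.
DEFAULTS = {"stream": True, "temperature": 1, "max_tokens": 8192, "top_p": 0.95}
EXPECTED_TYPES = {
    "stream": bool,
    "temperature": (int, float),
    "max_tokens": int,  # ASSUMPTION: token counts are whole numbers
    "top_p": (int, float),
}


def resolve_params(**overrides):
    """Return the documented defaults with caller overrides applied."""
    params = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in DEFAULTS:
            # Sketch-level choice: only documented parameters are allowed.
            raise KeyError(f"undocumented parameter: {name}")
        expected = EXPECTED_TYPES[name]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if expected is not bool and isinstance(value, bool):
            raise TypeError(f"{name} must be a number, not a boolean")
        if not isinstance(value, expected):
            raise TypeError(f"{name} has the wrong type: {type(value).__name__}")
        params[name] = value
    return params
```

For example, `resolve_params(temperature=0.2, stream=False)` yields a dictionary that can be merged into the request body alongside `model` and `messages`.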

Performance

Strengths & considerations

Strengths

Only 3B active params from 79.7B total; performs like 30–60B models
74.2% on SWE-Bench Verified, 63.7% on SWE-Bench Multilingual
Native 262K context length (262,144 tokens)
Hybrid Gated Attention + Gated DeltaNet MoE, 512 experts / 10 active
Advanced tool calling with complex function orchestration
10–20x parameter efficiency advantage for agent workloads

Considerations

Non-thinking mode only; no chain-of-thought reasoning blocks
Not optimized for vision or multimodal tasks
Best suited for agentic tasks; overkill for simple completions

Use cases

Recommended applications for this model

Agentic software development & long-horizon coding
Complex tool use & function orchestration
Execution failure recovery in dynamic workflows
Repository-scale navigation and bug fixing
Automated testing, refactoring & documentation
CI/CD pipeline integration for code generation
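
Tool use and failure recovery, as listed above, usually come down to a loop in which the application runs the function the model asked for and reports the outcome, errors included, back to the model. The following is a local sketch of that dispatch step only; it makes no API calls. It assumes OpenAI-style tool calls (a function name plus a JSON string of arguments), a format this page does not document. `TOOLS`, `count_lines`, and `run_tool_call` are hypothetical names.

```python
import json


def count_lines(text: str) -> int:
    """Example tool: count the lines in a piece of text."""
    return len(text.splitlines())


# Registry mapping tool names the model may request to local functions.
TOOLS = {"count_lines": count_lines}


def run_tool_call(name: str, arguments_json: str) -> dict:
    """Run one requested tool call and return a JSON-serializable result.

    Failures are captured instead of raised, so the error text can be sent
    back to the model and it can retry with corrected arguments.
    """
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        kwargs = json.loads(arguments_json)
        result = TOOLS[name](**kwargs)
    except Exception as exc:  # bad JSON, wrong arguments, or a tool failure
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return {"ok": True, "result": result}
```

Returning a structured error rather than crashing is what lets an agent recover from a failed step: the next model turn sees what went wrong and can adjust its call.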

Enterprise
Platform Integration

Docker Support

Official Docker images for containerized deployments

Kubernetes Ready

Production-grade K8s manifests and Helm charts

SDK Libraries

Official SDKs for Python, JavaScript, Go, and Java
