moonshotai/Kimi-K2.6

Kimi K2.6 is Moonshot AI's flagship general-purpose model on Alibaba Cloud Model Studio for dialogue and agent tasks, with native multimodal (text + vision) input and an optional thinking mode.

Moonshot AI | Vision | 256K Tokens

api_example.sh

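# Export QUBRID_API_KEY in your shell before running this.
# -N turns off curl's output buffering so streamed chunks print as they arrive.
curl -N -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer $QUBRID_API_KEY" \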
  -H "Content-Type: application/json" \
  -d '{
  "model": "moonshotai/Kimi-K2.6",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image? Describe the main elements."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 16384,
  "temperature": 0.6,
  "stream": true,
  "top_p": 0.95,
  "presence_penalty": 0
}'
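
The same request can be assembled from Python. What follows is a minimal, unofficial sketch of the curl call above using only the standard library. The helper names `build_vision_request`, `extract_delta_text`, and `stream_completion` are hypothetical, and the stream parsing assumes the endpoint emits OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`), which this page does not confirm.

```python
import json
import urllib.request

API_URL = "https://platform.qubrid.com/v1/chat/completions"  # from the curl example


def build_vision_request(prompt, image_url, stream=True):
    """Return the JSON body from the curl example as a Python dict."""
    return {
        "model": "moonshotai/Kimi-K2.6",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 16384,
        "temperature": 0.6,
        "stream": stream,
        "top_p": 0.95,
        "presence_penalty": 0,
    }


def extract_delta_text(line):
    """Pull the text fragment out of one streamed line.

    ASSUMPTION: chunks are OpenAI-style SSE lines such as
    'data: {"choices": [{"delta": {"content": "..."}}]}'.
    Returns "" for blank lines, comments, and the [DONE] marker.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return ""
    data = line[len("data:"):].strip()
    if not data or data == "[DONE]":
        return ""
    chunk = json.loads(data)
    choices = chunk.get("choices") or [{}]
    return (choices[0].get("delta") or {}).get("content") or ""


def stream_completion(body, api_key):
    """Send the request and yield text fragments as they arrive."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        for raw in response:  # the response object iterates line by line
            text = extract_delta_text(raw.decode("utf-8"))
            if text:
                yield text


# Usage (performs a real network call, so it is left commented out):
# import os
# body = build_vision_request("What is in this image?", "https://example.com/photo.jpg")
# for fragment in stream_completion(body, os.environ["QUBRID_API_KEY"]):
#     print(fragment, end="", flush=True)
```

Keeping the payload builder separate from the network call makes it possible to inspect or log the request body without spending tokens.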

Pricing

Pay-per-use, no commitments

Input Tokens: $0.89 / 1M tokens
Output Tokens: $3.71 / 1M tokens
Cached Input Tokens: $0.18 / 1M tokens
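
To make the per-token prices concrete, here is a small cost estimator built on the three rates listed above. It is only a sketch: the function names are hypothetical, and it assumes cached input tokens are counted separately from regular input tokens, which this page does not spell out.

```python
# Prices in USD per 1M tokens, copied from the pricing list above.
PRICE_PER_MILLION = {"input": 0.89, "output": 3.71, "cached_input": 0.18}


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of one request.

    ASSUMPTION: cached_input_tokens is the part of the prompt billed at the
    cached rate and is not also included in input_tokens.
    """
    total = (
        input_tokens * PRICE_PER_MILLION["input"]
        + output_tokens * PRICE_PER_MILLION["output"]
        + cached_input_tokens * PRICE_PER_MILLION["cached_input"]
    )
    return total / 1_000_000


def tokens_for_budget(budget_usd, kind):
    """Return how many tokens of a single kind a budget buys (rounded down)."""
    return int(budget_usd / PRICE_PER_MILLION[kind] * 1_000_000)
```

At these rates, a request with 10,000 input tokens and 2,000 output tokens costs about $0.0163, and a $5 deposit covers roughly 5.6 million input tokens or roughly 1.3 million output tokens.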

Technical Specifications

Model Architecture & Performance

Variant: Multimodal Dialogue + Agent
Model Size: 1T params (32B active)
Context Length: 256K Tokens
Quantization: INT4 (QAT)
Tokens/sec: 50
Architecture: Sparse MoE Transformer with 1T total / 32B active parameters, 61 layers, 384 experts (8 selected per token), MLA attention, SwiGLU; native vision encoder with spatial-temporal pooling
Precision: INT4 (QAT)
License: Modified MIT License
Release Date: 2026
Developers: Moonshot AI
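
A few back-of-the-envelope figures follow from the specifications above. The sketch below derives them; the function names are hypothetical, the storage estimate assumes every weight is stored at 4 bits with no scale or metadata overhead, and the timing estimate assumes the listed 50 tokens/sec is a steady per-request generation rate. None of these assumptions is confirmed by this page.

```python
TOTAL_PARAMS = 1_000_000_000_000  # 1T total parameters
ACTIVE_PARAMS = 32_000_000_000    # 32B active per token
EXPERTS_TOTAL = 384
EXPERTS_PER_TOKEN = 8
TOKENS_PER_SEC = 50               # throughput listed above
BITS_PER_WEIGHT = 4               # INT4


def active_fraction():
    """Share of all parameters used for a single token."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


def expert_fraction():
    """Share of experts the router selects for a single token."""
    return EXPERTS_PER_TOKEN / EXPERTS_TOTAL


def weight_storage_gb(params=TOTAL_PARAMS, bits=BITS_PER_WEIGHT):
    """Rough weight footprint in GB (10^9 bytes).

    ASSUMPTION: all weights use `bits` bits each, with no overhead.
    """
    return params * bits / 8 / 1e9


def generation_seconds(output_tokens, tokens_per_sec=TOKENS_PER_SEC):
    """Time to emit output_tokens at a constant rate (ASSUMPTION)."""
    return output_tokens / tokens_per_sec
```

Under these assumptions, each token touches 3.2% of the parameters and about 2.1% of the experts, the weights occupy on the order of 500 GB, and a full 16,384-token response takes about 328 seconds (roughly five and a half minutes) to generate.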

API Reference

Complete parameter documentation

Parameter (type, default): Description
stream (boolean, true): Enable streaming responses for real-time output.
enable_thinking (boolean, false): Enable deep reasoning mode for richer chain-of-thought style responses.
temperature (number, 0.6): Alibaba defaults: 1.0 in thinking mode, 0.6 in non-thinking mode.
max_tokens (number, 16384): Maximum number of tokens to generate.
top_p (number, 0.95): Default 0.95 for both thinking and non-thinking modes.
presence_penalty (number, 0): Default 0.0 for both thinking and non-thinking modes.
fps (number, 2): Video sampling frame rate. Alibaba default: 2.
max_frames (number, 2000): Maximum sampled frames for video understanding. Alibaba default: 2000.
thinking_mode (select, thinking): Thinking mode enables deep reasoning traces. Instant mode provides fast direct responses.
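
Because the temperature default differs between modes, it can help to derive sampling settings from the mode instead of hard-coding them. The sketch below does that with two hypothetical helpers. It assumes `enable_thinking` is sent as a top-level field of the request body and that a plain string is accepted as message content; neither is confirmed here.

```python
def sampling_defaults(thinking):
    """Return the documented defaults for the chosen mode."""
    return {
        "temperature": 1.0 if thinking else 0.6,  # the only mode-dependent default
        "top_p": 0.95,
        "presence_penalty": 0.0,
    }


def build_text_request(prompt, thinking=False, **overrides):
    """Build a chat request body with mode-appropriate sampling defaults.

    ASSUMPTION: enable_thinking is a top-level body field.
    Keyword overrides replace any default, e.g. temperature=0.2.
    """
    body = {
        "model": "moonshotai/Kimi-K2.6",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16384,
        "stream": True,
        "enable_thinking": thinking,
    }
    body.update(sampling_defaults(thinking))
    body.update(overrides)
    return body
```

The table also lists a `thinking_mode` selector. How it interacts with `enable_thinking` is not stated on this page, so the sketch sets only `enable_thinking`.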

Explore the full request and response schema in our external API documentation

Performance

Strengths & considerations

Strengths

Officially supports multimodal input (text, image, and video processing paths)
Supports both thinking and non-thinking operating modes
Strong fit for both dialogue and agent scenarios
Reasoning output support when thinking mode is enabled
Open-source lineage through Moonshot model family

Considerations

Some parameters are mode-dependent and require careful tuning
Video understanding quality depends on fps/max_frames settings
High-context and multimodal tasks can increase latency and cost
Capabilities vary by endpoint/runtime configuration
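
The dependence of video quality on `fps` and `max_frames` can be illustrated with a little arithmetic. The sketch below assumes frames are sampled uniformly at `fps` and that long videos are thinned out to fit `max_frames`. The actual sampling strategy is not documented on this page, and the function names are hypothetical.

```python
def sampled_frames(duration_seconds, fps=2, max_frames=2000):
    """Number of frames sent to the model.

    ASSUMPTION: uniform sampling at `fps`, capped at `max_frames`.
    """
    return min(int(duration_seconds * fps), max_frames)


def effective_fps(duration_seconds, fps=2, max_frames=2000):
    """Sampling rate actually achieved once the frame cap applies."""
    if duration_seconds <= 0:
        return 0.0
    return sampled_frames(duration_seconds, fps, max_frames) / duration_seconds
```

With the defaults and under this assumption, a 60-second clip yields 120 frames. Anything longer than 1,000 seconds (about 16.7 minutes) hits the 2,000-frame cap, so a 30-minute video is effectively sampled at about 1.1 frames per second. Raising `fps` on short clips or `max_frames` on long ones gives the model more frames, at the price of more input tokens.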

Use cases

Recommended applications for this model

General chat and dialogue assistants
Agent/task orchestration workflows
Visual understanding from image inputs
Video understanding workflows
Complex reasoning with thinking mode enabled
Fast direct answers with thinking mode disabled

Enterprise
Platform Integration

Docker Support: Official Docker images for containerized deployments
Kubernetes Ready: Production-grade Kubernetes manifests and Helm charts
SDK Libraries: Official SDKs for Python, JavaScript, Go, and Java
