moonshotai/Kimi-K2.6

Kimi K2.6 is Kimi's flagship versatile model on Alibaba Cloud Model Studio for dialogue and agent tasks, with native multimodal (text + vision) input and optional thinking mode.

Moonshot AI | Vision | 256K Tokens

api_example.sh

curl -X POST "https://platform.qubrid.com/v1/chat/completions" \
  -H "Authorization: Bearer QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "moonshotai/Kimi-K2.6",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image? Describe the main elements."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 16384,
  "temperature": 0.6,
  "stream": true,
  "top_p": 0.95,
  "presence_penalty": 0
}'
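Because the request sets "stream": true, the curl call prints raw streaming chunks rather than one JSON document. The sketch below shows one way to reassemble the text from such a stream. It assumes the endpoint follows the OpenAI-style server-sent-events convention (lines of the form `data: {json}`, terminated by `data: [DONE]`), which this page does not confirm. `extract_text` is a hypothetical helper name, and the sample lines are hand-written for illustration.

```python
import json


def extract_text(sse_lines):
    """Join the delta.content pieces found in OpenAI-style SSE lines.

    Assumption: each event is a line 'data: {json}' and the stream ends
    with 'data: [DONE]'. This is a convention, not confirmed by this page.
    """
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Hand-written sample lines for illustration only, not real API output.
sample = [
    'data: {"choices": [{"delta": {"content": "The Statue"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": " of Liberty"}}]}',
    "data: [DONE]",
]
print(extract_text(sample))
```

In practice the lines would come from the HTTP response body, read one line at a time, instead of from a list.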

from openai import OpenAI

# Initialize the OpenAI client with Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",
)

stream = client.chat.completions.create(
    model="moonshotai/Kimi-K2.6",
    messages=[
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image? Describe the main elements."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ],
    max_tokens=16384,
    temperature=0.6,
    top_p=0.95,
    stream=True,
    presence_penalty=0
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

print("\n")

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://platform.qubrid.com/v1',
  apiKey: 'QUBRID_API_KEY',
});

const stream = await client.chat.completions.create({
  model: 'moonshotai/Kimi-K2.6',
  messages: [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image? Describe the main elements."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ],
  max_tokens: 16384,
  temperature: 0.6,
  top_p: 0.95,
  stream: true,
  presence_penalty: 0
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
console.log('\n');

package main

import (
  "bufio"
  "fmt"
  "log"
  "net/http"
  "strings"
)

func main() {
  url := "https://platform.qubrid.com/v1/chat/completions"

  // Request body as raw JSON; a Go raw string literal keeps it readable.
  payload := `{
  "model": "moonshotai/Kimi-K2.6",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image? Describe the main elements."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 16384,
  "temperature": 0.6,
  "stream": true,
  "top_p": 0.95,
  "presence_penalty": 0
}`

  req, err := http.NewRequest("POST", url, strings.NewReader(payload))
  if err != nil {
    log.Fatal(err)
  }
  req.Header.Set("Authorization", "Bearer QUBRID_API_KEY")
  req.Header.Set("Content-Type", "application/json")

  res, err := http.DefaultClient.Do(req)
  if err != nil {
    log.Fatal(err)
  }
  defer res.Body.Close()

  // With "stream": true the body arrives incrementally; print each line as it is received.
  scanner := bufio.NewScanner(res.Body)
  for scanner.Scan() {
    fmt.Println(scanner.Text())
  }
  if err := scanner.Err(); err != nil {
    log.Fatal(err)
  }
}
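The examples above all reference a publicly hosted image. For a local file, a common approach with OpenAI-compatible APIs is to embed the image as a base64 data URL in the same `image_url` field. The sketch below assumes this endpoint accepts data URLs, which this page does not state. `image_to_data_url` and `image_message` are hypothetical helper names.

```python
import base64
import mimetypes


def image_to_data_url(path):
    """Return a base64 data URL for a local image file."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_message(prompt, image_url):
    """Build a user message in the same shape as the request examples."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

The resulting message can be passed in the `messages` list in place of the one that uses a remote URL, for example `image_message("Describe this image.", image_to_data_url("photo.jpg"))`.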

Pricing

Pay-per-use, no commitments

Input Tokens $0.89/1M Tokens

Output Tokens $3.71/1M Tokens

Cached Input Tokens $0.18/1M Tokens
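As a rough illustration of how these per-token rates combine, the sketch below estimates the cost of a single request. It assumes cached input tokens are billed at the cached rate instead of the regular input rate; the exact billing rules are not spelled out here. `estimate_cost` is a hypothetical helper name.

```python
# Prices in USD per 1M tokens, copied from the pricing list.
PRICE_INPUT = 0.89
PRICE_OUTPUT = 3.71
PRICE_CACHED_INPUT = 0.18


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the cost of one request in USD.

    Assumption: cached input tokens replace regular input tokens at the
    cached rate rather than being charged on top of them.
    """
    uncached = input_tokens - cached_input_tokens
    if uncached < 0:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    total = (
        uncached * PRICE_INPUT
        + cached_input_tokens * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


# 100K input tokens plus a full 16,384-token response
print(f"${estimate_cost(100_000, 16_384):.4f}")
```

Under these assumptions, a 100K-token prompt with a 16,384-token response comes to roughly $0.15.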

Technical Specifications

Model Architecture & Performance

Variant Multimodal Dialogue + Agent

Model Size 1T params (32B active)

Context Length 256K Tokens

Quantization INT4 (QAT)

Tokens/sec 50

Architecture Sparse MoE Transformer — 1T total / 32B active, 61 layers, 384 experts (8 selected per token), MLA attention, SwiGLU; native vision encoder with spatial-temporal pooling

Precision INT4 (QAT)

License Modified MIT License

Release Date 2026

Developers Moonshot AI
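The sparse MoE figures above can be put in proportion with a little arithmetic. The sketch below uses only the numbers listed in the specifications. The weight-size estimate assumes every parameter is stored at 4 bits, which is a simplification, since some components may be kept at higher precision.

```python
TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32B active per token
NUM_EXPERTS = 384
EXPERTS_PER_TOKEN = 8
BITS_PER_WEIGHT = 4                # INT4

active_param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
routed_expert_fraction = EXPERTS_PER_TOKEN / NUM_EXPERTS
# Rough footprint if all weights were stored in INT4 (assumption, see lead-in).
weight_bytes = TOTAL_PARAMS * BITS_PER_WEIGHT // 8

print(f"active parameters per token: {active_param_fraction:.1%}")
print(f"experts used per token:      {routed_expert_fraction:.1%}")
print(f"approx. INT4 weight size:    {weight_bytes / 1e9:.0f} GB")
```

Only about 3.2% of the parameters take part in each token's forward pass, which is why per-token compute is much closer to that of a 32B model than a 1T one, even though the full set of weights still has to be stored.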

API Reference

Complete parameter documentation

Parameter	Type	Default	Description
stream	boolean	true	Enable streaming responses for real-time output.
enable_thinking	boolean	false	Enable deep reasoning mode for richer chain-of-thought style responses.
temperature	number	0.6	Alibaba defaults: 1.0 in thinking mode, 0.6 in non-thinking mode.
max_tokens	number	16384	Maximum number of tokens to generate.
top_p	number	0.95	Default 0.95 for both thinking and non-thinking modes.
presence_penalty	number	0	Default 0.0 for both thinking and non-thinking modes.
fps	number	2	Video sampling frame rate. Alibaba default: 2.
max_frames	number	2000	Maximum sampled frames for video understanding. Alibaba default: 2000.
thinking_mode	select	thinking	Thinking mode enables deep reasoning traces. Instant mode provides fast direct responses.
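Since the suggested temperature depends on the mode, it can help to derive the sampling settings from a single flag. The sketch below builds keyword arguments for the OpenAI Python SDK using the defaults from the table. Passing `enable_thinking` through the SDK's `extra_body` argument is an assumption about how this endpoint expects non-standard parameters. `build_request` is a hypothetical helper name.

```python
def build_request(prompt, thinking=False, max_tokens=16384):
    """Build chat-completions keyword arguments for either mode.

    Defaults follow the parameter table: temperature 1.0 in thinking
    mode and 0.6 otherwise; top_p 0.95 and presence_penalty 0 in both.
    """
    return {
        "model": "moonshotai/Kimi-K2.6",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 1.0 if thinking else 0.6,
        "top_p": 0.95,
        "presence_penalty": 0,
        "stream": True,
        # Assumption: the endpoint reads enable_thinking from the request body.
        # extra_body is how the OpenAI Python SDK forwards non-standard fields.
        "extra_body": {"enable_thinking": thinking},
    }


request = build_request("Plan a three-step data migration.", thinking=True)
print(request["temperature"], request["extra_body"])
```

With a client configured as in the Python example, the result can be passed as `client.chat.completions.create(**build_request(...))`.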

Explore the full request and response schema in our external API documentation

Performance

Strengths & considerations

Strengths	Considerations
Officially supports multimodal input (text, image, and video processing paths)	Some parameters are mode-dependent and require careful tuning
Supports both thinking and non-thinking operating modes	Video understanding quality depends on fps/max_frames settings
Strong fit for both dialogue and agent scenarios	High-context and multimodal tasks can increase latency and cost
Reasoning output support when thinking mode is enabled	Capabilities vary by endpoint/runtime configuration
Open-source lineage through Moonshot model family	
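To see how the fps and max_frames settings interact, the sketch below estimates how many frames a video would contribute. It assumes frames are sampled uniformly at `fps` and then capped at `max_frames`; the actual sampling strategy is not documented here. Both function names are hypothetical.

```python
def sampled_frames(duration_seconds, fps=2, max_frames=2000):
    """Estimate sampled frames: duration * fps, capped at max_frames.

    Assumption: uniform sampling with a hard cap. Defaults match the
    parameter table (fps=2, max_frames=2000).
    """
    if duration_seconds < 0 or fps <= 0 or max_frames <= 0:
        raise ValueError("duration must be >= 0; fps and max_frames must be > 0")
    return min(int(duration_seconds * fps), max_frames)


def effective_fps(duration_seconds, fps=2, max_frames=2000):
    """Return the sampling rate actually achieved once the cap applies."""
    if duration_seconds == 0:
        return float(fps)
    return sampled_frames(duration_seconds, fps, max_frames) / duration_seconds


for minutes in (1, 10, 30):
    seconds = minutes * 60
    print(minutes, "min:", sampled_frames(seconds), "frames,",
          round(effective_fps(seconds), 2), "fps effective")
```

Under this assumption the default cap is reached at 1,000 seconds of video, so longer clips are sampled more sparsely unless max_frames is raised or the video is split.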

Use cases

Recommended applications for this model

General chat and dialogue assistants

Agent/task orchestration workflows

Visual understanding from image inputs

Video understanding workflows

Complex reasoning with thinking mode enabled

Fast direct answers with thinking mode disabled

Enterprise
Platform Integration

Docker Support

Official Docker images for containerized deployments

Kubernetes Ready

Production-grade Kubernetes manifests and Helm charts

SDK Libraries

Official SDKs for Python, JavaScript, Go, and Java
