Z-Image-Turbo [LoRA]

Z-Image-Turbo is a 6B-parameter distilled text-to-image model from Alibaba Tongyi Lab, released November 27, 2025 under Apache 2.0. It is the distilled variant of the Z-Image foundation model, accelerated with the Decoupled-DMD (Distribution Matching Distillation) algorithm plus DMDR (DMD with Reinforcement Learning), enabling high-quality generation in just 8 NFEs (function evaluations).

The model is built on a Scalable Single-Stream DiT (S3-DiT) architecture: text tokens, visual semantic tokens, and image VAE tokens are concatenated at the sequence level into a single unified input stream. Unlike dual-stream designs such as FLUX's MMDiT, this maximizes parameter efficiency and enables dense cross-modal interaction at every transformer layer. The text encoder is Qwen3-4B (qwen_3_4b.safetensors) and the VAE is a Flux-compatible autoencoder (ae.safetensors).

Z-Image-Turbo achieves sub-second latency on enterprise H800 GPUs, fits within 16GB of VRAM on consumer hardware, and ranks as the leading open-source model in Alibaba AI Arena's Elo-based human preference evaluations. The LoRA variant offered here supports loading multiple LoRA adapters simultaneously, each with independent scale control.
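The sequence-level concatenation described above can be pictured with a toy sketch. This is purely illustrative (token counts and labels are made up, and this is not Tongyi Lab's implementation); it only shows how a single-stream design differs from a dual-stream one:

```python
# Illustrative sketch only: a toy view of how a single-stream DiT
# arranges its input sequence. Token counts here are invented.
text_tokens = [("text", i) for i in range(4)]          # from the Qwen3-4B encoder
semantic_tokens = [("semantic", i) for i in range(2)]  # visual semantic tokens
vae_tokens = [("vae", i) for i in range(8)]            # image latent (VAE) patches

# Sequence-level concatenation: one unified stream, so self-attention
# at every transformer layer mixes all three modalities directly.
stream = text_tokens + semantic_tokens + vae_tokens

# A dual-stream design (e.g. FLUX MMDiT) would instead route text and
# image tokens through separate parameter branches within each block.
print(len(stream))  # 14 tokens in a single sequence
```

Because every layer attends over the full mixed sequence, no parameters are duplicated per modality, which is where the claimed parameter efficiency comes from.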

Tongyi-MAI (Alibaba) · Image · Context: N/A
$1.00 free trial credit on your first top-up of at least $5

api_example.sh

curl -X POST "https://platform.qubrid.com/v1/images/generations" \
  -H "Authorization: Bearer $QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "z-image-turbo-lora",
  "prompt": "cinematic shot of a lone astronaut standing on a desolate alien planet, glowing orange sunset sky, dust storms swirling, dramatic lighting, ultra-wide lens composition, movie still aesthetic, realistic space suit details, volumetric atmosphere, 8k sci-fi film scene",
  "height": 1024,
  "width": 1024,
  "num_inference_steps": 8,
  "guidance_scale": 0,
  "seed": null,
  "lora_weights": null,
  "lora_scales": null,
  "output_format": "jpg",
  "output_quality": 80
}'
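The same request body can be assembled programmatically, for example ahead of a call through the Python SDK or `requests`. The sketch below is ours (the helper name and example prompt are not part of the API); the endpoint, field names, and defaults are taken from the example above, and the actual POST is left to the caller:

```python
import json

API_URL = "https://platform.qubrid.com/v1/images/generations"

def build_request(prompt, *, height=1024, width=1024,
                  num_inference_steps=8, seed=None,
                  lora_weights=None, lora_scales=None,
                  output_format="jpg", output_quality=80):
    """Assemble the JSON body for /v1/images/generations.

    guidance_scale is pinned to 0: the Turbo distillation bakes CFG
    into the weights, so non-zero values are not supported.
    """
    if lora_weights and lora_scales and len(lora_weights) != len(lora_scales):
        raise ValueError("lora_scales must match lora_weights in length")
    return {
        "model": "z-image-turbo-lora",
        "prompt": prompt,
        "height": height,
        "width": width,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": 0,
        "seed": seed,
        "lora_weights": lora_weights,
        "lora_scales": lora_scales,
        "output_format": output_format,
        "output_quality": output_quality,
    }

body = build_request(
    "a red fox in fresh snow, golden hour",
    lora_weights=["https://huggingface.co/user/model/resolve/main/lora.safetensors"],
    lora_scales=[0.8],
)
payload = json.dumps(body)  # POST this with an "Authorization: Bearer <key>" header
```

Building the body through one helper keeps the Turbo-specific constraints (fixed guidance_scale, matched LoRA arrays) in a single place.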

Technical Specifications

Model Architecture & Performance

Variant Turbo (Decoupled-DMD + DMDR distilled, 8-step)
Model Size 6B parameters (~16GB VRAM at bf16)
Quantization None (fp8/NF4 community variants available for <16GB GPUs)
Architecture Scalable Single-Stream DiT (S3-DiT) — text tokens, visual semantic tokens, and image VAE tokens concatenated into a single unified input stream; dense cross-modal interaction at every transformer layer
Precision bfloat16
License Apache 2.0
Release Date November 27, 2025
Developers Alibaba Tongyi Lab (Tongyi-MAI)

Pricing

Pay-per-use, no commitments

Per Image $0.008/Image
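At a flat per-image rate, batch costs are simple arithmetic. A minimal helper (the function name is ours; the price comes from the table above):

```python
PRICE_PER_IMAGE = 0.008  # USD per image, from the pricing table

def batch_cost(num_images: int) -> float:
    """Estimated pay-per-use cost for a batch, before any free credit."""
    return round(num_images * PRICE_PER_IMAGE, 4)

print(batch_cost(125))   # 1.0  -> the $1.00 trial credit covers ~125 images
print(batch_cost(1000))  # 8.0
```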

API Reference

Complete parameter documentation

Parameter Type Default Description
height number 1024 Height of the generated image in pixels. Optimal results at 1024px; higher values require more VRAM.
width number 1024 Width of the generated image in pixels. Optimal results at 1024px; higher values require more VRAM.
num_inference_steps number 8 Number of inference steps. Note: this results in (num_inference_steps - 1) actual DiT forward passes. Best quality is achieved at 8–9 steps for the Turbo variant.
guidance_scale number 0 Classifier-free guidance scale. Must be set to 0.0 for Turbo models — distillation bakes CFG in, so non-zero values are not recommended.
seed number null Random seed for reproducible generation. Leave unset (null) for random results.
lora_weights array null Array of LoRA weight URLs to apply. Supports .safetensors, .tar, and .zip files from HuggingFace or any public URL (e.g. 'https://huggingface.co/user/model/resolve/main/lora.safetensors'). Multiple LoRAs can be stacked.
lora_scales array null Array of scale values for each LoRA in lora_weights. Must match the number of lora_weights entries. Defaults to 1.0 per LoRA if not provided. Recommended range: 0.5–1.2.
output_format string jpg Format of the generated image. Options: png, jpg, webp.
output_quality number 80 Compression quality for jpg/webp output (0–100). Not applicable for png outputs.
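The num_inference_steps quirk noted in the table (the API performs one fewer DiT forward pass than the parameter value) is easy to get wrong when budgeting latency. Two small conversion helpers, assuming only that off-by-one behavior:

```python
def effective_dit_passes(num_inference_steps: int) -> int:
    """Actual DiT forward passes performed for a given parameter value."""
    if num_inference_steps < 2:
        raise ValueError("num_inference_steps must be at least 2")
    return num_inference_steps - 1

def steps_for_passes(desired_passes: int) -> int:
    """Parameter value needed to get `desired_passes` forward passes."""
    return desired_passes + 1

print(effective_dit_passes(8))  # 7
print(steps_for_passes(8))      # 9 -> request 9 steps for 8 effective passes
```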

Explore the full request and response schema in our external API documentation

Performance

Strengths & considerations

Strengths

6B-parameter S3-DiT that matches closed-source 20B+ models with only 8 NFEs
Sub-second inference on enterprise GPUs; ~5–10s on consumer hardware
Best-in-class bilingual text rendering (English and Chinese) among open-source models
Single-stream architecture: denser cross-modal interaction and higher parameter efficiency than dual-stream designs
Decoupled-DMD + DMDR distillation for stable, high-quality few-step generation
Native multi-LoRA support with per-adapter scale control
Apache 2.0 license, fully open for commercial use
MeanCache integration available for up to 3.7× additional speedup
Runs in 16GB VRAM; 8–12GB possible at 768px with offloading

Considerations

guidance_scale must be 0.0; non-zero values are incompatible with the distilled Turbo variant
Optimal resolution is 1024×1024; resolutions above 1440px require significantly more VRAM
Training LoRAs directly on the Turbo checkpoint degrades the distillation; LoRA training should target the base Z-Image model
The lora_scales array must exactly match the length of lora_weights or validation fails
May reflect biases present in the training data
num_inference_steps performs (steps - 1) DiT forward passes; set it to 9 for 8 effective passes
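Several of these considerations can be pre-checked client-side before spending a request. A sketch of such a preflight check (the function name is ours; the thresholds mirror the constraints listed above):

```python
def preflight(body: dict) -> list[str]:
    """Client-side checks mirroring the Turbo variant's constraints.

    Returns a list of problems; an empty list means the request looks safe.
    """
    problems = []
    if body.get("guidance_scale", 0) != 0:
        problems.append("guidance_scale must be 0.0 for the distilled Turbo model")
    for dim in ("height", "width"):
        if body.get(dim, 1024) > 1440:
            problems.append(f"{dim} above 1440px requires significantly more VRAM")
    weights = body.get("lora_weights") or []
    scales = body.get("lora_scales") or []
    if weights and scales and len(weights) != len(scales):
        problems.append("lora_scales length must match lora_weights")
    return problems

bad = preflight({"guidance_scale": 3.5, "height": 2048,
                 "lora_weights": ["a.safetensors"], "lora_scales": [1.0, 0.5]})
print(bad)  # flags guidance_scale, height, and the LoRA length mismatch
```

Catching these locally avoids a round trip that would fail server-side validation (the LoRA length rule) or silently degrade output (non-zero guidance).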

Use cases

Recommended applications for this model

Ultra-fast photorealistic image generation in production pipelines
Bilingual text rendering in images (English and Chinese)
Portrait and character generation with custom LoRA styles
Style transfer and artistic image generation via stacked LoRAs
Rapid prototyping and large-batch image generation
Consumer GPU deployment (16GB VRAM compatible)

Enterprise Platform Integration

Docker Support: official Docker images for containerized deployments
Kubernetes Ready: production-grade Kubernetes manifests and Helm charts
SDK Libraries: official SDKs for Python, JavaScript, Go, and Java

Don't let your AI control you. Control your AI the Qubrid way!

Have questions? Want to Partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.
