Z-Image-Turbo [LoRA]

Z-Image-Turbo is a 6B-parameter distilled text-to-image model from Alibaba Tongyi Lab, released November 27, 2025 under Apache 2.0. It is the distilled variant of the Z-Image foundation model, accelerated with the Decoupled-DMD (Distribution Matching Distillation) algorithm plus DMDR (DMD with Reinforcement Learning), enabling high-quality generation in just 8 NFEs (function evaluations).

The model is built on a Scalable Single-Stream DiT (S3-DiT) architecture: text tokens, visual semantic tokens, and image VAE tokens are concatenated at the sequence level into a single unified input stream. Unlike dual-stream designs such as FLUX's MMDiT, this maximizes parameter efficiency and enables dense cross-modal interaction at every transformer layer. The text encoder is Qwen3-4B (qwen_3_4b.safetensors) and the VAE is a Flux-compatible autoencoder (ae.safetensors).

Z-Image-Turbo achieves sub-second latency on enterprise H800 GPUs, fits within 16GB of VRAM on consumer hardware, and ranks as the leading open-source model in Alibaba AI Arena's Elo-based human preference evaluations. The LoRA variant offered here supports loading multiple LoRA adapters simultaneously, each with independent scale control.
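The sequence-level concatenation described above can be pictured with a toy sketch. This is purely illustrative (token counts and labels are made up, and this is not Tongyi Lab's implementation); it only shows how a single-stream design differs from a dual-stream one:

```python
# Illustrative sketch only: a toy view of how a single-stream DiT
# arranges its input sequence. Token counts here are invented.
text_tokens = [("text", i) for i in range(4)]          # from the Qwen3-4B encoder
semantic_tokens = [("semantic", i) for i in range(2)]  # visual semantic tokens
vae_tokens = [("vae", i) for i in range(8)]            # image latent (VAE) patches

# Sequence-level concatenation: one unified stream, so self-attention
# at every transformer layer mixes all three modalities directly.
stream = text_tokens + semantic_tokens + vae_tokens

# A dual-stream design (e.g. FLUX MMDiT) would instead route text and
# image tokens through separate parameter branches within each block.
print(len(stream))  # 14 tokens in a single sequence
```

Because every layer attends over the full mixed sequence, no parameters are duplicated per modality, which is where the claimed parameter efficiency comes from.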

Tongyi-MAI (Alibaba) · Image · Context: N/A
$1.00 free trial credit on your first top-up of at least $5

api_example.sh

curl -X POST "https://platform.qubrid.com/v1/images/generations" \
  -H "Authorization: Bearer $QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "z-image-turbo-lora",
  "prompt": "cinematic shot of a lone astronaut standing on a desolate alien planet, glowing orange sunset sky, dust storms swirling, dramatic lighting, ultra-wide lens composition, movie still aesthetic, realistic space suit details, volumetric atmosphere, 8k sci-fi film scene",
  "height": 1024,
  "width": 1024,
  "num_inference_steps": 8,
  "guidance_scale": 0,
  "seed": null,
  "lora_weights": null,
  "lora_scales": null,
  "output_format": "jpg",
  "output_quality": 80
}'
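The same request body can be assembled programmatically, for example ahead of a call through the Python SDK or `requests`. The sketch below is ours (the helper name and example prompt are not part of the API); the endpoint, field names, and defaults are taken from the example above, and the actual POST is left to the caller:

```python
import json

API_URL = "https://platform.qubrid.com/v1/images/generations"

def build_request(prompt, *, height=1024, width=1024,
                  num_inference_steps=8, seed=None,
                  lora_weights=None, lora_scales=None,
                  output_format="jpg", output_quality=80):
    """Assemble the JSON body for /v1/images/generations.

    guidance_scale is pinned to 0: the Turbo distillation bakes CFG
    into the weights, so non-zero values are not supported.
    """
    if lora_weights and lora_scales and len(lora_weights) != len(lora_scales):
        raise ValueError("lora_scales must match lora_weights in length")
    return {
        "model": "z-image-turbo-lora",
        "prompt": prompt,
        "height": height,
        "width": width,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": 0,
        "seed": seed,
        "lora_weights": lora_weights,
        "lora_scales": lora_scales,
        "output_format": output_format,
        "output_quality": output_quality,
    }

body = build_request(
    "a red fox in fresh snow, golden hour",
    lora_weights=["https://huggingface.co/user/model/resolve/main/lora.safetensors"],
    lora_scales=[0.8],
)
payload = json.dumps(body)  # POST this with an "Authorization: Bearer <key>" header
```

Building the body through one helper keeps the Turbo-specific constraints (fixed guidance_scale, matched LoRA arrays) in a single place.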

Technical Specifications

Model Architecture & Performance

Variant Turbo (Decoupled-DMD + DMDR distilled, 8-step)
Model Size 6B parameters (~16GB VRAM at bf16)
Quantization None (fp8/NF4 community variants available for <16GB GPUs)
Architecture Scalable Single-Stream DiT (S3-DiT) — text tokens, visual semantic tokens, and image VAE tokens concatenated into a single unified input stream; dense cross-modal interaction at every transformer layer
Precision bfloat16
License Apache 2.0
Release Date November 27, 2025
Developers Alibaba Tongyi Lab (Tongyi-MAI)

Pricing

Pay-per-use, no commitments

Per Image $0.008/Image
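At a flat per-image rate, batch costs are simple arithmetic. A minimal helper (the function name is ours; the price comes from the table above):

```python
PRICE_PER_IMAGE = 0.008  # USD per image, from the pricing table

def batch_cost(num_images: int) -> float:
    """Estimated pay-per-use cost for a batch, before any free credit."""
    return round(num_images * PRICE_PER_IMAGE, 4)

print(batch_cost(125))   # 1.0  -> the $1.00 trial credit covers ~125 images
print(batch_cost(1000))  # 8.0
```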

API Reference

Complete parameter documentation

Parameter Type Default Description
height number 1024 Height of the generated image in pixels. Optimal results at 1024px; higher values require more VRAM.
width number 1024 Width of the generated image in pixels. Optimal results at 1024px; higher values require more VRAM.
num_inference_steps number 8 Number of inference steps. Note: this results in (num_inference_steps - 1) actual DiT forward passes. Best quality is achieved at 8–9 steps for the Turbo variant.
guidance_scale number 0 Classifier-free guidance scale. Must be set to 0.0 for Turbo models — distillation bakes CFG in, so non-zero values are not recommended.
seed number null Random seed for reproducible generation. Leave unset (null) for random results.
lora_weights array null Array of LoRA weight URLs to apply. Supports .safetensors, .tar, and .zip files from HuggingFace or any public URL (e.g. 'https://huggingface.co/user/model/resolve/main/lora.safetensors'). Multiple LoRAs can be stacked.
lora_scales array null Array of scale values for each LoRA in lora_weights. Must match the number of lora_weights entries. Defaults to 1.0 per LoRA if not provided. Recommended range: 0.5–1.2.
output_format string jpg Format of the generated image. Options: png, jpg, webp.
output_quality number 80 Compression quality for jpg/webp output (0–100). Not applicable for png outputs.
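The num_inference_steps quirk noted in the table (the API performs one fewer DiT forward pass than the parameter value) is easy to get wrong when budgeting latency. Two small conversion helpers, assuming only that off-by-one behavior:

```python
def effective_dit_passes(num_inference_steps: int) -> int:
    """Actual DiT forward passes performed for a given parameter value."""
    if num_inference_steps < 2:
        raise ValueError("num_inference_steps must be at least 2")
    return num_inference_steps - 1

def steps_for_passes(desired_passes: int) -> int:
    """Parameter value needed to get `desired_passes` forward passes."""
    return desired_passes + 1

print(effective_dit_passes(8))  # 7
print(steps_for_passes(8))      # 9 -> request 9 steps for 8 effective passes
```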

Explore the full request and response schema in our external API documentation

Performance

Strengths & considerations

Strengths

6B-parameter S3-DiT that matches closed-source 20B+ models with only 8 NFEs
Sub-second inference on enterprise GPUs; ~5–10s on consumer hardware
Best-in-class bilingual text rendering (English and Chinese) among open-source models
Single-stream architecture: denser cross-modal interaction and higher parameter efficiency than dual-stream designs
Decoupled-DMD + DMDR distillation for stable, high-quality few-step generation
Native multi-LoRA support with per-adapter scale control
Apache 2.0 license, fully open for commercial use
MeanCache integration available for up to 3.7× additional speedup
Runs in 16GB VRAM; 8–12GB possible at 768px with offloading

Considerations

guidance_scale must be 0.0; non-zero values are incompatible with the distilled Turbo variant
Optimal resolution is 1024×1024; resolutions above 1440px require significantly more VRAM
Training LoRAs directly on the Turbo checkpoint degrades the distillation; LoRA training should target the base Z-Image model
The lora_scales array must exactly match the length of lora_weights or validation fails
May reflect biases present in the training data
num_inference_steps performs (steps - 1) DiT forward passes; set it to 9 for 8 effective passes
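Several of these considerations can be pre-checked client-side before spending a request. A sketch of such a preflight check (the function name is ours; the thresholds mirror the constraints listed above):

```python
def preflight(body: dict) -> list[str]:
    """Client-side checks mirroring the Turbo variant's constraints.

    Returns a list of problems; an empty list means the request looks safe.
    """
    problems = []
    if body.get("guidance_scale", 0) != 0:
        problems.append("guidance_scale must be 0.0 for the distilled Turbo model")
    for dim in ("height", "width"):
        if body.get(dim, 1024) > 1440:
            problems.append(f"{dim} above 1440px requires significantly more VRAM")
    weights = body.get("lora_weights") or []
    scales = body.get("lora_scales") or []
    if weights and scales and len(weights) != len(scales):
        problems.append("lora_scales length must match lora_weights")
    return problems

bad = preflight({"guidance_scale": 3.5, "height": 2048,
                 "lora_weights": ["a.safetensors"], "lora_scales": [1.0, 0.5]})
print(bad)  # flags guidance_scale, height, and the LoRA length mismatch
```

Catching these locally avoids a round trip that would fail server-side validation (the LoRA length rule) or silently degrade output (non-zero guidance).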

Use cases

Recommended applications for this model

Ultra-fast photorealistic image generation in production pipelines
Bilingual text rendering in images (English and Chinese)
Portrait and character generation with custom LoRA styles
Style transfer and artistic image generation via stacked LoRAs
Rapid prototyping and large-batch image generation
Consumer GPU deployment (16GB VRAM compatible)

Enterprise Platform Integration

Docker Support: official Docker images for containerized deployments
Kubernetes Ready: production-grade Kubernetes manifests and Helm charts
SDK Libraries: official SDKs for Python, JavaScript, Go, and Java

Don't let your AI control you. Control your AI the Qubrid way!

Have questions? Want to Partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.
