Z-Image-Turbo [LoRA]
Z-Image-Turbo is a 6B-parameter distilled text-to-image model developed by Alibaba Tongyi Lab, released November 27, 2025 under the Apache 2.0 license. It is the distilled variant of the Z-Image foundation model, accelerated with the Decoupled-DMD (Distribution Matching Distillation) algorithm plus DMDR (DMD + Reinforcement Learning), enabling high-quality generation in just 8 NFEs.

The model is built on a Scalable Single-Stream DiT (S3-DiT) architecture in which text tokens, visual semantic tokens, and image VAE tokens are concatenated at the sequence level into a single unified input stream, unlike dual-stream approaches such as FLUX's MMDiT. This maximizes parameter efficiency and enables dense cross-modal interaction at every transformer layer. The text encoder is Qwen3-4B (qwen_3_4b.safetensors) and the VAE is a Flux-compatible autoencoder (ae.safetensors).

Z-Image-Turbo achieves sub-second latency on enterprise H800 GPUs, fits within 16GB of VRAM on consumer hardware, and ranks as the leading open-source model in Alibaba AI Arena's Elo-based human preference evaluations. The LoRA variant here supports loading multiple LoRA adapters simultaneously, each with independent scale control.
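To make the single-stream idea concrete, here is a minimal NumPy sketch of the sequence-level concatenation described above. The token counts and hidden size are made up for illustration and do not reflect the real model's dimensions:

```python
import numpy as np

# Hypothetical sizes -- illustrative only, not the actual model dims.
hidden = 64                                     # embedding dimension
text_tokens = np.random.randn(77, hidden)       # text encoder output
semantic_tokens = np.random.randn(16, hidden)   # visual semantic tokens
vae_tokens = np.random.randn(4096, hidden)      # image latent patches from the VAE

# S3-DiT concatenates all modalities into ONE sequence, so every
# transformer layer attends across modalities jointly (single-stream),
# rather than keeping separate per-modality streams as in dual-stream
# MMDiT-style designs.
stream = np.concatenate([text_tokens, semantic_tokens, vae_tokens], axis=0)
print(stream.shape)  # (4189, 64)
```

Because all tokens share one sequence, every attention layer sees text, semantic, and image tokens together; no cross-attention bridging between separate streams is needed.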
Technical Specifications
Model Architecture & Performance
Pricing
Pay-per-use, no commitments
API Reference
Complete parameter documentation
| Parameter | Type | Default | Description |
|---|---|---|---|
| height | number | 1024 | Height of the generated image in pixels. Optimal results at 1024px; higher values require more VRAM. |
| width | number | 1024 | Width of the generated image in pixels. Optimal results at 1024px; higher values require more VRAM. |
| num_inference_steps | number | 8 | Number of inference steps. Note: this results in (num_inference_steps - 1) actual DiT forward passes. Best quality is achieved at 8–9 steps for the Turbo variant. |
| guidance_scale | number | 0 | Classifier-free guidance scale. Must be set to 0.0 for Turbo models — distillation bakes CFG in, so non-zero values are not recommended. |
| seed | number | null | Random seed for reproducible generation. Leave unset (null) for random results. |
| lora_weights | array | null | Array of LoRA weight URLs to apply. Supports .safetensors, .tar, and .zip files from HuggingFace or any public URL (e.g. 'https://huggingface.co/user/model/resolve/main/lora.safetensors'). Multiple LoRAs can be stacked. |
| lora_scales | array | null | Array of scale values for each LoRA in lora_weights. Must match the number of lora_weights entries. Defaults to 1.0 per LoRA if not provided. Recommended range: 0.5–1.2. |
| output_format | string | jpg | Format of the generated image. Options: png, jpg, webp. |
| output_quality | number | 80 | Compression quality for jpg/webp output (0–100). Not applicable for png outputs. |
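The constraints in the table (CFG pinned to 0.0, matching lora_weights/lora_scales lengths, default scale of 1.0) can be enforced client-side before sending a request. A sketch, using the field names from the table above; the function name is mine, and endpoint/auth details are intentionally not shown:

```python
def build_payload(prompt, lora_weights=None, lora_scales=None,
                  num_inference_steps=8, guidance_scale=0.0,
                  width=1024, height=1024, seed=None,
                  output_format="jpg", output_quality=80):
    """Assemble a request body matching the parameter table above,
    validating the Turbo-specific constraints before the API does."""
    if guidance_scale != 0.0:
        # The Turbo variant bakes CFG in via distillation.
        raise ValueError("guidance_scale must be 0.0 for Turbo models")
    if output_format not in ("png", "jpg", "webp"):
        raise ValueError("output_format must be png, jpg, or webp")
    payload = {
        "prompt": prompt,
        "width": width,
        "height": height,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "seed": seed,
        "output_format": output_format,
        "output_quality": output_quality,
    }
    if lora_weights:
        if lora_scales is None:
            lora_scales = [1.0] * len(lora_weights)  # documented default
        if len(lora_scales) != len(lora_weights):
            raise ValueError("lora_scales must match lora_weights in length")
        payload["lora_weights"] = lora_weights
        payload["lora_scales"] = lora_scales
    return payload
```

Validating locally gives immediate, readable errors instead of a rejected API call, which is especially useful when stacking several LoRAs.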
Explore the full request and response schema in our external API documentation
Performance
Strengths & considerations
Strengths

- 6B-parameter S3-DiT matches closed-source 20B+ models at 8 NFEs
- Sub-second inference on enterprise GPUs; roughly 5–10s on consumer hardware
- Best-in-class bilingual (English and Chinese) text rendering among open-source models
- Single-stream architecture: denser cross-modal interaction and higher parameter efficiency than dual-stream designs
- Decoupled-DMD + DMDR distillation for stable, high-quality few-step generation
- Native multi-LoRA support with per-adapter scale control
- Apache 2.0 license, fully open for commercial use
- MeanCache integration available for up to 3.7× additional speedup
- Runs in 16GB VRAM; 8–12GB is possible at 768px with offloading

Considerations

- guidance_scale must be 0.0; non-zero values are incompatible with the distilled Turbo variant
- Optimal resolution is 1024×1024; resolutions above 1440px require significantly more VRAM
- Training LoRAs directly on the Turbo checkpoint degrades the distillation; LoRA training should target the base Z-Image model
- lora_scales must exactly match the length of lora_weights, or validation fails
- May reflect biases present in the training data
- num_inference_steps performs (steps - 1) actual DiT forward passes; set it to 9 for 8 effective passes
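The off-by-one between requested steps and actual DiT forward passes is easy to get wrong, so a tiny helper (the function name is mine) makes the relationship explicit:

```python
def effective_dit_passes(num_inference_steps: int) -> int:
    """The scheduler performs one fewer DiT forward pass than the
    requested step count, per the consideration noted above."""
    return num_inference_steps - 1

# To get the Turbo-optimal 8 actual forward passes, request 9 steps.
print(effective_dit_passes(9))  # 8
```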
Use cases
Recommended applications for this model
Enterprise
Platform Integration
Docker Support
Official Docker images for containerized deployments
Kubernetes Ready
Production-grade Kubernetes manifests and Helm charts
SDK Libraries
Official SDKs for Python, JavaScript, Go, and Java
Don't let your AI control you. Control your AI the Qubrid way!
Have questions? Want to Partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.
"Qubrid AI reduced our document processing time by over 60% and significantly improved retrieval accuracy across our RAG workflows."
Enterprise AI Team
Document Intelligence Platform
