FLUX.1 [dev]
FLUX.1 [dev] is a 12 billion parameter rectified flow transformer developed by Black Forest Labs. It uses a hybrid architecture combining MMDiT (Multi-Modal Diffusion Transformer) and SingleDiT blocks, with dual text encoders — CLIP ViT-L/14 (77 tokens) for global semantic alignment and T5-v1.1-XXL (up to 512 tokens) for rich, nuanced language understanding. A 16-channel VAE (4× more channels than SDXL) enables higher fidelity latent representations. The model uses Rotary Positional Encoding (RoPE) and a Flow Matching Euler Discrete scheduler, making it highly capable across varied resolutions and aspect ratios. It is guidance-distilled from FLUX.1 [pro], achieving near-pro quality at significantly lower inference cost.
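For local experimentation with the open weights, the model can be loaded through Hugging Face diffusers' `FluxPipeline`. This is a minimal sketch: the step count and guidance value mirror the API defaults documented below, while the bfloat16 dtype and CPU offload are assumptions about available hardware, not requirements.

```python
import torch
from diffusers import FluxPipeline

# Load FLUX.1 [dev] from the Hugging Face Hub (open weights, non-commercial license).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,  # bf16 keeps the 12B transformer near ~24 GB
)
pipe.enable_model_cpu_offload()  # optional: trades speed for lower peak VRAM

image = pipe(
    "a macro photo of a dew-covered spider web at sunrise",
    num_inference_steps=28,      # matches the documented default
    guidance_scale=3.5,          # distilled guidance, not classifier-free guidance
    height=1024,
    width=1024,
    generator=torch.Generator("cpu").manual_seed(42),  # reproducible output
).images[0]
image.save("flux_dev_example.png")
```

Because the model is guidance-distilled, `guidance_scale` is fed to the transformer directly rather than doubling the batch for classifier-free guidance, which is part of why inference is cheaper than FLUX.1 [pro].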
Technical Specifications
Model Architecture & Performance
Pricing
Pay-per-use, no commitments
API Reference
Complete parameter documentation
| Parameter | Type | Default | Description |
|---|---|---|---|
| num_inference_steps | number | 28 | Number of denoising steps. More steps yield higher quality but slower generation. |
| guidance | number | 3.5 | How closely the model follows the prompt. Higher values produce more literal interpretation of the text. |
| seed | number | -1 | Random seed for reproducible generation. Use -1 for random. |
| aspect_ratio | string | 1:1 | Aspect ratio of the output image. Options: 1:1, 16:9, 21:9, 3:2, 2:3, 4:5, 5:4, 3:4, 4:3, 9:16, 9:21. |
| image_size | number | 1024 | Base size in pixels for the longest side of the output image. |
| output_format | string | jpg | Format of the generated image. Options: png, jpg, webp. |
| output_quality | number | 80 | Compression quality for jpg/webp output (1–100). Higher values retain more detail. |
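To see how `aspect_ratio` and `image_size` combine, here is one plausible way a backend could derive pixel dimensions. Treat the rounding rule (snapping to multiples of 16, which FLUX's 8x VAE downsampling and 2x2 latent patching favor) as an assumption, not documented behavior of this API.

```python
def resolve_dimensions(aspect_ratio: str = "1:1", image_size: int = 1024) -> tuple[int, int]:
    """Derive (width, height) with image_size as the longest side."""
    w_ratio, h_ratio = (int(p) for p in aspect_ratio.split(":"))
    if w_ratio >= h_ratio:
        width, height = float(image_size), image_size * h_ratio / w_ratio
    else:
        width, height = image_size * w_ratio / h_ratio, float(image_size)

    def snap(v: float) -> int:
        # Assumption: snap to a multiple of 16 so latents divide evenly.
        return max(16, round(v / 16) * 16)

    return snap(width), snap(height)

print(resolve_dimensions("16:9", 1024))  # (1024, 576)
print(resolve_dimensions("9:16", 1024))  # (576, 1024)
```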
Explore the full request and response schema in our external API documentation
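Putting the table together, a request might look like the following Python sketch. The endpoint URL, header name, and response shape are placeholders; only the JSON parameters come from the table above, so consult the external API documentation for the real schema.

```python
import os
import requests

# Hypothetical endpoint and auth header -- substitute the values from the
# external API documentation.
API_URL = "https://api.example.com/v1/models/flux-1-dev/generate"  # placeholder
headers = {"Authorization": f"Bearer {os.environ['API_KEY']}"}

payload = {
    "prompt": "an isometric illustration of a solar-powered research station",
    "num_inference_steps": 28,
    "guidance": 3.5,
    "seed": -1,            # -1 requests a random seed
    "aspect_ratio": "16:9",
    "image_size": 1024,
    "output_format": "webp",
    "output_quality": 90,
}

resp = requests.post(API_URL, json=payload, headers=headers, timeout=120)
resp.raise_for_status()
# Response shape is an assumption; many image APIs return a URL or base64 data.
print(resp.json())
```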
Performance
Strengths & considerations
| Strengths | Considerations |
|---|---|
| 12B parameters with state-of-the-art output quality | Non-commercial license only; a separate commercial license is required from Black Forest Labs |
| Dual text encoders: CLIP L/14 + T5-v1.1-XXL for deep prompt understanding | 12B parameters require significant VRAM (~24 GB in fp16); quantized versions (fp8, NF4) needed for consumer hardware (see the sketch after this table) |
| 16-channel VAE for high-fidelity image encoding | May reflect societal biases present in training data |
| Hybrid MMDiT + SingleDiT transformer architecture | Not designed to produce factually accurate or grounded outputs |
| Supports a wide range of aspect ratios and resolutions | Generation quality is sensitive to prompt length, style, and step count |
| Open weights: compatible with LoRA, ControlNet, and fine-tuning | |
| Guidance distillation from FLUX.1 [pro] for efficient inference | |
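For the VRAM consideration above, diffusers offers a few levers. Sequential CPU offload is a stable API; the NF4 quantization path shown in comments assumes a recent diffusers release with bitsandbytes support and is a sketch, not a verified recipe.

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Option 1: sequential CPU offload -- slowest, but peak VRAM drops well below 24 GB.
pipe.enable_sequential_cpu_offload()

# Option 2 (assumption: recent diffusers + bitsandbytes installed): load the
# 12B transformer in 4-bit NF4 before assembling the pipeline.
# from diffusers import BitsAndBytesConfig, FluxTransformer2DModel
# transformer = FluxTransformer2DModel.from_pretrained(
#     "black-forest-labs/FLUX.1-dev",
#     subfolder="transformer",
#     quantization_config=BitsAndBytesConfig(
#         load_in_4bit=True, bnb_4bit_quant_type="nf4"
#     ),
#     torch_dtype=torch.bfloat16,
# )
# pipe = FluxPipeline.from_pretrained(
#     "black-forest-labs/FLUX.1-dev",
#     transformer=transformer,
#     torch_dtype=torch.bfloat16,
# )
```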
Use cases
Recommended applications for this model
Enterprise
Platform Integration
Docker Support
Official Docker images for containerized deployments
Kubernetes Ready
Production-grade Kubernetes manifests and Helm charts
SDK Libraries
Official SDKs for Python, JavaScript, Go, and Java
Don't let your AI control you. Control your AI the Qubrid way!
Have questions? Want to partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.
"Qubrid scaled our personalized outreach from hundreds to tens of thousands of prospects. AI-driven research and content generation doubled our campaign velocity without sacrificing quality."
Demand Generation Team
Marketing & Sales Operations
