FLUX.2 [klein] 4B
FLUX.2 [klein] 4B is a 4-billion-parameter rectified flow transformer developed by Black Forest Labs, released January 15, 2026 under Apache 2.0. It is part of the FLUX.2 [klein] model family, BFL's fastest image models to date.

The architecture is a unified generative-editing backbone: the same weights handle text-to-image generation, single-reference editing, and multi-reference generation without switching pipelines. Built on rectified flow, the model learns the straightest possible path between noise and image, enabling high-quality generation in as few as 4 inference steps (sub-second on enterprise GPUs). This checkpoint is step-distilled for speed; the undistilled Base variant (FLUX.2-klein-base-4B) is available for LoRA training and fine-tuning.

The model fits in ~13GB VRAM (the full bf16 checkpoint is 23.7GB; quantized fp8/nvfp4/GGUF variants are available for tighter budgets) and is accessible on RTX 3090/4070 and above. Pixel-layer watermarking (C2PA standard) is implemented in the inference code for content provenance, and safety filtering of both inputs and outputs is encouraged for all deployments.
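For orientation, a minimal Diffusers sketch of the 4-step text-to-image path is below. It assumes the Flux2KleinPipeline class named later on this page and a hypothetical Hugging Face repo id; consult the official model card for the exact identifiers.

```python
import torch
from diffusers import Flux2KleinPipeline  # pipeline class as named on this page

# Repo id below is a hypothetical placeholder; check the official model card.
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    torch_dtype=torch.bfloat16,  # full-precision bf16 weights
)
pipe.to("cuda")

# The step-distilled checkpoint targets as few as 4 inference steps.
image = pipe(
    "a lighthouse at dusk, long exposure, film grain",
    num_inference_steps=4,
    generator=torch.Generator("cuda").manual_seed(42),  # reproducible runs
).images[0]
image.save("lighthouse.png")
```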
Technical Specifications
Model Architecture & Performance
Pricing
Pay-per-use, no commitments
API Reference
Complete parameter documentation
| Parameter | Type | Default | Description |
|---|---|---|---|
| seed | number | -1 | Random seed for reproducible generation. Use -1 for random results. |
| aspect_ratio | string | 1:1 | Aspect ratio of the output image. Options: 1:1, 16:9, 21:9, 3:2, 2:3, 4:5, 5:4, 3:4, 4:3, 9:16, 9:21. |
| output_format | string | jpg | Format of the generated image. Options: png, jpg, webp. |
| output_quality | number | 80 | Compression quality for jpg/webp output (1–100). Not applicable for png outputs. |
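To show how these parameters compose into a request, here is a hedged Python sketch. The endpoint URL, auth header, prompt field, and raw-bytes response handling are assumptions for illustration only; the external API documentation linked below is authoritative.

```python
import os
import requests

resp = requests.post(
    "https://api.example.com/v1/flux2-klein-4b/generate",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={
        "prompt": "isometric illustration of a small data center",
        "seed": -1,             # -1 selects a random seed
        "aspect_ratio": "16:9",
        "output_format": "jpg",
        "output_quality": 80,   # ignored for png output
    },
    timeout=120,
)
resp.raise_for_status()

# Assumes the API returns the image bytes directly; a JSON body with a
# download URL is equally plausible. Check the real response schema.
with open("output.jpg", "wb") as f:
    f.write(resp.content)
```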
Explore the full request and response schema in our external API documentation
Performance
Strengths & considerations
| Strengths | Considerations |
|---|---|
| 4B-parameter rectified flow transformer: sub-second inference in 4 steps | Distilled checkpoint is optimized for speed; for LoRA training or fine-tuning, use the Base variant (FLUX.2-klein-base-4B) |
| Unified generative-editing backbone: T2I, single-reference, and multi-reference from the same weights | May amplify biases observed in training data |
| Fits in ~13GB VRAM; accessible on RTX 3090/4070 and above | Not intended or able to provide factual information; text rendering in images may be inaccurate |
| Apache 2.0 license: fully open for commercial use with no restrictions | Prompt following is sensitive to prompting style |
| Rectified flow architecture: straight noise-to-image paths mean fewer steps and faster generation | Full bf16 checkpoint is 23.7GB; quantization (fp8/GGUF) required for strict 13GB deployments |
| Matches the quality of much larger models on the quality-vs-latency Pareto frontier | |
| fp8, nvfp4, and GGUF quantized variants available for sub-13GB deployment | |
| Pixel-layer C2PA watermarking built into the inference code for content provenance | |
| Diffusers-native via Flux2KleinPipeline | |
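On the VRAM consideration above: besides switching to a quantized variant, Diffusers' standard CPU offload can reduce peak GPU memory at some latency cost. A sketch, again using the hypothetical repo id:

```python
import torch
from diffusers import Flux2KleinPipeline

pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",  # hypothetical repo id
    torch_dtype=torch.bfloat16,
)
# Standard DiffusionPipeline feature (requires accelerate): each submodule
# is moved to the GPU only while it runs, lowering peak VRAM.
pipe.enable_model_cpu_offload()

image = pipe("product shot of a ceramic mug", num_inference_steps=4).images[0]
image.save("mug.png")
```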
Use cases
Recommended applications for this model
Enterprise
Platform Integration
Docker Support
Official Docker images for containerized deployments
Kubernetes Ready
Production-grade K8s manifests and Helm charts
SDK Libraries
Official SDKs for Python, JavaScript, Go, and Java
Don't let your AI control you. Control your AI the Qubrid way!
Have questions? Want to partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.
"Qubrid helped us turn a collection of AI scripts into structured production workflows. We now have better reliability, visibility, and control over every run."
AI Infrastructure Team
Automation & Orchestration
