FLUX.2 [klein] 4B

FLUX.2 [klein] 4B is a 4-billion-parameter rectified flow transformer developed by Black Forest Labs, released January 15, 2026 under Apache 2.0. It is part of the FLUX.2 [klein] model family, BFL's fastest image models to date. The architecture is a unified generative-editing backbone: the same weights handle text-to-image generation, single-reference editing, and multi-reference generation without switching pipelines.

Built on rectified flow, the model learns the straightest possible path between noise and image, enabling high-quality generation in as few as 4 inference steps (sub-second on enterprise GPUs). This checkpoint is the step-distilled variant, optimized for speed; the undistilled Base variant (FLUX.2-klein-base-4B) is available for LoRA training and fine-tuning.

The model fits in ~13GB VRAM (full bf16: 23.7GB checkpoint; quantized fp8/nvfp4/GGUF variants are available for tighter budgets) and runs on RTX 3090/4070-class GPUs and above. Pixel-layer watermarking (C2PA standard) is implemented in the inference code for content provenance. Safety filtering of both inputs and outputs is encouraged for all deployments.
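Why rectified flow needs so few steps can be seen in a toy sketch: along a straight path x_t = (1 - t)·x0 + t·x1, the velocity dx/dt = x1 - x0 is constant, so Euler integration is exact even with large steps. The 1-D scalars below are a hypothetical stand-in for images, and the constant-velocity field stands in for the transformer's learned velocity prediction:

```python
# Toy illustration of rectified-flow Euler sampling.
# On a perfectly straight path the velocity is constant, so a
# handful of Euler steps (here 4, matching the distilled model's
# step count) recovers the data endpoint exactly.

def sample_rectified_flow(x1, velocity_fn, num_steps=4):
    """Integrate dx/dt = v(x, t) from t=1 (noise) down to t=0 (data)."""
    x, t = x1, 1.0
    dt = 1.0 / num_steps
    for _ in range(num_steps):
        x = x - dt * velocity_fn(x, t)
        t -= dt
    return x

# Straight path between data x0 = 2.0 and noise x1 = -3.0:
x0, x1 = 2.0, -3.0
v = lambda x, t: x1 - x0          # constant velocity on a straight path
print(sample_rectified_flow(x1, v, num_steps=4))  # prints 2.0
```

In the real model the velocity field is curved (a 4B transformer predicts it), and step distillation is what pushes the sampled trajectory close enough to straight that 4 steps suffice.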

Free Trial Credit: $1.00 (on first top-up of minimum $5)

api_example.sh

curl -X POST "https://platform.qubrid.com/v1/images/generations" \
  -H "Authorization: Bearer $QUBRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux-klein-4b",
    "prompt": "cinematic shot of a lone astronaut standing on a desolate alien planet, glowing orange sunset sky, dust storms swirling, dramatic lighting, ultra-wide lens composition, movie still aesthetic, realistic space suit details, volumetric atmosphere, 8k sci-fi film scene",
    "seed": -1,
    "aspect_ratio": "1:1",
    "output_format": "jpg",
    "output_quality": 80
  }'
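The same request can be built from Python. A minimal sketch, with the endpoint and field names taken from the curl example above; the helper name `build_generation_request` is my own, and the API key is read from the environment:

```python
import json
import os

# Build the same request as the curl example above.
# Swap in your own prompt and options as needed.
def build_generation_request(prompt, model="flux-klein-4b", seed=-1,
                             aspect_ratio="1:1", output_format="jpg",
                             output_quality=80):
    url = "https://platform.qubrid.com/v1/images/generations"
    headers = {
        "Authorization": f"Bearer {os.environ.get('QUBRID_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "seed": seed,
        "aspect_ratio": aspect_ratio,
        "output_format": output_format,
        "output_quality": output_quality,
    })
    return url, headers, body

# To send it with the requests library:
#   url, headers, body = build_generation_request("a lone astronaut...")
#   resp = requests.post(url, headers=headers, data=body)
```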

Technical Specifications

Model Architecture & Performance

Variant: Distilled (step-distilled, 4-step inference)
Model Size: 4B parameters (~23.7GB bf16 checkpoint; ~13GB VRAM at runtime)
Quantization: None in this checkpoint (official and community fp8, nvfp4, and GGUF variants available)
Architecture: Rectified flow transformer, unified generative-editing backbone; same weights for T2I, single-reference editing, and multi-reference generation
Precision: bfloat16
License: Apache 2.0
Release Date: January 15, 2026
Developers: Black Forest Labs

Pricing

Pay-per-use, no commitments

Per Image: $0.0001
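At $0.0001 per image, batch costs are simple to estimate. A quick sanity check (the $1.00 trial credit figure is taken from the listing above):

```python
# Cost estimate at the listed $0.0001-per-image rate.
PRICE_PER_IMAGE = 0.0001

def batch_cost(num_images):
    """Total cost in dollars for a batch of generations."""
    return num_images * PRICE_PER_IMAGE

print(batch_cost(10_000))     # about $1.00: the trial credit covers ~10,000 images
print(batch_cost(1_000_000))  # about $100 for a million images
```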

API Reference

Complete parameter documentation

| Parameter | Type | Default | Description |
|---|---|---|---|
| seed | number | -1 | Random seed for reproducible generation. Use -1 for random results. |
| aspect_ratio | string | 1:1 | Aspect ratio of the output image. Options: 1:1, 16:9, 21:9, 3:2, 2:3, 4:5, 5:4, 3:4, 4:3, 9:16, 9:21. |
| output_format | string | jpg | Format of the generated image. Options: png, jpg, webp. |
| output_quality | number | 80 | Compression quality for jpg/webp output (1–100). Not applicable for png outputs. |
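A client-side check against the table above can catch bad values before they reach the API. A minimal sketch; the allowed values are copied from the table, while the function name and error messages are my own:

```python
# Validate request parameters against the documented options.
ASPECT_RATIOS = {"1:1", "16:9", "21:9", "3:2", "2:3", "4:5",
                 "5:4", "3:4", "4:3", "9:16", "9:21"}
OUTPUT_FORMATS = {"png", "jpg", "webp"}

def validate_params(seed=-1, aspect_ratio="1:1",
                    output_format="jpg", output_quality=80):
    """Raise ValueError on any parameter outside the documented options."""
    if not isinstance(seed, int):
        raise ValueError("seed must be an integer (-1 for random)")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")
    # output_quality only applies to lossy formats (jpg/webp).
    if output_format != "png" and not 1 <= output_quality <= 100:
        raise ValueError("output_quality must be in 1..100 for jpg/webp")
    return True
```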

Explore the full request and response schema in our external API documentation

Performance

Strengths & considerations

Strengths

4B parameter rectified flow transformer with sub-second inference in 4 steps
Unified generative-editing backbone: T2I, single-reference editing, and multi-reference generation from the same weights
Fits in ~13GB VRAM, accessible on RTX 3090/4070 and above
Apache 2.0 license, fully open for commercial use with no restrictions
Rectified flow architecture: straight noise-to-image paths mean fewer steps and faster generation
Matches the quality of much larger models on the quality-vs-latency Pareto frontier
fp8, nvfp4, and GGUF quantized variants available for sub-13GB deployment
Pixel-layer C2PA watermarking built into the inference code for content provenance
Diffusers-native via Flux2KleinPipeline

Considerations

Distilled checkpoint is optimized for speed; for LoRA training or fine-tuning, use the Base variant (FLUX.2-klein-base-4B)
May amplify biases present in the training data
Not intended or able to provide factual information; text rendering in images may be inaccurate
Prompt following is sensitive to prompting style
Full bf16 checkpoint is 23.7GB; quantization (fp8/GGUF) is required for strict 13GB deployments
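The precision options map to rough weights-only memory footprints. A back-of-envelope estimate using standard bytes-per-parameter figures for each format; note the listed 23.7GB checkpoint exceeds the weights-only math below, presumably (my assumption) because it bundles additional components such as the text encoder and VAE:

```python
# Weights-only memory estimate for a 4B-parameter model at
# different precisions. Real runtime VRAM adds activations and
# any bundled components (text encoder, VAE) on top of this.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "nvfp4": 0.5}

def weights_gb(num_params, precision):
    """Approximate transformer weight size in GB at the given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for p in ("bf16", "fp8", "nvfp4"):
    print(f"{p}: ~{weights_gb(4e9, p):.1f} GB")  # 8.0, 4.0, 2.0 GB
```

This is why the fp8 and nvfp4 variants comfortably clear the sub-13GB bar that the full bf16 checkpoint misses.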

Use cases

Recommended applications for this model

Real-time and sub-second text-to-image generation
Multi-reference image editing (style, character, object transfer)
Single-reference image editing and style transfer
Local deployment on consumer GPUs (RTX 3090/4070+)
Edge deployment and production pipelines requiring Apache 2.0 licensing
Rapid prototyping and creative workflows

Enterprise
Platform Integration

Docker

Docker Support

Official Docker images for containerized deployments

Kubernetes

Kubernetes Ready

Production-grade Kubernetes manifests and Helm charts

SDK

SDK Libraries

Official SDKs for Python, JavaScript, Go, and Java

Don't let your AI control you. Control your AI the Qubrid way!

Have questions? Want to Partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.

"Qubrid helped us turn a collection of AI scripts into structured production workflows. We now have better reliability, visibility, and control over every run."

AI Infrastructure Team, Automation & Orchestration