NVIDIA Nemotron 3 Nano 30B-A3B
Nemotron 3 Nano 30B-A3B is NVIDIA’s flagship open reasoning model using a hybrid Mamba-2 + Transformer Mixture-of-Experts architecture. Although it has 31.6B total parameters, only 3.2B are active per forward pass, delivering significantly higher throughput while maintaining state-of-the-art reasoning accuracy.
api_example.sh
Technical Specifications
Model Architecture & Performance
Pricing
Pay-per-use, no commitments
API Reference
Complete parameter documentation
| Parameter | Type | Default | Description |
|---|---|---|---|
| stream | boolean | true | Enable streaming responses for real-time output. |
| temperature | number | 0.3 | Controls randomness. Higher values produce more creative but less predictable output. |
| max_tokens | number | 8192 | Maximum number of tokens the model can generate. |
| top_p | number | 1 | Nucleus sampling threshold for token selection. |
| enable_thinking | boolean | true | Enable chain-of-thought reasoning traces. |
| thinking_budget | number | 16384 | Maximum tokens allocated for reasoning traces. |
Explore the full request and response schema in our external API documentation
Performance
Strengths & considerations
| Strengths | Considerations |
|---|---|
Hybrid Mamba-2 + Transformer MoE architecture Only 3.2B active parameters per inference Up to 3.3× higher throughput than comparable 30B models Supports extremely long context (up to 1M tokens) Configurable reasoning depth with thinking budget Native tool calling and function execution FP8 optimized for memory efficiency and speed Strong performance on SWE-Bench, GPQA Diamond, and AIME benchmarks | Requires 32GB+ VRAM for FP8 inference BF16 requires 60GB+ VRAM Hybrid architecture has less community tooling than pure transformers FlashInfer backend requires CUDA toolkit support |
Enterprise
Platform Integration
Docker Support
Official Docker images for containerized deployments
Kubernetes Ready
Production-grade KBS manifests and Helm charts
SDK Libraries
Official SDKs for Python, Javascript, Go, and Java
Don't let your AI control you. Control your AI the Qubrid way!
Have questions? Want to Partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.
"Qubrid helped us turn a collection of AI scripts into structured production workflows. We now have better reliability, visibility, and control over every run."
AI Infrastructure Team
Automation & Orchestration
