DeepSeek R1 Distill LLaMA 70B
DeepSeek R1 Distill LLaMA 70B is optimized for efficient, high-level reasoning and conversational intelligence. It delivers near frontier-level analytical performance while running on significantly smaller hardware.
Search our library of open source models and deploy in seconds.
Fara 7B is a compact and efficient transformer model developed by Microsoft for high-speed inference, instruction following, text generation, and lightweight reasoning tasks. Its small parameter size allows easy deployment on consumer GPUs and edge devices while maintaining strong performance.
Introducing gpt-oss-120B, OpenAI's flagship open-weight model in the gpt-oss series, built for advanced reasoning, large-scale agentic workloads, and enterprise-grade automation. With roughly 117B total parameters and a highly optimized Mixture-of-Experts (MoE) architecture that activates about 5.1B parameters per token, it delivers exceptional intelligence while maintaining competitive latency. Designed for complex reasoning, multi-task agents, and long-horizon planning, gpt-oss-120B brings frontier-level capability to commercial and self-hosted deployments.
Part of the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases, gpt-oss-20b is a roughly 21B-parameter Mixture-of-Experts (MoE) model with 3.6B active parameters during inference. It is optimized for lower latency and for local or specialized use cases, and supports configurable reasoning depth for agentic applications.
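For illustration, a call to one of the gpt-oss models might look like the sketch below. It assumes the model is loaded through the Hugging Face transformers library under the model id openai/gpt-oss-20b on local GPU hardware, rather than through any particular hosted endpoint.

```python
# Minimal sketch: querying gpt-oss-20b through the Hugging Face transformers
# text-generation pipeline (assumed model id: openai/gpt-oss-20b).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # assumed Hugging Face model id
    torch_dtype="auto",
    device_map="auto",           # place weights automatically across available GPUs
)

messages = [
    {"role": "user", "content": "Outline a plan for summarizing a 50-page PDF."},
]

# Recent transformers versions accept chat-style message lists directly.
result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])
```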
Released in late 2025, Hunyuan OCR is an open-source contribution from Tencent that outperforms many larger proprietary models. It uses a global-to-local architecture with a SigLIP-v2 visual encoder to handle high-resolution inputs and extreme aspect ratios without artificial image splitting.
Mistral 7B is a 7.3B parameter language model celebrated for its efficiency, outperforming larger models on many benchmarks. The v0.3 instruct version is specifically fine-tuned for chat and instruction-following tasks.
NVIDIA Orchestrator is purpose-built for agent workflows and complex task sequencing. It excels in planning, structured reasoning, autonomous execution, and coordinating multiple tools or APIs. With deep GPU-level optimization, it delivers superior throughput and low latency in enterprise automation scenarios.
Nemotron 3 Nano 30B-A3B is NVIDIA’s flagship open reasoning model using a hybrid Mamba-2 + Transformer Mixture-of-Experts architecture. Although it has 31.6B total parameters, only 3.2B are active per forward pass, delivering significantly higher throughput while maintaining state-of-the-art reasoning accuracy.
Qwen-Image-Edit is a 20B multimodal diffusion model for advanced image editing and transformation. It performs precise text-guided edits, inpainting, and style modifications while preserving visual fidelity and layout.
Qwen3-Coder-30B-A3B-Instruct is a sparse Mixture-of-Experts (MoE) model with around 30.5B total parameters (3.3B active per inference) across 48 layers. It supports extremely long context: 262,144 tokens natively, extendable to 1M tokens in some deployments.
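As a rough sketch of how a hosted Qwen3-Coder-30B-A3B-Instruct deployment could be queried, the example below uses the OpenAI Python client against an OpenAI-compatible chat completions endpoint; the base URL, API key, and serving setup are placeholders, not details of any specific deployment.

```python
# Minimal sketch: calling a hosted Qwen3-Coder-30B-A3B-Instruct endpoint through
# an OpenAI-compatible chat completions API. The base_url and api_key values are
# placeholders; substitute the details of your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",             # placeholder credential
)

response = client.chat.completions.create(
    model="Qwen3-Coder-30B-A3B-Instruct",
    messages=[
        {"role": "user", "content": "Write a Python function that merges two sorted lists."},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```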
Qwen3-VL is a vision-language instruction-tuned model capable of understanding text and images. It supports streaming, OCR, and rich multimodal conversations.
Qwen3-VL-30B-A3B-Instruct is a large-scale, high-capacity vision-language instruction model designed for advanced multimodal reasoning. It delivers significantly stronger visual understanding, OCR accuracy, document reasoning, long-context comprehension, and agent-style interactions compared to smaller Qwen-VL variants.
This model generates and edits images from text prompts using a Latent Diffusion framework. It leverages two fixed, pretrained text encoders — OpenCLIP-ViT/G and CLIP-ViT/L — to understand and translate textual descriptions into visual representations.
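A minimal text-to-image sketch for this dual-text-encoder latent diffusion setup is shown below; it assumes the model is loaded through the Hugging Face diffusers StableDiffusionXLPipeline with the illustrative model id stabilityai/stable-diffusion-xl-base-1.0 on a CUDA-capable GPU.

```python
# Minimal sketch: text-to-image generation with a latent diffusion model that
# pairs the OpenCLIP-ViT/G and CLIP-ViT/L text encoders. The model id below is
# assumed for illustration.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed model id
    torch_dtype=torch.float16,
).to("cuda")  # assumes a CUDA GPU is available

image = pipe(
    prompt="an isometric illustration of a data center at sunset",
    num_inference_steps=30,  # denoising steps; fewer steps trade quality for speed
    guidance_scale=7.0,      # how strongly the prompt steers generation
).images[0]
image.save("output.png")
```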
Whisper V3 delivers high-accuracy speech recognition across 99 languages. Ideal for transcription, subtitles, and accessibility.
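A minimal transcription sketch is shown below; it assumes Whisper V3 is run locally through the Hugging Face transformers automatic-speech-recognition pipeline under the model id openai/whisper-large-v3, with a local audio file (here named meeting.wav) as input.

```python
# Minimal sketch: transcription with Whisper large-v3 via the transformers
# automatic-speech-recognition pipeline (assumed model id: openai/whisper-large-v3).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # assumed Hugging Face model id
    device_map="auto",
)

# chunk_length_s lets the pipeline handle audio longer than 30 seconds;
# return_timestamps is useful when generating subtitles.
result = asr("meeting.wav", chunk_length_s=30, return_timestamps=True)
print(result["text"])
```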
Z-Image Turbo is built for ultra-fast image generation, ideal for low-latency workflows and real-time creative tasks. It enables high-quality output with extremely few sampling steps.
Have questions? Want to partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.
"Qubrid AI reduced our document processing time by over 60% and significantly improved retrieval accuracy across our RAG workflows."
Enterprise AI Team
Document Intelligence Platform