DeepSeek R1 Distill LLaMA 70B
DeepSeek R1 Distill LLaMA 70B is optimized for efficient, high-level reasoning and conversational intelligence. It delivers near frontier-level analytical performance while running on significantly smaller hardware.
Search our library of open source models and deploy in seconds.
Fara 7B is a compact and efficient transformer model developed by Microsoft for high-speed inference, instruction following, text generation, and lightweight reasoning tasks. Its small parameter size allows easy deployment on consumer GPUs and edge devices while maintaining strong performance.
Introducing gpt-oss-120B, OpenAI's flagship open-weight model in the gpt-oss series, built for advanced reasoning, large-scale agentic workloads, and enterprise-grade automation. With roughly 117B total parameters and a highly optimized Mixture-of-Experts (MoE) architecture that activates about 5.1B parameters per token, it delivers exceptional intelligence while maintaining competitive latency. Designed for complex reasoning, multi-task agents, and long-horizon planning, gpt-oss-120B brings frontier-level capability to commercial and self-hosted deployments.
Part of the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases, gpt-oss-20b is a roughly 21B-parameter Mixture-of-Experts (MoE) model with 3.6B active parameters during inference. It is optimized for lower latency and for local or specialized use cases, and supports configurable reasoning depth for agentic applications.
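For illustration, a call to one of the gpt-oss models might look like the sketch below. It assumes the model is loaded through the Hugging Face transformers library under the model id openai/gpt-oss-20b on local GPU hardware, rather than through any particular hosted endpoint.

```python
# Minimal sketch: querying gpt-oss-20b through the Hugging Face transformers
# text-generation pipeline (assumed model id: openai/gpt-oss-20b).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # assumed Hugging Face model id
    torch_dtype="auto",
    device_map="auto",           # place weights automatically across available GPUs
)

messages = [
    {"role": "user", "content": "Outline a plan for summarizing a 50-page PDF."},
]

# Recent transformers versions accept chat-style message lists directly.
result = generator(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])
```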
Released in late 2025, Hunyuan OCR is an open-source contribution from Tencent that outperforms many larger proprietary models. It uses a global-to-local architecture with a SigLIP-v2 visual encoder to handle high-resolution inputs and extreme aspect ratios without artificial image splitting.
Mistral 7B is a 7.3B parameter language model celebrated for its efficiency, outperforming larger models on many benchmarks. The v0.3 instruct version is specifically fine-tuned for chat and instruction-following tasks.
NVIDIA Orchestrator is purpose-built for agent workflows and complex task sequencing. It excels in planning, structured reasoning, autonomous execution, and coordinating multiple tools or APIs. With deep GPU-level optimization, it delivers superior throughput and low latency in enterprise automation scenarios.
Nemotron 3 Nano 30B-A3B is NVIDIA’s flagship open reasoning model using a hybrid Mamba-2 + Transformer Mixture-of-Experts architecture. Although it has 31.6B total parameters, only 3.2B are active per forward pass, delivering significantly higher throughput while maintaining state-of-the-art reasoning accuracy.
Qwen-Image-Edit is a 20B multimodal diffusion model for advanced image editing and transformation. It performs precise text-guided edits, inpainting, and style modifications while preserving visual fidelity and layout.
Qwen3-Coder-30B-A3B-Instruct is a sparse Mixture-of-Experts (MoE) model with around 30.5B total parameters (3.3B active per inference) across 48 layers. It supports extremely long context: 262,144 tokens natively, extendable to 1M tokens in some deployments.
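As a rough sketch of how a hosted Qwen3-Coder-30B-A3B-Instruct deployment could be queried, the example below uses the OpenAI Python client against an OpenAI-compatible chat completions endpoint; the base URL, API key, and serving setup are placeholders, not details of any specific deployment.

```python
# Minimal sketch: calling a hosted Qwen3-Coder-30B-A3B-Instruct endpoint through
# an OpenAI-compatible chat completions API. The base_url and api_key values are
# placeholders; substitute the details of your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",             # placeholder credential
)

response = client.chat.completions.create(
    model="Qwen3-Coder-30B-A3B-Instruct",
    messages=[
        {"role": "user", "content": "Write a Python function that merges two sorted lists."},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```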
Qwen3-VL is a vision-language instruction-tuned model capable of understanding text and images. It supports streaming, OCR, and rich multimodal conversations.
Qwen3-VL-30B-A3B-Instruct is a large-scale, high-capacity vision-language instruction model designed for advanced multimodal reasoning. It delivers significantly stronger visual understanding, OCR accuracy, document reasoning, long-context comprehension, and agent-style interactions compared to smaller Qwen-VL variants.
This model generates and edits images from text prompts using a Latent Diffusion framework. It leverages two fixed, pretrained text encoders — OpenCLIP-ViT/G and CLIP-ViT/L — to understand and translate textual descriptions into visual representations.
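A minimal text-to-image sketch for this dual-text-encoder latent diffusion setup is shown below; it assumes the model is loaded through the Hugging Face diffusers StableDiffusionXLPipeline with the illustrative model id stabilityai/stable-diffusion-xl-base-1.0 on a CUDA-capable GPU.

```python
# Minimal sketch: text-to-image generation with a latent diffusion model that
# pairs the OpenCLIP-ViT/G and CLIP-ViT/L text encoders. The model id below is
# assumed for illustration.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed model id
    torch_dtype=torch.float16,
).to("cuda")  # assumes a CUDA GPU is available

image = pipe(
    prompt="an isometric illustration of a data center at sunset",
    num_inference_steps=30,  # denoising steps; fewer steps trade quality for speed
    guidance_scale=7.0,      # how strongly the prompt steers generation
).images[0]
image.save("output.png")
```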
Whisper V3 delivers high-accuracy speech recognition across 99 languages. Ideal for transcription, subtitles, and accessibility.
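A minimal transcription sketch is shown below; it assumes Whisper V3 is run locally through the Hugging Face transformers automatic-speech-recognition pipeline under the model id openai/whisper-large-v3, with a local audio file (here named meeting.wav) as input.

```python
# Minimal sketch: transcription with Whisper large-v3 via the transformers
# automatic-speech-recognition pipeline (assumed model id: openai/whisper-large-v3).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # assumed Hugging Face model id
    device_map="auto",
)

# chunk_length_s lets the pipeline handle audio longer than 30 seconds;
# return_timestamps is useful when generating subtitles.
result = asr("meeting.wav", chunk_length_s=30, return_timestamps=True)
print(result["text"])
```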
Z-Image Turbo is built for ultra-fast image generation, ideal for low-latency workflows and real-time creative tasks. It enables high-quality output with extremely few sampling steps.
Have questions? Want to partner with us? Looking for larger deployments or custom fine-tuning? Let's collaborate on the right setup for your workloads.
"Qubrid AI reduced our document processing time by over 60% and significantly improved retrieval accuracy across our RAG workflows."
Enterprise AI Team
Document Intelligence Platform