Integrate NVIDIA NIM (NVIDIA Inference Microservices) with OnPremiseAgent for optimized GPU-accelerated model serving. NIM provides pre-optimized containers for popular models with TensorRT-LLM acceleration, delivering maximum performance on NVIDIA hardware.
Authentication: API Key
Category: AI Runtimes
Requirements: NVIDIA A100, NVIDIA H100, or NVIDIA L40S GPUs; CUDA 12.x
Status: Coming Soon
Everything you need to integrate NVIDIA NIM into your on-premise agent workflows.
Hardware-optimized inference with TensorRT-LLM for maximum throughput on NVIDIA GPUs.
Deploy pre-optimized NIM containers for Llama, Mistral, and other popular models.
Automatic tensor parallelism across multiple GPUs for serving large models.
Industry-standard OpenAI-compatible API for seamless integration.
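As a concrete illustration of that OpenAI-compatible surface, here is a minimal sketch that sends a chat completion request to a locally running NIM endpoint using the standard `openai` Python client. The base URL, port, model name, and placeholder API key are assumptions for a typical local deployment; adjust them to match yours.

```python
from openai import OpenAI

# Point the standard OpenAI client at a local NIM endpoint.
# Base URL, port, and model name are assumptions for illustration.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-used",  # placeholder; a local NIM endpoint typically does not validate it
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # whichever model your NIM container serves
    messages=[{"role": "user", "content": "Summarize TensorRT-LLM in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the interface is the standard OpenAI API, existing agent code that targets OpenAI usually needs only the `base_url` and `model` changed.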
Pull the NVIDIA NIM container for your chosen model from the NVIDIA NGC catalog, as shown in the sketch after this list.
Deploy models with TensorRT-LLM optimization for the lowest possible latency on NVIDIA hardware.
Serve 70B+ parameter models across multiple GPUs with automatic tensor parallelism.
Standardize on NVIDIA NIM for all AI inference with enterprise support and SLA guarantees.
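The following sketch covers the pull-and-launch steps above, assuming Docker with the NVIDIA Container Toolkit installed, an `NGC_API_KEY` in the environment, and a prior `docker login nvcr.io` with your NGC credentials. The image name, tag, and cache path are example values; substitute the container for your chosen model. Exposing all GPUs lets NIM detect the hardware and select a matching optimized profile, which for large models includes tensor-parallel configurations.

```python
import os
import subprocess

# Example NIM image from the NGC catalog -- a placeholder; substitute the
# container and version tag for your chosen model.
IMAGE = "nvcr.io/nim/meta/llama3-8b-instruct:latest"
CACHE_DIR = os.path.expanduser("~/.cache/nim")  # persists downloaded engines across runs

subprocess.run(["docker", "pull", IMAGE], check=True)

# Expose all GPUs so NIM can pick a TensorRT-LLM profile that matches the
# hardware (for 70B+ models this includes tensor-parallel profiles).
subprocess.run(
    [
        "docker", "run", "--rm", "--gpus", "all",
        "-e", f"NGC_API_KEY={os.environ['NGC_API_KEY']}",
        "-v", f"{CACHE_DIR}:/opt/nim/.cache",
        "-p", "8000:8000",
        IMAGE,
    ],
    check=True,
)
```

Once the container reports ready, the OpenAI-compatible example above works against http://localhost:8000/v1 unchanged.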
NIM containers are available through NVIDIA NGC. Some models require an NVIDIA AI Enterprise subscription for production use.
NIM supports Llama 3, Mistral, Mixtral, and many other popular models with pre-optimized TensorRT-LLM configurations.
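Since the served model name depends on which NIM container you deployed, one way to avoid hardcoding it is to query the endpoint's OpenAI-compatible model listing. The base URL below is the same local-deployment assumption as before.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

# List the model(s) exposed by this NIM endpoint and use the first id
# for subsequent chat completion requests.
model_id = client.models.list().data[0].id
print(f"Serving: {model_id}")
```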
Combine NVIDIA NIM with these connectors for a complete integration stack.
Deploy on your own infrastructure with full data sovereignty. Get started in minutes.