Integrate NVIDIA NIM (NVIDIA Inference Microservices) with OnPremiseAgent for optimized GPU-accelerated model serving. NIM provides pre-optimized containers for popular models with TensorRT-LLM acceleration, delivering maximum performance on NVIDIA hardware.
Authentication: API Key
Category: AI Runtimes
Requirements: NVIDIA A100, NVIDIA H100, or NVIDIA L40S GPUs; CUDA 12.x
Status: Coming Soon
Everything you need to integrate NVIDIA NIM into your on-premise agent workflows.
Hardware-optimized inference with TensorRT-LLM for maximum throughput on NVIDIA GPUs.
Deploy pre-optimized NIM containers for Llama, Mistral, and other popular models.
Automatic tensor parallelism across multiple GPUs for serving large models.
Industry-standard OpenAI-compatible API for seamless integration.
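As a concrete illustration of that OpenAI-compatible surface, here is a minimal sketch that sends a chat completion request to a locally running NIM endpoint using the standard `openai` Python client. The base URL, port, model name, and placeholder API key are assumptions for a typical local deployment; adjust them to match yours.

```python
from openai import OpenAI

# Point the standard OpenAI client at a local NIM endpoint.
# Base URL, port, and model name are assumptions for illustration.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-used",  # placeholder; a local NIM endpoint typically does not validate it
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # whichever model your NIM container serves
    messages=[{"role": "user", "content": "Summarize TensorRT-LLM in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the interface is the standard OpenAI API, existing agent code that targets OpenAI usually needs only the `base_url` and `model` changed.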
Pull the NVIDIA NIM container for your chosen model from the NVIDIA NGC catalog, as shown in the sketch after this list.
Deploy models with TensorRT-LLM optimization for the lowest possible latency on NVIDIA hardware.
Serve 70B+ parameter models across multiple GPUs with automatic tensor parallelism.
Standardize on NVIDIA NIM for all AI inference with enterprise support and SLA guarantees.
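The following sketch covers the pull-and-launch steps above, assuming Docker with the NVIDIA Container Toolkit installed, an `NGC_API_KEY` in the environment, and a prior `docker login nvcr.io` with your NGC credentials. The image name, tag, and cache path are example values; substitute the container for your chosen model. Exposing all GPUs lets NIM detect the hardware and select a matching optimized profile, which for large models includes tensor-parallel configurations.

```python
import os
import subprocess

# Example NIM image from the NGC catalog -- a placeholder; substitute the
# container and version tag for your chosen model.
IMAGE = "nvcr.io/nim/meta/llama3-8b-instruct:latest"
CACHE_DIR = os.path.expanduser("~/.cache/nim")  # persists downloaded engines across runs

subprocess.run(["docker", "pull", IMAGE], check=True)

# Expose all GPUs so NIM can pick a TensorRT-LLM profile that matches the
# hardware (for 70B+ models this includes tensor-parallel profiles).
subprocess.run(
    [
        "docker", "run", "--rm", "--gpus", "all",
        "-e", f"NGC_API_KEY={os.environ['NGC_API_KEY']}",
        "-v", f"{CACHE_DIR}:/opt/nim/.cache",
        "-p", "8000:8000",
        IMAGE,
    ],
    check=True,
)
```

Once the container reports ready, the OpenAI-compatible example above works against http://localhost:8000/v1 unchanged.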
NIM containers are available through NVIDIA NGC. Some models require an NVIDIA AI Enterprise subscription for production use.
NIM supports Llama 3, Mistral, Mixtral, and many other popular models with pre-optimized TensorRT-LLM configurations.
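Since the served model name depends on which NIM container you deployed, one way to avoid hardcoding it is to query the endpoint's OpenAI-compatible model listing. The base URL below is the same local-deployment assumption as before.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

# List the model(s) exposed by this NIM endpoint and use the first id
# for subsequent chat completion requests.
model_id = client.models.list().data[0].id
print(f"Serving: {model_id}")
```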
Combine NVIDIA NIM with these connectors for a complete integration stack.
Deploy on your own infrastructure with full data sovereignty. Get started in minutes.