NVIDIA NIM: cloud-native tools for inferencing microservices

Amongst the many announcements made at NVIDIA’s recent GTC developers’ event in San Jose was the introduction of NVIDIA NIM, a set of inference microservices that the firm believes will enhance the development of cloud-native applications that take advantage of GPU-accelerated Generative AI (GenAI) capabilities. Built on the NVIDIA CUDA platform, the company describes NIM as a “containerized inference microservice”, designed to make the deployment, management and scaling of AI models easier and more efficient.

Background

Introduced in 2006, NVIDIA’s CUDA compute platform now boasts over 500 million CUDA-enabled GPUs, according to the firm. As the platform has matured, NVIDIA has added a sizeable catalogue of general-purpose compute capabilities, parallel computing extensions, accelerated libraries that can be “dropped in” to applications, and cloud-based compute appliances. NVIDIA considers CUDA to be a GPU ecosystem in its own right, and arguably it is: applications, libraries and tools are complemented by a slew of partner services and collaborations that generate additional value.

Getting the most out of advanced GPU architectures is not easy, and when combined with workloads as complex and compute-intensive as AI/ML, architectural tools and building blocks are a welcome addition for any developer. For GenAI in particular, the deployment of trained models and the consequent challenge of inferencing - the process of generating predictions from supplied data - is the real-time test that users of GenAI services first encounter. Inferencing has to be reliable, accurate, scalable and robust, with low latency and efficient utilization of compute resources.

While experienced development teams might not be too fazed by the requirement to create these inference capabilities themselves, they are always on the lookout for tools that will save time and increase efficiency - especially on large projects. For smaller teams, anything that makes it easier to deploy AI models at scale with the resources they have could make a significant difference to their ability to take advantage of the power of AI GPUs at all.

NVIDIA NIM - what does it bring to the table?

A containerized inference microservice can be described as a self-contained piece of software that packages a trained AI/ML model and its dependencies into a single deployable service for performing predictions based on input data. Packaged in this way, and deployed as a service, the software container is designed to provide consistency and portability across environments, simplifying the deployment and management of AI/ML models at scale. The Docker container platform and the open-source orchestration system Kubernetes are commonly used to build, deploy and manage these services.
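To make the idea concrete, the sketch below shows the kind of service such a container might package: a small HTTP endpoint that loads a trained model and returns predictions. It is illustrative only - the web framework (Flask), model file, route and port are all assumptions, not details of any NVIDIA product.

```python
# Illustrative sketch of a containerized inference microservice endpoint.
# Assumptions (not NVIDIA-specific): Flask as the web framework, a model
# serialized with joblib, and port 8000 exposed by the container.
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)
model = joblib.load("model.joblib")  # trained model bundled into the container image

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"features": [1.0, 2.0, 3.0]}
    features = request.get_json()["features"]
    prediction = model.predict([features]).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    # Inside a container this would usually run behind a production WSGI
    # server; Docker or Kubernetes then maps, replicates and scales the port.
    app.run(host="0.0.0.0", port=8000)
```

Because the model, its runtime and its dependencies travel together inside the image, the same service can be run on a laptop, in a datacentre or in the cloud without modification.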

While containers and microservices are not new technologies, and inferencing itself is a familiar concept well beyond AI/GenAI, packaging inference as containerized microservices tuned to specific GPU capabilities is less common. NVIDIA NIM is the company’s attempt to streamline the development and deployment of GenAI applications across its customers’ (CUDA-enabled) datacentres, cloud infrastructure and other GPU-enabled environments via its own optimized, cloud-native microservices. Its containers are pre-built using Triton Inference Server and TensorRT-LLM software with the objective of speeding up data processing, LLM customization, inference and retrieval-augmented generation. In addition, Docker has announced a collaboration with NVIDIA aimed at providing access to the capabilities of NIM’s new inference microservices.
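To illustrate the deployment model described above, here is a minimal sketch of an application calling a locally deployed inference microservice over HTTP. The host, port, route, model name and response shape are assumptions for illustration only; the actual API exposed by a given NIM container is defined in NVIDIA’s documentation for that container.

```python
# Hypothetical client for a locally deployed inference microservice.
# The endpoint URL, model identifier and JSON layout below are assumptions,
# not a documented NIM interface.
import requests

payload = {
    "model": "example-llm",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarise what an inference microservice does."}
    ],
    "max_tokens": 128,
}

response = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed local deployment
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

The point of the pattern is that the application only needs to speak HTTP to the container; the GPU-optimized serving stack inside it is NVIDIA’s concern, not the developer’s.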

The Quick Tech Take

NVIDIA’s latest hardware announcements will provide powerful infrastructure components for the burgeoning demand for AI/ML model deployments, but that capability will not be fully realised without software that makes it straightforward to roll out inferencing services which exploit the power of its technology. By releasing NIM, NVIDIA provides its enterprise customers with cloud-native, packaged model deployments that are optimized for the NVIDIA ecosystem. Microservices that are ‘ready to go’ will save development time and reduce the heavy lifting when it comes to scaling up. This will also appeal to managed service providers (MSPs) and others with NVIDIA infrastructure who serve the mid-market and SMB sectors, making it possible to develop and deploy AI services in this space without a large development team.
