NVIDIA Triton LLM Inference Server for Llama3.2 models

About This Workshop

Leverage the power of NVIDIA GPUs and advanced inference frameworks, such as NVIDIA Triton Inference Server and TensorRT, to gain hands-on experience deploying scalable Llama 3.2 models for large language model inference. This workshop offers a comprehensive introduction to deploying and optimizing AI models with NVIDIA Triton, focusing on key tools and techniques that improve inference throughput and reduce latency. Attendees will work through real-world scenarios on an A10 shape, either standalone or within OKE (Oracle Kubernetes Engine), learning to streamline model management and fully utilize GPU resources for efficient AI deployments.
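Once deployed, Triton serves models over an HTTP endpoint. The sketch below builds a request payload for Triton's `generate` endpoint as exposed by the TensorRT-LLM backend; the host placeholder, the model name `ensemble`, and the parameter values are assumptions to adapt to your own deployment.

```python
import json

# Hypothetical Triton host; replace with your instance's address.
TRITON_URL = "http://<triton-host>:8000"

# The TensorRT-LLM backend commonly exposes a generate endpoint per model;
# "ensemble" is an assumed model name from a typical TensorRT-LLM setup.
endpoint = f"{TRITON_URL}/v2/models/ensemble/generate"

# Request body for a single text-generation call against Llama 3.2.
payload = {
    "text_input": "What is NVIDIA Triton?",  # the prompt
    "max_tokens": 128,                       # cap on generated tokens
    "temperature": 0.7,                      # sampling temperature
}

body = json.dumps(payload)
print(endpoint)
print(body)
```

Sending `body` to `endpoint` with any HTTP client (for example `curl -X POST -d` or Python's `urllib.request`) returns the model's completion as JSON.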

Workshop Info

1 hour
  • Lab 1 - Provision the resources for an A10 instance
  • Lab 2 - Provision the resources for OKE

Prerequisites
  • Administrative access to an OCI tenancy
  • Ability to spin up A10 instances in OCI
  • Ability to create resources with public IP addresses (load balancer, instances, OCI API endpoint)

 
