Tigera: Building Resilient AI Services Using Multi-Cluster Kubernetes
Designing AI platforms on Kubernetes often means balancing performance, isolation, and operational complexity. A multi-cluster architecture can help—but only if it’s designed with clear separation of concerns and consistent guardrails.
Join us as we walk through a practical reference pattern for running AI workloads across multiple Kubernetes clusters using Mirantis k0rdent AI. We’ll show how separating training, inference, and shared services into dedicated clusters helps isolate GPU resources, reduce blast radius, and scale each layer independently. We’ll also demonstrate how k0rdent AI provides a consistent control plane for provisioning clusters, applying governance, and managing lifecycle operations across environments.
We conclude by showing how Calico ClusterMesh and federated services enable secure, private east-west connectivity and consistent service discovery across clusters managed by k0rdent. Using Calico’s zero-trust micro-segmentation and observability, teams can tightly control cross-cluster traffic and confidently validate and troubleshoot network flows.
What you will learn:
- A Multi-Cluster Reference Architecture for AI: How to separate training, inference, and shared services to improve isolation, resilience, and scalability.
- Operating at Scale with Mirantis k0rdent AI: How reusable templates and a centralized control plane standardize cluster builds and day-2 operations.
- Governance and Guardrails: Enforcing multi-tenancy, resource quotas, and compliance across clusters.
- Secure Cross-Cluster Connectivity: How Calico ClusterMesh and federated services enable private, east-west communication and service discovery.
- Zero-Trust Enforcement and Observability: Using Calico network policy and flow visibility to control, validate, and troubleshoot ingress, egress, and east-west traffic across clusters.
Why Attend?
Whether you are just beginning your AI journey or scaling existing GPU-intensive workloads, you’ll walk away with a clear blueprint for an AI-ready platform that is secure by design, compliant, and operationally efficient.