
Google Cloud and NVIDIA are partnering to bring agentic AI capabilities on-premises, enabling enterprises with strict security requirements to leverage Google's Gemini models while maintaining data sovereignty and security compliance.
Key Points:
- Gemini models will run on-prem using Google Distributed Cloud and NVIDIA Blackwell hardware
- Confidential computing ensures sensitive enterprise data and model prompts stay protected
- GKE Inference Gateway and Cluster Director support large-scale agentic AI, improving AI performance and reducing serving costs
For enterprises operating under strict regulatory or data sovereignty requirements — think healthcare, finance, and government — this announcement is a big deal. Until now, these organizations couldn’t tap into the full capabilities of agentic AI from the leading labs due to cloud-only deployment models and security concerns. This new setup changes the equation. With NVIDIA Blackwell HGX and DGX systems running Gemini behind their own firewalls, companies can now build AI agents that not only understand but reason, act, and adapt — all without compromising on privacy or compliance.
NVIDIA’s confidential computing stack plays a key role here. It locks down both the prompts sent to the Gemini models and the sensitive data used in fine-tuning, shielding them from prying eyes — even from the cloud provider itself. As Sachin Gupta, VP of infrastructure at Google Cloud, put it: this is about “innovating securely without compromising on performance or operational ease.”
This isn’t just about securing inference, though. It’s also about infrastructure readiness. Google Cloud rolled out several enhancements to Google Kubernetes Engine (GKE) designed specifically to support large-scale AI workloads. The new Cluster Director (formerly Hypercompute Cluster) lets massive GPU and TPU clusters operate as single compute units, while the GKE Inference Gateway provides intelligent routing and load balancing to reduce latency and serving cost. The gateway also integrates with NVIDIA Triton Inference Server and NeMo Guardrails to enforce security and governance around model usage.
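The core idea behind inference-aware load balancing is to route each request to the replica that can serve it soonest, rather than blindly round-robining. A minimal sketch of that idea in Python (all names here — `InferenceRouter`, the replica labels — are illustrative, not the gateway's actual API):

```python
class InferenceRouter:
    """Least-loaded routing sketch: send each request to the replica
    with the fewest in-flight requests (illustrative only)."""

    def __init__(self, replicas):
        # Track in-flight request counts per replica endpoint.
        self.in_flight = {replica: 0 for replica in replicas}

    def acquire(self):
        # Pick the replica with the fewest outstanding requests.
        replica = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[replica] += 1
        return replica

    def release(self, replica):
        # Mark a request on that replica as finished.
        self.in_flight[replica] -= 1


router = InferenceRouter(["gpu-0", "gpu-1", "gpu-2"])
first = router.acquire()   # all idle, so the first replica wins
second = router.acquire()  # "gpu-0" is now busy, so route elsewhere
router.release(first)
```

Real inference gateways weigh richer signals than a request counter — KV-cache utilization, queue depth, model variant — but the routing decision has this same shape.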
Agentic AI systems — which differ from traditional models by being able to make autonomous decisions and carry out multi-step tasks — are fueling next-generation applications across sectors. Think financial tools that proactively prevent fraud, or customer support agents that escalate and resolve issues without human intervention.
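That plan-act-observe loop can be sketched in a few lines of Python. Everything below is a toy illustration, not any vendor's API: `llm_plan` stands in for a call to a reasoning model, and `tools` maps action names to callables.

```python
def run_agent(goal, tools, llm_plan, max_steps=5):
    """Minimal agentic loop: the model proposes an action, the agent
    executes it via a tool, observes the result, and repeats until
    the model decides it is finished."""
    history = []
    for _ in range(max_steps):
        # Ask the (stand-in) model for the next action given progress so far.
        action, arg = llm_plan(goal, history)
        if action == "finish":
            return arg
        # Execute the chosen tool and feed the observation back.
        observation = tools[action](arg)
        history.append((action, arg, observation))
    return None  # step budget exhausted


def toy_planner(goal, history):
    # Stub planner: look the goal up once, then finish with the result.
    if not history:
        return "lookup", goal
    return "finish", history[-1][2]


tools = {"lookup": lambda query: f"result for {query!r}"}
result = run_agent("flag suspicious transaction 42", tools, toy_planner)
```

The autonomy lives in `llm_plan`: a real reasoning model chooses among many tools over many steps, which is what distinguishes these systems from single-shot inference.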
Google is also investing in observability and scaling tools for agentic AI. The company announced plans to integrate NVIDIA Dynamo, an open-source library for scaling reasoning models, and revealed that RayTurbo — an optimized version of the Ray framework co-developed with Anyscale — will come to GKE later this year for faster, more efficient AI pipelines.
For enterprises looking to deploy agentic AI applications while maintaining strict control over sensitive data, this collaboration means they may no longer need to choose between advanced AI capabilities and regulatory compliance. They can now tap into Google’s leading multimodal Gemini models and NVIDIA’s performance-optimized AI chips — while keeping everything local, secure, and compliant.