Google Gemini Managed Agents: Your Execution Loop Now Lives in Google's Sandbox

Google's Managed Agents put the stateful execution loop in Google's hands — your agent runs in an isolated Linux sandbox, your tools stay where they are. Here's the architecture, the tradeoffs, and when it's actually worth switching.

Google Gemini Managed Agents Hit Public Preview — And the Hosting Model Is Worth Thinking Through

On June 30, Google pushed Managed Agents into public preview on the Gemini API. The short version: instead of hosting your agent's execution loop yourself, you push your tool definitions to Google and they run the stateful loop in an isolated Linux sandbox on their infrastructure.

That's a meaningful architectural shift, not a minor API addition. Here's what it actually changes.

The Loop Has Always Been Your Problem

If you've built an agentic pipeline before, you know the shape of it:

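def run_agent(gemini, tools, history, run_tool):
    # Schematic self-hosted loop: keep calling the model until it stops asking for tools.
    while True:
        response = gemini.generate(history + tools)
        if response.tool_call:
            # Run the requested tool and feed its result back into the history.
            result = run_tool(response.tool_call)
            history.append(result)
        else:
            return response.text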

Your server holds that loop. Your server holds history. Your server manages state between turns, handles retries when tools fail, and keeps track of what the agent knows. If your server dies mid-session, the agent's context dies with it.

This is fine for short agent runs. For anything that spans minutes or longer, you're building session persistence, checkpointing, and resumability yourself. That's real infrastructure work that most teams underestimate until they're debugging a half-completed multi-step workflow at 2 a.m.
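To make that infrastructure work concrete, here is a minimal sketch of the checkpointing a self-hosted loop needs. It assumes the history is a JSON-serializable list and uses a local file as a stand-in for durable storage. The names `save_checkpoint` and `load_checkpoint` are hypothetical, not part of any SDK.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, history: list) -> None:
    """Atomically persist the conversation history after each turn."""
    # Write to a temp file in the same directory, then rename it into place,
    # so a crash mid-write never leaves a truncated checkpoint behind.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(history, tmp)
    os.replace(tmp_name, path)


def load_checkpoint(path: Path) -> list:
    """Return the saved history, or an empty list for a new session."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

A self-hosted loop would call `load_checkpoint` before the first model call and `save_checkpoint` after every tool result. A real deployment would also need locking and a store that survives the host itself.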

What Managed Agents Change

With Managed Agents, Google takes ownership of that loop. You define your agent (system prompt, model, tools it can call), expose your tools as HTTPS endpoints, create a session, and send user messages. Google drives the execution.

import google.generativeai as genai

agent = genai.ManagedAgent(
    model="gemini-2.5-pro",
    system_instruction="You are a data analysis assistant.",
    tools=[
        genai.Tool(
            name="query_database",
            description="Query the analytics database",
            endpoint="https://your-api.example.com/tools/query"
        )
    ]
)

session = agent.create_session()
response = session.send_message("Summarize last week's signups by region")
print(response.text)

Google spawns an isolated environment per session, drives the tool-call loop, calls your endpoints as needed, and maintains state between send_message calls. Sessions persist until you terminate them or they hit the configured idle timeout.

The Architecture

The sandbox is a real Linux container. Your tool endpoints are regular HTTPS webhooks: any language, any framework, hosted wherever you want. The agent can chain tool calls and reason between them within the session boundary. Your backend logic stays where it is; the only new work is exposing it as callable HTTPS endpoints.
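A tool endpoint of this kind can be sketched with nothing but the Python standard library. The request and response shapes here (`{"args": {...}}` in, `{"result": ...}` out) are assumptions for illustration, as are the `query_database` body and the `serve` helper. The preview documentation defines the real payload format.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def query_database(args: dict) -> dict:
    """Hypothetical tool logic; swap in your real query code."""
    region = args.get("region", "all")
    return {"region": region, "signups": 0}  # placeholder result


def handle_tool_call(body: bytes) -> tuple[int, dict]:
    """Decode an assumed {"args": {...}} payload and run the tool."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, {"error": "invalid JSON"}
    args = payload.get("args", {}) if isinstance(payload, dict) else None
    if not isinstance(args, dict):
        return 400, {"error": "expected an object with an 'args' object"}
    return 200, {"result": query_database(args)}


class ToolHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/tools/query":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_tool_call(self.rfile.read(length))
        data = json.dumps(response).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    # Plain HTTP for local testing; terminate TLS in front of it in production.
    HTTPServer(("127.0.0.1", port), ToolHandler).serve_forever()
```

Keeping `handle_tool_call` separate from the HTTP handler means the same tool logic can be unit-tested, or reused behind a different framework, without a running server. Calling `serve()` starts the local listener.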

What You Don't Get to Touch

The sandbox is isolated by design. There's no direct database connection from inside the sandbox — every external interaction goes through your registered tool endpoints. Files the agent creates during execution don't persist between sessions by default. State that needs to survive is your job to push somewhere durable via a tool call.
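As a rough sketch of what "push somewhere durable via a tool call" could look like on your side, the handler below writes agent state to SQLite. The table layout and the function names are hypothetical. How a session identifier reaches your endpoint is also an assumption, since that depends on the payload Google sends.

```python
import sqlite3


def open_store(db_path: str = "agent_state.db") -> sqlite3.Connection:
    """Open the store and create the (hypothetical) state table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS agent_state ("
        "session_id TEXT, key TEXT, value TEXT, "
        "PRIMARY KEY (session_id, key))"
    )
    return conn


def save_state(conn: sqlite3.Connection, session_id: str, key: str, value: str) -> dict:
    """Tool handler body: upsert one value so it outlives the sandbox."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO agent_state VALUES (?, ?, ?)",
            (session_id, key, value),
        )
    return {"saved": key}


def load_state(conn: sqlite3.Connection, session_id: str, key: str):
    """Return the stored value, or None if nothing was saved under that key."""
    row = conn.execute(
        "SELECT value FROM agent_state WHERE session_id = ? AND key = ?",
        (session_id, key),
    ).fetchone()
    return row[0] if row else None
```

Registered as a tool, `save_state` gives the agent an explicit way to persist a result, and a matching `load_state` tool lets a later session pick it up.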

This is actually a good constraint. It forces a clean separation: the agent handles reasoning and orchestration, your endpoints handle data and side effects. You don't end up with agents that have ambient database access and undefined blast radii. Every external interaction is an explicit, logged tool call.

Session Lifecycle

Sessions are cheap to create and persist across multiple messages. The agent accumulates context within a session. Long-running agents can stay alive between user interactions without you polling or keeping a connection open on your side — you just send the next message when you have one.

The Tradeoffs

**You get:** session persistence without infrastructure, isolated execution, built-in retry handling on tool calls, and a Google-managed environment that scales to zero when nothing is running.

**You give up:** control over the execution environment, the ability to introspect or interrupt mid-execution, and portability. Your agent's stateful loop is now tightly coupled to Google's infrastructure. Migrating it to another provider means rebuilding the orchestration yourself.

**Latency:** Each tool call crosses the public internet to your endpoint and back, and chained calls pay that round trip one after another. An agent that makes 10+ tool calls per turn pays the network cost 10+ times. Hosting tools in Google Cloud (Cloud Run, Cloud Functions) should shorten each round trip, so keep that in mind if you're designing for throughput.

**Cost model:** You pay for model tokens (same as always) plus a per-session runtime fee. Short-lived agents with focused tool use will often be cheaper than equivalent self-hosted setups once you factor in the EC2/GCE instance running 24/7. Long-lived idle sessions — think a customer support agent that stays alive across a multi-hour conversation thread — accumulate runtime cost even when doing nothing, so design your idle timeouts accordingly.
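The latency and cost points above can be put into rough numbers. This is a back-of-the-envelope sketch, assuming tool calls within a turn run sequentially and that the runtime fee accrues evenly over wall-clock time. Both function names and all input values are illustrative, not measurements or published prices.

```python
def tool_latency_overhead(calls_per_turn: int, round_trip_ms: float) -> float:
    """Network time added per turn, in seconds, for sequential tool calls."""
    return calls_per_turn * round_trip_ms / 1000.0


def idle_runtime_share(active_minutes: float, idle_minutes: float) -> float:
    """Fraction of a session's runtime (and so of its runtime fee) spent idle."""
    total = active_minutes + idle_minutes
    return idle_minutes / total if total else 0.0


# Illustrative inputs only.
print(tool_latency_overhead(12, 150))          # 12 calls at 150 ms each -> 1.8
print(tool_latency_overhead(12, 20))           # same calls at 20 ms each -> 0.24
print(round(idle_runtime_share(10, 170), 2))   # 10 min active in a 3 h session -> 0.94
```

Under these assumptions, round-trip time is the factor you control for latency, and the idle timeout is the factor you control for runtime cost.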

Compared to the Alternatives

The closest comparison right now is the Cloudflare Agents SDK, which gives you a durable Agent class running on Workers with built-in state via Durable Objects. The execution model is similar: the platform owns persistence. The difference is control. Cloudflare's model has you deploy and own a Worker — you write the loop, you control the execution logic, you can add middleware, logging, and custom retry strategies. Gemini Managed Agents have you deploy tool endpoints and let Google drive the loop.

For self-hosted approaches (LangGraph, a homegrown loop on your own infra), the trade is the same: more control, more ops work.

If you're already using Gemini and want zero orchestration infrastructure, Managed Agents is the most direct path. If you want more control over execution logic, need to support multiple model providers, or have strict data residency requirements that make Google-hosted execution a non-starter, a framework-based or self-hosted approach gives you more room.

Is This Worth Switching For?

For new agent projects already on Gemini: probably yes, at least to prototype. The reduction in orchestration boilerplate is real — you can have a working stateful agent pipeline in about 30 lines of Python without touching session storage, retry logic, or connection management.

For existing agents in production: the migration cost (restructuring your loop, refactoring your tool logic into HTTP endpoints, adjusting your state management assumptions) is probably not worth it unless you're hitting real pain points around persistence or operational overhead. Working software with boring infrastructure beats a cleaner architecture that requires a two-week migration.

The pattern Google is betting on, where you register tools and the platform drives the loop, is the right long-term direction for production agents. The question is whether you trust Google to be the infrastructure provider for it. That answer depends on your existing stack, your compliance requirements, and how much vendor lock-in matters to you.

The public preview is open now. Worth trying against something real before committing.