“Trust But Verify”: AWS’s Vision for Autonomous AI Agents
AWS calls them "frontier agents." They run for hours, even days, making decisions across your codebase while you sleep. The question developers want answered: how do you trust AI that works unsupervised? AWS's agentic AI lead explains the framework.
At AWS re:Invent 2025, Amazon Web Services unveiled what it calls “frontier agents”, a new class of autonomous AI systems designed to work for hours or days without human intervention. Adnan Ijaz, who leads product and design for agentic AI at AWS, spoke with implicator.ai about how teams should think about trusting agents that touch multiple codebases, how Kiro’s automatic model selection works under the hood, and why AWS believes we are moving from incremental efficiency gains to order-of-magnitude improvements in software development.
You described a scenario where a frontier agent fixes an issue across ten repositories at once, learning from one and applying it to all. What is the realistic learning curve for teams to trust an agent working across that many codebases?
Adnan Ijaz: The way we think about this, and Matt Garman talked about it in his keynote as well, is “trust but verify”. As we build these agentic interactions, we want frontier agents to be autonomous. They can run for hours or days, for as long as the problem requires. But the human remains in control.
Take the Kiro Autonomous Agent. It can make changes across multiple repositories, but it will not check in the code until you approve it. The human is always empowered to override what the agent did or take a different path. With frontier agents, they are working in the background and doing work on their own, but that does not mean humans cannot see, inspect, or redirect what the agent did.
Key Takeaways
• AWS launched "frontier agents," a new class of autonomous AI that runs for hours or days, breaking down complex tasks without human intervention
• Kiro's auto model selection uses intent detection to route tasks to optimal models, balancing latency, accuracy, and cost automatically
• DevOps Agent found root cause in under 15 minutes at Commonwealth Bank, a task that previously took engineers hours
• The "trust but verify" framework: agents work autonomously but never commit code or take final action without human approval
The analogy we use is a new teammate. On day one, they do not know all the answers. You hired somebody really smart and capable, but they still need to learn how your team works. Thirty or sixty days later, they are more integrated. They know your practices and how to think about problems. Kiro and other frontier agents are similar. As you work with them, they accumulate knowledge. That is where the learning and trust come from.
Just like a senior engineer might review a new hire’s output, you can think of a developer asking the agent to solve a problem, the agent spending five hours on it, and then the developer reviewing the result. If they do not like the outcome, they leave a comment. The agent learns from it and applies that feedback next time. You still inspected the work, sent the agent off to do other things, and came back.
So it is a layer on top of frontier models. And if I understood correctly, it does not run out of memory in the way Claude Code sometimes does.
Adnan Ijaz: Right. All these frontier agents use a frontier model underneath, because there is an LLM at the core. Think of them as having a brain. Internally, we use that parallel. You give them a very complex problem, and they are able to break it down.
Today, if you are using Kiro or competing products, the human is usually the one breaking down the task. You are working with agents, but you are giving them a very narrow task.
With frontier agents, you give the agent a higher-level task. It has a frontier model on the backend and the ability to take the task, break it down, ask clarifying questions if needed, then just go off and do the work. It comes back either with a clarifying question or with the result. Maybe in the case of Kiro it is a pull request. For the DevOps Agent it is incident triage. For the Security Agent it might be the result of penetration testing. You look at the outcome and, if you like it, you proceed to the next step in your workflow.
That is how humans build trust in technology. As the technology gets better and learns from interactions, humans tend to trust it more. But “trust but verify” is the model I think a lot of organizations will apply.
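The "trust but verify" loop Ijaz describes can be sketched as a simple approval gate: the agent proposes work, a human reviews it and leaves feedback, and nothing lands without explicit sign-off. Every name and data shape below is hypothetical, a minimal illustration rather than Kiro's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class ProposedChange:
    """A unit of agent work awaiting human review (hypothetical shape)."""
    repo: str
    diff: str
    approved: bool = False
    comments: list = field(default_factory=list)

def review(change: ProposedChange, approve: bool, comment: str = "") -> None:
    """Human checkpoint: record feedback; only approval unlocks a commit."""
    change.approved = approve
    if comment:
        change.comments.append(comment)  # feedback the agent can learn from

def commit(change: ProposedChange) -> str:
    """The agent never lands code on its own; unapproved changes are rejected."""
    if not change.approved:
        raise PermissionError("human approval required before commit")
    return f"committed to {change.repo}"

change = ProposedChange(repo="payments-service", diff="- old\n+ new")
review(change, approve=True, comment="looks good, add a test next time")
print(commit(change))  # committed to payments-service
```

The key design choice is that `commit` is the only path to landing code and it hard-fails without approval, which is how the review stays a guarantee rather than a convention.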
So it is a more structured development flow than vibe coding, for instance.
Adnan Ijaz: Yes. Vibe coding is great. That is how I have gotten my kids excited about building video games, because they go to Kiro, enter a prompt, and have it build a soccer game. Vibe coding is great for prototyping.
When you are writing production-ready code, you need to be able to go back and see what you did a month ago for a particular feature. Kiro’s spec-driven development gives you that structure. It keeps the fun of working with agents and generative AI. It does not make it boring, but it brings the structure needed to ship production-ready code.
How does Kiro actually make model-selection decisions? Is it task classification, real-time benchmarking, or something else entirely?
Adnan Ijaz: You are asking about how Kiro’s auto model selection picks the right model for the job. There is something we call internally, and it is a recognized industry term as well: intent detection. When you enter a prompt and say “explain this codebase to me”, Kiro’s auto agent decides which model to route the request to.
That comes from our own internal benchmarking, our evaluations, optimizations we have done to figure out the right model that optimizes latency, accuracy, and cost-effectiveness.
We have gotten really good feedback on our auto mode. Some people call it auto model. We call it auto agent, not to be confused with the autonomous agent we launched today. But to clarify, the auto agent still uses state-of-the-art frontier models underneath. It is not some unknown thing in the background. It is still using Claude Sonnet and other models heavily, based on the task you are asking it to solve.
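Intent-based routing of the kind Ijaz describes could look something like this toy sketch: classify the prompt into an intent, then map each intent to a model tier chosen for its latency/accuracy/cost trade-off. The keyword classifier, intent names, and model tiers are all illustrative assumptions, not Kiro's implementation, which would use a learned classifier and internal benchmarks.

```python
# Hypothetical intent-to-model routing table. Tier names are invented;
# the "why" field captures the latency/accuracy/cost trade-off per intent.
ROUTES = {
    "explain":  {"model": "fast-small", "why": "low latency for Q&A"},
    "refactor": {"model": "balanced",   "why": "accuracy at moderate cost"},
    "design":   {"model": "frontier",   "why": "deep reasoning, cost accepted"},
}

# Toy keyword lists standing in for a learned intent classifier.
KEYWORDS = {
    "explain":  ("explain", "what does", "describe"),
    "refactor": ("refactor", "fix", "rename"),
    "design":   ("architect", "design", "plan"),
}

def detect_intent(prompt: str) -> str:
    """Return the first intent whose keywords match; default to 'refactor'."""
    text = prompt.lower()
    for intent, words in KEYWORDS.items():
        if any(w in text for w in words):
            return intent
    return "refactor"

def route(prompt: str) -> str:
    """Map a prompt to a model tier via its detected intent."""
    return ROUTES[detect_intent(prompt)]["model"]

print(route("Explain this codebase to me"))  # fast-small
```

A production router would replace the keyword lookup with a classifier trained against evaluation data, but the routing-table structure, one tier per intent with an explicit trade-off rationale, stays the same.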
You said you do not want to put users through the hassle of figuring out which models fit which task. But some developers want that control. How do you balance abstracting away complexity versus giving power users the transparency they expect?
Adnan Ijaz: Our mental model is we want to give you flexibility and choice.
If you go to Kiro, whether it is the IDE environment or the CLI, you can pick a model explicitly or you can ask Kiro to pick the model through the auto option. If you do not want to use what we are doing in auto, and you want to pick your own model, maybe it is Haiku, Claude Sonnet, Opus, you can do that.
For the long-running autonomous agent, it is different. Because you are not in the loop and the agent might be working for hours doing several things, we optimize for choosing the right model for you. You are not going to be there.
Take the example Matt gave in his keynote. Maybe it is end of day, you ask the Kiro Autonomous Agent or any other frontier agent to solve a problem. It is running in the background while you are asleep. For autonomous agents, we optimize model selection for you. In the IDE and CLI, you can choose auto or pick an explicit model. That is how we balance control and flexibility with actually getting the job done.
What happens if there is a decision to make mid-task? If the model could go down two different routes, does Kiro pause and wait for your input?
Adnan Ijaz: For the IDE and CLI today, if the model has a question, it comes back and asks you. You are in the loop.
For autonomous agents, they ask a whole bunch of questions upfront. The only time they come back for clarification is if they really do not know which way to go and need more input. By and large, the model is autonomous. It is working on its own.
If a question does need to be asked, it notifies you. Maybe you are working on a GitHub issue. The model has gotten to a point where the work is progressing, but it is critical to have your input. It generates a notification. In GitHub, you will see that and can respond.
We have designed these frontier agents so they can go on for hours working on a problem without blocking on your input. You can always see what they are doing, and you can interject and steer them in a different direction. Autonomy does not mean you do not have control.
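The non-blocking clarification flow described above might be sketched like this: when a step needs critical input, the agent posts a notification and continues with independent work instead of stalling the whole run. The plan structure and notification queue are hypothetical stand-ins for something like a GitHub notification channel.

```python
import queue

notifications = queue.Queue()  # stands in for a GitHub notification channel

def run_step(step: dict) -> str:
    """Execute one sub-task; if critical input is needed, notify the human
    and defer this step rather than blocking the rest of the plan."""
    if step.get("needs_input"):
        notifications.put(f"question on {step['name']}: {step['question']}")
        return "deferred"
    return "done"

plan = [
    {"name": "write tests"},
    {"name": "migrate schema", "needs_input": True,
     "question": "drop legacy column?"},
    {"name": "update docs"},
]

results = [run_step(s) for s in plan]
print(results)              # ['done', 'deferred', 'done']
print(notifications.get())  # question on migrate schema: ...
```

The point of the sketch is the return value: a question produces a "deferred" step and a notification, while every independent step still completes, so the human steers without becoming a bottleneck.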
Does it always produce documentation as output?
Adnan Ijaz: It depends on which frontier agent you are using.
For the Kiro Autonomous Agent, the output is a pull request. The pull request contains your tests, the documentation, and the changes it made, depending on the problem you asked it to solve.
For the DevOps Agent, it is the incident triage, root-cause analysis, and related artifacts. For the Security Agent, it could be issues in your code that it identified or the outcome of penetration testing. The output depends on what the agent is doing and what task you asked it to complete.
AWS announced the DevOps Agent for outage recovery today. Commonwealth Bank reported it found root cause in under fifteen minutes instead of hours. What makes an AI agent better at hypothesis generation during incidents than a seasoned SRE?
Adnan Ijaz: The differentiation is the scale of signals the agent is working with in the background.
A human DevOps engineer probably has high-priority projects they are working on. There is a lot of data coming through, a lot of signals to analyze. Agents are really good at that. They can ingest large amounts of data, look at it, and correlate it. Humans can still do the analysis, but the speed and scale at which an agent operates is unmatched.
By looking at vast amounts of data, correlating them, and coming up with hypotheses, agents give humans a head start. It does not eliminate human judgment. As Matt said in his keynote, maybe there is an on-call issue. A human wakes up in the middle of the night. Instead of trying to figure out twenty different sources, they already have a few hypotheses and a likely root cause. They can quickly proceed, maybe even go back to sleep.
That is where agents are powerful. They analyze large amounts of data, work in parallel, and draw correlations from patterns.
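As a toy illustration of the correlation step, a hypothesis ranker can count how many independent signal types implicate each component, so the most-corroborated component surfaces first. The signal data and the simple vote-counting score are invented for illustration; a real DevOps agent would weigh far richer evidence.

```python
from collections import Counter

# Hypothetical incident signals as (signal_type, component) pairs. In
# practice these would come from logs, metrics, and deploy events at scale.
signals = [
    ("deploy", "payments-service"),
    ("error_spike", "payments-service"),
    ("latency", "payments-service"),
    ("latency", "checkout-ui"),
]

def rank_hypotheses(signals):
    """Count how many signals implicate each component; more corroborating
    signals rank a root-cause hypothesis higher."""
    votes = Counter(component for _, component in signals)
    return votes.most_common()

print(rank_hypotheses(signals))
# [('payments-service', 3), ('checkout-ui', 1)]
```

Even this crude scoring shows the head start Ijaz describes: the on-call engineer opens the incident with a ranked shortlist instead of twenty raw dashboards.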
You described frontier agents as autonomous, massively scalable, and long-running. Is this a new product category AWS is defining, or more of a capability within existing tools?
Adnan Ijaz: It is a new class of agents we are announcing.
These agents are more sophisticated in the sense that they have all the properties you mentioned. They are autonomous, long-running, massively scalable, and they learn from their interactions with humans.
The concept of agentic AI and agents already exists. Kiro uses it, and many other products do as well. But we realized those agents were deficient in those dimensions.
Frontier agents are a whole new class. We laid out the characteristics of what a frontier agent is, and we have launched three frontier agents that match that definition. So yes, it is a new class of agents that AWS is announcing and launching today.
Kiro uses models from Anthropic and potentially others. You also have your own Nova models. What is the philosophy behind which models to prefer?
Adnan Ijaz: Today Kiro offers the auto agent, which we provide, and Anthropic’s range of models: Haiku, Sonnet, and Opus. We are always looking to add more models from other vendors.
The auto agent I keep talking about is our way to balance latency, accuracy, and cost. We will bring in explicit models from Anthropic, our first-party models, and more third-party models. But we are really putting focus on auto mode. You pick auto, and you do not have to worry about it.
Otherwise, as you asked earlier, if you leave it entirely to humans, yes, they want control and flexibility, but it is also a lot of work to figure out whether to use Haiku or something else for every task.
With auto, that is what we are trying to solve. You will see continued expansion of model coverage in Kiro. For instance, Opus 4.5 from Anthropic was announced, and we had it in Kiro the same day. You will also see continued improvement in auto mode. A few weeks ago we launched optimizations that allowed us to pass significant cost savings to customers. For the same amount of credits, they could do more work. That was only possible because of the work we did in auto mode.
Adnan Ijaz is Director of Product and Design for Agentic AI at AWS, leading work on Amazon Q Developer, Kiro, and the company’s frontier agent initiatives. Previously he led product for EC2 Commercial Software Services and helped launch Amazon Linux 2, Bottlerocket, and AWS Systems Manager. Before joining AWS, he spent several years at Microsoft working on Azure HDInsight and Office server products.
❓ Frequently Asked Questions
Q: What's the difference between frontier agents and regular AI coding assistants?
A: Regular AI assistants like GitHub Copilot work in a prompt-response loop where you stay in the conversation. Frontier agents work autonomously in the background for hours or days, breaking down complex tasks themselves. They ask clarifying questions upfront, then run independently until they deliver a complete output like a pull request or incident report.
Q: What's the relationship between Kiro and Amazon Q Developer?
A: Both fall under AWS's agentic AI portfolio led by Ijaz. Amazon Q Developer is AWS's broader AI assistant for code generation and application modernization. Kiro is a newer "spec-driven development" tool with its own IDE and CLI, featuring auto model selection and now the Kiro Autonomous Agent for long-running tasks.
Q: Which AI models power Kiro's auto selection?
A: Kiro currently offers Anthropic's full Claude lineup: Haiku, Sonnet, and Opus, including Opus 4.5 added on launch day. AWS's own Nova models and additional third-party models are planned. The auto agent uses intent detection to route each task to the optimal model based on internal benchmarks for latency, accuracy, and cost.
Q: When will frontier agents be available and what do they cost?
A: The DevOps Agent opened for preview sign-ups on December 2, 2025, with pricing to come when it exits preview. AWS hasn't announced general availability dates or pricing for the Kiro Autonomous Agent or Security Agent. Ijaz mentioned recent auto mode optimizations that reduced costs, letting customers "do more work for the same credits."
Q: How does AWS's approach differ from Microsoft's and Google's AI coding tools?
A: Microsoft launched an SRE Agent in May 2025. Google debuted Antigravity for developers in November. AWS differentiates on autonomy duration, with agents designed to run for hours or days versus session-based tools. AWS also emphasizes multi-model flexibility through auto selection, while competitors typically default to their own models.
Tech translator with German roots who fled to Silicon Valley chaos. Decodes startup noise from San Francisco. Launched implicator.ai to slice through AI's daily madness—crisp, clear, with Teutonic precision and sarcasm.
E-Mail: marcus@implicator.ai