How to Control What Your AI Agents Do

You’re building AI agents? Great. You’re also probably building a ticking time bomb. Most of you have no fucking clue how to control what these things actually *do*.

This isn’t for the academic purists or the ‘everything must be formally verified’ crowd. They’ll hate this. This is for the builders, the engineers, the people actually shipping AI products who need *real* control, not just theoretical bullshit.

You’re letting these agents run around with full access, making decisions, hitting APIs, changing files. “Oh, it’s fine, I trust my prompt!” LOL. That’s not control, that’s wishful thinking. It’s a recipe for disaster, data breaches, and ‘oops, my agent just wiped the production database’ moments. You’re treating them like black boxes, hoping for the best. That’s not engineering; that’s gambling with your business.

Here’s the cold, hard truth: controlling an AI agent is harder than controlling some random script in your prod environment. If you’re not thinking like a security architect, you’re already fucked. The fundamental mistake? You’re blurring the lines between *what* an agent is told to do and *how* it actually does it. Big mistake. Huge.

The only sane way to manage this chaos? Separate task definition from execution. Period.

Think of your project as a ‘shared state’ – everything agents touch, everything they need, lives here. Text files, code, whatever. For databases, it’s trickier, but the principle holds.

Layer 1: Local Agents.
Put your whole damn project under Git. Seriously. If you haven’t, stop reading and do it now.
Every change, every file modification by an agent? Git diff. Instant transparency.
Want to know what your agents did all day? Ask another agent (or Claude) to summarize the diffs.
Want an extra layer of sanity? Build a ‘watchdog’ agent. Give it rules: ‘don’t do X, don’t touch Y.’ Let it monitor diffs and scream if something’s off.
Real auto-tests? Even better. But Git is your baseline.
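
Here's roughly what the non-LLM core of such a watchdog could look like. This is a minimal sketch, not a finished tool: the patterns in FORBIDDEN and the function names are made up for illustration, and it assumes the agent's work lands as commits so that `git diff --name-only` over a revision range shows what it touched.

```python
import fnmatch
import subprocess

# Hypothetical rules: glob patterns the agents must never touch.
FORBIDDEN = ["secrets/*", "*.env", "migrations/*"]

def changed_files(repo_dir, rev_range="HEAD~1..HEAD"):
    """Return the paths changed in rev_range, as reported by `git diff --name-only`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", rev_range],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]

def find_violations(paths, forbidden=FORBIDDEN):
    """Return (path, pattern) pairs for every changed path matching a forbidden pattern."""
    return [
        (path, pattern)
        for path in paths
        for pattern in forbidden
        # fnmatchcase: case-sensitive on every OS; note `*` also matches `/`.
        if fnmatch.fnmatchcase(path, pattern)
    ]
```

Run `find_violations(changed_files("."))` after each agent session and scream (fail the pipeline, ping a human) if the list is non-empty. An LLM-based watchdog can sit on top of this for fuzzier rules, but a dumb deterministic check is harder to talk out of its job.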

Layer 2: External Agents (the ones hitting the real world).
This is where shit gets real. No perfect solution, but here’s the practical approach:
Split the task. One skill defines the task, another executes it.
Example: ‘Issue invoice X to Y.’
Skill 1: Creates a simple markdown file in an ‘inputs/’ directory. Details: who, what, amount. Commits to Git. NO real-world operations. No credentials.
Skill 2: Picks up that file from ‘inputs/’. Executes the actual operation (e.g., calls the bank API). Dumps result in ‘outputs/’. Commits to Git.
Crucially: Skill 2 sees ONLY that one file. No other project context. No sensitive info. It’s a dumb, focused executor.
Physically separate them. Different projects, different Linux users, whatever it takes. Skill 2 gets only the minimal credentials it needs for *that specific task*.
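
The two skills could be sketched like this. Everything here is an assumption for illustration: the file layout, the field names, and the `send_invoice` callable, which stands in for whatever real API client Skill 2 is allowed to use. The Git commits are left out so the sketch runs without a repository.

```python
from pathlib import Path

def define_task(inputs_dir, task_id, payee, description, amount):
    """Skill 1: describe the task in a markdown file. No credentials, no real-world calls."""
    inputs_dir = Path(inputs_dir)
    inputs_dir.mkdir(parents=True, exist_ok=True)
    path = inputs_dir / f"{task_id}.md"
    path.write_text(
        f"# Invoice {task_id}\n\n"
        f"- payee: {payee}\n"
        f"- description: {description}\n"
        f"- amount: {amount}\n",
        encoding="utf-8",
    )
    return path

def parse_task(path):
    """Read the '- key: value' lines of a task file into a dict of strings."""
    fields = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.startswith("- ") and ": " in line:
            key, value = line[2:].split(": ", 1)
            fields[key] = value
    return fields

def execute_task(input_path, outputs_dir, send_invoice):
    """Skill 2: read exactly one input file, run the operation, write the result."""
    input_path = Path(input_path)
    outputs_dir = Path(outputs_dir)
    task = parse_task(input_path)
    # If this raises, no output file is written: the failure stays visible.
    result = send_invoice(task)
    outputs_dir.mkdir(parents=True, exist_ok=True)
    out_path = outputs_dir / input_path.name
    out_path.write_text(
        f"# Result for {input_path.stem}\n\n- status: {result}\n",
        encoding="utf-8",
    )
    return out_path
```

Passing `send_invoice` in from outside is deliberate: the executor's code never holds the credentials itself, and you can swap in a fake for dry runs. In a real setup the two functions would live in separate projects or run under separate users, as described above, not in one module.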

What does this give you? Something close to transactional bookkeeping: every step either left its file behind or it didn’t. Plus a clear separation of concerns.
Did Skill 1 fail? No input file.
Did Skill 2 fail (or not run yet)? Input file exists, no output file.
Success? Input and output files exist. Ask an agent to compare and verify.
At the end of the day, you have a full, auditable trail in Git. Transparent as hell. No more ‘magic’ agent actions.
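
That file-presence check is easy to automate. A minimal sketch, assuming the input and output files share a name (`<task_id>.md`) and live in `inputs/` and `outputs/`; the status labels are made up for illustration.

```python
from pathlib import Path

def task_status(task_id, inputs_dir="inputs", outputs_dir="outputs"):
    """Classify a task purely by which of its files exist."""
    has_input = (Path(inputs_dir) / f"{task_id}.md").exists()
    has_output = (Path(outputs_dir) / f"{task_id}.md").exists()
    if has_input and has_output:
        return "done"
    if has_input:
        # Skill 2 either failed or has not picked the task up yet.
        return "pending_or_failed"
    if has_output:
        # An output with no matching input should never happen: investigate.
        return "inconsistent"
    return "not_defined"
```

Note that "done" only means both files exist. Whether the output actually matches what the input asked for is still a job for the comparison step.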

Opacity is the enemy of any AI architect. Stop building black boxes. Make every step atomic, log it outside the agent (in Git), and you’ll actually sleep at night. This isn’t just good practice; it’s your unfair advantage against the chaos merchants.