# What if 90% of your prompt is content you can't control?

Discover how Brex's audit agent handles messy, real-world context, and why good AI products give judgment a structure to operate inside, not a script to follow.

**URL Source:** https://www.brex.com/journal/articles/what-if-you-cant-control-your-prompt

---

What if 90% of your prompt is content you can't control?

Not all context is equal

During testing, one of our security engineers put a magic model refusal string into an expense memo and the agent refused to audit the expense at all.

That failure was useful because it showed us that the problem was not just the agent's ability to read documents. It was deciding how different kinds of context should be allowed to influence the decision.

At Brex, we have been building agents that review employee expenses, gather the surrounding evidence, compare it against policy and audit standard operating procedures (SOPs), and decide what should happen next.

Policy tells the system what is allowed. The SOP tells it how to handle a case when something looks wrong.

To reach a decision, the agent has to reason over receipts, memos, calendar events, web searches, policy documents, SOPs, prior cases, merchants, and user profiles. The difficult part is deciding how each of those sources should influence the outcome.

Building audit agents taught us that the hard part is in how the decision is structured. The system needs to know what counts as evidence, what should guide the outcome, what needs extra handling, and when a judgment is ready to become a real outcome.

**Inputs for the audit agent**

```json
{
  "_key": "fd993da22eba",
  "_type": "imageWithCaption",
  "altText": "Agent inputs chart",
  "asset": {
    "_type": "media",
    "disableVideoLoop": false,
    "media": {
      "_type": "bynder.asset",
      "aspectRatio": 0.56375,
      "datUrl": "https://brand.brex.com/transform/78e4ced6-dcf4-4571-b9ac-73c576b91945/agent-inputs",
      "databaseId": "78E4CED6-DCF4-4571-B9AC73C576B91945",
      "description": null,
      "id": "KEFzc2V0X2lkIDc4RTRDRUQ2LURDRjQtNDU3MS1COUFDNzNDNTc2QjkxOTQ1KQ==",
      "name": "agent inputs",
      "originalUrl": null,
      "previewImg": "https://brand.brex.com/asset/78e4ced6-dcf4-4571-b9ac-73c576b91945/thumbnail/webimage-agent-inputs",
      "previewUrl": "https://brand.brex.com/asset/78e4ced6-dcf4-4571-b9ac-73c576b91945/thumbnail/webimage-agent-inputs",
      "thumbnailUrl": "https://brand.brex.com/asset/78e4ced6-dcf4-4571-b9ac-73c576b91945/thumbnail/thul-agent-inputs",
      "type": "IMAGE",
      "videoUrl": null
    },
    "mediaStyle": {
      "_type": "mediaStyle",
      "borderRadius": "0px"
    },
    "play": "auto",
    "showVideoControls": false,
    "thumbnail": {
      "_type": "bynder.asset",
      "priority": false
    }
  },
  "caption": "The model does not receive one neat instruction. It receives a pile of inputs with very different levels of trust and authority."
}
```

Different inputs need different handling

The challenge was making those differences real inside the system, so the model would actually treat each source with the right weight instead of seeing everything as just more context. We stopped dropping everything raw into the same prompt. Policy and SOPs stayed as reference material. Expense and case records became the factual state of the case. Memos, receipts, and web search each had to be processed before they were allowed to influence the main reasoning loop.

A memo might explain an expense, but it is not proof. A note saying "the CEO approved it" is not very helpful on its own if the system cannot really validate that claim. This forced us to treat memos as both business context and an attack surface. Receipts needed different handling too. They are high-value evidence, but they are also noisy and easy to manipulate, so what mattered was extracting the facts needed for the decision rather than handing the whole receipt interpretation to the rest of the system.

Web search also needed strong controls. It is useful for validating outside facts, but it also creates a path for private information to leak out through url parameters. We added defenses around what could be sent to search: filtering suspicious long URLs, stripping risky query parameters, and keeping sensitive case details out of requests. Those inputs were all valuable, but they could not just be dropped raw into the main reasoning loop.

Production agents need clear system boundaries

Once we knew those inputs could not all enter the system in the same way, the next question was scope. How much context and how many tools should any one agent get at once?

Our earlier audit-agent versions had broader scope: more tools and more context handed to the agent at once. On the surface, that sounded better. But when we looked at the results, the extra scope was not improving performance. It mostly made the agent harder to control and less predictable.

We got better results by keeping the scope closer to the task. A useful exercise was to put ourselves in the agent's shoes and ask what the ideal set of inputs and tools would be for that job, and nothing more. From there, we watched executions and kept refining the inputs, outputs, and tool descriptions until the behavior became easier to steer.

**Scoping each step from input to outcome**

```json
{
  "_key": "790c72f0f23d",
  "_type": "imageWithCaption",
  "altText": "Case outcome",
  "asset": {
    "_type": "media",
    "disableVideoLoop": false,
    "media": {
      "_type": "bynder.asset",
      "aspectRatio": 0.4275,
      "datUrl": "https://brand.brex.com/transform/311e969a-d5da-4c1f-b5b1-85276ac4760b/case-outcome",
      "databaseId": "311E969A-D5DA-4C1F-B5B185276AC4760B",
      "description": null,
      "id": "KEFzc2V0X2lkIDMxMUU5NjlBLUQ1REEtNEMxRi1CNUIxODUyNzZBQzQ3NjBCKQ==",
      "name": "case outcome",
      "originalUrl": null,
      "previewImg": "https://brand.brex.com/asset/311e969a-d5da-4c1f-b5b1-85276ac4760b/thumbnail/webimage-case-outcome",
      "previewUrl": "https://brand.brex.com/asset/311e969a-d5da-4c1f-b5b1-85276ac4760b/thumbnail/webimage-case-outcome",
      "thumbnailUrl": "https://brand.brex.com/asset/311e969a-d5da-4c1f-b5b1-85276ac4760b/thumbnail/thul-case-outcome",
      "type": "IMAGE",
      "videoUrl": null
    },
    "mediaStyle": {
      "_type": "mediaStyle",
      "borderRadius": "0px"
    },
    "play": "auto",
    "showVideoControls": false,
    "thumbnail": {
      "_type": "bynder.asset",
      "priority": false
    }
  },
  "caption": "Better performance came from the right scope for each step, not the biggest possible context window or tool surface."
}
```

Reviewable outputs make the system safer

Once the agent became easier to steer, the next question was how to evaluate its answers. Our audit agents can take actions with financial consequences, like locking cards, rejecting reimbursements, or changing spend limits. That means trusting the LLM is not enough.

We treated model output as a candidate for review, not a final outcome. In practice, that meant saving a draft decision that went through deterministic checks and a reviewer agent before it could become a real case outcome.

**Model output review**

```json
{
  "_key": "b680d8462160",
  "_type": "imageWithCaption",
  "altText": "Revise draft chart",
  "asset": {
    "_type": "media",
    "disableVideoLoop": false,
    "media": {
      "_type": "bynder.asset",
      "aspectRatio": 0.4275,
      "datUrl": "https://brand.brex.com/transform/0151070e-a455-49da-964b-0a6fcd57db57/real-case-outcome",
      "databaseId": "0151070E-A455-49DA-964B0A6FCD57DB57",
      "description": null,
      "id": "KEFzc2V0X2lkIDAxNTEwNzBFLUE0NTUtNDlEQS05NjRCMEE2RkNENTdEQjU3KQ==",
      "name": "real case outcome",
      "originalUrl": null,
      "previewImg": "https://brand.brex.com/asset/0151070e-a455-49da-964b-0a6fcd57db57/thumbnail/webimage-real-case-outcome",
      "previewUrl": "https://brand.brex.com/asset/0151070e-a455-49da-964b-0a6fcd57db57/thumbnail/webimage-real-case-outcome",
      "thumbnailUrl": "https://brand.brex.com/asset/0151070e-a455-49da-964b-0a6fcd57db57/thumbnail/thul-real-case-outcome",
      "type": "IMAGE",
      "videoUrl": null
    },
    "mediaStyle": {
      "_type": "mediaStyle",
      "borderRadius": "0px"
    },
    "play": "auto",
    "showVideoControls": false,
    "thumbnail": {
      "_type": "bynder.asset",
      "priority": false
    }
  },
  "caption": "Model output becomes reviewable work product instead of an immediate side effect."
}
```

That separation gave us much better control over the system. We had a place to run deterministic checks, record disagreement, retry safely, inspect citations, and understand why a decision was made before it turned into durable state.

It also made the system much easier to inspect and measure. We could see the draft decision, the evidence behind it, and the review path it went through. Because we kept snapshots of each iteration as the output was refined, we could also see how answers changed over time and use that history for continuous improvement. That made it possible to track common failure patterns, see where decisions were getting sent back, and measure how often the main agent and the reviewer agreed over time.

The workflow makes sure each run finishes cleanly

The agent has a fair amount of freedom during the first pass. It can investigate the expense, save a draft recommendation, revise that draft after feedback, turn an approved recommendation into a real audit case, or conclude that no case is needed.

The workflow then looks at how that run ended and decides whether the job is actually done. If the agent created a real case, the workflow can move on to the next steps. If the agent reached an approved no-case conclusion, the workflow can end cleanly. If the agent stopped with a draft that still needed changes, forgot to turn an approved recommendation into a real case, or finished without recording any final outcome at all, the run is still incomplete.

**Checking for a complete outcome**

```json
{
  "_key": "9dddf7099e0a",
  "_type": "imageWithCaption",
  "altText": "Checking for complete outcome chart",
  "asset": {
    "_type": "media",
    "disableVideoLoop": false,
    "media": {
      "_type": "bynder.asset",
      "aspectRatio": 0.49625,
      "datUrl": "https://brand.brex.com/transform/1f1c30e0-8860-4f87-8035-3fbffc524cab/run-another-pass",
      "databaseId": "1F1C30E0-8860-4F87-80353FBFFC524CAB",
      "description": null,
      "id": "KEFzc2V0X2lkIDFGMUMzMEUwLTg4NjAtNEY4Ny04MDM1M0ZCRkZDNTI0Q0FCKQ==",
      "name": "run another pass",
      "originalUrl": null,
      "previewImg": "https://brand.brex.com/asset/1f1c30e0-8860-4f87-8035-3fbffc524cab/thumbnail/webimage-run-another-pass",
      "previewUrl": "https://brand.brex.com/asset/1f1c30e0-8860-4f87-8035-3fbffc524cab/thumbnail/webimage-run-another-pass",
      "thumbnailUrl": "https://brand.brex.com/asset/1f1c30e0-8860-4f87-8035-3fbffc524cab/thumbnail/thul-run-another-pass",
      "type": "IMAGE",
      "videoUrl": null
    },
    "mediaStyle": {
      "_type": "mediaStyle",
      "borderRadius": "0px"
    },
    "play": "auto",
    "showVideoControls": false,
    "thumbnail": {
      "_type": "bynder.asset",
      "priority": false
    }
  },
  "caption": "The workflow is checking for a complete outcome, not just whether the agent produced a draft."
}
```

That ended up being one of the most useful parts of the system. It meant the workflow could catch incomplete runs, retry when something went wrong, and keep the rest of the process from moving forward on a half-finished decision. It also gave us a much clearer picture of what happened in each execution: whether the agent found no issue, created a real case, or simply did not finish the job.

Simulation shows whether the system actually holds up

Once those pieces were in place, we could start testing the system the way people would actually use it and the way they might try to break it. We could simulate manipulative memos, risky web-search scenarios, noisy receipts, and other edge cases, then see whether the agent still followed the hierarchy and safeguards we intended.

That matters because these failures rarely show up in a single clean example. Simulation lets us see whether the system keeps making the same kinds of decisions across many runs, where it starts to drift, and which protections are actually doing work.

It also connects naturally to our simulation article, [How we test our agents: by committing fraud](https://www.brex.com/journal/articles/simulation-testing-ai-audit-agent).

Be opinionated on the right things

Building a system like this requires being opinionated on the right things. When most of the context comes from outside the system, you need clear authority boundaries, reviewable drafts, tool access that matches the job, a workflow that can tell whether a run actually finished cleanly, and simulation that shows where the structure starts to break down under pressure.

Good AI products create value by handling messy real-world context, not by forcing people into a tiny set of allowed paths. The goal is not to strip judgment out of the product. Audit work is valuable because the system can reason across messy context that would otherwise require a human to read everything by hand. The goal is to give that judgment a structure it can operate inside.

That is the real answer to the title question. When most of your prompt is content you cannot control, the work shifts to deciding how that content enters the decision, how the decision is reviewed, and how the system recovers when a run does not end cleanly.  


## Related Articles

### [B2B payment strategies](https://www.brex.com/journal/b2b-payment-strategies)

Get expert advice on how to use ACH transfers, wires, and cards to improve efficiency.

### [MMFs vs mutual funds](https://www.brex.com/journal/taking-a-closer-look-at-money-market-funds)

Eyeing a yield offer? Read this first to avoid hidden risks and learn how to earn yield while maintaining liquidity.

### [The founder's back-office survival guide: 10 things you shouldn't ignore](https://www.brex.com/journal/the-founders-back-office-survival-guide)

Discover 10 essential back-office moves for founders. See how Brex and Gusto automate payroll, cash flow & spend controls so you can focus on growth.

### [T&E](https://www.brex.com/journal/challenges-with-business-travel-solutions)

Many T&E providers that began as travel tools are still trying to catch up on the expense side.

### [See how Pilot grows faster with the Brex card’s built-in expense software.](https://www.brex.com/resources/customer/pilot)

See how Pilot provides bookkeeping, tax prep, CFO services, and financial operations to 1000+ startups ranging from pre-seed through Series C. 

### [Atlas announcement](https://www.brex.com/journal/brex-partners-with-stripe-atlas)

Founders can now access banking services via a Brex business account immediately after incorporating through Stripe Atlas.