System architecture & improvement opportunities

Aman Khan explains why AI products sometimes fail spectacularly, and why that might be intentional.

He maps out how Bolt actually works: your request becomes a system prompt that triggers reasoning about what to build. Next comes context or retrieval. “RAG is really, that is literally RAG here,” Aman says. This step references the huge system prompt and maybe a huge codebase that needs to be chunked up and pulled into the context. Then you have generated code, which gets plumbed into the terminal to render and then deploy.

But evals are missing from this flow. “You might be hearing this - evals - quite a bit,” Aman notes. “You have the CPOs of OpenAI and Anthropic saying the same thing, that evals are really important.”

Evals are checks that ensure system changes don’t break downstream functionality. When Bolt’s creators modify their system prompt, they run test cases that ask: how do I know I’m actually improving versus potentially making it worse?

The lack of real-time evals explains why I got three different results from the same prompt: one worked perfectly, one did nothing, and one threw an import error where the LLM hallucinated a non-existent package.

“It’s very possible for you to build an AI prototyping tool that never really has errors,” Aman explains. “That experience might take a lot longer to generate almost close to perfect end code, but that’s probably not the end experience that the Bolt developers were optimizing for.”

Bolt chose speed over perfection. They wanted to “single shot, generate code, run it, and execute it.” They could add runtime checks ensuring no errors reach users, but that would slow the magic. Instead, they optimize for showing things immediately rather than running evals in real time.

This is the trade-off every AI PM faces: do you ship fast and let users iterate through occasional errors, or do you validate everything and make them wait? Bolt bet on speed.

➡️ The absence of real-time evals isn’t a bug. It’s a product decision.
Understanding when to prioritize speed over accuracy (and when not to) is crucial for building AI products users actually want. Check out Aman’s course (not sponsored, he’s just legit).
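
To make the eval idea concrete, here is a minimal sketch of what one check could look like. It is not Bolt’s actual pipeline: it assumes the generated code is Python (Bolt itself produces web apps), and `generate_v1`, `generate_v2`, and `TEST_PROMPTS` are hypothetical stand-ins for a model running under two versions of a system prompt. The check targets the failure described above, an import of a package that doesn’t exist, and the pass rate answers the “am I improving or making it worse?” question.

```python
import ast
import importlib.util


def missing_imports(code: str) -> list[str]:
    """Return top-level modules that `code` imports but that are not installed."""
    missing = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            # find_spec returns None when no installed module has this name.
            if importlib.util.find_spec(top_level) is None and top_level not in missing:
                missing.append(top_level)
    return missing


def pass_rate(generate, prompts) -> float:
    """Fraction of prompts whose generated code parses and imports only real modules."""
    passed = 0
    for prompt in prompts:
        try:
            ok = not missing_imports(generate(prompt))
        except SyntaxError:
            ok = False  # code that does not even parse counts as a failure
        passed += ok
    return passed / len(prompts)


# Hypothetical stand-ins for "the model under system prompt v1 / v2".
def generate_v1(prompt: str) -> str:
    if "chart" in prompt:
        return "import totally_made_up_charts_xyz\n"  # hallucinated package
    return "import json\nprint(json.dumps({'ok': True}))\n"


def generate_v2(prompt: str) -> str:
    return "import json\nprint(json.dumps({'ok': True}))\n"


# Hypothetical test cases; a real suite would be much larger.
TEST_PROMPTS = ["build a todo app", "build a chart dashboard", "build a landing page"]

baseline = pass_rate(generate_v1, TEST_PROMPTS)
candidate = pass_rate(generate_v2, TEST_PROMPTS)
print(f"baseline={baseline:.2f} candidate={candidate:.2f}")
if candidate < baseline:
    print("Regression: the new system prompt made things worse.")
```

Run offline against a fixed set of prompts, a check like this costs users nothing. Run on every generation before the code reaches the user, it adds latency, and that is exactly the speed-versus-validation trade-off Aman describes.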

Let’s build together

If you’re reading this, you’re in the top 10% of people pushing the limits of AI at work. To get your entire team on board, I offer build sprints for individuals and companies who want hands-on implementation and adoption.
