LLM
A computer algorithm for playing mad libs that’s a Magic 8-ball with a high IQ and the short-term memory of a hamster. When I input words to the algorithm, it fills in the blank, then promptly forgets what just happened. So, each round I feed it the entire conversation from the top.
“Chat” LLM
Let’s modify the Magic 8-ball so it thinks we’re writing a screenplay about a chatty, helpful robot. Bonus: users will think it’s talking to them.
Context
The input mad libs text given to the LLM, so it can fill in the blank.
Chat client (i.e. ChatGPT/Claude on Web/Desktop)
Because the LLM has the short-term memory of a hamster, we need to send it the entire chat history every turn, starting from the top. We’ll also need to store this ever-growing chat text file somewhere.
Context window
The LLM’s size limit on the chat text file.
Context rot
The LLM gets dumb and forgetful way before reaching the context window.
AI coding agent
I want the LLM to edit text files on my computer. Unfortunately, the LLM is a toothless Magic 8-ball that just spits out mad libs text. Let’s add some sides to the 8-ball die that are common computer commands: read, write, delete, or search_and_replace. When the LLM toothlessly prints out any of those words, the chat client will run those commands on my computer. When the command is done running, the chat client feeds the results (e.g. success, error, or output) back as a chat message. That way, the LLM knows what happened (this is pretty similar to a human reporting back and hitting enter in ChatGPT). The LLM probably won’t finish the job with one tool call, so we’ll let it keep trying in a loop until it perceives the task as done.
MCP server
Editing text files is nice, but my docs and data live in SaaS tools. Those tools already have APIs, so it’d be really nice if the AI agent could just use them. In addition to read/write/replace etc., it can also call those API endpoints.
MCP client
A fancier name for “chat client” because now it can call SaaS APIs.
Tool declaration
Unlike universal read/write/replace-type tools, the LLM has never seen this specific SaaS or its special snowflake of an API. The chat client should start each chat thread by briefing the LLM.
MCP protocol
It’d be nice if “how to brief an LLM on your API” were standardized. That way, each SaaS company would only have to build one “connector” and it would work with all the chat clients. Like USB or Bluetooth.
RAG (Retrieval Augmented Generation)
This user is asking for very obscure mad libs. Let’s give the LLM a tool to search and pull in documents (i.e. asking an intern to Google before responding). Let’s first try basic text search, because vector embeddings/semantic search is a schlep: expensive, and not always great.
AI agent
Just a chat thread. It’s actually a group chat thread, because we’re calling tools. Still, “agent” = chat thread.
Memory
LLMs are stateless, so every new chat thread (I mean, “agent”) starts completely from scratch. We should curate a summary of what we’ve already discussed and tried, and add it at the top of each new chat thread.
Slash commands
I’m tired of writing the same prompt over and over, so I’ll save it in a text file. I’ll give it a nickname, and whenever I type /nickname, my chat client (Claude Code, Cursor, etc.) will add it to the chat thread.
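The expansion step really is that dumb. Here’s a back-of-the-napkin sketch, assuming saved prompts live in a made-up prompts/ folder, one text file per nickname (real clients each have their own location and syntax):

```python
# Toy slash-command expansion: /nickname -> contents of prompts/nickname.txt.
# The prompts/ directory and the "review" nickname are invented for illustration.
from pathlib import Path

PROMPTS_DIR = Path("prompts")

def expand_message(message: str) -> str:
    # If the message starts with /nickname, swap in the saved prompt text.
    if message.startswith("/"):
        nickname = message.split()[0][1:]
        prompt_file = PROMPTS_DIR / f"{nickname}.txt"
        if prompt_file.exists():
            rest = message[len(nickname) + 1:].strip()
            return prompt_file.read_text().strip() + ("\n" + rest if rest else "")
    return message  # plain message: pass through untouched

# Save a prompt once, reuse it forever.
PROMPTS_DIR.mkdir(exist_ok=True)
(PROMPTS_DIR / "review.txt").write_text("Review this code for bugs and style:")

expanded = expand_message("/review def f(): pass")
```

That’s the whole feature: a text file, a nickname, and string substitution.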
Claude Skills
I’m so lazy I don’t even want to type /nickname. I’d rather the LLM decide when that text-file-with-a-prompt is useful, and tell the chat client to add it to the chat thread for me.
For the LLM to know this exists, the chat client will start each chat thread by briefing the LLM.
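The briefing trick can be sketched like so; the skill name, description, and contents here are all invented for illustration:

```python
# Toy version of skills: the chat thread starts with a one-line "menu" of
# skill descriptions, and the full text gets pulled in only on demand.
skills = {
    "commit-style": {
        "description": "How to write commit messages in our house style.",
        "body": "Use imperative mood. One-line summary under 50 characters.",
    },
}

def briefing() -> str:
    # Cheap: names and one-liners only, so the menu barely costs any context.
    lines = ["Available skills (ask to load one by name):"]
    for name, skill in skills.items():
        lines.append(f"- {name}: {skill['description']}")
    return "\n".join(lines)

def load_skill(name: str) -> str:
    # The expensive part happens only when the LLM asks for it.
    return skills[name]["body"]

thread = [briefing()]                      # every new chat starts with the menu
thread.append(load_skill("commit-style"))  # ...the LLM decided this one is useful
```

Note the design choice: descriptions are always in context, bodies only when needed. That’s the whole reason skills don’t blow the context window.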
Sub-agents
One of the steps to achieve our agent’s goal is long-winded. We don’t want the details of how it got done, just the bottom line, so it doesn’t clutter up our context window. Let’s enable our LLM to start a new chat thread on the side. When done, the side chat thread will send only the result to the original chat thread. (This is the same as a tool call, with the job being done by an LLM.)
Context engineering
Shit, we’re putting a lot of tool declarations, documents, summaries, and skills at the top of the chat thread, and we’re hitting context rot before the chat even starts. We should keep it short.

I recently worked with the head of a well-known AI product, and watched her tell her team, “Listen, it’s just a bunch of text files.” Couldn’t have said it better.
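Since it really is just a bunch of text files, here’s the whole glossary compressed into one toy loop: a chat thread that gets resent in full every single turn, a “tool” the LLM invokes by toothlessly printing text, and a stopping condition. The fake_llm function is a scripted stand-in, not a real model API, and the tool protocol is invented for illustration:

```python
# The agent loop from this glossary, in miniature: chat thread in, text out,
# run any commands the text names, append the results, repeat until DONE.
import json

def fake_llm(transcript: list[str]) -> str:
    # A real model would read the whole transcript; ours follows a script:
    # first ask for a tool, then declare victory once a result came back.
    if not any(m.startswith("TOOL RESULT") for m in transcript):
        return json.dumps({"tool": "read", "path": "notes.txt"})
    return "DONE: the file says hello."

files = {"notes.txt": "hello"}  # stand-in for the user's disk
tools = {"read": lambda path: files.get(path, "error: no such file")}

thread = ["SYSTEM: You can call tools: read. Reply DONE when finished."]
thread.append("USER: what's in notes.txt?")

while True:
    reply = fake_llm(thread)          # full history goes in, every single turn
    thread.append(f"MODEL: {reply}")
    if reply.startswith("DONE"):      # the LLM perceives the task as done
        break
    call = json.loads(reply)          # toothless text, interpreted as a command
    result = tools[call["tool"]](call["path"])
    thread.append(f"TOOL RESULT: {result}")
```

Everything else in this glossary (memory, skills, sub-agents, context engineering) is just deciding which text goes into that thread, and how to keep it short.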
