In the last post, we looked at installing an MCP server and getting it to work with Claude Desktop. In this post, we’re going to look at where MCP comes from and an earlier, similar technology you may be familiar with: “functions” in LLMs. Functions were introduced nearly two years ago1; you may find MCP quite similar to them in function (no pun intended) and scope. Let’s do a quick refresher on functions to see how we’ve gotten here.
Conjunction Function Junction, What’s your Function?
It will get you there, if you’re very careful2.
In the era of GPT-3, the interaction pattern with LLMs was pretty simple, consisting of these steps:
1. The input to the LLM is assembled:
   a. The System Prompt that gives the overall scope of the chatbot (“You are a magic cat who can answer questions about catching mice.”)
   b. Any examples (shots) or other information the application wishes to provide in advance
   c. Some/all/none3 of the chat history so far (if any), consisting of user questions and previous answers, providing context
   d. Finally, the latest question from the user
2. This is sent to the LLM, which responds
3. The question and response are appended to the history (for the next question)
4. The response is displayed to the user
And then we’re done until the user asks the next question.
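In code, that loop looks roughly like this. This is a minimal sketch using the current OpenAI Python client purely for illustration; the model name, prompt, and history handling are placeholders, not anything from a real app:

```python
from openai import OpenAI

client = OpenAI()

system_prompt = "You are a magic cat who can answer questions about catching mice."
history = []  # prior user/assistant turns, trimmed as needed

def ask(question: str) -> str:
    # Step 1: assemble the input -- system prompt, history, then the new question
    messages = (
        [{"role": "system", "content": system_prompt}]
        + history
        + [{"role": "user", "content": question}]
    )
    # Step 2: send it to the LLM
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = response.choices[0].message.content
    # Step 3: append the question and the response to the history
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": answer})
    # Step 4: hand the response back for display
    return answer
```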
Step 1b — providing additional information — is a tricky one, because the application has to predict what the LLM is going to need. In very constrained situations, you can do a good job of anticipating the LLM’s needs, but in more open-ended applications, it’s much harder4.
But — I know this sounds crazy — what if the LLM could just tell you what it wants, mid-conversation? Then you wouldn’t have to guess what it needs in advance; you could just let it tell you, and go get what it asks for.
As I first discussed a year and a half ago (time flies!), functions were introduced to do just that. The way functions work is that instead of returning an answer to you, the LLM returns an indication of which function (out of the ones you told it about) it wants you to call, along with the arguments; your app then calls the function and returns the results to the LLM so it can continue its work.
How does it know what function to call? How does it even know what functions there are? You tell it: on every call to the LLM, you describe what functions there are, what they’re about, and what to pass to them.
In that much earlier series of articles, I released a small Python project that retrieved stock quotes. The code that defined what functions were available to GPT looks like this:
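(What follows is a sketch in the standard OpenAI function-declaration format rather than a verbatim copy from that project, so treat the exact name, description, and parameters as illustrative.)

```python
functions = [
    {
        "name": "get_stock_quote",  # illustrative; the project's actual name may differ
        "description": "Get the latest price quote for a stock, given its ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "The stock ticker symbol, e.g. AAPL",
                }
            },
            "required": ["symbol"],
        },
    }
]
```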
You can see that each function: (1) has a name, (2) has a description of what it does, and (3) lists the parameters that are passed to it, as well as which of those are required. If you’ve played with Agentforce actions, you can no doubt see a similarity.
The description is what the LLM uses to figure out what each function is capable of, so the better your description, the better the LLM is at using the function (up to the point of diminishing returns, of course).
There are some real challenges with using functions. The biggest one is that the functions are declared and implemented in your code: the basic mechanics of telling the LLM what functions are available, calling them at the LLM’s request, and getting the results back to the LLM are all coding exercises.
Using third party libraries5 doesn’t save you anything; you still have to provide the glue between the LLM and the function. This isn’t terribly hard, but the implementation grows increasingly complex as the number of functions grows. It gets messy, real quick.
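To give a sense of that glue, here’s a sketch of the kind of dispatch loop you end up writing. The model name and the stand-in get_stock_quote implementation are illustrative, but the overall shape (call the LLM, check for a function request, run it, feed the result back) is the pattern:

```python
import json
from openai import OpenAI

client = OpenAI()

def get_stock_quote(symbol: str) -> dict:
    # stand-in implementation; the real one would hit a quote API
    return {"symbol": symbol, "price": 123.45}

# map the names the LLM is allowed to request onto the Python code that implements them
available_functions = {"get_stock_quote": get_stock_quote}

def run_turn(messages: list, functions: list) -> str:
    """Keep calling the LLM, servicing its function requests, until it produces an answer."""
    while True:
        response = client.chat.completions.create(
            model="gpt-4o-mini",   # placeholder model name
            messages=messages,
            functions=functions,   # the declarations shown earlier
        )
        message = response.choices[0].message
        if message.function_call is None:
            return message.content  # a normal answer; we're done
        # The LLM asked for a function: look it up, call it, and hand the result back.
        name = message.function_call.name
        args = json.loads(message.function_call.arguments or "{}")
        result = available_functions[name](**args)
        messages.append({
            "role": "assistant",
            "content": None,
            "function_call": {"name": name, "arguments": message.function_call.arguments},
        })
        messages.append({"role": "function", "name": name, "content": json.dumps(result)})
```

Multiply that registry, the argument plumbing, and the error handling by a few dozen functions and you can see where the mess comes from.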
And that assumes you have the application’s source code to modify so you can add in all the functions. What if you don’t, like with Claude Desktop? There’s no way to add functions to a compiled application.
You’re out of luck with functions … But you are in luck with MCP!
MCP: Functions at an arm’s length
The simplest description of MCP is that it allows you to write functions that are advertised in a standardized JSON format and called via a consistent, simple API, rather than incorporated directly into your code.
The “advertised” part really is two parts. The first is a JSON config file for your app (e.g., Claude Desktop) that tells it how to connect to “servers” that provide extra functionality. The second is that when the connection to a server is initialized, the server returns a list of the functions it supports. Here’s an example of a single function that the MCP Server for PyCharm provides (allowing me to use Claude Desktop directly to work with code in my project; most IDEs can call MCP Servers, but PyCharm can also be a server to other clients!):
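(I’m paraphrasing rather than pasting the server’s exact output, so treat the tool name and wording as illustrative; the shape, though, is what any MCP server returns from its tools/list call: a name, a description, and a JSON Schema for the input.)

```json
{
  "name": "get_open_in_editor_file_text",
  "description": "Returns the full text of the file currently open in the editor.",
  "inputSchema": {
    "type": "object",
    "properties": {},
    "required": []
  }
}
```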
As you can see, this provides exactly the same information that old-fashioned (if 2 years ago is old) “functions” do.
A single MCP Server can provide any number of functions (the PyCharm server provides over 30), and they are all called using a standard convention, which is how it becomes “consistent” and (more or less) “simple” to use.
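That convention is just JSON-RPC under the hood: the client sends a tools/call request naming the tool and its arguments (sticking with the illustrative tool above), and the server answers with the result packaged as content blocks. Roughly, the request looks like this:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "get_open_in_editor_file_text",
    "arguments": {}
  }
}
```

and the server replies with something like:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "result": {
    "content": [{ "type": "text", "text": "...the file contents..." }],
    "isError": false
  }
}
```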
Another result is that you can now move functions into standalone utility applications (the servers), decoupling them from your code. And the benefits of this are not just for pre-compiled, packaged applications like Claude Desktop. This decoupling is great; I can use MCP servers written in any number of languages all at once. If my code is Python and yours is TypeScript, nobody cares or even notices much.
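To make that concrete: the Claude Desktop config file I mentioned earlier (claude_desktop_config.json) just lists each server and the command used to launch it, so a Python server and a TypeScript one can sit side by side. The server names and paths below are made up:

```json
{
  "mcpServers": {
    "my-python-server": {
      "command": "python",
      "args": ["/path/to/my_server.py"]
    },
    "your-typescript-server": {
      "command": "npx",
      "args": ["-y", "your-mcp-server-package"]
    }
  }
}
```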
There are, however, some limitations that remain. The notion of “at an arm’s length” also suggests not reaching much further than that. While the specification for MCP includes a method of accessing services across an HTTP connection, Claude Desktop (for example) doesn’t support it. But that may just be because it’s so early in MCP land. The fact that Claude asks you all the time whether it’s OK to call a service also suggests some concerns that still need to be addressed (lest Apple run funny commercials about it6). I have no real idea what the answer looks like, though…
Up Next
Next post, I’ll dive into how MCP really works, and then after that we’ll look at a sample implementation…
3. The chat history takes up space in the input “context window”, and early on, context windows were really small. So it was important to keep the amount of history small enough not to generate errors for too much input. The usual approach was to trim away the older history, so the LLM could “remember” what you said a couple of questions ago, but probably would not “remember” what you said 20 questions ago. Conceptually, this is still a concern, but with modern context windows 100x larger, you have a lot longer to go before you have to worry. There are other reasons to limit the context window (such as off-topic history), but that’s more nuanced.
4. That’s where Retrieval Augmented Generation comes into play. You try to get a broad sense of what the user’s question is about, and then try to find documents that are related to that topic, often using embeddings, which I discussed a while back.
5. By this I mean third-party libraries that do something not AI related, like retrieve data from a database or fetch a web page. Libraries like LangChain are a completely different beast.
6. These were great, although I will say that the Mac has gotten a bit more pop-uppy over the years, so they’re not quite as funny anymore…