Agents all the way down

A pattern for UI in MCP clients

Say you’re working on an agent (a model using tools in a loop). Furthermore, let’s say your agent uses the Model Context Protocol to populate its set of tools dynamically. This results in an interesting UX question: how should you show text tool results to the user of your agent?

You could just show the raw text, but that’s a little unsatisfying when tool results are often JSON, XML, or some other structured data. You could parse the structured data, but that’s tricky too; the set of tools your agent has access to may change, and the tool results you get today could be structured differently tomorrow.

I like another option: pass the tool results to another agent.

The Visualization Agent

Let’s add another agent to our system; we’ll call it the visualization agent. After the main agent executes a tool, it will pass the results to the visualization agent and say “hey, can you visualize this for the user?”

The visualization agent has access to specialized tools like “show table”, “show chart”, “show formatted code”, etc. It handles the work of translating tool results in arbitrary formats into the structures that are useful for opinionated visualization.

And if it can’t figure out a good way to visualize something, well, we can always fall back to text.
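Here's a minimal sketch of what the rendering side could look like. Everything in it is an assumption made for illustration: the `VizCall` enum, the `render` function, and the plain-text rendering stand in for whatever your UI actually does. The idea is just that the visualization agent either picks a tool or doesn't, and anything unusable falls back to the raw text.

```rust
/// Tool calls the visualization agent might emit (hypothetical names).
#[derive(Debug)]
enum VizCall {
    ShowTable { headers: Vec<String>, rows: Vec<Vec<String>> },
    ShowCode { language: String, source: String },
}

/// Render a tool result for the user. `call` is whatever the visualization
/// agent decided on; `None` means it could not find a good visualization.
fn render(raw_result: &str, call: Option<VizCall>) -> String {
    match call {
        // Only trust a table whose rows all match the header width.
        Some(VizCall::ShowTable { headers, rows })
            if rows.iter().all(|row| row.len() == headers.len()) =>
        {
            let mut lines = vec![headers.join(" | ")];
            lines.extend(rows.iter().map(|row| row.join(" | ")));
            lines.join("\n")
        }
        Some(VizCall::ShowCode { language, source }) => {
            format!("[{language}]\n{source}")
        }
        // No tool call, or a malformed one: fall back to the raw text.
        _ => raw_result.to_string(),
    }
}

fn main() {
    let raw = r#"[{"city":"Tokyo","temp_c":21}]"#;
    let table = VizCall::ShowTable {
        headers: vec!["city".to_string(), "temp_c".to_string()],
        rows: vec![vec!["Tokyo".to_string(), "21".to_string()]],
    };
    println!("{}", render(raw, Some(table)));
    println!("{}", render(raw, None));
}
```

The fallback arm is the important part: a bad or missing visualization never costs the user the underlying data.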

Why do it this way?

The big thing is that we can display arbitrary data to the user in a nice way, without assuming much about the tools our agent will have access to. We could also give the main agent visualization tools (tempting! so simple!), but:

  1. That can be very wasteful of the context window
    1. Imagine the agent receives 10,000 tokens from a tool, then passes those same 10,000 tokens along by calling a visualization tool: that one result now takes up 20,000 tokens in our chat history
  2. The more tools an agent has access to, the more likely it is to get confused
  3. A specialized visualization agent can use a faster+cheaper model than our main agent
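The context window point is just arithmetic, but it's easy to underestimate. A rough sketch, assuming (as a simplification) that a visualization tool call repeats the whole tool result in its arguments; `history_tokens` is a made-up helper, not part of any library:

```rust
/// Tokens a single tool result adds to the main agent's chat history.
fn history_tokens(tool_result_tokens: u64, main_agent_visualizes: bool) -> u64 {
    if main_agent_visualizes {
        // Counted once as the tool result, and once more as the arguments
        // of the visualization tool call the main agent writes out.
        tool_result_tokens * 2
    } else {
        // The result is handed to the visualization agent outside the
        // main agent's history, so it is only counted once.
        tool_result_tokens
    }
}

fn main() {
    println!("main agent visualizes: {}", history_tokens(10_000, true));
    println!("separate agent:        {}", history_tokens(10_000, false));
}
```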

It’s not all sunshine and roses; calling the visualization agent can be slow, and it adds some complexity. But I like this approach compared to the others I’ve seen, and we’re not far away from fast local models being widely available. If you’ve got another approach, I’d love to hear from you!

This is a brief post about something that confused me a great deal when I started working with LLMs.

Context

Many LLM providers (Anthropic, OpenAI, Google) support “function calling”, AKA “tool use”. In a nutshell:

  1. When calling the provider’s chat completion APIs, you tell the model “if needed, I can run these specific functions for you.”
  2. The model responds saying “hey go run function X with arguments Y and Z.”
  3. You go and run the function with those arguments. Maybe you append the result to the chat so the model has access to it.

Weather lookup is a common example. You tell the model “I have a function get_temperature(city: String) that looks up the current temperature in a city”, and then when a question like “What’s the weather like in Tokyo?” comes up the model responds to your code with “please call get_temperature("Tokyo")”.
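To make step 3 concrete, here's a small sketch of the dispatch on your side of the conversation. It doesn't talk to any provider: `ModelReply` is a simplified stand-in I made up for whatever your client library returns, and `get_temperature` returns hard-coded data instead of calling a weather API.

```rust
/// What the model can send back (heavily simplified, hypothetical type).
enum ModelReply {
    /// "Please call this function with these arguments."
    ToolCall { name: String, city: String },
    /// A normal text answer.
    Text(String),
}

/// Stand-in implementation; a real one would query a weather service.
fn get_temperature(city: &str) -> Option<f64> {
    match city {
        "Tokyo" => Some(21.5),
        _ => None,
    }
}

/// Run the function the model asked for and produce the text that would
/// be appended to the chat so the model can see the result.
fn handle_reply(reply: ModelReply) -> String {
    match reply {
        ModelReply::ToolCall { name, city } if name == "get_temperature" => {
            match get_temperature(&city) {
                Some(celsius) => format!("{celsius} C in {city}"),
                None => format!("no temperature data for {city}"),
            }
        }
        // The model asked for a function we never offered.
        ModelReply::ToolCall { name, .. } => format!("unknown function: {name}"),
        ModelReply::Text(text) => text,
    }
}

fn main() {
    let reply = ModelReply::ToolCall {
        name: "get_temperature".to_string(),
        city: "Tokyo".to_string(),
    };
    println!("{}", handle_reply(reply));
}
```

Note that the model never runs anything itself; it only asks, and your code decides what to do with the request.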

Structured Output

All well and good, but where this gets interesting is that function calling is also a good way to get structured data out of LLMs. You can provide a function definition that you have no intention of “calling”, purely to get data in the format you want.

For example, using the Rust genai library:

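use genai::chat::{ChatMessage, ChatRequest, Tool};
use genai::Client;
use serde_json::json;

// Client with default configuration (API keys are read from environment variables)
let client = Client::default();

// Text to analyze
let text = "The quick brown fox jumps over the lazy dog.";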

// Define a tool/function for rating grammar
let grammar_tool = Tool::new("rate_grammar")
    .with_description("Rate the grammatical correctness of English text")
    .with_schema(json!({
        "type": "object",
        "properties": {
          "rating": {
            "type": "integer",
            "minimum": 1,
            "maximum": 10,
            "description": "Grammar rating from 1 to 10, where 10 is perfect grammar"
          },
          "explanation": {
            "type": "string",
            "description": "Brief explanation for the rating"
          }
        },
        "required": ["rating", "explanation"]
    }));

// Create a chat request with the text and the grammar tool
let chat_req = ChatRequest::new(vec![
    ChatMessage::system("You are a professional English grammar expert. Analyze the grammar of the given text and provide a rating."),
    ChatMessage::user(format!("Please rate the grammar of this text: '{}'", text))
]).append_tool(grammar_tool);

// ...and execute it
let chat_res = client.exec_chat("gpt-4o-mini", chat_req, None).await?;

The result will include some JSON like:

{
    "rating": 10,
    "explanation": "This sentence is grammatically perfect..."
}

…and we’re done. We just used function calling to get structured data, with no intention of calling any functions. This is much nicer and more reliable than string parsing on the raw chat output.

This approach is probably obvious to many people, but it was unintuitive to me at first; I think “function calling” is a misleading name for functionality that can be used for so much more.

Alternative Approaches

This isn’t the only way to get structured data out of an LLM; OpenAI supports Structured Outputs, and Gemini lets you specify a response schema. But for Anthropic, it seems like function calling is still recommended. Their tool use documentation says:

Tools do not necessarily need to be client-side functions — you can use tools anytime you want the model to return JSON output that follows a provided schema.

I tried to use Automerge again, and failed.

For those of you who aren’t familiar, Automerge is a neat library that helps with building collaborative and local-first applications. It’s pretty cool! I work on a collaborative notes application that does not handle concurrent edits very well, and Automerge is one of the main contenders for improving that situation.

I gave Automerge a try in 2023 and wasn’t able to get it working, to my chagrin. This weekend there was an event in Vancouver for local-first software with one of the main Automerge authors, so I decided to attend and give it another try. I made a fair bit of progress, but ultimately gave up after spending ~5 hours on the problem. A few thoughts+observations:

I am going off the beaten path (web)

Automerge’s “golden path” is web apps. The core of Automerge is written in Rust, but it’s primarily used via WASM in the browser.

This approach is unpleasant for me; I like Rust, I have a good understanding of how code executes on a “real computer”, and I do not want to write an application where 99% of the business logic runs in the browser. Instead, I tried to write an application where my Rust backend was the primary Automerge node and browser/JS Automerge nodes would talk to it.

This did not go well; the documentation and ergonomics of the Rust library are lacking, and most tutorials assume that you are using the JS wrapper around the Rust library. And then when I tried to use the JS version in my simple web UI, the docs assumed a level of web development sophistication that I don’t have.

To be clear, this is mostly a me problem: primarily targeting the browser is absolutely the way to go in 2025!

I am going off the beaten path (local-first)

Automerge tries to solve a lot of problems related to local-first software. But I wanted to “start small” and solve the problem of concurrent text editing for an application that isn’t local-first. In retrospect this was a mistake; the documentation was written for a very different audience than me, and I wasn’t especially aligned with what other people at the event were building.

Chrome is winning

Something that was discussed at the event: if you are building entirely in-browser local-first applications you may want to target Chrome, because Firefox is way behind on several new+useful APIs. This is sad, but not surprising.

What next?

I think it’s possible to build an Automerge-based collaborative text editor the way I want, but it’s a lot harder than I expected. I’m going to shelve this and revisit it next time I have time+energy to hack on it.

I started strength training for the first time in January 2024, in my late 30s. I wish I’d started earlier, but it’s been a hugely positive change in my life. Some observations:

  1. Standing up straight is a lot easier; back muscles help with posture!
  2. Stability is cool. I’m noticeably better at tightening my core in daily life, balancing, etc.
  3. I can put much more power into specific movements (ex: pedal hard on a bike) before feeling exerted or losing control
  4. I can lift my gangly 85lb dog easily without worrying that I’ll mess up my back
  5. RSI pain in my arms + wrists is nearly gone

I’ve been going to the same gym 3x/week on average. They have a pretty good model where:

  1. They connect you with a specific personal trainer
  2. Before starting classes you do X training sessions, to make sure you have decent form and won’t injure yourself
  3. After starting classes you do a training session every other month or so
  4. The relationship with the trainer is a form of accountability; they notice if you miss classes for a couple weeks

The classes are Crossfit-ish; about 2/3 strength, 1/3 cardio. They’re surprisingly fun now that I know what I’m doing.

Why now?

Sometime in 2023 I saw people I respect in a professional context (Jack Rusher, Andreas Kling) talking about how it’s increasingly hard to build muscle as you grow older, and thought “hmm, concerning.”

Later I started thinking about New Year’s resolutions etc. (as one does), and reached out to a local gym on a whim.

Why didn’t I start earlier?

I’d never enjoyed lifting weights before, and I didn’t know how to do it safely. To be honest, I associated weight lifting with jock culture and looked down on it a bit. I’ve gone through on-and-off phases of going to the gym, but it’s always been individual cardio work. I didn’t enjoy team sports as a kid, and I’ve always been someone who prefers to figure things out on their own. That approach has served me well in some areas, but it wasn’t working for fitness.

Parting Advice

If you haven’t been able to commit to a fitness routine, try trading some money for training and a community of like-minded people. Expert advice is useful and social accountability works!

After 13 years of daily Twitter usage, I bit the bullet and switched to Bluesky (urbanism stuff, mostly) and Mastodon (computer stuff) full time.

I made friends on Twitter, I learned a lot, and I was part of a few communities that would not exist without Twitter. I’m proud of my small contributions to Vancouver urbanism over the years, and they mostly happened on Twitter or adjacent to it.

Leaving was sad but unavoidable; the site’s really gone downhill since Musk purchased it. The quality of discourse has plummeted now that the replies to any popular tweet are dominated by bluechecks. Twitter has a critical mass of shitty resentful people who wouldn’t be out of place on 4chan, and the site shoves their opinions in your face.

Anyway, it’s a few months in and I’m glad I made the change. I don’t love Mastodon, but Bluesky feels a lot like an earlier, more pleasant version of Twitter.
