Function Calling is Structured Output

(or it can be anyway)

March 2, 2025

3 minute read

This is a brief post about something that confused me a great deal when I started working with LLMs.

Context

Many LLM providers (Anthropic, OpenAI, Google) support “function calling”, AKA “tool use”. In a nutshell:

When calling the provider’s chat completion APIs, you tell the model “if needed, I can run these specific functions for you.”
The model responds saying “hey go run function X with arguments Y and Z.”
You go and run the function with those arguments. Maybe you append the result to the chat so the model has access to it.

Weather lookup is a common example. You tell the model “I have a function get_temperature(city: String) that looks up the current temperature in a city”, and then when a question like “What’s the weather like in Tokyo?” comes up the model responds to your code with “please call get_temperature("Tokyo")”.
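The three steps above can be sketched in plain Rust. This is a minimal, std-only illustration and not any provider's real API: ToolCall, run_tool_call, and the hard-coded get_temperature lookup are all hypothetical stand-ins, and a real application would first parse the provider's JSON response into something like ToolCall.

```rust
// A std-only sketch of the function calling round trip.
// Nothing here talks to a real provider; all names are hypothetical.

/// What the model sends back when it wants a function run.
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    name: String,
    /// Arguments, assumed to be already extracted from the provider's payload.
    args: Vec<String>,
}

/// Placeholder lookup; a real implementation would query a weather service.
fn get_temperature(city: &str) -> Option<f64> {
    match city {
        "Tokyo" => Some(18.5),
        _ => None,
    }
}

/// Step 3: run the requested function and produce text to append to the chat.
fn run_tool_call(call: &ToolCall) -> Result<String, String> {
    match (call.name.as_str(), call.args.as_slice()) {
        ("get_temperature", [city]) => get_temperature(city)
            .map(|t| format!("{t:.1} C in {city}"))
            .ok_or_else(|| format!("no temperature data for {city}")),
        (name, args) => Err(format!(
            "unsupported call: {name} with {} argument(s)",
            args.len()
        )),
    }
}
```

Returning an Err for unknown function names or wrong argument counts matters because the model, not your code, chooses what to ask for.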

Structured Output

All well and good, but where this gets interesting is that function calling is also a good way to get structured data out of LLMs. You can provide a function definition that you have no intention of “calling”, purely to get data in the format you want.

For example, using the Rust genai library:

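// Imports this snippet relies on (assuming a recent genai release)
use genai::chat::{ChatMessage, ChatRequest, Tool};
use genai::Client;
use serde_json::json;

// Client with default configuration (API keys come from the environment)
let client = Client::default();

// Text to analyze
let text = "The quick brown fox jumps over the lazy dog.";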

// Define a tool/function for rating grammar
let grammar_tool = Tool::new("rate_grammar")
    .with_description("Rate the grammatical correctness of English text")
    .with_schema(json!({
        "type": "object",
        "properties": {
          "rating": {
            "type": "integer",
            "minimum": 1,
            "maximum": 10,
            "description": "Grammar rating from 1 to 10, where 10 is perfect grammar"
          },
          "explanation": {
            "type": "string",
            "description": "Brief explanation for the rating"
          }
        },
        "required": ["rating", "explanation"]
    }));

// Create a chat request with the text and the grammar tool
let chat_req = ChatRequest::new(vec![
    ChatMessage::system("You are a professional English grammar expert. Analyze the grammar of the given text and provide a rating."),
    ChatMessage::user(format!("Please rate the grammar of this text: '{}'", text))
]).append_tool(grammar_tool);

// ...and execute it
let chat_res = client.exec_chat("gpt-4o-mini", chat_req, None).await?;

The result will include a tool call whose arguments are JSON like:

{
    "rating": 10,
    "explanation": "This sentence is grammatically perfect..."
}

…and we’re done. We just used function calling to get structured data, with no intention of calling any functions. This is much nicer and more reliable than string parsing on the raw chat output.
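Depending on the provider and its settings, the returned arguments may not be guaranteed to satisfy every constraint in the schema, so it can be worth re-checking them when converting to a typed value. The sketch below is std-only and assumes the two fields have already been pulled out of the JSON (for example with serde_json); GrammarRating and from_parts are hypothetical names, and the non-empty explanation check is an extra assumption that goes beyond the schema above.

```rust
/// Typed version of the `rate_grammar` arguments (hypothetical type).
#[derive(Debug, PartialEq)]
struct GrammarRating {
    rating: u8,
    explanation: String,
}

impl GrammarRating {
    /// Re-checks the schema's 1..=10 range, plus an extra (assumed)
    /// requirement that the explanation is not blank.
    fn from_parts(rating: i64, explanation: &str) -> Result<Self, String> {
        if !(1..=10).contains(&rating) {
            return Err(format!("rating {rating} is outside 1..=10"));
        }
        let explanation = explanation.trim();
        if explanation.is_empty() {
            return Err("explanation is empty".to_string());
        }
        Ok(Self {
            // Safe cast: the range check above guarantees 1..=10.
            rating: rating as u8,
            explanation: explanation.to_string(),
        })
    }
}
```

With this in place, the rest of the program only ever sees a GrammarRating that passed the checks.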

This approach is probably obvious to many people, but it was unintuitive to me at first; I think “function calling” is a misleading name for something that can be used for so much more.

Alternative Approaches

This isn’t the only way to get structured data out of an LLM; OpenAI supports Structured Outputs, and Gemini lets you specify a response schema. But for Anthropic, it seems like function calling is still recommended:

Tools do not necessarily need to be client-side functions — you can use tools anytime you want the model to return JSON output that follows a provided schema.

My latest attempt at collaborative text editing

January 12, 2025

3 minute read

I tried to use Automerge again, and failed.

For those of you who aren’t familiar, Automerge is a neat library that helps with building collaborative and local-first applications. It’s pretty cool! I work on a collaborative notes application that does not handle concurrent edits very well, and Automerge is one of the main contenders for improving that situation.

I gave Automerge a try in 2023 and wasn’t able to get it working, to my chagrin. This weekend there was an event in Vancouver for local-first software with one of the main Automerge authors, so I decided to attend and give it another try. I made a fair bit of progress, but ultimately gave up after spending ~5 hours on the problem. A few thoughts+observations:

I am going off the beaten path (web)

Automerge’s “golden path” is web apps. The core of Automerge is written in Rust, but it’s primarily used via WASM in the browser.

This approach is unpleasant for me; I like Rust, I have a good understanding of how code runs+executes on a “real computer”, and I do not want to write an application where 99% of the business logic runs in the browser. Instead, I tried to write an application where my Rust backend was the primary Automerge node and browser/JS Automerge nodes would talk to it.

This did not go well; the documentation and ergonomics of the Rust library are lacking, and most tutorials assume that you are using the JS wrapper around the Rust library. And then when I tried to use the JS version in my simple web UI, the docs assumed a level of web development sophistication that I don’t have.

To be clear, this is mostly a me problem: primarily targeting the browser is absolutely the way to go in 2025!

I am going off the beaten path (local-first)

Automerge tries to solve a lot of problems related to local-first software. But I wanted to “start small” and solve the problem of concurrent text editing for an application that isn’t local-first. In retrospect this was a mistake; the documentation was written for a very different audience than me, and I wasn’t especially aligned with what other people at the event were building.

Chrome is winning

Something that was discussed at the event: if you are building entirely in-browser local-first applications you may want to target Chrome, because Firefox is way behind on several new+useful APIs. This is sad, but not surprising.

What next?

I think it’s possible to build an Automerge-based collaborative text editor the way I want, but it’s a lot harder than I expected. I’m going to shelve this and revisit it next time I have time+energy to hack on it.

How I Use LLMs (Sep 2024)

Aider is pretty cool

September 15, 2024

3 minute read

It feels a bit early to be writing an update to something I wrote 1.5 months ago, but we live in interesting times. Shortly after writing that post, I started trying out Aider with Claude 3.5 Sonnet. Aider’s an open source Python CLI app that you run inside a Git repo with an OpenAI/Anthropic/whatever API key¹.

My Aider workflow

I direct Aider toward a file or multiple files of interest (with /add src/main.rs or similar)
I describe a commit-sized piece of work to do in 1 or 2 sentences
Aider sends some file contents and my prompt to the LLM and translates the response into a Git commit
I skim the commit and leave it as is, tell Aider to tweak it some more, tweak it myself, or /undo it entirely

This works shockingly well; most of the time, Aider+Claude can get it right on the first or second try. This workflow has a few properties that I really like:

It’s IDE-agnostic (no need to switch to something like Cursor)
It’s very low-friction, which encourages trying things out
1. No need to copy code from a browser, write commit messages, etc.
2. Undoing work is trivial (just delete the Git commit or run /undo)
It’s pay-as-you-go (I pay Anthropic by the token, no monthly subscription)

Prompts

Here are some examples of the prompts I use in Aider:

Library updates should be streamed to all connected web clients over a WebSocket. Add an /updates websocket in the Rust code that broadcasts updated LibraryItems to clients (triggered by a successful call to update_handler). The JS in index.html should subscribe to the WebSocket and call table.updateData() to update the Tabulator table
Add a new endpoint (POST or PUT) for adding new items to the library. It will create a new LibraryItemCreatedEvent, save it to the DB, apply it to the in-memory library, then broadcast the new item over the websocket
add a nice-looking button that bookmarks the current song. don’t worry about hooking it up to anything just yet
Add a new “test-api” command to justfile. It should curl the API exposed by add_item_handler and check that the response status code is CREATED
Write a throwaway C# program for benchmarking the same SQLite insert as create_item() in lib.rs

I’m still developing an intuition for how to write these, but with all of these examples I got results that were correct or able to be fixed up easily. Sometimes I am very precise about what I want, and sometimes I am not; it all depends on the task at hand and how confident I am that the LLM will do what I’m looking for.

What does all this mean?

I dunno! The world is drowning in long-winded AI thinkpieces, so I’ll spare you another one.

All I know for a fact is that if I have a commit-sized piece of work in mind, there’s a very good chance that Claude+Aider can do it for me in less than a minute — today. I’m still exploring the implications of that, but Jamie Brandon’s Speed Matters post feels very relevant. I can try out more ideas and generally be more ambitious with my software projects, which is very exciting.

¹ You can also point Aider at a locally-hosted LLM, which is cool, but in my experience the quality is nowhere near as good as Claude.

How I use LLMs

August 1, 2024

3 minute read

I find LLMs to be pretty useful these days. I don’t consider myself to be on the frontier of LLM experimentation, but when I talk to (technical) people it sounds like my workflow is pretty uncommon, so I should probably write about it.

LLM (the command-line tool)

Simon Willison’s llm command-line tool is the primary way I use LLMs. I sometimes struggle to describe the appeal of llm to people because it’s boring. llm lets you do the following with any popular LLM (hosted or local):

Ask the LLM one-off questions (optionally taking stdin as context)
Start a chat session (optionally starting from the last ad-hoc question)

And that’s about it! It’s one of those lovely tools that does a few things well. I usually start sessions with exploratory questions/requests, sometimes piping in data:

cat xycursor.rs | llm "the end() function in this file is confusing, explain it"

And then if I need to follow up on a question, llm chat --continue drops me into an interactive chat that starts after the last question+response:

> llm chat --continue
Chatting with claude-3-5-sonnet-20240620
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> write a comment explaining that function, using ASCII diagrams

Important things about this workflow:

It’s trivial to “connect” the LLM to other data+files
1. For example, every week I used to manually rewrite the output of this script to be more readable before publishing it; now I pipe it to llm and tell it to do an initial rewrite first
llm makes it trivial to go from exploratory work to more focused iteration

I have llm set up to use this custom prompt, no matter what underlying LLM it’s using. I find that it helps make responses much more succinct.

GitHub Copilot

It’s good, I use it every day. It’s a lot more widely known than llm so I won’t spill too much ink over it.

Observations

I use LLMs and web search in a similar way: do a quick exploratory investigation into something, taking the initial results with a grain of salt. The skills+knowledge you need to evaluate Google results are very similar to the ones you need to evaluate LLM results!

I mostly use LLMs for computer stuff, and it’s often really easy to verify whether a programming/computing answer is any good; just try it out! LLMs are probably not quite as useful for fields where that’s not the case.

I’m happy with llm but it is, ultimately, a wrapper around a basic chat interface and we can probably do better. Claude Artifacts is very appealing in that it can offer a faster iteration cycle for web development (but is unfortunately coupled to an expensive subscription service), and Aider is interesting as a better way to give an LLM access to context from an entire code base. I’m hoping we’ll see more tools like these that extend what we can do with LLMs.

My home-cooked software

January 5, 2024

3 minute read

I reread Robin Sloan’s post about a “home-cooked” app, and it continues to resonate; there’s something special about writing software for very small audiences. In that vein, here’s something that my partner and I have been using since last year:

We needed a solution for shared notes that works well across Android, iOS, and the (desktop) web… so I made one. We use it a lot, mostly for things like our grocery list and cooking notes. It’s simple, fast, and it works for us.

Technology-wise, it’s just a website that works well on desktop and mobile. The back-end is written in Rust, it uses HTMX for some dynamic updates, and the note contents get saved to a SQLite database (backed up to S3 with Litestream). The whole thing compiles down to a single-file executable that gets run on a cheap VPS.

Many corners were cut during the development of this project! Proper collaborative text editing is hard, and I took a quick-and-dirty approach. Eventually I’ll get it working better with Automerge or whatever, but for now I have something good enough for a user base of 2.

And now it’s really easy to do stuff with our shared notes:

That’s a cheap 7" e-ink display with a Raspberry Pi on the back, showing a customized view of our grocery list plus enough information to answer the question “do I need an umbrella today?”. It’s like a little household dashboard that we can take a quick look at as we head out the door. Eventually I need to make a frame for it and mount it on the wall, but it’s pretty handy as-is.

Why do all this?

Shared notes are a solved problem; I’m sure I could have found an existing service for this. But:

I like having a project to tinker on for someone I care for, in the same way that I like cooking for them
This is permanent in a way that cloud services are not. It will never change unless we want it to, and it will work for decades without much effort. We might be using this in a retirement home one day
It’s trivial to tweak and extend software I wrote myself. I didn’t need to figure out an API to get the e-ink display working; I just built a new HTML view on top of existing data
Software designed for an audience of 2 is able to be much simpler than software that anticipates the needs of a large+diverse user base

Bonus: Burninator.exe

Now that I’ve shown you something useful, here’s some much sillier home-cooked software:

A friend asked:

What is a program that I can run in the background to raise the temperature of a laptop? If my laptop gets too cold the screen starts to flicker…

I wasn’t aware of any, so I wrote one that queries WMI for the current CPU temperature and then does busy work until the temperature is high enough. Is it dumb? Yes. Does it solve this one person’s very specific problem? Also yes!
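The control loop described above might look roughly like this in Rust. It is a sketch under assumptions and not the actual Burninator code: the sensor is passed in as a closure because querying WMI is outside the standard library, and warm_up, busy_work, and the max_rounds safety cap are hypothetical.

```rust
/// Spins the CPU until `read_temp_c` reports at least `target_c`, or until
/// `max_rounds` bursts of busy work have run. Returns the number of bursts.
fn warm_up(mut read_temp_c: impl FnMut() -> f64, target_c: f64, max_rounds: u32) -> u32 {
    let mut rounds = 0;
    while rounds < max_rounds && read_temp_c() < target_c {
        busy_work(100_000);
        rounds += 1;
    }
    rounds
}

/// Pointless arithmetic; `black_box` keeps the optimizer from deleting it.
fn busy_work(iterations: u64) -> u64 {
    let mut acc = 0u64;
    for i in 0..iterations {
        acc = std::hint::black_box(acc.wrapping_mul(31).wrapping_add(i));
    }
    acc
}
```

Taking the sensor as a closure also makes the loop easy to exercise with a fake temperature source.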

Category: Software
