Category: Databases

Been a while since the last update, I’ve been busy with the new job. Things have been going well!

I went to New York in March for my first week at Datadog, which was a good way to start the job; I met other new joiners, met people on my team, saw a bit of the city. And the view from the office is alright:

Datadog operates at a much larger scale than any company I’ve ever worked at, and that has some upsides and downsides (but mostly upsides). I feel lucky to work with a lot of smart, enthusiastic people.

Last week I was in New York again, for Datadog’s annual DASH conference. I was helping run a booth for my team and it was good to talk to users (and potential users) in person. I got to see a bit more of the city outside of work; the Intrepid Museum was a highlight (an aircraft carrier! a Space Shuttle! a submarine!).

Outside of work, I’ve been spending a little bit of time on Nushell (but not as much as I would like). I’ve been driving some changes to the explore interactive pager (which reminds me, I need to update that documentation). I’m trying to get it to a point where I’m happy with it for version 1.0; I’m not quite there yet but I’ve made a lot of changes under the hood.

Nonfiction I read recently

What Goes Around Comes Around… And Around… (⭐⭐⭐⭐) Michael Stonebraker’s back for another opinionated overview of databases, this time with Andy Pavlo. The whole thing is good but I particularly like their take on vector databases:

They are single-purpose DBMSs with indexes to accelerate nearest-neighbor search. RM DBMSs should soon provide native support for these data structures and search methods using their extendable type system that will render such specialized databases unnecessary

On a related note, I like Simon Willison’s point that maybe you don’t need vectors for RAG:

The more time I spend with this RAG pattern (ed: one using full-text search) the more I like it. It’s considerably easier to reason about than RAG using vector search based on embeddings, and can provide high quality results with a relatively simple implementation.

Fiction I read recently

Lonesome Dove (⭐⭐⭐⭐⭐) This was described to me as “the Western novel to read if you don’t normally read Westerns” and yeah, it was great.

Glorious Exploits (⭐⭐⭐⭐) Funny + touching story about 2 unemployed potters in 412 BC who decide to put on a play with imprisoned Athenian soldiers as the cast.

Microserfs (⭐⭐⭐⭐) Strange that I hadn’t read this before, but Douglas Coupland has a not-entirely-positive reputation in his hometown. He’s kinda known as the guy who got too many undeserved public art commissions around here. Anyway! It was a really fun read and it felt like it could have been written yesterday (surprising for a book about the tech industry written nearly 30 years ago).

A bunch of Horatio Hornblower and Richard Bolitho books (⭐⭐⭐) I was looking for something along the same lines as the excellent Aubrey-Maturin series. These weren’t quite it. Hornblower isn’t very fun as a protagonist and Bolitho is a boring one, I couldn’t make it very far into either series.

Recent Nushell/Rust Work

SQLite, file watcher, windows-rs

I’ve joined the Nushell core team. This doesn’t really change what I’m doing day-to-day, but it makes my work on Nu feel a little more official 🙂.

SQLite Support

This is the biggest feature I’ve implemented so far:

I’m pretty proud of how this turned out; it’s very convenient to be able to browse SQLite databases in your shell and interact with them the same way you would any other data source. Nu is often-but-not-always smart enough to avoid unnecessary work when loading things from the database; there’s still some work to do here and it will probably involve rearchitecting how Nushell queries data.

File watcher

I also implemented a watch command that runs arbitrary Nu code in response to file changes. Nothing groundbreaking, but I find myself needing this kind of low-key automation all the time: run tests when code changes, restart a development server, log changes in a directory, etc. I think the ability to respond to file changes should be a more widely available primitive, and now it is.

Rust for Windows

Against all odds, I somehow got sucked back into Windows development. I spent a solid week helping one of Nushell’s dependencies do a big upgrade of their Windows functionality. This required a deep dive into the current state of calling Windows APIs from Rust, and… it’s a mixed bag.

I used the windows crate which is maintained by Microsoft. It’s an automatically generated set of Rust bindings for Windows APIs, which is both good (very comprehensive, always kept up to date) and bad (some rough edges that might be solved in a handmade solution like winapi). The crate is actively being worked on and it frequently has breaking changes; this means documentation is a little scarce and often out of date. Overall I was impressed and I think the crate has a bright future . But until it settles down a bit, expect some growing pains.

I’m officially unemployed again 😎. In the interest of self-accountability, I’m going to try to document what I’m up to on my break; expect more frequent updates to this blog.

HYTRADBOI

I bought a ticket to Have You Tried Rubbing A Database On It?, which could loosely be described as a hipster database conference; lots of people using databases in unusual ways, not much in the way of enterprise RDBMSs. The speaker list is like a Who’s Who for offbeat database work, and I’m really looking forward to it.

Nushell

I’ve been using Nushell as my shell on both Windows and Linux, about half the time. Nushell is a fascinating project; it’s a shell that operates on structured data like PowerShell, but without PowerShell’s (many) pain points.

Nushell has recently seen some massive upgrades (the parsing and evaluation engine was completely rewritten) and it’s a very good time to give it a try. It’s still early days, but I’m hopeful Nushell will be able to displace POSIX shells; it’s liberating to work with much richer data types than plain text:

Nu is a way of saying “what if we didn’t need tools like awk so often?” Since you’re working with structured data, as we add more support for file types, it’s less often you need to reach for “awk”, “jq”, “grep”, and the array of other tools to open and work with common file types. In a way, it’s taking the original spirit of Unix — where you use pipelines to combine a set of tools — and imagining how that original spirit would work today, with what we know about programming languages and tools.

Building data-centric apps with a reactive relational database

This essay touches on a lot of my favourite things: SQLite! The intersection of native apps and web UI! iTunes clones! In a nutshell, it’s a very cool approach to building GUI applications in which all of the application’s state lives in a local database.

It’s more of a provocation than a fully finished system, but I think it shows promise. I’d like to see a bit more investigation of “escape hatches”; how hard would it be mix in a little imperative code when SQL/SQLite aren’t the right fit for a task? Also, this was a bit depressing:

One challenge has been inter-process communication. When the reactive graph is running in the UI thread and the SQLite database is on a web worker or native process, each query results in an asynchronous call that has to serialize and deserialize data.

This would have been a non-issue in traditional GUI code (just query SQLite on a background thread in the same process); one more thing we lose as web UI takes over, I guess 😞.

I recently spent a bit of time messing around with the Postgres wire protocol. I’d like to build a library that lets any application accept connections as if it were a Postgres database; many database clients and databases speak the Postgres wire protocol, and it seems like a practical way to expose tabular data over the network.

Implementing the protocol isn’t too bad; the docs aren’t great but the protocol itself is relatively straightforward. Unfortunately, client behaviour seems harder to accommodate. Clients can issue arbitrarily complex SQL queries to the database to inspect its schema, and sometimes they expect accurate answers. For example, Npgsql issues this beast and then disconnects if it gets an invalid response.

So I’m stuck with a question: how far should I go to support legitimate client behaviour outside of the protocol? The easiest approach for now is to test with a few popular clients and hardcode responses for any queries that they need implemented, but that will leave a long tail of unsupported clients. On the other end of the complexity scale, I could try running schema queries on an in-memory SQLite instance with database objects that mimic Postgres (certainly a massive yak shave, but tempting nonetheless).

I think my next step will be to dig through the source of a few Postgres-compatible databases to see how they solve this.

Managing Postgres with Pulumi and Terraform

Adventures with Infrastructure as Code

I’m a big fan of Pulumi, and I’ve previously used it for cloud infrastructure. That’s their main selling point; Pulumi’s slogan is literally “Program the cloud.” Interestingly, Pulumi can also be used to manage on-premise infrastructure, including PostgreSQL databases. Let’s dive into the details.

Pulumi basics

Pulumi takes a declarative infrastructure definition and handles provisioning said infrastructure, just like Terraform, AWS CloudFormation, and Azure Resource Manager. The main difference is that in Pulumi you’re writing real code (Node, Python, or Go) to build that infrastructure definition instead of a YAML or JSON file, it’s great.

Pulumi/Terraform PostgreSQL Provider

After Pulumi has executed your code to build up an infrastructure definition, it needs to interact with external infrastructure resources (cloud providers, databases, etc.) to turn that definition into a reality. Some of Pulumi’s functionality for working with external resources is derived from Terraform providers. For example, Pulumi has a PostgreSQL provider which is derived from the Terraform PostgreSQL provider.

At a high level, the PostgreSQL provider lets you define databases, roles, schemas, permissions, and PostgreSQL extensions. It stops short of letting you manage DDL objects and data within the database (which is probably for the best given the complexities of schema+data migrations).

Defining a Postgres database with TypeScript

Pulumi can be used to instantiate entire Postgres clusters, but for this tutorial let’s assume you already have an existing cluster managed outside of Pulumi. It can be running anywhere: in the cloud, in a container, on your local machine, whatever.

First, install Pulumi then instantiate a new TypeScript project with pulumi new typescript. This scaffolds a bare-bones Node+TypeScript Pulumi project with the following files:

index.ts                package-lock.json       tsconfig.json
Pulumi.yaml             node_modules            package.json

Install the Pulumi PostgreSQL provider with npm i @pulumi/postgresql (or just edit your package.json directly).

Next, we need to tell Pulumi which Postgres cluster to connect to and how. The PostgreSQL provider’s configuration points are documented here. My Postgres cluster is running on a local VM with the hostname fedora-vm, so I set the postgresql:host variable like so:

pulumi config set postgresql:host fedora-vm

This creates a YAML configuration file for the current Pulumi stack, or modifies one if it already exists. For my dev stack, this creates a file named Pulumi.dev.yaml with the following contents:

config:
  postgresql:host: fedora-vm

You’ll want to do the same for postgresql:username and postgresql:password, to tell Pulumi which credentials to use. Sensitive configuration values like passwords can be encrypted using the --secret flag.

Finally, we’re ready to work with the fun stuff: configuring a database in real TypeScript code. Let’s say we want to instantiate a Pulumi-managed database, create a role with login permissions, and grant the role SELECT permission on tables in the public schema in the new database. This can all be done in the index.ts file like so:

import * as pulumi from "@pulumi/pulumi";
import * as postgresql from "@pulumi/postgresql";

const config = new pulumi.Config();

const managedDatabase = new postgresql.Database("managedDatabase", {
  name: "pulumi"
});

const publicReaderRole = new postgresql.Role("publicReaderRole", {
  login: true,
  name: "public_reader",
  password: config.require("publicReaderPassword")
});

const publicReaderSelectTablesGrant = new postgresql.Grant(
  "publicReaderSelectTablesGrant",
  {
    database: managedDatabase.name,
    objectType: "table",
    privileges: ["SELECT"],
    role: publicReaderRole.name,
    schema: "public"
  }
);
headshot

Cities & Code

Top Categories

View all categories