Category: Databases

Quick T-SQL Regression Testing

Using CHECKSUM() and CHECKSUM_AGG()

Here’s something I’ve found useful in SQL Server, but it should apply in any DBMS with checksum functions. Next time you’re refactoring some database objects and you want to do some regression testing, give the CHECKSUM (MSDN) and CHECKSUM_AGG (MSDN) functions a try.

CHECKSUM and CHECKSUM_AGG

They behave pretty much as you would expect; CHECKSUM returns 1 checksum given 1 row, and CHECKSUM_AGG is an aggregate function that returns 1 checksum given multiple checksum rows. Between the two of them, you can get a checksum for any arbitrary collection of data:

-- Returns 2 rows with 2 columns
SELECT Field1, Field2
FROM (VALUES ('foo','bar'),('baz','foo')) TempTable(Field1,Field2)

-- Returns 2 rows with 1 checksum column
SELECT CHECKSUM(Field1,Field2)
FROM (VALUES ('foo','bar'),('baz','foo')) TempTable(Field1,Field2)
 
-- Returns 1 row with 1 checksum column
SELECT CHECKSUM_AGG(CHECKSUM(Field1,Field2))
FROM (VALUES ('foo','bar'),('baz','foo')) TempTable(Field1,Field2)

Regression testing

These help with a common problem: when you’re refactoring a database object, how can you be sure that you haven’t changed the existing functionality? Sometimes it’s enough to spot check a few inputs, but often the logic is complex enough that this does not provide a high level of confidence. With CHECKSUM and CHECKSUM_AGG, you can quickly check large numbers of test cases.

For example, say we have a TVF called MyDateTVF that takes in a date parameter and returns columns Column1 and Column2. I want to refactor this function and then test that the functionality is unchanged for every date in 2018. If we have a table called DateList which contains every date (more useful than you might expect in a DB with a lot of temporal aspects), I can simply run the following before and after my change:

SELECT CHECKSUM_AGG(CHECKSUM(MDT.Column1, MDT.Column2))
FROM DateList DL
CROSS APPLY (
    SELECT Column1, Column2 from dbo.MyDateTVF(DL.Date)
) MDT
WHERE DL.Date BETWEEN '2018-01-01' AND '2018-12-31'

If that query returns the same checksum before and after my change, we can rule out any regressions for those inputs. Nice and easy!

Databases as file formats

The .mbtiles way

I’m building an interactive online map of all properties in Vancouver, and along the way there have been a few pleasant surprises. Most recently: the .MBTiles tileset format is surprisingly cool.

Background

Mapbox is one of the biggest players in the open source mapping space (especially now that Mapzen and Carto have thrown in the towel – Mapzen is closing and Carto is now using Mapbox tech). One of the many nice things about Mapbox is that they developed an efficient open standard for vector map tiles, appropriately named Mapbox Vector Tiles (read this if you’re not sure why vector tiles are great).

Map tiles are often pre-computed for each zoom level, and once you’ve done that you need to store them somewhere. Enter the .MBTiles tileset format.

Poking around under the hood

My first encounter with this file format occurred when I used Eric Fischer’s excellent tippecanoe tool to simplify my data set at lower zoom levels. Tippecanoe generates .mbtiles files, which are easy to serve to clients either by uploading to Mapbox, using a third party tile server, or even by rolling your own server with something like the mbtiles Node.js package.

All great… but after setting up a server my Mapbox GL JS client refused to render the tiles. I tried a few things without much luck, and then as a last resort I decided to poke around in the .mbtiles file. I was expecting to need a hex editor or similar, but then I saw this beauty in the spec:

MBTiles is a specification for storing tiled map data in SQLite databases

The files themselves are just relational databases in a known schema – how cool is that? Emboldened, I grabbed a SQLite client and opened up my .mbtiles file:

headshot

Cities & Code

Top Categories

View all categories