I recently read 2 papers that I’d highly recommend if you ever wonder about the underpinnings of modern relational databases:
This is an opinionated history of database architectures from roughly 1970 to 2005, and it’s got some serious credentials behind it – Michael Stonebraker was the driving force behind Postgres and many other things. It’s an easy read, and it’s a good explanation of how relational DBs and SQL came to (mostly) rule the world. It was written amidst a lot of hype over XML databases, and it (rightly, in retrospect) critiques XML DBs and suggests that they won’t see much commercial adoption.
One interesting bit: Stonebraker and Hellerstein do not like SQL, and they claim that it became dominant largely because of IBM’s dominance in the marketplace during the 1980’s. Which leads us to the next item…
This was cited approvingly in the 1st paper as a “scathing critique of the semantics of SQL”, which caught my eye - virtually every database supports SQL, how bad can it be?
I won’t summarize the entire thing, it’s a laundry list of complaints, but this resonated with me:
Notice that it is just table-names that appear in the FROM clause. Completeness suggests that it should be table-expressions (as Gray puts it, “anything in computer science that is not recursive is no good”).
…a simple table-reference (i.e. a table-name) ought to be just a special case of a general table-expression
To give a (trivial) example, say we want to union 2 tables and extract a column. We’d write that like this in SQL:
SELECT EMPLOYEE_ID FROM EMPLOYEES UNION SELECT EMPLOYEE_ID FROM CONTRACTORS
But if table names were just special cases of table expressions, we could write it like this instead:
SELECT EMPLOYEE_ID FROM (EMPLOYEES UNION CONTRACTORS)
SQL gets the job done, and it could be a lot worse - but it is a little sad that many of the issues academics have known about for 35+ years still exist today. It’s a reminder of how entrenched something that’s only “good enough” can become.