
1 DBs that destroy data    slide

  • Bad DBs that lose data
  • But also other DBs where destroying data is part of the normal operation

2 DBs that destroy data    slide

  • Bad DBs that lose data
  • But also other DBs where destroying data is part of the normal operation
  • Turns out, that's most of the current DBs

3 Imagine if    slide

  • git threw away your old code every time it stored new code
  • Ledgers erased old entries
  • Log files discarded old entries (well they do, that's why they suck)

4 Modeling problems with mutable DBs    slide

  • No explicit time model, so the past is forgotten
  • Important events must be timestamped manually (tedious, error-prone & easy to miss)
  • Complex queries are needed for logic involving history
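
A minimal sketch of the alternative, using Python's built-in sqlite3 module: facts are only ever appended, and a transaction number stands in for time, so "what did this look like back then?" becomes an ordinary query. The table name facts, the helpers assert_fact and value_as_of, and the case-42 data are all hypothetical, made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE facts (
        tx        INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order stands in for time
        entity    TEXT NOT NULL,
        attribute TEXT NOT NULL,
        value     TEXT NOT NULL
    )
    """
)


def assert_fact(entity, attribute, value):
    """Append a new fact (never UPDATE or DELETE) and return its tx number."""
    with conn:  # commits on success
        cur = conn.execute(
            "INSERT INTO facts (entity, attribute, value) VALUES (?, ?, ?)",
            (entity, attribute, value),
        )
    return cur.lastrowid


def value_as_of(entity, attribute, tx):
    """Return the value that was current at transaction `tx`, or None."""
    row = conn.execute(
        "SELECT value FROM facts"
        " WHERE entity = ? AND attribute = ? AND tx <= ?"
        " ORDER BY tx DESC LIMIT 1",
        (entity, attribute, tx),
    ).fetchone()
    return row[0] if row else None


t1 = assert_fact("case-42", "status", "open")
t2 = assert_fact("case-42", "status", "closed")

print(value_as_of("case-42", "status", t1))  # the earlier value is still queryable
print(value_as_of("case-42", "status", t2))
```

Nothing here needs a manual timestamp column: the time model lives in the storage itself, which is roughly the property the later slides attribute to Datomic.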

4.1 Examples    notes

4.1.1 Click    slide

  • case_update
  • How about patient history?
  • How about doctor history (e.g. where a doctor worked, from when to when)?

4.1.2 Bnb    slide

  • point_record
  • A coupon was used after it had expired. Its expiration date may have been changed back & forth, but we have no way to know. And what if the admin decides to change that expiration date again…
  • For group-based coupons, we want to check whether the user belonged to the group at the time we sent out the coupon email, not at the current moment
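
The group-membership question above could be answered like this if membership changes were kept as append-only events. This is only a sketch: MembershipEvent, was_member, and the sample data are hypothetical names, not taken from the real system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MembershipEvent:
    at: datetime
    user: str
    group: str
    joined: bool  # True = joined the group, False = left it


def was_member(events, user, group, at):
    """Replay events up to and including `at`; return the membership state then."""
    member = False
    for e in sorted(events, key=lambda e: e.at):
        if e.at > at:
            break
        if e.user == user and e.group == group:
            member = e.joined
    return member


events = [
    MembershipEvent(datetime(2012, 8, 1), "alice", "vip", True),
    MembershipEvent(datetime(2012, 9, 10), "alice", "vip", False),
]
email_sent_at = datetime(2012, 9, 1)

# Membership when the coupon email went out, versus membership "now".
print(was_member(events, "alice", "vip", email_sent_at))
print(was_member(events, "alice", "vip", datetime(2012, 9, 18)))
```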

4.1.3 Academy    slide

  • Some invoices are associated with a plan, then disassociated, then associated with another plan. The fact that they were associated with the previous plan is lost
  • How do we keep track of the fact that a student was enrolled in a class, then removed, then re-enrolled?
  • How about a class being rescheduled: what did its initial schedule look like?
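
For the enrollment case, a small sketch of how an accumulated event list can be folded into enrollment periods, so that "enrolled, removed, re-enrolled" stays visible. The function enrollment_periods, the "enroll"/"remove" action names, and the dates are assumptions made for illustration.

```python
from datetime import date


def enrollment_periods(events):
    """Fold (day, action) events into (start, end) periods.

    `end` is None for a period that is still open.
    """
    periods = []
    start = None
    for day, action in sorted(events):
        if action == "enroll" and start is None:
            start = day
        elif action == "remove" and start is not None:
            periods.append((start, day))
            start = None
    if start is not None:
        periods.append((start, None))
    return periods


events = [
    (date(2012, 1, 9), "enroll"),
    (date(2012, 3, 1), "remove"),
    (date(2012, 4, 2), "enroll"),
]

for start, end in enrollment_periods(events):
    print(start, "->", end)
```

With a mutable enrolled flag on the student row, only the last state would survive; here both periods can be recovered.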

5 Operational problems of mutable DBs    slide

  • Scaling problems. Destroying old data to store new data leads to consistency problems
  • Fear of round-trips. This forces us to do as much as possible in one query (to avoid N+1). It also leads to Postgres extensions that extend the DB's processing capability (which is actually a good thing in the face of mutability, but eliminating mutability allows much greater flexibility and query power: just hand the data to everyone)
  • Fear of overloading the server (even with read queries). Again, if the data in the DB is immutable, everyone can keep a copy of it to play around with for as long as needed
  • We interface with the DB through functions of the DB connection, not of the data (identity, not value)! We want the data inside (the value), not the connection/result set (the identity)
  • Something went wrong and we cannot ask the DB what the data looked like when the event happened. Instead we need to look through logs (files!!!) and hope to catch some traces
  • We don't trust the DB, so we need to add logging to debug problems (log files are somewhat "more" immutable than the DB, since they are normally written once!)

6 git over svn    slide

The key reason git is better is that it cares mostly about facts (immutable data + timestamps). That allows it to do all sorts of things: distribution, decentralization, operation without a network.

The past doesn't change. Data is accumulated, not overwritten.

7 Datomic    slide

  • Decouple identity & value (value-oriented model instead of place-oriented)
  • Relational (but not SQL)
  • Immutable data
  • Horizontally scalable query capability
  • Caching aggressively
  • Database can be put behind CDN!
  • Transaction functions: (f db & args) -> tx-data (entire DB value as the argument, not the connection)
  • Independent query engines
  • Time queries: db.asOf, db.since, db.asIf, queries joining & comparing past & present data
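
To make the "(f db & args) -> tx-data" shape and the as-of idea concrete, here is a toy model in Python. It is not Datomic's API: Datom, Db, with_tx, and add_points are hypothetical names, and the whole database is modeled as an immutable tuple of facts.

```python
from typing import NamedTuple, Tuple


class Datom(NamedTuple):
    entity: str
    attribute: str
    value: object
    tx: int


class Db(NamedTuple):
    """An immutable database value: a tuple of datoms plus the latest tx number."""
    datoms: Tuple[Datom, ...] = ()
    basis_t: int = 0

    def as_of(self, t):
        """Return the database value as it was at transaction `t`."""
        return Db(tuple(d for d in self.datoms if d.tx <= t), min(t, self.basis_t))

    def value(self, entity, attribute, default=None):
        """Return the most recently asserted value of entity/attribute."""
        matches = [d for d in self.datoms
                   if d.entity == entity and d.attribute == attribute]
        return max(matches, key=lambda d: d.tx).value if matches else default


def with_tx(db, tx_data):
    """Return a NEW db value that includes tx_data; `db` itself is untouched."""
    t = db.basis_t + 1
    return Db(db.datoms + tuple(Datom(e, a, v, t) for e, a, v in tx_data), t)


def add_points(db, user, amount):
    """A transaction function: takes the db VALUE and args, returns tx-data."""
    current = db.value(user, "points", 0)
    return [(user, "points", current + amount)]


db0 = Db()
db1 = with_tx(db0, add_points(db0, "alice", 10))
db2 = with_tx(db1, add_points(db1, "alice", 5))

print(db2.value("alice", "points"))           # present
print(db2.as_of(1).value("alice", "points"))  # past, still queryable
```

Because add_points only sees a value, it can be run, tested, or cached anywhere; no connection is involved until the resulting tx-data is actually transacted.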

8 Datomic    slide

  • Decoupling DB components makes "cool" stuff like this mundane & trivial
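-- Temporarily ignore inserts whose primary key already exists in my_table
CREATE RULE "my_table_on_duplicate_ignore" AS ON INSERT TO "my_table"
  WHERE EXISTS (SELECT 1 FROM my_table
                WHERE (pk_col_1, pk_col_2)=(NEW.pk_col_1, NEW.pk_col_2))
  DO INSTEAD NOTHING;
INSERT INTO my_table SELECT * FROM another_schema.my_table WHERE some_cond;
DROP RULE "my_table_on_duplicate_ignore" ON "my_table";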

9 A little off-topic    slide

The mutable data & time model also applies to static assets, which can be versioned (e.g. by git hash), much like putting the DB behind a CDN. Mutable (unversioned) static assets on a CDN are just asking for trouble. CDNs work better with immutable data.

This is an example of place-oriented, mutable thinking that causes trouble:

<script type="text/javascript" src="//site.js" ></script>

Better to "dereference" it to a value by using a "timestamp":

<script type="text/javascript" src="//6ff6529826a92cf01aa1d52a134de07d0292e741/site.js" ></script>
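
One way such URLs could be generated, as a rough sketch. The example above uses a git hash as the "timestamp"; this sketch assumes a SHA-1 of the file content instead, and versioned_url and the cdn.example.com host are hypothetical.

```python
import hashlib


def versioned_url(base_url: str, filename: str, content: bytes) -> str:
    """Build a URL that names one immutable value of the asset.

    Assumption: the version is the SHA-1 of the content, so a changed file
    gets a new URL and the old URL keeps pointing at the old bytes.
    """
    digest = hashlib.sha1(content).hexdigest()
    return f"{base_url}/{digest}/{filename}"


old = versioned_url("//cdn.example.com", "site.js", b"console.log('v1');")
new = versioned_url("//cdn.example.com", "site.js", b"console.log('v2');")

print(old)
print(new)
```

Since a given URL never changes meaning, the CDN can cache it forever and no invalidation is needed.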

Date: 2012-09-18T19:17+0700

Author: Nguyễn Tuấn Anh
