dbs-destroy-data

1 How DBs are destroying data slide

1.1 This is not about low-quality DBs that lose data slide

images/CDShred.jpg

1.2 But about most current DBs slide

With UPDATE and DELETE operations
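A minimal sketch of the point, using Python's built-in sqlite3 module; the `student` table and its values are made up for illustration. After UPDATE the previous value cannot be recovered, and after DELETE nothing shows the row ever existed.

```python
import sqlite3

# In-memory database; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, class TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'math-101')")

# UPDATE overwrites the row in place: 'math-101' is no longer stored anywhere.
conn.execute("UPDATE student SET class = 'physics-201' WHERE id = 1")
after_update = conn.execute("SELECT class FROM student WHERE id = 1").fetchall()

# DELETE removes the row, and with it any sign that it was ever there.
conn.execute("DELETE FROM student WHERE id = 1")
after_delete = conn.execute("SELECT * FROM student").fetchall()

print(after_update)  # [('physics-201',)]
print(after_delete)  # []
```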

1.3 Throw away history on purpose slide

[Logos: PostgreSQL, MySQL, MS SQL Server, Redis, MongoDB]

And most others

2 Why is it bad? slide

"Destroying data" should be reason enough, but let's look at the details

2.1 Modeling problems slide

2.1.1 Manual change tracking slide

  • Sooner or later, NOW is inadequate
  • Ad hoc, incompatible approaches
    • Timestamps for important fields
    • Special tables keeping data deltas, with no sensible way to query them
  • Tedious, error-prone
    • Big problem in click, bnb, and probably soon academy
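A rough illustration of the "timestamps for important fields" approach, assuming a made-up `bid` table with an `updated_at` column. The timestamp tells us when the amount last changed, but not what it was before, and earlier changes leave no trace at all.

```python
import sqlite3

# Hypothetical schema: a hand-maintained updated_at column next to the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bid (id INTEGER PRIMARY KEY, amount INTEGER, updated_at TEXT)"
)
conn.execute("INSERT INTO bid VALUES (1, 100, '2012-12-01T10:00')")

# Every writer must remember to touch updated_at by hand.
conn.execute(
    "UPDATE bid SET amount = 150, updated_at = '2012-12-02T09:30' WHERE id = 1"
)

# We learn *when* the last change happened, but the old amount (100) is gone.
row = conn.execute("SELECT amount, updated_at FROM bid WHERE id = 1").fetchone()
print(row)  # (150, '2012-12-02T09:30')
```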

2.1.2 Complicated queries and logic slide

  • Some examples; in fact, most of the complexity in many projects comes from keeping only NOW data
  • Academy
    • Which students were removed from a class, then re-enrolled?
    • What did the class schedule look like before this reschedule? How about last week?
  • Bnb
    • Which bids were placed at the last possible moment?
    • Which users bid aggressively even with few points left?
    • Which products frequently run short? When?
  • Click
    • How often do responders change opinion?
    • What was the doctor's prognosis for this case last week?
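For contrast, here is a sketch of how the first Academy question becomes a plain query once history is kept. It assumes a hypothetical append-only `enrollment_event` table instead of a table that only stores current enrollment; names and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical append-only log: rows are only ever inserted, never changed.
conn.execute(
    "CREATE TABLE enrollment_event (student TEXT, class TEXT, action TEXT, at TEXT)"
)
conn.executemany(
    "INSERT INTO enrollment_event VALUES (?, ?, ?, ?)",
    [
        ("an", "math-101", "enroll", "2012-09-01"),
        ("binh", "math-101", "enroll", "2012-09-01"),
        ("an", "math-101", "remove", "2012-10-15"),
        ("an", "math-101", "enroll", "2012-11-02"),
        ("binh", "math-101", "remove", "2012-11-20"),
    ],
)

# "Removed, then re-enrolled": an enroll event that comes after a remove
# event for the same student and class. ISO dates compare correctly as text.
re_enrolled = conn.execute(
    """
    SELECT DISTINCT r.student, r.class
    FROM enrollment_event AS r
    JOIN enrollment_event AS e
      ON e.student = r.student AND e.class = r.class
    WHERE r.action = 'remove' AND e.action = 'enroll' AND e.at > r.at
    ORDER BY r.student
    """
).fetchall()

print(re_enrolled)  # [('an', 'math-101')]
```

With only the current enrollment stored, the same question cannot be answered at all, because the removal was overwritten.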

2.2 Operational problems slide

2.2.1 Fear of round-trips slide

  • Get all the needed data before it is destroyed
  • Often the logic is so complex that round-trips are unavoidable

2.2.2 Fear of overloading the DB server slide

  • Because it is a bossy stuck-up fragile little princess
  • Giving it more power (for fear of round-trips) makes it worse

images/bossy.jpg

2.2.3 Errors are hard to track down slide

  • When data is corrupted, we don't know exactly when
  • If we know when the error happened, we have no clue what the data looked like
  • When something just randomly happens, we have no useful trace, at all

2.2.4 Redundant data is problematic slide

  • Hard to reconcile inconsistency without historical records
  • Unavoidable in many cases, for performance

3 Is there any hope? slide

Immutable data with built-in time model

Next time
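Until then, a toy sketch of what "immutable data with a time model" could mean. This is an assumption made for illustration, not any particular database's design: facts are only appended, each carries a timestamp, and reads ask what was true as of a given moment. The `Fact` and `Store` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a recorded fact can never be modified
class Fact:
    entity: str
    attribute: str
    value: object
    at: str  # ISO-8601 timestamp of when the fact became true


class Store:
    """Append-only store: there is no update and no delete."""

    def __init__(self):
        self._facts = []

    def record(self, entity, attribute, value, at):
        self._facts.append(Fact(entity, attribute, value, at))

    def as_of(self, entity, attribute, at):
        """Return the latest value known at time `at`, or None."""
        known = [
            f for f in self._facts
            if f.entity == entity and f.attribute == attribute and f.at <= at
        ]
        if not known:
            return None
        return max(known, key=lambda f: f.at).value


store = Store()
store.record("class-7", "schedule", "Mon 09:00", "2012-12-01")
store.record("class-7", "schedule", "Wed 14:00", "2012-12-15")

print(store.as_of("class-7", "schedule", "2012-12-10"))  # Mon 09:00
print(store.as_of("class-7", "schedule", "2012-12-19"))  # Wed 14:00
```

The reschedule is just another fact, so "what did the schedule look like last week?" is an ordinary read rather than a forensic exercise.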

4 Before you go slide

Choosing place-oriented over value-oriented programming is the root problem

Do yourself a favor and watch this:

http://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey

Date: 2012-12-19T15:07+0700

Author: Nguyễn Tuấn Anh
