Talk shop and learn about MySQL and occasionally some other stores!

Topics and RecordingsLearn From Previous Discussions!

How to JoinSome simple setup help!

Follow Us On


#DBHangOps 01/08/15 -- More Compression!

Hello everybody!

Join in #DBHangOps this Thursday, January, 08, 2015 at 11:00am pacific (18:00 GMT), to participate in the discussion about:

  • More compression!
  • MySQL Event Scheduler
  • What is everyone's understanding of innodb_io_capacity?

You can check out the event page at https://plus.google.com/events/cb7pdn3egenc6c79lc4bc74kr04 on Thursday to participate.

As always, you can still watch the #DBHangOps twitter search, the @DBHangOps twitter feed, or this blog post to get a link for the google hangout on Thursday!

See all of you on Thursday!

You can catch a livestream at:

Show Notes


TokuDB compression

  • Quick review of TokuDB data structures
    • Fractal Indexes in TokuDB are essentially a B+ Tree
    • Compression was added after the original design was completed for this storage engine
    • Fractal tree nodes are very large.
    • This makes for great compression benefits
    • By default, all nodes are 4MB (InnoDB pages are 16KB) and the same size (root nodes and leaf nodes)
      • As a result of spinning disks, this was the preferred size to better utilize disk bandwidth (circa 2008-2009)
    • What's in a fractal tree leaf node?
    • the first byte of a node's header is a byte to indicate compression algorithm (e.g. QuickLZ, LZMA, SNAPPY)
    • The header also contains record ranges and offsets in the existing page (pointers to "basement nodes")
    • Entire rows are stored inline in "basement nodes" (InnoDB stores a portion of records inline)
    • MVCC data is stored inline (InnoDB does something else...)
    • Why "Basement nodes"?
      • If running a point query, TokuDB only needs to fetch a basement node (64KB compressed by default) instead of the whole 4MB node
  • Compression in TokuDB!
    • it's a simple 2 step process:
    • Compress each basement node (this can be done in parallel)
    • Write out a leaf node
      • compress basement nodes into a continues byte stream and create a new header with updated offsets for the compressed basement nodes
    • After compression happens, the compressed nodes are written as-is (e.g. a compressed node that is down to 1MB is just written. No standardized compressed format is written)
    • Lessons Learned
    • Rotating disks
      • Higher compression is usually a win
      • The cost of an IO (milliseconds) is far greater than the cost of decompression (microseconds)
      • Having a basement node of 64KB or 128KB with LZMA/ZLib compression is usually a huge savings
    • SSDs
      • The cost of an IO (microseconds) might be less than the cost of decompression
      • QuickLZ or SNAPPY compression and a 16KB basement node is probably better here
      • Depends more on the workload

Quick InnoDB Compression Comments

  • How does InnoDB manage storing rows inline?
    • It depends on the storage engine version. In Barracuda:
      • if the row fits in the it'll be stored inline. If data can't fit inline, it'll punt it off-page (e.g. BLOB fields) and leave a 20-byte pointer to the off-page data
  • InnoDB Compression Future
    • Using filesystems that support sparse files to allow for transparent page compression
    • Newer versions of Ext4, ZFS, XFS, and NTFS should support this
    • Transparent page compression eliminates complexity around key block sizing with your page size
    • InnoDB will be able to set punch hole support for any compressed data pages it's writing out which saves on disk usage
    • This will give better performance and make InnoDB's compression configuration a lot easier.