#DBHangOps 01/08/15 -- More Compression!

Hello everybody!

Join in #DBHangOps this Thursday, January, 08, 2015 at 11:00am pacific (18:00 GMT), to participate in the discussion about:

More compression!
~~MySQL Event Scheduler~~
~~What is everyone's understanding of innodb_io_capacity?~~

You can check out the event page at https://plus.google.com/events/cb7pdn3egenc6c79lc4bc74kr04 on Thursday to participate.

As always, you can still watch the #DBHangOps twitter search, the @DBHangOps twitter feed, or this blog post to get a link for the google hangout on Thursday!

See all of you on Thursday!

You can catch a livestream at:

Show Notes

Compression

TokuDB compression

Quick review of TokuDB data structures
- Fractal Indexes in TokuDB are essentially a B+ Tree
- Compression was added after the original design was completed for this storage engine
- Fractal tree nodes are very large.
- This makes for great compression benefits
- By default, all nodes are 4MB (InnoDB pages are 16KB) and the same size (root nodes and leaf nodes)
  - As a result of spinning disks, this was the preferred size to better utilize disk bandwidth (circa 2008-2009)
- What's in a fractal tree leaf node?
- the first byte of a node's header is a byte to indicate compression algorithm (e.g. QuickLZ, LZMA, SNAPPY)
- The header also contains record ranges and offsets in the existing page (pointers to "basement nodes")
- Entire rows are stored inline in "basement nodes" (InnoDB stores a portion of records inline)
- MVCC data is stored inline (InnoDB does something else...)
- Why "Basement nodes"?
  - If running a point query, TokuDB only needs to fetch a basement node (64KB compressed by default) instead of the whole 4MB node
Compression in TokuDB!
- it's a simple 2 step process:
- Compress each basement node (this can be done in parallel)
- Write out a leaf node
  - compress basement nodes into a continues byte stream and create a new header with updated offsets for the compressed basement nodes
- After compression happens, the compressed nodes are written as-is (e.g. a compressed node that is down to 1MB is just written. No standardized compressed format is written)
- Lessons Learned
- Rotating disks
  - Higher compression is usually a win
  - The cost of an IO (milliseconds) is far greater than the cost of decompression (microseconds)
  - Having a basement node of 64KB or 128KB with LZMA/ZLib compression is usually a huge savings
- SSDs
  - The cost of an IO (microseconds) might be less than the cost of decompression
  - QuickLZ or SNAPPY compression and a 16KB basement node is probably better here
  - Depends more on the workload

Quick InnoDB Compression Comments

How does InnoDB manage storing rows inline?
- It depends on the storage engine version. In Barracuda:
  - if the row fits in the it'll be stored inline. If data can't fit inline, it'll punt it off-page (e.g. BLOB fields) and leave a 20-byte pointer to the off-page data
InnoDB Compression Future
- Using filesystems that support sparse files to allow for transparent page compression
- Newer versions of Ext4, ZFS, XFS, and NTFS should support this
- Transparent page compression eliminates complexity around key block sizing with your page size
- InnoDB will be able to set punch hole support for any compressed data pages it's writing out which saves on disk usage
- This will give better performance and make InnoDB's compression configuration a lot easier.

#DBHangOps!

Follow Us On

#DBHangOps 01/08/15 -- More Compression!

Show Notes

Compression

TokuDB compression

Quick InnoDB Compression Comments

Links