#DBHangOps 12/11/14 -- Mixing Metadata with Data, InnoDB Compression, and more!
Hello everybody!
Join in #DBHangOps this Thursday, December 11, 2014 at 11:00am Pacific (19:00 GMT), to participate in the discussion about:
- Mixing Metadata and Data (requested by Shlomi Noach)
- E.g. schema representing some data instead of metadata
- InnoDB Compression (requested by John Cesario)
- How do you performance-tune it in MySQL 5.6?
- Expected metrics changes when enabling it
- Overall performance with InnoDB compression enabled vs. disabled
- Comparing TokuDB and InnoDB compression
You can check out the event page at https://plus.google.com/events/cmiu31cksfj21t4b196hg7mi7jg on Thursday to participate.
As always, you can still watch the #DBHangOps twitter search, the @DBHangOps twitter feed, or this blog post to get a link for the google hangout on Thursday!
See all of you on Thursday!
You can catch a livestream at:
Mixing Data and Metadata
- sometimes you're forced to mix metadata in with your data
- Views and triggers in MySQL, for example
- these are created with a DEFINER and run on behalf of the user account that created them
- When dumping your schema or data, that user account comes along whether you want it or not
- running a mysqldump and then loading the result into another server may fail because the user account doesn't exist on the new server (see the sketch at the end of this section)
- If you create all your views/triggers as the root@localhost account, you may avoid these problems
- The flipside is that root@localhost has A LOT of privileges, not a limited set
- Ultimately, metadata around your data gets exported with it all the time
- If I create a trigger, it's forever owned by my user account or "root@localhost"
- It just...bothers Shlomi!
- How could we improve this?
- Maybe have it run the same way the event scheduler does -- it just runs as the 'mysql' user.
- It almost feels like the entire security model might need to be changed
- Maybe make these objects more a part of the application schema than of the mysql routines table
- How do others deal with this?
- Some people just don't use stored routines at all! The instruments needed to properly monitor their performance are mostly non-existent until newer versions of MySQL (5.7+)
- Views are definitely used more in general. For example, common_schema uses views to supply a lot of helpful functionality.
- There's a bunch of stuff that people may do that ties their data to MySQL specifically
- Table partitioning depends on your data to define the partitioning metadata
- Changes you'll make to your data access patterns may be dependent on the data in your schemas
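To make the mysqldump failure mode above concrete, here's a minimal sketch; the reporting schema, its tables, and the app_admin account are made up for illustration:

```sql
-- On the source server, the view is silently stamped with the
-- account that created it (a hypothetical app_admin@'10.%' here):
CREATE DATABASE reporting;
CREATE TABLE reporting.orders (
  id    INT PRIMARY KEY,
  total DECIMAL(10,2)
);
CREATE VIEW reporting.order_totals AS
  SELECT COUNT(*) AS orders, SUM(total) AS revenue
    FROM reporting.orders;

-- mysqldump exports that metadata right along with the data:
--   CREATE ... DEFINER=`app_admin`@`10.%` SQL SECURITY DEFINER
--   VIEW `order_totals` AS ...
-- Loading the dump on a server that lacks app_admin@'10.%' leaves
-- the view broken until someone creates that account:
--   ERROR 1449 (HY000): The user specified as a definer
--   ('app_admin'@'10.%') does not exist
SELECT * FROM reporting.order_totals;
```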
InnoDB Compression
- Introduced with the InnoDB plugin in MySQL 5.1
- Every company has talked about compression at some point because it helps you save money on expensive storage (like solid-state drives!)
- Some compression issues/gotchas
- Appears to be a big hit on write latencies sometimes
- Also some rough stalls in InnoDB sometimes
- If you can afford some looser SLAs around your writes, then this is probably okay, overall
- Standard InnoDB page size is 16K.
- You can set KEY_BLOCK_SIZE on a compressed InnoDB table to encourage it to pack data into a smaller page size
- e.g. KEY_BLOCK_SIZE=4 tries to pack InnoDB data into 4KB pages (see the first sketch after these notes)
- To get good performance out of InnoDB compression, you really need to know your data
- If InnoDB can't compress a page down to the KEY_BLOCK_SIZE you chose (a "compression miss"), it has to split the page and recompress it
- a compression miss results in higher CPU usage
- INFORMATION_SCHEMA.INNODB_CMP is the table that holds per-page-size statistics about InnoDB compression activity (a monitoring query is sketched after these notes)
- In TokuDB, compression is enabled by default, and all benchmarks are run with compression on.
- Something interesting that was added in MySQL 5.6 was "dynamic padding"
- "dynamic padding" helps with heuristics around compression. If there's a lot of compression misses, MySQL will pad pages in the table with empty space to help data fit.
- The downside of this is that the full 16K data block needs to go into memory, so you'll have some "empty space" when in memory
- When working with compressed innodb tables, some portion of the buffer pool is reserved for compressed pages and some is for regular pages
- To minimize I/O, at times the buffer pool will contain both compressed and uncompressed pages.
- Compression's hard!
- Direct I/O provides the most consistent performance. For read-heavy workloads, more spindles and memory allow the OS to cache the compressed data
- Compression is slower, yet your CPUs may not be 100% utilized -- so is compression itself really the bottleneck?
- the actual compression work isn't what's slowing things down. There seem to be other mutexes inside InnoDB that slow things down.
- The LRU pending flush list and other buffer pool mutexes showed some increased locking when compression was enabled
- How do we solve this problem? Storage is always expensive, so getting these efficiencies is ideal.
- the "compression miss" problem is probably a significantly large portion of this.
- Some challenges with "hole punching" in file systems (which would allow for simple padding of InnoDB pages) arise operationally.
- What compression libraries are used?
- In MySQL 5.5 and 5.6, compression defaults to ZLIB
- Newer labs releases allow you to specify LZ4
- TokuDB supports a wider array of compression libraries (dynamically changeable! -- see the row format sketch after these notes):
- ZLIB
- LZMA
- QuickLZ
- SNAPPY is being benchmarked right now
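To make the KEY_BLOCK_SIZE and INFORMATION_SCHEMA.INNODB_CMP notes above concrete, here's a minimal sketch against MySQL 5.6 (the page_views table and its columns are made up):

```sql
-- Compressed tables need the Barracuda file format and file-per-table
-- tablespaces (set explicitly here for clarity; defaults vary by version).
SET GLOBAL innodb_file_format = 'Barracuda';
SET GLOBAL innodb_file_per_table = ON;

-- Ask InnoDB to pack this table's 16K pages into 4KB compressed pages.
CREATE TABLE page_views (
  id   BIGINT AUTO_INCREMENT PRIMARY KEY,
  url  VARCHAR(2048),
  body TEXT
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;

-- Watch the "compression miss" rate: compress_ops_ok vs. compress_ops.
-- A low ratio means pages keep failing to fit into 4KB and have to be
-- split and recompressed, which burns CPU.
SELECT page_size,
       compress_ops,
       compress_ops_ok,
       compress_ops_ok / compress_ops AS hit_ratio
  FROM INFORMATION_SCHEMA.INNODB_CMP;
```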
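The MySQL 5.6 "dynamic padding" behavior discussed above has a couple of knobs; a sketch showing them at their 5.6 defaults:

```sql
-- Once more than this percentage of compression attempts on an index
-- fail, InnoDB starts leaving pad (empty) space in pages.
SET GLOBAL innodb_compression_failure_threshold_pct = 5;

-- Upper bound on how much of each page may be left as padding.
SET GLOBAL innodb_compression_pad_pct_max = 50;

-- 5.6 also exposes the zlib effort level (1 = fast, 9 = small).
SET GLOBAL innodb_compression_level = 6;
```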
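And a sketch of picking a compression library in TokuDB, where the choice is expressed as a row format and can be changed with a regular ALTER TABLE (the metrics table is made up):

```sql
-- Create a TokuDB table compressed with zlib.
CREATE TABLE metrics (
  id      BIGINT AUTO_INCREMENT PRIMARY KEY,
  payload TEXT
) ENGINE=TokuDB ROW_FORMAT=TOKUDB_ZLIB;

-- Switch to LZMA (smaller output, more CPU); data picks up the new
-- algorithm as blocks are rewritten.
ALTER TABLE metrics ROW_FORMAT=TOKUDB_LZMA;
```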
- Some helpful reading around compression: