Talk shop and learn about MySQL and occasionally some other stores!

#DBHangOps 03/19/15 -- Failing over in the moment, Old and weird bugs, and more!

Hello everybody!

Join in #DBHangOps this Thursday, March 19, 2015 at 11:00am Pacific (19:00 GMT), to participate in the discussion about:

  • Failing over in the moment
    • How do you recognize you need to failover?
    • When is it safe to kill -9 the server?
    • Other thoughts?
  • Old/Weird bugs!
  • GTID for operators -- Have you set it up?

You can check out the event page at https://plus.google.com/events/ch7dvhercc2anl9knnvig02hng4 on Thursday to participate.

As always, you can still watch the #DBHangOps Twitter search, the @DBHangOps Twitter feed, or this blog post to get a link for the Google Hangout on Thursday!

See all of you on Thursday!

Show Notes

Failing over in the moment

  • Failover stories?
    • Etsy runs an active<->active topology, so writes go to both masters
    • Historically, most problems have been hardware trouble that requires some sort of failover
    • Typically set the read_only flag on a machine that is being moved to a slave role
    • Push a configuration change to move traffic from one write master to the other to manage failover
  • In the past, had a custom heartbeat tool to set read_only flags correctly and move customer traffic automatically
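
The heartbeat-plus-read_only approach above can be sketched roughly as follows. This is a minimal illustration, not the tool discussed on the call: the `Master` class, the `needs_failover` and `failover_plan` helpers, and the 10-second silence threshold are all assumptions made up for the example. Only `SET GLOBAL read_only` is real MySQL syntax.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Master:
    """Hypothetical record for one write master."""
    name: str
    last_heartbeat: float  # epoch seconds of the last successful heartbeat


def needs_failover(active: Master, now: float, max_silence: float = 10.0) -> bool:
    """True when the active master has been silent longer than max_silence seconds.

    The threshold is an assumed value; pick one that fits your environment.
    """
    return (now - active.last_heartbeat) > max_silence


def failover_plan(active: Master, standby: Master) -> List[Tuple[str, str]]:
    """Ordered (host, SQL) steps: close the old writer first, then open the new one.

    Note: read_only does not block accounts with the SUPER privilege, and the
    first step cannot run at all if the old master is unreachable.
    """
    return [
        (active.name, "SET GLOBAL read_only = ON"),
        (standby.name, "SET GLOBAL read_only = OFF"),
    ]
```

For example, `failover_plan(Master("db1", 100.0), Master("db2", 118.0))` yields the two statements in the order they would be run. Moving application traffic (the configuration push mentioned above) is a separate step that this sketch does not cover.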

Master<->Master Replication

  • Master<->Master replication is a pretty common pattern for being able to fail over quickly
    • The quick-failover benefit comes when you pin reads/writes to one side
  • Not wholly recommended because of the gotchas that can come up
    • Potential for data drift as writes hit both masters in a topology
  • MySQL 5.6 with GTIDs and something like MySQL Fabric make it easier to do without Master<->Master replication
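
One reason GTIDs reduce the need for a pre-wired Master<->Master pair is that a replica can be repointed at a new master without looking up a binary log file and position. A rough sketch of the statements involved, assuming `gtid_mode` is already ON on both servers; the `repoint_replica_sql` helper and the host name in the usage note are hypothetical, while the `CHANGE MASTER TO ... MASTER_AUTO_POSITION = 1` syntax is from MySQL 5.6:

```python
import re
from typing import List


def repoint_replica_sql(new_master_host: str, port: int = 3306) -> List[str]:
    """Build the statements that point a GTID-enabled replica at a new master.

    Hypothetical helper: it only builds SQL strings and does not connect anywhere.
    """
    # Reject anything that is not a plain host name, since it is placed inside a quoted SQL string.
    if not re.fullmatch(r"[A-Za-z0-9.-]+", new_master_host):
        raise ValueError(f"unexpected host name: {new_master_host!r}")
    return [
        "STOP SLAVE",
        "CHANGE MASTER TO "
        f"MASTER_HOST = '{new_master_host}', "
        f"MASTER_PORT = {port}, "
        "MASTER_AUTO_POSITION = 1",
        "START SLAVE",
    ]
```

Calling `repoint_replica_sql("db2.example.com")` returns the three statements to run on the replica, in order. Replication credentials are left out here and would need to be supplied as well if they are not already configured.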

  • When is it safe to kill -9 the server?
    • Make a judgement call. If you can't fail over the server, you may wait less time before killing it
  • When do you fail over a server?
    • Typically for maintenance
    • Hardware issues
    • In larger shops, this is probably more common
    • Some edge case in the application that can't be reverted quickly
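
On the kill -9 question, the usual shape of the judgement call is "ask for a clean shutdown, wait up to some deadline, then force it". The sketch below shows that pattern for a generic child process. The `stop_with_deadline` name and the grace period are assumptions for illustration, and in practice the mysqld process would be stopped through whatever service tooling you already use rather than a Python handle.

```python
import subprocess


def stop_with_deadline(proc: subprocess.Popen, grace_seconds: float) -> str:
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns "graceful" if the process exited within the grace period,
    otherwise "killed". How long to wait is the judgement call.
    """
    proc.terminate()  # SIGTERM on POSIX: request a normal shutdown
    try:
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: the kill -9 case
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"
```

A shorter grace period trades a clean shutdown for faster recovery of service; after a SIGKILL, InnoDB runs crash recovery on the next start, so the forced path is not free either.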

Other thoughts?

  • Being able to set up delayed replication in MySQL for disaster recovery scenarios
    • Folks historically used tools to do this (e.g. pt-slave-delay)
    • In MySQL 5.6+, you can do this natively and have more confidence about the catch-up
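
The native delayed replication mentioned above is configured with the `MASTER_DELAY` option of `CHANGE MASTER TO` (MySQL 5.6+). A minimal sketch of the statements to run on the replica; the `delayed_replication_sql` helper and the one-hour delay in the usage note are assumptions for illustration:

```python
from typing import List


def delayed_replication_sql(delay_seconds: int) -> List[str]:
    """Build the statements that make a replica apply events delay_seconds behind its master.

    Hypothetical helper: it only builds SQL strings and does not connect anywhere.
    """
    # MASTER_DELAY accepts a non-negative number of seconds up to 2^31 - 1.
    if not 0 <= delay_seconds <= 2**31 - 1:
        raise ValueError("delay_seconds must be between 0 and 2^31 - 1")
    return [
        "STOP SLAVE",
        f"CHANGE MASTER TO MASTER_DELAY = {delay_seconds}",
        "START SLAVE",
    ]
```

`delayed_replication_sql(3600)` gives the statements for a replica that stays one hour behind. The `SQL_Delay` and `SQL_Remaining_Delay` columns of `SHOW SLAVE STATUS` then report the configured delay and how long until the next event is applied.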

Old/Weird bugs!

GTIDs for Operators

Other links!