Shared resource defense (e.g. running in the cloud or using a shared virtualized environment)
AWS offers Provisioned IOPS, which can help you get more dedicated resources
When in shared environments, "over-provisioning" may get you a more dedicated resource
Tools/suggestions
Server-level changes
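MAX CONNECTIONS -- you can set this at the server level (max_connections), as a server-wide per-user default (max_user_connections), and per account (MAX_USER_CONNECTIONS on the user). If you have multiple applications, assigning each one a different user and setting MAX_USER_CONNECTIONS on that user account can help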
wait_timeout -- controls how long a sleeping (idle) connection is allowed to stay open before the server closes it. Default is 8 hours (28800 seconds). Consider setting this lower (perhaps 5-15 minutes)
lock_wait_timeout -- This controls how long a statement will wait for a metadata lock to be granted before giving up with an error. The default is 1 year (31536000 seconds). This value does not influence row-locking behavior in InnoDB (see innodb_lock_wait_timeout)
This is checked by some tools to influence their behavior (e.g. pt-online-schema-change)
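slave_net_timeout -- how long a replica waits for data from the source before treating the connection as broken and reconnecting. With cross-geo (cross-continent) replication, where network delays hit 100ms+, replication might constantly stop if this is not tuned for the link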
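As a rough illustration of the server-level settings above, here is a minimal Python sketch that audits a set of variable values (for example, collected from SHOW GLOBAL VARIABLES). The function name audit_variables and the thresholds are assumptions made for this example; the thresholds simply mirror the suggestions in these notes and are not official MySQL recommendations.

```python
# Illustrative audit of the server variables discussed in these notes.
# Names and thresholds are assumptions for this sketch, not MySQL guidance.

WAIT_TIMEOUT_MAX = 15 * 60              # notes suggest 5-15 minutes
WAIT_TIMEOUT_DEFAULT = 28800            # 8 hours, the MySQL default
LOCK_WAIT_TIMEOUT_DEFAULT = 31536000    # 1 year, the MySQL default


def audit_variables(variables):
    """Return a list of (variable_name, message) findings.

    `variables` maps variable names to values (strings or ints), e.g.
    built from the rows returned by SHOW GLOBAL VARIABLES.
    """
    findings = []

    wait_timeout = int(variables.get("wait_timeout", WAIT_TIMEOUT_DEFAULT))
    if wait_timeout > WAIT_TIMEOUT_MAX:
        findings.append((
            "wait_timeout",
            f"{wait_timeout}s is above {WAIT_TIMEOUT_MAX}s; "
            "sleeping connections may pile up",
        ))

    lock_wait = int(variables.get("lock_wait_timeout", LOCK_WAIT_TIMEOUT_DEFAULT))
    if lock_wait >= LOCK_WAIT_TIMEOUT_DEFAULT:
        findings.append((
            "lock_wait_timeout",
            "still at the 1-year default; metadata lock waits can hang "
            "almost indefinitely",
        ))

    # 0 means "no server-wide per-user cap"; per-account limits may still exist.
    if int(variables.get("max_user_connections", 0)) == 0:
        findings.append((
            "max_user_connections",
            "no server-wide per-user cap; check per-account "
            "MAX_USER_CONNECTIONS limits",
        ))

    return findings


if __name__ == "__main__":
    defaults = {
        "wait_timeout": "28800",
        "lock_wait_timeout": "31536000",
        "max_user_connections": "0",
    }
    for name, message in audit_variables(defaults):
        print(f"{name}: {message}")
```

The function is deliberately pure (it takes a dict and returns findings), so it can be fed from whatever client library or monitoring system you already use.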
After you're in a bad state
pt-kill -- Get familiar with the tool. It can be supremely helpful during problems!
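To make the idea concrete, here is a minimal Python sketch of the kind of filtering a tool like pt-kill performs (for instance with its --match-command and --idle-time options). This is not pt-kill's implementation; the function names, the 600-second threshold, and the protected_users parameter are assumptions for illustration. Rows are assumed to be dicts shaped like SHOW PROCESSLIST output (Id, User, Command, Time).

```python
# Sketch: pick long-idle sleeping connections from processlist rows and
# print the KILL statements one might run. Illustration only, not pt-kill.


def select_idle_connections(processlist, idle_seconds=600, protected_users=()):
    """Return the Ids of connections sleeping for at least idle_seconds.

    Connections owned by any user in protected_users are never selected.
    """
    victims = []
    for row in processlist:
        if row["Command"] != "Sleep":
            continue  # only idle connections are candidates
        if row["User"] in protected_users:
            continue  # hypothetical safety list, e.g. admin accounts
        if int(row["Time"]) >= idle_seconds:
            victims.append(int(row["Id"]))
    return victims


def kill_statements(connection_ids):
    """Render one KILL statement per connection id."""
    return [f"KILL {connection_id};" for connection_id in connection_ids]


if __name__ == "__main__":
    rows = [
        {"Id": 11, "User": "app", "Command": "Sleep", "Time": 900},
        {"Id": 12, "User": "app", "Command": "Query", "Time": 1200},
        {"Id": 13, "User": "app", "Command": "Sleep", "Time": 30},
        {"Id": 14, "User": "admin", "Command": "Sleep", "Time": 5000},
    ]
    ids = select_idle_connections(rows, idle_seconds=600, protected_users=("admin",))
    for statement in kill_statements(ids):
        print(statement)
```

Printing the statements before executing them is a deliberate choice here: reviewing the candidate list first is a safer habit than killing connections blindly during an incident.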