Date: 2016-11-04
Time: 11:50–12:40
Room: Omega
Level: Intermediate
At Skype we run our most critical databases on PostgreSQL. Having started out with just a handful of Postgres databases, we have grown to span multiple data centers with thousands of database instances, supporting millions of concurrent users.
This journey has presented many unique technical and organizational challenges. The most memorable are the moments when things go horribly wrong: split-brain clusters, query floods, data center failures, DDoS-ing yourself with Skype, and so on. Some of these events are funny, some are horrifying, but all have provided good learning opportunities. This presentation is about sharing those moments so that everyone can learn from them.
Keywords: war stories, troubleshooting, resilience, distributed systems