At Dropbox, we view incident management as a central element of our reliability efforts. Though we also employ proactive techniques such as Chaos engineering, how we respond to incidents has a significant bearing on our users’ experience. Every minute counts for our users during a potential site outage or product issue.

The key components of our incident management process have been in place for several years, but we’ve also found constant opportunities to evolve in this area. The tweaks we’ve made over time include technological, organizational, and procedural improvements.

This post goes deeper into some of the lessons Dropbox has learned in incident management. You probably won’t find all of these in a textbook description of an incident command structure, and you shouldn’t view these improvements as a one-size-fits-all approach for every company. (Their usefulness will depend on your tech stack, org size, and other factors.) Instead, we hope this serves as a case study for how you can take a systematic view of your organization’s own incident response and evolve it to meet your users’ needs.

The basic framework for managing incidents at Dropbox, which we call SEVs (as in SEVerity), is similar to the ones employed by many other SaaS companies. (For those who are less familiar with the topic, we recommend this series of tutorials by Dropbox alum Tammy Butow to get an overview.) Availability SEVs, though by no means the only type of critical incident, are a useful slice to explore in more depth.