Post-incident Reviews
Lecture 96 • 40 slides
Session 96 Slide 1: Building an On-Call System
Phase 7: Troubleshooting and Incident Management
mindmap
  root((On-Call System))
    Rotation Design
    Tool Introduction
      PagerDuty
      Opsgenie
    Escalation
    24/7 Support
Course Overview
- Building a system to support 24/7 service operation
- Designing a sustainable on-call system
Learning Objectives for Session 96
- What is on-call?
- How to think about rotation design
- Introduction to PagerDuty and Opsgenie
- Escalation flow
Why is an on-call system necessary?
- Failures can happen at any time, including nights and weekends
- A fast initial response limits the damage
- Dependable incident response maintains user trust