Is Your Engineering Team Ready for the Holidays?

For many people, the holiday season is a time to take time off work and relax while business is slow. But many technology companies are entering some of the highest-traffic periods of the year. Providing support and appropriate incident response can be most challenging at this time, because many staff members will have limited availability while traveling and celebrating with family.

To ensure you’re adequately prepared to address an incident over the holidays, here’s a quick holiday preparedness checklist.

Capacity Planning

Make sure you have adequate capacity to handle the increased load the holiday season may bring. If you have implemented autoscaling, review your scaling configuration and confirm it is properly tuned for the anticipated traffic patterns.

Whether you have dynamic or static capacity, you may want to add some extra capacity as an insurance policy.
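As a rough illustration of sizing static capacity or an autoscaling maximum, the Python sketch below turns a forecast peak load into an instance count with a safety margin. It assumes you already have a peak requests-per-second forecast and a per-instance throughput figure (for example, from load testing); the function name and the 30% default headroom are illustrative, not a recommendation.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float, headroom: float = 0.3) -> int:
    """Return how many instances cover peak_rps plus a safety margin.

    peak_rps: forecast peak requests per second (an assumption you supply).
    rps_per_instance: sustainable throughput of one instance, e.g. from load tests.
    headroom: extra capacity as a fraction of peak (0.3 means 30% above peak).
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    target_rps = peak_rps * (1 + headroom)
    # Round up: a fractional instance still means provisioning a whole one.
    return math.ceil(target_rps / rps_per_instance)
```

For example, a forecast peak of 1,200 requests per second on instances that each sustain 250 gives a target of 1,560 requests per second with 30% headroom, which rounds up to 7 instances. If you autoscale, a figure like this is worth comparing against the maximum size your scaling configuration allows.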

Run Books and Documentation

For those unfamiliar with the term, a run book is your emergency response manual describing how to handle various scenarios. Make sure your run books are up to date, and that staff members are aware of these resources in case they are tasked with responding to an incident.

Logging, Monitoring, and Notifications

Check that adequate logging and monitoring are in place, and that the right people will receive email and SMS notifications if something goes wrong. Tools like AWS CloudWatch and PagerDuty are well suited to this requirement.

Be sure to review the thresholds that trigger error notifications, and confirm they are properly tuned for the anticipated traffic patterns.
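One way to keep error thresholds meaningful when holiday traffic spikes is to alert on the error rate rather than a raw error count: a fixed count of, say, 100 errors per window that is rare on a normal day may be routine at three times the traffic. The sketch below assumes you can get request and error totals for an evaluation window from your monitoring tool; the 2% rate and 100-request minimum are placeholder values to tune, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    requests: int  # total requests observed in the evaluation window
    errors: int    # failed requests (e.g. HTTP 5xx) in the same window


def should_alert(stats: WindowStats, max_error_rate: float = 0.02, min_requests: int = 100) -> bool:
    """Alert when the error rate exceeds max_error_rate.

    Comparing a ratio instead of a raw error count keeps the threshold meaningful
    as traffic rises and falls. min_requests suppresses alerts when the window
    is too small for the ratio to mean much.
    """
    if stats.requests < min_requests:
        return False
    return stats.errors / stats.requests > max_error_rate
```

With these defaults, 180 errors out of 15,000 requests (1.2%) does not alert, while 600 out of 15,000 (4%) does. Windows with fewer than 100 requests never alert, which trades some sensitivity at very low traffic for fewer false alarms.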

On-Call Schedule and Escalation Policy

Make sure everyone is aware of the on-call schedule, which designates who responds first to an incident. The policy should be very clear about required response times and about the on-call engineer’s responsibility to have immediate and full access to systems in case of a problem.
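For a simple fixed rotation, working out who is on call for a given date is just integer arithmetic, which makes it easy to publish the holiday schedule ahead of time. The Python sketch below assumes equal-length shifts handed off in a fixed order; the engineer names and dates are hypothetical, and real schedules with swaps or overrides are better handled in your scheduling tool.

```python
from datetime import date
from typing import List


def on_call_for(day: date, rotation: List[str], start: date, shift_days: int = 7) -> str:
    """Return who is primary on call on `day` for a simple fixed rotation.

    rotation: engineers in hand-off order (names are hypothetical).
    start: the date the first person in `rotation` begins their shift.
    shift_days: length of each shift; 7 gives a weekly rotation.
    """
    if not rotation:
        raise ValueError("rotation must not be empty")
    if day < start:
        raise ValueError("day is before the rotation start date")
    shift_index = (day - start).days // shift_days  # completed shifts since start
    return rotation[shift_index % len(rotation)]    # wrap around the team list


if __name__ == "__main__":
    team = ["alice", "bob", "carol"]  # hypothetical engineers
    holidays = (date(2024, 12, 24), date(2024, 12, 25), date(2024, 12, 31), date(2025, 1, 1))
    for d in holidays:
        print(d, on_call_for(d, team, start=date(2024, 12, 2)))
```

Running the script prints the primary on-call engineer for each listed holiday date, which can be shared with the team so there are no surprises about who is covering.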

Escalation policies are also important in case the on-call engineer fails to respond to an incident in time, or the engineer needs to escalate the issue to bring in additional resources to help.
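The escalation timing described above can be modeled as an ordered list of levels, each with an acknowledgment timeout. This Python sketch computes who should have been paged after an incident has gone unacknowledged for a given number of minutes. The level names and timeouts are hypothetical; it models only the timing logic, does not call any PagerDuty or other vendor API, and does not repeat the policy after the last level.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EscalationLevel:
    responder: str        # person or rotation paged at this level (hypothetical)
    ack_timeout_min: int  # minutes allowed to acknowledge before escalating


def paged_so_far(levels: List[EscalationLevel], minutes_unacknowledged: float) -> List[str]:
    """Return everyone who should have been paged after this long without an ack.

    The first level is paged immediately; each later level is paged once the
    acknowledgment timeouts of all earlier levels have elapsed.
    """
    paged: List[str] = []
    next_page_at = 0.0  # minutes after the incident when the current level is paged
    for level in levels:
        if minutes_unacknowledged < next_page_at:
            break  # this level's turn has not arrived yet
        paged.append(level.responder)
        next_page_at += level.ack_timeout_min
    return paged


if __name__ == "__main__":
    policy = [
        EscalationLevel("primary-on-call", 15),
        EscalationLevel("secondary-on-call", 15),
        EscalationLevel("engineering-manager", 30),
    ]
    for minutes in (0, 10, 15, 30, 45):
        print(minutes, paged_so_far(policy, minutes))
```

With 15-minute timeouts for the first two levels, the primary is paged immediately, the secondary joins at 15 minutes, and the engineering manager is brought in at 30 minutes if nobody has acknowledged.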

Conclusion

Hopefully you find this quick holiday preparedness checklist useful! Of course, Konture Technology Services can help you put together effective incident mitigation and preparedness strategies to ensure you’re ready for anything the interwebs throws your way.

Let’s take a moment to thank the dedicated men and women who will be on call during the holidays to make sure we can continue to enjoy great web-based entertainment, shopping, and critical services. We salute you!

P.S.

Ironically, as I was searching for a banner image for this very article, Adobe Stock was crashing and timing out. They might want to review this checklist!