Friday, November 9, 2012

Incident Response Policy

What happens in your company when a production incident occurs?
In a typical startup, engineers run around frantically trying to resolve the problem. As soon as the incident is resolved, however, they forget about it and go back to business as usual.

A good incident response policy can help bring order to the chaos. There are a few best practices to keep in mind when production outages occur:

- Have a procedure in place; it helps reduce panic. Security incidents should be treated differently from production outages.
- Write an incident report the same day the incident occurs. Otherwise, you risk forgetting what happened.
- In the report, include a response timeline and explain how the problem was discovered.
- Give the report concrete follow-up actions, tracked as JIRA tickets. If you don't, engineers will not follow up.
- Post incident reports in a public location and compute metrics: Are incidents happening less frequently this month than last month? Is there any correlation between incidents? Are follow-up actions being addressed?
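
As a rough illustration of the metrics mentioned above, here is a minimal Python sketch. It assumes each incident report has been reduced to a start time, a recovery time, and a map of follow-up tickets to whether they are closed. The `Incident` class, the function names, and the sample data (including the `INFRA-900` ticket) are all hypothetical, not part of any real tool.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Incident:
    """Hypothetical record distilled from one incident report."""
    started: datetime
    recovered: datetime
    # Follow-up ticket ID -> True if the ticket has been closed.
    follow_ups: dict = field(default_factory=dict)


def incidents_per_month(incidents):
    """Count incidents by the (year, month) in which they started."""
    return Counter((i.started.year, i.started.month) for i in incidents)


def follow_up_completion(incidents):
    """Fraction of all follow-up tickets that are closed, or None if there are none."""
    statuses = [done for i in incidents for done in i.follow_ups.values()]
    return sum(statuses) / len(statuses) if statuses else None


def mean_minutes_to_recover(incidents):
    """Average time from start to recovery, in minutes."""
    total_seconds = sum((i.recovered - i.started).total_seconds() for i in incidents)
    return total_seconds / 60 / len(incidents)


# Made-up sample data, for illustration only.
SAMPLE = [
    Incident(datetime(2012, 3, 15, 2, 22), datetime(2012, 3, 15, 2, 43),
             {"INFRA-123": True, "INFRA-345": False}),
    Incident(datetime(2012, 3, 28, 10, 0), datetime(2012, 3, 28, 10, 45),
             {"INFRA-900": True}),
    Incident(datetime(2012, 4, 2, 18, 30), datetime(2012, 4, 2, 19, 0)),
]

if __name__ == "__main__":
    for (year, month), count in sorted(incidents_per_month(SAMPLE).items()):
        print(f"{year}-{month:02d}: {count} incident(s)")
    print("Follow-up completion:", follow_up_completion(SAMPLE))
    print("Mean minutes to recover:", mean_minutes_to_recover(SAMPLE))
```

Comparing the monthly counts answers the "less frequently than last month?" question, and a low completion fraction is a sign that follow-up tickets are being ignored.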

Below is a sample incident response template that I've used.

Incident Analysis Report

Time of Incident:
Time of Recovery:
5:50AM 3/15/12
Date Issue first identified
Discovered by:
Incident Report Prepared By:

I. Description of Incident:
II. AWS Statement:
2:40 AM PDT We are investigating connectivity issues for EC2 in the US-EAST-1 region.
3:03 AM PDT Between 2:22 AM and 2:43 AM PDT internet connectivity was impaired in the US-EAST-1 region. Full connectivity has been restored. The service is operating normally.

III. Business Impact: Customers were frustrated because the ACME website was inaccessible.

IV. Security Impact:
There are no known issues related to this subject.

V. Technical Impact:
There weren’t enough servers to handle the load for new customers.

VI. Event Timeline:

All Amazon hosts were inaccessible
Service was restored 

VII. Lessons Learned:
- We need to know the business impact of each server on Amazon and put disaster recovery (DR) policies and procedures in place for outages. We could also leverage the EC2 region in California to help ride out outages that affect only Virginia.

VIII. Action Items:

1. Called EC2; they will alert us to what they find out about the issue (INFRA-123)
2. Identify what we can and can’t do if EC2 goes down (INFRA-345)



Very smart idea. I am printing this out and having my team use it as a template. Thanks.

Keeping a record that tracks every transaction a company enters into is advisable for everyone, entrepreneur or not. Apart from producing an incident record, why not keep a transaction log as well, to monitor all the transactions made by you or the company as a whole?

That's a very informative post, especially when it comes to understanding the best ways to respond to an incident in the workplace.
