This page is a cut-down version of the November 2016 "AWS Well-Architected Framework" whitepaper. It includes links to all the relevant technologies mentioned in the document, and Cloud Architects can use it as a quick reminder.
The Five Pillars of the Well-Architected Framework
These are:
- Security Pillar - The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
- Reliability Pillar - The ability of a system to recover from infrastructure or service failures, dynamically acquire computing resources to meet demand, and mitigate disruptions such as bad configurations or transient network issues.
- Performance Efficiency Pillar - The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.
- Cost Optimisation Pillar - The ability to avoid or eliminate unneeded cost or sub-optimal resources.
- Operational Excellence Pillar - The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.
General Design Principles
Because of the difference between the Cloud and traditional infrastructure environments, you are able to change the way you do things and gain enormous benefit:
- Stop guessing your capacity needs
- Test systems at production scale
- Automate to make architectural experimentation easier
- Allow for evolutionary architectures
- Drive architectures using data
- Improve through game days
Security Pillar
- Apply security at all layers
- Enable traceability
- Implement a principle of least privilege
- Focus on securing your system
- Automate security best practices
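As an illustration of the least-privilege principle listed above, here is a minimal sketch of an IAM policy document that grants read access to the objects in a single S3 bucket and nothing else. The bucket name and the helper function `read_only_bucket_policy` are hypothetical and exist only for this example; the policy structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) follows the standard IAM JSON policy format.

```python
import json

# Hypothetical bucket name, used only for illustration.
BUCKET = "example-reports-bucket"


def read_only_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document that allows reading objects in one bucket only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the single action the workload needs.
                "Action": ["s3:GetObject"],
                # Only the objects in this one bucket.
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


policy = read_only_bucket_policy(BUCKET)
print(json.dumps(policy, indent=2))
```

The point of the sketch is the shape of the grant: one action, one resource, no wildcards beyond the object key. A real workload would probably need a few more statements, added one at a time as the need is demonstrated.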
There are five best practice areas for Security in the cloud:
- Identity and access management - For workloads that require systems to have access to AWS, IAM enables secure access through instance profiles, identity federation, and temporary credentials. AWS recommends attaching MFA to the root account and locking the root credentials, together with the MFA device, in a physically secured location. Use MFA for all user accounts.
- Detective controls - Including AWS CloudTrail (which records AWS API calls), AWS Config and Amazon CloudWatch.
- Infrastructure protection - Including Amazon Virtual Private Cloud (VPC) and hardened Amazon Machine Images (AMIs).
- Data protection - Encryption and key management. Amazon S3 is designed so that if you store, for example, 10,000 objects, you can expect to lose one object every 10 million years on average. S3 also supports versioning, and by default data stays within its AWS Region. See also AWS Cloud Compliance.
- Incident response - A process needs to be in place to respond to security incidents and mitigate their potential impact. AWS features that help with this are: detailed logging, automated event processing, pre-provisioned tooling and a 'clean room' (built using AWS CloudFormation or similar) to carry out forensics in a safe, isolated environment.
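The S3 figure quoted under Data protection can be checked with a little arithmetic. The sketch below assumes the commonly published design durability of 99.999999999% (eleven nines) per object per year; that percentage is not stated on this page, but it is the value that reproduces the "10,000 objects, one loss every 10 million years" statement.

```python
from fractions import Fraction

# Assumption: S3 design durability of 99.999999999% per object per year.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)

objects_stored = 10_000

# Probability that any one object is lost in a given year.
annual_loss_probability = 1 - DURABILITY

# Expected number of objects lost per year across the whole set.
expected_losses_per_year = objects_stored * annual_loss_probability

# Average number of years between single-object losses.
years_per_expected_loss = 1 / expected_losses_per_year

print(int(years_per_expected_loss))
```

Exact fractions are used instead of floats so that the eleven-nines value is not distorted by rounding.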
Recommended further reading on and around the Security Pillar:
- AWS Security Center
- AWS Security Blog
- AWS Cloud Compliance
- Whitepaper - AWS Security Overview
- Whitepaper - AWS Security Best Practices
- Whitepaper - AWS Risk and Security Compliance
- Video - Security of the AWS Cloud
- Video - Shared Responsibility Overview
Reliability Pillar
The Reliability pillar includes the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
Reliability Pillar Design Principles
- Test recovery procedures
- Automatically recover from failure
- Scale horizontally to increase aggregate system availability
- Stop guessing capacity
- Manage change in automation
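To see why scaling horizontally increases aggregate availability, consider a simplified model: each instance is available with the same probability, instances fail independently, and the system is up as long as at least one instance is up. These are assumptions made for this sketch (real failures are often correlated), and the function name `aggregate_availability` is hypothetical.

```python
def aggregate_availability(single: float, replicas: int) -> float:
    """Availability of a system that is up when at least one replica is up.

    Assumes replicas fail independently, each with availability `single`.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The system is down only when every replica is down at the same time.
    return 1.0 - (1.0 - single) ** replicas


for n in (1, 2, 3):
    print(n, f"{aggregate_availability(0.99, n):.6f}")
```

Under these assumptions, each extra replica multiplies the probability of a total outage by the failure probability of a single instance, which is why several small resources can be more available than one large one.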
Best Practices for Reliability in the Cloud
- Foundations - Most of this is already covered if you use only the AWS Cloud, for example ensuring that sufficient bandwidth and computing capacity are in place. AWS also sets service limits to prevent accidental over-provisioning (or overuse) of resources. This may be slightly different if you use a hybrid approach. Important AWS services here include AWS Identity and Access Management (IAM) and Amazon VPC.
- Change Management - Being aware of how change affects a system allows you to plan proactively, and monitoring allows you to quickly identify trends that could lead to capacity issues or SLA breaches. With AWS it is easy to monitor automatically and react, acquiring or releasing resources, both to natural changes in demand and to unexpected demands or occurrences. AWS services that can help here are AWS CloudTrail and AWS Config.
- Failure Management - Your recovery processes should be as well exercised as your production processes. AWS CloudFormation and Chef can both help here.
The AWS service that is key to ensuring reliability is Amazon CloudWatch.
Recommended further reading on and around the Reliability Pillar:
- Video - Nov 2014 - Embracing Failure: Fault-Injection and Service Reliability - Netflix Director of Operations
- Analyst Report - March 2014 - Benchmarking Availability and Reliability in the Cloud
- Service Limits
- Service Limit Reports Blog
- Document - November 2014 - Backup Archive and Restore Approach Using AWS
- Document - February 2015 - Managing your AWS Infrastructure at Scale
- Document - October 2014 - AWS Disaster Recovery
- Document - July 2014 - AWS Amazon VPC Connectivity Options
- AWS Premium Support
- Trusted Advisor
Performance Efficiency Pillar
The Performance Efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintaining that efficiency as demand changes and technologies evolve.
Design principles:
- Democratise advanced technologies - Use AWS services rather than trying to become a master of everything yourselves.
- Go global in minutes
- Use serverless architectures
- Experiment more often
- Mechanical sympathy
The four best practice areas for Performance Efficiency in the cloud:
- Selection (compute, storage, database, network) - Not forgetting location.
- Review - Watch for new AWS releases in areas where your architecture is performance constrained.
- Monitoring - Use Amazon CloudWatch along with Amazon Kinesis, Amazon Simple Queue Service (SQS) and AWS Lambda. Plan "game days" in the production environment.
- Tradeoffs - Also consider Amazon ElastiCache (using Redis or Memcached) and Amazon CloudFront.
In summary - Take a data-driven approach.
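The caching tradeoff mentioned under Tradeoffs (trading some data freshness for lower latency) can be sketched with the cache-aside pattern. The example below uses a plain in-memory dictionary as a stand-in for a cache such as Redis or Memcached; the class `TTLCache`, its method `get_or_load` and the `slow_lookup` function are all hypothetical names invented for this sketch, not part of any AWS API.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for an external cache, with per-entry expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry time, cached value)
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], str]) -> str:
        """Return the cached value, calling `loader` only on a miss or expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: possibly stale, but fast
        value = loader(key)  # cache miss: go to the slower source of truth
        self._store[key] = (now + self._ttl, value)
        return value


def slow_lookup(key: str) -> str:
    """Hypothetical stand-in for an expensive database query."""
    return f"value-for-{key}"


cache = TTLCache(ttl_seconds=60)
print(cache.get_or_load("user:1", slow_lookup))  # loads from the source
print(cache.get_or_load("user:1", slow_lookup))  # served from the cache
```

The time-to-live is the tuning knob: a longer value reduces load on the backing store but lets readers see older data, so it is worth choosing from measurements, not guesswork.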
Key AWS Services for the Performance Pillar
Starting point - Amazon CloudWatch.
- Compute - Auto Scaling
- Storage - Amazon EBS, Amazon S3 and Amazon S3 Transfer Acceleration
- Database - Amazon RDS and Amazon DynamoDB
- Network - Amazon Route 53, Amazon VPC and AWS Direct Connect
Recommended further reading on and around the AWS Performance Pillar:
- Nov 2014 - Video Channel - Performance AWS re:Invent 2014
- Nov 2014 - Video - Performance Benchmarking on AWS
- Documentation - Amazon S3 Performance Optimisation
- Documentation - Amazon EBS Volume Performance
Cost Optimisation Pillar
The objective is to build and operate a cost-aware system through a continual process of refinement and improvement, starting from the initial design, with the goal of ensuring that business objectives are achieved at minimum cost.
Cost Optimisation Design Principles
- Adopt a consumption model, i.e. pay only for what you use
- Benefit from economies of scale
- Stop spending money on data centre operations
- Analyse and attribute expenditure
- Use managed services to reduce cost of ownership
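The consumption model in the first principle above can be made concrete with a small calculation. The sketch assumes a development environment that is only needed for eight hours a day, five days a week, and a made-up hourly rate; both the schedule and the rate are illustrative assumptions, not AWS pricing.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours


def weekly_cost(hourly_rate: float, hours_running: float) -> float:
    """Cost of running a resource for the given number of hours in one week."""
    if not 0 <= hours_running <= HOURS_PER_WEEK:
        raise ValueError("hours_running must be between 0 and 168")
    return hourly_rate * hours_running


# Hypothetical hourly rate, for illustration only.
HOURLY_RATE = 0.10

always_on = weekly_cost(HOURLY_RATE, HOURS_PER_WEEK)
working_hours_only = weekly_cost(HOURLY_RATE, 8 * 5)  # 40 hours

# The saving depends only on the ratio of hours, not on the rate itself.
saving_fraction = 1 - working_hours_only / always_on
print(f"Saving from stopping the environment out of hours: {saving_fraction:.0%}")
```

Running for 40 hours in place of 168 cuts the bill for that resource by roughly three quarters, whatever the actual hourly rate is.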