AWS Well-Architected Framework | B2B Internet Solutions Ltd.

This page is a cut-down version of the November 2016 "AWS Well-Architected Framework" white paper. It includes links to the relevant technologies mentioned in the document, and Cloud Architects can use it as a quick reminder.

Click on the following links to see more relevant information:

The Five Pillars of the Well-Architected Framework

These are:

Security Pillar - The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
Reliability Pillar - The ability of a system to recover from infrastructure or service failures, dynamically acquire computing resources to meet demand, and mitigate disruptions such as bad configurations or transient network issues.
Performance Efficiency Pillar - The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.
Cost Optimisation Pillar - The ability to avoid or eliminate unneeded cost or sub-optimal resources.
Operational Excellence Pillar - The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.

General Design Principles

Because the Cloud differs from traditional infrastructure environments, you can change the way you do things and gain significant benefit:

Stop guessing your capacity needs
Test systems at production scale
Automate to make architectural experimentation easier
Allow for evolutionary architectures
Data-driven architectures
Improve through game days

Security Pillar

Apply security at all layers
Enable traceability
Implement a principle of least privilege
Focus on securing your system
Automate security best practices
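
As an illustration of the least-privilege principle, the sketch below builds an IAM policy document that allows read-only access to one prefix of one S3 bucket and nothing else. The bucket name `example-reports-bucket`, the prefix `monthly` and the helper `read_only_policy` are hypothetical and do not come from the white paper; the JSON layout follows the standard IAM policy grammar.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Return an IAM policy that only allows reads under one bucket prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is limited to keys under the given prefix.
                "Sid": "ListOnlyThisPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Objects can be read but not written or deleted.
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


# Hypothetical bucket and prefix, used purely for illustration.
policy = read_only_policy("example-reports-bucket", "monthly")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Anything not explicitly allowed by a policy like this is denied by default, which is what makes starting from a narrow grant and widening it later a practical approach.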

There are five best practice areas for Security in the cloud:

Identity and access management - For workloads that require systems to have access to AWS, IAM enables secure access through instance profiles, identity federation, and temporary credentials. AWS recommends associating a multi-factor authentication (MFA) device with the root account and locking the credentials away, together with the MFA device, in a physically secured location. Use MFA for all user accounts.
Detective controls - These include AWS CloudTrail (which logs AWS API calls), AWS Config and Amazon CloudWatch.
Infrastructure protection - This includes Amazon Virtual Private Cloud (VPC) and hardened Amazon Machine Images (AMIs).
Data protection - This covers encryption and key management. Amazon S3 is designed so that if you store, for example, 10,000 objects, you can expect to lose one object every 10 million years on average. S3 also supports versioning, and by default data stays within an AWS Region. See also AWS Cloud Compliance.
Incident response - A process needs to be in place to respond to security incidents and mitigate their potential impact. AWS features that help here are detailed logging, automated event processing, pre-provisioned tooling and a 'clean room' (built using AWS CloudFormation or similar) in which to carry out forensics in a safe, isolated environment.
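
The S3 durability figure quoted under Data protection can be checked with a little arithmetic. The sketch below assumes S3's published design target of 99.999999999% annual durability per object (the white paper summary above gives only the resulting "one object in 10 million years" figure), and treats object losses as independent; the function name is hypothetical.

```python
from fractions import Fraction

# Assumed design target: 99.999999999% ("eleven nines") durability per object per year.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)

# Expected fraction of objects lost per year.
ANNUAL_LOSS_PER_OBJECT = 1 - DURABILITY


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses for a given object count."""
    expected_losses_per_year = object_count * ANNUAL_LOSS_PER_OBJECT
    return 1 / expected_losses_per_year


years = years_per_lost_object(10_000)
print(f"Storing 10,000 objects: about one loss every {int(years):,} years")
```

Exact fractions are used instead of floats so that the tiny loss probability is not distorted by rounding.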

Recommended further reading on and around the Security Pillar:

AWS Security Center
AWS Security Blog
AWS Cloud Compliance
Whitepaper - AWS Security Overview
Whitepaper - AWS Security Best Practices
Whitepaper - AWS Risk and Security Compliance
Video - Security of the AWS Cloud
Video - Shared Responsibility Overview

Reliability Pillar

The Reliability pillar includes the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
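
One common way to mitigate transient network issues is to retry the failed call with exponential backoff and jitter. The sketch below is a generic illustration rather than anything prescribed by the white paper; `TransientError`, `call_with_retries` and the `flaky` operation are hypothetical names.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a temporary failure such as a dropped connection."""


def call_with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Give up and surface the last error to the caller.
            # Full jitter: wait a random time up to base_delay * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


# Hypothetical operation that fails twice and then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary network issue")
    return "ok"


# The sleep function is replaced here so the example runs instantly.
result = call_with_retries(flaky, sleep=lambda seconds: None)
print(result, "after", calls["count"], "attempts")
```

The random jitter spreads retries from many clients over time, so a recovering service is not hit by all of them at once.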

Reliability Pillar Design Principles

Test recovery procedures
Automatically recover from failure
Scale horizontally to increase aggregate system availability
Stop guessing capacity
Manage change in automation
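
The benefit of scaling horizontally can be shown with a simple availability calculation. The sketch below assumes that instances fail independently and that any single healthy instance can serve the workload; the 99% per-instance figure is a hypothetical input, not an AWS number.

```python
def combined_availability(instance_availability: float, instance_count: int) -> float:
    """Availability of a pool that is up while at least one instance is up.

    Assumes independent failures, which real deployments only approximate
    (for example by spreading instances across Availability Zones).
    """
    probability_all_down = (1 - instance_availability) ** instance_count
    return 1 - probability_all_down


# Hypothetical per-instance availability of 99%.
for count in (1, 2, 3):
    availability = combined_availability(0.99, count)
    print(f"{count} instance(s): {availability:.6%}")
```

Under these assumptions each extra instance multiplies the chance of a total outage by the single-instance failure probability, which is why several small resources can give higher aggregate availability than one large one.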

Best Practices for Reliability in the Cloud

Foundations - Much of this is already covered if you use only the AWS Cloud, e.g. ensuring that sufficient bandwidth and computing capacity are in place. AWS also sets service limits to prevent accidental over-provisioning (or overuse) of resources. This may be slightly different if you use a hybrid approach. Important AWS services here include AWS Identity and Access Management (IAM) and Amazon VPC.
Change Management - Being aware of how change affects a system allows you to plan proactively, and monitoring allows you to quickly identify trends that could lead to capacity issues or SLA breaches. With AWS it is easy to monitor and react automatically, both to normal rises and falls in demand (acquiring and releasing resources) and to unexpected demands or events. AWS services that can help here are AWS CloudTrail and AWS Config.
Failure Management - Your recovery processes should be as well exercised as your production processes. AWS CloudFormation and Chef can both help here.

The AWS service that is key to ensuring reliability is Amazon CloudWatch.

Recommended further reading on and around the Reliability Pillar:

Video - Nov 2014 - Embracing Failure: Fault-Injection and Service Reliability - Netflix Director of Operations
Analyst Report - March 2014 - Benchmarking Availability and Reliability in the Cloud
Service Limits
Service Limit Reports Blog
Document - November 2014 - Backup Archive and Restore Approach Using AWS
Document - February 2015 - Managing your AWS Infrastructure at Scale
Document - October 2014 - AWS Disaster Recovery
Document - July 2014 - AWS Amazon VPC Connectivity Options
AWS Premium Support
Trusted Advisor

Performance Efficiency Pillar

The Performance Efficiency pillar focuses on the efficient use of computing resources to meet requirements and maintaining that efficiency as demand changes and technologies evolve.

Design principles:

Democratise advanced technologies - Use AWS services rather than trying to become a master of everything yourselves.
Go global in minutes
Use serverless architectures
Experiment more often
Mechanical sympathy

The four best practice areas for Performance Efficiency in the cloud:

Selection (compute, storage, database, network) - Not forgetting location.
Review - Watch for new AWS releases in areas where your architecture is performance constrained.
Monitoring - Use Amazon CloudWatch along with Amazon Kinesis, Amazon Simple Queue Service (SQS) and AWS Lambda. Plan "game days" in the production environment.
Tradeoffs - Also consider Amazon ElastiCache (using Redis or Memcached) and Amazon CloudFront.
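
The caching tradeoff mentioned above (serving slightly stale data in exchange for fewer slow reads) can be sketched with the cache-aside pattern. In the example below an in-memory dictionary stands in for a cache such as Redis or Memcached; `TTLCache`, `slow_lookup` and the 60-second expiry are hypothetical choices made for illustration.

```python
import time


class TTLCache:
    """Minimal cache-aside helper with a fixed time-to-live per entry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, or call loader(key) and cache the result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # Cache hit: skip the slow backend.
        value = loader(key)  # Cache miss or expired entry: go to the backend.
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical slow backend; the list records how often it is actually read.
backend_reads = []


def slow_lookup(key):
    backend_reads.append(key)
    return f"profile-for-{key}"


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("user-1", slow_lookup)
second = cache.get_or_load("user-1", slow_lookup)  # Served from the cache.
print(first, second, "backend reads:", len(backend_reads))
```

The time-to-live bounds how stale a cached value can become; a shorter value means fresher data but more backend reads.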

In summary - Take a data-driven approach.

Key AWS Services for the Performance Pillar

Starting point - Amazon CloudWatch.

Compute - Auto Scaling
Storage - Amazon EBS, Amazon S3 and Amazon S3 Transfer Acceleration
Database - Amazon RDS and Amazon DynamoDB
Network - Amazon Route 53, Amazon VPC and AWS Direct Connect

Recommended further reading on and around the AWS Performance Pillar:

Nov 2014 - Video Channel - Performance AWS re:Invent 2014
Nov 2014 - Video - Performance Benchmarking on AWS
Documentation - Amazon S3 Performance Optimisation
Documentation - Amazon EBS Volume Performance

Cost Optimisation Pillar

The objective is to build and operate a cost-aware system through a continual process of refinement and improvement, starting from the initial design, with the goal of ensuring business objectives are achieved at minimum cost.

Cost Optimisation Design Principles

Adopt a consumption model, i.e. pay only for what you use
Benefit from economies of scale
Stop spending money on data centre operations
Analyse and attribute expenditure
Use managed services to reduce cost of ownership
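
The "analyse and attribute expenditure" principle can be illustrated by grouping billing line items by an ownership tag. The line items, the `team` tag and the `cost_by_tag` helper below are all hypothetical; a real analysis would read equivalent records from your own billing data.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical monthly line items; the costs are invented for illustration.
line_items = [
    {"service": "Amazon EC2", "team": "web", "cost": Decimal("120.50")},
    {"service": "Amazon S3", "team": "web", "cost": Decimal("14.25")},
    {"service": "Amazon RDS", "team": "analytics", "cost": Decimal("210.00")},
    {"service": "Amazon EC2", "team": None, "cost": Decimal("35.00")},
]


def cost_by_tag(items, tag):
    """Sum costs per tag value, collecting untagged spend under 'untagged'."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at zero.
    for item in items:
        owner = item.get(tag) or "untagged"
        totals[owner] += item["cost"]
    return dict(totals)


totals = cost_by_tag(line_items, "team")
for owner, total in sorted(totals.items()):
    print(f"{owner}: {total}")
```

Reporting untagged spend as its own bucket makes gaps in tagging visible, which is usually the first thing to fix before costs can be attributed to owners.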
