

DynamoDB

A fast and flexible NoSQL DB for applications that need consistent, single-digit millisecond latency at any scale.
Fully managed DB, can be configured to autoscale, integrates well with Lambda.
Supports both document and key-value data models
Flexible data model and reliable performance.
Stored on SSD
Spread across 3 geographically distinct data centres
Choice of 2 consistency models:
Eventually Consistent Reads (Default)
Strongly Consistent Reads

Reads

Eventually Consistent Reads

Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data. (Best Read Performance)

Strongly Consistent Reads

A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read.

Structure

Tables
Items
Attributes
Supports key-value and document data structures
Key = Name of the data, Value = Data itself
Documents can be written in JSON, HTML or XML

Primary Keys

DynamoDB stores and retrieves data based on the Primary Key. There are 2 types:

Partition Key - unique attribute (e.g. user ID):

The value of the partition key is the input to an internal hash function, which determines the partition (physical location) on which the data is stored
If you are using the partition key as your primary key, then no two items can have the same partition key.

Composite Key (Partition key + Sort Key) in combination:

Primary key would be a composite key consisting of
Partition Key - User ID
Sort Key - Timestamp of Post
2 items may have the same Partition key, but they must have a different sort key.
All items with the same Partition Key are stored together, then sorted according to the Sort key Value
Allows you to store multiple items with the same Partition Key
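
A minimal sketch of the two ideas above, using an in-memory toy table rather than DynamoDB itself. The hash function (MD5), the partition count, and the names `partition_for` and `ToyTable` are all assumptions made for illustration; DynamoDB's real hash function and partition layout are internal.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages partitions itself


def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition number via a hash (illustrative only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


class ToyTable:
    """In-memory stand-in for a table with a composite primary key."""

    def __init__(self):
        # partition number -> partition key -> {sort key: item}
        self.partitions = defaultdict(lambda: defaultdict(dict))

    def put(self, user_id: str, timestamp: int, item: dict) -> None:
        # Same partition key + same sort key identifies one item, so it is overwritten.
        self.partitions[partition_for(user_id)][user_id][timestamp] = item

    def query(self, user_id: str) -> list:
        # All items sharing a partition key live together, ordered by sort key.
        rows = self.partitions[partition_for(user_id)][user_id]
        return [rows[ts] for ts in sorted(rows)]


table = ToyTable()
table.put("user-1", 200, {"post": "second"})
table.put("user-1", 100, {"post": "first"})
print(table.query("user-1"))
```

Items for the same user come back ordered by timestamp regardless of insertion order, which is the behaviour the Sort Key provides.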

Access Control

Authentication and Access Control is managed using AWS IAM
You can create an IAM user within your AWS account which has specific permissions to access and create DynamoDB tables.
You can create an IAM role which enables you to obtain temporary access keys which can be used to access DynamoDB
You can also use a special IAM Condition to restrict user access to only their own records

IAM Conditions Example

Imagine a mobile gaming app with millions of users:
Users need to access high scores for each game they are playing
Access must be restricted to ensure they cannot view anyone else’s data

This can be done by adding a Condition to an IAM Policy to allow access only to items where the Partition Key value matches their User ID.
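
A sketch of what such a policy might look like, built as a Python dict and printed as JSON. The table ARN, account ID, and the chosen actions are hypothetical, and the `${www.amazon.com:user_id}` variable assumes users sign in through Login with Amazon web identity federation; other identity providers use different variables.

```python
import json

# Hypothetical table ARN; replace region, account ID and table name with your own.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/GameScores"


def leading_keys_policy(table_arn: str) -> dict:
    """Allow reads only on items whose partition key equals the caller's user ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
                "Condition": {
                    "ForAllValues:StringEquals": {
                        # Partition key value must match the federated user ID.
                        "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"]
                    }
                },
            }
        ],
    }


print(json.dumps(leading_keys_policy(TABLE_ARN), indent=2))
```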

Exam Tips - DynamoDB

DynamoDB is a low latency NoSQL DB
Consists of Tables, Items and Attributes
Supports both document and key-value data models
Supported document formats are JSON, HTML, XML
2 types of Primary key - Partition Key and combination of Partition Key + Sort Key (Composite Key)
2 consistency models: Strongly Consistent / Eventually Consistent
Access is controlled using IAM policies
Fine grained access control using IAM Condition parameter: dynamodb:LeadingKeys to allow users to access only the items where the partition key value matches their user ID

Useful Example Reference with PHP related scripts.

Sample CLI Commands

aws dynamodb get-item --table-name ProductCatalog --key '{"Id": {"N":"205"}}'

Index

In SQL DBs an index is a data structure which allows you to perform fast queries on specific columns in a table. You select the columns that you want included in the index and run your searches on the index - rather than the entire dataset.

DynamoDB has two types of Index (even though it's NoSQL):
1. Local Secondary Index
2. Global Secondary Index

Local Secondary Index

Can only be created when you are creating your table
You cannot add, remove or modify it later
It has the same partition key as your original table
But a different Sort Key
Gives you a different view of your data, organized according to an alternative Sort Key
Any queries based on this Sort Key are much faster using the index than the main table
e.g. Partition Key: User ID, Sort Key: account creation date

Global Secondary Index

You can create when you create your table, or add it later
Different Partition Key as well as Sort Key
Gives a completely different view of the data
Speeds up any queries related to this alternative partition and sort key
e.g. Partition Key: email address, Sort Key: last log-in date

Exam Tips - Indexes

Enable fast queries on specific data columns
Give you a different view of your data, based on alternative Partition / Sort Keys
Important to understand difference

Local Secondary Index	Global Secondary Index
Must be created at same time table is created	Can create at any time (including table creation)
Same Partition Key as your Table	Different Partition Key
Different Sort Key	Different Sort Key

Query & Scan

Query

A Query operation finds items in a table based on the Primary Key attribute and a distinct value to search for, e.g. selecting the item where the user ID is equal to 212 returns all attributes for that user (first name, surname, email, etc.).

Use an optional Sort Key name and value to refine the results e.g. if Sort Key is a timestamp, you can refine query to only select items from last 7 days
By default a query returns all the attributes for the items but you can use the ProjectionExpression parameter if you want the query to only return the specific attributes you want e.g. if you only want to see the email address rather than all the attributes
Results are always sorted by the Sort Key
Numeric order - by default in ascending order (1,2,3,4)
ASCII character code values
You can reverse the order by setting the ScanIndexForward parameter to false (note: despite its name, this parameter applies only to queries, not scans)
By default, Queries are Eventually Consistent
You need to explicitly set the query to be Strongly Consistent
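
The Query options above can be sketched as a request dict. This only builds the parameters and does not call AWS; the table name `Posts` and the attribute names `UserID`, `PostTimestamp` and `Email` are hypothetical. The keys of the dict (`KeyConditionExpression`, `ProjectionExpression`, `ScanIndexForward`, `ConsistentRead`, and so on) are the Query API's parameter names.

```python
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # seconds


def build_query_params(user_id, now=None, newest_first=True, strongly_consistent=False):
    """Build Query parameters for one user's items from the last 7 days."""
    now = int(time.time()) if now is None else int(now)
    return {
        "TableName": "Posts",  # hypothetical table
        # Partition key equality plus a Sort Key condition to refine the results.
        "KeyConditionExpression": "UserID = :uid AND PostTimestamp >= :since",
        "ExpressionAttributeValues": {
            ":uid": {"N": str(user_id)},
            ":since": {"N": str(now - SEVEN_DAYS)},
        },
        # Return only the email attribute; the #email placeholder sidesteps
        # any clash with reserved words.
        "ProjectionExpression": "#email",
        "ExpressionAttributeNames": {"#email": "Email"},
        # False reverses the default ascending Sort Key order.
        "ScanIndexForward": not newest_first,
        # Queries are eventually consistent unless this is set to True.
        "ConsistentRead": strongly_consistent,
    }


print(build_query_params(212))
```

Assuming boto3 is installed and credentials are configured, the dict could be passed on as `boto3.client("dynamodb").query(**params)`.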

Scan

A scan operation examines every item in the table.

By default returns all data Attributes
Use the ProjectionExpression parameter to refine the scan to return only the attributes you want

Query or Scan?

Query is more efficient than a Scan
Scan dumps the entire table, then filters out the values to provide the desired result - removing the unwanted data
This adds an extra step of removing the data you don’t want
As the table grows, the scan operation takes longer
A scan operation on a large table can use up the table's provisioned throughput in just a single operation

How to Improve Performance

You can reduce the impact of a query or scan by setting a smaller page size which uses fewer read operations e.g. set the page size to return 40 Items
Larger number of smaller operations will allow other requests to succeed without throttling
Avoid using scan operations if you can: design tables in a way that you can use Query, Get, or BatchGetItem APIs
By default, a scan operation processes data sequentially, returning results in 1MB increments before moving on to retrieve the next 1MB of data. It can only scan one partition at a time.
You can configure DynamoDB to use Parallel scans instead by logically dividing a table or index into segments and scanning each segment in parallel
Best to avoid parallel scans if your table or index is already incurring heavy read / write activity from other applications
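
A rough sketch of a paged, parallel scan. The `scan_fn` argument is a stand-in for a real scan call (for example a boto3 client's `scan` method, assuming boto3 is available), so the example runs without AWS; `Segment`, `TotalSegments`, `Limit`, `ExclusiveStartKey` and `LastEvaluatedKey` are the Scan API's own names, while the function names and defaults are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(scan_fn, table, segment, total_segments, page_size=40):
    """Scan one logical segment page by page, following LastEvaluatedKey."""
    items, start_key = [], None
    while True:
        params = {
            "TableName": table,
            "Segment": segment,
            "TotalSegments": total_segments,
            "Limit": page_size,  # smaller pages = smaller read operations
        }
        if start_key is not None:
            params["ExclusiveStartKey"] = start_key
        page = scan_fn(**params)
        items.extend(page.get("Items", []))
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:  # no more pages in this segment
            return items


def parallel_scan(scan_fn, table, total_segments=4, page_size=40):
    """Scan every segment in its own worker thread and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, scan_fn, table, s, total_segments, page_size)
            for s in range(total_segments)
        ]
        return [item for f in futures for item in f.result()]
```

Each worker still consumes read capacity, so the segment count is a trade-off between speed and the load placed on the table.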

Exam Tips - Scan -v- Query

A Query operation finds items in a table using only the Primary Key attribute -> You provide the primary key and a distinct value to search for
A scan operation examines every item in the table -> By default returns all data attributes
Use the ProjectionExpression parameter to refine the results
Query results always sorted by Sort Key (if there is one)
Sorted in ascending order
Set ScanIndexForward parameter to false to reverse the order - queries only
Query operation is generally more efficient than a Scan
Reduce the impact of a query or scan by setting a smaller page size which uses fewer read operations
Isolate scan operations to specific tables and segregate them from your mission-critical traffic
Try Parallel scans, rather than the default sequential scan
Avoid using scan operations if you can: design tables in a way that you can use the Query, Get, or BatchGetItem APIs

Provisioned Throughput

DynamoDB Provisioned Throughput is measured in Capacity Units.

When you create your table, you specify your requirements in terms of Read Capacity Units and Write Capacity Units
1x Write Capacity Unit = 1x 1KB write per second
1x Read Capacity Unit = 1x Strongly Consistent Read of 4KB per second OR 2x Eventually Consistent Reads of 4KB per second (Default)

Strongly Consistent Reads Calculation

Your application needs to read 80 items (table rows) per second. Each item is 3KB in size. You need Strongly consistent reads.

First: Calculate how many Read Capacity Units are needed for each read: size of each item / 4KB, i.e. 3KB / 4KB = 0.75. Rounded up to the nearest whole number, each read operation needs 1x Read Capacity Unit.
Multiplied by the number of reads per second = 80 Read Capacity Units required

Eventually Consistent Reads Calculation

Same as above BUT 2x 4KB reads per second - double the throughput of Strongly Consistent Reads
3KB / 4KB = 0.75, rounded up to the nearest whole number = 1
Multiply by the number of reads per second = 80
Divide 80 by 2: only 40 Read Capacity Units are needed for Eventually Consistent Reads
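
The two read calculations above can be checked with a small helper. The function name and signature are hypothetical; the arithmetic follows the steps in these notes (round up per read, multiply by reads per second, halve for eventually consistent reads).

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Read Capacity Units needed for a steady read workload."""
    # Each read is billed in 4KB chunks, always rounded up.
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    # One RCU covers 2 eventually consistent reads, so halve the total.
    return total if strongly_consistent else math.ceil(total / 2)


print(read_capacity_units(3, 80))                             # 80
print(read_capacity_units(3, 80, strongly_consistent=False))  # 40
```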

Write Capacity Unit Calculation

You want to write 100 items per second. Each item 512 bytes in size

First: Calculate how many Write Capacity Units are needed for each write: size of each item / 1KB (for Write CU), i.e. 512 bytes / 1KB = 0.5
Rounded up to the nearest whole number = 1 Write Capacity Unit per write operation
Multiplied by number of writes per second = 100 Write Capacity Units required
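
The same write calculation as a short sketch. The helper name is hypothetical and 1KB is taken as 1024 bytes.

```python
import math


def write_capacity_units(item_size_bytes, writes_per_second):
    """Write Capacity Units needed for a steady write workload."""
    # Each write is billed in 1KB chunks, always rounded up.
    units_per_write = math.ceil(item_size_bytes / 1024)
    return units_per_write * writes_per_second


print(write_capacity_units(512, 100))  # 100
```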

Exam Tips - Provisioned throughput

Measured in Capacity Units
1x Write Capacity Unit = 1x 1KB Write per Second
1x Read Capacity Unit = 1x 4KB Strongly Consistent Read OR 2x 4KB Eventually Consistent Read

On Demand Capacity

Charges apply for: Reading, Writing, Storing data
Don’t need to specify your requirements
DynamoDB instantly scales up and down based on the activity of your application
Great for unpredictable workloads
You want to pay for only what you use (pay per request)

Which Pricing Model to Use

On-Demand Capacity	Provisioned Capacity
Unknown Workloads	Forecast read & write capacity requirements
Unpredictable Application Traffic	Predictable Application Traffic
Want Pay-Per-Use Model	App traffic is consistent or increases gradually
Spikey/Short-lived Projects

DynamoDB Transactions

ACID Transactions (Atomic, Consistent, Isolated, Durable)
Read or write multiple items across multiple tables as an all-or-nothing operation
Check for a pre-requisite condition before writing to a table

DynamoDB TTL

TTL attribute defines an expiry time for your data
Expired items marked for deletion
Great for removing irrelevant or old data:
Session data
Event logs
Temporary data
Reduces cost by automatically removing data which is no longer relevant
TTL is expressed as epoch time, i.e. once the current time is greater than the TTL value, the item is expired and marked for deletion
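
A small sketch of the epoch-time comparison described above. The attribute name `ExpiresAt` and both helper functions are hypothetical; in DynamoDB you choose which attribute holds the TTL value, and the service performs the expiry check itself.

```python
import time


def ttl_in(seconds_from_now, now=None):
    """Return an epoch-seconds TTL value a given number of seconds in the future."""
    now = time.time() if now is None else now
    return int(now) + seconds_from_now


def is_expired(item, ttl_attribute="ExpiresAt", now=None):
    """An item is expired once the current time is greater than its TTL value."""
    now = time.time() if now is None else now
    return ttl_attribute in item and now > item[ttl_attribute]


# Session data that should expire one hour after it is written.
session = {"UserID": 1, "ExpiresAt": ttl_in(3600)}
print(is_expired(session))  # False
```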

Sample Commands

# ensure you have right IAM role & access permissions
aws iam get-user

# create sessiondata table
aws dynamodb create-table --table-name SessionData --attribute-definitions \
AttributeName=UserID,AttributeType=N --key-schema \
AttributeName=UserID,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

# populate SessionData table
aws dynamodb batch-write-item --request-items file://items.json

DynamoDB Streams

Time-ordered sequence of item level modifications (insert, update, delete)

Logs are encrypted at rest and stored for 24hrs
Accessed using a dedicated endpoint
By default the Primary Key is recorded
Before and After images can be captured

Processing Streams

Events are recorded in near real-time
Apps can take actions based on contents of stream
Event source for Lambda
Lambda polls the DynamoDB Stream
Executes Lambda code based on a DynamoDB Streams event
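
A minimal sketch of a Lambda handler that consumes stream records. The handler name and what it does with each record are assumptions; the record fields used here (`eventName`, `dynamodb.Keys`, `NewImage`, `OldImage`) follow the DynamoDB Streams event shape, and the before/after images are only present when the stream is configured to capture them.

```python
def lambda_handler(event, context):
    """Count item-level modifications delivered in one batch of stream records."""
    summary = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        name = record["eventName"]  # INSERT, MODIFY or REMOVE
        change = record["dynamodb"]
        keys = change["Keys"]  # the primary key is always recorded
        new_image = change.get("NewImage")  # item after the change, if captured
        old_image = change.get("OldImage")  # item before the change, if captured
        summary[name] += 1
        print(f"{name} keys={keys} old={old_image} new={new_image}")
    return summary
```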

Exam Tips - DynamoDB Streams

Time-ordered sequence of item level modifications in your DynamoDB Tables
Data is stored for 24 hours only
Can be used as an event source for Lambda so you can create applications which take actions based on events in your DynamoDB table

Provisioned Throughput Exceeded & Exponential Backoff

ProvisionedThroughputExceededException

Your request rate is too high for the read/write capacity provisioned on your DynamoDB table
The SDK automatically retries the requests until they succeed
If you are not using the SDK you can:
Reduce request frequency
Use Exponential Backoff

Exponential Backoff

Many components in a network can generate errors due to being overloaded
In addition to simple retries all AWS SDKs use Exponential Backoff
Progressively longer waits between consecutive retries e.g. 50ms, 100ms, 200ms for improved flow control
If after 1 minute this doesn’t work, your request size may be exceeding the throughput for your read/write capacity
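
For code that does not go through an SDK, the retry pattern might look roughly like this. `ThroughputExceeded` is a stand-in exception and `call_with_backoff` is a hypothetical helper; the delays double from 50ms as in the example above. Real SDKs typically add random jitter as well, which is left out here to keep the delays predictable.

```python
import time


class ThroughputExceeded(Exception):
    """Stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(operation, max_attempts=6, base_delay=0.05, sleep=time.sleep):
    """Retry operation, waiting 50ms, 100ms, 200ms, ... between attempts."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter makes the helper easy to exercise without real waiting.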

Exam Tips - Provisioned Throughput & Exponential Backoff

If you see a ProvisionedThroughputExceededException error, this means the number of requests is too high
Exponential Backoff improves flow by retrying requests using progressively longer waits
This is not just true for DynamoDB: Exponential Backoff is a feature of every AWS SDK and applies to many services within AWS e.g. S3, CloudFormation, SES