DynamoDB Notes
DynamoDB is a fully managed NoSQL database service.
Components
Tables
A table is a collection of data, similar to other database systems. For example, a table called Employee can store name, designation, salary, manager, and so on. Each item in the table has a unique identifier, or primary key. Apart from the primary key, the table is schemaless: attributes do not need to be defined when the table is created, and each item in a table can have its own distinct attributes.
Items
Each table contains zero or more items. An item is a unique group of attributes. In an Employee table, each item represents an employee. Items are similar to rows, records, or tuples in other database systems. There is no limit to the number of items that can be stored in a table.
Attributes
Each item is composed of one or more attributes. An attribute is the lowest-level data element, similar to fields or columns in other database systems. For example, an item in an Employee table contains attributes called EmployeeID, LastName, FirstName, and so on. An attribute can be scalar (single valued) or nested up to 32 levels deep.
Primary Key
It uniquely identifies each item in the table. There are two kinds of primary key:
Partition Key: A single attribute makes the primary key. DynamoDB uses its value as input to an internal hash function. The output from the hash function determines the partition (physical storage) in which the item will be stored.
Partition key and sort key: A composite primary key consisting of two attributes. The first attribute is the partition key, and the second attribute is the sort key. DynamoDB uses the partition key value as input to an internal hash function. The output from the hash function determines the partition (physical storage) in which the item will be stored. All items with the same partition key value are stored together, in sorted order by sort key value. In a table that has a partition key and a sort key, it’s possible for multiple items to have the same partition key value, but those items must have different sort key values.
Note: Each primary key attribute must be a scalar (single valued). The data types allowed for primary key attributes are string, number, or binary. There are no such restrictions for other, non-key attributes.
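As an illustration of declaring a composite primary key, the sketch below creates a hypothetical GameScores table with boto3; the table name, attribute names, and on-demand billing mode are assumptions, not part of these notes.

```python
import boto3

# Low-level DynamoDB client; region and credentials come from the environment.
dynamodb = boto3.client("dynamodb", region_name="us-west-2")

# Hypothetical table with a composite primary key: partition key + sort key.
dynamodb.create_table(
    TableName="GameScores",
    AttributeDefinitions=[
        {"AttributeName": "UserId", "AttributeType": "S"},     # string
        {"AttributeName": "GameTitle", "AttributeType": "S"},  # string
    ],
    KeySchema=[
        {"AttributeName": "UserId", "KeyType": "HASH"},     # partition key
        {"AttributeName": "GameTitle", "KeyType": "RANGE"}, # sort key
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand; no provisioned throughput needed
)

# Wait until the table status changes from CREATING to ACTIVE.
dynamodb.get_waiter("table_exists").wait(TableName="GameScores")
```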
Secondary Indexes
A secondary index allows querying a table using an alternate key, in addition to the primary key. A table can have one or more secondary indexes. There are two kinds of indexes:
- Global secondary index: An index with a partition key and sort key that can be different from those on the table.
- Local secondary index: An index that has the same partition key as the table, but a different sort key.
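As a sketch of how secondary indexes are declared, the dictionaries below could be passed as the GlobalSecondaryIndexes and LocalSecondaryIndexes parameters of the create_table call shown earlier; the index names, key attributes, and projections are assumptions.

```python
# Hypothetical index definitions for create_table. Every key attribute used by
# an index must also appear in the table's AttributeDefinitions (e.g. TopScore
# and Wins would be declared there with type "N").
global_secondary_indexes = [
    {
        "IndexName": "GameTitleIndex",  # GSI: partition/sort keys may differ from the table
        "KeySchema": [
            {"AttributeName": "GameTitle", "KeyType": "HASH"},
            {"AttributeName": "TopScore", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }
]

local_secondary_indexes = [
    {
        "IndexName": "WinsIndex",  # LSI: same partition key as the table, different sort key
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},
            {"AttributeName": "Wins", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }
]
```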
DynamoDB Streams
It captures data modification events in a table. The data about these events appears in the stream in near-real time, in the order that the events occurred. The stream can be used as a trigger for a Lambda function or consumed with Kinesis. Each event is represented by a stream record, which carries the name of the table, the event timestamp, and other metadata; stream records have a lifetime of 24 hours. If enabled, DynamoDB Streams writes a stream record whenever an item is:
- Added: The stream captures an image of the entire item, including all of its attributes.
- Updated: The stream captures the “before” and “after” image of any attributes that were modified in the item.
- Deleted: The stream captures an image of the entire item before it was deleted.
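A minimal sketch of a Lambda handler consuming these stream records; the record fields follow the stream record layout, while the handler itself and its print-based processing are purely illustrative.

```python
def lambda_handler(event, context):
    """Hypothetical Lambda handler for a DynamoDB Streams trigger."""
    for record in event["Records"]:
        event_name = record["eventName"]  # INSERT, MODIFY, or REMOVE
        data = record["dynamodb"]         # keys, images, sequence number, ...

        if event_name == "INSERT":
            # Image of the entire new item (requires NEW_IMAGE or NEW_AND_OLD_IMAGES view type).
            print("Added:", data.get("NewImage"))
        elif event_name == "MODIFY":
            # Both "before" and "after" images of the modified item.
            print("Before:", data.get("OldImage"), "After:", data.get("NewImage"))
        elif event_name == "REMOVE":
            # Image of the entire item before it was deleted.
            print("Deleted:", data.get("OldImage"))
```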
DynamoDB API
To interact with DynamoDB, an application uses a small set of API operations, summarized below.
Control Plane
Operations to create and manage tables, work with indexes, streams, and other objects that are dependent on tables.
- CreateTable: Creates a new table. It can also create one or more secondary indexes, and enable DynamoDB Streams for the table.
- DescribeTable: Returns information about a table, like its primary key schema, throughput settings, and index information.
- ListTables: Returns the names of all tables in a list.
- UpdateTable: Modifies the settings of a table or its indexes, creates or removes new indexes on a table, or modifies DynamoDB Streams settings for a table.
- DeleteTable: Removes a table and all of its dependent objects from DynamoDB.
Data Plane
Operations to perform CRUD actions on data in a table. They also allow reading data from a secondary index. These operations can be performed using either of the two methods below.
PartiQL
A SQL-Compatible Query Language to perform CRUD operations.
- ExecuteStatement: Reads multiple items from a table, or writes or updates a single item. When writing or updating a single item, the primary key attributes must be specified.
- BatchExecuteStatement: Writes, updates or reads multiple items from a table.
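A hedged boto3 sketch of the PartiQL operations above, using a hypothetical Employee table and made-up key values.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Single-statement read against a hypothetical Employee table.
resp = dynamodb.execute_statement(
    Statement='SELECT * FROM "Employee" WHERE "EmployeeID" = ?',
    Parameters=[{"S": "E-1001"}],
)
print(resp["Items"])

# Single-statement write: the primary key attributes must be included.
dynamodb.execute_statement(
    Statement="INSERT INTO \"Employee\" VALUE {'EmployeeID': ?, 'LastName': ?}",
    Parameters=[{"S": "E-1002"}, {"S": "Doe"}],
)

# Batch of statements (all reads here) in a single request.
dynamodb.batch_execute_statement(
    Statements=[
        {"Statement": 'SELECT * FROM "Employee" WHERE "EmployeeID" = ?',
         "Parameters": [{"S": "E-1001"}]},
        {"Statement": 'SELECT * FROM "Employee" WHERE "EmployeeID" = ?',
         "Parameters": [{"S": "E-1002"}]},
    ]
)
```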
Classic APIs
Creating Data
- PutItem: Writes a single item to a table; at least the primary key attributes need to be specified.
- BatchWriteItem: Writes up to 25 items to a table. It can also delete multiple items from one or more tables.
Reading Data
- GetItem: Retrieves a single item from a table by specifying the primary key. It can retrieve the entire item, or just a subset of its attributes.
- BatchGetItem: Retrieves up to 100 items from one or more tables.
- Query: Retrieves all items that have a specific partition key value. It can retrieve entire items, or just a subset of their attributes. Optionally, a condition can be applied to the sort key values so that only a subset of the data with that partition key is retrieved. This operation can be run against a table or an index that has both a partition key and a sort key.
- Scan: Retrieves all items in the specified table or index. It can retrieve entire items, or just a subset of their attributes. Optionally, a filtering condition can be applied.
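A minimal boto3 sketch of the read operations above, against the hypothetical GameScores table used in the earlier examples; the key values and attribute names are illustrative.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# GetItem: fetch a single item by its full primary key, returning only two attributes.
item = dynamodb.get_item(
    TableName="GameScores",
    Key={"UserId": {"S": "u-42"}, "GameTitle": {"S": "Galaxy Invaders"}},
    ProjectionExpression="TopScore, Wins",
)
print(item.get("Item"))

# Query: all items for one partition key, optionally narrowed by a sort key condition.
page = dynamodb.query(
    TableName="GameScores",
    KeyConditionExpression="UserId = :u AND begins_with(GameTitle, :g)",
    ExpressionAttributeValues={":u": {"S": "u-42"}, ":g": {"S": "Galaxy"}},
)
print(page["Items"])

# Scan: reads every item in the table; the filter is applied after the read.
page = dynamodb.scan(
    TableName="GameScores",
    FilterExpression="TopScore > :min",
    ExpressionAttributeValues={":min": {"N": "1000"}},
)
print(page["Items"])
```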
Updating Data
- UpdateItem: Modifies one or more attributes in an item by specifying the primary key. It can add new attributes and modify or remove existing attributes. It can also perform conditional updates, so that the update is only successful when a user-defined condition is met. Optionally, it can implement an atomic counter, which increments or decrements a numeric attribute without interfering with other write requests.
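A hedged boto3 sketch of a conditional update that also maintains an atomic counter; the table, key, and attribute names are assumptions.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Increment Wins atomically and record a new TopScore, but only if the new
# score is higher than the stored one; otherwise the condition check fails.
dynamodb.update_item(
    TableName="GameScores",
    Key={"UserId": {"S": "u-42"}, "GameTitle": {"S": "Galaxy Invaders"}},
    UpdateExpression="SET TopScore = :score ADD Wins :one",
    ConditionExpression="attribute_not_exists(TopScore) OR TopScore < :score",
    ExpressionAttributeValues={":score": {"N": "2500"}, ":one": {"N": "1"}},
    ReturnValues="UPDATED_NEW",
)
```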
Deleting Data
- DeleteItem: Deletes a single item from a table by specifying the primary key.
- BatchWriteItem: Deletes up to 25 items from one or more tables.
DynamoDB Streams
Operations to enable or disable a stream on a table, and allow access to the data modification records contained in a stream.
- ListStreams: Returns a list of all streams, or just the stream for a specific table.
- DescribeStream: Returns information about a stream, such as its Amazon Resource Name (ARN) and where an application can begin reading the first few stream records.
- GetShardIterator: Returns a shard iterator, which is a data structure that an application can use to retrieve the records from the stream.
- GetRecords: Retrieves one or more stream records, using a given shard iterator.
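A sketch of reading a stream with the low-level Streams API; most applications would use a Lambda trigger or the Kinesis adapter instead. The table name is an assumption, and the code assumes a stream is already enabled.

```python
import boto3

streams = boto3.client("dynamodbstreams")

# Find the stream for a specific table and describe it.
stream_arn = streams.list_streams(TableName="GameScores")["Streams"][0]["StreamArn"]
description = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]

# Walk each shard from its oldest available record.
for shard in description["Shards"]:
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest record in the shard
    )["ShardIterator"]

    records = streams.get_records(ShardIterator=iterator)["Records"]
    for record in records:
        print(record["eventName"], record["dynamodb"].get("Keys"))
```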
Transactions
Transactions provide atomicity, consistency, isolation, and durability (ACID).
PartiQL - A SQL-Compatible Query Language
- ExecuteTransaction: A batch operation that allows CRUD operations to multiple items both within and across tables with a guaranteed all-or-nothing result.
Classic APIs
- TransactWriteItems: A batch operation that allows Put, Update, and Delete operations to multiple items both within and across tables with a guaranteed all-or-nothing result.
- TransactGetItems: A batch operation that allows Get operations to retrieve multiple items from one or more tables.
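A minimal sketch of TransactWriteItems applying a Put and an Update across two hypothetical tables with an all-or-nothing guarantee.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Either both changes are applied, or neither is.
dynamodb.transact_write_items(
    TransactItems=[
        {
            "Put": {
                "TableName": "Orders",  # hypothetical table
                "Item": {"OrderId": {"S": "o-100"}, "Status": {"S": "PLACED"}},
                "ConditionExpression": "attribute_not_exists(OrderId)",
            }
        },
        {
            "Update": {
                "TableName": "Inventory",  # hypothetical table
                "Key": {"ProductId": {"S": "p-7"}},
                "UpdateExpression": "SET Stock = Stock - :one",
                "ConditionExpression": "Stock >= :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            }
        },
    ]
)
```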
Naming Rules
The following are the naming rules for tables, attributes, and other objects in DynamoDB:
- All names must be encoded using UTF-8, and are case-sensitive.
- Table names and index names must be between 3 and 255 characters long, and can contain only the following characters:
a-z, A-Z, 0-9, _ (underscore), - (dash), . (dot)
- Attribute names must be at least one character long and no greater than 64 KB. The following exceptions must be no greater than 255 characters long:
- Secondary index partition key names.
- Secondary index sort key names.
- The names of any user-specified projected attributes (applicable only to local secondary indexes).
- Although reserved words and special characters are allowed, it is recommended to avoid them, because expression attribute name and value placeholders must be used wherever they appear in an expression of a CRUD operation. For a complete list of reserved words in DynamoDB, see the documentation. The special characters are # (hash) and : (colon).
Data Types
Categories:
- Scalar Types: It represents exactly one value. The scalar types are number, string, binary, Boolean, and null.
- Document Types: It represents a complex structure with nested attributes, such as a JSON document. The document types are list and map.
- Set Types: It represents multiple scalar values. The set types are string set, number set, and binary set.
For further details such as ranges and constraints, refer to the documentation.
Read Consistency
Each Region is independent and isolated from other AWS Regions. Every region consists of multiple distinct locations called Availability Zones which are isolated from failures in other Availability Zones, and have inexpensive, low-latency network connectivity to each other. This allows rapid replication of data among multiple Availability Zones in a Region.
When an application writes data to a table and receives an HTTP 200 (OK) response, the write has occurred and is durable. The data is eventually consistent across all storage locations, usually within one second or less.
Eventually Consistent Reads
When data is read from a table, the response might not reflect the results of a recently completed write operation and may include stale data. If the read request is repeated after a short time, the response should return the latest data.
Strongly Consistent Reads
When a strongly consistent read is requested, the response returns the most up-to-date data, reflecting the updates from all prior successful write operations. However, this consistency comes with some disadvantages:
- It might not be available if there is a network delay or outage. In this case, DynamoDB may return a server error (HTTP 500).
- It may have higher latency than eventually consistent reads.
- It is not supported on global secondary indexes.
- It uses more throughput capacity than eventually consistent reads.
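Consistency is selected per read request; a minimal boto3 sketch, with the table and key values assumed for illustration.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Default: eventually consistent read (half the read capacity cost).
dynamodb.get_item(
    TableName="GameScores",
    Key={"UserId": {"S": "u-42"}, "GameTitle": {"S": "Galaxy Invaders"}},
)

# Strongly consistent read: pass ConsistentRead=True (not supported on GSIs).
dynamodb.get_item(
    TableName="GameScores",
    Key={"UserId": {"S": "u-42"}, "GameTitle": {"S": "Galaxy Invaders"}},
    ConsistentRead=True,
)
```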
Read/Write Capacity Modes
The read/write capacity mode controls how read and write throughput are charged and how capacity is managed. It can be set when creating a table or changed later, but switching between read/write capacity modes is limited to once every 24 hours. Switching between the modes does not affect how existing DynamoDB APIs are used in code. Both modes deliver the same single-digit millisecond latency, service-level agreement (SLA) commitment, and security that DynamoDB offers.
On-Demand Mode
This mode offers pay-per-request pricing for read and write requests. DynamoDB instantly accommodates workloads as they ramp up or down to any previously reached traffic level. On-demand mode is a good option if any of the following are true:
- Creating new tables with unknown workloads.
- Have unpredictable application traffic.
- Preference for the ease of paying for only what is used.
Read Request Units
For reading an item up to 4 KB in size, read request units consumed will be:
- half for an eventually consistent read.
- one for a strongly consistent read.
- two for a transactional read.
For reading an item that is larger than 4 KB, DynamoDB needs additional read request units.
Write Request Units
For writing an item up to 1 KB in size, write request units consumed will be:
- one for a standard write.
- two for a transactional write.
For writing an item that is larger than 1 KB, DynamoDB needs additional write request units.
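A small sketch of the on-demand request-unit arithmetic described above; the helper functions and example item sizes are illustrative, not part of the DynamoDB API.

```python
import math

def read_request_units(item_size_kb: float, consistency: str = "eventual") -> float:
    """Request units consumed to read one item of the given size (on-demand mode)."""
    blocks = math.ceil(item_size_kb / 4)  # reads are metered in 4 KB blocks
    factor = {"eventual": 0.5, "strong": 1, "transactional": 2}[consistency]
    return blocks * factor

def write_request_units(item_size_kb: float, transactional: bool = False) -> float:
    """Request units consumed to write one item of the given size (on-demand mode)."""
    blocks = math.ceil(item_size_kb / 1)  # writes are metered in 1 KB blocks
    return blocks * (2 if transactional else 1)

# An 8 KB item spans 2 blocks: 1 RRU eventually consistent, 2 strong, 4 transactional.
print(read_request_units(8, "eventual"), read_request_units(8, "strong"), read_request_units(8, "transactional"))
# A 1.5 KB item spans 2 blocks: 2 WRUs standard, 4 transactional.
print(write_request_units(1.5), write_request_units(1.5, transactional=True))
```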
Peak Traffic and Scaling Properties
On-demand mode automatically adapts to an application’s traffic volume. It instantly accommodates up to double the previous peak traffic on a table. For example, if an application’s traffic pattern varies between 25,000 and 50,000 strongly consistent reads per second where 50,000 reads per second is the previous traffic peak, on-demand capacity mode instantly accommodates sustained traffic of up to 100,000 reads per second. If an application sustains traffic of 100,000 reads per second, that peak becomes your new previous peak, enabling subsequent traffic to reach up to 200,000 reads per second.
If more than double the previous peak is needed, DynamoDB automatically allocates more capacity as the traffic volume increases to help ensure that the workload does not experience throttling. However, throttling can occur if double the previous peak is exceeded within 30 minutes. DynamoDB recommends spacing traffic growth over at least 30 minutes before driving more than 100,000 reads per second.
Initial Throughput for On-Demand Capacity Mode
- Newly created table: The previous peak is 2,000 write request units or 6,000 read request units. You can drive up to double the previous peak immediately, which enables newly created on-demand tables to serve up to 4,000 write request units or 12,000 read request units, or any linear combination of the two.
- Existing table switched: The previous peak is half the maximum write capacity units and read capacity units provisioned since the table was created, or the settings for a newly created table with on-demand capacity mode, whichever is higher. In other words, your table will deliver at least as much throughput as it did prior to switching to on-demand capacity mode.
Provisioned Mode
In this mode, the number of reads and writes per second must be specified in advance. It can be used with auto scaling to adjust the table’s provisioned capacity automatically in response to traffic changes. This helps govern DynamoDB usage to stay at or below a defined request rate, providing cost predictability. Provisioned mode is a good option if any of the following are true:
- Have predictable application traffic.
- Run applications whose traffic is consistent or ramps gradually.
- Can forecast capacity requirements to control costs.
Read Capacity Units
For reading an item up to 4 KB in size, read capacity units consumed will be:
- half for an eventually consistent read.
- one for a strongly consistent read.
- two for a transactional read.
If you need to read an item that is larger than 4 KB, DynamoDB must consume additional read capacity units.
Write Capacity Units
For writing an item up to 1 KB in size, write capacity units consumed will be:
- one for a standard write.
- two for a transactional write.
To write an item that is larger than 1 KB, DynamoDB must consume additional write capacity units.
For example, suppose that a table is provisioned with 6 read capacity units and 6 write capacity units. With these settings, an application could do the following:
- Perform strongly consistent reads of up to 24 KB per second (4 KB × 6 read capacity units).
- Perform eventually consistent reads of up to 48 KB per second (twice as much read throughput).
- Perform transactional read requests of up to 12 KB per second.
- Write up to 6 KB per second (1 KB × 6 write capacity units).
- Perform transactional write requests of up to 3 KB per second.
Provisioned throughput is the maximum amount of capacity that an application can consume from a table or index. If your application exceeds your provisioned throughput capacity on a table or index, it is subject to request throttling. Throttling prevents an application from consuming too many capacity units. When a request is throttled, it fails with an HTTP 400 code (Bad Request) and a ProvisionedThroughputExceededException.
Choosing Initial Throughput Settings
Take the following into consideration:
- Item sizes: Some items are small enough that they can be read or written using a single capacity unit. Larger items require multiple capacity units.
- Expected read and write request rates: Estimate the number of reads and writes that need to be performed per second.
- Read consistency requirements: Read capacity units are based on strongly consistent read operations, which consume twice as many database resources as eventually consistent reads. Determine whether your application requires strongly consistent reads, or whether it can relax this requirement and perform eventually consistent reads instead.
Suppose the need is to read 80 items per second from a table, where the items are 3 KB in size and the reads are strongly consistent.
- Each read requires one provisioned read capacity unit: 3 KB / 4 KB = 0.75, rounded up to the nearest whole number = 1 read capacity unit.
- The table’s provisioned read throughput should therefore be 80 read capacity units: 1 read capacity unit per item × 80 reads per second = 80 read capacity units.
Suppose the need is to write 100 items per second to a table, where the items are 512 bytes in size.
- Each write requires one provisioned write capacity unit: 512 bytes / 1 KB = 0.5, rounded up to the nearest whole number = 1 write capacity unit.
- The table’s provisioned write throughput should therefore be 100 write capacity units: 1 write capacity unit per item × 100 writes per second = 100 write capacity units.
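The same sizing arithmetic can be expressed as a small helper that reproduces the two worked examples above; this is a sketch, not an official sizing tool.

```python
import math

def provisioned_rcu(item_size_kb: float, reads_per_second: int, strongly_consistent: bool = True) -> int:
    """Read capacity units needed to sustain the given read rate."""
    per_item = math.ceil(item_size_kb / 4)   # 4 KB blocks, rounded up
    if not strongly_consistent:
        per_item = per_item / 2              # eventually consistent reads cost half
    return math.ceil(per_item * reads_per_second)

def provisioned_wcu(item_size_kb: float, writes_per_second: int) -> int:
    """Write capacity units needed to sustain the given write rate."""
    per_item = math.ceil(item_size_kb / 1)   # 1 KB blocks, rounded up
    return per_item * writes_per_second

print(provisioned_rcu(3, 80))     # 80  (3 KB strongly consistent reads)
print(provisioned_wcu(0.5, 100))  # 100 (512-byte writes)
```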
DynamoDB Auto Scaling
It actively manages throughput capacity for tables and global secondary indexes. With auto scaling, a range (upper and lower limits) for read and write capacity units is defined, along with an optional target utilization percentage within that range. Auto scaling seeks to maintain the target utilization, even as the application workload increases or decreases. It can increase a table’s provisioned read and write capacity to handle sudden increases in traffic without request throttling. When the workload decreases, DynamoDB auto scaling can decrease the throughput so that you don’t pay for unused provisioned capacity.
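DynamoDB auto scaling is driven by Application Auto Scaling; a hedged sketch of registering a read-capacity scaling target for a hypothetical table is shown below (the table name, capacity range, and target utilization are assumptions).

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the table's read capacity as a scalable target with a 5-100 RCU range.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/GameScores",                        # hypothetical table
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=100,
)

# Target-tracking policy: keep consumed/provisioned read capacity around 70%.
autoscaling.put_scaling_policy(
    PolicyName="GameScoresReadScaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/GameScores",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```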
Reserved Capacity
It can be purchased in advance for tables that use the DynamoDB Standard table class. To use reserved capacity, a one-time upfront fee is paid along with a commitment to a minimum provisioned usage level over a period of time. Reserved capacity is billed at a discounted hourly rate, which can provide cost savings over standard provisioned capacity. Any capacity provisioned in excess of the reserved capacity is billed at standard provisioned capacity rates.
Table Behavior while Switching Read/Write Capacity Mode
When a table is switched from provisioned capacity mode to on-demand capacity mode, DynamoDB makes several changes to the structure of the table and its partitions. This process can take several minutes. During the switching period, the table delivers throughput that is consistent with the previously provisioned write capacity unit and read capacity unit amounts. When switching from on-demand capacity mode back to provisioned capacity mode, the table delivers throughput consistent with the previous peak reached when the table was set to on-demand capacity mode.
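Switching modes is done with UpdateTable; a minimal sketch, with the table name assumed. Both calls are shown together only for illustration, since real switches are limited to once every 24 hours.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Provisioned -> on-demand.
dynamodb.update_table(TableName="GameScores", BillingMode="PAY_PER_REQUEST")

# On-demand -> provisioned (throughput values must be supplied again).
dynamodb.update_table(
    TableName="GameScores",
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)
```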
Table Classes
DynamoDB offers two table classes designed to help you optimize for cost. The choice of a table class is not permanent and can be changed. Each table class offers different pricing for data storage as well as throughput (read and write requests).
- Standard table class is the default, and is recommended for the vast majority of workloads.
- Standard-Infrequent Access (DynamoDB Standard-IA) table class is optimized for tables where storage is the dominant cost. For example, tables that store infrequently accessed data, such as application logs, old social media posts, e-commerce order history, and past gaming achievements, are good candidates for the Standard-IA table class.
| Characteristic | Standard | Standard-Infrequent Access |
|---|---|---|
| Throughput costs | lower | higher |
| Storage costs | higher | lower |
Similarities between the two classes:
- Both table classes have the same performance, durability, and availability, and are compatible with all existing DynamoDB features such as auto scaling, on-demand mode, time-to-live (TTL), on-demand backups, point-in-time recovery (PITR), and global secondary indexes.
- They have the same APIs and service endpoints. This means that switching between the table classes does not require changing the application code.
Partitions and Data Distribution
DynamoDB stores data in partitions. A partition is an allocation of storage for a table, backed by solid state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS Region. Partition management is handled entirely by DynamoDB; it occurs automatically in the background and is transparent to the application.
When a table is created, the initial status of the table is CREATING. During this phase, DynamoDB allocates sufficient partitions to the table so that it can handle provisioned throughput requirements. An application can begin writing and reading table data after the table status changes to ACTIVE.
DynamoDB allocates additional partitions to a table in the following situations:
- If the table’s provisioned throughput settings are increased beyond what the existing partitions can support.
- If an existing partition fills to capacity and more storage space is required.
A table remains available throughout and fully supports its provisioned throughput requirements. Global secondary indexes in DynamoDB are also composed of partitions. The data in a global secondary index is stored separately from the data in its base table, but index partitions behave in much the same way as table partitions.
Relational Databases vs DynamoDB
| Characteristic | Relational Database Management System (RDBMS) | Amazon DynamoDB |
|---|---|---|
| Optimal Workloads | Ad hoc queries; data warehousing; OLAP (online analytical processing). | Web-scale applications, including social networks, gaming, media sharing, and Internet of Things (IoT). |
| Data Model | The relational model requires a well-defined schema, where data is normalized into tables, rows, and columns. In addition, all of the relationships are defined among tables, columns, indexes, and other database elements. | DynamoDB is schemaless. Every table must have a primary key to uniquely identify each data item, but there are no similar constraints on other non-key attributes. It can manage structured or semistructured data, including JSON documents. |
| Data Access | SQL is the standard for storing and retrieving data. Relational databases offer a rich set of tools for simplifying the development of database-driven applications, but all of these tools use SQL. | You can use the AWS Management Console, the AWS CLI, or NoSQL WorkBench to work with DynamoDB and perform ad hoc tasks. PartiQL, a SQL-compatible query language, allows CRUD in DynamoDB. Applications can use the AWS SDKs to work with DynamoDB using object-based, document-centric, or low-level interfaces. |
| Performance | Relational databases are optimized for storage, so performance generally depends on the disk subsystem. Developers and database administrators must optimize queries, indexes, and table structures in order to achieve peak performance. | DynamoDB is optimized for compute, so performance is mainly a function of the underlying hardware and network latency. As a managed service, DynamoDB insulates users and applications from these implementation details, so that focus is on designing and building robust, high-performance applications. |
| Scaling | It is easiest to scale up with faster hardware. It is also possible for database tables to span across multiple hosts in a distributed system, but this requires additional investment. Relational databases have maximum sizes for the number and size of files, which imposes upper limits on scalability. | DynamoDB is designed to scale out using distributed clusters of hardware. This design allows increased throughput without increased latency. Customers specify their throughput requirements, and DynamoDB allocates sufficient resources to meet those requirements. There are no upper limits on the number of items per table, nor the total size of that table. |
Security
Security is a shared responsibility between AWS and its users. As per the shared responsibility model:
- Security of the cloud: AWS is responsible for protecting the infrastructure that runs AWS services in the AWS Cloud. The effectiveness of this security is regularly tested and verified by third-party auditors as part of the AWS compliance programs.
- Security in the cloud: The user’s responsibility is determined by the AWS service. They are also responsible for other factors including the sensitivity of data, their organization’s requirements, and applicable laws and regulations.
Data Protection
Data is redundantly stored on multiple devices across multiple facilities in an Amazon DynamoDB Region. DynamoDB protects user data stored at rest and also data in transit between on-premises clients and DynamoDB, and between DynamoDB and other AWS resources within the same AWS Region.
Encryption at Rest
All user data stored in Amazon DynamoDB is fully encrypted at rest. Enhanced security can be achieved by encrypting all data at rest using encryption keys stored in AWS KMS. This functionality helps reduce the operational burden and complexity involved in protecting sensitive data, and allows building security-sensitive applications that meet strict encryption compliance and regulatory requirements. Encryption at rest provides an additional layer of data protection by securing the data in an encrypted table, including its primary key, local and global secondary indexes, streams, global tables, backups, and DynamoDB Accelerator (DAX) clusters, whenever the data is stored in durable media. When creating a new table, one of the following AWS KMS key types can be chosen to encrypt the table:
- AWS owned key: Default encryption type. The key is owned by DynamoDB (no additional charge).
- AWS managed key: The key is stored in your account and is managed by AWS KMS (AWS KMS charges apply).
- Customer managed key: The key is stored in your account and is created, owned, and managed by you. You have full control over the KMS key (AWS KMS charges apply).
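The key type is chosen through the SSESpecification parameter when creating (or updating) a table; a sketch using a customer managed key, with the table name and key ARN as placeholders.

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="SensitiveData",  # hypothetical table
    AttributeDefinitions=[{"AttributeName": "RecordId", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "RecordId", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
    SSESpecification={
        "Enabled": True,
        "SSEType": "KMS",
        # Customer managed key (placeholder ARN); omit KMSMasterKeyId to use the
        # AWS managed key, or omit SSESpecification entirely for the AWS owned key.
        "KMSMasterKeyId": "arn:aws:kms:us-west-2:123456789012:key/EXAMPLE-KEY-ID",
    },
)
```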
Internetwork Traffic Privacy
Connections are protected both between Amazon DynamoDB and on-premises applications and between DynamoDB and other AWS resources within the same AWS Region.
- Access to DynamoDB via the network is through AWS published APIs. Clients must support Transport Layer Security (TLS) 1.0+ (TLS 1.2 or above recommended). Clients must also support cipher suites with Perfect Forward Secrecy (PFS), such as Ephemeral Diffie-Hellman (DHE) or Elliptic Curve Diffie-Hellman Ephemeral (ECDHE). Additionally, requests must be signed using an access key ID and a secret access key that are associated with an IAM principal, or use the AWS Security Token Service (STS) to generate temporary security credentials to sign requests.
- Access to DynamoDB from within the same Region can go through a VPC endpoint. A VPC endpoint for DynamoDB is a logical entity within a VPC that allows connectivity only to DynamoDB. The VPC routes requests to DynamoDB and routes responses back to the VPC.
Identity and Access Management
This section contains information relevant to DynamoDB. See AWS Identity and Access Management for details. IAM administrators control who can be authenticated (signed in) and authorized (have permissions) to use DynamoDB resources. They manage access permissions and implement security policies for both DynamoDB and DynamoDB Accelerator (DAX).
| Resource Type | ARN Format |
|---|---|
| Table | arn:aws:dynamodb:region:account-id:table/table-name |
| Index | arn:aws:dynamodb:region:account-id:table/table-name/index/index-name |
| Stream | arn:aws:dynamodb:region:account-id:table/table-name/stream/stream-label |
A permissions policy describes who has access to what. Policies attached to an IAM identity are referred to as identity-based policies (IAM policies). Policies attached to a resource are referred to as resource-based policies. DynamoDB supports only identity-based policies (IAM policies).
An example of a permissions policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DescribeQueryScanBooksTable",
"Effect": "Allow",
"Action": [
"dynamodb:DescribeTable",
"dynamodb:Query",
"dynamodb:Scan"
],
"Resource": "arn:aws:dynamodb:us-west-2:account-id:table/Books"
}
]
}
AWS addresses some common use cases by providing standalone IAM policies that are created and administered by AWS. These AWS managed policies grant necessary permissions for common use cases so that you can avoid having to investigate which permissions are needed. DynamoDB-specific managed policies are:
- AmazonDynamoDBReadOnlyAccess – Grants read-only access to DynamoDB resources through the AWS Management Console.
- AmazonDynamoDBFullAccess – Grants full access to DynamoDB resources through the AWS Management Console.
Use the IAM Condition element to implement a fine-grained access control policy. By adding a Condition element to a permissions policy, access to items and attributes in DynamoDB tables and indexes can also be allowed or denied. There are two use cases for fine-grained access control:
- Grant permissions on a table, but restrict access to specific items in that table based on certain primary key values. An example might be a social networking app for games, where all users’ saved game data is stored in a single table, but no users can access data items that they do not own.
- Grant permissions on a table, but restrict access to specific attributes in that table. An example might be an app that displays flight data for nearby airports, based on the user’s location. Airline names, arrival and departure times, and flight numbers are all displayed. However, attributes such as pilot names or the number of passengers are hidden.
Consider a mobile gaming app that lets players play different games. The app uses a DynamoDB table named GameScores to keep track of high scores and other user data. Each item in the table is uniquely identified by a user ID and the name of the game that the user played. The GameScores table has a primary key consisting of a partition key (UserId) and sort key (GameTitle). Users only have access to game data associated with their user ID. A user who wants to play a game must belong to an IAM role named GameRole, which has a security policy attached to it.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowAccessToOnlyItemsMatchingUserID",
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:BatchGetItem",
"dynamodb:Query",
"dynamodb:PutItem",
"dynamodb:UpdateItem",
"dynamodb:DeleteItem",
"dynamodb:BatchWriteItem"
],
"Resource": [
"arn:aws:dynamodb:us-west-2:123456789012:table/GameScores"
],
"Condition": {
"ForAllValues:StringEquals": {
"dynamodb:LeadingKeys": [
"${www.amazon.com:user_id}"
],
"dynamodb:Attributes": [
"UserId",
"GameTitle",
"Wins",
"Losses",
"TopScore",
"TopScoreDateTime"
]
},
"StringEqualsIfExists": {
"dynamodb:Select": "SPECIFIC_ATTRIBUTES"
}
}
}
]
}
In addition to granting permissions for specific DynamoDB actions (Action element) on the GameScores table (Resource element), the Condition element uses the following condition keys specific to DynamoDB that limit the permissions as follows:
- dynamodb:LeadingKeys: This condition key allows users to access only the items where the partition key value matches their user ID. This ID, ${www.amazon.com:user_id}, is a substitution variable.
- dynamodb:Attributes: This condition key limits access to the specified attributes so that only the actions listed in the permissions policy can return values for these attributes. In addition, the StringEqualsIfExists clause ensures that the app must always provide a list of specific attributes to act upon and that the app can’t request all attributes.
Note: When using dynamodb:Attributes, the names of all of the primary key and index key attributes must be specified for the table and any secondary indexes listed in the policy. Otherwise, DynamoDB can’t use these key attributes to perform the requested action.
Compliance Validation by Industry
The security and compliance of DynamoDB is assessed by third-party auditors as part of multiple AWS compliance programs, including the following:
- System and Organization Controls (SOC)
- Payment Card Industry (PCI)
- Federal Risk and Authorization Management Program (FedRAMP)
- Health Insurance Portability and Accountability Act (HIPAA)
Resilience and Disaster Recovery
The AWS global infrastructure is built around AWS Regions and Availability Zones. AWS Regions provide multiple physically separated and isolated Availability Zones, which are connected with low-latency, high-throughput, and highly redundant networking. With Availability Zones, you can design and operate applications and databases that automatically fail over between Availability Zones without interruption. Availability Zones are more highly available, fault tolerant, and scalable than traditional single or multiple data center infrastructures. In addition to the AWS global infrastructure, Amazon DynamoDB offers the following features for data resiliency and backup needs:
- On-demand backup and restore: DynamoDB provides on-demand backup capability. It allows creation of full backups of tables for long-term retention and archival.
- Point-in-time recovery: Point-in-time recovery helps protect tables from accidental write or delete operations, without the need to create, maintain, or schedule on-demand backups.
Security Best Practices
Preventive
- Encryption at rest
- Use IAM roles to authenticate access to DynamoDB: Do not store AWS credentials directly in the application or EC2 instance. An IAM role allows for temporary access keys that can be used to access AWS services and resources.
- Use IAM policies for DynamoDB base authorization: Implementing least privilege is key in reducing security risk and the impact that can result from errors or malicious intent. Attach permissions policies to IAM identities (that is, users, groups, and roles) and thereby grant permissions to perform operations on DynamoDB resources.
- Use IAM policy conditions for fine-grained access control: Specify conditions when granting permissions using an IAM policy. Examples of fine-grained access:
- Grant permissions to allow users read-only access to certain items and attributes in a table or a secondary index.
- Grant permissions to allow users write-only access to certain attributes in a table, based upon the identity of that user.
- Use a VPC endpoint and policies to access DynamoDB: If access to DynamoDB is required from within a VPC, use a VPC endpoint to limit access from only the required VPC. Doing this prevents that traffic from traversing the open internet and being subject to that environment. Using a VPC endpoint for DynamoDB allows you to control and limit access using the following:
- VPC endpoint policies: These policies are applied on the DynamoDB VPC endpoint. They allow you to control and limit API access to the DynamoDB table.
- IAM policies: By using the aws:sourceVpce condition on policies attached to IAM users, groups, or roles, you can enforce that all access to the DynamoDB table is via the specified VPC endpoint.
- Consider client-side encryption: If storing sensitive or confidential data in DynamoDB, encrypt that data as close as possible to its origin so that the data is protected throughout its lifecycle. Encrypting your sensitive data in transit and at rest helps ensure that plaintext data isn’t available to any third party. The Amazon DynamoDB Encryption Client is a software library that helps you protect your table data before you send it to DynamoDB.
Detective
- Use AWS CloudTrail to monitor AWS managed KMS key usage
- Monitor DynamoDB operations using CloudTrail
- Use DynamoDB Streams to monitor data plane operations
- Monitor DynamoDB configuration with AWS Config
- Monitor DynamoDB compliance with AWS Config rules
- Tag your DynamoDB resources for identification and automation
Monitoring and Logging
Collect monitoring data from all of the parts of your AWS solution to easily debug a multi-point failure if one occurs. Create a monitoring plan that includes answers to the following questions:
- What are your monitoring goals?
- What resources will you monitor?
- How often will you monitor these resources?
- What monitoring tools will you use?
- Who will perform the monitoring tasks?
- Who should be notified when something goes wrong?
To establish a normal DynamoDB performance baseline, at a minimum, monitor the following items:
- The number of read or write capacity units consumed over the specified time period, to track how much of your provisioned throughput is used.
- Requests that exceeded a table’s provisioned write or read capacity during the specified time period, to determine which requests exceed the provisioned throughput quotas of a table.
- System errors, to determine if any requests resulted in an error.
Monitoring Tools
Automated:
- Amazon CloudWatch Alarms
- Amazon CloudWatch Logs
- Amazon CloudWatch Events
- AWS CloudTrail Log Monitoring
Manual:
- DynamoDB dashboard shows:
- Recent alerts
- Total capacity
- Service health
- CloudWatch home page shows:
- Current alarms and status
- Graphs of alarms and resources
- Service health status
- In addition, CloudWatch can do the following:
- Create customized dashboards to monitor the services
- Graph metric data to troubleshoot issues and discover trends
- Search and browse all AWS resource metrics
- Create and edit alarms to be notified of problems
Amazon CloudWatch Contributor Insights for Amazon DynamoDB is a diagnostic tool for identifying the most frequently accessed and throttled keys in a table or index at a glance. After enabling it on a table or global secondary index, you can view the most accessed and throttled items in those resources.
Best Practices
NoSQL Design
Important differences between an RDBMS and NoSQL are:
- In RDBMS, data can be queried flexibly, but queries are relatively expensive and don’t scale well in high-traffic situations.
- In a NoSQL database such as DynamoDB, data can be queried efficiently in a limited number of ways, outside of which queries can be expensive and slow.
These differences make database design different between the two systems:
- In an RDBMS, you design for flexibility without worrying about implementation details or performance. Query optimization generally doesn’t affect schema design, but normalization is important.
- In DynamoDB, you design the schema specifically to make the most common and important queries as fast and as inexpensive as possible. The data structures are tailored to the specific requirements of the business use case.
The first step in designing your DynamoDB application is to identify the specific query patterns that the system must satisfy.
In particular, it is important to understand three fundamental properties of access patterns:
- Data size: Knowing how much data will be stored and requested at one time will help determine the most effective way to partition the data.
- Data shape: Instead of reshaping data when a query is processed (as an RDBMS system does), a NoSQL database organizes data so that its shape in the database corresponds with what will be queried. This is a key factor in increasing speed and scalability.
- Data velocity: DynamoDB scales by increasing the number of physical partitions that are available to process queries, and by efficiently distributing data across those partitions. Knowing in advance what the peak query loads will be might help determine how to partition data to best use I/O capacity.
After identifying specific query requirements, organize data according to general principles that govern performance:
- Keep related data together: Instead of distributing related data items across multiple tables, keep related items as close together as possible. As a general rule, maintain as few tables as possible in a DynamoDB application. Exceptions are cases involving high-volume time series data, or datasets that have very different access patterns. A single table with inverted indexes can usually enable simple queries to create and retrieve the complex hierarchical data structures required by the application.
- Use sort order: Related items can be grouped together and queried efficiently if their key design causes them to sort together. This is an important NoSQL design strategy.
- Distribute queries: It is also important that a high volume of queries not be focused on one part of the database, where they can exceed I/O capacity. Instead, design data keys to distribute traffic evenly across partitions as much as possible, avoiding “hot spots.”
- Use global secondary indexes: By creating specific global secondary indexes, you can enable queries different from those the main table can support, while keeping them fast and relatively inexpensive.
Partition Key Design
The primary key that uniquely identifies each item in a table can be simple (a partition key only) or composite (a partition key combined with a sort key). Generally, design an application for uniform activity across all logical partition keys in the table and its secondary indexes. You can determine the access patterns that the application requires, and estimate the total read capacity units (RCU) and write capacity units (WCU) that each table and secondary index requires.
Use burst capacity effectively. Whenever a partition’s throughput is not fully used, DynamoDB reserves a portion of that unused capacity (up to 5 minutes’ worth) for later bursts of throughput to handle usage spikes. During an occasional burst of read or write activity, these extra capacity units can be consumed quickly, even faster than the per-second provisioned throughput capacity of the table. DynamoDB can also consume burst capacity for background maintenance and other tasks without prior notice. The DynamoDB adaptive capacity feature enables DynamoDB to run imbalanced workloads indefinitely and minimizes throttling due to throughput exceptions. It also helps reduce costs by enabling you to provision only the throughput capacity that is needed. Adaptive capacity is enabled automatically for every DynamoDB table at no additional cost, and cannot be explicitly enabled or disabled. It is useful in the following scenarios:
- Boost capacity for high-traffic partitions: It’s not always possible to distribute read and write activity evenly. When data access is imbalanced, a “hot” partition can receive a higher volume of read and write traffic compared to other partitions. Adaptive capacity automatically and instantly increases throughput capacity for partitions that receive more traffic, provided that traffic does not exceed the table’s total provisioned capacity or the partition maximum capacity.
- Isolate frequently accessed items: If an application drives disproportionately high traffic to one or more items, adaptive capacity rebalances partitions such that frequently accessed items don’t reside on the same partition. This isolation of frequently accessed items reduces the likelihood of request throttling due to the workload exceeding the throughput quota on a single partition.
Note: This isolation functionality is not available for tables using provisioned read/write capacity mode that have DynamoDB Streams enabled.
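One common technique for distributing write traffic across partitions, often called write sharding, appends a calculated suffix to the partition key value. The sketch below illustrates the idea; the table name, shard count, and key names are assumptions for illustration, and reading a full logical key back requires querying each of the sharded key values.

```python
import boto3
import hashlib

NUM_SHARDS = 10  # assumed shard count; tune to the workload

def sharded_partition_key(base_key: str, item_id: str, num_shards: int = NUM_SHARDS) -> str:
    """Spread items that share a logical partition key across num_shards key values,
    using a hash of the item id to pick the shard deterministically."""
    digest = hashlib.md5(item_id.encode("utf-8")).hexdigest()
    suffix = int(digest, 16) % num_shards
    return f"{base_key}#{suffix}"

dynamodb = boto3.client("dynamodb")

# Writes for the same logical key (a calendar date here) now land on one of ten
# partition key values, spreading traffic and avoiding a single hot partition.
dynamodb.put_item(
    TableName="Events",  # hypothetical table
    Item={
        "EventDate": {"S": sharded_partition_key("2024-01-15", "evt-0001")},
        "EventId": {"S": "evt-0001"},
    },
)
```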
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html