DynamoDB Key Diagnostics Library
DynamoDB Key Diagnostics Library is a Java DynamoDB client wrapper that automatically logs your key usage information to Amazon Kinesis as your application reads data from and writes data to DynamoDB. You can then use Amazon Kinesis Data Analytics to feed that information into Amazon CloudWatch, to monitor and alarm if any single key gets too hot, and into Amazon S3, Amazon Athena, and Amazon QuickSight, to report on your detailed key usage and build heat maps that help diagnose your application.
.
├── README.md <-- This instructions file
├── LICENSE.txt <-- Apache Software License 2.0
├── NOTICE.txt <-- Copyright notices
├── checkstyle.xml <-- Checkstyle for the Key Diagnostics client
├── pom.xml <-- Java dependencies, Docker integration test orchestration
├── resources
│   ├── python <-- Contains the AWS Lambda function for emitting hot key metrics and logs
│   │   ├── src
│   │   │   └── diagnostics
│   │   │       └── hot_key_logger_lambda.py <-- The actual Lambda function
│   │   └── tests
│   │       └── diagnostics
│   │           └── test_hot_key_logger_lambda.py <-- Unit test for the Lambda function
│   └── DynamoDB_Key_Diagnostics_Library.yml <-- CloudFormation template, see "Step 3" below
├── src
│   ├── main
│   │   └── java
│   │       └── com.amazonaws.services.dynamodb.diagnostics <-- Main package for Key Diagnostics Client
│   │           ├── DynamoDBKeyDiagnosticsClient.java <-- Contains inject methods for handler entrypoints
│   │           ├── DynamoDBKeyDiagnosticsClientAsync.java <-- Similar to DynamoDBKeyDiagnosticsClient, but supports async DynamoDB APIs
│   │           ├── DynamoDBKeyDiagnosticsClientBuilder.java <-- Provides dependencies like the DynamoDB client for injection
│   │           └── KinesisStreamReporter <-- Reporter class that asynchronously sends key usage information to a Kinesis stream
│   └── test <-- Unit and integration tests
│       └── java
│           └── com.amazonaws.services.dynamodb.diagnostics <-- Contains integration tests and unit tests.
│
└── samples <-- Contains the Movies demo application that uses the Key Diagnostics client
    └── movies
        ├── main
        │   ├── java
        │   │   └── com.amazonaws.services.dynamodb.diagnostics.demo
        │   │       └── MoviesApplication.java <-- Simulates hot key scenario with certain "hit movies"
        │   │
        │   └── resources
        │       └── log4j.properties <-- Log4j configuration for the demo application
        ├── checkstyle.xml <-- Checkstyle for the Movies demo application
        └── pom.xml <-- Java and the Key Diagnostics client dependencies
Prerequisites
To use the DynamoDB Key Diagnostics Library or run the demo, you must have the following:
- Java 1.8 or later
- Maven 3 or later
- An AWS account
- The AWS CLI (used in the setup steps below)
Setup process
Note: At this time, the library aggregates the metrics for keys at minute and second granularity. Depending on your business requirements, you may choose to modify the client to aggregate data differently. Currently, you can deploy the provided CloudFormation template in the following AWS Regions: US East (N. Virginia), US West (Oregon), EU (Ireland), and EU (Frankfurt).
With the following setup, the DynamoDB Key Diagnostics Library logs the values of your partition key, sort key, or any other attributes you choose for the selected DynamoDB table. The key usage information is stored in Amazon S3, and specific hot keys are logged and displayed through Amazon CloudWatch and Amazon QuickSight.
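To make the second and minute granularity concrete, here is a minimal Java sketch of per-bucket aggregation. It is a hypothetical illustration, not the library's code: the class `KeyUsageAggregator` and its methods are invented for this example, and its fields mirror the columns of the Athena table created in Step 4. Changing the `ChronoUnit` passed to the constructor is one way you might aggregate differently.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Collections;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/**
 * Hypothetical illustration (not the library's code): sums the I/O consumed per
 * key value inside fixed time buckets, using the same fields as the Athena table.
 */
final class KeyUsageAggregator {
    private final ChronoUnit granularity;
    // bucket start -> "tableName|hashKey|hashKeyValue|operation" -> total I/O
    private final Map<Instant, Map<String, Double>> buckets = new TreeMap<>();

    KeyUsageAggregator(ChronoUnit granularity) {
        // SECONDS or MINUTES matches the granularities this README mentions.
        this.granularity = Objects.requireNonNull(granularity, "granularity");
    }

    void record(Instant when, String tableName, String hashKey, String hashKeyValue,
                String operation, double io) {
        // Truncating the timestamp assigns the event to the start of its bucket.
        Instant bucketStart = when.truncatedTo(granularity);
        String key = String.join("|", tableName, hashKey, hashKeyValue, operation);
        buckets.computeIfAbsent(bucketStart, b -> new TreeMap<>())
               .merge(key, io, Double::sum);
    }

    /** Totals for one bucket, or an empty map if nothing was recorded in it. */
    Map<String, Double> totalsFor(Instant bucketStart) {
        return Collections.unmodifiableMap(
                buckets.getOrDefault(bucketStart, Collections.emptyMap()));
    }

    public static void main(String[] args) {
        KeyUsageAggregator perSecond = new KeyUsageAggregator(ChronoUnit.SECONDS);
        Instant now = Instant.now();
        perSecond.record(now, "movies", "title", "Up", "PutItem", 1.0);
        perSecond.record(now, "movies", "title", "Up", "PutItem", 1.0);
        System.out.println(perSecond.totalsFor(now.truncatedTo(ChronoUnit.SECONDS)));
    }
}
```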
Step 1: Install the Key Diagnostics Library
To install the Key Diagnostics Library, run the following command:
mvn install
Step 2: Configure your AWS credentials
Configure your AWS CLI credentials, if you haven't already. The AWS resources in the following steps will be created in the account that these credentials belong to.
aws configure
Make sure the configured credentials have permissions for Amazon S3, AWS Lambda, Amazon Kinesis, Amazon CloudWatch, and AWS CloudFormation, as well as IAM permissions, because the stack is deployed with CAPABILITY_IAM.
Step 3: Create and deploy the required AWS resources by using the CloudFormation template
You will now deploy a Lambda function for reporting and monitoring metrics. To do this, first upload the provided Lambda function to Amazon S3. If you don't have an Amazon S3 bucket already, create one. (Throughout the following instructions, replace the placeholder names with your own names.)
# Bucket names must be globally unique and may not contain underscores
export BUCKET_NAME=my-cool-new-bucket
aws s3 mb s3://$BUCKET_NAME
Then, package the provided hot key Lambda function and upload it to the Amazon S3 bucket:
aws cloudformation package \
--template-file resources/DynamoDB_Key_Diagnostics_Library.yaml \
--s3-bucket $BUCKET_NAME \
--output-template-file packaged.yaml
Next, deploy the packaged template to create the rest of the necessary AWS resources (such as the Amazon Kinesis Data Streams stream, the Amazon Kinesis Data Analytics application, and the CloudWatch alarm). Provide a CloudFormation stack name as well:
STACK_NAME=KeyDiagnosticsStack
aws cloudformation deploy \
--template-file packaged.yaml \
--stack-name $STACK_NAME \
--capabilities CAPABILITY_IAM
Customizing your Kinesis Data Stream according to your DynamoDB Table
Depending on the provisioned capacity of your DynamoDB table, you may want to change the shard count of the Kinesis data stream that receives the key usage information. The default and minimum is 4 shards. To override the shard count, deploy with a parameter override instead:
SHARD_COUNT=10
aws cloudformation deploy \
--template-file packaged.yaml \
--stack-name $STACK_NAME \
--capabilities CAPABILITY_IAM \
--parameter-overrides KinesisSourceStreamShardCount=$SHARD_COUNT
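If you want a starting point for `SHARD_COUNT`, the following hypothetical Java helper sketches one way to estimate it. It assumes (this README does not say so) that each DynamoDB request produces roughly one Kinesis record of a known average size, and it uses the Kinesis Data Streams per-shard write limits of 1,000 records per second and 1 MB per second (treated here as 1 MiB). The peak request rate you plug in can come from your table's provisioned capacity. `ShardEstimator` is an invented name, not part of the library or the template.

```java
/**
 * Hypothetical helper (not part of the library or the template): estimates a
 * KinesisSourceStreamShardCount value from an expected request rate.
 * Assumes one Kinesis record per DynamoDB request and per-shard write limits
 * of 1,000 records/s and 1 MiB/s.
 */
final class ShardEstimator {
    static final int MIN_SHARDS = 4;                 // template default and minimum
    static final long RECORDS_PER_SHARD = 1_000L;    // records per second per shard
    static final long BYTES_PER_SHARD = 1_048_576L;  // 1 MiB per second per shard

    static int estimateShards(long requestsPerSecond, long avgRecordBytes) {
        if (requestsPerSecond < 0 || avgRecordBytes <= 0) {
            throw new IllegalArgumentException(
                    "requestsPerSecond must be >= 0 and avgRecordBytes must be > 0");
        }
        // Whichever limit is hit first (record count or bytes) decides the shard count.
        long byRecords = ceilDiv(requestsPerSecond, RECORDS_PER_SHARD);
        long byBytes = ceilDiv(requestsPerSecond * avgRecordBytes, BYTES_PER_SHARD);
        return (int) Math.max(MIN_SHARDS, Math.max(byRecords, byBytes));
    }

    // Integer ceiling division for non-negative operands (Math.ceilDiv needs Java 18).
    private static long ceilDiv(long numerator, long denominator) {
        return (numerator + denominator - 1) / denominator;
    }

    public static void main(String[] args) {
        // e.g. a table handling about 10,000 requests/s with ~200-byte records
        System.out.println(estimateShards(10_000, 200));
    }
}
```

Rounding up to a few extra shards leaves headroom for traffic spikes; the result is only an estimate under the assumptions above.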
CloudFormation does not start the Kinesis Data Analytics application automatically. To start it, use the Amazon Kinesis console or run the following commands:
# Find the Kinesis Data Analytics application name in the Kinesis console or with `aws kinesisanalytics list-applications`
KINESIS_ANALYTICS_APP_NAME="Put your application name here"
# Then, look up the input ID (--output text returns it without JSON quotes)
INPUT_ID=$(aws kinesisanalytics describe-application \
--application-name "$KINESIS_ANALYTICS_APP_NAME" \
--query 'ApplicationDetail.InputDescriptions[0].InputId' \
--output text)
# Start the Kinesis Data Analytics app
aws kinesisanalytics start-application \
--application-name "$KINESIS_ANALYTICS_APP_NAME" \
--input-configurations Id=$INPUT_ID,InputStartingPositionConfiguration={InputStartingPosition=NOW}
You are now ready to run the demo Movies application in the repository (Step 3.1) or change your code to use the Key Diagnostics Library (Step 3.2).
Step 3.1: Running the Demo
This demo uses the IMDb metadata dataset to create an application that rates movies by putting items into DynamoDB. Certain movies are "trending", which creates an uneven load on certain partition (hash) keys.
After you have installed the Key Diagnostics Library dependencies and set up all the AWS resources, navigate to the samples/movies/ directory.
Then, execute the demo by running:
KINESIS_STREAM_NAME="Put your Kinesis Data Stream name here"
REGION="Put the region where the Kinesis Data Stream and DynamoDB table are set up"
mvn package exec:java@movies -Dexec.args="trend $KINESIS_STREAM_NAME $REGION"
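The uneven load the demo creates can be pictured with a small simulation. The following Java sketch is hypothetical and is not the code in `MoviesApplication.java`: it sends a fixed share of requests (`hitProbability`) to a few "hit" movies, spreads the rest over a larger catalog, and counts requests per partition key value. The movie names, the 80% share, and the catalog size are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical sketch (not MoviesApplication's code): with probability
 * hitProbability a request goes to one of a few "hit" movies, otherwise to a
 * random catalog movie. Counting requests per key value shows the skew.
 */
final class TrendingWorkload {
    static Map<String, Integer> simulate(List<String> hits, List<String> catalog,
                                         double hitProbability, int requests, long seed) {
        Random random = new Random(seed);  // fixed seed keeps runs repeatable
        Map<String, Integer> requestsPerKey = new HashMap<>();
        for (int i = 0; i < requests; i++) {
            List<String> pool = random.nextDouble() < hitProbability ? hits : catalog;
            String movie = pool.get(random.nextInt(pool.size()));
            requestsPerKey.merge(movie, 1, Integer::sum);
        }
        return requestsPerKey;
    }

    public static void main(String[] args) {
        List<String> catalog = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            catalog.add("Movie " + i);
        }
        Map<String, Integer> counts = simulate(
                Arrays.asList("Hit A", "Hit B"), catalog, 0.8, 10_000, 42L);
        System.out.println("Hit A: " + counts.get("Hit A")
                + ", Movie 0: " + counts.getOrDefault("Movie 0", 0));
    }
}
```

With these parameters, each hit movie is expected to receive on the order of 4,000 of the 10,000 requests, while a typical catalog movie receives on the order of 20. That kind of skew is what the Key Diagnostics Library is meant to surface.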
Step 3.2: Integrating with your existing DynamoDB code
To use the Key Diagnostics client, you need a Kinesis stream for it to log to (the CloudFormation stack from Step 3 creates one). You then pass that stream name, a Kinesis client, and the DynamoDB client you're wrapping to create the Key Diagnostics client. You also need to specify which key attributes in which tables to monitor. The easiest way to do that is to use the factory method that monitors all the key attributes for all the tables and global secondary indexes in your account:
DynamoDBKeyDiagnosticsClient ddbClient = DynamoDBKeyDiagnosticsClient.monitorAllPartitionKeys(
dynamoDB,
kinesisClient,
kinesisStreamName
);
If you need to specify your own attributes to monitor (for example, if you are considering creating a global secondary index on a new attribute and want to know whether it has hot values), create the client as follows:
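import com.amazonaws.services.dynamodb.diagnostics.DynamoDBKeyDiagnosticsClient;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;

// Monitor only MyAttribute in MyTable
DynamoDBKeyDiagnosticsClient ddbClient = new DynamoDBKeyDiagnosticsClient(
dynamoDB,
kinesisClient,
kinesisStreamName,
ImmutableMap.of("MyTable", ImmutableList.of("MyAttribute"))
);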
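Conceptually, the table-to-attributes map tells the client which values to report for each request. The Java sketch below is a hypothetical illustration of that lookup, not the library's implementation: `MonitoredKeyExtractor` is an invented name, and items are simplified to `Map<String, String>`, whereas the `AmazonDynamoDB` interface represents items as `Map<String, AttributeValue>`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: given a table -> monitored-attributes map (the same shape
 * as the map passed to the constructor above), pick out the values that would be
 * reported for one item. Items are simplified to String -> String maps.
 */
final class MonitoredKeyExtractor {
    private final Map<String, List<String>> monitored;

    MonitoredKeyExtractor(Map<String, List<String>> monitored) {
        this.monitored = monitored;
    }

    /** Returns "attribute=value" pairs for the monitored attributes present in the item. */
    List<String> extract(String tableName, Map<String, String> item) {
        // Tables that are not in the map have nothing to report.
        List<String> attributes = monitored.getOrDefault(tableName, Collections.emptyList());
        List<String> found = new ArrayList<>();
        for (String attribute : attributes) {
            String value = item.get(attribute);
            if (value != null) {  // attributes missing from the item are skipped
                found.add(attribute + "=" + value);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, List<String>> config = new HashMap<>();
        config.put("MyTable", Arrays.asList("MyAttribute"));
        Map<String, String> item = new HashMap<>();
        item.put("MyAttribute", "blue");
        item.put("Rating", "5");
        System.out.println(new MonitoredKeyExtractor(config).extract("MyTable", item));
    }
}
```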
After you have created the diagnostics client, you can use it anywhere you would have used a regular AmazonDynamoDB client, because it implements the AmazonDynamoDB interface. The diagnostics client creates a thread pool to log the key usage information to Kinesis asynchronously, so when you're done with it, call close() on it so that it can shut down those threads.
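The thread-pool-plus-close() pattern described above can be sketched with the JDK alone. `AsyncKeyUsageReporter` below is hypothetical (it is not the library's `KinesisStreamReporter`), and the `Consumer<String>` stands in for the code that actually writes to Kinesis.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: report() hands each record to a thread pool and returns
 * immediately; close() lets queued records finish and then stops the threads.
 */
final class AsyncKeyUsageReporter implements AutoCloseable {
    private final ExecutorService pool;
    private final Consumer<String> sink;  // stand-in for the Kinesis write

    AsyncKeyUsageReporter(int threads, Consumer<String> sink) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.sink = sink;
    }

    void report(String record) {
        pool.execute(() -> sink.accept(record));  // does not block the caller
    }

    @Override
    public void close() {
        pool.shutdown();  // reject new work, but run everything already queued
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();  // give up on records still pending after the timeout
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();  // preserve the interrupt status
        }
    }

    public static void main(String[] args) {
        List<String> sent = new CopyOnWriteArrayList<>();
        // try-with-resources guarantees close() runs, so the pool threads exit
        try (AsyncKeyUsageReporter reporter = new AsyncKeyUsageReporter(2, sent::add)) {
            for (int i = 0; i < 5; i++) {
                reporter.report("key-" + i);
            }
        }
        System.out.println(sent.size() + " records sent");
    }
}
```

Because `Executors.newFixedThreadPool` creates non-daemon threads, a reporter built this way that is never closed can keep the JVM from exiting, which is one practical reason to always call close() when you're done.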
Step 4: Visualization through Amazon Athena and QuickSight
If you want to create dashboards, query the key usage information, or understand the access patterns of certain attributes, we highly recommend setting up Athena and QuickSight.
First, go to the Athena console, enter the following under New query 1, and then choose Run Query. This creates an Athena database for the key usage information stored in Amazon S3:
CREATE DATABASE IF NOT EXISTS dynamodbkeydiagnosticslibrary
COMMENT 'Athena database for DynamoDB Key Diagnostics Library';
Then, create the Athena table. Following the demo app, we use movies as the table name. If you created the AWS resources with the provided CloudFormation template in Step 3, the S3 location should look similar to s3://keydiagnosticsstack-aggregatedresultbucket-ejkhrnvyw8ku/keydiagnostics/ (the random suffix of the bucket name will differ in your account).
CREATE EXTERNAL TABLE `movies`(
`second` timestamp COMMENT 'Second aggregated results',
`tablename` string COMMENT 'DynamoDB table name',
`hashkey` string COMMENT 'The partition key attribute name',
`hashkeyvalue` string COMMENT 'The partition key attribute value',
`operation` string COMMENT 'DynamoDB operation',
`totalio` float COMMENT 'Total IO consumed')
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://keydiagnosticsstack-aggregatedresultbucket-ejkhrnvyw8ku/keydiagnostics/'
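Once the table exists, a common question is which key values exceeded some throughput threshold in a given second. In Athena that is a `WHERE totalio > threshold ... ORDER BY totalio DESC` query. The hypothetical Java sketch below expresses the same filter over in-memory rows whose fields mirror the table's columns; `HotKeyFinder` and the sample rows are invented. The threshold is yours to choose. One possible starting point is DynamoDB's per-partition limit of 1,000 write capacity units (or 3,000 read capacity units) per second, but that comparison only makes sense if `totalio` is measured in capacity units, which this README does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: filter aggregated rows (fields mirror the Athena table's
 * columns) for key values whose per-second I/O exceeds a chosen threshold.
 */
final class HotKeyFinder {
    static final class Row {
        final String second;        // Athena timestamp, kept as text here
        final String tableName;
        final String hashKey;
        final String hashKeyValue;
        final String operation;
        final float totalIo;

        Row(String second, String tableName, String hashKey, String hashKeyValue,
            String operation, float totalIo) {
            this.second = second;
            this.tableName = tableName;
            this.hashKey = hashKey;
            this.hashKeyValue = hashKeyValue;
            this.operation = operation;
            this.totalIo = totalIo;
        }
    }

    /** Rows whose per-second I/O exceeds the threshold, hottest first. */
    static List<Row> hotKeys(List<Row> rows, float threshold) {
        List<Row> hot = new ArrayList<>();
        for (Row row : rows) {
            if (row.totalIo > threshold) {
                hot.add(row);
            }
        }
        hot.sort((a, b) -> Float.compare(b.totalIo, a.totalIo));  // descending by I/O
        return hot;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("2020-01-01 00:00:01", "movies", "title", "Up", "PutItem", 950f),
                new Row("2020-01-01 00:00:01", "movies", "title", "Heat", "PutItem", 12f));
        for (Row row : hotKeys(rows, 800f)) {
            System.out.println(row.second + " " + row.hashKeyValue + " " + row.totalIo);
        }
    }
}
```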
After setting the Athena table, you can use QuickSight to visualize the key usage pattern of your application:
- Go to the QuickSight console and click Manage data on the upper right.
- Click New data set, choose Athena and pick a data source name. Then, you should be able to select the Athena database and table created in the previous section.
- Choose Import to SPICE for quicker analytics, then click Visualize!
- Now you should be able to create graphs by filtering table names, time range, partition keys, operation etc. The following is a heat map that shows what movies are popular over a certain time range:
Testing
Running unit tests
We use JUnit for testing our code. Unit tests mock out the AmazonDynamoDBClient and do not require connectivity to a DynamoDB endpoint. You can run unit tests with the following command:
mvn test
Running integration tests
Integration tests do not mock out the AmazonDynamoDBClient and require connectivity to a DynamoDB endpoint. For this reason, the POM starts DynamoDB Local from its Docker Hub image for the integration tests.
mvn verify -P integration-tests