Contributors: jadimmock
Alexa Skills Kit Testing Framework
This framework lets you script and execute complex conversations with your Alexa skills. Conversations are specified in YAML format. The SDK uses the Skill Invocation API and the Skill Simulation API of Amazon's Skill Management API (SMAPI) to fire requests at your skill endpoint and evaluates its responses with the help of assertions specified in YAML. It is built on a core Java SDK that you can also use for programmatic access, e.g. for writing unit tests for skills implemented in Java.
Learn how to use this framework in the lab guides that backed the Test Automation for Alexa skills workshop at the re:Invent 2017 conference in Las Vegas.
Components
The SDK consists of three main components. Each of them can be used on its own and is dedicated to one specific use case.
Java SDK is the core component. It encapsulates access to the Test APIs that are part of SMAPI. Your skill does not need to be written in Java, and you do not need to write a single line of Java to use the Test SDK. But if your skill is implemented in Java, you can use the core Java SDK to write unit tests. It has a fluent interface.
Test CLI sits on top of the Java SDK. It comes as a JAR package and is invoked from the command line. The CLI takes YAML files from local storage that specify the conversation you'd like to run against your skill endpoint.
Lambda Handler also sits on top of the Java SDK. It implements a LambdaRequestHandler. The JAR package can be uploaded to a Lambda function. The Lambda Handler consumes YAML script files from an S3 bucket whose name is defined in environment variables. To make things easier for you, this repository provides a CloudFormation template. The stack created from it has a Lambda function hosting the test client and an S3 bucket from which it reads the YAML scripts that specify the conversations to simulate with your skill.
YAML scripts
YAML files specify multi-step and even multi-path conversations with your skill. As the test client keeps track of session state, multiple requests always belong to one and the same session.
configuration:
  endpoint:
    type: InvocationApi | SimulationApi
    skillId: # <<enter your skill-id>>
    region: NA | EU
    locale: en-US | en-GB | de-DE | en-IN | en-CA
StopIt: &Exit
  - intent: AMAZON.StopIntent
  - response.shouldEndSession == true
GuessFour: &GuessFour
  - intent: GuessNumber
  - number: 4
  - response.outputSpeech.ssml =~ /.*Great. That’s it.*/i : *Exit
GuessSix: &GuessSix
  - intent: GuessNumber
  - number: 6
  - response.outputSpeech.ssml =~ /.*Great. That’s it.*/i : *Exit
GuessFive: &GuessFive
  - intent: GuessNumber
  - number: 5
  - response.shouldEndSession == false
  - response.outputSpeech.ssml =~ /.*My number is higher.*/i : *GuessSix
  - response.outputSpeech.ssml =~ /.*My number is lower.*/i : *GuessFour
  - response.outputSpeech.ssml =~ /.*Great. That’s it.*/i : *Exit
Launch:
  - response.shouldEndSession == false
  - *GuessFive
Please note: as YAML anchors are used, the logical structure of this file is upside down (anchors can only be referenced AFTER they have been defined). Read it from bottom to top to follow the logical path of execution.
This is a typical YAML conversation script which covers most of the supported concepts:
-
Configuration: This element is mandatory. It sets the environment for your test execution. There are many things you can set up, but only the endpoint's skillId and type are required. To get all the setup options for the configuration see below.
-
Launch: This node is mandatory. The client uses it as the entry point. It implicitly fires a LaunchRequest at your skill unless you provide an intent or utterance sub-element (which in this case would open the skill session with a one-shot rather than a LaunchRequest).
-
Assertions: Assertions are expressed in simplified JSONPath. Here are just a few examples:
- response.shouldEndSession == false # session left open in the skill response
- response.reprompt.outputSpeech.text # reprompt speech exists in the skill response
- !(response.card) # a card does not exist in the skill response
- response.outputSpeech.ssml != response.reprompt.outputSpeech.text # output speech and reprompt speech differ
- response.outputSpeech.ssml =~ /.*test.*/i # output speech contains 'test' in the skill response
- sessionAttributes.key >= 10 # a session attribute named key has a value >= 10
You're always validating a JSON response coming from your skill against an expected output. See JSON schema reference for custom skills in Alexa.
In the above example the expectation for a response returned from the LaunchRequest is that the session is left open by the skill. If an assertion is not met, the test client throws an exception resulting in termination of the whole test execution.
-
Gotos: You can reference another YAML element in the script (from here on referred to as a conversation step) that you'd like to follow up with (e.g. *GuessFive). In the above example we continue with the GuessFive conversation step after the LaunchRequest was fired and validated. Learn more about YAML anchors.
-
Conditional Paths: A condition is similar to an assertion with one important difference: it has a value assigned, normally a Goto (YAML anchor reference). Assigning a value to an assertion automatically turns it into a condition. A condition won't raise an exception if the expression isn't true. But if the condition is met, the test client continues with the conversation step referenced as the value of this condition. This is how you create multi-path conversations: the test client dynamically follows a path depending on the skill's response.
-
Conversation Steps always represent a call to your skill. A step usually consists of the following:
- an anchor definition defined as a value to this node as otherwise you won't be able to enter it (with a Goto). The Launch node is the only kind of conversation step that doesn't need it as it automatically is the entry point.
- either an intent or utterance attribute. The intent element contains the name of the intent you'd like to fire at your skill whereas the utterance attribute would contain the spoken text used to call your skill. The intent element will be taken in case you configured your script to access InvocationApi. utterance definitions only make sense and are supported when working with the SimulationApi. The API the test client will access is set in the configuration section of the YAML file.
- zero to many slot value assignments. In case you are firing an intent at your skill you can optionally give it some slot values. The attribute key needs to be the name of the slot in your skill.
- zero to many of the above-mentioned assertions expressed in JSONPath
- zero to many of the above-mentioned conditional paths expressed in JSONPath and assigned with a Goto (conversation step) reference to follow up with in case the expression is true.
- zero to many Gotos to follow up with in any case, without making these steps dependent on a validation expression. You see such a Goto in the Launch node in the above example.
It's worth mentioning that the order in which you define these sub-elements only matters within the same category (e.g. assertions and conditional paths are each processed top down). Across categories, the test client processes them one by one in the following order:
-
It first looks for an intent or utterance element to set the type of request fired at your skill.
-
It then looks for slot value assignments, which it only considers when working with the InvocationApi.
-
Next, it processes the assertions in the order they are defined and applies them to the response returned from the skill.
-
Next, it processes the conditional paths in the order they are defined. More than one path can be entered if the skill response matches more than one condition.
-
Lastly, the test client follows any Gotos in the order they are defined.
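The processing order described above can be illustrated with a small, self-contained sketch. It is not the test client's actual implementation: the class StepOrderSketch, the Kind enum and the Element type are hypothetical names introduced only for this illustration. The sketch assumes each sub-element has already been classified, and uses a stable sort so that elements of the same category keep the order in which they were written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class StepOrderSketch {
    // Declared in the processing order described above; the enum ordinal is the sort key.
    enum Kind { REQUEST, SLOT, ASSERTION, CONDITIONAL_PATH, GOTO }

    // One sub-element of a conversation step (hypothetical type, for illustration only).
    static final class Element {
        final Kind kind;
        final String text;

        Element(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + " " + text;
        }
    }

    // List.sort is stable, so elements of the same kind keep their script order.
    static List<Element> inProcessingOrder(List<Element> asWritten) {
        List<Element> ordered = new ArrayList<>(asWritten);
        ordered.sort(Comparator.comparing((Element e) -> e.kind));
        return ordered;
    }

    public static void main(String[] args) {
        // The GuessFive step, deliberately written in a scrambled order.
        List<Element> step = Arrays.asList(
                new Element(Kind.CONDITIONAL_PATH, "ssml =~ /.*My number is higher.*/i : *GuessSix"),
                new Element(Kind.ASSERTION, "response.shouldEndSession == false"),
                new Element(Kind.SLOT, "number: 5"),
                new Element(Kind.CONDITIONAL_PATH, "ssml =~ /.*My number is lower.*/i : *GuessFour"),
                new Element(Kind.REQUEST, "intent: GuessNumber"));
        inProcessingOrder(step).forEach(System.out::println);
    }
}
```

Running the sketch prints the request first, then the slot, the assertion and the two conditional paths in their written order.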
Get started
To get started, the test client needs access to your skill. It uses the Login With Amazon SSO client id and client secret of a security profile that you have to set up in the Amazon developer console. It also needs a refresh token for that profile, granted the proper rights to access skills for testing. Learn how to set this up and get the client id, client secret and refresh token at the very bottom of this README.
Run it
The YAML conversation scripts can be executed from the command line (by using the Test CLI provided in this SDK) or in Lambda (by using the Lambda handler that is part of this SDK).
Test CLI
The CLI component expects lwaClientId, lwaClientSecret and lwaRefreshToken to be set as environment variables in the runtime environment. Learn how to get these values in the Set up Login With Amazon section.
$ export lwaClientId=...
$ export lwaClientSecret=...
$ export lwaRefreshToken=...
Now you can use the JAR package you get when you build the project and reference the YAML script via its file path. You can also download the latest build.
$ java -jar alexa-skills-kit-tester-java-1.1.0.jar -f ./path/to/your/script.yml
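Before launching the CLI, it can help to verify that the three Login With Amazon variables are actually present in the environment. The following is a minimal sketch of such a pre-flight check, not something shipped with the SDK; the class name LwaEnvironmentCheck and its method are hypothetical. Only the three variable names are taken from this README.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class LwaEnvironmentCheck {
    // Variable names as documented for the Test CLI.
    static final String[] REQUIRED = {"lwaClientId", "lwaClientSecret", "lwaRefreshToken"};

    // Returns the names of required variables that are absent or blank.
    static List<String> missing(Map<String, String> environment) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = environment.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> unset = missing(System.getenv());
        if (unset.isEmpty()) {
            System.out.println("All Login With Amazon variables are set.");
        } else {
            System.out.println("Missing environment variables: " + unset);
        }
    }
}
```

Passing the environment in as a map keeps the check easy to exercise in a unit test without touching the real process environment.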
Lambda Handler
Use the provided CloudFormation template and create a stack from it in AWS CloudFormation. During setup the template asks you for the Login With Amazon client id, client secret and refresh token. They will all be set as environment variables in the Lambda function. The template also creates a new S3 bucket from which the Lambda function will get the YAML script files.
After the CloudFormation stack has been created successfully, go to the newly created S3 bucket and upload one or more YAML scripts you wrote for your skill. Next, you can run the Lambda function and see the results in the log output. Please note that the test client will pick up all *.yml files it finds in the S3 bucket. This might lead to long-running executions. The Lambda function by default is set to the maximum runtime of 300 seconds.
As the test client raises an exception on missed assertions defined in the YAML scripts, the Lambda execution will terminate. You could set up CloudWatch to catch those failures and send out an alarm. Think of the following scenario:
Create a CloudWatch rule that periodically triggers the test Lambda function. Create a CloudWatch Alarm which gets triggered on failed Lambda executions. Do this with your live skill and your test client will simulate entire conversations with your skill every X minutes and send out a notification via e-mail or SMS if your skill returns unexpected responses. What you get is an early warning system providing proactive monitoring for your Alexa skills in production.
Java SDK
If your skill is implemented in Java you can use the core Java component of this SDK to write unit tests. It is also useful if you would like to customize the test execution flow a bit more and don't want to rely on YAML scripts.
<dependencies>
  ...
  <dependency>
    <groupId>io.klerch</groupId>
    <artifactId>alexa-skills-kit-tester-java</artifactId>
    <version>1.1.0</version>
    <scope>test</scope>
  </dependency>
  ...
</dependencies>
Choose an Endpoint type
Your test environment needs an endpoint configuration that tells it how to address your skill. You can choose from:
AlexaLambdaEndpoint: Fires a sequence of request payloads at the Lambda function which is your skill endpoint. It communicates via the AWS API and needs your AWS credentials set in your environment.
final AlexaEndpoint lambdaEndpoint = AlexaLambdaEndpoint.create("lambdaFunctionName").build();
The Lambda function is referenced by name and must exist in the AWS account whose credentials you set up in the execution environment (most likely from system properties, the ~/.aws/ folder or, in case you're running your test client in another Lambda function, from the IAM execution role).
AlexaInvocationApiEndpoint: Fires a sequence of request payloads at your skill via the Invocation API which is part of SMAPI. It needs access to your developer account. You need to set lwaClientId, lwaClientSecret and lwaRefreshToken as environment variables or pass them to the builder.
final AlexaEndpoint endpoint = AlexaInvocationApiEndpoint.create("skillId") // mandatory
        .withEndpointRegion("NA") // optional, defaults to NA
        .withLwaClientId("yourClientId") // optional, if set as lwaClientId in your environment
        .withLwaClientSecret("yourClientSecret") // optional, if set as lwaClientSecret in your environment
        .withLwaRefreshToken("yourRefreshToken") // optional, if set as lwaRefreshToken in your environment
        .build();
The endpoint region is either "NA" (North America) or "EU" (Europe). If you don't provide it the endpoint region defaults to NA.
AlexaSimulationApiEndpoint: Fires a sequence of utterances at your skill via the Simulation API which is part of SMAPI. It needs access to your developer account. You need to set lwaClientId, lwaClientSecret and lwaRefreshToken as environment variables or pass them to the builder.
final AlexaEndpoint endpoint = AlexaSimulationApiEndpoint.create("skillId") // mandatory
        .withLocale(Locale.US) // optional, defaults to en-US
        .withLwaClientId("yourClientId") // optional, if set as lwaClientId in your environment
        .withLwaClientSecret("yourClientSecret") // optional, if set as lwaClientSecret in your environment
        .withLwaRefreshToken("yourRefreshToken") // optional, if set as lwaRefreshToken in your environment
        .build();
When you set up the interaction models of your skill in the developer console you explicitly did it for one locale. If you just provide the skill-id as a parameter the locale defaults to "en-US".
AlexaRequestStreamHandlerEndpoint: Fires a sequence of request payloads at your speechlet handler implementation in Java. This only works for skills written in Java and can be used for unit testing.
final AlexaEndpoint endpoint = AlexaRequestStreamHandlerEndpoint.create(MySpeechlet.class).build();
You directly point to the entry class of your skill written in Java that implements the SpeechletRequestStreamHandler. You could also give it an instance in case you'd like to mock your test object.
Set up the Test Client
After you set up your endpoint you need to assign it to an AlexaClient which will orchestrate the entire conversation with your skill.
final AlexaClient client = AlexaClient.create(endpoint).build();
Of course, you could still use YAML files to inject configuration settings and let the AlexaClient run the script instead of defining custom conversation steps programmatically as described below.
AlexaClient.create("./path/to/your/script.yml").build().startScript();
There are a few more settings you can customize that are considered when firing requests at your skill. Please note: if you're using the AlexaSimulationApiEndpoint those settings will be ignored.
AlexaClient client = AlexaClient.create(endpoint)
        .withAccessToken("my-access-token") // optional. simulates account-linked user
        .withApplicationId(skillId) // optional. Should be set in case your skill verifies incoming requests
        .withApiEndpoint(AlexaClient.API_ENDPOINT.Europe) // optional. defaults to NorthAmerica
        .withDebugFlagSessionAttribute("flag") // optional. when set, a session-attribute called "flag" with value true is in each of your requests
        .withDeviceId("my-device-id") // optional. if not set the id will be left empty in request payload
        .withDeviceIdRandomized() // optional. generates a device-id for you
        .withLocale(Locale.UK) // optional. defaults to en-US
        .withSupportedInterface(DisplayInterface.builder().build()) // to simulate requests coming from display device
        .withSupportedInterface(AudioPlayerInterface.builder().build()) // to simulate requests coming from audio device
        .withTimestamp(new Date()) // optional. if not set, the client assigns the current date and time to the request
        .withUserId("my-user-id") // optional. if not set the client will generate a user-id
        .build();
The client is now set and ready to start a conversation with your skill.
When creating an AlexaClient from a YAML script, you could also pass in an InputStream or File object instead of a file path.
Simulating multi-turn interactions with your skill
We now code the conversation step by step.
client.startSession()
        .launch().done()
        .help().done()
        .intent("startIntroIntent").done()
        .intent("introducedIntent", "name", "John Doe").done()
        .stop().done();
This single statement has a conversation with your skill consisting of five user interactions (all within one session, as the client takes care of sending the same sessionId in each of the requests and routing outgoing session attributes to the next request). The conversation starts with the invocation of the skill, followed by a user asking for help and two requests that trigger custom intents, the second of which passes in a slot. Finally, the conversation ends with the Stop intent (e.g. the user says "Stop"). Note: the client even emits SessionEndedRequests. That being said, six requests are fired at the skill endpoint.
If you're using the AlexaSimulationApiEndpoint it gets even better. Now you can type in what the user says.
client
.say("start skill name").done()
.say("ask skill name for help").done()
.say("ask skill name for my introduction").done()
.say("ask skill name for John Doe").done()
.say("ask skill name to stop").done();
Please note! Currently the Simulation API does not support multi-turn dialogs within one session. Each of these steps opens a new session. No session attributes will be taken over to the next request.
Validating skill responses
You're already familiar with assertions. There are a few shortcuts, e.g. for looking for a string in the output speech or for checking the existence of an asset like a card in the response. You could also just give it a JSONPath expression like you do in the YAML assertions.
client.startSession()
        .launch()
            .assertFalse("response.shouldEndSession")
            .assertContains("response.outputSpeech.ssml", "Welcome")
            .done()
        .help()
            .assertEquals("response.outputSpeech.ssml", "This is your help")
            .done()
        .intent("startIntroIntent")
            .assertExists("sessionAttributes.username")
            .done()
        .intent("introducedIntent", "name", "John Doe")
            .assertExecutionTimeLessThan(1000)
            .assertThat(response -> response.getVersion().equals("1.0.0"))
            .done()
        .stop()
            .assertTrue("response.shouldEndSession")
            .done();
We didn't test on the session being open after each step just because it's not necessary. The client takes note when the skill closes the session and would throw an error if another request is fired at the endpoint - the session cannot take another request as it is closed.
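The session behaviour described above (one sessionId for all requests, session attributes routed from each response into the next request, and an error once the skill has closed the session) can be sketched in a few lines. This is an illustration under assumptions, not the SDK's code: SessionTrackingSketch, its nested Response type and the Skill interface are hypothetical stand-ins, and the skill is simulated by a lambda rather than a real endpoint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class SessionTrackingSketch {
    // The parts of a skill response that matter for session tracking.
    static final class Response {
        final Map<String, Object> sessionAttributes;
        final boolean shouldEndSession;

        Response(Map<String, Object> sessionAttributes, boolean shouldEndSession) {
            this.sessionAttributes = sessionAttributes;
            this.shouldEndSession = shouldEndSession;
        }
    }

    // Stand-in for the skill endpoint (hypothetical; not an interface of the SDK).
    interface Skill {
        Response handle(String sessionId, String intent, Map<String, Object> attributes);
    }

    private final String sessionId = "SessionId." + UUID.randomUUID();
    private final Skill skill;
    private Map<String, Object> attributes = new HashMap<>();
    private boolean closed;

    SessionTrackingSketch(Skill skill) {
        this.skill = skill;
    }

    Response send(String intent) {
        if (closed) {
            // A closed session cannot take another request.
            throw new IllegalStateException("Session " + sessionId + " is closed");
        }
        Response response = skill.handle(sessionId, intent, new HashMap<>(attributes));
        attributes = new HashMap<>(response.sessionAttributes); // routed into the next request
        closed = response.shouldEndSession;
        return response;
    }

    public static void main(String[] args) {
        // Toy skill: counts turns in a session attribute and ends the session on "stop".
        Skill counter = (id, intent, attrs) -> {
            int turns = (Integer) attrs.getOrDefault("turns", 0) + 1;
            attrs.put("turns", turns);
            return new Response(attrs, intent.equals("stop"));
        };
        SessionTrackingSketch session = new SessionTrackingSketch(counter);
        session.send("launch");
        System.out.println("turns = " + session.send("stop").sessionAttributes.get("turns"));
        try {
            session.send("help");
        } catch (IllegalStateException e) {
            System.out.println("rejected: session already closed");
        }
    }
}
```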
Conditions
A condition does pretty much the same as an assertion, with one difference: it only validates, without throwing an exception. Therefore, you can use conditional paths to make decisions during the conversation and go in different directions the same way a user would when using your skill. Use the if-methods and give them the condition followed by an anonymous function to follow up with.
client.startSession()
        .launch()
            .assertFalse("response.shouldEndSession")
            .ifExists("response.outputSpeech.ssml", session -> {
                session.stop()
                        .assertTrue("response.shouldEndSession");
            })
            .done();
The log output
To dig into test results and investigate potential errors this test framework writes logs that are easy to read. Here's an example:
[INFO] Reading out *.yml conversation script files in folder '' in bucket 'io.klerch.alexa.test'
[INFO] Found 1 conversation script files in bucket 'io.klerch.alexa.test'
[INFO] Load conversation script file lab07-final.yml from S3 bucket io.klerch.alexa.test
[START] session start request with sessionId 'SessionId.68943a7e-727c-4f32-b980-85beb8a77a14' ...
Endpoint: arn:aws:lambda:us-east-1:661661179496:function:io-klerch-alexa-numberguess-skill
[DONE] session start request.
[START] launch request ...
Endpoint: arn:aws:lambda:us-east-1:661661179496:function:io-klerch-alexa-numberguess-skill
[DONE] launch request in 454 ms.
->[TRUE] response.shouldEndSession == false is TRUE.
[START] intent request 'GuessNumber' { number: 5 } ...
Endpoint: arn:aws:lambda:us-east-1:661661179496:function:io-klerch-alexa-numberguess-skill
[DONE] intent request 'GuessNumber' in 446 ms.
->[FALSE] response.outputSpeech.ssml =~ /.*My number is lower.*/i is TRUE.
->[TRUE] response.outputSpeech.ssml =~ /.*My number is higher.*/i is TRUE.
[START] intent request 'GuessNumber' { number: 6 } ...
Endpoint: arn:aws:lambda:us-east-1:661661179496:function:io-klerch-alexa-numberguess-skill
[DONE] intent request 'GuessNumber' in 445 ms.
->[TRUE] response.outputSpeech.ssml =~ /.*My number is higher.*/i is TRUE.
[START] intent request 'GuessNumber' { number: 7 } ...
Endpoint: arn:aws:lambda:us-east-1:661661179496:function:io-klerch-alexa-numberguess-skill
[DONE] intent request 'GuessNumber' in 461 ms.
->[TRUE] response.outputSpeech.ssml =~ /.*My number is higher.*/i is TRUE.
[START] intent request 'GuessNumber' { number: 8 } ...
Endpoint: arn:aws:lambda:us-east-1:661661179496:function:io-klerch-alexa-numberguess-skill
[DONE] intent request 'GuessNumber' in 434 ms.
->[TRUE] response.outputSpeech.ssml =~ /.*My number is higher.*/i is TRUE.
[START] intent request 'GuessNumber' { number: 9 } ...
Endpoint: arn:aws:lambda:us-east-1:661661179496:function:io-klerch-alexa-numberguess-skill
[DONE] intent request 'GuessNumber' in 434 ms.
->[FALSE] response.outputSpeech.ssml =~ /.*My number is higher.*/i is TRUE.
->[TRUE] response.outputSpeech.ssml =~ /.*Great. That’s it.*/i is TRUE.
[START] intent request 'AMAZON.StopIntent' ...
Endpoint: arn:aws:lambda:us-east-1:661661179496:function:io-klerch-alexa-numberguess-skill
[DONE] intent request 'AMAZON.StopIntent' in 501 ms.
->[TRUE] response.shouldEndSession == true is TRUE.
->[FALSE] response.outputSpeech.ssml =~ /.*Great. That’s it.*/i is TRUE.
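The =~ checks in the log above test the output speech against a regular expression with an optional i flag. How the framework evaluates such an expression internally is not documented in this README; the following is only a minimal sketch, assuming that a /pattern/flags literal is mapped onto java.util.regex and has to match the whole value. The class RegexAssertionSketch and its methods are hypothetical and not part of the SDK.

```java
import java.util.regex.Pattern;

class RegexAssertionSketch {
    // Parses a literal such as "/.*test.*/i" into a compiled pattern.
    static Pattern parse(String literal) {
        int lastSlash = literal.lastIndexOf('/');
        if (!literal.startsWith("/") || lastSlash == 0) {
            throw new IllegalArgumentException("Not a /pattern/flags literal: " + literal);
        }
        String body = literal.substring(1, lastSlash);
        String flags = literal.substring(lastSlash + 1);
        // Only the case-insensitivity flag is handled in this sketch.
        int options = flags.contains("i") ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(body, options);
    }

    // True if the whole actual value matches the pattern; a missing value never matches.
    static boolean matches(String literal, String actual) {
        return actual != null && parse(literal).matcher(actual).matches();
    }

    public static void main(String[] args) {
        String ssml = "<speak>This is a Test response</speak>";
        System.out.println(matches("/.*test.*/i", ssml)); // case-insensitive
        System.out.println(matches("/.*test.*/", ssml));  // case-sensitive
    }
}
```

With the i flag the capitalised "Test" in the sample speech matches; without the flag it does not.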
Set up Login With Amazon
LWA credentials are required by the test client to get access to your skill. You need three things from LWA: a client-id, a client-secret and a refresh-token.
1. Go to the Amazon developer console, click on Developer Console in the header section and navigate to Apps & Services
2. Click on Login With Amazon and Create A New Security Profile
3. Give it a name, a description and the policy-url https://example.com and hit Save.
4. Click on Show ClientID and Secret. Copy the client-id and client-secret.
5. Back in the browser, go to Manage -> Web Settings and click on Edit. Set https://example.com as the Allowed Return Url.
6. Replace {clientId} in the following URL with the client-id you got in step 4, then browse to it:
https://www.amazon.com/ap/oa?client_id={clientId}&scope=alexa::ask:skills:test&response_type=code&redirect_uri=https://example.com
7. Follow the authentication and authorization procedure. You end up being redirected to example.com. Look at the URL in the address line of your browser. It should contain an authorization code.
8. Take the code you retrieved in the previous step and use it in the following HTTP POST request command in your local shell.
curl -d "client_id={clientId}&client_secret={secret}&code={code}&grant_type=authorization_code&redirect_uri=https://example.com" -H "Content-Type: application/x-www-form-urlencoded" -X POST https://api.amazon.com/auth/o2/token
- where {clientId} needs to be replaced by the clientId you got in step 4.
- where {secret} needs to be replaced by the clientSecret you got in step 4.
- where {code} needs to be replaced by the authorization code you received in step 7.
This curl command returns a JSON payload that contains the refresh token you need.
Side note: If you made a mistake in step 8 and receive an error, you need to return to step 6, as the authorization code is only valid for one request.
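Steps 6 to 8 involve some error-prone string handling, so here is a minimal sketch that builds the authorization URL, extracts the authorization code from the redirect URL and assembles the form body for the token request. It is an illustration only: the class LwaTokenSketch and its method names are hypothetical, it assumes the redirect carries the code in a query parameter named code, and it percent-encodes the scope and redirect URI, which is equivalent to the unencoded form shown in step 6. It needs Java 10 or later and does not send any HTTP request itself.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class LwaTokenSketch {
    static final String REDIRECT_URI = "https://example.com";
    static final String SCOPE = "alexa::ask:skills:test";

    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Step 6: the URL to open in the browser.
    static String authorizationUrl(String clientId) {
        return "https://www.amazon.com/ap/oa?client_id=" + enc(clientId)
                + "&scope=" + enc(SCOPE)
                + "&response_type=code"
                + "&redirect_uri=" + enc(REDIRECT_URI);
    }

    // Step 7: read the "code" query parameter (assumed name) from the redirect URL.
    static String extractCode(String redirectedUrl) {
        String query = URI.create(redirectedUrl).getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                if (pair.startsWith("code=")) {
                    return URLDecoder.decode(pair.substring("code=".length()), StandardCharsets.UTF_8);
                }
            }
        }
        throw new IllegalArgumentException("No code parameter in " + redirectedUrl);
    }

    // Step 8: the form body to POST to https://api.amazon.com/auth/o2/token.
    static String tokenRequestBody(String clientId, String clientSecret, String code) {
        return "client_id=" + enc(clientId)
                + "&client_secret=" + enc(clientSecret)
                + "&code=" + enc(code)
                + "&grant_type=authorization_code"
                + "&redirect_uri=" + enc(REDIRECT_URI);
    }

    public static void main(String[] args) {
        // All values below are placeholders, not real credentials.
        System.out.println(authorizationUrl("yourClientId"));
        String code = extractCode("https://example.com/?code=yourAuthCode&scope=alexa%3A%3Aask%3Askills%3Atest");
        System.out.println(tokenRequestBody("yourClientId", "yourClientSecret", code));
    }
}
```

The body produced by tokenRequestBody can be passed to the curl command from step 8 via its -d option.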