Alexa Skill development and testing with Java

This article gives a brief overview of Amazon Alexa Skills development with the Alexa Java API. The majority of tutorials on Alexa skills appear to be targeted at Node.js developers, so I would like to highlight the Java way, point out some things that I missed in the official trainings, and mention some things that I would have solved differently.

For demonstration purposes, I wrote a simple fun application which locates crew members on the starship Enterprise. A user can ask Alexa questions like “Where is Captain Picard” or “Ask Enterprise where Captain Picard is” – so the application itself is admittedly pointless, but it demonstrates everything a developer has to know to implement their own basic skills.

The Speechlet interface

Building an Alexa-enabled application requires the developer to provide an implementation of the Speechlet interface, which looks like this:
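In the 1.x version of the alexa-skills-kit Java SDK (package com.amazon.speech.speechlet), the interface is essentially:

    public interface Speechlet {

        // Lifecycle hook: a new session (conversation) has been started.
        void onSessionStarted(SessionStartedRequest request, Session session)
                throws SpeechletException;

        // The skill was opened without a specific intent ("Alexa, open Enterprise").
        SpeechletResponse onLaunch(LaunchRequest request, Session session)
                throws SpeechletException;

        // An utterance was mapped to one of the intents defined in the interaction model.
        SpeechletResponse onIntent(IntentRequest request, Session session)
                throws SpeechletException;

        // Lifecycle hook: the session (conversation) has ended.
        void onSessionEnded(SessionEndedRequest request, Session session)
                throws SpeechletException;
    }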

The functions are quite straightforward – the session-related functions handle init and cleanup work when a session is instantiated or terminated, which in the Alexa domain corresponds to the lifetime of a conversation with the user. onIntent gets invoked for any voice interaction the Alexa backend is able to map to an intent based on the predefined utterances schema.

Let's take our nonsense Enterprise crew resolver:
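A sketch of what the implementation could look like – the intent name WhereIsIntent, the slot name crewMember and the CrewResolver abstraction are assumptions made for this example:

    import com.amazon.speech.slu.Intent;
    import com.amazon.speech.speechlet.IntentRequest;
    import com.amazon.speech.speechlet.LaunchRequest;
    import com.amazon.speech.speechlet.Session;
    import com.amazon.speech.speechlet.SessionEndedRequest;
    import com.amazon.speech.speechlet.SessionStartedRequest;
    import com.amazon.speech.speechlet.Speechlet;
    import com.amazon.speech.speechlet.SpeechletException;
    import com.amazon.speech.speechlet.SpeechletResponse;
    import com.amazon.speech.ui.PlainTextOutputSpeech;

    // Hypothetical abstraction over the (unavailable) board computer.
    interface CrewResolver {
        String resolve(String crewMember);
    }

    public class EnterpriseSpeechlet implements Speechlet {

        private final CrewResolver crewResolver;

        // The resolver is injected so tests can pass in a mock implementation.
        public EnterpriseSpeechlet(CrewResolver crewResolver) {
            this.crewResolver = crewResolver;
        }

        @Override
        public SpeechletResponse onIntent(IntentRequest request, Session session)
                throws SpeechletException {
            Intent intent = request.getIntent();
            if ("WhereIsIntent".equals(intent.getName())) {
                // "crewMember" is the slot name defined in the (assumed) interaction model.
                String crewMember = intent.getSlot("crewMember").getValue();
                String location = crewResolver.resolve(crewMember);

                PlainTextOutputSpeech speech = new PlainTextOutputSpeech();
                speech.setText(crewMember + " is currently on the " + location + ".");

                // The single-argument overload deliberately omits a card.
                return SpeechletResponse.newTellResponse(speech);
            }
            throw new SpeechletException("Unknown intent: " + intent.getName());
        }

        @Override
        public SpeechletResponse onLaunch(LaunchRequest request, Session session) {
            PlainTextOutputSpeech speech = new PlainTextOutputSpeech();
            speech.setText("Ask me where a crew member is.");
            return SpeechletResponse.newTellResponse(speech);
        }

        @Override
        public void onSessionStarted(SessionStartedRequest request, Session session) {
            // nothing to initialize
        }

        @Override
        public void onSessionEnded(SessionEndedRequest request, Session session) {
            // nothing to clean up
        }
    }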

We deliberately do not deliver cards to the customer's application; if a card is required, there is another signature of the newTellResponse method. As we do not have access to the ship's board computer, and there is no Amazon Alexa service for spaceships or even a region outside the Earth's atmosphere yet, we inject a mock resolver for testing purposes.
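The mock can be as dumb as it gets; for example:

    // Mock resolver injected in the unit tests below -- no board computer required.
    public class MockCrewResolver implements CrewResolver {
        @Override
        public String resolve(String crewMember) {
            return "bridge"; // for testing purposes, everybody is on the bridge
        }
    }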

Mocking an Alexa call for JUnit

Testing the data providers behind the Alexa API works just like everyday testing, but mocking an Alexa request does not appear to be part of the primary feature set of the API, which means we have to construct the entire request ourselves before passing it to our handler.

Fortunately, Amazon used a library related to immutables.net for their API, so it is possible to handcraft an IntentRequest which closely resembles an actual search request for Captain Picard, as follows:
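A test along these lines should do – the builder method names are those of the 1.x SDK, and the intent and slot names match the assumptions made in the example above:

    import static org.junit.Assert.assertNotNull;

    import java.util.Collections;
    import java.util.Date;

    import org.junit.Test;

    import com.amazon.speech.slu.Intent;
    import com.amazon.speech.slu.Slot;
    import com.amazon.speech.speechlet.IntentRequest;
    import com.amazon.speech.speechlet.Session;
    import com.amazon.speech.speechlet.SpeechletResponse;

    public class EnterpriseSpeechletTest {

        @Test
        public void findsCaptainPicard() throws Exception {
            Slot crewMemberSlot = Slot.builder()
                    .withName("crewMember")
                    .withValue("Captain Picard")
                    .build();

            Intent intent = Intent.builder()
                    .withName("WhereIsIntent")
                    .withSlots(Collections.singletonMap("crewMember", crewMemberSlot))
                    .build();

            IntentRequest request = IntentRequest.builder()
                    .withRequestId("test-request-id")
                    .withTimestamp(new Date())
                    .withIntent(intent)
                    .build();

            Session session = Session.builder()
                    .withSessionId("test-session-id")
                    .build();

            EnterpriseSpeechlet speechlet = new EnterpriseSpeechlet(new MockCrewResolver());
            SpeechletResponse response = speechlet.onIntent(request, session);

            // Inspecting the rendered speech text is harder than it should be -- see below.
            assertNotNull(response.getOutputSpeech());
        }
    }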

What I would like to see changed in the Java API

Builders all the way

Builders are good. Please use them on the response types as well. For example, the code below feels very, very 90s:
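Building an ask response with a reprompt by hand looks something like this (a sketch; the prompt texts are made up):

    SsmlOutputSpeech speech = new SsmlOutputSpeech();
    speech.setSsml("<speak>Captain Picard is on the bridge.</speak>");

    PlainTextOutputSpeech repromptText = new PlainTextOutputSpeech();
    repromptText.setText("Which crew member are you looking for?");

    Reprompt reprompt = new Reprompt();
    reprompt.setOutputSpeech(repromptText);

    SpeechletResponse response = new SpeechletResponse();
    response.setOutputSpeech(speech);
    response.setReprompt(reprompt);
    response.setShouldEndSession(false);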

Way too much ceremony. What I would like to write, without providing my own facades, is:
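Something along these lines, as a purely hypothetical builder API – none of these methods exist in the current SDK:

    // Wishful thinking -- a builder-style response API that the SDK does not offer today.
    SpeechletResponse response = SpeechletResponse.builder()
            .withSsml("<speak>Captain Picard is on the bridge.</speak>")
            .withReprompt("Which crew member are you looking for?")
            .withShouldEndSession(false)
            .build();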

Easy testing

Mocking a request for semi-end-to-end testing like in the example above works, but it is not really comfortable. I would appreciate a function which exports a request to a JSON file, together with a corresponding import function. This would make it easy to mock the request without using the builders directly.
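Such a pair of helpers could look like this – RequestSerializer and both of its methods are invented names, not part of the SDK:

    // Invented, wished-for API: persist a handcrafted request and replay it later in a test.
    Path fixture = Paths.get("src/test/resources/picard-request.json");
    RequestSerializer.toJsonFile(request, fixture);
    IntentRequest replayed = RequestSerializer.intentRequestFromJsonFile(fixture);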

Besides, once an output speech has been created, it is not possible to extract the speech text from it without applying dirty reflection to break through the private property barrier. Why not just provide a getSsml() member to make e2e testers happy?
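Today the workaround is something like this (assuming the private field is actually named ssml, which may change with any SDK release):

    // Dirty reflection hack: read the private "ssml" field of the rendered speech.
    SsmlOutputSpeech speech = (SsmlOutputSpeech) response.getOutputSpeech();
    java.lang.reflect.Field ssmlField = SsmlOutputSpeech.class.getDeclaredField("ssml");
    ssmlField.setAccessible(true);
    String renderedSsml = (String) ssmlField.get(speech);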

Plaintext or Ssml?

Honestly, I do not want to use the Plaintext response at all. Ssml is a superset of Plaintext and allows more detailed control over the way Alexa's text-to-speech works, for instance when a word should be spelled out instead of spoken. So, why not just use Ssml all the way and improve the speech renderer so it does not crash if no <speak></speak> tags are present?
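For example, spelling out a ship registry instead of reading it as one word is trivial in Ssml (the example text is made up):

    SsmlOutputSpeech speech = new SsmlOutputSpeech();
    speech.setSsml("<speak>The registry is "
            + "<say-as interpret-as=\"spell-out\">NCC</say-as> 1701.</speak>");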
