<Say>
Reads a provided text to the caller using text-to-speech.
Description
The <Say> verb converts text to speech that is read back to the caller.
Example
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say voice="man" language="en">Hello World</Say>
</Response>
An example of <Say> using SSML to describe the speech output may look as follows:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say>
Here is an <say-as interpret-as="characters">SSML</say-as> example. You can pause <break time="3s"/>
the text, play a sound such as <audio src="https://www.example.com/MY_MP3_FILE.mp3">(could not play audio file)</audio>
or use other commands as specified in the Speech Synthesis Markup Language Version 1.1 specification.
</Say>
</Response>
Attributes
The following attributes are supported:
Attribute Name | Allowed Values | Default Value |
---|---|---|
answer | true, false | true |
loop | A number that is 0 or greater | 1 |
voice | man, woman, or a premium voice (see Premium Voices) | woman |
language | See Premium Voices | en-US |
statusCallback | URL | none |
statusCallbackMethod | POST or GET | POST |
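The attribute table can be turned into a small builder that rejects values the table does not allow. The following is a minimal sketch, not an official Cloudonix SDK: the function name `build_say` and its keyword arguments are hypothetical, and only the attribute names and allowed values come from the table above.

```python
import xml.etree.ElementTree as ET


def build_say(text, *, answer=None, loop=None, voice=None, language=None,
              status_callback=None, status_callback_method=None):
    """Return a CXML document (as a string) holding a single <Say> verb.

    Hypothetical helper: only attributes that are passed explicitly are
    emitted, so the platform defaults from the table apply to the rest.
    """
    attrs = {}
    if answer is not None:
        attrs["answer"] = "true" if answer else "false"
    if loop is not None:
        # The table allows "a number that is 0 or greater".
        if isinstance(loop, bool) or not isinstance(loop, int) or loop < 0:
            raise ValueError("loop must be an integer that is 0 or greater")
        attrs["loop"] = str(loop)
    if voice is not None:
        attrs["voice"] = voice
    if language is not None:
        attrs["language"] = language
    if status_callback is not None:
        attrs["statusCallback"] = status_callback
    if status_callback_method is not None:
        method = status_callback_method.upper()
        if method not in ("POST", "GET"):
            raise ValueError("statusCallbackMethod must be POST or GET")
        attrs["statusCallbackMethod"] = method

    response = ET.Element("Response")
    say = ET.SubElement(response, "Say", attrs)
    say.text = text
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_say("Hello World", voice="man", language="en", loop=2))
```

Using an XML library rather than string formatting means that characters such as `&` or `<` in the spoken text are escaped automatically.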
Attribute: answer
When set to false, and the call has not yet been answered by another operation (<Dial>, for example, does not answer a call by itself until the receiver answers), the <Say> verb plays the specified media as "early media" (SIP response code 183) without answering the call.
Please note that using the <Say> verb with answer="false", followed by a <Reject> verb, generates a billable event.
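To illustrate the early-media case, the sketch below generates a document that plays a message without answering the call and then rejects it. The helper name `early_media_then_reject` is hypothetical, and `<Reject>` is emitted without attributes because this page does not describe any; per the note above, this combination is a billable event.

```python
import xml.etree.ElementTree as ET


def early_media_then_reject(message):
    """Build CXML that speaks `message` as early media, then rejects the call.

    Hypothetical helper; <Reject> is written with no attributes on purpose.
    """
    response = ET.Element("Response")
    # answer="false" asks for playback without answering the call.
    say = ET.SubElement(response, "Say", {"answer": "false"})
    say.text = message
    ET.SubElement(response, "Reject")
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(early_media_then_reject("The number you dialed is not in service."))
```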
Attribute: loop
How many times to repeat the same text to the caller.
Attribute: voice
Which voice model to use for generating the synthesized voice. Additional models may be offered in the future.
Attribute: language
The language, of those supported, in which to generate the speech. The language is a hint to the speech synthesizer: the text must already be written in the specified language, as no translation is performed on the text before speech synthesis.
Attribute: statusCallback
A URL to be called when the audio output has completed playing. This URL will be called with all the parameters of a standard CXML request, but its output is discarded.
Attribute: statusCallbackMethod
The HTTP method to use for the statusCallback URL.
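A status callback receiver can be very small, because its response is discarded. The sketch below is one possible shape for such a receiver using only the Python standard library. It assumes the request parameters arrive URL-encoded, either in the query string (GET) or in the request body (POST); this page does not state the encoding, so verify it against the CXML request documentation. The names `extract_params` and `StatusCallbackHandler` are hypothetical.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlsplit


def extract_params(path, body=b""):
    """Merge query-string and body parameters into one flat dict.

    Assumption: both are URL-encoded (not confirmed by this page).
    """
    params = parse_qs(urlsplit(path).query)
    params.update(parse_qs(body.decode("utf-8")))
    # parse_qs returns a list per name; keep the last value of each.
    return {name: values[-1] for name, values in params.items()}


class StatusCallbackHandler(BaseHTTPRequestHandler):
    """Logs the callback parameters and replies with an empty response."""

    def _handle(self):
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else b""
        print("Say finished:", extract_params(self.path, body))
        # The callback output is discarded, so no body is sent back.
        self.send_response(204)
        self.end_headers()

    # Serve both methods allowed by statusCallbackMethod.
    do_GET = _handle
    do_POST = _handle
```

To try it locally, the handler can be served with `http.server.HTTPServer(("", 8080), StatusCallbackHandler).serve_forever()`; the port is an arbitrary choice.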
We encourage you to use the <Say> verb in development. Production services are encouraged to use studio-quality recordings.
Cloudonix supports SSML as the content of the <Say> element, where the <Say> element replaces the SSML document element <speak> (i.e. the content of the <speak> element from an SSML document can be used as is as the content of a <Say> element), with the following exceptions:
- <lexicon> and <lookup> are unsupported, and it is an error to include them in the SSML content.
- <emphasis> and <prosody> are handled as simple text.
- <phoneme> will pronounce the display content of the element (if it exists) instead of the IPA code.
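The exceptions above can be checked before a document is served. The following is a minimal sketch of such a pre-flight check, assuming the SSML content is well-formed XML; the function name `check_ssml` and the split into errors and warnings are illustrative choices, not Cloudonix behavior.

```python
import xml.etree.ElementTree as ET

UNSUPPORTED = {"lexicon", "lookup"}        # including these is an error
TREATED_AS_TEXT = {"emphasis", "prosody"}  # accepted, but have no effect


def check_ssml(content):
    """Return (errors, warnings) as lists of element names found in `content`.

    `content` is the SSML that would go between <Say> and </Say>.
    Raises xml.etree.ElementTree.ParseError if it is not well-formed.
    """
    root = ET.fromstring("<Say>" + content + "</Say>")
    errors, warnings = [], []
    for element in root.iter():
        tag = element.tag.rsplit("}", 1)[-1]  # drop an XML namespace, if any
        if tag in UNSUPPORTED:
            errors.append(tag)
        elif tag in TREATED_AS_TEXT:
            warnings.append(tag)
    return errors, warnings


errors, warnings = check_ssml(
    'Hello <emphasis>world</emphasis> <lookup ref="x">again</lookup>'
)
print(errors, warnings)  # ['lookup'] ['emphasis']
```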
Premium Voices
Cloudonix supports multiple Text-To-Speech engines, which are available either Over-The-Top (OTT) or as Bring-Your-Own-Voice (BYOV).
Over The Top Voices
Cloudonix provides unified access to its built-in voice engines. Currently, the supported engines are aws (Amazon Polly) and gcp (Google AI Voices).
By default, Cloudonix uses Amazon Polly with a female voice. You can select a different engine and voice through the voice attribute.
Example - Google Voice AI with Female Journey Voice
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say voice="Google:en-US-Journey-F" language="en-US">Hello World</Say>
</Response>
Example - Amazon Polly with Male Neural Voice
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Say voice="Polly:Gregory" language="en-US">Hello World</Say>
</Response>
List of supported Over-The-Top voices
Bring Your Own Voice
The BYOV feature allows you to pay for TTS services directly by providing your own credentials. Some TTS services are available only through BYOV. To provide your own credentials for the TTS service you want to use, log in to the Cloudonix Cockpit and use the 3rd-party Authorizations settings page to add the passwords or API keys you have received from the TTS service.
Retrieving list of voices
After the credentials have been set up properly, you can use the Cloudonix REST API to retrieve a list of all voices that are available through the BYOV feature, using the REST API endpoint /domains/{domain}/resources/voices.
The result will be an array of JSON objects, one for each TTS voice that you can use, as per the credentials you have configured. For each JSON object, the following properties are displayed:
Property Name | Description |
---|---|
provider | The name of the Text-To-Speech (TTS) service provider through which this voice is available. |
voice | The value to use with the <Say> verb's voice attribute to use this voice. |
languages | An array of language codes, any of which can be used for the <Say> verb's language attribute, with this voice. |
gender | A description of the gender this voice may sound like. |
pricing | The Cloudonix pricing for this voice. One of: standard (included in the Cloudonix billing plan); premium (consumes AI "usage minutes"); customer-pay (available through customer-provided 3rd-party credentials). |
Example
$ curl 'https://api.cloudonix.io/domains/cloudonix-demo-customer.cloudonix.net/resources/voices' \
--header 'Authorization: Bearer XI•••••••••••••••'
[
{
"voice": "AWS:Patrick",
"gender": "male",
"languages": [
"en-US"
],
"provider": "Polly",
"pricing": "customer-pay"
},
…
{
"voice": "AWS-Neural:Inês",
"gender": "female",
"languages": [
"pt-PT"
],
"provider": "Neural",
"pricing": "customer-pay"
},
…
{
"voice": "Eleven:Eric",
"gender": "male",
"languages": [
],
"provider": "Eleven",
"pricing": "customer-pay"
},
…
{
"voice": "Azure:en-AU-CarlyNeural",
"gender": "female",
"languages": [
"en-AU"
],
"provider": "Azure",
"pricing": "customer-pay"
},
…
{
"voice": "Google:da-DK-Wavenet-C",
"gender": "male",
"languages": [
"da-DK"
],
"provider": "Google",
"pricing": "customer-pay"
},
…
]
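Once the list has been retrieved, picking a value for the voice attribute is a matter of filtering on the properties described above. The sketch below shows one way to do that with the Python standard library. The helper names `voices_request` and `voices_for_language` are hypothetical, the token is a placeholder, and the sample data is a shortened copy of the response above. An entry with an empty languages array (as shown for Eleven:Eric) is simply treated as not matching, since this page does not say how such entries should be interpreted.

```python
import json
import urllib.request

API_BASE = "https://api.cloudonix.io"


def voices_request(domain, token):
    """Build (but do not send) the request for the voices endpoint."""
    url = f"{API_BASE}/domains/{domain}/resources/voices"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )


def voices_for_language(voices, language, pricing=None):
    """Return the `voice` values usable with `language`.

    Optionally restrict the result to one pricing class.
    """
    matches = []
    for entry in voices:
        if language not in entry.get("languages", []):
            continue
        if pricing is not None and entry.get("pricing") != pricing:
            continue
        matches.append(entry["voice"])
    return matches


# Shortened copy of the example response shown above.
sample = json.loads("""
[
  {"voice": "AWS:Patrick", "gender": "male", "languages": ["en-US"],
   "provider": "Polly", "pricing": "customer-pay"},
  {"voice": "Eleven:Eric", "gender": "male", "languages": [],
   "provider": "Eleven", "pricing": "customer-pay"},
  {"voice": "Azure:en-AU-CarlyNeural", "gender": "female",
   "languages": ["en-AU"], "provider": "Azure", "pricing": "customer-pay"}
]
""")

print(voices_for_language(sample, "en-US"))  # ['AWS:Patrick']
```

To fetch the live list, the request can be sent with `urllib.request.urlopen(voices_request(domain, token))` and the response body parsed with `json.load`.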