<Say>

TL;DR

<Say> reads a provided text to the caller using text-to-speech.

Description

The <Say> verb converts text to speech that is read back to the caller.

Example

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say voice="man" language="en">Hello World</Say>
</Response>

A <Say> example that uses SSML to describe the speech output:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>
    Here is an <say-as interpret-as="characters">SSML</say-as> example. You can pause <break time="3s"/>
    the text, play a sound such as <audio src="https://www.example.com/MY_MP3_FILE.mp3">(could not play audio file)</audio>
    or use other commands as specified in the Speech Synthesis Markup Language Version 1.1 specification.
  </Say>
</Response>

Attributes

The following attributes are supported:

Attribute Name        Allowed Values                       Default Value
answer                true, false                          true
loop                  A number that is 0 or greater        1
voice                 man, woman, or see Premium Voices    woman
language              See Premium Voices                   en-US
statusCallback        URL                                  none
statusCallbackMethod  POST or GET                          POST

Attribute: answer

When set to false, and the call has not yet been answered by another operation, the <Say> verb plays the specified media as "early media" (SIP response code 183) without answering the call. Note that Dial, for example, does not answer a call by itself; the call is answered only when the receiver answers.

DID YOU KNOW...

Using the <Say> verb with answer="false" followed by a <Reject> verb generates a billable event.
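
A minimal sketch of the early-media case described above, assuming the call has not yet been answered by an earlier verb. The announcement text is illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- answer="false": the text should be played as early media (SIP 183) without answering the call -->
  <!-- The announcement wording is an assumption made for this example -->
  <Say answer="false">The number you have dialed is not in service.</Say>
</Response>
```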

Attribute: loop

How many times to repeat the same text to the caller.
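
For illustration, a sketch that sets loop to 3. Going by the description above, the text should be played to the caller three times; the message itself is an arbitrary example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- loop="3": play the same text three times (the default is 1) -->
  <Say loop="3">Your verification code is 4 2 7 1.</Say>
</Response>
```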

Attribute: voice

Which voice model to use for generating the synthesized voice. Additional models may be offered in the future.

Attribute: language

The supported language in which to generate the speech. The language is a hint to the speech synthesizer: the text must already be written in the specified language, and no translation is performed on the text before speech synthesis.

Attribute: statusCallback

A URL to be called when the audio output has completed playing. This URL will be called with all the parameters of a standard CXML request, but its output is discarded.

Attribute: statusCallbackMethod

The HTTP method to use for the statusCallback URL.
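
The two callback attributes are typically set together. The sketch below assumes a hypothetical endpoint, https://example.com/say-finished, which you would replace with your own URL; it requests GET instead of the default POST.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- statusCallback URL is a placeholder (assumption); it is called once playback completes -->
  <Say statusCallback="https://example.com/say-finished" statusCallbackMethod="GET">
    Thank you for calling.
  </Say>
</Response>
```

Because the callback's output is discarded, the endpoint is useful for logging or tracking rather than for returning further CXML.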

We encourage you to use the <Say> verb in development. For production services, we recommend studio-quality recordings.

Cloudonix supports SSML as the content of the <Say> element. The <Say> element replaces the SSML document element <speak>, so the content of a <speak> element from an SSML document can be used as-is as the content of a <Say> element, with the following exceptions:

  • <lexicon> and <lookup> are unsupported and it is an error to include these in the SSML content.
  • <emphasis> and <prosody> are handled as simple text.
  • <phoneme> pronounces the display content of the element (if it exists) instead of the IPA code.
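
The exceptions above can be sketched as follows. The sentence and the IPA transcription are made up for this example; the elements and attributes are standard SSML 1.1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>
    <!-- <phoneme>: per the list above, the display text "tomato" is spoken, not the IPA value in ph -->
    You say <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>,
    <!-- <emphasis>: per the list above, the content is handled as simple text -->
    and this part is <emphasis level="strong">important</emphasis>.
    <!-- <lexicon> and <lookup> are intentionally absent: including them is an error -->
  </Say>
</Response>
```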

Premium Voices

Cloudonix supports multiple Text-To-Speech engines, which are available either Over-The-Top (OTT) or as Bring-Your-Own-Voice (BYOV).

Over The Top Voices

Cloudonix provides unified access to its built-in voice engines. Currently, the supported engines are aws (Amazon Polly) and gcp (Google AI Voices). By default, Cloudonix uses Amazon Polly with a female voice; you can select a different engine and voice with the voice attribute.

Example - Google Voice AI with Female Journey Voice

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say voice="Google:en-US-Journey-F" language="en-US">Hello World</Say>
</Response>

Example - Amazon Polly with Male Neural Voice

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say voice="Polly:Gregory" language="en-US">Hello World</Say>
</Response>

List of supported Over-The-Top voices

Bring Your Own Voice

The BYOV feature lets you pay for TTS services yourself by providing your own credentials. Some TTS services are available only through BYOV. To provide your own credentials for a TTS service, log in to the Cloudonix Cockpit and use the 3rd-party Authorizations settings page to add the passwords or API keys you received from the TTS service.

Retrieving list of voices

After the credentials have been set up properly, you can retrieve a list of all voices available through the BYOV feature from the Cloudonix REST API endpoint /domains/{domain}/resources/voices.

The result is an array of JSON objects, one for each TTS voice that you can use with the credentials you have configured. Each JSON object has the following properties:

Property Name  Description
provider       The name of the Text-To-Speech (TTS) service provider through which this voice is available.
voice          The value to use with the <Say> verb's voice attribute to use this voice.
languages      An array of language codes, any of which can be used for the <Say> verb's language attribute, with this voice.
gender         A description of the gender this voice may sound like.
pricing        The Cloudonix pricing for this voice. Either:
               standard - included in the Cloudonix billing plan
               premium - consumes AI "usage minutes"
               customer-pay - available through customer provided 3rd-party credentials

Example

$ curl 'https://api.cloudonix.io/domains/cloudonix-demo-customer.cloudonix.net/resources/voices' \
    --header 'Authorization: Bearer XI•••••••••••••••'

[
  {
    "voice": "AWS:Patrick",
    "gender": "male",
    "languages": [
      "en-US"
    ],
    "provider": "Polly",
    "pricing": "customer-pay"
  },
  {
    "voice": "AWS-Neural:Inês",
    "gender": "female",
    "languages": [
      "pt-PT"
    ],
    "provider": "Neural",
    "pricing": "customer-pay"
  },
  {
    "voice": "Eleven:Eric",
    "gender": "male",
    "languages": [],
    "provider": "Eleven",
    "pricing": "customer-pay"
  },
  {
    "voice": "Azure:en-AU-CarlyNeural",
    "gender": "female",
    "languages": [
      "en-AU"
    ],
    "provider": "Azure",
    "pricing": "customer-pay"
  },
  {
    "voice": "Google:da-DK-Wavenet-C",
    "gender": "male",
    "languages": [
      "da-DK"
    ],
    "provider": "Google",
    "pricing": "customer-pay"
  }
]
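
Once a voice appears in this listing, its voice value and one of its languages values can be used in <Say>. A sketch using the Azure entry from the sample response above, assuming the matching Azure credentials are configured for the domain:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <!-- voice and language are copied from the "voice" and "languages" properties of the listing -->
  <!-- Assumes 3rd-party Azure credentials are set up in the Cloudonix Cockpit -->
  <Say voice="Azure:en-AU-CarlyNeural" language="en-AU">Hello World</Say>
</Response>
```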