`<Gather>`

TL;DR

<Gather> collects caller input via keypad, voice, or both.

Description

Gather input from the caller, either through the DTMF keypad or speech, possibly while prompting the caller for input.

Example

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Gather/>
</Response>

Or a more complete example, passing the caller input to an application server:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Gather input="dtmf" timeout="3" numDigits="1" action="https://example.com/handleInput">
    <Say>Please enter a digit</Say>
  </Gather>
</Response>

Attributes

The following attributes are supported:

Attribute Name	Allowed Values	Default Value
`action`	URL (relative or absolute)	current document URL
`method`	`GET`, `POST`	`POST`
`input`	`dtmf`, `speech`, `dtmf speech`	`dtmf`
`finishOnKey`	`0`-`9`, `#`, `*`, and "" (the empty string)	`#`
`numDigits`	positive integer	unlimited
`maxTimeout`	a positive number of seconds	`30`
`timeout`	a positive number of seconds	`5`
`speechTimeout`	a positive number of seconds or `auto`	`auto`
`speechEngine`	`aws`, `google`	`aws`
`language`	Language identifier (a BCP 47 language tag)	`en-US`
`actionOnEmptyResult`	boolean	`false`
`maxDuration`	a positive number of seconds	`300`
`speechDetection`	`auto`, `low`, `normal`, `high`, `stt`	`auto`
`interruptible`	`true`, `false`	`true`

Cloudonix supports nested playback verbs inside the <Gather> verb, so the application can start capturing input before it has finished playing instructions to the caller. If the caller starts entering input, <Gather> aborts the playback (see the interruptible attribute). The timeout and speechTimeout timers start counting only once nested playback (if any) has ended or was aborted.

The only supported nested verbs are <Say>, <Play>, <Pause> and <Converse>.

Note: If <Gather> completes without detecting any DTMF or speech, the action URL is not invoked; instead, the application runs the next verb in the CXML document (or hangs up the call if the <Gather> verb is the last in the CXML document). Also see the actionOnEmptyResult attribute.
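
As a minimal sketch of the fallback behaviour described in the note above, the following Python snippet builds a CXML document in which a <Say> verb follows <Gather>, so it is reached only when <Gather> completes without input. The function name `build_menu` and the prompt texts are illustrative assumptions, not part of the Cloudonix API; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET


def build_menu(action_url: str) -> str:
    """Return a CXML document with a <Gather> prompt and a fallback verb.

    build_menu and the spoken texts are hypothetical; only the verbs and
    attributes documented on this page are used.
    """
    response = ET.Element("Response")

    # Collect a single DTMF digit and send it to the action URL.
    gather = ET.SubElement(
        response,
        "Gather",
        {"input": "dtmf", "numDigits": "1", "action": action_url},
    )
    ET.SubElement(gather, "Say").text = "Please enter a digit"

    # Next verb in the document: runs only if <Gather> detected no input
    # (and actionOnEmptyResult was left at its default of false).
    ET.SubElement(response, "Say").text = "We did not receive any input. Goodbye."

    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_menu("https://example.com/handleInput"))
```

Placing a verb after <Gather> gives the caller some feedback before the call ends, instead of hanging up silently when the <Gather> verb is the last one in the document.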

Supported Languages

Cloudonix provides support for Amazon Polly and Google AI based transcription. Depending on your chosen engine, you can find a list of the supported languages and their respective codes in the following links:

Google AI Voices
Amazon Polly Voices - Only streaming capable languages are supported!

Attribute: `action`

The action attribute takes an absolute or relative URL as a value. When the caller finishes entering digits (or the timeout is reached), Cloudonix will make a CXML request to this URL. That request will include the following parameters and CXML request headers:

Parameter Name	Description
`CallSid`	A unique identifier for this call, generated by Cloudonix.
`AccountSid`	Your Cloudonix account ID.
`From`	The phone number or client identifier of the party that initiated the call. Phone numbers are formatted with a '+' and country code, e.g. +16175551212 (E.164 format). Client identifiers begin with the client: URI scheme; for example, on a call from a client named 'charlie' the From parameter will be client:charlie.
`To`	The phone number or client identifier of the called party. Phone numbers are formatted with a '+' and country code, e.g. +16175551212 (E.164 format). Client identifiers begin with the client: URI scheme; for example, for a call to a client named 'joey', the To parameter will be client:joey.
`CallStatus`	A descriptive status for the call. The value is one of the following: `queued`, `ringing`, `in-progress`, `completed`, `busy`, `failed` or `no-answer`. See the `CallStatus` section below for more details.
`ApiVersion`	The version of the Cloudonix API used to handle this call. For incoming calls, this is determined by the API version set on the called number. For outgoing calls, this is the version used by the REST API request from the outgoing call.
`Direction`	A string describing the direction of the call: `inbound` for inbound calls, `outbound-api` for calls initiated via the REST API, `outbound-dial` for calls initiated by a `Dial` verb.
`ForwardedFrom`	This parameter is set only when Cloudonix receives a forwarded call, but its value depends on the caller's carrier including information when forwarding. Not all carriers support passing this information.
`CallerName`	This parameter is set when the `IncomingPhoneNumber` that received the call has had its `VoiceCallerIdLookup` value set to true.
`ParentCallSid`	A unique identifier for the call that created this leg. This parameter is not passed if this is the first leg of a call.
`Digits`	If the `input` attribute included `dtmf` and the caller has entered some digits, this field will include the DTMF digits entered (not including the finish key, if any).
`SpeechResult`	If the `input` attribute included `speech` and caller speech was detected and transcribed, this field will contain the transcription results.

If action is specified, then after sending the request Cloudonix stops running the current CXML document and starts running the document returned from the action URL. If action is not specified, the CXML request with the caller's input is sent to the current document's URL and the resulting CXML document is run again. Note that this may lead to unwanted looping, so take care to specify an action or otherwise handle the possibility of a loop in your application code.

After <Gather> ends, if the caller entered neither DTMF nor speech (as requested) and actionOnEmptyResult was not set, the application continues to the next verb without sending a request.
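
To illustrate how an application server might react to the request sent to the action URL, here is a hedged Python sketch. It assumes the request parameters have already been decoded into a dictionary by whatever web framework is in use; the function name `handle_gather_result`, the relative URL `/handle-input`, and the spoken texts are hypothetical. Only the `Digits` and `SpeechResult` parameters described above are read.

```python
import xml.etree.ElementTree as ET


def handle_gather_result(params: dict) -> str:
    """Build a CXML reply for a request sent to the <Gather> action URL.

    `params` is assumed to hold the already-decoded request parameters.
    """
    digits = params.get("Digits", "")
    speech = params.get("SpeechResult", "")

    response = ET.Element("Response")
    if digits:
        # Read the digits back one at a time, e.g. "1 2" rather than "12".
        ET.SubElement(response, "Say").text = "You entered " + " ".join(digits)
    elif speech:
        ET.SubElement(response, "Say").text = "You said " + speech
    else:
        # An empty result reaches the action URL only when
        # actionOnEmptyResult is enabled; prompt the caller again.
        gather = ET.SubElement(
            response,
            "Gather",
            {"input": "dtmf speech", "action": "/handle-input"},  # hypothetical URL
        )
        ET.SubElement(gather, "Say").text = "Sorry, I did not catch that. Please try again."

    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(handle_gather_result({"Digits": "12"}))
```

Returning a document that always names an explicit action is one way to avoid the looping behaviour mentioned above.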

Attribute: `method`

The method attribute takes the value GET or POST. This sets the method Cloudonix will use in the CXML request to the action URL.

Attribute: `input`

Specify which input (DTMF or speech) Cloudonix should accept with the input attribute.

The default input for <Gather> is dtmf. You can set input to dtmf, speech, or dtmf speech.

DTMF digits entered will be sent in the Digits parameter of the application request sent to the action URL. If speech was detected, the speech transcription will be sent in the SpeechResult parameter of the CXML request sent to the action URL.

Attribute: `finishOnKey`

The finishOnKey attribute sets a key that the caller can press to submit the digits or speech gathered so far.

For example, if you set finishOnKey to # (or leave it as the default) and the caller enters 1234#, Cloudonix will immediately stop waiting for more input after they press # and submit the parameter Digits=1234 to your action URL.

Note: The finishOnKey digit is never submitted in the result of <Gather>.

Allowed values for this attribute are:

# (this is the default value)
*
Single digits 0–9
A blank string

If you specify a blank string (e.g. <Gather finishOnKey="">), <Gather> will capture all caller input and no key will end the <Gather>. In this case, Cloudonix sends an HTTP request to the action URL only after a timeout elapses or the numDigits threshold is reached.

Attribute: `numDigits`

You can set the number of digits you expect from your caller by including numDigits in <Gather>. If numDigits is set, once the caller enters that many digits, <Gather> will end immediately and submit the entered digits to the action URL.
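
The interaction between finishOnKey and numDigits can be sketched as a small Python model. This is a simplified illustration of the rules described above, not Cloudonix's implementation: the function name `collect_digits` is hypothetical, and timeouts are ignored entirely.

```python
from typing import Iterable, Optional


def collect_digits(
    keys: Iterable[str],
    finish_on_key: str = "#",
    num_digits: Optional[int] = None,
) -> str:
    """Return the value that would be sent as Digits for a sequence of keys.

    Simplified model: no timers, only finishOnKey and numDigits.
    """
    digits = ""
    for key in keys:
        # The finish key ends the gather and is never part of the result.
        # An empty finish_on_key means no key ends the gather.
        if finish_on_key and key == finish_on_key:
            break
        digits += key
        # Reaching numDigits ends the gather immediately.
        if num_digits is not None and len(digits) >= num_digits:
            break
    return digits


if __name__ == "__main__":
    print(collect_digits("1234#"))                    # default finish key
    print(collect_digits("12#34", finish_on_key=""))  # no finish key
    print(collect_digits("12345", num_digits=4))      # stops after 4 digits
```

Under this model, `1234#` yields `1234`, an empty finishOnKey keeps `#` as ordinary input (`12#34`), and numDigits="4" cuts `12345` down to `1234`.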

Attribute: `maxTimeout`

How long to wait for the user to first input speech or DTMF. If there is complete silence from the caller for this long (default: 30 seconds), the <Gather> verb will complete with an empty result.

Attribute: `timeout`

How long, in seconds, <Gather> should wait for the caller to enter another DTMF digit before sending the result to the action URL.

Attribute: `speechTimeout`

How long, in seconds, <Gather> should wait for the caller to say something else, when gathering speech, before sending the result to the action URL.

Attribute: `speechEngine`

The speech-to-text service to use. Cloudonix by default uses the standard AWS speech-to-text service, but the application can request to use the standard GCP speech-to-text service which uses a different model that may or may not be better suited for specific use cases.

Both speech-to-text services are limited to a total of 5 minutes of speech input per <Gather> verb.

Attribute: `actionOnEmptyResult`

Whether to send a CXML request to the action URL even if there is no caller input after the timeout, instead of running the next verb.

Attribute: `maxDuration`

The total amount of time to wait for the user to complete input - measured from the end of playing back contained elements. If the user continuously provides input, when this timer elapses (default: 5 minutes) <Gather> will complete with a result of whatever input the user has provided by then.

Attribute: `speechDetection`

Fine-tune the sensitivity of the speech detection, i.e. when <Gather> decides that the caller has started and finished talking, which affects when the timeouts are counted from. The default value should work well for most phone conversations taking place in office or home environments. Set speechDetection to low to be less sensitive to noise, which will help with detecting silence in noisy environments, or to high to be more sensitive to noise, which will help detect speech over bad analog lines. Set it to stt to detect speech only as recognizable text from the speech-to-text engine; this may ignore speech in unrecognized languages.

Attribute: `interruptible`

Whether caller input (either dtmf or speech) can interrupt nested playback. The default value of this attribute is true - meaning that caller input will interrupt nested playback, such that the currently playing element is immediately stopped and any following playback elements (as explained at the beginning) are outright cancelled. Any input that caused the interruption will be recorded and submitted as the result of <Gather>.

When interruptible="false" is set, the nested playback continues until the end and the caller cannot interrupt it. Furthermore, any input from the caller while the nested playback is playing is ignored and will not be submitted in the <Gather> result.

Timeout considerations

Consider the following CXML application:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Gather input="dtmf speech" timeout="3" speechTimeout="5" maxTimeout="10" maxDuration="60">
    <Play>https://example.com/welcome-message.mp3</Play>
  </Gather>
</Response>

In the above case, <Gather> plays the welcome message, waits for the playback to complete, and then starts a timer. What happens next depends on the caller:

If the caller says nothing and presses no DTMF key, after 10 seconds <Gather> completes with an empty result and the application ends.

If the caller presses a DTMF key within the first 10 seconds, <Gather> waits up to 3 more seconds for another DTMF key (and then 3 more seconds for another after that, up to the number of keys specified by numDigits). Whenever there is a lull of more than 3 seconds between DTMF keys, <Gather> submits the DTMF keys entered so far as the result.

If the caller says something within the first 10 seconds, <Gather> waits for them to finish speaking, then starts a timer and waits up to 5 seconds for the caller to say something else (and then 5 more when they finish speaking again). Whenever there is a lull of more than 5 seconds between utterances, <Gather> submits the text spoken so far as the result.

If the caller speaks continuously for a minute (60 seconds), <Gather> completes anyway and submits the first minute's worth of spoken text as the result.

Because no action was specified, the result is submitted back to the URL from which this application was loaded.
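
The DTMF part of this timing logic can be approximated with a short Python model. This is only a sketch under simplifying assumptions: the function name `simulate_dtmf_gather` is hypothetical, times are measured in seconds from the end of playback, and speech, finishOnKey and numDigits are deliberately left out.

```python
from typing import List, Tuple


def simulate_dtmf_gather(
    presses: List[Tuple[float, str]],
    timeout: float = 5.0,
    max_timeout: float = 30.0,
    max_duration: float = 300.0,
) -> Tuple[str, float]:
    """Model which DTMF keys are gathered and when <Gather> completes.

    `presses` holds (seconds_after_playback_end, key) pairs.
    Returns (digits, completion_time_in_seconds).
    """
    digits = ""
    # Until the first key arrives, maxTimeout bounds the wait.
    deadline = max_timeout
    for pressed_at, key in sorted(presses):
        # A key arriving after the current deadline (or after maxDuration)
        # is too late: <Gather> has already completed.
        if pressed_at > deadline or pressed_at > max_duration:
            break
        digits += key
        # Each accepted key restarts the inter-digit timeout.
        deadline = pressed_at + timeout
    return digits, min(deadline, max_duration)


if __name__ == "__main__":
    # Same settings as the CXML example: timeout=3, maxTimeout=10, maxDuration=60.
    print(simulate_dtmf_gather([(2.0, "1"), (4.0, "2"), (8.0, "3")], 3, 10, 60))
    print(simulate_dtmf_gather([], 3, 10, 60))
```

With the example's settings, keys pressed at 2, 4 and 8 seconds give the result `12` at the 7-second mark, because the third key arrives more than 3 seconds after the second. With no key presses at all, the model completes with an empty result after 10 seconds.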
