<Gather>
Description
Gather input from the caller, either through the DTMF keypad or speech, possibly while prompting the caller for input.
Example
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Gather/>
</Response>
Or a more complete example, passing the caller input to an application server:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Gather input="dtmf" timeout="3" numDigits="1" action="https://example.com/handleInput">
<Say>Please enter a digit</Say>
</Gather>
</Response>
Attributes
The following attributes are supported:
Attribute Name | Allowed Values | Default Value |
---|---|---|
action | URL (relative or absolute) | current document URL |
method | GET, POST | POST |
input | dtmf, speech, dtmf speech | dtmf |
finishOnKey | 0-9, #, *, and "" (the empty string) | # |
numDigits | positive integer | unlimited |
maxTimeout | a positive number of seconds | 30 |
timeout | a positive number of seconds | 5 |
speechTimeout | a positive number of seconds or auto | auto |
speechEngine | aws, google | aws |
language | Language identifier, according to BCP 47 language tags. | en-US |
actionOnEmptyResult | boolean | false |
maxDuration | a positive number of seconds | 300 |
speechDetection | auto, low, normal, high, stt | auto |
interruptible | true, false | true |
Cloudonix supports nested playback verbs inside the <Gather> verb, so the application can start capturing input from the caller before the instructions have finished playing. If the caller starts input, <Gather> will abort the playback. The timeout and speechTimeout timers start counting only once nested playback (if any) has ended or was aborted.
The only supported nested verbs are <Say>, <Play>, <Pause> and <Converse>.
Note: If <Gather> completes without detecting any DTMF or speech, the action URL will not be invoked; instead, the application will run the next verb in the CXML document (or hang up the call if the <Gather> verb is the last in the CXML document). See also the actionOnEmptyResult attribute.
Cloudonix provides support for Amazon Polly and Google AI based transcription. Depending on your chosen engine, you can find a list of the supported languages and their respective codes at the following links:
- Google AI Voices
- Amazon Polly Voices - Only streaming capable languages are supported!
Attribute: action
The action attribute takes an absolute or relative URL as a value. When the caller finishes entering digits (or the timeout is reached), Cloudonix will make a CXML request to this URL.
That request will include the following parameters and CXML request headers:
Parameter Name | Description |
---|---|
CallSid | A unique identifier for this call, generated by Cloudonix. |
AccountSid | Your Cloudonix account ID. |
From | The phone number or client identifier of the party that initiated the call. Phone numbers are formatted with a '+' and country code, e.g. +16175551212 (E.164 format). Client identifiers begin with the client: URI scheme; for example, on a call from a client named 'charlie' the From parameter will be client:charlie. |
To | The phone number or client identifier of the called party. Phone numbers are formatted with a '+' and country code, e.g. +16175551212 (E.164 format). Client identifiers begin with the client: URI scheme; for example, for a call to a client named 'joey', the To parameter will be client:joey. |
CallStatus | A descriptive status for the call. The value is one of the following: queued, ringing, in-progress, completed, busy, failed or no-answer. See the CallStatus section below for more details. |
ApiVersion | The version of the Cloudonix API used to handle this call. For incoming calls, this is determined by the API version set on the called number. For outgoing calls, this is the version used by the REST API request from the outgoing call. |
Direction | A string describing the direction of the call: inbound for inbound calls, outbound-api for calls initiated via the REST API, outbound-dial for calls initiated by a Dial verb. |
ForwardedFrom | This parameter is set only when Cloudonix receives a forwarded call, but its value depends on the caller's carrier including information when forwarding. Not all carriers support passing this information. |
CallerName | This parameter is set when the IncomingPhoneNumber that received the call has had its VoiceCallerIdLookup value set to true. |
ParentCallSid | A unique identifier for the call that created this leg. This parameter is not passed if this is the first leg of a call. |
Digits | If the input attribute included dtmf and the caller has entered some digits, this field will include the DTMF digits entered (not including the finish key, if any). |
SpeechResult | If the input attribute included speech and caller speech was detected and transcribed, this field will contain the transcription results. |
If action was given, then after sending the request Cloudonix will stop running the current CXML document and will start running the result from the action URL. If an action attribute was not specified, the CXML request with the caller's input will be sent to the current document's URL, and the resulting CXML document is run again. Please note that this may lead to unwanted looping behavior, so you should take care to specify an action or otherwise handle the possibility of a loop in your application code.
After <Gather> ends, if the caller entered neither DTMF nor speech (as requested) and actionOnEmptyResult was not set, the application will continue to the next verb in the document without sending anything.
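As a rough illustration of the request handling described above, the following Python sketch shows how an application server might turn a <Gather> result into a CXML reply. It assumes the parameters arrive as a form-encoded POST body, which is an assumption to verify against your own deployment. The function name handle_gather_result is hypothetical; only the Digits and SpeechResult parameter names come from the table above.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def handle_gather_result(body: str) -> str:
    """Build a CXML reply from a form-encoded <Gather> action request body."""
    # Assumption: parameters are form-encoded; keep the first value of each.
    params = {name: values[0] for name, values in parse_qs(body).items()}
    digits = params.get("Digits", "")
    speech = params.get("SpeechResult", "")

    if digits:
        # Space the digits out so they are read one at a time.
        message = "You entered " + " ".join(digits)
    elif speech:
        message = "You said " + speech
    else:
        # Only reachable when actionOnEmptyResult is enabled on the <Gather>.
        message = "We did not receive any input"

    # Escape the text so caller speech cannot break the XML document.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f"<Response><Say>{escape(message)}</Say></Response>"
    )
```

Returning a document that does not contain another <Gather> pointing back at the same URL is one simple way to avoid the looping behavior mentioned above.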
Attribute: method
The method attribute takes the value GET or POST. This sets the method Cloudonix will use in the CXML request to the action URL.
Attribute: input
Specify which input (DTMF or speech) Cloudonix should accept with the input attribute.
The default input for <Gather> is dtmf. You can set input to dtmf, speech, or dtmf speech.
DTMF digits entered will be sent in the Digits parameter of the application request sent to the action URL. If speech was detected, the speech transcription will be sent in the SpeechResult parameter of the CXML request sent to the action URL.
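For applications that generate CXML dynamically, a small helper can guard against unsupported input values before the document is sent. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name build_gather and the ALLOWED_INPUTS constant are hypothetical, and the allowed values are taken from the attribute table above.

```python
import xml.etree.ElementTree as ET

# The three values listed for the input attribute in the table above.
ALLOWED_INPUTS = {"dtmf", "speech", "dtmf speech"}


def build_gather(prompt: str, action: str, input_mode: str = "dtmf") -> str:
    """Return a CXML document with one <Gather> wrapping a <Say> prompt."""
    if input_mode not in ALLOWED_INPUTS:
        raise ValueError(f"unsupported input value: {input_mode!r}")
    response = ET.Element("Response")
    gather = ET.SubElement(
        response, "Gather", {"input": input_mode, "action": action}
    )
    ET.SubElement(gather, "Say").text = prompt
    # ElementTree escapes special characters in text and attribute values.
    return ET.tostring(response, encoding="unicode")


print(build_gather("Press 1 or say sales", "https://example.com/handleInput", "dtmf speech"))
```

Note that ET.tostring with encoding="unicode" omits the XML declaration, so prepend one if your server needs it.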
Attribute: finishOnKey
The finishOnKey attribute lets you set a key that the caller can press in order to submit the pressed digits or speech.
For example, if you set finishOnKey to # (or leave it as the default) and the caller enters 1234#, Cloudonix will immediately stop waiting for more input after they press # and submit the parameter Digits=1234 to your action URL.
Note: The finishOnKey digit is never submitted in the result of <Gather>.
Allowed values for this attribute are:
- # (this is the default value)
- *
- Single digits 0–9
- A blank string
If you specify a blank string (e.g. <Gather finishOnKey="">), <Gather> will capture all caller input and no key will end the <Gather>. In this case, Cloudonix will send an HTTP request to the action URL only after a timeout expires or the numDigits threshold is reached.
Attribute: numDigits
You can set the number of digits you expect from your caller by including numDigits in <Gather>. If numDigits is set, once the caller enters that many digits, <Gather> will end immediately and submit the entered digits to the action URL.
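To make the interaction between finishOnKey and numDigits concrete, here is a simplified Python model of which keys end up in the Digits parameter. It is only an illustration of the rules described above, not Cloudonix's implementation: it ignores all timeouts, and the function name collect_digits is hypothetical.

```python
from typing import Optional


def collect_digits(
    keys: str, finish_on_key: str = "#", num_digits: Optional[int] = None
) -> str:
    """Return the Digits value for a sequence of key presses (timeouts ignored)."""
    collected = ""
    for key in keys:
        if finish_on_key and key == finish_on_key:
            # The finish key ends input and is never part of the result.
            break
        collected += key
        if num_digits is not None and len(collected) == num_digits:
            # The expected number of digits was reached: end immediately.
            break
    return collected


print(collect_digits("1234#"))                   # default finish key
print(collect_digits("12#4", finish_on_key=""))  # blank string: no key ends input
print(collect_digits("123456", num_digits=4))    # stops after four digits
```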
Attribute: maxTimeout
How long, in seconds, to wait for the user to first input speech or DTMF. If there is complete silence from the caller for this long (default: 30 seconds), the <Gather> verb will complete with an empty result.
Attribute: timeout
How long, in seconds, <Gather> should wait for the user to input another DTMF digit before sending the result to the action URL.
Attribute: speechTimeout
How long, in seconds, <Gather> should wait for the user to say something else, when gathering speech, before sending the result to the action URL.
Attribute: speechEngine
The speech-to-text service to use. Cloudonix by default uses the standard AWS speech-to-text service, but the application can request to use the standard GCP speech-to-text service which uses a different model that may or may not be better suited for specific use cases.
Both speech-to-text services are limited to a total of 5 minutes of speech input per <Gather> verb.
Attribute: actionOnEmptyResult
Whether to send a CXML request to the action URL even if there is no caller input after the timeout, instead of running the next verb.
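With actionOnEmptyResult enabled, the action handler has to decide what to do when neither Digits nor SpeechResult is present. One possible approach, sketched below in Python, is to re-prompt a limited number of times and carry the attempt count in the action URL. The function name reprompt_cxml, the attempt query parameter, and the example.com URL are all hypothetical choices for this illustration.

```python
from xml.sax.saxutils import quoteattr


def reprompt_cxml(params: dict, attempt: int, max_attempts: int = 3) -> str:
    """Return CXML that thanks the caller, re-prompts, or gives up."""
    has_input = bool(params.get("Digits") or params.get("SpeechResult"))
    if has_input:
        body = "<Say>Thank you</Say>"
    elif attempt >= max_attempts:
        # No further verbs follow, so the document simply ends here.
        body = "<Say>Goodbye</Say>"
    else:
        # Hypothetical convention: track retries in the action URL.
        next_url = f"https://example.com/handleInput?attempt={attempt + 1}"
        body = (
            '<Gather input="dtmf" numDigits="1" actionOnEmptyResult="true" '
            f"action={quoteattr(next_url)}>"
            "<Say>Please enter a digit</Say></Gather>"
        )
    return f'<?xml version="1.0" encoding="UTF-8"?>\n<Response>{body}</Response>'
```

Bounding the number of attempts keeps an unattended call from re-prompting indefinitely.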
Attribute: maxDuration
The total amount of time to wait for the user to complete input, measured from the end of playing back contained elements. If the user continuously provides input, when this timer elapses (default: 5 minutes) <Gather> will complete with a result of whatever input the user has provided by then.
Attribute: speechDetection
Fine-tune the sensitivity of the speech detection, i.e. when <Gather> decides that the caller has started and finished talking, which affects when the timeouts are counted from. The default value should work well for most phone conversations taking place in office or home environments. Set speechDetection to low to be less sensitive to noise, which will help with detecting silence in noisy environments, or to high to be more sensitive to noise, which will help detect speech over bad analog lines. Set it to stt to detect speech only as recognizable text from the speech-to-text engine; this may ignore speech in unrecognized languages.
Attribute: interruptible
Whether caller input (either DTMF or speech) can interrupt nested playback. The default value of this attribute is true, meaning that caller input will interrupt nested playback, such that the currently playing element is immediately stopped and any following playback elements (as explained at the beginning) are cancelled outright. Any input that caused the interruption will be recorded and submitted as the result of <Gather>.
When setting interruptible="false", the nested playback will continue until the end and the caller cannot interrupt it. Furthermore, any input from the caller while the nested playback is playing is ignored and will not be submitted in the <Gather> result.
Timeout considerations
Consider the following CXML application:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Gather input="dtmf speech" timeout="3" speechTimeout="5" maxTimeout="10" maxDuration="60">
<Play>https://example.com/welcome-message.mp3</Play>
</Gather>
</Response>
In the above case, <Gather> will play the welcome message, wait for the playback to complete, and then start a timer.
If the caller neither says anything nor presses any DTMF key within 10 seconds, <Gather> will complete with an empty result and the application will end.
If the caller presses a DTMF key within the first 10 seconds, <Gather> will wait up to 3 more seconds for another DTMF key (and then 3 more seconds for another after that, up to the number of keys specified by numDigits). Whenever there is a lull of more than 3 seconds between DTMF keys, <Gather> will submit the list of DTMF keys entered so far as the result.
If the caller says something within the first 10 seconds, <Gather> will wait for them to finish speaking, then start a timer and wait up to 5 seconds for the caller to say something else (and then 5 more when they finish speaking again). Whenever there is a lull of more than 5 seconds between utterances, <Gather> will submit the text spoken so far as the result.
If the caller speaks continuously for a minute (60 seconds), <Gather> will complete anyway and submit the first minute's worth of spoken text as the result.
In the above case, because no action was specified, the result will be submitted back to the URL from which this application was loaded.
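The DTMF part of this timeline can be approximated with a small Python model. This is a simplified sketch of the rules described above and not Cloudonix's actual implementation: it ignores speech and finishOnKey, treats a key pressed exactly at a deadline as accepted (an assumption), and measures all times in seconds from the end of nested playback. The function name dtmf_gather_end is hypothetical.

```python
from typing import List, Optional, Tuple


def dtmf_gather_end(
    press_times: List[float],
    timeout: float = 5,
    max_timeout: float = 30,
    max_duration: float = 300,
    num_digits: Optional[int] = None,
) -> Tuple[int, float]:
    """Return (keys accepted, completion time) for a list of key press times."""
    # Until the first key arrives, maxTimeout bounds the wait.
    deadline = min(max_timeout, max_duration)
    accepted = 0
    for pressed_at in sorted(press_times):
        if pressed_at > deadline:
            break  # the gather already completed before this key
        accepted += 1
        if num_digits is not None and accepted == num_digits:
            return accepted, pressed_at  # ends immediately on the last digit
        # After each key, wait up to `timeout` more, capped by maxDuration.
        deadline = min(pressed_at + timeout, max_duration)
    return accepted, deadline


# Values from the example above: timeout=3, maxTimeout=10, maxDuration=60.
print(dtmf_gather_end([2, 4, 6, 12], timeout=3, max_timeout=10, max_duration=60))
print(dtmf_gather_end([], timeout=3, max_timeout=10, max_duration=60))
```

In the first call the key at 12 seconds is ignored because the 3 second lull after the key at 6 seconds has already ended the gather; in the second call no key arrives, so the gather completes empty when maxTimeout elapses.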