Platform · Languages & voice engine
Built for how India actually talks.
Pick a primary language and Vocily AI wires the speech-to-text model, voice, and fallback chain to match. 17+ Indian languages, two code-mix auto-modes, and per-language voice binding so your bot stops sounding like an English voice attempting Hindi.
Problem · Solution
The problem today
Most voice AI platforms treat Indian languages as an afterthought — a Hindi voice that's really an English voice with an accent, an STT model that mangles 'Bansal' and 'Bhansali' the same way, and zero awareness when a caller code-mixes mid-sentence. The result is bots that sound foreign on the very calls that need to feel local. Provider config is the other half of the pain: choosing between Sarvam and Deepgram, picking the right Cartesia voice for Tamil, configuring fallbacks if one provider hiccups — every team rebuilds this stack from scratch.
How Vocily AI handles it
Language-first agent setup
Pick your callers' primary language and Vocily AI recommends the best STT model, the right TTS voice, and a sensible fallback. Override anything; the defaults are coherent.
17+ Indian languages with code-mix auto-modes
Hindi, English (Indian/US/UK), Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu, Urdu, Assamese — plus Hinglish auto and Indian Multilingual auto powered by Sarvam saaras:v3.
Mid-call language switching
When a caller flips from English to Hindi at turn three, the agent follows — hot-swapping STT and the TTS voice mid-conversation. No reconnect, no awkward pause.
Per-language TTS voice mapping
Bind a native Hindi voice for Hindi turns and a separate English voice for English turns inside the same agent. Stops the 'English voice attempting Hindi' problem.
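As a rough illustration of per-language voice binding, the sketch below picks a TTS voice for each turn from the language detected on that turn. The `VoiceBinding` class, the `voice_for_turn` method, and the voice IDs are hypothetical names invented for this example; they are not Vocily AI's actual configuration schema.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceBinding:
    """Hypothetical per-agent map from language code to a TTS voice ID."""
    default_language: str
    voices: dict = field(default_factory=dict)

    def voice_for_turn(self, detected_language: str) -> str:
        # Use the voice bound to the detected language; fall back to the
        # agent's default-language voice when no binding exists.
        return self.voices.get(detected_language, self.voices[self.default_language])


# Placeholder voice IDs, invented for this example.
binding = VoiceBinding(
    default_language="en-IN",
    voices={"en-IN": "voice-english-01", "hi-IN": "voice-hindi-01"},
)

# A caller who switches from English to Hindi at turn three, then to a
# language with no bound voice at turn four.
turn_languages = ["en-IN", "en-IN", "hi-IN", "ta-IN"]
voices_used = [binding.voice_for_turn(lang) for lang in turn_languages]
```

In this sketch an unbound language falls back to the default voice, which is exactly the case per-language binding is meant to avoid, so a real setup would bind a voice for every language the agent is expected to hear.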
Vendor retry + automatic fallback
When a provider hiccups, Vocily AI retries in the background and speaks a short 'one moment' so callers never hear dead silence. If Sarvam lags by more than 5 seconds, Deepgram takes over: the buffered audio is replayed to it, and the caller hears no switchover.
What's in it
What the voice engine ships with.
The configuration surface that lets you tune an agent for real Indian conversations — not just toggle on a generic 'Hindi mode.'
Languages live
17+ Indian and English variants supported out of the box.
- Hindi
- Native Hindi STT + voices.
- English
- Indian English, US, UK accents.
- Indian languages
- Bengali · Gujarati · Kannada · Malayalam · Marathi · Odia · Punjabi · Tamil · Telugu · Urdu · Assamese
- Code-mix modes
- Hinglish auto (mixes Hindi + English mid-sentence) and Indian Multilingual auto (handles whatever the caller throws).
- Engine
- Sarvam saaras:v3 for code-mix; provider-best for monolingual.
Voice & accent
Voices tuned per language, not borrowed across them.
- Per-language voice
- Bind a distinct voice per language inside the same agent.
- Accent profiles
- Indian English accents available for calls where sounding local matters to the customer.
- Providers routed
- Cartesia · ElevenLabs · Sarvam · Smallest — picked per language by default, overridable.
- Pronunciation dictionary
- Custom word → spoken-form mapping, provider-agnostic. Brand names, customer surnames, technical jargon.
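One way to picture a provider-agnostic pronunciation dictionary is as a text substitution applied before the text reaches any TTS provider. The sketch below assumes that design; the `PRONUNCIATIONS` entries, their spoken forms, and the `apply_pronunciations` helper are illustrative placeholders, not the product's real format.

```python
import re

# Hypothetical entries: written form -> spoken form sent to the TTS provider.
PRONUNCIATIONS = {
    "Cal.com": "cal dot com",
    "SKU": "S K U",
}


def apply_pronunciations(text: str, mapping: dict) -> str:
    """Replace each dictionary word with its spoken form before synthesis."""
    if not mapping:
        return text
    # Longest keys first so a longer entry wins over a shorter prefix of it.
    keys = sorted(mapping, key=len, reverse=True)
    # Lookarounds instead of \b so keys containing punctuation (such as
    # "Cal.com") still match only as whole tokens.
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text)


spoken = apply_pronunciations("Book via Cal.com for SKU 42.", PRONUNCIATIONS)
```

Because the substitution happens on plain text, the same dictionary works regardless of which voice provider is routed for the turn.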
Accuracy on your domain
STT biased toward your vocabulary, not just generic Indian speech.
- Keyword boosting
- Custom vocabulary bias per agent — brand names, product SKUs, customer-name lists, domain terms.
- Examples
- 'Vocily AI', 'Cal.com', customer-name CSV, product codes — never get garbled.
- Scope
- Per-agent, so different agents in the same workspace can bias different vocabularies.
Number, currency & date formatting
How the agent speaks numbers — natural or formal — per agent.
- Phone numbers
- Digit-by-digit or grouped.
- Currency
- 'twenty-three fifty' vs 'twenty-three rupees and fifty paise' vs '₹23.50'.
- Dates
- 'March 5th' vs 'the fifth of March' vs '5/3/2026'.
- Times
- 12-hour, 24-hour, conversational ('half past three').
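The phone-number and currency styles listed above can be sketched as small text-rendering helpers. The function names, the `style` values, and the default group size of five digits are assumptions made for this example only, and the number-to-words helper covers 0 to 99 just to keep the sketch short.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def words_under_100(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if not 0 <= n < 100:
        raise ValueError("this sketch only handles 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def speak_phone(number: str, style: str = "digits", group_size: int = 5) -> str:
    """Render a phone number digit by digit, or in comma-separated groups."""
    words = [ONES[int(ch)] for ch in number if ch in "0123456789"]
    if style == "digits":
        return " ".join(words)
    if style == "grouped":
        groups = [words[i:i + group_size] for i in range(0, len(words), group_size)]
        # The comma is meant as a short pause hint between groups.
        return ", ".join(" ".join(group) for group in groups)
    raise ValueError(f"unknown style: {style}")


def speak_rupees(rupees: int, paise: int, style: str = "formal") -> str:
    """Render an amount in the 'natural' or 'formal' spoken style."""
    if style == "natural":
        return f"{words_under_100(rupees)} {words_under_100(paise)}"
    if style == "formal":
        return f"{words_under_100(rupees)} rupees and {words_under_100(paise)} paise"
    raise ValueError(f"unknown style: {style}")
```

For example, `speak_rupees(23, 50, style="natural")` gives 'twenty-three fifty', and `speak_phone("98765 43210", style="grouped")` reads the number as two groups of five digits.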
LLM model choice
Pick the reasoning model that fits the use case and budget. STT, TTS, and LLM all run on Vocily-managed provider routing.
- OpenAI
- GPT-4o-mini · GPT-4o · GPT-5.4-mini — selectable per agent.
- Anthropic
- Claude on the Vocily-managed routing layer.
- Google
- Gemini on the Vocily-managed routing layer.
- Automatic fallback
- Configure a backup model; if the primary errors or times out, the platform switches mid-turn.
- Pricing
- Per-minute voice and per-message chat — model cost included in the rate. No separate provider bills to manage.
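The automatic model fallback in the list above (switch when the primary errors or times out) can be sketched with a timeout around the primary call. This is a minimal illustration, assuming each model is exposed as an async function; `complete_with_fallback`, the demo models, and the timeout values are hypothetical and do not represent the platform's routing code.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

ModelCall = Callable[[str], Awaitable[str]]


async def complete_with_fallback(
    prompt: str, primary: ModelCall, backup: ModelCall, timeout_s: float = 2.0
) -> Tuple[str, str]:
    """Return (which_model, reply), using the backup on error or timeout."""
    try:
        reply = await asyncio.wait_for(primary(prompt), timeout=timeout_s)
        return "primary", reply
    except Exception:
        # Both a timeout from wait_for and a provider error raised by the
        # primary land here; the same turn is retried on the backup model.
        reply = await backup(prompt)
        return "backup", reply


# Stand-in models used only to exercise the sketch.
async def fast_model(prompt: str) -> str:
    return f"fast reply to: {prompt}"


async def slow_model(prompt: str) -> str:
    await asyncio.sleep(0.5)
    return "slow reply"


async def failing_model(prompt: str) -> str:
    raise RuntimeError("provider error")


async def backup_model(prompt: str) -> str:
    return f"backup reply to: {prompt}"


result_ok = asyncio.run(complete_with_fallback("hi", fast_model, backup_model))
result_timeout = asyncio.run(
    complete_with_fallback("hi", slow_model, backup_model, timeout_s=0.05)
)
result_error = asyncio.run(complete_with_fallback("hi", failing_model, backup_model))
```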
Resilience & low-latency modes
What happens when providers blink — and how to skip the pipeline when you need speed.
- Vendor retry
- Automatic retry on transient STT/TTS/LLM failures with a spoken 'one moment' so callers don't hear silence.
- STT fallback chain
- Lag-triggered: if primary STT falls 5s+ behind, a backup provider takes over with buffered audio replayed.
- Speech-to-Speech mode
- Ultra-low-latency S2S via OpenAI Realtime or Gemini Live — skip STT → LLM → TTS for native interruption handling.
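To make the lag-triggered STT fallback concrete, here is a toy model of the idea: audio chunks are buffered until the primary provider confirms it has transcribed them, and if the oldest unconfirmed chunk is 5 seconds old or more, the backup takes over and the buffered chunks are replayed to it. The `SttFallbackRouter` class and its method names are assumptions for illustration; the real switchover logic is not described in this page beyond the 5-second threshold and the buffered replay.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SttFallbackRouter:
    """Toy model of a lag-triggered STT switchover with buffered replay."""
    max_lag_s: float = 5.0
    active: str = "primary"
    # (end_time_s, chunk) pairs not yet confirmed as transcribed.
    pending: deque = field(default_factory=deque)
    replayed: list = field(default_factory=list)

    def on_audio(self, end_time_s: float, chunk: bytes) -> None:
        self.pending.append((end_time_s, chunk))

    def on_transcript(self, covered_until_s: float) -> None:
        # Drop chunks the active provider has already transcribed.
        while self.pending and self.pending[0][0] <= covered_until_s:
            self.pending.popleft()

    def check_lag(self, now_s: float) -> bool:
        """Switch to the backup if the oldest unconfirmed audio is too old."""
        if self.active != "primary" or not self.pending:
            return False
        oldest_end_s = self.pending[0][0]
        if now_s - oldest_end_s < self.max_lag_s:
            return False
        self.active = "backup"
        # Replay everything the primary never confirmed, in arrival order.
        self.replayed = [chunk for _, chunk in self.pending]
        return True


router = SttFallbackRouter()
for t in range(1, 7):
    router.on_audio(float(t), f"chunk{t}".encode())
router.on_transcript(2.0)                     # primary confirms 2s, then stalls
switched_early = router.check_lag(now_s=6.0)  # oldest pending ended at 3.0
router.on_audio(7.0, b"chunk7")
router.on_audio(8.0, b"chunk8")
switched_late = router.check_lag(now_s=8.0)   # lag has now reached 5.0s
```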
Common questions
What teams ask before they switch.
Can the agent switch languages in the middle of a call?
Yes. If the caller flips from English to Hindi between turns, the agent matches: STT swaps, the TTS voice swaps, and the call continues. No reconnect, no replay.