Voice, vernacular, victory!

Technology companies and startups are betting big on voice AI models. Experts say Indian firms are more likely to crack it with affordable pricing and local context.

ETtech
When Dheemanth Reddy, cofounder of Maya Research, started work on a voice AI model, he had one clear objective. He wanted his family, parents and grandparents, to be able to interact with AI in their native tongue, Telugu.

“When AI development took place in the US, what was missing was that 90% of the world still spoke a different language than English. The US and Europe do not represent the entire world, and the only medium for the global south such as India to access this piece of this technology is voice and that is why we started building Maya Research,” he told ET.

Maya Research is one of the few AI companies with Indian founders building speech AI models from scratch, along with Pixa AI and Soket Labs.


Voice AI is touted as the next frontier in the AI race, with big technology companies and startups betting big on it. According to Peak XV’s Rajan Anandan, 2026 will be the year of voice AI for India, and Indian companies will win it through affordable pricing and local-language context.


Winning the voice, the India way

Reddy moved to India last year from New York to build Maya. “Building from the US is like solving India’s problems sitting in the Valley and I will be making the same mistake as everyone else. I must be where the customers are, which is India,” he said.

Dozens of companies in India are focusing on this space right now. Reddy agrees that voice AI models are a dime a dozen. “You go into HSR and throw a stone, you will find a bunch of people building Voice AI,” he quipped. But Maya Research is focusing on building a foundational model, with its own data collection and labelling team.

Reddy explained that the company has data collectors on the ground who follow preset rules for gathering emotional and conversational data. “The way to teach models conversational and emotional ability is to collect the data that has multiple speakers, with various dialects and slangs,” he said.

The company is building a 3-billion-parameter model, first in English, and will soon launch in two Indian languages, Telugu and Hindi, with plans to expand to Kannada, Tamil and Malayalam next.

Sparsh Agrawal, founder of Pixa AI, which has developed speech-to-speech models, said that one reason for starting up was that most of the voice AI models the team had used were bland and lacked emotion. In addition, no single model could do speech and music together, and that is what they have built at Pixa AI.

Pixa AI did this by training multiple small language models, specialised in music, voice quality and accuracy, and combining them to create a foundational model.

Kalpa Labs, a YC-backed company with Indian founders, is also working on a speech model. Sarvam AI has launched its speech-to-text voice AI model, which the company claims has outperformed current benchmarks.

At the same time, models such as Cartesia are getting better at Gujarati and Marathi, and ElevenLabs’ Tamil capabilities are improving as well, founders told ET.

The reality of voice AI models

With demand for voice AI solutions increasing, multiple startups are using foundational voice models to deliver services such as call centers, loan collection and support. Most companies have seen demand grow significantly in the last six months as the quality of voice models improves. This presents a huge opportunity for foundational voice AI models.

Sudhama Bhatia, cofounder of Travon AI, said that no single model is good at everything, so companies like Travon use a mixture of foundational models and build services on top of them. Travon works with hospitals, recruitment firms and collection agencies on appointment booking, automated candidate screening and AI-driven collections.

Prateek Sachan, founder of Bolna AI, a voice AI orchestration platform, said that India is a consumer market and that multilingual capabilities have evolved in recent times. The company also uses Sarvam among the models it offers, along with global models.

Using Indian models helps companies bring down pricing. Currently, voice AI services cost anywhere from Rs 4 to Rs 10 per minute, including telephony charges and platform fees. Rajan Anandan, managing director of Peak XV, said in a recent interaction that using Indian voice AI companies can bring the fee down to Rs 3 per minute.

The challenge is that only a handful of Indian companies are working on such models. Multiple founders ET spoke to admitted that not many Indian models are currently available to use at the scale of global models.

One of the founders, on the condition of anonymity, told ET that while there are many companies working on speech models, there are multiple limitations to them.

“For one, they are not proven yet as they are new and [it is] unclear how they will handle interruptions and application programming interface calls,” the founder quoted above said.

“It will be hard to get enterprise scale adoption at this point as customers will want to trust platforms that have scaled,” said another.

Pixa AI’s Agrawal agreed that enterprise scale adoption is harder and the company is currently focusing on entertainment and companionship use cases. “But eventually everyone will get into this,” he added.

Challenges

Apart from this, the development and deployment of voice AI models are still fraught with issues.

“The biggest challenge is that even the best global models talk gibberish when it comes to Indian languages such as Hindi. They might say one sentence correctly but the next time it spits out unintelligible sentences. This is a problem even with top tier models,” pointed out Bhatia.

Hallucination continues to be an issue as well. “No matter how many times you train the model, this continues and that is something one needs to optimise for,” he added.

Agrawal said that building voice AI models from the ground up requires talent, which is expensive for the Indian market and hard to come by. “The difference between exceptional and great is 100X. Just great is not enough. How do you convince exceptional talent to join you? That is a huge challenge,” he added.