A leading NLP company wants to extend its speech assistant IVR solution to customers who speak a variety of Indian languages. The company has previously developed voice bots for the call center sector that assist customers over a call in various native languages. We supported the project with a total of 250 hours of speech data in five distinct languages.
The company's NLP solutions are well known in the industry. It specializes in speech and offers services in voice bots, natural language processing, and automatic speech recognition. Its voice bot solutions, with approximately 90% accuracy, are currently available in English and a few other languages. It now aims to extend its AI model to understand Indian languages as well.
The client had very precise requirements: 250 hours of call center speech data in Tamil, Hindi, Malayalam, Marathi, and Gujarati, drawn from a variety of industries including travel, retail and e-commerce, delivery and logistics, BFSI (banking, financial services, and insurance), and healthcare, with each conversation lasting 10–15 minutes. The calls had to cover various service and issue scenarios relevant to each industry, including both positive and negative customer experiences. The client also requested a range of accents in every language, which meant sourcing data from the different states and regions where each language is natively spoken.
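To give a sense of scale, the requirement can be turned into a rough count of conversations per language. The sketch below is illustrative only: it assumes the 250 hours are split evenly across the five languages, and the function and variable names are hypothetical rather than part of any project tooling.

```python
import math

# Figures taken from the requirement described above.
TOTAL_HOURS = 250
LANGUAGES = ["Tamil", "Hindi", "Malayalam", "Marathi", "Gujarati"]
MIN_CALL_MINUTES = 10
MAX_CALL_MINUTES = 15


def calls_needed(hours: float, call_minutes: int) -> int:
    """Number of calls of a given length needed to reach a target duration."""
    return math.ceil(hours * 60 / call_minutes)


# Assumption: the total is divided equally between the languages.
hours_per_language = TOTAL_HOURS / len(LANGUAGES)

# Longer calls mean fewer conversations, shorter calls mean more.
fewest_calls = calls_needed(hours_per_language, MAX_CALL_MINUTES)
most_calls = calls_needed(hours_per_language, MIN_CALL_MINUTES)

print(f"{hours_per_language:.0f} hours per language: "
      f"{fewest_calls} to {most_calls} conversations")
```

Under that assumption, each language needs a few hundred conversations, which then have to be spread across industries, scenarios, and accents.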
Our experience with comparable projects put us in a good position to handle this request. We first supported the client with our readily available off-the-shelf (OTS) speech data, which covers practically all languages. Using OTS speech data with transcription, the client was able to test the data quickly and carry out preliminary testing against the requirement. The client wanted custom data for the remaining batch, so we involved our global community in collecting it. In a three-week period, with one layer of QA, we collected 50 hours of speech data in each of Gujarati, Hindi, Tamil, Malayalam, and Marathi.
To make the data more useful, we delivered all audio files in individual channels as well as composite channels, together with speaker-specific metadata such as age, gender, location, and accent. All of this was made possible by YUGO, our voice data collection tool developed in-house.
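As an illustration of what such a delivery could look like, the sketch below models one recording with its composite file, its per-speaker channel files, and the speaker metadata mentioned above. It is a hypothetical layout, not the actual YUGO output format: every class name, field name, file name, and sample value is an assumption made for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SpeakerMetadata:
    # Metadata fields named in the text; the types are assumptions.
    age: int
    gender: str
    location: str
    accent: str


@dataclass
class CallRecording:
    language: str
    industry: str
    composite_audio: str           # both speakers mixed into one file
    channel_audio: list[str]       # one file per speaker channel
    speakers: list[SpeakerMetadata]


def to_json(recording: CallRecording) -> str:
    """Serialize a recording and its nested speaker metadata to JSON."""
    return json.dumps(asdict(recording), indent=2)


# Placeholder values for illustration only, not real project data.
example = CallRecording(
    language="Tamil",
    industry="healthcare",
    composite_audio="call_0001_mixed.wav",
    channel_audio=["call_0001_agent.wav", "call_0001_customer.wav"],
    speakers=[
        SpeakerMetadata(age=34, gender="female", location="Chennai", accent="Chennai Tamil"),
        SpeakerMetadata(age=41, gender="male", location="Madurai", accent="Madurai Tamil"),
    ],
)

print(to_json(example))
```

Keeping the channel files separate from the composite file lets a speech recognition model be trained on each speaker in isolation, while the mixed file preserves the conversation as a caller would hear it.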
Total Speech Hours: 250
Languages: 5
Collection Time: 3 weeks