Tamil (India) Scripted Monologue Speech Dataset for Retail & E-commerce Domain

This audio dataset comprises scripted monologue speech data in the Retail & E-commerce domain, recorded by native Tamil speakers from India. It includes speech data, detailed metadata, and accurate transcriptions.

Total Volume: 6000+ prompts

Last updated: July 2024

Number of participants: 60+

About this Off-the-shelf Speech Dataset

Introduction

Welcome to the Tamil Scripted Monologue Speech Dataset for the Retail & E-commerce Domain. This meticulously curated dataset is designed to advance the development of Tamil language speech recognition models, particularly for the Retail & E-commerce industry.

Speech Data

This training dataset comprises over 6,000 high-quality scripted prompt recordings in Tamil. These recordings cover various topics and scenarios relevant to the Retail & E-commerce domain, designed to build robust and accurate customer service speech technology.

•Participant Diversity:

•Speakers: 60 native Tamil speakers from different regions of India.

•Regions: Speakers are selected to give a balanced representation of Tamil accents, dialects, and demographics.

•Participant Profile: Participants range from 18 to 70 years old, with a male-to-female ratio of 60:40.

•Recording Details:

•Recording Nature: Audio recordings of scripted prompts/monologues.

•Audio Duration: Recordings typically run 5 to 30 seconds each.

•Formats: WAV format, mono channel, 16-bit depth, with sample rates of 8 kHz and 16 kHz.

•Environment: Recordings are made in quiet settings without background noise or echo.

•Topic Diversity: The dataset encompasses a wide array of topics and conversational scenarios to ensure comprehensive coverage of the Retail & E-commerce sector. Topics include:

•Customer Service Interactions

•Order and Payment Processes

•Product and Service Inquiries

•Technical Support

•General Information and Advice

•Promotional and Sales Events

•Domain Specific Statements

•Other Elements: To enhance realism and utility, the scripted prompts incorporate various elements commonly encountered in Retail & E-commerce interactions:

•Names: Region-specific names of males and females in various formats.

•Addresses: Region-specific addresses in different spoken formats.

•Dates & Times: Inclusion of date and time in various retail and e-commerce contexts, such as delivery dates or promotional periods.

•Product Names: Specific names of products, brands, and categories relevant to the retail sector.

•Numbers & Prices: Various numbers and prices related to product quantities, discounts, and transaction amounts.

•Order IDs and Tracking Numbers: Inclusion of order identification and tracking information for realistic customer service scenarios.

Each scripted prompt is crafted to reflect real-life scenarios encountered in the Retail & E-commerce domain, ensuring applicability in training robust natural language processing and speech recognition models.
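The recording specifications listed above (WAV, mono, 16-bit, 8 kHz or 16 kHz, roughly 5 to 30 seconds) can be checked against a downloaded file before training. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav` and the decision to treat the duration range as a hard check are assumptions made for illustration, not part of the dataset's tooling.

```python
import wave

# Values taken from the dataset description.
EXPECTED_CHANNELS = 1                  # mono
EXPECTED_SAMPLE_WIDTH = 2              # 16 bits = 2 bytes per sample
EXPECTED_SAMPLE_RATES = {8000, 16000}  # 8 kHz and 16 kHz
DURATION_RANGE_S = (5.0, 30.0)         # stated duration range, in seconds


def check_wav(path):
    """Return a list of problems found in one WAV file (empty if none)."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if rate not in EXPECTED_SAMPLE_RATES:
        problems.append(f"unexpected sample rate {rate} Hz")
    low, high = DURATION_RANGE_S
    if not low <= duration <= high:
        problems.append(f"duration {duration:.1f} s outside {low}-{high} s")
    return problems
```

Calling `check_wav` on each file and collecting the non-empty results gives a quick report of recordings that deviate from the published specification.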

Transcription Data

In addition to high-quality audio recordings, the dataset includes meticulously prepared text files with verbatim transcriptions of each audio file. These transcriptions are essential for training accurate and robust speech recognition models.

•Content: Each text file contains the exact scripted prompt corresponding to its audio file, ensuring consistency.

•Format: Transcriptions are provided in plain text (.TXT) format, with files named to match their associated audio files for easy reference.

•Quality: All transcriptions are verified for accuracy and consistency by native Tamil transcribers.
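Because transcription files are named to match their audio files, the two can be paired by base name. The sketch below assumes a layout that the description does not specify: audio and text files in two hypothetical directories, and UTF-8 encoded transcripts. The function name `pair_audio_with_transcripts` is likewise illustrative.

```python
from pathlib import Path


def pair_audio_with_transcripts(audio_dir, transcript_dir):
    """Match each .wav file to the .txt file that shares its base name.

    Returns (pairs, missing): pairs is a list of (wav_path, transcript_text),
    missing lists the names of audio files with no transcript.
    """
    # Index transcripts by base name; suffix check is case-insensitive
    # so both .txt and .TXT are accepted.
    transcripts = {
        p.stem: p
        for p in Path(transcript_dir).iterdir()
        if p.suffix.lower() == ".txt"
    }

    pairs, missing = [], []
    for wav_path in sorted(Path(audio_dir).iterdir()):
        if wav_path.suffix.lower() != ".wav":
            continue
        txt_path = transcripts.get(wav_path.stem)
        if txt_path is None:
            missing.append(wav_path.name)
        else:
            # Assumption: transcripts are stored as UTF-8 text.
            text = txt_path.read_text(encoding="utf-8").strip()
            pairs.append((wav_path, text))
    return pairs, missing
```

Reporting the `missing` list separately, rather than silently skipping files, makes gaps in a download easy to spot.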

Metadata

The dataset provides comprehensive metadata for each audio recording and participant:

•Participant Metadata: Unique identifier, age, gender, country, state, and dialect.

•Other Metadata: Recording transcript, recording environment, device details, sample rate, bit depth, and file format.

This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Tamil language speech recognition models.
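As one illustration of how the participant metadata might be used, the sketch below summarizes gender, dialect, and age coverage. The metadata file format is not stated in the description, so this assumes a CSV file with hypothetical column names `age`, `gender`, and `dialect`; adjust these to whatever the delivered files actually use.

```python
import csv
from collections import Counter


def summarize_participants(csv_file):
    """Summarize participant metadata read from an open CSV text file.

    Assumes hypothetical columns named 'age', 'gender', and 'dialect'.
    """
    genders = Counter()
    dialects = Counter()
    ages = []
    for row in csv.DictReader(csv_file):
        genders[row["gender"]] += 1
        dialects[row["dialect"]] += 1
        ages.append(int(row["age"]))

    return {
        "participants": len(ages),
        "gender_counts": dict(genders),
        "dialect_counts": dict(dialects),
        # None when the file has a header but no rows.
        "age_range": (min(ages), max(ages)) if ages else None,
    }
```

A summary like this can be compared against the published figures (a 60:40 male-to-female ratio and an 18 to 70 age range) to confirm that a delivered subset matches expectations.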

Usage and Applications

This dataset is a versatile resource for various applications within speech recognition, natural language processing, and AI-driven conversational technologies.

•Speech Recognition Model Training: High-quality audio recordings and precise transcriptions for training and fine-tuning Tamil speech recognition models.

•Voice Synthesis: The diverse and high-quality audio data can train generative AI models for creating synthetic voices.

•Voice Assistants: Ideal for training voice assistants tailored to the Retail & E-commerce domain.

•Chatbots: Transcription data can train conversational models, enabling chatbots to respond to customer queries effectively.

•Entity Recognition: Sentences include names, dates, currencies, and other domain-specific entities for training NLP models for named entity recognition (NER) tasks.

•Language Understanding: Improve language understanding applications like sentiment analysis and topic modeling within the Retail & E-commerce sector.
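When using the recordings for speech recognition training, a common practice (not something this dataset description prescribes) is to hold out whole speakers for evaluation, so that no voice appears in both the training and test sets. The sketch below assumes each recording is represented as a `(speaker_id, audio_path)` tuple built from the participant identifier in the metadata; the function name and the default 20% test fraction are illustrative choices.

```python
import random


def split_by_speaker(records, test_fraction=0.2, seed=0):
    """Split (speaker_id, audio_path) records so speakers do not overlap.

    Returns (train_records, test_records).
    """
    # Sort before shuffling so the split is reproducible for a given seed.
    speakers = sorted({speaker_id for speaker_id, _ in records})
    random.Random(seed).shuffle(speakers)

    # Hold out at least one speaker when any are available.
    n_test = max(1, round(len(speakers) * test_fraction)) if speakers else 0
    test_speakers = set(speakers[:n_test])

    train = [r for r in records if r[0] not in test_speakers]
    test = [r for r in records if r[0] in test_speakers]
    return train, test
```

Splitting on speakers rather than on individual files gives a more honest estimate of how a model handles voices it has not heard, which matters for a corpus with a relatively small speaker pool.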

Secure and Ethical Collection

•Our proprietary data collection and transcription platform, “Yugo,” was used throughout the dataset creation process.

•Data remained within our secure platform, ensuring data security and confidentiality.

•The data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.

•The dataset does not include any personally identifiable information about any participant, making it safe to use.

License

This Tamil Scripted Monologue Speech Dataset, created by FutureBeeAI, is available for commercial use.

Use Cases

•ASR

•Conversational AI

•Chatbot

•Language modelling

•TTS

•Speech Analytics

Dataset Sample(s)

TRANSCRIPTION

SPEAKER	DURATION	TRANSCRIPT
Female(31)	00:00:05	ஷாப்பிங் செல்லும்போது, டயப்பர்களில் கூட பல தேர்வுகள் உள்ளன.

Dataset Demographics

Language: Tamil

Language code: ta-in

Country: India

Accents: Kongu Tamil, and more

Gender Distribution: M:60, F:40

Age Group: 18-70

Audio File Details

Environment: Silent

Bit Depth: 16 bit

Sample rate: 8 kHz & 16 kHz

Channel: Mono

Audio file duration: 5 to 30 seconds

Similar to Retail & E-Commerce Scripted Monologue Speech Datasets

Scripted sentence recording dataset for conversational AI for Retail & E-commerce domain in French (France)

French Retail Scripted Monologue Speech Data

Recordings of scripted prompts in French language for Retail & E-commerce.

6000+ prompts

60+ people

ASR

Conversational AI

Automatic speech recognition training dataset for Retail & E-commerce in Spanish (Spain)

Spanish Retail Scripted Monologue Speech Data

Recordings of scripted prompts in Spanish language for Retail & E-commerce.

6000+ prompts

60+ people

ASR

Conversational AI

Retail & E-commerce scripted monologue speech data for Machine learning in Arabic (Egypt)

Egyptian Arabic Retail Scripted Monologue Data

Recordings of scripted prompts in Egyptian Arabic for Retail & E-commerce.

6000+ prompts

60+ people

ASR

Conversational AI

Filipino (Philippines) scripted monologue AI dataset in Retail & E-commerce domain

Filipino Retail Scripted Monologue Speech Data

Recordings of scripted prompts in Filipino language for Retail & E-commerce.

6000+ prompts

60+ people

ASR

Conversational AI

More in Tamil (India)

French Retail Scripted Monologue Speech Data

Egyptian Arabic BFSI Scripted Monologue Data

Turkish Real Estate Scripted Monologue Speech Data

Bengali Real Estate Scripted Monologue Speech Data
