Introduction
Welcome to the US English Scripted Monologue Speech Dataset for the General Domain. This meticulously curated dataset is designed to advance the development of English speech recognition models for the General domain.
Speech Data
This training dataset comprises over 6,000 high-quality scripted prompt recordings in US English. The recordings cover a range of General domain topics and scenarios and are intended for building robust and accurate speech technology.
•Participant Diversity:
•Speakers: 60 native English speakers from different regions of the US.
•Regions: Regional coverage ensures a balanced representation of US English accents, dialects, and demographics.
•Participant Profile: Participants range from 18 to 70 years old, with a male-to-female ratio of 60:40.
•Recording Details:
•Recording Nature: Audio recordings of scripted prompts/monologues.
•Audio Duration: Typically 5 to 30 seconds per recording.
•Formats: WAV format, mono channel, 16-bit depth, with sample rates of 8 kHz and 16 kHz.
•Environment: Recordings are made in quiet settings, free of background noise and echo.
•Topic Diversity: The dataset encompasses a wide array of topics and conversational scenarios from the General domain. Topics include:
•Daily Conversations
•Topic-Specific Conversations
•General Information and Advice
•Idioms and Sayings
•Other Elements: To enhance realism and utility, the scripted prompts incorporate various elements commonly encountered in general interactions:
•Names: Region-specific names of males and females in various formats.
•Addresses: Region-specific addresses in different spoken formats.
•Dates & Times: Inclusion of date and time in various contexts.
•Organization Names: Names of different types of organizations.
•Numbers & Currencies: Various numbers and currencies in domain-specific interactions.
Each scripted prompt is crafted to reflect real-life scenarios encountered in the General domain, ensuring applicability in training robust natural language processing and speech recognition models.
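The recording format stated above (mono WAV, 16-bit depth, 8 kHz or 16 kHz) can be checked file by file before training. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav` and the shape of its return value are illustrative choices, not part of the dataset's tooling.

```python
import wave

# Expected values taken from the "Formats" item in this section.
EXPECTED_CHANNELS = 1               # mono
EXPECTED_SAMPLE_WIDTH_BYTES = 2     # 16-bit depth
EXPECTED_SAMPLE_RATES = {8000, 16000}


def check_wav(path):
    """Return (problems, duration_seconds) for one WAV file.

    `problems` is a list of human-readable mismatches against the
    stated format; an empty list means the header matches.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        sample_rate = wav.getframerate()
        frame_count = wav.getnframes()

    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if sample_rate not in EXPECTED_SAMPLE_RATES:
        problems.append(f"unexpected sample rate {sample_rate} Hz")

    duration = frame_count / sample_rate
    return problems, duration
```

The returned duration can also be compared against the 5 to 30 second range given above to flag unusually short or long recordings.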
Transcription Data
In addition to high-quality audio recordings, the dataset includes meticulously prepared text files with verbatim transcriptions of each audio file. These transcriptions are essential for training accurate and robust speech recognition models.
•Content: Each text file contains the exact scripted prompt corresponding to its audio file, ensuring consistency.
•Format: Transcriptions are provided in plain text (.TXT) format, with files named to match their associated audio files for easy reference.
•Quality: All transcriptions are verified for accuracy and consistency by native English transcribers.
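Because transcript files are named to match their audio files, the two can be paired by file name. The sketch below assumes that "matching" means an identical base name (for example `0001.wav` and `0001.txt`), that audio and transcripts sit in two directories, and that transcripts are UTF-8 encoded; none of these details is stated in this description, and the function name is hypothetical.

```python
from pathlib import Path


def pair_audio_with_transcripts(audio_dir, transcript_dir):
    """Pair each .wav file with the .txt file that shares its base name.

    Returns (pairs, missing), where `pairs` is a list of
    (audio_path, transcript_text) tuples and `missing` lists the audio
    file names for which no transcript was found.
    """
    # Index transcripts by base name (assumed matching rule).
    transcripts = {
        p.stem: p
        for p in Path(transcript_dir).iterdir()
        if p.suffix.lower() == ".txt"
    }

    pairs = []
    missing = []
    for audio_path in sorted(Path(audio_dir).iterdir()):
        if audio_path.suffix.lower() != ".wav":
            continue
        transcript_path = transcripts.get(audio_path.stem)
        if transcript_path is None:
            missing.append(audio_path.name)
        else:
            # UTF-8 is an assumption about the transcript encoding.
            text = transcript_path.read_text(encoding="utf-8").strip()
            pairs.append((audio_path, text))
    return pairs, missing
```

A non-empty `missing` list is a quick signal that the naming assumption does not hold for a given delivery and the pairing rule needs adjusting.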
Metadata
The dataset provides comprehensive metadata for each audio recording and participant:
•Participant Metadata: Unique identifier, age, gender, country, state, and dialect.
•Other Metadata: Recording transcript, recording environment, device details, sample rate, bit depth, and file format.
This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of English language speech recognition models.
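As one example of characterizing the data, the participant metadata can be used to confirm the speaker gender balance before splitting the data. The delivery format of the metadata is not described here, so the sketch below assumes rows are available as dictionaries (for instance from `csv.DictReader`) with hypothetical keys `speaker_id` and `gender`; the real field names may differ.

```python
import csv
import io
from collections import Counter


def gender_ratio(rows):
    """Return the share of unique speakers per gender.

    `rows` is an iterable of dicts with the hypothetical keys
    "speaker_id" and "gender". Each speaker is counted once, even if
    they appear on many recording rows.
    """
    gender_by_speaker = {}
    for row in rows:
        gender_by_speaker[row["speaker_id"]] = row["gender"].strip().lower()

    counts = Counter(gender_by_speaker.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gender: count / total for gender, count in counts.items()}


# Made-up rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """speaker_id,gender
spk_001,Male
spk_001,Male
spk_002,Female
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
sample_ratio = gender_ratio(sample_rows)
```

The same pattern extends to age, state, or dialect when checking that a train/test split preserves the overall speaker distribution.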
Usage and Applications
This dataset is a versatile resource for various applications within speech recognition, natural language processing, and AI-driven conversational technologies.
•Speech Recognition Model Training: High-quality audio recordings and precise transcriptions for training and fine-tuning English speech recognition models.
•Voice Synthesis: The diverse and high-quality audio data can train generative AI models for creating synthetic voices.
•Voice Assistants: Ideal for training & fine-tuning voice assistants.
•Entity Recognition: Sentences include names, dates, currencies, and other domain-specific entities for training NLP models for named entity recognition (NER) tasks.
•Language Understanding: Improve language understanding applications like sentiment analysis and topic modeling within the General domain.
Secure and Ethical Collection
•Our proprietary data collection and transcription platform, “Yugo,” was used throughout the dataset creation process.
•Data remained within our secure platform, ensuring data security and confidentiality.
•The data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.
•The dataset does not include any personally identifiable information about any participant, making it safe to use.
License
This US English Scripted Monologue Speech Dataset, created by FutureBeeAI, is available for commercial use.