
Danish (Denmark) Wake Words & Voice Command Speech Dataset

This dataset comprises audio recordings of wake words and voice commands spoken by native Danish speakers from Denmark. Each recording comes with detailed metadata and a transcription.

Total Volume

20,000+ recordings

Last updated

July 2024

Number of participants

50+


Wake words & Command dataset for training & fine-tuning of voice assistants in Danish (Denmark)


About this Off-the-shelf Speech Dataset

Introduction

Welcome to the Danish Wake Word & Command Dataset, designed to support the development of accurate voice-activated systems. It contains an extensive collection of wake words and commands used to trigger and interact with voice assistants and other voice-activated devices, so that systems trained on it can respond promptly and accurately to user input.

Speech Data

This training dataset comprises over 20,000 audio recordings of wake words and command phrases for building robust, accurate voice assistant speech technology. Each participant contributed 400 recordings, made in diverse environments and at varying speaking speeds. The recordings include standalone wake words as well as wake words followed by commands.

•Participant Diversity:

•Speakers: 50 native Danish speakers from the FutureBeeAI Community.

•Regions: Various regions of Denmark, ensuring a balanced representation of accents, dialects, and demographics.

•Profile: Participants range from 18 to 70 years old, with a gender ratio of 60% male and 40% female.

•Recording Details:

•Nature: Scripted audio recordings of wake words and command phrases.

•Duration: 1 to 15 seconds per recording.

•Formats: WAV format with stereo channels, 16-bit depth, and sample rates from 16 to 48 kHz.
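The recording details above can be checked programmatically when files are received. The following is a minimal sketch using only Python's standard `wave` module; the function names `check_recording` and `make_silent_wav` are hypothetical helpers written for this illustration and are not part of any tooling shipped with the dataset. It checks bit depth, sample rate, and duration against the figures quoted above, and leaves the channel count unchecked.

```python
import io
import wave


def check_recording(source):
    """Return a list of spec violations for one WAV file (empty if none).

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        sample_rate = wav.getframerate()
        bit_depth = wav.getsampwidth() * 8
        duration = wav.getnframes() / sample_rate

    problems = []
    if bit_depth != 16:
        problems.append(f"bit depth is {bit_depth}, expected 16")
    if not 16_000 <= sample_rate <= 48_000:
        problems.append(f"sample rate is {sample_rate} Hz, expected 16000-48000")
    if not 1.0 <= duration <= 15.0:
        problems.append(f"duration is {duration:.2f} s, expected 1-15")
    return problems


def make_silent_wav(seconds, sample_rate=16_000, sample_width=2):
    """Build an in-memory mono WAV of silence, used here only as test input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * sample_width * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # A 2-second, 16-bit, 16 kHz clip should pass; a 0.5-second clip should not.
    print(check_recording(make_silent_wav(2.0)))
    print(check_recording(make_silent_wav(0.5)))
```

In practice `check_recording` would be called with the path of each delivered file; the synthetic clips here only stand in for real recordings.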

Dataset Diversity

This dataset includes recordings of various types of wake words and commands, in different environments and at different speeds, making it highly diverse.

•Different Types of Wake Words:

•Automobile Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge, etc.

•Voice Assistant Wake Words: Hey Siri, Ok Google, Alexa, Hey Cortana, Hi Bixby, Hey Celia, Hey Google, etc.

•Home Appliance Wake Words: Hi LG, Ok LG, Hello Lloyd, etc.

•Different Types of Voice Commands: Depending on the application and use case, the dataset contains various types of commands, such as:

•Automobile: Playing music, checking directions, integrating with at-home devices, booking appointments, voice search, voice ordering, providing feedback, and more.

•Voice Assistant: Asking general questions (definitions, translations, explanations), asking for trivia or fun facts, playing music, making a call, controlling at-home devices, checking directions, nearby places, and traffic conditions, shopping, calendar, reminders and to-do lists, and many more.

•Home Appliances: Controlling appliances, checking appliance status, setting up reminders or alarms, to-do lists and shopping lists, and many more.

•Different Recording Environments:

•Without any background noise or echo

•Background traffic noise

•Background people talking

•Different Recording Paces:

•Normal speaking speed

•Fast speaking speed

This extensive coverage ensures the dataset includes realistic scenarios, which is essential for developing effective voice assistant speech recognition models.

Metadata

The dataset provides comprehensive metadata for each audio recording and participant:

•Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.

•Other Metadata: Recording transcript, recording environment, recording pace, device details, sample rate, bit depth, file format, etc.

This metadata helps users understand and characterize the data, supporting informed decisions in the development of Danish voice assistant speech recognition models.
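As one illustration of how per-recording metadata like this might be used, the sketch below filters and counts recordings by environment and pace. The listing does not say how the metadata is delivered, so the CSV layout, the column names, and every row value here are assumptions made up for the example; only the field categories (participant, environment, pace, sample rate, transcript) come from the description above.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata rows: the column names and values are placeholders,
# not an excerpt from the actual dataset.
SAMPLE_METADATA = """\
recording_id,participant_id,age,gender,environment,pace,sample_rate,transcript
r001,p01,34,female,silent,normal,16000,Hey Siri
r002,p01,34,female,traffic,fast,16000,Hey Google
r003,p02,52,male,silent,fast,48000,Hey Volvo
r004,p02,52,male,people_talking,normal,48000,Alexa
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def select(rows, **criteria):
    """Keep rows whose fields equal every given criterion (string comparison)."""
    return [
        row for row in rows
        if all(row[field] == value for field, value in criteria.items())
    ]


def count_by(rows, field):
    """Count how many rows fall under each distinct value of `field`."""
    return dict(Counter(row[field] for row in rows))


if __name__ == "__main__":
    rows = load_rows(SAMPLE_METADATA)
    quiet = select(rows, environment="silent")
    print([row["recording_id"] for row in quiet])
    print(count_by(rows, "pace"))
```

The same pattern extends to balancing a training set across speakers or holding out specific participants for evaluation.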

Usage and Applications

The Wake Word Dataset is a valuable resource for various applications in the field of voice recognition and AI-driven voice technology. This dataset can be leveraged to enhance the performance and functionality of voice-activated systems across different domains.

•Voice Assistant Activation: Training and fine-tuning models to accurately detect and respond to wake words, ensuring seamless activation of voice assistants.

•Smart Home Devices: Developing robust voice activation features for smart home devices.

•Automotive Voice Control: Implementing precise voice activation in automotive systems for navigation, entertainment, and more.

•Wearable Technology: Enhancing user experience with accurate wake word detection in wearable devices, facilitating hands-free operation and interaction.

•Consumer Electronics: Improving the voice control capabilities of a wide range of consumer electronics, from smart TVs to IoT devices.

•Generative AI: Integrating wake word detection in generative AI models to initiate context-aware conversations and interactions.

Secure and Ethical Collection

•The proprietary data collection platform 'Yugo' was used throughout the dataset creation process.

•Data remained within a secure environment, ensuring confidentiality and security.

•The data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.

•No personally identifiable information is included, making the dataset safe to use.

Updates and Customization

Understanding the importance of diverse environments for robust voice assistant models, our wake word and voice command dataset is regularly updated with new audio data captured in various real-world conditions.

•Customization & Custom Collection Options:

•Environmental Conditions: Custom collection in specific environmental conditions upon request.

•Sample Rates: Customizable from 8 kHz to 48 kHz.

•Diverse Pace: Custom collection at different speaking paces upon request.

•Device Specific: Recordings can be made with a specific mobile brand or operating system.

•Custom Wake Words & Commands: Custom wake words or voice commands can be recorded with our community.

License

This Wake Word & Command Dataset is created by FutureBeeAI and is available for commercial use.

Use Cases

Wake Word Detection

Command Recognition

Voice Assistant

Dataset Sample(s)

Samples will be available soon!

Dataset Demographics

Language

Danish

Language code

Country

Denmark

Accents

Insular Danish and others

Gender Distribution

Male: 60%, Female: 40%

Age Group

18-70

Audio File Details

Environment

Silent, Noisy

Bit Depth

16 bit

Sample rate

16 kHz & 48 kHz

Channel

Monologue

Audio file duration

1 to 15 seconds


Similar to Wake Words & Voice Command Datasets

Wake words & Command dataset for training & fine-tuning of voice assistants in English (Philippines)

Philippines English Wake Words & Commands Data

Wake words and commands audio recordings in Philippines English

20000+ Recordings

50+ people

Wake Word Detection

Command Recognition

Wake words & Command dataset for training & fine-tuning of voice assistants in Telugu (India)

Telugu Wake Words & Commands Dataset

Wake words and commands audio recordings in Telugu

20000+ Recordings

50+ people

Wake Word Detection

Command Recognition

Wake words & Command dataset for training & fine-tuning of voice assistants in Malay (Malaysia)

Malay Wake Words & Commands Dataset

Wake words and commands audio recordings in Malay

20000+ Recordings

50+ people

Wake Word Detection

Command Recognition

Wake words & Command dataset for training & fine-tuning of voice assistants in Czech (Czech Republic)

Czech Wake Words & Commands Dataset

Wake words and commands audio recordings in Czech

20000+ Recordings

50+ people

Wake Word Detection

Command Recognition

