Portuguese (Portugal) In-car Speech Dataset

The audio dataset comprises recordings of wake words and commands specific to in-car activities, featuring native Portuguese speakers from Portugal. It includes speech data, detailed metadata, and accurate transcriptions.

Total Volume: 5000+ prompts

Last updated: Aug 2024

Number of participants: 50+

About this Off-the-shelf Speech Dataset

Introduction

Welcome to the Portuguese Language In-car Speech Dataset, a comprehensive collection of audio recordings designed to facilitate the development of speech recognition models specifically tailored for in-car environments. This dataset aims to support research and innovation in automotive speech technology, enabling seamless and robust voice interactions within vehicles for drivers and co-passengers.

Speech Data

This dataset comprises over 5,000 high-quality audio recordings collected from various in-car environments. These recordings include scripted wake words and command-type prompts.

•Participant Diversity:

  •Speakers: 50+ native Portuguese speakers from the FutureBeeAI Community.

  •Regions: Balanced representation of Portuguese accents, dialects, and demographics.

  •Participant Profile: Participants range from 18 to 70 years old, with a male-to-female ratio of 60:40.

•Recording Nature: Scripted wake-word and command recordings.

•Duration: Each recording typically lasts 5 to 20 seconds.

•Formats: WAV, mono, 16-bit. Recordings are provided at sample rates of 16 kHz and 48 kHz.
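As a rough illustration of the format described above, the following Python sketch checks whether a WAV file is mono, 16-bit, and sampled at 16 kHz or 48 kHz. It uses only the standard-library `wave` module. The function name `check_wav` and any file paths are assumptions made for this example, not part of the dataset or its tooling.

```python
import wave

# Format values taken from the dataset description above.
EXPECTED_CHANNELS = 1                    # mono
EXPECTED_SAMPLE_WIDTH = 2                # bytes per sample, i.e. 16-bit
EXPECTED_SAMPLE_RATES = {16000, 48000}   # Hz


def check_wav(path):
    """Return a list of format problems for the WAV file at `path`.

    An empty list means the file matches the described format.
    `check_wav` is a hypothetical helper name used only in this sketch.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"expected 16-bit, found {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() not in EXPECTED_SAMPLE_RATES:
            problems.append(f"unexpected sample rate {wav.getframerate()} Hz")
    return problems
```

Running such a check over every file before training can catch recordings that were converted or resampled somewhere along the way.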

Dataset Diversity

Beyond participant diversity, the dataset varies in wake words, voice commands, and recording environments.

•Different Automobile-Related Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge.

•Different Cars: Data was collected in different types and models of cars.

•Different Types of Voice Commands:

  •Navigational Voice Commands

  •Mobile Control Voice Commands

  •Car Control Voice Commands

  •Multimedia & Entertainment Commands

  •General, Question Answer, Search Commands

•Recording Time: Participants recorded the given prompts at various times of day to make the dataset more diverse:

  •Morning

  •Afternoon

  •Evening

•Recording Environment: Recordings were captured in various environments to obtain more realistic data and to cover various types of noise. Some of the environment variables are as follows:

  •Noise Level: Silent, Low Noise, Moderate Noise, High Noise

  •Parking Location: Indoor, Outdoor

  •Car Windows: Open, Closed

  •Car AC: On, Off

  •Car Engine: On, Off

  •Car Movement: Stationary, Moving
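The environment variables listed above can be treated as a grid of recording conditions. The Python sketch below enumerates every combination and reports which ones are absent from a set of observed records, which may help when checking coverage before training. The dictionary keys (`noise_level`, `car_ac`, and so on) and the function names are assumptions for illustration; the source does not state how these fields are named, nor that every combination is present in the dataset.

```python
from itertools import product

# Variables and values as listed above; the key names are hypothetical.
ENVIRONMENT = {
    "noise_level": ["Silent", "Low Noise", "Moderate Noise", "High Noise"],
    "parking_location": ["Indoor", "Outdoor"],
    "car_windows": ["Open", "Closed"],
    "car_ac": ["On", "Off"],
    "car_engine": ["On", "Off"],
    "car_movement": ["Stationary", "Moving"],
}


def all_conditions(env=ENVIRONMENT):
    """Return every combination of environment values as a list of dicts."""
    keys = list(env)
    return [dict(zip(keys, values)) for values in product(*(env[k] for k in keys))]


def missing_conditions(observed, env=ENVIRONMENT):
    """Return the combinations that never appear in `observed`.

    `observed` is an iterable of dicts holding at least the keys in `env`.
    """
    keys = list(env)
    seen = {tuple(record[k] for k in keys) for record in observed}
    return [c for c in all_conditions(env) if tuple(c[k] for k in keys) not in seen]
```

With the values listed, the full grid has 4 × 2 × 2 × 2 × 2 × 2 = 128 combinations. Some of them (for example, a moving car with the engine off) may be rare or absent in practice, so a non-empty result from `missing_conditions` is not necessarily a defect.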

Metadata

The dataset provides comprehensive metadata for each audio recording and participant:

•Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.

•Other Metadata: Recording transcript, recording environment, device details, sample rate, bit depth, file format, recording time.

This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Portuguese voice assistant speech recognition models.
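To make the role of the metadata concrete, here is a minimal Python sketch that selects recordings by metadata fields, for example to build a quiet 48 kHz subset. The `Recording` class, its field names, and the three sample records are invented placeholders for this example; the actual metadata schema and file layout of the dataset are not described in this page and may differ.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    # Hypothetical subset of the metadata fields listed above.
    participant_id: str
    age: int
    gender: str
    accent: str
    transcript: str
    noise_level: str
    sample_rate: int


def select(recordings, **criteria):
    """Keep recordings whose attributes equal every given criterion."""
    return [
        r for r in recordings
        if all(getattr(r, key) == value for key, value in criteria.items())
    ]


# Made-up records for demonstration only; they are not taken from the dataset.
SAMPLE = [
    Recording("P001", 34, "female", "central", "Hey Volvo", "Silent", 16000),
    Recording("P002", 52, "male", "southern", "Ok Ford", "High Noise", 48000),
    Recording("P003", 27, "male", "central", "Hey Audi", "Silent", 48000),
]

quiet_48k = select(SAMPLE, noise_level="Silent", sample_rate=48000)
```

Here `quiet_48k` contains only the record for `P003`. The same pattern extends to any other metadata field, such as accent or gender, when balancing training and evaluation splits.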

Usage and Applications

This In-car Speech Dataset is a valuable resource for various applications in the field of in-car voice recognition and AI-driven voice technology. This dataset can be leveraged to enhance the performance and functionality of voice-activated systems across different domains.

•Speech Recognition Model Training: Provides high-quality audio data for training models to accurately recognize and respond to in-car voice commands.

•Safety and Emergency Response: Supports the development of systems that recognize and respond to emergency commands and safety alerts.

•Driver Assistance: Facilitates the creation of advanced driver-assistance systems (ADAS) that leverage voice commands for hands-free operation.

Secure and Ethical Collection

•Our proprietary data collection platform, “Yugo,” was used throughout the creation of this dataset.

•Throughout the data collection process, the data remained within our secure platform and did not leave our environment, ensuring data security and confidentiality.

•The data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.

•The dataset does not include any personally identifiable information about any participant, which makes it safe to use.

Updates and Customization

Because diverse environments are important for robust voice assistant models, our in-car voice dataset is regularly updated with new audio data captured in various real-world conditions.

•Customization & Custom Collection Options:

  •Environmental Conditions: Custom collection in specific environmental conditions upon request.

  •Sample Rates: Customizable from 8 kHz to 48 kHz.

  •Diverse Pace: Recordings can be collected at varied paces upon request.

  •Device Specific: Recordings can be made with a specific mobile brand or operating system.

License

This Portuguese in-car audio dataset was created by FutureBeeAI and is available for commercial use.

Use Cases

In-car ASR

Driver Assistance

Conversational AI

Dataset Sample(s)

Samples will be available soon!

Dataset Demographics

Language: Portuguese

Language code: (not listed)

Country: Portugal

Accents: Southern and central, among others

Gender Distribution: M:60, F:40

Age Group: 18-70

Audio File Details

Environment: Silent & Noisy

Bit Depth: 16 bit

Sample rate: 16 kHz & 48 kHz

Channel: Mono

Audio file duration: 5 to 20 seconds

Similar In-car Speech Datasets

Each of the following contains automobile-specific wake words and commands recorded in the in-car environment, with 5000+ recordings from 50+ people, for in-car ASR and driver assistance:

•Gujarati In-car Speech Dataset

•Hindi In-car Speech Dataset

•Filipino In-car Speech Dataset

•New English In-car Speech Dataset
