English (US) Call Center Speech Dataset for Healthcare

The audio dataset comprises call center conversations for the Healthcare domain, featuring native English speakers from the US. It includes speech data, detailed metadata, and accurate transcriptions.

Total Volume

30 Speech Hours

Last updated

Jun 2024

Number of participants

60

About this Off-the-shelf Speech Dataset

Introduction

Welcome to the US English Call Center Speech Dataset for the Healthcare domain, designed to support the development of call center speech recognition models for the Healthcare industry. The dataset is curated to support speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

Speech Data

This training dataset comprises 30 hours of call center audio recordings covering various topics and scenarios related to the Healthcare domain. It is intended for building robust and accurate customer service speech technology.

•Participant Diversity:

•Speakers: 60 expert native US English speakers from the FutureBeeAI Community.

•Regions: Different states of the United States, ensuring a balanced representation of US accents, dialects, and demographics.

•Participant Profile: Participants range from 18 to 70 years old, with a male-to-female ratio of 60:40.

•Recording Details:

•Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.

•Call Duration: 5 to 15 minutes per call.

•Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.

•Environment: Without background noise and without echo.
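The stated audio format can be checked against a file's header before training. The following is a minimal sketch using Python's standard `wave` module; the function names `check_wav` and `make_silent_wav` are illustrative, and the in-memory silent file only stands in for a real recording from the dataset.

```python
import io
import wave

# Values taken from the "Formats" line above. Accepting both rates is our
# reading of "8 kHz and 16 kHz".
EXPECTED_CHANNELS = 2
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM
EXPECTED_SAMPLE_RATES = (8000, 16000)


def check_wav(source):
    """Return a list of mismatches between a WAV header and the stated format.

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channels={wav.getnchannels()}")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample_width_bytes={wav.getsampwidth()}")
        if wav.getframerate() not in EXPECTED_SAMPLE_RATES:
            problems.append(f"sample_rate={wav.getframerate()}")
    return problems


def make_silent_wav(channels, sample_width, rate, n_frames=80):
    """Build a tiny in-memory WAV of zero bytes (hypothetical stand-in file)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (channels * sample_width * n_frames))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(check_wav(make_silent_wav(2, 2, 8000)))   # header matches
    print(check_wav(make_silent_wav(1, 2, 44100)))  # mono, unexpected rate
```

An empty list means the header agrees with the stated format; it says nothing about the audio content itself, such as noise or echo.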

Topic Diversity

This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

•Inbound Calls:

•Appointment Scheduling

•New Patient Registration

•Surgery Consultation

•Consultation regarding Diet, and many more

•Outbound Calls:

•Appointment Reminder

•Health and Wellness Subscription Programs

•Lab Tests Results

•Health Risk Assessments

•Preventive Care Reminders, and many more

This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

Transcription

To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

•Speaker-wise Segmentation: Time-coded segments for both agents and customers.

•Non-Speech Labels: Tags and labels for non-speech elements.

•Word Error Rate: Less than 5%, thanks to a dual layer of QA.

These ready-to-use transcriptions accelerate the development of Healthcare call center conversational AI and ASR models for US English.
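The page does not describe how the word error rate figure was measured. As a point of reference, the conventional definition divides the word-level edit distance (substitutions, deletions, insertions) by the number of reference words. The sketch below implements that standard formula; `word_error_rate` is an illustrative name, and plain whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of five reference words.
    print(word_error_rate("okay and my next question", "okay my next question"))
```

Under this definition, a rate below 5% corresponds to fewer than one word-level error per twenty reference words.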

Metadata

The dataset provides comprehensive metadata for each conversation and participant:

•Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.

•Conversation Metadata: Domain, topic, call type, outcome/sentiment, bit depth, and sample rate.

This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of US English call center speech recognition models.
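As an illustration of how conversation metadata could be used to pick a training subset, here is a minimal sketch. The records, their key names (`call_type`, `outcome`, `sample_rate`, and so on), and the helper functions are all hypothetical: the page lists the metadata fields but not the file layout or exact spelling.

```python
from collections import Counter

# Hypothetical records. The keys mirror the conversation metadata fields named
# above, but the real key names and file layout are assumptions.
CONVERSATIONS = [
    {"id": "conv-001", "domain": "Healthcare", "topic": "Appointment Scheduling",
     "call_type": "inbound", "outcome": "positive", "bit_depth": 16, "sample_rate": 8000},
    {"id": "conv-002", "domain": "Healthcare", "topic": "Appointment Reminder",
     "call_type": "outbound", "outcome": "neutral", "bit_depth": 16, "sample_rate": 16000},
    {"id": "conv-003", "domain": "Healthcare", "topic": "Lab Tests Results",
     "call_type": "outbound", "outcome": "negative", "bit_depth": 16, "sample_rate": 8000},
]


def select(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def count_by(records, field):
    """Count records per distinct value of one metadata field."""
    return Counter(r.get(field) for r in records)


if __name__ == "__main__":
    print([r["id"] for r in select(CONVERSATIONS, call_type="outbound", sample_rate=8000)])
    print(count_by(CONVERSATIONS, "outcome"))
```

Counting by field before filtering is a quick way to see whether a subset (for example, negative-outcome outbound calls) is large enough to be useful.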

Usage and Applications

This dataset can be used for various applications in the fields of speech recognition, natural language processing, and conversational AI, specifically tailored to the Healthcare domain. Potential use cases include:

•Speech Recognition Models: Training and fine-tuning speech recognition models for US English.

•Speech Analytics Models: Building speech analytics models that extract insights and identify patterns in customer conversations, enabling data-driven decision-making and process optimization within the Healthcare sector.

•Smart Assistants and Chatbots: Developing conversational agents and virtual assistants for customer service in the Healthcare industry.

•Sentiment Analysis: Analyzing customer sentiment and improving customer experience based on call center interactions.

•Generative AI: Training generative AI models capable of generating human-like responses, summaries, or content tailored to the Healthcare domain.

Secure and Ethical Collection

•Our proprietary data collection and transcription platform, “Yugo”, was used throughout the creation of this dataset.

•Throughout the data collection process, the data remained within our secure platform and did not leave our environment, ensuring data security and confidentiality.

•The data collection process adhered to strict ethical guidelines, ensuring the privacy and consent of all participants.

•The dataset does not include any personally identifiable information about any participant, which makes it safe to use.

•The dataset does not contain any copyrighted content.

Updates and Customization

Because robust ASR models depend on diverse recording environments, this call center voice dataset is regularly updated with new audio data captured in various real-world conditions.

•Customization & Custom Collection Options:

•Environmental Conditions: Custom collection in specific environmental conditions upon request.

•Sample Rates: Customizable from 8 kHz to 48 kHz.

•Transcription Customization: Tailored to specific guidelines and requirements.

License

This Healthcare domain call center audio dataset was created by FutureBeeAI and is available for commercial use.

Use Cases

Call Center Conversational AI

ASR

Chatbot

Language Modelling

TTS

Speech Analytics

Dataset Sample(s)

ATTRIBUTES

Channel 1	Channel 2	Format
Male(29)	Female(24)	wav, json
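Because the recordings are stereo with one speaker listed per channel, a common preprocessing step is to separate the two channels into mono streams. The sketch below does this with the standard `wave` and `array` modules. The function names are illustrative, and the page does not say which channel carries the agent and which the customer, so the code only returns "channel 1" and "channel 2".

```python
import array
import io
import wave


def split_stereo(source):
    """Split a 16-bit stereo WAV into (rate, channel_1, channel_2) samples."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        rate = wav.getframerate()
        samples = array.array("h")  # signed 16-bit integers
        samples.frombytes(wav.readframes(wav.getnframes()))
    # Stereo PCM frames are interleaved: ch1, ch2, ch1, ch2, ...
    return rate, samples[0::2], samples[1::2]


def write_mono(target, rate, samples):
    """Write one channel's samples as a 16-bit mono WAV."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(samples.tobytes())


def make_stereo_wav(rate, channel_1, channel_2):
    """Build a small in-memory stereo WAV (hypothetical stand-in recording)."""
    interleaved = array.array("h")
    for left, right in zip(channel_1, channel_2):
        interleaved.extend((left, right))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(interleaved.tobytes())
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    rate, ch1, ch2 = split_stereo(make_stereo_wav(8000, [1, 2, 3], [-1, -2, -3]))
    print(rate, list(ch1), list(ch2))
```

Keeping the channels separate lets each speaker's audio be paired with that speaker's transcript segments.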

TRANSCRIPTION

LABEL	START	END	CHANNEL	TRANSCRIPT
Speech	4.774	5.799	Speaker 2	Hello Futurebee.
Speech	7.578	8.477	Speaker 1	Hello Futurebee.
Noise	7.703	7.961	-	-
Noise	9.519	9.878	-	-
Speech	12.669	14.336	Speaker 2	Hello is this <PII>Mr. Micheal</PII>?
Speech	15.579	17.335	Speaker 1	This is, whom I speaking to?
Speech	18.562	24.603	Speaker 2	Hi this is Kelly. I am with [filler], Dr Brigers office. [filler] I am calling because we
Noise	24.329	24.518	-	-
Speech	25.082	26.722	Speaker 2	needed you a little pre test
Speech	27.361	29.411	Speaker 2	screening before your appointment tomorrow.
Speech	30.103	31.312	Speaker 2	You have just ten minutes.
Speech	33.868	37.679	Speaker 1	[filler] yeah I think so. [filler] yeah, yeah I have got.
Noise	37.722	38.222	-	-
Speech	39.207	44.883	Speaker 2	Okay perfect. So we just do this phone call to make the check in process easier once you get here because
Speech	45.649	47.847	Speaker 2	ever since our office is reopened
Speech	48.283	50.347	Speaker 2	we had such a backlog of patients
Speech	51.024	53.265	Speaker 2	that when we reopened
Noise	52.816	54.182	-	-
Speech	54.133	60.191	Speaker 2	we are trying to take on more patients than usual. So by doing this process it helps us to get through
Speech	60.740	61.731	Speaker 2	the check in process
Speech	61.957	65.248	Speaker 2	more quickly when you actually come to the office. Okay?
Speech	68.328	75.453	Speaker 1	I will tell you what, that makes me real happy because one of the things I hate about going into doctors offices is I get there in time to my appointment.
Speech	75.894	81.087	Speaker 1	And then I get a quick half an hour to call out all the paper work. That's in a, that's in a eliminate this right?
Speech	83.287	86.912	Speaker 2	For the most part, [filler] thankfully you are already [filler]
Speech	87.552	90.194	Speaker 2	repeat patient that we won't have to call out
Speech	90.587	91.686	Speaker 2	any paper work
Speech	92.436	95.052	Speaker 2	like sometimes you have to do with your new patient.
Speech	95.953	100.677	Speaker 2	[filler], but I just want you to know that like I said we have been really
Speech	101.403	103.453	Speaker 2	swamped with new patients and
Speech	103.843	105.686	Speaker 2	a backlog of patients so
Speech	106.170	108.412	Speaker 2	please continue to be patient with us if you are
Speech	108.811	110.569	Speaker 2	Appointment is on perfectly on time.
Speech	114.045	116.670	Speaker 1	Okay. Alright [filler], thanks for giving me a heads up.
Speech	117.203	120.170	Speaker 2	Alright so I just have a few questions to ask you.
Speech	121.927	126.644	Speaker 2	[filler] and they are just some general questions about your health within the last few days.
Speech	127.412	131.703	Speaker 2	So just think back to your last few days and you can give me an answer. Are you ready?
Speech	135.287	135.961	Speaker 1	I am ready.
Speech	135.453	138.978	Speaker 2	Okay my first question is, have you had a new fever?
Speech	139.425	145.336	Speaker 2	of a hundred and four degrees or higher, Of one hundred point four degrees or higher? Yes or no?
Speech	148.151	156.961	Speaker 1	[filler], I have not had any meter to take my own temperature in the last forty eight hours. I have a sense of fever. So [filler], I don't have any measurements to tell you this. But no, I, I don't.
Noise	157.032	157.663	-	-
Speech	158.788	163.216	Speaker 2	Okay so that means that within the last couple of days there has been no symptoms that could be
Noise	159.274	160.066	-	-
Speech	163.757	165.532	Speaker 2	connected with a fever either right?
Speech	167.757	168.191	Speaker 1	Correct
Speech	169.223	172.032	Speaker 2	Okay that's great. Let me just type that into my computer.
Noise	172.626	172.723	-	-
Speech	174.830	176.247	Speaker 2	Okay and my next question.
Speech	176.782	181.449	Speaker 2	Have you had a new cough, you can now attribute to another health condition?
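Time-coded segments like those in the sample above make it straightforward to compute per-speaker statistics. The following sketch sums speech duration per speaker from tab-separated rows with the same column headers as the sample table. It is an illustration only: the dataset ships transcriptions as JSON, whose schema is not shown on this page, and `speech_seconds_by_speaker` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the sample table above, tab-separated.
SAMPLE = (
    "LABEL\tSTART\tEND\tCHANNEL\tTRANSCRIPT\n"
    "Speech\t4.774\t5.799\tSpeaker 2\tHello Futurebee.\n"
    "Speech\t7.578\t8.477\tSpeaker 1\tHello Futurebee.\n"
    "Noise\t7.703\t7.961\t-\t-\n"
    "Speech\t167.757\t168.191\tSpeaker 1\tCorrect\n"
)


def speech_seconds_by_speaker(table_text):
    """Sum END - START over "Speech" rows, grouped by the CHANNEL column."""
    totals = defaultdict(float)
    reader = csv.DictReader(
        io.StringIO(table_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if row["LABEL"] != "Speech":
            continue  # skip non-speech segments such as "Noise"
        totals[row["CHANNEL"]] += float(row["END"]) - float(row["START"])
    return dict(totals)


if __name__ == "__main__":
    for speaker, seconds in sorted(speech_seconds_by_speaker(SAMPLE).items()):
        print(f"{speaker}: {seconds:.3f} s")
```

The same loop can be adapted to measure overlap between speakers or to cut audio at segment boundaries once the actual JSON field names are known.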

Dataset Demographics

Language

English

Language code

en-US

Country

USA

Accents

Arizona,...more

Gender Distribution

M:60, F:40

Age Group

18-70

Audio File Details

Environment

Silent, Noisy

Bit Depth

16 bit

Format

wav

Sample rate

8 kHz & 16 kHz

Channel

Stereo

Audio file duration

5-15 minutes

Similar to Healthcare Call Center Speech Datasets

Indian Bengali Healthcare CC Speech Data

Healthcare call center audio data in Indian Bengali.

30 Speech Hours

60 People

Call Center Conversational AI

ASR

Marathi Healthcare CC Speech Data

Healthcare call center audio data in Marathi.

40 Speech Hours

80 People

Call Center Conversational AI

ASR

Bengali (Bangladesh) Healthcare CC Speech Data

Healthcare call center audio data in Bengali (Bangladesh)

30 Speech Hours

60 People

Call Center Conversational AI

ASR

Vietnamese Healthcare CC Speech Data

Healthcare call center audio data in Vietnamese

30 Speech Hours

60 People

Call Center Conversational AI

ASR

Start your AI/ML model creation journey with FutureBeeAI!

More in English (US)

Australian English Travel CC Speech Data

Marathi BFSI CC Speech Data

Portuguese (Brazil) Healthcare CC Speech Data

Malay Delivery & Lgc CC Speech Data
