Tamil General Conversational Text Dataset
This text dataset consists of general conversations in Tamil between two people.
Category: Text-based Conversational AI
Total volume: 10K+ chats
Last Updated: Aug 2022
Number of participants: 150 people
About This OTS Dataset
What’s Included
This training dataset comprises more than 10,000 text conversations between pairs of native Tamil speakers in the general domain. The chats cover a wide range of everyday topics, services, and issues, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet, and movies, which makes the dataset diverse.
These chats contain language-specific words and phrases and follow the native way of talking, which makes them more information-rich for your NLP model. Besides being specific to its topic, each chat contains attributes such as people's names, addresses, contact information, email addresses, times, dates, local currency, telephone numbers, and local slang, written in a variety of formats to make the text data unbiased.
Each chat script contains between 300 and 700 words and up to 50 turns. The dataset was contributed by 150 people who are part of the FutureBeeAI crowd community. Along with the chats, you will also receive chat metadata, such as participant age, gender, and country. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.
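The stated limits (300 to 700 words, up to 50 turns) can be checked when a chat file is loaded. The following is a minimal Python sketch, not a FutureBeeAI tool. The layout of the txt files is not documented here, so the sketch assumes one turn per non-empty line in a hypothetical `Speaker: utterance` form; `ChatStats` and `chat_stats` are illustrative names.

```python
from dataclasses import dataclass

# Limits taken from the dataset description above.
MIN_WORDS, MAX_WORDS, MAX_TURNS = 300, 700, 50


@dataclass
class ChatStats:
    turns: int
    words: int

    @property
    def within_spec(self) -> bool:
        """True if the chat matches the advertised word and turn limits."""
        return self.turns <= MAX_TURNS and MIN_WORDS <= self.words <= MAX_WORDS


def chat_stats(text: str) -> ChatStats:
    """Count turns and words in one chat.

    Assumption: each non-empty line is one turn written as
    'Speaker: utterance'. The real file layout may differ.
    """
    turns = 0
    words = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the assumed speaker label before counting words.
        _, sep, utterance = line.partition(":")
        if not sep:
            utterance = line  # no label found, so count the whole line
        turns += 1
        # Whitespace splitting is a rough word count for Tamil text.
        words += len(utterance.split())
    return ChatStats(turns, words)
```

A call such as `chat_stats(open(path, encoding="utf-8").read()).within_spec` would flag files that fall outside the advertised range. Note that a line without a speaker label but with a colon inside it (for example a time) would be miscounted, so the parsing step should be adapted to the actual file layout.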
This dataset is continually expanded with new chats. We can also produce text data in a variety of languages to meet your specific requirements. Check out the FutureBeeAI community for a custom collection.
The licence for this training dataset belongs to FutureBeeAI.
Use Cases
Chatbot
Text analytics
Text recognition
Text prediction
Smart assistants
Dataset Sample(s)
Samples will be available soon!
Contact us to receive samples for this dataset immediately.
Contact Us
Dataset Details
Dataset type: General domain chats
Volume: 10K+ chats
Media type: Text
Language: Tamil
Topics: 50+
File Details
Number of threads: 50
Word count: 300-700 words
Format: txt, docx
Annotation: NA