Dataset Description
179:32:52 hours | 115 GB | 656 Speakers | 99,109 Audio segments | 48 kHz | 16-bit WAV
Kannada is one of the ancient Indian languages and belongs to the Dravidian family. It has its own
script. The language spoken in a region is influenced by other languages of the
region, the mother tongue of the speaker, and similar factors. Reading speed, loudness,
frequency, and other speech characteristics also vary with factors such as age and gender.
The Linguistic Data Consortium for Indian Languages (LDC-IL) identified four regional dialects and collected the
speech corpus through fieldwork. This read-speech data was collected from native speakers
across various age groups, with equal numbers of male and female speakers. The data includes
texts, sentences, date formats, and different wordlists.
Total Speakers - 656 (328 Female and 328 Male)
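The headline figures (179:32:52 of audio, 48 kHz, 16 bit) allow a rough estimate of the uncompressed size of the corpus. The sketch below is only a consistency check: the channel count is not stated on this page, so both one and two channels are tried as assumptions, and WAV header overhead is ignored. The names `hms_to_seconds` and `pcm_bytes` are illustrative, not part of any LDC-IL tooling.

```python
SAMPLE_RATE_HZ = 48_000   # from the dataset summary
BYTES_PER_SAMPLE = 2      # 16-bit samples


def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pcm_bytes(duration_s: int, channels: int) -> int:
    """Size of raw PCM audio in bytes, ignoring file headers."""
    return duration_s * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels


total_seconds = hms_to_seconds("179:32:52")

# The channel count is an assumption; the page does not state it.
for channels in (1, 2):
    size = pcm_bytes(total_seconds, channels)
    print(f"{channels} channel(s): {size / 1e9:.1f} GB, {size / 2**30:.1f} GiB")
```

Under the two-channel assumption the estimate is about 115.6 GiB, which is in the same range as the listed 115 GB. The one-channel estimate is about half of that. This does not establish how the audio was actually recorded.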
Domain | Audio Segments | Duration of Each Domain
Contemporary Text (News) | 600 | 66:06:09
Creative Text | 600 | 33:09:20
Sentence | 14,887 | 13:58:15
Date Format | 1,200 | 1:16:22
Command and Control Words | 17,988 | 12:31:43
Person Name | 12,009 | 13:04:49
Place Name | 6,032 | 4:48:42
Most Frequent Word - Part | 18,065 | 12:21:24
Most Frequent Word - Full Set | 8,000 | 02:08:58
Phonetically Balanced | 9,360 | 02:40:58
Form and Function - Word | 10,368 | 03:14:38
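The per-domain figures can be totalled with a short script. This is a minimal sketch that copies the table values as printed on this page. The `DOMAINS` list and the helper names are illustrative.

```python
# (domain, audio segments, duration as H:MM:SS), copied from the table above
DOMAINS = [
    ("Contemporary Text (News)", 600, "66:06:09"),
    ("Creative Text", 600, "33:09:20"),
    ("Sentence", 14_887, "13:58:15"),
    ("Date Format", 1_200, "1:16:22"),
    ("Command and Control Words", 17_988, "12:31:43"),
    ("Person Name", 12_009, "13:04:49"),
    ("Place Name", 6_032, "4:48:42"),
    ("Most Frequent Word - Part", 18_065, "12:21:24"),
    ("Most Frequent Word - Full Set", 8_000, "02:08:58"),
    ("Phonetically Balanced", 9_360, "02:40:58"),
    ("Form and Function - Word", 10_368, "03:14:38"),
]


def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    """Format a number of seconds as 'H:MM:SS'."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_seconds = sum(hms_to_seconds(duration) for _, _, duration in DOMAINS)

print(total_segments)                 # 99109
print(seconds_to_hms(total_seconds))  # 165:21:18
```

The segment counts add up to the 99,109 segments listed for the corpus. The per-domain durations as printed here sum to 165:21:18, which is lower than the listed total of 179:32:52. The page does not explain the difference.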
A detailed explanation of the Kannada Speech Corpus will be available in the Kannada Speech Data Documentation.
For research citations, please use the following:
- Ramamoorthy, L., Narayan Choudhary, Vijayalaxmi F. Patil, Chetan Suryakant Baji, Malini N. Abhyankar, Rajesha N. & Manasa G. 2019. Kannada Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
- Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Item specifics
- Authors: Ramamoorthy L., Narayan Choudhary, Vijayalaxmi F. Patil, Chetan Suryakant Baji, Rajesha N., Manasa G., Sunitha Rajendra, Reshma S., Kavitha L., Malini N. Abhyankar
- Corpus Type: Raw Corpus
- Catalogue Number: 1129
- ISBN: 978-81-7343-228-6
- Data Source: On Field
- Duration: 179:32:52
- Number of Audio Segments: 99109
- Release Date: 04-Apr-2019
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.