Dataset Description
23:43:04 Hours | 15.3 GB | 56 Speakers | 14,455 Audio Segments | 48 kHz | 16-bit WAV
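As a rough sanity check on the headline figures, the size of uncompressed PCM audio can be estimated from duration, sample rate and bit depth. The sketch below is only an estimate: the listing does not state the channel count, so both mono and stereo are tried as assumptions, WAV header overhead is ignored, and the helper names `hms_to_seconds` and `pcm_bytes` are illustrative.

```python
# Rough size estimate for uncompressed PCM audio (WAV headers ignored).

def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def pcm_bytes(seconds: int, sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bytes needed for raw PCM samples of the given duration."""
    return seconds * sample_rate_hz * (bits_per_sample // 8) * channels

total_seconds = hms_to_seconds("23:43:04")  # duration from the listing

# The channel count is an assumption; the listing does not state it.
for channels in (1, 2):
    size = pcm_bytes(total_seconds, 48_000, 16, channels)
    print(f"{channels} channel(s): {size / 1e9:.2f} GB, {size / 2**30:.2f} GiB")
```

Under these assumptions, the two-channel estimate comes out close to the listed 15.3 GB if "GB" is read as binary gigabytes (GiB). That suggests, but does not confirm, stereo recordings.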
English grew out of Anglo-Saxon, the prominent language of Britain in the
Middle Ages, and colonists carried it to every corner of the world. It is the
most visible legacy of British rule in India: India was under the British Raj
for almost two centuries, and English remains part of the education system
here. Most Indian states use their own regional languages and share no common
language, so English is used for inter-state communication.
LDC-IL has 23 hours of Indian English - Kannada Variant speech data. The
LDC-IL Indian English speech dataset consists of several types of data: word
lists, sentences, texts and date formats. Approximately 15 minutes of speech
per speaker was collected from 56 speakers (29 female and 27 male) of
different age groups whose mother tongue is Kannada. Each speaker recorded
items randomly selected from a master dataset.
The available Speech Corpus details:
Total Speakers 56 (29 Female and 27 Male)
Domains | Audio Segments | Duration per Domain
Contemporary Text (News) | 52 | 7:19:31
Creative Text | 58 | 3:57:15
Sentence | 1522 | 1:54:10
Date Format | 106 | 0:04:32
Command and Control Words | 2543 | 1:55:43
Person Name | 2040 | 0:39:43
Place Name | 762 | 2:38:49
Most Frequent Word - Part | 1563 | 1:09:10
Most Frequent Word - Full Set | 3999 | 2:49:55
Phonetically Balanced | 1194 | 0:49:21
Form and Function - Word | 616 | 0:24:55
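The per-domain figures can be cross-checked against the headline totals (14,455 audio segments, 23:43:04). The following is a minimal sketch that hard-codes the values from the table; the names `ROWS`, `parse_duration` and `format_hms` are illustrative and not part of any LDC-IL tooling.

```python
from datetime import timedelta

# (domain, audio segments, duration) copied from the table in this listing.
ROWS = [
    ("Contemporary Text (News)", 52, "7:19:31"),
    ("Creative Text", 58, "3:57:15"),
    ("Sentence", 1522, "1:54:10"),
    ("Date Format", 106, "0:04:32"),
    ("Command and Control Words", 2543, "1:55:43"),
    ("Person Name", 2040, "0:39:43"),
    ("Place Name", 762, "2:38:49"),
    ("Most Frequent Word - Part", 1563, "1:09:10"),
    ("Most Frequent Word - Full Set", 3999, "2:49:55"),
    ("Phonetically Balanced", 1194, "0:49:21"),
    ("Form and Function - Word", 616, "0:24:55"),
]

def parse_duration(hms: str) -> timedelta:
    """Parse an 'H:MM:SS' string into a timedelta."""
    hours, minutes, seconds = map(int, hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def format_hms(duration: timedelta) -> str:
    """Format a timedelta as 'H:MM:SS' without rolling hours over into days."""
    total = int(duration.total_seconds())
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"

total_segments = sum(count for _, count, _ in ROWS)
total_duration = sum((parse_duration(d) for _, _, d in ROWS), timedelta())

print(total_segments)              # 14455
print(format_hms(total_duration))  # 23:43:04
```

Both sums agree with the totals stated in the listing.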
A detailed explanation of the Indian English Raw Speech Corpus - Kannada Variant will be available in the Indian English Raw Speech Corpus - Kannada Variant Documentation.
For research citations, please cite the following:
- Ramamoorthy L., Narayan Kumar Choudhary, Bharatha Raju A., Rejitha K.S., Rajesha N., Manasa G. 2021. Indian English Raw Speech Corpus - Kannada Variant. Central Institute of Indian Languages, Mysore.
- Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Item specifics
- Authors: Ramamoorthy L., Narayan Kumar Choudhary, Bharatha Raju A., Rejitha K.S., Rajesha N., Manasa G.
- Corpus Type: Raw Corpus
- Catalogue Number: 1279
- ISBN: 978-81-948885-9-8
- Data Source: On Field
- Duration: 23:43:04
- # of Audio Segments: 14,455
- Release Date: 15-Jun-2021
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.