Indian English Raw Speech Corpus - Kannada Variant


Owner Central Institute of Indian Languages

Catalogue Number: 1279


Tags: Raw Speech Corpus


Dataset Description

English developed from Anglo-Saxon, which was the prominent language of Britain in the Middle Ages, and was carried to every corner of the world by colonists. It is the most visible legacy of the British in India: India was under the British Raj for almost two centuries, and English is part of the education system here. Most Indian states use their own regional languages and share no common language, so English is used for inter-state communication.

LDC-IL has 23 hours of Indian English - Kannada Variant speech data. The LDC-IL Indian English speech data set consists of several types of material: word lists, sentences, texts and date formats. Approximately 15 minutes of speech per speaker was recorded from 29 female and 27 male speakers of different age groups whose mother tongue is Kannada. Each speaker recorded items randomly selected from a master dataset.

The available Speech Corpus details:

Total Speakers 56 (29 Female and 27 Male)

Domain	Audio Segments	Duration (h:mm:ss)
Contemporary Text (News)	52	7:19:31
Creative Text	58	3:57:15
Sentence	1522	1:54:10
Date Format	106	0:04:32
Command and Control Words	2543	1:55:43
Person Name	2040	0:39:43
Place Name	762	2:38:49
Most Frequent Word - Part	1563	1:09:10
Most Frequent Word - Full Set	3999	2:49:55
Phonetically Balanced	1194	0:49:21
Form and Function - Word	616	0:24:55
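
As a quick consistency check, the per-domain figures in the table above can be summed and compared with the totals listed under Item specifics (Duration 23:43:04, 14455 audio segments). The following is a minimal sketch in Python, not part of the corpus distribution; the names ROWS, parse_duration and format_duration are hypothetical helpers introduced only for illustration, and the values are copied by hand from the table.

```python
from datetime import timedelta

# (domain, audio segments, duration as h:mm:ss), copied from the table above.
ROWS = [
    ("Contemporary Text (News)", 52, "7:19:31"),
    ("Creative Text", 58, "3:57:15"),
    ("Sentence", 1522, "1:54:10"),
    ("Date Format", 106, "0:04:32"),
    ("Command and Control Words", 2543, "1:55:43"),
    ("Person Name", 2040, "0:39:43"),
    ("Place Name", 762, "2:38:49"),
    ("Most Frequent Word - Part", 1563, "1:09:10"),
    ("Most Frequent Word - Full Set", 3999, "2:49:55"),
    ("Phonetically Balanced", 1194, "0:49:21"),
    ("Form and Function - Word", 616, "0:24:55"),
]


def parse_duration(text: str) -> timedelta:
    """Convert an 'h:mm:ss' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_duration(delta: timedelta) -> str:
    """Render a timedelta as 'h:mm:ss' without rolling hours into days."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in ROWS)
total_duration = sum((parse_duration(d) for _, _, d in ROWS), timedelta())

print(total_segments)                   # 14455
print(format_duration(total_duration))  # 23:43:04
```

Both sums agree with the catalogue totals, so the table rows account for the full corpus.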

A detailed explanation of the Indian English Raw Speech Corpus - Kannada Variant will be available in the Indian English Raw Speech Corpus - Kannada Variant Documentation.

For research citations, please cite the following:

Ramamoorthy L., Narayan Kumar Choudhary, Bharatha Raju A., Rejitha KS, Rajesha N., Manasa G., 2021. Indian English Raw Speech Corpus - Kannada Variant. Central Institute of Indian Languages, Mysore.

Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors Ramamoorthy L., Narayan Kumar Choudhary, Bharatha Raju A., Rejitha K.S., Rajesha N., Manasa G.
Corpus Type Raw Corpus
Catalogue Number 1279
ISBN 978-81-948885-9-8
Data Source On Field
Duration 23:43:04
# of Audio Segments 14455
Release Date 15-Jun-2021
Terms and Conditions General instructions for use of the resources provided by LDC-IL.
