Kannada Raw Speech Corpus


Owner: Central Institute of Indian Languages

Catalogue Number: 1129

Stock: In Stock



Dataset Description

179:32:52 hours (115 GB) | 656 speakers | 99,109 audio segments | 48 kHz | 16-bit WAV
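After downloading, a user may want to confirm that a file matches the stated audio format. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` and any file path passed to it are hypothetical, since this page does not describe the corpus file layout or the number of channels.

```python
import wave

EXPECTED_RATE_HZ = 48_000      # "48 kHz" from the dataset description
EXPECTED_SAMPLE_WIDTH = 2      # "16 bit" means 2 bytes per sample


def check_wav_format(path):
    """Read a WAV header and compare it with the stated corpus format.

    Returns (matches, sample_rate_hz, sample_width_bytes, channels, seconds).
    The channel count is reported but not checked, because the page does
    not state it.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate
    matches = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH
    return matches, rate, width, channels, seconds
```

Calling `check_wav_format("segment.wav")` on a hypothetical segment returns a tuple whose first element is `True` only when the sample rate and bit depth both match the description.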

Kannada is one of the ancient Indian languages and belongs to the Dravidian family. It has its own script. The language spoken in a region is influenced by the other languages of that region, the speaker's mother tongue, and similar factors. Reading speed, loudness, and frequency also vary with factors such as age and gender. The Linguistic Data Consortium for Indian Languages (LDC-IL) identified four regional dialects and collected the speech corpus through fieldwork. This read speech was collected from male and female native speakers in equal numbers, across various age groups. The data includes texts, sentences, date formats, and different word lists.

The available Speech Corpus details:

Total Speakers - 656 (328 Female and 328 Male)

Domain	Audio Segments	Duration (hh:mm:ss)
Contemporary Text (News)	600	66:06:09
Creative Text	600	33:09:20
Sentence	14,887	13:58:15
Date Format	1,200	01:16:22
Command and Control Words	17,988	12:31:43
Person Name	12,009	13:04:49
Place Name	6,032	04:48:42
Most Frequent Word - Part	18,065	12:21:24
Most Frequent Word - Full Set	8,000	02:08:58
Phonetically Balanced	9,360	02:40:58
Form and Function - Word	10,368	03:14:38
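The per-domain figures can be tallied against the headline numbers. The sketch below copies the rows from the table as listed; the helper names `to_seconds` and `to_hms` are illustrative only. The segment counts add up to the 99,109 stated for the corpus. The listed durations add up to 165:21:18, which is less than the 179:32:52 quoted as the overall duration; this page does not say what accounts for the difference.

```python
# Rows copied from the domain table: (domain, audio segments, duration h:mm:ss)
ROWS = [
    ("Contemporary Text (News)", 600, "66:06:09"),
    ("Creative Text", 600, "33:09:20"),
    ("Sentence", 14887, "13:58:15"),
    ("Date Format", 1200, "1:16:22"),
    ("Command and Control Words", 17988, "12:31:43"),
    ("Person Name", 12009, "13:04:49"),
    ("Place Name", 6032, "4:48:42"),
    ("Most Frequent Word - Part", 18065, "12:21:24"),
    ("Most Frequent Word - Full Set", 8000, "02:08:58"),
    ("Phonetically Balanced", 9360, "02:40:58"),
    ("Form and Function - Word", 10368, "03:14:38"),
]


def to_seconds(hms):
    """Convert an 'h:mm:ss' string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Convert a whole number of seconds back to 'h:mm:ss'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in ROWS)
total_seconds = sum(to_seconds(duration) for _, _, duration in ROWS)

print(total_segments)         # 99109
print(to_hms(total_seconds))  # 165:21:18
```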

A detailed explanation of the Kannada Speech Corpus will be available in the Kannada Speech Data Documentation.

For research use, please cite the following:

Ramamoorthy, L., Narayan Choudhary, Vijayalaxmi F. Patil, Chetan Suryakant Baji, Malini N. Abhyankar, Rajesha N. & Manasa G. 2019. Kannada Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors: Ramamoorthy L., Narayan Choudhary, Vijayalaxmi F. Patil, Chetan Suryakant Baji, Rajesha N., Manasa G., Sunitha Rajendra, Reshma S., Kavitha L., Malini N. Abhyankar
Corpus Type: Raw Corpus
Catalogue Number: 1129
ISBN: 978-81-7343-228-6
Data Source: On Field
Duration: 179:32:52
Number of Audio Segments: 99,109
Release Date: 04-Apr-2019
Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
