Kannada Raw Speech Corpus


Catalogue Number: 1129


Dataset Description

179:32:52 hours (115 Gigabytes) of speech data | 656 Speakers | 99109 Audio segments | 48 kHz | 16 bit wav
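As a rough plausibility check on the figures above, the size of uncompressed PCM audio can be estimated from duration, sample rate, and bit depth. The sketch below is illustrative only: the listing does not state the channel count, so both one and two channels are tried as assumptions, and WAV header overhead is ignored. The helper name `pcm_bytes` is hypothetical.

```python
# Estimate uncompressed PCM size from the listed duration, sample rate, and bit depth.
# Assumption: the channel count is NOT given in the listing; 1 and 2 are both tried.

def pcm_bytes(duration_s, sample_rate_hz, bits_per_sample, channels):
    """Raw PCM payload size in bytes (file headers ignored)."""
    return duration_s * sample_rate_hz * (bits_per_sample // 8) * channels

# 179:32:52 converted to seconds
duration_s = 179 * 3600 + 32 * 60 + 52

for channels in (1, 2):
    size = pcm_bytes(duration_s, 48_000, 16, channels)
    print(f"{channels} channel(s): {size / 1e9:.1f} GB, {size / 2**30:.1f} GiB")
```

Under these assumptions, the two-channel estimate comes to roughly 124 GB (about 115.6 GiB), which is close to the listed 115 Gigabytes. This is only a consistency check, not a statement about how the corpus was actually recorded.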

Kannada is an ancient Indian language of the Dravidian family and has its own script. The spoken language of a region is influenced by the other languages of that region, by the mother tongue of the speaker, and by similar factors. Reading speed, loudness, and frequency also vary with factors such as age and gender. The Linguistic Data Consortium for Indian Languages (LDC-IL) identified four regional dialects and collected the speech corpus through fieldwork. The read speech was collected from native speakers across various age groups, with equal numbers of male and female speakers. The data includes texts, sentences, date formats, and several word lists.

The available Speech Corpus details are as follows.

    • Total Speakers - 656 (328 Female and 328 Male)
    • Contemporary Text (News) - 600 Audio Segments - 66:06:09 Hours
    • Creative Text - 600 Audio Segments - 33:09:20 Hours
    • Sentence - 14887 Audio Segments - 13:58:15 Hours
    • Date Format - 1200 Audio Segments - 1:16:22 Hours
    • Command and Control Words - 17988 Audio Segments - 12:31:43 Hours
    • Person Name - 12009 Audio Segments - 13:04:49 Hours
    • Place Noun - 6032 Audio Segments - 4:48:42 Hours
    • Most Frequent Word-Part - 18065 Audio Segments - 12:21:24 Hours
    • Most Frequent Word-Full Set - 8000 Audio Segments - 6:45:56 Hours
    • Phonetically Balanced - 9360 Audio Segments - 6:47:23 Hours
    • Form and Function Word - 10368 Audio Segments - 8:42:49 Hours
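The per-category figures above can be cross-checked against the stated totals (99109 audio segments, 179:32:52 hours). The following is a minimal sketch that sums the numbers copied from the list; the helper names `to_seconds` and `to_hms` are introduced here for illustration and are not part of any LDC-IL tooling.

```python
# Cross-check the per-category figures against the stated corpus totals.
# Each entry: (category, audio segments, duration as H:MM:SS), copied from the list.
CATEGORIES = [
    ("Contemporary Text (News)", 600, "66:06:09"),
    ("Creative Text", 600, "33:09:20"),
    ("Sentence", 14887, "13:58:15"),
    ("Date Format", 1200, "1:16:22"),
    ("Command and Control Words", 17988, "12:31:43"),
    ("Person Name", 12009, "13:04:49"),
    ("Place Noun", 6032, "4:48:42"),
    ("Most Frequent Word-Part", 18065, "12:21:24"),
    ("Most Frequent Word-Full Set", 8000, "6:45:56"),
    ("Phonetically Balanced", 9360, "6:47:23"),
    ("Form and Function Word", 10368, "8:42:49"),
]

def to_seconds(hms):
    """Convert an H:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def to_hms(total):
    """Convert a number of seconds back to an H:MM:SS string."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"

total_segments = sum(count for _, count, _ in CATEGORIES)
total_seconds = sum(to_seconds(duration) for _, _, duration in CATEGORIES)

print(total_segments)          # 99109
print(to_hms(total_seconds))   # 179:32:52
```

Both sums agree with the totals given in the dataset description.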

 

A more detailed description of the Kannada Speech Corpus will be available in the Kannada Speech Data Documentation.

For research-based work, please cite the following:

  • Ramamoorthy, L., Narayan Choudhary, Vijayalaxmi F. Patil, Chetan Suryakant Baji, Malini N. Abhyankar, Rajesha N. & Manasa G. 2019. Kannada Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Vijayalaxmi F. Patil, Chetan Suryakant Baji, Rajesha N., Manasa G., Sunitha Rajendra, Reshma S., Kavitha L., Malini N. Abhyankar
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1129
  • ISBN: 978-81-7343-228-6
  • Data Source: On Field
  • Duration: 179:32:52
  • Number of Audio Segments: 99109
  • Release Date: 04/04/2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL (Commercial User, Non-Commercial User).
