Kannada Raw Speech Corpus

Catalogue Number: 1129


Dataset Description

179:32:52 hours (115 GB) | 656 speakers | 99,109 audio segments | 48 kHz | 16-bit WAV
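Each segment is described as a 48 kHz, 16-bit WAV file, so you can check a downloaded segment's header with Python's standard-library `wave` module. This is a minimal sketch. It assumes the files are uncompressed PCM WAV, which the listing does not state explicitly. The helper name `check_segment` is hypothetical and is not part of any LDC-IL tooling.

```python
import wave

# Format stated in the corpus description (assumed to apply to every segment).
EXPECTED_RATE_HZ = 48_000   # 48 kHz
EXPECTED_SAMPLE_WIDTH = 2   # 16 bit = 2 bytes per sample


def check_segment(path: str) -> dict:
    """Read a WAV header and report whether it matches the stated format.

    Hypothetical helper; assumes an uncompressed PCM WAV file.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate_hz": rate,
        "bits_per_sample": width * 8,
        "channels": channels,
        "duration_s": frames / rate,
        "matches_spec": rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH,
    }
```

For example, `check_segment("segment_0001.wav")` (a hypothetical file name) returns a dictionary. Its `matches_spec` entry is `True` for a 48 kHz, 16-bit file.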

Kannada is one of the ancient Indian languages and belongs to the Dravidian family. It has its own script. The Kannada spoken in a given region is influenced by the other languages of that region, by the speaker's mother tongue, and by similar factors. Reading speed, loudness, pitch (frequency), and other speech characteristics also vary with factors such as age and gender. The Linguistic Data Consortium for Indian Languages (LDC-IL) identified four regional dialects and collected the speech corpus through fieldwork. This read-speech data was recorded from native speakers across various age groups, with equal numbers of male and female speakers. It includes texts, sentences, date formats, and various word lists.

Speech corpus details:

     Total speakers: 656 (328 female and 328 male)


Domain                          Audio segments   Duration
Contemporary Text (News)                   600   66:06:09
Creative Text                              600   33:09:20
Sentence                                14,887   13:58:15
Date Format                              1,200    1:16:22
Command and Control Words               17,988   12:31:43
Person Name                             12,009   13:04:49
Place Name                               6,032    4:48:42
Most Frequent Word - Part               18,065   12:21:24
Most Frequent Word - Full Set            8,000    2:08:58
Phonetically Balanced                    9,360    2:40:58
Form and Function - Word                10,368    3:14:38
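A few lines of Python can cross-check the per-domain figures against the overall totals. This is an illustrative sketch, not official LDC-IL tooling. The numbers are transcribed by hand from the table above, and the helper names are hypothetical. Run as written, the segment counts add up to 99,109, which matches the total in the description. The per-domain durations add up to 165:21:18 rather than the 179:32:52 stated overall.

```python
# Per-domain figures transcribed from the table: (audio segments, "h:mm:ss").
DOMAINS = {
    "Contemporary Text (News)": (600, "66:06:09"),
    "Creative Text": (600, "33:09:20"),
    "Sentence": (14_887, "13:58:15"),
    "Date Format": (1_200, "1:16:22"),
    "Command and Control Words": (17_988, "12:31:43"),
    "Person Name": (12_009, "13:04:49"),
    "Place Name": (6_032, "4:48:42"),
    "Most Frequent Word - Part": (18_065, "12:21:24"),
    "Most Frequent Word - Full Set": (8_000, "2:08:58"),
    "Phonetically Balanced": (9_360, "2:40:58"),
    "Form and Function - Word": (10_368, "3:14:38"),
}


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(total: int) -> str:
    """Convert seconds back to an 'h:mm:ss' string."""
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


total_segments = sum(n for n, _ in DOMAINS.values())
total_seconds = sum(to_seconds(d) for _, d in DOMAINS.values())

print(f"segments: {total_segments:,}")       # segments: 99,109
print(f"duration: {to_hms(total_seconds)}")  # duration: 165:21:18
```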

      

A detailed explanation of the Kannada speech corpus will be available in the Kannada Speech Data Documentation.

When citing this corpus in research, please use the following references:

  • Ramamoorthy, L., Narayan Choudhary, Vijayalaxmi F. Patil, Chetan Suryakant Baji, Malini N. Abhyankar, Rajesha N. & Manasa G. 2019. Kannada Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview.” In Linguistic Resources for AI/NLP in Indian Languages, pp. 160-174. Central Institute of Indian Languages, Mysore.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Vijayalaxmi F. Patil, Chetan Suryakant Baji, Rajesha N., Manasa G., Sunitha Rajendra, Reshma S., Kavitha L., Malini N. Abhyankar
  • Corpus type: Raw corpus
  • Catalogue number: 1129
  • ISBN: 978-81-7343-228-6
  • Data source: Fieldwork
  • Duration: 179:32:52
  • Number of audio segments: 99,109
  • Release date: 04-Apr-2019
  • Terms and conditions: General instructions for use of the resources provided by LDC-IL.
Commercial User
Non-Commercial User
LDC-IL Raw Text Corpora: An Overview
LDC-IL Raw Speech Corpora: An Overview
