Konkani Raw Speech Corpus

Owner: Central Institute of Indian Languages

Catalogue Number: 1135

Dataset Description

Konkani belongs to the Indo-European family of languages and is the official language of Goa. It is also spoken widely across four states: Maharashtra, Goa, Karnataka and Kerala. Konkani is the only Indian language written in five different scripts: Devanagari, Roman, Kannada, Malayalam, and Persian-Arabic.

The LDC-IL speech data was collected in North Goa, South Goa, Karwar (Karnataka) and Sindhudurg (Maharashtra) from speakers of both genders and different age groups. Approximately 15 to 20 minutes of speech per speaker were recorded from 267 female and 237 male native speakers. Each speaker recorded datasets that were randomly selected from a master dataset.

The available Speech Corpus details:

Total Speakers 504 (267 Female and 237 Male)

Domain	Audio Segments	Duration (hh:mm:ss)
Contemporary Text (News)	477	49:52:09
Creative Text	480	22:09:05
Sentence	12,050	15:51:11
Date Format	953	01:50:39
Command and Control Words	14,944	16:11:02
Person Name	9,588	15:55:43
Place Name	4,812	05:31:03
Most Frequent Word - Part	16,376	16:03:13
Most Frequent Word - Full Set	5,998	05:55:07
Phonetically Balanced	2,975	02:49:36
Form and Function - Word	4,285	04:29:03
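
The per-domain figures above can be cross-checked against the totals listed under Item specifics (72938 audio segments, 156:37:51 duration). The following is a minimal sketch of that arithmetic, not part of the corpus distribution; the names ROWS, to_seconds and to_hms are illustrative, and the durations are assumed to be in hh:mm:ss. It also derives the average recording time per speaker from the stated 504 speakers.

```python
# Rows copied from the table above: (domain, audio segments, duration hh:mm:ss).
ROWS = [
    ("Contemporary Text (News)", 477, "49:52:09"),
    ("Creative Text", 480, "22:09:05"),
    ("Sentence", 12050, "15:51:11"),
    ("Date Format", 953, "01:50:39"),
    ("Command and Control Words", 14944, "16:11:02"),
    ("Person Name", 9588, "15:55:43"),
    ("Place Name", 4812, "05:31:03"),
    ("Most Frequent Word - Part", 16376, "16:03:13"),
    ("Most Frequent Word - Full Set", 5998, "05:55:07"),
    ("Phonetically Balanced", 2975, "02:49:36"),
    ("Form and Function - Word", 4285, "04:29:03"),
]
TOTAL_SPEAKERS = 504  # 267 female + 237 male, as stated above


def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a number of seconds back to hh:mm:ss (hours may exceed 99)."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(segments for _, segments, _ in ROWS)
total_seconds = sum(to_seconds(duration) for _, _, duration in ROWS)
total_duration = to_hms(total_seconds)
minutes_per_speaker = total_seconds / 60 / TOTAL_SPEAKERS

print(total_segments)                  # 72938
print(total_duration)                  # 156:37:51
print(round(minutes_per_speaker, 1))   # 18.6
```

Both sums agree with the catalogue totals, and the average of about 18.6 minutes per speaker falls within the stated range of 15 to 20 minutes.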

A detailed explanation of the Konkani Speech Corpus will be available in the Konkani Speech Data Documentation.

For research citations, please use the following references:

Ramamoorthy, L., Narayan Choudhary, Saurabh Varik & Rashmi Shet Tanawade. 2019. Konkani Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors: Ramamoorthy L., Narayan Choudhary, Saurabh Varik, Bhageshree Khandale, Rashmi S. Shet Tanawade, Yashwant D. Gawas
Corpus Type: Raw Corpus
Catalogue Number: 1135
ISBN: 978-81-7343-234-7
Data Source: On Field
Duration: 156:37:51
Number of Audio Segments: 72,938
Release Date: 04-Apr-2019
Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
