Konkani Raw Speech Corpus
Overview
156:37:51 Hours | 100 GB | 504 Speakers | 72,938 Audio Segments | 48 kHz | 16 bit wav.
Dataset Description
Konkani belongs to the Indo-European family of languages and is the official language of Goa. It is also widely spoken across four states: Maharashtra, Goa, Karnataka and Kerala. Konkani is the only Indian language written in five different scripts: Devanagari, Roman, Kannada, Malayalam, and Persian-Arabic.
The LDC-IL speech data was collected in North Goa, South Goa, Karwar (Karnataka) and Sindhudurg (Maharashtra) from speakers of both genders and different age groups. Approximately 15 to 20 minutes of speech per speaker was recorded from 267 female and 237 male native speakers. Each speaker recorded datasets that were randomly selected from a master dataset.
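Since the corpus is listed as 48 kHz, 16 bit wav, the header of a downloaded file can be checked against those figures. The following is a minimal sketch using Python's standard `wave` module; the function name `describe_wav` and any file path passed to it are hypothetical, and the channel count is only reported (not checked) because the listing does not state it.

```python
import wave

EXPECTED_RATE_HZ = 48_000   # "48 kHz" from the catalogue listing
EXPECTED_SAMPLE_BYTES = 2   # "16 bit" = 2 bytes per sample


def describe_wav(path):
    """Read a PCM WAV header and compare it with the listed format."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        return {
            "sample_rate_hz": rate,
            "sample_width_bytes": width,
            "channels": wav.getnchannels(),  # not stated in the listing
            "duration_s": wav.getnframes() / rate,
            "matches_listing": (
                rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_BYTES
            ),
        }
```

Calling `describe_wav("segment.wav")` on a file (the name here is a placeholder) returns a dictionary whose `matches_listing` entry is `True` only when both the sample rate and the sample width agree with the listing.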
The available Speech Corpus details:
Total Speakers: 504 (267 Female and 237 Male)
Domains | Audio Segments | Each Domain Duration |
Contemporary Text (News) | 477 | 49:52:09 |
Creative Text | 480 | 22:09:05 |
Sentence | 12,050 | 15:51:11 |
Date Format | 953 | 01:50:39 |
Command and Control Words | 14,944 | 16:11:02 |
Person Name | 9,588 | 15:55:43 |
Place Name | 4,812 | 05:31:03 |
Most Frequent Word - Part | 16,376 | 16:03:13 |
Most Frequent Word - Full Set | 5,998 | 05:55:07 |
Phonetically Balanced | 2,975 | 02:49:36 |
Form and Function - Word | 4,285 | 04:29:03 |
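The per-domain figures can be cross-checked against the headline totals (72,938 audio segments, 156:37:51 hours, 504 speakers). The sketch below is illustrative only: the values are copied from the table above, and the helper names `to_seconds` and `to_hms` are made up for this example.

```python
# Per-domain figures copied from the table: (domain, segments, "HH:MM:SS").
DOMAINS = [
    ("Contemporary Text (News)", 477, "49:52:09"),
    ("Creative Text", 480, "22:09:05"),
    ("Sentence", 12050, "15:51:11"),
    ("Date Format", 953, "01:50:39"),
    ("Command and Control Words", 14944, "16:11:02"),
    ("Person Name", 9588, "15:55:43"),
    ("Place Name", 4812, "05:31:03"),
    ("Most Frequent Word - Part", 16376, "16:03:13"),
    ("Most Frequent Word - Full Set", 5998, "05:55:07"),
    ("Phonetically Balanced", 2975, "02:49:36"),
    ("Form and Function - Word", 4285, "04:29:03"),
]
TOTAL_SPEAKERS = 504  # from the "Total Speakers" line


def to_seconds(hms):
    """Convert an "HH:MM:SS" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Convert a number of seconds back to "H:MM:SS"."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_seconds = sum(to_seconds(duration) for _, _, duration in DOMAINS)
avg_minutes_per_speaker = total_seconds / TOTAL_SPEAKERS / 60

print(total_segments)                     # 72938
print(to_hms(total_seconds))              # 156:37:51
print(round(avg_minutes_per_speaker, 1))  # 18.6
```

Both totals agree with the headline figures, and the average of about 18.6 minutes per speaker falls within the stated range of approximately 15 to 20 minutes.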
For research-based citations, please use the following:
- Ramamoorthy, L., Narayan Choudhary, Saurabh Varik & Rashmi Shet Tanawade. 2019. Konkani Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
- Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. "LDC-IL Raw Speech Corpora: An Overview" in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Item specifics
- Authors: Ramamoorthy L., Narayan Choudhary, Saurabh Varik, Bhageshree Khandale, Rashmi S. Shet Tanawade, Yashwant D. Gawas
- Corpus Type: Raw Corpus
- Catalogue Number: 1135
- ISBN: 978-81-7343-234-7
- Data Source: On Field
- Duration: 156:37:51
- # of Audio Segments: 72,938
- Release Date: 04-Apr-2019
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.