Dataset Description
89:17:25 Hours | 58 GB speech data | 307 Speakers | 58544 Audio segments | 48 kHz | 16 bit wav.
Marathi is an Indo-Aryan language that has been prevalent since the 9th
century. Standard Marathi (Puneri) is the official language of the State of
Maharashtra and is based on the dialects used by academics and the print
media. The language is believed to be influenced by Sanskrit. Marathi is
written in the Devanagari script. The phoneme inventory of Marathi is similar
to that of many other Indo-Aryan languages.
The LDC-IL speech data was collected in the regions of Marathwada, Puneri,
Vidarbha, and Goa from speakers of both genders and different age groups. Each
speaker recorded datasets that were randomly selected from a master dataset.
The available Speech Corpus details:
Total Speakers: 307 (156 Female and 151 Male)
Domains | Audio Segments | Duration per Domain (H:MM:SS)
Contemporary Text (News) | 302 | 22:26:06
Creative Text | 302 | 13:37:34
Sentence | 7,555 | 6:49:58
Date Format | 604 | 0:39:57
Command and Control Words | 9,068 | 7:50:10
Person Name | 6,058 | 7:44:56
Place Name | 3,037 | 2:49:32
Most Frequent Word - Part | 9,104 | 7:22:57
Most Frequent Word - Full Set | 10,987 | 9:53:28
Phonetically Balanced | 4,609 | 4:10:47
Form and Function - Word | 6,918 | 5:52:00
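The per-domain figures can be cross-checked against the headline totals (58544 audio segments, 89:17:25 hours). The following is a minimal sketch of that arithmetic; the names `DOMAINS`, `to_seconds`, and `to_hms` are illustrative and not part of any LDC-IL tooling, and the values are copied by hand from the table above.

```python
# Per-domain figures copied from the table: (domain, audio segments, "H:MM:SS").
DOMAINS = [
    ("Contemporary Text (News)", 302, "22:26:06"),
    ("Creative Text", 302, "13:37:34"),
    ("Sentence", 7555, "6:49:58"),
    ("Date Format", 604, "0:39:57"),
    ("Command and Control Words", 9068, "7:50:10"),
    ("Person Name", 6058, "7:44:56"),
    ("Place Name", 3037, "2:49:32"),
    ("Most Frequent Word - Part", 9104, "7:22:57"),
    ("Most Frequent Word - Full Set", 10987, "9:53:28"),
    ("Phonetically Balanced", 4609, "4:10:47"),
    ("Form and Function - Word", 6918, "5:52:00"),
]


def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to 'H:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_duration = to_hms(sum(to_seconds(d) for _, _, d in DOMAINS))

print(total_segments)  # 58544
print(total_duration)  # 89:17:25
```

Both sums agree with the totals stated in the dataset description, which suggests the domain table is complete.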
- Ramamoorthy, L., Narayan Choudhary, Gajanan R Apine & Apurva P Betkekar. 2019. Marathi Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
- Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Item specifics
- Authors: Ramamoorthy L., Narayan Choudhary, Saurabh Varik, Bhageshree Khandale, Gajanan R Apine, Apurva P. Betkekar, Godavari Thakur
- Corpus Type: Raw Corpus
- Catalogue Number: 1152
- ISBN: 978-81-7343-251-4
- Data Source: On Field
- Duration: 89:17:25
- # of Audio Segments: 58544
- Release Date: 04-Apr-2019
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.