Marathi Raw Speech Corpus

Owner: Central Institute of Indian Languages

Catalogue Number: 1152

Stock: In Stock

Overview

Duration: 89:17:25 (hh:mm:ss)

Tags: Marathi Raw Speech Corpus

Dataset Description

Marathi is an Indo-Aryan language that has been prevalent since the 9th century. Standard Marathi (Puneri) is the official language of the State of Maharashtra and is based on the dialects used by academics and the print media. The language is believed to have been influenced by Sanskrit. Marathi is written in the Devanagari script, and its phoneme inventory is similar to that of many other Indo-Aryan languages.

The LDC-IL speech data was collected in the Marathwada, Puneri, Vidarbha, and Goa regions from speakers of both genders and different age groups. Each speaker recorded items that were randomly selected from a master dataset.

Details of the available speech corpus:

Total Speakers: 307 (156 female and 151 male)

Domain	Audio Segments	Duration (hh:mm:ss)
Contemporary Text (News)	302	22:26:06
Creative Text	302	13:37:34
Sentence	7,555	6:49:58
Date Format	604	0:39:57
Command and Control Words	9,068	7:50:10
Person Name	6,058	7:44:56
Place Name	3,037	2:49:32
Most Frequent Word - Part	9,104	7:22:57
Most Frequent Word - Full Set	10,987	9:53:28
Phonetically Balanced	4,609	4:10:47
Form and Function - Word	6,918	5:52:00
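
The per-domain figures in the table can be cross-checked against the corpus totals listed for this item (58,544 audio segments, 89:17:25). The following is a minimal sketch of that check, not part of the LDC-IL distribution; the row values are copied from the table, and the names `ROWS`, `to_seconds`, and `to_hms` are hypothetical helpers introduced only for illustration.

```python
# Per-domain figures copied from the table: (domain, audio segments, duration h:mm:ss).
ROWS = [
    ("Contemporary Text (News)", 302, "22:26:06"),
    ("Creative Text", 302, "13:37:34"),
    ("Sentence", 7555, "6:49:58"),
    ("Date Format", 604, "0:39:57"),
    ("Command and Control Words", 9068, "7:50:10"),
    ("Person Name", 6058, "7:44:56"),
    ("Place Name", 3037, "2:49:32"),
    ("Most Frequent Word - Part", 9104, "7:22:57"),
    ("Most Frequent Word - Full Set", 10987, "9:53:28"),
    ("Phonetically Balanced", 4609, "4:10:47"),
    ("Form and Function - Word", 6918, "5:52:00"),
]


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to an 'h:mm:ss' string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(segments for _, segments, _ in ROWS)
total_duration = to_hms(sum(to_seconds(duration) for _, _, duration in ROWS))

print(total_segments)  # 58544
print(total_duration)  # 89:17:25
```

Both sums agree with the catalogue totals, so the table and the item specifics are consistent with each other.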

A detailed explanation of the Marathi Speech Corpus will be available in the Marathi Speech Data Documentation.

For research use, please cite the following:

Ramamoorthy, L., Narayan Choudhary, Gajanan R Apine & Apurva P Betkekar. 2019. Marathi Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors: Ramamoorthy L., Narayan Choudhary, Saurabh Varik, Bhageshree Khandale, Gajanan R Apine, Apurva P. Betkekar, Godavari Thakur
Corpus Type: Raw Corpus
Catalogue Number: 1152
ISBN: 978-81-7343-251-4
Data Source: On Field
Duration: 89:17:25
# of Audio Segments: 58,544
Release Date: 04-Apr-2019
Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
