Maithili Raw Speech Corpus

Owner Central Institute of Indian Languages

Catalogue Number: 1139

Stock In Stock

Overview

78:45:33 Hours

Tags: Maithili Raw Speech Corpus

Dataset Description

Maithili is an Indo-Aryan language, a direct descendant of Sanskrit, spoken in the Indian states of Bihar and Jharkhand and in parts of Nepal. It is one of the scheduled languages of India. The LDC-IL speech data was collected from speakers of the Sotipura, Bajjika and Thethi geographic dialects, and covers both genders and a range of age groups.

The available Speech Corpus details:

Total Speakers 306 (150 Female and 156 Male)

Domain	Audio Segments	Duration (hh:mm:ss)
Contemporary Text (News)	291	22:33:41
Creative Text	294	15:34:55
Sentence	7,451	07:08:48
Date Format	585	00:31:41
Command and Control Words	8,924	07:07:34
Person Name	5,917	07:49:33
Place Name	2,952	02:47:49
Most Frequent Word - Part	8,699	06:56:24
Most Frequent Words-FullSet	5,996	04:58:30
Phonetically Balanced Words	3,040	02:26:27
Form and Function Words	1,049	00:50:11
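
The per-domain figures above can be cross-checked against the totals listed for the corpus (45,198 audio segments and a duration of 78:45:33). The following is a minimal sketch of that arithmetic, not part of the LDC-IL distribution; the names `DOMAINS`, `to_seconds` and `to_hms` are illustrative, and the values are copied by hand from the table.

```python
# Per-domain figures copied from the table above:
# (domain, audio segments, duration as hh:mm:ss).
DOMAINS = [
    ("Contemporary Text (News)", 291, "22:33:41"),
    ("Creative Text", 294, "15:34:55"),
    ("Sentence", 7451, "07:08:48"),
    ("Date Format", 585, "00:31:41"),
    ("Command and Control Words", 8924, "07:07:34"),
    ("Person Name", 5917, "07:49:33"),
    ("Place Name", 2952, "02:47:49"),
    ("Most Frequent Word - Part", 8699, "06:56:24"),
    ("Most Frequent Words-FullSet", 5996, "04:58:30"),
    ("Phonetically Balanced Words", 3040, "02:26:27"),
    ("Form and Function Words", 1049, "00:50:11"),
]


def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to hh:mm:ss (hours may exceed 24)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_duration = to_hms(sum(to_seconds(d) for _, _, d in DOMAINS))

print(total_segments)  # 45198
print(total_duration)  # 78:45:33
```

Both sums agree with the totals given under Item specifics, so the table is internally consistent.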

A detailed explanation of the Maithili Speech Corpus will be available in the Maithili Speech Data Documentation.

For research-based citations, please use the following:

Ramamoorthy, L., Narayan Choudhary, Arun Kumar Singh, Dinesh Mishra & Atuleshwar Jha. 2019. Maithili Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors Ramamoorthy L., Narayan Choudhary, Dinesh Mishra, Arun Kumar Singh, Atuleshwar Jha
Corpus Type Raw Corpus
Catalogue Number 1139
ISBN 978-81-7343-238-5
Data Source On Field
Duration 78:45:33
# of Audio Segments 45,198
Release Date 04-Apr-2019
Terms and Conditions General instructions for use of the resources provided by LDC-IL.
