Maithili Raw Speech Corpus

Catalogue Number: 1139
Stock: In Stock

Overview

78:45:33 Hours 
Dataset Description

78:45:33 Hours | 49.2 GB | 306 Speakers | 45,198 Audio Segments | 48 kHz | 16-bit WAV

Maithili is an Indo-Aryan language, a direct descendant of Sanskrit, spoken in the Indian states of Bihar and Jharkhand and in parts of Nepal. It is one of the scheduled languages of India. The LDC-IL speech data was collected from speakers of the Sotipura, Bajjika and Thethi geographic dialects, covering both genders and different age groups.

The available speech corpus details:

Total Speakers: 306 (150 Female and 156 Male)

Domain                       | Audio Segments | Duration per Domain (HH:MM:SS)
Contemporary Text (News)     | 291            | 22:33:41
Creative Text                | 294            | 15:34:55
Sentence                     | 7,451          | 07:08:48
Date Format                  | 585            | 00:31:41
Command and Control Words    | 8,924          | 07:07:34
Person Name                  | 5,917          | 07:49:33
Place Name                   | 2,952          | 02:47:49
Most Frequent Word - Part    | 8,699          | 06:56:24
Most Frequent Words-FullSet  | 5,996          | 04:58:30
Phonetically Balanced Words  | 3,040          | 02:26:27
Form and Function Words      | 1,049          | 00:50:11
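
The per-domain figures above can be cross-checked against the totals stated in the dataset description (45,198 audio segments, 78:45:33 hours). The following is a minimal Python sketch of that arithmetic. The names `DOMAINS`, `to_seconds` and `to_hms` are illustrative only and are not part of any LDC-IL tooling; the values are copied from the listing.

```python
# Per-domain figures copied from the listing: (domain, audio segments, duration).
DOMAINS = [
    ("Contemporary Text (News)", 291, "22:33:41"),
    ("Creative Text", 294, "15:34:55"),
    ("Sentence", 7451, "07:08:48"),
    ("Date Format", 585, "00:31:41"),
    ("Command and Control Words", 8924, "07:07:34"),
    ("Person Name", 5917, "07:49:33"),
    ("Place Name", 2952, "02:47:49"),
    ("Most Frequent Word - Part", 8699, "06:56:24"),
    ("Most Frequent Words-FullSet", 5996, "04:58:30"),
    ("Phonetically Balanced Words", 3040, "02:26:27"),
    ("Form and Function Words", 1049, "00:50:11"),
]


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to an HH:MM:SS string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


total_segments = sum(segments for _, segments, _ in DOMAINS)
total_duration = to_hms(sum(to_seconds(duration) for _, _, duration in DOMAINS))

print(total_segments)  # 45198
print(total_duration)  # 78:45:33
```

Both sums agree with the stated totals, so the domain breakdown accounts for the whole corpus.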

                        


A detailed explanation of the Maithili Speech Corpus will be available in the Maithili Speech Data Documentation.

When citing this corpus in research, please use the following references:

  • Ramamoorthy, L., Narayan Choudhary, Arun Kumar Singh, Dinesh Mishra & Atuleshwar Jha. 2019. Maithili Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview”. In Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Dinesh Mishra, Arun Kumar Singh, Atuleshwar Jha
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1139
  • ISBN: 978-81-7343-238-5
  • Data Source: On Field
  • Duration: 78:45:33
  • Number of Audio Segments: 45,198
  • Release Date: 04-Apr-2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
Commercial User
Non-Commercial User
LDC-IL Raw Text Corpora: An Overview
LDC-IL Raw Speech Corpora: An Overview
