Maithili Raw Speech Corpus

Catalogue Number: 1139

Dataset Description

The LDC-IL Maithili Raw Speech Corpus contains 72:02:12 (hh:mm:ss) of raw speech data. The data set consists of different types of datasets made up of word lists, sentences, running texts and date formats.

The data was recorded from 149 female and 151 male native speakers of different age groups. Each speaker recorded datasets that were randomly selected from a master dataset.

Corpus details:

    • A total of 300 speakers (149 female and 151 male)
    • 35109 audio segments
    • 44.7 gigabytes of WAV files and metadata text files
    • 72:02:12 (hh:mm:ss) of speech data
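
As a rough illustration of what these figures imply, the following minimal Python sketch converts the stated duration to seconds and derives simple averages. The variable and function names are illustrative assumptions, and the averages are computed only from the totals listed above; they say nothing about how segment lengths are actually distributed across speakers.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Totals quoted in the corpus details (names are illustrative only).
TOTAL_DURATION = "72:02:12"
NUM_SEGMENTS = 35109
NUM_SPEAKERS = 300

total_seconds = hms_to_seconds(TOTAL_DURATION)
avg_segment_seconds = total_seconds / NUM_SEGMENTS
avg_speaker_minutes = total_seconds / NUM_SPEAKERS / 60

print(f"Total duration: {total_seconds} s")
print(f"Mean segment length: {avg_segment_seconds:.2f} s")
print(f"Mean speech per speaker: {avg_speaker_minutes:.1f} min")
```

On these totals the mean works out to roughly 7.4 seconds per audio segment and about 14 minutes of speech per speaker.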

A much more detailed explanation of the LDC-IL Maithili Raw Speech Corpus will be available in the Maithili Speech Data Documentation.

Maithili is an Indo-Aryan language, a direct descendant of Sanskrit, which is spoken in the states of Bihar and Jharkhand and in parts of Nepal. It is one of the scheduled languages of India.

The LDC-IL speech data covers the Sotipura, Bajjika and Thēthi geographic dialects. It was collected from speakers of both genders and of different age groups.

For research-based citations, please cite the following:

  • Ramamoorthy, L., Narayan Choudhary, Arun Kumar Singh, Dinesh Mishra & Atuleshwar Jha. 2019. Maithili Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Dinesh Mishra, Arun Kumar Singh, Atuleshwar Jha
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1139
  • ISBN: 978-81-7343-238-5
  • Data Source: On Field
  • Duration: 72:02:12 (hh:mm:ss)
  • Number of Audio Segments: 35109
  • Release Date: 04/04/2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL (Commercial User, Non-Commercial User)
