Marathi Raw Speech Corpus

Catalogue Number: 1152
Stock: In Stock

Overview

89:17:25 hours (58 gigabytes) of speech data | 307 speakers | 58,544 audio segments | 48 kHz | 16-bit WAV.
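
As a rough sanity check on the figures above, the uncompressed PCM payload can be estimated from the duration, sample rate and bit depth. This is only a sketch: the listing does not state the channel count, so both mono and stereo are computed as assumptions, and WAV header overhead is ignored. The names `pcm_bytes`, `SAMPLE_RATE_HZ`, `BYTES_PER_SAMPLE` and `DURATION_S` are illustrative.

```python
# Estimate raw PCM size from the listed figures: 89:17:25 hours, 48 kHz, 16 bit.
# Assumption: the channel count is NOT given in the listing, so we try 1 and 2.
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 2  # 16-bit samples
DURATION_S = 89 * 3600 + 17 * 60 + 25  # 89:17:25 expressed in seconds


def pcm_bytes(channels: int) -> int:
    """Bytes of uncompressed PCM audio for the whole corpus (headers ignored)."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels * DURATION_S


for channels in (1, 2):
    size = pcm_bytes(channels)
    print(f"{channels} channel(s): {size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")
```

Comparing the two estimates with the listed 58 gigabytes may hint at the channel layout, but the listing itself does not confirm it.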

Dataset Description

Marathi is an Indo-Aryan language that has been prevalent since the 9th century. Standard Marathi (Puneri) is the official language of the State of Maharashtra and is based on the dialects used by academics and the print media. Marathi is believed to have been influenced by Sanskrit. It is written in the Devanagari script, and its phoneme inventory is similar to that of many other Indo-Aryan languages.

The LDC-IL speech data was collected from the regions of Marathwada, Puneri, Vidarbha and Goa, from speakers of both genders and different age groups.

The LDC-IL Marathi speech data set comprises several types of data: word lists, sentences, running texts and date formats.

The available speech corpus details for Marathi are as follows.

Total of 307 speakers (156 female and 151 male).

    •   Contemporary Text (News) - 302 Audio Segments - 22:26:06 Hours
    •   Created Text - 302 Audio Segments - 13:37:34 Hours
    •   Sentence - 7555 Audio Segments - 6:49:58 Hours
    •   Date Format - 604 Audio Segments - 0:39:57 Hours
    •   Command and Control Words - 9068 Audio Segments - 7:50:10 Hours
    •   Person Name - 6058 Audio Segments - 7:44:56 Hours
    •   Place Name - 3037 Audio Segments - 2:49:32 Hours
    •   Most Frequent Word-Part - 9104 Audio Segments - 7:22:57 Hours
    •   Most Frequent Word-Full Set - 10987 Audio Segments - 9:53:28 Hours
    •   Phonetically Balanced - 4609 Audio Segments - 4:10:47 Hours
    •   Form and Function Word - 6918 Audio Segments - 5:52:00 Hours
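
The per-category figures above can be cross-checked against the headline totals (58544 audio segments, 89:17:25 hours). The following is a minimal sketch that copies the numbers from the list and sums them; the names `BREAKDOWN`, `to_seconds` and `to_hms` are illustrative and not part of any LDC-IL tooling.

```python
# Per-category figures copied from the list above:
# category -> (audio segments, duration as H:MM:SS)
BREAKDOWN = {
    "Contemporary Text (News)": (302, "22:26:06"),
    "Created Text": (302, "13:37:34"),
    "Sentence": (7555, "6:49:58"),
    "Date Format": (604, "0:39:57"),
    "Command and Control Words": (9068, "7:50:10"),
    "Person Name": (6058, "7:44:56"),
    "Place Name": (3037, "2:49:32"),
    "Most Frequent Word-Part": (9104, "7:22:57"),
    "Most Frequent Word-Full Set": (10987, "9:53:28"),
    "Phonetically Balanced": (4609, "4:10:47"),
    "Form and Function Word": (6918, "5:52:00"),
}


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to an H:MM:SS string."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(segments for segments, _ in BREAKDOWN.values())
total_seconds = sum(to_seconds(duration) for _, duration in BREAKDOWN.values())

print(total_segments, to_hms(total_seconds))  # prints: 58544 89:17:25
```

The sums agree with the totals stated for the corpus as a whole.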

A more detailed description of the Marathi Speech Corpus will be available in the Marathi Speech Data Documentation.

For research-based work, please use the following citations:

  • Ramamoorthy, L., Narayan Choudhary, Gajanan R Apine & Apurva P Betkekar. 2019. Marathi Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview”, in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore, pp. 160-174.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Saurabh Varik, Bhageshree Khandale, Gajanan R Apine, Apurva P. Betkekar, Godavari Thakur
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1152
  • ISBN: 978-81-7343-251-4
  • Data Source: On Field
  • Duration: 89:17:25
  • Number of Audio Segments: 58544
  • Release Date: 04/04/2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL (Commercial User; Non-Commercial User)
