Marathi Raw Speech Corpus

Catalogue Number: 1152

Overview

Dataset Description

89:17:25 hours | 58 GB of speech data | 307 speakers | 58,544 audio segments | 48 kHz | 16-bit WAV
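Because the description specifies an audio format, users may want to check that downloaded files actually match it before using them. Below is a minimal Python sketch that relies only on the standard-library `wave` module. The function and constant names are hypothetical and are not part of the LDC-IL distribution. The description also does not state a channel count, so `pcm_size_bytes` takes it as a parameter rather than assuming one.

```python
import wave
from typing import BinaryIO, Union

# Format stated in the catalogue description: 48 kHz, 16-bit WAV.
EXPECTED_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample

# Total duration 89:17:25 expressed in seconds.
TOTAL_SECONDS = 89 * 3600 + 17 * 60 + 25


def matches_catalogue_format(source: Union[str, BinaryIO]) -> bool:
    """Return True if the WAV file (path or binary file object) is 48 kHz, 16-bit."""
    with wave.open(source, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )


def pcm_size_bytes(seconds: int, channels: int,
                   rate_hz: int = EXPECTED_RATE_HZ,
                   sample_width: int = EXPECTED_SAMPLE_WIDTH_BYTES) -> int:
    """Size of the raw PCM payload in bytes, excluding WAV headers."""
    return seconds * rate_hz * sample_width * channels
```

For example, `pcm_size_bytes(TOTAL_SECONDS, channels=1)` returns 30,858,720,000 bytes, or about 30.9 GB of raw mono audio. With two channels the result is about 61.7 GB (57.5 GiB), which is closer to the stated 58 GB. The description does not confirm the channel layout, however, so treat this only as a rough storage estimate.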


Marathi is an Indo-Aryan language that has been in use since at least the 9th century. Standard Marathi (Puneri) is the official language of the State of Maharashtra and is based on the dialects used by academics and the print media. Marathi is believed to have been influenced by Sanskrit, and it is written in the Devanagari script. Its phoneme inventory is similar to those of many other Indo-Aryan languages.

The LDC-IL speech data were collected in the Marathwada, Puneri, Vidarbha, and Goa regions from speakers of both genders and different age groups. Each speaker recorded material randomly selected from a master dataset.


Details of the available speech corpus:


Total speakers: 307 (156 female, 151 male)

 

Domain | Audio segments (per domain) | Duration (h:mm:ss)
Contemporary Text (News) | 302 | 22:26:06
Creative Text | 302 | 13:37:34
Sentence | 7,555 | 6:49:58
Date Format | 604 | 0:39:57
Command and Control Words | 9,068 | 7:50:10
Person Name | 6,058 | 7:44:56
Place Name | 3,037 | 2:49:32
Most Frequent Word - Part | 9,104 | 7:22:57
Most Frequent Word - Full Set | 10,987 | 9:53:28
Phonetically Balanced | 4,609 | 4:10:47
Form and Function - Word | 6,918 | 5:52:00
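The per-domain rows can be checked against the headline figures of 89:17:25 hours and 58,544 audio segments. The Python sketch below hard-codes the table values and sums them. The helper names are illustrative only.

```python
# Per-domain figures copied from the catalogue table: (audio segments, "h:mm:ss").
DOMAINS = {
    "Contemporary Text (News)": (302, "22:26:06"),
    "Creative Text": (302, "13:37:34"),
    "Sentence": (7_555, "6:49:58"),
    "Date Format": (604, "0:39:57"),
    "Command and Control Words": (9_068, "7:50:10"),
    "Person Name": (6_058, "7:44:56"),
    "Place Name": (3_037, "2:49:32"),
    "Most Frequent Word - Part": (9_104, "7:22:57"),
    "Most Frequent Word - Full Set": (10_987, "9:53:28"),
    "Phonetically Balanced": (4_609, "4:10:47"),
    "Form and Function - Word": (6_918, "5:52:00"),
}


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Format a number of seconds as 'h:mm:ss'."""
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for count, _ in DOMAINS.values())
total_seconds = sum(to_seconds(duration) for _, duration in DOMAINS.values())

print(f"{total_segments:,} segments, {to_hms(total_seconds)}")
# prints "58,544 segments, 89:17:25"
```

Both sums match the totals given in the dataset description. This suggests the table covers the entire corpus, with no domain missing.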


A detailed explanation of the Marathi Speech Corpus will be available in the Marathi Speech Data Documentation.
When citing this corpus in research, please use the following references:

  • Ramamoorthy, L., Narayan Choudhary, Gajanan R. Apine & Apurva P. Betkekar. 2019. Marathi Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview.” In Linguistic Resources for AI/NLP in Indian Languages, pp. 160-174. Central Institute of Indian Languages, Mysore.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Saurabh Varik, Bhageshree Khandale, Gajanan R. Apine, Apurva P. Betkekar, Godavari Thakur
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1152
  • ISBN: 978-81-7343-251-4
  • Data Source: On Field
  • Duration: 89:17:25
  • Number of Audio Segments: 58,544
  • Release Date: 04-Apr-2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
Commercial User
Non-Commercial User
LDC-IL Raw Text Corpora: An Overview
LDC-IL Raw Speech Corpora: An Overview
