Malayalam Raw Speech Corpus

Owner Central Institute of Indian Languages

Catalogue Number: 1143

Tags: Malayalam Speech Corpus

Dataset Description

Malayalam is the official language of Kerala and the Laccadive Islands and belongs to the Dravidian language family. Kerala was formed from the Travancore, Cochin, and Malabar regions, and the language of each region has been shaped by different internal and external factors. LDC-IL therefore treats Malayalam as having three distinct varieties and collected speech data from Thiruvananthapuram, Ernakulam, and Kozhikode.

LDC-IL has 164 hours of Malayalam speech data. The LDC-IL Malayalam speech dataset comprises several types of material: word lists, sentences, running texts, and date formats. Approximately 15 minutes of speech per speaker was recorded from 231 female and 227 male native speakers of different age groups. Each speaker recorded items randomly selected from a master dataset.

The available Speech Corpus details:

Total Speakers 458 (231 Female and 227 Male)

Domains	Audio Segments	Duration per Domain (hh:mm:ss)
Contemporary Text (News)	449	71:29:21
Creative Text	449	54:41:20
Sentence	7,452	06:56:46
Date Format	598	00:53:45
Command and Control Words	8,923	07:09:37
Person Name	5,819	05:26:33
Place Name	2,906	02:28:24
Most Frequent Word - Part	8,763	06:51:31
Most Frequent Word - Full Set	1,979	02:08:58
Phonetically Balanced	3,096	02:40:09
Form and Function - Word	3,236	03:14:38
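
As a quick consistency check, the per-domain figures in the table above can be summed and compared with the totals listed under Item specifics (43670 audio segments, duration 164:01:02). The following is a minimal sketch, not part of any LDC-IL tooling; the names ROWS, to_seconds, and to_hms are illustrative assumptions, and the rows are simply copied from the table.

```python
# Rows copied from the table above: (domain, audio segments, duration as hh:mm:ss).
# ROWS, to_seconds and to_hms are illustrative names, not LDC-IL identifiers.
ROWS = [
    ("Contemporary Text (News)", 449, "71:29:21"),
    ("Creative Text", 449, "54:41:20"),
    ("Sentence", 7452, "06:56:46"),
    ("Date Format", 598, "00:53:45"),
    ("Command and Control Words", 8923, "07:09:37"),
    ("Person Name", 5819, "05:26:33"),
    ("Place Name", 2906, "02:28:24"),
    ("Most Frequent Word - Part", 8763, "06:51:31"),
    ("Most Frequent Word - Full Set", 1979, "02:08:58"),
    ("Phonetically Balanced", 3096, "02:40:09"),
    ("Form and Function - Word", 3236, "03:14:38"),
]


def to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to 'h:mm:ss'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(segments for _, segments, _ in ROWS)
total_duration = to_hms(sum(to_seconds(duration) for _, _, duration in ROWS))

print(total_segments)  # 43670
print(total_duration)  # 164:01:02
```

Both sums agree with the catalogue totals, so the table rows account for the full corpus.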

A detailed explanation of the Malayalam Speech Corpus will be available in the Malayalam Speech Data Documentation.

For research citations, please cite the following:

Ramamoorthy, L., Narayan Choudhary, Saritha S.L., Rejitha K.S., Sajila S. & Midhun P. G. 2019. Malayalam Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors Ramamoorthy L., Narayan Choudhary, Saritha S.L., Rejitha K.S., Sajila S., Midhun P.G.
Corpus Type Raw Corpus
Catalogue Number 1143
ISBN 978-81-7343-242-2
Data Source On Field
Duration 164:01:02
# of Audio Segments 43,670
Release Date 04-Apr-2019
Terms and Conditions General instructions for use of the resources provided by LDC-IL.
