Dataset Description
164:01:02 hours | 105 GB | 458 speakers | 43,670 audio segments | 48 kHz | 16-bit WAV
Malayalam is the official language of Kerala and the Laccadive Islands and belongs to the Dravidian language family. The language of the Travancore, Cochin, and Malabar regions, which came together to form Kerala, has been shaped by different internal and external factors. LDC-IL therefore treats Malayalam as having three distinct varieties and collected speech data from Thiruvananthapuram, Ernakulam, and Kozhikode.
LDC-IL has 164 hours of Malayalam speech data. The LDC-IL Malayalam speech dataset consists of different types of data: word lists, sentences, running texts, and date formats. Approximately 15 minutes of speech per speaker was recorded from 231 female and 227 male native speakers of different age groups. Each speaker recorded items that were randomly selected from a master dataset.
The available speech corpus details:
Total speakers: 458 (231 female and 227 male)
| Domains | Audio Segments | Each Domain Duration |
| Contemporary Text (News) | 449 | 71:29:21 |
| Creative Text | 449 | 54:41:20 |
| Sentence | 7,452 | 06:56:46 |
| Date Format | 598 | 00:53:45 |
| Command and Control Words | 8,923 | 07:09:37 |
| Person Name | 5,819 | 05:26:33 |
| Place Name | 2,906 | 02:28:24 |
| Most Frequent Word - Part | 8,763 | 06:51:31 |
| Most Frequent Word - Full Set | 1,979 | 02:08:58 |
| Phonetically Balanced | 3,096 | 02:40:09 |
| Form and Function - Word | 3,236 | 03:14:38 |
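As a quick consistency check, the per-domain figures in the table above can be summed and compared with the totals stated in this description (43670 audio segments, 164:01:02). The following is a minimal Python sketch of that arithmetic; the names `DOMAINS`, `to_seconds`, and `to_hms` are illustrative choices and not part of any LDC-IL tooling.

```python
# Per-domain figures copied from the table above:
# (domain, audio segments, duration as "HH:MM:SS").
DOMAINS = [
    ("Contemporary Text (News)", 449, "71:29:21"),
    ("Creative Text", 449, "54:41:20"),
    ("Sentence", 7452, "06:56:46"),
    ("Date Format", 598, "00:53:45"),
    ("Command and Control Words", 8923, "07:09:37"),
    ("Person Name", 5819, "05:26:33"),
    ("Place Name", 2906, "02:28:24"),
    ("Most Frequent Word - Part", 8763, "06:51:31"),
    ("Most Frequent Word - Full Set", 1979, "02:08:58"),
    ("Phonetically Balanced", 3096, "02:40:09"),
    ("Form and Function - Word", 3236, "03:14:38"),
]


def to_seconds(hms: str) -> int:
    """Convert an "HH:MM:SS" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to "H:MM:SS" (hours are not capped at 24)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(segments for _, segments, _ in DOMAINS)
total_duration = to_hms(sum(to_seconds(duration) for _, _, duration in DOMAINS))

print(total_segments)  # 43670
print(total_duration)  # 164:01:02
```

Both sums agree with the totals given in the dataset description and in the item specifics.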
A detailed explanation of the Malayalam Speech Corpus will be available in the Malayalam Speech Data Documentation.
For research citations, please cite the following:
- Ramamoorthy, L., Narayan Choudhary, Saritha S.L., Rejitha K.S., Sajila S. & Midhun P. G. 2019. Malayalam Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
- Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Item specifics
- Authors: Ramamoorthy L., Narayan Choudhary, Saritha S.L., Rejitha K.S., Sajila S., Midhun P. G.
- Corpus Type: Raw Corpus
- Catalogue Number: 1143
- ISBN: 978-81-7343-242-2
- Data Source: On Field
- Duration: 164:01:02
- Number of Audio Segments: 43670
- Release Date: 04-Apr-2019
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.

 
 
		