Telugu Raw Speech Corpus

Owner: Central Institute of Indian Languages

Catalogue Number: 1173

Dataset Description

Telugu is the official language of the states of Telangana and Andhra Pradesh. It belongs to the Dravidian language family and has the largest number of speakers among the Dravidian languages. Telugu is agglutinative, and its vocabulary is heavily influenced by Sanskrit. LDC-IL treats Telugu as having three distinct regional varieties and therefore collected speech data from Telangana, Rayalaseema and Coastal Andhra. The LDC-IL Telugu speech dataset consists of several types of material: word lists, sentences, running texts and date formats. Each speaker recorded material of these types, selected at random from a master dataset. Speech is in .wav format and metadata is in .txt format.
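Since the speech is delivered as .wav files, the length of an individual recording can be checked with Python's standard `wave` module. This is a minimal sketch, not part of the LDC-IL distribution: the function name `wav_duration_seconds` is hypothetical, and it assumes the files are uncompressed PCM WAV, which the page does not state.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of a PCM WAV file in seconds.

    Assumption: the file is uncompressed PCM; the standard-library
    wave module raises wave.Error for other encodings.
    """
    with wave.open(path, "rb") as wf:
        # Duration = number of audio frames / frames per second.
        return wf.getnframes() / wf.getframerate()
```

Calling `wav_duration_seconds` on each segment and summing the results is one way to compare a downloaded copy against the durations listed for the corpus.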

The available Speech Corpus details:

Total Speakers: 80 (24 female and 56 male)

Domain	Audio Segments	Duration per Domain (h:mm:ss)
Contemporary Text (News)	77	8:28:19
Creative Text	77	7:01:16
Sentence	1,828	1:20:55
Date Format	142	0:13:58
Command and Control Words	2,170	1:43:49
Person Name	1,438	1:09:31
Place Name	707	0:33:24
Most Frequent Word - Part	2,162	1:31:24
Most Frequent Word - Full Set	1,909	0:41:23
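The per-domain figures can be cross-checked against the overall duration (22:43:59) and audio segment count (10510) listed under Item specifics. The sketch below simply re-adds the table rows; the variable and function names are illustrative only.

```python
# Per-domain figures copied from the table: (domain, segments, "h:mm:ss").
DOMAINS = [
    ("Contemporary Text (News)", 77, "8:28:19"),
    ("Creative Text", 77, "7:01:16"),
    ("Sentence", 1828, "1:20:55"),
    ("Date Format", 142, "0:13:58"),
    ("Command and Control Words", 2170, "1:43:49"),
    ("Person Name", 1438, "1:09:31"),
    ("Place Name", 707, "0:33:24"),
    ("Most Frequent Word - Part", 2162, "1:31:24"),
    ("Most Frequent Word - Full Set", 1909, "0:41:23"),
]


def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(total: int) -> str:
    """Convert a number of seconds back to h:mm:ss."""
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


total_segments = sum(n for _, n, _ in DOMAINS)
total_duration = to_hms(sum(to_seconds(d) for _, _, d in DOMAINS))

print(total_segments)  # 10510
print(total_duration)  # 22:43:59
```

Both totals agree with the catalogue entry, so the table and the item specifics are mutually consistent.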

A detailed explanation of the Telugu Speech Corpus will be available in the Telugu Speech Data Documentation.

For research use, please cite the following:

Ramamoorthy, L., Narayan Choudhary & Rajesha N. 2019. Telugu Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors: Ramamoorthy L., Narayan Choudhary, Rajesha N.
Corpus Type: Raw Corpus
Catalogue Number: 1173
ISBN: 978-81-7343-272-9
Data Source: On Field
Duration: 22:43:59
Number of Audio Segments: 10,510
Release Date: 04-Apr-2019
Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
