Telugu Raw Speech Corpus

Catalogue Number: 1173
Stock: In Stock

Overview

22:43:59 Hours | 15 GB | 80 Speakers | 10,510 Audio Segments | 48 kHz | 16-bit WAV

Dataset Description

Telugu is the official language of the states of Telangana and Andhra Pradesh. It belongs to the Dravidian language family and, among the Dravidian languages, has the largest number of speakers. Telugu is agglutinative, and its vocabulary is strongly influenced by Sanskrit. Because LDC-IL considers Telugu to have three distinct regional varieties, speech data were collected from Telangana, Rayalaseema and Coastal Andhra. The LDC-IL Telugu speech dataset comprises several types of material: word lists, sentences, running texts and date formats. Each speaker recorded items randomly selected from a master dataset. Speech files are in .wav format and metadata files are in .txt format.
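The overview advertises 48 kHz, 16-bit WAV audio. A minimal Python sketch using only the standard-library `wave` module can confirm this for a downloaded file. It assumes the files are uncompressed PCM WAV (which `wave` requires); the function name `check_wav_format` and the demo file are hypothetical. Because the page does not state the channel count, the function reports it but does not check it.

```python
import os
import tempfile
import wave


def check_wav_format(path, expected_rate=48_000, expected_width_bytes=2):
    """Return (ok, info) comparing a WAV file with the advertised 48 kHz / 16-bit format.

    Hypothetical helper: channel count is reported only, since the catalogue
    page does not say whether recordings are mono or stereo.
    """
    with wave.open(path, "rb") as wf:
        info = {
            "sample_rate": wf.getframerate(),
            "sample_width_bits": wf.getsampwidth() * 8,
            "channels": wf.getnchannels(),
            "duration_s": wf.getnframes() / wf.getframerate(),
        }
    ok = (
        info["sample_rate"] == expected_rate
        and info["sample_width_bits"] == expected_width_bytes * 8
    )
    return ok, info


if __name__ == "__main__":
    # Stand-in for a corpus file: one second of 48 kHz, 16-bit mono silence
    # written to a temporary directory (not a real corpus path).
    demo_path = os.path.join(tempfile.mkdtemp(), "sample.wav")
    with wave.open(demo_path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(48_000)
        wf.writeframes(b"\x00\x00" * 48_000)
    print(check_wav_format(demo_path))
```

Pointing `check_wav_format` at each file in a downloaded folder would flag any recording that deviates from the stated format.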


Details of the available speech corpus:


Total speakers: 80 (24 female and 56 male)


Domain                          | Audio Segments | Duration (h:mm:ss)
Contemporary Text (News)        |             77 |            8:28:19
Creative Text                   |             77 |            7:01:16
Sentence                        |          1,828 |            1:20:55
Date Format                     |            142 |            0:13:58
Command and Control Words       |          2,170 |            1:43:49
Person Name                     |          1,438 |            1:09:31
Place Name                      |            707 |            0:33:24
Most Frequent Word - Part       |          2,162 |            1:31:24
Most Frequent Word - Full Set   |          1,909 |            0:41:23
Total                           |         10,510 |           22:43:59
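As a quick consistency check, the per-domain figures can be summed. The Python sketch below transcribes the segment counts and durations from the table (the names `DOMAINS`, `to_seconds` and `to_hms` are illustrative) and confirms that they add up to the totals given in the overview: 10,510 audio segments and 22:43:59 of speech.

```python
# Per-domain figures transcribed from the table: (domain, audio segments, duration h:mm:ss).
DOMAINS = [
    ("Contemporary Text (News)", 77, "8:28:19"),
    ("Creative Text", 77, "7:01:16"),
    ("Sentence", 1828, "1:20:55"),
    ("Date Format", 142, "0:13:58"),
    ("Command and Control Words", 2170, "1:43:49"),
    ("Person Name", 1438, "1:09:31"),
    ("Place Name", 707, "0:33:24"),
    ("Most Frequent Word - Part", 2162, "1:31:24"),
    ("Most Frequent Word - Full Set", 1909, "0:41:23"),
]


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(total: int) -> str:
    """Convert a number of seconds back to an 'h:mm:ss' string."""
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_seconds = sum(to_seconds(duration) for _, _, duration in DOMAINS)
print(total_segments, to_hms(total_seconds))  # 10510 22:43:59
```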


A detailed explanation of the Telugu Speech Corpus will be available in the Telugu Speech Data Documentation. 

When citing this corpus in research, please use the following references:

  • Ramamoorthy, L., Narayan Choudhary & Rajesha N. 2019. Telugu Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Rajesha N.
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1173
  • ISBN: 978-81-7343-272-9
  • Data Source: On Field
  • Duration: 22:43:59
  • Number of Audio Segments: 10,510
  • Release Date: 04-Apr-2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
Commercial User
Non-Commercial User
LDC-IL Raw Text Corpora: An Overview
LDC-IL Raw Speech Corpora: An Overview