Odia Raw Speech Corpus

Owner: Central Institute of Indian Languages

Catalogue Number: 1282

Stock: In Stock

Tags: Odia Raw Speech Corpus

Dataset Description

138:06:18 hours | 89 GB | 474 Speakers | 73,418 Audio segments | 48 kHz | 16-bit WAV

Odia is an Indo-Aryan language spoken mainly in the state of Odisha and also in parts of neighbouring states such as West Bengal, Jharkhand, Chhattisgarh and Andhra Pradesh. The Government of India has accorded it Classical Language status. The LDC-IL Odia speech data was collected in the central and northern parts of Odisha from speakers of both genders and different age groups. The corpus comprises several types of datasets, including word lists, sentences, running texts and date formats.

The available Speech Corpus details:

Total Speakers 474 (239 Female and 235 Male)

Domain	Audio Segments	Duration (hh:mm:ss)
Contemporary Text (News)	449	42:49:56
Creative Text	450	19:43:50
Sentence	11,248	8:22:57
Date Format	900	1:27:49
Command and Control Words	13,499	14:18:49
Person Name	8,998	5:01:40
Place Name	4,496	13:22:45
Most Frequent Word - Part	8,994	9:40:04
Most Frequent Word - Full Set	10,989	10:21:04
Phonetically Balanced	10,438	10:05:10
Form and Function - Word	2,957	2:52:14
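
As a quick consistency check, the per-domain figures above can be summed and compared with the headline totals (73,418 audio segments, 138:06:18). The following is a minimal sketch in Python; the list `DOMAINS` and the helpers `to_seconds` and `to_hms` are illustrative names introduced here, not part of any LDC-IL tooling, and the values are copied by hand from the table.

```python
# Per-domain figures copied from the table above:
# (domain, audio segments, duration as "h:mm:ss").
DOMAINS = [
    ("Contemporary Text (News)", 449, "42:49:56"),
    ("Creative Text", 450, "19:43:50"),
    ("Sentence", 11248, "8:22:57"),
    ("Date Format", 900, "1:27:49"),
    ("Command and Control Words", 13499, "14:18:49"),
    ("Person Name", 8998, "5:01:40"),
    ("Place Name", 4496, "13:22:45"),
    ("Most Frequent Word - Part", 8994, "9:40:04"),
    ("Most Frequent Word - Full Set", 10989, "10:21:04"),
    ("Phonetically Balanced", 10438, "10:05:10"),
    ("Form and Function - Word", 2957, "2:52:14"),
]


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(total: int) -> str:
    """Convert a number of seconds back to 'h:mm:ss'."""
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


total_segments = sum(n for _, n, _ in DOMAINS)
total_duration = to_hms(sum(to_seconds(d) for _, _, d in DOMAINS))

print(total_segments)  # 73418
print(total_duration)  # 138:06:18
```

Both sums agree with the totals stated in the description and under Item specifics.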

A detailed explanation of the Odia Speech Corpus will be available in the Odia Raw Speech Data Documentation.

For research use, please cite the following:

Ramamoorthy, L., Narayan Choudhary, Raja Kumar Naik, Pramod Kumar Rout, Kshirod Kumar Das & Santosh Kumar Mohanty. 2021. Odia Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors: Ramamoorthy L., Narayan Kumar Choudhary, Raja Kumar Naik, Pramod Kumar Rout, Kshirod Kumar Das, Santosh Kumar Mohanty
Corpus Type: Raw Corpus
Catalogue Number: 1282
ISBN: 978-93-91386-00-9
Data Source: On Field
Duration: 138:06:18
# of Audio Segments: 73,418
Release Date: 15-Jun-2021
Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
