Bodo Raw Speech Corpus

Owner: Central Institute of Indian Languages

Catalogue Number: 1112

Dataset Description

176:53:28 hours of audio (113 GB) | 456 speakers | 77,443 audio segments | 48 kHz | 16-bit WAV
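The format stated above (48 kHz, 16-bit WAV) can be checked per file with Python's standard `wave` module. This is a minimal sketch, not part of the LDC-IL distribution: the function name `check_segment` is hypothetical, and it assumes the segments are plain PCM WAV files that `wave` can open.

```python
import wave

EXPECTED_RATE_HZ = 48_000    # "48 kHz" from the catalogue summary
EXPECTED_SAMPLE_WIDTH = 2    # "16 bit" means 2 bytes per sample


def check_segment(path):
    """Return (matches_spec, duration_seconds) for one WAV file.

    Hypothetical helper: compares the file header against the sampling
    rate and sample width stated in the catalogue entry.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
    matches = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH
    return matches, frames / rate
```

Running this over every segment and summing the returned durations would give an independent check of the total duration quoted in the summary.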

Bodo, one of the scheduled languages of India, is a tonal language. It has two clearly distinguishable tones, known as Low and High. The language belongs to the Tibeto-Burman language family. It is the language of the Bodos, one of the major tribes of the Indian state of Assam.

The LDC-IL Bodo speech data was collected in the Chirang, Baksa, Sonitpur, Udalguri, Kamrup, Barpeta, and Kokrajhar districts of the Indian state of Assam, and covers the Bwrdwnari, Eastern, and Standard dialects. The data was collected from speakers of both genders and of different age groups.

The available Speech Corpus details:

Total Speakers: 456 (220 female and 236 male)

Domain	Audio Segments	Duration (hh:mm:ss)
Contemporary Text (News)	411	53:47:56
Creative Text	413	26:47:07
Sentence	10,257	09:16:54
Date Format	938	01:58:08
Command and Control Words	12,348	14:19:32
Person Name	8,222	14:49:44
Place Name	4,115	05:17:14
Most Frequent Word - Part	12,397	14:34:05
Most Frequent Word - Full Set	6,994	04:30:14
Phonetically Balanced	15,999	20:07:33
Form and Function - Word	6,383	08:28:25
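
The per-domain figures can be totalled and compared with the headline numbers. The sketch below copies the rows from the table above; the helper names `to_seconds` and `to_hms` are illustrative only. As transcribed here, the rows add up to 78,477 segments and 173:56:52, which differs from the headline 77,443 segments and 176:53:28. This page does not explain the difference, so the totals should be treated as approximate until checked against the Bodo Speech Data Documentation.

```python
# Per-domain figures copied from the table above:
# (domain, audio segments, duration as "hh:mm:ss").
DOMAINS = [
    ("Contemporary Text (News)", 411, "53:47:56"),
    ("Creative Text", 413, "26:47:07"),
    ("Sentence", 10257, "09:16:54"),
    ("Date Format", 938, "01:58:08"),
    ("Command and Control Words", 12348, "14:19:32"),
    ("Person Name", 8222, "14:49:44"),
    ("Place Name", 4115, "05:17:14"),
    ("Most Frequent Word - Part", 12397, "14:34:05"),
    ("Most Frequent Word - Full Set", 6994, "04:30:14"),
    ("Phonetically Balanced", 15999, "20:07:33"),
    ("Form and Function - Word", 6383, "08:28:25"),
]


def to_seconds(hms):
    """Convert an "hh:mm:ss" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Convert seconds back to "h:mm:ss" (hours may exceed 24)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(segments for _, segments, _ in DOMAINS)
total_seconds = sum(to_seconds(duration) for _, _, duration in DOMAINS)

print(total_segments, to_hms(total_seconds))  # 78477 173:56:52
```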

A detailed explanation of the Bodo Speech Corpus will be available in the Bodo Speech Data Documentation.

For research citations, please cite the following:

Ramamoorthy, L., Narayan Choudhary, Bridul Basumatary & Farson Daimary. 2019. Bodo Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

Authors: Ramamoorthy L., Narayan Choudhary, Bridul Basumatary, Farson Daimary
Corpus Type: Raw Corpus
Catalogue Number: 1112
ISBN: 978-81-7343-211-8
Data Source: On Field
Duration: 176:53:28
# of Audio Segments: 77443
Release Date: 04-Apr-2019
Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
