Bodo Raw Speech Corpus

Catalogue Number: 1112

Dataset Description

176:53:28 hours | 113 GB | 456 speakers | 77,443 audio segments | 48 kHz | 16-bit WAV
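
The stated audio format (48 kHz, 16-bit WAV) can be checked on an individual file with a few lines of Python. The following is a minimal sketch using only the standard-library `wave` module; the function name `check_wav_format` and the file name in the usage comment are hypothetical and are not part of the corpus or its documentation. The channel count is reported but not checked, since this listing does not state it.

```python
import wave

# Format stated in the dataset description: 48 kHz, 16 bit.
EXPECTED_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample


def check_wav_format(path):
    """Return (ok, details) for the WAV file at `path`.

    `ok` is True when the sample rate and sample width match the
    values stated in the listing. `details` holds what was found.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()

    details = {
        "sample_rate_hz": rate,
        "bits_per_sample": width * 8,
        "channels": channels,
        "duration_s": frames / rate,
    }
    ok = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH_BYTES
    return ok, details


# Usage (hypothetical file name):
# ok, details = check_wav_format("segment_0001.wav")
```

A check like this is useful before training or feature extraction, because many speech toolkits expect a fixed sample rate and will need the audio resampled if it differs.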

Bodo, one of the scheduled languages of India, is a tonal language. It has two clearly distinguishable tones, known as Low and High. The language belongs to the Tibeto-Burman language family. It is the language of the Bodos, a major tribal community of the Indian state of Assam.

The LDC-IL Bodo speech data was collected in the Chirang, Baksa, Sonitpur, Udalguri, Kamrup, Barpeta, and Kokrajhar districts of the Indian state of Assam, and covers the Bwrdwnari, Eastern, and Standard dialects. The data was collected from speakers of both genders and of different age groups.

The available Speech Corpus details:

Total Speakers: 456 (220 Female and 236 Male)

Domains                          Audio Segments   Duration (Each Domain)
Contemporary Text (News)                    411   53:47:56
Creative Text                               413   26:47:07
Sentence                                 10,257   09:16:54
Date Format                                 938   01:58:08
Command and Control Words                12,348   14:19:32
Person Name                               8,222   14:49:44
Place Name                                4,115   05:17:14
Most Frequent Word - Part                12,397   14:34:05
Most Frequent Word - Full Set             6,994   04:30:14
Phonetically Balanced                    15,999   20:07:33
Form and Function - Word                  6,383   08:28:25
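
Durations in this listing are written as hours:minutes:seconds, with the hours field allowed to exceed 24. The sketch below shows one way to convert such values to seconds and back, and uses the overall duration and segment count stated in the listing to estimate a mean segment length. The helper names `parse_duration` and `format_duration` are illustrative assumptions, not part of any LDC-IL tooling.

```python
def parse_duration(text):
    """Convert an 'H:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_duration(total_seconds):
    """Convert a whole number of seconds back to 'HH:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Overall figures as stated in the listing.
total_seconds = parse_duration("176:53:28")
total_segments = 77443

# Rough mean length of one audio segment, in seconds.
mean_segment_seconds = total_seconds / total_segments
```

The same helpers can be applied to the per-domain rows, for example to compare the average segment length of read news text with that of isolated words.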


A detailed explanation of the Bodo Speech Corpus will be available in the Bodo Speech Data Documentation. 

For research citations, please cite the following:

  • Ramamoorthy, L., Narayan Choudhary, Bridul Basumatary & Farson Daimary. 2019. Bodo Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. "LDC-IL Raw Speech Corpora: An Overview" in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Bridul Basumatary, Farson Daimary
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1112
  • ISBN: 978-81-7343-211-8
  • Data Source: On Field
  • Duration: 176:53:28
  • # of Audio Segments: 77443
  • Release Date: 04-Apr-2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
