Assamese Raw Speech Corpus

Catalogue Number: 1273

Dataset Description

54:21:12 hours | 32.5 GB | 304 speakers | 37,570 audio segments | 48 kHz | 16-bit WAV

Assamese is the official language of Assam. It is widely spoken in the state of Assam and in some parts of Arunachal Pradesh and Nagaland. According to the 2011 census, Assamese is spoken by 15 million people. As a widely spoken language, Assamese has several dialectal variations. The regional dialects can be broadly divided into two groups: the Eastern Group and the Western Group. LDC-IL divided the Assamese-speaking areas into four regions (Xiboxagoria, Central Assam, Kamrupi and Goalparia) and collected speech data from speakers in each region. The LDC-IL Assamese speech data set consists of different types of data: word lists, sentences, running texts and date formats.


The available Speech Corpus details:

Total Speakers 304 (154 Female and 150 Male)


Domain                     | Audio Segments (Each Domain) | Duration (hh:mm:ss)
Contemporary Text (News)   | 304                          | 17:23:25
Creative Text              | 304                          | 11:44:37
Sentence                   | 7593                         | 5:55:29
Date Format                | 599                          | 0:33:59
Command and Control Words  | 9118                         | 4:56:49
Person Name                | 6081                         | 5:38:07
Place Name                 | 3044                         | 1:58:33
Phonetically Balanced-W4   | 6567                         | 3:41:45
Form and Function Word-W5  | 3960                         | 2:28:28
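
The per-domain figures can be cross-checked against the headline totals with a few lines of Python. This is only a minimal sketch: the values are copied by hand from the domain listing above, and the names `DOMAINS`, `to_seconds` and `to_hms` are illustrative, not part of any LDC-IL tooling.

```python
# Per-domain figures copied from the listing above: (audio segments, "h:mm:ss").
DOMAINS = {
    "Contemporary Text (News)": (304, "17:23:25"),
    "Creative Text": (304, "11:44:37"),
    "Sentence": (7593, "5:55:29"),
    "Date Format": (599, "0:33:59"),
    "Command and Control Words": (9118, "4:56:49"),
    "Person Name": (6081, "5:38:07"),
    "Place Name": (3044, "1:58:33"),
    "Phonetically Balanced-W4": (6567, "3:41:45"),
    "Form and Function Word-W5": (3960, "2:28:28"),
}


def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a number of seconds back to an 'h:mm:ss' string."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(segments for segments, _ in DOMAINS.values())
total_seconds = sum(to_seconds(duration) for _, duration in DOMAINS.values())

print(total_segments)         # 37570
print(to_hms(total_seconds))  # 54:21:12
```

The sums agree with the 37,570 audio segments and 54:21:12 hours quoted in the dataset description.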


A detailed explanation of the Assamese Speech Corpus will be available in the Assamese Speech Data Documentation. 

For research citations, please cite the following:

  • Ramamoorthy L., Narayan Kumar Choudhary, Atreyee Sharma, Jahnobi Kalita, Samhita Bharadwaj, Plabita Bora, Priyanshee Adhyapak, Mustafiza Tamim, Rajesha N., Manasa G. 2021. Assamese Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

Item specifics

  • Authors Ramamoorthy L., Narayan Kumar Choudhary, Atreyee Sharma, Jahnobi Kalita, Samhita Bharadwaj, Plabita Bora, Priyanshee Adhyapak, Mustafiza Tamim, Rajesha N., Manasa G.
  • Corpus Type Raw Corpus
  • Catalogue Number 1273
  • ISBN 978-81-948885-5-0
  • Data Source On Field
  • Duration 54:21:12
  • # of Audio Segments 37,570
  • Release Date 15/06/2021
  • Terms and Conditions General instructions for use of the resources provided by LDC-IL.
Commercial User
Non-Commercial User
