Indian English Raw Speech Corpus - Bengali Variant

Catalogue Number: 1278

Dataset Description

25:47:11 Hours | 15.5 GB | 53 Speakers | 16,044 Audio Segments | 48 kHz | 16-bit WAV
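
The summary line gives the audio format as 48 kHz, 16-bit WAV. As a minimal sketch, assuming the segments are plain PCM WAV files readable by Python's standard `wave` module, a file can be checked against those two figures as follows. The function names are hypothetical and chosen only for illustration; nothing here is taken from the LDC-IL documentation.

```python
import wave

EXPECTED_RATE_HZ = 48_000      # "48 kHz" from the summary line
EXPECTED_SAMPLE_WIDTH = 2      # "16 bit" means 2 bytes per sample


def matches_corpus_format(path: str) -> bool:
    """Return True if the WAV file at `path` is 48 kHz with 16-bit samples."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate() == EXPECTED_RATE_HZ
            and wav_file.getsampwidth() == EXPECTED_SAMPLE_WIDTH
        )


def duration_seconds(path: str) -> float:
    """Return the length of the WAV file at `path` in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

Running `matches_corpus_format` over a downloaded segment is a quick sanity check before feeding the audio to a tool that expects a fixed sample rate.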

English descends from Anglo-Saxon, which was the prominent language of Britain in the Middle Ages. It was carried to every corner of the world by colonists. English is the most visible legacy of the British in India: India was under the British Raj for almost two centuries, and English is part of the education system here. Most Indian states use their own regional languages and do not share a common language, so English is used for inter-state communication.

LDC-IL has 25 hours of Indian English - Bengali Variant speech data. The LDC-IL Indian English speech data set consists of several types of material: word lists, sentences, texts and date formats. Approximately 15 minutes of speech per speaker was taken from 27 female and 26 male speakers of different age groups whose mother tongue is Bengali. Each speaker recorded material that was randomly selected from a master dataset.

Details of the available speech corpus:

Total Speakers: 53 (27 Female and 26 Male)

Domains | Audio Segments | Each Domain Duration
Contemporary Text (News) | 52 | 6:03:15
Creative Text | 52 | 2:41:17
Sentence | 1300 | 1:29:35
Date Format | 104 | 0:08:56
Command and Control Words | 2882 | 3:09:13
Person Name | 1040 | 0:33:56
Place Name | 519 | 1:30:22
Most Frequent Word - Part | 1442 | 1:22:38
Most Frequent Word - Full Set | 5985 | 6:01:44
Phonetically Balanced | 1782 | 1:52:21
Form and Function - Word | 886 | 0:53:54
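
The per-domain figures above can be added up to cross-check the totals in the summary line. The following is a small sketch that copies the segment counts and durations from the table and sums them; the helper names are hypothetical and used only for illustration.

```python
# (domain, audio segments, duration as H:MM:SS), copied from the table above
DOMAINS = [
    ("Contemporary Text (News)", 52, "6:03:15"),
    ("Creative Text", 52, "2:41:17"),
    ("Sentence", 1300, "1:29:35"),
    ("Date Format", 104, "0:08:56"),
    ("Command and Control Words", 2882, "3:09:13"),
    ("Person Name", 1040, "0:33:56"),
    ("Place Name", 519, "1:30:22"),
    ("Most Frequent Word - Part", 1442, "1:22:38"),
    ("Most Frequent Word - Full Set", 5985, "6:01:44"),
    ("Phonetically Balanced", 1782, "1:52:21"),
    ("Form and Function - Word", 886, "0:53:54"),
]


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds: int) -> str:
    """Convert a number of seconds back to an H:MM:SS string."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_seconds = sum(to_seconds(duration) for _, _, duration in DOMAINS)

print(total_segments)             # 16044
print(format_hms(total_seconds))  # 25:47:11
```

Both sums agree with the summary line of the dataset description: 16,044 audio segments and 25:47:11 hours.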



A detailed explanation of the Indian English Raw Speech Corpus - Bengali Variant will be available in the Indian English Raw Speech Corpus - Bengali Variant Documentation. 

For any research-based citations, please use the following citations: 

Item specifics

  • Authors: Ramamoorthy L., Narayan Kumar Choudhary, Arundhati Sengupta, Rejitha K.S., Rajesha N., Manasa G.
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1278
  • ISBN: 978-81-948885-1-2
  • Data Source: On Field
  • Duration: 25:50:17
  • # of Audio Segments: 16,044
  • Release Date: 15-Jun-2021
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
Commercial User
Non-Commercial User
LDC-IL Raw Text Corpora: An Overview
LDC-IL Raw Speech Corpora: An Overview