A Gold Standard Manipuri Raw Text Corpus

Catalogue Number: 1146

Dataset Description

61,45,278 words | 4,31,27,842 characters | 6 Domains

The LDC-IL Manipuri Text Corpus is machine-readable: the text is encoded in Unicode and stored in XML, with metadata embedded alongside the data. The corpus was created by typing contemporary texts. It contains 61,45,278 words drawn from 1,202 different titles across six major domains: Aesthetics, Commerce, Mass Media, Official Documents, Science & Technology, and Social Sciences.
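As a rough illustration of working with Unicode text stored in XML, the sketch below counts words and characters in one document. The element names (`document`, `metadata`, `body`, `p`) and the sample content are hypothetical placeholders, not the LDC-IL schema, and the counting rule (whitespace-separated tokens, characters counted after whitespace normalisation) is an assumption that may differ from how the published figures were computed.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Hypothetical sample document. The tag names are invented for illustration
# and are NOT taken from the LDC-IL corpus documentation.
SAMPLE = """
<document>
  <metadata>
    <title>Sample title</title>
    <domain>Mass Media</domain>
  </metadata>
  <body>
    <p>alpha beta gamma</p>
    <p>delta epsilon</p>
  </body>
</document>
""".strip().encode("utf-8")


def count_words_and_chars(xml_bytes: bytes, text_tag: str = "body") -> Tuple[int, int]:
    """Return (word_count, char_count) for the text under `text_tag`.

    Words are whitespace-separated tokens; characters are counted on the
    whitespace-normalised text. Both rules are assumptions of this sketch.
    """
    root = ET.fromstring(xml_bytes)
    node = root if root.tag == text_tag else root.find(f".//{text_tag}")
    if node is None:
        raise ValueError(f"no <{text_tag}> element found")
    # itertext() walks every text node below the element, in document order.
    words = "".join(node.itertext()).split()
    normalised = " ".join(words)
    return len(words), len(normalised)


if __name__ == "__main__":
    print(count_words_and_chars(SAMPLE))
```

Before applying this to the real files, check the Manipuri Text Corpus Documentation for the actual element names and pass the appropriate tag as `text_tag`.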


The available Text Corpus details:

Domain                    Words        Percentage of Total Corpus
Aesthetics                37,72,994    61.40 %
Commerce                     18,450     0.30 %
Mass Media                 7,75,261    12.62 %
Official Documents         4,42,950     7.21 %
Science and Technology     3,04,545     4.96 %
Social Sciences            8,31,078    13.52 %
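The word counts on this page use Indian digit grouping (for example, 61,45,278 is 6145278). A minimal sketch for parsing these figures and rechecking the per-domain percentages follows; the function and variable names are illustrative only.

```python
def parse_indian_number(text: str) -> int:
    """Convert a count written with Indian digit grouping to an int.

    Removing the commas is enough, since grouping does not change the digits.
    """
    return int(text.replace(",", ""))


# Word counts per domain, copied from the listing.
domain_words = {
    "Aesthetics": "37,72,994",
    "Commerce": "18,450",
    "Mass Media": "7,75,261",
    "Official Documents": "4,42,950",
    "Science and Technology": "3,04,545",
    "Social Sciences": "8,31,078",
}

counts = {name: parse_indian_number(words) for name, words in domain_words.items()}
total = sum(counts.values())
# Share of each domain in the whole corpus, rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

if __name__ == "__main__":
    for name, n in counts.items():
        print(f"{name:<24}{n:>10}{shares[name]:>8.2f} %")
    print(f"{'Total':<24}{total:>10}")
```

The domain counts sum to the stated corpus size of 6145278 words, and the rounded shares match the listed percentages.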


A detailed explanation of the Manipuri Text Corpus will be available in the Manipuri Text Corpus Documentation.

For research-based work, please use the following citations:

  • Ramamoorthy, L., Narayan Choudhary, Amom Nandaraj Meetei, Yumnam Premila Chanu, Longjam Anand Singh & M. Bidyarani Devi. 2019. A Gold Standard Manipuri Raw Text Corpus. Central Institute of Indian Languages, Mysore.
  • Choudhary, Narayan & L. Ramamoorthy. 2019. "LDC-IL Raw Text Corpora: An Overview" in Linguistic Resources for AI/NLP in Indian Languages, Central Institute of Indian Languages, Mysore. pp. 1-10.

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Amom Nandaraj Meetei, Yumnam Premila Chanu, Longjam Anand Singh, Bidyarani Devi M
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1146
  • ISBN: 978-81-7343-245-3
  • Data Source: Typed+Cleaned
  • Character Count: 43127842
  • Word Count: 6145278
  • Release Date: 04-Apr-2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
Commercial User
Non-Commercial User
LDC-IL Raw Text Corpora: An Overview
LDC-IL Raw Speech Corpora: An Overview
