A Gold Standard Manipuri Raw Text Corpus

Owner Central Institute of Indian Languages

Catalogue Number: 1146

Overview

61,45,278 words

Tags: Manipuri Raw Text Corpus

Dataset Description

61,45,278 words | 4,31,27,842 characters | 6 Domains

The Manipuri Text Corpus is machine-readable and stored in a standard format: the text is encoded in Unicode and stored as XML, with metadata embedded alongside the data. The corpus was created by typing in contemporary texts. The LDC-IL Manipuri Text Corpus contains 61,45,278 words drawn from 1202 different titles. The six major domains are Aesthetics, Commerce, Mass Media, Official Documents, Science & Technology, and Social Sciences.

The available Text Corpus Details:

Domains	Words	Percentage of Total Corpus
Aesthetics	37,72,994	61.40 %
Commerce	18,450	0.30 %
Mass Media	7,75,261	12.62 %
Official Documents	4,42,950	7.21 %
Science and Technology	3,04,545	4.96 %
Social Sciences	8,31,078	13.52 %
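
The percentages in the table follow directly from the word counts. As a minimal sketch (not part of the LDC-IL distribution), the Python snippet below parses the comma-grouped figures and recomputes each domain's share. The names `DOMAIN_WORDS`, `parse_count`, and `domain_shares` are illustrative assumptions introduced here.

```python
# Word counts per domain, taken from the table above.
# The figures use Indian digit grouping (e.g. 37,72,994 = 3772994).
DOMAIN_WORDS = {
    "Aesthetics": "37,72,994",
    "Commerce": "18,450",
    "Mass Media": "7,75,261",
    "Official Documents": "4,42,950",
    "Science and Technology": "3,04,545",
    "Social Sciences": "8,31,078",
}


def parse_count(text):
    """Convert a comma-grouped figure such as '61,45,278' to an int."""
    return int(text.replace(",", ""))


def domain_shares(words):
    """Return each domain's share of the total word count, in percent."""
    counts = {domain: parse_count(value) for domain, value in words.items()}
    total = sum(counts.values())
    return {domain: round(100 * count / total, 2) for domain, count in counts.items()}


total_words = sum(parse_count(value) for value in DOMAIN_WORDS.values())
shares = domain_shares(DOMAIN_WORDS)

for domain, share in shares.items():
    print(f"{domain}: {share:.2f} %")
print(f"Total words: {total_words}")
```

The domain counts sum to the stated corpus size of 61,45,278 words, and the recomputed shares match the table to two decimal places.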

A detailed explanation of the Manipuri Text Corpus will be available in the Manipuri Text Corpus Documentation.

For research citations, please cite the following:

Ramamoorthy, L., Narayan Choudhary, Amom Nandaraj Meetei, Yumnam Premila Chanu, Longjam Anand Singh & M. Bidyarani Devi. 2019. A Gold Standard Manipuri Raw Text Corpus. Central Institute of Indian Languages, Mysore.
Choudhary, Narayan & L. Ramamoorthy. 2019. "LDC-IL Raw Text Corpora: An Overview" in Linguistic Resources for AI/NLP in Indian Languages, Central Institute of Indian Languages, Mysore. pp. 1-10.

Item specifics

Authors Ramamoorthy L., Narayan Choudhary, Amom Nandaraj Meetei, Yumnam Premila Chanu, Longjam Anand Singh, Bidyarani Devi M.
Corpus Type Raw Corpus
Catalogue Number 1146
ISBN 978-81-7343-245-3
Data Source Typed+Cleaned
Character Count 43127842
Word Count 6145278
Release Date 04-Apr-2019
Terms and Conditions General instructions for use of the resources provided by LDC-IL.
