A Gold Standard Chhattisgarhi Raw Text Corpus Vol. II

Owner Central Institute of Indian Languages

Catalogue Number: 1509

Dataset Description

22,19,592 Words | 55 Titles | XML format | 4 Domains | 28 Sub-categories

Chhattisgarhi, a language spoken by approximately 17 million people, carries deep cultural and historical significance in the Chhattisgarh region. The Chhattisgarhi Raw Text Corpus offers an unrivaled window into the colloquialisms, idioms, regional vocabulary, and grammar of the language, and documenting these is essential to building linguistic processing frameworks. The corpus is an extensive repository of the linguistic features found in Chhattisgarhi written texts.

The texts in the corpus can be broadly classified as literary and non-literary. Data was collected from books, magazines, newspapers, and websites, verified against the original texts, and then stored. The Chhattisgarhi Text Corpus is encoded in machine-readable form and stored in a standard format: the text is encoded primarily in Unicode and stored as XML, with metadata embedded in the data. The corpus was built from contemporary texts that were either typed in or crawled from the web.
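Because the corpus is distributed as Unicode text in XML with embedded metadata, a user will typically want to separate running text from metadata before processing it. The sketch below shows one way to do that with Python's standard `xml.etree.ElementTree`. It is an illustration under stated assumptions, not the corpus's documented schema: the metadata element name `header` is hypothetical and should be replaced with the tag(s) used in the actual release, and whitespace tokenization is a simplification that may not match the method behind the published word count.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: hypothetical name of the metadata element; replace with the real tag(s).
METADATA_TAGS = ("header",)

def count_words(root: ET.Element, skip_tags=METADATA_TAGS) -> int:
    """Count whitespace-separated tokens in all text outside skipped elements."""
    if root.tag in skip_tags:
        return 0
    total = len(root.text.split()) if root.text else 0
    for child in root:
        total += count_words(child, skip_tags)
        # Tail text follows the child's closing tag but belongs to the parent,
        # so it is counted even when the child itself is skipped.
        if child.tail:
            total += len(child.tail.split())
    return total

if __name__ == "__main__":
    # Toy document (not taken from the corpus) containing Devanagari text.
    sample = (
        "<document><header><title>नमूना</title></header>"
        "<body><p>पहला वाक्य यहाँ है</p><p>दूसरा वाक्य</p></body></document>"
    )
    print(count_words(ET.fromstring(sample)))  # 6
```

For a real file, `ET.parse(path).getroot()` can be passed to `count_words` instead; parsing from the file lets the XML declaration determine the encoding.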

A detailed explanation of the Chhattisgarhi Raw Text Corpus Vol. II will be available in the Chhattisgarhi Text Corpus Documentation.

When citing this corpus in research, please use the following references:

Ankita Tiwari, Dr. Satyaendra Kumar Awasthi, Shantanu Kumar, Rajesha N., Manasa G., Dr. Narayan Kumar Choudhary, Prof. Shailendra Mohan. 2025. A Gold Standard Chhattisgarhi Raw Text Corpus Vol. II. Central Institute of Indian Languages, Mysore. ISBN: 978-93-48633-16-3.
Dr. Rejitha K. S., Dr. Narayan Kumar Choudhary. 2025. LDC-IL Corpus Insights. Central Institute of Indian Languages, Mysore. ISBN: 978-93-48633-33-0.

Item specifics

Authors Ankita Tiwari, Dr. Satyaendra Kumar Awasthi, Shantanu Kumar, Rajesha N., Manasa G., Dr. Narayan Kumar Choudhary, Prof. Shailendra Mohan
Corpus Type Raw Text Corpus
Catalogue Number 1509
ISBN 978-93-48633-16-3
Data Source On Field
Release Date 20/03/2025
Terms and Conditions General instructions for use of the resources provided by LDC-IL.
