Nepali Parts of Speech Annotated Corpus

Catalogue Number: 1696
Stock: In Stock

Dataset Description

820475 Tags | 682565 Words | 54540 Sentences

The Linguistic Data Consortium for Indian Languages (LDC-IL) has developed a Part-of-Speech (PoS) annotated corpus for the Scheduled Indian languages. The corpus is annotated with PoS tags based on the Bureau of Indian Standards (BIS) PoS Tagset. This data is a significant resource for natural language processing and linguistic research. LDC-IL developed annotated text corpora for Nepali. The Nepali PoS annotated corpus is automatically tagged and then verified by linguistic experts to ensure accuracy and consistency.
The Nepali PoS annotated corpus contains 820475 Part-of-Speech tags.
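
For a rough sense of scale, the headline counts above can be turned into average ratios. The following is a minimal sketch that uses only the three figures stated in this listing; the function name `corpus_ratios` is illustrative and is not part of any LDC-IL tooling. The listing does not say how the tag and word counts are defined relative to each other, so the ratios are descriptive only.

```python
# Counts as stated in the catalogue entry.
TAGS = 820_475
WORDS = 682_565
SENTENCES = 54_540


def corpus_ratios(tags: int, words: int, sentences: int) -> dict[str, float]:
    """Return simple average ratios derived from the headline counts."""
    return {
        "words_per_sentence": words / sentences,
        "tags_per_sentence": tags / sentences,
        "tags_per_word": tags / words,
    }


if __name__ == "__main__":
    # Print each ratio rounded to two decimal places.
    for name, value in corpus_ratios(TAGS, WORDS, SENTENCES).items():
        print(f"{name}: {value:.2f}")
```

On these figures the averages come to roughly 12.5 words and 15.0 tags per sentence.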

For research citations, please use the following:

1. Umesh Chamling Rai, Dr. Narayan Choudhary, Rajesha N., Prof. Shailendra Mohan. 2026. Nepali Parts of Speech Annotated Corpus. Central Institute of Indian Languages, Mysore. ISBN 978-81-69175-03-6.

2. Rejitha K. S. and Narayan Kumar Choudhary (eds.). 2026. LDC-IL Parts of Speech Annotated Corpus Based on BIS Framework. Central Institute of Indian Languages, Mysore. ISBN 978-81-69175-60-9.

Item specifics

  • Authors: Umesh Chamling Rai, Dr. Narayan Choudhary, Rajesha N., Prof. Shailendra Mohan
  • Corpus Type: Parts of Speech Annotated Text Corpus
  • Catalogue Number: 1696
  • ISBN: 978-81-69175-03-6
  • Data Source: Annotated
  • Word Count: 682565
  • Release Date: 3/23/2026
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
  • Tag Count: 820475
