Manipuri Parts of Speech Annotated Corpus
Overview
543469 Tags | 449451 Words | 44619 Sentences
Dataset Description
The Linguistic Data Consortium for Indian Languages (LDC-IL) has developed Part-of-Speech (PoS) annotated corpora for the Scheduled Indian languages, including annotated text corpora for Manipuri. The corpus is annotated with PoS tags based on the Bureau of Indian Standards (BIS) PoS Tagset. The Manipuri PoS annotated corpus is tagged automatically and then verified by linguistic experts to ensure accuracy and consistency. This data is a significant resource for natural language processing and linguistic research.
The Manipuri PoS annotated corpus contains 543469 Part-of-Speech tags.
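The headline counts on this page can be turned into rough per-sentence and per-word averages. The following is a minimal sketch that uses only the three figures listed in this entry. The variable and function names are illustrative, and the page does not explain why tags outnumber words, so the ratio is reported as-is.

```python
# Counts copied from this catalogue entry (tags, words, sentences).
TAGS = 543469
WORDS = 449451
SENTENCES = 44619


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator rounded to two decimal places."""
    return round(numerator / denominator, 2)


# Rough averages derived purely from the published totals.
summary = {
    "words_per_sentence": ratio(WORDS, SENTENCES),
    "tags_per_sentence": ratio(TAGS, SENTENCES),
    "tags_per_word": ratio(TAGS, WORDS),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

These are averages over the whole corpus and say nothing about how sentence lengths are distributed.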
For research use, please cite the following:
1. Amom Nandaraj Meetei, Yumnam Premila Chanu, Dr. Narayan Choudhary, Rajesha N. 2026. Manipuri Parts of Speech Annotated Corpus. Central Institute of Indian Languages, Mysore. ISBN 978-81-69175-43-2.
2. Rejitha K. S. and Narayan Kumar Choudhary (eds.). 2026. LDC-IL Parts of Speech Annotated Corpus Based on BIS Framework. Central Institute of Indian Languages, Mysore. ISBN 978-93-48633-33-0.
Item specifics
- Authors: Amom Nandaraj Meetei, Yumnam Premila Chanu, Dr. Narayan Choudhary, Rajesha N.
- Corpus Type: Parts of Speech Annotated Text Corpus
- Catalogue Number: 1698
- ISBN: 978-81-69175-34-0
- Data Source: Annotated
- Word Count: 449451
- Release Date: 3/23/2026
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
- Tag Count: 543469
