Punjabi Parts of Speech Annotated Corpus


Owner: Central Institute of Indian Languages

Catalogue Number: 1694

Stock: In Stock


Tags: Punjabi Parts of Speech PoS Annotated Text Corpus


Dataset Description

1298034 Tags | 1150325 Words | 65452 Sentences

The Linguistic Data Consortium for Indian Languages (LDC-IL) has developed Parts-of-Speech annotated corpora for the Scheduled Indian languages, including an annotated text corpus for Punjabi. The corpus is annotated with Part-of-Speech (PoS) tags based on the Bureau of Indian Standards (BIS) PoS Tagset. It is a significant resource for natural language processing and linguistic research. The Punjabi PoS annotated corpus was tagged automatically and then verified by linguistic experts to ensure accuracy and consistency.
The Punjabi PoS annotated corpus contains 1298034 Part-of-Speech tags.

For research citations, please use the following:

1. Shalinder Singh, Dr. Narayan Choudhary, Rajesha N., Manasa G. 2026. Punjabi Parts of Speech Annotated Corpus. Central Institute of Indian Languages, Mysore. 978-81-69175-64-7.

2. Rejitha K. S. and Narayan Kumar Choudhary. (ed.). 2026. LDC-IL Parts of Speech Annotated Corpus Based on BIS Framework. Central Institute of Indian Languages, Mysore. 978-81-69175-60-9.

Item specifics

Corpus Type: Parts of Speech Annotated Text Corpus
Data Source: Annotated
Word Count: 1150325
Tag Count: 1298034
