Tamil Parts of Speech Annotated Corpus

Catalogue Number: 1693
Stock: In Stock

Dataset Description

2131256 Tags | 1750935 Words | 172089 Sentences

The Linguistic Data Consortium for Indian Languages (LDC-IL) has developed Parts-of-Speech (PoS) annotated corpora for the Scheduled Indian languages, including this annotated text corpus for Tamil. The corpus is annotated with PoS tags based on the Bureau of Indian Standards (BIS) PoS Tagset. The Tamil corpus was tagged automatically and then verified by linguistic experts to ensure accuracy and consistency. This data is a significant resource for natural language processing and linguistic research.
The Tamil PoS annotated corpus contains 2131256 Part-of-Speech tags.
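
The headline counts can be turned into a few per-unit averages, which is a quick way to sanity-check a download against the catalogue figures. The following is a minimal sketch using only the three counts stated above; the function name `corpus_ratios` and the dictionary keys are illustrative and not part of any LDC-IL tooling. The listing does not explain why the tag count exceeds the word count, so the sketch only reports the ratio and does not interpret it.

```python
# Counts as stated in the dataset description.
TAGS = 2_131_256
WORDS = 1_750_935
SENTENCES = 172_089


def corpus_ratios(tags: int, words: int, sentences: int) -> dict[str, float]:
    """Return simple per-unit averages derived from the raw counts.

    Hypothetical helper for illustration; not an LDC-IL API.
    """
    return {
        "words_per_sentence": words / sentences,
        "tags_per_word": tags / words,
        "tags_per_sentence": tags / sentences,
    }


if __name__ == "__main__":
    # Print each average rounded to two decimal places.
    for name, value in corpus_ratios(TAGS, WORDS, SENTENCES).items():
        print(f"{name}: {value:.2f}")
```

Running the same function on counts computed from a local copy of the corpus, and comparing them with the figures above, would flag a truncated or mismatched file.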

For research citations, please use the following:

1. Dr. Amudha R, Dr. Kamaraj S, Dr. Prem Kumar L. R., and Dr. Narayan Choudhary. 2026. Tamil Parts of Speech Annotated Corpus. Central Institute of Indian Languages, Mysore. 978-81-69175-98-2.

2. Rejitha K. S. and Narayan Kumar Choudhary. (ed.). 2026. LDC-IL Parts of Speech Annotated Corpus Based on BIS Framework. Central Institute of Indian Languages, Mysore. 978-81-69175-60-9.

Item specifics

  • Authors: Dr. Amudha R, Dr. Kamaraj S, Dr. Prem Kumar L. R., Dr. Narayan Choudhary
  • Corpus Type: Parts of Speech Annotated Text Corpus
  • Data Source: Annotated
  • Word Count: 1750935
  • Tag Count: 2131256