A Gold Standard Kashmiri Raw Text Corpus Vol. II

Catalogue Number: 1510
Stock: In Stock

Dataset Description

10,13,658 words | 123 titles | XML format | 6 domains | 59 sub-categories


A Gold Standard Kashmiri Raw Text Corpus Vol. II is a collection of Kashmiri language texts comprising 10,13,658 words and 57,28,547 characters. The corpus includes extracts from books, newspapers, and magazines, providing a diverse range of linguistic data. It serves as a resource for linguistic research, language processing applications, and the preservation of the Kashmiri language. This volume covers six major domains, whereas the previous volume covered only two: Aesthetics and Social Sciences. The six major domains are Aesthetics, Commerce, Mass Media, Official Document, Science and Technology, and Social Science. The corpus has been meticulously compiled and is available through the Linguistic Data Consortium for Indian Languages (LDC-IL). Researchers and developers can use it to support their study of, and applications for, the Kashmiri language.

A detailed explanation of the Kashmiri Text Corpus will be available in the Kashmiri Raw Text Corpus Documentation.

For research citations, please cite the following:

  1. Dr. Zargar Adil Ahmad, Bi Bi Mariyam, Rajesha N., Manasa G., Dr. Narayan Kumar Choudhary, Prof. Shailendra Mohan. 2025. A Gold Standard Kashmiri Raw Text Corpus Vol. II. Central Institute of Indian Languages, Mysore. ISBN 978-93-48633-27-9.
  2. Rejitha K. S. and Narayan Kumar Choudhary (eds.). 2025. LDC-IL Corpus Insights. Central Institute of Indian Languages, Mysore. ISBN 978-93-48633-33-0.

Item specifics

  • Authors: Dr. Zargar Adil Ahmad, Bi Bi Mariyam, Rajesha N., Manasa G., Dr. Narayan Kumar Choudhary, Prof. Shailendra Mohan
  • Corpus Type: Raw Text Corpus
  • Catalogue Number: 1510
  • ISBN: 978-93-48633-27-9
  • Data Source: On field
  • Corpus Size: 10,13,658 words
  • Release Date: 20/03/2025
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
Commercial User
Non-Commercial User
LDC-IL Raw Text Corpora: An Overview
LDC-IL Raw Speech Corpora: An Overview
