A Gold Standard Kashmiri Raw Text Corpus


Catalogue Number: 1131

Dataset Description

Kashmiri is one of the 22 scheduled languages of India listed in the Eighth Schedule of the Constitution of India.

The Kashmiri text was typed in Unicode using the InScript keyboard and stored in XML files, with metadata provided alongside the data. The corpus was developed from available contemporary text drawn from books, newspapers and magazines. The Kashmiri Text Corpus in LDC-IL comprises 466,054 words (2,646,948 characters). The two major domains represented are Aesthetics and Social Sciences.
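This page reports word and character counts but does not say how they were computed. As a minimal sketch, assuming words are whitespace-separated tokens and characters are non-whitespace Unicode code points, the Python below reads the text of one XML document with the standard-library `xml.etree.ElementTree` and tallies both. The function name `corpus_counts` and the `skip_tags` parameter (for leaving out metadata elements) are hypothetical, and the corpus's actual XML element names are not given here, so totals computed this way may not match the published figures exactly.

```python
import xml.etree.ElementTree as ET


def corpus_counts(xml_text, skip_tags=frozenset()):
    """Return (word_count, char_count) for the text content of one XML document.

    Assumptions (not taken from the LDC-IL documentation):
    - words are whitespace-separated tokens;
    - characters are Unicode code points, excluding whitespace
      (combining marks count as separate code points);
    - elements whose tag is in skip_tags (e.g. a hypothetical metadata
      element) are ignored together with their descendants.
    """
    root = ET.fromstring(xml_text)
    chunks = []

    def collect(elem):
        if elem.tag not in skip_tags:
            if elem.text:
                chunks.append(elem.text)
            for child in elem:
                collect(child)
        # Tail text lies outside elem, so keep it even when elem is skipped.
        if elem.tail:
            chunks.append(elem.tail)

    collect(root)
    # Join chunks with spaces so text in adjacent elements is not fused.
    words = " ".join(chunks).split()
    return len(words), sum(len(w) for w in words)


sample = "<doc><meta>source: newspaper</meta><body><p>alpha beta</p><p>gamma</p></body></doc>"
print(corpus_counts(sample, skip_tags={"meta"}))  # (3, 14)
print(corpus_counts(sample))                      # (5, 30)
```

To process files on disk, one could read each file's contents into a string (or adapt the function to call `ET.parse(path).getroot()`) and sum the per-file tuples.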

Overview

Kashmiri is one of the 22 scheduled languages of India listed in the Eighth Schedule of the Constitution of India. It belongs to the Dardic group of the Indo-Aryan language family and, like other Indo-Aryan languages, has many dialects. From the 8th century A.D. onward, Kashmiri was traditionally written in the Sharda script; over time, the Devanagari and Perso-Arabic scripts were also adapted to write it.

Kashmiri text can be broadly classified into two types: literary and non-literary. LDC-IL tried to cover all the categories in its standard list. Some categories, such as Novel, Short Stories, Criticism, and Literature, have a large number of books, while others, such as Epic, Letters, Administration, Botany, Physics, Chemistry, Zoology, and Legislature, have very few.

A more detailed description of the Kashmiri Text Corpus will be available in the Kashmiri Raw Text Corpus Documentation.

For research citations, please use the following citation:

Item specifics

  • Authors: Ramamoorthy L., Narayan Choudhary, Shahid Mushtaq Bhat
  • Corpus Type: Raw Corpus
  • Catalogue Number: 1131
  • ISBN: 978-81-7343-230-9
  • Data Source: Typed + Cleaned
  • Character Count: 2,646,948
  • Word Count: 466,054
  • Release Date: 04/04/2019
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
