Downloads

  1. English Compound Noun Compositionality Dataset
  2. Hindi POS Tagger
  3. Hindi Dependency Parser
  4. Hindi WordNet in Python
  5. Kannada POS Tagger
  6. Telugu POS Tagger
  7. Indonesian and Malay Tools


Compound Noun Compositionality Dataset


Compositionality Dataset described in Reddy, McCarthy and Manandhar (2011, IJCNLP).
Alternate download link from Diana McCarthy


POS Taggers, Corpora, Lemmatizers, Morph Analyzers for Indian Languages


Most of these tools were developed using the methods described in Reddy and Sharoff (2011, CLIA @ IJCNLP). Some of the taggers are built using cross-lingual resources and others using monolingual resources. Please read the README of each tool for additional information.

This work is supported by Sketch Engine and the Intellitext project.

If you need resources for any other Indian languages, please contact me.


Kannada Tools


Download v2.0
Sample Output of the tagger
For the complete corpus described in the paper, please contact me.
Alternate download link from Serge Sharoff


Telugu Tools


Download v2.0
Sample Output of the tagger


Hindi Tools


Download v3.0
Sample Output of the tagger
Project Page: https://bitbucket.org/sivareddyg/hindi-part-of-speech-tagger


Indonesian and Malay Morphological Analyzer, Part-of-Speech (POS) Tagger, Machine Translation System


With support from Sketch Engine, I have made a few contributions to the Apertium Indonesian-Malay language pair. All the tools can be downloaded from http://sourceforge.net/projects/apertium/files/apertium-id-ms/


Hindi WordNet in Python


Download v1.3
Demo Program
Project Page: https://bitbucket.org/sivareddyg/python-hindi-wordnet


Hindi Dependency Parser


Download v2.0
Sample Output
Project Page: https://bitbucket.org/sivareddyg/hindi-dependency-parser


Comments

kannada tagger

Hello sir,
I have downloaded the Kannada POS tagger and need to use it on Windows, but I cannot work out how to install or use it. Please tell me the procedure to install or use it on Windows.
Thank you

Telugu tagger

Hello sir,
Can I know what method you used for tagging? I mean maximum entropy, CRF, HMM or any other. I would also like to know the algorithms you used.

dependency annotations

Hi Siva,
I have downloaded the Hindi dependency parser.
Can you please tell me the annotation scheme used for the dependency grammar?
What does column number 5 represent (it comes after the POS tag column and has numbers such as 0, 2, 4, 9)?
For the dependency tag set, I referred to one of the IIIT sites.
Thanks.
Vipul

Hindi POS tagset

Hi Siva,

Where can I download the Hindi POS tagset?

Thanks

Hindi POS tagger

Hi Siva,
Please help me run the Hindi POS tagger. I have downloaded Python (for win32), but it is not quite clear to me how to run the tagger. I have also mailed you about this. Sorry for that.
Vipul Dalal

It only works on a Linux PC. I have not tested it on Windows.

Wordnet in python

Thanks Siva for porting Hindi Wordnet to Python. It has made my work easier.

Word Sense Disambiguation for Telugu

Does anyone know of a Telugu WordNet or any other resources, such as a Telugu-to-Telugu dictionary, for word sense disambiguation in Telugu?

Admin

I would like to hear from you. Users are welcome to add comments on the tools, provide suggestions, and report bugs.

Siva

list of nouns in Tamil.

Hi Siva,

I think it's really helpful that you have set up a website and have shared the tools you have developed.

I am developing a tool to do word-level translations for Tamil. A list of co-occurring nouns is one of the features I am using. I wasn't able to find a good POS tagger for this task, so I started looking for a list of nouns in Tamil, ideally an online dictionary from which I could extract one. However, I only found web interfaces where I can query individual words. Do you know any place where I can get a good list? Thanks for your time.

Arun

Tamil Wordlist and POS Tagger

Hi Arun,

Thanks for your encouraging words. Regarding your question, a wordlist and the co-occurring words of any word are easy to obtain using Sketch Engine. Currently the co-occurring words are not compiled, but I can compile them if it is highly important. For example, the co-occurring words of "house" are at http://bit.ly/Hkj2bM

The wordlist functionality is ready for now. You need to register an account with Sketch Engine to access this wordlist. You can register for a free account and access the wordlist, but I would appreciate it if you bought an account if the results are beneficial. Sketch Engine has invested many human hours to collect these corpora. A sample wordlist of the top 100 words looks like this in Sketch Engine: http://sivareddy.in/lcl/tamil_wordlist.html

Register for an account and log in to Sketch Engine. Use the corpus named TamilWaC. After selecting the corpus, click the wordlist functionality in the left-hand menu. You can then get a list of words.

Regarding the POS tagger, I have not built one for Tamil yet (it is on my to-do list). You can download the IIIT tagger here and give it a try: http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php Let me know how it goes.

If you need a Tamil corpus, you can download the Wikimedia corpus, but you will need to clean it a bit: http://dumps.wikimedia.org/tawiki/20120321 Since your aim is to build translation lists, Tamil Wiktionary may also help: http://dumps.wikimedia.org/tawiktionary/20120323/
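
As a rough illustration of the kind of cleaning a Wikimedia dump needs, here is a minimal Python sketch that strips the most common wiki markup from the text of a single page. It assumes the page text has already been pulled out of the dump XML. The function name strip_wiki_markup is made up for this example, and the regular expressions only cover simple, non-nested markup, so treat it as a starting point and not as a complete cleaner.

```python
import re


def strip_wiki_markup(text):
    """Roughly reduce MediaWiki markup to plain text (illustrative only)."""
    # Drop <ref>...</ref> footnotes, then any remaining HTML-like tags.
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    text = re.sub(r"<[^>]+>", "", text)
    # Drop simple, non-nested templates such as {{stub}}.
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # [[target|label]] -> label, [[target]] -> target.
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]|]*)\]\]", r"\1", text)
    # Remove the quote runs used for bold and italics.
    text = re.sub(r"'{2,}", "", text)
    # == Heading == -> Heading
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text,
                  flags=re.MULTILINE)
    # Collapse runs of spaces and tabs left behind by the removals.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    sample = "'''வீடு''' is a [[building|house]] in [[Tamil Nadu]]. {{stub}}"
    print(strip_wiki_markup(sample))
```

Nested templates, tables and category links are not handled here; for a full dump, a dedicated wikitext parser is likely a better choice than regular expressions.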

all the best,
Siva

Hi, do you have any resources for the Tulu language?
Thank you.

Tulu POS Tagger

You may try the Kannada resources for Tulu. To collect a Tulu corpus, you can try BootCaT: http://bootcat.sslmit.unibo.it/
