Downloads
- English Compound Noun Compositionality Dataset
- Hindi POS Tagger
- Hindi Dependency Parser
- Hindi WordNet in Python
- Kannada POS Tagger
- Telugu POS Tagger
- Indonesian and Malay Tools
Compound Noun Compositionality Dataset
The compositionality dataset described in Reddy, McCarthy and Manandhar (2011, IJCNLP).
Alternate download link from Diana McCarthy
POS Taggers, Corpora, Lemmatizers, Morph Analyzers for Indian Languages
Most of these tools were developed using the methods described in Reddy and Sharoff (2011, CLIA @ IJCNLP). Some of the taggers are built using cross-lingual resources and some using monolingual resources. Please read the README of each tool for additional information.
This work is supported by Sketch Engine and the Intellitext project.
If you need resources for any other Indian language, please contact me.
Kannada Tools
Download v2.0
Sample Output of the tagger
For the complete corpus described in the paper, please contact me.
Alternate download link from Serge Sharoff
Telugu Tools
Download v2.0
Sample Output of the tagger
Hindi Tools
Download v2.0
Sample Output of the tagger
Indonesian and Malay Morphological Analyzer, Part-of-Speech (POS) Tagger, and Machine Translation System
With support from Sketch Engine, I have made a few contributions to the Apertium Indonesian-Malay language pair. All the tools can be downloaded from http://sourceforge.net/projects/apertium/files/apertium-id-ms/
Hindi WordNet in Python
Download v1.2
Demo Program
Hindi Dependency Parser
Download
Sample Output

Comments
WordNet in Python
Thanks, Siva, for porting Hindi WordNet to Python. It has made my work easier.
Word Sense Disambiguation for Telugu
Does anyone know of a Telugu WordNet or any other resources, such as a Telugu-to-Telugu dictionary, for word sense disambiguation in Telugu?
Admin
I would like to hear from you. Users are welcome to add comments on the tools, provide suggestions, and report bugs.
Siva
list of nouns in Tamil.
Hi Siva,
I think it's really helpful that you have set up a website and have shared the tools you have developed.
I am developing a tool to do word-level translations for Tamil. A list of co-occurring nouns is one of the features I am using. I wasn't able to find a good POS tagger for this task, so I started looking for a list of nouns in Tamil, ideally an online dictionary from which I could extract one. But I only found web interfaces where I can query individual words. Do you know of any place where I can get a good list? Thanks for your time.
Arun
Tamil Wordlist and POS Tagger
Hi Arun,
Thanks for your encouraging words. Regarding your question, the wordlist and the co-occurring words of any word are easy to obtain using Sketch Engine. Currently the co-occurring words are not compiled, but I can compile them if it is highly important. For example, the co-occurring words of "house" are at http://bit.ly/Hkj2bM
The wordlist functionality is ready now. You need to register an account with Sketch Engine to access this wordlist. You can register for a free account and access the wordlist, but I would appreciate it if you bought an account if the results are beneficial. Sketch Engine has invested many human hours in collecting these corpora. A sample wordlist of the top 100 words looks like this in Sketch Engine: http://sivareddy.in/lcl/tamil_wordlist.html
Register for an account and log in to Sketch Engine. Use the corpus named TamilWaC. After selecting the corpus, click the wordlist option in the left-hand menu to get a list of words.
Regarding a POS tagger, I have not built one for Tamil yet (it is on my to-do list). You can download the IIIT tagger here and give it a try: http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php Let me know how it goes.
If you need a Tamil corpus, you can download the Wikimedia dump, but you will need to clean it a bit: http://dumps.wikimedia.org/tawiki/20120321 Since your aim is to build translation lists, Tamil Wiktionary may also help: http://dumps.wikimedia.org/tawiktionary/20120323/
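As a rough illustration of what "cleaning it a bit" could involve, here is a minimal Python sketch that strips some common wiki markup from the text of a page and counts Tamil word forms. It assumes the page text has already been pulled out of the XML dump. The function names (strip_wiki_markup, build_wordlist) are made up for this example, and the regular expressions only cover simple cases: nested templates and wiki tables are not handled.

```python
import re
from collections import Counter

# Tamil Unicode block; it includes the vowel signs and the pulli,
# so a run of these characters corresponds to one written word.
TAMIL_WORD = re.compile(r"[\u0B80-\u0BFF]+")


def strip_wiki_markup(text):
    """Remove some common MediaWiki markup (simple cases only)."""
    text = re.sub(r"<ref[^>]*/>", " ", text)                 # self-closing refs
    text = re.sub(r"<ref[^>]*>.*?</ref>", " ", text, flags=re.DOTALL)
    text = re.sub(r"<[^>]+>", " ", text)                     # other HTML-like tags
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)              # non-nested templates
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    text = re.sub(r"\[https?://\S+\s*([^\]]*)\]", r"\1", text)  # [url label] -> label
    text = re.sub(r"'{2,}", "", text)                        # bold/italic quotes
    text = re.sub(r"^=+\s*(.*?)\s*=+\s*$", r"\1", text, flags=re.MULTILINE)
    return re.sub(r"[ \t]+", " ", text).strip()


def build_wordlist(raw_text):
    """Return a Counter of Tamil word forms found in the cleaned text."""
    return Counter(TAMIL_WORD.findall(strip_wiki_markup(raw_text)))


if __name__ == "__main__":
    sample = ("'''தமிழ்''' ஒரு [[திராவிட மொழிகள்|மொழி]] ஆகும்."
              "{{cite web|url=x}} தமிழ் <ref>note</ref>")
    for word, count in build_wordlist(sample).most_common(5):
        print(word, count)
```

The resulting list contains all word forms, not only nouns, so it would still need filtering with a tagger or a dictionary. Matching on the Tamil Unicode range rather than on \w is deliberate: Python's \w does not match combining vowel signs, so it would split Tamil words in the middle.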
All the best,
Siva
Hi, do you have any resource
Hi, do you have any resources for the Tulu language?
Thank you.
Tulu POS Tagger
You may try the Kannada resources for Tulu. To collect a Tulu corpus, you can try BootCaT: http://bootcat.sslmit.unibo.it/