1. GraphParser: An Ungrounded and Grounded Semantic Parser
2. English Compound Noun Compositionality Dataset
3. Hindi POS Tagger
4. Hindi Dependency Parser
5. Hindi WordNet in Python
6. Kannada POS Tagger
7. Telugu POS Tagger
8. Indonesian and Malay Tools

GraphParser: An Ungrounded and Grounded Semantic Parser

GraphParser is a semantic parser that converts natural-language sentences and questions into predicate-argument graphs. These graphs can in turn be converted into logical queries and executed against the Freebase knowledge graph. For more details, please read our paper Large-scale Semantic Parsing without Question-Answer Pairs.
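The pipeline described above (sentence, then predicate-argument graph, then grounded logical query) can be illustrated with a toy Python sketch. Everything in it is hypothetical: the `Edge` class, the `ground` and `graph_to_query` helpers, the one-entry lexicon, and the `ns:` identifiers are illustrative assumptions, not GraphParser's API, its output format, or real Freebase IDs. The sketch shows an ungrounded edge, labelled with a word from the sentence, being mapped to a knowledge-base relation and then rendered as a SPARQL-style query string. It does not execute the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One labelled edge of a toy predicate-argument graph."""
    source: str     # entity identifier or query variable such as "?x"
    predicate: str  # ungrounded (word-based) or grounded (KB relation) label
    target: str


def ground(edges, lexicon):
    """Replace each ungrounded predicate with a KB relation from a hypothetical lexicon."""
    return [Edge(e.source, lexicon[e.predicate], e.target) for e in edges]


def graph_to_query(edges, answer_var):
    """Render a grounded graph as a SPARQL-style SELECT query string."""
    patterns = " . ".join(f"{e.source} {e.predicate} {e.target}" for e in edges)
    return f"SELECT DISTINCT {answer_var} WHERE {{ {patterns} . }}"


# "Who directed Titanic?" -- hypothetical entity ID and relation name.
ungrounded = [Edge("ns:m.titanic", "directed", "?x")]
lexicon = {"directed": "ns:film.film.directed_by"}
print(graph_to_query(ground(ungrounded, lexicon), "?x"))
# SELECT DISTINCT ?x WHERE { ns:m.titanic ns:film.film.directed_by ?x . }
```

Keeping grounding as a separate step mirrors the ungrounded/grounded distinction in the parser's name. In this toy, it is simply a dictionary lookup.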

Semantic Parser Demo

Download the code and data

Compound Noun Compositionality Dataset

The compositionality dataset described in Reddy, McCarthy and Manandhar (2011, IJCNLP).
Alternative download link from Diana McCarthy

POS Taggers, Corpora, Lemmatizers, Morph Analyzers for Indian Languages

Most of these tools were developed using the methods described in Reddy and Sharoff (2011, CLIA @ IJCNLP). Some of the taggers are built from cross-lingual resources and others from monolingual resources. Please read the README of each tool for additional information.

This work is supported by Sketch Engine and the Intellitext project.

If you need resources for any other Indian languages, please contact me.

Kannada Tools

Download v2.0
Sample Output of the tagger
For the complete corpus described in the paper, please contact me.
Alternative download link from Serge Sharoff

Telugu Tools

Download v3.0
Sample Output of the tagger
Project Page:

Hindi Tools

Download v3.0
Sample Output of the tagger
Project Page:

Indonesian and Malay Morphological Analyzer, Part-of-Speech (POS) Tagger, and Machine Translation System

With support from Sketch Engine, I have made a few contributions to the Apertium Indonesian-Malay language pair. All the tools can be downloaded from

Hindi WordNet in Python

Download v1.4
Other versions
Demo Program
Project Page:

Hindi Dependency Parser

Download v2.0
Sample Output
Project Page:


Tamil dataset

How can I download the Tamil dataset? Kindly provide links to download it.

Tamil Grammar Rule

Dear Siva,
This is Ravishankar from Chennai. I am working on rule-based sentiment analysis of tweets about Tamil movies, using Tamil grammar rules. How many grammar rules are there in Tamil, and how can I use them to categorize tweets?

Hi Siva,

Great work! Do you have a chunker for Hindi or other Indian languages as well? If not, can you suggest one?

Problem downloading GraphParser

Hi Sir,
I used the link below, but the "GraphParser" folder in Google Drive was empty.
And thanks for the creative approach in the paper.

Kannada tagger

Hello sir,
I have downloaded the Kannada POS tagger and need to use it on Windows, but I do not understand how to install or use it. Please tell me the procedure to install and use it on Windows.
Thank you

Telugu tagger

Hello sir,
Can I know which method you used for tagging? I mean maximum entropy, CRF, HMM, or something else. I would also like to know which algorithms you used.

dependency annotations

Hi Siva,
I have downloaded the Hindi dependency parser.
Can you please tell me which annotation scheme is used for the dependency annotations?
What does column 5 represent? (It comes after the POS tag column and contains numbers such as 0, 2, 4, 9, etc.)
For the dependency tagset, I referred to one of the IIIT sites.

Hindi POS tagset

Hi Siva,

Where can I download the Hindi POS tagset?


Hindi POS tagger

Hi Siva,
Please help me run the Hindi POS tagger. I have downloaded Python (for Win32), but I am not quite clear on how to run the tagger. I have emailed you about this as well; sorry for the duplicate.
Vipul Dalal

It only works on Linux. I have not tested it on Windows.

Wordnet in python

Thanks Siva for porting Hindi Wordnet to Python. It has made my work easier.

Word Sense Disambiguation for Telugu

Does anyone know of a Telugu WordNet or other resources, such as a Telugu-to-Telugu dictionary, for word sense disambiguation in Telugu?


I would like to hear from you. Users are welcome to add comments on the tools, provide suggestions, and report bugs.


List of nouns in Tamil

Hi Siva,

I think it's really helpful that you have set up a website and have shared the tools you have developed.

I am developing a tool for word-level translation of Tamil. One of the features I am using is a list of co-occurring nouns. I wasn't able to find a good POS tagger for this task, so I looked for a list of Tamil nouns instead, ideally from an online dictionary I could extract it from. However, I only found web interfaces that let me query individual words. Do you know anywhere I can get a good list? Thanks for your time.


Tamil Wordlist and POS Tagger

Hi Arun,

Thanks for your encouraging words. Regarding your question, the wordlist and the co-occurring words of any word are easy to obtain using Sketch Engine. Co-occurring words are not compiled yet, but I can compile them if they are really important; e.g., the co-occurring words of "house" are

The wordlist functionality is ready now. You need to register an account with Sketch Engine to access this wordlist. You can register for a free account and access the wordlist, but I would appreciate it if you bought an account if the results are useful to you; Sketch Engine has invested many hours of human effort in collecting these corpora. A sample wordlist of the top 100 words looks like this in Sketch Engine

Register for an account and log in to Sketch Engine. Select the corpus named TamilWaC, then click the wordlist function in the left-hand menu to get a list of words.
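If you want a rough offline approximation outside Sketch Engine, you can compute a frequency wordlist and window-based co-occurrence counts from any plain-text corpus, such as a cleaned Wikimedia dump. This is a minimal Python sketch under stated assumptions: the function names, the whitespace tokenizer, and the window size are all hypothetical choices. It is not how Sketch Engine works; its word sketches, for example, are built from grammatical relations and association scores rather than raw window counts.

```python
import string
from collections import Counter


def tokenize(text):
    # Split on whitespace (keeps Tamil vowel signs and viramas, which are
    # combining marks, attached to their words), then strip ASCII
    # punctuation from the edges of each token.
    tokens = (t.strip(string.punctuation) for t in text.split())
    return [t for t in tokens if t]


def wordlist(sentences, top_n=100):
    """Return the top_n most frequent tokens as (word, count) pairs."""
    counts = Counter()
    for s in sentences:
        counts.update(tokenize(s))
    return counts.most_common(top_n)


def cooccurrences(sentences, target, window=2):
    """Count tokens appearing within `window` positions of `target`."""
    counts = Counter()
    for s in sentences:
        toks = tokenize(s)
        for i, tok in enumerate(toks):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            counts.update(toks[j] for j in range(lo, hi) if j != i)
    return counts


# Tiny English example; for Tamil, read sentences from a corpus file instead.
sample = ["the house is big", "a big house.", "the house"]
print(wordlist(sample, 1))                       # [('house', 3)]
print(cooccurrences(sample, "house", window=1))  # Counter({'the': 2, 'is': 1, 'big': 1})
```

Raw counts favour very frequent function words. For a noun list you would still need to filter the output, for example with a POS tagger once one is available for Tamil.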

Regarding a POS tagger, I have not built one for Tamil yet (it is on my to-do list). You can download the IIIT tagger here and give it a try. Let me know how it goes.

If you need a Tamil corpus, you can download the Wikimedia corpus, but you will need to clean it a bit. Since your aim is to build translation lists, Tamil Wiktionary may also help.

All the best,

Hi, do you have any resources for the Tulu language?
Thank you.

Tulu POS Tagger

You may try the Kannada resources for Tulu. To collect a Tulu corpus, you can try BootCaT.
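BootCaT bootstraps a corpus from the web. It combines a small set of seed terms into random tuples, sends each tuple to a search engine as a query, and then downloads and cleans the returned pages. The Python sketch below covers only the first step, generating the query tuples. The function name, its parameters, and the placeholder seeds are hypothetical, and this is not BootCaT's own code. Searching, downloading, and cleaning are not shown.

```python
import itertools
import random


def seed_tuples(seeds, tuple_len=3, n_tuples=10, rng_seed=0):
    """Randomly combine distinct seed terms into space-separated query strings."""
    unique = sorted(set(seeds))
    combos = list(itertools.combinations(unique, tuple_len))
    random.Random(rng_seed).shuffle(combos)  # fixed seed for reproducible queries
    return [" ".join(c) for c in combos[:n_tuples]]


# Placeholders only: replace with real Tulu seed words.
seeds = ["seed1", "seed2", "seed3", "seed4", "seed5"]
for query in seed_tuples(seeds, n_tuples=4):
    print(query)
```

Combining several seeds per query pushes the search engine toward pages that contain many in-language words, rather than pages that merely mention one of them.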
