1. GraphParser: An Ungrounded and Grounded Semantic Parser
2. English Compound Noun Compositionality Dataset
3. Hindi POS Tagger
4. Hindi Dependency Parser
5. Hindi WordNet in Python
6. Kannada POS Tagger
7. Telugu POS Tagger
8. Indonesian and Malay Tools

GraphParser: An Ungrounded and Grounded Semantic Parser

GraphParser is a semantic parser that converts natural language sentences and questions into predicate-argument graphs, which can in turn be converted to logical queries and executed against the Freebase knowledge graph. You can read more about it in our paper, Large-scale Semantic Parsing without Question-Answer Pairs.

Semantic Parser Demo

Download the code and data

Compound Noun Compositionality Dataset

The compositionality dataset described in Reddy, McCarthy and Manandhar (2011, IJCNLP).
Alternate download link from Diana McCarthy

POS Taggers, Corpora, Lemmatizers, Morph Analyzers for Indian Languages

Most of these tools were developed using the methods described in Reddy and Sharoff (2011, CLIA @ IJCNLP). Some of the taggers are built using cross-lingual resources and some using monolingual resources. Please read the README of each tool for additional information.

This work is supported by Sketch Engine and the Intellitext project.

If you need resources for any other Indian languages, please contact me.

Kannada Tools

Download v2.0
Sample Output of the tagger
For the complete corpus described in the paper, please contact me. Alternate download link from Serge Sharoff

Telugu Tools

Download v3.0
Sample Output of the tagger
Project Page: https://bitbucket.org/sivareddyg/telugu-part-of-speech-tagger

Hindi Tools

Download v3.0
Sample Output of the tagger
Project Page: https://bitbucket.org/sivareddyg/hindi-part-of-speech-tagger

Indonesian and Malay Morphological Analyzer, Part-of-Speech (POS) Tagger, and Machine Translation System

With support from Sketch Engine, I have made a few contributions to the Apertium Indonesian-Malay language pair. All the tools can be downloaded from http://sourceforge.net/projects/apertium/files/apertium-id-ms/

Hindi WordNet in Python

Download v1.4
Other versions
Demo Program
Project Page: https://bitbucket.org/sivareddyg/python-hindi-wordnet

Hindi Dependency Parser

Download v2.0
Sample Output
Project Page: https://bitbucket.org/sivareddyg/hindi-dependency-parser


Tamil Wordlist and POS Tagger

Hi Arun,

Thanks for your encouraging words. Regarding your question, the wordlist and the co-occurring words of any word are easy to obtain using Sketch Engine. The co-occurring words are not compiled at the moment, but I can compile them if they are highly important to you. For example, the co-occurring words of "house" are at http://bit.ly/Hkj2bM

The wordlist functionality is ready now. You need to register an account with Sketch Engine to access the wordlist. You can register for a free account and access it, but I would appreciate it if you bought an account should the results prove beneficial, since Sketch Engine has invested many human hours in collecting these corpora. A sample wordlist of the top 100 words, as it looks in Sketch Engine, is at http://sivareddy.in/lcl/tamil_wordlist.html

Register for an account and log in to Sketch Engine. Select the corpus named TamilWaC, then click the wordlist option in the left-hand menu to get a list of words.
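If you would rather work offline on your own text, a frequency wordlist can be approximated with a few lines of Python. This is only a minimal sketch and not how Sketch Engine computes its wordlists: the function name `build_wordlist` and the whitespace tokenization are my assumptions, and a real Tamil pipeline would want a proper tokenizer.

```python
from collections import Counter
import string


def build_wordlist(text, top_n=100):
    """Return the top_n most frequent tokens in text as (word, count) pairs.

    Hypothetical helper for illustration only. Tokens are split on
    whitespace and stripped of surrounding ASCII punctuation, rather than
    matched with a regex word class, which can split Indic words at
    combining vowel signs.
    """
    tokens = (tok.strip(string.punctuation) for tok in text.split())
    counts = Counter(tok for tok in tokens if tok)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = "veedu maram veedu, maram. veedu"
    for word, count in build_wordlist(sample, top_n=2):
        print(word, count)
```

Running it on a cleaned corpus file instead of the sample string gives a rough counterpart to the top-100 list linked above, although the counts will differ from those of TamilWaC.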

Regarding the POS tagger, I have not built one for Tamil yet (it is on my to-do list). You can download the IIIT tagger here and give it a try: http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php Let me know how it goes.

If you need a Tamil corpus, you can download the Wikimedia dump, but you will need to clean it a bit: http://dumps.wikimedia.org/tawiki/20120321 Since your aim is to build translation lists, Tamil Wiktionary may also help: http://dumps.wikimedia.org/tawiktionary/20120323/

All the best,
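As a rough idea of what "cleaning" the dump text can involve, here is a minimal Python sketch that strips a few common wiki markup constructs from a page's text. The function name `clean_wiki_markup` and the set of patterns are my assumptions; this is a regex approximation and not a full MediaWiki parser, so tables, nested links and other constructs will need extra handling.

```python
import re


def clean_wiki_markup(text):
    """Strip some common MediaWiki markup from text (hypothetical helper)."""
    # Drop self-closing references, then references with content.
    text = re.sub(r"<ref[^>]*?/>", "", text)
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    # Drop any remaining HTML-like tags, keeping their inner text.
    text = re.sub(r"<[^>]+>", "", text)
    # Remove {{templates}}, innermost first, until none are left.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # [[target|label]] -> label, [[target]] -> target.
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    # [http://example.org label] -> label.
    text = re.sub(r"\[https?://\S+\s+([^\]]*)\]", r"\1", text)
    # Remove bold and italic quote markers.
    text = re.sub(r"'{2,}", "", text)
    # == Heading == -> Heading.
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.MULTILINE)
    # Collapse runs of spaces and tabs left behind by the removals.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    raw = "'''Veedu''' is a [[house|home]] {{cite|x}} in [[Tamil]].<ref>src</ref>"
    print(clean_wiki_markup(raw))
```

The patterns are applied in a fixed order (references, tags, templates, links, emphasis, headings) so that markup nested inside a template or reference is removed along with it.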

Tulu POS Tagger

You may try the Kannada resources for Tulu. To collect a Tulu corpus, you can try BootCaT: http://bootcat.sslmit.unibo.it/
