My research focuses on Computational Linguistics and Natural Language Processing and, more broadly, Text Mining: how to automatically extract knowledge and information from text and link it to the real world. This covers a wide range of specific topics on which I have worked in recent years:
- Question Answering
- Information Extraction
- Information Retrieval
- Anaphora Resolution
- Textual Entailment
- Word Sense Disambiguation
- Sentiment Analysis
- Opinion Mining
I have worked on all of these topics through different research projects. The main projects in which I have been involved are listed below, starting with the most recent:
ULM1 (Spinoza Prize project Vossen SPI 30-673, 2014-2017, Vrije University of Amsterdam, The Netherlands)
The goal of the Spinoza project Understanding of Language by Machines (ULM) is to develop computer models that can assign deeper meaning to language, approximating human understanding, and to use these models to automatically read and understand text. Current approaches to natural language understanding treat language as a closed world of relations between words. Words and texts are, however, highly ambiguous and vague. People do not notice this ambiguity when using language within their social communicative context. This project tries to better understand the scope and complexity of this ambiguity and how to model social communicative contexts to help resolve it.
OpeNER (European project 2012-2014, Vrije University of Amsterdam, The Netherlands)
The main goal was to provide a set of ready-to-use tools for natural language processing tasks, free and easy to adapt, so that academia, research groups, and small and medium enterprises could integrate them into their workflows. More precisely, OpeNER aimed to detect and disambiguate entity mentions and to perform sentiment analysis and opinion detection on texts, for example to extract the sentiment and opinions of customers about a certain resource (e.g. hotels and accommodations) from Web reviews.
DutchSemCor (Dutch national project 2011-2012, University of Tilburg and Vrije University of Amsterdam, The Netherlands)
The goal of DutchSemCor was to deliver a one-million-word Dutch corpus fully tagged with senses and domain tags from the Cornetto database. Of this corpus, 250K words were manually tagged. The remainder was automatically tagged using three different word sense disambiguation (WSD) systems and was validated by human annotators.
DIIM (Valencian regional project 2009-2011, University of Alicante, Spain)
The topic of the project was the development of intelligent and interactive techniques for text mining.
QALL-ME (European project 2006-2009, University of Alicante, Spain)
The general objective was to establish a shared infrastructure for multilingual and multimodal open-domain Question Answering for mobile phones.
R2D2 (Spanish national project 2004-2006, University of Alicante, Spain)
This project aimed to evaluate and develop Question Answering and Document Retrieval systems in multilingual scenarios.
3LB (Spanish national project 2003-2004, University of Alicante, Spain)
The goal of this project was to build three treebanks, or syntactically annotated corpora, for Spanish, Basque and Catalan. In addition, semantic annotation using WordNet as the sense repository and anaphoric annotation of elliptical elements were carried out. Automatic modules were developed to perform these tasks.