Previous projects
AnnoMarket, Cloud-based Text Annotation Marketplace
A two-year European project funded by the European Commission through the Seventh Research Framework Program (FP7-SME), under Project No 296322. The project started in June 2012.
AnnoMarket aims to revolutionize the text annotation market by delivering an affordable, open marketplace for pay-as-you-go, cloud-based extraction resources and services in multiple languages.
Project website…
Our contribution:
- Running large-scale web crawls and focused web crawls.
- Providing multilingual web corpus resources.
DOPA, Data Supply Chains for Pools, Services and Analytics in Economics and Finance
A two-year European project funded by the European Commission through the Seventh Research Framework Program (FP7-SME), under Project No 296448. The project started in May 2012.
DOPA achieves breakthroughs in large-scale, high-quality information sourcing and processing on a distributed platform: it helps bring together related data from disparate sources through automated entity linkage, and it makes sense of this wealth of data through visualization tools.
Project website…
Our contribution:
- Creating large-scale multilingual time series of economic and financial information from the Web and online social networks.
- Strictly respecting the legal framework, intellectual property, and privacy rights.
- Selecting active sources (RSS feeds, news, forums, blogs, etc.) that focus on various aspects relevant to this domain (e-reputation, customer opinion, stock trading, company news, etc.).
- Achieving large-scale coverage without forgoing the quality of the data.
TrendMiner, Large-scale, Cross-lingual Trend Mining and Summarization of Real-time Media Streams
A three-year European project funded by the European Commission through the Seventh Research Framework Program (FP7-ICT), under Project No 287863. The project started in November 2011.
The goal of this project is to deliver innovative, portable, open-source real-time methods for cross-lingual mining and summarization of large-scale stream media. This is achieved through an interdisciplinary approach, combining deep linguistic methods from text processing, knowledge-based reasoning from web science, machine learning, economics, and political science. Scalability and affordability will be addressed through a cloud-based infrastructure for real-time text mining from stream media.
Project website…
Our contribution:
- Providing a scalable infrastructure to partners, with support for integration and experimentation.
- Designing and developing an application-aware crawler mechanism for social media.