Gianluca Demartini
building Data Scientists since 2014

From People to Entities: Typed Search in the Enterprise and the Web

Gianluca Demartini, Ph.D. on April 6th 2011

The full Ph.D. thesis is available for download here. (downloaded 9036 times since April 2011)

Since January 2014: Also available as a printed book in the Studies on the Semantic Web book series: purchase here

The slides presented during the Ph.D. defense are available for download here. (downloaded 3823 times since April 2011)

Abstract

The exponential growth of digital information available in Enterprises and on the Web creates the need for search tools that can respond to the most sophisticated informational needs. Retrieving relevant documents is no longer enough: finding entities rather than just textual resources provides great support to the end user both on the Web and in Enterprises. Many user tasks would be simplified if Search Engines supported typed search and returned entities instead of just Web pages. For example, an executive who tries to solve a problem needs to find people in the company who are knowledgeable about a certain topic. Aggregation of information spread over different documents is a key aspect in this process. Finding experts is a problem mostly considered in the Enterprise setting, where teams for new projects need to be built and problems need to be solved by the right people. In the first part of the thesis, we propose a model for expert finding based on the well-established vector space model for Information Retrieval and investigate its effectiveness. We can define Entity Retrieval by generalizing the expert finding problem to any entity. In Entity Retrieval the goal is to rank entities according to their relevance to a query (e.g., "Countries where I can pay in Euro"); the set of entities to be ranked is assumed to be loosely defined by a generic category, given in the query itself (e.g., countries), or by some example entities (e.g., Italy, Germany, France). In the second part of the thesis, we investigate different methods based on Semantic Web and Natural Language Processing techniques for solving these tasks both in Wikipedia and, more generally, on the Web. Evaluation is a critical aspect of Information Retrieval. We contributed to the field of Information Retrieval evaluation by organizing an evaluation initiative for Entity Retrieval. Opinions and other relevant information about entities can be provided by different sources in different contexts.
News articles report on events in which entities are involved. In such a setting the temporal dimension is critical, as news stories develop over time: new entities appear in the story and others cease to be relevant. In the third part of this thesis, we study the problem of Entity Retrieval for news applications and the importance of the news trail history (i.e., past related articles) to determine the relevant entities in current articles. We also study opinion evolution about entities. In recent years, the blogosphere has become a vital part of the Web, covering a variety of different points of view and opinions on political and event-related topics such as immigration, election campaigns, or economic developments. We propose a method for automatically extracting public opinion about specific entities from the blogosphere. In summary, we develop methods to find entities that satisfy the user's need by aggregating knowledge from different sources, and we study how entity relevance and opinions evolve over time.
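
To make the expert-finding idea in the abstract concrete, here is a minimal Python sketch of one way to rank people in a vector space: the documents associated with each person are aggregated into a single TF-IDF profile vector, and people are ranked by cosine similarity to the query. This is only an illustration under stated assumptions, not the model proposed in the thesis; the function names, the TF-IDF weighting, the whitespace tokenizer, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_experts(documents, query):
    """Rank people by similarity between the query and their aggregated profile.

    documents: list of (person, text) pairs (hypothetical input format).
    Returns a list of (person, score) pairs, highest score first.
    """
    # Inverse document frequency computed over the whole collection.
    n_docs = len(documents)
    df = Counter()
    for _, text in documents:
        df.update(set(tokenize(text)))
    idf = {term: math.log(1 + n_docs / count) for term, count in df.items()}

    # Aggregate evidence spread over several documents into one vector per person.
    profiles = defaultdict(Counter)
    for person, text in documents:
        for term, tf in Counter(tokenize(text)).items():
            profiles[person][term] += tf * idf[term]

    # Represent the query in the same space; unseen terms get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}

    scores = {person: cosine(q, vec) for person, vec in profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy collection (invented for this example).
docs = [
    ("alice", "entity retrieval in wikipedia"),
    ("alice", "ranking entities with semantic web techniques"),
    ("bob", "database indexing and storage"),
    ("carol", "news articles and opinion mining"),
]
ranked = rank_experts(docs, "entity retrieval")
print(ranked[0][0])
```

In this toy collection only alice's documents share terms with the query, so she is ranked first and the other two people receive a score of zero.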


© 2011 - Contact: Gianluca Demartini   L3S

Gianluca Demartini, Ph.D.
School of Information Technology and Electrical Engineering,
University of Queensland

GP South Building, Staff House Road
St Lucia
QLD 4072 Australia

Office: +61 7 336 58325
demartini@acm.org

Dr. Gianluca Demartini is an Associate Professor in Data Science at the University of Queensland, School of Electrical Engineering and Computer Science. His main research interests are Information Retrieval, Semantic Web, and Human Computation. His research has been supported by the Australian Research Council (ARC), the Swiss National Science Foundation (SNSF), the EU H2020 framework program, the UK Engineering and Physical Sciences Research Council (EPSRC), Facebook, Google, and the Wikimedia Foundation. He received Best Paper Awards at the ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR) in 2023, the AAAI Conference on Human Computation and Crowdsourcing (HCOMP) in 2018, and the European Conference on Information Retrieval (ECIR) in 2016, as well as the Best Short Paper Award at ECIR in 2020 and the Best Demo Award at the International Semantic Web Conference (ISWC) in 2011. He has authored more than 200 peer-reviewed scientific publications, including papers at major venues such as WWW, ACM SIGIR, VLDBJ, ISWC, and ACM CHI.
He has given invited talks, tutorials, and keynotes at academic conferences (e.g., ISWC, ICWSM, WebScience, and the RuSSIR Summer School), companies (e.g., Facebook), and Dagstuhl seminars. He has been a senior member of the ACM since 2020 and an ACM Distinguished Speaker since 2015, and he was a TEDx speaker in 2019.
He serves as associate editor for the Transactions on Graph Data and Knowledge (TGDK) Journal and as an editorial board member for the Information Retrieval journal. He is a steering committee member for the AAAI HCOMP conference. He was PC Chair for the ACM Conference on Research and Development in Information Retrieval (SIGIR) in 2022. He was General co-Chair for the ACM International Conference on Information and Knowledge Management (CIKM) 2021. He was Crowdsourcing and Human Computation Track co-Chair at WWW 2018 and co-chair for the Human Computation and Crowdsourcing Track at ESWC 2015. He has been Senior Program Committee member for, among others, the ACM Conference on Research and Development in Information Retrieval (SIGIR), the ACM Web Search and Data Mining (WSDM) Conference, the International Joint Conference on Artificial Intelligence (IJCAI), the AAAI Conference on Human Computation and Crowdsourcing (HCOMP), and the International Conference on Web Engineering (ICWE). He co-organized several workshops and tutorials at international conferences as well as the Entity Ranking Track at the Initiative for the Evaluation of XML Retrieval in 2008 and 2009.
Before joining the University of Queensland, he was a Lecturer at the University of Sheffield in the UK, a post-doctoral researcher at the eXascale Infolab at the University of Fribourg in Switzerland, a visiting researcher at UC Berkeley, a junior researcher at the L3S Research Center in Germany, and an intern at Yahoo! Research in Spain. In 2011, he obtained a Ph.D. in Computer Science at the Leibniz University of Hanover, focusing on Semantic Search.
