Constructing Domain Templates with Concept Hierarchy as Background Knowledge

Authors

  • M. Trampuš, Jozef Stefan Institute
  • D. Mladenic, Jozef Stefan Institute

DOI:

https://doi.org/10.5755/j01.itc.43.4.6899

Keywords:

text mining, open-domain information extraction, schema induction, graph mining

Abstract

In recent years, both academia and industry have pushed to convert unstructured data, most commonly text, into structured representations. A relatively unexplored challenge in this area is domain template construction: given a domain, we wish to find the attributes with which texts from that domain can be meaningfully represented. For example, given the domain of news reports on bombing attacks, we would like to identify the existence of concepts like "victim" and "perpetrator". We introduce two new methods for this task. Both operate on semantic representations of the input data and exploit the hierarchical organization of features, which prior work has not explored. We evaluate on multiple datasets and domains and achieve performance at least comparable to a state-of-the-art method, while additionally identifying fine-grained type information for properties: for example, the bombing attack victim is found to be of type "defender" (policeman, guard, ...). We also provide the first fully documented evaluation methodology, publicly available labeled datasets, and gold-standard outputs for this research problem, to support future work in the area.
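The "victim is of type defender" example rests on generalizing the observed fillers of a slot up a concept hierarchy. The following sketch illustrates only that general idea, assuming a toy child-to-parent hierarchy. The hierarchy, the `generalize` helper, and the `min_support` threshold are all hypothetical and are not the authors' method, which operates on semantic representations of the input texts.

```python
from collections import Counter

# Hypothetical toy concept hierarchy (child -> parent); not taken from the paper.
PARENT = {
    "policeman": "defender",
    "guard": "defender",
    "soldier": "defender",
    "defender": "person",
    "civilian": "person",
    "person": "entity",
}


def ancestors(concept):
    """Return the concept followed by all of its ancestors, most specific first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(fillers, min_support=0.75):
    """Return the most specific concept covering at least min_support of the fillers.

    Returns None if no concept reaches the required support.
    """
    counts = Counter()
    for filler in fillers:
        # Each filler supports itself and every concept above it.
        for concept in ancestors(filler):
            counts[concept] += 1
    n = len(fillers)
    candidates = [c for c, k in counts.items() if k / n >= min_support]
    # "Most specific" here means deepest in the hierarchy (longest ancestor chain).
    return max(candidates, key=lambda c: len(ancestors(c)), default=None)
```

For example, `generalize(["policeman", "guard", "soldier", "civilian"])` returns `"defender"`, since three of the four hypothetical victim fillers fall under it, while raising `min_support` to `1.0` falls back to the broader `"person"`. The threshold trades specificity of the inferred type against how many observed fillers it must explain.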

Published

2014-12-16

Section

Articles