Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. Evgeniy Gabrilovich and Shaul Markovitch. IJCAI 2007
- Comments
- Classifies work in the field into three main directions:
- text fragments as bags of words in vector space (distributional similarity)
- text fragments as bags of concepts (using Latent Semantic Analysis)
- using lexical resources (WordNet etc.); these also use concepts, but they are not based on world knowledge, work only at the level of individual words, and rely on human-organized knowledge
- Distinguishes similarity and relatedness, e.g.
- "Budanitsky and Hirst [2006] argued that the notion of relatedness is more general than that of similarity, as the former subsumes many different kind of specific relations, including meronymy, antonymy, functional association, and others. They further maintained that computational linguistics applications often require measures of relatedness rather than the more narrowly defined measures of similarity."
- "When only the similarity relation is considered, using lexical resources was often successful enough, ... However, when the entire language wealth is considered in an attempt to capture more general semantic relatedness, lexical techniques yield substantially inferior results"
- Interesting papers
- Statistical methods
- S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. JASIS 1990
- Lillian Lee. Measures of distributional similarity. ACL 1999.
- Ido Dagan, Lillian Lee, and Fernando C. N. Pereira. Similarity-based models of word cooccurrence probabilities. ML 1999
- Expert-based methods (e.g. using WordNet)
- Alexander Budanitsky and Graeme Hirst. Evaluating WordNet-based measures of lexical semantic relatedness. CL 2006
- Mario Jarmasz. Roget’s thesaurus as a lexical resource for natural language processing. Master's Thesis 2003
- Others
- Justin Zobel and Alistair Moffat. Exploring the similarity space. ACM SIGIR Forum, 1998. [for justifying the cosine metric?]
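A minimal sketch of the first direction listed above (text fragments as bags of words in vector space), compared with the cosine metric. This is an illustration for these notes, not code from any of the papers: the helper names `bag_of_words` and `cosine` are hypothetical, and whitespace tokenization with raw term counts is an assumption (real systems typically use TF-IDF or similar weights).

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a text fragment to a sparse vector of raw term counts.

    Assumption: lowercase + whitespace split stands in for real tokenization.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty fragment is treated as unrelated to everything
    return dot / (norm_a * norm_b)


score = cosine(bag_of_words("bank money loan"), bag_of_words("bank river"))
print(round(score, 3))
```

Since `cosine` only needs dict-like sparse vectors, the same function would apply to a bag-of-concepts representation, with concept identifiers as keys and concept weights as values.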
BabelRelate! A Joint Multilingual Approach to Computing Semantic Relatedness. Roberto Navigli and Simone Paolo Ponzetto. AAAI 2012
- Key claims
- Our approach is based on ... a ... multilingual knowledge base, which is used to compute semantic graphs in a variety of languages. ... information from these graphs is then combined to produce ... disambiguated translations [which] are connected by means of strong semantic relations.
- ... what we explore here is the joint contribution obtained by using a multilingual knowledge base for this [(cross language semantic relatedness)] task.
- Given a pair of words in two languages we use BabelNet to collect their translations, compute semantic graphs in a variety of languages, and then combine the empirical evidence from these different languages by intersecting their respective graphs.
- Interesting papers
- Hassan, S., and Mihalcea, R. Cross-lingual semantic relatedness using encyclopedic knowledge. EMNLP 2009 (introduces a knowledge-based approach to computing semantic relatedness across different languages)
- Agirre, E.; Alfonseca, E.; Hall, K.; Kravalova, J.; Pasca, M.; Soroa, A. A study on similarity and relatedness using distributional and WordNet-based approaches. NAACL-HLT 2009 (seminal finding that knowledge-based approaches to semantic relatedness can compete with and even outperform distributional methods in a cross-lingual setting)
- Nastase, V.; Strube, M.; Börschinger, B.; Zirn, C.; and Elghafari, A. WikiNet: A very large scale multi-lingual concept network. LREC 2010.
- de Melo, G., and Weikum, G. MENTA: Inducing multilingual taxonomies from Wikipedia. CIKM 2010
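The graph-intersection step in the key claims above can be pictured with a toy sketch. Everything here is an assumption made for illustration: the per-language graphs are plain lists of undirected edges, the concept identifiers (`bank#1` etc.) are made up, and `intersect_graphs` is a hypothetical helper, not the paper's algorithm or a BabelNet API.

```python
def intersect_graphs(graphs):
    """Keep only the edges that appear in every per-language graph.

    Each graph is an iterable of (concept, concept) pairs; edges are
    treated as undirected, so (a, b) and (b, a) count as the same edge.
    """
    edge_sets = [{frozenset(edge) for edge in graph} for graph in graphs]
    if not edge_sets:
        return set()
    return set.intersection(*edge_sets)


# Toy data: invented edges for two languages (not taken from BabelNet).
graphs_by_language = {
    "en": [("bank#1", "money#1"), ("bank#1", "river#1")],
    "it": [("money#1", "bank#1"), ("bank#1", "loan#1")],
}

shared = intersect_graphs(graphs_by_language.values())
print(shared)
```

Only the edge supported by both languages survives, which is the intuition behind combining evidence from several languages.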
WikiRelate! Computing Semantic Relatedness Using Wikipedia. Michael Strube and Simone Paolo Ponzetto. AAAI 2006
- Key ideas
- Use the Wikipedia category hierarchy instead of the WordNet hierarchy.
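A toy sketch of how a hierarchy-based measure could be computed over a category graph instead of WordNet. This is not the paper's implementation: the category names are invented, `shortest_path_length` is a hypothetical helper, and path length is only one of the measures that can be defined on such a hierarchy.

```python
from collections import deque


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Invented child -> parent category links, for illustration only.
child_to_parents = {
    "Cars": ["Vehicles"],
    "Bicycles": ["Vehicles"],
    "Vehicles": ["Transport"],
}

# Build an undirected adjacency map so paths can go up and down the hierarchy.
category_graph = {}
for child, parents in child_to_parents.items():
    for parent in parents:
        category_graph.setdefault(child, set()).add(parent)
        category_graph.setdefault(parent, set()).add(child)

print(shortest_path_length(category_graph, "Cars", "Bicycles"))
```

A shorter path between the categories of two articles is read as higher relatedness.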