Wednesday, January 30, 2013

A Relational Model of Semantic Similarity between Words using Automatically Extracted Lexical Pattern Clusters from the Web. Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka. EMNLP 2009

  • Key ideas
    • Past work modelled similarity between two words in terms of context overlap, where the context consists of other words known to be closely related to the word (derived either from a corpus or from an ontology such as WordNet). In contrast, the authors claim:
      • We propose a relational model to compute the semantic similarity between two words. Intuitively, if the relations that exist between a and b are typical relations that hold between synonymous word pairs, then we get a high similarity score for a and b.
    • Relations are defined as lexical patterns such as "X is a Y". For each word pair, compute a feature vector n with one weight per pattern (relation). Do this for a set of seed pairs, and compute a "prototype" vector p. A new word pair is declared similar if its vector is close to the prototype vector (i.e. if n^T p is high).
    • Many patterns represent the same or similar relations. They address this problem at two levels:
      • They cluster similar patterns together, and use the clusters as features (instead of patterns).
      • Since the clusters themselves may also be similar, use a correlation matrix C in the dot product, i.e. instead of n^T p, use n^T C p.
  • Comments
    • Presents a view of the semantic similarity (SS) task as an integral part of various tasks, including synonym generation (same as lexicon induction?), thesaurus generation, WSD, IR (query expansion), cluster labeling, etc.
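
To make the scoring under "Key ideas" concrete, here is a minimal Python sketch of the n^T C p computation. It is only an illustration under stated assumptions, not the authors' implementation: the pattern strings, weights, cluster assignments, and the matrix C are made-up toy values, the function names are hypothetical, and averaging the seed vectors to get the prototype is my own assumption (the notes above only say that a prototype is computed).

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def cluster_features(pattern_weights: Dict[str, float],
                     pattern_to_cluster: Dict[str, int],
                     num_clusters: int) -> Vector:
    """Sum pattern weights within each cluster to get a cluster-level vector."""
    features = [0.0] * num_clusters
    for pattern, weight in pattern_weights.items():
        cluster = pattern_to_cluster.get(pattern)
        if cluster is not None:  # patterns outside every cluster are ignored
            features[cluster] += weight
    return features


def prototype(seed_vectors: Sequence[Vector]) -> Vector:
    """Average the seed-pair vectors (an assumed aggregation for this sketch)."""
    count = len(seed_vectors)
    return [sum(column) / count for column in zip(*seed_vectors)]


def relational_score(n: Vector, C: Matrix, p: Vector) -> float:
    """Compute n^T C p; with C set to the identity this is the plain n^T p."""
    Cp = [sum(c_ij * p_j for c_ij, p_j in zip(row, p)) for row in C]
    return sum(n_i * cp_i for n_i, cp_i in zip(n, Cp))


# Toy data (all values hypothetical): three patterns grouped into two clusters.
pattern_to_cluster = {
    "X is a Y": 0,
    "X is a kind of Y": 0,
    "X also known as Y": 1,
}

seed_vectors = [
    cluster_features({"X also known as Y": 4.0}, pattern_to_cluster, 2),
    cluster_features({"X also known as Y": 2.0, "X is a Y": 2.0},
                     pattern_to_cluster, 2),
]
p = prototype(seed_vectors)  # [1.0, 3.0]

# A new word pair that only shows up with cluster-0 patterns.
n = cluster_features({"X is a Y": 1.0, "X is a kind of Y": 1.0},
                     pattern_to_cluster, 2)  # [2.0, 0.0]

identity = [[1.0, 0.0], [0.0, 1.0]]
C = [[1.0, 0.5], [0.5, 1.0]]  # assumed correlation between the two clusters

print(relational_score(n, identity, p))  # 2.0
print(relational_score(n, C, p))         # 5.0
```

In this toy case the new pair shares no weight with the prototype's dominant cluster, so the plain dot product is low. The off-diagonal entries of C let cluster 0 borrow credit from the correlated cluster 1, which is the effect the second level of smoothing is meant to achieve.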
