A Relational Model of Semantic Similarity between Words using Automatically Extracted Lexical Pattern Clusters from the Web. Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka. EMNLP 2009
- Key ideas
- Past work modelled similarity between two words in terms of context overlap, where the context consisted of other words known to be closely related to the word (derived either from a corpus or from an ontology like WordNet). In contrast, the authors claim:
- We propose a relational model to compute the semantic similarity between two words. Intuitively, if the relations that exist between a and b are typical relations that hold between synonymous word pairs, then we get a high similarity score for a and b.
- Define relations as lexical patterns such as "X is a Y". For each word pair, compute a feature vector with a weight for each pattern (relation). Do this for a set of seed synonym pairs, and average to get a "prototype" vector. For a new word pair, declare it similar if its vector n is similar to the prototype p (i.e. n^T p is high); see the first sketch after this list.
- Many patterns represent the same or similar relations. They address this at two levels:
- They cluster similar patterns together and use the clusters as features (instead of individual patterns).
- Since the clusters may themselves be similar, they use an inter-cluster correlation matrix in the dot product, i.e. instead of n^T p, they use n^T C p (second sketch below).
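A minimal sketch of the prototype-based scoring described above. This is not the paper's implementation: the pattern extraction, the snippet source, and the raw-count weighting are placeholder assumptions.

```python
from collections import Counter

def pattern_features(pair, snippets):
    """Count lexical patterns (e.g. "X is a Y") observed for a word pair.
    Pattern extraction here is a stand-in: real patterns would be mined
    from web snippets where the two words co-occur."""
    x, y = pair
    counts = Counter()
    for s in snippets:
        if x in s and y in s:
            # replace the two words with slots to obtain a pattern string
            counts[s.replace(x, "X").replace(y, "Y")] += 1
    return counts

def to_vector(counts, vocab):
    """Map a pattern Counter onto a fixed pattern vocabulary."""
    return [float(counts.get(p, 0)) for p in vocab]

def prototype(seed_pairs, snippets, vocab):
    """Average the feature vectors of known synonymous seed pairs."""
    vecs = [to_vector(pattern_features(p, snippets), vocab) for p in seed_pairs]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def similarity(pair, proto, snippets, vocab):
    """Score a new pair as the dot product n^T p with the prototype."""
    n_vec = to_vector(pattern_features(pair, snippets), vocab)
    return sum(a * b for a, b in zip(n_vec, proto))
```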
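And a sketch of the two cluster-level fixes: pattern weights are collapsed into cluster-level features (here via a given pattern-to-cluster map, not the paper's clustering algorithm), and the plain dot product is replaced by n^T C p with an inter-cluster correlation matrix C. The toy data in the usage example is made up for illustration.

```python
import numpy as np

def cluster_vector(pattern_counts, pattern_to_cluster, num_clusters):
    """Collapse pattern-level counts into cluster-level features by
    summing the weights of all patterns assigned to the same cluster."""
    v = np.zeros(num_clusters)
    for pattern, count in pattern_counts.items():
        c = pattern_to_cluster.get(pattern)
        if c is not None:
            v[c] += count
    return v

def correlated_similarity(n_vec, proto, C):
    """Similarity n^T C p: off-diagonal entries of C let related clusters
    reinforce each other instead of being treated as independent."""
    return float(n_vec @ C @ proto)

# usage with toy data
pattern_to_cluster = {"X is a Y": 0, "X , also known as Y": 0, "X or Y": 1}
n_counts = {"X is a Y": 3, "X or Y": 1}
p_counts = {"X , also known as Y": 2, "X or Y": 2}
n_vec = cluster_vector(n_counts, pattern_to_cluster, 2)
p_vec = cluster_vector(p_counts, pattern_to_cluster, 2)
C = np.array([[1.0, 0.4], [0.4, 1.0]])  # assumed inter-cluster correlations
print(correlated_similarity(n_vec, p_vec, C))
```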
- Comments
- Presents a view of the semantic similarity (SS) task as an integral part of various applications, including synonym generation (same as lexicon induction?), thesaurus generation, WSD, query expansion in IR, cluster labeling, etc.