A Relational Model of Semantic Similarity between Words using Automatically Extracted Lexical Pattern Clusters from the Web. Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka. EMNLP 2009
- Key ideas
- Past work modelled similarity between two words in terms of context overlap, where the context consisted of other words known to be closely related to the word (derived either from a corpus or from an ontology like WordNet). In contrast, the authors claim:
- We propose a relational model to compute the semantic similarity between two words. Intuitively, if the relations that exist between a and b are typical relations that hold between synonymous word pairs, then we get a high similarity score for a and b.
- Define relations as lexical patterns such as "X is a Y". For each word pair, compute a feature vector with a weight for each pattern (relation). Do this for a set of seed synonym pairs, and average to get a "prototype" vector. For a new word pair, declare it similar if its vector n is similar to the prototype p (i.e. n^T p is high); see the first sketch after this list.
- Many patterns represent the same or similar relations. They address this at two levels:
- They cluster similar patterns together and use the clusters as features (instead of individual patterns).
- Since the clusters may themselves be similar, they use an inter-cluster correlation matrix in the dot product, i.e. instead of n^T p, they use n^T C p (second sketch below).
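A minimal sketch of the prototype-based scoring described above. This is not the paper's implementation: the pattern extraction, the snippet source, and the raw-count weighting are placeholder assumptions.

```python
from collections import Counter

def pattern_features(pair, snippets):
    """Count lexical patterns (e.g. "X is a Y") observed for a word pair.
    Pattern extraction here is a stand-in: real patterns would be mined
    from web snippets where the two words co-occur."""
    x, y = pair
    counts = Counter()
    for s in snippets:
        if x in s and y in s:
            # replace the two words with slots to obtain a pattern string
            counts[s.replace(x, "X").replace(y, "Y")] += 1
    return counts

def to_vector(counts, vocab):
    """Map a pattern Counter onto a fixed pattern vocabulary."""
    return [float(counts.get(p, 0)) for p in vocab]

def prototype(seed_pairs, snippets, vocab):
    """Average the feature vectors of known synonymous seed pairs."""
    vecs = [to_vector(pattern_features(p, snippets), vocab) for p in seed_pairs]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def similarity(pair, proto, snippets, vocab):
    """Score a new pair as the dot product n^T p with the prototype."""
    n_vec = to_vector(pattern_features(pair, snippets), vocab)
    return sum(a * b for a, b in zip(n_vec, proto))
```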
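And a sketch of the two cluster-level fixes: pattern weights are collapsed into cluster-level features (here via a given pattern-to-cluster map, not the paper's clustering algorithm), and the plain dot product is replaced by n^T C p with an inter-cluster correlation matrix C. The toy data in the usage example is made up for illustration.

```python
import numpy as np

def cluster_vector(pattern_counts, pattern_to_cluster, num_clusters):
    """Collapse pattern-level counts into cluster-level features by
    summing the weights of all patterns assigned to the same cluster."""
    v = np.zeros(num_clusters)
    for pattern, count in pattern_counts.items():
        c = pattern_to_cluster.get(pattern)
        if c is not None:
            v[c] += count
    return v

def correlated_similarity(n_vec, proto, C):
    """Similarity n^T C p: off-diagonal entries of C let related clusters
    reinforce each other instead of being treated as independent."""
    return float(n_vec @ C @ proto)

# usage with toy data
pattern_to_cluster = {"X is a Y": 0, "X , also known as Y": 0, "X or Y": 1}
n_counts = {"X is a Y": 3, "X or Y": 1}
p_counts = {"X , also known as Y": 2, "X or Y": 2}
n_vec = cluster_vector(n_counts, pattern_to_cluster, 2)
p_vec = cluster_vector(p_counts, pattern_to_cluster, 2)
C = np.array([[1.0, 0.4], [0.4, 1.0]])  # assumed inter-cluster correlations
print(correlated_similarity(n_vec, p_vec, C))
```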
- Comments
- Presents a view of the semantic similarity (SS) task as an integral part of various applications, including synonym generation (same as lexicon induction?), thesaurus generation, WSD, query expansion in IR, cluster labeling, etc.