Research Notes: February 2013

Monday, February 11, 2013

Comparison of Semantic Similarity for Different Languages Using the Google n-gram Corpus and Second-Order Co-occurrence Measures. Colette Joubarne, Diana Inkpen. Advances in AI 2011

Claims

many languages lack sufficient corpora to achieve valid measures of semantic similarity.
manually-assigned similarity scores from one language can be transferred to another language,
automatic word similarity measure based on second-order co-occurrences in the Google n-gram corpus, for English, German, and French
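
A minimal sketch of what "second-order co-occurrence" means, assuming raw n-gram counts as context weights (the paper's actual weighting scheme is not recorded in these notes). The toy n-grams and the names `cooccurrence_vectors` and `cosine` are made up for illustration. Two words count as similar when they share context words, even if they never co-occur with each other:

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(ngrams):
    """Build a context-count vector for every word from (words, count) n-grams."""
    vectors = defaultdict(Counter)
    for words, count in ngrams:
        for i, word in enumerate(words):
            for j, context in enumerate(words):
                if i != j:
                    vectors[word][context] += count
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy n-gram counts (invented): "coffee" and "tea" never co-occur directly.
NGRAMS = [
    (("drink", "coffee"), 10),
    (("drink", "tea"), 8),
    (("hot", "coffee"), 5),
    (("hot", "tea"), 6),
    (("drive", "car"), 9),
]

vectors = cooccurrence_vectors(NGRAMS)
print(cosine(vectors["coffee"], vectors["tea"]))  # high: shared contexts
print(cosine(vectors["coffee"], vectors["car"]))  # no shared contexts
```

First-order association would score coffee/tea as unrelated here, since the pair has no joint count; the second-order vectors recover the link through "drink" and "hot".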

Semantic similarity estimation from multiple ontologies. Montserrat Batet, David Sánchez, Aida Valls, Karina Gibert. Appl Intell 2013

Claims

enable similarity estimation across multiple ontologies
solve missing values, when partial knowledge is available
capture the strongest semantic evidence that results in the most accurate similarity assessment, when dealing with overlapping knowledge

Key ideas

Consider sub-cases

both concepts appear in one ontology
concepts appear in different ontologies
missing concepts
etc.

requires a taxonomy structure (other relations not useful?)

Related work

mapping the local terms of distinct ontologies into an existent single one
creating a new ontology by integrating existing ones
compute the similarity between terms as a function of some ontological features
ontologies are connected by a new imaginary root node
matching concept labels of different ontologies
graph-based ontology alignment ... by means of path-based similarity measures.
combines path length and common specificity.

Experiments

general purpose and biomedical benchmarks of word pairs
baseline: related works in multi-ontology similarity assessment.

Friday, February 8, 2013

A Graph-Theoretic Framework for Semantic Distance. Vivian Tsang, Suzanne Stevenson. CL 2010

Problem: similarity of texts (not single words)
Claims

"[we do] integration of distributional and ontological factors in measuring semantic distance between two sets of concepts (mapped from two texts) [within a network flow formalism]"

Key ideas

"Our goal is to measure the distance between two subgraphs (representing two texts to be compared), taking into account both the ontological distance between the component concepts and their frequency distributions. To achieve this, we measure the amount of “effort” required to transform one profile to match the other graphically: The more similar they are, the less effort it takes to transform one into the other. (This view is similar to that motivating the use of “earth mover’s distance” in computer vision [Levina and Bickel 2001].)"
"[our] notion of semantic distance as transport effort of concept frequency over the relations (edges) of an ontology differs significantly from ... [using] concept vectors of frequency. ... our approach can [compare] texts that use related but non-equivalent concepts." (seems to the main argument in favor of graph-based iterative methods)
"viewed as a supply–demand problem, in which we find the minimum cost flow (MCF) from the supply profile to the demand profile ... Each edge ... has a cost ... Each node [has a] supply ... [or] demand ... The goal is to find a flow from supply nodes to demand nodes that satisfies the supply/demand constraints of each node and minimizes the overall “transport cost.”"

Comments

Requires an ontology
"Distributional" refers to term frequencies within the compared text (not in some corpus)

Interesting papers

Using network flows

Pang, Bo and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. ACL 2004
Barzilay, Regina and Mirella Lapata. Collective content selection for concept-to-text generation. HLT/EMNLP 2005

Mihalcea, Rada. Unsupervised large-vocabulary word sense disambiguation with graph-based algorithms for sequence data labeling. HLT/EMNLP 2005

Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation. Xianpei Han, Jun Zhao. ACL 2010

Claims

"proposes a reliable semantic relatedness measure between concepts ... which can capture both the explicit semantic relations between concepts and the implicit semantic knowledge embedded in [multiple] graphs and networks."

Key ideas

“two concepts are semantic related if they are both semantic related to the neighbor concepts of each other”

Interesting papers

Amigo, E., Gonzalo, J., Artiles, J. and Verdejo, F. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval 2008

Disambiguating Identity Web References using Web 2.0 Data and Semantics. Matthew Rowe, Fabio Ciravegna. Journal of Web Semantics 2010

Comments

Use ideas such as "Average First-Passage Time" of a graph
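
For reference, a small sketch of the average first-passage time, using the standard recurrence m(k|k) = 0 and m(k|i) = 1 + sum_j p(j|i) m(k|j) for i != k. It is solved here by plain fixed-point iteration, which assumes the target is reachable from every node; the function name and the toy chain are illustrative only, not taken from the paper.

```python
def average_first_passage_time(P, target, max_iterations=10_000, tol=1e-12):
    """Expected number of steps to first reach `target` from each node.

    P is a row-stochastic matrix: P[i][j] is the probability of stepping i -> j.
    """
    n = len(P)
    m = [0.0] * n
    for _ in range(max_iterations):
        new = [
            0.0 if i == target
            else 1.0 + sum(P[i][j] * m[j] for j in range(n) if j != target)
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new, m))
        m = new
        if delta < tol:
            break
    return m


# Random walk on the path graph 0 - 1 - 2.
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
print(average_first_passage_time(P, target=2))
```

Unlike shortest-path distance, this quantity grows when there are many ways to wander away from the target, which is why it behaves as a graph-wide (dis)similarity.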

Interesting papers

L. Lovasz, Random walks on graphs: A survey. Combinatorics 1993
M. Saerens, F. Fouss, L. Yen, P. Dupont, The principal components analysis of a graph, and its relationships to spectral clustering. ECML 2004

Thursday, February 7, 2013

A Random Graph Walk based Approach to Computing Semantic Relatedness Using Knowledge from Wikipedia. Ziqi Zhang, Anna Lisa Gentile, Lei Xia, José Iria, Sam Chapman. LREC 2010

Key ideas

Model many kinds of features on a graph
Convert edge weights into probabilities; use p(t)(i|j) to model relatedness (where t is the number of steps in the walk)
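
A minimal sketch of the two steps above, assuming a dense weight matrix and plain row normalization (the paper's own weighting of feature types is not reproduced here). The names `transition_matrix` and `t_step_probabilities` and the toy weights are hypothetical.

```python
def transition_matrix(weights):
    """Row-normalize non-negative edge weights so that P[j][i] = p(i | j)."""
    matrix = []
    for row in weights:
        total = sum(row)
        matrix.append([w / total if total else 0.0 for w in row])
    return matrix


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def t_step_probabilities(P, t):
    """Return P^t; entry [j][i] is p^(t)(i | j), the chance of being at i after t steps from j."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = matmul(result, P)
    return result


# Toy graph with three nodes and weighted edges 0-1 (weight 2) and 1-2 (weight 1).
WEIGHTS = [
    [0, 2, 0],
    [2, 0, 1],
    [0, 1, 0],
]
P = transition_matrix(WEIGHTS)
P2 = t_step_probabilities(P, 2)
print(P2[0][2])  # nodes 0 and 2 share no edge but are related through node 1
```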

Interesting papers

Hughes, T., Ramage, D. Lexical semantic relatedness with random graph walks. EMNLP-CONLL 2007
Weale, T., Brew, C., and Fosler-Lussier, E. (2009). Using the Wiktionary Graph Structure for Synonym Detection. ACL-IJCNLP 2009 (applies page rank to sem-rel)

Lexical Semantic Relatedness with Random Graph Walks. Thad Hughes and Daniel Ramage. EMNLP-CONLL 2007

Key ideas

"[We] compute word-specific probability distributions over how often a particle visits all other nodes in the graph when “starting” from a specific word. We compute the relatedness of two words as the similarity of their stationary distributions."
Construct graph-cum-Markov chain (stochastic matrix) for each edge-type. Add the matrices and normalize. (Is there a case for weighted combination?)
"to compute [word-specific] stationary distribution ... we [start at the word, and] at every step of the walk, we will return to [it] with probability \beta . Intuitively, ... nodes close to [the word] should be given higher weight ... also guarantees that the stationary distribution exists and is unique (Bremaud, 1999)."
"because [the matrix] is sparse, each iteration of the above computation is ... linear in the total number of edges. Introducing an edge type that is dense would dramatically increase running time."

Claims

"the application of random walk Markov chain theory to measuring lexical semantic relatedness"
"[past work] has only considered using one stationary distribution per specially-constructed graph as a probability estimator ... we [use] distinct stationary distributions resulting from random walks centered at different positions in the word graph."

Evaluation

"For consistency with previous literature, we use rank correlation (Spearman’s coefficient) ... because [it models the] ordering of the scores ... many applications that make use of lexical relatedness scores (e.g. as features to a machine learning algorithm) would better be served by scores on a linear scale with human judgments."

Interesting papers

Julie Weeds and David Weir. Co-occurrence retrieval: A flexible framework for lexical distributional similarity. CL 2005 (survey of co-occurrence-based distributional similarity measures of semantic relatedness)
P. Berkhin. A survey on pagerank computing. Internet Mathematics 2005
Lillian Lee. On the effectiveness of the skew divergence for statistical language analysis. Artificial Intelligence and Statistics 2001 (surveys divergence measures for distributions)

Random Walks for Text Semantic Similarity. Daniel Ramage, Anna N. Rafferty, and Christopher D. Manning. ACL-IJCNLP 2009

Claims

"random graph walk algorithm for semantic similarity of texts [(not words)] ... [faster than a] mathematically equivalent model based on summed similarity judgments of individual words.
"walks effectively aggregate information over multiple types of links and multiple input words"

Key ideas

"determine an initial distribution ... for a [given text passage,] ... simulate a random walk ... we compare the resulting stationary distributions from two such walks".
"We compute the stationary distribution ... with probability \beta of returning to the initial distribution at each time step as the limit as t goes to \infty. ... the resulting stationary distribution can be factored as the weighted sum of the stationary distributions of each word represented in the initial distribution ... [it can be shown that] the stationary distribution is ... the weighted sum of the stationary distribution of each underlying word"

Evaluation

"We add the additional baseline of always guessing the majority class label because the data set is skewed toward “paraphrase”."
Another measure for comparing distributions: the dice measure extended to weighted features (Curran, 2004).
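
A sketch of the weighted Dice measure, assuming the form 2 * sum_i min(a_i, b_i) / sum_i (a_i + b_i). This is the form usually attributed to Curran (2004), but it is written from memory here and should be checked against the thesis. The function name is hypothetical.

```python
def weighted_dice(a, b):
    """Dice overlap of two non-negative weighted feature vectors (dicts)."""
    shared = sum(min(weight, b[key]) for key, weight in a.items() if key in b)
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0


# Two toy distributions that agree on half of their mass.
print(weighted_dice({"x": 0.5, "y": 0.5}, {"x": 0.5, "z": 0.5}))
```

With binary weights this reduces to the ordinary set-based Dice coefficient.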

Comments

"The random walk framework smoothes an initial distribution of words into a much larger lexical space. In one sense, this is similar to the technique of query expansion used in information retrieval ... this expansion is analogous to taking only a single step of the random walk."
"The random walk framework can be used to evaluate changes to lexical resources ... this provides a more semantically relevant evaluation of updates to a resource than, for example, counting how many new words or links between words have been added."

Interesting papers

T. H. Haveliwala. Topic-sensitive pagerank. WWW 2002 (computational steps are similar; see teleport vector)
R. Mihalcea, C. Corley, and C. Strapparava. Corpus-based and knowledge-based measures of text semantic similarity. AAAI 2006 (includes a survey of semantic similarity measures)
E. Minkov and W. W. Cohen. Learning to rank typed graph walks: Local and global approaches. WebKDD and SNA-KDD 2007.

WikiWalk: Random walks on Wikipedia for Semantic Relatedness. Eric Yeh, Daniel Ramage, Christopher D. Manning, Eneko Agirre, Aitor Soroa. ACL-IJCNLP 2009

Claims

"we base our random walk algorithms after the ones described in (Hughes and Ramage, 2007) and (Agirre et al., 2009), but use Wikipedia-based methods to construct the graph."
"evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes"
"previous work on Wikipedia has made limited use of [link] information,"
"Our results show the importance of pruning the dictionary" (i.e. words from Wikipedia chosen as nodes)

Key ideas

computation of relatedness for a word pair has three steps

each input word/text ... [is converted to a] teleport vector.
Personalized PageRank is ... [run to get] the stationary distribution
the stationary distributions [are compared]

Interesting papers

E. Agirre and A. Soroa. Personalizing pagerank for word sense disambiguation. EACL 2009
M. D. Lee, B. Pincombe, and M. Welsh. An empirical evaluation of models of text document similarity. Cognitive Science Society, 2005 (data set)

Wednesday, February 6, 2013

A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca, Aitor Soroa. NAACL-HLT 2009

Claims

a supervised combination of [our methods] yields the best published results on all datasets
we pioneer cross-lingual similarity
A discussion on the differences between learning similarity and relatedness scores

Cross lingual similarity

WordNet-based: since the WordNet used is multilingual and aligned across languages, the monolingual methods are directly applicable
Distributional: translate target into source language (using machine translation) and then use monolingual method

Results

[among distributional methods], the method based on context windows provides the best results for similarity, and the bag-of-words representation [does best] for relatedness.
upper-bounding combined performance: "we took the output[s] of three systems ... we implemented an oracle that chooses [among the outputs] ... the rank that is most similar to the rank of the pair in the gold-standard. ... gives as an indication of the correlations that could be achieved by choosing for each pair the rank output by the best classifier for that pair."

On evaluation

"Pearson correlation suffers much when the scores of two systems are not linearly correlated, [e.g.] due to the different nature of the techniques applied ... Spearman correlation provides an evaluation metric that is independent of such data-dependent transformations"

Lexical Co-occurrence, Statistical Significance, and Word Association. Dipak L. Chaudhari, Om P. Damani, Srivatsan Laxman. EMNLP 2011

Claims

We propose a new measure of word association based on a new notion of statistical significance for lexical co-occurrences.
We ... construct a significance test that allows us to detect different kinds of co-occurrences within a single unified framework

Key ideas

Existing co-occurrence measures ... assume that each document is drawn from a multinomial distribution based on global unigram frequencies ... [The problem with this] is

the overbearing influence of the unigram frequencies on the detection of word associations. For example, the association between anomochilidae (dwarf pipe snakes) and snake could go undetected ... since less than 0.1% of the pages containing snake also contained anomochilidae.
the expected span of a word pair is very sensitive to the associated unigram frequencies: the expected span of a word pair composed of low frequency unigrams is much larger than that with high frequency unigrams. This is contrary to how word associations appear in language, where semantic relationships manifest with small inter-word distances irrespective of the underlying unigram distributions.

To solve the above, "we employ a null model that represents each document as a bag of words"

A random permutation of the associated bag of words gives a linear representation for the document.
If the observed span distribution of a word-pair resembles that under the (random permutation) null model, then the relation between the words is not strong enough for one word to influence the placement of the other.
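
The permutation null model can be sketched with a Monte Carlo estimate. Assumptions, all mine: span is taken as the minimum token distance between the two words, both words occur in the document, and the null distribution is sampled by shuffling rather than derived analytically as in the paper. Function names and the toy document are hypothetical.

```python
import random


def min_span(tokens, w1, w2):
    """Smallest distance between an occurrence of w1 and one of w2 (both must occur)."""
    positions_1 = [i for i, t in enumerate(tokens) if t == w1]
    positions_2 = [i for i, t in enumerate(tokens) if t == w2]
    return min(abs(i - j) for i in positions_1 for j in positions_2)


def span_p_value(tokens, w1, w2, trials=2000, seed=0):
    """Fraction of random permutations whose span is at most the observed span."""
    rng = random.Random(seed)
    observed = min_span(tokens, w1, w2)
    bag = list(tokens)
    hits = 0
    for _ in range(trials):
        rng.shuffle(bag)  # one draw from the bag-of-words null model
        if min_span(bag, w1, w2) <= observed:
            hits += 1
    return observed, hits / trials


# Toy document of 20 tokens in which the two words are adjacent.
document = ["snake", "anomochilidae"] + [f"w{i}" for i in range(18)]
observed, p_value = span_p_value(document, "snake", "anomochilidae")
print(observed, p_value)
```

Note that the test looks only at positions inside the document, so the global unigram frequency of either word plays no role, which is the point made in the notes above.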

Experiments

New data sets (from the "free association" problem)

Edinburgh (Kiss et al., 1973), Florida (Nelson et al., 1980), Goldfarb-Halpern (Goldfarb and Halpern, 1984), Kent (Kent and Rosanoff, 1910), Minnesota (Russell and Jenkins, 1954), White-Abrams (White and Abrams, 2004)

Comments

The basic approach in this kind of modeling: "We need a null hypothesis that can account for an observed co-occurrence as a pure chance event and this in-turn requires a corpus generation model. Documents in a corpus can be assumed to be generated independent of each other."
Comprehensive list of co-occurrence measures

CSR, CWCD (Washtell and Markert, 2009), Dice (Dice, 1945), LLR (Dunning, 1993), Jaccard (Jaccard, 1912), Ochiai (Janson and Vegelius, 1981), Pearson’s χ² test, PMI (Church and Hanks, 1989), SCI (Washtell and Markert, 2009), T-test
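
For quick reference, a sketch of four of the listed measures in their common document-count form (the paper may define span-constrained variants; these are the textbook versions). The counts are invented and the function name is hypothetical: N documents, f1 and f2 containing each word, f12 containing both.

```python
from math import log2, sqrt


def association_scores(n_docs, f1, f2, f12):
    """PMI, Dice, Jaccard and Ochiai from document frequencies."""
    return {
        "pmi": log2((f12 * n_docs) / (f1 * f2)),
        "dice": 2 * f12 / (f1 + f2),
        "jaccard": f12 / (f1 + f2 - f12),
        "ochiai": f12 / sqrt(f1 * f2),
    }


# A rare word (f2 = 10) that always appears together with a frequent one (f1 = 100).
scores = association_scores(n_docs=1000, f1=100, f2=10, f12=10)
for name, value in scores.items():
    print(name, round(value, 4))
```

In this toy case Dice and Jaccard stay low even though the rare word never occurs without the frequent one, which is a small-scale version of the unigram-frequency problem described under Key ideas.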

Harnessing different knowledge sources to measure semantic relatedness under a uniform model. Ziqi Zhang, Anna Lisa Gentile, Fabio Ciravegna. EMNLP 2011

Claims

introduces a method of harnessing different knowledge sources under a uniform model for measuring semantic relatedness between words or concepts.
we identify two issues that have not been addressed in the previous works. First, existing works typically employ a single knowledge source of semantic evidence ... Second, ... evaluated in general domains only ... evaluation ... in specific domains is ... important.

Key ideas

knowledge from different sources is mapped into a graph representation in 3 stages, and a general graph-based (random walk) algorithm is used for the final relatedness computation
Random walk: "formalizes the idea that taking successive steps along the paths in a graph, the “easier” it is to arrive at a target node starting from a source node, the more related the two nodes are ... P(t)(j|i) [is] the probability of reaching other nodes from a starting node on the graph after t steps ... [following] Rowe and Ciravegna (2010) ... set t=2 in order to preserve locally connected nodes ... Effectively, this formalizes the notion that two concepts related to a third concept is also semantically related [similar to] Patwardhan and Pedersen (2006)".
The stages involve “feature integration” as merging feature types from different knowledge sources into single types of features based on their similarity in semantics.

"the difference between cross-source feature combination and integration is that the former introduces more types of features, whereas the latter retains same number of feature types but increases feature values for each type. Both have the effect of establishing additional path (via features) between concepts, but in different ways."

Comments

"Zhang et al. (2010) argue that ... different knowledge sources may complement each other."
"evaluation of [semantic relatedness] methods in specific domains is increasingly important" (They also evaluate on (biomedical) domain-specific data sets)
"Wikipedia ... [has] reasonable coverage of many domains (Holloway et al., 2007; Halavais, 2008)."
Classifies SR approaches

path based: use wordnet-like semantic network
Information Content (IC) based: use taxonomy (a special case of network) and a corpus
statistical

distributional
co-occurrence-based

hybrid: combine the above, e.g. Riensche et al. (2007), Pozo et al. (2008), Han and Zhao (2010). Note: the idea of combining methods is distinguished from the idea of combining knowledge sources.

Evaluation

Data sets: general (Rubenstein and Goodenough, Miller and Charles, Finkelstein et al.) and biomedical (Petrakis et al. (2006), Pedersen et al. (2006))
Measure: Spearman correlation ("better metric ... (Zesch and Gurevych, 2010)")
"some datasets have a ... low sample size, ... correlation values [could have] occurred by chance. Therefore, we measure the statistical significance of correlation by computing the p-value for the correlation values"

Interesting papers

Random walk for semantic relatedness

Zhang, Z., Gentile, A., Xia, L., Iria, J., Chapman, S. A random graph walk based approach to compute semantic relatedness using knowledge from Wikipedia. LREC 2010. (compare this with Manning's paper)

Rowe, M., Ciravegna, F. Disambiguating identity web references using Web 2.0 data and semantics. The Journal of Web Semantics 2010

Hybrid methods

Tsang, V., Stevenson, S. A graph-theoretic framework for semantic distance. CL 2010
Han, X., Zhao, J. Structural semantic relatedness: a knowledge-based method to named entity disambiguation. ACL 2010

Patwardhan, S., Pedersen, T. 2006. Using WordNet-based context vectors to estimate the semantic relatedness of concepts. EACL 2006 ("second-order context")

Tuesday, February 5, 2013

Cross-lingual Semantic Relatedness Using Encyclopedic Knowledge. Samer Hassan and Rada Mihalcea. EMNLP 2009

Key Ideas

Introduce the problem of cross-lingual semantic relatedness.
Map words in different languages to their concept vectors (concepts are Wikipedia articles, similar to Gabrilovich and Markovitch, AAAI 2007). Map concepts using Wikipedia langlinks. The vectors are now comparable.
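
A minimal sketch of the mapping step, with invented concept vectors and an invented Spanish-to-English langlink table. The weights, article titles, and function names are all hypothetical, and cosine is assumed as the comparison.

```python
from math import sqrt


def to_pivot(vector, langlinks):
    """Rename concepts through the langlink table; concepts without a link are dropped."""
    return {langlinks[c]: w for c, w in vector.items() if c in langlinks}


def cosine(a, b):
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical ESA-style concept vectors (concept = Wikipedia article title).
EN = {"car": {"Automobile": 0.9, "Road": 0.4}}
ES = {
    "coche": {"Automóvil": 0.8, "Carretera": 0.5, "Matrícula": 0.2},
    "manzana": {"Manzana": 0.9, "Fruta": 0.6},
}
# Hypothetical langlinks; "Matrícula" has no English counterpart in this toy table.
ES_TO_EN = {
    "Automóvil": "Automobile",
    "Carretera": "Road",
    "Manzana": "Apple",
    "Fruta": "Fruit",
}

print(cosine(EN["car"], to_pivot(ES["coche"], ES_TO_EN)))
print(cosine(EN["car"], to_pivot(ES["manzana"], ES_TO_EN)))
```

Dropping unlinked concepts is the simplest choice; how the paper treats articles without a langlink is not recorded in these notes.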

Comments

Wikipedia data accessed using Wikipedia Miner.
Experiments

WS-30 (G. Miller and W. Charles. Contextual correlates of semantic similarity. Language and Cognitive Processes 1991) and WS-353 (L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: the concept revisited. WWW 2001) semantic similarity evaluation sets were translated and used for evaluation.
Detailed description of creation of data sets and evaluation sets (including instructions given to annotators).
Also devise an "obvious" baseline which illustrates where their method helps.

Evaluating WordNet-based Measures of Lexical Semantic Relatedness. Alexander Budanitsky, Graeme Hirst. CL 2006

Comments on the problem

Distinguishing semantic similarity and relatedness

"... semantic relatedness is a more general concept than similarity; similar entities are semantically related by virtue of their similarity (bank–trust company), but dissimilar entities may also be semantically related by lexical relationships such as meronymy (car–wheel) and antonymy (hot–cold), ..."
"the more-general idea of relatedness, not just similarity ... not just ... relationships in WordNet ... but also associative and ad hoc relationships ... just about any kind of functional relation or frequent association in the world. ... Morris and Hirst (2004, 2005) have termed these non-classical lexical semantic relationships ... shown in experiments ... that around 60% of the lexical relationships ... in a text are of this nature."

"[A study found that] the words sex, drinking, and drag racing were semantically related, by all being “dangerous behaviors”, in the context of an article about teenagers emulating what they see in movies. Thus lexical semantic relatedness is sometimes constructed in context and cannot always be determined purely from an a priori lexical resource ... However, [such] ad hoc relationships accounted for only a small fraction of those reported [in the study]"
"... in this paper the term concept will refer to a particular sense of a given word. ... when we say that two words are “similar”, ... they denote similar concepts; ... [and] not ... similar~~ity of~~ distributional or co-occurrence behavior of the words, ...While similarity of denotation might be inferred from similarity of distributional or co-occurrence behavior (Dagan 2000; Weeds 2003), the two are distinct ideas."

"All approaches to measuring semantic relatedness that use a lexical resource construe the resource, in one way or another, as a network or directed graph, and then base the measure of relatedness on properties of paths in this graph." (Compare with probabilistic approaches.)

Relating semantic relatedness and distributional similarity

"Weeds (2003), in her study of 15 distributional-similarity measures, found that words distributionally similar to hope (noun) included confidence, dream, feeling, and desire; Lin (1998b) found pairs such as earnings–profit, biggest–largest, nylon–silk, and pill–tablet. ... if two concepts are similar or related, it is likely that their role in the world will be similar, so similar things will be said about them, and so the contexts of occurrence of the corresponding words will be similar. And conversely (albeit with less certainty), if the contexts of occurrence of two words are similar, then similar things are being said about each, so they are playing similar roles in the world and hence are semantically similar — at least to the extent of these roles."
Differences between the two

"while semantic relatedness is inherently a relation on concepts, ... distributional similarity is a (corpus-dependent) relation on words."
"whereas semantic relatedness is symmetric, distributional similarity is a potentially asymmetrical relationship. If distributional similarity is conceived of as substitutability, ... then asymmetries arise ...; for example, ... fruit substitutes for apple better than apple substitutes for fruit."
"Imbalance in the corpus and data sparseness is an additional source of anomalous results even for “good” measures."

Evaluation issues

"severe limitation on the data means that this was not really a fair test of the principles underlying the [distributional] hypothesis; a fair test would require data allowing the comparison of any ... two words in WordNet, but obtaining such [corpus] data for less-frequent words ... would be a massive task."

Comments on experiments

Lists 3 kinds of evaluation

"theoretical examination .. for ... mathematical properties thought desirable, such as whether it is a metric ..., whether it has singularities, whether its parameter-projections are smooth functions, ..."
"comparison with human judgments. Insofar as human judgments of similarity and relatedness are deemed to be correct by definition, this clearly gives the best assessment of the “goodness” of a measure."
"evaluate ... with respect to ... performance in the framework of a particular application."

"While comparison with human judgments is the ideal way to evaluate a measure of similarity or semantic relatedness, in practice the tiny amount of data available (and only for similarity, not relatedness) is quite inadequate." and "Finkelstein [-353] ... is still very small, and, as Jarmasz and Szpakowicz (2003) point out, is culturally and politically biased."
"... often what we are really interested in is the relationship between the concepts for which the words are merely surrogates; the human judgments that we need are of the relatedness of word-senses, not words. So the experimental situation would need to set up contexts that bias the sense selection for each target word and yet don’t bias the subject’s judgment of their a priori relationship, an almost self-contradictory situation." (and hence justifying extrinsic evaluation)
Application to malapropism detection

Monday, February 4, 2013

Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. Evgeniy Gabrilovich and Shaul Markovitch. IJCAI 2007

Comments

Classifies work in the field into three main directions:

text fragments as bags of words in vector space (distributional similarity)
text fragments as bags of concepts (using Latent Semantic Analysis)
using lexical resources (Wordnet etc.) (also use concepts but not based on world knowledge, and work only at the level of individual words; also, it relies on human-organized knowledge)

Distinguishes similarity and relatedness, e.g.

"Budanitsky and Hirst [2006] argued that the notion of relatedness is more general than that of similarity, as the former subsumes many different kind of specific relations, including meronymy, antonymy, functional association, and others. They further maintained that computational linguistics applications often require measures of relatedness rather than the more narrowly defined measures of similarity."
"When only the similarity relation is considered, using lexical resources was often successful enough, ... However, when the entire language wealth is considered in an attempt to capture more general semantic relatedness, lexical techniques yield substantially inferior results"

Interesting papers

Statistical methods

S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. JASIS 1990
Lillian Lee. Measures of distributional similarity. ACL 1999.
Ido Dagan, Lillian Lee, and Fernando C. N. Pereira. Similarity-based models of word cooccurrence probabilities. ML 1999

Expert-based methods (e.g. using Wordnet)

Alexander Budanitsky and Graeme Hirst. Evaluating wordnet-based measures of lexical semantic relatedness. CL 2006
Mario Jarmasz. Roget’s thesaurus as a lexical resource for natural language processing. Master's Thesis 2003

Others

Justin Zobel and Alistair Moffat. Exploring the similarity space. ACM SIGIR Forum, 1998. [for justifying the cosine metric?]

BabelRelate! A Joint Multilingual Approach to Computing Semantic Relatedness. Roberto Navigli and Simone Paolo Ponzetto. AAAI 2012

Key claims

Our approach is based on ... a ... multilingual knowledge base, which is used to compute semantic graphs in a variety of languages. ... information from these graphs is then combined to produce ... disambiguated translations [which] are connected by means of strong semantic relations.
... what we explore here is the joint contribution obtained by using a multilingual knowledge base for this [(cross language semantic relatedness)] task.
Given a pair of words in two languages we use BabelNet to collect their translations, compute semantic graphs in a variety of languages, and then combine the empirical evidence from these different languages by intersecting their respective graphs.

Interesting papers

Hassan, S., and Mihalcea, R. Cross-lingual semantic relatedness using encyclopedic knowledge. EMNLP 2009 (introduces knowledge-based approach to computing semantic relatedness across different languages)
Agirre, E.; Alfonseca, E.; Hall, K.; Kravalova, J.; Pasca, M.; Soroa, A. A study on similarity and relatedness using distributional and WordNet-based approaches. NAACL-HLT 2009 (seminal finding that knowledge-based approaches to semantic relatedness can compete and even outperform distributional methods in a cross-lingual setting)
Nastase, V.; Strube, M.; Börschinger, B.; Zirn, C.; and Elghafari, A. WikiNet: A very large scale multi-lingual concept network. LREC 2010.
de Melo, G., and Weikum, G. MENTA: Inducing multilingual taxonomies from Wikipedia. CIKM 2010

WikiRelate! Computing Semantic Relatedness Using Wikipedia. Michael Strube and Simone Paolo Ponzetto. AAAI 2006

Key ideas

Use the Wikipedia category hierarchy instead of the WordNet hierarchy.

Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Philip Resnik. JAIR 1999

Key Ideas
Comments

Semantic similarity as a special case of semantic relatedness (relation is IS-A)

For example, car-gasoline are related, but car-bicycle are similar.

"... measures of similarity ... are seldom accompanied by an independent characterization of the phenomenon they are measuring ... The worth of a similarity measure is in its fidelity to human behavior, as measured by predictions of human performance on experimental tasks."

Note

polysemy as a special case of homonymy (the different meanings have a common aspect, and are hence called `senses')

For example, `man' is polysemous (species, gender, adult) while `bank' is homonymous (river-edge, money-place).

An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources. Yuhua Li, Zuhair A. Bandar, and David McLean. IEEE KDE 2003

Key Ideas
Comments

"Similarity between two words is often represented by similarity between concepts associated with the two words."
"Evidence from psychological experiments demonstrate that similarity is context-dependent and may be asymmetric ... Experimental results investigating the effects of asymmetry suggest that the average difference in ratings for a word pair is less than 5 percent"

Friday, February 1, 2013

Learning Discriminative Projections for Text Similarity Measures. Wen-tau Yih, Kristina Toutanova, John C. Platt, Christopher Meek. CoNLL 2011

Claims:

We propose a new projection learning framework, Similarity Learning via Siamese Neural Network (S2Net), to discriminatively learn the concept vector representations of input text objects.

Comment:

Input is pairs of words that are known to be similar/dissimilar.

Web-Scale Distributional Similarity and Entity Set Expansion. Patrick Pantel, Eric Crestan, Arkady Borkovsky, Ana-Maria Popescu, Vishnu Vyas. EMNLP 2009

Claims

propose an algorithm for large-scale term similarity computation

Comments

Lists applications of semantic similarity: word classification, word sense disambiguation, context-spelling correction, fact extraction, semantic role labeling, query expansion, textual advertising
they apply the learned similarity matrix to the task of automatic set expansion

Corpus-based Semantic Class Mining: Distributional vs. Pattern-Based Approaches. Shuming Shi, Huibin Zhang, Xiaojie Yuan, Ji-Rong Wen. COLING 2010

Claims

perform an empirical comparison of [previous research work] [on semantic class mining]
propose a frequency-based rule to select appropriate approaches for different types of terms.

Comments

Jargon: semantically similar words are also called "peer terms or coordinate terms".
States that "DS [distributional similarity] approaches basically exploit second-order co-occurrences to discover strongly associated concepts." How is that?
Extrinsic evaluation by set expansion

A Mixture Model with Sharing for Lexical Semantics. Joseph Reisinger, Raymond Mooney. EMNLP 2010

Claims

Multi-prototype representations [are good for] words with several unrelated meanings (e.g. bat and club), but are not suitable for representing the common ... structure [shared across senses] found in highly polysemous words such as line or run. We introduce a mixture model for capturing this---mixture of a Dirichlet Process clustering model and a background model.
we derive a multi-prototype representation capable of capturing varying degrees of sharing between word senses, and demonstrate its effectiveness in the word-relatedness task in the presence of highly polysemous words.

Comments

Positions lexical semantics as the umbrella task with subtasks such as word relatedness and selectional preferences