## Monday, February 11, 2013

#### Comparison of Semantic Similarity for Different Languages Using the Google n-gram Corpus and Second-Order Co-occurrence Measures. Colette Joubarne, Diana Inkpen. Advances in AI 2011

• Claims
• many languages lack corpora sufficient to achieve valid measures of semantic similarity
• manually-assigned similarity scores from one language can be transferred to another language
• automatic word similarity measure based on second-order co-occurrences in the Google n-gram corpus, for English, German, and French
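A second-order measure of this kind can be sketched as follows: each word is represented by its first-order co-occurrence vector, and two words are compared via the cosine of those vectors (the counts below are hypothetical toy data standing in for Google n-gram statistics):

```python
from math import sqrt

# Toy first-order co-occurrence counts (hypothetical, standing in for
# counts extracted from the Google n-gram corpus).
cooc = {
    "car":    {"road": 10, "engine": 8, "drive": 6},
    "auto":   {"road": 7,  "engine": 9, "drive": 5},
    "banana": {"fruit": 9, "yellow": 6, "peel": 4},
}

def second_order_sim(w1, w2, cooc):
    """Cosine of the words' first-order co-occurrence vectors."""
    v1, v2 = cooc[w1], cooc[w2]
    dot = sum(v1[k] * v2.get(k, 0) for k in v1)
    n1 = sqrt(sum(x * x for x in v1.values()))
    n2 = sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```

Two words never seen in a shared context can still score high, as long as they co-occur with the same third words.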

#### Semantic similarity estimation from multiple ontologies. Montserrat Batet, David Sánchez, Aida Valls, Karina Gibert. Appl Intell 2013

• Claims
• enable similarity estimation across multiple ontologies
• solve missing values, when partial knowledge is available
• capture the strongest semantic evidence that results in the most accurate similarity assessment, when dealing with overlapping knowledge
• Key ideas
• Consider sub-cases
• both concepts appear in one ontology
• concepts appear in different ontologies
• missing concepts
• etc.
• requires a taxonomy structure (other relations not useful?)
• Related work
• mapping the local terms of distinct ontologies into an existent single one
• creating a new ontology by integrating existing ones
• compute the similarity between terms as a function of some ontological features
• ontologies are connected by a new imaginary root node
• matching concept labels of different ontologies
• graph-based ontology alignment ... by means of path-based similarity measures.
• combines path length and common specificity.
• Experiments
• general purpose and biomedical benchmarks of word pairs
• baseline: related works in multi-ontology similarity assessment.
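The "imaginary root node" idea from the related work can be sketched like this (a hedged illustration, not Batet et al.'s actual measure): two toy taxonomies are joined under a virtual root, concepts are matched through the lowest shared node (with the virtual root as fallback when the label sets are disjoint), and similarity decays with path length:

```python
# Two toy taxonomies as child -> parent maps (hypothetical).
onto_a = {"dog": "mammal", "mammal": "animal", "animal": None}
onto_b = {"cat": "feline", "feline": "animal", "animal": None}

ROOT = "<virtual-root>"  # imaginary node joining the two ontologies

def path_to_root(concept, onto):
    path = [concept]
    while onto.get(concept) is not None:
        concept = onto[concept]
        path.append(concept)
    path.append(ROOT)
    return path

def path_length(c1, onto1, c2, onto2):
    """Edges on the shortest path through the merged graph."""
    p1, p2 = path_to_root(c1, onto1), path_to_root(c2, onto2)
    # walk up both paths; the first shared node is the common subsumer
    seen = {node: depth for depth, node in enumerate(p1)}
    for depth2, node in enumerate(p2):
        if node in seen:
            return seen[node] + depth2
    return len(p1) + len(p2)  # unreachable: ROOT is always shared

def similarity(c1, onto1, c2, onto2):
    return 1.0 / (1.0 + path_length(c1, onto1, c2, onto2))
```

Here "dog" and "cat" meet at the shared label "animal" rather than the virtual root, illustrating the overlapping-knowledge sub-case.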

## Friday, February 8, 2013

#### A Graph-Theoretic Framework for Semantic Distance. Vivian Tsang, Suzanne Stevenson. CL 2010

• Problem: similarity of texts (not single words)
• Claims
• "[we do] integration of distributional and ontological factors in measuring semantic distance between two sets of concepts (mapped from two texts) [within a network flow formalism]"
• Key ideas
• "Our goal is to measure the distance between two subgraphs (representing two texts to be compared), taking into account both the ontological distance between the component concepts and their frequency distributions. To achieve this, we measure the amount of “effort” required to transform one profile to match the other graphically: The more similar they are, the less effort it takes to transform one into the other. (This view is similar to that motivating the use of “earth mover’s distance” in computer vision [Levina and Bickel 2001].)"
• "[our] notion of semantic distance as transport effort of concept frequency over the relations (edges) of an ontology differs significantly from ... [using] concept vectors of frequency. ... our approach can [compare] texts that use related but non-equivalent concepts." (seems to be the main argument in favor of graph-based iterative methods)
• "viewed as a supply–demand problem, in which we find the minimum cost flow (MCF) from the supply profile to the demand profile ... Each edge ... has a cost ... Each node [has a] supply ... [or] demand ... The goal is to find a flow from supply nodes to demand nodes that satisfies the supply/demand constraints of each node and minimizes the overall “transport cost.”"
• Requires an ontology
• "Distributional" refers to term frequencies within the compared text (not in some corpus)
• Interesting papers
• Using network flows
• Pang, Bo and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. ACL 2004
• Barzilay, Regina and Mirella Lapata. Collective content selection for concept-to-text generation. HLT/EMNLP 2005
• Mihalcea, Rada. Unsupervised large-vocabulary word sense disambiguation with graph-based algorithms for sequence data labeling. HLT/EMNLP 2005
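For the special case where the compared concepts lie on a single taxonomy path with unit edge costs, the minimum-cost-flow "transport effort" reduces to the earth mover's distance between the two frequency profiles, computable from cumulative differences (toy sketch of the intuition, not the paper's general network formulation):

```python
def emd_on_path(supply, demand):
    """Transport effort between two frequency profiles over nodes
    arranged on a path with unit-cost edges.  In this special case
    the min-cost flow equals the sum of |CDF differences|."""
    assert abs(sum(supply) - sum(demand)) < 1e-9, "profiles must balance"
    effort, carried = 0.0, 0.0
    for s, d in zip(supply, demand):
        carried += s - d          # mass pushed across the next edge
        effort += abs(carried)    # cost of moving it one step
    return effort
```

Moving one unit of frequency two edges away costs 2; identical profiles need no transport, matching the "less effort = more similar" reading.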

#### Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation. Xianpei Han, Jun Zhao. ACL 2010

• Claims
• "proposes a reliable semantic relatedness measure between concepts ... which can capture both the explicit semantic relations between concepts and the implicit semantic knowledge embedded in [multiple] graphs and networks."
• Key ideas
• “two concepts are semantic related if they are both semantic related to the neighbor concepts of each other”
• Interesting papers
• Amigo, E., Gonzalo, J., Artiles, J. and Verdejo, F. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval 2008
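The quoted intuition is the recursion behind SimRank-style scores; a minimal sketch on a toy graph (SimRank stands in here as an illustration of the neighbor recursion, it is not Han & Zhao's exact SSR measure):

```python
def simrank(neighbors, C=0.8, iters=10):
    """Iterative neighbor-based relatedness on an undirected graph:
    two nodes are related if their neighbors are related."""
    nodes = list(neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif neighbors[a] and neighbors[b]:
                    total = sum(sim[(na, nb)]
                                for na in neighbors[a]
                                for nb in neighbors[b])
                    new[(a, b)] = C * total / (len(neighbors[a]) * len(neighbors[b]))
                else:
                    new[(a, b)] = 0.0
        sim = new
    return sim

graph = {  # toy graph: two concepts sharing a neighbor (hypothetical)
    "apple": ["fruit"],
    "pear":  ["fruit"],
    "fruit": ["apple", "pear"],
}
scores = simrank(graph)
```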

#### Disambiguating Identity Web References using Web 2.0 Data and Semantics. Matthew Rowe, Fabio Ciravegna. Journal of Web Semantics 2010

• Use ideas such as "Average First-Passage Time" of a graph
• Interesting papers
• L. Lovasz, Random walks on graphs: A survey. Combinatorics 1993
• M. Saerens, F. Fouss, L. Yen, P. Dupont, The principal components analysis of a graph, and its relationships to spectral clustering. ECML 2004

## Thursday, February 7, 2013

#### A Random Graph Walk based Approach to Computing Semantic Relatedness Using Knowledge from Wikipedia. Ziqi Zhang, Anna Lisa Gentile, Lei Xia, José Iria, Sam Chapman. LREC 2010

• Key ideas
• Model many kinds of features on a graph
• Convert edge weights into probabilities; use p(t)(i|j) to model relatedness (where t is the number of steps in the walk)
• Interesting papers
• Hughes, T., Ramage, D. Lexical semantic relatedness with random graph walks. EMNLP-CONLL 2007
• Weale, T., Brew, C., and Fosler-Lussier, E. (2009). Using the Wiktionary Graph Structure for Synonym Detection. ACL-IJCNLP 2009 (applies page rank to sem-rel)
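The key ideas can be sketched directly: row-normalize edge weights into transition probabilities, then score relatedness by the t-step reachability p(t)(j|i) (toy feature graph with hypothetical weights):

```python
def transition_matrix(weights):
    """Row-normalize edge weights into transition probabilities."""
    P = {}
    for i, row in weights.items():
        total = sum(row.values())
        P[i] = {j: w / total for j, w in row.items()}
    return P

def p_t(P, i, j, t=2):
    """Probability of reaching j from i in exactly t steps."""
    dist = {i: 1.0}
    for _ in range(t):
        nxt = {}
        for node, p in dist.items():
            for nbr, q in P.get(node, {}).items():
                nxt[nbr] = nxt.get(nbr, 0.0) + p * q
        dist = nxt
    return dist.get(j, 0.0)

# Toy graph (hypothetical): words linked to shared features
w = {
    "cat": {"fur": 2.0, "pet": 1.0},
    "dog": {"fur": 1.0, "pet": 2.0},
    "fur": {"cat": 2.0, "dog": 1.0},
    "pet": {"cat": 1.0, "dog": 2.0},
}
P = transition_matrix(w)
```

With t=2, two words become related exactly when they can reach each other through a shared feature node, matching the "preserve locally connected nodes" choice.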

#### Lexical Semantic Relatedness with Random Graph Walks. Thad Hughes and Daniel Ramage. EMNLP-CONLL 2007

• Key ideas
• "[We] compute word-specific probability distributions over how often a particle visits all other nodes in the graph when “starting” from a specific word. We compute the relatedness of two words as the similarity of their stationary distributions."
• Construct graph-cum-Markov chain (stochastic matrix) for each edge-type. Add the matrices and normalize. (Is there a case for weighted combination?)
• "to compute [word-specific] stationary distribution ... we [start at the word, and] at every step of the walk, we will return to [it] with probability \beta . Intuitively, ... nodes close to [the word] should be given higher weight ... also guarantees that the stationary distribution exists and is unique (Bremaud, 1999)."
• "because [the matrix] is sparse, each iteration of the above computation is ... linear in the total number of edges. Introducing an edge type that is dense would dramatically increase running time."
• Claims
• "the application of random walk Markov chain theory to measuring lexical semantic relatedness"
• "[past work] has only considered using one stationary distribution per specially-constructed graph as a probability estimator ... we [use] distinct stationary distributions resulting from random walks centered at different positions in the word graph."
• Evaluation
• "For consistency with previous literature, we use rank correlation (Spearman’s coefficient) ... because [it models the] ordering of the scores ... many applications that make use of lexical relatedness scores (e.g. as features to a machine learning algorithm) would better be served by scores on a linear scale with human judgments."
• Interesting papers
• Julie Weeds and David Weir. Co-occurrence retrieval: A flexible framework for lexical distributional similarity. CL 2005 (survey of co-occurrence-based distributional similarity measures of semantic relatedness)
• P. Berkhin. A survey on pagerank computing. Internet Mathematics 2005
• Lillian Lee. On the effectiveness of the skew divergence for statistical language analysis. Artificial Intelligence and Statistics 2001 (surveys divergence measures for distributions)
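The word-specific walk with return probability \beta can be sketched with plain power iteration; relatedness is then a comparison of two stationary distributions (cosine here; the toy graph is hypothetical, not the paper's WordNet-derived one):

```python
def walk_with_restart(P, start, beta=0.15, iters=50):
    """Stationary distribution of a walk over row-stochastic P that
    returns to `start` with probability beta at every step."""
    nodes = list(P)
    v = {n: (1.0 if n == start else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for node, p in v.items():
            for nbr, q in P[node].items():
                nxt[nbr] += (1 - beta) * p * q
        nxt[start] += beta
        v = nxt
    return v

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u)
    nu = sum(x * x for x in u.values()) ** 0.5
    nv = sum(x * x for x in v.values()) ** 0.5
    return dot / (nu * nv)

P = {  # toy row-stochastic word graph (hypothetical links)
    "cup":   {"mug": 0.5, "dish": 0.5},
    "mug":   {"cup": 1.0},
    "dish":  {"cup": 0.5, "plate": 0.5},
    "plate": {"dish": 1.0},
}
relatedness = cosine(walk_with_restart(P, "cup"), walk_with_restart(P, "mug"))
```

The restart mass keeps the distribution concentrated near the start word, so adjacent words end up with more similar distributions than distant ones.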

#### Random Walks for Text Semantic Similarity. Daniel Ramage, Anna N. Rafferty, and Christopher D. Manning. ACL-IJCNLP 2009

• Claims
• "random graph walk algorithm for semantic similarity of texts [(not words)] ... [faster than a] mathematically equivalent model based on summed similarity judgments of individual words."
• "walks effectively aggregate information over multiple types of links and multiple input words"
• Key ideas
• "determine an initial distribution ... for a [given text passage,] ... simulate a random walk ... we compare the resulting stationary distributions from two such walks".
• "We compute the stationary distribution ... with probability \beta of returning to the initial distribution at each time step as the limit as t goes to \infty. ... the resulting stationary distribution can be factored as the weighted sum of the stationary distributions of each word represented in the initial distribution ... [it can be shown that] the stationary distribution is ... the weighted sum of the stationary distribution of each underlying word"
• Evaluation
• "We add the additional baseline of always guessing the majority class label because the data set is skewed toward “paraphrase”."
• Another measure for comparing distributions: the dice measure extended to weighted features (Curran, 2004).
• "The random walk framework smoothes an initial distribution of words into a much larger lexical space. In one sense, this is similar to the technique of query expansion used in information retrieval ... this expansion is analogous to taking only a single step of the random walk."
• "The random walk framework can be used to evaluate changes to lexical resources ... this provides a more semantically relevant evaluation of updates to a resource than, for example, counting how many new words or links between words have been added."
• Interesting papers
• T. H. Haveliwala. Topic-sensitive pagerank. WWW 2002 (computational steps are similar; see teleport vector)
• R. Mihalcea, C. Corley, and C. Strapparava. Corpus-based and knowledge-based measures of text semantic similarity. AAAI 2006 (includes a survey of semantic similarity measures)
• E. Minkov and W. W. Cohen. Learning to rank typed graph walks: Local and global approaches. WebKDD and SNA-KDD 2007.
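The factorization claim (the stationary distribution for a text's walk is the weighted sum of the per-word stationary distributions) follows from linearity of the walk in its teleport distribution, and can be checked numerically on a toy graph:

```python
def ppr(P, teleport, beta=0.15, iters=60):
    """Stationary distribution with restart into the `teleport`
    distribution (personalized-PageRank-style power iteration)."""
    nodes = list(P)
    v = dict(teleport)
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for node, p in v.items():
            for nbr, q in P[node].items():
                nxt[nbr] += (1 - beta) * p * q
        for n, mass in teleport.items():
            nxt[n] += beta * mass
        v = nxt
    return v

P = {  # toy row-stochastic graph (hypothetical)
    "a": {"b": 1.0},
    "b": {"a": 0.5, "c": 0.5},
    "c": {"b": 1.0},
}
text = {"a": 0.5, "c": 0.5}        # uniform over the text's two words
mixed = ppr(P, text)               # walk for the whole text
part_a = ppr(P, {"a": 1.0})        # walk for each word separately
part_c = ppr(P, {"c": 1.0})
combo = {n: 0.5 * part_a[n] + 0.5 * part_c[n] for n in P}
```

`mixed` and `combo` agree to floating-point precision, which is what makes the per-word precomputation (and hence the claimed speedup) possible.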

#### WikiWalk: Random walks on Wikipedia for Semantic Relatedness. Eric Yeh, Daniel Ramage, Christopher D. Manning, Eneko Agirre, Aitor Soroa. ACL-IJCNLP 2009

• Claims
• "we base our random walk algorithms after the ones described in (Hughes and Ramage, 2007) and (Agirre et al., 2009), but use Wikipedia-based methods to construct the graph."
• "evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes"
• "previous work on Wikipedia has made limited use of [link] information"
• "Our results show the importance of pruning the dictionary" (i.e. words from Wikipedia chosen as nodes)
• Key ideas
• computation of relatedness for a word pair has three steps
• each input word/text ... [is converted to a] teleport vector.
• Personalized PageRank is ... [run to get] the stationary distribution
• the stationary distributions [are compared]
• Interesting papers
• E. Agirre and A. Soroa. Personalizing pagerank for word sense disambiguation. EACL 2009
• M. D. Lee, B. Pincombe, and M. Welsh. An empirical evaluation of models of text document similarity. Cognitive Science Society, 2005 (data set)

## Wednesday, February 6, 2013

#### A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca, Aitor Soroa. NAACL-HLT 2009

• Claims
• a supervised combination of [our methods] yields the best published results on all datasets
• we pioneer cross-lingual similarity
• A discussion on the differences between learning similarity and relatedness scores
• Cross lingual similarity
• Wordnet-based: Since it is a multilingual aligned WordNet, the monolingual methods are directly applicable
• Distributional: translate target into source language (using machine translation) and then use monolingual method
• Results
• [among distributional methods], the method based on context windows provides the best results for similarity, and the bag-of-words representation [does best] for relatedness.
• upper-bounding combined performance: "we took the output[s] of three systems ... we implemented an oracle that chooses [among the outputs] ... the rank that is most similar to the rank of the pair in the gold-standard. ... gives [us] an indication of the correlations that could be achieved by choosing for each pair the rank output by the best classifier for that pair."
• On evaluation
• "Pearson correlation suffers much when the scores of two systems are not linearly correlated, [e.g.] due to the different nature of the techniques applied ... Spearman correlation provides an evaluation metric that is independent of such data-dependent transformations"
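The Spearman-vs-Pearson point is easy to see in code: rank correlation is unchanged by any monotone rescaling of a system's scores (minimal sketch using the no-ties formula):

```python
def spearman(xs, ys):
    """Spearman's rank correlation (simple no-ties formula)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

gold = [1, 2, 3, 4, 5]                 # human judgments
system = [0.1, 0.4, 0.5, 0.7, 0.9]     # hypothetical system scores
rho = spearman(gold, system)
```

Squaring (or otherwise monotonically transforming) the system scores leaves rho untouched, whereas Pearson correlation would change.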

#### Lexical Co-occurrence, Statistical Significance, and Word Association. Dipak L. Chaudhari, Om P. Damani, Srivatsan Laxman. EMNLP 2011

• Claims
• We propose a new measure of word association based on a new notion of statistical significance for lexical co-occurrences.
• We ... construct a significance test that allows us to detect different kinds of co-occurrences within a single unified framework
• Key ideas
• Existing co-occurrence measures ... assume that each document is drawn from a multinomial distribution based on global unigram frequencies ... [The problem with this] is
• the overbearing influence of the unigram frequencies on the detection of word associations. For example, the association between anomochilidae (dwarf pipe snakes) and snake could go undetected ... since less than 0.1% of the pages containing snake also contained anomochilidae.
• the expected span of a word pair is very sensitive to the associated unigram frequencies: the expected span of a word pair composed of low frequency unigrams is much larger than that with high frequency unigrams. This is contrary to how word associations appear in language, where semantic relationships manifest with small inter-word distances irrespective of the underlying unigram distributions.
• To solve the above, "we employ a null model that represents each document as a bag of words"
• A random permutation of the associated bag of words gives a linear representation for the document.
• If the observed span distribution of a word-pair resembles that under the (random permutation) null model, then the relation between the words is not strong enough for one word to influence the placement of the other.
• Experiments
• New data sets (from the "free association" problem)
• Edinburgh (Kiss et al., 1973), Florida (Nelson et al., 1980), Goldfarb-Halpern (Goldfarb and Halpern, 1984), Kent (Kent and Rosanoff, 1910), Minnesota (Russell and Jenkins, 1954), White-Abrams (White and Abrams, 2004)
• The basic approach in this kind of modeling: "We need a null hypothesis that can account for an observed co-occurrence as a pure chance event and this in-turn requires a corpus generation model. Documents in a corpus can be assumed to be generated independent of each other."
• Comprehensive list of co-occurrence measures
• CSR, CWCD (Washtell and Markert, 2009), Dice (Dice, 1945), LLR (Dunning, 1993), Jaccard (Jaccard, 1912), Ochiai (Janson and Vegelius,1981), Pearson’s X^2 test, PMI (Church and Hanks, 1989), SCI (Washtell and Markert, 2009), T-test
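The permutation-null idea can be sketched directly: shuffle the document's bag of words and ask how often the shuffled minimum span between the pair is as small as the observed one (toy document; a real test would model the full span distribution, not this simplified empirical p-value):

```python
import random

def min_span(doc, w1, w2):
    """Smallest distance between occurrences of w1 and w2 in doc."""
    best, last = len(doc), {}
    for i, w in enumerate(doc):
        if w in (w1, w2):
            other = w2 if w == w1 else w1
            if other in last:
                best = min(best, i - last[other])
            last[w] = i
    return best

def span_pvalue(doc, w1, w2, trials=200, seed=0):
    """Fraction of random permutations (bag-of-words null model)
    whose minimum span is <= the observed one."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = min_span(doc, w1, w2)
    bag = list(doc)
    hits = 0
    for _ in range(trials):
        rng.shuffle(bag)
        if min_span(bag, w1, w2) <= observed:
            hits += 1
    return hits / trials

doc = ["snake", "anomochilidae"] + [f"filler{i}" for i in range(50)]
```

Because the null is a permutation of the document's own bag of words, a small span stays significant even when one of the unigrams is rare.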

#### Harnessing different knowledge sources to measure semantic relatedness under a uniform model. Ziqi Zhang, Anna Lisa Gentile, Fabio Ciravegna. EMNLP 2011

• Claims
• introduces a method of harnessing different knowledge sources under a uniform model for measuring semantic relatedness between words or concepts.
• we identify two issues that have not been addressed in the previous works. First, existing works typically employ a single knowledge source of semantic evidence ... Second, ... evaluated in general domains only ... evaluation ... in specific domains is ... important.
• Key ideas
• knowledge from different sources are mapped into a graph representation in 3 stages, and a general graph-based (random walk) algorithm is used for final relatedness computation
• Random walk: "formalizes the idea that taking successive steps along the paths in a graph, the “easier” it is to arrive at a target node starting from a source node, the more related the two nodes are ... P(t)(j|i) [is] the probability of reaching other nodes from a starting node on the graph after t steps ... [following] Rowe and Ciravegna (2010) ... set t=2 in order to preserve locally connected nodes ... Effectively, this formalizes the notion that two concepts related to a third concept is also semantically related [similar to] Patwardhan and Pedersen (2006)".
• The stages involve “feature integration” as merging feature types from different knowledge sources into single types of features based on their similarity in semantics.
• "the difference between cross-source feature combination and integration is that the former introduces more types of features, whereas the latter retains same number of feature types but increases feature values for each type. Both have the effect of establishing additional path (via features) between concepts, but in different ways."
• "Zhang et al. (2010) argue that ... different knowledge sources may complement each other."
• "evaluation of [semantic relatedness] methods in specific domains is increasingly important" (They also evaluate on (biomedical) domain-specific data sets)
• "Wikipedia ... [has] reasonable coverage of many domains (Holloway et al., 2007; Halavais, 2008)."
• Classifies SR approaches
• path based: use wordnet-like semantic network
• Information Content (IC) based: use taxonomy (a special case of network) and a corpus
• statistical
• distributional
• co-occurrence-based
• hybrid: combine the above, e.g. Riensche et al. (2007), Pozo et al. (2008), Han and Zhao (2010). Note: the idea of combining methods is distinguished from the idea of combining knowledge sources.
• Evaluation
• Data sets: general (Rubenstein and Goodenough, Miller and Charles, Finkelstein et al.) and biomedical (Petrakis et al. (2006), Pedersen et al. (2006))
• Measure: Spearman correlation ("better metric ... (Zesch and Gurevych, 2010)")
• "some datasets have a ... low sample size, ... correlation values [could have] occurred by chance. Therefore, we measure the statistical significance of correlation by computing the p-value for the correlation values"
• Interesting papers
• Random walk for semantic relatedness
• Zhang, Z., Gentile, A., Xia, L., Iria, J., Chapman, S. A random graph walk based approach to compute semantic relatedness using  knowledge from Wikipedia. LREC 2010. (compare this with Manning's paper)
• Rowe, M., Ciravegna, F. Disambiguating identity web references using Web 2.0 data and semantics. The Journal of Web Semantics 2010
• Hybrid methods
• Tsang, V., Stevenson, S. A graph-theoretic framework for semantic distance. CL 2010
• Han, X., Zhao, J. Structural semantic relatedness: a knowledge-based method to named entity disambiguation. ACL 2010
• Patwardhan, S., Pedersen, T. 2006. Using WordNet-based context vectors to estimate the semantic relatedness of concepts. EACL 2006 ("second-order context")

## Tuesday, February 5, 2013

#### Cross-lingual Semantic Relatedness Using Encyclopedic Knowledge. Samer Hassan and Rada Mihalcea. EMNLP 2009

• Key Ideas
• Introduce the problem of cross-lingual semantic relatedness.
• Map words in different languages to their concept vectors (concepts are Wikipedia articles, similar to Gabrilovich and Markovitch, AAAI 2007). Map concepts using Wikipedia langlinks. The vectors are now comparable.
• Wikipedia data accessed using Wikipedia Miner.
• Experiments
• WS-30 (G. Miller and W. Charles. Contextual correlates of semantic similarity. Language and Cognitive Processes 1991) and WS-353 (L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: the concept revisited. WWW 2001) semantic similarity evaluation sets were translated and used for evaluation.
• Detailed description of creation of data sets and evaluation sets (including instructions given to annotators).
• Also devise an "obvious" baseline which illustrates where their method helps.
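The mapping step can be sketched as: build ESA-style concept vectors in each language, project one side through langlinks into the other language's article space, then compare by cosine (all vectors and links below are hypothetical toy data):

```python
from math import sqrt

# Hypothetical ESA-style concept vectors: word -> {Wikipedia article: weight}
en_vectors = {"moon": {"Moon": 0.9, "Apollo_11": 0.4}}
es_vectors = {"luna": {"Luna": 0.8, "Apolo_11": 0.5}}

# Hypothetical langlinks mapping Spanish articles to English counterparts
langlinks = {"Luna": "Moon", "Apolo_11": "Apollo_11"}

def to_english_space(vec, links):
    """Project a concept vector into the English article space,
    dropping articles with no langlink."""
    return {links[a]: w for a, w in vec.items() if a in links}

def cosine(u, v):
    dot = sum(w * v.get(a, 0.0) for a, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

score = cosine(en_vectors["moon"], to_english_space(es_vectors["luna"], langlinks))
```

Once both vectors live in the same article space, the monolingual comparison carries over unchanged.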

#### Evaluating WordNet-based Measures of Lexical Semantic Relatedness. Alexander Budanitsky, Graeme Hirst. CL 2006

• Distinguishing semantic similarity and relatedness
• "... semantic relatedness is a more general concept than similarity; similar entities are semantically related by virtue of their similarity (bank–trust company), but dissimilar entities may also be semantically related by lexical relationships such as meronymy (car–wheel) and antonymy (hot–cold), ..."
• "the more-general idea of relatedness, not just similarity ... not just ... relationships in WordNet ... but also associative and ad hoc relationships ... just about any kind of functional relation or frequent association in the world. ... Morris and Hirst (2004, 2005) have termed these non-classical lexical semantic relationships ... shown in experiments ... that around 60% of the lexical relationships ... in a text are of this nature."
• "[A study found that] the words sex, drinking, and drag racing were semantically related, by all being “dangerous behaviors”, in the context of an article about teenagers emulating what they see in movies. Thus lexical semantic relatedness is sometimes constructed in context and cannot always be determined purely from an a priori lexical resource ... However, [such] ad hoc relationships accounted for only a small fraction of those reported [in the study]"
• "... in this paper the term concept will refer to a particular sense of a given word. ... when we say that two words are “similar”, ... they denote similar concepts; ... [and] not ... similarity of distributional or co-occurrence behavior of the words, ... While similarity of denotation might be inferred from similarity of distributional or co-occurrence behavior (Dagan 2000; Weeds 2003), the two are distinct ideas."
• "All approaches to measuring semantic relatedness that use a lexical resource construe the resource, in one way or another, as a network or directed graph, and then base the measure of relatedness on properties of paths in this graph." (Compare with probabilistic approaches.)
• Relating semantic relatedness and distributional similarity
• "Weeds (2003), in her study of 15 distributional-similarity measures, found that words distributionally similar to hope (noun) included confidence, dream, feeling, and desire; Lin (1998b) found pairs such as earnings–profit, biggest–largest, nylon–silk, and pill–tablet. ... if two concepts are similar or related, it is likely that their role in the world will be similar, so similar things will be said about them, and so the contexts of occurrence of the corresponding words will be similar. And conversely (albeit with less certainty), if the contexts of occurrence of two words are similar, then similar things are being said about each, so they are playing similar roles in the world and hence are semantically similar — at least to the extent of these roles."
• Differences between the two
• "while semantic relatedness is inherently a relation on concepts, ... distributional similarity is a (corpus-dependent) relation on words."
• "whereas semantic relatedness is symmetric, distributional similarity is a potentially asymmetrical relationship. If distributional similarity is conceived of as substitutability, ... then asymmetries arise ...; for example, ... fruit substitutes for apple better than apple substitutes for fruit."
• "Imbalance in the corpus and data sparseness is an additional source of anomalous results even for “good” measures."
• Evaluation issues
• "severe limitation on the data means that this was not really a fair test of the principles underlying the [distributional] hypothesis; a fair test would require data allowing the comparison of any ... two words in WordNet, but obtaining such [corpus] data for less-frequent words ... would be a massive task."
• Lists 3 kinds of evaluation
• "theoretical examination ... for ... mathematical properties thought desirable, such as whether it is a metric ..., whether it has singularities, whether its parameter-projections are smooth functions, ..."
• "comparison with human judgments. Insofar as human judgments of similarity and relatedness are deemed to be correct by definition, this clearly gives the best assessment of the “goodness” of a measure."
• "evaluate ... with respect to ... performance in the framework of a particular application."
• "While comparison with human judgments is the ideal way to evaluate a measure of similarity or semantic relatedness, in practice the tiny amount of data available (and only for similarity, not relatedness) is quite inadequate." and "Finkelstein [-353] ... is still very small, and, as Jarmasz and Szpakowicz (2003) point out, is culturally and politically biased."
• "... often what we are really interested in is the relationship between the concepts for which the words are merely surrogates; the human judgments that we need are of the relatedness of word-senses, not words. So the experimental situation would need to set up contexts that bias the sense selection for each target word and yet don’t bias the subject’s judgment of their a priori relationship, an almost self-contradictory situation." (and hence justifying extrinsic evaluation)
• Application to malapropism detection

## Monday, February 4, 2013

#### Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. Evgeniy Gabrilovich and Shaul Markovitch. IJCAI 2007

• Classifies work in the field into three main directions:
• text fragments as bags of words in vector space (distributional similarity)
• text fragments as bags of concepts (using Latent Semantic Analysis)
• using lexical resources (Wordnet etc.) (also use concepts but not based on world knowledge, and work only at the level of individual words; also, it relies on human-organized knowledge)
• Distinguishes similarity and relatedness, e.g.
• "Budanitsky and Hirst [2006] argued that the notion of relatedness is more general than that of similarity, as the former subsumes many different kind of specific relations, including meronymy, antonymy, functional association, and others. They further maintained that computational linguistics applications often require measures of relatedness rather than the more narrowly defined measures of similarity."
• "When only the similarity relation is considered, using lexical resources was often successful enough, ... However, when the entire language wealth is considered in an attempt to capture more general semantic relatedness, lexical techniques yield substantially inferior results"
• Interesting papers
• Statistical methods
• S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. JASIS 1990
• Lillian Lee. Measures of distributional similarity. ACL 1999.
• Ido Dagan, Lillian Lee, and Fernando C. N. Pereira. Similarity-based models of word cooccurrence probabilities. ML 1999
• Expert-based methods (e.g. using Wordnet)
• Alexander Budanitsky and Graeme Hirst. Evaluating wordnet-based measures of lexical semantic relatedness. CL 2006
• Mario Jarmasz. Roget’s thesaurus as a lexical resource for natural language processing. Master's Thesis 2003
• Others
• Justin Zobel and Alistair Moffat. Exploring the similarity space. ACM SIGIR Forum, 1998. [for justifying the cosine metric?]

#### BabelRelate! A Joint Multilingual Approach to Computing Semantic Relatedness. Roberto Navigli and Simone Paolo Ponzetto. AAAI 2012

• Key claims
• Our approach is based on ... a ... multilingual knowledge base, which is used to compute semantic graphs in a variety of languages. ... information from these graphs is then combined to produce ... disambiguated translations [which] are connected by means of strong semantic relations.
• ... what we explore here is the joint contribution obtained by using a multilingual knowledge base for this [(cross language semantic relatedness)] task.
• Given a pair of words in two languages we use BabelNet to collect their translations, compute semantic graphs in a variety of languages, and then combine the empirical evidence from these different languages by intersecting their respective graphs.
• Interesting papers
• Hassan, S., and Mihalcea, R. Cross-lingual semantic relatedness using encyclopedic knowledge. EMNLP 2009 (introduces knowledge-based approach to computing semantic relatedness across different languages)
• Agirre, E.; Alfonseca, E.; Hall, K.; Kravalova, J.; Pasca, M.; Soroa, A. A study on similarity and relatedness using distributional and WordNet-based approaches. NAACL-HLT 2009 (seminal finding that knowledge-based approaches to semantic relatedness can compete and even outperform distributional methods in a cross-lingual setting)
• Nastase, V.; Strube, M.; Börschinger, B.; Zirn, C.; and Elghafari, A. WikiNet: A very large scale multi-lingual concept network. LREC 2010.
• de Melo, G., and Weikum, G. MENTA: Inducing multilingual taxonomies from Wikipedia. CIKM 2010

#### WikiRelate! Computing Semantic Relatedness Using Wikipedia. Michael Strube and Simone Paolo Ponzetto. AAAI 2006

• Key ideas
• Use the Wikipedia category hierarchy instead of the WordNet hierarchy.

#### Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. Philip Resnik. JAIR 1999

• Key Ideas
• Semantic similarity as a special case of semantic relatedness (relation is IS-A)
• For example, car-gasoline are related, but car-bicycle are similar.
• "... measures of similarity ... are seldom accompanied by an independent characterization of the phenomenon they are measuring ... The worth of a similarity measure is in its fidelity to human behavior, as measured by predictions of human performance on experimental tasks."
• Note
• polysemy as a special case of homonymy (the different meanings have a common aspect, and are hence called 'senses')
• For example, 'man' is polysemous (species, gender, adult) while 'bank' is homonymous (river-edge, money-place).
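Resnik's measure itself is short: the similarity of two concepts is the information content, -log p(c), of their most informative common subsumer in the IS-A taxonomy (toy taxonomy and hypothetical probabilities below):

```python
from math import log

# Toy IS-A taxonomy: child -> parent (hypothetical)
parent = {"car": "vehicle", "bicycle": "vehicle", "vehicle": "entity",
          "gasoline": "entity", "entity": None}

# Hypothetical concept probabilities (in practice estimated from corpus
# counts propagated up the taxonomy; the root always has p = 1.0)
p = {"car": 0.2, "bicycle": 0.1, "vehicle": 0.4, "gasoline": 0.2, "entity": 1.0}

def ancestors(c):
    out = []
    while c is not None:
        out.append(c)
        c = parent[c]
    return out

def resnik(c1, c2):
    """IC of the most informative common subsumer: max -log p(c)."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(-log(p[c]) for c in common)
```

car-bicycle meet at "vehicle" and score positively, while car-gasoline meet only at the root (IC 0): similar concepts score high, merely related ones do not, matching the IS-A-only reading of similarity.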

#### An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources. Yuhua Li, Zuhair A. Bandar, and David McLean. IEEE TKDE 2003

• Key Ideas
• "Similarity between two words is often represented by similarity between concepts associated with the two words."
• "Evidence from psychological experiments demonstrate that similarity is context-dependent and may be asymmetric ... Experimental results investigating the effects of asymmetry suggest that the average difference in ratings for a word pair is less than 5 percent"

## Friday, February 1, 2013

#### Learning Discriminative Projections for Text Similarity Measures. Wen-tau Yih, Kristina Toutanova, John C. Platt, Christopher Meek. CoNLL 2011

• Claims:
• We propose a new projection learning framework, Similarity Learning via Siamese Neural Network (S2Net), to discriminatively learn the concept vector representations of input text objects.
• Comment:
• Input is pairs of words that are known to be similar/dissimilar.

#### Web-Scale Distributional Similarity and Entity Set Expansion. Patrick Pantel, Eric Crestan, Arkady Borkovsky, Ana-Maria Popescu, Vishnu Vyas. EMNLP 2009

• Claims
• propose an algorithm for large-scale term similarity computation
• Lists applications of semantic similarity: word classification, word sense disambiguation, context-spelling correction, fact extraction, semantic role labeling, query expansion, textual advertising
• they apply the learned similarity matrix to the task of automatic set expansion

#### Corpus-based Semantic Class Mining: Distributional vs. Pattern-Based Approaches. Shuming Shi, Huibin Zhang, Xiaojie Yuan, Ji-Rong Wen. COLING 2010

• Claims
• perform an empirical comparison of [previous research work] [on semantic class mining]
• propose a frequency-based rule to select appropriate approaches for different types of terms.