Research Notes

Introduce the problem of cross-lingual semantic relatedness.
Map words in different languages to their concept vectors (concepts are Wikipedia articles, similar to Gabrilovich and Markovitch, IJCAI 2007). Map concepts across languages using Wikipedia langlinks (interlanguage links), so that vectors built in different languages share one concept space and become comparable.
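The mapping step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the concept vectors, article titles, weights, and the LANGLINKS table are invented toy data, and cosine similarity is assumed as the comparison function.

```python
import math

# Toy concept vectors: word -> {Wikipedia article title: weight}.
# All titles and weights are invented for illustration.
EN_VECTORS = {
    "car": {"Automobile": 0.9, "Wheel": 0.4},
    "bank": {"Bank": 0.8, "Money": 0.5},
}
ES_VECTORS = {
    "coche": {"Automóvil": 0.8, "Rueda": 0.5},
    "dinero": {"Dinero": 0.9, "Banco": 0.3},
}

# Hypothetical langlinks table: Spanish article title -> English article title.
LANGLINKS = {
    "Automóvil": "Automobile",
    "Rueda": "Wheel",
    "Dinero": "Money",
    "Banco": "Bank",
}


def project(vector, langlinks):
    """Re-index a concept vector into the target language's concept space.

    Concepts that have no langlink are dropped.
    """
    projected = {}
    for concept, weight in vector.items():
        target = langlinks.get(concept)
        if target is not None:
            projected[target] = projected.get(target, 0.0) + weight
    return projected


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[concept] for concept, weight in u.items() if concept in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_lingual_relatedness(en_word, es_word):
    """Compare an English and a Spanish word in the English concept space."""
    es_projected = project(ES_VECTORS[es_word], LANGLINKS)
    return cosine(EN_VECTORS[en_word], es_projected)
```

With this toy data, cross_lingual_relatedness("car", "coche") is high because both vectors share the Automobile and Wheel concepts after projection, while cross_lingual_relatedness("car", "dinero") is zero because no concepts overlap.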

WS-30 (G. Miller and W. Charles. Contextual correlates of semantic similarity. Language and Cognitive Processes 1991) and WS-353 (L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: the concept revisited. WWW 2001) semantic similarity evaluation sets were translated and used for evaluation.
Detailed description of creation of data sets and evaluation sets (including instructions given to annotators).
Also devise an "obvious" baseline which illustrates where their method helps.

"... semantic relatedness is a more general concept than similarity; similar entities are semantically related by virtue of their similarity (bank–trust company), but dissimilar entities may also be semantically related by lexical relationships such as meronymy (car–wheel) and antonymy (hot–cold), ..."
"the more-general idea of relatedness, not just similarity ... not just ... relationships in WordNet ... but also associative and ad hoc relationships ... just about any kind of functional relation or frequent association in the world. ... Morris and Hirst (2004, 2005) have termed these non-classical lexical semantic relationships ... shown in experiments ... that around 60% of the lexical relationships ... in a text are of this nature."

"[A study found that] the words sex, drinking, and drag racing were semantically related, by all being “dangerous behaviors”, in the context of an article about teenagers emulating what they see in movies. Thus lexical semantic relatedness is sometimes constructed in context and cannot always be determined purely from an a priori lexical resource ... However, [such] ad hoc relationships accounted for only a small fraction of those reported [in the study]"
"... in this paper the term concept will refer to a particular sense of a given word. ... when we say that two words are “similar”, ... they denote similar concepts; ... [and] not ... similar~~ity of~~ distributional or co-occurrence behavior of the words, ...While similarity of denotation might be inferred from similarity of distributional or co-occurrence behavior (Dagan 2000; Weeds 2003), the two are distinct ideas."

"All approaches to measuring semantic relatedness that use a lexical resource construe the resource, in one way or another, as a network or directed graph, and then base the measure of relatedness on properties of paths in this graph." (Compare with probabilistic approaches.)

"Weeds (2003), in her study of 15 distributional-similarity measures, found that words distributionally similar to hope (noun) included confidence, dream, feeling, and desire; Lin (1998b) found pairs such as earnings–profit, biggest–largest, nylon–silk, and pill–tablet. ... if two concepts are similar or related, it is likely that their role in the world will be similar, so similar things will be said about them, and so the contexts of occurrence of the corresponding words will be similar. And conversely (albeit with less certainty), if the contexts of occurrence of two words are similar, then similar things are being said about each, so they are playing similar roles in the world and hence are semantically similar — at least to the extent of these roles."
Differences between the two

"while semantic relatedness is inherently a relation on concepts, ... distributional similarity is a (corpus-dependent) relation on words."
"whereas semantic relatedness is symmetric, distributional similarity is a potentially asymmetrical relationship. If distributional similarity is conceived of as substitutability, ... then asymmetries arise ...; for example, ... fruit substitutes for apple better than apple substitutes for fruit."
"Imbalance in the corpus and data sparseness is an additional source of anomalous results even for “good” measures."

"severe limitation on the data means that this was not really a fair test of the principles underlying the [distributional] hypothesis; a fair test would require data allowing the comparison of any ... two words in WordNet, but obtaining such [corpus] data for less-frequent words ... would be a massive task."

"theoretical examination .. for ... mathematical properties thought desirable, such as whether it is a metric ..., whether it has singularities, whether its parameter-projections are smooth functions, ..."
"comparison with human judgments. Insofar as human judgments of similarity and relatedness are deemed to be correct by definition, this clearly gives the best assessment of the “goodness” of a measure."
"evaluate ... with respect to ... performance in the framework of a particular application."

"While comparison with human judgments is the ideal way to evaluate a measure of similarity or semantic relatedness, in practice the tiny amount of data available (and only for similarity, not relatedness) is quite inadequate." and "Finkelstein [-353] ... is still very small, and, as Jarmasz and Szpakowicz (2003) point out, is culturally and politically biased."
"... often what we are really interested in is the relationship between the concepts for which the words are merely surrogates; the human judgments that we need are of the relatedness of word-senses, not words. So the experimental situation would need to set up contexts that bias the sense selection for each target word and yet don’t bias the subject’s judgment of their a priori relationship, an almost self-contradictory situation." (and hence justifying extrinsic evaluation)
Application to malapropism detection

Tuesday, February 5, 2013