Research Notes

Model many kinds of features on a graph
Convert edge weights into probabilities; use p(t)(i|j) to model relatedness (where t is the number of steps in the walk)

Hughes, T., Ramage, D. Lexical semantic relatedness with random graph walks. EMNLP-CONLL 2007
Weale, T., Brew, C., and Fosler-Lussier, E. (2009). Using the Wiktionary Graph Structure for Synonym Detection. ACL-IJCNLP 2009 (applies page rank to sem-rel)

"[We] compute word-specific probability distributions over how often a particle visits all other nodes in the graph when “starting” from a specific word. We compute the relatedness of two words as the similarity of their stationary distributions."
Construct graph-cum-Markov chain (stochastic matrix) for each edge-type. Add the matrices and normalize. (Is there a case for weighted combination?)
"to compute [word-specific] stationary distribution ... we [start at the word, and] at every step of the walk, we will return to [it] with probability \beta . Intuitively, ... nodes close to [the word] should be given higher weight ... also guarantees that the stationary distribution exists and is unique (Bremaud, 1999)."
"because [the matrix] is sparse, each iteration of the above computation is ... linear in the total number of edges. Introducing an edge type that is dense would dramatically increase running time."

"the application of random walk Markov chain theory to measuring lexical semantic relatedness"
"[past work] has only considered using one stationary distribution per specially-constructed graph as a probability estimator ... we [use] distinct stationary distributions resulting from random walks centered at different positions in the word graph."

"For consistency with previous literature, we use rank correlation (Spearman’s coefficient) ... because [it models the] ordering of the scores ... many applications that make use of lexical relatedness scores (e.g. as features to a machine learning algorithm) would better be served by scores on a linear scale with human judgments."

Julie Weeds and David Weir. Co-occurrence retrieval: A flexible framework for lexical distributional similarity. CL 2005 (survey of co-occurrence-based distributional similarity measures of semantic relatedness)
P. Berkhin. A survey on pagerank computing. Internet Mathematics 2005
Lillian Lee. On the effectiveness of the skew divergence for statistical language analysis. Artificial Intelligence and Statistics 2001 (surveys divergence measures for distributions)

"random graph walk algorithm for semantic similarity of texts [(not words)] ... [faster than a] mathematically equivalent model based on summed similarity judgments of individual words.
"walks effectively aggregate information over multiple types of links and multiple input words"

"determine an initial distribution ... for a [given text passage,] ... simulate a random walk ... we compare the resulting stationary distributions from two such walks".
"We compute the stationary distribution ... with probability \beta of returning to the initial distribution at each time step as the limit as t goes to \infty. ... the resulting stationary distribution can be factored as the weighted sum of the stationary distributions of each word represented in the initial distribution ... [it can be shown that] the stationary distribution is ... the weighted sum of the stationary distribution of each underlying word"

"We add the additional baseline of always guessing the majority class label because the data set is skewed toward “paraphrase”."
Another measure for comparing distributions: the dice measure extended to weighted features (Curran, 2004).

"The random walk framework smoothes an initial distribution of words into a much larger lexical space. In one sense, this is similar to the technique of query expansion used in information retrieval ... this expansion is analogous to taking only a single step of the random walk."
"The random walk framework can be used to evaluate changes to lexical resources ... this provides a more semantically relevant evaluation of updates to a resource than, for example, counting how many new words or links between words have been added."

T. H. Haveliwala. Topic-sensitive pagerank. WWW 2002 (computational steps are similar; see teleport vector)
R. Mihalcea, C. Corley, and C. Strapparava. Corpus-based and knowledge-based measures of text semantic similarity. AAAI 2006 (includes a survey of semantic similarity measures)
E. Minkov and W. W. Cohen. Learning to rank typed graph walks: Local and global approaches. WebKDD and SNA-KDD 2007.

"we base our random walk algorithms after the ones described in (Hughes and Ramage, 2007) and (Agirre et al., 2009), but use Wikipedia-based methods to construct the graph."
"evaluates methods for building the graph, including link selection strategies, and two methods for representing input texts as distributions over the graph nodes"
"previous work on Wikipedia has made limited use of [link] information,"
"Our results show the importance of pruning the dictionary" (i.e. words from Wikipedia chosen as nodes)

E. Agirre and A. Soroa. Personalizing pagerank for word sense disambiguation. EACL 2009
M. D. Lee, B. Pincombe, and M. Welsh. An empirical evaluation of models of text document similarity. Cognitive Science Society, 2005 (data set)

Thursday, February 7, 2013