Research Notes

Tuesday, April 23, 2013

Addressing polysemy in bilingual lexicon extraction from comparable corpora. Darja Fiser, Nikola Ljubesic, Ozren Kubelka. LREC 2012

Idea

Get the senses of the source word (using a sense tagger), construct a context vector for each sense, and then find a target translation for each sense.

To compute sense-specific vectors: split the occurrences of the source word into groups, one per sense, and build a context vector separately for each group.
Translate the context vectors into the target language using the seed lexicon.
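
A minimal sketch of the two steps above, not the authors' implementation. The function names, the bag-of-words window, the sense labels and the toy English-to-Slovene seed lexicon are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def sense_context_vectors(occurrences, window=2):
    """Build one bag-of-words context vector per sense label.

    occurrences: iterable of (tokens, position, sense), where
    tokens[position] is the source word and sense is its tag.
    """
    vectors = defaultdict(Counter)
    for tokens, position, sense in occurrences:
        left = tokens[max(0, position - window):position]
        right = tokens[position + 1:position + 1 + window]
        vectors[sense].update(left + right)
    return dict(vectors)


def translate_vector(vector, seed_lexicon):
    """Map context words into the target language.

    Words missing from the seed lexicon are dropped (an assumption;
    other strategies are possible).
    """
    translated = Counter()
    for word, count in vector.items():
        for target in seed_lexicon.get(word, []):
            translated[target] += count
    return translated


# Toy data: two hypothetical senses of "bank".
occurrences = [
    (["deposit", "money", "in", "the", "bank", "account"], 4, "bank#finance"),
    (["walk", "along", "the", "river", "bank", "today"], 4, "bank#river"),
    (["the", "bank", "raised", "interest", "rates"], 1, "bank#finance"),
]
# Hypothetical seed lexicon (English -> Slovene).
seed_lexicon = {"money": ["denar"], "river": ["reka"], "interest": ["obresti"]}

vectors = sense_context_vectors(occurrences, window=3)
translated = {s: translate_vector(v, seed_lexicon) for s, v in vectors.items()}
```

Each sense ends up with its own translated vector, so the two senses can later be matched against different target words.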

Combine info from several taggers to improve accuracy.

Keep only those words for which both taggers assign the same tag.
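
The agreement filter could look roughly like this. It is a sketch under assumptions: each tagger's output is represented as a dict from an occurrence id to a sense label, and both the representation and the name `agreed_tags` are hypothetical.

```python
def agreed_tags(tags_a, tags_b):
    """Return {occurrence_id: sense} for items both taggers label identically.

    Items tagged by only one tagger, or tagged differently, are discarded.
    """
    return {
        occ_id: sense
        for occ_id, sense in tags_a.items()
        if occ_id in tags_b and tags_b[occ_id] == sense
    }


# Toy output of two hypothetical taggers over four occurrences.
tags_a = {0: "bank#finance", 1: "bank#river", 2: "bank#finance", 3: "bank#river"}
tags_b = {0: "bank#finance", 1: "bank#finance", 2: "bank#finance"}

kept = agreed_tags(tags_a, tags_b)
```

Here only occurrences 0 and 2 survive. The filter trades coverage for precision, since every disagreement is thrown away.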

Comments (on the classical approach, using a context vector of words)

"The main idea behind [the classical] approach is the assumption that a source word and its translation appear in similar contexts in their respective languages, so that in order to identify them their contexts are compared via a seed dictionary (Fung, 1998; Rapp, 1999)"
"[the classical] approach gives good results for a specialized domain even though the seed dictionary is quite small (Fiser et al., 2011)."
"... for closely related languages, ... the same quality of the results can be achieved by exploiting the lexical overlap between the languages instead of using a seed dictionary (Ljubesic and Fiser, 2011)."
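
To make the comparison step of the classical approach concrete, here is a small sketch that ranks target words by the similarity of their context vectors to an already translated source vector. Cosine similarity is an assumption chosen for illustration; the notes do not say which measure the papers use. The Slovene toy vectors are made up.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(translated_source_vector, target_vectors):
    """Return target words sorted from most to least similar context."""
    scored = [
        (cosine(translated_source_vector, vec), word)
        for word, vec in target_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored]


# Source context vector after translation through a seed dictionary (toy data).
source_vector = Counter({"denar": 2, "obresti": 1})
# Context vectors of candidate target words (toy data).
target_vectors = {
    "breg": Counter({"reka": 4, "voda": 2}),
    "banka": Counter({"denar": 3, "obresti": 2, "posojilo": 1}),
}

ranking = rank_candidates(source_vector, target_vectors)
```

The top-ranked word is taken as the translation candidate; with these toy vectors that is "banka", because "breg" shares no context words with the source vector.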
