Tuesday, April 23, 2013

Addressing polysemy in bilingual lexicon extraction from comparable corpora. Darja Fiser, Nikola Ljubesic, Ozren Kubelka. LREC 2012
  • Idea
    • Get the senses of each source word (using a sense tagger), construct a context vector for each sense, and then find the target translation for each sense.
      • To compute sense-specific vectors: split the occurrences of the source word into groups by sense, and build a context vector separately for each group.
      • Translate the context vectors into the target language using a seed lexicon.
    • Combine info from several taggers to improve accuracy.
      • Take only those words where the tags of both taggers agree.
  • Comments (on the classical approach, using a context vector of words)
    • "The main idea behind [the classical] approach is the assumption that a source word and its translation appear in similar contexts in their respective languages, so that in order to identify them their contexts are compared via a seed dictionary (Fung, 1998; Rapp, 1999)"
    • "[the classical] approach gives good results for a specialized domain even though the seed dictionary is quite small (Fiser et al., 2011)."
    • "... for closely related languages, ... the same quality of the results can be achieved by exploiting the lexical overlap between the languages instead of using a seed dictionary (Ljubesic and Fiser, 2011)."
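
A minimal Python sketch of the pipeline in the notes above, on toy data: keep only occurrences where two taggers agree, build one context vector per sense, translate each vector through a seed lexicon, and pick the closest target word. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the one-sense-label-per-sentence simplification, the fixed context window, the use of cosine similarity, and the tiny English-to-Spanish example.

```python
from collections import Counter, defaultdict
from math import sqrt


def agreed_senses(tags_a, tags_b):
    """Keep only the sentence indices where both taggers assign the same sense."""
    return {i: a for i, (a, b) in enumerate(zip(tags_a, tags_b)) if a == b}


def sense_vectors(sentences, word, senses, window=3):
    """Group occurrences of `word` by sense and build one context vector per group.

    sentences: list of token lists; senses: dict of sentence index -> sense label.
    """
    vectors = defaultdict(Counter)
    for idx, tokens in enumerate(sentences):
        if idx not in senses:
            continue  # the taggers disagreed on this sentence, so skip it
        for pos, tok in enumerate(tokens):
            if tok != word:
                continue
            lo = max(0, pos - window)
            context = tokens[lo:pos] + tokens[pos + 1:pos + window + 1]
            vectors[senses[idx]].update(context)
    return dict(vectors)


def translate_vector(vec, seed):
    """Map context words into the target language; drop words missing from the seed lexicon."""
    out = Counter()
    for w, count in vec.items():
        if w in seed:
            out[seed[w]] += count
    return out


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[w] for w, count in u.items() if w in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_translation(vec, target_vectors):
    """Return the target word whose context vector is most similar to `vec`."""
    return max(target_vectors, key=lambda t: cosine(vec, target_vectors[t]))


# Toy data (hypothetical): source word "bank", two taggers, tiny seed lexicon.
sentences = [
    ["the", "bank", "approved", "money", "account"],
    ["fish", "near", "the", "bank", "water"],
    ["bank", "money", "water"],
]
tagger_a = ["finance", "river", "finance"]
tagger_b = ["finance", "river", "river"]  # disagrees on the third sentence
seed = {"money": "dinero", "account": "cuenta", "water": "agua", "fish": "pez"}
target_vectors = {
    "banco": Counter(dinero=3, cuenta=2),
    "orilla": Counter(agua=4, pez=1),
}

senses = agreed_senses(tagger_a, tagger_b)
per_sense = sense_vectors(sentences, "bank", senses)
translations = {
    sense: best_translation(translate_vector(vec, seed), target_vectors)
    for sense, vec in per_sense.items()
}
print(translations)
```

With a single context vector for "bank", the finance and river contexts would be mixed together; splitting by sense first lets each sense match a different target word.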
