Addressing polysemy in bilingual lexicon extraction from comparable corpora. Darja Fiser, Nikola Ljubesic, Ozren Kubelka. LREC 2012
- Idea
- Identify the senses of the source word with a sense tagger, build a context vector for each sense, and then find the target-language translation of each sense.
- To compute sense-specific vectors, split the occurrences of the source word into groups by sense and build a context vector separately for each group.
- Translate the context vectors into the target language using a seed lexicon.
- Combine information from several taggers to improve accuracy.
- Keep only those words on which the tags of both taggers agree.
- Comments (on the classical approach, using a context vector of words)
- "The main idea behind [the classical] approach is the assumption that a source word and its translation appear in similar contexts in their respective languages, so that in order to identify them their contexts are compared via a seed dictionary (Fung, 1998; Rapp, 1999)"
- "The [classical] approach gives good results for a specialized domain even though the seed dictionary is quite small (Fiser et al., 2011)."
- "... for closely related languages, ... the same quality of the results can be achieved by exploiting the lexical overlap between the languages instead of using a seed dictionary (Ljubesic and Fiser, 2011)."
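The pipeline in these notes can be sketched in Python as follows. This is a minimal sketch under my own assumptions, not the authors' implementation. I assume that each occurrence of the source word carries a sense label from two taggers, and the tag strings such as bank.n.01 are invented. Agreement is checked per occurrence. Context vectors are raw co-occurrence counts in a fixed window, whereas the classical approach usually weights them with an association measure. Candidates are ranked by cosine similarity. The helper overlap_seed is a hypothetical and deliberately crude stand-in for exploiting lexical overlap between closely related languages: it simply treats identical word forms as translations.

```python
from collections import Counter, defaultdict
import math

def sense_context_vectors(occurrences, window=2):
    """Build one context vector per sense of a single source word.

    occurrences: list of (tokens, position, tag_a, tag_b), where tag_a and
    tag_b are the sense labels two taggers assigned to this occurrence.
    Assumption: only occurrences on which both taggers agree are kept.
    """
    vectors = defaultdict(Counter)
    for tokens, pos, tag_a, tag_b in occurrences:
        if tag_a != tag_b:
            continue  # taggers disagree: discard this occurrence
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        for i in range(lo, hi):
            if i != pos:  # count the neighbours, not the word itself
                vectors[tag_a][tokens[i]] += 1
    return dict(vectors)

def translate_vector(vec, seed):
    """Map a source-language context vector into the target language.
    Context words missing from the seed lexicon are dropped."""
    out = Counter()
    for word, count in vec.items():
        if word in seed:
            out[seed[word]] += count
    return out

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_translations(src_vec, seed, target_vectors):
    """Rank target candidates by similarity to the translated source vector."""
    tv = translate_vector(src_vec, seed)
    scores = [(cand, cosine(tv, vec)) for cand, vec in target_vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def overlap_seed(src_vocab, tgt_vocab):
    """Hypothetical seed lexicon for closely related languages: every word
    form found in both vocabularies is mapped to itself."""
    return {w: w for w in set(src_vocab) & set(tgt_vocab)}

# Toy data, invented for illustration: English "bank" with two senses.
occurrences = [
    (["river", "bank", "water"], 1, "bank.n.01", "bank.n.01"),
    (["money", "bank", "loan"], 1, "bank.n.02", "bank.n.02"),
    (["bank", "river"], 0, "bank.n.01", "bank.n.09"),  # disagreement, dropped
]
seed = {"river": "rivière", "water": "eau", "money": "argent", "loan": "prêt"}
target_vectors = {
    "rive": Counter({"rivière": 3, "eau": 2}),
    "banque": Counter({"argent": 4, "prêt": 2}),
}

for sense, vec in sense_context_vectors(occurrences).items():
    print(sense, rank_translations(vec, seed, target_vectors)[0][0])
```

With the toy data, the loop prints "bank.n.01 rive" and then "bank.n.02 banque". Each sense-specific vector selects a different translation, which a single vector built from all occurrences of "bank" could not do. To mimic the seed-free setting for closely related languages, pass overlap_seed(source_vocabulary, target_vocabulary) in place of the hand-made seed dictionary.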