Sunday, June 30, 2013

Cross-Lingual Latent Topic Extraction. Duo Zhang, Qiaozhu Mei, ChengXiang Zhai. ACL 2010
  • Key ideas
    • Input: unaligned document sets in two languages, a bilingual dictionary
    • Output: 
      • a set of aligned topics (word distributions) in the two languages that characterize the shared topics
      • a topic coverage distribution for each language (the coverage of each topic in that language)
    • Method:
      • Start with ML objective of PLSA
      • Add a penalty term to the objective to incorporate dictionary constraints (DC); a sketch of the combined objective appears after this list
      • Dictionary modeled as a weighted bipartite graph (edge weight = translation probability)
      • ML estimation via Generalized EM, because the M-step of the constrained objective has no closed-form solution
        • Instead of maximizing the constrained objective in each M-step, just find parameters that improve on the current ones (see the GEM sketch after this list)
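
For concreteness, here is a hedged sketch of what such a constrained objective can look like (my notation; the paper's exact form of the constraint term may differ). With c(w,d) the count of word w in document d, p(w|θ_j) topic j's word distribution, E the dictionary's bipartite edges with translation weights w(u,v), and λ a placeholder trade-off weight:

\max \; \sum_{d}\sum_{w} c(w,d)\,\log\sum_{j} p(\theta_j\mid d)\, p(w\mid\theta_j) \;-\; \lambda \sum_{j}\sum_{(u,v)\in E} w(u,v)\,\bigl(p(u\mid\theta_j) - p(v\mid\theta_j)\bigr)^{2}

The first term is the usual PLSA log-likelihood; the second pulls the probabilities of dictionary-linked word pairs toward each other within each topic, which is what forces the extracted topics to align across the two languages.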
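And a minimal numerical sketch of the Generalized EM idea, using a toy quadratic penalty like the one above. All names, the nudge rule, and the acceptance test are my own illustration, not the paper's implementation:

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: D documents, V words, K topics; the "dictionary" links word 0 and
# word 1 as a translation pair with weight 1.0 (purely illustrative).
D, V, K = 4, 6, 2
counts = rng.integers(0, 5, size=(D, V)).astype(float)   # c(w, d)
dict_pairs = [(0, 1, 1.0)]                                # (u, v, translation weight)
lam = 0.5                                                 # hypothetical trade-off weight

p_z_d = rng.dirichlet(np.ones(K), size=D)                 # p(theta_j | d)
p_w_z = rng.dirichlet(np.ones(V), size=K)                 # p(w | theta_j)

def objective(p_z_d, p_w_z):
    """PLSA log-likelihood minus the (toy) dictionary penalty."""
    mix = p_z_d @ p_w_z                                    # p(w | d), shape (D, V)
    loglik = np.sum(counts * np.log(mix + 1e-12))
    penalty = sum(w * np.sum((p_w_z[:, u] - p_w_z[:, v]) ** 2)
                  for u, v, w in dict_pairs)
    return loglik - lam * penalty

for _ in range(50):
    # E-step: posterior p(z | d, w) under the current parameters.
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]          # (D, K, V)
    post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)

    # M-step, part 1: the closed-form PLSA updates (these alone ignore the penalty).
    p_z_d = (counts[:, None, :] * post).sum(axis=2)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    cand = (counts[:, None, :] * post).sum(axis=0)         # candidate p(w | z)
    cand /= cand.sum(axis=1, keepdims=True)

    # M-step, part 2 (the "generalized" bit): nudge linked words toward each
    # other, then accept the candidate only if the penalized objective improves.
    for u, v, w in dict_pairs:
        avg = 0.5 * (cand[:, u] + cand[:, v])
        cand[:, u] += 0.5 * (avg - cand[:, u])
        cand[:, v] += 0.5 * (avg - cand[:, v])
    cand /= cand.sum(axis=1, keepdims=True)

    if objective(p_z_d, cand) >= objective(p_z_d, p_w_z):
        p_w_z = cand          # improvement is enough; no need for the true maximizer

print("final penalized objective:", round(objective(p_z_d, p_w_z), 3))

The key line is the acceptance test at the end of the loop: GEM only requires that each M-step not decrease the objective, so a candidate update that merely improves on the current parameters is enough.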