Sunday, June 30, 2013

Cross-Lingual Latent Topic Extraction. Duo Zhang, Qiaozhu Mei, ChengXiang Zhai. ACL 2010
  • Key ideas
    • Input: unaligned document sets in two languages, a bilingual dictionary
    • Output: 
      • a set of aligned topics (word distributions) in the two languages that characterize the shared topics
      • a topic coverage distribution for each language (the coverage of each topic in that language)
    • Method:
      • Start with ML objective of PLSA
      • Add a penalty term to the objective to incorporate dictionary constraints (DC); a sketch of the combined objective appears after this list
      • Dictionary modeled as a weighted bipartite graph (edge weight = translation probability)
      • ML estimation via Generalized EM, because the M-step of the constrained objective has no closed-form solution
        • Instead of maximizing the constrained objective in each M-step, just find parameters that improve on the current ones (see the GEM sketch after this list)
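
For concreteness, here is a hedged sketch of what such a constrained objective can look like (my notation; the paper's exact form of the constraint term may differ). With c(w,d) the count of word w in document d, p(w|θ_j) topic j's word distribution, E the dictionary's bipartite edges with translation weights w(u,v), and λ a placeholder trade-off weight:

\max \; \sum_{d}\sum_{w} c(w,d)\,\log\sum_{j} p(\theta_j\mid d)\, p(w\mid\theta_j) \;-\; \lambda \sum_{j}\sum_{(u,v)\in E} w(u,v)\,\bigl(p(u\mid\theta_j) - p(v\mid\theta_j)\bigr)^{2}

The first term is the usual PLSA log-likelihood; the second pulls the probabilities of dictionary-linked word pairs toward each other within each topic, which is what forces the extracted topics to align across the two languages.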
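And a minimal numerical sketch of the Generalized EM idea, using a toy quadratic penalty like the one above. All names, the nudge rule, and the acceptance test are my own illustration, not the paper's implementation:

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: D documents, V words, K topics; the "dictionary" links word 0 and
# word 1 as a translation pair with weight 1.0 (purely illustrative).
D, V, K = 4, 6, 2
counts = rng.integers(0, 5, size=(D, V)).astype(float)   # c(w, d)
dict_pairs = [(0, 1, 1.0)]                                # (u, v, translation weight)
lam = 0.5                                                 # hypothetical trade-off weight

p_z_d = rng.dirichlet(np.ones(K), size=D)                 # p(theta_j | d)
p_w_z = rng.dirichlet(np.ones(V), size=K)                 # p(w | theta_j)

def objective(p_z_d, p_w_z):
    """PLSA log-likelihood minus the (toy) dictionary penalty."""
    mix = p_z_d @ p_w_z                                    # p(w | d), shape (D, V)
    loglik = np.sum(counts * np.log(mix + 1e-12))
    penalty = sum(w * np.sum((p_w_z[:, u] - p_w_z[:, v]) ** 2)
                  for u, v, w in dict_pairs)
    return loglik - lam * penalty

for _ in range(50):
    # E-step: posterior p(z | d, w) under the current parameters.
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]          # (D, K, V)
    post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)

    # M-step, part 1: the closed-form PLSA updates (these alone ignore the penalty).
    p_z_d = (counts[:, None, :] * post).sum(axis=2)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    cand = (counts[:, None, :] * post).sum(axis=0)         # candidate p(w | z)
    cand /= cand.sum(axis=1, keepdims=True)

    # M-step, part 2 (the "generalized" bit): nudge linked words toward each
    # other, then accept the candidate only if the penalized objective improves.
    for u, v, w in dict_pairs:
        avg = 0.5 * (cand[:, u] + cand[:, v])
        cand[:, u] += 0.5 * (avg - cand[:, u])
        cand[:, v] += 0.5 * (avg - cand[:, v])
    cand /= cand.sum(axis=1, keepdims=True)

    if objective(p_z_d, cand) >= objective(p_z_d, p_w_z):
        p_w_z = cand          # improvement is enough; no need for the true maximizer

print("final penalized objective:", round(objective(p_z_d, p_w_z), 3))

The key line is the acceptance test at the end of the loop: GEM only requires that each M-step not decrease the objective, so a candidate update that merely improves on the current parameters is enough.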