Monday, January 14, 2013

Notes on COLING 2012 - Part 1

Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking. Estelle DELPECH, Béatrice DAILLE, Emmanuel MORIN, Claire LEMAIRE

  • Problem: Extract translations of phrases (not just single words). The focus is on fertile translations, where the target has more words than the source.
  • Key ideas: 
    • Split the source term into morphemes (this helps handle both the multi-word case and fertility).
    • Translate the morphemes (a key assumption here is that the source phrase is compositional, i.e. its translation can be built from the translations of its parts).
    • Recompose the translated parts into a target phrase. This creates several candidates (e.g. by permutation), which are then ranked.
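
A minimal Python sketch of the split / translate / recompose / rank steps above. Everything in it is an assumption made for illustration: the toy German-English resources, the greedy longest-match splitter, and ranking by target-corpus frequency (these notes do not record how the paper actually ranks its candidates).

```python
from itertools import permutations, product

# Toy resources, invented for illustration only (not taken from the paper).
MORPHEME_LEXICON = {"brust", "krebs"}
BILINGUAL_DICT = {"brust": ["breast", "chest"], "krebs": ["cancer", "crab"]}
TARGET_CORPUS_COUNTS = {"breast cancer": 42, "chest cancer": 1}


def split_into_morphemes(term, lexicon):
    """Greedy longest-match split; returns None if the term is not fully covered."""
    term = term.lower()
    parts, i = [], 0
    while i < len(term):
        for j in range(len(term), i, -1):
            if term[i:j] in lexicon:
                parts.append(term[i:j])
                i = j
                break
        else:
            return None  # no known morpheme starts at position i
    return parts


def generate_candidates(parts, dictionary):
    """Translate each part, then recompose in every possible word order."""
    options = [dictionary[p] for p in parts]
    candidates = set()
    for choice in product(*options):
        for order in permutations(choice):
            candidates.add(" ".join(order))
    return candidates


def rank(candidates, counts):
    """Assumed ranking: most frequent in the target corpus first, ties alphabetical."""
    return sorted(candidates, key=lambda c: (-counts.get(c, 0), c))


def translate(term):
    parts = split_into_morphemes(term, MORPHEME_LEXICON)
    if parts is None:
        return []
    return rank(generate_candidates(parts, BILINGUAL_DICT), TARGET_CORPUS_COUNTS)
```

Calling translate("Brustkrebs") splits the one-word source term into two morphemes and returns two-word candidates, which is the fertile case: the target has more words than the source.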

Multi-way Tensor Factorization for Unsupervised Lexical Acquisition. Tim Van de Cruys, Laura Rimell, Thierry Poibeau, Anna Korhonen

  • Problem: Cluster the verbs in a corpus based on (a) which arguments each verb can take, (b) which arguments it prefers (among those possible), and (c) both of these jointly.
  • Key idea: Use non-negative tensor factorization (Shashua, A. and Hazan, T. Non-negative tensor factorization with applications to statistics and computer vision. ICML 2005) to cluster the verbs.
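
To make the key idea concrete, here is a small pure-Python sketch of non-negative factorization of a 3-way count tensor using multiplicative updates. It is a generic CP-style formulation under my own assumptions, not necessarily the exact model used in the paper: the verb x subject x object layout, the toy counts, the function names, and the "largest normalised loading" clustering rule are all hypothetical.

```python
import random

# Hypothetical toy counts, indexed [verb][subject][object].
# Verbs: eat, drink, devour, sip; subjects: person, animal; objects: food, liquid.
TOY_COUNTS = [
    [[6, 0], [3, 0]],  # eat
    [[0, 6], [0, 2]],  # drink
    [[4, 0], [2, 0]],  # devour
    [[0, 3], [0, 1]],  # sip
]


def reconstruct(A, B, C):
    """X_hat[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r]."""
    rank = len(A[0])
    return [[[sum(a[r] * b[r] * c[r] for r in range(rank)) for c in C]
             for b in B] for a in A]


def squared_error(X, factors):
    X_hat = reconstruct(*factors)
    return sum((x - xh) ** 2
               for Xi, Hi in zip(X, X_hat)
               for Xij, Hij in zip(Xi, Hi)
               for x, xh in zip(Xij, Hij))


def update_mode(X, factors, mode):
    """Multiplicative update of one factor matrix, holding the other two fixed."""
    X_hat = reconstruct(*factors)
    I, J, K = (len(f) for f in factors)
    rank = len(factors[0][0])
    o1, o2 = [m for m in range(3) if m != mode]
    num = [[0.0] * rank for _ in factors[mode]]
    den = [[0.0] * rank for _ in factors[mode]]
    for i in range(I):
        for j in range(J):
            for k in range(K):
                idx = (i, j, k)
                for r in range(rank):
                    w = factors[o1][idx[o1]][r] * factors[o2][idx[o2]][r]
                    num[idx[mode]][r] += X[i][j][k] * w
                    den[idx[mode]][r] += X_hat[i][j][k] * w
    # Scaling by num / den keeps every entry non-negative.
    factors[mode] = [[f * n / d if d > 0 else 0.0
                      for f, n, d in zip(f_row, n_row, d_row)]
                     for f_row, n_row, d_row in zip(factors[mode], num, den)]


def ntf(X, rank, sweeps=100, seed=0):
    """Return the three factor matrices and the error after each sweep."""
    rng = random.Random(seed)
    dims = (len(X), len(X[0]), len(X[0][0]))
    factors = [[[rng.random() + 0.1 for _ in range(rank)] for _ in range(d)]
               for d in dims]
    errors = [squared_error(X, factors)]
    for _ in range(sweeps):
        for mode in range(3):
            update_mode(X, factors, mode)
        errors.append(squared_error(X, factors))
    return factors, errors


def cluster_rows(A):
    """Assign each row (verb) to the component with its largest normalised loading."""
    col_sums = [sum(row[r] for row in A) or 1.0 for r in range(len(A[0]))]
    return [max(range(len(row)), key=lambda r: row[r] / col_sums[r]) for row in A]
```

A call such as `factors, errors = ntf(TOY_COUNTS, rank=2)` followed by `cluster_rows(factors[0])` gives one cluster label per verb. Because every factor stays non-negative, each component can be read as an additive "topic" over verbs, subjects and objects at once, which is what makes the verb loadings usable as soft cluster memberships.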

Incremental Learning of Affix Segmentation. Wondwossen Mulugeta, Michael Gasser, Baye Yimam

  • Problem: Affix segmentation (or morphological analysis) for Amharic (whose morphology seems as complex as that of Indian languages).
  • Approach: Directly used Inductive Logic Programming as described in [Manandhar, S., Džeroski, S. and Erjavec, T. Learning multilingual morphology with CLOG. ILP 1998]
    • Given data of the form: stem([s,e,b,e,r,k,u,l,h], [s,e,b,e,r], [1,1,1,2]). (seber is the stem of seberkulh.)
    • Learn a set of rules of the form "p :- q", meaning "p holds if q holds". Example rule:
        stem(Word, Stem, [1, 2, 7, 0]) :-
            set_affix(Word, Stem, [y], [], [u], []),
            feature([1, 2, 7, 0], [simplex, imperfective, tppn, noobj]),
            template(Stem, [1, 0, 1, 1]).
    • The order of the training data matters a lot: simpler examples should be given first, followed by more complex ones.
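
A rough Python sketch of two pieces of the approach above: applying an affix-rewriting rule, and presenting training examples from simple to complex. Both parts rest on assumptions of mine. I am reading set_affix(Word, Stem, P1, P2, S1, S2) as "replace prefix P1 with P2 and suffix S1 with S2", and I am using the length difference between word and stem as a stand-in for "simpler", since the notes give no exact definition of either.

```python
def set_affix(word, prefix_in, prefix_out, suffix_in, suffix_out):
    """Replace prefix_in by prefix_out and suffix_in by suffix_out.

    Returns the rewritten string, or None if the word does not match the rule.
    This mirrors an assumed reading of the Prolog predicate, not its actual code.
    """
    if len(word) < len(prefix_in) + len(suffix_in):
        return None
    if not word.startswith(prefix_in) or not word.endswith(suffix_in):
        return None
    middle = word[len(prefix_in):len(word) - len(suffix_in)]
    return prefix_out + middle + suffix_out


def complexity(example):
    """Hypothetical difficulty measure: how much affix material must be stripped."""
    word, stem = example
    return len(word) - len(stem)


def order_for_training(examples):
    """Simple-to-complex ordering; sorted() is stable, so ties keep their order."""
    return sorted(examples, key=complexity)
```

For instance, set_affix("seberkulh", "", "", "kulh", "") strips the suffix and yields the stem from the example above, while a rule with prefix "y" and suffix "u" (as in the learned rule) only fires on words that both start with y and end with u.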
