Notes on COLING 2012 - Part 1
Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking. Estelle DELPECH, Béatrice DAILLE, Emmanuel MORIN, Claire LEMAIRE
- Problem: Extract translations of phrases (not just single words). Focus on fertile translations, where the target has more words than the source.
- Key ideas: 
- split the source term into morphemes (this helps handle the multi-word case, and also fertility).
- translate the morphemes (a key assumption here is that the source phrase is compositional, i.e. its translation can be built from the translations of its parts).
- recompose into a target phrase. This creates several candidates (e.g. by permutation), which are then ranked.
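The three steps above can be sketched roughly as follows. This is only a toy illustration, not the authors' system: the morpheme list, the bilingual morpheme lexicon, the tiny target corpus and the ranking function (counting in-order occurrences with a small gap allowance, as a crude way to let function words appear between translated parts) are all assumptions made up for the example.

```python
from itertools import permutations, product

def split(term, morphemes):
    """Greedy longest-match split; returns None if the term cannot be covered."""
    parts, i = [], 0
    by_length = sorted(morphemes, key=len, reverse=True)
    while i < len(term):
        match = next((m for m in by_length if term.startswith(m, i)), None)
        if match is None:
            return None
        parts.append(match)
        i += len(match)
    return parts

def candidates(parts, lexicon):
    """Translate each morpheme, then generate every ordering of the translations."""
    seen = []
    for choice in product(*(lexicon[p] for p in parts)):
        for perm in permutations(choice):
            if perm not in seen:
                seen.append(perm)
    return seen

def occurrences(cand, tokens, max_gap=2):
    """Count in-order matches, allowing up to max_gap extra tokens between words."""
    count = 0
    for start, tok in enumerate(tokens):
        if tok != cand[0]:
            continue
        pos, ok = start, True
        for word in cand[1:]:
            window = tokens[pos + 1: pos + 2 + max_gap]
            if word not in window:
                ok = False
                break
            pos = pos + 1 + window.index(word)
        count += ok
    return count

def rank(cands, tokens):
    """Order candidates by how often they are attested in the target corpus."""
    return sorted(cands, key=lambda c: occurrences(c, tokens), reverse=True)

# Hypothetical toy resources (assumptions, not data from the paper).
MORPHEMES = {"cyto", "toxic"}
LEXICON = {"cyto": ["cellules"], "toxic": ["toxique"]}
CORPUS = ("ce composé est toxique pour les cellules tumorales . "
          "un agent toxique pour les cellules .").split()

parts = split("cytotoxic", MORPHEMES)
ranked = rank(candidates(parts, LEXICON), CORPUS)
```

Here `ranked[0]` holds only the content words of the best candidate; recovering the full fertile phrase (with its function words) would need an extra step that this sketch leaves out.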
Multi-way Tensor Factorization for Unsupervised Lexical Acquisition. Tim Van de Cruys, Laura Rimell, Thierry Poibeau, Anna Korhonen
- Problem: Cluster verbs in a corpus based on (a) which arguments each verb can take, (b) which arguments it prefers (among those possible), and (c) both jointly.
- Key idea: Use non-negative tensor factorization (Shashua, A. and Hazan, T. Non-negative tensor factorization with applications to statistics and computer vision. ICML 2005) to cluster the verbs.
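To make the key idea concrete, here is a minimal sketch of non-negative factorization of a 3-way tensor, using Lee-Seung-style multiplicative updates in pure Python. Several things are assumptions on my part: the verb x subject x object layout of the tensor, the toy 0/1 data, the rank, and the update rule itself (the paper and the cited Shashua and Hazan work may use a different algorithm and different tensor modes).

```python
import random

def reconstruct(F, i, j, k):
    """Value of the rank-R model at cell (i, j, k)."""
    return sum(a * b * c for a, b, c in zip(F[0][i], F[1][j], F[2][k]))

def error(X, F):
    """Squared reconstruction error over all cells."""
    return sum((X[i][j][k] - reconstruct(F, i, j, k)) ** 2
               for i in range(len(X))
               for j in range(len(X[0]))
               for k in range(len(X[0][0])))

def init_factors(dims, rank, seed=0):
    """One strictly positive random factor matrix per mode."""
    rng = random.Random(seed)
    return [[[rng.random() + 0.1 for _ in range(rank)] for _ in range(d)]
            for d in dims]

def ntf_step(X, F, eps=1e-9):
    """One sweep of multiplicative updates over the three modes, in place."""
    dims = [len(f) for f in F]
    rank = len(F[0][0])
    for mode in range(3):
        m1, m2 = [m for m in range(3) if m != mode]
        num = [[0.0] * rank for _ in range(dims[mode])]
        den = [[0.0] * rank for _ in range(dims[mode])]
        for i in range(dims[0]):
            for j in range(dims[1]):
                for k in range(dims[2]):
                    idx = (i, j, k)
                    x, xhat = X[i][j][k], reconstruct(F, i, j, k)
                    for r in range(rank):
                        # Product of the other two modes' factors for component r.
                        w = F[m1][idx[m1]][r] * F[m2][idx[m2]][r]
                        num[idx[mode]][r] += x * w
                        den[idx[mode]][r] += xhat * w
        for a in range(dims[mode]):
            for r in range(rank):
                # Multiplying by a non-negative ratio keeps factors non-negative.
                F[mode][a][r] *= num[a][r] / (den[a][r] + eps)

# Hypothetical toy data: X[v][s][o] = 1 if the triple is "observed".
VERBS = ["eat", "devour", "drink", "sip"]
SUBJECTS = ["person", "child"]
OBJECTS = ["bread", "apple", "water", "tea"]
X = [[[1.0 if (v < 2) == (o < 2) else 0.0 for o in range(4)]
      for s in range(2)] for v in range(4)]

F = init_factors((4, 2, 4), rank=2)
before = error(X, F)
for _ in range(200):
    ntf_step(X, F)
after = error(X, F)

# Read a cluster label off each verb's dominant component.
clusters = {verb: max(range(2), key=lambda r: F[0][v][r])
            for v, verb in enumerate(VERBS)}
```

Each row of the verb factor matrix `F[0]` gives a verb's loading on the latent components, so taking the dominant component is one simple way to turn the factorization into a clustering.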
 
Incremental Learning of Affix Segmentation. Wondwossen Mulugeta, Michael Gasser, Baye Yimam
- Problem: Affix segmentation (or morphological analysis) for Amharic (whose morphology seems as complex as that of Indian languages).
- Approach: Directly used Inductive Logic Programming as described in [Manandhar, S. , Džeroski, S. and Erjavec, T. Learning multilingual morphology with CLOG. ILP 1998]
- Given data of the form: stem([s,e,b,e,r,k,u,l,h],[s,e,b,e,r],[1,1,1,2]). [seber is the stem of seberkulh]
- Learn a set of rules of the form "p :- q", meaning "p holds if q is true". Example rule:
 stem(Word, Stem, [1, 2, 7, 0]) :-
  set_affix(Word, Stem, [y], [], [u], []),
  feature([1, 2, 7, 0], [simplex, imperfective, tppn, noobj]),
  template(Stem, [1, 0, 1, 1]).
- The order of the training data matters a lot: simpler examples should be given first, followed by more complex ones.
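A rough Python analogue of learning stem rules from word/stem pairs is sketched below. It is not CLOG and not the paper's ILP setup: it ignores the feature vectors and templates, assumes the stem is a contiguous substring of the word, and simply tries rules in the order they were learned. Only the seberkulh/seber pair comes from the notes above; the second word is a made-up string used to show a prefix plus suffix rule.

```python
def learn_rule(word, stem):
    """Derive a (prefix, suffix) stripping rule from one word/stem pair.

    Assumes the stem occurs as a contiguous substring of the word.
    """
    start = word.find(stem)
    if start < 0:
        return None
    return word[:start], word[start + len(stem):]

def learn(examples):
    """Collect rules in the order the examples are presented."""
    rules = []
    for word, stem in examples:
        rule = learn_rule(word, stem)
        if rule is not None and rule not in rules:
            rules.append(rule)
    return rules

def apply_rules(rules, word):
    """Return the stem given by the first rule whose affixes match the word."""
    for prefix, suffix in rules:
        fits = word.startswith(prefix) and word.endswith(suffix)
        if fits and len(word) > len(prefix) + len(suffix):
            return word[len(prefix):len(word) - len(suffix)]
    return word  # no rule applies: treat the whole word as the stem

examples = [
    ("seberkulh", "seber"),  # from the notes above
    ("yseberu", "seber"),    # made-up string, not attested Amharic
]
rules = learn(examples)
stem = apply_rules(rules, "seberkulh")
```

Because the first matching rule wins, the order in which examples are presented changes which rule fires for an ambiguous word, which is one simple way to see why training order can matter in this kind of learner.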