Friday, January 13, 2012

SIGIR Poster on Multi-Document Summarization

I read the poster MSSF: A Multi-Document Summarization Framework based on Submodularity (SIGIR 2011). It talked solving the problem using a budgeted maximum coverage problem based on submodularity. Some points to read about
  • Let E be a finite set. The function f:Pow(E)->R is submodular if, for any S \subset T \subset E, and s \in E \ T we have
f(T U {s}) - f(T) <= f(S U {s}) - f(S).
Submodular functions are closed under non-negative linear combinations.
  • Budgeted Maximum Coverage Problem (NP-Hard): Given a set E of sentences, where each subset S has an influence i(S) and a cost c(S), and a budget b, find a subset S of E which has the larget possible influence while the total cost does not exceed b.
  • For example, b is the maximum number of sentences in the summary, c(S) = number of sentences in S, and i(S) is a submodular function measuring the influence of the summary S, then there exist algorithms to solve this problem approximately. Different functions i(S) were proposed in the poster, for different settings (simple summarization, query-focused ~, update ~, comparitive ~). For the simple case
i(S) = \sum{s_i \in E\S} \sum{s_j \in S} sim(s_i, s_j) - \sum{s_i,s_j\in S, s_i != s_j} sim(s_i, s_j) = f1(S) - f2(S)
  • f1(S) measures to what degree S represents E, and f2(S) measures redundancies within S. f1(S) and -f2(S) are both submodular, so i(S) is also submodular.
Things to look up
  • Budgeted Maximum Coverage Problem. Khuller, Moss, Naor. 1999, Minoux 1978.
  • Maximum Marginal Relevance, Update summarization, Comparitive summarization, Query-focused summarization

Thursday, January 5, 2012

I am attending a School on Network Science at ECE, IISc, organized as part of the IMI, and funded by ICTS. I jotted down some points for revision/future look-up.

Networks, Rumours, and Epidemics [Dr Ayalvadi Ganesh]
References
  • Grimmett and Frieze
  • Damon Mosk-Aoyoma and Devavrat Shah. Distributed Computation of Separable Functions. IEEE Trans. Info. Theory. 2008
  • Consensus. de Groot et al.
  • Voter model. Hassin and Peleg, James Cruise and Ganesh Ayalvadi
  • Wright-Fisher model, Morale model
  • Wisdom of Crowds. Golub and Jackson
  • Acemoglu, Dahleh, Lobel, Ozdaglar
  • Kermack and McKendrick. Epidemic model using ODEs
  • Ganesh Ayalvadi, Moussalie, Tousley. Effect effect of network topology on epidemic spread. IEEE InfoComm. 2005
  • Draief, Ganesh, Moussalie. thresholds for Virus Spread on Nertworks. Ann. Appl. Prob. 2008
Topics
  • Coupon Collector Problem
  • Stochastic Domination, Strassen's Theorem
  • Markov Chains---irreducible, aperiodoc, ergodic, balance equations, invariant distribution, etc.
  • Markov Process (continuous time)
  • Poisson Process---thinning
  • Martingales, stopping time, Optional Stopping Theorem
  • Binary Entropy function, Conditional entropy
  • Random walks---coalescing RW
  • Coupling from the past in MCMC
  • Erdos-Renyi graphs, scale-free graphs, expander/ing graphs, unimodular graphs
  • Linear recurrences, harmonic functions
  • Lattices---infinite lattice
  • Perron-Frobenius Theorem---sprectral radius, Perron eigenvalue
  • Eigenvalue of e^A, where A is a matrix
  • Generalized isoperimetric constant
Ising Models [Andrea Montanari]
Topics
  • Hammersley-Clifford Theorem
  • Inclusion-Exclusion Principle
  • Union bound
  • Measures, Concentration
  • Gibbs sampling, MCMC, Metropolis
  • Constrastive Divergence (Hinton)
  • Glauber Markov Chain
  • tanh, atanh---properties
  • "One-dimensional recursion"
  • Inequalities---Markov, Chebyshev, ...
  • Sterling number---as related to enumerating number of graphs with p vertices and Cp edges 
  • Belief propagation, Generalized BP
  • Contraction (on graphs)