Research Notes: April 2012

quantum computing
- lattice (set theory)
- spectral theorem

designing search
- design depends on the context, which comprises
    - user (expert, novice, disabled, etc.)
    - task (adhoc, targeted, transactional, etc.)
    - environment (at home, on the move, on a desktop, on a phone, etc.)
- prioritize design goals
    - who am i designing for?
    - who am i _not_ designing for?
- faceted browsing: facets should be
    - exhaustive
    - consistent (e.g. (N.India, S.India) is ok, but (N.India, Bangalore) or (N.India, > Rs.10000) is not ok)
    - orthogonal / non-overlapping
    - of roughly the same size

Paolo Boldi
    - centrality measures; empirical study of how it correlates to node importance
    - has open software on his website for this (HyperANF at ...unimi.it)
    - geodesic, centrality
    - centrality measures (first 3 are geometric indexes)
        - based on degree
        - based on no. of paths
        - based on how close it is to others
        - spectral indexes
    - Lin centrality, harmonic centrality
    - Kendall's tau
    - computing centrality using diffusion
    - Pregel implementation
    - whitelist
    - naturalistic study
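
Sketch (mine, not from the talk) of harmonic centrality as I understand it: the score of node x is the sum of 1/d(y, x) over all other nodes y that can reach x, with unreachable nodes contributing 0. It assumes a small unweighted directed graph held in memory as a dict; the function name and graph format are my own choices, and this is not how HyperANF computes things.

```python
from collections import deque


def harmonic_centrality(graph):
    """graph: dict mapping node -> iterable of successor nodes (unweighted, directed)."""
    # Collect every node, including ones that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Reverse the edges so a BFS from x reaches every y that has a path to x.
    reverse = {n: [] for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].append(src)

    scores = {}
    for x in nodes:
        dist = {x: 0}
        queue = deque([x])
        total = 0.0
        while queue:
            u = queue.popleft()
            for v in reverse[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    total += 1.0 / dist[v]  # unreachable nodes simply add nothing
                    queue.append(v)
        scores[x] = total
    return scores
```

One BFS per node is fine for toy graphs only; web-scale graphs need an approximation.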

- neyman pearson lemma
- probability ranking principle (stephen robertson)

Explaining query modifications: An alternative interpretation of term addition and removal
- how and why do users modify queries
- what do they assume about how the search engine works

Predicting the Future Impact of News Events
- use various attributes to predict the various attributes
- use std ML techniques (SVR, feature selection)

Detection of News Feeds Items Appropriate for Children
- classify an article as 'appropriate' or 'not'
- use BBC data (labeled 'for children', and 'for adults')
- std readability measures (use info gain for feature selection)
    - ARI, Flesch-Kincaid
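
Sketch of the two readability measures, using the standard published formulas (ARI = 4.71 * chars/words + 0.5 * words/sentences - 21.43; Flesch-Kincaid grade = 0.39 * words/sentences + 11.8 * syllables/words - 15.59). The tokenizer, sentence splitter and syllable counter below are crude assumptions of mine, not what the paper used, and the functions assume non-empty text.

```python
import re


def _words(text):
    # Assumption: a word is any run of letters or digits.
    return re.findall(r"[A-Za-z0-9]+", text)


def _sentence_count(text):
    # Assumption: a sentence ends at a run of '.', '!' or '?'.
    return max(1, len(re.findall(r"[.!?]+", text)))


def _syllables(word):
    # Crude heuristic: one syllable per group of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def ari(text):
    words = _words(text)
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / _sentence_count(text) - 21.43


def flesch_kincaid_grade(text):
    words = _words(text)
    syllables = sum(_syllables(w) for w in words)
    return 0.39 * len(words) / _sentence_count(text) + 11.8 * syllables / len(words) - 15.59
```

Both scores could be fed as features to the 'appropriate' / 'not' classifier, next to whatever else survives feature selection.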

Yoelle Maarek
- usage data
    - query log
    - click data
    - (mouse track)
    - (eye track)
- searchwiki
- demographics of web search. wsdm 2011
- anatomy of the long tail. wsdm 2010

- B-cubed precision and recall: to evaluate soft-clustering
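
Sketch of the extended B-cubed measures for overlapping (soft) clusterings, written from memory of the multiplicity-based definition: for each pair of items that share a cluster, precision credits min(shared clusters, shared gold categories) / shared clusters; recall does the same with shared categories in the denominator. The data layout (item -> set of ids) and the function name are my own assumptions.

```python
def extended_bcubed(clusters, labels):
    """clusters, labels: dict item -> set of cluster ids / set of gold category ids.

    Returns (precision, recall), each averaged over items.
    """
    items = list(clusters)
    precisions, recalls = [], []
    for e in items:
        p_terms, r_terms = [], []
        for o in items:  # includes o == e, as in the usual definition
            shared_c = len(clusters[e] & clusters[o])
            shared_l = len(labels[e] & labels[o])
            if shared_c:
                p_terms.append(min(shared_c, shared_l) / shared_c)
            if shared_l:
                r_terms.append(min(shared_c, shared_l) / shared_l)
        # Items with no cluster (or no label) are skipped rather than scored.
        if p_terms:
            precisions.append(sum(p_terms) / len(p_terms))
        if r_terms:
            recalls.append(sum(r_terms) / len(r_terms))
    precision = sum(precisions) / len(precisions) if precisions else 0.0
    recall = sum(recalls) / len(recalls) if recalls else 0.0
    return precision, recall
```

With one cluster and one category per item this reduces to plain B-cubed for hard clusterings.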

- kdd 2006. "... center ... graph ... extraction ..."

- TrustRank (PageRank with bias towards inlinks from reliable pages)
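
Sketch of the idea as biased PageRank: the random jump goes only to a hand-picked set of trusted seed pages, so trust flows out along links from the seeds. Plain power iteration on a toy dict graph; the damping value, iteration count, dangling-node handling and all names are assumptions of mine, and the seed set must be non-empty.

```python
def trustrank(graph, trusted, damping=0.85, iterations=50):
    """graph: dict node -> list of nodes it links to; trusted: non-empty set of seed nodes."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    # Teleport distribution: uniform over the trusted seeds, zero elsewhere.
    seed = {n: (1.0 / len(trusted) if n in trusted else 0.0) for n in nodes}
    rank = dict(seed)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * seed[n] for n in nodes}
        for u in nodes:
            out = graph.get(u, [])
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Assumption: a page with no outlinks returns its mass to the seeds.
                for n in trusted:
                    nxt[n] += damping * rank[u] / len(trusted)
        rank = nxt
    return rank
```

Pages that cannot be reached from any seed end up with a score of zero, which is the point of the bias.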

- unique sets ratio

- Geometric MAP, instead of MAP
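
Sketch of why GMAP differs from MAP: the geometric mean of per-topic average precision rewards gains on poorly performing topics much more than the arithmetic mean does. The small floor that guards against log(0) is an assumption here (I believe TREC tooling uses a similar constant, but check before relying on the value).

```python
import math


def mean_average_precision(ap_scores):
    # Arithmetic mean of per-topic average precision.
    return sum(ap_scores) / len(ap_scores)


def geometric_map(ap_scores, floor=1e-5):
    # Geometric mean via logs; `floor` (assumed value) keeps zero-AP topics finite.
    logs = [math.log(max(ap, floor)) for ap in ap_scores]
    return math.exp(sum(logs) / len(logs))
```

Example: moving a hard topic from 0.01 to 0.02 AP and moving an easy topic from 0.90 to 0.91 change MAP by the same amount, but the first change moves GMAP far more.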

- statistical significance testing
    - parametric tests (e.g. t-test)
    - non-parametric tests
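
Sketch contrasting the two families on per-topic scores of two systems: a paired t statistic (parametric) and an exact sign-flip permutation test on the paired differences (one possible non-parametric choice, picked by me for illustration). Standard library only; the exact test enumerates 2^n sign patterns, so it only suits a handful of topics, and the t statistic assumes the differences are not all identical.

```python
import math
from itertools import product


def paired_t_statistic(a, b):
    # t = mean(d) / (stdev(d) / sqrt(n)) on the paired differences d.
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n)


def sign_flip_p_value(a, b):
    # Two-sided exact test: under the null, each difference is equally
    # likely to have either sign, so enumerate every sign pattern.
    d = [x - y for x, y in zip(a, b)]
    observed = abs(sum(d))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(d)):
        total += 1
        if abs(sum(s * x for s, x in zip(signs, d))) >= observed - 1e-12:
            hits += 1
    return hits / total
```

Looking up a p-value for the t statistic needs the t distribution with n - 1 degrees of freedom, which is left out here.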

- TREC. Banks. Variance due to topics.
- Chi-square test: to check if distribution is Normal
- Shapiro-Wilk test

- Box-Cox transformation
- ACE algo (alternating conditional expectation)
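
Sketch of the Box-Cox transformation for positive values, using the textbook definition: y = (x^lambda - 1) / lambda for lambda != 0, and y = ln(x) for lambda = 0. Choosing lambda (normally by maximum likelihood) is not shown; the function name is mine.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox transform to an iterable of strictly positive numbers."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]
```

The usual motive is to make skewed scores look closer to Normal before running a parametric test.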

- axiomatic approach. fang et al. sigir 2004

- 'bursty' distribution

- Lemur, Terrier systems

Leif Azzopardi et al. top-K retrieval.
    Chen and Karger. SIGIR 2006. Top-K retrieval.
    Prob Ranking Principle. Robertson.
    MMR. Carbonell and Goldstein. 1998
    modern portfolio theory. wang and zhu. ecir 2009
    quantum prob ranking principle. zuccon and azzopardi. ecir 2010
    comparison of ranking principles. zuccon, azzopardi, van rijsbergen. ictir 2011
    gollapudi and sharma. 2009. facilities placement and top-k
- language model with dirichlet smoothing
- alpha-nDCG@10
- diversification
    - kuland and kee, santos et al
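
Sketch of MMR (from the list above) as a greedy re-ranker: at each step pick the document maximizing lam * relevance - (1 - lam) * (max similarity to anything already picked). The relevance scores, the similarity function and the default lam are placeholders of mine; in practice they would come from the retrieval model.

```python
def mmr(candidates, relevance, similarity, k, lam=0.7):
    """candidates: list of doc ids; relevance: dict doc -> score;
    similarity: function (doc, doc) -> value in [0, 1]; returns up to k docs."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def score(doc):
            # Redundancy is the similarity to the closest already-selected doc.
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance[doc] - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

lam = 1 gives back the plain relevance ranking (the PRP ordering); lower values trade relevance for novelty.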

latent variable model
- max margin latent variable model, for inducing relevance functions
- log linear model p(x) = e^{h(x)} / sum_{x'} e^{h(x')}
- subgradient method for parameter estimation
- data sets (image)    - SUN, MIR Flickr
- Global SVM, Transductive SVM
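
Sketch of the log-linear model above over a finite set of outcomes: p(x) is the exponentiated score h(x), normalized over all outcomes x'. Subtracting the maximum score first is a standard numerical-stability trick and does not change the result; the dict-based interface is my own assumption.

```python
import math


def log_linear_probs(scores):
    """scores: dict outcome -> h(outcome). Returns dict outcome -> p(outcome)."""
    m = max(scores.values())
    # exp(h - m) / sum exp(h' - m) equals exp(h) / sum exp(h'), without overflow.
    exps = {x: math.exp(h - m) for x, h in scores.items()}
    z = sum(exps.values())
    return {x: e / z for x, e in exps.items()}
```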

1000 search engines
- problem of search
    - many data sources: www, wiki, news, patents, tweet
    - many result types: docs, temp, curr, people
    - many relevances: topical, recency
- probabilistic relational algebra (PRA)

context aware recommendation
- tensor factorization
- ndcg, auc, loss functions
- stochastic GD, alternating least squares, bundle methods
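
Sketch of nDCG for one ranked list of graded relevance labels. Assumptions of mine: gain 2^rel - 1, discount log2(rank + 1), and the ideal ranking is built from the same list (so the list is taken to contain every judged item).

```python
import math


def dcg(rels, k=None):
    # rels: relevance grades in ranked order; position i (0-based) is rank i + 1.
    rels = rels[:k] if k is not None else rels
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))


def ndcg(rels, k=None):
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0
```

nDCG is not differentiable in the model scores, which is presumably why smooth loss functions get optimized instead.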

TagME
- tagme.dl.unipi.it
- milne and witten 2008
- earthmover distance (on graphs?)
- "combinatorial"
- wsdm 2012 paper on result clustering
- Lingo, Lingo 3G
- Carpineto, Osinski, et al. ACM Comp Surv 2009
- Carpineto et al. SIGIR 2010.
    - Optimal search results clustering
    - kSSL (method for evaluating search result clustering), AMBIENT (data set)
- relative mean difference
- TreeNet tool
- Boosted decision trees

* Listen to all talks, to gauge the audience. Then tune own presentation accordingly. (Don't say things that everyone knows. Don't forget to say things that most don't know.)

Friday, April 6, 2012

Notes from ECIR 2012