## Friday, May 25, 2012

C. E. Liu, K. Thambiratnam, F. Seide. InterSpeech 2007.

Application: Given an audio clip and some text metadata, generate the terms that should be used to index the clip.

Problem: Given the text metadata, adapt the vocabulary using an external corpus (e.g. the internet), and then choose indexing terms from the adapted vocabulary. More precisely, look at an external corpus and guess which words were spoken in the audio but are missing from the indexer's vocabulary (and, of course, from the metadata as well).

Highlights:
• Use the text metadata to query an internet search engine. Pick useful words from the retrieved document set and update the vocabulary. Also use the retrieved document set to adapt the language model.
• Distinguish between term frequency TF_{t,d} = c(t,d) / \sum_{t'} c(t',d) and tapered term frequency TTF_{t,d} = 1 + \log(TF_{t,d}). In the Stanford NLP course, they define TF_{t,d} = 1 + \log c(t,d). Which one is used when?
• Similar to the above, they define TFIDF and TTFIDF.
• The problem reduces to this: for each word in each retrieved document, classify it as a candidate out-of-vocabulary (OOV) term (i.e. predict that it has been mentioned in the audio) or not. To do this, build a classifier as usual, using audio transcripts as the ground-truth training data. The feature vector for a word consists of its TF, TFIDF, TTF, TTFIDF, POS tag, etc.
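
To make the frequency features concrete, here is a minimal Python sketch of how TF, TTF, TFIDF and TTFIDF could be computed for every word in a set of retrieved documents. The TF and TTF formulas follow the definitions in the notes above. The IDF form, log(N / df), and the products TF·IDF and TTF·IDF are my assumptions, since these notes do not record the paper's exact definitions. The function name `term_features` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def term_features(docs):
    """docs: list of token lists. Returns {(doc_index, term): feature dict}."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in docs for t in set(doc))
    feats = {}
    for i, doc in enumerate(docs):
        counts = Counter(doc)
        total = sum(counts.values())
        for t, c in counts.items():
            tf = c / total                   # TF = c(t,d) / sum_t' c(t',d)
            ttf = 1 + math.log(tf)           # TTF = 1 + log(TF), as in the notes
            idf = math.log(n_docs / df[t])   # assumed IDF form, not from the paper
            feats[(i, t)] = {
                "tf": tf,
                "ttf": ttf,
                "tfidf": tf * idf,           # assumed: TF times IDF
                "ttfidf": ttf * idf,         # assumed: TTF times IDF
            }
    return feats

# Toy stand-in for the retrieved document set.
docs = [["speech", "index", "speech", "term"], ["index", "audio"]]
f = term_features(docs)
print(f[(0, "speech")])
```

Because TF here is a normalized frequency of at most 1, log(TF) is never positive, so TTF as defined above never exceeds 1. This differs from the count-based 1 + log(c(t,d)) variant, which is always at least 1. In a classifier, each feature dict would be one part of the word's feature vector, alongside features such as the POS tag.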