Advanced Topics in Information Retrieval
Spring 2008

This is largely a placeholder at the moment. For the nonce, here are some general outline thoughts for the class:

Friday, February 8. Passages/XML

James presenting

In this class we consider passage retrieval and XML element retrieval. The general idea is that in some situations a portion of a document is more useful than the entire thing. In passage retrieval, a key question is how to find those portions. In XML retrieval, the XML markup elements may already provide the right portions.
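As a concrete, hypothetical illustration of what "finding the portions" can mean, here is a minimal sketch of one simple passage definition: fixed-length, overlapping word windows, each scored by how many query-term occurrences it contains, with the document represented by its best window. The function names `window_passages` and `best_passage` and the default `width` and `step` values are assumptions for illustration, not taken from the readings. A real system would score windows with a proper ranking function rather than raw term counts.

```python
def window_passages(tokens, width, step):
    """Split a token list into fixed-length windows starting every `step` tokens.

    Returns (start_offset, window_tokens) pairs. When step < width,
    consecutive windows overlap by width - step tokens; the last window
    may be shorter than `width`.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    passages = []
    start = 0
    while True:
        passages.append((start, tokens[start:start + width]))
        if start + width >= len(tokens):  # this window reaches the end
            break
        start += step
    return passages


def best_passage(tokens, query_terms, width=50, step=25):
    """Return (start_offset, score) of the highest-scoring window.

    The score is the raw count of query-term occurrences in the window,
    a deliberately simple stand-in for a real ranking function.
    Ties go to the earliest window.
    """
    query = set(query_terms)
    best_start, best_score = 0, -1
    for start, window in window_passages(tokens, width, step):
        score = sum(1 for tok in window if tok in query)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


if __name__ == "__main__":
    doc = "a b c d e f g h".split()
    print(best_passage(doc, ["e", "f"], width=3, step=2))  # (4, 2)
```

The overlap is the design choice that matters here: with non-overlapping windows, a short relevant stretch that happens to straddle a boundary is split in two and neither half scores well.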

  1. (Required) Kaszkiel and Zobel, "Passage retrieval revisited", SIGIR 1997 (9 pages, at ACM). This paper evaluates a range of passage retrieval approaches. It builds on a classic SIGIR 1994 paper by Callan.
  2. (At least the poster) Jiang and Zhai, "Extraction of coherent relevant passages using hidden Markov models". ACM Transactions on Information Systems, 24(3):295-319. (25 pages, at ACM). This paper proposes a way to extract passages of just the right length. A sketch of the work is available in a poster presented at SIGIR.
  3. (Skim at least) Fuhr et al., "Overview of the INEX 2007 Ad Hoc Track". Pre-proceedings of INEX 2007. (22 pages, pdf). Provides an overview of retrieval using XML elements, much of it a summary of evaluation results. A separate description of the query format may also be useful.
  4. Ogilvie and Callan, "Hierarchical language models for XML component retrieval". In Proceedings of the INEX 2005 workshop. (15 pages, pdf). This paper sketches how element retrieval in XML can be done in a language modeling framework.
  5. (Required) Kamps and Koolen, "On the relation between relevant passages and XML document structure." Proceedings of the SIGIR 2007 workshop on focused retrieval. (5 pages, pdf, pdf slides, full workshop proceedings). This paper links XML element retrieval and passage retrieval: is there a relationship at all?
  6. Itakura and Clarke, "From passages into elements in XML retrieval". Proceedings of the SIGIR 2007 workshop on focused retrieval. (6 pages, pdf, proceedings). How should those relevant XML elements be found?

Friday, February 15. Learning to rank

Ben presenting