Precise Zero-Shot Dense Retrieval without Relevance Labels
Carnegie Mellon University · University of Waterloo
Abstract
While dense retrieval has been shown to be effective and efficient across tasks and languages, it remains difficult to create effective fully zero-shot dense retrieval systems when no relevance labels are available. In this paper, we recognize the difficulty of zero-shot learning and encoding relevance, and instead propose to pivot through Hypothetical Document Embeddings (HyDE). Given a query, HyDE first zero-shot prompts an instruction-following language model (e.g., InstructGPT) to generate a hypothetical document. The document captures relevance patterns but is "fake" and may contain hallucinated details. Then, an unsupervised contrastively learned encoder (e.g., Contriever) encodes the document into an…
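The two-step pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `generate_hypothetical_document` is a hypothetical stub that returns a canned passage in place of a call to an instruction-following model such as InstructGPT. `encode` is a toy hashed bag-of-words encoder swapped in for Contriever. The final step, ranking corpus documents by cosine similarity to the embedded hypothetical document, is the generic dense-retrieval nearest-neighbour search and is assumed here, because the abstract is cut off before describing it.

```python
import math
import re
import zlib

DIM = 4096  # toy vector size; an assumption, not Contriever's dimensionality


def generate_hypothetical_document(query: str) -> str:
    """Hypothetical stand-in for an instruction-following LM (e.g. InstructGPT).

    A real system would prompt the model with something like
    "Write a passage that answers the question: {query}". Here a canned
    passage is returned so the sketch runs offline.
    """
    canned = {
        "how do bees make honey?": (
            "Bees collect nectar from flowers, store it in their honey stomach, "
            "and evaporate water from it inside the hive to make honey."
        )
    }
    # Fall back to the query text itself if no canned passage exists.
    return canned.get(query.lower(), query)


def encode(text: str) -> list[float]:
    """Toy encoder (stand-in for Contriever): hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a: list[float], b: list[float]) -> float:
    # Both vectors are unit-length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def hyde_search(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Embed a generated document (not the query) and return the k nearest docs."""
    hypothetical = generate_hypothetical_document(query)
    q_vec = encode(hypothetical)
    doc_vecs = [encode(doc) for doc in corpus]  # would be pre-indexed in practice
    ranked = sorted(
        range(len(corpus)), key=lambda i: dot(q_vec, doc_vecs[i]), reverse=True
    )
    return [corpus[i] for i in ranked[:k]]


corpus = [
    "Dense retrieval encodes queries and documents as vectors.",
    "Honey is made when bees collect nectar from flowers and evaporate its water in the hive.",
    "Contrastive learning pulls related pairs together in embedding space.",
]
print(hyde_search("How do bees make honey?", corpus))  # prints the honey/nectar passage
```

In this sketch only the generated passage is embedded, so the comparison is document-to-document rather than query-to-document. A production system would precompute the corpus embeddings once and use an approximate nearest-neighbour index instead of the linear scan shown here.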
Citation impact
- FWCI (Field-Weighted Citation Impact): 36.37
- Percentile: 100%
- References: 53
Authors: 4
Topics & keywords
- Relevance (law)
- Computer science
- Similarity (geometry)
- Relevance feedback
- Embedding
- Vector space model
- Artificial intelligence
- Information retrieval
- Quality Education