H.3.8. Natural Language Processing
Mozhgan Akaberi; Maryam Khodabakhsh; Seyedehfatemeh Karimi; Hoda Mashayekhi
Abstract
The exponential growth of digital information has increased the demand for robust and efficient Information Retrieval (IR) systems. Query Performance Prediction (QPP) is a critical task for identifying difficult queries and enhancing retrieval strategies. However, existing QPP methods suffer from three limitations: (1) score-based approaches fail to capture the structural relationships among retrieved documents; (2) supervised methods require labeled training data, making them costly and impractical for new domains; and (3) unsupervised post-retrieval predictors often rely solely on retrieval score dispersion, neglecting document clustering effects. To address these limitations, we propose a novel clustering-based post-retrieval QPP method. Specifically, we introduce three unsupervised predictors: Clustered Distinction, which measures the query-specific separability of retrieved clusters; Clustered Query Drift, which estimates the deviation of top-ranked documents from the query intent; and a hybrid predictor that combines both. By analyzing the clustering structure of the retrieved documents, our method improves interpretability while eliminating the need for labeled data. We evaluate our approach on three standard datasets: the large-scale MS MARCO Passage Ranking dataset, TREC DL 2019, and TREC DL 2020. Experimental results demonstrate that our method significantly outperforms state-of-the-art score-based QPP models. These findings highlight the potential of cluster-aware QPP for enhancing IR systems and reducing the impact of difficult queries.
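The abstract contrasts score-dispersion predictors with cluster-aware ones but does not give formal definitions. The sketch below is only an illustration of that contrast, not the authors' Clustered Distinction or Clustered Query Drift predictors. The function names, the use of cosine similarity over document vectors, and the assumption that cluster labels come from some earlier clustering step are all hypothetical choices made for this example.

```python
import math
from statistics import pstdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def score_dispersion(scores):
    """Score-only signal: population standard deviation of the top-k retrieval scores."""
    return pstdev(scores)


def cluster_separability(query_vec, doc_vecs, labels):
    """Hypothetical cluster-aware signal (an assumption, not the paper's formula).

    Groups retrieved documents by cluster label, compares the query with each
    cluster centroid, and returns the gap between the best-matching centroid
    and the mean of the remaining centroids. A large gap suggests that one
    cluster clearly matches the query; a gap near zero suggests ambiguity.
    """
    clusters = {}
    for vec, label in zip(doc_vecs, labels):
        clusters.setdefault(label, []).append(vec)
    sims = sorted(
        (cosine(query_vec, centroid(vecs)) for vecs in clusters.values()),
        reverse=True,
    )
    if len(sims) < 2:
        return 0.0  # a single cluster carries no separability information
    rest = sims[1:]
    return sims[0] - sum(rest) / len(rest)


# Toy retrieved set: two tight clusters along different axes.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]

focused = cluster_separability([1.0, 0.0], docs, labels)    # query matches one cluster
ambiguous = cluster_separability([1.0, 1.0], docs, labels)  # query sits between clusters
print(focused > ambiguous)
```

In this toy setting the focused query yields a much larger gap than the query lying between the two clusters, which is the kind of structural signal that a predictor based only on retrieval scores cannot observe.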