Using query intent to boost retrieval results

Enhance your retrieval systems by determining user query intent and tailoring search strategies accordingly.

Understanding the intent behind a user’s query is crucial for delivering accurate and relevant search results. By determining whether a user is searching for specific documents or seeking answers to questions, we can tailor our retrieval systems to provide better results.

How to determine query intent

The first step in improving retrieval results is to determine the user’s intent. Queries can generally be classified into two categories:

Search Intent: The user knows what they’re looking for and wants to retrieve specific documents or information.
Answer Intent: The user is seeking an answer to a question and may not know where to find it.

By accurately identifying the intent, we can adjust our retrieval strategies accordingly.

Custom query intents

Depending on your application, you might have more specific or custom query intents. For example, you may add intents like Navigation, Command, or Clarification. Defining custom intents allows you to handle queries more precisely.
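
As a rough sketch of what custom intents could look like, the enum below adds the three intents mentioned above and maps each one to a handling strategy. The `CustomIntent` name, the `STRATEGIES` table, and the strategy descriptions are illustrative assumptions, not part of any library.

```python
from enum import Enum


class CustomIntent(str, Enum):
    # The two base intents from this article.
    SEARCH = "SEARCH"
    ANSWER = "ANSWER"
    # Hypothetical application-specific intents.
    NAVIGATION = "NAVIGATION"
    COMMAND = "COMMAND"
    CLARIFICATION = "CLARIFICATION"


# Hypothetical routing table: how each intent might be handled.
STRATEGIES = {
    CustomIntent.SEARCH: "keyword-heavy retrieval",
    CustomIntent.ANSWER: "semantic-heavy retrieval",
    CustomIntent.NAVIGATION: "jump straight to the matching page",
    CustomIntent.COMMAND: "run an action instead of retrieving",
    CustomIntent.CLARIFICATION: "ask the user a follow-up question",
}


def route(intent: CustomIntent) -> str:
    """Return the handling strategy for an intent."""
    return STRATEGIES[intent]
```

Keeping the routing in one table makes it easy to check that every intent you define actually has a handler.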

Implementing intent detection

To determine the query intent, we can use a language model or rule-based system. Below is a Python example using a language model to classify intent.

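from enum import Enum
from typing import List

import instructor
from anthropic import Anthropic
from pydantic import BaseModel

# Assumption: the original snippet does not show how `client` is created.
# The `response_model` argument matches the instructor library, so we wrap
# an Anthropic client with it here.
client = instructor.from_anthropic(Anthropic())

# Placeholder: set this to the model you actually use.
MODEL = "your-model-name"

class Intent(str, Enum):
    Search = "SEARCH"
    Answer = "ANSWER"

class Query(BaseModel):
    intent: Intent
    keywords: List[str]
    term: str

def determine_query(input_text: str) -> Query:
    """Classify the intent of a query and return it as a structured Query."""
    query = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Determine the query intent of <query>{input_text}</query>"
        }],
        response_model=Query
    )

    return query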

Here’s how it can be used:

query = determine_query("q4 planning notes")

is_search = query.intent == Intent.Search

semantic_weight = 0.3 if is_search else 0.7
fts_weight = 1 - semantic_weight

results = retrieve_documents(
    query=query.term,
    semantic_weight=semantic_weight,
    keyword_weight=fts_weight
)

In this example:

We define an Intent enum with possible intents.
The determine_query function uses an LLM to classify the intent. We could improve this by training a custom classifier on labeled examples, which becomes practical once the system is deployed and you have real queries to learn from.
We adjust the weights for semantic and keyword-based retrieval methods based on the detected intent.
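
The example above calls `retrieve_documents` without showing it, so here is one way the weighting could work, as a sketch under assumptions: each retriever returns a score per document ID, scores are min-max normalized so the two scales are comparable, and the final score is a weighted sum. The function names and the toy scores are made up for illustration.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1]; a constant score set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine_scores(
    semantic: Dict[str, float],
    keyword: Dict[str, float],
    semantic_weight: float,
    keyword_weight: float,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores, best first.

    A document missing from one retriever contributes 0 for that side.
    """
    sem, kw = normalize(semantic), normalize(keyword)
    combined = {
        doc_id: semantic_weight * sem.get(doc_id, 0.0)
        + keyword_weight * kw.get(doc_id, 0.0)
        for doc_id in set(sem) | set(kw)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy scores: "notes" matches the keywords exactly, "recap" is only
# semantically close.
semantic_scores = {"notes": 0.40, "recap": 0.90, "other": 0.10}
keyword_scores = {"notes": 12.0, "recap": 2.0, "other": 1.0}

search_ranking = combine_scores(semantic_scores, keyword_scores, 0.3, 0.7)
answer_ranking = combine_scores(semantic_scores, keyword_scores, 0.7, 0.3)
```

With these toy numbers, the search-intent weights rank the exact-match document first, while the answer-intent weights put the semantically closer one on top.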

Taking it further

You can extend this implementation by adding more sophisticated logic or handling additional intents.

Including confidence scores

You might want to include a confidence score for the intent prediction.

class Query(BaseModel):
    intent: Intent
    confidence: float
    keywords: List[str]
    term: str

query = determine_query(user_input)

if query.confidence < 0.8:
    semantic_weight = 0.5
    keyword_weight = 0.5
elif query.intent == Intent.Search:
    ...
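
To make the branching concrete, here is a small sketch that folds the confidence check and the intent-based weights into one function. The `pick_weights` name and the `threshold` parameter are my own; the numbers are the ones used earlier in this post. Keep in mind that a confidence value reported by a language model is not guaranteed to be calibrated, so treat the threshold as something to tune.

```python
from enum import Enum
from typing import Tuple


class Intent(str, Enum):
    Search = "SEARCH"
    Answer = "ANSWER"


def pick_weights(
    intent: Intent, confidence: float, threshold: float = 0.8
) -> Tuple[float, float]:
    """Return (semantic_weight, keyword_weight) for a classified query."""
    if confidence < threshold:
        # Not sure about the intent: weight both retrievers equally.
        return 0.5, 0.5
    semantic_weight = 0.3 if intent == Intent.Search else 0.7
    return semantic_weight, 1 - semantic_weight
```

For example, `pick_weights(Intent.Search, 0.6)` falls back to an even split because the confidence is below the threshold.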

Handling search intent

When the user’s intent is to search for specific information, we should prioritize exact matches and consider potential typos or misspellings. Boosting keyword-driven techniques like BM25 can significantly improve results in this case.

Example: “Q4 planning notes”

In this query, the user is likely looking for a specific document or set of documents related to Q4 planning notes. To handle this:

Boost BM25 and Keyword Search: Since the user probably knows what they’re looking for, prioritize documents with exact keyword matches.
Handle Typos and Misspellings: Implement fuzzy search techniques using tools like pg_trgm in PostgreSQL. This is less relevant if you’re relying only on semantic search.
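
To give a feel for how trigram-based fuzzy matching tolerates typos, here is a pure-Python sketch that approximates the idea behind pg_trgm: split text into three-character chunks and compare the sets. It is not the PostgreSQL implementation, and the function names are my own. The 0.3 default mirrors pg_trgm's default similarity threshold.

```python
import re
from typing import List, Set


def trigrams(text: str) -> Set[str]:
    """Trigram set in the style of pg_trgm: lowercase, split into words,
    pad each word with two leading spaces and one trailing space."""
    grams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def fuzzy_match(query: str, titles: List[str], threshold: float = 0.3) -> List[str]:
    """Titles whose similarity to the query reaches the threshold, best first."""
    scored = [(similarity(query, title), title) for title in titles]
    return [title for score, title in sorted(scored, reverse=True) if score >= threshold]


# "planing" is a typo for "planning"; the title should still match.
matches = fuzzy_match("Q4 planing notes", ["Q4 planning notes", "Holiday schedule"])
```

Here the misspelled query still matches "Q4 planning notes" because most of its trigrams are shared, while "Holiday schedule" shares none and is dropped.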

Handling answer intent

When the user is seeking an answer, semantic understanding becomes more important. In this case, boosting semantic search techniques can help retrieve documents that contain the answer, even if they don’t directly match the query terms.

Example: “What did we decide during the Q4 planning meeting last week?”

To address this query:

Boost Semantic Search: Use embeddings and similarity measures to find documents that are contextually relevant.
Leverage Contextual Information: Incorporate metadata like dates to narrow down results to the relevant time frame.
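
The two steps above can be sketched as a date filter followed by a similarity ranking. This is a minimal illustration, assuming documents already carry an embedding and a creation date. The `Doc` type, the three-dimensional vectors, and the dates are toy values I made up; real embeddings come from an embedding model.

```python
import math
from datetime import date
from typing import List, NamedTuple, Sequence


class Doc(NamedTuple):
    title: str
    created: date
    embedding: Sequence[float]  # stand-in for a real embedding vector


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(
    query_embedding: Sequence[float],
    docs: List[Doc],
    start: date,
    end: date,
) -> List[Doc]:
    """Keep docs created within [start, end], then rank by similarity."""
    in_range = [d for d in docs if start <= d.created <= end]
    return sorted(
        in_range,
        key=lambda d: cosine(query_embedding, d.embedding),
        reverse=True,
    )


docs = [
    Doc("Q4 planning meeting decisions", date(2024, 10, 8), [0.9, 0.1, 0.0]),
    Doc("Q3 planning meeting decisions", date(2024, 7, 9), [0.9, 0.2, 0.0]),
    Doc("Office lunch menu", date(2024, 10, 9), [0.0, 0.1, 0.9]),
]
query_embedding = [1.0, 0.1, 0.0]
results = semantic_search(
    query_embedding, docs, date(2024, 10, 7), date(2024, 10, 13)
)
```

The Q3 document is very similar to the query in embedding space, but the date filter removes it before ranking, which is the point of using metadata for "last week" style questions.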

Avoid relying on specific query patterns and punctuation

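I’ve seen query intent determined by checking whether the query contains a question mark. This is not a good idea: people often type questions without punctuation, and a question mark alone says little about what they want. It’s a perfect example of where LLMs provide a much better user experience.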
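
To see why the question mark heuristic breaks down, here is a tiny sketch. The `naive_intent` function and the hand-labeled examples are hypothetical and only meant to show the failure mode.

```python
def naive_intent(query: str) -> str:
    """Punctuation heuristic: a question mark means ANSWER, else SEARCH."""
    return "ANSWER" if "?" in query else "SEARCH"


# Hand-labeled toy examples: (query, intent a person would assign).
examples = [
    ("what did we decide during the q4 planning meeting", "ANSWER"),
    ("Q4 planning notes?", "SEARCH"),
    ("How did our sales compare between Q3 and Q4 last year?", "ANSWER"),
]

# Queries where the heuristic disagrees with the human label.
misclassified = [q for q, expected in examples if naive_intent(q) != expected]
```

The heuristic only gets the well-punctuated question right. The unpunctuated question and the document lookup with a stray question mark both end up in `misclassified`.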

Additional examples

Query	Intent	Approach
“Open the Q4 planning spreadsheet”	`Search`	Boost exact match and consider user permissions to ensure they can access the document.
“How did our sales compare between Q3 and Q4 last year?”	`Answer`	Use semantic search to find documents discussing sales comparisons between quarters.
“List all customers who purchased product X in the last month.”	`Answer`	Use semantic search along with metadata filters to retrieve accurate results.

Conclusion

Effectively determining query intent can enhance your retrieval systems, leading to more accurate results and happier users. As always, try different methods and evaluate them to see what performs best for you. And once you have enough data, use it to optimize for what your users are actually searching for.
