Using query intent to boost retrieval results
Enhance your retrieval systems by determining user query intent and tailoring search strategies accordingly.
Understanding the intent behind a user’s query is crucial for delivering accurate and relevant search results. By determining whether a user is searching for specific documents or seeking answers to questions, we can tailor our retrieval systems to provide better results.
How to determine query intent
The first step in improving retrieval results is to determine the user’s intent. Queries can generally be classified into two categories:
- Search Intent: The user knows what they’re looking for and wants to retrieve specific documents or information.
- Answer Intent: The user is seeking an answer to a question and may not know where to find it.
By accurately identifying the intent, we can adjust our retrieval strategies accordingly.
Custom query intents
Depending on your application, you might have more specific or custom query intents. For example, you may add intents like Navigation, Command, or Clarification. Defining custom intents allows you to handle queries more precisely.
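As a rough sketch of what custom intents could look like, the enum below adds the three intents named above to the two base ones. The string values, the `RETRIEVAL_INTENTS` set, and the `needs_retrieval` helper are illustrative assumptions, not part of any library.

```python
from enum import Enum


class Intent(str, Enum):
    Search = "SEARCH"
    Answer = "ANSWER"
    # Custom intents named in the text; the string values are assumptions.
    Navigation = "NAVIGATION"
    Command = "COMMAND"
    Clarification = "CLARIFICATION"


# Hypothetical routing rule: only these intents go through document retrieval.
RETRIEVAL_INTENTS = {Intent.Search, Intent.Answer}


def needs_retrieval(intent: Intent) -> bool:
    """Return True when the intent should be handled by the retrieval pipeline."""
    return intent in RETRIEVAL_INTENTS
```

With a split like this, a Command or Clarification query can skip retrieval entirely and be handled by whatever logic suits your application.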
Implementing intent detection
To determine the query intent, we can use a language model or a rule-based system. Below is a Python example using a language model to classify intent.
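from enum import Enum
from typing import List

import anthropic
import instructor
from pydantic import BaseModel

# Assumption: `client` is an Anthropic client wrapped with the instructor
# library, which is what provides the `response_model` argument used below.
client = instructor.from_anthropic(anthropic.Anthropic())
MODEL = "your-model-name"  # placeholder; substitute the model you use

class Intent(str, Enum):
    Search = "SEARCH"
    Answer = "ANSWER"

class Query(BaseModel):
    intent: Intent
    keywords: List[str]  # key terms extracted from the query
    term: str  # search string passed on to retrieval

def determine_query(input_text: str) -> Query:
    # Ask the model to fill in the Query schema for this input.
    query = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Determine the query intent of <query>{input_text}</query>"
        }],
        response_model=Query
    )
    return query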
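Here’s how it can be used:
query = determine_query("q4 planning notes")
is_search = query.intent == Intent.Search
# Search intent leans on keyword matching; answer intent leans on semantics.
semantic_weight = 0.3 if is_search else 0.7
fts_weight = 1 - semantic_weight
# retrieve_documents stands in for your own hybrid retrieval function.
results = retrieve_documents(
    query=query.term,
    semantic_weight=semantic_weight,
    keyword_weight=fts_weight
)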
In this example:
- We define an `Intent` enum with possible intents.
- The `determine_query` function uses an LLM to classify the intent. We could improve this by training a custom model on labeled examples, which becomes practical once the deployed system has collected real queries.
- We adjust the weights for semantic and keyword-based retrieval methods based on the detected intent.
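To make the effect of the weights concrete, here is a minimal sketch of how a hybrid retriever might blend the two signals. It assumes each method has already produced a score per document, normalized to the range 0 to 1. The `combine_scores` function, the document IDs, and the scores are all made up for illustration.

```python
from typing import Dict, List, Tuple


def combine_scores(
    semantic_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    semantic_weight: float,
    keyword_weight: float,
) -> List[Tuple[str, float]]:
    """Blend per-document scores with a weighted sum and rank best-first."""
    doc_ids = set(semantic_scores) | set(keyword_scores)
    combined = {
        doc_id: semantic_weight * semantic_scores.get(doc_id, 0.0)
        + keyword_weight * keyword_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy scores: one document matches the keywords, the other matches the meaning.
semantic_scores = {"q4-planning-notes": 0.50, "q4-retro-summary": 0.90}
keyword_scores = {"q4-planning-notes": 0.90, "q4-retro-summary": 0.20}

search_ranking = combine_scores(semantic_scores, keyword_scores, 0.3, 0.7)
answer_ranking = combine_scores(semantic_scores, keyword_scores, 0.7, 0.3)
```

With these toy numbers the keyword-heavy weights rank `q4-planning-notes` first, while the semantic-heavy weights rank `q4-retro-summary` first, so the same candidate set is ordered differently depending on the detected intent.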
Taking it further
You can extend this implementation by adding more sophisticated logic or handling additional intents.
Including confidence scores
You might want to include a confidence score for the intent prediction.
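class Query(BaseModel):
    intent: Intent
    confidence: float  # model-reported confidence, assumed to be in [0, 1]
    keywords: List[str]
    term: str

query = determine_query(user_input)
is_search = query.intent == Intent.Search
if query.confidence < 0.8:
    # Low confidence: fall back to an even split between the two methods.
    semantic_weight = 0.5
    keyword_weight = 0.5
elif is_search:
    ...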
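One possible way to flesh out that branching, as a sketch only: the helper below reuses the 0.3/0.7 split from the earlier example for confident predictions and the even split for uncertain ones. The function name `choose_weights` and the idea of passing the intent as a plain string are assumptions made to keep the example self-contained.

```python
from typing import Tuple


def choose_weights(
    intent: str, confidence: float, threshold: float = 0.8
) -> Tuple[float, float]:
    """Return (semantic_weight, keyword_weight) for a classified query."""
    if confidence < threshold:
        # Not sure what the user wants: weight both methods equally.
        return (0.5, 0.5)
    if intent == "SEARCH":
        # Confident search intent: favor keyword matching.
        return (0.3, 0.7)
    # Confident answer intent: favor semantic similarity.
    return (0.7, 0.3)


semantic_weight, keyword_weight = choose_weights("SEARCH", confidence=0.92)
```

Keeping the threshold as a parameter makes it easy to tune once you have real queries to evaluate against.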
Handling search intent
When the user’s intent is to search for specific information, we should prioritize exact matches and consider potential typos or misspellings. Boosting keyword-driven techniques like BM25 can significantly improve results in this case.
Example: “Q4 planning notes”
In this query, the user is likely looking for a specific document or set of documents related to Q4 planning notes. To handle this:
- Boost BM25 and Keyword Search: Since the user probably knows what they’re looking for, prioritize documents with exact keyword matches.
- Handle Typos and Misspellings: Implement fuzzy search techniques using tools like `pg_trgm` in PostgreSQL. This might not be relevant if you’re doing a semantic search.
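To give a feel for how trigram-based fuzzy matching tolerates typos, here is a rough pure-Python approximation of the idea behind `pg_trgm`. It is not the extension's actual code, and it only handles ASCII letters and digits. Each word is lowercased and padded with spaces, split into three-character chunks, and two strings are compared by the share of trigrams they have in common.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Build a trigram set: lowercase, split into words, pad each word."""
    grams: Set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# "planing" is a typo for "planning": 7 shared trigrams out of 10 distinct.
score = similarity("planing", "planning")
```

A misspelled query still scores high against the intended word, which is why a similarity threshold (in `pg_trgm` the default is 0.3) can recover documents that an exact keyword match would miss.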
Handling answer intent
When the user is seeking an answer, semantic understanding becomes more important. In this case, boosting semantic search techniques can help retrieve documents that contain the answer, even if they don’t directly match the query terms.
Example: “What did we decide during the Q4 planning meeting last week?”
To address this query:
- Boost Semantic Search: Use embeddings and similarity measures to find documents that are contextually relevant.
- Leverage Contextual Information: Incorporate metadata like dates to narrow down results to the relevant time frame.
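The two steps above can be sketched together: filter by date metadata first, then rank what remains by embedding similarity. This is a minimal in-memory illustration that assumes embeddings have already been computed by whatever model you use. The `Doc` class, the two-dimensional toy vectors, and the dates are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt
from typing import List, Sequence


@dataclass
class Doc:
    title: str
    created: date
    embedding: Sequence[float]  # assumed to come from your embedding model


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(
    query_embedding: Sequence[float],
    docs: List[Doc],
    start: date,
    end: date,
    top_k: int = 3,
) -> List[Doc]:
    # Metadata filter first: keep only documents inside the time window.
    in_window = [d for d in docs if start <= d.created <= end]
    # Then rank the remaining documents by similarity to the query.
    ranked = sorted(
        in_window, key=lambda d: cosine(query_embedding, d.embedding), reverse=True
    )
    return ranked[:top_k]


docs = [
    Doc("Q4 planning meeting notes", date(2024, 10, 14), [0.9, 0.1]),
    Doc("Q3 planning meeting notes", date(2024, 7, 8), [0.95, 0.05]),
    Doc("Office party logistics", date(2024, 10, 15), [0.1, 0.9]),
]
# "Last week" resolved to a concrete window (toy dates).
hits = semantic_search([1.0, 0.0], docs, date(2024, 10, 13), date(2024, 10, 19))
```

Note that the Q3 notes are the closest match by embedding alone, but the date filter removes them, which is exactly the role metadata plays for a query like “last week”.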
Avoid relying on specific query patterns and punctuation
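I’ve seen query intent being determined by whether the query contains a `?`. This is not a good idea: people ask questions without a question mark all the time, and punctuation says little about what they actually want. It’s a perfect example of when LLMs provide a much better user experience.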
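To illustrate the failure mode, here is a small sketch that runs a question-mark heuristic over the example queries from this post, labeled with the intents assigned to them here. The `punctuation_intent` function is a deliberately naive stand-in for the kind of rule being warned against.

```python
from typing import List, Tuple


def punctuation_intent(query: str) -> str:
    """Naive rule: a trailing '?' means ANSWER, anything else means SEARCH."""
    return "ANSWER" if query.strip().endswith("?") else "SEARCH"


# Queries and intents taken from the examples in this post.
LABELED: List[Tuple[str, str]] = [
    ("Q4 planning notes", "SEARCH"),
    ("What did we decide during the Q4 planning meeting last week?", "ANSWER"),
    ("Open the Q4 planning spreadsheet", "SEARCH"),
    ("How did our sales compare between Q3 and Q4 last year?", "ANSWER"),
    ("List all customers who purchased product X in the last month.", "ANSWER"),
]

# Collect every query the rule gets wrong and compute its accuracy.
misses = [q for q, expected in LABELED if punctuation_intent(q) != expected]
accuracy = (len(LABELED) - len(misses)) / len(LABELED)
```

The rule mislabels the “List all customers…” query as a search because it ends with a period, even though the user wants an answer. The same labeled-list pattern works as a quick evaluation harness for any classifier you want to compare against.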
Additional examples
Query | Intent | Approach |
---|---|---|
“Open the Q4 planning spreadsheet” | Search | Boost exact match and consider user permissions to ensure they can access the document. |
“How did our sales compare between Q3 and Q4 last year?” | Answer | Use semantic search to find documents discussing sales comparisons between quarters. |
“List all customers who purchased product X in the last month.” | Answer | Use semantic search along with metadata filters to retrieve accurate results. |
Conclusion
Effectively determining query intent can meaningfully improve your retrieval systems, leading to more accurate results and happier users. As always, try different methods and evaluate to see what performs best for you. And whenever you have enough data, use it to optimize for what your users are actually searching for.