Hybris SOLR query builders and search relevance

A note from 2026: This article was published in 2017, when SAP Commerce Cloud was still commonly referred to as SAP Hybris Commerce. The Solr integration, Backoffice configuration, and search APIs have evolved across many SAP Commerce Cloud releases, so validate the described query builders and scoring behavior against your current version.

In SAP Hybris Commerce, requests to the SOLR search engine are created by query builders. Simply put, these components convert user queries into SOLR queries. You cannot explain search results without knowing which SOLR request was generated and why it contains particular conditions in a particular form. Unfortunately, the official documentation is sparse and lacks examples. This article explains the differences between the available query builders. You will also find it a useful complement to one of the previous articles about search relevance.

Hybris has the following out-of-the-box (OOTB) query builders:

Default Free Text Query Builder
Multi-Field Free Text Query Builder
DisMax Free Text Query Builder

Hybris uses a custom relevance formula for two of these builders, Default and DisMax. For the Multi-Field builder, it uses the SOLR default formula (LuceneQParser). For details, see the last section of this article.

Default Free Text Query Builder

The Default Free Text Query Builder is the simplest in Hybris. As its name suggests, it is used by default. Note that it uses the Hybris custom relevance formula, multiMaxScore (see the last section for details).

Example. If you search for:

full text string

your SOLR request will look like this:

For each field defined as “Full text”:
- OR EXACT MATCH for:
```
full
```
  (boosting X)
- OR EXACT MATCH for:
```
text
```
  (the same)
- OR EXACT MATCH for:
```
string
```
  (the same)
- If the “wildcard” flag is active for the field:
  - OR WILDCARD MATCH for:
```
full*
```
    (boosting X/2)
  - OR WILDCARD MATCH for:
```
text*
```
    (the same)
  - OR WILDCARD MATCH for:
```
string*
```
    (the same)
- If fuzzy search is active for the field:
  - If the field type is “text”:
    - OR FUZZY MATCH for:
```
full~
```
      (with the configured fuzziness, if any; boosting X/4)
    - OR FUZZY MATCH for:
```
text~
```
      (the same)
    - OR FUZZY MATCH for:
```
string~
```
      (the same)
- If the “phrase search” flag is active for the field:
  - OR PHRASE MATCH for:
```
full text string
```
    (boosting X*2)

Note that:

There is only one configurable boosting factor: the field boost. The wildcard, fuzzy, and phrase boosting factors are derived from the field boost and are not configurable (they are hard-coded).

For example, the SOLR query for the request:

word1 "word2 word3" word4

will look like this:

(
(code_string:word1^90.0) OR
(keywords_text_fr:word1^20.0) OR
...
(name_text_fr:word1^100.0)
) OR (
(code_string:"word2 word3"^90.0) OR
(keywords_text_fr:"word2 word3"^20.0) OR
...
(name_text_fr:"word2 word3"^100.0)
) OR (
(code_string:word4^90.0) OR
...
(name_text_fr:word4^100.0)

) OR (
(keywords_text_fr:word1~^10.0) OR
...
(name_text_fr:word1~^25.0)
) OR (
(keywords_text_fr:"word2 word3"~^10.0) OR
...
(name_text_fr:"word2 word3"~^25.0)
) OR (
(keywords_text_fr:word4~^10.0) OR
...
(name_text_fr:word4~^25.0)

) OR (
(code_string:word1*^45.0) OR
(ean_string:word1*^50.0)
) OR (
(code_string:"word2 word3"*^45.0) OR
(ean_string:"word2 word3"*^50.0)
) OR (
(code_string:word4*^45.0) OR
(ean_string:word4*^50.0)

) OR (
(keywords_text_fr:"word1 word2 word3 word4"^40.0) OR
...
(name_text_fr:"word1 word2 word3 word4"^100.0)
)

So the pattern is:

EXACT (f1,f2,...fN) OR FUZZY (f1,f2,...fN) OR WILDCARD (f1,f2,...fN) OR PHRASE (f1,f2,...fN).

Hybris uses the multiMaxScore parser here, so the largest score wins in all pattern components (both f1…fN and the EXACT/FUZZY/WILDCARD/PHRASE groups).
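The rules listed in this section can be sketched in Python to show how a single field boost X fans out into the other clauses. This is a minimal illustration, not the Hybris implementation: the `FullTextField` structure, its flags, and the helper names are hypothetical, and the ratios (X/2 for wildcard, X/4 for fuzzy, X*2 for phrase) are the ones listed above, so verify them against your version. Clauses are grouped by query type, as in the pattern; the real output also nests them per token.

```python
from dataclasses import dataclass


@dataclass
class FullTextField:
    """Hypothetical stand-in for an indexed property flagged as "Full text"."""
    name: str
    boost: float            # X, the only configurable boost
    is_text: bool           # fuzzy clauses are added only for "text" fields
    wildcard: bool = False
    fuzzy: bool = False
    phrase: bool = False


def term(token):
    """Quote multi-word tokens so SOLR treats them as phrases."""
    return f'"{token}"' if " " in token else token


def default_builder_query(tokens, fields):
    """EXACT (f1..fN) OR FUZZY (f1..fN) OR WILDCARD (f1..fN) OR PHRASE (f1..fN)."""
    exact, fuzzy, wildcard, phrase = [], [], [], []
    for token in tokens:
        for f in fields:
            exact.append(f"{f.name}:{term(token)}^{f.boost}")
            if f.fuzzy and f.is_text:
                fuzzy.append(f"{f.name}:{term(token)}~^{f.boost / 4}")
            if f.wildcard:
                wildcard.append(f"{f.name}:{term(token)}*^{f.boost / 2}")
    whole_query = " ".join(tokens)
    for f in fields:
        if f.phrase:
            phrase.append(f'{f.name}:"{whole_query}"^{f.boost * 2}')
    groups = [g for g in (exact, fuzzy, wildcard, phrase) if g]
    return " OR ".join("(" + " OR ".join(g) + ")" for g in groups)
```

For example, `default_builder_query(["full", "text", "string"], fields)` with a `name_text_fr` field boosted at 100.0 yields `name_text_fr:full^100.0`, `name_text_fr:full*^50.0`, `name_text_fr:full~^25.0`, and `name_text_fr:"full text string"^200.0`: every boost follows from the single field value.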

Multi-Field Free Text Query Builder

According to the documentation, it builds the query in a way that the final score will be the sum of the scores of all subqueries. This is how SOLR works by default.

It also differs from the Default Free Text Query Builder in several other respects.

Tokens. The builder tokenizes the user query by splitting it on whitespace, but it also supports quoted phrases. For example:

User query:

word1 "word2 word3" word4

Result:

- word1
- word2 word3
- word4

Note that word2 and word3 form a single token here. Only double quotes are recognized.

Phrase queries. In the example above, the phrase query is built from the original query by removing the double quotes. So the phrase query will look like this:

word1 word2 word3 word4
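The tokenization and phrase-query rules above can be approximated with a short Python sketch. It is an assumption-based model, not the builder's code: `tokenize` and `phrase_query` are hypothetical helpers, and edge cases such as unbalanced quotes may be handled differently by Hybris.

```python
import re

# Either a double-quoted phrase (group 1) or a run of non-whitespace (group 2).
_TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')


def tokenize(user_query):
    """Split on whitespace, keeping double-quoted phrases as single tokens."""
    tokens = []
    for quoted, bare in _TOKEN_RE.findall(user_query):
        token = (quoted if quoted else bare).strip()
        if token:  # skip empty phrases such as ""
            tokens.append(token)
    return tokens


def phrase_query(user_query):
    """Remove double quotes and normalize whitespace to get the phrase query."""
    return " ".join(user_query.replace('"', "").split())
```

With `word1 "word2 word3" word4`, `tokenize` returns three tokens and `phrase_query` returns `word1 word2 word3 word4`. Single quotes are treated as ordinary characters, which matches the note that only double quotes work.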

Boosting. It uses specific boost factors for the exact match, fuzzy match, wildcard match, and phrase match, and these factors are configurable in Backoffice. Fuzziness, sloppiness, and the wildcard query type are configurable too.

Sloppiness. A sloppy phrase query specifies a maximum “slop”: the number of positions the tokens may be moved to produce a match. Moving one word by one position costs 1, so swapping two adjacent words costs 2. The slop is zero by default, which requires an exact phrase match.

For example, “the President of first” with:

slop=3

will match the document containing “the first President of the USA is Washington”, but:

slop=2

won’t.

Slop=2

will work for the query “the President first”, for example.
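The slop arithmetic above can be checked with a small Python sketch. It uses a simplified model of Lucene's sloppy phrase matching, assumed here rather than taken from Hybris: each query word at offset i is placed at a document position p, and the required slop is the spread of p - i across all words. Analysis (stop words, stemming) and Lucene's special handling of repeated terms are ignored, and `min_slop` is a hypothetical helper.

```python
from itertools import product


def min_slop(query, document):
    """Smallest slop at which `query` matches `document` as a sloppy phrase.

    Each query term at offset i is placed at a document position p; the match
    needs a slop of max(p - i) - min(p - i). Returns None if a query term does
    not occur in the document. Simplified: no stop words, no repeated-term rules.
    """
    doc_tokens = document.lower().split()
    query_tokens = query.lower().split()
    # Candidate document positions for every query term.
    candidates = [[p for p, tok in enumerate(doc_tokens) if tok == t]
                  for t in query_tokens]
    if not all(candidates):
        return None
    best = None
    for positions in product(*candidates):
        shifted = [p - i for i, p in enumerate(positions)]
        slop = max(shifted) - min(shifted)
        if best is None or slop < best:
            best = slop
    return best
```

A phrase matches when `min_slop(...)` is less than or equal to the configured slop. For the document “the first President of the USA is Washington”, the sketch gives 3 for “the President of first” and 2 for “the President first”, which reproduces the examples above.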

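Fuzziness. Fuzziness is a similar concept applied to the characters of a single token: it is the maximum number of single-character edits (insertion, deletion, substitution, or transposition of two adjacent characters) allowed for a match. For example: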

persident

will match:

president

with fuzziness=1.
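To see why “persident” is only one edit away from “president”, here is a Python sketch of optimal string alignment distance, a restricted form of Damerau-Levenshtein distance in which swapping two adjacent characters costs 1. We assume it approximates how Lucene's fuzzy matching counts edits when transpositions are enabled; `edit_distance` is a hypothetical helper, not a SOLR API.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance between two tokens.

    transpositions=True: optimal string alignment (adjacent swap costs 1).
    transpositions=False: classic Levenshtein distance.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]
```

The sketch returns 1 for `edit_distance("persident", "president")`, but 2 with `transpositions=False`, because without transpositions the swapped “er” counts as two substitutions.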

For example, the SOLR query for the request:

word1 "word2 word3" word4

will look like this:

(code_string:
(word1^90.0 OR
"word2 word3"^90.0 OR
word4^90.0 OR
word1*^45.0 OR
"word2 word3"*^45.0 OR
word4*^45.0)
) OR
(keywords_text_fr:
(word1^20.0 OR
"word2 word3"^20.0 OR
word4^20.0 OR
word1~^10.0 OR
"word2 word3"~^10.0 OR
word4~^10.0 OR
"word1 word2 word3 word4"^40.0)
) OR
...
(name_text_fr:
(word1^100.0 OR
"word2 word3"^100.0 OR
word4^100.0 OR
word1~^25.0 OR
"word2 word3"~^25.0 OR
word4~^25.0 OR
"word1 word2 word3 word4"^100.0)
)

So the pattern is:

f1 (EXACT, FUZZY, WILDCARD, PHRASE) OR f2 (EXACT, FUZZY, WILDCARD, PHRASE) ... OR fN (EXACT, FUZZY, WILDCARD, PHRASE).
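The per-field grouping, with an independently configured boost for each query type, can be sketched in Python. This is an illustration under assumptions: `FieldConfig` and `multi_field_query` are hypothetical names, and `None` stands for a query type that is disabled for the field. The clause shapes mirror the example above, including the article's `"word2 word3"~` form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldConfig:
    """Hypothetical per-field settings; each boost is configured separately."""
    name: str
    exact_boost: float
    fuzzy_boost: Optional[float] = None     # None disables fuzzy clauses
    wildcard_boost: Optional[float] = None  # None disables wildcard clauses
    phrase_boost: Optional[float] = None    # None disables the phrase clause


def quote(token):
    """Quote multi-word tokens so SOLR treats them as phrases."""
    return f'"{token}"' if " " in token else token


def multi_field_query(tokens, phrase, fields):
    """f1 (EXACT, FUZZY, WILDCARD, PHRASE) OR f2 (...) ... OR fN (...)."""
    groups = []
    for f in fields:
        clauses = [f"{quote(t)}^{f.exact_boost}" for t in tokens]
        if f.fuzzy_boost is not None:
            clauses += [f"{quote(t)}~^{f.fuzzy_boost}" for t in tokens]
        if f.wildcard_boost is not None:
            clauses += [f"{quote(t)}*^{f.wildcard_boost}" for t in tokens]
        if f.phrase_boost is not None:
            clauses.append(f'"{phrase}"^{f.phrase_boost}')
        groups.append(f"({f.name}:(" + " OR ".join(clauses) + "))")
    return " OR ".join(groups)
```

Unlike the Default builder sketch, nothing here is derived from a single field boost: with `FieldConfig("keywords_text_fr", 20.0, fuzzy_boost=10.0, phrase_boost=40.0)` the output reproduces the `keywords_text_fr` group shown above.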

DisMax Free Text Query Builder

Similar to the Multi-Field builder, but it groups some of the subqueries: the score of a group is the maximum score of the subqueries in that group, not the sum. Hybris uses its custom relevance formula (multiMaxScore). For details, see the last section of this article.

The DisMax Query Builder also supports quotes in the query, boosting, sloppiness, and fuzziness.

This query builder supports the parameters groupByQueryType and tie.

Group By Query Type. It changes the way disjunction max queries are grouped. If set to true, it also groups queries by type, where the types are: free text query, free text fuzzy query, and free text wildcard query.
Tie. The tie parameter defines how much the final score of the query will be influenced by the scores of the lower-scoring fields compared to the highest-scoring field: 0.0 makes a query a pure “disjunction max query”; 1.0 makes the query a pure “disjunction sum query,” where it doesn’t matter what the maximum-scoring subquery is.
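The effect of the tie parameter can be written as a one-line formula: the best subquery score plus tie times the sum of the others. The Python sketch below assumes this standard disjunction max definition; `dismax_score` is a hypothetical helper and the scores in the example are made up.

```python
def dismax_score(subquery_scores, tie=0.0):
    """Disjunction max score: the best subquery score plus `tie` times the rest.

    tie=0.0 -> pure disjunction max (only the best subquery counts);
    tie=1.0 -> pure disjunction sum (all subquery scores are added).
    """
    if not subquery_scores:
        return 0.0
    best = max(subquery_scores)
    return best + tie * (sum(subquery_scores) - best)
```

With hypothetical field scores `[1.0, 0.5, 0.25]`, tie=0.0 gives 1.0, tie=1.0 gives 1.75, and a small tie such as 0.1 gives about 1.075, so the lower-scoring fields only nudge the result.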

Understanding MultiMax Query Parser

Hybris uses a custom query parser developed by SAP, MultiMax (multiMaxScore), for two query builders: DisMax and Default. The plugin is very simple, but you need to know that it changes the way the score is calculated.

The easiest way to explain it is to demonstrate the internals by example.

Let’s take the following sample documents for experimentation:

Document #1

id: “doc1”
title_text_en: “the first President of the USA is Washington titleA”
description_text_en: “the first President of the USA is Washington”

Document #2

id: “doc2”
title_text_en: “the second President of the USA is John Adams titleB”
description_text_en: “the second President of the USA is John Adams”

Document #3

id: “doc3”
title_text_en: “the first head of the USA is Washington titleC”
description_text_en: “the first head of the USA is Washington”

Take the following sample request:

(title_text_en:"first" OR description_text_en:first) OR (title_text_en:"titleC" OR description_text_en:"titleC")

All components are joined using OR. The request is very close to what Hybris creates using the query builders (see the examples above).

Let’s examine the relevancy calculation for two different cases:

Default parser (LuceneQParser)
Hybris custom parser (multiMaxScoreParser)

and compare the results.

The screenshots below may look hard to read. There is no need to study every line; just skim them. Note that multiMaxScore uses a max function, while LuceneQParser uses a sum function. This is the key difference between the custom and the default query parsers.

SOLR debug output for the first scoring example

SOLR debug output for another scoring example

SOLR debug output comparing parser scoring details

This debug information shows that:

LuceneQParser calculates the score for each subquery and sums them up to get the total query score.
multiMaxScoreParser also sums the scores of the subqueries, but within each subquery it takes the maximum of its components' scores instead of summing them.

The last statement means that, with the default Hybris scoring formula and the DisMax or multiMaxScore parsers, the number of fields containing a particular token may not matter. For a given token and field, the score depends on how often the term occurs in that field of the document (local frequency), how many documents contain it (global frequency), and the field length. I say “may not” because other components of the formula make the dependency indirect.

For example, “first” appears in both fields, the title and the description. The Hybris formula, multiMaxScore, calculates:

score = 0.55

because it is the maximum of the scores for:

title_text_en:first

and:

description_text_en:first
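To make the sum-versus-max difference concrete, here is a minimal Python sketch. It assumes textbook BM25 for the per-field term score (Lucene and Solr 6.0+ default to BM25, but constant factors differ between versions, field lengths are stored lossily, and your schema may configure another similarity), and it models the two parsers only as described by the debug output: LuceneQParser sums everything, while multiMaxScore sums over subqueries and takes the maximum within each. All function names and statistics are hypothetical, so the numbers will not reproduce the scores in the screenshots.

```python
import math


def bm25(tf, df, doc_count, field_len, avg_field_len, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one field of one document.

    tf: occurrences of the term in this field of the document (local frequency)
    df: number of documents whose field contains the term (global frequency)
    field_len / avg_field_len: this field's length relative to the index average
    """
    idf = math.log(1 + (doc_count - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1 - b + b * field_len / avg_field_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


def lucene_score(subqueries):
    """LuceneQParser as described above: all matching clause scores are summed."""
    return sum(sum(clauses) for clauses in subqueries)


def multi_max_score(subqueries):
    """multiMaxScore as described above: sum over subqueries, max within each."""
    return sum(max(clauses) for clauses in subqueries if clauses)


# Hypothetical statistics for "first" in doc1 (no stop-word removal):
# 2 of the 3 sample documents contain it in each field.
title = bm25(tf=1, df=2, doc_count=3, field_len=9, avg_field_len=28 / 3)
description = bm25(tf=1, df=2, doc_count=3, field_len=8, avg_field_len=25 / 3)

# (title_text_en:first OR description_text_en:first) is one subquery.
query = [[title, description]]
print("LuceneQParser-style:", lucene_score(query))
print("multiMaxScore-style:", multi_max_score(query))
```

The LuceneQParser-style score counts “first” twice because it occurs in two fields, while the multiMaxScore-style score keeps only the better field. Adding a second subquery, such as the `titleC` group, adds its own maximum on top.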

Now suppose we have the following documents in SOLR:

Documents indexed in SOLR for the scoring example

Let’s take the following request:

title_text_en:"first" OR description_text_en:first

Different parsers will show the documents in a different order:

Different parser results for the same SOLR query

The eDisMax parser is based on LuceneQParser, so eDisMax gives the same scores and results for this set.

Default Query Builder Example

Multi-Field Query Builder Example

Note that the order of the documents differs slightly because the two builders group and score subqueries differently.

To sum up:

Default Query Builder uses only one boosting factor; all others are derived from it. It doesn't recognize quotes in the query. It uses multiMaxScore instead of the SOLR default LuceneQParser.
Multi-Field Query Builder doesn’t use multiMaxScore. It recognizes quotes in the query. It supports exact match, phrase, fuzzy and wildcard boosts, fuzziness, and sloppiness.
DisMax Query Builder uses multiMaxScore and recognizes quotes. It supports exact match, phrase, fuzzy and wildcard boosts, fuzziness, and sloppiness.