The MoreLikeThis search component (MLT) enables users to query for documents similar to a document in their result list.
Solr has had the MLT module since version 1.3, but hybris doesn’t use it at all. I was wondering whether it is possible to leverage MLT in hybris and how good the results would be.
I managed to create a working prototype, but the apparent simplicity of MLT integration hides some unexpected challenges. I haven’t yet found what I would consider the best solution, but some findings and preliminary results are worth sharing with our community to help others who go this way.
The MoreLikeThis component fetches products with similar term vectors. A term vector is a data structure that lists every word that occurs in a field together with the number of times it occurs, excluding stop words. Solr loops over the fields of the document being analyzed and retrieves the term vector for each of them.
For each term, Solr finds the field that contains the most instances of that term and then calculates the term’s score. The module then selects the top K terms with the highest scores and forms a disjunctive query from them. Simply put, it displays the products that have the closest set of top words.
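To make the term selection more concrete, here is a minimal Java sketch of the idea, not Solr's actual code. It works on a single field, and the tf * idf scoring formula, the class name, and the method names are my assumptions for illustration. The minTf and minDf arguments play the role of mlt.mintf and mlt.mindf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MltTermSelectionSketch {

    /** Builds a toy term vector: term -> number of occurrences, stop words skipped. */
    static Map<String, Integer> termVector(List<String> tokens, Set<String> stopWords) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Scores each term as tf * idf and returns the k best terms, highest score first. */
    static List<String> topTerms(Map<String, Integer> tf, Map<String, Integer> docFreq,
                                 int numDocs, int minTf, int minDf, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (e.getValue() < minTf || df < minDf) {
                continue; // too rare to be useful
            }
            // Assumption: a classic idf formula; the real scoring may differ.
            double idf = Math.log((double) numDocs / (df + 1)) + 1.0;
            scores.put(e.getKey(), e.getValue() * idf);
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            result.add(ranked.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("tripod", "camera", "tripod", "the", "leg", "camera", "tripod");
        Map<String, Integer> tf = termVector(tokens, new HashSet<>(Arrays.asList("the")));
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("tripod", 5);
        docFreq.put("camera", 50);
        docFreq.put("leg", 1);
        // "leg" is dropped because its document frequency is below minDf = 2.
        System.out.println(topTerms(tf, docFreq, 100, 1, 2, 2)); // prints [tripod, camera]
    }
}
```

The selected terms are then OR-ed together per field, which is what the example request in the next paragraph looks like.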

For example, for the product “CAMERA TAPE DIGITAL 90MIN 2PK,” the system creates a request:
- fulltext_en: any of these terms:
  - 500, 825, 860, camcord, cld, clean, comput, digit, digital8, ideal, line, loss, make, min, mode, perfect, possibl, record, resolut, tape, transfer, v825cld, videotap
- category_string_mv: any of these terms:
  - 585
  - 604
These words are the most frequently used terms in the indexed product attributes.
Let’s try to see similar products for DIGITAL CAMERA TRIPOD:

The right window shows results for fulltext_en = (1.35, 135, 20.9, 209, adjust, aluminum, anod, attach, bubbl, camera, easi, feet, fold, head, leg, lock, nylon, plate, read, rubber, run, skid, stand, tall, tripod).
Challenges
Fetching recommendations
First, hybris uses Solr 6.1 and the SolrJ library for 6.1. This version of the library does not support MoreLikeThis: it cannot parse that part of the Solr response, so hybris ignores the MoreLikeThis section completely.
hybris is not tested with SolrJ versions newer than 6.1, so it is risky to replace the library with version 6.3, where these methods are supported. However, if you are able to test it thoroughly, it might be a solution.
In my PoC, I use the existing library capabilities to read the part of the Solr response that is not parsed natively. Specifically, I used SolrSearchResult.getSolrObject().getResponse().get("moreLikeThis") to access the list of recommendations.
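The moreLikeThis section is keyed by the id of each document in the result list, and each entry holds the list of documents similar to it. The sketch below shows how such a structure can be walked. To stay runnable without SolrJ on the classpath, it uses plain java.util maps and lists as stand-ins for the SolrJ response types, and the class name, method name, and sample ids are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class MltResponseSketch {

    /** Returns the ids of the documents recommended for the given source document. */
    @SuppressWarnings("unchecked")
    static List<String> similarIds(Map<String, Object> response, String sourceDocId) {
        Object section = response.get("moreLikeThis");
        if (!(section instanceof Map)) {
            return Collections.emptyList(); // MLT was not requested or returned nothing
        }
        Object docs = ((Map<String, Object>) section).get(sourceDocId);
        if (!(docs instanceof List)) {
            return Collections.emptyList(); // no recommendations for this document
        }
        List<String> ids = new ArrayList<>();
        for (Map<String, Object> doc : (List<Map<String, Object>>) docs) {
            ids.add(String.valueOf(doc.get("id")));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Stand-in for a parsed response: one source document with one similar document.
        Map<String, Object> similar = Collections.singletonMap("id", "tripod-2");
        Map<String, Object> mlt = Collections.singletonMap("tripod-1", Arrays.asList(similar));
        Map<String, Object> response = Collections.singletonMap("moreLikeThis", mlt);
        System.out.println(similarIds(response, "tripod-1")); // prints [tripod-2]
    }
}
```

In the real integration the same traversal would run over the object returned by getResponse().get("moreLikeThis"), with casts adjusted to the SolrJ types.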
To set up request parameters, I used:
searchQuery.addRawParam("mlt", "true");
searchQuery.addRawParam("mlt.fl", "fulltext_en,catalogVersion,category_string_mv");
searchQuery.addRawParam("mlt.count", "10"); //count of documents in the response
searchQuery.addRawParam("mlt.mindf", "2"); //min document frequency
searchQuery.addRawParam("mlt.mintf", "1"); //min term frequency
Facets
This module is poorly documented. It seems that faceting used to work with the results provided by the module, but in the latest versions of Solr it no longer does.
Some sources say that it should work if the module is used as a request handler, but it seems that at least in hybris with Solr 6.1, this is not true.
However, you can take the request generated by MoreLikeThis, parse it, and execute it via the regular Solr select method. In this case, facets are supported because this is the normal way of fetching data.
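As a rough illustration of this workaround, the sketch below assembles the parameters of a regular select request from the terms MoreLikeThis picked for each field. It assumes the terms have already been extracted from the MLT output, which is not shown here. The class name, method names, and the choice of facet field are hypothetical; q, facet, and facet.field are standard Solr parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MltSelectQuerySketch {

    /** Joins the terms of each field with OR, then joins the per-field clauses with OR. */
    static String buildQuery(Map<String, List<String>> termsByField) {
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : termsByField.entrySet()) {
            if (e.getValue().isEmpty()) {
                continue; // nothing to match on for this field
            }
            clauses.add(e.getKey() + ":(" + String.join(" OR ", e.getValue()) + ")");
        }
        return String.join(" OR ", clauses);
    }

    /** Parameters for a normal select request, so faceting behaves as usual. */
    static Map<String, String> selectParams(Map<String, List<String>> termsByField, String facetField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", buildQuery(termsByField));
        params.put("facet", "true");
        params.put("facet.field", facetField);
        return params;
    }

    public static void main(String[] args) {
        Map<String, List<String>> terms = new LinkedHashMap<>();
        terms.put("fulltext_en", Arrays.asList("tape", "digit"));
        terms.put("category_string_mv", Arrays.asList("585", "604"));
        // q=fulltext_en:(tape OR digit) OR category_string_mv:(585 OR 604)
        selectParams(terms, "category_string_mv")
                .forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

In hybris, each of these parameters could be passed with searchQuery.addRawParam, in the same way as the mlt.* parameters above. Terms containing query syntax characters would need escaping, which this sketch leaves out.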
Catalog versions
Catalog versions are not supported by MoreLikeThis because it knows nothing about hybris 🙂 So the recommendations contain results from both catalog versions. The solution is to filter them before displaying. For the simplest case, with two catalog versions, Online and Staged, it is not rocket science.
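A minimal sketch of such a filter is below. It assumes each recommended document has been read into a map and exposes its catalog version in a catalogVersion field with values such as Online or Staged; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CatalogVersionFilterSketch {

    /** Keeps only the recommendations that belong to the given catalog version. */
    static List<Map<String, Object>> keepCatalogVersion(List<Map<String, Object>> docs, String version) {
        return docs.stream()
                .filter(doc -> version.equals(doc.get("catalogVersion")))
                .collect(Collectors.toList());
    }

    /** Small helper that builds a stand-in document for the demo. */
    static Map<String, Object> doc(String id, String catalogVersion) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", id);
        doc.put("catalogVersion", catalogVersion);
        return doc;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> recommendations = Arrays.asList(
                doc("tripod-2", "Online"),
                doc("tripod-2", "Staged"),
                doc("tripod-3", "Online"));
        // Only the two Online documents remain.
        System.out.println(keepCatalogVersion(recommendations, "Online").size()); // prints 2
    }
}
```

Because the filtering happens after Solr has returned its results, fewer than mlt.count recommendations may remain, so it can make sense to request more than you plan to display.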
Field types
The fields on which to perform MLT must be indexed and of type string. MLT is not designed to work with double values (“similar prices”).
Accuracy issues
Sometimes the algorithm marks products as similar to the selected one even though, from the customer’s perspective, they have nothing in common with it. I demonstrate this case in the video below. The smaller the product set, the more likely you are to face this situation, and the fewer words used to describe the products, the lower the accuracy.
Video
© Rauf Aliev, February 2017