Concept-aware search: automatic search facet discovery in SAP hybris
I would like to introduce my new PoC for automatic facet discovery. It sets up facets based on the words the customer uses in the search query. For example, “blue armada jacket XXL” shows the products matching the keyword “jacket” with three facets applied automatically: color=blue, brand=armada and size=XXL. A video below demonstrates how it works on top of the hybris accelerators.
Introduction
Faceted search is a critical feature for the user search experience and a vital part of any modern e-shop. From the user's perspective, faceted search breaks search results into multiple categories, shows a count for each, and lets the user “drill down”, that is, further restrict the search results based on those facets. Facets are especially useful when working with large amounts of data: they improve findability, reduce frustration, and provide a guided way to navigate or drill down in any order. Most importantly, facets provide relevant landing pages for long-tail keywords, just as category-based navigation has done for search marketers for ages.
What people search
According to research by Baymard.com, there are 12 query types. Most of them are not well supported by search engines out of the box.
- Exact searches. Searching for specific products by title or model number. Example: Keurig K45.
- Product type searches. Searching for groups of whole categories of products. Example: Sandals.
- Symptom searches. Searching for products by querying for the problem they must solve in hopes of being presented with viable solutions and products to this problem. Examples: “stained rug” or “dry cough”.
- Non-product search. Searching for help pages, company information, and other non-product pages, such as the return policy or shipping information.
- Feature searches. Searching for products with specific attributes or features. Example: Waterproof cameras.
- Thematic searches. Searching for categories or concepts that are vague in nature or have “fuzzy” boundaries. Example: “Living room rug”.
- Relational searches. Searching for products by their affiliation with another object. Example: Movies starring Tom Hanks.
- Compatibility searches. Searching for products by their compatibility with another item. Example: Lenses for Nikon D7000.
- Subjective searches. Searching for products using non-objective qualifiers. Example: “High-quality kettles”.
- Slang, abbreviation, and symbol searches. Searching for products using various linguistic shortcuts. Example: Sleeping bag -10 deg.
- Implicit searches. Forgetting to include certain qualifiers in the search query due to one’s current frame of mind. Example: [Women’s] Pants.
- Natural language searches. Searching in full sentences rather than bundles of keywords. Example: Women’s shoes that are red and available in size 7.5.
Challenge
There are some well-known problems with facet navigation. Hybris displays the facets relevant to the user query, but the query itself may contain words that make a facet disappear. In the example above, “blue armada jacket XXL”, the search engine treats all four words as a free-text search request and displays the products that have all four words in their fields. However, some facets are stored internally in a different format, so duplicate fields are needed to index their text representation. That is why hybris creates two fields, categoryName (“Armada”) and category (code 584, which internally means “Armada”).
The problem is that the results are not what the customer expects: “blue armada jacket XXL” displays all products having “blue”, “armada”, “jacket” and “XXL” in the name or description. That is why most e-shops put product properties into the title to improve findability. So, in order to find all blue female jackets of size XL and the brand “Burton”, the customer should:
- perform a search using the free text query “blue female XL Burton jacket”
- (wait)
- scroll down to the “Colors” facet and click “blue”. If the list is long, the customer needs to click the “More” link first.
- (wait)
- scroll down to the “Brand” facet and click “Burton”. If the list is long, the customer again needs to click “More” first.
- (wait)
- scroll down to the “Size” facet and click “XL”. This list is normally not long.
- (wait)
- scroll down to the “Gender” facet and click “Female”. This list is normally not long either.
- (found!)
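Each click in the sequence above adds one more filter to the search request. Below is a minimal sketch of the single request that automatic facet discovery aims to produce in one step. It assumes Solr's standard `q` and `fq` parameters; the function name and the field names (`color`, `gender`, `size`, `brand`) are hypothetical and are not the names hybris actually uses in its index.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def build_select_params(keywords: List[str], facets: Dict[str, str]) -> List[Tuple[str, str]]:
    """Build Solr query parameters: free text in q, one fq filter per facet."""
    # Fall back to match-all when every word was recognized as a facet value.
    params = [("q", " ".join(keywords) if keywords else "*:*")]
    for field, value in facets.items():
        # Quote the value so that multi-word facet values stay one phrase.
        params.append(("fq", f'{field}:"{value}"'))
    return params


# Hypothetical field names, for illustration only.
params = build_select_params(
    ["jacket"],
    {"color": "blue", "gender": "female", "size": "XL", "brand": "Burton"},
)
print(urlencode(params))
```

Only “jacket” remains as free text; the other four words become filters, so none of them has to appear in the product name or description.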
Video
Should the query replacement be automatic?
In my PoC, it is automatic. For a real project, however, my recommendation is to conduct A/B testing to find out whether automatic facet discovery works for the particular business case. Product types, catalog size and customer profiles all matter for making the right decision. One example of a non-automatic approach is a one-click suggestion displayed next to the hybris OOTB search results:
Admittedly, the design above is quick and dirty, and the panel takes up too much space in this form. If you want to go with this approach, the information needs to be compact and informative.
Technical details and architecture
The system analyzes the query and extracts facet information from the user input. This is not always straightforward. For example, “Canon flash memory” can’t set up both “Brand=Canon” and “Category=Flash memory” because Canon doesn’t have any flash memory cards in the catalog. So the system has to decide what is more important for the customer: all Canon products or all flash memory products. Alternatively, the system may show all Canon flashes by ignoring the “memory” keyword, or the customer may want to see both Canon flashes and memory in the same list. The decision is hard for a computer because it knows nothing about the real customer intent.
However, when products having both attributes (a brand and a category) are available, the customer intent is clear and the search facets can be configured automatically. For example, there are six Sony Flash Memory products in the demo catalog, so they should be displayed as the result of “Sony Flash Memory”. The next screenshot shows the results for “Sony Flash memory 32Gb”.
To do this, the system keeps all the facet values in memory and uses them to map keywords from the request to a specific facet. These facet values are built automatically by SOLR from the documents uploaded by hybris, so the most convenient way to get these lists uniformly is to request them from SOLR, where all of them are stored. There is an OOTB request handler in SOLR called “terms” for that.
There is one drawback: it works well only with KeywordTokenizer (which keeps the words of multi-word facet values together) and without stemming filters (which reduce words to their root or base form, the stem, so the original words are lost). However, the SOLR configuration lets you create copies of the original facet fields without stemming filters and tokenizers. The simplest approach is to change the type of these fields from “text” to “string” in the hybris configuration, although this slightly affects full-text search.
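A request to the “terms” handler can be sketched as follows. This is an illustration under stated assumptions, not the code of the PoC. The parameters `terms.fl` and `terms.limit` belong to Solr's Terms Component, and the parser assumes the default flat JSON layout, in which each field maps to an alternating list of term and count. The base URL, the field name and the sample counts are made up for the example.

```python
from typing import Dict, List
from urllib.parse import urlencode


def build_terms_url(base_url: str, fields: List[str], limit: int = -1) -> str:
    """Build a URL that asks the Solr /terms handler for all terms of the given fields."""
    params = [("terms", "true"), ("terms.limit", str(limit)), ("wt", "json")]
    # terms.fl may be repeated, once per field.
    params += [("terms.fl", field) for field in fields]
    return f"{base_url}/terms?{urlencode(params)}"


def parse_terms_response(payload: dict) -> Dict[str, List[str]]:
    """Turn {"terms": {field: [term, count, term, count, ...]}} into {field: [term, ...]}."""
    result: Dict[str, List[str]] = {}
    for field, flat in payload.get("terms", {}).items():
        # Even positions hold the terms, odd positions hold their counts.
        result[field] = [flat[i] for i in range(0, len(flat), 2)]
    return result


# Hypothetical core URL and field name; the response below is invented sample data.
url = build_terms_url("http://localhost:8983/solr/products", ["brand_string"])
sample = {"terms": {"brand_string": ["Sony", 12, "Canon", 9]}}
facet_values = parse_terms_response(sample)
print(url)
print(facet_values)
```

Because the field is of a string type, a multi-word value such as “Flash Memory” would come back as a single term, which is what the keyword-to-facet mapping needs.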
Which facets do we need to process? There are two options: all facets, or only those returned by the original request. I used the second approach. For example, the request “Cheap blue XXL jacket” shows the following facets:
- AvailableInStores (50)
- Price (7)
- Colors (11)
- Size (30)
- Gender (2)
- Collection (17)
- Category (49)
- Brand (44)
- Brand = Red Hat, keywords = big
- Color = Red, category = “Hats” (using Hat=Hats from the synonyms), keywords = Big
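The two readings listed above (Brand = Red Hat, or Color = Red plus Category = Hats) can be produced by enumerating every way to split the query into facet values and leftover keywords. The sketch below is an illustration, not the PoC code. The query “big red hat”, the facet values and the synonym table are assumptions reconstructed from the two list items, and preferring the readings with the fewest leftover keywords is one possible ranking rule among many. In line with the approach described earlier, `FACET_VALUES` would hold only the facets returned by the original request.

```python
from typing import Dict, List, Set, Tuple

# (facet name -> matched value, leftover free-text keywords)
Interpretation = Tuple[Dict[str, str], List[str]]


def interpretations(tokens: List[str],
                    facet_values: Dict[str, Set[str]],
                    synonyms: Dict[str, str],
                    max_phrase_len: int = 3) -> List[Interpretation]:
    """Enumerate every reading of the tokens as facet values or plain keywords."""
    if not tokens:
        return [({}, [])]
    results: List[Interpretation] = []
    # Option 1: read the first n tokens as one (possibly multi-word) facet value.
    for n in range(min(max_phrase_len, len(tokens)), 0, -1):
        phrase = " ".join(tokens[:n])
        phrase = synonyms.get(phrase, phrase)  # e.g. "hat" -> "hats"
        for facet, values in facet_values.items():
            if phrase not in values:
                continue
            for facets, keywords in interpretations(tokens[n:], facet_values,
                                                    synonyms, max_phrase_len):
                if facet not in facets:  # one value per facet in this sketch
                    results.append(({facet: phrase, **facets}, keywords))
    # Option 2: keep the first token as a free-text keyword.
    for facets, keywords in interpretations(tokens[1:], facet_values,
                                            synonyms, max_phrase_len):
        results.append((facets, [tokens[0]] + keywords))
    return results


def best_interpretations(query: str,
                         facet_values: Dict[str, Set[str]],
                         synonyms: Dict[str, str]) -> List[Interpretation]:
    """Keep the readings that leave the fewest words as free text."""
    candidates = interpretations(query.lower().split(), facet_values, synonyms)
    fewest = min(len(keywords) for _, keywords in candidates)
    return [c for c in candidates if len(c[1]) == fewest]


# Assumed sample data, lowercased for matching.
FACET_VALUES = {"Brand": {"red hat"}, "Color": {"red"}, "Category": {"hats"}}
SYNONYMS = {"hat": "hats"}

for facets, keywords in best_interpretations("big red hat", FACET_VALUES, SYNONYMS):
    print(facets, keywords)
```

Both readings leave only “big” as a keyword, so this rule alone cannot choose between them; a real system would need an additional signal, such as whether matching products exist for each combination.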
© Rauf Aliev, June 2017
Obada Sayed
26 June 2017 at 05:20
Very informative article !
Thank you : )
Julio Argüello
27 June 2017 at 04:55
Hi Rauf,
I really enjoy your posts!
I wonder if working with stemmed fields could help to improve results. If so we could create a copy of the original field in the index, i.e.:
And to ask for the terms over stemmed field (instead of “t_*”). In such a way we just need to apply the search stemming function on phase 2 over the customer entered text before matching.
Rauf Aliev
27 June 2017 at 07:06
A good idea!
Julio Argüello
27 June 2017 at 05:29
In my previous comment some XML sentences were lost:
> <dynamicField name="stemmed_*" type="stemmed_text" termVectors="true" stored="false"/>
> <copyField source="t_*" dest="stemmed_*"/>
Steve
10 July 2017 at 14:08
Hi Rauf, would it be possible to share the code for this please ?
Rauf Aliev
10 July 2017 at 14:55
Hi Steve. Normally I share only parts of the code I create for PoCs. There are a number of reasons: sometimes other parts belong to our clients, sometimes they are tightly linked with parts of other PoCs. I can share _something_. Let’s get in touch on Skype (rauf_aliev)