hybris Marketplace PoC: 2,000,000 products, 15,000 categories, 6000 search facets – hybrismart | SAP hybris under the hood


Introduction

Online marketplaces often have to handle a huge number of products, categories, and facets (product attributes used to filter search and category pages). Hybris supports large product sets out of the box (OOTB), but many hybris components are not optimized for these volumes. This PoC demonstrates one solution that addresses this issue.

Complexity

The hybris out-of-the-box architecture is not well suited to huge product and facet sets. A major bottleneck is fetching data for indexing: the more products and facets you have, the slower this process becomes. Merchants on a typical online marketplace expect the website to reflect their changes shortly after they upload updates, which may not happen when the number of products and facets is very large.

There are several ways to make indexing faster, but for really huge catalogs like the scenario discussed in this article, a slight improvement may not be enough. In such cases, the design of many out-of-the-box components has to be rethought to meet high-load and big-data requirements.

Solution

To validate my design with real data and massive volumes, I used the freely available BestBuy database to create this proof of concept:
https://bestbuyapis.github.io/api-documentation/#products-bulk-download

The BestBuy XML has about 2,000,000 products, 15,000 categories and about 6000 product attributes (facets).

Marketplaces with similarly large numbers of products generally use distributed product management rather than centralized management. There are different types of products, and it is common to use specialized management solutions for different product types. For this reason I assume that products are loaded into hybris from the external sources where they are managed.

Demo of PoC

As I mentioned above, I assume that product data is provided as CSV files. In my solution there is no indexing process as such, because the data is loaded directly into SOLR. From the storefront’s viewpoint, however, there is no difference between indexing and a direct upload into SOLR.

It’s important that this process is very fast: ~1 second per 1000 products. For 2,000,000 products the full update takes 25 minutes (on my laptop). So if you need to refresh the whole marketplace product set from nothing to the full set (2MM products), you wait no more than 25 minutes, or less on better hardware.

Custom marketplace indexer

The hybris OOTB indexer should be replaced with a custom indexer.

In my PoC:

  • the preparation step takes ~20 sec per 5000 products,
  • the uploading step takes ~4 sec per 5000 products.
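
As a rough cross-check of these figures, the per-batch timings can be scaled up to the full catalog. This is only a back-of-the-envelope sketch: it assumes batches run one after another and that timings scale linearly, and the class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate from the per-batch timings quoted above.
// Assumptions (not from the PoC itself): sequential batches, linear scaling.
final class IndexingEstimate {
    static final int BATCH_SIZE = 5_000;          // products per batch
    static final int PREPARE_SEC_PER_BATCH = 20;  // preparation step
    static final int UPLOAD_SEC_PER_BATCH = 4;    // uploading step

    /** Number of batches needed for the given product count (rounded up). */
    static long batches(long products) {
        return (products + BATCH_SIZE - 1) / BATCH_SIZE;
    }

    /** Total preparation time in seconds. */
    static long prepareSeconds(long products) {
        return batches(products) * PREPARE_SEC_PER_BATCH;
    }

    /** Total upload time in seconds. */
    static long uploadSeconds(long products) {
        return batches(products) * UPLOAD_SEC_PER_BATCH;
    }

    public static void main(String[] args) {
        long products = 2_000_000L;
        System.out.printf("batches: %d%n", batches(products));
        System.out.printf("upload:  %.1f min%n", uploadSeconds(products) / 60.0);
        System.out.printf("prepare: %.1f min%n", prepareSeconds(products) / 60.0);
    }
}
```

For 2,000,000 products this gives 400 batches and 1,600 seconds of uploading (just under 27 minutes), which is in the same range as the ~25 minutes quoted for the full update. Under the same assumptions the preparation step would add 8,000 seconds if it ran sequentially on the same machine.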

Example of BestBuy XML:

<product>
 <sku>9999119</sku>
 <productId>1219460752591</productId>
 <name>Amazon - Fire TV Stick - Black</name>
...
 <longDescription>Amazon Fire TV Stick connects to your TV&apos;s HDMI port. Just grab and go to enjoy Netflix, Prime Instant Video, Hulu Plus, YouTube.com, music, and much more.</longDescription>
 <longDescriptionHtml>Amazon Fire TV Stick connects to your TV&apos;s HDMI port. Just grab and go to enjoy Netflix, Prime Instant Video, Hulu Plus, YouTube.com, music, and much more.&lt;br&gt;&lt;br&gt;&lt;a href=&quot;/site/home-solutions/streaming-media-players-buying-guide/pcmcat333300050010.c?id=pcmcat333300050010&amp;type=category&lt;br&gt;&lt;br&gt;&quot; onclick=&quot;return popNew(this,960,800);&quot; title=&quot;Streaming media players&quot; target=&quot;_blank&quot; name=&quot;&amp;lid=PDP_BuyingGuide_StreamingMEdiaPlayer_123166&quot;&gt;&lt;img src=&quot;http://images.bestbuy.com/BestBuy_US/en_US/images/abn/2014/global/buyingguides/streaming_media/entry_point/PDP_StreamingMedia.png&quot; width=&quot;418 px&quot; height=&quot;90 px&quot; alt=&quot;Streaming media players&quot; /&gt;&lt;/a&gt;&lt;br&gt;&lt;a href=&quot;/site/home-promotions/tv-alternatives-education/pcmcat331500050009.c?id=pcmcat331500050009&quot; onclick=&quot;return popNew(this,960,800)&quot; title=&quot;Cable and Satellite Alternatives&quot; target=&quot;_blank&quot; name=&quot;&amp;lid=PDP_TV_Alternatives_122113&quot;&gt;&lt;img src=&quot;http://images.bestbuy.com/BestBuy_US/en_US/images/abn/2014/hom/pr/blue-ray-pdp-banner-402x88.jpg&quot; alt=&quot;Cable and Satellite Alternatives&quot; /&gt;&lt;/a&gt;</longDescriptionHtml>
 <details>
      <detail>
        <name>Compatible Wireless Standard(s)</name>
        <value>Wireless A|Wireless B|Wireless G|Wireless N|Wireless N Dual Band</value>
      </detail>
 ...
    </details>
 ...
 </product>

(full XML for one product is here)

SOLR document I create from this XML:

{
 "indexOperationId_long":36449,
 "id":"BestBuy/Online/9999119",
 "catalogId":"BestBuy",
 "catalogVersion":"Online",
 "price_usd_string":"39.99",
 "priceValue_usd_double":39.99,
 "category_string_mv":["cat00000"],
 "img-65Wx65H_string":"http://img.bbystatic.com/BestBuy_US/images/products/9999/9999119_s.gif",
 "img-515Wx515H_string":"http://img.bbystatic.com/BestBuy_US/images/products/9999/9999119_sb.jpg",
 "allCategories_string_mv":["cat00000"],
 "inStockFlag_boolean":true,
 "img-30Wx30H_string":"http://img.bbystatic.com/BestBuy_US/images/products/9999/9999119_s.gif",
 "code_string":"9999119",
 "name_text_en":"Amazon - Fire TV Stick - Black",
 "name_sortable_en_sortabletext":"Amazon - Fire TV Stick - Black",
 "img-96Wx96H_string":"http://img.bbystatic.com/BestBuy_US/images/products/9999/9999119_s.gif",
 "manufacturerAID_string":"",
 "manufacturerName_text":"",
 "description_text_en":"Amazon Fire TV Stick connects to your TV`s HDMI port. Just grab and go to enjoy Netflix, Prime Instant Video, Hulu Plus, YouTube.com, music, and much more.",
 "ean_string":"9999119",
 "summary_text_en":"Streams 1080p content; dual-band, dual-antenna Wi-Fi (MIMO); supports 802.11a/b/g/n Wi-Fi networks; Bluetooth 3.0 with support for HID, HFP and HPP profiles; 1GB memory; 8GB internal storage",
 "itemtype_string":"Product",
 "stockLevelStatus_string":"inStock",
 "categoryName_text_en_mv":["Best Buy"],
 "img-300Wx300H_string":"http://img.bbystatic.com/BestBuy_US/images/products/9999/9999119_sb.jpg",
 "feature-Compatible_Wireless_Standard(s)_string" : "Wireless A|Wireless B|Wireless G|Wireless N|Wireless N Dual Band",
"feature-Interface(s)_string" : "HDMI|Micro USB",
"feature-Smart_Capable_string" : "Yes",
"feature-Color_Category_string" : "Black",
"feature-Maximum_Supported_Resolution_string" : "1080p",
"feature-Hard_Drive_string" : "Yes",
"feature-Computer_Connectivity_string" : "Not Applicable",
"feature-Instant_Content_Supported_string" : "Amazon Video|CNN|ESPN|HBO GO|HBO NOW|Hulu Plus|Netflix|Pandora|YouTube",
"feature-Smartphone_Compatible_string" : "Yes",
"feature-Instant_Streaming_string" : "Yes",
"feature-Playable_Formats_string" : "AAC-LC|AC3|BMP|FLAC|GIF|H.264|JPEG|MP3|PNG|Vorbis|WAV",
"feature-Hard_Drive_Size_string" : "8 gigabytes",
"feature-Remote_Control_Included_string" : "Yes",
"feature-pricematch_string" : "yes",
 "url_en_string":""
}
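
The conversion code itself is not shown here, but the naming pattern can be read off the sample document: the id looks like catalogId/catalogVersion/sku, and each BestBuy detail becomes a "feature-..._string" field with spaces replaced by underscores. The sketch below only reproduces that inferred pattern; the class and method names are hypothetical and the real converter may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the field-naming pattern inferred from the sample SOLR document.
// All names in this class are illustrative, not taken from the PoC source.
final class SolrDocSketch {
    /** "Color Category" -> "feature-Color_Category_string". */
    static String featureField(String detailName) {
        return "feature-" + detailName.trim().replace(' ', '_') + "_string";
    }

    /** "BestBuy", "Online", "9999119" -> "BestBuy/Online/9999119". */
    static String documentId(String catalogId, String catalogVersion, String sku) {
        return catalogId + "/" + catalogVersion + "/" + sku;
    }

    /** Builds a flat field map for one product; details are name -> value pairs. */
    static Map<String, Object> toDocument(String catalogId, String catalogVersion,
                                          String sku, String name,
                                          Map<String, String> details) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", documentId(catalogId, catalogVersion, sku));
        doc.put("catalogId", catalogId);
        doc.put("catalogVersion", catalogVersion);
        doc.put("code_string", sku);
        doc.put("name_text_en", name);
        for (Map.Entry<String, String> detail : details.entrySet()) {
            doc.put(featureField(detail.getKey()), detail.getValue());
        }
        return doc;
    }
}
```

Because every attribute maps to its own dynamic field, adding a new product attribute on the source side needs no schema change, only a new field name.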

Custom product page

My custom product page works with SOLR rather than the database.

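import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.LBHttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.util.ClientUtils;

// Query the SOLR core directly instead of loading the product from the database
SolrClient solrClient = new LBHttpSolrClient(
    "http://localhost:28983/solr/master_electronics_Product");
SolrQuery solrSearchQuery = new SolrQuery();
// Look the product up by code; escape it so special characters cannot alter the query
solrSearchQuery.set("q", "code_string:" + ClientUtils.escapeQueryChars(productCode));
QueryResponse response = solrClient.query(solrSearchQuery);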

I would add caching to make it faster: in my PoC, every page request results in a request to the SOLR server.
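
One possible shape for such a cache is a small in-memory LRU map in front of the SOLR lookup. This is a minimal sketch and not part of the PoC: ProductCache, its size limit, and the loader function are hypothetical, and a real storefront would also need expiry or invalidation when products are re-uploaded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal LRU cache in front of a slow lookup (for example, a SOLR query by code).
// Hypothetical helper for illustration; not part of the PoC.
final class ProductCache<V> {
    private final Function<String, V> loader;
    private final Map<String, V> entries;

    ProductCache(int maxEntries, Function<String, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, calling the loader only on a miss. */
    synchronized V get(String productCode) {
        V cached = entries.get(productCode);
        if (cached == null) {
            cached = loader.apply(productCode);
            if (cached != null) {
                entries.put(productCode, cached);
            }
        }
        return cached;
    }

    /** Drops one entry, e.g. after the product was re-uploaded to SOLR. */
    synchronized void invalidate(String productCode) {
        entries.remove(productCode);
    }
}
```

The loader would wrap the SOLR lookup, so repeated views of the same product page hit SOLR only once until the entry is evicted or invalidated. Holding the lock while loading keeps the sketch short; under real load a concurrent cache would be a better fit.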

Facets

To add a new filter in the facet area you need to create a facet item in the IndexedItem object.

E-Commerce

Solr->DB sync: the system creates a Product item in hybris only once the product is added to the cart. This approach avoids major changes to Cart and Checkout functionality. Because the number of products that ever reach a cart is much smaller than the total number of products in the catalog, this approach does not noticeably affect performance.
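
The idea can be sketched as a "create on first use" lookup. The code below is an illustration only: it uses an in-memory map as a stand-in for the hybris database and a plain function as a stand-in for the SOLR lookup, and none of the names correspond to real hybris APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Sketch of lazy Solr->DB sync: a product is persisted only when it is first
// added to a cart. All types and names here are hypothetical stand-ins.
final class LazyProductSync {
    /** Stand-in for a persisted product; the real type would be the platform's Product item. */
    static final class ProductRecord {
        final String code;
        final String name;

        ProductRecord(String code, String name) {
            this.code = code;
            this.name = name;
        }
    }

    private final Map<String, ProductRecord> database = new HashMap<>();   // stand-in for the DB
    private final Function<String, Optional<ProductRecord>> indexLookup;   // stand-in for a SOLR query

    LazyProductSync(Function<String, Optional<ProductRecord>> indexLookup) {
        this.indexLookup = indexLookup;
    }

    /** Returns the persisted product, creating it from the index on first use. */
    ProductRecord resolveForCart(String code) {
        ProductRecord existing = database.get(code);
        if (existing != null) {
            return existing;
        }
        ProductRecord fromIndex = indexLookup.apply(code)
                .orElseThrow(() -> new IllegalArgumentException("Unknown product: " + code));
        database.put(code, fromIndex);
        return fromIndex;
    }

    /** Number of products that have been persisted so far. */
    int persistedCount() {
        return database.size();
    }
}
```

Cart and checkout code can then keep working against persisted products, while the database only ever holds the products that customers actually put into a cart.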

© Rauf Aliev, July 2016

5 Responses

  1. aditya

    31 August 2016 at 07:58

    Nicely done article.
    Just curious: why did you have to convert to JSON before indexing? Solr supports XML-based indexing, AFAIK. Another contrib called DataImportHandler is available for a direct feed from the DB to the Solr index, which could probably help you improve your indexing speeds as well. There were also some options that allowed direct delta indexing, if I remember correctly.
    Regards
    Aditya

  2. David Alfaro

    20 February 2017 at 02:30

    It’s very interesting!

  3. planetofadventure

    11 August 2017 at 07:54

    Another good read. A small note about this comment:
    “For this reason I assume that the products should be loaded into hybris from the external source where the products are managed.”
    This is true for most systems I have worked on. Although the PoC presented here will indeed be much faster than the OOTB behaviour, I’m not entirely sure anyone would publish products on the marketplace store without having them in the hybris DB / PCM. I’m aware that in this PoC you create the products upon checkout, but is that to conform with the data model, which expects them to be there so that all the components that require them behave as expected? Do you also set up the product category hierarchy? Catalogues?
    In my last project, for example, product inception happened much earlier than the product reaching hybris; once it made it to the hybris database, with all the enriched content that goes with the product (prices, product approvals, etc.), it was pushed to the marketplace store.
    Do you think that feeding products into hybris from the marketplace PCM is an approach retailers would benefit from, when all of the above is considered, just to leverage faster product load times?
