hybris Marketplace PoC: 2,000,000 products, 15,000 categories, 6,000 search facets

Introduction

Online marketplaces often have to handle a huge number of products, categories, and facets (product attributes used to filter search and category pages). Out of the box (OOTB), hybris supports large product sets, but many hybris components are not optimized for these volumes. This PoC demonstrates one solution to this problem.

Complexity

The hybris out-of-the-box architecture is not well suited to huge product and facet sets. A major bottleneck is fetching data for indexing: the more products and facets you have, the slower this process becomes. Merchants on a typical online marketplace expect the website to reflect their changes shortly after they upload an update, which may not happen when the catalog contains a very large number of products and facets.

There are several ways to make indexing faster, but for really huge catalogs like the scenario discussed in this article, a slight improvement may not be enough. In such cases, the design of many out-of-the-box components has to be rethought to meet high-load and big-data requirements.

Solution

To validate my design with real data and massive volumes, I used the freely available BestBuy database to create this proof of concept:
https://bestbuyapis.github.io/api-documentation/#products-bulk-download

The BestBuy XML has about 2,000,000 products, 15,000 categories and about 6,000 product attributes (facets).

Marketplaces with similarly large numbers of products generally use distributed rather than centralized product management. There are different types of products, and it is common to use specialized management solutions for different product types. For this reason, I assume that products are loaded into hybris from the external sources where they are managed.

Demo of PoC

As mentioned above, the products come from an external source; I assumed that the product data are provided as CSV files. In my solution there is no separate indexing process, because the data are loaded directly into SOLR. From the storefront's viewpoint, however, there is no difference between indexing and a direct upload into SOLR.

It’s important that this process is very fast: ~1 second per 1000 products. If you need to refresh the whole marketplace product set from nothing to the full set (2MM products), the full update takes no more than 25 minutes on my laptop (or less if you use better hardware).

[Image: image2016-7-4 12-40-22.png]

Custom marketplace indexer

The hybris OOTB indexer should be replaced with a custom indexer.

[Image: image2016-7-4 13-7-12.png]

In my PoC:

  • the preparation step takes ~20 sec per 5000 products,
  • the uploading step takes ~4 sec per 5000 products.
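To relate these per-batch timings to the full catalog, here is a back-of-the-envelope sketch. It assumes that the batches run one after another and that the cost grows linearly with the number of products; neither assumption is stated in the measurements above, and the class and method names are made up for illustration.

```java
// Rough estimate only: assumes serial batches and linear scaling (both are assumptions).
final class RefreshEstimate {

    /** Seconds needed for totalProducts when one batch of batchSize products takes secondsPerBatch. */
    static double seconds(long totalProducts, int batchSize, double secondsPerBatch) {
        double batches = Math.ceil((double) totalProducts / batchSize);
        return batches * secondsPerBatch;
    }

    public static void main(String[] args) {
        long total = 2_000_000L;
        double upload = seconds(total, 5000, 4.0);   // 400 batches * 4 s  = 1600 s
        double prepare = seconds(total, 5000, 20.0); // 400 batches * 20 s = 8000 s
        System.out.println("upload seconds:  " + upload);
        System.out.println("prepare seconds: " + prepare);
    }
}
```

Under these assumptions the upload step alone comes to 1600 seconds (about 27 minutes) for 2,000,000 products, which is in the same range as the full-update time quoted earlier. The preparation step would take considerably longer if it ran serially at the same rate, so in practice it would probably need to be done ahead of time or in parallel.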

Example of BestBuy XML:

<product>
 <sku>9999119</sku>
 <productId>1219460752591</productId>
 <name>Amazon - Fire TV Stick - Black</name>
... 
 <longDescription>Amazon Fire TV Stick connects to your TV&apos;s HDMI port. Just grab and go to enjoy Netflix, Prime Instant Video, Hulu Plus, YouTube.com, music, and much more.</longDescription>
 <longDescriptionHtml>Amazon Fire TV Stick connects to your TV&apos;s HDMI port. Just grab and go to enjoy Netflix, Prime Instant Video, Hulu Plus, YouTube.com, music, and much more.&lt;br&gt;&lt;br&gt;&lt;a href=&quot;/site/home-solutions/streaming-media-players-buying-guide/pcmcat333300050010.c?id=pcmcat333300050010&amp;type=category&lt;br&gt;&lt;br&gt;&quot; onclick=&quot;return popNew(this,960,800);&quot; title=&quot;Streaming media players&quot; target=&quot;_blank&quot; name=&quot;&amp;lid=PDP_BuyingGuide_StreamingMEdiaPlayer_123166&quot;&gt;&lt;img src=&quot;http://images.bestbuy.com/BestBuy_US/en_US/images/abn/2014/global/buyingguides/streaming_media/entry_point/PDP_StreamingMedia.png&quot; width=&quot;418 px&quot; height=&quot;90 px&quot; alt=&quot;Streaming media players&quot; /&gt;&lt;/a&gt;&lt;br&gt;&lt;a href=&quot;/site/home-promotions/tv-alternatives-education/pcmcat331500050009.c?id=pcmcat331500050009&quot; onclick=&quot;return popNew(this,960,800)&quot; title=&quot;Cable and Satellite Alternatives&quot; target=&quot;_blank&quot; name=&quot;&amp;lid=PDP_TV_Alternatives_122113&quot;&gt;&lt;img src=&quot;http://images.bestbuy.com/BestBuy_US/en_US/images/abn/2014/hom/pr/blue-ray-pdp-banner-402x88.jpg&quot; alt=&quot;Cable and Satellite Alternatives&quot; /&gt;&lt;/a&gt;</longDescriptionHtml>
 <details>
   <detail>
     <name>Compatible Wireless Standard(s)</name>
     <value>Wireless A|Wireless B|Wireless G|Wireless N|Wireless N Dual Band</value>
   </detail>
   ...
 </details>
 ...
</product>

(full XML for one product is here)

SOLR document I create from this XML:

{
 "indexOperationId_long":36449,
 "id":"BestBuy/Online/9999119",
 "catalogId":"BestBuy",
 "catalogVersion":"Online",
 "price_usd_string":"39.99",
 "priceValue_usd_double":39.99,
 "category_string_mv":["cat00000"],
 "img-65Wx65H_string":"http://img.bbystatic.com/BestBuy_US/images/products/9999/9999119_s.gif",
 "img-515Wx515H_string":"http://img.bbystatic.com/BestBuy_US/images/products/9999/9999119_sb.jpg",
 "allCategories_string_mv":["cat00000"],
 "inStockFlag_boolean":true,
 "img-30Wx30H_string":"http://img.bbystatic.com/BestBuy_US/images/products/9999/9999119_s.gif",
 "code_string":"9999119",
 "name_text_en":"Amazon - Fire TV Stick - Black",
 "name_sortable_en_sortabletext":"Amazon - Fire TV Stick - Black",
 "img-96Wx96H_string":"http://img.bbystatic.com/BestBuy_US/images/products/9999/9999119_s.gif",
 "manufacturerAID_string":"",
 "manufacturerName_text":"",
 "description_text_en":"Amazon Fire TV Stick connects to your TV`s HDMI port. Just grab and go to enjoy Netflix, Prime Instant Video, Hulu Plus, YouTube.com, music, and much more.",
 "ean_string":"9999119",
 "summary_text_en":"Streams 1080p content; dual-band, dual-antenna Wi-Fi (MIMO); supports 802.11a/b/g/n Wi-Fi networks; Bluetooth 3.0 with support for HID, HFP and HPP profiles; 1GB memory; 8GB internal storage",
 "itemtype_string":"Product",
 "stockLevelStatus_string":"inStock",
 "categoryName_text_en_mv":["Best Buy"],
 "img-300Wx300H_string":"http://img.bbystatic.com/BestBuy_US/images/products/9999/9999119_sb.jpg",
 "feature-Compatible_Wireless_Standard(s)_string":"Wireless A|Wireless B|Wireless G|Wireless N|Wireless N Dual Band",
 "feature-Interface(s)_string":"HDMI|Micro USB",
 "feature-Smart_Capable_string":"Yes",
 "feature-Color_Category_string":"Black",
 "feature-Maximum_Supported_Resolution_string":"1080p",
 "feature-Hard_Drive_string":"Yes",
 "feature-Computer_Connectivity_string":"Not Applicable",
 "feature-Instant_Content_Supported_string":"Amazon Video|CNN|ESPN|HBO GO|HBO NOW|Hulu Plus|Netflix|Pandora|YouTube",
 "feature-Smartphone_Compatible_string":"Yes",
 "feature-Instant_Streaming_string":"Yes",
 "feature-Playable_Formats_string":"AAC-LC|AC3|BMP|FLAC|GIF|H.264|JPEG|MP3|PNG|Vorbis|WAV",
 "feature-Hard_Drive_Size_string":"8 gigabytes",
 "feature-Remote_Control_Included_string":"Yes",
 "feature-pricematch_string":"yes",
 "url_en_string":""
}
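The mapping from the XML to the SOLR document can be sketched roughly as follows. This is not the PoC code, only a minimal illustration that uses the JDK XML parser. The class and method names are hypothetical, only a few fields are mapped, and the field naming rule (spaces replaced with underscores, wrapped in "feature-" and "_string") is inferred from the sample document above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: turns one <product> element into a flat map of SOLR fields.
final class ProductToSolrDoc {

    /** "Color Category" -> "feature-Color_Category_string" (rule inferred from the sample). */
    static String featureField(String detailName) {
        return "feature-" + detailName.trim().replace(' ', '_') + "_string";
    }

    static Map<String, Object> toSolrDoc(String productXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, since the feed comes from an external source.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document dom = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(productXml.getBytes(StandardCharsets.UTF_8)));
            Element product = dom.getDocumentElement();

            Map<String, Object> doc = new LinkedHashMap<>();
            String sku = childText(product, "sku");
            doc.put("id", "BestBuy/Online/" + sku); // catalogId/catalogVersion/code, as in the sample
            doc.put("code_string", sku);
            doc.put("name_text_en", childText(product, "name"));

            // Every <detail> becomes one dynamic "feature-..." field.
            NodeList details = product.getElementsByTagName("detail");
            for (int i = 0; i < details.getLength(); i++) {
                Element detail = (Element) details.item(i);
                doc.put(featureField(childText(detail, "name")), childText(detail, "value"));
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse product XML", e);
        }
    }

    /** Text of the first direct child element with the given tag, or "" if there is none. */
    private static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                return n.getTextContent();
            }
        }
        return "";
    }
}
```

The resulting map would still have to be serialized to JSON and posted to SOLR in batches; that part is left out here.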

Custom product page

My custom product page works with SOLR rather than the database.

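// Query SOLR directly for the product document; no database lookup is involved.
SolrClient solrClient = new LBHttpSolrClient("http://localhost:28983/solr/master_electronics_Product");
SolrQuery solrSearchQuery = new SolrQuery();
// Escape the code (org.apache.solr.client.solrj.util.ClientUtils) so special characters are not read as query syntax.
solrSearchQuery.set("q", "code_string:" + ClientUtils.escapeQueryChars(productCode));
QueryResponse response = solrClient.query(solrSearchQuery);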

In my PoC, every HTTP request to the product page triggers a request to the SOLR server. For a real system, I would add caching to make it faster.
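One possible shape for such a cache is a small in-memory LRU map in front of the SOLR lookup. The sketch below is only an illustration and is not part of the PoC: the class name and the loader function are hypothetical, and the loader stands in for the SOLR query shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache keyed by product code; V is whatever the product page needs.
final class ProductDocCache<V> {
    private final Function<String, V> loader; // assumption: wraps the SOLR query by product code
    private final Map<String, V> cache;

    ProductDocCache(int maxEntries, Function<String, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first, so it is evicted first.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, or loads it on a miss and stores it. */
    synchronized V get(String productCode) {
        V cached = cache.get(productCode);
        if (cached == null) {
            cached = loader.apply(productCode);
            if (cached != null) {
                cache.put(productCode, cached);
            }
        }
        return cached;
    }
}
```

This sketch has no expiry, so a cached product would stay stale after a new upload. Since merchants expect changes to show up quickly, a real cache would also need a time-to-live or an invalidation hook on upload.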

Facets

To add a new filter to the facet area, you need to create a facet item in the IndexedItem object.

[Image: image2016-7-4 14-1-4]

E-Commerce

Solr->DB Sync. The system creates a Product item in hybris only when the product is added to the cart. The advantage of this approach is that it avoids major changes to the Cart and Checkout functionality. Because the total number of products in carts is much smaller than the total number of products in the marketplace, this approach will not affect performance.

[Image: image2016-7-4 14-0-17]
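The Solr->DB sync described above amounts to a "create on first use" lookup. Below is a minimal sketch of that idea in plain Java. It does not use the hybris API: the in-memory map stands in for the hybris database, the lookup function stands in for the SOLR query, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: a product is materialized in the database only when it is first added to a cart.
final class LazyProductSync {

    /** Stand-in for the hybris Product item; only the fields a cart entry needs. */
    static final class ProductRecord {
        final String code;
        final String name;
        final double price;

        ProductRecord(String code, String name, double price) {
            this.code = code;
            this.name = name;
            this.price = price;
        }
    }

    private final Map<String, ProductRecord> database = new HashMap<>(); // stand-in for the hybris DB
    private final Function<String, Optional<ProductRecord>> solrLookup;  // stand-in for the SOLR query

    LazyProductSync(Function<String, Optional<ProductRecord>> solrLookup) {
        this.solrLookup = solrLookup;
    }

    /** Called from "add to cart": returns the DB record, creating it from SOLR data if needed. */
    synchronized ProductRecord ensureInDatabase(String code) {
        ProductRecord existing = database.get(code);
        if (existing != null) {
            return existing; // already synced by an earlier add-to-cart
        }
        ProductRecord created = solrLookup.apply(code)
                .orElseThrow(() -> new IllegalArgumentException("Unknown product: " + code));
        database.put(code, created);
        return created;
    }

    int databaseSize() {
        return database.size();
    }
}
```

With this shape, the cart and checkout code only ever sees regular database records, and the database grows with the number of distinct products that were actually added to carts rather than with the size of the catalog.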

3 comments

  1. Nicely done article.
    Just curious: why did you have to convert to JSON before indexing? Solr supports XML-based indexing AFAIK. Another contrib called DataImportHandler is available for direct feeds from a DB to the Solr index, which could probably help you improve your indexing speeds as well. There were also some options that allowed direct delta indexing, if I remember correctly.
    Regards
    Aditya


    1. Thank you for the comment. I don't think an XML parser is faster than a JSON parser, so I chose JSON because my Solr had already been configured for JSON. Mechanisms such as direct data import from databases weren’t applicable because my data sources were files rather than databases. In any case, there are these and other ways to make it faster or better in terms of supportability or reliability, but I tried to focus on the main idea. There are a number of important things that can be added on top of it, of course.


  2. David Alfaro

    It’s very interesting!
