hybris Marketplace PoC: 2,000,000 products, 15,000 categories, 6000 search facets


Online marketplaces frequently have to deal with a huge number of products, categories, and facets (product attributes used to filter search and category pages). Hybris supports large product sets out of the box (OOTB), but many hybris components are not optimized for these volumes. This PoC demonstrates one solution that addresses this issue.


The Hybris out-of-the-box architecture is not suitable for huge product and facet sets. A major bottleneck is fetching data for indexing: the more products and facets you have, the slower this process becomes. Merchants on a typical online marketplace expect the website to reflect their changes shortly after they upload updates, which may not happen when the number of products and facets is high.

There are several ways to make indexing faster, but for really huge catalogs like the scenario discussed in this article, a slight improvement may not be enough. In such cases, the design of many out-of-the-box components should be rethought to meet high-load and big-data requirements.


To validate my design with real data and massive volumes, I used the freely available BestBuy database to create this proof of concept.

The BestBuy XML has about 2,000,000 products, 15,000 categories, and about 6,000 product attributes (facets).

Marketplaces with similarly large numbers of products generally use distributed product management rather than a centralized one. There are different types of products, and it is common to use specialized management solutions for different product types. For this reason, I assume that products are loaded into hybris from the external sources where they are managed.

Demo of PoC

As mentioned above, I assume that the product data come from an external source; in my PoC they are provided as CSV files. In my solution there is no indexing process as such, because the data are loaded directly into SOLR. From the storefront's viewpoint, however, there is no difference between indexing and a direct upload into SOLR.

It’s important that this process is very fast: ~1 second per 1,000 products. For 2,000,000 products the full update takes about 25 minutes on my laptop. If you need to refresh the whole marketplace product set from nothing to the full set (2MM products), you wait no more than about 25 minutes (or less on better hardware).

Custom marketplace indexer

The hybris OOTB indexer should be replaced with the custom indexer.

In my PoC:

  • the preparation step takes ~20 sec per 5000 products,
  • the uploading step takes ~4 sec per 5000 products.
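
The two steps above can be sketched as a simple batch loop. This is only an illustration, assuming batches of 5,000 products as in the timings above. `BatchLoader`, `prepare`, and `upload` are hypothetical names that do not come from the PoC, and the upload step is passed in as a callback, so nothing here talks to SOLR directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: split the product feed into fixed-size batches,
// prepare each batch (for example, convert it to SOLR JSON), then upload it.
class BatchLoader {

    // Splits the items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    // Runs the preparation step and then the upload step for every batch.
    // Returns the number of batches processed.
    static <T> int load(List<T> items, int batchSize,
                        Function<List<T>, String> prepare,
                        Consumer<String> upload) {
        int processed = 0;
        for (List<T> batch : partition(items, batchSize)) {
            String payload = prepare.apply(batch); // assumed: builds the JSON for one batch
            upload.accept(payload);                // assumed: sends the payload to SOLR
            processed++;
        }
        return processed;
    }
}
```

Keeping the two steps as separate callbacks makes it easy to time them independently, which is how the per-step figures above would be measured.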

Example of BestBuy XML:

 <name>Amazon - Fire TV Stick - Black</name>
 <longDescription>Amazon Fire TV Stick connects to your TV&apos;s HDMI port. Just grab and go to enjoy Netflix, Prime Instant Video, Hulu Plus, YouTube.com, music, and much more.</longDescription>
 <longDescriptionHtml>Amazon Fire TV Stick connects to your TV&apos;s HDMI port. Just grab and go to enjoy Netflix, Prime Instant Video, Hulu Plus, YouTube.com, music, and much more.&lt;br&gt;&lt;br&gt;&lt;a href=&quot;/site/home-solutions/streaming-media-players-buying-guide/pcmcat333300050010.c?id=pcmcat333300050010&amp;type=category&lt;br&gt;&lt;br&gt;&quot; onclick=&quot;return popNew(this,960,800);&quot; title=&quot;Streaming media players&quot; target=&quot;_blank&quot; name=&quot;&amp;lid=PDP_BuyingGuide_StreamingMEdiaPlayer_123166&quot;&gt;&lt;img src=&quot;http://images.bestbuy.com/BestBuy_US/en_US/images/abn/2014/global/buyingguides/streaming_media/entry_point/PDP_StreamingMedia.png&quot; width=&quot;418 px&quot; height=&quot;90 px&quot; alt=&quot;Streaming media players&quot; /&gt;&lt;/a&gt;&lt;br&gt;&lt;a href=&quot;/site/home-promotions/tv-alternatives-education/pcmcat331500050009.c?id=pcmcat331500050009&quot; onclick=&quot;return popNew(this,960,800)&quot; title=&quot;Cable and Satellite Alternatives&quot; target=&quot;_blank&quot; name=&quot;&amp;lid=PDP_TV_Alternatives_122113&quot;&gt;&lt;img src=&quot;http://images.bestbuy.com/BestBuy_US/en_US/images/abn/2014/hom/pr/blue-ray-pdp-banner-402x88.jpg&quot; alt=&quot;Cable and Satellite Alternatives&quot; /&gt;&lt;/a&gt;</longDescriptionHtml>
        <name>Compatible Wireless Standard(s)</name>
        <value>Wireless A|Wireless B|Wireless G|Wireless N|Wireless N Dual Band</value>

(full XML for one product is here)

SOLR document I create from this XML:

 "name_text_en":"Amazon - Fire TV Stick - Black",
 "name_sortable_en_sortabletext":"Amazon - Fire TV Stick - Black",
 "description_text_en":"Amazon Fire TV Stick connects to your TV`s HDMI port. Just grab and go to enjoy Netflix, Prime Instant Video, Hulu Plus, YouTube.com, music, and much more.",
 "summary_text_en":"Streams 1080p content; dual-band, dual-antenna Wi-Fi (MIMO); supports 802.11a/b/g/n Wi-Fi networks; Bluetooth 3.0 with support for HID, HFP and HPP profiles; 1GB memory; 8GB internal storage",
 "categoryName_text_en_mv":["Best Buy"],
 "feature-Compatible_Wireless_Standard(s)_string" : "Wireless A|Wireless B|Wireless G|Wireless N|Wireless N Dual Band",
"feature-Interface(s)_string" : "HDMI|Micro USB",
"feature-Smart_Capable_string" : "Yes",
"feature-Color_Category_string" : "Black",
"feature-Maximum_Supported_Resolution_string" : "1080p",
"feature-Hard_Drive_string" : "Yes",
"feature-Computer_Connectivity_string" : "Not Applicable",
"feature-Instant_Content_Supported_string" : "Amazon Video|CNN|ESPN|HBO GO|HBO NOW|Hulu Plus|Netflix|Pandora|YouTube",
"feature-Smartphone_Compatible_string" : "Yes",
"feature-Instant_Streaming_string" : "Yes",
"feature-Playable_Formats_string" : "AAC-LC|AC3|BMP|FLAC|GIF|H.264|JPEG|MP3|PNG|Vorbis|WAV",
"feature-Hard_Drive_Size_string" : "8 gigabytes",
"feature-Remote_Control_Included_string" : "Yes",
"feature-pricematch_string" : "yes",

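The mapping from the XML name/value pairs to the dynamic SOLR fields can be sketched as follows. The naming rule (the `feature-` prefix, spaces replaced by underscores, the `_string` suffix) is inferred from the sample document above. The class and method names are hypothetical, and the JSON escaping is deliberately minimal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: turn BestBuy detail name/value pairs into SOLR dynamic fields.
class FeatureFieldMapper {

    // "Compatible Wireless Standard(s)" -> "feature-Compatible_Wireless_Standard(s)_string"
    static String fieldName(String featureName) {
        return "feature-" + featureName.trim().replace(' ', '_') + "_string";
    }

    // Maps every feature to its SOLR field, keeping the original order.
    static Map<String, String> toSolrFields(Map<String, String> features) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : features.entrySet()) {
            fields.put(fieldName(e.getKey()), e.getValue());
        }
        return fields;
    }

    // Minimal JSON escaping (backslash and double quote only).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Renders the fields as a flat JSON object.
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("}").toString();
    }
}
```

Pipe-separated values such as "HDMI|Micro USB" are passed through unchanged here, as in the sample document.
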
Custom product page

My custom product page works with SOLR rather than the database.

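// SolrJ: look the product up in the index instead of the hybris database.
SolrClient solrClient = new LBHttpSolrClient("http://localhost:28983/solr/master_electronics_Product");
SolrQuery solrSearchQuery = new SolrQuery();
// Set the query for the requested product here (for example with setQuery) before executing.
QueryResponse response = solrClient.query(solrSearchQuery);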

In my PoC, every HTTP request to the product page triggers a request to the SOLR server. I would add caching to make it faster.
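
One way to add such a cache is a small in-memory LRU map in front of the SOLR lookup. This is only a sketch and not part of the PoC: the `ProductCache` class and the loader function are hypothetical, the capacity is arbitrary, and a real setup would also need expiry so that freshly uploaded products become visible.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: LRU cache in front of the SOLR product lookup.
class ProductCache<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader;

    ProductCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached value and calls the loader (e.g. a SOLR query) only on a miss.
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    // Number of entries currently held in the cache.
    synchronized int size() {
        return entries.size();
    }
}
```

The loader would wrap the SolrJ call from the product page, so repeated views of the same product are served from memory.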


To add a new filter in the facet area you need to create a facet item in the IndexedItem object.


Solr->DB Sync. The system creates a Product item in hybris only when the product is added to the cart. This approach avoids major changes in the Cart and Checkout functionality. Because the total number of products in carts is much smaller than the total number of products in the catalog, this approach does not affect performance.
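
The lazy Solr->DB synchronization can be sketched as follows. The in-memory maps stand in for the SOLR index and the hybris database, and all class and method names are hypothetical; in hybris the creation step would go through the platform's own model layer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: create the database product only when it is first added to a cart.
class LazyProductSync {
    // Stand-in for the SOLR index: product code -> indexed attributes.
    private final Map<String, Map<String, String>> solrIndex;
    // Stand-in for the hybris database: product code -> persisted attributes.
    private final Map<String, Map<String, String>> database = new HashMap<>();

    LazyProductSync(Map<String, Map<String, String>> solrIndex) {
        this.solrIndex = solrIndex;
    }

    // Called from the add-to-cart flow: returns the persisted product,
    // creating it from the indexed document on first use.
    Map<String, String> addToCart(String productCode) {
        Map<String, String> product = database.get(productCode);
        if (product == null) {
            Map<String, String> doc = solrIndex.get(productCode);
            if (doc == null) {
                throw new IllegalArgumentException("Unknown product: " + productCode);
            }
            product = new HashMap<>(doc); // copy the indexed attributes into the new DB item
            database.put(productCode, product);
        }
        return product;
    }

    // Number of products that have been persisted so far.
    int persistedCount() {
        return database.size();
    }
}
```

Only products that actually reach a cart are ever persisted, so the database stays small even when the index holds millions of documents.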



  1. Nicely done article.
    Just curious: why did you have to convert to JSON before indexing? SOLR supports XML-based indexing AFAIK. Another contrib called DataImportHandler is available for a direct feed from a DB to the SOLR index, which could probably help you improve your indexing speeds as well. There were some options that allowed direct delta indexing too, if I remember correctly.


    1. Thank you for the comment. I think that an XML parser is not faster than a JSON parser, so I chose JSON because my SOLR had already been configured for JSON. Mechanisms such as direct data import from databases weren’t applicable because my data sources were files rather than databases. Anyway, there are these and other ways to make it faster or better in terms of supportability or reliability, but I tried to focus on the main idea. There are a number of important things that can be added on top of it, of course.


  2. David Alfaro

    It’s very interesting!


  3. planetofadventure

    Another good read. Small note about the comment :
    “For this reason I assume that the products should be loaded into hybris from the external source where the products are managed.”
    This is true for most systems I worked on. Although the PoC presented here will indeed be much faster than the OOTB behaviour, I’m not entirely sure anyone would publish products on the marketplace store without having them in the hybris DB / PCM. I’m aware that in this PoC you create the products upon checkout, but is that to conform with the data model, which expects the product to be there for all the components that require it to behave as expected? Do you also set up the product category hierarchy? Catalogues?
    In my last project, for example, product inception was done much earlier than the product reaching hybris; then, once it made it to the hybris database with all the enriched content that goes with the product (prices, product approvals, etc.), it was pushed into the marketplace store.
    Do you think that driving / feeding products into hybris from the marketplace PCM is an approach retailers would benefit from, when all of the above is considered, just to leverage faster product load time?

    1. Thanks, a very good question. Actually, it really depends on the specific project and technical requirements. Solr in the PoC plays the role of a cache, just to minimize relational database interaction via lazy loading. For example, it may work only for the application layer in the cluster, while the backoffice layer in the cluster works with its own product catalog, where all products are in the database. Once a product is enriched and verified, it is published to the index, which makes it visible. The proposed solution has its flaws, of course. For example, Solr can’t intersect things from its own index with another index or an external database.


  4. Very nice article. Thanks for the detailed implementation. One query I have: we cannot use product-based promotions with this solution. We need to have product promotions outside the Hybris OOTB framework.

