SOLR-based dynamic availability groups. PoC: 500K availability groups

Situation

I am still working on overcoming hybris’ limitations. Today’s topic is personalized catalogs. In one of the previous blog posts, I talked about personalized prices for 500,000 customer groups. This time I want to tell you about personalized product availability.

In most cases the number of availability groups is not very high. However, I tested an extreme case: 500,000 availability groups for one e-shop, with one unique customer per group. Once this case is solved, the approach can easily be scaled down, and the possible bottlenecks and limitations are already known.

In my example below:

  • The `left` customer (#745) belongs to availability group #745.
  • The `right` customer (#11111) belongs to availability group #11111.

Product                      Availability group #745    Availability group #11111
EF 2X II EXTENDER            AVAILABLE                  NOT AVAILABLE
RECHARGEABLE BATTERY PACK    AVAILABLE                  NOT AVAILABLE
FLAGSHIP TRIPOD              NOT AVAILABLE              AVAILABLE
HIGH QUALITY TRIPOD          AVAILABLE                  AVAILABLE

The following behavior is expected (see the screenshot below). The left side is a screenshot from the device where customer #745 is logged in; the right side is for customer #11111.

image2016-6-9 19-58-59.png
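
To make the expected behavior concrete, here is a tiny in-memory model of it in plain Java. It is only an illustration, not hybris code: the class and method names are made up, product names stand in for product codes, and I assume that unavailable products are simply filtered out of the listing.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** In-memory illustration of the expected behavior; hypothetical names, not hybris code. */
final class AvailabilityExample {

    // Availability group code -> products available to that group (from the example data).
    static final Map<String, Set<String>> AVAILABLE = Map.of(
            "745", Set.of("EF 2X II EXTENDER", "RECHARGEABLE BATTERY PACK", "HIGH QUALITY TRIPOD"),
            "11111", Set.of("FLAGSHIP TRIPOD", "HIGH QUALITY TRIPOD"));

    /** Returns the catalog entries a customer of the given group should see, in catalog order. */
    static List<String> visibleProducts(List<String> catalog, String groupCode) {
        Set<String> allowed = AVAILABLE.getOrDefault(groupCode, Set.of());
        return catalog.stream().filter(allowed::contains).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> catalog = List.of("EF 2X II EXTENDER", "RECHARGEABLE BATTERY PACK",
                "FLAGSHIP TRIPOD", "HIGH QUALITY TRIPOD");
        System.out.println("Customer #745:   " + visibleProducts(catalog, "745"));
        System.out.println("Customer #11111: " + visibleProducts(catalog, "11111"));
    }
}
```

The same catalog produces two different listings, and a group that is not in the map sees nothing.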

Data and models

image2016-6-9 20-3-59

image2016-6-9 20-2-42

Complexity

In hybris, availability information is stored in both the database and the SOLR index. For category pages and search results, hybris uses SOLR; for product pages, it uses the database. Indexing is slow for large, comprehensive catalogs, so the two sources are often out of sync. Descriptive product information (attributes, title, description, images) changes rarely, while prices and stock data are very dynamic.

The indexing logic in hybris is arranged so that (almost) all the information for the index is retrieved from the database. If your stock is very dynamic, you need to launch a new indexing process as soon as the previous one completes. Indexing can take hours, so the availability information for a large number of products may be out of date for a significant amount of time.

One solution is to update only the availability information in the index, without touching the other fields. It is a good approach, but I’m not satisfied with its performance.

Solution

 

  • There is a separate SOLR core to handle availability groups. The configuration of this core is rather basic. The data is a simple three-column dataset (customercode, productcode, stock), for example:
customercode,productcode,stock
customer1,107701,true
customer1,479956,true
customer1,592506,true
customer1,824259,true
  • It is very fast to update with information from the warehouse management system or ERP system. A full update (50M records) takes 187 seconds, which is roughly 267,000 records per second.
  • The database is used only to handle availability groups. Stock information is stored in SOLR; hybris stock data is no longer used. For compatibility, you might sync SOLR data back to hybris when needed, and only for the items affected.
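
As a rough sketch of the export side, the snippet below produces a file body in the customercode,productcode,stock layout shown above. The `StockCsvWriter` and `Row` names are hypothetical, and I assume that codes never contain commas or line breaks, so no CSV quoting is done.

```java
import java.util.List;

/** Hypothetical helper that renders availability rows in the stock.csv layout. */
final class StockCsvWriter {

    /** One row of the personalstock core: customer (group) code, product code, availability flag. */
    static final class Row {
        final String customerCode;
        final String productCode;
        final boolean inStock;

        Row(String customerCode, String productCode, boolean inStock) {
            this.customerCode = customerCode;
            this.productCode = productCode;
            this.inStock = inStock;
        }
    }

    /** Renders the header line followed by one line per row. Assumes codes contain no commas. */
    static String toCsv(List<Row> rows) {
        StringBuilder csv = new StringBuilder("customercode,productcode,stock\n");
        for (Row row : rows) {
            csv.append(row.customerCode).append(',')
               .append(row.productCode).append(',')
               .append(row.inStock).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("customer1", "107701", true),
                new Row("customer1", "479956", true));
        System.out.print(toCsv(rows));
    }
}
```

For 50M records you would stream the rows to a file instead of building one string in memory; the line format stays the same.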

Technical details

SOLR

New core: personalstock. Configuration:

Uploading data:

time -p curl "http://localhost:8983/solr/personalstock/update/csv?stream.file=/hybris/solr-prices/stock.csv&stream.contentType=text/plain;charset=utf-8"

See above for the stock.csv structure.
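
If you prefer to trigger the upload from Java (for example from a scheduled job) rather than from the shell, the same request can be sketched with the JDK HTTP client (Java 11+). The `StockUploader` class and its method names are hypothetical; the URL, core name and parameters are the ones from the curl command. Note that `stream.file` only works when remote streaming is enabled on the SOLR side.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper mirroring the curl upload call. */
final class StockUploader {

    /** Builds the /update/csv URL; csvPathOnSolrHost is a path readable by the SOLR server itself. */
    static URI buildUploadUri(String solrBaseUrl, String core, String csvPathOnSolrHost) {
        String query = "stream.file=" + encode(csvPathOnSolrHost)
                + "&stream.contentType=" + encode("text/plain;charset=utf-8");
        return URI.create(solrBaseUrl + "/" + core + "/update/csv?" + query);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Sends the request (a plain GET, like the curl call) and returns the HTTP status code. */
    static int upload(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        URI uri = buildUploadUri("http://localhost:8983/solr", "personalstock",
                "/hybris/solr-prices/stock.csv");
        System.out.println("Uploading via " + uri);
        System.out.println("HTTP status: " + upload(uri));
    }
}
```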

Custom SOLRQueryConvertor

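To use this additional core, you need to slightly change the requests that hybris sends to SOLR. SOLR’s join query parser ({!join}) can be leveraged to use the data from our new core: the filter keeps only those products whose code_string matches a productcode stored in personalstock for the current customer’s group.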
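
Before the converter itself, here is the filter string in isolation, as a dependency-free sketch. The `AvailabilityJoinFilter` class is hypothetical, and the hand-rolled `escape` method is only a stand-in: in a real project SolrJ’s `ClientUtils.escapeQueryChars` does this job. Escaping matters because the group code is concatenated into the query.

```java
/** Hypothetical helper that builds the join filter query for one availability group. */
final class AvailabilityJoinFilter {

    // Local params from the article: join personalstock.productcode to code_string in the product index.
    private static final String JOIN_PREFIX =
            "{!join from=productcode to=code_string fromIndex=personalstock}";

    // Characters with a special meaning in the SOLR query syntax.
    private static final String SPECIAL_CHARS = "\\+-!():^[]\"{}~*?|&;/";

    /** Backslash-escapes query syntax characters and whitespace (simplified stand-in for SolrJ). */
    static String escape(String value) {
        StringBuilder escaped = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (SPECIAL_CHARS.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    /** Returns the value to pass as an fq parameter for the given availability group code. */
    static String forGroup(String groupCode) {
        return JOIN_PREFIX + "customercode:" + escape(groupCode);
    }

    public static void main(String[] args) {
        System.out.println(forGroup("745"));
    }
}
```

The inner query matches every row stored for the group, whatever its stock value. That is fine as long as only available products are uploaded; if the CSV also carries rows with stock=false, the inner query would presumably need an extra `AND stock:true` clause.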

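import javax.annotation.Resource;

import org.apache.solr.client.solrj.SolrQuery;
import org.springframework.beans.factory.BeanFactoryAware;

import de.hybris.platform.servicelayer.user.UserService;
import de.hybris.platform.solrfacetsearch.search.FacetSearchException;
import de.hybris.platform.solrfacetsearch.search.SearchQuery;
import de.hybris.platform.solrfacetsearch.search.SolrQueryConverter;
import de.hybris.platform.solrfacetsearch.search.impl.DefaultSolrQueryConverter;
// AvailabilityGroupModel comes from the custom data model; import it from your own extension.

public class DSSOLRQueryConvertor extends DefaultSolrQueryConverter implements SolrQueryConverter, BeanFactoryAware {

    @Resource
    private UserService userService;

    @Override
    public SolrQuery convertSolrQuery(SearchQuery searchQuery) throws FacetSearchException {
        SolrQuery solrQuery = super.convertSolrQuery(searchQuery);
        String customerAvailabilityGroup = getAvailabilityGroupOfTheCurrentCustomer();
        if (!customerAvailabilityGroup.isEmpty()) {
            // Keep only products that have a row for this group in the personalstock core.
            String customerQuery = "customercode:" + customerAvailabilityGroup;
            solrQuery.addFilterQuery("{!join from=productcode to=code_string fromIndex=personalstock}" + customerQuery);
        }
        return solrQuery;
    }

    // Returns the code of the current customer's availability group, or "" if none is assigned.
    private String getAvailabilityGroupOfTheCurrentCustomer() {
        AvailabilityGroupModel availabilityGroup = userService.getCurrentUser().getAvailabilityGroup();
        return availabilityGroup == null ? "" : availabilityGroup.getCode();
    }

}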

3 comments

  1. How would you deal with the PDP in this PoC? I mean, if a user belongs to an (availability) group with no access to product B (with code ‘b’), how would you stop this customer from opening the URL (/p/b) directly?

    In our case we have made a similar solution to yours but the source of truth regarding stock is Hybris anyway and:
    a) Hybris feeds Solr with this info
    b) PDP has an availability check at the very beginning (if no access then 404 HTTP status is returned)

    Do NOTE that a ‘very skilled’ customer could still add a restricted product to their cart (by changing the request), but in the end, during checkout, the item would be removed anyway.


    1. Yes, it is so :). The article only demonstrates the approach. For example, you can make a request to SOLR from the PDP to check whether the product is available or not.


  2. Good article. BTW, can you put the schema and solrconfig in the public domain too?
    They are currently hosted in EPAM Confluence.
