SOLR-based dynamic availability groups. PoC: 500K availability groups

Author: Rauf ALIEV, 26 June 2016

Situation

I am still working on overcoming hybris’ limitations. Today’s topic is personalized catalogs. In one of the previous blog posts, I talked about personalized prices for 500,000 customer groups. This time I want to tell you about personalized product availability. In most cases the number of availability groups is not very high. However, I used an extreme case: 500,000 availability groups for one e-shop, with one unique customer per group. Once this extreme case is solved, the approach can easily be scaled down, and the possible bottlenecks and limitations are already known. In my example below:

`left` customer (#745) belongs to availability group #745.
`right` customer (#11111) belongs to availability group #11111.

	Availability group #745	Availability group #11111
EF 2X II EXTENDER	AVAILABLE	NOT AVAILABLE
RECHARGEABLE BATTERY PACK	AVAILABLE	NOT AVAILABLE
FLAGSHIP TRIPOD	NOT AVAILABLE	AVAILABLE
HIGH QUALITY TRIPOD	AVAILABLE	AVAILABLE

The following behavior is expected (see the screenshot below). The left side is a screenshot from the device where customer #745 is logged in; the right side is for customer #11111.

[Screenshot: left, customer #745; right, customer #11111]

Data and models

Complexity

In hybris, availability information is stored in both the database and the SOLR index. For category pages and search results, hybris uses SOLR. For product pages, it uses the information from the database. Indexing is a slow process for large, comprehensive catalogs, so the two sources are often out of sync.

Product information (such as product attributes, title, description or product images) does not change frequently, while prices and stock data are very dynamic. Nevertheless, the indexing logic in hybris retrieves (almost) all of the information from the database on every indexing run. If your stock is very dynamic, you need to launch a new indexing process as soon as the previous one completes. At times the indexing can take hours, which means that product availability in the index can be out of date for a large number of products for a significant amount of time.

One solution is to change the availability information directly in the index, without touching the other fields. It is a good approach, but I’m not satisfied with its performance.
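
The in-place option mentioned above corresponds to what SOLR calls an atomic update: only the changed field is sent, wrapped in a `set` modifier, instead of the whole document. A minimal sketch of such a request body follows. The field name `inStock_boolean` and the document ids are hypothetical and depend on the index schema of the actual project; atomic updates also place requirements on how the other fields are stored, so treat this as an illustration rather than a drop-in solution.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class AtomicStockUpdate {

    // Builds a JSON body for the SOLR update handler that changes one field per document.
    // The "id" key and the stock field name are assumptions, not taken from the hybris schema.
    static String buildPayload(String stockField, Map<String, Boolean> stockByDocId) {
        StringJoiner docs = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, Boolean> e : stockByDocId.entrySet()) {
            docs.add("{\"id\":\"" + escape(e.getKey()) + "\",\""
                    + escape(stockField) + "\":{\"set\":" + e.getValue() + "}}");
        }
        return docs.toString();
    }

    // Minimal JSON string escaping: backslash and double quote only.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, Boolean> changes = new LinkedHashMap<>();
        changes.put("doc-107701", true);   // hypothetical document ids
        changes.put("doc-479956", false);
        System.out.println(buildPayload("inStock_boolean", changes));
    }
}
```

With one such entry per product and per availability group, the number of updates grows quickly, which is one reason to look for a different layout.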

Solution

There is a separate SOLR core to handle availability groups. The configuration of this core is rather basic. The data is a simple three-column dataset: customercode, productcode, stock.

customercode,productcode,stock
customer1,107701,true
customer1,479956,true
customer1,592506,true
customer1,824259,true
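
For illustration, a file in this shape could be generated from whatever the warehouse or ERP export provides. The sketch below assumes the export has already been loaded into a map from customer code to the product codes available to that customer, and that the codes contain no commas or quotes. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StockCsvWriter {

    static final String HEADER = "customercode,productcode,stock";

    // Emits one row per (customer, product) pair; only available products are listed.
    static String toCsv(Map<String, List<String>> availableByCustomer) {
        StringBuilder csv = new StringBuilder(HEADER).append('\n');
        // TreeMap keeps the output order deterministic (sorted by customer code).
        for (Map.Entry<String, List<String>> e : new TreeMap<>(availableByCustomer).entrySet()) {
            for (String productCode : e.getValue()) {
                csv.append(e.getKey()).append(',').append(productCode).append(",true\n");
            }
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> data = new HashMap<>();
        data.put("customer1", Arrays.asList("107701", "479956", "592506", "824259"));
        System.out.print(toCsv(data));
    }
}
```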

This core is very fast to update with information from the warehouse management system or ERP system. A full update (50M records) takes 187 seconds.
The database is used only to manage the availability groups themselves. Stock information is stored in SOLR, and hybris stock data is not used anymore. For compatibility, you might sync SOLR data back to hybris when needed, and only for the items affected.

Technical details

SOLR

New core: personalstock. Configuration:

Uploading data:

time -p curl "http://localhost:8983/solr/personalstock/update/csv?stream.file=/hybris/solr-prices/stock.csv&stream.contentType=text/plain;charset=utf-8"
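
If the upload has to be triggered from Java rather than from the shell, the same request URL can be assembled as in the sketch below. The `stream.file` and `stream.contentType` parameters come from the command above; the optional `commit=true` parameter is my addition and is only needed if the core does not commit automatically. The class name is hypothetical. Note that `stream.file` relies on remote streaming being enabled for the core.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class StockUploadUrl {

    // Builds the CSV update URL for a file that is readable by the SOLR server itself.
    static String build(String solrBase, String core, String csvPath, boolean commit) {
        String url = solrBase + "/" + core + "/update/csv"
                + "?stream.file=" + enc(csvPath)
                + "&stream.contentType=" + enc("text/plain;charset=utf-8");
        // commit=true is an assumption: it makes the new rows visible right after the upload.
        return commit ? url + "&commit=true" : url;
    }

    // URL-encodes a single query parameter value.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "personalstock",
                "/hybris/solr-prices/stock.csv", true));
    }
}
```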

See above for the stock.csv structure.

Custom SOLRQueryConvertor

To use this additional core, you need to slightly change the requests that hybris sends to SOLR. SOLR has a query parser plugin, join, that can be used to filter the product index by the data in our new core.

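import javax.annotation.Resource;

import org.apache.solr.client.solrj.SolrQuery;
import org.springframework.beans.factory.BeanFactoryAware;

import de.hybris.platform.servicelayer.user.UserService;
import de.hybris.platform.solrfacetsearch.search.FacetSearchException;
import de.hybris.platform.solrfacetsearch.search.SearchQuery;
import de.hybris.platform.solrfacetsearch.search.SolrQueryConverter;
import de.hybris.platform.solrfacetsearch.search.impl.DefaultSolrQueryConverter;
// AvailabilityGroupModel is the custom item type of this PoC; import it from your own extension.

public class DSSOLRQueryConvertor extends DefaultSolrQueryConverter implements SolrQueryConverter, BeanFactoryAware {

    @Resource
    private UserService userService;

    @Override
    public SolrQuery convertSolrQuery(SearchQuery searchQuery) throws FacetSearchException {
        SolrQuery solrQuery = super.convertSolrQuery(searchQuery);
        String customerAvailabilityGroup = getAvailabilityGroupOfTheCurrentCustomer();
        if (!customerAvailabilityGroup.isEmpty()) {
            // Keep only products that have a row for this group in the personalstock core.
            String customerQuery = "customercode:" + customerAvailabilityGroup;
            solrQuery.add("fq", "{!join from=productcode to=code_string fromIndex=personalstock}" + customerQuery);
        }
        return solrQuery;
    }

    // Returns the availability group code of the current user, or "" if none is assigned.
    private String getAvailabilityGroupOfTheCurrentCustomer() {
        AvailabilityGroupModel availabilityGroup = userService.getCurrentUser().getAvailabilityGroup();
        return availabilityGroup == null ? "" : availabilityGroup.getCode();
    }

}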
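
To make the effect of the join filter more concrete, the sketch below imitates it on in-memory data: it first collects the `productcode` values of all `personalstock` rows that match the customer, then keeps only the catalog entries whose `code_string` is in that set. This only illustrates the semantics, not how SOLR implements the join, and the types are hypothetical. Note that the filter matches on `customercode` only, so the sketch assumes the core contains rows only for available products; if rows with `stock` set to false were loaded as well, the inner query would presumably need an extra clause on `stock`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class JoinFilterSketch {

    // One row of the personalstock core (customercode, productcode).
    static final class StockRow {
        final String customerCode;
        final String productCode;

        StockRow(String customerCode, String productCode) {
            this.customerCode = customerCode;
            this.productCode = productCode;
        }
    }

    // "from" side: product codes of all rows matching customercode:<customerCode>.
    static Set<String> productCodesFor(String customerCode, List<StockRow> personalStock) {
        Set<String> codes = new HashSet<>();
        for (StockRow row : personalStock) {
            if (row.customerCode.equals(customerCode)) {
                codes.add(row.productCode);
            }
        }
        return codes;
    }

    // "to" side: keep the catalog codes (code_string) that appear in the joined set.
    static List<String> filterCatalog(List<String> catalogCodes, String customerCode,
                                      List<StockRow> personalStock) {
        Set<String> allowed = productCodesFor(customerCode, personalStock);
        List<String> visible = new ArrayList<>();
        for (String code : catalogCodes) {
            if (allowed.contains(code)) {
                visible.add(code);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<StockRow> stock = Arrays.asList(
                new StockRow("745", "107701"), new StockRow("745", "479956"),
                new StockRow("11111", "592506"),
                new StockRow("745", "824259"), new StockRow("11111", "824259"));
        List<String> catalog = Arrays.asList("107701", "479956", "592506", "824259");
        System.out.println(filterCatalog(catalog, "745", stock));
        System.out.println(filterCatalog(catalog, "11111", stock));
    }
}
```

A customer without any rows in the core ends up with an empty result, which matches the behavior of the filter query for a group that has no available products.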

Tags: SOLR

3 Responses

Julio Argüello

25 November 2016 at 04:25

How would you deal with the PDP with this PoC? I mean, if a user belongs to an (availability) group with no access to product B (with code ‘b’), how would you prevent this customer from opening the URL (/p/b) directly?

In our case we have made a similar solution to yours but the source of truth regarding stock is Hybris anyway and:
a) Hybris feeds Solr with this info
b) PDP has an availability check at the very beginning (if no access then 404 HTTP status is returned)

Do NOTE that a ‘very skilled’ customer could still add a restricted product to their cart (by changing the request), but in the end the item would be removed during checkout anyway.

Rauf Aliev (in reply)

27 November 2016 at 08:55

Yes, it is so :). The article only demonstrates the approach. For example, you can make a request to SOLR from the PDP to check whether the product is available or not.

Shinu Suresh (@shinusuresh)

16 December 2016 at 00:42

Good article. BTW, can you make the schema and solrconfig public too? Right now they are hosted in EPAM Confluence.