SOLR-based dynamic availability groups. PoC: 500K availability groups
Situation
I am still working on overcoming hybris’ limitations. Today’s topic is personalized catalogs. In one of the previous blog posts, I talked about personalized prices for 500,000 customer groups. This time I want to tell you about personalized product availability.

In most cases the number of availability groups is not very high. However, I used an extreme case: 500,000 availability groups for one e-shop, with one unique customer per group. Once this case is solved, the approach can easily be scaled down, with the possible bottlenecks and limitations already known.

In my example below:
- `left` customer (#745) belongs to availability group #745.
- `right` customer (#11111) belongs to availability group #11111.
Product | Availability group #745 | Availability group #11111 |
---|---|---|
EF 2X II EXTENDER | AVAILABLE | NOT AVAILABLE |
RECHARGEABLE BATTERY PACK | AVAILABLE | NOT AVAILABLE |
FLAGSHIP TRIPOD | NOT AVAILABLE | AVAILABLE |
HIGH QUALITY TRIPOD | AVAILABLE | AVAILABLE |
Data and models
Complexity
In hybris, availability information is stored both in the database and in the SOLR index. For category pages and search results, hybris uses SOLR. For product pages, it uses the information from the database. Indexing is a slow process for large, comprehensive catalogs, so it is common for the two sources to differ.

Product information (such as attributes, title, description or images) does not change frequently, while prices and stock data are very dynamic. Yet the indexing logic in hybris is arranged so that (almost) all the information is retrieved from the database on every indexing run. If your stock is very dynamic, you need to launch a new indexing process as soon as the previous one completes. At times the indexing can take hours, which means that availability information can be out of date for a large number of products for a significant amount of time.

One solution is to update the availability information in the index directly, without touching the other fields. It is a good approach, but I’m not satisfied with its performance.
Solution
- There is a separate SOLR core to handle availability groups. The configuration of this core is rather basic. It holds a simple three-column dataset: customercode, productcode, stock.
customercode,productcode,stock
customer1,107701,true
customer1,479956,true
customer1,592506,true
customer1,824259,true
- It is very fast to update with information from the warehouse management system or ERP system. A full update (50M records) takes 187 seconds.
- The database is used only to handle availability groups. Stock information is stored in SOLR. Hybris stock data is not used anymore. For compatibility, you can sync the SOLR data back to hybris when needed, and only for the items affected.
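To make the shape of the customercode,productcode,stock file concrete, here is a minimal Java sketch that turns a map of availability groups into CSV lines. It is not the code used in the PoC: the class name `StockCsvBuilder`, the input map shape, and the choice to emit only `true` rows are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StockCsvBuilder {
    // Header matching the field names used in the sample data.
    static final String HEADER = "customercode,productcode,stock";

    // groupToProducts (hypothetical input shape): availability group code
    // mapped to the product codes that are available to that group.
    static List<String> toCsvLines(Map<String, List<String>> groupToProducts) {
        List<String> lines = new ArrayList<>();
        lines.add(HEADER);
        for (Map.Entry<String, List<String>> entry : groupToProducts.entrySet()) {
            for (String productCode : entry.getValue()) {
                // One row per (group, product) pair; only available products are written.
                lines.add(entry.getKey() + "," + productCode + ",true");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, List<String>> data = new LinkedHashMap<>();
        data.put("customer1", Arrays.asList("107701", "479956"));
        for (String line : toCsvLines(data)) {
            System.out.println(line);
        }
    }
}
```

In a real feed the map would come from the warehouse management or ERP export, and the lines would be streamed straight to a file rather than collected in memory.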
Technical details
SOLR
New core: personalstock.
Configuration:
Uploading data:
time -p curl "http://localhost:8983/solr/personalstock/update/csv?stream.file=/hybris/solr-prices/stock.csv&stream.contentType=text/plain;charset=utf-8"
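If the upload is triggered from Java rather than from the shell, the same request URL can be assembled programmatically. The sketch below only builds the URL string; the class name `CsvUploadUrl` and the optional `commit` flag are my assumptions and were not part of the PoC command. Note that `stream.file` only works when remote streaming is enabled for the core.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class CsvUploadUrl {
    // Percent-encodes one query parameter value as UTF-8.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    // solrBase, core and csvPath mirror the values in the curl command;
    // commit=true (an assumption, absent from the original command) asks
    // SOLR to make the uploaded rows visible right after the load.
    static String build(String solrBase, String core, String csvPath, boolean commit) {
        String url = solrBase + "/" + core + "/update/csv"
                + "?stream.file=" + enc(csvPath)
                + "&stream.contentType=" + enc("text/plain;charset=utf-8");
        return commit ? url + "&commit=true" : url;
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "personalstock",
                "/hybris/solr-prices/stock.csv", false));
    }
}
```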
Custom SOLRQueryConvertor
To use this additional core, you need to slightly change the requests that hybris sends to SOLR. SOLR's join query parser ({!join}) can be used to filter the main index by the data in our new core.
public class DSSOLRQueryConvertor extends DefaultSolrQueryConverter implements SolrQueryConverter, BeanFactoryAware {
    @Resource
    private UserService userService;

    @Override
    public SolrQuery convertSolrQuery(SearchQuery searchQuery) throws FacetSearchException {
        SolrQuery solrQuery = super.convertSolrQuery(searchQuery);
        String customerAvailabilityGroup = getAvailabilityGroupOftheCurrentCustomer();
        if (!customerAvailabilityGroup.isEmpty()) {
            // Keep only the products that have a row for this group in the personalstock core.
            String customerQuery = "customercode:" + customerAvailabilityGroup;
            solrQuery.add("fq", "{!join from=productcode to=code_string fromIndex=personalstock}" + customerQuery);
        }
        return solrQuery;
    }

    // Returns the availability group code of the current user, or "" if no group is assigned.
    private String getAvailabilityGroupOftheCurrentCustomer() {
        AvailabilityGroupModel availabilityGroup = userService.getCurrentUser().getAvailabilityGroup();
        return availabilityGroup == null ? "" : availabilityGroup.getCode();
    }
}
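The converter concatenates the group code into the filter query as-is. If group codes can contain characters that are special in the SOLR query syntax, they should be escaped first. Below is a self-contained sketch of building the same join filter with escaping. The class `AvailabilityJoinFilter` and its methods are hypothetical and written without SolrJ so that the example runs on its own; in a hybris project, SolrJ's `ClientUtils.escapeQueryChars` does the same job.

```java
public class AvailabilityJoinFilter {
    // Characters with a special meaning in the SOLR/Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Prefixes every special or whitespace character with a backslash.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Returns the fq value for the given group, or null when there is no
    // group and therefore no availability filter to add.
    static String build(String groupCode) {
        if (groupCode == null || groupCode.isEmpty()) {
            return null;
        }
        return "{!join from=productcode to=code_string fromIndex=personalstock}"
                + "customercode:" + escape(groupCode);
    }

    public static void main(String[] args) {
        System.out.println(build("customer1"));
    }
}
```

Returning null for a missing group keeps the behaviour of the converter above: customers without an availability group see the unfiltered result set.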
Julio Argüello
25 November 2016 at 04:25
How would you deal with the PDP in this PoC? I mean, if a user belongs to an (availability) group with no access to product B (with code ‘b’), how would you stop this customer from opening the URL (/p/b) directly?
In our case we have made a similar solution to yours but the source of truth regarding stock is Hybris anyway and:
a) Hybris feeds Solr with this info
b) PDP has an availability check at the very beginning (if no access then 404 HTTP status is returned)
Do NOTE a ‘very skilled’ customer could anyway add a restricted product to its cart (changing the request) but at the very end, during the checkout the item would be removed anyway.
Rauf Aliev
27 November 2016 at 08:55
Yes, it is so :). The article only demonstrates the approach. For example, you can make a request to SOLR from the PDP to check whether the product is available or not.
Shinu Suresh (@shinusuresh)
16 December 2016 at 00:42
Good article. BTW, can you put the schema and solrconfig in public domain too.
Now they are hosted in EPAM Confluence