Apache SOLR 6 with SAP hybris 6


Situation

As of today, SAP hybris officially supports Apache SOLR 5.3, a version of the search engine released in August 2015. SOLR 6 was released in April 2016. It adds a number of new features, including (in order of importance for hybris projects):
  • Cross DC replication
    • Accommodate 2 or more data centers
    • Active/passive disaster recovery
    • Support limited bandwidth links
    • Eventually consistent passive cluster
    • Scalable: no SPoF and/or bottleneck
    • Peer cluster can have different replication factors
    • Asynchronous updates, no penalty for indexing operations and burst indexing
    • Push operations for low latency replication
    • Low overhead — uses existing transaction logs
    • Leader-to-leader communication ensures an update is sent only once to peer cluster
  • Graph traversal queries (for example, fetch all upline categories of the current category)
  • Parallel SQL execution in SolrCloud (you can work with SOLR documents using SQL queries/JDBC)
  • and other features
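
Two of the features above are driven by plain strings, so they are easy to try out. The sketch below assembles a graph traversal query for the upline-category case and a connection URL for the Solr JDBC driver. The field names `code` and `superCategories`, the collection name and the ZooKeeper address are assumptions made for illustration; they are not taken from the hybris schema.

```java
// Sketch only: field names, collection name and ZooKeeper address are assumptions.
final class Solr6QueryStrings
{
    /** Builds a graph traversal query that walks from one category up through its parents. */
    static String uplineCategoriesQuery(final String categoryCode)
    {
        // from = field that identifies a node, to = field that holds the outgoing edges
        // (here: the codes of the parent categories).
        // returnRoot=false leaves the starting category out of the result.
        return "{!graph from=code to=superCategories returnRoot=false}code:" + categoryCode;
    }

    /** Builds a connection URL for the Solr JDBC driver (Parallel SQL requires SolrCloud). */
    static String jdbcUrl(final String zkHost, final String collection)
    {
        return "jdbc:solr://" + zkHost + "?collection=" + collection;
    }

    public static void main(final String[] args)
    {
        System.out.println(uplineCategoriesQuery("sneakers"));
        System.out.println(jdbcUrl("localhost:9983", "products"));
    }
}
```

The direction of `from` and `to` decides whether the traversal goes up or down the category tree, so it is worth verifying against a small test index first.
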
[!] SOLR 6 is not ready to be used with hybris 6 in a production environment, because it has not been tested and approved by SAP yet. However, there are situations where SOLR 6 is a good choice.

Complexity

Replacing SOLR 5.3 with SOLR 6.0 is not enough to make hybris work with the new version. Indexing and search fail with numerous exceptions.

Challenge

To make hybris 6 and SOLR 6 work together, mainly for educational purposes.

Solution

Technical details

Configuration

Generally, the SOLR 5.3 configuration can be reused in SOLR 6.0. Copy the hybris configset into the SOLR 6 installation:
hybris\config\solr\instances\default\configsets\default\conf ->
      solr-6.0.1\server\solr\configsets\default\conf
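
The copy can be done by hand; for repeatable setups it can also be scripted. The following is a minimal sketch that copies one directory tree into another using only the JDK. The class and method names are made up for this example, and the source and target directories are passed in as arguments rather than hard-coded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Sketch only: a generic recursive copy, not a hybris or SOLR tool.
final class ConfigSetCopy
{
    /** Recursively copies sourceDir into targetDir, overwriting files that already exist. */
    static void copyTree(final Path sourceDir, final Path targetDir) throws IOException
    {
        // Files.walk visits a directory before its entries, so parents exist before files are copied.
        try (Stream<Path> paths = Files.walk(sourceDir))
        {
            paths.forEach(source -> {
                final Path target = targetDir.resolve(sourceDir.relativize(source).toString());
                try
                {
                    if (Files.isDirectory(source))
                    {
                        Files.createDirectories(target);
                    }
                    else
                    {
                        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
                catch (final IOException e)
                {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(final String[] args) throws IOException
    {
        if (args.length != 2)
        {
            System.err.println("usage: ConfigSetCopy <hybris conf dir> <solr 6 conf dir>");
            return;
        }
        copyTree(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```
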
The following changes should be made:
  • The TF/IDF scoring classes for different types will not work in SOLR 6, so comment out these lines in schema.xml:
    <similarity class="de.hybris.platform.lucene.search.similarities.FixedTFIDFSimilarityFactory"/>
  • The hybris RestManager storage will not work in SOLR 6.0. Comment out these lines in solrconfig.xml:
    <restManager>
    <str name="storageIO">de.hybris.platform.solr.rest.IndexAwareStorageIO</str>
    </restManager>
  • Update the libraries: replace lucene-*-5.3.* with lucene-*-6.0*, and replace solr-core-5.3.0.jar and solr-solrj-5.3.0.jar with solr-core-6.0.1.jar and solr-solrj-6.0.1.jar.
  • Replace MultiMaxScoreQParser with the new version (see below).
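
After swapping the libraries it is easy to leave a 5.3 jar behind, which tends to surface later as class loading errors. A small check like the one below can flag leftovers. This is a sketch under assumptions: the class name and the pattern are made up for this example, and the list of file names has to come from whichever lib directory holds the jars.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: flags jar names that still belong to the 5.3 line.
final class StaleJarCheck
{
    // Matches lucene-*-5.3.*.jar, solr-core-5.3.*.jar and solr-solrj-5.3.*.jar.
    private static final Pattern OLD_JAR = Pattern.compile("(lucene-.+|solr-core|solr-solrj)-5\\.3\\..*\\.jar");

    /** Returns the names that still point to a 5.3 library, in their original order. */
    static List<String> findStale(final List<String> jarNames)
    {
        final List<String> stale = new ArrayList<>();
        for (final String name : jarNames)
        {
            if (OLD_JAR.matcher(name).matches())
            {
                stale.add(name);
            }
        }
        return stale;
    }

    public static void main(final String[] args)
    {
        // Pass the jar file names as arguments; any output means there are leftovers.
        for (final String name : findStale(java.util.Arrays.asList(args)))
        {
            System.out.println("still on 5.3: " + name);
        }
    }
}
```
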

MultiMaxScoreQParser

package de.hybris.platform.solr.search;

import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.LuceneQParser;
import org.apache.solr.search.SyntaxError;

import java.util.ArrayList;
import java.util.List;

/**
 * Rewrites the optional clauses of a parsed boolean query into
 * DisjunctionMaxQuery instances, using the "tie" parameter as tie breaker.
 */
public class MultiMaxScoreQParser extends LuceneQParser
{
    // Tie breaker passed to every DisjunctionMaxQuery (0 = use the maximum score only)
    float tie = 0.0f;

    public MultiMaxScoreQParser(final String qstr, final SolrParams localParams, final SolrParams params,
            final SolrQueryRequest req)
    {
        super(qstr, localParams, params, req);

        if (getParam("tie") != null)
        {
            tie = Float.parseFloat(getParam("tie"));
        }
    }

    @Override
    public Query parse() throws SyntaxError
    {
        final Query q = super.parse();

        // Only boolean queries are rewritten; everything else is returned as parsed
        if (!(q instanceof BooleanQuery))
        {
            return q;
        }

        final BooleanQuery obq = (BooleanQuery) q;
        final BooleanQuery.Builder newq = new BooleanQuery.Builder();

        // Optional top-level clauses that are not boolean queries themselves.
        // They are collected first and wrapped at the end, because a
        // DisjunctionMaxQuery cannot be extended after construction in Lucene 6.
        final List<Query> topLevelDisjuncts = new ArrayList<>();

        for (final BooleanClause clause : obq.clauses())
        {
            if (clause.isProhibited() || clause.isRequired())
            {
                // Required and prohibited clauses are kept unchanged
                newq.add(clause);
            }
            else
            {
                final Query subQuery = clause.getQuery();
                if (!(subQuery instanceof BooleanQuery))
                {
                    topLevelDisjuncts.add(subQuery);
                }
                else
                {
                    // A nested boolean query becomes its own DisjunctionMaxQuery
                    final List<Query> queries = new ArrayList<>();
                    for (final BooleanClause subQueryClause : ((BooleanQuery) subQuery).clauses())
                    {
                        queries.add(subQueryClause.getQuery());
                    }
                    final DisjunctionMaxQuery subDmq = new DisjunctionMaxQuery(queries, tie);
                    newq.add(subDmq, BooleanClause.Occur.SHOULD);
                }
            }
        }

        if (!topLevelDisjuncts.isEmpty())
        {
            newq.add(new DisjunctionMaxQuery(topLevelDisjuncts, tie), BooleanClause.Occur.SHOULD);
        }

        final BooleanQuery result = newq.build();

        // TODO: propagate boosting
        // result.setBoost(obq.getBoost());

        return result;
    }
}
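
The parser above wraps optional clauses in DisjunctionMaxQuery instances. Such a query scores a document with the best sub-score plus `tie` times the sum of the remaining sub-scores. The stand-alone sketch below illustrates that formula without depending on Lucene; the class and method names are made up for this example.

```java
// Sketch only: reproduces the disjunction-max scoring formula with plain numbers.
final class DisMaxScoreSketch
{
    /** Best sub-score plus tie times the sum of all other sub-scores. */
    static double disMaxScore(final double[] subScores, final double tie)
    {
        double max = 0.0;
        double sum = 0.0;
        for (final double score : subScores)
        {
            sum += score;
            max = Math.max(max, score);
        }
        return max + tie * (sum - max);
    }

    public static void main(final String[] args)
    {
        final double[] scores = { 1.0, 0.5, 0.25 };
        System.out.println(disMaxScore(scores, 0.0)); // only the best clause counts
        System.out.println(disMaxScore(scores, 0.5)); // the other clauses contribute half of their scores
    }
}
```

With the default `tie` of 0 only the best-matching clause contributes, which is why a document matching in several fields is not ranked higher just for that.
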

Known issues and limitations

  • Scoring may be incorrect. I have not found any evidence of this, but this part of the code was not ported from SOLR 5.3 to SOLR 6.
  • Index Aware Storage is responsible for storing the SOLR configuration managed by hybris, such as synonyms and stopwords. It is not yet implemented in this solution.

Any questions?

Contact me privately using the form below or leave a comment on this article.

© Rauf Aliev, June 2016