Using Native SQL in SAP Commerce Cloud (Hybris)


One of the core components of the SAP Commerce Cloud platform is the Type System. It provides a way of fetching and updating data in the database without executing SQL directly. This system was built to abstract the database away from the business objects and logic, making the solution database-agnostic and the codebase clean and readable. The layer responsible for that is called an object-relational mapping (ORM); it is made up of entities that provide access to data and keep business rules within themselves.

In SAP Commerce, the ORM is somewhat decentralized: its role is played by several components. The Service Layer, the Jalo layer, Service Layer Direct (SLD), and the JDBC driver sit between the business logic and the database. By default, ImpEx and FlexibleSearch also rely on components of the Jalo layer and use JDBC for database access.

In this article, we are going to discuss techniques that are normally considered controversial and inapplicable: how to bypass the ORM and work directly with database tables, columns, and rows rather than with item types.

The Commerce Cloud ORM layer was not designed for efficiency but rather for faster and simpler application development and for better supportability, simplicity, extensibility, and robustness of the data models. It is a real productivity booster. This is achieved, however, at the cost of capability: the ORM exposes only a subset of relational database features. Many operations, such as bulk updates or complex joins, are not implemented in the ORM, and there is no easy workaround. If they were implemented, the added complexity would raise concerns about reliability.

For example, FlexibleSearch doesn't support enums and collections, some nested query types are not supported, and you can't use a SELECT inside a JOIN. There are other known limitations that developers and architects need to be aware of. In the majority of cases there are workarounds, but the authors of this article have encountered situations where direct database access was a decisive booster for the productivity and efficiency of the code.
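For context, this is roughly what a lookup looks like when it stays within the ORM. A minimal sketch, assuming an injected flexibleSearchService bean and a made-up product code:

import de.hybris.platform.core.model.product.ProductModel;
import de.hybris.platform.servicelayer.search.FlexibleSearchQuery;
import de.hybris.platform.servicelayer.search.SearchResult;

// The query is written against types and attributes, not tables and columns;
// the platform translates it into vendor-specific SQL.
final FlexibleSearchQuery fsQuery = new FlexibleSearchQuery(
        "SELECT {pk} FROM {Product} WHERE {code} = ?code");
fsQuery.addQueryParameter("code", "12345"); // made-up product code
final SearchResult<ProductModel> searchResult = flexibleSearchService.search(fsQuery);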

 

 

What is not good 🙁 vs. what is great!

  • Not good: SQL generation is not manageable; the process is fully automatic and invisible to developers.
    Great: "It just works for years without any issues." Besides, going native means an overhead for the team in learning how to use a particular database, and far from all SAP Commerce Cloud developers are good in both Oracle and MySQL.

  • Not good: Bulk operations are not supported; you have to iterate over millions of items to change one simple attribute for all of them (ImpEx's batchmode is internally a loop over fetched items). See the sketch after this list.
    Great: The risk of serious data loss is lower when bulk updates/removes are not easy to implement.

  • Not good: Some vendor-specific database features, such as table-valued parameters, hierarchy IDs, geospatial data types, and full-text search, are not supported.
    Great: The application stays database-agnostic, which makes changing the DBMS engine in the middle of development or shortly before going live feasible (but still certainly not recommended).

  • And so on and so forth…
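To make the bulk-operations row above concrete, here is a rough sketch of the difference. It assumes injected flexibleSearchService, modelService, and jdbcTemplate beans; the stocklevels table and p_available column names are assumptions about the generated schema, not verified values:

import de.hybris.platform.ordersplitting.model.StockLevelModel;
import de.hybris.platform.servicelayer.search.FlexibleSearchQuery;
import de.hybris.platform.servicelayer.search.SearchResult;

// The ORM way: every affected item is loaded, modified and saved one by one.
final SearchResult<StockLevelModel> searchResult = flexibleSearchService.search(
        new FlexibleSearchQuery("SELECT {pk} FROM {StockLevel}"));
for (final StockLevelModel stockLevel : searchResult.getResult())
{
    stockLevel.setAvailable(0);      // one simple attribute change...
    modelService.save(stockLevel);   // ...paid for with one write (plus cache and interceptor work) per item
}

// The native SQL way: a single statement over the whole table,
// bypassing the cache, search restrictions and interceptors (discussed later in this article).
jdbcTemplate.update("UPDATE stocklevels SET p_available = ?", Integer.valueOf(0));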

 

However, for many junior and mid-level developers, the ORM is a convenient substitute for knowing SQL and the underlying database engine concepts. If the documentation says "use this and that, and you will get what you want", only a few actually know what SQL is generated under the hood.

Why not use direct SQL statements in the code to bypass the ORM and access the database with plain SQL? 

It is generally believed that such an approach is not a good practice for almost any solution built with the SAP Commerce Cloud platform. The best practice says that the system should not use native SQL to access the database. It is also believed that a solution relying on native SQL has a good chance of not passing SAP approval and the Commerce Cloud readiness check.

At the same time, the ORM layer, if used improperly, brings a lot of complexity, performance drawbacks, and overall leakiness. The resulting issues are hard to detect and time-consuming to isolate and fix, and the challenges grow for tasks that are ill-suited to the mechanisms working under the hood.

There are a number of scenarios in which using the native SQL instead of the vendor recommended ORM layer is reasonable.

When facing heavy load, it is important to keep fine-grained control over the way your data is retrieved from and stored in the database. A native SQL solution might be faster, or much faster, than the same solution built with the persistence layer. Just "faster" is not enough of a reason to cross over, though; this option should be a kind of last resort when none of the "normal" ways demonstrates the desired performance.

In most cases, the overhead involved in using an ORM is negligible and is compensated by faster development and cleaner code. The cache hit rate is often around 95%, which is especially important for operations performed repeatedly at different levels and in different classes.

Before applying the recommendations to your code, you have to weigh the pros and cons, thoroughly stress test the system and identify the root cause of the bottleneck. Of course, we highly recommend running baseline tests before doing any optimizations.

In this article, we explain how to use native SQL in Commerce Cloud in detail. 

Learn from the OOTB code

SAP Commerce Cloud comes with tons of code you can learn from. By looking at the source code of the out-of-the-box modules and using it as a reference, you'll be able to understand how to address your particular problem.

A good example of using a native SQL approach in SAP Commerce is an out-of-the-box StockLevelService. This service is designed for manipulating the product inventory data in Commerce Cloud.

The complexity arises from the fact that product inventory data tend to change frequently. Product availability is often replicated from the inventory source to the Commerce Cloud database, and often there is a requirement to do this replication with the least possible delay. 

In Commerce Cloud, this service was designed to keep the system stable under intensive resource consumption, which happens when updates arrive at a high rate.

What we see in the code is that the stock level service uses native SQL and a direct JDBC connection rather than the ORM (the model service layer). For example, the updateActualAmount method of the DefaultStockService class updates the StockLevel record via JdbcTemplate.update with raw SQL as a parameter.

This shows that sometimes you may break the rules. “To survive”.

How to Execute Raw SQL

Tackling the Writes

SAP Hybris ships with Spring's JdbcTemplate, which can be used to access the database directly:

jdbcTemplate.update(query, params) 

The jdbcTemplate can be injected into the DAO bean in the following manner:

<bean id="someDao" class="someClass">
<property name="jdbcTemplate">
<bean class="org.springframework.jdbc.core.JdbcTemplate">
<property name="dataSource" ref="dataSource"/>
</bean>
</property>
</bean>
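With that configuration in place, the DAO can issue parameterized statements through the injected template. A minimal sketch; the class name, table, and column are illustrative placeholders matching the someDao/someClass names above, not platform code:

import org.springframework.jdbc.core.JdbcTemplate;

public class SomeDao
{
    private JdbcTemplate jdbcTemplate;

    // called by Spring via the <property name="jdbcTemplate"> injection shown above
    public void setJdbcTemplate(final JdbcTemplate jdbcTemplate)
    {
        this.jdbcTemplate = jdbcTemplate;
    }

    // updates a single column of a single row identified by its PK;
    // the table and column names are placeholders
    public int updateDescription(final long pk, final String description)
    {
        return jdbcTemplate.update(
                "UPDATE someTable SET p_description = ? WHERE PK = ?",
                description, Long.valueOf(pk));
    }
}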

Alternatively, you can use the DataSource interface available in the current tenant instance:

final Connection conn = Registry.getCurrentTenant().getDataSource().getConnection();
final PreparedStatement pstmt = conn.prepareStatement("UPDATE ...");
pstmt.setString(1, "first parameter value");
pstmt.setString(2, "second parameter value");
pstmt.executeUpdate();
// remember to close pstmt and conn in a finally block (see the notes below)

 

Tackling the Reads

For fetching data, you can use jdbcTemplate.query: 

query = "select * from languages";
jdbcTemplate.query(
    query,
    (ResultSetExtractor) new ResultSetExtractor<Map<String, String>>(){
 public Map<String, String> extractData(ResultSet rs) 
   throws SQLException, DataAccessException {
  //implementation                        
 }
});
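If you only need a list of simple values rather than a custom structure, Spring's RowMapper (here as a lambda) keeps the same idea shorter. A minimal sketch that reads the primary keys of all rows in the languages table:

// Maps each row to a single value; Spring iterates over the ResultSet for you.
final List<Long> languagePks = jdbcTemplate.query(
        "SELECT PK FROM languages",
        (rs, rowNum) -> Long.valueOf(rs.getLong("PK")));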

You can also use the de.hybris.platform.core.DataSource interface.

final Connection conn = Registry.getCurrentTenant().getDataSource().getConnection();
final PreparedStatement pstmt = conn.prepareStatement("SELECT ...");
pstmt.setString(1, "first parameter value");
pstmt.setString(2, "second parameter value");
final ResultSet rs = pstmt.executeQuery();
// iterate over rs, then close rs, pstmt and conn in a finally block

For all examples where a new connection is involved, you need to remember that

  • the connection needs to be closed in the finally block (or via try-with-resources, as sketched below), and
  • the embedded HSQL DB (which is often used for local environments) doesn’t allow multiple connections.
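A minimal sketch of the resource handling mentioned in the first bullet, using try-with-resources so that the statement and the connection are closed even if the query fails (the snippet has to live in a method that declares or handles SQLException):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import de.hybris.platform.core.Registry;

// try-with-resources closes rs, pstmt and conn in reverse order, even when an exception is thrown
try (final Connection conn = Registry.getCurrentTenant().getDataSource().getConnection();
     final PreparedStatement pstmt = conn.prepareStatement("SELECT PK FROM languages"))
{
    try (final ResultSet rs = pstmt.executeQuery())
    {
        while (rs.next())
        {
            final long pk = rs.getLong("PK"); // process the row
        }
    }
}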

What Complicates Things is Caching… 

There’s a famous saying that cache invalidation is one of the two hardest problems in computer science. The topic we are discussing today is tightly connected with a cache subsystem.

When the database is modified directly, the cache becomes an important element we can't ignore. For fetching data, this cache is used very intensively to avoid overwhelming the database server: in SAP Commerce, FlexibleSearch fetches data both from the database and from the in-memory cache, which is a kind of relief for a developer. When a developer decides to call a service, they just include it in the code, and "it just works". Under the hood, this service may produce hundreds or thousands of FlexibleSearch statements, but the great majority end up hitting the cache.

For example, in SAP Commerce Cloud, there is an item type for handling UI languages, Language. The system queries this type thousands of times per request. It is relatively expensive to fetch the language information the first time, but subsequent reads of the same data perform much faster because the relatively slow database is no longer involved.

If you change data in the database directly, bypassing the ORM, the cache subsystem may deliver the old value to the requesting code even though the new value is already in the database. As a result, it is hard to predict which version of an item the SAP Commerce platform will use when it is requested. Moreover, it is very likely that, within the same thread/request, different parts of the system will get different versions of the item, and there is a risk that two subsequent calls will deliver different values.

So, technically, direct database modifications (bypassing the ORM) are "conditionally acceptable" if they are followed by a cache invalidation or if the cache is disabled for the item.

In the mentioned DefaultStockService, the updateActualStockLevel method contains the cache invalidation code:

jdbcTemplate.update(query, new Object[]{theNewAmount, stockLevelPkAsLongValue});
// invalidate the cached entries for this PK so subsequent reads don't get stale data
de.hybris.platform.util.Utilities.invalidateCache(stockLevelPk);
// reload the model so the in-memory object reflects the new database state
this.getModelService().refresh(stockLevel);

… And FlexibleSearch Restrictions 

Above, we considered data modifications using direct SQL. At first glance, querying data doesn't introduce any serious complications, but that is not quite true.

In SAP Commerce Cloud's ORM, searching for types and items is provided by a query language called FlexibleSearch. Its syntax is similar to SQL, and the translation is fairly direct: the system transforms a FlexibleSearch query into a SQL query, taking into account the specifics of the concrete RDBMS, the database schema, and its content.

What is important here is that the ORM automatically adds restrictions to FlexibleSearch queries to narrow the search. This mechanism is transparent: the search results are "automatically" limited according to these rules, system-wide.

This mechanism is called "FlexibleSearch restrictions". A restriction is a set of rules added to FlexibleSearch queries to limit the search results based on certain conditions. Raw SQL, in contrast, bypasses these restrictions entirely.
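A rough illustration of the difference: the FlexibleSearch call below is subject to any active restrictions for the current session user, while the raw SQL call reads the table as it is. It assumes injected flexibleSearchService and jdbcTemplate beans; the products table name reflects the default Product deployment, but treat it as an assumption:

// Goes through the ORM: search restrictions for the current session user are appended automatically.
final SearchResult<ProductModel> restricted = flexibleSearchService.search(
        new FlexibleSearchQuery("SELECT {pk} FROM {Product}"));

// Goes straight to the database: no restrictions, no type-system awareness.
final List<Map<String, Object>> unrestricted =
        jdbcTemplate.queryForList("SELECT PK FROM products");

// If restrictions are defined for Product, the two result sets may differ in size.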

… And Type Access Control

The Commerce Cloud’s ORM respects the type access control settings. Often, these settings help developers to restrict access to the objects for a user group or specific users. Direct database access will bypass this mechanism. 

… And Interceptors

Interceptors are callbacks used at different steps of the SAP Commerce model’s life cycle. These callbacks won’t be called if changes are performed directly via JDBC. 
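For instance, a validate interceptor like the sketch below is invoked on every modelService.save(), but a jdbcTemplate.update() against the same table will never trigger it. The interceptor itself is a made-up example, not platform code:

import de.hybris.platform.ordersplitting.model.StockLevelModel;
import de.hybris.platform.servicelayer.interceptor.InterceptorContext;
import de.hybris.platform.servicelayer.interceptor.InterceptorException;
import de.hybris.platform.servicelayer.interceptor.ValidateInterceptor;

// Runs on every save through the model service; direct JDBC writes bypass it completely.
public class StockLevelAmountValidator implements ValidateInterceptor<StockLevelModel>
{
    @Override
    public void onValidate(final StockLevelModel stockLevel, final InterceptorContext ctx) throws InterceptorException
    {
        if (stockLevel.getAvailable() < 0)
        {
            throw new InterceptorException("Available amount must not be negative");
        }
    }
}

Such interceptors are registered for a type via an InterceptorMapping bean in the Spring configuration; none of that machinery is involved when the row is changed with plain SQL.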

We don't recommend using direct database access for the cloud version of SAP Commerce Cloud. The cloud version uses Azure SQL, and everything described above will technically work with this service, but the lack of access to the cluster may create hard-to-detect-and-resolve issues with data consistency and reliability.

Read more: In SAP Commerce Cloud, almost all tables are created by the persistence layer during the initialization or update phase using the XML type definitions. Knowing their structure, the field assignments, and how to interpret the values is essential for a better understanding of the processes. We recommend reading the article devoted to this topic: https://hybrismart.com/2018/05/06/explaining-hybris-database-structure-2/

In the next part, we are going to dive deep into the technical details about how to make native SQL queries co-exist with the SAP Commerce ORM engine.
