How to make native SQL coexist with SAP Commerce Persistence. Part II
Intro
In this article, we continue the recent theme of using native SQL in SAP Commerce.
The essentials were thoroughly explained in the first part of the series. Since the technique is controversial and should be used with caution, we highly recommend reading the introductory part before going further with this piece.
You need to be aware of the potential risks, limitations, pitfalls, and caveats of the ORM engine. In this part, we explain those details and touch upon a few aspects you need to take into account when considering native SQL in your SAP Commerce code.
Multi-Tenant Awareness
The first aspect we would like to mention is multitenancy. In SAP Commerce, multitenancy is the ability of the platform to host multiple independent logical instances of the application that run on the same code but are separated at the database level. It allows, for example, hosting online shops for different customers on a single SAP Commerce installation.
You probably already have an extra tenant configured in your SAP Commerce: unit and integration tests are often preconfigured to use a dedicated tenant for automated tests, “junit”. The idea behind this mechanism is to separate test data from the project-specific content. Using it, you can run tests on the exact same codebase as the main application, but against completely isolated data, so that nothing important can be affected by the tests. Another advantage is that the junit tenant needs much less space and a smaller dataset for testing.
To read more about SAP Commerce Cloud Unit Testing, follow this link: https://help.sap.com/viewer/d0224eca81e249cb821f2cdf45a82ace/2005/en-US/8c6e8668866910148fc390638f82bad2.html
Table Names in Multi-Tenant Setup
In order to share the same schema for multiple tenants, SAP Commerce uses prefixes to segregate tenant-specific datasets.
Let’s consider a query that selects an average price from the database using JdbcTemplate:
Double avg = jdbcTemplate.queryForObject("SELECT AVG(p_price) FROM pricerows", Double.class);
In this example, we used a hardcoded table name and column name. What do you think: is this a bad practice? Why?
There are two scenarios when this code may create issues:
- In a multi-tenant setup, this code works only in the master tenant. In all other tenants, including junit, the table names carry prefixes, which our query does not take into account.
- Theoretically, the deployment table name can change too, especially while the system is still under active development and nothing is set in stone yet. It is far from common, but it happens.
To get the deployment table of the current tenant, we use the OOTB TypeService to get the ComposedType instance for the specified type, and then get the table name. The code looks as follows:
ComposedTypeModel ct = getTypeService().getComposedTypeForClass(PriceRowModel.class);
String priceRowsTableName = ct.getTable();
The table name can be used in your native SQL query. Of course, this will require additional database queries, but it can be effectively cached using the built-in cache mechanisms or, on top of them, custom ones.
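To avoid repeating the type-system lookup, the resolved name can be memoized. A minimal caching sketch, assuming a plain ConcurrentHashMap keyed by tenant ID rather than the platform’s region caches (the helper name is illustrative):

private final Map<String, String> priceRowsTablePerTenant = new ConcurrentHashMap<>();

private String getPriceRowsTable()
{
    // The type-system lookup runs once per tenant; subsequent calls hit the map
    final String tenantId = Registry.getCurrentTenant().getTenantID();
    return priceRowsTablePerTenant.computeIfAbsent(tenantId,
            id -> getTypeService().getComposedTypeForClass(PriceRowModel.class).getTable());
}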
A similar approach can be used to avoid hardcoded column names, using the TypeService’s getAttributeDescriptor() method.
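A minimal sketch of the column lookup, assuming the standard price attribute of PriceRow; getDatabaseColumn() returns the physical column name:

final AttributeDescriptorModel priceAttr =
        getTypeService().getAttributeDescriptor(PriceRowModel._TYPECODE, PriceRowModel.PRICE);
final String priceColumn = priceAttr.getDatabaseColumn(); // e.g. "p_price" in the master tenant
// The original query, composed without any hardcoded names:
final Double avg = jdbcTemplate.queryForObject(
        "SELECT AVG(" + priceColumn + ") FROM " + priceRowsTableName, Double.class);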
Summing up: never use hardcoded table and column names. This keeps your system compatible with a multi-tenant setup and ready for changes in the type configuration.
Using Flexible Search In JDBC Manner
Normally, a FlexibleSearch query returns a collection of SAP Commerce models. However, sometimes you may need to extract individual fields or aggregations such as SUM() or COUNT().
Although this approach is documented, not everyone knows it well. Instead of selecting PKs, you can select individual fields and map them to particular types. The mapping is declared with the setResultClassList() method:
final FlexibleSearchQuery q = new FlexibleSearchQuery("SELECT {stringField}, {integerField} FROM {MyItem} WHERE ...");
q.setResultClassList(Arrays.asList(String.class, Integer.class));
...
SearchResult<List> result = flexibleSearchService.search(q);
for (List row : result.getResult())
{
    String stringFieldValue = (String) row.get(0);
    Integer integerFieldValue = (Integer) row.get(1);
    ...
}
Such an approach greatly reduces the overhead produced by the service layer, and it is basically the only way to use aggregation with FlexibleSearch:
final FlexibleSearchQuery q = new FlexibleSearchQuery("SELECT {m.field1}, count({m.field2}) FROM {MyItem AS m} GROUP BY {m.field1}");
q.setResultClassList(Arrays.asList(String.class, Integer.class));
...
SearchResult<List> result = flexibleSearchService.search(q);
for (List row : result.getResult())
{
    String field1 = (String) row.get(0);
    Integer countOfField2 = (Integer) row.get(1);
    ...
}
This way, no service-layer model is created, and all the related overhead is gone. Notice that you can return more than one type in the same select, as long as the result class list matches the selected columns.
(Our colleague, Yegor Zuev, reminded us to mention this approach here. Thank you, Yegor!)
Retrieving Data
In this section, we cover several ways of retrieving data using native SQL. Usually, we operate with data using Models, so, depending on the case, the “raw” data returned by a native query needs to be processed in order to be usable by the service layer.
Retrieving Service-Layer Entities
The SAP Commerce FlexibleSearch service is designed to work with type system entities rather than database tables. It helps to work with complex structures without a deep understanding of how they are stored in the database, which makes the code more readable and maintainable. Even though you can select primitive types using FlexibleSearch, the most common usage is to select models.
Let’s see how you can achieve the same when your query is written using native SQL.
For example, you can retrieve the columns:
List<Map<String, Object>> rows = jdbcTemplate.queryForList("SELECT p_product, p_price FROM pricerows WHERE ...");
if (rows != null && !rows.isEmpty())
{
    for (Map<String, Object> row : rows)
    {
        System.out.println(row);
    }
}
Or the PKs:
Long pkLong = jdbcTemplate.queryForObject("SELECT pk FROM pricerows WHERE ...", Long.class);
The extracted PK is just a unique ID of the item type instance. Of course, the database knows nothing about SAP Commerce entities, so we have two ways to collect all fields in the result set:
- Reformulate the query by adding JOINs (see the sketch right after this list). This approach works well only for simple cases.
- Convert the extracted data into the models via the SAP Commerce Service Layer (modelService).
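As a sketch of the first option, assuming p_product stores the product PK and p_code is the column we want to pull alongside the price (the names are illustrative):

final List<Map<String, Object>> rows = jdbcTemplate.queryForList(
        "SELECT pr.p_price, p.p_code FROM pricerows pr "
                + "JOIN products p ON p.PK = pr.p_product WHERE ...");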
The second option may not be as fast as the first one, but in many cases, it is better in terms of supportability and extensibility:
PriceRowModel priceRow = modelService.get(PK.fromLong(pkLong));
What we see under the hood is a bunch of SQL queries generated by this model service method, namely:
- SELECT pk FROM pricerows WHERE ...
- SELECT * FROM pricerows WHERE PK=?
- SELECT * FROM composedtypes WHERE PK=?
- SELECT * FROM attributedescriptors WHERE EnclosingTypePK=?
- SELECT * FROM composedtypes WHERE PK=?
- SELECT * FROM atomictypes WHERE PK=?
Remember, however, that the platform’s query cache takes care of a query before the persistence layer sends it to the database, which reduces the number of database calls. The tables whose names end with “-types”, as well as attributedescriptors, contain the type configuration needed to map the response from the database into Java classes.
It looks like we are querying the pricerows table for a second time. In reality, how bad this is depends on the use case. If you want to use the data straight away, this is exactly what you want. But what if you don’t want to use it right now, or you only need one or two attributes? What if we want to hit the database only when an attribute is actually used (lazy loading)?
There is a solution with Direct Persistence to create a PriceRowModel without any preloaded data. The Groovy script below demonstrates how it works:
import de.hybris.platform.core.*
import de.hybris.platform.persistence.*
import de.hybris.platform.europe1.model.*
import de.hybris.platform.directpersistence.cache.*
import de.hybris.platform.servicelayer.internal.model.impl.*

def tenant = Registry.getCurrentTenant()
def pool = tenant.getPersistencePool()
def mgr = tenant.getPersistenceManager()

// Fetch only the PK and the version counter (hjmpTS) of a single price row
def resMap = jdbcTemplate.queryForMap('SELECT pk, hjmpTS FROM pricerows LIMIT 1')
def pkLong = resMap['pk']
def hjmpTs = resMap['hjmpTS']
def pk = PK.fromLong(pkLong)

// Resolve the type information for PriceRow
def typeInfoMap = mgr.getPersistenceInfo(PriceRowModel._TYPECODE)
def typePk = typeInfoMap.typePK

// Build an empty SLD data container: no attributes are preloaded
def sldContainer = SLDDataContainer.builder()
        .withPk(pk)
        .withTypePk(typePk)
        .withTypeInfoMap(typeInfoMap)
        .withVersion(hjmpTs)
        .withAttributes([])
        .withLocalizedAttributes([])
        .withProperties([])
        .build()

// Wrap the container into a persistence object and let modelService create the model
def persistentObj = new SLDPersistenceObject(sourceTransformer, sldContainer, sldRelationDAO)
def priceRow = modelService.get(persistentObj)
The resulting priceRow object is an instance of PriceRowModel. The instance doesn’t hold any data yet, which is why the technique is called “lazy loading”: it contains only the information needed to retrieve the data if the need arises. With the first call of any getter method, the item will be physically loaded into the instance.
The script above produces only a single SQL query:
SELECT pk, hjmpTS FROM pricerows limit 1
This approach may be useful in some cases because it reduces the overhead at search time.
Once you call any getter method, the platform starts the initialization of the model. At this point, all the model’s fields are queried and put into the entity cache. Thus, subsequent calls to the model’s getters won’t generate any additional SQL queries, provided that the cache has not expired.
The call will produce a bunch of SQL queries:
SELECT * FROM pricerows WHERE PK='8796093056031'
SELECT * FROM props WHERE ITEMPK='8796093056031'
SELECT item_t0.PK FROM languages item_t0 WHERE (item_t0.TypePkString=8796095447122 )
SELECT * FROM languages WHERE PK IN (8796093087776,8796093218848,8796093186080,8796093513760,8796093415456,8796093120544,8796093546528,8796093153312,8796093317152,8796093448224,8796093579296,8796093382688,8796093251616,8796093284384,8796093480992)
SELECT REALNAME,LANGPK,TYPE1,VALUESTRING1,VALUE1 FROM props WHERE ITEMPK = 8796093055008
SELECT REALNAME,LANGPK,TYPE1,VALUESTRING1,VALUE1 FROM props WHERE ITEMPK = 8796093087776
SELECT REALNAME,LANGPK,TYPE1,VALUESTRING1,VALUE1 FROM props WHERE ITEMPK = 8796093120544
SELECT REALNAME,LANGPK,TYPE1,VALUESTRING1,VALUE1 FROM props WHERE ITEMPK = 8796093153312
(Don’t pay much attention to the many similar-looking SELECTs at the end. They appear because quite a few languages are activated in my setup, one query per language, which is probably not the case for you. On a typical project with only a couple of languages enabled, the list of SELECTs would be shorter.)
There we go, the entire PriceRowModel was fetched at this point.
Updating Data
In this section, you will learn about the hybris cache system. ORM-level caching is what makes data updates challenging, and this is why this topic deserves close attention.
Using the L1 cache for storing JDBC query results
The L1 (“level-1”) cache in Hybris is a mechanism that caches an entity’s state within the current persistence session. It handles entities that belong to a specific session and is cleared once the session is closed. If there are multiple transactions inside one session, entities from all of these transactions are stored.
For instance, in the previous example with the priceRow object, if you call getPrice() a second time, no SQL queries are issued, since the data is already populated into the cached object and can be retrieved from memory much faster. If the cached object changes, Hybris evicts the data for this object from the L1 cache. Hybris also broadcasts a message throughout the cluster, telling other nodes to clean their L1 caches for that entity.
This mechanism is deeply integrated with the Persistence layer.
That being said, the L1 cache is really useful for caching values that depend on an entity’s state. Consider, for example, the available-to-sell amount for a given StockLevel object: it is calculated as the available value minus the reserved value, so the corresponding SQL query result always depends on these two values.
So, instead of calculating this value every time, let’s make it a new field of the StockLevel type. How do we do that? SAP recommends using dynamic attributes for this purpose. The dynamic attributes mechanism is a great tool, but not when you want to query the DB in the handler, because of a major flaw: dynamic attributes don’t support any caching mechanisms. The getter call is delegated to the handler straightaway.
Real-life case
On one of the projects at EPAM, we had an interesting challenge. The product catalog was relatively small (~1500 unique SKUs), but the number of orders was very high (thousands per day), so the probability of two orders containing the same SKUs was rather high. The business logic involved checking stock levels before an order is placed and stock reservations are sent to the warehouses. This created a situation where multiple threads were trying to update the same row in the database. As a result, checking stock levels took up to a minute, and such delays directly affected the user experience.
One of the solutions was to publish an event when a reserved amount changes rather than directly update the field in the DB. This concept is called Inventory Events and is often used to handle such scenarios. Since you don’t have to lock any rows in the DB, the operation runs much faster.
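A hedged sketch of the idea; StockLevelEventModel and its attributes are hypothetical names from a custom extension, not OOTB types:

// Instead of locking and updating the stocklevels row, insert a new event row
final StockLevelEventModel event = modelService.create(StockLevelEventModel.class);
event.setStockLevel(stockLevel);          // the stock level the reservation applies to
event.setValue(Long.valueOf(quantity));   // a positive value means an additional reservation
modelService.save(event);                 // a background job later folds events into "reserved"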
However, because of the delays in stock updates, you can no longer rely solely on the available amount in the hybris stock level, and therefore you should rethink the way you check the actual available amount for a given product. The stock level is not updated immediately anymore: there is a delay between the moment the event is published and the moment it is processed and the reserved amount is updated. Now, when calculating the ATS (Available to Sell) value, you must add the total value of all pending events for a given stock level. Basically, it is a LEFT JOIN query:
SELECT sl.reserved + COALESCE(SUM(sle.value), 0)
FROM stocklevels sl
LEFT JOIN stocklevelevents sle ON sle.stocklevel = sl.pk
WHERE sl.product = ?
GROUP BY sl.reserved
However, in a big, long-running project you cannot simply change every place where the reserved value is used; some of the code is OOTB platform code that you cannot easily change. To address this problem, we decided to turn the “reserved” attribute into a dynamic one, so that every place where it is used automatically yields the correct value.
The OOTB approach would be to use dynamic attributes. A dynamic attribute is one whose getter delegates to a dedicated handler class that executes arbitrary custom code. As we mentioned above, the problem with this approach is that it has no caching mechanisms in place, even though it is supposed to mimic persistent attributes of an entity, which do have caching to prevent unnecessary calls to the DB.
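For reference, a minimal handler sketch; ReservedAttributeHandler is an illustrative name, the JdbcTemplate wiring is assumed to be done via Spring, and the SQL mirrors the LEFT JOIN above:

import org.springframework.jdbc.core.JdbcTemplate;
import de.hybris.platform.ordersplitting.model.StockLevelModel;
import de.hybris.platform.servicelayer.model.attribute.AbstractDynamicAttributeHandler;

public class ReservedAttributeHandler extends AbstractDynamicAttributeHandler<Long, StockLevelModel>
{
    private JdbcTemplate jdbcTemplate; // injected via Spring

    @Override
    public Long get(final StockLevelModel stockLevel)
    {
        // Recomputed on every getter call: nothing is cached at this level
        return jdbcTemplate.queryForObject(
                "SELECT sl.reserved + COALESCE(SUM(sle.value), 0) FROM stocklevels sl "
                        + "LEFT JOIN stocklevelevents sle ON sle.stocklevel = sl.pk "
                        + "WHERE sl.pk = ? GROUP BY sl.reserved",
                Long.class, stockLevel.getPk().getLongValue());
    }
}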
We needed to invent or reuse a caching mechanism that would cache the calculated reserved value but invalidate it as soon as the underlying data changed. The best candidate was the mechanism used for normal persistent attributes: it caches values, invalidates them when they change, and propagates those changes throughout the cluster. You just need to find a way to make use of it.
Coming back to our L1 cache. To use it in a convenient way, we need to understand how models are composed. At a high level, each model is a wrapper around a persistence object. Persistence objects implement low-level operations such as querying the DB, fetching relations, and so on. The model holds the L1 cache and uses it to store the data it receives from the persistence object. This is the actual reason why dynamic attributes aren’t cached: they are implemented at the model level, so from the model’s perspective no data was ever received, and there is nothing to cache. Therefore, to utilize the L1 cache, the query must be executed from the persistence object.
If you have worked with older versions of Hybris, you are probably already familiar with one type of persistence object: Jalo objects. In very old versions of Hybris, Jalo objects were the only available way to work with the DB. Later, the Service Layer concept was introduced, along with the ModelService. Models provide a more convenient and high-level interface for managing persistent data, while still using Jalo objects under the hood.
The second type of persistence object is SLD persistence, where SLD stands for ServiceLayer Direct. We used SLD in the earlier example with the lazy-loaded model.
Custom L1-cached values with legacy (Jalo) persistence
The first and easiest method of caching computed values in the L1 cache is to compute them inside the Jalo class (to make sure your models are backed by Jalo classes, legacy persistence must be enabled). Hybris Commerce still provides a convenient way to customize the Jalo classes of your models. As a refresher on how things work: in the items.xml file, the type declaration specifies the Jalo class that will be generated for that model:
<itemtype code="StockLevel" jaloclass="de.hybris.platform.ordersplitting.jalo.StockLevel" extends="GenericItem" autocreate="true" generate="true">
Then, inside the de.hybris.platform.ordersplitting.jalo.StockLevel class, you can either override the getAttribute() method or directly override the getter for your custom attribute. The attribute, of course, must be declared in items.xml first.
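For instance, the declaration of the custom attribute used in the next example might look like this (a sketch; persistence type="jalo" keeps the attribute non-persistent, with its value served by the Jalo layer):

<attribute qualifier="maxOversellingAmount" type="java.lang.Long" autocreate="true" generate="true">
    <persistence type="jalo"/>
</attribute>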
Let’s say our stock levels allow overselling, and to calculate the maximum overselling value for a given stock level we need to join another table that contains the replenishment data. The modified StockLevel Jalo class may look like this:
@Override
public Object getAttribute(SessionContext ctx, String qualifier) throws JaloInvalidParameterException, JaloSecurityException
{
    if ("maxOversellingAmount".equals(qualifier))
    {
        return jdbcTemplate.queryForObject(
                "SELECT AVG(rp.p_statistical) - AVG(rp.p_contingency) FROM replenishmentstats rp WHERE rp.p_productcode = ?",
                Long.class, this.getProductCode());
    }
    return super.getAttribute(ctx, qualifier);
}
And that’s it. Now consecutive calls to the getMaxOversellingAmount() getter on the StockLevelModel class produce only a single SQL query, and the getAttribute() method is called only once.
Custom L1-cached values with direct persistence
If you have legacy persistence disabled (which, sadly, is still not recommended by SAP), all your models are backed by SLD persistence objects. Unfortunately, there is no easy way to create a custom attribute that would be cached in the L1 cache. It is, of course, doable, but you would have to extend DefaultSLDItemDAO and plug a custom AttributesExtractor into it. Surely it can be done in a nice and reusable way, but it is not a simple task.
Invalidating the L1 cache
When working with the database directly, you must be aware of cached models. Read-only operations are relatively safe, but even they can introduce issues, especially if you cache their results in the L1 cache, as described in the previous section.
Updating the database directly requires that you invalidate the corresponding cache.
Hybris Platform provides a convenient way to invalidate a single item: the invalidateCache() method of the de.hybris.platform.util.Utilities class. It handles the region cache update, the L1 cache reload, and the cluster multicast. In the example with calculating a maximum overselling amount for a stock level, you would have to invalidate the stock level object whenever the related replenishment statistics are updated.
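A minimal sketch of that flow; the UPDATE statement, newStatisticalValue and stockLevelPk are illustrative:

jdbcTemplate.update(
        "UPDATE replenishmentstats SET p_statistical = ? WHERE p_productcode = ?",
        newStatisticalValue, productCode);
// Drop the stale cached state so the next getter recalculates maxOversellingAmount
Utilities.invalidateCache(stockLevelPk);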
Bulk Updates
One of the powerful ways to utilize native SQL queries in your project is bulk updates. In the OOTB platform, you have only two options: ImpEx in batch mode, or custom code (e.g., a Groovy script) that operates on models. Both have a huge overhead for simple operations such as updating a single field. Instead, you could update multiple rows with a single UPDATE query.
Since ImpEx and ModelService conveniently handle cache invalidation for you, with a native query you have to do it yourself. The easiest way is to call the invalidateCache() method of the de.hybris.platform.util.Utilities class for each updated PK. If the number of updated entities is very large, you may want to consider invalidating the entire cache region instead, since it is a lot faster than iterating over all the updated objects.
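A sketch of the per-PK variant, reusing the pricerows example from above (the WHERE clause and catalogVersionPk are illustrative):

// Collect the PKs the update is going to touch
final List<Long> pks = jdbcTemplate.queryForList(
        "SELECT PK FROM pricerows WHERE p_catalogversion = ?", Long.class, catalogVersionPk);

// One UPDATE instead of thousands of model saves
jdbcTemplate.update(
        "UPDATE pricerows SET p_price = p_price * 1.1 WHERE p_catalogversion = ?", catalogVersionPk);

// Invalidate each updated item so stale values are evicted cluster-wide
for (final Long pk : pks)
{
    Utilities.invalidateCache(PK.fromLong(pk.longValue()));
}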
A simple example of invalidating a cache region looks like this:
def regions = cacheRegionProvider.getRegionByName('myCacheRegion')
regions.each { region -> region.clearCache() }
(Don’t forget to run this script on each node if you are in a clustered environment.)
All examples in this article were tested on SAP Commerce Cloud v1905 patch 8.