Behind the Scenes of the New SAP Commerce Cloud: Architecture details
This article continues the series “Takeaways from SAP CX LIVE Barcelona”. This time I share my notes and thoughts based on the great presentation given by Axel Großmann, an enterprise architect for the New Commerce Cloud.
The slides below, and my comments between them, come from the conference. The PDF version is available to participants and, as noted on the slides, has public status. The commentary between the slides is a compilation of what I took away from the session, the slides, and the author’s notes on the slides.
What is Commerce Cloud?
To avoid any confusion about terminology, let’s define what SAP Commerce Cloud is today. We know that it has been the new name for SAP Hybris Commerce since this summer. Previously, however, the cloud offering was a deployment or licensing option, not part of the e-commerce platform.
According to the new vision, SAP Commerce Cloud is not only a software package but also a cloud solution involving many components, only one of which is good old Hybris Commerce, now rebranded. The platform has been renamed not only for marketing reasons but also because of a significant technology shift: the SAP products are now braided together and woven into the cloud software. You can still pick only the platform and host it traditionally, and in some cases that may even be reasonable, but it is getting more and more obvious that for large businesses the “traditional way” is no longer an option.
Another source of confusion is that two different offerings have been named SAP Commerce Cloud. The first is CCv1, which is no longer relevant today. The second is CCv2, which is the topic of this article.
CCv1 refers to the offering SAP had four or so years ago. That was the first attempt of SAP Hybris to go to the cloud. Customers owned the build pipeline and SAP owned the deployment pipeline; infrastructure was provisioned and managed by SAP staff only, with no automation and no self-service. Even the smallest change required filling out a form and waiting for hours or days. I have a CCv1 project in my portfolio, so this is firsthand knowledge. We developed and launched it on time, but maintenance was a nightmare. This offering is no longer relevant today, and we are all happy to forget it.
For the new edition of this service (CCv2), things look much, much better. The process is fully automated, and you don’t need any assistance for the key operations. SAP now owns both the build and deploy pipelines; infrastructure provisioning, build, and deployment are automated; the system is integrated with a Git repository; and there is a powerful self-service portal.
Unlike the first version, CCv2 runs completely on top of the public cloud. The key components or layers are in a sub-cloud. Technically, CCv2 is on Azure today, but the strategy is to be multi-cloud, so Google and AWS are on the roadmap.
At the heart of the Commerce Cloud offering, there is a platform core, which is very similar to the “on-premise” version. The cloud version needs to be cloud-ready because it is supposed to run in containerized form. This approach imposes some constraints, such as the image immutability principle (containerized applications are meant to be immutable, and once built are not expected to change between different environments) and the process disposability principle (containers need to be as ephemeral as possible and ready to be replaced by another container instance at any point in time). In the cloud version, these issues have been addressed, but the details are still hidden from us.
This article uncovers some of the ideas and philosophy underlying CCv2 as a cloud offering.
Axel Großmann highlighted the following key paradigms of the New Commerce Cloud:
- Self-Service. Central Management Portal and Common tasks without tickets.
- High Degree of Automation. Environment setup, deployment, backup and restore.
- Standard Facilities. Central logging, maintenance controls. Performance analysis and scaling are not available yet, but are planned for 2019.
- Standard Build Pipeline. Build, test and release within Commerce Cloud. The system operates with container images.
- Pre-defined Commerce Setup. Pre-defined cluster roles, one database and one media storage.
- Standard Cluster Deployment. Kubernetes operator, Apache Ingress and non-interactive initialization/update.
There are two architectures:
- Platform. Contains centralized components which are common for all customers
- Subscription. Contains customer project specific components. The configuration of these is under control of the customer.
Main platform component and services
- Automation Engine
Main subscription components and services
- Azure services and API, namely
- External services
- CatchPoint monitoring (availability)
- CDN. It is not available yet, but it is on the roadmap.
- Open source solutions
- Metrics. It is not available yet, but it is on the roadmap.
The cluster software is not very flexible, so if you want to use Nginx, for example, as a load balancer, you might find it hard to persuade SAP to install it. However, the current stack is pretty good and based on battle-tested, enterprise-level components.
On the database side, SQL Azure is used. For now, it fits well because it is a cloud-native database as a service: you subscribe to it and select the desired performance tier. Azure Blob Storage is used for media storage, and Azure Container Registry is used for the CCv2 Docker images.
So all you need to do is specify the branch or tag of the source code and the target environment. The automation engine then creates a build-and-deploy task, builds a container image, uploads it to the Azure Container Registry, and creates or updates the cluster in the target environment.
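The flow described here can be pictured as an ordered list of steps. The following is only a minimal sketch for illustration, not SAP's automation engine: the names `DeploymentRequest` and `plan_build_and_deploy`, the step wording, and the example branch and environment names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeploymentRequest:
    """The two inputs a customer supplies, according to the session notes."""
    git_ref: str       # branch or tag in the source repository
    environment: str   # target environment name (hypothetical values below)


def plan_build_and_deploy(request: DeploymentRequest) -> List[str]:
    """Return the ordered steps from the text (illustrative wording only)."""
    return [
        f"create build-and-deploy task for '{request.git_ref}'",
        f"build container image from '{request.git_ref}'",
        "upload image to Azure Container Registry",
        f"create or update cluster in '{request.environment}'",
    ]


if __name__ == "__main__":
    # Hypothetical branch and environment names.
    for step in plan_build_and_deploy(DeploymentRequest("release/1.0", "staging")):
        print(step)
```

The point of the sketch is that everything after the first line of input is owned by the platform, which is exactly what distinguishes CCv2 from the ticket-driven CCv1 process.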
Each environment has its own setup. Kubernetes ensures that the image is installed in the cluster and launched successfully.
The cluster is separated into four buckets, one per server role:
- Storefront (SF)
- Backoffice (BO)
- Background processing (cronjobs, BG)
- API and SOLR (SO)
In the production environment, you have a larger number of storefront instances and a fair number of backoffice instances.
SAP Commerce Cloud has a preinstalled logging stack: Elasticsearch + Kibana + Fluent Bit. For metrics, SAP also uses a standard stack: Prometheus + Grafana. For both logging and metrics, there is single sign-on authentication via SAP CI. The log format is JSON.
Commerce Snapshot and Restore
SAP also has a standard snapshot/restore mechanism. The database is backed up as a blob. Currently it is not possible to back up only a subset of tables or records via this mechanism. For the backup, the commerce part is paused, the system creates a snapshot of the database and media, and then the commerce part is resumed.
For the restore process, the Commerce platform is stopped completely, the existing media blob and the database are replaced with the snapshots, and the platform is redeployed. Snapshots can be copied and restored on other environments.
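To make the ordering of the two procedures concrete, here is a small in-memory model. It is a sketch under stated assumptions, not SAP's implementation: the `Environment` and `Snapshot` classes, the state names, and the sample data are hypothetical, and plain dictionaries stand in for the real database and media storage.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Snapshot:
    database: Dict[str, str]
    media: Dict[str, bytes]


@dataclass
class Environment:
    """In-memory stand-in for one environment (hypothetical model)."""
    database: Dict[str, str] = field(default_factory=dict)
    media: Dict[str, bytes] = field(default_factory=dict)
    state: str = "running"
    log: List[str] = field(default_factory=list)

    def _set_state(self, state: str) -> None:
        self.state = state
        self.log.append(state)

    def take_snapshot(self) -> Snapshot:
        # Pause commerce, copy the whole database and media, then resume.
        self._set_state("paused")
        try:
            return Snapshot(copy.deepcopy(self.database), copy.deepcopy(self.media))
        finally:
            self._set_state("running")

    def restore(self, snapshot: Snapshot) -> None:
        # Stop completely, replace both stores wholesale, then redeploy.
        self._set_state("stopped")
        self.database = copy.deepcopy(snapshot.database)
        self.media = copy.deepcopy(snapshot.media)
        self._set_state("redeployed")
        self._set_state("running")


if __name__ == "__main__":
    production = Environment(database={"orders": "42 rows"}, media={"logo.png": b"img"})
    staging = Environment()
    staging.restore(production.take_snapshot())  # snapshots can move between environments
    print(staging.database, staging.log)
```

Note that the model copies everything or nothing, which mirrors the limitation mentioned in the text: there is no way to snapshot only a subset of tables or records.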
Customizations and Build Process in Commerce Cloud
How do you get your customizations into Commerce Cloud? CCv2 builds the system automatically, directly from the source code repository. The Commerce App and other SAP modules are updated monthly, so new automation features and bug fixes are applied regularly. The system merges the commerce app with the custom code and builds the result into containers. The containers are then deployed into the target environment or environments.
In the diagram below, there are three components:
- CCv2 Platform (blue)
- CCv2 Commerce Release (yellow)
- CCv2 App XYZ (black) – a module or application from SAP supposed to extend Commerce and Platform functionality (mix-in features)
There is a simplified option for non-customizable applications. It is perfect for simple or standard solutions based on the Commerce stack, or when the solution is fully managed and operated by SAP. Customers are allowed to do partial configuration and operations. For this option, SAP is developing Product Content Cloud (PCC). Unfortunately, to date, we know nothing but the name.
The new configuration file is manifest.json. It defines how your application should be built by including baseline properties with specific configurations. This file contains the following blocks:
- a version of the Commerce Suite
- list of extensions (similar to what we had in localextensions.xml)
- properties (key / value / environment, or “persona” as it is called here)
- storefront addons (addon / storefront / template)
- aspects configuration (for example, backoffice or background processing specific)
- tests configuration
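To give a feel for the shape of such a file, here is a sketch that assembles a manifest with one entry per block listed above and prints it as JSON. Treat it as an illustration only: the key names are my assumption modeled on the list above and should be checked against SAP's documentation, and the version string, extension names, and property keys are hypothetical placeholders.

```python
import json


def render_manifest() -> str:
    """Build an illustrative manifest.json body (all values are placeholders)."""
    manifest = {
        # Version of the Commerce Suite (placeholder value).
        "commerceSuiteVersion": "1811",
        # Extensions, similar to localextensions.xml (hypothetical names).
        "extensions": ["mycustomcore", "mycustomstorefront"],
        # Properties: key / value, optionally scoped to a persona (environment).
        "properties": [
            {"key": "example.feature.enabled", "value": "true"},
            {"key": "example.debug", "value": "true", "persona": "development"},
        ],
        # Storefront addons: addon / storefront / template (hypothetical names).
        "storefrontAddons": [
            {
                "addon": "mycustomaddon",
                "storefront": "mycustomstorefront",
                "template": "mytemplatestorefront",
            }
        ],
        # Aspect-specific configuration, e.g. for backoffice.
        "aspects": [
            {
                "name": "backoffice",
                "properties": [{"key": "example.cache.size", "value": "medium"}],
            }
        ],
        # Tests configuration (assumed shape).
        "tests": {"extensions": ["mycustomcore"]},
    }
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    print(render_manifest())
```

Keeping all of this in one versioned file is what lets the build run without any interactive input.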
So the build process is performed in the cloud, and its result is a set of Docker images. Platform code is not in the project repository: the system has its own artifact repository where all versions and extensions are stored (and updated with hotfixes if the need arises).
There are four types of cluster component roles (or aspects): storefront, backoffice, API and background processing. Each aspect runs a number of containers from the same image. On top of each aspect there is a dedicated Apache Web Server and Azure Load Balancer. Solr and ZooKeeper are separate roles. Apache Ingress provides load balancing, SSL termination, name-based virtual hosting, routing to services, and service discovery.
Each role (aspect) needs its own configuration:
- Frontend type traffic (from shop web app)
- Many small concurrent requests = high amount of CPU
- Needs big cache = high amount of memory
- No background processing enabled
- Backend type traffic (from backoffice web app)
- Medium amount of concurrent requests = medium amount of CPU
- Medium cache size = medium amount of memory
- Exclusive background processing enabled for ‘backoffice’ process types
- Client / Integration type traffic, for example, from Kyma
- Medium amount of concurrent requests = medium amount of CPU
- Mostly transactional non-cacheable requests = low amount of memory
- Exclusive background processing enabled for API process types
- “Background processing”
- No traffic, event and self-driven processes
- Medium amount of concurrent processes = medium amount of CPU
- Some ‘heavy’ processes = medium amount of memory
- General background processing enabled
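The per-aspect requirements above fit naturally into a small lookup table. The sketch below is my own illustration, not a CCv2 API: the `AspectProfile` type, the aspect keys, and the process type names are hypothetical, the background-processing aspect is assumed to need a medium amount of CPU, and the rule that “general” background processing covers every process type not claimed exclusively by another aspect is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AspectProfile:
    cpu: str                   # "low", "medium" or "high"
    memory: str                # "low", "medium" or "high"
    background: Optional[str]  # None, "general", or an exclusive process type


# Values transcribed from the list above; keys are hypothetical names.
PROFILES: Dict[str, AspectProfile] = {
    "storefront": AspectProfile("high", "high", None),
    "backoffice": AspectProfile("medium", "medium", "backoffice"),
    "api": AspectProfile("medium", "low", "api"),
    "backgroundProcessing": AspectProfile("medium", "medium", "general"),  # CPU assumed
}


def runs_process(aspect: str, process_type: str) -> bool:
    """Decide whether an aspect picks up a background process type.

    Assumption: "general" means everything that no other aspect
    claims exclusively.
    """
    scope = PROFILES[aspect].background
    if scope is None:
        return False
    if scope == "general":
        exclusive = {p.background for p in PROFILES.values()} - {None, "general"}
        return process_type not in exclusive
    return process_type == scope


if __name__ == "__main__":
    for name in PROFILES:
        print(name, runs_process(name, "backoffice"))
```

Separating the profiles this way explains why all aspects can share one image: only the configuration applied to the containers differs.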
Initialization and Update
For INIT and UPDATE, the Commerce platform is fully stopped (and started again once the changes are applied). There is also a special mode, LIVE UPDATE. In this mode, non-critical aspects stay online while the system updates a copy of the type system and then switches from the current one to the new one.
Third party server/software
In the CCv1 (legacy) version, customers were able to request an additional server to host “whatever they like”. In the new CCv2, this is no longer an option. Customers have to either rely on a service provider or host the service themselves.
For example, if you need ImageMagick, it has to be installed alongside the platform, which will not happen in the new Commerce Cloud. The customer can instead host ImageMagick as a service on external infrastructure.
The same applies to such products as Hazelcast, Redis, Memcached, Varnish and so forth.
In the legacy version, e-mails are sent via SMTP (you need an SMTP relay server). In the new CCv2, there is an SMTP Relay Service which provides both SMTP and a web API. Batch e-mails are not supported.
The following diagram shows the architecture of the Kubernetes cluster.
Scaling is the ability of the cluster to increase the resources allocated in response to increasing demand, manually or automatically.
Currently, the feature is not directly usable by customers with fixed-price subscriptions. Axel joked that if it were available to customers, they would all set the slider to the max. So this component needs to be designed wisely, and, as I understood, SAP is evaluating different options now.
Currently, no metrics or scheduling are available to customers. As Axel explained, in the long run they might not be needed either: customers who run on Commerce Cloud within the boundaries of their contract are given a guarantee, and SAP takes care of keeping KPIs (average response time, etc.) in the agreed range.
The Scale Operator, a special component, watches for Kubernetes pods that cannot be deployed and grows the cluster by adding new worker VMs. It also watches the overall number of deployed pods so it can shrink the cluster again.
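The grow-and-shrink behaviour can be sketched as a single decision function. This is not how the Scale Operator is actually implemented, which the session did not detail; `desired_workers`, the pods-per-worker capacity, and the minimum and maximum bounds are hypothetical parameters chosen only to illustrate the idea.

```python
import math


def desired_workers(
    current_workers: int,
    pending_pods: int,
    deployed_pods: int,
    pods_per_worker: int = 4,   # hypothetical capacity of one worker VM
    min_workers: int = 1,       # hypothetical lower bound
    max_workers: int = 10,      # hypothetical upper bound
) -> int:
    """Return the worker VM count the cluster should converge to."""
    if pending_pods > 0:
        # Pods cannot be placed: grow by enough workers to host them.
        target = current_workers + math.ceil(pending_pods / pods_per_worker)
    else:
        # Nothing is waiting: shrink to what the deployed pods need.
        target = min(current_workers, math.ceil(deployed_pods / pods_per_worker))
    # Keep the result within the configured bounds.
    return max(min_workers, min(max_workers, target))


if __name__ == "__main__":
    print(desired_workers(current_workers=3, pending_pods=5, deployed_pods=12))
    print(desired_workers(current_workers=5, pending_pods=0, deployed_pods=8))
```

The upper bound in the sketch plays the role of the contract boundary Axel mentioned: without it, every cluster would simply grow to the maximum.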
This technical session revealed a lot of interesting details for me. We reconnected with Axel after the session and discussed the topic in detail. I hope to come back to the topic in more depth very soon here at Hybrismart. Stay tuned!