Integrating Contentstack with SAP Commerce Cloud and Data Migration
The choice of a content management system matters far more today than it did years ago. Today’s CMS needs to contend with a multi-device world of tablets, mobiles, and laptops, and be architecturally ready for channels that are only emerging in the consumer’s universe, such as AR/VR and the Internet of Things.
A CMS typically consists of a frontend part, or a «head», and a backend. A headless CMS, as the name says, doesn’t stick to a single predefined “head”. You can develop your own head if that suits your case. Another name for headless CMS is API-first, because the interaction between the head and the backend is based on APIs.
For example, the old hybris CMS Cockpit had both parts built with ZK Framework, which orchestrated both the HTML/Javascript “head” and was the essential part of the CMS backend. These parts were tightly connected and intertwined into a single unit called CMS Cockpit.
Later, SAP announced SmartEdit, which has been a big bet and a major step forward for Commerce Cloud’s content management. Spartacus, the open-source JavaScript storefront from SAP, complemented SmartEdit well. I published comprehensive reviews of both products in the blog earlier.
There are a number of headless content management systems that are mature enough to efficiently replace the built-in SmartEdit CMS. Among them, I would highlight Contentful, Prismic, DotCMS, Cloud CMS, Amplience, and Contentstack as well-established and innovative products with clear roadmaps and good integration capabilities.
One of my recent research projects was aimed at using Contentstack as a headless content management system for SAP Commerce Cloud. This article uncovers the key highlights of the research. As a result of that work, I also created a data migration tool for moving existing CMS content from Commerce Cloud to Contentstack.
What is Contentstack? Contentstack is a SaaS headless CMS with an API-first approach, a model known as Content-as-a-Service. It focuses on organizing structured content into data feeds that other applications, such as Commerce Cloud, can consume. You can design a content data model interactively or via the Contentstack API, and populate the structures with content in the same two ways.
In this article, I pay special attention to the data migration from Commerce Cloud’s CMS to Contentstack. This exercise helped me to see the complexity in all its glory. When you design a system from scratch, you can evade the platform’s constraints by designing structures that only need to be compatible with the new CMS. When you need to migrate data, you have to adapt the concepts and data models of the new CMS so that they stay compatible with Commerce Cloud.
In the project, the JSP storefront was replaced with a custom-built Angular 2 JavaScript frontend, which created a need for a content API to deliver large volumes of content to the customer in the native Angular 2 way, via REST calls. Commerce Cloud does offer a content API via the Omni-Channel Connect CMS API, but the strategy was to decouple content from the commerce platform completely. Contentstack was considered best in class for such needs.
Contentstack in a Nutshell
As a headless content management system, Contentstack provides the essential infrastructure to create and manage content. It is not coupled to any single presentation layer, which makes it ready to work with any storefront and backend.
Contentstack provides a comprehensive API, and this API can be roughly split into two sets:
- Content Delivery API and
- Content Management API.
Content Delivery provides REST APIs to deliver content to any channel such as websites, mobile apps, devices, marketing kiosks or any digital platform that displays content. Contentstack limits the maximum number of requests that you can make in a given period, which is part of its pay-as-you-grow approach. Contentstack provides a comprehensive query language that the storefront can use to fetch data. Using the GraphQL Content Delivery API, you can fetch a customized response or retrieve data of nested or multiple resources through a single API request.
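To make the delivery side more tangible, here is a minimal sketch of how a client might build a request for all entries of one content type. The base URL, the `environment` query parameter, and the `api_key` / `access_token` header names reflect my understanding of the public Content Delivery API and should be checked against the current documentation for your region. The function name and the placeholder credentials are hypothetical. The request is only built, not sent.

```python
import urllib.parse
import urllib.request

# Assumption: base URL and header names as I understand the Content Delivery
# API; verify them against the official documentation for your region.
CDN_BASE = "https://cdn.contentstack.io/v3"


def build_entries_request(content_type_uid, environment, api_key,
                          delivery_token, locale=None):
    """Build (but do not send) a GET request for the entries of a content type."""
    params = {"environment": environment}
    if locale:
        params["locale"] = locale
    url = "{}/content_types/{}/entries?{}".format(
        CDN_BASE,
        urllib.parse.quote(content_type_uid, safe=""),
        urllib.parse.urlencode(params),
    )
    headers = {"api_key": api_key, "access_token": delivery_token}
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    request = build_entries_request(
        "c_AboutCompanyBannerCMSComponent", "production",
        "<STACK_API_KEY>", "<DELIVERY_TOKEN>",
    )
    print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the actual call; every such call counts against the request limits mentioned above.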
Content Management provides REST APIs and user interfaces for creating, editing, and deleting entries and assets, and for managing content types. It looks like a basic user-friendly database management system tailored for simple structured content and enhanced with some CMS-specific features such as content authoring/publishing and multilanguage support. If you have sufficient permissions, you can create a «database table» and populate it with items using the Contentstack backoffice. The input forms for your custom data fields are generated automatically, and you can change the order of the fields.
What is great is that Contentstack allows you to inject your own JavaScript logic into the auto-generated input forms for the content types, which lets you customize field appearance and fetch data from third-party apps. Using the extension API, you can create custom fields such as a color picker or a YouTube video selector. Contentstack webhooks let you specify a custom URL to which Contentstack posts data when an event happens. For example, if you want to be notified every time a new entry is created in a content type, you can create a webhook for it.
The Content Type concept is about the same as an item type in Commerce Cloud. Unlike in SAP’s platform, in Contentstack you can manage content type attributes at runtime. Currently Contentstack doesn’t support inheritance and versions, but they can be emulated via attributes and extension mechanisms, such as «webhooks» and injectable JavaScript code. The number of field types is limited, and some of the complex types have limitations (such as the number of them per content type).
Contentstack doesn’t support page layout management. This is because page layouts are tightly coupled with the specific content composition logic, such as the content slot and component model in SAP Commerce Cloud.
Contentstack provides basic mechanisms for asset management. You can upload images and binary files and refer to them from the content types.
Contentstack supports multiple languages, but differently from SAP Commerce Cloud. In Contentstack, a language version belongs to the whole entry of a content type, while in Commerce Cloud the language version is handled mainly at the attribute level. For example, if you have three languages and a news feed of three entries, in Commerce Cloud you will have three entries with up to three language versions of each localized attribute, while in Contentstack you will have up to nine entry versions: the three original items plus up to two additional language versions of each.
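The difference between the two localization models is easy to see with a toy data set. The sketch below is only an illustration: the language codes, item identifiers, and dictionary shapes are made up and are not the real storage format of either system.

```python
LANGUAGES = ["en", "fr", "nl"]
NEWS_IDS = ["news1", "news2", "news3"]

# Commerce Cloud style: one item per news entry; each localized attribute
# holds one value per language.
commerce_items = [
    {"uid": uid,
     "title": {lang: "{} title ({})".format(uid, lang) for lang in LANGUAGES}}
    for uid in NEWS_IDS
]

# Contentstack style: one entry version per (item, locale) pair.
contentstack_entries = [
    {"uid": uid, "locale": lang, "title": "{} title ({})".format(uid, lang)}
    for uid in NEWS_IDS
    for lang in LANGUAGES
]

print(len(commerce_items))        # 3 items with localized attributes
print(len(contentstack_entries))  # 9 entry versions (3 items x 3 locales)
```

A migration tool therefore has to fan out every localized attribute of a Commerce Cloud item into separate localized versions of the target entry.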
Javascript Storefront and Page Layouts
The page layouts are not normally part of content management, because they are not device agnostic. The best practice is to decouple content management and page layout management. You can configure the page layouts in the storefront code or you can create a separate API for them.
Below is an example of the page layout I dealt with during my research. It is one of 41 page templates used in the real project. The template defines how slots are organized and how they are pre-populated (if they are).
Some CMSes focus on the page layouts, while others focus on the data and leave page layouts to the customer-facing application.
Each approach has pros and cons.
Page layout management would help marketers to quickly add a second Banner component to a page at runtime, without redeploying the storefront application. Possibly, such flexibility is not required for all pages.
Some of the headless CMSs, such as DotCMS, provide Layout-as-a-Service to deliver the benefits of a traditional CMS-driven experience with the developer friendliness of Content-as-a-Service. Contentstack doesn’t offer page layout management.
For example, in my project, each of 399 pages was based on the template shown above. The differences are in the components assigned to the slots. The typical page may contain the following components assigned:
- Title Component (text + how-to-display params)
- TopBanner (image + text + link)
- Event overview (image + text)
- Event container
  - a list of events, each event has
    - thumbnail
    - title
    - link
- «Disclaimer» (visible only for usergroup B)
- BottomBanner (image + text + link)
Since the major part of the project was component and page data migration, the page layouts had to be data-driven, as they were in Commerce Cloud.
There are two kinds of content types in Contentstack: Webpage and Content Block. The Webpage content type is designed for handling webpages such as Home, Contact Us, and so on. The Content Block content type allows you to create chunks of data, such as Banners, which can be used by the webpage. Irrespective of the kind, you can mark a content type as either single or multiple. ‘Multiple’ lets you create more than one entry.
Another concept we need to know better to understand the challenge is Modular Blocks. With Modular Blocks, developers can create multiple or dynamic sets of fields for an attribute. Content managers can easily create content by choosing a block from the fixed list of allowed blocks, add more blocks, rearrange them, or remove them. It is similar to a collection of content items in Commerce Cloud, but the items can be of different (predefined) types.
According to the design of the data migration tool, the following Commerce Cloud objects are to be transformed to the following Contentstack entities:
SAP Commerce object | Contentstack object
CMS Component | Block Content Type Content, multiple entries
CMS Component instance | Block Content Type Content Entry
CMS Page | Webpage Content Type Entry
CMS Page Template | Webpage Content Type / multiple entries
CMS Page Slots | Webpage Content Type Modular Blocks
CMS Page Slot Components | Webpage Content Type Modular Block’s Content Blocks
The first thought is creating the content type in Contentstack with a multi-value field per Commerce Cloud page slot. Something like this:
AboutCompany’s Content Type Field Name | AboutCompany’s Content Type Field Type
slot1 | Multi-valued List of [AbstractCMSComponent]
slot2 | Multi-valued List of [AbstractCMSComponent]
And populate it with data taken from Commerce Cloud:
AboutCompany’s Content Type Field Name | AboutCompany’s Content Type Value
slot1 |
slot2 |
However, such an approach won’t work nicely with Contentstack because it doesn’t support inheritance. In Commerce Cloud, a Page Slot is a collection of components, or more specifically, a collection of AbstractCMSComponent, which is the parent type of all CMS Components. In Contentstack it is not so.
In Contentstack, you can’t say that the «TopBanner» and «BottomBanner» types are subtypes of «AbstractCMSComponent». If they had the same structure, you could create one type for them, and it would be a solution, but the CMS components don’t have the same structure. If two or more content types have a similar structure, you can’t create one parent object holding the common portion and specify the differences in the subentities. You need to duplicate the common fields.
Contentstack field types don’t support type hierarchies, so you can’t create a list field having the items of the abstract type «Components» as it is implemented in Commerce Cloud.
In Contentstack you have to list the types explicitly if the list items are allowed to be of different types. Normally we know all these types, and we can list them when defining a «slot» field for the content type responsible for a page definition.
So, the second thought is using the following Contentstack definition:
AboutCompany’s Content Type Field Name | AboutCompany’s Content Type Field Type
slot1 | Multi-valued List of [TopBannerType, BottomBannerType]
slot2 | Multi-valued List of [LeftBannerType]
In this case, the slots can be populated in a similar way as in Commerce Cloud, and the AboutCompany object will define the page structure.
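Such a slot definition could be generated rather than typed by hand. The sketch below builds a Modular Blocks field per slot, with one block per allowed component type and a single reference field inside each block. The JSON keys (`data_type: "blocks"`, `blocks`, `reference_to`) follow my reading of the content type schema format and are an assumption; compare them with a content type exported from your own stack before relying on them. The helper name `slot_field` is hypothetical.

```python
import json


def slot_field(slot_uid, allowed_types):
    """Return a Modular Blocks field that accepts only the listed component types."""
    blocks = []
    for type_uid in allowed_types:
        blocks.append({
            "title": type_uid,
            "uid": type_uid.lower(),
            # Each block holds one reference to an entry of that component type.
            "schema": [{
                "data_type": "reference",
                "display_name": "Component",
                "uid": "component",
                "reference_to": [type_uid.lower()],
            }],
        })
    return {
        "data_type": "blocks",
        "display_name": slot_uid,
        "uid": slot_uid,
        "multiple": True,
        "blocks": blocks,
    }


about_company_schema = [
    slot_field("slot1", ["TopBannerType", "BottomBannerType"]),
    slot_field("slot2", ["LeftBannerType"]),
]

if __name__ == "__main__":
    print(json.dumps(about_company_schema, indent=2))
```

The explicit list of allowed types per slot is exactly what replaces the missing «AbstractCMSComponent» parent type.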
The solution? At that time, there was another catch. Contentstack didn’t support more than 5 Modular Blocks fields within a single content type or more than 20 blocks within each Modular Blocks field. That means that the number of content slots is constrained by this limitation.
These default limits are still in the documentation, but Contentstack assured me that it is now possible to increase the default limits per customer on request if such a need arises.
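Before creating anything in Contentstack, it is cheap to check the page templates against these limits. The following sketch does that for one template; the limit values are the defaults quoted above, and the function name and the shape of the `slots` argument are hypothetical.

```python
# Default limits quoted in the article; they may be raised per customer.
MAX_BLOCK_FIELDS_PER_TYPE = 5
MAX_BLOCKS_PER_FIELD = 20


def check_limits(content_type_uid, slots,
                 max_fields=MAX_BLOCK_FIELDS_PER_TYPE,
                 max_blocks=MAX_BLOCKS_PER_FIELD):
    """Return a list of limit violations for one page template.

    slots maps a slot name to the list of component types allowed in it.
    """
    problems = []
    if len(slots) > max_fields:
        problems.append(
            "{}: {} slots exceed the limit of {} Modular Blocks fields".format(
                content_type_uid, len(slots), max_fields))
    for slot_name, allowed_types in slots.items():
        if len(allowed_types) > max_blocks:
            problems.append(
                "{}.{}: {} component types exceed the limit of {} blocks".format(
                    content_type_uid, slot_name, len(allowed_types), max_blocks))
    return problems


if __name__ == "__main__":
    template = {"slot{}".format(i): ["BannerType"] for i in range(1, 8)}
    for problem in check_limits("aboutcompany", template):
        print(problem)
```

If the limits are raised for your account, pass the new values via `max_fields` and `max_blocks` instead of editing the constants.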
In my project, I had 41 page templates which were supposed to be converted into 41 Webpage content types in Contentstack. Their entries are pages, 366 in total. According to the design, each template or page group (or webpage content type in our terms) had tens of slots for components. The pages of the page group (= content type entries) are configured with components assigned to the fields (originally, slots in SAP Commerce). The same approach was used for the components. Each component instance was supposed to be represented as a content block entry. The content block content type is a component type in our terms. I needed to migrate 1661 component instances of 269 component types. The structures of these 269 component types were different, and some of them definitely have more than five fields.
Content Data Migration From SAP Commerce Cloud to Contentstack
In my solution, migrating content data from the Commerce Cloud CMS to Contentstack takes four phases.
First, we analyze all Commerce Cloud CMS data structures and component controllers and prepare the configuration for the data migration tool saying what Contentstack objects need to be created or changed.
Some component controllers may use non-content data. Depending on the details, this functionality may or may not be movable to the Angular code. Possibly, you need to add APIs that consume both the Contentstack services and the Commerce Cloud API to provide a new comprehensive service to the storefront.
In the second phase, we create or modify the Contentstack data model based on the findings from the first phase. If the original component has three text fields, you will probably want to have three fields created in the Contentstack target data model with compatible types. The types in Commerce Cloud and Contentstack are not the same, and the transformations can be complex and non-trivial.
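For the simple attributes, the type conversion can be a lookup table. The mapping below is a hypothetical sketch: the Commerce Cloud type names on the left and the Contentstack `data_type` values on the right are assumptions for illustration, not the mapping used in the project. Unknown types raise an error on purpose, so that collections and nested components are handled by hand rather than silently mapped to text.

```python
# Hypothetical mapping from Commerce Cloud attribute types to Contentstack
# field data types. Both sides are assumptions; extend them from your export.
TYPE_MAP = {
    "java.lang.String": "text",
    "localized:java.lang.String": "text",
    "java.lang.Boolean": "boolean",
    "java.lang.Integer": "number",
    "java.util.Date": "isodate",
    "Media": "file",
}


def to_contentstack_field(attribute, commerce_type):
    """Translate one simple attribute definition into a Contentstack field."""
    if commerce_type not in TYPE_MAP:
        raise ValueError(
            "no automatic mapping for {} ({})".format(attribute, commerce_type))
    return {
        "display_name": attribute,
        "uid": attribute.lower(),
        "data_type": TYPE_MAP[commerce_type],
    }


if __name__ == "__main__":
    print(to_contentstack_field("hideSubtitleForMobile", "java.lang.Boolean"))
```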
The third phase is extracting content data from Commerce Cloud, and the fourth phase populates the Contentstack structures with it.
What added a new layer of complexity is that the source system is live and the data model has already been established there. Taking into account the huge amount of content, redesigning the data model would be a cumbersome process everyone wanted to avoid. The system has thousands of content items which in turn have different structures. All of them need to be migrated to Contentstack.
The system has about 80 component types that are tightly connected with each other. For instance, the component displaying a list of news has a list attribute for the news items. Each of these news items is also a CMS component which can be assigned to a content slot or attached to another component. It creates a complex set of relations between CMS component types that we needed to preserve.
Of course, manual data modeling and migration would be a nightmare and error-prone, so developing a migration tool was an important part of the project.
In the original system, these complex structures are processed by the component controller. The controller fetches data from the database (Commerce Cloud) and injects it into the JSP template to deliver the HTML code to the browser. In the target architecture, such composing is outside the content management system (Contentstack) and also not part of the e-commerce platform (SAP Commerce Cloud). This composition is performed in JavaScript in the browser or, for complex scenarios, in the API composition layer which sits between the browser and the content provider.
After all structures are created and populated, the storefront will be able to consume these feeds via the Contentstack API.
Phase 0. Analysis and Configuration
In this phase, I prepare the configuration for the data migration tool, which includes component and page definitions. These basically say what objects we need to migrate and how. Taking into account the huge number of structures to migrate, creating this set is semi-automatic.
I exported all component and page definitions from the source system to make sure that the Commerce Cloud data structures conform to the Contentstack data models. After that, I changed the exported data according to my needs. Some attributes were removed from the migration.
Component definitions contain a list of component attributes with their type definitions and sample values.
Unlike component definitions, layouts are data-driven. A page definition contains a list of content slots and CMS components in these slots as well as some page attributes.
Based on these extracts, I prepared the configuration (a data structure) for the data migration tool.
This configuration is a set of CSV files:
- Components and pages to migrate
- Component definitions
- Page definitions
Below is a sample of component definition CSV:
Phase 1. Creating Contentstack Data Model
In this phase, the migration tool automatically creates the data model in Contentstack based on the configuration from the previous step.
This phase includes removing and (re-)creating a schema for the objects listed in the configuration. The migration tool can work with the whole configuration, but it also supports a filter parameter to work with individual objects. The example below shows how to create or recreate two objects in Contentstack, «AboutCompany» and «BannerCMSComponent». The first object, «aboutcompany», is originally a page in Commerce Cloud, and the second object is a banner which is used on this page. These objects can be created separately or in one go.
./cs-client.py removeSchema -filter c_aboutcompany,c_AboutCompanyBannerCMSComponent
./cs-client.py createSchema -filter c_aboutcompany,c_AboutCompanyBannerCMSComponent
Both commands use the Contentstack API for data modeling. removeSchema removes the data structure together with all its items, and createSchema assumes that the data structure has already been removed.
Contentstack automatically creates the data input form for the data model. For example, the form for the component is available via a link like https://app.Contentstack.com/#!/stack/<STACKID>/content-types?view_by=Label&search=c_AboutCompanyBannerCMSComponent
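As an illustration of what a createSchema step could send, here is a minimal sketch that builds a content type creation request for the Content Management API. The endpoint, the `api_key` / `authorization` header names, and the `content_type` wrapper reflect my understanding of the public API and should be verified against the current reference. The function name and the example schema are hypothetical and are not taken from the actual tool. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: Content Management API base URL; it differs per region.
CMA_BASE = "https://api.contentstack.io/v3"


def build_create_schema_request(uid, title, schema, api_key, management_token):
    """Build (but do not send) a POST request that creates a content type."""
    body = {"content_type": {"title": title, "uid": uid, "schema": schema}}
    return urllib.request.Request(
        CMA_BASE + "/content_types",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "api_key": api_key,
            "authorization": management_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical two-field schema for a banner component.
    schema = [
        {"display_name": "Title", "uid": "title", "data_type": "text"},
        {"display_name": "Subtitle", "uid": "subtitle", "data_type": "text"},
    ]
    request = build_create_schema_request(
        "c_aboutcompanybanner", "About Company Banner", schema,
        "<STACK_API_KEY>", "<MANAGEMENT_TOKEN>",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```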
Phase 2. Extracting Data From SAP Commerce Cloud
This phase is used for extracting the CMS data items from the Commerce Cloud.
./cs-client.py createImpexScript -filter c_aboutcompany,c_AboutCompanyBannerCMSComponent
Output (a fragment):
INSERT_UPDATE AboutCompanyBannerCMSComponent;button(url);button(linkName[lang=en]);desktopBackgroundImage (url);hideSubtitleForMobile;imagelink(url);imagelink(linkName[lang=en]);mobileBackgroundImage (url);name;subtitle;textLinkColor;title;uid[unique=true];visible;catalogVersion(catalog(id), version)[unique=true];
The generated ImpEx script is used to extract data from SAP Commerce Cloud via the existing data export mechanisms in the backoffice.
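Generating such a header from a component definition is straightforward. The sketch below is a hypothetical reconstruction of the idea, not the actual code of the tool: it takes the type code and the list of attribute expressions, marks the unique ones, and appends the catalog version column seen in the fragment above.

```python
def impex_header(type_code, attributes, unique=("uid",)):
    """Build an ImpEx header line for one component type.

    attributes are column expressions such as "title" or "button(url)".
    """
    columns = []
    for attr in attributes:
        columns.append(attr + "[unique=true]" if attr in unique else attr)
    # Content items are identified by uid plus catalog version.
    columns.append("catalogVersion(catalog(id), version)[unique=true]")
    return "INSERT_UPDATE {};{};".format(type_code, ";".join(columns))


if __name__ == "__main__":
    print(impex_header(
        "AboutCompanyBannerCMSComponent",
        ["button(url)", "name", "subtitle", "title", "uid", "visible"],
    ))
```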
After the export is over, you have a zip file with the data items. There is a dedicated command for importing the exported data into the data repository used by the migration tool. Internally, it simply unpacks the file into the data directory and performs some clean-up procedures.
./cs-client.py importData dataexport_0000066A.zip
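The unpacking step can be sketched with the standard library alone. The function below is an illustration under assumptions: the directory layout and the clean-up rule (dropping empty files) are hypothetical, since the actual clean-up procedures are not described here.

```python
import zipfile
from pathlib import Path


def import_data(zip_path, data_dir):
    """Unpack an export archive into data_dir and return the kept file names."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
    # Hypothetical clean-up: drop empty files, which carry no items.
    for path in sorted(data_dir.rglob("*")):
        if path.is_file() and path.stat().st_size == 0:
            path.unlink()
    return sorted(p.name for p in data_dir.rglob("*") if p.is_file())
```

Calling `import_data("dataexport_0000066A.zip", "data")` would leave the non-empty export files in the `data` directory for the loading phase.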
Phase 3. Loading Data to Contentstack
The following two commands populate the Contentstack data structures with the data from the directory with the Commerce Cloud data extracts.
In the examples below the particular CMS item entries are listed, but the tool can take them from the configuration instead.
Creating the component data:
./cs-client.py createDataAndCommit \
  -filter c_aboutcompany,c_AboutCompanyBannerCMSComponent \
  -ids AboutCompanyTopBanner,AboutCompanyBottomBanner
Creating the page layout data:
./cs-client-2.py populateInitialAndCommit \
  -filter c_aboutcompany,c_AboutCompanyBannerCMSComponent \
  -ids AboutCompanyTopBanner,AboutCompanyBottomBanner
Publishing
Contentstack introduces the concept of environments. An environment corresponds to one or more deployment servers or a content delivery destination where the entries need to be published. The most common publishing environments used are development, staging, and production.
Contentstack doesn’t have built-in cascaded publishing like the one implemented in Commerce Cloud.
For example,
- Page has an attribute «bannerReference».
- BannerReference is a reference to «BannerType»
- BannerType has an attribute «BannerImage» (file)
- Banner image is a Contentstack asset.
When I publish the page, the system asks me if the components need to be published too. However, I need to publish the banner image separately; otherwise I will see ‘null’ in the component data response.
Contentstack supports publishing only one level deep (in our case, components for pages). This is why the banner image wasn’t published. You can use Contentstack webhooks to auto-publish the assets or to make publishing deeper.
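A webhook handler that makes publishing deeper first has to find everything the page depends on. The sketch below walks a nested entry and collects the UIDs of embedded assets, so that they can be published along with the page. The entry shape is an assumption: I treat any dictionary that has both `uid` and `filename` keys as an asset, which you should adapt to the payload your stack actually returns.

```python
def collect_asset_uids(node, found=None):
    """Walk a nested entry and collect the UIDs of every asset-like object."""
    if found is None:
        found = []
    if isinstance(node, dict):
        # Assumption: an embedded asset is a dict with "uid" and "filename".
        if "uid" in node and "filename" in node and node["uid"] not in found:
            found.append(node["uid"])
        for value in node.values():
            collect_asset_uids(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_asset_uids(item, found)
    return found


if __name__ == "__main__":
    page = {
        "uid": "page1",
        "bannerReference": [{
            "uid": "banner1",
            "BannerImage": {"uid": "asset1", "filename": "top.png"},
        }],
    }
    print(collect_asset_uids(page))
```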
Multi-language and Multi-country Support
As I mentioned at the beginning, Contentstack’s language support is different from Commerce Cloud’s. Specifically, Commerce Cloud supports attribute-level language versions (or localized attribute types), while Contentstack deals with language versions of whole objects.
However, there is a challenge with fallback languages, which are supported by Commerce Cloud natively.
Let’s say:
- You have N country versions, and each has X language versions. X differs from country to country.
- Some pages are relevant only for particular countries (such as Business in UA in the figure below)
- Some pages are not translated into one of the country-specific languages.
- Each country has one fallback language, which is used if the translation into the session language is not available (if UA-ENGLISH is requested but not available, the system returns UA-Ukrainian).
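The fallback rule from the list above can be expressed in a few lines of code in the storefront or in a layer in front of Contentstack. This is a minimal sketch; the function name, the tuple representation of a country/language version, and the codes are hypothetical.

```python
def resolve_version(requested, available, fallback_by_country):
    """Pick the content version to serve for a (country, language) request.

    available is the set of (country, language) versions that exist for the
    page; fallback_by_country maps a country to its single fallback language.
    Returns None when the page does not exist for that country at all.
    """
    if requested in available:
        return requested
    country, _language = requested
    fallback = (country, fallback_by_country.get(country))
    if fallback in available:
        return fallback
    return None


if __name__ == "__main__":
    available = {("UA", "UA"), ("BE", "FR"), ("BE", "NL")}
    fallbacks = {"UA": "UA", "BE": "NL"}
    print(resolve_version(("UA", "EN"), available, fallbacks))  # ('UA', 'UA')
    print(resolve_version(("FR", "EN"), available, fallbacks))  # None
```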
How to see all available language versions for the master item?
In this setup, I have 4 language versions (NL, FR, EN, UA) and 5 countries (BE, FR, UK, NL, UA).
The first question is how to implement country-specific pages. For example, «Belgium Office» should be available only on the Belgian website. With Contentstack, this should be handled by the frontend application, so that a page related to Belgium is only available on the Belgian website. Alternatively, you need to add a layer between the frontend application and Contentstack.
The next question is how to implement language-specific pages for a particular country. If the content is country-specific, you can create «locales» with language/country combinations (for example «French France» and «French Belgium»). However, Contentstack doesn’t allow you to create a non-existing locale, which may become a serious issue. If you want to share content and only need rules for which languages appear in which country, you can use metadata fields to help identify what to show where.
There is also the challenge of implementing different EN versions of the same page for different countries, such as Belgium and France in our example.
By design, Contentstack has a master version of the content entry and a number of localized versions (up to a number of configured languages).
The first thought was about creating different page items, such as «Contact Us (UA, UK)» and «Contact Us (France)», adding a custom field «Country» and using it as a filter in the storefront. We will need two separate instances of «Contact Us» in French, for Belgium and France, because Contact Us pages are country-specific.
But this approach poses new challenges, such as how to manage a Payment and Delivery page which has identical French versions for Belgium and France while the EN version is not ready. As an option, I need a checkbox «the version is ready» and uncheck it for the English version.
Page | English (master) | Belgium French (localized) | France French (localized)
Payment and Delivery Page (BE, FR) | Ready=false | Ready=true | Ready=true
Another question is how to add custom data validation before submitting the changes. For example, I want to check if the title of the page contains «France» when ‘France’ is among the selected items of the page’s «Country» attribute. If the administrator forgets to add ‘France’ to the title, we’ll have a mess. Additionally, search won’t work in the list of content types, because it works with title and tags.
Custom validation can be added via webhooks configured on the «on save» and «on workflow change» events. Your script then decides whether to approve the content using a publishing rule or to move the entry to an approved workflow stage.
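The validation rule itself is independent of how the webhook delivers the entry. Here is a minimal sketch of the title check described above; the field names `title` and `country` and the code-to-name mapping are hypothetical and depend on your content type.

```python
# Hypothetical mapping from the values of the custom «Country» field to the
# words expected in the title.
COUNTRY_NAMES = {"FR": "France", "BE": "Belgium", "UA": "Ukraine", "UK": "UK"}


def validate_title(entry):
    """Return a list of problems; an empty list means the entry can be approved."""
    title = entry.get("title", "").lower()
    problems = []
    for code in entry.get("country", []):
        name = COUNTRY_NAMES.get(code, code)
        if name.lower() not in title:
            problems.append("title does not mention {}".format(name))
    return problems


if __name__ == "__main__":
    print(validate_title({"title": "Contact Us (France)", "country": ["FR"]}))
    print(validate_title({"title": "Contact Us", "country": ["FR", "BE"]}))
```

A webhook receiver would call this function with the entry from the event payload and advance the workflow stage only when the returned list is empty.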
Performance and Caching
POST queries to Contentstack, especially large GraphQL POST queries, cannot be cached efficiently by a CDN. The frontend app can provide its own caching, but such an approach doesn’t work well for large websites. Contentstack provides its own caching, but it is fully automatic and not configurable.
Because the data for the page is distributed across different content types, you may need to make hundreds of calls to get all the information you need to render the page. Performance tuning of client-side rendering for such volumes is challenging. API response aggregation and caching help, but the complexity of the system increases and the effort required for maintenance and testing grows.
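To show what aggregation and caching mean in practice, here is a minimal in-memory sketch of an aggregation layer. It is an illustration, not a production design: the class name is hypothetical, `fetch_entry` stands for whatever function actually calls the content API, and a real deployment would more likely use a shared cache with invalidation driven by publish events.

```python
import time


class CachingAggregator:
    """Collect all entries a page needs in one call, caching each for a while."""

    def __init__(self, fetch_entry, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch_entry = fetch_entry  # callable(content_type, uid) -> dict
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # (content_type, uid) -> (expires_at, entry)

    def get_entry(self, content_type, uid):
        key = (content_type, uid)
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]
        entry = self._fetch_entry(content_type, uid)
        self._cache[key] = (now + self._ttl, entry)
        return entry

    def get_page(self, component_refs):
        """component_refs: iterable of (content_type, uid) pairs used by the page."""
        return [self.get_entry(ct, uid) for ct, uid in component_refs]


if __name__ == "__main__":
    def fake_fetch(content_type, uid):
        print("fetching", content_type, uid)
        return {"content_type": content_type, "uid": uid}

    aggregator = CachingAggregator(fake_fetch)
    refs = [("banner", "top"), ("banner", "bottom"), ("banner", "top")]
    aggregator.get_page(refs)  # two fetches, the repeated reference is cached
    aggregator.get_page(refs)  # no fetches while the cache is fresh
```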
Security
Since the rendering occurs on the client side, all your code is an open book, as is the data transferred from the APIs. Anyone can access this data. A malicious user is able to harvest data from the app by sending incremental queries. This applies to all headless CMS solutions, and currently it is considered a necessary evil of the present-day world. Of course, all API security mechanisms should be used if you see such risks. For CMS APIs, this is usually not a big issue.
Conclusion
Contentstack is a great tool for use as an alternative CMS both for content management and content delivery. Migrating the existing system to Contentstack might be challenging, but creating a new system won’t raise any significant issues.