Hybris Impex Preprocessor: Impex PLUS


ImpEx is a hybris-specific language built on top of CSV for importing data into and exporting data from the database. The hybris ImpEx engine converts ImpEx statements into SQL, so developers work with ImpEx only and never write SQL statements directly. The ImpEx syntax is quite flexible, but some cases are not covered well by it, and for some tasks the default capabilities are simply not enough. I developed an ImpEx preprocessor to make up for these shortfalls. This article explains my enhanced version of the ImpEx scripting language. The additions are especially useful in multi-market setups, where many objects have to be created with the same or similar configuration.

Macro definitions plus: key concepts

Out of the box, ImpEx allows you to define macros so that you do not have to type repeating strings. For example, you can define a variable $A=3 and use it one or more times in the same file. The ImpEx engine replaces these variables with their values:
$catalogVersion = catalogVersion(catalog(id), version)[unique=true,default=CatA:Online]
UPDATE MyComponent; $catalogVersion;uid[unique=true];title
;;componentId1;title1
;;componentId2;title2
;;componentId3;title3
It is a very useful feature, but some things cannot be implemented with these macros. In particular, the out-of-the-box macro functionality does not let you iterate over a collection in ImpEx. There is an out-of-the-box way to do that: embedded BeanShell code. However, such scripts are hard to read, inconvenient, and error-prone. Another option is to use the Velocity engine to convert ImpEx templates into ImpEx files during the build phase. This requires customization; some examples of it can be seen in the hybris OOTB modules. My solution converts ImpEx templates on the fly and uses a dedicated macro language, which I find better suited for this purpose than Velocity.

The concept is easiest to demonstrate by example. Suppose you have four catalogs, A, B, C, and D, and you need to create all of them in one ImpEx file. Assume that the real number of catalogs is larger than four and keeps growing. The goal is a maintainable set of data load/update scripts. Without the preprocessor, the ImpEx looks like this:
INSERT_UPDATE Catalog;id[unique=true]
;A
;B
;C
;D
INSERT_UPDATE CatalogVersion;catalog(id)[unique=true];version[unique=true];active
;A;Online;true
;A;Staged;false
;B;Online;true
;B;Staged;false
;C;Online;true
;C;Staged;false
;D;Online;true
;D;Staged;false
This is pretty straightforward, but when you have a dozen catalogs and need to add another one, you have to duplicate lines in different files. This approach is standard but inconvenient and error-prone. My solution introduces a special kind of macro substitution. A macro is enclosed in percent signs and consists of the macro name followed by one or more optional attributes, separated by dots (for example, %ALLCATALOGS.ID%). With these macro enhancements, the script looks like this:
INSERT_UPDATE Catalog;id[unique]
;%ALLCATALOGS.ID%

INSERT_UPDATE CatalogVersion;catalog(id)[default=%ALLCATALOGS.ID%, unique=true];version[unique=true];active
;;%ALLCATALOGVERSIONS.ID%; %ALLCATALOGVERSIONS.ACTIVE%
To convert this template into ImpEx, the preprocessor needs a macro definition, which is kept separately (and can be shared among different scripts):
{ 
  "ALLCATALOGS": [ { "ID": "A" }, { "ID": "B" }, { "ID": "C" }, { "ID": "D" } ],
  "ALLCATALOGVERSIONS" : [ { "ID": "Online", "ACTIVE": true }, { "ID": "Staged", "ACTIVE": false } ]
}
This macro and the template above generate the following ImpEx:
INSERT_UPDATE Catalog;id[unique]
;A
;B
;C
;D

INSERT_UPDATE CatalogVersion;catalog(id)[default=A, unique=true];version[unique=true];active
;;Staged; false
;;Online; true

INSERT_UPDATE CatalogVersion;catalog(id)[default=B, unique=true];version[unique=true];active
;;Staged; false
;;Online; true

INSERT_UPDATE CatalogVersion;catalog(id)[default=C, unique=true];version[unique=true];active
;;Staged; false
;;Online; true

INSERT_UPDATE CatalogVersion;catalog(id)[default=D, unique=true];version[unique=true];active
;;Staged; false
;;Online; true
This saves a lot of ImpEx code and moves the configuration to a separate file, which can be shared with other ImpEx files. The example above shows the following features of the preprocessor:
  • Repeating whole blocks (INSERT_UPDATE) once per macro value.
  • Repeating data lines once per macro value.
  • Using the hierarchical structure of the macro definition.
The first macro in a line acts as the iterator: the line is repeated once per object of that macro. All other attributes in the line are fetched from the same JSON object of that macro. Only one macro object per line is allowed; however, you can use any number of its attributes there.
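The data-line rule can be illustrated with a short Python sketch. It is not the author's converter (which runs inside hybris); it is a minimal illustration that assumes the macro definition has already been parsed from JSON into dicts and lists, and that macro and attribute names consist of upper-case letters, digits, and underscores. `expand_line`, `render`, and `MACRO_RE` are hypothetical names. Block repetition, references, and bare %NAME% macros are not handled here.

```python
import re

# %MACRO.ATTR[.ATTR...]%; names are assumed to be upper-case letters, digits and "_".
# A bare %MACRO% (rendered as a whole list) deliberately does not match.
MACRO_RE = re.compile(r"%([A-Z_][A-Z0-9_]*)((?:\.[A-Z_][A-Z0-9_]*)+)%")


def render(value):
    """Turn a JSON value into ImpEx text (JSON booleans become true/false)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def expand_line(line, macros):
    """Repeat `line` once per object of the first macro found in it."""
    first = MACRO_RE.search(line)
    if first is None:
        return [line]  # no macro: plain ImpEx, passed through unchanged
    name = first.group(1)
    expanded = []
    for obj in macros[name]:  # the first macro acts as the iterator

        def substitute(match):
            if match.group(1) != name:
                raise ValueError("only one macro object per line: " + match.group(0))
            value = obj
            for attr in match.group(2).split(".")[1:]:  # ".A.B" -> ["A", "B"]
                value = value[attr]
            return render(value)

        expanded.append(MACRO_RE.sub(substitute, line))
    return expanded
```

For example, with the ALLCATALOGVERSIONS definition above parsed by `json.loads`, `expand_line(";;%ALLCATALOGVERSIONS.ID%; %ALLCATALOGVERSIONS.ACTIVE%", macros)` returns `[";;Online; true", ";;Staged; false"]`.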

References

References help you avoid duplication in macro definitions. For example, you may need to iterate over product catalogs and refer to catalog attributes defined in the list of all catalogs. You create a reference by using the special character "&" in a macro value. The syntax of such a value is &OBJ.ATTRIBUTE/VALUE, which points to the object in the OBJ list whose ATTRIBUTE equals VALUE:
{
 "ALLPRODUCTCATALOGS" : [ 
 { "ID" : "A", "DETAILS": "&ALLCATALOGS.ID/A" },
 { "ID" : "B", "DETAILS": "&ALLCATALOGS.ID/B" } ],
 "ALLCATALOGS": [ 
 { "ID": "A", "NAME": "ProductCatalogA" }, 
 { "ID": "B", "NAME": "ProductCatalogB" }, 
 { "ID": "C", "NAME": "ContentCatalogC" }, 
 { "ID": "D", "NAME": "ContentCatalogD" } ]
}
With this macro, the following script:
INSERT_UPDATE Catalog;id[unique=true]
;%ALLCATALOGS.ID%

INSERT_UPDATE Catalog;id[unique=true];name
;%ALLPRODUCTCATALOGS.ID%;%ALLPRODUCTCATALOGS.DETAILS.NAME%;
will generate the following ImpEx:
INSERT_UPDATE Catalog;id[unique=true]
;A
;B
;C
;D

INSERT_UPDATE Catalog;id[unique=true];name
;A;ProductCatalogA;
;B;ProductCatalogB;
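Reference resolution can be done as a separate pass over the parsed macro definition, before any template is expanded. The sketch below is one possible way to do it, assuming that a reference always has the form &OBJ.ATTRIBUTE/VALUE and resolves to the first object in OBJ whose ATTRIBUTE equals VALUE. `resolve_references` is a hypothetical name; values such as &config.catalogA.Name (no slash) are left alone, and only one level of references is resolved.

```python
def resolve_references(macros):
    """Return a copy of `macros` where each "&OBJ.ATTRIBUTE/VALUE" string is replaced by that object."""

    def lookup(ref):
        path, _, wanted = ref[1:].partition("/")  # "ALLCATALOGS.ID", "A"
        obj_name, _, attr = path.partition(".")  # "ALLCATALOGS", "ID"
        for candidate in macros.get(obj_name, []):
            if isinstance(candidate, dict) and str(candidate.get(attr)) == wanted:
                # The candidate comes from the original definition, so references
                # inside it are not resolved again (one level only).
                return candidate
        raise KeyError("unresolved reference: " + ref)

    def walk(value):
        if isinstance(value, str) and value.startswith("&") and "/" in value:
            return lookup(value)
        if isinstance(value, list):
            return [walk(item) for item in value]
        if isinstance(value, dict):
            return {key: walk(item) for key, item in value.items()}
        return value

    return {name: walk(objects) for name, objects in macros.items()}
```

Resolving references up front keeps the expansion step simple: afterwards, %ALLPRODUCTCATALOGS.DETAILS.NAME% is an ordinary nested lookup.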

Global Constants

References can also be used to define constants shared between macros:
{ "CAT" : [
    { "ID" : "A",
      "TYPE" : "&TYPE.ID/TYPE1" },
    { "ID" : "B",
      "TYPE" : "&TYPE.ID/TYPE2" }
   ],
  "TYPE" : [
    { "ID" : "TYPE1", "VALUE" : "TYPE_1" },
    { "ID" : "TYPE2", "VALUE" : "TYPE_2" }
   ]
}
The following template uses these global constants:
UPDATE Obj;
;%CAT.ID%;%CAT.TYPE.VALUE%;
Resulting ImpEx:
UPDATE Obj;
;A;TYPE_1;
;B;TYPE_2;

Localizations and data maps

This feature allows you to refer to complex objects. It is easiest to demonstrate by example. Take the following macro definition:
{
 "ALLPRODUCTCATALOGS" :  [ { "ID" : "A", "DETAILS": "&ALLCATALOGS.ID/A" },
 { "ID" : "B", "DETAILS": "&ALLCATALOGS.ID/B" } ],
 "LANGUAGES" : [
     { "ID" : "fr" },
     { "ID" : "en" }
 ],
 "ALLCATALOGS": [
      { "ID": "A",
        "NAME": [ { "en" : "ProductCatalogA" } ,
                  { "fr" : "CatalogueDeProduitsA" }
            ] } ,
      { "ID": "B",
        "NAME": [  { "en" : "ProductCatalogB" },
                   { "fr" : "CatalogueDeProduitsB" }
         ]  } ]
}
As you can see, this macro definition adds localizations for the catalog names. You can refer to these translations by specifying a language ID in square brackets in the data lines:
INSERT_UPDATE Catalog;id[unique=true];name[lang=%LANGUAGES.ID%]
;%ALLPRODUCTCATALOGS.ID%;%ALLPRODUCTCATALOGS.DETAILS.NAME[LANGUAGES.ID]%;
The template above will generate the following ImpEx file:
INSERT_UPDATE Catalog;id[unique=true];name[lang=fr]
;A;CatalogueDeProduitsA;
;B;CatalogueDeProduitsB;

INSERT_UPDATE Catalog;id[unique=true];name[lang=en]
;A;ProductCatalogA;
;B;ProductCatalogB;
So a macro in the header repeats the whole ImpEx block once per macro value, while a macro in a data line repeats the data lines. The parameter in square brackets can refer only to a macro used in the header.
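The interplay between a header macro and the bracketed selector can be sketched as follows. This is an illustration rather than the actual implementation: it assumes references have already been resolved, that a localized value is a list of single-entry maps as in the definition above, and that `expand_block`, `TOKEN_RE`, `get_path`, and `localized` are hypothetical names. The header macro stays fixed for each copy of the block, while the first non-header macro in a data line drives the line repetition.

```python
import re
from functools import partial

# %MACRO.PATH% with an optional [MACRO.ATTR] selector, for example
# %ALLPRODUCTCATALOGS.DETAILS.NAME[LANGUAGES.ID]%
TOKEN_RE = re.compile(
    r"%([A-Z_][A-Z0-9_]*)((?:\.[A-Z_][A-Z0-9_]*)+)"
    r"(?:\[([A-Z_][A-Z0-9_]*)\.([A-Z_][A-Z0-9_]*)\])?%"
)


def get_path(obj, path):
    """Follow a dotted path such as '.DETAILS.NAME' into an already resolved macro object."""
    for attr in path.split(".")[1:]:
        obj = obj[attr]
    return obj


def localized(entries, key):
    """Pick `key` from a list of single-entry maps like [{"en": ...}, {"fr": ...}]."""
    for entry in entries:
        if key in entry:
            return entry[key]
    raise KeyError(key)


def expand_block(header, data_lines, macros):
    """Repeat the block per value of the header macro and each data line per value of its macro."""
    head = TOKEN_RE.search(header)
    head_name = head.group(1) if head else None
    head_objs = macros[head_name] if head else [None]
    output = []
    for head_obj in head_objs:

        def substitute(match, line_obj):
            # The header macro stays fixed for this copy of the block.
            obj = head_obj if match.group(1) == head_name else line_obj
            value = get_path(obj, match.group(2))
            if match.group(3):
                # The [MACRO.ATTR] selector may only refer to the header macro.
                if match.group(3) != head_name:
                    raise ValueError("selector must refer to the header macro: " + match.group(0))
                value = localized(value, get_path(head_obj, "." + match.group(4)))
            return str(value)

        output.append(TOKEN_RE.sub(partial(substitute, line_obj=head_obj), header))
        for line in data_lines:
            own = [m.group(1) for m in TOKEN_RE.finditer(line) if m.group(1) != head_name]
            line_objs = macros[own[0]] if own else [head_obj]
            for line_obj in line_objs:
                output.append(TOKEN_RE.sub(partial(substitute, line_obj=line_obj), line))
    return output
```

Applied to the template and (resolved) macro definition above, this produces the fr block followed by the en block, in the order of the LANGUAGES list.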

Lists

Lists and maps are converted into strings as comma-separated values:
{
 "ALLPRODUCTCATALOGS" :  [ { "ID" : "A", "DETAILS": "&ALLCATALOGS.ID/A" },
 { "ID" : "B", "DETAILS": "&ALLCATALOGS.ID/B" } ],
 "LANGUAGES" : [
     { "ID" : "fr" },
     { "ID" : "en" }
 ],
 "ALLCATALOGS": [
      { "ID": "A",
        "NAME": [ { "en" : "ProductCatalogA" } ,
                  { "fr" : "CatalogueDeProduitsA" }
            ] } ,
      { "ID": "B",
        "NAME": [  { "en" : "ProductCatalogB" },
                   { "fr" : "CatalogueDeProduitsB" }
         ]  } ]
}
ImpEx template:
INSERT_UPDATE Object;languages 
;%LANGUAGES%;
INSERT_UPDATE Object;languages 
;%LANGUAGES.ID%;
INSERT_UPDATE Object;names 
;%ALLCATALOGS.NAME["en"]%;
INSERT_UPDATE Object;names 
;%ALLCATALOGS.NAME%;
The resulting ImpEx below shows the difference:
INSERT_UPDATE Object;languages 
;fr, en;
INSERT_UPDATE Object;languages 
;fr;
;en;
INSERT_UPDATE Object;names 
;ProductCatalogA;
;ProductCatalogB;
INSERT_UPDATE Object;names 
;ProductCatalogA,CatalogueDeProduitsA;
;ProductCatalogB,CatalogueDeProduitsB;
If a value contains spaces or special characters, it is enclosed in double quotes automatically. For example, if the English catalog names in the macro definition contained spaces, the last block would look like this:
INSERT_UPDATE Object;names 
;"Product Catalog A","CatalogueDeProduitsA";
;"Product Catalog B","CatalogueDeProduitsB";
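The conversion of lists and maps into ImpEx values might look like the sketch below. The quoting rule is an assumption inferred from the example: the article does not say which characters count as special, so the sketch treats spaces, commas, semicolons, and double quotes as special and, as in the output above, quotes every item once any item needs quoting. Embedded double quotes are doubled, CSV style. `render_list` and its helpers are hypothetical names.

```python
SPECIAL_CHARS = ' ,;"'  # assumption: the article does not list the special characters


def needs_quotes(text):
    return any(ch in text for ch in SPECIAL_CHARS)


def quote(text):
    # CSV-style quoting: wrap in double quotes and double any embedded quote.
    return '"' + text.replace('"', '""') + '"'


def flatten(value):
    """Collect the leaf values of nested lists and maps, in order."""
    if isinstance(value, dict):
        return [leaf for item in value.values() for leaf in flatten(item)]
    if isinstance(value, list):
        return [leaf for item in value for leaf in flatten(item)]
    return [str(value)]


def render_list(value, separator=","):
    """Render a list or map as comma-separated text, quoting all items if any item needs it."""
    items = flatten(value)
    if any(needs_quotes(item) for item in items):
        items = [quote(item) for item in items]
    return separator.join(items)
```

For instance, `render_list([{"ID": "fr"}, {"ID": "en"}], separator=", ")` returns `"fr, en"`, the value shown in the first result block above.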

Configuration variables

Some macro values can be taken from configuration properties by using the "&config." prefix:
{
 "ALLPRODUCTCATALOGS" : [
   { "ID" : "A", "NAME" : "&config.catalogA.Name" },
   { "ID" : "B", "NAME" : "&config.catalogB.Name" }
 ]
}
These properties can be defined in project.properties or local.properties:
catalogA.Name=Catalog A Name
catalogB.Name=Catalog B Name
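Resolving &config. values can be sketched as another pass over the macro definition. This sketch uses a deliberately minimal key=value reader that ignores line continuations and escapes of the full .properties format; inside hybris, the values would more naturally come from the platform configuration (for example `de.hybris.platform.util.Config.getParameter`). Raising an error for an undefined property is an assumption, and `load_properties` and `resolve_config` are hypothetical names.

```python
def load_properties(text):
    """Minimal .properties reader: key=value lines, '#' and '!' comments, no continuations."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve_config(value, props):
    """Replace every "&config.<key>" macro value with the property of the same key."""
    if isinstance(value, str) and value.startswith("&config."):
        key = value[len("&config."):]
        if key not in props:
            raise KeyError("undefined configuration property: " + key)  # assumed behavior
        return props[key]
    if isinstance(value, list):
        return [resolve_config(item, props) for item in value]
    if isinstance(value, dict):
        return {k: resolve_config(item, props) for k, item in value.items()}
    return value
```

With the two properties above, the NAME of catalog A resolves to "Catalog A Name".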

Impex Variables and Macros

Because this is a preprocessor, applying the macros must produce a valid ImpEx file. The examples above demonstrated how to use macros in headers and data lines. But how can macros be used in ImpEx variables? For example:
$catalogVersion=catalogVersion(catalog(id), version)[default=%CAT%]
UPDATE Obj; $catalogVersion
Unfortunately, macro substitution does not work when used as above. The main reason is that the preprocessor processes the stream in a single pass, which imposes some limitations. The supported syntax is the following:
$catalogVersion%CAT.ID%=catalogVersion(catalog(id), version)[default=%CAT.ID%]
UPDATE Obj; 
;$catalogVersion%CAT.ID%
Let’s assume we use the following macro definition:
{ "CAT" : [ { "ID" : "A" }, { "ID" : "B" } ] }
In this case, you will have the following ImpEx file as a result:
$catalogVersionA=catalogVersion(catalog(id), version)[default=A]
$catalogVersionB=catalogVersion(catalog(id), version)[default=B]
UPDATE Obj; 
;$catalogVersionA
;$catalogVersionB

Impex Plus Validators

The macros explained above are especially useful for a large number of catalogs or languages. However, the macro definition file tends to grow over time, and it is easy to overlook one or several missing blocks that are supposed to be mandatory. The preprocessor therefore checks the following:
  • Templates:
    • Only allowed characters are used in macros. If a fragment contains any disallowed character, it is not treated as a macro and is processed as plain ImpEx content.
    • Only one macro per line. However, you can use any number of its attributes there.
    • Every macro used in the ImpEx template is defined in the macro definition file. If a macro is present in the template but not defined in the macro definition file, the system throws an error.
  • Macro definitions:
    • The JSON is valid.
    • All macro references (&OBJ.ATTRIBUTE/VALUE) are valid.
    • All macro names and attributes conform to the requirements (allowed characters etc.).
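Several of these checks can be sketched as one function that collects problems instead of stopping at the first one. The naming rule (upper-case letters, digits, underscores) is an assumption, since the allowed characters are not spelled out here; &config. values are skipped by the reference check, and `validate` is a hypothetical name.

```python
import json
import re

# Assumed naming rule; the allowed characters are not listed in the article.
NAME_RE = re.compile(r"^[A-Z_][A-Z0-9_]*$")
# %MACRO%, %MACRO.ATTR...%, optionally followed by a [...] selector.
USAGE_RE = re.compile(r"%([A-Z_][A-Z0-9_]*)(?:\.[A-Z_][A-Z0-9_]*)*(?:\[[^\]%]*\])?%")
# &OBJ.ATTRIBUTE/VALUE
REF_RE = re.compile(r"^&([A-Z_][A-Z0-9_]*)\.([A-Za-z_][A-Za-z0-9_]*)/(.+)$")


def validate(template, definition_text):
    """Check a template against a macro definition; return a list of problems."""
    try:
        macros = json.loads(definition_text)
    except json.JSONDecodeError as err:
        return ["invalid JSON: " + str(err)]
    if not isinstance(macros, dict):
        return ["the macro definition must be a JSON object"]

    problems = [f"bad macro name: {name}" for name in macros if not NAME_RE.match(name)]

    def check_refs(value):
        if isinstance(value, str) and value.startswith("&") and not value.startswith("&config."):
            ref = REF_RE.match(value)
            if ref is None:
                problems.append(f"malformed reference: {value}")
                return
            obj, attr, wanted = ref.groups()
            targets = [o for o in macros.get(obj, []) if isinstance(o, dict)]
            if not any(str(o.get(attr)) == wanted for o in targets):
                problems.append(f"dangling reference: {value}")
        elif isinstance(value, list):
            for item in value:
                check_refs(item)
        elif isinstance(value, dict):
            for item in value.values():
                check_refs(item)

    check_refs(macros)

    for number, line in enumerate(template.splitlines(), start=1):
        names = sorted(set(USAGE_RE.findall(line)))
        if len(names) > 1:
            problems.append(f"line {number}: more than one macro ({', '.join(names)})")
        for name in names:
            if name not in macros:
                problems.append(f"line {number}: undefined macro {name}")
    return problems
```

Returning a list rather than raising lets a single run report every missing block in a large definition file.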

Integrating with Hybris

Unfortunately, the SAP Hybris ImpEx processor is not extensible, and custom ImpEx processors do not help much either. I used an ImpEx converter class that transforms an ImpEx template plus a macro definition into an ImpEx data stream that the default SAP Hybris ImpEx engine can execute. I extended the extension setup class to use this enhanced mechanism instead of the default one, so that the OOTB system updates support ImpEx templates and the macro language.
