SAP hybris Data Folder Structure. Media folders.


In hybris 6.6, almost all project-specific files are located in a special folder called the “data folder”. The structure of this folder is not well documented, and many of its folders and files have seemingly random names. This does not matter much until you find that the folder has grown too big. Understanding its structure and the purpose of its components will help you troubleshoot and optimize the system. The official hybris documentation explains some technical aspects, but only superficially. In this article, I try to look deeper.

Location

Hybris Data Dir is specified in env.properties:
HYBRIS_BIN_DIR=${platformhome}/../../bin
HYBRIS_CONFIG_DIR=${platformhome}/../../config
HYBRIS_DATA_DIR=${platformhome}/../../data
HYBRIS_LOG_DIR=${platformhome}/../../log
HYBRIS_TEMP_DIR=${platformhome}/../../temp/hybris
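
To see which physical directory such an entry points to, the placeholder has to be expanded and the "../.." segments collapsed. Below is a minimal sketch in plain Java; the class DataDirResolver and the install location /opt/hybris/bin/platform are my assumptions for illustration and are not part of hybris.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of hybris: expands the ${platformhome}
// placeholder and collapses the "../.." segments of an env.properties value.
final class DataDirResolver {

    static Path resolve(String template, String platformHome) {
        String expanded = template.replace("${platformhome}", platformHome);
        return Paths.get(expanded).normalize();
    }

    public static void main(String[] args) {
        // "/opt/hybris/bin/platform" is an assumed install location.
        Path dataDir = resolve("${platformhome}/../../data", "/opt/hybris/bin/platform");
        System.out.println(dataDir);
    }
}
```

With the assumed location, the data directory resolves to the "data" folder two levels above the platform folder.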

Structure

The data folder contains runtime data:
  1. Accelerator Services Batch Impex Base Folder for import and export. By default, it is DATADIR/acceleratorservices/import and DATADIR/acceleratorservices/export. Configurable via acceleratorservices/project.properties.
  2. Tomcat runtime files (see wrapper.conf), such as:
    • status file (hybristomcat.java.status)
    • process id files (hybristomcat.java.idfile and hybristomcat.java.pidfiles)
    • lock file (hybristomcat.lock)
  3. HSQLDB data files (/hsqldb; configurable; platform/project.properties)
  4. SOLR data files (/solr; configurable; solrserver/buildcallbacks.xml)
  5. Swagger Maven plugin output directory for docs (/doc; used by webservicecommons)
  6. Media replication directory (/media; configurable; media.replication.dirs in advanced.properties). Deprecated.
  7. Media read directory (/media; configurable; media.read.dir in advanced.properties)
    1. sys_master/ – for the default tenant “master”
      1. (all files here are referenced files, i.e. for each file there is an object (item) referencing it)
    2. sys_junit/ – for the default tenant “junit”
  8. Lucene Index (/luceneindex; configurable; lucenesearch.indexdir in advanced.properties)
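
When the data folder turns out to be too big, the first question is which of the components listed above takes the space. The following is a rough sketch in plain Java that sums file sizes per direct subdirectory; the class name DataDirUsage and the default path "data" are my assumptions, and nothing here uses hybris APIs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical diagnostic helper, not part of hybris.
final class DataDirUsage {

    // Sums the sizes of all regular files below the given directory.
    static long sizeOf(Path dir) {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(DataDirUsage::fileSize)
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long fileSize(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps each direct subdirectory (media, solr, hsqldb, ...) to its total size in bytes.
    static Map<String, Long> usageByChild(Path dataDir) {
        Map<String, Long> result = new TreeMap<>();
        try (Stream<Path> children = Files.list(dataDir)) {
            children.filter(Files::isDirectory)
                    .forEach(child -> result.put(child.getFileName().toString(), sizeOf(child)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Pass the data directory as the first argument; "data" is only a fallback.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "data");
        usageByChild(dataDir).forEach((name, bytes) ->
                System.out.printf("%-25s %,d bytes%n", name, bytes));
    }
}
```

Running it against the data directory gives one line per subdirectory, which is usually enough to tell whether media, SOLR, or something else is responsible for the growth.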

How to access the data folder from the code

  • MediaUtil.getLocalStorageDataDir() – data directory (Ex.: /data/)
  • MediaUtil.getMediaReadDir() – media subdirectory (Ex.: /data/media/sys_master/)
  • MediaUtil.getSystemDir()
  • MediaUtil.getTenantMediaReadDir() – with a tenant subdirectory (Ex.: /data/media/sys_master/)

Media

Storage strategy

The storage strategy defines how hybris stores and organizes media files such as images or documents. It is set by the configuration parameter “<PREFIX>.storage.strategy”. There are four out-of-the-box storage strategies:
  • LocalFileMediaStorageFactory
  • GridFSMediaStorageFactory
  • S3MediaStorageFactory
  • Windows AzureMediaStorageFactory
In this document, I focus on LocalFileMediaStorageFactory. With this strategy, all data is stored in the local file system, in the locations defined by the media.read.dir and media.replication.dirs configuration properties. See the section “Configuration Parameters Prefix” below for more about what <PREFIX> is.

Local cache

The local cache speeds up delivery of media files by caching them locally. Caching is implemented via the region cache framework. The configuration parameter “<PREFIX>.local.cache” specifies whether data from non-local strategies is cached. The parameter “<PREFIX>.local.cache.rootCacheFolder” specifies the root cache folder for all cached files, and “<PREFIX>.local.cache.maxSize” specifies the maximum size of the media cache (in megabytes). See the section “Configuration Parameters Prefix” below for more about what <PREFIX> is.

Local File Hierarchy

The file name of the media file consists of two parts:
  • base file name part. This name is actually the 13-digit dataPK of the media object.
  • file extension part. The extension is one of the configured extensions, or “bin” for all others. The predefined “allowed” extensions are listed in advanced.properties; see the configuration variables starting with “media.customextension”. Associating the MIME type matters when the file is downloaded. For example, you may want HTML files to be downloaded rather than opened in the browser window.
So you can list the PKs of all stored medias by listing all files in the media folder, and compare them with the PKs stored in the database to see if there are any differences. A typical file name looks like “9593242157086.xml”. The file path consists of two parts:
  • Media folder (such as “hmc”, “images”)
  • Hash hierarchy of the specified depth (such as h04/ha3/). Each component is one of 256 values (h00..hff). The folders are created only if there is at least one file in them.
Typical file paths:
  • hmc/h04/ha3/
  • images/h1a/h12/
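The full path has the form <media read dir>/<tenant dir>/<media folder>/<hash hierarchy>/<dataPK>.<extension>, where the tenant dir is, for example, sys_master.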
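
The naming convention described in this section can be taken apart mechanically, which is handy when you want to collect the dataPKs from a directory listing. Here is a minimal sketch in plain Java, assuming paths relative to the tenant directory; the class MediaPathParser and its regular expression are my own illustration, not hybris code, and the sample path is simply assembled from the examples above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of hybris.
final class MediaPathParser {

    // <media folder>/<hash dirs>/<dataPK>.<extension>, relative to the tenant directory.
    private static final Pattern MEDIA_PATH =
            Pattern.compile("^([^/]+)/((?:h[0-9a-f]{2}/)*)(\\d{1,18})\\.([A-Za-z0-9]+)$");

    static final class MediaPath {
        final String folder;
        final String hashDirs;
        final long dataPk;
        final String extension;

        MediaPath(String folder, String hashDirs, long dataPk, String extension) {
            this.folder = folder;
            this.hashDirs = hashDirs;
            this.dataPk = dataPk;
            this.extension = extension;
        }
    }

    // Returns the parsed parts, or an empty Optional if the path does not follow the convention.
    static Optional<MediaPath> parse(String relativePath) {
        Matcher m = MEDIA_PATH.matcher(relativePath);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new MediaPath(
                m.group(1), m.group(2), Long.parseLong(m.group(3)), m.group(4)));
    }

    public static void main(String[] args) {
        // Illustrative path assembled from the examples in this article, not a real file.
        parse("hmc/h04/ha3/9593242157086.xml").ifPresent(p ->
                System.out.println(p.folder + " | " + p.hashDirs + " | " + p.dataPk + " | " + p.extension));
    }
}
```

Files that do not match the pattern are reported as empty results, so anything unexpected in the media folder stands out instead of being silently parsed.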

Media folders

The media folder is one of the parameters of a media item, set when it is created. There are a number of predefined folders:
  • root
  • hmc – for hmc configuration (XML files)
  • images – for product images (ProductImageMediaService)
  • impex – for impex specific media files such as ImpExMedia items
  • jasperreports – for jasper reports
  • catalogsync – for CatalogVersionSyncScheduleMedia
  • cronjob – for cronjob logs
  • documents
  • email-body
  • email-attachments
  • account-summary
  • couponcodes
  • etc.

Hashing depth

The hashing depth is the number of levels in the hash hierarchy (see “Local File Hierarchy” above); it limits the number of files per directory. By default, the configuration parameter “hashing.depth” is “2”, and you can change it in the configuration. To build a hash value, hybris uses a “salt” parameter; different salt values lead to different hashes. The salt is specified by the configuration parameter “<PREFIX>.storage.location.hash.salt”. Its value is the same for all installations (at least versions 6.2 and 6.6 ship the same value hardcoded in the configuration). See the section “Configuration Parameters Prefix” below for more about what <PREFIX> is.
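
To make the idea of salted hash directories more tangible, here is a small sketch in plain Java. It is not the algorithm hybris uses: hashing the salt plus the dataPK with SHA-256 and taking one digest byte per level is purely my assumption, chosen only because it yields components in the h00..hff range. The class and method names are hypothetical as well.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only: this is NOT the hashing scheme implemented in hybris.
final class HashDirSketch {

    // Builds "hXX/" components, one per level, from a salted digest of the dataPK.
    static String hashDirs(String salt, long dataPk, int depth) {
        checkDepth(depth);
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest((salt + dataPk).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            path.append(String.format("h%02x/", digest[i] & 0xff));
        }
        return path.toString();
    }

    // Upper bound on the number of leaf directories per media folder: 256^depth.
    static long maxLeafDirs(int depth) {
        checkDepth(depth);
        return 1L << (8 * depth);
    }

    private static void checkDepth(int depth) {
        if (depth < 0 || depth > 7) {
            throw new IllegalArgumentException("depth must be between 0 and 7");
        }
    }

    public static void main(String[] args) {
        // "some-salt" is a made-up value, not the salt shipped with hybris.
        System.out.println(hashDirs("some-salt", 9593242157086L, 2));
        System.out.println(maxLeafDirs(2));
    }
}
```

The second method shows why a small depth is enough in practice: with the default depth of 2 there can already be up to 65536 leaf directories per media folder.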

Configuration Parameters Prefix

Configuration settings can be specified globally or for a particular folder, and some values can be marked as defaults. Accordingly, all the configuration parameters mentioned above start with a <PREFIX>, which is one of the following values:
  • media.folder
  • media.default
  • media.globalSettings
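
One way to picture how a folder-specific value and a default value can coexist is a lookup with a fallback. The sketch below is plain Java over java.util.Properties and rests on two assumptions of mine that the text above does not confirm: that folder-specific keys have the form media.folder.<folder name>.<setting>, and that a folder-specific value wins over media.default.<setting>. The class name and the sample values are hypothetical.

```java
import java.util.Properties;

// Illustration only: key layout and lookup order are assumptions, not hybris code.
final class MediaSettingLookup {

    // Returns the folder-specific value if present, otherwise the default, otherwise null.
    static String lookup(Properties props, String folder, String setting) {
        String specific = props.getProperty("media.folder." + folder + "." + setting);
        return specific != null ? specific : props.getProperty("media.default." + setting);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Made-up sample values for the sketch.
        props.setProperty("media.default.hashing.depth", "2");
        props.setProperty("media.folder.images.hashing.depth", "3");

        System.out.println(lookup(props, "images", "hashing.depth"));
        System.out.println(lookup(props, "hmc", "hashing.depth"));
    }
}
```

Under these assumptions the "images" folder would use its own value while every other folder falls back to the default.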

Abandoned medias and other special cases

  • If you remove an object that refers to the media object, the media object won’t be removed.
    • For example, if you remove a component whose media attribute is filled with an image, the component will be removed, but the image won’t. This creates “abandoned medias”: media objects that are not used by anything.
  • Removing and changing media objects:
    • If you remove the only media object for the file, the platform removes the file from the storage.
    • The file is removed from the filesystem only if there is no Media object referring to the file.
    • If you replace the image of an object, the previous version of the image will be removed from the filesystem if no other objects use it. The same applies to any other type of media object.
    • If you clear the image of an object, the old image file will be removed automatically. The same applies to any other type of media object.
    • However, all these actions are performed by RemoveInterceptor, which is not invoked if you explicitly turn it off (for an impex import, for example). In that case, removing or changing objects creates abandoned files.
  • Synchronizations:
    • Synchronized copy of the object refers to the same physical file on the filesystem.
    • If you synchronize media objects from version A to version B, both objects have different media files, and neither file is used by other objects, only the first media file (from A) will remain after synchronization, because the second file (from B) is removed automatically once no object uses it anymore.
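
Abandoned files can be detected by comparing the dataPKs found on disk with the dataPKs known to the database. The following is a minimal sketch of that comparison in plain Java; how the two sets are obtained (a directory listing and a database query) is left out, and the class and method names are my own, not hybris APIs.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of hybris.
final class AbandonedMediaCheck {

    // dataPKs that exist as files but have no matching media item in the database.
    static Set<Long> filesWithoutItem(Set<Long> pksOnDisk, Set<Long> pksInDb) {
        Set<Long> result = new TreeSet<>(pksOnDisk);
        result.removeAll(pksInDb);
        return result;
    }

    // dataPKs referenced in the database for which no file was found on disk.
    static Set<Long> itemsWithoutFile(Set<Long> pksOnDisk, Set<Long> pksInDb) {
        Set<Long> result = new TreeSet<>(pksInDb);
        result.removeAll(pksOnDisk);
        return result;
    }

    public static void main(String[] args) {
        // Made-up PKs, only to show the two directions of the comparison.
        Set<Long> onDisk = new TreeSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> inDb = new TreeSet<>(Arrays.asList(2L, 3L, 4L));

        System.out.println("files without item: " + filesWithoutItem(onDisk, inDb));
        System.out.println("items without file: " + itemsWithoutFile(onDisk, inDb));
    }
}
```

The first set contains candidates for cleanup, while the second points at media items whose files are missing and deserves a closer look before anything is deleted.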
