Rolling Policy

A rolling policy specifies the size of the window that the operator manages for each output object. When the window closes, the operator closes the current output object and opens a new one. The operator supports three rolling policy types:

  • Size-based (parameter bytesPerObject)
  • Time-based (parameter timePerObject)
  • Tuple count-based (parameter tuplesPerObject)
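
The decision to close the current object can be modeled roughly as shown here. This is an illustrative Python sketch, not the operator's implementation: the class RollingPolicy and its field names are hypothetical stand-ins for the parameters bytesPerObject, timePerObject, and tuplesPerObject, the time unit is assumed to be seconds, and the use of >= as the threshold test is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RollingPolicy:
    """Hypothetical model of a rolling policy; one limit is expected to be set."""
    bytes_per_object: Optional[int] = None      # mirrors bytesPerObject
    time_per_object: Optional[float] = None     # mirrors timePerObject (seconds assumed)
    tuples_per_object: Optional[int] = None     # mirrors tuplesPerObject

    def should_roll(self, bytes_written: int, seconds_open: float,
                    tuples_written: int) -> bool:
        """Return True when the window is full and the object should be closed."""
        if self.bytes_per_object is not None:
            return bytes_written >= self.bytes_per_object
        if self.time_per_object is not None:
            return seconds_open >= self.time_per_object
        if self.tuples_per_object is not None:
            return tuples_written >= self.tuples_per_object
        return False  # no limit configured in this sketch
```

For example, RollingPolicy(tuples_per_object=3) reports that the window is full once three tuples have been written, regardless of the byte count or the elapsed time.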

Object name

The objectName parameter can optionally contain the following variables, which the operator evaluates at runtime to generate the object name:

%TIME is the time when the COS object is created. The default time format is yyyyMMdd_HHmmss.

The variable %TIME can be added anywhere in the path after the bucket name. The variable is typically used to make dynamic object names when you expect the application to create multiple objects.

Here are some examples of valid file paths with %TIME:

  • event%TIME.parquet
  • %TIME_event.parquet
  • /my_new_folder/my_new_file_%TIME.csv
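
The substitution of %TIME can be sketched in Python as a plain string replacement. The function name expand_time is hypothetical, and the strftime pattern is the assumed equivalent of the default format yyyyMMdd_HHmmss; the operator itself may evaluate the variable differently.

```python
from datetime import datetime

# strftime equivalent of the default pattern yyyyMMdd_HHmmss
DEFAULT_TIME_FORMAT = "%Y%m%d_%H%M%S"


def expand_time(object_name: str, created: datetime) -> str:
    """Replace every %TIME occurrence with the formatted creation time."""
    return object_name.replace("%TIME", created.strftime(DEFAULT_TIME_FORMAT))
```

With a creation time of 22 October 2017, 12:49:48, the name /my_new_folder/my_new_file_%TIME.csv becomes /my_new_folder/my_new_file_20171022_124948.csv.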

%OBJECTNUM is the object number, which starts at 0 and is incremented each time a new object is created for writing.

Objects with the same name will be overwritten. Typically, %OBJECTNUM is added after the file name.

Here are some examples of valid file paths with %OBJECTNUM:

  • event_%OBJECTNUM.parquet
  • /geo/uk/geo_%OBJECTNUM.parquet
  • %OBJECTNUM_event.csv
  • %OBJECTNUM_%TIME.csv

Note: If partitioning is used, %OBJECTNUM is managed globally for all partitions in the COS object, rather than independently for each partition.
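
A minimal Python sketch of this numbering follows. The class ObjectNamer is a hypothetical illustration, not part of the toolkit; it keeps a single counter so that, as in the note above, numbers are shared across partitions rather than restarted for each one.

```python
from itertools import count


class ObjectNamer:
    """Hypothetical helper that expands %OBJECTNUM with one global counter."""

    def __init__(self, template: str):
        self.template = template
        self._next_num = count(0)  # starts at 0, shared by all partitions

    def next_name(self) -> str:
        """Return the name for the next object that is opened for writing."""
        return self.template.replace("%OBJECTNUM", str(next(self._next_num)))
```

Calling next_name() twice on ObjectNamer("event_%OBJECTNUM.parquet") yields event_0.parquet and then event_1.parquet. A template without %OBJECTNUM (or another varying variable) would produce the same name each time, which is the case in which objects are overwritten.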

%PARTITIONS places the partitions anywhere in the object name. By default, partitions are placed immediately before the last part of the object name.

Here's an example of the default position of partitions in an object name:

Suppose that the file path is /GeoData/test_%TIME.parquet. Partitions are defined as YEAR, MONTH, DAY, and HOUR.

The object in COS would be /GeoData/YEAR=2014/MONTH=7/DAY=29/HOUR=36/test_20171022_124948.parquet

With %PARTITIONS, you can change the placement of partitions in the object name from the default.

Let's see how the partition placement changes by using %PARTITIONS:

Suppose that the file path now is /GeoData/Asia/%PARTITIONS/test_%TIME.parquet.

The object name in COS would be /GeoData/Asia/YEAR=2014/MONTH=7/DAY=29/HOUR=36/test_20171022_124948.parquet
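
Both placements can be reproduced with a short Python sketch. The function place_partitions is a hypothetical illustration of the rule described in this section, not the operator's code; it assumes that partitions are rendered as NAME=value segments joined by "/" and that %TIME has already been expanded.

```python
def place_partitions(object_name: str, partitions: dict) -> str:
    """Insert NAME=value segments at %PARTITIONS, or before the last path part."""
    part_path = "/".join(f"{name}={value}" for name, value in partitions.items())
    if "%PARTITIONS" in object_name:
        # Explicit placement chosen by the user
        return object_name.replace("%PARTITIONS", part_path)
    # Default placement: immediately before the last part of the object name
    head, sep, last = object_name.rpartition("/")
    return f"{head}{sep}{part_path}/{last}"
```

With the partitions YEAR=2014, MONTH=7, DAY=29, and HOUR=36, the input /GeoData/test_20171022_124948.parquet takes the default placement, and /GeoData/Asia/%PARTITIONS/test_20171022_124948.parquet takes the explicit one, matching the two object names shown in this section.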

Empty partition values

If a partition value is not valid, it is replaced by the string __HIVE_DEFAULT_PARTITION__ in the COS object name.

For example, /GeoData/Asia/YEAR=2014/MONTH=7/DAY=29/HOUR=__HIVE_DEFAULT_PARTITION__/test_20171022_124948.parquet
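
The replacement can be sketched in Python as follows. The function partition_segment is hypothetical, and treating None and the empty string as "not valid" is an assumption made for illustration; the operator's actual validity check is not described in this section.

```python
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def partition_segment(name: str, value) -> str:
    """Render one NAME=value segment, substituting the Hive default if needed."""
    # Assumption: None and empty strings count as "not valid" values.
    if value is None or str(value) == "":
        value = HIVE_DEFAULT_PARTITION
    return f"{name}={value}"
```

Under that assumption, partition_segment("HOUR", None) yields HOUR=__HIVE_DEFAULT_PARTITION__, while partition_segment("DAY", 29) yields DAY=29.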

Further variables for the object name

  • %HOST the host that is running the processing element (PE) of this operator.
  • %PROCID the process ID of the processing element that is running this operator.
  • %PEID the processing element ID.
  • %PELAUNCHNUM the PE launch count.
  • %CHANNEL the channel number of the operator (0..n). If the operator is not part of a parallel region, the variable is replaced with 0.
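
As a rough illustration, these variables could be expanded as in the following Python sketch. The function expand_runtime_vars and its parameters are hypothetical. The host name and process ID are read from the local machine, whereas the PE ID, the launch count, and the channel are passed in by the caller, because in a real application the Streams runtime supplies them.

```python
import os
import socket


def expand_runtime_vars(object_name: str, pe_id: int, pe_launch_num: int,
                        channel: int = 0) -> str:
    """Replace the runtime variables in an object name (illustrative only)."""
    values = {
        "%HOST": socket.gethostname(),
        "%PROCID": str(os.getpid()),
        "%PEID": str(pe_id),
        "%PELAUNCHNUM": str(pe_launch_num),
        "%CHANNEL": str(channel),  # 0 when not in a parallel region
    }
    for variable, value in values.items():
        object_name = object_name.replace(variable, value)
    return object_name
```

For example, expand_runtime_vars("out_%PEID_%PELAUNCHNUM_%CHANNEL.csv", 7, 2) returns out_7_2_0.csv.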