Operator ObjectStorageScan

Toolkit: com.ibm.streamsx.objectstorage 2.2.5, namespace com.ibm.streamsx.objectstorage

Scans an object storage for object keys that match a specified name pattern. The operator supports basic (HMAC) and IAM authentication.

The ObjectStorageScan operator is similar to the DirectoryScan operator. It repeatedly scans an object storage directory and writes the names of new or modified objects that it finds in the directory to the output port. The initial scan lists all objects in the directory. The operator sleeps between scans.
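
The "new or modified" rule can be illustrated with a small Python model. This is only a sketch of the idea, not the operator's Java implementation: the function name new_or_modified and the listing format (a mapping from object name to modification timestamp) are assumptions made for illustration.

```python
def new_or_modified(previous, current):
    """Return names in `current` that were absent from `previous`
    or carry a newer modification timestamp.

    Both arguments map object name -> modification timestamp
    (hypothetical format, chosen for this sketch).
    """
    return sorted(
        name
        for name, mtime in current.items()
        if name not in previous or mtime > previous[name]
    )


# Initial scan: nothing has been seen yet, so every object is reported.
first = {"logs/a.txt": 100, "logs/b.txt": 105}
print(new_or_modified({}, first))      # ['logs/a.txt', 'logs/b.txt']

# Later scan: b.txt was rewritten and c.txt is new; a.txt is unchanged.
second = {"logs/a.txt": 100, "logs/b.txt": 130, "logs/c.txt": 131}
print(new_or_modified(first, second))  # ['logs/b.txt', 'logs/c.txt']
```

The sketch sorts its result only to make the output deterministic; the operator itself does not promise any particular order.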

Behavior in a consistent region

The operator can participate in a consistent region. The operator can be at the start of a consistent region if there is no input port.

The operator supports periodic and operator-driven consistent region policies. If the consistent region policy is operator driven, the operator initiates a drain after each tuple is submitted. This allows a consistent state to be established after an object is fully processed. If the consistent region policy is periodic, the operator respects the period setting and establishes consistent states accordingly. This means that multiple objects can be processed before a consistent state is established.

At checkpoint, the operator saves the name and modification timestamp of the last submitted object. After an application failure, the operator resubmits all objects that are newer than the object that was last submitted at the checkpoint.
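
As a rough illustration of this recovery rule, the following Python sketch selects the objects to resubmit after a reset. The Checkpoint type, the function name objects_to_resubmit, and the listing format are hypothetical; the sketch also assumes that "newer" means a strictly greater modification timestamp, which the text above does not spell out.

```python
from typing import NamedTuple


class Checkpoint(NamedTuple):
    """State saved at checkpoint (hypothetical model of it)."""
    name: str       # last submitted object name
    modified: int   # its modification timestamp


def objects_to_resubmit(listing, checkpoint):
    """Return object names to submit again after a reset.

    `listing` maps object name -> modification timestamp.
    With no checkpoint, everything is submitted, as in an initial scan.
    Assumption: "newer" means a strictly greater timestamp.
    """
    if checkpoint is None:
        candidates = listing
    else:
        candidates = {n: m for n, m in listing.items() if m > checkpoint.modified}
    # Oldest first, purely to make the sketch's output deterministic.
    return sorted(candidates, key=candidates.get)


listing = {"a.csv": 10, "b.csv": 20, "c.csv": 30}
print(objects_to_resubmit(listing, Checkpoint("b.csv", 20)))  # ['c.csv']
```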

Summary

Ports
This operator has 1 input port and 1 output port.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 13 parameters.

Required: endpoint, objectStorageURI

Optional: appConfigName, credentials, directory, initDelay, maxAttempts, objectStoragePassword, objectStorageUser, pattern, sleepTime, sslEnabled, strictMode

Metrics
This operator reports 1 metric.

Properties

Implementation
Java

Input Ports

Ports (0)

The ObjectStorageScan operator has an optional control input port. You can use this port to change the directory that the operator scans at run time without restarting or recompiling the application. The expected schema for the input port is tuple<rstring directory>, that is, a schema with a single attribute of type rstring. If a scan is in progress when a tuple is received, the scan completes, and a new scan of the newly specified directory starts immediately afterward. If the operator is sleeping, it starts scanning the new directory immediately after it receives the input tuple.
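
The two cases (tuple arrives during a scan, tuple arrives while sleeping) can be modeled as a small state machine. The Python class below is a hypothetical model written for illustration; the class name ScanModel, its methods, and its state names are not part of the toolkit.

```python
class ScanModel:
    """Toy model of how a control tuple changes the scanned directory."""

    def __init__(self, directory="/"):
        self.directory = directory
        self.state = "sleeping"
        self.pending = None   # directory received while a scan was running
        self.scanned = []     # directories scanned so far, in order

    def start_scan(self):
        self.state = "scanning"

    def finish_scan(self):
        self.scanned.append(self.directory)
        if self.pending is not None:
            # A control tuple arrived mid-scan: switch directory and
            # scan again right away instead of sleeping.
            self.directory, self.pending = self.pending, None
            self.state = "scanning"
        else:
            self.state = "sleeping"

    def on_control_tuple(self, directory):
        if self.state == "scanning":
            # Let the running scan complete first.
            self.pending = directory
        else:
            # Sleeping: wake up and scan the new directory immediately.
            self.directory = directory
            self.state = "scanning"


model = ScanModel("/in")
model.start_scan()
model.on_control_tuple("/archive")  # arrives while "/in" is being scanned
model.finish_scan()                 # "/in" completes, "/archive" starts
model.finish_scan()                 # "/archive" completes, operator sleeps
model.on_control_tuple("/new")      # arrives while sleeping
model.finish_scan()
print(model.scanned)                # ['/in', '/archive', '/new']
```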

Output Ports

Assignments
Java operators do not support output assignments.
Ports (0)

The ObjectStorageScan operator has one output port. This port provides tuples of type rstring that are encoded in UTF-8 and represent the object names that are found in the directory, one object name per tuple. The object names do not occur in any particular order.

Parameters

appConfigName

Specifies the name of the application configuration that contains the IBM Cloud Object Storage (COS) IAM credentials. If not set, the default application configuration name is cos. Create a property named cos.creds in the cos application configuration. The value of the cos.creds property should be the raw IBM Cloud Object Storage credentials JSON.

credentials

Specifies the JSON credentials of the IBM Cloud Object Storage (COS) service. The application configuration property cos.creds is ignored when this parameter is set.
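
The precedence between the credentials parameter and the cos.creds application configuration property can be sketched as follows. This is a Python illustration under stated assumptions, not the operator's code: the function name resolve_credentials is hypothetical, the application configuration is modeled as a plain dictionary, and the JSON values are placeholders.

```python
import json


def resolve_credentials(credentials_param, app_config, property_name="cos.creds"):
    """Return the parsed credentials JSON, or None if none is configured.

    The `credentials` parameter wins; the application configuration
    property is consulted only when the parameter is not set.
    What the operator does when neither is set is not modeled here.
    """
    raw = credentials_param
    if raw is None:
        raw = app_config.get(property_name)
    return None if raw is None else json.loads(raw)


# Placeholder values, for illustration only.
app_config = {"cos.creds": '{"apikey": "from-app-config"}'}

print(resolve_credentials(None, app_config))
print(resolve_credentials('{"apikey": "from-parameter"}', app_config))
```

In the second call the application configuration value is never read, which mirrors the "is ignored" wording above.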

directory

Specifies the name of the directory to be scanned. The directory is always interpreted in the context of the bucket or container. If not specified, the root directory '/' is used.

endpoint

Specifies the endpoint for the connection to Cloud Object Storage (COS). For example, for S3 the endpoint might be 's3.amazonaws.com'. The default value is the IBM Cloud Object Storage (COS) public endpoint 's3.us.cloud-object-storage.appdomain.cloud'.

initDelay

Specifies the time to wait in seconds before the operator scans the bucket directory for the first time. The default value is 0.

maxAttempts

Specifies the number of times that the operator retries after an error. The default value is 20.
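
A bounded retry loop of this kind might look like the Python sketch below. It is an illustration only: the function name call_with_retries is hypothetical, no backoff between attempts is modeled, and the sketch assumes that the limit counts total attempts, which the description above does not state.

```python
def call_with_retries(operation, max_attempts=20):
    """Call `operation` until it succeeds or `max_attempts` is reached.

    Assumption: the limit counts total attempts, including the first.
    Re-raises the last error when every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return operation()
        except Exception as error:  # broad on purpose: this is a sketch
            last_error = error
    raise last_error


attempts = []


def flaky_listing():
    """Hypothetical request that fails twice before succeeding."""
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("transient failure")
    return ["a.csv", "b.csv"]


print(call_with_retries(flaky_listing, max_attempts=5))  # ['a.csv', 'b.csv']
```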

objectStoragePassword

Specifies the password for the connection to Cloud Object Storage (COS), also known as 'SecretAccessKey' for S3-compliant COS.

objectStorageURI

Specifies the URI for the connection to Cloud Object Storage (COS). For S3-compliant COS, the URI should be in the format 'cos://bucket/' or 's3a://bucket/'. The bucket or container must exist. The operator does not create a bucket or container.
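
To make the URI format concrete, the following Python sketch splits a URI into scheme and bucket and rejects anything that does not follow the two formats named above. The function name parse_object_storage_uri is hypothetical, the sketch assumes that the authority part of the URI is exactly the bucket name, and the operator may accept forms that this check rejects.

```python
from urllib.parse import urlsplit


def parse_object_storage_uri(uri):
    """Split 'cos://bucket/' or 's3a://bucket/' into (scheme, bucket).

    Assumption: the authority part of the URI is exactly the bucket name.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("cos", "s3a"):
        raise ValueError(f"unsupported scheme in {uri!r}")
    if not parts.netloc:
        raise ValueError(f"missing bucket in {uri!r}")
    return parts.scheme, parts.netloc


print(parse_object_storage_uri("cos://mybucket/"))  # ('cos', 'mybucket')
print(parse_object_storage_uri("s3a://mybucket/"))  # ('s3a', 'mybucket')
```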

objectStorageUser

Specifies the user name for the connection to Cloud Object Storage (COS), also known as 'AccessKeyID' for S3-compliant COS.

pattern

Limits the object names that are listed to the names that match the specified regular expression. The operator ignores object names that do not match the specified regular expression. If not specified, then the pattern .* is taken as default.
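
The filtering effect of a pattern can be tried out with a short Python sketch. Note the assumptions: the operator is implemented in Java, so its regular expression dialect may differ in detail from Python's re module, and the sketch assumes that the whole object name must match (re.fullmatch) rather than a substring. The function name filter_names and the sample names are hypothetical.

```python
import re


def filter_names(names, pattern=".*"):
    """Keep only the names that match `pattern` in full.

    The default '.*' matches every name, mirroring the documented default.
    """
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


names = ["data_01.csv", "data_02.csv", "readme.txt"]
print(filter_names(names, r"data_\d+\.csv"))  # ['data_01.csv', 'data_02.csv']
print(filter_names(names))                    # all three names
```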

sleepTime

Specifies the minimum time between bucket directory scans. The default value is 5.0 seconds.

sslEnabled

Enables or disables SSL connections to S3. The default value is true.

strictMode

Specifies whether the operator reports an error if the bucket directory to be scanned does not exist.

Metrics

nScans - Counter

The number of times the operator scans the directory.

Libraries

Operator class library
Library Path: ../../impl/lib/com.ibm.streamsx.objectstorage.jar, ../../opt/*, ../../opt/downloaded/*