Operator S3ObjectStorageScan (namespace com.ibm.streamsx.objectstorage.s3, toolkit com.ibm.streamsx.objectstorage 2.2.5)
The operator scans an S3-compliant object storage for object names that match a specified key name pattern.
The S3ObjectStorageScan operator is similar to the DirectoryScan operator. It repeatedly scans an object storage directory and writes the names of new or modified objects that it finds in the directory to the output port. The initial scan lists all objects in the directory. The operator sleeps between scans.
The operator can participate in a consistent region. The operator can be at the start of a consistent region if there is no input port.
The operator supports periodic and operator-driven consistent region policies. If the consistent region policy is operator-driven, the operator initiates a drain after each tuple is submitted. This allows a consistent state to be established after an object is fully processed. If the consistent region policy is periodic, the operator respects the period setting and establishes consistent states accordingly. This means that multiple objects can be processed before a consistent state is established.
At each checkpoint, the operator saves the name and modification timestamp of the last submitted object. After an application failure, the operator resubmits all objects that are newer than the object recorded at the last checkpoint.
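The consistent region behavior described above can be sketched in SPL as follows. This is a minimal illustration, not a definitive configuration: the composite name, stream names, the output attribute name objectName, and the submission-time value names are assumptions chosen for the example.

```spl
use com.ibm.streamsx.objectstorage.s3::S3ObjectStorageScan;

composite ScanConsistent {
    graph
        // An application with a consistent region needs a JobControlPlane operator.
        () as JCP = JobControlPlane() {}

        // Operator-driven policy: the scan operator initiates a drain after
        // each submitted tuple, so each object gets its own consistent state.
        // For the periodic policy, use for example:
        //   @consistent(trigger = periodic, period = 30.0)
        @consistent(trigger = operatorDriven)
        stream<rstring objectName> ObjectNames = S3ObjectStorageScan() {
            param
                // Credential and bucket values are read at submission time
                // (the value names are hypothetical).
                accessKeyID     : getSubmissionTimeValue("accessKeyID");
                secretAccessKey : getSubmissionTimeValue("secretAccessKey");
                bucket          : getSubmissionTimeValue("bucket");
        }

        // Downstream consumer that is part of the same consistent region.
        () as Printer = Custom(ObjectNames) {
            logic
                onTuple ObjectNames : println(objectName);
        }
}
```

The operator has no input port here, so it can act as the start of the region. With the periodic policy, several object names may be submitted between two consistent states.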
Required: accessKeyID, bucket, secretAccessKey
Optional: directory, endpoint, initDelay, maxAttempts, pattern, protocol, sleepTime, sslEnabled, strictMode
The S3ObjectStorageScan operator has an optional control input port. You can use this port to change the directory that the operator scans at run time without restarting or recompiling the application. The expected schema for the input port is tuple<rstring directory>, that is, a schema with a single attribute of type rstring. If a scan is in progress when a tuple is received, the scan completes and a new scan of the newly specified directory starts immediately afterward. If the operator is sleeping, it starts scanning the new directory immediately after it receives the input tuple.
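As an illustration of the control port, the following SPL sketch switches the scanned directory once at run time. The Beacon source, the directory values "/incoming" and "/archive", the 60-second delay, the attribute name objectName, and the submission-time value names are all assumptions for the example; a real application would typically derive control tuples from some external trigger.

```spl
use com.ibm.streamsx.objectstorage.s3::S3ObjectStorageScan;

composite ScanWithControl {
    graph
        // Hypothetical control source: after 60 seconds, emit one tuple
        // that names the new directory to scan.
        stream<rstring directory> Control = Beacon() {
            param
                iterations : 1u;
                initDelay  : 60.0;
            output
                Control : directory = "/archive";
        }

        // The scan starts in "/incoming" and moves to the directory named
        // in each control tuple as soon as the tuple is received.
        stream<rstring objectName> ObjectNames = S3ObjectStorageScan(Control) {
            param
                accessKeyID     : getSubmissionTimeValue("accessKeyID");
                secretAccessKey : getSubmissionTimeValue("secretAccessKey");
                bucket          : getSubmissionTimeValue("bucket");
                directory       : "/incoming";
        }

        () as Printer = Custom(ObjectNames) {
            logic
                onTuple ObjectNames : println(objectName);
        }
}
```

The control stream schema matches tuple<rstring directory> as required by the input port.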
The S3ObjectStorageScan operator has one output port. This port provides tuples of type rstring that are encoded in UTF-8 and represent the object names that are found in the directory, one object name per tuple. The object names do not occur in any particular order.
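A minimal invocation that uses only the required parameters and consumes the output port might look like the sketch below. The composite and stream names, the attribute name objectName, and the submission-time value names are assumptions for the example.

```spl
use com.ibm.streamsx.objectstorage.s3::S3ObjectStorageScan;

composite ScanBasic {
    graph
        // One output tuple per object name found in the scanned directory.
        stream<rstring objectName> ObjectNames = S3ObjectStorageScan() {
            param
                accessKeyID     : getSubmissionTimeValue("accessKeyID");
                secretAccessKey : getSubmissionTimeValue("secretAccessKey");
                bucket          : getSubmissionTimeValue("bucket");
        }

        // Print each name; the names arrive in no particular order.
        () as Printer = Custom(ObjectNames) {
            logic
                onTuple ObjectNames : println(objectName);
        }
}
```

Because the names arrive in no particular order, downstream logic should not rely on ordering, for example by name or by modification time.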
accessKeyID: Specifies the access key ID for the S3 account.
bucket: Specifies the bucket to scan. The bucket must exist; the operator does not create it.
directory: Specifies the name of the directory to scan. The directory is always interpreted in the context of the bucket or container. If not specified, the root directory '/' is used.
endpoint: Specifies the endpoint for the connection to object storage. For example, for S3 the endpoint might be 's3.amazonaws.com'. The default value is the IBM Cloud Object Storage (COS) public endpoint 's3.us.cloud-object-storage.appdomain.cloud'.
initDelay: Specifies the time to wait, in seconds, before the operator scans the bucket directory for the first time. The default value is 0.
maxAttempts: Specifies the number of times the operator retries after an error. The default value is 20.
pattern: Limits the object names that are listed to the names that match the specified regular expression. The operator ignores object names that do not match the specified regular expression. If not specified, the pattern .* is used.
protocol: Specifies the protocol to use for communication with object storage. Supported values are s3a and cos. The default value is s3a.
secretAccessKey: Specifies the secret access key for the S3 account.
sleepTime: Specifies the minimum time between bucket directory scans. The default value is 5.0 seconds.
sslEnabled: Enables or disables SSL connections to S3. The default value is true.
strictMode: Specifies whether the operator reports an error if the bucket directory to be scanned does not exist.
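The following SPL sketch shows how some of the optional parameters might be combined. It is an illustration under assumptions: the literal types (a floating-point value for sleepTime, a boolean for strictMode) are inferred from the descriptions above, and the pattern, the attribute name objectName, and the submission-time value names are hypothetical.

```spl
use com.ibm.streamsx.objectstorage.s3::S3ObjectStorageScan;

composite ScanCsvObjects {
    graph
        stream<rstring objectName> CsvObjects = S3ObjectStorageScan() {
            param
                accessKeyID     : getSubmissionTimeValue("accessKeyID");
                secretAccessKey : getSubmissionTimeValue("secretAccessKey");
                bucket          : getSubmissionTimeValue("bucket");
                // Documented default endpoint, written out explicitly.
                endpoint        : "s3.us.cloud-object-storage.appdomain.cloud";
                // Scan the root directory of the bucket (the documented default).
                directory       : "/";
                // Only list object names that end in ".csv".
                pattern         : ".*\\.csv";
                // Wait at least 30 seconds between scans (assumed to be float64).
                sleepTime       : 30.0;
                // Report an error if the directory does not exist (assumed boolean).
                strictMode      : true;
        }

        () as Printer = Custom(CsvObjects) {
            logic
                onTuple CsvObjects : println(objectName);
        }
}
```

The backslash in the pattern is doubled because it appears inside an SPL string literal; the operator receives the regular expression .*\.csv.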