Reference information for IBM Streams Runner for Apache Beam
Learn about the package contents and pipeline options for IBM® Streams Runner for Apache Beam.
Package contents for Streams Runner
The Streams Runner package contains the following directories:
- `com.ibm.streams.beam`: The IBM Streams Runner for Apache Beam toolkit, which you can use to submit Apache Beam 2.4 applications to the IBM Streams runtime environment.
- `examples`: Toolkit sample applications. For information about the samples, see the README file in the `examples` directory.
Pipeline Options
General pipeline options
Parameter | Description | Default value |
---|---|---|
`runner` | The pipeline runner to use. Use this option to determine the pipeline runner at run time. | Set this option to `StreamsRunner` to run with IBM Streams. |
`streaming` | A flag to indicate whether streaming mode is enabled (`true`). Note: IBM Streams is a pure streaming engine and does not have a discrete batch-processing mode. For this reason, this parameter is ignored and is automatically set to `true`. | `true` |
`jobName` | The name of the job. | Defaults to a Beam-generated string. |
`appName` | The name of the app for display purposes. | Defaults to the class name of the PipelineOptions creator. |
Streams Runner pipeline options
Parameter | Description | Default value |
---|---|---|
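`contextType` | The mode to run the application in. The context types covered in this reference are `STREAMING_ANALYTICS_SERVICE` and `DISTRIBUTED`. | `STREAMING_ANALYTICS_SERVICE` |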
`jarsToStage` | A colon-separated list of JAR files that are required to run the Apache Beam application. Include the JAR files that contain your program and any dependencies. (You don't need to include Beam Google IO SDK or core Beam JAR files.) The listed JAR files are added to the SAB file. Globs (wildcards) can be used to specify files (for example, `foo/bar/*.jar`). However, globs in directory paths are not supported (for example, `**/*.jar`). Note: Using fat or uber JAR files can reduce the number of JAR files that must be specified, but take care not to include JAR files that are provided by Streams Runner. Including redundant dependencies can increase the size of the application archive and can negatively impact submission times to IBM Cloud. | [null] |
`filesToStage` | A JSON string that maps local file paths (keys) to destination paths (values). The local files are included in the SAB file and are accessible to the application through the destination paths. For example: `--filesToStage="{\"/path/to/local/file\": \"input/file1.in\", \"path/to/another/local/file\": \"env.conf\"}"`. In this example, the two local files are added to the bundle at the specified destination paths, and the application can then access them by using the `streams://input/file1.in` and `streams://env.conf` paths. | [null] |
`beamToolkitDir` | The location of the Streams Runner toolkit. Use this option to explicitly specify the Streams Runner toolkit location. | Defaults to the path of the `com.ibm.streams.beam.translation.jar` file in the Java™ class path. |
`tracingLevel` | Sets the tracing and logging level of StreamsRunner translation and runtime. If specified, overrides all other tracing options. Levels: `ERROR`, `WARN`, `INFO`, `DEBUG`, `TRACE`. | [null] |
`traceTranslation` | Sets the translation trace level for the application. Specify a single level (`ERROR`, `WARN`, `INFO`, `DEBUG`, or `TRACE`) to set all loggers to that level, set levels for individual components, or combine both in a comma-separated list. Components include `Runner` and `Streams`. For example, to set Streams Runner to `WARN` and all other loggers to `INFO`, specify `INFO,Runner=WARN`. | `Runner=INFO,Streams=WARN` |
`traceRuntime` | Sets the runtime trace level for the application. See `traceTranslation` for more information. | `WARN` |
`bundleSize` | Controls the maximum number of data tuples in every bundle. Applications should make sure that each bundle does not exceed 2 GB. | 1 |
`bundleMillis` | Controls the maximum time delay of every bundle, in milliseconds. Applications should make sure that each bundle does not exceed 2 GB. | 100 |
`parallelWidths` | Experimental. Sets the parallelism for the entire Beam pipeline or for individual transforms, identified by transform step names. A default parallel width for the pipeline is specified as a plain number with no step name, and a list of step paths with widths gives the widths for matching steps. If not specified, the parallel width for all transforms is 1. Step matching is done the same way that step names are matched in Beam `MetricsFilter`. If a step name matches multiple paths, the first match is used. Likewise, if a default is given multiple times, the first one is used. For example, the following configuration sets the default (entire pipeline parallelism) to width 2, but steps (transforms) that contain the subpath `Device_3/Map` have width 3: `--parallelWidths=2,Device_3/Map=3`. Because the first match is taken, the parallel width for `Device_3/Map` is still 3 in the following configuration: `--parallelWidths=2,Device_3/Map=3,Map=4`. For Source transforms, the runner attempts to match the configured parallel width by calling the `split` API. For `UnboundedSource`, the runner uses the specified parallel width as the `desiredNumSplits` argument in `split`. For `BoundedSource`, the runner uses the specified parallel width to calculate the `desiredBundleSizeBytes`. However, the `split` API does not guarantee that the desired widths are respected. Some sources might be unsplittable, and some split only to a certain number of sub-sources. The source parallel width is set to either the specified width or the number of sub-sources, whichever is smaller. If there are more sub-sources than the specified width, one channel can contain multiple sub-source instances. For KV `PCollection`s, the same key is guaranteed to always be processed in the same parallel instance, which correctly preserves per-key state. Dynamically changing the parallel width at run time (possible since IBM Streams release 4.3.0) breaks Beam pipelines. | [null] |
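The `filesToStage` value is easy to get wrong when the JSON is escaped by hand. The following is a minimal Python sketch, not part of Streams Runner, that builds the option from a mapping and derives the `streams://` paths described in the table. The helper names `files_to_stage_option` and `streams_uri` are hypothetical and exist only for this illustration.

```python
import json
import shlex


def files_to_stage_option(mapping):
    """Build a --filesToStage argument from {local_path: destination_path}.

    Hypothetical helper: it only serializes the mapping as JSON, which is
    the format that the filesToStage description asks for.
    """
    return "--filesToStage=" + json.dumps(mapping)


def streams_uri(destination):
    """Return the streams:// path for a staged file's destination path."""
    return "streams://" + destination


if __name__ == "__main__":
    mapping = {
        "/path/to/local/file": "input/file1.in",
        "path/to/another/local/file": "env.conf",
    }
    option = files_to_stage_option(mapping)
    # shlex.quote produces a form that can be pasted into a POSIX shell.
    print(shlex.quote(option))
    for destination in mapping.values():
        print(streams_uri(destination))
```

Generating the JSON with `json.dumps` avoids manual backslash escaping; the shell quoting step is only needed when the option is pasted into a command line rather than passed as an argument list.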
For the full list of pipeline options, enter `--help=StreamsPipelineOptions` on the Beam application command line.
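The first-match rule for `parallelWidths` can be illustrated with a short Python sketch. This is not the runner's implementation: the function names are hypothetical, and the sub-path matching is an assumption modeled on the description above (a path matches when it appears in the step name aligned on `/` boundaries).

```python
def sub_path_matches(step_name, path):
    """Return True if path occurs in step_name aligned on '/' boundaries.

    Assumption: this approximates the MetricsFilter-style step matching
    that the parallelWidths description refers to.
    """
    start = step_name.find(path)
    while start != -1:
        end = start + len(path)
        starts_ok = start == 0 or step_name[start - 1] == "/"
        ends_ok = end == len(step_name) or step_name[end] == "/"
        if starts_ok and ends_ok:
            return True
        start = step_name.find(path, start + 1)
    return False


def parse_parallel_widths(value):
    """Split 'N,step/path=M,...' into (default_width, [(path, width), ...])."""
    default = None
    step_widths = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if "=" in entry:
            path, width = entry.rsplit("=", 1)
            step_widths.append((path.strip(), int(width)))
        elif default is None:  # the first default given wins
            default = int(entry)
    # Without an explicit default, every transform has width 1.
    return (1 if default is None else default), step_widths


def width_for_step(step_name, value):
    """Return the width for a step: first matching path, else the default."""
    default, step_widths = parse_parallel_widths(value)
    for path, width in step_widths:
        if sub_path_matches(step_name, path):
            return width
    return default


def source_width(specified_width, num_sub_sources):
    """Source width is the smaller of the requested width and the splits."""
    return min(specified_width, num_sub_sources)


if __name__ == "__main__":
    print(width_for_step("Device_3/Map", "2,Device_3/Map=3,Map=4"))
    print(width_for_step("Device_1/Map", "2,Device_3/Map=3,Map=4"))
```

In this sketch, listing the more specific path (`Device_3/Map`) before the general one (`Map`) matters, because the order of entries decides which width applies.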
STREAMING_ANALYTICS_SERVICE context-specific pipeline options
Parameter | Description | Default value |
---|---|---|
`vcapServices` | The location of the Streaming Analytics VCAP file. This parameter is required when you use the `STREAMING_ANALYTICS_SERVICE` context type. This parameter can be omitted if the `$VCAP_SERVICES` environment variable is set to the path of the file. | [null] |
`serviceName` | The name of the Streaming Analytics service on IBM Cloud. This parameter is required when you use the `STREAMING_ANALYTICS_SERVICE` context type. | [null] |
DISTRIBUTED context-specific pipeline options
Parameter | Description | Default value |
---|---|---|
`restUrl` | The URL for the REST API when you use the `DISTRIBUTED` context type. This parameter is required if you implement metrics in the application. | The return value of the `streamtool geturl` command. |
`userName` | The user name for basic authentication with the REST API when you use the `DISTRIBUTED` context type. | The `user.name` system property. |
`userPassword` | A path to a file that contains the user password (recommended) or a string that contains the user password for basic authentication with the REST API when you use the `DISTRIBUTED` context type. If the option value is a path to a readable file, the first line of the file is used as the password. | [null] |
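The two ways of supplying `userPassword` (a file path or a literal string) can be sketched as follows. This is an illustrative Python sketch of the rule stated in the table, not the runner's code; the function name `resolve_password` is hypothetical, and stripping the trailing line ending from the first line is an assumption.

```python
import os


def resolve_password(option_value):
    """Interpret a userPassword-style value.

    If the value is a path to a readable file, use the first line of that
    file as the password; otherwise treat the value itself as the password.
    """
    if os.path.isfile(option_value) and os.access(option_value, os.R_OK):
        with open(option_value, encoding="utf-8") as handle:
            # Assumption: the line ending is not part of the password.
            return handle.readline().rstrip("\r\n")
    return option_value
```

Keeping the password in a file, as the table recommends, keeps it out of the shell history and the process argument list.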
Environment Variables
These environment variables are not required for Streams Runner to work; however, they can be used for convenience when you launch your Beam application.
Name | Description | Notes |
---|---|---|
`STREAMS_RUNNER_HOME` | The absolute path to the extraction location of the `com.ibm.streams.beam-…` package. | Set by using one of the following methods: |
`STREAMS_BEAM_TOOLKIT` | The path to the Streams Runner toolkit (`$STREAMS_RUNNER_HOME/com.ibm.streams.beam`). | Set by using one of the following methods: |
`VCAP_SERVICES` | The path to the IBM Cloud credentials file. If this environment variable is set, the `--vcapServices` parameter does not need to be specified on the command line. For more information about the credentials file, see Creating a credentials file for your Streaming Analytics service. | Set by using the `export` command. |
`STREAMING_ANALYTICS_SERVICE_NAME` | The name of the Streaming Analytics service in the IBM Cloud credentials file to use. If this environment variable is set, the `--serviceName` parameter does not need to be specified on the command line. | Set by using the `export` command. |
`STREAMS_INSTALL` | The path to the local IBM Streams installation on your system. Only set if submitting an application to a local Streams environment. | Important: If this variable exists, you must use the `unset` command to unset it before you can submit an application to the Streaming Analytics service. |
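The relationship between the `--vcapServices` and `--serviceName` parameters and their environment variables can be sketched in Python. This is a hedged illustration, not Streams Runner code: the function names are hypothetical, and giving the command-line value precedence over the environment variable is an assumption that this reference does not state.

```python
import os


def resolve_option(cli_value, env_name, environ=None):
    """Return the command-line value if given, else the environment value."""
    environ = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    return environ.get(env_name) or None


def resolve_service_settings(vcap_services=None, service_name=None, environ=None):
    """Resolve the two settings needed for STREAMING_ANALYTICS_SERVICE."""
    vcap = resolve_option(vcap_services, "VCAP_SERVICES", environ)
    name = resolve_option(service_name, "STREAMING_ANALYTICS_SERVICE_NAME", environ)
    missing = [
        label
        for label, value in (("vcapServices", vcap), ("serviceName", name))
        if value is None
    ]
    if missing:
        raise ValueError("missing required setting(s): " + ", ".join(missing))
    return vcap, name


def streams_install_is_set(environ=None):
    """True if STREAMS_INSTALL exists and must be unset before submitting
    to the Streaming Analytics service."""
    environ = os.environ if environ is None else environ
    return "STREAMS_INSTALL" in environ
```

A launcher script could call `streams_install_is_set()` first and stop with a message that asks the user to run `unset STREAMS_INSTALL` before it submits to the Streaming Analytics service.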