Java API to allow creation of streaming applications for IBM Streams by Java & Scala developers.

See: Description

Packages 
Package Description
com.ibm.streamsx.rest
Java bindings for the REST APIs of IBM Streams and Streaming Analytics service.
com.ibm.streamsx.rest.build
Java binding for the Build REST APIs of IBM Cloud Pak for Data.
com.ibm.streamsx.rest.internal  
com.ibm.streamsx.rest.internal.icp4d  
com.ibm.streamsx.topology
Java application API for IBM Streams.
com.ibm.streamsx.topology.consistent
At least once processing using consistent regions.
com.ibm.streamsx.topology.context
Context for executing IBM Streams Java topologies.
com.ibm.streamsx.topology.context.local  
com.ibm.streamsx.topology.file
File related streams.
com.ibm.streamsx.topology.function
Classes that provide the mechanisms to implement functional transformations on streams.
com.ibm.streamsx.topology.inet
Network access to streaming data.
com.ibm.streamsx.topology.jobconfig  
com.ibm.streamsx.topology.json
JSON streams.
com.ibm.streamsx.topology.logic
Utilities and implementations of functional logic.
com.ibm.streamsx.topology.messaging.kafka
Support for integrating with the Apache Kafka messaging system http://kafka.apache.org.
com.ibm.streamsx.topology.messaging.mqtt
Support for integrating with the MQTT messaging system http://mqtt.org.
com.ibm.streamsx.topology.spi
SPI for adding extensions to the functional api.
com.ibm.streamsx.topology.spi.builder  
com.ibm.streamsx.topology.spi.runtime  
com.ibm.streamsx.topology.spl
Integration between Streams SPL and Java applications.
com.ibm.streamsx.topology.streams
Utilities to create and transform streams.
com.ibm.streamsx.topology.tester
Support for testing of topologies.
com.ibm.streamsx.topology.tester.junit  
com.ibm.streamsx.topology.tester.spl  
com.ibm.streamsx.topology.tuple
Interfaces for tuple types.
Java API to allow creation of streaming applications for IBM Streams by Java & Scala developers.

Overview

IBM® Streams is an advanced analytic platform that allows user-developed applications to quickly ingest, analyze and correlate information as it arrives from thousands of real-time sources. Streams can handle very high data throughput rates, millions of events or messages per second.
With this API Java & Scala developers can build streaming applications that can be executed using IBM Streams, including the processing being distributed across multiple computing resources (hosts or machines) for scalability.

Java Application API

The fundamental building block of a Java Streams application is a stream, which is continuous sequence of tuples (messages, events, records). The API provides the ability to perform some form of processing or analytics on each tuple as it appears on a stream, resulting in a new stream containing tuples representing the result of the processing against the input tuples.
Source streams are streams containing tuples from external systems, for example a source stream may be created from a reading messages from a message queue system, such as MQTT. The purpose of a source stream is to bring the external data into the Streams environment, so that it can be processed, analyzed, correlated with other streams, etc.
Streams are terminated using sink functions that typically deliver tuples to external systems, such as real-time dashboards, SMS alerts, databases, HDFS files, etc.

An application is represented by a Topology object containing instances of TStream. The Java interface TStream is a declaration of a stream of tuples, each tuple being an instance of the Java class or interface T. For example TStream<String> represents a stream where each tuple will be String object, while, for example, TStream<CallDetailRecord> represents a stream of com.sometelco.switch.CallDetailRecord tuples. Thus, tuples on streams are Java objects, rather than SPL tuples with an attribute based schema.

Streams are created (sourced), transformed or terminated (sinked) generally though functions, where a function is represented by an instance of a Java class with a single method. Frequently, these functions are implemented by anonymous classes specific to an application, though utility methods may encapsulate one or more functions. Here is an example of filtering out all empty strings from stream s of type String


TStream<String> s = ...
TStream<String> filtered = s.filter(new Predicate<String>() {
             @Override
             public boolean test(String tuple) {
                 return !tuple.isEmpty();
             }} );

s.filter() is passed an instance of Predicate, and sets up a filter where the output stream filtered will only contain tuples from the input stream s if the method test() returns true. This implementation of Predicate, provided as an anonymous class, returns true if the input tuple (a String object) is not empty. At runtime, every String tuple that appears on s will result in a call to test, if the String tuple is not empty, then filtered will contain the String tuple.

Java 8

With Java 8 lambda expressions or method references can be used to provide the function. Using a lambda expression the above example simplifies to:

TStream<String> s = ...
TStream<String> filtered = s.filter(tuple -> !tuple.isEmpty());


Java 8 is supported by IBM Streams 4.0.1 and later.

Scala Support

Since Scala interoperates with Java classes, applications are implemented in Scala by having the code simply call the Java Application API. A set of implicit conversions are provided to support use of Scala functions in stream transformations. See the Scala documentation under com.ibm.streamsx.topology/doc/scaladoc/index.html.

The API is provided as as an SPL toolkit com.ibm.streamsx.topology containing the Java API in lib/com.ibm.streamsx.topology.jar as well as the SPL operators used to execute the Java functional transformations.

Features

These features are supported:
FeatureReferenceSince
Tuple types are Java and/or Scala objects.TStream1.0
Functional programming, streams are transformed, filtered etc. by functional transformations implemented as Java and/or Scala functions. A Java function is an implementation of interface with a single method, or when using Java 8 a lambda expression or a method reference.TStream1.0
Execution within the Java virtual machine, IBM Streams 4.0.1+ Streams standalone or distributed & IBM BluemixStreamsContext1.0
Pipeline topologies.Topology1.0
Fan-out, multiple independent functions may be applied to a single stream to produce multiple streams of different or the same type.TStream1.0
Fan-in, multiple independent streams of the same type may transformed by a single function to produce a single stream.union1.0
Window based aggregation and joins, including partitioning.TWindow1.0
Parallel streams (UDP, User Defined Parallelism), including partitioning.parallel1.0
Topic based publish-subscribe stream model for cross application communication (Streams dynamic connections). publish, subscribe1.0
Ability to specify where portions of the topology will execute in distributed mode, including running on resources (hosts) with specified tags. Placeable, isolate, lowLatency1.0
Integration with Apache Kafka and MQTT messaging systems Kafka, MQTT1.0
Testing of topologies, including those using SPL operators, while running in distributed, standalone or embedded.Tester1.0
Integration with SPL streams using SPL attribute schemas.SPLStream1.0
Invocation of existing SPL primitive or composite operators.SPL1.0

Samples

A number of sample Java applications are provided under samples. The samples declare a topology and then execute it, the Apache Ant build.xml file includes some targets for executing the samples, demonstrating the correct class path.
The javadoc for the samples includes the sample source code (click on the class name of a sample), and is also copied into the SPL toolkit for reference, and is available here: Java Functional Samples

Declaring a Topology

Java code is used to create a streaming topology, or graph, starting with the Topology object and then creating instances of TStream by:
  • • Invoking methods such as source to produce a source stream.
  • • Invoking utility static methods that declare a source stream, such as those in BeaconStreams.
  • • Invoking methods such as filter to produce a stream derived from another stream.

Streams are terminated by sinks, typically the tuples are sent to an external system by the sink function.

A topology may be arbitrarily complex, including multiple sources and sinks, fan-out on any stream by having multiple functional transformations or fan-in by creating a union of streams with identical tuple types.

Creating the Topology and its streams as instances of TStream just declares how tuples will flow, it is not a runtime representation of the graph. The TStream is submitted to a StreamsContext in order to execute the graph.

Java compilation and execution

The API requires these jar files to be in the classpath for compilation and execution:
  • com.ibm.streamsx.topology/lib/com.ibm.streamsx.topology.jar - Jar for this API.
  • $STREAMS_INSTALL/lib/com.ibm.streams.operator.samples.jar - IBM Streams Java Operator API and samples.

Testing

The API includes the ability to test topologies, by allow the test program to capture the output tuples of a stream (TStream) and validate them. This is described in the com.ibm.streamsx.topology.tester package overview.

Integration with SPL

While the design goal for the API is to not require knowledge of SPL, developers familiar with SPL may also utilize some SPL primitive and composite operators from existing toolkits and use streams that have SPL schemas.
See Also:
Integrating SPL operators
streamsx.topology 2.1 @ IBMStreams GitHub