Publish-subscribe Overview

Toolkits > com.ibm.streamsx.topology 2.1.0 > com.ibm.streamsx.topology.topic > Publish-subscribe Overview

Applications can publish streams to a topic name which can then be subscribed to by other applications (or even the same application). Publish-subscribe works across applications written in SPL and those written using application APIs provided by this toolkit.

A subscriber matches a publisher if their topic filter matches a publisher's topic name and the stream type is an exact match to that of the publisher. It is recommended that a single stream type is used for a topic name.

A topic is a rstring value (encoded with UTF-8), based upon the MQTT topic style.

Topic names for publishing a stream:

  • Must be at least one character long.
  • Use / as a level separator, zero length topic levels are valid.
  • Must not include wild card characters + and #.
  • Must not include the nul character \u0000.
Topic filters for subscribing to streams:
  • Must be at least one character long.
  • Use / as a level separator.
  • Must not include the nul character \u0000.
  • + is a single-level wildcard character that can be used at any level, but it must occupy the entire level. +, a/b/+, +/b/+ and +/b are valid but a/b/c+ is not valid.
  • # is a wildcard character that matches any number of levels including the parent and any number of child levels. The multi-level wildcard character must be specified either on its own or following a topic level separator. In either case it must be the last character specified in the topic filter. # and 'a/b/#' are valid but a/b/c# and a/#/c are not valid.

Without a wildcard character a topic filter is an exact match for a topic name so that filter a/b/c only matches a/b/c.

Single-level filter (+) match examples are:
  • filter + matches a and b but not a/b
  • filter a/+ matches a/b, a/c and a/ but not a, b/c or a/b/c
  • filter +/+ matches a/b, b/c, d/ and / but not a or a/b/c
Multi-level filter (#) match examples are:
  • filter # matches every topic name such as a, b/c, //
  • filter a/b/# matches a/b (parent), a/b/c, a/b/d and a/b/c/d

Note: a publish-subscribe match requires the stream type to match as well as the topic filter matching the topic name.

Note: For correct wildcard matching both pblisher and subscriber must have been compiled with version 1.3.0 or later.

Publish-subscribe is a many to many relationship, any number of publishers can publish to the same topic and stream type, and there can be many subscribers to a topic.

For example a telco ingest application may process Call Detail Records from network switches and publish processed records on multiple topics, cdr/voice/normal, cdr/voice/dropped, cdr/sms, etc. by publishing each processed stream with its own topic. Then a dropped call analytic application would subscribe to the cdr/voice/dropped topic.

Publish-subscribe is dynamic, using IBM Streams dynamic connections, an application can be submitted that subscribes to topics published by other already running applications. Once the new application has initialized, it will start consuming tuples from published streams from existing applications. And any stream the new application publishes will be subscribed to by existing applications where the topic and stream type matches.

An application only receives tuples that are published while it is connected, thus tuples are lost during a connection failure.

SPL Publish-Subscribe

An SPL application uses Publish to publish a stream to a topic, and Subscribe to subscribe to a topic.

Java & Scala Publish-Subscribe

A Java application uses TStream.publish(topic) to publish streams.

Python Publish-Subscribe

A Python application uses Stream.publish(topic) to publish streams and Topology.subscribe(topic) to subscribe to published streams.

Python supports publishing and subscribing to streams containing Python, SPL (structured) schemas, JSON or String objects. igy

Interchangeable Stream Types

Published streams can be subscribed to by IBM Streams applications written in different languages, by ensuring common stream types (schemas).

  • SPL Tuples
    • SPL : SPL schema of the stream.
    • Java : SPLStream with a schema matching the SPL schema.
    • Scala : com.ibm.streamsx.topology.spl.SPLStream with a schema matching the SPL schema.
    • Python: streamsx.topology.schema.StreamSchema matching the SPL schema. Tuples are dict, tuple or namedtuple objects.
  • JSON tuples
  • String tuples
    • SPL : com.ibm.streamsx.topology::String
    • Java: TStream<String>
    • Scala: TStream[String]
    • Python: schema.CommonSchema.String - Tuples are Python string objects. Python object tuples are converted to strings using str() when publishing as String
  • XML tuples
  • Binary tuples

Behavior with parallel regions

Topic publish-subscribe model was changed for releases 1.1.6 to have intuitive and defined behavior when the publisher or subscriber is in a parallel region (with width > 1).

It is recommended that if either publisher or subscriber is in a parallel region then applications should be compiled against released versions of this toolkit:
  • 1.1.x with x >= 6
  • 1.2.x with x >= 6
  • or any new version in future

The intuitive case is that parallel regions are used to partition tuple processing across the channels, so that each channel processes a subset of the tuples, and each tuple is processed by a single channel.

Any subscriber connecting to a publisher must then process the complete output from the publisher and only process each published tuple only once. For example if a publisher is in a parallel region with width three, then there are three published channels. A connecting subscriber with width two will have two subscribing channels. Each published channel must connect to a single subscribing channel, and in this case one subscriber channel will connect to two published channels and the other to the remaining published channel.

Table 1. Publish-Subscribe with Parallel Regions

Publisher

Subscriber

Behavior

not-parallel

not-parallel

Single connection between the publisher and subscriber containing all the tuples

parallel(1)

parallel(1)

not-parallel

parallel(N)

N > 1

One and only one of the N subscribers connects to the single publisher and thus the subscriber region correctly processes the tuples once. Note that which subscriber channel processes the single published channel is not defined, it could be any of 0,1,...,N-1.

parallel(1)

parallel(M)

M > 1

not-parallel

Single subscriber connects to each of the M publishers so that the subscriber region will process all the the published tuples once.

parallel(1)

parallel(M) M > 1

parallel(N) N > 1

Each published channel is connected to a single subscribe channel with channel number publish channel % N. Thus the subscriber region processes the tuples once.

Note: It's important to remember there may be multiple applications publishing and/or subscribing to the same topic and thus publish/subscribe must work in all combinations, e.g. there may be a publishers to the same topic of non-parallel, parallel(5) and parallel(3), and subscribers of non-parallel and parallel(7). This is especially true in a microservices style architecture where analytic applications may come and go and there may not be pre-assigned agreement that publisher and subscribers will have matching channel numbers.