com.ibm.streamsx.topology.spl

Interface SPLStream

  • All Superinterfaces:
    Placeable<TStream<com.ibm.streams.operator.Tuple>>, SPLInput, TopologyElement, TStream<com.ibm.streams.operator.Tuple>


    public interface SPLStream
    extends TStream<com.ibm.streams.operator.Tuple>, SPLInput
    A SPLStream is a declaration of a continuous sequence of tuples with a defined SPL schema. A SPLStream is a TStream<Tuple> thus may be handled using any functional logic where each tuple will be an instance of com.ibm.streams.operator.Tuple.
    • Method Detail

      • getSchema

        com.ibm.streams.operator.StreamSchema getSchema()
        SPL schema of this stream.
        Returns:
        SPL schema of this stream.
      • toJSON

        TStream<com.ibm.json.java.JSONObject> toJSON()
        Transform SPL tuples into JSON. Each tuple from this stream is converted into a JSON representation.
        • If getSchema() returns JSONSchemas.JSON then each tuple is taken as a serialized JSON and deserialized. If the serialized JSON is an array, then a JSON object is created, with a single attribute payload containing the deserialized value.
        • Otherwise the tuple is converted to JSON using the encoding provided by the SPL Java Operator API com.ibm.streams.operator.encoding.JSONEncoding.
        Returns:
        A stream with each tuple as a JSONObject.
      • convert

        <T> TStream<T> convert(Function<com.ibm.streams.operator.Tuple,T> convertor)
        Convert SPL tuples into Java objects. This call is equivalent to transform(converter).
        Parameters:
        convertor - Function to convert
        Returns:
        Stream containing tuples of type T transformed from this stream's SPL tuples.
        See Also:
        TStream.transform(Function)
      • toTupleString

        TStream<java.lang.String> toTupleString()
        Create a stream that converts each input tuple on this stream to its SPL character representation representation.
        Returns:
        Stream containing SPL character representations of this stream's SPL tuples.
      • toStringStream

        TStream<java.lang.String> toStringStream()
        Create a TStream<Tuple> from this stream. This SPLStream must have a schema of SPLSchemas.STRING.
        Returns:
        This stream declared as a TStream<Tuple>.
        Throws:
        java.lang.IllegalStateException - Stream does not have a value schema for TStream<Tuple>.
        See Also:
        SPLStreams.stringToSPLStream(TStream)
      • endLowLatency

        SPLStream endLowLatency()
        Return a TStream that is no longer guaranteed to run in the same process as the calling stream. For example, in the following topology:
        
         ---source---.lowLatency()---filter---transform---.endLowLatency()---filter2
         
        It is guaranteed that the filter and transform operations will run in the same process, but it is not guaranteed that the transform and filter2 operations will run in the same process.

        Only applies for distributed contexts.
        Specified by:
        endLowLatency in interface TStream<com.ibm.streams.operator.Tuple>
        Returns:
        A stream that is not guaranteed to run in the same process as the calling stream.
      • filter

        SPLStream filter(Predicate<com.ibm.streams.operator.Tuple> filter)
        Declare a new stream that filters tuples from this stream. Each tuple t on this stream will appear in the returned stream if filter.test(t) returns true. If filter.test(t) returns false then then t will not appear in the returned stream.

        Examples of filtering out all empty strings from stream s of type String

         
         // Java 8 - Using lambda expression
         TStream<String> s = ...
         TStream<String> filtered = s.filter(t -> !t.isEmpty());
                     
         // Java 7 - Using anonymous class
         TStream<String> s = ...
         TStream<String> filtered = s.filter(new Predicate<String>() {
                     @Override
                     public boolean test(String t) {
                         return !t.isEmpty();
                     }} );
         
         

        Specified by:
        filter in interface TStream<com.ibm.streams.operator.Tuple>
        Parameters:
        filter - Filtering logic to be executed against each tuple.
        Returns:
        Filtered stream
        See Also:
        TStream.split(int, ToIntFunction)
      • lowLatency

        SPLStream lowLatency()
        Return a stream that is guaranteed to run in the same process as the calling TStream. All streams that are created from the returned stream are also guaranteed to run in the same process until TStream.endLowLatency() is called.

        For example, for the following topology:
        
         ---source---.lowLatency()---filter---transform---.endLowLatency()---
         
        It is guaranteed that the filter and transform operations will run in the same process.

        Only applies for distributed contexts.
        Specified by:
        lowLatency in interface TStream<com.ibm.streams.operator.Tuple>
        Returns:
        A stream that is guaranteed to run in the same process as the calling stream.
      • isolate

        SPLStream isolate()
        Return a stream whose immediate subsequent processing will execute in a separate operating system process from this stream's processing.
        For the following Topology:
        
         -->transform1-->.isolate()-->transform2-->transform3-->.isolate()-->sink
         
        It is guaranteed that:
        • transform1 and transform2 will execute in separate processes.
        • transform3 and sink will execute in separate processes.
        If multiple transformations (t1, t2, t3) are applied to a stream returned from isolate() then it is guaranteed that each of them will execute in a separate operating system process to this stream, but no guarantees are made about where t1, t2, t3 are placed in relationship to each other.
        Only applies for distributed contexts.
        Specified by:
        isolate in interface TStream<com.ibm.streams.operator.Tuple>
        Returns:
        A stream that runs in a separate process from this stream.
      • modify

        SPLStream modify(UnaryOperator<com.ibm.streams.operator.Tuple> modifier)
        Declare a new stream that modifies each tuple from this stream into one (or zero) tuple of the same type T. For each tuple t on this stream, the returned stream will contain a tuple that is the result of modifier.apply(t) when the return is not null. The function may return the same reference as its input t or a different object of the same type. If modifier.apply(t) returns null then no tuple is submitted to the returned stream for t.

        Example of modifying a stream String values by adding the suffix 'extra'.

         
         TStream<String> strings = ...
         TStream<String> modifiedStrings = strings.modify(new UnaryOperator() {
                     @Override
                     public String apply(String tuple) {
                         return tuple.concat("extra");
                     }});
         
         

        This method is equivalent to transform(Function<T,T> modifier).

        Specified by:
        modify in interface TStream<com.ibm.streams.operator.Tuple>
        Parameters:
        modifier - Modifier logic to be executed against each tuple.
        Returns:
        Stream that will contain tuples of type T modified from this stream's tuples.
      • sample

        SPLStream sample(double fraction)
        Return a stream that is a random sample of this stream.
        Specified by:
        sample in interface TStream<com.ibm.streams.operator.Tuple>
        Parameters:
        fraction - Fraction of the data to return.
        Returns:
        Stream containing a random sample of this stream.
      • throttle

        SPLStream throttle(long delay,
                           java.util.concurrent.TimeUnit unit)
        Throttle a stream by ensuring any tuple is submitted with least delay from the previous tuple.
        Specified by:
        throttle in interface TStream<com.ibm.streams.operator.Tuple>
        Parameters:
        delay - Maximum amount to delay a tuple.
        unit - Unit of delay.
        Returns:
        Stream containing all tuples on this stream. but throttled.
      • parallel

        SPLStream parallel(int width)
        Parallelizes the stream into a a fixed number of parallel channels using round-robin distribution.
        Tuples are routed to the parallel channels in a round-robin fashion.
        Subsequent transformations on the returned stream will be executed width channels until TStream.endParallel() is called or the stream terminates.
        See TStream.parallel(Supplier, Routing) for more information.
        Specified by:
        parallel in interface TStream<com.ibm.streams.operator.Tuple>
        Parameters:
        width - The degree of parallelism in the parallel region.
        Returns:
        A reference to a stream for which subsequent transformations will be executed in parallel using width channels.
      • parallel

        SPLStream parallel(Supplier<java.lang.Integer> width,
                           TStream.Routing routing)
        Parallelizes the stream into width parallel channels. Tuples are routed to the parallel channels based on the TStream.Routing parameter.
        If TStream.Routing.ROUND_ROBIN is specified the tuples are routed to parallel channels such that an even distribution is maintained.
        If TStream.Routing.HASH_PARTITIONED is specified then the hashCode() of the tuple is used to route the tuple to a corresponding channel, so that all tuples with the same hash code are sent to the same channel.
        If TStream.Routing.KEY_PARTITIONED is specified each tuple is is taken to be its own key and is routed so that all tuples with the same key are sent to the same channel. This is equivalent to calling TStream.parallel(Supplier, Function) with an identity function.
        If parallel is invoked when submitting to an embedded context, the flow will execute as though parallel had not been called.
        Given the following code:
         
         TStream<String> myStream = ...;
         TStream<String> parallel_start = myStream.parallel(of(3), TStream.Routing.ROUND_ROBIN);
         TStream<String> in_parallel = parallel_start.filter(...).transform(...);
         TStream<String> joined_parallel_streams = in_parallel.endParallel();
         
         
        a visual representation a visual representation for parallel() would be as follows:
         
                          |----filter----transform----|
                          |                           |
         ---myStream------|----filter----transform----|--joined_parallel_streams
                          |                           |
                          |----filter----transform----|
         
         
        Each parallel channel can be thought of as being assigned its own thread. As such, each parallelized stream function (filter and transform, in this case) are separate instances and operate independently from one another.
        parallel(...) will only parallelize the stream operations performed after the call to parallel(...) and before the call to endParallel(). In the above example, the parallel(3) was invoked on myStream, so its subsequent functions, filter(...) and transform(...), were parallelized.

        Parallel regions aren't required to have an output stream, and thus may be used as sinks. The following would be an example of a parallel sink:
         
         TStream<String> myStream = ...;
         TStream<String> myParallelStream = myStream.parallel(6);
         myParallelStream.print();
         
         
        myParallelStream will be printed to output in parallel. In other words, a parallel sink is created by calling TStream.parallel(int) and creating a sink operation (such as TStream.sink(Consumer)). It is not necessary to invoke TStream.endParallel() on parallel sinks.

        Limitations of parallel() are as follows:
        Nested parallelism is not currently supported. A call to parallel(...) should never be made immediately after another call to parallel(...) without having an endParallel() in between.

        parallel() should not be invoked immediately after another call to parallel(). The following is invalid:
         
         myStream.parallel(2).parallel(2);
         
        Every call to endParallel() must have a call to parallel(...) preceding it. The following is invalid:
         
         TStream myStream = topology.strings("a","b","c");
         myStream.endParallel();
         
        A parallelized stream cannot be joined with another window, and a parallelized window cannot be joined with a stream. The following is invalid:
         
         TWindow numWindow = topology.strings("1","2","3").last(3);
         TStream stringStream = topology.strings("a","b","c");
         TStream paraStringStream = myStream.parallel(5);
         TStream filtered = myStream.filter(...);
         filtered.join(numWindow, ...);
         
        Specified by:
        parallel in interface TStream<com.ibm.streams.operator.Tuple>
        Parameters:
        width - The degree of parallelism. see TStream.parallel(int width) for more details.
        routing - Defines how tuples will be routed channels.
        Returns:
        A reference to a TStream<> at the beginning of the parallel region.
        See Also:
        Topology.createSubmissionParameter(String, Class), TStream.split(int, ToIntFunction)
      • autonomous

        SPLStream autonomous()
        Return a stream matching this stream whose subsequent processing will execute in an autonomous region. By default IBM Streams processing is executed in an autonomous region where any checkpointing of operator state is autonomous (independent) of other operators.
        This method may be used to end a consistent region by starting an autonomous region. This may be called even if this stream is in an autonomous region.
        Autonomous is not applicable when a topology is submitted to embedded and standalone contexts and will be ignored.
        Specified by:
        autonomous in interface TStream<com.ibm.streams.operator.Tuple>
      • setConsistent

        SPLStream setConsistent(ConsistentRegionConfig config)
        Set the source operator for this stream to be the start of a consistent region to support at least once and exactly once processing. IBM Streams calculates the boundaries of the consistent region that is based on the reachability graph of this stream.

        Consistent regions are only supported in distributed contexts.

        Specified by:
        setConsistent in interface TStream<com.ibm.streams.operator.Tuple>
        Returns:
        this
        See Also:
        ConsistentRegionConfig
streamsx.topology 1.5 @ IBMStreams GitHub