SPL Operator Model

An operator model can be used to describe a C++ primitive operator or a Java primitive operator. The set of elements that can be present in a Java operator model is a strict subset of the ones that can be present in a C++ operator model, with the exception of a special element used for JVM related configurations.

Context

The context element describes the properties that apply to the operator as a whole and are not associated with particular parameters or ports of the operator. It also includes common definitions that are referenced in other places in the operator model.

Description
The description element, which is optional, provides an overview of the operator.
Metrics
The metrics element, which is optional, contains the list of metrics exposed by the operator. It is structured as a list of metric elements, where each metric element contains a name, a description, and a kind.
Kind: Counter
Represents metrics whose values are either non-decreasing or non-increasing.
Kind: Gauge
Represents metrics that can change their values freely, that is, they can go up or down.
Kind: Time
Represents metrics that denote a point in time.
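As an illustration, a metrics element for a hypothetical operator might be sketched as follows. The metric names and descriptions (nTuplesDropped, queueSize) are invented for this example, and the child element names are assumed to follow the name, description, and kind structure described above.

```xml
<metrics>
  <!-- Counter: the value only moves in one direction -->
  <metric>
    <name>nTuplesDropped</name>
    <description>Number of tuples dropped so far (hypothetical metric)</description>
    <kind>Counter</kind>
  </metric>
  <!-- Gauge: the value can go up or down -->
  <metric>
    <name>queueSize</name>
    <description>Current number of queued tuples (hypothetical metric)</description>
    <kind>Gauge</kind>
  </metric>
</metrics>
```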
Custom Literals
The customLiterals element, which is optional, captures the identifiers that may appear in parameter configurations of an operator. It is structured as a list of enumeration elements. For instance, a Source operator may support different source formats, in which case we can have an enumeration called FileFormat that will contain values {csv, xml, bin}.
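The FileFormat example above could be written roughly as shown below. This is a sketch only: the name and value child elements are assumed from typical operator models and should be checked against the operator model schema.

```xml
<customLiterals>
  <enumeration>
    <name>FileFormat</name>
    <!-- One value element per identifier that may appear in a parameter -->
    <value>csv</value>
    <value>xml</value>
    <value>bin</value>
  </enumeration>
</customLiterals>
```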
Custom Output Functions (C++ only)
The customOutputFunctions element, which is optional, captures the output function prototypes used by an operator in its output attribute assignments. It is structured as a list of customOutputFunction elements, where each customOutputFunction element contains a name and a list of output function prototypes. For instance, an Aggregate operator may support relational aggregations, in which case we can have a customOutputFunction element called RelationalAggs that will contain output functions {Min, Max, Avg, Sum, and so on}.
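A RelationalAggs definition might look roughly like the sketch below. The function, description, and prototype child elements, as well as the exact prototype syntax, are assumptions made for illustration; only the names RelationalAggs, Min, and Max come from the text above.

```xml
<customOutputFunctions>
  <customOutputFunction>
    <name>RelationalAggs</name>
    <function>
      <description>Smallest value seen (illustrative)</description>
      <!-- CDATA keeps the angle brackets of the generic prototype intact -->
      <prototype><![CDATA[<any T> T Min(T v)]]></prototype>
    </function>
    <function>
      <description>Largest value seen (illustrative)</description>
      <prototype><![CDATA[<any T> T Max(T v)]]></prototype>
    </function>
  </customOutputFunction>
</customOutputFunctions>
```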
Dependencies (Optional)
A sequence of one or more library elements, each representing a library dependency. The library element format is exactly the same as the one used for operator models.
Description (Optional)
A description of the library.
Managed Library
Specifies the details of the individual library artifacts. The paths can contain environment variables embedded between @ signs (for example: @FOO_FFT_HOME@/lib), which will be fully resolved by the SPL compiler at compile time.
lib (Optional)
Specifies a name to be passed to the C++ compiler's -l argument (such as fft, which is translated into -lfft when passed to the linker).
libPath (Optional)
Specifies a path to be passed to the C++ compiler's -L argument.
includePath (Optional)
Specifies a path to be passed to the C++ compiler's -I argument.
command (Optional)
A path to a program that will be executed to retrieve includePath, libPath, and lib information. If the path to the program is relative, it is assumed to be rooted at the directory of the operator model. The program is executed three times, each time with a different argument, namely lib, libPath, and includePath. The standard output from these executions will be read and each line (trimmed of whitespace) will be added to one of the lib, libPath, and includePath elements, depending on the type of the execution. A line that begins with # will be ignored. Relative paths are assumed to be rooted at the directory where the operator model XML document resides.
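A library dependency that uses the managed library fields described above might be sketched as follows. The libraryDependencies wrapper and the cmn: namespace prefix are assumptions based on typical operator models, and the @FOO_FFT_HOME@/include path is invented to mirror the @FOO_FFT_HOME@/lib example.

```xml
<libraryDependencies>
  <library>
    <cmn:description>FFT library (illustrative)</cmn:description>
    <cmn:managedLibrary>
      <!-- Becomes -lfft on the link line -->
      <cmn:lib>fft</cmn:lib>
      <!-- @FOO_FFT_HOME@ is resolved by the SPL compiler at compile time -->
      <cmn:libPath>@FOO_FFT_HOME@/lib</cmn:libPath>
      <cmn:includePath>@FOO_FFT_HOME@/include</cmn:includePath>
    </cmn:managedLibrary>
  </library>
</libraryDependencies>
```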
Provides Single Threaded Execution Context (C++ only)
The providesSingleThreadedContext element describes the threading semantics of the operator with respect to the flow of execution. An operator provides a single threaded execution context if and only if:
  • It does not perform concurrent submit calls unless its process method(s) are called concurrently.
  • Its submit calls complete before the process call that triggered the submission completes.

Both source and non-source operators have process methods, and the definition above applies globally. Based on this definition, if an operator has submit calls that are not triggered by a process call, such as those triggered by a time-based event, then that operator does not provide a single threaded execution context. Note that this definition does not require a submit call to execute under the same thread that executes the process call which triggered the submission (even though in the common case they execute under the same thread).

There are several valid values for this property:
  • Never: Instances of this operator never provide a single threaded execution context.
  • Always: Instances of this operator always provide a single threaded execution context.
  • WindowBound: Instances of this operator that do not specify time-based window eviction policies or time-based window trigger policies provide a single threaded execution context.
  • WindowEvictionBound: Instances of this operator that do not specify time-based window eviction policies provide a single threaded execution context.
  • WindowTriggerBound: Instances of this operator that do not specify time-based window trigger policies provide a single threaded execution context.
  • WindowPartitionEvictionBound: Instances of this operator use a thread to implement partition eviction. Use this setting if tuples are submitted from the onWindowPartitionEvictionSelection event.

As an example, consider a Filter operator. Unless its process method is being called concurrently, the Filter operator does not make concurrent submit calls. Its submit calls are triggered by incoming tuples. When it receives a tuple via a process call, it makes a submit call if the received tuple passes the filter condition, and that submit call completes before the process call that triggered it is complete. As a result, all instances of a Filter operator provide a single threaded context and the setting Always is appropriate.

Implementation note: The providesSingleThreadedContext element is used to enable the SPL runtime to avoid unnecessary thread synchronization. While setting it to the value Never is safe for all operators, it would prevent optimizations that reduce synchronization overhead when the operator does provide a single threaded context. Specifying a value other than Never that is inconsistent with the threading semantics implemented by the operator will result in undefined behavior.
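For the Filter example discussed above, the setting would be declared in the context element roughly as follows. This is a minimal fragment that omits the other context elements.

```xml
<context>
  <!-- Submissions happen only inside process calls, so Always is appropriate.
       An operator that also submits from a timer thread would use Never. -->
  <providesSingleThreadedContext>Always</providesSingleThreadedContext>
</context>
```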

Incremental Compilation Strategy (C++ only)
Specifies how the compiler should manage incremental compilation of operators. The choices are:
  • SourceDependent: In this mode the compiler will only regenerate the operator source if it is out-of-date with the SPL source or the code generator for that operator. This is the default mode.
  • ResultDependent: In this mode the compiler always generates the operator source, but only updates the source files if they differ from what exists prior to the compile. Use this mode if the operator code generator relies on external configurations that are not captured by the parameterization given in the SPL source.
Allow Custom Logic (C++ only)

This optional element specifies whether an invocation of the operator is permitted to have a logic clause specifying state, onTuple, or onPunct processing. When set to false, no logic clause may be specified for the given operator. The default, in the absence of this element, is true.
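The incremental compilation strategy and the custom logic setting might appear together in the context element as sketched below. The element order shown is an assumption and should be checked against the operator model schema.

```xml
<context>
  <!-- Regenerate always, but rewrite files only when the output differs -->
  <incrementalCompilationStrategy>ResultDependent</incrementalCompilationStrategy>
  <!-- Reject invocations that carry a logic clause -->
  <allowCustomLogic>false</allowCustomLogic>
</context>
```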

Code Template

This optional element specifies one or more code templates for the operator. These will show up in the IDE's context-sensitive content assist menus and in SPLDOC documents. Each code template has a name attribute that names it, a description element that describes it, and a value element, which is a string that contains the boilerplate code for the template. When the template is used in the IDE, the boilerplate code is embedded into the source code. The parts of the code that are in the form ${name} indicate the pieces that must be customized by the user. The IDE will use the identifier specified within the ${} to indicate the customizable portions. One example for the Barrier operator is as follows:

<codeTemplates>
  <codeTemplate name="Barrier">
    <description>Basic Barrier template</description>
    <template>
      <![CDATA[ 
        stream<${schema}> ${outputStream} = Barrier(${inputStream1};${inputStream2}) 
        {
          param
            ${parameter}: ${parameterExpression};
          output
            ${outputStream}: ${outputExpression};
          ${cursor} 
        }
      ]]>
    </template>
  </codeTemplate>
</codeTemplates>
SPL Expression Tree (C++ only)

An optional element that controls the generation of SPL expression trees for use in generic C++ primitive operators.

  • param - If set to true, the SPL Expression Trees are generated for parameters.
  • output - If set to true, the SPL Expression Trees are generated for output.
  • cppCode - If set to true, each node in the generated operator instance XML is enhanced with C++ code using templates. This C++ code can be used to generate the C++ code for an SPL expression. For example, for the SPL code:
    param predicates : {a = "a" == In.letter, b = "b" == In.letter};
    

    The generated SPL expression tree includes:

    <expressionTree cppCode="SPL::BeJwrMUoyTEwyTAIAC7UCCQ({attr:0}, {attr:1})">
      <literal cppCode="SPL::BeJwrMUoyTEwyTAIAC7UCCQ({attr:0}, {attr:1})" type="1">
        <tuple count="2" cppCode="SPL::BeJwrMUoyTEwyTAIAC7UCCQ({attr:0}, {attr:1})" type="1">
          <attr id="a">
            <value cppCode="({Lhs} == {Rhs})" type="2">
              <expn cppCode="({Lhs} == {Rhs})">
                <binary cppCode="({Lhs} == {Rhs})" op="==" type="2">
                  <lhs cppCode="SPL::rstring(&quot;a&quot;)">
                    <literal cppCode="SPL::rstring(&quot;a&quot;)" type="0">"a"</literal>
                  </lhs>
                  <rhs cppCode="iport$0.get_letter()">
                    <attribute attribute="letter" cppCode="iport$0.get_letter()" type="0">
                      <lhs cppCode="iport$0">
                        <stream cppCode="iport$0" name="In" port="0" type="3"/>
                      </lhs>
                    </attribute>
                  </rhs>
                </binary>
              </expn>
            </value>
          </attr>
    

    The templates (for example, {Lhs}, {attr:0}) are used to ensure that code replacement is well defined.

    These expressions represent the SPL expression, but are available in a form that can easily be walked. Perl objects are derived from SPL::Operator::Instance::ExpressionTree, and have a kind, type, and methods to access the fields of the expression. ExpressionTreeVisitor is a visitor pattern provided to allow easy walking of the expression tree. For more information, see the IBM Streams Processing Language Code Generation API Documentation.
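A declaration that requests expression trees for parameters, with C++ code attached to each node, might be sketched as follows. Writing param, output, and cppCode as attributes of a single splExpressionTree element is an assumption here; the options themselves are the ones listed above.

```xml
<context>
  <!-- Generate trees for parameters only, and attach cppCode to each node -->
  <splExpressionTree param="true" output="false" cppCode="true"/>
</context>
```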

Operating System Capabilities (C++ only)

This optional list of elements specifies special privileges for the operator. IBM Streams supports the Linux capabilities model via the capability element. You can include any number of elements to specify the exact privileges your operator requires. For example, <capability>CAP_NET_RAW+eip</capability> indicates that the operator needs permission to access raw sockets. Note that the IBM Streams instance must be configured to allow PE processes to run with special operating system capabilities.

Input Port Set

Input ports are defined in terms of port sets. A port set is a fixed number of ports that share the same configuration. This avoids repetition of the same configuration for different ports. A port set can be open, in which case it can contain zero or more ports with the same configuration. An inputPorts element contains zero or more inputPortSet elements, followed by an optional inputPortOpenSet element.

Cardinality
Defines the number of ports that the port set represents. This property applies to non-open port sets.
Optional
A boolean which specifies whether the input port set is optional.
Control Port
The optional controlPort element tells the compiler that tuples received on this port will be used only to control the operator, and no tuples will be submitted when tuples are processed on this port. If not specified, the value is false. The SPL compiler will emit warnings when loops are found in the operator graph, as this can lead to deadlock or infinite recursion. Setting controlPort to true will tell the compiler that this port will not submit further tuples, and that this is an expected (and harmless) feedback loop, so no warning will be emitted for this port.
Windowing Mode
The windowingMode element specifies the valid windowing configurations for the port. Options include NonWindowed, Windowed, and OptionallyWindowed.
Window Punctuation Input Mode
The windowPunctuationInputMode element specifies the punctuation semantics of the input port. The valid options are:
  • Expecting - This port expects window punctuations in order for the operator to function correctly and thus must be fed a punctuated stream.
  • Oblivious - This port does not require punctuations in order for the operator to work correctly and thus has no restrictions on the connections that can be attached to it.
  • WindowBound - This port is an Expecting port if it has a punctuation based window, and an Oblivious port otherwise.
Window Expression Mode
This element tells the compiler what type of windowing expressions are valid. If not specified, the default is Constant.
  • Constant - Expressions in count, time, and delta must be constants that can be evaluated at compile time.
  • AttributeFree - Expressions cannot reference input tuple attributes. An expression such as time ((int32) getSubmissionTimeValue("timeParam")) can be used. For delta, only the second argument is allowed to be a runtime attribute-free expression. The first argument is still an attribute from the input stream.
Rewrite Allowed for Window Expression (C++ only)
If set to true, this boolean element tells the compiler that it may rewrite the window expression in the same way that it rewrites the expressions that appear in parameter values when the rewriteAllowed element is set. For more information about the rewriteAllowed element, see Parameters. If the rewriteAllowedForWindowExpression element is not specified, the value defaults to false. rewriteAllowedForWindowExpression must be false (or omitted) if the C++ primitive operator wants to examine the value as a literal.
Tuple Mutation Allowed
The tupleMutationAllowed element defines whether the processing logic attached to the input port (this includes both the logic associated with the operator's process functions and the processing done as part of the onTuple clauses specified in the SPL code) can mutate an incoming tuple. It can be set to true for operators that desire to modify the tuples they receive.
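Putting the input port elements together, an inputPorts section with one fixed port set and one open set of control ports might be sketched as follows. The element order and the particular values are assumptions chosen for illustration.

```xml
<inputPorts>
  <!-- Exactly one mandatory data port -->
  <inputPortSet>
    <tupleMutationAllowed>false</tupleMutationAllowed>
    <windowingMode>OptionallyWindowed</windowingMode>
    <windowPunctuationInputMode>WindowBound</windowPunctuationInputMode>
    <controlPort>false</controlPort>
    <windowExpressionMode>AttributeFree</windowExpressionMode>
    <rewriteAllowedForWindowExpression>false</rewriteAllowedForWindowExpression>
    <cardinality>1</cardinality>
    <optional>false</optional>
  </inputPortSet>
  <!-- Zero or more control ports; cardinality does not apply to open sets -->
  <inputPortOpenSet>
    <tupleMutationAllowed>false</tupleMutationAllowed>
    <windowingMode>NonWindowed</windowingMode>
    <windowPunctuationInputMode>Oblivious</windowPunctuationInputMode>
    <controlPort>true</controlPort>
  </inputPortOpenSet>
</inputPorts>
```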

Output Port Set

Output ports are defined in terms of port sets, just like input ports. A port set is a fixed number of ports that share the same configuration. This avoids repetition of the same configuration for different ports. A port set can be open, in which case it can contain zero or more ports with the same configuration. An outputPorts element contains zero or more outputPortSet elements, followed by an optional outputPortOpenSet element.

Cardinality
Defines the number of ports that the port set represents. This property applies to non-open port sets.
Optional
A boolean which specifies whether the output port set is optional.
Expression Mode (C++ only)
The expressionMode element describes the valid syntax of the attribute assignments made on this port. Note that an expressionMode value of CustomLiteral is not valid for output ports and will result in a compilation error. Valid values for the expression mode are:
  • Attribute: This means that the assignments made to output attributes of this port need to be stream attributes. For example: output Out : x = In.y;, but not x = In.y.z.
  • AttributeFree: This means that the assignments made to output attributes of this port cannot reference any input stream attributes. For example: output Out : x = 3 + random(3);, but not x = In.x + 3.
  • Constant: This means that the assignments made to output attributes of this port need to be compile-time evaluatable to a constant. For example: output Out : x = 3 + pow(2, 3);, but not x = random(3).
  • Expression: This is the most flexible expression mode; any SPL expression of correct type can appear as an assignment to the output attributes of this port. For example: output Out : x = A.y + B.z;.
  • Nonexistent: This means that output attribute assignments cannot be specified in the SPL source for this port.
Auto Assignment
The autoAssignment element defines whether unassigned attributes will be automatically assigned from the attributes of the input ports. If set to true, the SPL compiler will rewrite (at compile-time) the operator invocation as if the unassigned output attributes have explicit assignments in the output section. For each output attribute that is missing an assignment, an input attribute that has the same name and type, or that has the same name and type T, where the output attribute type is optional<T>, will be assigned to it. If there is no such input attribute or if there is more than one, an error is reported at compile-time. Note that an expressionMode value of Constant is incompatible with an autoAssignment value of true. This combination will result in a compilation error.
Complete Assignment
The completeAssignment element defines if all the output port attributes need to be assigned in order to have a valid invocation of the operator. This is checked at compile-time. If an operator has this element set to true in its operator model and if not all output attributes have assignments after the auto-assignment step (if requested) for a given instance of this operator, an error will be reported.
Rewrite Allowed (C++ only)
The rewriteAllowed element specifies whether the compiler is allowed to rewrite the expressions that appear in the output attribute assignments for this port.
Output Functions
The outputFunctions element defines the valid custom output functions that can appear in output attribute assignments. It is optional. When present, it contains two sub-elements: the type element, which defines the name of the custom output function type, as in RelationalAggs; and the default element, which defines the default output function to be used when performing auto-assignment of output attributes. This value should be a valid function name for the custom output function type that is being used (as in Last for RelationalAggs). Note that if the user code specifies an output attribute assignment without an output function for a port that expects an output function, the default output function will be inserted automatically.
Final Punctuation Port Scope
The finalPunctuationPortScope element, which is optional, specifies the set of input ports to be used by the SPL language runtime for final punctuation forwarding. By default, operators that have both input and output ports will automatically forward final punctuations from their input ports to their output ports. This is achieved by generating a final punctuation on an output port when a final punctuation is received on all input ports. The finalPunctuationPortScope can be used to limit the set of input ports to be used for forwarding the final punctuation. This element can also be used to turn off auto-forwarding of final punctuations, by setting the set of input ports to use for forwarding to the empty set. In this case, the operator developer is responsible for ensuring that the output port gets a final punctuation.
Window Punctuation Output Mode
The windowPunctuationOutputMode specifies the window punctuation semantics of the output port. The options are:
  • Generating - This port generates window punctuations.
  • Free - This port is free of window punctuations.
  • Preserving - This port preserves the received window punctuations. If an operator has more than one input port, then the windowPunctuationInputPort element must be specified in order to identify which input port's punctuation is being preserved.
Tuple Mutation Allowed
The tupleMutationAllowed element defines whether this operator permits the downstream operators to mutate the output tuples submitted to this port via the submit call. If set to true, then the processing logic of the operator should expect that the tuples it submits to this port are modified as a result of the submit call.
Window Punctuation Input Port
As mentioned above, the windowPunctuationInputPort element associates an input port with a punctuation preserving output port. This element may only be specified if the output port's window punctuation mode is Preserving. The windowPunctuationInputPort can be set to -1, which has the same semantics as a missing windowPunctuationInputPort element. It is important to note that punctuation forwarding for window punctuations is not performed automatically by the SPL language runtime (unlike final punctuations) and the operator model is used to inform the SPL compiler about the behavior that is being implemented by the operator. For more information, see the IBM Streams Processing Language Toolkit Development Reference.
Output Assignment Port Scope
The outputAssignmentPortScope optionally limits which input port attributes may appear in output assignments on this port. If a scope is specified, only attributes from the ports specified by the scope may appear in the output assignments for that port.
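An outputPorts section for an operator with a single punctuation-preserving output port might be sketched as follows. The element order, the port child element used inside finalPunctuationPortScope, and the chosen values are assumptions; RelationalAggs and Last are the names used in the text above.

```xml
<outputPorts>
  <outputPortSet>
    <expressionMode>Expression</expressionMode>
    <autoAssignment>true</autoAssignment>
    <completeAssignment>true</completeAssignment>
    <rewriteAllowed>true</rewriteAllowed>
    <outputFunctions>
      <!-- Last is inserted when an assignment omits the output function -->
      <default>Last</default>
      <type>RelationalAggs</type>
    </outputFunctions>
    <windowPunctuationOutputMode>Preserving</windowPunctuationOutputMode>
    <!-- Window punctuations from input port 0 are the ones preserved -->
    <windowPunctuationInputPort>0</windowPunctuationInputPort>
    <!-- Forward final punctuation based on input port 0 only -->
    <finalPunctuationPortScope>
      <port>0</port>
    </finalPunctuationPortScope>
    <tupleMutationAllowed>true</tupleMutationAllowed>
    <cardinality>1</cardinality>
    <optional>false</optional>
  </outputPortSet>
</outputPorts>
```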

Parameters

The parameters element describes the valid parameters an operator can be configured with. It also describes the valid syntax for such parameter configurations.

Allow Any
This element is a boolean flag that determines whether an operator can take arbitrary parameters, with no restrictions. An operator can take arbitrary parameters, yet still specify additional parameters and associated restrictions.
Parameter

Each parameter element contains several subelements.

Name
The name element is the name of the parameter as it will appear in the SPL source code. For example, a Functor operator may have a filter parameter.
Description
An optional description of this parameter.
Optional
A boolean which specifies whether this parameter is optional. A value of false implies that the parameter must be specified in the SPL source code.
Rewrite Allowed (C++ only)
This boolean element allows the compiler to rewrite the expressions that appear in this parameter's values by substituting literals (including those resulting from the compile-time evaluation step) with variables whose values are loaded at runtime. This enables the compiler to generate less code for operators that differ slightly in their parameter configurations. In certain cases, the operator code generators may want to look into the parameter value, in order to generate different code based on the particular value found or to perform compile-time validation. For example, format: csv may result in generating specialized code for a Source operator. In such cases, expression rewrite should be turned off.
Expression Mode
  • Attribute - Restricts the parameter values to stream attributes.
  • AttributeFree - The parameter value is an expression that does not contain a reference to a stream attribute.
  • Constant (C++ only) - The parameter values need to be compile-time evaluatable to a constant.
  • CustomLiteral - Restricts the parameter values to valid values from one of the custom literal enumerations defined in the context section of the model.
  • Expression (C++ only) - The most flexible expression mode, where any SPL expression of correct type can appear as a parameter value.
Type
The type of a parameter is either the SPL type of its values (such as list<ustring>) or a custom literal name (such as SourceFormat). The type can also be omitted, in which case any SPL type will match. The type subelement of a parameter can have an empty value, which has the same semantics as a missing type element.
Cardinality
The maximum number of values the parameter accepts. If the cardinality subelement is omitted or set to -1, the number of values is assumed to be unbounded. Otherwise, the number of parameter values must match the cardinality.
Port Scope (C++ only)
This element is used to limit the stream attributes that appear in a parameter value to a specific input port or to a list of input ports. Port indices start from 0. When omitted, there are no restrictions on stream attributes.
Custom Output Function (C++ only)
This optional element of a parameter specifies the name of a custom output function set defined in the context element, and makes the functions defined in that set visible during the compilation of a parameter. It is the responsibility of the operator to generate correct C++ code that involves custom output functions with the parameter, in the same manner as it would be for a use in an output clause.
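A parameters section that combines the subelements above might be sketched as follows. The format and filter parameters echo the Source and Functor examples mentioned earlier; their descriptions, types, and the port child element inside portScope are assumptions made for illustration.

```xml
<parameters>
  <allowAny>false</allowAny>
  <!-- Inspected by the code generator, so rewriting is turned off -->
  <parameter>
    <name>format</name>
    <description>Format of the source data (illustrative)</description>
    <optional>true</optional>
    <rewriteAllowed>false</rewriteAllowed>
    <expressionMode>CustomLiteral</expressionMode>
    <type>FileFormat</type>
    <cardinality>1</cardinality>
  </parameter>
  <!-- May reference attributes of input port 0 only -->
  <parameter>
    <name>filter</name>
    <description>Condition a tuple must satisfy (illustrative)</description>
    <optional>true</optional>
    <rewriteAllowed>true</rewriteAllowed>
    <expressionMode>Expression</expressionMode>
    <type>boolean</type>
    <cardinality>1</cardinality>
    <portScope>
      <port>0</port>
    </portScope>
  </parameter>
</parameters>
```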