Operator RutaText

Toolkit: com.ibm.streamsx.nlp 1.9.2, namespace: com.ibm.streamsx.nlp

The Java operator RutaText applies Apache UIMA Ruta rules to annotate text that arrives in an rstring attribute of the incoming tuples. For each tuple, it creates a UIMA CAS from the text, applies the Ruta script to the CAS, serializes the resulting CAS to XMI, and submits an output tuple that carries the result in an rstring attribute.

The Ruta rules, resources, and CAS types are expected in a UIMA .pear file. The .pear file is loaded when the operator initializes and is reloaded when a window punctuation is received on the second input port. The .pear file is installed in the data directory under 'installedPears<OPERATOR_NAME>'. If the data directory is not set, /tmp is used for the installation. If this operator is used in the Streaming Analytics service (IBM Cloud), the data directory must be set to '/tmp'. For a detailed sample description of creating a Ruta .pear file, see ./doc/UIMA_workbench.pdf in the toolkit directory.
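
The processing flow described above can be sketched as a minimal SPL application. This is an illustration under assumptions, not a sample shipped with the toolkit: the composite name, the stream and attribute names, and the file name etc/sample.pear are hypothetical, and passing the parameter values as string literals is assumed rather than confirmed by this page.

```
use com.ibm.streamsx.nlp::RutaText;

composite RutaTextBasic {
    graph
        // Hypothetical source: one tuple with a single rstring attribute.
        stream<rstring text> Docs = Beacon() {
            param
                iterations : 1u;
            output
                Docs : text = "Alice flew from Berlin to Paris.";
        }

        // RutaText builds a CAS from 'text', applies the Ruta script from the
        // .pear file, and writes the serialized CAS (XMI) to 'xmi'.
        stream<rstring xmi> Annotated = RutaText(Docs) {
            param
                pearFile : "etc/sample.pear"; // hypothetical file name
                inputDoc : "text";            // assumed to take the attribute name as a string
                casOut   : "xmi";             // assumed to take the attribute name as a string
        }

        // Print the serialized CAS for inspection.
        () as Printer = Custom(Annotated) {
            logic
                onTuple Annotated : printStringLn(xmi);
        }
}
```

Because the input stream has only one attribute, inputDoc could be omitted here; it is spelled out to make the mapping explicit.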

Summary

Ports
This operator has 2 input ports and 2 output ports.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 14 parameters.

Required: pearFile

Optional: casJson, casOut, debugMode, errorsAttribute, inputDoc, languageCode, languageCodeAttribute, outputAttributes, outputTypes, outputViews, removeBasics, trimInputDoc, view

Metrics
This operator does not report any metrics.

Properties

Implementation
Java

Input Ports

Ports (0)

Port that ingests tuples

Ports (1)

Optional control port
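
A hedged sketch of how the control port might be wired so that the .pear file is reloaded at run time. The operator description only states that a window punctuation on the second input port triggers the reload; the schema of the control stream, the 60-second delay, and all names below are assumptions made for illustration.

```
use com.ibm.streamsx.nlp::RutaText;

composite RutaTextReload {
    graph
        // Hypothetical source: one tuple per second.
        stream<rstring text> Docs = Beacon() {
            param
                period : 1.0;
            output
                Docs : text = "Alice flew from Berlin to Paris.";
        }

        // Hypothetical trigger: wait 60 seconds, then send one window
        // punctuation. No tuples are submitted on this stream.
        stream<boolean reload> Control = Custom() {
            logic
                onProcess : {
                    block(60.0);
                    submit(Sys.WindowMarker, Control);
                }
        }

        // The second input port receives the punctuation and, per the
        // operator description, causes the .pear file to be reloaded.
        stream<rstring xmi> Annotated = RutaText(Docs; Control) {
            param
                pearFile : "etc/sample.pear"; // hypothetical file name
                inputDoc : "text";
                casOut   : "xmi";
        }

        () as Printer = Custom(Annotated) {
            logic
                onTuple Annotated : printStringLn(xmi);
        }
}
```

In a real application the punctuation would more likely be driven by an external event, for example after a new version of the .pear file has been copied into place.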

Output Ports

Assignments
Java operators do not support output assignments.
Ports (0)

Port that produces tuples

Ports (1)

Port that reports detected errors

Parameters

casJson

If this parameter is set to true, the attribute specified by the parameter casOut contains the UIMA CAS as a serialized JSON string. If this parameter is not specified or is set to false, XMI serialization is used for the CAS output.

casOut

This parameter specifies the attribute of the output tuples that contains the UIMA CAS as a serialized XMI string (or as a JSON string, see the parameter casJson). The output attribute must be of type rstring. If this parameter is not specified, the operator expects the parameter outputAttributes to be set.
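
As an illustration of how casOut and casJson work together, the following sketch requests JSON instead of XMI serialization. The stream and attribute names and the file name etc/sample.pear are hypothetical, and the string-literal form of the casOut value is an assumption.

```
use com.ibm.streamsx.nlp::RutaText;

composite RutaTextJson {
    graph
        // Hypothetical source: one tuple with a single rstring attribute.
        stream<rstring text> Docs = Beacon() {
            param
                iterations : 1u;
            output
                Docs : text = "Alice flew from Berlin to Paris.";
        }

        // casOut names the rstring output attribute; casJson switches the
        // serialization from XMI (the default) to JSON.
        stream<rstring json> Annotated = RutaText(Docs) {
            param
                pearFile : "etc/sample.pear"; // hypothetical file name
                casOut   : "json";
                casJson  : true;
        }

        () as Printer = Custom(Annotated) {
            logic
                onTuple Annotated : printStringLn(json);
        }
}
```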

debugMode

If this parameter is set to true, additional information about the execution of a rule script is added to the CAS. The default value is false.

errorsAttribute

This parameter specifies the name of the output attribute that contains the reported errors. The attribute must be of type list<rstring>. If the error port (output port 1) is present, the operator expects the stream on output port 1 to contain this attribute. Otherwise, if this parameter is set, the attribute is required on output port 0.
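
A minimal sketch of an invocation with the optional error port, assuming the list<rstring> attribute described above is declared on output port 1. The stream and attribute names and the file name etc/sample.pear are hypothetical.

```
use com.ibm.streamsx.nlp::RutaText;

composite RutaTextErrors {
    graph
        // Hypothetical source: one tuple with a single rstring attribute.
        stream<rstring text> Docs = Beacon() {
            param
                iterations : 1u;
            output
                Docs : text = "Alice flew from Berlin to Paris.";
        }

        // Output port 0 carries the serialized CAS, output port 1 the errors.
        (stream<rstring xmi> Annotated;
         stream<list<rstring> errors> Errors) = RutaText(Docs) {
            param
                pearFile        : "etc/sample.pear"; // hypothetical file name
                inputDoc        : "text";
                casOut          : "xmi";
                errorsAttribute : "errors";
        }

        () as ResultPrinter = Custom(Annotated) {
            logic
                onTuple Annotated : printStringLn(xmi);
        }

        // Print the reported errors, one list per tuple.
        () as ErrorPrinter = Custom(Errors) {
            logic
                onTuple Errors : println(errors);
        }
}
```

Without the second output port, the errors attribute would instead have to be part of the schema on output port 0.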

inputDoc

This optional parameter specifies the attribute of the input tuples that is passed to the UIMA Analysis Engine. If the input tuple has only one attribute, this parameter is not required.

languageCode

This optional parameter specifies the ISO language code to be used by UIMA. The default value is en for English.

languageCodeAttribute

This optional parameter enables the language to be specified on a tuple-by-tuple basis. It specifies the name of the attribute that contains the language code.
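
The per-tuple language selection can be sketched as follows. The attribute name lang, the alternating language values, and the file name etc/sample.pear are hypothetical; whether the Ruta script in a given .pear file actually uses the language is outside the scope of this sketch.

```
use com.ibm.streamsx.nlp::RutaText;

composite RutaTextLanguage {
    graph
        // Hypothetical source: tuples that alternate between two language codes.
        stream<rstring text, rstring lang> Docs = Beacon() {
            param
                iterations : 4u;
            output
                Docs : text = "Berlin",
                       lang = IterationCount() % 2ul == 0ul ? "en" : "de";
        }

        // languageCodeAttribute names the input attribute that holds the
        // ISO language code for each tuple. inputDoc is needed here because
        // the input tuple has more than one attribute.
        stream<rstring xmi> Annotated = RutaText(Docs) {
            param
                pearFile              : "etc/sample.pear"; // hypothetical file name
                inputDoc              : "text";
                languageCodeAttribute : "lang";
                casOut                : "xmi";
        }

        () as Printer = Custom(Annotated) {
            logic
                onTuple Annotated : printStringLn(xmi);
        }
}
```

If every tuple uses the same language, the languageCode parameter is the simpler choice.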

outputAttributes

This parameter specifies the names of the tuple attributes on the output port that receive the annotations. This parameter can be specified more than once. The operator assumes that the views from the parameter outputViews are in the same order as the attribute names in this parameter. If this parameter is not specified, the operator expects the parameter casOut to be set. Each attribute must be of a list type.

outputTypes

This optional parameter specifies the fully qualified type names that are used to filter the output. This parameter can be specified more than once. The output attributes that are set with the parameter outputAttributes contain annotations of these types only.

outputViews

This optional parameter specifies the fully qualified view names to output. This parameter can be specified more than once. The operator assumes that the output tuple attribute names from the parameter outputAttributes are in the same order as the views in this parameter. If this parameter is not specified, the operator expects that the parameter outputAttributes contains a single output tuple attribute only.

pearFile

This parameter specifies the PEAR file to be installed. The file should be stored in the etc directory and can be specified with an absolute or a relative path. A relative path is resolved against the root of the application directory.

removeBasics

If this parameter is set to true, all inference annotations are removed, so the CAS XMI output does not contain these basic annotations. The default value is false. This parameter is relevant only if the parameter casOut is set.

trimInputDoc

If this optional parameter is set to false, the trim function is not applied to the input document and leading whitespace characters are not removed. The default value is true.

view

This parameter specifies the view of the CAS.

Libraries

Operator class library
Library Path:
../../impl/lib/com.ibm.streamsx.nlp.jar
../../impl/lib/apache-uima/uima-core.jar
../../impl/lib/apache-uima/ruta-core-2.4.0.jar
../../impl/lib/apache-uima/uimafit-core-2.1.0.jar
../../impl/lib/apache-uima/antlr-runtime-3.5.2.jar
../../impl/lib/apache-uima/spring-asm-3.1.2.RELEASE.jar
../../impl/lib/apache-uima/spring-beans-3.1.2.RELEASE.jar
../../impl/lib/apache-uima/spring-context-3.1.2.RELEASE.jar
../../impl/lib/apache-uima/spring-core-3.1.2.RELEASE.jar
../../impl/lib/apache-uima/spring-expression-3.1.2.RELEASE.jar
../../impl/lib/apache-uima/commons-lang-2.6.jar
../../impl/lib/apache-uima/commons-logging-1.1.1.jar
../../impl/lib/apache-uima/commons-lang3-3.1.jar
../../impl/lib/apache-uima/commons-collections-3.2.1.jar
../../impl/lib/apache-uima/commons-io-2.4.jar
../../impl/lib/apache-uima/commons-math3-3.0.jar
../../impl/lib/apache-uima/uimaj-json.jar
../../impl/lib/apache-uima/jackson-core-2.9.5.jar