RutaText (toolkit com.ibm.streamsx.nlp 1.9.2, namespace com.ibm.streamsx.nlp)
The Java operator RutaText uses Apache UIMA Ruta rules to annotate the text of incoming tuples; the text attribute is of type rstring. The operator creates a UIMA CAS from the text, applies the Ruta script to the CAS, serializes the resulting CAS to XMI, and then submits a tuple that carries the result as an rstring. The Ruta rules, resources, and CAS types are expected in a UIMA .pear file. The .pear file is loaded when the operator initializes and is reloaded when a window punctuation is received on the second input port. The .pear file is installed in the data directory under 'installedPears<OPERATOR_NAME>'. If the data directory is not set, /tmp is used for the installation. If this operator is used in the Streaming Analytics service (IBM Cloud), the data directory must be set to '/tmp'. For a detailed sample description of creating a Ruta .pear file, see ./doc/UIMA_workbench.pdf in the toolkit directory.
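The installation location described above can be sketched as a small path computation. This is only an illustration: the function name `pear_install_dir` and its arguments are hypothetical, and the sketch takes the pattern 'installedPears<OPERATOR_NAME>' literally (prefix directly followed by the operator name). The operator may normalize the name differently.

```python
from pathlib import PurePosixPath
from typing import Optional


def pear_install_dir(operator_name: str, data_dir: Optional[str] = None) -> str:
    """Return the directory into which the .pear file would be installed.

    Assumption: the directory name is the literal prefix 'installedPears'
    followed by the operator name, as the description suggests.
    """
    # Fall back to /tmp when no data directory is configured.
    base = PurePosixPath(data_dir) if data_dir else PurePosixPath("/tmp")
    return str(base / f"installedPears{operator_name}")
```

With no data directory configured, the result is a path below /tmp, which is also the location the Streaming Analytics service requires.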
Required parameter: pearFile
Optional parameters: casJson, casOut, debugMode, errorsAttribute, inputDoc, languageCode, languageCodeAttribute, outputAttributes, outputTypes, outputViews, removeBasics, trimInputDoc, view
Input port 0: Port that ingests tuples
Input port 1: Optional control port
Output port 0: Port that produces tuples
Output port 1: Port that reports detected errors
casJson: If this parameter is set to true, then the attribute specified with the parameter casOut contains the UIMA CAS as a serialized JSON string. If this parameter is not specified or is set to false, then XMI serialization is used for the CAS output.
casOut: This parameter specifies the attribute of the output tuples that contains the UIMA CAS as a serialized XMI string (or JSON string, see parameter casJson). The output attribute is of type rstring. If this parameter is not specified, the operator expects that the parameter outputAttributes is set.
debugMode: If this parameter is set to true, then additional information about the execution of a rule script is added to the CAS. The default value of this parameter is false.
errorsAttribute: This parameter specifies the name of the attribute that contains the reported errors. The output attribute is of type list<rstring>. If the error port (output port 1) is specified, the operator expects the output stream on port 1 to contain this attribute. Otherwise, if this parameter is set, the errors attribute is required on output port 0.
inputDoc: This optional parameter specifies the attribute of the input tuples that is passed to the UIMA Analysis Engine. If there is only one attribute on the input tuple, this parameter is not required.
languageCode: This optional parameter specifies the ISO language code to be used by UIMA. The default value is en for English.
languageCodeAttribute: This optional parameter enables the language to be specified on a tuple-by-tuple basis. It specifies the name of the attribute that contains the language code.
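How the two language parameters might interact can be sketched as follows. The documentation does not state a precedence, so this sketch makes two assumptions: a per-tuple value named by languageCodeAttribute wins over the static languageCode, and an empty per-tuple value falls back to the static code. The function `resolve_language` and the dict-based tuple are hypothetical stand-ins, not part of the toolkit.

```python
from typing import Mapping, Optional

DEFAULT_LANGUAGE = "en"  # documented default of languageCode


def resolve_language(
    tuple_attrs: Mapping[str, str],
    language_code: Optional[str] = None,
    language_code_attribute: Optional[str] = None,
) -> str:
    """Pick the language code for one input tuple.

    Assumption: the per-tuple attribute takes precedence when it is
    configured and non-empty; otherwise the static code or 'en' is used.
    """
    if language_code_attribute is not None:
        value = tuple_attrs.get(language_code_attribute)
        if value:
            return value
    return language_code or DEFAULT_LANGUAGE
```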
outputAttributes: This parameter specifies the names of the tuple attributes on the output port that receive the annotations. This parameter can be specified more than once. The operator assumes that the views from the parameter outputViews are in the same order as the attribute names in this parameter. If this parameter is not specified, the operator expects that the parameter casOut is set. Each attribute must be of a list type.
outputTypes: This optional parameter specifies the fully qualified type names used to filter the output for a set of types. This parameter can be specified more than once. The output attributes that are set with the parameter outputAttributes contain annotations of these types only.
outputViews: This optional parameter specifies the fully qualified view names to output. This parameter can be specified more than once. The operator assumes that the output tuple attribute names from the parameter outputAttributes are in the same order as the views in this parameter. If this parameter is not specified, the operator expects that the parameter outputAttributes contains a single output tuple attribute only.
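The ordering rules that tie casOut, outputAttributes, and outputViews together can be illustrated with a small consistency check. This is a sketch under stated assumptions, not the operator's actual validation: the function `pair_attributes_with_views` is hypothetical, the requirement that both lists have equal length is inferred from the "same order" wording, and `None` stands for "no explicit view" when outputViews is omitted.

```python
from typing import List, Optional, Sequence, Tuple


def pair_attributes_with_views(
    output_attributes: Sequence[str],
    output_views: Optional[Sequence[str]] = None,
    cas_out: Optional[str] = None,
) -> List[Tuple[str, Optional[str]]]:
    """Pair each output attribute with the view it is filled from."""
    attrs = list(output_attributes)
    views = list(output_views or [])

    # At least one way of emitting results must be configured.
    if not attrs and cas_out is None:
        raise ValueError("either casOut or outputAttributes must be set")

    if not views:
        # Without outputViews, only a single output attribute is expected.
        if len(attrs) > 1:
            raise ValueError("without outputViews, outputAttributes must name a single attribute")
        return [(attr, None) for attr in attrs]

    # Assumption: positional pairing implies equal list lengths.
    if len(views) != len(attrs):
        raise ValueError("outputViews and outputAttributes must have the same length")
    return list(zip(attrs, views))
```

The attribute and view names passed in are free-form strings here; in an application they would be the names configured on the operator invocation.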
pearFile: This parameter specifies the PEAR file to be installed. The file should be stored in the etc directory and can be specified as an absolute or a relative path. A relative path is resolved against the root of the application directory.
removeBasics: If this parameter is set to true, then all inference annotations are removed, so the CAS XMI output does not contain these basic annotations. The default value of this parameter is false. This parameter is relevant only if the parameter casOut is set.
trimInputDoc: If this optional parameter is set to false, then the trim function is not applied to the input document, so leading whitespace characters are not removed. The default value of this parameter is true.
view: This parameter specifies the view of the CAS.