Operator DictionaryFilter

Toolkits > com.ibm.streamsx.nlp 1.9.2 > com.ibm.streamsx.nlp > DictionaryFilter

This operator checks the words of a text against a dictionary. If filterMode remove is selected, words that are found in the dictionary are removed from the text. If filterMode keep is selected, only words that are found in the dictionary remain in the text.

Summary

Ports
This operator has 2 input ports and 1 output port.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 6 parameters.

Required: dictionaryFile, textAttribute

Optional: commandAttribute, filterMode, ignoreCase, wordAttribute

Metrics
This operator does not report any metrics.

Properties

Implementation
C++
Threading
Always - Operator always provides a single threaded execution context.

Input Ports

Ports (0)

Port that ingests tuples to be manipulated


Ports (1)

The second input port is the command input port to clear and update the dictionary.

If the command input port exists, the commandAttribute parameter is mandatory. The command input port must have an rstring attribute that holds the command string (see the commandAttribute parameter) and an rstring attribute that holds the word (see the wordAttribute parameter).

Supported commands are

  • clear - Resets the dictionary.
  • add - Adds the word held in the attribute named by wordAttribute to the dictionary.
A window punctuation marks the end of the dictionary update and must follow the add command.
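
The command protocol described above can be sketched in SPL as follows. This is a minimal sketch and is not taken from the toolkit's samples. The stream names (Sentences, Commands, Filtered), the attribute names command and word, the sample strings, and the assumption that the output attribute text carries the filtered text are all illustrative. The attribute parameters are written as bare names, following the textAttribute usage in the code template.

```spl
use com.ibm.streamsx.nlp::DictionaryFilter;

composite DictionaryUpdateSketch {
    graph
        // Hypothetical text source: emits one sentence after a short delay,
        // giving the command stream a head start.
        stream<rstring text> Sentences = Beacon() {
            param
                iterations : 1u;
                initDelay : 5.0;
            output
                Sentences : text = "the quick brown fox";
        }

        // Hypothetical command source: resets the dictionary, adds one word,
        // then sends the window punctuation that ends the update.
        stream<rstring command, rstring word> Commands = Custom() {
            logic
                onProcess : {
                    submit({command = "clear", word = ""}, Commands);
                    submit({command = "add", word = "fox"}, Commands);
                    submit(Sys.WindowMarker, Commands);
                }
        }

        // Port 0 receives the text tuples, port 1 receives the commands.
        stream<rstring text> Filtered = DictionaryFilter(Sentences; Commands) {
            param
                textAttribute : text;
                dictionaryFile : "etc/stopwords.csv";
                commandAttribute : command;
                wordAttribute : word;
        }

        // Print whatever the operator forwards.
        () as Sink = Custom(Filtered) {
            logic
                onTuple Filtered : printStringLn(text);
        }
}
```

Because the two input ports are fed independently, nothing in this sketch guarantees that the update is applied before the first text tuple arrives; the initDelay on the Beacon only makes that likely.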

Output Ports

Assignments
This operator allows any SPL expression of the correct type to be assigned to output attributes.
Ports (0)

Port that outputs the modified tuples.


Parameters

This operator supports 6 parameters.

Required: dictionaryFile, textAttribute

Optional: commandAttribute, filterMode, ignoreCase, wordAttribute

commandAttribute

Specifies the rstring attribute of the optional command input port that holds the command that is described in “Input Ports.”

If you specify a value for this parameter, the optional command input port is mandatory. Conversely, if the optional command input port exists, this parameter is mandatory.


dictionaryFile

Specifies the dictionary file that is loaded on startup. If a relative path is used, it is resolved against the application directory. It is recommended to store the file in the etc directory.

Words in the dictionary file can be separated by spaces or newlines.


filterMode

Determines which parts of the text are forwarded. If filterMode remove is selected, all words of the input attribute specified by textAttribute that exist in the dictionary are removed. This is the default if the filterMode parameter is not specified. If filterMode keep is selected, only those words of the text that are stored in the dictionary are forwarded.
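
As a rough illustration of the two modes, the sketch below runs one sentence through the operator in keep mode. It assumes a dictionary file etc/stopwords.csv (the path used in the code template) that contains the word the. The stream names, the sample sentence, and the output attribute name text are hypothetical, and keep is written in the same qualified form that the template uses for remove.

```spl
use com.ibm.streamsx.nlp::DictionaryFilter;

composite KeepModeSketch {
    graph
        // Hypothetical source: emits a single sample sentence.
        stream<rstring text> Sentences = Beacon() {
            param
                iterations : 1u;
            output
                Sentences : text = "the quick brown fox";
        }

        // keep mode: forward only the words that are stored in the dictionary.
        // Replacing keep with remove (or omitting filterMode) inverts the selection.
        stream<rstring text> Filtered = DictionaryFilter(Sentences) {
            param
                textAttribute : text;
                dictionaryFile : "etc/stopwords.csv";
                filterMode : DictionaryFilter.keep;
        }

        // Print the filtered text.
        () as Sink = Custom(Filtered) {
            logic
                onTuple Filtered : printStringLn(text);
        }
}
```

Under those assumptions, keep mode would forward only the dictionary word the from the sample sentence, whereas the default remove mode would forward the remaining words. How whitespace between words is treated in the output is not specified here.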


ignoreCase

Specifies whether the dictionary match ignores case. The default value is true, meaning that a case-insensitive match is performed.


textAttribute

Specifies the attribute of the input stream that holds the text to be filtered.


wordAttribute

Specifies the rstring attribute of the optional command input port that holds the word as described in “Input Ports.”

If this parameter is specified, the optional command input port is mandatory. If this parameter is omitted and the command input port exists, commands that need a word (such as add) are not supported.


Code Templates

DictionaryFilter
stream<${schema}> ${outputStream} = DictionaryFilter(${inputStream}) {
    param
        filterMode : DictionaryFilter.remove;
        textAttribute : text;
        dictionaryFile : "etc/stopwords.csv";
    output
        ${outputStream} : ${outputAttribute} = ${value};
}