Toolkit technical background overview

IBM Natural Language Processing Toolkit

The streamsx.nlp toolkit integrates the Text Analytics component of Apache UIMA, which provides a system for extracting information from text data.

This version of the toolkit is intended for use with IBM Streams release 3.2 and later.

As an alternative to the com.ibm.streamsx.nlp::RutaText operator, which is based on UIMA Ruta rules, you can use the com.ibm.streams.text.analytics::TextExtract operator from the com.ibm.streams.text toolkit, which is based on AQL rules.

Beyond what the com.ibm.streams.text toolkit offers, the streamsx.nlp toolkit provides functions that are useful in a natural language analysis pipeline: lemmatization, part-of-speech tagging, stop-word removal, dictionary filtering, n-gram generation, tf-idf (weighting how significant a word is to a text), classification (determining which category a text belongs to), and content ranking (determining the intent of a text).
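
To make three of these terms concrete, the following is a minimal Python sketch of stop-word removal, n-gram generation, and tf-idf weighting. It is illustrative only and does not use or reproduce the toolkit's operators: the function names, the stop-word list, and the sample documents are assumptions made for this example, and the tf-idf variant shown (term frequency divided by document length, multiplied by log(N / document frequency)) is one common textbook formulation that may differ from what the toolkit computes.

```python
import math
from collections import Counter

# Hypothetical stop-word list, chosen only for this example.
STOP_WORDS = {"the", "a", "is", "on", "of"}


def tokenize(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        if total == 0:
            scores.append({})
            continue
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Sample documents, made up for this example.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
bigrams = ngrams(tokenize(docs[0]), 2)
scores = tf_idf(docs)
```

In this sketch, a term that occurs in only one document (such as "mat") receives a higher weight in that document than a term shared with other documents (such as "cat"), which is the sense in which tf-idf expresses how significant a word is to a text.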