Operator NgramBasic

Toolkit: com.ibm.streamsx.nlp 1.9.2 (namespace com.ibm.streamsx.nlp)

This operator generates n-grams, that is, sequences of n consecutive words, from the document text of each input tuple.

Summary

Ports
This operator has 1 input port and 1 output port.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 4 parameters.

Required: documentAttribute, size

Optional: minSize, wordAttribute

Metrics
This operator does not report any metrics.

Properties

Implementation
C++
Threading
Always - The operator always provides a single-threaded execution context.

Input Ports

Ports (0)

Tuples containing the documents for n-gram generation.

Output Ports

Assignments
This operator allows any SPL expression of the correct type to be assigned to output attributes.
Output Functions
Functions
<any T> T AsIs(T)

Returns the value of its argument expression unchanged.

public list<tuple<rstring term>> NgramTerms()

Returns the n-grams of the current document as a list of tuples, each holding one n-gram in the attribute term.

Example use:

stream<rstring text, list<tuple<rstring term>> result> A = NgramBasic(Docs) {
   // Docs is a placeholder input stream, assumed to carry an rstring attribute text
   param ...
   output A : result = NgramTerms();
}

public list<rstring> NgramTermsList()

Returns the n-grams of the current document as a list of rstring values.

Example use:

stream<rstring text, list<rstring> terms> A = NgramBasic(Docs) {
   // Docs is a placeholder input stream, assumed to carry an rstring attribute text
   param ...
   output A : terms = NgramTermsList();
}

public map<rstring, uint32> NgramCount()

Returns a map from each n-gram term to the number of times it occurs in the document.
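To make the counting semantics concrete, here is a minimal C++ sketch of what such a frequency map contains for a plain string document. It is not the operator's implementation: the function name ngramCount is hypothetical, and both whitespace tokenization and joining the words of an n-gram with a single space are assumptions that this page does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: counts how often each n-gram of `size` words occurs
// in `text`. Whitespace tokenization and the single-space separator are
// assumptions, not documented operator behavior.
std::map<std::string, std::uint32_t> ngramCount(const std::string& text,
                                                std::size_t size) {
    assert(size >= 1);  // an n-gram needs at least one word

    // Split the document into words on whitespace.
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);

    // Slide a window of `size` words over the document and count each term.
    std::map<std::string, std::uint32_t> counts;
    for (std::size_t i = 0; i + size <= words.size(); ++i) {
        std::string term = words[i];
        for (std::size_t j = 1; j < size; ++j) term += " " + words[i + j];
        ++counts[term];  // operator[] inserts 0 on first access
    }
    return counts;
}
```

For example, with size 2 the document "to be or not to be" maps "to be" to 2 and each of the other three bigrams to 1. The value type std::uint32_t mirrors the uint32 values of the SPL map.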

Example use:

stream<rstring text, map<rstring, uint32> ngramMap> A = NgramBasic(Docs) {
   // Docs is a placeholder input stream, assumed to carry an rstring attribute text
   param ...
   output A : ngramMap = NgramCount();
}

Ports (0)

This mandatory output port sends the n-gram tuples from the documents received on input port 0.

Parameters

documentAttribute

The input stream attribute containing the text for n-gram generation. It must be of type SPL::rstring, SPL::list<rstring>, or SPL::list<tuple<rstring word>>.

minSize

Defines the lower bound of a range of n-gram sizes. The value must be less than or equal to the size parameter; otherwise, this parameter has no effect. For example, if size is 3 and minSize is 1, then unigrams, bigrams, and trigrams are generated.
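The range behavior can be illustrated with a minimal C++ sketch. The function name ngramRange is hypothetical, and whitespace tokenization, a single-space separator, and the output order (shorter n-grams first) are assumptions of this sketch rather than documented operator behavior. Following the rule above, a minSize larger than size is ignored, so only n-grams of length size are produced.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: all n-grams whose length lies in [minSize, size].
// If minSize > size, minSize has no effect and only size-grams are built.
std::vector<std::string> ngramRange(const std::string& text,
                                    std::size_t minSize, std::size_t size) {
    assert(size >= 1 && minSize >= 1);
    if (minSize > size) minSize = size;  // "no effect" case

    // Assumed tokenization: split on whitespace.
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);

    std::vector<std::string> result;
    for (std::size_t n = minSize; n <= size; ++n) {      // each n-gram length
        for (std::size_t i = 0; i + n <= words.size(); ++i) {  // each start word
            std::string term = words[i];
            for (std::size_t j = 1; j < n; ++j) term += " " + words[i + j];
            result.push_back(term);
        }
    }
    return result;
}
```

With the example values (size 3, minSize 1), ngramRange("a b c", 1, 3) yields six terms: the unigrams "a", "b", "c", the bigrams "a b", "b c", and the trigram "a b c".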

size

The number of consecutive words combined into one term. With size=1 the terms are unigrams, that is, the individual words of the document. With size=2 they are bigrams, and with size=3 they are trigrams. Larger values are usually referred to as four-grams, five-grams, and so on.
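The following C++ sketch shows what a fixed size means for a plain string document. It is only an illustration: the function name ngrams is hypothetical, and whitespace tokenization and joining words with a single space are assumptions not stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: every run of exactly `size` consecutive words,
// joined with a single space (assumed separator).
std::vector<std::string> ngrams(const std::string& text, std::size_t size) {
    assert(size >= 1);  // size counts words, so it must be positive

    // Assumed tokenization: split on any whitespace.
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);

    // A document with fewer than `size` words yields no n-grams.
    std::vector<std::string> result;
    for (std::size_t i = 0; i + size <= words.size(); ++i) {
        std::string term = words[i];
        for (std::size_t j = 1; j < size; ++j) term += " " + words[i + j];
        result.push_back(term);
    }
    return result;
}
```

For instance, ngrams("the quick brown fox", 2) returns "the quick", "quick brown", and "brown fox"; in this sketch, a size larger than the number of words returns an empty list.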

wordAttribute

The name of the tuple attribute that contains the word when documentAttribute is of type SPL::list. If this parameter is not specified, the list's tuple type must either contain an attribute named word, or its first attribute must be of type rstring.
