Operator NgramBasic (toolkit com.ibm.streamsx.nlp 1.9.2, namespace com.ibm.streamsx.nlp)
This operator generates n-grams of co-occurring words within a given window.
Required: documentAttribute, size
Optional: minSize, wordAttribute
Input port 0: tuples for n-gram generation.
The original argument expression is submitted.
The NgramTerms() output function generates a list of n-gram term tuples.
Example use:
stream<rstring text, list<tuple<rstring term>> result> A = NgramBasic() {
    param ...
    output A : result = NgramTerms();
}
The NgramTermsList() output function generates a list of n-gram terms.
Example use:
stream<rstring text, list<rstring> terms> A = NgramBasic() {
    param ...
    output A : terms = NgramTermsList();
}
The NgramCount() output function generates a map from each n-gram term to the frequency of that term in the document.
Example use:
stream<rstring text, map<rstring, uint32> ngramMap> A = NgramBasic() {
    param ...
    output A : ngramMap = NgramCount();
}
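To illustrate the shape of a term-to-frequency map, here is a minimal Python sketch. It is not the operator's implementation: it assumes whitespace tokenization and that the words of a term are joined with a single space, neither of which is confirmed by this page, and `ngram_count` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict


def ngram_count(text: str, size: int) -> Dict[str, int]:
    """Map each n-gram of `size` words to its frequency in `text`."""
    words = text.split()  # assumption: whitespace tokenization
    # Slide a window of `size` words across the document.
    # Assumption: words within a term are joined by one space.
    terms = (" ".join(words[i:i + size]) for i in range(len(words) - size + 1))
    return dict(Counter(terms))


counts = ngram_count("to be or not to be", 2)
print(counts)
```

In this sketch the bigram "to be" occurs twice in the sample sentence, so it maps to 2, while every other bigram maps to 1.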
This mandatory output port emits the n-gram tuples generated from the documents received on input port 0.
documentAttribute: The input stream attribute containing the text for n-gram generation. It must be of type SPL::rstring, SPL::list<rstring>, or SPL::list<tuple<rstring word>>.
minSize: Defines a range of n-gram sizes, from minSize up to size. The value must be less than or equal to the size parameter; otherwise it has no effect. For example, if size is 3 and minSize is 1, then unigrams, bigrams, and trigrams are generated.
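The range behaviour can be sketched in Python as follows. This is an illustration under assumptions, not the operator's code: `ngram_range` is a hypothetical helper, terms are joined with a single space, "no effect" is read as "only n-grams of exactly `size` words are produced", and the order of the returned terms is arbitrary and may not match the operator's.

```python
from typing import List, Optional


def ngram_range(words: List[str], size: int, min_size: Optional[int] = None) -> List[str]:
    """Return all n-grams with min_size <= n <= size (assumes min_size >= 1)."""
    # Assumption: a missing or too-large min_size has no effect,
    # so only n-grams of exactly `size` words are generated.
    if min_size is None or min_size > size:
        min_size = size
    terms = []
    for n in range(min_size, size + 1):
        for i in range(len(words) - n + 1):
            terms.append(" ".join(words[i:i + n]))
    return terms


terms = ngram_range(["a", "b", "c"], size=3, min_size=1)
print(terms)
```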
size: The number of words combined into each term. With size=1 the terms are unigrams, which are essentially the individual words of a sentence. With size=2 they are bigrams, and with size=3 they are trigrams. For size>3 the terms are usually called four-grams, five-grams, and so on.
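As a small illustration of how the window size changes the generated terms, the following Python sketch slides a window over a word list. The helper name `ngrams`, the whitespace tokenization, and the single-space join are assumptions made for this example rather than documented behaviour of the operator.

```python
from typing import List


def ngrams(words: List[str], size: int) -> List[str]:
    """Return every run of `size` consecutive words as one term."""
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


words = "the quick brown fox".split()  # assumption: whitespace tokenization
unigrams = ngrams(words, 1)
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)
print(unigrams, bigrams, trigrams, sep="\n")
```

A document of k words yields k - size + 1 terms, so four words give four unigrams, three bigrams, and two trigrams.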
wordAttribute: The name of the tuple attribute containing the word when documentAttribute is a list of tuples. If this parameter is not specified, the tuple type must either contain an attribute named word or have a first attribute of type rstring.
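The lookup order described above might be sketched like this in Python, with each SPL tuple modelled as an ordered dictionary. The function `extract_words` and the error raised in the fallback case are hypothetical; how the operator actually reports an unsuitable tuple type is not stated on this page.

```python
from typing import Dict, List, Optional


def extract_words(records: List[Dict[str, object]],
                  word_attribute: Optional[str] = None) -> List[str]:
    """Pick the word from each record using the documented fallback order."""
    words = []
    for record in records:
        if word_attribute is not None:
            # 1. An explicitly named attribute wins.
            value = record[word_attribute]
        elif "word" in record:
            # 2. Otherwise use an attribute named "word".
            value = record["word"]
        else:
            # 3. Otherwise fall back to the first attribute, which must be a string.
            value = next(iter(record.values()))
            if not isinstance(value, str):
                raise TypeError("first attribute is not a string")  # hypothetical error handling
        words.append(value)
    return words


print(extract_words([{"pos": "DT", "token": "the"}], word_attribute="token"))
print(extract_words([{"id": 1, "word": "fox"}]))
print(extract_words([{"lemma": "run", "n": 3}]))
```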