C++ Native Functions: com.ibm.streamsx.nlp.utils

Toolkit: com.ibm.streamsx.nlp 1.9.2

This page documents the native functions that can be invoked from SPL, together with the SPL interface used to invoke each function.

Functions

public stateful void generateNgram(list<rstring> words, uint32 minSize, uint32 maxSize, mutable list<rstring> terms)

Generates a set of n-grams of co-occurring words within a given window. For example, if minSize is 1 and maxSize is 3, then unigrams, bigrams, and trigrams are generated.

Parameters
words

Text for n-gram generation

minSize

The minimum number of words combined into a new term

maxSize

The maximum number of words combined into a new term

terms

Generated n-gram output
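
The windowing idea behind the minSize/maxSize range can be illustrated with a minimal C++ sketch. This is not the toolkit's implementation: the helper name sketchNgrams is hypothetical, and it assumes that the words of a term are joined with a single space and that terms are emitted in order of increasing size. The toolkit's actual separator and output order are not stated on this page.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not the toolkit's code: collects every run of n
// consecutive words for each n in [minSize, maxSize].
// Assumption: words within a term are joined by a single space.
std::vector<std::string> sketchNgrams(const std::vector<std::string>& words,
                                      std::size_t minSize,
                                      std::size_t maxSize) {
    std::vector<std::string> terms;
    if (minSize == 0) {
        minSize = 1;  // a zero-word term carries no information
    }
    for (std::size_t n = minSize; n <= maxSize && n <= words.size(); ++n) {
        // Slide a window of n words across the input.
        for (std::size_t start = 0; start + n <= words.size(); ++start) {
            std::string term = words[start];
            for (std::size_t k = 1; k < n; ++k) {
                term += ' ';
                term += words[start + k];
            }
            terms.push_back(term);
        }
    }
    return terms;
}
```

Under these assumptions, the input words "the", "quick", "fox" with minSize 1 and maxSize 3 yield three unigrams, two bigrams, and one trigram.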

public stateful void generateNgram(list<rstring> words, uint32 size, mutable list<rstring> terms)

Generates n-grams of co-occurring words within a given window. All generated terms have the same size.

Parameters
words

Text for n-gram generation

size

The number of words combined into each new term

terms

Generated n-gram output
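
For a single fixed size, the same idea reduces to one sliding window. The sketch below is illustrative only: sketchFixedNgrams is a hypothetical name, and the space separator and the handling of edge cases (size 0, or size larger than the input) are assumptions rather than documented toolkit behavior.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not the toolkit's code: returns every run of `size`
// consecutive words, joined by a single space (assumed separator).
std::vector<std::string> sketchFixedNgrams(const std::vector<std::string>& words,
                                           std::size_t size) {
    std::vector<std::string> terms;
    // Assumed edge-case handling: no terms if the window cannot fit.
    if (size == 0 || size > words.size()) {
        return terms;
    }
    terms.reserve(words.size() - size + 1);
    for (std::size_t start = 0; start + size <= words.size(); ++start) {
        std::string term;
        for (std::size_t k = 0; k < size; ++k) {
            if (k > 0) {
                term += ' ';
            }
            term += words[start + k];
        }
        terms.push_back(term);
    }
    return terms;
}
```

A list of w words produces w - size + 1 terms in this sketch, so four words with size 2 give three bigrams.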

public stateful rstring getToolkitPath()

Returns the toolkit root directory.

public stateful boolean initializeLemmatizer()

Initializes the lemmatizer and reads the lexicon files from the default directory (<toolkit_dir>/etc/gposttl).

Returns

On success, true is returned. On error, false is returned.

public stateful boolean initializeLemmatizer(rstring directory)

Initializes the lemmatizer and reads the lexicon files from the given directory.

Parameters
directory

The directory containing the lexicon files

Returns

On success, true is returned. On error, false is returned.

public stateful void lemmatize(rstring text, mutable list<rstring> lemmas)

Lemmatizes the input text.

Parameters
text

Text to lemmatize

lemmas

Lemma output

public stateful void lemmatize(rstring text, mutable list<rstring> words, mutable list<rstring> pos, mutable list<rstring> lemmas)

Performs part-of-speech (PoS) tagging and lemmatization of the input text.

Parameters
text

Text to tag and lemmatize

words

Output list of the words found in the text

pos

Part-of-Speech tag output

lemmas

Lemma output