Operators
- ContentRanking: This operator uses a previously trained model to find the most likely intent of a text.
- ContentRankingModelBuilder: This operator trains a model for ContentRanking based on training documents.
- ContentRankingModelBuilder2: This operator trains a model for ContentRanking based on training documents.
- DictionaryFilter: This operator checks the words of a text against a dictionary.
- IdfCorpusBuilder: This operator trains the model (aka corpus) for TfIdfWeight based on training data.
- Lemmatizer: This operator derives the dictionary form (aka lemma) of the words in a text (e.g. box for boxes) and determines their part-of-speech tags (e.g. NNS for noun, plural).
- LinearClassification: This operator identifies to which of a set of categories the input text belongs on the basis of a previously trained model.
- LinearClassificationModelBuilder: This operator builds a model on the basis of a training set of data whose category membership is known.
- LinearClassificationModelBuilder2: This operator builds a model on the basis of a training set of data whose category membership is known.
- NgramBasic: This operator generates n-grams of co-occurring words within a given window.
- Ngrams: This operator implements a rolling hash technique and uses ngramhashing.
- RutaCas: The Java operator RutaCas uses Apache UIMA Ruta rules to annotate incoming tuples that carry a serialized UIMA CAS (XMI) of type rstring.
- RutaText: The Java operator RutaText uses Apache UIMA Ruta rules to annotate incoming tuples that carry text of type rstring.
- TfIdfWeight: This operator determines how meaningful a term/word in a text is relative to a previously trained model (corpus).
- UimaCas: The Java operator UimaCas uses an Apache UIMA Analysis Engine to annotate incoming tuples that carry a serialized UIMA CAS (XMI) of type rstring.
- UimaText: The Java operator UimaText uses an Apache UIMA Analysis Engine to annotate incoming tuples that carry text of type rstring.
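
The relationship between IdfCorpusBuilder and TfIdfWeight in the list above can be pictured with a small Python sketch. It is not the toolkit's implementation: the function names `build_idf` and `tf_idf`, the whitespace tokenization, and the formula idf = log(N / df) are assumptions chosen for illustration, and the operators may weight terms differently.

```python
import math
from collections import Counter


def build_idf(documents):
    """Train a toy IDF 'corpus' (hypothetical stand-in for IdfCorpusBuilder)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term at most once per document.
        doc_freq.update(set(doc.lower().split()))
    # Assumed formula: idf = log(N / df); terms found in every document get 0.
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(text, idf):
    """Weight each term of a text against the trained corpus (toy TfIdfWeight)."""
    tokens = text.lower().split()
    if not tokens:
        return {}
    term_freq = Counter(tokens)
    # Terms missing from the corpus get weight 0.0 in this sketch.
    return {
        term: (count / len(tokens)) * idf.get(term, 0.0)
        for term, count in term_freq.items()
    }


corpus = ["the cat sat", "the dog ran"]
idf = build_idf(corpus)
weights = tf_idf("the cat", idf)
```

In this sketch a word such as "the", which occurs in every training document, receives a weight of zero, while "cat", which occurs in only one, receives a positive weight.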
Functions
- countNgrams(rstring, uint32): This function counts each n-gram in the string and places each count in the result list at the index at which the n-gram is located in the string.
- getNgrams(rstring, uint32): This function counts each n-gram in the string and places the n-gram and the counter in the result map as a key/value pair.
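
The difference between the two result shapes can be sketched in Python. This is an illustration under stated assumptions, not the toolkit's code: it assumes character n-grams, and it assumes the list returned by the count function has one entry per n-gram start position. The names `get_ngrams` and `count_ngrams` are hypothetical Python analogues of the functions above.

```python
from collections import Counter


def get_ngrams(text, n):
    """Return a map from each n-gram to its number of occurrences."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide a window of n characters over the text (assumed character n-grams).
    return dict(Counter(text[i:i + n] for i in range(len(text) - n + 1)))


def count_ngrams(text, n):
    """Return a list with the count of the n-gram starting at each position."""
    counts = get_ngrams(text, n)
    return [counts[text[i:i + n]] for i in range(len(text) - n + 1)]


ngram_map = get_ngrams("abab", 2)     # {'ab': 2, 'ba': 1}
ngram_list = count_ngrams("abab", 2)  # [2, 1, 2]
```

The map form is convenient for looking up a specific n-gram, while the list form keeps the counts aligned with positions in the input string.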