Toolkits > com.ibm.streamsx.nlp 1.9.2 > com.ibm.streamsx.nlp > TfIdfWeight
This operator determines how meaningful a term (word) in a text is, relative to a previously trained model (corpus).
Required: corpusFile, defaultIDF, documentAttribute
Optional: nTopWeightedTerms, termAttribute
Input port 0 receives the tuples for the TF-IDF calculation.
A tuple on this control input port contains the filename of the corpus file to load. A control port tuple can also change the defaultIDF value. Supported tuple attributes are rstring corpusFile and/or float64 defaultIDF.
The original argument expression is submitted.
WeightedTerms(): This function generates a list of terms and their corresponding TF-IDF values.
Example use:
stream<rstring text, list<tuple<rstring term, float tfidf>> result> A = TfIdfWeight() {
    param ...
    output A : result = WeightedTerms();
}
TopWeightedTerms(): This function generates a list of terms and their corresponding TF-IDF values, limited to the number of terms specified by the parameter nTopWeightedTerms.
Example use:
stream<rstring text, list<tuple<rstring term, float tfidf>> result> A = TfIdfWeight() {
    param ...
    output A : result = TopWeightedTerms();
}
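The effect of limiting the output to the top-weighted terms can be illustrated with a short Python sketch. This is not the operator's implementation: it assumes that "top" means the terms with the highest TF-IDF values, returned in descending order, which the text above does not state explicitly. The function name `top_weighted_terms` and the sample pairs are hypothetical.

```python
def top_weighted_terms(weighted_terms, n_top=None):
    """Return (term, tfidf) pairs ordered by descending weight.

    n_top=None mirrors an unspecified nTopWeightedTerms: all terms are kept.
    Assumption: "top" means highest TF-IDF value first.
    """
    ranked = sorted(weighted_terms, key=lambda pair: pair[1], reverse=True)
    return ranked if n_top is None else ranked[:n_top]


# Hypothetical (term, tfidf) pairs for one document.
pairs = [("the", 0.25), ("stream", 0.5), ("tuple", 1.0)]
top2 = top_weighted_terms(pairs, n_top=2)
all_terms = top_weighted_terms(pairs)
```

With `n_top=2` only the two highest-weighted pairs remain; without a limit, every pair is returned.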
This mandatory output port sends the TF-IDF tuples computed from the documents received on input port 0.
corpusFile: Filename of the corpus file, read at operator initialization. A relative path is resolved against the application directory. Storing the file in the etc directory is recommended.
defaultIDF: The IDF value used if a term is not in the corpus.
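To illustrate the role of the default IDF value, here is a minimal Python sketch. It is not the operator's implementation. It assumes the common definition in which the TF-IDF weight is the term frequency multiplied by the IDF, with the term frequency taken as the relative frequency of the term in the document; the operator's exact weighting scheme is not stated here. The names `tfidf_weights` and `corpus_idf`, and the IDF values, are hypothetical.

```python
from collections import Counter


def tfidf_weights(terms, corpus_idf, default_idf):
    """Return {term: tf-idf} for one document given as a list of terms.

    Assumption: tf is the relative frequency of the term in the document.
    Terms missing from corpus_idf fall back to default_idf.
    """
    if not terms:
        return {}
    counts = Counter(terms)
    total = len(terms)
    return {
        term: (count / total) * corpus_idf.get(term, default_idf)
        for term, count in counts.items()
    }


# Hypothetical IDF values standing in for a loaded corpus file.
corpus_idf = {"the": 0.5, "stream": 2.0}
weights = tfidf_weights(
    ["the", "stream", "the", "tuple"], corpus_idf, default_idf=4.0
)
```

In this sketch the term "tuple" does not occur in the corpus, so its weight is computed from the default: 0.25 × 4.0 = 1.0. A high default treats unknown terms as rare and therefore meaningful; a low default suppresses them.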
documentAttribute: The input stream attribute containing the document. It must be of type SPL::rstring, SPL::list<rstring>, or SPL::list<tuple<rstring term>>.
nTopWeightedTerms: Limits the number of terms in the output list. If this parameter is not specified, all terms are in the output list. This parameter is relevant only for the custom output function TopWeightedTerms().
termAttribute: The attribute containing the term if documentAttribute is of type SPL::list. If this parameter is not specified, the tuples in the list must either contain an attribute named term or have a first attribute of type rstring.
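The lookup rules for the term attribute can be sketched in Python as follows. This is an illustration only, not the operator's code: SPL tuples are modelled as Python dicts (which preserve attribute order), and splitting a plain string on whitespace is an assumption, since the tokenization applied to an rstring document is not described here. The function name `extract_terms` is hypothetical.

```python
def extract_terms(document, term_attribute=None):
    """Reduce a document in one of the three supported shapes to a term list."""
    if isinstance(document, str):
        # Assumption: a plain string document is split on whitespace.
        return document.split()
    terms = []
    for item in document:
        if isinstance(item, str):
            # List of strings: each element is already a term.
            terms.append(item)
        elif term_attribute is not None:
            # An explicitly configured attribute takes precedence.
            terms.append(item[term_attribute])
        elif "term" in item:
            # Otherwise look for an attribute named "term".
            terms.append(item["term"])
        else:
            # Otherwise fall back to the first attribute, which must be a string.
            first = next(iter(item.values()))
            if not isinstance(first, str):
                raise TypeError("first attribute is not a string")
            terms.append(first)
    return terms


from_string = extract_terms("streams process tuples")
from_tuples = extract_terms([{"word": "streams", "pos": "N"}], term_attribute="word")
```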