Content
- Operators
- LinearClassification: This operator identifies to which of a set of categories the input text belongs on the basis of a previously trained model.
- Types
Composites
composite LinearClassification(output OutStream; input InStream)
This operator identifies to which of a set of categories the input text belongs on the basis of a previously trained model.
Use the LinearClassification operator with Streams releases that do not support SPL Python primitive operators (for example, Streams 3.2). It invokes the Python scripts through a ShellPipe operator. With Streams 4.2 or later, it is recommended to create an SPL Python primitive operator that invokes Python classes or functions.
Parameters
- pythonCommand: The name of the Python binary. The default is python. Use this parameter to select the Python version and location that match your environment. The linear classification scripts require Python 2.7 or later.
- pythonScript: The name of the python script. The default is <toolkit_dir>/etc/python/LinearClassification.py.
- modelFilesDirectory: The name of the directory that contains the model .pkl files.
- initOnFirstTuple: By default, the script is called on operator startup. If this parameter is set to true, the script is called when the first tuple arrives instead.
- documentAttribute: The input attribute that is used for the classification.
- outStreamType: The schema of the OutStream (output port 0). It must contain the attributes defined by LinearClassification.resultType.
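The requirement that the OutStream schema "contain" resultType can be read as a superset check: every attribute of LinearClassification.resultType must appear with the same name and type, while further attributes (for example, forwarded input attributes) may be present. The Python sketch below illustrates that reading only. It assumes SPL tuple schemas are modeled as name-to-type dictionaries, and the function missing_result_attributes is hypothetical, not part of the toolkit.

```python
from typing import Dict, List

# Attributes of LinearClassification.resultType, written as SPL type strings.
RESULT_TYPE: Dict[str, str] = {
    "classes": "list<rstring>",
    "decisions": "list<float64>",
    "modelClasses": "list<rstring>",
}


def missing_result_attributes(out_schema: Dict[str, str]) -> List[str]:
    """Return resultType attributes that out_schema lacks or declares with another type.

    An empty list means out_schema contains resultType; extra attributes are allowed.
    """
    return [
        name
        for name, spl_type in RESULT_TYPE.items()
        if out_schema.get(name) != spl_type
    ]


# Hypothetical output schema: forwards the document text and adds the result attributes.
out_schema = {"text": "rstring", **RESULT_TYPE}
print(missing_result_attributes(out_schema))  # []
print(missing_result_attributes({"text": "rstring", "classes": "list<rstring>"}))
# → ['decisions', 'modelClasses']
```

A schema that declares decisions as list<float32> would be reported as well, because the check compares types as well as names.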
Input Ports
- InStream: One tuple is one document.
Output Ports
- OutStream: The result tuple. The attribute classes is the list of classifications/predictions, and the attribute decisions (list<float64>) is the list of decision values. This stream must contain the schema defined by the resultType. The decisions are the output of the LinearSVC decision_function (http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC.decision_function), which returns the confidence scores for one sample and all classes. The confidence score for a sample is the signed distance of that sample to the hyperplane; a value > 0 means that this class would be predicted. The decisions list contains a single value if the model was trained with 2 classes, and n values if it was trained with n > 2 classes. The attribute modelClasses is the list of the unique class names of the trained model; the order of the decisions list corresponds to the order of the modelClasses list.
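The relationship between decisions and modelClasses can be sketched in Python, for example for a downstream consumer of OutStream. This is a minimal sketch assuming that modelClasses lists the classes in the same order as the scikit-learn classifier's classes_ attribute. Under that assumption, the single decision value of a two-class model is the score of the second entry of modelClasses, which is scikit-learn's convention for binary decision_function output. The helpers scores_by_class, predicted_class and positive_classes are hypothetical and not part of the toolkit.

```python
from typing import Dict, List, Sequence


def scores_by_class(decisions: Sequence[float], model_classes: Sequence[str]) -> Dict[str, float]:
    """Pair each decision value with the class name it scores."""
    if len(model_classes) == 2:
        # Two-class model: one signed score, assumed to belong to model_classes[1].
        if len(decisions) != 1:
            raise ValueError("a 2-class model should yield exactly one decision")
        return {model_classes[1]: decisions[0]}
    if len(decisions) != len(model_classes):
        raise ValueError("expected one decision per model class")
    # n > 2 classes: one one-vs-rest score per class, in model_classes order.
    return dict(zip(model_classes, decisions))


def predicted_class(decisions: Sequence[float], model_classes: Sequence[str]) -> str:
    """Pick a single class the way scikit-learn's predict() does."""
    scores = scores_by_class(decisions, model_classes)
    if len(model_classes) == 2:
        # Score > 0 selects the second class; otherwise the first class.
        return model_classes[1] if scores[model_classes[1]] > 0 else model_classes[0]
    # Highest score wins; max() keeps the first class on ties, like argmax.
    return max(model_classes, key=lambda c: scores[c])


def positive_classes(decisions: Sequence[float], model_classes: Sequence[str]) -> List[str]:
    """Apply the '> 0 means this class would be predicted' rule literally."""
    if len(model_classes) == 2:
        return [predicted_class(decisions, model_classes)]
    scores = scores_by_class(decisions, model_classes)
    return [c for c in model_classes if scores[c] > 0]


labels = ["sports", "politics", "tech"]
print(predicted_class([0.4, -1.2, 0.1], labels))   # sports
print(positive_classes([0.4, -1.2, 0.1], labels))  # ['sports', 'tech']
print(predicted_class([-0.3], ["neg", "pos"]))     # neg
```

predicted_class follows scikit-learn's predict rule (the sign of the score for two classes, the highest score otherwise). positive_classes applies the "> 0" rule literally, so for a multiclass model it may return no class or several classes. The operator's own classes attribute may follow either rule, so treat this as an aid for interpreting the output rather than a reimplementation of the operator.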
Static Types
- LinearClassification.resultType = tuple<list<rstring> classes, list<float64> decisions, list<rstring> modelClasses>;