Toolkit technical background overview

Overview

Apache Spark is a fast, general-purpose cluster computing system that is well suited to machine learning algorithms. MLlib is the machine learning library that ships with Spark; it supports common machine learning algorithms, including classification, regression, and collaborative filtering, among others. The purpose of this project is to allow Spark’s MLlib library to be used for real-time scoring of data in InfoSphere Streams.