Toolkit for real-time scoring using Apache Spark's machine learning library


Apache Spark is a fast, general-purpose cluster computing system that is well suited to machine learning workloads. MLlib is the machine learning library that ships with Spark; it supports common algorithms including classification, regression, and collaborative filtering. The purpose of this project is to allow MLlib to be used for real-time scoring of data in InfoSphere Streams.
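
To make "real-time scoring" concrete, the sketch below applies a model that was trained ahead of time to records one at a time as they arrive. It is plain Python and does not use Spark, MLlib, or this toolkit's operators. The weights, the intercept, and the names `score` and `score_stream` are hypothetical and chosen only for illustration; in practice the model parameters would come from a model trained with MLlib.

```python
import math
from typing import Iterable, Iterator, Sequence, Tuple

# Hypothetical parameters of a pre-trained logistic regression model.
# These values are illustrative only and do not come from any real model.
WEIGHTS = (0.8, -0.4)
INTERCEPT = 0.1


def score(features: Sequence[float]) -> float:
    """Return the predicted probability for a single record."""
    margin = INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-margin))


def score_stream(
    records: Iterable[Sequence[float]],
) -> Iterator[Tuple[Sequence[float], float]]:
    """Score records lazily, one at a time, as they arrive."""
    for features in records:
        yield features, score(features)


if __name__ == "__main__":
    incoming = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    for features, probability in score_stream(incoming):
        print(features, round(probability, 4))
```

The model is fixed before the stream starts, so each record is scored independently with a small, constant amount of work. That is the pattern the toolkit is meant to support for Streams tuples.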

Getting started

Get started quickly with the Spark MLlib toolkit using the Streams Quick Start Edition VM image available here.

For more information on how to use the toolkit in your Streams applications, refer to the Getting Started guide.

For more information on how to get started with toolkit development, refer to the Getting Started with Development guide.

Documentation for the toolkit is available here.