Installation and Setup


Setup instructions

These are the basic requirements to create Streams applications with Python:

  1. Set up a Streams instance
  2. Set up your development environment
  3. Set up a connection to the Streams instance

Set up a Streams instance

You use the Python API to create a Topology, which is an application that the Streams runtime executes. The Streams runtime can run in a public or private cloud, or it can be installed locally.

Choose the option that matches your desired Streams runtime, and follow the steps to install and/or configure the Streams instance.

Streams on Cloud Pak for Data (Recommended)

Cloud Pak for Data v3.0+

If you are developing for a Streams instance running in IBM Cloud Pak for Data, make sure you have installed the add-on and provisioned an instance of the service.

Cloud Pak for Data v2.5

You can optionally install Streams as a stand-alone instance.
Watch this video for detailed steps.


See the Installation section in the documentation for more details.
Streaming Analytics service/CPDaaS

Using the Streaming Analytics service or Cloud Pak for Data as a Service (CPDaaS)

The Streaming Analytics service is Streams' Software as a Service offering. You do not need to install Streams to use it.

Instead, create an instance of the service in the IBM Cloud. When you have an instance of the service, you can create applications that will run on the service using:

  • A notebook in Watson Studio in Cloud Pak for Data as a Service
  • Any IDE to develop your Python applications

Create an instance of the Streaming Analytics service

If you have not already done so, create an instance of the Streaming Analytics service in IBM Cloud:
  1. Go to the IBM Cloud web portal and sign in (or sign up for a free IBM Cloud account).
  2. Click Catalog, browse for the Streaming Analytics service, and then click it.
  3. Enter the service name and then click Create to set up your service. The service dashboard opens and your service starts automatically. The service name appears as the title of the service dashboard.
Make sure the service is started: From the service dashboard, click START.

Local installation of Streams

Developing for a local Streams installation

These steps assume that you are installing Python 3.6 from Anaconda on a Linux workstation.
  1. Install version 4.2 or later of IBM Streams or the IBM Streams Quick Start Edition.

  2. (IBM Streams only, doesn't apply to the Quick Start Edition) If necessary, install a supported version of Python. Python 3.5, 3.6 and 3.7 are supported. Python 2.7 support is currently deprecated.

    Important: Python 3.6 is required to build application bundles that can be submitted to an IBM Streams instance on Cloud Pak for Data or Cloud Pak for Data as a Service.

    You can choose from one of these options:

    • (Recommended) Anaconda

    • CPython: https://www.python.org

      If you build Python from source, remember to pass --enable-shared as a parameter to configure. After installation, set the LD_LIBRARY_PATH environment variable to <Python_Install>/lib.

  3. Streams also includes a version of the streamsx package, so to make sure you are using the latest version of streamsx and not the one bundled with Streams, you should either:

    • Remove the PYTHONPATH environment variable, e.g., unset PYTHONPATH
    • Or, make sure that PYTHONPATH does not include a path ending with com.ibm.streamsx.topology/opt/python/package.

    Tip: Add the unset PYTHONPATH line to your home-directory/.bashrc shell initialization file. Otherwise, you'll have to enter the command every time you start IBM Streams.

  4. Set the PYTHONHOME application environment variable on your Streams instance by entering the following streamtool command on the command line:

    streamtool setproperty -i <INSTANCE_ID> -d <DOMAIN_ID> --application-ev PYTHONHOME=<path_to_python_install>
    

    For example, if using the Quick Start Edition:

    streamtool setproperty -i StreamsInstance -d StreamsDomain --application-ev PYTHONHOME=/opt/pyenv/versions/3.6.1 --embeddedzk

    You can also set the environment variable from the Streams Console in your service.

  5. Not required for the Quick Start Edition: Configure your Streams instance to use SSH keys instead of password authentication. See the documentation for details.
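
As a quick check for step 3, the following minimal sketch reports any PYTHONPATH entries that point at the streamsx copy bundled with Streams. The helper name bundled_streamsx_entries is hypothetical and is not part of streamsx; only the path suffix comes from the step above.

```python
import os

# Suffix named in step 3: a PYTHONPATH entry ending with this path points at
# the streamsx package bundled with Streams.
BUNDLED_SUFFIX = "com.ibm.streamsx.topology/opt/python/package"


def bundled_streamsx_entries(pythonpath):
    """Return the PYTHONPATH entries that end with the bundled package path.

    Hypothetical helper for illustration; it only inspects the string.
    """
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return [p for p in entries if p.rstrip("/").endswith(BUNDLED_SUFFIX)]


offending = bundled_streamsx_entries(os.environ.get("PYTHONPATH", ""))
if offending:
    print("PYTHONPATH includes the bundled streamsx package:", offending)
else:
    print("PYTHONPATH does not reference the bundled streamsx package.")
```

If the check reports an entry, unset PYTHONPATH or remove that entry before running your application.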



Note: If your applications are a mix of Python and SPL (Streams Processing Language) code, a local installation of Streams is required.

Set up your development environment

To get your development environment ready:

  1. Install Python on your local development environment. The version of Python you install must be supported by the Streams instance.
  2. Install the streamsx Python package.
  3. Install a Java 1.8 JRE, if you do not already have one.

See the following sections for more information on these steps.

Install a supported version of Python

Make sure you have the right version of Python for your Streams instance:

  • For the Streaming Analytics service in IBM Cloud, use Python 3.6.
  • For a local installation of IBM Streams, Python 3.5, 3.6 or 3.7 are supported.
  • IBM Cloud Pak for Data:
    • The Streams add-on is pre-configured with Python 3.6, so install Python 3.6.
    • For a standalone installation of Streams, make sure you install, at a minimum, the same version of Python installed with Streams.
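
The version rules above can be sketched as a small check that you run with the interpreter you plan to develop with. The target labels and the is_supported helper are hypothetical names chosen for this example; the version numbers are the ones listed above. The stand-alone Cloud Pak for Data case is left out because it depends on the Python version installed with your Streams instance.

```python
import sys

# Supported (major, minor) versions, taken from the list above.
# The dictionary keys are hypothetical labels, not official names.
SUPPORTED = {
    "streaming-analytics": [(3, 6)],
    "local": [(3, 5), (3, 6), (3, 7)],
    "cp4d-addon": [(3, 6)],
}


def is_supported(target, version=None):
    """Return True if the given (or running) Python version fits the target."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor in SUPPORTED[target]


print("Running Python %d.%d" % sys.version_info[:2])
for target in SUPPORTED:
    print(target, "supported:", is_supported(target))
```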

Install the streamsx package

  1. Use pip to install streamsx:

     pip install streamsx
    

    If streamsx is already installed, upgrade to the latest version:

       pip install --upgrade streamsx
    
  2. Set the JAVA_HOME environment variable to a Java 1.8 JRE or JDK/SDK.

For the most complete instructions regarding installation, including when a local installation of Streams is required, see the developer setup page of the streamsx project documentation.
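
To confirm that both steps took effect, a sketch like the following can be run from the same shell or notebook. The check_dev_environment helper is a hypothetical name for this example; it only tests whether the streamsx package can be found and whether JAVA_HOME is set. It does not verify the Java version.

```python
import importlib.util
import os


def check_dev_environment(environ=None):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    problems = []
    # find_spec returns None when the top-level package cannot be located
    if importlib.util.find_spec("streamsx") is None:
        problems.append("streamsx is not installed; run: pip install streamsx")
    if not environ.get("JAVA_HOME"):
        problems.append("JAVA_HOME is not set")
    return problems


for problem in check_dev_environment():
    print("Problem:", problem)
```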

Set up a connection to the Streams instance

A Streams Python application, or Topology, must always be compiled and run on a Streams instance.

After defining the application, you programmatically submit the Topology to the Streams instance to be compiled and run using the streamsx.topology.context.submit function.

Below is sample code that you can use to connect to the Streams instance and submit your Topology. Copy it now and add it to your Python script or to a cell in your notebook.

You will see an example of how this sample code is used later in this tutorial.

Using a project in Cloud Pak for Data

Submit an application from a notebook in Cloud Pak for Data


In this context you need to provide the name of the Streams instance.
To find your Streams instance name:
  1. From the navigation menu, click Services > Instances.
  2. Select the Streams instance you want to use, and set the value of streams_instance_name to its name where indicated in the code.

Copy this code snippet:
from icpd_core import icpd_util
from streamsx.topology import context


def submit_topology(topo):
    streams_instance_name = "sample-streams"  # Change this to the name of your Streams instance

    # Fall back to the call without instance_type if this version of
    # icpd_util does not accept that argument
    try:
        cfg = icpd_util.get_service_instance_details(name=streams_instance_name, instance_type="streams")
    except TypeError:
        cfg = icpd_util.get_service_instance_details(name=streams_instance_name)

    # Set the deployment space, CPD 3.5+ only
    # cfg[context.ConfigParams.SPACE_NAME] = "myspacename"

    # Disable SSL certificate verification if necessary
    cfg[context.ConfigParams.SSL_VERIFY] = False

    # Topology will be deployed as a distributed app
    contextType = context.ContextTypes.DISTRIBUTED
    return context.submit(contextType, topo, config=cfg)
Submit to Cloud Pak for Data without a project

Submit without using a Cloud Pak for Data project

Collect the following information. Set the values for each variable where indicated.

  • CP4D_URL - Cloud Pak for Data deployment URL, e.g. https://cp4d_server:31843.

  • STREAMS_INSTANCE_ID:

    1. From the navigation menu, click My instances.
    2. Click the Provisioned Instances tab.
    3. Select the Streams instance you want to use, and set the value of STREAMS_INSTANCE_ID where indicated in the code.
  • STREAMS_USERNAME - (optional) User name to submit the job as, defaulting to the current operating system user name.

  • STREAMS_PASSWORD - Password for authentication.

See the documentation or contact your administrator for details.

If you authenticate with a user name, enter it when prompted. Otherwise, delete the lines that prompt for and set the user name before running the code.

Copy this code snippet:
import os
import getpass

from streamsx.topology import context

def submit_topology(topo):
    CP4D_URL = "Paste URL here"
    username = input("Streams username: ")
    password = getpass.getpass("Streams password: ")
    STREAMS_INSTANCE_ID = "my-instance"  # Set instance name
    os.environ["STREAMS_USERNAME"] = username
    os.environ["STREAMS_PASSWORD"] = password
    os.environ["STREAMS_INSTANCE_ID"] = STREAMS_INSTANCE_ID
    os.environ["CP4D_URL"] = CP4D_URL

    cfg = {}
    # Disable SSL certificate verification if necessary
    cfg[context.ConfigParams.SSL_VERIFY] = False
    # This specifies how the application will be deployed
    contextType = context.ContextTypes.DISTRIBUTED
    return context.submit(contextType, topo, config=cfg)
CPDaaS/Streaming Analytics service

Code to submit to the Streaming Analytics service

To connect to the Streaming Analytics service in IBM Cloud, you need the service credentials from the Streaming Analytics service dashboard.

To copy your service credentials, open the Streaming Analytics service dashboard, click Service Credentials, then View Credentials, and copy the contents of the cell. If no credentials are listed, click Add new credentials.



Copy this code snippet:
from streamsx.topology.context import ConfigParams
from streamsx.topology import context
import json
import getpass



def submit_topology(topo):
    # This specifies how the application will be deployed
    contextType = context.ContextTypes.STREAMING_ANALYTICS_SERVICE
    cfg = {}
    # Paste the service credentials (JSON) when prompted
    SA_credentials = getpass.getpass('Streaming Analytics credentials:')
    cfg[ConfigParams.SERVICE_DEFINITION] = json.loads(SA_credentials)
    # Disable SSL certificate verification if necessary
    cfg[ConfigParams.SSL_VERIFY] = False

    return context.submit(contextType, topo, config=cfg)
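
Because the snippet above passes the pasted text straight to json.loads, a partial copy of the credentials fails with a raw parsing error. As a minimal sketch, the hypothetical helper parse_credentials below (not part of streamsx) wraps the parse and reports a clearer message. It checks only that the text is a JSON object and makes no assumption about which keys the credentials contain.

```python
import json


def parse_credentials(text):
    """Parse pasted service credentials and return them as a dict.

    Hypothetical helper for illustration. Raises ValueError if the text is
    not valid JSON or is not a JSON object.
    """
    try:
        creds = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError("Credentials are not valid JSON: %s" % err) from err
    if not isinstance(creds, dict):
        raise ValueError("Credentials should be a JSON object")
    return creds
```

In submit_topology, you could then assign parse_credentials(SA_credentials) to the service definition entry instead of calling json.loads directly.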
   
Local install

Code to submit to Streams v4.2 or v4.3

Use these steps if Streams is installed locally, or if you are using the Streams Quick Start Edition (QSE).

If you are using the Streams Quick Start Edition, you do not have to modify the code because it uses the default instance and domain IDs.

Otherwise, make sure that the STREAMS_INSTANCE_ID and STREAMS_DOMAIN_ID are set to match your installation.


Copy this code snippet:
import os
from streamsx.topology import context

cfg = {}
cfg[context.ConfigParams.SSL_VERIFY] = False

def submit_topology(topo):
    os.environ["STREAMS_INSTANCE_ID"] = "StreamsInstance"  # change as needed
    os.environ["STREAMS_DOMAIN_ID"] = "StreamsDomain"  # change as needed

    # Set these environment variables as needed:
    # os.environ["STREAMS_INSTALL"] =  # Location of an IBM Streams installation (4.2 or 4.3)
    # os.environ["STREAMS_ZKCONNECT"] =  # (optional) ZooKeeper connection string (when not using an embedded ZooKeeper)
    # os.environ["STREAMS_USERNAME"] =  # (optional) User name to submit the job as

    contextType = context.ContextTypes.DISTRIBUTED
    return context.submit(contextType, topo, config=cfg)

    
Streams on Kubernetes/OpenShift

Code to submit to a stand-alone installation of Streams v5

In order to submit a Streams application you need the following information from the Streams instance:
  • STREAMS_BUILD_URL: Streams build service URL, e.g. when the service is exposed as node port: https://<NODE-IP>:<NODE-PORT>
  • STREAMS_REST_URL: Streams SWS service (REST API) URL
  • STREAMS_USERNAME : (optional) User name to submit the job as, defaulting to the current operating system user name.
  • STREAMS_PASSWORD : Password for authentication.

The documentation has the steps to retrieve the URLs for the Build and REST service. Set the values for each variable where indicated in the code.


Copy this code snippet:
import os
import getpass

from streamsx.topology import context
STREAMS_REST_URL = "Paste URL here"
STREAMS_BUILD_URL = "Paste URL here"


def submit_topology(topo):
    username = input("Streams username: ")
    password = getpass.getpass("Streams password: ")
    os.environ["STREAMS_BUILD_URL"] = STREAMS_BUILD_URL
    os.environ["STREAMS_REST_URL"] = STREAMS_REST_URL
    os.environ["STREAMS_USERNAME"] = username
    os.environ["STREAMS_PASSWORD"] = password

    cfg = {}
    # Disable SSL certificate verification if necessary
    cfg[context.ConfigParams.SSL_VERIFY] = False
    # This specifies how the application will be deployed
    contextType = context.ContextTypes.DISTRIBUTED
    return context.submit(contextType, topo, config=cfg)
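
A common mistake with the snippet above is to leave a placeholder in STREAMS_REST_URL or STREAMS_BUILD_URL. As a sketch, the hypothetical helper looks_like_service_url below (not part of streamsx) checks that a value is an http or https URL with a host and contains no angle-bracket placeholders. It does not contact the service.

```python
from urllib.parse import urlparse


def looks_like_service_url(value):
    """Return True if value is an http(s) URL with a host and no placeholders.

    Hypothetical helper for illustration; it performs no network access.
    """
    if "<" in value or ">" in value:
        # Still contains a template placeholder such as <NODE-IP>
        return False
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

You might call it on both URL variables before calling submit_topology and stop early if either check fails.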

    

Create your first application

Now you are ready to create your first application with the Streams Python API.

The application will ingest temperature readings from a simulated sensor and compute the rolling average reading for each sensor.

Learn more about the API

After you create your first application, visit the Process data with common Streams transforms section to learn more about the API.