Apache Kafka options for edge applications
This document describes the Apache Kafka options available to users developing edge applications.
Apache Kafka allows users to publish and subscribe to streams of records, similar to a message queue or enterprise messaging system. This makes it an effective tool for edge solutions to pass data from sensors and systems close to the edge to a large data processing system in a private or public cloud.
The rest of this document covers how to install and deploy several flavors of Kafka and how to connect to those deployments in IBM Streams applications using the streamsx.kafka or streamsx.messagehub toolkits.
Before you begin: Streams application development
- Python developers: use the streamsx.kafka Python package regardless of Kafka deployment. Usage of streamsx.eventstreams is not recommended because it is no longer updated.
- SPL developers: use the KafkaConsumer and KafkaProducer operators in the streamsx.kafka toolkit for all Kafka deployments other than IBM Event Streams. If you plan to use IBM Event Streams, use the streamsx.messagehub toolkit instead.
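For Python development, the package is available from PyPI; the command below is a minimal install sketch.

# Install (or upgrade) the streamsx.kafka Python package; it pulls in the base streamsx package
pip install --upgrade streamsx.kafka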
Using an existing Kafka deployment
If you already have a Kafka environment, configure the KafkaConsumer and KafkaProducer operators and the property file in your Streams application to access that existing environment; a minimal property file sketch follows the links below.
- Python applications: see Connection examples.
- SPL applications: see streamsx.kafka samples.
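As an illustration, a minimal property file for an existing, unsecured deployment may only need the bootstrap address; the host names below are placeholders, and SSL or SASL properties must be added if your cluster requires them.

# Write a minimal kafka.properties (placeholder broker addresses; adjust to your deployment)
cat > kafka.properties <<EOF
bootstrap.servers=my-kafka-host-1:9092,my-kafka-host-2:9092
EOF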
Red Hat AMQ Streams
AMQ Streams is Red Hat’s distribution of Apache Kafka, based on the Strimzi project, which simplifies the process of running Apache Kafka in an OpenShift cluster.
For a full overview of AMQ Streams and Kafka concepts and architecture, see the AMQ Streams overview documentation.
Installing and deploying AMQ Streams
AMQ Streams can be installed and deployed on OpenShift Container Platform or on Red Hat Enterprise Linux. See the respective AMQ Streams documentation for instructions on how to download, install, and deploy it.
Quick Start for OpenShift 4.3
- Go to the OperatorHub in the OCP console and select the AMQ Streams operator.
- Select the installation mode to be a specific namespace.
- Select a namespace where the AMQ Streams deployment will be (e.g. amq-streams).
  - If you need to create a new namespace, create a new project via the OCP console, or via the CLI by running oc new-project amq-streams.
- Click ‘Subscribe’ and wait for the AMQ Streams operator to be installed.
- Once installed, click on the operator and click ‘Create Instance’ for a ‘Kafka’ resource.
- Edit the YAML to add a route, change the name, or set the storage type of the Kafka or Zookeeper deployments (a CLI sketch follows this list).
  - Under .spec.kafka.listeners, add external: type: route
  - The default YAML will create an ephemeral Kafka cluster named ‘my-cluster’.
  - If you need a persistent cluster, see the AMQ Streams documentation for more information.
- Once done editing the Kafka YAML, click ‘Create’.
- Return to the AMQ Streams operator, click ‘Kafka Topic’, and click ‘Create KafkaTopic’.
- Set the name, partitions, and config for the topic as desired.
- Click ‘Create’.
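As an alternative to editing the YAML in the console, the same Kafka and KafkaTopic resources can be created from the CLI. The snippet below is a minimal sketch, assuming the amq-streams namespace and the my-cluster name used above; the apiVersion and listener layout vary between AMQ Streams releases, so compare against the YAML generated by your operator.

# Create a minimal ephemeral Kafka cluster with an external route listener
oc apply -n amq-streams -f - <<EOF
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      plain: {}
      tls: {}
      external:
        type: route
    storage:
      type: ephemeral
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}
EOF

# Create a topic managed by the topic operator
oc apply -n amq-streams -f - <<EOF
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 3
  replicas: 3
EOF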
Connecting to AMQ Streams
- In a terminal, run the commands below to get the certificates and keys necessary to connect:

    # Get Kafka bootstrap route; value will be referred to as <RouteURL> later
    oc get routes my-cluster-kafka-bootstrap -n amq-streams -o=jsonpath='{.status.ingress[0].host}{"\n"}'
    # Extract server public cert, client public cert, and client private key
    oc extract secret/my-cluster-cluster-ca-cert -n amq-streams --keys=ca.crt --to=- > ca.crt
    oc extract secret/my-cluster-client-ca-cert -n amq-streams --keys=ca.crt --to=- > user.crt
    oc extract secret/my-cluster-client-ca -n amq-streams --keys=ca.key --to=- > user.key
- Use one of the following methods, depending on application type:
  - Python applications: use the streamsx-kafka-make-properties command to create a properties file.
  - SPL applications:
    - Create the truststore and keystore manually:

        keytool -import -trustcacerts -alias root -file ca.crt -keystore truststore.jks -storepass trustpassword -noprompt
        openssl pkcs12 -export -in user.crt -inkey user.key -name client-alias -out ./keystore.pkcs12 -noiter -nomaciter -passout pass:keypassword
        keytool -importkeystore -deststorepass keypassword -destkeystore ./keystore.jks -srckeystore ./keystore.pkcs12 -srcstoretype pkcs12 -srcstorepass keypassword

    - Populate a kafka.properties file with the following values:

        bootstrap.servers=<RouteURL>
        security.protocol=SSL
        ssl.keystore.type=JKS
        ssl.keystore.password=keypassword
        ssl.key.password=keypassword
        ssl.keystore.location={applicationDir}/etc/keystore.jks
        ssl.endpoint.identification.algorithm=https
        ssl.truststore.type=JKS
        ssl.truststore.password=trustpassword
        ssl.truststore.location={applicationDir}/etc/truststore.jks

    - Copy the JKS files and kafka.properties to etc/ in the SPL application workspace.
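Optionally, before building the keystores into a Streams application, you can sanity-check the route and the extracted cluster CA certificate with a direct TLS handshake. This is a sketch; it assumes the external route terminates TLS on port 443, the default for routes created by AMQ Streams.

# Verify that the bootstrap route presents a certificate that chains to ca.crt
# (<RouteURL> is the host name captured earlier; look for "Verify return code: 0 (ok)")
openssl s_client -connect <RouteURL>:443 -servername <RouteURL> -CAfile ca.crt </dev/null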
For more information, see Using streamsx.kafka with Red Hat AMQ Streams documentation.
Event Streams in IBM Cloud
IBM Event Streams builds on top of open source Apache Kafka to offer enterprise-grade event streaming capabilities.
Provisioning Event Streams
- Log in to IBM Cloud or create an account if you do not have one.
- Visit Event Streams in the catalog.
- Select a region (e.g. Dallas, Frankfurt).
- Select a plan (e.g. Lite).
  - Important: The Lite plan only allows one topic, which may not be enough for some samples to work.
- Enter a service name (e.g. Event Streams for Edge).
- Click ‘Create’.
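The instance can also be provisioned from the IBM Cloud CLI instead of the console. The commands below are a sketch; the catalog service name messagehub, the lite plan, and the us-south region are assumptions to adjust for your account.

# Log in and target a resource group first, e.g.:
#   ibmcloud login
#   ibmcloud target -g default
# Provision an Event Streams instance (catalog service name: messagehub) on the Lite plan
ibmcloud resource service-instance-create "Event Streams for Edge" messagehub lite us-south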
Creating credentials and a topic
- Go to ‘Service credentials’ in the navigation pane.
- Click ‘New credential’.
- Give the credential a name so you can identify its purpose later. You can accept the default value.
- Give the credential the Manager role so that it can access the topics, and create them if necessary.
- Click ‘Add’. The new credential is listed in the table in Service credentials.
- For the newly created credentials, click the ‘Copy to clipboard’ icon.
- Go to the Topics tab.
- Go to ‘Manage’ in the navigation pane.
- Click ‘Create a topic’.
- Name your topic.
- Keep the defaults for the rest of the topic creation, then click ‘Next’ and ‘Create topic’.
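These credential and topic steps can also be scripted with the IBM Cloud CLI and its Event Streams plugin. The commands below are a sketch; the key name, topic name, and plugin command syntax are assumptions to check against ibmcloud es --help for your CLI version.

# Create a Manager-role service key for the instance and print it
ibmcloud resource service-key-create es-edge-credentials Manager --instance-name "Event Streams for Edge"
ibmcloud resource service-key es-edge-credentials --output json

# Create a topic with the Event Streams plugin
ibmcloud plugin install event-streams
ibmcloud es init                                  # select the instance when prompted
ibmcloud es topic-create my-topic --partitions 1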
For more details about Event Streams, see the Event Streams documentation.
Connecting to Event Streams
Save the copied credentials to a file, or use them to create a Streams application config:
- Python applications: see Connecting with the IBM Event Streams cloud service.
- SPL applications: see streamsx.messagehub samples for connection configuration options.
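For example, the copied credentials JSON can be pasted into a local file that the linked connection docs then reference; the file name below is only an example.

# Paste the copied Event Streams service credentials JSON between the EOF markers
cat > eventstreams-credentials.json <<'EOF'
<paste the copied service credentials JSON here>
EOF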
Vanilla Apache Kafka
Apache Kafka can be deployed on bare-metal or VM systems as well as in Kubernetes or OpenShift environments. Edge applications can leverage these Kafka installations; however, the edge systems where the edge application will be running must be able to connect to the system or environment where Kafka is running. Additionally, any cloud services or applications that consume Kafka topics must be able to access that system or environment.
Because users should already have access to a Kubernetes or OpenShift environment, the following install section covers using Helm charts and operators to deploy Kafka servers.
Installing and deploying Kafka
Helm
- Download the latest Helm 3 release.
- Follow the instructions for the Bitnami Kafka Helm charts.
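For example, with Helm 3 installed, a development Kafka cluster can be stood up roughly as follows; the release name and namespace are illustrative, and the chart's values should be reviewed for anything beyond experimentation.

# Add the Bitnami chart repository and install a Kafka release into its own namespace
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install my-kafka bitnami/kafka --namespace kafka --create-namespace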
Kubernetes Operator
To deploy Kafka using a Kubernetes operator, use the Strimzi Kafka Operator.
Setup and configuration are nearly identical to AMQ Streams. For full information, visit the Strimzi documentation.
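The Strimzi quick start amounts to installing the operator into a namespace and applying one of the example Kafka custom resources. The commands below are a sketch based on that quick start; URLs and example file names may change between Strimzi releases.

# Install the Strimzi operator into the kafka namespace
kubectl create namespace kafka
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
# Create an example single-broker, persistent Kafka cluster
kubectl apply -f https://strimzi.io/examples/latest/kafka/kafka-persistent-single.yaml -n kafka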
Connecting to Kafka
- Python applications: see Connection examples.
- SPL applications: see streamsx.kafka samples.
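Before pointing a Streams application at the cluster, connectivity can be smoke-tested from inside the cluster with the Kafka console tools. The sketch below assumes the Bitnami chart's default bootstrap service name (<release>-kafka, here my-kafka) in the kafka namespace.

# Start a throwaway client pod with the Kafka CLI tools
kubectl run kafka-client --rm -ti --image=bitnami/kafka:latest -n kafka -- bash
# Inside the pod: produce and consume a test message
kafka-console-producer.sh --bootstrap-server my-kafka.kafka.svc.cluster.local:9092 --topic test
kafka-console-consumer.sh --bootstrap-server my-kafka.kafka.svc.cluster.local:9092 --topic test --from-beginning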
What to do next?
Build and test your application using your Streams service instance in Cloud Pak for Data. Once your application is ready to be built as an edge application, see Building an edge application for more information.