Java API to allow creation of streaming applications for IBM Streams by Java & Scala developers.

Overview

IBM® Streams is an advanced analytic platform that allows user-developed applications to quickly ingest, analyze and correlate information as it arrives from thousands of real-time sources. Streams can handle very high data throughput rates, up to millions of events or messages per second.
With this API Java & Scala developers can build streaming applications that can be executed using IBM Streams, including the processing being distributed across multiple computing resources (hosts or machines) for scalability.

Java Application API

The fundamental building block of a Java Streams application is a {@link com.ibm.streamsx.topology.TStream stream}, which is a continuous sequence of tuples (messages, events, records). The API provides the ability to perform some form of processing or analytics on each tuple as it appears on a stream, resulting in a new stream containing tuples representing the result of the processing against the input tuples.
{@link com.ibm.streamsx.topology.Topology#source Source streams} are streams containing tuples from external systems. For example, a source stream may be created by reading messages from a message queue system, such as MQTT. The purpose of a source stream is to bring the external data into the Streams environment, so that it can be processed, analyzed, correlated with other streams, etc.
Streams are terminated using {@link com.ibm.streamsx.topology.TStream#sink sink functions} that typically deliver tuples to external systems, such as real-time dashboards, SMS alerts, databases, HDFS files, etc.

An application is represented by a {@link com.ibm.streamsx.topology.Topology} object containing instances of {@link com.ibm.streamsx.topology.TStream}. The Java interface {@link com.ibm.streamsx.topology.TStream TStream<T>} is a declaration of a stream of tuples, each tuple being an instance of the Java class or interface {@code T}. For example, {@code TStream<String>} represents a stream where each tuple is a {@code String} object, while {@code TStream<CallDetailRecord>} represents a stream of {@code com.sometelco.switch.CallDetailRecord} tuples. Thus, tuples on streams are Java objects, rather than SPL tuples with an attribute-based schema.

Streams are created (sourced), transformed or terminated (sinked) generally through functions, where a function is represented by an instance of a Java class with a single method. Frequently, these functions are implemented by anonymous classes specific to an application, though utility methods may encapsulate one or more functions. Here is an example of filtering out all empty strings from stream {@code s} of type {@code String}:


TStream<String> s = ...
TStream<String> filtered = s.filter(new Predicate<String>() {
    @Override
    public boolean test(String tuple) {
        // Keep only non-empty strings.
        return !tuple.isEmpty();
    }
});

{@link com.ibm.streamsx.topology.TStream#filter s.filter()} is passed an instance of {@link com.ibm.streamsx.topology.function.Predicate}, and sets up a filter where the output stream {@code filtered} will only contain tuples from the input stream {@code s} for which the method {@code test()} returns {@code true}. This implementation of {@code Predicate}, provided as an anonymous class, returns {@code true} if the input tuple (a {@code String} object) is not empty. At runtime, every {@code String} tuple that appears on {@code s} results in a call to {@code test()}. If the {@code String} tuple is not empty, it will appear on {@code filtered}; otherwise it is discarded.

Java 8

With Java 8, lambda expressions or method references can be used to provide the function. Using a lambda expression, the above example simplifies to:

TStream<String> s = ...
TStream<String> filtered = s.filter(tuple -> !tuple.isEmpty());
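
The anonymous class, lambda and method reference forms are interchangeable because each produces an object implementing the same single-method interface. The following is a minimal plain-Java sketch of that equivalence. It uses {@code java.util.function.Predicate} and an in-memory list as stand-ins for the API's {@code Predicate} and a {@code TStream<String>}; the class {@code PredicateForms} and the helpers {@code isNotEmpty} and {@code keep} are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class PredicateForms {

    // Hypothetical helper so that a method reference can be used.
    static boolean isNotEmpty(String tuple) {
        return !tuple.isEmpty();
    }

    // Stand-in for filtering a stream: keeps each tuple for which test() returns true.
    static List<String> keep(List<String> tuples, Predicate<String> predicate) {
        List<String> kept = new ArrayList<>();
        for (String tuple : tuples) {
            if (predicate.test(tuple)) {
                kept.add(tuple);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tuples = Arrays.asList("a", "", "b", "");

        // Anonymous class form.
        Predicate<String> anonymous = new Predicate<String>() {
            @Override
            public boolean test(String tuple) {
                return !tuple.isEmpty();
            }
        };
        // Lambda expression form.
        Predicate<String> lambda = tuple -> !tuple.isEmpty();
        // Method reference form.
        Predicate<String> methodRef = PredicateForms::isNotEmpty;

        // All three forms keep the same tuples.
        System.out.println(keep(tuples, anonymous)); // [a, b]
        System.out.println(keep(tuples, lambda));    // [a, b]
        System.out.println(keep(tuples, methodRef)); // [a, b]
    }
}
```

A method reference is mainly worthwhile when the logic already exists as a named method; for one-off logic the lambda is usually the shortest form.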


Java 8 is supported by IBM Streams 4.0.1 and later.

Scala Support

Since Scala interoperates with Java classes, applications are implemented in Scala by having the code simply call the Java Application API. A set of implicit conversions is provided to support use of Scala functions in stream transformations. See the Scala documentation under {@code com.ibm.streamsx.topology/doc/scaladoc/index.html}.

The API is provided as an SPL toolkit {@code com.ibm.streamsx.topology} containing the Java API in {@code lib/com.ibm.streamsx.topology.jar} as well as the SPL operators used to execute the Java functional transformations.

Features

These features are supported:
Feature | Reference | Since
Tuple types are Java and/or Scala objects. | {@link com.ibm.streamsx.topology.TStream} | 1.0
Functional programming: streams are transformed, filtered, etc. by functional transformations implemented as Java and/or Scala functions. A Java function is an implementation of an interface with a single method or, when using Java 8, a lambda expression or a method reference. | {@link com.ibm.streamsx.topology.TStream} | 1.0
Execution within the Java virtual machine, in IBM Streams 4.0.1+ standalone or distributed mode, or on IBM Bluemix. | {@link com.ibm.streamsx.topology.context.StreamsContext} | 1.0
Pipeline topologies. | {@link com.ibm.streamsx.topology.Topology} | 1.0
Fan-out: multiple independent functions may be applied to a single stream to produce multiple streams of different or the same type. | {@link com.ibm.streamsx.topology.TStream} | 1.0
Fan-in: multiple independent streams of the same type may be transformed by a single function to produce a single stream. | {@link com.ibm.streamsx.topology.TStream#union union} | 1.0
Window based aggregation and joins, including partitioning. | {@link com.ibm.streamsx.topology.TWindow} | 1.0
Parallel streams (UDP, User Defined Parallelism), including partitioning. | {@link com.ibm.streamsx.topology.TStream#parallel parallel} | 1.0
Topic based publish-subscribe stream model for cross application communication (Streams dynamic connections). | {@link com.ibm.streamsx.topology.TStream#publish publish}, {@link com.ibm.streamsx.topology.Topology#subscribe subscribe} | 1.0
Ability to specify where portions of the topology will execute in distributed mode, including running on resources (hosts) with specified tags. | {@link com.ibm.streamsx.topology.context.Placeable}, {@link com.ibm.streamsx.topology.TStream#isolate isolate}, {@link com.ibm.streamsx.topology.TStream#lowLatency lowLatency} | 1.0
Integration with Apache Kafka and MQTT messaging systems. | {@link com.ibm.streamsx.topology.messaging.kafka.KafkaConsumer Kafka}, {@link com.ibm.streamsx.topology.messaging.mqtt.MqttStreams MQTT} | 1.0
Testing of topologies, including those using SPL operators, while running in distributed, standalone or embedded mode. | {@link com.ibm.streamsx.topology.tester.Tester} | 1.0
Integration with SPL streams using SPL attribute schemas. | {@link com.ibm.streamsx.topology.spl.SPLStream} | 1.0
Invocation of existing SPL primitive or composite operators. | {@link com.ibm.streamsx.topology.spl.SPL} | 1.0

Samples

A number of sample Java applications are provided under samples. Each sample declares a topology and then executes it. The Apache Ant {@code build.xml} file includes some targets for executing the samples, which demonstrate the correct class path.
The javadoc for the samples includes the sample source code (click on the class name of a sample). It is also copied into the SPL toolkit for reference, and is available here: Java Functional Samples

Declaring a Topology

Java code is used to create a streaming topology, or graph, starting with the {@link com.ibm.streamsx.topology.Topology} object and then creating instances of {@link com.ibm.streamsx.topology.TStream}, either as {@link com.ibm.streamsx.topology.Topology#source source streams} or by applying functional transformations to existing streams.
Streams are terminated by {@link com.ibm.streamsx.topology.TStream#sink sinks}; typically the sink function sends the tuples to an external system.

A topology may be arbitrarily complex, including multiple sources and sinks, fan-out on any stream by having multiple functional transformations or fan-in by creating a {@link com.ibm.streamsx.topology.TStream#union union} of streams with identical tuple types.
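
The two graph shapes can be illustrated with a small plain-Java sketch. It is not the Streams API: in-memory lists stand in for streams, {@code java.util.function.Function} stands in for a functional transformation, and the class {@code FanOutFanIn} with its helpers {@code transform} and {@code union} is hypothetical. The ordering of merged tuples in this list model is incidental and says nothing about ordering on real streams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class FanOutFanIn {

    // Stand-in for a functional transformation: applies fn to every tuple.
    static <T, U> List<U> transform(List<T> tuples, Function<T, U> fn) {
        List<U> out = new ArrayList<>();
        for (T tuple : tuples) {
            out.add(fn.apply(tuple));
        }
        return out;
    }

    // Stand-in for a union: both inputs must have the same tuple type T.
    static <T> List<T> union(List<T> first, List<T> second) {
        List<T> merged = new ArrayList<>(first);
        merged.addAll(second);
        return merged;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ab", "cde");

        // Fan-out: two independent functions applied to the same input,
        // producing outputs of different types.
        List<Integer> lengths = transform(words, String::length);
        List<String> upper = transform(words, String::toUpperCase);

        // Fan-in: two inputs of identical type merged, then one function applied.
        List<String> merged = union(upper, Arrays.asList("fg"));
        List<Integer> mergedLengths = transform(merged, String::length);

        System.out.println(lengths);       // [2, 3]
        System.out.println(upper);         // [AB, CDE]
        System.out.println(mergedLengths); // [2, 3, 2]
    }
}
```

The generic signature of {@code union} makes the "identical tuple types" requirement a compile-time check in this model.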

Creating the {@code Topology} and its streams as instances of {@code TStream} only declares how tuples will flow; it is not a runtime representation of the graph. The {@code Topology} is submitted to a {@link com.ibm.streamsx.topology.context.StreamsContext} in order to execute the graph.
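
The distinction between declaring and executing can be sketched in plain Java. This is only a model under stated assumptions, not the API: the hypothetical class {@code Declaration} stands in for a declared graph, its {@code run} method stands in for submission to an execution context, and the {@code testCalls} counter exists purely to show when the declared function is actually invoked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class DeclareThenRun {

    /** Hypothetical stand-in for a declared graph: it only records filter steps. */
    static class Declaration {
        private final List<Predicate<String>> steps = new ArrayList<>();
        int testCalls = 0; // how often a declared function has been invoked

        // Declaring a filter records it; no tuple is processed here.
        Declaration filter(Predicate<String> predicate) {
            steps.add(tuple -> {
                testCalls++;
                return predicate.test(tuple);
            });
            return this;
        }

        // Stand-in for submitting the graph: only now do tuples flow.
        List<String> run(List<String> tuples) {
            List<String> out = new ArrayList<>();
            for (String tuple : tuples) {
                boolean keep = true;
                for (Predicate<String> step : steps) {
                    if (!step.test(tuple)) {
                        keep = false;
                        break;
                    }
                }
                if (keep) {
                    out.add(tuple);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Declaration declared = new Declaration().filter(t -> !t.isEmpty());
        System.out.println(declared.testCalls); // 0: declaring ran nothing

        System.out.println(declared.run(Arrays.asList("a", "", "b"))); // [a, b]
        System.out.println(declared.testCalls); // 3: one call per tuple
    }
}
```

One practical consequence is that side effects inside a function, such as logging or counters, occur when the graph executes, not at the point in the code where the function is passed in.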

Java compilation and execution

The API requires these jar files to be in the classpath for compilation and execution:

Testing

The API includes the ability to test topologies, by allowing the test program to capture the output tuples of a stream ({@code TStream}) and validate them. This is described in the {@code com.ibm.streamsx.topology.tester} package overview.

Integration with SPL

While the design goal for the API is to not require knowledge of SPL, developers familiar with SPL may also utilize some {@link com.ibm.streamsx.topology.spl.SPL SPL primitive and composite operators} from existing toolkits and use {@link com.ibm.streamsx.topology.spl.SPLStream streams that have SPL schemas}.
@see Integrating SPL operators