IBM® Streams is an advanced analytic platform that allows user-developed
applications to quickly ingest, analyze and correlate information as it arrives from
thousands of real-time sources. Streams can handle very high data throughput rates,
up to millions of events or messages per second.
With this API, Java and Scala developers can build streaming applications that
run on IBM Streams, with the processing optionally distributed
across multiple computing resources (hosts or machines) for scalability.
An application is represented by a
{@link com.ibm.streamsx.topology.Topology}
object containing instances of {@link com.ibm.streamsx.topology.TStream}.
The Java interface {@link com.ibm.streamsx.topology.TStream TStream} declares a stream of tuples, where each tuple is a Java object.
Streams are created (sourced), transformed or terminated (sinked), generally through functions, where a function is represented by an instance of a Java class with a single method. Frequently, these functions are implemented as anonymous classes specific to an application, though utility methods may encapsulate one or more functions.
Here is an example that filters out all empty strings from a stream {@code s} whose tuples are of type {@code String}:
    TStream<String> s = ...
    TStream<String> filtered = s.filter(new Predicate<String>() {
        @Override
        public boolean test(String tuple) {
            return !tuple.isEmpty();
        }
    });
{@link com.ibm.streamsx.topology.TStream#filter s.filter()} is passed an instance of
{@link com.ibm.streamsx.topology.function.Predicate}, and sets up a filter
where the output stream {@code filtered} will only contain tuples from the
input stream {@code s} if the method {@code test()} returns {@code true}.
This implementation of {@code Predicate}, provided as an anonymous class,
returns true if the input tuple (a {@code String} object) is not empty.
At runtime, every {@code String} tuple that appears on {@code s} results in a
call to {@code test}. If the tuple is not empty, it also appears on
{@code filtered}; otherwise it is discarded.
With Java 8, the same filter can be written as a lambda expression:
    TStream<String> s = ...
    TStream<String> filtered = s.filter(tuple -> !tuple.isEmpty());
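The per-tuple behavior described above can be sketched with the JDK alone. This is only an illustration of the semantics, not the Streams API: the class {@code FilterSemantics} and its {@code filter} helper are hypothetical, a {@code List} stands in for a stream, and {@code java.util.function.Predicate} stands in for the API's own {@code Predicate} interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java illustration only; nothing here belongs to the Streams API.
class FilterSemantics {
    // Hypothetical helper: calls test() once per tuple, in arrival order,
    // and keeps the tuple only when test() returns true.
    public static <T> List<T> filter(List<T> tuples, Predicate<T> predicate) {
        List<T> out = new ArrayList<>();
        for (T tuple : tuples) {
            if (predicate.test(tuple)) {
                out.add(tuple);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> s = Arrays.asList("a", "", "b", "");
        // Same predicate as in the example: keep non-empty strings.
        List<String> filtered = filter(s, tuple -> !tuple.isEmpty());
        System.out.println(filtered); // prints [a, b]
    }
}
```

In a real topology the tuples arrive continuously rather than from a finite list, but the contract of {@code test} is the same: one call per tuple, with {@code true} meaning the tuple is kept.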
Since Scala interoperates with Java classes, Scala applications simply call the Java Application API. A set of implicit conversions is provided to support the use of Scala functions in stream transformations. See the Scala documentation under {@code com.ibm.streamsx.topology/doc/scaladoc/index.html}.
The API is provided as an SPL toolkit, {@code com.ibm.streamsx.topology}, containing the Java API in {@code lib/com.ibm.streamsx.topology.jar} as well as the SPL operators used to execute the Java functional transformations.
Feature | Reference | Since |
---|---|---|
Tuple types are Java and/or Scala objects. | {@link com.ibm.streamsx.topology.TStream} | 1.0 |
Functional programming: streams are transformed, filtered etc. by functional transformations implemented as Java and/or Scala functions. A Java function is an implementation of an interface with a single method or, when using Java 8, a lambda expression or a method reference. | {@link com.ibm.streamsx.topology.TStream} | 1.0 |
Execution within the Java virtual machine, in IBM Streams 4.0.1+ standalone or distributed mode, and on IBM Bluemix. | {@link com.ibm.streamsx.topology.context.StreamsContext} | 1.0 |
Pipeline topologies. | {@link com.ibm.streamsx.topology.Topology} | 1.0 |
Fan-out, multiple independent functions may be applied to a single stream to produce multiple streams of different or the same type. | {@link com.ibm.streamsx.topology.TStream} | 1.0 |
Fan-in, multiple independent streams of the same type may be transformed by a single function to produce a single stream. | {@link com.ibm.streamsx.topology.TStream#union union} | 1.0 |
Window based aggregation and joins, including partitioning. | {@link com.ibm.streamsx.topology.TWindow} | 1.0 |
Parallel streams (UDP, User Defined Parallelism), including partitioning. | {@link com.ibm.streamsx.topology.TStream#parallel parallel} | 1.0 |
Topic based publish-subscribe stream model for cross application communication (Streams dynamic connections). | {@link com.ibm.streamsx.topology.TStream#publish publish}, {@link com.ibm.streamsx.topology.Topology#subscribe subscribe} | 1.0 |
Ability to specify where portions of the topology will execute in distributed mode, including running on resources (hosts) with specified tags. | {@link com.ibm.streamsx.topology.context.Placeable}, {@link com.ibm.streamsx.topology.TStream#isolate isolate}, {@link com.ibm.streamsx.topology.TStream#lowLatency lowLatency} | 1.0 |
Integration with Apache Kafka and MQTT messaging systems | {@link com.ibm.streamsx.topology.messaging.kafka.KafkaConsumer Kafka}, {@link com.ibm.streamsx.topology.messaging.mqtt.MqttStreams MQTT} | 1.0 |
Testing of topologies, including those using SPL operators, while running in distributed, standalone or embedded mode. | {@link com.ibm.streamsx.topology.tester.Tester} | 1.0 |
Integration with SPL streams using SPL attribute schemas. | {@link com.ibm.streamsx.topology.spl.SPLStream} | 1.0 |
Invocation of existing SPL primitive or composite operators. | {@link com.ibm.streamsx.topology.spl.SPL} | 1.0 |
A topology may be arbitrarily complex, including multiple sources and sinks, fan-out on any stream by having multiple functional transformations or fan-in by creating a {@link com.ibm.streamsx.topology.TStream#union union} of streams with identical tuple types.
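As a rough picture of fan-out and fan-in, the following JDK-only sketch uses lists in place of streams. The class {@code FanOutFanIn} and its {@code transform} and {@code union} helpers are hypothetical and are not the Streams API. The helper {@code union} simply concatenates, which makes no claim about how a running topology interleaves tuples from its inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration only; none of these names belong to the Streams API.
class FanOutFanIn {
    // Applies one function to every tuple, producing a new "stream" (here, a list).
    public static <T, R> List<R> transform(List<T> stream, Function<T, R> fn) {
        List<R> out = new ArrayList<>();
        for (T tuple : stream) {
            out.add(fn.apply(tuple));
        }
        return out;
    }

    // Fan-in: merges two streams of the same tuple type into one.
    // Concatenation is a simplification; no interleaving order is implied.
    public static <T> List<T> union(List<T> a, List<T> b) {
        List<T> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("alpha", "be");
        // Fan-out: two independent functions applied to the same stream,
        // producing streams of different types.
        List<Integer> lengths = transform(words, String::length);
        List<String> upper = transform(words, String::toUpperCase);
        // Fan-in: union of two streams with identical tuple types.
        List<String> merged = union(words, upper);
        System.out.println(lengths); // prints [5, 2]
        System.out.println(merged);  // prints [alpha, be, ALPHA, BE]
    }
}
```

Note that the compiler rejects a union of {@code lengths} and {@code upper}, which mirrors the requirement that streams being unioned have identical tuple types.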
Creating the {@code Topology} and its streams as instances of {@code TStream} only declares how tuples will flow; it is not a runtime representation of the graph. The {@code Topology} is submitted to a {@link com.ibm.streamsx.topology.context.StreamsContext} in order to execute the graph.
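The split between declaring a graph and executing it can be illustrated with a small stand-in written against the JDK only. The class {@code DeclareThenSubmit} below is hypothetical and far simpler than a real {@code Topology}: it records filter functions when they are declared and calls them only when {@code submit()} is invoked. The {@code calls} counter exists purely to make that visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in, not the Streams API: declaration records the
// functions, submission is the only step that actually runs them.
class DeclareThenSubmit {
    private final List<String> source;
    private final List<Predicate<String>> filters = new ArrayList<>();
    public int calls = 0; // number of predicate invocations so far

    public DeclareThenSubmit(List<String> source) {
        this.source = source;
    }

    // Declaration: remembers the function but processes no tuples.
    public DeclareThenSubmit filter(Predicate<String> predicate) {
        filters.add(predicate);
        return this;
    }

    // Execution: pushes every source tuple through the declared filters.
    public List<String> submit() {
        List<String> out = new ArrayList<>();
        for (String tuple : source) {
            boolean keep = true;
            for (Predicate<String> predicate : filters) {
                calls++;
                if (!predicate.test(tuple)) {
                    keep = false;
                    break;
                }
            }
            if (keep) {
                out.add(tuple);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        DeclareThenSubmit graph =
                new DeclareThenSubmit(Arrays.asList("a", "", "b"))
                        .filter(tuple -> !tuple.isEmpty());
        System.out.println(graph.calls);    // prints 0: nothing has run yet
        System.out.println(graph.submit()); // prints [a, b]
        System.out.println(graph.calls);    // prints 3: one call per tuple
    }
}
```

The same reasoning applies to the real API: code that builds streams and attaches functions processes no tuples until the topology is submitted to a context.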