IBMStreams com.ibm.streamsx.hbase Toolkit > com.ibm.streamsx.hbase 3.9.3
The HBase toolkit provides support for interacting with Apache HBase from IBM Streams.
HBase is a Hadoop database, a distributed, scalable, big data store. Tables are partitioned by rows across clusters. A value in an HBase table is accessed by its row, columnFamily, columnQualifier, and timestamp. Usually the timestamp is left out, and only the latest value is returned. The HBase toolkit currently provides no support related to timestamps.
The columnFamily and columnQualifier can collectively be thought of as a column and are sometimes called that in the APIs. The separation of the column into two parts allows for some extra flexibility: the columnFamilies must be defined when the table is established and might be limited, but new columnQualifiers can be added at run time and there is no limit to their number.
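The addressing scheme described above can be pictured as a nested map: row, then columnFamily, then columnQualifier, then timestamp. The following is a minimal in-memory sketch of that idea, not the HBase client API or the toolkit's implementation. The ToyTable class, its method names, the counter used as a timestamp source, and the sample rows are all hypothetical and exist only for illustration.

```python
import itertools


class ToyTable:
    """In-memory stand-in for an HBase table (illustrative only)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self._families = frozenset(column_families)
        # Layout: row -> family -> qualifier -> {timestamp: value}
        self._cells = {}
        # Hypothetical stand-in for server-assigned timestamps.
        self._clock = itertools.count(1)

    def put(self, row, family, qualifier, value, timestamp=None):
        if family not in self._families:
            raise KeyError(f"unknown column family: {family!r}")
        if timestamp is None:
            timestamp = next(self._clock)
        versions = (self._cells.setdefault(row, {})
                    .setdefault(family, {})
                    .setdefault(qualifier, {}))
        versions[timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        versions = self._cells.get(row, {}).get(family, {}).get(qualifier)
        if not versions:
            return None
        if timestamp is None:
            # No timestamp given: return the most recent version.
            return versions[max(versions)]
        return versions.get(timestamp)


table = ToyTable(column_families=["appearance", "location"])
table.put("Gandalf", "appearance", "beard", "grey")
table.put("Gandalf", "appearance", "beard", "white")
# A new qualifier can be introduced at run time without redefining the table.
table.put("Gandalf", "appearance", "hat", "pointed")
latest = table.get("Gandalf", "appearance", "beard")
```

In this sketch, a put to a family that was not declared at construction fails, while any qualifier is accepted, which mirrors the fixed-family, open-qualifier split. A get without a timestamp returns the newest version, which is the only mode the toolkit exposes.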
For some operators, such as HBASEPut, the row, columnFamily, columnQualifier, and value must all be specified. For other operators, such as HBASEGet and HBASEDelete, the behavior depends on which of those items are specified. The HBASEDelete operator, for example, deletes the whole row if columnFamily and columnQualifier are not specified, but it can also be used to delete only a single value.
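The way a delete narrows as more of the address is supplied can be sketched over a plain nested dictionary. This is an illustration of the semantics only, not the operator's code: the delete function and the sample data are hypothetical, and the family-only case (removing every value in one columnFamily of a row) is an assumption based on how HBase deletes generally behave, not something stated here about HBASEDelete.

```python
def delete(table, row, family=None, qualifier=None):
    """Delete from a dict shaped as row -> family -> qualifier -> value."""
    if family is None and qualifier is not None:
        raise ValueError("a columnQualifier requires a columnFamily")
    if row not in table:
        return
    if family is None:
        # Neither family nor qualifier given: remove the whole row.
        del table[row]
    elif qualifier is None:
        # Assumption: family only removes every value in that family.
        table[row].pop(family, None)
    else:
        # Full address given: remove a single value.
        table[row].get(family, {}).pop(qualifier, None)


table = {
    "Frodo": {
        "appearance": {"hair": "brown", "height": "short"},
        "location": {"home": "Shire"},
    },
    "Sam": {"location": {"home": "Shire"}},
}

delete(table, "Frodo", "appearance", "hair")  # single value
delete(table, "Sam")                          # whole row
```

After these two calls, the "Frodo" row still holds its remaining values and the "Sam" row is gone.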
The columnFamily and columnQualifier (when relevant) can be specified either as an attribute of the input tuple (columnFamilyAttrName, columnQualifierAttrName) or as a single string that is used for all tuples (staticColumnFamily, staticColumnQualifier). The row and the value (when needed) come from the input tuple.
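The difference between the attribute-based and static forms can be sketched as a small lookup. The resolve_column function, the dictionary tuples, and the attribute name colF are hypothetical. Requiring exactly one of the two forms is an assumption made for this sketch, not a documented rule of the operators.

```python
def resolve_column(tup, attr_name=None, static_value=None):
    """Pick a column part from a tuple attribute or from a fixed string."""
    # Assumption: exactly one of the two forms is supplied.
    if (attr_name is None) == (static_value is None):
        raise ValueError("specify exactly one of attr_name or static_value")
    if static_value is not None:
        return static_value   # same value for every tuple
    return tup[attr_name]     # value read from each input tuple


tuples = [
    {"character": "Frodo", "colF": "location", "value": "Shire"},
    {"character": "Frodo", "colF": "appearance", "value": "short"},
]

# Analogous to columnFamilyAttrName: the family varies per tuple.
per_tuple = [resolve_column(t, attr_name="colF") for t in tuples]
# Analogous to staticColumnFamily: one family for all tuples.
fixed = [resolve_column(t, static_value="location") for t in tuples]
```

The attribute form suits streams whose tuples target different columns. The static form keeps the tuple schema smaller when every tuple writes to the same column.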
Except for HBASEIncrement and HBASEGet, the operators currently support only the rstring data type. HBASEGet also supports getting a value of type long.
The com.ibm.streamsx.hbase toolkit uses the same configuration information from the hbase-site.xml file that HBase does. For more information about HBase, see http://hbase.apache.org/.
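An hbase-site.xml file uses the Hadoop configuration layout: a configuration element that contains property elements, each with a name and a value. As a rough sketch of what that file carries, the snippet below reads such properties with the Python standard library. The helper read_site_properties and the host names are hypothetical. The two property names are standard HBase settings, but the toolkit itself reads the file through the HBase libraries, not through code like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical file content; the host names are placeholders.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
"""


def read_site_properties(xml_text):
    """Return the name -> value pairs found in an hbase-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }


properties = read_site_properties(SAMPLE_SITE_XML)
quorum_hosts = properties["hbase.zookeeper.quorum"].split(",")
```

Inspecting the file this way can help confirm which ZooKeeper quorum a Streams application will contact before it is submitted.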