Release Notes (pre 1.0)
Note
These release notes are for versions of ibis prior to 1.0. For 1.0 and later release notes see Release Notes.
v0.14.0 (August 23rd, 2018)
This release brings refactored, more composable core components and rule system to ibis. We also focused heavily on the BigQuery backend in this release.
New Features
Allow keyword arguments in Node subclasses (#968)
Splat args into Node subclasses instead of requiring a list (#969)
Add support for UNION in the BigQuery backend (#1408, #1409)
Support for writing UDFs in BigQuery (#1377). See the BigQuery UDF docs for more details.
Support for cross-project expressions in the BigQuery backend. (#1427, #1428)
Add strftime and to_timestamp support for BigQuery (#1422, #1410)
Require google-cloud-bigquery >=1.0 (#1424)
Limited support for interval arithmetic in the pandas backend (#1407)
Support for subclassing TableExpr (#1439)
Fill out pandas backend operations (#1423)
Add common DDL APIs to the pandas backend (#1464)
Implement the sql method for BigQuery (#1463)
Add to_timestamp for BigQuery (#1455)
Add the mapd backend (#1419)
Implement range windows (#1349)
Support for map types in the pandas backend (#1498)
Add mean and sum for boolean types in BigQuery (#1516)
All recent versions of SQLAlchemy are now supported (#1384)
Add support for NUMERIC types in the BigQuery backend (#1534)
Speed up grouped and rolling operations in the pandas backend (#1549)
Implement TimestampNow for BigQuery and pandas (#1575)
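As a rough illustration of what range windows (#1349) add, the following pure-Python sketch contrasts a rows window with a range window. It is not ibis code: the function names rows_window_sum and range_window_sum and the sample data are hypothetical, and the sketch only assumes the usual SQL meaning of the two window kinds.

```python
def rows_window_sum(keys, values, preceding):
    """Sum the current row and up to `preceding` rows before it, by position."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    ordered = [values[i] for i in order]
    return [sum(ordered[max(0, i - preceding): i + 1]) for i in range(len(ordered))]


def range_window_sum(keys, values, preceding):
    """Sum every row whose key lies in [current key - preceding, current key]."""
    pairs = sorted(zip(keys, values))
    return [
        sum(v for k, v in pairs if key - preceding <= k <= key)
        for key, _ in pairs
    ]


keys = [1, 2, 5, 6]          # ordering column, with a gap between 2 and 5
values = [10, 20, 30, 40]

rows_result = rows_window_sum(keys, values, preceding=1)
range_result = range_window_sum(keys, values, preceding=1)
```

With the gap in the ordering column, the rows window still pulls in the previous row at key 5, while the range window does not, because key 2 is more than 1 away from key 5.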
Bug Fixes
Nullable property is now propagated through value types (#1289)
Implicit casting between signed and unsigned integers checks boundaries
Fix precedence of case statement (#1412)
Fix handling of large timestamps (#1440)
Fix identical_to precedence (#1458)
Pandas 0.23 compatibility (#1458)
Preserve timezones in timestamp-typed literals (#1459)
Fix incorrect topological ordering of UNION expressions (#1501)
Fix projection fusion bug when attempting to fuse columns of the same name (#1496)
Fix output type for some decimal operations (#1541)
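The boundary check mentioned for implicit casts between signed and unsigned integers can be pictured with a small sketch. This is only an illustration of the idea, not the ibis implementation: INT_BOUNDS and can_implicitly_cast are hypothetical names, and only a few integer widths are listed.

```python
# Inclusive value ranges of some fixed-width integer types.
INT_BOUNDS = {
    "int8": (-(2 ** 7), 2 ** 7 - 1),
    "uint8": (0, 2 ** 8 - 1),
    "int16": (-(2 ** 15), 2 ** 15 - 1),
    "uint16": (0, 2 ** 16 - 1),
}


def can_implicitly_cast(value, target_type):
    """Return True if the integer `value` fits in `target_type` without overflow."""
    lower, upper = INT_BOUNDS[target_type]
    return lower <= value <= upper


fits_uint8 = can_implicitly_cast(200, "uint8")        # inside 0..255
fits_int8 = can_implicitly_cast(200, "int8")          # above the int8 maximum of 127
negative_unsigned = can_implicitly_cast(-1, "uint8")  # below the unsigned minimum of 0
```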
API Changes
The previous, private rules API has been rewritten (#1366)
Input arguments for operations are now defined in a more readable fashion, replacing the previous input_type list.
Removed support for async query execution (only Impala supported it)
Remove support for Python 3.4 (#1326)
BigQuery division defaults to using IEEE_DIVIDE (#1390)
Add tolerance parameter to asof_join (#1443)
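To show what a tolerance means for an as-of join, here is a minimal pure-Python sketch. It assumes backward as-of semantics (each left row takes the latest right row whose key is not greater than its own), which is how the operation is commonly defined; the function asof_join below and its sample data are hypothetical and do not reproduce the ibis API.

```python
from bisect import bisect_right


def asof_join(left, right, tolerance=None):
    """Pair each left (key, value) with the latest right row whose key is <= the left key.

    `right` must be sorted by key. When `tolerance` is given, a match that lies
    more than `tolerance` before the left key is dropped and None is used instead.
    """
    right_keys = [k for k, _ in right]
    joined = []
    for key, value in left:
        pos = bisect_right(right_keys, key) - 1  # index of the last right key <= key
        match = None
        if pos >= 0:
            too_far = tolerance is not None and key - right_keys[pos] > tolerance
            if not too_far:
                match = right[pos][1]
        joined.append((key, value, match))
    return joined


trades = [(1, "t1"), (5, "t2"), (10, "t3")]
quotes = [(0, "q0"), (4, "q1"), (6, "q2")]

unbounded = asof_join(trades, quotes)
bounded = asof_join(trades, quotes, tolerance=2)
```

Without a tolerance the trade at key 10 is matched with the quote at key 6; with a tolerance of 2 that quote is too old and the trade is left unmatched.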
v0.13.0 (March 30, 2018)
This release brings new backends, including support for executing against files and MySQL, and pandas user-defined scalar and aggregate functions, along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.
New Backends
New Features
Support for Unsigned Integer Types (#1194)
Support for Interval types and expressions with support for execution on the Impala and Clickhouse backends (#1243)
Isnan, isinf operations for float and double values (#1261)
Support for an interval with a quarter period (#1259)
ibis.pandas.from_dataframe convenience function (#1155)
Remove the restriction on ROW_NUMBER() requiring it to have an ORDER BY clause (#1371)
Add .get() operation on a Map type (#1376)
Allow visualization of custom defined expressions
Add experimental support for pandas UDFs/UDAFs (#1277)
Generalize the use of the where parameter to reduction operations (#1220)
Support for interval operations thanks to @kszucs (#1243, #1260, #1249)
Support for the PARTITIONTIME column in the BigQuery backend (#1322)
Add arbitrary() method for selecting the first non null value in a column (#1230, #1309)
Windowed MultiQuantile operation in the pandas backend thanks to @DiegoAlbertoTorres (#1343)
Rules for validating table expressions thanks to @DiegoAlbertoTorres (#1298)
Complete end-to-end testing framework for all supported backends (#1256)
contains/not contains now supported in the pandas backend (#1210, #1211)
CI builds are now reproducible locally thanks to @kszucs (#1121, #1237, #1255, #1311)
isnan/isinf operations thanks to @kszucs (#1261)
Framework for generalized dtype and schema inference, and implicit casting thanks to @kszucs (#1221, #1269)
Generic utilities for expression traversal thanks to @kszucs (#1336)
Design documentation for ibis (#1351)
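The generalized where parameter on reductions (#1220) restricts a reduction to the rows where a boolean condition holds. The sketch below mimics that behaviour on plain Python lists; reduce_where and the sample data are hypothetical and are not part of ibis.

```python
def reduce_where(values, func, where=None):
    """Apply `func` to the values whose corresponding `where` flag is true."""
    if where is None:
        return func(values)
    return func([v for v, keep in zip(values, where) if keep])


amounts = [10.0, 25.0, 5.0, 40.0]
is_refund = [False, True, False, True]

total = reduce_where(amounts, sum)                          # all rows
refund_total = reduce_where(amounts, sum, where=is_refund)  # only flagged rows
refund_max = reduce_where(amounts, max, where=is_refund)
```

The same filter works for any reduction, which is the point of generalizing the parameter rather than supporting it on a handful of aggregates.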
API Changes
Fixing #1378 required the removal of the name parameter to the param() function. Use the name() method instead.
v0.12.0 (October 28, 2017)
This release brings Clickhouse and BigQuery SQL support along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.
New Backends
New Features
Add support for Binary data type (#1183)
Allow users of the BigQuery client to define their own API proxy classes (#1188)
Add support for HAVING in the pandas backend (#1182)
Add struct field tab completion (#1178)
Add expressions for Map/Struct types and columns (#1166)
Support Table.asof_join (#1162)
Allow right side of arithmetic operations to take over (#1150)
Add a data_preload step in pandas backend (#1142)
expressions in join predicates in the pandas backend (#1138)
Scalar parameters (#1075)
Limited window function support for pandas (#1083)
Implement Time datatype (#1105)
Implement array ops for pandas (#1100)
Support for passing multiple quantiles in .quantile() (#1094)
Support for clip and quantile ops on DoubleColumns (#1090)
Enable unary math operations for pandas, sqlite (#1071)
Enable casting from strings to temporal types (#1076)
Allow selection of whole tables in pandas joins (#1072)
Implement comparison for string vs date and timestamp types (#1065)
Implement isnull and notnull for pandas (#1066)
Allow like operation to accept a list of conditions to match (#1061)
Add a pre_execute step in pandas backend (#1189)
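For the like operation accepting a list of conditions (#1061), the sketch below shows one way to picture the behaviour in plain Python. It assumes that a value matches when at least one pattern matches; the release note does not spell this out, so treat it as an assumption. The helpers like_to_regex and like_any are hypothetical names.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern: % matches any run of characters, _ matches one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like_any(value, patterns):
    """Return True if `value` matches at least one LIKE pattern (assumed OR semantics)."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(like_to_regex(p).fullmatch(value) for p in patterns)


single = like_any("impala", "imp%")
several = like_any("ibis-0.12", ["pandas%", "ibis-_.%"])
no_match = like_any("sqlite", ["pandas%", "imp%"])
```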
Bug Fixes
Remove global expression caching to ensure repeatable code generation (#1179, #1181)
Ensure that DataType and subclasses hash properly (#1172)
Ensure that the pandas backend can deal with unary operations in groupby (#1182)
Incorrect impala code generated for NOT with complex argument (#1176)
BUG/CLN: Fix predicates on Selections on Joins (#1149)
Don’t use SET LOCAL to allow redshift to work (#1163)
Allow empty arrays as arguments (#1154)
Fix column renaming in groupby keys (#1151)
Ensure that we only cast if timezone is not None (#1147)
Fix location of conftest.py (#1107)
TST/Make sure we drop tables during postgres testing (#1101)
Fix misleading join error message (#1086)
BUG/TST: Make hdfs an optional dependency (#1082)
Memoization should include expression name where available (#1080)
Performance Enhancements
Contributors
The following people contributed to the 0.12.0 release
$ git shortlog -sn --no-merges v0.11.2..v0.12.0
63 Phillip Cloud
8 Jeff Reback
2 Krisztián Szűcs
2 Tory Haavik
1 Anirudh
1 Szucs Krisztian
1 dlovell
1 kwangin
0.11.0 (June 28, 2017)
This release brings initial pandas backend support along with a number of bug fixes and reliability enhancements. We recommend that all users upgrade from earlier versions of Ibis.
New Features
Experimental pandas backend to allow execution of ibis expressions against pandas DataFrames
Graphviz visualization of ibis expressions. Implements _repr_png_ for Jupyter Notebook functionality
Ability to create a partitioned table from an ibis expression
Support for missing operations in the SQLite backend: sqrt, power, variance, and standard deviation, regular expression functions, and missing power support for PostgreSQL
Support for schemas inside databases with the PostgreSQL backend
Appveyor testing on core ibis across all supported Python versions
Add year/month/day methods to date types
Ability to sort, group by and project columns according to positional index rather than only by name
Added a type parameter to ibis.literal to allow user specification of literal types
Bug Fixes
Fix broken conda recipe
Fix incorrectly typed fillna operation
Fix postgres boolean summary operations
Fix kudu support to reflect client API Changes
Fix equality of nested types and construction of nested types when the value type is specified as a string
API Changes
Deprecate passing integer values to the ibis.timestamp literal constructor; this will be removed in 0.12.0
Added the admin_timeout parameter to the kudu client connect function
Contributors
$ git shortlog --summary --numbered v0.10.0..v0.11.0
58 Phillip Cloud
1 Greg Rahn
1 Marius van Niekerk
1 Tarun Gogineni
1 Wes McKinney
0.8 (May 19, 2016)
This release brings initial PostgreSQL backend support along with a number of critical bug fixes and usability improvements. As several correctness bugs with the SQL compiler were fixed, we recommend that all users upgrade from earlier versions of Ibis.
New Features
Initial PostgreSQL backend contributed by Phillip Cloud.
Add groupby as an alias for group_by to table expressions
Bug Fixes
Fix an expression error when filtering based on a new field
Fix Impala’s SQL compilation of using OR with compound filters
Various fixes with the having(...) function in grouped table expressions
Fix CTE (WITH) extraction inside UNION ALL expressions.
Fix ImportError on Python 2 when mock library not installed
API Changes
The deprecated ibis.impala_connect and ibis.make_client APIs have been removed
0.7 (March 16, 2016)
This release brings initial Kudu-Impala integration and improved Impala and SQLite support, along with several critical bug fixes.
New Features
Apache Kudu (incubating) integration for Impala users. See the blog post for now. Will add some documentation here when possible.
Add use_https option to ibis.hdfs_connect for WebHDFS connections in secure (Kerberized) clusters without SSL enabled.
Correctly compile aggregate expressions involving multiple subqueries.
To explain this last point in more detail, suppose you had:
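import ibis

# Unbound table with a string flag column and a numeric value column
table = ibis.table([('flag', 'string'),
                    ('value', 'double')],
                   'tbl')
# Two filtered views of the same table
flagged = table[table.flag == '1']
unflagged = table[table.flag == '0']
fv = flagged.value
uv = unflagged.value
# Each side aggregates over a different subquery
expr = (fv.mean() / fv.sum()) - (uv.mean() / uv.sum())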
The last expression now generates the correct Impala or SQLite SQL:
SELECT t0.`tmp` - t1.`tmp` AS `tmp`
FROM (
  SELECT avg(`value`) / sum(`value`) AS `tmp`
  FROM tbl
  WHERE `flag` = '1'
) t0
  CROSS JOIN (
    SELECT avg(`value`) / sum(`value`) AS `tmp`
    FROM tbl
    WHERE `flag` = '0'
  ) t1
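To see what a query of this shape computes, the following sketch runs it against an in-memory SQLite database using only the Python standard library. The sample rows are made up for illustration, and the backtick quoting is dropped for readability; ibis itself is not involved here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (flag TEXT, value REAL)")
# Hypothetical sample data: two flagged rows and three unflagged rows.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("1", 1.0), ("1", 3.0), ("0", 2.0), ("0", 2.0), ("0", 2.0)],
)

# Each aggregate is computed in its own subquery, then the two
# single-row results are combined with a cross join.
query = """
SELECT t0.tmp - t1.tmp AS tmp
FROM (
  SELECT avg(value) / sum(value) AS tmp
  FROM tbl
  WHERE flag = '1'
) t0
CROSS JOIN (
  SELECT avg(value) / sum(value) AS tmp
  FROM tbl
  WHERE flag = '0'
) t1
"""
(result,) = conn.execute(query).fetchone()
conn.close()
```

For the flagged rows the ratio is 2.0 / 4.0 and for the unflagged rows it is 2.0 / 6.0, so the query returns their difference.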
Bug Fixes
CHAR(n) and VARCHAR(n) Impala types now correctly map to Ibis string expressions
Fix inappropriate projection-join-filter expression rewrites resulting in incorrect generated SQL.
ImpalaClient.create_table correctly passes STORED AS PARQUET for format='parquet'.
Fixed several issues with Ibis dependencies (impyla, thriftpy, sasl, thrift_sasl), especially for secure clusters. Upgrading will pull in these new dependencies.
Do not fail in ibis.impala.connect when trying to create the temporary Ibis database if no HDFS connection passed.
Fix join predicate evaluation bug when column names overlap with table attributes.
Fix handling of fully-materialized joins (aka select * joins) in SQLAlchemy / SQLite.
Contributors
Thank you to all who contributed patches to this release.
$ git log v0.6.0..v0.7.0 --pretty=format:%aN | sort | uniq -c | sort -rn
21 Wes McKinney
1 Uri Laserson
1 Kristopher Overholt
0.6 (December 1, 2015)
This release brings expanded pandas and Impala integration, including support for managing partitioned tables in Impala. See the new Ibis for Impala Users guide for more on using Ibis with Impala.
The Ibis for SQL Programmers guide was also written since the 0.5 release.
This release also includes bug fixes affecting generated SQL correctness. All users should upgrade as soon as possible.
New Features
New integrated Impala functionality. See Ibis for Impala Users for more details on these things.
Improved Impala-pandas integration. Create tables or insert into existing tables from pandas DataFrame objects.
Partitioned table metadata management API. Add, drop, alter, and insert into table partitions.
Add is_partitioned property to ImpalaTable.
Added support for LOAD DATA DDL using the load_data function, also supporting partitioned tables.
Modify table metadata (location, format, SerDe properties etc.) using ImpalaTable.alter
Interrupting Impala expression execution with Control-C will attempt to cancel the running query with the server.
Set the compression codec (e.g. snappy) used with ImpalaClient.set_compression_codec.
Get and set query options for a client session with ImpalaClient.get_options and ImpalaClient.set_options.
Add ImpalaTable.metadata method that parses the output of the DESCRIBE FORMATTED DDL to simplify table metadata inspection.
Add ImpalaTable.stats and ImpalaTable.column_stats to see computed table and partition statistics.
Add CHAR and VARCHAR handling
Add refresh, invalidate_metadata DDL options and add incremental option to compute_stats for COMPUTE INCREMENTAL STATS.
Add substitute method for performing multiple value substitutions in an array or scalar expression.
Division is by default true division like Python 3 for all numeric data. This means for SQL systems that use C-style division semantics, the appropriate CAST will be automatically inserted in the generated SQL.
Easier joins on tables with overlapping column names. See Ibis for SQL Programmers.
Expressions like string_expr[:3] now work as expected.
Add coalesce instance method to all value expressions.
Passing limit=None to the execute method on expressions disables any default row limits.
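The note on true division can be made concrete with SQLite, which uses C-style semantics for integer operands. The sketch below uses only the standard library and is an illustration of why a cast is needed; the exact CAST that Ibis emits for a given engine is not shown in these notes, so the REAL type used here is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# C-style semantics: dividing two integers truncates the result.
(truncated,) = conn.execute("SELECT 7 / 2").fetchone()

# Casting one operand to a floating point type yields true division.
(true_div,) = conn.execute("SELECT CAST(7 AS REAL) / 2").fetchone()

conn.close()

# Python 3 true division, which the Ibis expression semantics follow.
python_result = 7 / 2
```

Inserting the cast is what keeps the SQL result in agreement with the Python-style result.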
API Changes
ImpalaTable.rename no longer mutates the calling table expression.
Contributors
$ git log v0.5.0..v0.6.0 --pretty=format:%aN | sort | uniq -c | sort -rn
46 Wes McKinney
3 Uri Laserson
1 Phillip Cloud
1 mariusvniekerk
1 Kristopher Overholt
0.5 (September 10, 2015)
Highlights in this release are SQLite, Python 3, and Impala UDA support, and an asynchronous execution API. There are also many usability improvements, bug fixes, and other new features.
New Features
SQLite client and built-in function support
Ibis now supports Python 3.4 as well as 2.6 and 2.7
Ibis can utilize Impala user-defined aggregate (UDA) functions
SQLAlchemy-based translation toolchain to enable more SQL engines having SQLAlchemy dialects to be supported
Many window function usability improvements (nested analytic functions and deferred binding conveniences)
More convenient aggregation with keyword arguments in aggregate functions
Built preliminary wrapper API for MADLib-on-Impala
Add var and std aggregation methods and support in Impala
Add nullifzero numeric method for all SQL engines
Add rename method to Impala tables (for renaming tables in the Hive metastore)
Add close method to ImpalaClient for session cleanup (#533)
Add relabel method to table expressions
Add insert method to Impala tables
Add compile and verify methods to all expressions to test compilation and ability to compile (since many operations are unavailable in SQLite, for example)
API Changes
Impala Ibis client creation now uses only ibis.impala.connect, and ibis.make_client has been deprecated
Contributors
$ git log v0.4.0..v0.5.0 --pretty=format:%aN | sort | uniq -c | sort -rn
55 Wes McKinney
9 Uri Laserson
1 Kristopher Overholt
0.4 (August 14, 2015)
New Features
Add tooling to use Impala C++ scalar UDFs within Ibis (#262, #195)
Support and testing for Kerberos-enabled secure HDFS clusters
Many table functions can now accept functions as parameters (invoked on the calling table) to enhance composability and emulate late-binding semantics of languages (like R) that have non-standard evaluation (#460)
Add any, all, notany, and notall reductions on boolean arrays, as well as cumany and cumall
Using topk now produces an analytic expression that is executable (as an aggregation) but can also be used as a filter as before (#392, #91)
Added experimental database object “usability layer”, see ImpalaClient.database.
Add TableExpr.info
Add compute_stats API to table expressions referencing physical Impala tables
Add explain method to ImpalaClient to show query plan for an expression
Add chmod and chown APIs to HDFS interface for superusers
Add convert_base method to strings and integer types
Add option to ImpalaClient.create_table to create empty partitioned tables
ibis.cross_join can now join more than 2 tables at once
Add ImpalaClient.raw_sql method for running naked SQL queries
ImpalaClient.insert now validates schemas locally prior to sending query to cluster, for better usability.
Add conda installation recipes
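As a rough picture of what a base conversion such as convert_base does, the sketch below re-expresses a numeral from one base in another using plain Python. It is a standalone illustration under simple assumptions (non-negative values, bases 2 to 36); the function shown is a hypothetical stand-in, and the engine-side behaviour for negative numbers or invalid digits is not covered.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def convert_base(value, from_base, to_base):
    """Re-express a non-negative numeral written in `from_base` in `to_base`."""
    if not (2 <= from_base <= 36 and 2 <= to_base <= 36):
        raise ValueError("bases must be between 2 and 36")
    number = int(str(value), from_base)  # parse the digits in the source base
    if number < 0:
        raise ValueError("negative values are not handled in this sketch")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, to_base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


hex_to_dec = convert_base("ff", 16, 10)
dec_to_bin = convert_base(10, 10, 2)
dec_to_hex = convert_base("255", 10, 16)
```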
Contributors
$ git log v0.3.0..v0.4.0 --pretty=format:%aN | sort | uniq -c | sort -rn
38 Wes McKinney
9 Uri Laserson
2 Meghana Vuyyuru
2 Kristopher Overholt
1 Marius van Niekerk
0.3 (July 20, 2015)
First public release. See http://ibis-project.org for more.
New Features
Implement window / analytic function support
Enable non-equijoins (join clauses with operations other than ==).
Add remaining string functions supported by Impala.
Add pipe method to tables (hat-tip to the pandas dev team).
Add mutate convenience method to tables.
Fleshed out WebHDFS implementations: get/put directories, move files, etc. See the full HDFS API.
Add truncate method for timestamp values
ImpalaClient can execute scalar expressions not involving any table.
Can also create internal Impala tables with a specific HDFS path.
Make Ibis’s temporary Impala database and HDFS paths configurable (see ibis.options).
Add truncate_table function to client (if the user’s Impala cluster supports it).
Python 2.6 compatibility
Enable Ibis to execute concurrent queries in multithreaded applications (earlier versions were not thread-safe).
Test data load script in scripts/load_test_data.py
Add an internal operation type signature API to enhance developer productivity.
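The pipe method follows the pandas idiom of passing the object to a function so that processing steps read left to right. The sketch below shows the pattern on a toy class; Table, keep_positive, and add_double are hypothetical stand-ins rather than Ibis objects.

```python
class Table:
    """Hypothetical stand-in for a table expression holding a list of row dicts."""

    def __init__(self, rows):
        self.rows = rows

    def pipe(self, func, *args, **kwargs):
        """Call func(self, *args, **kwargs) so that steps chain left to right."""
        return func(self, *args, **kwargs)


def keep_positive(table, column):
    """Keep only the rows whose `column` value is positive."""
    return Table([r for r in table.rows if r[column] > 0])


def add_double(table, column):
    """Add a `doubled` field holding twice the value of `column`."""
    return Table([dict(r, doubled=r[column] * 2) for r in table.rows])


t = Table([{"x": -1}, {"x": 2}, {"x": 3}])
result = t.pipe(keep_positive, "x").pipe(add_double, "x")
```

The chained form is equivalent to add_double(keep_positive(t, "x"), "x") but keeps the steps in reading order.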
Contributors
$ git log v0.2.0..v0.3.0 --pretty=format:%aN | sort | uniq -c | sort -rn
59 Wes McKinney
29 Uri Laserson
4 Isaac Hodes
2 Meghana Vuyyuru
0.2 (June 16, 2015)
New Features
insert method on Ibis client for inserting data into existing tables.
parquet_file, delimited_file, and avro_file client methods for querying datasets not yet available in Impala
New ibis.hdfs_connect method and HDFS client API for WebHDFS for writing files and directories to HDFS
New timedelta API and improved timestamp data support
New bucket and histogram methods on numeric expressions
New category logical datatype for handling bucketed data, among other things
Add summary API to numeric expressions
Add value_counts convenience API to array expressions
New string methods like, rlike, and contains for fuzzy and regex searching
Add options.verbose option and configurable options.verbose_log callback function for improved query logging and visibility
Support for new SQL built-in functions:
ibis.coalesce
ibis.greatest and ibis.least
ibis.where for conditional logic (see also ibis.case and ibis.cases)
nullif method on value expressions
ibis.now
New aggregate functions: approx_median, approx_nunique, and group_concat
where argument in aggregate functions
Add having method to group_by intermediate object
Added group-by convenience table.group_by(exprs).COLUMN_NAME.agg_function()
Add default expression names to most aggregate functions
New Impala database client helper methods:
create_database
drop_database
exists_database
list_databases
set_database
Client list_tables searching / listing method
Add add, sub, and other explicit arithmetic methods to value expressions
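The coalesce and nullif built-ins listed above follow standard SQL semantics, which can be sketched in plain Python with None standing in for NULL. The functions below are illustrations only, not the Ibis implementations, and safe_ratio is a hypothetical helper showing how the two are typically combined.

```python
def coalesce(*values):
    """Return the first argument that is not None (SQL NULL), or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None


def nullif(value, null_if_value):
    """Return None when the two arguments are equal, otherwise `value`."""
    return None if value == null_if_value else value


def safe_ratio(numerator, denominator, default=0.0):
    """Avoid division by zero by turning a zero denominator into NULL first."""
    denominator = nullif(denominator, 0)
    ratio = None if denominator is None else numerator / denominator
    return coalesce(ratio, default)


first_known = coalesce(None, None, 3, 4)
zero_division = safe_ratio(1, 0)
normal_division = safe_ratio(1, 4)
```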
API Changes
New Ibis client and Impala connection workflow. Client now combined from an Impala connection and an optional HDFS connection
Bug Fixes
Numerous expression API bug fixes and rough edges fixed
Contributors
$ git log v0.1.0..v0.2.0 --pretty=format:%aN | sort | uniq -c | sort -rn
71 Wes McKinney
1 Juliet Hougland
1 Isaac Hodes
0.1 (March 26, 2015)
First Ibis release.
Expression DSL design and type system
Expression to ImpalaSQL compiler toolchain
Impala built-in function wrappers
$ git log 84d0435..v0.1.0 --pretty=format:%aN | sort | uniq -c | sort -rn
78 Wes McKinney
1 srus
1 Henry Robinson