DB-API 2.0 Compatibility¶
PyEXASOL is similar to PEP-249 DB-API 2.0 specification, but it does not strictly follow it. This page explains the reasons behind this decision and your alternative(s) if you need or want to use a DBAPI2 compatible driver.
Alternatives¶
Exasol WebSocket Driver¶
The pyexasol
package includes a DBAPI2 compatible driver facade, located in the exasol.driver
package. However, using pyexasol directly will generally yield better performance when utilizing Exasol in an OLAP manner, which is likely the typical use case.
That said, in specific scenarios, the DBAPI2 API can be advantageous or even necessary. This is particularly true when integrating with “DB-Agnostic” frameworks. In such cases, you can just import and use the DBAPI2 compliant facade as illustrated in the example below.
from exasol.driver.websocket.dbapi2 import connect
connection = connect(dsn='', username='sys', password='exasol', schema='TEST')
with connection.cursor() as cursor:
cursor.execute("SELECT 1;")
TurboODBC¶
TurboODBC offers an alternative ODBC-based, DBAPI2-compatible driver, which supports the Exasol database.
Pyodbc¶
Pyodbc provides an ODBC-based, DBAPI2-compatible driver. For further details, please refer to our wiki.
Rationale¶
PEP-249 was originally created for general purpose OLTP row store databases running on a single server: SQLite, MySQL, PostgreSQL, MSSQL, Oracle, etc.
It does not work very well for OLAP columnar databases (like Exasol) running on multiple servers because it was never designed for this purpose. Despite both OLTP DBMS and OLAP DBMS using SQL for communication, the foundation and usage patterns are completely different.
When people use DB-API 2.0 drivers, they tend to skip manuals and automatically apply OLTP usage patterns without even realizing how much they lose in terms of performance and efficiency.
A good example is TurboODBC. Very few know that it is possible to fetch data as NumPy arrays and as Apache Arrow.
Minor intentional incompatibilities with DB-API 2.0 force users to look through the manual and to learn about better ways of getting the job done.
Exasol Specific Problems with DB-API 2.0¶
Default
autocommit=off
prevents indexes from being stored permanently on disk forSELECT
statements;Default
autocommit=off
may hold transaction for a long time (e.g., opened connection in an IPython notebook);Python object creation and destruction overhead is very significant when you process large amounts of data;
Functions
fetchmany()
andexecutemany()
have significant additional overhead related to JSON serialization;Exasol WebSocket protocol provides more information about columns than is normally available in
.description
property ofcursor
;
We also wanted to discourage:
“Drop-in” replacements of other Exasol drivers without reading the manual;
Usage of OLTP-oriented ORM (e.g., SQLAlchemy, Django);
Unlike common OLTP databases, each OLAP database is very unique. It is important to understand implementation details and features of the specific database and to build applications around those features. Generalization of any kind and the “copy-paste” approach may lead to abysmal performance in trivial cases.
Ideas for Migration¶
Find cursor()
calls:
cur = C.cursor()
cur.execute('SELECT * FROM table')
data = cur.fetchall()
Replace with:
st = C.execute('SELECT * FROM table')
data = st.fetchall()
Find .description
columns = list(map(str.lower, next(zip(*cur.description))))
Replace with:
columns = st.column_names()
Find all reads into pandas:
cur.execute('SELECT * FROM table')
pandas.DataFrame(cur.fetchall(), columns=columns)
Replace with:
C.export_to_pandas('SELECT * FROM table')
…etc.