Importing and Exporting Data¶
PyExasol provides many Variants to import data to and export data from an Exasol database. To provide a consistent API, these variant have simple interfaces that follow the same pattern:
utilize the same inputs for generating the
IMPORTandEXPORTstatements. See Exasol Parameters.use the HTTP Transport to reduce the fetching overhead associated with large data sets.
when the functionality of an external API is used (e.g. pandas), the variant will have an argument
callback_paramsso that a user can define custom parameters to the required transformation function.
For more advanced users, check out the documentation on Parallel to parallelize importing or exporting data.
Warning
The Variants for importing and exporting data in Exasol are built on a shared callback pattern involving three concurrent Python threads:
Main Thread: The caller of
import_*orexport_*, which processes data from that pipe.HTTP Thread: Consumes the HTTP[S] stream from the database and writes it to a pipe.
SQL Thread: Executes the actual SQL
IMPORTorEXPORTstatement against the DB.Error Handling Because these threads are interdependent, a failure in one typically triggers secondary exceptions in the others. Since PyExasol version 2.1.0, the library monitors all three sources for exceptions. Any caught exceptions are bundled into a custom exception (
pyexasol.exceptions.ExaImportErrororpyexasol.exceptions.ExaExportError), which also displays the tracebacks.Variable Error Messages: Depending on the exact timing of the failure and the specific binary packages in use (which may introduce their own internal threading), the reported error messages may vary between executions of the same failing task. For instance, an error in the callback function may or may not propagate to the SQL thread when the exception was caught. Thus, between runs, you might catch between 1 to 2 sub-exceptions.
Variants¶
File¶
Export¶
See pyexasol.ExaConnection.export_to_file(). This method supports exporting data
from an Exasol database into a file, path object, file-like object opened in binary
mode, or to a process STDOUT using sys.stdout.buffer.
# Export from file defined with string path
C.export_to_file('my_file.csv', "users")
# Export into STDOUT
C.export_to_file(sys.stdout.buffer, "users")
Import¶
See pyexasol.ExaConnection.import_from_file(). This method supports importing
data from a file, path object, file-like object opened in binary mode, or from a process
STDIN using sys.stdin.buffer.
# Import from file defined with string path
C.import_from_file('/test/my_file.csv', "users")
# Import from path object
C.import_from_file(pathlib.Path('/test/my_file.csv'), "users")
# Import from opened file
file = open('/test/my_file.csv', 'rb')
C.import_from_file(file, "users")
file.close()
# Import from STDIN
C.import_from_file(sys.stdin.buffer, "users")
Iterable¶
Export¶
See pyexasol.ExaConnection.export_to_list().
# Read from SQL
export_list = C.export_to_list("SELECT * FROM users")
# Read from table
export_list = C.export_to_list("users")
Import¶
See pyexasol.ExaConnection.import_from_iterable().
my_list = [
(1, 'Bob', False, '2018-01-01'),
(2, 'Gill', True, '2018-02-01'),
]
C.import_from_iterable(my_list, "users")
# This is suitable for very big INSERTS as long as the generator returns rows
# one-by-one and does not run out of memory.
def my_generator():
for i in range(5):
yield (i, 'Bob', True, '2017-01-01')
C.import_from_iterable(my_generator(), "users")
Pandas¶
Export¶
See pyexasol.ExaConnection.export_to_pandas().
# Read from SQL
pd = C.export_to_pandas("SELECT * FROM users")
# Read from table
pd = C.export_to_pandas("users")
Import¶
See pyexasol.ExaConnection.import_from_pandas().
C.import_from_pandas(pd, "users")
Parquet¶
Export¶
See pyexasol.ExaConnection.export_to_parquet().
# Read from SQL & write to parquet file in dst
C.export_to_parquet(dst=Path("local_empty_directory"), query_or_table="SELECT * FROM users")
# Read from table & write to parquet file in dst
C.export_to_parquet(dst="local_empty_directory", query_or_table="users")
Import¶
See pyexasol.ExaConnection.import_from_parquet().
from pathlib import Path
# list[Path]: list of specific parquet files to load
C.import_from_parquet(source=[Path("local_path/test.parquet")], table="users")
# Path: can be either a file or directory. If it's a directory,
# all files matching this pattern *.parquet will be processed.
C.import_from_parquet(source=Path("local_path/test.parquet"), table="users")
# string: representing a filepath which already contains a glob pattern
C.import_from_parquet(source="local_path/*.parquet", table="users")
Polars¶
Export¶
See pyexasol.ExaConnection.export_to_polars().
# Read from SQL
df = C.export_to_polars("SELECT * FROM users")
# Read from table
df = C.export_to_polars("users")
Import¶
See pyexasol.ExaConnection.import_from_polars().
C.import_from_polars(df, "users")
Write a Custom Variant¶
Note
A full collection of pre-defined callback functions is available in the
pyexasol.callback module. Their usage by the
pyexasol.ExaConnection class is documented in the Variants section.
Export¶
Example of a callback exporting into a basic Python list.
# Define callback function
def export_to_list(pipe, dst, **kwargs):
wrapped_pipe = io.TextIOWrapper(pipe, newline='\n')
reader = csv.reader(wrapped_pipe, lineterminator='\n', **kwargs)
return [row for row in reader]
# Run EXPORT using the defined callback function
C.export_to_callback(export_to_list, None, 'my_table')
Import¶
Example of a callback importing from Pandas into Exasol.
df = <pandas.DataFrame>
def import_from_pandas(pipe, src, **kwargs):
wrapped_pipe = io.TextIOWrapper(pipe, newline='\n')
return src.to_csv(wrapped_pipe, header=False, index=False, quoting=csv.QUOTE_NONNUMERIC, **kwargs)
# Run IMPORT using the defined callback function
C.import_from_callback(import_from_pandas, df, 'my_table')