Basics

The BucketFS Service

In an On-Prem database, a single BucketFS service can host multiple buckets. To interact with a BucketFS service, use the exasol.bucketfs.Service class.

List buckets

from exasol.bucketfs import Service

URL = "http://localhost:6666"
CREDENTIALS = {"default": {"username": "w", "password": "write"}}

bucketfs = Service(URL, CREDENTIALS)
buckets = [bucket for bucket in bucketfs]

Get a Bucket reference

from exasol.bucketfs import Service

URL = "http://localhost:6666"
CREDENTIALS = {"default": {"username": "w", "password": "write"}}

bucketfs = Service(URL, CREDENTIALS)
default_bucket = bucketfs["default"]

Bucket class

A Bucket contains a set of files, access to which may be restricted depending on the credentials of the requester. The Bucket class for an On-Prem database is exasol.bucketfs.Bucket; the corresponding class for a SaaS database is exasol.bucketfs.SaaSBucket. Using these classes, the user can interact with the files in a bucket: download, upload, list, and delete them.

Most of the examples below are based on the On-Prem implementation of BucketFS. In the SaaS implementation there is only one BucketFS service, providing a single bucket. To access BucketFS in SaaS, the Bucket object should be created directly, as demonstrated in the last example. The interface of the Bucket object for the SaaS database is identical to that of the On-Prem database.

List files in a Bucket

"""
These examples are relevant for the On-Prem Exasol database.
"""
from exasol.bucketfs import Service

URL = "http://localhost:6666"
CREDENTIALS = {"default": {"username": "w", "password": "write"}}
bucketfs = Service(URL, CREDENTIALS)

default_bucket = bucketfs["default"]
files = [file for file in default_bucket]

Upload files to a Bucket

"""
These examples are relevant for the On-Prem Exasol database.
"""
import io

from exasol.bucketfs import Service

URL = "http://localhost:6666"
CREDENTIALS = {"default": {"username": "w", "password": "write"}}

bucketfs = Service(URL, CREDENTIALS)
bucket = bucketfs["default"]

# Upload bytes
data = bytes([65, 65, 65, 65])
bucket.upload("some/other/path/file2.bin", data)

# Upload file content
with open("myfile2.txt", "rb") as f:
    bucket.upload("destination/path/myfile2.txt", f)

# Upload file like object
file_like = io.BytesIO(b"some content")
bucket.upload("file/like/file1.txt", file_like)

# Upload string content
text = "Some string content"
bucket.upload("string/file1.txt", text.encode("utf-8"))

# Upload generated content
generator = (b"abcd" for _ in range(0, 10))
bucket.upload("string/file2.txt", generator)

Download files from a Bucket

Note

When downloading a file from a bucket, it is returned to the caller as an iterable of byte chunks. This keeps the transfer memory-efficient and flexible. Still, most users will prefer a more tangible object from the download; in that case, the BucketFS converters should be used.

Available Converters

  • exasol.bucketfs.as_bytes

  • exasol.bucketfs.as_string

  • exasol.bucketfs.as_hash

  • exasol.bucketfs.as_file

"""
These examples are relevant for the On-Prem Exasol database.
"""
from exasol.bucketfs import (
    Service,
    as_bytes,
    as_file,
    as_string,
)

URL = "http://localhost:6666"
CREDENTIALS = {"default": {"username": "w", "password": "write"}}

bucketfs = Service(URL, CREDENTIALS)
bucket = bucketfs["default"]

# Download as raw bytes
data = as_bytes(bucket.download("some/other/path/file2.bin"))

# Download into file
file = as_file(bucket.download("some/other/path/file2.bin"), filename="myfile.bin")

# Download into string
my_utf8_string = as_string(bucket.download("some/utf8/encoded/text-file.txt"))
my_ascii_string = as_string(
    bucket.download("some/other/text-file.txt"), encoding="ascii"
)
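The as_hash converter, listed above but not shown in the example, computes a digest of the downloaded content. A minimal sketch follows; the algorithm parameter is assumed to accept hashlib-style algorithm names, so please verify the exact signature against the API reference of your installed version.

```python
"""
This example is relevant for the On-Prem Exasol database.
"""
from exasol.bucketfs import Service, as_hash

URL = "http://localhost:6666"
CREDENTIALS = {"default": {"username": "w", "password": "write"}}

bucketfs = Service(URL, CREDENTIALS)
bucket = bucketfs["default"]

# Compute a checksum of the downloaded content chunk by chunk, without
# materialising the whole file in memory first. The algorithm parameter
# is assumed to take hashlib-style names such as "sha256".
checksum = as_hash(bucket.download("some/other/path/file2.bin"), algorithm="sha256")
```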


Delete files from a Bucket

"""
These examples are relevant for the On-Prem Exasol database.
"""
from exasol.bucketfs import Service

URL = "http://localhost:6666"
CREDENTIALS = {"default": {"username": "w", "password": "write"}}

bucketfs = Service(URL, CREDENTIALS)
bucket = bucketfs["default"]

# Delete file from bucket
bucket.delete("some/other/path/file2.bin")

Create a bucket object in SaaS

"""
This example is relevant for the Exasol SaaS database.
It demonstrates the creation of a bucket object for a SaaS database.
"""
import os

from exasol.bucketfs import SaaSBucket

# Let's assume that the required SaaS connection parameters
# are stored in environment variables.
bucket = SaaSBucket(
    url=os.environ.get('SAAS_URL'),
    account_id=os.environ.get('SAAS_ACCOUNT_ID'),
    database_id=os.environ.get('SAAS_DATABASE_ID'),
    pat=os.environ.get('SAAS_PAT'),
)

PathLike interface

PathLike is an interface similar to pathlib.Path and should feel familiar to most users.

Using the PathLike interface

"""
In this tutorial we will demonstrate the usage of the PathLike interface
with an example of handling customer reviews.
"""
from typing import ByteString
import tempfile
import os

import exasol.bucketfs as bfs

# First, we need to get a path in the BucketFS where we will store reviews.
# We will use the build_path() function for that. This function takes different
# input parameters depending on the backend in use. We set the backend type in
# the variable below. Please change it to bfs.path.StorageBackend.saas
# if needed.
backend = bfs.path.StorageBackend.onprem

if backend == bfs.path.StorageBackend.onprem:
    # The parameters below are the default BucketFS parameters of the Docker-DB
    # running on a local machine. Please change them according to the settings of the
    # On-Prem database being used. For better security, consider storing the password
    # in an environment variable.
    reviews = bfs.path.build_path(
        backend=backend,
        url="http://localhost:6666",
        bucket_name='default',
        service_name='bfsdefault',
        path='reviews',
        username='w',
        password='write',
        verify=False
    )
elif backend == bfs.path.StorageBackend.saas:
    # In case of a SaaS database we will assume that the required SaaS connection
    # parameters are stored in environment variables.
    reviews = bfs.path.build_path(
        backend=backend,
        url=os.environ.get('SAAS_URL'),
        account_id=os.environ.get('SAAS_ACCOUNT_ID'),
        database_id=os.environ.get('SAAS_DATABASE_ID'),
        pat=os.environ.get('SAAS_PAT'),
        path='reviews',
    )
else:
    raise RuntimeError(f'Unknown backend {backend}')

# Let's create a path for good reviews and write some reviews there,
# each into a separate file.
good_reviews = reviews / 'good'

john_h_review = good_reviews / 'John-H.review'
john_h_review.write(
    b'I had an amazing experience with this company! '
    b'The customer service was top-notch, and the product exceeded my expectations. '
    b'I highly recommend them to anyone looking for quality products and excellent service.'
)

sarah_l_review = good_reviews / 'Sarah-L.review'
sarah_l_review.write(
    b'I am a repeat customer of this business, and they never disappoint. '
    b'The team is always friendly and helpful, and their products are outstanding. '
    b'I have recommended them to all my friends and family, and I will continue to do so!'
)

david_w_review = good_reviews / 'David-W.review'
david_w_review.write(
    b'After trying several other companies, I finally found the perfect fit with this one. '
    b'Their attention to detail and commitment to customer satisfaction is unparalleled. '
    b'I will definitely be using their services again in the future.'
)

# Now let's write some bad reviews in a different subdirectory.
bad_reviews = reviews / 'bad'

# Previously we provided content as a ByteString. But we can also use a file object,
# as shown here.
with tempfile.TemporaryFile() as file_obj:
    file_obj.write(
        b'I first began coming here because of their amazing reviews. '
        b'Unfortunately, my experiences have been overwhelmingly negative. '
        b'I was billed more than 2,600 euros, the vast majority of which '
        b'I did not consent to and were never carried out.'
    )
    file_obj.seek(0)
    mike_s_review = bad_reviews / 'Mike-S.review'
    mike_s_review.write(file_obj)


# A PathLike object supports an interface similar to the PurePosixPath.
for path_obj in [reviews, good_reviews, john_h_review]:
    print(path_obj)
    print('\tname:', path_obj.name)
    print('\tsuffix:', path_obj.suffix)
    print('\tparent:', path_obj.parent)
    print('\texists:', path_obj.exists())
    print('\tis_dir:', path_obj.is_dir())
    print('\tis_file:', path_obj.is_file())

# The as_udf_path() function returns the corresponding path, as it is seen from a UDF.
print("A UDF can find John's review at", john_h_review.as_udf_path())


# The read() method returns an iterator over chunks of content.
# The function below reads the whole content of the specified file.
def read_content(bfs_path: bfs.path.PathLike) -> ByteString:
    return b''.join(bfs_path.read())


# Like the pathlib.Path class, the BucketFS PathLike object provides methods
# to iterate over the content of a directory.
# Let's use the iterdir() method to print all good reviews.
for item in good_reviews.iterdir():
    if item.is_file():
        print(item.name, 'said:')
        print(read_content(item))


# The walk() method allows traversing subdirectories.
# Let's use this method to create a list of all review paths.
all_reviews = [node / file for node, _, files in reviews.walk() for file in files]
for review in all_reviews:
    print(review)


# A file can be deleted using the rm() method. Please note that once a file is
# deleted, it won't be possible to write another file to the same path for some
# time, due to an internal inter-node synchronisation procedure.
mike_s_review.rm()

# A directory can be deleted using the rmdir() method. If it is not empty we need
# to use the recursive=True option to delete the directory with all its content.
good_reviews.rmdir(recursive=True)

# Now all reviews should be deleted.
print('Are any reviews left?', reviews.exists())

# It may look surprising that the call to reviews.exists() returns False, since we
# have not deleted the base directory. In BucketFS, a directory doesn't exist as a
# distinct entity. Therefore, the exists() function called on the path of an empty
# directory returns False.

Configure logging

import logging

from exasol.bucketfs import Service

# The library logs via Python's standard logging module, so the usual
# configuration mechanisms apply.
logging.basicConfig(level=logging.INFO)
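To keep detailed BucketFS output while quieting other libraries, the level can also be set on a specific logger. The logger name "exasol.bucketfs" below is an assumption based on the common logging.getLogger(__name__) convention; verify it against the log records emitted by your installed version.

```python
import logging

# Quiet logging overall, but verbose logging for BucketFS. The logger
# name "exasol.bucketfs" is an assumption based on the usual
# module-level logger convention.
logging.basicConfig(level=logging.WARNING)
logging.getLogger("exasol.bucketfs").setLevel(logging.DEBUG)
```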