Design Document¶

Motivation¶

Unifying tooling is just the first step when it comes to reducing the cognitive and administrative overhead of maintaining and working on projects. As a natural next step, common development tasks (e.g. CI/CD), the maintenance of those tasks, and updating the tooling need to be simplified in order to keep complexity and the development and maintenance effort manageable.

This project serves as such a simplification by providing common dev tooling, tasks, and configuration, on top of which common automation (e.g. CI/CD) is provided.

Note

Not every project is exactly the same, so we will need to deal with project specifics. Still, the basic “Developer Front End” (e.g. build automation tasks, CI, etc.) should look the same across projects. It may have project-specific additions, which ideally reuse existing building blocks of this project.

Overview¶

This project serves three main purposes:

Provide library code, scripts and commands for common developer tasks within a Python project.
Provide and maintain commonly required functionality for a Python project
- Common Project Tasks
  
  apply code formatters
  
  lint project
  
  type check project
  
  run unit tests
  
  run integration tests
  
  determine code coverage
  
  build, open, and clean documentation
  
  create GitHub Issues for vulnerabilities
- CI (verify PRs and merges)
- CI/CD (verify and publish releases)
- Build & Publish Documentation (verify and publish documentation)
- Provide and enforce configuration settings (code formatter & co.)
Provide usage examples of this common functionality

Design¶

Design Principles¶

This project needs to be thought of as a development dependency only!
- Library code should not be imported/used in non-development code of the projects
Convention over configuration
- Being able to assume conventions reduces the code base/paths significantly
- The first thought should always be: can it be done easily by using/applying convention(s)?
- Use configuration if it’s more practical or if it simplifies transitioning projects
Provide extension points (hooks) for project specific behaviour
- If it can’t be a convention or configuration setting
- If having something as a convention or configuration significantly complicates the implementation
- If you have an obvious use case within at least one project
KISS (Keep It Stupid Simple)
- This project shall simplify the work of the developer, not add a burden on top
- Try to automate as much as possible
- Try to build on tools which are already in use
  
  E.g. documentation related issues ideally should be addressed by extending sphinx
Note

It is clear that not everything can and will be automated right from the beginning, but there should be continuous effort to improve the work of the developers.

e.g.:

Template > Generator > Automated Updater
YAGNI (You Ain’t Gonna Need It)
- Only add settings, features, extension points etc. when they are explicitly needed
Note

Every feature needs to have at least one project using it. Still, if a feature is only used by a single project, it is likely better implemented within that project. Once a second project requires it, it makes sense to move it into this project.

Having at least two projects using a feature also will more clearly show the commonalities which need to be provided/dealt with.
SoC (Separation of Concerns)
Note

Due to the nature of the project, different concerns will be covered by it:
- Library code
- Tools
- Tasks
- Workflows
- …
Still, in order to achieve a specific outcome, clear boundaries need to be established.

E.g. when it comes to CI/CD, the infrastructure/tool (GitHub workflows & actions) should only assemble, provide and orchestrate the CI/CD execution. The actual task(s) run by this infrastructure/tool should be individually defined tasks which can be executed on any machine providing the appropriate environment (e.g. make or nox task).
Iteration
Note

Generally, we want to use an iterative approach when adding and developing new functionality. E.g.:
1. Add template(s) and instructions
2. Provide tooling to generate files, settings etc.
3. Provide tooling to automagically update and sync files, settings etc.

Design Decisions¶

Whenever possible, tools provided or required by the toolbox should get their configuration from the project's pyproject.toml file.
Whenever a more dynamic configuration is needed, it should be made part of the config object in the project's noxconfig.py file.
The required standard tooling used within the toolbox will obey what has been agreed upon in the Exasol python-styleguide.
For a task runner, the toolbox will be using nox
Warning

Known Issue(s)

Nox tasks should not call (notify) other nox tasks. This can lead to unexpected behaviour due to the fact that the job/task queue will execute a task only once.

Therefore, all functionality which needs to be re-used, is called multiple times, or is used by different nox tasks should be provided by Python code (e.g. functions) which receives a nox session as an argument. The code itself shall not be annotated as a nox session/task (@nox.session).
Note

Nox was chosen as a task runner because:
- It is configured in code
- Its functionality is straightforward and compact
- It is already used by a couple of our projects, so the team is familiar with it
- The author of the toolbox is very familiar with it
That said, no in-depth evaluation of other tools has been done.
Workflows (CI/CD & Co.) will be GitHub Actions-based
- This is the standard tool within the Exasol Integration Team
Workflows shall only provide an execution environment and orchestrate the execution itself
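
The known issue with tasks notifying other tasks can be sidestepped roughly as follows. This is a minimal sketch, not the toolbox's actual code: the helper names `_lint` and `_type_check`, the task names, and the `SessionLike` and `RecordingSession` stand-ins are hypothetical. In a real noxfile.py the two task functions would be decorated with @nox.session and receive a nox session object.

```python
from typing import Protocol


class SessionLike(Protocol):
    """The one method of a nox session this sketch relies on (stand-in type)."""

    def run(self, *args: str) -> object: ...


def _lint(session: SessionLike, files: list[str]) -> None:
    """Shared building block: a plain function, deliberately not a nox task."""
    session.run("pylint", *files)


def _type_check(session: SessionLike, files: list[str]) -> None:
    """Shared building block: a plain function, deliberately not a nox task."""
    session.run("mypy", *files)


# In a noxfile.py, this function would carry the @nox.session decorator.
def lint(session: SessionLike) -> None:
    _lint(session, ["src"])


# In a noxfile.py, this function would carry the @nox.session decorator.
def check(session: SessionLike) -> None:
    # Call the helpers directly instead of notifying the "lint" task,
    # so each step runs whenever and as often as it is requested.
    _lint(session, ["src"])
    _type_check(session, ["src"])


class RecordingSession:
    """Hypothetical test double that records commands instead of running them."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, ...]] = []

    def run(self, *args: str) -> None:
        self.calls.append(args)


if __name__ == "__main__":
    session = RecordingSession()
    check(session)
    print(session.calls)
```

Because the helpers only depend on the session's `run` method, they can also be exercised in unit tests without starting nox.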

Detailed Design¶

Tasks¶

Todo

Add diagram configuration and tasks (noxfile.py + noxconfig.py + exasol.toolbox)

To view all the defined nox tasks & their definitions use:

poetry run -- nox -l

Workflows¶

Todo

Add diagram of GitHub workflows and interaction

Available Workflows¶

Workflow	Description
checks.yml	Verifies the project consistency (tests, linting, etc.)
build-and-publish.yml	Builds and publishes releases of the project
gh-pages.yml	Builds and publishes the project documentation

Available Actions¶

Action	Description
python-environment	Sets up an appropriate poetry-based python environment
security-issues	Takes a JSON of known vulnerabilities affecting a repo & creates GitHub Issues in said repo for any vulnerabilities, which do not yet have a GitHub Issue

security-issues¶

The security-issues/action.yml creates GitHub Issues for known vulnerabilities reported via maven and pip-audit. The following steps are taken:

Convert a JSON of known vulnerabilities into a common format (class Issue)
Filter out vulnerabilities which already have an existing GitHub Issue via CVE
Create new GitHub Issues
Return a JSON of the newly created GitHub Issues
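
The steps above can be sketched as follows. This is an illustration under assumptions, not the action's implementation: the fields of `Issue`, the functions `filter_new` and `create_issues`, and the injected `create` callable (standing in for whatever creates a GitHub Issue and returns its URL) are all hypothetical, and the CVE IDs are placeholders.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Issue:
    """Common format for one vulnerability; the fields are assumed for illustration."""

    cve: str
    coordinates: str
    description: str


def filter_new(issues: Iterable[Issue], existing_cves: set[str]) -> list[Issue]:
    """Keep only the issues whose CVE does not yet have a GitHub Issue."""
    return [issue for issue in issues if issue.cve not in existing_cves]


def create_issues(issues: Iterable[Issue], create: Callable[[Issue], str]) -> str:
    """Create one GitHub Issue per vulnerability and return the results as JSON."""
    created = [{**asdict(issue), "issue_url": create(issue)} for issue in issues]
    return json.dumps(created)


if __name__ == "__main__":
    found = [
        Issue("CVE-0000-0001", "pkg:1.0", "placeholder description"),
        Issue("CVE-0000-0002", "pkg:1.0", "placeholder description"),
    ]
    new = filter_new(found, existing_cves={"CVE-0000-0001"})
    print(create_issues(new, create=lambda issue: f"https://example.invalid/{issue.cve}"))
```

Injecting the `create` callable keeps the filtering logic testable without network access.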

Input Variants¶

An input variant would be passed in as a string-encoded JSON.

maven (with ossindex-audit)

{
    "vulnerable": {
        "<package_name>@<package_version>:compile": {
            "coordinates": "<package_name>@<package_version>",
            "description": "<package_description>",
            "reference": "<oss_url_for_vuln>",
            "vulnerabilities": [
                {
                    "id": "<vuln_id>",
                    "displayName": "<vuln_name>",
                    "title": "<vuln_title>",
                    "description": "<vuln_description>",
                    "cvssScore": 7.5,
                    "cvssVector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H",
                    "cwe": "<cwe_vuln_id>",
                    "cve": "<cve_vuln_id>",
                    "reference": "<oss_url_for_vuln>",
                    "externalReferences": ["<vuln_reference_url>"]
                }
            ]
        }
    }
}

pip-audit (via nox -s dependency:audit)

{
    "dependencies": [
        {
            "name": "<package_name>",
            "version": "<package_version>",
            "vulns": [
                {
                    "id": "<vuln_id>",
                    "fix_versions": ["<fix_version>"],
                    "aliases": ["<vuln_id2>"],
                    "description": "<vuln_description>"
                }
            ]
        }
    ]
}
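
As a rough sketch of how the two input variants might be brought into one common shape: the `Finding` record, the function names, and the rule for picking the CVE from a pip-audit entry (first ID starting with `CVE-` among `id` and `aliases`) are assumptions made for illustration; the real `Issue` class and conversion logic may differ.

```python
import json
from typing import Iterator, NamedTuple


class Finding(NamedTuple):
    """Hypothetical common record for one vulnerability."""

    cve: str
    coordinates: str
    description: str


def from_maven(report: dict) -> Iterator[Finding]:
    """Walk the maven variant: packages keyed by name, each with vulnerabilities."""
    for package in report["vulnerable"].values():
        for vuln in package["vulnerabilities"]:
            cve = vuln.get("cve")
            if cve:
                yield Finding(cve, package["coordinates"], vuln["description"])


def from_pip_audit(report: dict) -> Iterator[Finding]:
    """Walk the pip-audit variant: a list of dependencies, each with vulns."""
    for dependency in report["dependencies"]:
        coordinates = f"{dependency['name']}:{dependency['version']}"
        for vuln in dependency.get("vulns", []):
            # Assumption: the CVE appears either as the id or among the aliases.
            ids = [vuln["id"], *vuln.get("aliases", [])]
            cve = next((i for i in ids if i.startswith("CVE-")), None)
            if cve is not None:
                yield Finding(cve, coordinates, vuln["description"])


def parse(encoded: str) -> list[Finding]:
    """Decode the string-encoded JSON and dispatch on its top-level key."""
    report = json.loads(encoded)
    if "vulnerable" in report:
        return list(from_maven(report))
    return list(from_pip_audit(report))
```

Note that this sketch silently skips entries without a CVE.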

Known Issues¶

The security-issues/action.yml assumes that eventually every known vulnerability will be associated with a single CVE.

This can be problematic as vulnerabilities may be initially reported to different services and not receive a CVE until a few days later or, in some cases, never. This could mean that some vulnerabilities are initially missed or, in some cases, never propagated by our action.
Additionally, reporting tools like pip-audit must link a vulnerability with the different vulnerability IDs assigned by different reporting services. Typically, this is done by selecting one of the vulnerability IDs as the unique identifier of the vulnerability. As is the case for pip-audit, this identifier may not be the CVE. If the linked vulnerability IDs were to change (i.e. a wrongly linked CVE), we could end up with multiple GitHub Issues for the same underlying vulnerability.

Known Issues¶

This section documents flaws, sins, and known issues with the current design and/or its current implementation that were either known upfront or surfaced through the course of implementing it. Additionally, it attempts to explain why certain choices were made at the time, so one can better understand whether it may be reasonable to make changes now or in the future.

Passing files as individual arguments on the CLI¶

Description:

As of today, the selection of Python files for linting, formatting, etc. is done by passing all relevant Python files as individual arguments to the tools invoked by the Python toolbox.

Downsides:

Most shells have limitations on the number of arguments and their length.
Noisy output, making it hard to decipher the actual command.
Not ideal for all use cases.

Rationale/History:

Passing files as individual arguments makes collecting and filtering them easy by default. It also allows users to provide or replace the selection mechanism fairly easily.
Every tool used by the toolbox (e.g., black, isort) used to support passing files by argument. However, not all of them provided the same mechanism for selection or deselection patterns (e.g. “glob”).

Ideas/Solutions:

Develop a wrapper that allows for different selection mechanisms
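
One possible shape for such a wrapper is sketched below. It is an assumption-laden illustration, not a proposed API: `select_files` and `batches` are hypothetical names, selection uses a glob pattern, deselection uses `fnmatch`-style patterns, and batching is one way to stay below shell argument-length limits.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Iterable, Iterator


def select_files(
    root: Path, include: str = "**/*.py", exclude: Iterable[str] = ()
) -> list[str]:
    """Collect files below `root` matching `include`, minus any `exclude` pattern."""
    patterns = tuple(exclude)
    selected = []
    for path in sorted(root.glob(include)):
        relative = path.relative_to(root).as_posix()
        if path.is_file() and not any(fnmatch(relative, p) for p in patterns):
            selected.append(relative)
    return selected


def batches(files: Iterable[str], max_chars: int) -> Iterator[list[str]]:
    """Split `files` into batches whose joined length stays within `max_chars`."""
    batch: list[str] = []
    size = 0
    for name in files:
        cost = len(name) + 1  # +1 for the separating space
        if batch and size + cost > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(name)
        size += cost
    if batch:
        yield batch
```

A task could then invoke its tool once per batch, which keeps each command line short and its output easier to read.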

Inconsistent Naming¶

Description:

The naming is not consistent across the project name (python-toolbox) and the PyPI package name (exasol-toolbox).

Downsides:

Misalignment between the PyPI package name and the project name causes confusion when discussing or referring to the project/package.

Rationale/History:

Initially, this was a proof of concept (POC) to verify a few ideas, and the naming was not well thought out at the time.
Later, when publishing the first package for distribution, the project name was unavailable on PyPI, resulting in a different name being used on PyPI.

Ideas/Solutions:

Consistently rename project to exasol-python-toolbox: Issue-325

Project Configuration¶

Description: Currently, the documentation regarding the configuration of projects using the toolbox has various gaps and does not follow a clear configuration hierarchy or structure.

Downsides:

Multiple scattered configuration points make management and understanding difficult.
Configurations overlap or conflict with unclear priorities.
Tool leakage (e.g., the [isort] section in pyproject.toml): if everything were done via toolbox config file(s), backing tools could be swapped more easily.

Rationale/History:

Initial decisions aimed to simplify individual adjustments in the projects until we had a better understanding of what needed to be configured.
Scattering configuration across various files and tools was a hasty decision to expedite development and accommodate various tools.

Ideas/Solutions:

Currently used methods to configure toolbox-based projects:

Project configuration: noxconfig.py
Tool-specific configuration files or sections in pyproject.toml
Implementing plugin extension points
Overwriting nox tasks with custom implementations
Replacing with customized workflows of the same name (only applicable for action/workflows)

Refinement:

Centralize all toolbox based configurations in a toolbox config file (noxconfig.py).
Rename the toolbox config file from noxconfig.py to a more appropriate name that reflects its purpose.
Document configuration hierarchy and usage.
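
To make the idea of a single, centralized configuration object more concrete, here is a minimal sketch of what a project's noxconfig.py might contain. The class name, its fields, and the `PROJECT_CONFIG` name are assumptions chosen for illustration and do not describe the toolbox's actual interface.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Config:
    """Hypothetical project configuration with convention-based defaults."""

    root: Path = Path(".")
    source_dir: str = "src"
    excluded_paths: tuple[str, ...] = ("dist", ".eggs", "venv")

    def source_path(self) -> Path:
        """Resolve the source directory relative to the project root."""
        return self.root / self.source_dir


# Convention over configuration: a project only overrides what differs.
PROJECT_CONFIG = Config(source_dir="exasol")

if __name__ == "__main__":
    print(PROJECT_CONFIG.source_path())
```

Tasks would read their settings from this one object, so a project that follows the conventions would need no overrides at all.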

Nox Task Runner¶

Description: While Nox isn’t a perfect fit, it still meets most of our requirements for a task runner.

Downsides:

Importing from top-level modules is problematic, as all tasks contained in them are imported.
Passing and receiving additional arguments to a task is clunky.
The default behavior of creating a venv for tasks is undesirable.
Nox does not support grouping.

Rationale/History:

Why Nox was chosen:

No additional language(s) required: There was no need to introduce extra programming languages or binaries, simplifying the development process.
Python-based: Being Python-based, Nox can be extended and understood by Python developers.
Python code: As Nox tasks are defined via Python code, existing scripts can be reused and code can be shared easily.
Simplicity: Nox is relatively “small” in functionality, making it somewhat simple to use and understand.

Ideas/Solutions:

Grouping:

Since Nox doesn’t natively support task grouping, we need a strategy to group commands. Therefore, a naming convention to indicate grouping should be adopted.

Suggestion: Groups will be separated using a : (colon) because - (dash) might already be used within task names.
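
The suggested naming convention could be interpreted by tooling along these lines. This is a sketch: `group_tasks` is a hypothetical helper, and apart from `dependency:audit` the task names are made up for the example. With nox, such a name can be assigned through the `name` argument of `@nox.session`.

```python
from collections import defaultdict


def group_tasks(names: list[str]) -> dict[str, list[str]]:
    """Group task names by the prefix before the first colon.

    Names without a colon are collected under the empty group "".
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        group, separator, task = name.partition(":")
        if separator:
            groups[group].append(task)
        else:
            groups[""].append(name)
    return dict(groups)


if __name__ == "__main__":
    tasks = ["test:unit", "test:integration", "dependency:audit", "lint-all"]
    for group, members in group_tasks(tasks).items():
        print(group or "(ungrouped)", members)
```

Splitting only at the first colon leaves dashes free for use inside task names, as the suggestion intends.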

Imports:

Consider modularizing tasks to handle top-level imports better.

Other Issues:

Generally, one may consider addressing the other issues by choosing another task runner or creating a small set of CLI tools and extension points manually provided by the toolbox.

Poetry for Project Management¶

While poetry was and is a good choice for Exasol as a project, dependency, and build tool, uv has recently surfaced and made big advancements. uv addresses additional itches with our projects, so in the long run it may be a good idea to migrate our project setups to it. For now, use poetry for project, build, and dependency management.

Code Formatting¶

Description:

Currently, we use Black and Isort for code formatting. Running them on a larger code base, e.g. as pre-commit hooks, can take quite a bit of time.

Downsides:

Two tools and an aligned configuration of them are required to cleanly and correctly format the codebase.
Code needs to be processed at least twice as we apply two individual tools.
The performance of Black and Isort is okay but not great compared to other tools.

Rationale/History:

Black and Isort have been used because they are battle-tested and widely used.
When we opted for Black and Isort, Ruff wasn’t “a thing” yet and was at best in its early stages.
Black and Isort were already known by most Python devs when we were selecting the tools.

Ideas/Solutions:

As Ruff is fairly stable and also tested and used by many Python projects, we should consider transitioning to it.

Advantages:

Well-tested
Widely used
Excellent performance
Single tool for imports and formatting the codebase
Simplifies adopting ruff for linting

Pylint¶

Description: We are currently using Pylint instead of Ruff.

Downsides:

Pylint is slower and less usable in pre-commit hooks
It is an additional tool, therefore at least one more processing run of the code is required
No support for Language Server Protocol (LSP, e.g. compare to ruff lsp)

Rationale/History:

Well-known
Pylint provides built-in project score/rating
Project score is good for improving legacy code bases which haven’t been linted previously
Plugin support

Ideas/Possible Solutions:

Replacing Pylint with Ruff for linting would provide a significant performance improvement. Ruff also offers an LSP and IDE integrations and is widely used these days. There would be additional synergy if we also adopt Ruff for formatting the code base.

Transitioning to Ruff requires us to adjust the migration and improvement strategies for our projects:

Currently, our codebase improvements are guided by scores. However, with Ruff, a new approach is necessary. For example, we could incrementally introduce specific linting rules, fix the related issues, and then enforce these rules.
The project rating and scoring system will also need modification. One possibility would be to run Ruff and Pylint in parallel, utilizing Pylint solely for rating and issue resolution while Ruff is incorporated for linting tasks.

Security Linter¶

Description: As of today, the security linter does not fail if it has findings. This was intentionally done to simplify integration and adoption of the tool. Developers can still use the results to improve and find issues within the codebase, and additionally, a rating will be generated to provide some guidance on which projects need attention.

Downsides:

No enforced safeguard on introducing potential security issues

Rationale/History:

Simplify adoption into projects
First step to introduce tooling and make the current state/rating visible

Ideas/Possible Solutions:

Define a strategy to address potential security issues in projects. Once this has been done, enforce the immediate addressing of potential security issues in the codebase upon introduction.
Allow excluding individual findings in projects until they are fixed.

Workflows Dependency Structure¶

Description: Undocumented workflow interdependencies and structure

Downsides:

Hard to customize if one does not understand the overall setup and dependencies

Rationale/History:

Simplify development during the discovery phase (what is needed, how to implement, adjust to discovered needs)
Ideally, all workflows will be integrated and use a standard setup (part of the customization can also be done in the called nox tasks)

Ideas/Possible Solutions:

Define clear requirements and interfaces
Document those requirements and interfaces