Channel: Planet Python

Peter Hoffmann: Using docker multistage build to build turbodbc with pyarrow support on Debian 11


Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. For maximum performance, turbodbc offers built-in NumPy and Apache Arrow support and internally relies on batched data transfer instead of the single-record communication that other popular ODBC modules use.

Building turbodbc with pyarrow support has some caveats: the build detects at build time whether pyarrow is installed, and it needs pybind11 and several Debian dev packages for the C++ compilation.

By using Docker multistage builds we can build turbodbc with pyarrow support natively without the dev packages ending up in the final image.
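Whether the dev packages really stay out of the final image is easy to check. The helper below is a hypothetical addition, not part of the original setup: it parses dpkg's status database (`/var/lib/dpkg/status` on Debian) and lists installed packages whose names end in `-dev`. It is only a heuristic, since build tools such as `build-essential`, `gdb` or `lcov` do not follow that naming pattern.

```python
from pathlib import Path


def installed_dev_packages(status_text):
    """Return the sorted names of installed packages ending in '-dev'.

    status_text is the content of dpkg's status database, which holds one
    stanza of "Field: value" lines per package, separated by blank lines.
    """
    dev_packages = []
    for stanza in status_text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines of multi-line fields start with whitespace.
            if not line or line[0].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        package = fields.get("Package", "")
        if fields.get("Status") == "install ok installed" and package.endswith("-dev"):
            dev_packages.append(package)
    return sorted(dev_packages)


if __name__ == "__main__":
    status_file = Path("/var/lib/dpkg/status")
    if status_file.exists():
        text = status_file.read_text(encoding="utf-8", errors="replace")
        print(installed_dev_packages(text) or "no -dev packages installed")
```

Run inside the final image, an empty result is what the multistage approach should give; running it in the builder stage instead would list packages such as `zlib1g-dev`.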

The first step is a base image that contains all the Debian packages needed to run turbodbc later on:

# syntax=docker/dockerfile:1
FROM debian:bullseye as base
# Create user, must not be ROOT and UID should be greater than 1000
RUN useradd --uid 1100 app --create-home
RUN apt-get update
RUN --mount=type=cache,target=/var/cache/apt apt-get install --yes python3 python3-venv git
RUN --mount=type=cache,target=/var/cache/apt apt-get install --yes libodbc1 odbcinst odbcinst1debian2 binutils-x86-64-linux-gnu
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"
WORKDIR /app/
ENV PYTHONPATH=/app/

In the second stage we install the build requirements that are only needed to compile turbodbc with arrow support. There are two important notes:

Firstly, pyarrow has to be installed before turbodbc is built, because the turbodbc build process automatically detects whether pyarrow is available.

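Secondly, to make the detection work you need to pass --no-build-isolation to the turbodbc install: with build isolation, pip compiles the package in a separate temporary environment that only contains its declared build requirements, so the pyarrow installed in the venv would not be visible. You also have to make sure turbodbc links correctly against the Arrow libraries bundled in the pyarrow wheel. pyarrow.create_library_symlinks() adds the unversioned library symlinks (such as libarrow.so) the linker expects, and CPPFLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" compiles turbodbc with the pre-C++11 libstdc++ ABI used by the pyarrow wheel.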
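Why build isolation defeats this kind of detection can be illustrated with a small sketch. It is not turbodbc's actual setup code: the decision logic and the extension name `turbodbc_intern` are hypothetical, and an empty temporary directory stands in for the fresh environment pip creates when build isolation is enabled.

```python
import importlib.machinery
import tempfile


def module_visible(name, search_path=None):
    """Return True if top-level module `name` is found on search_path.

    PathFinder only looks in the given directories; search_path=None means
    sys.path, i.e. the environment that runs this code.
    """
    return importlib.machinery.PathFinder.find_spec(name, search_path) is not None


def extensions_to_build(search_path=None):
    """Hypothetical build-time decision: add the Arrow extension only when
    pyarrow is visible to the build."""
    extensions = ["turbodbc_intern"]  # hypothetical name for the core extension
    if module_visible("pyarrow", search_path):
        extensions.append("turbodbc_arrow_support")
    return extensions


if __name__ == "__main__":
    print("build in this environment:", extensions_to_build())
    # An empty directory stands in for an isolated build environment
    # that does not contain pyarrow.
    with tempfile.TemporaryDirectory() as isolated_env:
        print("isolated build:", extensions_to_build([isolated_env]))
```

On a machine with pyarrow installed, the first line lists both extensions while the isolated build only gets the core one, which is the situation --no-build-isolation avoids.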

FROM base as builder
RUN --mount=type=cache,target=/var/cache/apt apt-get -yq install \
    build-essential \
    gdb \
    lcov \
    libbz2-dev \
    libffi-dev \
    libgdbm-dev \
    liblzma-dev \
    libboost-dev \
    libncurses5-dev \
    libreadline6-dev \
    libsqlite3-dev \
    libssl-dev \
    lzma \
    lzma-dev \
    python3-dev \
    tk-dev \
    unixodbc-dev \
    uuid-dev \
    xvfb \
    zlib1g-dev
RUN pip3 install -U pip==22.0.4 setuptools==45.2.0 wheel==0.37.1
RUN pip3 install -U pybind11==2.10.1 numpy==1.23.5 pandas==1.5.2 six==1.16.0 pyarrow==5.0.0
RUN python3 -c "import pyarrow; pyarrow.create_library_symlinks()" \
    && CPPFLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" pip3 install --no-build-isolation turbodbc==4.5.5
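The `pyarrow.create_library_symlinks()` step exists because the pyarrow wheel ships its shared libraries with an ABI version suffix (a name like `libarrow.so.500`), while linking with `-larrow` needs a plain `libarrow.so`. The dry-run sketch below only illustrates that idea and is not pyarrow's implementation; the function name `planned_symlinks` is hypothetical. Run after the symlink step, it should report nothing left to create.

```python
import os
import re

# Matches an ABI-versioned shared library such as "libarrow.so.500" and
# captures the unversioned name ("libarrow.so") that -larrow looks for.
_VERSIONED_SO = re.compile(r"^(lib[^/]+\.so)\.\d+$")


def planned_symlinks(filenames):
    """Dry run: map each missing unversioned .so name to its versioned target."""
    existing = set(filenames)
    links = {}
    for name in sorted(existing):
        match = _VERSIONED_SO.match(name)
        if match and match.group(1) not in existing:
            links[match.group(1)] = name
    return links


if __name__ == "__main__":
    try:
        import pyarrow
    except ImportError:
        print("pyarrow is not installed")
    else:
        # get_library_dirs() lists the directories holding the bundled libraries.
        for directory in pyarrow.get_library_dirs():
            if os.path.isdir(directory):
                print(directory, planned_symlinks(os.listdir(directory)))
```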

In the third stage we start again from the base image and copy over only the virtual environment containing the compiled turbodbc, so none of the build dependencies end up in the final image:

FROM base as runner
COPY --from=builder /opt/venv /opt/venv
COPY requirements.txt /app/requirements.txt
RUN --mount=type=cache,target=/root/.cache pip install --requirement /app/requirements.txt
# Set the User we created above
USER 1100
CMD []
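To confirm that the runner stage really contains a working Arrow-enabled turbodbc, a small smoke test can import the compiled modules. This is a sketch under assumptions: the extension module names `turbodbc_numpy_support` and `turbodbc_arrow_support` are what turbodbc 4.x is assumed to build, and the file name `smoke_test.py` is hypothetical. Importing the modules, rather than only locating them, also catches shared-library linking problems, which surface as `ImportError`.

```python
import importlib
import sys

# Assumption: turbodbc 4.x builds its NumPy and Arrow bindings as separate
# extension modules with these names; check them against your venv.
REQUIRED_MODULES = (
    "turbodbc",
    "turbodbc_numpy_support",
    "turbodbc_arrow_support",
    "numpy",
    "pyarrow",
)


def failed_imports(names):
    """Import each module and return {name: error message} for the failures."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # also raised for unresolved shared libraries
            failures[name] = str(exc)
    return failures


def running_in_venv():
    """In a venv, sys.prefix differs from the base interpreter's prefix."""
    return sys.prefix != sys.base_prefix


def problems(names=REQUIRED_MODULES):
    """Return human-readable problems; an empty list means the check passed."""
    found = [f"cannot import {name}: {error}"
             for name, error in failed_imports(names).items()]
    if not running_in_venv():
        found.append("python3 is not running from a virtual environment")
    return found


def main():
    """Print any problems to stderr and return a process exit code."""
    found = problems()
    for problem in found:
        print(problem, file=sys.stderr)
    return 1 if found else 0
```

Copied to /app/, which the base image already puts on PYTHONPATH, it could be run in the runner stage with `python3 -c "import smoke_test, sys; sys.exit(smoke_test.main())"` so that a missing or broken Arrow extension fails the image build.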
