Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. For maximum performance, turbodbc offers built-in NumPy and Apache Arrow support and internally relies on batched data transfer instead of single-record communication as other popular ODBC modules do.
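To make the batched, Arrow-based transfer concrete, here is a minimal sketch of fetching a result set as an Arrow table. The function name `fetch_as_arrow`, the DSN, the query, and the 100 MB buffer size are illustrative assumptions; the call only works in an environment where turbodbc was built with pyarrow support and a matching ODBC data source is configured.

```python
def fetch_as_arrow(dsn, query, buffer_mb=100):
    """Run `query` against the ODBC data source `dsn` and return a pyarrow.Table.

    The import happens inside the function so this file can be loaded
    even where turbodbc is not installed.
    """
    from turbodbc import Megabytes, connect, make_options

    # The read buffer size controls how much data is moved per batch.
    options = make_options(read_buffer_size=Megabytes(buffer_mb))
    connection = connect(dsn=dsn, turbodbc_options=options)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        # fetchallarrow() is only usable when turbodbc has Arrow support.
        return cursor.fetchallarrow()
    finally:
        connection.close()
```

A call such as `fetch_as_arrow("my_dsn", "SELECT * FROM my_table")` would return the whole result set as one Arrow table; both names here are placeholders.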
Building turbodbc with pyarrow support has some caveats: the build detects at build time whether pyarrow is installed, and it needs pybind11 and several Debian dev packages for the C++ compilation.
By using Docker multi-stage builds, we can build turbodbc natively with pyarrow support without carrying the dev packages into the final image.
The first step is the base image, which has all the Debian packages needed to run turbodbc later on:
# syntax=docker/dockerfile:1
FROM debian:bullseye as base

# Create user, must not be ROOT and UID should be greater than 1000
RUN useradd --uid 1100 app --create-home

RUN apt-get update
RUN --mount=type=cache,target=/var/cache/apt apt-get install --yes python3 python3-venv git
RUN --mount=type=cache,target=/var/cache/apt apt-get install --yes libodbc1 odbcinst odbcinst1debian2 binutils-x86-64-linux-gnu

RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"

WORKDIR /app/
ENV PYTHONPATH=/app/
In the second stage we install the build requirements that are only needed to compile turbodbc with arrow support. There are two important notes:
First, pyarrow has to be installed before turbodbc is built, because the turbodbc build process automatically detects whether pyarrow is available.
Second, to make the detection work you need to pass --no-build-isolation to the turbodbc install and make sure the Arrow libraries are linked correctly.
FROM base as builder

RUN --mount=type=cache,target=/var/cache/apt apt-get -yq install \
    build-essential \
    gdb \
    lcov \
    libbz2-dev \
    libffi-dev \
    libgdbm-dev \
    liblzma-dev \
    libboost-dev \
    libncurses5-dev \
    libreadline6-dev \
    libsqlite3-dev \
    libssl-dev \
    lzma \
    lzma-dev \
    python3-dev \
    tk-dev \
    unixodbc-dev \
    uuid-dev \
    xvfb \
    zlib1g-dev

RUN pip3 install -U pip==22.0.4 setuptools==45.2.0 wheel==0.37.1
RUN pip3 install -U pybind11==2.10.1 numpy==1.23.5 pandas==1.5.2 six==1.16.0 pyarrow==5.0.0

RUN python3 -c "import pyarrow; pyarrow.create_library_symlinks()" \
    && CPPFLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" pip3 install --no-build-isolation turbodbc==4.5.5
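Because the Arrow detection happens silently at build time, it can help to fail early if a build requirement is missing from the venv. The following is a minimal sketch of such a guard; the function name `missing_build_requirements` and the default module list are assumptions for illustration, not part of turbodbc.

```python
import importlib.util


def missing_build_requirements(names=("pyarrow", "pybind11", "numpy")):
    """Return the module names that cannot be found in the current environment.

    find_spec() only locates the module; it does not import it, so the
    check is cheap and has no side effects.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Run inside the builder stage before the turbodbc install, an empty list means pip can see pyarrow when it is invoked with --no-build-isolation.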
In the third stage we start again from the base image and reuse only the venv that contains the compiled turbodbc packages:
FROM base as runner

COPY --from=builder /opt/venv /opt/venv
COPY requirements.txt /app/requirements.txt
RUN --mount=type=cache,target=/root/.cache pip install --requirement /app/requirements.txt

# Set the User we created above
USER 1100
CMD []
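Since the runner stage no longer contains the dev packages, a quick smoke test can confirm that the copied venv still loads. The sketch below actually imports the modules, so it also surfaces shared libraries that fail to resolve at runtime. The helper name `check_imports` is hypothetical, and `turbodbc_arrow_support` is assumed to be the name of the compiled extension that turbodbc produces when pyarrow is detected.

```python
import importlib


def check_imports(names=("turbodbc", "turbodbc_arrow_support")):
    """Try to import each module and return {name: error message or None}."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # Covers both a missing module and a shared library
            # that cannot be loaded.
            results[name] = str(exc)
    return results
```

If every value in the returned dictionary is `None`, the final image should be able to serve Arrow results; any other value carries the import error text for debugging.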