Conda build 2.0 has just been released. This marks an important evolution towards much greater test coverage and a stable API. With this release, it’s a great time to revisit some of conda-build’s features and best practices.
Quick Recap of Conda Build
Fundamentally, conda build is a tool to help you package your software so that it is easy to distribute using the conda package manager. Conda build takes input in the form of YAML files and shell/batch scripts (“recipes”) and outputs conda packages. It also includes utilities for quickly generating recipes from external repositories such as PyPI, CPAN, or CRAN.

During each build, conda build goes through four phases: rendering, building, post-processing/packaging, and testing. Rendering takes your meta.yaml file, fills in any Jinja2 templates, and applies selectors. The end result is a Python object of the MetaData class, as defined in metadata.py. Source code for your package may be downloaded during rendering if it is needed to render the recipe (for example, if the version is obtained from the source rather than provided in meta.yaml).

The build step creates the build environment (also called the build “prefix”) and runs the build.sh (Linux/Mac) or bld.bat (Windows) script. Post-processing looks at which files in the build prefix are new, that is, files that were not there when the build prefix was created. These are the files that are packaged into the .tar.bz2 file. Other inspection tasks, such as detecting files containing prefixes that need replacement at install time, are also done in the post-processing phase. Finally, the test phase creates a test environment, installs the newly created package, and runs any tests that you specify, either in meta.yaml or in run_test.bat (Windows), run_test.sh (Linux, Mac), or run_test.py (all platforms).
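The post-processing step boils down to comparing the contents of the build prefix before and after the build script runs. The minimal Python sketch below illustrates that idea with a set difference; it is not conda build's actual code, and the helper names snapshot and new_files are hypothetical.

```python
import tempfile
from pathlib import Path


def snapshot(prefix):
    """Return the set of file paths (relative to prefix) currently under prefix."""
    root = Path(prefix)
    return {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() or p.is_symlink()
    }


def new_files(before, prefix):
    """Return, sorted, the files under prefix that were not in the earlier snapshot."""
    return sorted(snapshot(prefix) - before)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as prefix:
        # Stand-in for files installed from the build requirements.
        Path(prefix, "lib").mkdir()
        Path(prefix, "lib", "libdep.so").write_text("dependency")
        before = snapshot(prefix)

        # Stand-in for files that build.sh installs for the package itself.
        Path(prefix, "bin").mkdir()
        Path(prefix, "bin", "mytool").write_text("#!/bin/sh\n")

        print(new_files(before, prefix))  # ['bin/mytool']
```

Only the files that appear in the second snapshot but not the first would end up in the package.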
Meta.yaml
Meta.yaml is the core of any conda recipe. It describes the package’s name, version, source location, and build/test environment specifications. Full documentation on meta.yaml is at http://conda.pydata.org/docs/building/meta-yaml.html.
Let’s step through the options available to you. We’ll mention Jinja2 templating and selectors a few times. If you’re not familiar with these, just ignore them for now; both are described in greater detail later in this article.
Software Sources
Conda build will happily obtain source code from local filesystems, http/https URLs, git repositories, mercurial repositories, and subversion repositories. Syntax for each of these is described at http://conda.pydata.org/docs/building/meta-yaml.html#source-section.
Presently, Jinja2 template variables are populated only for git and mercurial repositories. These are described at http://conda.pydata.org/docs/building/environment-vars.html. Future work will add Jinja2 template variables for the remaining version control systems.
As a general guideline, use tarballs (http/https URLs) with hashes (preferably SHA-256) where available. Version control system (VCS) tags can be moved to other commits, which makes packages built from tags less reliably repeatable. If no tarball is available, pinning to a VCS commit hash is also highly repeatable. Finally, with tarballs, it is better to paste a hash provided by your download site than to compute it yourself. If the download site does not provide one, you can compute a hash with openssl. Openssl is a requirement of Miniconda, so it is already installed.
openssl dgst -sha256 <path to file>
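If you would rather compute the hash from Python, the standard library's hashlib produces the same SHA-256 hex digest as the openssl command above, and that string is what goes into the sha256 key of the source section. This is a small sketch; the function name sha256_of is our own.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, sha256_of("package-1.0.0.tar.gz") returns a 64-character hex string you can paste into meta.yaml.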
Build Options
The “build” section of meta.yaml contains settings that control how conda build builds and packages your recipe. Here you can skip certain platforms, control prefix replacement, exclude the recipe from being packaged, add entry points, and more.
Requirements
In the requirements section, you list the conda packages that must be installed before building your package (build) and before running it (run). It is important to list all of your build requirements here, because conda build does not allow you to download requirements using pip. This restriction makes builds easier to reproduce. If dependencies are missing and pip tries to install them, you will see a traceback.
When you need a particular version of a package, you can apply version constraints to its specification. This is often called “pinning.” There are three kinds of pinning: exact, “globbing,” and boolean logic. Each pin is written as an additional string after the package name in meta.yaml, separated by a space. For example:
```yaml
requirements:
  build:
    - python 2.7.12
```
For exact pinning, you specify the exact version you want. Use this sparingly, as it can quickly make your package over-constrained and hard to install. Globbing uses the * character to allow any sub-version to be installed. For example, with semantic versioning, you could allow bug fix releases with a version such as 1.2.*, which accepts 1.2.x releases but no new major or minor releases. Not all packages use semantic versioning, though. Finally, boolean expressions of versions are valid. To allow a range of versions, you can use a pin such as >=1.6.21,<1.7, where the comma means both conditions must hold.
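To make the three kinds of pinning concrete, here is a deliberately simplified matcher. It is a sketch under stated assumptions, not conda's real version matching: it only understands purely numeric, dot-separated versions and comma-separated (AND) clauses, and the function name matches is hypothetical.

```python
import fnmatch
import operator

# Two-character operators come first so ">=" is not mistaken for ">".
_OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    ("!=", operator.ne),
    (">", operator.gt),
    ("<", operator.lt),
]


def _padded(a, b):
    """Parse two dotted numeric versions and zero-pad them to equal length."""
    ta = tuple(int(x) for x in a.split("."))
    tb = tuple(int(x) for x in b.split("."))
    width = max(len(ta), len(tb))
    return ta + (0,) * (width - len(ta)), tb + (0,) * (width - len(tb))


def matches(version, spec):
    """Return True if version satisfies every comma-separated clause in spec."""
    for clause in (c.strip() for c in spec.split(",")):
        for symbol, op in _OPS:
            if clause.startswith(symbol):
                # Boolean logic: compare against the bound after the operator.
                if not op(*_padded(version, clause[len(symbol):])):
                    return False
                break
        else:
            if "*" in clause:
                # Globbing: "1.2.*" accepts 1.2.0, 1.2.7, ... but not 1.3.0.
                if not fnmatch.fnmatchcase(version, clause):
                    return False
            else:
                # Exact pin: the versions must be equal.
                left, right = _padded(version, clause)
                if left != right:
                    return False
    return True
```

With this sketch, matches("1.6.25", ">=1.6.21,<1.7") is true, while matches("1.7.0", ">=1.6.21,<1.7") is false because the upper bound excludes 1.7.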
Some packages need to be specified in a special way. For example, packages that compile against NumPy’s C API need the same version of NumPy at runtime that was used at build time. If your package uses NumPy via Cython, or if any part of your extension code includes NumPy’s C headers, then this probably applies to you. The special syntax for NumPy is:
```yaml
requirements:
  build:
    - numpy x.x
  run:
    - numpy x.x
```
There is a lot of discussion around extending this to other packages, because it is common with compiled code to have build time versions determine runtime compatibility. This discussion is active at https://github.com/conda/conda-build/issues/1142 and is slated for the next major conda-build release.
The build string, the little bit of text in your output package name such as np110py27, is determined by default by the contents of your run requirements. You can set the build string manually in meta.yaml, but doing so disables conda build’s automatic generation of it.
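As a rough illustration of how a tag like np110py27 can be derived from run requirements, the hypothetical function below abbreviates the major and minor versions of numpy and python. It is not conda build's implementation: it assumes concrete version strings (not x.x), considers only these two packages, and assumes the build number is appended after an underscore.

```python
def default_build_string(run_requirements, build_number=0):
    """Hypothetical sketch of an 'np110py27_0'-style build string.

    Only numpy and python are considered; conda build's real rules differ in detail.
    """
    tags = []
    for name, abbrev in (("numpy", "np"), ("python", "py")):
        for req in run_requirements:
            fields = req.split()
            if len(fields) > 1 and fields[0] == name:
                # Keep only the major and minor version: "1.10.4" -> "110".
                tags.append(abbrev + "".join(fields[1].split(".")[:2]))
                break
    prefix = "".join(tags)
    # With no numpy/python pins, fall back to just the build number.
    return f"{prefix}_{build_number}" if prefix else str(build_number)
```

For example, default_build_string(["python 2.7.12", "numpy 1.10.4"]) gives "np110py27_0".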
Test
By default, tests run automatically after the package is built. If the tests fail, the package is moved into the “broken” folder rather than the normal output folder for your platform.
Tests have been confusing for many people for some time: if your package did not include the test files, it was difficult to get your tests to run. Conda build 2.0 adds a new key to the test section, “source_files,” which accepts a list of files and/or folders that will be copied from your source folder into your test folder at test time. Entries are expanded with Python’s glob module, so any glob pattern will work.
```yaml
test:
  source_files:
    - tests
    - some_important_test_file.txt
    - data/*.h5
```
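The copying that source_files describes can be sketched with the standard glob and shutil modules. This is a simplified illustration, not conda build's code; the function name copy_source_files and its arguments are hypothetical.

```python
import glob
import os
import shutil


def copy_source_files(patterns, source_dir, test_dir):
    """Copy files/folders matching glob patterns from source_dir into test_dir.

    Returns the sorted list of copied paths, relative to source_dir.
    """
    copied = []
    for pattern in patterns:
        for match in glob.glob(os.path.join(source_dir, pattern)):
            rel = os.path.relpath(match, source_dir)
            dest = os.path.join(test_dir, rel)
            # Recreate intermediate folders such as data/ in the test folder.
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            if os.path.isdir(match):
                shutil.copytree(match, dest, dirs_exist_ok=True)
            else:
                shutil.copy2(match, dest)
            copied.append(rel)
    return sorted(copied)
```

Called with the three entries from the recipe above, a whole tests folder, a single file, and only the .h5 files under data would be copied; other files in data are left behind.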
Selectors
Selectors let you include a line of your meta.yaml file only under certain conditions. A selector is a comment at the end of a line, such as # [win], and the line is kept only when the expression in brackets is true. Selectors exist for Python version, platform, and architecture. Selectors are parsed and applied after Jinja2 templates, so you may use Jinja2 templates for more dynamic selectors. The full list of available selectors is at http://conda.pydata.org/docs/building/meta-yaml.html#preprocessing-selectors.
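The selector mechanism can be sketched as a line filter: find a trailing # [expr] comment, evaluate expr against a namespace of true/false variables, and keep or drop the line. The sketch below is a simplification under that assumption; conda build defines its own namespace (win, linux, osx, py27 and so on) and handles more edge cases, and apply_selectors is a hypothetical name.

```python
import re

# Matches a trailing selector comment such as "  # [win]" or "# [py27 and linux]".
_SELECTOR = re.compile(r"^(?P<content>.*?)\s*#\s*\[(?P<expr>.+)\]\s*$")


def apply_selectors(text, namespace):
    """Keep lines whose selector evaluates true in namespace; strip the selector comment."""
    kept = []
    for line in text.splitlines():
        m = _SELECTOR.match(line)
        if m is None:
            kept.append(line)  # no selector: always kept
        elif eval(m.group("expr"), {"__builtins__": {}}, namespace):
            kept.append(m.group("content"))
    return "\n".join(kept)
```

On a Linux machine, a namespace such as {"win": False, "linux": True, "osx": False, "py27": True} would drop a line ending in # [win] and keep one ending in # [linux].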
Jinja Templating
Templates are not a new feature, but they are not always well understood. Templates are placeholders that are dynamically filled with content when your recipe is loaded by conda build. They are heavily used at conda-forge, where they make updating recipes easier:
```yaml
{% set version = "1.0.0" %}

package:
  name: my_test_package
  version: {{ version }}

source:
  url: http://some.url/package-{{ version }}.tar.gz
```
Using templates this way means that you only have to change the version once, and it applies everywhere it is used. Conda build also makes some Python functions available inside templates, which lets you do interesting things such as getting the version from a setup.py file:
```yaml
{% set data = load_setup_py_data() %}

package:
  name: conda-build-test-source-setup-py-data
  version: {{ data.get('version') }}

# Source will be downloaded prior to filling in Jinja templates.
# This example assumes that this folder has setup.py in it.
source:
  path: ../
```
The Python code that actually reads the setup.py file (load_setup_py_data) is part of conda build (jinja_context.py). Presently, there is no extension scheme for adding your own functions. That will be part of future work, so that users can customize their recipes with their own Python functions.
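For comparison, here is a static way to pull a version out of setup.py with Python's ast module, without running the file. This is a different technique from a helper that executes setup.py, and it only works when version is a plain string literal in the setup() call; the function name read_setup_version is hypothetical.

```python
import ast


def read_setup_version(setup_py_source):
    """Return the literal 'version' keyword passed to setup(...), or None."""
    tree = ast.parse(setup_py_source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both setup(...) and setuptools.setup(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "setup":
            continue
        for kw in node.keywords:
            if kw.arg == "version" and isinstance(kw.value, ast.Constant):
                return kw.value.value
    return None
```

If the version is computed at runtime (for example version=get_version()), this sketch returns None, which is exactly the case where executing setup.py is the more robust choice.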
Binary Prefix Length
A somewhat esoteric aspect of relocatability is that binaries on Linux and Mac have prefix strings embedded in them that tell the binary where to look for shared libraries. At build time, conda build detects these prefixes and makes a note of where they are. At install time, conda uses that list to replace those prefixes with the appropriate prefix for the new environment that it is installing into. Because each replacement has to fit into the space occupied by the original string, the new environment’s prefix cannot be longer than the prefix embedded at build time. Historically, the length of these embedded prefixes has been 80 characters. Conda build 2.0 increases this length to 255 characters. Unfortunately, to fully take advantage of this change, all packages that would be installed into an environment need to have been built by conda build 2.0 with the longer prefix. In practice, this means rebuilding many of the lower-level dependencies. To aid in this effort, conda build has added a tool:
conda inspect prefix-lengths <package path> [more packages] [--min-prefix-length <value, default 255>]
More concretely:
conda inspect prefix-lengths ~/miniconda2/pkgs/*.tar.bz2
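The length limit follows from how binary prefix replacement is commonly done: the file must keep its exact size, so the shorter install prefix is written in place and the leftover bytes of the C string are filled with NUL padding. The sketch below shows that null-padding technique together with build-time detection of prefix offsets; the function names are hypothetical and conda's real implementation may differ in detail.

```python
def find_prefix_offsets(data, prefix):
    """Build-time detection: return every byte offset at which prefix occurs in data."""
    offsets = []
    i = data.find(prefix)
    while i != -1:
        offsets.append(i)
        i = data.find(prefix, i + 1)
    return offsets


def replace_binary_prefix(data, old_prefix, new_prefix):
    """Install-time replacement of old_prefix inside NUL-terminated strings in data.

    The file size must stay the same, so the rest of each affected string is
    shifted left and the freed bytes are filled with NUL padding.
    """
    if len(new_prefix) > len(old_prefix):
        raise ValueError("new prefix is longer than the embedded prefix")
    out = bytearray(data)
    i = out.find(old_prefix)
    while i != -1:
        end = out.find(b"\0", i)  # end of this C string
        if end == -1:
            end = len(out)
        original = bytes(out[i:end])
        replaced = original.replace(old_prefix, new_prefix)
        # Same-length slice assignment keeps every later offset unchanged.
        out[i:end] = replaced + b"\0" * (len(original) - len(replaced))
        i = out.find(old_prefix, end)
    return bytes(out)
```

A longer embedded prefix simply leaves more room, which is why raising the length from 80 to 255 characters allows installation into deeper directory paths.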
This is presently not relevant to Windows, though conda build does now record binary prefixes on Windows, especially for pip-created entry point executables, so that they can function correctly. These entry point executables consist of a program, the prefix, and the entry point script all rolled into a single executable. The prefix length does not matter, because the binary can simply be recreated with any arbitrary prefix by concatenating the pieces together.
Conda Build API
Finally, the other large feature of conda build 2.0 is a new public API. This is a promise to our users that these interfaces will not change without a bump to the major version number. It is also an opportunity to divide the command line interface into smaller, more testable chunks. The CLI is still available, and users now have the API as an alternative with stronger stability guarantees. The full API is at https://github.com/conda/conda-build/blob/master/conda_build/api.py.
The following table maps CLI commands to the corresponding Python API functions:
| CLI command | Python API functions |
|---|---|
| conda build | api.build |
| conda build --output | api.get_output_file_path |
| conda render | api.output_yaml |
| conda sign | api.sign, api.verify, api.keygen, api.import_sign_key |
| conda skeleton | api.skeletonize, api.list_skeletons |
| conda develop | api.develop |
| conda inspect | api.test_installable, api.inspect_linkages, api.inspect_objects, api.inspect_prefix_length |
| conda index | api.update_index |
| conda metapackage | api.create_metapackage |
Implementation Details of Potential Interest
Non-global Config: conda build 1.x used a global instance of the conda_build.config.Config class. This has been replaced by passing a local Config instance through all function calls. This allows more direct customization of API calls and removes the need to create argparse Namespace objects to interact with conda build.
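The difference between a global config and a passed-in config can be shown with a toy example. Everything below is hypothetical (BuildConfig and build are not conda build's classes or functions); it only illustrates why per-call configuration makes calls independent of each other.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BuildConfig:
    """Hypothetical stand-in for a build configuration (not conda_build.config.Config)."""
    croot: str = "conda-bld"
    python: str = "3.5"
    dirty: bool = False


def build(recipe, config=None):
    """Each call receives its own config object instead of reading module-level state."""
    config = config if config is not None else BuildConfig()
    return f"building {recipe} in {config.croot} for python {config.python}"


base = BuildConfig()
py27 = replace(base, python="2.7")  # derive a variant without mutating base
```

Because nothing is stored globally, build("myrecipe", py27) and build("myrecipe") can run side by side without one changing the other's settings.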
Build id and build folder: conda build 1.x stored build environments alongside other conda environments, and stored the build “work” folder and the test work (test_tmp) folder in the conda-bld folder (by default). Conda build 2.0 assigns a build id to each build, consisting of the recipe name joined with the number of milliseconds since the epoch. Name collisions are theoretically possible but unlikely, because two builds of the same recipe would have to start in the same millisecond. Both the environments and the work folders have moved into folders named with the build id. Each build is thus self-contained, and multiple builds can run at once (in separate processes). The monotonically increasing build ids make it easy to find and reuse source with the “--dirty” build option.
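The build id scheme is simple enough to sketch in a few lines. The underscore separator and the helper names below are assumptions for illustration; only the “work” and “test_tmp” folder names come from the description above, and the environment folders are omitted.

```python
import os
import time


def make_build_id(recipe_name, now=None):
    """Recipe name joined with milliseconds since the epoch (underscore separator assumed)."""
    millis = int((time.time() if now is None else now) * 1000)
    return f"{recipe_name}_{millis}"


def build_folders(croot, build_id):
    """Hypothetical layout: every per-build folder lives under croot/<build_id>."""
    base = os.path.join(croot, build_id)
    return {
        "work": os.path.join(base, "work"),
        "test_tmp": os.path.join(base, "test_tmp"),
    }
```

Since each build writes only beneath its own build id folder, two processes building at the same time do not share work or test folders.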