Production Docker Image for Apache Airflow
Intro
What is this talk NOT about?
- Basic container image knowledge
- Details of the CI container image of Airflow
- Details of how Kubernetes and Airflow integrate
- “Airflow on Kubernetes” by Michael Hewitt
- Details on deploying Airflow with the image
Who is the talk for?
- You want to deploy Airflow using container images
- You want to contribute to Airflow in the DevOps area
- You want to learn about best practices for using Airflow containers
- You are a curious person who wants to learn something new
What is a container?
- Standard unit of software
- Packages code and its dependencies
- Lightweight execution package of software
- Container images - binary packages
Container != Docker
- Docker is a command line tool
- Building, Running, Sharing containers
- Docker Engine runs containers
- DockerHub.com is a popular container registry
Context: What is a Container file?
FROM ubuntu:18.04
COPY . /app
RUN make /app && make install
WORKDIR /bin/project
ENTRYPOINT ["/bin/project"]
CMD ["--help"]
- Specify base image
- Run commands
- Copy files
- Set working directory
- Define entrypoint
- Define default command
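How the entrypoint and the default command combine is easiest to see with a tiny image. The sketch below writes a hypothetical Container file in which the entrypoint simply echoes the arguments it receives. The `entrypoint-demo` directory, the `echo` entrypoint and the `default-arg` command are illustrative and not taken from the talk.

```shell
# Create a throwaway build context (directory name is hypothetical).
mkdir -p entrypoint-demo

# ENTRYPOINT is the fixed executable; CMD supplies its default arguments.
cat > entrypoint-demo/Dockerfile <<'EOF'
FROM ubuntu:18.04
ENTRYPOINT ["/bin/echo", "entrypoint received:"]
CMD ["default-arg"]
EOF
```

After `docker build entrypoint-demo -t entrypoint-demo`, running `docker run --rm entrypoint-demo` should print `entrypoint received: default-arg`. Running `docker run --rm entrypoint-demo other-arg` replaces only the CMD part and should print `entrypoint received: other-arg`.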
Why are containers important?
- Predictable, consistent development & test environment
- Predictable, consistent execution environment
- Lightweight but isolated: sandboxed view of the OS isolated from others
- Build once: run anywhere
- Kubernetes runs containers natively
- Bridge: “Development → Operations”
Internals
Features of the production image file
- Builds an optimised image
- Highly customizable (ARGs)
- Multi-segmented (build + main)
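The "ARGs" and "build + main" ideas can be sketched with a generic two-segment Container file. This is not the Airflow production Dockerfile. The `multistage-demo` directory, the `PIP_PACKAGES` argument and the default values are hypothetical, chosen only to show the pattern of compiling in one segment and copying the result into a clean one.

```shell
# Create a throwaway build context (directory name is hypothetical).
mkdir -p multistage-demo

# Quoted 'EOF' keeps ${...} literal so Docker, not the shell, expands it.
cat > multistage-demo/Dockerfile <<'EOF'
# An ARG before the first FROM can parameterize the base image of each segment.
ARG PYTHON_BASE_IMAGE="python:3.8-slim"

# "build" segment: has compilers, installs Python packages into /root/.local
FROM ${PYTHON_BASE_IMAGE} AS build
ARG PIP_PACKAGES="numpy"
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --user --no-cache-dir ${PIP_PACKAGES}

# "main" segment: starts clean and copies only the installed packages
FROM ${PYTHON_BASE_IMAGE} AS main
COPY --from=build /root/.local /root/.local
ENV PATH="/root/.local/bin:${PATH}"
CMD ["python", "-m", "pip", "list", "--user"]
EOF
```

Building with `docker build multistage-demo --build-arg PIP_PACKAGES="numpy pandas" -t multistage-demo` would override the default package list. The compilers stay in the build segment, so the final image carries only the runtime and the installed packages.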
Usage
Extending Airflow image - use released image
docker build . -t yourcompany/airflow:1.10.11-BUILD_ID
FROM apache/airflow:1.10.11
# change to root user temporarily
USER root
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        emacs \
    && apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# Change back to the airflow user
USER airflow
# Add extra dependencies
RUN pip install --user numpy
# Embed DAGs (optional) - DAGs can be baked into the image, but they
# can also be git-synced or mounted from a shared volume
COPY --chown=airflow:root dags-folder ${AIRFLOW_HOME}/dags/
Extending image - Pros & Cons
Pros
- Uses released images
- Simple build command
- Own Dockerfile
- No need for Airflow sources
Cons
- Potentially bigger size
- Predefined extras only
- Installs a limited set of Python dependencies
Customizing Airflow image - default docker build
Customizing Airflow image - use build args
- Installs Airflow 1.10.11 from PyPI
- Additional Airflow extras, dev, runtime deps …
- Does not use local sources (can be run from master including entrypoint)
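A build along these lines might look like the dry run below, run from a checkout of the Airflow repository. The build-arg names and values are assumptions based on the Airflow repository documentation of that period, and the extras, the extra dependency and the image tag are illustrative. Check the `ARG` lines of the Dockerfile you actually build from before relying on any of them.

```shell
# Build-arg names below are assumptions; confirm them against the ARG lines
# of the Dockerfile in your Airflow checkout.
set -- \
  --build-arg AIRFLOW_INSTALL_SOURCES="apache-airflow" \
  --build-arg AIRFLOW_INSTALL_VERSION="==1.10.11" \
  --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-1-10" \
  --build-arg AIRFLOW_SOURCES_FROM="empty" \
  --build-arg AIRFLOW_SOURCES_TO="/empty" \
  --build-arg ADDITIONAL_AIRFLOW_EXTRAS="mssql,hdfs" \
  --build-arg ADDITIONAL_PYTHON_DEPS="numpy"

# Dry run: print the command. Drop the leading `echo` to actually build.
echo docker build . "$@" --tag yourcompany/airflow:1.10.11-custom
```

Keeping the arguments in one list makes it easy to review what differs from the default build before spending time on the build itself.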
It's a Breeze to build images
- Breeze - development and test environment
- Supports building production image
- Auto-complete of options
- New Breeze video showing building production images:
./breeze build-image --help
See BREEZE.rst in the Airflow repo

