Best practices for writing Dockerfiles

I chanced upon a nice article covering the use of RUN, CMD and ENTRYPOINT by Jay Schmidt while revisiting the use of Docker. Thus, I have condensed my understanding from the article into a summary of key takeaways, lessons, best practices and personal rules.

NOTE

I am not covering best practices for Docker instruction ordering, layering or image size optimization in this post as I only focus on RUN, ENTRYPOINT and CMD. They should also be considered for advanced users seeking optimal Docker image building.

Difficulties and Challenges

The choice of using RUN, CMD and ENTRYPOINT has always been a constant source of confusion for newcomers to Docker – including me. There have been attempts to explain and clarify their usage – most of which have failed – and they involve careful reading due to the way these instructions has been designed.

To add to the confusion, there is a related issue of having to choose between shell and exec form of writing these instructions, which doubles the potential combination of ways to express an identical idea.

Plus, the lack of consistency between Dockerfile writers and Docker image creators has resulted in some using CMD over ENTRYPOINT. While both may work under certain circumstances, many are left confused which one to choose by default, and why and when one is more appropriate than the other.

This valuable article brings clarity to these issues with an in-depth discussion. A careful reading is recommended if one can spare the time to do so.

Best Practices

Below are the rules-of-thumb to quickly guide me in deciding when to use RUN, CMD or ENTRYPOINT, and when to choose the shell or exec form when writing a Dockerfile.

Let me stress that it is a matter of personal preference and self-discipline when it comes to adopting them.

Rule #1: For running commands to customize a Docker image during build time, use RUN.
Rule #2: For starting a program or script as the main and sole process with PID 1, always use ENTRYPOINT.
Rule #3: For optional command line argument(s) that we expect the user may wish to add or override during runtime, use CMD.
Rule #4: Use shell form only when the command involves shell variable expansion (e.g. for wildcard and name globbing), references to environment or shell variables (e.g. $HOME, $PWD, $USER, $PATH), or piping or redirection (e.g. |, >, >>).
Rule #5: Use exec form by default to run without a subshell, especially for ENTRYPOINT instruction. This ensures the main process is PID 1 that handles signals properly from the Docker host.

Common Mistakes or Bad Practices

Observation #1: Using `CMD` for starting process instead of `ENTRYPOINT`

While it may work, the command can and may be accidentally overridden by users leading to mysterious error from a missing executable or script. The path and name of the executable or script has to be specified by user if overridden, which in all likelihood, a Docker image user would not know unless one digs into the Dockerfile to figure out. For example, when doing a docker run, it has to be run like:

docker run <image> <path to executable/script> [arg1, arg2, ...]

instead of just:

docker run <image> [arg1, arg2, ...]

if ENTRYPOINT is used instead.

Observation #2: Using `shell` form over `exec` form

While it may be convenient for converting shell commands without having to enclose them in brackets and quotes in the exec form, instructions in shell form are run in a subshell that does not handle signals (e.g. SIGTERM)from Docker host properly for a clean and graceful process termination.

Processing Logic in Pseudocode

Deep within Docker, I guess the logic looks like this when it comes to parsing and interpreting the instructions. I have chosen to write in pseudocode for the sake of brevity simply to demonstrate the logic behind it.

string command = ""

if running Docker build:
  if instruction is "RUN":
    if following string tokens are in shell form:
      prepend subshell "/bin/sh -c" to command
      
else if running Docker run:
  
  if instruction is "CMD":
    if following string tokens are in shell form:
      prepend subshell "/bin/sh -c" to command
    append following string tokens to command 
    
  if instruction is "ENTRYPOINT":
    if following string tokens are in shell form:
      prepend subshell "/bin/sh -c" to command
    insert following string token in command before other cmd
    
execute command

Discussion and Recommendations

So the fact that using CMD works is more of a hack or side-effect rather than its intended proper usage by design. However, given that today many Dockerfiles are misusing it, it is a problem that Docker does not wish to correct by enforcing proper use of CMD and ENTRYPOINT. Doing so would probably break many existing Docker images. So, my suggestion would be to start using them in the right way instead, especially when learning to use Docker.

Nobody would stop one from always using CMD and the shell form because they look easy and familiar. The Docker containers would still run and not handling signals may not cause any problem in some situations.

But learning to do things the right way by design from the beginning is always better in my opinion. Feel free to disagree with me, no offence intended. After all, everyone is entitled to their own opinions.

NOTE