There are many benefits to contributing to a popular open source project like Docker:
- You earn recognition for improving a project used by many people.
- You get to collaborate with other amazingly smart people in the open source community.
- You become a better programmer yourself through the process of understanding and improving an important system.
But getting started on a new codebase can be daunting. Docker has many, many lines of code. Fixing even the smallest issue can require reading through a lot of that code and understanding how the pieces all fit together.
Still, it's not as difficult as you might think. You can follow Docker's contributor guide to get a development environment set up. Then follow these five simple steps for diving into a new codebase (with interactive code snippets to guide you along the way). The skills you hone doing so will come in handy on every new project you encounter over the course of your programming life. So what are you waiting for? Here they are:
Step 1: Start at 'func main()'
Start with what you know, as the old saying goes. If you're like most Docker users, you probably mainly use the Docker CLI. So let's start with the entry point into that program: the 'main' function.
For the remainder of this post, we'll use a site called Sourcegraph, which the Docker team uses to search and browse code on the web as you would in an intelligent IDE. To follow along, it may be easiest to open a second browser window to Sourcegraph and hop back and forth between that and this post.
On Sourcegraph, let's search for 'func main()' inside the Docker repository.
We're looking for the 'main' function corresponding to the 'docker' command, which is the one in the 'docker/docker/docker.go' file. Clicking on that search result, we jump to its definition (shown below). Take some time to read through this function:
At the top of the 'main' function, we see a lot of code related to setting up logging, reading command flags, and initializing defaults. At the bottom, we find a call to 'client.NewDockerCli', which seems to be responsible for creating the struct whose methods do all the actual work. Let's issue a search query for 'NewDockerCli'.
Step 2: Get to the core
In many applications and libraries, there are one or two key interfaces that describe the core functionality or essence. Let's try to get there from where we are now.
Clicking on the 'NewDockerCli' search result, we arrive at the definition of the function. Since what we're interested in is the struct that the function returns, 'DockerCli', let's click on the return type to jump to its definition.
Clicking on 'DockerCli' brings us to its definition. Scrolling down through this file, we see its methods, 'getMethod', 'Cmd', 'Subcmd', and 'LoadConfigFile'. 'Cmd' looks noteworthy. It's the only method with a docstring, and the docstring suggests that it's the core method for executing each Docker command.
Step 3: Dive deep
Now that we've found 'DockerCli', the core "controller" of the Docker client, let's dive into how one of the specific Docker commands works. Let's zoom in on 'docker build'.
Reading the implementation of 'DockerCli.Cmd' shows that it calls 'DockerCli.getMethod' to invoke the function corresponding to each Docker command.
In 'DockerCli.getMethod', we see that this is accomplished by the dynamic invocation of a method whose name is the string '"Cmd"' prepended to the name of the Docker command. So in the case of 'docker build', we're looking for 'DockerCli.CmdBuild'. No such method is defined in this file, so let's search for 'CmdBuild'.
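To make the dynamic invocation concrete, here is a minimal sketch of name-based dispatch using Go's 'reflect' package. It is a simplified illustration, not a copy of Docker's source: the 'Cli' type, the 'methodName' helper, and the handler bodies are hypothetical, and the real 'getMethod' may differ in its signature and details.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Cli is a hypothetical stand-in for DockerCli.
type Cli struct{}

// CmdBuild and CmdVersion are example handlers. They must be exported,
// because reflection can only look up exported methods by name.
func (c *Cli) CmdBuild(args ...string) error {
	fmt.Println("build called with", args)
	return nil
}

func (c *Cli) CmdVersion(args ...string) error {
	fmt.Println("version called")
	return nil
}

// methodName turns a command such as "build" into "CmdBuild".
func methodName(command string) string {
	if command == "" {
		return ""
	}
	return "Cmd" + strings.ToUpper(command[:1]) + strings.ToLower(command[1:])
}

// getMethod looks up the handler for a command by name at run time.
func (c *Cli) getMethod(command string) (func(...string) error, bool) {
	name := methodName(command)
	if name == "" {
		return nil, false
	}
	m := reflect.ValueOf(c).MethodByName(name)
	if !m.IsValid() {
		return nil, false // no method with that name
	}
	fn, ok := m.Interface().(func(...string) error)
	return fn, ok
}

// Cmd dispatches the first argument to its handler and passes on the rest.
func (c *Cli) Cmd(args ...string) error {
	if len(args) == 0 {
		return fmt.Errorf("no command given")
	}
	fn, ok := c.getMethod(args[0])
	if !ok {
		return fmt.Errorf("unknown command %q", args[0])
	}
	return fn(args[1:]...)
}

func main() {
	cli := &Cli{}
	if err := cli.Cmd("build", "."); err != nil {
		fmt.Println("error:", err)
	}
	if err := cli.Cmd("nope"); err != nil {
		fmt.Println("error:", err)
	}
}
```

One consequence of this design is that a handler's name never appears next to the code that calls it, which is why a text search for the method name is the quickest way to find it.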
Indeed, the search results show there is a method 'CmdBuild' on 'DockerCli', so let's select the result to jump to its definition. The 'DockerCli.CmdBuild' method body is rather long to inline in this blog post, but here it is for reference.
There's a lot going on here. At the top of the method, we see code dealing with a variety of input methods for the Dockerfile and configuration. Oftentimes, a good strategy for reading through a long method is to work backwards. Start at the bottom and look at what the method does at the very end. In many cases, that's the meat of the method and everything before is just setup for completing that core action.
At the bottom of 'CmdBuild', we see a 'POST' request made via 'cli.stream'. Jumping through a few more definitions, we arrive at 'DockerCli.clientRequest', which constructs an HTTP request that contains the information you pass to Docker via 'docker build'. So at the end of the day, all 'docker build' does is issue a fancy 'POST' request to the Docker daemon. You could try replicating its behavior with 'curl' if you really wanted.
Now that we've understood a single Docker client command through and through, you might be interested in diving deeper still and finding where the daemon receives the request and following it all the way down to its interaction with LXC and the kernel. That's certainly a valid route, but we leave that for now as an exercise for the reader. Instead, let's get a broader understanding of the key components of the client.
Step 4: Look at usage examples
One way to better understand a piece of code is to look at examples of how it's used. Let's go back to the 'DockerCli.clientRequest' method. In the right-hand side panel on Sourcegraph, we can page through usage examples of this method. It turns out this method is used in multiple places, since most of the Docker client commands result in HTTP requests issued to the daemon.
In order to fully understand a piece of code, you need to understand both how it works and how it's used. Jumping to definition lets us understand the former by walking forward along the graph of code, while looking at usage examples covers the latter by walking backwards.
Try this out for a few more functions and methods to understand how they're interconnected. If it's helpful, draw a picture of how various components of the application interact with one another.
Step 5: Select an issue and start coding!
Now that you have a decent picture of the Docker codebase as a whole, take a look at the issue tracker to see what needs working on, and reach out to members of the Docker community with questions you aren't able to answer yourself. Because you've taken the time to explore and understand the code, you'll be better equipped to ask smart questions and know where specific issues fit into the broader picture.
And if you feel up for it, take notes along the way, document your experience, and write it up as a blog post like this one. The Docker team would love to hear about your experience diving into their code.
Contributing effectively
One of the barriers that often keeps people from getting involved in a project is the daunting prospect of jumping into a large, foreign codebase. As programmers, we tend to assume that the hard work lies in writing code, but reading and understanding other people's code is often the critical first step. Recognizing that and approaching the task in a principled way, armed with good tools for doing so, will help you conquer the psychological barrier of diving into the code.
So make the leap and check out Docker's source today. A vibrant open source community and codebase await you!