Developing Ubuntu using git

git ubuntu illustration

Back in 2014, I published some information and tooling on using git for Ubuntu development, even though most Ubuntu development wasn’t done with git at the time.

Three years on, this work has expanded significantly. Most of the server team is using git for daily work when touching Ubuntu packaging. We have expanded our tooling. With the significant interest we’ve received, we’re now interested in developing this work to make git become the way of working with Ubuntu’s source code. Our plan is to do this with no disruption to existing developer workflows.

This post is part of a blog series. Here’s an index of all our planned posts. We’ll amend this index and update links as we go.

Developing Ubuntu using git (this post)
git ubuntu clone
More on the imported repositories
Available branches
History and parenting
Repository objects
Rich history
Wrapper subcommands

Why is this so hard?

Most Free Software development projects already use git. So why has Ubuntu taken so long?

Unlike most software projects, Ubuntu (like other distributions) derives its sources from upstreams, rather than being the originator of the software. This means that we do not “control” the git repositories. Repository elements such as branch names, tag names, branching policies and so forth are not up to us; and nor is the choice to use git in the first place.

For git use in Ubuntu development to be effective, we need these repository elements to follow the same schemes across all our packages. But upstream projects use different schemes for their branches, tags and merge workflows. So our task isn’t as trivial as just adopting upstreams’ git trees.

Existing packaging repositories

While Ubuntu makes key changes to packages as needed, the long tail of packages that we ship are derived from Debian with no changes. Debian package maintainers use the VCS system of their choice, which may be nothing, git, or something else like Mercurial or Bazaar. These repositories are arranged in the manner of the package maintainers’ choices. They may be derived from their upstreams’ respective repositories, or they may instead be based on wholesale upstream code “import” commits done when the packaging is updated to a latest upstream release.

Right now, 68% of source packages in Ubuntu are listed as having their packaging maintained in git (whether in Debian or Ubuntu):

Note: the data I have used cannot tell us the difference between a package not being maintained in a VCS and the package maintainer not having declared in metadata that a particular VCS is in use.

Choices for Ubuntu and git

We’re not in a position to mandate that everyone uses git. Even if we did do that for Ubuntu, we cannot expect mandate it in Debian and certainly not in every upstream project in our repositories.

One of the problems we want to solve is to be able to answer the question “where do I git clone from to get the Ubuntu source for package X?”. We don’t want to be forced to say “ah, but for package X, it’s the same as Debian, and they’re using Mercurial in this case, so you can’t git clone you have to use hg clone from this other place”, and then have the answer be different for package Y and different again for package Z. We know this will happen for 3 out of 10 packages. We want to eliminate all the edge cases so that, for git clone against any Ubuntu package, the repository structures are all consistent and all subsequent developer expectations always work.

To achieve this consistency, we need to find a way to use git for all Ubuntu packages: regardless of what VCS Debian or upstreams use for each package and project; and regardless of their different branching, tagging and merging models.

We think we’ve achieved this with our design; more on this in a future post.

Ubuntu, Bazaar, and UDD

Some readers may be familiar with a previous effort in Ubuntu, UDD, which was largely a similar effort but with Bazaar. Nine years later, git has largely won the “VCS wars”, and appears to be preferred by the majority of developers. Our current effort could be seen as “UDD, but with git”, if you wish.

Project goals

We'd like to avoid flag days and forced workflow changes. Ubuntu git integration will develop over time, but we don’t expect Ubuntu developers to be forced to switch to it. We’d prefer for developers to choose to use our integration on its own merits, switching over if and when they feel it appropriate. If, after further consultation with users and Launchpad developers, we did switch to git as the primary source of truth from Launchpad, we expect to be able to wrap for backwards compatibility with dput.

Being central to all code, “moving to git” can be somewhat all-encompassing in terms of desirable use cases. Our original goal was very specific: to make what we call “Ubuntu package merges” easier. I achieved this back in 2014, and the server team has since made big improvements to this particular use case. Now we want to use git for much more, so this necessarily encompasses a wide range of use cases. We have accepted the following use cases as falling within the scope of our project:

For drive-by contributors and new developers

Provide a single place from which any developer can git clone to see the current state of a package across all our releases, and to provide branches against which pull requests can be received.
Make automatic checking of contributions possible via a linter, for contributors to run locally but also run by a bot automatically against pull requests, to tighten the feedback loop where automatic advice is possible.
Simplify and flatten the learning curve by eliminating the need to use some of the arcane tooling that has built up over the decades in the case of simple contributions. Most developers either know git or have many others readily available to teach them git, so we can take advantage of this instead of requiring them to learn pull-lp-source, debdiff, etc.

For routine Ubuntu development

Faster and more accurate “Ubuntu package merges” by using git (already achieved).
Collaborative working for sets of complex package changes, such as SRUs and backports, so that planned changes can be shared, reviewed and amended before upload.

For experienced Ubuntu developers

Automatic linting of contributions to allow contributors to fix some issues directly and immediately themselves, to relieve sponsor and review workload.
All publication history available for debugging, bisection and other general software archeology tasks.
git push to upload to the Ubuntu archives.

Current status

We have an importer running that automatically imports new source package publications into git, so the entire publication history of that package becomes available to git users. Until we’re ready to scale this up, we’re importing a subset of packages from a whitelist, with other packages imported on request for interested developers. You can also run the importer yourself locally on any package.

Tooling is available as an extension to git providing a set of subcommands (git ubuntu ...). The CLI is still experimental and subject to change, and we have a set of high-level subcommands planned that we have yet to write.

Experimental status

If you’re interested, please do take a look! We’d appreciate feedback. However, note that we aren’t “production ready” yet:

There are a number of developer UX issues we’d like to fix before declaring the CLI “stable”.
For scaling reasons, Launchpad needs improved git shared object support before we’re ready for developers to push cloned package repositories en-masse.
We expect to re-run the importer on all packages before declaring ourselves ready, so git commit hashes for our published branches will change until we declare them stable.

What would make a 1.0

Launchpad shared object support.
Hash stability declared.
Developer UX issues fixed.
Anything else? Please let us know what you think should be essential.

The wrapper

On our way, we hit a bunch of edge cases which may confuse developers. Some examples:

An upstream may have placed a .gitattributes file that will unexpectedly “modify” the upstream source ( $Id$ etc) as we add packaging commits.
git will by default convert line endings and suchlike for you; but in packaging work, we want to leave the upstream sources untouched except where we have some reason to explicitly patch them.
The build may depend on empty directories, which git cannot currently represent.

These edge cases can be worked around, often automatically, but this won’t happen when a new developers use git clone directly.

To avoid having to introduce too much at once, we have written a wrapper that handles these edge cases automatically, or at least warns you about them on the occasions that they are relevant. There are also some common repetitive actions that are specific to our workflows; the wrapper also composes these for convenience to save developer time.

We don’t want to mandate use of our wrapper. To better suit advanced developers, we’ve designed everything to be directly accessible without the wrapper, and we consider this method of access to be a first class citizen in our work. We’ll talk more about the wrapper and its capabilities in a future post.

In the next post, we’ll cover details of where the imported repositories are and what they look like.