A Pelican Bloghttp://www.justgohome.co.uk/blog/2017-08-21T16:53:44+00:00More on the imported repositories2017-08-21T16:53:44+00:00Robie Basaktag:www.justgohome.co.uk,2017-08-21:blog/2017/08/more-on-the-imported-repositories.html<p><img alt="git ubuntu illustration" src="http://www.justgohome.co.uk/blog/2017/07/git-ubuntu.png" /></p>
<p>This is the third post in our series on our git workflow tooling in
Ubuntu. There is an <a href="http://www.justgohome.co.uk/blog/2017/07/developing-ubuntu-using-git.html#multi-post-index">index of all our planned posts in the first
post</a>.
As mentioned there, it is important to keep in mind that <a href="http://www.justgohome.co.uk/blog/2017/07/developing-ubuntu-using-git.html#experimental-status">the tooling
and implementation are still highly
experimental</a>.</p>
<p>Nish introduced our imported repositories <a href="https://naccblog.wordpress.com/2017/08/01/git-ubuntu-clone/">using the <code>git ubuntu clone</code>
wrapper</a>, as well
as using git directly.</p>
<p>I'd like to go into a little more detail about what we're actually importing.</p>
<p>Automatic imports watch for new source publications in Launchpad. When they
occur, the importer will push new commits and tags to match those publications
and move up branch pointers automatically. All branches maintained by the
importer are fast-forwarding. A temporary exception to this is during our
experimental stage, before we release a 1.0. Until then, we may re-import
packages as we work on the importer.</p>
<p>Our importer maintains one repository per source package in
<a href="https://code.launchpad.net/~usd-import-team/+git">lp:~usd-import-team</a> for
now. Eventually we expect to make these available under an alias of the form
<code>lp:ubuntu/+source/&lt;package&gt;</code>.</p>
<p>The imported trees correctly reflect the ancestry of individual source
packages. This includes both Debian and Ubuntu history, with Ubuntu history
correctly parented from the appropriate points in Debian's history.</p>
<p>There is a branch corresponding to each Debian and Ubuntu series (eg. <code>sid</code>,
<code>stretch</code>, <code>xenial</code>, <code>artful</code>), and in the case of Ubuntu, pocket (eg.
<code>trusty-updates</code>, <code>xenial-security</code>, <code>artful-proposed</code>). We’ll cover details on
how exactly these work, and the logic behind their design, in future posts.</p>
<h2>Single source of truth</h2>
<p>A goal of the importer is to reflect the single source of truth, which
we define as Launchpad's source publication history. The imported tags and
branches must therefore come only via the importer and never directly from
uploaders: only the importer can push to the imported repositories. We do
have a mechanism to preserve the "rich history" that git committers, in any
other project, would be able to push directly; more on this in a future
post.</p>
<p>Note that this methodology means that upstream's commit graphs are typically
not made available by the importer. You won't find individual upstream commits
by looking in our imported repository for a particular package. We may
decide to make this possible in the future by relying on the "rich history"
import mechanism combined with the use of our parenting invariant. More on this
in a future post.</p>
<p>We're trying hard to make sure that there is no case where a source package
upload can cause the importer to fail. We're treating any failure to import a
source package as a bug in our importer. The importer should be able to import
anything that was ever uploaded to Debian or Ubuntu.</p>
<h2>Available branches</h2>
<p>As Nish mentioned, there are two main types of branches:</p>
<ol>
<li>Devel branches: the branches on which to start development.</li>
<li>Pocket branches: understanding the current state of the archive.</li>
</ol>
<p>There are also "applied" versions of these branches, which provide quilt
patches already applied.</p>
<h3>The "devel" branches</h3>
<p>Ubuntu developers are used to using <code>pull-lp-source</code> to grab the latest version
of a package source on which to base development. We want to make <code>git clone</code>
work instead.</p>
<p>What is the correct version on which to base development? In Ubuntu,
this can vary depending on the exact publication state. Usually you want
the highest version published in the development release, or the highest
version published in a particular series for an
<a href="https://wiki.ubuntu.com/StableReleaseUpdates">SRU</a>. For this purpose,
the git importer maintains <code>devel</code> branches. For the development
release, this is just called <code>ubuntu/devel</code>. For a stable release, it is
<code>ubuntu/&lt;codename&gt;-devel</code>; for example: <code>ubuntu/xenial-devel</code>.</p>
<p>This is intended to save developers the effort of figuring out which pocket to
use (between the release pocket, <code>proposed</code>, <code>updates</code> and <code>security</code>) and
saves scripts from having to figure it out themselves.</p>
<p>One exception to this is in the rare case that a version in the proposed pocket
has been discarded, such as a failed SRU. The importer only follows new version
publications and does not follow deletions. In this case you may wish to start
directly from a pocket branch (described below). Alternatively, before you
begin you could start from the devel branch and use <code>git revert</code> to add a
commit reversing the published change in line with the deletion. The
infrastructure doesn’t care which method you choose.</p>
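<p>As a sketch of the <code>git revert</code> approach, here is what it looks like against a local toy repository. The file and version strings are invented for illustration; in real use you would start from a branch of an imported repository rather than a repository built by hand:</p>

```shell
set -e
# Toy repository standing in for a devel branch. Two "publications" are
# committed, then the second (imagine it was deleted from proposed) is
# reverted with an ordinary git revert.
dir=$(mktemp -d)
cd "$dir"
git init -q .
echo "1.0-1" > version
git add version
git -c user.name=demo -c user.email=demo@example.com commit -q -m "1.0-1"
echo "1.0-2" > version
git add version
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "1.0-2 (later deleted from proposed)"
# Add a commit reversing the discarded publication, in line with the deletion:
git -c user.name=demo -c user.email=demo@example.com revert --no-edit HEAD
cat version   # back to 1.0-1
```

The revert commit keeps the branch fast-forwarding while restoring the tree to the state before the discarded upload.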
<h3>Pocket branches</h3>
<p>A different use case is to look at the git branches to understand what the
archive looks like today, or how it looked in the past.</p>
<h4>Ubuntu’s pockets</h4>
<p>First let’s review what Ubuntu pockets actually are.</p>
<p>In Ubuntu, the archive for a particular release (“series”) is divided up into a
number of “pockets”, which correspond to the entries in your <code>sources.list</code>
file:</p>
<ol>
<li>
<p>The release pocket. This pocket has no specific name. During
development, the release pocket is where packages eventually land.
Upon release, this pocket is frozen and never changes again.</p>
</li>
<li>
<p>The “proposed” pocket. This pocket is used for two different
purposes depending on whether the series is in development or has
been released. During development, this is the staging area in which
packages are built, tested and checked for installability before
they land. When all tests pass, packages are moved into the release
pocket in a process called <a href="https://wiki.ubuntu.com/ProposedMigration">proposed
migration</a>. After
release, the proposed pocket is used for a different purpose: manual
user verification of proposed stable release updates through the
<a href="https://wiki.ubuntu.com/StableReleaseUpdates#Verification">SRU verification
process</a>.
When a package passes SRU verification, the package is moved into
the updates pocket, rather than into the release pocket as is the case
during development.</p>
</li>
<li>
<p>The “updates” pocket. Only used for stable releases, this pocket is
used to issue recommended updates to users during the lifetime of a
stable release.</p>
</li>
<li>
<p>The “security” pocket. Only used for stable releases, this pocket is
used to issue security updates to users during the lifetime of a
stable release. Updates to the security pocket are always also
copied into the updates pocket.</p>
</li>
</ol>
<p>The name of a pocket is in the form <code>&lt;series&gt;-&lt;type&gt;</code>, such as
<code>xenial-security</code> and <code>artful-updates</code>. As an exception, the release pocket is
the series with no suffix, such as <code>trusty</code>.</p>
<h4>Importing pockets into git</h4>
<p>The importer maintains a branch for each pocket, in the form
<code>ubuntu/&lt;pocket name&gt;</code>, such as <code>ubuntu/trusty</code> (the Trusty release pocket),
<code>ubuntu/xenial-security</code>, <code>ubuntu/artful-proposed</code> and so on.</p>
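<p>A quick sketch of the mapping from series and pocket to branch name (the function name and the use of the word <code>release</code> for the unnamed release pocket are inventions of this sketch, not importer conventions):</p>

```shell
# Map an Ubuntu series and pocket to the importer's branch name.
# Pass "release" for the unnamed release pocket.
branch_for() {
    series="$1"
    pocket="$2"
    if [ "$pocket" = "release" ]; then
        printf 'ubuntu/%s\n' "$series"
    else
        printf 'ubuntu/%s-%s\n' "$series" "$pocket"
    fi
}

branch_for trusty release    # ubuntu/trusty
branch_for xenial security   # ubuntu/xenial-security
branch_for artful proposed   # ubuntu/artful-proposed
```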
<h3>Combining pocket branches: the devel branches</h3>
<p>To support the archeology use case, the importer must maintain these pocket
branches. However, for the ongoing development use case, it's far more
convenient for developers to use our "devel branches".</p>
<p>We do this by making use of our "parenting invariant", which we’ll describe in
more detail in a future post. For now, it’s sufficient to understand that the
devel branches are the points at which you should start development, and the
pocket branches can be used to examine the current state of the various pockets
in the Ubuntu archive.</p>
<h3>Applied branches</h3>
<p>For our drive-by contributor use case, it would be nice if these
contributors didn't have to understand quilt. It's reasonable to expect that,
straight after running <code>git clone</code> from Ubuntu and selecting an appropriate
branch, the tree that appears is exactly the source used to build the
corresponding Ubuntu package. For drive-by contributors unfamiliar with Debian
packaging, this means that quilt patches should appear applied. Our "applied"
branches and tags provide this function.</p>
<p>The non-applied branches are the normative imports from Debian and Ubuntu
developer uploads. They are intended to represent the source package exactly
and without any derived components. This means that quilt patches appear in
<code>debian/patches/</code> as normal, but are not applied.</p>
<p>From one of these import commits, all quilt patches (if any) are applied one by
one. Each application results in a separate git commit. A final commit with an
identical tree is added for branch fast-forwarding purposes, and these commits
form the line of the "applied" branch.</p>
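<p>A minimal local sketch of that construction, using a toy file and an invented patch name (the real importer does considerably more bookkeeping than this):</p>

```shell
set -e
# Build a toy "import" commit containing one unapplied quilt patch, then
# apply each patch listed in debian/patches/series as its own commit, as
# the importer does when constructing an "applied" branch.
dir=$(mktemp -d)
cd "$dir"
git init -q .
mkdir -p debian/patches
printf 'hello\n' > greeting
cat > debian/patches/01-fix-greeting.patch <<'EOF'
--- a/greeting
+++ b/greeting
@@ -1 +1 @@
-hello
+hello, patched
EOF
printf '01-fix-greeting.patch\n' > debian/patches/series
git add -A
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Import 1.0-1"
# One commit per quilt patch:
while read -r patch; do
    git apply "debian/patches/$patch"
    git add -A
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "Apply $patch"
done < debian/patches/series
cat greeting   # hello, patched
```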
<p>Consequently, you can switch to an "applied" branch, or corresponding tag, and
you'll see the state of the Ubuntu source package with all patches applied and
as if quilt doesn't exist. Drive-by contributors can then file pull requests
against such a branch (eg. <code>applied/ubuntu/devel</code>) and bots and sponsors will
be able to understand exactly what is being requested.</p>
<p>It will often be necessary, before uploading, to pull the requested
changes out into a separate quilt patch with
<a href="http://dep.debian.net/deps/dep3/">dep3</a> headers, or perhaps to
squash the requested change into an existing quilt patch.
We hope to automate some of this in the future, but
for now this is left as a task for sponsors who accept git workflow
upload requests. Contributors can either submit pull requests against
the applied branches this way, or fold up into quilt patches themselves
and submit pull requests against the non-applied branches instead.</p>
<p>In the future, it may be desirable for all Ubuntu developers to forget quilt
and live entirely inside git-based patchsets. For the time being, we want to
support both traditional and git-based workflows. So both applied and
non-applied branches are maintained by the importer.</p>
<p>It is a matter of debate as to whether the non-applied or applied branches
should appear by default. We are open to further discussion on this.</p>
<h2>Available tags</h2>
<p>The first time the importer imports a new version of a source package, it tags
it using an "import tag", which is of the form <code>import/&lt;version&gt;</code>. The
patches-applied equivalent is derived, and the result is tagged in the form
<code>applied/&lt;version&gt;</code>.</p>
<p>Pocket copies, such as from Debian into Ubuntu via autosync, are not tagged;
the pocket branches are moved forward with new commits (with identical trees)
as necessary.</p>
<p>Since not all valid package version strings are valid git tag names, the tag
names are escaped using the same rules as specified in
<a href="http://dep.debian.net/deps/dep14/">dep14</a>, Debian's recommendation on git
repository layouts.</p>
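<p>For example, git ref names may not contain <code>:</code> or <code>~</code>, both of which are valid in Debian version strings. A sketch of dep14's two headline substitutions (the spec covers further corner cases beyond these):</p>

```shell
# dep14's main substitutions: ":" becomes "%" and "~" becomes "_",
# since git forbids both characters in ref names.
escape_version() {
    printf '%s\n' "$1" | sed -e 's/:/%/g' -e 's/~/_/g'
}

escape_version '1:2.3.3-1ubuntu1'   # 1%2.3.3-1ubuntu1
escape_version '1.0~rc1-1'          # 1.0_rc1-1
```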
<p>We use tags of the form <code>upload/&lt;version&gt;</code> to supply rich history for adoption
into the imported commit graph. These tags cannot be pushed directly; more on
this in a future post.</p>
<p>To support the preservation of orig tarballs using pristine-tar, the importer
also tags <code>upstream/&lt;version&gt;</code>.</p>
<h2>pristine-tar and the dsc branches</h2>
<p>So that the git repositories contain all the information needed to reconstruct
an imported source package, the importer also stores the orig tarballs using
pristine-tar and the signed dsc file in separate branches. Orig tarballs are
automatically extracted on package build as needed by the <code>git ubuntu build</code>
wrapper. This means that <code>git ubuntu clone</code> followed by <code>git ubuntu build</code>
should Just Work.</p>
<p>Orig tarballs between Debian and Ubuntu may vary in exceptional cases, so the
importer keeps these properly namespaced in <code>debian/pristine-tar</code> and
<code>ubuntu/pristine-tar</code> branches to avoid collisions, and likewise for the dsc
files <code>debian/dsc</code> and <code>ubuntu/dsc</code>. pristine-tar doesn't currently support
multiple branches; it assumes a single branch named <code>pristine-tar</code>. Our <code>git
ubuntu build</code> wrapper works around this for now. In the future I'd like for us
to drive getting parameterised branch name support into <code>pristine-tar</code> upstream
to support our use case.</p>
<h2>Conclusion</h2>
<p>To give those developers interested in all the detail a full understanding
of what we're doing, in this post I've tried to cover all the git
reference objects presented by our importer.</p>
<p>In our next post, Nish will continue discussing the <code>git ubuntu</code> tooling by
introducing <code>git ubuntu tag</code>.</p>Developing Ubuntu using git2017-07-24T16:29:35+00:00Robie Basaktag:www.justgohome.co.uk,2017-07-24:blog/2017/07/developing-ubuntu-using-git.html<p><img alt="git ubuntu illustration" src="http://www.justgohome.co.uk/blog/2017/07/git-ubuntu.png" /></p>
<p>Back in 2014, I published some <a href="http://www.justgohome.co.uk/blog/2014/08/ubuntu-git-merge-workflow.html">information and tooling on using git for Ubuntu
development</a>,
even though most Ubuntu development wasn’t done with git at the time.</p>
<p>Three years on, this work has expanded significantly. Most of the server team
is using git for daily work when touching Ubuntu packaging. We have expanded
our tooling. With the significant interest we’ve received, we’re now interested
in developing this work to make git become <em>the</em> way of working with Ubuntu’s
source code. Our plan is to do this with no disruption to existing developer
workflows.</p>
<p><a name="multi-post-index">
This post is part of a blog series. Here’s an index of all our planned posts.
We’ll amend this index and update links as we go.
</a></p>
<ul>
<li><a href="http://www.justgohome.co.uk/blog/2017/07/developing-ubuntu-using-git.html">Developing Ubuntu using git</a>
(this post)</li>
<li><a href="https://naccblog.wordpress.com/2017/08/01/git-ubuntu-clone/"><code>git ubuntu clone</code></a></li>
<li><a href="http://www.justgohome.co.uk/blog/2017/08/more-on-the-imported-repositories.html">More on the imported
repositories</a></li>
<li>Available branches</li>
<li>History and parenting</li>
<li>Repository objects</li>
<li>Rich history</li>
<li>Wrapper subcommands</li>
</ul>
<h2>Why is this so hard?</h2>
<p>Most Free Software development projects already use git. So why has Ubuntu
taken so long?</p>
<p>Unlike most software projects, Ubuntu (like other distributions) derives its
sources from upstreams, rather than being the originator of the software. This
means that we do not “control” the git repositories. Repository elements such
as branch names, tag names, branching policies and so forth are not up to us;
and nor is the choice to use git in the first place.</p>
<p>For git use in Ubuntu development to be effective, we need these repository
elements to follow the same schemes across all our packages. But upstream
projects use different schemes for their branches, tags and merge workflows. So
our task isn’t as trivial as just adopting upstreams’ git trees.</p>
<h3>Existing packaging repositories</h3>
<p>While Ubuntu makes key changes to packages as needed, the long tail of packages
that we ship are derived from Debian with no changes. Debian package
maintainers use the VCS system of their choice, which may be nothing, git, or
something else like Mercurial or Bazaar. These repositories are arranged in the
manner of the package maintainers’ choices. They may be derived from their
upstreams’ respective repositories, or they may instead be based on wholesale
upstream code “import” commits done when the packaging is updated to a latest
upstream release.</p>
<p>Right now, 68% of source packages in Ubuntu are listed as having their
packaging maintained in git (whether in Debian or Ubuntu):</p>
<p><img src='http://www.justgohome.co.uk/blog/2017/07/vcs-chart-1.svg' style="width: 30em;"/>
<img src='http://www.justgohome.co.uk/blog/2017/07/vcs-chart-2.svg' style="width: 46em;"/></p>
<p><em>Note: the data I have used cannot tell us the difference between a
package not being maintained in a VCS and the package maintainer not
having declared in metadata that a particular VCS is in use.</em></p>
<h3>Choices for Ubuntu and git</h3>
<p>We’re not in a position to mandate that everyone uses git. Even if we did
for Ubuntu, we cannot expect to mandate it in Debian, and certainly not in
every upstream project in our repositories.</p>
<p>One of the problems we want to solve is to be able to answer the question
“where do I git clone from to get the Ubuntu source for package X?”. We don’t
want to be forced to say “ah, but for package X, it’s the same as Debian, and
they’re using Mercurial in this case, so you can’t git clone you have to use hg
clone from this other place”, and then have the answer be different for package
Y and different again for package Z. We know this will happen for 3 out of 10
packages. We want to eliminate all the edge cases so that, for <code>git clone</code>
against any Ubuntu package, the repository structures are all consistent and
all subsequent developer expectations always work.</p>
<p>To achieve this consistency, we need to find a way to use git for all Ubuntu
packages: regardless of what VCS Debian or upstreams use for each package and
project; and regardless of their different branching, tagging and merging
models.</p>
<p>We think we’ve achieved this with our design; more on this in a future post.</p>
<h2>Ubuntu, Bazaar, and UDD</h2>
<p>Some readers may be familiar with a previous effort in Ubuntu,
<a href="https://wiki.ubuntu.com/DistributedDevelopment">UDD</a>, which was largely a
similar effort but with Bazaar. Nine years later, git has largely won the “VCS
wars”, and appears to be preferred by the majority of developers. Our current
effort could be seen as “UDD, but with git”, if you wish.</p>
<h2>Project goals</h2>
<p>We'd like to avoid flag days and forced workflow changes. Ubuntu git
integration will develop over time, but we don’t expect Ubuntu
developers to be forced to switch to it. We’d prefer for developers to
choose to use our integration on its own merits, switching over if and
when they feel it appropriate. If, after further consultation with users
and Launchpad developers, we did switch to git as the primary source of
truth from Launchpad, we expect to be able to wrap for backwards
compatibility with dput.</p>
<p>Because code is central to everything we do, “moving to git” can be
somewhat all-encompassing in terms of desirable use cases. Our original goal
was very specific: to make what we call “Ubuntu package merges” easier. I
achieved this back in 2014, and the server team has since made big
improvements to this particular use case. Now we want to use git for
much more, so this necessarily encompasses a wide range of use cases. We
have accepted the following use cases as falling within the scope of our
project:</p>
<h3>For drive-by contributors and new developers</h3>
<ul>
<li>
<p>Provide a single place from which any developer can <code>git clone</code>
to see the current state of a package across all our releases, and to
provide branches against which pull requests can be received.</p>
</li>
<li>
<p>Make automatic checking of contributions possible via a linter,
for contributors to run locally but also run by a bot automatically
against pull requests, to tighten the feedback loop where automatic
advice is possible.</p>
</li>
<li>
<p>Simplify and flatten the learning curve by eliminating the need to
use some of the arcane tooling that has built up over the decades in
the case of simple contributions. Most developers either know git or
have plenty of people around to teach them git, so we can take
advantage of this instead of requiring them to learn pull-lp-source,
debdiff, etc.</p>
</li>
</ul>
<h3>For routine Ubuntu development</h3>
<ul>
<li>
<p>Faster and more accurate “Ubuntu package merges” by using git
(already achieved).</p>
</li>
<li>
<p>Collaborative working for sets of complex package changes, such as
SRUs and backports, so that planned changes can be shared, reviewed
and amended before upload.</p>
</li>
</ul>
<h3>For experienced Ubuntu developers</h3>
<ul>
<li>
<p>Automatic linting of contributions to allow contributors to fix some
issues directly and immediately themselves, to relieve sponsor and
review workload.</p>
</li>
<li>
<p>All publication history available for debugging, bisection and other
general software archeology tasks.</p>
</li>
<li>
<p><code>git push</code> to upload to the Ubuntu archives.</p>
</li>
</ul>
<h2>Current status</h2>
<p>We have an importer running that automatically imports new source package
publications into git, so the entire publication history of that package
becomes available to git users. Until we’re ready to scale this up, we’re
importing a subset of packages from a whitelist, with other packages imported
on request for interested developers. You can also run the importer yourself
locally on any package.</p>
<p>Tooling is available as an extension to git providing a set of subcommands
(<code>git ubuntu ...</code>). The CLI is still experimental and subject to change, and we
have a set of high-level subcommands planned that we have yet to write.</p>
<h2><a name="experimental-status">Experimental status</a></h2>
<p>If you’re interested, please do take a look! We’d appreciate feedback. However,
note that we aren’t “production ready” yet:</p>
<ul>
<li>
<p>There are a number of developer UX issues we’d like to fix before declaring
the CLI “stable”.</p>
</li>
<li>
<p>For scaling reasons, Launchpad needs improved git shared object support
before we’re ready for developers to push cloned package repositories
en masse.</p>
</li>
<li>
<p>We expect to re-run the importer on all packages before declaring ourselves
ready, so git commit hashes for our published branches will change until we
declare them stable.</p>
</li>
</ul>
<h2>What would make a 1.0</h2>
<ol>
<li>Launchpad shared object support.</li>
<li>Hash stability declared.</li>
<li>Developer UX issues fixed.</li>
<li>Anything else? Please let us know what you think should be essential.</li>
</ol>
<h2>The wrapper</h2>
<p>On our way, we hit a bunch of edge cases which may confuse developers. Some
examples:</p>
<ul>
<li>
<p>An upstream may have placed a <code>.gitattributes</code> file that will unexpectedly
“modify” the upstream source (<code>$Id$</code> etc) as we add packaging commits.</p>
</li>
<li>
<p>git will by default convert line endings and suchlike for you; but in
packaging work, we want to leave the upstream sources untouched except where
we have some reason to explicitly patch them.</p>
</li>
<li>
<p>The build may depend on empty directories, which git cannot currently
represent.</p>
</li>
</ul>
<p>These edge cases can be worked around, often automatically, but this won’t
happen when a new developer uses <code>git clone</code> directly.</p>
<p>To avoid having to introduce too much at once, we have written a wrapper
that handles these edge cases automatically, or at least warns you about
them on the occasions that they are relevant. There are also some common
repetitive actions that are specific to our workflows; the wrapper also
composes these for convenience to save developer time.</p>
<p>We don’t want to mandate use of our wrapper. To better suit advanced
developers, we’ve designed everything to be directly accessible without the
wrapper, and we consider this method of access to be a first class citizen in
our work. We’ll talk more about the wrapper and its capabilities in a future
post.</p>
<h2>Next</h2>
<p>In the next post, we’ll cover details of where the imported repositories are
and what they look like.</p>On Ubuntu and License Compliance2015-03-24T12:54:43+00:00Robie Basaktag:www.justgohome.co.uk,2015-03-24:blog/2015/03/on-ubuntu-and-license-compliance.html<p>I found it quite frustrating to read <a href="http://mer-project.blogspot.ru/2015/03/some-doubts-about-gpl-licensing-and-bq.html">Carsten Munk's concerns about GPL
and licensing related to the kernel shipping with the bq Ubuntu
phone</a>.
Clarity is essential in these matters. That Carsten can't tell what is
going on for certain is a problem. It shouldn't have happened and I'm
pleased to see that my colleagues are working hard to clear it all up.</p>
<p>As a Canonical employee and an Ubuntu developer I work hard to make sure
that the work I'm involved with is fully compliant. Sometimes this takes
me considerable time and effort. So for me the frustrating part of
reading Carsten's investigation is that only our mistakes are evident.
When things are done right people often don't notice, and so it's all
too easy for outsiders to draw the conclusion that we are "evil". I'd
like to present an example of how I work hard to do things right, in an
effort to balance this view.</p>
<p><a href="https://jujucharms.com/">Juju</a> is a particularly challenging project to
package using the traditional distribution model. It's a cross-platform,
cross-distribution and cross-release tool, and a single deployment needs
to be able to deal with all of this simultaneously. But from a licensing
perspective, the challenge comes from it being a major Go project. It
follows standard Go practices in handling its dependencies, so by the
time an upstream release gets to me, the release tarball contains all
of Juju's dependencies embedded within it. As the person who uploads new
Juju packages to the Ubuntu archive, it's my responsibility to make sure
that everything is compliant from a licensing perspective. The embedding
means that instead of having to verify just the Juju code itself, I also
have to verify all dependencies, recursively. Many of the dependencies
are small third party projects that appear to not have been packaged for
a distribution before, with little attention paid to licensing
compliance before I looked at them. Dependencies are added and versions
bumped frequently. Every time, I have to check again. Right now, the sum
of Juju and its dependencies involves over 3000 files across 37 separate
projects.</p>
<p>Back in July I did a full review over all of this code and developed a
process to follow further changes incrementally, since the situation
here is quite radically different from a traditional distribution
package. In my initial review, I found a whole slew of clearly
unintentional errors, but sought to have them fixed anyway. I filed an
<a href="https://bugs.launchpad.net/juju-core/+bug/1341589">extensive bug
report</a> describing
the contradictions and ambiguities I found. I have also filed bugs in
upstream projects as appropriate: for example in
<a href="https://github.com/xeipuuv/gojsonschema/issues/37">gojsonschema</a>. I was
pleased to find that it wasn't just me focusing on diligence in this
area: as you can see from the first bug, my colleagues on the Juju team
all took the issues I raised seriously, addressed them and committed
fixes in just a week. Bugs I have filed more recently about licensing
errors introduced in newer releases have continued to result in a quick
response.</p>
<p>So, please do not misconstrue our intentions. Mistakes may happen but we
do care, and do seek to resolve them as quickly as we can.</p>What's in an Ubuntu package version string?2015-01-15T12:01:47+00:00Robie Basaktag:www.justgohome.co.uk,2015-01-15:blog/2015/01/ubuntu-package-versions.html<p>Here's a typical example:</p>
<div class="highlight"><pre>corosync 2.3.3-1ubuntu1
</pre></div>
<p>The part before the hyphen (<code>2.3.3</code>) is the "upstream version". This is
the version of the release tarball from upstream that the package is
based on.</p>
<p>The part after the hyphen (<code>1ubuntu1</code>) is the packaging revision. But
this splits further into the part before <code>ubuntu</code> and the part after.
The part before (<code>1</code>) is the Debian packaging revision the packaging in
Ubuntu is based on. The part after (also <code>1</code> in this case) is the Ubuntu
packaging revision.</p>
<p>So here's how we can interpret this. The <code>ubuntu</code> tells us that there
are Ubuntu specific changes that have been made in the package. The
string after it (<code>1</code>) is the packaging revision assigned by the Ubuntu
developer, and suggests that it has only been modified once. Going
backwards, we can see that this Ubuntu modified package is based on
Debian's package of corosync version <code>2.3.3-1</code>. The Debian maintainer
has assigned packaging revision <code>1</code> also, and his package is based on
corosync's upstream release version <code>2.3.3</code>.</p>
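<p>The split can be sketched with shell parameter expansion. This is a simplification: it assumes a non-native package (the version contains a hyphen), no epoch, and a plain <code>ubuntu</code> marker:</p>

```shell
# Split an Ubuntu version string into its components.
v='2.3.3-1ubuntu1'
upstream="${v%-*}"               # everything before the last hyphen
pkgrev="${v##*-}"                # everything after the last hyphen
debrev="${pkgrev%%ubuntu*}"      # Debian packaging revision
uburev="${pkgrev#*ubuntu}"       # Ubuntu packaging revision

echo "$upstream"   # 2.3.3
echo "$debrev"     # 1
echo "$uburev"     # 1
```

Note the use of the <em>last</em> hyphen as the split point: upstream versions themselves may contain hyphens, but the packaging revision never does.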
<p>Another example:</p>
<div class="highlight"><pre>apache2 2.4.10-8ubuntu2
</pre></div>
<p>This package has also been modified from Debian, since <code>ubuntu</code> is
present in the version string. It is on the second revision of packaging
modifications in Ubuntu, and these modifications are based on the eighth
Debian maintainer's packaging revision of upstream's 2.4.10 release.</p>
<h2>Being in sync</h2>
<div class="highlight"><pre>haproxy 1.5.10-1
</pre></div>
<p>The absence of an <code>ubuntu</code> string inside the version number tells us
that this package source has not been modified from Debian. We usually
describe this as being "in sync" with Debian, which is a common goal for
us in Ubuntu for most packages. This is the first Debian packaging
revision of upstream haproxy's <code>1.5.10</code> release.</p>
<h2>Not directly based on Debian</h2>
<div class="highlight"><pre>libvirt 1.2.8-0ubuntu19
</pre></div>
<p>The <code>-0ubuntu</code> tells us that this package is not based on Debian's
packaging of upstream release <code>1.2.8</code> at all. This may be for a number
of reasons, which cannot be determined solely from the version number:</p>
<ol>
<li>
<p>Ubuntu may have pushed ahead with a newer upstream release of <code>1.2.8</code>
before Debian uploaded it. This can happen when Ubuntu developers have a
tighter deadline than Debian for a particular version. For example,
since Ubuntu has a faster release cycle than Debian, it may be the case
that a Debian maintainer hasn't had time to upload to Debian yet, but an
Ubuntu developer wants to make the shorter deadline of an imminent
Ubuntu release.</p>
</li>
<li>
<p>The package doesn't exist in Debian at all. This isn't ideal, but
happens for very Ubuntu-specific packages. An example of this is
<code>nvidia-304</code>, which is a "restricted" component binary non-free driver.</p>
</li>
<li>
<p>Ubuntu developers have decided to deliberately diverge from Debian
for some reason. We try to avoid this situation as much as possible, but
the situation does exist for some packages.</p>
</li>
</ol>
<h2>SRUs</h2>
<div class="highlight"><pre>vsftpd 3.0.2-1ubuntu2.14.04.1
</pre></div>
<p>This follows a common scheme used for updates to a stable release (a
Stable Release Update, or SRU), where it is often necessary to "insert"
a version in between the version in the stable release and version in a
future release, so that upgrades to future releases still work
correctly.</p>
<p>Use of this scheme isn't mandatory, but unless an Ubuntu developer is
trying to be misleading, this version string means that the second
Ubuntu modified packaging revision (<code>ubuntu2</code>) of the first Debian
packaging revision (<code>-1</code>) of the upstream vsftpd release <code>3.0.2</code> has had
one SRU (<code>.1</code>) applied to it in the <code>14.04</code> release. See the wiki page
on <a href="https://wiki.ubuntu.com/SecurityTeam/UpdatePreparation#Update_the_packaging">security update
preparation</a>
for more details of this scheme, which is also the recommended scheme to
use for non-security updates.</p>
<p>Another (possibly more) common pattern you'll see in SRU version numbers
is:</p>
<div class="highlight"><pre>freeipmi 1.1.5-3ubuntu3.1
</pre></div>
<p>This uses the same scheme. But here, "inserting" a version didn't need
the <code>14.04</code> style prefix since the same <code>1.1.5-3ubuntu3</code> didn't appear
in multiple releases. So, a little more straightforwardly, this has had
one SRU (<code>.1</code>) applied to the third (<code>3</code>) Ubuntu (<code>ubuntu</code>) modification
of the third (<code>-3</code>) Debian packaging revision of upstream freeipmi
release <code>1.1.5</code>.</p>
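<p>Pulling a version string apart can be sketched in plain POSIX shell. This
is a simplification: it ignores epochs, and it relies on the packaging
revision following the <em>last</em> hyphen, since upstream versions may
themselves contain hyphens:</p>

```shell
# Split a version string into upstream version and packaging revision
# using POSIX parameter expansion (simplified: no epoch handling).
v='3.0.2-1ubuntu2.14.04.1'
upstream=${v%-*}       # everything before the last hyphen
revision=${v##*-}      # everything after the last hyphen
echo "upstream=$upstream revision=$revision"
```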
<h2>Comparing version numbers</h2>
<p>Underpinning all of this is the definition of how Debian and Ubuntu
package version numbers are compared. A strict ordering is <a href="https://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version">defined in
Debian
policy</a>,
and familiarity with the scheme is essential for package maintainers.
When in doubt, <code>dpkg --compare-versions</code> can verify the ordering for
you.</p>
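<p>For instance, on a system with <code>dpkg</code> installed, the orderings
described above can be checked directly:</p>

```shell
# dpkg --compare-versions exits 0 when the stated relation holds.
dpkg --compare-versions '3.0.2-1ubuntu2' lt '3.0.2-1ubuntu2.14.04.1' \
    && echo 'an SRU sorts after the version it patches'
dpkg --compare-versions '3.0.2-1ubuntu2.14.04.1' lt '3.0.2-1ubuntu3' \
    && echo '...but before the next regular revision'
dpkg --compare-versions '1.0~rc1' lt '1.0' && echo 'tilde sorts first'
dpkg --compare-versions '1:0.9' gt '2.0' && echo 'an epoch trumps everything'
```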
<h2>Autosync</h2>
<p>It is useful to know that the autosync mechanism prevents
Ubuntu-specific changes from being overwritten by detecting <code>ubuntu</code> in the
version string. This is how, for example, no-change rebuilds are automatically
synced over, but regular packages with Ubuntu deltas present are not.</p>
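<p>A toy model of that check (the real logic lives in the archive tooling,
not in a shell one-liner):</p>

```shell
# Simplified sketch: a version string containing "ubuntu" marks a delta.
eligible_for_autosync() {
    case $1 in
        *ubuntu*) return 1 ;;   # Ubuntu delta present: do not overwrite
        *)        return 0 ;;   # in sync, or a no-change rebuild: sync it
    esac
}
eligible_for_autosync '1.5.10-1'        && echo 'haproxy: synced'
eligible_for_autosync '1.2.8-0ubuntu19' || echo 'libvirt: delta kept'
eligible_for_autosync '1.5.2-3build1'   && echo 'no-change rebuild: synced'
```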
<h2>Edge cases</h2>
<p>Unfortunately, there are many edge cases which make it difficult to
write a comprehensive guide. Here are some that come to mind that you
can expect to eventually come across:</p>
<ul>
<li>
<p>Native packages (eg. <code>dpkg</code>) have no packaging revision and so have no
hyphen.</p>
</li>
<li>
<p>Hyphens are permitted in upstream version numbers, which leads to
multiple hyphens in the Debian package version string!</p>
</li>
<li>
<p>Upstream release tarballs that contain files that cannot be
redistributed by Debian or Ubuntu for legal reasons are modified by
the maintainer to meet policy, and then grow a suffix like <code>+dfsg</code> on
the "upstream" version string.</p>
</li>
<li>
<p>A tilde (<code>~</code>) is defined to sort before anything else, including
nothing. So it's used to "insert" a version before a standard one.
This is commonly used in PPAs and backports.</p>
</li>
<li>
<p>Version epochs prefix a number followed by a colon to "reset" version
numbers, for example when an upstream version numbering scheme changes
and a newer version number would otherwise evaluate "backwards".</p>
</li>
<li>
<p>No-change rebuilds lead to version numbers like <code>1.5.2-3build1</code>.</p>
</li>
<li>
<p>Versions "in the middle". I've said things like "second" and "eighth"
above, but without checking we don't know, for example, that there
wasn't an extra <code>2.4.10-6.1</code> revision in between <code>2.4.10-6</code> and
<code>2.4.10-7</code>, so this isn't strictly correct. We can't assume the number
of versions that exist between two version strings based just on the
version strings. The version numbering system is deliberately designed
so that it is always possible to create a new version string in the
middle.</p>
</li>
<li>
<p>Debian non-maintainer uploads (NMUs) use an additional suffix instead
of bumping the first part of the packaging revision number as the
maintainer would. This is to prevent any conflicts with an upload the
maintainer might be preparing, and has the benefit of making it clear
to others that an NMU took place.</p>
</li>
</ul>My git-based Ubuntu package merge workflow2014-08-04T16:17:45+00:00Robie Basaktag:www.justgohome.co.uk,2014-08-04:blog/2014/08/ubuntu-git-merge-workflow.html<p><em>Originally posted to the ubuntu-devel mailing list
(<a href="https://lists.ubuntu.com/archives/ubuntu-devel/2014-August/038418.html">archive</a>).</em></p>
<p>I thought it was about time that I shared my own merge workflow, as I
think it is quite different from most other Ubuntu developers. I'm an
advanced git user (even a fanatic, perhaps), and I make extensive use of
git's interactive rebase feature. To me, an Ubuntu package merge is just
a rebase in git's terminology, and in this case I use git as nothing
more than an advanced patchset manager.</p>
<p>I find my workflow allows me to handle arbitrarily complex package
merges - something I've not been able to do any other way. And after I've
merged a particular package with this workflow once, future merges take
far less time, because checking the individual broken-down diffs is
quicker still.</p>
<p>This workflow may be useful to others, but probably only if you are
already very familiar with git's interactive rebase feature. I don't
suggest that you try to use this workflow without first being
extremely comfortable with this (for example, working with git while not
attached to a branch).</p>
<p>On the other hand, if you are very familiar with rebasing in git, then
like me you may find this workflow to be the logically obvious way of
doing package merges in Ubuntu. I wonder if anybody else feels like
this.</p>
<p>In my mind, this write-up may seem complex, but I think this complexity
is just a reflection of the reality of what's really going on when one
does an Ubuntu package merge. But by using git, the complexity gets
moved to the complexity of doing git rebases, and this is something that
only needs to be learned once.</p>
<p>I'm also interested to know how this fits in with other recent work in
using git with Debian packaging. My impression is that it doesn't fit so
well, because in Ubuntu we need to deal with all Debian packages,
including those not managed in git in Debian. Comments, feedback and
criticism are all appreciated.</p>
<h1>Considering merges</h1>
<h2>Merge essentials</h2>
<p>Let's first consider what an Ubuntu package merge really is. Existing
Ubuntu developers probably want to skip this section.</p>
<p>First, some terminology. For a given package that needs merging, Ubuntu
has applied some set of changes from the Debian version it is based on.
So we have some Debian version from which Ubuntu diverged (the base
version), the latest Debian version, and the current Ubuntu version. The
old Ubuntu delta is the diff between the base version and the current
Ubuntu version. The new Ubuntu delta will be the diff between the latest
Debian version and the newest Ubuntu version that we will upload.</p>
<p>To do a package merge, we must re-apply all of the Ubuntu delta that is
still required onto the latest Debian version. On the way, we might find
that some changes are no longer required, that some changes have to be
modified to work against the latest Debian version, and that we perhaps
need to introduce new changes.</p>
<p>We expect the result to contain a changelog entry summarising what
remains in the Ubuntu delta, what was modified or dropped, and any new
changes that were made.</p>
<h2>The logical delta</h2>
<p>So when doing a package merge, it is essential to understand what
exactly logically constituted the previous Ubuntu delta, so that we can
identify what changes are no longer required, how we might need to
modify some previous changes, and what new changes may be needed.</p>
<p>When the Ubuntu delta is relatively trivial, checking all of this
by examining the diffs produced by merge-o-matic is normally fine. Even
if the delta consists of a few changes, they are easy to identify and
understand in a small diff.</p>
<p>But when the delta is larger, I find it far more difficult to follow it
all in my head at once, particularly when multiple logical changes apply
changes to similar overlapping areas across multiple files. This is, of
course, yet another good reason why we should be sending our changes to
Debian and keeping our delta small, but in some cases maintaining a
large delta is necessary, at least in the short term.</p>
<p>In following my workflow, I have come across a number of merge errors
made by multiple Ubuntu developers where the claimed delta in the
changelog for a merge did not match the delta itself. This suggests to
me that developers are not always checking and understanding the delta
as they should.</p>
<h1>Applying git</h1>
<p>git makes it easy to take a large "squashed" diff and split it into
multiple constituent logical parts. This is what I've been doing here.
Once split like this, I use <code>git rebase</code> to apply the logical parts back
on to the latest Debian version. This allows me to examine each logical
part of the delta separately, modifying or removing them as required.
When I'm done, it is easy to review each part, and even compare against
the previous version. And I can save the broken down parts for the next
merge.</p>
<p>So broadly, my workflow for packages with complex deltas is:</p>
<ol>
<li>
<p>Import the base, latest Debian and all Ubuntu revisions since the
base version into a git repository.</p>
</li>
<li>
<p>Break down the Ubuntu revisions into constituent logical parts using
<code>git rebase</code>. Or if I followed this workflow last time, then I just run
<code>git am</code> against what I saved previously. One might consider this step
to be the opposite of a "squash" operation. "Unsquash", if you like.</p>
</li>
<li>
<p>Rebase onto the latest Debian version, dropping any metadata changes
(eg. <code>debian/changelog</code> changes and <code>update-maintainer</code>) and amending
the delta on the way as required.</p>
</li>
<li>
<p>Update <code>debian/changelog</code>, apply <code>update-maintainer</code>, review, test
and upload.</p>
</li>
<li>
<p>Run <code>git format-patch</code> to save my set of logical changes for next time.</p>
</li>
</ol>
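<p>Steps 2 and 5 form a round trip that can be demonstrated in a throwaway
repository (all the tag, file and message names here are invented):</p>

```shell
# Demonstrate the format-patch/am round trip in a scratch repository.
set -e
tmp=$(mktemp -d) && cd "$tmp" && git init -q work && cd work
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m 'latest Debian version' && git tag debian
echo 'Ubuntu change' > delta.txt
git add delta.txt && git commit -q -m '  * Add delta.txt (a logical change)'
# Step 5: save the logical delta, one patch file per commit...
git format-patch -o ../saved-delta debian..HEAD
# Step 2, next merge: replay the saved delta with git am.
git checkout -q debian
git am -q ../saved-delta/*.patch
test -f delta.txt && echo 'delta replayed'
```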
<p>To help with these tasks, I have written some tooling that I use. I've
pushed these to <code>git://github.com/basak/ubuntu-git-tools.git</code>:</p>
<ul>
<li>
<p><code>xgit</code> is a wrapper around setting <code>GIT_WORK_TREE</code> and <code>GIT_DIR</code> so
that I can operate with a <code>.git</code> directory that is outside my working
tree. This means that <code>dpkg-buildpackage</code>, <code>dpkg-source</code> etc. don't
need to know or care that I'm using git, and I can run git commands
without necessarily being in my working tree.</p>
</li>
<li>
<p><code>git-dsc-commit</code> imports a source package by just committing and
tagging a new commit (in the current branch, or detached HEAD) that is
exactly the unpacked source package.</p>
</li>
<li>
<p><code>git-merge-changelogs</code> is a wrapper around <code>dpkg-mergechangelogs</code> that
takes its input changelogs from <code>debian/changelog</code> files found in
specific git revisions.</p>
</li>
<li>
<p><code>git-reconstruct-changelog</code> extracts commit log messages from a set of
git commits and writes them to <code>debian/changelog</code>.</p>
</li>
</ul>
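<p>The idea behind <code>xgit</code> can be sketched with a couple of
environment variables; this is a simplification of the real tool, not its
actual implementation:</p>

```shell
# Keep the repository in ./git and the checkout in ./gitwd, so the working
# tree contains no .git directory at all (a sketch of the xgit idea).
set -e
cd "$(mktemp -d)"
export GIT_DIR=$PWD/git GIT_WORK_TREE=$PWD/gitwd
mkdir "$GIT_WORK_TREE"
git init -q                         # initialises $GIT_DIR, not ./.git
echo demo > "$GIT_WORK_TREE/file"
cd "$GIT_WORK_TREE" && git add file
git status --porcelain              # git works; the tree stays .git-free
```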
<p>These tools are incomplete. I didn't know where I was going when I wrote
them, and there is certainly scope for more automation. I addressed the
biggest needs first, and what remains costs me so little time that I
have not automated any more yet.</p>
<h2>Importing revisions into a git repository</h2>
<p>I generally start with:</p>
<div class="highlight"><pre># Download relevant source packages. This could probably be automated
# with the help of grab-merge.
pull-debian-source -d <package>
pull-debian-source -d <package> <base-revision>
pull-lp-source -d <package>
pull-lp-source -d <package> <version-since-base> # 0 or more times
# Set up git repository for this merge
. /path/to/xgit.bash
xgit
mkdir git gitwd # git = moved .git directory; gitwd = working directory
# without .git
git init
</pre></div>
<p>Next I import the source packages into git, modelling the Ubuntu
divergence by having the new Debian package have a parent commit of the
base Debian package, and the Ubuntu packages on a separate branch also
rooted at the base package:</p>
<div class="highlight"><pre>git dsc-commit <base-revision .dsc>
git dsc-commit <latest-debian-version .dsc>
git checkout <base-revision tag>
git dsc-commit <Ubuntu version since base .dsc> # 0 or more times
git dsc-commit <current Ubuntu version .dsc>
</pre></div>
<p><code>git-dsc-commit</code> automatically tags revisions, but since <code>~</code> and <code>:</code> are
invalid in git tag names, <code>_</code> is substituted. So right now, I have to
correctly name the tag that <code>git-dsc-commit</code> used in the <code>git checkout</code>
call above.</p>
<p><code>git-dsc-commit</code> commits "3.0 (quilt)" source packages without patches
applied. I prefer to work with quilt patches directly if they need
refreshing or other changes made. Otherwise I just get noise in <code>.pc/</code>,
and it is difficult to rationalise any changes made back into the
separate quilt patches they belong to.</p>
<p>Note that <code>git-dsc-commit</code> commits the entire source package tree
exactly as it is. It is not like a normal commit, where logically you're
committing a change. Underneath, git commits are really snapshots, not
changesets, so <code>git-dsc-commit</code> just commits a snapshot identical to the
source package. For example, if you have made a change, then
from your point of view <code>git-dsc-commit</code> will effectively commit the
reverse of that change if necessary so that the result looks identical
to the source package you're importing.</p>
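<p>A toy model of this snapshot behaviour (directory names invented; the
real tool unpacks a <code>.dsc</code> with <code>dpkg-source</code> rather than
copying a directory):</p>

```shell
# Commit a snapshot identical to an "imported package" directory,
# regardless of whatever state the working tree was in before.
set -e
cd "$(mktemp -d)" && mkdir pkg-1.1 work
echo upstream-1.1 > pkg-1.1/f
cd work && git init -q
git config user.email demo@example.com && git config user.name demo
echo upstream-1.0 > f && git add -A && git commit -q -m 'import 1.0'
echo 'my local edit' > f            # a stray change of our own
# "Import" 1.1: force the tree to match the package exactly, then commit.
cp ../pkg-1.1/f . && git add -A && git commit -q -m 'import 1.1'
cat f    # the stray edit is gone: the commit is a snapshot of pkg-1.1
```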
<p>When this step is done, I have a git repository with imported source
packages in commits that mirror the Ubuntu divergence.</p>
<p>I find this point very useful in itself, since I can now easily compare
things. If I want to know whether two particular files differ between
specific Debian and Ubuntu source package versions, or how they are
different, or want a list of files in a particular <code>debian/</code>
subdirectory that have changed, then I just ask and git will tell me.
Querying for changes between arbitrary revisions and files is something
that git does very well.</p>
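<p>For example, with a couple of imported revisions tagged (this repository
and its version tags are invented for illustration):</p>

```shell
# Build a tiny stand-in for an imported repository, then query it.
set -e
cd "$(mktemp -d)" && git init -q
git config user.email demo@example.com && git config user.name demo
mkdir debian && echo 'rules v1' > debian/rules
git add -A && git commit -q -m '1.5.10-1' && git tag 1.5.10-1
echo 'rules v1 plus Ubuntu delta' > debian/rules
git commit -q -am '1.5.10-1ubuntu1' && git tag 1.5.10-1ubuntu1
# With imports tagged, arbitrary queries are one command away:
git diff 1.5.10-1 1.5.10-1ubuntu1 -- debian/rules    # one file's delta
git diff --stat 1.5.10-1 1.5.10-1ubuntu1 -- debian/  # which files changed
git log --oneline 1.5.10-1..1.5.10-1ubuntu1          # revisions in between
```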
<h2>Breaking down the delta into logical parts</h2>
<p>If I have already performed this next step for a previous upload, then a
simple <code>git am</code> against my saved work allows me to skip this step. I can
verify the result by diffing against the imported squashed equivalent.</p>
<p>I won't go into how to use <code>git rebase</code> here; I assume you know that.
For every commit I edit, I generally <code>git reset HEAD^</code> back to the
previous version, so all changes made in this particular source package
version become unstaged. Then I go through the changelog entries one by
one, staging only those changes (often using <code>git add -p</code>) and
committing them one by one.</p>
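<p>The mechanics of one such split look like this outside of the rebase
(a contrived example; in a real merge, <code>git add -p</code> picks hunks
apart when logical changes share a file):</p>

```shell
# "Unsquash" one commit into two logical commits in a scratch repository.
set -e
cd "$(mktemp -d)" && git init -q
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m 'previous version'
# One squashed commit touching two files stands in for an imported upload:
echo fix > fix.patch && echo rule > rules
git add -A && git commit -q -m 'squashed upload'
# Step back one commit, leaving its changes uncommitted...
git reset -q HEAD^
# ...then stage and commit each logical change separately:
git add fix.patch && git commit -q -m '  * Add fix.patch'
git add rules     && git commit -q -m '  * Adjust rules'
git log --oneline          # two logical commits replace the squashed one
```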
<p>The point of this step is to reflect what was logically present in an
already-uploaded source package, errors and all. Some notes:</p>
<ul>
<li>
<p>I generally aim to end up with commits that follow the same order as
the entries in <code>debian/changelog</code>.</p>
</li>
<li>
<p><code>git log --decorate</code> is useful here, since all the imported source
packages are tagged.</p>
</li>
<li>
<p>I make the commit message for each logical change identical to its
entry in <code>debian/changelog</code> where possible, including leading
whitespace and the <code>*</code>, <code>-</code> or <code>+</code> bullet points.</p>
</li>
<li>
<p>I make the <code>debian/changelog</code> file change for the entire upload a
separate commit at the end (most recent) for each source package
version.</p>
</li>
<li>
<p>If <code>update-maintainer</code> was run and thus modified <code>debian/control</code>, or
<code>VCS-*</code> entries changed to <code>XS-Debian-VCS-*</code> entries, I put this in a
separate logical commit with an "ubuntu-meta" commit message.</p>
</li>
<li>
<p>Quilt patches that exist only in Ubuntu involve logical commits that
add the file in <code>debian/patches/</code> and add a single line to
<code>debian/patches/series</code>. Patches remain unapplied. Similarly, for
other types of quilt patch modification, only changes to
<code>debian/patches/</code> end up in the commit.</p>
</li>
<li>
<p>This is the stage that I often find errors in the previously
documented changelog. Where this happens, I just figure out what
happened logically and try to commit something that matches. If
<code>debian/changelog</code> specified a change that was actually not present,
it doesn't get a logical commit, but the commit with the full
<code>debian/changelog</code> change does include the erroneous text.</p>
</li>
</ul>
<p>When done, it is trivial to run <code>git log -p</code> and check that all commits
match their description. I also run <code>git diff <tag></code> and verify that the
result is still identical to the source package import we started from
by checking that the reported diff is empty.</p>
<h2>Rebasing onto the newest Debian version</h2>
<p>Again, this should be straightforward to follow for git rebasers, and I
assume you know how to operate the details.</p>
<p>First, I drop any previous commits that changed <code>debian/changelog</code> only,
as well as any "ubuntu-meta" commits. Then something like <code>git rebase
--onto <new_debian_version> <base_version></code> does the job. If there are
conflicts, they can be handled during the rebase in the normal way.</p>
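<p>A contrived end-to-end example of that rebase step (versions and files
invented):</p>

```shell
# Rebase an "Ubuntu delta" from the base onto the latest Debian version.
set -e
cd "$(mktemp -d)" && git init -q
git config user.email demo@example.com && git config user.name demo
echo 1.0-1 > version && git add version
git commit -q -m 'base Debian version' && git tag base
echo 1.0-2 > version && git commit -q -am 'latest Debian version' && git tag debian
git checkout -q base
echo one > delta1 && git add delta1 && git commit -q -m '  * Ubuntu change one'
echo two > delta2 && git add delta2 && git commit -q -m '  * Ubuntu change two'
# Replay the Ubuntu delta onto the latest Debian version:
git rebase -q --onto debian base
git log --oneline debian..HEAD   # just the two rebased Ubuntu changes
```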
<p>While I'm doing this, I take notes of the changes I made so that I can
write up the changelog later. Where possible, I directly squash these
notes into the commit messages in the form of the future changelog
entry. If the rebase step drops commits because they have been applied
in Debian, then it's important to note these. git doesn't specifically
point these out except as they scroll past.</p>
<p>Next, I check that all quilt patches apply and are still correct, and do
any further editing required. This includes test builds, running dep8
tests, etc. As I do this, I use <code>git rebase</code> extensively again,
squashing the commits down into their original places and updating
commit messages (which will be the basis of the future changelog
message).</p>
<p>When doing test builds at this stage, I don't want to overwrite the
Debian source package in my parent directory, so I do have to insert a
temporary changelog entry or something. I haven't worked out a strong
pattern for this yet; sometimes I complete the remaining steps first to
avoid this issue, and rebase and squash any changes I needed back in.
<code>git-buildpackage</code> can probably help me here; I haven't looked into
integrating it into my workflow at this step yet.</p>
<p>It is important to note that this stage really uses git as an advanced
patchset editor. I am editing the patchset itself. I specifically do not
add new commits to the end, except temporarily before I squash them down
again.</p>
<p>When this step is complete, my commits start from the imported latest
Debian source package version, and show the logical delta (one logical
entry per commit) that will form the new Ubuntu upload. Changelog
entries exist only as commit messages; <code>debian/changelog</code> is not
modified at all yet.</p>
<h2>Updating the changelog</h2>
<p>Since an Ubuntu merge is expected to include a merged changelog, adding
to the Debian changelog will not do; we need to import all previous
Ubuntu changelog entries too.</p>
<p>Most of this could probably be automated more.</p>
<h3>Merging old changelog entries</h3>
<p>My tool <code>git-merge-changelogs</code> does this. Calling it as <code>git
merge-changelogs <base version tag> <latest Debian version tag> <current
Ubuntu version tag></code> fetches the changelog entries out of the imported
source packages, calls <code>dpkg-mergechangelogs</code> and writes out
<code>debian/changelog</code> in the working tree. Then I usually just <code>git commit
-mmerge-changelogs debian/changelog</code> to commit this step.</p>
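<p>The extraction underneath can be sketched like this; <code>git show
<rev>:<path></code> is what lets the tool read each imported changelog
without checking anything out (tag and entry invented; the real tool then
feeds three such extracts to <code>dpkg-mergechangelogs</code>):</p>

```shell
# Read a file as it appeared at an imported, tagged revision.
set -e
cd "$(mktemp -d)" && git init -q
git config user.email demo@example.com && git config user.name demo
mkdir debian && echo 'demo (1.0-1) unstable; urgency=low' > debian/changelog
git add -A && git commit -q -m 'import 1.0-1' && git tag base
git show base:debian/changelog > changelog.base
head -1 changelog.base
```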
<h3>Automatically creating new changelog entries</h3>
<p>Next, I need to add the changelog entry for the merge itself. I do this
with my tool <code>git-reconstruct-changelog</code>. Calling <code>git
reconstruct-changelog <latest Debian version tag></code> inserts the commit
messages into <code>debian/changelog</code>. Then I usually run <code>git commit
-mreconstruct-changelogs debian/changelog</code> to commit this step.</p>
<h3>Finishing the changelog</h3>
<p>Reconstructing the changelog will miss out the merge introduction, and
also will fail to mention any dropped changes since there are no commits
that correspond to these. Consulting my notes from earlier, I edit up
the changelog manually, fix any whitespace/wrapping issues, release it
with <code>dch -r ''</code> and commit it with <code>git commit -mchangelog
debian/changelog</code>.</p>
<h2>Other metadata</h2>
<p>Next I run <code>update-maintainer</code>, and then <code>git commit -mubuntu-meta
debian/control</code> to commit this step. Any <code>VCS-*</code> to <code>XS-Debian-VCS-*</code>
type translation goes into this commit, too.</p>
<h2>Uploading</h2>
<p>That's it. Since my working tree has no <code>.git</code> directory, I can just run
<code>debuild</code> as usual to create my source package ready for upload.</p>
<p>If there's a problem and I need to go round again, it's quite easy to
squash a change in where I need it, re-run <code>git reconstruct-changelog</code>
and edit the changelog, and rebuild the source package.</p>
<h2>Saving the logical delta for future use</h2>
<p>After upload, I make sure to save my logical delta by using <code>git
format-patch</code>. This allows me to reconstruct it quickly the next time I
merge the same package. There is no need for me to keep the git
repository around.</p>
<p>The patchsets I've saved this way don't always follow what I've written
here precisely, as I have taken a while to settle on it, and I still
deviate on a whim. It doesn't really matter though; by separating out
logical changes into separate commits, when I look at it the next time
it's easy to mould a patchset into whatever form I will need.</p>
<h1>Example of use</h1>
<p>This workflow allows me to handle any merge that is thrown at me,
however complex it may be. When I merged mysql-5.5 last cycle, it had
diverged considerably from Debian, but with much cherry-picking going
both ways. The sheer complexity of it, and the time necessary to figure
it all out, had put off developers before me from sorting it out.
Instead, some changes kept getting cherry-picked and other changes were
getting lost.</p>
<p>When I reconstructed the logical set of changes made in Ubuntu since we
diverged, I ended up with a branch of around 120 commits (IIRC). With
extensive rebasing, I ended up reducing this to 8 logical changes to
send to Debian, and just 4 commits remaining in the Ubuntu delta.
Importantly, I did this in a way that I could be confident about the
results, since I could easily verify my work.</p>
<p>I'm now going to do the same for mysql-5.6, and I'm much happier doing
it knowing that I can manage it this way.</p>
<h1>Future</h1>
<p>I have a local store of these logical delta patchsets. Currently this is
for apache2, facter, nginx, php5, subversion and vsftpd. If others want
to follow the same workflow, we should work out some way to share them.</p>
<p>And if many people find a git repository that follows Debian and Ubuntu
source packages useful, then perhaps we should set one of those up to
share, too, to save doing the import step.</p>
<p>I did have some code that auto-imported into git from UDD bzr, and
cached, so I could just <code>git clone</code> a UDD branch, but this is limited by
UDD's package import reliability, so I stopped using it. My
<code>git-dsc-commit</code> tool should always work. I have had to fix a number of
edge cases, but I am not aware of any that are outstanding.</p>
<h1>The End</h1>
<p>What I think I have here are the pieces needed to make merging Ubuntu
packages with git work. The workflow itself doesn't matter so much - you
can mix and match, and you should be fine.</p>Ubuntu Server Q&A Sessions2014-06-09T11:38:48+00:00Robie Basaktag:www.justgohome.co.uk,2014-06-09:blog/2014/06/ubuntu-server-qanda-sessions.html<p>As part of the Ubuntu Server team's participation in the <a href="http://fridge.ubuntu.com/2014/05/28/calling-for-ubuntu-online-summit-sessions/">Ubuntu Online
Summit</a>, we'll be running two Q&A sessions this week aimed at Ubuntu
Server users. We want to gather questions from the community both before
and during the event, so that users can get direct and authoritative
answers from those in the know.</p>
<p>This will be the first Ubuntu Online
separate Ubuntu Developer Summit and Ubuntu Open Week events. The event
will run online, with live video streaming and participation using
Google Hangouts and IRC. See the <a href="http://summit.ubuntu.com/uos-1406/">schedule</a> for details. Sessions
will be recorded and the videos will be available afterwards.</p>
<p>I suggest that questions be asked beforehand in any place we can grab
them (eg. <a href="https://lists.ubuntu.com/mailman/listinfo/ubuntu-server">the mailing list</a>, <a href="http://www.reddit.com/r/Ubuntu">r/Ubuntu</a>, etc), and then we can
best prepare answers for you.</p>
<p>Alternatively, and for more interactivity, there will be an IRC channel
that we will monitor during the sessions themselves.</p>
<h2>Ubuntu Server Security Q&A</h2>
<p>Currently scheduled for: Tuesday (2014-06-10) 19:00 - 19:55 UTC</p>
<p>Also known as "Security team reads mean tweets", Marc Deslauriers and
Seth Arnold of the Ubuntu Security Team will be on hand to answer all
your security questions.</p>
<p>If you want to know how to secure your freshly installed Ubuntu Server,
the pros and cons of various individual hardening approaches, how to
make sure that your system has received a particular security update, or
have any other question related to Ubuntu Server and security, this is
your opportunity to find out.</p>
<p>And of course, I presume everyone wants to hear the Ubuntu Security Team
read mean tweets!</p>
<h2>Ubuntu Server systemd Q&A</h2>
<p>Currently scheduled for: Wednesday (2014-06-11) 18:00 - 18:55 UTC</p>
<p>Following a long debate, Debian <a href="http://article.gmane.org/gmane.linux.debian.devel.ctte/5453">chose systemd</a> in February as the
default init system for its upcoming "jessie" release, and Mark has
<a href="http://www.markshuttleworth.com/archives/1316">confirmed</a> that Ubuntu will follow Debian and also switch to systemd
by default.</p>
<p>What are the implications of this move for Ubuntu Server users? Will
16.04 switch to systemd, and if so, when? When will Ubuntu Server users
need to rewrite all of their custom upstart jobs?</p>
<p>Dimitri John Ledkov, an Ubuntu Core Developer closely involved with the
systemd switch, will be online to answer your questions.</p>
<h2>Submitting questions</h2>
<p>We'll accept questions at any time, including during the event on IRC.
Or if you know what you'd like to ask already, please post them to
Reddit or to the Ubuntu Server mailing list and we will pick them up and
prepare answers for you.</p>
<h2>Schedule changes</h2>
<p>Please keep an eye on <a href="http://summit.ubuntu.com/uos-1406/">the schedule</a> in case it changes!</p>New in Ubuntu 14.04 LTS: PHP 5.52014-05-08T14:11:51+00:00Robie Basaktag:www.justgohome.co.uk,2014-05-08:blog/2014/05/new-in-14-04-php.html<p>Ubuntu 14.04 LTS ships with PHP 5.5, which is a significant upgrade over
PHP 5.3 as found in 12.04.</p>
<p>PHP 5.5 actually first appeared in 13.10, though of course if you intend
to do an LTS to LTS upgrade, you won't notice this until now.</p>
<p>PHP upstream introduced some incompatibilities in this update, and
recommend testing before upgrading production environments. For more
details, see the <a href="http://php.net/manual/en/migration55.changes.php">PHP migration
guide</a>.</p>
<h2>Getting PHP</h2>
<p>PHP 5.5 is available in the <a href="https://launchpad.net/ubuntu/+source/php5">main Ubuntu repository for
Trusty</a>, so <code>apt-get install
php5-cli</code> (or <code>libapache2-mod-php5</code>, etc) will suffice. Ondřej Surý, the
primary Debian PHP maintainer, also maintains a series of PPAs on
Launchpad if you need a different version. Taking a quick look at
<a href="https://launchpad.net/~ondrej">Ondřej's Launchpad page</a>, I see:</p>
<ul>
<li><a href="https://launchpad.net/~ondrej/+archive/php5-oldstable">PPA for PHP 5.4</a></li>
<li><a href="https://launchpad.net/~ondrej/+archive/php5">PPA for PHP 5.5</a></li>
<li><a href="https://launchpad.net/~ondrej/+archive/php5-5.6">PPA for PHP 5.6</a></li>
</ul>
<p>Thanks to Ondřej for kindly taking the time to maintain these PPAs for
the community.</p>
<h2>Help Wanted</h2>
<p>Ondřej Surý, the Debian maintainer of PHP, posted a <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=664595">request for
help</a> with PHP
Debian packaging. If you are an available Debian developer or
maintainer, please join him. Helping with Debian PHP packaging also
helps Ubuntu, since Ubuntu's PHP packages are heavily based on Debian's
work.</p>
<h2>Upstream dissonance</h2>
<p>There are a few areas in which multiple distributions (at least the
Fedora and Debian families) patch PHP, but these patches have not been
taken upstream. Perhaps I'm mistaken, but I believe that most PHP users
consume PHP through a distribution, so it does not seem ideal that
multiple distributions carry these patches instead of their changes
being adopted upstream directly. I'd love to see better collaboration
with upstream in these areas so that these particular distribution
patches become unnecessary.</p>
<p>I mention some of these differences here since the differences appear to
have affected some Ubuntu users.</p>
<h3>JSON module</h3>
<p>PHP upstream ship a JSON module whose licence is not considered
acceptable to distributions. This was reported and accepted in
<a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=692613">Debian bug
692613</a>, and
<a href="https://fedoraproject.org/wiki/Licensing:Main?rd=Licensing#Bad_Licenses">Fedora also
concurs</a>.</p>
<p>As a workaround, Remi Collet from Fedora removed the JSON module, but
arranged to ship a PECL module that builds JSON from an external,
API-compatible source that is not based on the dubiously licensed
upstream code. Debian fixed the licensing problem by doing the same thing.</p>
<p>Ubuntu follows the Debian lead by default, so Trusty also ships without
the upstream core PHP module. Instead, since Trusty, the <code>php5-json</code>
module is automatically brought in as a dependency. Users will still see
an API-compatible JSON module available; it is just an alternative
implementation that is more acceptable to Debian (and to Fedora).</p>
<p>Note that there is a claim in <a href="https://launchpad.net/bugs/1287726">bug
1287726</a> that there exist edge cases
where behaviour has changed. This bug is waiting on a volunteer to fix
behaviour in the alternative JSON implementation. It sounds like this
other upstream would be happy to take the change. This is an unfortunate
consequence of the situation, but at least there are no fundamental
disagreements on how to fix this. It is "just a bug" with a simple
technical solution; we just need an expert in the area to take a look
and fix it.</p>
<p>Some background: the <a href="https://www.debian.org/social_contract#guidelines">Debian Free Software Guidelines
(DFSG)</a> ensures that
Debian is free to install and use by everyone, without restriction.
Ubuntu is primarily based on Debian, so it inherits this status. Much of
the hard work to make this a reality has been done by the <a href="https://ftp-master.debian.org/">Debian FTP
masters</a>, who have painstakingly reviewed
every package in Debian (and consequently most of the packages in
Ubuntu) to ensure that this is the case. So all software shipped by
Debian must comply with the DFSG, and the PHP JSON module does not meet
this requirement.</p>
<p>The other side of this argument is that the PHP JSON module is too
important to compromise quality over a petty licensing issue. Why can't
distributions just ignore the nonsensical clause?</p>
<p>This issue has brought out many angry people on both sides of the
debate, and there have been some flames over this. I would like to
remind readers of the <a href="http://www.ubuntu.com/about/about-ubuntu/conduct">Ubuntu Code of
Conduct</a>:</p>
<blockquote>
<p>Disagreements, social and technical, are normal, but we do not allow
them to persist and fester leaving others uncertain of the agreed
direction.</p>
<p>We expect participants in the project to resolve disagreements
constructively. When they cannot, we escalate the matter to structures
with designated leaders to arbitrate and provide clarity and
direction.</p>
</blockquote>
<p>In Ubuntu, we make decisions based on consensus, or (rarely) through
leadership when necessary. Right now, Ubuntu has not specifically made
any decision on this point. We are simply following Debian by default,
which is what we do unless we have particular reason to diverge.</p>
<p>Please be constructive about this issue. Remember: Ubuntu is a community
project, and we have a well-defined path for making decisions. The
appropriate avenue for making a change in Ubuntu on this point is first
to try to achieve consensus (e.g. on mailing lists), and failing that, to
take the matter to the <a href="https://wiki.ubuntu.com/TechnicalBoard">Ubuntu Technical
Board</a>.</p>
<p>More discussion can be found in the <a href="https://bugs.php.net/bug.php?id=63520">upstream
bug</a> and <a href="http://www.reddit.com/r/PHP/comments/1ksnzw/php_json_removed_in_php_55/">reddit
thread</a>.</p>
<h3>Timezone handling</h3>
<p>In a distribution, users expect to set a system setting once and have
all applications pick up that setting automatically. This includes, for
example, the system timezone setting.</p>
<p>This principle also applies to system timezone updates. When daylight
savings time rules change in some country or other, users expect to have
these picked up in a single system update, and for all applications to
immediately use this new data.</p>
<p>PHP upstream disagree on this point. PHP as shipped by upstream requires
the system timezone to be manually set in <code>php.ini</code>. PHP also uses its
own inbuilt timezone rule database, so users do not receive rule updates
without also bumping the PHP version with a new upstream tarball.</p>
<p>So Fedora, Debian, Ubuntu (and others?) all patch PHP to make it use the
system timezone setting and the system timezone database.</p>
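<p>You can verify the patched behaviour on your own machine. This is a hedged sketch: it assumes <code>/etc/timezone</code> holds the system setting (as on Debian and Ubuntu) and that the <code>php</code> command-line binary is available, degrading gracefully where either is missing.</p>

```shell
# With the distribution patch, PHP should report the system timezone
# without any date.timezone entry in php.ini.
system_tz=$(cat /etc/timezone 2>/dev/null || echo "unknown")
[ -n "$system_tz" ] || system_tz="unknown"
if command -v php >/dev/null 2>&1; then
    php_tz=$(php -r 'echo date_default_timezone_get();' 2>/dev/null || true)
    [ -n "$php_tz" ] || php_tz="unknown"
else
    php_tz="php-not-installed"
fi
echo "system: $system_tz php: $php_tz"
```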
<p>More information is available in a <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=618462#10">comment in Debian bug
618462</a>.</p>
<p>As upstream releases changed, we had a couple of regressions in the
past in <a href="https://launchpad.net/bugs/1069529">bug 1069529</a> and <a href="https://launchpad.net/bugs/1244343">bug
1244343</a>. I managed to sort these
out (and submitted patches to Debian also), so I think we now have this
functionality properly nailed in Trusty.</p>
<p>As all distributions are carrying this patch (written originally by Remi
Collet of Fedora, I think), it would be nice if upstream PHP could take
this patch too. But my understanding based on the Debian bug above is
that they are reluctant.</p>
<h3>libgd</h3>
<p>PHP includes a <a href="http://www.libgd.org/">GD</a> module, which provides an API
to various graphics operations. PHP upstream bundle this library. But
distributions prefer to ship a single copy of a library as a separate
package, rather than using the copies bundled inside each upstream's
source.</p>
<p>The reasons for this are well documented in <a href="https://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries">Fedora's No Bundled
Libraries
Policy</a>
and in <a href="https://wiki.debian.org/UpstreamGuide#No_inclusion_of_third_party_code">Debian's Upstream
Guide</a>,
so I won't repeat their justifications here.</p>
<p>In PHP's case, they effectively forked GD by adding functions that were
not available upstream, and then made these functions available in the
PHP GD API. Since distributions used the upstream GD library, PHP
programs written against this API failed to work on PHP as shipped by
distributions, because they called functions that were simply not
there.</p>
<p><a href="https://launchpad.net/bugs/74647">Bug 74647</a> contains the details. I
believe this is mostly a solved problem now, as PHP upstream have worked
to get all of the required functions accepted in GD upstream.
One report suggests that a single function (<code>imagerotate</code>)
remains unavailable upstream, and thus missing from distributions that
ship PHP.</p>
<p>I am pleased to note that it does appear that upstream are working on
the problem, so I hope that this issue will completely go away in a
future release.</p>
<h2>Thanks</h2>
<p>Thanks to Ondřej Surý for his relentless effort in keeping PHP
maintained in Debian, for monitoring PHP bugs in Ubuntu, and for
maintaining PPAs for PHP in Ubuntu.</p>
<h2>Getting help</h2>
<p>As always, see Ubuntu's main page on <a href="http://www.ubuntu.com/support/community">community support
options</a>.
<a href="https://askubuntu.com">askubuntu.com</a>, <code>#ubuntu-server</code> on IRC
(Freenode) and the <a href="https://lists.ubuntu.com/mailman/listinfo/ubuntu-server">Ubuntu Server mailing
list</a> are
appropriate venues.</p>New in Ubuntu 14.04 LTS: nginx 1.4 in main2014-05-01T17:04:16+00:00Robie Basaktag:www.justgohome.co.uk,2014-05-01:blog/2014/05/new-in-14-04-nginx.html<p>Ubuntu 14.04 LTS ships with nginx 1.4, which is now in main for the
first time. Packages in main are covered by the Ubuntu Security Team and
generally receive particular focus and attention in Ubuntu. This brings
nginx up to par with Apache as a first class citizen in Ubuntu.</p>
<p>This move also led us to closer collaboration with nginx upstream. This
is great to see happening in Ubuntu, and can only help to improve
quality in our ecosystem.</p>
<p>Note that it is only <code>nginx</code>, <code>nginx-core</code> and the other support
packages <code>nginx-doc</code> and <code>nginx-common</code> that are in main. The other
packages (extras, full, light, naxsi etc) contain third party plugins
and thus remain in universe. See below for details.</p>
<h2>Background</h2>
<p>Thomas Ward had been looking after the nginx packages in Ubuntu for
quite a while, so when I received requests to get nginx into main, I
made sure to contact him. One requirement for main inclusion is a team
commitment to look after the package. We concluded that Thomas would
carry on looking after nginx in general in Ubuntu, but that the rest of
the Ubuntu Server Team would be able to back him as necessary.</p>
<p>Following <a href="http://www.jorgecastro.org/2014/01/02/nginx-coming-to-main-in-14-dot-04/">Jorge's blog post about nginx plans for
main</a>,
Sarah Novotny from nginx upstream contacted us to see how we might be
able to collaborate. We are now all in touch so that we can work
together to make the nginx experience better for Ubuntu users. I made
sure that we all connected with the Debian nginx team also.</p>
<p>Thomas also <a href="http://lordoftime.info/?p=122">blogged about nginx in main</a>
as soon as it landed.</p>
<h2>Packaging notes</h2>
<p>There are a couple of notable differences in Ubuntu's nginx packaging
(inherited from Debian):</p>
<ol>
<li>
<p>The default path served is not <code>/var/www/</code> like it is with Apache.
Instead, it is <code>/usr/share/nginx/html/</code>. This directory contains the
<code>index.html</code> file that is served by default. However, <code>/usr/share/</code>
is not a suitable location to place your own files to serve, since
this area is maintained by packaging. Instead, you should configure
nginx to serve from a different path, and then use that. According
to the <a href="http://www.pathname.com/fhs/">Filesystem Hierarchy
Standard</a>,
<a href="http://www.pathname.com/fhs/pub/fhs-2.3.html#SRVDATAFORSERVICESPROVIDEDBYSYSTEM">/srv</a>
is a suitable path to use for this.</p>
<p>Placing your own files in <code>/usr/share/nginx/html/</code> is dangerous, as
they can be arbitrarily overwritten by package upgrades. This
unfortunate behaviour has been reported in <a href="https://launchpad.net/bugs/1194074">bug
1194074</a>, and there has been
some discussion in <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730382">Debian bug 730382</a>. But since
the default document root was a deliberate choice by the Debian nginx
maintainers, there isn't yet any solution to stop users falling into
this trap, except to know about it. So please heed this warning and make
sure that you change your document root appropriately.</p>
</li>
<li>
<p>The nginx daemon does not start by default as soon as the package is
installed. You must do this by hand using <code>service nginx start</code>.
This makes sense since you will usually need to reconfigure nginx to
use a different document root first (see the previous point).</p>
</li>
</ol>
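<p>The two points above can be sketched together. This is a hedged illustration, not the packaging's own tooling: a temporary directory stands in for <code>/srv/www/example</code> so the example runs unprivileged, and the site name and paths are purely illustrative.</p>

```shell
# Create your own document root outside /usr/share, per the FHS.
docroot=$(mktemp -d)/www/example
mkdir -p "$docroot"
printf 'Hello from my own document root\n' > "$docroot/index.html"
# On a real system, the matching nginx site configuration (e.g. in
# /etc/nginx/sites-available/default) would contain, roughly:
#     server {
#         listen 80 default_server;
#         root /srv/www/example;
#         index index.html;
#     }
# followed by:  sudo service nginx start   (or restart)
ls "$docroot"
```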
<h2>Adjustments for main inclusion</h2>
<p>A requirement for main inclusion in Ubuntu is quality and
maintainability from a security perspective. The security team reviewed
nginx and passed this requirement for nginx itself,
<a href="https://bugs.launchpad.net/ubuntu/+source/nginx/+bug/1262710/comments/6">noting</a>
that "Nginx is high-quality legible code, excellent explanatory comments
and platform notes, very useful utility functions, and defensive error
checking and logging".</p>
<p>However, some third party modules shipped with nginx in Debian varied in
quality, so did not pass this requirement for main inclusion.</p>
<p>Since nginx does not currently support dynamically loadable modules, it
is not possible for binary distributions such as Debian and Ubuntu to
independently build plugin modules using separate source packages.
Because module selection happens at build time, users cannot choose the
precise set of modules they want enabled in their nginx binaries, or add
modules written by third parties afterwards, as the distribution has
already built the nginx binaries.</p>
<p>So instead, Debian supplies a selection of third party modules as part
of the nginx packaging. This results in binary packages such as
<code>nginx-light</code>, <code>nginx-full</code> and <code>nginx-extras</code>, so that users can at
least pick from a list of predefined sets of modules, which include
common third party modules.</p>
<p>Since third party modules could not be included in main in Ubuntu, a new
binary package <code>nginx-core</code> was created which contains only modules
supplied in the nginx source itself. It is <code>nginx-core</code>, generated from
the nginx source only, and related support packages <code>nginx</code>,
<code>nginx-common</code>, and <code>nginx-doc</code>, that were promoted to main.</p>
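<p>You can see this split on your own system by asking apt which binary packages the nginx source builds. A hedged sketch, assuming standard apt tooling and a populated source cache, and degrading to a message otherwise:</p>

```shell
# List the binary packages built from the nginx source package
# (nginx-core, nginx-full, nginx-light, nginx-extras, etc.).
if command -v apt-cache >/dev/null 2>&1; then
    binaries=$(apt-cache showsrc nginx 2>/dev/null | grep -m1 '^Binary:' || true)
    [ -n "$binaries" ] || binaries="nginx source entry not in apt cache"
else
    binaries="apt-cache not available"
fi
echo "$binaries"
```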
<h2>nginx 1.6</h2>
<p>nginx 1.6 was released on 24 April, which was a week after the release
of 14.04 LTS. This means that it will not be available as part of Ubuntu
except in future releases. If you need nginx 1.6 on 12.04 or 14.04, you
can use a PPA or the upstream-provided package repository. Read on for
details.</p>
<h2>Multiple package sources</h2>
<p>You now have a variety of sources for nginx packages. You can install
nginx from the Ubuntu repository itself, use the <a href="https://launchpad.net/~nginx">Launchpad nginx team
PPA</a> or use the <a href="http://nginx.org/en/linux_packages.html">packages provided by
nginx upstream</a>.</p>
<p>When deciding which source to use, I suggest that you consider the
differences in release management, how security updates are handled and
by whom, and your deployment's external repository dependencies.</p>
<p>This sort of choice in package repository source seems to be becoming
increasingly common for key packages as the Free Software ecosystem
continues to develop. Ubuntu Server LTS releases remain the stable,
solid ground that production server deployments are built on, but the
faster development pace of upstreams means that there is constant demand
for newer upstream releases to be made available in older LTS releases.</p>
<p>Here are my own personal opinions on the pros and cons of the different
approaches, from an nginx perspective.</p>
<h3>nginx from Ubuntu itself</h3>
<p>nginx as shipped within Ubuntu follows the Ubuntu release cycle and
release management. You get the version available at the time the Ubuntu
release you're using entered feature freeze, with only high-impact
bugfixes and security updates issued as updates, as curated by the
Ubuntu Server Team and the Ubuntu Security Team.</p>
<p>This provides a stable platform, where by stable I mean that the package
does not functionally change in the lifetime of the Ubuntu release. From
a production perspective, this means that if you successfully deployed
last week, you can have maximum confidence in performing an identical
redeployment next week. If your workflow is to have a validated,
consistently reproducible deployment, then this approach minimises the
chance of your deployment regressing, by not changing it. More
information on Ubuntu's stable release policy can be found on the
<a href="https://wiki.ubuntu.com/StableReleaseUpdates">Stable Release Updates
page</a>.</p>
<p>The trade-off to this release stability is that the latest and greatest
is not available, except through six-monthly non-LTS releases, whose use
is less common on production servers.</p>
<p>You can see which nginx version ships with which Ubuntu release on the
<a href="https://launchpad.net/ubuntu/+source/nginx">Ubuntu nginx package page</a>.
If the version of nginx shipped with Ubuntu is suitable for your needs,
then I recommend using this option.</p>
<h3>nginx from the Launchpad nginx team PPAs</h3>
<p>The <a href="https://launchpad.net/~nginx">Launchpad nginx team PPAs</a> are mainly
maintained by Thomas Ward nowadays, who is the same person who generally
looks after the nginx packaging in Ubuntu itself. You have a choice of
two PPAs, "stable" and "mainline", which follow the two lines of upstream
development.</p>
<p>The version of nginx in these PPAs moves along with upstream releases.
For example: if you installed nginx 1.4 from this repository on 12.04
previously, it would automatically have upgraded to 1.6 when you
performed your regular system updates after the PPA was updated to 1.6.
This effectively gives you a "rolling release" of the latest nginx, but
based on the stable release of Ubuntu 12.04 LTS.</p>
<p>The advantage of this approach is that you get the latest version of
nginx, assuming that this matters to you. For example, if you need a
more recent feature that is not present in the version of nginx shipped
with the latest LTS release of Ubuntu, then this is useful.</p>
<p>If you want to continue to have the latest version, then this option
will work well for you.</p>
<p>However, if you want the latest version but then want it to stay
unchanged, this option is dangerous: once you stop updating your system
from the PPA, you will no longer receive security updates, and bugfixes
will generally not be available to you unless you also bump to the
latest release version.</p>
<p>Assuming that you do stay up-to-date with the PPA, then instead of
managing regression risk by not changing things, regression risk must
now necessarily be managed by the nginx upstream team's QA process. In
this case, I know that they do have a <a href="http://hg.nginx.org/nginx-tests/">comprehensive test
suite</a> that they run before release,
but clearly the regression risk is higher than the approach of not
changing anything at all.</p>
<p>Also, note that as a PPA this does not receive the attention of the
Ubuntu Security Team. Thomas is very good at keeping this PPA up to
date, but clearly PPA maintenance primarily by one person does have a
very low <a href="http://en.wikipedia.org/wiki/Bus_factor">bus factor</a> in the
context of timely updates.</p>
<h3>nginx packages from upstream</h3>
<p><a href="http://nginx.org/en/linux_packages.html">nginx upstream also publish package repositories for
nginx</a>. The trade-offs in using
these are quite similar to using the Launchpad nginx team PPA from a
release management perspective.</p>
<p>The key difference is in packaging. nginx upstream packaging is designed
to appear largely the same regardless of which distribution you are
using, for more consistency across distribution families. This is
different from the nginx distribution packaging, which is generally
designed to follow the patterns commonly used across the Debian and
Ubuntu distributions. So if you move from distribution packaging to
upstream packaging or vice versa, you will probably need to adapt your
deployment configuration.</p>
<p>I assume that the bus factor for timely updates here is much higher than
for the PPA, since this repository is managed by the larger upstream
nginx team. Security updates generally originate from upstreams anyway,
so all nginx repositories are to some extent reliant on the upstream
nginx team for updates, regardless of their direct source.</p>
<p>Note that if you choose this option, your deployment will additionally
rely on the availability of the upstream nginx package archive. If you
use many upstream repositories for many different components in your
deployment, then this magnifies to many points of failure. You can
mitigate this risk entirely by mirroring the packages you are using. Of
course, this type of deployment dependency also applies to anything
based on the Ubuntu archive, in that your deployment already has a
dependency on the Ubuntu archive if you are not mirroring the Ubuntu
packages you use. I think that how you consider this trade-off, or
whether it is even a trade-off at all, is a matter of opinion.</p>
<h3>Backports</h3>
<p>I will note that the <a href="https://help.ubuntu.com/community/UbuntuBackports">Ubuntu Backports
repository</a> exists,
but it is not currently used for nginx, so I will not discuss this
option further here.</p>
<h2>Getting help</h2>
<p>As always, see Ubuntu's main page on <a href="http://www.ubuntu.com/support/community">community support
options</a>.
<a href="https://askubuntu.com">askubuntu.com</a>, <code>#ubuntu-server</code> on IRC
(Freenode) and the <a href="https://lists.ubuntu.com/mailman/listinfo/ubuntu-server">Ubuntu Server mailing
list</a> are
appropriate venues.</p>
<h2>Thanks</h2>
<p>A big shout out is due to Thomas Ward, who has been looking after both
nginx in Ubuntu and the Launchpad nginx team PPA for quite a while now.
Thomas was pivotal in getting nginx into main, and <a href="http://lordoftime.info/?p=122">blogged about
it</a> when it landed.</p>
<p>Thanks also to the Debian nginx packaging team. Ubuntu's nginx packaging
is based on their hard work.</p>
<p>And to Sarah Novotny of nginx upstream, for reaching out and
collaborating with us to help make the nginx experience of Ubuntu users
better.</p>New in Ubuntu 14.04: Apache 2.42014-04-24T13:13:10+00:00Robie Basaktag:www.justgohome.co.uk,2014-04-24:blog/2014/04/new-in-14-04-apache.html<p>Ubuntu 14.04 ships with Apache 2.4, which is a significant upgrade over
Apache 2.2 as found in 12.04.</p>
<p>Apache 2.4 actually first appeared in 13.10, though of course if you
intend to do an LTS to LTS upgrade, you won't notice this until now.</p>
<p>If you have a default configuration, then everything should upgrade
automatically.</p>
<p>Of course, server deployments typically do not run on defaults. In this
case, there are significant changes of which you should be aware. Expect
the apache2 postinst script to fail to restart Apache after the upgrade.
You'll need to fix up your own customisations to meet the requirements
in Apache 2.4 and then run <code>sudo dpkg --configure -a</code> and <code>sudo apt-get
-f install</code> to recover. Be sure to back up your system before you begin.</p>
<p>Instead of upgrading, you may want to consider this as an opportunity to
enter the new world of automated deployments. Codify your deployment,
and then test and deploy a fresh instance of Apache on 14.04 instead,
using virtual machines as needed. This is far less stressful than trying
to upgrade an existing production system!</p>
<h2>Upstream changes</h2>
<p>You will need to update any custom configuration according to latest
upstream configuration syntax.</p>
<p>See upstream's document "<a href="http://httpd.apache.org/docs/2.4/upgrading.html">Upgrading to 2.4 from
2.2</a>" for details of
required configuration changes. Authorization and access control
directives have changed, and will likely need adjustment. Various
defaults have also changed.</p>
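<p>The single most common change is that the old <code>Order</code>/<code>Allow</code> access-control directives become <code>Require</code> directives in 2.4. The sketch below shows a mechanical pass over a throwaway copy of a config fragment; this is a starting point only, and every change should be reviewed by hand against the upstream upgrade guide.</p>

```shell
# Apache 2.2:  Order allow,deny / Allow from all
# Apache 2.4:  Require all granted
conf=$(mktemp)
printf 'Order allow,deny\nAllow from all\n' > "$conf"
sed -i -e 's/^Allow from all/Require all granted/' \
       -e '/^Order allow,deny/d' "$conf"
cat "$conf"
```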
<h2>Significant packaging changes</h2>
<p>The default path to served files has changed from <code>/var/www</code> to
<code>/var/www/html</code>, mainly for security reasons. See the debian-devel
thread "<a href="https://lists.debian.org/debian-devel/2012/04/msg00301.html">Changing the default
document root for HTTP
server</a>" for
details.</p>
<p>The packaging has been overhauled quite significantly.
<code>/etc/apache2/conf.d/</code> is now <code>/etc/apache2/conf-available/</code> and
<code>/etc/apache2/conf-enabled/</code>, to match the existing <code>sites-enabled/</code> and
<code>mods-enabled/</code> mechanisms.</p>
<p>Before you upgrade, I suggest that you first make sure that everything
in <code>/etc/apache2/*-enabled</code> is correctly a symlink to the corresponding
file in <code>/etc/apache2/*-available</code>. Note that all configurations in
sites-enabled and conf-enabled need a <code>.conf</code> suffix now.</p>
<p>Make use of the <code>a2enmod</code>, <code>a2ensite</code>, <code>a2enconf</code> series tools! These
help you easily manage the symlinks from <code>*-available</code> to <code>*-enabled</code>.</p>
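<p>What these tools manage is, in essence, symlinks from <code>*-available</code> into <code>*-enabled</code>. The sketch below mimics the mechanism in a temporary directory rather than the real <code>/etc/apache2</code> tree, so it is an illustration of the layout, not a replacement for the tools themselves.</p>

```shell
# "a2enconf security" is roughly equivalent to creating this symlink.
apache_dir=$(mktemp -d)
mkdir -p "$apache_dir/conf-available" "$apache_dir/conf-enabled"
printf 'ServerTokens Prod\n' > "$apache_dir/conf-available/security.conf"
ln -s ../conf-available/security.conf "$apache_dir/conf-enabled/security.conf"
# Note the mandatory .conf suffix on anything under conf-enabled/.
ls -l "$apache_dir/conf-enabled/"
```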
<p>See <a href="http://sources.debian.net/src/apache2/2.4.9-1/debian/apache2.NEWS">Debian's apache2 packaging NEWS
file</a>
for full details.</p>
<h2>Other Notes</h2>
<p>Debian changed the default "It works!" page into a comprehensive page
explaining where to go after an initial installation. Initially, I
imported this into Ubuntu without noticing this change. Thank you to
Andreas Hasenack for pointing out that the page referred to Debian and
the Debian bug tracker in a misleading way, in <a href="https://launchpad.net/bugs/1288690">bug
1288690</a>. I fixed this in Ubuntu by
essentially doing a <code>s/Debian/Ubuntu/g</code> and crediting Debian
appropriately instead.</p>
<h2>Thanks</h2>
<p>I think the Apache 2.4 packaging is a shining example of complex
packaging done well. All credit is due to Stefan Fritsch and Arno Töll,
the Debian maintainers of the Apache packaging. They have done the bulk
of the work involved in this update.</p>
<h2>Getting help</h2>
<p>As always, see Ubuntu's main page on <a href="http://www.ubuntu.com/support/community">community support
options</a>.
<a href="https://askubuntu.com">askubuntu.com</a>, <code>#ubuntu-server</code> on IRC
(Freenode) and the <a href="https://lists.ubuntu.com/mailman/listinfo/ubuntu-server">Ubuntu Server mailing
list</a> are
appropriate venues.</p>Mailcap, HTML and AppArmor2014-02-09T01:34:14+00:00Robie Basaktag:www.justgohome.co.uk,2014-02-09:blog/2014/02/mailcap-html-apparmor.html<p>Fed up of being unable to read legitimate <code>multipart/alternative</code> emails
in mutt because of badly formatted <code>text/plain</code> sections, I set about
fixing my mailcap to have mutt show me the <code>text/html</code> section rendered
sensibly.</p>
<p>The usual way to do this is to have the mailcap call out to lynx to
convert the HTML into text, and then have mutt display this instead. In
my case, I chose links, since by default it renders tables better.</p>
<p>Trouble is, many emails are malicious. They may contain <a href="http://en.wikipedia.org/wiki/Web_bug">web
bugs</a> or attempt to exploit my
HTML viewer in some way. How secure is links' (or lynx's) <code>-dump</code> mode
in protecting me from this? I wasn't sure, and it seemed like
unnecessary attack surface to me.
<a href="https://help.ubuntu.com/community/AppArmor">AppArmor</a> to the rescue!</p>
<p>With a custom AppArmor profile, I can arrange for my mailcap to run
links constrained. I only need the program to run, read the input file,
and write to stdout. No need for it to have network capability or access
any other files. So why let it?</p>
<p>To arrange this, I had to edit three files.</p>
<p>The first file is the AppArmor profile itself. I created
<code>/etc/apparmor.d/links-local</code> to define a non-attaching profile as
follows:</p>
<div class="highlight"><pre><span class="cp">#include</span> <span class="cpf"><tunables/global></span><span class="cp"></span>
<span class="n">profile</span> <span class="n">links</span><span class="o">-</span><span class="n">local</span> <span class="p">{</span>
<span class="cp">#include</span> <span class="cpf"><abstractions/base></span><span class="cp"></span>
<span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">links</span><span class="o">-</span><span class="n">local</span><span class="o">-*</span> <span class="n">r</span><span class="p">,</span>
<span class="p">}</span>
</pre></div>
<p>For now, I loaded the profile with the command <code>sudo apparmor_parser -r
/etc/apparmor.d/links-local</code>. I think this will automatically load on
reboot; I will find out when I reboot next.</p>
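<p>Rather than waiting for a reboot, you can check whether the profile is loaded via the kernel's AppArmor interface. A hedged sketch: reading this file typically needs root, and the check degrades gracefully where AppArmor is absent.</p>

```shell
# Look for the links-local profile among the loaded profiles.
if [ -r /sys/kernel/security/apparmor/profiles ]; then
    status=$(grep -c '^links-local' /sys/kernel/security/apparmor/profiles || true)
else
    status="apparmor interface not readable"
fi
echo "profile check: $status"
```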
<p>The <code>#include <abstractions/base></code> line pulls in the standard definitions
that allow regular programs to run, including things like loading shared
libraries. These depend on the tunables I pulled in earlier. Finally, I
permit the profile to read files in /tmp matching a fixed prefix, which
I will write with a wrapper. The prefix should help to ensure that the
profile cannot even read some other file in /tmp.</p>
<p>The next file is a wrapper I will use to set up the file in /tmp, and
then run links constrained with the profile I defined above. I put this
in <code>~/bin/links-local</code>:</p>
<table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre>1
2
3
4
5
6
7
8</pre></div></td><td class="code"><div class="highlight"><pre><span class="ch">#!/bin/sh</span>
<span class="nb">set</span> -e
<span class="nv">tmpfile</span><span class="o">=</span><span class="sb">`</span>mktemp --tmpdir links-local-XXXXXXXXXX<span class="sb">`</span>
clean<span class="o">()</span> <span class="o">{</span> rm -f <span class="s2">"</span>$<span class="s2">tmpfile"</span><span class="p">;</span> <span class="o">}</span>
<span class="nb">trap</span> clean EXIT
cp <span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span> <span class="s2">"</span>$<span class="s2">tmpfile"</span>
aa-exec -p links-local -- links -dump -force-html <span class="s2">"</span>$<span class="s2">tmpfile"</span>
</pre></div>
</td></tr></table>
<p>Note that I use <code>mktemp</code> for secure /tmp file creation, named using a
template that my profile will permit. Then I call <code>aa-exec</code> to constrain
links with the profile I loaded earlier.</p>
<p>Finally, I added this entry to <code>~/.mailcap</code>:</p>
<div class="highlight"><pre>text/html; links-local '%s'; copiousoutput; description=HTML Text
</pre></div>
<p>(<code>~/bin/</code> is already in my <code>PATH</code>)</p>
<p>That's it. Now, from mutt, I can view a <code>text/html</code> section and it'll be
rendered securely and appear directly in mutt's pager. I am protected in
two ways. First, I hope that links is doing the right thing securely,
anyway. Second, even if it is compromised in some way, AppArmor will
constrain any compromise from being able to do much on my computer.</p>
<p>Incidentally, I noticed that links does try to do some things that are
being blocked by AppArmor. I can see this in <code>dmesg</code> output. It mainly
seems to relate to <code>~/.links2</code> but doesn't seem to stop the program from
operating correctly. This is fine with me. I have no need for it to be
touching my home directory when called like this.</p>Continuous integration of your Ubuntu-based server deployments2013-09-19T10:45:11+00:00Robie Basaktag:www.justgohome.co.uk,2013-09-19:blog/2013/09/continuous-integration-of-your-ubuntu-based-server-deployments.html<p>Yesterday I posted a call for testing to the <a href="https://lists.ubuntu.com/mailman/listinfo/ubuntu-server">ubuntu-server
list</a>:</p>
<blockquote>
<p>There has been much work in Debian since wheezy was released, including
a major transition to Apache 2.4[1]. The maintainers used this
opportunity to overhaul the packaging, which also affected dependencies
such as PHP[2].</p>
<p>Ubuntu has picked this up. We're now well into feature freeze, and
expect to release in October with Apache 2.4 and PHP 5.5.</p>
<p>I have done some testing. Everything seems to work, but I am aware that
users do quite radically different things with their Apache
configurations.</p>
<p>So if you intend to use Apache and/or PHP in a future release, please
take some time to check that your use cases still work on our current
development release (Saucy).</p>
<p>If you have automated deployments and testing, then please adapt, run
and test your deployments against the development release. This will put
you in great shape for our upcoming LTS release next year.</p>
<p>It would be great to identify and fix any issues before release, when
bugs are much easier to fix.</p>
</blockquote>
<p>I know that there are many users who use Ubuntu Server as an easy
platform on which to deploy a LAMP stack. It would be a shame if we
broke something and were unaware of it. Both success and failure reports
are appreciated.</p>
<p>A generally accepted key goal for best practice production use is an
automated, reproducible and automatically tested deployment. Creating
such an environment is a separate topic and out of the scope of this
post. But if you're already doing this, or you intend to do this in the
future, then I have a feature suggestion for you to note.</p>
<p>How about a branch that targets the Ubuntu development release? This way
your continuous integration can flag any action you need to take to make
your deployment work against the next release (and thus the next LTS, if
you deploy on LTS). If we make a change that makes your life difficult,
then you will be able to feed this information back to us, and we will
be able to take this into account.</p>
<p>Bugs and design changes are particularly hard to fix after release,
since we have to worry about not causing regressions for existing users.
During development, this is not a concern. So doing this helps you too:
more bugs get fixed quicker in time for the release that you want to
use.</p>
<p>While I'm talking about continuous integration of deployments, it is
also worthwhile running your tests against the -proposed pocket of the
stable release you're using. If you also test your
deployment with <a href="https://wiki.ubuntu.com/Testing/EnableProposed">the -proposed pocket
enabled</a>, then you can
detect and flag any failures on proposed stable updates before we
release them. If we know that a proposed update causes a regression in
your deployment, then we won't release it. To flag a proposed update as
causing a regression, just tag the bug with <code>regression-proposed</code> and
add an explanation. You can find the bug number from the affected
package's changelog in
<code>/usr/share/doc/<em>package</em>/changelog.Debian.gz</code>.</p>
<h1>Better two factor ssh authentication on Ubuntu</h1>
<p>2013-07-31, Robie Basak</p>
<p>In the past, true two factor authentication in ssh has been something of
a hack to set up. Now that OpenSSH 6.2 has been released with <a href="https://bugzilla.mindrot.org/show_bug.cgi?id=983">full
and proper support</a>, the next release of Ubuntu (Saucy Salamander)
will include it.</p>
<h2>Quick Start</h2>
<p>All you have to do is:</p>
<ol>
<li>
<p><code>apt-get install libpam-google-authenticator</code>.</p>
</li>
<li>
<p>Users who want to continue using ssh must each run the command
<code>google-authenticator</code>. This tool interactively helps you to create the
file <code>~/.google_authenticator</code>, which contains a shared secret and
emergency passcodes. It's a terminal application, but it does still
display a QR code for quick loading of the shared secret into your two
factor device (in my case, this is the Google Authenticator app on my
Android smartphone).</p>
</li>
<li>
<p>Edit <code>/etc/ssh/sshd_config</code>. Set:</p>
<div class="highlight"><pre>ChallengeResponseAuthentication yes
PasswordAuthentication no
AuthenticationMethods publickey,keyboard-interactive
</pre></div>
<p>In case you have changed them in the past, you should also check the
following two settings (these are both defaults on Ubuntu):</p>
<div class="highlight"><pre>UsePAM yes
PubkeyAuthentication yes
</pre></div>
</li>
<li>
<p>Run <code>sudo service ssh reload</code> to pick up your changes to
<code>/etc/ssh/sshd_config</code>.</p>
</li>
<li>
<p>Edit <code>/etc/pam.d/sshd</code> and replace the line:</p>
<div class="highlight"><pre>@include common-auth
</pre></div>
<p>with:</p>
<div class="highlight"><pre>auth required pam_google_authenticator.so
</pre></div>
</li>
</ol>
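<p>One way to double-check that sshd picked up your changes is to dump
its effective configuration with <code>sshd -T</code>. This is a sketch:
the exact option names in the output can vary between OpenSSH
versions.</p>
<div class="highlight"><pre>sudo sshd -T | grep -iE 'authenticationmethods|challengeresponseauthentication|passwordauthentication'
</pre></div>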
<p>That's it! Now ssh logins will require a key and, after the key is
verified, will additionally require proof that you hold your second
factor device.</p>
<p>Your existing ssh session should not be affected by these changes. So
before you continue, make sure that you can still access the machine by
authenticating with your new two factor system in another session. If
you're using shared connections (or aren't sure), be sure to use the
<code>-Snone</code> option to ssh in order to make sure that you don't accidentally
skip authentication by re-using an existing connection.</p>
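<p>For example, assuming a hypothetical host called <code>myserver</code>,
you can open a fresh test session that ignores any shared connection
with:</p>
<div class="highlight"><pre>ssh -S none myserver
</pre></div>
<p>You should see your key verified first, and then be prompted for the
code from your second factor device.</p>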
<h2>How It Works</h2>
<p>Traditionally, ssh only verified one thing. That was either your
password, or verification that you held your private key, or something
else (GSSAPI/Kerberos and other things I won't cover here). Multiple
methods were allowed, and success using any one method was considered a
successful authentication.</p>
<p>ssh continues to be able to defer to PAM for authentication. However,
authentication with an ssh key happens outside PAM, and ssh treats any
one of these (key or PAM) as successful. This means that it checks for
your password, or that you hold your private key, but not both. This
makes it difficult to use a key together with PAM for a second factor,
since ssh will just go ahead and never consult PAM for the second factor
if your client proves that you hold your key.</p>
<p>With the new feature, <code>AuthenticationMethods</code> can be used to specify two
methods that are <em>both</em> required. This means that you can require both
ssh key authentication and PAM, and adjust your PAM configuration to be
the second factor device in the case of the ssh service.</p>
<h2>Why is this more secure?</h2>
<p>It's always a good idea to reduce your attack surface, and limit
authentication code only to areas designed with security in mind.
Previous methods to implement this kind of support involved hacks or
patches. With this new support, the only security sensitive code used is
PAM and ssh itself, being used as they were designed to be used, and
without any third party patches. Both were designed for security from
the beginning.</p>
<h2>Any catches?</h2>
<p>The Google Authenticator PAM module (<code>libpam-google-authenticator</code>) is
in universe, which means that it is only community supported for
security updates. But this can be changed through the <a href="https://wiki.ubuntu.com/MainInclusionProcess">main inclusion
process</a>. If this method to implement two factor security in Ubuntu
becomes popular, then I'd love to see this happen, and would be happy to
drive it.</p>
<h2>Variations</h2>
<p>This is a clean and flexible mechanism. For example: you can leave
<code>/etc/pam.d/sshd</code> alone, and then you'll get a system which requires
both a key and the user's password. Or leave the <code>@include common-auth</code>
line in place, insert <code>auth required pam_google_authenticator.so</code> before
it, and you'll get a system which requires all three: a key, the
password and the code from the second factor device.</p>
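<p>In that three factor case, the relevant excerpt of
<code>/etc/pam.d/sshd</code> would read:</p>
<div class="highlight"><pre>auth required pam_google_authenticator.so
@include common-auth
</pre></div>
<p>PAM evaluates these in order, so the second factor code is requested
before the password prompt from <code>common-auth</code>.</p>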