What is Git cherry picking and how do you use it?

“Cherry pick a commit”. I’ve heard the phrase often. It sounds kind of endearing, yet scarily technical at the same time. What is cherry picking and why would you want to do it? One fine day I found that I needed it, and suddenly I appreciated the what and the why. So I figured out the how. I hope this post will help you towards the same understanding.

Here’s the scenario: I’d applied a change to the latest version of the Kubeflow docs. Specifically, the change added a banner and associated logic to inform readers if they’re reading an archived version of the docs. Now I needed to copy the same banner and logic to the older (archived) versions of the docs.

More details of the scenario

The screenshot below shows the banner that I wanted to add to all the archived versions of the docs:

The way we store archived versions of the Kubeflow docs is to make a branch of the current version (that is, a branch from the master). For example, here’s v0.6 of the docs, for which the source is in this branch on GitHub. The master branch contains the current version of the docs.

I’d added the banner and accompanying logic to the master branch in this pull request (PR). Now I needed to copy the code to all the archived branches. I didn’t want to have to copy/paste all my changes into the relevant files in every affected branch.

Enter cherry picking.

Picking sweet cherries

It’s useful to know that, when you’re using GitHub, cherry picking a commit can be equivalent to cherry picking a PR. This holds when the PR is merged with GitHub’s squash option, which squashes all the commits in the PR into a single commit on the code base.

What does a cherry-picked PR look like? No different from any other PR. It’s a collection of changes that you want to make, pointing to the branch on which you want to make them. For example, PR #1550 is a cherry pick of PR #1535, with a few extra changes added after cherry picking.

Below are the steps that I figured out to prepare and do the cherry picking. One thing to note in particular is that I had to do something different if my fork of the repository already contained a copy of the branch into which I intended to cherry pick.

The first step is to check out the master branch, which contains the updates that I want to copy to the archive branches:

git checkout master

Make sure my local working directory is up to date, by pulling all content from the remote master branch. (I’m working on a fork of the Kubeflow website repository. The convention is to give the name upstream to the repository from which you forked.)

git pull upstream master

Get a log of commits made to the master branch, to find the commit that I want to cherry pick:

git log upstream/master

A commit name, also called a commit hash, consists of a long string of letters and numbers. Let’s say that I need the commit named e895a107edba5e68cc0e36fa3a05a687e806cc19.
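If the log is long, a couple of standard git log options can make the commit easier to find. Here’s a minimal sketch that you can run safely in a throwaway repository. The file names and commit messages are made up for illustration. In the real workflow, you’d run the same git log commands against upstream/master.

```shell
#!/bin/sh
set -e

# Throwaway repository, so nothing here touches real work.
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Example Author"

# Two illustrative commits.
echo "docs" > index.md
git add index.md
git commit -q -m "Add index page"
echo "banner" > banner.html
git add banner.html
git commit -q -m "Add archive banner"

# One line per commit: abbreviated hash followed by the subject line.
git log --oneline

# Keep only commits whose message contains "banner" (case sensitive),
# and print just the full hash (%H).
banner_commit=$(git log --grep="banner" --format=%H)
echo "Commit to cherry pick: $banner_commit"
```

The full hash printed at the end is the value to pass to git cherry-pick later.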

Check to see which branches I have locally:

git branch -v

Also check my fork on GitHub to see which branches I already have there.

Now I’m ready to prepare the first archived branch for cherry picking. Let’s say I start with the version 0.6 branch of the docs, named v0.6-branch. If I don’t already have the branch on my fork, I need to get a copy of the branch from the upstream repository, and then push that copy up to my fork, so that I have a clean slate to apply the cherry pick to. So, I bring the branch down to my local working directory then push it up to my fork:

git checkout master
git fetch upstream v0.6-branch:v0.6-branch
git checkout v0.6-branch
git push origin v0.6-branch

(By default, origin is the name that Git gives to the repository that you cloned, which in my case is my fork of the Kubeflow website repository.)
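To try this step without touching a real fork, here’s a sketch that uses local directories as stand-ins for the two remotes. The directory names upstream, fork.git and work are assumptions for the demo only. The git ls-remote command is a command-line alternative to checking the fork’s branches on GitHub.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
cd "$demo"

# "upstream" stands in for the main repository.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream config user.email "you@example.com"   # hypothetical identity
git -C upstream config user.name "Example Author"
echo "docs" > upstream/index.md
git -C upstream add index.md
git -C upstream commit -q -m "Add index page"

# "fork.git" stands in for the fork, created before the archive branch existed.
git clone -q --bare upstream fork.git
git -C upstream branch v0.6-branch

# Local working copy, cloned from the fork (so the fork is "origin").
git clone -q fork.git work
cd work
git remote add upstream "$demo/upstream"

# If the fork has no v0.6-branch yet, copy it from upstream and push it.
if [ -z "$(git ls-remote --heads origin v0.6-branch)" ]; then
  git fetch -q upstream v0.6-branch:v0.6-branch
  git checkout -q v0.6-branch
  git push -q origin v0.6-branch
fi

# List the branches that now exist on the fork.
git ls-remote --heads origin
```

Using git fetch with a source:destination pair creates the local branch without merging anything into the branch that is currently checked out.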

In the cases where I do already have the branch on my fork, I need to copy the branch from my fork down to my local working directory, make sure the branch is up to date by pulling updates from the main repository, then push the branch back up to my fork. In this example, the branch name is v0.5-branch:

git fetch origin v0.5-branch:v0.5-branch
git checkout v0.5-branch
git status
git pull upstream v0.5-branch
git push origin v0.5-branch

Now I’m ready to cherry pick the changes I need. Remember, I’m cherry picking from master into an archive branch. Let’s say I want to cherry pick into the v0.6-branch:

git checkout v0.6-branch
git cherry-pick e895a107edba5e68cc0e36fa3a05a687e806cc19

The long string of letters and numbers is the name of the commit, which I obtained earlier by running git log.
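Here’s a small, self-contained sketch of the cherry pick itself, run in a throwaway repository. The files and commit messages are invented for the demo. The -x option is optional: it appends a “cherry picked from commit …” line to the new commit message, which can help reviewers trace where the change came from.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Example Author"

echo "docs" > index.md
git add index.md
git commit -q -m "Add index page"

# The archive branch is cut from master at this point.
git branch v0.6-branch

# master moves on with an unrelated change, then gains the banner.
echo "newer content" >> index.md
git commit -q -am "Update index for latest release"
echo "banner" > banner.html
git add banner.html
git commit -q -m "Add archive banner"
banner_commit=$(git rev-parse HEAD)

# Copy only the banner commit onto the archive branch.
git checkout -q v0.6-branch
git cherry-pick -x "$banner_commit"

# The archive branch now has the banner but not the unrelated change.
git log --oneline -n 2
cat index.md
```

If a cherry pick stops because of a conflict, git cherry-pick --abort returns the branch to the state it was in before the cherry pick started.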

The changes are now in my local copy of the branch. I can make extra changes if I want to. (For example, in my case I needed to update some metadata that relates specifically to the branch, including an archive flag used in the logic that determines whether to display the banner on the doc pages.)

When I’m happy with the cherry-picked updates and any other changes I’ve made, I push the updated branch up to my fork:

git push origin v0.6-branch

Then I create a PR and specify the base branch to be the name of the branch into which I’m cherry picking the changes. In the case of the above example, the base branch should be “v0.6-branch”. The screenshot below shows the base option, currently pointing to “master”, on the GitHub UI when creating a PR:

Can the cherries turn sour?

In the above scenario, I used cherry picking to apply a change going backwards in time. The requirement was to apply an update to older versions of the docs, which as a rule we don’t update very often. I didn’t cherry pick from a feature branch into the master branch. There are plenty of warnings on the web about things that could go wrong when you cherry pick. I found this post by Rob Friesel helpful in providing context in a non-scary way.

How did I make the banner itself?

That’s another story. 🙂

How to add a banner to website pages using Hugo

A while back, I needed to display a banner on every page of a documentation website. Furthermore, I wanted the banner to appear only under specific conditions. We use Hugo as the static site generator for the website. Here’s what I figured out, using Hugo templates.

I wanted to add a banner to the archived versions of the Kubeflow documentation, such as v0.7 and v0.6, letting readers know that they’re viewing an unmaintained version and pointing them to the latest docs.

Here’s an example of such a banner:

In Kubeflow’s case, the purpose of the banner is to catch people who enter the archived documentation from a web search and make sure they realise that a more up-to-date set of docs is available.

Summary: Adding a banner to a page with Hugo templating

In essence, you need to do the following:

  • Figure out which Hugo layout file is responsible for the base layout of your pages. In the case of the Kubeflow docs, the responsible layout file is at layouts/docs/baseof.html. You can see an example of the layout file in the Docsy theme: layouts/docs/baseof.html. (Kubeflow uses Docsy on top of Hugo.)
  • Add the code for your banner to the layout file. Or, even better, create a partial layout, often called just a partial. A partial is a snippet of code written in Hugo’s templating language. Put the code for your banner into the partial, then call the partial from the base layout. For the Kubeflow version banner, the code sits in a Hugo partial named version-banner.html.

There’s an explanation of the code later in this post.

Making the banner’s appearance conditional

In order to offer docs for multiple versions of Kubeflow, we have a number of websites, one for each major version of the product. The overall configuration of the websites for the different versions is the same. For example, we have the current Kubeflow documentation, and archived versions 0.7 and 0.6.

I wanted to make sure we had to do only minimal extra configuration to cause the banner to appear on the archived doc sets. I didn’t want to have to edit the layouts each time we create an archive. A good solution seemed to be a parameter that we can set in the site’s configuration file.

How it works – first the configuration settings

The parameter that controls the appearance/non-appearance of the banner is named archived_version. If the parameter is set to true, the banner appears on the website. The parameter value is false for the main doc site, kubeflow.org. When we create an archived version of the docs, we set the parameter to true.

The parameter is defined in the site configuration file, config.toml. The configuration file also contains a version number and the URL for the latest version of the docs. Both these fields are used in the banner text.

Here’s a snippet showing the relevant part of the configuration file:

# The major.minor version tag for the version of the docs represented in this
# branch of the repository. Used in the "version-banner" partial to display a
# version number for this doc set.
version = "v1.0"

# Flag used in the "version-banner" partial to decide whether to display a
# banner on every page indicating that this is an archived version of the docs.
archived_version = false

# A link to latest version of the docs. Used in the "version-banner" partial to
# point people to the main doc site.
url_latest_version = "https://kubeflow.org/docs/"

How it works – the content and logic

For the Kubeflow website, a Hugo layout file is responsible for the base layout of the documentation pages: layouts/docs/baseof.html. You can see an example of the layout file in the Docsy theme: layouts/docs/baseof.html. (Kubeflow uses Docsy on top of Hugo.)

I inserted the following line into the base layout:

{{ partial "version-banner.html" . }}

The above line calls a Hugo partial named version-banner.html. The partial contains the banner content and logic. (I’ve contributed the logic for the version banner to Docsy, which is why the URL leads to the Docsy repository.)

Below is a screenshot of the content of the partial. Unfortunately I can’t paste the code, because WordPress strips out all HTML:

You can copy the code from the partial: version-banner.html.

The code does the following:

  • Check the value of the archived_version parameter. If true, continue to the next step.
  • Get the value of the url_latest_version parameter, for use in the banner content when giving readers a link to the latest version of the docs.
  • Get the value of the version parameter, for use in the banner content when showing readers the version of the website that they’re viewing.
  • Create an HTML div containing the styling and content for the banner.

Here’s the screenshot of the banner again, so that you can compare it to the HTML div:

Docs or it didn’t happen

I created a user guide for other people who want to use the banner logic on their sites using the Docsy theme: How to display a banner on archived doc sites.

I also updated the release procedures for the Kubeflow engineering/docs team, explaining that we must set the archived_version parameter to true when archiving a website version.
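For illustration only, the archiving step could be scripted along the following lines. This is a sketch, not the Kubeflow team’s actual procedure. It assumes that the flag appears in config.toml at the start of a line, exactly as in the configuration snippet shown earlier in this post, and it works on a sample file in a temporary directory.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
cd "$demo"

# Sample configuration, matching the snippet shown earlier in the post.
cat > config.toml <<'EOF'
version = "v1.0"
archived_version = false
url_latest_version = "https://kubeflow.org/docs/"
EOF

# Flip the flag, writing to a temporary file first so that the original
# is replaced only if sed succeeds.
sed 's/^archived_version = false$/archived_version = true/' config.toml > config.toml.new
mv config.toml.new config.toml

# Show the resulting setting.
grep '^archived_version' config.toml
```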

In closing

I hope this post is useful to you, if you find that you need to add a banner to a website using Hugo templating!

How to remove an updated file from a PR on GitHub

This is my third post about GitHub techniques that aren’t necessarily obvious to those of us who think in non-Git terminology. This post derives from the fact that I searched the internet for “remove file from PR” and was led astray by helpful people telling me how to use Git to delete a file.

Say you changed the content of a file by mistake, and to your surprise the file has become part of your set of changes tracked by Git, and has thus become part of your pull request (PR). Now you want to remove the file from the PR. You don’t want to delete the file itself, you just want to revert the inclusion of the file in the PR. (Because, if you delete the file, then that deletion becomes part of your change set, which is not what you want at all.)

The basic principle is: Undo the changes that you made to the file, to make the file disappear from the PR.

Assumptions:

  • The instructions below assume that you have not yet merged the PR into the GitHub repository. If you have merged the PR, then you should revert the PR and create a new PR with just the updates you need.
  • The instructions below assume that the unwanted updates are in a file that already exists in the GitHub repository. If the unwanted updates are in an entirely new file that you created, then you can delete the file from your file system, then run git rm <file path> followed by git commit to remove the file from the PR.
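As a rough sketch of the new-file case described in the second assumption, the following commands build a throwaway repository, add a file by mistake on a PR branch, and then remove it again. The branch and file names are hypothetical. Note that git rm deletes the file from the working directory as well as from Git’s index, so a separate delete isn’t strictly needed.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Example Author"
echo "docs" > index.md
git add index.md
git commit -q -m "Add index page"

# A PR branch that accidentally includes a brand-new file.
git checkout -q -b my-pr-branch
echo "oops" > scratch-notes.md
git add scratch-notes.md
git commit -q -m "Add notes by mistake"

# Remove the file and commit the removal.
git rm -q scratch-notes.md
git commit -q -m "Remove file added by mistake"

# Files that differ from master: prints nothing, so the PR no longer
# contains the file.
git diff --name-only master
```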

Note: Save your updates if you need them. If you still need the updates that you made to the file, perhaps to include in another PR, copy and save the updates somewhere outside your Git working area. You’ll lose the updates when you follow the steps described below.

Removing the updates manually

You can remove the updates manually, by copy-pasting the original contents of the file into your version of the file.

  1. Find the original version of the file, either on your fork if you forked from the main repository, or on the repository from which you cloned your local repository.
  2. Copy the entire content of the file and paste it into your copy of the file.
  3. Commit the new changes if necessary:
    • If you had not yet committed your unwanted changes, then you don’t need to do any more.
    • If you’d already committed your unwanted changes, create another commit now. The new commit wipes out your previous changes.
    • If you’d already pushed the changes up to a remote repository, push the new commit now too.

Using git to remove the updates

Case 1: You haven’t yet committed the unwanted updates to your local repository.

Run the following command if you want to undo the updates in your working directory and you haven’t yet committed the unwanted updates:

git restore <file-name-including-path>
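To see the effect, here’s a small sketch in a throwaway repository, with an invented file path. The git restore command was added in Git 2.23. On older versions, git checkout -- <file-name-including-path> does the same job.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "you@example.com"   # hypothetical identity
git config user.name "Example Author"
mkdir docs
echo "original" > docs/my-file.md
git add docs/my-file.md
git commit -q -m "Add doc"

# An accidental, uncommitted edit.
echo "unwanted edit" >> docs/my-file.md
git status --short

# Throw the edit away and return to the committed content.
git restore docs/my-file.md
git status --short
cat docs/my-file.md
```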

Case 2: You’ve committed the unwanted updates to your local repository but you haven’t yet pushed the unwanted updates to your remote repository.

By default, you have a remote repository named origin, which is the repository from which you cloned your local copy of the files.

If you haven’t pushed the unwanted updates to the remote origin repository, you can retrieve the file contents from the origin repository.

The following sequence of commands assumes that the master branch in the remote repository named origin contains the unchanged version of the file. This is the case if you have not pushed your unwanted changes up to origin master.

git fetch origin master
git checkout origin/master -- <file-name-including-path>
git commit -m "Reverted content of file."

Example of the checkout command:

git checkout origin/master -- docs/my-file.md
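The following sketch walks through Case 2 end to end, using a local directory as a stand-in for the origin repository. All names (origin-repo, work, my-pr-branch, docs/guide.md) are assumptions for the demo. At the end, only the file with the wanted change still differs from origin/master, which is what the PR would show.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
cd "$demo"

# Stand-in for the remote repository.
git init -q origin-repo
git -C origin-repo symbolic-ref HEAD refs/heads/master
git -C origin-repo config user.email "you@example.com"   # hypothetical identity
git -C origin-repo config user.name "Example Author"
mkdir origin-repo/docs
echo "original" > origin-repo/docs/my-file.md
echo "guide" > origin-repo/docs/guide.md
git -C origin-repo add docs
git -C origin-repo commit -q -m "Add docs"

# Local clone with a PR branch that changes two files, one by mistake.
git clone -q origin-repo work
cd work
git config user.email "you@example.com"
git config user.name "Example Author"
git checkout -q -b my-pr-branch
echo "wanted change" >> docs/guide.md
echo "unwanted change" >> docs/my-file.md
git commit -q -am "Update docs"

# Restore the one file from origin/master and commit the result.
git fetch -q origin master
git checkout origin/master -- docs/my-file.md
git commit -q -m "Reverted content of file."

# Files that still differ from origin/master.
git diff --name-only origin/master
```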

Case 3: You’ve already pushed the unwanted updates to your remote origin repository.

In this case, you may be able to retrieve the file contents from your upstream repository.

If you’re working on a fork of a repository, the convention is to give the name upstream to the repository from which you forked. You can run the following command to see which remote repositories Git knows about:

git remote -v

Name the upstream repository now if you haven’t already named it. In the following example, replace <project> with the GitHub project name and <repository> with the repository name within the project:

git remote add upstream https://github.com/<project>/<repository>.git

Run the following commands to fetch the latest content from the upstream repository and retrieve the original file content from there:

git fetch upstream master
git checkout upstream/master -- <file-name-including-path>
git commit -m "Reverted content of file."

That’s all, folks

I’ve tried out all these commands myself, and they do what I expect them to do. Let me know if they do what you wanted to achieve too!

Here are my other two posts about Git and GitHub:

Countdown to the first Write the Docs Sydney meetup of 2020

A new year. A new decade. New friends and old. New skills. New ideas. It’s all happening in Sydney at the first 2020 meetup of Write the Docs Australia’s Sydney group. The date is nearly upon us!

The meetup is open to technical writers in all disciplines, and to engineers, editors, product managers, and others I’ve unwittingly left out. If you think technical documentation is a good thing, you’re in. If you think technical documentation is not a good thing, you’re in too. Just be prepared to explain your reasoning. 🙂

  • Date and time: Tuesday 25 February at 6pm to 7.30pm.
  • Location: Google, 48 Pirrama Road, Pyrmont. (Map.)
  • Registration: Sign up to the meetup and then click Attend, or email me. Details are on the Meetup page.

Pizza and conversation are a given. We also have two talented and interesting speakers lined up:

David Parker (Deaf Dave) will talk about Accessible Media for the Deaf and Hard of Hearing.

From David:

My short presentation will be about how accessible media is for the deaf and hard of hearing communities in Australia and worldwide. Digital examples will be given. I will talk about myself as a Deaf person and how I utilise technology to do my job effectively.

Giles Gaskell will talk about Using Antora to build public and internal docs sites.

From Giles:

At Squiz, we’ve built a powerful documentation system with Antora. This presentation provides an overview of how we’ve used Antora and other tools to produce both our public documentation site, and an internal site with additional docs for review and Squiz staff only.

I’m looking forward to this gathering of doc-aware people! I hope you can make it.

How to update your PR on GitHub

So, you’ve created a pull request (PR) on GitHub, and you’ve received some review comments. Now you need to update the PR to address the comments. How do you do that? This post shows you how to update the files in an existing PR, using either the command line or the GitHub UI.

When you ask how to update the files in an existing PR on GitHub, the answer is usually something like this:

Just push the updates to the same branch and GitHub takes care of it automagically.

Well, it’s true. GitHub does take care of it, and it is fairly magical. But how do you “just push the updates to the same branch”?

It took me a while to figure that out, the first time I needed to update the files in a PR. You can do it using command-line Git or using the GitHub UI. I prefer the command line, as it’s usually cleaner and simpler in the end, particularly if you need to update more than one file.

First, of course, you need to create the initial PR. If you haven’t yet created a PR, you can follow this guide to working on GitHub, which I created for the Kubeflow open source doc set that I’m currently working on.

I’ll assume that you already have a PR and that you need to update one or more files in that PR. Below are instructions on how to use the command line or use the GitHub UI.

Using the command line to update the files in a PR

Prerequisites:

  • You need Git on your local computer. See the Git installation guide.
  • If you don’t already have a Git repository on your local computer with a branch for your PR, create the local Git repository and branch now. See my blog post on how to download a PR from GitHub to your local computer. This process is necessary if you’ve been working online using the GitHub UI up to now, or if you’ve used a different computer to work on this PR up to now.

Follow these steps to update the PR:

  1. Open a command window on your local computer.
  2. Go to the directory containing the GitHub repository on your local computer. The command below assumes that you’ve cloned the repository into a directory named git-repositories/awesome-repo:
    cd git-repositories/awesome-repo
  3. Tell Git to check out the relevant branch. In this example, replace your-branch-name with the name of your branch:
    git checkout your-branch-name
    
  4. Run git status to check the status of your local files. Make sure the response shows that you’re in the branch that you expect to be in, and that there are no uncommitted changes in your local repository. (It’s a good idea to run git status regularly, just to check that things are as you expect.) If there are uncommitted changes in your local repository, you should take a look at them and consider committing them and pushing them up to GitHub before going any further. The result from git status should look like this:
    On branch your-branch-name
    nothing to commit, working tree clean
    
  5. Run the following command to pull down the most recent changes from your branch on GitHub to your local repository. It’s fine to run this command even if there are no changes to pull:
    git pull origin your-branch-name
    
  6. Update the files that you need to change. You can add, edit, and delete files using your favourite text editor. I’m currently using Visual Studio Code.
  7. Run git status to check the status of your local files. The response should tell you which files you still need to commit for Git tracking. For example, the following status response shows one changed file, named your-updated-doc.md, that you still need to commit:
    On branch your-branch-name
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
    	modified:   your-updated-doc.md
    
    no changes added to commit (use "git add" and/or "git commit -a")
    

    Another example: The following status response shows one deleted file and one added file, both of which you need to commit for Git tracking:

    On branch your-branch-name
    Changes not staged for commit:
      (use "git add/rm <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
    	deleted:    your-old-doc.md
    
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
    	your-new-doc.md
    
    no changes added to commit (use "git add" and/or "git commit -a")
    
  8. Run the following commands to commit the updated files to your local Git repository. Here’s an example commit command to tell Git about one or more updated files:
    git commit -am "Fixed some doc errors."
    

    Here’s another example, which tells Git that you’ve added a file named your-new-doc.md, and then commits all changes including the added file:

    git add your-new-doc.md
    git commit -am "Added a shiny new doc."
    

    And another example, which tells Git that you’ve deleted a file named your-old-doc.md, and then commits all changes including the file deletion:

    git rm your-old-doc.md
    git commit -am "Deleted an obsolete doc."
    
  9. Push the updates from your local branch to the corresponding branch on GitHub:
    git push origin your-branch-name
    

That’s it. When you look at your PR on GitHub, you’ll see a new commit listed among the comments on the PR. For example, the following screenshot shows two commits on a PR. One commit has the description “Updated for review comments”, and the other has the description “Added instruction to use different deployment name”:

When you view the files in the PR, you’ll see your changes incorporated into the latest version of each file. When the repository maintainers approve your PR, the changes will be merged into the repository.
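If you’d like to rehearse the command-line flow before touching a real PR, here’s a sketch that uses a local bare repository as a stand-in for GitHub. The directory names seed and origin.git are assumptions for the demo. The point it shows is that a second push to the same branch simply adds another commit on that branch, which is what GitHub picks up and displays in the PR.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
cd "$demo"

# Build a small repository and publish it as a bare "origin".
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed config user.email "you@example.com"   # hypothetical identity
git -C seed config user.name "Example Author"
echo "first draft" > seed/your-updated-doc.md
git -C seed add your-updated-doc.md
git -C seed commit -q -m "Add doc"
git clone -q --bare seed origin.git

# Local working copy with a branch for the PR.
git clone -q origin.git awesome-repo
cd awesome-repo
git config user.email "you@example.com"
git config user.name "Example Author"
git checkout -q -b your-branch-name
echo "more detail" >> your-updated-doc.md
git commit -q -am "Added instruction to use different deployment name"
git push -q origin your-branch-name   # the PR would be opened from this branch

# After review: edit, commit, and push to the same branch again.
echo "fix from review" >> your-updated-doc.md
git commit -q -am "Updated for review comments"
git push -q origin your-branch-name

# Commits on the PR branch that are not on master.
git log --oneline origin/master..origin/your-branch-name
```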

Some notes:

  • The word origin refers to the remote repository on GitHub from which you cloned your local repository when you first started working on it.
  • You can use the following command to see which remote repositories Git knows about:
    git remote -v

 

Using the GitHub UI to update the files in a PR

There are two ways to update a file in an existing PR using the GitHub UI:

  • Access the file from the PR (described in this section).
  • Access the file from your fork of the repository that you’re updating (described in the section below this one).

The first way is the most direct route to the file that needs updating, once you know how to do it.

To access a file from a PR:

  1. Open a browser window.
  2. Open your PR in GitHub, and click the Files changed tab at the top of the PR:
  3. Click the three dots on the right-hand side of the window next to the name of the file that you want to edit, then click Edit file in the panel that opens up:
  4. Make your changes in the editing interface that opens up.
  5. When you’re ready, scroll down to the bottom of the editing interface. Enter a short description of the updates, and a longer description if necessary. Make sure the option is selected to Commit directly to the your-branch-name branch:
  6. Click Commit changes.

That’s it. You’ve now updated the file in the PR. When you look at your PR on GitHub, you’ll see a new commit listed among the comments on the PR. When the repository maintainers approve your PR, the changes will be merged into the repository.

An interesting point is that other people can also add commits into your PR, provided they have authority to do so. Sometimes the repository maintainers may make an update to your PR before merging it, if it’s simpler to make the update than to explain it in a PR comment. (When you create the PR, you can grant or deny permission for the maintainers to make changes to the PR.) For example, the following screenshot shows a commit that I made into someone else’s PR, before merging the PR into the main repo. The commit has the description “Added specific link to Chainer page”:

 

Accessing your PR from your fork of the repository in the GitHub UI

The above section shows you how to access a file directly from a PR in order to update the file. As an alternative, you can go to your fork of the repository that you’re updating, and navigate through the files in the repository to find the one you want to update.

Usually, you work on a fork of the main repository on GitHub when you create a PR. If you use the GitHub UI to create the PR, GitHub creates the fork automatically for you, the first time you create a PR for a particular repository. The reason for creating the fork is that you probably don’t have update rights on the main repository. In the instructions below, I’m assuming that you have a fork of the repository.

To access a file from your fork of the repository:

  1. Open a browser window.
  2. Find your fork of the repository on GitHub. For example, if the repository name is “awesome-repo”, then the fork should be at this URL: https://github.com/your-github-username/awesome-repo. You can find your fork by going to the list of your repositories on GitHub. Click the dropdown arrow next to your profile image, then click Your repositories:
    You should see a list of repositories something like this:
  3. Click the name of the forked repository that you want to update. You should see the details of the forked repository, including the files within the repository. For example, I clicked website to open my fork of the Kubeflow website repository. The Code tab shows the list of files within the repository:
  4. Change to the branch that contains your PR within the repository fork. By default, the repository fork opens on the master branch. To change the branch, first click the Branch option:
    Then in the dropdown list that appears, select the branch that contains your PR. The branch name may be something like “your-username-patch-1”, or it may be something meaningful that you entered when you created the PR. For example, I need to select the v1-discussion branch to find my PR:
    Now you should see the same repository fork, with roughly the same files as you saw in the master branch. But the branch which you’ve selected contains all the changes you’ve made in your PR.
  5. Click a file or directory name to navigate through the files within your fork of the repository. For example, I need to click the Content directory to find the file I’m interested in:
  6. When you find your file, click the edit icon to edit the file as usual:
  7. When you’re ready, scroll down to the bottom of the editing interface. Enter a short description of the updates, and a longer description if necessary. Make sure the option is selected to Commit directly to the your-branch-name branch:
  8. Click Commit changes.

That’s it. You’ve now updated the file in the PR. When you look at your PR on GitHub, you’ll see a new commit listed among the comments on the PR. When the repository maintainers approve your PR, the changes will be merged into the repository.
