Blog Archives

What is Git cherry picking and how do you use it?

“Cherry pick a commit”. I’ve heard the phrase often. It sounds kind of endearing, yet scarily technical at the same time. What is cherry picking and why would you want to do it? One fine day I found that I needed it, and suddenly I appreciated the what and the why. So I figured out the how. I hope this post will help you towards the same understanding.

Here’s the scenario: I’d applied a change to the latest version of the Kubeflow docs. Specifically, the change added a banner and associated logic to inform readers if they’re reading an archived version of the docs. Now I needed to copy the same banner and logic to the older (archived) versions of the docs.

More details of the scenario

The screenshot below shows the banner that I wanted to add to all the archived versions of the docs:

The way we store archived versions of the Kubeflow docs is to make a branch of the current version (that is, a branch from the master). For example, here’s v0.6 of the docs, for which the source is in this branch on GitHub. The master branch contains the current version of the docs.

I’d added the banner and accompanying logic to the master branch in this pull request (PR). Now I needed to copy the code to all the archived branches. I didn’t want to have to copy/paste all my changes into the relevant files in every affected branch.

Enter cherry picking.

Picking sweet cherries

It’s useful to know that, when you’re using GitHub, cherry picking a commit is equivalent to cherry-picking a PR. GitHub squashes all the commits in a PR into a single commit when merging the PR into the code base.

What does a cherry-picked PR look like? No different from any other PR. It’s a collection of changes that you want to make, pointing to the branch on which you want to make them. For example, PR #1550 is a cherry pick of PR #1535, with a few extra changes added after cherry picking.

Below are the steps that I figured out to prepare and do the cherry picking. One thing to note in particular is that I had to do something different if my fork of the repository already contained a copy of the branch into which I intended to cherry pick.

The first step is to check out the master branch, which contains the updates that I want to copy to the archive branches:

git checkout master

Make sure my local working directory is up to date, by pulling all content from the remote master branch. (I’m working on a fork of the Kubeflow website repository. The convention is to give the name upstream to the repository from which you forked.)

git pull upstream master

Get a log of commits made to the master branch, to find the commit that I want to cherry pick:

git log upstream/master

A commit name consists of a long string of letters and numbers. Let’s say that I need the commit named e895a107edba5e68cc0e36fa3a05a687e806cc19.

Check to see which branches I have locally:

git branch -v

Also check my fork on GitHub to see which branches I already have there.

Now I’m ready to prepare the first archived branch for cherry picking. Let’s say I start with the version 0.6 branch of the docs, named v0.6-branch. If I don’t already have the branch on my fork, I need to get a copy of the branch from the remote master, and then push that copy up to my fork, so that I have a clean slate to apply the cherry pick to. So, I pull the branch down to my local working directory then push it up to my fork. In this example, the branch name is v0.6-branch:

git checkout master
git pull upstream v0.6-branch:v0.6-branch
git checkout v0.6-branch
git push origin v0.6-branch

(I’m working on a fork of the Kubeflow website repository. By default, the name of your fork of the repository is origin.)

In the cases where I do already have the branch on my fork, I need to copy the branch from my fork down to my local working directory, check that the branch is up to date by fetching updates from the main repository, then push the branch back up to my fork. In this example, the branch name is v0.5-branch:

git fetch origin v0.5-branch:v0.5-branch
git checkout v0.5-branch
git status
git fetch upstream v0.5-branch
git push origin v0.5-branch

Now I’m ready to cherry pick the changes I need. Remember, I’m cherry picking from master into an archive branch. Let’s say I want to cherry pick into the v0.6-branch:

git checkout v0.6-branch
git cherry-pick e895a107edba5e68cc0e36fa3a05a687e806cc19

The long string of letters and numbers is the name of the commit, which I obtained earlier by running git log.

The changes are now in my local copy of the branch. I can make extra changes if I want to. (For example, in my case I needed to update some metadata that relates specifically to the branch, including an archive flag used in the logic that determines whether to display the banner on the doc pages.)

When I’m happy with the cherry-picked updates and any other changes I’ve made, I push the updated branch up to my fork:

git push origin v0.6-branch

Then I create a PR and specify the base branch to be the name of the branch into which I’m cherry picking the changes. In the case of the above example, the base branch should be “v0.6-branch”. The screenshot below shows the base option, currently pointing to “master”, on the GitHub UI when creating a PR:

Can the cherries turn sour?

In the above scenario, I used cherry picking to apply a change going backwards in time. The requirement was to apply an update to older versions of the docs, which as a rule we don’t update very often. I didn’t cherry pick from a feature branch into the master branch. There are plenty of warnings on the web about things that could go wrong when you cherry pick. I found this post by Rob Friesel helpful in providing context in a non-scary way.

How did I make the banner itself?

That’s another story. 🙂

How to remove an updated file from a PR on GitHub

This is my third post about GitHub techniques that aren’t necessarily obvious to those of us who think in non-Git terminology. This post derives from the fact that I searched the internet for “remove file from PR” and was led astray by helpful people telling me how to use Git to delete a file.

Say you changed the content of a file by mistake, and to your surprise the file has become part of your set of changes tracked by Git, and has thus become part of your pull request (PR). Now you want to remove the file from the PR. You don’t want to delete the file itself, you just want to revert the inclusion of the file in the PR. (Because, if you delete the file, then that deletion becomes part of your change set, which is not what you want at all.)

The basic principle is: Undo the changes that you made to the file, to make the file disappear from the PR.

Assumptions:

  • The instructions below assume that you have not yet merged the PR into the GitHub repository. If you have merged the PR, then you should revert the PR and create a new PR with just the updates you need.
  • The instructions below assume that the unwanted updates are in a file that already exists in the GitHub repository. If the unwanted updates are in an entirely new file that you created, then you can delete the file from your file system, then run git rm <file path> followed by git commit to remove the file from the PR.

Note: Save your updates if you need them. If you still need the updates that you made to the file,  perhaps to include in another PR, copy and save the updates somewhere outside your Git working area. You’ll lose the updates when you follow the steps described below.

Removing the updates manually

You can remove the updates manually, by copy-pasting the original contents of the file into your version of the file.

  1. Find the original version of the file, either on your fork if you forked from the main repository, or on the repository from which you cloned your local repository.
  2. Copy the entire content of the file and paste it into your copy of the file.
  3. Commit the new changes if necessary:
    • If you had not yet committed your unwanted changes, then you don’t need to do any more.
    • If you’d already committed your unwanted changes, create another commit now. The new commit wipes out your previous changes.
    • If you’d already pushed the changes up to a remote repository, push the new commit now too.

Using git to remove the updates

Case 1: You haven’t yet committed the unwanted updates to your local repository.

Run the following command if you want to undo the updates in your working directory and you haven’t yet committed the unwanted updates:

git restore <file-name-including-path>

Case 2: You’ve committed the unwanted updates to your local repository but you haven’t yet pushed the unwanted updates to your remote repository.

By default, you have a remote repository named origin, which is the repository from which you cloned your local copy of the files.

If you haven’t pushed the unwanted updates to the remote origin repository, you can retrieve the file contents from the origin repository.

The following sequence of commands assumes that the master branch in the remote repository named origin contains the unchanged version of the file. This is the case if you have not pushed your unwanted changes up to origin master.

git fetch origin master
git checkout origin/master -- <file-name-including-path>
git commit -m "Reverted content of file."

Example of the checkout command:

git checkout origin/master -- docs/my-file.md

Case 3: You’ve already pushed the unwanted updates to your remote origin repository.

In this case, you may be able to retrieve the file contents from your upstream repository.

If you’re working on a fork of a repository, the convention is to give the name upstream to the repository from which you forked. You can run the following command to see which remote repositories Git knows about:

git remote -v

Name the upstream repository now if you haven’t already named it. In the following example, replace <project> with the GitHub project name and <repository> with the repository name within the project:

git remote add upstream https://github.com/<project>/<repository>.git

Run the following commands to retrieve the file content from the upstream file:

git checkout upstream/master -- <file-name-including-path>
git commit -m "Reverted content of file."

That’s all, folks

I’ve tried out all these commands myself, and they do what I expect them to do. Let me know if they do what you wanted to achieve too!

Here are my other two posts about Git and GitHub:

Git-based technical writing workflow – notes from a TC Camp session

This week I attended TC Camp 2016 in Santa Clara. In the morning there were a few workshops, one of which was titled “Git-based Technical Communication Workflows”. A team from GitHub walked us through a workflow using Git and GitHub for technical documentation. These are my notes from the session. Any inaccuracy is my mistake, not that of the presenters.

The session covered primarily workflow on GitHub.com, and also touched on using Git on the command line. There was a good variety of Git skill levels amongst the attendees, from people who had never used Git or GitHub, to people who were comfortable using Git on the command line.

The presenters were Jamie Strusz and Jenn Leaver, both from GitHub. Stefan Stölzle was there as technical advisor, and answered plenty of questions from attendees.

Jamie started with an overview of the  traditional GitHub workflow. Then Jenn, a technical writer at GitHub, explained her workflow for technical writing in particular. Here’s the repository that they created during the session, to illustrate the workflow: TCCamp demo repo on GitHub.

Some things I gleaned about a technical writing workflow using Git and GitHub:

  • When a fix or update is required to the documentation, the technical writers start by raising an issue in the help docs repo.
  • Next, the technical writer creates a branch.
  • Jenn’s team occasionally uses “mega branches” used by several technical writers working on a feature. But usually, Jenn just works in her own branch.
  • The term “mega branch” isn’t generally known. I suggested that it’d be great to have some information in the GitHub docs about best practices for managing such a branch. Jenn liked that idea.
  • Jenn’s team uses the Atom text editor.
  • The source format for the documentation is Markdown.
  • A useful tool for converting documentation from HTML to Markdown: pandoc.
  • A question came from the floor about Asciidoc. A few people have heard of it, and Jenn’s team is talking about it too.
  • They make all the changes in the branch, and make commits often.
  • They commit changes locally, then push to the web (that is, sync to the repo on GitHub). Then they make their pull request on the web.
  • Jenn likes to make pull requests (PRs) often, so that she can get feedback quickly. Sometimes she’ll have a PR with 200 changes in it.
  • More about mega branches:
    • Often the mega branch is for development of a doc change over a longer period of time.
    • Create a mega branch, then create branches off that mega branch.
    • Then you create the pull requests off your branch, and do the reviews there.
    • Then eventually push to the mega branch.
  • Jenn does not ever work off the master branch. The team of presenters recommend against working off master, because it’s more difficult to back out changes. GitHub views the master branch as the deployable state – so, it’s production. Therefore, always make changes and do collaboration on a branch. Then merge back into the master branch when ready for pushing to production.
  • A tip: sync with the main repo on GitHub often. In particular, before starting a new branch. Otherwise you’ll have problems later when merging your changes back.
  • Branches can be very small (just fixing a typo, for example) or very large (for a new feature).
  • It’s a good idea to be very descriptive with your commit messages, so that you can figure out which commit to roll back if necessary.
  • For version control within the files, Jenn’s team uses Liquid syntax. They also use Liquid for other conditional publishing, such as selecting enterprise docs only.
  • Git LFS (Large File Storage) is available for uploading large files such as videos, Adobe files, etc. For some of those files, depending on file type, LFS can also help you see the difference between versions. LFS also helps ensure that the large files don’t clutter up your repo.
  • What about conflicts, with multiple people working on the same doc? There are several ways of handling such merge conflicts. Some people use diff tools. Others rely on the comments that Git adds to the files about the conflicts: open the file in an editor, assess the conflicts, and make a decision about which change to accept.
  • Merge conflicts are reasonably rare. They usually happen if someone forgets to sync (“pull”) before starting a branch. Another time when it may happen is if you start working on something, and then the work gets put on hold for a while. When you start up again on that project, you may have conflicts.
  • What to put in a README file on GitHub: A general overview of what’s in the repo, and any vital information such as contributor guidelines.
  • Labels are useful for things like indicating status, such as “ready for review” or “in progress”, or something to indicate the feature under development.
  • The team uses the GitHub issue tracker as a primary communication tool. Even for things like noting when a team member is out of office, or for ordering office items. These issues go into the relevant repo. For example, at GitHub the team orders office items by creating issues in a “Gear” repo (which isn’t available for public viewing).
  • The commands that Jenn uses most often in her technical writing workflow are: git branch, git checkout, git status, git push, git push.
  • All reviews take place in a pull request. The team starts with a comment to start the conversation. They @mention people to bring them into the review, such as the subject matter experts. Before pushing the change to master, someone has to approve, by adding a squirrel emoji. 🙂
  • The team uses emojis all over the place in reviews! They use them in place of words.

Jamie created a repository which Jenn used to walk through each stage of the workflow:

  • TCCamp demo repo on GitHub.
  • An issue within the repo’s issue tracker: WIP – Jenn’s doc.
  • A commit.
  • diff shows the changes to a file or files.
  • A pull request. People who are watching the repository will get a notification of the pull request. To request a review by specific colleagues, use an @mention in the comments.
  • If you want someone to focus on a specific area, give them the URL of the relevant commit within the pull request.
  • You can leave a comment on a specific line within the code, as well as comments on the pull request as a whole.

Titbits:

  • GitHub uses an icon of a squirrel (an emoji, :squirrel:) to indicate when reviewers are happy for a change to be shipped. In some forms the squirrel has a name: Heidi. The presenters didn’t know why GitHub uses a squirrel. Does anyone know?
  • We were given a limited-edition GitHub Technical Writer Octocat sticker!

GitHub Technical Writer Octocat

%d bloggers like this: