Analytics strategies for evaluating and planning doc updates

Over the past few months, I’ve been delving into analytics and feedback on the doc site that I currently manage. I’m crafting strategies as I go, and creating reports for product stakeholders to get their input too. I hope some of the strategies described in this post may be useful or at least interesting to other people who’re looking into how to use analytics.

Note: Although I work at Google, this post does not constitute any recommendations on the use of any Google product. I’m a technical writer, and I’m using analytics and feedback in the same way other tech writers do, to gain insights into the doc set that I manage. I am by no means an expert on analytics.

Let’s get some technical details out of the way first. The doc site under discussion hosts the documentation for an open source machine learning platform called Kubeflow. The documentation is also open source. The source for the docs lives on GitHub.

I’m using Google Analytics to see the doc usage stats. The Kubeflow doc site is fairly new. I enabled Google Analytics and the feedback widget on February 27, 2019, which means that the stats start from that date.

To gather user ratings on the doc pages, I’m using the feedback widget that’s available with the Docsy theme. The Kubeflow website uses Docsy and Hugo. If you’re interested in the details of the website tooling, take a look at the website README.

Goals for the analytics reports

The Kubeflow community and I are interested to see how people are using the docs. A high percentage of page views in a particular area can indicate a high level of interest in the related product features, or can point to an area of the product where people need more help than in other areas.

From a docs point of view, my goal is to identify the top priority docs for improvement, and to get some direction on the types of improvements we need to make. For example, if people are particularly interested in an area of the docs, and at the same time are not satisfied with the information they find there, then that area of the docs is high priority for improvement.
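To make that prioritization idea concrete, here is a minimal Python sketch. It is not how I actually produce the reports. The scoring formula (share of site views multiplied by share of negative feedback), the PageStats fields, the page paths, and all the numbers are assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    path: str      # page path as reported by the analytics tool
    views: int     # page views in the reporting period
    positive: int  # "Yes" responses from the feedback widget
    total: int     # all feedback responses ("Yes" plus "No")


def priority(page: PageStats, site_views: int) -> float:
    """Hypothetical score: share of site views times share of negative feedback."""
    if page.total == 0:
        return 0.0  # no feedback yet, so no evidence of dissatisfaction
    interest = page.views / site_views
    dissatisfaction = 1 - page.positive / page.total
    return interest * dissatisfaction


# Placeholder numbers for illustration only, not real Kubeflow stats.
pages = [
    PageStats("/docs/example-a/", views=5000, positive=9, total=10),
    PageStats("/docs/example-b/", views=3000, positive=2, total=10),
    PageStats("/docs/example-c/", views=400, positive=0, total=4),
]
site_views = sum(p.views for p in pages)

# Highest score first: popular pages with poor ratings rise to the top.
ranked = sorted(pages, key=lambda p: priority(p, site_views), reverse=True)
for p in ranked:
    print(f"{p.path}: {priority(p, site_views):.3f}")
```

In this made-up data, the second page ranks first: it has fewer views than the first page, but far worse feedback. A score like this is only a starting point for discussion, because it ignores how few feedback responses most pages receive.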

Overall site views

I started by looking at the number of website views from March (when Google Analytics became available on the site) to November (now). The number of views per month has more than doubled in that time, from 104,000+ to 220,000+. It’s good to know our reader base is increasing.

Total website views

Most-viewed pages

I looked at the pages with the highest number of views across the site as a whole, and also within a few high-priority sections of the docs.

The period for these stats is two months, from September 1 – November 1. Our previous report was in July. I didn’t include August in the stats, because we did some information architecture refactoring in August. We moved many pages around. Moving pages affects the Google Analytics stats, which makes August a bad month to use for assessments in this case.

Most-viewed pages

The top entry, “/”, denotes the Kubeflow website home page. This page consistently receives the highest number of views.

As in previous reports, the second-most viewed page is the main Getting Started guide. It’s linked from the website home page. Other getting-started guides rank highly too.

Also as in previous reports, the third-most viewed page is About Kubeflow. It’s linked from the top-level menu bar with text “What is Kubeflow”.

In a change from previous reports, the Use Cases section has replaced components and notebooks in the list of 10 most-viewed sections. I should start paying attention to this section.

Other pages in the top 10 are the same as in previous reports: the docs index page and the pipelines section.

Strategies for most-viewed sections and pages

My overall strategy for the top-viewed pages is to spend time perfecting the user experience on those pages, addressing any issues, and making sure people find the information they need:

  • Improve the textual and visual content of the most-viewed pages. For example, we recently ran a doc sprint in which we spent considerable time restructuring and rewriting the website home page, which is the most highly viewed page in the doc set. Feedback on the new design and content is good.
  • Link from the most-viewed pages to content deeper in the site, to ensure people find all the information they need. For example, we recently rewrote the “About Kubeflow” page and added links down into relevant content on the site.
  • Examine the bounce rate and time on page, to see how people are using the page.
  • Examine feedback, to see whether people are finding the content useful.

Getting feedback from readers

Every page on the Kubeflow website has a feedback option. The option asks “Was this page helpful? Yes / No”.

  • About Kubeflow received the most feedback, and 24 of 28 responses (85.7%) were positive. That’s an improvement from the July analysis, which showed 70% positive.
  • Getting Started received the second-most feedback, and 11 of 15 responses (73%) were positive. That’s exactly the same as in July.

It’s worth noting that the number of feedback responses is very low in comparison with the number of page views. Also, people are more likely to respond with negative feedback than with positive. Even so, the feedback is useful, particularly when it’s strongly positive or negative, and if the ratios of positive to negative change after we’ve updated the content.
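One way to see how much the low response counts limit what the ratios can tell us is to put a confidence interval around each one. The sketch below uses the Wilson score interval, which is a standard statistical technique and not something I used for the report. The function name is my own, and the 95% level (z = 1.96) is an assumption.

```python
import math


def wilson_interval(positive: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if total == 0:
        raise ValueError("need at least one feedback response")
    p = positive / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Feedback counts quoted in this post.
for name, positive, total in [("About Kubeflow", 24, 28), ("Getting Started", 11, 15)]:
    low, high = wilson_interval(positive, total)
    print(f"{name}: {positive}/{total} positive, interval {low:.0%} to {high:.0%}")
```

With only 15 or 28 responses the intervals are wide, which is why a shift of a few percentage points between reports is weak evidence on its own.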

Deep dive into specific sections and pages

Based on the above statistics and feedback results, I examined some specific pages in greater detail.

The next screenshot shows the 10 most-viewed pages within the getting-started section. We reorganized this section significantly in August. It’s useful to see which getting-started experiences are the most often viewed, in the period since that significant refactoring.

The guide to deploying Kubeflow on an existing Kubernetes cluster (roughly equivalent to on-premises installation) has the most views. The workstation installation guides come next, followed by deployment to a cloud.

The following stats are for the Getting Started page, which introduces the getting-started section:

Looking at the information for this getting-started overview page in detail:

  • The page has the second-highest number of page views in the entire doc set (the top-level page has the highest).
  • Bounce rate* has continued dropping, from 56% in April to 44.15% in July, to 39.6% now. That’s a great improvement. Our goal was to see it drop below 40% – goal achieved!
  • Time on page is 1 minute 7 seconds. That’s fine. There’s no need for people to spend longer on the page, because this is an overview page and the meaty content is in sub-pages.
  • The getting-started overview page has received the second-highest amount of feedback of all pages on the site, and 11 of 15 responses (73.3%) were positive. That’s exactly the same as in July.
  • Overall, the getting-started pages continue to receive low ratings.

* The bounce rate for a page is the percentage of user sessions that started and ended with that page. So, people entered the site on that page, and left without viewing any other pages. I’ve seen guidelines indicating that, as a general rule, we should avoid a bounce rate higher than 70%. If many people visit a page but leave immediately, this may indicate that the page isn’t giving them what they need, and so they leave the site. (It does depend on the type of page. The purpose of some pages is exactly to send people elsewhere.)
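The definition is easier to follow with a small calculation. This Python sketch assumes that each session is available as an ordered list of page paths, and that the denominator for a page is the number of sessions that entered the site on that page. The function name, the paths, and the sample sessions are all made up for illustration. In practice the analytics tool reports the figure directly.

```python
from collections import Counter


def bounce_rates(sessions: list[list[str]]) -> dict[str, float]:
    """Per-page bounce rate: single-page sessions divided by entrances."""
    entrances: Counter = Counter()
    bounces: Counter = Counter()
    for pages in sessions:
        if not pages:
            continue  # ignore empty sessions
        entry = pages[0]
        entrances[entry] += 1
        if len(pages) == 1:
            # The session started and ended on the same page.
            bounces[entry] += 1
    return {page: bounces[page] / count for page, count in entrances.items()}


# Hypothetical sessions, each an ordered list of the pages viewed.
sample_sessions = [
    ["/docs/started/"],
    ["/docs/started/", "/docs/about/"],
    ["/docs/about/"],
    ["/docs/started/", "/docs/pipelines/"],
    ["/docs/started/"],
]
rates = bounce_rates(sample_sessions)
for page, rate in rates.items():
    print(f"{page}: {rate:.0%}")
```

Note that a page counts towards the calculation only when it is the first page in a session. Views of the page later in a session don’t affect its bounce rate.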

We need to improve the content of the getting-started section so that it better meets the readers’ expectations. One tactic I hope to follow, if I can get time from a UX research team, is to test the pages with some specific groups of users. In addition, I’ve already seen feedback from customer issues that people are looking for a single, recommended flow for getting started quickly. Currently the docs offer all the options, but don’t give much guidance on where to start.

Next up is the About Kubeflow page:

Looking at the About Kubeflow page in detail:

  • It’s the third-most highly viewed page on the website.
  • In previous reports, bounce rate came down from 63% in April to 60.4% in July. Bounce rate has now gone up again to 62%. We need to lower the bounce rate, as this page is a highly-viewed page and we want to draw people deeper into the site. I’m working on a new Kubeflow overview (pull request #1339). When that new page is available, I’ll link to it from the About Kubeflow page, and then re-assess the bounce rate.
  • Average time on page is two minutes. That’s good for an overview page. People are engaged in the content.
  • The page has received the most feedback of all pages on the site, and 24 of 28 responses (85.7%) were positive. That’s an improvement on July (70% positive).
    We refactored the page in June to provide more information and links. I hope to improve the positivity still further by linking to the new Kubeflow overview mentioned above.

What about the Use Cases section, which has recently made it into the top 10 most highly viewed sections?

  • It’s interesting to see a set of guides arrive in the top 10 most highly-viewed pages for the first time. This change potentially indicates that our audience is maturing and looking for more in-depth use-case focused docs. The product (Kubeflow) is relatively new, and is currently working towards a v1.0 launch in 2020. Up to now, perhaps most people have been focused on getting the product up and running and trying the simple use cases provided in the getting-started section. Now maybe they need more in-depth use cases.
  • The feedback ratings on this section are low. We need to make sure people get what they’re looking for.
  • One action I’m considering is to adjust the information architecture to reflect what people are probably looking for. At least in the short term, I could rename the section, as it describes highly specific ways of using the product, rather than the more generic use case information that people may be looking for. Alternatively, I could move the content into another section, such as the “further setup and trouble shooting” section.
  • Then, when we have more bandwidth and have had time to do more research, we should flesh out the section with more use cases. We do already have some good examples and tutorials, which we can include in this section.

Open source contributors to the docs

Moving from Google Analytics to GitHub stats for the doc repository, it’s interesting to see the fluctuation in the number of contributors to the docs. It’s not just me writing the docs!

The following events influenced the contributor numbers:

  • We ran a community-wide Kubeflow doc sprint in July. Contributions increased significantly during that period, and stayed high for a while afterwards.
  • Contributions picked up towards the Kubeflow v0.7 release, which happened in early November.
  • In mid November, we ran a doc fixit for external tech writers at the Write the Docs conference in Australia. That fixit causes the large spike at the right-hand edge of the graph.

We need to run more doc sprints and fixits!

Traffic sources

A product stakeholder requested information about the sources of website traffic. I haven’t yet figured out any related strategies.

In the period August 1 – November 1, 2019, close to 60% of the website traffic came from organic search. Referrals accounted for 22%.

Traffic sources

I looked at the referrals, and found that the largest percentage (29%) of referrals come from GitHub. This is not surprising, given that the source code for the product is also on GitHub. The next-largest percentage is 8.5%, coming from a related doc site.
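The referral percentages are shares of referral traffic, not of all traffic. To compare them with the organic search figure, multiply them by the overall referral share. This quick Python calculation assumes that the referral breakdown covers the same period as the 22% figure. The variable names are my own.

```python
# Figures quoted in this post.
referral_share = 0.22              # referrals as a share of all traffic
github_of_referrals = 0.29         # GitHub as a share of referrals
related_site_of_referrals = 0.085  # related doc site as a share of referrals

# Convert each share of referrals into a share of all traffic.
github_of_total = referral_share * github_of_referrals
related_of_total = referral_share * related_site_of_referrals

print(f"GitHub: {github_of_total:.1%} of all traffic")
print(f"Related doc site: {related_of_total:.1%} of all traffic")
```

So GitHub accounts for a little over 6% of all traffic, compared with close to 60% from organic search.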

The top 10 search terms are primarily variations of the product name, “Kubeflow”, with one outlier: “minikf” at number 4. MiniKF is a deployment tool for Kubeflow.

Search terms

More analytics tips?

If you have any analytics tips or experiences to share, I’d love to hear them. Links are welcome!

About Sarah Maddox

Technical writer, author and blogger in Sydney

Posted on 24 November 2019, in open source, technical writing.

  1. Hi, Sarah. This is great! I’m reading and rereading to wring out all of the insights. I’ve struggled for a long time with applying analytics to technical documentation.

    For example, bounce rate – which you mention. If a customer comes to one of my pages, finds the exact bit of information they need, and goes away happy, then I consider it a success. If they have to look at two pages before finding what they need, that’s less of a success. Yet the first one increases my bounce rate, and the second one lowers it. SMH.

    It seems like the most common analytics tools, including Google Analytics, are oriented to marketing sites and not to tech docs. By sharing your experience, you’re helping me find the value in analytics that I’ve been missing.

    • Hallo Larry

      Thanks for a very thoughtful comment. Yeah, bounce rate is an interesting one. Could be good or bad, depending on the purpose of the page. Also, it’s easy to confuse bounce rate with exit rate. It was a while before I realised that bounce rate specifically reflects the pages which were both the point of entry and the point of exit, with no visits to other pages in between, whereas exit rate reflects just the point of exit.

      To your point, I guess we should compare the bounce rate with the time on page. If people spend a long time on the page before bouncing out, then that’s good.

      Ha ha, I had to look up “SMH” (shaking my head) because to me it means Sydney Morning Herald. 😀

