Over the past few months, I’ve been delving into analytics and feedback on the doc site that I currently manage. I’m crafting strategies as I go, and creating reports for product stakeholders to get their input too. I hope some of the strategies described in this post may be useful or at least interesting to other people who’re looking into how to use analytics.
Note: Although I work at Google, this post does not constitute any recommendations on the use of any Google product. I’m a technical writer, and I’m using analytics and feedback in the same way other tech writers do, to gain insights into the doc set that I manage. I am by no means an expert on analytics.
Let’s get some technical details out of the way first. The doc site under discussion is kubeflow.org, which hosts the documentation for an open source machine learning platform called Kubeflow. The documentation is also open source. The source for the docs lives on GitHub.
I’m using Google Analytics to see the doc usage stats. The Kubeflow doc site is fairly new. I enabled Google Analytics and the feedback widget on February 27, 2019, which means that the stats start from that date.
To gather user ratings on the doc pages, I’m using the feedback widget that’s available with the Docsy theme. The Kubeflow website uses Docsy and Hugo. If you’re interested in the details of the website tooling, take a look at the website README.
Goals for the analytics reports
The Kubeflow community and I are interested to see how people are using the docs. A high percentage of page views in a particular area can indicate a high level of interest in the related product features, or can point to an area of the product where people need more help than in other areas.
From a docs point of view, my goal is to identify the top priority docs for improvement, and to get some direction on the types of improvements we need to make. For example, if people are particularly interested in an area of the docs, and at the same time are not satisfied with the information they find there, then that area of the docs is high priority for improvement.
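As a rough illustration of that prioritisation, the sketch below ranks pages by combining page views with the share of non-positive feedback. The scoring formula, the function name rankPagesForImprovement, the data shape, and the sample numbers are all assumptions made for this example. They are not taken from Google Analytics or from the Kubeflow reports.

```javascript
// Hypothetical scoring: pages with many views and a low share of positive
// feedback float to the top. The formula is an assumption for illustration.
function rankPagesForImprovement(pages) {
  return pages
    .map((page) => {
      const total = page.positive + page.negative;
      // With no feedback at all, treat satisfaction as unknown (neutral 0.5).
      const positiveShare = total === 0 ? 0.5 : page.positive / total;
      return { ...page, score: page.views * (1 - positiveShare) };
    })
    .sort((a, b) => b.score - a.score); // highest priority first
}

// Made-up sample data, not real Kubeflow figures.
const ranked = rankPagesForImprovement([
  { path: "/docs/a/", views: 1000, positive: 9, negative: 1 },
  { path: "/docs/b/", views: 800, positive: 2, negative: 8 },
  { path: "/docs/c/", views: 50, positive: 0, negative: 5 },
]);
console.log(ranked.map((page) => page.path));
```

In this made-up data, the page with slightly fewer views but mostly negative feedback ranks above the most-viewed page. A different weighting may suit a doc set where feedback counts are very low.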
Overall site views
I started by looking at the number of website views from March (when Google Analytics became available on the site) to November (now). The number of views per month has more than doubled in that time, from 104,000+ to 220,000+. It’s good to know our reader base is increasing.
I looked at the pages with the highest number of views across the site as a whole, and also within a few high-priority sections of the docs.
The period for these stats is two months, from September 1 to November 1. Our previous report was in July. I didn’t include August in the stats, because we did some information architecture refactoring in August. We moved many pages around. Moving pages affects the Google Analytics stats, which makes August a bad month to use for assessments in this case.
The top entry, “/”, denotes the Kubeflow website home page: https://www.kubeflow.org/. This page consistently receives the highest number of views.
As in previous reports, the second-most viewed page is the main Getting Started guide. It’s linked from the website home page. Other getting-started guides rank highly too.
Also as in previous reports, the third-most viewed page is About Kubeflow. It’s linked from the top-level menu bar with text “What is Kubeflow”.
In a change from previous reports, the Use Cases section has replaced components and notebooks in the list of 10 most-viewed sections. I should start paying attention to this section.
Other pages in the top 10 are the same as in previous reports: the docs index page and the pipelines section.
Strategies for most-viewed sections and pages
My overall strategy for the top-viewed pages is to spend time perfecting the user experience on those pages, addressing any issues, and making sure people find the information they need:
- Improve the textual and visual content of the most-viewed pages. For example, we recently ran a doc sprint in which we spent considerable time restructuring and rewriting the website home page, which is the most highly viewed page in the doc set. Feedback on the new design and content is good.
- Link from the most-viewed pages to content deeper in the site, to ensure people find all the information they need. For example, we recently rewrote the “About Kubeflow” page and added links down into relevant content on the site.
- Examine the bounce rate and time on page, to see how people are using the page.
- Examine feedback, to see whether people are finding the content useful.
Getting feedback from readers
Every page on the Kubeflow website has a feedback option. The option asks “Was this page helpful? Yes / No”.
- About Kubeflow received the most feedback, and 24 of 28 responses (85.7%) were positive. That’s an improvement from the July analysis, which showed 70% positive.
- Getting Started received the second-most feedback, and 11 of 15 responses (73.3%) were positive. That’s exactly the same as in July.
It’s worth noting that the number of feedback responses is very low in comparison with the number of page views. Also, people are more likely to respond with negative feedback than with positive. Even so, the feedback is useful, particularly when it’s strongly positive or negative, and if the ratios of positive to negative change after we’ve updated the content.
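The percentages quoted above are simple ratios of positive responses to total responses. A minimal sketch of the arithmetic, using the counts from this report (the function name positiveShare is just a label chosen for this example):

```javascript
// Share of positive responses as a percentage, rounded to one decimal place.
// Returns null when there is no feedback, to avoid dividing by zero.
function positiveShare(positive, total) {
  if (total === 0) return null;
  return Math.round((positive / total) * 1000) / 10;
}

console.log(positiveShare(24, 28)); // About Kubeflow: 85.7
console.log(positiveShare(11, 15)); // Getting Started: 73.3
```

With only 15 to 28 responses per page, a single extra response moves the percentage by several points, so small changes between reports are best read with caution.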
Deep dive into specific sections and pages
Based on the above statistics and feedback results, I examined some specific pages in greater detail.
The next screenshot shows the 10 most-viewed pages within the getting-started section. We reorganized this section significantly in August. It’s useful to see which getting-started experiences are the most often viewed, in the period since that significant refactoring.
The guide to deploying Kubeflow on an existing Kubernetes cluster (roughly equivalent to on-premises installation) has the most views. The workstation installation guides come next, followed by deployment to a cloud.
The following stats are for the Getting Started page, which introduces the getting-started section:
Looking at the information for this getting-started overview page in detail:
- The page has the second-highest number of page views in the entire doc set (the top-level kubeflow.org page has the highest).
- Bounce rate* has continued dropping, from 56% in April to 44.15% in July, to 39.6% now. That’s a great improvement. Our goal was to see it drop below 40%, and we’ve achieved that goal!
- Time on page is 1 minute 7 seconds. That’s fine. There’s no need for people to spend longer on the page, because this is an overview page and the meaty content is in sub-pages.
- The getting-started overview page has received the second-highest amount of feedback of all pages on the site, and 11 of 15 responses (73.3%) were positive. That’s exactly the same as in July.
- Overall, the getting-started pages continue to receive low ratings.
* The bounce rate for a page is the percentage of user sessions that started and ended with that page. So, people entered the site on that page, and left without viewing any other pages. I’ve seen guidelines indicating that, as a general rule, we should avoid a bounce rate higher than 70%. If many people visit a page but leave immediately, this may indicate that the page isn’t giving them what they need, and so they leave the site. (It does depend on the type of page. The purpose of some pages is exactly to send people elsewhere.)
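To make the bounce rate definition concrete, here is a minimal sketch that computes it from a list of sessions. It assumes each session is an ordered array of the page paths viewed, and it divides single-page sessions by all sessions that started on the page. The data shape, the function name bounceRate, and the sample paths are assumptions for illustration. Google Analytics calculates the figure for you, so you would not normally write this yourself.

```javascript
// Each session is an ordered list of page paths viewed (assumed data shape).
function bounceRate(sessions, pagePath) {
  // Sessions that entered the site on the given page.
  const entrances = sessions.filter((pages) => pages[0] === pagePath);
  if (entrances.length === 0) return null; // nobody entered on this page
  // A bounce is a session that viewed only the entry page.
  const bounces = entrances.filter((pages) => pages.length === 1);
  return (bounces.length / entrances.length) * 100;
}

// Made-up sessions with hypothetical paths.
const sessions = [
  ["/docs/started/"],
  ["/docs/started/", "/docs/about/"],
  ["/", "/docs/started/"],
  ["/docs/started/", "/docs/pipelines/", "/docs/about/"],
  ["/docs/started/"],
];
console.log(bounceRate(sessions, "/docs/started/")); // 50
```

Four of the five sample sessions enter on the page and two of those view nothing else, which gives 50%. The session that starts on the home page does not count towards this page’s bounce rate.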
We need to improve the content of the getting-started section so that it better meets the readers’ expectations. One tactic I hope to follow, if I can get time from a UX research team, is to test the pages with some specific groups of users. In addition, I’ve already seen feedback from customer issues that people are looking for a single, recommended flow for getting started quickly. Currently the docs offer all the options, but don’t give much guidance on where to start.
Next up is the About Kubeflow page:
Looking at the About Kubeflow page in detail:
- It’s the third-most highly viewed page on the website.
- In previous reports, bounce rate came down from 63% in April to 60.4% in July. Bounce rate has now gone up again to 62%. We need to lower the bounce rate, as this page is a highly-viewed page and we want to draw people deeper into the site. I’m working on a new Kubeflow overview (pull request #1339). When that new page is available, I’ll link to it from the About Kubeflow page, and then re-assess the bounce rate.
- Average time on page is two minutes. That’s good for an overview page. People are engaged in the content.
- The page has received the most feedback of all pages on the site, and 24 of 28 responses (85.7%) were positive. That’s an improvement on July (70% positive).
We refactored the page in June to provide more information and links. I hope to improve the positivity still further by linking to the new Kubeflow overview mentioned above.
What about the Use Cases section, which has recently made it into the top 10 most highly viewed sections?
- It’s interesting to see a set of guides arrive in the top 10 most highly-viewed pages for the first time. This change potentially indicates that our audience is maturing and looking for more in-depth use-case focused docs. The product (Kubeflow) is relatively new, and is currently working towards a v1.0 launch in 2020. Up to now, perhaps most people have been focused on getting the product up and running and trying the simple use cases provided in the getting-started section. Now maybe they need more in-depth use cases.
- The feedback ratings on this section are low. We need to make sure people get what they’re looking for.
- One action I’m considering is to adjust the information architecture to reflect what people are probably looking for. At least in the short term, I could rename the section, as it describes highly specific ways of using the product, rather than the more generic use case information that people may be looking for. Alternatively, I could move the content into another section, such as the “further setup and trouble shooting” section.
- Then, when we have more bandwidth and have had time to do more research, we should flesh out the section with more use cases. We do already have some good examples and tutorials, which we can include in this section.
Open source contributors to the docs
Moving from Google Analytics to GitHub stats for the doc repository, it’s interesting to see the fluctuation in the number of contributors to the docs. It’s not just me writing the docs!
The following events influenced the contributor numbers:
- We ran a community-wide Kubeflow doc sprint in July. Contributions increased significantly during that period, and stayed high for a while afterwards.
- Contributions picked up towards the Kubeflow v0.7 release, which happened in early November.
- In mid November, we ran a doc fixit for external tech writers at the Write the Docs conference in Australia. That fixit causes the large spike at the right-hand edge of the graph.
We need to run more doc sprints and fixits!
A product stakeholder requested information about the sources of website traffic. I haven’t yet figured out any related strategies.
In the period August 1 – November 1, 2019, close to 60% of the website traffic came from organic search. Referrals accounted for 22%.
I looked at the referrals, and found that the largest percentage (29%) of referrals come from GitHub. This is not surprising, given that the source code for the product is also on GitHub. The next-largest percentage is 8.5%, coming from a related doc site.
The top 10 search terms are primarily variations of the product name, “Kubeflow”, with one outlier: “minikf” at number 4. MiniKF is a deployment tool for Kubeflow.
More analytics tips?
If you have any analytics tips or experiences to share, I’d love to hear them. Links are welcome!
This week I’m attending STC Summit 2017, the annual conference of the Society for Technical Communication. These are my notes from one of the sessions at the conference. All credit goes to the presenter, and any mistakes are mine.
Allie Proff‘s session had the intriguing title of “My Android Dreams of Electric Cats: Are You Capturing Your User’s Emotive Analytics?”
Allie took us through a fast-paced view of analytics and emotions. She started by looking at traditional analytics: bounce rate, time on page, number of views, and so on. But these measure the “what”, and not the “why”. The “why” is emotions: how the readers are feeling when they come to the docs.
She talked about emotions, why they’re important, and the science of emotions. She told the story of Phineas Gage, who had an iron tamping rod driven all the way through his brain, and lived to tell the tale. Later studies have shown that when you damage the areas of the brain that connect your emotions to your logic, you can’t make decisions. You can list pros and cons, but not make the decision.
We actually use the emotional part of our brain to make a decision, then use our logic to justify that decision. Emotions engage more of your brain than logic: 7 areas as opposed to 2.
Significance for technical documentation: Story telling engages emotions, which makes it very powerful. User experience focuses on delight. Gamification is a specific example of engaging emotions.
Emotive analytics, also called emolytics or emotional analytics, is the ability to measure the emotions of your reader, for example through their face, voice, wearables, bio-feedback, or text. For example, Facebook infers emotion from people’s updates.
Affective software is software that can analyse a user’s emotions and provide appropriate responses. As a simple example, you might display radio buttons asking how the user is feeling, then provide textual help based on the answers. Allie gave the example of cheery text delivered when the user is filling in a tax return, if the user says they have children.
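The simple radio-button case could look something like the sketch below: the reader picks a mood, and the page chooses help text to match. The mood names, the help messages, and the function name helpTextFor are all invented for this illustration. They are not from Allie’s presentation.

```javascript
// Hypothetical mapping from a self-reported mood (for example, the value of
// a selected radio button) to the tone of help text shown to the reader.
const HELP_BY_MOOD = new Map([
  ["frustrated", "Sorry this is proving difficult. Here are the three most common fixes."],
  ["confused", "Let's take it one step at a time. Start with the overview."],
  ["confident", "Great! Here's the quick reference."],
]);

const DEFAULT_HELP = "Here's the help page for this form.";

function helpTextFor(mood) {
  // Fall back to neutral text when the mood is missing or unrecognised.
  return HELP_BY_MOOD.get(mood) ?? DEFAULT_HELP;
}

console.log(helpTextFor("frustrated"));
console.log(helpTextFor("sleepy")); // unrecognised mood, so the default text
```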
A more complex example is voice to text software, which can analyse your words and meaning as it processes the input. Beyond Verbal does voice analytics. Their main focus is health care. You talk into the app, and it tells you how you are feeling, based on your tone, with a view to telling whether someone is well or sick.
Then there’s face detection software, which discovers a face in an image. CV Dazzle is a website where you can find out how to trick face detection software. For example, cover up the bridge of your nose between your eyes, and add asymmetrical patterns. Sunglasses don’t work. Affectiva provides software (Affdex) that can quantify emotion, such as joy, surprise, or anger, based on your face as you watch a video. There are SDKs available for developers to use. A cat scored 99% disdain.
There are a number of companies providing affective software. Allie’s presentation deck lists a number of them.
Allie also showed us some companies producing robots that show or teach emotions to some extent.
Thanks for a fun and informative session, Allie!
This week an analytics ninja showed me how to use Google Analytics to track the values entered into a text field. It comes down to sending a dummy page name to Google Analytics, containing the value entered into the field. Google Analytics faithfully records a “page view” for that value, which you can then see in the analytics reports in the same way that you can see any other page view. Magic.
For example, let’s say you have a search box on a documentation page, allowing readers to search a subset of the documentation. It would be nice to track the most popular search terms entered into that field, as an indication of what most readers are interested in. If people are searching for something that is already documented, you might consider restructuring the documentation to give more prominence to that topic. And how about the terms that people enter into the search box without finding a match? The unmatched terms might indicate a gap in the documentation, or even give a clue to functionality that would be a popular addition to the product itself.
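Once the search terms are collected by whatever means, finding the most popular terms and the ones with no match is a matter of counting. The sketch below assumes each search is recorded as an object with a term and a resultCount. That data shape and the function name tallySearchTerms are assumptions for illustration only.

```javascript
// Counts how often each term was searched and how often it found nothing.
// Terms are trimmed and lower-cased so that "Macros " and "macros" match.
function tallySearchTerms(searches) {
  const counts = new Map();
  for (const { term, resultCount } of searches) {
    const key = term.trim().toLowerCase();
    if (key === "") continue; // ignore empty input
    const entry = counts.get(key) ?? { term: key, searches: 0, unmatched: 0 };
    entry.searches += 1;
    if (resultCount === 0) entry.unmatched += 1;
    counts.set(key, entry);
  }
  // Most-searched terms first.
  return [...counts.values()].sort((a, b) => b.searches - a.searches);
}

// Made-up sample searches.
const tally = tallySearchTerms([
  { term: "macros", resultCount: 12 },
  { term: "Macros ", resultCount: 12 },
  { term: "dark mode", resultCount: 0 },
  { term: "macros", resultCount: 12 },
  { term: "dark mode", resultCount: 0 },
]);
console.log(tally);
```

Entries where unmatched is greater than zero are the candidates for a documentation gap.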
It turns out that you can track input values via Google Analytics. The trick is to make a special call to Google Analytics, triggered when the input field loses focus (onblur):
<input onblur="ga('send', 'pageview', 'my-page-name?myParam=' + this.value);" />
The ga call sends a customised page view to Google Analytics, passing a made-up page name that you can track separately from the page on which the input field occurs. The made-up page name is a concatenation of a string ('my-page-name?myParam=') plus the value typed into the input field (this.value).
The my-page-name part can contain any value you like. It’s handy to use the title of the page on which the input box occurs, because then you can see all the page views in the same area of Google Analytics.
Similarly, the part that contains the input text can have any structure you like. For example, if the page is called “Overview” and the input field is a search box, the Google Analytics call could look like this:
<input onblur="ga('send', 'pageview', 'overview?searchText=' + this.value);" />
This blog post assumes you have already set up Google Analytics for your site. See the Google Analytics setup guide. The Google Analytics documentation on page tracking describes the syntax of the above “ga” call, part of “analytics.js”.
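One refinement to consider for the examples above: the typed value is concatenated into the page name as-is, so characters such as & or # in the input could change how the made-up page name is read. A minimal sketch of a helper that encodes the value first follows. The helper name virtualPagePath is hypothetical, and encodeURIComponent is the standard JavaScript function.

```javascript
// Builds the made-up page name, URL-encoding the parameter name and the
// typed value so that special characters don't break the query string.
function virtualPagePath(pageName, paramName, value) {
  return (
    pageName +
    "?" +
    encodeURIComponent(paramName) +
    "=" +
    encodeURIComponent(value.trim())
  );
}

console.log(virtualPagePath("overview", "searchText", "page tree & macros"));
// overview?searchText=page%20tree%20%26%20macros
```

With this helper loaded on the page, the handler could become onblur="ga('send', 'pageview', virtualPagePath('overview', 'searchText', this.value));".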
This week I took a look at the top 100 pages in the Confluence documentation, as reported by Google Analytics on our documentation wiki. Google Analytics data is nice-looking as well as interesting, and I thought you’d like to see the results too. So here goes.
The analysis covers the pages in the documentation space for the latest version of Confluence only – the “DOC” space on our documentation site. The analysis does not include the documentation for any other products on the site (such as JIRA or Bitbucket) and it does not include earlier versions of the Confluence documentation either.
I chose to do the analysis over a period of two months: 16 August to 16 October 2012. In the middle of that period was the release of Confluence 4.3, on 4 September. As the spike in the graph shows, something happened on 3 October too!
Inferences drawn from the Google Analytics results
I’ve drawn some conclusions which will help me in restructuring the documentation space and prioritising work on the documentation:
- Information about specific, complex topics tops the bill:
  - JAVA_HOME variable (at position 1, this is the most popular page. It’s likely that many readers are not even looking for information about Confluence specifically.)
  - Wiki markup (position 4)
  - Working with tables (9)
  - Integrating JIRA and Confluence (10)
- Installation and upgrade come next:
  - Upgrading Confluence (5)
  - Installation Guide (8)
- Release notes are popular:
  - Confluence 4.3 release notes (7)
- “Getting started” information is popular:
  - Confluence User’s Guide (6)
  - Confluence 101 (23)
  - Getting Started with Confluence (24)
  - Dashboard (43)
  - About Confluence (54)
- Both “Confluence 101” and “Getting Started with Confluence” are popular. It may benefit customers to merge these two documents. Both need a refresh. We need to better define purpose and audience of each.
- Something big happened on 3 October – possibly the launch of a “collaboration” campaign by our marketing team, heralded by this blog post: Collaboration Best Practices – 3 Reasons Interruptions Hurt Your Team’s Productivity.
The pretty picture
Summary of statistics for the whole space
|Page Views|Unique Page Views|Avg. Time on Page|Entrances|Bounce Rate|% Exit|
Detailed statistics for the top 100 pages
Due to problems with the fixed width theme of this blog, I’ve split the table in two. First the list of top 100 pages:
And now the figures for each page:
|Page Views|Unique Page Views|Avg. Time on Page|Entrances|Bounce Rate|% Exit|