WtD Prague: Localisation of open source docs
This week I’m attending Write the Docs Prague. It’s super exciting to attend a European Write the Docs conference, and to be visiting the lovely city of Prague. This post contains my notes from a talk at the conference. All credit goes to the presenter, any mistakes are my own.
Zachary Sarah Corleissen‘s talk was titled, “Found in Translation: Lessons from a Year of Open Source Localization”.
[From Sarah Maddox, author of this blog: Localisation is the process of translating content to different languages, and of adapting the content to take regional idioms and linguistic customs into account.]
Zach’s experience comes from localising the Kubernetes docs.
Advantages of localisation
Zach discussed the advantages of localising an open source project. Localisation opens doors to a wider audience. It’s a tool to drive adoption of a product. Localisation also offers the opportunity for more people to contribute new features to a product. It therefore distributes power within the open source project.
When the Kubernetes docs managers considered localising the docs, they made some assumptions that later proved to be unfounded. For instance, they thought the localisation contributors would contribute only to their language. That proved not to be the case. Localisation contributors update the English source as well as their own language source, and they also help other localisation teams get started. For example the French teams help other teams get started with localisation infrastructure, and groups of related languages get together to define grammatical structures for technology-specific terms, such as “le Pod”. Thus the localisation contributors embody the best of open source contributions.
Localised pages increase the number of page views, which is a good thing for a doc set. Zach showed us some stats from Google Analytics with some impressive numbers. Each language added around 1% page views, which represents a big number in a doc set as large as Kubernetes.
Zach said we should also consider the support ratio that the localised docs provide. For example, there are 8 localisation contributors for the Korean docs, catering for 55,187 readers. So, 8 : 55,187 is a ratio of 1 : 6,900.
These are some of the nuggets of advice Zach shared:
- Let each of the local teams determine for themselves how they create the localised content. That fits in best with open source philosophy, and the local teams know their local customs best.
- The Kubernetes project does require that the localisation teams adhere to the Kubernetes code of conduct, and that the code of conduct is one of the first docs translated.
- Bottlenecks include infrastructure, permissions, and filtering by language. You need to put solutions in place to manage these bottlenecks.
- Trust is essential to collaboration.
- To make it possible for a high level of mutual trust, make sure the boundaries are clear, and be careful with the permissions that you assign in the repository.
- Choose a site generator that has strong multi-language support. A good one is Hugo. Jekyll makes things very difficult.
- Filter the issues and pull requests by language. Zach doesn’t know of any good tools for this filtering. (If you know of any, shoot him a tweet.) Zach mentioned some possibilities from the Kubernetes world: Prow is a possibility but it’s a heavyweight tool for just localisation. Another option is zparnold’s language labeler.
- Use version control to review and approve by language. Require review and approval from a user in a different company from the submitter of the pull request.
Some cautionary tales:
- Look out for raw machine-generated content.
- Make sure the translators are not being exploited as free labour. Even if you’re not directly engaging the translators, take steps to ensure ethical content.
I learned a lot from this session. It was especially relevant as we’re starting to consider localisation of the Kubeflow docs which I work on. Thank you Zach for a very informative session.