Blog Archives

WtD Prague: Localisation of open source docs

This week I’m attending Write the Docs Prague. It’s super exciting to attend a European Write the Docs conference, and to be visiting the lovely city of Prague. This post contains my notes from a talk at the conference. All credit goes to the presenter, and any mistakes are my own.

Zachary Sarah Corleissen’s talk was titled, “Found in Translation: Lessons from a Year of Open Source Localization”.

[From Sarah Maddox, author of this blog: Localisation is the process of translating content to different languages, and of adapting the content to take regional idioms and linguistic customs into account.]

Zach’s experience comes from localising the Kubernetes docs.

Advantages of localisation

Zach discussed the advantages of localising an open source project. Localisation opens doors to a wider audience. It’s a tool to drive adoption of a product. Localisation also offers the opportunity for more people to contribute new features to a product. It therefore distributes power within the open source project.

When the Kubernetes docs managers considered localising the docs, they made some assumptions that later proved to be unfounded. For instance, they thought the localisation contributors would contribute only to their own language. That proved not to be the case. Localisation contributors update the English source as well as their own language source, and they also help other localisation teams get started. For example, the French team helps other teams get started with localisation infrastructure, and groups of related languages get together to define grammatical structures for technology-specific terms, such as “le Pod”. Thus the localisation contributors embody the best of open source contributions.

Localised pages increase the number of page views, which is a good thing for a doc set. Zach showed us some stats from Google Analytics with some impressive numbers. Each language added around 1% of page views, which represents a big number in a doc set as large as Kubernetes.

Zach said we should also consider the support ratio that the localised docs provide. For example, there are 8 localisation contributors for the Korean docs, catering for 55,187 readers. So, 8 : 55,187 is a ratio of 1 : 6,900.

Advice

These are some of the nuggets of advice Zach shared:

  • Let each of the local teams determine for themselves how they create the localised content. That fits in best with open source philosophy, and the local teams know their local customs best.
  • The Kubernetes project does require that the localisation teams adhere to the Kubernetes code of conduct, and that the code of conduct be one of the first docs translated.
  • Bottlenecks include infrastructure, permissions, and filtering by language. You need to put solutions in place to manage these bottlenecks.
  • Trust is essential to collaboration.
  • To make a high level of mutual trust possible, make sure the boundaries are clear, and be careful with the permissions that you assign in the repository.
  • Choose a site generator that has strong multi-language support. A good one is Hugo. Jekyll makes things very difficult.
  • Filter the issues and pull requests by language. Zach doesn’t know of any good tools for this filtering. (If you know of any, shoot him a tweet.) Zach mentioned some options from the Kubernetes world: Prow is a possibility, but it’s a heavyweight tool for just localisation. Another option is zparnold’s language labeler. (There’s a sketch of the labelling idea after this list.)
  • Use version control to review and approve by language. Require review and approval from a user in a different company from the submitter of the pull request.
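Here’s a minimal sketch (my own, not something Zach showed) of the language-labelling idea from the list above: label a pull request according to which language directories it touches. It assumes a docs layout of content/<lang>/… and labels named language/<lang>, both of which are hypothetical conventions for this example, and it calls the GitHub REST API; the repository name and PR number are placeholders.

    # Minimal sketch of a language labeler for localisation pull requests.
    # Assumptions (not from the talk): docs live under content/<lang>/...,
    # and labels follow the pattern "language/<lang>".
    import os
    import requests

    GITHUB_API = "https://api.github.com"
    REPO = "kubernetes/website"          # placeholder repository
    TOKEN = os.environ["GITHUB_TOKEN"]   # token with permission to label PRs
    HEADERS = {
        "Authorization": f"token {TOKEN}",
        "Accept": "application/vnd.github+json",
    }

    def languages_touched(pr_number: int) -> set[str]:
        """Return the language codes whose files the pull request modifies."""
        url = f"{GITHUB_API}/repos/{REPO}/pulls/{pr_number}/files"
        files = requests.get(url, headers=HEADERS).json()  # first page only, for brevity
        langs = set()
        for changed_file in files:
            parts = changed_file["filename"].split("/")
            if len(parts) > 1 and parts[0] == "content":
                langs.add(parts[1])  # e.g. content/fr/docs/... -> "fr"
        return langs

    def label_pr(pr_number: int) -> None:
        """Apply one language/<code> label per language the pull request touches."""
        labels = [f"language/{lang}" for lang in languages_touched(pr_number)]
        if labels:
            url = f"{GITHUB_API}/repos/{REPO}/issues/{pr_number}/labels"
            requests.post(url, headers=HEADERS, json={"labels": labels})

    if __name__ == "__main__":
        label_pr(12345)  # hypothetical pull request number

With labels like these in place, reviewers and approvers for each language can filter the pull request queue down to their own language.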

Some cautionary tales:

  • Look out for raw machine-generated content.
  • Make sure the translators are not being exploited as free labour. Even if you’re not directly engaging the translators, take steps to ensure ethical content.

Thanks

I learned a lot from this session. It was especially relevant as we’re starting to consider localisation of the Kubeflow docs, which I work on. Thank you Zach for a very informative session.

Intelligent content at stc17

This week I’m attending STC Summit 2017, the annual conference of the Society for Technical Communication. These are my notes from one of the sessions at the conference. All credit goes to the presenter, and any mistakes are mine.

Val Swisher presented a session called “The Holy Trifecta of Intelligent Technical Content”. The trifecta comprises structured intelligent technical content, terminology management, and translation memory. With these three, technical writers can efficiently produce content for multiple channels, for an international audience.

Val explained each of the three elements (structured content, source terminology management, and translation memory) and the magic that happens when you use them all together. Using the three together makes content development better, cheaper, and faster.

Structured authoring

Val walked us through the original content development process, where a writer wrote the content, then passed it off for translation and desktop publishing. This process was slow, expensive, and gave the writer little control.

In a structured environment, the author writes smaller chunks of content (sometimes called topics) and checks them into a CMS. An information product (PDF file, web page, book, etc.) is a collection of these chunks in a certain order. In theory, you should be able to combine the chunks in different orders and arrangements for different content products.

Structured authoring should therefore produce more deliverables through content reuse, create consistency, and support multichannel publishing.
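As a toy illustration of that reuse (my own sketch, not part of Val’s talk), the same chunks can be assembled in different orders to build different deliverables. The chunk names and deliverables below are invented for the example.

    # Toy sketch of topic reuse: the same chunks, assembled in different
    # orders, produce different information products.
    chunks = {
        "install": "To install the product, run the installer...",
        "configure": "Open the settings page and enter your licence key...",
        "upgrade": "Back up your data before upgrading...",
        "troubleshoot": "If the service does not start, check the logs...",
    }

    deliverables = {
        "getting-started-guide": ["install", "configure"],
        "admin-handbook": ["install", "configure", "upgrade", "troubleshoot"],
    }

    for name, order in deliverables.items():
        body = "\n\n".join(chunks[topic] for topic in order)
        print(f"=== {name} ===\n{body}\n")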

The content itself is separated from the eventual publication style and medium. Desktop publishing is a thing of the past.

Each individual chunk is translated independently. Each chunk now sits in the database with its related translations.

There are a few problems to solve, in particular terminology. For example, what do you do to a button: click, click on, tap, select, hit…? We’re not consistent in our use of terminology in our source.

Source terminology

We need to manage our source terminology. People do it in various ways, such as via a document or style guide, via reviews (tribal knowledge), or via a specific tool.

Val emphasised the importance of picking one term for a particular thing or concept. For example, when talking about a dog, choose a word: dog, pooch, hound – it often doesn’t matter which term you pick, provided you’re consistent.

No-one reads style guides! Everyone wants to, because we all want to do a great job. But no-one has the time. Also, it’s hard to know whether the word you’re about to write is a managed term.

We need a way to manage the words we’re using and how we’re using them, without having to go looking for that information. The information must be pushed to us.

It’s almost better not to have structured authoring if you don’t manage your terminology. We split the topic development amongst a group of writers, which leads to greater problems with consistency. Val showed us a screenshot from an automated terminology tool, which allows you to define preferred terms, banned terms, etc, and then prompts the authors when they use a deprecated word.
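To show what that kind of prompting boils down to, here’s a toy terminology checker (my own sketch, not the tool Val showed): it maps deprecated terms to preferred ones and flags any deprecated term it finds. The term list is invented, and the matching is naive substring matching for brevity.

    # Toy terminology checker: flag deprecated terms and suggest the
    # preferred term. Naive substring matching, for illustration only.
    PREFERRED = {
        "click on": "click",
        "hit": "click",
        "pooch": "dog",
        "hound": "dog",
    }

    def check_terminology(text: str) -> list[str]:
        """Return a warning for each deprecated term found in the text."""
        warnings = []
        lowered = text.lower()
        for banned, preferred in PREFERRED.items():
            if banned in lowered:
                warnings.append(f"Use '{preferred}' instead of '{banned}'.")
        return warnings

    for warning in check_terminology("Hit the button, then click on Save."):
        print(warning)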

Translation memory

Val asked the audience whether we had translation memory (TM), whether our company owned the translation memory, whether we had more than one translation vendor, and whether those vendors shared the same memory. She stressed the importance of owning your own translation memory.

Translation memory (TM) is one of the automated tools that the translation vendor uses. If something in the source content has already been translated, the tool pops up the translation. This is because the translations are stored in a database called the translation memory. The bits of source content are stored as translation units, which are phrases, usually more than a word.

This makes translation cheaper. If you say the same thing in exactly the same way each time you say it, the tool pulls up the same translation as used the first time. This is called a 100% match. Note that a 100% match doesn’t cost zero dollars. To have no charge, you have to have an in-context match.
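To make the idea of translation units and 100% matches concrete, here’s a toy sketch of my own (real TM tools also handle fuzzy and in-context matches, which this ignores):

    # Toy translation memory: store translation units and return the stored
    # translation on a 100% (exact) match.
    class TranslationMemory:
        def __init__(self):
            self._units = {}  # source phrase -> translated phrase

        def add(self, source: str, target: str) -> None:
            """Store a translation unit."""
            self._units[source] = target

        def lookup(self, source: str) -> str | None:
            """Return the stored translation on a 100% match, else None."""
            return self._units.get(source)

    tm = TranslationMemory()
    tm.add("Click Save to keep your changes.",
           "Cliquez sur Enregistrer pour conserver vos modifications.")
    print(tm.lookup("Click Save to keep your changes."))   # 100% match: reuse the translation
    print(tm.lookup("Click Save to store your changes."))  # no exact match: needs a translator

Saying the same thing in exactly the same way each time is what turns these lookups into 100% matches, which is where the cost saving comes from.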

Val emphasised that you should make sure that what’s in the TM is pushed to the writers, although she knows of very few companies that are doing this. That way, writers would know what’s already been translated and could make sure they use the same terms when developing new content.

Ideally, there’d be an automated link from the translation memory to the terminology management system. But that’s complicated, and doesn’t happen often.

Tying them together

Val discussed the intersection of three technology areas:

  • Structured authoring – write it once, use it many times.
  • Terminology management – say the same thing the same way, every time you say it. Be as boring as you can and as simple as you can.
  • Translation memory – use already-translated terms in your source content.

This takes a lot of setup and maintenance, but it’s worth it.

Conclusion

Val’s presentation was funny, engaging, and informative. She had the audience nodding and laughing throughout the session. Thanks Val!

What languages do our readers speak – from Google Analytics

I’ve grabbed some Google Analytics statistics about the languages used by visitors to the Atlassian documentation wiki. The information is based on the language setting in people’s browsers. It’s a pretty cool way of judging whether we need to translate our documentation!

The statistics cover a period of 3 months, from 7 September to 7 December 2012.

Summary

Approximately 30% of our readers speak a language other than English. The most popular non-English language is German (approximately 7%), followed by French (approx 2.6%). Japanese is hard to quantify, because we have separate sites for Japanese content.

The pretty picture

This graph shows the results for the top 10 locales:

Top 10 locales via Google Analytics

The grey sector represents a number of smaller segments, each one below 1%. In Google Analytics, I can see them by requesting more than 10 lines of data.

The figures

Here are the figures that back the above graph:

    Locale    Number of visits    Percentage of total
 1. en-us            1,951,818                 66.75%
 2. en-gb              163,897                  5.60%
 3. de                 105,526                  3.61%
 4. de-de              102,578                  3.51%
 5. fr                  77,666                  2.66%
 6. ru                  66,342                  2.27%
 7. zh-cn               38,850                  1.33%
 8. en                  38,826                  1.33%
 9. es                  37,129                  1.27%
10. pl                  30,064                  1.03%
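As a quick worked check (my own sketch, not part of the original analysis), grouping the locale variants in the table above by their base language shows how the summary figures come about; for example, de and de-de together make up the roughly 7% quoted for German. The 89.36% figure is simply the sum of the top-10 percentages above.

    # Group locale variants (de, de-de, ...) by base language and compute
    # each language's share of all visits.
    visits = {
        "en-us": 1_951_818, "en-gb": 163_897, "de": 105_526, "de-de": 102_578,
        "fr": 77_666, "ru": 66_342, "zh-cn": 38_850, "en": 38_826,
        "es": 37_129, "pl": 30_064,
    }
    total_visits = sum(visits.values()) / 0.8936  # top 10 rows cover 89.36% of all visits

    by_language = {}
    for locale, count in visits.items():
        base = locale.split("-")[0]
        by_language[base] = by_language.get(base, 0) + count

    for lang, count in sorted(by_language.items(), key=lambda kv: -kv[1]):
        print(f"{lang}: {count / total_visits:.2%}")
    # de comes out at about 7.1% (3.61% + 3.51%), matching the summary above.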

More Google Analytics?

Google Analytics is a useful tool. If you’re interested in a couple more posts about it, try the Google Analytics tag on this blog. I hope the posts are interesting. 🙂

Translation interoperability at Tekom tcworld 2012

I’m at Tekom tcworld 2012, in Wiesbaden. These are my notes from a session by Arle Lommel titled, “Linport: A New Standard for Translation Interoperability”.

Arle published this blurb about the session:

In 2011 the Globalization and Localization Association, the European Commission Directorate General for Translation, and the Brigham Young University Translation Research Group began work on an open, standards-based container for translation projects. Known as the Linport Project, it has brought together language technology developers who have agreed to implement the resulting specification. Additional collaboration with the Interoperability Now! group is leading toward a joint specification that promises to overcome technical fragmentation that leads to inefficiency in translation processes. This presentation describes the Linport format and its use in a globalization production chain.

Note: Linport is not yet a standard. They are working towards its becoming a standard.

What is Linport?

Linport is the Language INteroperability PORTfolio.

At present Linport consists of two related zip-encoded formats:

  • A portfolio for presenting translation projects.
  • A package for representing individual tasks. Packages can be generated from portfolios, and you can combine packages to create a portfolio.

Linport also contains standardised metadata, which is very important in translation.

Linport is a format that can be applied to various types of content in various file formats. It’s based on recognised open standards such as XML, XLIFF, and TMX. It allows you to standardise, but doesn’t force standardisation where it’s not possible.

It supports tasks such as translation, proof-reading, and authoring.

The intention is that Linport be used at any point where you need to exchange data, such as between clients and translation vendors, and between translators and their employers. It allows all relevant resources to be sent together. It streamlines the transmission of data and reduces costs and confusion.

Goals

Arle summarised the aim of Linport: Simplified interchange.

Arle described the current processes in the translation industry: very manual, with people transferring data haphazardly. Individual small items are transferred, with a lot of manual routing and work. This adds costs, time, and potential for error.

As a result, only a small portion of the time and cost in a translation project currently goes into the actual translation: approximately 30%. Translators themselves only spend about 50% of their time doing the actual translation. So, it’s possible that only about 15% of what you pay for translation (50% of that 30%) goes into the actual translation.

The real cost savings therefore would come from speeding up processes. Eliminate the manual transactions.

Arle compared the goals to shipping containers. A very simple specification (dimensions, strength, and corner locking devices) yields a very powerful result.

Linport aims to be the “shipping container” for the translation and localisation community.

Specific features: structured translation specifications

A definition of quality:

A quality translation achieves needed accuracy and fluency, while meeting specifications that are appropriate to end-user needs.

Arle showed us the specific standards that Linport is based on. It’s designed to eliminate most points of confusion.

You can go to this URL to get the specs: www.ttt.org/specs

At the heart of Linport is the idea that, when running a translation project, we document everything up front, so that there is no possibility of confusion. For example, document the required target language. This seems obvious, but Arle has seen occasions where that is not given.

Areas of the specifications are:

  • A: Linguistic. These define the requirements for the source content and the target content. For example, for the source: textual characteristics, specialised language, volume, complexity. On the target side: the target language, audience, purpose, file format, and so on.
  • B: Production. For example, is it OK to use machine translation? Which translation memory should be used?
  • C: Environment. Special workplace requirements, such as a secure facility. Technology. Reference materials.
  • D: Relationships. Payment, copyright, communication.

Linport provides STS files: Structured Translation Specifications. These are standardised XML files. The assumption is that people will use templates to supply the information.
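As an illustration only, here’s how such a specification might be generated as XML; the element and attribute names below are my own placeholders, not the normative Linport schema, and the values are invented.

    # Illustrative sketch: build a tiny STS-like XML file covering the four
    # specification areas (linguistic, production, environment, relationships).
    import xml.etree.ElementTree as ET

    spec = ET.Element("structuredTranslationSpecifications")
    areas = {
        "linguistic": {"source-language": "en", "target-language": "de",
                       "audience": "administrators"},
        "production": {"machine-translation-allowed": "no",
                       "translation-memory": "client-owned"},
        "environment": {"secure-facility-required": "no",
                        "reference-materials": "style guide v3"},
        "relationships": {"payment": "per word",
                          "copyright": "client retains all rights"},
    }
    for area, items in areas.items():
        area_element = ET.SubElement(spec, "area", name=area)
        for key, value in items.items():
            ET.SubElement(area_element, "parameter", name=key).text = value

    ET.ElementTree(spec).write("sts-example.xml", encoding="utf-8",
                               xml_declaration=True)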

Tool demo

Arle gave us a demo of an open source tool under development in the Linport project, to make the formats accessible to people. The tool presents a form for you to complete, supplying the data required for the specifications.

Another function allows you to create a TIPP file, which is the one that goes out to be translated. You can choose the format from some known standards. A TIPP file is a zip file, containing:

  • a manifest,
  • the source text to be translated,
  • and the structured specifications.

The TIPP file can be more complex, for example including the translation memory.
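To show the general shape of such a package (my own sketch; the file names and layout are illustrative, not the exact structure defined by the TIPP specification), here’s how the three parts could be bundled into a zip:

    # Sketch of a TIPP-style package: a zip archive bundling a manifest, the
    # source text to be translated, and the structured specifications.
    import zipfile

    def build_package(path: str, manifest_xml: str, source_text: str, sts_xml: str) -> None:
        """Write a zip containing the three parts described in the talk."""
        with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as package:
            package.writestr("manifest.xml", manifest_xml)
            package.writestr("source/document.txt", source_text)
            package.writestr("specifications/sts.xml", sts_xml)

    build_package(
        "example-task.tipp",
        manifest_xml="<manifest><task type='translate' target-lang='fr'/></manifest>",
        source_text="Click Save to keep your changes.",
        sts_xml="<sts><parameter name='target-language'>fr</parameter></sts>",
    )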

Go to app.linport.org to try the tools yourself. Sign up for an account. It’s entirely free and open, and Linport will value your input. You can also send feedback to info@linport.org.

Current state

The team is finalising the structural details. They hold monthly conference calls, mainly to resolve minor details. Arle feels that they are approaching a stable specification (not available yet).

They’re working on early implementation, and already seeing early adoption. Some organisations are in beta stages of the implementation. Linport is working to gain implementation commitments, and some organisations are close to giving those commitments.

The team is building the online Linport tools, which will be open source. All the documentation and tools will be available free of charge. See the demo and URL above.

They’re also moving towards testing with real data in larger volumes.

Further development

The team is working towards finalising all the small details of formatting. When they reach consensus, they want to submit Linport to a standards body: ETSI or OASIS.

They also plan to improve the Linport apps, making them more user-friendly, and to implement Linport in more tools and in more DGT production tasks.

You can go to linport.org for details about getting involved.

Selecting a translation vendor at Tekom tcworld 2012

I’m at Tekom tcworld 2012, in Wiesbaden. This morning I’m attending a session called “Considerations in Translation Vendor Selection”, by Bernard Aschwanden. This is a topic close to my heart, as I’m keen to start planning for the translation of our own documentation.

Here is the blurb that Bernard published for his session:

When a company identifies a need for documentation to be translated into new languages for both existing customers and new customers it is important to ensure you choose the right translation vendor. In doing so, it is necessary to identify options (with associated costs and risks) for meeting current demands, processes for handling future translation requests, and a big-picture strategy for documentation translation needs across product lines and worldwide needs. Learn about key considerations in vendor selection, and identify the factors that matter most to a successful partnership.

Bernard joined us via a remote connection. He became sick just before the conference began, so he recorded his presentation, and joined us via Skype to answer questions. What’s more, it was 4 a.m. for him, so kudos that he was able to string a coherent sentence together!

Role of a translation vendor

The role of the vendor is to be a partner, working with you to identify your needs and manage people and processes. They should help with localisation as well as translation. Localisation means making the images and concepts and ideas understandable to a local audience.

The vendor should always provide you with the translation memory. If they don’t provide it, don’t use the vendor.

Should you translate yourself or outsource?

Some things to consider:

  • Are you comfortable with sending your content outside?
  • Are you happy with the changes in processes that will be required?
  • What are the associated costs, both short and long term?

Stakeholders

Make sure you identify everyone who is involved: Reviewers, authors, translators, managers, outside vendors, your clients… the list is long.

Ways to get started

  • Talk to people at conferences and otherwise make use of word of mouth.
  • Join interest groups.
  • Read industry articles.
  • Do web searches.

Then narrow down your options, by Googling the vendors, checking their websites, sending them an email.

Make sure you have a good list of questions to ask potential vendors, based on what’s important to you.

Start building relationships with a short list of vendors. Schedule a demo, set up in-person meetings. Ask them to walk you through the process. Then discuss the results with your team.

Important questions

These are some initial questions to ask the vendor:

  • Does the company outsource its work, and to whom?
  • What languages do they manage, and which are their specialities?
  • What industry do they specialise in?
  • Do they have a recent client list and references?
  • What is the rate of turnover for the translators?
  • What is their industry ranking?

Bernard then took us through a number of more specific questions. If you’re intending to take this further, it’s worth getting the list of useful questions to investigate. The questions revolved around fees, technology and tools, managing of the translators, workflow, and more.

It’s important to note how responsive the vendor is to your questions, and what your overall impression is of the organisation.

Getting a sample translated

Send the vendor a sample and ask them to do a translation. Make sure the sample is realistic.

Model the workflow. Assess the result, and show it to the stakeholders who will use it.

Have a systematic way of scoring and comparing the results of the chosen vendors.

Costs

It’s important to identify the total costs. Make sure you specify which parts of the job you will do in-house and which parts you will outsource.

Find out about these costs:

  • Per word.
  • Total engineering effort to set up the system.
  • Editing and proofing.
  • Project management.
  • Layout, graphics, tables.
  • Review of the material.

You can save on costs by:

  • Content re-use.
  • Translation memory. Words that are matched one-to-one will need reviewing, but not full service. Make sure you own the translation memory.
  • Using just one firm. You may get a discount for doing all languages with one vendor.
  • Doing some of the tasks in house. For example, in-house review by a subject matter expert, or layout.

Common mistakes

Bernard closed with some points to note:

  • Know what you need. Go in prepared. Be up front about what you need, so that the vendor knows everything they need to know.
  • Build trust, on both sides.
  • Get the credentials of the vendor. Check their online presence, including Facebook and Twitter. This will give an idea of how professional they are and what image they are putting out for people to see.
  • Find out all of the costs.
  • Don’t let the vendor use you as a test case.
  • Make sure the vendor has experience in the industry you work in.
  • Understand their quality processes, and make sure you know how they handle verification.

At the end, there was plenty of animated discussion from the floor. Audience members included translators, translation vendors, and people interested in hiring a vendor. This is a topic dear to many people’s hearts!
