A portal for life science training resources

This week I’m attending a conference titled Collaborations Workshop 2019, run by the Software Sustainability Institute of the UK. The conference focuses on interoperability, documentation, training and sustainability. I’m blogging my notes from the talks I attend. All credit goes to the presenter, and all mistakes are my own.

Niall Beard from the University of Manchester presented a session on “TeSS: The ELIXIR Training Portal”.

TeSS is a life science research training portal that provides access to the training tools and platforms from various universities and other institutions around the world. TeSS provides metadata and a classification scheme for the items in the registry. The items are divided into two overarching categories: events and materials.

Getting data into TeSS

TeSS provides an online form for adding new training materials. TeSS also pulls in material automatically, using mechanisms such as:

  • Schema.org / Bioschemas
  • Website scraping, but that’s not an efficient or reliable way of gathering data
  • XML schemas, but it’s tricky to get developers to use the XML schema to create content describing their site
  • And more

Schema.org provides a lightweight way of structuring online data. This is the most useful type of integration for TeSS, and is used by other content providers too. Schema.org has plenty of plugins that developers can use to apply the data to their applications.

Bioschemas.org is a community project that creates specifications for life science resources, and proposes those specifications for inclusion into Schema.org. The community is also working on tools to make bioschemas easier to create, and to extract and use the data.
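To make the idea of lightweight structured data more concrete, here is a minimal sketch of how a site might describe a training event with Schema.org markup. It was not part of the talk. `Event`, `Place`, and the properties used are standard Schema.org vocabulary, but the event details, the URL, and the helper names `event_jsonld` and `as_script_tag` are made up for illustration, and a real Bioschemas profile may ask for more properties than this.

```python
import json


def event_jsonld(name, start_date, end_date, url, venue):
    """Return a Schema.org Event description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,  # ISO 8601 date
        "endDate": end_date,
        "url": url,
        "location": {"@type": "Place", "name": venue},
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld):
    """Wrap JSON-LD in the script tag that is embedded in an HTML page."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# Hypothetical event, for illustration only.
snippet = as_script_tag(
    event_jsonld(
        name="Introduction to sequence analysis",
        start_date="2019-06-03",
        end_date="2019-06-04",
        url="https://example.org/training/sequence-analysis",
        venue="Example University",
    )
)
print(snippet)
```

A page that embeds a block like this can be read by any tool that understands Schema.org, which is presumably why it suits an aggregator better than scraping.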

Getting data out of TeSS

TeSS provides widgets that you can use to display TeSS content on a website, and a JSON API for interacting with TeSS programmatically.

Thanks Niall for a good overview of the TeSS training portal.

Wikidata, open data, and interoperability

Franziska Heine presented a keynote on Wikidata, a Wikimedia project that provides structured data to Wikipedia and other data sets. Franziska is Head of Software & Development at Wikimedia Deutschland.

Franziska’s talk was titled “Wikidata, Interoperability and the Future of Scientific Work”.

The Wikidata project

Franziska said she’s very excited to be here and talk about Wikidata, as it’s such a big part of what her team does. She cares about making Wikipedia, which started nearly 20 years ago, into something that remains meaningful in the future.

Wikidata makes interwiki links semantic, so that computers can understand the relationships between the pieces of data. When you ask Siri or Google Assistant a question, the answer often comes from Wikidata. Franziska also showed us a map of the world with a data overlay sourced from Wikidata. (I can’t find a link to that specific map, alas.)

Wikidata has more than 20,000 active editors per month. That’s the highest number in the entire Wikimedia movement, surpassing even the number of edits of the English-language Wikipedia.

How Wikidata works

The core of Wikidata is a database of items. Each item describes a concept in the world. Each item has an ID number (“Q number”). Items also have descriptions and language information. In Wikipedia, the content for each language is completely separate. So, you can have the same topic in various languages, each with entirely different content. By contrast, in Wikidata all the languages are properties of the single data item. So, for example, each item has a description, and the description may be available in various languages.

Each item is also linked to all the various Wikipedia instances.

Each item has a number of statements (pieces of information), such as date of birth, place of birth, date of death, and so on. Each statement lists the sources of the information. It is of course possible that different sources may provide conflicting information about a particular statement. For example, there may be different opinions about the date of birth of a person.
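Here is a rough sketch of that data model, as I understood it from the talk. It is not Wikidata's actual schema or API. The class names and the item itself (including its Q number and sources) are invented for illustration. The one real identifier is P569, Wikidata's property for date of birth.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    prop: str                                     # property ID, e.g. "P569" (date of birth)
    value: str
    sources: list = field(default_factory=list)   # where the claim comes from


@dataclass
class Item:
    qid: str                                           # the item's "Q number"
    descriptions: dict = field(default_factory=dict)   # language code -> description
    statements: list = field(default_factory=list)

    def values_for(self, prop):
        """All values recorded for one property, in insertion order."""
        return [s.value for s in self.statements if s.prop == prop]

    def has_conflict(self, prop):
        """True when sources disagree about the value of a property."""
        return len(set(self.values_for(prop))) > 1


# A made-up person: one item, descriptions in two languages,
# and two sources that disagree about the date of birth.
person = Item(
    qid="Q-EXAMPLE",
    descriptions={"en": "fictional botanist", "de": "fiktive Botanikerin"},
    statements=[
        Statement("P569", "1850-03-01", sources=["Hypothetical parish register"]),
        Statement("P569", "1851-03-01", sources=["Hypothetical biography"]),
    ],
)

print(person.descriptions["de"])
print(person.values_for("P569"))
print(person.has_conflict("P569"))
```

The point of the sketch is that the languages and the competing claims all hang off a single item, rather than living in separate per-language pages.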

Wikidata can be edited by people, but there are also bots that do the updates. The concepts within Wikidata are not built primarily for humans to navigate, but rather for machines to understand. For example, Wikidata is able to give Siri and Google Assistant information in ways that Wikipedia can’t.

But can humans look at the data?

Yes! You can use the Wikidata Query Service to access the data. To get started, grab an example query and then adapt it. The query language is SPARQL.
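For readers who prefer to query from a script rather than the web interface, here is a small sketch in Python. It is my own illustration, not something shown in the talk. The query is a simple well-known one (items that are an instance of house cat, Q146), not one of the queries Franziska presented. The endpoint URL is the public Wikidata Query Service endpoint. The helper names and the User-Agent string are placeholders of my own.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# P31 is "instance of" and Q146 is "house cat".
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def build_url(query):
    """Encode a SPARQL query as a GET request URL asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def extract_labels(result):
    """Pull the ?itemLabel values out of a SPARQL JSON result document."""
    return [row["itemLabel"]["value"] for row in result["results"]["bindings"]]


def run_query(query):
    """Send the query over the network and return the decoded JSON result."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "sparql-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `extract_labels(run_query(QUERY))` sends the query and returns up to five labels. Adapting the sketch to a different question should only mean changing the text of `QUERY`.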

Franziska showed us some interesting query results:

  • The location of trees grown from seeds that have travelled around the moon. 🙂
  • Natural arches around the world
  • Cause of death of members of noble families

The expanding use of Wikidata

Wikidata was created to help the Wikipedia team maintain their data. Over the last few years, Wikidata has become a useful tool for other Wikimedia projects and even other organisations to manage their own data and metadata. Franziska showed a diagram of a future where various wikis can share and interlink data.

Existing projects:

  • The Sum of all Welsh Literature – a project presented by Jason Evans at the WikiCite Conference 2018.
  • Gwiki: Combining Wikidata with other linked databases by Andra Waagmeester and Dragan Espenschied.

Franziska showed us some graphs from the above projects, to demonstrate the research value that comes out of combining data from different large databases and analysing the results. This is what we’re about, she said: opening up data and making it freely accessible.

How interoperability fits in

Interoperability means more than just technical standards. Franziska referred to Mark Zuckerberg’s recent speech about the future of Facebook. Interoperability in his world, she commented, means the ability to communicate with people who are important to you, regardless of which platform they’re on.

Looking at the Gwiki project quoted above: It will connect very different people with each other: different languages, different cultures, different roles (academia, industry, etc). To facilitate this meeting of different worlds, we need to build tools and platforms – this is the social aspect of interoperability.

Instead of independent researchers working in their own worlds, they’ll be able to cooperate across disciplines, provided they have shared metadata or infrastructure. This is the data aspect of interoperability.

In closing

Scientific knowledge graphs are key, said Franziska. They enable data analysis and power artificial intelligence. Semantic data and linked data are core to innovation and research.

We need to be able to provide data in a way that makes sense to people. This is where the infrastructure fits in. We must provide APIs and other interfaces that make it appealing to use and integrate the data. This is the essential infrastructure for free knowledge, so that research can transcend disciplinary silos, and we can make data and research available to everyone.

Thank you Franziska for a very interesting deep dive into Wikidata, interoperability, and open data.

Open data reduces friction in sharing and use of data

This week I’m attending a conference titled Collaborations Workshop 2019, run by the Software Sustainability Institute of the UK. The conference focuses on interoperability, documentation, training, and sustainability. I’m planning to post a blog or two about the talks I attend. All credit goes to the presenter, and all mistakes are my own.

I’m very much looking forward to the conference. The audience is slightly different from the developer-focused and tech-writer-focused gatherings that I see more often. At this conference, attendees are a lively mix of researchers, engineers, educators, and others. The goal of the Software Sustainability Institute is to cultivate and improve research software.

Better software, better research

Opening keynote by Catherine Stihler

Catherine Stihler is the Chief Executive Officer of Open Knowledge International. She presented the opening keynote of the conference.

Catherine’s talk was titled “Transporting data more easily with Frictionless Data”.

Frictionless Data

Frictionless Data is one of the primary initiatives of Open Knowledge International. The website offers this description:

Frictionless Data is about removing the friction in working with data through the creation of tools, standards, and best practices for publishing data using the Data Package standard, a containerization format for any kind of data.

These are the challenges the Frictionless Data initiative addresses:

  • Legal barriers
  • Data quality
  • Interoperability
  • Discoverability (data is hard to find)
  • Tool integration

A goal of Frictionless Data is to provide a common packaging format that can hold many different types of data, so that people can understand and use your data as easily as possible. Catherine used the metaphor of shipping containers to talk about data packages.

  • Publishers can create the data packages, and
  • consumers can plug the data packages into their systems.

There’s more information at the frictionlessdata.io website, including the Frictionless Data specifications and software (apps, integrations, libraries, and platforms).
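To give a feel for what a data package looks like, here is a minimal sketch that builds a descriptor in Python. This is my own illustration and wasn't shown in the keynote. The keys (`name`, `resources`, `path`, `schema`, `fields`) follow my reading of the Data Package and Table Schema specifications, so check the specs before relying on them. The dataset, file path, and helper name `make_descriptor` are invented.

```python
import json


def make_descriptor(name, csv_path, fields):
    """Build a minimal Data Package descriptor for one CSV file.

    `fields` is a list of (column name, type) pairs.
    """
    return {
        "name": name,
        "resources": [
            {
                "name": name + "-data",
                "path": csv_path,
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical dataset, for illustration only.
descriptor = make_descriptor(
    "tree-survey",
    "data/trees.csv",
    [("species", "string"), ("height_m", "number")],
)

# The descriptor is conventionally saved as datapackage.json next to the data.
print(json.dumps(descriptor, indent=2))
```

In the shipping container metaphor, the descriptor is the label on the outside of the container: a consumer can read it to learn what the columns mean before opening the data itself.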

As well as revolutionising how data is shared and used, the Frictionless Data initiative aims to massively improve data quality.

Open data

Open Knowledge International is a strong supporter of open data. They’re currently advocating against the EU copyright law, specifically Article 13, which many fear will result in the implementation of upload filters to ensure that the big content aggregation companies don’t fall foul of the law.

Catherine spoke passionately about the issues around political advertising on social media, the Responsible Data initiative, and the Open Definition which sets out principles defining openness in relation to data and content.

Catherine says the key challenge right now is that we could go down a closed, proprietary route where only those who have money will have access to knowledge. We need to win the debate about the importance of an open society and open and free knowledge.

Thank you Catherine for a spirited introduction to Open Knowledge International and its work.

Illustrating the multifaceted role of a tech writer

After a few conversations with various people, I decided that it may be useful to have an illustration of the multifaceted role of a tech writer. In particular, many of our stakeholders (product teams, engineering teams, marketing teams, etc) may not know all the ins and outs of what we do. A handy illustration may help conversations with product teams and other stakeholders.

Many people see our role as being focused on documenting the features of a product:

Here’s an illustration of the tech writing role, de-emphasizing the feature docs part of the role:

The many aspects of a tech writer’s role:

  • Gather the information required to develop the docs. Some ways to gather information: interview engineers, product managers, etc; read the code; read the product requirements document and other specifications; experiment with the product.
  • Analyse and define the audience.
  • Analyse the most common tasks and workflows of the target users.
  • Design the structure of the docs as a whole (information architecture).
  • Own the user experience (UX) of the docs: consistency, findability, readability, accessibility.
  • Ensure consistency with the product user interface (UI).
  • Ensure cross-product consistency, that is, consistency with the docs of related products: tone, terminology, coverage.
  • Review doc updates from other technical writers, engineers, and community contributors.
  • Stand up for the customer and give UX advice, as the first-time user of the product.
  • Create conceptual guides that explain the product at a high level, for new starters, and that explain the principles and details behind the product design and functionality, for people who want a deep dive into a particular technology.
  • Create getting-started guides covering the primary product features.
  • Create end-to-end tutorials, each covering a use case that involves multiple features.

Some technical writers have an even broader purview, depending on the team and the doc set they’re working with. For example, if the doc set is hosted on a self-managed website as opposed to a corporate website with shared infrastructure, the tech writer often takes on website design and management tasks. Small teams may not have software engineers available to create code samples, and so the tech writer creates those too. Open source docs in particular bring additional responsibilities.

Here are some of the additional tasks a tech writer may take on:

  • Develop illustrative code samples to include in the documentation.
  • Develop training material.
  • Produce videos, either as training material or as illustrative material to supplement or even replace the documentation.
  • Own the design and infrastructure of the documentation website: look and feel (theme), site search, version control, navigation aids, etc.
  • For open source products, educate and encourage the community to contribute to the docs.
  • Manage the repository that holds the documentation source files.

FYI, I based the above diagram on one that’s often used in presentations about machine learning (ML) to show the relatively small part of the ML workflow that’s devoted to actually writing the ML code. The original ML diagram is in this paper.

What do you think?

Does this diagram present an interesting way of starting a conversation about the role of a tech writer? I’d love to hear your thoughts and ideas!

How open source organizations can prepare for Season of Docs

Last week Google announced a new program, Season of Docs (g.co/seasonofdocs). The program provides a framework for technical writers and open source organizations to work together on a specific documentation project chosen by the open source organization and the tech writer concerned. From April 2, interested open source organizations can start applying for this year’s Season of Docs. Exciting news indeed! But what happens before April 2? I decided to blog about some ways you can get started with Season of Docs right now.

Open source organizations can start planning the documentation projects they’d like help with, and letting technical writers know about those projects. Get the conversation going, and build up excitement amongst your open source community and amongst the technical writing community.

The first step is to think about a good project or projects that a technical writer can help you with. The Season of Docs website provides some generic ideas for doc projects. You should craft a specific project or two, based on the actual doc needs of your project. Include links to the relevant docs or other resources within your open source repository or on your website. I’d recommend that you propose a few different project types, because different tasks may be of interest to different tech writers. For example, you could offer one project to refactor your existing docs, another to create a specific tutorial, and so on.

Your goal is to attract tech writers by making them feel comfortable about approaching your organization and excited about what they can achieve in collaboration with your mentors during Season of Docs.

It’s a good idea to find out who in your community wants to be a mentor. The mentors don’t need to be tech writers. There’s help about the mentors’ role on the Season of Docs website too.

When you’ve gathered some project ideas, blog about the fact that your organization is putting forward an application to participate in Season of Docs. Use the blog post to tell tech writers about your ideas and ask for input. You don’t need to wait for applications to open. You can get a head start by kicking off the discussion now.

Use the tag #SeasonOfDocs when promoting your ideas on social media. To include the tech writing and open source communities, add #WriteTheDocs, #techcomm, #TechnicalWriting, and #OpenSource.
