Category Archives: open standards

Interoperability and the FAIR principles – a discussion

This week I’m attending a conference titled Collaborations Workshop 2019, run by the Software Sustainability Institute of the UK. The conference focuses on interoperability, documentation, training and sustainability. I’m blogging my notes from the talks I attend. All credit goes to the presenter, and all mistakes are my own.

Patricia Herterich from the University of Birmingham presented a session on “Interoperable as in FAIR – A librarian’s personal point of view”.

A simple definition of interoperability: the ability of computer systems or software to exchange and make use of information. People also talk about semantic interoperability and other interpretations of the term.

Data interoperability

Patricia introduced the FAIR principles: a set of guidelines that aim to ensure data is:

  • findable,
  • accessible,
  • interoperable, and
  • reusable,

by both people and machines. The FAIR principles focus on the semantic aspects of interoperability rather than the technical aspects.

Patricia highlighted a big problem: interoperability is not a well-defined term. No-one agrees on exactly what it means.

Some organisations have developed tools to assess data interoperability.

Software interoperability

For software, we can think of defining interoperability in this way:

  • Use of open standards
  • Use of platform/plugin architectures
  • Use of common libraries and package managers

Patricia pointed out that FAIRsharing.org catalogues relevant standards, but it already lists well over 1,000 of them.

So how does a researcher go about choosing the right standard? How do we train researchers to make data FAIR? Patricia left this as an open question for discussion.

Questions and comments from the floor:

  • The FAIR principles were originally developed for data. Does it make sense to apply them to software?
  • The FAIR principles seem like just a catchy way of packaging techniques that have been applied for a long time.
  • Interoperability is not simple, and we need a set of user-friendly tools.

Thank you Patricia for a good discussion of the complex world of interoperability.

Wikidata, open data, and interoperability

Franziska Heine presented a keynote on Wikidata, a Wikimedia project that provides structured data to Wikipedia and other data sets. Franziska is Head of Software & Development at Wikimedia Deutschland.

Franziska’s talk was titled “Wikidata, Interoperability and the Future of Scientific Work”.

The Wikidata project

Franziska said she’s very excited to be here and talk about Wikidata, as it’s such a big part of what her team does. She cares about making Wikipedia, which started 20 years ago, into something that remains meaningful in the future.

Wikidata makes interwiki links semantic, so that computers can understand the relationships between the pieces of data. When you ask Siri or Google Assistant a question, the answer often comes from Wikidata. Franziska also showed us a map of the world with a data overlay sourced from Wikidata. (I can’t find a link to that specific map, alas.)

Wikidata has more than 20,000 active editors per month. That’s the highest number in the entire Wikimedia movement, surpassing even the English-language Wikipedia.

How Wikidata works

The core of Wikidata is a database of items. Each item describes a concept in the world. Each item has an ID number (“Q number”). Items also have descriptions and language information. In Wikipedia, the content for each language is completely separate. So, you can have the same topic in various languages, each with entirely different content. By contrast, in Wikidata all the languages are properties of the single data item. So, for example, each item has a description, and the description may be available in various languages.

Each item is also linked to all the various Wikipedia instances.

Each item has a number of statements (pieces of information), such as date of birth, place of birth, date of death, and so on. Each statement lists the sources of the information. It is of course possible that different sources may provide conflicting information about a particular statement. For example, there may be different opinions about the date of birth of a person.
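To make the shape of an item concrete, here is a simplified sketch (illustrative only, not the exact Wikidata JSON schema). Q42 is the item for Douglas Adams, and P569 is the “date of birth” property; the reference shown is a placeholder for whatever source the statement cites.

```json
{
  "id": "Q42",
  "labels": { "en": "Douglas Adams", "de": "Douglas Adams" },
  "descriptions": { "en": "English writer and humorist" },
  "statements": {
    "P569": { "value": "1952-03-11", "references": ["example source"] }
  },
  "sitelinks": { "enwiki": "Douglas Adams", "dewiki": "Douglas Adams" }
}
```

Note how the labels and descriptions in all languages hang off the single item, and the sitelinks tie the item to each language’s Wikipedia article.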

Wikidata can be edited by people, but there are also bots that do the updates. The concepts within Wikidata are not built primarily for humans to navigate, but rather for machines to understand. For example, Wikidata is able to give Siri and Google Assistant information in ways that Wikipedia can’t.

But can humans look at the data?

Yes! You can use the Wikidata Query Service to access the data. To get started, grab an example query and then adapt it. The query language is SPARQL.

Franziska showed us some interesting query results:

  • The location of trees grown from seeds that have travelled around the moon. 🙂
  • Natural arches around the world
  • Cause of death of members of noble families
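Queries like the ones above follow a common pattern. Here is the classic introductory example from the Wikidata Query Service: it retrieves every item that is an instance of (property P31) house cat (item Q146), together with its English label.

```sparql
# All items that are an instance of (P31) house cat (Q146),
# with their English labels where available.
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

Swapping in different property and item IDs gives you queries like the tree and natural-arch examples above.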

The expanding use of Wikidata

Wikidata was created to help the Wikipedia team maintain their data. Over the last few years, Wikidata has become a useful tool for other Wikimedia projects and even other organisations to manage their own data and metadata. Franziska showed a diagram of a future where various wikis can share and interlink data.

Existing projects:

  • The Sum of all Welsh Literature – a project presented by Jason Evans at the WikiCite Conference 2018.
  • Gwiki: Combining Wikidata with other linked databases by Andra Waagmeester and Dragan Espenschied.

Franziska showed us some graphs from the above projects, to demonstrate the research value that comes out of combining data from different large databases and analysing the results. This is what we’re about, she said: opening up data and making it freely accessible.

How interoperability fits in

Interoperability means more than just technical standards. Franziska referred to Mark Zuckerberg’s recent speech about the future of Facebook. Interoperability in his world, she commented, means the ability to communicate with people who are important to you, regardless of which platform they’re on.

Looking at the Gwiki project mentioned above: it will connect very different people with each other – different languages, different cultures, different roles (academia, industry, and so on). To facilitate this meeting of different worlds, we need to build tools and platforms. This is the social aspect of interoperability.

Instead of independent researchers working in their own worlds, they’ll be able to cooperate across disciplines, provided they have shared metadata and infrastructure. This is the data aspect of interoperability.

In closing

Scientific knowledge graphs are key, said Franziska. They enable data analysis and power artificial intelligence. Semantic data and linked data are core to innovation and research.

We need to be able to provide data in a way that makes sense to people. This is where the infrastructure fits in. We must provide APIs and other interfaces that make it appealing to use and integrate the data. This is the essential infrastructure for free knowledge, so that research can transcend disciplinary silos, and we can make data and research available to everyone.

Thank you Franziska for a very interesting deep dive into Wikidata, interoperability, and open data.

How to get started with Markdown and where to try it out

Technical writers have heard quite a bit recently about Markdown. Putting aside the question of whether Markdown is the right choice for technical documentation, it’s interesting as a tech writer to know more about the language itself. What is Markdown, where can we see it in action, and how can we try it out? Here are some pointers. If you have any other tips or stories about Markdown, I’d love to hear them!

Markdown is a markup language designed for quick and easy creation of simple documents. The syntax is pared down to the minimum, with the result that:

  • The syntax is easy to remember.
  • A Markdown document is easy to read, since much of the content is free of markup tags.

Along with the markup syntax, Markdown comes with a parser (a piece of software) that converts the markup to HTML, so you can display the document on a web page.
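As a rough illustration of what such a parser does, here’s a minimal sketch in Python that handles just headings, bold text, and links. (Real Markdown parsers handle far more, including multi-line paragraphs and nested lists; this treats each non-blank line as its own block, which is a simplification.)

```python
import re

def md_to_html(text: str) -> str:
    """Convert a tiny subset of Markdown (headings, bold, links) to HTML."""
    html_lines = []
    for line in text.splitlines():
        # Links: [text](url) -> <a href="url">text</a>
        line = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r'<a href="\2">\1</a>', line)
        # Bold: **text** -> <strong>text</strong>
        line = re.sub(r"\*\*([^*]+)\*\*", r"<strong>\1</strong>", line)
        # Headings: 1-6 leading # characters become <h1>..<h6>
        m = re.match(r"(#{1,6})\s+(.*)", line)
        if m:
            level = len(m.group(1))
            line = f"<h{level}>{m.group(2)}</h{level}>"
        elif line.strip():
            # Anything else that isn't blank becomes a paragraph
            line = f"<p>{line}</p>"
        html_lines.append(line)
    return "\n".join(l for l in html_lines if l.strip())

print(md_to_html("## What is Markdown?"))
```

Running this on the heading prints `<h2>What is Markdown?</h2>` – the same transformation you can see in the Markdown and HTML examples below.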

Other well-known markup languages are HTML, XML, reStructuredText, various forms of wiki markup, and many others.

Example of Markdown

Here’s a chunk of the above text in Markdown format, with an added level 2 heading, “What is Markdown?”

## What is Markdown?

[Markdown](https://daringfireball.net/projects/markdown/) is a markup language
designed for quick and easy creation of simple documents. The syntax is pared
down to the minimum, with the result that:

* The syntax is easy to remember.
* A Markdown document is easy to read, since much of the content is free of
  markup tags.

Along with the markup syntax, Markdown comes with a parser (a piece of software)
that converts the markup to HTML, so you can display the document on a web
page.

Equivalent in HTML

Here’s the same text in HTML:

<h2>What is Markdown?</h2>

<p><a href="https://daringfireball.net/projects/markdown/">Markdown</a> is
  a markup language designed for quick and easy creation of simple documents.
  The syntax is pared down to the minimum, with the result that:</p>

<ul>
  <li>The syntax is easy to remember.</li>
  <li>A Markdown document is easy to read, since much of the content is free of
    markup tags.</li>
</ul>

<p>Along with the markup syntax, Markdown comes with a parser (a piece of
  software) that converts the markup to HTML, so you can display the
  document on a web page.</p>

Getting started with Markdown

When I first encountered Markdown, I already knew HTML and the wiki markup syntax used in Confluence. For me, the best approach to Markdown was:

  • First quickly scan the most basic syntax elements, to get an idea of the philosophy behind Markdown and to pick up the patterns. I’ve included some pointers below, to give you an idea of the patterns in the syntax. Note, though, that there are variations in Markdown syntax.
  • Then find a good cheatsheet and refer to it whenever you need to check up on something.
  • If something doesn’t work, consult the full syntax guide.

Where can you try it out?

The best way to learn is to do.

  1. Grab my Markdown code from above, or write some of your own.
  2. Paste it into the text box at the top of Dingus.
  3. Click Convert.
  4. Scroll down the page to see first the HTML code and then the rendered version (HTML Preview) of your text.

Basic syntax

Here are those pointers I promised, to get you started.

Heading levels

# My level 1 heading
# Another level 1 heading
## My level 2 heading
### My level 3 heading
#### You get the drift

Paragraphs

No markup, just an empty line before and after each paragraph.

Links

Put the link text inside square brackets, followed by the URL in round brackets.

[Link text](https://my.site.net/path/)

Another way of doing links is to define a variable for the URL somewhere on the page, and use that variable instead of the URL in the text. This is useful if you need to use the same URL in more than one place in the document, or if you want to keep the messy, long URL away from the text.

[Markdown] is a markup language,
blah blah blah - this is the rest of my page.

[Markdown]: https://daringfireball.net/projects/markdown/

Bulleted list

* My list item
* Another list item
  * A list item embedded within the previous one
  * Another embedded item
* An item in the main list

There must be an empty line before and after each list, otherwise it gets mixed up with the preceding or following paragraph.

Numbered list

There are a few ways to do numbered lists. Here’s one:

1. My list item
1. Another list item
  * An embedded bulleted list item
  * Another embedded item
1. An item in the main list

You can mix and match bulleted and numbered lists, with varying degrees of success. 🙂

More markup

There’s plenty more you can do with Markdown, and there are a couple of syntax varieties to trap the unwary. For example, GitHub has a special flavour of Markdown.
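For example, GitHub Flavoured Markdown adds syntax that the original Markdown lacks. A small sketch (how it renders depends on the processor you use):

```markdown
| Feature       | Original Markdown | GitHub Flavoured |
| ------------- | ----------------- | ---------------- |
| Tables        | No                | Yes              |
| Strikethrough | No                | Yes              |

- [x] Task lists are a GFM extension too
- [ ] An unchecked task item
```

Paste the same text into a plain Markdown converter and into GitHub, and you’ll see quite different results.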

Recent articles about Markdown

There’s been a fair bit of discussion about the pros and cons of Markdown recently.

My opinion

In my day job, I write docs in both HTML and Markdown. I prefer HTML for comprehensive technical documentation. Markdown is good for very simple documents, but the syntax becomes clumsy for more complex things like tables, anchors, and even images. On the other hand, there are excellent benefits to using Markdown for quick collaboration on a document.

As is so often true, we need to choose the best tool for each use case. It’s a good idea to get to know Markdown, so that you can form an opinion and be able to use it when you need it.

Mozilla Popcorn Maker and a tour of the Confluence documentation

Mozilla’s Popcorn Maker is pretty neat. You can grab a video and augment it with clickable text boxes. You can also add other interactive widgets, such as a live Twitter stream or a fully-functioning map from Google Maps. I’ve been playing with Popcorn Maker for a couple of weeks, and I thought other people may like to have a go. So I’ve put together a video for you to mess up… hrrm… review. It’s cunningly disguised as a tour of the Atlassian Confluence documentation. But actually, it’s a bit of fun. 😉

Popcorn Maker is all online. There’s nothing to download. You give it the URL of a video from YouTube or another supported location, then drag and drop events onto the video. The Popcorn Maker editing environment adds a timeline, a bit like the one you see in a Flash editor, but driven entirely by JavaScript, HTML, and CSS. You also get a library of widgets to add and configure, such as text boxes, popups, maps, Twitter streams, and so on.

Jumping right in

Are you keen to try Popcorn Maker? Try making a remix of my Popcorn Maker movie, “Popping the Confluence docs”. I’d love it if you’d add a comment on this blog post with a link to your remix!

Making the video itself

I used Screencast-O-Matic to record the movie itself. It’s a great tool too. Just like Popcorn Maker, everything is online. You do need to install Java on your computer, and it’s handy to have a webcam for the audio part of the movie. Other than that, all you need is your connection to the Internet. You can use Screencast-O-Matic free of charge, if you’re happy to have a watermark at the bottom of your movie.

Once I’d made the movie, I uploaded it to YouTube and then used Popcorn Maker to annotate it and make it available for remixing.

Some thoughts on Popcorn Maker

It’s pretty cool to be able to grab a video from YouTube (or Vimeo, Soundcloud, or an HTML 5 video) and add bits to it online, all within your web browser. Nifty technology!

But I think the huge potential lies in the fact that anyone can remix the videos. Just grab a movie that someone else has created, and decorate it yourself.

This has very interesting possibilities for collaborative development of “how to” videos. Another use that springs to mind: The review of videos. Instead of writing separate notes, people can paste their comments directly onto the relevant spot in the video. And they don’t need specialised tools to do it.

The icons and styling in general could do with some tender loving care from an artist or designer.

The integration with Twitter, Flickr and Google Maps is awesome! It makes me wonder what other integrations would be useful. Perhaps a HipChat room. Or an RSS feed from WordPress?

I’d also love to see some way of finding and sharing remixes of a given video. Ha ha, searching for “Popcorn remixes” brings up a number of song remixes!

STC Summit day 2 – Using DITA

I’m at STC 2012, the annual conference of the Society for Technical Communication. This post contains my notes from a session called “Using DITA”, by Michael Priestley, lead DITA architect at IBM.

Michael started off by telling us what DITA is and giving us some historical background.

What is DITA and who is using it?

DITA (Darwin Information Typing Architecture) is an open standard for designing, creating and publishing modular information, such as technical publications, help sets and websites. The standard is owned by OASIS. It was originally created for technical communication, but was designed to be more broadly applicable. We are starting to see other types of content adopt the DITA standard now too, such as learning and training. As a result, some LMS systems are now adapting to support DITA too.

The latest DITA standard, OASIS DITA 1.2, was approved in December 2010.

A survey at ditawriter.com found that around 250 companies have posted public case studies. Breaking the usage down by industry sector, around 30% of companies that use DITA are in the software sector. Geographically, the largest number of DITA users is in North America, followed by Europe.

In technical communication, a survey in 2008 found that DITA was the most popular standard, at 35%.

Michael discussed some case studies from around 2008–9, including the reasons why the companies adopted DITA:

  • Avaya adopted DITA to solve problems of low quality in globalised content, inconsistency in content and style, and difficulty in training new writers. They reported improvements in these areas, as well as increased user satisfaction and happier writers.
  • CaridianBCT adopted DITA to reduce translation costs. They reported savings of $100,000 in the first year.
  • Medtronic needed to improve productivity. The metrics quoted were based on the amount of content that the team managed to produce using the new reuse model.
  • WebSphere adopted DITA for content reuse in the documentation for their application server product. They report 80% reuse of their content across the entire set. Note that the percentage of reuse required depends on the amount of commonality across the different products.

Smart content

Michael discussed a study by the Gilbane Group, called “Smart Content in the Enterprise”. One of the most interesting aspects was a shift from an inward-facing to an outward-facing view of the content. Focus on what the users need from the content, and on delivery as the starting point of design.

The core elements are providing metadata, integrating social systems with the content, and using structured content as the source.

Getting practical – DITA topics and maps

DITA is all about chunks of content called topics. This concept has nothing to do with reuse. It’s all about content use. With all types of content other than novels, people jump in and out of a book or manual to find what they need. They don’t read the book from start to finish. What we must do is understand what the user needs to do, and prioritise that over the needs of the product.

Here Michael took us through the core elements of DITA:

  • The high-level structure of a topic in DITA.
  • An example of a DITA map. A map does all the referencing and organising. You would have a map per output format.
  • A map with topic links.
  • A map with generated navigation and links. The build process takes the information in the map, and turns it into a table of contents and supported linking patterns.
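A minimal DITA map might look something like this sketch (hypothetical file names): the map references topics and establishes the hierarchy that the build turns into navigation.

```xml
<map>
  <title>Widget documentation</title>
  <topicref href="overview.dita">
    <topicref href="installing.dita"/>
    <topicref href="configuring.dita"/>
  </topicref>
</map>
```

The nesting of topicref elements is what becomes the table of contents and the parent/child links in the output.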

Conditional processing

Michael showed us the underlying DITA tags for conditional processing. We don’t assert that you should include or exclude something. Instead, we assert that it applies to a specific product or products.

Then you set conditions as part of the build process, to include or exclude products, or to flag specific content.
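In DITA markup this looks something like the following sketch (hypothetical product values). The content carries an attribute naming what it applies to, and a separate DITAVAL file tells the build what to include or exclude.

```xml
<!-- In the topic: assert which product the content applies to. -->
<p product="widget-pro">This feature is available in Widget Pro only.</p>

<!-- In a DITAVAL file: set the conditions for this build. -->
<val>
  <prop att="product" val="widget-pro" action="include"/>
  <prop att="product" val="widget-lite" action="exclude"/>
</val>
```

Running the same source with a different DITAVAL file produces the output for a different product.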

Conref

In some cases, you want to reuse a fragment of a topic or a fragment of a map. In those cases, you would use a conref.

A conref is basically an empty element that instructs the build process to pull in the content from another element at publication time.
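A sketch of a conref in markup (hypothetical file name and IDs): the empty paragraph pulls in the paragraph with id "general-warning" from the topic with id "warnings" in warnings.dita.

```xml
<!-- In the reusing topic: -->
<p conref="warnings.dita#warnings/general-warning"/>

<!-- In warnings.dita, inside the topic with id="warnings": -->
<p id="general-warning">Disconnect the power before servicing the unit.</p>
```

At publication time the build replaces the empty element with the referenced content.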

Note that you should do this as little as possible. In most cases, you would use conditional processing to include or exclude a topic. Reason: If people change a paragraph, it could mess up your content reuse. Topics are a more basic unit of structure, and so are less subject to breaking changes.

Information typing

The idea is that different types of information are best served by different structures. What’s more, if a particular type of information is presented in a particular structure, that will help the user understand the information.

DITA defines a specific set of structures for tasks. For example:

  • Prerequisites
  • Steps
  • Warnings

Each of the above elements has sub-elements, and rules about what can go where. The task is a type of topic. You can use a task anywhere that you would use a topic. The task has rules unique to it. We call this “specialisation”.
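A task topic using these structures might look like this sketch (hypothetical content):

```xml
<task id="install-widget">
  <title>Installing the widget</title>
  <taskbody>
    <prereq>You need administrator access on the target machine.</prereq>
    <steps>
      <step><cmd>Download the installer.</cmd></step>
      <step>
        <cmd>Run the installer.</cmd>
        <info>
          <note type="warning">Do not switch off the machine while the
          installer is running.</note>
        </info>
      </step>
    </steps>
  </taskbody>
</task>
```

The prereq, steps, and note elements are the task-specific structures; a generic topic would not allow them.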

What’s new in DITA 1.2?

  • Taxonomies. Michael showed us the big vision of how his team uses taxonomies to provide progressive disclosure from the product UI to the web-based user assistance.
  • Learning and training specialisations.

What’s coming in DITA 1.3?

The group is working to provide a lighter-weight DITA editing experience, for technical communicators and other DITA users too.

Michael’s presentation style is easy and authoritative. Thank you Michael for an informative insight into DITA.
