Interoperability and the FAIR principles – a discussion
This week I’m attending a conference titled Collaborations Workshop 2019, run by the Software Sustainability Institute of the UK. The conference focuses on interoperability, documentation, training and sustainability. I’m blogging my notes from the talks I attend. All credit goes to the presenter, and all mistakes are my own.
Patricia Herterich from the University of Birmingham presented a session on “Interoperable as in FAIR – A librarian’s personal point of view”.
A simple definition of interoperability: the ability of computer systems or software to exchange and make use of information. People also talk about semantic interoperability and other interpretations of the term.
Data interoperability
Patricia introduced the FAIR principles: a set of guidelines that aim to ensure data is:
- findable,
- accessible,
- interoperable, and
- reusable,
by both people and machines. The FAIR principles focus more on the semantic aspects of interoperability than on the technical aspects.
Patricia highlighted a big problem: interoperability is not a well-defined term. No-one knows what it means.
Some organisations have developed tools to assess data interoperability:
- The Dutch Data Archiving and Networked Services (DANS) organisation has developed a FAIR data assessment tool (see the prototype) that attempts to measure data interoperability.
- The Australian Research Data Commons (ARDC) has also developed a FAIR Data self-assessment tool.
Software interoperability
For software, we can think of defining interoperability in this way:
- Use of open standards
- Use of platform/plugin architectures
- Use of common libraries and package managers
Patricia pointed out that FAIRsharing.org offers various standards, but there are already well over 1000 standards there.
So how does a researcher go about choosing the right standard? How do we train researchers to make data FAIR? Patricia left this as an open question for discussion.
Questions and comments from the floor:
- The FAIR principles were originally developed for data. Does it make sense to apply them to software?
- The FAIR principles seem like just a catchy way of packaging techniques that have been applied for a long time.
- Interoperability is not simple, and we need a set of user-friendly tools.
Thank you Patricia for a good discussion of the complex world of interoperability.
Wikidata, open data, and interoperability
Franziska Heine presented a keynote on Wikidata, a Wikimedia project that provides structured data to Wikipedia and other data sets. Franziska is Head of Software & Development at Wikimedia Deutschland.
Franziska’s talk was titled “Wikidata, Interoperability and the Future of Scientific Work”.
The Wikidata project
Franziska said she’s very excited to be here and talk about Wikidata, as it’s such a big part of what her team does. She cares about making Wikipedia, which started 20 years ago, into something that remains meaningful in the future.
Wikidata adds semantics to interwiki links, so that computers can understand the relationships between the pieces of data. When you ask Siri or Google Assistant a question, the answer may well come from Wikidata. Franziska also showed us a map of the world with a data overlay sourced from Wikidata. (I can’t find a link to that specific map, alas.)
Wikidata has more than 20,000 active editors per month. That’s the highest number in the entire Wikimedia movement, surpassing even the number of edits of the English-language Wikipedia.
How Wikidata works
The core of Wikidata is a database of items. Each item describes a concept in the world. Each item has an ID number (“Q number”). Items also have descriptions and language information. In Wikipedia, the content for each language is completely separate. So, you can have the same topic in various languages, each with entirely different content. By contrast, in Wikidata all the languages are properties of the single data item. So, for example, each item has a description, and the description may be available in various languages.
Each item is also linked to all the various Wikipedia instances.
Each item has a number of statements (pieces of information), such as date of birth, place of birth, date of death, and so on. Each statement lists the sources of the information. It is of course possible that different sources may provide conflicting information about a particular statement. For example, there may be different opinions about the date of birth of a person.
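To make the item model a little more concrete, here's a minimal sketch in Python of how I picture an item: one ID, labels and descriptions keyed by language, and statements that each carry their own sources. The class names, the `Q_EXAMPLE` ID, the dates and the source names are all made up for illustration. Only `P569`, Wikidata's property ID for date of birth, is real. This is not Wikidata's actual storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    prop: str                                     # property ID, e.g. "P569" (date of birth)
    value: str                                    # the claimed value
    sources: list = field(default_factory=list)   # where the claim comes from


@dataclass
class Item:
    qid: str                                          # the "Q number"
    labels: dict = field(default_factory=dict)        # language code -> label
    descriptions: dict = field(default_factory=dict)  # language code -> description
    statements: list = field(default_factory=list)

    def values_for(self, prop):
        """Return every value claimed for a property, conflicts included."""
        return [s.value for s in self.statements if s.prop == prop]


# A hypothetical person with two sources that disagree on the date of birth.
person = Item(
    qid="Q_EXAMPLE",  # placeholder, not a real Wikidata ID
    labels={"en": "Example Person", "de": "Beispielperson"},
    descriptions={"en": "hypothetical person used for illustration"},
    statements=[
        Statement("P569", "1850-01-01", sources=["hypothetical archive A"]),
        Statement("P569", "1851-01-01", sources=["hypothetical archive B"]),
    ],
)

print(person.values_for("P569"))
```

Both dates stay on the item, each with its own source, rather than one overwriting the other.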
Wikidata can be edited by people, but there are also bots that do the updates. The concepts within Wikidata are not built primarily for humans to navigate, but rather for machines to understand. For example, Wikidata is able to give Siri and Google Assistant information in ways that Wikipedia can’t.
But can humans look at the data?
Yes! You can use the Wikidata Query Service to access the data. To get started, grab an example query and then adapt it. The query language is SPARQL.
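If you'd rather call the query service from a script than from the web page, here's a rough sketch in Python. The query is the well-known "cats" starter example (`P31` is "instance of", `Q146` is "house cat"). The function names and the User-Agent string are my own assumptions, and I've kept the network call in its own function so that you can read the rest without running anything.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Starter query: a few items that are an instance of (P31) house cat (Q146).
CAT_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def build_query_url(sparql):
    """Encode a SPARQL query as a GET request URL asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql.strip(), "format": "json"})


def labels_from_response(payload):
    """Pull the ?itemLabel values out of a SPARQL JSON result document."""
    return [row["itemLabel"]["value"] for row in payload["results"]["bindings"]]


def run_query(sparql):
    """Send the query over the network (not called here)."""
    # The User-Agent value is a placeholder; replace it with your own details.
    request = Request(build_query_url(sparql),
                      headers={"User-Agent": "example-notes-script/0.1"})
    with urlopen(request) as response:
        return labels_from_response(json.load(response))
```

Calling `run_query(CAT_QUERY)` should give you a short list of cat names, assuming the service is reachable from your machine.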
Franziska showed us some interesting query results:
- The location of trees grown from seeds that have travelled around the moon. 🙂
- Natural arches around the world
- Cause of death of members of noble families
The expanding use of Wikidata
Wikidata was created to help the Wikipedia team maintain their data. Over the last few years, Wikidata has become a useful tool for other Wikimedia projects and even other organisations to manage their own data and metadata. Franziska showed a diagram of a future where various wikis can share and interlink data.
Existing projects:
- The Sum of all Welsh Literature – a project presented by Jason Evans at the WikiCite Conference 2018.
- Gwiki: Combining Wikidata with other linked databases by Andra Waagmeester and Dragan Espenschied.
Franziska showed us some graphs from the above projects, to demonstrate the research value that comes out of combining data from different large databases and analysing the results. This is what we’re about, she said: opening up data and making it freely accessible.
How interoperability fits in
Interoperability means more than just technical standards. Franziska referred to Mark Zuckerberg’s recent speech about the future of Facebook. Interoperability in his world, she commented, means the ability to communicate with people who are important to you, regardless of which platform they’re on.
Looking at the Gwiki project quoted above: It will connect very different people with each other: different languages, different cultures, different roles (academia, industry, etc). To facilitate this meeting of different worlds, we need to build tools and platforms – this is the social aspect of interoperability.
Instead of independent researchers working in their own worlds, they’ll be able to cooperate across disciplines, provided they have shared metadata or infrastructure. This is the data aspect of interoperability.
In closing
Scientific knowledge graphs are key, said Franziska. They enable data analysis and power artificial intelligence. Semantic data and linked data are core to innovation and research.
We need to be able to provide data in a way that makes sense to people. This is where the infrastructure fits in. We must provide APIs and other interfaces that make it appealing to use and integrate the data. This is the essential infrastructure for free knowledge, so that research can transcend disciplinary silos, and we can make data and research available to everyone.
Thank you Franziska for a very interesting deep dive into Wikidata, interoperability, and open data.
How to get started with Markdown and where to try it out
Technical writers have heard quite a bit recently about Markdown. Putting aside the question of whether Markdown is the right choice for technical documentation, it’s interesting as a tech writer to know more about the language itself. What is Markdown, where can we see it in action, and how can we try it out? Here are some pointers. If you have any other tips or stories about Markdown, I’d love to hear them!
Markdown is a markup language designed for quick and easy creation of simple documents. The syntax is pared down to the minimum, with the result that:
- The syntax is easy to remember.
- A Markdown document is easy to read, since much of the content is free of markup tags.
Along with the markup syntax, Markdown comes with a parser (a piece of software) that converts the markup to HTML, so you can display the document on a web page.
Other well-known markup languages are HTML, XML, reStructuredText, various forms of wiki markup, and many others.
Example of Markdown
Here’s a chunk of the above text in Markdown format, with an added level 2 heading, “What is Markdown?”
## What is Markdown?

[Markdown](https://daringfireball.net/projects/markdown/) is a markup language designed for quick and easy creation of simple documents. The syntax is pared down to the minimum, with the result that:

* The syntax is easy to remember.
* A Markdown document is easy to read, since much of the content is free of markup tags.

Along with the markup syntax, Markdown comes with a parser (a piece of software) that converts the markup to HTML, so you can display the document on a web page.
Equivalent in HTML
Here’s the same text in HTML:
<h2>What is Markdown?</h2>
<p><a href="https://daringfireball.net/projects/markdown/">Markdown</a> is a markup language designed for quick and easy creation of simple documents. The syntax is pared down to the minimum, with the result that:</p>
<ul>
<li>The syntax is easy to remember.</li>
<li>A Markdown document is easy to read, since much of the content is free of markup tags.</li>
</ul>
<p>Along with the markup syntax, Markdown comes with a parser (a piece of software) that converts the markup to HTML, so you can display the document on a web page.</p>
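If you're curious about what a Markdown parser does to get from the first form to the second, here's a toy converter in Python. It's only a sketch: it handles headings, paragraphs, flat bulleted lists and inline links, and nothing else. The function names are my own, and it's not how the real Markdown parser is implemented.

```python
import html
import re

# Matches an inline link of the form [text](url).
LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def inline(text):
    """Escape HTML special characters, then convert inline links."""
    escaped = html.escape(text, quote=False)
    return LINK.sub(r'<a href="\2">\1</a>', escaped)


def to_html(markdown):
    """Convert a tiny subset of Markdown to HTML, one block at a time."""
    out = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", markdown.strip()):
        lines = [line.strip() for line in block.splitlines()]
        heading = re.match(r"(#{1,6})\s+(.*)", lines[0])
        if heading and len(lines) == 1:
            level = len(heading.group(1))
            out.append(f"<h{level}>{inline(heading.group(2))}</h{level}>")
        elif all(line.startswith("* ") for line in lines):
            items = "".join(f"<li>{inline(line[2:])}</li>" for line in lines)
            out.append(f"<ul>{items}</ul>")
        else:
            out.append(f"<p>{inline(' '.join(lines))}</p>")
    return "\n".join(out)


sample = "## What is Markdown?\n\n[Markdown](https://daringfireball.net/projects/markdown/) is a markup language.\n\n* One\n* Two"
print(to_html(sample))
```

Real parsers deal with many more cases (nested lists, emphasis, code, images), which is part of why the different flavours of Markdown don't always agree.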
Getting started with Markdown
When I first encountered Markdown, I already knew HTML and the wiki markup syntax used in Confluence. For me, the best approach to Markdown was:
- First quickly scan the most basic syntax elements, to get an idea of the philosophy behind Markdown and to pick up the patterns. I’ve included some pointers below, to give you an idea of the patterns in the syntax. Note, though, that there are variations in Markdown syntax.
- Then find a good cheatsheet and refer to it whenever you need to check up on something. Here’s a good cheatsheet.
- If something doesn’t work, consult the full syntax guide.
Where can you try it out?
The best way to learn is to do.
- Grab my Markdown code from above, or write some of your own.
- Paste it into the text box at the top of Dingus.
- Click Convert.
- Scroll down the page to see first the HTML code and then the rendered version (HTML Preview) of your text.
Basic syntax
Here are those pointers I promised, to get you started.
Heading levels
# My level 1 heading
# Another level 1 heading
## My level 2 heading
### My level 3 heading
#### You get the drift
Paragraphs
No markup, just an empty line before and after each paragraph.
Links
Put the link text inside square brackets, followed by the URL in round brackets.
[Link text](https://my.site.net/path/)
Another way of doing links is to define a variable for the URL somewhere on the page, and use that variable instead of the URL in the text. Markdown calls these reference-style links. This is useful if you need to use the same URL in more than one place in the document, or if you want to keep the messy, long URL away from the text.
[Markdown] is a markup language, blah blah blah - this is the rest of my page.

[Markdown]: https://daringfireball.net/projects/markdown/
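To show what the parser has to do with this kind of link, here's a small Python sketch that collects the URL definitions and rewrites each bracketed label into an ordinary inline link. The function name and the regular expressions are my own simplification. Real parsers handle more variations, such as titles after the URL and the `[text][label]` form.

```python
import re

# A definition line looks like:  [label]: https://example.com/
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*(\S+)\s*$", re.MULTILINE)


def inline_reference_links(markdown):
    """Rewrite [label] references as [label](url) using the page's definitions."""
    # Labels are matched case-insensitively.
    urls = {label.lower(): url for label, url in DEFINITION.findall(markdown)}
    # Drop the definition lines from the visible text.
    body = DEFINITION.sub("", markdown).strip()

    def replace(match):
        label = match.group(1)
        url = urls.get(label.lower())
        # Leave unknown labels untouched.
        return f"[{label}]({url})" if url else match.group(0)

    # Skip brackets that already start an inline link or another reference.
    return re.sub(r"\[([^\]]+)\](?![(\[:])", replace, body)


page = "[Markdown] is a markup language.\n\n[Markdown]: https://daringfireball.net/projects/markdown/"
print(inline_reference_links(page))
```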
Bulleted list
* My list item
* Another list item
    * A list item embedded within the previous one
    * Another embedded item
* An item in the main list
There must be an empty line before and after each list, otherwise it gets mixed up with the preceding or following paragraph.
Numbered list
There are a few ways to do numbered lists. Here’s one:
1. My list item
1. Another list item
    * An embedded bulleted list item
    * Another embedded item
1. An item in the main list
You can mix and match bulleted and numbered lists, with varying degrees of success. 🙂
More markup
There’s plenty more you can do with Markdown, and there are a couple of syntax varieties to trap the unwary. For example, GitHub has a special flavour of Markdown.
Recent articles about Markdown
There’s been a fair bit of discussion about the pros and cons of Markdown recently. Here are a few of them:
- Eric Holscher wrote a popular and much-commented post in March this year: Why You Shouldn’t Use “Markdown” for Documentation. Congrats Eric, this is a really great example of in-depth analysis and reasoned opinion. It also sparked a large amount of impassioned conversation, which is still going on in forums around the world.
- Tom Johnson has a section on Markdown in his course on documenting REST APIs: More about Markdown.
- Victor Zverovich compares Markdown and reStructuredText: reStructuredText vs Markdown for documentation.
- Ben Cotton compares Markdown, reStructuredText, DocBook and LaTeX: Markup lowdown: 4 markup languages every team should know.
My opinion
In my day job, I write docs in both HTML and Markdown. I prefer HTML for comprehensive technical documentation. Markdown is good for very simple documents, but the syntax becomes clumsy for more complex things like tables, anchors, and even images. On the other hand, there are excellent benefits to using Markdown for quick collaboration on a document.
As is so often true, we need to choose the best tool for each use case. It’s a good idea to get to know Markdown, so that you can form an opinion and be able to use it when you need it.
STC Summit day 2 – Using DITA
I’m at STC 2012, the annual conference of the Society for Technical Communication. This post contains my notes from a session called “Using DITA”, by Michael Priestley, lead DITA architect at IBM.
Michael started off by telling us what DITA is and giving us some historical background.
What is DITA and who is using it?
DITA (Darwin Information Typing Architecture) is an open standard for designing, creating and publishing modular information, such as technical publications, help sets and websites. The standard is owned by OASIS. It was originally created for technical communication, but was designed to be more broadly applicable. We are now starting to see other types of content, such as learning and training, adopt the DITA standard too. As a result, some learning management systems (LMSs) are adapting to support DITA.
The latest DITA standard, OASIS DITA 1.2, was approved in December 2010.
A survey published at ditawriter.com found that around 250 companies have posted public case studies. Breaking the usage down by industry sector, around 30% of companies that use DITA are in the software sector. Geographically, the largest number of DITA users is in North America, followed by Europe.
In technical communication, a survey in 2008 found that DITA was the most popular standard, at 35%.
Michael discussed some case studies from around 2008-9, including the reasons why the companies adopted DITA:
- Avaya adopted DITA to solve problems of low quality in globalised content, inconsistency in content and style, and difficulty in training new writers. They reported improvements in these areas, as well as increased user satisfaction and happier writers.
- CaridianBCT adopted DITA to reduce translation costs. They reported savings of $100,000 in the first year.
- Medtronic needed to improve productivity. The metrics quoted were based on the amount of content that the team managed to produce using the new reuse model.
- WebSphere adopted DITA for content reuse in the documentation for their application server product. They report 80% reuse of their content across the entire set. Note that the percentage of reuse required depends on the amount of commonality across the different products.
Smart content
Michael discussed a study by the Gilbane Group, called “Smart Content in the Enterprise”. One of the most interesting aspects was a shift from an inward-facing to an outward-facing view of the content. Focus on what the users need from the content, and on delivery as the starting point of design.
The core elements are providing metadata, integrating social systems with the content, and using structured content as the source.
Getting practical – DITA topics and maps
DITA is all about chunks of content called topics. This concept has nothing to do with reuse. It’s all about content use. With all types of content other than novels, people jump in and out of a book or manual to find what they need. They don’t read the book from start to finish. What we must do is understand what the user needs to do, and prioritise that over the needs of the product.
Here Michael took us through the core elements of DITA:
- The high-level structure of a topic in DITA.
- An example of a DITA map. A map does all the referencing and organising. You would have a map per output format.
- A map in topic links.
- A map with generated navigation and links. The build process takes the information in the map, and turns it into a table of contents and supported linking patterns.
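As a rough illustration of that last point, here's a Python sketch that reads a small DITA map and generates a nested table of contents from it. The `map`, `topicref`, `href` and `navtitle` names are real DITA. The file names, titles and the `toc_lines` function are made up for this example, and a real DITA build does a great deal more than this.

```python
import xml.etree.ElementTree as ET

# A hypothetical map: topicref elements reference topics and nest to show hierarchy.
DITA_MAP = """<map>
  <title>Widget User Guide</title>
  <topicref href="overview.dita" navtitle="Overview"/>
  <topicref href="install.dita" navtitle="Installing">
    <topicref href="install-linux.dita" navtitle="On Linux"/>
  </topicref>
</map>"""


def toc_lines(parent, depth=0):
    """Walk the topicref hierarchy and return indented table-of-contents lines."""
    lines = []
    for ref in parent.findall("topicref"):
        # Fall back to the file name when no navigation title is given.
        title = ref.get("navtitle", ref.get("href"))
        lines.append("  " * depth + f"- {title} ({ref.get('href')})")
        lines.extend(toc_lines(ref, depth + 1))
    return lines


print("\n".join(toc_lines(ET.fromstring(DITA_MAP))))
```

The topics themselves know nothing about where they sit in the hierarchy. That information lives only in the map, which is what lets you reorganise the same topics in a different map.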
Conditional processing
Michael showed us the underlying DITA tags for conditional processing. We don’t assert that you should include or exclude something. Instead, we assert that it applies to a specific product or products.
Then you set conditions as part of the build process, to include or exclude products, or to flag specific content.
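Here's a minimal Python sketch of the idea. The content only says which products a paragraph applies to, and the build step decides what to leave out. The `product` attribute is real DITA. The topic content, the product names and the `filter_by_product` function are my own inventions, and a real build would normally take its conditions from a separate filter file rather than a Python set.

```python
import xml.etree.ElementTree as ET

# A hypothetical topic: each paragraph says which product(s) it applies to.
TOPIC = """<topic id="install">
  <title>Installing</title>
  <body>
    <p>Download the installer.</p>
    <p product="basic">Run the quick setup.</p>
    <p product="pro enterprise">Configure clustering.</p>
  </body>
</topic>"""


def filter_by_product(root, excluded):
    """Remove elements whose product values are ALL in the excluded set."""
    for parent in list(root.iter()):
        for child in list(parent):
            values = child.get("product", "").split()
            # Untagged content applies everywhere, so it is always kept.
            if values and all(value in excluded for value in values):
                parent.remove(child)
    return root


pro_build = filter_by_product(ET.fromstring(TOPIC), {"basic"})
print([p.text for p in pro_build.iter("p")])
```

Note that a paragraph tagged for two products survives as long as at least one of them is still included.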
Conref
In some cases, you want to reuse a fragment of a topic or a fragment of a map. In those cases, you would use a conref.
A conref is basically an empty element that instructs the build process to pull in the content from another element at publication time.
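To picture what "pull in the content" means, here's a simplified Python sketch of conref resolution. The `conref` attribute and the `file#topicid/elementid` address format are real DITA. The file name, the IDs, the note text and the `resolve_conrefs` function are assumptions for illustration, and the sketch ignores the validity checks that a real processor performs.

```python
import copy
import xml.etree.ElementTree as ET

# A hypothetical library of shared content, keyed by file name.
LIBRARY = {
    "shared.dita": ET.fromstring(
        '<topic id="shared"><body>'
        '<note id="backup">Back up your data first.</note>'
        '</body></topic>'
    )
}


def resolve_conrefs(root, library):
    """Fill each element that has a conref with the content of its target."""
    # Take a snapshot of the elements, since we change the tree as we go.
    for element in list(root.iter()):
        target = element.attrib.pop("conref", None)
        if target is None:
            continue
        filename, _, fragment = target.partition("#")
        element_id = fragment.split("/")[-1]
        # Raises StopIteration if the target ID does not exist.
        source = next(e for e in library[filename].iter()
                      if e.get("id") == element_id)
        element.text = source.text
        element[:] = [copy.deepcopy(child) for child in source]
    return root


doc = ET.fromstring(
    '<topic id="upgrade"><body>'
    '<note conref="shared.dita#shared/backup"/>'
    '</body></topic>'
)
resolve_conrefs(doc, LIBRARY)
print(ET.tostring(doc, encoding="unicode"))
```

The fragility is visible in the lookup by ID: if someone edits or removes the shared note, every topic that points to it is affected.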
Note that you should do this as little as possible. In most cases, you would use conditional processing to include or exclude a topic. Reason: If people change a paragraph, it could mess up your content reuse. Topics are a more basic unit of structure, and so are less subject to breaking changes.
Information typing
The idea is that different types of information are best served by different structures. What’s more, if a particular type of information is presented in a particular structure, that will help the user understand the information.
DITA defines a specific set of structures for tasks. For example:
- Prerequisites
- Steps
- Warnings
Each of the above elements has sub-elements, and rules about what can go where. The task is a type of topic. You can use a task anywhere that you would use a topic. The task has rules unique to it. We call this “specialisation”.
What’s new in DITA 1.2?
- Taxonomies. Michael showed us the big vision of how his team uses taxonomies to provide progressive disclosure from the product UI to the web-based user assistance.
- Learning and training specialisations.
What’s coming in DITA 1.3?
The group is working to provide a lighter-weight DITA editing experience, for technical communicators and other DITA users too.
Michael’s presentation style is easy and authoritative. Thank you Michael for an informative insight into DITA.