I’m coming to the conclusion that there are specific types of content that suit a DITA environment, and that the converse is also true: DITA is not the best solution for every content type. (DITA is the Darwin Information Typing Architecture, an XML architecture for designing, writing, managing, and publishing information.)
“Well, duh,” you may say. I’ve never worked in a DITA environment, but I’ve attended two in-depth training courses and a number of presentations that walked through successful DITA implementations. The most recent was at the ASTC (NSW) conference last week, where Gareth Oakes presented a case study of an automotive content management system that he designed and implemented in collaboration with Repco. The content is stored and managed in DITA format, and published on a website. (See my report on the session: Repco and DITA.) This was a convincing case study of a situation where DITA has succeeded very well.
In my analysis, the DITA implementations that work well are those where the content consists of a large number of topics, and where those topics have an identical structure. It’s almost as if you’re building a database of content. Good examples are catalogues of automotive spare parts, machine repair instructions, safety procedures, aircraft manufacturing manuals, and so on.
Apart from volume and support for a standard layout, this type of content has other requirements that DITA can satisfy well, including the ability to automatically build a variety of manuals by combining the topics into different configurations (via DITA maps) and multi-channel publishing.
On the other hand, some content doesn’t benefit much from such a highly structured storage format. Potentially, the overhead of a DITA environment is overkill and the costs may outweigh the benefits. If we have contributors to the docs who are not tech writers or developers, asking them to learn DITA or specific source control and editor rules can be a deterrent.
Dare I say it: Much of the documentation we write in the software industry falls into the latter category. Our topics tend to be lengthy, less uniform in structure, and more discursive than, say, an auto parts manual. API reference docs are an exception, but they’re auto-generated from software code anyway. We also don’t usually need to recombine the topics into different output configurations, such as different models of a car.
What do you think? Please contradict me. Do you have examples that gainsay or support the above conclusions? I’d love to see some examples of well-structured and well-presented documentation produced from DITA source.
I’m attending the annual conference of the Australian Society for Technical Communication (NSW), ASTC (NSW) 2014. These are my session notes from the conference. All credit goes to the presenter, and any mistakes are mine. I’m sharing these notes on my blog, as they may be useful to other technical communicators. Also, I’d like to let others know of the skills and knowledge so generously shared by the presenter.
Gareth Oakes walked us through a case study of an automotive content management system that he designed and implemented in collaboration with Repco. The content is stored and managed in DITA format, and published on a website.
Overview of the system: Autopedia
Repco supplies automotive parts in Australia and New Zealand, dealing with retailers as well as their workshop division. They were founded in Australia in 1922. The system is called Autopedia, a subscription product for mechanics and workshops. It provides automotive service and diagnostic information for the majority of vehicles on the road in Australia. For example, diagnostic codes, wiring diagrams, procedures for replacing timing belts, and so on.
Autopedia is a replacement for an “encyclopedia” in the form of a CD that was mailed out to the mechanics. The content was written by Repco technical writers. It was becoming more and more difficult to keep up with the number of vehicles, and variations of vehicle, on the road. It was expensive to maintain the content. And customers wanted online content, not CDs.
Designing a solution
Repco came up with a vision for a replacement system, and Gareth’s team worked with them on the technical solution.
A summary of what the solution comprised:
Content (Repco & third party) -> DITA -> DITA CCMS -> HTML -> Web server
(CCMS = Component Content Management System. The team uses the Arbortext Content Manager.)
The team wrote a multiple conversion pipeline, consisting of a set of Java tools, to convert the content into DITA format. A vehicle mapping table related the vehicle models in other countries to the Australian models. At first this required a lot of quality control, but now that it’s up and running it doesn’t need so much attention.
Creating and storing the content
All content is stored as a simple DITA topic. The aim was to keep the system simple. Each topic is tagged to indicate which vehicle it applies to. Other semantics are added when required to support the needs of website display. This was done iteratively, as the need arose. For example, in a voltage table you may want certain values to stand out.
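My notes don’t record the actual markup the team used for vehicle tagging. As a rough sketch only, assuming the tag lives in the topic prolog as a standard <othermeta> element, a tagged topic might look something like this. The id, the "vehicle" name and the content value are all invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="timing_belt_replacement">
  <title>Timing belt replacement</title>
  <prolog>
    <metadata>
      <!-- Hypothetical tag saying which vehicle this topic applies to. -->
      <othermeta name="vehicle" content="example-make example-model 2010"/>
    </metadata>
  </prolog>
  <body>
    <p>Procedure text goes here.</p>
  </body>
</topic>
```

A real specialisation would probably define its own elements or attributes for this purpose, rather than relying on generic name and content pairs.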
Repco content makes up only 1% of the content. Repco now authors all new content in DITA, using Arbortext Editor. Third-party content comes from a vendor in the USA, in multiple formats: database, CSV, custom XML, images. The automatic process converts all this to DITA topics, grouping the topics by vehicle applicability. The system then does a diff process and sends only what’s changed to the web.
Delivering the content
All content is converted to HTML and sent to the web server (Umbraco). The vehicle mapping table is used to decide where the content belongs in the navigation structure. Images are served from a CDN (Content Delivery Network) provided by Rackspace. Topics are marked as published or unpublished on the web service. This separates the publication process from the content storage and update process, and allows the publishing process to respond quickly to customer feedback.
In all, the project took approximately 9 months. The team was reasonably small: approximately 9 people across the various teams at both GPSL (where Gareth works) and Repco.
(Members of the audience at Gareth’s session expressed surprise at and admiration for the short timeframe of this project.)
The initial design sessions with Repco took approximately a month. The other phases included planning sessions with the web team, development of the code and the vehicle mapping table (which took around 6 months), and migration of existing content to DITA. Then followed integration of all the components, and testing. Documentation, training and knowledge transfer were important. Then the initial conversion and content upload took a while, as there was more than 200GB of content. Then came the go live date, and ongoing support.
Results of the launch
The project launched late in 2012. The response from the market was very positive. They achieved their revenue goals and ROI well within the first year. Most existing customers migrated to the new system, and more than 1000 new customers signed up.
A few project notes
DITA was a good solution for this system, using basic topics and a very light layer of specialisation for vehicle tagging. The team may need to add more specialisation in future, based on customer demands for dynamic representations, such as decision tables. A next step may be a live link to parts, so that the parts are ready when you come in to work the next day. The single sourcing aspect is extremely useful. Store the content in one place and be able to output in many formats, such as PDF. The team found DITA easy to work with, as there are many tools available.
You need a level of skill with XML. DITA also very much steers you to author your content as topics, which may not suit every solution. You may also need new tools. And with a laugh, Gareth said that you risk turning into one of those DITA fans who runs around recommending DITA as a solution for everyone else.
The huge amount of content caused many delays, which were not entirely expected. The information structure required a number of design changes during development, due to the complexity of vehicle classification. The DITA CCMS required a lot of specific configuration and optimisation to ensure it was performing as required.
Thank you Gareth for an insight into a very interesting project and a cool system.
Michael started off by telling us what DITA is and giving us some historical background.
What is DITA and who is using it?
DITA (Darwin Information Typing Architecture) is an open standard for designing, creating and publishing modular information, such as technical publications, help sets and websites. The standard is owned by OASIS. It was originally created for technical communication, but was designed to be more broadly applicable. We are starting to see other types of content adopt the DITA standard now too, such as learning and training. As a result, some learning management systems (LMSs) are now adapting to support DITA too.
The latest DITA standard, OASIS DITA 1.2, was approved in December 2010.
A public survey at ditawriter.com found that around 250 companies have posted public case studies. Breaking the usage down by industry sector, around 30% of companies that use DITA are in the software sector. Geographically, the largest number of DITA users is in North America, followed by Europe.
In technical communication, a survey in 2008 found that DITA was the most popular standard, at 35%.
Michael discussed some case studies from around 2008-9, including the reasons why the companies adopted DITA:
- Avaya adopted DITA to solve problems of low quality in globalised content, inconsistency in content and style, and problems in training new writers. They reported improvements in these areas, as well as increased user satisfaction and happier writers.
- CaridianBCT adopted DITA to reduce translation costs. They reported savings of $100,000 in the first year.
- Medtronic needed to improve productivity. The metrics quoted were based on the amount of content that the team managed to produce using the new reuse model.
- WebSphere adopted DITA for content reuse in the documentation for their application server product. They report 80% reuse of their content across the entire set. Note that the percentage of reuse required depends on the amount of commonality across the different products.
Michael discussed a study by the Gilbane Group, called “Smart Content in the Enterprise”. One of the most interesting aspects was a shift from an inward-facing to an outward-facing view of the content. Focus on what the users need from the content, and on delivery as the starting point of design.
The core elements are providing metadata, integrating social systems with the content, and using structured content as the source.
Getting practical – DITA topics and maps
DITA is all about chunks of content called topics. This concept has nothing to do with reuse. It’s all about content use. With all types of content other than novels, people jump in and out of a book or manual to find what they need. They don’t read the book from start to finish. What we must do is understand what the user needs to do, and prioritise that over the needs of the product.
Here Michael took us through the core elements of DITA:
- The high-level structure of a topic in DITA.
- An example of a DITA map. A map does all the referencing and organising. You would have a map per output format.
- A map in topic links.
- A map with generated navigation and links. The build process takes the information in the map, and turns it into a table of contents and supported linking patterns.
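To make the topic and map structures a little more concrete, here is a minimal sketch of each. The file names, ids and text are made up for illustration, and this is not the material Michael showed on his slides. First, a simple topic:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="about_widget">
  <title>About the widget</title>
  <shortdesc>A one-sentence summary of what the topic covers.</shortdesc>
  <body>
    <p>The main content of the topic goes here.</p>
  </body>
</topic>
```

And a map that pulls that topic, plus two hypothetical child topics, into a hierarchy:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
  <title>Widget user guide</title>
  <!-- Nesting the topicref elements defines the table of contents. -->
  <topicref href="about_widget.dita">
    <topicref href="install_widget.dita"/>
    <topicref href="configure_widget.dita"/>
  </topicref>
</map>
```

The topics themselves know nothing about the hierarchy. It lives only in the map, which is why the same topics can appear in several maps.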
Michael showed us the underlying DITA tags for conditional processing. We don’t assert that you should include or exclude something. Instead, we assert that it applies to a specific product or products.
Then you set conditions as part of the build process, to include or exclude products, or to flag specific content.
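As a small sketch of the two halves of this approach, assume two made-up product values, "basic" and "pro", and one made-up audience value, "administrator". In the topic, each paragraph simply states what it applies to:

```xml
<!-- Fragment of a topic body: each element says what it applies to. -->
<p product="basic">The basic edition stores up to ten projects.</p>
<p product="pro">The pro edition stores an unlimited number of projects.</p>
<p audience="administrator">Only administrators can change this limit.</p>
<p>Both editions can export to PDF.</p>
```

The decisions about what to do with those values are made at build time, in a separate ditaval file. This one would suit a build of the basic edition:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<val>
  <!-- Leave out everything that applies only to the pro edition. -->
  <prop att="product" val="pro" action="exclude"/>
  <!-- Keep the administrator content, but highlight it in the output. -->
  <prop att="audience" val="administrator" action="flag" backcolor="yellow"/>
</val>
```

A different ditaval file, run against the same source, would produce the pro edition.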
In some cases, you want to reuse a fragment of a topic or a fragment of a map. In those cases, you would use a conref.
A conref is basically an empty element that instructs the build process to pull in the content from another element at publication time.
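Here is a minimal sketch of a conref, with invented file names and ids. The reusable element sits in one topic and has an id:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="shared_notes">
  <title>Shared notes</title>
  <body>
    <note id="backup_warning" type="caution">Back up your data before you upgrade.</note>
  </body>
</topic>
```

Any other topic can then pull that note in with an empty element of the same type:

```xml
<!-- Fragment of another topic. The target has the form file#topicid/elementid. -->
<note conref="shared_notes.dita#shared_notes/backup_warning"/>
```

At publication time, the build replaces the empty <note> with the content of the referenced one.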
Note that you should do this as little as possible. In most cases, you would use conditional processing to include or exclude a topic. Reason: If people change a paragraph, it could mess up your content reuse. Topics are a more basic unit of structure, and so are less subject to breaking changes.
The idea is that different types of information are best served by different structures. What’s more, if a particular type of information is presented in a particular structure, that will help the user understand the information.
DITA defines a specific set of structures for tasks, with elements such as prerequisites, context, steps, and result.
Each of these elements has sub-elements, and rules about what can go where. The task is a type of topic. You can use a task anywhere that you would use a topic. The task has rules unique to it. We call this “specialisation”.
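As a minimal sketch, a task topic might look like the following. The element names come from the DITA task type, but the id, title and step text are invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<task id="install_widget">
  <title>Installing the widget</title>
  <shortdesc>Install the widget before you configure it.</shortdesc>
  <taskbody>
    <prereq>You need administrator rights on the machine.</prereq>
    <context>Installation takes a few minutes.</context>
    <steps>
      <step>
        <cmd>Download the installer.</cmd>
      </step>
      <step>
        <cmd>Run the installer.</cmd>
        <info>Accept the default options unless you have a reason to change them.</info>
      </step>
    </steps>
    <result>The widget appears in your list of applications.</result>
  </taskbody>
</task>
```

Note that a step must start with a command (<cmd>), and the sections of the task body must appear in a set order. Those are the kind of rules that a plain topic doesn’t have.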
What’s new in DITA 1.2?
- Taxonomies. Michael showed us the big vision of how his team uses taxonomies to provide progressive disclosure from the product UI to the web-based user assistance.
- Learning and training specialisations.
What’s coming in DITA 1.3?
The group is working to provide a lighter-weight DITA editing experience, for technical communicators and other DITA users too.
Michael’s presentation style is easy and authoritative. Thank you Michael for an informative insight into DITA.
This week I published a post on the Atlassian blog about single source publishing on a wiki. I’m cross-posting it here because it may be useful to technical writers who read this blog.
The post discusses a few of the reasons why we may want to write our documents on a wiki and then publish them to other formats, or conversely write the documents using another tool and then publish them to a wiki as one of the delivery formats.
Next, the post recommends some good tools for converting content from these formats into Confluence wiki format:
- From Microsoft Word to Confluence wiki
- From Adobe FrameMaker to Confluence wiki
- From DITA XML to Confluence wiki
And some tools for converting content from a Confluence wiki into these formats:
- From Confluence to PDF
- From Confluence to Microsoft Word
- From Confluence to HTML
- From Confluence to XML (Confluence-specific format)
- From Confluence to DocBook XML
- From Confluence to Eclipse Help
- From Confluence to JavaHelp
In case it’s useful, there’s also a post I wrote a while ago about getting content into and out of wikis. That post looks at a couple of other wikis as well as Confluence, and covers a wider range of tools. The new post on the Atlassian blog is more up to date and is specifically about conversion tools to and from Confluence.
If you’re interested, mosey on over to the Atlassian blog and take a look. I’d love to hear your experiences with the tools mentioned in the blog post, or if you’ve used any other tools or need any other conversions. What did I miss out? There’s an interesting discussion going on already. Here’s the link again: Technical writing in a wiki – single source publishing.
A couple of weeks ago I attended AODC 2010, the Australasian Online Documentation and Content conference. We were in Darwin, in Australia’s “Top End”. This post is my summary of one of the sessions at the conference and is derived from my notes taken during the presentation. All the credit goes to Dave Gash, the presenter. Any mistakes or omissions are my own.
This year’s AODC included a number of useful sessions on DITA, the Darwin Information Typing Architecture. I’ve already written about Tony Self’s session, an update on DITA features and tools, and about Suchi Govindarajan’s session, an introduction to DITA.
Now Dave Gash presented one of the more advanced DITA sessions, titled “Introduction to DITA Conditional Publishing”.
At the beginning of his talk, Dave made an announcement. He has presented in countries all over the world, many times, and he has never ever ever before done a presentation in shorts!
Introducing the session
To kick off, Dave answered the question, “Why do we care about conditional processing?” One of the tenets of DITA is re-use. You may have hundreds or even thousands of topics. In any single documentation set, you probably don’t want to publish every piece of the documentation every time.
Conditional processing is a way to determine which content is published at any one time.
Dave’s talk covered these subjects:
- A review of DITA topics, maps and publishing flow
- The use of metadata
- The mechanics of conditional processing
- Some examples
Metadata and the build process
Dave ran us through a quick review of the DITA build process and the concept of metadata. Metadata has many uses. Dave talked specifically about metadata for the control of content publication.
Metadata via attributes
There are a number of attributes available on most DITA elements. These are some of the attributes Dave discussed:
- audience – a group of intended readers
- product – the product name
- platform – the target platform
- rev – product version number
- otherprops – you can use this for other properties
Using metadata for conditional processing
Basically, you use the metadata to filter the content. For example, let’s assume you are writing the installation guide for a software application. You may store all the instructions for Linux, Windows and Mac OS in one file. When publishing, you can filter the operating systems and produce separate output for each OS.
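Here is a rough sketch of what that single installation file might look like, using the platform attribute on individual steps. The values "windows", "linux" and "mac" are my own choice, since DITA does not prescribe them, and the step text is made up:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<task id="install_app">
  <title>Installing the application</title>
  <taskbody>
    <steps>
      <step platform="windows">
        <cmd>Run the installer and follow the prompts.</cmd>
      </step>
      <step platform="linux">
        <cmd>Unpack the archive into a directory of your choice.</cmd>
      </step>
      <step platform="mac">
        <cmd>Drag the application into your Applications folder.</cmd>
      </step>
      <!-- No platform attribute: this step appears in every output. -->
      <step>
        <cmd>Start the application.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
```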
In general, you can put metadata in these 3 locations (layers):
- maps – metadata on the <map> element. You might use metadata at this layer to build a manual from similar topics for specific versions of a product.
- topics – metadata to select an entire topic. You might use metadata at this layer to build a documentation set for review by a specific person.
- elements – metadata on individual XML elements inside a topic. You might use this metadata to select steps that are relevant for beginners, as opposed to intermediate or advanced users.
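To illustrate the three layers, here are some small sketches with invented file names and attribute values. They are not Dave’s examples. At the map layer, the metadata here sits on a topic reference inside the map:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map>
  <title>Installation guide</title>
  <topicref href="install_app.dita"/>
  <!-- Map layer: this topic belongs only in the "pro" manual. -->
  <topicref href="cluster_setup.dita" product="pro"/>
</map>
```

At the topic layer, the metadata goes on the root element of the topic:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<task id="cluster_setup" audience="administrator">
  <title>Setting up a cluster</title>
  <taskbody>
    <steps>
      <step>
        <cmd>Open the cluster settings.</cmd>
      </step>
      <!-- Element layer: only this one step is aimed at novices. -->
      <step audience="novice">
        <cmd>Read the overview of clustering before you continue.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
```

The same attribute, such as audience, can appear at more than one layer, as in the second sketch.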
Dave gave us some guidelines on how to decide which of the above layers to use under given circumstances.
Defining the build conditions to control the filtering
Use the ditaval file to define the filter conditions. This file contains the conditions that we want to match on, and actions to take when they’re matched. The build file contains a reference to the ditaval file, making sure it drives the build.
Dave talked us through the <prop> element in the ditaval file, and its attributes:
- att – attribute to be processed
- val – value to be matched
- action – action to take when match is found
A hint: You can use the same attribute in different layers (map, topic and element). Also, you don’t need to specify the location. The build will find the attributes, based on the <prop> element in the ditaval file.
Next we looked at the “include” and “exclude” actions. Remember, the action is one of the attributes in the <prop> element, as described above. Here’s an example of an action:
<prop att="audience" val="novice" action="exclude" />
Dave’s recommendation, very strongly put, is:
Don’t use “include”. Stick to “exclude”.
The basic rule is: Everything not explicitly excluded is included.
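Putting this together, a complete ditaval file might look like the sketch below. It assumes content tagged with the made-up platform values "windows", "linux" and "mac", and it would produce the Windows version of a guide by excluding the other two platforms:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<val>
  <!-- Only "exclude" actions: anything not listed here is included. -->
  <prop att="platform" val="linux" action="exclude"/>
  <prop att="platform" val="mac" action="exclude"/>
</val>
```

You would keep one such file per output, for example one each for Windows, Linux and Mac OS, and point each build at the appropriate file.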
Dave’s final recommendation
Go get DITA and play with it!
It was great to have a focus on the conditional publishing side of DITA. It’s something I haven’t had a chance to get into before. Now I know the basics, which rounds off the DITA picture for me. Thank you Dave for an entertaining and information-packed talk.