ffeathers — a technical writer’s blog

AODC - separating content, structure, format and behaviour

Posted in AODC, technical writing, xml by ffeathers on May 16th, 2008

This week I’m attending the Australasian Online Documentation and Content Conference (AODC) on the Gold Coast in Queensland, Australia. With his inimitable flair and style, Dave Gash presented a session this morning entitled “The Search for the UA Grail: True Separation of Content, Structure, Format and Behaviour”.

Dave is the owner of HyperTrain dot Com, specialising in training and consulting for hypertext developers. Today he told us what’s wrong with the way we traditionally do things, what’s wrong with the conventional wisdom on how we might improve our way of working, and what’s a better way.

What’s wrong with the traditional way we do things

The basic problem is that we write our content, e.g. a web page, and tweak elements of it on the fly. For example, we might make some text bold, or colour other text, or whatever. The result is spaghetti code — difficult to maintain and share.

What’s wrong with the conventional wisdom on improving the above situation

People usually say we should “separate format from content”. But what is “format”? That term is too vague. And the phrase implies that everything that’s not “content” is “format”. Wrong.

The better way

We should separate our document into four components instead of two:

  • Content (which you might realise via XML)
  • Structure (XSLT)
  • Format (CSS)
  • Behaviour (JavaScript)

What we’re aiming for:

  • Maintainability — you can change one of the above four components without breaking the others.
  • Re-usability — you can re-use the same bit of JavaScript, for example, in other documents.
  • Separation of skill sets — different people can work on the component they know best and enjoy most.
  • Simplified updating of content — content is likely to be the component you update most often.

How to do it

Dave demonstrated the procedure we would follow in order to separate a document into the above four components. There are five basic steps. Dave walked us through the details of each step, using code examples of CSS, JavaScript, XML and XSLT. In summary, the steps are:

  1. Identify all JavaScript and move it to an external JS file.
  2. Identify JavaScript that could be better done in CSS. Examples are “onmouseover” and “onmouseout” event handlers that change the style of the text item, and image swaps. Use the CSS “hover” pseudo-class instead.
    • Dave’s tip: You don’t have to specifically code the “I’m through hovering” handler because it’s implicit in the pseudo-class.
  3. Move all CSS styles to an external file. Convert local formatting to classes too.
    • Dave’s tip: If the boss says “Change the list spacing in all lists on all pages”, it’s in one spot — change it and take the rest of the day off.
  4. Add semantic markup to the content, using XML.
  5. Now it’s time for some XSLT. Identify the output HTML you want, then write the XSL transforms to produce it.
    • Write small, individual templates to create the HTML for each specific XML tag. Then use the “magic” <xsl:apply-templates/> element to pull it all together. This nests the processing of the templates, so that the transforms will just keep happening for each XML element, hierarchically down and back up the tree, until they’re all done.

The XSLT generates the HTML and links in the CSS and JavaScript.
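To make step 5 a little more concrete, here is a minimal sketch of the same pattern in Python rather than XSLT: one small template per XML tag, plus a recursive "apply templates" call that walks down and back up the tree. This is not Dave's code. The element names (topic, title, para, term) and the file names styles.css and behaviour.js are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical semantic markup; the tag names are illustrative only.
SOURCE = """<topic>
  <title>Washing up</title>
  <para>Fill the <term>sink</term> first.</para>
</topic>"""


def apply_templates(node):
    """Render a node's text and children in document order (like xsl:apply-templates)."""
    parts = [escape(node.text or "")]
    for child in node:
        parts.append(render(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


# One small template per XML tag; each decides only its own HTML wrapper.
TEMPLATES = {
    "topic": lambda n: '<div class="topic">' + apply_templates(n) + "</div>",
    "title": lambda n: "<h1>" + apply_templates(n) + "</h1>",
    "para": lambda n: "<p>" + apply_templates(n) + "</p>",
    "term": lambda n: '<span class="term">' + apply_templates(n) + "</span>",
}


def render(node):
    """Pick the template for this tag; unknown tags just pass their content through."""
    template = TEMPLATES.get(node.tag, apply_templates)
    return template(node)


def to_html(xml_text):
    """Wrap the rendered body in a page that links the external CSS and JavaScript."""
    body = render(ET.fromstring(xml_text))
    return (
        "<html><head>"
        '<link rel="stylesheet" href="styles.css">'
        '<script src="behaviour.js"></script>'
        "</head><body>" + body + "</body></html>"
    )


print(to_html(SOURCE))
```

Each template knows only about its own tag, so adding a new element means adding one entry to the table. In real XSLT the processor does the template matching for you.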

Dave has made the code available on the “Downloads” tab of the HyperTrain dot Com web site.

A recommended editing tool: EditPlus.

Thank you for a great session, Dave. And a special thanks for changing “behavior” to “behaviour” throughout your presentation, just so that we Ozzies felt comfortable ;)

AODC - a DITA case study

Posted in AODC, open standards, technical writing, xml by ffeathers on May 14th, 2008

This week I’m attending the Australasian Online Documentation and Content Conference (AODC) on the Gold Coast in Queensland, Australia. In one of the sessions today, Sarah Goodall walked us through a DITA application developed and implemented by Tactics Consulting under consultation with Tony Self from HyperWrite.

This session was interesting as a practical application of DITA and as an application which is not primarily a technical documentation system. Tactics developed a DITA-based system to create and store proposals. Before the new system was implemented, the creation of proposals was largely a manual process involving copying and pasting information from similar proposals in Word documents. This could lead to inconsistencies and even inaccuracies.

The proposed solution was a single-source, XML-based system. Tactics chose DITA because they saw a lot of commonality between DITA and Information Mapping, the technology used and promoted by Tactics. It was relatively straightforward to map DITA information types to the Information Mapping types. Also, Tactics was keen to use an open source solution, and one that they could share with their customers.

As a next step, Tactics plan to move even further into single sourcing, by drawing their web site content from the same source as their proposal documents, brochures, etc.

Sarah’s presentation was interesting from the technology side of things. She also mentioned other aspects of the project, such as change management — moving the staff to the new procedures and technologies.

A couple of snippets:

  • Elkera XML Print is an Australian product that renders DITA into Microsoft Word.
  • I raised the point that people often don’t like to see the same information thrown back at them in different media. For example, if they see some information on a brochure they might go to the web site for more in-depth information or to see the same information but worded differently. They find it annoying to see exactly the same words. So while you can use conditional tagging to include more or less information, would you also need to ensure that the wording itself is different and is this a design consideration? Sarah Goodall replied that Tactics will continually test their users’ responses to the content, and supply different content for different media if necessary.

Thank you Sarah (Goodall) for an interesting and useful session!

BTW, there are eight Sarahs at this session, in a total of 65 attendees :)

AODC - web technology and standards

Posted in AODC, open standards, technical writing, xml by ffeathers on May 14th, 2008

This week I’m attending the Australasian Online Documentation and Content Conference (AODC) on the Gold Coast in Queensland, Australia. One of the sessions today covered web technology and standards, presented by Joe Welinske.

Joe is the president of Seattle-based WritersUA. This is the second of his sessions that I’ve attended. The first is covered in my earlier blog post, on trends, tools and technologies in online documentation. Today’s session was more technical, covering various standards in the following groups:

  • W3C standards such as HTML, XHTML, XML/XSL, Web Accessibility Initiative, CSS.
  • W3C hybrids such as HTML 5 (more below), AJAX, RSS and jQuery.
  • OASIS technologies — DocBook and DITA.
  • Other open source technologies — Oracle Help, IBM WebSphere.
  • Proprietary technologies that are still relevant and useful because they are so widely adopted and stable, such as Adobe PDF, AIR and even Microsoft Silverlight (emerging).

As well as talking about the above standards, Joe discussed each one’s possible application to technical communication.

In this blog post, I’ve extracted the subjects that were new to me and a couple of interesting items. Joe covered a lot more than I’ve mentioned here.

HTML 5

HTML 5 is an emerging standard that Joe feels we need to keep on our radar. It’s also known as “Web Applications 1.0”. It includes capabilities that are relevant to technical writers. Specifically, it supports new modular objects as part of a web page, e.g. <aside>, <article>, <nav>. So a web page can recognise chunks of content in a non-linear way. Here’s a link Joe gave us to a demonstration: HTML 5 Support by Browser

I found this really interesting, after all the discussion in previous sessions about modular documentation and structured authoring.

Joe thinks that HTML 5 will probably go ahead, because some big players have shown a lot of interest in it.

A question from the floor led to a discussion around HTML’s leniency as opposed to the strictness of XML / XHTML. HTML 5 may meld HTML’s leniency with the kind of semantic tagging provided by XML. This is potentially useful for less technically savvy authors, because browsers and other viewers will be instructed to render a page if at all possible, even if it contains markup errors.
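The difference between lenient and strict parsing is easy to see with a small experiment. The sketch below uses only the Python standard library. Note the assumption: Python's html.parser is a simple tokenizer and does not reproduce the way a browser repairs a page, so it only illustrates the general idea of "carry on if at all possible" versus "stop at the first error".

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A fragment with an unclosed <b> element: tolerated as HTML, fatal as XML.
FRAGMENT = "<p>Some <b>bold text</p>"


class TextCollector(HTMLParser):
    """Collects text content and ignores tag mismatches, as a lenient parser would."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_html(markup):
    """Return the text content, carrying on regardless of mismatched tags."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


def parse_as_xml(markup):
    """Return the text content, or None if the markup is not well-formed XML."""
    try:
        return "".join(ET.fromstring(markup).itertext())
    except ET.ParseError:
        return None


print(parse_as_html(FRAGMENT))  # the lenient parser still yields the text
print(parse_as_xml(FRAGMENT))   # the strict parser gives up
```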

Other snippets

Some more interesting items from Joe’s talk:

  • Comparison of XML rendering via XSL versus CSS: Printing XML: Why CSS Is Better than XSL.
  • Neat use of Flash for an online tutorial, from Verizon Wireless — demonstrates how to use a mobile phone. Click the ‘Interactive “How To” Simulator’. It has a movie of a hand clicking the buttons, plus a block of text in the right-hand panel. The text is in sync with the movie, and you can influence the movie by clicking the text.

My conclusions

Joe covered a huge area in this short session, and his knowledge is huge too. Thank you Joe! The next two days of the conference include other sessions with more detail on some of the areas which Joe has introduced, including one on AIR (Adobe Integrated Runtime) tomorrow.

AODC - DITA workshop

Posted in AODC, open standards, technical writing, xml by ffeathers on May 13th, 2008

This week I’m attending the Australasian Online Documentation and Content Conference (AODC) on the Gold Coast in Queensland, Australia. This morning Tony Self hosted a workshop entitled “Introduction to DITA”. I learned a lot about the DITA schema itself, and especially about its application and the team and management structures you might need if you’re planning a large documentation project using structured authoring.

Tony is a founding partner of HyperWrite. He has wide experience in technical documentation projects and is a skilled and engaging presenter.

A point of interest: Much of the HyperWrite web site is maintained in DITA and is transformed to XHTML when rendering the web page.

This was an excellent session, and there’s far too much content to cover in a blog post. Here are some highlights from Tony’s lecture and the discussions within the class.

Introductory points

These are items Tony discussed before we dived into DITA itself.

  • A stated aim of XML is that all human knowledge should be stored in XML. Wow, I hadn’t heard that before. That beats the theory of everything!
  • We compared DITA with DocBook. DocBook was developed by O’Reilly as a means of making their publishing process more efficient. It’s now an open standard maintained by OASIS. DITA originated at IBM, but is now also maintained by OASIS.
  • DITA is topic-based. A book, or other publication, is a “collection” of topics. DITA is usually thought to be better for procedures, help systems, and other types of documentation which can be broken down into chunks.
  • DocBook is document-based — you might write an entire book in one single file. It’s generally thought to be better for books etc.

Structured authoring

Structured authoring is a whole new way of writing. It requires new procedures, new team structures and new ways of thinking about content. The advantages you gain are things like:

  • Content re-usability (one chunk of information can be pulled into multiple different documents)
  • Single sourcing (documentation stored in one place and format; can be published to multiple formats)
  • Separation of content from presentation

DITA is designed specifically for structured authoring.

A question arose: How can writers make sure that the content they write will fit into the context in which it is used? Tony agreed that this is a concern, and one that is often discussed. In practice, the writer will often be given some context. He says that this concern is probably not as much of a problem as it appears up front.

This discussion brought to mind a web site I saw a few weeks ago, which allows DITA authors to load their documents (i.e. topic collections) onto a platform where others can review the output in its final form. The web site is supplied as a SaaS application. I can’t remember the site URL, and Googling it hasn’t helped. Does anyone know of this site?

DITA itself

Now we dived into the details of content re-use, repurposing via transformations, inclusion/exclusion of conditional text, and the creation of a content model.
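As a rough illustration of conditional text, the sketch below drops every element that is tagged for a different audience. DITA does have an audience attribute, and real publishing tools normally apply this kind of filtering during the build. The sample topic, the audience values and the simple "keep it if the audience is listed" rule are assumptions for illustration, not the workshop material or the full DITA filtering rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic; untagged content is meant for everyone.
TOPIC = """<topic id="install">
  <title>Installing</title>
  <body>
    <p>Run the installer.</p>
    <p audience="administrator">Check the server logs afterwards.</p>
    <p audience="novice administrator">Ask for help if the installer fails.</p>
  </body>
</topic>"""


def filter_for_audience(xml_text, audience):
    """Drop every element whose audience attribute does not list the given audience."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            values = child.get("audience", "").split()
            if values and audience not in values:
                parent.remove(child)
    return root


novice = filter_for_audience(TOPIC, "novice")
print([p.text for p in novice.iter("p")])
```

The same source topic can then feed several outputs, each built with a different audience value.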

A question arose: Talking about the content model and the DITA schema, what do you do if the DITA schema does not contain the elements you need for your particular application? In this case, the example was taken from the radar equipment industry. Interestingly enough, a committee is currently sitting to design a new content model for the machine industry, which may contain the sort of “warning” elements required by the questioner.

In the wider context, DITA architecture is designed to encourage “specialisation”. You create your own elements to suit your needs. When you need to share your content model with others, DITA supports a process called “generalisation” to make this possible too. The core concepts of “evolution”, “specialisation” and “generalisation” are implicit in the name “Darwin Information Typing Architecture” or DITA.
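Here is a small sketch of what generalisation might look like in practice. It relies on the DITA class attribute, which records each element's ancestry as module/element pairs (for example, a task is a specialised topic). The sample document is made up, and the function is a simplified assumption about the process rather than a complete implementation: it just renames each element to its base type.

```python
import xml.etree.ElementTree as ET

# Class values are normally defaulted by the DTD; written out here for illustration.
TASK = """<task class="- topic/topic task/task ">
  <title class="- topic/title ">Washing the dishes</title>
  <taskbody class="- topic/body task/taskbody ">
    <steps class="- topic/ol task/steps ">
      <step class="- topic/li task/step "><cmd class="- topic/ph task/cmd ">Turn on the tap.</cmd></step>
    </steps>
  </taskbody>
</task>"""


def generalise(element):
    """Rename each element to its base type, the first module/name pair in class."""
    for node in element.iter():
        pairs = node.get("class", "").split()[1:]  # skip the leading "-" marker
        if pairs:
            node.tag = pairs[0].split("/")[1]
    return element


topic = generalise(ET.fromstring(TASK))
print(ET.tostring(topic, encoding="unicode"))
```

Because the class attributes are left in place, the original specialised names are still recorded in the generalised document.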

We did some work on the basic elements of the DITA schema. A point of interest: One of the attendees noticed that many of the DITA elements are similar to the Dublin Core. Tony said that this is by design, for easy interchange.

The big debate around stem sentences ;)

DITA does not allow stem sentences! (Oh dear, the things that technical writers worry about.)

A stem sentence is the short introduction that you might put at the top of a bulleted list, for example. Like this:

To wash the dishes:

  1. Put the plug in the plug hole.
  2. Turn on the tap.
  3. …etc…

In DITA, there’s no legitimate way to add the above phrase “To wash the dishes”. This has led to fiery debate in the documentation community. No doubt it will continue to do so. You could cheat and add the phrase into a <p> element within the <context> element that DITA does allow before the <steps> in a <task>. But this has problems:

  • The stem sentence will not be included in the list of <steps>, if you use the <steps> as a unit outside the <task>.
  • The stem sentence will be an ugly orphan at the end of <context> if you use <context> as a unit outside the <task>.
  • The stem sentence probably duplicates the <title> anyway.
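The first two problems are easy to demonstrate with a tiny task topic. The sketch below parses a hand-written task (the content is made up, and the DOCTYPE is omitted for brevity) and then pulls out the <steps> and the <context> separately, as a re-use mechanism might.

```python
import xml.etree.ElementTree as ET

# A hand-written DITA task using the <context> workaround for the stem sentence.
TASK = """<task id="wash-dishes">
  <title>Washing the dishes</title>
  <taskbody>
    <context><p>To wash the dishes:</p></context>
    <steps>
      <step><cmd>Put the plug in the plug hole.</cmd></step>
      <step><cmd>Turn on the tap.</cmd></step>
    </steps>
  </taskbody>
</task>"""

task = ET.fromstring(TASK)

# Re-using <steps> on its own: the stem sentence lives in <context>, so it is lost.
steps_only = [cmd.text for cmd in task.find("taskbody/steps").iter("cmd")]

# Re-using <context> on its own: the stem sentence dangles with no list after it.
context_only = "".join(task.find("taskbody/context").itertext())

print(steps_only)
print(context_only)
print(task.findtext("title"))
```

Printing the title next to the stem sentence also shows how close the two are to saying the same thing.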

Oh dear oh dear.

So what do you think — are stem sentences a Good Thing or a Bad Thing?

The future

Tony asked what reading formats we might be using in 2010, for example. He mentioned Sony’s BBeB (BroadBand electronic Books). Now all we need is a transformation from DITA to BBeB.

A fun simulation

Try this out: http://www.structuredauthoring.com/simulation

First, do the “Interactive Puzzle” in Part 1. Follow the instructions in the left-hand panel. You will remove all formatting from a web page, bit by bit. Now you can see how difficult the information is to assimilate when the presentation layer has been removed and there’s no semantic tagging. Then Part 2 lets you practise some structured authoring.

A DITA project team

These are the people who might be involved in a large DITA documentation project:

  • The schema designer (if you need to add specialisations to the standard DITA schema)
  • The information architect (creates the ditamap, i.e. the structure of the documentation)
  • Information developers (these are the content authors)
  • A publisher (defines the data transformation)

Tony pointed out that this specialisation of roles will actually take us back a few years, to before the desktop publishing era. With DTP, most technical writers create the content, presentation, graphics, and so on. With structured authoring, the information developers are concerned only with the content.

The complexity in designing a topic-based documentation system

If your documentation base is very large, it’s a time-consuming and complex task to design and allocate the topics. Each topic may be re-used in multiple documents, and the topics are written by multiple authors. This needs very careful coordination and management.

Tools

Tony mentioned a number of tools for DITA and other structured authoring, including these:

  • XMetaL — authoring environment; an example of a DITA editing tool; includes a map editor (used to build your structure i.e. table of contents)
  • Task Modeler from IBM — for design of the maps
  • WebWorks — for defining the structure and publishing the DITA topics
  • Antenna House — converts ditamaps into PDF
  • Author-it — structured authoring; single sourcing; a mature product. But note that its DITA support is output only: you can output DITA but you can’t edit it within Author-it.

Tony has also developed his own publishing tool, using the DITA Open Toolkit. This tool handles the publication side of things, not the authoring or structuring. He is interested in any feedback you may have.

My conclusions

Thank you Tony for a very interesting session. I can certainly see the benefits of DITA as a storage format. The one thing I haven’t yet come across is that killer WYSIWYG editing front-end. With any luck, I might hear more about that in the next few days at the conference.

Document Freedom Day in Sydney 26 March 2008

Posted in open standards, technical writing, xml by ffeathers on March 29th, 2008

Document freedom — what’s that? It’s all about being able to read something you wrote a few years ago, and being able to read something that someone else has written — whether now or hundreds of years ago. Or even just knowing that you’ll be able to read what someone writes tomorrow. It’s all about freedom from the bounds that may be imposed by a proprietary document format.

On Wednesday this week, a group of us got together at the Google offices in Sydney to swap stories and ideas and to kick off the Sydney team for the Document Freedom Day initiative. The immediate aim is to raise awareness of the problems and to promote the idea of open standards for document formats.

We carefully, almost, didn’t mention Microsoft’s Office Open XML (OOXML) document format and the bid to have it declared an ISO standard.

The problem

If your writing is encased in a proprietary format, then to a certain extent you are at the mercy of the owners of that format. If they abandon backwards compatibility, the world will move on without really taking note of that event. A few years later, no-one will be able to read your work. Even worse, you may be unable to find some essential information that you know is out there, but is hidden from you because you’re using a different technology.

The meeting

The Sydney meeting was one of 200 similar events happening in 60 countries on the same day.

We started with a short introduction from each of the three sponsors:

  • Alan Noble, chief of engineering at Google Australia, was a debonair and skilled MC. His introduction was accompanied by a fair bit of wry humour, mentioning ‘notorious formatting incompatibilities, without naming specific software suites or operating systems’. He raised the simple question: Who owns the data? And he issued a challenge to software engineers to come up with a format that will be readable for the next 1000 years.
  • Holly Raiche from the Internet Society of Australia pointed out that part of the need is to educate and assist people who are concerned about interoperability. Many people are worried, but feel that they don’t know enough and are afraid of seeming foolish.
  • Sridhar Dhanapalan from the Sydney Linux Users Group said that open formats tie in with open source. He looked at the Magna Carta and the Domesday Book (low tech, but we can still read them centuries later) and compared them to the BBC Domesday Project, which put the book onto laser discs that were unreadable 16 years later.

The two main speakers of the evening were Kate Lundy and David Vaile. Both so different, and both so interesting.

Kate Lundy — Wow, what a dynamo! She is Senator for the Australian Capital Territory, and has a strong interest in information technology, the National Archives, open technology and the laws governing freedom of information. Her talk was short and pithy. One of the main points I got out of it is this (my synopsis, not a direct quote):

With the recent change in the Australian Federal Government comes a unique opportunity for creative change. We should seize this opportunity to promote a drive towards open standards — particularly within the government services themselves. The National Archives, for example, have a range of rules governing standards for document formats, such as metadata.

Kate drew a parallel between the government’s ‘New Federalism’ (breaking down boundaries and sharing responsibilities) and the drive for open standards. I’m not so sure I get the comparison, but it was great to hear her enthusiasm, commitment and ideas.

David Vaile — The enthusiastic self-professed devil’s advocate. He was determined not to mention the war, and mostly succeeded. David is executive director of the Cyberspace Law and Policy Centre at the University of New South Wales. His talk covered a wide range of information. He divided the topic into three areas:

  • Open content (public domain; creative commons and free for education licences; Google’s friendly acquisition and acquiescence; crown copyright etc; and hybrids of the above)
  • Open source and free/libre software
  • Open standards and formats

There are some disquieting ins and outs to all this. Here are just some of the things David raised, in his role as devil’s advocate. Should we consider standards as ‘legislation lite’? Look at the wars going on within big companies. Even the Microsoft-versus-the-world standards wars are not black and white. Yes, we did mention the war after all ;) Look at the process of defining the standards — does it work? Does the buzzword ‘open’ mask other interests? What does ‘free’ actually mean — does it include the rider ‘but within the boundaries laid down by me’? Are we in danger of moving towards extremism?

David urged us always to take a step back (actually, it was more like being gently shoved back) and be sceptical. He challenged us to think in terms of 500 years — especially for archives and such.

After the speeches, we held a short question time. I thought these two were the most interesting:

  • How can we participate in the Document Freedom initiative, moving on from this meeting? Kate Lundy answered that the online facilities were being set up.
  • How does cloud computing affect the openness of documents? There was quite a discussion around this point. Does the cloud get around the problem of obsolescence? Who owns the data? Bandwidth is costly, especially in some areas of the world. Governments, including the Australian government, have a risk aversion to technology — but this may not be a bad thing. We should look at a dual model, i.e. cloud plus local presences. The concepts of ‘control of data’ and ‘location of data’ should be separated because they actually have nothing to do with each other. (You may think you control your data if it’s on your own hard disk. But what happens when your disk crashes, or becomes obsolete?)

My conclusions

This was really interesting. I’m staying tuned.

One thing I’d like to get deeper into is the open formats themselves. At the meeting, we concentrated a fair bit on the content of documents (ownership and security) rather than the actual format. We did mention ODF and HTML. What about XML formats like DITA and DocBook? Are they mature enough for mainstream use? If not, what can we do to promote them? And why do WYSIWYG editing tools always seem to lag behind? Why is transformation between formats (e.g. with XSLT) so difficult? Is this stuff just for geeks? ;)

Update on 1 April: Check out this excellent comparison of DocBook and DITA by Teresa Mulvihill.

After the formal meeting, most people went on to dinner and informal networking. I had been awake since 01:30 that morning (yes, that’s just after midnight — and don’t ask why, because it’s nothing exciting) so I left. I guess I probably missed out on some really cool stuff. If anyone who was there reads this blog, please let me know what happened.