Blog Archives

Out of print: “Confluence, Tech Comm, Chocolate”

A few months ago, I asked my publisher to take my Confluence wiki book out of print. The book is titled “Confluence, Tech Comm, Chocolate: A wiki as platform extraordinaire for technical communication”. It takes a while for the going-out-of-print process to ripple across all the sources of the book, but by now it seems to have taken effect with most sellers.

Why did we decide to take the book out of print? I’m concerned that it no longer gives the best advice on how to use Confluence for technical documentation. The book appeared early in 2012, and applies to Confluence versions 3.5 to 4.1. While much of the content is still applicable, particularly in broad outline, it’s not up to date with the latest Confluence – now at version 5.6 and still moving fast. I thought about producing an updated edition of the book. But because I don’t use Confluence at the moment, I can’t craft creative solutions for using the wiki for technical documentation.

Here are some sources of information, for people who’re looking for advice on using Confluence for technical communication:

  • If you have a specific question, try posting it on Atlassian Answers, a community forum where plenty of knowledgeable folks hang out.
  • Some of the Atlassian Experts specialise in using Confluence for technical documentation. The Experts are partner companies who offer services and consultation on the Atlassian products. The company I’ve worked with most closely on the documentation side is K15t Software. I heartily recommend them for advice and for the add-ons they produce. For example, Scroll Versions adds sophisticated version control to a wiki-based documentation set.
  • AppFusions is another excellent company that provides Confluence add-ons of interest to technical communicators. For example, if you need to supply internationalised versions of your documentation, take a look at the AppFusions Translations Hub, which integrates Confluence with the Lingotek TMS platform.

A big and affectionate thank you to Richard Hamilton at XML Press, the publisher of the book. It’s been a privilege working with him, and a pleasure getting to know him in person.

For more details about the book that was, see the page about my books. If you have any questions, please do add a comment to this post and I’ll answer to the best of my knowledge or point you to another source of information.

The Making of “The Language of Content Strategy” (stc14)

This week I’m attending STC Summit 2014, the annual conference of the Society for Technical Communication. Where feasible, I’ll take notes from the sessions I attend, and share them on this blog. All credit goes to the presenters, and any mistakes are mine.

“The Language of Content Strategy” is a book by Scott Abel and Rahel Anne Bailie. In this session at STC Summit 2014, Scott Abel discussed the content strategy, tools and technologies behind the making of the book.

Problem statement

Time, money, skills and experience are in short supply. Hand-crafting content is expensive, time consuming and not scalable.

The demands of the audience are changing. People use social media, rather than going to a specific website to gather information.

To meet the demands of content delivery today, we need to adopt manufacturing principles. This is made possible by content engineering: The application of engineering discipline to the design and delivery of content.

Case study: The making of “The Language of Content Strategy”

In this session, Scott showed how he and Rahel created a book, an eBook, a website, and a set of learning materials from a single source, without breaking the bank. They did it by harnessing technology and crowdsourcing.

Scott talked about the differences in approach between technologists and editorialists. Conflict and time wasting arise because of a lack of a common language. Rahel and Scott wanted to craft a solution: A crowd-sourced book about content strategy that is both a case study in content engineering and a practical example of content marketing.

Setting up

The team started with careful analysis of the educational landscape, contributors, and more. Then they defined the content types they needed.

  • The smallest unit of content they would create would be a term and definition pair.
  • Another content type is an essay of 250 words.
  • Then there are contributor bios, statements of importance, and resources.

For the authoring environment, the team selected Atlassian Confluence. It’s a wiki with support for XML content re-use.

They also chose a gimmick: 52. The project included 52 terms, 52 definitions, by 52 experts, published over 52 weeks, and one of the output formats was 52 cards.

Then they selected a team of experts: the best and brightest in tangentially-related fields.

Other roles and responsibilities: markup specialist, editor, indexer, peer reviewers, and a graphic artist.

The source data

The source was authored in Confluence wiki. The content types are clearly labelled: Biography, importance statement, topic name, definition, etc.

The output

In the various output formats, the content is structured differently but still consists of the various topic types. For example, in the printed book every chapter is two pages long, and consistently structured. The eBook format is slightly different, as are the website format and the flash cards learning format.

Each Thursday, one chapter is automatically published. The web output also contains audio files, photos, and additional resources that are not contained in the book.

The advantages of a future-proofed content strategy

The team was able to add content after the fact, such as the audio files for accessibility. The content strategy was designed to future-proof the content, so the team was able to adjust to challenges and opportunities. And the strategy is repeatable. Now that it’s been done, it can be done again.

Scott told an amusing story of how he disobeyed his own rules, and tried to create another channel by copying and pasting instead of using the single-sourced content. A marketing person asked him to create a slide deck from the content. He was on a plane, without WiFi, so decided to do it by cutting and pasting. Needless to say, this didn’t work. By the end of the flight he had only 13 slides of the required 52, and had run out of laptop battery!

Cost

The cost of the project came in at under $10,000 USD.

  • Approximately $4,000 USD for graphic design, indexing, editing, markup assistance, audio tracks and hosting, the URL for the first year, and site hosting for a year.
  • Approximately $5,440 USD for book donations, postage, Adobe InDesign, Confluence Wiki, and overhead/administrative costs.

Scott’s promise

Scott finished by saying that if you want to undertake a similar project, ask him. He will try to help.

This was a fun and inspiring talk. Thanks Scott!

Moved – tips on Confluence editor and XML storage format

Graham Hannington’s advanced tips on the Confluence editor and XML storage format have moved to a new site: Advanced Confluence tips on the Knowledge Workers Wiki.

The pages were previously housed on the wiki associated with my book, Confluence, Tech Comm, Chocolate. That wiki is now shut down, but the tips live on!

What tips?

These are the tips currently available:

Thanks

Many thanks to Martin Cleaver at Blended Perspectives, for hosting this treasure trove of tips. And many thanks also to Graham Hannington, for all the work and insight he’s put into investigating and documenting the tips.

Looking for a Confluence wiki to play with while reading the book?

If you’re reading “Confluence, Tech Comm, Chocolate”, you may want a wiki to try out the techniques described in the book. For the first 18 months after publication, a Confluence, Tech Comm, Chocolate wiki site was available for readers to experiment with. That site is no longer available. If you like, you can get a free evaluation licence from Atlassian, to experiment with Confluence.

Early spring flowers in the Australian bush

Flowers from a recent walk in the Australian bush. Early spring.

How to manage attachment usage in Confluence wiki with some Python scripts

Do you need to find out whether the attachments on a Confluence wiki page are used anywhere in the space? Having discovered they’re not, do you want to delete them from the page? I’m hoping this post will help.

The Confluence user interface doesn’t offer the option to delete attachments in bulk. Nor does it offer any way of cross-referencing attachment usage. You can’t get a list of attachments and find out where they’re used. So, I’ve written four Python scripts that you can run consecutively to do the following:

  • Get a list of all attachments on a given page.
  • Get the content of all pages in a given space.
  • Produce two reports, one listing the attachments that are not referenced anywhere in the space, and the second showing the attachments that are referenced and the pages that use them.
  • Accept a list of attachment names and delete them from a given page.

Our use case

In the Confluence documentation we have a page called Space Attachments Directory. It’s been there for yonks. It has an enormous number of screenshots attached to it (396, to be precise). The page was created in 2005, with the aim of storing screenshots that can be re-used on various pages. A good aim in principle, but in practice unmanageable when applied across a large space maintained by many authors. Various technical writers over the years have either used or not used this page and its attachments.

As a result, we didn’t know how many of the attachments were actually used anywhere in the space. I suspected that only a few of the attachments were still in use.

Python to the rescue.

The scripts

The four Python scripts are available on Bitbucket. Please feel free to download and use them. If you have any suggestions for improvement, I’d love to hear them.

A friendly warning: These scripts are provided “as is” and without any guarantees. I developed them to solve a specific problem. I’m sharing them because I hope they will be useful to others too.

1. getConfluencePageAttachments.py: Gets all attachments on a given Confluence page. It puts the list of attachments into a text file, and prints a report of the number of attachments and total file size.
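For the curious, here’s a minimal sketch of what this first step involves. It is not the script on Bitbucket. It assumes the legacy Confluence XML-RPC remote API, with “login”, “getPage” and “getAttachments” available under the “confluence2” namespace; the function names, URL and credentials are hypothetical placeholders.

```python
import xmlrpc.client


def fetch_attachments(url, username, password, space_key, page_title):
    """Return the attachment records for one page.

    Assumes the legacy Confluence XML-RPC API is enabled and exposes
    login, getPage and getAttachments under the 'confluence2' namespace.
    """
    server = xmlrpc.client.ServerProxy(url)
    token = server.confluence2.login(username, password)
    page = server.confluence2.getPage(token, space_key, page_title)
    return server.confluence2.getAttachments(token, page["id"])


def summarise_attachments(attachments):
    """Return the attachment file names and their total size in bytes."""
    names = [attachment["fileName"] for attachment in attachments]
    # int() accepts the size whether the API sends it as a string or a number.
    total_size = sum(int(attachment["fileSize"]) for attachment in attachments)
    return names, total_size


def write_list(names, list_path):
    """Write one attachment file name per line to a text file."""
    with open(list_path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(name + "\n")
```

You might call fetch_attachments() with your own wiki URL, space key and page title, pass the result to summarise_attachments() to get the count and total size, and then save the names with write_list() for the later steps.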

2. getConfluencePageContent.py: Gets the content of all pages in a given Confluence space. It puts the content of each page into a separate text file, in a given directory. The content is in the form of the Confluence “storage format”, which is a type of XML consisting of HTML with Confluence-specific elements. A note for the curious: The “wherePageContent.py” script is a dummy, which simply tells you where to find getConfluencePageContent.py, which I wrote for a different purpose and which works well here too. (We need content re-use on Bitbucket!)
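Here’s a rough sketch of the same idea, again not the Bitbucket script itself. It assumes the legacy XML-RPC API, where “getPages” returns page summaries for a space and “getPage” called with a page ID returns the full page, including its storage format in the “content” field. The helper names are hypothetical.

```python
import os
import re
import xmlrpc.client


def safe_file_name(title):
    """Turn a page title into a file name that is valid on most file systems."""
    return re.sub(r'[\\/:*?"<>|]', "_", title) + ".txt"


def save_pages(pages, output_dir):
    """Write each (title, storage format) pair to its own text file."""
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for title, content in pages:
        path = os.path.join(output_dir, safe_file_name(title))
        with open(path, "w", encoding="utf-8") as out:
            out.write(content)
        written.append(path)
    return written


def fetch_space_content(url, username, password, space_key):
    """Yield (title, storage format) for every page in a space.

    Assumes the legacy XML-RPC API: getPages(token, spaceKey) returns page
    summaries and getPage(token, pageId) returns the page with 'content'.
    """
    server = xmlrpc.client.ServerProxy(url)
    token = server.confluence2.login(username, password)
    for summary in server.confluence2.getPages(token, space_key):
        page = server.confluence2.getPage(token, summary["id"])
        yield summary["title"], page["content"]
```

One design point to watch: two titles that differ only in the replaced characters would map to the same file name, so a fuller script might add the page ID to each file name.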

3. findAttachmentUsage.py: Reads a text file containing attachment file names, matches them against the source of Confluence pages, and produces a report on used and unused attachments.
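The matching step needs nothing but text files, so it’s easy to sketch. This version is an illustration, not the actual script: it assumes the page files sit in one flat directory and that attachment references appear in the storage format as ri:filename="…" attributes. The function names are hypothetical.

```python
import os


def load_names(list_path):
    """Read attachment file names, one per line, skipping blank lines."""
    with open(list_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def find_usage(attachment_names, page_dir):
    """Map each attachment name to the page files whose source references it."""
    usage = {name: [] for name in attachment_names}
    for file_name in sorted(os.listdir(page_dir)):
        path = os.path.join(page_dir, file_name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as f:
            source = f.read()
        for name in attachment_names:
            # Look for the full attribute so "a.png" doesn't match "data.png".
            # Names containing characters the storage format escapes (such as &)
            # would need escaping here too; this sketch ignores that case.
            if f'ri:filename="{name}"' in source:
                usage[name].append(file_name)
    return usage


def split_report(usage):
    """Return (sorted unused names, {used name: [page files]})."""
    unused = sorted(name for name, pages in usage.items() if not pages)
    used = {name: pages for name, pages in usage.items() if pages}
    return unused, used
```

The unused names make a ready-made input list for the deletion step, and the used names come with the pages that reference them.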

4. deleteAttachments.py: Reads a text file containing attachment file names, accepts a Confluence page name, and removes the given attachments from the page.
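A sketch of the deletion step might look like this. It is a hypothetical illustration, not the Bitbucket script. It assumes the legacy XML-RPC API offers “removeAttachment(token, contentId, fileName)” under “confluence2”, and that you’ve already looked up the page ID, for example from the “id” field returned by “getPage”.

```python
import xmlrpc.client


def read_names(list_path):
    """Read attachment file names, one per line, skipping blank lines."""
    with open(list_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def connect(url, username, password):
    """Return (server proxy, login token) for the Confluence XML-RPC API."""
    server = xmlrpc.client.ServerProxy(url)
    return server, server.confluence2.login(username, password)


def delete_attachments(server, token, page_id, names, dry_run=True):
    """Remove the named attachments from one page and return those removed.

    With dry_run=True, no API call is made and every name is reported as
    if it would be removed. Otherwise a name is reported only when
    removeAttachment (assumed from the legacy API) returns a true value.
    """
    removed = []
    for name in names:
        if dry_run or server.confluence2.removeAttachment(token, page_id, name):
            removed.append(name)
    return removed
```

Defaulting to a dry run is a deliberate choice in this sketch, so that you can check the list before anything is removed and then rerun with dry_run=False.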

Note: To run scripts 1, 2 and 4 successfully, you need access to Confluence, and the Confluence remote API must be enabled. Script 3 does all its work in text files. It’s like greased lightning. :)

So, in my use case, how many of the attachments are actually used?

71

That’s right. Of the 396 attachments on the “space attachments directory” page, only 71 are still in use. The other 325 are taking up space on our documentation wiki, taking up space in our XML exports, and slowing down our processes when we copy the Confluence documentation to the OnDemand space.

What’s next?

After some final testing, I’ll run the scripts on our production wiki next week. The first candidate is the Space Attachments Directory page. We’ll look at other pages that have a large number of attachments too.

The findAttachmentUsage.py script produces a cross-referenced list of matched attachments and the pages that reference them. We may use that cross-reference to decide whether we want to retain the “space attachments directory”. We may decide instead to move all the attachments to the pages where they’re used, and remove the shared page.
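If we do go down the route of moving attachments to the pages that use them, the cross-reference can be regrouped by page. Here’s a hypothetical sketch; the function name and the input shape (a dictionary mapping each attachment name to the pages that reference it) are my assumptions. It separates attachments used on a single page, which could move there, from those used on several pages, which would need a copy on each.

```python
def plan_moves(usage):
    """Group attachments by how many distinct pages reference them.

    usage maps attachment name -> list of page titles that reference it.
    Returns (moves, shared, unused):
      moves  -- {page title: [attachments used only on that page]}
      shared -- {attachment: [pages]} for attachments used on several pages
      unused -- sorted list of attachments that no page references
    """
    moves, shared, unused = {}, {}, []
    for name, pages in sorted(usage.items()):
        distinct = sorted(set(pages))
        if not distinct:
            unused.append(name)
        elif len(distinct) == 1:
            moves.setdefault(distinct[0], []).append(name)
        else:
            shared[name] = distinct
    return moves, shared, unused
```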

How to run the Python scripts

New to Python? It’s fun, and remarkably easy. This earlier post describes how to download and use Python: Confluence full-text search using Python and grep. There’s more about Python, and some interesting comments from readers, on this post: Python as a useful tool for technical writers.

Python as a useful tool for technical writers

Every now and then, and perhaps particularly so when working on a wiki, we technical writers need to manipulate our content in some way that’s not provided by our content management system. A few times recently, I’ve dabbled with Python to solve some problems. Do you often find the need to wrangle your content outside your CMS, and do you use Python or another scripting tool?

Python is a scripting language. It’s easy to learn, especially if you’ve done some programming in other languages. It’s just the ticket for data manipulation. It also offers a number of useful libraries. For example:

  • There are various libraries that you can use to access a web application via a SOAP or an XML-RPC remote API. I use the “xmlrpc.client” library in a few scripts, to get access to Confluence data.
  • The “os” library is useful for creating directories on the local file system of the computer you’re running on. For example, I use it to create a directory for the script’s output file.
  • The “re” library offers regular expression functions.
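To give a flavour of the “re” and “os” libraries, here’s a small sketch. It assumes that attachment references in the Confluence storage format carry the file name in a ri:filename attribute; the function names are hypothetical.

```python
import os
import re

# Attachment references in the storage format carry the file name in a
# ri:filename attribute, e.g. <ri:attachment ri:filename="screenshot.png" />
FILENAME_ATTR = re.compile(r'ri:filename="([^"]+)"')


def referenced_attachments(storage_format):
    """Return the distinct attachment file names referenced in a page, sorted."""
    return sorted(set(FILENAME_ATTR.findall(storage_format)))


def prepare_output_dir(path):
    """Create the output directory (and any parents) if it doesn't exist."""
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)
```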

A script to find duplicate page names across Confluence spaces

This was the first Python script that I wrote to wrangle Confluence data. I started with a specific problem: I had five text files, each containing a list of page names. These were the pages in five Confluence spaces that we needed to copy into another, single space. The problem is that Confluence does not allow duplicate page names within a space. So I needed to check my lists for matching page names.

I hacked together a Python script that checked for duplicate page names. The script reads a text file containing Confluence space keys and page names, and reports on duplicate page names. My first script used nested lists to store and compare the page names. A kind Atlassian developer reviewed the script and suggested I use a dictionary instead. So I did. A dictionary stores data in key-value pairs. Much neater!
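Here’s a minimal sketch of the dictionary approach, not the script from that post. It assumes each line of the input holds a space key and a page name separated by the first comma, such as “DOC,Installing the product”; that format and the function names are hypothetical.

```python
from collections import defaultdict


def find_duplicates(lines):
    """Return {page name: [space keys]} for names found in more than one space."""
    spaces_by_name = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and lines that don't match the assumed format.
        if "," not in line:
            continue
        # Split on the first comma only, since page names may contain commas.
        space_key, page_name = line.split(",", 1)
        spaces_by_name[page_name.strip()].append(space_key.strip())
    return {name: keys for name, keys in spaces_by_name.items() if len(keys) > 1}


def find_duplicates_in_file(path):
    """Run find_duplicates over a text file, one entry per line."""
    with open(path, encoding="utf-8") as f:
        return find_duplicates(f)
```

Keying the dictionary on the page name means each name is recorded once along with every space it appears in, instead of comparing every list against every other list.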

Then I thought: Some people may not have their page names in a handy text file. They may want to get a list of all pages in a Confluence space. So I wrote a script to get the names of all pages in a given set of Confluence spaces.

The details of the scripts are in this post: How to find duplicate page names across Confluence spaces.

A script to get the source code of all pages in a Confluence space, for a full-text search

The search functionality in the Confluence web interface will return results from the visible content of the page, but it cannot get inside the XML-like elements that make up the Confluence storage format. For example, it’s not possible to find all pages that reference a certain image. And you can’t search for macro parameter values. This means, for example, you can’t search for all pages that include content from a given page.

Just recently I wrote a script that gets the XML storage format of all pages in a given Confluence space, and puts the code into text files on your local machine. Then you can use a powerful full-text search tool such as grep to find what you need. The details are in this post: Confluence full-text search using Python and grep.
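If you’d rather stay in Python than switch to grep, here’s a rough sketch of the include-page search mentioned above. The markup it looks for is an assumption based on the storage format of the Include Page macro (an ac:structured-macro named “include” containing a ri:page reference), so check it against a page from your own wiki. The function names are hypothetical.

```python
import os
import re


def include_pattern(page_title):
    """Regex for an include macro that pulls in the given page (assumed markup)."""
    return re.compile(
        r'<ac:structured-macro[^>]*ac:name="include"[^>]*>'
        # Stay inside this macro: never cross its closing tag.
        r'(?:(?!</ac:structured-macro>).)*?'
        r'ri:content-title="' + re.escape(page_title) + r'"',
        re.DOTALL,
    )


def search_files(directory, pattern):
    """Return the sorted names of files in directory whose text matches pattern."""
    matches = []
    for file_name in sorted(os.listdir(directory)):
        path = os.path.join(directory, file_name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as f:
            if pattern.search(f.read()):
                matches.append(file_name)
    return matches
```

Titles containing characters that the storage format escapes, such as &, would need the same escaping before they go into the pattern.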

More on the way

I’m currently writing a couple more Python scripts to solve another problem. I’ll blog about it when I’ve finished.

Resources

If you’re interested in Python, here are some links you may find useful:

A chuckle, courtesy of the Python technical writers

From the Python documentation:

By the way, the language is named after the BBC show “Monty Python’s Flying Circus” and has nothing to do with reptiles. Making references to Monty Python skits in documentation is not only allowed, it is encouraged!

Probably not a python

Last week I was lucky enough to be in New Orleans in the USA. I went on a tour of the Honey Island swamp, and saw this snake coiled comfortably on a tree trunk. I’m not sure what type of snake it is. Maybe a Copperhead:

Python for technical writers

What do you use?

Do you often use Python or some other scripting tool to automate those pesky tasks your CMS can’t handle?
