Blog Archives

Python as a useful tool for technical writers

Every now and then, and perhaps particularly so when working on a wiki, we technical writers need to manipulate our content in some way that’s not provided by our content management system. A few times recently, I’ve dabbled with Python to solve some problems. Do you often find the need to wrangle your content outside your CMS, and do you use Python or another scripting tool?

Python is a scripting language. It’s easy to learn, especially if you’ve done some programming in other languages. It’s just the ticket for data manipulation. It also offers a number of useful libraries. For example:

  • There are various libraries that you can use to access a web application via a SOAP or an XML-RPC remote API. I use the “xmlrpc.client” library in a few scripts, to get access to Confluence data.
  • The “os” library is useful for working with the local file system, such as creating directories. For example, I use it to create a directory for the script’s output file.
  • The “re” library offers regular expression functions.
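To give a feel for those three libraries, here's a minimal sketch. The function names are made up for illustration, and the space-key check is only a rough assumption of mine, not Confluence's actual rule. The /rpc/xmlrpc path is where Confluence serves its XML-RPC API.

```python
import os
import re
import xmlrpc.client


def connect(base_url):
    """Return an XML-RPC proxy for a Confluence site (no request is sent yet)."""
    endpoint = base_url.rstrip("/") + "/rpc/xmlrpc"
    return xmlrpc.client.ServerProxy(endpoint)


def ensure_directory(path):
    """Create a directory for the script's output if it isn't there already."""
    os.makedirs(path, exist_ok=True)
    return path


def looks_like_space_key(text):
    """Rough check: letters and digits only (an assumption, for illustration)."""
    return re.fullmatch(r"[A-Za-z0-9]+", text) is not None
```

Creating the proxy doesn't contact the server. The first network request happens only when you call a remote method on it.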

A script to find duplicate page names across Confluence spaces

This was the first Python script that I wrote to wrangle Confluence data. I started with a specific problem: I had five text files, each containing a list of page names. These were the pages in five Confluence spaces that we needed to copy into another, single space. The problem is that Confluence does not allow duplicate page names within a space. So I needed to check my lists for matching page names.

I hacked together a Python script that checked for duplicate page names. The script reads a text file containing Confluence space keys and page names, and reports on duplicate page names. My first script used nested lists to store and compare the page names. A kind Atlassian developer reviewed the script and suggested I use a dictionary instead. So I did. A dictionary stores data in key-value pairs. Much neater!
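Here's a minimal sketch of the dictionary idea. It isn't the script itself. I'm assuming each input line holds a space key and a page name separated by a tab, and the function name is just for illustration.

```python
from collections import defaultdict


def find_duplicates(lines):
    """Map each page name found in more than one place to its space keys.

    Each line is assumed to look like 'SPACEKEY<tab>Page name'.
    """
    spaces_by_page = defaultdict(list)
    for line in lines:
        space_key, _, page_name = line.strip().partition("\t")
        if not page_name:
            continue  # skip blank lines and lines without a tab
        spaces_by_page[page_name.strip()].append(space_key.strip())
    # Keep only the page names that were seen more than once.
    return {name: keys for name, keys in spaces_by_page.items() if len(keys) > 1}
```

The page name is the dictionary key, and the value is the list of spaces where that name occurs. Any entry with more than one space key is a clash you'd need to resolve before copying.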

Then I thought: Some people may not have their page names in a handy text file. They may want to get a list of all pages in a Confluence space. So I wrote a script to get the names of all pages in a given set of Confluence spaces.
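As a rough sketch of how that can work, the Confluence XML-RPC API offers a getPages call that returns a summary of each page in a space. The function name below is my own, and the commented usage assumes a made-up site URL and credentials.

```python
import xmlrpc.client


def get_page_titles(server, token, space_keys):
    """Return {space_key: [page titles]} using the remote API's getPages call."""
    titles = {}
    for space_key in space_keys:
        pages = server.confluence2.getPages(token, space_key)
        titles[space_key] = [page["title"] for page in pages]
    return titles


# Hypothetical usage (needs a real site with the remote API enabled):
# server = xmlrpc.client.ServerProxy("https://my.confluence.com/rpc/xmlrpc")
# token = server.confluence2.login("username", "password")
# print(get_page_titles(server, token, ["DOC", "DEV"]))
```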

The details of the scripts are in this post: How to find duplicate page names across Confluence spaces.

A script to get the source code of all pages in a Confluence space, for a full-text search

The search functionality in the Confluence web interface will return results from the visible content of the page, but it cannot get inside the XML-like elements that make up the Confluence storage format. For example, it’s not possible to find all pages that reference a certain image. And you can’t search for macro parameter values. This means, for example, you can’t search for all pages that include content from a given page.

Just recently I wrote a script that gets the XML storage format of all pages in a given Confluence space, and puts the code into text files on your local machine. Then you can use a powerful full-text search tool like grep to find what you need. The details are in this post: Confluence full-text search using Python and grep.

More on the way

I’m currently writing a couple more Python scripts to solve another problem. I’ll blog about it when I’ve finished.

Resources

If you’re interested in Python, here are some links you may find useful:

A chuckle, courtesy of the Python technical writers

From the Python documentation:

By the way, the language is named after the BBC show “Monty Python’s Flying Circus” and has nothing to do with reptiles. Making references to Monty Python skits in documentation is not only allowed, it is encouraged!

Probably not a python

Last week I was lucky enough to be in New Orleans in the USA. I went on a tour of the Honey Island swamp, and saw this snake coiled comfortably on a tree trunk. I’m not sure what type of snake it is. Maybe a Copperhead:

[Photo: the snake coiled on a tree trunk in the Honey Island swamp]

What do you use?

Do you often use Python or some other scripting tool to automate those pesky tasks your CMS can’t handle?

Confluence full-text search using Python and grep

The standard search in Confluence wiki searches the visible content of the page. It also offers keywords for some specific searches, such as macro names and page titles. But sometimes we need to find things that the search cannot find, because the content of the relevant XML elements is not indexed. This post offers a solution of sorts: Copy the XML storage format of your pages into text files on your local machine, then use a powerful search like grep to do the work.

Here are some examples of the problem:

  • We may want to find all pages that reference a certain image, or other attachment. It’s easy enough to find the page(s) where the image is attached. But it’s not possible to find all pages that display a given image which is attached to another page.
  • It’s possible to search for all occurrences of a macro name, using the macroName: keyword in the search. But it’s not possible to search for parameter values. This means, for example, you can’t search for all pages that include content from a given page.

I’ve written a script to solve the problem, by downloading the storage format from Confluence onto your local machine, where you can use all sorts of powerful text searches. You’re welcome to use the script, with the proviso that it’s not perfect.

Python script: getConfluencePageContent

The script is in a repository on Bitbucket: https://bitbucket.org/sarahmaddox/confluence-full-text-search.

Note: To run the script successfully, you need access to Confluence, and the Confluence remote API must be enabled.

Installing Python

To run the script, you need to install Python. The scripts are designed for Python 3, not Python 2. There were fairly significant changes in Python 3.

  1. Download Python 3.2.3 or later: http://www.python.org/getit/
    (I downloaded python-3.2.3.amd64.msi, because I’m working on a 64-bit Windows machine.)
  2. Run the installer to install Python on your computer.
    (I left all the options at their default values.)
  3. Add the location of your Python installation to your path variable in Windows:
    1. Go to ‘Start’ > ‘Control Panel’ > ‘System’ > ‘Advanced system settings’
    2. Click ‘Environment Variables’.
    3. In the ‘System variables’ section, select ‘Path’.
    4. Click ‘Edit’.
    5. Add the following to the end of the path, assuming that you installed Python in the default location:
      ;C:\Python32
    6. Click ‘OK’ three times.
    7. Open a command window and type ‘python’ to see if all is OK. You should see something like this:

[Screenshot: the Python prompt in a Windows command window]

Getting the script

Go to the Bitbucket repository and choose ‘Downloads’ > ‘Branches’, then download the zip file and unzip it into a directory on your computer.

Running the script to get the content of your pages

To use the getConfluencePageContent script:

  1. Enable the remote API (XML-RPC & SOAP) on your Confluence site.
  2. Open the getConfluencePageContent script in Python’s ‘IDLE’ GUI. (Right-click on the script and choose ‘Edit with IDLE’.)
  3. Run the script from within IDLE. (Press F5.)
  4. The Python shell will open and prompt you for some information:
    • Confluence URL – The base URL of your Confluence site. If the site uses SSL, enter ‘HTTPS’ instead of ‘HTTP’. For example: https://my.confluence.com
    • Username – Confluence will use this username to access the pages. This username must have ‘view’ access to all the spaces and pages that you want to check.
    • Password – The password for the above username.
    • Space key – A Confluence space key. Case is not important – the match is not case-sensitive.
    • Output directory name – The directory where the script should put its results. The script will create this directory. Make sure it does not yet exist.
  5. Look for the output directory as a sibling of the directory that contains the getConfluencePageContent script. In other words, the output directory will appear in your file system at the same level as the script’s directory.

[Screenshot: Python shell (IDLE)]

Output of the script

The Bitbucket repository contains an example of the output, based on the Demonstration space shipped with Confluence. See the outputexample directory in the repository. For example, this file contains the content of the page titled ‘Welcome to Confluence’.

The script gets the content of all pages in the given Confluence space. It puts the content of each page into a separate text file, in a given directory.

The script creates the output directory as a sibling of the directory that contains the getConfluencePageContent script. In other words, the output directory will appear in your file system at the same level as the script’s directory.
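A minimal sketch of that sibling-directory logic, using the “os” library. The function name is hypothetical and the real script may do this differently.

```python
import os


def create_output_dir(script_path, dir_name):
    """Create dir_name next to the directory that holds the script.

    Raises FileExistsError (a kind of OSError) if it already exists.
    """
    script_dir = os.path.dirname(os.path.abspath(script_path))
    parent_dir = os.path.dirname(script_dir)
    output_dir = os.path.join(parent_dir, dir_name)
    os.mkdir(output_dir)  # fails on purpose if the directory is already there
    return output_dir
```

Using os.mkdir rather than os.makedirs with exist_ok=True means a second run with the same directory name stops with an error instead of mixing new files in with old ones.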

The file name is a combination of the page name and page ID. To prevent problems when creating the files, the script removes all non-alphanumeric characters from the file name. To ensure uniqueness, it appends the page ID to the page name when creating the file name.
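Something along these lines would produce such a file name. This is a sketch only: the underscore separator, the .txt extension, and the function name are my assumptions here, not necessarily what the script uses.

```python
import re


def make_file_name(page_title, page_id):
    """Build a safe file name: alphanumeric title characters plus the page ID."""
    safe_title = re.sub(r"[^A-Za-z0-9]", "", page_title)
    return "{0}_{1}.txt".format(safe_title, page_id)
```

For example, make_file_name("Welcome to Confluence", "65537") gives "WelcometoConfluence_65537.txt". Note that this simple pattern also strips accented and other non-ASCII letters, so two differently named pages could end up with the same stripped title. The page ID keeps the file names unique anyway.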

The content is in the form of the Confluence storage format, which is a type of XML consisting of HTML with Confluence-specific elements. (Docs.)

The script also writes a line at the top of each file, containing the URL of the page, and marked with asterisks for easy grepping.
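Writing a file in that shape could look like the sketch below. The exact layout of the asterisk line is an assumption on my part, as is the function name.

```python
def write_page_file(path, page_url, storage_xml):
    """Write one page to a text file, with a starred URL line at the top."""
    with open(path, "w", encoding="utf-8") as output_file:
        # Assumed header layout: the page URL wrapped in asterisks.
        output_file.write("**** " + page_url + " ****\n")
        output_file.write(storage_xml)
```

With a header like this, a search for lines starting with asterisks pulls out the URL of every page in the output directory.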

Notes:

  • The script will show an error if the output directory already exists.
  • If you see the following error message, you need to enable the remote API (XML-RPC & SOAP) on your Confluence site: xmlrpc.client.ProtocolError: <ProtocolError for localhost:8090/rpc/xmlrpc: 403 Forbidden>
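If you'd like a friendlier message than the raw traceback, you could catch the error yourself. Here's a small sketch. The helper name is hypothetical, and the commented usage assumes you already have a server proxy and credentials.

```python
import xmlrpc.client


def explain_protocol_error(error):
    """Turn an xmlrpc.client.ProtocolError into a friendlier hint."""
    if error.errcode == 403:
        return ("403 Forbidden from {0}: enable the remote API "
                "(XML-RPC & SOAP) on your Confluence site.".format(error.url))
    return "HTTP {0} {1} from {2}".format(error.errcode, error.errmsg, error.url)


# Typical use around a remote call:
# try:
#     token = server.confluence2.login(username, password)
# except xmlrpc.client.ProtocolError as error:
#     print(explain_protocol_error(error))
```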

Grep and grepWin

Now that you have the page content in text form, the world’s your oyster. :) You can use the full power of text search tools. If you’re on UNIX, you’ll already know about grep.

If you’re on Windows, let me introduce grepWin. It’s a free, powerful search tool that you can install on Windows. It offers regular expression (regexp) searches as well as standard searches, and it has a very nice UI (user interface).
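If you'd rather stay in Python, the “re” library can do a basic grep-style search over the output files. This is a minimal sketch with a made-up function name. It assumes the files are UTF-8 text in a single directory.

```python
import os
import re


def find_in_files(directory, pattern):
    """Yield (file name, line number, line) for each line matching the regex."""
    regex = re.compile(pattern)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as text_file:
            for number, line in enumerate(text_file, start=1):
                if regex.search(line):
                    yield name, number, line.rstrip("\n")
```

To look for an image file name, wrap it in re.escape() first so that the dots in the name are matched literally rather than as regular expression wildcards.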

This screenshot shows a search for an image called ‘step-2-image-1-confluence-demo-space.png’. The image is attached to one page, and referenced in two pages. QED. :D

[Screenshot: grepWin]

Comments welcome!

I’d love to know if you think you’ll find the script useful, and if you have any ideas for improving it.

Doc sprints at STC Summit 2013 #stc2013

STC Summit 2013 is fast approaching. I’m looking forward to getting the latest gen on all things #techcomm, meeting old friends, and making new acquaintances. I’ll also be giving a presentation on doc sprints!

Update on Wednesday 7 May 2013: The report on the actual presentation is now available: http://ffeathers.wordpress.com/2013/05/08/doc-sprints-at-stc-summit-2013-the-presentation/

A doc sprint is similar to a book sprint. It’s an event where a group of people get together for a couple of days and write tutorials, or a book, or other forms of documentation. Often there’s coding involved too. And always, plenty of fun, making new contacts, and learning cool new technologies.

Doc Sprints: The Ultimate in Collaborative Document Development

My presentation is called Doc Sprints: The Ultimate in Collaborative Document Development. It’s full of information about planning and running a doc sprint, and how doc sprints are useful in developing the documentation our readers need.

Even more exciting: there are a number of stories and tips, gleaned from doc sprinters around the world. Thanks to Anne Gentle, Swapnil Ogale, Ellis Pratt, Katya Stepalina, Andreas Spall, Jay Meissner, and Peter Lubbers, for contributing their ideas!

The presentation covers these topics:

  • Introduction to doc sprints, agile environments, and why a doc sprint is a good fit for technical documentation.
  • Who to invite, when to start, and how to ensure that the sprint will produce the documents you need.
  • How to get the best out of the sprinters.
  • Collaborative tools for use during the sprint.
  • Sprinting across the world: Handling multiple time zones, early sprinters, late sprinters.
  • How to run a retrospective, and why.
  • Reviewing and publishing the documents, and writing up the results.
  • Other innovative types of sprints for documentation teams.

Here’s what the presentation looked like a few weeks ago:

[Image: the doc sprints presentation, a few weeks before STC Summit 2013]

Come to my session at STC Summit 2013 to see how it’s turned out. :)

Getting documentation feedback via customer forums – a story of UX and UA

I spend a few minutes each day trawling our online question-and-answer forum, answering questions when I can, and keeping an eye out for posts directly related to the documentation. This paid dividends yesterday when a customer asked where he can download the offline version of our documentation. After giving him the link, I delved a little deeper into his reasons for preferring the offline to the online version. It’s an enlightening discussion.

Kevin’s primary requirement was the link to the downloadable documentation. His question is therefore titled, Offline Confluence Documentation. I gave him the link. That was easy.

But the forum post also explains why Kevin wants the offline documentation. He mentions the fact that the online documentation was unavailable when he needed it. We did indeed have several problems with the server, now fixed.

It was this bit of Kevin’s post that caught my attention:

(Also the documentation has gotten much harder to use for experienced users because we need to wade through pages of fluff before we get to content found in the old user manuals right on the top level).

He had put that bit in parentheses, almost as if it’s not so relevant. That in itself is a worry for us as technical writers. We don’t want customers feeling that there’s no way of getting the documentation improved or getting their voices heard.

Also very interesting is the fact that Kevin describes himself as an experienced user. He knows the product (Confluence wiki) and he therefore also has an expectation of how the Confluence-based documentation will work. He wanted a quick fix for a problem (how to recover a deleted page) and was frustrated enough to resort to PDF to find it!

So I asked Kevin if he’d be kind enough to give more details about why the documentation has become harder to use.

His response was awesome. He described his troubled workflow in detail, giving us technical writers an excellent insight into how an experienced user is navigating through our documentation. If you’re interested in the details, take a look at this comment and the subsequent discussions.

It’s great when people take the time to respond like this. It shows a high level of commitment to the product and the various types of help that we offer, including the documentation and the forum. It also shows how willing people are to help each other. Thanks so much, Kevin!

Want an XML schema viewer in Confluence wiki?

You got it. :) Avisi have developed two nifty macros to display an XML schema (XSD) in tabular and graphic format on a Confluence page. The XSD Viewer is a new add-on for Confluence wiki, and the Avisi developers are keen for input from technical writers and others interested in XML schemas.

I’ve been playing around with the add-on, so I’d love to show you a couple of examples and tell you how to get it working for yourself. I’ve also chatted with Yanne from Avisi, who says that he and his team would love to have your feedback.

Example 1:  A purchase order schema

I’ve grabbed the sample schema for a purchase order from MSDN: http://msdn.microsoft.com/en-us/library/ms256129.aspx. I’ve instructed the XSD viewer to start with the purchaseOrder element, and show a depth of 2 levels.

[Screenshot: XSD Viewer output for the purchase order schema]

Example 2: Graham Hannington’s schema for the Confluence storage format

Hehe, if you put Confluence and XSD in the same blog post, then ‘twould be remiss not to include Graham’s XML schema for the Confluence storage format. :D

The XSD Viewer is using confluence.xsd, starting with the image element.

[Screenshot: XSD Viewer output for confluence.xsd, starting with the image element]

One point of interest here is that the confluence.xsd file references two other schema files: confluence-ri.xsd and confluence-xhtml.xsd. All I had to do to make this work was attach all three XSD files to the page. This screenshot shows the attachments on the above page:

[Screenshot: the three XSD files attached to the page]
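If you're not sure which other files a schema depends on, a few lines of Python can list them. This sketch uses the standard XML Schema elements (include, import and redefine) and their schemaLocation attribute. The function name is made up, and I haven't checked which of those elements confluence.xsd itself uses.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema language itself.
XS = "{http://www.w3.org/2001/XMLSchema}"


def referenced_schemas(xsd_text):
    """Return schemaLocation values from xs:include, xs:import and xs:redefine."""
    root = ET.fromstring(xsd_text)
    locations = []
    for tag in ("include", "import", "redefine"):
        for element in root.findall(XS + tag):
            location = element.get("schemaLocation")
            if location:
                locations.append(location)
    return locations
```

Read the XSD file into a string, pass it to the function, and attach every file it lists alongside the main schema.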

Hiccups

A couple of times, the XSD Viewer has declined to show any rows in the table. I’m not sure why this occurs. If it happens to you too, it’s worth letting the Avisi team know.

My environment

I’m using Confluence 5.0.1, with version 1.1.1 of the XSD Viewer. I’m running Confluence on my Windows 7 laptop, and I’m using Chrome to view the wiki pages.

How to get your own XSD viewer

To make this happen, you need to do the following:

  1. Download and install Confluence, if you don’t already have it. You can try it for free for 30 days. See the Confluence download page.
  2. Download the XSD Viewer add-on and install it into Confluence. The add-on is also available for free for 30 days. See the XSD Viewer page on the Atlassian Marketplace.
  3. Create a page in Confluence.
  4. Attach your XSD file to the Confluence page, just as you would attach a screenshot or other file. See the documentation on adding attachments.
  5. Edit the page.
  6. Add the “XSD Image” and/or the “XSD Table” macros to the page. See the documentation for the XSD Viewer.
  7. Save the page.

Resources

Useful links:

Feedback so far

I’ve given Yanne at Avisi some feedback already:

  • At first the error messages were a bit too generic to be useful. Avisi have already followed up on this in the latest version of the add-on, which gives more specific error messages. Great!
  • Currently the macro autocomplete in Confluence is triggered by “XSD”. Suggestion: Add “schema” and “XML” to the list of triggers.
  • Add the option to add a border and other styling to the image.

The Avisi team like the latter two suggestions, and are waiting for more feedback before implementing them. Would you be interested in an XSD viewer in Confluence, and what requirements would you have for it?
