Category Archives: Confluence
A few months ago, I asked my publisher to take my Confluence wiki book out of print. The book is titled “Confluence, Tech Comm, Chocolate: A wiki as platform extraordinaire for technical communication”. It takes a while for the going-out-of-print process to ripple across all the sources of the book, but by now it seems to have taken effect in most sellers.
Why did we decide to take the book out of print? I’m concerned that it no longer gives the best advice on how to use Confluence for technical documentation. The book appeared early in 2012, and applies to Confluence versions 3.5 to 4.1. While much of the content is still applicable, particularly in broad outline, it’s not up to date with the latest Confluence – now at version 5.6 and still moving fast. I thought about producing an updated edition of the book. But because I don’t use Confluence at the moment, I can’t craft creative solutions for using the wiki for technical documentation.
Here are some sources of information, for people who’re looking for advice on using Confluence for technical communication:
- If you have a specific question, try posting it on Atlassian Answers, a community forum where plenty of knowledgeable folks hang out.
- Some of the Atlassian Experts specialise in using Confluence for technical documentation. The Experts are partner companies who offer services and consultation on the Atlassian products. The company I’ve worked with most closely on the documentation side, is K15t Software. I heartily recommend them for advice and for the add-ons they produce. For example, Scroll Versions adds sophisticated version control to a wiki-based documentation set.
- AppFusions is another excellent company that provides Confluence add-ons of interest to technical communicators. For example, if you need to supply internationalised versions of your documentation, take a look at the AppFusions Translations Hub which integrates Confluence with the Lingotek TMS platform.
A big and affectionate thank you to Richard Hamilton at XML Press, the publisher of the book. It’s been a privilege working with him, and a pleasure getting to know him in person.
For more details about the book that was, see the page about my books. If you have any questions, please do add a comment to this post and I’ll answer to the best of my knowledge or point you to another source of information.
The pages were previously housed on the wiki associated with my book, Confluence, Tech Comm, Chocolate. That wiki is now shut down, but the tips live on!
These are the tips currently available:
- Bypassing the Confluence rich text editor interface
- Converting Confluence rich text editor content to wiki markup
- Converting Confluence storage format XML to wiki markup
- Editing Confluence pages in an external validating XML editor
- Validating Confluence XML storage format
- Showing tag names and some attributes in the Confluence rich text editor
- Promoting heading levels
- Searching Confluence storage format XML for content or markup
- Finding duplicate page names in Confluence
- Adding custom toolbar buttons to the Confluence rich text editor with Greasemonkey
Many thanks to Martin Cleaver at Blended Perspectives, for hosting this treasure trove of tips. And many thanks also to Graham Hannington, for all the work and insight he’s put into investigating and documenting the tips.
Looking for a Confluence wiki to play with while reading the book?
If you’re reading, Confluence, Tech Comm, Chocolate, you may want a wiki to try out the techniques described in the book. For the first 18 months after publication, a Confluence, Tech Comm, Chocolate wiki site was available for readers to experiment with. That site is no longer available. If you like, you can get a free evaluation licence from Atlassian, to experiment with Confluence.
Flowers from a recent walk in the Australian bush. Early spring.
Every now and then, and perhaps particularly so when working on a wiki, we technical writers need to manipulate our content in some way that’s not provided by our content management system. A few times recently, I’ve dabbled with Python to solve some problems. Do you often find the need to wrangle your content outside your CMS, and do you use Python or another scripting tool?
Python is a scripting language. It’s easy to learn, especially if you’ve done some programming in other languages. It’s just the ticket for data manipulation. It also offers a number of useful libraries. For example:
- There are various libraries that you can use to access a web application via a SOAP or an XML-RPC remote API. I use the “xmlrpc.client” library in a few scripts, to get access to Confluence data.
- The “os” library is useful for creating directories on the local file system of the computer you’re running on. For example, I use it to create a directory for the script’s output file.
- The “re” library offers regular expression functions.
A script to find duplicate page names across Confluence spaces
This was the first Python script that I wrote to wrangle Confluence data. I started with a specific problem: I had five text files, each containing a list of page names. These were the pages in five Confluence spaces, that we needed to copy into another, single space. The problem is that Confluence does not allow duplicate page names within a space. So I needed to check my lists for matching page names.
I hacked together a Python script that checked for duplicate page names. The script reads a text file containing Confluence space keys and page names, and reports on duplicate page names. My first script used nested lists to store and compare the page names. A kind Atlassian developer reviewed the script and suggested I use a dictionary instead. So I did. A dictionary stores data in key-value pairs. Much neater!
Then I thought: Some people may not have their page names in a handy text file. They may want to get a list of all pages in a Confluence space. So I wrote a script to get the names of all pages in a given set of Confluence spaces.
The details of the scripts are in this post: How to find duplicate page names across Confluence spaces.
A script to get the source code of all pages in a Confluence space, for a full-text search
The search functionality in the Confluence web interface will return results from the visible content of the page, but it cannot get inside the XML-like elements that make up the Confluence storage format. For example, it’s not possible to find all pages that reference a certain image. And you can’t search for macro parameter values. This means, for example, you can’t search for all pages that include content from a given page.
Just recently I wrote a script that gets the XML storage format of all pages in a given Confluence space, and puts the code into text files on your local machine. Then you can use a powerful full-text search like grep, to find what you need. The details are in this post: Confluence full-text search using Python and grep
More on the way
I’m currently writing a couple more Python scripts to solve another problem. I’ll blog about it when I’ve finished.
If you’re interesting in Python, here are some links you many find useful:
A chuckle, courtesy of the Python technical writers
From the Python documentation:
By the way, the language is named after the BBC show “Monty Python’s Flying Circus” and has nothing to do with reptiles. Making references to Monty Python skits in documentation is not only allowed, it is encouraged!
Probably not a python
Last week I was lucky enough to be in New Orleans in the USA. I went on a tour of the Honey Island swamp, and saw this snake coiled comfortably on a tree trunk. I’m not sure what type of snake it is. Maybe a Copperhead:
What do you use?
Do you often use Python or some other scripting tool to automate those pesky tasks your CMS can’t handle?
The standard search in Confluence wiki searches the visible content of the page. It also offers keywords for some specific searches, such as macro names and page titles. But sometimes we need to find things that the search cannot find, because the content of the relevant XML elements is not indexed. This post offers a solution of sorts: Copy the XML storage format of your pages into text files on your local machine, then use a powerful search like
grep to do the work.
Here are some examples of the problem:
- We may want to find all pages that reference a certain image, or other attachment. It’s easy enough to find the page(s) where the image is attached. But it’s not possible to find all pages that display a given image which is attached to another page.
- It’s possible to search for all occurrences of a macro name, using the
macroName:keyword in the search. But it’s not possible to search for parameter values. This means, for example, you can’t search for all pages that include content from a given page.
I’ve written a script to solve the problem, by downloading the storage format from Confluence onto your local machine, where you can use all sorts of powerful text searches. You’re welcome to use the script, with the proviso that it’s not perfect.
The script is in a repository on Bitbucket: https://bitbucket.org/sarahmaddox/confluence-full-text-search.
Note: To run the script successfully, you need access to Confluence, and the Confluence remote API must be enabled.
To run the script, you need to install Python. The scripts are designed for Python 3, not Python 2. There were fairly significant changes in Python 3.
- Download Python 3.2.3 or later: http://www.python.org/getit/
python-3.2.3.amd64.msi, because I’m working on a 64-bit Windows machine.)
- Run the installer to install Python on your computer.
(I left all the options at their default values.)
- Add the location of your Python installation to your path variable in Windows:
- Go to ‘Start’ > ‘Control Panel’ > ‘System’ > ‘Advanced system settings’
- Click ‘Environment Variables’.
- In the ‘System variables’ section, select ‘Path’.
- Click ‘Edit’.
- Add the following to the end of the path, assuming that you installed Python in the default location:
- Click ‘OK’ three times.
- Open a command window and type ‘python’ to see if all is OK. You should see something like this:
Getting the script
Go to the Bitbucket repository and choose ‘Downloads’ > ‘Branches’, then download the zip file and unzip it into a directory on your computer.
Running the script to get the content of your pages
To use the
- Enable the remote API (XML-RPC & SOAP) on your Confluence site.
- Open the
getConfluencePageContentscript in Python’s ‘IDLE’ GUI. (Right-click on the script and choose ‘Edit with IDLE’.)
- Run the script from within IDLE. (Press F5.)
- The Python shell will open and prompt you for some information:
- Confluence URL – The base URL of your Confluence site. If the site uses SSL, enter ‘HTTPS’ instead of ‘HTTP’. For example:
- Username – Confluence will use this username to access the pages. This username must have ‘view’ access to all the spaces and pages that you want to check.
- Password – The password for the above username.
- Space key – A Confluence space key. Case is not important – the match is not case-sensitive.
- Output directory name – The directory where the script should put its results. The script will create this directory. Make sure it does not yet exist.
- Confluence URL – The base URL of your Confluence site. If the site uses SSL, enter ‘HTTPS’ instead of ‘HTTP’. For example:
- Look for the output directory as a sibling of the directory that contains the
getConfluencePageContentscript. In other words, the output directory will appear in your file system at the same level as the script’s directory.
Output of the script
The Bitbucket repository contains an example of the output, based on the Demonstration space shipped with Confluence. See the
outputexample directory in the repository. For example, this file contains the content of the page titled ‘Welcome to Confluence’.
The script gets the content of all pages in the given Confluence space. It puts the content of each page into a separate text file, in a given directory.
The script creates the output directory as a sibling of the directory that contains the
getConfluencePageContent script. In other words, the output directory will appear in your file system at the same level as the script’s directory.
The file name is a combination of the page name and page ID. To prevent problems when creating the files, the script removes all non-alphanumeric characters from the file name. To ensure uniqueness, it appends the page ID to the page name when creating the file name.
The content is in the form of the Confluence storage format, which is a type of XML consisting of HTML with Confluence-specific elements. (Docs.)
The script also writes a line at the top of each file, containing the URL of the page, and marked with asterisks for easy grepping.
- The script will show an error if the output directory already exists.
- If you see the following error message, you need to enable the remote API (XML-RPC & SOAP) on your Confluence site: xmlrpc.client.ProtocolError: <ProtocolError for localhost:8090/rpc/xmlrpc: 403 Forbidden>
Grep and winGrep
Now that you have the page content in text form, the world’s your oyster. :) You can use the full power of text search tools. If you’re on UNIX, you’ll already know about grep.
If you’re on Windows, let me introduce grepWin. It’s a free, powerful search tool that you can install on Windows. It offers regular expression (regexp) searches as well as standard searches, and it has a very nice UI (user interface).
This screenshot shows a search for an image called ‘step-2-image-1-confluence-demo-space.png’. The image is attached to one page, and referenced in two pages. QED. :D
I’d love to know if you think you’ll find the script useful, and if you have any ideas for improving it.