Flexible content and future-ready organisations at STC Summit 2013

This week I’m attending STC Summit 2013, the annual conference of the Society for Technical Communication. I’ll blog about the sessions I attend, and give you some links to other news I hear about too. You’ll find my posts under the tag stc13 on this blog.

I love Sara Wachter-Boettcher’s bio:

Content strategist, writer, thinker, cocktail drinker. My name eats character limits for breakfast. Chomp.

Sara’s presentation is titled Flexible Content Demands Future-Ready Organizations. She talked about mobile users, mobile content, and the world of responsive sites, apps, APIs, and read-later services. Structured content is the way to give customers what they need. Producing structured content isn’t just about getting the right CMS. It affects the entire organisation.

Most of what Sara works on is web-based. But the issues and challenges are shared in other tech comm areas too, as are the skills needed to address them.

Introduction

Sara quoted some web designers who are realising that their customers are concerned with content problems. The challenges that mobile has created have made people in the web field realise that content is a problem that needs attention. Designers are realising they only have one “known” to work with, and that’s content. All the other aspects of mobile design, basically the presentational aspects, are fluid.

But what do web designers actually know about content? Luckily, they have content strategists to help them. Content strategists talk about content as a strategic business asset for the organisation. They define how content supports the customers and the business, and set the benchmarks and measurements for success.

Strategies:

  • Organise your content.
  • Define what you want to sound like.
  • Devise plans for creating the content you need. Templates. Ways of getting information out of SMEs.

OK, so we have a lot of emphasis on content and how it matters for mobile.

But it’s still really really difficult

How do we implement our strategies so that we can get content that’s more flexible? We don’t want our readers being told the content is not available on a mobile device. Nor do we want them to see something that looks awful, or is even illegible, because it was not designed for a mobile device.

Look at read-it-later applications. They are not solely devoted to mobile, but most people use such services on mobile devices. One example is Instapaper, which allows you to save content for reading at a later date. We don’t want our content to be missing when people save it for later reading.

We also don’t want our mobile interface to override the user’s choice by giving them a simplified view of the web app they were looking for. Sara gave us the example when she Googled something, got the useful link, and was directed to a simplified mobile overlay of the site. The URL she had found was lost to her. The content had become useless because she couldn’t get at it.

Why is it difficult?

Because of the content. It’s difficult getting the content ready for mobile. Sara referred us to an article called “The Story of the New Microsoft.com” by Nishant Kothary.

Looking at the redesign of a web page to cater for mobile platforms, you could say, “This is a new home page”. Or you could say, “This is a huge change to the way the organisation works”. Sara says the latter is the case, and that’s why mobile is such a problem.

Our content is stuck

Sara often hears people saying, “Just stick it up on the website”. You’ll end up with a website that looks more or less like Seattle’s chewing-gum wall. A lot of content that’s just stuck there. No reason. No way to move it.

We tend to create content by putting it in a big box, to fit on a big web page. That’s what our CMSes are designed to do.

If you’re building things for just one platform, or for one platform at a time, you’re going to keep on getting stuck. Because more and more devices will keep appearing. You’ll need separate strategies for each one. Sara mentioned the possibility of our content having to appear in a car, or on a web-enabled fridge. Trying to make more content for every new device is a losing battle.

Single sourcing

We have the tools to make our content more adaptable. This idea is fairly well known in tech comm, but not in the wider world of web development.

DITA has a number of great features.

NPR has a great system called COPE (Create Once Publish Everywhere). They have everything in a custom-built central content management system, with no presentational information. Then the content goes via an API to all the various websites, apps, mobile browsers, and so on.

Content like water

Web designers have started saying we need content that can flow, and fill whatever container size it’s given. But content doesn’t magically flow. It’s messy!

Sara showed us an example of an API diagram for NPR’s COPE. It illustrates beautifully that it’s complicated to get the content from content entry, via the APIs, to the end destination.

And it all starts with the data entry. Writing stuff in chunks, so they can be stored separately, is the most flexible way of doing it. It allows you to make separate decisions about each chunk.

Content models

Next you need to think about how content is connected – the relationships between the chunks of information. Create a model of your content. Compare this to a database model that defines how data is interrelated.

Data is also content. When data people design the data model, they tend to forget this. Decisions that ultimately affect how our content reads are often made by people who aren’t trained or empowered to think about content.

The structure of the content must be valuable to people, and must reflect the way people think. You can’t just break it up into parts arbitrarily.

So, find the patterns

We must analyse our existing content to find patterns. Pages are not all the same. They support meaning.

From the patterns, you can identify content types. For example: blog posts, press releases, “how to” guides, and so on.

Each content type has basic structural elements. Sara gave the example of a recipe, which has a title, ingredients, instructions, and even metadata. Decide the inherent elements required in each content type for your organisation.

Then decide how the content types fit together. For example, a recipe may be just one of many for a particular dish, which may be part of a cuisine.
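To make the recipe example concrete, here is a minimal sketch of such a content model in Python. It is only an illustration, not something Sara showed: the class names, field names, and sample recipe are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """One content type, split into the elements listed in the talk."""
    title: str
    ingredients: list[str]
    instructions: list[str]
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Dish:
    """A dish groups the recipes that are variations on it."""
    name: str
    recipes: list[Recipe] = field(default_factory=list)


@dataclass
class Cuisine:
    """A cuisine groups dishes."""
    name: str
    dishes: list[Dish] = field(default_factory=list)


def titles_in_cuisine(cuisine: Cuisine) -> list[str]:
    """Walk the relationships to list every recipe title in a cuisine."""
    return [recipe.title for dish in cuisine.dishes for recipe in dish.recipes]


# Hypothetical sample content, invented for this sketch.
pancakes = Recipe(
    title="Buttermilk pancakes",
    ingredients=["flour", "buttermilk", "egg"],
    instructions=["Mix the batter.", "Fry in a hot pan."],
    metadata={"prep_time": "15 minutes"},
)
american = Cuisine(name="American", dishes=[Dish(name="Pancakes", recipes=[pancakes])])
print(titles_in_cuisine(american))
```

Because each element is stored separately, a question like "which recipes belong to this cuisine?" becomes a walk over relationships rather than a search through page text.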

Adaptive content

Adaptive content automatically adjusts to its environment. If you have a resilient and flexible content model, you can make decisions about how it is displayed.

We can’t manage how every bit of content will look. But structure allows us to make rules, which we can apply to our content as a whole. Structure sets our content free. It can go where we want it to go.
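As a rough sketch of "structure allows us to make rules", the Python below applies one display rule per destination to the same chunked content. The destinations, field names, and rules are invented for illustration and do not come from the talk.

```python
# A chunked article with no presentational markup (all field names are made up).
article = {
    "title": "Flexible content",
    "summary": "Why structure sets content free.",
    "body": "Structured content can be reused on any device. " * 3,
}

# One rule per destination: which chunks to show, and in what order.
DISPLAY_RULES = {
    "desktop": ["title", "summary", "body"],
    "phone": ["title", "summary"],
    "car": ["title"],
}


def render(content: dict[str, str], destination: str) -> str:
    """Apply the destination's rule to the chunks and join the result."""
    fields = DISPLAY_RULES[destination]
    return "\n".join(content[name] for name in fields)


print(render(article, "phone"))
```

Supporting a new device then means adding one rule, not rewriting the content.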

Why is our content stuck?

Why is it still so hard? Because organisations are stuck in their existing processes. They get bogged down in expectations of stakeholders.

  • Organisations have a mass-production strategy. People keep producing content the way they always have, without thinking about where it’s going to end up.
  • Content-producing roles aren’t tied to organisational strategies. Content producers don’t know how their work fits in to the corporate goals. It’s hard to make changes, and people don’t see why they should.

Content strategy should be the bridge between the corporate vision and the content producers’ role.

  • Teams are siloed. Teams don’t communicate, or are even hostile towards each other.
  • As a result, content is duplicated and confusing.
  • It’s impossible to have the users’ needs at the forefront.
  • Organisations are so concerned with how they’re organised internally, that that’s what they show to the rest of the world.

These are hard problems to solve. One way is to create small teams with a cross-department focus. Spread new ideas, and focus on a single issue.

  • Organisations have obsessions with control.
  • Stakeholders don’t get digital. They want to see their content in fixed format, such as print.
  • Businesses are scared of the idea that users will take their content away and read it later in another format. User control terrifies them.
  • Organisations aren’t built to change, but things are changing very rapidly. So people try to freeze things rather than change. But that doesn’t work.

The organisation doesn’t have to learn just how to deal with mobile. It has to learn how to become adaptable.

What can we do?

There are some good things we can do. We need to change the way we work.

We have a lot of passion about content. Share it with people. Here’s how:

  • Make mobile a start, but not the end goal. Karen McGrane says we should “use mobile as a wedge to create a better experience for ALL users”. Sara says this is true for organisations too. Mobile is an opportunity to get people from different departments to work together.
  • Don’t sell solutions. Invest more deeply. Don’t tell people we’re going to fix everything. We can’t be our organisation’s saviour or mastermind. An organisation doesn’t need a mastermind. You need to find ways of getting people to work with you. Teamwork is messy, but it’s the only way to sustainably change how we deal with content.
  • Do less, and facilitate more. It’s not all about the thing we produce. It’s about the way our work is carried on. So we need to be facilitative. This is also the way we can take on leadership roles. Find the people your work will affect, and involve them from the start of the project.

What’s coming?

We can’t know what the future will bring. We do know mobile is not going to go away. But we can’t know what it’s going to look like. How many people will be using mobile, and what kind of devices will there be? We can’t know.

But if we and our content are adaptable, we’re in a better place.

Thanks Sara

This was an inspiring and lively session. It put a lot of things into perspective. Thanks Sara!

Information development flexibility in Agile at STC Summit 2013

This week I’m attending STC Summit 2013, the annual conference of the Society for Technical Communication. I’ll blog about the sessions I attend, and give you some links to other news I hear about too. You’ll find my posts under the tag stc13 on this blog.

It’s bright and early on Monday morning. Alyssa Fox is kicking off with a session titled Bending Without Breaking: Info Dev Flexibility in Agile. She has made her presentation available on SlideShare. Here’s the blurb:

This session helps technical communicators face challenges in agile planning and execution. It’s increasingly common for writers to work on multiple agile teams. The session includes tips for better communication and teamwork on your agile team, with the goal of a “whole team approach” in mind.

The old way and the new way

Alyssa explained that initially in her organisation, the functional areas were split: the developers, QE (quality engineers), ID (information developers, or technical writers) and other members of a product development team worked separately on their own tasks.

Later, they moved to the “whole team” approach, which means all members of the team accept responsibility for delivering all aspects of the user story. QE, technical writers, developers, and all. Everyone accepts that all aspects of the user story, including documentation, must be complete in order to declare the sprint complete.

How to get to the new state

There are a number of ways to get to that happy state. These are the ones I noted while Alyssa was speaking:

  • All developers do their own unit testing.
  • Automated testing is also essential.
  • QE and information development must be included in all estimates. If you don’t do that, it will not be possible to meet your targets.
  • Make sure you fix all bugs within the sprint in which they were created.
  • Help other functional areas to complete their tasks, when you’ve finished yours. For example, you can help with usability testing, setting up environments, grooming the backlog of tasks.

Swarming

To ensure the info dev team can successfully become a useful and productive part of an agile team, you need to give them support by adapting the processes of the agile team.

When setting up user stories, do them in vertical slices rather than horizontal. In other words, make sure all functional areas are covered during a single sprint. This means you can have something potentially shippable by the end of the sprint.

Feature testing, regression testing and documentation therefore happen during the sprint. If you have a testing crunch or a doc crunch at the end of the sprint, it means the team as a whole is doing something wrong. This requires discipline: all members of the team must know what is required of them.

(I looked up “swarming” after Alyssa’s talk. It’s the practice where an agile team focuses all its effort on one story at a time, where practical.)

Release planning

The way you plan a release has a big impact on the success of the release. A release consists of a number of sprints, culminating in a marketable release to present to customers.

Have a backlog of user stories to pull your sprint tasks from. Make sure the user stories are well estimated. Define a theme for the release, then pick stories that fit into that theme. Take the velocity of the team into account.

Before you start, make sure you have a good idea of the stories you will tackle in the first two to three sprints, and have all the stories for the first sprint clearly defined.

Document review cycles

Alyssa recommends that you plan for two to three drafts of each document: a first draft, an approval draft, and a quality edit draft. Get the SMEs to review the first draft during the sprint.

Creating user stories

This is the most vital part of ensuring success for information development. You need good user stories so the info dev team can plan and draft the documentation.

User stories must focus on the user’s point of view. Ensure you prioritise those of the highest value to the customers.

Making user stories focus on the user

There are three parts to a user story:

  • Problem statement. This is the most essential part of the user story. Usually, the developer writes the problem statement. Info dev can add tremendous benefit by helping to refine the problem statement.
  • Acceptance criteria. These are essential for establishing how the users will know that the problem is solved. Again, the developers start by writing this part, and the technical writers add value by reviewing it.
  • Acceptance tests. These form the technical way of testing that the problem is fixed. Info dev typically doesn’t have much input here.

Thanks to the info dev involvement in defining the problem and the acceptance criteria, it is much easier for the writers to start the documentation and release notes.

More advantages of this approach

It gets everyone on the same page. Everyone knows what the problem is and how we can know when it’s done.

It eliminates unnecessary ad hoc testing and reveals possibilities for user testing.

It forces everyone to think about the user’s side of things and the user impact of the planned changes. Writing a user story may make you realise the problem actually isn’t such a great problem, and it would be a waste to spend effort on it.

It gives the product manager a sense of security that the team understands the requirements.

Backlog grooming

Backlog grooming is the process of looking through the stories in the backlog, analysing, estimating and prioritising them. You should do your backlog grooming frequently. For example, once a week or at the beginning of each sprint.

Alyssa discussed the technique of “planning poker”. The idea is to discuss the stories and estimate the effort using story points for all functional areas. Make sure the user stories at the top of the backlog are small enough to complete in a single sprint.

It’s important that all team members, or at least the leads from each functional area, are involved in this exercise.

Sprint planning

Create your sprint backlog by pulling stories from the product backlog into the sprint backlog. Estimate the tasks in hours, and remember to include time for overhead (meetings, technical setup, admin, etc). Make sure your estimate fits your capacity for the project and the sprint. This is particularly important for people working on multiple projects, as so many technical writers do. Book yourself for the number of hours you have available, and no more.

Alyssa recommends planning 20-25% of each writer’s time for vacation and non-sprint responsibilities. She says we should exclude the last day of the sprint from our planning.

Make sure the development finishes before the end of the sprint, to give testing and info dev time to complete their tasks too.

The writers on Alyssa’s team maintain spreadsheets showing their capacity (in hours) across the various projects.
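The capacity arithmetic described above can be sketched in a few lines of Python. This is an illustration rather than Alyssa’s spreadsheet: the function names, the 8-hour day, and the project bookings are assumptions, while the 25% reserve and the excluded last day come from her recommendations.

```python
def sprint_capacity_hours(sprint_days: int, hours_per_day: float,
                          reserved_fraction: float = 0.25,
                          excluded_days: int = 1) -> float:
    """Hours a writer can book for sprint tasks.

    reserved_fraction covers vacation and non-sprint responsibilities;
    excluded_days drops the last day(s) of the sprint from planning.
    """
    plannable_days = sprint_days - excluded_days
    return plannable_days * hours_per_day * (1 - reserved_fraction)


def remaining_hours(capacity: float, booked_by_project: dict[str, float]) -> float:
    """Capacity left after the hours already booked on each project."""
    return capacity - sum(booked_by_project.values())


# A three-week sprint (15 working days) at 8 hours a day: assumed figures.
capacity = sprint_capacity_hours(sprint_days=15, hours_per_day=8)
left = remaining_hours(capacity, {"Project A": 40, "Project B": 30})
print(capacity, left)
```

With these assumed figures the writer has 84 plannable hours, of which 14 remain after the two bookings. A negative result would mean the writer is overbooked.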

Coping with multiple projects and meetings

If you’re on multiple projects, you can’t attend all meetings. You’d never get your work done. Think about attending one or two scrum meetings per week instead of all of them, and prioritise the ones that are the most important to you. Ask the scrum master to send status emails for the meetings, so you can catch up on what you miss.

Be pushy. Make your presence and needs known. You need to make sure the team knows you need the information. Alyssa has put together a list of criteria to help development teams know when info dev needs to be involved. For example, when the development team is ready to start coding. Then they hold a team kickoff meeting, so that the info dev team can decide when they need to start work on the project.

What does “potentially shippable” mean?

A document should be shippable for that part of the product developed during the sprint. The documentation for that part of the product needs to be ready. This includes the topics, videos, and other material required, but not the whole book or whole help system.

Tasks such as an approval draft of the whole document, final reviews, the production process, and the release notes, are delivered at the end of the release. They don’t need to be part of the sprint delivery.

More topics arising from questions

There were a number of questions from the audience.

How do you handle translation? Alyssa’s team fits this into their milestones. They give the translators 90% of everything towards the end of the release. They make sure all the English-language stuff is ready to go by release date, and deliver the rest later. All their documentation is online.

What is the typical sprint length in Alyssa’s organisation? Sprint length is three weeks for most teams. This works best. Some teams have shorter sprints, but those are difficult to work with.

How hard was it to get all the development teams to follow the same process? Some of them find this difficult. In the early days, some teams were using different processes. This was very difficult for the teams working on multiple projects, like QA and info dev. Support from upper management is essential, to get agreement from all teams.

How do you handle changes in the software resulting from user feedback? It’s a good idea to build in buffers when doing your estimates, because you may need to make changes as a result of customer feedback. Being able to change is built into agile.

Alyssa mentioned the concept of “targeted” documentation, which focuses more on conceptual documentation rather than “how to” guides. It comes from the perspective of “how do I do my job with this product” rather than “how do I use this product”. Alyssa’s team don’t document everything, but spend time analysing what’s needed. If something is obvious from the UI, they don’t document it. If they get feedback from customers that they need the documentation that has been left out, then they consider adding it.

What source material is available on doing info dev in an agile environment? Alyssa says there’s not a lot available. There are blogs, but there’s a lot to say and not a lot of information yet. It’s been mostly trial and error for her team.

Thanks Alyssa

Alyssa spoke engagingly and passionately, and covered a complex topic with aplomb. Someone in the audience suggested she write a book. I second that suggestion!

EPUB and technical communication at STC Summit 2013

This week I’m attending STC Summit 2013, the annual conference of the Society for Technical Communication. I’ll blog about the sessions I attend, and give you some links to other news I hear about too. You’ll find my posts under the tag stc13 on this blog.

Scott Prentice presented a session titled EPUB and Techcomm – Are We Ready? EPUB is a standard for ebook publishing. In other words, the EPUB standard defines a format that ebook readers can interpret. Scott’s talk focused on the current state of EPUB tools and technologies, and how technical communicators can make the most of EPUB as a content delivery option.

Quick audience survey

Scott asked for a show of hands for how many people were already delivering EPUBs. A handful of people put up their hands. A couple of others responded when Scott asked how many were producing EPUBs but not yet delivering them. There were approximately 50 people in the room.

What is an EPUB?

Scott started by explaining the EPUB format. He encourages people to get hold of an EPUB file and unzip it (it’s just a zip file) to see what’s in it. This is a good way of understanding what’s going on. At the end of the session, Scott opened an EPUB file in an Oxygen XML editor and walked us through the content of an EPUB file.
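If you would rather peek inside an EPUB without unzipping it by hand, Python’s standard zipfile module can list the contents. This is a small sketch, not something Scott demonstrated. To stay self-contained it builds a tiny, incomplete stand-in archive in memory; the mimetype file and META-INF/container.xml are standard parts of an EPUB, but the OEBPS directory name and the other file names are just assumptions for the demo.

```python
import io
import zipfile


def list_epub_entries(epub) -> list[str]:
    """Return the names of the files inside an EPUB (path or file-like object)."""
    with zipfile.ZipFile(epub) as archive:
        return archive.namelist()


def build_sample_epub() -> io.BytesIO:
    """Build a tiny, incomplete stand-in for an EPUB, in memory, for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/chapter1.xhtml", "<html/>")
    buffer.seek(0)
    return buffer


for name in list_epub_entries(build_sample_epub()):
    print(name)
```

To inspect a real book, pass its path instead, for example list_epub_entries("mybook.epub"), where the file name is whatever EPUB you have to hand.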

Interestingly, there’s a fixed layout format, approved in 2012. This is useful where you need a fixed format output, such as a comic book.

EPUB and tech comm

Scott says tech comm is late in getting into the area of EPUB. The tools are slow to move in this direction. EPUBs are best for linear content, where you move from page to page, like a book. There isn’t really the concept of a topic. EPUBs are also not great for tabular data, because readers tend to truncate tables.

EPUB will probably not be your primary deliverable, but it may be useful as another format available to customers.

Readers

You can make custom readers and embed them in your product. But typically, your customers will be using existing devices such as mobile phones and Kindles. Each brand of mobile phone has different apps for reading EPUBs. There are also desktop readers. For example, there are plugins for browsers like Firefox and Chrome.

Scott recommends Chrome Readium as a good reader to test your EPUB content.

The EPUB document will look different in each reader. This is one of the big challenges, and something to be aware of when developing content.

Scott showed us some screenshots of the same document on a few different devices, with slight formatting differences. The screenshots come from Tony Self’s book, DITA Style Guide.

Technologies and tools

An EPUB file is a collection of XHTML, CSS, XML, and media files. Most technical writers will be single-sourcing their content, and using EPUB as just one output. Most other people writing EPUBs will really craft their content for the EPUB.

You could hand-code your EPUB files, but Scott doesn’t recommend this, because there are many interdependencies amongst the files. Instead, use a tool and then tweak the output. You will need to tweak it, because no tool is perfect. You’ll find it’s missing metadata, for example, that you need for your publication.

EPUB 2 or 3?

EPUB 3 has plenty of new features, but the tools don’t yet support it fully. For now, Scott recommends sticking with EPUB 2 or using simple EPUB 3.

EPUB 3 is based on HTML5, so you can use scripting and get a lot of interactivity. You can use JavaScript, for example, to modify the content. For example, a table could start off small, or convert itself to a list. But at the moment, to take advantage of these features you have to hand-code them. Even the tools that claim to support EPUB 3 don’t really use the new features.

One of these new features is the “read aloud” feature, one of the “media overlays”. Others are fixed format, and flow from right to left. Scott thinks an EPUB could replace PDF, because it contains all the information in your help system and is now available in fixed format.

Formatting

The most important thing is to keep it simple. This is the best way to ensure your content will work on as many readers as possible.

You can embed fonts, but Scott says don’t bother. You’re adding to the download size of the file, and adding the possibility that it won’t work on all readers.

If you do use styling, don’t use the style attribute on elements. Use CSS selectors. Note that many tools use the style attribute.
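One way to spot inline styling in tool-generated output is to scan each XHTML file for style attributes. The sketch below is my own illustration, with a made-up sample document and function name. It uses Python’s standard XML parser, so it assumes the file is well-formed XML; files that use named HTML entities may need a more forgiving parser.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def find_inline_styles(xhtml: str) -> list[tuple[str, str]]:
    """Return (tag, style value) for every element that has a style attribute."""
    root = ET.fromstring(xhtml)
    found = []
    for element in root.iter():
        style = element.get("style")
        if style is not None:
            # Strip the namespace prefix so the tag reads as plain "p", "div", ...
            tag = element.tag.replace(XHTML_NS, "")
            found.append((tag, style))
    return found


# Hypothetical sample: one paragraph styled by class, one styled inline.
sample = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p class="note">Styled through a CSS class.</p>
    <p style="color: red">Styled inline.</p>
  </body>
</html>"""

print(find_inline_styles(sample))
```

An empty result means the file relies on CSS selectors only, which is what you want.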

Kindle and EPUB

Kindle does not support EPUB directly. They have their own format. Amazon provides a tool called KindleGen that will convert EPUB to MOBI or KF8 for use on a Kindle.

But you may need to modify your EPUB file, and some things may not work on the Kindle.

Always test your content after converting to Kindle. Don’t rely on the emulators. The Kindle desktop app will render the content differently from the Kindle device.

Keep it simple!

EPUB publishing tools

Scott gave us a list of tools that offer EPUB generation and that he considers suitable for tech comm, giving an overview of the pros and cons of each.

  • Adobe TCS
  • Doc-To-Help
  • Help & Manual
  • RoboHelp
  • MadCap Flare
  • Webworks ePublisher
  • DITA Open Toolkit with the DITA for Publishers plugin. In Scott’s opinion, this tool provides the cleanest EPUB output of all the tools.
  • DocBook, with various scripts e.g. Python scripts
  • FrameMaker with the ElmSoft EPubFm2 plugin. Scott says this does a good job at a low cost.

Problems with publishing tools

This is an overview of the type of problems you’ll find when generating EPUB from the tools Scott discussed:

  • Some tools find the EPUB file structure difficult to handle. Keep your document simple, so that you can make manual adjustments.
  • Most tools provide you with “fake” lists i.e. styled lists, instead of real HTML lists. These don’t work well in an EPUB.
  • Many tools use the style attribute on HTML elements, instead of CSS classes. If you use proper CSS styles, these will typically follow through into your EPUB. But the tools prevent this.
  • EPUBs allow you to provide many different types of metadata, such as author name, dates. The tools typically don’t allow you to add this metadata, or offer only limited parts of it. You’ll need to add metadata later.
  • Inside an EPUB, each HTML file starts a new chapter. This will start on a new page in an EPUB. Some tools give you the ability to specify the topics that start a new page. Other tools give you no control at all.
  • As already mentioned, the tools that claim to support EPUB 3 don’t really offer more than EPUB 2 tools.
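To check which metadata a tool actually wrote, you can read the Dublin Core elements from the package (OPF) file inside the EPUB. The following is a minimal sketch under my own assumptions: the sample OPF is invented and trimmed down, and the list of fields to check is just an example, not a statement of what your publication requires.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def read_dc_metadata(opf_xml: str) -> dict[str, list[str]]:
    """Collect Dublin Core elements (title, creator, date...) from an OPF document."""
    root = ET.fromstring(opf_xml)
    metadata: dict[str, list[str]] = {}
    prefix = "{" + DC_NS + "}"
    for element in root.iter():
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            metadata.setdefault(name, []).append((element.text or "").strip())
    return metadata


# Invented, trimmed-down package file for the demo.
sample_opf = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Guide</dc:title>
    <dc:language>en</dc:language>
  </metadata>
</package>"""

found = read_dc_metadata(sample_opf)
# Example check: which of these fields did the tool leave out?
missing = [name for name in ("title", "creator", "date") if name not in found]
print(found, missing)
```

Anything reported as missing is a candidate for the manual tweaking step after generation.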

Other tools

You’ll need other tools:

  • EPUB editors. Scott recommends Oxygen XML editor, BlueGriffon EPUB Edition, and Sigil.
  • Calibre, a multi-purpose tool for cataloguing and organising your EPUBs. It has a server, which you can use to make your EPUBs available. It has a reader and some conversion options too.
  • epubcheck, a validator. You should always validate your EPUB file after generating and tweaking it.
  • KindleGen, for converting EPUB to MOBI or KF8 for use on a Kindle.

Where does an EPUB best fit in

Because of its linear nature, an EPUB is not useful for “just in time” learning, or solving a problem. It is better suited to conceptual information and guides: something you want to take away with you and read on the train.

An audience member suggested it would be useful for installation guides, when you don’t yet have the app installed.

Conclusion

Thanks Scott, this was a useful session, with plenty of take-away information from an expert in the field.

Confluence full-text search using Python and grep

The standard search in Confluence wiki searches the visible content of the page. It also offers keywords for some specific searches, such as macro names and page titles. But sometimes we need to find things that the search cannot find, because the content of the relevant XML elements is not indexed. This post offers a solution of sorts: Copy the XML storage format of your pages into text files on your local machine, then use a powerful search like grep to do the work.

Here are some examples of the problem:

  • We may want to find all pages that reference a certain image, or other attachment. It’s easy enough to find the page(s) where the image is attached. But it’s not possible to find all pages that display a given image which is attached to another page.
  • It’s possible to search for all occurrences of a macro name, using the macroName: keyword in the search. But it’s not possible to search for parameter values. This means, for example, you can’t search for all pages that include content from a given page.

I’ve written a script to solve the problem, by downloading the storage format from Confluence onto your local machine, where you can use all sorts of powerful text searches. You’re welcome to use the script, with the proviso that it’s not perfect.

Python script: getConfluencePageContent

The script is in a repository on Bitbucket: https://bitbucket.org/sarahmaddox/confluence-full-text-search.

Note: To run the script successfully, you need access to Confluence, and the Confluence remote API must be enabled.

Installing Python

To run the script, you need to install Python. The scripts are designed for Python 3, not Python 2. There were fairly significant changes in Python 3.

  1. Download Python 3.2.3 or later: http://www.python.org/getit/
    (I downloaded python-3.2.3.amd64.msi, because I’m working on a 64-bit Windows machine.)
  2. Run the installer to install Python on your computer.
    (I left all the options at their default values.)
  3. Add the location of your Python installation to your path variable in Windows:
    1. Go to ‘Start’ > ‘Control Panel’ > ‘System’ > ‘Advanced system settings’
    2. Click ‘Environment Variables’.
    3. In the ‘System variables’ section, select ‘Path’.
    4. Click ‘Edit’.
    5. Add the following to the end of the path, assuming that you installed Python in the default location:
      ;C:\Python32
    6. Click ‘OK’ three times.
    7. Open a command window and type ‘python’ to see if all is OK. You should see something like this:

[Screenshot: the Python prompt in a Windows command window]

Getting the script

Go to the Bitbucket repository and choose ‘Downloads’ > ‘Branches’, then download the zip file and unzip it into a directory on your computer.

Running the script to get the content of your pages

To use the getConfluencePageContent script:

  1. Enable the remote API (XML-RPC & SOAP) on your Confluence site.
  2. Open the getConfluencePageContent script in Python’s ‘IDLE’ GUI.  (Right-click on the script and choose ‘Edit with IDLE’.)
  3. Run the script from within IDLE. (Press F5.)
  4. The Python shell will open and prompt you for some information:
    • Confluence URL – The base URL of your Confluence site. If the site uses SSL, enter ‘HTTPS’ instead of ‘HTTP’. For example: https://my.confluence.com
    • Username – Confluence will use this username to access the pages. This username must have ‘view’ access to all the spaces and pages that you want to check.
    • Password – The password for the above username.
    • Space key – A Confluence space key. Case is not important – the match is not case-sensitive.
    • Output directory name – The directory where the script should put its results. The script will create this directory. Make sure it does not yet exist.
  5. Look for the output directory as a sibling of the directory that contains the getConfluencePageContent script. In other words, the output directory will appear in your file system at the same level as the script’s directory.

Python shell (IDLE)

Output of the script

The Bitbucket repository contains an example of the output, based on the Demonstration space shipped with Confluence. See the outputexample directory in the repository. For example, this file contains the content of the page titled ‘Welcome to Confluence’.

The script gets the content of all pages in the given Confluence space. It puts the content of each page into a separate text file, in a given directory.
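As a rough illustration of that step, here is a minimal sketch of how the pages of a space could be fetched over the XML-RPC remote API using Python's standard xmlrpc.client module. It is not the code from the repository. The function names connect and fetch_space_pages are hypothetical, and the sketch assumes the site exposes the confluence2 XML-RPC methods login, getPages and getPage at /rpc/xmlrpc.

```python
import xmlrpc.client


def connect(base_url, username, password):
    """Log in and return (server, token). Hypothetical helper, not from the script."""
    # Assumption: the XML-RPC endpoint lives at <base URL>/rpc/xmlrpc.
    server = xmlrpc.client.ServerProxy(base_url.rstrip("/") + "/rpc/xmlrpc")
    token = server.confluence2.login(username, password)
    return server, token


def fetch_space_pages(server, token, space_key):
    """Yield (page_id, title, url, content) for every page in the space."""
    # getPages returns page summaries only, so each page is fetched
    # individually to get its storage-format content.
    for summary in server.confluence2.getPages(token, space_key):
        page = server.confluence2.getPage(token, summary["id"])
        yield page["id"], page["title"], page["url"], page["content"]
```

Passing the server object in as a parameter keeps the fetching logic separate from the login, which makes it easy to try out against a stand-in object before pointing it at a real site.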

The script creates the output directory as a sibling of the directory that contains the getConfluencePageContent script. In other words, the output directory will appear in your file system at the same level as the script’s directory.
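To make the "sibling" location concrete, here is a small sketch of how such a path could be worked out and the directory created. The helper names sibling_output_dir and create_output_dir are made up for illustration, and the real script may do this differently.

```python
import os


def sibling_output_dir(script_path, output_name):
    """Return a path for output_name next to the directory holding the script."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    parent_dir = os.path.dirname(script_dir)
    return os.path.join(parent_dir, output_name)


def create_output_dir(path):
    """Create the directory; os.mkdir raises FileExistsError if it already exists."""
    os.mkdir(path)
```

For example, if the script sits in a directory called scripts inside your working directory, an output directory named out would be created inside that same working directory, alongside scripts.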

The file name is a combination of the page name and page ID. To prevent problems when creating the files, the script removes all non-alphanumeric characters from the file name. To ensure uniqueness, it appends the page ID to the page name when creating the file name.
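The naming rule can be sketched in a few lines of Python. This is an illustration under assumptions rather than the script's actual code: the function name page_file_name and the .txt extension are both guesses.

```python
import re


def page_file_name(title, page_id):
    """Build a file name from a page title and page ID."""
    # Drop everything that is not a letter or digit, spaces included.
    safe_title = re.sub(r"[^A-Za-z0-9]", "", title)
    # Appending the page ID keeps names unique even when two titles
    # collapse to the same string. The ".txt" extension is an assumption.
    return "{}{}.txt".format(safe_title, page_id)
```

With this sketch, a page titled ‘Welcome to Confluence’ with the made-up ID 12345 would be saved as WelcometoConfluence12345.txt.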

The content is in Confluence storage format, which is a type of XML consisting of HTML with Confluence-specific elements. (Docs.)

The script also writes a line at the top of each file, containing the URL of the page, and marked with asterisks for easy grepping.
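A sketch of what writing such a file could look like is below. The exact marker format used by the script isn't shown here, so the five asterisks on each side of the URL are an assumption, as is the function name write_page_file.

```python
def write_page_file(path, page_url, content):
    """Write a page's content to a text file, with the page URL on the first line."""
    with open(path, "w", encoding="utf-8") as output_file:
        # Assumed marker format: asterisks around the URL, so that a search
        # for the asterisks lists the URL of every matching file.
        output_file.write("***** " + page_url + " *****\n")
        output_file.write(content)
```

With a marker like this, a grep for the asterisks across the output directory would list the URL of each page.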

Notes:

  • The script will show an error if the output directory already exists.
  • If you see the following error message, you need to enable the remote API (XML-RPC & SOAP) on your Confluence site: xmlrpc.client.ProtocolError: <ProtocolError for localhost:8090/rpc/xmlrpc: 403 Forbidden>
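If you adapt the script, one option is to catch that error and print a friendlier hint. The sketch below uses the ProtocolError class from Python's xmlrpc.client module. The function name explain_protocol_error and the wording of the messages are made up for illustration.

```python
import xmlrpc.client


def explain_protocol_error(error):
    """Turn an xmlrpc.client.ProtocolError into a short hint for the user."""
    if error.errcode == 403:
        # 403 Forbidden is what Confluence returns when the remote API is off.
        return ("Access to " + error.url + " is forbidden. "
                "Check that the remote API (XML-RPC & SOAP) is enabled.")
    return "Unexpected response from {}: {} {}".format(
        error.url, error.errcode, error.errmsg)
```

You would call this from an except xmlrpc.client.ProtocolError block around the remote calls.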

Grep and grepWin

Now that you have the page content in text form, the world’s your oyster. :) You can use the full power of text search tools. If you’re on UNIX, you’ll already know about grep.

If you’re on Windows, let me introduce grepWin. It’s a free, powerful search tool that you can install on Windows. It offers regular expression (regexp) searches as well as standard searches, and it has a very nice UI (user interface).

This screenshot shows a search for an image called ‘step-2-image-1-confluence-demo-space.png’. The image is attached to one page, and referenced in two pages. QED. :D

grepWin
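If you'd rather stay in Python, the same kind of search can be scripted. Here's a minimal sketch of a grep-like search over the output directory. The function name search_files is hypothetical, and the sketch assumes the files are UTF-8 text.

```python
import os
import re


def search_files(directory, pattern):
    """Return (path, line_number, line) for each line matching the regex."""
    regex = re.compile(pattern)
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            # errors="replace" keeps the search going if a file is not valid UTF-8.
            with open(path, encoding="utf-8", errors="replace") as text_file:
                for line_number, line in enumerate(text_file, start=1):
                    if regex.search(line):
                        matches.append((path, line_number, line.rstrip("\n")))
    return matches
```

To look for a literal file name such as the image above, wrap it in re.escape so that the dots are not treated as regular expression wildcards: search_files("outputexample", re.escape("step-2-image-1-confluence-demo-space.png")).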


Comments welcome!

I’d love to know if you think you’ll find the script useful, and if you have any ideas for improving it.

Doc sprints at STC Summit 2013 #stc2013

STC Summit 2013 is fast approaching. I’m looking forward to getting the latest gen on all things #techcomm, meeting old friends, and making new acquaintances. I’ll also be giving a presentation on doc sprints!

Update on Wednesday 8 May 2013: The report on the actual presentation is now available: http://ffeathers.wordpress.com/2013/05/08/doc-sprints-at-stc-summit-2013-the-presentation/

A doc sprint is similar to a book sprint. It’s an event where a group of people get together for a couple of days and write tutorials, or a book, or other forms of documentation. Often there’s coding involved too. And always, plenty of fun, making new contacts, and learning cool new technologies.

Doc Sprints: The Ultimate in Collaborative Document Development

My presentation is called Doc Sprints: The Ultimate in Collaborative Document Development. It’s full of information about planning and running a doc sprint, and how doc sprints are useful in developing the documentation our readers need.

Even more exciting: there are a number of stories and tips, gleaned from doc sprinters around the world. Thanks to Anne Gentle, Swapnil Ogale, Ellis Pratt, Katya Stepalina, Andreas Spall, Jay Meissner, and Peter Lubbers, for contributing their ideas!

The presentation covers these topics:

  • Introduction to doc sprints, agile environments, and why a doc sprint is a good fit for technical documentation.
  • Who to invite, when to start, and how to ensure that the sprint will produce the documents you need.
  • How to get the best out of the sprinters.
  • Collaborative tools for use during the sprint.
  • Sprinting across the world: Handling multiple time zones, early sprinters, late sprinters.
  • How to run a retrospective, and why.
  • Reviewing and publishing the documents, and writing up the results.
  • Other innovative types of sprints for documentation teams.

Here’s what the presentation looked like a few weeks ago:

Doc sprints at STC Summit 2013

Come to my session at STC Summit 2013 to see how it’s turned out. :)
