In an effort to understand an SDK, I found myself drawing diagrams. Just to get the concepts into my head, and to visualise the way the bits and pieces fit together. It turned out that those diagrams are useful to more people than just me. My diagrams became part of the SDK overview that I was writing.
SDKs are conceptual and tricky to grasp at the best of times. This one offered a couple of special complexities. I’d heard from many sources that people were having trouble getting started with the SDK.
My goals for the introduction to the SDK
What are the two complexities that I wanted to illustrate?
Firstly, there are a few different paths that you can follow to achieve your goal of using the SDK to build a machine learning pipeline. For this doc, my audience is fairly comfortable with machine learning, so that’s not the complexity in this case. But it is confusing that there are a few different ways of putting a pipeline together.
Secondly, your code at some stage ends up in a Docker container managed by Kubernetes. It’s not a given that my audience is well up on Kubernetes. And they almost certainly don’t know how the containerisation of their code relates to their use of the SDK.
The doc and diagrams
The doc is an introduction to the Kubeflow Pipelines SDK. It starts off with an overview of the packages in the SDK, then describes the four ways you can use the SDK to build a pipeline, depending on your starting point and your use case.
The first use case describes the most common usage of the SDK: creating pipeline components from existing code. The section starts with a diagram:
As you can see, the diagram has some minimal code snippets. Just enough to give readers a hook to attach more information to.
Note: The diagram in this post is a png file, but in the docs I’ve used SVG (XML). The use of the XML format means that we can internationalise the text, when the time comes to provide the Kubeflow docs in languages other than English.
After the diagram comes a more indepth description of the diagram, with more meat to the code. It’s still probably not enough for people to copy, paste, and start coding, but that’s not the goal of this doc. The goal is for people to gain a conceptual understanding of what fits where, and of when you’d use this particular workflow.
At the end of the section is an info box containing links to further information and examples.
The subsequent sections follow the same pattern:
- Diagram with minimal code.
- More indepth explanation with more code.
- Pointers to detailed docs and examples.
The design of the doc mimics the user interface design principle of progressive disclosure.
The doc starts by revealing the primary concepts and code patterns in each use case. That allows the reader to grasp the similarities and differences between the use cases.
Next, for readers who want to know more about a particular use case, the doc gives a little more information below the diagram. There’s just enough terminology and code to give people a good idea of what the workflow involves.
For novices, that level of information provides a logical compartment into which they can wedge their growing understanding. They can go away and look at more detailed docs, then come back and fit that new information into the right place in the use case.
For readers who are already experts in most of the required areas and just want to know how this particular SDK works, the information under the diagram may even provide enough context to start coding.
Lastly, for each use case the doc provides links to detailed information and examples.
Do people like the doc?
I wrote the doc because I’d heard people were having trouble understanding how to use the SDK. My initial reason for drawing the diagrams was for my own understanding. But then I realised that the diagrams would probably be useful to others too. I showed the diagrams to an engineer, and he agreed. I showed a draft of the doc to an intern who had just started learning how to use the SDK. He said the diagrams would have been extremely useful if he’d had them two weeks earlier when he was starting out with the SDK.
The doc is new, so it’s too early to tell how well it performs for a wider audience. I plan to check the analytics and ratings in a few weeks’ time.
Do the diagrams hit the right spot between too naive and too complex?
I’m sure there are improvements we can make in the way I’ve described the technology. We’ll iterative and improve over the next weeks and months, as we do with most of our docs.
Still, I think we should be careful to keep a certain level of naivety. After all, these diagrams helped me kickstart my understanding of the concepts involved in using the SDK. Other people reading the docs will be at the same level of not knowing stuff when they start out. We need to keep that level in mind. The curse of knowledge is a powerful thing!