Modularizing Information: A Path to Leveraging Enterprise Content
Building modular documents from shared information blocks improves technical writers’ productivity and the quality of their documents. Leveraging enterprise content, however, requires effectively managing a network of information blocks.
Organizations publish a vast amount of marketing and technical documents, and much of that content is shared across them: the company address on every document, the list of upcoming events on the website and in the consumer magazine, the company boilerplate in press releases, the notes and warnings in user guides, and so on. Yet organizations often rely on word processors to create and manage content: every document is a distinct file containing all of its text, images, and page-layout information. Only the website, at best, follows a different model, with its content managed through a CMS such as WordPress or Drupal.
This model is widespread because everyone involved in managing or using content understands it easily. It rests on the WYSIWYG paradigm: users only trust what they see. It has a number of drawbacks, however.
First, if you use a CMS, you know how much easier it is to delegate the layout to a separate style sheet. The layout is then consistent across the website, and changing it requires editing only a single file, as CSS Zen Garden famously demonstrates. The content posted in the CMS carries only a few structure tags (title, citation, and so on), and the style sheet’s rules are applied to those tags automatically. One example of what differentiates the two models: if you accidentally type two spaces between two words, the text printed from a word processor will display the extra space; the web page will not.
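As a minimal sketch of that separation (the tags and file names are illustrative, not taken from any particular CMS), the stored content carries only structure tags:

```html
<!-- Content as stored and delivered by the CMS: structure only, no layout. -->
<article>
  <h1>Upcoming events</h1>
  <p>Join us at the annual user conference in June.</p>
</article>
```

while a single external style sheet decides how every page renders those tags:

```css
/* style.css — editing this one file restyles the entire website. */
h1 { font: bold 1.6em Georgia, serif; }
p  { line-height: 1.5; max-width: 40em; }
```

The two-spaces example works the same way: runs of whitespace are collapsed when the page is rendered, because presentation details are never stored in the content.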
Second, when pictures are embedded in the file, editing them means manually extracting and then reinserting them. If you’ve ever had to update the screenshots in a Word document, you know what I’m talking about. When images are individual files stored in an external directory, things are much simpler, of course.
Last, if each file contains all of the text of the corresponding document, the same information blocks are duplicated again and again. Editing an information block then requires you to manually update every single file by copying and pasting chunks of content, a process that is particularly error prone and costly. If the organization moves, for instance, you have to edit many files just to change its address. And as far as internationalization is concerned, is it really useful to have the same warning translated again and again across the entire technical documentation set?
It is much more efficient to build modular documents. A document then works like a construction set: by combining the same shared information blocks in different ways, you build distinct documents. The DocBook format, for instance, is based on this model, as the sketch below shows. One could also use master documents in a word processor, but frankly I would not: that feature is notoriously buggy. In fact, the most efficient models strictly separate the document structure from the information blocks themselves.
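In DocBook 5, for instance, a modular book can be assembled through XInclude. This is a minimal sketch under hypothetical file names, not a full working setup:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- user-guide.xml: the document is only a structure; the content
     lives in separate files that other books can reuse as-is. -->
<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      version="5.0">
  <title>User Guide</title>
  <xi:include href="shared/company-address.xml"/>
  <xi:include href="chapters/installation.xml"/>
  <xi:include href="shared/safety-warnings.xml"/>
</book>
```

If the organization moves, only shared/company-address.xml changes, and every book that includes it picks up the new address at the next build.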
DITA XML, for instance, stores each document’s structure in an external .ditamap file: a series of hierarchically arranged links to external content files. Each content file corresponds to a document section, with a title, a text body, and so on. Content shared across documents (the section giving the organization’s address, for instance) lives in a single file, to which any number of .ditamap structures can link. More atomic information units that cannot stand as separate sections, such as a note, can be dynamically included in any section through the conref mechanism.
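Both mechanisms can be sketched in a few lines (the topic and file names here are hypothetical). The map defines the document as a hierarchy of links:

```xml
<!-- user-guide.ditamap: the document is nothing but links to topics. -->
<map>
  <title>User Guide</title>
  <topicref href="installing.dita">
    <topicref href="prerequisites.dita"/>
  </topicref>
  <topicref href="shared/company-address.dita"/>
</map>
```

and a topic pulls a shared note in by reference instead of copying it:

```xml
<!-- installing.dita: the note’s text lives in shared/warnings.dita,
     inside a topic with id="warnings" holding a note with id="power-off";
     conref resolves the reference at publication time. -->
<topic id="installing">
  <title>Installing the product</title>
  <body>
    <note conref="shared/warnings.dita#warnings/power-off"/>
    <p>Run the installer with administrator rights.</p>
  </body>
</topic>
```

Edit the shared warning once, and every deliverable that references it is updated at the next publication.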
The advantages such a system offers are obvious. The challenge it brings is that each document becomes a network of information blocks, and this network must be managed as such, for instance when updating an information unit: if note n from document A is edited, must that note be edited in document B as well? Correctly sizing and arranging information blocks thus becomes critical for leveraging enterprise content. The technical communicator becomes a content architect.
Information-network documents are relatively new. For centuries, linear, monolithic documents prevailed: scrolls and books did not lend themselves to dynamic content reorganization. Only, perhaps, some twentieth-century technical documents organized in ring binders (where single pages could be replaced) came close to this paradigm. Our whole culture and education are based on the linear model; no wonder the new paradigm is a little hard for us to master. The Internet, though, is built on this model, and we are now familiar with it. And when technology brings new complexity, it usually also offers tools to master it; this is no exception. Componize, for instance, manages clusters of DITA XML files under the Alfresco enterprise content management system. For very complex content networks, Magillem offers an elegant paradigm: you then manage all your content in a closed, proprietary format, but that is often the case anyway.
As far as I’m concerned, I manage my DITA XML files under the Subversion and Git version control systems. I must admit I would enjoy a graphical interface that displayed the links that make up each document. But even without such a tool, I would never switch back to the monolithic paradigm.
French version on the Rédacteur technique blog.