What Documentation Is Required?
Different levels of documentation are required for the casual user of a program, for the user who must depend upon a program, and for the user who must adapt a program for changes in circumstance or purpose.
To use a program. Every user needs a prose description of the program. Most documentation fails in giving too little overview. The trees are described, the bark and leaves are commented, but there is no map of the forest. To write a useful prose description, stand way back and come in slowly:
- Purpose. What is the main function, the reason for the program?
- Environment. On what machines, hardware configurations, and operating system configurations will it run?
- Domain and range. What domain of input is valid? What range of output can legitimately appear?
- Functions realized and algorithms used. Precisely what does it do?
- Input-output formats, precise and complete.
- Operating instructions, including normal and abnormal ending behavior, as seen at the console and on the outputs.
- Options. What choices does the user have about functions? Exactly how are those choices specified?
- Running time. How long does it take to do a problem of specified size on a specified configuration?
- Accuracy and checking. How precise are the answers expected to be? What means of checking accuracy are incorporated?
Often all this information can be set forth in three or four pages. That requires close attention to conciseness and precision. Most of this document needs to be drafted before the program is written, for it embodies basic planning decisions.
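Brooks has a short standalone document in mind. As a compressed illustration only, the same checklist can also head the code as a structured docstring. The function `count_words` and every detail of its description below are hypothetical; only the section names come from the list above.

```python
from collections import Counter


def count_words(text: str, min_length: int = 1) -> list[tuple[str, int]]:
    """Count word frequencies in a block of text.

    Purpose: report how often each word occurs, most frequent first.
    Environment: any platform with Python 3.9 or later; no extra packages.
    Domain and range: input is any str; output is a list of
        (word, count) pairs with count >= 1, empty for blank input.
    Functions and algorithms: lowercases the text, splits on whitespace,
        strips surrounding punctuation, and tallies with collections.Counter.
    Input-output formats: one str in; list of (str, int) tuples out,
        sorted by descending count, ties broken alphabetically.
    Operating instructions: call directly; raises ValueError when
        min_length < 1 (the abnormal ending).
    Options: min_length drops words shorter than that many characters.
    Running time: roughly linear in len(text), plus sorting the distinct words.
    Accuracy and checking: counts are exact; their sum equals the number
        of retained words.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # Normalize each whitespace-separated token before counting it.
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    counts = Counter(w for w in words if len(w) >= min_length)
    # Most frequent first; alphabetical order breaks ties.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
```

Keeping the description in the docstring means `help(count_words)` shows it, though a real program of any size would still need the fuller prose document Brooks describes.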
To believe a program. The description of how it is used must be supplemented with some description of how one knows it is working. This means test cases.
Every copy of a program shipped should include some small test cases that can be routinely used to reassure the user that he has a faithful copy, accurately loaded into the machine.
Then one needs more thorough test cases, which are normally run only after a program is modified. These fall into three parts of the input data domain:
- Mainline cases that test the program’s chief functions for commonly encountered data.
- Barely legitimate cases that probe the edge of the input data domain, ensuring that largest possible values, smallest possible values, and all kinds of valid exceptions work.
- Barely illegitimate cases that probe the domain boundary from the other side, ensuring that invalid inputs raise proper diagnostic messages.
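A minimal sketch of the shipped check and the three kinds of thorough tests, assuming a hypothetical program `parse_percent` whose valid domain is the integers 0 through 100 written as text:

```python
def parse_percent(text: str) -> int:
    """Hypothetical program under test: parse an integer percentage 0..100."""
    value = int(text.strip())  # raises ValueError for non-numeric input
    if not 0 <= value <= 100:
        raise ValueError(f"percentage out of range: {value}")
    return value


def smoke_test() -> bool:
    """Small test shipped with every copy: is this copy faithful and loaded?"""
    return parse_percent("42") == 42


def run_thorough_tests() -> None:
    """Run after the program is modified; raises AssertionError on failure."""
    # Mainline cases: commonly encountered data.
    assert parse_percent("50") == 50
    assert parse_percent(" 7 ") == 7

    # Barely legitimate cases: the smallest and largest valid values.
    assert parse_percent("0") == 0
    assert parse_percent("100") == 100

    # Barely illegitimate cases: just outside the domain, must be diagnosed.
    for bad in ["-1", "101", "", "12.5"]:
        try:
            parse_percent(bad)
        except ValueError:
            continue  # proper diagnostic raised
        raise AssertionError(f"accepted invalid input {bad!r}")
```

The smoke test is deliberately tiny so a user can run it routinely; the thorough suite concentrates on the boundaries at 0 and 100, where errors are most likely.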
To modify a program. Adapting a program or fixing it requires considerably more information. Of course the full detail is required, and that is contained in a well-commented listing. For the modifier, as well as the more casual user, the crying need is for a clear, sharp overview, this time of the internal structure. What are the components of such an overview?
- A flow chart or subprogram structure graph. More on this later.
- Complete descriptions of the algorithms used, or else references to such descriptions in the literature.
- An explanation of the layout of all files used.
- An overview of the pass structure—the sequence in which data or programs are brought from tape or disk—and what is accomplished on each pass.
- A discussion of modifications contemplated in the original design, the nature and location of hooks and exits, and discursive discussion of the ideas of the original author about what modifications might be desirable and how one might proceed. His observations on hidden pitfalls are also useful.
Fred Brooks, The Mythical Man-Month, pg. 165
The Flow-Chart Curse
The flow chart is a most thoroughly oversold piece of program documentation. Many programs don’t need flow charts at all; few programs need more than a one-page flow chart.
Flow charts show the decision structure of a program, which is only one aspect of its structure. They show decision structure rather elegantly when the flow chart is on one page, but the overview breaks down badly when one has multiple pages, sewed together with numbered exits and connectors.
The one-page flow chart for a substantial program becomes essentially a diagram of program structure, and of phases or steps. As such it is very handy. Figure 15.1 shows such a subprogram structure graph.
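Figure 15.1 is not reproduced here. As a rough stand-in, the sketch below records the subprogram structure of a hypothetical payroll program as a call graph (which routine calls which) and prints it as an indented one-page outline; the program and routine names are invented for illustration.

```python
def render_structure(graph: dict[str, list[str]], root: str,
                     ancestors: tuple[str, ...] = ()) -> list[str]:
    """Return an indented outline of the subprograms reachable from root."""
    depth = len(ancestors)
    if root in ancestors:
        # Guard against recursion so the outline stays finite.
        return ["    " * depth + root + "  (recursive call)"]
    lines = ["    " * depth + root]
    for callee in graph.get(root, []):
        lines.extend(render_structure(graph, callee, ancestors + (root,)))
    return lines


# Hypothetical call graph: each subprogram maps to the ones it calls.
PAYROLL = {
    "main": ["read_timecards", "compute_pay", "print_checks"],
    "compute_pay": ["gross_pay", "deductions"],
    "deductions": ["tax_table_lookup"],
}

if __name__ == "__main__":
    print("\n".join(render_structure(PAYROLL, "main")))
```

The outline shows phases and nesting, which is what a one-page chart is good for, and leaves the decision logic to the listing.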
Of course such a structure graph neither follows nor needs the painfully wrought ANSI flow-charting standards. All the rules on box shapes, connectors, numbering, etc. are needed only to give intelligibility to detailed flow charts. The detailed blow-by-blow flow chart, however, is an obsolete nuisance, suitable only for initiating beginners into algorithmic thinking. When introduced by Goldstine and von Neumann, the little boxes and their contents served as a high-level language, grouping the inscrutable machine-language statements into clusters of significance. As Iverson early recognized, in a systematic high-level language the clustering is already done, and each box contains a statement (Fig. 15.2). Then the boxes themselves become no more than a tedious and space-hogging exercise in drafting; they might as well be eliminated. Then nothing is left but the arrows. The arrows joining a statement to its successor are redundant; erase them. That leaves only GO TO’s. And if one follows good practice and uses block structure to minimize GO TO’s, there aren’t many arrows, but they aid comprehension immensely. One might as well draw them on the listing and eliminate the flow chart altogether.
In fact, flow charting is more preached than practiced. I have never seen an experienced programmer who routinely made detailed flow charts before beginning to write programs. Where organization standards require flow charts, these are almost invariably done after the fact. Many shops proudly use machine programs to generate this “indispensable design tool” from the completed code. I think this universal experience is not an embarrassing and deplorable departure from good practice, to be acknowledged only with a nervous laugh. Instead it is the application of good judgment, and it teaches us something about the utility of flow charts.
The Apostle Peter said of new Gentile converts and the Jewish law, “Why lay a load on [their] backs which neither our ancestors nor we ourselves were able to carry?” (Acts 15:10, TEV). I would say the same about new programmers and the obsolete practice of flow charting.
Fred Brooks, The Mythical Man-Month, pg. 168 (Emphasis mine)
A basic principle of data processing teaches the folly of trying to maintain independent files in synchronism. It is far better to combine them into one file with each record containing all the information both files held concerning a given key.
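A small illustration of the principle with hypothetical employee data: two separately maintained files, modeled here as dictionaries keyed by employee number, are combined into one record per key, so there is nothing left to keep in synchronism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Employee:
    """One combined record holding everything both files knew about a key."""
    emp_id: int
    name: Optional[str] = None
    salary: Optional[int] = None


def merge_files(names: dict[int, str],
                salaries: dict[int, int]) -> dict[int, Employee]:
    """Combine two keyed files into a single file of complete records."""
    merged = {}
    for emp_id in sorted(names.keys() | salaries.keys()):
        # A key missing from one file shows up as None rather than vanishing.
        merged[emp_id] = Employee(emp_id, names.get(emp_id), salaries.get(emp_id))
    return merged


if __name__ == "__main__":
    for record in merge_files({1: "Adams", 2: "Baker"},
                              {1: 52000, 3: 61000}).values():
        print(record)
```

The gaps (no salary for employee 2, no name for employee 3) are exactly the inconsistencies that two independent files would have hidden.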
Yet our practice in programming documentation violates our own teaching. We typically attempt to maintain a machine-readable form of a program and an independent set of human-readable documentation, consisting of prose and flow charts.
The results in fact confirm our teachings about the folly of separate files. Program documentation is notoriously poor, and its maintenance is worse. Changes made in the program do not promptly, accurately, and invariably appear in the paper.
The solution, I think, is to merge the files, to incorporate the documentation in the source program. This is at once a powerful incentive toward proper maintenance, and an insurance that the documentation will always be handy to the program user. Such programs are called self-documenting.
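Python's standard `doctest` module is one later realization of this idea, offered here as an illustration rather than anything Brooks describes: usage examples live in the docstring, and `doctest.testmod()` re-runs them, so a change to the program that is not reflected in its documentation shows up as a failure. The function below is hypothetical.

```python
import doctest


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit.

    The documentation and its worked examples live in the source itself:

    >>> celsius_to_fahrenheit(100)
    212.0
    >>> celsius_to_fahrenheit(-40)
    -40.0
    """
    return celsius * 9 / 5 + 32


if __name__ == "__main__":
    # Re-run every example in this file's docstrings; silent if all pass.
    doctest.testmod()
```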
Now clearly this is awkward (but not impossible) if flow charts are to be included. But grant the obsolescence of flow charts and the dominant use of high-level language, and it becomes reasonable to combine the program and the documentation.
The use of a source program as a documentation medium imposes some constraints. On the other hand, the intimate availability of the source program, line by line, to the reader of the documentation makes possible new techniques. The time has come to devise radically new approaches and methods for program documentation.
As a principal objective, we must attempt to minimize the burden of documentation, the burden neither we nor our predecessors have been able to bear successfully.
An approach. The first notion is to use the parts of the program that have to be there anyway, for programming language reasons, to carry as much of the documentation as possible. So labels, declaration statements, and symbolic names are all harnessed to the task of conveying as much meaning as possible to the reader.
A second notion is to use space and format as much as possible to improve readability and show subordination and nesting.
The third notion is to insert the necessary prose documentation into the program as paragraphs of comment. Most programs tend to have enough line-by-line comments; those programs produced to meet stiff organizational standards for “good documentation” often have too many. Even these programs, however, are usually deficient in the paragraph comments that really give intelligibility and overview to the whole thing.
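A minimal sketch of the three notions working together, using a hypothetical table-lookup routine (the names `position_of_key` and `NOT_FOUND` are illustrative): symbolic names and type declarations carry meaning, indentation shows nesting, and a single paragraph comment supplies the overview that line-by-line comments do not.

```python
from typing import Sequence

# ---------------------------------------------------------------------
# Locating a key in a sorted table.
#
# The table is searched by repeated halving. At every step the key, if
# present, lies in the half-open window [low, high); each comparison
# discards half of that window, so a table of n entries needs about
# log2(n) comparisons. The routine returns the key's position, or
# NOT_FOUND when the key is absent.
# ---------------------------------------------------------------------

NOT_FOUND = -1


def position_of_key(sorted_keys: Sequence[int], wanted_key: int) -> int:
    low, high = 0, len(sorted_keys)
    while low < high:
        middle = (low + high) // 2
        if sorted_keys[middle] < wanted_key:
            low = middle + 1     # key, if present, lies right of middle
        elif sorted_keys[middle] > wanted_key:
            high = middle        # key, if present, lies left of middle
        else:
            return middle
    return NOT_FOUND
```

Most of the documentation here costs nothing extra: the names and declarations had to be written anyway, and only the paragraph comment is added prose.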
Since the documentation is built into the structure, naming, and formats of the program, much of it must be done when the program is first written. But that is when it should be written. Since the self-documentation approach minimizes extra work, there are fewer obstacles to doing it then.
Fred Brooks, The Mythical Man-Month, pg. 169