What About Reuse?
The best way to attack the essence of building software is not to build it at all. Package software is only one of the ways of doing this. Program reuse is another. Indeed, the promise of easy reuse of classes, with easy customization via inheritance, is one of the strongest attractions of object-oriented techniques.
As is so often the case, as one gets some experience with a new way of doing business the new mode is not so simple as first appears.
Of course, programmers have always reused their own handiwork. Jones says,
Most experienced programmers have private libraries which allow them to develop software with about 30% reused code by volume. Reusability at the corporate level aims for 75% reused code by vol-
ume, and requires special library and administrative support. Corporate reusable code also implies changes in project accounting and measurement practices to give credit for reusability.
W. Huang proposed organizing software factories with a matrix management of functional specialists, so as to harness the natural propensity of each to reuse his own code.
Van Snyder of JPL points out to me that the mathematical software community has a long tradition of reusing software:
We conjecture that barriers to reuse are not on the producer side, but on the consumer side. If a software engineer, a potential consumer of standardized software components, perceives it to be more expensive to find a component that meets his need, and so verify, than to write one anew, a new, duplicative component will be written. Notice we said perceives above. It doesn’t matter what the true cost of reconstruction is.
Reuse has been successful for mathematical software for two reasons: (I) It is arcane, requiring an enormous intellectual input per line of code; and (2) there is a rich and standard nomenclature, namely mathematics, to describe the functionality of each component. Thus the cost to reconstruct a component of mathematical software is high, and the cost to discover the functionality of an existing component is low. The long tradition of professional journals publishing and collecting algorithms, and offering them at modest cost, and commercial concerns offering very high quality algorithms at somewhat higher but still modest cost, makes discovering a component that meets one’s need simpler than in many other disciplines, where it is sometimes not possible to specify one’s need precisely and tersely. These factors collaborate to make it more attractive to reuse rather than to reinvent mathematical software.
The same reuse phenomenon is found among several communities, such as those that build codes for nuclear reactors, climate models, and ocean models, and tor the same reasons. The communities each grew up with the same textbooks and standard notations.
How does corporate-level reuse fare today? Lots of study; rel-
atively little practice in the United States; anecdotal reports of
more reuse abroad.
Jones reports that all of his firm’s clients with over 5000 programmers have formal reuse research, whereas fewer than 10 percent of the clients with under 500 programmers do.26 He reports that in industries with the greatest reuse potential, reusability research (not deployment) “is active and energetic, even if not yet totally successful.” Ed Yourdon reports a software house in Manila that has 50 of its 200 programmers building only reusable modules for the rest to use; “I’ve seen a few cases—adoption is due to organizational factors such as the reward structure, not technical factors.”
DeMarco tells me that the availability of mass-market packages and their suitability as providers of generic functions such as database systems has substantially reduced both the pressure and the marginal utility of reusing modules of one’s application code. “The reusable modules tended to be the generic functions anyway.”
Reuse is something that is far easier to say than to do. Doing it requires both good design and very good documentation. Even when we see good design, which is still infrequently, we won’t see
the components reused without good documentation.
Ken Brooks comments on the difficulty of anticipating which generalization will prove necessary: “I keep having to bend things even on the fifth use of my own personal user-interface library.”
Real reuse seems to be just beginning. Jones reports that a few reusable code modules are being offered on the open market at prices between 1 percent and 20 percent of the normal development costs.27 DeMarco says,
I am becoming very discouraged about the whole reuse phenomenon. There is almost a total absence of an existence theorem for reuse. Time has confirmed that there is a big expense in making things reusable.
Yourdon estimates the big expense: “A good rule of thumb is that such reusable components will take twice the effort of a ‘one-shot’ component.” I see that expense as exactly the effort of productizing the component, discussed in Chapter 1. So my estimate of the effort ratio would be threefold.
Clearly we are seeing many forms and varieties of reuse, but not nearly so much of it as we had expected by now. There is still a lot to learn.
Learning Large Vocabularies—A Predictable but Unpredicted Problem for Software Reuse
The higher the level at which one thinks, the more numerous the primitive thought-elements one has to deal with. So programming languages are much more complex than machine languages, and natural languages are more complex still. Higher-level languages have larger vocabularies, more complex syntax, and richer semantics.
As a discipline, we have not pondered the implications of this fact for program reuse. To improve quality and productivity, we want to build programs by composing chunks of debugged function that are substantially higher than statements in programming languages. Therefore, whether we do this by object class libraries or procedure libraries, we must face the fact that we are radically raising the sizes of our programming vocabularies. Vocabulary learning constitutes no small part of the intellectual barrier to reuse.
So today people have class libraries with over 3000 members. Many objects require specification of 10 to 20 parameters and option variables. Anyone programming with that library must, learn the syntax (the external interfaces) and the semantics (the detailed functional behavior) of its members if they are to achieve all of the potential reuse.
This task is far from hopeless. Native speakers routinely use vocabularies of over 10,000 words, educated people far more.
Somehow we learn the syntax and very subtle semantics. We correctly differentiate among giant, huge, vast, enormous, mammoth; people just do not speak of mammoth deserts or vast elephants.
We need research to appropriate for the software reuse problem the large body of knowledge as to how people acquire language. Some of the lessons are immediately obvious:
- People learn in sentence contexts, so we need to publish many examples of composed products, not just libraries of parts.
- People do not memorize anything but spelling. They learn syntax and semantics incrementally, in context, by use.
- People group word composition rules by syntactic classes, not by compatible subsets of objects.
Fred Brooks, The Mythical Man-Month, pg. 222
Posts Tagged software design
High-level language. The chief reasons for using a high-level language are productivity and debugging speed. We have discussed productivity earlier (Chapter 8). There is not a lot of numerical evidence, but what there is suggests improvement by integral factors, not just incremental percentages.
The debugging improvement comes from the fact that there are fewer bugs, and they are easier to find. There are fewer because one avoids an entire level of exposure to error, a level on which one makes not only syntactic errors but semantic ones, such as misusing registers. The bugs are easier to find because the compiler diagnostics help find them and, more important, because it is very easy to insert debugging snapshots.
For me, these productivity and debugging reasons are overwhelming. I cannot easily conceive of a programming system I would build in assembly language.
Well, what about the classical objections to such a tool? There are three: It doesn’t let me do what I want. The object code is too big. The object code is too slow.
As to function, I believe the objection is no longer valid. All testimony indicates that one can do what he needs to do, but that it takes work to find out how, and one may occasionally need unlovely artifices.
As to space, the new optimizing compilers are beginning to be very satisfactory, and this improvement will continue.
As to speed, optimizing compilers now produce some code that is faster than most programmer’s handwritten code, Furthermore, one can usually solve speed problems by replacing from one to five percent of a compiler-generated program by handwritten substitute after the former is fully debugged.
Fred Brooks, The Mythical Man-Month, pg. 135
What Documentation Is Required?
Different levels of documentation are required for the casual user of a program, for the user who must depend upon a program, and for the user who must adapt a program for changes in circumstance or purpose.
To use a program. Every user needs a prose description of the program. Most documentation fails in giving too little overview. The trees are described, the bark and leaves are commented, but there is no map of the forest. To write a useful prose description, stand way back and come in slowly:
- Purpose. What is the main function, the reason for the pro-gram?
- Environment. On what machines, hardware configurations, and operating system configurations will it run?
- Domain and range. What domain of input is valid? What range of output can legitimately appear?
- Functions realized and algorithms used. Precisely what does it do?
- Input-output formats,, precise and complete.
- Operating instructions, including normal and abnormal ending behavior, as seen at the console and on the outputs.
- Options. What choices does the user have about functions?
Exactly how are those choices specified?
- Running time. How long does it take to do a problem of specified size on a specified configuration?
- Accuracy and checking. How precise are the answers expected to be? What means of checking accuracy are incorporated?
Often all this information can be set forth in three or four pages. That requires close attention to conciseness and precision. Most of this document needs to be drafted before the program is written, for it embodies basic planning decisions.
To believe a program. The description of how it is used must be supplemented with some description of how one knows it is working. This means test cases.
Every copy of a program shipped should include some small test cases that can be routinely used to reassure the user that he has a faithful copy, accurately loaded into the machine.
Then one needs more thorough test cases, which are normally run only after a program is modified. These fall into three parts of the input data domain:
- Mainline cases that test the program’s chief functions for commonly encountered data.
- Barely legitimate cases that probe the edge of the input data domain, ensuring that largest possible values, smallest possible values, and all kinds of valid exceptions work.
- Barely illegitimate cases that probe the domain boundary from the other side, ensuring that invalid inputs raise proper diagnostic messages.
To modify a program. Adapting a program or fixing it requires considerably more information. Of course the full detail is required, and that is contained in a well-commented listing. For the modifier, as well as the more casual user, the crying need is for a clear, sharp overview, this time of the internal structure. What are the components of such an overview?
- A flow chart or subprogram structure graph. More on this later.
- Complete descriptions of the algorithms used, or else references to such descriptions in the literature.
- An explanation of the layout of all files used.
- An overview of the pass structure—the sequence in which data or programs are brought from tape or disk—and what is accomplished on each pass.
- A discussion of modifications contemplated in the original design, the nature and location of hooks and exits, and discursive discussion of the ideas of the original author about what modifications might be desirable and how one might proceed. His observations on hidden pitfalls are also useful.
Fred Brooks, The Mythical Man-Month, pg. 165
The Flow-Chart Curse
The flow chart is a most thoroughly oversold piece of program documentation. Many programs don’t need flow charts at all; few programs need more than a one-page flow chart.
Flow charts show the decision structure of a program, which is only one aspect of its structure. They show decision structure rather elegantly when the flow chart is on one page, but the over view breaks down badly when one has multiple pages, sewed together with numbered exits and connectors.
The one-page flow chart for a substantial program becomes essentially a diagram of program structure, and of phases or steps. As such it is very handy. Figure 15.1 shows such a subprogram structure graph.
Of course such a structure graph neither follows nor needs the painfully wrought ANSI flow-charting standards. All the rules on box shapes, connectors, numbering, etc. are needed only to give intelligibility to detailed flow charts. The detailed blow-by-blow flow chart, however, is an obsolete nuisance, suitable only for initiating beginners into algorithmic thinking. When introduced by Goldstine and von Neumann, the little boxes and their contents served as a high-level language, grouping the inscrutable machine-language statements into clusters of significance. As Iverson early recognized, in a systematic high-level language the clustering is already done, and each box contains a statement (Fig. 15.2). Then the boxes themselves become no more than a tedious and space-hogging exercise in drafting; they might as well be eliminated. Then nothing is left but the arrows. The arrows joining a statement to its successor are redundant; erase them. That leaves only GO TO’s. And if one follows good practice and uses block structure to minimize GO TO’s, there aren’t many arrows, but they aid comprehension immensely. One might as well draw them on the listing and eliminate the flow chart altogether.
In fact, flow charting is more preached than practiced. I have never seen an experienced programmer who routinely made detailed flow charts before beginning to write programs. Where organization standards require flow charts, these are almost invariably done after the fact. Many shops proudly use machine programs to generate this “indispensable design tool” from the completed code. I think this universal experience is not an embarrassing and deplorable departure from good practice, to be acknowledged only with a nervous laugh. Instead it is the application of good judgment, and it teaches us something about the utility of flow charts.
The Apostle Peter said of new Gentile converts and the Jewish law, “Why lay a load on [their] backs which neither our ancestors nor we ourselves were able to carry?” (Acts 15:10, TEV). I would say the same about new programmers and the obsolete practice of flow charting.
Fred Brooks, The Mythical Man-Month, pg. 168 (Emphasis mine)
A basic principle of data processing teaches the folly of trying to maintain independent files in synchronism. It is far better to combine them into one file with each record containing all the information both files held concerning a given key.
Yet our practice in programming documentation violates our own teaching. We typically attempt to maintain a machine-readable form of a program and an independent set of human-readable documentation, consisting of prose and flow charts.
The results in fact confirm our teachings about the folly of separate files. Program documentation is notoriously poor, and its maintenance is worse. Changes made in the program do not promptly, accurately, and invariably appear in the paper.
The solution, I think, is to merge the files, to incorporate the documentation in the source program. This is at once a powerful incentive toward proper maintenance, and an insurance that the documentation will always be handy to the program user. Such programs are called self-documenting.
Now clearly this is awkward (but not impossible) if flow charts are to be included. But grant the obsolescence of flow charts and the dominant use of high-level language, and it becomes reasonable to combine the program and the documentation.
The use of a source program as a documentation medium imposes some constraints. On the other hand, the intimate availability of the source program, line by line, to the reader of the documentation makes possible new techniques. The time has come to devise radically new approaches and methods for program documentation.
As a principal objective, we must attempt to minimize the burden of documentation, the burden neither we nor our predecessors have been able to bear successfully.
An approach. The first notion is to use the parts of the program that have to be there anyway, for programming language reasons, to carry as much of the documentation as possible. So labels, declaration statements, and symbolic names are all harnessed to the task of conveying as much meaning as possible to the reader.
A second notion is to use space and format as much as possible to improve readability and show subordination and nesting.
The third notion is to insert the necessary prose documentation into the program as paragraphs of comment. Most programs tend to have enough line-by-line comments; those programs produced to meet stiff organizational standards for “good documentation” often have too many. Evert these programs, however, are usually deficient in the paragraph comments that really give intelligibility and overview to the whole thing.
Since the documentation is built into the structure, naming, and formats of the program, much of it must be done when the program is first written. But that is when it should be written. Since the self-documentation approach minimizes extra work, there are fewer obstacles to doing it then.
Fred Brooks, The Mythical Man-Month, pg. 169