Is there something wrong with software architecture - or with us?

I am a software architect (one of a few) for a 20-million LOC business software, with currently a few thousand installations, developed and ...

Wednesday, May 3, 2017

Purposes of architectural documentation disentangled

I have been a little unfair in my last posting: The eight pages on UML 2.0 in Gorton's "Essential Software Architecture" are more than a mere advertisement for what was then the new UML version 2.0. They do contain some core advice on how to document the architectural aspects of a program. From these pages and the case study in chapter 7, I'll try to extract a compact view of what architecture documentation is in the eyes of Gorton and, I think, of the mainstream architecture textbooks.

First of all, architecture documentation is a collection of artifacts for human beings only. This is in contrast to code, which is targeted both at the "machine" and at human readers. In the background looms the idea of model-driven architecture, where an architecture model is used to create code: essentially a compiler for a new language on some "higher" level than standard programming languages. However, like the book, I will disregard this aspect for now and return to it later.

The clear goal of providing information to humans has led most of us to use informal diagrams and plain prose to describe the architectural aspects of a software system: "simple box-and-arrow diagrams", as Gorton calls them. He claims that there is "an appropriate diagram key to give a clear meaning to the notation used" in his examples, but most diagrams in his chapters 1 to 5 don't have such a key, and in any case, most people drawing such diagrams don't include one. The problem with this is that any plan to derive hard facts from such diagrams is doomed.

Now, one purpose of architecture documentation is to give someone a "feeling of the interplay of things", and for this purpose, informal diagrams with written or oral explanations are perfectly fine and, I am quite sure, even preferable: They appeal to our intuitive approach to most problems, which includes working with somewhat vague terms and relations so that we need not think through every tricky consequence, leaving our mind free to "suck in the universe" of the problem area at hand.

Maybe it should be noted that formal clarity, precise meaning and even "simple" (mathematical) consistency almost always entail "hard thought work", as the history of mathematics has shown:
  • Geometry in the plane seems like an easy subject, until you try to understand its foundations and constructions from Euclid's axioms and definitions, well over 2300 years old: There is nothing easy about concepts like parallels or ratios of line segment lengths! And later formalizations, mainly from the 1800s onwards, are even more intricate.
  • The other apparently "simple" basis of mathematics, the natural numbers, also lost its simplicity in ancient times, with the Greeks' work on prime numbers. It was and is by no means obvious what can emerge from simple addition and multiplication, let alone from the algebraic structures and formalizations developed in the 19th century, which eventually led to Gödel's mind-bending encodings and Turing's work.
Let me state this in my "Axiom 1": Mathematics, by and large, is not what we want in software documentation (and that from me, who majored in theoretical computer science ...).

Still, it seems we all want something more than informal box-and-arrow diagrams.

Gorton, like many others, proposes the use of UML. I cannot help feeling that he is not really happy about it. The summary of chapter 6 contains the following two sentences:
I’m a bit of a supporter of using UML-based notations and tools for producing architecture documentation. The UML, especially with version 2.0, makes it pretty straightforward to document various structural and behavioral views of a design.
"A bit of a supporter", "pretty straightforward": This does not really sound like wholehearted endorsement.

So, what is the problem?

The problem is, in my humble opinion, that there is no clear picture of what a notation for architectural documentation should do. The described use cases typically oscillate between a "better notation" for those informal, easily comprehensible overviews of some aspects of a software system, and a more formal notation that can help derive hard knowledge about a system, with the implied goal of "generating code" in model-driven approaches.

After many years in the field, I am now certain that we have to divide the use cases for architectural documentation into three classes, with a different notation for each:
  1. Informal documentation, from which humans can easily learn and intuitively gather a common understanding and a useful overview of some aspects of the system. In the best case, such documentation is part of a common culture about "how we name and see things." However, this documentation is not intended for deriving any hard facts: Everything shown can be disputed, discussed and viewed differently, and the notation can be extended at will if that helps intuitive understanding. All must agree that formal arguments based on such documentation are futile and must therefore be avoided.
  2. Formally sound and precise documentation that can be used to derive invariants and definitive properties of the documented system. If such documentation is used as the basis for a tool-supported model-driven approach, then, for the aspects covered by the process, there is no difference between descriptive and prescriptive architectural documentation. However, such an approach is very expensive in more than one respect:
    • First, especially without full tool support, keeping such documentation in line with the system is a lot of work, as even tiny changes on one or both sides require precise updates.
    • Second, as software can exhibit very complex behavior, the notation must be capable of describing many and usually deep concepts, which makes it hard and "mathematical" to understand and even harder to write. Such documentation therefore blatantly contradicts "Axiom 1".
    • Last, on a conceptual level, it is not really clear that such documentation is actually "documentation" in the sense of "humanly accessible information relevant for many decisions in the software life-cycle". Rather, it might be more of a formal specification or even (when used in a model-driven process with code generation) part of the implementation, albeit (maybe) on some higher or "more compact" level than standard programming languages.
Thus, rich informal notations and deep formal notations alone are not sufficient for documenting and arguing about the architectural aspects of software.
  3. Therefore, we need notations that lie somewhere in between: They must not be informal, so that they can be used to derive and ensure hard facts. But equally, they must be easy to use, so that the average software engineer can read and write them under average project circumstances. It should be obvious that this type of notation can be neither very rich nor very abstract. Only then can it avoid, on the one hand, requiring an extensive semantics for formal derivations and, on the other hand, being too esoteric to be used for understandable documents. In other words, it must be a quite mundane notation. I'll show my preferred notation for this, and its uses, in later postings, just in case you think that this looks a little like the search for the holy grail.
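To make the idea of a "mundane but strict" notation a little more tangible, here is a minimal Python sketch. It is not the notation announced above for later postings; the plain-text format ("source -> target", one allowed module dependency per line) and the names `RULES`, `parse_rules` and `violations` are hypothetical, chosen only to show how a very simple notation can still yield checkable hard facts.

```python
# Hypothetical "mundane" notation: one allowed module dependency per line,
# written as "source -> target". This format is an assumption for
# illustration, not the notation the author presents in later postings.
RULES = """
# layering of a hypothetical system
ui -> services
services -> persistence
"""


def parse_rules(text: str) -> set[tuple[str, str]]:
    """Parse 'source -> target' lines into a set of allowed dependency pairs."""
    allowed: set[tuple[str, str]] = set()
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        source, target = (part.strip() for part in line.split("->"))
        allowed.add((source, target))
    return allowed


def violations(actual: set[tuple[str, str]],
               allowed: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the actual dependencies that the documented rules do not permit."""
    return sorted(actual - allowed)


if __name__ == "__main__":
    allowed = parse_rules(RULES)
    # In practice, "actual" would be extracted from the code base
    # (e.g. from import statements); here it is hard-coded for illustration.
    actual = {("ui", "services"), ("ui", "persistence")}
    print(violations(actual, allowed))  # [('ui', 'persistence')]
```

The notation needs no semantics beyond "may depend on", so anyone can read and write it, yet a violation such as `ui -> persistence` is a hard, mechanically derivable fact rather than a matter of interpretation.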
UML, incidentally and unfortunately, does not work really well for any of these purposes if its complex semantics is taken seriously:
  1. For an informal notation, it carries too heavy a backpack of formal semantics, which no one wants to keep in mind when drawing informative diagrams in running text (as, e.g., in the case study in Gorton's book).
  2. For a formal notation, it is too indirect: One needs to map UML propositions back to the underlying semantic model (like Petri nets or state machines), and only then can one formally draw conclusions; as far as I can tell, the number of publications that use UML as a formal basis has declined quite a bit over the last years.
  3. Finally, as a simple but strict notation, UML is much too baroque, because it was lobbied to include every useful diagram and icon. This large notational size would recommend it for many different informal diagrams, if it weren't for that formal semantics ballast ...
But even if you think that UML does work well (or well enough) in one area, there is the danger of misinterpreting UML diagrams: Is a diagram that your team uses as a basis for a decision a "type 1." diagram? Then it conveys informal concepts but does not constrain the decision strictly or formally. A "type 2." or "type 3." diagram, on the other hand, would narrowly limit some choices you can make, and would definitely require a formally (for "type 2.") or at least collectively (for "type 3.") approved update of the diagram for any change in the software or the architecture. But most diagrams do not explicitly state their "conformance level".
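One way to avoid this ambiguity is to attach the conformance level to every diagram as explicit metadata. The following Python sketch is a hypothetical illustration: the names `Conformance`, `Diagram` and `update_rule` are invented here, and the mapping merely restates the update requirements described in this post for type 1, 2 and 3 diagrams.

```python
from dataclasses import dataclass
from enum import Enum


class Conformance(Enum):
    """Hypothetical labels for the three documentation types in this post."""
    INFORMAL = 1        # type 1: conveys concepts, does not constrain decisions
    FORMAL = 2          # type 2: formally sound and precise
    SIMPLE_STRICT = 3   # type 3: mundane notation, but binding


@dataclass(frozen=True)
class Diagram:
    name: str
    conformance: Conformance  # stated explicitly instead of being left implicit


def update_rule(diagram: Diagram) -> str:
    """Describe what a change in software or architecture implies for the diagram."""
    if diagram.conformance is Conformance.INFORMAL:
        return "no mandatory update; the diagram does not constrain the decision"
    if diagram.conformance is Conformance.FORMAL:
        return "formally approved update required"
    return "collectively approved update required"


if __name__ == "__main__":
    diagrams = [
        Diagram("system overview sketch", Conformance.INFORMAL),
        Diagram("layer dependency rules", Conformance.SIMPLE_STRICT),
    ]
    for d in diagrams:
        print(f"{d.name} (type {d.conformance.value}): {update_rule(d)}")
```

Whether such a label lives in code, in a diagram's title block or in a wiki template matters less than that it is stated at all, so that a team knows whether a diagram may be argued with or must be obeyed.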

Nonetheless, our analysts and some of our developers and architects (including me) are happy enough to use UML as a pool of symbols for sketching explanatory diagrams that help us keep our complex machinery at least somewhat documented. So yes, I am, and we are, also "a bit of a supporter of using UML-based notations and tools", as Ian Gorton puts it.

But now, I feel, I am starting to owe you an explanation of how to do architectural documentation better. The next posting ... well, after I wrote it, it turned out to still cover some general observations about software architecture and how we deal with it.
