
Monday, May 15, 2017

Models, notations, and languages

In previous postings, I used the terms "notation" and "language" too sloppily. Here is a short explanation of how I intend to use them in subsequent postings—I hope that this is in line with common usage:
  • Whenever we want to work with some real-world things, we need a model of them. The model is a more or less rigid (mathematical) abstraction of the object(s) under consideration (which are called the "universe"). For example, a real CPU might be modelled via an abstract processor, which only considers its assembly-level commands, but not e.g. its heat emission. A model of an SQL database might only consider tables, columns, and views, but "abstract away" triggers, stored procedures, and everything else the vendor might have added as a feature.
  • A notation is a set of symbols that adheres to some syntax. Many notations are linear text notations (all programming languages I know of), but there are also graphical notations like UML's notation.
  • A language is a combination of a model and a notation, where the notation is mapped to the model or to modifications of it.
The last definition implies that one can have many languages for the same model. Here is a simple example of this: Let our model (and also our universe) be expressions over integral numbers, with e.g. subtraction, multiplication, and evaluation. Three possible notations are
  • parenthesized infix expressions, e.g. "(5 - 3) * (5 - 2)"
  • postfix expressions, e.g. "5 3 - 5 2 - *"
  • and a tree notation that shows the expression tree.
All three (and many other) notations can be mapped to the model in such a way that they "compute the result" of an expression correctly.
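
To make this mapping concrete, here is a small sketch in Python (the language and all names are my own choice, purely for illustration): the model is an expression tree with an evaluation function, and two of the notations above are mapped onto it; both denote the same model element and therefore yield the same result.

    # A minimal sketch of "many notations, one model": the model is an
    # expression tree over integers with subtraction and multiplication;
    # two notations (postfix text and direct tree construction) are
    # mapped onto the same model and evaluate to the same value.

    from dataclasses import dataclass
    from typing import Union

    # --- the model: expression trees with an evaluation function ---

    @dataclass
    class Num:
        value: int

    @dataclass
    class Sub:
        left: "Expr"
        right: "Expr"

    @dataclass
    class Mul:
        left: "Expr"
        right: "Expr"

    Expr = Union[Num, Sub, Mul]

    def evaluate(e: Expr) -> int:
        if isinstance(e, Num):
            return e.value
        if isinstance(e, Sub):
            return evaluate(e.left) - evaluate(e.right)
        return evaluate(e.left) * evaluate(e.right)

    # --- one notation: postfix text, mapped to the model via a stack ---

    def parse_postfix(text: str) -> Expr:
        stack = []
        for token in text.split():
            if token == "-":
                right, left = stack.pop(), stack.pop()
                stack.append(Sub(left, right))
            elif token == "*":
                right, left = stack.pop(), stack.pop()
                stack.append(Mul(left, right))
            else:
                stack.append(Num(int(token)))
        return stack.pop()

    # --- another notation: the tree written down directly ---

    tree_notation = Mul(Sub(Num(5), Num(3)), Sub(Num(5), Num(2)))

    # Both notations denote the same model element, so both evaluate to 6,
    # just like the parenthesized infix expression "(5 - 3) * (5 - 2)".
    assert evaluate(parse_postfix("5 3 - 5 2 - *")) == 6
    assert evaluate(tree_notation) == 6

A parser for the parenthesized infix notation would be a third mapping onto the very same model; it is omitted here only to keep the sketch short.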

Having a good notation is important, but getting notation right is better done by a series of experiments with real people than by a conceptual process. I will, therefore, not be very stubborn about notations for architectural problems. On the other hand, I will try to find a single small representation and manipulation model for architectural problems; and then try to argue that the chosen model is sufficient and practical and, well, good.

In spite of my laissez-faire approach to notation, I do hold a few beliefs about notation that I will try to argue for more or less emphatically.

The most important is that any notation must scale to large descriptions. Thus, it must be possible to describe, in a manageable and legible way, a system that consists of a "flat 1000 different parts". By "flat 1000 different parts", I mean that the notation must not force the writer and the reader to introduce any sort of abstractions solely because the notation becomes unwieldy. I call this the "telephone directory property": A useful notation must be capable of practically notating a large, boring list of slightly different things "just so".

As a special case, I will not consider any diagrammatic notations for the moment (later, I'll come back to diagrams). For almost the complete history of software engineering, people—intelligent people—have tried to come up with a graphical replacement for formal textual languages like programming languages. There is a complete theory and much practical experience with two-dimensional diagram languages—but on the whole, they have never replaced textual languages in anything but small, and often not-too-critical, software systems. The reason is exactly that diagram notations do not have the "telephone directory property"—diagrams describing a 1000-part system are, for all practical purposes, unusable: They cannot be viewed easily (especially if they contain longer, winding line paths), cannot be printed easily, and cannot be manipulated easily. The moral: Designing and maintaining diagrams that are not useless from the outset is very hard.

(If you think "UML", and especially "UML according to all the software architecture textbooks out there", I remind you that my focus is not the use of diagrams for informal or "semi-formal"—whatever that means—purposes. For this, many diagram notations are perfectly fine. But I consider only what I called "use case no. 3", i.e., languages for describing and maintaining architectures that have a strict semantics that can be used to prove or maintain something interesting in a software system.)

There are a few more aspects—important aspects—that will influence all the many parts I want to assemble for useful "rule-based architecturing", but in order to keep the suspense low, I will now immediately give away what my proposed model is: (Finite) directed graphs with labelled edges and nodes. I will not restrict this quite general model much more, except that I have to define the allowed labels. They are:
  • A node is identified by a label that is a tuple of strings. In addition, a node can have informational tags, each of which consists of a name and a real number.
  • An edge is identified only by the nodes at its ends. It has three counts, called the overall count, the questionable count, and the bad count. In addition, it can also have name+number tags, just like a node.
  • Both nodes and edges can carry arbitrary source information that is intended to help find the object from which the node or edge was derived at some time.
For "historical reasons" (we invented the basics of this model some 10 years ago), I use the following terms:
  • Nodes are called items.
  • Edges are called dependencies, and the two items at the end of a dependency are called the using item and the used item.
  • The identifying strings of items are called values, and the non-identifying tags of items and dependencies are called markers.
I hope that these terms convey a rough idea of the purposes for which they are used.
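
To make the definitions a bit more tangible, here is a minimal sketch in Python. The class and field names follow the terminology above, while the types, defaults, and the example values at the end are my own illustrative assumptions, not a description of any existing implementation.

    # A minimal sketch of the model as defined above; everything beyond
    # the names taken from this posting is an illustrative assumption.

    from dataclasses import dataclass, field
    from typing import Dict, Optional, Tuple

    @dataclass
    class Item:
        # A node is identified by a tuple of strings ("values") ...
        values: Tuple[str, ...]
        # ... and can carry informational name+number tags ("markers").
        markers: Dict[str, float] = field(default_factory=dict)
        # Arbitrary source information, e.g. a file name.
        source: Optional[str] = None

    @dataclass
    class Dependency:
        # An edge is identified only by the items at its ends.
        using_item: Item
        used_item: Item
        # The three counts of an edge.
        overall_count: int = 0
        questionable_count: int = 0
        bad_count: int = 0
        # Edges can carry markers and source information as well.
        markers: Dict[str, float] = field(default_factory=dict)
        source: Optional[str] = None

    # Hypothetical example values, just to show the shapes:
    a = Item(values=("SomeApp.Core", "OrderService"), source="OrderService.cs")
    b = Item(values=("SomeApp.Data", "OrderRepository"), source="OrderRepository.cs")
    d = Dependency(using_item=a, used_item=b, overall_count=3, questionable_count=1)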

But—for which purposes are they used?
