Analysis Tools and Michael Hicks

Dated: 

21 September 2016

Before really discussing category mistakes in computer science in follow-up posts, I will first continue testing my aforementioned categories on the writings of people I admire the most. I shall take a 2014 blog post, written by Michael Hicks in connection with the Heartbleed bug. (His post was written on July 1st, 2014 and I accessed it on September 19th, 2016.)

To recapitulate, and based largely on the work of Raymond Turner, I distinguish between three separate categories:

  • computers are concrete physical objects,
  • computer programs and also computer programming languages are technical artefacts, and
  • Turing machines, finite state machines, and prime numbers are abstract objects.

Furthermore, based on the writings of James Moore, James Fetzer, Timothy Colburn, Donald MacKenzie (and others whom I have cited repeatedly in previous blog posts), at the very least, a conceptual distinction is in order between a "mathematical program" and a "computer program". Based on Turner's work, I can say today that the former belongs to the category of abstract objects while the latter belongs to the category of technical artefacts.

Now, coming to the blog post of Michael Hicks, I have two observations to make. First, he writes about:

"full formal verification, the end result of which is a proof that the code will always behave as it should. Such an approach is being increasingly viewed as viable."

So Hicks says we can have a mathematical proof about the behavior of an engineered system. Fetzer already made clear that this is not possible. The best we can do is prove a mathematical property about some mathematical model of a running system. And doing so can, indeed, be of great value in practice, as I experience daily. Hicks can only be right (and Fetzer is then wrong) if the mathematical object at hand and the engineered system under scrutiny belong to the same category.

Have I taken Hicks's words out of context? Am I wrong in linking Hicks's statement to that of John Reynolds and Dr. X? Judge for yourself. Links to statements made by Tony Hoare and other great minds in computer science have already been provided by Fetzer, Colburn, and MacKenzie.

Regardless of whether you think Fetzer and I are wrong, at least we can all agree on the following historiographical observation: the best of the best in computer science have largely ignored the writings of philosophers of computer science. They do not mention Fetzer and the like. Perhaps they have some very good reasons not to do so (and perhaps they do not).

 

Soundness & Completeness

My second point concerning Hicks's blog post is about the alleged practical implications of Rice's Theorem. This is where my own thoughts are put center stage, for the writings of Fetzer and the other aforementioned philosophers do not cover computability theory.

Hicks provides definitions of a "sound analysis" and a "complete analysis" — definitions that I will scrutinize later. Hicks then provides the following statement which I honestly have difficulty comprehending:

Ideally, we would have an analysis that is both sound and complete [Yes], so that it reports all true bugs, and nothing else [No!].

Even if we would have a mathematical analysis that is both sound and complete, then this mathematical accomplishment cannot guarantee something about the real world, including the behavior of the engineered system under scrutiny — unless, again, the mathematical object and the engineered system belong to the same category (and, moreover, we can specify absolutely everything about the engineered system in a concise and useful manner).

Is Hicks's reasoning flawed or am I missing something? This is a genuine question. In my words, a sound and complete analysis can provide engineers extra confidence that their system will behave appropriately in the real world.

Let me just mention at this point that Edsger Dijkstra, Aad van Wijngaarden, and several others would have mathematically modeled a computer program with a Turing-incomplete model of computation and, specifically, with a finite state machine. So I also struggle with Hick's suggestion to only use Turing-complete languages when mathematically modeling computer programming languages. I believe he makes that suggestion in the following passage:

Unfortunately, such an [ideal] analysis [which is both sound and complete] is impossible for most properties of interest, such as whether a buffer is overrun (the root issue of Heartbleed). This impossibility is a consequence of Rice's theorem, which states that proving nontrivial properties of programs in Turing-complete languages is undecidable. So we will always be stuck dealing with either unsoundness or incompleteness.

I definitely agree with Hicks that if we use Turing-complete languages, then we cannot ignore Rice's Theorem, "which states that proving nontrivial properties" of our mathematical programs (expressed in our Turing-complete language) "is undecidable." But most engineers I know don't have a preference for

  • sticking to one modeling language only, nor do they
  • advocate a Turing-complete language per se.

Contrary to programming language specialists, engineers don't want to attach precisely one meaning to each computer program and, likewise, to each computer programming language. A technical artefact can be mathematically modeled in more than one way. Each model has its pros and cons. I can model a computer with both a finite state machine and a linear bounded automaton. Likewise, I can model my C computer program in multiple, complementary ways; e.g., with a finite state machine, with primitive recursive functions, and with general recursive functions. The richness lies in the multitude of ways to mathematically model reality.

Do these relatively simple philosophical principles of programming languages have any bearing on Michael Hicks and his community of researchers?

 

Hicks's definitions

The root of the confusion, I believe, lies in Hicks's defintions. Here is what Hicks says about a "sound analysis:"

  • A sound analysis is one that, if there exists an execution that manifests a bug at run-time, then the analysis will report the bug.

I would like to emphasize that Hicks's "execution" is a mathematical object and, moreover, it is a mathematical object in Hicks's chosen model of computation. (Somebody else can choose another model of computation). Of course the word "execution" refers to a "real execution" but the reference is only indirect and the indirection can be made more explicit for the sake of conceptual clarity by distinguishing between "mathematical executions" and "real executions". The former belongs to the category of abstract objects while the latter belongs to the category of concrete physical objects.

Again, a sound analysis says something about the mathematical program and only indirectly something about the computer program under scrutiny. Furthermore, since someone else can choose another mathematical program to model the same computer program, it is misleading to suggest that there is a one-to-one mapping between the computer program and the chosen mathematical program.

Similar remarks can be made about Hicks's definition of a "complete analysis:"

On the flip side, a complete analysis is one that, if it reports a bug, then that bug will surely manifest at run-time.

There is a categorical distinction to be made between a mathematically modeled bug and a bug encountered during the execution of a real-world program. They are not the same thing.

Am I right to conclude that Hicks and other programming language specialists oftentimes think they are directly referring to both their mathematical model and the actual computer program?

Tags: