An Exercise in Unplugging the Church-Turing Thesis [Part 2]


13 February 2019
In my previous blog post I presented my latest work on the Church-Turing Thesis: a paper that I submitted for peer review to a computer science journal in December 2018. In my paper, I introduce the ADM model of computation and claim that it is more powerful than the traditional Turing Machine (TM) model of computation. In the first week of February 2019, I received (from colleagues, via email) three potentially good reasons why I might have made a mistake or overlooked something in my paper. Each critique, along with my rebuttal, is presented in a separate section below.

Operationally, there is no difference between TM's and ADM's

The critique goes like this:
1.) Any ADM can be simulated by a standard TM. The data structure in Definition 11 and the subsequent description of how an ADM machine executes a move (step of computation) are clearly implementable in any general purpose model of computation (such as Turing Machines).
2.) The argument outlined at the bottom of page 2, and which is used in the proof of Theorem 25, does not show that ADM's produce a strictly larger class of partial numerical functions f than are computed by TM's.
I endorse Statement 1, both on page 4 and in the rest of my paper. Operationally, there is no difference between TM's and ADM's. Loosely speaking, Statement 1 is about the operational semantics of ADM's, which is no different from the operational semantics of TM's. To be more precise, the “next move” relation between two instantaneous descriptions is (intrinsically) the same for both ADM's and TM's. That is why Statement 1 is correct.
Does Statement 1 imply Statement 2? The answer is “yes” according to the person who made the above remark. In contrast, an underlying theme in my paper is that:
Statement 1 (which is about the operation of a machine) does not imply Statement 2 (which is about the denotation f of a machine).
Indeed, in computer science it is perfectly possible that:
  • The denotations of machines M and N are functions f and g, respectively.
  • Machine M operationally simulates machine N (and vice versa). That is, M and N have the same input/output behavior at the syntactic level of strings.
  • Functions f and g are not equal to each other.
So, if there is only one message I can get across about my paper, then it is this:
I have deliberately ensured that no operational distinction can be made between ADM's and TM's. I have, however, arranged the denotational semantics of both kinds of machines to be different. The operational semantics (of both ADM's and TM's) is incomplete: it excludes the interpretation (made by the human observer) of the raw syntax. The denotational semantics, in contrast, is complete in this sense and thus sits in the driver's seat.
How, then, do the denotations between ADM's and TM's differ? Can I explain informally? Yes. The difference amounts to:
A.) In my ADM account, you can look at the output tape when the machine says it has produced a new output symbol. (In fact, you can always observe the output tape.)
B.) In the traditional TM story, you are prohibited from looking at the output tape if the machine has not (yet) halted.
The output tape (in both A and B) is a write-only tape, and output symbols are printed from left to right only.
To further understand the denotational difference between ADM's and TM's, here's how you can misuse an ADM machine so that it does no more than a TM:
  1. Feed the finite input to the machine.
  2. Press the “go” button.
  3. Close your eyes until the machine says “I have halted.”
  4. Then, and only then, open your eyes and read the finite output.
In Step 3 you, the user, are asked to consistently ignore the following kind of feedback from the machine: “I already have part of the final output for you to read.” Surely, humans use machines in the ADM way, not the TM way. That is, humans do look at the intermediate, definite output symbols and they do something with them while the machine continues to operate. Likewise, an ADM resembles a mathematician better than a TM does: we look at the mathematician's intermediate output results (= lemmas) while awaiting the mathematician's grand theorem.
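The contrast between the A-style and B-style use of a machine can be sketched in code. This is only an illustration under my own assumed names (the generator-based `machine` and the two observer functions are not constructions from the paper): one machine prints its output symbols from left to right, and it is observed in two different ways.

```python
def machine():
    """Toy machine: prints output symbols one at a time, left to right,
    then halts. A non-halting machine would simply never return."""
    for symbol in "001":
        yield symbol  # a new, definite symbol appears on the write-only tape

def observe_adm_style(m):
    """A-style use: look at the output tape each time a symbol appears."""
    seen = []
    for symbol in m:
        seen.append(symbol)   # each intermediate symbol is read immediately
    return "".join(seen)

def observe_tm_style(m):
    """B-style use: close your eyes until the machine halts, then read."""
    tape = list(m)            # block until the generator is exhausted ("halts")
    return "".join(tape)      # only now is the output tape inspected

print(observe_adm_style(machine()))  # symbols were readable as produced
print(observe_tm_style(machine()))   # output readable only after halting
```

For a halting machine the two observers end up reading the same output string. The difference only bites for a non-halting machine: the A-style observer still sees every definite output symbol as it appears, while the B-style observer sees nothing at all.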
At this point the critical reader might object that my non-standard denotational semantics has to comply with “locality” and “finite-in-time” constraints. I agree and my ADM model does exactly that. After all, my ADM model is almost completely identical to the traditional TM model.
An important remark: I use the words “TM model” to refer to the raw TM syntax (= quintuples, tape squares, ...), the next-move function (cf. operational semantics), and the “implements-a-function” relation (cf. denotational semantics). Concerning the latter, function f is, in my paper, consistently called the “semantics” or the “functional semantics” of the TM at hand. The distinction I make between “syntax” and “semantics” complies with standard terminology used in both linguistics and logic. And I deliberately do not use the terms “operational semantics” and “denotational semantics” in my paper, for they are foreign to many recursive function theorists.
Coming now to the comment (made above in Statement 2) about my Theorem 25: my proof of that theorem is no different in character from any other classical diagonal argument in computer science, in that I refer to both the denotation of a machine and its operation. The person who provided the above critique has chosen to compare only the operational semantics of the two kinds of machines used in my proof. He has not scrutinized my complete mathematical argument, which explicitly refers to functions and the different ways in which ADM's and TM's implement them.
As with any classical diagonal argument, the denotation (semantics) of each machine has to be taken into account.
My remark thus also holds for the diagonal arguments presented in standard computer science textbooks. The whole point, after all, is to prove theorems about static, mathematical objects; that is, the partial numerical functions f; that is, the denotations.
Begin of Intermezzo.
One could also use a machine M as a language acceptor. In this case the denotation of M is a set of natural numbers. I also discuss this second kind of denotation of my ADM's in my paper, and contrast it with the one used in standard computability textbooks.
One could, like Alan Turing in 1936, use a machine M to calculate the infinite digits that represent a particular real number. In this case the denotation of M is a real number. I also discuss this third kind of denotation in my paper.
My point is that it is tempting, and not always harmless, to ignore the distinction between abstract objects (denotations) and concrete representations (syntax) — as Turing came to realize (even more) after submitting his 1936 paper (for publication), which led him to write his 1937 correction: a beautiful explanation is provided in [3]. Did Turing, in writing, explicitly distinguish between the operation of a machine and its denotation when spelling out his diagonal argument? I take the answer to be “no,” nor was this necessary.

End of Intermezzo.

The person who provided the above critique is an expert in programming language semantics. So, here's a more refined way for me to rebut:
For most programming language experts, the operational semantics of a programming language L is given precedence over the denotational semantics of L, should there be a discrepancy between the two [1, Section 4.2]. However, in computability theory, when experts (such as Hopcroft & Ullman) reason about computability per se, they are (often implicitly) giving priority to the denotation f of a machine. The simulation of one machine's operations by another is merely a tool to prove something about partial numerical functions (or other denotations).
The implicit assumption, made by the person who approves both Statements 1 and 2, goes as follows: There is no discrepancy between the operational semantics of raw TM's and any conceivable, realistic, denotational semantics of raw TM's. This assumption is false. In my paper I contrast the standard, functional denotation of a raw TM (cf. Hopcroft & Ullman) with a non-standard, functional denotation (which brings us to my ADM story). They are not the same. And then I rigorously prove that ADM's do produce a strictly larger class of partial numerical functions f than are computed by TM's — contrary to Statement 2.
My submitted paper has been rejected without peer review. (This comes as no surprise. I recently read an article by Paul Vitanyi in which he explains that it took about a decade for him to get some of his ideas officially accepted [2].) I had a very brief correspondence with the editor of the journal. His professional feedback is well-intended. I have paraphrased all of his technical remarks in Statements 1 and 2 above. I take him to be reading my proof of Theorem 25 merely in terms of an operational semantics. From that restricted perspective, his reasoning makes sense to me. Unfortunately, consistently eschewing the distinction between a machine's simulation and a machine's denotation is unwarranted, even when thoroughly studying classical textbooks.
I don't think somebody with the profile of Martin Davis will use Statements 1 and 2 to scrutinize my research findings. This brings me to the following critique of my work.

Aren't you merely shifting the problem of defining the possible values of computable functions?

The second critique again pertains to my informal explanation at the bottom of page 2, and it goes like this:
3.) The fact that ADM V, in case of non-termination, outputs 0 (i.e. bottom), only shifts the problem of defining the possible values of computable functions.
4.) In my view, machine V in this case simply does not return a value.
First of all, with regard to Statement 3, terminating ADM's, too, are allowed to output the string 0. Second, concerning Statement 4, what does “return” mean in the phrase “a machine that returns a value”? Does the machine have to halt in order to “return” a value? The classical answer is “yes,” and my ADM's do not comply with this classical understanding of what an algorithm entails. But in my paper I quote Moshe Vardi, Robin Milner et al.; that is, researchers who have embraced a non-classical understanding of the word “algorithm” (for several decades now). In this setting, my ADM's fit the bill.
Two more detailed comments now follow in connection with both Statements 3 and 4. First of all:
In classical computability theory, the “undefined value” is not represented in the syntax; that is, on the tape of a Turing machine.
In my paper I consistently represent the “undefined value” with the empty string.
(And see Remark 9 on page 13 if one wishes to represent the “undefined value” with a non-empty string, although that would complicate some matters.)
In other words: the “undefined value” can be represented in, say, bits on the tapes of my machines. The caveat is that my machines will never halt with the empty string as output. This comes with an increase, not a loss, of generality concerning the partial numerical functions (that are "algorithmic").
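A minimal sketch of this convention (the function name below is mine, for illustration only): the empty string is the explicit, on-tape representation of the undefined value, while any non-empty bit string denotes an ordinary natural number.

```python
def denotation(output_tape):
    """Map a machine's raw output string to its denoted value.
    The empty string is the on-tape representation of 'undefined';
    a (well-behaved) machine never halts with it as final output."""
    if output_tape == "":
        return None             # "undefined" is represented, not absent
    return int(output_tape, 2)  # otherwise read the bits as a natural number

print(denotation(""))     # the undefined value, written as zero symbols
print(denotation("0"))    # the natural number 0: a perfectly ordinary output
print(denotation("101"))  # the natural number 5
```

The point of the sketch is only that “undefined” gets a concrete representation in the syntax, which classical computability theory does not provide on the tape itself.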
Second, if the critical reader insists that only “bottom” and not, say, the natural number 0, may be associated with non-terminating computation, then she is absolutely right that my ADM model of computation should be rejected. However, there is no harm done in associating multiple natural numbers with non-terminating behavior. Nothing in computability theory precludes this.

How will you ever know?

The third critique is already addressed at length in my paper. It goes like this:
5.) How will you ever know that the “real” output of your ADM V (as in Theorem 25) is 0 or 00 or 01? Seeing just 0 on the output tape you will never know if this is the final output.
I counter the question posed in Statement 5 with another one:
How will you ever know that your favorite TM computation halts?
You don't know in the general case, and thus also not in a classical diagonal argument pertaining to TM's. (The point of diagonalization is that you don't know.) So, the critique in Statement 5 also holds for TM's, not just ADM's. It is the infinite tape which is the source of this “problem,” not my ADM's. And both TM's and ADM's have infinite tapes.
Coming to another concern raised in Statement 5: a direct practical application of a Turing machine that diagonalizes does not exist either. The value of diagonalizers (be they ADM's or TM's) is that they help the pure mathematician to prove impossibility results, which, if used meticulously, can have industrial implications.
To recap, if you want to use ADM's in practice, then you will have to introduce constraints, just like we do with TM's. In practice, nobody waits forever to observe a computation.
Perhaps the following refined remarks help:
A.) For every TM which computes a partial function f, there is an ADM which computes the same function f.
B.) There is an ADM which computes a partial function g, which no TM computes.
Statement A can be strengthened:
For every TM which computes a partial function f, there is an ADM which computes the same f by halting on each (encoded) input x for which f(x) is well-defined.
So, anything you want to do with Turing machines, you can also do with well-chosen ADM's. Colloquially speaking: there are plenty of ADM's that don't have the annoying “printing later” effect that is alluded to in Statement 5. For every halting TM there is an ADM that mimics the beautiful, finite behavior of that TM and partially computes the same function; that is, has the same denotation.
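The strengthened Statement A can be illustrated with a toy sketch (the names are assumed for illustration and this is not the paper's construction): wrap a halting TM so that the wrapper prints the complete output only once the TM has halted, and then halts itself. The denotation is unchanged, and there is no “printing later” effect.

```python
def tm_successor(x):
    """Stand-in for a halting TM computing a total numerical function."""
    return x + 1

def adm_mimic(tm, x):
    """ADM-style wrapper: run the TM to completion first, then print the
    final output left to right on the write-only tape, and halt."""
    result = tm(x)                # the TM halts with its full output
    for bit in format(result, "b"):
        yield bit                 # only definite, final symbols are printed

# The wrapper halts on every input on which the TM halts,
# and its output denotes the same value the TM computed.
print("".join(adm_mimic(tm_successor, 4)))
```

On input 4 the wrapper prints the bit string for 5, exactly as the wrapped TM would; nothing readable appears on the output tape before the TM has halted.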


[1] R. Turner and N. Angius, The Philosophy of Computer Science, The Stanford Encyclopedia of Philosophy (Spring 2019 Edition), E.N. Zalta (ed.), forthcoming.
[2] S. Haldar and P. Vitanyi, Bounded Concurrent Timestamp Systems Using Vector Clocks, Journal of the ACM, Vol. 49, No. 1, January 2002, pp. 101-126.
[3] G. Gherardi, Alan Turing and the Foundations of Computable Analysis, The Bulletin of Symbolic Logic, 17(3):394-430, 2011.

Hasty Afterthoughts

When you open Michael Sipser's 2006 textbook – Introduction to the Theory of Computation – at page 177, you will find his discussion of Cantor's diagonal argument.
  • He carefully remarks that a real number x can be represented in more than one way (e.g. in decimal notation).
  • So one cannot diagonalize out correctly without taking this constraint into account, because the proof has to work for the denotations (= the real numbers), not merely for their syntactic representations.
  • From a macro perspective, Sipser's remarks are similar in kind to Turing's insights in 1937, which are discussed in [3].
Coming to the Hopcroft & Ullman book from 1979, entitled Introduction to Automata Theory, Languages and Computation. The authors:
  1. Make a remark similar to Turing 1937 w.r.t. numerical functions but not w.r.t. `formal languages'.
  2. Indeed, much to my surprise, Hopcroft & Ullman don't distinguish between strings and naturals when dealing with `formal languages' — that's, presumably, what “formal” means in `formal languages'.
Abiding by 2. is fine as long as the fusion of denotation and representation is justified. But no justification is given. Hopcroft & Ullman implicitly assume that their TM's work in the classical input/output way with halting. That's perfectly reasonable, and then you can build a whole theory on top of that (which I call mainstream computer science). But the conflation is not warranted as soon as we pursue more general forms of computation: where we look inside the machine a bit while it is operating. An operational semantics does not capture these nuances. That says something about the limitations of operational semantics, not of my paper.
If you, the reader, have made it up to here (in the year 2019), then chances are you are not a computer scientist.



1 Comment

McCarthy and Shapiro (1987)

Coming back to Section 1 of this blog post and the person X who provided the corresponding critique, observe the following:
  • In 1987, T. McCarthy & S. Shapiro proved that their "extended TM's" are recursive in the halting problem of ordinary TM's.
  • If I follow X's line of reasoning, then we have this absurd result: since ordinary TM's can also operationally simulate the "extended TM's" (for the same reasons provided in Statement 1), it follows that TM's can solve their own Halting Problem.
The critical reader will notice that:
  • My ADM machines are more powerful than ordinary TM's and less powerful than the aforementioned "extended TM's".
  • This makes perfect sense if the dos and don'ts of each model of computation are compared with each other:
    1. ADM's are restricted "extended TM's" in that they are not allowed to overwrite their output.
    2. The "extended TM's" of McCarthy & Shapiro are allowed to overwrite their output (albeit only a finite number of times).