class: center, middle

# CS-3110: Formal Languages and Automata

## General Grammars and Computability

### Chapters 4.6, 5

---
class: medium

# Recall: Context-Free Grammars

A formal (context-free) grammar is a four-tuple:

$$ (V,\Sigma, P, S) $$

* `\(V\)` is a set of **non-terminal symbols** ("variables" in our grammar)
* `\(\Sigma\)` is a set of **terminal symbols** (the actual "symbols" in our produced words, i.e. the alphabet)
* `\(P\)` is a set of **production rules** of the form `\(A \rightarrow w\)` with `\(A \in V\)` and `\(w \in (V\cup \Sigma)^\ast\)`
* `\(S\)` is one of the non-terminal symbols from `\(V\)`, called the **start symbol**

---

# Rewriting System

A concrete example:

$$ V = \\{E, X\\} \\\\ \Sigma = \\{a,b,\ldots, z, +, *, (, )\\} $$

$$ P = \\{ E \rightarrow X, E \rightarrow E + E\\\\ E \rightarrow E * E, E \rightarrow ( E )\\\\ X \rightarrow a, X \rightarrow b, \ldots, X \rightarrow z \\} $$

$$ S = E $$

---

# Let's look at the rules

`\(P\)` is a set of **production rules** of the form `\(A \rightarrow w\)` with `\(A \in V\)` and `\(w \in (V\cup \Sigma)^\ast\)`

* On the left-hand side we have a non-terminal symbol
* On the right-hand side we have a sequence of terminals and non-terminals
* What if we relaxed this?

---

# General Grammars

* Instead of requiring a single non-terminal on the left side of a rule, we allow a *sequence* of non-terminal **and** terminal symbols
* Only limitation: There has to be **at least one** non-terminal symbol
* Why? So we know when we are "done"
* What can we do with this?
--

Let's try `\( \{a^n \cdot b^n \cdot c^n | n \in \mathbb{N} \} \)`

---

# General Grammars

$$ S \rightarrow S ABC\\\\ S \rightarrow X\\\\ BA \rightarrow AB\\\\ CB \rightarrow BC\\\\ CA \rightarrow AC\\\\ XA \rightarrow aX\\\\ X \rightarrow Y \\\\ YB \rightarrow bY\\\\ Y \rightarrow Z \\\\ ZC \rightarrow cZ\\\\ Z \rightarrow \varepsilon $$

---

# General Grammars

$$ S \rightarrow S ABC\\\\ S \rightarrow X $$

These rules produce (intermediate) strings of the form: X, XABC, XABCABC, XABCABCABC, ...

We have the right number of (non-terminal!) "A", "B" and "C" symbols, but in the wrong order, and there is an X at the beginning.

---

# General Grammars

$$ BA \rightarrow AB\\\\ CB \rightarrow BC\\\\ CA \rightarrow AC $$

These rules allow us to reorder A, B and C, (only) **into the correct order**, e.g.

$$ \begin{aligned} XAB\color{red}{CA}BC \Rightarrow& XAB\color{red}{AC}BC\\\\ \Rightarrow& XA\color{red}{AB}CBC\\\\ \Rightarrow& XAAB\color{red}{BC}C \end{aligned} $$

---

# General Grammars

$$ XA \rightarrow aX\\\\ X \rightarrow Y $$

These rules "move" the `X` through a sequence of `A`s, replacing each `A` with the terminal `a`, and finally replacing the `X` with a `Y`.

$$ YB \rightarrow bY\\\\ Y \rightarrow Z $$

These rules do the same for `Y`, moving it through the `B`s, and replacing `Y` with `Z`.

---

# General Grammars

$$ ZC \rightarrow cZ\\\\ Z \rightarrow \varepsilon $$

These rules, finally, are responsible for "placing" the `c`s and removing the `Z` at the end.

---

# General Grammars

**Important**: While there is a rule to convert an `X` into a `Y`, we **cannot** apply it before all `A`s are gone, or we will be unable to get rid of the `A`s anymore (similarly for `B`s and `C`s).

The words a grammar generates are **exactly** the words we can get by applying rules **in any order** (when they are applicable) such that no non-terminal symbols are left!

---

# General Grammars

* So what can we do with these grammars?
* How about this language?
$$ L = \\{a^n | n\:\text{is prime}\\} $$

--

This is actually possible!

--

But we are not going to write the actual grammar for it :(

---

# Grammars as computation

* Why would grammars be able to "compute" something like prime numbers?
* What did we just do for `\(\{a^n b^n c^n | n \in \mathbb{N}\}\)`?
* We basically used the non-terminal symbols as "memory"
* First, we "stored" the number of `a`, `b` and `c` (as the non-terminals A, B and C)
* Then we rearranged them into the correct order (which is kind of like "sorting")

---

# Grammars as computation

* We view our current string as our "memory"
* The rules tell us how we can transform/update (part of) our memory
* That's basically how programming works!
* But which "instructions" do we have?

---

# Context-Free Grammars

* In Context-Free Grammars we always had one non-terminal symbol on the left side of a rule
* This means our "instructions" were basically writing to memory locations (potentially creating more)
* But we could never really combine values from multiple memory locations
* In the equivalent automaton, that manifested itself as not being able to access arbitrary memory

---

# General Grammars

* Now we allow multiple terminal and non-terminal symbols on the left side of a rule
* This means we can do things like "addition" (`\(0P0 \rightarrow 0R0, 0P1 \rightarrow 1R0, 1P0 \rightarrow 1R0, 1P1 \rightarrow 0R1\)`)
* We could also do "math" with terminal symbols

.left-column[
$$ S \rightarrow XY\\\\ X \rightarrow BXa\\\\ Ba \rightarrow aaB\\\\ BY \rightarrow Y\\\\ X \rightarrow \varepsilon, Y \rightarrow \varepsilon $$
]

---
class: center, middle

# Turing Machines

---

# Turing Machines

* Our Pushdown Automata had a simple stack as memory
* This has the advantage that it is simple
* The disadvantage is that we can't access "arbitrary" memory
* That was exactly why we could not recognize e.g. `\(\{a^n b^n c^n | n \in \mathbb{N}\}\)`
* So let's relax that!
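
---
class: medium

# Aside: Rewriting in Code

Before we change models: the derivation of `aabbcc` from the grammar above can be checked mechanically. A minimal Python sketch (not part of the textbook; the function names are mine) that applies one rule per step:

```python
def apply_rule(s, lhs, rhs):
    """Rewrite the first occurrence of lhs in s with rhs (None if absent)."""
    i = s.find(lhs)
    return None if i < 0 else s[:i] + rhs + s[i + len(lhs):]

# One derivation of "aabbcc", one rule application per step
derivation = [
    ("S", "SABC"), ("S", "SABC"), ("S", "X"),  # S => SABC => SABCABC => XABCABC
    ("CA", "AC"), ("BA", "AB"), ("CB", "BC"),  # reorder into XAABBCC
    ("XA", "aX"), ("XA", "aX"), ("X", "Y"),    # place the a's
    ("YB", "bY"), ("YB", "bY"), ("Y", "Z"),    # place the b's
    ("ZC", "cZ"), ("ZC", "cZ"), ("Z", ""),     # place the c's, drop Z
]

s = "S"
for lhs, rhs in derivation:
    s = apply_rule(s, lhs, rhs)
print(s)  # aabbcc
```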
---

# Memory

* We will still stay simple
* That makes it easier to prove things about our automaton
* So we will have "memory", but the automaton can only access one entry at a time
* It can then move left or right in our memory
* You can think of this like an infinite "tape", where the automaton is a read/write head that can move along the tape

---

# Turing Machines

* We still keep the general structure the same: We have states and transitions between states
* Instead of reading input separately, we just put it as the initial contents of our memory/tape
* Transitions then depend on the symbol on the tape we are currently looking at
* Each transition writes a new symbol (which may be the same one we just read), and moves left or right

---

# Turing Machines
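
A single step of such a machine is easy to sketch in code (my own sketch, not from the textbook; the tape is a dictionary from positions to symbols, with `#` as the blank):

```python
def step(tape, head, state, delta):
    """Perform one Turing Machine step: read, write, move, change state."""
    symbol = tape.get(head, "#")                  # unwritten cells are blank
    write, move, new_state = delta[(state, symbol)]
    tape[head] = write
    return (head + 1 if move == "R" else head - 1), new_state

# A one-transition machine: in state q0 on a blank, write 1, move right, halt
delta = {("q0", "#"): ("1", "R", "h")}
tape = {}
head, state = step(tape, 0, "q0", delta)
print(tape[0], head, state)  # 1 1 h
```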
[Turing Machine in TOC](https://www.geeksforgeeks.org/turing-machine-in-toc/)

---

# Turing Machines

Formally, a Turing Machine is a 4-tuple: `\(M = (Q, \Lambda, q_0, \delta)\)`

* `\(Q\)` is a set of states, which includes a special final (halting) state `h`
* `\(\Lambda\)` is the tape alphabet, which includes a special symbol for "blank" (#)
* `\(q_0 \in Q\)` is the initial state
* `\(\delta: (Q\setminus \{h\})\times \Lambda \mapsto \Lambda \times \{L,R\} \times Q\)` is the transition function

---
class: medium

# Transition Function

$$ \delta: (Q\setminus \\{h\\})\times \Lambda \mapsto \Lambda \times \\{L,R\\} \times Q $$

We take:

* The current state (which cannot be the final state)
* The symbol that is currently on the tape

and produce:

* A new symbol to write on the tape
* A direction to move in (left or right)
* The new state of the machine

---

# Variations

This is the formulation from the textbook. Note:

* There is only one halting state; we could add more than one without any problems
* The read/write head must always move left or right. We could also allow it to stay where it is without a problem
* The transition function is, well, a **function**: The machine is deterministic

---

# Non-Determinism

* As with our other automata, we can have deterministic or non-deterministic Turing Machines
* Deterministic: For every state/tape symbol combination there is exactly one option
* Non-Deterministic: There may be multiple (or no) options for a given state/tape symbol combination
* Deterministic and Non-Deterministic Turing Machines can perform the exact same computations

---

# Turing Machines

* So how do we compute anything with this?
* Let's start with a simple problem: Adding two binary numbers
* Input: Two binary numbers, separated by a plus (+)
* Output: The sum of the two numbers

---

# Turing Machine: Intuition

* How do we add two binary numbers?
* We go digit by digit, starting from the right
* 0+0 = 0, 0+1 = 1, 1+0 = 1, 1+1 = 0, carry a 1
* Our Turing Machine has to move back and forth

---

# Turing Machine: Picking up the last digit
1. Move right until you get to a blank
2. Move one to the left
3. Read the digit and change into one of two states

---

# Turing Machine: Adding Digits
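
The core of this step is adding two binary digits plus a carry; in the machine, the carry is remembered in the state rather than on the tape. A sketch of that logic (my own illustration, not the machine itself):

```python
def add_digits(a, b, carry):
    """Add two binary digits plus a carry; return (digit to write, new carry)."""
    total = a + b + carry
    return total % 2, total // 2

print(add_digits(1, 1, 0))  # (0, 1): write 0, carry a 1
print(add_digits(1, 0, 1))  # (0, 1)
print(add_digits(0, 0, 1))  # (1, 0)
```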
---
class: medium

# Turing Machine: The Rest

* Then we move to the left, until we have passed both numbers
* Write the next digit of the sum
* Move right again
* Repeat
* One detail: You need to add another case for when you start with a carried 1

---

# The Entire Machine
---

# The Entire Machine

* Download the JFLAP file [/CS3110/assets/adder.jff](/CS3110/assets/adder.jff)
* You can input things like "1011+1101=" and it will add the two numbers
* Note: It only works with two numbers of the same length (why?)
* Demo Time!

---
class: center, middle

# Computability

---

# Limits of Turing Machines?

* We can do pretty complex tasks with Turing Machines
* But what are the limits?
* Or asked another way: How could we extend this model of computation?
* First, is there anything we can't possibly compute?

---

# The Halting Problem

* Say you wrote a program X doing some complex calculation
* You give it some input and then you wait ... and wait ... and wait
* You want to know: Will this ever finish its computation?
* Maybe your IDE or debugger could tell you?

---

# The Halting Problem

* What we want: A program H that reads a program (e.g. X) and some input, and tells you if that program will ever terminate with that input
* H should work for *any* program as input, not just some special cases
* And if H is a program, we can run it from within other programs
* So let's try something

---

# A Strange Program

* Let's write a program D that gets a program M as input and then does:
* Call H on M, with M as the input (this is not a typo!)
* If H says that M terminates with M as input, go into an infinite loop
* Otherwise terminate

---

# A Strange Program

* Let's write a program D that gets a program M as input and then does:
* Call H on M, with M as the input (this is not a typo!)
* If H says that M terminates with M as input, go into an infinite loop
* Otherwise terminate

What happens if we call D with itself as input?
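
---
class: medium

# A Strange Program, in Code

If the (hypothetical!) halting checker `H` existed, `D` would look like the comment below. Even without `H`, we can check in Python what `D(D)` would do for *each* possible answer `H` could give:

```python
# D, assuming a hypothetical halting checker H(M, x) -> "halts" / "loops":
#
#     def D(M):
#         if H(M, M) == "halts":
#             while True: pass   # go into an infinite loop
#         else:
#             return             # terminate

def behavior_of_D_on_D(h_answer):
    """What D(D) actually does, given H's claimed answer about D(D)."""
    return "loops" if h_answer == "halts" else "halts"

for h_answer in ("halts", "loops"):
    # Whatever H answers about D(D), the actual behavior is the opposite
    assert behavior_of_D_on_D(h_answer) != h_answer
print("H is wrong about D(D) either way")
```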
---

# The Halting Problem

* We appear to have reached a contradiction: D(D) does not terminate if H says that D(D) terminates (and vice versa)
* We didn't even talk about Automata or Turing Machines
* There is *no way* we could implement H
* This means there are programs that we can describe but not implement; they are **undecidable**

---

# Other Undecidable Problems

* Program equivalence: Do two programs produce the exact same output, for all possible inputs?
* Does a program terminate for *every* possible input?
* Given a set of axioms, can we prove a given theorem?
* etc.

---

# Undecidable

* "Undecidable" means that we cannot decide between a "yes" and a "no" answer
* Technically, we could write a program that says "yes", but may never be able to reach a "no" answer
* This is called "semi-decidable"
* What does this have to do with Turing Machines?

---
class: mmedium

# Church-Turing Thesis

* Thesis: Turing Machines can compute anything that is computable
* Other models are equivalent to Turing Machines:
  - Lambda Calculus
  - General Grammars
  - Register Machines
  - Most programming languages (**assuming infinite memory**)

---
class: mmedium

# Brainf*ck

BF is an esoteric programming language. It runs on a "tape" (similar to a Turing Machine) of integers, and has only 8 instructions:

* `+` and `-`: Add or subtract one from the current cell
* `<` and `>`: Move one space left or right
* `.` and `,`: Write and read the current cell as a character
* `[` and `]`: Loop as long as the current cell is not zero

Hello World:

```
++++++++
[>++++
[>++>+++>+++>+<<<<-]
>+>+>->>+
[<]<-]
>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
```

---
class: center, middle

# Reality

---

# Efficient Computation

* We briefly mentioned Non-Deterministic Turing Machines
* They do not solve the decidability problem either
* But they do change one thing: Efficiency
* Basically, they can try options "in parallel"

---

# A Simple Problem

* Say there are some cities connected by highways
* You know the distances between the cities
* Someone gives you a maximum distance, and you have to determine if you can visit all cities by driving less (in total) than that distance
* Can you do this efficiently?

---
class: mmedium

# Efficient Computation

* What do we mean by "efficiently" (in this case)?
* At this point we care about the "general" case, where we have n cities
* Say we have a program that solves this problem in "quadratic time"
* This means that when we have twice as many cities, it takes 4 times as long as before
* Another program may need 9 times as long for twice as many cities, and use cubic time, etc.
* Whichever (**fixed**) exponent we have, we call this *polynomial time*

---
class: medium

# Exponential Time

* What if we have a program that needs exponential time, e.g. `\(2^n\)`?
* If we add **one** city, our program now needs **twice** as long
* If we add **10** cities, we need **1024** times as long
* If we add **100** cities, the sun will be extinguished long before we even make it halfway through

---

# Polynomial Time vs. Exponential Time

* There are some problems that cannot be solved in polynomial time, for example determining if a program (given as a binary) will terminate within k steps
* There are some problems which we have polynomial time algorithms for (sorting an array, for example)
* But there is an important third class: problems for which we do not *have* a polynomial time algorithm, but don't know whether one exists or not

---

# Traveling Salesman

* Take our city tour example, called the Traveling Salesman Problem
* A naive solution would be to try every possible ordering, but `\(n! > 2^n\)`
* There are some smarter algorithms, but they also require exponential time
* What no one knows: Is there a polynomial time algorithm?

---

# Solution Verification

* Now imagine someone *gives* you a potential solution
* You can easily (= in polynomial time) **verify** if it is an actual solution
* While we don't know if computing a solution can be done efficiently, we know that verifying one can

---

# Non-Determinism

* Now recall Non-Deterministic Turing Machines
* We can use the non-determinism to "try" all possible solutions "in parallel"
* Then we *check* each solution and return one that works (or none)
* So we **can** solve this problem efficiently on a Non-Deterministic Turing Machine

---
class: mmedium

# P vs. NP

* `\(P\)` is the set of all problems that can be solved efficiently on a Deterministic Turing Machine
* `\(NP\)` is the set of all problems that can be solved efficiently on a Non-Deterministic Turing Machine
* Question: Is P = NP?
* In other words: Can we simulate a Non-Deterministic Turing Machine **efficiently** on a Deterministic Turing Machine?
* Or: If we know that we can **verify** a solution of a problem efficiently, does that mean that we can also compute one efficiently, or not?

---

# P vs. NP

* It is unknown whether P = NP or not
* If you figure it out, you can get $1 000 000 (+ probably a life-time appointment at a university or research institute of your choice)

$$ P \subseteq \mathit{NP} \subseteq \mathit{PSPACE} \subseteq \mathit{EXPTIME} $$

We only know for sure that P is not the same as EXPTIME

---

# Traveling Salesman
([Source](https://xkcd.com/399/))

---

# References

* [Online Brainf*ck Interpreter](https://copy.sh/brainfuck/)
* [Epigrams on Programming](http://pu.inf.uni-tuebingen.de/users/klaeren/epigrams.html)
* [P vs. NP Problem](https://www.claymath.org/millennium-problems/p-vs-np-problem)
* [All PSPACE-complete Planning Problems are Equal but some are more Equal than Others](https://www.ida.liu.se/~chrba09/Papers/socs11.pdf)