class: center, middle

# CS-3110: Formal Languages and Automata

## General Grammars and Computability

### Chapters 4.6, 5

---
class: medium

# Recall: Context-Free Grammars

A formal (context-free) grammar is a four-tuple:

$$ (V,\Sigma, P, S) $$

* `\(V\)` is a set of **non-terminal symbols** ("variables" in our grammar)
* `\(\Sigma\)` is a set of **terminal symbols** (the actual "symbols" in our produced words, i.e. the alphabet)
* `\(P\)` is a set of **production rules** of the form `\(A \rightarrow w\)` with `\(A \in V\)` and `\(w \in (V\cup \Sigma)^\ast\)`
* `\(S\)` is one of the non-terminal symbols from `\(V\)`, called the **start symbol**

---

# Rewriting System

A concrete example:

$$ V = \\{E, X\\} \\\\ \Sigma = \\{a,b,\ldots, z, +, *, (, )\\} $$

$$ P = \\{ E \rightarrow X, E \rightarrow E + E\\\\ E \rightarrow E * E, E \rightarrow ( E )\\\\ X \rightarrow a, X \rightarrow b, \ldots, X \rightarrow z \\} $$

$$ S = E $$

---

# Let's look at the rules

`\(P\)` is a set of **production rules** of the form `\(A \rightarrow w\)` with `\(A \in V\)` and `\(w \in (V\cup \Sigma)^\ast\)`

* On the left-hand side we have a non-terminal symbol
* On the right-hand side we have a sequence of terminals and non-terminals
* What if we relaxed this?

---

# General Grammars

* Instead of requiring a single non-terminal on the left side of a rule, we allow a *sequence* of non-terminal **and** terminal symbols
* Only limitation: There has to be **at least one** non-terminal symbol
* Why? So we know when we are "done"
* What can we do with this?
--

Let's try `\( \{a^n \cdot b^n \cdot c^n | n \in \mathbb{N} \} \)`

---

# General Grammars

$$ S \rightarrow S ABC\\\\ S \rightarrow X\\\\ BA \rightarrow AB\\\\ CB \rightarrow BC\\\\ CA \rightarrow AC\\\\ XA \rightarrow aX\\\\ X \rightarrow Y \\\\ YB \rightarrow bY\\\\ Y \rightarrow Z \\\\ ZC \rightarrow cZ\\\\ Z \rightarrow \varepsilon $$

---

# General Grammars

$$ S \rightarrow S ABC\\\\ S \rightarrow X $$

These rules produce (intermediate) strings of the form: X, XABC, XABCABC, XABCABCABC, ...

We have the right number of (non-terminal!) "A", "B" and "C" symbols, but in the wrong order, and there is an X at the beginning.

---

# General Grammars

$$ BA \rightarrow AB\\\\ CB \rightarrow BC\\\\ CA \rightarrow AC $$

These rules allow us to reorder A, B and C, (only) **into the correct order**, e.g.

$$ \begin{aligned} XAB\color{red}{CA}BC \Rightarrow& XAB\color{red}{AC}BC\\\\ \Rightarrow& XA\color{red}{AB}CBC\\\\ \Rightarrow& XAAB\color{red}{BC}C \end{aligned} $$

---

# General Grammars

$$ XA \rightarrow aX\\\\ X \rightarrow Y $$

These rules "move" the `X` through a sequence of `A`s, replacing each `A` with the terminal `a`, and finally replacing the `X` with a `Y`.

$$ YB \rightarrow bY\\\\ Y \rightarrow Z $$

These rules do the same for `Y`, moving it through the `B`s, and replacing `Y` with `Z`.

---

# General Grammars

$$ ZC \rightarrow cZ\\\\ Z \rightarrow \varepsilon $$

These rules, finally, are responsible for "placing" the `c`s and removing the `Z` at the end.

---

# General Grammars

**Important**: While there is a rule to convert an `X` into a `Y`, we **cannot** apply it before all `A`s are gone, or we will be unable to get rid of the `A`s anymore (similarly for `B`s and `C`s).

The words a grammar generates are **exactly** the words we can get by applying rules **in any order** (when they are applicable) such that no non-terminal symbols are left!

---

# General Grammars

* So what can we do with these grammars?
* How about this language?
$$ L = \\{a^n | n\:\text{is prime}\\} $$

--

This is actually possible!

--

But we are not going to write the actual grammar for it :(

---

# Grammars as computation

* Why would grammars be able to "compute" something like prime numbers?
* What did we just do for `\(\{a^n b^n c^n | n \in \mathbb{N}\}\)`?
* We basically used the non-terminal symbols as "memory"
* First, we "stored" the number of `a`, `b` and `c` (as the non-terminals A, B and C)
* Then we rearranged them into the correct order (which is kind of like "sorting")

---

# Grammars as computation

* We view our current string as our "memory"
* The rules tell us how we can transform/update (part of) our memory
* That's basically how programming works!
* But which "instructions" do we have?

---

# Context-Free Grammars

* In Context-Free Grammars we always had one non-terminal symbol on the left side of a rule
* This means our "instructions" were basically writing to memory locations (potentially creating more)
* But we could never really combine values from multiple memory locations
* In the equivalent automaton, that manifested itself as not being able to access arbitrary memory

---

# General Grammars

* Now we allow multiple terminal and non-terminal symbols on the left side of a rule
* This means we can do things like "addition" (`\(0P0 \rightarrow 0R0, 0P1 \rightarrow 1R0, 1P0 \rightarrow 1R0, 1P1 \rightarrow 0R1\)`)
* We could also do "math" with terminal symbols

.left-column[
$$ S \rightarrow XY\\\\ X \rightarrow BXa\\\\ Ba \rightarrow aaB\\\\ BY \rightarrow Y\\\\ X \rightarrow \varepsilon, Y \rightarrow \varepsilon $$
]

---
class: center, middle

# Turing Machines

---

# Turing Machines

* Our Pushdown Automata had a simple stack as memory
* This has the advantage that it is simple
* The disadvantage is that we can't access "arbitrary" memory
* That was exactly why we could not recognize e.g. `\(\{a^n b^n c^n | n \in \mathbb{N}\}\)`
* So let's relax that!
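
---
class: medium

# Aside: Rewriting in Code

Before we change models: the derivation of `aabbcc` from the grammar above can be checked mechanically. A minimal Python sketch (not part of the textbook; the function names are mine) that applies one rule per step:

```python
def apply_rule(s, lhs, rhs):
    """Rewrite the first occurrence of lhs in s with rhs (None if absent)."""
    i = s.find(lhs)
    return None if i < 0 else s[:i] + rhs + s[i + len(lhs):]

# One derivation of "aabbcc", one rule application per step
derivation = [
    ("S", "SABC"), ("S", "SABC"), ("S", "X"),  # S => SABC => SABCABC => XABCABC
    ("CA", "AC"), ("BA", "AB"), ("CB", "BC"),  # reorder into XAABBCC
    ("XA", "aX"), ("XA", "aX"), ("X", "Y"),    # place the a's
    ("YB", "bY"), ("YB", "bY"), ("Y", "Z"),    # place the b's
    ("ZC", "cZ"), ("ZC", "cZ"), ("Z", ""),     # place the c's, drop Z
]

s = "S"
for lhs, rhs in derivation:
    s = apply_rule(s, lhs, rhs)
print(s)  # aabbcc
```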
---

# Memory

* We will still stay simple
* That makes it easier to prove things about our automaton
* So we will have "memory", but the automaton can only access one entry at a time
* It can then move left or right in our memory
* You can think of this like an infinite "tape", where the automaton is a read/write head that can move along the tape

---

# Turing Machines

* We still keep the general structure the same: We have states and transitions between states
* Instead of reading input separately, we just put it as the initial contents of our memory/tape
* Transitions then depend on the symbol on the tape we are currently looking at
* Each transition writes a new symbol (which may be the same one we just read), and moves left or right

---

# Turing Machines
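
A single step of such a machine is easy to sketch in code (my own sketch, not from the textbook; the tape is a dictionary from positions to symbols, with `#` as the blank):

```python
def step(tape, head, state, delta):
    """Perform one Turing Machine step: read, write, move, change state."""
    symbol = tape.get(head, "#")                  # unwritten cells are blank
    write, move, new_state = delta[(state, symbol)]
    tape[head] = write
    return (head + 1 if move == "R" else head - 1), new_state

# A one-transition machine: in state q0 on a blank, write 1, move right, halt
delta = {("q0", "#"): ("1", "R", "h")}
tape = {}
head, state = step(tape, 0, "q0", delta)
print(tape[0], head, state)  # 1 1 h
```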
[Turing Machine in TOC](https://www.geeksforgeeks.org/turing-machine-in-toc/)

---

# Turing Machines

Formally, a Turing Machine is a 4-tuple: `\(M = (Q, \Lambda, q_0, \delta)\)`

* `\(Q\)` is a set of states, which includes a special final (halting) state `h`
* `\(\Lambda\)` is the tape alphabet, which includes a special symbol for "blank" (#)
* `\(q_0 \in Q\)` is the initial state
* `\(\delta: (Q\setminus \{h\})\times \Lambda \mapsto \Lambda \times \{L,R\} \times Q\)` is the transition function

---
class: medium

# Transition Function

$$ \delta: (Q\setminus \\{h\\})\times \Lambda \mapsto \Lambda \times \\{L,R\\} \times Q $$

We take:

* The current state (which cannot be the final state)
* The symbol that is currently on the tape

and produce:

* A new symbol to write on the tape
* A direction to move in (left or right)
* The new state of the machine

---

# Variations

This is the formulation from the textbook. Note:

* There is only one halting state; we could add more than one without any problems
* The read/write head must always move left or right. We could also allow it to stay where it is without a problem
* The transition function is, well, a **function**: The machine is deterministic

---

# Non-Determinism

* As with our other automata, we can have deterministic or non-deterministic Turing Machines
* Deterministic: For every state/tape symbol combination there is exactly one option
* Non-Deterministic: There may be multiple (or no) options for a given state/tape symbol combination
* Deterministic and Non-Deterministic Turing Machines can perform the exact same computations

---

# Turing Machines

* So how do we compute anything with this?
* Let's start with a simple problem: Adding two binary numbers
* Input: Two binary numbers, separated by a plus (+)
* Output: The sum of the two numbers

---

# Turing Machine: Intuition

* How do we add two binary numbers?
* We go digit by digit, starting from the right
* 0+0 = 0, 0+1 = 1, 1+0 = 1, 1+1 = 0, carry a 1
* Our Turing Machine has to move back and forth

---

# Turing Machine: Picking up the last digit
1. Move right until you get to a blank
2. Move one to the left
3. Read the digit and change into one of two states

---

# Turing Machine: Adding Digits
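
The core of this step is adding two binary digits plus a carry; in the machine, the carry is remembered in the state rather than on the tape. A sketch of that logic (my own illustration, not the machine itself):

```python
def add_digits(a, b, carry):
    """Add two binary digits plus a carry; return (digit to write, new carry)."""
    total = a + b + carry
    return total % 2, total // 2

print(add_digits(1, 1, 0))  # (0, 1): write 0, carry a 1
print(add_digits(1, 0, 1))  # (0, 1)
print(add_digits(0, 0, 1))  # (1, 0)
```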
---
class: medium

# Turing Machine: The Rest

* Then we move to the left, until we have passed both numbers
* Write the next digit of the sum
* Move right again
* Repeat
* One detail: You need to add another case for when you start with a carried 1

---

# The Entire Machine
---

# The Entire Machine

* Download the JFLAP file [/CS3110/assets/adder.jff](/CS3110/assets/adder.jff)
* You can input things like "1011+1101=" and it will add the two numbers
* Note: It only works with two numbers of the same length (why?)
* Demo Time!

---
class: center, middle

# Computability

---

# Limits of Turing Machines?

* We can do pretty complex tasks with Turing Machines
* But what are the limits?
* Or asked another way: How could we extend this model of computation?
* First, is there anything we can't possibly compute?

---

# The Halting Problem

* Say you wrote a program X doing some complex calculation
* You give it some input and then you wait ... and wait ... and wait
* You want to know: Will this ever finish its computation?
* Maybe your IDE or debugger could tell you?

---

# The Halting Problem

* What we want: A program H that reads a program (e.g. X) and some input, and tells you if that program will ever terminate with that input
* H should work for *any* program as input, not just some special cases
* And if H is a program, we can run it from within other programs
* So let's try something

---

# A Strange Program

* Let's write a program D that gets a program M as input and then does:
* Call H on M, with M as the input (this is not a typo!)
* If H says that M terminates with M as input, go into an infinite loop
* Otherwise terminate

---

# A Strange Program

* Let's write a program D that gets a program M as input and then does:
* Call H on M, with M as the input (this is not a typo!)
* If H says that M terminates with M as input, go into an infinite loop
* Otherwise terminate

What happens if we call D with itself as input?
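
---
class: medium

# A Strange Program, in Code

If the (hypothetical!) halting checker `H` existed, `D` would look like the comment below. Even without `H`, we can check in Python what `D(D)` would do for *each* possible answer `H` could give:

```python
# D, assuming a hypothetical halting checker H(M, x) -> "halts" / "loops":
#
#     def D(M):
#         if H(M, M) == "halts":
#             while True: pass   # go into an infinite loop
#         else:
#             return             # terminate

def behavior_of_D_on_D(h_answer):
    """What D(D) actually does, given H's claimed answer about D(D)."""
    return "loops" if h_answer == "halts" else "halts"

for h_answer in ("halts", "loops"):
    # Whatever H answers about D(D), the actual behavior is the opposite
    assert behavior_of_D_on_D(h_answer) != h_answer
print("H is wrong about D(D) either way")
```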
---

# The Halting Problem

* We appear to have reached a contradiction: D(D) does not terminate if H says that D(D) terminates (and vice versa)
* We didn't even talk about Automata or Turing Machines
* There is *no way* we could implement H
* This means there are programs that we can describe but not implement; they are **undecidable**

---

# Other Undecidable Problems

* Program equivalence: Do two programs produce the exact same output, for all possible inputs?
* Does a program terminate for *every* possible input?
* Given a set of axioms, can we prove a given theorem?
* etc.

---

# Undecidable

* "Undecidable" means that we cannot decide between a "yes" and a "no" answer
* Technically, we could write a program that says "yes", but may never be able to reach a "no" answer
* This is called "semi-decidable"
* What does this have to do with Turing Machines?

---
class: mmedium

# Church-Turing Thesis

* Thesis: Turing Machines can compute anything that is computable
* Other models are equivalent to Turing Machines:
  - Lambda Calculus
  - General Grammars
  - Register Machines
  - Most programming languages (**assuming infinite memory**)

---
class: mmedium

# Brainf*ck

BF is an esoteric programming language. It runs on a "tape" (similar to a Turing Machine) of integers, and has only 8 instructions:

* `+` and `-`: Add or subtract one from the current cell
* `<` and `>`: Move one space left or right
* `.` and `,`: Write and read the current cell as a character
* `[` and `]`: Loop as long as the current cell is not zero

Hello World:

```
++++++++
[>++++
[>++>+++>+++>+<<<<-]
>+>+>->>+
[<]<-]
>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
```

---
class: center, middle

# Reality

---

# Efficient Computation

* We briefly mentioned Non-Deterministic Turing Machines
* They do not solve the decidability problem either
* But they do change one thing: Efficiency
* Basically, they can try options "in parallel"

---

# A Simple Problem

* Say there are some cities connected by highways
* You know the distances between the cities
* Someone gives you a maximum distance, and you have to determine if you can visit all cities by driving less (in total) than that distance
* Can you do this efficiently?

---
class: mmedium

# Efficient Computation

* What do we mean by "efficiently" (in this case)?
* At this point we care about the "general" case, where we have n cities
* Say we have a program that solves this problem in "quadratic time"
* This means that when we have twice as many cities, it takes 4 times as long as before
* Another program may need 9 times as long for twice as many cities, and use cubic time, etc.
* Whichever (**fixed**) exponent we have, we call this *polynomial time*

---
class: medium

# Exponential Time

* What if we have a program that needs exponential time, e.g. `\(2^n\)`?
* If we add **one** city, our program now needs **twice** as long
* If we add **10** cities, we need **1024** times as long
* If we add **100** cities, the sun will be extinguished long before we even make it halfway through

---

# Polynomial Time vs. Exponential Time

* There are some problems that cannot be solved in polynomial time, for example determining if a program (given as a binary) will terminate within k steps
* There are some problems which we have polynomial time algorithms for (sorting an array, for example)
* But there is an important third class: problems for which we do not *have* a polynomial time algorithm, but don't know whether one exists or not

---

# Traveling Salesman

* Take our city tour example, called the Traveling Salesman Problem
* A naive solution would be to try every possible ordering, but `\(n! > 2^n\)`
* There are some smarter algorithms, but they also require exponential time
* What no one knows: Is there a polynomial time algorithm?

---

# Solution Verification

* Now imagine someone *gives* you a potential solution
* You can easily (= in polynomial time) **verify** if it is an actual solution
* While we don't know if computing a solution can be done efficiently, we know that verifying one can

---

# Non-Determinism

* Now recall Non-Deterministic Turing Machines
* We can use the non-determinism to "try" all possible solutions "in parallel"
* Then we *check* each solution and return one that works (or none)
* So we **can** solve this problem efficiently on a Non-Deterministic Turing Machine

---
class: mmedium

# P vs. NP

* `\(P\)` is the set of all problems that can be solved efficiently on a Deterministic Turing Machine
* `\(NP\)` is the set of all problems that can be solved efficiently on a Non-Deterministic Turing Machine
* Question: Is P = NP?
* In other words: Can we simulate a Non-Deterministic Turing Machine **efficiently** on a Deterministic Turing Machine?
* Or: If we know that we can **verify** a solution of a problem efficiently, does that mean that we can also compute one efficiently, or not?

---

# P vs. NP

* It is unknown whether P = NP or not
* If you figure it out, you can get $1 000 000 (+ probably a life-time appointment at a university or research institute of your choice)

$$ P \subseteq \mathit{NP} \subseteq \mathit{PSPACE} \subseteq \mathit{EXPTIME} $$

We only know for sure that P is not the same as EXPTIME

---

# Traveling Salesman
([Source](https://xkcd.com/399/))

---

# References

* [Online Brainf*ck Interpreter](https://copy.sh/brainfuck/)
* [Epigrams on Programming](http://pu.inf.uni-tuebingen.de/users/klaeren/epigrams.html)
* [P vs. NP Problem](https://www.claymath.org/millennium-problems/p-vs-np-problem)
* [All PSPACE-complete Planning Problems are Equal but some are more Equal than Others](https://www.ida.liu.se/~chrba09/Papers/socs11.pdf)