class: center, middle

# CS-3110: Formal Languages and Automata

## Non-Context Free Grammars

### Chapter 4.5

---

# Pumping Lemma

Recall the pumping lemma for regular languages:

* All strings longer than some threshold can be cut into three parts `xyz`
* We can then repeat `y` arbitrarily often, and the resulting string will still be in the language
* This holds for **all** regular languages
* We usually use this to show that a language is **not** regular

---

# Pumping Lemma: Example

Consider the language:

$$
L = \\{a^i b^j c ^k | i,j,k\in \mathbb{N} \wedge j \le i \wedge i \le k \\}
$$

The number of `a`s in a word is between the number of `b`s and the number of `c`s (and they occur sequentially).

---
class: center, middle

# Context-Free Languages

---

# Limitations of Context-Free Grammars

Is this a valid class in Java?

```
public class Foo
{
    public Bar item;
}
```

--

It depends! What does "valid" mean?

---

# Context

`public Bar item;` is "valid" if we simply check types and variable names against the rules for identifiers (`Bar` is a legal identifier, as is `item`).

If our goal is to recognize "legal Java programs", we need to perform an additional check: Is `Bar` even a type?

Type definitions are **context** that needs to be evaluated, but that would require arbitrary storage!

---
class: mmedium

# In Practice

* We will not discuss automata for non-context free languages in detail, because they are messy
* In practice, a compiler often works in two parts:
    1. Parse the code input following purely syntactic rules into an initial parse tree (syntax)
    2. Assign meaning to the individual elements of the parse tree (semantics)
* The first part can (usually) be done using a context-free grammar
* The second part requires language-specific checks that would be cumbersome to describe in a more formal way

---
class: center, middle

# The Pumping Lemma for Context-Free Languages

---

# Pumping Lemma

Recall the pumping lemma for regular languages:

* All strings longer than some threshold can be cut into three parts `xyz`
* We can then repeat `y` arbitrarily often, and the resulting string will still be in the language
* How did we get this?

---

# Pigeons!

Image author: en:User:BenFrantzDale; this image by en:User:McKay

---

# Pigeons?

* Every regular language can be described by a finite automaton
* If we read a word longer than the number of states, we will visit some state twice
* The automaton has no memory, so we can just repeat this loop
* How would this apply to Context-Free Languages?!

---

# The Pumping Lemma for CFLs

* Context-free grammars were one way we could recognize context-free languages
* By derivation we can arrive at a parse tree representing a given string
* We added children to nodes in the parse tree by expanding non-terminal symbols with grammar rules
* But there are no "loops" in a tree?!

---

# Parse Trees!

---

# A Parse Tree

* Consider the parse tree for a "long" word
* The tree starts with the start symbol, and piece by piece we expand it with our rules
* Since there are only finitely many non-terminal symbols, at least one of them has to repeat on some path if we have a sufficiently large tree
* Let's look at a path that contains a non-terminal exactly twice!

---

# Repeated Non-Terminals

What does it mean for a non-terminal to repeat (exactly) twice?

* We had more than one possible choice for a rule between the two occurrences (or for the non-terminal itself)
* And we chose differently the second time
* (If we had chosen the same, we would have gotten to the same non-terminal a third time)
* So what could we do? Repeat the same choice again (and again, and again, ...)

---

# The Pumping Lemma for CFLs
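
The idea of a repeated non-terminal can be sketched in code. This is only an illustration under assumptions that are not part of the lecture: a parse tree is written as nested `(symbol, children)` tuples with plain strings as terminal leaves, the grammar `S -> a S b | (empty)` is a made-up example, and `yield_of` and `split_at_repeat` are hypothetical helper names. The sketch stops at the first repetition it meets going down from the root; it does not try to keep the middle part short.

```python
def yield_of(node):
    """Concatenate the terminal leaves below a node (its yield)."""
    if isinstance(node, str):          # a terminal leaf
        return node
    _symbol, children = node
    return "".join(yield_of(child) for child in children)


def split_at_repeat(tree):
    """Return (u, x, y, z, v) for a non-terminal that repeats on one path, else None."""
    word = yield_of(tree)

    def search(node, start, ancestors):
        # start: index in `word` where the yield of `node` begins
        # ancestors: non-terminal -> (start, end) span of its occurrence above us
        if isinstance(node, str):
            return None
        symbol, children = node
        end = start + len(yield_of(node))
        if symbol in ancestors:
            up_start, up_end = ancestors[symbol]
            # upper occurrence yields x y z, lower occurrence yields y
            return (word[:up_start], word[up_start:start], word[start:end],
                    word[end:up_end], word[up_end:])
        seen = {**ancestors, symbol: (start, end)}
        pos = start
        for child in children:
            found = search(child, pos, seen)
            if found is not None:
                return found
            pos += len(yield_of(child))
        return None

    return search(tree, 0, {})


# Assumed example grammar S -> a S b | (empty); this is the parse tree of "aabb"
tree = ("S", ["a", ("S", ["a", ("S", []), "b"]), "b"])
u, x, y, z, v = split_at_repeat(tree)
pumped = [u + x * i + y + z * i + v for i in range(4)]
print((u, x, y, z, v))   # ('', 'a', 'ab', 'b', '')
print(pumped)            # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
```

Repeating the subtree between the two occurrences of `S` repeats `x` and `z` together, which is why every pumped word still has matching `a`s and `b`s.
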

---

# The Pumping Lemma for CFLs

If `\(L\)` is a context-free language, there exists an n, such that, for every word `\(w \in L\)` with length at least n:

* `\(w = u\cdot x\cdot y\cdot z\cdot v\)` where `\(|x\cdot y \cdot z| < n\)`
* `\(|x\cdot z| > 0\)`
* `\(u\cdot x^i \cdot y \cdot z^i \cdot v \in L\)` for **every** `\(i \ge 0\)`

---

# The Pumping Lemma for CFLs
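
To get a feel for the three conditions, one can enumerate every allowed decomposition of a short word and pump it. The following is a rough sketch, not a proof technique: the names `pump`, `decompositions`, `survives_pumping`, and `is_anbn` are made up for this illustration, the language of words `a^k b^k` is an assumed example, and the value of `n` is simply passed in. Checking only `i = 0..max_i` can refute a decomposition, but it can never show that pumping works for **every** `i`.

```python
def pump(u, x, y, z, v, i):
    """Build u x^i y z^i v."""
    return u + x * i + y + z * i + v


def decompositions(w, n):
    """Yield every (u, x, y, z, v) with w = uxyzv, |xyz| < n and |xz| > 0."""
    length = len(w)
    for a in range(length + 1):                 # u = w[:a]
        for b in range(a, length + 1):          # x = w[a:b]
            for c in range(b, length + 1):      # y = w[b:c]
                for d in range(c, length + 1):  # z = w[c:d], v = w[d:]
                    if d - a < n and (b - a) + (d - c) > 0:
                        yield w[:a], w[a:b], w[b:c], w[c:d], w[d:]


def survives_pumping(w, n, in_language, max_i=3):
    """Return one decomposition whose pumped words stay in the language
    for i = 0..max_i, or None if every decomposition fails."""
    for parts in decompositions(w, n):
        if all(in_language(pump(*parts, i)) for i in range(max_i + 1)):
            return parts
    return None


def is_anbn(s):
    """Membership test for the assumed example language of words a^k b^k."""
    k = len(s) // 2
    return s == "a" * k + "b" * k


print(survives_pumping("aaabbb", 4, is_anbn))
```

A result of `None` for some word and some `n` means that no decomposition of that particular word works for that particular `n`; the lemma itself talks about **some** `n` and **all** sufficiently long words.
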

---

# The Pumping Lemma for CFLs

---

# Application

* As before, we will not use this to prove that a language is context-free
* Instead, we will use its contrapositive to prove that a language is **not** context-free
* If the pumping lemma does not hold for a language, it is not context-free
* Note: If the pumping lemma **does** hold, that alone is no guarantee that a language is context-free!

---

# Example

Show: `\(L = \{a^j\cdot b^j \cdot c^j | j\in \mathbb{N} \}\)` is not context-free

* Let us assume it were a context-free language
* This means that the pumping lemma would hold!
* We will show that this leads to a contradiction

---

# Pumping Lemma

There exists an n, such that, for every word `\(w \in L\)` with length at least n:

* `\(w = u\cdot x\cdot y\cdot z\cdot v\)` where `\(|x\cdot y \cdot z| < n\)`
* `\(|x\cdot z| > 0\)`
* `\(u\cdot x^i \cdot y \cdot z^i \cdot v \in L\)` for any `\(i \ge 0\)`

So this must be the case for `\(a^n b^n c^n\)`.

---

# Pumping

$$
w = a^n b^n c^n = u\cdot x\cdot y\cdot z\cdot v\\\\
|x\cdot y \cdot z| < n
$$

* This means `\(x \cdot y \cdot z\)` can **not** contain `a`s, `b`s, **and** `c`s: a substring containing all three would have to span all n `b`s
* But then `\(u\cdot x^2 \cdot y \cdot z^2 \cdot v\)` gains at least one letter (because `\(|x\cdot z| > 0\)`), but **not** `a`s, `b`s **and** `c`s, creating an imbalance
* The resulting word would not be in the language, which is a contradiction
* Therefore, `\(L = \{a^j\cdot b^j \cdot c^j | j\in \mathbb{N} \}\)` is not context-free

---
class: mmedium

# Pumping Palindromes

* Why does the pumping lemma work for palindromes?
* `\(u\)` and `\(v\)` are not restricted, so we can take `\(x \cdot y \cdot z\)` from anywhere within a word!
* Say our palindrome has length at least 3: Then we can choose `y` to be the center character if the palindrome has odd length, and the empty word otherwise; `x` and `z` are the characters directly before and after it
* For example: `\(w = 101101101\)`, then `\(u = 101, x = 1, y = 0, z = 1, v = 101\)`, and adding more `\(x\)` **and** `\(z\)` still produces a palindrome, e.g. 101 111 0 111 101

---

# Another Question

Is this valid C++?

```C++
int main()
{
    MyType<1,2,3>::SubType x;
    return 0;
}
```

This is actually **impossible** to determine in the general case!

* Next time we will briefly talk about general grammars, and how grammars can be used for computation
* We will also briefly discuss Turing machines