class: center, middle

# CS-3110: Formal Languages and Automata

## Languages

### Chapter 3.1

---
class: center, middle

# Example 1: Language Definitions

---
class: mmedium1

## Language Definitions

Define the following languages using the *set notation* from class/the textbook, without any continuation dots (...), i.e. similar to how we defined the language of all palindromes, or the language of all even binary numbers:

* The language of all words with length at least 2 over the alphabet `\(\Sigma = \{a,b,c\}\)` that start and end with the same character
* The language of all words over the alphabet `\(\Sigma = \{0, 1\}\)` that represent binary numbers (without leading zeroes) that are palindromes.
* The language of all words over the alphabet `\(\Sigma = \{a,b,c\}\)` that start and end with the same number of (distinct!) `\(a\)`s
* The language of all words representing binary numbers (without leading zeroes) that are divisible by 3, over the alphabet `\(\Sigma = \{0, 1\}\)`.

---

# Language 1

The language of all words with length at least 2 over the alphabet `\(\Sigma = \{a,b,c\}\)` that start and end with the same character

* There are 3 characters, so there are really only three options
* "options" sounds like "or"
* "or" sounds like "set union"; recall: `\(x \in (A \cup B) \equiv x \in A \vee x \in B\)`
* We just treat the three options separately and "or" them together

---

# Language 1

The language of all words with length at least 2 over the alphabet `\(\Sigma = \{a,b,c\}\)` that start and end with the same character

$$ L_1 = (a\Sigma^\ast{}a) \cup (b\Sigma^\ast{}b) \cup (c\Sigma^\ast{}c) $$

---

# Language 1: Verification

$$ L_1 = (a\Sigma^\ast{}a) \cup (b\Sigma^\ast{}b) \cup (c\Sigma^\ast{}c) $$

Let's check some words (**this is not a proof**):

* `aa` should be valid
* `a` should not be valid
* `bcaaaaaab` should be valid
* `bcaaaaa` should not be valid

---

# Language 2

The language of all words over the alphabet `\(\Sigma = \{0, 1\}\)` that represent binary numbers (without leading
zeroes) that are palindromes.

* The first cases to consider are the numbers 0 and 1, both of which are palindromes
* All other valid words start with a 1 (no leading zeroes)
* Because they are palindromes, they also have to end with a (second) 1
* In the middle we can have *any* palindrome

---

# Language 2

The language of all words over the alphabet `\(\Sigma = \{0, 1\}\)` that represent binary numbers (without leading zeroes) that are palindromes.

$$ L_2 = \\{0, 1\\} \cup \\{1w1 | w \in \Sigma^* \wedge w = w^R\\} $$

---

# Language 2

$$ L_2 = \\{0, 1\\} \cup \\{1w1 | w \in \Sigma^* \wedge w = w^R\\} $$

Let's check some words (**this is not a proof**):

* `1` should be valid
* `1011` should not be valid
* `1111` should be valid
* `10011` should not be valid

---

# Language 3

The language of all words over the alphabet `\(\Sigma = \{a,b,c\}\)` that start and end with the same number of (distinct!) `\(a\)`s

* We need a way to "count" how many `\(a\)`s there are at the start of the word to make sure the same number shows up at the end
* In between we can have any characters
* Except: Right after our memorized number of `\(a\)`s, and before the terminating `\(a\)`s there can't be any additional `\(a\)`s sneaking in that we don't count

---

# Language 3

A first draft:

$$ L_3^? = \\{a^nxa^n | x \in \Sigma^* \wedge n \in \mathbb{N} \\} $$

With this we have:

* (Any) `\(n\)` `a`s
* Something in between
* The **same** `\(n\)` `a`s

---

# Language 3

The language of all words over the alphabet `\(\Sigma = \{a,b,c\}\)` that start and end with the same number of (distinct!) `\(a\)`s

$$ L_3^? = \\{a^nxa^n | x \in \Sigma^* \wedge n \in \mathbb{N} \\} $$

Words accepted by this draft:

$$ aabbaa (n=2)\\\\ bbcc (n = 0)\\\\ aaabba (n=1) $$

Oops...
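
---

# Language 3: Checking the Draft

To see why the draft is too permissive, here is a minimal Python sketch (not part of the original slides) that lists every `\(n\)` for which a word fits the pattern `\(a^nxa^n\)`. It assumes `\(0 \in \mathbb{N}\)`, as the `bbcc` example does; the helper name `draft_witnesses` is made up for illustration.

```python
def draft_witnesses(word: str) -> list[int]:
    """Return every n such that word = a^n x a^n for some x over {a, b, c}."""
    if any(ch not in "abc" for ch in word):
        return []  # not a word over Sigma at all
    witnesses = []
    # The two blocks of n a's must not overlap, so n is at most len(word) // 2.
    for n in range(len(word) // 2 + 1):
        block = "a" * n
        if word.startswith(block) and word.endswith(block):
            witnesses.append(n)
    return witnesses


for w in ["aabbaa", "bbcc", "aaabba"]:
    print(w, draft_witnesses(w))
```

Because `\(n = 0\)` always fits (take `\(x\)` to be the whole word), the draft accepts every word over `\(\Sigma\)`, including `aaabba`.
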
---

# Language 3

Recall: Right after our memorized number of `\(a\)`s, and before the terminating `\(a\)`s there can't be any additional `\(a\)`s sneaking in that we don't count (that's exactly what just happened)

But: `\(aabacaa\)` **is** a valid word, so we cannot just "ban" `a`s in the middle

There are two cases:

* There are no center `a`s
* There are center `a`s

---

# Language 3

Without any center `a`s:

$$ L_a = \\{a^nxa^n | x \in \\{b,c\\}^\ast \wedge n \in \mathbb{N}\\} $$

With (possible) center `a`s:

$$ L_b = \\{a^npxqa^n | p\in \\{b,c\\} \wedge q \in \\{b,c\\} \wedge x \in \Sigma^\ast \wedge n \in \mathbb{N}\\} $$

And then:

$$ L_3 = L_a \cup L_b $$

---

# Language 3

$$
\begin{aligned}
L_3 =&\:\\{a^nxa^n | x \in \\{b,c\\}^\ast \wedge n \in \mathbb{N}\\} \cup \\\\
&\:\\{a^npxqa^n | p\in \\{b,c\\} \wedge q \in \\{b,c\\} \wedge x \in \Sigma^\ast \wedge n \in \mathbb{N}\\}
\end{aligned}
$$

Let's check some words (**this is not a proof**):

* `aaaa` should be valid
* `aaaaa` should not be valid
* `aabacaa` should be valid
* `aabaa` should be valid

---
class: medium

# Language 4

The language of all words representing binary numbers (without leading zeroes) that are divisible by 3, over the alphabet `\(\Sigma = \{0, 1\}\)`

* What "structure" do binary numbers that are divisible by 3 have?
* Maybe we can investigate what happens when we construct them digit by digit
* If we have a binary number, and we add a 0 to the end, we double the value
* If we have a binary number, and we add a 1 to the end, we double the value and add one

---

# Language 4

Let us analyze what happens to binary numbers regarding divisibility by 3 when we add digits. Three cases:

* The number has remainder 0 when divided by 3 (those are the numbers that we want)
* The number has remainder 1 when divided by 3
* The number has remainder 2 when divided by 3

---

# Case 1

Let's start with numbers that are already divisible by 3.

$$ n = 3k \equiv 0 \mod 3 $$

If we add a 0, we double the number:

$$ 2n = 6k = 3 (2k) \equiv 0 \mod 3 $$

If we add a 1, we double and add 1:

$$ 2n + 1 = 6k + 1 = 3(2k) +1 \equiv 1 \mod 3 $$

---

# Case 2

Next, numbers that have remainder 1 when divided by 3.

$$ n = 3k + 1 \equiv 1 \mod 3 $$

If we add a 0, we double the number:

$$ 2n = 6k + 2 = 3 (2k) +2 \equiv 2 \mod 3 $$

If we add a 1, we double and add 1:

$$ 2n + 1 = 6k + 2 + 1 = 3(2k + 1) \equiv 0 \mod 3 $$

---

# Case 3

Finally, numbers that have remainder 2 when divided by 3.

$$ n = 3k + 2 \equiv 2 \mod 3 $$

If we add a 0, we double the number:

$$ 2n = 6k + 4 = 3 (2k + 1) + 1 \equiv 1 \mod 3 $$

If we add a 1, we double and add 1:

$$ 2n + 1 = 6k + 4 + 1 = 3(2k + 1) + 2 \equiv 2 \mod 3 $$

---

# Summary

| Remainder | Add a 0 | Add a 1 |
|-----------|---------|---------|
| 0         | 0       | 1       |
| 1         | 2       | 0       |
| 2         | 1       | 2       |

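
The table can be read as a transition function: start at remainder 0 and look up the next remainder for each digit. The following is a minimal Python sketch of that idea, not part of the original slides; the names `STEP` and `remainder_mod_3` are made up for illustration, and the input is assumed to contain only the characters `0` and `1`.

```python
# Transition table from the summary: STEP[remainder][digit] -> new remainder.
STEP = {
    0: {"0": 0, "1": 1},
    1: {"0": 2, "1": 0},
    2: {"0": 1, "1": 2},
}


def remainder_mod_3(word: str) -> int:
    """Read a binary word digit by digit and track its remainder modulo 3."""
    remainder = 0  # the empty prefix has value 0
    for digit in word:
        remainder = STEP[remainder][digit]
    return remainder


# Every table entry agrees with "double, then add the digit".
assert all(STEP[r][d] == (2 * r + int(d)) % 3 for r in STEP for d in "01")

for word in ["1111", "11110", "11111", "10111"]:
    print(word, int(word, 2), remainder_mod_3(word))
```

A word represents a number divisible by 3 exactly when this walk ends at remainder 0.
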
And now?

--

First observation: Our numbers have to start with a 1 (except for the number 0 itself)

Second observation: If we have a number that has remainder 1, the **only** way to get remainder 0 is to append a 1

Third observation: Since our number starts with a 1 (which has remainder 1), it has to contain another 1 before the end

Fourth observation: If we have a number with remainder 0, we can append as many 0s as we want

---

# Language 4

A first draft:

$$ L_4 = 1???10^* $$

What can happen in the middle? Let's look at a few patterns

---

# Language 4

| Remainder | Add a 0 | Add a 1 |
|-----------|---------|---------|
| 0         | 0       | 1       |
| 1         | 2       | 0       |
| 2         | 1       | 2       |

If we have a number with remainder 1, and we append a `0`, it will have remainder 2. Any `1`s we append then will not change anything, so we need a `0` to get back to remainder 1: We can add `\(01^*0\)`

If we add a 1, we get to remainder 0.

If we have remainder 0, and we add a 1 again, we are at remainder 1, which is where we started (our first digit is a 1!)

---

# Language 4

$$ L_4 = \\{0\\} \cup 1(01^\ast0)^\ast10^\ast (1(01^\ast0)^\ast10^\ast)^\ast $$

Let's look at this part again:

$$ (1(01^\ast0)^\ast10^\ast)^\ast $$

* Has to start with a 1 (remainder 1)
* The last 1 returns us to remainder 0
* The sequence `\(01^\ast0\)` takes us to remainder 2 and back
* At the end we have a number with remainder 0, and we can just start over!
* We just had to make sure that we go through this entire thing at least once!

---

# Language 4

$$ L_4 = \\{0\\} \cup 1(01^\ast0)^\ast10^\ast (1(01^\ast0)^\ast10^\ast)^\ast $$

Let's check some words (**this is not a proof**):

* `1111` (15) should be valid
* `11110` (30) should be valid
* `11111` (31) should not be valid
* `10111` (23) should not be valid

---
class: medium

# Language 4

How would you go about **proving** that this is correct?

Mathematical Induction!

* n = the length of the word
* Prove: Given a word of length n with remainder 0,1,2, adding a digit 0 or 1 will result in a correctly recognized word
* In other words, you have to look at all 6 cases (current remainder + newly added digit), and prove that when the remainder becomes 0 by adding a digit the word will be accepted, and if the remainder becomes non-zero it will not be

---
class: center, middle

# Example 2: Proof

---

# Example 2

Using the rules of set theory, show that the set E given below is a proper subset of `\(\Sigma^*\)`

$$ E = 1 \Sigma^* 0 $$

(This is a part of our even binary numbers set)

i.e. show that:

$$ E \subset \Sigma^\ast $$

---

# What do we need to prove?

"proper subset" means:

* E is a subset of `\(\Sigma^*\)`, i.e.
  every element in E also has to be an element of `\(\Sigma^*\)`
* It is a **proper** subset, i.e. there are elements in `\(\Sigma^*\)` that are not in E

We will show these two properties one after the other

---

# E is a subset

What are the elements of E?

$$ E = 1 \Sigma^\ast 0 = \\{1x0 | x \in \Sigma^\ast\\} $$

Therefore we need to show that `\(1x0\)` is also in `\(\Sigma^*\)`.

$$ \Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots $$

What are the elements of `\(\Sigma^*\)`?

---

# Splitting up `\(\Sigma^n\)`

Let us take an arbitrary `\(\Sigma^n\)` with `\(n \ge 2\)`

$$
\begin{aligned}
\Sigma^n =&\:\Sigma\Sigma^{n-2}\Sigma\\\\
\Sigma^\ast =&\: \Sigma^0 \cup \Sigma \cup \Sigma\Sigma^0\Sigma \cup \Sigma\Sigma^1\Sigma \cup \cdots \\\\
\Sigma^\ast =&\: \Sigma^0 \cup \Sigma \cup \Sigma\Sigma^\ast\Sigma
\end{aligned}
$$

And with this, for any `\(\sigma,\rho \in \Sigma\)`:

$$ x \in \Sigma^\ast \rightarrow \sigma x \rho \in \Sigma^\ast $$

And this also holds for our concrete alphabet:

$$ x \in \Sigma^\ast \rightarrow 1x0 \in \Sigma^\ast $$

---

# E is a subset

What are the elements of E?

$$ E = 1 \Sigma^\ast 0 = \\{1x0 | x \in \Sigma^\ast\\} $$

We just showed that when `\(x \in \Sigma^\ast\)`, then `\(1x0\)` is also in `\(\Sigma^*\)`.

---

# The subset is proper

* We still have to show that E is a **proper** subset of `\(\Sigma^*\)`
* A subset is proper if the two sets are not equal, i.e. if **there is** an element in `\(\Sigma^*\)` that is not in E
* We just need to find one!
* Note: This is a valid proof, even though we only give one example

---

# Counterexample

Let us take the word `\(0\)`. It is an element of our alphabet `\(\Sigma\)` and therefore also in `\(\Sigma^\ast\)`

However, it is not an element of E:

$$ E = 1 \Sigma^* 0 $$

All elements of E start with a 1 (and end with a 0). Therefore the subset is proper.

(There are actually infinitely many counterexamples, but we just needed to find one)
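
---

# Example 2: Sanity Check

Both properties can also be spot-checked by enumerating short words. This is a minimal Python sketch, not part of the original slides and **not a proof**, since it only looks at finitely many words. It assumes `\(\Sigma = \{0, 1\}\)`; the helper names `words_up_to`, `in_sigma_star` and `in_E` are made up for illustration.

```python
from itertools import product

SIGMA = "01"


def words_up_to(max_len: int):
    """Yield all words over SIGMA with length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(SIGMA, repeat=n):
            yield "".join(letters)


def in_sigma_star(word: str) -> bool:
    """A word is in Sigma* if every character comes from SIGMA."""
    return all(ch in SIGMA for ch in word)


def in_E(word: str) -> bool:
    """A word is in E if it has the form 1x0 for some x in Sigma*."""
    return (len(word) >= 2 and in_sigma_star(word)
            and word[0] == "1" and word[-1] == "0")


# Subset: every word 1x0 built from a short x is a word over SIGMA.
assert all(in_sigma_star("1" + x + "0") for x in words_up_to(6))

# Proper: collect the short words of Sigma* that are not in E.
outside = [w for w in words_up_to(2) if not in_E(w)]
print(outside)
```

The word `0` from the counterexample slide appears in `outside`, together with other short words that do not start with a 1 or do not end with a 0.
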