Presentation: Languages

# CS-3110: Formal Languages and Automata

## Languages

### Chapter 3.1

---

# Example 1: Language Definitions

---

## Language Definitions

Define the following languages using the *set notation* from class/the textbook, without any continuation dots (...), i.e. similar to how we defined the language of all palindromes, or the language of all even binary numbers:

* The language of all words with length at least 2 over the alphabet `$\Sigma = \{a,b,c\}$` that start and end with the same character

* The language of all words over the alphabet `$\Sigma = \{0, 1\}$` that represent binary numbers (without leading zeroes) that are palindromes.

* The language of all words over the alphabet `$\Sigma = \{a,b,c\}$` that start and end with the same number of (distinct!) `$a$`s

* The language of all words representing binary numbers (without leading zeroes) that are divisible by 3, over the alphabet `$\Sigma = \{0, 1\}$`.

---

# Language 1

The language of all words with length at least 2 over the alphabet `$\Sigma = \{a,b,c\}$` that start and end with the same character

* There are 3 characters, so there are really only three options

* "options" sounds like "or"

* "or" sounds like "set union"; recall: `$x \in (A \cup B) \equiv x \in A \vee x \in B$`

* We just treat the three options separately and "or" them together

---

# Language 1

The language of all words with length at least 2 over the alphabet `$\Sigma = \{a,b,c\}$` that start and end with the same character

$$
L_1 = (a\Sigma^\ast{}a) \cup (b\Sigma^\ast{}b) \cup (c\Sigma^\ast{}c)
$$

---

# Language 1: Verification

$$
L_1 = (a\Sigma^\ast{}a) \cup (b\Sigma^\ast{}b) \cup (c\Sigma^\ast{}c)
$$

Let's check some words (**this is not a proof**):

* `aa` should be valid

* `a` should not be valid

* `bcaaaaaab` should be valid

* `bcaaaaa` should not be valid
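
These four checks can also be run mechanically. A minimal sketch, assuming we translate `$L_1$` into the syntax of Python's standard `re` module; the names `L1_PATTERN` and `in_L1` are ours and do not come from the textbook:

```python
import re

# L_1 = (a Σ* a) ∪ (b Σ* b) ∪ (c Σ* c), with Σ = {a, b, c}.
# The set union becomes "|", and Σ* becomes the character class [abc]*.
L1_PATTERN = re.compile(r"a[abc]*a|b[abc]*b|c[abc]*c")

def in_L1(word: str) -> bool:
    """Return True if the whole word matches the definition of L_1."""
    return L1_PATTERN.fullmatch(word) is not None

for word in ["aa", "a", "bcaaaaaab", "bcaaaaa"]:
    print(word, in_L1(word))
```

Like the manual checks, this only tests individual words and is not a proof.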

---

# Language 2

The language of all words over the alphabet `$\Sigma = \{0, 1\}$` that represent binary numbers (without leading zeroes) that are palindromes.

* The first case to consider is the single-digit numbers 0 and 1, both of which are palindromes

* All other valid words start with a 1 (no leading zeroes)

* Because they are palindromes, they also have to end with a (second) 1

* In the middle we can have *any* palindrome

---

# Language 2

The language of all words over the alphabet `$\Sigma = \{0, 1\}$` that represent binary numbers (without leading zeroes) that are palindromes.

$$
L_2 = \\{0, 1\\} \cup \\{1w1 | w \in \Sigma^* \wedge w = w^R\\}
$$

---

# Language 2

$$
L_2 = \\{0, 1\\} \cup \\{1w1 | w \in \Sigma^* \wedge w = w^R\\}
$$

Let's check some words (**this is not a proof**):

* `1` should be valid

* `1011` should not be valid

* `1111` should be valid

* `10011` should not be valid
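
We can go one step further than checking four words: the sketch below compares the set definition against the informal description for every word up to a fixed length. The function names and the length bound of 10 are our own choices, and an exhaustive check up to a bound is still not a proof.

```python
from itertools import product

def in_L2(word: str) -> bool:
    """Membership test that follows the set definition of L_2 literally."""
    if word in ("0", "1"):
        return True
    if len(word) >= 2 and word[0] == "1" and word[-1] == "1":
        middle = word[1:-1]            # this is w in 1w1
        return middle == middle[::-1]  # w = w^R
    return False

def is_binary_palindrome(word: str) -> bool:
    """The informal description: no leading zeroes, reads the same backwards."""
    if word == "":
        return False
    no_leading_zero = word == "0" or word[0] == "1"
    return no_leading_zero and word == word[::-1]

# Compare both predicates on every word over {0, 1} up to length 10.
for length in range(11):
    for letters in product("01", repeat=length):
        word = "".join(letters)
        assert in_L2(word) == is_binary_palindrome(word), word
```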

---

# Language 3

The language of all words over the alphabet `$\Sigma = \{a,b,c\}$` that start and end with the same number of (distinct!) `$a$`s

* We need a way to "count" how many `$a$`s there are at the start of the word to make sure the same number shows up at the end

* In between we can have any characters

* Except: Right after our memorized number of `$a$`s, and before the terminating `$a$`s, there can't be any additional `$a$`s sneaking in that we don't count
 
---

# Language 3

A first draft:

$$
L_3^? = \\{a^nxa^n | x \in \Sigma^* \wedge n \in \mathbb{N} \\}
$$

With this we have:

* (Any) `$n$` `a`s

* Something in between

* The **same** `$n$` `a`s
  
---

# Language 3

The language of all words over the alphabet `$\Sigma = \{a,b,c\}$` that start and end with the same number of (distinct!) `$a$`s

$$
L_3^? = \\{a^nxa^n | x \in \Sigma^* \wedge n \in \mathbb{N} \\}
$$

Words that this draft accepts:

$$
aabbaa (n=2)\\\\
bbcc (n = 0)\\\\
aaabba (n=1) 
$$

Oops...
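
To see how permissive the draft is, we can list every `$n$` for which a word splits as `$a^nxa^n$`. A minimal sketch, assuming `$\mathbb{N}$` includes 0 (as in the `bbcc` example) and that the input only uses characters from `$\Sigma$`; the helper name `draft_witnesses` is hypothetical:

```python
from typing import List

def draft_witnesses(word: str) -> List[int]:
    """All n such that word = a^n x a^n for some x over {a, b, c}."""
    witnesses = []
    for n in range(len(word) // 2 + 1):
        block = "a" * n
        # x is simply whatever remains between the two blocks of a's.
        if word.startswith(block) and word.endswith(block):
            witnesses.append(n)
    return witnesses

for word in ["aabbaa", "bbcc", "aaabba"]:
    print(word, draft_witnesses(word))
```

Since `n = 0` works for every word, the draft places no real constraint on the word at all.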

---

# Language 3

Recall: Right after our memorized number of `$a$`s, and before the terminating `$a$`s there can't be any additional `$a$`s sneaking in that we don't count (that's exactly what just happened)

But: `$aabacaa$` **is** a valid word, so we cannot just "ban" `a`s in the middle

There are two cases:

* There are no center `a`s

* There are center `a`s

---

# Language 3

Without any center `a`s:

$$
L_a = \\{a^nxa^n | x \in \\{b,c\\}^\ast \wedge n \in \mathbb{N}\\}
$$

With (possible) center `a`s:
$$
L_b = \\{a^npxqa^n | p\in \\{b,c\\} \wedge q \in \\{b,c\\} \wedge x \in \Sigma^\ast \wedge  n \in \mathbb{N}\\}
$$

And then:
$$
L_3 = L_a \cup L_b
$$

---

# Language 3

$$
\begin{aligned}
L_3 =&\:\\{a^nxa^n | x \in \\{b,c\\}^\ast \wedge n \in \mathbb{N}\\} \cup \\\\
     &\:\\{a^npxqa^n | p\in \\{b,c\\} \wedge q \in \\{b,c\\} \wedge x \in \Sigma^\ast \wedge  n \in \mathbb{N}\\}
\end{aligned}
$$

Let's check some words (**this is not a proof**):

* `aaaa` should be valid

* `aaaaa` should not be valid

* `aabacaa` should be valid

* `aabaa` should be valid
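
The same checks as a small program. This is a sketch under our own assumptions: `in_L3` searches for a suitable `$n$` exactly as the two set definitions demand, and `same_outer_as` is our reading of the informal description (a word consisting only of `a`s counts as valid when it splits into two equal halves). Inputs are assumed to use only characters from `$\Sigma$`.

```python
from itertools import product

def in_L3(word: str) -> bool:
    """Search for an n with word = a^n + middle + a^n as in L_a or L_b."""
    for n in range(len(word) // 2 + 1):
        block = "a" * n
        if not (word.startswith(block) and word.endswith(block)):
            continue
        middle = word[n:len(word) - n]
        in_La = all(ch in "bc" for ch in middle)   # middle = x, x in {b,c}*
        in_Lb = (len(middle) >= 2                  # middle = p x q
                 and middle[0] in "bc" and middle[-1] in "bc")
        if in_La or in_Lb:
            return True
    return False

def same_outer_as(word: str) -> bool:
    """Informal description: equally many a's at the start and at the end."""
    if word.strip("a") == "":          # only a's: needs two equal halves
        return len(word) % 2 == 0
    leading = len(word) - len(word.lstrip("a"))
    trailing = len(word) - len(word.rstrip("a"))
    return leading == trailing

# Compare both predicates on every word over {a, b, c} up to length 7.
for length in range(8):
    for letters in product("abc", repeat=length):
        word = "".join(letters)
        assert in_L3(word) == same_outer_as(word), word
```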

---

# Language 4

The language of all words representing binary numbers (without leading zeroes) that are divisible by 3, over the alphabet `$\Sigma = \{0, 1\}$`

* What "structure" do binary numbers that are divisible by 3 have?

* Maybe we can investigate what happens when we construct them digit by digit

* If we have a binary number, and we add a 0 to the end we double the value

* If we have a binary number, and we add a 1 to the end, we double the value and add one
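
The last two bullets are easy to confirm on a few examples. A small sketch using Python's built-in `int(word, 2)` to read a binary word; the helper name `value` and the sample words are ours:

```python
def value(word: str) -> int:
    """Value of a non-empty binary word, e.g. value("110") == 6."""
    return int(word, 2)

for word in ["1", "10", "110", "1011"]:
    assert value(word + "0") == 2 * value(word)       # append 0: double
    assert value(word + "1") == 2 * value(word) + 1   # append 1: double, add one
```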

---

# Language 4

Let us analyze what happens to binary numbers regarding divisibility by 3 when we add digits.

Three cases:

* The number has remainder 0 when divided by 3 (those are the numbers that we want)

* The number has remainder 1 when divided by 3

* The number has remainder 2 when divided by 3
 
---

# Case 1

Let's start with numbers that are already divisible by 3.

$$
n = 3k \equiv 0 \mod 3
$$

If we add a 0, we double the number:

$$
2n = 6k = 3 (2k) \equiv 0 \mod 3
$$

If we add a 1, we double and add 1:
$$
2n + 1 = 6k + 1 = 3(2k) +1 \equiv 1 \mod 3
$$

---

# Case 2

Next, numbers that have remainder 1 when divided by 3.

$$
n = 3k + 1 \equiv 1 \mod 3
$$

If we add a 0, we double the number:

$$
2n = 6k + 2 = 3 (2k) +2 \equiv 2 \mod 3
$$

If we add a 1, we double and add 1:
$$
2n + 1 = 6k + 2 + 1 = 3(2k + 1) \equiv 0 \mod 3
$$

---

# Case 3

Finally, numbers that have remainder 2 when divided by 3.

$$
n = 3k + 2 \equiv 2 \mod 3
$$

If we add a 0, we double the number:

$$
2n = 6k + 4 = 3 (2k + 1) + 1 \equiv 1 \mod 3
$$

If we add a 1, we double and add 1:
$$
2n + 1 = 6k + 4 + 1 = 3(2k + 1) + 2 \equiv 2 \mod 3
$$

---

# Summary

<table width="60%" border="1">
<tr><th>Current remainder</th><th>After appending 0</th><th>After appending 1</th></tr>
<tr><td>0</td><td>0</td><td>1</td></tr>
<tr><td>1</td><td>2</td><td>0</td></tr>
<tr><td>2</td><td>1</td><td>2</td></tr>
</table>
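
The table can be recomputed from the rule "append digit `$d$`: new value is `$2n + d$`". A minimal sketch; the names `next_remainder` and `TABLE` are ours, and the range of numbers we test against is an arbitrary choice:

```python
def next_remainder(remainder: int, digit: int) -> int:
    """Remainder mod 3 after appending one binary digit."""
    return (2 * remainder + digit) % 3

# One entry per (current remainder, appended digit) pair: the six cases.
TABLE = {(r, d): next_remainder(r, d) for r in range(3) for d in (0, 1)}

# Sanity check against actual numbers: only n mod 3 matters, not n itself.
for n in range(200):
    for d in (0, 1):
        assert (2 * n + d) % 3 == TABLE[(n % 3, d)]

for r in range(3):
    print(r, TABLE[(r, 0)], TABLE[(r, 1)])
```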
 
And now?

First observation: Our numbers have to start with a 1

Second observation: If we have a number that has remainder 1, the **only** way to get remainder 0 is to append a 1

Third observation: Since our number starts with a 1, it also has to have another 1 before the end

Fourth observation: If we have a number with remainder 0, we can append as many 0s as we want

---

# Language 4

A first draft:

$$
L_4 = 1???10^*
$$

What can happen in the middle?

Let's look at a few patterns

---

# Language 4

If we have a number with remainder 1, and we append a `0`, it will have remainder 2. Any `1`s we append then will not change anything, so we need a `0` to get back to remainder 1: We can add `$01^*0$`

If we instead append a `1` while at remainder 1, we get to remainder 0. If we then append another `1`, we are back at remainder 1, which is where we started (our first digit is a 1!)

---

# Language 4

$$
L_4 = \\{0\\} \cup 1(01^\ast0)^\ast10^\ast (1(01^\ast0)^\ast10^\ast)^\ast 
$$

Let's look at this part again:

$$
(1(01^\ast0)^\ast10^\ast)^\ast
$$

* Has to start with a 1 (remainder 1)
* The last 1 returns us to remainder 0
* The sequence `$01^\ast0$` takes us to remainder 2 and back 
* At the end we have a number with remainder 0, and we can just start over!
* We just had to make sure that we go through this entire thing at least once!

---

# Language 4

$$
L_4 = \\{0\\} \cup 1(01^\ast0)^\ast10^\ast (1(01^\ast0)^\ast10^\ast)^\ast 
$$

Let's check some words (**this is not a proof**):

* `1111` (15) should be valid

* `11110` (30) should be valid

* `11111` (31) should not be valid

* `10111` (23) should not be valid
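
Because `$L_4$` is written with concatenation, union and star only, it translates directly into a regular expression. The sketch below, assuming Python's standard `re` module and our own names `L4_PATTERN` and `in_L4`, compares the definition against plain arithmetic for the first 2000 numbers (an arbitrary bound, and still not a proof):

```python
import re

# {0} ∪ 1(01*0)*10* (1(01*0)*10*)*
L4_PATTERN = re.compile(r"0|1(01*0)*10*(1(01*0)*10*)*")

def in_L4(word: str) -> bool:
    """Return True if the whole word matches the definition of L_4."""
    return L4_PATTERN.fullmatch(word) is not None

for n in range(2000):
    word = format(n, "b")   # binary representation without leading zeroes
    assert in_L4(word) == (n % 3 == 0), word
```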

---

# Language 4

How would you go about **proving** that this is correct?

Mathematical Induction!

* n = the length of the word

* Base case: the words of length 1 (`0` and `1`) are classified correctly

* Induction step: given a word of length n with remainder 0, 1, or 2, adding a digit 0 or 1 results in a correctly classified word

* In other words, you have to look at all 6 cases (current remainder + newly added digit), and prove that when the remainder becomes 0 by adding a digit the word will be accepted, and if the remainder becomes non-zero it will not be
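
The invariant behind such an induction is "after reading a prefix, we know its value modulo 3". A sketch of that invariant as a program, with a hypothetical function name `remainder_after`; it tracks only the remainder, never the full number:

```python
def remainder_after(word: str) -> int:
    """Remainder mod 3 of the binary number spelled by word, read left to right."""
    remainder = 0                      # the empty prefix has value 0
    for digit in word:
        # Induction step: appending a digit doubles the value and adds the digit.
        remainder = (2 * remainder + int(digit)) % 3
    return remainder

# The invariant holds for every number we try (a test, not the proof itself).
for n in range(1000):
    assert remainder_after(format(n, "b")) == n % 3
```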

---

# Example 2: Proof

---

# Example 2

Using the rules of set theory, show that the set E given below is a proper subset of `$\Sigma^*$`

$$
E = 1 \Sigma^* 0
$$

(This is a part of our even binary numbers set)

i.e. show that:

$$
E \subset \Sigma^\ast
$$

---

# What do we need to prove?

"proper subset" means:

* E is a subset of `$\Sigma^*$`, i.e. every element in E also has to be an element of `$\Sigma^*$`

* It is a **proper** subset, i.e. there are elements in `$\Sigma^*$` that are not in E
  
We will show these two properties one after the other

---

# E is a subset

What are the elements of E?

$$
E = 1 \Sigma^\ast 0 = \\{1x0 | x \in \Sigma^\ast\\}
$$

Therefore we need to show that for every `$x \in \Sigma^\ast$`, the word `$1x0$` is also in `$\Sigma^*$`.

$$
\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \cdots 
$$

What are the elements of `$\Sigma^*$`?

---

# Splitting up `$\Sigma^n$`

Let us take an arbitrary `$\Sigma^n$` with `$n \ge 2$`

$$
\begin{aligned}
\Sigma^n =&\:\Sigma\Sigma^{n-2}\Sigma\\\\
\Sigma^\ast =&\: \Sigma^0 \cup \Sigma \cup \Sigma\Sigma^0\Sigma \cup \Sigma\Sigma^1\Sigma \cup \cdots \\\\
\Sigma^\ast =&\: \Sigma^0 \cup \Sigma \cup \Sigma\Sigma^\ast\Sigma 
\end{aligned}
$$

And with this, for any `$\sigma,\rho \in \Sigma$`:

$$
x \in \Sigma^\ast \rightarrow \sigma x \rho \in \Sigma^\ast
$$

And this also holds for our concrete alphabet:

$$
x \in \Sigma^\ast \rightarrow 1x0 \in \Sigma^\ast
$$
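
For small `$n$` the decomposition can be checked by enumerating the sets. A minimal sketch, assuming `$\Sigma = \{0, 1\}$` and our own helper names `power` and `concat`; it illustrates the identity for a few values of `$n$` and does not replace the argument above:

```python
from itertools import product

SIGMA = {"0", "1"}

def power(alphabet, n):
    """Sigma^n: the set of all words of length n over the alphabet."""
    return {"".join(letters) for letters in product(sorted(alphabet), repeat=n)}

def concat(left, right):
    """Concatenation of two languages: every u followed by every v."""
    return {u + v for u in left for v in right}

for n in range(2, 8):
    # Sigma^n = Sigma Sigma^(n-2) Sigma
    assert power(SIGMA, n) == concat(concat(SIGMA, power(SIGMA, n - 2)), SIGMA)
    # In particular, 1x0 has length n whenever x has length n - 2.
    for x in power(SIGMA, n - 2):
        assert "1" + x + "0" in power(SIGMA, n)
```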

---

# E is a subset

What are the elements of E?

$$
E = 1 \Sigma^\ast 0 = \\{1x0 | x \in \Sigma^\ast\\}
$$

We just showed that when `$x \in \Sigma^\ast$`, then `$1x0$` is also in `$\Sigma^*$`.

---

# The subset is proper

* We still have to show that E is a **proper** subset of `$\Sigma^*$`

* A subset is proper if the two sets are not equal, i.e. if **there is** an element in `$\Sigma^*$` that is not in E

* We just need to find one!

* Note: This is a complete proof even though we only give one example, because the claim is an existence statement ("there is an element ...")

---

# Counterexample

Let us take the word `$0$`.

It is an element of our alphabet `$\Sigma$` and therefore also in `$\Sigma^\ast$`

However, it is not an element of E:

$$
E = 1 \Sigma^* 0
$$

All elements of E start with a 1 (and end with a 0), but the word `$0$` does not start with a 1.

Therefore the subset is proper.

(There are actually infinitely many counterexamples, but we just needed to find one)
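
To list a few of these counterexamples, we can filter the short words over `$\Sigma$`. A sketch with our own names `in_E`, `short_words` and `witnesses`; the length bound of 2 is arbitrary:

```python
from itertools import product

def in_E(word: str) -> bool:
    """E = 1 Sigma* 0: starts with 1, ends with 0, so at least two characters."""
    return len(word) >= 2 and word[0] == "1" and word[-1] == "0"

# All words over {0, 1} of length 0, 1 and 2 (every one of them is in Sigma*).
short_words = ["".join(t) for n in range(3) for t in product("01", repeat=n)]

# Words in Sigma* that are not in E; any single one proves the subset is proper.
witnesses = [w for w in short_words if not in_E(w)]
print(witnesses)
```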