class: center, middle

# CS-3110: Formal Languages and Automata
## Automata and Languages
### Chapter 3.6

---

class: center, middle

# Regular Expressions and NFAs

---

class: mmedium

# Thompson's Construction

* Let's start by translating a regular expression to an NFA with `\(\varepsilon\)` transitions
* First we will construct NFAs for the most basic regular expressions
* Then we show how to represent regular expression operators as an assembly of NFAs representing their operands
* This approach is known as "Thompson's construction", named after Ken Thompson of Unix fame
* `\(\varepsilon\)` transitions make this construction easier, but are not strictly necessary

---

# Basic Building Blocks

First, let's look at the three atomic regular expressions:

1. The expression that does not match anything: `\(\Phi\)`
2. The expression matching the empty string: `\(\varepsilon\)`
3. The expression matching a single character: `a`

---

# The Non-Matching Expression

`\(L(\Phi) = \{ \}\)`

Here is an NFA that accepts no strings:
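Written out explicitly, such an NFA can be a single non-accepting start state with no transitions at all. A minimal sketch as a Python dictionary (the state name `q0`, the alphabet, and the dictionary layout are illustrative choices, not fixed by the construction):

```python
# NFA for the non-matching expression: no accepting states and no transitions,
# so no string (not even the empty one) is accepted.
NO_MATCH_NFA = {
    "states": {"q0"},
    "alphabet": {"a", "b"},   # any alphabet works here
    "delta": {},              # no transitions at all
    "start": "q0",
    "accepting": set(),       # empty set of accepting states => L = {}
}
```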
---

# The Empty String Expression

`\(L(\varepsilon) = \{ \varepsilon \}\)`

Here is an NFA that accepts the empty string:
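In the same illustrative encoding: the start state itself is accepting and there are no transitions, so only the empty string is accepted.

```python
# NFA for the empty-string expression: accepting start state, no transitions.
EPSILON_NFA = {
    "states": {"q0"},
    "alphabet": {"a", "b"},
    "delta": {},
    "start": "q0",
    "accepting": {"q0"},      # accepts exactly the empty string
}
```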
---

# The Single-Character Expression

`\(L(a) = \{ a \}\)`

Here is an NFA that accepts the string `a`:
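And once more in the same illustrative encoding: one transition on `a` from the start state to the only accepting state.

```python
# NFA for the single-character expression a: accepts exactly the string "a".
SYMBOL_A_NFA = {
    "states": {"q0", "q1"},
    "alphabet": {"a", "b"},
    "delta": {("q0", "a"): {"q1"}},   # the only transition
    "start": "q0",
    "accepting": {"q1"},
}
```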
---

# Operators

We have four operators:

- An **or** of two regular expressions: `\(x|y\)`
- A **concatenation** of two regular expressions: `\(xy\)`
- A **repetition** of a regular expression: `\(x^\ast\)`
- **Parentheses** to control which operands operators apply to: `\((x)\)`

Note: `x` and `y` can be more complex regular expressions!

---

# Dividing a regular expression into its parts

.left-column[
$$ (0|(01^\ast{}0))^\ast $$
]
.right-column[
]

---

# Parentheses define the order

.left-column[
$$ (a|b)^\ast $$
$$ a|(b)^\ast $$
]
.right-column[
]

---

# Assembling the NFA

* We start at the bottom of our tree
* Each leaf corresponds to one of our three basic building block types
* We place the NFA corresponding to each leaf
* We then apply rules to these NFAs that correspond to the operators, resulting in merged/larger NFAs
* We repeat this process until we reach the root of the tree and only have one combined NFA

---

# Rules?

For each operator we have a rule that takes one automaton (repetition) or two (concatenation, or) and combines them
---

# The **Or** operation
---

# The **concatenation** operation
---

# The **repetition** operation
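Taken together, the three rules translate almost directly into code. The following is a minimal sketch of Thompson's construction, not a reference implementation: each fragment has exactly one start and one accept state, `None` stands for an `\(\varepsilon\)`-transition, and all names (`NFAFrag`, `fresh`, `sym`, ...) are invented for illustration.

```python
import itertools
from dataclasses import dataclass

_ids = itertools.count()

def fresh():
    """Return a new, previously unused state name."""
    return f"s{next(_ids)}"

@dataclass
class NFAFrag:
    start: str
    accept: str   # Thompson fragments have exactly one accepting state
    edges: list   # (state, symbol, state); symbol None means an epsilon move

def sym(a):
    """NFA for the single-character expression a."""
    s, f = fresh(), fresh()
    return NFAFrag(s, f, [(s, a, f)])

def union(x, y):
    """x|y: a new start with epsilon edges into both operands, and
    epsilon edges from both accept states to a new accept state."""
    s, f = fresh(), fresh()
    return NFAFrag(s, f, x.edges + y.edges + [
        (s, None, x.start), (s, None, y.start),
        (x.accept, None, f), (y.accept, None, f),
    ])

def concat(x, y):
    """xy: an epsilon edge from x's accept state to y's start state."""
    return NFAFrag(x.start, y.accept,
                   x.edges + y.edges + [(x.accept, None, y.start)])

def star(x):
    """x*: loop back from accept to start, and allow skipping x entirely."""
    s, f = fresh(), fresh()
    return NFAFrag(s, f, x.edges + [
        (s, None, x.start), (x.accept, None, x.start),
        (s, None, f), (x.accept, None, f),
    ])
```

For example, `star(union(sym("0"), concat(sym("0"), concat(star(sym("1")), sym("0")))))` assembles an NFA for `\((0|(01^\ast{}0))^\ast\)`, mirroring the expression tree from the earlier slide.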
---

# Example

---

# Example

---

# Example

---

# Example

---

# Example

---

# Example

---

# Example
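Once a full NFA has been assembled, we can already check whether it accepts a word by tracking the set of states reachable through `\(\varepsilon\)`-moves. A self-contained sketch (the edge-triple format matches the combinator sketch above; all names are illustrative):

```python
def eps_closure(states, edges):
    """All states reachable from `states` using only epsilon (None) edges."""
    seen, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for src, lbl, dst in edges:
            if src == q and lbl is None and dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return seen

def nfa_accepts(start, accepting, edges, word):
    """Simulate an epsilon-NFA given as (state, symbol, state) edge triples."""
    current = eps_closure({start}, edges)
    for c in word:
        step = {dst for src, lbl, dst in edges if src in current and lbl == c}
        current = eps_closure(step, edges)
    return bool(current & accepting)

# A hand-written epsilon-NFA for (a|b)*, just to exercise the simulator.
edges = [("s", None, "p"), ("p", "a", "q"), ("p", "b", "q"),
         ("q", None, "p"), ("s", None, "f"), ("q", None, "f")]
print(nfa_accepts("s", {"f"}, edges, "abba"))  # True
print(nfa_accepts("s", {"f"}, edges, "abc"))   # False: 'c' leads nowhere
```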
---

class: center, middle

# NFAs and DFAs

---

# Recall: DFA

A Deterministic Finite Automaton consists of:

* A finite set of states Q
* A finite alphabet `\(\Sigma\)`
* A transition function `\(\delta: Q \times \Sigma \mapsto Q\)`
* An initial state `\(q_0 \in Q\)`
* A set of accepting states `\(F \subseteq Q\)`

---

# Recall: NFA

A Non-Deterministic Finite Automaton consists of:

* A finite set of states Q
* A finite alphabet `\(\Sigma\)`
* A transition function `\(\partial: Q \times \Sigma \mapsto \mathcal{P}(Q)\)`
* An initial state `\(q_0 \in Q\)`
* A set of accepting states `\(F \subseteq Q\)`

---

# The Difference
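In code, the difference is precisely the return type of the transition function: one state for a DFA, a set of states for an NFA. A minimal sketch with dictionary-based transition tables (names are illustrative):

```python
def run_dfa(delta, start, accepting, word):
    """delta maps (state, symbol) to exactly ONE next state."""
    q = start
    for c in word:
        q = delta[(q, c)]
    return q in accepting

def run_nfa(delta, start, accepting, word):
    """delta maps (state, symbol) to a SET of possible next states."""
    current = {start}
    for c in word:
        current = {r for q in current for r in delta.get((q, c), set())}
    return bool(current & accepting)
```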
---

# The Difference

## DFA:

* A transition function `\(\delta: Q \times \Sigma \mapsto Q\)`

## NFA:

* A transition function `\(\partial: Q \times \Sigma \mapsto \mathcal{P}(Q)\)`

But who chooses Q? We do!

--

And what are the "states"? They can be whatever we want (they're just labels/names, after all)!

---

# The Power Set Construction

* What if each "state" in our DFA actually represents a "set of states" from the NFA?
* Our (deterministic) transition function then returns a set of states, just as in the NFA!
* It also takes a set of states as a parameter, representing that the machine can be in **multiple** states simultaneously
* And that's all we need

---

# Power Set of States

* Given an NFA with states `\(Q_N\)`
* We construct a DFA where `\(Q_D = \mathcal{P}(Q_N)\)`
* The transition function `\(\delta: Q_D \times \Sigma \mapsto Q_D\)` is defined as:

$$ \delta(q,x) = \\{r \in Q_N | \exists s \in q: r \in \partial(s,x)\\} $$

---

# Constructing the DFA

What does this mean?

* For each combination of states of our NFA we have one state in our DFA
* For each of these DFA states, the transition function on a given symbol is the set of all NFA states reachable from any of its member states on that symbol
* Final states are all DFA states whose set contains at least one final NFA state

---

# Example
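A direct sketch of this construction in Python might look as follows. Only subsets that are actually reachable from the start set are generated here, which leaves the accepted language unchanged; all names are illustrative.

```python
from collections import deque

def nfa_to_dfa(nfa_delta, nfa_start, nfa_accepting, alphabet):
    """Subset construction: every DFA state is a frozenset of NFA states."""
    start = frozenset({nfa_start})
    dfa_delta, dfa_accepting = {}, set()
    seen, todo = {start}, deque([start])
    while todo:
        S = todo.popleft()
        if S & nfa_accepting:      # contains a final NFA state => final DFA state
            dfa_accepting.add(S)
        for a in alphabet:
            # delta_D(S, a) = union of delta_N(q, a) over all q in S
            T = frozenset(r for q in S for r in nfa_delta.get((q, a), set()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return dfa_delta, start, dfa_accepting
```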
---

# Example
---

# Minimal DFAs

* Say we have an NFA with `\(n\)` states
* Then our construction will result in a DFA with `\(2^n\)` states
* It always works!
* But maybe there is a smaller automaton that does the same thing?

---

# Minimal DFAs

* There are algorithms to minimize DFAs
* Observation: If `\(\delta^\ast(q_1,w) = \delta^\ast(q_2, w)\)` for every string `\(w\)`, then the two states `\(q_1, q_2\)` are redundant
* Another observation: Even if they lead to different states, as long as both of these states are final or both are non-final, the two are redundant
* This leads to the idea of **distinguishing extensions**

---

class: mmedium

# Distinguishing Extension

* A distinguishing extension for two strings `x` and `y` is a string `z` such that either `xz` is accepted and `yz` is not, or `yz` is accepted and `xz` is not
* Two strings which have no distinguishing extension are, in this sense, "similar"
* Looked at from the other side: if two strings are "similar" in this sense, they might as well lead to the same state
* What we can do is group strings according to this similarity
* For each group we need exactly one state (Myhill-Nerode, proof omitted)

---

class: center, middle

# Regular Languages

---

class: medium

# Regular Languages

* We said that regular languages are languages for which there exists a regular expression
* We just showed that from a regular expression we can construct an NFA
* And from an NFA we can construct a DFA
* A DFA is also an NFA
* One can also show that a DFA can be converted to a regular expression
* DFAs, NFAs, and regular expressions are therefore all equally expressive

---

# Regular Languages

* Regular languages are closed under complement
* This means that if you have a regular language, its complement is also regular (can be written as a regular expression)
* Here is how: convert the regular expression to an NFA, and that NFA to a DFA
* Then flip accepting and non-accepting states. The automaton will now accept exactly the words it rejected before, and vice versa

---

# Regular Languages

* The union of two regular languages is also regular
* Each regular language can be defined by a regular expression
* The **or** operation on two regular expressions produces exactly the union of the corresponding languages

---

class: medium

# Regular Languages

* The intersection of two regular languages is also regular
* Let `\(L(r_1)\)` and `\(L(r_2)\)` be regular languages defined by regular expressions `\(r_1\)` and `\(r_2\)`
* We know that `\(\overline{L(r_1)}\)` and `\(\overline{L(r_2)}\)` are also regular (closure under complement)
* Then `\(\overline{L(r_1)} \cup \overline{L(r_2)}\)` is also regular (closure under union)
* By De Morgan's law, that is the same as `\(\overline{L(r_1) \cap L(r_2)}\)`
* And its complement, which is `\(L(r_1) \cap L(r_2)\)`, is therefore also regular
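The complement construction described above is easy to carry out mechanically: make the DFA total (adding a dead state for any missing transitions) and flip which states are accepting. A minimal sketch in the same dictionary encoding used earlier (the dead-state name and all other names are illustrative):

```python
def complement_dfa(delta, states, alphabet, start, accepting, dead="DEAD"):
    """Complement a DFA: complete the transition table, then flip accepting states."""
    total = dict(delta)
    all_states = set(states) | {dead}
    for q in all_states:
        for a in alphabet:
            total.setdefault((q, a), dead)   # missing transitions go to the dead state
    # Every state that was rejecting (including the dead state) now accepts.
    return all_states, total, start, all_states - set(accepting)
```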