class: center, middle

# CS-3110: Formal Languages and Automata
## Automata and Languages
### Chapter 3.6

---

class: center, middle

# Regular Expressions and NFAs

---

class: mmedium

# Thompson's Construction

* Let's start by translating a regular expression to an NFA with `\(\varepsilon\)` transitions
* First we will construct NFAs for the most basic regular expressions
* Then we show how to represent regular expression operators as an assembly of NFAs representing their operands
* This approach is known as "Thompson's construction", named after Ken Thompson of Unix fame
* `\(\varepsilon\)` transitions make this construction easier, but are not strictly necessary

---

# Basic Building Blocks

First, let's look at the three atomic regular expressions:

1. The expression that does not match anything: `\(\Phi\)`
2. The expression matching the empty string: `\(\varepsilon\)`
3. The expression matching a single character: `a`

---

# The Non-Matching Expression

`\(L(\Phi) = \{ \}\)`

Here is an NFA that accepts no strings:
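Written out explicitly, such an NFA can be a single non-accepting start state with no transitions at all. A minimal sketch as a Python dictionary (the state name `q0`, the alphabet, and the dictionary layout are illustrative choices, not fixed by the construction):

```python
# NFA for the non-matching expression: no accepting states and no transitions,
# so no string (not even the empty one) is accepted.
NO_MATCH_NFA = {
    "states": {"q0"},
    "alphabet": {"a", "b"},   # any alphabet works here
    "delta": {},              # no transitions at all
    "start": "q0",
    "accepting": set(),       # empty set of accepting states => L = {}
}
```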
---

# The Empty String Expression

`\(L(\varepsilon) = \{ \varepsilon \}\)`

Here is an NFA that accepts the empty string:
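In the same illustrative encoding: the start state itself is accepting and there are no transitions, so only the empty string is accepted.

```python
# NFA for the empty-string expression: accepting start state, no transitions.
EPSILON_NFA = {
    "states": {"q0"},
    "alphabet": {"a", "b"},
    "delta": {},
    "start": "q0",
    "accepting": {"q0"},      # accepts exactly the empty string
}
```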
---

# The Single-Character Expression

`\(L(a) = \{ a \}\)`

Here is an NFA that accepts the string `a`:
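And once more in the same illustrative encoding: one transition on `a` from the start state to the only accepting state.

```python
# NFA for the single-character expression a: accepts exactly the string "a".
SYMBOL_A_NFA = {
    "states": {"q0", "q1"},
    "alphabet": {"a", "b"},
    "delta": {("q0", "a"): {"q1"}},   # the only transition
    "start": "q0",
    "accepting": {"q1"},
}
```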
---

# Operators

We have four operators:

- An **or** of two regular expressions: `\(x|y\)`
- A **concatenation** of two regular expressions: `\(xy\)`
- A **repetition** of a regular expression: `\(x^\ast\)`
- **Parentheses** to control which operands operators apply to: `\((x)\)`

Note: `x` and `y` can be more complex regular expressions!

---

# Dividing a regular expression into its parts

.left-column[
$$ (0|(01^\ast{}0))^\ast $$
]
.right-column[
]

---

# Parentheses define the order

.left-column[
$$ (a|b)^\ast $$
$$ a|(b)^\ast $$
]
.right-column[
]

---

# Assembling the NFA

* We start at the bottom of our tree
* Each leaf corresponds to one of our three basic building block types
* We place the NFA corresponding to each leaf
* We then apply rules to these NFAs that correspond to the operators, resulting in merged/larger NFAs
* We repeat this process until we reach the root of the tree and only have one combined NFA

---

# Rules?

For each operator we have a rule that takes one automaton (repetition) or two (concatenation, or) and combines them
---

# The **Or** operation
---

# The **concatenation** operation
---

# The **repetition** operation
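Taken together, the three rules translate almost directly into code. The following is a minimal sketch of Thompson's construction, not a reference implementation: each fragment has exactly one start and one accept state, `None` stands for an `\(\varepsilon\)`-transition, and all names (`NFAFrag`, `fresh`, `sym`, ...) are invented for illustration.

```python
import itertools
from dataclasses import dataclass

_ids = itertools.count()

def fresh():
    """Return a new, previously unused state name."""
    return f"s{next(_ids)}"

@dataclass
class NFAFrag:
    start: str
    accept: str   # Thompson fragments have exactly one accepting state
    edges: list   # (state, symbol, state); symbol None means an epsilon move

def sym(a):
    """NFA for the single-character expression a."""
    s, f = fresh(), fresh()
    return NFAFrag(s, f, [(s, a, f)])

def union(x, y):
    """x|y: a new start with epsilon edges into both operands, and
    epsilon edges from both accept states to a new accept state."""
    s, f = fresh(), fresh()
    return NFAFrag(s, f, x.edges + y.edges + [
        (s, None, x.start), (s, None, y.start),
        (x.accept, None, f), (y.accept, None, f),
    ])

def concat(x, y):
    """xy: an epsilon edge from x's accept state to y's start state."""
    return NFAFrag(x.start, y.accept,
                   x.edges + y.edges + [(x.accept, None, y.start)])

def star(x):
    """x*: loop back from accept to start, and allow skipping x entirely."""
    s, f = fresh(), fresh()
    return NFAFrag(s, f, x.edges + [
        (s, None, x.start), (x.accept, None, x.start),
        (s, None, f), (x.accept, None, f),
    ])
```

For example, `star(union(sym("0"), concat(sym("0"), concat(star(sym("1")), sym("0")))))` assembles an NFA for `\((0|(01^\ast{}0))^\ast\)`, mirroring the expression tree from the earlier slide.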
---

# Example

---

# Example

---

# Example

---

# Example

---

# Example

---

# Example

---

# Example
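Once a full NFA has been assembled, we can already check whether it accepts a word by tracking the set of states reachable through `\(\varepsilon\)`-moves. A self-contained sketch (the edge-triple format matches the combinator sketch above; all names are illustrative):

```python
def eps_closure(states, edges):
    """All states reachable from `states` using only epsilon (None) edges."""
    seen, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for src, lbl, dst in edges:
            if src == q and lbl is None and dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return seen

def nfa_accepts(start, accepting, edges, word):
    """Simulate an epsilon-NFA given as (state, symbol, state) edge triples."""
    current = eps_closure({start}, edges)
    for c in word:
        step = {dst for src, lbl, dst in edges if src in current and lbl == c}
        current = eps_closure(step, edges)
    return bool(current & accepting)

# A hand-written epsilon-NFA for (a|b)*, just to exercise the simulator.
edges = [("s", None, "p"), ("p", "a", "q"), ("p", "b", "q"),
         ("q", None, "p"), ("s", None, "f"), ("q", None, "f")]
print(nfa_accepts("s", {"f"}, edges, "abba"))  # True
print(nfa_accepts("s", {"f"}, edges, "abc"))   # False: 'c' leads nowhere
```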
---

class: center, middle

# NFAs and DFAs

---

# Recall: DFA

A Deterministic Finite Automaton consists of:

* A finite set of states Q
* A finite alphabet `\(\Sigma\)`
* A transition function `\(\delta: Q \times \Sigma \mapsto Q\)`
* An initial state `\(q_0 \in Q\)`
* A set of accepting states `\(F \subseteq Q\)`

---

# Recall: NFA

A Non-Deterministic Finite Automaton consists of:

* A finite set of states Q
* A finite alphabet `\(\Sigma\)`
* A transition function `\(\partial: Q \times \Sigma \mapsto \mathcal{P}(Q)\)`
* An initial state `\(q_0 \in Q\)`
* A set of accepting states `\(F \subseteq Q\)`

---

# The Difference
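In code, the difference is precisely the return type of the transition function: one state for a DFA, a set of states for an NFA. A minimal sketch with dictionary-based transition tables (names are illustrative):

```python
def run_dfa(delta, start, accepting, word):
    """delta maps (state, symbol) to exactly ONE next state."""
    q = start
    for c in word:
        q = delta[(q, c)]
    return q in accepting

def run_nfa(delta, start, accepting, word):
    """delta maps (state, symbol) to a SET of possible next states."""
    current = {start}
    for c in word:
        current = {r for q in current for r in delta.get((q, c), set())}
    return bool(current & accepting)
```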
---

# The Difference

## DFA:

* A transition function `\(\delta: Q \times \Sigma \mapsto Q\)`

## NFA:

* A transition function `\(\partial: Q \times \Sigma \mapsto \mathcal{P}(Q)\)`

But who chooses Q? We do!

--

And what are the "states"? They can be whatever we want (they're just labels/names, after all)!

---

# The Power Set Construction

* What if each "state" in our DFA actually represents a "set of states" from the NFA?
* Our (deterministic) transition function then returns a set of states, just as in the NFA!
* It also takes a set of states as a parameter, representing that the machine can be in **multiple** states simultaneously
* And that's all we need

---

# Power Set of States

* Given an NFA with states `\(Q_N\)`
* We construct a DFA where `\(Q_D = \mathcal{P}(Q_N)\)`
* The transition function `\(\delta: Q_D \times \Sigma \mapsto Q_D\)` is defined as:

$$ \delta(q,x) = \\{r \in Q_N | \exists s \in q: r \in \partial(s,x)\\} $$

---

# Constructing the DFA

What does this mean?

* For each combination of states of our NFA we have one state in our DFA
* For each of these DFA states, the transition function on a given symbol is the set of all NFA states reachable from any of its member states on that symbol
* Final states are all DFA states whose set contains at least one final NFA state

---

# Example
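A direct sketch of this construction in Python might look as follows. Only subsets that are actually reachable from the start set are generated here, which leaves the accepted language unchanged; all names are illustrative.

```python
from collections import deque

def nfa_to_dfa(nfa_delta, nfa_start, nfa_accepting, alphabet):
    """Subset construction: every DFA state is a frozenset of NFA states."""
    start = frozenset({nfa_start})
    dfa_delta, dfa_accepting = {}, set()
    seen, todo = {start}, deque([start])
    while todo:
        S = todo.popleft()
        if S & nfa_accepting:      # contains a final NFA state => final DFA state
            dfa_accepting.add(S)
        for a in alphabet:
            # delta_D(S, a) = union of delta_N(q, a) over all q in S
            T = frozenset(r for q in S for r in nfa_delta.get((q, a), set()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return dfa_delta, start, dfa_accepting
```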
---

# Example
---

# Minimal DFAs

* Say we have an NFA with `\(n\)` states
* Then our construction will result in a DFA with `\(2^n\)` states
* It always works!
* But maybe there is a smaller automaton that does the same thing?

---

# Minimal DFAs

* There are algorithms to minimize DFAs
* Observation: If `\(\delta^\ast(q_1,w) = \delta^\ast(q_2, w)\)` for every string `\(w\)`, then the two states `\(q_1, q_2\)` are redundant
* Another observation: Even if they lead to different states, as long as both of these states are final or both are non-final, the two are redundant
* This leads to the idea of **distinguishing extensions**

---

class: mmedium

# Distinguishing Extension

* A distinguishing extension for two strings `x` and `y` is a string `z` such that either `xz` is accepted and `yz` is not, or `yz` is accepted and `xz` is not
* Two strings which have no distinguishing extension are, in this sense, "similar"
* Looked at from the other side: if two strings are "similar" in this sense, they might as well lead to the same state
* What we can do is group strings according to this similarity
* For each group we need exactly one state (Myhill-Nerode, proof omitted)

---

class: center, middle

# Regular Languages

---

class: medium

# Regular Languages

* We said that regular languages are languages for which there exists a regular expression
* We just showed that from a regular expression we can construct an NFA
* And from an NFA we can construct a DFA
* A DFA is also an NFA
* One can also show that a DFA can be converted to a regular expression
* DFAs, NFAs, and regular expressions are therefore all equally expressive

---

# Regular Languages

* Regular languages are closed under complement
* This means that if you have a regular language, its complement is also regular (can be written as a regular expression)
* Here is how: convert the regular expression to an NFA, and that NFA to a DFA
* Then flip accepting and non-accepting states. The automaton will now accept exactly the words it rejected before, and vice versa

---

# Regular Languages

* The union of two regular languages is also regular
* Each regular language can be defined by a regular expression
* The **or** operation on two regular expressions produces exactly the union of the corresponding languages

---

class: medium

# Regular Languages

* The intersection of two regular languages is also regular
* Let `\(L(r_1)\)` and `\(L(r_2)\)` be regular languages defined by regular expressions `\(r_1\)` and `\(r_2\)`
* We know that `\(\overline{L(r_1)}\)` and `\(\overline{L(r_2)}\)` are also regular (closure under complement)
* Then `\(\overline{L(r_1)} \cup \overline{L(r_2)}\)` is also regular (closure under union)
* By De Morgan's law, that is the same as `\(\overline{L(r_1) \cap L(r_2)}\)`
* And its complement, which is `\(L(r_1) \cap L(r_2)\)`, is therefore also regular
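The complement construction described above is easy to carry out mechanically: make the DFA total (adding a dead state for any missing transitions) and flip which states are accepting. A minimal sketch in the same dictionary encoding used earlier (the dead-state name and all other names are illustrative):

```python
def complement_dfa(delta, states, alphabet, start, accepting, dead="DEAD"):
    """Complement a DFA: complete the transition table, then flip accepting states."""
    total = dict(delta)
    all_states = set(states) | {dead}
    for q in all_states:
        for a in alphabet:
            total.setdefault((q, a), dead)   # missing transitions go to the dead state
    # Every state that was rejecting (including the dead state) now accepts.
    return all_states, total, start, all_states - set(accepting)
```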