class: center, middle

# CS-3110: Formal Languages and Automata

### Spring 2024

---
class: medium

# Instructor and Schedule

* Instructor: Dr. **Markus** Eger (he/him)
* Email: meger@cpp.edu
* Office hours: Tuesday and Thursday, 1:00-2:30pm, Building 8, Room 43; Wednesday, 1:30-2:30pm, online (Zoom or Discord)
* Class: Tuesday and Thursday, 4:00-5:15pm, Building 8, Room 345

---
class: medium

# About Me

* Originally from Austria
* BSc and MSc in Computer Science from University of Technology Graz, Austria
* PhD in Computer Science from NC State University, USA
* Visiting professor at the University of Costa Rica 2019-2020
* I work on AI for games involving communication and/or cooperation (Hanabi, Werewolf, Pandemic, etc.)
* Games I played recently: The Talos Principle 2, Factorio, Star Wars: Jedi Knight II, Dominion
* I also like board games

---
class: medium

# Class Contents

* What are automata and languages, and why are we here?
* Math review: sets and languages
* Regular languages, finite automata
* Context-Free Grammars and Languages, pushdown automata
* Non-Context-Free Grammars and Languages
* General Grammars

---

# Class Resources

* Canvas
* [Discord](https://discord.gg/Y4vuuzPCgy)
* Submission: Canvas
* Textbook

---
class: small

# Textbook (Free!)

Foundations of Computation

---
class: medium

# Grading

* Homework: 4 × 15%
* Presentation: 15%
* Midterm: 10%
* Final exam (cumulative!): 15%

---

# Class structure

* Thursdays: Lecture about theory with smaller examples throughout
* Tuesdays: Student presentations of two larger examples that need the theory
* Every student needs to present once per semester
* I will also discuss the homework assignments and any potential pitfalls in class

---
class: medium

# Student Presentations

* Organize in groups of 2 (**one** group of three, if there is an odd number of students)
* Each group will be assigned one example to work out and present **step-by-step**
* There will be three presentations per week, with a time limit of 20 minutes per presentation
* Student presentations start in week 6 (February 27); before that I will *demonstrate* the structure these presentations should have during the Tuesday lectures

---
class: mmedium

# Student Presentations: Bonus Points

* You can earn up to 5 bonus points for excellent presentations
* Excellent can mean:
  - Outstanding visualizations
  - An interactive tool (online or offline)
  - A clever alternative solution
* Basically, something that goes beyond just explaining the steps of the solution
* If it is truly great (i.e. 4-5 bonus points), I may want to use it in the future **with your permission**, but points are **not** contingent on giving me this permission

---

# Homework

* There will be four homework assignments
* You have to work on these assignments **in groups of up to two students**
* Homework problems will be based on the problems shown in class and the presentations
* Homework assignments will usually have a bonus problem you can solve for extra points! These bonus problems are intentionally a bit trickier, but you might get partial credit even if you can't figure them out completely.

---

# Homework

* Homework 1: Languages and Regular Expressions, due Thursday 2/29
* Homework 2: Finite Automata, due **Tuesday** 3/19
* Homework 3: Context-Free Languages, due Thursday 4/18
* Homework 4: PDAs, Non-Context-Free Languages, due Thursday 5/9

All due dates are given as ["Anywhere on Earth"](https://en.wikipedia.org/wiki/Anywhere_on_Earth)

---

# Plagiarism

* Please don't cheat
* Homework assignments are done in groups of up to two students: You are free to discuss them with others, but the written submission has to be **your own** (this means no ChatGPT either!)
* Plagiarism results in 0 points on the entire homework for **everyone** involved
* I also report every incident to the office of student conduct
* It's not worth it! Ask questions if you struggle; the goal is for you to learn :)

---
class: medium

# Why Formal Languages?

Which of these are valid email addresses? (in form; they likely won't reach anyone)

* jon@snow.north
* R+L@J
* "cersei lannister"@casterly.rock
* -\`-\`-{@roses.com

--

**All of them!**

Neat trick: If your address is "name@gmail.com", mail sent to "name+website@gmail.com" arrives in the same mailbox, so you can see who sells your email address.

---

# In practice ...
---

# Invalid Email?

- I **know** that I can have a "+" in my email address
- How do I know that?
- Someone actually specified what an email address can look like!
- Let's take a look ...

---

# RFC 5322

```
addr-spec     = local-part "@" domain
local-part    = dot-atom / quoted-string
domain        = dot-atom
dot-atom      = 1*atext *("." 1*atext)
atom          = 1*atext
atext         = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~"
qtext         = %d33 / %d35-91 / %d93-126
quoted-string = DQUOTE *([FWS] qtext) [FWS] DQUOTE
```

---

# So what?

- I actually simplified the email address format
- Parsing *actually valid* email addresses can be quite tricky
- On the other hand, we just saw a glimpse of how one might specify something like that
- Formal languages are exactly that: A way to formally specify syntax
- We will talk about two parts: Specification and recognition

---
class: medium

# Specification

* In order to talk about what is "valid", we need to be able to describe it unambiguously
* Say you build a website with a login, and your boss requests that you ensure "strong passwords"
* You will want to specify what "strong" means for passwords (e.g. "at least 8 characters, at least one digit, one upper-case and one lower-case letter")
* There are many ways to write such a specification; we will look at a few of them

---
class: medium

# Recognition

* Once you know what is "valid" and what is not, you often need to verify that a given input is indeed "valid"
* This will be the "core" of the class: How do you "recognize" whether a given string/word follows the rules?
* Depending on the complexity of the rules, there are different approaches, which are the automata in the title of this class
* We will start with the most restricted formalism, and explore what it can and cannot do

---

# Math

Cady: *I like math.*

Damian: *Eww. Why?*

Cady: *Because it's the same in every country.*

-- Mean Girls (2004)

---

# Math

* Before we start with actually defining what it means for "input" to be "valid" (or "words" to be in a "language", as we will call it), we will review some discrete math
* Our "languages" will be sets, so we will want to have a firm grasp on some concepts from set theory
* To investigate what our different automata can and cannot do, we will also do some proofs

---

# To Do

* Organize in groups of two for the presentations (use the [class Discord](https://discord.gg/Y4vuuzPCgy), if needed)
* Submit your topic preferences and group on Canvas by February 7, AoE (it's a "quiz" called "Presentation Topic Selection")
* We will start with an introduction to languages, and why this class exists and is important
* We will also review some concepts from discrete math; you may know some/most of it, but you'll probably learn something new as well
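---

# Example: Recognizing Email Addresses

To make the "specification" and "recognition" split concrete, here is a minimal Python sketch that translates the simplified `addr-spec` grammar from the earlier slide into a regular expression. The names `ATEXT`, `DOT_ATOM`, `QTEXT`, `FWS`, `QUOTED_STRING`, `ADDR_SPEC`, and `is_valid_address` are made up for this example. `FWS` is approximated as a run of spaces or tabs, which is an assumption and not the full definition. Treat it as an illustration of the idea, not as a production-ready validator.

```python
import re

# atext: letters, digits, and the special characters listed in the grammar
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# dot-atom = 1*atext *("." 1*atext)
DOT_ATOM = rf"{ATEXT}+(?:\.{ATEXT}+)*"

# qtext = %d33 / %d35-91 / %d93-126 (printable ASCII except '"' and '\')
QTEXT = r"[\x21\x23-\x5B\x5D-\x7E]"

# ASSUMPTION: folding whitespace is approximated as spaces/tabs only
FWS = r"[ \t]+"

# quoted-string = DQUOTE *([FWS] qtext) [FWS] DQUOTE
QUOTED_STRING = rf'"(?:(?:{FWS})?{QTEXT})*(?:{FWS})?"'

# addr-spec = local-part "@" domain, with domain = dot-atom
ADDR_SPEC = re.compile(rf"(?:{DOT_ATOM}|{QUOTED_STRING})@{DOT_ATOM}")


def is_valid_address(s: str) -> bool:
    """Return True if the whole string matches the simplified grammar."""
    return ADDR_SPEC.fullmatch(s) is not None


if __name__ == "__main__":
    examples = [
        "jon@snow.north",
        "R+L@J",
        '"cersei lannister"@casterly.rock',
        "-`-`-{@roses.com",
        "jon snow@north",   # space outside of a quoted string
        "jon@snow..north",  # empty label between two dots
    ]
    for address in examples:
        print(address, is_valid_address(address))
```

Each grammar rule becomes one named pattern, so the specification and the recognizer can be compared line by line. That this translation works at all hints that the simplified grammar describes a regular language.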