University of Ottawa
SEG-2106 : Software construction
Gregor v. Bochmann
Winter 2008, 2009, 2010, 2011, 2012, 2013, 2015

Lab 5 - Part 1

Originally prepared by Nicolas Gorse

Regular expressions and automata

(1) Regular expressions: Let us consider languages over an alphabet consisting of two letters: V={a, b}. Give the regular expressions corresponding to the informally defined languages below:

  1. All words starting with aa.
  2. All words containing aa.
  3. All words containing an even number of letters.
  4. All words containing an even number of letter a.
  5. All words the length of which is a multiple of 5.
  6. All words containing three consecutive a.
  7. All words not containing three consecutive a.  

(2) Acceptor automata: For each of the above languages, find an automaton that accepts that language.

(3) Acceptor automaton corresponding to a regular expression: For each regular expression below, give an accepting automaton that accepts the language defined the regular expression:

  1. L = (a |b) c         ex. {ac, bc}
  2. L = a* b                ex. {b, ab, aab, aaab, ...}
  3. L = a* (a b c)*      
  4. L = (a | b)*
  5. L = (a | b) c* c b b*

(4) Non-deterministic automata: The following automaton is non-deterministic. Questions:

  1. In which state could the automaton be after having read the sequence " a, a, c, c "
    • Is this sequence accepted by the automaton ?
  2. In which state could the automaton be after having read the sequence " a, c, a, c "
    • Is this sequence accepted by the automaton ?
  3. In which state could the automaton be after having read the sequence " a, a, b, c"
    • Is this sequence accepted by the automaton ?

automate

(5) Equivalences: We consider the following pairs of regular expressions. For each pair, you should indicate whether the two expressions define the same language (they are equivalent) or not. If they are equivalent, you should give a proof by considering the algebraic properties of the regular expression operators. If they are not equivalent, you should give an example of a string that is included in one of the languages, but not in the other.

  1.    ab(abc)*ab             aba(bca)*b
  2.    (a | b)*                   (  a* b* )+

(6) Playing with the egrep utility: This program is available in Unix and can be called in the command line. Try to use the egrep utility (search a file for a pattern using full regular expressions) to get familiar with regular expressions.  For example, if the file test.txt contains the string accbbb and if you use 

egrep "(a|b)c*cbb*" test.txt, you will see all the  lines that contain the pattern "(a|b)c*cbb*", such as accbbb.

Note: You may use the tutorial RegexOne to get familiar with the notation used by egrep, the Java Pattern class, and Lex.

(7) Programming a simple recognizer in Java:  Please write a simple Java program that asks the user for a character string, tests whether it contains a substring of the form a* b or not, and prints out all the substrings that match the pattern. - Ne pas utiliser the Java Pattern class. That class will be used in the next task.

(8) Using the Java Pattern class: The Java utilities regex package contains the Pattern and Matcher classes (see Java documentation). The Pattern method compile generates a "pattern" which can be used to create a "matcher" object which contains an algorithm to parse an input string for the presence of the regex pattern that was given as parameter to the compile method. Here is an example program using these classes, repared by Ahmed Jeddah inspired by the example in a Java tutorial which contains more explanations about the regular expression syntax supported by Pattern. Your task is the following:

  1. Read the program and understand it (check the documentation about the methods that are used in the program).
  2. Run the program in your Java environment and test it for several regular expressions and several strings to be matched. You may use the examples of regular expressions used above.

Last update:February 12, 2015