Introduction to context-free grammars by example : defining the language of regular expressions

Trying to define the language of regular expressions as a regular language

Alphabet of the languages L described by the regular expressions: AL = {a, b, c}

Now we define the language of all regular expressions that define a language over AL. This language of regular expressions is a meta-language. Let us call it RE. Alphabet of RE: = {a, b, c, epsi, con, or, star, open, close} or {a, b, c, є, ., |, *, (, )} or {a, b, c, "є", ".", "|", "*", "(", ")"}

Recursive language equations - definition of the language RE:

Finding a solution (note that LG = L1 |  LG .L2 has solution LG = L1 . L2 * ): we get C = B ( "." B )* -- RE = C ( "|" C )* ---- therefore RE = ( ( B ( "." B )* ) ( "|" ( B ( "." B )* ) )* ) where B = Sim ("*" )*

But we also have the parenthesis : B = Sim | B "*" | "(" RE ")"

The diagram below is an finite state automaton (assuming that the "(" RE ")" branch at the bottom is not present). With the "(" RE ")" branch, it is a recursive finite state automaton. It can be used to check whether a given sequence is part of the RE language. However, no automaton (without recursion) exists that could check that language. Why not ?

automaton

Exercise: Solve the above system of recursive equation by iteration

level 0

level 1

level 2

level 3

level 4

level 5

 

RE = Ø

RE = Ø

RE = Ø

RE = Ø

RE = a | b | c | "є"

RE = a | b | c | "є" | ( a | b | c | "є" ) "." (a | b | c | "є" | a* | b* | c* )

| (a | b | c | "є" ) "|" (a | b | c | "є" | ( a | b | c | "є" ) "." (a | b | c | "є" | a* | b* | c* ))

 

C = Ø

C = Ø

C = Ø

C = a | b | c | "є"

C = a | b | c | "є"

| ( a | b | c | "є" ) "." (a | b | c | "є" | a* | b* | c* )

C = a | b | c | "є" | a* | b* | c* | a** | b** | c**

| (a | b | c | "є" | ( a | b | c | "є" ) "." (a | b | c | "є" | a* | b* | c* ) ) "." (a | b | c | "є" | a* | b* | c* | a** | b** | c**)

 

B = Ø

B = Ø

B = a | b | c | "є"

B = a | b | c | "є" | a* | b* | c*

B = a | b | c | "є" | a* | b* | c* | a** | b** | c**

B = a | b | c | "є" | a* | b* | c* | a** | b** | c** | a*** | b*** | c*** | "(" ( a | b | c | "є" ) ")"

 

Sim = Ø

Sim = a | b | c | "є"

Sim = a | b | c | "є"

Sim = a | b | c | "є"

Sim = a | b | c | "є"

Sim = a | b | c | "є"

 

Note: Each sequence belonging to RE, C, B or Sim can also be represented by a tree where the alphabet symbols form the leaf nodes and the variables representing sets of strings (that is, RE, C, B and Sim) form the internal nodes of the tree. Some examples are given in class.

Define a context-free grammar for regular expressions

We can write the recursive equations above in the form of a context-free grammar as follows:

Context-free grammar

The same grammar written with separate production rules, each having an identifying number

  • RE --> C | RE "|" C

  • C --> B | C "." B
  • B --> Sim | B "*" | "(" RE ")"
  • Sim --> a | b | c | "є"
  1. RE --> C

  2. RE --> RE "|" C
  3. C --> B
  4. C -->C "." B
  5. B --> Sim
  6. B -->B "*"
  7. B -->"(" RE ")"
  8. Sim --> a | b | c | "є"

Notes:

Syntax trees

The syntax tree of a sequence of terminal symbols explains why the sequence is part of the language according to the syntax rules of the context-free grammar. Consider the following example (a):

syntax trees