*******************************************************
1.   Natural Language Processing in Prolog
(same material as in the lecture notes)
- Modify the code below by adding more words and more syntactic constructions. 
- Write an interface that reads a sentence from the keyboard and puts it in a 
list as needed to call the predicate parse.


Prolog is especially well-suited for developing natural language systems.
A natural language parser for a simple subset of English will include
sentences such as:
        * The dog ate the bone.
        * The big mouse chases a lazy cat.

This grammar can be described with the following grammar rules.
(The first rule says a sentence is made up of a noun phrase
followed by a verb phrase. The last rule says an adjective is
either 'big', or 'brown', or 'lazy'. )

     <sentence> ::=  <nounphrase> <verbphrase>
     <nounphrase> ::=  <determiner> <noun> |
                       <determiner> <adjective> <noun>
     <verbphrase> ::=     <verb> <nounphrase>

     <determiner> ::=  the | a
     <noun> ::= dog | bone | mouse | cat
     <verb> ::=  ate | chases
     <adjective> ::=  big | brown | lazy

To begin with, we will simply determine if a sentence is a legal
sentence. In other words, we will write a predicate sentence,
which will determine if its first argument is a sentence.

The sentence will be represented as a list of words. Examples:
          [the,dog,ate,the,bone]
          [the,brown,mouse,chases,a,lazy,cat]

There are two basic strategies for solving a parsing problem like this.
The first is a generate-and-test strategy, where the list to be parsed is
split in different ways, with the splittings tested to see if they are
components of a legal sentence. We can use append to generate the
splittings of a list. With this approach, the top-level rule would be:

sentence(L) :- append(NP, VP, L),
               nounphrase(NP),
               verbphrase(VP).

The append predicate will generate possible values for the variables NP
and VP, by splitting the original list L. The next two goals test each of
the portions of the list to see if they are grammatically correct. If not,
backtracking into append causes another possible splitting to be
generated.

The clauses for nounphrase and verbphrase are similar to sentence,
and call further predicates that deal with smaller units of a sentence,
until the word definitions are met, such as

          verb([ate]).
          verb([chases]).
          noun([mouse]).
          noun([dog]).

  Using  Difference Lists
 ---------------------------

The above strategy, however, is extremely slow because of the constant
generation and testing of trial solutions that do not work. Furthermore,
the generating and testing is happening at multiple levels.

The more efficient strategy is to skip the generation step and pass the
entire list to the lower level predicates, which in turn will take the
grammatical portion of the sentence they are looking for from the front of
the list and return the remainder of the list.

To do this, we use a structure called a difference list. It is two related
lists, in which the first list is the full list and the second list is the
remainder. The two lists are the last two arguments in a predicate.


Here then is the first grammar rule using difference lists. A list S is a
sentence if we can extract a nounphrase from the beginning of it, with a
remainder list of S1, and if we can extract a verb phrase from S1 with the
empty list as the remainder.

sentence(S,[]):- nounphrase(S,S1),
                 verbphrase(S1,[]).

Before filling in nounphrase and verbphrase, we will jump to the lowest
level predicates that define the actual words. They too must be difference
lists. They are simple. If the head of the first list is the word, the
remainder list is simply the tail.

          noun([dog|X],X).
          noun([cat|X],X).
          noun([mouse|X],X).
          verb([ate|X],X).
          verb([chases|X],X).
          adjective([big|X],X).
          adjective([brown|X],X).
          adjective([lazy|X],X).
          determiner([the|X],X).
          determiner([a|X],X).

 Testing shows how the difference lists work.

          ?- noun([dog,ate,the,bone],X).
          X = [ate,the,bone]
          ?- verb([dog,ate,the,bone],X).
          no

Continuing with the new grammar rules we have:

nounphrase(NP,X):- determiner(NP,S1), noun(S1,X).
nounphrase(NP,X):- determiner(NP,S1), adjective(S1,S2),
                   noun(S2,X).
verbphrase(VP,X):- verb(VP,S1), nounphrase(S1,X).


These rules can now be used to test sentences.

          ?- sentence([the,lazy,mouse,ate,a,dog],[]).
          yes
          ?- sentence([the,dog,ate],[]).
          no
          ?- sentence([a,big,cat,chases,a,lazy,dog],[]).
          yes
          ?- sentence([the,cat,jumps,the,mouse],[]).
          no


  Definite Clause Grammar
------------------------------

The use of difference lists for parsing is so common in Prolog, that most
Prologs contain additional syntactic sugaring that simplifies the syntax
by hiding the difference lists from view. This syntax is called Definite
Clause Grammar (DCG), and looks like normal Prolog, only the symbol :- is
replaced with an arrow -->. The DCG representation is parsed and
translated to normal Prolog with difference lists.

Using DCG, the 'sentence' predicate developed earlier would be phrased

          sentence --> nounphrase, verbphrase.

This would be translated into normal Prolog, with difference lists. The
above example would be translated into the following equivalent Prolog.

   sentence(S1, S2):-  nounphrase(S1, S3),
                       verbphrase(S3, S2).

Thus, if we define 'sentence' using DCG we still must call it with two
arguments, even though the arguments were not explicitly stated in the DCG
representation.

          ?- sentence([the,dog,chases,the,cat], []).
                yes

The DCG vocabulary is represented by simple lists.

          noun --> [dog].
          verb --> [chases].

These are translated into Prolog as difference lists.

          noun([dog|X], X).
          verb([chases|X], X).

     % The parser:

     sentence --> nounphrase, verbphrase.
     nounphrase --> determiner, noun.
     nounphrase --> determiner, adjective, noun.
     verbphrase --> verb, nounphrase.

     determiner --> [the].
     determiner -->[a] .
     noun --> [dog].
     noun --> [bone].
     noun --> [mouse].
     noun --> [cat].
     verb --> [ate].
     verb --> [chases].
     adjective --> [big].
     adjective --> [brown].
     adjective --> [lazy].


     % Parse tree for correct sentences:

     % To test the grammar:
     % ?- sentence(Tree,[the,cat,ate,the,mouse],[]).
     % Tree = s(np(det(the),n(cat)),vp(v(ate),np(det(the),n(mouse))))

Draw the parse tree (the names of the functors are the names of the nonterminals 
in the BNF grammar or short notations for them):
         
             s
          /     \
        np       vp
       /  \     /  \ 
     det   n   v     np
      |    |   |    /  \
     the  cat ate  det  n
                    |   |
                   the mouse


     sentence(s(NP,VP)) --> nounphrase(NP), verbphrase(VP).
     nounphrase(np(D,N)) --> determiner(D), noun(N).
     nounphrase(np(D,A,N)) -->determiner(D),adjective(A), noun(N).
     verbphrase(vp(V,NP))--> verb(V), nounphrase(NP).

     determiner(det(the)) --> [the].
     determiner(det(a)) -->[a].
     noun(n(dog)) --> [dog].
     noun(n(bone)) --> [bone].
     noun(n(mouse)) --> [mouse].
     noun(n(cat)) --> [cat].
     verb(v(ate)) --> [ate].
     verb(v(chase)) --> [chases].
     adjective(adj(big)) --> [big].
     adjective(adj(brown)) --> [brown].
     adjective(adj(lazy)) --> [lazy].


Note: If you need to add normal Prolog code, you can do it between curly
brackets.


-----------
Another example, a grammar for numbers:

          <real-number> ::= <part> . <part>
          <part>        ::= <digit> | <digit> <part>
          <digit>       ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
 
 - 3.519 is an example "sentence" in the language. 
   Draw its parse tree.

 - In our parser code below, we will assume that the sentence has been 
   read and stored as a list of atoms, such as [3, ., 5, 1, 9].  You will
   make the same assumption in your project.


Using difference lists
----------------------
 - With this approach, each of the parsing predicates has 3 arguments.  
 - For example, a parser for the little grammar above would include a predicate
        part(Tree, Input, Remainder) 
   that succeeds iff Input begins with a <part>, 
   Remainder is what's left of Input after parsing the <part>, and 
   Tree is the piece of parse tree that represents the <part>.

Here's a parser for the whole grammar (not just for <part>).

       digit(_, [H|Rest], Rest):- member(H, [0,1,2,3,4,5,6,7,8,9]).

       part(_, Input, Remainder):- digit(_, Input, Remainder).
       part(_, Input, Remainder):- digit(_, Input, R1), 
                                   part(_, R1, Remainder).

       real-number(_, Input, Remainder):- part(_, Input, [.|R1]),
                                          part(_, R1, Remainder).

   Notice that I have used "_" everywhere the pieces of parse tree would go.

 - Here is an example of what might happen if we queried the part predicate.
   Of course, the user would not directly ask such a thing; 
   it would happen during the parsing of a full real-number.

        ?- part(Str, [5, 1, 9], Rem).
        
        Rem = [1,9]
        Str = _82;

        Rem = [9]
        Str = _82;

        Rem = []
        Str = _82;

        no

 - Notice that the first solution above used up only the first element of the
   input, leaving a remainder of [1, 9].  If this goal were to be satisfied 
   during parsing of the "519" component of the real-number "3.519", 
   we would eventually hit a dead end -- the "19" at the end would have no 
   way of being used up.  

 - Exercise: By hand, trace the goal
                real-number(ParseTree, [3, ., 5, 1, 9], Rem).
             far enough to observe that this is true.

 - But here is the beauty of Prolog:  It just backs up and keeps trying
   other ways to parse "519" as beginning with (or completely made of)
   a <part>.  Eventually it will find the 3rd solution above, which *will*
   fit in with solving the initial goal.

 - Exercise: Run the goal
                real-number(ParseTree, [3, ., 5, 1, 9], Rem).
             with tracing turned on, to see the backtracking in action.

Using DCG notation for difference lists 
---------------------------------------
(recommended as the most elegant solution)

real_number --> part, [.], part.
part --> digit.
part --> digit, part.

digit --> [H],{member(H, [0,1,2,3,4,5,6,7,8,9])}.

% or equivalently
% digit([H|Rest], Rest):- member(H, [0,1,2,3,4,5,6,7,8,9]).

Note:  Between curly brackets you can put normal Prolog code.


Add parse trees
-----------------

The names of the functors in the internal nodes of the trees are the
nonterminals in the CFG.

Example:
   ?- real_number(Tree, [3,.,5,6],[]).
   Tree = real_number(part(digit(3)), '.',part(digit(5), part(digit(6)))) 
   Yes

          real_number
           /  |   \      
         part .   part
          |       /   \ 
        digit   digit part
          |       |    |
          3       5  digit
                       |
                       6

real_number(real_number(P1,.,P2)) -->
part(P1), [.], part(P2).
part(part(D)) --> digit(D).
part(part(D,P)) --> digit(D), part(P).

digit(digit(H)) --> [H],{member(H, [0,1,2,3,4,5,6,7,8,9])}.


Not using difference lists
--------------------------
 - It is also possible to write our predicates without the remainder
   argument.  But in order to do so, you have to find a way to give
   Prolog the ability to back up and try alternative ways of breaking down
   a "sentence" -- as we saw happening above.