******************************************************* 1. Natural Language Processing in Prolog (same material as in the lecture notes) - Modify the code below by adding more words and more syntactic constructions. - Write an interface that reads a sentence from the keyboard and puts it in a list as needed to call the predicate parse. Prolog is especially well-suited for developing natural language systems. A natural language parser for a simple subset of English will include sentences such as: * The dog ate the bone. * The big mouse chases a lazy cat. This grammar can be described with the following grammar rules. (The first rule says a sentence is made up of a noun phrase followed by a verb phrase. The last rule says an adjective is either 'big', or 'brown', or 'lazy'. ) ::= ::= | ::= ::= the | a ::= dog | bone | mouse | cat ::= ate | chases ::= big | brown | lazy To begin with, we will simply determine if a sentence is a legal sentence. In other words, we will write a predicate sentence, which will determine if its first argument is a sentence. The sentence will be represented as a list of words. Examples: [the,dog,ate,the,bone] [the,brown,mouse,chases,a,lazy,cat] There are two basic strategies for solving a parsing problem like this. The first is a generate-and-test strategy, where the list to be parsed is split in different ways, with the splittings tested to see if they are components of a legal sentence. We can use append to generate the splittings of a list. With this approach, the top-level rule would be: sentence(L) :- append(NP, VP, L), nounphrase(NP), verbphrase(VP). The append predicate will generate possible values for the variables NP and VP, by splitting the original list L. The next two goals test each of the portions of the list to see if they are grammatically correct. If not, backtracking into append causes another possible splitting to be generated. The clauses for nounphrase and verbphrase are similar to sentence, and call further predicates that deal with smaller units of a sentence, until the word definitions are met, such as verb([ate]). verb([chases]). noun([mouse]). noun([dog]). Using Difference Lists --------------------------- The above strategy, however, is extremely slow because of the constant generation and testing of trial solutions that do not work. Furthermore, the generating and testing is happening at multiple levels. The more efficient strategy is to skip the generation step and pass the entire list to the lower level predicates, which in turn will take the grammatical portion of the sentence they are looking for from the front of the list and return the remainder of the list. To do this, we use a structure called a difference list. It is two related lists, in which the first list is the full list and the second list is the remainder. The two lists are the last two arguments in a predicate. Here then is the first grammar rule using difference lists. A list S is a sentence if we can extract a nounphrase from the beginning of it, with a remainder list of S1, and if we can extract a verb phrase from S1 with the empty list as the remainder. sentence(S,[]):- nounphrase(S,S1), verbphrase(S1,[]). Before filling in nounphrase and verbphrase, we will jump to the lowest level predicates that define the actual words. They too must be difference lists. They are simple. If the head of the first list is the word, the remainder list is simply the tail. noun([dog|X],X). noun([cat|X],X). noun([mouse|X],X). verb([ate|X],X). verb([chases|X],X). adjective([big|X],X). adjective([brown|X],X). adjective([lazy|X],X). determiner([the|X],X). determiner([a|X],X). Testing shows how the difference lists work. ?- noun([dog,ate,the,bone],X). X = [ate,the,bone] ?- verb([dog,ate,the,bone],X). no Continuing with the new grammar rules we have: nounphrase(NP,X):- determiner(NP,S1), noun(S1,X). nounphrase(NP,X):- determiner(NP,S1), adjective(S1,S2), noun(S2,X). verbphrase(VP,X):- verb(VP,S1), nounphrase(S1,X). These rules can now be used to test sentences. ?- sentence([the,lazy,mouse,ate,a,dog],[]). yes ?- sentence([the,dog,ate],[]). no ?- sentence([a,big,cat,chases,a,lazy,dog],[]). yes ?- sentence([the,cat,jumps,the,mouse],[]). no Definite Clause Grammar ------------------------------ The use of difference lists for parsing is so common in Prolog, that most Prologs contain additional syntactic sugaring that simplifies the syntax by hiding the difference lists from view. This syntax is called Definite Clause Grammar (DCG), and looks like normal Prolog, only the symbol :- is replaced with an arrow -->. The DCG representation is parsed and translated to normal Prolog with difference lists. Using DCG, the 'sentence' predicate developed earlier would be phrased sentence --> nounphrase, verbphrase. This would be translated into normal Prolog, with difference lists. The above example would be translated into the following equivalent Prolog. sentence(S1, S2):- nounphrase(S1, S3), verbphrase(S3, S2). Thus, if we define 'sentence' using DCG we still must call it with two arguments, even though the arguments were not explicitly stated in the DCG representation. ?- sentence([the,dog,chases,the,cat], []). yes The DCG vocabulary is represented by simple lists. noun --> [dog]. verb --> [chases]. These are translated into Prolog as difference lists. noun([dog|X], X). verb([chases|X], X). % The parser: sentence --> nounphrase, verbphrase. nounphrase --> determiner, noun. nounphrase --> determiner, adjective, noun. verbphrase --> verb, nounphrase. determiner --> [the]. determiner -->[a] . noun --> [dog]. noun --> [bone]. noun --> [mouse]. noun --> [cat]. verb --> [ate]. verb --> [chases]. adjective --> [big]. adjective --> [brown]. adjective --> [lazy]. % Parse tree for correct sentences: % To test the grammar: % ?- sentence(Tree,[the,cat,ate,the,mouse],[]). % Tree = s(np(det(the),n(cat)),vp(v(ate),np(det(the),n(mouse)))) Draw the parse tree (the names of the functors are the names of the nonterminals in the BNF grammar or short notations for them): s / \ np vp / \ / \ det n v np | | | / \ the cat ate det n | | the mouse sentence(s(NP,VP)) --> nounphrase(NP), verbphrase(VP). nounphrase(np(D,N)) --> determiner(D), noun(N). nounphrase(np(D,A,N)) -->determiner(D),adjective(A), noun(N). verbphrase(vp(V,NP))--> verb(V), nounphrase(NP). determiner(det(the)) --> [the]. determiner(det(a)) -->[a]. noun(n(dog)) --> [dog]. noun(n(bone)) --> [bone]. noun(n(mouse)) --> [mouse]. noun(n(cat)) --> [cat]. verb(v(ate)) --> [ate]. verb(v(chase)) --> [chases]. adjective(adj(big)) --> [big]. adjective(adj(brown)) --> [brown]. adjective(adj(lazy)) --> [lazy]. Note: If you need to add normal Prolog code, you can do it between curly brackets. ----------- Another example, a grammar for numbers: ::= . ::= | ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 - 3.519 is an example "sentence" in the language. Draw its parse tree. - In our parser code below, we will assume that the sentence has been read and stored as a list of atoms, such as [3, ., 5, 1, 9]. You will make the same assumption in your project. Using difference lists ---------------------- - With this approach, each of the parsing predicates has 3 arguments. - For example, a parser for the little grammar above would include a predicate part(Tree, Input, Remainder) that succeeds iff Input begins with a , Remainder is what's left of Input after parsing the , and Tree is the piece of parse tree that represents the . Here's a parser for the whole grammar (not just for ). digit(_, [H|Rest], Rest):- member(H, [0,1,2,3,4,5,6,7,8,9]). part(_, Input, Remainder):- digit(_, Input, Remainder). part(_, Input, Remainder):- digit(_, Input, R1), part(_, R1, Remainder). real-number(_, Input, Remainder):- part(_, Input, [.|R1]), part(_, R1, Remainder). Notice that I have used "_" everywhere the pieces of parse tree would go. - Here is an example of what might happen if we queried the part predicate. Of course, the user would not directly ask such a thing; it would happen during the parsing of a full real-number. ?- part(Str, [5, 1, 9], Rem). Rem = [1,9] Str = _82; Rem = [9] Str = _82; Rem = [] Str = _82; no - Notice that the first solution above used up only the first element of the input, leaving a remainder of [1, 9]. If this goal were to be satisfied during parsing of the "519" component of the real-number "3.519", we would eventually hit a dead end -- the "19" at the end would have no way of being used up. - Exercise: By hand, trace the goal real-number(ParseTree, [3, ., 5, 1, 9], Rem). far enough to observe that this is true. - But here is the beauty of Prolog: It just backs up and keeps trying other ways to parse "519" as beginning with (or completely made of) a . Eventually it will find the 3rd solution above, which *will* fit in with solving the initial goal. - Exercise: Run the goal real-number(ParseTree, [3, ., 5, 1, 9], Rem). with tracing turned on, to see the backtracking in action. Using DCG notation for difference lists --------------------------------------- (recommended as the most elegant solution) real_number --> part, [.], part. part --> digit. part --> digit, part. digit --> [H],{member(H, [0,1,2,3,4,5,6,7,8,9])}. % or equivalently % digit([H|Rest], Rest):- member(H, [0,1,2,3,4,5,6,7,8,9]). Note: Between curly brackets you can put normal Prolog code. Add parse trees ----------------- The names of the functors in the internal nodes of the trees are the nonterminals in the CFG. Example: ?- real_number(Tree, [3,.,5,6],[]). Tree = real_number(part(digit(3)), '.',part(digit(5), part(digit(6)))) Yes real_number / | \ part . part | / \ digit digit part | | | 3 5 digit | 6 real_number(real_number(P1,.,P2)) --> part(P1), [.], part(P2). part(part(D)) --> digit(D). part(part(D,P)) --> digit(D), part(P). digit(digit(H)) --> [H],{member(H, [0,1,2,3,4,5,6,7,8,9])}. Not using difference lists -------------------------- - It is also possible to write our predicates without the remainder argument. But in order to do so, you have to find a way to give Prolog the ability to back up and try alternative ways of breaking down a "sentence" -- as we saw happening above.