Here is an example of a fully annotated sentence:

WORDS---->  NE--->  POS   PARTIAL_SYNT   FULL_SYNT------>   VS   TARGETS        PROPS------->
                                                                        
The             *   DT    (NP*   (S*        (S(NP*          -    -        (A0*    (A0*      
$               *   $        *     *     (ADJP(QP*          -    -           *       *      
1.4             *   CD       *     *             *          -    -           *       *      
billion         *   CD       *     *             *))        -    -           *       *      
robot           *   NN       *     *             *          -    -           *       *      
spacecraft      *   NN       *)    *             *)         -    -           *)      *)   
faces           *   VBZ   (VP*)    *          (VP*          01   face      (V*)      *      
a               *   DT    (NP*     *          (NP*          -    -        (A1*       *      
six-year        *   JJ       *     *             *          -    -           *       *      
journey         *   NN       *)    *             *          -    -           *       *      
to              *   TO    (VP*   (S*        (S(VP*          -    -           *       *      
explore         *   VB       *)    *          (VP*          01   explore     *     (V*)     
Jupiter     (ORG*)  NNP   (NP*)    *       (NP(NP*)         -    -           *    (A1*      
and             *   CC       *     *             *          -    -           *       *      
its             *   PRP$  (NP*     *          (NP*          -    -           *       *      
16              *   CD       *     *             *          -    -           *       *      
known           *   JJ       *     *             *          -    -           *       *      
moons           *   NNS      *)    *)            *)))))))   -    -           *)      *)   
.               *   .        *     *)            *)         -    -           *       *   

There is one line per token, and a blank line follows the last token of the sentence. The columns, separated by spaces, hold the different annotations of the sentence, each expressed as one tag per word. Structured annotations (named entities, chunks, clauses, parse trees, arguments) use the Start-End format.
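As a rough illustration of this layout, the following Python sketch groups token lines into sentences and extracts a single column. The function names read_sentences and column are made up for this example and are not part of the provided software; the sketch also assumes that the header line shown in the example (WORDS, NE, POS, ...) is explanatory only and does not occur in the data files.

```python
from typing import Iterable, Iterator, List

Sentence = List[List[str]]  # one row per token, one string per column


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from an iterable of lines.

    Each non-blank line is one token; its columns are separated by spaces.
    A blank line marks the end of the current sentence.
    """
    rows: Sentence = []
    for line in lines:
        fields = line.split()
        if fields:
            rows.append(fields)
        elif rows:
            yield rows
            rows = []
    if rows:  # tolerate a missing blank line at the end of the input
        yield rows


def column(sentence: Sentence, index: int) -> List[str]:
    """Return the tags of one column (0 = words) for every token."""
    return [row[index] for row in sentence]
```

With a file object f opened on such data, column(sentence, 0) for each sentence in read_sentences(f) would give the words, and higher indices the annotation columns in the order shown in the example.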

The Start-End format represents phrases (chunks, arguments, and syntactic constituents) that form a well-formed bracketing of a sentence (that is, phrases do not overlap, although they may be embedded in one another). Each tag has the form STARTS*ENDS and encodes the phrases that start and end at the corresponding word. A phrase of type k places an opening "(k" in the STARTS part of its first word and a closing ")" in the ENDS part of its last word. For example, in the table above, the tag (S(NP* on "The" opens an S and an NP, and the tag (V*) on "faces" opens and closes a one-word V phrase. Scripts will be provided to transform a column in Start-End format into other standard formats (IOB1, IOB2, WSJ trees). The Start-End format used last year (which included the phrase type in both the start and end parts) will remain compatible with the current software and scripts.
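To make the bracketing concrete, here is a minimal Python sketch that decodes one Start-End column into labeled spans, and optionally flattens them to IOB2. It is not one of the official conversion scripts: the names start_end_to_spans and spans_to_iob2 are hypothetical, and the IOB2 helper assumes a column without embedded phrases (such as chunks).

```python
import re
from typing import List, Tuple

Span = Tuple[str, int, int]  # (phrase type, first token, last token), inclusive


def start_end_to_spans(tags: List[str]) -> List[Span]:
    """Decode a column of STARTS*ENDS tags into spans, in closing order."""
    spans: List[Span] = []
    stack: List[Tuple[str, int]] = []  # phrases opened but not yet closed
    for i, tag in enumerate(tags):
        starts, star, ends = tag.partition("*")
        if not star:
            raise ValueError(f"tag without '*' at token {i}: {tag!r}")
        # Every "(k" in the STARTS part opens a phrase of type k here.
        for label in re.findall(r"\(([^()*]+)", starts):
            stack.append((label, i))
        # Every ")" in the ENDS part closes the most recently opened phrase.
        for _ in range(ends.count(")")):
            if not stack:
                raise ValueError(f"unmatched ')' at token {i}")
            label, start = stack.pop()
            spans.append((label, start, i))
    if stack:
        raise ValueError(f"unclosed phrases: {stack}")
    return spans


def spans_to_iob2(spans: List[Span], length: int) -> List[str]:
    """Flatten non-embedded spans into IOB2 tags (B-k, I-k, O)."""
    iob = ["O"] * length
    for label, start, end in spans:
        iob[start] = "B-" + label
        for i in range(start + 1, end + 1):
            iob[i] = "I-" + label
    return iob
```

Applied to the first PROPS column of the example (the arguments of "faces"), start_end_to_spans yields an A0 span over tokens 0 to 5, a V span at token 6, and an A1 span over tokens 7 to 17, counting tokens from zero. Using a stack means embedded phrases, as in the FULL_SYNT column, are closed innermost first.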


The different annotations in a sentence are grouped in the following blocks:

 (S
    (NP (DT The)
      (ADJP
        (QP ($ $) (CD 1.4) (CD billion) ))
      (NN robot) (NN spacecraft) )
    (VP (VBZ faces)
      (NP (DT a) (JJ six-year) (NN journey)
        (S
          (VP (TO to)
            (VP (VB explore)
              (NP
                (NP (NNP Jupiter) )
                (CC and)
                (NP (PRP$ its) (CD 16) (JJ known) (NNS moons) )))))))
    (. .) )
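
The tree above matches the WORDS, POS and FULL_SYNT columns of the example sentence. As a sketch of how such a tree could be rebuilt from those three columns (this is an illustration, not the official conversion script, and the function name to_bracketed is invented here), each "*" in a FULL_SYNT tag can be replaced by the "(POS word)" leaf of its token:

```python
from typing import List


def to_bracketed(words: List[str], pos_tags: List[str], full_synt: List[str]) -> str:
    """Rebuild a one-line bracketed tree from three parallel columns.

    The single '*' of each FULL_SYNT tag is replaced by the token's
    "(POS word)" leaf; the STARTS and ENDS parts are kept unchanged.
    """
    if not (len(words) == len(pos_tags) == len(full_synt)):
        raise ValueError("columns must have the same length")
    pieces = []
    for word, pos, tag in zip(words, pos_tags, full_synt):
        if tag.count("*") != 1:
            raise ValueError(f"expected exactly one '*' in tag {tag!r}")
        pieces.append(tag.replace("*", f"({pos} {word})"))
    return " ".join(pieces)
```

The result is printed on a single line rather than indented as above, but the bracketing is the same.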