ca.uottawa.balie
Class TokenList

java.lang.Object
  extended by ca.uottawa.balie.TokenList
All Implemented Interfaces:
java.io.Serializable

public class TokenList
extends java.lang.Object
implements java.io.Serializable

A list of Tokens representing a text. Provides manipulation functions and an XML representation.

Author:
nadeaud
See Also:
Serialized Form

Constructor Summary
TokenList(boolean pi_DetectSentenceBoundaries)
          Construct an empty TokenList.
 
Method Summary
 boolean Add(Token pi_Token, SentenceBoundariesRecognition pi_SBR, WekaLearner pi_SBRModel)
          Add a token at the end of the TokenList.
 boolean equals(java.lang.Object pi_Obj)
           
 Token Get(int pi_Index)
          Gets the token at the given index.
 int getSentenceCount()
          Gets the number of sentences found.
 java.util.Hashtable HashAccess()
           
 int hashCode()
           
 TokenListIterator Iterator()
          Gets an iterator for the TokenList.
 java.lang.String SentenceText(int pi_Index, boolean pi_Canonic, boolean pi_PrintNewLines)
          Gets the text version of the sentence at the given index.
 void SetEntityType(int pi_Index, int pi_Type)
           
 void SetPOS(int pi_Index, int pi_POS)
          Sets the part-of-speech of the token at the given index.
 int Size()
          Gets the size (number of tokens) of the TokenList.
 java.util.Hashtable TermFrequencyTable()
          Gets the TF table.
 java.lang.String TokenRangeText(int pi_Start, int pi_Stop, boolean pi_Canonic, boolean pi_PrintNewLines, boolean pi_TagEntities)
           
 java.lang.StringBuffer ToXML()
          Gets the TokenList in XML format.
 java.util.ArrayList WordList()
           
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TokenList

public TokenList(boolean pi_DetectSentenceBoundaries)
Constructs an empty TokenList, ready to be populated incrementally.

Parameters:
pi_DetectSentenceBoundaries - True if sentence boundaries must be detected
Method Detail

Add

public boolean Add(Token pi_Token,
                   SentenceBoundariesRecognition pi_SBR,
                   WekaLearner pi_SBRModel)
Add a token at the end of the TokenList.

Parameters:
pi_Token - A new token
pi_SBR - The SBR object
pi_SBRModel - The learned SBR model
Returns:
True if the previous token (current-1) was a sentence break

Size

public int Size()
Gets the size (number of tokens) of the TokenList.

Returns:
Size

Get

public Token Get(int pi_Index)
Gets the token at the given index.

Parameters:
pi_Index - Index of the token to get.
Returns:
A token

equals

public boolean equals(java.lang.Object pi_Obj)

hashCode

public int hashCode()

SentenceText

public java.lang.String SentenceText(int pi_Index,
                                     boolean pi_Canonic,
                                     boolean pi_PrintNewLines)
Gets the text version of the sentence at the given index.

Parameters:
pi_Index - Index of the sentence to get (in number of sentences)
pi_Canonic - True if the text must be returned in its canonical version
Returns:
The text of a sentence (String)

TokenRangeText

public java.lang.String TokenRangeText(int pi_Start,
                                       int pi_Stop,
                                       boolean pi_Canonic,
                                       boolean pi_PrintNewLines,
                                       boolean pi_TagEntities)

getSentenceCount

public int getSentenceCount()
Gets the number of sentences found.

Returns:
Number of sentences.

TermFrequencyTable

public java.util.Hashtable TermFrequencyTable()
Gets the term frequency (TF) table, a lookup that maps each word to its frequency in the text.

Returns:
Hashtable
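
To illustrate what a term frequency lookup contains, here is a minimal sketch that counts word occurrences into a java.util.Hashtable. It is not the TokenList implementation: the class name TermFrequencySketch, the helper termFrequency, and the use of plain String words with Integer counts are all assumptions made for illustration, since the key and value types of the returned Hashtable are not documented here.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;

public class TermFrequencySketch {

    // Hypothetical helper (not part of TokenList): counts how often
    // each word occurs in the given list.
    static Hashtable<String, Integer> termFrequency(List<String> words) {
        Hashtable<String, Integer> table = new Hashtable<>();
        for (String word : words) {
            // merge() stores 1 for a new word, otherwise adds 1 to the stored count
            table.merge(word, 1, Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cat", "saw", "the", "dog");
        Hashtable<String, Integer> tf = termFrequency(words);
        System.out.println(tf.get("the")); // prints 2
        System.out.println(tf.get("cat")); // prints 1
    }
}
```

Whether the actual table counts canonical or raw word forms is not stated above, so check that before relying on exact counts.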

HashAccess

public java.util.Hashtable HashAccess()

WordList

public java.util.ArrayList WordList()

SetPOS

public void SetPOS(int pi_Index,
                   int pi_POS)
Sets the part-of-speech of the token at the given index.

Parameters:
pi_Index - Index of the token to update
pi_POS - Part-of-speech of this token (see TokenConsts for the enumeration)
See Also:
TokenConsts

SetEntityType

public void SetEntityType(int pi_Index,
                          int pi_Type)

ToXML

public java.lang.StringBuffer ToXML()
Gets the TokenList in XML format.

Returns:
an XML StringBuffer

Iterator

public TokenListIterator Iterator()
Gets an iterator for the TokenList.

Returns:
the iterator (type TokenListIterator)
See Also:
TokenListIterator