ca.uottawa.balie
Class Token

java.lang.Object
  extended by ca.uottawa.balie.Token
All Implemented Interfaces:
java.io.Serializable

public class Token
extends java.lang.Object
implements java.io.Serializable

Tokens are the unit element of Balie. A text is represented as a list of consecutive tokens (called a TokenList).
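
As an illustration of the token-list idea, the following minimal sketch rebuilds a text from a list of tokens, using each token's raw form and the number of white spaces before it. SketchToken and TokenListSketch are hypothetical stand-ins written for this example, not Balie classes, and the sketch assumes every white space is rendered as a plain space.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenListSketch {
    // Rebuilds a text by emitting each token's leading white space, then its raw form.
    // Assumption: every white-space character is rendered as a plain space.
    static String rebuild(List<SketchToken> tokens) {
        StringBuilder sb = new StringBuilder();
        for (SketchToken t : tokens) {
            for (int i = 0; i < t.numWhiteBefore; i++) {
                sb.append(' ');
            }
            sb.append(t.raw);
        }
        return sb.toString();
    }

    // A four-token sample: word, punctuation, word, punctuation.
    static List<SketchToken> sample() {
        List<SketchToken> tokens = new ArrayList<>();
        tokens.add(new SketchToken("Hello", 0));
        tokens.add(new SketchToken(",", 0));
        tokens.add(new SketchToken("world", 1));
        tokens.add(new SketchToken(".", 0));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(rebuild(sample())); // prints "Hello, world."
    }
}

// Hypothetical stand-in for illustration only; this is NOT ca.uottawa.balie.Token.
final class SketchToken {
    final String raw;         // mirrors what Raw() is documented to return
    final int numWhiteBefore; // mirrors what NumWhiteBefore() is documented to return

    SketchToken(String raw, int numWhiteBefore) {
        this.raw = raw;
        this.numWhiteBefore = numWhiteBefore;
    }
}
```

With the real class, the same loop would presumably call Raw() and NumWhiteBefore() on each Token.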

Author:
nadeaud
See Also:
Serialized Form

Constructor Summary
Token(java.lang.String pi_RawLiteral, java.lang.String pi_CanonLiteral, int pi_Type, PunctLookup pi_PunctLookup, int pi_Position, int pi_Sentence, int pi_NumWhiteBefore)
          Creates a new token with all the required information.
 
Method Summary
 java.lang.String Canon()
          Gets the canonical version of the token.
 int EntityType()
          Gets the entity type of this token; see TokenConsts for the enumeration of types.
 void EntityType(int pi_Type)
          Sets the entity type; see TokenConsts for the enumeration of types.
 boolean equals(java.lang.Object pi_Obj)
           
 int hashCode()
           
 void IncrementSentenceNumber()
          Increments the sentence number of a token.
 boolean IsAllCapSentence()
          Checks whether this token starts a sentence that is all capitalized (e.g., a header or title).
 boolean IsCapitalized()
          Checks whether a token is capitalized.
 boolean IsSentenceStart()
          Checks whether the token starts a new sentence.
 int Length()
          Gets the length of a token in number of chars.
 int NumWhiteBefore()
          Gets the number of white spaces that precede this token in the text.
 int PartOfSpeech()
          Gets the part-of-speech of the token.
 int Position()
          Gets the token position.
 java.lang.String Raw()
          Gets the raw version of the token.
 int SentenceNumber()
          Gets the sentence number.
 void setPosition(int numPosition)
          Sets the token position.
 void setSentenceNumber(int numSentence)
          Sets the sentence number.
 java.lang.String toString()
          A canonical string representation of this token.
 java.lang.StringBuffer ToXML()
          Gets the XML representation of the token.
 int Type()
          Gets the type of the token (word or punctuation).
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Token

public Token(java.lang.String pi_RawLiteral,
             java.lang.String pi_CanonLiteral,
             int pi_Type,
             PunctLookup pi_PunctLookup,
             int pi_Position,
             int pi_Sentence,
             int pi_NumWhiteBefore)
Creates a new token with all the required information.

Parameters:
pi_RawLiteral - The word as it appears in the text
pi_CanonLiteral - The canonical version of the word
pi_Type - The type (punctuation or word); see TokenConsts for details
pi_PunctLookup - The lookup table for punctuation types
pi_Position - The position of the token, in number of words from the text beginning
pi_Sentence - The sentence number
pi_NumWhiteBefore - The number of white spaces that precede the token in the text
Method Detail

Raw

public java.lang.String Raw()
Gets the raw version of the token.

Returns:
The raw string, as it appears in the text

Canon

public java.lang.String Canon()
Gets the canonical version of the token.

Returns:
The canonical string

Type

public int Type()
Gets the type of the token (word or punctuation). See TokenConsts for the enumeration.

Returns:
The type of the token
See Also:
TokenConsts

PartOfSpeech

public int PartOfSpeech()
Gets the part-of-speech of the token. Both words and punctuation have a POS. See TokenConsts for the enumeration of both.

Returns:
the POS
See Also:
TokenConsts

equals

public boolean equals(java.lang.Object pi_Obj)

hashCode

public int hashCode()

IsCapitalized

public boolean IsCapitalized()
Checks whether a token is capitalized.

Returns:
True if the token is capitalized

NumWhiteBefore

public int NumWhiteBefore()
Gets the number of white spaces that precede this token in the text.

Returns:
The number of white-space characters

EntityType

public int EntityType()
Gets the entity type of this token. See TokenConsts for the enumeration of types.

Returns:
The entity type
See Also:
TokenConsts

EntityType

public void EntityType(int pi_Type)
Sets the entity type. See TokenConsts for the enumeration of types.

Parameters:
pi_Type - The new entity type
See Also:
TokenConsts

IsSentenceStart

public boolean IsSentenceStart()
Checks whether the token starts a new sentence.

Returns:
true if the token is the first token of a sentence

IsAllCapSentence

public boolean IsAllCapSentence()
Checks whether this token starts a sentence that is all capitalized (e.g., a header or title).

Returns:
true if this token starts an all-capitalized sentence

SentenceNumber

public int SentenceNumber()
Gets the sentence number.

Returns:
The sentence number

setSentenceNumber

public void setSentenceNumber(int numSentence)
Sets the sentence number.

Parameters:
numSentence - the new sentence number

IncrementSentenceNumber

public void IncrementSentenceNumber()
Increments the sentence number of a token. Useful with the SBR module, which can identify sentence breaks at a later stage.
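
The sketch below shows one way a late sentence break could be applied: every token from the break onward has its sentence number incremented by one. LateBreakSketch, its nested SketchToken, and insertBreakBefore are hypothetical names written for this example, not Balie classes or methods. The 0-based sentence numbering and the idea that the caller loops over the following tokens are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class LateBreakSketch {
    // Hypothetical stand-in for illustration only; this is NOT ca.uottawa.balie.Token.
    static final class SketchToken {
        final String raw;
        private int sentence; // assumption: sentence numbers start at 0

        SketchToken(String raw, int sentence) {
            this.raw = raw;
            this.sentence = sentence;
        }

        int sentenceNumber() { return sentence; }

        // Mirrors the documented IncrementSentenceNumber(): adds one to the sentence number.
        void incrementSentenceNumber() { sentence++; }
    }

    // Applies a sentence break found after tokenization: the token at
    // firstIndexOfNewSentence and every later token move one sentence further.
    static void insertBreakBefore(List<SketchToken> tokens, int firstIndexOfNewSentence) {
        for (int i = firstIndexOfNewSentence; i < tokens.size(); i++) {
            tokens.get(i).incrementSentenceNumber();
        }
    }

    // Collects the sentence number of each token, in order.
    static List<Integer> sentenceNumbers(List<SketchToken> tokens) {
        List<Integer> numbers = new ArrayList<>();
        for (SketchToken t : tokens) {
            numbers.add(t.sentenceNumber());
        }
        return numbers;
    }

    // "It rained We left ." was first read as one sentence (0), "Bye" as the next (1).
    static List<SketchToken> sample() {
        List<SketchToken> tokens = new ArrayList<>();
        tokens.add(new SketchToken("It", 0));
        tokens.add(new SketchToken("rained", 0));
        tokens.add(new SketchToken("We", 0));
        tokens.add(new SketchToken("left", 0));
        tokens.add(new SketchToken(".", 0));
        tokens.add(new SketchToken("Bye", 1));
        return tokens;
    }

    public static void main(String[] args) {
        List<SketchToken> tokens = sample();
        insertBreakBefore(tokens, 2); // a break is detected before "We"
        System.out.println(sentenceNumbers(tokens)); // prints [0, 0, 1, 1, 1, 2]
    }
}
```

Incrementing in place keeps the numbering of later sentences consistent without retokenizing the text.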


Position

public int Position()
Gets the token position.

Returns:
The token position.

setPosition

public void setPosition(int numPosition)
Sets the token position.

Parameters:
numPosition - the new token position

Length

public int Length()
Gets the length of a token in number of chars.

Returns:
The token length

ToXML

public java.lang.StringBuffer ToXML()
Gets the XML representation of the token.

Returns:
An XML representation in a StringBuffer.

toString

public java.lang.String toString()
A canonical string representation of this token.

See Also:
Object.toString()