java.lang.Object
  ca.uottawa.balie.Token
Tokens are the unit element of Balie.
A text is represented as a list of consecutive tokens (called a TokenList).
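The idea of a text as a list of consecutive tokens can be illustrated with a minimal sketch. It does not use Balie's classes: `SketchToken` is a hypothetical stand-in that carries only a raw literal, a position and a sentence number, and the whitespace-based splitting is an assumption made for the example, not Balie's tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; NOT ca.uottawa.balie.Token.
final class SketchToken {
    final String raw;    // the word as it appears in the text
    final int position;  // index of the token, counted from the start of the text
    final int sentence;  // number of the sentence the token belongs to

    SketchToken(String raw, int position, int sentence) {
        this.raw = raw;
        this.position = position;
        this.sentence = sentence;
    }
}

final class TokenListSketch {
    // Assumption: tokens are separated by whitespace, and a token ending
    // in '.', '!' or '?' closes the current sentence.
    static List<SketchToken> tokenize(String text) {
        List<SketchToken> tokens = new ArrayList<>();
        int position = 0;
        int sentence = 0;
        for (String piece : text.trim().split("\\s+")) {
            if (piece.isEmpty()) {
                continue; // happens only for empty or all-blank input
            }
            tokens.add(new SketchToken(piece, position++, sentence));
            char last = piece.charAt(piece.length() - 1);
            if (last == '.' || last == '!' || last == '?') {
                sentence++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one line per token: position, sentence number, raw literal.
        for (SketchToken t : tokenize("Balie reads text. It emits tokens.")) {
            System.out.println(t.position + "\t" + t.sentence + "\t" + t.raw);
        }
    }
}
```

In the real class, the corresponding values are read through `Position()`, `SentenceNumber()` and `Raw()`.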
Constructor Summary | |
Token(java.lang.String pi_RawLiteral,
java.lang.String pi_CanonLiteral,
int pi_Type,
PunctLookup pi_PunctLookup,
int pi_Position,
int pi_Sentence,
int pi_NumWhiteBefore)
Creates a new token with all the required information. |
Method Summary | |
java.lang.String |
Canon()
Gets the canonical version of the token. |
int |
EntityType()
Gets the entity type of this token; see TokenConsts for enumeration of types. |
void |
EntityType(int pi_Type)
Sets the entity type; see TokenConsts for enumeration of types. |
boolean |
equals(java.lang.Object pi_Obj)
|
int |
hashCode()
|
void |
IncrementSentenceNumber()
Increments the sentence number of a token. |
boolean |
IsAllCapSentence()
Checks whether this token starts a sentence that is all capitalized (e.g., a header or title). |
boolean |
IsCapitalized()
Checks whether a token is capitalized. |
boolean |
IsSentenceStart()
Checks whether the token starts a new sentence. |
int |
Length()
Gets the length of a token in number of chars. |
int |
NumWhiteBefore()
Gets the number of white spaces that precede this token in the text. |
int |
PartOfSpeech()
Gets the part-of-speech of the token. |
int |
Position()
Gets the token position. |
java.lang.String |
Raw()
Gets the raw version of the token. |
int |
SentenceNumber()
Gets the sentence number. |
void |
setPosition(int numPosition)
Sets the token position. |
void |
setSentenceNumber(int numSentence)
Sets the sentence number. |
java.lang.String |
toString()
A canonical string representation of this token. |
java.lang.StringBuffer |
ToXML()
Gets the XML representation of the token. |
int |
Type()
Gets the type of the token (word or punctuation). |
Methods inherited from class java.lang.Object |
getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
public Token(java.lang.String pi_RawLiteral, java.lang.String pi_CanonLiteral, int pi_Type, PunctLookup pi_PunctLookup, int pi_Position, int pi_Sentence, int pi_NumWhiteBefore)
Parameters:
pi_RawLiteral - The word as it appears in the text
pi_CanonLiteral - The canonical version of the word
pi_Type - The type (punctuation or word); see TokenConsts for details
pi_PunctLookup - The lookup table for punctuation types
pi_Position - The position of the token, in number of words from the beginning of the text
pi_Sentence - The sentence number
Method Detail |
public java.lang.String Raw()
public java.lang.String Canon()
public int Type()
Gets the type of the token (word or punctuation). See TokenConsts for enumeration.
See Also: TokenConsts
public int PartOfSpeech()
Gets the part-of-speech of the token. See TokenConsts for enumeration of both.
See Also: TokenConsts
public boolean equals(java.lang.Object pi_Obj)
public int hashCode()
public boolean IsCapitalized()
public int NumWhiteBefore()
public int EntityType()
Gets the entity type of this token; see TokenConsts for enumeration of types.
See Also: TokenConsts
public void EntityType(int pi_Type)
Sets the entity type; see TokenConsts for enumeration of types.
Parameters:
pi_Type - see TokenConsts
public boolean IsSentenceStart()
public boolean IsAllCapSentence()
public int SentenceNumber()
public void setSentenceNumber(int numSentence)
Parameters:
numSentence - the new sentence number
public void IncrementSentenceNumber()
public int Position()
public void setPosition(int numPosition)
Parameters:
numPosition - the new token position
public int Length()
public java.lang.StringBuffer ToXML()
public java.lang.String toString()
See Also: Object.toString()
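Because each token records its raw literal and the number of white spaces that precede it, the spacing of a text can in principle be rebuilt from a list of tokens. The sketch below shows this under stated assumptions: `Piece` is a hypothetical stand-in holding the same two values that `Raw()` and `NumWhiteBefore()` return, and every white space is assumed to be a plain space character. It is not Balie's own code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; NOT ca.uottawa.balie.Token.
final class Piece {
    final String raw;          // counterpart of Raw()
    final int numWhiteBefore;  // counterpart of NumWhiteBefore()

    Piece(String raw, int numWhiteBefore) {
        this.raw = raw;
        this.numWhiteBefore = numWhiteBefore;
    }
}

final class DetokenizeSketch {
    // Concatenates the raw literals, re-inserting the recorded number of
    // spaces before each one (assumption: white space means ' ').
    static String rebuild(List<Piece> pieces) {
        StringBuilder out = new StringBuilder();
        for (Piece p : pieces) {
            for (int i = 0; i < p.numWhiteBefore; i++) {
                out.append(' ');
            }
            out.append(p.raw);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Piece> pieces = Arrays.asList(
                new Piece("Hello", 0),
                new Piece(",", 0),
                new Piece("world", 1),
                new Piece("!", 0));
        System.out.println(rebuild(pieces)); // prints "Hello, world!"
    }
}
```

Storing the white-space count on the token, rather than emitting white space as tokens of their own, keeps token positions equal to word positions while still allowing the layout to be recovered.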