Looking at Text III: Tokenization, What Is a Word? (Cont'd)
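Word segmentation in other languages: when text has no whitespace between words (e.g., Chinese, Japanese), word segmentation is hard.
Whitespace that does not indicate a word break (e.g., New York functioning as a single unit).
Variant coding of information of a certain semantic type (e.g., phone numbers or dates written in several formats).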
Speech corpora
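The slide does not name a segmentation method. One common baseline for text without whitespace is greedy maximum matching ("MaxMatch"). At each position it takes the longest dictionary word that starts there. The sketch below is a swapped-in illustration, not the lecture's method. It assumes a small, hypothetical toy lexicon and uses English with the spaces removed as a stand-in for unsegmented text. The example also shows why segmentation is hard: the greedy choice can produce a wrong split.

```python
from __future__ import annotations


def max_match(text: str, lexicon: set[str]) -> list[str]:
    """Greedy left-to-right maximum matching word segmentation (sketch)."""
    # Longest candidate worth trying; fall back to 1 for an empty lexicon.
    max_len = max((len(w) for w in lexicon), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest substring starting at i first, then shrink it.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No lexicon word starts here: emit a single character and move on.
            words.append(text[i])
            i += 1
    return words


# Hypothetical toy lexicon, chosen only for illustration.
LEXICON = {"the", "table", "down", "there", "theta", "bled", "own"}

print(max_match("thetabledownthere", LEXICON))
# → ['theta', 'bled', 'own', 'there']  (intended: the table down there)
```

Because the algorithm always commits to the longest match, "theta" wins over "the" and the rest of the string is mis-segmented. Practical segmenters usually replace this greedy choice with statistical or learned models that score whole segmentations.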