These tools contain hard-coded links to other subsidiary tools in
my directory. If you copy them, make sure that you update these
links to match your own setup.
The TDC (Text Documents Collection) tools are four shell scripts which
perform the following tasks (these descriptions are included in the
header of each script):
- Tool1: boundary_marker
# Adds paragraph and sentence tags to a TDC text.
# Paragraphs are delimited by <p> .... <\p>
# Sentences are delimited by <s> .... <\s>
# file.bm will contain the TDC text with boundary tags.
# Usage: boundary_marker TDC_file
- Tool2: tile
# Preprocesses and segments a TDC text using TextTiling.
# The output is printed to standard output.
# Usage: tile TDC_file
- Tool3: segment
# Preprocesses and segments a TDC text using Segmenter.
# The output is printed to standard output.
# Usage: segment TDC_file
- Tool4: segment_marker
# Preprocesses a TDC text and puts segment tags in it.
# Current version includes segment tags of Segmenter and TextTiling.
# Segment tag is >Segment_number Segmenter_name<
# file.seg will contain the TDC text with segment tags.
# Usage: segment_marker TDC_file
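
The usage lines above can be combined into one small wrapper. The
following is only a sketch, not part of the TDC tools: the function
name run_tdc_tools and the .tile and .segment output names are
hypothetical, and it assumes that all four scripts are on your PATH
and take the TDC file as their only argument, as the usage lines
suggest.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the TDC tools): runs the four
# scripts on a single TDC text file.
# Assumption: the four tools are on PATH and each takes one argument.
# Assumption: the .tile and .segment file names are made up here;
# tile and segment themselves only print to standard output.

run_tdc_tools() {
    input=$1

    # Refuse to run on a missing input file.
    if [ ! -f "$input" ]; then
        echo "run_tdc_tools: no such file: $input" >&2
        return 1
    fi

    # Check that every tool can be found before starting.
    for tool in boundary_marker tile segment segment_marker; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "run_tdc_tools: $tool not found on PATH" >&2
            return 1
        fi
    done

    # boundary_marker and segment_marker write their own output
    # files (described above as file.bm and file.seg).
    boundary_marker "$input" || return 1

    # tile and segment print to standard output, so redirect them.
    tile "$input" > "$input.tile" || return 1
    segment "$input" > "$input.segment" || return 1

    segment_marker "$input" || return 1
}
```

After loading the function into your shell, it would be called as
run_tdc_tools TDC_file. It stops at the first tool that fails.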