CSI2131 File Management Winter 2003 Assignment #1 (worth 8%, 80 marks, due date: Tuesday February 11, 10:30 a.m.) PART I: PROGRAMMING PART (70 marks) ======================== Overview -------- In this assignment, you will apply your knowledge of fundamental file operations in C++ and data compression to write two simple programs for compressing and uncompressing images. The program for compressing images takes a text file containing an "image" and outputs a new file which is a compressed version of the input file. The program for uncompressing images takes the compressed file and reproduces the original image. Run-Length Encoding ------------------- Run-Length Encoding (RLE) is a simple technique for compressing files, which works as follows. First, we choose one special (unused) byte value as run-length code indicator. Then we search for repeated runs of a single byte value in an input file, and replace them by a single instance of the code indicator, the byte value, and a run count. You may consult Section 6.1.2 in our textbook to revise the algorithm. To uncompress an encoded file, it is enough to sequentially read the compressed file, searching for the run-length code indicators. Each time you encounter such an indicator, expand the repeated value by the number of time it has to be repeated. For this assignment, use character "*" as the run-length code indicator. Reading Material (for further reading but not essential to your task) ---------------- http://www.rasip.fer.hr/research/compress/algorithms/fund/rl/index.html Representing Images ------------------- Images files that you will use are just ASCII files. The following is an example of such an "image": ooooooxxoooooooooooooaaooooooooooooooooo ooooooxxxxooooooooooaaaaaooooooooooooooo ooooooxxxxxxxoooooooooaaoooooxxxoooooooo oooooooooxxxoooooooooooaoxxxxxxxxooooooo ooooooooooooooxxooooooooooxxxxxooooooooo oooooooooooxxxxxxoooooooooooxxoooooooooo ooooooooxxxxxxxxxoooooooooaaoooooooooooo ooooooooooooxxxxxoooooooooaaaooooooooooo ooooooaaaoooooxxxooooooooaaaaooooooooooo ooooaaaaaoooooooooooooooooaaoooooooooooo oooooooaaoooooooooooooooxxxxxxxxoooooooo ooooooooooooooxxooooooooooxxxxxooooooooo aoooooooooooxxxxxxoooooooooooxxooooooooo aaaaooooooooxxxxxxxxxooooooooooooooooooo oaaoooooooooooxxxxxooooooooooaaaaooooooo ooooooooooooooxxxoooooooooaaaaaaaooooooo This image contains 3 sorts of characters which are a's, o's, and x's. Imagine that the image represents a space image where the o's are some background colour, the a's are certain colour brighter than the o's, and the x's are the brightest colour available. In fact, one could use blanks instead of the o's to represent the background colour. At the end of each line there are characters representing a new line (in DOS, these are 2 characters). You can create such an image by using a text editor such as notepad. Try to create both dense and sparse images to test your algorithm and verify how it affects its compression ratio. Command Line Arguments ---------------------- The input file name for your programs will be given on the command line. This means that after your program is compiled and an executable is created, you are going to run it on the DOS prompt and specify some arguments which will represent options and file names to be used in your program. Details on how to access command line arguments inside your program will be explained in a separate page in the web. In short, the main mechanism to do that involves the use of arguments "argc" and "argv" in the main function: int main (int argc, char* argv[]) { ... } You may look for information on the use of these arguments on the web or on a C++ book. Programs to be created ---------------------- 1) compress Create a project "compress" with a single program ("compress.cpp"); the executable program will be called "compress.exe" by default, since "compress" is the project name. This program (compress.exe) will be executed at the DOS prompt where the user can specify command line arguments using the following format: compress.exe [-v] or alternatively (the user may omit the extension .exe): compress [-v] where is an existent input file and -v (for "verbose mode") is an optional flag. If verbose mode is selected, your program must print a message to the standard error telling the user what is the compression ratio achieved, which is the ratio: size of compressed file divided by the size of original file. The resulting program output (compressed file) must be written to a file with the same name as the input file, but extended with the suffix ".rle". Examples: If you run: compress.exe galaxyNGC891 the compressed output file should be "galaxyNGC891.rle" If you run: compress.exe galaxyNGC891.txt the compressed output file should be "galaxyNGC891.txt.rle" 2) uncompress Similarly to compress, you will create a project "uncompress" with a single program "uncompress.cpp" creating an executable "uncompress.exe". This program will receive a file created by the program "compress" and re-create the original, uncompressed file. The following format will be used for uncompress: uncompress where is your existent input file to be uncompressed. The program should check that has the extension ".rle" and if it doesn't it must append such extension to it. In this way, the user is given the option of providing the suffix or not. So a command such as "uncompress galaxyNGC891" should be understood by the program as "uncompress galaxyNGC891.rle". The resulting output of the program must be written to a file with the same name as the input file, but without the extension ".rle". So if your input file was "galaxyNGC891.rle", the outputfile must be named "galaxyNGC891". Handling Erroneous Inputs ------------------------- Your programs must handle the following erroneous cases: - incorrect use of command line arguments - compressing or uncompressing a non-existent file - creating a compressed or uncompressed file when another file with the same name already exists. Make things very simple and do not prompt the user for an alternate input. Documentation and Style ----------------------- Your program must be well-documented and written in good style. Part of the marks will be dedicated to documentation and style. To document your program, write comments explaining what is done in the following lines and add comments at the end of certain lines explaining what has been done on that line. For each function and procedure, add a full explanation before it, explaining its purpose and what its parameters represent. For each class created, add an explanation before it. Good style includes good documentation, choice of clear names (for variables, classes, functions, etc), good program organization and design (like creating functions that deal with each part of the program and/or creating classes where appropriate). NOTE: Although it is a preferable style to place different classes in different files, if you create any classes for this program, they all must be included in the same program file in order to comply with the standards described below. Testing ------- You must test your programs with many different command line arguments covering: * invalid inputs (testing every possible user error); * valid inputs (using various different image files). We shall provide one or more image files for you to test your program; they will be included in the webCT assignment page. You should create various image files for testing, one of which you will submit. We will test your program with the files provided as well as other files of our choice; we will also test a wide range of user errors. What to Submit -------------- Hand in the following files: compress.cpp (program for compression) uncompress.cpp (program for uncompression) image1.txt (file with a 20x20 ASCII image created by you) image1.txt.rle (the compressed file produced by running your program on image1.txt) written.txt (textfile containing your answers to written questions) EVERY FILE MUST CONTAIN A HEADER WITH Student Name, Student Number, Course and Section. Programs that do not comply with this standards or with the standard file names above will receive mark deductions. Create a directory/folder as explained in Assignment#0, copy the five files you have created into that directory/folder, and zip the directory. Similarly to what you were asked in Assignment#0, submit the zipped file as your assignment#1 submission. Your program must meet the specifications given in this handout and in any further clarification given later on the web page. ====================================================================== PART II: WRITTEN QUESTIONS ========================== Answer these questions in paper first and then type your answers in a text file called "written.txt" (use a text editor such as notepad). 1) Magnetic Disks (6 marks) Given a file with the following characteristics: length: 1,000,000 bytes organization: 2,000 records with 500 bytes per records and a disk drive with the following characteristics: Number of Bytes per Sectors = 600 Number of Sectors per Track = 100 Number of Tracks per Cylinder = 4 Average seek time = 2 msec Average rotational delay = .5 msec Transfer time = .01 msec/sector = 1 msec/track We further assume the following regarding our disk organization: 1-a) Configuration 1: .Records are allowed to span two sectors. .Consecutive file records are stored in consecutive disk sectors in a track. .The tracks are spread out randomly across the disk. 1-b) Configuration 2: .Only one record is stored in each sector. .Consecutive file records are stored in consecutive disk sectors and consecutive disk tracks within each cylinder. .No rotational delay is incurred in going from one track to the other within a cylinder during sequential access (i.e., the beginnings of the tracks are appropriately staggered within a cylinder). .The cylinders are spread out randomly. For each of the configurations above, calculate: A) The time it would take to read this file completely SEQUENTIALLY. B) The time it would take to read this file completely using RANDOM ACCESS (traversing each record once in random order). C) The amount of internal fragmentation present in the file due to unused space in a sector. For each calculation, show all the steps in your derivation. 2) Magnetic Tapes (4 marks) Consider the following tape description: density = 3,000 bpi interblock gap = 0.2 inch Speed = 200 ips and the following file characteristics: number of records = 1,000 record size = 30 Using blocking factors 50 and 100, calculate the following quantities: A) Amount of tape (in feet) required to store the file (note that 1 foot = 12 inches). B) Effective recording density. C) Effective transmission rate. For each calculation, show all the steps in your derivation. ======================================================================