Scanner for the EPSP Database Management Programming Project


Implementing the project usually requires code that reads and parses input, e.g. to determine which command is being issued and what its arguments are. This is not difficult, but it can be time-consuming. I have prepared some C code that you can use as a starting point for writing the scanning functions you might need. The scanner is an example of a general class of parsing and scanning techniques known as "recursive-descent parsing": the organization and structure of the code are dictated by the formal grammar that describes the input the code is to process.

Source code

Download the following zip file. It contains the following files:
epspscan.c
Source code of the scanner
epspscan.h
Include file with the scanner's declarations
simpdemo.cpp
Simple program demonstrating how to use the scanner.
hwdemo.c
A more complex example showing how to create the parser for one of the sections of the final project (reading orders). Read the comments at the top to see what kind of input it expects. Terminate it by typing Ctrl-Z.

Warning

If you use the EPSP Scanner, do not use any other input functions alongside it; Get_Token is sufficient for reading all of your input. Mixing it with other input functions may cause unexpected behaviour in your application.

Definitions

token_type

enum token_type {ALPHA, NUMERIC, OTHER, END_OF_LINE, END_OF_FILE};

A token has one of five types:

ALPHA
alphanumeric: it starts with an alphabetic character (as defined by the C library function isalpha; check its documentation) followed by zero or more alphabetic or numeric characters.
NUMERIC
a sequence of digits, optionally preceded by a + or - sign
END_OF_LINE
the scanner found an end of line
END_OF_FILE
it found the end of the input
OTHER
Anything else falls into this category.

Spaces and tabs are used as separators and are not included in any token.
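
The rules above can be sketched as a small tokenizer that works on a string in memory. This is only an illustration of the token classes as described here, not the code in epspscan.c: the function next_token and its cursor parameter are hypothetical, the end of the string stands in for the end of the input, and treating OTHER as a single character is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

enum token_type {ALPHA, NUMERIC, OTHER, END_OF_LINE, END_OF_FILE};

/* Hypothetical stand-in for the real scanner: reads one token from the
   string at *cursor, copies its text into token, and advances *cursor.
   The caller must provide a buffer large enough for the token. */
enum token_type next_token(const char **cursor, char *token)
{
    const char *p = *cursor;
    size_t n = 0;
    enum token_type type;

    while (*p == ' ' || *p == '\t')   /* separators never belong to a token */
        p++;

    if (*p == '\0') {
        type = END_OF_FILE;           /* end of string plays the end of input */
    } else if (*p == '\n') {
        p++;
        type = END_OF_LINE;
    } else if (isalpha((unsigned char)*p)) {
        while (isalnum((unsigned char)*p))
            token[n++] = *p++;
        type = ALPHA;
    } else if (isdigit((unsigned char)*p) ||
               ((*p == '+' || *p == '-') && isdigit((unsigned char)p[1]))) {
        token[n++] = *p++;            /* the sign or the first digit */
        while (isdigit((unsigned char)*p))
            token[n++] = *p++;
        type = NUMERIC;
    } else {
        token[n++] = *p++;            /* assumption: OTHER is one character */
        type = OTHER;
    }
    token[n] = '\0';
    *cursor = p;
    return type;
}
```

Under these assumptions, the input "add x1 -42 ," followed by a newline yields ALPHA (add), ALPHA (x1), NUMERIC (-42), OTHER (,), END_OF_LINE and then END_OF_FILE.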

Get_Token

token_type Get_Token(char *token);

Get_Token reads the next token and returns its type (see token_type above). Its parameter is set to the value of the token, as a string. Get_Token does not allocate memory for the token: the caller must pass a buffer that is large enough to hold it.
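
A minimal sketch of the calling pattern follows. So that it compiles on its own, it carries a deliberately simplified stand-in for Get_Token that reads from a string and splits only on white space; a real program would include epspscan.h and link epspscan.c instead. The names MAX_TOKEN, demo_input and read_command, the buffer size of 256, and the sample command words are all assumptions made for the example.

```c
#include <ctype.h>
#include <stddef.h>

enum token_type {ALPHA, NUMERIC, OTHER, END_OF_LINE, END_OF_FILE};

#define MAX_TOKEN 256                 /* assumed upper bound on token length */

/* Simplified stand-in for the real scanner: reads from this string
   instead of standard input and splits tokens only on white space. */
static const char *demo_input = "";

enum token_type Get_Token(char *token)
{
    size_t n = 0;

    token[0] = '\0';
    while (*demo_input == ' ' || *demo_input == '\t')
        demo_input++;
    if (*demo_input == '\0')
        return END_OF_FILE;
    if (*demo_input == '\n') {
        demo_input++;
        return END_OF_LINE;
    }
    while (*demo_input != '\0' && !isspace((unsigned char)*demo_input))
        token[n++] = *demo_input++;
    token[n] = '\0';
    if (isalpha((unsigned char)token[0]))
        return ALPHA;
    return isdigit((unsigned char)token[0]) ? NUMERIC : OTHER;
}

/* Reads one line: stores its first token in command and returns the number
   of tokens that follow it on the line, or -1 if the line does not start
   with an ALPHA token (the rest of that line is left unread). */
int read_command(char *command)
{
    char token[MAX_TOKEN];            /* the caller owns the buffer */
    enum token_type type = Get_Token(command);
    int args = 0;

    if (type != ALPHA)
        return -1;
    while ((type = Get_Token(token)) != END_OF_LINE && type != END_OF_FILE)
        args++;
    return args;
}
```

With demo_input set to the two lines "ship widget 12" and "list", the first call to read_command stores ship and returns 2, the second stores list and returns 0, and a third call returns -1 because the input is exhausted.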

Reset_Get_Token

Sometimes, when an error is detected, it is desirable to skip the rest of the current line. Reset_Get_Token resets the scanner so that the next token read is the first token of the next line.
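
The following sketch shows one way such error recovery might look. The page does not give the prototype of Reset_Get_Token, so void Reset_Get_Token(void) is an assumption; check epspscan.h for the real declaration. Both scanner functions here are simplified stand-ins that read from a string so that the example compiles on its own, and read_order_line, demo_input, MAX_TOKEN and the line format (a name followed by a quantity) are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

enum token_type {ALPHA, NUMERIC, OTHER, END_OF_LINE, END_OF_FILE};

#define MAX_TOKEN 256                 /* assumed upper bound on token length */

static const char *demo_input = "";   /* stand-in for standard input */

/* Simplified stand-in: splits tokens only on white space. */
enum token_type Get_Token(char *token)
{
    size_t n = 0;

    token[0] = '\0';
    while (*demo_input == ' ' || *demo_input == '\t')
        demo_input++;
    if (*demo_input == '\0')
        return END_OF_FILE;
    if (*demo_input == '\n') {
        demo_input++;
        return END_OF_LINE;
    }
    while (*demo_input != '\0' && !isspace((unsigned char)*demo_input))
        token[n++] = *demo_input++;
    token[n] = '\0';
    if (isalpha((unsigned char)token[0]))
        return ALPHA;
    return isdigit((unsigned char)token[0]) ? NUMERIC : OTHER;
}

/* Stand-in with an assumed prototype: discards the rest of the line. */
void Reset_Get_Token(void)
{
    while (*demo_input != '\0' && *demo_input != '\n')
        demo_input++;
    if (*demo_input == '\n')
        demo_input++;
}

/* Expects a line of the form  <name> <quantity>.
   Returns 1 for a well-formed line, 0 for a malformed (or blank) line,
   and -1 when the input has ended. */
int read_order_line(char *name, long *quantity)
{
    char token[MAX_TOKEN];
    enum token_type type = Get_Token(name);

    if (type == END_OF_FILE)
        return -1;
    if (type == ALPHA) {
        type = Get_Token(token);
        if (type == NUMERIC) {
            *quantity = atol(token);
            type = Get_Token(token);
            if (type == END_OF_LINE || type == END_OF_FILE)
                return 1;
        }
    }
    /* Malformed line: skip what is left of it. Assumption: if the line
       end has already been read there is nothing left to skip. */
    if (type != END_OF_LINE && type != END_OF_FILE)
        Reset_Get_Token();
    return 0;
}
```

Given the three lines "bolt 12", "nut oops 3" and "washer 7", the second line is rejected at the token oops, Reset_Get_Token discards the trailing 3, and the next call starts cleanly at washer.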


Original implementation: Daniel M. German, Computer Systems Group, University of Waterloo
Date: May 26, 1999

Current maintenance: TRG