CyberArmy University | Open Source Institute | CyberArmy Intelligence & Security | CyberArmy Services & Projects

[Library Index]

[View category: Programming] [Discuss Article]

Design of Lexical Analyzers

Article Rating: Excellent (# of votes: 1)
Author:      Rae
Submitted:      28-Apr-2007 19:41:03
Imported From:      The CyberArmy University (original author: Rae)


Design of Lexical Analyzers
Generally the representation of the design model of any tool in computing is done with the help of a flowchart. The flowchart actually is nothing but a diagrammatic representation of an algorithm. To graphically represent lexical analyzers, we change our notation in flowcharts slightly, and the resulting chart is called a 'state transition diagram' or simply 'state diagram' or 'transition diagram'.

The design of the lexical analyzer is like this :-
  • Upon start, it moves to a state 0.
  • - If we encounter a letter, we move to state 1.
  • Now, if we encounter a letter or a digit, we remain in state 1.
  • If we encounter a delimiter, we move to state 2, the termination state.
The condition imposed in this design is that the first token must be a letter, and not a digit. Then only the analyzer moves to state 1, where the following characters can be letters or digits. The terminating character or delimiting character must not be a letter or a digit, but may be a special character like ;.

To make this lexical analyzer, we have to first define and code a primitive function GETCHAR, which returns the next character.

Program Pseudo Code for State 0 :-
C = GETCHAR()
IF C IS Letter
   THEN GOTO State 1
ELSE
   FAIL()
FAIL() is the error handling routine and 'C IS Letter' checks if variable C is a letter and if it is, returns true.

Program Pseudo Code for State 1 :-
C = GETCHAR()
IF C IS Letter OR Digit
   THEN GOTO State 1
ELSE 
   IF C IS Delimiter
      THEN GOTO State 2
ELSE
   FAIL()
State 2 indicates that an identifier has been found. As the delimiter is not a part of the identifier, we must retract the look ahead pointer one character. We can use '*' to indicate states on which input retraction must take place. If the identifier is not already in the symbol table, we install it there.

Program Pseudo Code for State 2 :-
RETRACT()
return(id,INSTALL())
The return function returns to the parser the pair of integer code for an identifier, and a pointer to the symbol table returned by the function INSTALL().

References :-
  • Elementry Discrete Mathematics by C L Liu
  • Theory of Computer Science by E V Krishnamurthy
  • Compilers : Construction and Design by Aho, Sethi and Ullman
  • System Programming by John Donovan
Written By Rae (3 February 2005)

Rae is a member of CAU Knowledge-Bank Tutorial Writers

This article was originally published by CyberArmy.net in the CyberArmy Library.

You must be logged in to vote on an article

About Us | Privacy Policy | Mission Statement | Help