30.11.12

Stream of night: Flex-Lexical Analyzer

A compiler is a program that translates source code into an executable file. The process between these two points consists of two phases: a translation into an internal representation that the computer can process, followed by a translation that produces code in machine language.
The first step is lexical analysis: it recognizes patterns (tokens) in a stream of characters. A program that performs lexical analysis is called a scanner. Flex is a free tool that generates scanners.
Flex takes a scanner description (a .lex file) and produces the C source code of a scanner (written to lex.yy.c by default), which gcc compiles into an executable file. This executable reads an input stream and produces a stream of tokens.
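To give an idea of what "producing a stream of tokens" means, here is a minimal hand-written scanner in C, roughly the kind of work flex generates for you (flex's real output is table-driven and much larger). The names next_token, struct token and the TOK_* kinds are hypothetical, chosen only for this sketch; the text and len fields play the role of flex's yytext and yyleng.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds, for illustration only */
enum token_kind { TOK_WORD, TOK_NUMBER, TOK_OTHER, TOK_EOF };

struct token {
    enum token_kind kind;
    char text[64];   /* the matched characters, like flex's yytext */
    size_t len;      /* length of text, like flex's yyleng */
};

/* Returns the token that starts at src[*pos] and advances *pos past it.
   Whitespace is skipped; a run of letters is one WORD, a run of digits
   is one NUMBER, any other single character is OTHER. */
struct token next_token(const char *src, size_t *pos)
{
    struct token t = { TOK_EOF, "", 0 };
    size_t start;

    while (isspace((unsigned char) src[*pos]))
        (*pos)++;
    if (src[*pos] == '\0')
        return t;                         /* end of input */

    start = *pos;
    if (isalpha((unsigned char) src[*pos])) {
        t.kind = TOK_WORD;
        while (isalpha((unsigned char) src[*pos]))
            (*pos)++;
    } else if (isdigit((unsigned char) src[*pos])) {
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char) src[*pos]))
            (*pos)++;
    } else {
        t.kind = TOK_OTHER;
        (*pos)++;
    }

    t.len = *pos - start;
    if (t.len > sizeof t.text - 1)        /* keep only what fits in text */
        t.len = sizeof t.text - 1;
    memcpy(t.text, src + start, t.len);
    t.text[t.len] = '\0';
    return t;
}
```

Calling next_token repeatedly on "abc 42+x" yields a WORD "abc", a NUMBER "42", an OTHER "+", a WORD "x" and then TOK_EOF. Flex saves you from writing these loops by hand: you only write the patterns.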

Structure of a .lex file

%{
/* C code copied as is: #include directives for external libraries, declarations */
%}
    /* Here the definitions of the patterns: a name followed by a regexp */
LETTER [a-zA-Z]
DIGIT [0-9]
EXIT "exit"
%%
    /* From here the rules: a regexp, then the C action taken when it matches */
{EXIT}         yyterminate();
%%
/* Here the user code, copied as is into the C file. Usually it contains
   functions defined by the user and routines called in the rules */
int main(void) {
    yylex();       /* calling this function is mandatory: it runs the scanner */
    return 0;
}
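With only the {EXIT} rule, every other character falls to flex's default rule, which copies it to the output; so this scanner echoes its input until the first "exit" and then stops. A rough C equivalent, as a sketch: scan_until_exit is a hypothetical name, and it works on a string instead of a stream to keep it short.

```c
#include <string.h>

/* Mimics the scanner above on a string: characters are echoed into out
   (the default rule) until "exit" is found, where scanning stops (yyterminate()).
   At most outsize-1 characters are written; outsize must be at least 1.
   Returns 1 if "exit" was found, 0 if the input simply ran out. */
int scan_until_exit(const char *in, char *out, size_t outsize)
{
    size_t o = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        /* {EXIT} matches 4 characters, the default rule only 1:
           the longest match wins, so "exit" is checked first */
        if (strncmp(in + i, "exit", 4) == 0) {
            out[o] = '\0';
            return 1;                      /* like yyterminate(): stop scanning */
        }
        if (o + 1 < outsize)
            out[o++] = in[i];              /* default rule: echo one character */
    }
    out[o] = '\0';
    return 0;
}
```

For example, scan_until_exit("hello exit world", buf, sizeof buf) leaves "hello " in buf and returns 1. Note that "exiting" also stops the scanner, because the pattern "exit" matches its first four characters.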

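How do you run it?
flex file.lex
gcc lex.yy.c -lfl
./a.out
Really simple. By default flex writes the generated scanner to lex.yy.c; -lfl links the flex library, which supplies yywrap() and can be dropped when the file contains %option noyywrap.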

This is a real example of a scanner that converts every lowercase character to uppercase and copies every other character unchanged.

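%{
#include <ctype.h>   /* toupper() */
#include <stdio.h>   /* putchar() */
%}
    /* noyywrap: the scanner terminates when it reads EOF, no yywrap() needed */
%option noyywrap
LETTER [a-zA-Z]
DIGIT [0-9]
SPACE " "
%%
{LETTER}+ {
        /* toupper() leaves characters that are not lowercase unchanged,
           so uppercase letters in the token are kept as they are */
        int i;
        for (i = 0; i < yyleng; i = i + 1)
            putchar(toupper((unsigned char) yytext[i]));
    }
{DIGIT}   ECHO;
{SPACE}   REJECT;  /* go on to the next-best rule: here the default rule, which echoes the space */
%%
int main(void) {
    yylex();
    return 0;
}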
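For comparison, the same filter can be written in plain C without flex; this is a sketch, and upcase_stream is a hypothetical name. Since every non-lowercase character ends up copied unchanged (by the {LETTER}+ action, by ECHO, or by the default rule), the whole scanner amounts to applying toupper() to each character.

```c
#include <ctype.h>
#include <stdio.h>

/* Copies in to out, turning lowercase letters into uppercase.
   Everything else (uppercase letters, digits, spaces, punctuation)
   is copied unchanged, as the scanner's ECHO and default rules do. */
void upcase_stream(FILE *in, FILE *out)
{
    int c;
    while ((c = getc(in)) != EOF)
        putc(toupper(c), out);   /* getc returns an unsigned char value, safe for toupper */
}
```

A main() calling upcase_stream(stdin, stdout) behaves like the generated scanner. For a transformation this simple a hand-written loop is arguably enough; flex pays off when the patterns become more complex.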

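yytext holds the text of the token just matched; yyleng is its length.
When more than one rule matches, the scanner chooses the rule that matches the longest input; if two rules match the same number of characters, the one listed first in the file wins. If no rule matches, the scanner applies the default rule, which copies the next character to yyout (by default stdout).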
The ECHO directive copies yytext to yyout; the REJECT directive makes the scanner go on to the next-best rule that matched the same input (or a prefix of it).
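The longest-match policy, together with flex's tie rule (when two rules match the same number of characters, the one listed first in the file wins), can be sketched in C. This is a simplification: flex compiles all patterns into a single automaton instead of trying rules one by one, but the choice it makes is the same. The three matcher functions are hypothetical stand-ins for the patterns {EXIT}, {LETTER}+ and {DIGIT}, listed in that order.

```c
#include <string.h>

/* Each rule returns how many characters it matches at s (0 = no match). */
typedef size_t (*matcher)(const char *s);

static size_t match_exit(const char *s)      /* "exit" */
{
    return strncmp(s, "exit", 4) == 0 ? 4 : 0;
}

static size_t match_letters(const char *s)   /* [a-zA-Z]+ */
{
    size_t n = 0;
    while ((s[n] >= 'a' && s[n] <= 'z') || (s[n] >= 'A' && s[n] <= 'Z'))
        n++;
    return n;
}

static size_t match_digit(const char *s)     /* [0-9] */
{
    return (s[0] >= '0' && s[0] <= '9') ? 1 : 0;
}

/* Rules in the order they appear in the .lex file */
static const matcher rules[] = { match_exit, match_letters, match_digit };
#define NRULES (sizeof rules / sizeof rules[0])

/* Chooses the rule for the (non-empty) input at s: the longest match wins,
   and the strict '>' keeps the earlier rule on a tie.
   Returns the rule index, or -1 for the default rule (echo one character).
   *len receives the number of characters consumed. */
int pick_rule(const char *s, size_t *len)
{
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < NRULES; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {
            best = (int) i;
            best_len = n;
        }
    }
    *len = (best == -1) ? 1 : best_len;   /* default rule consumes one character */
    return best;
}
```

With these rules, "exit" goes to rule 0 (a tie with the letters rule, broken by order), while "exiting" goes to rule 1, because seven letters beat the four characters of "exit".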
That's all for now!
