A compiler is a program that translates source code into an executable file. The process between these two points consists of two phases: a translation into an internal representation that the computer can process, and then a translation that produces machine code.
The first step of the first phase is lexical analysis: it recognizes patterns in a stream of characters. A program that performs lexical analysis is called a scanner. Flex is a free scanner generator.
Flex takes a scanner description (a .lex file) and produces the C source code of a scanner (lex.yy.c by default) which, compiled with gcc, yields the executable file. This executable reads an input stream and produces a stream of tokens.
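To make "a stream of tokens" concrete, here is a minimal hand-written sketch in C of the kind of work a generated scanner does. It is not the code Flex emits; the names Token, TokenKind and next_token are hypothetical and chosen only for this illustration, and the patterns (runs of letters, runs of digits, anything else) are an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds, for illustration only. */
typedef enum { TOK_WORD, TOK_NUMBER, TOK_OTHER, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* first character of the token inside the input */
    size_t length;     /* number of characters in the token */
} Token;

/* Reads one token starting at *cursor and advances *cursor past it. */
Token next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);

    const char *p = *cursor;
    Token tok = { TOK_EOF, p, 0 };

    if (*p == '\0')
        return tok; /* end of input: nothing consumed */

    if (isalpha((unsigned char)*p)) {
        tok.kind = TOK_WORD;           /* a run of letters */
        while (isalpha((unsigned char)*p))
            p++;
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;         /* a run of digits */
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        tok.kind = TOK_OTHER;          /* any single character no pattern covers */
        p++;
    }

    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

Calling next_token repeatedly on the same cursor until it returns TOK_EOF walks the whole input, one token at a time. Flex generates this kind of loop from the patterns, so it does not have to be written by hand.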
Structure of a .lex file
%{
//C code for external libraries
%}
/* Here go the definitions of the patterns */
LETTER [a-zA-Z]
DIGIT [0-9]
EXIT "exit"
%%
 /* From here on, the rules: a regexp followed by the C action to run when it matches */
{EXIT} yyterminate();
%%
//Here the user code, copied as is into the generated C file. It usually contains user-defined functions and routines called from the rules
int main(void) {
yylex(); //calling this function is mandatory!
return 0;
}
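How to run it?
flex file.lex
gcc lex.yy.c -o scanner -lfl
Really simple. (-lfl links the flex library, which supplies a default yywrap(); it is not needed when the scanner declares %option noyywrap.)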
Here is a complete example: a scanner that converts every lowercase letter to uppercase.
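%{
#include <ctype.h> /* toupper() */
%}
/* noyywrap: the scanner terminates at the first EOF instead of calling yywrap() */
%option noyywrap
LETTER [a-zA-Z]
DIGIT [0-9]
SPACE [ ]
%%
{LETTER}+ {
    /* print every letter of the token in uppercase */
    int i;
    for (i = 0; i < yyleng; i++)
        putchar(toupper((unsigned char) yytext[i]));
}
{DIGIT} ECHO;
{SPACE} REJECT; /* falls through to the default rule, which echoes the space */
%%
int main(void) {
    yylex();
    return 0;
}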
yytext is the text of the token just matched. yyleng is the length of yytext.
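When several rules match, the scanner picks the one that matches the longest text; if two rules match the same length, the one listed first wins. If no rule matches, the scanner applies the default rule, which copies the unmatched character to yyout (stdout by default).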
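The selection policy can be sketched in plain C. This is only an illustration of the longest-match policy, not how Flex implements it (Flex compiles all patterns into a single automaton). The two rules, the literal "exit" and a run of letters, and the names match_exit, match_word and pick_rule are assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rule 0: the literal keyword "exit". Returns the matched length or 0. */
static size_t match_exit(const char *s)
{
    return strncmp(s, "exit", 4) == 0 ? 4 : 0;
}

/* Rule 1: one or more letters, like [a-zA-Z]+. Returns the matched length or 0. */
static size_t match_word(const char *s)
{
    size_t n = 0;
    while (isalpha((unsigned char)s[n]))
        n++;
    return n;
}

/*
 * Picks the rule with the longest match; on a tie the earlier rule wins.
 * Returns the rule index, or -1 when no rule matches, in which case the
 * default rule would consume a single character.
 * *length receives the number of characters consumed.
 */
int pick_rule(const char *s, size_t *length)
{
    size_t (*rules[])(const char *) = { match_exit, match_word };
    int best = -1;
    size_t best_len = 0;

    assert(s != NULL && length != NULL);
    for (int i = 0; i < 2; i++) {
        size_t n = rules[i](s);
        if (n > best_len) { /* strict ">" keeps the earlier rule on ties */
            best = i;
            best_len = n;
        }
    }
    if (best >= 0)
        *length = best_len;
    else
        *length = (*s != '\0') ? 1 : 0; /* default rule: one character */
    return best;
}
```

With these two rules, "exit" selects rule 0 (both rules match four characters, and the first one listed wins), "exits" selects rule 1 (five characters beat four), and "42" selects no rule, so the default rule takes over.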
The ECHO directive copies yytext to yyout; the REJECT directive makes the scanner fall through to the next best rule that matches the input (or a prefix of it).
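The following C sketch imitates what happens when the four literal rules a, ab, abc and abcd all run "ECHO; REJECT;" and a final catch-all rule silently consumes one character. It is a simulation under those assumptions, not Flex's real mechanism, and the name echo_reject_demo is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Literal patterns standing in for the rules a, ab, abc, abcd. */
static const char *const patterns[] = { "a", "ab", "abc", "abcd" };
enum { PATTERN_COUNT = 4 };

/*
 * Emulates "ECHO; REJECT;" on every literal rule, followed by a catch-all
 * rule that consumes one character and prints nothing. Stores what ECHO
 * would print in out (capacity out_size) and returns the number of
 * characters stored, not counting the terminating '\0'.
 */
size_t echo_reject_demo(const char *input, char *out, size_t out_size)
{
    size_t written = 0;

    assert(input != NULL && out != NULL && out_size > 0);
    while (*input != '\0') {
        /* Try candidates from longest to shortest: each REJECT falls
           through to the next best match at the same input position. */
        for (int i = PATTERN_COUNT - 1; i >= 0; i--) {
            size_t len = strlen(patterns[i]);
            if (strncmp(input, patterns[i], len) != 0)
                continue;                      /* this rule does not match here */
            if (written + len < out_size) {    /* ECHO: copy the matched text */
                memcpy(out + written, patterns[i], len);
                written += len;
            }
            /* REJECT: consume nothing and keep looking */
        }
        input++; /* catch-all rule: consume one character, print nothing */
    }
    out[written] = '\0';
    return written;
}
```

For the input "abcd" the buffer ends up holding "abcdabcaba": every rule gets its turn on the same text, longest match first, before the catch-all rule finally advances the input.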
That's all for now!