Hacking it Up with Lex, Yacc and Apache
Posted by feydr | Posted in Uncategorized | Posted on 05-03-2010
So… I got drunk a week or so ago and decided I’d start my own programming language. This is not the first time I have done this: back 7-8 years ago when I was in school I did the exact same thing. I program in ruby and java almost every day and love the speed of java but hate its syntax. On the other hand I love ruby but hate its performance.

I’ve also been in the process of doing some serious benchmarking lately against webservers and other things while managing to piss people off when I start quoting scripture from the mountaintop regarding performance — which is the one thing I pay most attention to on whatever project I am working on. “Fast enough” is not faster or fastest.
This led me to examine what exactly I’m trying to do on various web projects and what I need to do to get it done. Primarily I want to serve up an application via the web to the world. To this end I want an environment that is either ridiculously fast at compiling/loading classes or I want an interpreter. Furthermore this environment should be able to serve up web pages. My search started with re-analyzing the common types of web servers.
Types of WebServers
- CGI
- FastCGI
- Native Code
- Your Own WebServer
Let’s review them real quick:
CGI
CGI is probably the first thing people turn to when they want to serve up a script written in some language but do NOT want to write the webserver themselves, since the grammar of HTTP is not the prettiest thing (from what I’ve been told).
Your typical cgi script is EXTREMELY easy to write.
Here’s an example:
First edit your /etc/apache2/apache2.conf like so:
ScriptAlias /cgi-bin/ /www/cgi-bin/
Now just put this simple two-line ruby script into that directory as test.rb …
#!/usr/bin/ruby
puts "Content-type: text/html\n\n"
puts "test"
… and change the permissions.
sudo chmod -R 755 /www/cgi-bin/
sudo chown -R www-data:www-data /www/cgi-bin/
sudo /etc/init.d/apache2 restart
Now you can navigate to your url at http://127.0.0.1/cgi-bin/test.rb and see a webpage!
Of course, it’s not very fast ….
ab -c 1 -n 100 http://127.0.0.1/cgi-bin/test.rb
Requests per second: 150.45 [#/sec] (mean)

ab -c 10 -n 100 http://127.0.0.1/cgi-bin/test.rb
Requests per second: 435.64 [#/sec] (mean)

ab -c 100 -n 100 http://127.0.0.1/cgi-bin/test.rb
Requests per second: 421.97 [#/sec] (mean)
For this reason a lot of people do not use straight up CGI in production environments anymore.
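Nothing about CGI is ruby-specific: any program that prints a Content-type header, a blank line and then a body will do. Here is a rough sketch of the same response in C. The helper name format_cgi_response is made up for this example, and it builds the response in a buffer only so that it is easy to check; a real CGI program would simply write it to stdout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of any library): writes a complete CGI
 * response for `body` into `buf`, which has room for `cap` bytes.
 * Returns the number of bytes written, or -1 if the buffer is too small. */
int format_cgi_response(char *buf, size_t cap, const char *body)
{
    assert(buf != NULL && body != NULL);

    /* Header line, then an empty line, then the body. */
    int n = snprintf(buf, cap, "Content-type: text/html\n\n%s\n", body);
    if (n < 0 || (size_t)n >= cap) {
        return -1;
    }
    return n;
}
```

A CGI binary built around this would call the helper, fputs the buffer to stdout and exit. The server still starts a fresh process for every request, which is the cost the numbers above are measuring.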
FastCGI
FastCGI is ‘faster’ than CGI since it loads the interpreter once and therefore does not have to keep loading it up every time a page is requested. In reality it can be just as slow, though, if you have to load a ton of different components, as with notable ruby frameworks like merb and rails. Other than this there really is not much difference between CGI and FastCGI, so it’s out of our game as well.
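To make the difference concrete, here is a toy model in C. It is not real FastCGI code: start_interpreter and serve_page are empty stand-ins made up for this sketch, and the only thing it shows is how many times the expensive startup runs under each model.

```c
#include <assert.h>

/* Counts how often the (pretend) interpreter is started. */
static int startup_count = 0;

/* Stand-in for the expensive part: booting an interpreter or framework. */
static void start_interpreter(void) { startup_count++; }

/* Stand-in for the cheap part: producing one page. */
static void serve_page(void) { }

/* CGI model: every request pays the startup cost again.
 * Returns the number of startups that happened. */
int serve_cgi_style(int requests)
{
    assert(requests >= 0);
    startup_count = 0;
    for (int i = 0; i < requests; i++) {
        start_interpreter();
        serve_page();
    }
    return startup_count;
}

/* FastCGI model: start once, then serve every request from that process.
 * Returns the number of startups that happened. */
int serve_fastcgi_style(int requests)
{
    assert(requests >= 0);
    startup_count = 0;
    start_interpreter();
    for (int i = 0; i < requests; i++) {
        serve_page();
    }
    return startup_count;
}
```

For 100 requests the first function starts the interpreter 100 times and the second starts it once. If that one start has to pull in a whole framework, the saving only shows up after the first request.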
Native Code
This is what we are going to be looking at. Native code can take your requested webfile, parse it and return the contents immediately. This is the fastest way outside of creating your own webserver and having to consult Beej’s network programming guide (cause I’m assuming you are writing your language in something fast like C, right??? Right!??). Typically when people choose this option they are looking at something like phusion passenger, which ships its own nginx or apache module. This is absurdly fast.
Your Own WebServer
There are very good reasons for not rolling your own web server. There are quite a few edge cases in HTTP you need to support and various browsers have ‘weird’ ways of handling things. This is not to even mention that you are going to leave out quite a few people who would rather use apache or nginx. However, ruby’s webrick would be an example of ‘rolling your own’.
So let’s take a second look at writing an apache module (cause let’s face it — nginx is cool but apache is adopted everywhere else).
Compiling your Module
You’ll want to grab modules/experimental/mod_example.c from an apache 2.2 source tree. You obviously don’t need the majority of the crap in it. I’ve included the most important part here:
/* This example just takes a pointer to the request record as its only
 * argument */
static int webit_handler(request_rec *r)
{
    /* We decline to handle a request if webit-handler is not the value
     * of r->handler */
    if (strcmp(r->handler, "webit-handler")) {
        return DECLINED;
    }

    /* The following line just prints a message to the errorlog */
    ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_NOTICE, 0, r->server,
                 "mod_webit: %s", "Loaded to kick ass!");

    /* We set the content type before doing anything else */
    ap_set_content_type(r, "text/html");

    /* If the request is for a header only, and not a request for
     * the whole content, then return OK now. We don't have to do
     * anything else. */
    if (r->header_only) {
        return OK;
    }

    /* Now we just print the contents of the document using the
     * ap_rputs and ap_rprintf functions. More information about
     * the use of these can be found in http_protocol.h */
    ap_rputs("<HTML>\n", r);
    ap_rputs("Hello world\n", r);
    ap_rprintf(r, "%s\n", parseThatShit(r->filename));
    ap_rputs("</HTML>\n", r);

    /* We can either return OK or DECLINED at this point. If we return
     * OK, then no other modules will attempt to process this request */
    return OK;
}
Your handler is what you want to focus in on. It receives a request and at that
point you can do whatever the hell you want with it. I have chosen here to take
whatever file the user gave to me and send it off to parseThatShit, which returns
a char *.
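The real parseThatShit is not shown here, so treat the following as a hypothetical stand-in with the same shape: it takes a filename and hands back a char *. This sketch only reads the file into memory (the name read_source_file is invented); an actual implementation would run the lexer and parser over the contents instead of returning them untouched.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parser entry point: reads the whole file at
 * `path` into a NUL-terminated heap buffer. Returns NULL on failure; the
 * caller must free() the result. */
char *read_source_file(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return NULL;
    }

    /* Find the file size by seeking to the end, then rewind. */
    char *buf = NULL;
    long size = -1;
    if (fseek(fp, 0L, SEEK_END) == 0 && (size = ftell(fp)) >= 0 &&
        fseek(fp, 0L, SEEK_SET) == 0) {
        buf = malloc((size_t)size + 1);
    }

    /* Read the contents and terminate the string. */
    if (buf != NULL) {
        size_t got = fread(buf, 1, (size_t)size, fp);
        buf[got] = '\0';
    }

    fclose(fp);
    return buf;
}
```

Inside a real module you would probably allocate from the request pool (apr_palloc on r->pool) instead of malloc so that apache cleans up after you. That part is left out to keep the sketch free of apache headers.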
You can compile it with the following:
(I have chosen c99 for better or for worse — I’ll leave your discriminatory comments up to
the internet to judge.)
sudo apxs2 -c -i -Wc,-std=c99 -a mod_webit.c
sudo /etc/init.d/apache2 restart
Let’s set up apache to accept this wonderfulness:
(make sure your module was installed in the same path as mine)
LoadModule webit_module /usr/lib/apache2/modules/mod_webit.so
Listen 82
<VirtualHost 127.0.0.1:82>
    SetHandler webit-handler
    ServerName 127.0.0.1
    DocumentRoot /home/feydr/random-hacking/webit/www/views
</VirtualHost>
Don’t forget that the log files are your friend. If nothing happens, check the logs! It’s probably because you didn’t compile with debugging support and you haven’t spent enough time in gdb land. Let’s go segfaults! Duhm, duh duh duhm!
tail -f /var/log/apache2/error.log
*** stack smashing detected ***: /usr/sbin/apache2 terminated
[Wed Mar 03 17:13:35 2010] [notice] child pid 318 exit signal Segmentation fault (11)
[Wed Mar 03 17:13:42 2010] [notice] mod_webit: Loaded to kick ass!
*** stack smashing detected ***: /usr/sbin/apache2 terminated
[Wed Mar 03 17:38:50 2010] [notice] mod_webit: Loaded to kick ass!
[Wed Mar 03 17:38:50 2010] [notice] mod_webit: Loaded to kick ass!
Lex and Yacc
Now to the meat of this article.

Lex lets you define all your tokens. Yacc lets you describe how those tokens combine into grammar rules and attach a chunk of C code to run whenever a rule matches.
Here’s a simple lex file:
%{
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strdup */
#include "y.tab.h"
%}
%%
var                     return TOKVAR;
puts                    return TOKPUTS;
\"                      return QUOTES;
times                   return TIMES;
do                      return DO;
end                     return END;
"."                     return PERIOD;
[-+()/*\n]              return *yytext;
[0-9]+                  yylval=atoi(yytext); return NUMBER;
[a-zA-Z]+               yylval=strdup(yytext); return WORD;
=                       return EQUALS;
[ \t]+                  /* ignore whitespace */;
%%
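If you are wondering what lex generates from rules like these, it is roughly a function that chews through the input and returns one token at a time. What follows is a hand-written approximation in C covering only a few of the rules (var, puts, numbers, words and =). The enum values and the next_token name are invented for the sketch; the real scanner is yylex and gets its token numbers from y.tab.h.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Token kinds named after tokens in the lex file; the numeric values here
 * are arbitrary (yacc assigns the real ones in y.tab.h). */
enum token { T_EOF, T_VAR, T_PUTS, T_NUMBER, T_WORD, T_EQUALS, T_OTHER };

/* Scans one token starting at *src, copies its text into `text` (capacity
 * `cap`), advances *src past it and returns the token kind. Tokens longer
 * than the buffer are split, which is good enough for a sketch. */
enum token next_token(const char **src, char *text, size_t cap)
{
    assert(src != NULL && *src != NULL && text != NULL && cap > 1);
    const char *p = *src;

    while (*p == ' ' || *p == '\t') {                 /* [ \t]+ is skipped */
        p++;
    }

    size_t len = 0;
    enum token kind;
    if (*p == '\0') {
        kind = T_EOF;
    } else if (isdigit((unsigned char)*p)) {          /* [0-9]+ */
        while (isdigit((unsigned char)*p) && len + 1 < cap) {
            text[len++] = *p++;
        }
        kind = T_NUMBER;
    } else if (isalpha((unsigned char)*p)) {          /* [a-zA-Z]+ */
        while (isalpha((unsigned char)*p) && len + 1 < cap) {
            text[len++] = *p++;
        }
        kind = T_WORD;
    } else {                                          /* any single character */
        text[len++] = *p;
        kind = (*p == '=') ? T_EQUALS : T_OTHER;
        p++;
    }
    text[len] = '\0';

    /* A word that is exactly a keyword becomes that keyword, mirroring the
     * lex file, where the keyword patterns come before the WORD pattern. */
    if (kind == T_WORD && strcmp(text, "var") == 0) {
        kind = T_VAR;
    } else if (kind == T_WORD && strcmp(text, "puts") == 0) {
        kind = T_PUTS;
    }

    *src = p;
    return kind;
}
```

Feeding it "var x = 42" yields VAR, WORD, EQUALS, NUMBER and then EOF, which is the stream of tokens a var_assign style rule in the grammar would expect to see.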
Our yacc file is not so simple but it should be a bit more readable. Let’s take a look at some of it:
Our main:
int main(int argc, char *argv[])
{
    if (argc > 1) {
        parseFile(argv[1]);
    } else {
        repl = 0;
        printf("Webit Version %s", VERSION);
        putPrompt();
        /* yyparse is what actually will start parsing your file */
        yyparse();
    }
    return 0;
}

/* ........ */

/* here we see that we have various commands that we can do */
commands: /* empty */
        | commands command
        ;

command:
        var_assign
        | statement
        | string_assign
        | do_loop
        | puts_var
        ;

/* ..... */

/* variable assignment */
var_assign:
        TOKVAR WORD EQUALS WORD '\n'
        {
                int x = firstEmpty();
                /* debug mode only
                printf("assigned %s to %s\n", $4, $2);
                */
                char *name = strdup($2);
                char *value = strdup($4);
                putPrompt();
        }
        ;
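The var_assign action calls firstEmpty() and copies the name and value, but the storage behind it is not shown. One plausible way to back it, purely as an assumption, is a small fixed-size table of name/value pairs. The names first_empty, assign_var, lookup_var and MAX_VARS are hypothetical and not taken from the webit source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_VARS 64

/* One slot per variable; a NULL name marks an empty slot. */
struct variable {
    char *name;
    char *value;
};

static struct variable vars[MAX_VARS];

/* Copies a string onto the heap (same job as POSIX strdup, in plain C99). */
static char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL) {
        strcpy(copy, s);
    }
    return copy;
}

/* Returns the index of the slot holding `name`, or -1 if there is none. */
static int find_slot(const char *name)
{
    for (int i = 0; i < MAX_VARS; i++) {
        if (vars[i].name != NULL && strcmp(vars[i].name, name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Returns the index of the first unused slot, or -1 if the table is full. */
int first_empty(void)
{
    for (int i = 0; i < MAX_VARS; i++) {
        if (vars[i].name == NULL) {
            return i;
        }
    }
    return -1;
}

/* Returns the stored value for `name`, or NULL if it was never assigned. */
const char *lookup_var(const char *name)
{
    assert(name != NULL);
    int slot = find_slot(name);
    return slot < 0 ? NULL : vars[slot].value;
}

/* Stores name = value, overwriting an earlier assignment to the same name.
 * Returns the slot used, or -1 if the table is full or memory ran out. */
int assign_var(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);
    int slot = find_slot(name);
    if (slot < 0) {
        slot = first_empty();
    }
    if (slot < 0) {
        return -1;
    }

    char *new_value = copy_string(value);
    if (new_value == NULL) {
        return -1;
    }
    if (vars[slot].name == NULL) {
        /* Fresh slot: it needs its own copy of the name. */
        vars[slot].name = copy_string(name);
        if (vars[slot].name == NULL) {
            free(new_value);
            return -1;
        }
    } else {
        /* Reassignment: drop the old value. */
        free(vars[slot].value);
    }
    vars[slot].value = new_value;
    return slot;
}
```

With something like this in place, the grammar action would boil down to a single assign_var($2, $4) call, and a puts_var rule could print whatever lookup_var returns. A linear scan is fine for a toy language; a hash table would be the obvious upgrade once scripts grow.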
Our Makefile:
all:
	lex webit.l
	yacc -d webit.y
	cc lex.yy.c y.tab.c -std=c99 -o blah

clean:
	rm -rf y.tab.h blah lex.yy.c y.tab.c
Really, making your own programming language, one that works the way you want to program and still performs, is easy. Maybe you shouldn’t rush to use it on production servers, but shit, it sure beats the hell out of putting up with crappy software in an age where programming is considered to be just including other people’s crappy software.

