The Mighty Awesome Power of Scala

Posted by feydr | Posted in Uncategorized | Posted on 21-03-2010


Ruby FanBoy

Over at work we (I) have been getting pretty fed up with ruby lately. There is way too much fanboyism and not enough code. So I've been looking at doing things in different frameworks, which of course always leads to me trying out new languages that I might have only looked at a couple of times before. Two things I'd look for in a web framework to make my life easier are:

1) class reloading
2) dynamic typing OR a sweet ass type inference engine

I cannot stress these two points enough. I do NOT want to have to recompile/reload my classes every time I make a simple edit to the code; having that with ruby/merb has made a tremendous difference in how fast we can flesh out features. I also HATE having to declare types for my methods, variables, etc. I think it's a waste of time, and good code should not have to do that (especially if you are using type inference like all good languages do).

So eventually I came across Lift, which I decided sucked right off the bat since it likes maven/xml. Seriously, xml files were NOT meant to be configuration files. goto hell!

However, on another front, my colleague and I had already been looking at languages that run on the JVM for another project for the same website. We've been looking into scalability concerns with big data and have both concluded that we want/need the power of the JVM but really don't like java all that much. This has led us to look at the two up-and-comers: Scala and Clojure. Recently I got ahold of the pragprog book and started flipping through its pages. One section in particular caught my eye. In a matter of 5-10 lines or so, the author was able to open up a file, scrape some xml from the web, parse it and then save it. I was like: WHAT THE FUCK!? In java the exact same code would have had to go through the verbosity-monster-making machine and would easily clock in at 20-30 lines.


So I kept reading the book! Another thing immediately jumped out at me: where are all your fucking semicolons!? Oh, those? We don't need no fucking semicolons in our language!

Motherfucking <3

This was enough for me to seriously start thinking about rewriting our main java project. This particular project has 10 (count them, ten) antlr grammars and quite a few supporting classes, along with CLI wrappers for said classes. Thankfully we have test suites for each class; otherwise a rewrite like this would probably not be possible without seriously killing the project.

I started by converting one of our main classes over: replace the for loops, drop the semicolons, take out the public modifiers, etc. Let's try to compile. Oh noes! The java code that uses this class doesn't have access to these methods anymore. This is where javap comes in handy when you are converting from java to scala: it shows you the REAL class and method names that end up in the compiled class file. Keep in mind my original intention was just to get a scala class to compile and use it FROM java. Now I'm en route to using everything from scala, but it really doesn't matter: as long as you have class files, and those class files have access to the right methods/variables, you are set.

Let’s say we have a seat class written in scala:

package com.bluffware;
 
class Seat() {
  var id:Int = 0
  var number:Int = 0
  var seat_id:Int = 0
  var sitout:String = ""
  var position:String = ""
  var button:String = ""
  var player:String = ""
  var amount:String = ""
 
  // might want to add utg, cutoff later on..
}

You might think that from java you can do a simple:

Seat seat = new Seat();
seat.number = 2;
myContainer.seats += seat;

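Well, you can't, because javap shows that there is NO public field called number: scalac turns each var into a private field plus a getter, number(), and a setter, number_$eq(int).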

If you are accessing this class from java you’ll need to do the following:

Seat seat = new Seat();
seat.number_\$eq(2);
myContainer.seats().\$plus\$eq(seat);

The backslash escapes are only there because this code lives inside an antlr grammar action; in plain java it'd just be seat.number_$eq(2).
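Names like number_$eq and $plus$eq come from the way scalac rewrites operator characters in an identifier into $-words the JVM will accept (the table lives in scala.reflect.NameTransformer). As a rough aid for predicting what javap will print, here is a small C sketch of that mapping. scala_encode() is a name made up for this example, and it only covers the operator table, not the escapes scalac uses for other characters.

```c
#include <stddef.h>
#include <string.h>

/* Operator characters and the words scalac substitutes for them. */
static const struct { char op; const char *word; } scala_ops[] = {
    {'~', "$tilde"}, {'=', "$eq"},    {'<', "$less"},    {'>', "$greater"},
    {'!', "$bang"},  {'#', "$hash"},  {'%', "$percent"}, {'^', "$up"},
    {'&', "$amp"},   {'|', "$bar"},   {'*', "$times"},   {'/', "$div"},
    {'+', "$plus"},  {'-', "$minus"}, {':', "$colon"},   {'\\', "$bslash"},
    {'?', "$qmark"}, {'@', "$at"},
};

/* Writes the JVM-level name of a Scala identifier into out (capacity len).
   Letters, digits and '_' are copied unchanged; operator characters are
   replaced by their $-word. Returns 0 on success, -1 if out is too small. */
int scala_encode(const char *name, char *out, size_t len)
{
    size_t used = 0;

    for (const char *c = name; *c != '\0'; c++) {
        char single[2] = { *c, '\0' };
        const char *piece = single;   /* default: copy the character as-is */

        for (size_t i = 0; i < sizeof scala_ops / sizeof scala_ops[0]; i++)
            if (scala_ops[i].op == *c)
                piece = scala_ops[i].word;

        size_t n = strlen(piece);
        if (used + n + 1 > len)       /* keep room for the terminating NUL */
            return -1;
        memcpy(out + used, piece, n);
        used += n;
    }
    if (used + 1 > len)
        return -1;
    out[used] = '\0';
    return 0;
}
```

So the setter Scala generates for var number, number_=, comes out as number_$eq, and the collection method += comes out as $plus$eq, which matches the calls in the java snippet.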

Really though, you only have to do this when a java class calls into your scala
class and has to stay java. If you can convert that java class to scala too,
go ahead and do so.

Depending on the size and scope of your project, it might make sense to convert
piecemeal: move a class or two to scala, then change the other classes that access
those scala classes so they use the generated accessors.

In our project, the only thing that still does this is our antlr grammars, and
at some point I'm going to finally finish the scala antlr target.

Anyways, if you have not tried out scala yet and you are a java dev — what are you
waiting for!? It’s easier than you think — make sure you have some tests for a small
class you’d like to convert then start doing it!

Hacking it Up with Lex, Yacc and Apache

Posted by feydr | Posted in Uncategorized | Posted on 05-03-2010


So…. I got drunk a week or so ago and decided I'd start my own programming language. This is not the first time I have done this: back 7-8 years ago, when I was in school, I did the exact same thing. I program in ruby and java almost every day and love the speed of java but hate its syntax. On the other hand, I love ruby but hate its performance.

damned if you do

I’ve also been in the process of doing some serious benchmarking lately against webservers and other things while managing to piss people off when I start quoting scripture from the mountaintop regarding performance — which is the one thing I pay most attention to on whatever project I am working on. “Fast enough” is not faster or fastest.

This led me to examine what exactly I'm trying to do on various web projects and what I need to get it done. Primarily, I want to serve up an application to the world via the web. To this end I want an environment that either compiles/loads classes ridiculously fast or is an interpreter. Furthermore, this environment should be able to serve up web pages. I started by re-analyzing the common types of web servers.

Types of WebServers

  • CGI
  • FastCGI
  • Native Code
  • Your Own WebServer

Let’s review them real quick:

CGI
CGI is probably the first thing people turned to when they wanted to serve up a script written in some language but did NOT want to write the webserver themselves, since the grammar of HTTP is not the prettiest thing (from what I've been told).

Your typical cgi script is EXTREMELY easy to do.
Here’s an example:

First edit your /etc/apache2/apache2.conf like so:

ScriptAlias /cgi-bin/ /www/cgi-bin/

Now just save this simple two-line ruby script as test.rb in
that directory …

#!/usr/bin/ruby
 
puts "Content-type: text/html\n\n"
puts "test"

… and change the permissions.

sudo chmod -R 755 /www/cgi-bin/
sudo chown -R www-data:www-data /www/cgi-bin/
sudo /etc/init.d/apache2 restart

Now you can navigate to your url at http://127.0.0.1/cgi-bin/test.rb and see a webpage!
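A CGI program doesn't have to be a script; anything that prints a header, a blank line and then the body to stdout will do. As a sketch of the same response from a compiled program, which at least skips starting the ruby interpreter, here is a C helper that formats it. format_cgi_response() is a name made up for this example.

```c
#include <stdio.h>

/* Formats a minimal CGI response into buf: the Content-type header, the
   blank line that ends the headers, then the body. Returns what snprintf
   returns: the full length needed, so a value >= len means truncation. */
int format_cgi_response(char *buf, size_t len, const char *content_type,
                        const char *body)
{
    return snprintf(buf, len, "Content-type: %s\n\n%s\n", content_type, body);
}
```

To use it, call it from a main() that writes the buffer to stdout with fputs, compile it into /www/cgi-bin/ and set the same permissions as above. Apache still starts a fresh process for every request, though, so the per-request process cost of CGI stays.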

Of course, it’s not very fast ….

ab -c 1 -n 100 http://127.0.0.1/cgi-bin/test.rb
Requests per second:    150.45 [#/sec] (mean)
 
ab -c 10 -n 100 http://127.0.0.1/cgi-bin/test.rb
Requests per second:    435.64 [#/sec] (mean)
 
ab -c 100 -n 100 http://127.0.0.1/cgi-bin/test.rb
Requests per second:    421.97 [#/sec] (mean)

For this reason a lot of people do not use straight up CGI in production environments anymore.
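ab keeps -c requests in flight at once, so a rough way to turn "Requests per second" into the time a single client waits is latency ≈ c × 1000 / rps milliseconds (ab prints the same figure as "Time per request"). A tiny C helper for that conversion, with mean_latency_ms() being our own name:

```c
/* Converts ab's "Requests per second" into the mean time one request
   takes, in milliseconds, when `concurrency` requests are always in
   flight: each of them occupies about concurrency / rps seconds. */
double mean_latency_ms(double requests_per_sec, int concurrency)
{
    return concurrency * 1000.0 / requests_per_sec;
}
```

Plugging in the numbers above gives roughly 6.6 ms at -c 1, 23 ms at -c 10 and 237 ms at -c 100: more concurrency bought more throughput up to a point, but each request got much slower.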

FastCGI

FastCGI is ‘faster’ than CGI since it loads the interpreter once and therefore does not have to load it again every time a page is requested. In reality, though, it can be just as slow if you have to load a ton of different components, as with the notable ruby frameworks merb and rails. Other than this there really is not much difference between CGI and FastCGI, so it's out of our game as well.

Native Code
This is what we are going to be looking at. Native code can take your requested webfile, parse it and return the contents immediately. This is the fastest way outside of creating your own webserver and having to consult Beej (cause I'm assuming you are writing your language in something fast like C, right??? Right!??). Typically when people choose this option they are looking at things like phusion passenger, where they make their own nginx or apache module. This is stupid fast.

Your Own WebServer
There are very good reasons for not rolling your own web server. There are quite a few edge cases in HTTP you need to support and various browsers have ‘weird’ ways of handling things. This is not to even mention that you are going to leave out quite a few people who would rather use apache or nginx. However, ruby’s webrick would be an example of ‘rolling your own’.

So let’s take a second look at writing an apache module (cause let’s face it — nginx is cool but apache is adopted everywhere else).

Compiling your Module

You’ll want to grab modules/experimental/mod_example.c from an apache 2.2 directory. You obviously don’t need the majority of the crap in it. I’ve included the most important part here:

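#include <string.h>
#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "http_log.h"
#include "ap_config.h"

/* Assumed prototype: webit's parser lives elsewhere and returns the
 * generated HTML for the given file. */
char *parseThatShit(char *filename);

/* This example just takes a pointer to the request record as its only 
* argument */
static int webit_handler(request_rec *r)
{
        char *output;
 
        /* We decline to handle a request if webit-handler is not the value 
         * of r->handler (which may be NULL, so check that first) */
        if (!r->handler || strcmp(r->handler, "webit-handler")) {
                return DECLINED;
        }
 
        /* The following line just prints a message to the errorlog */
        ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_NOTICE, 0, r->server,
        "mod_webit: %s", "Loaded to kick ass!");
 
        /* We set the content type before doing anything else */
        ap_set_content_type(r, "text/html");
 
        /* If the request is for a header only, and not a request for 
         * the whole content, then return OK now. We don't have to do 
         * anything else. */
        if (r->header_only) {
                return OK;
        }
 
        /* Now we just print the contents of the document using the 
         * ap_rputs and ap_rprintf functions. More information about 
         * the use of these can be found in http_protocol.h */
        output = parseThatShit(r->filename);
        ap_rputs("<HTML>\n", r);
        ap_rputs("Hello world\n", r);
        /* never hand a NULL pointer to %s */
        ap_rprintf(r, "%s\n", output ? output : "");
        ap_rputs("</HTML>\n", r);
 
        /* We can either return OK or DECLINED at this point. If we return 
        * OK, then no other modules will attempt to process this request */
        return OK;
}

/* Hook the handler into Apache's request processing */
static void webit_register_hooks(apr_pool_t *p)
{
        ap_hook_handler(webit_handler, NULL, NULL, APR_HOOK_MIDDLE);
}

/* The module name must match the one used in the LoadModule directive */
module AP_MODULE_DECLARE_DATA webit_module = {
        STANDARD20_MODULE_STUFF,
        NULL,                   /* create per-directory config */
        NULL,                   /* merge per-directory config */
        NULL,                   /* create per-server config */
        NULL,                   /* merge per-server config */
        NULL,                   /* command table */
        webit_register_hooks    /* register hooks */
};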

Your handler is what you want to focus in on. It receives a request, and at that
point you can do whatever the hell you want with it. I have chosen here to take
whatever file the user gave me and send it off to parseThatShit, which returns
a char *.
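parseThatShit itself isn't shown, so here is a purely hypothetical stand-in for its first step: reading the requested file into a buffer sized from the file itself, so a big file can't overrun a fixed-size stack array (the kind of bug behind "stack smashing detected"). read_whole_file() is a name made up for this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in: reads the whole file into a NUL-terminated heap
   buffer sized to fit. Returns NULL on any error; the caller frees it. */
char *read_whole_file(const char *filename)
{
    FILE *f = fopen(filename, "rb");
    if (f == NULL)
        return NULL;

    char *buf = NULL;
    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);                  /* file length in bytes */
        if (size >= 0 && fseek(f, 0, SEEK_SET) == 0) {
            buf = malloc((size_t)size + 1);    /* +1 for the terminator */
            if (buf != NULL) {
                size_t got = fread(buf, 1, (size_t)size, f);
                buf[got] = '\0';
            }
        }
    }
    fclose(f);
    return buf;
}
```

Inside Apache you would more likely allocate from r->pool with apr_palloc so the memory goes away with the request, and you would check for NULL before handing the result to ap_rprintf.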

You can compile it with the following:
(I have chosen c99 for better or for worse — I’ll leave your discriminatory comments up to
the internet to judge.)

sudo apxs2 -c -i -Wc,-std=c99 -a mod_webit.c 
sudo /etc/init.d/apache2 restart

Let’s setup apache to accept this wonderfulness:
(make sure your module was installed in the same path as mine)

LoadModule webit_module /usr/lib/apache2/modules/mod_webit.so
 
Listen 82
 
<VirtualHost 127.0.0.1:82>
    SetHandler webit-handler
    ServerName 127.0.0.1
    DocumentRoot /home/feydr/random-hacking/webit/www/views
</VirtualHost>

Don't forget that the log files are your friend: if nothing happens, check the logs! It's probably because you didn't compile with debugging support and you haven't spent enough time in gdb land. Let's go segfaults! Duhm, duh duh duhm!

 tail -f /var/log/apache2/error.log
*** stack smashing detected ***: /usr/sbin/apache2 terminated
[Wed Mar 03 17:13:35 2010] [notice] child pid 318 exit signal Segmentation fault (11)
[Wed Mar 03 17:13:42 2010] [notice] mod_webit: Loaded to kick ass!
*** stack smashing detected ***: /usr/sbin/apache2 terminated
[Wed Mar 03 17:38:50 2010] [notice] mod_webit: Loaded to kick ass!
[Wed Mar 03 17:38:50 2010] [notice] mod_webit: Loaded to kick ass!

Lex and Yacc
Now to the meat of this article.

Lex lets you define all your tokens. Yacc lets you define a grammar that combines those tokens into meaningful statements, with a chunk of C code that runs whenever a rule matches.

Here’s a simple lex file:

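%{
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strdup */
#include "y.tab.h"
/* yylval carries both numbers and strings, so this assumes webit.y declares
   %union { int num; char *str; } plus %token <num> NUMBER and %token <str> WORD */
%}
%option noyywrap
%%
 
var                     return TOKVAR;
puts                    return TOKPUTS;
\"                      return QUOTES;
times                   return TIMES;
do                      return DO;
end                     return END;
\.                      return PERIOD;
[-+()/*\n]              return *yytext;
[0-9]+                  yylval.num = atoi(yytext); return NUMBER;
[a-zA-Z]+               yylval.str = strdup(yytext); return WORD;
=                       return EQUALS;
[ \t]+                  /* ignore whitespace */;
%%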
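To see the decisions the generated scanner makes, here is a hand-rolled C sketch of roughly the same rules. The T_* names and next_token() are made up for this example; lex builds the real scanner from webit.l. The part it mirrors is lex's tie-breaking: the longest match wins, and among equally long matches the earlier rule wins, which is why var is TOKVAR but variable is a WORD.

```c
#include <ctype.h>
#include <string.h>

/* Token kinds loosely mirroring the lex rules (names are ours). */
enum tok { T_VAR, T_PUTS, T_TIMES, T_DO, T_END, T_WORD, T_NUMBER,
           T_EQUALS, T_QUOTES, T_PERIOD, T_CHAR, T_EOF };

/* Returns the next token starting at *p and advances *p past it. */
enum tok next_token(const char **p)
{
    static const struct { const char *word; enum tok kind; } keywords[] = {
        {"var", T_VAR}, {"puts", T_PUTS}, {"times", T_TIMES},
        {"do", T_DO}, {"end", T_END},
    };
    const char *s = *p;

    while (*s == ' ' || *s == '\t')         /* [ \t]+ is skipped */
        s++;
    if (*s == '\0') {
        *p = s;
        return T_EOF;
    }
    if (isalpha((unsigned char)*s)) {       /* longest run of letters first */
        const char *start = s;
        while (isalpha((unsigned char)*s))
            s++;
        size_t len = (size_t)(s - start);
        *p = s;
        /* only an exact, full-length keyword match beats WORD */
        for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
            if (strlen(keywords[i].word) == len &&
                strncmp(keywords[i].word, start, len) == 0)
                return keywords[i].kind;
        return T_WORD;
    }
    if (isdigit((unsigned char)*s)) {       /* [0-9]+ */
        while (isdigit((unsigned char)*s))
            s++;
        *p = s;
        return T_NUMBER;
    }
    *p = s + 1;
    switch (*s) {
    case '=': return T_EQUALS;
    case '"': return T_QUOTES;
    case '.': return T_PERIOD;
    default:  return T_CHAR;  /* single-char tokens such as + or newline */
    }
}
```

Feeding it var x = 42 yields T_VAR, T_WORD, T_EQUALS, T_NUMBER and then T_EOF, which is the token stream the var_assign rule below expects (except that it wants a WORD rather than a NUMBER as the value).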

Our yacc file is not so simple but it should be a bit more readable. Let’s take a look at some of it:

Our main:

int main(int argc, char *argv[])
{
  if (argc > 1) {
    parseFile(argv[1]);
  } else {
    repl = 0;
    printf("Webit Version %s\n", VERSION);
    putPrompt();
 
    /* yyparse is what actually will start parsing your input */
    yyparse();
  }
  return 0;
}
 
/* ........ */
 
/* here we see that we have various commands that we can do */
commands: /* empty */
        | commands command
        ;
 
command:
        var_assign
        |
        statement
        |
        string_assign
        |
        do_loop
        |
        puts_var
        ;
 
/* ..... */
 
/* variable assignment: var <name> = <value> */
var_assign:
        TOKVAR WORD EQUALS WORD '\n'
        {
          int x = firstEmpty();
          /* debug mode only
            printf("assigned %s to %s\n", $4, $2);
          */
          char *name = strdup($2);
          char *value = strdup($4);
          putPrompt();
        }
        ;
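The var_assign action calls firstEmpty() and copies the name and value, but the storage behind it isn't shown. Purely as a guess at its shape, here is a tiny fixed-size symbol table in C; set_var(), get_var(), first_empty() and MAX_VARS are hypothetical names, not webit's actual code.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_VARS 64

/* One slot of the table; a NULL name marks an empty slot. */
struct var_slot {
    char *name;
    char *value;
};

static struct var_slot vars[MAX_VARS];

/* Heap copy of s (strdup is not declared under plain -std=c99). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Index of the first empty slot, or -1 if the table is full. */
static int first_empty(void)
{
    for (int i = 0; i < MAX_VARS; i++)
        if (vars[i].name == NULL)
            return i;
    return -1;
}

/* Index of the slot holding name, or -1 if it was never assigned. */
static int find_var(const char *name)
{
    for (int i = 0; i < MAX_VARS; i++)
        if (vars[i].name != NULL && strcmp(vars[i].name, name) == 0)
            return i;
    return -1;
}

/* Binds name to a private copy of value. Returns 0 on success, -1 if the
   table is full or memory runs out. Reassigning replaces the old value. */
int set_var(const char *name, const char *value)
{
    char *new_value = copy_string(value);
    if (new_value == NULL)
        return -1;

    int i = find_var(name);
    if (i < 0) {
        i = first_empty();
        char *new_name = (i < 0) ? NULL : copy_string(name);
        if (new_name == NULL) {
            free(new_value);
            return -1;
        }
        vars[i].name = new_name;
    } else {
        free(vars[i].value);
    }
    vars[i].value = new_value;
    return 0;
}

/* Current value of name, or NULL if it was never assigned. */
const char *get_var(const char *name)
{
    int i = find_var(name);
    return i < 0 ? NULL : vars[i].value;
}
```

In the grammar action, something like if (set_var($2, $4) != 0) yyerror("too many variables"); would then replace the separate copies, assuming your yacc file defines yyerror as usual.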

Our Makefile:

# -D_POSIX_C_SOURCE keeps strdup() declared when compiling with -std=c99
all:
	lex webit.l
	yacc -d webit.y
	cc lex.yy.c y.tab.c -std=c99 -D_POSIX_C_SOURCE=200809L -o blah
 
clean:
	rm -f y.tab.h blah lex.yy.c y.tab.c

Really, making your own programming language, one that works the way you want to program and performs well, is easy. Maybe you shouldn't rush to use it on production servers, but shit, it sure beats the hell out of putting up with crappy software in an age where programming is considered to be just including other people's crappy software.