The Mighty Awesome Power of Scala

Posted by feydr | Posted in Uncategorized | Posted on 21-03-2010

Over at work we (I) have been getting pretty fed up with ruby lately. There is way too much fanboyism and not enough code. So I’ve been looking at doing things in different frameworks. This of course always leads to me having to try out new languages that I might have only looked at a couple of times before. Two things that I’d look for in a web framework to make my life easier are:

1) class reloading
2) dynamic typing OR a sweet ass type inference engine

I could not stress these two points more. I do NOT want to have to recompile/reload my classes every time I make a simple edit to the code — having this with ruby/merb has just made tremendous strides in how fast we can flesh out features. I also HATE having to declare types for my methods, variables, etc. I think it’s a waste of time and good code should not have to do that (especially if you are using type inference like all good languages do).

So eventually I came across lift which I thought sucked right out since it likes maven/xml. Seriously, xml files were NOT meant to be configuration files — goto hell!

However, recently on another front my colleague and I have already been looking at languages that run on the JVM for another project for the same website. We’ve been looking into scalability concerns with big data and have both concluded that we want/need the power of the JVM but really do not like java too much. This has led us to look at the two up and comers: Scala and Clojure. Recently I got ahold of the pragprog book and started flipping through its pages. One section in particular caught my eye. The author in a matter of 5-10 lines or so was able to open up a file, scrape some xml from the web, parse it and then save it. I was like: WHAT THE FUCK!? In java the exact same code would have had to have gone through the verbosity monster making machine and clock in at 20-30 lines easily.


So I kept reading the book! Another thing that immediately jumped out at me — where’s all your fucking semi-colons!? Oh those? We don’t need no fucking semi-colons in our language!

Motherfucking <3

This was enough for me to seriously start thinking about re-writing our main java project. This particular project has 10 (count them ten) antlr grammars and quite a few supporting classes along with CLI wrappers to said classes. We have quite a few test suites for each class thankfully otherwise this project would probably not be so possible without seriously killing it.

I first started converting one of our main classes over. Replace the for loops, drop the semicolons, take out the public identifiers, etc. Let’s try to compile. Oh noes! We don’t have access to these methods anymore. This is where javap comes in handy when you are converting from java –> scala. Javap will show you what the REAL class and method names are in the compiled class file, whichever language it came from. Keep in mind my original intention was to just get a scala class to compile and use it FROM java. Now I’m en route to using everything from scala but it really doesn’t matter: as long as you have class files and as long as those class files have access to the right methods/variables you are set.

Let’s say we have a seat class written in scala:

package com.bluffware;
 
class Seat() {
  var id:Int = 0
  var number:Int = 0
  var seat_id:Int = 0
  var sitout:String = ""
  var position:String = ""
  var button:String = ""
  var player:String = ""
  var amount:String = ""
 
  // might want to add utg, cutoff later on..
}

You might think that from java you can do a simple:

Seat seat = new Seat();
seat.number = 2;
myContainer.seats += seat;

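Well you can’t, cause javap shows that there is NO public field called number. Scala compiles the var down to a private field plus a getter, number(), and a setter, number_$eq(int).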

If you are accessing this class from java you’ll need to do the following:

Seat seat = new Seat();
seat.number_\$eq(2);
myContainer.seats().\$plus\$eq(seat);

The escapes are for antlr — in normal java it’d just be _$eq(var)
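
If you are wondering where names like number_$eq and $plus$eq come from: scalac swaps each operator character in a method name for a $-prefixed word before it writes the class file. Here is a rough ruby sketch of that mapping. The table is only partial, and jvm_name is a made-up helper for illustration, not something scalac or javap ships:

```ruby
# Partial table of how scalac encodes operator characters in JVM names.
SCALA_OP_NAMES = {
  '=' => '$eq',   '+' => '$plus',  '-' => '$minus',   '*' => '$times',
  '/' => '$div',  '<' => '$less',  '>' => '$greater', '!' => '$bang',
  ':' => '$colon'
}.freeze

# jvm_name is a hypothetical helper: it rewrites a scala method name into
# the name javap would show for it.
def jvm_name(scala_name)
  scala_name.each_char.map { |c| SCALA_OP_NAMES.fetch(c, c) }.join
end

puts jvm_name('number_=')   # the setter scala generates for `var number`
puts jvm_name('+=')         # the method behind `seats += seat`
```

So when javap shows you a weird method name, read the $words back as operator characters and you have the scala name again.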

Really though, the only time when you are going to have to do this is if you
have a java class that you need to be compiled through java — if you can convert
it to scala go ahead and do so.

Depending on the size and scope of your project it might make sense to convert
a bit to scala and do it piece-meal and then have other classes that access the scala
class to change the way they use it.

Really, the only thing in our project that uses this now is our antlr grammars, and at some point
I’m going to finally finish the scala antlr target.

Anyways, if you have not tried out scala yet and you are a java dev — what are you
waiting for!? It’s easier than you think — make sure you have some tests for a small
class you’d like to convert then start doing it!

Hacking it Up with Lex, Yacc and Apache

Posted by feydr | Posted in Uncategorized | Posted on 05-03-2010

So…. I got drunk a week or so ago and decided I’d start my own programming language. This is not the first time I have done this. Back 7-8 years ago when I was in school I did the exact same thing. I program in ruby and java most every day and love the speed of java but hate its syntax. On the other hand I love ruby but hate its performance.

I’ve also been in the process of doing some serious benchmarking lately against webservers and other things while managing to piss people off when I start quoting scripture from the mountaintop regarding performance — which is the one thing I pay most attention to on whatever project I am working on. “Fast enough” is not faster or fastest.

This led me to examine what exactly I’m trying to do on various web projects and what I need to do to get it done. Primarily I want to serve up an application via the web to the world. To this end I want an environment that is either ridiculous fast at compiling/loading classes or I want an interpreter. Furthermore this environment should be able to serve up web pages. My first search started re-analyzing common types of web servers.

Types of WebServers

  • CGI
  • FastCGI
  • Native Code
  • Your Own WebServer

Let’s review them real quick:

CGI
CGI is probably the first thing people turned to when they wanted to serve up a script written in some language but did NOT want to write the webserver themselves, since the grammar of HTTP is not the prettiest thing (from what I’ve been told).

Your typical cgi script is EXTREMELY easy to do.
Here’s an example:

First edit your /etc/apache2/apache2.conf like so:

ScriptAlias /cgi-bin/ /www/cgi-bin/

Now just put this simple two line ruby script into
that directory as test.rb …

#!/usr/bin/ruby
 
puts "Content-type: text/html\n\n"
puts "test"

… and change the permissions.

sudo chmod -R 755 /www/cgi-bin/
sudo chown -R www-data:www-data /www/cgi-bin/
sudo /etc/init.d/apache2 restart

Now you can navigate to your url at http://127.0.0.1/cgi-bin/test.rb and see a webpage!

Of course, it’s not very fast ….

ab -c 1 -n 100 http://127.0.0.1/cgi-bin/test.rb
Requests per second:    150.45 [#/sec] (mean)
 
ab -c 10 -n 100 http://127.0.0.1/cgi-bin/test.rb
Requests per second:    435.64 [#/sec] (mean)
 
ab -c 100 -n 100 http://127.0.0.1/cgi-bin/test.rb
=> Requests per second:    421.97 [#/sec] (mean)

For this reason a lot of people do not use straight up CGI in production environments anymore.

FastCGI

FastCGI is ‘faster’ than CGI since it loads the interpreter once and therefore does not have to keep loading it up every time a page is requested. In reality it can be just as slow though if you have to load a ton of different components, as with notable ruby frameworks like merb and rails. Other than this there really is not much difference between CGI and FastCGI, so it’s out of our game as well.

Native Code
This is what we are going to be looking at. Native code can take your requested webfile, parse it and return the contents immediately. This is the fastest way outside of creating your own webserver and having to consult Beej (cause I’m assuming you are writing your language in something fast like C right??? Right!??). Typically when people choose this option they are looking at options like phusion passenger where they make their own nginx or apache module. This is retarded fast.

Your Own WebServer
There are very good reasons for not rolling your own web server. There are quite a few edge cases in HTTP you need to support and various browsers have ‘weird’ ways of handling things. This is not to even mention that you are going to leave out quite a few people who would rather use apache or nginx. However, ruby’s webrick would be an example of ‘rolling your own’.

So let’s take a second look at writing an apache module (cause let’s face it — nginx is cool but apache is adopted everywhere else).

Compiling your Module

You’ll want to grab modules/experimental/mod_example.c from an apache 2.2 directory. You obviously don’t need the majority of the crap in it. I’ve included the most important part here:

/* This example just takes a pointer to the request record as its only 
* argument */
static int webit_handler(request_rec *r)
{
 
        /* We decline to handle a request if webit-handler is not the value 
         * of r->handler (which can also be NULL) */
        if (!r->handler || strcmp(r->handler, "webit-handler")) {
                return DECLINED;
        }
 
        /* The following line just prints a message to the errorlog */
        ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_NOTICE, 0, r->server,
        "mod_webit: %s", "Loaded to kick ass!");
 
        /* We set the content type before doing anything else */
        ap_set_content_type(r, "text/html");
 
        /* If the request is for a header only, and not a request for 
         * the whole content, then return OK now. We don't have to do 
         * anything else. */
        if (r->header_only) {
                return OK;
        }
 
        /* Now we just print the contents of the document using the 
         * ap_rputs and ap_rprintf functions. More information about 
         * the use of these can be found in http_protocol.h */
        ap_rputs("<HTML>\n", r);
        ap_rputs("Hello world\n", r);
        ap_rprintf(r, "%s\n", parseThatShit(r->filename));
        ap_rputs("</HTML>\n" ,r);
 
        /* We can either return OK or DECLINED at this point. If we return 
        * OK, then no other modules will attempt to process this request */
        return OK;
}

Your handler is what you want to focus in on. It receives a request and at that
point you can do whatever the hell you want with it. I have chosen here to take
whatever file the user gave to me and send it off to parseThatShit (defined elsewhere
in the module), which returns a charstar.

You can compile it with the following:
(I have chosen c99 for better or for worse — I’ll leave your discriminatory comments up to
the internet to judge.)

sudo apxs2 -c -i -Wc,-std=c99 -a mod_webit.c 
sudo /etc/init.d/apache2 restart

Let’s set up apache to accept this wonderfulness:
(make sure your module was installed in the same path as mine)

LoadModule webit_module /usr/lib/apache2/modules/mod_webit.so
 
Listen 82
 
<VirtualHost 127.0.0.1:82>
    SetHandler webit-handler
    ServerName 127.0.0.1
    DocumentRoot /home/feydr/random-hacking/webit/www/views
</VirtualHost>

Don’t forget that the log files are your friend. If nothing happens check the logs! It’s probably because you didn’t compile with debugging support and you haven’t spent enough time in gdb land. Let’s go segfaults! Duhm, duh duh duhm!

 tail -f /var/log/apache2/error.log
*** stack smashing detected ***: /usr/sbin/apache2 terminated
[Wed Mar 03 17:13:35 2010] [notice] child pid 318 exit signal Segmentation fault (11)
[Wed Mar 03 17:13:42 2010] [notice] mod_webit: Loaded to kick ass!
*** stack smashing detected ***: /usr/sbin/apache2 terminated
[Wed Mar 03 17:38:50 2010] [notice] mod_webit: Loaded to kick ass!
[Wed Mar 03 17:38:50 2010] [notice] mod_webit: Loaded to kick ass!

Lex and Yacc
Now to the meat of this article.

Lex allows you to define all your tokens. Yacc will allow you to put your tokens into meaningful functions.

Here’s a simple lex file:

%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
 
var                     return TOKVAR;
puts                    return TOKPUTS;
\"                      return QUOTES;
times                   return TIMES;
do                      return DO;
end                     return END;
"."                     return PERIOD;
[-+()/*\n]              return *yytext;
[0-9]+                  yylval=atoi(yytext); return NUMBER;
[a-zA-Z]+               yylval=strdup(yytext); return WORD;
=                       return EQUALS;
[ \t]+                  /* ignore whitespace */;
%%
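
If lex is new to you, the same idea can be mimicked in a few lines of ruby with StringScanner from the standard library. This is only a sketch to show what "defining your tokens" means, not how webit itself works: it covers a handful of the rules above, the NEWLINE name is my own, and it takes the first rule that matches (with \b guarding the keywords) where real lex picks the longest match.

```ruby
require 'strscan'

# Rules are tried in order, so keywords have to come before WORD.
RULES = [
  [/var\b/,     :TOKVAR],
  [/puts\b/,    :TOKPUTS],
  [/=/,         :EQUALS],
  [/[0-9]+/,    :NUMBER],
  [/[a-zA-Z]+/, :WORD],
  [/\n/,        :NEWLINE]
].freeze

# Turn a source string into a list of [token_name, matched_text] pairs.
def tokenize(src)
  scanner = StringScanner.new(src)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/[ \t]+/)   # ignore whitespace, like the last lex rule
    rule = RULES.find { |pattern, _name| scanner.scan(pattern) }
    raise "unexpected input at offset #{scanner.pos}" unless rule
    tokens << [rule[1], scanner.matched]
  end
  tokens
end

p tokenize("var x = foo\n")
```

The list of pairs that comes out is what yacc consumes: it never sees raw characters, only token names and their values.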

Our yacc file is not so simple but it should be a bit more readable. Let’s take a look at some of it:

Our main:

int main(int argc, char *argv[])
{
  if(argc > 1) {
    parseFile(argv[1]);
  } else {
    repl = 0;
    printf("Webit Version %s", VERSION);
    putPrompt();
 
   /* yyparse is what actually will start parsing your file */ 
   yyparse();
  }
}
 
/* ........ */
 
/* here we see that we have various commands that we can do */
commands: /* empty */
        | commands command
        ;
 
command:
        var_assign
        |
        statement
        |
        string_assign
        |
        do_loop
        |
        puts_var
        ;
 
/* ..... */
 
/* variable assignment */
var_assign:
        TOKVAR WORD EQUALS WORD '\n'
        {
          int x = firstEmpty();
          /* debug mode only
            printf("assigned %s to %s\n", $4, $2);
          */
          char *name = strdup($2);
          char *value = strdup($4);
          putPrompt();
        }
        ;

our Makefile

all:
        lex webit.l
        yacc -d webit.y
        cc lex.yy.c y.tab.c -std=c99 -o blah
 
clean:
        rm -rf y.tab.h blah lex.yy.c y.tab.c

Really making your own programming language the way you want to program and with performance is easy. Maybe you shouldn’t rush to use it on production servers but shit — it sure beats the hell out of putting up with crappy software in an age where programming is considered to be just including other people’s crappy software.

The Quest for Greater Requests per Second

Posted by feydr | Posted in Uncategorized | Posted on 21-01-2010

Was looking at an application the other day (that will remain nameless but any enterprising lad should be able to figure out what it is). It looked like we were clocking in at a whopping One point something requests/second. WHAT THE FUCK!?

I immediately focused my attention with mytop trying to find slow queries. Found quite a few and fixed them. I then started to cache out some of the problem areas and that is when I started to notice something very very unusual.

This particular application uses the merb framework which is sadly fast becoming ghostware with no promise from the powers that be to continue to integrate it into the upcoming rails 3 which was due out a year ago. Apologies, that I don’t have time myself to dedicate to this framework — so I can’t be too harsh, after all it is free software.

You see, we use datamapper as an ORM layer to our database and datamapper collections don’t fit well into memcached since they have procs that can’t be marshalled correctly — if I am wrong about this PLEASE LET ME KNOW.
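
A quick way to see the problem in plain ruby, assuming the culprit really is a proc hanging off the collection (I have not dug into datamapper’s internals to confirm that). The marshalable? helper and the sample data are made up for the demo:

```ruby
# Hypothetical helper: can this object survive Marshal.dump, which is what
# has to happen before it can be stuffed into memcached?
def marshalable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

plain_rows  = [{ :id => 1, :title => 'first post' }]
with_a_proc = { :rows => plain_rows, :loader => proc { plain_rows } }

puts marshalable?(plain_rows)    # plain hashes and arrays dump fine
puts marshalable?(with_a_proc)   # a proc anywhere inside raises TypeError
```

If that is what is going on, caching plain attribute hashes or rendered strings instead of the collection object sidesteps it.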

Caching

So the typical way of caching your shit is to just call your finder from a helper in your view and cache the view. Simple enough I suppose, and it sounds like it should work. I then proceeded to fragment cache everything I could.

When I was looking at items that would expire my cache (like a new blogpost upon a page that lists blog entries) I noticed that caching the entire page versus caching just a fragment or a partial of a page was noticeably faster. Now, we aren’t talking like “oh wow, that’s faster”… we are talking “HOLY FUCKING SHIT — that’s fucking faster than shit!”

Templating Performance

This led me to the realization that is the whole point of this article. Contrary to popular opinion, templating is a HUGE performance issue.

By caching our sql we gained 5-10X speedups (1 r/s to 5-10 r/s). By caching the entire damn page we went to 20 requests/second — a 20X speedup! That’s double what our best sql/fragment caching could do.

I needed to confirm this because everyone has told me that sql is by far a worse performance killer than templating engines. Well, my benchmarks say otherwise.

Let’s look at some of those benchmarks real quick. These were extremely simple to do and I invite you to replicate them — it’ll take 2 minutes of your time to see for yourself.

I benched all the requests using Apache Benchmark and they look like this:


Alchesay
Actually that is Alchesay, Apache was named cause back in the day when I had nothing to do but beat up on *nix boxen apache had more holes than the town whore.

feydr@mhu:~$ ab -c1 -n100 http://127.0.0.1:4567/hi

Every time I used a haml template or put the equivalent HTML into the controller as a string it would look like this:

%h1
  Hello World
%ul
  %li
    Blah 1
  %li
    Blah 2
  %li
    Blah 3
  %li
    Blah 4

Note: I’m not trying to pick on haml (I really enjoy it). I just don’t want to be deluded that it’s all unicorns sprinting across rainbows shitting shooting stars when it comes to performance, because it is not. Furthermore, that kind of attitude in the community really aggravates me.


As a side note — anyone who uses the words like ‘tasty’ to describe code needs to go back to their frontpage bullshit.

Framework   Conditions                  Requests/Second
-------------------------------------------------------
sinatra     with ‘hello world!’                  838.19
sinatra     with haml                            610.57
merb        with ‘hello world!’                  341.63
merb        with haml                            271.13
sinatra     w/template in controller            1321.42
sinatra     w/template and haml                  320.77
merb        w/template in controller             407.70
merb        w/template and haml                  227.39
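
To put a number on what haml costs, here is a small ruby sketch that works out the slowdown for each pair of rows. It assumes every haml row pairs with the plain row directly above it, which is my reading of the table; slowdown_percent is just a helper name I picked.

```ruby
# Requests/second copied from the table: [without haml, with haml].
RESULTS = {
  'sinatra, hello world'            => [838.19, 610.57],
  'merb, hello world'               => [341.63, 271.13],
  'sinatra, template in controller' => [1321.42, 320.77],
  'merb, template in controller'    => [407.70, 227.39]
}.freeze

# Percentage of throughput lost when the same page goes through haml.
def slowdown_percent(plain, with_haml)
  ((plain - with_haml) / plain * 100).round(1)
end

RESULTS.each do |label, (plain, with_haml)|
  puts format('%-34s %5.1f%% slower with haml', label, slowdown_percent(plain, with_haml))
end
```

Depending on the row, that works out to anywhere from about a fifth to about three quarters of the throughput gone.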

So what’s the conclusion to this testing? Tell markup languages to take a hike and only use css/html? Nope, we still use haml as I hate looking at html now because of it.
What I ended up doing was caching everything I could up front and then combining multiple renders myself.

Solution

I basically decided to stub out portions of my views that I knew were not easily cacheable like the user layout found on hulu:


hulu login area

Some Code:

You can see I stub out “{begin-login}” with whatever I’m going to replace it with later. This really is not necessary
and I could prob. save a few cycles by not doing this but whatever.

 
    # user specific non-cacheable
    # needs 2 renders
    if session.authenticated? then
 
      @cache = MMCACHE.clone
      begin
        @welcomehome = @cache.get("/welcome/homein")
      rescue
        @posts = Blogpost.all(:order => [:created_at.desc], :limit => 4)
 
        # render our base layout without user-specific stuff
        @welcomehome = render :layout => 'cachein'
        @cache.set("/welcome/homein", @welcomehome, 0)
      end
      @cache.quit
 
      # render user-specific stuff if logged in and tack it on
      @loginshit = render :template => 'layout/_logout', :layout => false
      # block form so backslashes in the rendered fragment are inserted literally
      @welcomehome = @welcomehome.gsub("\"{begin-login}\"") { @loginshit }
 
    # non-logged in user -- should only take 1 render
    # we have a special section for a logged in user that 
    # shows their inbox and other things
    else
 
      @cache = MMCACHE.clone
      begin
        @welcomehome = @cache.get("/welcome/homeout")
      rescue
        @posts = Blogpost.all(:order => [:created_at.desc], :limit => 4)
 
        @welcomehome = render :layout => 'cacheout'
        @cache.set("/welcome/homeout", @welcomehome, 0)
      end
      @cache.quit
 
    end
 
    #output final render
    @welcomehome

This could probably be cleaned up a bit more and be stuffed into a helper but you get the drift.
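
For the curious, here is roughly what that helper could look like, boiled down to plain ruby so it runs anywhere. Everything in it is a stand-in: FakeCache replaces the memcached client (and returns nil on a miss instead of raising), the strings replace the real renders, and cached_page and personalize are names I made up.

```ruby
# Stand-in for the memcached client: a Hash with get/set.
# Assumption: a miss returns nil here; the real client signals it differently.
class FakeCache
  def initialize; @store = {}; end
  def get(key); @store[key]; end
  def set(key, value); @store[key] = value; end
end

LOGIN_STUB = '"{begin-login}"'

# Return the cached page for key, running the expensive render only on a miss.
def cached_page(cache, key)
  page = cache.get(key)
  return page if page
  page = yield
  cache.set(key, page)
  page
end

# The second, cheap "render": drop the per-user fragment into the cached shell.
def personalize(page, fragment)
  page.gsub(LOGIN_STUB) { fragment }
end

cache = FakeCache.new
renders = 0
2.times do
  shell = cached_page(cache, '/welcome/homein') do
    renders += 1
    "<div>\"{begin-login}\"</div><ul>posts</ul>"
  end
  puts personalize(shell, "<a href='/logout'>logout</a>")
end
puts "expensive renders: #{renders}"
```

Two requests, one expensive render. The per-user part never touches the cache, so nobody ends up seeing somebody else’s login box.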

I did not include any fetch_partial (merb-cache helper methods) benchmarks here but suffice it to say I gained 20 requests/second by NOT using them. YMMV.

Today’s rant/diatribe brought to you by:

  • Roxy Music – Take a Chance With Me
  • Kid Cudi – Pursuit of Happiness
  • Blue Scholars – 50K Deep

So sorry, no crazy new utility to solve templating performance problems other than this hack but this does lay the foundation for more serious research in this category.

Let me know how you solve your templating performance problems in the comments below.

Tomcat and Merb with JRuby

Posted by feydr | Posted in Uncategorized | Posted on 18-12-2009

In case you didn’t know we parse poker hand histories on our website, Bluff.com. This parsing takes place in java. In the past we used to call out to our parser from ruby using something like this:

  ENV['LC_CTYPE'] = 'en_US.UTF-8'
  IO.popen("java -cp \"#{classpath}\" com.bluffware.BluffParse #{xtra}" +
                handfile) do |f|
    @xml = f.read
  end

Of course once we started getting some users, spawning a fresh JVM for every single hand got painfully slow in a hurry.
So I went ahead and wrote my first servlet in tomcat to grab the hand history. Only problem is that I had no clue how the hell I was supposed to know who was accessing our webservice since tomcat lived on a different server from merb and the cookies had a different domain. This turned out not to be a huge problem.

However, what was a huge problem was our cookie lived in a base64 encoded ruby object. How the hell was I supposed to get to it?

JRUBY to the Rescue!

JRuby has a sweet java<-->ruby bridge called RedBridge. This allows us to execute ruby code in java which allows us to get to our beloved cookies.
I wrote something like this to get our sessions going:

    // return the user id stored in the merb session cookie
    public Integer getSession(Cookie[] cookies) {
        String userid = "0";
 
        if(cookies != null) {
          for(int i=0; i<cookies.length; i++) {
            Cookie cookie = cookies[i];
 
            // base64 decode, then un-marshall ruby style...
            // finally figure out what to do with our session secret key
            if(cookie.getName().equals("_session_id")) {
              if(!cookie.getValue().equals("")) {
                ScriptingContainer container = 
               new ScriptingContainer(LocalVariableBehavior.PERSISTENT);
                //container.setWriter(out);
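                // hand the cookie over as a variable instead of splicing it
                // into the ruby source, so a crafted cookie can't inject code
                container.put("cookie_value", cookie.getValue());
                container.runScriptlet("require 'base64'; " +
                              "blah = Marshal.load(Base64.decode64(cookie_value))[\"user\"];");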
                userid = container.get("blah").toString();
              }
            }
 
          }
        }
 
        return Integer.parseInt(userid);
    }
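
Stripped of the java around it, the ruby that scriptlet runs is just a base64 decode followed by a Marshal.load. Here is a round trip you can try in irb. The session contents are made up, and the helper names are mine; I use pack/unpack1 so the snippet needs no requires, which is the same encoding Base64 uses.

```ruby
# Encode: Marshal the session hash, then base64 it.
# pack('m0') produces the same output as Base64.strict_encode64.
def encode_session(session)
  [Marshal.dump(session)].pack('m0')
end

# Decode: the two steps the scriptlet performs.
# unpack1('m') is what Base64.decode64 does under the hood.
def decode_session(cookie_value)
  Marshal.load(cookie_value.unpack1('m'))
end

cookie_value = encode_session('user' => 42)   # made-up session contents
puts cookie_value
puts decode_session(cookie_value)['user']
```

One word of warning: Marshal.load will happily build whatever objects the bytes describe, so only feed it cookies you have some reason to trust.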

All was good in the land of poker analysis until one of my users messaged me one day saying some of my hands had ended up in his account. WTF!??! Then after accepting that he was not lying I indeed experienced seeing one of my own hands pop up as someone else’s.

I started putting trace statements all across the code trying to figure out what was going on where.

System.out.println("is it here?");
//.....
System.out.println("it must be here...");
// etc...

then watching it with one of my more favorite tools, tail.

tail -f /usr/local/tomcat/logs/catalina.out

after scanning it for a bit I could not find anything out of the ordinary; then I deployed
some new servlet code without restarting tomcat

cp -R myservlet/ /usr/local/tomcat/webapps/.

And lo and behold, after posting a hand my cookie had changed! The key to figuring out
this and most problems like it is being able to trigger the error on demand,
and I had now found out that deploying the app without restarting let me do so.

It turned out that my cookie was changing somewhere INSIDE the ruby code.
I popped onto the jruby website and noticed that the way I was using the
ScriptingContainer in my servlet was not thread safe.
I checked in with the friendly folks at #jruby on freenode and they confirmed that this was more than likely the problem since it was a local var in a multi-threaded app.

So I changed my offending line to this:

ScriptingContainer container = 
new ScriptingContainer(LocalContextScope.THREADSAFE, LocalVariableBehavior.PERSISTENT);

I started triggering the same event like mad and it works now! Yeh, JRuby! Yeh, for docs!
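
If it is not obvious why a shared local var hands one user’s hands to another, here is the same bug acted out in plain ruby. This is an analogy, not JRuby’s actual implementation: SharedSlot plays the part of one container shared by every request, PerThreadSlot plays the part of a container that keeps variables per thread, and the join makes the interleaving happen every time instead of once in a blue moon.

```ruby
# One shared slot, like a local variable kept in a single shared container.
class SharedSlot
  attr_accessor :blah
end

# One slot per thread, roughly what a thread-safe scope buys you.
class PerThreadSlot
  def blah=(value)
    Thread.current[:blah] = value
  end

  def blah
    Thread.current[:blah]
  end
end

# Request A stores its user id, then request B runs on another thread
# before A gets around to reading the value back.
def user_seen_by_a(slot)
  slot.blah = 'user-A'
  Thread.new { slot.blah = 'user-B' }.join
  slot.blah
end

puts user_seen_by_a(SharedSlot.new)      # A reads back B's user id
puts user_seen_by_a(PerThreadSlot.new)   # A reads back its own user id
```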

Solr Search — Enterprise Software that Does Not Suck

Posted by feydr | Posted in Uncategorized | Posted on 12-11-2009

The past couple of weeks I’ve been asked to go find some article that talks about so-and-so or to go look at a forum topic that mentions XYZ.
This is not the biggest pain in the ass as I can pop into sql and fairly quickly find what I’m looking for but DAMN I would like to just type something into a search box and hit enter instead.

  • Search for a title on one table — pretty easy — BAM — it’s done.
  • Search within the title and the body — ok… we can do that fairly easily.
  • Search for title/body on two tables: doable, but now we are getting kinda convoluted in our controller logic.
  • Fuck everything — search for any fucking text anywhere on the website — oh, and we want this query which is most assuredly going to be different accessible to all of our users — this is where the shit starts to hit the fan and you start spending more time extending and refactoring your search code rather than closing out this ridiculous feature and getting on with the more important shit.

I did NOT go down this route as I’ve already tried to make something that does this and I know it’s a bitch and a half. So I looked into the ‘Enterprise’ software available. I must admit, every time I hear the word enterprise I think of an army of monkeys with shit-stained fingers tapping away on a 486 making java classes that are composed of ten thousand 2 line functions and the only REAL function is that it counts from 1 to 10.

Enter Solr

Solr is a super fast full-text ‘enterprise search solution’. What the fuck is ‘enterprise search’ you may ask? Simply put, it is a solution to search on whatever you might want to index on your site without having to do shit tons of crazy custom code knowing full well that the engineers that came before you made it the best fucking piece of shit around. Chances are if you are a website that does any sort of traffic at all you either have or want enterprise search. Now, lucene is the core engine that solr uses but if you want to talk to lucene directly you might as well take the time to write your own goddamn search app.

Alright, let’s stop fucking around and …

let’s get it installed

wget http://www.bizdirusa.com/mirrors/apache/tomcat/tomcat-6/v6.0.20/bin/apache-tomcat-6.0.20.tar.gz
wget http://people.apache.org/builds/lucene/solr/nightly/solr-2009-11-10.tgz
 
tar xzf apache-tomcat*
tar xzf solr*
 
sudo mv apache-tomcat-6.0.20/ /usr/local/tomcat6
sudo cp apache-solr-1.5-dev/dist/apache-solr-1.5-dev.war /usr/local/tomcat6/webapps/solr.war
sudo cp -r apache-solr-1.5-dev/example/solr/ /usr/local/tomcat6/solr/
 
sudo mkdir /usr/local/tomcat6/conf/Catalina/
sudo mkdir /usr/local/tomcat6/conf/Catalina/localhost/
 
sudo gem sources -a http://gemcutter.org
sudo gem install rsolr
 
sudo update-rc.d tomcat6 start 91 2 3 4 5 . stop 20 0 1 6 .

that’ll get us installed but let’s go ahead and throw up an init script for tomcat as manually restarting it is just dumb (the update-rc.d line above needs this script in place before it will do anything useful)

put this in your /etc/init.d/tomcat6 and smoke it

#!/bin/sh
# Tomcat auto-start
#
# description: Auto-starts tomcat
# processname: tomcat
# pidfile: /var/run/tomcat.pid
 
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/usr/local/tomcat6/solr"
 
case $1 in
start)
   sh /usr/local/tomcat6/bin/startup.sh
   ;;
stop)
   sh /usr/local/tomcat6/bin/shutdown.sh
   ;;
restart)
   sh /usr/local/tomcat6/bin/shutdown.sh
   sh /usr/local/tomcat6/bin/startup.sh
   ;;
esac
exit 0

There are two ruby libraries that look more or less the same to me:
rsolr and solr-ruby

erikhatcher, who wrote solr-ruby, told me to use the competition’s stuff:

solr-ruby is my baby, but rsolr is inspired by it and took away some great lessons….rsolr has some ideas and refactorings i’d love to get into solr-ruby…..but i’d say rsolr is probably the most agile way to go right now

I am using rsolr but for no real reason.

I first decided to index the blogposts on my site to get a feel for how everything works. I put this in a rake task just to make it really easy to develop the functionality as I learned. As you can see I’m using merb but it would work fine for rails as well.

desc 'add solr indexes for blogposts'
task :populate_index => :merb_env do
  require 'rsolr'
  require 'cgi'   # for CGI.escapeHTML below
  require 'lib/Colorify.rb'
  include Colorify
 
  solr = RSolr.connect :url => 'http://127.0.0.1:8080/solr'
 
  puts colorGreen("clearing index")
 
  # clear our index
  solr.delete_by_query '*:*'
 
  puts colorGreen("adding blogposts")
 
  if Merb.environment.eql? 'development' then
    host = "127.0.0.1"
    dbname = "my_dev"
  elsif Merb.environment.eql? 'staging' then
    dbname = "my_staging"
  else
    host = "my_production_host"
    dbname = "my_production"
  end
 
  DataMapper.logger.level = :error
  DataMapper::setup(:default, "mysql://myuser:mypassword@#{host}/#{dbname}")
 
  Blogpost.all.each do |bp|
    begin
      solr.add :id => bp.id, :type => 'blogpost', :body => CGI.escapeHTML(bp.content), :title => bp.title,
                :anchor => bp.anchor, :description => bp.description, :slug => bp.slug
    rescue
      puts colorRed($!)
    end
  end
 
  puts colorGreen("adding forum posts")
 
  Post.all.each do |post|
    begin
      if(!post.parent.nil?) then
        solr.add :id => post.id, :type => 'forumpost', :body => CGI.escapeHTML(post.body), :title => post.parent.title,
                  :slug => post.parent.slug
      end
    rescue
      puts colorRed($!)
    end
  end
 
 
  solr.commit
end

my corresponding controller/action pair to look this up:

class Search < Application
  before :ensure_authenticated
  before :admin_login
 
  def index
    require 'rsolr'
    solr = RSolr.connect :url => 'http://127.0.0.1:8080/solr'
 
    # search both fields; the parens keep multi-word queries inside each field
    response = solr.select :q => "body:(#{params[:query]}) title:(#{params[:query]})"
 
    @nresults = response["response"]["numFound"]
    @docs = response["response"]["docs"]
    render
  end
 
end

k… let’s tell ruby to fuck off…
as with every fucktard java project out there XML is the preferred method of setting up your configuration files.
to properly import all your data whenever you want (like with a cronjob) you’ll want to make a data-config.xml
that belongs in your /usr/local/tomcat6/solr/conf/ directory.

mine looks a little bit like this (data-config.xml):

<dataConfig>
  <dataSource type="JdbcDataSource" 
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mycoolassfuckingdb_dev" 
              user="luser"
              password="assword"/>
  <document>
 
    <!-- blogposts -->
    <entity name="blogpost" 
            query="select id, content, title, anchor, description, slug from blogposts"
            transformer="TemplateTransformer">
      <field column="content" name="body" />
      <field column="type" template="blogpost" />
    </entity>
 
    <!-- forum topics -->
    <entity name="forumpost" 
            query="select a.id as id, a.body as body, b.slug as slug, b.title as title from posts as a, topics as b where b.id = a.parent_id" 
            transformer="TemplateTransformer">
          <field column="type" template="forumpost" />
    </entity>
 
  </document>
</dataConfig>

You’ll note that there is a transformer attribute, which allows you to do all sorts of crazy shit to your data as you are importing it, like finding data that ‘sounds’ like what the user is trying to spell. In this example the TemplateTransformer stamps a constant type (‘blogpost’ or ‘forumpost’) onto every document, and the plain field mapping renames my column ‘content’ to ‘body’, even though I could have just aliased it in the sql.

Now, you’ll need to edit your schema.xml located in the same directory to add fields that match what you want to import as I’m sure the majority of you people don’t have SKUS on your website — but if you do — congratulations — you are already set!

How about importing this data?
Easy, hit up this url: http://127.0.0.1:8080/solr/dataimport?command=full-import
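
If you want to kick that import off from a cronjob, the url can be built and fetched from a few lines of ruby. This is only a sketch: the host, port and path are the ones from the url above, `dataimport_uri` is a name I made up for illustration, and the actual fetch is left commented out so nothing fires by accident.

```ruby
require 'uri'
require 'net/http'

# Build the dataimport url for a given command.
# Host, port and path default to the values used in this post;
# dataimport_uri itself is a hypothetical helper, not part of solr.
def dataimport_uri(command, host = '127.0.0.1', port = 8080)
  URI::HTTP.build(
    :host  => host,
    :port  => port,
    :path  => '/solr/dataimport',
    :query => URI.encode_www_form(:command => command)
  )
end

uri = dataimport_uri('full-import')
puts uri

# Uncomment to actually trigger the import:
# puts Net::HTTP.get(uri)
```

Drop the script into cron and uncomment the last line once the url printed by the dry run looks right.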

having trouble with the logs? like where the fuck they are located???
try this:

tail -f /usr/local/tomcat6/logs/catalina.2009-11-12.log

Schema Shit

The key here is to make everything about as homogeneous as possible. I know… wtf??!? No seriously, it’s cool, cause every field typically will have an index and the idiomatic way is to simply put a type field on documents that share field names (like the ‘type’ template in the data-config above).

For more information on this please visit the pros:

Schema Design
Using Multiple Indexes

I’m kinda interested how sql-like injection works with solr and time permitting, I’ll have a new article on it in the future — needless to say a cursory scan of several popular hosting platforms revealed VERY OPEN solr installations.
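
A taste of what that injection looks like: the controller above drops params[:query] straight into the query string, so anything with meaning in the lucene query syntax (colons, parens, stars and so on) gets interpreted instead of searched for. A minimal sketch of escaping user input first follows. `escape_solr_term` is a made-up helper name, and the character list is the set of single characters the lucene query syntax treats as special; it does not touch keywords like AND/OR.

```ruby
# Single characters with special meaning in the lucene/solr query syntax.
SOLR_SPECIAL_CHARS = /[\\+\-!():^\[\]"{}~*?|&\/]/

# Hypothetical helper: backslash-escape every special character so the
# user's input is searched for literally instead of being parsed.
def escape_solr_term(term)
  term.to_s.gsub(SOLR_SPECIAL_CHARS) { |c| "\\#{c}" }
end

user_input = '*:*'
safe = escape_solr_term(user_input)
puts "body:#{safe} title:#{safe}"
```

Without the escaping, a query of *:* matches every document in the index, which is probably not what the search box was meant to do.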

Anyways, I’m drunk so this post will probably be revised in the future but I wanted to get it out there.

Go solr!

ᏰᎤᎷ  Ⴕჩვ  ⲂⲀⲊⲊ

Posted by feydr | Posted in Uncategorized | Posted on 28-08-2009

So this Monday I found out that one of our datasets was not being parsed anymore — it was the most important one…FUCK!

After much cursing and much bullshitting I admitted that I was being sent UTF-16LE encoded text and my program was set to receive UTF8.

My first “fix” was to look for the BOM that usually comes with such encodings and convert to UTF8 based upon that. I’ve discussed my love of Unicode and the BOM before. The short story is that it is put at the top of text files to indicate to other programs how the text is encoded, since there’s only about 3,000-8,000 different languages in the world. The BOM is NOT always in use, but UTF-16 files usually carry one. In UTF-8 files, which are strongly encouraged, BOMs can be present but are not mandatory. In UTF-16LE (think windows) the two-byte signature usually looks like this:

me@mhu: od -h testfile | more
## producing this output
0000000 feff 0046 0075 006c 006c 0020 0054 0069
## ...

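Note that the 0000000 is just the offset of the bytes, not the actual bytes. In our case the BOM is the first 2 of them: od -h prints 16-bit words in host byte order, so on a little-endian box the word feff means the file starts with FF and then FE.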
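
To see the signatures for yourself, here is a small ruby sketch (1.9 or newer) that prints the bytes of the BOM character U+FEFF in a few encodings, and then shows why a little-endian od -h displays the on-disk bytes FF FE as the word feff. `bom_bytes` is just a name picked for the example.

```ruby
# Bytes of the byte order mark (U+FEFF) in a given encoding.
def bom_bytes(encoding)
  "\uFEFF".encode(encoding).bytes
end

%w[UTF-8 UTF-16LE UTF-16BE].each do |enc|
  hex = bom_bytes(enc).map { |b| format('%02X', b) }.join(' ')
  puts "#{enc.ljust(9)} #{hex}"
end

# od -h groups bytes into 16-bit words in host order; read as a
# little-endian word, the on-disk bytes FF FE become feff.
word = [0xFF, 0xFE].pack('C*').unpack('v').first
puts format('%04x', word)
```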

This, however, was not going to work as the data that I received did not always include the BOM. Why? The data was being sent from multiple locations that were issuing HTTP requests to my server that collected and parsed it via an antlr based grammar.
On the machines that were issuing the HTTP requests there was a file watcher that sent chunks of data to the HTTP server — as the file grew more chunks were sent but only the first chunk ever had the BOM in it.

Knowing I could not immediately send the encoding along with the data or re-encode on the client machine I opted to do this on the server side. My friend over at SoftwareBloat.com suggested I just look for the null bytes. The result ended up looking at the first 10 bytes of a dataset — if it included the null byte at least 2 times this was a fairly good guess that we were dealing with UTF-16LE encoding. I did not bother checking for UTF-32 LE/BE or BE in general as I’d very much like to meet the person who is running a MIPS or RISC processor and using our service — although with the rise of netbooks utilizing ARM processors and such there may come a day when we have to support this.
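
The null byte guess is easy to play with outside of java. A minimal ruby sketch of the same heuristic follows, assuming the window of 10 bytes and the threshold of 2 nulls described above; `looks_like_utf16le?` and its keyword arguments are names made up for the example. It is a guess and nothing more: binary data or text full of non-latin characters will fool it.

```ruby
# Heuristic: a UTF-16LE BOM, or at least `threshold` null bytes within
# the first `window` bytes, suggests latin text encoded as UTF-16LE.
def looks_like_utf16le?(data, window: 10, threshold: 2)
  head = data.b.byteslice(0, window) # .b gives a binary copy of the string
  return true if head.start_with?("\xFF\xFE".b)
  head.bytes.count(0) >= threshold
end

puts looks_like_utf16le?('Full Tilt'.encode('UTF-16LE')) # chunk without a BOM
puts looks_like_utf16le?('Full Tilt')                    # plain ASCII/UTF-8
```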

My detection and conversion code looks like this:

      // guess encoding if utf-16 then
      // convert to UTF-8 first
      // (needs: import java.io.*;)
      try {
        // read the whole file; a single read() is not guaranteed to fill the buffer
        File file = new File(args[args.length-1]);
        byte[] contents = new byte[(int) file.length()];
        DataInputStream dis = new DataInputStream(new FileInputStream(file));
        dis.readFully(contents);
        dis.close();
        byte[] real = null;

        int found = 0;

        // if found a BOM then skip out of here... we just need to convert it
        if ( contents.length >= 2 && (contents[0] == (byte)0xFF) && (contents[1] == (byte)0xFE) ) {
          found = 3;
          real = contents;

        // no BOM detected but still could be UTF-16
        } else {

          // count the null bytes in the first 10 bytes (or fewer for tiny files)
          for(int cnt=0; cnt < Math.min(10, contents.length); cnt++) {
            if(contents[cnt] == (byte)0x00) { found++; }
          }

          // tack on BOM and copy over new array (once, outside of the counting loop)
          real = new byte[contents.length+2];
          real[0] = (byte)0xFF;
          real[1] = (byte)0xFE;
          System.arraycopy(contents, 0, real, 2, contents.length);

        }

        if(found >= 2) {
          // the "UTF-16" charset uses the BOM to pick the byte order
          String asString = new String(real, "UTF-16");
          byte[] newBytes = asString.getBytes("UTF-8");
          FileOutputStream fos = new FileOutputStream(file);
          fos.write(newBytes);
          fos.close();
        }

      } catch(Exception e) {
        e.printStackTrace();
      }

END NOTES:
* I found out when writing this article that my wordpress installation could not handle my characters well — the fix is to edit your wp-config.php and comment out the two lines describing db_charset and db_collate. It should look like this:

//define('DB_CHARSET', 'utf8');
//define('DB_COLLATE', '');

* My favorite website for finding unicode characters with associated information is: FileFormat.info. Just replace the ’2c8a’ hex with whatever code you want to know about. Also, if you are looking at a code in vim the command “:ascii” will give you the hex and octal of it.

* The official soundtrack for this particular problem comes from our friend, dumbfounded all the way out in Los Angeles, CA.

HBase to the rescue

Posted by feydr | Posted in Uncategorized | Posted on 10-08-2009

Beginnings (circa 1 year ago)
I had finally grabbed a schema dump of how their ‘wonderful’ database worked. I was so elated that we would not have to re-invent the wheel. This dump was actually a fairly nice looking pdf describing in detail about their 60 column wide 60+ table design — to say the least my mouth dropped and I shit a brick cause this was for ONE USER to be used on a desktop — we needed this to scale to hundreds of thousands if not millions of users. They were using postgresql and could do 20 hands/second. We knew we had some problems ahead of us.

The first task that we noticed right away was our parsing was not up to speed — after about a year of dicking around in different languages (c++, java, ruby) we settled on java with antlr and now are pushing over 800 hands/second with no summarizing, 300 hands/second with summarizing and 80 hands/second stuffing rows in a mysql. This clued us into the fact that if we wanted to go faster without having to ‘scale up’ we’d need other alternatives.

Of course what use is the speed of our database if we can’t use it after 100 players start hitting our site everyday? One user alone could generate over 200,000 rows using the competition’s schema within one day, sometimes within an hour! This was the real main concern as we slowly realized that one database to rule them all would never cut it. It’s true we talked about sharding and the like for oh say a couple of minutes. We spent more time on the phone with TerraCotta and Vertica than we’d like to admit to.

Ever since early spring I have been on a key-value storage kick, yet those are only front line defenses — you need something in the background that can kick some ass — that little piece of enterprise software is HBase.

HBase?! What the fuck is that?
HBase is in short: awesome! Awesome like alligators artfully eating elephants awesome. HBase is the ‘database’ layer of a typical Hadoop stack. It sits on top of HDFS, which is a distributed filesystem. Its main competition is Hypertable, written in C++ (which claims to be faster, yet my benches speak otherwise), and BigTable.

So what is it really? Well, to quote Michael Stack, it structures data as tables of ‘column-oriented rows’ which can scale to billions of rows and millions of columns with thousands of versions — can your mysql do that? There are no joins and no transactions. For you paranoid ACID heads out there the lack of transactions is not worthy of an argument — row updates are atomic.
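
If ‘column-oriented rows’ with versions sounds abstract, it helps to picture the whole thing as nested hashes: row key, then column, then timestamp. The ruby below is only a mental model under that assumption and has nothing to do with how HBase stores data on disk; the row and column names, and the `put`/`get_latest` helpers, are invented for the example.

```ruby
# table -> row key -> "family:qualifier" -> { timestamp => value }
table = Hash.new do |rows, row|
  rows[row] = Hash.new { |cols, col| cols[col] = {} }
end

# Store one version of a cell.
def put(table, row, column, value, timestamp)
  table[row][column][timestamp] = value
end

# Fetch the newest version of a cell (nil if there is none).
def get_latest(table, row, column)
  versions = table[row][column]
  versions[versions.keys.max]
end

put(table, 'player.42', 'hands:count', '10', 1_000)
put(table, 'player.42', 'hands:count', '11', 2_000)
puts get_latest(table, 'player.42', 'hands:count')
```

Every row can carry a different set of columns, and old versions stick around next to the new ones, which is roughly what you give up joins and transactions for.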

There is some bullshit tossed around regarding HBase — the whole concept of living on ‘commodity hardware’ is a bit of a joke considering they suggest 4-8 cores with 6-8 gigs of ram to get started. To be fair though if you are involved in large projects a production server housing oracle or even mysql can easily start out with 32 gigs of ram — so it’s not that much of a joke.

I have not done extensive benchmarking with ab yet, but it appears that it will serve our needs directly from our ruby thrift code on a production site. When the time comes we plan on implementing front-side key-value caching.

Installing and using it is not a pain but it is not straightforward either.

Let’s get started
First we need to install hbase proper:

git clone http://git.apache.org/hbase.git/
# of course change this to wherever your jvm sits
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.14/jre/
cd hbase; ant

Thrift is a popular framework for cross-language services. This allows us to access hbase from languages like ruby, haskell and ocaml. It usually requires some libs that are not available on a virgin install, the most notable of which are the boost libs.

sudo apt-get install autoconf libtool libboost-dev g++ \
sun-java6-jdk ant flex bison pkg-config libevent-dev \
ruby-dev zlib1g-dev
sudo ln -s /usr/include/boost/ /usr/local/include/boost-1_34_1

grab thrift:

wget -O thrift.tgz "http://gitweb.thrift-rpc.org/?p=thrift.git;a=snapshot;h=HEAD;sf=tgz"
tar -xzf thrift.tgz
cd thrift; ./bootstrap.sh; ./configure; make; sudo make install

since ruby is my language of choice for fast development let’s install the thrift gem:
(I really have no clue why you need mongrel for this, haven’t dug in too deep.)

gem install mongrel echoe --no-ri --no-rdoc
cd ~/thrift/lib/rb
rake gem
sudo gem install pkg/thrift-0.1.0.gem --no-ri --no-rdoc

Let’s check out our shell:
HBase Shell

$ bin/hbase shell
 
# create a table
>create 'treasureChest', 'col1', 'col2'
 
# stuff 2 columns into a row into the table
>put 'treasureChest', 'myveryfirstrow', 'col1:notmine', 'the queens underwear'
>put 'treasureChest', 'myveryfirstrow', 'col1:mine', 'elf spice'
 
# scan for anything on this row
>scan 'treasureChest'
 
# explicitly request for the notmine stuff
>get 'treasureChest', 'myveryfirstrow', {COLUMNS => 'col1:notmine'}
 
# disable/drop the table
>disable 'treasureChest'
>drop 'treasureChest'

My Patches:
As of 0.20 HBase was not able to scan across start/stop timestamps anymore using the thrift interface, which as it turns out is incredibly useful to have (the thrift interface doesn’t even compare to what the native java stuff can do). So I spent ~2 hours going through what was most assuredly Proper Java using Eclipse for Enterprise Software Development(tm).

I could straight shoot someone if I saw one more getter/setter pair. If you think I’m full of shit please read this. I mean reading through these 1000 line classes is really bad for your eyes and it is BAD PROGRAMMING.

The end result was a patched thrift server that supports scanning on start/stop timestamps. I provide the ruby thrift code for convenience — you still have to re-compile with ant to get your thrift server working.

git clone git://github.com/feydr/hrb.git
 
# to apply the patch cp the patches to: $HBASE_HOME/src/java/org/apache/hadoop/hbase/thrift
 
patch <ThriftServer.java.patch
patch <Hbase.thrift.patch
 
# then apply this last one in: $HBASE_HOME/src/java/org/apache/hadoop/hbase/thrift/generated
patch <Hbase.java.patch
 
# then just do a:
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.14/jre/
ant clean
ant
 
cd hrb/hrb-ng; gem build hrb-ng.gemspec
sudo gem install hrb-ng --no-ri --no-rdoc

Run it:

require 'rubygems'
require 'hrb-ng'
 
transport = Thrift::BufferedTransport.new(Thrift::Socket.new('127.0.0.1', 9090))
protocol = Thrift::BinaryProtocol.new(transport)
client = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol)
transport.open()
 
scanner = client.scannerOpenWithStopStartTs("mytable", "myrow.0", "myrow.1", ["mycol:"], 1249677001723, 1249677009731)
blah = client.scannerGet(scanner)

Note: This patch is probably only necessary until the thrift interface is re-written. This code exists in the native java client but NOT in thrift as of yet.

If you look at the source of scannerOpenWithStopStartTs you’ll see all I did was copy it and modify the start/stop to be set to the passed args. Everything else is thankfully already in place.
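
For anyone wondering what a scan bounded by rows and timestamps boils down to, here is a plain ruby sketch over an in-memory hash. It is not the patch and it does not talk to HBase. It assumes a start row that is inclusive, a stop row that is exclusive, and a half-open timestamp range, which is how I understand the native client to behave; the patched thrift call may differ in the details. `scan_with_time_range` and the sample data are made up.

```ruby
# table: row key -> column -> { timestamp => value }
# Returns the cells of rows in [start_row, stop_row) whose
# timestamps fall in [start_ts, stop_ts).
def scan_with_time_range(table, start_row, stop_row, start_ts, stop_ts)
  table.keys.sort.each_with_object({}) do |row, out|
    next unless row >= start_row && row < stop_row
    cells = {}
    table[row].each do |column, versions|
      hits = versions.select { |ts, _value| ts >= start_ts && ts < stop_ts }
      cells[column] = hits unless hits.empty?
    end
    out[row] = cells unless cells.empty?
  end
end

table = {
  'myrow.0' => { 'mycol:a' => { 1249677001000 => 'too old', 1249677005000 => 'in range' } },
  'myrow.1' => { 'mycol:a' => { 1249677005000 => 'past the stop row' } }
}

p scan_with_time_range(table, 'myrow.0', 'myrow.1', 1249677001723, 1249677009731)
```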

Generating new ruby Thrift Code:

mkdir ~/hrb
thrift --gen rb -o ~/hrb /opt/hbase2/src/java/org/apache/hadoop/hbase/thrift/Hbase.thrift

And remember kids — “All your HBase are belong to us!”

Redis: Another Hash db FTW

Posted by feydr | Posted in Uncategorized | Posted on 24-05-2009

So I have to admit, I kinda have a hardon for key-value stores, also known as hash dbs. Why? Cause they are hashes, which is the data structure I’ll use right before deciding to make something a class, and they are incredibly fast.

Relational databases are about as dumb as it gets when it comes to storing data. The whole idea here is to persist data structures in a manner that is efficient, right? If you had the option of storing already ordered data in an un-ordered fashion or storing it in the order it is already in, which would you choose? The more acute observers amongst you would probably say O(1) versus O(n) makes it an easy choice, and you’d be correct.

Sometimes mysql just does not cut it when it comes to storing data. Actually, let’s not kid ourselves, mysql is awfully slow. If it weren’t for the need of persistence mysql probably wouldn’t be one of the inherent staples of web applications. I don’t know about you but I can usually max out my mysql with around 10-11k tx/second IF I have upped the ram limit.

So how fast is redis? Well, for list operations I was doing 48k tx/second — very noice!
Want to install it? Let’s!
(notice: the rake task auto dls and installs the redis c for you — why?? cause ruby hackers are hot like that)

git clone git://github.com/ezmobius/redis-rb.git
cd redis-rb
rake redis:install
# start it up
redis-server

now let’s check out a simple SET/GET

cyn0n@spicetrader:~/work/$ telnet 127.0.0.1 6379
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
SET foo 3
bar
+OK
GET foo
$3
bar
^]
telnet> quit

This is your pretty stereotypical hash storage. We tell the server that we are giving it a variable of 3 bytes. We terminate that line with ‘\r\n’ then we give it the data which is ‘bar’ and terminate it again with ‘\r\n’.
The server gives us a response saying we are good to go.
Now we tell the server we want our variable back. The server responds with a ‘$’ followed by the number of bytes that are about to follow (3 here). Then it proceeds to give us our requested data, again terminated with ‘\r\n’. Very straightforward.
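
The same exchange is easy to reproduce as strings. The ruby sketch below builds the SET request exactly as typed into telnet above and picks apart a bulk reply like the one GET returned. The helper names are invented, and the request format is the one shown in this session; newer redis versions expect a different request format where every argument is length-prefixed, so treat this as an illustration of the session above rather than a client.

```ruby
# Request as typed in the telnet session: command line with the byte
# count of the value, then the value itself, each ended by \r\n.
def old_style_set(key, value)
  "SET #{key} #{value.bytesize}\r\n#{value}\r\n"
end

# Bulk reply: "$<byte count>\r\n<data>\r\n". A negative count means nil.
def parse_bulk_reply(raw)
  header, rest = raw.split("\r\n", 2)
  raise ArgumentError, 'not a bulk reply' unless header && header.start_with?('$')
  length = Integer(header[1..-1])
  return nil if length < 0
  rest.byteslice(0, length)
end

p old_style_set('foo', 'bar')
p parse_bulk_reply("$3\r\nbar\r\n")
```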

Benchmarking
Ahh, this is what everyone comes for huh? So my benching wasn’t as awesome as tokyo cabinet was (however, keep in mind that redis comes with a built-in networking layer so it’s a lot faster than tokyo tyrant is)

Here they are using the included benchmarking boilerplate from ezra:

                            user     system      total        real
set                    33.300000   2.980000  36.280000 ( 37.641193)
set (pipelined)        14.390000   1.350000  15.740000 ( 26.550251)
push+trim              36.520000   2.520000  39.040000 ( 40.420952)
push+trim (pipelined)   0.410000   0.040000   0.450000 ( 10.630631)

which, as you can see, results in the following:

# sets
set                           606 tx/sec
set (pipelined)               1428 tx/sec
# lists
push+trim                     555 tx/sec
push+trim (pipelined)         48780 tx/sec
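
A note on where those tx/sec figures come from. The post never states the operation count, but the numbers are consistent with 20,000 operations per run divided by the user column (truncated to whole seconds for the three slow runs, 0.41 kept as is for the fast one). The user column is client CPU time, not elapsed time, so it is worth computing the same thing against the real column too. A hedged sketch of that arithmetic, with the 20,000 being my assumption:

```ruby
# Assumption: 20,000 operations per run. The post never states the
# count, but it is the number that reproduces the published figures.
OPS = 20_000

# Whole transactions per second for a run that took `seconds`.
def tx_per_sec(ops, seconds)
  (ops / seconds.to_f).floor
end

# [user CPU seconds, wall-clock seconds], copied from the table above
RUNS = {
  'set'                   => [33.30, 37.641193],
  'set (pipelined)'       => [14.39, 26.550251],
  'push+trim'             => [36.52, 40.420952],
  'push+trim (pipelined)' => [0.41, 10.630631]
}

RUNS.each do |name, (user, real)|
  puts format('%-22s %6d tx/sec by user time, %5d tx/sec by wall clock',
              name, tx_per_sec(OPS, user), tx_per_sec(OPS, real))
end
```

Under that assumption the pipelined push+trim run works out to roughly 1,881 tx/sec by wall clock, so the 48k figure says more about how little CPU the client burns when pipelining than about end-to-end throughput.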

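note that ‘set’ here looks to be the plain key/value SET command, which is O(1) just like a list push, so the gap above comes from pipelining rather than from the data structure. The set data type, being unordered, is still useful for things such as storing state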

Features of Redis

  • values can be different data types (lists, sets) rather than just strings
  • master-slave replication (can move databases with a simple cp)
  • persists data asynchronously, versus tokyo cabinet which does this synchronously and memcache which does not at all
  • sets are hash tables of unique members
  • can perform numerous server-side operations on sets such as set-intersections, unions, subtractions, and additions
  • push head/tail O(1), range of lists
  • list trims to implement circular buffers (ala rrdtool)
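
The push plus trim combo from the last bullet is the one benchmarked above, and it is worth seeing why it makes a circular buffer. The ruby below fakes it with a plain Array so it runs without a server. `CappedList` is an invented class; the comments name the redis list commands (LPUSH, LTRIM, LRANGE) that each step stands in for.

```ruby
# In-memory stand-in for a redis list that is trimmed after every push.
class CappedList
  def initialize(max)
    @max = max
    @items = []
  end

  # Like LPUSH followed by LTRIM 0 (max - 1): newest first, oldest dropped.
  def push_head(item)
    @items.unshift(item)
    @items = @items.first(@max)
    self
  end

  # Like LRANGE with inclusive indexes; -1 means the last element.
  def range(start, stop)
    @items[start..stop] || []
  end
end

recent = CappedList.new(3)
(1..5).each { |n| recent.push_head(n) }
p recent.range(0, -1)
```

The list never grows past its cap, so memory stays flat no matter how many values get pushed through it.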

Another useful thing is that most (all?) primitive operations are atomic, such as the PUSH/POP/INCR/DECRs. What does this mean? This means that there is NO NEED for locking algorithms to get the A in ACID, and it is one main reason for the speed that is witnessed. There is more than one way to skin a cat, and redis having dropped the requirement of having to lock is in my opinion an excellent choice!

Unlike tokyo cabinet, redis comes with a built-in networking layer which is much much faster than tokyo tyrant (the tokyo cabinet networking layer). I assume that the creators of tokyo cabinet were under the impression that you’d just embed the c lib into whatever program you wanted as the tyrant seems to have come as an afterthought.

Since data is written asynchronously, concerns of data loss pop up, as there is a chance you might lose a couple of keys here or there. However, master/slave replication defends against this pretty well.

If you look at redis you’ll see that another performance boost comes from the fact that certain objs are put into a free list and then recycled later on when needed negating the need to allocate/de-allocate system resources which ties up cpu — very smart indeed.

bullshit
Memory usage is another point of contention when it comes to concerns about hash databases. While this is a valid concern it really amounts to a bullshit argument by someone who is taking facts out of context. Yes, it is true that you are unable to store more data in the store at any given time than you have ram. This is shocking to some people, surprisingly. However, it is not uncommon for a bottom-shelf production server to have 16gigs of ram to start out with, and that’s only one server, NOT partitioning. If it’s a simple app that you wrote for yourself on a development box I am going to guess you don’t need more than 2 or 3 gigs anyways. If you do, I’d suggest looking into getting real hardware instead of your laptop before you start bitching about performance.

As an after-note to this if you are not going to heed this advice do yourself a favor and perform this:

echo 1 > /proc/sys/vm/overcommit_memory

What this does is tell linux to allow you to use more memory than you actually have. Whenever redis saves, it forks the process, which on paper needs twice the amount of memory. In practice this never really happens since the forked pages are shared copy-on-write, and only the ones that change during the save get duplicated. Anyways, if you find yourself doing this quite a lot this is a warning to upgrade your hardware, as if the sluggish connections wouldn’t have been a clue to begin with.

More Reading
If you wish to do more research on redis, ezra gave a lightning talk at the MountainWest Ruby Conf here, and antirez has done a pretty damn good job of documenting his code. His comments are also very good considering that, had he wanted to, he could’ve named all of his vars and written his comments in Italian. (on that note I know the Italian hacker community is known for its hackerspaces and badass hacker conferences)

Drop the relational and enjoy your scalable, speed-demon database!

Drop Boms Not Bombs

Posted by feydr | Posted in Uncategorized | Posted on 12-05-2009

So I come in Monday morning one fine week after a long drunken hiatus to find a ‘badhand’ in my inbox. A badhand is a feature I implemented on one of the sites we are developing so that users can inform me that my parser did not correctly produce the expected xml. This saves me time and the user frustration.

Except, this badhand was a lot like some others I had received: it had a special 2-byte character that looked like ‘‘ in vim. I knew right away that something was wrong and that somewhere somehow my unlucky user had run this text through a text-editor such as wordpad, even though he defiantly shouted back that he had not (he uses a Mac after all).

So I started my quest and found out that eventually this special character was called a ‘BOM’ or a ‘byte order mark’ indicating to a text editor what encoding to use for unicode. Well, after further searching I found the ‘best answer’ on stackoverflow (as usual):

There’s no such thing as a UTF-8 BOM. A UTF-8 file is in a predefined byte order already, adding a BOM prefix is completely pointless. Applications that produce a UTF-8-encoded file with a U+FEFF character at the beginning are wrong.

Hrm… I thought for a second here cause I already take care of special utf8 characters in my parser. Among them are variations of the em-dash, euro, and yen. Also, apparently the BOM is only used as the first 2-3 bytes at the top of a text file. In my sample files it showed up on almost every other line.

Shit! I had been screwed by the powers that be! My lean mean parser was about to pick up an ugly method that would slow it down considerably.
It was time to ride the BOM.


At first I thought: ok, all I need to do is scan for the BOM (which would be simple, 0xff 0xfe for UTF-16LE) and remove it. Then I remembered that most of the files that come through my webserver are being passed through ruby’s File.open method, NOT a unicode safe library call! Shit! For a language that was made in Japan you’d think ruby would have decent unicode support. Say what you will, ruby (1.8 at least) treats ALL strings as 1byte char star arrays, multi-byte unicode is not allowed. This threw me into a pickle. IRC was of no help. Usually I’ve learnt that when you don’t receive an answer on IRC it’s because a) you are asking dumb questions or b) no one knows the answer.

I decided I must pull out IRB and just start hacking until something comes.

This is what I came up with:

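  # translate unicode to latin
  # and drop the BOM
  def unicodeTolatin(byteray)

    # flip to unicode: every byte becomes its own code point
    unistring = byteray.each_byte.map do |p|
      [p].pack('U')
    end

    # drop any null bytes (the high bytes of latin text in UTF-16LE)
    blah = []
    unistring.each do |o|
      if !o.eql? "\000" then
        blah << o
      end
    end

    # convert array of characters over to string
    # (join instead of to_s, which only glued the elements together on 1.8)
    newstr = blah.join

    # drop our bom: the bytes FF FE turned into U+00FF U+00FE in the pack above
    newstr = newstr.gsub(/\303\277\303\276/, '')
  end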
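
For what it’s worth, ruby 1.9 and later attach an encoding to every string, so the same job can be handed to the built-in transcoder instead of shuffling bytes by hand. A minimal sketch, assuming the input really is UTF-16LE (the function name is made up, and invalid or unmappable bytes are simply replaced rather than reported):

```ruby
# Transcode raw UTF-16LE bytes to UTF-8 and strip every BOM (U+FEFF),
# including ones that ended up in the middle of the text.
def utf16le_to_utf8(raw)
  text = raw.dup.force_encoding(Encoding::UTF_16LE)
  text = text.encode(Encoding::UTF_8, :invalid => :replace, :undef => :replace)
  text.gsub("\uFEFF", '')
end

raw = "\uFEFFFull Tilt \u20AC5".encode('UTF-16LE')
puts utf16le_to_utf8(raw)
```

Unlike the null byte trick, this keeps characters outside of latin-1 (the euro sign in the sample) intact.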

If any of you ever have to struggle with the fucking bom — remember DROP BOMS NOT BOMBS!

Reia Loses its Indentation Sensitivity

Posted by feydr | Posted in Uncategorized | Posted on 10-03-2009

Yesterday on the reia mailing list Tony asked all of our opinions on indentation sensitivity. This is what we had to say:

This is what he had to say.

So basically python can stick it! Don’t worry, it’s not just because of the ponies or because it’s a speed demon compared to ruby and we are jealous. It’s just a matter of principle (and developer time). Haml is next in line!

Anyways, what the fuck is Reia anyways? Taken from the wiki, “Reia (pronounced RAY-uh) is a Python/Ruby-like scripting language for the Erlang virtual machine (BEAM).”

What the fuck is erlang?

So why is it cool?? Cause erlang is considered to be the ONLY language that not only meets concurrency concerns but EXCELS at them with flying colors. Too bad the syntax can suck a nut or two… and since it’s a functional language with single assignment it doesn’t really fit into the object oriented paradigm that we (20 something and under) have grown up with. Reia adds to the coolness by taking the nine 9’s of uptime, hot-code swapping, 80-core power producing, rampaging code crocodiling that is erlang and allowing us lowly programmers that are not voluntary-bald to kick ass with our time-saving dynamic language, duck typing, BDD, money making Machiavellism.

How do I get down with this madness?

Simple padawan. Let’s start by installing erlang.

sudo apt-get install erlang

make sure we have 5.6.3 or greater..

erl

type q(). to quit

clone reia:

git clone git://github.com/tarcieri/reia.git
cd reia; rake install; sudo rake install

I spend most of my time in irb so let’s run ire:

me@spicetrader:~$ ire
Reia Interactive Shell (prerelease)
Running on Erlang (BEAM) emulator version 5.6.3 [source] [smp:2] [async-threads:0] [kernel-poll:true]

simple.. enough..

let’s hello world this shit up:

>> puts("hello world")
hello world
=> nil

yes, those parens are necessary for now..

lambda?

>> pp = fun(str) { puts("#{str} does the limbo dance"); }
=> #
>> pp('ian')
ian does the limbo dance
=> nil
>> pp('kara')
kara does the limbo dance
=> nil

recursion?
for this example let’s use reia’s interpreter: reia..no kidding?!
open up your favorite text editor

vim blah.re

and paste this:

module Fib
  def calc(1)
    1
  end
 
  def calc(n)
    if (n == 0)
      0
    else
      calc(n - 1) + calc(n - 2)
    end
  end
 
end
 
puts(Fib.calc(3))
puts(Fib.calc(10))

now let's run it:

me@spicetrader:~$ reia blah.re 
2
55

this shows you a couple important things about erlang and reia.
1st: most ppl on the erlang list will shun if/else conditionals
saying there is no need for them when you have pattern
matching and recursion
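
For the rubyists following along, here is roughly the same function with every case spelled out as its own clause and no if/else in sight. This is plain ruby, not reia, and a case statement is only a loose stand-in for erlang style function clauses, but it shows the shape the erlang folks are after: one clause per kind of input, recursion for the rest.

```ruby
# One clause per shape of input, mirroring Fib.calc above.
def fib(n)
  case n
  when 0 then 0
  when 1 then 1
  else fib(n - 1) + fib(n - 2)
  end
end

puts fib(3)
puts fib(10)
```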

last but not least we get back to the original point of this article:
reia lost its indentation sensitivity!!

Thanks Tony!