Vincent and Vega Install on OS X (Mountain Lion)

Vincent and Vega make a powerful combo for rapidly prototyping visualizations. I was recently introduced to them via a tweet; I read http://wrobstory.github.io/2013/10/mapping-data-python.html, which, while not having the glitziest maps, hooked me with the promise of quickly rendering some rather hefty visuals.

Let me briefly explain the purpose of these tools. Vincent lets you build chart models in Python with objects. These can then be translated easily to Vega, a visualization grammar. Vega is JSON based, so once you have the raw data you can produce PNG, SVG, or browser-based renderings of the visualization! Now imagine being able to quickly tweak the Python model and output a new visualization. Back this up with an application of some sort and you can automate building visualizations, or serve dynamic ones through a (web) app interface.

Installing these tools on your OS X machine (Mountain Lion in my case) is fairly involved. You’ll need Homebrew, Node.js (I installed it via Homebrew), and finally Python’s pip installer. I would actually recommend installing virtualenv and creating a separate environment for all of this so that it doesn’t cruft up your system-wide packages. If you install the npm modules from a single working folder, then all the JavaScript and Python packages will be very easy to remove later (delete that folder and remove the virtualenv).

Without further ado let’s install!

First is Cairo, a C-based graphics library used by a lot of projects. For example, Graphite (a powerful monitoring app) uses Cairo for graph rendering. In our case Vega will rely on it to build our awesome visualizations!

brew install cairo

This will install certain dependencies in /opt/X11/lib/pkgconfig/, and that directory needs to be findable via the environment variable below when installing Vega. Here is the link I used to figure this out: https://github.com/mxcl/homebrew/issues/14123.

export PKG_CONFIG_PATH=/opt/X11/lib/pkgconfig/

npm install vega

We are about ready to install Vincent. As the project itself says, you’ll want to install numpy and pandas before pip installing vincent; see https://github.com/wrobstory/vincent#installation

pip install numpy

pip install pandas

And now….

pip install vincent
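
With everything installed, a minimal chart is just a few lines of Python. This is only a sketch assuming the 0.x Vincent API (Bar, axis_titles, to_json); the exact calls and accepted data types may differ in your version:

import vincent

# A tiny bar chart from a plain dict; Vincent also accepts pandas objects.
data = {'A': 10, 'B': 25, 'C': 7}
bar = vincent.Bar(data)
bar.axis_titles(x='Category', y='Count')

# Write out the Vega JSON spec, which vg2png/vg2svg or a browser can render.
bar.to_json('bar_spec.json')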

Alright, now you’d be ready to build some visualizations: bar charts, line charts, and the rest. In my case I wanted maps too, so here is the rest of the work I put in to render maps with these tools.

First off you will need to install the map projection plugin for D3.

npm install d3 d3-geo-projection

Then I had to hack up the vg2png tool, adding requires for the above. Right below the existing requires in the script, add these two lines:

var d3 = require("d3");
require("d3-geo-projection")(d3);

And now when you run the tool it will be able to apply the required map projections.
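
For reference, a run looks something like this (assuming the local npm install above put the script under node_modules/vega/bin/; the file names here are just placeholders):

./node_modules/vega/bin/vg2png map_spec.json map.png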

Footnotes:

Why I am interested – http://wrobstory.github.io/2013/10/mapping-data-python.html

Cairo – http://cairographics.org/examples/

D3 Map Projections Plugin – https://github.com/d3/d3-geo-projection/

Pandas – http://pandas.pydata.org/

Developing LogStash Grok Config

LogStash is a nice little app for consolidating and searching log files. Although it wraps up a lot of functionality, it still requires some pretty in-depth knowledge to use properly. This is most often the case when writing the Grok expressions which parse log input, creating the parts that will be used to index and later search on. This post is all about making those expressions easier to write.

First off, this is a great site which makes checking that your Grok expression properly parses your log input a piece of cake. When developing an expression I grab the input I want to parse and place it in the top (input) field. Then I’ll plug in the value

%{GREEDYDATA:rest}

In the “Pattern” field. This will make sure that you grab everything. Now you are ready to start adding parts to the expression so that you can figure out the parsing you need. Let me run an example:

2013-08-26 20:34:46,369 8162832 DEBUG [http-bio-8080-exec-10] com.hhsos.hibernate.LoggingCriteria - Query time 3 ms

The above is an application log statement which I’ll be parsing. To start I add

%{TIMESTAMP_ISO8601:timestamp}

to the start of the “Pattern” field. I notice the output shows some new fields from the parsing of the timestamp. So I move on and add:

%{INT:timesince}

Again I notice that a new field is added to the output of the online tool. It goes on like that, piece by piece, until everything is parsing just how I want.
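
For the example line above, the finished pattern ends up looking something like this (a sketch; which stock patterns you lean on, say JAVACLASS versus DATA, and the field names are up to you):

%{TIMESTAMP_ISO8601:timestamp} %{INT:timesince} %{LOGLEVEL:loglevel} \[%{DATA:thread}\] %{JAVACLASS:class} - %{GREEDYDATA:message}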

Before I move on I’ll mention one gotcha. The square brackets in the log output above can cause some issues. To properly parse them I had to add a single backslash ‘\’ before each one. When configuring this in the rb file you’ll actually need to add double backslashes to make it work. Be wary of this; there are a few characters where you’ll likely need to think about escaping.

There are two tools I lean on here. The first is the online debugger above; the second is LogStash’s built-in testing support, which lets you try out whether your parsing is working. You’ll want to look at the spec which is used as an example to figure this out. Basically the LogStash jar has built-in support for running these types of tests, which lets you be 100% sure your config works. Using this together with the online Grok debugger should make it fairly easy to develop your expressions.

Amazon Linux + OpenJDK + Collectd Java Plugin

This took a bit of work to figure out. First off, you need to install the JDK devel package, since that is where the header files live. The header files are of course required to compile the Java plugin for collectd.

So use a command like:

sudo yum install java-1.6.0-openjdk-devel.x86_64

Which then installs the .h files. You should be able to find them in a folder like:

/usr/lib/jvm/java-1.6.0-openjdk.x86_64/include/

Note that the above path varies with the Java version and such. Once you have that you can configure collectd to build with Java as normal:

./configure --with-java=/usr/lib/jvm/java-1.6.0-openjdk.x86_64

Again, I put my path in there, but yours could be slightly different. After configure finishes, check that the output shows a line like “java… yes”, which indicates Java was configured. Now build and configure the plugin!
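
The build itself is the usual make dance, and enabling the plugin happens in collectd.conf. This is only a rough sketch; in particular the class path in JVMArg is a guess you will need to point at wherever your install actually puts collectd-api.jar:

make
sudo make install

And then in collectd.conf, something along these lines:

LoadPlugin java

<Plugin "java">
  # Path is illustrative; point it at the collectd-api.jar your install provides.
  JVMArg "-Djava.class.path=/opt/collectd/share/collectd/java/collectd-api.jar"
  # Load whichever Java-based plugin you actually want here, e.g. GenericJMX.
</Plugin>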

Intro to Metrics

I have been doing a lot of work lately to generate volumes of metrics. Why bother, you might ask? I will tell you. If you do not measure what is going on with your resources (applications and machines in my case) you cannot put numbers on things, and putting numbers on things turns out to be quite useful. Once you do, you’ll feel you’ve passed from a land of make-believe and half-assed guesses into a world where you truly understand your resources and can make firmly backed assurances.

For example: it’s one thing to tell your boss “the server is really slow because like lots of web requests and stuff” (trust me, business types love that sort of explanation), and another entirely to be able to say “we are experiencing 2x the normal request load and the server is doing its damnedest to use up all the available CPU and/or memory”. That brings me to an important point: if you are not measuring anything, how do you describe normal? Without metrics it’s probably in terms of “everything seems OK, I mean the pages load fast!” Even worse, it could be “hey, not a single crash today, the servers actually stayed up!”

How do you really start to create a well-informed solution to problems such as slowness and lack of robustness (shit crashes)? Without metrics it’s likely “let’s throw a bunch of hardware at it and hope it’s enough”.

When you have the actual numbers you can make much better inferences, the conclusions of which will fuel sound strategies based on more than gut feelings. Later in this post I’ll point out how the marriage of proper metrics and load testing can give you deep insight into your application.

OK, So Like What Are Metrics?

There are three metric types that, in my experience, are pretty important when you are building web apps.

First up is your simple gauge metric. This little fella is simply a value at a given time. A good example in the web application world is the count of database connections currently pooled. Another very simple example is how much memory is available. A gauge provides a reading of some value at any given time. Over time it will likely go up and down, and sometimes stay pretty constant.

The next metric is the humble counter. A counter is pretty obvious, right? You count things with it. A good example is how many times an endpoint has been hit. Note that a counter doesn’t have to increment by one each time; you could have a counter which counts the total number of bytes in and out of your server. In that case you’d have pretty non-uniform increments to the counter on each web request.

Last but not least is the timer. A timer measures how many milliseconds a given operation takes. Say your code has a section which makes a call to a third-party web service. Wrap it in a timer and you can see when that service starts to get flaky on you. Timers also work great for wrapping algorithms or tricky sections of code. They’ll give you quite a bit of insight over time.
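
To make the three concrete, here is a bare-bones Python sketch of each. A real app would use an existing metrics library rather than hand-rolled classes like these, so treat it purely as an illustration:

import time

class Gauge:
    """A reading of some value at a point in time (e.g. pooled DB connections)."""
    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value

class Counter:
    """A running total; increments need not be one (e.g. bytes transferred)."""
    def __init__(self):
        self.value = 0

    def increment(self, amount=1):
        self.value += amount

class Timer:
    """Records how many milliseconds a wrapped operation takes."""
    def __init__(self):
        self.durations_ms = []

    def time(self, func, *args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            self.durations_ms.append((time.time() - start) * 1000)

pool_size = Gauge()
requests_served = Counter()
third_party_call = Timer()

pool_size.set(12)                                 # current reading of the pool size
requests_served.increment()                       # one more request handled
third_party_call.time(lambda: time.sleep(0.05))   # records roughly 50 ms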

We Have Metrics Now What?

You will want a way to record each of these metrics at regular intervals, and some way to visualize the data over time. Most likely you will use standard line graphs to visualize this data, but it’s also possible you’ll use other forms of graphing and even custom visualizations. I’ll post later on tools which can do the storage and graphing. These days there are quite a few packages out there that provide full or partial solutions to these problems, including Ganglia, Graphite, and Cacti.

The other thing you’ll probably want to do is create alerts based on certain metrics. You might take an average of requests per second and, if it exceeds a custom threshold, send an alert to one or more people. Other useful alerts are disk space (way too common to run out of) and memory and CPU alerts on machines. There are existing solutions for alerting; the one I’ve used most is Nagios. Depending on what solution you use for storing and retrieving metrics, you might actually be able to just write your own little alerting app. In a language like Python or Ruby it could be rather trivial.
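
As a taste of how trivial, here is a hypothetical Python check; fetch_metric and send_alert are stand-ins for whatever your metrics store and notification channel actually provide:

def check_request_rate(fetch_metric, send_alert, threshold=500):
    # fetch_metric and send_alert are hypothetical stand-ins, not a real API.
    rps = fetch_metric('web.requests_per_second')  # e.g. an average over the last few minutes
    if rps > threshold:
        send_alert('Request rate is %.0f req/s, above the threshold of %d req/s' % (rps, threshold))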

Once you have alerts you may also want to invest some work into integrating with a tool like PagerDuty, which makes it easy to alert the right person(s) and to escalate issues if nobody responds within a given time.

Help Dev Find Their Way

Wanna hear a really cool story about using a gauge to supercharge dev? I put gauges on each web app endpoint to report how many actual SQL statements are run during each web request. Using this information we were able to find some really nasty goings-on. We found endpoints which caused the dreaded N+1 issue and fixed them. We found endpoints that just ran way too many SQL statements, and often found we could cache much of that work in Memcached.

We also used counters, counting the total number of statements overall and the total per endpoint. This lets you calculate what percentage of the overall statement count each endpoint is responsible for, taking into account how many requests it gets in total. For example, an endpoint that runs only a dozen statements per request but is hit thousands of times a day accounts for far more database load than one that runs fifty statements per request and is hit a handful of times. This helped us focus on endpoints that get hit often and run lots of SQL versus ones that run lots of SQL but rarely get hit.

Our app is quite bound by the DB end of things; we have some very big queries that can get generated. We wrapped every endpoint so that a timer starts when the request hits our app and reports its value when the request terminates. With this we can see endpoints that take a long time, even though they may only be running a single query (or a few). Again, another way to let dev dig into the performance side of things. Of course if endpoints were slow for reasons other than the DB we’d see that too (though it’s really not been the case).

The best part is that when the devs get around to making improvements, either shaving down the number of queries run or the time they take, we can see the results quickly through the metrics. Honestly it makes everyone feel that much more awesome (well, as long as their fix works). It’s a good thing.

Load Testing

At this point you have a good set of metrics in place for your application, and when you run your app it generates metrics. You can gain even more insight by running load testing to generate, wait for it, loads of metrics (yuk yuk yuk). Metrics by themselves will give you insight into running applications, such as which endpoints are receiving the most traffic and how fast they are. By pairing metrics with load testing you gain insight into performance when things go past current production levels. Load testing is your crystal ball.

Wanna get a very good idea of how many requests per second your app can handle? Turn up the dial on your load tests slowly; ramp it up until the server just doesn’t want to respond (much, or at all) anymore. Some may say “I already do this and I don’t have any stinking metrics. I know how many requests per second my app handles on given hardware, yada yada yada.” So you know at what point it breaks, but can you tell me why it breaks? Tell me which trends indicate certain types of performance bottlenecks. Oh, I see, now we are back to “it’s slow because lots of web requests… and stuff”.

Even if all you wanted to do was throw money at it, would you know whether that money is best spent on memory, (faster) hard disks, more and/or faster CPUs, more nodes, etc.? That is hard to tell from a basic figure that says the app starts flaking out at X requests per second.

Luckily you have metrics, right? Now go and look at the metrics generated by your load testing. You’ll probably see a ton of things that will blow your mind. Like what, you may ask?

Well, let’s start with timers. You’ll probably see the less optimized endpoints start taking a lot of time. You’ll be thinking “hmmmm, these are weak points”. You’ll also maybe see your memory and/or CPU spike at a certain request rate. Again you’ll be thinking something like “hmmmm, something really eats up CPU and/or memory”. Counters like my SQL counter example above will point out other issues. Really, the sky is the limit here; you just have to start recording things!

What I’ve done here is explain how load testing will allow you to methodically find and deal with performance issues. You may find it interesting how interconnected load testing and gathering metrics are. It really should not be a surprise. Basically the two are parts of one whole, each useful on its own, but put together they enable very broad insights neither can provide alone.

Not Just for Developers

Metrics can provide useful information to people other than dev. How about customer service? Giving them a heads up when response times spike (expect lots of phone calls) or when a certain user just can’t seem to log in (John J. just failed login 5 times!) will allow them to make better decisions.

Speaking of which, the product folks love it too. They can see which features are used and by whom. Having numbers on the dev side of things always seems to resonate quite well. How important is that new feature? Well, I don’t know; this current feature/endpoint is taking 3+ seconds on average to load, maybe we should fix that first.

Sales? Who is using the site and who isn’t? How can we engage a customer and bring them back in when they may be ready to leave?

Marketing loves metrics, mostly for bragging rights. “We get X requests per month” for example.

More to Come

In my next post I will take some time to simply lay out a lot of the basic metrics most engineering teams will be interested in, and maybe a few that non-engineering groups would be interested in too. Later on I’ll cover actually putting some software together to deal with metrics, and to generate them too.

Locust IO 0.6.2 Gevent 1.0rc2

When using the latest release of locust.io for testing our site we hit a few gevent-related bugs. The author of gevent says to upgrade to the new version, which uses libev instead of libevent. Since 0.6.2, Locust has allowed use of the new 1.0rc2 version. I found this fixes a lot of issues, particularly around this bug.

Instructions on pip installing 1.0rc2 (or possibly a newer one by the time you read this) can be found here. Install it before you do your ‘pip install locustio’ and everything should be hunky dory.

Adding Apt GPG Key When Using Ansible

I’m using Ansible to provision, set up, and manage a server. If you want to install MariaDB, PostgreSQL, and many other packages you will want to use the best source. Often this source is provided by the projects themselves, in the form of an Ubuntu PPA or apt repository. In the case of PostgreSQL and MariaDB you’ll use one like I have. This normally means doing two things with Ansible.

First, use the apt_key module to install the GPG key of the repo. Second, use the apt_repository module to add the repository itself, after which apt-get will be able to install software from it.

The Ansible apt_key module is fairly simple, unlike the apt-key command you’d likely use directly on Ubuntu. You have to have a URL which directly references the GPG key. In the case of PostgreSQL this isn’t an issue; if you visit their download page the URL of the key is given. Not so lucky with MariaDB.

If you read MariaDB’s docs, they’ll tell you to use the apt-key command with advanced options to install the GPG key. That doesn’t work in Ansible; the apt_key module doesn’t support those options, so you need a real URL you can use. So how did I figure out what that URL is?

I started by going to the keyserver listed on MariaDB’s site. That gives you a page where you can search for the GPG key. For the search string use the key id, which in my case is ’0xcbcb082a1bb943db’. Make sure to also check the ‘Only return exact matches’ box so that you don’t get a bunch of similar but incorrect results. After you hit search you should see exactly one match, which will include a bunch of information on the signing of the key. At the top is a line that looks like this:

pub  1024D/1BB943DB 2010-02-02

You will want to click that link. You end up with this page, whose URL is the one you want for the Ansible apt_key module. Plug that in as the ‘url’ parameter and you’ll be all set.
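
Put together, the two tasks end up looking roughly like this. This is only a sketch: the keyserver URL and the repo line are illustrative stand-ins, so swap in the exact URLs you dug up above:

# Sketch only: replace the key URL and repo line with the real ones for your setup.
- name: Add the MariaDB signing key
  apt_key: url='http://keyserver.ubuntu.com/pks/lookup?op=get&search=0xcbcb082a1bb943db' state=present

- name: Add the MariaDB apt repository
  apt_repository: repo='deb http://mirror.example.com/mariadb/repo/5.5/ubuntu precise main' state=present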

Go VIM Features (on OS X)

Go is a very good programming language which has Ken Thompson‘s fingerprints all over it. If you are writing any system-level applications it’s worth taking a look at. If you really dig Python’s clean syntax, but want your app(s) to make better use of machine resources, this is the language for you. It is statically typed, but in a good way (think Haskell); it gives you some control over memory while still being garbage collected, and has a concurrency model which doesn’t punish the uninitiated.

Vim, of course, is the text editor of choice for system programmers; it runs on everything and can be extended in very powerful and useful ways. It’s an uber tool which stands the test of time.

The following is how I set up my MacBook Pro (Snow Leopard) with both Go and Vim.

Download the latest Go package from here and install it on your Mac; you’ll find all the files in /usr/local/go. In that folder you should also have a collection of Vim features (highlighting, for one) which you’ll want. Here is what I recommend.

If you aren’t already using Pathogen, install it first. It allows much cleaner management of Vim add-ons than is possible by default. You can go here to figure out how it all works, and to download and install it. Here is another article detailing Pathogen a bit more. Read both of these resources and install Pathogen. Now you are ready to install the add-ons for Go.

Once Pathogen is installed, simply do the following to install the add-ons:

cd ~/.vim
mkdir -p bundle/golang
cp -R /usr/local/go/misc/vim/* bundle/golang

Now you should be able to open a .go file and see the syntax highlighter in effect.

Grails, Validation of Map on Domain Class

Grails has a bunch of canned validators such as maxSize, blank, nullable, and so forth. These are hard workers: they’ll check values, report errors which can easily be internationalized, and can even influence auto-schema generation (a feature I advise you to ignore).

If you are reading this, maybe you googled something like “grails map validation”, which means you want to know how to properly check a map of something inside a domain class. Well, this post may or may not help you. What I was doing today was working on a way to allow arbitrary key/value pairs (strings only) to be set on my domain class. I figured I would allow 255 characters each for keys and values.

This ends up not being all that difficult, but it does require that you know how the underlying Spring Framework mechanisms work.

First off is the Errors interface, for which Grails uses the concrete implementation BeanPropertyBindingResult.

When using a custom validator you can pass a code block, or closure, that takes three parameters. In order they are the value being validated, the object that value is part of, and the Errors implementation being used for the object to handle validation and error messaging. When your validator code finds an error, you will want to tell the Errors object. There are a number of methods in the Errors interface which allow this; look at the various flavors of reject and rejectValue. It’s important to really understand how these work, because you’ll rely on getting things right here to be able to use property files to externalize and internationalize your error messages.

What I ended up doing in this closure was iterating over the key/value pairs and checking the lengths of the keys and values. If any key or value was more than 255 characters, I added an error for that field on that object to the Errors object by calling rejectValue. This worked out perfectly, and the validator works just like the ones baked into Grails. I can put a version of my message in any language and it will be properly built and reported at the proper layer of my application.

Here is the code for my validator:

attributes(validator: { attributes, obj, errs ->
	attributes.each { k, v ->
		if (k.size() > 255) {
			errs.rejectValue("attributes", "key.toobig", [k] as Object[], "Key size too big, max is 255" as String)
		}
		if (v.size() > 255) {
			errs.rejectValue("attributes", "value.toobig", [k] as Object[], "Value for key ${k} too big, max is 255" as String)
		}
	}
})

Notice that the ‘as Object[]’ and ‘as String’ are there to make sure the values are cast to the types which the methods of the Errors interface require. Remember Groovy gotcha #1739: if you use interpolation in a literal string you get a GString (yes, really, that is what they called it) and not a String, and any method that requires a String will be unhappy if you pass it a GString (yeah, imagine that).

Install MongoDB on OS X

I’ve had MongoDB installed for years on my Mac, but I found this article recently and followed the instructions to set up a better install. Using these steps you can use OS X’s launchd to stop/start/restart MongoDB and to have it run on system startup.

It’s worth doing, IMO, since I’ve had to run the server manually, which can get tedious. Also, I’m kind of a sucker for doing things the ‘right’ way, which in this case means using launchd, the preferred Mac way to handle daemons.