Too Much Happening, Filtering the Overload 1
Tonight I was drifting through Upcoming to see what’s happening in my area. Turns out the first page of 50 some events only covered today…. and its a ‘boring’ Wednesday. As I flipped through page after page, eventually getting all the way up to this upcoming Sunday… it occurred to me that there is way way too much happening within a mere 30 miles of me.
Of course, being a geek I’ve been thinking about ways to try and narrow down this list so I can actually get a useful list for what I’d prolly be interested in for the next 45 days. There’s just no way I can look through 1200 events to see what’s going on for the next month, and I like to plan ahead.
A few things I think would help to begin with would be keyword filtering, of course the next step that requires would be some way to build a useful keylist of what should be flagged for my perusal. Trying to generate a full list of every keyword I might be interested in is going to be pretty tough, though I finally thought of the one application I have which could yield the best list of keywords to filter on… Delicious Library.
It already knows the vast majority of books I own, and all my DVD’s. That’s not a bad starting point, toss in all the music I’ve bought from scouring all the id3 tags on my digital music and that’ll cover my music. I really need a way to pull all my previously watched, and upcoming movies from Netflix, in addition to mining all the data of who was in each flick (IMDB should have that). After which I should have a decent set of keywords to ensure I don’t miss either some special premier with an actor I might like, or a comedian doing standup.
Is there anything like this already that I just haven’t found? Clearly, I’d want to keep this massive keyword list personal, so the software to pull from upcoming would narrow down the result set locally. I’m still a bit too privacy obsessed to consider putting such a massive amount of information into a single vendors hands. :)
SQLAlchemy, Declarative layers, and the ORM 'Problem' 16
There’s been a bit of talk on the Pylons devel list regarding the recommended way to use SQLAlchemy with Pylons mainly regarding how to use SA (SQLAlchemy) in a fashion that is well documented and easy to work with (and maintain!).
Prior to Pylons 0.9.6 and SQLAlchemy 0.4 it was a bit of a mess, with the framework needing to load the config (since thats where your db settings are), then setup globals for SA…. eek. Mike Orr had a good intermediary solution for SA 0.3 called SAContext that handled many of the tricky parts. Unfortunately, this actually caused even more confusion as more ways of doing the same thing came about. SAContext solved some of the global config grabbing issues, but the additional layer of indirection made trouble shooting even harder (despite how small of a library it was).
Less is More
So the fix? Less intermediary layers, less indirection… essentially, KISS. Despite how much Pylons was attempting to help a user to get the db going, the additional layers in the end actually caused more problems then they solved. Of course, I shouldn’t have been too surprised…. Mike Bayer did warn me about many of these things at the beginning. Being overly eager to make things “easier” for new users, I ignored him. :)
This is why Pylons does not recommend Elixir, and with SA 0.4 the recommended usage of SA is to use its plain mapper functionality should you need an ORM layer. Yes, that’s right, despite almost every web framework out there pushing its ORM on you (or someone else’s ORM), there are many times when an app doesn’t even need a full-blown ORM.
Declarative vs Basic SA
For a better look at why one might consider additional layers on SA a bad thing lets compare a fairly basic table setup consisting of users and groups. Each user can be in multiple groups, and lets use proper referential integrity to ensure that groups aren’t deleted when users are still in them.
Compare the following two ways of setting up a basic many to many relation and the tables:
SQLAlchemy 0.4
from sqlalchemy import Column, ForeignKey, MetaData, Table, types
from sqlalchemy.orm import mapper, relation
metadata = MetaData()
person_table = Table('person', metadata,
Column('id', types.Integer, primary_key=True),
Column('name', types.String, nullable=False),
Column('age', types.String)
)
group_table = Table('group', metadata,
Column('id', types.Integer, primary_key=True),
Column('name', types.String, nullable=False)
)
persongroups_table = Table('person_groups', metadata,
Column('person_id', types.Integer, ForeignKey('person.id', ondelete='CASCADE'), primary_key=True),
Column('group_id', types.Integer, ForeignKey('group.id', ondelete='RESTRICT'), primary_key=True),
)
class Person(object):
pass
class Group(object):
pass
mapper(Person, person_table, properties=dict(
groups=relation(Group, backref='people', lazy=False)
))
mapper(Group, group_table)
Elixir
from elixir import *
class Person(Entity):
has_field('name', String)
has_and_belongs_to_many('groups', of_kind='Group')
class Group(Entity):
has_field('name', String)
has_and_belongs_to_many('people', of_kind='Person')
On first glance, its pretty obvious that everyone should love Elixir vs the obviously more tedious SA approach of layout out your tables, then mapping them to the class objects. However, look at these two examples, and try to quickly answer the following questions:
- How do you add a column to the many to many table to store an additional bit of info for the join?
- Do they both enforce referential integrity?
- How do you control whether SA is eager loading the relation? Can you restrict it to just one column of the relation?
- What are the table names used?
- How many tables are in your database?
- Where do you change the id column name?
- Which one is closer to the Zen of Python?
I think the explicit setup makes many of these questions easier to answer just at a glance. Those with enough Elixir experience can fairly easily answer most of these questions, but consider what that implies. Not only do you need to know SQLAlchemy options and parameters, but you need to know Elixir options and how they map to the SQLAlchemy functions. The desire to reduce the up-front setup of the ORM actually increases the amount of knowledge a user has to have in order to use it, and the most worrisome aspect… how to troubleshoot it.
Setups that Grow with You
With Pylons, a goal has been to provide out of the box recommendations that grow with you. That is, using the set of recommended tools may not be as apparently “easy” as some other frameworks. However, the pay-off is that you don’t hit a wall in 2 months when your application inevitably gets a little more advanced and needs to do something the simple tools either can’t do at all, or it’s incredibly difficult to do even slightly complicated things (eager load 2 columns off a related table, but not all of them). This way, the toolset you learned, you can keep using as you get more advanced and you don’t “outgrow” your tools.
While Elixir definitely appears to be easier at first glance, when you need to get more complicated you can’t exactly turn to the SA docs since Elixir has put a layer between you and SA. This can be very crippling when you eventually hit a wall, and so much ‘magic’ is wrapped up in the declarative layer that you have to troubleshoot additional layers of code when something goes wrong. The result of this is that to effectively use Elixir in complicated setups with SQLAlchemy, you need to really really know both of them which actually requires more work for a user than plain SQLAlchemy.
The SA example clearly requires a little more up-front setup, however, are you really adding tables to your database every day? How often are you going to be actually touching the table and mapper code, vs just adding domain model methods to your Person/Group class? Did the layer make it easier or harder to use multiple databases and/or put more between you and advanced SA functionality you might need later?
Adam Gomaa pointed out some interesting issues with Django’s ORM and Elixir but unfortunately tries to do the same thing Elixir and TurboEntity do…. add more layers. More layers more indirection more to wade through when you need to do something that should be pretty basic with SQLAlchemy (and is probably nicely documented on the SA site, which won’t help with these layers until you dig through the layer to find the basic SA objects the SA site refers to…).
What really makes a lot of this even more trouble-some with SA, is that when setting up complex relationships, the order of declaring table objects becomes important, since relations need to refer to them and the ORM classes. This usually results in some interesting metaclass hackery when you have these Entity’s in multiple modules, importing each other, and doing other fairly common stuff.
SQLAlchemy 0.4
In the end, I’ve been using plain SQLAlchemy 0.4 (at beta5 now, but quite stable) a lot lately, and its really great. Yes, setting up the tables (generally a one-time thing) took me probably 15 mins longer than it would’ve with Elixir. But I’m fairly certain I’ve saved myself significantly more time in the long run since I won’t need to worry about diving into Elixir code to try and find SA objects when I need a complex query, or trying to figure out how to hack Elixir’s connection should I need multiple db connections at once, etc.
So please, new users to SQLAlchemy, use just SQLAlchemy. It definitely seems daunting at first, but the flexibility and comprehensive documentation give you a solution that scales to meet your needs, with no walls in sight.
On a side-note, its interesting to compare my position on this issue to the Django team lack of AJAX helpers. The Django team rightfully claims that Javascript isn’t that hard, so “get over it” and learn a nice Javascript library so you can do powerful things. Also note that by including AJAX helpers, Pylons is encouraging one part that doesn’t scale… as the AJAX helpers will have you hitting a wall sooner or later.
Routes planning and the road to Routes 2.0 13
Routes has been a wonderfully successful project of mine as its used not just in Pylons but quite a few other small WSGI apps. It’s even been integrated as a CherryPy URL dispatching option for those using TurboGears 1.x or plain CherryPy. It’s come a long way since 1.0 and has diverged on quite a few fronts from the Rails version that it was originally duplicating in functionality, which has me thinking that perhaps its time to consider 2.0 and what that will mean.
First, I think there’s quite a few behaviors that don’t make sense and probably never have. The Route minimizing functionality sounds neat and fulfills the Rails ideal of ‘keeping urls pretty’ yet suffers from a fundamental flaw… they result in multiple valid URL’s for the same page. James Tauber covers some of the implications of this in addition to the issue with relative URL’s not working right either. The example he cites is even worse for Routes since Routes can easily result in multiple URL’s mapping to the same controller action.
Routes 2.0
To solve the multiple URL issue, and also address some Named Routes confusion that recently hit the Pylons mailing list, Routes 2.0 will do a few things differently than Routes 1.X. Many things will be significantly more explicit and more predictable as a result. In addition, I want to add more functionality so that Routes can be the end-all of URL generation in addition to URL dispatching, even for use purely generating links conditionally that aren’t even used for URL dispatching.
Some of the key changes planned for Routes 2.0:
- Routes recognition and generation will always be explicit Currently, there’s an explicit option that removes the keyword controller/action/id defaults, 2.0 will not have these implicit defaults.
- Defaults don’t cause minimization Defaults just mean they’ll be used, they won’t result in Route minimization which increases the amount of URL’s that end up matching to a single controller action.
- Trailing slashes shouldn’t be an issue Without routes being minimized, a route such as ’/home/index’ will always be ’/home/index’ instead of being minimized to /home/. This will resolve the trailing slash issue.
- Named Routes will always use the route named A named route currently just means that the defaults for that route will be used during route generation. This currently can cause confusion because people believe that the named route actually forces that route to be used during generation.
- Generation-only routes A new option, which will result in routes that are used purely for generation. This option will likely be used primarily for static resources which may be on other servers, or may need the domain rotated so that the browser can do parallel resource loading. For example, one would be able to provide a list of domains to be used, and the generated links will rotate as desired on the page to split page resources over multiple domains.
- Redirect routes Sometimes, (especially when replacing legacy apps), its desirable to slowly migrate URL scheme’s from the old to the new rather. While URL’s should never change, sometimes the system being replaced has horrid URL’s that violate all URL recommendations. Being able to provide a smooth migration path from the old URL’s to the new ones is handy, and permanent redirects are respected by many search crawlers as well.
Migration and Compatibility
Routes 1.8 will include options to turn on behavior that will be the default in Routes 2.0, and if you like how Routes 1.X works there’s no need to worry, it will still be maintained for the foreseeable future. It’s currently extremely stable, and has a massive unit test suite to ensure it operates as designed.
Add Your Desired Feature Here
What other features are Routes users currently looking for?
Next-Gen DVD Wars Give me the Blues... 10
I’ve been seeing more and more movies I wouldn’t mind actually buying… except that I have a 56” HD TV set. It looks amazing, pretty good with DVD’s, but really amazing with HD so if I’m going to buy a movie I sure as heck want it in HD.
Unfortunately thanks to the tech companies refusing to come up with a single standard, I can either buy a $200 HD-DVD add-on for my Xbox 360, or a PlayStation 3 which I’ve heard is the cheapest Blu-Ray DVD player available. And of course, I really need to buy both if I want HD movies since some studios are only putting their movies out on one or the other format.
The end result? I’m not buying HD movies for HD-DVD or Blu-ray because its pathetic to have to have 2 players around (no, I don’t care about universal players). In addition, I’m not buying DVD’s anymore since I know that in a year or two I’ll want to re-buy it in whatever HD format wins (please let one of them win in a year or two!!!).
What are other people doing in this situation? Buying one or the other? Still buying DVD’s? Or not buying any movies at all until they get it straight?
Making logging friendly in webapps 2
Finding out whats going on in an application is always a key point when trying to figure out what’s happening when things don’t go the way you expect. In these kind of annoying situations, following the log of how your request mapped in and what was going on can save a significant amount of time. TurboGears has had great logging throughout it, and with 0.9.6, Pylons add’s the same thorough logging.
But once the entire request is mapped out, it can be a real chore to track through them, toggle modules you want and don’t want logging for, etc. Chainsaw has had excellent graphical support for navigating logging output, and with the help of Phil Jenvey’s XMLLayout package its easy to output logging statements in a format for use with it.
With the recently updated Chainsaw section in the Pylons logging docs, you too can be lumberjacking with Chainsaw.
And of course, a shot of Chainsaw in action:






