SQLAlchemy, Declarative layers, and the ORM 'Problem' 16

Posted by ben Tue, 04 Sep 2007 22:40:06 GMT

There’s been a bit of talk on the Pylons devel list regarding the recommended way to use SQLAlchemy with Pylons mainly regarding how to use SA (SQLAlchemy) in a fashion that is well documented and easy to work with (and maintain!).

Prior to Pylons 0.9.6 and SQLAlchemy 0.4 it was a bit of a mess, with the framework needing to load the config (since thats where your db settings are), then setup globals for SA…. eek. Mike Orr had a good intermediary solution for SA 0.3 called SAContext that handled many of the tricky parts. Unfortunately, this actually caused even more confusion as more ways of doing the same thing came about. SAContext solved some of the global config grabbing issues, but the additional layer of indirection made trouble shooting even harder (despite how small of a library it was).

Less is More

So the fix? Less intermediary layers, less indirection… essentially, KISS. Despite how much Pylons was attempting to help a user to get the db going, the additional layers in the end actually caused more problems then they solved. Of course, I shouldn’t have been too surprised…. Mike Bayer did warn me about many of these things at the beginning. Being overly eager to make things “easier” for new users, I ignored him. :)

This is why Pylons does not recommend Elixir, and with SA 0.4 the recommended usage of SA is to use its plain mapper functionality should you need an ORM layer. Yes, that’s right, despite almost every web framework out there pushing its ORM on you (or someone else’s ORM), there are many times when an app doesn’t even need a full-blown ORM.

Declarative vs Basic SA

For a better look at why one might consider additional layers on SA a bad thing lets compare a fairly basic table setup consisting of users and groups. Each user can be in multiple groups, and lets use proper referential integrity to ensure that groups aren’t deleted when users are still in them.

Compare the following two ways of setting up a basic many to many relation and the tables:

SQLAlchemy 0.4

from sqlalchemy import Column, ForeignKey, MetaData, Table, types
from sqlalchemy.orm import mapper, relation

metadata = MetaData()

person_table = Table('person', metadata,
    Column('id', types.Integer, primary_key=True),
    Column('name', types.String, nullable=False),
    Column('age', types.String)
)

group_table = Table('group', metadata,
    Column('id', types.Integer, primary_key=True),
    Column('name', types.String, nullable=False)
)

persongroups_table = Table('person_groups', metadata,
    Column('person_id', types.Integer, ForeignKey('person.id', ondelete='CASCADE'), primary_key=True),
    Column('group_id', types.Integer, ForeignKey('group.id', ondelete='RESTRICT'), primary_key=True),
)

class Person(object):
    pass

class Group(object):
    pass

mapper(Person, person_table, properties=dict(
       groups=relation(Group, backref='people', lazy=False)
))
mapper(Group, group_table)
Elixir

from elixir import *
class Person(Entity):
    has_field('name', String)
    has_and_belongs_to_many('groups', of_kind='Group')

class Group(Entity):
    has_field('name', String)
    has_and_belongs_to_many('people', of_kind='Person')

On first glance, its pretty obvious that everyone should love Elixir vs the obviously more tedious SA approach of layout out your tables, then mapping them to the class objects. However, look at these two examples, and try to quickly answer the following questions:

  • How do you add a column to the many to many table to store an additional bit of info for the join?
  • Do they both enforce referential integrity?
  • How do you control whether SA is eager loading the relation? Can you restrict it to just one column of the relation?
  • What are the table names used?
  • How many tables are in your database?
  • Where do you change the id column name?
  • Which one is closer to the Zen of Python?

I think the explicit setup makes many of these questions easier to answer just at a glance. Those with enough Elixir experience can fairly easily answer most of these questions, but consider what that implies. Not only do you need to know SQLAlchemy options and parameters, but you need to know Elixir options and how they map to the SQLAlchemy functions. The desire to reduce the up-front setup of the ORM actually increases the amount of knowledge a user has to have in order to use it, and the most worrisome aspect… how to troubleshoot it.

Setups that Grow with You

With Pylons, a goal has been to provide out of the box recommendations that grow with you. That is, using the set of recommended tools may not be as apparently “easy” as some other frameworks. However, the pay-off is that you don’t hit a wall in 2 months when your application inevitably gets a little more advanced and needs to do something the simple tools either can’t do at all, or it’s incredibly difficult to do even slightly complicated things (eager load 2 columns off a related table, but not all of them). This way, the toolset you learned, you can keep using as you get more advanced and you don’t “outgrow” your tools.

While Elixir definitely appears to be easier at first glance, when you need to get more complicated you can’t exactly turn to the SA docs since Elixir has put a layer between you and SA. This can be very crippling when you eventually hit a wall, and so much ‘magic’ is wrapped up in the declarative layer that you have to troubleshoot additional layers of code when something goes wrong. The result of this is that to effectively use Elixir in complicated setups with SQLAlchemy, you need to really really know both of them which actually requires more work for a user than plain SQLAlchemy.

The SA example clearly requires a little more up-front setup, however, are you really adding tables to your database every day? How often are you going to be actually touching the table and mapper code, vs just adding domain model methods to your Person/Group class? Did the layer make it easier or harder to use multiple databases and/or put more between you and advanced SA functionality you might need later?

Adam Gomaa pointed out some interesting issues with Django’s ORM and Elixir but unfortunately tries to do the same thing Elixir and TurboEntity do…. add more layers. More layers more indirection more to wade through when you need to do something that should be pretty basic with SQLAlchemy (and is probably nicely documented on the SA site, which won’t help with these layers until you dig through the layer to find the basic SA objects the SA site refers to…).

What really makes a lot of this even more trouble-some with SA, is that when setting up complex relationships, the order of declaring table objects becomes important, since relations need to refer to them and the ORM classes. This usually results in some interesting metaclass hackery when you have these Entity’s in multiple modules, importing each other, and doing other fairly common stuff.

SQLAlchemy 0.4

In the end, I’ve been using plain SQLAlchemy 0.4 (at beta5 now, but quite stable) a lot lately, and its really great. Yes, setting up the tables (generally a one-time thing) took me probably 15 mins longer than it would’ve with Elixir. But I’m fairly certain I’ve saved myself significantly more time in the long run since I won’t need to worry about diving into Elixir code to try and find SA objects when I need a complex query, or trying to figure out how to hack Elixir’s connection should I need multiple db connections at once, etc.

So please, new users to SQLAlchemy, use just SQLAlchemy. It definitely seems daunting at first, but the flexibility and comprehensive documentation give you a solution that scales to meet your needs, with no walls in sight.

On a side-note, its interesting to compare my position on this issue to the Django team lack of AJAX helpers. The Django team rightfully claims that Javascript isn’t that hard, so “get over it” and learn a nice Javascript library so you can do powerful things. Also note that by including AJAX helpers, Pylons is encouraging one part that doesn’t scale… as the AJAX helpers will have you hitting a wall sooner or later.

Comments

Leave a comment

  1. Avatar
    Nicholas about 10 hours later:

    Excellent post Ben!

    I could not agree more. One has to be careful with all these layers and ‘we make it easy for you’ approach. Explicit is better then implicit.

    Thats also the reason why I love it that Pylons is using Mako instead of Pylons and at the ‘Myghty time’ wondered why Pylons wasn’t using Cheetah. I had a hard time understanding that i had to ignore lots of functionality in Myghty.

    Concerning the webhelpers, i do think that in its essence Javascript or SQL isn’t that hard. However I do have a hard time remembering the correct syntax the now and then. The ORM i use helps me for that on the SQL side, and the webhelpers help me with the javascript. I think their great and their code is fairly easy to understand even though it might be a bit inconsistent. I would love a more OO approach on top of some sort of library where we use PyPy to convert the®python code to javascript.

  2. Avatar
    Adam Gomaa about 17 hours later:

    Good points. SQLAlchemy does model SQL very well, but I think the goal of a web framework ORM is to let the programmer think in terms of objects rather than tables. That’s my fundamental problem with the SQLAlchemy syntax – it’s explicit, and, while relatively verbose, pretty easy to abstract away – but the classes that you write have no relation to the DB until SQLAlchemy puts them there with mappers.

    But you’re right, I dunno if a declarative layer is the right solution. I think the OOP syntax that is being discussed for Elixir would help illustrate the advantages – writing natural classes is enormously better than writing SQL-in-Python.

  3. Avatar
    Paul about 18 hours later:

    I think part of the elixir user base comes from the fact that the Pylons screencast off of pylonshq uses elixir. So far, there aren’t any screencasts using SQLAlchemy.

  4. Avatar
    Ben Bangert about 19 hours later:

    @Adam: Yes, however the database is not objects. Which means if you make it too object-y with no way to easily see and use the real db structure, you impose a ton of hardships on yourself when you do need real db functionality (getting a list of unique tags in the tag table that were used for your pastebin posts, along with a count of how many pastebin posts go to each one).

    The database these ORM’s tie to is not an object database, its a RDBMS. As such, I believe the SQLAlchemy approach of having your tables, and mapping them to objects as desired is a great way to get the power of using objects when desired, but easily being able to do more ‘raw’ db queries when needed.

    Consider some of the other problems with SQLAlchemy declarative layers:

    • Upgrades: Does the declarative layer upgrade force your SQLAlchemy upgrade? Do they stay in sync? You now have 2 packages with dependent versions to keep track of.
    • Maintenance: Will you remember what the declarative layer does when you pick up your project again in 2 months? How much it swept under the rug and away from you can become a maintenance hindrance.

    Also related to maintenance is the effort it takes to bring someone new up to speed. With Elixir or some other layer, someone new coming into the project that may just need to add a more efficient db query, now has to dig through one more layer to find the objects necessary to use with the SQLAlchemy docs.

    I really don’t believe a db session query like:
    
    Session.query(User).filter_by(name="Fred")
    

    is ugly. It’s very clear and explicit, flexible, supports standard Python splicing to apply limits and offsets for you, etc. I find it a lot nicer than the extremely ‘un-pythonic’ something__contains() functions in Django’s ORM. It’s just bizarre to me, to see double underscores in the middle of a function call.

  5. Avatar
    John "Z-Bo" Zabroski 1 day later:

    In a nutshell, the basic priorities of any API: Correct, Explicit, Fast, and in that order.

    About AJAX Helpers… Providing JavaScript helpers is a complicated issue for most frameworks, primarily due to licensing and modularity. Suppose you wanted to use a freely available JavaScript library, but only wanted to use a particular feature of that library. You have a conflict that needs a resolution. Deciding on one library arbitrarily will force the web framework’s users into your JavaScript solution… something they might not be willing to accept. You could develop a modular platform that allows user’s to plug-in their own JavaScript solution, but most available JavaScript libraries are not modular and don’t lend themselves well to being plugged in.

  6. Avatar
    Robert Brewer 1 day later:

    Ben, you’ve restored my faith in Python frameworks. Model reality faithfully and leave the indirection to Java.

  7. Avatar
    Jonathan LaCour 1 day later:

    Nice post Ben! I wrote a bit of a rebuttal on my blog, although, as I state there, I think you have a lot of good points.

    Layers aren’t inherently bad, its a matter of knowing the right level of abstraction for your problem, and being able to escape the abstraction when needed. Elixir’s biggest problems are that we picked a weird syntax (this is being fixed) and that its poorly documented (we’re working on this as well). Its not the right solution for every problem, but neither is SQLAlchemy’s ORM, or Pylons, or anything else! Know your problem, pick the right tools to help you solve it quickly and effectively.

    Also, I am surprised that a guy like Robert Brewer posted a comment like that when his own Geniusql has the same amount of indirection as Elixir!

  8. Avatar
    Daniel Haus 1 day later:

    You’re right, Ben, indirection and layers over layers can cause dangerous and frustrating consequences. Basically purity and transparency are essential.

    On the other hand, shortcuts and macros can save huge amounts of time and work, and can make life much easier at some points, if done right. Somehow this reminds me on Python vs. Ruby, C vs. C++, LFS vs. Ubuntu, Eclipse vs. TextMate. It’s hard decisions and a solid foundation for countless flamewars.

    I think, when it comes down to it, the answer just depends on how you implement things, and on finding the right level of abstraction for the purpose. So far I’ve done a couple of applications based on Elixir and never experienced any of the problems you mention above. Some of your points actually are explained in the docs (at least indirectly), and unfortunately, some aren’t (yet).

    As Jonathan points out, hopefully future releases of Elixir will clear out most of these issues and the new website and wiki will help us to heavily improve the documentation.

    The combination of Pylons and Elixir is working awesome for me and I can’t imagine building webapps any other way. Let’s go and fix it!

  9. Avatar
    Gaëtan de Menten 1 day later:

    Disclaimer: I’m one of Elixir authors.

    I’m a bit saddened by your post, because it doesn’t paint Elixir in a very bright way while I thought you were a supporter of it.

    1. How do you add a column to the many to many table to store an additional bit of info for the join?
    2. Do they both enforce referential integrity?
    3. How do you control whether SA is eager loading the relation? Can you restrict it to just one column of the relation?
    4. What are the table names used?
    5. How many tables are in your database?
    6. Where do you change the id column name?

    You’re dawn right that the users of Elixir ought to be able to answer those questions. But, in my opinion, this is a documentation issue and I’ll fully agree with you that the current documentation of Elixir is nowhere. The spirit of Elixir has always be to try to do the repetitive tasks for you, not to hide them from you. But I realize that without proper documentation on what exactly happens behind the scene, hiding is exactly what happens.

    1. Which one is closer to the Zen of Python?

    Ok, now, just for the fun of it, let’s answer that question by reviewing each of the relevant items in the Zen of Python:

    Beautiful is better than ugly.

    -> Elixir wins (1-0)

    Explicit is better than implicit.

    -> SA obviously wins this one. We could even double the score of that one because I think it’s important. That brings us to (1-2).

    Simple is better than complex.

    -> Elixir arguably wins (2-2)

    Readability counts.

    -> Elixir wins (3-2)

    So, if you ask me, I’d say Elixir is closer to the Zen of Python for declaring entities than pure SQLAlchemy. Now, seriously, it’s clearly a matter of taste.

    Upgrades and Maintenance?

    Yep, that’s a potential problem. Though we try to keep in sync with SQLAlchemy.

    With Elixir or some other layer, someone new coming into the project that may just need to add a more efficient db query, now has to dig through one more layer to find the objects necessary to use with the SQLAlchemy docs.

    Here is again a documentation issue. We should really make it clearer on what you should look up in Elixir doc and what is SA land. I mean, that an user should know where to look, as much as he should know if he needs to look at the SQL layer documentation or the ORM documentation.

    I really don’t believe a db session query like: Session.query(User).filter_by(name=”Fred”) is ugly.

    Neither do I. But since Elixir’s query system is SQLAlchemy query system, I don’t see any problem here. And it’s a perfectly valid line for querying Elixir entities. You also have access to a shortcut syntax as so:

    User.query.filter_by(name=”Fred”)

    But I don’t see what’s ugly about that.

  10. Avatar
    Ben Bangert 1 day later:

    @Gaëtan

    Simple is better than complex.

    Consider that the SA way of declaring things is rather simple (granted its more verbose as well). While Elixir uses a rather complex metaclass to assemble the equivilant. When something breaks during such assembly, I feel sorry for anyone having to debug where in the metaclass, and when, the problem occured.

    Readability Counts

    I agree with you, and think SA’s rather readable as well. It’s incredibly obvious that you have db tables, that get mapped to a class with the SA approach. Let’s call it a tie. :)

    Yes, its definitely debatable on many aspects of the Zen of Python.

    I really do want to see Elixir become a good choice for SQLAlchemy users. My main point in this article was to try to emphasize that Elixir does not actually make a new users life as happy as one might expect because it actually does imply that a user needs to know more about whats going on, not less.

    When new users to Pylons see Elixir, due to its more concise syntax they usually assume they need to know less about SQLAlchemy, when in fact they need to know more. They need to know both SQLAlchemy, and they need to know the handy work Elixir is doing for them.

    It’s somewhat ironic that one of my biggest problems in my own usage of Elixir was my own extension to it, associable. It was built heavily on a sample written by Mike Bayer, that utilizes a rather complex model. I seriously doubt anyone (probably only Gaëtan, Mike, and myself) really understands what associable actually does.

    It’s extensions like these and those that build on them that I think may actually do more harm than good. Few users have much chance of trouble-shooting such complex relationships that were built behind their back.

    Overall I am not against Elixir for those users who are advanced enough to understand what it does, and how it does it. If you know SQLAlchemy well, are tired of writing the verbose table syntax, and understand what Elixir is doing for you—Elixir is a great option and definitely is getting better.

    Personally though, after using Elixir for awhile, I’ve found SQLAlchemy 0.4 to require very little additional work and at the moment - since the SA docs are so thorough - its been easier to tweak to my needs than with Elixir.

  11. Avatar
    Robert Brewer 1 day later:

    Also, I am surprised that a guy like Robert Brewer posted a comment like that when his own Geniusql has the same amount of indirection as Elixir!

    I think you mean “Dejavu”. Geniusql is much more like the lower levels of SA (without even SA’s own ORM).

    But my point was not about Elixir per se. It was more about the insane levels of abstraction I see being built up in every Python framework, many of which seem to have nobody trying to limit their breadth or depth. The combined cognitive load is untenable, and we all need to look for opportunities to reduce the number of layers, particularly nested ones. This post tells me Ben is thinking along those lines, and I’m happy to see that ray of hope. :)

  12. Avatar
    Jonathan LaCour 1 day later:

    @Robert

    Oops! Sorry about confusing the two :) Thanks for the clarification!

  13. Avatar
    Jonathan LaCour 1 day later:

    @Ben

    Great discussion! We have discussed this before, but I think that you are correct that Elixir will be much less confusing and much more effective and useful to users after they have an understanding of SQLAlchemy.

    Also, as for the extensions, I think that having powerful extensions that do a lot of boilerplate work for you is very important. I think the problem comes from having them not document what they are doing :) If they are well documented, and make it clear what tables are created, and what is happening underneath the covers, then you’ll always be able to break the abstractions, and drop down to the table objects when you need to.

  14. Avatar
    andybak 2 days later:

    An excellent argument against ORM’s and for learning SQL. Down with layers!

    ;-)

  15. Avatar
    Johan Samyn 24 days later:

    I understand Ben’s point, and also the ease of tools like Elixir. Apart from full docs, could it be of any help to have a tool that generates/shows the full boilerplate SA code that’s behind the Elixir way, so developers can read it and better understand what Elixir is doing for them ? As a way of checking if your Elixir code does what you expect it to. I’d consider such a tool as some kind of documentation too.

  16. Avatar
    tuan 7 months later:

    I think it’s silly to denigrate Elixir, it’s simply a greater level of abstraction for those who don’t want to get caught up in the details of DDL. Abstraction allows for faster dev time. Sure, it gets in the way if you want to know the details- that’s always been the tradeoff. . .One could make the same argument against Python vs. C. :)

Comments