Saturday, June 2, 2012

Raven and 'safe by default'

I'm still rather enamoured of RavenDB; in my opinion it has the best LINQ provider of any document database.

But it has some quirks, and I encountered one just the other day - the 'safe by default' behaviour. A standard installation limits the number of requests per Raven session:

"By default, that limit is set to 30. On request #31, the session will throw an exception with detailed information about the quota violation."


There is also some rather weak justification for the existence of said limit (invoking the spectre of a DoS attack). But when is a request a request? In Raven terms, a request must result in a communication with the target server.

This is where a subtlety arises - loading a document n times might be one request or n. Loading a document that exists in the target database results in one request per session, regardless of how many times Load is called. Conversely, attempting to load a non-existent document generates a request for each load attempt (which is logical, as Raven should make no assumptions about the genesis of a document).
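To make the counting rule concrete, here is a toy model in Python. It is only a sketch of the behaviour as I observed it, not Raven's client API: `ToySession`, `RequestLimitExceeded`, `load` and `request_count` are all made-up names, and the in-memory dictionary stands in for the server.

```python
class RequestLimitExceeded(Exception):
    """Raised when a session tries to go past its request quota."""


class ToySession:
    """Hypothetical stand-in for a session; NOT the real RavenDB client."""

    def __init__(self, server_docs, max_requests=30):
        self._server = server_docs      # dict playing the role of the database
        self._loaded = {}               # documents this session already holds
        self.max_requests = max_requests
        self.request_count = 0

    def load(self, doc_id):
        # A document already held by the session costs no request.
        if doc_id in self._loaded:
            return self._loaded[doc_id]

        # Anything else means a round trip, so the quota applies.
        if self.request_count >= self.max_requests:
            raise RequestLimitExceeded(
                f"request #{self.request_count + 1} exceeds the limit "
                f"of {self.max_requests}"
            )
        self.request_count += 1

        doc = self._server.get(doc_id)
        if doc is not None:
            # Only documents that exist are remembered; a miss is retried
            # against the server on every call.
            self._loaded[doc_id] = doc
        return doc


def demo():
    session = ToySession({"users/1": {"name": "Tony"}})
    for _ in range(50):
        session.load("users/1")         # one request in total
    existing_cost = session.request_count

    for _ in range(29):
        session.load("users/missing")   # one request per attempt
    return existing_cost, session.request_count


if __name__ == "__main__":
    print(demo())
```

In this model, fifty loads of the existing document cost a single request, while each load of the missing one costs a request of its own, so a loop over a missing id is what runs into the quota.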

In my case, it was easy to work around: load the document once and retain a reference to the result (even if it was null). And sure, Raven allows you to modify the number of requests allowed in a session, individually or globally.

But why even bother? Why have a limit that can only be changed programmatically and not by configuration? And 30 was arrived at by what analysis? The property holding the maximum number of requests is also typed as a signed integer, where an unsigned integer would be the more logical choice.

I would prefer to have this 'feature' controlled by configuration, and have a 'switch' option - that is, be concerned about request numbers or not, rather than some arbitrary integer limit.
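As a rough illustration of what I mean, here is a sketch in Python of a limit driven by configuration, with an on/off switch. Nothing here exists in Raven: the setting names `session.enforceRequestLimit` and `session.maxRequests`, the `RequestPolicy` class and `policy_from_config` are all invented for the example.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RequestPolicy:
    """Hypothetical policy object; not part of any real RavenDB API."""

    enforce: bool = True
    max_requests: int = 30

    def check(self, requests_so_far: int) -> None:
        """Raise if one more request would break the quota; no-op when off."""
        if self.enforce and requests_so_far >= self.max_requests:
            raise RuntimeError(
                f"session quota of {self.max_requests} requests exceeded"
            )


def policy_from_config(settings: Mapping[str, str]) -> RequestPolicy:
    """Build a policy from string settings (the key names are assumptions)."""
    raw_switch = settings.get("session.enforceRequestLimit", "true")
    enforce = raw_switch.strip().lower() == "true"

    limit = int(settings.get("session.maxRequests", "30"))
    if limit < 0:
        # Python has no unsigned integer, so reject negatives explicitly.
        raise ValueError("session.maxRequests must not be negative")

    return RequestPolicy(enforce=enforce, max_requests=limit)
```

With the switch set to anything other than "true", `check` never raises, which is the 'don't care about request numbers' option; otherwise the limit comes from the config file rather than from code.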

3 comments:

Sam Stephens said...

Personally I'm a fan of the safe by default behavior. He's not trying to justify it to prevent a DOS, it's to protect from SELECT N+1. He's just comparing SELECT N+1 to a denial of service.

The safe by default rule would have saved me from some embarrassing code early in my usage of Entity Framework, loading child entities lazily one by one in my naivety.

Considering that a session is supposed to have the lifetime of a request or similar, it sounds like the rule did its job - finding that you were loading a document N times when you could have loaded it once.

Tony Beveridge said...

I know you're an Ayende fan so I'll measure my response :-)

An arbitrary limit that can only be modified via programmatic means and can surface under subtle conditions is poor. I don't need or want to be 'nannied' by RavenDB.

And I was only 'loading' a document n times when it does not actually exist. If Raven is going to implement such limits, it should be done with a lot more finesse than is currently apparent - why not allow for pluggable policies, for example?

Sam Stephens said...

Agree 100% with your arbitrary limit statement.

I like the idea of preventing bad programming practices such as SELECT N+1. I can't see a simple way to spot SELECT N+1 specifically. I'm not opposed to configuration that says more than 30 requests is a mistake. I know this is arbitrary, but you have to set some limit. I don't think there are many scenarios where you should be making more than 30 requests.

I think there are two major issues here. One is that the behaviour appears to be active in production, which it shouldn't be. The other is that, as you say, the behaviour should be configurable.

Thinking about this, my take is
- The maximum should appear in configuration
- When you add RavenDB to a project using NuGet, it should add the default to your config file. This way the default is explicit.
- The limit should only be active when debug is on.

I don't imagine Ayende implementing pluggable policies here, I'm guessing he'd view it as unwarranted complexity :-)