Showing posts with label No-Sql. Show all posts

Saturday, June 2, 2012

Raven and 'safe by default'

I'm still rather enamoured of RavenDB; in my opinion, it has the best LINQ provider of any document database.

But it has some quirks, and I encountered one just the other day: the 'safe by default' behaviour. A standard installation limits the number of requests per Raven session:

"By default, that limit is set to 30. On request #31, the session will throw an exception with detailed information about the quota violation."


There is also some rather weak justification for the existence of said limit (invoking the spectre of a DoS attack). But when is a request a request? In Raven terms, a request is something that actually results in communication with the target server.

This is where a subtlety arises: loading a document n times might cost one request or n. Within a session, loading a document that exists in the target database results in one request regardless of how many times Load is called; conversely, attempting to load a non-existent document generates a request for every load attempt (which is logical, as Raven should make no assumptions about the genesis of a document).
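To make the distinction concrete, here is a toy, in-memory model of the counting rules described above, assuming (as observed here) that a session remembers documents it has found but not ids it failed to find. SimulatedSession and its members are hypothetical names; this is not the RavenDB client API, just a sketch of the behaviour.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model, NOT the RavenDB client: hits are cached in the session,
// misses are not, and the request that exceeds the quota (default 30) throws.
public class SimulatedSession
{
    private readonly IDictionary<string, string> server;   // stands in for the database
    private readonly Dictionary<string, string> identityMap = new Dictionary<string, string>();

    public int MaxNumberOfRequests { get; set; } = 30;
    public int RequestCount { get; private set; }

    public SimulatedSession(IDictionary<string, string> server)
    {
        this.server = server;
    }

    public string Load(string id)
    {
        // Already found earlier in this session: served locally, no request made.
        if (identityMap.TryGetValue(id, out var cached))
            return cached;

        // Anything else is a round trip to the "server" and counts against the quota.
        RequestCount++;
        if (RequestCount > MaxNumberOfRequests)
            throw new InvalidOperationException(
                $"Request #{RequestCount} exceeds the limit of {MaxNumberOfRequests} requests per session");

        if (server.TryGetValue(id, out var doc))
        {
            identityMap[id] = doc;   // only hits are remembered
            return doc;
        }
        return null;                 // misses are not remembered, so they are re-requested
    }
}
```

Loading an existing document ten times costs one request here, while loading a missing one ten times costs ten, which is exactly how a harmless-looking loop can hit request #31.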

In my case, it was easy to work around by loading a document once and retaining a reference to it (even if it was null). And sure, Raven allows you to change the number of requests allowed in a session, either per session or globally.
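The workaround amounts to remembering the outcome of each load yourself, including the 'not found' outcome. A minimal sketch, assuming a hypothetical NullTolerantCache helper that wraps whatever delegate actually performs the Raven load (for example, a call to a session's Load method):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers every load result, including null ("not found"),
// so a missing document is only requested from the server once.
public class NullTolerantCache<T> where T : class
{
    private readonly Func<string, T> loader;
    private readonly Dictionary<string, T> results = new Dictionary<string, T>();

    public NullTolerantCache(Func<string, T> loader)
    {
        this.loader = loader ?? throw new ArgumentNullException(nameof(loader));
    }

    public T Get(string id)
    {
        // TryGetValue returns true for a stored null, so "loaded and absent"
        // is distinguished from "never loaded".
        if (results.TryGetValue(id, out var existing))
            return existing;

        var loaded = loader(id);
        results[id] = loaded;   // null is stored too, which is the whole point
        return loaded;
    }
}
```

The cache should live no longer than the session it wraps, otherwise it will happily keep serving a stale 'not found' after the document has been created.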

But why even bother? Why have a limit that can only be changed programmatically and not by configuration? And by what analysis was 30 arrived at? The number-of-requests property is also a signed integer, rather than the more logical unsigned integer.

I would prefer this 'feature' to be controlled by configuration, with a simple 'switch': either be concerned about request numbers or not, rather than enforce some arbitrary integer limit.

Saturday, May 28, 2011

RavenDB: Conflict resolution when loading documents

The fun with RavenDB continues (if that sounds sarcastic, it's not meant to be). I'm rather fond of Raven: it certainly performs well, and I still find myself waxing lyrical about the LINQ provider, which is really very nice indeed.

One thing I did have to work out, though, was how to deal with conflicts, especially in a replication setup. It's not that difficult to do, but I thought I'd share (I like my presumption that anyone is actually going to read this!).

I have a wrapper class around Raven (I get bored typing RavenDB, so Raven it is) that provides some useful behaviour around core Raven features. One such wrapper feature is the 'recoverable' load: simply put, when a document is loaded from Raven and a conflict is detected, an attempt is made to resolve the conflict.

The base implementation is below: a Func<> loads the object, and if Raven throws a ConflictException, we attempt to resolve the conflict and then return the loaded object.

1:  public T RecoverableLoad<T>(string id, Func<RavenSessionWrapper, string, T> loader) where T : RavenBase {  
2:    DBC.AssertNotNull(loader, "Cannot execute a recoverable load with a null loader");  
3:    T result;  
4:    try {  
5:      LogFacade.LogInfo(this, string.Format("Recoverable load attempt for {0}", id));  
6:      result = loader(this, id);  
7:    }  
8:    catch (ConflictException ex) {  
9:      result = ResolveConflict(ex, id, loader);  
10:      DBC.AssertNotNull(result, "Null document even after conflict resolution tried");  
11:    }  
12:    return result;  
13:  }  
14:  private T ResolveConflict<T>(ConflictException ex, string id, Func<RavenSessionWrapper, string, T> loader) {  
15:    return ResolveConflict(ex.ConflictedVersionIds, id, loader);  
16:  }  

Lines 14-16 are just a convenience overload that extracts the conflicting version ids from the ConflictException itself.

The actual conflict resolution behaviour is shown next:

1:  private T ResolveConflict<T>(IEnumerable<string> conflictIDs, string id, Func<RavenSessionWrapper, string, T> loader) {  
2:    LogFacade.LogWarning(this,   
3:     string.Format("Forced to resolve conflict for {0}. Conflict IDs:{1}{2}", id, Environment.NewLine, string.Join(Environment.NewLine, conflictIDs.ToArray())));  
4:    List<JsonDocument> conflicts =   
5:     conflictIDs.Select(s => RavenStore.DatabaseCommands.Get(s)).ToList();  
6:    Func<JsonDocument, DateTime> f = d =>  
7:     d.DataAsJson[TimestampProperty] == null ?   
8:      DateTime.MinValue :   
9:      JsonConvert.DeserializeObject<DateTime>(d.DataAsJson[TimestampProperty].ToString());  
10:    JsonDocument choice =   
11:       conflicts.Aggregate((curMax, x) => curMax == null || f(x) > f(curMax) ? x : curMax);  
12:    RavenStore.DatabaseCommands.Put(id, null, choice.DataAsJson, choice.Metadata);  
13:    LogFacade.LogWarning(this, string.Format("Resolved conflict for {0}, Marker: {1}", id, f(choice)));  
14:    return loader(this, id);  
15:  }  

So, what does it actually do?

  • Get the JSON documents that correspond to the conflict ids, using the base Raven 'Get' database command (lines 4-5)
  • Define a Func<> that reads a timestamp property from the raw JSON document, or substitutes DateTime.MinValue if the property does not exist (lines 6-9)
  • Find the JSON document with the latest timestamp, via a somewhat peculiar use of the LINQ Aggregate method (lines 10-11)
  • Write that document back to Raven as the definitive version (line 12)
  • Finally, load the document again through the loader (line 14)

Sure, it is not that pretty, and it could do with some optimisation, but it works. The only real implementation issue is that the resolution relies on a specific timestamp property being present in the JSON documents. However, in my scenario I have complete control over how Raven documents are constructed, so there was no real need to make it more general.
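The 'peculiar' Aggregate is simply a max-by. Stripped of the Raven types, the selection rule can be sketched in memory as below; ConflictVersion and ConflictChooser are hypothetical stand-ins for a conflicting JsonDocument and its TimestampProperty value, not Raven classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a conflicting JsonDocument: Id is the conflict version id,
// Timestamp is the TimestampProperty value (null when the property is absent).
public class ConflictVersion
{
    public string Id { get; set; }
    public DateTime? Timestamp { get; set; }
}

public static class ConflictChooser
{
    // Missing timestamps count as the earliest possible value, as in the original Func<>.
    private static DateTime Marker(ConflictVersion v) => v.Timestamp ?? DateTime.MinValue;

    // Aggregate as a "max by": carry the best candidate so far along the sequence.
    // Only a strictly later timestamp replaces it, so ties keep the earlier version.
    public static ConflictVersion ChooseWithAggregate(IEnumerable<ConflictVersion> versions) =>
        versions.Aggregate((best, x) => Marker(x) > Marker(best) ? x : best);

    // A more readable equivalent (a sort, so O(n log n) rather than O(n)).
    // LINQ to Objects ordering is stable, so ties also resolve to the earlier version.
    public static ConflictVersion ChooseWithOrderBy(IEnumerable<ConflictVersion> versions) =>
        versions.OrderByDescending(v => Marker(v)).First();
}
```

Aggregate without a seed starts from the first element and throws InvalidOperationException on an empty sequence, so the curMax == null test in the original only comes into play if a null document ends up at the head of the list.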

Saturday, April 2, 2011

Adventures with RavenDB

RavenDB (http://ravendb.net/) is an interesting beast: not quite at the vanguard of the 'no-SQL' movement, but an easy-to-deploy, easy-to-use no-SQL document database for Microsoft shops. It's also lighter on features than its more mature counterparts, such as MongoDB, CouchDB et al., but its LINQ provider is a construct of relative beauty and the simple deployment in IIS is a boon.

I'm employing it in a very specific way for a project I'm working on. In essence, I have a WCF service that acts as the definitive configuration source for a range of distributed applications (web, Windows services, fat clients), centralizing the management of configuration, some of which is shared. So, all in all, a reasonable concept (similar to Microsoft's Configuration service, but far better). All of these applications depend entirely on the configuration service to serve their configuration on demand; if it fails to do so, the application is unusable. For example, Enterprise Library configuration is delivered by this mechanism.

The initial scheme has a set of XML documents, one set per WCF server in the cluster, deployed with the services as a single unit (via Web Deploy). This is obviously quite beneficial in terms of redundancy: a WCF server can be removed from the cluster (for maintenance, say) and it makes no difference to the dependent configuration clients (the applications), because each WCF server has its own private copy of the configuration documents and so is independent.

Redundancy is all well and good, but what happens when the configuration needs updating in real time? Obviously every copy of the XML documents in the cluster has to be modified, which is rather crude and inefficient when there are n machines, and also means there is a window of time in which the configuration service may return different results depending on which machine services the request. I looked at using DFS to support in-place 'shared' editing, but it did not hold much appeal.

So, I could have reverted to the Microsoft configuration service approach, using a SQL Server database, in which case I could also take advantage of clustering. But the applications want impossibly high availability, and in a failover situation there is a delay (certainly in my setup) that could lead to unexpected application failures.

Enter RavenDB. It's lightweight, and I decided I could deploy a set of replicating instances in the WCF service cluster, in a multi-master configuration. And it works well, at least in my current development tests; deployment to production is a little way off (and there is the small matter of RavenDB licences, as my project is not open source).

So what do I get from this arrangement:
  • A simple persistence model (JSON based) for configuration documents
  • Per-machine redundancy
  • Very high performance
  • Versioning and Replication 'bundles'

The idea first appealed when I realised that I was, of course, already using XML documents to house my configuration, which at run time were parsed and converted into host/application-specific configuration sets. When you see them in this light (which is what they are), the choice of a document database becomes fairly obvious. Sure, SQL Server was a viable option, but my constraints meant that a document database was the better choice.

I have a few issues with the current RavenDB conflict resolution approach, but nothing I haven't been able to work around.

As a final gesture to availability, the configuration service uses a 'provider' model. There are therefore two providers: one RavenDB-flavoured, the other using the old XML document style. At run time, a provider factory serves the configured provider, and policies can be associated with the factory to allow 'fall back' to a secondary provider if the primary one fails during execution. In my case, this means that the RavenDB provider is the primary and the XML document provider the secondary.
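The fall-back policy can be pictured as a composite provider that tries each configured provider in priority order. This is a sketch under assumed names (IConfigurationProvider, FallbackConfigurationProvider and DelegateProvider are all hypothetical), not the service's actual code; here the factory's policy is modelled as a composite rather than as a separate policy object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical provider contract for the configuration service.
public interface IConfigurationProvider
{
    string Name { get; }
    string GetConfiguration(string application);
}

// Tries providers in priority order (e.g. RavenDB first, XML documents second).
public class FallbackConfigurationProvider : IConfigurationProvider
{
    private readonly IReadOnlyList<IConfigurationProvider> providers;

    public FallbackConfigurationProvider(params IConfigurationProvider[] providers)
    {
        if (providers == null || providers.Length == 0)
            throw new ArgumentException("At least one provider is required", nameof(providers));
        this.providers = providers;
    }

    public string Name => "fallback";

    public string GetConfiguration(string application)
    {
        var failures = new List<Exception>();
        foreach (var provider in providers)
        {
            try
            {
                return provider.GetConfiguration(application);   // first success wins
            }
            catch (Exception ex)
            {
                failures.Add(ex);   // a real service would also log this
            }
        }
        throw new AggregateException("All configuration providers failed", failures);
    }
}

// Small adapter so providers can be built from delegates (handy for tests).
public class DelegateProvider : IConfigurationProvider
{
    private readonly Func<string, string> get;

    public DelegateProvider(string name, Func<string, string> get)
    {
        Name = name;
        this.get = get;
    }

    public string Name { get; }
    public string GetConfiguration(string application) => get(application);
}
```

Catching every exception is deliberately blunt here; a production policy would probably distinguish 'provider unavailable' from 'configuration genuinely missing' before falling back.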