Recently, I spent nearly three full working days debugging the damndest thing: a simple NHibernate Linq query in an Azure test environment was timing out. Together with the SqlAzure client driver that does transient fault handling by retrying queries this resulted in a situation, where a specific page would never load, causing instead a huge spike in the database resource usage.
Of course, as it tends to be with these things, the same query against a local SQL Server database worked just fine.
Possibly the strangest part was that after obtaining the query through NHProf, I tried running the same query via SQL Server Management studio, and the mean execution time of the query was between 100ms and 200ms. Accordingly, I had a hell of a time believing that the issue was an inefficient query as such.
I even tried creating a raw ADO.NET query that had the same command text and parameter specifications… and it executed in under 200ms.
I was about to give up when I had the idea of running both the slow and the fast query against a local database with the SQL Profiler enabled, because while there was no discernible difference in execution time against the local database, perhaps I’d be able to make out some difference in the way the queries were executed.
At first, it looked like the queries were identical from the server’s perspective, too.
But then, I noticed the difference.
The slow query declared that the single parameter’s size was 4000 whereas the fast version said it was 1.
Realization began to dawn, and I re-ran my raw ADO.NET query against the Azure database, but with the parameter size set to 4000 — and wouldn’t you know it, the timeout manifested itself immediately.
My current hypothesis is that what’s going on underneath it all is that the database is waiting for a value for the parameter and it never shows up, which is what causes the timeout. Another strange factor is that the issue doesn’t reproduce with all Azure SQL servers.
All this is triggered by a change to NHibernate Core where the SqlClientDriver skips setting the parameter size to a meaningful value, and instead sets it to a default of 4000.
Fortunately, the workaround is simple: I extended the client driver code with a very specific special-case workaround that sets the parameter size:
It may eventually turn out that I’ll need to handle more than just the one-character parameter size, but for now, this seems to fix the issue.