ExecutionEngineException in ASP.NET MVC when mixing .NET Framework and .NET Standard assemblies

TL;DR: If you mix .NET Standard assemblies that depend on the System.Net.Http NuGet package into your .NET Framework ASP.NET MVC app, you are likely to encounter runtime crashes whenever your AppDomain unloads and reloads. The fix is a binding redirect plus a build-time project setting.

This one was fun to figure out. I’m working on a project where I really, really wanted to use Microsoft’s SpaServices package to enable server-side rendering for a Node frontend app. Fortunately, it only takes a little bit of glue to bolt it onto an ASP.NET MVC 5 app, and you’re off to the races.

Unfortunately, it also comes with a nasty side effect of periodically crashing your app during development.

It took me a while to recognize the pattern, but after a dozen or so crashes, I finally figured out that the exception always occurred after I had either edited the Web.config file or rebuilt the binaries — that is, whenever the AppDomain was unloaded and the app restarted.

The problem was initially triggered by ASP.NET Web API’s EnsureInitialized call, but the culprit at the top of the stack was actually RuntimeAssembly.GetTypes() — so when I managed to trick Web API into not triggering the problem, the crash just moved to something Glimpse did. When I removed Glimpse, it moved to something the ASP.NET MVC infrastructure did, and so on.

The actual problem is a combination of issues: when running on the desktop framework, the System.Net.Http version you’re supposed to use is the one that ships in the framework. However, that version has some weird versioning quirks, because it isn’t just a .NET assembly, it’s also a Windows component. As a result, its version number is lower than that of its NuGet counterpart, which, in some edge cases, leads to the wrong assembly ending up in your process. I’m not sure where the memory corruption comes into play (and I’m not sure I want to know 😛), but fortunately the fix is simple.

You need to add a binding redirect for System.Net.Http and set ImplicitlyExpandNETStandardFacades to false in your project file, as described in this GitHub comment.
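In practice, that means two edits along these lines; treat the exact version numbers as placeholders, since they depend on the framework and package versions you target:

    <!-- Web.config: redirect every System.Net.Http version down to the one
         that ships in the .NET Framework. -->
    <dependentAssembly>
      <assemblyIdentity name="System.Net.Http" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
      <bindingRedirect oldVersion="0.0.0.0-4.2.0.0" newVersion="4.0.0.0" />
    </dependentAssembly>

    <!-- YourProject.csproj: keep the build from copying the .NET Standard
         facade assemblies into the output directory. -->
    <PropertyGroup>
      <ImplicitlyExpandNETStandardFacades>false</ImplicitlyExpandNETStandardFacades>
    </PropertyGroup>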

Ah, System.Net.Http. Ever since it began shipping out-of-band, it’s been the gift that keeps on giving.

I’m not bringing much new information to the table, but I hope this post at least helps someone else find the solution to the same issue without spending hours diagnosing it.

TypeScript Duplicate Identifiers when using npm link

Last night I ran into an issue with TypeScript compilation. I’m working on a frontend web project that uses a library we are developing in another repository. Right now, we’re simultaneously working on both repos, so writing a feature from end to end is a bit of a pain if you first have to commit to the dependency repository and then pull from there to get the latest version into the app itself. Fortunately, there’s a solution for that: npm link / yarn link. This feature lets you substitute a local version for a dependency, using symbolic links (or NTFS junction points, if you’re Windows-inclined like myself).

The problem arises when your main app and your linked dependency share one or more dependencies. For example, in our case, the shared dependency was react-intl. As long as the dependency is not linked, everything works as well as things usually work in js-land (🙄), but once the linking happens, things begin to break.

The core of the problem is that at some point, the TypeScript compiler encounters two copies of the same dependency’s type declarations — one from each node_modules tree, and possibly of the exact same version — at which point it gives up and produces a whole mess of duplicate identifier errors.

The fix is simple enough, and pretty well documented already: at the top level, where you run the compilation, you add a path mapping to your tsconfig.json, like so:

    "baseUrl": "./src",
    "paths": {
      "*": [
        "../node_modules/@types/*",
        "*"
      ]
    }

What this means (as I understand it) is that for every non-relative module resolution the TypeScript compiler attempts, it will try the paths specified in the array. So when the compiler encounters the dependency in the linked module, it will look for it in the parent module’s node_modules/@types directory, and since it finds it there, it will look no further. This coalesces the duplicate dependencies into a single instance, and voila, problem solved. Path mappings can be more specific, too, if you need to target a single problematic package; in this case I wanted this behavior for all deps, so I went with the easiest route.
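For example, had react-intl been the only offender, a narrower mapping along these lines (a sketch that assumes the typings come from @types) would have pinned just that one package:

    {
      "compilerOptions": {
        "baseUrl": "./src",
        "paths": {
          "react-intl": ["../node_modules/@types/react-intl"]
        }
      }
    }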

Large number of connections when using MongoDB C# driver 2.x

Long time, no posts and whatnot. To break the ice, here’s something I learned today that’s fairly well documented, but not necessarily that obvious.

TL;DR: keep your MongoClients as singletons, like the documentation damn well tells you to do. A more detailed explanation follows.

Yesterday, I deployed an app to production. It was a fairly major deployment in that it involved upgrading the infrastructure: I had updated the app to use the MongoDB C# driver 2.x series, and also upgraded the actual production database from MongoDB 2.x to the latest 3.6. The app was fairly well tested, but of course, the one thing that very rarely gets tested for is production load.

I had performed the driver upgrade as a fairly mechanical search-and-replace type exercise, and while most operations were easy to replace, the one thing that was missing was the ability to disconnect from the server. I hit the documentation, and found out that it says the following:

It is recommended to store a MongoClient instance in a global place, either as a static variable or in an IoC container with a singleton lifetime.

However, multiple MongoClient instances created with the same settings will utilize the same connection pools underneath.

Reading a bit more on the topic confirmed that there was no need to disconnect the client, so all was well in the world. What I didn’t do, however, was register the client as a singleton, because I wanted to keep the changes to a minimum, and the docs said it wasn’t required, only recommended.

In the next changeset, I added some telemetry: I wanted to log the duration of MongoDB operations, so our monitoring could tell us if they were getting slow. I found a post titled Monitoring MongoDB with Application Insights and followed its instructions. And here’s where things went wonky.

See, the article’s example uses a lambda function as the cluster configurator, which isn’t bad as such, but. The documentation I quoted above? In full context, it looks like this:

However, multiple MongoClient instances created with the same settings will utilize the same connection pools underneath. Unfortunately, certain types of settings are not able to be compared for equality. For instance, the ClusterConfigurator property is a delegate and only its address is known for comparison. If you wish to construct multiple MongoClients, ensure that your delegates are all using the same address if the intent is to share connection pools.
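In code, the footgun boils down to something like this (the identifiers are illustrative, not lifted from the post or from our codebase):

    using System;
    using MongoDB.Driver;
    using MongoDB.Driver.Core.Events;

    static IMongoClient CreateClient(string connectionString)
    {
        var settings = MongoClientSettings.FromUrl(new MongoUrl(connectionString));

        // A fresh lambda instance on every call: two settings objects built
        // this way never compare equal, so every MongoClient created from
        // them gets a connection pool all of its own.
        settings.ClusterConfigurator = cb =>
            cb.Subscribe<CommandSucceededEvent>(e =>
                Console.WriteLine($"{e.CommandName}: {e.Duration.TotalMilliseconds} ms"));

        return new MongoClient(settings);
    }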

Combine that with the fact that my MongoClient registration was per-request, and ta-da: I had effectively disabled connection pooling, with no way to dispose of the old connections. So perhaps it wasn’t that surprising to see MongoDB log the following:

    2018-01-18T08:17:35.340+0200 I NETWORK [listener] connection accepted from 127.0.0.1:52669 #4457 (4455 connections now open)

Yeah.

Fortunately, the fix was rather simple: move the client registration to be a singleton, and that’s it.
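In container terms, and reusing the CreateClient sketch from above, the registration amounts to this (Microsoft.Extensions.DependencyInjection shown as an example; my actual container was different):

    using Microsoft.Extensions.DependencyInjection;
    using MongoDB.Driver;

    var connectionString = "mongodb://localhost"; // placeholder
    var services = new ServiceCollection();

    // One client, and with it one connection pool and one configurator
    // delegate, for the entire lifetime of the application.
    services.AddSingleton<IMongoClient>(_ => CreateClient(connectionString));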

Mind you, there’s nothing wrong with the instructions in the post I linked to. Had I treated the client instance like the docs suggested, I wouldn’t have had any problems.

Moral of the story? If the documentation recommends something, it’s probably a good idea to do it, I guess.

ASP.NET Core and Assembly Binding Redirects

During the last year, I’ve been a part of launching two production sites that run on ASP.NET Core, and as a company, we’ve had enough dealings with the budding framework that we arranged a full day’s seminar on the topic.

Needless to say, using a framework in anger at this point in its development has led to all kinds of interesting discoveries, the kind you typically only make on the bleeding edge.

Where have my assemblies gone?

One of the major changes in .NET Core compared to the full .NET Framework is that there is no more Global Assembly Cache. All assemblies – including most if not all of the runtime itself – will be shipped as NuGet packages, which means that the assembly loading story is a fairly major departure from the way things used to be. However, .NET Core is not always a viable platform: for instance, currently there is no production-ready server-side image processing capability since System.Drawing is not cross-platform*. Given that constraint, we’ve ended up deploying our production ASP.NET Core applications on the full .NET framework, and the full FX still has the GAC.

Currently, ASP.NET Core on the full FX loads assembly dependencies by hooking AppDomain.AssemblyResolve to work its magic. When your code tries to interact with an assembly that is not yet loaded, the runtime looks for the assembly in your NuGet packages. However, there’s a key phrase in the documentation for the event: “Occurs when the resolution of an assembly fails.” This means that the regular assembly binding rules are attempted first.
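To illustrate the mechanism, here’s a deliberately simplified sketch (nothing like the real resolver, which knows your project’s NuGet dependency graph; the probing path below is made up):

    using System;
    using System.IO;
    using System.Reflection;

    var packagesRoot = @"C:\packages"; // made-up location

    // The event only fires after normal binding, including the GAC probe,
    // has already failed to locate the assembly.
    AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
    {
        var name = new AssemblyName(args.Name);
        var candidate = Path.Combine(packagesRoot, name.Name, name.Name + ".dll");
        return File.Exists(candidate) ? Assembly.LoadFrom(candidate) : null;
    };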

Typically, this would not be a problem. When you deploy your application, you deploy the NuGet dependencies, and the GAC only contains the framework’s assemblies. However, sometimes you will have a rogue application on your system that installs something to the GAC, and things may go a little pear-shaped.

DocumentDB deserialization woes

Consider this example: our app uses Azure DocumentDB as one of its data stores. The .NET DocumentDB client library uses JSON as its serialization format, and in particular, Newtonsoft.Json as its serialization library. One of the things you can do with that combination is specify that the serialized name of your property is different from the one declared in code, by annotating the property with JsonPropertyAttribute. Now, our app opted to use one of the latest builds of Newtonsoft.Json (7.x), and for the most part, everything worked beautifully. However, my development system had an installed app that, unbeknownst to me, registered an older version of Newtonsoft.Json into the GAC.
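As a refresher, the mapping looks like this (a hypothetical document type, not our actual model):

    using Newtonsoft.Json;

    public class UserDocument
    {
        // Stored as "id" in the JSON document, regardless of the C# name.
        [JsonProperty("id")]
        public string Id { get; set; }

        [JsonProperty("displayName")]
        public string DisplayName { get; set; }
    }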

Unfortunately, the runtime assembly version of the GAC assembly matched the requirements of the DocumentDB client library exactly, so the runtime loaded that assembly for the DocumentDB client. The practical effect was that when the DocumentDB client (de)serialized objects, it never noticed the JsonPropertyAttribute that we were using. The net result? A single property in that class was never (de)serialized correctly.

It took me a while to figure out what was happening, but the key insight was to look at the loaded modules in the debugger and notice that indeed, we now had two copies of Newtonsoft.Json in memory: the version from the GAC and the version we were specifying as a dependency. Our own code was using the JsonPropertyAttribute from version 7.x whereas the older version of Newtonsoft.Json was looking for the JsonPropertyAttribute specified in that assembly. While the properties were identical in function, they were different in identity, so the attribute we were using was ignored entirely.

Wait, isn’t this a solved problem already?

If you’re a seasoned .NET developer, at this point you are probably thinking “binding redirects”. At least we were – but the question was, where to put them? Another major change in ASP.NET Core is that your application configuration is entirely decoupled from both the configuration of the runtime and the configuration of your web server. This means that in a fresh out-of-the-box web application you do have a Web.config, but it is only used to configure the interaction between IIS and your application server, Kestrel.

Since Kestrel runs in a process outside IIS, it’s reasonable to expect that Web.config doesn’t affect the behavior of the runtime in that process. And indeed, it doesn’t. But the new configuration system doesn’t have a way to specify the configuration of the .NET runtime either. So where does that leave us?

After a little bit of to-and-fro with the ASP.NET Core team, the answer finally came up: the runtime configuration still exists, but its naming conventions are different from what we are used to. If you create a file called App.config (yes, even when it is a web application) and specify your binding redirects there, they will be picked up, and all is well in the world again.

The configuration file has the same schema as you would expect from a Web.config or a standalone executable’s App.config. The resulting file looks like this:

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <runtime>
        <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
          <dependentAssembly>
            <assemblyIdentity name="Newtonsoft.Json" culture="neutral" publicKeyToken="30ad4fe6b2a6aeed" />
            <bindingRedirect oldVersion="0.0.0.0-4.5.0.0" newVersion="7.0.0.0" />
          </dependentAssembly>
        </assemblyBinding>
      </runtime>
    </configuration>

Hope this helps anyone else encountering the same problem, however uncommon it may be!

(* My colleagues pointed out that I neglected to mention that System.Drawing is not a production-ready server-side image processing solution either: it uses GDI+, which takes process-wide locks and therefore essentially makes the image-processing parts of your app single-threaded.)

Dotless 1.4.4 is out

Idle hands do… open source?

During the last few weeks, I’ve done something I’ve wanted to do for a long time and stepped up my involvement in Free Software somewhat. It started out as kind of an accident: I was running into an annoying bug that I attributed to our usage of dotless in one of our projects, and I went to the issue tracker looking for reports of a similar problem. After looking through the tracker for a moment, I checked the corresponding code and noted that, yes, dotless did in fact do the correct thing. Then I looked at my own code, and it took me all of five seconds to spot the obvious bug.

A bit embarrassing, sure, but not useless. While looking through the issue tracker, I had noticed that some of the issues were fairly simple in nature — maybe even something I could fix? I remembered fixing a bug back in ’10, so I went through the list of closed pull requests and noted that I had contributed no fewer than five PRs.

That weekend, I came down with the flu and skipped work. I used some of the downtime to work on dotless: with no time constraints or expectations of efficiency, I could spend a moment here and another there to fix a bug or two. I went for the low-hanging fruit first, and ended up creating about a dozen pull requests — some with bug fixes, some with actual new features.

After giving things about a week to settle, I asked the current maintainers if they might accept me as a core contributor, since they didn’t seem to have the time to process the pull requests. Not long after that, Daniel granted me contributor access to the project, and off I went, merging the PRs and cleaning up the issue tracker.

Sweet release

Of course, not everything went perfectly: I intended to release dotless 1.4.3 about a week after merging the fixes. And I did — except that I messed up the NuGet packaging, so the standalone dotless compiler was left out of the package. And instead of releasing 1.4.3.1 with a fixed package as I should have, I bumped the version up to 1.4.4. I expect that won’t be much of a problem for anyone, though, so I’m not feeling too bad. After all, I did fix a number of inconsistencies, crashers and things like Bootstrap not compiling when minified. So maybe I can forgive myself a bit of a blunder there. 🙂

What next?

The less.js guys are thinking about building a .NET wrapper around less.js. It’s an interesting idea, to be sure: that way, the .NET implementation would never need to play catch-up with the official version. However, I still believe there’s merit in having a “native” .NET implementation, so I’m going to keep at it for now.

For the next release, I’ve already got @import options, variable interpolation improvements, list arguments and improved mixin guards lined up. Porting the less.js test cases to get a rough idea of how far behind we are is a logical next step. I’d like to aim for feature parity in 1.5 — on the other hand, maybe more frequent releases with smaller, incremental improvements would serve the project better. At the very least, 1.5 should fully support Bootstrap and KendoUI.

A large slice of my professional history is in line-of-business software with user bases in the dozens or hundreds. It’s exciting and a bit frightening to take responsibility for a project that has, over the years, been downloaded over 400 000 times from NuGet.org. Time to see if I’m up to the task!

NHibernate TimeoutException with Azure SQL

Recently, I spent nearly three full working days debugging the damnedest thing: a simple NHibernate LINQ query in an Azure test environment was timing out. Combined with the SqlAzure client driver, which handles transient faults by retrying queries, this resulted in a situation where a specific page would never load, instead causing a huge spike in database resource usage.

Of course, as it tends to be with these things, the same query against a local SQL Server database worked just fine.

Possibly the strangest part was that after capturing the query with NHProf, I tried running it in SQL Server Management Studio, and the mean execution time was between 100 and 200 ms. Accordingly, I had a hell of a time believing the issue was an inefficient query as such.

I even tried creating a raw ADO.NET query that had the same command text and parameter specifications… and it executed in under 200ms.

I was about to give up when I had the idea of running both the slow and the fast query against a local database with the SQL Profiler enabled, because while there was no discernible difference in execution time against the local database, perhaps I’d be able to make out some difference in the way the queries were executed.

At first, it looked like the queries were identical from the server’s perspective, too.

But then, I noticed the difference.

The slow query declared that the single parameter’s size was 4000 whereas the fast version said it was 1.

Realization began to dawn. I re-ran my raw ADO.NET query against the Azure database, but with the parameter size set to 4000 — and wouldn’t you know it, the timeout manifested immediately.
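The repro boiled down to something like this (the table, query and connection string are stand-ins for the real ones):

    using System.Data;
    using System.Data.SqlClient;

    const string connectionString = "<your Azure SQL connection string>";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("select Id from Items where Code = @p0", connection))
    {
        // Size 1 executes in well under a second; size 4000 reliably
        // reproduced the timeout against the affected Azure SQL server.
        var parameter = command.Parameters.Add("@p0", SqlDbType.NVarChar, 4000);
        parameter.Value = "x";

        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read()) { /* consume the results */ }
        }
    }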

My current hypothesis is that underneath it all, the database ends up waiting for parameter data that never arrives, which is what causes the timeout. Another strange factor is that the issue doesn’t reproduce on all Azure SQL servers.

All this is triggered by a change to NHibernate Core where the SqlClientDriver skips setting the parameter size to a meaningful value, and instead sets it to a default of 4000.

Fortunately, the workaround is simple: I extended the client driver with a very specific special-case override that sets the parameter size. A sketch of the idea follows; the class and method names track NHibernate’s driver API as I remember it, so double-check them against your driver version:
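    using System.Data;
    using NHibernate.SqlAzure;
    using NHibernate.SqlTypes;

    public class ParameterSizeFixingClientDriver : SqlAzureClientDriver
    {
        protected override void InitializeParameter(
            IDbDataParameter dbParam, string name, SqlType sqlType)
        {
            base.InitializeParameter(dbParam, name, sqlType);

            // The stock driver now leaves string parameters at the 4000
            // default; for the one-character parameter that triggered the
            // timeouts, set the real size instead.
            if (sqlType.LengthDefined && sqlType.Length == 1)
                dbParam.Size = 1;
        }
    }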

It may eventually turn out that I’ll need to handle more than just the one-character parameter size, but for now, this seems to fix the issue.

Blog move

Just a quick note: I’ve grown tired of Blogger regularly doing a hatchet job on my post markup, and of the limited options for customizing the look and feel of the site, so I’ve moved to a new host.

I tried my best not to mess with the existing post URLs, but I’m afraid that at least the RSS feed URLs will have changed.

Windows Azure Active Directory: Using the Graph API with an Office 365 Tenant

In a previous post, I wrote about querying the Graph API for the group memberships of a WAAD user. In another post, I detailed setting up a WAAD tenant, but also noted that you don’t have to do that if you’ve already got an Office 365 subscription and would prefer to use that instead.

While that statement is technically true, the story is far from complete. As commenter Nick Locke points out, my instructions for using the Graph API don’t work for Office 365 tenants: the code throws an exception while trying to get the authorization token for the request.

After a bit of trial and error, I managed to figure out a fix.

Once again, open up the Microsoft Online Services Module for Windows PowerShell. Run Import-Module MSOnlineExtended -Force to load the module, use Connect-MsolService with admin credentials to establish the connection, and type in:

    $adPrincipal = Get-MsolServicePrincipal -AppPrincipalId 00000002-0000-0000-c000-000000000000
    $adPrincipal.ServicePrincipalNames

If your Office 365 setup is like ours, the result will look a lot like this:

    00000002-0000-0000-c000-000000000000
    AzureActiveDirectory/graph.windows.net
    AzureActiveDirectory/directory_s5.windows.net
    AzureActiveDirectory/directory_s4.windows.net
    AzureActiveDirectory/directory_s3.windows.net
    AzureActiveDirectory/directory_s2.windows.net
    AzureActiveDirectory/directory_s1.windows.net
    AzureActiveDirectory/directory.windows.net
    AzureActiveDirectory/Directoryapi.microsoftonline.com

Looking at this list, it occurred to me that the problem might be related to how we construct Realm Names for our application: we use the GUID 00000002-0000-0000-c000-000000000000 and the domain name graph.windows.net, the result looking like 00000002-0000-0000-c000-000000000000/graph.windows.net@your-app-principal-id. So I figured I’d change my app to use the string AzureActiveDirectory instead of the GUID, and wouldn’t you know it, I got past the authorization token error. Only now I ended up with another error: validating the token didn’t work.

After a bit of head-scratching and web-searching, I realized I might have it backwards: perhaps it wasn’t my app that was misconfigured, but the Office 365 tenant. Suppose the Graph API service actually identifies itself as 00000002-0000-0000-c000-000000000000/graph.windows.net and not AzureActiveDirectory/graph.windows.net: that would explain the failures.

I repeated the commands above in an MSOL PowerShell session connected to the WAAD domain I created before, and noted that indeed, the Service Principal Names for Microsoft.Azure.ActiveDirectory were otherwise identical, but AzureActiveDirectory was replaced by 00000002-0000-0000-c000-000000000000. So I cooked up this PowerShell bit (reconstructed here as a sketch, so test it before pointing it at a tenant you care about):
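    # Rewrite the "AzureActiveDirectory" prefix in the service principal
    # names into the well-known GUID, then write the names back.
    $appPrincipalId = "00000002-0000-0000-c000-000000000000"
    $adPrincipal = Get-MsolServicePrincipal -AppPrincipalId $appPrincipalId
    $newNames = $adPrincipal.ServicePrincipalNames |
        ForEach-Object { $_ -replace "^AzureActiveDirectory", $appPrincipalId }
    Set-MsolServicePrincipal -AppPrincipalId $appPrincipalId -ServicePrincipalNames $newNames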

I then ran it in the Office 365 MSOL PowerShell session and tried again. And wouldn’t you know it, it worked!

Apparently, we’re not the only ones with this problem, so let’s hope this helps save someone else the trouble!

TechDays 2012

I was up on the stage today at TechDays 2012 Finland, giving my first big presentation!

The title of my talk was roughly “practical difficulties of unit testing”. I started out with a brief intro to the tools – test frameworks, test runners and mock libraries. After that, I discussed unit test quality issues by taking an example from a recent project I worked on, and dissecting the issues I had with it. Finally, I talked at some length about breaking dependencies, which in my experience is usually the first obstacle that trips beginners.

The talk was recorded, and the video will be up for grabs at some point. Both the talk and the accompanying slides were in Finnish, but I guess so are most of my readers. 🙂

Renaming an LDAP entry

I spent the better part of today struggling with the ModifyDNRequest class from the System.DirectoryServices.Protocols namespace. I was trying to rename a bunch of entries that had invalid escaping in their DNs (thanks to yours truly botching the escaping, sigh), and I kept getting “the distinguished name contains invalid syntax” errors.

Here’s what MSDN says about the NewName property on the request class:

The NewName property contains the new object name.

So I sort of assumed it’d be the new CN for the object I was renaming – namely, the user’s username. But that didn’t work out. After digging around for a while, I found what MSDN says about the constructor parameters:

Parameters

distinguishedName (System.String)
The current distinguished name of the object.

newParentDistinguishedName (System.String)
The distinguished name of the new parent of the object.

newName (System.String)
The new distinguished name of the object.

Alas, the last bit is rather misleading. It reads as if it should be the new DN for the object, but that’s not the case either. What you’re actually supposed to pass is the new relative distinguished name, that is, the leftmost component of the DN: the CN of the object… along with the cn= prefix.
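In code, a rename then looks like this (the server and DNs are hypothetical):

    using System.DirectoryServices.Protocols;

    var connection = new LdapConnection("ldap.example.com"); // hypothetical server

    // The third argument is the new *relative* DN, "cn=" prefix included;
    // not a bare CN, and not a full DN.
    var request = new ModifyDNRequest(
        "CN=Jane Doe,OU=Users,DC=example,DC=com", // current DN
        "OU=Users,DC=example,DC=com",             // parent stays the same
        "CN=jane.doe");                           // the new RDN
    connection.SendRequest(request);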

PS: Considering that the constructor essentially takes three different distinguished names as arguments, it’d be real nice if the error told you which one of them had the invalid syntax.

Cheers.
