ConcurrentDictionary in Caching Implementation

April 16, 2013

The .NET class ConcurrentDictionary<TKey, TValue> provides thread-safe methods to store and read objects by a unique key. This makes it a good candidate for implementing an in-memory “level 1 cache” that gives fast access to the results of long-running database requests.

When you implement such a cache using this class, be aware that methods like “GetOrAdd” really are thread-safe, but this does not mean that they behave as you might expect.

For demonstration purposes I will replace the long-running database request with a simple “Thread.Sleep(500);” followed by returning a unique value.

I have this little unit test to demonstrate the first issue with the ConcurrentDictionary:

    [TestMethod]
    public void OverlappingCallsToGetOrAddDontBlock()
    {
        var target = new ConcurrentDictionary<string, string>();
        var code1 = false;
        var code2 = false;
        var code2ExecutesAfterCode1 = false;

        // first call: runs on its own thread and needs 500 ms to create the value
        var thread = new Thread(
            () => target.GetOrAdd(
                "Hello",
                s =>
                    {
                        code1 = true;
                        Thread.Sleep(500);
                        return s + "1";
                    }));

        thread.Start();
        Thread.Sleep(100);
        var start = DateTime.Now;

        // second call: starts while the first value is still being created
        var result = target.GetOrAdd(
            "Hello",
            s =>
                {
                    code2ExecutesAfterCode1 = code1;
                    code2 = true;
                    Thread.Sleep(500);
                    return s + "2";
                });

        var end = DateTime.Now;

        // the result should be "Hello1" (the value created by the first call)
        Assert.AreEqual("Hello1", result);

        // the first code fragment has been executed
        Assert.IsTrue(code1);

        // the second code fragment has been executed (this
        // would be "false" if the second invoke of GetOrAdd
        // would wait for the first to complete)
        Assert.IsTrue(code2);

        Assert.IsTrue(code2ExecutesAfterCode1);

        // the second call spent its own 500 ms creating a value
        Assert.IsTrue((end - start).TotalMilliseconds > 450);
    }

As you can already see from the comments and asserts, GetOrAdd executes the value generation for the key “Hello” twice. This is because the first result has not yet been added to the internal value collection when the second call starts. Even more disturbing for me: the second call provides a delegate that returns “Hello2”, and that delegate is executed, but the return value is “Hello1” (even though “Hello2” is the “newer” value).

It depends on the timing:

0 ms We start a thread that takes 500 ms to generate a value for “Hello”. Only when the generation has finished will that value be added to the internal collection of the ConcurrentDictionary.
100 ms While this value is still being generated, we request the value for the key “Hello” again. This looks up the key “Hello” in the internal collection, does not find anything, and starts the value creation a second time.
500 ms The value from the first call (running in the additionally started thread) is added to the internal collection.
600 ms The second call finishes generating its value and tries to insert it into the internal collection, but the value from the first call is already there. In this case it does NOT update the value inside the ConcurrentDictionary, but returns the value from the internal collection instead (“Hello1”).

In this case the generation of the second value was a total waste of time. If the second call ends before the first one, both calls will return the second value. Which value becomes “the one” depends on when each call starts (before a value has been added for the key) and when it ends.

If you use the ConcurrentDictionary as simple storage for a server-side cache, you might get into trouble when your application starts up and receives multiple requests that need potentially cached information. They will all start at nearly the same time, so requesting a page 10 times a second that needs 2 seconds to get some “normally cached” data will cause 20 concurrent queries for that data, which might not be what you really want. You should carefully design and test your caching. There might be an open source caching library that already implements what you need, so you don’t risk wasting that amount of CPU time and IO load.
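
One common way to avoid the duplicate value generation is to store Lazy<T> wrappers in the dictionary instead of the values themselves. The following is only a minimal sketch of that idea, not a full cache: the class name LazyCacheSketch and the method GetOrCreate are made up for this illustration. GetOrAdd may still create more than one Lazy<T> wrapper under contention, but creating a wrapper is cheap, and only the wrapper that wins the race ever runs the expensive factory.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper for illustration only (names are not from any library).
public static class LazyCacheSketch
{
    private static readonly ConcurrentDictionary<string, Lazy<string>> Cache =
        new ConcurrentDictionary<string, Lazy<string>>();

    public static string GetOrCreate(string key, Func<string, string> factory)
    {
        // Several threads may build a Lazy wrapper here, but only one wrapper
        // is stored in the dictionary, and every caller receives that one.
        var lazy = Cache.GetOrAdd(
            key,
            k => new Lazy<string>(
                () => factory(k),
                LazyThreadSafetyMode.ExecutionAndPublication));

        // ExecutionAndPublication lets exactly one thread run the factory;
        // other threads asking for the same key block until the value exists.
        return lazy.Value;
    }
}
```

With this sketch the second caller waits for the first value instead of generating its own. Be aware of the trade-off: in this mode a factory that throws has its exception cached by the Lazy<T> instance, so a real cache would need to remove failed entries.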

I’ve talked about the “first issue” with using this class in a self implemented cache – so what’s the “second issue”? The second issue is security related and I will post that in a week or two (depending on my other workload).


PDF as the new main document format for Microsoft Word?

April 8, 2013

Recently a friend of mine told me that Microsoft Word is now capable of using PDF as its persistence format: you can now save documents as PDF and you can open PDF documents for editing. Because of this he assumes that PDF is not a secure file format for transferring documents that must not be altered, unless you protect it with a password.

This got my attention because of some issues I have with this “information” – I’ll just post the two most important:

  1. PDF is not a format that was designed for saving text documents that will be edited again. PDF has been enriched with a lot of things that are contrary to its initial purpose (describing the output of a formatted document in a device-independent and durable way). I’ve seen PDF documents that actively call web services like applications do. IMHO it’s NOT a good idea to use things for a certain job just because there is a way to use them. I try to always use tools for the things they are designed for (e.g. I don’t try to force a nail into a wall using my cordless screwdriver, even if it would work). Opening a PDF to edit it is like opening an EXE with a decompiler to add a new feature.
  2. Simply saving a document as PDF never was a way to secure the content from being altered. Adding a password does not really make this better (http://www.pdfunlock.com/). To protect something against manipulation the correct tool is a time-stamped signature, not a password or a specific file format.

So what I want you to take from this post:

  • Use tools the way the tool-designer had in mind!
  • If you are not 100% sure which tool to use: RTFM – don’t ask your neighbor … he might not have read the FM ;-)

Clean Architecture Design – the weakness of “Defer Decision”

February 27, 2013

In this video (http://www.youtube.com/watch?v=asLUTiJJqdE) “Uncle Bob” (Robert C. Martin) talks about clean architecture. While I agree completely with many of his statements (mainly because my experience tells me that many things simply are true, whether you believe in them or not), there are some things I really do not agree with. One of them is “Defer Decision of Database and Data Model”. But wait a moment before you scream “But deferring decisions is what that whole talk is about!”, because I think it’s not.

The key point is not to decide late on one technology or another. You can build a very clean architecture even if some technology decisions have already been made. You can build a clean architecture in a project environment where MS-SQL is mandatory for business data storage, where you only have .NET developers with web experience and IE is the only browser allowed on the client PCs. The key is not to avoid having made those decisions; the key is to not let them dominate your mind while architecting the system.

Another issue I see with his approach (building the business use cases upfront) is that you need to decide e.g. the maximum length of a name, street etc. in order to be able to store it somewhere in a structured way. If you start by designing use cases, you might analyze the sample data of those use cases and design your system using that data. For simplicity I will assume something really stupid outside the software development world: I will analyze the specification of a “house”. The specification says: “we need some doors”. Well, that’s ok, but what size of doors? Well, we will “unit-test” the door by letting all the project members go through that door, and we see that 90 % of them fit through a door with a height of 1.80 m. Well, that’s nice, isn’t it? We can buy 20 of those 1.80 m doors and put them into the building, since we know the people that will use this house. We even did a better job than the 80/20 rule, which states that the last 20% of your goal might cost you 80% of the price, so an 80% match might be “good enough” (test went green).

Obviously we made the wrong decision, and it will be very costly to fix, because we might also have made the wrong decision about the room height. So what went wrong in this stupid example? Simply this: we concentrated on a specific small group of test cases that are part of our “current use cases”, and we didn’t expect the test data to be incomplete, because it matched the use cases. How could we have prevented that issue? There are multiple ways:

  1. Ask someone with experience. In software design this might be an issue: someone has that experience only if there is already a system that does the same thing as your planned software, and in that case, why are you designing a new system? If that existing software does not fit, the experience of the person who built it might not be such good guidance.
  2. Wait as long as you can before deciding which door to order. Sometimes that’s not an option: to size the height of the rooms correctly, you will need to make assumptions early. You might defer the decision about the door, but then this decision will be made implicitly, without careful consideration. You might hope that this implicit decision (door height < room height) is a good one … but: “Hope is not a management tool!”
  3. Look for better test data. That’s the way I would prefer, but the problem in this case is “What is better test data? And how do we ensure that this new test data is not restricted to the current use cases, like the data before?”

Really good data models don’t need to include each and every aspect you might need in the future. There is a very simple rule to follow that will allow you to design systems that last a very long time with very little redesign: “Do not design things that are not true!”. If you face such a decision about the height of a door, ask everyone in the team: what would make our assumption “people need a max. door height of 1.80 m” wrong? They will tell you that there might be even taller people in other projects. Then you should consider sticking to a standard (and for nearly everything there is a standard) or start researching people’s height.

But back to the database decision: how would an early decision on a database and its structure help you? Wouldn’t it hurt? Well, it will not hurt if you already have such a decision in place (most enterprises do have policies about that), and the chance is really low (if not equal to zero) that you will have to change that database system if you are writing in-house software. Will it give you an advantage? Yes: it will allow you to start modelling and to store information in a persistent way that lasts for more than one iteration, and it will force you (and your customer) to start re-thinking your model. And that’s what it’s really all about: think carefully about your model decisions and do not model anything that’s not 100% real. If thinking about the model takes more time, delaying decisions makes sense. But deferring decisions on its own does not add any value to the architecture … you can make bad decisions at any time if they do not match reality.


Adventures in Windows 8 Sideloading

November 29, 2012

Windows Store Apps (formerly known as “Metro Apps”) are meant to be distributed via the Windows App Store, but in contrast to Apple, Microsoft provides a supported way to install such apps without going through the App Store: sideloading. This feature was designed for enterprises that have their own software development team and need 100% control over the intellectual property inside the software.

The first drawback of the sideloading feature is that it’s available out of the box only on domain-joined computers. This means that you cannot use sideloading on Windows editions that cannot join a domain (e.g. Windows Home). This includes Windows RT, too. There is a product called Sideloading Activation Key, available in bundles of 100 keys for about US$3000, offered to enterprises that want to implement BYOD with Windows RT devices like the Microsoft Surface.

The next issue was somewhat more cumbersome: you need to digitally sign the installation package (the APPX package) with a trusted certificate. This seems easy if and when you can use a company CA. But if you have two different customers that want to use your app and you cannot use their CA to sign your package, you probably don’t want them to install your CA certificate as a trusted root CA (this would be a really bad idea from the security perspective).

So you need another commonly trusted CA to sign your Code Signing Certificate, and this is where commercial CAs come into the game.

Commercial CAs provide a process in which a Code Signing Certificate is issued after your identity has been validated. In the case of Windows Store Apps distributed via the Microsoft App Store this CA is Microsoft itself, so when you upload your app to the store, it will be digitally signed by Microsoft and everything is fine. If you want to sign for sideloading you need to buy a Code Signing Certificate from a commercial CA. But when you search for Code Signing Certificates, you will only find ones for Microsoft Authenticode or Windows Phone … not for Windows 8 Apps.

Because no one seems to guarantee that a certificate will work with Windows Store Apps, I decided to buy one from Thawte. I had already had positive experiences with them in the past, so I thought that their support personnel would be helpful if something did not work. And of course: the certificate did not work.

We had some interesting experiences with Visual Studio (the error message was that the certificate does not meet “the requirements”, with no word about which specific requirement) and with Microsoft support (I don’t want to go into details about that here, just to say that after some tries with the US support, the German support engineer was really helpful), but in the end we spotted the issue: there was an additional “Extended Key Usage (EKU) OID 1.3.6.1.4.1.311.2.1.22” in the Thawte certificate. The support engineer at Thawte simply issued me a Verisign code signing certificate (Verisign and Thawte are both owned by Symantec, so this is not a real issue for them), which does not contain that EKU.

At the time of this post Thawte is not able to provide a certificate without that EKU, but the Verisign certificate simply works. I will post an update when I get the info from Thawte that they support this kind of certificate. Until then you should stick with Verisign certificates, even though they are much more expensive.
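
Since the error message does not say which requirement failed, it can help to list the EKU OIDs of a certificate yourself before trying to sign with it. The following is only a sketch of how that might look with the .NET X509Certificate2 class; the class name EkuSketch and its members are made up for this illustration, and the only OID it knows about is the one that caused trouble in my case.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper for illustration only.
public static class EkuSketch
{
    // The EKU OID that was present in the certificate that did not work.
    public const string ProblematicEku = "1.3.6.1.4.1.311.2.1.22";

    // Returns the OID values of all Extended Key Usage entries of the certificate.
    public static IReadOnlyList<string> GetEkuOids(X509Certificate2 certificate)
    {
        return certificate.Extensions
            .OfType<X509EnhancedKeyUsageExtension>()
            .SelectMany(extension => extension.EnhancedKeyUsages.Cast<Oid>())
            .Select(oid => oid.Value)
            .ToList();
    }

    // True if the certificate carries the EKU described in this post.
    public static bool HasProblematicEku(X509Certificate2 certificate)
    {
        return GetEkuOids(certificate).Contains(ProblematicEku);
    }
}
```

You would load your own certificate file into an X509Certificate2 instance and pass it to HasProblematicEku. Whether other EKU entries cause the same error is something I have not tested.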


OpenSource – take, then give…

October 5, 2012

At work (@ SDX) we are currently working on a Windows 8 App using Callisto as a component (actually we are using NuGet to include the library, so that we can update it easily and use it with a CI build process via TFS 2012). Today just before lunch I updated the package and got an “OverflowException” from a “ColorBrightnessConverter”. We don’t use that class in our markup, so I tracked the usage down to the SettingsFlyout class from Callisto that we do use in an auto-generated settings page (more on that in a later post). Unfortunately the markup also comes directly from Callisto, so we cannot fix it directly. The issue was a conversion from string to double using the code line

var factor = System.Convert.ToDouble(parameter);

You might realize the issue that we ran into when I show you the fixed line:

var factor = System.Convert.ToDouble(parameter, CultureInfo.InvariantCulture);

The only little problem was a missing hint telling the conversion method which culture to use. Because the original code does not provide a culture, Convert.ToDouble() assumes we want to use the current app’s language. And because I was running that code on a German-language Windows 8 installation with a manifest allowing German and English, it simply assumed that I wanted the German number format, which interprets the “.” character as a thousands separator (to allow additional languages you can open the manifest using the “XML (Text) Editor” of Visual Studio and add more entries to the “<Resources>” tag; I added “<Resource Language="en" />” and “<Resource Language="de" />”).
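
To see the effect in isolation, here is a small sketch that parses the same text once with the invariant culture and once with an explicitly named culture. The class and method names are made up for this illustration and are not part of Callisto.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class CultureParsingSketch
{
    // Always treats "." as the decimal separator, regardless of the machine's language.
    public static double ParseInvariant(string text)
    {
        return Convert.ToDouble(text, CultureInfo.InvariantCulture);
    }

    // Uses the number format of the named culture, e.g. "de-DE".
    public static double ParseWith(string text, string cultureName)
    {
        return Convert.ToDouble(text, CultureInfo.GetCultureInfo(cultureName));
    }
}
```

ParseInvariant("1.5") yields 1.5 on every machine. ParseWith("1.5", "de-DE") uses the German number format, where “.” is the thousands separator, so the result is no longer the value the author of the string had in mind.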

To fix it, I had to download the code, fix it and compile it. But I also filed a bug report with the line of code that caused the issue and the line of code that solved it. So I first took the Open Source library to have an advantage, but then, when I was able to, I gave something back: a bug fix … and that’s something that I think you should do, too. If you find some useful code that’s Open Source, you should use it. And when you find a way to contribute to that project, simply try to be constructive in the things you give to the project. There’s no honor in shouting at the project owner that she/he missed one thing in a project. Offer them a way to improve the project with minimal impact, for example by telling them how changing a single line of code helped you fix an issue.

By the way: Tim Heuer responded really fast to that issue report and included the fix in less than 6 hours. Considering that the project is provided for free and maintained in his free time, this is amazingly fast. Thanks a lot, Tim!


Simpler configuration sections

September 17, 2010

Following the documentation of the ConfigurationSection class is a good idea as long as you want a sophisticated, full-blown and end-user-friendly configuration section. But sometimes you just want a way to persist some information inside a config file, and then simple deserialization may be a better alternative.
I want the following configuration section to switch a feature on or off:

<MyConfig>
  <SuppressAll>false</SuppressAll>
</MyConfig>

To represent this in code I write this simple class:

public class MyConfig
{
    public bool SuppressAll { get; set; }
}

Now I need to implement a reader-class for the config section:

using System.Configuration;
using System.Xml;
using System.Xml.Serialization;

public class ConfigReader : ConfigurationSection
{
    // holds the deserialized config; filled by DeserializeSection
    private static MyConfig current;

    protected override void DeserializeSection(XmlReader reader)
    {
        var serializer = new XmlSerializer(typeof(MyConfig));
        current = (MyConfig)serializer.Deserialize(reader);
    }

    public static MyConfig Current
    {
        get
        {
            if (current == null)
            {
                // triggers DeserializeSection, which sets "current"
                ConfigurationManager.GetSection("MyConfig");
            }

            // fall back to default values if the section is missing
            return current ?? new MyConfig();
        }
    }
}

What’s that? Well, it’s a “fake” config section. Inside the config I’ll define the section like this:

<configSections>
  <section name="MyConfig"
           type="Sem.GenericHelpers.Contracts.Configuration.ConfigReader, Sem.GenericHelpers.Contracts"/>
</configSections>

As shown here, I configure the ConfigReader class as the “handler” for the config section. This causes the configuration manager to instantiate a ConfigReader and call “DeserializeSection” as soon as I call ConfigurationManager.GetSection(“MyConfig”) … which is done in the “Current” property. Instead of reading each node I simply deserialize into my target class and set a static instance of that class, which is returned by the “Current” property.
I would call the reader a “fake” config section, because it is initialized like a config section, but the application does not get the config section when calling ConfigurationManager.GetSection(“MyConfig”), nor does this class have properties that correspond to the config section XML. It’s simply a generic reader for the config section that fools the ConfigurationManager in order to get the virtual DeserializeSection method called with the XML from the config file.
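
If you want to try the deserialization step without a config file, the following sketch does the same thing DeserializeSection does above, but with an XML string as input. The class DeserializeSketch and its method FromXml are made up for this illustration; MyConfig is the class shown earlier.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class MyConfig
{
    public bool SuppressAll { get; set; }
}

// Hypothetical helper for illustration only.
public static class DeserializeSketch
{
    // Reads a <MyConfig> element from the given XML text into a MyConfig instance.
    public static MyConfig FromXml(string xml)
    {
        using (var stringReader = new StringReader(xml))
        using (var xmlReader = XmlReader.Create(stringReader))
        {
            // Same two lines as in DeserializeSection; only the source of the reader differs.
            var serializer = new XmlSerializer(typeof(MyConfig));
            return (MyConfig)serializer.Deserialize(xmlReader);
        }
    }
}
```

Calling DeserializeSketch.FromXml("<MyConfig><SuppressAll>true</SuppressAll></MyConfig>") returns an instance with SuppressAll set to true. The root element name has to match the class name, which is also why the section is called “MyConfig”.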

You might also go one step further:

public class ConfigReader<TResult> : ConfigurationSection
    where TResult : class, new()
{
    private static TResult current;
    private static object sync = new object();

    protected override void DeserializeSection(
                           System.Xml.XmlReader reader)
    {
        var serializer = new XmlSerializer(typeof(TResult));
        current = (TResult)serializer.Deserialize(reader);
    }

    public static TResult Current
    {
        get
        {
            if (current == null)
            {
                lock (sync)
                {
                    if (current == null)
                    {
                        ConfigurationManager
                          .GetSection(typeof(TResult).Name);
                    }
                }
            }

            return current ?? new TResult();
        }
    }
}

This config reader is really generic. You would call it like this:

var suppressAll = ConfigReader<MyConfig>.Current.SuppressAll;

The name of the section is determined by the name of the class TResult. You can simply define classes in your code (XML-serializable classes) that derive from whatever you want and load the content of those classes from the config.

The drawback of that technique is that you do not have additional configuration features like validation – attaching an attribute “[LongValidator(MinValue = 1, MaxValue = 1000000, ExcludeRange = false)]” will simply do nothing. But the benefit is: MUCH less code ;-)

