Externalizing parts of the logic

In SemSync, much of the logic parses web sites in order to extract information. This logic changes relatively often: every time one of the target web sites changes. The parsing was already implemented with regular expressions so that the extraction rules could be changed quickly and flexibly, but I still had to publish a new setup to compensate for each web page change.

Now I have decided to go one step further: the regular expressions are downloaded from my web page as XML and deserialized into a WebSideParameters object. This way I can simply update the XML on my web page instead of publishing a new setup:

protected virtual WebSideParameters WebSideParameters
{
    get
    {
        // Download and deserialize the parameters only once; 'parameters' is the caching backing field.
        if (parameters == null)
        {
            var parameterFileName = this.FriendlyClientName + ".xml";
            var parameterFile = this.httpRequester.GetContent("http://www.svenerikmatzen.info/WebScrapingParameters/" + parameterFileName);
            parameters = Tools.LoadFromString<WebSideParameters>(parameterFile);
        }
        return parameters;
    }
}
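The post does not show the XML format, the members of WebSideParameters, or how Tools.LoadFromString is implemented. The following is a minimal sketch of how the pieces could fit together, assuming an XmlSerializer-based LoadFromString. The property names ContactPattern and NameGroup, the sample XML, and the ParameterDemo helper are all hypothetical, chosen only to illustrate downloading a regex as XML and applying it to a page.

```csharp
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.Serialization;

// Example parameter file (hypothetical content; the real SemSync files may differ).
// CDATA avoids having to escape '<' inside the regular expression:
// <WebSideParameters>
//   <ContactPattern><![CDATA[<li class="contact">(?<name>[^<]+)</li>]]></ContactPattern>
//   <NameGroup>name</NameGroup>
// </WebSideParameters>

// Hypothetical shape of the parameter class. XmlSerializer needs a public
// type with a parameterless constructor and public read/write properties.
public class WebSideParameters
{
    // Regular expression that matches one entry per hit on the target page.
    public string ContactPattern { get; set; }

    // Name of the regex group that captures the value to extract.
    public string NameGroup { get; set; }
}

public static class Tools
{
    // Sketch only: deserializes an XML string into an instance of T via XmlSerializer.
    public static T LoadFromString<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}

public static class ParameterDemo
{
    // Applies the downloaded pattern to the page content and collects the captured values.
    public static string[] Extract(WebSideParameters parameters, string html)
    {
        MatchCollection matches = Regex.Matches(html, parameters.ContactPattern);
        var values = new string[matches.Count];
        for (var i = 0; i < matches.Count; i++)
        {
            values[i] = matches[i].Groups[parameters.NameGroup].Value;
        }

        return values;
    }
}
```

With this sketch, fixing a broken extraction rule only means editing the pattern inside the XML file on the server; the compiled code that calls Tools.LoadFromString and Regex.Matches stays unchanged.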

WebSideParameters is a protected virtual property of the web-scraping base class, so an implementation can override it and host the parameters (including the regular expressions) itself; that way you don't need to mail me your XML so I can host it on my web site ;-). “Tools.LoadFromString” deserializes a string into an object.
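Overriding the property could look roughly like this. The base class below is a hypothetical, stripped-down stand-in (the name WebScrapingBase, the DownloadParameters method and the CurrentPattern property are assumptions, not SemSync's actual API); it only keeps the lazy-loading getter so the override has something to replace. The derived client supplies its parameters directly instead of fetching them from the central web page.

```csharp
using System;

// Hypothetical, reduced parameter class (see the real SemSync type for actual members).
public class WebSideParameters
{
    public string ContactPattern { get; set; }
}

// Hypothetical stand-in for the web-scraping base class.
public abstract class WebScrapingBase
{
    private WebSideParameters parameters;

    public abstract string FriendlyClientName { get; }

    // Default behaviour: lazily load "<FriendlyClientName>.xml" once and cache it.
    protected virtual WebSideParameters WebSideParameters
    {
        get
        {
            if (this.parameters == null)
            {
                this.parameters = this.DownloadParameters(this.FriendlyClientName + ".xml");
            }

            return this.parameters;
        }
    }

    // Placeholder for the HTTP download from the central web page; not implemented in this sketch.
    protected virtual WebSideParameters DownloadParameters(string fileName)
    {
        throw new InvalidOperationException("Download not implemented in this sketch: " + fileName);
    }

    // Exposes the active pattern so callers (and tests) can inspect it.
    public string CurrentPattern => this.WebSideParameters.ContactPattern;
}

// A client that hosts its own parameters and never touches the central XML file.
public class MyOwnClient : WebScrapingBase
{
    public override string FriendlyClientName => "MyOwnClient";

    // Overrides the virtual property with locally defined parameters.
    protected override WebSideParameters WebSideParameters { get; } = new WebSideParameters
    {
        ContactPattern = @"<li class=""contact"">(?<name>[^<]+)</li>"
    };
}
```

Because the override replaces the whole getter, MyOwnClient never calls DownloadParameters; a real implementation could just as well load the XML from an embedded resource or its own server and still reuse Tools.LoadFromString.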

