At my company I took it upon myself to start doing some prototype projects, and the first was a new type of location-based browsing application. I decided to build it with Backbone.js, since the app is very JavaScript-heavy. Now, I have very little frontend experience, so it wasn't the smoothest process, but I found it a lot more reasonable than I expected. Here are some things I noticed as I worked on it.
Recently at work I've been tasked with migrating from our hosted Endeca solution to ElasticSearch. We chose ElasticSearch because it's free, easy to set up, has easy replication, has facet support out of the box, and has good .NET libraries available. The migration is now done for the most part and I'm happy with the results, but it wasn't all smooth sailing.
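For a sense of what the out-of-the-box facet support looks like, here's a rough sketch of a terms-facet request body from the facets API of that era; the index and field names are made up for illustration:

```javascript
// Sketch: an ElasticSearch terms facet request body (old facets API).
// 'category' is a hypothetical field; the structure is what matters here.
var facetQuery = {
  query: { match_all: {} },
  facets: {
    categories: {
      terms: { field: 'category', size: 10 }
    }
  }
};

// This body would be POSTed to something like /products/_search.
console.log(JSON.stringify(facetQuery, null, 2));
```

The response then carries a facets section alongside the normal hits, listing each term with its document count.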
Perhaps because I'm relatively new to the JavaScript world, dealing with serialization has been tough for me. I needed to serialize objects from Mongoose into Redis for caching, and it was proving difficult for a couple of reasons:
- Dates aren't serialized properly to JSON because the specification has no date type.
- Mongoose documents are usually wrapped in a Model, and you don't want to serialize the whole Model.
So this code would be a problem:
JSON.parse(JSON.stringify(model)).datetime.getMonth();
because datetime would come back as a string. Mongoose models already handle JSON serialization and only serialize the document, so at least that portion is seamless. Unfortunately, deserializing back requires a bit of legwork.
A Mongoose Model inherits from Document, which holds the raw data, and you can reconstruct the model from the document using the Model constructor. It isn't exactly what I hoped for, but it at least gets the job done. I'm doing this instead now:
return new Post(JSON.parse(json)); // Post is a Mongoose Model
which will rehydrate the model from the JSON as I would expect, dates and all. I don't know what the performance implications will be under load testing, but I don't have high hopes.
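If you ever need the dates back without rehydrating a full Model, JSON.parse also accepts a reviver; this is a general-purpose sketch of that idea (the regex and function are my own, not something Mongoose provides):

```javascript
// Sketch: revive ISO-8601 date strings into Date objects during JSON.parse.
var isoDate = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

function dateReviver(key, value) {
  // Only convert strings matching the format JSON.stringify emits for dates.
  if (typeof value === 'string' && isoDate.test(value)) {
    return new Date(value);
  }
  return value;
}

var json = JSON.stringify({ datetime: new Date(Date.UTC(2011, 10, 15)) });
var obj = JSON.parse(json, dateReviver);
console.log(obj.datetime instanceof Date); // true
```

The obvious caveat is that any string that happens to look like an ISO date gets converted too, so it's a pragmatic hack rather than a general fix.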
Apparently the inheritance setup is very important for handling serialization, something I've taken for granted in OO languages, which are usually very good at efficiently serializing only what you need in both directions.
Don't keep key/value pairs as arrays in your documents if you can help it. Imagine you have a list of key/value pairs, perhaps to store extra attributes about an entity. You might decide on something like this:
{
  id: 1,
  attributes: [
    { name: 'key', value: 'value' },
    { name: 'key2', value: 'value2' }
  ]
}
This will be fine if you never, ever plan on querying the data inside the attributes field. If you do need to query it, maybe in map/reduce, you'll end up spending a lot of CPU time iterating through the array. You might want to save yourself the trouble and store it as an embedded object:
{
  id: 1,
  attributes: {
    key: 'value',
    key2: 'value2'
  }
}
Then when you iterate over the collection you can do something like:
if (this.attributes.key) doSomething();
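If you already have documents stored in the array shape, the transform to the embedded-object shape is small; here's a minimal sketch in plain javascript (persisting the updated document back to Mongo is left out):

```javascript
// Sketch: fold an array of { name, value } pairs into an embedded object.
var entity = {
  id: 1,
  attributes: [
    { name: 'key', value: 'value' },
    { name: 'key2', value: 'value2' }
  ]
};

var embedded = {};
entity.attributes.forEach(function (attr) {
  embedded[attr.name] = attr.value;
});
entity.attributes = embedded;

console.log(entity.attributes.key); // 'value'
```

After the rewrite, lookups become direct property accesses instead of linear scans.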
I recently started playing around with pygrametl to see what it's like to deal with data warehouses, and found some rough spots in getting it going with MSSQL. pygrametl supports any database driver as long as it's PEP 249 compliant, or a JDBC driver if you're using Jython. First I tried the Microsoft JDBC driver under Jython, but that didn't work because getParameterMetaData() generated invalid SQL for some odd reason. Next I moved on to adodbapi, which is part of pywin32, and fortunately that worked well.
The unfortunate problem with this combination was that it was pretty slow under CPython. Since ADO doesn't support threading (I feel like most DB drivers don't), I couldn't spin off threads to speed things up, so I settled on trying to get it running under IronPython, which supports adodbapi, for the most part. One thing that got me was that adodbapi wouldn't handle None properly when creating empty parameters under IronPython, which is what pygrametl sets everything to by default. The error I'd get was similar to this:
[Microsoft][ODBC SQL Server Driver]Invalid use of default parameter
which was pretty baffling. Through a lot of debugging and trial and error I worked around it by setting the defaults to DBNull.Value, which seemed to work reasonably well, but I did have to add platform checks to the pygrametl source, which will be slightly annoying to maintain going forward. At least it's about 30-40% faster than CPython, so it'll be worth it.
Overall I'm pretty impressed with the tool. It's easy to maintain and a lot easier to source control than SSIS packages. The developers are still maintaining it and are planning to add threading, so the performance should improve hugely once they get that out the door.
Lameblog is now compatible with Node 0.6.0. It was mostly a smooth experience, except the time module didn't work in 0.6.0, so I had to switch to zoneinfo. Unfortunately, the error message didn't help much; it just says:
FATAL ERROR: v8::HandleScope::Close() Local scope has already been closed
No stack trace, nothing. Maybe there's some debug mode that would help out, but I haven't explored that. I had to just comment out code to narrow down what was causing the failure. I can imagine that in bigger projects this would suck pretty hard.
I've decided that I'll start blogging with LameBlog now (link to the code at the bottom). Hopefully I don't end up regretting it :). It was a fun NodeJS project, and I'll probably continue making small refinements and additions. Here's what I've learned so far:
- JavaScript is a powerful language for sure, but if it weren't used all over the web I probably wouldn't bother with it on the backend. It has too many design decisions that make it tougher than it needs to be on large projects (e.g. no static typing, prototypal inheritance, no proper namespaces/packaging).
- As an addendum to the previous point, because of those design choices you get no IntelliSense, which makes it a bitch to work with when starting out.
- Node is a really slim framework, and while it's great to work on a platform that doesn't have so many abstractions, it's nice to have the robust features you come to appreciate, like security, validation, etc. This will probably change once Node matures.
- Because Node is relatively new, there is a lot of churn over what will become the most popular design patterns. If you don't already have a good grasp of today's web frameworks, you could easily end up writing really terrible code.
- Unit testing is currently quite tedious to do, but at least there are some good frameworks in progress, such as Vows and Jasmine.
On the flip side, some of the things I loved about node:
- It’s really slim in terms of memory usage, and surprisingly fast for javascript.
- Since it’s javascript, it can leverage a whole bunch of existing code.
- The community around it is developing a lot of cool design patterns.
- It plays really well with other JavaScript-speaking systems, such as Mongo/Couch, REST JSON APIs, etc.
I'm definitely going to keep Node in my toolbelt, especially since it can bridge the gap toward a universal web language (although we'll see what Google's Dart does in the next few years), and it has its uses for sure.
Currently I'm involved in a project to create a DSL (domain-specific language) for one of our internal processes, one that could be updated in real time. Our original idea was to use Boo, as it's a well-supported .NET language, but because it has to be compiled into an assembly, updating code in real time was difficult: creating assemblies leaks memory unless you create them in a separate AppDomain, and hosting them in another AppDomain can lead to a whole other set of complications around integration.
Fortunately, I came across a pretty useful tool called Jint, a javascript interpreter for .NET. What's great about it is that, because it's an interpreter, it doesn't have the memory-leak issue: no assemblies are generated. Combined with CoffeeScript, this allowed us to create a DSL that was manageable for our business users. Overall, it was surprisingly simple to get going once I got past a small learning curve, which I'll explain.
The biggest downside for us is that, since we're using javascript, we don't get the benefits of a compiler, such as validated variable references, so all our scripts need to be tested extensively. It is much easier to integrate, however, as it lets you expose functions and variables without any fuss. Unfortunately, the library has many rough edges, including but not limited to the following:
- Functions with a params argument at the end are tough to integrate. You have to create a CLR Array in the javascript code, which can be cumbersome.
- Referencing enums by their full class name is unreliable; instead, I expose each enum value as a parameter to the engine using a loop.
- You can’t create an Array easily, because Jint doesn’t interpret the square brackets in the “new” call.
- For some reason, calling .Run(script) vs. CallFunction(script) results in different error handling. With .Run you get a somewhat useful stack trace, while CallFunction yields nothing at all when your code fails. I haven't had a chance to see why, but you can easily invoke a function via .Run by making a function call your entire script.
It's also not particularly well documented, but the authors do seem to respond to comments in a timely manner. The performance is surprisingly good for an interpreter, in my opinion, so I'll be using it for sure. If you have a DSL to implement that doesn't require updating code without restarting, I'd probably stick with something like Boo, since compiled .NET code is much nicer to work with.