Monthly Archives: June 2013

Node.js Part 2 – Asynchronous Calls in Node

“There’s no such thing as a free lunch” – El Paso Herald-Post, June 27th, 1938

The etymology of this quote, like many others, is not exactly clear-cut.  However, its meaning and truth are indisputable.  OK, you can dispute it, but even doubters would be forced to admit that it is very rare to get something good in return for giving nothing.  The same is true of obtaining the real benefits of Node.js.  You cannot reap the harvest without first doing some sowing of your own.  You need to commit to one thing if you are going to use Node in the spirit of its design – you have to be willing and able to make your blocking calls (mostly IO-related) in an asynchronous manner.  Furthermore, there is a wrong way to do this and multiple right ways to do it.  This article explains the difference and shows how asynchronous calls can be made correctly.

To write a non-trivial Node program one must understand something of how Node works, but not every detail.  A typical user application is more than just a single program; it is a composite of one or more discrete, interlocking bundles of code.  A Node application is no different in this regard.  To deeply comprehend Node without abstracting our understanding, we’d all have to go learn the internals of Node’s component pieces – Google’s V8 JavaScript engine, the libuv library and a bunch of Node binding libraries written in C and JavaScript.  Fortunately (although study of these things would be interesting indeed) we don’t have to do this.

To understand Node well enough to really use it, we can rely on an abstraction of how it works.  We could think in terms of Event Loops, Event Queues, Event Emitters and function callbacks.  In fact, a lot of articles attempting to describe the way Node works simply throw up a few of these terms and continue with something to the effect of, “You are undoubtedly familiar with events and callbacks since you encounter these ideas when writing JavaScript for a browser”.  It’s a bit of a cop-out, but understandable – they are trying to abstract for you how Node works without going into gory details you don’t have to know.  If you are like me, explanations like this can leave you feeling vaguely uneasy.  They either go too far or don’t go far enough to provide a solid abstraction that one can use to understand the interface between a Node program and Node itself.

For the purposes of this article, we’ll use a simpler abstraction – one that focuses solely on the interaction between Node and your program code.  Node can be thought of as an Event Creator and Manager.  Every time your code makes an asynchronous call, Node creates an associated event that your code will take advantage of.   Naturally, a two-way communication must be maintained between your code and this Event Creator/Manager that is Node.  There are two ways in which you as a coder can communicate with Node:

  • Callback functions, which allow Node to “talk back” to your code when an asynchronous function completes.
  • Library calls that you invoke as input to Node, many of which can be blocking calls.

It is these blocking calls that are of interest in this article.  Typically these are IO sorts of calls, but there are others.  Some examples include calls related to files, streams, sockets, pipes and timers.  Node runs your JavaScript on one single thread of execution.  Its job is to move quickly through the ordinary parts of your program code (while also serving all the other requests the same Node process is handling) and to queue up asynchronous events for the blocking work.  This event queuing is part of a mechanism that Node uses to off-load blocking work onto threads outside your program’s single thread.  As a coder, when you make a blocking call in your code, you have choices about the manner in which you invoke it:

  1. Usage of asynchronous vs. usage of synchronous function calls (asynchronous is thematic).
  2. Assuming asynchronous calling is chosen, how you control the order of execution for calls that are dependent upon each other.
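To make the asynchronous choice and the two directions of communication concrete, here is a small sketch (not from the original article) using Node's built-in fs.stat: the library call is the input to Node, and the function passed as its last argument is the callback Node uses to talk back. Checking the operating system's temp directory is an arbitrary choice for illustration.

```javascript
var fs = require('fs');
var os = require('os');

var order = [];

// Input to Node: ask for information about a directory. The function passed
// as the last argument is the callback Node uses to "talk back" when done.
fs.stat(os.tmpdir(), function (err, stats) {
  if (err) {
    throw err;
  }
  order.push('callback ran: is a directory? ' + stats.isDirectory());
  console.log(order.join('\n'));
});

// fs.stat returns immediately, so this line runs before the callback does.
order.push('fs.stat returned');
```

Run it and the "fs.stat returned" line is printed first, even though it appears last in the source: the call hands the work off and returns, and the program's single thread keeps going.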

The use of asynchronous function calls is more than just the subject of this article – it is the heart and soul of Node itself.  The asynchronous execution of long blocking functions – on external threads that your own single program thread never has to worry about – is Node’s whole reason for existence.  This is what makes Node fast, efficient in memory usage and extremely scalable.  But again, there is no such thing as a free lunch – there is a cost for this.  That cost is the necessity of working with an event-driven callback model and taking extra care when calling the IO types of functions mentioned above.

This article will focus primarily on explaining how to correctly use the asynchronous versions of function calls supplied by Node’s function libraries.  To do so, we will be looking at three different program examples.  Each example will be a variation on the same simple program.  It will first create a file and write some information to it, then it will re-open it and append some more information, then it will read that information and present it in the browser.  The three variations of this program will consist of a program using only synchronous calls, a program using only asynchronous calls in a naive manner, and a program using asynchronous calls in a proper manner.  We’ll be looking at the actual runnable code for these a little later on in the article.  I’d like to first present a little analogy that helps explain and justify the need for taking the trouble to set up proper asynchronous processing.  If that sounds like a waste of time to you, then just skip on to the code examination portions of this article.

Here is an analogy that illustrates the differences between a synchronous approach and a proper asynchronous approach.  Let us equate a Node application to a small restaurant, one with only three employees – a Chef, a Waitress and a Manager. Fleshing out the analogy a little further, it is made up of:

  • A Chef – Node itself will play this part.
  • A Waitress – This is represented by the controlling thread of your program.
  • Food Orders – These are the Blocking/IO calls, presented to the Chef by the Waitress.  They are blocking because the Waitress must wait for the food to be prepared before she can serve it.
  • Food – This is data.
  • Customer(s) – This is the remaining code in your program that consumes the “food”.
  • The Manager – This is the coder.  He tells the Waitress and the Chef how they will do their jobs.

We will look at a slow synchronous restaurant, a naively run asynchronous restaurant, and a well run asynchronous restaurant.  For each scenario we will have just one diner ordering the same meal:  An appetizer of onion rings, followed by a main course consisting of a salad, a steak and a baked potato, with a dish of ice cream for dessert.

Let’s look at a synchronous restaurant first.  In this restaurant, the Manager has decreed that the Chef will only work on one menu item at a time.  He figures this will reduce complexity and the possibility of error.  To control this, the Waitress must submit Food Orders consisting of just one item, and she must do so in the order needed to properly serve the diner.  So the Chef makes just one thing at a time instead of being able to prepare multiple things at once.  There is a controlled ordering of the food items to prepare, but each one must be completely finished before the Chef begins working on the next.  In the end, the customer would not be bowled over by the service, but at least the meal would be correct, i.e. it would be delivered in good order and nothing would get cold or melt.  Consider, however, how well this restaurant would do if it got busy.  One customer’s meal would be prepared at a time, and even that in discrete sub-steps.  Service would be unacceptably slow.

Now let’s look at a naively run asynchronous restaurant.  The Manager, having been fired from his previous restaurant due to slow service, realizes that this Chef is going to have to be allowed to cook more than one thing at a time.  He tells this Waitress that the Chef must not be kept waiting, and that she must submit all Food Orders as soon as a customer has finished ordering.  However, the Manager still wants to avoid complex communication between the Waitress and the Chef.  Therefore, each Food Order is only allowed to have one item on it.  He figures that she can arrange the food items into meals when delivering them to each diner, since she knows what they ordered.  So the Chef is unfettered in this restaurant.  He can prepare as many items at once as seems reasonable to him, and furthermore he can choose what order to prepare them in.  The results are not good.  He has no idea which items are destined for which customer, or even how many diners there are at the time.  He decides that ice cream is very fast to prepare, so he’ll dish it out first.  After that, he decides to start preparing everything else all at once – there isn’t much to cook.  It turns out that the potato is fastest – the potatoes are pre-cooked, so it gets handed off to the Waitress not long after the ice cream.  The salad follows shortly thereafter.  A bit later, the steak and the onion rings are finished at about the same time, so he sets both at once on the pickup area used by the Waitress.  The end result here is that all of the food making up the customer’s order was delivered pretty quickly, but there were definite problems with the order of delivery.  The customer had a choice of eating dessert first or letting it melt.  He was then presented with a baked potato instead of the introductory salad, which he got next.  After a while, the Waitress served up the rest of the main meal and the appetizer at the same time.

In the end, the customer would be upset and probably never return – that is, if he even stayed for the entire meal.  This naïve asynchronous approach fails at even serving one customer, unless the customer orders something simple like just ice cream, or unless the order of food preparation completion just happens to match the expected order.  In a busy restaurant, the insanity would be even worse – and what’s more, service for some customers would probably not only be badly ordered, but slow.

Now we’ll look at a win – the well run asynchronous restaurant.  In this scenario our intrepid Manager has decided he doesn’t completely know what he is doing.  So he takes some classes and reads a book or two on managing the restaurant business.  He lands another new job, based upon the strength of his new training.  This time things are different.  He realizes that the Chef needs to work on more than one thing at a time, and he also realizes that the Chef has to be given more guidelines as to how he prepares the Food Orders.  He knows he has to allow for a little more complexity in communication between Waitress and Chef.  He tells the Waitress to immediately submit Food Orders containing the entire meal for a diner, and that the order of items within the Food Order must match the expected order of delivery for the meal.  So the Chef is free to prepare more than one thing at a time, and the order in which he may prepare the items of a given Food Order is specified.  Furthermore, since Food Orders contain entire meals and the preparations he begins only roughly have to coincide with the order in which Food Orders are submitted, he is free to work at maximum efficiency while still more or less completing entire orders as they arrive.  It won’t necessarily be first in, first out – part or all of one Food Order might be completed before a prior Food Order.  However, on the whole, Food Orders will more or less come out in the order in which they were submitted – and more importantly, the order of completion of items within an individual Food Order will always be correct.  In the end, every meal is served in the correct order, service is quick, and orders even approximate a first-in, first-out sequence.  Customers are satisfied; Manager and restaurant flourish.

It’s time to look at some code - we will look at three programs that attempt to do the same thing, though not at all in the same way. Each program is written with the idea that it will perform the following three basic steps in the order listed below:

  1. Create a file and add some data.
  2. Append some more data to the file.
  3. Read the full content of the file, then send that content back to the browser, along with the step order and elapsed times that the program recorded as it did its work.

Also, each program makes use of the same two helper functions.  I’ll just define them here once and not show them in each of the program examples, though you will see them being called.  There is nothing of any interest to discuss in these two functions, so I’ll not comment on them.

 

Our first example program is one that does its work by making synchronous file IO calls.  It runs correctly but would not scale well, nor would it play nicely with other requests being served by the same Node process.  Every single file IO call is blocking – the main thread of Node must wait, babysitting this one piece of work.
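The original listing is not reproduced here. As a minimal sketch of the same three steps, assuming a file in the operating system's temp directory and a hypothetical track() function standing in for the article's two helpers (whose code is not shown), a synchronous version might look like this. To keep it runnable on its own, it returns the content and step log instead of sending them to a browser.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical stand-in for the article's helper functions: records a step
// label together with the milliseconds elapsed since startTime.
function track(steps, label, startTime) {
  steps.push(label + ' (' + (Date.now() - startTime) + ' ms)');
}

// Each *Sync call returns only after the disk work is finished, so the order
// is guaranteed, but Node's single thread sits idle during every call.
function runSync(filePath) {
  var startTime = Date.now();
  var steps = [];

  fs.writeFileSync(filePath, 'Original file data\n');
  track(steps, 'Step 1: Create original file data', startTime);

  fs.appendFileSync(filePath, 'Appended data\n');
  track(steps, 'Step 2: Append data', startTime);

  var content = fs.readFileSync(filePath, 'utf8');
  track(steps, 'Step 3: Read and show existing file content', startTime);

  // In the article's program this would be sent to the browser with res.end().
  return { content: content, steps: steps };
}

// The file name is hypothetical.
var result = runSync(path.join(os.tmpdir(), 'node-part2-sync.txt'));
console.log(result.steps.join('\n'));
console.log(result.content);
```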

 

Our second example program is one that does its work by making asynchronous file IO calls, but it does so in a naïve manner.  It simply replaces the synchronous calls with their asynchronous versions.  This is not enough!  Since the calls execute asynchronously, there is absolutely no guarantee that they will complete in the same order in which they were made.  Nothing has been done here to constrain that – the order of completion is undetermined.  Step 2, “Append data”, might occur before Step 1, “Create original file data”.  Alternatively, Step 3, “Read and show existing file content”, might occur before Step 1 or Step 2 is complete, or on a busy system, perhaps before any of their data is in the file at all.
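A minimal sketch of the naive approach, again assuming a temp-directory file and the same hypothetical track() helper: the three calls are simply fired one after another. The small completion counter is only there so the sketch can report when all three are done; it does nothing to control their order.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical stand-in for the article's helpers: records a step label
// together with the milliseconds elapsed since startTime.
function track(steps, label, startTime) {
  steps.push(label + ' (' + (Date.now() - startTime) + ' ms)');
}

// NAIVE: the synchronous calls are swapped for asynchronous ones and all three
// are started immediately. Nothing makes Step 2 wait for Step 1, or Step 3
// wait for either of them.
function runNaive(filePath, done) {
  var startTime = Date.now();
  var steps = [];
  var content = '';
  var finished = 0;

  // Counts completions so results can be reported; does NOT order them.
  function stepFinished() {
    finished += 1;
    if (finished === 3) {
      done(content, steps);
    }
  }

  fs.writeFile(filePath, 'Original file data\n', function (err) {
    if (err) throw err; // threadbare error handling
    track(steps, 'Step 1: Create original file data', startTime);
    stepFinished();
  });

  fs.appendFile(filePath, 'Appended data\n', function (err) {
    if (err) throw err;
    track(steps, 'Step 2: Append data', startTime);
    stepFinished();
  });

  fs.readFile(filePath, 'utf8', function (err, data) {
    // This read may run before the file exists or is fully written.
    content = err ? '(read failed: ' + err.code + ')' : data;
    track(steps, 'Step 3: Read and show existing file content', startTime);
    stepFinished();
  });
}

// The file name is hypothetical.
runNaive(path.join(os.tmpdir(), 'node-part2-naive.txt'), function (content, steps) {
  // Both the step order and the content can differ from run to run.
  console.log(steps.join('\n'));
  console.log(content);
});
```

Run it a few times, or on a busy machine, and you may see steps reported out of order or a read that failed or came back incomplete. It may equally happen to come out right, which is exactly what makes this bug easy to miss.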

 

Our third and final example program is one that does its work by making asynchronous file IO calls, and it does so correctly.  In the absence of special tools for this purpose (more on that later), the correct way to force one asynchronous call to follow another is to chain them.  Let us say, for example, that there are three successive asynchronous calls to be made and each one should only be invoked after the successful completion of the previous call.  What must be done is to put the invocation of the second call in the branch of code that handles the successful completion of the first asynchronous call.  Likewise, the third asynchronous call should only be invoked in the branch of code that handles the successful completion of the second asynchronous call.
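Here is a minimal sketch of that chaining, under the same assumptions as before (a temp-directory file, a hypothetical track() helper, and a hypothetical runChained() function that hands its results to a callback instead of a browser). Each step is started only from inside the success branch of the previous one.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical stand-in for the article's helpers: records a step label
// together with the milliseconds elapsed since startTime.
function track(steps, label, startTime) {
  steps.push(label + ' (' + (Date.now() - startTime) + ' ms)');
}

function runChained(filePath, done) {
  var startTime = Date.now();
  var steps = [];

  // Step 1: create the file.
  fs.writeFile(filePath, 'Original file data\n', function (err) {
    if (err) return done(err);
    track(steps, 'Step 1: Create original file data', startTime);

    // Step 2 starts only inside Step 1's success branch.
    fs.appendFile(filePath, 'Appended data\n', function (err) {
      if (err) return done(err);
      track(steps, 'Step 2: Append data', startTime);

      // Step 3 starts only inside Step 2's success branch.
      fs.readFile(filePath, 'utf8', function (err, content) {
        if (err) return done(err);
        track(steps, 'Step 3: Read and show existing file content', startTime);
        done(null, content, steps);
      });
    });
  });
}

// The file name is hypothetical.
runChained(path.join(os.tmpdir(), 'node-part2-chained.txt'), function (err, content, steps) {
  if (err) {
    console.error('Failed: ' + err.message);
    return;
  }
  console.log(steps.join('\n'));
  console.log(content);
});
```

Note how each level of dependency adds another level of indentation; that shape is the "pyramid" discussed next.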

 

Note the important difference between the properly coded asynchronous methodology and the use of simple synchronous calls.  In the synchronous method, the entire execution thread of Node is forced to move through the blocking calls one at a time, sitting idle while waiting a relatively long time for the results of each blocking call.  With the asynchronous methodology, this program still forces the sequential execution of these calls, waiting for the prior call to finish before beginning the next.  But what a difference!  In the asynchronous case, the single Node thread is free to set up an event for each of these calls in turn, and it does not have to wait for them to return results.  It is free to continue its business of queuing events and making callbacks for other requests – all that IO blocking is done on some other thread that your program knows nothing about, and that Node does not have to sit idly waiting for.

So there are indeed extra precautions to take when making dependent asynchronous calls, and they produce a kind of hierarchical hell (with all the indentation) in the source files that makes them harder to maintain.  There are multiple modules available from Github that attempt to provide solutions for this problem.  They provide an API that allows for a more organized approach to writing this sort of chained code, i.e. avoiding the pyramidal, hierarchical hell.  I will mention one such project, perhaps the most popular, called Async (not Asynch.js, which is something else).  The usage of such modules, and of Async in particular, is beyond the scope of this article.
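Short of adopting a library, one plain-JavaScript way to keep the indentation under control is to give each callback a name and declare the callbacks side by side. The sketch below is an illustration of that technique, not the Async module's API; runFlat() and the temp-file name are hypothetical. The ordering is still enforced, because each named function starts the next step only on success.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// runFlat is a hypothetical example function. Its callbacks are named
// function declarations placed side by side, so the code does not march to
// the right with every step. Declarations are hoisted, which is why
// writeFile can refer to afterCreate before it appears.
function runFlat(filePath, done) {
  fs.writeFile(filePath, 'Original file data\n', afterCreate);

  // Runs when Step 1 finishes; starts Step 2 only on success.
  function afterCreate(err) {
    if (err) return done(err);
    fs.appendFile(filePath, 'Appended data\n', afterAppend);
  }

  // Runs when Step 2 finishes; starts Step 3 only on success.
  function afterAppend(err) {
    if (err) return done(err);
    fs.readFile(filePath, 'utf8', afterRead);
  }

  // Runs when Step 3 finishes; hands the file content to the caller.
  function afterRead(err, content) {
    if (err) return done(err);
    done(null, content);
  }
}

runFlat(path.join(os.tmpdir(), 'node-part2-flat.txt'), function (err, content) {
  if (err) {
    console.error('Failed: ' + err.message);
    return;
  }
  console.log(content);
});
```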

You may have noticed that my error handling in the three examples was either threadbare or non-existent.  Example 1 actually had none – this is a combination of laziness on my part and a desire to keep the examples as simple and pure as possible.  In examples 2 and 3, there is some error handling because the callback design requires it, but it is pretty threadbare.  What, no try/catch blocks?  Well, it turns out that try/catch/finally doesn’t really serve well in the asynchronous coding style that one must adopt for high-volume Node coding.  As of Node 0.8, there is something new called Domains that is meant to address this.  The usage of Domains is also beyond the scope of this article, but here is a link to a pretty good Domain slide presentation by Felix Geisendörfer at Speaker Deck.
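A small sketch of why try/catch is a poor fit: by the time an asynchronous read fails, the try block around the call has long since finished, so Node reports the failure as the first argument to the callback instead. The readInsideTry() function and the deliberately missing path are hypothetical, chosen only for illustration.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical example function: wraps an asynchronous read in try/catch and
// reports whether the catch block ever ran.
function readInsideTry(filePath, done) {
  var caughtByTry = false;
  try {
    fs.readFile(filePath, 'utf8', function (err, data) {
      // The failure arrives here, as the first callback argument,
      // after the try/catch has already completed.
      done(err, data, caughtByTry);
    });
  } catch (e) {
    // Only synchronous problems (such as an invalid argument) land here.
    caughtByTry = true;
  }
}

// A path inside a directory that does not exist, so the read must fail.
var missing = path.join(os.tmpdir(), 'no-such-dir-' + process.pid, 'missing.txt');

readInsideTry(missing, function (err, data, caughtByTry) {
  console.log('caught by try/catch: ' + caughtByTry);      // false
  console.log('error passed to callback: ' + err.code);     // ENOENT
});
```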

I hope this article has been of some use to someone out there.  Remember that if you are writing synchronous, blocking function calls into your Node code, you are doing the wrong thing unless you own the Node instance and you know that it will never have to handle heavy loads.  Asynchronous, event-driven coding for server-side solutions is what Node was invented for in the first place.

Node.js Part 1 – Introduction

“Node news is good news.”

As you might gather from the rather cheesy paraphrasing above, I am very impressed with Node.js.  More than that, I am astonished by the open source community and code projects which have sprung up around Node.  The popularity of Node is exploding, and in this article I will offer you some basic knowledge about Node.js that may pique your own interest in this “hot” server-side development framework.

So then, what is Node.js?  To properly answer that question will take this entire article and perhaps more – but we need to start somewhere, so I am going to list a few truths about node.js that will give you some context upon which to hang the rest of the article:

  • It is JavaScript for the server side – all things related to browser and DOM removed.
  • It is based upon Google’s V8 JavaScript engine, a platform abstraction layer called libuv, and a core library originally written in C but now mostly rewritten in JavaScript.
  • It was created in early 2009 by Ryan Dahl of Joyent (this company still maintains and develops Node.js).  Here is a link to Ryan’s original presentation of Node.js, courtesy of Youtube: http://www.youtube.com/watch?v=ztspvPYybIY
  • It is intended to be used primarily as a high performance back end coding platform that is utilized by front ends like browsers over HTTP or other front end applications using TCP sockets to transmit data to and from the server.
  • It follows an event driven, non-blocking I/O model.  What this means is that the event loop driving Node does not have to wait for IO (typically, IO is glacial in performance compared to other code execution).  This is an asynchronous rather than a synchronous approach, allowing Node to be highly scalable.  (I plan to delve more deeply into asynchronicity with Node.js in a future article.)
  • It’s a hugely popular open source framework upon which hundreds of other liberally licensed, freely available open source frameworks and add-ons have been built.  (See my previous article about Github: Stepping into the New Age of JavaScript using Github.)

Let’s try and defragment the above list a bit by focusing on the potential benefits of using Node.js. There are at least three compelling reasons for using Node.js:

  1. The single most compelling reason for using Node.js as the basis for the server half of a client-server solution is that it is highly scalable and highly performant under heavy request loads.  This holds true even more when the bulk of these individual requests tend to be small rather than large.  Fortunately, that is the usage pattern of many, or perhaps the majority, of web-based applications.
  2. Another often cited reason for using Node.js is that JavaScript is a well known programming language, and usage of Node on the back end allows all code, both client and server, to be written in JavaScript.  There are even efforts afoot to try and systematize the reuse of portions of code on both client and server (think data model definition and validation).
  3. If the aforementioned benefits of using Node are not enough for you, there is a biggie remaining - the plethora of useful and professionally maintained open source frameworks and libraries available for Node.js.  If there is a particular type of resource you need for your back-end code or some problems you need to solve by writing code, there is a very good chance that it has already been done for you.  I’ll have more to say on the subject of add-ons further on in the article.

Ok, that was a lot of wordage to wade through – let’s look at some code.  We will use some code examples as a springboard from which to begin understanding how Node.js works.  The smallest, simplest node.js program might look something like the following code snippet and exist in a text file named helloworld.js:
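The snippet itself is missing from this copy of the article. Judging from the description that follows (it prints Hello World to the console and uses double quotes), it was presumably a single line along these lines:

```javascript
// helloworld.js: prints a greeting to the console and exits.
console.log("Hello World");
```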

Of course this is a useless program, but it will serve to get us started.  Perhaps the first thing worth noting is that it uses standard JavaScript syntax (single quotes instead of double quotes would have worked just as well).  To execute this program on your system, Node.js would first have to be installed (we’ll see how to do this later).  To run it, you would use the command line, opening a terminal or command window, whatever your OS provides.  After ensuring that Node is in the executable path and that you are in the directory containing helloworld.js, enter the following: node helloworld.js.  Upon execution of the program, you would see the words Hello World in your console, i.e. the same UI you used to execute the program.

Let’s look at a program that does a little bit more – another program that could also be written in client-side JavaScript.  Again, this program must not reference any browser elements - these don’t exist in Node.js JavaScript.  The code snippet for this example is nearly as simple and pointless as the code in the first example, but it goes a little further in reinforcing the fact that server-side Node.js JavaScript looks the same as browser-based JavaScript.  You could put it in a file called helloworlddated.js and run it in a manner similar to the previous example, i.e. by entering: node helloworlddated.js.  Here is the source for helloworlddated.js:
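The source for helloworlddated.js is missing as well. A plausible reconstruction that produces output in the format quoted below is sketched here; the formatDate() helper and its zero-padding are assumptions chosen to match the 15/06/2013 example, not the author's original code.

```javascript
// helloworlddated.js (reconstruction): plain JavaScript, no browser objects.

// Hypothetical helper: formats a Date as day/month/year with two-digit day
// and month, e.g. 15/06/2013.
function formatDate(date) {
  var day = ('0' + date.getDate()).slice(-2);
  var month = ('0' + (date.getMonth() + 1)).slice(-2); // getMonth() is zero-based
  return day + '/' + month + '/' + date.getFullYear();
}

console.log('Hello World.  Today (day/month/year) is: ' + formatDate(new Date()) + '.');
```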

The resulting output in your console would be something like this: Hello World.  Today (day/month/year) is: 15/06/2013.  Running this small program gives a more convincing demonstration that Node.js is JavaScript for the server side… yet, it is not a thematic demonstration.  Sure, you might put some similar code into a Node.js program, or you might even create a server side program that runs now and again to perform some useful function.  However, the main idea behind the creation of Node.js was to provide solutions for high volume client-server types of apps.

What might a more thematic Node.js program look like?  Here is an extremely stripped down version of the beginnings of one such program:
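The listing itself did not survive in this copy. Reassembled from the line-by-line walkthrough that follows, it reads as below; comments are deliberately left out so that the line numbers match that walkthrough.

```javascript
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');
```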

This example program was taken from http://nodejs.org/, the home of Node.js at Joyent.  What is it doing?  It looks almost as if it is acting like some sort of miniature web server… which is exactly what it is: a little five-line web server that serves up some dedicated content instead of HTML files.  This simple web server written in Node responds with “Hello World” for every request.  A very common pattern for Node.js programs is that they will not only provide web content or other content, but also provide the mechanism to receive client requests and return server responses.  The good news is that, because of some built-in Node.js library code, creation of such a web server is a trivial matter.  Actually, even creating a full-blown, actual web server is a fairly trivial matter if certain add-on frameworks are also put into use.  (I will provide some more detail on this subject further below and will probably eventually write more articles about various modules that extend the basic functionality of Node.)  I am getting ahead of myself, though.  Let’s dissect the above code snippet, piece by piece.

Line 1:  var http = require('http');

It looks like standard JavaScript syntax, but what does “require” do and what does that argument of 'http' mean?  The first thing to understand is that much of the functionality of Node is compartmentalized into units called modules.  Node comes installed with several built-in modules, and you can download hundreds of others from Github.  In Line 1, the argument 'http' is the name of a module that pulls in functionality surrounding the use of HTTP, in this case the ability to create a simple web server.  The function named “require” is a globally available Node.js function that allows you to pull the content of modules into your code.  As one might guess, it returns an object that can then be used to invoke the functionality exported by the module.

Line 2:  http.createServer(function (req, res) {

In Line 2, we use the http object returned by the require('http') call in Line 1 to create an HTTP server.  We can see the beginning of an anonymous function definition that takes two arguments – a request argument and a response argument.  (Note: For those unfamiliar with anonymous JavaScript functions, you may simply think of them as unnamed functions that are dynamically declared at runtime.  This link to Helen Emerson’s blog at Helephant.com offers a very clear explanation.)  The anonymous function that is being passed here (its definition is completed on subsequent lines) will be invoked when Node receives an HTTP request on the port whose number is passed in Line 5.  This anonymous function will provide the response to that request.

Line 3:  res.writeHead(200, {'Content-Type': 'text/plain'});

Line 3 is the first line of the body of the anonymous function begun in Line 2.  When the anonymous function is executed, this line writes the HTTP status code (200, meaning OK) and a Content-Type response header into the response.  You may have seen similar code in other types of back end solutions or if you have ever examined the workings of the HTTP protocol.

Line 4:  res.end('Hello World\n');

Line 4 contains the end of the anonymous function body, and it adds the string 'Hello World\n' to the body of the response.  It also completes the response, because the end method does that in addition to accepting the Hello World string content.

Line 5:  }).listen(1337, '127.0.0.1');

Line 5 is the end of the createServer call, and it chains a call to the listen method of the newly created server.  The first argument to the listen method is the port number that the server should use to listen for requests (a small leet-speak joke, if you didn’t notice).  The second argument is the host address (not a URL) on which the server should listen; 127.0.0.1 is the local loopback address, so only clients on the same machine can connect.

Line 6:  console.log('Server running at http://127.0.0.1:1337/');

Line 6 provides console output about what has been done.  After the program has been started (as before, this involves saving the code into a JavaScript file and running it from the console by typing: node <the javascript file name goes here>), copying that link from the console into a web browser will cause the words “Hello World” to be displayed in the browser.

That wasn’t much work to code, even for a typical “Hello World” sort of program – but the reality is that this program is ever so much more than a typical “Hello World” program.  It is the basis for a server-side program that might perform any of various kinds of behaviors or actions for a client-side piece.  That particular example is an HTTP kind of animal, but an analogous TCP/IP-based program is just as easily created.  Here is an example of a simple TCP server which listens on port 1337 and echoes whatever you send it:
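That listing is missing too. The sketch below follows the shape of the echo-server example nodejs.org showed alongside the HTTP one at the time; treat it as an assumed reconstruction rather than the author's exact code. It uses Node's built-in net module.

```javascript
var net = require('net');

// The callback runs once for every new client connection; socket is a
// duplex stream, readable and writable at the same time.
var server = net.createServer(function (socket) {
  socket.write('Echo server\r\n');
  // Pipe everything the client sends straight back to the same client.
  socket.pipe(socket);
});

server.listen(1337, '127.0.0.1');
```

To try it, run the file and connect with a tool such as telnet or netcat to 127.0.0.1 port 1337; whatever you type comes back to you.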

It is beyond the scope of an introductory article to try and bridge the gap from the previous thematic example to showing explicit examples of code that might back real world projects - but here is a link to an eye-opening page at Github: Projects, Applications, and Companies Using Node.  Scan the notes in the two large tables on the page that this link leads to and you will see a list of uses and functionality that empirically demonstrate what Node is capable of.  The list of usages is broader and deeper than one might expect - I think it’s pretty impressive.  Some readers might think this paragraph would be a natural place to end this first introductory article, but wait, there’s more!

There is a wealth of freely available external modules that can be used within a Node program.  I would be remiss in my introductory duties if I did not provide at least a glimpse of what kinds of functionality might be offered by those modules, how one might find out more about them and how they can be installed.  The original creator of Node.js and those who provided additional essential work on Node believed that Node should provide a basic, highly scalable, extensible engine for server-side work.  These external modules offer solutions (some of them competing with one another) that fill in the gaps of what is not built into Node itself.  There is a way that you can search for, find out about, and download nearly all of the external modules that have been created for use with Node.js – follow this link to the Github Modules Search page.  However, in addition to downloading these modules directly from Github, there is another, better way to get them onto a system on which Node has already been installed: npm, the Node Package Manager.  Node ships with this command line program, which can be used to install modules (note that they must still be required into your code).  The syntax to install a module looks like this: npm install <some module>.  To learn more about how to use npm, one can either search the web or type the following on the command line: npm help install.

The functionality provided by this huge number of external modules could be divided into a number of different categories – but I decided upon just four.  My four basic categories are: Middleware, MVC Frameworks, Database Access, and my cop-out category, “Others”.  I will provide no examples for “Others” but a host of very useful library, utility and debugging-oriented modules would fall into this category.  In any event, these four divisions are highly arbitrary and probably woefully inadequate, but categorizing things in this manner offers a way to approach a summarization of what is really a rather extensive topic.

The term “middleware” is, unfortunately, a sorely overloaded one.  It has had multiple meanings in the history of our industry, and yet another is introduced in the context of Node.js.  In Node, the term is a loose descriptor for functionality that exists between the client portion of an app and the logical portion of the server-side app that makes up the application code.  I am going to give a couple of examples of what I consider to be middleware in Node, and you can agree with me or disagree if you wish.  Fortunately, the “rose by any other name” principle applies to the two examples I am going to give you – they are both arguably best of breed in their class – regardless of what name you give that class.  The two examples I will put forth are Connect.js and Socket.io.  Connect is actually a framework of sorts itself, one that offers a surprising amount of functionality.  The bulk of this functionality centers upon fleshing out the things a web server can do that the http module does not do, or does not handle as well (i.e. powerful but elegant: logging, cookie handling, favicon handling, static file serving, error control, etc.).  Socket.io is one of several modules available for Node that provide Web Sockets functionality.  Socket.io is one of the more popular, if not the most popular, and it is very good at gracefully degrading into alternative communication methodologies when older web servers and/or browsers are in play.
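To give a feel for what "functionality between the client and the application code" means in practice, here is a small sketch of the middleware pattern using nothing but plain functions. It is not Connect's implementation; it only borrows the general shape, in which each handler receives (req, res, next) and either answers the request or calls next() to pass it along. The chain(), logRequest(), ignoreFavicon() and hello() names are hypothetical.

```javascript
// Hypothetical helper: combines middleware functions into one request handler.
function chain(middlewares) {
  return function (req, res) {
    var index = 0;

    function next() {
      var layer = middlewares[index];
      index += 1;
      if (layer) {
        layer(req, res, next);
      } else {
        // Nothing in the chain answered the request.
        res.statusCode = 404;
        res.end('Not Found\n');
      }
    }

    next();
  };
}

// Logging layer: notes the request, then passes it along.
function logRequest(req, res, next) {
  console.log(req.method + ' ' + req.url);
  next();
}

// Favicon layer: answers /favicon.ico itself and stops the chain there.
function ignoreFavicon(req, res, next) {
  if (req.url === '/favicon.ico') {
    res.statusCode = 204; // No Content
    res.end();
  } else {
    next();
  }
}

// Application logic sits at the end of the chain.
function hello(req, res) {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end('Hello World\n');
}

var app = chain([logRequest, ignoreFavicon, hello]);
```

Because app has the same (req, res) signature as the callback in the earlier web server, it could be served with require('http').createServer(app).listen(1337, '127.0.0.1'). Frameworks like Connect package this idea up together with a library of ready-made layers.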

MVC stands for Model-View-Controller, of course, and a number of external Node.js modules offer MVC or MVC-like functionality.  (Note: A full definition of the MVC architectural style is beyond the scope of this article, but here is a link at Wikipedia that offers a starting point if you are unfamiliar with it: Model-View-Controller.)  I am going to mention a few MVC-ish modules here without going into great detail about any of them: Geddy, Express.js and Sails.js.  Geddy is billed as the “original MVC” solution for Node and was quite popular at one point, though it is perhaps growing somewhat long in the tooth now.  It is more of a standard, Rails-like MVC solution than, say, Express is.  Express.js is almost a framework for creating MVC frameworks; in fact, some new MVC frameworks are based on Express.  Still, you can do MVC in Express.js directly, and of all the MVC-like solutions I read about, it seems to be the most popular.  As of this writing, I have never used Geddy or Express (though I do plan to experiment with Express), so bear in mind that my opinions on them are based on reading multiple sources, not personal experience.  Sails.js is an up-and-coming solution that is generating a lot of interest.  Its claim to fame is the ability to easily generate a RESTful API that delivers its content as JSON.  This is a natural fit for building hybrid MVC solutions together with a front-end framework such as the popular Backbone.js, a partial MVC solution that natively consumes JSON via jQuery AJAX calls under the hood.
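What Sails.js automates is, at heart, mapping HTTP verbs on a resource URL to operations that answer in JSON. Here is a hand-written sketch of one such read-only resource using only the core `http` module. It is not Sails code: the `/users` resource, the in-memory `users` array, `sendJson` and `handleUsers` are all hypothetical, chosen only to show the shape of a JSON-speaking endpoint that a front end such as Backbone.js could consume.

```javascript
const http = require('http');

// Hypothetical in-memory data standing in for a real model layer.
const users = [
  { id: 1, name: 'Ada' },
  { id: 2, name: 'Linus' },
];

// Write a value to the response as JSON with the given status code.
function sendJson(res, status, value) {
  res.statusCode = status;
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify(value));
}

// GET /users      -> the whole collection
// GET /users/:id  -> a single user, or 404 if it does not exist
// Any other verb on /users... -> 405; any other path -> 404
function handleUsers(req, res) {
  const match = /^\/users(?:\/(\d+))?$/.exec(req.url);
  if (!match) return sendJson(res, 404, { error: 'not found' });
  if (req.method !== 'GET') return sendJson(res, 405, { error: 'method not allowed' });
  if (match[1] === undefined) return sendJson(res, 200, users);

  const user = users.find((u) => u.id === Number(match[1]));
  return user ? sendJson(res, 200, user) : sendJson(res, 404, { error: 'not found' });
}

const server = http.createServer(handleUsers);
// server.listen(3000); // then try GET http://localhost:3000/users/1
```

A framework like Sails generates this kind of routing (plus create, update and delete) from a model definition; writing one resource by hand mostly shows how little there is to the JSON side of the contract.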

The available modules for database access are, in one way, astonishingly rich and, in another, surprisingly poor.  They are rich in that they offer surprisingly broad support for what these days are being called “NoSQL” databases.  The examples I will list are MongoDB, CouchDB and Redis.  MongoDB and CouchDB are both document-centric databases that accept, store and return data in JSON format. (Note: If you are not familiar with JSON, you should fix that ASAP.  Here is a link at Wikipedia that offers a starting point for understanding JSON.)  Redis is an in-memory key-value database that may optionally be persisted to disk (this optional persistence is what gives Redis its “durability”).  Between MongoDB and CouchDB, MongoDB seems to be more popular, though CouchDB has strong adherents.  I have played with MongoDB some and I like it.  It is worth noting that accessing MongoDB from Node is much easier if done via a simplifying API such as Mongoose.  Redis is purportedly the most popular key-value store out there at this time.  Ok, now we’ve come to the “surprisingly poor” part.  As far as I know, as of this writing, the only module-level support for relational database access in Node is for MySQL.  Doubtless this will be addressed soon, at least to the point of supporting Oracle and SQL Server.  That is not to say that your application code could not reach such resources some other way, for example via a web service.  Nonetheless, I consider this a fly in the ointment for Node.js.
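To make “in-memory key-value store, optionally persisted to disk” concrete, here is a toy store built on a JavaScript `Map` that can snapshot itself to a JSON file and reload it later. This is only an illustration of the idea, not a Redis client: the `KeyValueStore` class, its methods and the `kv-demo.json` file name are hypothetical, and real Redis persistence (its point-in-time snapshot and append-only file options) is far more sophisticated.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// A toy in-memory key-value store; persistence is optional and explicit.
class KeyValueStore {
  constructor(file = null) {
    this.file = file;       // null means "memory only"
    this.data = new Map();
  }

  set(key, value) {
    this.data.set(key, value);
    return this;            // allow chaining
  }

  get(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  }

  // Write the whole map to disk as JSON (a crude point-in-time snapshot).
  // Returns false when the store was created without a file.
  save() {
    if (!this.file) return false;
    fs.writeFileSync(this.file, JSON.stringify([...this.data]));
    return true;
  }

  // Rebuild a store from a previously saved snapshot, if one exists.
  static load(file) {
    const store = new KeyValueStore(file);
    if (fs.existsSync(file)) {
      store.data = new Map(JSON.parse(fs.readFileSync(file, 'utf8')));
    }
    return store;
  }
}

const snapshot = path.join(os.tmpdir(), 'kv-demo.json');
const store = new KeyValueStore(snapshot);
store.set('greeting', 'hello').set('visits', 42);
store.save();

// Simulate a restart: everything in memory is gone, so reload from disk.
const reloaded = KeyValueStore.load(snapshot);
```

Anything written after the last `save()` would be lost in a crash, which is exactly the trade-off between speed and durability that real key-value stores let you tune.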

We are nearing the end of the article, and I am finally going to tell you how to install Node.js, assuming I have interested you enough to give it a try.  The easiest way to get Node onto your system is to go to the main Node.js site sponsored by Joyent and either hit the big Install button on that page or click the smaller Downloads button and choose an explicit installation on the subsequent page.  There are other ways to get Node onto your system as well.  You can Fork the project on Github and then Clone it down to the target system, or you can download the source code from Github and build it yourself.  The details of these latter two methods are platform dependent and beyond the scope of this article.  Undoubtedly, though, the best way to get started with Node.js is to install it from the Joyent-sponsored site as described above.
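Once the installer has finished, a tiny script is a quick way to confirm that Node actually runs. Save something like the following as, say, `check.js` (the file name is arbitrary) and run it with `node check.js`. It relies only on the built-in `process` object, so nothing else needs to be installed.

```javascript
// Report which Node.js build is running and on what platform.
const info = {
  node: process.version,       // version string beginning with "v"
  v8: process.versions.v8,     // version of the bundled V8 engine
  platform: process.platform,  // e.g. "win32", "darwin", "linux"
  arch: process.arch,          // e.g. "x64", "ia32"
};

console.log(`Node.js ${info.node} (V8 ${info.v8}) on ${info.platform}/${info.arch}`);
```

If this prints a version line rather than a "command not found" style error, the installation worked and `node` is on your path.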

In closing, I’d like to reiterate Node’s strengths and also highlight the weaknesses I have mentioned. When high-performance scalability is important, Node.js and judiciously chosen additional modules may offer a great replacement for more traditional solutions that combine conventional web servers with architectures such as Ruby on Rails, Java Servlets or EJB-driven solutions, and ASP.NET-style pages.  Additionally, it’s all JavaScript, all of the time, and it offers a number of NoSQL solutions.  Every bit of this code is open source and free under licenses such as the MIT license and other permissive, GPL-compatible licenses.  A lot of this code is offered with dual license options as well.  On the other hand, if you are looking for built-in, strong relational database support, or if you require a solution involving frequent heavy streaming between client and server, then Node is probably not the best choice, at least not yet.

Node is young, and this too may be a weakness: some of the modules you will be interested in are likely to be quite new, so their reliability and long-term sustainability may be in question. However, I think most outsiders and insiders would agree that Node has already achieved critical mass and that this is just the beginning of a bright and beautiful future. Node is just warming up!

 

 

 

Stepping into the New Age of JavaScript using Github

“A journey of a thousand miles must begin with a single step.” - Lao Tzu

At least initially, this blog will focus primarily on new developments in JavaScript… but where to start?  “Begin at the beginning,” says the King in Carroll’s Alice in Wonderland, but that would be going much too far back for our purposes.  The reader is probably at least passingly familiar with JavaScript and its roots, so we don’t need to tread that path.  It makes more sense to begin with recent developments, such as the advent of server-side JavaScript with the inception of Node.js, and the rise of HTML5/JavaScript as a very exciting and very real game platform.  There is a fantastic amount of open source code available for both of these topics and many others.  The best place to access that treasure trove of free frameworks, libraries and code is something called Github.

Maybe you already know all about Github; in that case, you should probably stop reading now, because this article attempts to do nothing more than define what Github is and provide a basic introduction to its features, installation and use.  If you are like I was, however, you have probably heard of it and have some vague idea of what it is, but are not comfortable with it.  Perhaps you’d rather not deal with it.  Maybe you think it sounds too complicated, or that getting involved might expose you to the elite coding world as some sort of coding Neanderthal.  To be honest, I thought those very things myself.

However, after any degree of real exposure to the things that are happening these days in the JavaScript world and in the coding world in general, you are going to discover that you cannot evade Github.  More to the point, after you become more familiar with it you won’t want to avoid it.  The resources there are just too broad, too deep, too well coded, too useful and well… too free to ignore.

Ok, so what is Github?  The least complicated answer that is even close to adequate is that Github is a web site providing a nexus for a distributed version control system called Git.  However, it is more than that, and I’ll touch on what that means a bit later in the article.  Before I do, I would like to make the “least complicated answer” I gave above a little less complicated.

First, let’s go over a little background on Git itself.  Git is a source code version control system that was created in its initial form in 2005 by a group of Linux kernel developers headed by Linus Torvalds.  Unlike some other version control systems, SVN for example, Git does not store a base file followed by the diffed changes from each check-in.  Instead, it was designed to store “snapshots” of a project over time (although it saves space by referencing unchanged files from previous snapshots rather than storing them again).
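The “snapshot plus references to unchanged files” idea works because Git names every stored file by a hash of its content, so an unchanged file in a new snapshot simply points at an object that is already stored. The sketch below mimics that with Node's `crypto` module. The `blobId` function computes the same ID Git gives a blob (a SHA-1 over a `blob <size>` header, a NUL byte, then the content), but `ObjectStore` and `snapshot` are hypothetical simplifications: real Git also records tree and commit objects and compresses what it stores.

```javascript
const crypto = require('crypto');

// Name content the way Git names a blob object:
// SHA-1 of "blob <byte length>\0" followed by the content itself.
function blobId(content) {
  const body = Buffer.from(content, 'utf8');
  const header = Buffer.from(`blob ${body.length}\0`, 'utf8');
  return crypto.createHash('sha1').update(Buffer.concat([header, body])).digest('hex');
}

// A content-addressed object store: identical content is stored only once.
class ObjectStore {
  constructor() {
    this.objects = new Map(); // id -> content
  }
  put(content) {
    const id = blobId(content);
    if (!this.objects.has(id)) this.objects.set(id, content);
    return id;
  }
}

// A snapshot records every file in the project as path -> blob id,
// whether or not that file changed since the previous snapshot.
function snapshot(store, files) {
  const entries = {};
  for (const [filePath, content] of Object.entries(files)) {
    entries[filePath] = store.put(content);
  }
  return entries;
}

const store = new ObjectStore();
const v1 = snapshot(store, { 'README': 'hello\n', 'app.js': 'console.log(1);\n' });
const v2 = snapshot(store, { 'README': 'hello\n', 'app.js': 'console.log(2);\n' });
// v1.README === v2.README: the unchanged file is referenced, not copied again.
// store.objects.size === 3: two versions of app.js plus a single README.
```

Each snapshot is complete on its own (you can read any version without replaying diffs), yet the storage cost of an unchanged file is just one more reference.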

Coders editing files belonging to a particular project in the central Git repository have a copy of that project (with access to all of its history) in working directories on their local machines.  The files they edit must go through a staging phase before being committed to the local copy of the project.  This local commit is not the same thing as committing changes to the central repository from which the local copy was obtained.  An additional step, a push, must be taken to send the committed versions of the files to the project repository in which the master copies reside.
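The path a change travels, from working directory to staging area to local commit and only then to the shared repository, can be modeled in a few lines. This is a deliberately simplified, hypothetical model (`RemoteRepo`, `LocalRepo` and their methods are invented for illustration, and it ignores branches, conflicts and other people's pushes), not how Git is implemented, but the order of the steps mirrors the workflow just described.

```javascript
// The shared, central repository: just an ordered list of commits.
class RemoteRepo {
  constructor() {
    this.commits = [];
  }
}

class LocalRepo {
  // "Cloning" copies the full history, so it is available offline.
  constructor(remote) {
    this.remote = remote;
    this.commits = remote.commits.map((c) => ({ ...c, files: { ...c.files } }));
    const head = this.commits[this.commits.length - 1];
    this.working = head ? { ...head.files } : {}; // the files you actually edit
    this.staged = {};                               // changes marked for the next commit
  }

  edit(filePath, content) {
    this.working[filePath] = content;
  }

  // Stage: copy the current working version of a file into the staging area.
  add(filePath) {
    this.staged[filePath] = this.working[filePath];
  }

  // Commit locally: a new snapshot made of the previous one plus staged changes.
  commit(message) {
    const head = this.commits[this.commits.length - 1];
    const files = { ...(head ? head.files : {}), ...this.staged };
    this.commits.push({ message, files });
    this.staged = {};
  }

  // Push: send the local commits the remote does not have yet.
  push() {
    const missing = this.commits.slice(this.remote.commits.length);
    this.remote.commits.push(...missing);
    return missing.length;
  }
}

const origin = new RemoteRepo();
const mine = new LocalRepo(origin);
mine.edit('index.js', 'console.log("hi");');
mine.add('index.js');
mine.commit('Add index.js');
// At this point origin.commits.length is still 0: the commit exists only locally.
const pushed = mine.push(); // now the central repository has it too
```

The gap between `commit` and `push` is what lets you keep working, and committing, while disconnected from the network.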

These differences from other source control systems allow Git to be fast and make it convenient to use even without a network connection.  I found an excellent resource on the web (there are many others as well) that explains the basic ideas behind working with Git: http://git-scm.com/book/en/Getting-Started-Git-Basics.

Once you really wrap your head around what Git is, understanding Github is not as fearsome a task as it may have originally seemed.  In passing, it should be noted that there are other web sites that host Git repositories, and I believe you can even install it as a stand-alone system on a single machine.  However, Github is far and away the most popular, and nearly all of the up-and-coming projects are stored there.  One of the nicest things about Github is that you don’t have to get very involved in order to use the resources found there.  You can go to the site and grab loads of open source frameworks, libraries and code without even creating a Github account.  You don’t have to jump right in; you can just stick a toe in the water.  Nobody will complain.  You will be just one more anonymous downloader to them.

So, how does one get software from Github in “ninja” mode?

  1. Go to the home page at: https://github.com/.
  2. Type something in the search box at the top of the page, for example: node.js.
  3. Hit enter and watch the results come pouring forth.
  4. Select a project repository from this result list.
  5. Click the button that allows you to download a recent stable version of the project as a compressed file (e.g. zip, tar).
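The same search-then-download routine can also be scripted from Node through GitHub's public REST API, which needs no account for light use. The sketch below assumes a Node version that ships a global `fetch` (18 or later), the repository search endpoint and the branch archive URL pattern; `searchUrl`, `archiveUrl` and `searchRepositories` are hypothetical helper names, and GitHub rate-limits unauthenticated requests, so treat it as a starting point rather than a finished tool.

```javascript
// Build the URL for GitHub's repository search endpoint.
function searchUrl(query, perPage = 5) {
  const params = new URLSearchParams({ q: query, per_page: String(perPage) });
  return `https://api.github.com/search/repositories?${params}`;
}

// Build the URL of a zip archive of one branch of a repository.
function archiveUrl(owner, repo, branch) {
  return `https://github.com/${owner}/${repo}/archive/refs/heads/${branch}.zip`;
}

// Run a search and summarize each hit with a download link (needs network access).
async function searchRepositories(query) {
  const response = await fetch(searchUrl(query), {
    headers: { Accept: 'application/vnd.github+json', 'User-Agent': 'github-search-sketch' },
  });
  if (!response.ok) throw new Error(`GitHub responded with ${response.status}`);
  const body = await response.json();
  return {
    total: body.total_count,
    repos: body.items.map((r) => ({
      name: r.full_name,
      zip: archiveUrl(r.owner.login, r.name, r.default_branch),
    })),
  };
}

// Usage (not run here, since it needs the network):
// searchRepositories('node.js').then((result) => console.log(result));
```

The zip link is the scripted equivalent of step 5: a snapshot of the branch's current files, without any Git history.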

I got 16,249 repository results from a search on “node.js” at the time I wrote this article.  That’s 16,249 projects related to Node.js alone!  Many of these are world class frameworks and code libraries that are in actual use at major corporations and other legitimate companies – and you can use them too.  The majority of these desirable code bases have permissive licenses such as the MIT license or something similar – in essence, free and with almost no restrictions as to their use.  (Note: I plan to do a blog post soon on open source licensing, as I think it is a somewhat confusing subject).

So that is one way to use Github: as a huge collection of open source code repositories that you can access even without a Github account.  However, there is much more to Github than that:

  • It can be a personal version control and backup system for your projects.
  • It can provide source control for an entire team of people working on the same project(s).
  • You can participate in other people’s open source projects, adding to them and squashing bugs.  Some of these projects are world famous.  If you are competent, courageous and creative, you might just make a name for yourself.
  • You can monitor activity on projects and communicate with other denizens of Github.

Participating in the ways listed above requires that you create and use a Github account.  Free accounts are available, and there are also various levels of paid accounts, ranging as of this writing from $7 to $200 per month.  As they rise in cost, the number of private repositories you may own increases.  Any paid account gets you one thing that you do not get with a free account: private repositories.  Repositories in a free account may not be private; they must be public and open source.  Whatever you store in a free, public account may be seen by anyone, and people may use and access your code.  They may even submit suggested changes and corrections.  This is not a problem in the majority of cases, and in fact it exemplifies the basic idea of open source.  Free accounts do allow an unlimited number of public repositories and collaborators, which is very nice.  The steps to create a free account are easy to follow, and the process can be initiated from the Github home page.

Once you have created an account, the next step in being able to use Github for source control is to get a Git client onto your local machine: you need Git software to use Git, which, as previously stated, is the source control system that Github uses.  Upon creation of an account, the Github site provides a UI that acts as a sort of “Boot Camp” for beginning to use Github.  Part of this Boot Camp lets you download a Git client.  You will have a few choices.  There is a command line Git, which is the original way to use the program, but there are alternatives.  There is a GUI client available for both Windows and Mac OS that is arguably easier to use than the command line program.  There is also an Eclipse plug-in available.  The web site itself also provides a graphical interface for the portions of the process that involve fetching code from the Github site down to your local machine.

Disclaimer: In the discussions that follow regarding procedures for using Git and Github, I refer to the actions and concepts behind using this system for source control, but I do not provide the actual command syntax for the command line interface, nor do I discuss the menu choices or clicks in a GUI.  Help with the specific command line syntax and GUI options needed to carry out an action or procedure is available within these programs and on other sites on the web.  Any other approach would result in a very long article and risk muddying the explanation.  However, I may come back to this topic at some point and provide a few examples.

After you have a Git client on your machine, you can begin two-way transmission of code with Github.  The process involves the use of repositories.  We know that a large number of repositories already exist that are owned by other people.  As you might suspect, working with this type of repository is a little different and a little more complicated than working with repositories that you or your team own.  We’ll first take a look at how to use owned repositories.

Working with owned repositories:

Let’s start with the easiest scenario to understand: a repository already exists on Github and you have ownership access to it.  To work with the code in a Github repository, you must first Clone it from the Github website.  Doing so creates a repository on your local machine, and all of the source code and history of the project is pulled into it.  You can then edit source files and mark the modified ones for the next commit, which puts them in a local staging area.  Any files in staging can then be committed, still only to the local repository.  The set of committed files is then ready to be pushed to the Github repository from which the source was originally obtained.

If no Github repository yet exists for a project, the process begins with the creation of a repository on a local machine.  Files are then created, moved to staging and committed within the local repository (much like the process described above for modifying cloned project files).  After that is done, the project may be pushed to a Github repository, which is created as part of this process.

Working with repositories belonging to others:

This process is similar to the workflow described above, in which a repository is cloned and files are edited, staged, committed and pushed to a Github repository.  However, there are two key differences:

  • A copy of the original repository must first be created on Github itself.  This is done by Forking the project.  Once a project has been forked, the fork may then be cloned down to a local machine.
  • When files are eventually pushed back up to Github, they are pushed to the forked repository, not to the repository from which it was copied.  The person who pushed them must then submit a “pull request” asking for the changes to be “pulled” from the Github fork into the original source repository.  The owner of the original repository will then review the changes and merge them or not, as he or she sees fit.
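The fork-and-pull-request flow can be modeled the same way as the owned-repository workflow. The sketch below is hypothetical (`Repository`, `fork`, `openPullRequest` and `merge` are invented names, not GitHub's API) and it ignores conflicts, but it captures the two differences listed above: your pushes land in your fork, and the original changes only when its owner chooses to merge a pull request.

```javascript
// A hosted repository: an owner plus an ordered list of commit messages.
class Repository {
  constructor(owner, name, commits = []) {
    this.owner = owner;
    this.name = name;
    this.commits = [...commits];
    this.pullRequests = [];
    this.upstream = null; // set only on forks
  }

  // Forking copies the repository (and its history) into another account.
  fork(newOwner) {
    const copy = new Repository(newOwner, this.name, this.commits);
    copy.upstream = this;
    return copy;
  }

  // Ask the upstream owner to pull the commits the fork has that upstream lacks.
  openPullRequest(title) {
    if (!this.upstream) throw new Error('only a fork can open a pull request');
    const proposed = this.commits.slice(this.upstream.commits.length);
    const pr = { title, from: this, proposed, state: 'open' };
    this.upstream.pullRequests.push(pr);
    return pr;
  }

  // Only the owner of this repository may accept a pull request into it.
  merge(pr, actingUser) {
    if (actingUser !== this.owner) throw new Error('only the owner can merge');
    this.commits.push(...pr.proposed);
    pr.state = 'merged';
  }
}

const original = new Repository('alice', 'widget', ['Initial commit']);
const myFork = original.fork('bob');
myFork.commits.push('Fix off-by-one bug'); // bob pushes to his fork, not to alice's repository
const pr = myFork.openPullRequest('Fix off-by-one bug');
// original.commits is unchanged until alice decides to merge:
original.merge(pr, 'alice');
```

The owner-only check in `merge` is the whole point of the detour through a fork: contributors never need write access to the original repository.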

In summary, Github functions both as a repository for private projects and as a home for world-class open source software.  However, there is much more to Github.  It provides project statistics, notifications of project changes and the ability to communicate with repository owners and pull request submitters.  There is even more to Github than this, but the discovery of that, as they say, is left as an exercise for the reader.