Tuesday, October 25, 2016

The Top 10 Most Common Mistakes That Node.js Developers Make

Since the moment Node.js was unveiled to the world, it has seen a fair share of both praise and criticism. The debate still continues, and may not end anytime soon. What we often overlook in these debates is that every programming language and platform is criticized based on certain issues, which are created by how we use the platform. Regardless of how difficult Node.js makes writing safe code, and how easy it makes writing highly concurrent code, the platform has been around for quite a while and has been used to build a huge number of robust and sophisticated web services. These web services scale well, and have proven their stability through their endurance of time on the Internet.
However, like any other platform, Node.js is vulnerable to developer mistakes. Some of these mistakes degrade performance, while others make Node.js appear outright unusable for whatever you are trying to achieve. In this article, we will take a look at ten common mistakes that developers new to Node.js often make, and how they can be avoided to become a Node.js pro.

Mistake #1: Blocking the event loop

JavaScript in Node.js (just like in the browser) provides a single threaded environment. This means that no two parts of your application run in parallel; instead, concurrency is achieved through the handling of I/O bound operations asynchronously. For example, a request from Node.js to the database engine to fetch some document is what allows Node.js to focus on some other part of the application:
// Trying to fetch a user object from the database. Node.js is free to run other parts of the code from the moment this function is invoked..
db.User.get(userId, function(err, user) {
 // .. until the moment the user object has been retrieved here
})
However, a piece of CPU-bound code in a Node.js instance with thousands of clients connected is all it takes to block the event loop, making all the clients wait. Examples of CPU-bound code include sorting a large array, running an extremely long loop, and so on. For example:
function sortUsersByAge(users) {
 users.sort(function(a, b) {
  // Returns a negative, zero, or positive number, as sort() expects
  return a.age - b.age
 })
}
Invoking this “sortUsersByAge” function may be fine if run on a small “users” array, but with a large array, it will have a horrible impact on the overall performance. If this is something that absolutely must be done, and you are certain that there will be nothing else waiting on the event loop (for example, if this was part of a command-line tool that you are building with Node.js, and it wouldn’t matter if the entire thing ran synchronously), then this may not be an issue. However, in a Node.js server instance trying to serve thousands of users at a time, such a pattern can prove fatal.
If this array of users was being retrieved from the database, the ideal solution would be to fetch it already sorted directly from the database. If the event loop was being blocked by a loop written to compute the sum of a long history of financial transaction data, it could be deferred to some external worker/queue setup to avoid hogging the event loop.
As you can see, there is no silver-bullet solution to this kind of Node.js problem; rather, each case needs to be addressed individually. The fundamental idea is to not do CPU intensive work within the front facing Node.js instances - the ones clients connect to concurrently.
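When the work has to stay in the same process, one option (a different technique from the external worker/queue setup mentioned above) is to split it into slices and yield to the event loop between slices with setImmediate. The following is only a minimal sketch under that assumption; sumInChunks and its parameters are hypothetical names, and chunkSize is assumed to be a positive integer:

```javascript
// Hypothetical helper: sums `values` in slices of `chunkSize` items,
// yielding to the event loop between slices so other callbacks can run.
function sumInChunks(values, chunkSize, done) {
  var total = 0
  var index = 0

  function processChunk() {
    var end = Math.min(index + chunkSize, values.length)
    for (; index < end; index++) {
      total += values[index]
    }
    if (index < values.length) {
      // More work left: let other pending events run before the next slice.
      return setImmediate(processChunk)
    }
    done(null, total)
  }

  // Start asynchronously so `done` is never invoked before this function returns.
  setImmediate(processChunk)
}

sumInChunks([10, 20, 30, 40, 50], 2, function(err, total) {
  console.log(total) // 150
})
```

The total amount of CPU work is unchanged; the slices only give other clients a chance to be served in between, so truly heavy jobs are still better moved out of the front facing process.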

Mistake #2: Invoking a Callback More Than Once

JavaScript has relied on callbacks since forever. In web browsers, events are handled by passing references to (often anonymous) functions that act like callbacks. In Node.js, callbacks used to be the only way asynchronous elements of your code communicated with each other - up until promises were introduced. Callbacks are still in use, and package developers still design their APIs around callbacks. One common Node.js issue related to using callbacks is calling them more than once. Typically, a function provided by a package to do something asynchronously is designed to expect a function as its last argument, which is called when the asynchronous task has been completed:
module.exports.verifyPassword = function(user, password, done) {
 if(typeof password !== 'string') {
  done(new Error('password should be a string'))
  return
 }

 computeHash(password, user.passwordHashOpts, function(err, hash) {
  if(err) {
   done(err)
   return
  }
  
  done(null, hash === user.passwordHash)
 })
}
Notice how there is a return statement every time “done” is called, up until the very last time. This is because calling the callback doesn’t automatically end the execution of the current function. If the first “return” was commented out, passing a non-string password to this function will still result in “computeHash” being called. Depending on how “computeHash” deals with such a scenario, “done” may be called multiple times. Anyone using this function from elsewhere may be caught completely off guard when the callback they pass is invoked multiple times.
Being careful is all it takes to avoid this Node.js error. Some Node.js developers adopt a habit of adding a return keyword before every callback invocation:
if(err) {
 return done(err)
}
In many asynchronous functions, the return value has almost no significance, so this approach often makes it easy to avoid such a problem.
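As an extra safety net, a callback can also be wrapped so that repeated invocations are ignored. This is a minimal sketch, not a replacement for the return habit described above; the once helper here is a hypothetical name written for illustration:

```javascript
// Hypothetical helper: wraps `fn` so that only the first invocation goes through.
function once(fn) {
  var called = false
  return function() {
    if (called) {
      return
    }
    called = true
    return fn.apply(this, arguments)
  }
}

var calls = 0
var done = once(function(err) {
  calls++
})

done(null)
done(null) // ignored: the wrapped callback has already run
console.log(calls) // 1
```

A guard like this hides the symptom rather than the cause, so it is probably most useful around callbacks handed to code you do not control.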

Mistake #3: Deeply Nesting Callbacks

Deeply nesting callbacks, often referred to as “callback hell”, is not a Node.js issue in itself. However, it can make code quickly spin out of control:
function handleLogin(..., done) {
 db.User.get(..., function(..., user) {
  if(!user) {
   return done(null, 'failed to log in')
  }
  utils.verifyPassword(..., function(..., okay) {
   // Fail when the password does NOT verify
   if(!okay) {
    return done(null, 'failed to log in')
   }
   session.login(..., function() {
    done(null, 'logged in')
   })
   })
  })
 })
}
The more complex the task, the worse this can get. By nesting callbacks in such a way, we easily end up with error-prone, hard to read, and hard to maintain code. One workaround is to declare these tasks as small functions, and then link them up. However, one of the (arguably) cleanest solutions to this is to use a utility Node.js package that deals with asynchronous JavaScript patterns, such as Async.js:
function handleLogin(done) {
 async.waterfall([
  function(done) {
   db.User.get(..., done)
  },
  function(user, done) {
   if(!user) {
    return done(null, 'failed to log in')
   }
   utils.verifyPassword(..., function(..., okay) {
    done(null, user, okay)
   })
  },
  function(user, okay, done) {
   // Fail when the password does NOT verify
   if(!okay) {
    return done(null, 'failed to log in')
   }
   session.login(..., function() {
    done(null, 'logged in')
   })
  }
 ], function() {
  // ...
 })
}
Similar to “async.waterfall”, there are a number of other functions that Async.js provides to deal with different asynchronous patterns. For brevity, we used simpler examples here, but reality is often worse.
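The other workaround mentioned above, declaring the tasks as small named functions and linking them up, can be sketched without any library. Everything in this example is an assumption made for illustration: getUser, verifyPassword, and startSession are hypothetical in-memory stand-ins for the database, password, and session calls, and the plain-text password exists only to keep the sketch short:

```javascript
// Hypothetical in-memory stand-ins for the database, password check, and session.
var users = { alice: { name: 'alice', password: 'secret' } }

function getUser(name, done) {
  setImmediate(done, null, users[name])
}

function verifyPassword(user, password, done) {
  setImmediate(done, null, user.password === password)
}

function startSession(user, done) {
  setImmediate(done, null)
}

function handleLogin(name, password, done) {
  getUser(name, onUser)

  // Each step is a named function, so the nesting stays one level deep.
  function onUser(err, user) {
    if (err) return done(err)
    if (!user) return done(null, 'failed to log in')
    verifyPassword(user, password, function(err, okay) {
      onVerified(err, user, okay)
    })
  }

  function onVerified(err, user, okay) {
    if (err) return done(err)
    if (!okay) return done(null, 'failed to log in')
    startSession(user, onSession)
  }

  function onSession(err) {
    if (err) return done(err)
    done(null, 'logged in')
  }
}

handleLogin('alice', 'secret', function(err, result) {
  console.log(result) // 'logged in'
})
```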

Mistake #4: Expecting Callbacks to Run Synchronously

Asynchronous programming with callbacks may not be something unique to JavaScript and Node.js, but they are responsible for its popularity. With other programming languages, we are accustomed to the predictable order of execution where two statements will execute one after another, unless there is a specific instruction to jump between statements. Even then, these are often limited to conditional statements, loop statements, and function invocations.
However, in JavaScript, a callback function may not run until the task it is waiting on has finished, while the execution of the current function continues to the end without stopping:
function testTimeout() {
 console.log("Begin")
 setTimeout(function() {
  console.log("Done!")
 }, 1000) // fire after one second
 console.log("Waiting..")
}
As you will notice, calling the “testTimeout” function will first print “Begin”, then print “Waiting..”, followed by the message “Done!” after about a second.
Anything that needs to happen after a callback has fired needs to be invoked from within it.
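A common way this goes wrong is reading a result right after starting the asynchronous call. The sketch below illustrates that; fetchGreeting is a hypothetical function invented for the example:

```javascript
// Hypothetical asynchronous function: delivers a value after 100 ms.
function fetchGreeting(done) {
  setTimeout(function() {
    done(null, 'hello')
  }, 100)
}

var greeting

fetchGreeting(function(err, value) {
  greeting = value
  // Work that depends on the result belongs here.
  console.log('inside the callback:', greeting) // 'hello'
})

// This line runs before the callback has fired.
console.log('right after the call:', greeting) // undefined
```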

Mistake #5: Assigning to “exports”, Instead of “module.exports”

Node.js treats each file as a small isolated module. If your package has two files, perhaps “a.js” and “b.js”, then for “b.js” to access “a.js”’s functionality, “a.js” must export it by adding properties to the exports object:
// a.js
exports.verifyPassword = function(user, password, done) { ... }
When this is done, anyone requiring “a.js” will be given an object with the property function “verifyPassword”:
// b.js
require('./a.js') // { verifyPassword: function(user, password, done) { ... } }
However, what if we want to export this function directly, and not as the property of some object? We can overwrite exports to do this, but then we must assign to “module.exports” rather than treat “exports” as a free-standing variable:
// a.js
module.exports = function(user, password, done) { ... }
Notice how we are treating “exports” as a property of the module object. The distinction here between “module.exports” and “exports” is very important, and is often a cause of frustration among new Node.js developers.

Mistake #6: Throwing Errors from Inside Callbacks

JavaScript has the notion of exceptions. Mimicking the syntax of almost all traditional languages with exception handling support, such as Java and C++, JavaScript can “throw” and catch exceptions in try-catch blocks:
function slugifyUsername(username) {
 // Reject anything that is NOT a string
 if(typeof username !== 'string') {
  throw new TypeError('expected a string username, got ' + (typeof username))
 }
 // ...
}

try {
 var usernameSlug = slugifyUsername(username)
} catch(e) {
 console.log('Oh no!')
}
However, try-catch will not behave as you might expect it to in asynchronous situations. For example, if you wanted to protect a large chunk of code with lots of asynchronous activity with one big try-catch block, it wouldn’t necessarily work:
try {
 db.User.get(userId, function(err, user) {
  if(err) {
   throw err
  }
  // ...
  usernameSlug = slugifyUsername(user.username)
  // ...
 })
} catch(e) {
 console.log('Oh no!')
}
If the callback to “db.User.get” fired asynchronously, the try-catch block would have finished executing long before the callback ran, so it could no longer catch errors thrown from inside the callback.
Because errors thrown inside asynchronous callbacks cannot be caught this way, Node.js handles them differently, which makes it essential to follow the (err, …) pattern on all callback function arguments - the first argument of every callback is expected to be an error if one happens.
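One way to combine the two styles is to wrap only the synchronous call in try-catch inside the callback and forward any error through the callback's first argument. This is a sketch under stated assumptions: getUser is a hypothetical stand-in for “db.User.get”, getUsernameSlug is an invented name, and the body of slugifyUsername is made up for the example:

```javascript
function slugifyUsername(username) {
  if (typeof username !== 'string') {
    throw new TypeError('expected a string username, got ' + (typeof username))
  }
  // Hypothetical slug rule: lower-case, whitespace replaced by dashes.
  return username.toLowerCase().replace(/\s+/g, '-')
}

// `getUser` is a hypothetical stand-in for db.User.get.
function getUsernameSlug(getUser, userId, done) {
  getUser(userId, function(err, user) {
    if (err) {
      return done(err)
    }
    var slug
    try {
      // Only the synchronous call is wrapped; it throws on this call stack.
      slug = slugifyUsername(user.username)
    } catch (e) {
      return done(e)
    }
    done(null, slug)
  })
}
```

The final done call is kept outside the try block on purpose: if it were inside, an exception thrown by the caller's callback would be caught and done would be invoked a second time.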

Mistake #7: Assuming Number to Be an Integer Datatype

Numbers in JavaScript are floating points - there is no integer data type. You wouldn’t expect this to be a problem, as numbers large enough to stress the limits of float are not encountered often. That is exactly when mistakes related to this happen. Since floating point numbers can only hold integer representations up to a certain value, exceeding that value in any calculation will immediately start messing it up. As strange as it may seem, the following evaluates to true in Node.js:
Math.pow(2, 53)+1 === Math.pow(2, 53)
Unfortunately, the quirks with numbers in JavaScript don’t end here. Even though Numbers are floating points, operators that work on integer data types work here as well:
5 % 2 === 1 // true
5 >> 1 === 2 // true
However, unlike arithmetic operators, bitwise operators and shift operators work only on the trailing 32 bits of such large “integer” numbers. For example, trying to shift “Math.pow(2, 53)” by 1 will always evaluate to 0. Trying to do a bitwise-or of 1 with that same large number will evaluate to 1.
Math.pow(2, 53) / 2 === Math.pow(2, 52) // true
Math.pow(2, 53) >> 1 === 0 // true
(Math.pow(2, 53) | 1) === 1 // true (parentheses needed: === binds tighter than |)
You may rarely need to deal with large numbers, but if you do, there are plenty of big integer libraries that implement the important mathematical operations on large precision numbers, such as node-bigint.
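Before reaching for a library, it can help to detect when a value leaves the exactly representable range. The sketch below uses the standard Number.MAX_SAFE_INTEGER and Number.isSafeInteger; the safeAdd helper is a hypothetical name written for illustration:

```javascript
console.log(Number.MAX_SAFE_INTEGER === Math.pow(2, 53) - 1) // true
console.log(Number.isSafeInteger(Math.pow(2, 53) - 1))       // true
console.log(Number.isSafeInteger(Math.pow(2, 53)))           // false

// Hypothetical helper: adds two integers, refusing results that may be inexact.
function safeAdd(a, b) {
  var sum = a + b
  if (!Number.isSafeInteger(a) || !Number.isSafeInteger(b) || !Number.isSafeInteger(sum)) {
    throw new RangeError('integer arithmetic outside the safe range')
  }
  return sum
}

console.log(safeAdd(1, 2)) // 3
```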

Mistake #8: Ignoring the Advantages of Streaming APIs

Let’s say we want to build a small proxy-like web server that serves responses to requests by fetching the content from another web server. As an example, we shall build a small web server that serves Gravatar images:
var http = require('http')
var crypto = require('crypto')

http.createServer()
.on('request', function(req, res) {
 var email = req.url.substr(req.url.lastIndexOf('/')+1)
 if(!email) {
  res.writeHead(404)
  return res.end()
 }

 // Buffer.alloc replaces the deprecated new Buffer(size) and zero-fills the memory
 var buf = Buffer.alloc(1024*1024)
 http.get('http://www.gravatar.com/avatar/'+crypto.createHash('md5').update(email).digest('hex'), function(resp) {
  var size = 0
  resp.on('data', function(chunk) {
   chunk.copy(buf, size)
   size += chunk.length
  })
  .on('end', function() {
   res.write(buf.slice(0, size))
   res.end()
  })
 })
})
.listen(8080)
In this particular example of a Node.js problem, we are fetching the image from Gravatar, reading it into a Buffer, and then responding to the request. This isn’t such a bad thing to do, given that Gravatar images are not too large. However, imagine if the size of the contents we are proxying were thousands of megabytes in size. A much better approach would have been this:
http.createServer()
.on('request', function(req, res) {
 var email = req.url.substr(req.url.lastIndexOf('/')+1)
 if(!email) {
  res.writeHead(404)
  return res.end()
 }

 http.get('http://www.gravatar.com/avatar/'+crypto.createHash('md5').update(email).digest('hex'), function(resp) {
  resp.pipe(res)
 })
})
.listen(8080)
Here, we fetch the image and simply pipe the response to the client. At no point do we need to read the entire content into a buffer before serving it.
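One caveat worth keeping in mind is that pipe does not forward errors from the source stream to the destination, so each stream needs its own error handler. The following sketch shows the same streaming idea with local files so that it runs without a network; copyWithStreams is a hypothetical helper name:

```javascript
var fs = require('fs')

// Hypothetical helper: copies `src` to `dest` chunk by chunk, without ever
// holding the whole file in memory.
function copyWithStreams(src, dest, done) {
  var finished = false

  // Make sure `done` is invoked only once, whichever event arrives first.
  function finish(err) {
    if (finished) {
      return
    }
    finished = true
    done(err || null)
  }

  var input = fs.createReadStream(src)
  var output = fs.createWriteStream(dest)

  input.on('error', function(err) {
    // A failed source does not close the destination, so do it by hand.
    output.destroy()
    finish(err)
  })
  output.on('error', finish)
  // 'finish' fires once all data has been flushed to the destination.
  output.on('finish', function() {
    finish()
  })

  input.pipe(output)
}
```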

Mistake #9: Using Console.log for Debugging Purposes

In Node.js, “console.log” allows you to print almost anything to the console. Pass an object to it and it will print it as a JavaScript object literal. It accepts any number of arguments and prints them all neatly space-separated. There are a number of reasons why a developer may feel tempted to use this to debug their code; however, it is strongly recommended that you avoid “console.log” in real code. Rather than scattering “console.log” calls through the code to debug it and then commenting them out when they are no longer needed, use one of the libraries that are built just for this, such as debug.
Packages like these provide convenient ways of enabling and disabling certain debug lines when you start the application. For example, with debug it is possible to prevent any debug lines from being printed to the terminal by not setting the DEBUG environment variable. Using it is simple:
// app.js
var debug = require('debug')('app')
debug('Hello, %s!', 'world')
To enable debug lines, simply run this code with the environment variable DEBUG set to “app” or “*”:
DEBUG=app node app.js
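If adding a dependency is not an option, Node.js ships a similar but more limited facility in its core util module. This is a different tool from the debug package shown above, offered here only as a sketch of the same idea:

```javascript
// app.js
var util = require('util')

// The returned function writes to stderr only when the NODE_DEBUG
// environment variable includes the section name "app".
var debuglog = util.debuglog('app')

debuglog('Hello, %s!', 'world')
```

Running this with the environment variable NODE_DEBUG set to “app” prints the line; without it, the call is a no-op.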

Mistake #10: Not Using Supervisor Programs

Regardless of whether your Node.js code is running in production or in your local development environment, a supervisor program that can monitor and orchestrate your program is an extremely useful thing to have. One practice often recommended by developers designing and implementing modern applications is that your code should fail fast. If an unexpected error occurs, do not try to handle it; rather, let your program crash and have a supervisor restart it in a few seconds. The benefits of supervisor programs are not limited to restarting crashed programs. These tools can also restart the program when some files change, which makes developing Node.js programs a much more pleasant experience.
There is a plethora of supervisor programs available for Node.js, for example pm2, forever, nodemon, and supervisor.
All these tools come with their pros and cons. Some of them are good for handling multiple applications on the same machine, while others are better at log management. However, if you want to get started with such a program, all of these are fair choices.
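On the application side, failing fast can be as small as logging the unexpected error and exiting with a non-zero code, leaving the restart to whichever supervisor is in use. The sketch below is one possible shape for this; installFailFast is a hypothetical helper, and it takes the process object and a log function as parameters only so that it can be exercised in isolation:

```javascript
// Hypothetical helper: log an unexpected error, then exit with a non-zero
// code so that whichever supervisor is in use can restart the process.
function installFailFast(proc, log) {
  proc.on('uncaughtException', function(err) {
    log('fatal error, exiting: ' + (err && err.stack ? err.stack : err))
    proc.exit(1)
  })
}

installFailFast(process, console.error)
```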

Conclusion

As you can tell, some of these Node.js problems can have devastating effects on your program. Some may be the cause of frustration while you’re trying to implement the simplest of things in Node.js. Although Node.js has made it extremely easy for newcomers to get started, it still has areas where it is just as easy to mess up. Developers from other programming languages may be able to relate to some of these issues, but these mistakes are quite common among new Node.js developers. Fortunately, they are easy to avoid. I hope this short guide will help beginners to write better code in Node.js, and to develop stable and efficient software for us all.
