Campaigns:JS Developers Task Force

From LibrePlanet
Revision as of 11:27, 3 March 2014 by Wtheaker (talk | contribs)

This page is a place for resources related to the JavaScript Developers Task Force, a community group that empowers the Free Software Foundation's Free JavaScript campaign.


Resources

Discovering JavaScript Files

Because JavaScript code can include additional files at runtime, the JavaScript must actually be executed to discover all files. Add-ons that block script execution, such as NoScript and LibreJS, should therefore be disabled during analysis.

For analysis of websites that have fairly predictable behavior, or do not vary much between requests, a simple manual process may suffice:

  • Firefox users may use the built-in developer tools in newer versions, or Firebug; either has a "Network" tab that monitors all requests. The debugger can also be used to list all scripts loaded into memory.
  • Chromium users may use the "Network" tab in the same manner as Firefox users. The "Sources" tab also lists all scripts loaded into memory.
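
As a complement to the developer tools, the script elements currently attached to a page can be listed from the browser console. The following is a minimal sketch: listScriptSources is a hypothetical helper name, and fakeDocument is a stand-in object so the example runs outside a browser. It relies only on the standard document.scripts collection, and it will not reveal code fetched through XHR and evaluated without a script element.

```javascript
// Hypothetical helper: lists the src of every <script> element that is
// currently part of the given document-like object.
function listScriptSources(doc) {
  return Array.from(doc.scripts).map(function (script) {
    return script.src || "(inline script)";
  });
}

// Stand-in for the browser's document, so the sketch runs anywhere.
// In a browser console, call listScriptSources(document) instead.
var fakeDocument = {
  scripts: [{ src: "https://example.org/app.js" }, { src: "" }]
};

var sources = listScriptSources(fakeDocument);
console.log(sources);
```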

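To determine what caused a script to be included, the Referer header of the request (e.g. in the "Network" tab of a web browser) can be examined. Note that for scripts this header usually names the page that made the request rather than the specific script that inserted it; Chromium's "Network" tab also offers an "Initiator" column that identifies the requesting script.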

For larger, more complex websites, a [partially] automated solution may be desired:

  • All requests can be served through a client-side proxy to analyze and map out the site as you navigate, providing an overview of all included scripts. Alternatively, some tools can perform automated scans of websites. Tools that perform these tasks are generally used for security auditing and are outside the scope of this article.
  • To automate script discovery, any headless JavaScript engine (e.g. Node.js) may be used. In this case, a virtual DOM will likely be needed, all XHRs (XMLHttpRequests, commonly referred to as AJAX) will have to be properly processed and logged, and all inserted <script> tags will likewise have to be processed and logged.
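
The logging of inserted <script> tags described in the last item could be sketched as follows. This is only an illustration under stated assumptions: logScriptInsertions is a hypothetical helper, and fakeHead is a stand-in for whatever node object the chosen virtual DOM provides. A real audit would also need to cover other insertion paths (insertBefore, innerHTML, document.write) and XHR.

```javascript
// Hypothetical helper: wraps a DOM-like node so that every <script>
// element appended to it is recorded before the original appendChild runs.
function logScriptInsertions(node, log) {
  const originalAppendChild = node.appendChild;
  node.appendChild = function (child) {
    if (child && String(child.tagName).toLowerCase() === "script") {
      log.push(child.src || "(inline script)");
    }
    return originalAppendChild.call(this, child);
  };
  return node;
}

// Stand-in for a virtual DOM node; a real run would use the node objects
// supplied by the headless environment in use.
const fakeHead = {
  children: [],
  appendChild(child) {
    this.children.push(child);
    return child;
  }
};

const insertedScripts = [];
logScriptInsertions(fakeHead, insertedScripts);

fakeHead.appendChild({ tagName: "SCRIPT", src: "https://example.org/a.js" });
fakeHead.appendChild({ tagName: "DIV" });
fakeHead.appendChild({ tagName: "script" });

console.log(insertedScripts);
```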

Combined File Analysis

JavaScript sources are most often combined and minified for production environments to improve page loading speeds. This section describes methods by which these files can be correlated with their source files. It is assumed that the person performing the analysis has access to the source code of the website.

Concepts and Terminology

Minification is the process whereby a JavaScript file is rewritten to reduce its file size. Depending on the level of compilation, this most often includes:

  • Stripping unnecessary comments and whitespace;
  • Aggressively shortening variable, function, and parameter names (single-character if possible); this is sometimes called obfuscation, alluding to the fact that the document is now difficult for a human to reason about;
  • Automated refactoring of duplicate code;
  • Dead code removal (removing unreachable code); and
  • Function inlining, where doing so will reduce byte count.

Tools that perform minification are called minifiers; Closure Compiler, whose documentation is quoted later on this page, is one example.

A combined file is the concatenation of any number of (possibly minified) source files. Since source files often need their own scope, they are often organized into modules using self-executing functions, like so:

( function() { /* ... */ } )();

Since functions introduce scope in JavaScript, this ensures that variables local to that particular source file remain encapsulated and do not interfere with the global scope (and thus other modules). Encapsulation is not a requirement for concatenation and may not always be necessary.
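
A small sketch of this encapsulation, assuming two hypothetical source files that each declare a variable named label and have been concatenated into one file. The names moduleA, moduleB, and label are illustrative only.

```javascript
// First concatenated source file: its "label" stays private to the function.
var moduleA = (function () {
  var label = "module A";
  return { describe: function () { return label; } };
})();

// Second concatenated source file: declares its own, unrelated "label".
var moduleB = (function () {
  var label = "module B";
  return { describe: function () { return label; } };
})();

// Each module sees only its own variable, and nothing named "label"
// is visible outside the two functions.
console.log(moduleA.describe(), moduleB.describe(), typeof label);
```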

The source code of a script is the JavaScript text as it exists before minification. The source file is the file that contains a particular section of source code prior to concatenation into a combined file.

Source Code Correlation

The minification process has the potential to drastically alter the source code. Here is an example, taken directly from Closure Compiler's documentation:

function unusedFunction(note) {
  alert(note['text']);
}

function displayNoteTitle(note) {
  alert(note['title']);
}

var flowerNote = {};
flowerNote['title'] = "Flowers";
displayNoteTitle(flowerNote);

At its most aggressive compilation level, Closure Compiler will optimize the above block into the following:

var a={};a.title="Flowers";alert(a.title);

Notably,

  • The unused function unusedFunction was entirely removed;
  • The function displayNoteTitle was entirely removed after having been inlined at the point of invocation (the last line of the source code);
  • flowerNote is a long variable name; it is replaced with a;
  • A more concise means of accessing flowerNote['title'] is flowerNote.title, which results in a.title;
  • The inlined body of displayNoteTitle, which read note['title'], now reads a.title directly, because the renamed flowerNote variable was the argument passed to displayNoteTitle's note parameter.

Note that the variable a is retained in this example, but in context, Closure Compiler may have noticed that it too is unused, in which case the above example could have been rewritten simply to read:

alert("Flowers");
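
To check informally that these three forms behave alike, each can be run with a stub in place of alert and the recorded calls compared. This is a sketch only: runWithStubAlert is a hypothetical helper, and matching output for one run suggests but does not prove equivalence.

```javascript
// Hypothetical helper: runs a piece of code with "alert" replaced by a
// stub that records its arguments, and returns the recorded messages.
function runWithStubAlert(code) {
  var shown = [];
  var stubAlert = function (message) { shown.push(message); };
  new Function("alert", code)(stubAlert);
  return shown;
}

var original = [
  "function unusedFunction(note) { alert(note['text']); }",
  "function displayNoteTitle(note) { alert(note['title']); }",
  "var flowerNote = {};",
  "flowerNote['title'] = \"Flowers\";",
  "displayNoteTitle(flowerNote);"
].join("\n");

var minified = 'var a={};a.title="Flowers";alert(a.title);';
var fullyInlined = 'alert("Flowers");';

console.log(runWithStubAlert(original));
console.log(runWithStubAlert(minified));
console.log(runWithStubAlert(fullyInlined));
```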

It is therefore important to recognize two kinds of content that minification cannot alter:

  • Strings used by reachable code, and
  • Function and field names of public APIs.

How the minifier determines, or is explicitly told, what constitutes a public API is not important here; to correlate minified code with its source, we need only recognize where minification has not taken place. In the above example, a.title is clearly only partially minified: a minifier would not have chosen title as the shortened name of a field. Similarly, the string "Flowers" is retained. Given both the minified code and the source code from the example above, we could say that the assignment of "Flowers" to some object's title field, followed by an alert of that string, gives the minified and source code a strong correlation with one another.

When performing an audit, have the website's entire source code on hand; it is then simple enough to use grep or the repository's commands to search the entire code base for strings that meet the aforementioned criteria.

# filesystem search
grep -r 'methodName\|text of some string\|function name' path/to/src/js

# git history search of the current branch
git log -pG 'methodName\|text of some string\|function name'

Keep in mind that some code may be generated from other languages (e.g. by code generators or languages like CoffeeScript), so you may need to search for portions of strings, method names, and so on.
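
Candidate search terms for the commands above can be collected from minified code with a rough heuristic. The sketch below is an assumption-laden illustration, not a parser: extractSearchTerms is a hypothetical helper, it ignores escape sequences, regular expression literals, and comments, and it treats any sufficiently long string literal or dotted property name as probably unminified.

```javascript
// Hypothetical helper: collects string literals and dotted property names
// of at least minLength characters, which are unlikely to be minifier output.
function extractSearchTerms(minified, minLength) {
  minLength = minLength || 3;
  var terms = new Set();
  var match;

  // Quoted string literals (heuristic: no escape handling).
  var stringPattern = /"([^"\\]*)"|'([^'\\]*)'/g;
  while ((match = stringPattern.exec(minified)) !== null) {
    var text = match[1] !== undefined ? match[1] : match[2];
    if (text.length >= minLength) terms.add(text);
  }

  // Property names that follow a dot, e.g. "title" in a.title.
  var propertyPattern = /\.([A-Za-z_$][\w$]*)/g;
  while ((match = propertyPattern.exec(minified)) !== null) {
    if (match[1].length >= minLength) terms.add(match[1]);
  }

  return Array.from(terms);
}

var terms = extractSearchTerms('var a={};a.title="Flowers";alert(a.title);');
console.log(terms);
```

The resulting terms can then be joined into a pattern for grep or git log as shown earlier.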

Source File Correlation

Consider that you have discovered a combined, minified file containing the following:

(function(a){function f(){c();d();}(function(){a.moo="cow";})();}(foo));var Bar={};Bar.baz=function(){if(!a)b();};(function(b){...})(bar);

In order to determine which source file(s) this code was compiled from, the following manual method is likely to yield satisfactory results. Please read the previous section, Source Code Correlation, before continuing. Load the combined, minified file into an editor that supports parenthesis and brace matching, such as Vim or Emacs, and proceed to the first non-whitespace, non-comment character.

  1. If the first character is an opening parenthesis, match it with its closing parenthesis. In Vim, this can be done by placing the cursor on the opening parenthesis and hitting '%'. In Emacs, try C-M-f. Let the matched block of code be the hypothetical minified source file.
  2. If the first character is not an opening parenthesis, then consider the remainder of the file to [temporarily] be the hypothetical minified source file.
  3. Within the hypothetical source file, locate an unminified string, method, or field name.
  4. Using one of the methods mentioned at the end of the Source Code Correlation section, locate the source files that correlate strongly with the hypothetical minified source. Using other unminified components and the code structure surrounding them, you should be able to determine whether the source file exists.
  5. Once the actual (unminified) source file is found, determine whether it encompasses the whole of the hypothetical minified source file. Use the actual start and end to adjust your hypothesis, which is especially important if the hypothesis was not a self-executing function.
  6. Repeat.

Using the above snippet as an example, let's see how we may apply these steps. The first character is an opening parenthesis, so our first hypothetical minified source file is:

(function(a){function f(){c();d();}(function(){a.moo="cow";})();}(foo));

From this, we have the field moo and the string "cow", which can be used to search for the corresponding source code as discussed in the Source Code Correlation section.

Following that hypothetical minified source file, the first character is a 'v'. We will therefore consider the remainder of the minified file to be our hypothesis:

var Bar={};Bar.baz=function(){if(!a)b();};(function(b){...})(bar);

We see that Bar and baz are clearly not minified. Suppose we searched the filesystem and came across this file:

/**
 * Some license
 */
var Bar = {};

Bar.baz = function()
{
    if ( !logged_in ) {
        // do stuff
        do_auth();
    }
};

From this, we can see that our hypothetical minified source file should be narrowed to this:

var Bar={};Bar.baz=function(){if(!a)b();};
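
When several candidate source files turn up, a rough score can help rank them. The following sketch assumes a hypothetical helper, correlationScore, that reports the fraction of unminified terms from the hypothesis found verbatim in a candidate file. It is a crude substring check and no substitute for comparing the surrounding code structure by eye.

```javascript
// Hypothetical helper: fraction of the given terms that occur verbatim
// in the candidate source text (0 when no terms are supplied).
function correlationScore(terms, sourceText) {
  if (terms.length === 0) return 0;
  var found = terms.filter(function (term) {
    return sourceText.indexOf(term) !== -1;
  });
  return found.length / terms.length;
}

// The candidate source file from the example, as a string.
var candidate = [
  "var Bar = {};",
  "",
  "Bar.baz = function()",
  "{",
  "    if ( !logged_in ) {",
  "        do_auth();",
  "    }",
  "};"
].join("\n");

console.log(correlationScore(["Bar", "baz"], candidate));
console.log(correlationScore(["Bar", "baz", "moo", "cow"], candidate));
```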

And our last remaining hypothesis is therefore:

(function(b){...})(bar);
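
The parenthesis matching in step 1 can also be done programmatically instead of in an editor. The sketch below assumes a hypothetical helper, findMatchingParen, and uses a well-formed sample string modeled on the example. It skips over quoted strings but does not understand comments or regular expression literals, so treat its result as a starting hypothesis only.

```javascript
// Hypothetical helper: returns the index of the ")" that closes the "("
// at openIndex, or -1 if none is found. Quoted strings are skipped.
function findMatchingParen(source, openIndex) {
  if (source[openIndex] !== "(") {
    throw new Error("expected '(' at index " + openIndex);
  }
  var depth = 0;
  var quote = null;
  for (var i = openIndex; i < source.length; i++) {
    var ch = source[i];
    if (quote) {
      if (ch === "\\") i++;                 // skip the escaped character
      else if (ch === quote) quote = null;  // end of the string literal
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

// Well-formed sample modeled on the combined file discussed above.
var combined = '(function(a){function f(){c();d();}' +
  '(function(){a.moo="cow";})();}(foo));var Bar={};';

var end = findMatchingParen(combined, 0);
var hypothesis = combined.slice(0, end + 1);
var remainder = combined.slice(end + 1);
console.log(hypothesis);
console.log(remainder);
```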

If a source file cannot be found, be sure to check the referrer in the network request and the domain on which the script is hosted; it could be part of a library loaded from a CDN, for example.

This page was a featured resource in March 2014.