Group talk: FSF/Tasks/ModifyNoScript
Initial Thoughts by vonkow
I have a number of ideas of how best to accomplish this task. NoScript operates by blocking the execution of any script not served from a user-defined white-list of sites. I am still not entirely sure how deeply it inspects a script before allowing its execution, but I do not believe that it currently has the ability to determine whether a script is trivial or non-trivial.
What is needed, I believe, is a method of identifying JavaScript that is Free Software and allowing its execution. To that end I am currently working on a Firefox Extension that can detect and display uncompressed source code and license information for embedded JavaScript. In addition, the extension can be set to block the execution of non-free JavaScript (though it is yet unable to detect trivial vs non-trivial scripts). I'm still ironing out the details, but I am quite close to creating a standards-compliant method of easily embedding license and source code information for browser JavaScript.
NoScript does not yet have in-built support for subscribing to externally updated lists of permitted sites; however, it is relatively easy to emulate this functionality using additional Firefox Extensions. The lead developer of NoScript has stated that he is hesitant to add this functionality to NoScript itself, as a master list of blocked domains would require political decisions as to which sites should be allowed. For the Free Software community, this is not an issue as the litmus test for a site is simply whether it serves Free JavaScript or not.
With a community maintained NoScript white-list of websites that use properly licensed and sourced Free JavaScript and an extension that can display the license and source information (such as the one I am working on), I believe that we can create an effective way of protecting the user's software freedom within the web browser.
I will have a fully working prototype of this idea relatively soon and will post further details at that time.
Thoughts, questions, comments?
-Caz
More thoughts by vonkow
Based on the criteria for trivial vs non-trivial JavaScript as described in 'The JavaScript Trap' I have a few observations on determining the triviality of a given chunk of JavaScript.
1. A script 'Loads an external script or is loaded as one' :
A script element that loads an external script (has a src attribute) is very easy to detect. A script that loads an external script is not too tricky to detect, UNLESS it's using a library function to do so. If the library passes the Free Software test, it can still be used by another script to load an external script. In this instance, it may be a little harder to determine whether the external script is Free Software. Mayhap a listener that detects the creation of a new script element and then checks its freeness could be used.
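A minimal sketch of the listener idea, assuming the present-day DOM MutationObserver API is available (it postdates this discussion). The helper name externalScriptSources is hypothetical, and the actual freeness check is left as a log message:

```javascript
// Hypothetical helper: return the src of every <script> among newly added nodes.
function externalScriptSources(addedNodes) {
  const sources = [];
  for (const node of addedNodes) {
    // Only element nodes have a tagName; text nodes are skipped.
    if (node.tagName && node.tagName.toLowerCase() === "script" && node.src) {
      sources.push(node.src);
    }
  }
  return sources;
}

// Browser-only wiring; skipped where MutationObserver or document is missing.
if (typeof MutationObserver !== "undefined" && typeof document !== "undefined") {
  const observer = new MutationObserver((records) => {
    for (const record of records) {
      for (const src of externalScriptSources(record.addedNodes)) {
        // Placeholder: a real extension would look up the license here.
        console.log("script added, license check needed:", src);
      }
    }
  });
  observer.observe(document.documentElement, { childList: true, subtree: true });
}
```

Note that this only inspects the nodes that were added directly; a script nested inside a larger inserted subtree would need an extra walk of that subtree.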
2. A script 'Makes an AJAX request':
Once again, this is not too difficult to detect, UNLESS a library is being used. To truly determine whether a script makes an AJAX request, it needs to be executed (in a sandbox). NoScript already blocks any AJAX requests to domains not on the white-list, so in some ways we are already covered. Likewise, it also prevents the execution of external scripts loaded from non-white-list domains. In some ways, having a white-list of domains that exclusively use Free JavaScript seems to be the best course of action. Coupling this with a method of determining the license and freeness of a script would provide adequate user protection from non-free JavaScript.
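To make the white-list idea concrete, here is a rough sketch of a domain check. It is not how NoScript itself matches sites; the function name isWhitelisted, the sample domains, and the rule that subdomains inherit their parent's entry are all assumptions for illustration:

```javascript
// Hypothetical white-list of domains believed to serve only Free JavaScript.
const freeDomains = new Set(["bar.org", "foo.org"]);

function isWhitelisted(url, whitelist) {
  let host;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch (e) {
    return false; // unparsable URLs are never allowed
  }
  // Accept the listed domain itself or any subdomain of it
  // (assumption: subdomains inherit the parent's entry).
  const labels = host.split(".");
  for (let i = 0; i < labels.length; i++) {
    if (whitelist.has(labels.slice(i).join("."))) return true;
  }
  return false;
}
```

Matching on whole labels means that www.bar.org is accepted while a lookalike such as evilbar.org is not.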
3. A script 'Defines methods':
JavaScript doesn't quite have traditional methods. Any property of an object that happens to be a function is considered, by and large, a method, but their assignment can be pretty subtle compared to say, Python. For example:
var ob={
method:function(){alert('foobar')}
};
Defines a method in the traditional sense, as does:
var constructor=function() {
this.method=function(){alert('foobar')}
};
var ob=new constructor();
But:
var func = function(){alert('foobar')};
var ob={};
ob.method=func;
Also defines a method. In the first two examples, method detection (via parsing of the source) wouldn't be too difficult, but the third example illustrates how any defined (or anonymous) function can be turned into a method.
This makes method detection a little bit trickier. Either the code must be executed (in a sandbox) and then have the objects it creates tested for methods (anonymized functions with methods will be even trickier to detect), or any function declaration must be considered a potential method.
This leads us to the common JavaScript practice of using anonymous functions to avoid polluting the global namespace, such as:
(function(){alert('foobar')})();
This snippet doesn't define any methods and would be considered trivial, but it does contain the function keyword. Assuming that all functions are potential methods, this code would be falsely considered non-trivial.
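The false positive is easy to reproduce. The sketch below implements the "every function is a potential method" rule as a plain keyword search (the name looksNonTrivial is made up for this example) and runs it over the snippets discussed above:

```javascript
// Naive heuristic: any occurrence of the `function` keyword counts as a
// potential method definition, and therefore as non-trivial code.
function looksNonTrivial(source) {
  return /\bfunction\b/.test(source);
}

const objectLiteral = "var ob={ method:function(){alert('foobar')} };";
const iife = "(function(){alert('foobar')})();";
const plainCall = "alert('foobar');";

console.log(looksNonTrivial(objectLiteral)); // true, and it really defines a method
console.log(looksNonTrivial(iife));          // true, although no method is defined
console.log(looksNonTrivial(plainCall));     // false
```

A keyword search cannot tell the first two cases apart; doing so needs at least a real parse of the source, and the third method example above (ob.method=func) shows that even a parse may not be enough without executing the code.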
Conclusion:
NoScript already does an admirable job of preventing scripts and AJAX requests from non-white-list domains, and the determination of trivial vs non-trivial JavaScript is relatively difficult (and potentially very slow) with the widespread usage of libraries and the peculiarities of the language.
Creating a standard for including license information and source code of JavaScript elements would help protect Free Software users from non-free code execution. However, it would be relatively easy to disguise non-free JavaScript using this standard. Hence:
A community maintained NoScript white-list of domains that serve only Free JavaScript and a standard of distributing license information and source code would probably be much easier to implement and would offer the same protections to Software Freedom. To this end, NoScript could be modified to work with external white-lists (it can already be done using other Firefox Extensions in conjunction with NoScript). In addition, an extension that checks/displays JavaScript licenses and blocks (using NoScript?) licenseless scripts (or scripts with non-free licenses) is needed, along with a convention for including license and source info.
-Caz
vonkow's proposed JavaScript Licensing Convention
Due to the lack of a standard for properly including license and source information in browser JavaScript, Free JavaScript projects have adopted a hodge-podge collection of methods for doing so. Some projects, like StatusNet, use a jsDoc-like convention that relies on @tags within comments. This convention is very similar to the one described in 'The JavaScript Trap'. It has the benefits of being included within the source file and being easily extracted by tools such as jsDoc-toolkit. This convention works admirably for large, external script files, but adds unnecessary amounts of comments to smaller, non-external scripts. This could perhaps be mitigated by using a simple @license tag that contains the url of a file with the full license information for the script in question, which would allow browser extensions to easily find and display license information. For large, external scripts, a full @license, @author, @source, etc, comment at the head of the file is a great idea and should be adopted.
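As an illustration of how an extension might pull @tags out of a script, here is a small sketch. The function name extractTags and the rule that the first occurrence of a tag wins are assumptions; it only looks inside /* ... */ block comments and is not a full jsDoc parser:

```javascript
function extractTags(source) {
  const tags = {};
  // Collect the text of every /* ... */ block comment.
  const comments = source.match(/\/\*[\s\S]*?\*\//g) || [];
  for (const comment of comments) {
    // Match "@name value", where the value runs to the end of the line.
    const tagPattern = /@(\w+)[ \t]+([^\r\n]*)/g;
    let match;
    while ((match = tagPattern.exec(comment)) !== null) {
      // Drop a trailing "*/" left over from single-line comments.
      const value = match[2].replace(/\*\/\s*$/, "").trim();
      // Keep the first occurrence of each tag (an arbitrary choice).
      if (!Object.prototype.hasOwnProperty.call(tags, match[1])) {
        tags[match[1]] = value;
      }
    }
  }
  return tags;
}

const example = [
  "/**",
  " * @license http://bar.org/path/to/license.json",
  " * @author Foo Bar <foo@bar.org>",
  " */",
  "alert('foobar');",
].join("\n");

console.log(extractTags(example).license); // http://bar.org/path/to/license.json
```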
Another method would be to place all license and source information in an external file and create an easy method of referencing it. HTML5 supports adding 'data-' prefixed attributes to any element and almost all browsers currently support custom element attributes (using getAttribute(), setAttribute(), etc). By adding a 'data-license' attribute to script elements (containing the url of the license data), the license information can be quickly referenced by browser extensions. In addition, a 'data-license-full' attribute can be added to the first script element on a page, signifying that all scripts within the page are licensed by a single license file. Scripts that are loaded from a different domain could require a 'data-license-external' attribute, to further ensure that all scripts within the page are Free Software. Large, external scripts could still use the @license tag to point to the appropriate license file.
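A sketch of how an extension might resolve these attributes, under two assumptions that the proposal does not spell out: that 'data-license-full' holds the url of the page-wide license file, and that a script's own 'data-license' takes precedence over it. The function name licenseUrlFor is hypothetical:

```javascript
// Return the absolute URL of the license file that applies to `script`,
// or null when neither attribute is present.
function licenseUrlFor(script, firstScript, pageUrl) {
  const own = script.getAttribute("data-license");
  const pageWide = firstScript
    ? firstScript.getAttribute("data-license-full")
    : null;
  const ref = own || pageWide;
  // Relative references are resolved against the page's address.
  return ref ? new URL(ref, pageUrl).href : null;
}

// In a page this could be called as:
//   licenseUrlFor(script, document.scripts[0], document.baseURI)
```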
Using this method, we can create a standard for license files, making it easier for browser extensions to quickly display the license information (as opposed to crawling through the text of a script file to find @tags). I would suggest using the JSON format for license files, as it does not allow the storage of functions (that could be malicious), and is easily parsed by browsers. I'm still working on the details, but here's an example license file:
{ "license" : {
"package" : "Some JavaScript",
"author" : "Foo Bar <foo@bar.org>",
"copyright" : 2010,
"source" : "http://bar.org/path/to/uncompressed-source.js",
"sourceOf" : "http://bar.org/path/to/compressed-source.js",
"link" : "http://bar.org/path/to/page.html",
"libraries" : [ "http://foo.org/path/to/external/library.js" ],
"licenses" : {
"gpl-3" : ["http://bar.org/path/to/local/copy.html", "http://www.fsf.org/licenses/gpl-3.0.html"],
"x11" : "http://bar.org/path/to/license.html"
},
"notes" : "This will be easy for a browser extension to parse and display."
}}
To link to the license, a script element would be written thusly:
<script src="compressed-source.js" data-license="path/to/license.json"></script>
Using a convention like this will make it easy for browser extensions to determine what licenses the code is released under. A quick check for license.licenses["gpl-3"] (bracket notation, since the hyphen rules out dot access), for example, would be all that is required to determine if the code can be executed for a user that has set their preferences to run only gpl-3 code. In addition, the browser extension would be able to display this information in a popup or separate window, making source access only a click away.
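A minimal sketch of that check, assuming the license file has already been fetched as text. The function name isAllowed and the list of accepted license names are illustrative only:

```javascript
// Text of a (shortened) license file in the format proposed above.
const licenseText = `{ "license" : {
  "package" : "Some JavaScript",
  "licenses" : {
    "gpl-3" : ["http://bar.org/path/to/local/copy.html", "http://www.fsf.org/licenses/gpl-3.0.html"],
    "x11" : "http://bar.org/path/to/license.html"
  }
}}`;

// True when the file lists at least one license the user accepts.
function isAllowed(file, acceptedLicenses) {
  const licenses = file && file.license && file.license.licenses;
  if (!licenses || typeof licenses !== "object") return false;
  return acceptedLicenses.some((name) =>
    Object.prototype.hasOwnProperty.call(licenses, name));
}

// JSON.parse only builds data; nothing in the file is executed.
const licenseFile = JSON.parse(licenseText);
console.log(isAllowed(licenseFile, ["gpl-3"]));  // true
console.log(isAllowed(licenseFile, ["agpl-3"])); // false
```

A file that is missing or malformed simply fails the check, so a script without usable license data stays blocked.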
Presumably, "license.licenses" would be the only required property of the license file (maybe "author" as well), with the option of adding other attributes as needed. "author" could be a string, or an object (containing "name", "email", etc) or an array of objects or strings for multiple authors. "source" could be a string or an array of strings for multiple sources, as could "sourceOf" and "libraries". Licenses, such as the GPL, that are maintained by an organization (as opposed to MIT/x11 style licenses) could require links to both a local copy and the master copy (e.g. http://www.fsf.org/licenses/gpl-3.0.html), to ensure that the license is being properly used. In addition, Free Software licenses could require that the "source" attribute be present in the license file before allowing execution of the script.
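Because "author" may take several shapes, an extension would probably want to normalize it before display. The sketch below is one possible approach; the function name normalizeAuthors and the "Name <email>" parsing rule are assumptions, not part of the proposed convention:

```javascript
// Turn a string, an object, or an array of either into an array of objects.
function normalizeAuthors(author) {
  if (author === undefined || author === null) return [];
  const list = Array.isArray(author) ? author : [author];
  return list.map((entry) => {
    if (typeof entry !== "string") return entry; // already { name, email, ... }
    // Split "Name <email>" into its parts; the email part is optional.
    const match = /^\s*(.*?)\s*(?:<([^>]*)>)?\s*$/.exec(entry);
    if (!match) return { name: entry.trim() };
    return match[2]
      ? { name: match[1], email: match[2] }
      : { name: match[1] };
  });
}

console.log(normalizeAuthors("Foo Bar <foo@bar.org>"));
// [ { name: 'Foo Bar', email: 'foo@bar.org' } ]
```

The same wrap-in-an-array step would work for "source", "sourceOf" and "libraries", which may each be a string or an array of strings.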
Currently, I'm working on a Firefox extension that can detect and display both external JSON licenses and commented @tag license information. It also has primitive script blocking abilities, but I think it would be better served by harnessing NoScript to do the blocking. Once it's stable, I'll upload the source (hopefully within the next few days).
-Caz
dmonhntr's Start of an Extension
I like most of your suggestions Caz, and I have attempted to create (from NoScript) the base of an extension that would work for this purpose.
I ripped apart NoScript until I had almost nothing left and then added the ability to block external scripts. I used a licensing convention that is slightly different from the one above. I made the license.licenses part into arrays, here is what it looks like:
{ "license" : {
"package" : "Some JavaScript",
"author" : "Foo Bar <foo@bar.org>",
"copyright" : 2010,
"source" : "http://bar.org/path/to/uncompressed-source.js",
"sourceOf" : "http://bar.org/path/to/compressed-source.js",
"link" : "http://bar.org/path/to/page.html",
"libraries" : [ "http://foo.org/path/to/external/library.js" ],
"licenses" : [
["gpl-3", ["http://bar.org/path/to/local/copy.html", "http://www.fsf.org/licenses/gpl-3.0.html"]],
["x11", "http://bar.org/path/to/license.html"]
],
"notes" : "This will be easy for a browser extension to parse and display."
}}
I did this because then all the license names can be easily extracted by using a for loop to go through the arrays. My modified version of NoScript uses a regular expression to determine if any of the licenses are acceptable.
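The loop and the regular expression check might look roughly like the following. The actual pattern used in Policy.js is not shown in this discussion, so the one below is purely hypothetical, as are the function names:

```javascript
// The array form of "licenses" from the license file above.
const licenses = [
  ["gpl-3", ["http://bar.org/path/to/local/copy.html", "http://www.fsf.org/licenses/gpl-3.0.html"]],
  ["x11", "http://bar.org/path/to/license.html"],
];

// Hypothetical pattern of acceptable license names.
const acceptable = /^(gpl-[23]|lgpl-[23]|x11)$/i;

// Each entry is [name, url-or-urls]; collect the names with a plain loop.
function licenseNames(list) {
  const names = [];
  for (let i = 0; i < list.length; i++) {
    names.push(list[i][0]);
  }
  return names;
}

// True when at least one listed license name matches the pattern.
function anyAcceptable(list, pattern) {
  return licenseNames(list).some((name) => pattern.test(name));
}

console.log(licenseNames(licenses));              // [ 'gpl-3', 'x11' ]
console.log(anyAcceptable(licenses, acceptable)); // true
```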
What the extension does: installs over NoScript (this will need to be changed, I just didn't feel like messing with it yet); blocks any external script that attempts to load unless it is allowed by the regex (Policy.js).
What the extension doesn't (yet) do: block nonfree scripts that are inline; allow scripts that are trivial.
I am thinking about how to go about blocking nonfree inline scripts. It could be done by disabling javascript on the page before the page loads, then checking the page, and then reloading if javascript should be enabled. The site could then be added to a whitelist automatically on the local computer so it doesn't have to check again. I'm not sure if this is a good way to go about this or not.
Kyle
dmonhntr's Thoughts
If as said before, the whitelist is made and stored locally, there could be a feature to get that list so that it could be combined with others into a much bigger whitelist. The nice thing about this is the whitelist for those sites would be made automatically.
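Combining the local lists could be as simple as a set union. A rough sketch, assuming each local whitelist is exported as an array of domain strings (the function name mergeWhitelists is made up here):

```javascript
// Merge several locally built whitelists into one sorted, duplicate-free list.
function mergeWhitelists(lists) {
  const merged = new Set();
  for (const list of lists) {
    for (const domain of list) {
      // Normalize so "Foo.org" and "foo.org" count as the same entry.
      merged.add(domain.trim().toLowerCase());
    }
  }
  return Array.from(merged).sort();
}

console.log(mergeWhitelists([["bar.org", "Foo.org"], ["foo.org", "baz.org"]]));
// [ 'bar.org', 'baz.org', 'foo.org' ]
```

This sketch trusts every contributed list equally; a shared list would presumably still need some review before being published.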
I think a similar approach could possibly be used for trivial scripts.
Thoughts on trivial scripts: What exactly doesn't fit in the definition of nontrivial? All that I can see are things that run before onload such as something like this:
<script type="text/javascript">document.write("Some html here")</script>
Is it possible that it is easier to define trivial than it is to define nontrivial? And therefore easier to check for?
I don't see the libraries being a problem. If the library is not free, it will not load. Therefore if a script uses a library to load another script or make an ajax call, it would have to use a free library because any nonfree library has already been blocked.
One problem I see though, is how exactly to go about implementing the features. I don't mean how to do the detecting, but what to do AFTER the detecting. Do you have javascript disabled until you check the page, then reload it? Or is it possible that in defining what a trivial script is, we find that they all run before any command (and therefore method)? Is it then possible to simply disable all methods from inline scripts on the page until it is known if it is free or not?
I don't know if some of these ideas are feasible or even possible, just my thoughts.
Kyle