Group talk: FSF/Tasks/ModifyNoScript

From LibrePlanet

Initial Thoughts by vonkow

I have a number of ideas of how best to accomplish this task. NoScript operates by blocking the execution of any script not served from a user-defined white-list of sites. I am still not entirely sure how deeply it inspects a script before allowing its execution, but I do not believe that it currently has the ability to determine whether a script is trivial or non-trivial.

What is needed, I believe, is a method of identifying JavaScript that is Free Software and allowing its execution. To that end I am currently working on a Firefox Extension that can detect and display uncompressed source code and license information for embedded JavaScript. In addition, the extension can be set to block the execution of non-free JavaScript (though it is yet unable to detect trivial vs non-trivial scripts). I'm still ironing out the details, but I am quite close to creating a standards-compliant method of easily embedding license and source code information for browser JavaScript.

NoScript does not yet have in-built support for subscribing to externally updated lists of permitted sites; however, it is relatively easy to emulate this functionality using additional Firefox Extensions. The lead developer of NoScript has stated that he is hesitant to add this functionality to NoScript itself, as a master list of blocked domains would require political decisions as to which sites should be allowed. For the Free Software community, this is not an issue, as the litmus test for a site is simply whether it serves Free JavaScript or not.

With a community maintained NoScript white-list of websites that use properly licensed and sourced Free JavaScript and an extension that can display the license and source information (such as the one I am working on), I believe that we can create an effective way of protecting the user's software freedom within the web browser.

I will have a fully working prototype of this idea relatively soon and will post further details at that time.

Thoughts, questions, comments?

-Caz

More thoughts by vonkow

Based on the criteria for trivial vs non-trivial JavaScript as described in 'The JavaScript Trap' I have a few observations on determining the triviality of a given chunk of JavaScript.

1. A script 'Loads an external script or is loaded as one' :

A script element that loads an external script (has a src attribute) is very easy to detect. A script that loads an external script is not too tricky to detect, UNLESS it's using a library function to do so. If the library passes the Free Software test, it can still be used by another script to load an external script. In this instance, it may be a little harder to determine whether the external script is Free Software. Mayhap a listener that detects the creation of a new script element and then checks its freeness could be used.
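As a rough illustration of the easy case only (script elements with a src attribute in static markup), here is a minimal sketch. The function name findExternalScripts is hypothetical, and a regular expression is only an approximation of real HTML parsing. Scripts created dynamically by a library would still need some kind of runtime listener, which this does not attempt.

```javascript
// Hypothetical helper: returns the src value of every <script> tag found
// in an HTML string. This is a sketch, not a real HTML parser.
function findExternalScripts(html) {
  const sources = [];
  const tagPattern = /<script\b([^>]*)>/gi;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    // match[1] holds the attribute text of the opening tag.
    const srcMatch = /\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(match[1]);
    if (srcMatch) {
      sources.push(srcMatch[1] !== undefined ? srcMatch[1] : srcMatch[2]);
    }
  }
  return sources;
}
```

Inline scripts (no src attribute) are simply skipped, so the result lists only the files whose freeness would have to be checked separately.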

2. A script 'Makes an AJAX request':

Once again, this is not too difficult to detect, UNLESS a library is being used. To truly determine whether a script makes an AJAX request, it needs to be executed (in a sandbox). NoScript already blocks any AJAX requests to domains not on the white-list, so in some ways we are already covered. Likewise, it also prevents the execution of external scripts loaded from non-white-list domains. In some ways, having a white-list of domains that exclusively use Free JavaScript seems to be the best course of action. Coupling this with a method of determining the license and freeness of a script would provide adequate user protection from non-free JavaScript.
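To make the white-list idea concrete, here is a minimal sketch of a domain check. The function name isWhitelisted and the rule that subdomains inherit permission are assumptions for illustration; this is not how NoScript itself matches sites.

```javascript
// Hypothetical check: is the script URL served from a white-listed domain?
// Assumption: a subdomain of a listed domain is also allowed.
function isWhitelisted(scriptUrl, whitelist) {
  let host;
  try {
    host = new URL(scriptUrl).hostname.toLowerCase();
  } catch (e) {
    return false; // URLs that cannot be parsed are treated as not allowed
  }
  return whitelist.some(function (domain) {
    const d = domain.toLowerCase();
    return host === d || host.endsWith("." + d);
  });
}
```

For example, with a white-list of ["bar.org"], a script from http://cdn.bar.org/ would pass while one from http://evilbar.org/ would not.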

3. A script 'Defines methods':

JavaScript doesn't quite have traditional methods. Any property of an object that happens to be a function is considered, by and large, a method, but their assignment can be pretty subtle compared to, say, Python. For example:

  var ob={
     method:function(){alert('foobar')}
  };

Defines a method in the traditional sense, as does:

  var constructor=function() {
     this.method=function(){alert('foobar')}
  };
  var ob=new constructor(); 

But:

  var func = function(){alert('foobar')}; 
  var ob={}; 
  ob.method=func;

Also defines a method. In the first two examples, method detection (via parsing of the source) wouldn't be too difficult, but the third example illustrates how any defined (or anonymous) function can be turned into a method.

This makes method detection a little bit trickier. Either the code must be executed (in a sandbox) and then have the objects it creates tested for methods (anonymized functions with methods will be even trickier to detect), or any function declaration must be considered a potential method.
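The "execute, then inspect" option could look something like the sketch below. The name listMethods is hypothetical, and it only sees an object's own enumerable properties, so functions inherited through a prototype would be missed.

```javascript
// Hypothetical runtime check: which own properties of an object are functions?
function listMethods(obj) {
  return Object.keys(obj).filter(function (key) {
    return typeof obj[key] === "function";
  });
}

// The third example above: a plain function attached after the fact.
var exampleFunc = function () { return "foobar"; };
var exampleOb = { label: "x" };
exampleOb.method = exampleFunc;
// listMethods(exampleOb) now reports "method" but not "label".
```

This catches the late assignment that source parsing would miss, but only after the code has already run, which is why a sandbox would be needed.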

This leads us to the common JavaScript practice of using anonymous functions to avoid polluting the global namespace, such as:

  (function(){alert('foobar')})();

This snippet doesn't define any methods and would be considered trivial, but it does contain the function keyword. Assuming that all functions are potential methods, this code would be falsely considered non-trivial.
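The false positive is easy to reproduce. Below is a minimal sketch of the "any function is a potential method" rule; the name looksNonTrivial is hypothetical.

```javascript
// Hypothetical naive rule: any use of the function keyword makes a script
// non-trivial. It also fires on the word "function" inside strings or comments.
function looksNonTrivial(source) {
  return /\bfunction\b/.test(source);
}

looksNonTrivial("(function(){alert('foobar')})();"); // true, although no method is defined
looksNonTrivial("alert('foobar');");                 // false
```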

Conclusion:

NoScript already does an admirable job of preventing scripts and AJAX requests from non-white-list domains and the determination of trivial vs non-trivial JavaScript is relatively difficult (and potentially very slow) with the widespread usage of libraries and the peculiarities of the language.

Creating a standard for including license information and source code of JavaScript elements would help protect Free Software users from non-free code execution. However, it would be relatively easy to disguise non-free JavaScript using this standard. Hence:

A community maintained NoScript white-list of domains that serve only Free JavaScript and a standard of distributing license information and source code would probably be much easier to implement and would offer the same protections to Software Freedom. To this end, NoScript could be modified to work with external white-lists (it can already be done using other Firefox Extensions in conjunction with NoScript). In addition, an extension that checks/displays JavaScript licenses and blocks (using NoScript?) licenseless scripts (or scripts with non-free licenses) is needed, along with a convention for including license and source info.

-Caz

vonkow's proposed JavaScript Licensing Convention

Due to the lack of a standard for properly including license and source information in browser JavaScript, Free JavaScript projects have adopted a hodge-podge collection of methods for doing so. Some projects, like StatusNet, use a JSDoc-like convention that relies on @tags within comments. This convention is very similar to the one described in 'The JavaScript Trap'. It has the benefits of being included within the source file and being easily extracted by tools such as jsDoc-toolkit. This convention works admirably for large, external script files, but adds unnecessary amounts of comments to smaller, non-external scripts. This could perhaps be mitigated by using a simple @license tag that contains the URL of a file with the full license information for the script in question, which would allow browser extensions to easily find and display license information. For large, external scripts, a full @license, @author, @source, etc., comment at the head of the file is a great idea and should be adopted.
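As a sketch of how an extension might pull such @tags out of a script, assuming they sit in a block comment at the very top of the file (the function name extractTags and that placement rule are assumptions, not part of any existing tool):

```javascript
// Hypothetical extractor: reads @tag values from a leading /* ... */ comment.
function extractTags(source) {
  const tags = {};
  const header = /^\s*\/\*([\s\S]*?)\*\//.exec(source);
  if (!header) {
    return tags; // no leading block comment, so nothing to report
  }
  // Each tag is "@name value", with the value running to the end of the line.
  const tagPattern = /@(\w+)\s+([^\r\n*]+)/g;
  let match;
  while ((match = tagPattern.exec(header[1])) !== null) {
    tags[match[1]] = match[2].trim();
  }
  return tags;
}
```

Only the first comment is scanned, so the cost stays small even for large files.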

Another method would be to place all license and source information in an external file and create an easy method of referencing it. HTML5 supports adding 'data-' prefixed attributes to any element and almost all browsers currently support custom element attributes (using getAttribute(), setAttribute(), etc). By adding a 'data-license' attribute to script elements (containing the URL of the license data), the license information can be quickly referenced by browser extensions. In addition, a 'data-license-full' attribute can be added to the first script element on a page, signifying that all scripts within the page are licensed by a single license file. Scripts that are loaded from a different domain could require a 'data-license-external' attribute, to further ensure that all scripts within the page are Free Software. Large, external scripts could still use the @license tag to point to the appropriate license file.

Using this method, we can create a standard for license files, making it easier for browser extensions to quickly display the license information (as opposed to crawling through the text of a script file to find @tags). I would suggest using the JSON format for license files, as it does not allow the storage of functions (that could be malicious), and is easily parsed by browsers. I'm still working on the details, but here's an example license file:

  { "license" : {
     "package" : "Some JavaScript",
     "author" : "Foo Bar <foo@bar.org>",
     "copyright" : 2010,
     "source" : "http://bar.org/path/to/uncompressed-source.js",
     "sourceOf" : "http://bar.org/path/to/compressed-source.js",
     "link" : "http://bar.org/path/to/page.html",
     "libraries" : [ "http://foo.org/path/to/external/library.js" ],
     "licenses" : {
        "gpl-3" : ["http://bar.org/path/to/local/copy.html", "http://www.fsf.org/licenses/gpl-3.0.html"],
        "x11" : "http://bar.org/path/to/license.html"
     },
     "notes" : "This will be easy for a browser extension to parse and display."
  }}

To link to the license, a script element would be written thusly:

  <script src="compressed-source.js" data-license="path/to/license.json"></script>

Using a convention like this will make it easy for browser extensions to determine what licenses the code is released under. A quick check for license.licenses["gpl-3"] (bracket notation is needed because of the hyphen), for example, would be all that is required to determine if the code can be executed for a user that has set their preferences to run only gpl-3 code. In addition, the browser extension would be able to display this information in a popup or separate window, making source access only a click away.
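A minimal sketch of that check, assuming the license file has already been fetched as text. The function name isAcceptable and the list of accepted names are hypothetical.

```javascript
// Hypothetical check: does the license file name at least one license
// that the user has chosen to accept?
function isAcceptable(licenseJson, acceptedNames) {
  let data;
  try {
    data = JSON.parse(licenseJson);
  } catch (e) {
    return false; // malformed license files are treated as "no license"
  }
  const licenses = data && data.license && data.license.licenses;
  if (!licenses || typeof licenses !== "object") {
    return false;
  }
  return acceptedNames.some(function (name) {
    // Names such as "gpl-3" contain a hyphen, so look them up as keys.
    return Object.prototype.hasOwnProperty.call(licenses, name);
  });
}
```

Because JSON.parse only produces data, nothing in the license file is executed during the check.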

Presumably, "license.licenses" would be the only required property of the license file (maybe "author" as well), with the option of adding other attributes as needed. "author" could be a string, or an object (containing "name", "email", etc) or an array of objects or strings for multiple authors. "source" could be a string or an array of strings for multiple sources, as could "sourceOf" and "libraries". Licenses, such as the GPL, that are maintained by an organization (as opposed to MIT/x11 style licenses) could require links to both a local copy and the master copy (e.g. http://www.fsf.org/licenses/gpl-3.0.html), to ensure that the license is being properly used. In addition, Free Software licenses could require that the "source" attribute be present in the license file before allowing execution of the script.
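Since several properties may be either a single value or an array, an extension would probably normalize them before use. A small sketch, with the hypothetical helper name toList:

```javascript
// Hypothetical normalizer: turns "a single value or an array of values"
// into an array, so callers can always loop over the result.
function toList(value) {
  if (value === undefined || value === null) {
    return []; // property absent from the license file
  }
  return Array.isArray(value) ? value : [value];
}
```

With this, a rule such as "the source attribute must be present" reduces to checking that toList(license.source) is not empty.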

Currently, I'm working on a Firefox extension that can detect and display both external JSON licenses and commented @tag license information. It also has primitive script blocking abilities, but I think it would be better served by harnessing NoScript to do the blocking. Once it's stable, I'll upload the source (hopefully within the next few days).

-Caz

dmonhntr's Start of an Extension

I like most of your suggestions Caz, and I have attempted to create (from NoScript) the base of an extension that would work for this purpose.

I ripped apart NoScript until I had almost nothing left and then added the ability to block external scripts. I used a licensing convention that is slightly different from the one above. I made the license.licenses part into arrays, here is what it looks like:

 { "license" : {
    "package" : "Some JavaScript",
    "author" : "Foo Bar <foo@bar.org>",
    "copyright" : 2010,
    "source" : "http://bar.org/path/to/uncompressed-source.js",
    "sourceOf" : "http://bar.org/path/to/compressed-source.js",
    "link" : "http://bar.org/path/to/page.html",
    "libraries" : [ "http://foo.org/path/to/external/library.js" ],
    "licenses" : [
       ["gpl-3", ["http://bar.org/path/to/local/copy.html", "http://www.fsf.org/licenses/gpl-3.0.html"]],
       ["x11", "http://bar.org/path/to/license.html"]
    ],
    "notes" : "This will be easy for a browser extension to parse and display."
 }}

I did this because then all the license names can be easily extracted by using a for loop to go through the arrays. My modified version of NoScript uses a regular expression to determine if any of the licenses are acceptable.
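Roughly, the loop and the regular expression test could be sketched as follows. The function names and the example pattern are illustrative assumptions, not the code in Policy.js.

```javascript
// Hypothetical helper: collects the license names from the array form,
// where each entry is [name, urlOrUrls].
function licenseNames(licenseFile) {
  const names = [];
  const entries = licenseFile.license.licenses;
  for (let i = 0; i < entries.length; i++) {
    names.push(entries[i][0]);
  }
  return names;
}

// Hypothetical helper: true if any name matches the acceptance pattern.
// The pattern should not use the g flag, since test() would then keep state.
function anyAcceptable(names, pattern) {
  return names.some(function (name) {
    return pattern.test(name);
  });
}
```

A pattern such as /^(gpl-3|x11)$/ would then accept the example file above.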

What the extension does:

  • installs over NoScript (UPDATE: should now work alongside NoScript)
  • blocks any external script that attempts to load unless it is allowed by the regex. (Policy.js)

What the extension doesn't (yet) do:

  • block nonfree scripts that are inline
  • allow scripts that are trivial

I am thinking about how to go about blocking nonfree inline scripts. It could be done by disabling javascript on the page before the page loads, then checking the page, and then reloading if javascript should be enabled. The site could then be added to a whitelist automatically on the local computer so it doesn't have to check again. I'm not sure if this is a good way to go about this or not.
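The check-then-remember flow could be sketched like this. Everything here is hypothetical: the function name, the use of a Set as the local whitelist, and the checkPage callback that stands in for whatever inspection of the page is eventually used.

```javascript
// Hypothetical decision step: should JavaScript be enabled for this host?
// localWhitelist is a Set of hosts that already passed the check.
function shouldEnableScripts(host, localWhitelist, checkPage) {
  if (localWhitelist.has(host)) {
    return true; // verified on an earlier visit, no need to check again
  }
  if (checkPage()) {
    localWhitelist.add(host); // remember the result locally
    return true;
  }
  return false; // leave JavaScript disabled for this page
}
```

One open question with this design is that a site can change its scripts after being remembered, so the cached entry may need to expire.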

Kyle

dmonhntr's Thoughts

If as said before, the whitelist is made and stored locally, there could be a feature to get that list so that it could be combined with others into a much bigger whitelist. The nice thing about this is the whitelist for those sites would be made automatically.

I think a similar approach could possibly be used for trivial scripts.

Thoughts on trivial scripts:

What exactly doesn't fit in the definition of nontrivial? All that I can see are things that run before onload such as something like this:

   <script type="text/javascript">document.write("Some html here")</script>

Is it possible that it is easier to define trivial than it is to define nontrivial? And therefore easier to check for?


I don't see the libraries being a problem. If the library is not free, it will not load. Therefore if a script uses a library to load another script or make an ajax call, it would have to use a free library because any nonfree library has already been blocked.

One problem I see though, is how exactly to go about implementing the features. I don't mean how to do the detecting, but what to do AFTER the detecting. Do you have javascript disabled until you check the page, then reload it? Or is it possible that in defining what a trivial script is, we find a better solution? Or does someone else have a better idea right now?

I don't know if some of these ideas are feasible or even possible, just my thoughts.

Kyle

Can't install it on my FF 3.6.8 on Ubuntu 10.04 x86_64

Installing freescript from that xpi results in an error for me:

Firefox couldn't install the file from

«file:///tmp/freescript%20v0.0.1.xpi»

for the following reason: Install script not found (-204)

Works now

Sorry, I packaged it wrong (really dumb mistake). I uploaded it again, it should work now.

Kyle

License filters?

It's just an idea for a release (I hope freescript will make it to a release).

Well, I suppose (if freescript will get to the point it will be able to block non-free scripts), there should be a filter based on scripts' licenses too. NoScript can block scripts based on which website they are located on, so freescript should add license filtering, right?

Because different people have different opinions on what is 'free software'. For example, RMS says 'free' == 'gives you the 4 freedoms (use/share/modify/share modifications)', and Debian claims 'free software' is 'compliant with DFSG'. Some BSDers may say "GPL isn't truly a free license, 'cause I will have to share my modifications under GPL too".

Well, so anyone can change their white/blacklists of licenses.

BTW, now it installed fine, and looks like it really blocked the "nonfree" test script.

D1337r 07:13, 24 August 2010 (UTC)

Trivial vs. Non-trivial Confusion

I am kind of confused on the methods (I'm not formally trained in programming). Would this be considered trivial or not? And why or why not so I might understand this a little better.

 function add(a, b){
   var answer = a+b;
   return answer;
 }
 window.onload = function(){alert(add(5, 3))}

What I am really wondering is: can a trivial script contain functions that may be quite complex?

For example is something like this trivial?: http://javascript.internet.com/games/blackjack.html

Also: Is something like onclick and onload considered a method and therefore would make a script non-trivial?

According to http://javascript.about.com/od/learnmodernjavascript/a/tutorial08.htm:

“all functions are actually methods of the window object”

So, are all functions considered methods or not?

I have started a script that tests for trivialness, but am getting very confused on the methods part.

Kyle - 15:43, 25 August 2010 (UTC)

Will this help in trivial vs non-trivial?

Martin Bähr emailed me suggesting that Chrome Sniffer (a Chrome extension) could help us with detection of non-trivial scripts.

More info about it can be found here: http://www.nqbao.com/chrome-sniffer

Kyle 18:54, 4 September 2010 (UTC)

Updated Extension

Now includes an options window. Licenses can be enabled/disabled. New licenses can be added.

D1337r, I like your idea of license filters, and I think this will do what you were saying.

The extension uses regular expressions to detect whether the script is under a free license. It would be helpful if each license had a standard name that would be used (gpl-3 vs gplv3). Of course the regular expression could be used to make both work (/gpl[-v]3/), but it would just make things easier.
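One way to get the effect of a standard name without requiring everyone to agree first is to map known spellings onto a canonical one before matching. A sketch, with a hypothetical function name and only the gpl-3 case filled in:

```javascript
// Hypothetical normalizer: maps spelling variants onto one canonical name.
// Only the gpl-3 / gplv3 case from the discussion is handled here.
function canonicalLicenseName(name) {
  const trimmed = name.trim().toLowerCase();
  if (/^gpl[-v]3$/.test(trimmed)) {
    return "gpl-3";
  }
  return trimmed; // unknown names pass through unchanged
}
```

The user's enabled/disabled list could then be kept in canonical names only.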

Currently I think that there are two major things missing: detecting trivialness and blocking inline scripts.

I have added a link to a script that I have started that detects trivialness, but it needs help (especially with the methods part). If someone could even just write a script that would do method detection, that would be great and would help a lot.

Kyle 18:54, 4 September 2010 (UTC)

Reporting a bug in 0.1.0

Yeah, this is what I was talking about.

But looks like there is a bug that disables NoScript's context menu (left or right click on NoScript's icon) D1337r 05:47, 11 September 2010 (UTC)

Fixed bug

I fixed the bug and posted the new version. The only difference is the bug fix, there are no new features.

Kyle 22:19, 25 September 2010 (UTC)

Version 0.2.0

The major change in the new version is that it now can check for trivialness.

New options: Allow all scripts, Allow trivial non-free scripts, choose method for trivialness check

Choose between two methods of checking for trivialness. The one that includes checking for methods isn't very good. It checks for the word “function”. If the script contains that word, it blocks it. This method is good, though, if you want almost everything that does not have a free license to be blocked.

Does not appear to work in Firefox 4 beta 8 (at least didn't let me install when I set maxVersion to 4.0b9pre). I'm not sure what has to be done to get it to work.

Whitelists/blacklists may not be necessary, at least not like I had thought before. When I tested this version (very lightly) it didn't appear to slow anything down so it looks like only whole domain listing would really be useful. And maybe any files that are huge and slow to check for trivialness.

This version contains no support for white or blacklists, just some thoughts about the future.

I also created a new test page, it is a lot more informative than the old one.

Kyle 16:24, 7 January 2011 (EST)

Thoughts/questions by Stefan Monnier

First: is there a mailing-list or some more convenient place where we can track the progress of this project? (I'd expect a savannah.gnu.org/projects/freescript or something like that)

As a user who wants to defend himself from non-Free code, I don't like the idea of whitelists/blacklists of sites, since I don't want to have to trust/distrust sites, only code.

I do not like the idea of the license information being completely separate from the code itself, so I'd much rather it be in a @tag inside the javascript code than in a data-foo attribute on the HTML code, otherwise it's too easy for someone to cook up an HTML page that uses non-Free code but tags it as if it were Free. If we don't want to scan the whole code for @tags, then we can simply require that they be "at the beginning".

I'd rather avoid making more http requests, so I think that the license info should not be placed in some external page. The way I imagine it, it would be more like: the javascript code holds in @tags two URLs: one for the license document (maybe it should be a URI instead), the other for the source code. By license document I mean something like a standard license, so the user can specify a set of accepted licenses (or rather their URIs) and the browser will mostly never fetch the license document itself. If we want extra licensing info such as copyright holders and such, this can be included in the source file.

Stefan Monnier, May 18 2010

Reply to Stefan Monnier

First of all, there is no mailing list or anything else (that I know of anyway). This project is so slow moving that I am not sure that it is necessary.

I don't quite understand what you mean by not liking the data- tags. I understand the extra http requests part, but I don't understand the part where someone could just tag a script as free. It seems to me that they could do the same thing with your @tags idea. And if they were to do either one of them, they would be then making the code free, wouldn't they? It is quite possible that I am misunderstanding something, though.

Your idea with the two URLs is very interesting and confusing. The thing I don't get is how does the javascript get used if there is only a link to it in the comments? I am very confused, so some more explanation would be appreciated.

As for the more http-requests. I don't think that it has appeared to make any difference as far as speed, but I suppose we could do something different. We could use the URIs of licenses (like you suggested) instead, and have that be what is in the data-licenses attribute. Then those would just have to be searched to see if there is a match.
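Matching on license URIs would need no extra request at all. A minimal sketch, assuming the attribute holds one or more URIs separated by whitespace (that format and the function name are assumptions):

```javascript
// Hypothetical check: does the attribute value list any license URI that
// the user accepts? acceptedUris is a Set of URI strings.
function hasAcceptedLicense(attributeValue, acceptedUris) {
  return attributeValue
    .split(/\s+/)
    .filter(Boolean) // drop empty pieces from leading or trailing spaces
    .some(function (uri) {
      return acceptedUris.has(uri);
    });
}
```

The comparison is an exact string match, so the URIs would have to be written consistently (for example, always with or without "www.").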

Kyle 16:35, 4 June 2011 (EDT)

What's the roadmap for this project?

I am working on this project, but I haven't been using version 0.2.0, which was developed earlier and derived from NoScript. What is the status of this project and this particular implementation? I've sent an email to each of the "interested parties" on the main project page and am waiting for an answer. Also, I wouldn't consider functions to be nontrivial by default. In many cases in the past, the functions I've created have performed small changes/improvements to the DOM. Maybe the scope of the function is more important, since functions in the global scope might just be used to organize very trivial code in a more readable way. (That's the use I've made of it many times, although it might not be good practice.)


Loic Duros, 09/09/2011

lduros[at]member[dot]fsf[dot]org

Works with Firefox 6!

I found out that I messed up the version number when I tested it with Firefox 4, so it actually still works with Firefox 6. I put the max version at 10.* just so that it doesn't go out of date soon when Firefox 7 is released, but I also don't foresee the extension losing its compatibility anytime soon.

Kyle 15:06, 9 September 2011 (EDT)

ff 6

On my end it can't load in Firefox. I am taking a look inside the extension, but it might be a good idea, as I've started, to work on a solution from the ground up rather than tweak NoScript.

Loic Duros 09/09/2011

Fixed

It's interesting that it worked if you installed it just by copying it into the folder, but had trouble if you opened it with Firefox. Anyway, I figured it out: I had to remove the META-INF folder. It should install fine now.

Kyle 19:05, 9 September 2011 (EDT)

could install

Thanks, Kyle. It can now be added as xpi. It seems a little clearer to me now what I can work on. My list at this point:

1. adding some regex for the javascript parsing part (trying to make it more contextual).

2. working on some logic to tie scripts with each other (like a library with actual user code, etc, ... so that we can determine the free aspect of an app as a whole)

3. Recognizing all free licenses.

4. Notifications within the user interface.

At this point if anyone is planning to add anything to the script, please email: lduros [at] member [dot] fsf [dot] org

Waiting to get a slot on Savannah.

Loic Duros 09/10/2011

Why NoScript?

Why use NoScript as the starting point? There are other plugins, such as Javascript-Options, which might be useful starting points, no? Ciaran 08:57, 23 September 2011 (EDT)

Ciaran:
Yep, you are correct. I've been working on a brand new extension for the past 3 weeks now (mostly working on the parsing/semantic analysis part, but the extension is underway also). The main point to remember from NoScript is that it intercepts script loads. That's the only requirement that we need on our end. First working version scheduled for around mid to late October.
LD
Excellent. I'll check back in a few weeks to see if I can help with testing. Ciaran 08:33, 30 September 2011 (EDT)

Check out LibreJS

A few weeks ago, an add-on was released to address The JavaScript Trap. If you are interested in this project, please check it out and join the mailing list: http://lduros.net/cgi-bin/mailman/listinfo/librejs