Group: Copilot Watch Group/Research
- Note: This is a work in progress. Please add your research and findings below. We are putting things in a public-facing location so that others can participate. Thank you!
The goal of this group is to find out what is bad about Copilot, and to what extent. Please add any research you have done below, as we are trying to better understand what Copilot is doing and in what ways it is bad for free software and its users.
Five papers on the implications of Copilot were published by the Free Software Foundation and are available at: https://www.fsf.org/news/publication-of-the-fsf-funded-white-papers-on-questions-around-copilot
- Shows a person typing comments that describe the functions they want, with the code then appearing below: <https://yewtu.be/watch?v=DeO7xLXORpY>
- "Do not use" (discusses copyright): <https://yewtu.be/watch?v=b9u3ZAGQmT0>
- Recent; discusses copyright and licensing: <https://yewtu.be/watch?v=CHvIOgSFp9I>
- A year old, but shows large portions of verbatim copying, including the license: <https://yewtu.be/watch?v=xxX7dpYSClQ&t>
- Another video of large portions of code being copied, including the license: <https://nitter.it/moyix/status/1432085687365513225>
Mentions in Publications or Talks
Richard Stallman briefly discusses the implications of Copilot in his talk "The state of the free software movement." See the transcript from timestamp [0:38:24] at LibrePlanet:Conference/2022/Transcripts/RMS-state-of-free-software
- Do GitHub's updated terms of service conflict with copyleft?
- Ethics in, ethics out -- promote user-respecting software development platforms
- What Is GitHub Copilot And Why Are Developers Hating On It?
- GitHub Copilot AI Improved, Offered as API: 'A Taste of the Future'
- Evaluating Large Language Models Trained on Code
- Microsoft’s Code-Writing AI Points to the Future of Computers
- GitHub Copilot and open source laundering
Notable from the videos
- Users can simply type a comment describing the function they want, and the code appears below it.
- (May be out of date, but...) License text can be "summoned" verbatim (though line by line). This could be problematic even if the text were not verbatim: a user could, for example, copy most of the GPL but "tweak" a few things.
Not yet known
- How many lines of code can be copied verbatim? (Certainly, the license copy example is too much.)
- To what extent are the keystrokes and other activity of Copilot users being tracked and analyzed by Microsoft?
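For anyone who wants to put numbers on the first question, here is a minimal sketch of one way to measure verbatim copying: save a suggestion as text, pick a source file you suspect it came from, and find the longest run of consecutive identical lines. This does not use any Copilot API; the function names `normalized_lines` and `longest_verbatim_run` are hypothetical helpers made up for this example, and ignoring leading/trailing whitespace and blank lines is an assumption about what should count as "verbatim."

```python
from difflib import SequenceMatcher


def normalized_lines(text):
    """Split text into lines, strip surrounding whitespace, drop blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def longest_verbatim_run(suggestion, reference):
    """Return the longest run of consecutive lines found in both texts.

    Assumption: lines are compared after stripping whitespace, so
    re-indented copies still count as identical.
    """
    a = normalized_lines(suggestion)
    b = normalized_lines(reference)
    # autojunk=False keeps difflib from discarding frequently repeated lines.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


if __name__ == "__main__":
    # Hypothetical inputs: replace with a saved suggestion and a real source file.
    suggestion = "x = 1\n  y = 2\nz = 3\nprint(x)\n"
    reference = "import os\ny = 2\n\nz = 3\nprint(y)\n"
    run = longest_verbatim_run(suggestion, reference)
    print(len(run), "consecutive identical lines:")
    for line in run:
        print("   ", line)
```

A check like this only finds copying from a source you already have on hand, so it can give a lower bound for specific cases but cannot answer the question in general.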
(The claims below need to be verified and backed up with research. Please provide links if you have them.)
- People say (and even GitHub's own Web site says) that "0.1% of the time, the code is verbatim."
- Certainly, Copilot never tells you which code a suggestion is copied from (nor its license), no matter how many lines are copied.
- Copilot is SaaSS (Service as a Software Substitute).