AI, Microsoft Copilot, and Copyright

Firstly, a Happy New Year to everyone – I’m sure that you have amazing plans for 2024, and wish you the best of luck with them!

Over the last few months, I’ve been keeping a close eye on Copilot (since it was announced as being generally available), and on various happenings around the wider AI scene. It seems almost impossible to find someone who is NOT aware of the events at OpenAI towards the end of 2023, and many more conversations now include mention of ChatGPT, Azure OpenAI, etc.

But there’s one item I’d like to pick up on & discuss which, given the news events of last week, is extremely pertinent: copyright. It is a very important topic, but to understand it properly we first need to understand how AI offerings are built in the first place.

All of the various AI offerings rely on LLMs. This stands for ‘Large Language Model’, and these are deep learning models that are pre-trained on vast amounts of data. By vast, I mean that the number of data points is truly staggering.

Incidentally, users should consider what they’re using AI for, and look to use the model best suited to it. For example, if you want to find out the best routine for making a cup of coffee, it is unlikely that a GPT-4 model would be needed; a suitable model could be one with fewer parameters. This is in part because more advanced models need more resources to run queries.

When someone uses an AI offering, these datasets are used to present answers back to them. Some AI models are not limited to just their LLM datasets, and are also able to actively trawl & access websites for more up-to-date information.

But there are two inherent problems with the way that this can work:

Data

When users interact with chatbots or other AI capabilities, they’re inputting data into them. The AI capability could ingest this data and use it to further train its models. Given that the data could be sensitive or proprietary, this can be problematic. Not all AI organisations limit themselves to using data just for processing, as various users have discovered.

Copyright

Given that responses back to users (whether in text format, image format, or other formats) are based on the datasets from the underlying LLMs, users could potentially be provided with material that is actually copyrighted, and which they’re not able to use. This is very problematic, as it can result in users passing off material as their own when it actually belongs to someone else.

Microsoft’s approach

Microsoft’s approach to AI capabilities has been made extremely clear. You may have seen slides similar to the following:

What this means is the following:

  • Microsoft will not use any customer data to train AI models & capabilities for any standard AI offering
  • Any custom Copilots created by Microsoft customers will remain their own – Microsoft will not use data or capabilities from customer-created collateral within the Microsoft AI offerings. This means that a bespoke Copilot will only offer its functionality to the customer that created it – other organisations, even those within the same sector that are creating Copilots themselves, will not benefit from it

Microsoft has also confirmed publicly that any information generated through the usage of Copilot or Azure OpenAI can be used without concern about copyright claims (Microsoft announces new Copilot Copyright Commitment for customers – Microsoft On the Issues). In fact, Microsoft has even gone so far as to say that it will assume responsibility for any potential legal risks involved.

If a third party sues a commercial customer for copyright infringement for using Microsoft’s Copilots or the output they generate, we will defend the customer and pay the amount of any adverse judgments or settlements that result from the lawsuit, as long as the customer used the guardrails and content filters,

Brad Smith, Microsoft Vice Chair and President

This is quite important – customers are able to use Copilot & Azure OpenAI capabilities, and be assured that they will not have to be concerned about copyright issues or challenges. There are of course some conditions around this, in the way that prompts & interactions need to be handled (see Customer Copyright Commitment Required Mitigations | Microsoft Learn for further information on this).

Microsoft has this called out specifically within their Universal License Terms, available to view in full at Microsoft Product Terms.

Recent news events

With the announcement last week that the New York Times has filed suit against OpenAI & Microsoft, this is a very timely topic to look at (New York Times Sues OpenAI and Microsoft Over Use of Copyrighted Work – The New York Times (nytimes.com)).

The outcome of such a lawsuit will affect how AI capabilities can be created & used on an on-going basis. Copyright is of course very important to respect, and it will be quite interesting to see how this plays out. Having taken a look at some of the material included in the lawsuit, there are most definitely similarities between the New York Times material and the AI-generated output.

So, in my opinion, this is going to be a very interesting space, and I look forward to seeing how it goes, and any effects that it has on the usage of AI within organisations.