OpenAI and Google reportedly used YouTube video transcripts to train AI models

Google and OpenAI have used YouTube video transcripts to train some of their AI models, according to a new report.

The report suggests that OpenAI and Google used transcripts of YouTube videos as AI training data, a practice that may infringe on the copyrights of the videos' creators.


The New York Times report describes extensive efforts by OpenAI, Google and Meta to gather as much data as possible for training their AI models. It was published just a few days after YouTube CEO Neal Mohan told Bloomberg in an interview that if OpenAI had used YouTube videos to train its Sora model, it would have violated the platform's rules.


OpenAI reportedly used Whisper, its speech recognition model, to transcribe more than a million hours of YouTube videos, and then used the resulting text to train its GPT-4 model.


The Information had previously reported that OpenAI used YouTube videos and podcasts to train two of its AI systems.


Speaking to the New York Times, Google spokesperson Matt Bryant pointed to Google's rules, which prohibit unauthorized scraping or downloading of YouTube content, and said he was unaware of OpenAI's use of this data.


According to the report, some people at Google knew that other companies were using YouTube data to train their AI models, but because Google was doing the same, they took no action against OpenAI. Google says it has only used videos from channels whose creators agreed to such use.


In June 2023, Google asked one of its teams to update its privacy policy so the company could make broader use of publicly available content, including Google Drive and Google Sheets documents, to train its AI models and products.