Microsoft Corporation (MSFT.US) has teamed up with HarperCollins, a subsidiary of News Corporation Class B (NWS.US), to train AI models using book data.
20/11/2024
GMT Eight
According to sources, Microsoft Corporation (MSFT.US) has reached an agreement with HarperCollins Publishers, a subsidiary of News Corporation Class B (NWS.US), to use the publisher's non-fiction books to train its artificial intelligence models and improve their quality and performance. The collaboration is limited to selected older (backlist) titles, does not involve creating new books, and leaves authors free to choose whether to participate.
Specifically, Microsoft Corporation hopes to use HarperCollins books for a yet-to-be-announced artificial intelligence model, expanding its high-quality text sources, improving the model's accuracy, and strengthening its ability to provide specialized knowledge. Microsoft Corporation declined to comment, but HarperCollins confirmed the agreement, stating that it will "allow limited use of select nonfiction backlist titles for training AI models."
Furthermore, HarperCollins emphasized that the agreement is limited in scope, with clear guardrails on model output that respect authors' rights, and that authors can choose whether to participate.
HarperCollins said that part of its role is to present authors with opportunities for their consideration while protecting the underlying value of their works and the shared revenue and royalties. In the publisher's view, this agreement, with its limited scope and clear guardrails that respect authors' rights, achieves that goal.
It is understood that tech companies, Microsoft Corporation among them, have been seeking more high-quality text sources to train artificial intelligence models. By licensing data ranging from social media posts to news articles, they aim to make their programs more accurate and better at answering questions or providing specialized knowledge on specific topics.
It is worth mentioning that News Corporation had previously signed an agreement with OpenAI allowing the use of content from its various publications. Microsoft Corporation has also collaborated with several publishers on artificial intelligence projects.
Additionally, earlier this year, Alphabet Inc. Class C reached a $60 million agreement with Reddit, allowing the search giant to use content from the site's many subreddits to train its AI models.
However, some publishers have expressed dissatisfaction with the unauthorized use of their content by artificial intelligence companies and have filed lawsuits. For instance, the New York Times Company Class A sued OpenAI and Microsoft Corporation for copyright infringement.
In conclusion, the agreement between Microsoft Corporation and HarperCollins marks another significant advancement for tech companies seeking high-quality text sources to train artificial intelligence models. However, how to respect authors' rights while utilizing these resources remains a challenge that publishers and tech companies need to address together.