With large language models needing quality data, some publishers are offering theirs at a price while others are blocking access
“It would be impossible to train today’s leading AI models without using copyrighted materials,” the company said this year in a submission to the UK’s House of Lords, adding that limiting its options to books and drawings in the public domain would create underwhelming products. AI labs build large language models – the technology that underpins tools such as OpenAI’s leading chatbot – from trillions of words taken from the internet, a vital source of the material that allows LLMs to interpret text-based prompts and predict a fitting response.