What Is Happening with Books and AI?

I can no longer open my various inboxes without seeing something about AI: newsletters remarking on the success or failure of various AI tech and startups, urgent calls to explore AI tools designed for social media and content managers, headlines about how AI is being used in exciting and terrible ways across industries… So, while I hesitated before writing this piece and possibly contributing to your own AI-related spam, I can’t help but recognize that stories about AI’s effects on the book world are intensifying and worth paying attention to. Not only is it part of my job to remain curious and vigilant when it comes to these updates, but the updates themselves are also interesting to anyone who pays close-ish attention to books and publishing.

As developments continue to crowd this intersection, I think what strikes me most is that they deal with so many different corners of the book world. AI is a technology pervasive enough to highlight some of the more subtle layers and facets that make up book production and book culture. It’s made me consider and reconsider the ethics of using AI as a professional publishing tool, the question of what defines creative work, and how AI might impact the average reader, to name a few philosophical exercises.

Even as I type, a new drama involving AI algorithms is unfolding, and the author reactions to it mirror sentiments around the use of copyrighted works to train AI, and Large Language Models in particular (LLMs; we’ll get into these momentarily). To give you a sense of how quickly these controversies escalate, see this Gizmodo article recounting how a group of writers, including Jeff VanderMeer and Indra Das, pushed back against fiction analytics site Prosecraft after author Zach Rosenberg called it out for using copyrighted works to develop a data library. Rosenberg posted about it on X on August 7, and the post quickly garnered attention, along with calls from authors to remove their books from the library. Later that very same day, after attempts at damage control, Prosecraft developer Benji Smith voluntarily took the site down.

Though, as the Gizmodo article points out, Prosecraft isn’t exactly an LLM (the model at the heart of many headlines about authors versus AI), it’s not difficult to connect the escalation against the site with the broader pushback and advocacy concerning the unauthorized and uncompensated use of human-created work to train AI.

Like many others, my initial hands-on exploration of AI began with ChatGPT. Curious about how it worked, I read up on LLMs, an acronym theretofore unfamiliar to me. The thing to know about LLMs as we get into these stories is that they are trained on existing datasets (think books, articles, and other digitized resources) and generate text by predicting what is likely to come next. They work with what they’ve got, and what they’ve got is sometimes inaccurate, biased, or copyrighted.

This input method of scraping for data lit a fire under the almost 8,000 writers who signed a letter to some of the biggest AI companies, calling on those companies to stop using writers’ works to train LLMs. While the letter, crafted by professional advocacy organization the Authors Guild, collected signatures from authors with sizable platforms (authors like Alexander Chee and Nora Roberts), the most it could do was ask these companies to please compensate the people who authored these works. Those willing to spend the time and money on more aggressive measures, however, have filed lawsuits, taking companies to court over this issue.

This free article continues in our subscription publication, The Deep Dive.