This week, fierce backlash from authors on Twitter led to the shutdown of a platform called Prosecraft — a website that its creator Benji Smith wrote was “dedicated to the linguistic analysis of literature.” Prosecraft housed a database of around 25,000 books, and various versions of the site have been around for years, but it seems that many authors have only recently learned how much of their work and data had been fed into it without their consent.
In a 2018 blog post describing Prosecraft and Shaxpir, another writing program he was working on, Smith said that both used “machine-learning [AI] algorithms,” applied to the books he was uploading, to “recognize which kinds of words can be used in which kinds of contexts.”
“We used AI algorithms to analyze a huge corpus of literature, building a unique new thesaurus, just for fiction authors,” he wrote in 2018. Smith does not explain where or how he obtained the content for Prosecraft’s “giant library of literature.”
It’s worth noting that Prosecraft does not seem to have been generative AI comparable to ChatGPT. Smith describes using AI to scrape books and produce specific analytics (word count, percentage of passive voice, “vividness” and adverb usage), which he would then publish because he thought they would help aspiring writers better understand how to write successful books. Smith also fed the data that the AI pulled through Prosecraft into his other project, Shaxpir, training it to build out its thesaurus and word-choice suggestion functions.
Writers were furious. Gretchen Felker-Martin, author of Manhunt, called Prosecraft a “content mining site” posting other writers’ works “so that it can be indifferently plagiarized by anyone.”
I've just discovered that MANHUNT has been uploaded to a content mining site so that it can be indifferently plagiarized by anyone who wants to feed it into their so-called "AI". @benji_smith, I demand you remove my work from your site immediately.
— Gretchen Felker-Martin (@scumbelievable) August 7, 2023
“I love it when people pirate MANHUNT, I really do. No irony,” she continued in another tweet. “The more people who read it, the happier I am. But using a speak n’ spell to strip it for parts so morons can ‘write’ their own ‘books’ on the back of my labor? F*** that and f*** you.”
How DARE you, @benji_smith
— Zach Rosenberg’s Debut Is Out Now! (@ZachRoseWriter) August 7, 2023
I demand you take my book off your site immediately. I do not consent to this, and never did. And I know my publisher never would pic.twitter.com/QvPkRme5pr
This company Prosecraft appears to have stolen a lot of books, trained an AI, and are now offering a service based on that data https://t.co/76jxgaA9TP
— Hari Kunzru (@harikunzru) August 7, 2023
This is where AI is a danger to authors. This website has uploaded thousands of novels to mine for linguistics without permission and in breach of copyright. If your novel is on it, email the developer and tell them to take it down: support@shaxpir.com https://t.co/xf1nSGtdzl
— Michelle Davies (@M_Davieswrites) August 7, 2023
Two of my novels on this site, with analysis of passive voice vs. active voice, adverbs etc. Horrified. https://t.co/W4dgcmdo2X
— C.J. Cooke (@CJessCooke) August 7, 2023
Celeste Ng, the author of Little Fires Everywhere, tweeted she’d combed through the platform and counted at least 20 Stephen King novels and 20 Jodi Picoult books uploaded to Prosecraft.
“I’m not worried about AI writing being *better* than works by actual humans,” she elaborated in a follow-up. “But I *am* worried about companies (wrongly) thinking they can replace human writers with AI.”
Ooof, Prosecraft nabbing TWENTY @StephenKing books for their for-profit AI-scraping “writing” site is likely not the visionary business move they might have imagined. pic.twitter.com/athfmMW2kK
— Keith Rosson (@keith_rosson) August 7, 2023
Smith attempted to reply to some of the tweets, issuing an apology and agreeing to take down the works of any authors who didn’t want them listed on the site.
Indrapramit Das, another author whose book The Devourers was included on the site, replied, “I think you can safely assume that the default for any artist or writer is ‘doesn’t want them to be there’ (there being any AI training project) unless you have their written and confirmed consent.”
In a blog post published Aug. 8, in which Smith announced he would be shutting down Prosecraft, he said he’d been working on the project since the summer of 2017, when he began writing a memoir.
“It was my first book, and I didn’t know how many words I should write,” he explained. “I had heard that ‘real books’ should be about 100,000 words. I searched the internet for more specific guidance but I didn’t find much.”
He claimed he researched copyright laws in an effort to be “mindful of not wanting to hurt or offend the community of authors that I care so much about” and alleged he was “honoring the spirit of the Fair Use doctrine” by not reaching out to the authors or publishers.
Smith also alleged Prosecraft “has never generated any income,” although a short story writer, Lincoln Michel, pointed out that, in one of his own posts, Smith had described the project as “a demo of a feature in the premium tier” of his writing software, Shaxpir.
“The purpose was to make money,” Michel argued.
According to Smith’s 2018 blog, features of Shaxpir were built using Prosecraft’s database, and those features were only available to paying subscribers.
I was already talking to my lawyers, because my work was on there too https://t.co/yu0BB05xek
— Bolu Babalola is technically on leave 🍯&🌶 (@BeeBabs) August 7, 2023
Smith denied that Prosecraft is a “shadow library” and claimed that he “had never heard” of the concept. “Shadow library” is a relatively new term in copyright law for an online database that provides users with access to millions of books and articles, usually ones that are hard to find or paywalled. Quartz has commented that shadow libraries “often infringe on copyrighted work and cut into the publishing industry’s profits,” which is why governments around the world have started to crack down on them.
While Smith insists Prosecraft is not a shadow library, Dilan Thampapillai, the incoming dean of law at the University of Wollongong in Australia, explained in a post that copyright infringement depends on “human actions” only.
“If a person undertakes the act of copying a book to place it in a shadow library, this amounts to an act of copyright infringement,” Thampapillai wrote.
“In the future, I would love to rebuild this library with the consent of authors and publishers,” Smith concluded. “I truly believe these tools are useful for creative people. But now is not the right time.”
In July, thousands of writers, including Margaret Atwood and James Patterson, added their names to a letter organized by the Authors Guild addressed to the CEOs of several AI companies. The Authors Guild is the largest professional writers’ organization in the U.S.
The letter called attention to the “inherent injustice in exploiting our works as part of your AI systems” and described how their copyrighted books were being used as “food” for AI systems.
“The introduction of AI threatens to tip the scale to make it even more difficult, if not impossible, for writers — especially young writers and voices from underrepresented communities — to earn a living from their profession,” the letter says.
The post What was Prosecraft? Platform creator gets pressure from Book Twitter to shut it down appeared first on In The Know.