The Atlantic this week published a searchable dataset of almost 200,000 books used to train generative AI by major companies, without the permission of the authors (find it sans paywall here). This follows on a similar tool published recently by the Washington Post, covering the C4 linguistic corpus, which I wrote about at the time. The Atlantic article has prompted much justifiable outrage from authors over the ethics of scraping, but it also highlights some of the same problems found in the C4 corpus. The Books3 database is less problematic than the C4 corpus, since the publication process does impose some friction on the worst extremist content, but the list includes Mein Kampf, most of the books by Italian fascist Julius Evola, and other red flags like the anti-Muslim dystopia Submission by Michel Houellebecq.
There’s also a host of material that, while properly part of the English literature canon, is considered problematic in various ways due to dated cultural mores about race, gender and more. The important thing to understand about the generative AI/LLM process is that it strips all this language of its cultural context and replaces it with a “neighboring words” context, which can be especially pernicious in the context of fiction. A work like To Kill a Mockingbird, whose racial content has been much discussed of late, can arguably be read and contextualized for what it has to say about its time and its characters, but any such attempt at nuance is stripped away by the machine. Even more complicated issues arise from books about anti-heroes or outright villains. A Clockwork Orange is included in Books3, for instance, although its eccentric vocabulary probably helps mitigate its utility for chatbots.
Consider Harriet Beecher Stowe’s Uncle Tom’s Cabin, which is in the Books3 dataset. It’s a book about the evils of slavery, which it illustrates by depicting very racist characters, and it is often deeply misunderstood even by human readers. An AI chatbot may assimilate the generalized criticism of slavery poorly, but it will absorb the racist language of the characters readily. If you teach an AI to speak using The Great Gatsby, you’re going to get a simulacrum of careless people who smash things up and let other people clean up the mess they had made.
There was a bumper crop of research this week, so I will close with a look at several new publications, and more to come in next week’s edition.
The Islamic State’s Shadow Governance in Eastern Syria Since the Fall of Baghuz
The power-byline of Aaron Zelin and Devorah Margolin with a look at how ISIS has attempted to maintain some semblance of governance in Eastern Syria. Given how critically important governance — and entitativity, in the form of the appearance of state-like status — were to Islamic State’s appeal, this is definitely a subject to watch.
Distinguishing Children From ISIS-Affiliated Families in Iraq and Their Unique Barriers for Rehabilitation and Reintegration
An exploration of a deeply complex and troubling issue—how to understand the situation of children whose parents joined ISIS. Islamic State’s horrifying exploitation of children in propaganda is only the tip of the iceberg. The children of ISIS adherents are victims who face profound challenges in almost any circumstance, but especially in captivity. Joana Cook tackles this important issue.
Selective and deceptive citation in the construction of dueling consensuses
If you have been following me in recent years, you’ve probably seen me talking about the social construction of reality, and the role of in-group consensus in fostering extremism. So you will not be surprised that I perked up when I saw this study, which looks at how social media users for and against COVID masking frame perceptions of consensus in order to persuade. Anti-maskers used misleading and fabricated citations to create a false impression of scientific consensus.
Risks and Challenges in Online Communities for 3D-Printed Firearms Among Extremists and Terrorists
Writing for GIFCT, Kyle Dent, Yannick Veilleux-Lepage and Maria Zuppello provide a useful rundown of 3D-printed firearms, the technology, its capabilities and history, its use by extremist groups, and the challenges of moderating the content online.
Performance Information and the Satisfaction of Different Social Groups: Citizen Evaluation by Racial Groups
This is slightly adjacent to extremism, but it’s got some interesting implications for in-group consensus construction. A study of Chicago public schools found that different racial in-groups had different perceptions of school performance, which is not surprising, but also that they assessed performance using different criteria and prioritizing different information. By Minjung Kim.
Why we fight: investigating the moral appeals in terrorist propaganda, their predictors, and their association with attack severity
The abstract on this paper (“How do terrorists persuade otherwise decent citizens to join their violent causes?”) had me SCREAMING. First, the paper doesn’t answer that question IN ANY WAY, and second, “otherwise decent citizens” is a huge and unjustifiable presumption when talking about the audiences for terrorist propaganda. THAT BEING SAID… the actual findings were quite interesting once you got past the wildly irresponsible lead. In-group loyalty themes were found across the sample studied, along with in-group purity themes, which correlated with violence by the authoring movement. I have a lot of incipient thoughts about in-group purity, but until they are more, uh, cipient, I’ll leave you with the paper to chew on.
Whoa. This one left me mighty sad -- and I haven't read any of the source material yet!
My first reaction (a bit odd) was to think (for the first time in decades) about the ways we were rewarded in that first newswriting class for coming up with "grabber" ledes. It was fun, but as you point out, it can also be misleading and even disturbing...not to mention discouraging further engagement with the material!