Tag Archives: Bias

Public red-teaming and trust

DEF CON is one of the most important hacker conferences worldwide, held yearly in Las Vegas. This coming August, it will host a large simulation, in which thousands of security experts from the private sector and academia will be invited to compete against each other to uncover flaws and bias in the generative large language models (LLMs) produced by leading firms such as OpenAI, Google, Anthropic, Hugging Face, and Stability. While in traditional red-team events the targets are bugs in the code, hardware, or human infrastructure, participants at DEF CON have additionally been instructed to seek exploits through adversarial prompt engineering, so as to induce the LLMs to return troubling, dangerous, or unlawful content.

This initiative definitely goes in the right direction in terms of building trust through verification, and bespeaks significant confidence on the part of the companies, as it can safely be expected that the media outlets in attendance will be primed to amplify any failure or embarassing shortcoming in the models’ output. There are limits, however, to how beneficial such an exercise can be. For one thing, the target constituency is limited to the extremely digitally literate (and by extension to the government agencies and private businesses the firms aspire to add to their customer list): the simulation’s outcome cannot be expected to move the needle on the broad, non-specialist perception of AI models and their risks in the public at large. Also, the stress test will be performed on customized versions of the LLMs, made available by the companies specifically for this event. The Volkswagen emissions scandal is only the most visible instance of how one may exploit such a benchmarking system. What is properly needed is the possibility of an unannounced audit of LLMs on the ground in their actual real-world applications, on the model of the Michelin Guide’s evaluation process for chefs and restaurants.

In spite of these limitations, the organization of the DEF CON simulation if nothing else proves that the leading AI developers have understood that wide-scale adoption of their technology will require a protracted engagement with public opinion in order to address doubts and respond to deeply entrenched misgivings.

Fear > Loathing: a broader net for dangerous speech online

The Psychology of Technology Institute reports on research conducted by Professor Kiran Garimella of Rutgers on content moderation and the design of online platforms.

Specifically, Garimella studied ‘fear speech’, a type of online activity that has many of the negative connotations of ‘hate speech’ but, by eschewing the latter’s inflammatory tone and ad hominem attacks, is much harder to counter, both in terms of devising automated systems to identify and remove it, and in discursive terms to expose and debunk it. The spread of false, misinformed, or decontextualized statements (generally, but not exclusively, pertaining to some stigmatized group) does not automatically result in the naming and shaming of responsible parties or in calls to action (or violence), but it sets the stage, forms or reinforces a general climate of opinion, comforts an audience in its prejudices and normalizes extreme assessments of states of fact on which further extremist appeals may be built.

One of the reasons fear speech bypasses many of the roadblocks we have erected to the spread of hate speech is that its rhetorical form is one we are extremely familiar and comfortable with in our heavily technological and risk-laden societies. What is unique to fear speech is its ability to mix the epistemic thrill of conspiracy theory with the practical, down-to-earth, and seemingly neutral tone of the PSA. After all, this is not much of a novelty: prejudice has often been couched in prudential terms— “not all members of a stigmatized group may be nefarious and vicious, but when in doubt…”.

The implication of Garimella’s research is that, if we are serious about removing dangerous speech from our online environments, it is necessary to cast a wider net, focusing not only on the narrow band of clearly and openly aggressive hate speech, but addressing its precursors as well, the false, baseless and irrational fears out of which hate grows, and its mongers. This position, in turn, dovetails with the Psychology of Technology Institute’s own contention that design should be the focus of information governance, rather than downstream content moderation.

This position is closely argued, prima facie reasonable, and clearly germane to the struggles many organizations, big and small, private and public, have with speech content. I fail, nonetheless, to be completely convinced. For one thing, freedom of speech concerns become salient as soon as obvious threats and incitement are taken out of the equation. In order to label fears as false or groundless, it would become necessary to lean even more heavily into fact-checking, a process not without its own political pitfalls. Moreover, the unequal distribution of risk propensities on different issues within a polity surely must be a legitimate basis for political sorting, organization, and choice. However we may feel normatively about it, fear is a longstanding mechanism for political mobilization, and furthermore certain scholars (such as George Lakoff, for example) have claimed that its use is not symmetrical along the political spectrum, which would lend these proposals a (presumably unwelcome) partisan slant.

I believe that in considering fear speech and its possible limitations, it is helpful to begin with the motivations of the three parties: the speaker, the audience, and the information overseer. Specifically, what is the goal pursued in attempts to curtail fear speech? Is it to silence bad actors / stigmatize bad behaviors? Is it to prevent mutual discovery and community-building between people already convinced of beliefs we find objectionable? Or is it to shield the malleable, the unwary, the unattentive from misleading information that may lead them eventually to embrace objectionable beliefs? Empirically, information overseers (a national government, say, or a social media platform) may well hold all three, perhaps subsumed by the overriding imperative to preserve the reputation of the forum. But analytically it is important to distinguish the motivations, so as to understand what a proposed remedy portends for each, how it affects the incentives of each type of speaker and each segment of audience. And the key consideration in this respect is how it impacts their likelihood of voice and/or exit, visible protest vs. the redirection of communication in another forum. Only on this basis is it possible to evaluate the feasibility or desirability of curbs to online fear speech.

Workshopping trust and speech at EDMO

It was a great pleasure to convene a workshop at the European Digital Media Observatory today featuring Claire Wardle (Brown), Craig Matasick (OECD), Daniela Stockmann (Hertie), Kalypso Nicolaidis (Oxford), Lisa Ginsborg (EUI), Emma Briant (Bard) and (briefly) Alicia Wanless (Carnegie Endowment for International Peace). The title was “Information flows and institutional reputation: leveraging social trust in times of crisis” and the far-ranging discussion touched on disinformation, trust vs. trustworthiness, different models of content moderation, institutional design, preemptive red-teaming of policies, algorithmic amplification, and the successes and limits of multi-stakeholder frameworks. A very productive meeting, with more to come in the near future on this front.

Geopolitical splintering, decentralization, impartiality

Meta and Twitter have discovered and dismantled a network of coordinated inauthentic behavior spreading pro-US (and anti-China/Russia/Iran) narratives in Central Asia and the Middle East (Al Jazeera, Axios stories). Undoubtedly, this kind of intervention bolsters the platforms’ image as neutral purveyors of information and entertainment, determined to enforce the rules of the game no matter what the ideological flavor of the transgression may be. In a way, paradoxically, such impartiality may even play well in Washington, where the companies would certainly welcome the support, given the current unfavorable political climate.

The type of universalism on display in this instance harkens back to an earlier era of the internet, the techno-libertarian heyday of the 1990s. Arguably, however, that early globalist vision of the world-wide web has already been eviscerated at the infrastructural level, with the growth of distinctive national versions of online life, in a long-term process that has only been made more visible by the conflict in Ukraine. Hence, the impartiality and universality of Meta and Twitter can be seen ultimately as an internal claim by and for the West, since users in countries like Russia, China, or Iran are unable to access these platforms in the first place. Of course, geopolitical splintering was one of the ills the web3 movement set out to counter. How much decentralization can resist the prevailing ideological headwinds, however, is increasingly unclear. Imperfect universalisms will have to suffice for the foreseeable future.