Sunday, December 31, 2023

I'd forgotten to do this. [see Jan.]

Mirowski, "The Evolution of Platform Science"
Draft 2.0, April 2023

There is a massive literature explaining how large language models built from neural networks actually function, but most of the crucial points can be made without venturing into the weeds of computer science. These chatbots are based on large-scale statistical exercises, trained upon truly massive datasets. They are language models because they extract and report the specific words (sentences, paragraphs, etc.) most likely to follow on from the inputs supplied by the interlocutor. In the first instance, there is nothing at all present that could be graced with the term ‘intelligence’; rather, it is an overgrown word-autocompletion algorithm, not so very different from one you might find on your phone or word processor. But that is only in the first instance: chatbots are not entirely automated in their so-called ‘deep learning’ process, but are also subject to the intervention of actual human beings at various junctures in their training regimens. This is revealed in Figure I, taken directly from the website of OpenAI.
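The "overgrown autocomplete" point can be made concrete with a toy sketch. The bigram model below (a deliberately crude stand-in; real LLMs condition on long contexts with neural networks, and the corpus here is invented for illustration) does nothing but count which word most often follows which, then parrots back the statistically likeliest continuation:

```python
from collections import Counter, defaultdict

# Toy training text (invented for illustration).
corpus = ("the model predicts the next word and the next word "
          "follows from the words before it").split()

# Count, for each word, which words follow it and how often.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def autocomplete(word):
    # Emit the most frequent continuation observed in training:
    # pure statistics, no 'intelligence' anywhere in sight.
    return counts[word].most_common(1)[0][0]

print(autocomplete("the"))  # "next" follows "the" most often in this corpus
```

The same predict-the-next-token objective, scaled up enormously, is what the chatbots' training optimizes.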

[Figure I: image from the OpenAI website; source link not given]

Far from ‘improving’ the chatbot outputs, what these interventions do is feed further noise and indeterminacy into the system. These workers (not just ‘labelers’, pace the Figure) take the statistical results and censor or skew them according to principles which are nowhere present in the software or the underlying datasets. For instance, what the low-paid censor considers lifelike or transgressive may vary widely according to local standards and objectives quite different from any straightforward criterion of ‘intelligence’. In other words, neither the language-model protocols nor the guidelines imposed by the human censors are structured to achieve any particular epistemic ends or values; and indeed, the criteria of the humans dragooned into the process may diverge in profound ways from those built into the statistical algorithms. Further, if any output is deemed embarrassing or problematic for any of a smorgasbord of random external reasons dictated by the executive suites, then the censors are mobilized once again to prevent the algorithms from delivering those results. At least as of this writing in April 2023, the output is a hodgepodge.
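The structural point here can be sketched in a few lines: the moderation layer is bolted on after the statistical machinery has done its work, and its criteria come from a human-authored policy, not from the model or its training data. Everything below (the blocklist, the candidate outputs) is hypothetical, purely to show where such a filter sits:

```python
# Hypothetical policy list, dictated from outside the model entirely.
BLOCKLIST = {"embarrassing_topic"}

def moderate(candidates):
    # Drop any statistically ranked output that the external policy forbids.
    # Note the criteria appear nowhere in the training data or the algorithm.
    return [c for c in candidates if not (set(c.split()) & BLOCKLIST)]

ranked_outputs = ["a likely completion",
                  "a completion about embarrassing_topic"]
print(moderate(ranked_outputs))  # only the first candidate survives
```

Whatever the merits of any particular policy, the sketch makes the essay's point visible: the filtering step answers to criteria that are orthogonal to the statistics underneath it.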

It gets worse
