I’ve been working on creating an interactive version of my book using GPT models. This has entailed splitting the book into one text file per chapter, then trying out different versions of GPT to produce summaries. It has been far more interesting than I expected, and it has some implications for foundational models.
The GPT versions I’ve been using are Davinci-003, GPT-3.5-turbo, and GPT-4, and they each have distinct “personalities.” Since I’m having them summarize my book, I know the subject matter quite well, so I can get a sense of how well each model compresses roughly 400 words down to 100. Overall, I like Davinci-003 best for capturing the feeling of my writing, and GPT-4 for getting more details. GPT-3.5-turbo falls in the middle, so that’s the one I’m using.
They all get some details wrong, but in aggregate they are better than any single summary. That is nice support for the idea that multiple foundational models are more resilient than any single model, and it suggests a path to building resilient foundational systems: keep some of the older models around and use them as an ensemble when the risks are greater.
Multiple responses also help with hallucinations. One example I like to use is the prompt “23, 24, 25” to see what the model generates. Most often, the response continues the series for a while, but then it usually starts to generate code – e.g. “23, 24, 25, 26, 27, 28];” – adding the square bracket and semicolon as if the numbers were an array in a line of software. The model has started to hallucinate that it is writing code.
The thing is, when the same prompt is run through multiple models, the only elements they all agree on are the ones most likely to be trustworthy. For a model, the “truth” is the common denominator, while hallucinations tend to be unique to each response.
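That common-denominator idea can be sketched as a small voting filter. Everything here is hypothetical: the claim sets stand in for facts extracted from each model’s summary, and the `min_agreement` threshold is just an illustrative choice.

```python
from collections import Counter

def consensus_claims(responses, min_agreement=2):
    """Keep only claims that appear in at least `min_agreement` of the
    model responses; claims unique to a single model are treated as
    likely hallucinations."""
    counts = Counter(claim for r in responses for claim in set(r))
    return {claim for claim, n in counts.items() if n >= min_agreement}

# Toy claim sets from three hypothetical model summaries of the same
# passage. "the hero can fly" appears only once -- a likely hallucination.
responses = [
    {"the chapter is about feedback", "the author cites Wiener"},
    {"the chapter is about feedback", "the author cites Wiener",
     "the hero can fly"},
    {"the chapter is about feedback"},
]

trusted = consensus_claims(responses)
# The unique claim is filtered out; the shared claims survive.
```

In practice the hard part is normalizing free-form summaries into comparable claims, but the voting step itself stays this simple.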
This approach makes systems more resilient for the cost of keeping the old systems online. It doesn’t address how a deliberate attack on a foundational model would be handled; after all, an adversary could still have exploits for the earlier models and apply them as well.
But if all the models lined up and started doing very similar things, that itself could be a sign that something fishy was going on, and a cue for the human operators of these systems to start looking for the nefarious activity.
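One crude way to operationalize that cue: measure how closely the models’ outputs agree, and flag near-perfect unanimity for human review. This is a sketch, assuming each output has already been reduced to a set of claims or tokens (a hypothetical preprocessing step), with an illustrative threshold.

```python
from itertools import combinations

def mean_agreement(responses):
    """Mean Jaccard similarity across every pair of model outputs,
    where each output is a set of claims or tokens."""
    sims = [len(a & b) / len(a | b) for a, b in combinations(responses, 2)]
    return sum(sims) / len(sims)

def suspiciously_unanimous(responses, threshold=0.95):
    # Independent models normally disagree on the details, so
    # near-identical outputs everywhere are themselves a warning sign.
    return mean_agreement(responses) >= threshold

# Hypothetical outputs: models usually differ on details...
diverse = [{"a", "b"}, {"a", "c"}, {"b", "d"}]
# ...so identical outputs across the board would be unusual.
identical = [{"a", "b"}, {"a", "b"}, {"a", "b"}]
```

The threshold would need tuning against how much the models normally diverge on benign prompts; the point is only that unanimity is measurable.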