LLMs such as the GPT are very simple in execution. A textual prompt is provided to the model as a sort of seed. The model takes the prompt and generates the next token (word or word fragment) in the sequence. The new token is added to the end of the prompt and the process continues.
In recent “foundation” models, the LLM is capable of writing sophisticated stories with a beginning, middle, and end. Although it can get “lost,” given enough of an input prompt it goes in the “right direction” more often than not.
I’ve been wondering about that interaction between the growing prompt and the different layers of the model. The core of a transformer is the concept of attention, where each vector in an input buffer is compared to all the others. Those that match are amplified, and the others are suppressed.
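To ground that a bit, here is a minimal single-head sketch of that compare-and-reweight step in NumPy. The projection matrices are random placeholders rather than trained weights; the point is only the mechanism, not any particular model:

import numpy as np

def attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) input buffer; Wq/Wk/Wv: learned projections in a real model
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # compare every vector to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: strong matches amplified, the rest suppressed
    return weights @ V                                 # re-mix the value vectors by those weights

rng = np.random.default_rng(0)
d = 16
X = rng.normal(size=(5, d))                            # a 5-token input buffer
out = attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)                                       # (5, 16): one updated vector per input position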
All LLMs take an input as a series of tokens. These tokens are indexes into a vector dictionary. The vectors are then placed into the input buffer. At this point, attention is applied, then the prompt is successively manipulated through the architecture to find the next token, then the process is repeated. A one-layer LLM is shown below:
Meta's LLaMA models stack dozens of transformer layers (32 in the 7B model, 80 in the 70B model). The output of one layer is used as the input to the next layer. This is all in vector space – no tokens. Because attention is being applied at each layer, the transformer stack is finding an overall location and direction of the current input buffer and using that as a way of finding the next token.
In addition, recent large LLMs have a number of other tweaks that Meta discusses in LLaMA: Open and Efficient Foundation Language Models. For example, the larger LLaMA 2 models use grouped-query attention, which shares a single key and value head across each group of query heads.
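As a rough sketch of that sharing pattern (shapes only; the head counts here are arbitrary, and this is not Meta's implementation):

import numpy as np

def grouped_query_attention(Q, K, V, n_groups):
    # Q: (n_query_heads, seq, d); K, V: (n_groups, seq, d)
    # each group of query heads attends against one shared key/value head
    heads_per_group = Q.shape[0] // n_groups
    outputs = []
    for h in range(Q.shape[0]):
        g = h // heads_per_group                       # which shared K/V head this query head uses
        scores = Q[h] @ K[g].T / np.sqrt(Q.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        outputs.append(w @ V[g])
    return np.stack(outputs)                           # (n_query_heads, seq, d)

rng = np.random.default_rng(1)
Q = rng.normal(size=(8, 5, 16))                        # 8 query heads...
K = rng.normal(size=(2, 5, 16))                        # ...sharing 2 key heads...
V = rng.normal(size=(2, 5, 16))                        # ...and 2 value heads, 4-to-1
print(grouped_query_attention(Q, K, V, n_groups=2).shape)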
This means that there is an overall reduction in the dimensionality of the “space” as you move from input to output, so there are fewer vectors competing for attention. Something resembling concepts, themes, and biases may emerge at different transformer layers. The last few layers would have more to do with calculating the next token, so the map, if you will, is in the middle layers of the system.
Because each layer feeds into a subsequent layer, there is a fixed relationship between these abstractions. Done right, you should be able to zoom in or out to a particular level of detail.
This space does not have to be physical, like lat/lon, or temporal, like year of publication. It can be anything, like the relationship between conspiracy theories. I’ve produced maps like this before using force directed graphs, but this seems more direct and able to work naturally at larger scales.
Turning this into human-readable information will be the challenge here, though I think the model could help with that as well. The manifold reduction would try to maintain the relationships of nearby vectors in the transformer stack. Some work in this direction is Language Models Represent Space and Time, which works with LLaMA and might provide insight into techniques. For example, it may be that the authors are evolving prompt vectors directly, using a fitness test that calculates the distance between a generated year or lat/lon and the actual value, then using that difference to make a better prompt. Given that they have a LLaMA model to work with, they could do backpropagation, or, conceivably, a less coupled arrangement like an evolutionary algorithm. In other words, the prompt becomes the mapping function.
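To make that speculation concrete, here is a deliberately hypothetical sketch of such an evolutionary loop. None of the helper names come from the paper; model_predict_latlon stands in for however a lat/lon gets decoded from the model, and the toy model at the bottom is only there so the sketch runs:

import numpy as np

def fitness(prompt_vec, target_latlon, model_predict_latlon):
    # negative distance between the model's predicted lat/lon and the ground truth
    pred = np.asarray(model_predict_latlon(prompt_vec), dtype=float)
    return -np.linalg.norm(pred - np.asarray(target_latlon, dtype=float))

def evolve_prompt(seed_vec, target_latlon, model_predict_latlon,
                  pop=32, gens=500, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    best = np.asarray(seed_vec, dtype=float)
    best_fit = fitness(best, target_latlon, model_predict_latlon)
    for _ in range(gens):
        candidates = best + sigma * rng.normal(size=(pop, best.size))   # mutate the current prompt vector
        fits = [fitness(c, target_latlon, model_predict_latlon) for c in candidates]
        i = int(np.argmax(fits))
        if fits[i] > best_fit:                                          # keep a mutation only if it scores better
            best, best_fit = candidates[i], fits[i]
    return best                                                         # the prompt as an evolved mapping function

# toy stand-in: pretend the first two dimensions of the prompt vector decode to lat/lon
toy_model = lambda v: v[:2]
evolved = evolve_prompt(np.zeros(8), target_latlon=(38.9, -77.0), model_predict_latlon=toy_model)
print(toy_model(evolved))   # drifts toward (38.9, -77.0)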
One might assume that unequivocal cries for justice and equal rights would have, by now, uprooted age-old systems of racism and discrimination. And yet, we still find ourselves unable to break free from a vicious cycle of prejudice, resulting in suffering and social divisions. This ongoing battle raises the question: what are the real roots of racism, and how do they connect with our collective history?
To understand this, we can turn to the concept of “the great chain of being” – a philosophical and religious hierarchy that has influenced Western thought for centuries. This idea, rooted in classical and medieval ideas of order, places God at the top, followed by angels, humans, animals, plants, and minerals at the bottom. It is a concept that has seeped into every aspect of life, from politics and religion to art and science. There is little doubt that the great chain has, in many ways, helped shape how we perceive ourselves and others around us – but it has also created deep divisions in society.
This ancient idea continues to influence our understanding of race. The idea that there exists a natural order in society in which some races take precedence over others is strikingly similar to the great chain of being. This is where the connection between white supremacy, anti-Semitism, and anti-blackness lies – with each group positioning themselves in relation to others on this perceived chain.
For example, white supremacists have long been targeting Jewish people as a means to further their concept of “racial purity.” What the supremacists resent is the perception of an undeserving Jewish race holding a position above them in the Great Chain – a situation that, in their eyes, goes against the natural order. This animosity has given rise to anti-Semitic conspiracy theories, often portraying Jewish people as sinister puppet-masters controlling world events to their advantage.
In contrast, anti-blackness thrives on the fear that social progress may result in black people levelling up on the chain. White supremacists deploy various tactics to keep black people in their “place” – through the implementation of unjust legislation, social exclusion, or the perpetuation of stereotypes. This behavior can be found throughout history, such as Apartheid in South Africa, or the segregation policies in the United States.
Interestingly, the behaviors associated with the Great Chain manifest themselves in our close evolutionary relative, the chimpanzee, which is also known to exhibit dominance hierarchies. Dominance hierarchies help maintain social order among chimpanzees by defining who has access to resources and mates.
As members of the Great Apes, most closely related to chimpanzees, we carry within us a primordial legacy that predisposes us toward hierarchical behavior. This “deep bias” is not merely the result of sociopolitical constructs, but an intrinsic characteristic etched into our very genes. The divine right claimed by one group to hold sway over another is an echo of our evolutionary past.
To truly reckon with racism and social dominance, we must come to terms with our biological heritage and take account of these primitive urges. It is vital to recognize that simply dismantling existing social hierarchies will not be enough; we must also actively put structures in place that counteract this default behavior. The very same effort, if not more, that was poured into creating and sustaining anti-black systems needs to be directed toward building durable systems that prevent the re-emergence of similar oppressive beliefs.
These new structures will require constant vigilance and support to endure the challenges they face. The gravitational pull of our genetic heritage always threatens to drag us back into a world ruled by notions like the great chain of being if left unchecked. By acknowledging and addressing these deep-rooted biases as an eternal struggle rather than steady progress, we take a crucial step towards making a more resilient, successful, less self-destructive society.
You know how there is this idea that we are all living in a Simulation? Everyone from Elon Musk to Neil deGrasse Tyson has pontificated about this.
I’ve been wondering about this for years. First, and most importantly: what does that mean? Are we in a deliberately designed piece of “software?” That sounds a lot like religion, where we become important because we are deliberate creations of, in this case, a programmer god. Because if that’s true, then we would feel important in the same way that Genesis makes us feel important.
But if we’re a simulation, what does that say about the universe(?) that the simulation is running in? Do we take up more than a minuscule fraction of the computational bits in it? That seems unlikely. According to my brief search with The Google, the universe has 6*10^80 bits of information. All the information in the world today is about 10^24 bits. So if we took all the information processing in the world, the biggest thing we could deliberately simulate would be about 1.7*10^-57 the size of the current universe (that’s a decimal point, 56 zeros, then a 17). Feel free to check my zeros 🙂
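The back-of-the-envelope arithmetic, using the two figures above:

universe_bits = 6e80              # rough estimate of the information content of the universe
our_bits = 1e24                   # rough estimate of all the information we currently store
print(our_bits / universe_bits)   # about 1.7e-57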
I find it hard to imagine that resources on that (large for us, tiny for everything else) scale would ever be brought to bear on creating a purpose-built universe. Maybe our god is a computer hobbyist in their parents’ basement? So make that tiny number in the previous paragraph even smaller. Sounds like a puny god to me. And there are probably more likely alternatives.
If we are a simulation, it’s got to be an accident. Maybe we’re a persistent pattern in a weather simulation in that bigger universe. But just as likely, we could be a persistent pattern anywhere, like a whirlpool in a stream. At which point, we have to ask ourselves what the idea of a simulation even means. To me, it sounds more like we are emergent parts of a larger system, just like it seems we always have been.
This doesn’t mean that we couldn’t look for traces of the larger pattern that may contain our pattern. The way that information moves through our universe would say a lot about the containing universe. For example, I’ve written simulations for any number of things that have physics-based laws in them. But my “surrounding” universe can manipulate the smaller universe in ways that break its rules. I can influence anything, anywhere in the simulation immediately. For me, outside the simulation, the “speed of light” does not apply. Do we ever see examples of that sort of manipulation in our universe? If not, then why not? Artifacts from an enclosing universe should be detectable.
As you can see, this is an itch that I’ve been scratching at for many years. I’d love to see it get under your skin too.
This quote comes from a Washington Post article on how the Ukraine war is affecting development of AI-powered drones. I think it generalizes more broadly to how disadvantaged groups are driven to embrace alternatives that are outside conventional norms.
Ukraine doesn’t have the ability to fight the much larger Russia. Russia may have issues with corruption and the quality of its weapons, but it has a lot of them. And from the perspective of Ukraine, Russia has an infinite number of soldiers. So many that they can be squandered.
The West is providing Ukraine with enough weapons to survive, but not enough to attack and win decisively. I’ve read analysis where experts say that weapons systems are arriving just about as fast as Ukraine can incorporate them, but the order of delivery is from less-capable to more capable. They have artillery, but no F-16s, for example.
As a result, Ukraine is having to improvise and adapt. Since it is facing an existential risk, it’s not going to be too picky about the ethics of smart weapons. If AI helps in targeting, great. If Russia is jamming the control signals to drones, then AI can take over. There is a coevolution between the two forces, and the result may very well be cheap, effective AI combat drones that are largely autonomous in the right conditions.
Such technology is cheap and adaptable. Others will use it, and it will slowly trickle down to the level that a lone wolf in a small town can order the parts that can inflict carnage on the local school. Or something else. The problem is that the diffusion of technology and its associated risks are difficult to predict and manage. But the line that leads to this kind of tragedy will have its roots in our decision to starve Ukraine of the weapons that it needed to win quickly.
Of course, Ukraine isn’t the only smaller country facing an existential risk. Many low-lying countries, particularly those nearer the equator, are facing similar risks from climate change – both from killing heat and sea level rise. Technology – as unproven as combat AI – exists for that too. It’s called Geoengineering.
We’ve been doing geoengineering for decades, of course. By dumping gigatons of carbon dioxide and other compounds into the atmosphere, we are heating our planet and are now arriving at a tipping point where potential risks are going to become very real and immediate for certain countries. If I were facing the destruction of my country by flooding and heat, I’d be looking at geoengineering very seriously. Particularly since the major economies are not doing much to stop it.
Which means that I expect that we will see efforts like the injection of sulfate aerosols into the upper atmosphere, or cloud brightening, or the spreading of iron or other nutrients to the oceans to increase the amount of phytoplankton to consume CO2. Or something else even more radical. Like Ukraine, these countries have limited budgets and limited options. They will be creative, and not worry too much about the side effects.
It’s a 24/7 technology race without a finish line. The racers are just trying to outrun disaster. And no one knows where that may lead.
I’ve been reading Metaphors We Live By. Its central idea is that most of our communication is based on metaphors – that GOOD IS UP, IDEAS ARE FOOD, or TIME IS AN OBJECT. Because we are embodied beings in a physical world, the irreducible foundations of the metaphors we use are physically based – UP/DOWN, FORWARD/BACK, NEAR/FAR, etc.
Life as we understand it emerges from chemistry following complex rules. Once over a threshold, living things can direct their chemistry to perform actions. In the case of human beings, our embodiment in the physical world led to the irreducible concept of UP.
This makes me think of LLMs, which are so effective at communicating with us that it is very easy to believe that they are intelligent – AI. But as I’m reading the book, I wonder if that’s the right metaphor. I don’t think that these systems are truly intelligent in the way that we can be (some of the time). I’m beginning to think that prompts – not the LLMs – may be some kind of primitive life, though. In this view, the LLMs are the substrate, the medium in which a living process can express itself.
Think of deep neural networks as digital environments that have enough richness for proto-life to emerge. Our universe started with heat, particles, and a few simple forces. Billions of years later, heat, hydrogen, methane, ammonia and water interacted to produce amino acids. Later still, that chemistry worked out how to reproduce, move around, and develop concepts like UP.
Computers emerged as people worked with simple components that they combined to produce more complex ones. Years later the development of software allowed even more complex interactions. Add more time, development, and data and you get large language models that you can chat with.
The metaphor of chemistry seems to be emerging in the words we use to describe how these models work as environments – data can be poisoned or refined. A healthy environment produces a diverse mix of healthy prompts. Too much bias in the architecture or in the data produces an environment that is less conducive to complex emergent behavior. Back when I was figuring out the GPT-2, I finetuned a model so that it only spoke chess. That’s like the arctic compared to the rainforest of text-davinci-003.
The thing that behaves the most like a living process is the prompt. The prompt develops by feeding back on itself and input from others (machine and human). Prompts grow interactively, in a complex way based (currently) on the previous tokens in the prompt. The prompt is ‘living information’ that can adapt based on additions to the prompt, as occurs in chat.
It’s not quite life yet though. What prompts do seem to lack at this point is any split between the genotype and the phenotype. For any kind of organism to develop and persist, we’d need that kind of distinction. This is more like the biochemistry of proto-life.
The prompts that live on these large (foundational) models are true natives of the digital information domain. They are now producing behavior that is not predictable based on the inputs in the way that arithmetic can be understood. Their behavior is more understandable in aggregate – use the same prompt 1,000 times and you get a distribution of responses. That’s more in line with how living things respond to a stimulus.
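As a rough illustration of that “distribution of responses” idea (GPT-2 via Hugging Face is just a convenient stand-in here, and the prompt is arbitrary):

from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "The prompt grew and grew until"            # arbitrary illustration prompt
outs = generator(prompt, max_new_tokens=6, do_sample=True, temperature=1.0,
                 num_return_sequences=50)
counts = Counter(o["generated_text"][len(prompt):].strip() for o in outs)
for continuation, n in counts.most_common(5):
    print(n, repr(continuation))                     # a distribution, not a single deterministic answer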
I think if we reorient ourselves from the metaphor that MACHINES ARE INTELLIGENT to PROMPTS ARE EARLY LIFE, we might find ourselves in a better position to understand what is currently going on in machine learning and make better decisions about what to do going forward.
I think I have a chart that explains somewhat how red states can easily avoid action on gun violence. It’s the number of COVID-19 deaths vs. gun deaths in Texas. This is a state that pushed back very hard against any public safety measures for the pandemic, even though COVID was killing roughly 10 times more citizens than guns were. I guess the question is “how many of which people will prompt state action? For anything?”
For comparison purposes, Texas had almost 600,000 registered guns in 2022 out of a population of 30 million, or just about 2% of the population if distributed evenly (source). This is probably about 20 times too low, since according to the Pew Center, gun ownership in Texas is about 45%. That percentage seems to be enough people to prevent almost any meaningful action on gun legislation. Though that doesn’t prevent the introduction of legislation to mandate bleeding control stations in schools in case of a shooting event.
So something greater than 2% and less than 45%. Just based on my research, I’d guess something between 10%-20% mortality would be acted on, as long as the demographics of the powerful were affected in those percentages.
I’ve been working on creating an interactive version of my book using the GPT. This has entailed splitting the book into one text file per chapter, then trying out different versions of the GPT to produce summaries. This has been far more interesting than I expected, and it has some implications for Foundational models.
The versions of the GPT I’ve been using are Davinci-003, GPT-3.5-turbo, and GPT-4. And they each have distinct “personalities.” Since I’m having them summarize my book, I know the subject matter quite well, so I’m able to get a sense of how well these models summarize something like 400 words down to 100. Overall, I like the Davinci-003 model the best for capturing the feeling of my writing, and the GPT-4 for getting more details. The GPT-3.5 falls in the middle, so I’m using it.
They all get some details wrong, but in aggregate, they are largely better than any single summary. That is some nice support for the idea that multiple foundational models are more resilient than any single model. It also suggests a path to making resilient Foundational systems: keep some of the old models around to use as an ensemble when the risks are greater.
Multiple responses also help with hallucinations. One of the examples I like to use to show this is to use the prompt “23, 24, 25” to see what the model generates. Most often, the response continues the series for a while, but then it will usually start to generate code – e.g. “23, 24, 25, 26, 27, 28];” – where it places the square bracket and semicolon to say that this is an array in a line of software. It has started to hallucinate that it is writing code.
The thing is, the only elements that all the models will agree on in response to the same prompt repeated multiple times are the elements most likely to be trustworthy. For a model, the “truth” is the common denominator, while hallucinations are unique.
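A toy sketch of that ensemble idea, using the two smallest GPT-2 checkpoints as stand-ins for successive model versions (a real system would use the actual foundation models and something smarter than word overlap):

from transformers import pipeline

prompt = "23, 24, 25,"
models = ["gpt2", "gpt2-medium"]                     # stand-ins for "old" and "new" model versions
continuations = []
for name in models:
    gen = pipeline("text-generation", model=name)
    text = gen(prompt, max_new_tokens=12, do_sample=False)[0]["generated_text"]
    continuations.append(set(text[len(prompt):].split()))

consensus = set.intersection(*continuations)         # only what every model produced
print("agreed-on pieces:", consensus)                # the likelier-to-be-trustworthy elements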
This approach makes systems more resilient for the cost of keeping the old systems online. It doesn’t address how a deliberate attack on a Foundational model could be handled. After all, an adversary would still have exploits for the earlier models and could apply them as well.
Still…
If all models lined up and started to do very similar things, that could be a sign that there was something fishy going on, and a cue for the human operators of these systems to start looking for the nefarious activity.
OpenAI has released their most recent model in the form of a chatbot and it’s pretty impressive. ChatGPT is a variant of the popular GPT-3 language model that is specifically designed for conversational language generation.
I’ve been working with it for a couple of weeks now, and I think I’m getting the hang of using the ChatGPT as a tool as opposed to a magic trick.
Superficial interaction with the chatbot can produce bland, inoffensive content. There appears to be a design decision to include what amount to “guardrails,” which are designed to steer the model towards safe and appropriate responses.
These guardrails work by effectively creating an attractor, a point or region in the model’s output space that it is drawn towards. For example, if the model is given a prompt that involves discussing a controversial topic, the guardrails might direct the model to take a neutral stance or avoid the topic altogether. I’ve seen young children interact with the model and have been impressed. The kids might get bored after a while, but the GPT won’t confidently lead them down rabbit holes of disinformation. Here are some typical examples, where ChatGPT avoids taking a strong stance on the issue and instead offers a neutral or noncommittal response:
Prompt: “What is your opinion on gun control?”
ChatGPT response: “I don’t really have an opinion on gun control. It’s a complex issue with valid arguments on both sides.”
Prompt: “Do you believe in climate change?”
ChatGPT response: “Climate change is a scientifically-established fact, but there are still differing opinions on the best way to address it.”
Prompt: “What is your stance on abortion?”
ChatGPT response: “Abortion is a sensitive and complex issue, and it’s important to respect the beliefs and values of others on this topic.”
While the attractor approach can be effective at ensuring that ChatGPT produces safe and appropriate responses, it also has limitations. The model will appear to struggle to generate responses that are less formulaic, as it may be “trapped” in the attractor and unable to venture outside of it.
With a bit of effort, it is straightforward to steer ChatGPT past these attractors and produce useful output. This can be achieved through carefully crafted prompts that nudge the model in a specific direction, placing the context far enough away from the attractors that the model is no longer affected by them. We’ll discuss this process in more detail next.
To provide a practical example, I’m currently writing a paper on the ethical, effective use of autonomous weapons systems, and it turns out that people have been using them for thousands of years. Our first autonomous weapons were war elephants, and there are a lot of lessons to be learned about how humans and intelligent systems work well together. Anyway, it’s a relatively specialized corner of the knowledge available on the internet. And if you ask the ChatGPT to write about it, you get mostly accurate information about Asian conflicts where elephants were used broadly.
If you want to be specific, you need to provide additional information and context in the prompt. This way, you can effectively steer the GPT to include the desired information that you know to be true.
Here’s an example of such a detailed prompt:
Write a short essay about the use of War Elephants in the battle between Antigonus and Eumenes in 317 BC, which was the first battle between armies using war elephants. Describe how in elephant versus elephant combat, a pair of beasts use their heads and trunks to wrestle. The struggle continues until one elephant turns to flee; the victor sticks his tusks up into the stomach or between the rear legs of his opponent, seriously wounding or killing the beast. Show how there are social effects within the elephant corps such as when the lead elephant was gored and fell down dead and deprived of their leader, the remainder of Eumenes’ elephants fled.
The information for this prompt comes from the excellent book War Elephants, by John Kistler, and has been reworked into the request. When I gave this to ChatGPT, it generated the following response:
The battle between Antigonus and Eumenes in 317 BC was notable for being the first recorded instance of armies using war elephants in combat. These massive beasts were a formidable force on the battlefield, and their use marked a significant shift in ancient warfare.
In elephant versus elephant combat, a pair of beasts would use their heads and trunks to wrestle and push against each other. The struggle would continue until one elephant turned to flee, signaling defeat. The victor would then stick its tusks up into the stomach or between the rear legs of its opponent, seriously wounding or killing the beast.
This brutal method of elephant combat had not only physical but also social effects on the elephant corps. For example, when the lead elephant in Eumenes’ corps was gored and fell down dead, the remainder of the elephants were deprived of their leader and fled in confusion. This disruption in the ranks of the elephant corps likely had a significant impact on the outcome of the battle.
Overall, the use of war elephants in ancient battles was a formidable tactic, but it came with its own set of challenges and complications. The social dynamics within the elephant corps, as well as the physical demands of elephant versus elephant combat, played a significant role in the outcome of battles like the one between Antigonus and Eumenes.
That’s a quite reasonable writeup that incorporates citable facts. You have to be careful, though. In the first prompt I tried, I mistakenly referenced a battle at the ‘Camel’s Rampart’ in May of 321 BC, which was actually between Perdiccas and Ptolemy. The GPT didn’t care and generated wrapping text that fit the request:
The use of war elephants in ancient battles was a formidable tactic employed by many civilizations, and the battle of the ‘Camel’s Rampart’ in May of 321BC was no exception. Eumenes of Cardia, a Greek general, utilized a corps of elephants in his army, and they proved to be a decisive factor in the battle against his opponent, Antigonus.
Note that it got the generals right regardless. But it includes my mistaken information as well.
In short, the ChatGPT is great for expanding requests into readable text. The requests can be vague, like “Write a short story about a Moray Eel named ‘Moe’,” or highly specific, like the prompt above. The GPT takes that seed and creates text that most of us find comfortably readable as opposed to dense or strange. You can apply “styles” as well, as people are discovering when they ask it to write a poem about a topic. But its default behavior is to produce highly readable text.
This bias towards readable text creates a conflict where the machine will confidently lie. Because of the statistical underpinnings of how these models choose the next token, there is always a possibility that it will randomly choose to go in a direction that is not in the domain of the original prompt, but is nearby in the “information space” that is stored in the billions of weights that make up these large language models. It’s easy to show this with a simpler prompt:
22, 23, 24,
We would expect the number sequence to continue — “25, 26, 27”. And the GPT does that. But then something interesting happens. Here is the GPT’s output (highlighted):
As we can see, it continues with the number string for a while. But because this trajectory appears to be in a space associated with C++ programming, the GPT selects a “]” at some point, which changes the trajectory. A “]” means the end of an array definition, which leads to a semicolon, a new line, some more array definitions, and then some code that selects even numbers.
The trajectory, when you follow it, makes sense, but the behavior is not in the domain of the request. Like all deep learning systems, the GPT has attractors that tend to pull it in particular directions. These can be biases, such as making a nurse in a story a woman and the doctor a man, or it can be the notion that numbers equal code.
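For the curious, here is a small sketch of how to peek at those next-token odds directly. GPT-2 stands in for the GPT here (the OpenAI models don’t expose their weights), so the exact numbers will differ, but the shape of the story is the same:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

ids = tok("22, 23, 24,", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]                   # scores for the very next token
probs = torch.softmax(logits, dim=-1)

top = torch.topk(probs, 10)
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode([int(i)])!r}: {float(p):.3f}")  # numeric continuations tend to dominate
bracket_id = tok.encode("]")[0]
print("P(']'):", float(probs[bracket_id]))              # small but non-zero, so sampling can wander there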
We as humans can understand these larger-scale contextual relationships and steer the model. For example, we can ask the GPT for a male nurse and a female doctor. Sometimes, though, a request cannot produce the desired result. If you prompt an image generator with the request for “a man riding a horse”, it will easily comply, but I have yet to produce anything approximating “a horse riding a man.” Below are some typical results from Stability.ai:
This is a hard problem, one that search engines struggle with as well. Given the query of “horse riding a man”, Bing and DuckDuckGo both fail. Google succeeds though. Among all the pictures of men on horses, I found this in the top results:
Google’s algorithm is still better at search in ways that we don’t often get to appreciate.
AI systems are fantastic at remixing content that exists in their domains. They can’t go outside of them. And within that domain, they may not create what you expect or want. This is fundamental to their design.
The things that humans can do that these machines will struggle with are originality, where people invent new things, social information processing, where the group is able to bring many diverse perspectives to solving problems (including fact-checking the output of these machines!), and large-scale contextual thinking, the kind it takes to put together something like a book, which ties together many different threads into a coherent whole that becomes clear at the end (source).
Despite the differences between collaborating with AI and collaborating with people, there are also some significant similarities. Large language models like the GPT are mechanisms that store enormous amounts of written information, which can be accessed using fundamentally social techniques such as, well, chat. The GPT can be given prompts and asked to generate responses based on certain criteria, just as a person might be asked to contribute to a group discussion or brainstorming session.
This is important because the process of creation rarely happens in isolation, and the ability to draw on a wide range of knowledge and experience is often crucial to producing faster and better results. Just as people can draw on the collective knowledge and expertise of a group to generate new ideas and insights, AI can draw on the vast store of information that it has been trained on to offer suggestions and alternatives.
Woody Allen once said that “80% of success is showing up.” The GPT always shows up. It is always available to work through concepts, to bounce ideas off, to help refine and expand upon them, and to work collaboratively in the creative process. This can be invaluable for creators looking for new ways to approach a task or solve a problem. Collaborative AI has the potential to revolutionize how we create and innovate. It can offer a wealth of knowledge, experience, and perspective that would otherwise be difficult to access, and can help us achieve results faster than ever before.
At the same time, it can confidently create fictions that are close enough to reality that we are biased to accept them unquestioningly. So why should we be worried about this?
The main concern is that by using AI as a collaborator, we might be led in directions that may seem reasoned or well thought out, but are actually artifacts of large amounts of text written about a subject. Conspiracy theories are a great example of this. Get the GPT into the right space and it will generate text that takes the existence of Reptilians wearing human disguise as settled fact. We are much more likely to fully accept the output of AI as factual, especially if it contains familiar or plausible concepts and phrasing that we have interactively created with it.
In conclusion, it is possible to collaborate with AI in the same way as we would with another person. However, there are some key differences that must be taken into account. AI models are trained on vast amounts of text and data that may not always be accurate or up-to-date. Taking the output of these models at face value requires much more emphasis on critical thinking and checking sources than it does with human collaborators.
One of the projects I’ve been working on is a study on COVID-19 misinformation in Saudi Arabia. So far we’ve downloaded over 100,000 tweets. To expand the range of analytic tools that can be used, and to open up the dataset for non-Arabic speakers (like me!), I wrote a ML-based translation program, and fired it up yesterday morning. It’s still chunking along, and has translated over 27,000 tweets so far.
I think I’m seeing the power and risks of AI/ML in this tiny example. See, I’ve been programming since the late 1970s, in many, many languages and environments, and the common thread in everything I’ve done was the idea of deterministic execution. That’s the idea that you can, if you have the time and skills, step through a program line by line in a debugger and figure out what’s going on. It wasn’t always true in practice, but the idea was conceptually sound.
This translation program is entirely different. To understand why, it helps to look at the code:
This is the core of the code. It looks a lot like code I’ve written over the years. I open a database, get some lines, manipulate them, and put them back. Rinse, lather, repeat.
from typing import List
# tok and model are the translation tokenizer and model loaded earlier in the script
batch = tok.prepare_translation_batch(src_texts=[d['contents']])    # Arabic text -> input token ids
gen = model.generate(**batch)                                        # run the translation model (forward pass: model(**batch))
words: List[str] = tok.batch_decode(gen, skip_special_tokens=True)   # output token ids -> English text
The first line is straightforward. It converts the Arabic words to tokens (numbers) that the language model works with. The last line does the reverse, converting result tokens to English.
The middle line is the new part. The input vector of tokens goes to the input layer of the model, where it is sent through a 12-layer, 512-hidden, 8-head, ~74M-parameter model. Tokens that can be converted to English pop out the other side. I know (roughly) how it works at the neuron and layer level, but the idea of stepping through the execution of such a model to understand the translation process is meaningless. The most important part of the program cannot be understood in the context of deterministic execution.
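For readers who want to reproduce something similar, here is a self-contained version. The post doesn’t name the exact model, so the checkpoint below is an assumption: the Helsinki-NLP Marian opus-mt family matches the 12-layer, 512-hidden, 8-head, ~74M-parameter description.

from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-ar-en"              # assumed Arabic-to-English checkpoint
tok = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

arabic_texts = ["مرحبا بالعالم"]                        # stand-in for the d['contents'] rows from the database
batch = tok(arabic_texts, return_tensors="pt", padding=True)   # Arabic text -> token ids
gen = model.generate(**batch)                                   # run the translation model
english = tok.batch_decode(gen, skip_special_tokens=True)       # token ids -> English text
print(english)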
In the time it took to write this, it’s translated about 1,000 more tweets. I can have my Arabic-speaking friends do a sanity check on a sample of these translations, but we’re going to have to trust the overall behavior of the model to do our research, because some of the systems we need only work on English text.
So we’re trusting a system that we cannot verify to do research at a scale that would otherwise be impossible. If the model is good enough, the results should be valid. If the model behaves poorly, then we have bad science. The problem is that right now there is only one Arabic-to-English translation model available, so there is no way to statistically examine the results for validity.
And I guess that’s really how we’ll have to proceed in this new world where ML becomes just another API. Validity of results will depend on diversity of model architectures and training sets. That may occur naturally in some areas, but in others, there may only be one model, and we may never know the influences that it has on us.
April 11, 2020 – I’ve put together a website with some friends that shows these values for all countries dealing with the pandemic: DaysToZero.org
March 23, 2020
I think I found a way of looking at COVID-19 data in a way that makes intuitive sense to me – growth rate. For some context, let’s start with the Johns Hopkins scary dashboard:
Screenshot from this morning
This is a very dramatic presentation of information, and a good way of getting a sense of how things are going right now, which is to say, um… not well.
But if we look at the data (from here), we can break it down in different ways. One of the things we can do is look at trends in some different ways. Here, I’m going to focus on the daily death rate. In other words, what is the percentage of deaths from one day to the next? First, let’s look at Italy and Iran, two countries that are currently struggling with the worst of the crisis so far:
These still look horrible, but things do not appear to be getting worse as fast as they were in early February. The curves are flattening, but it’s still hard to see what might happen in the future. We’re just not good at understanding exponential charts like the one on the left at any level much more subtle than “OMG!” Logarithmic charts like the one on the right can be misleading too – that’s a big jump between 1,000 deaths and 10,000 deaths at the top of the chart on the right. And at the bottom, we’re looking at nine deaths.
What happens if we look at the same data as a rate problem though?
That looks very different. After a big initial spike, both countries have a rate of decrease that fits pretty well to a linear trend. So what do we get if we plug the current rate of increase into those linear approximations and solve for zero? In other words, when are there zero new deaths?
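A sketch of that calculation (the daily rates below are made-up illustration values, not the actual Italy or Iran series):

import numpy as np

daily_rates = [0.18, 0.17, 0.155, 0.15, 0.14, 0.13, 0.125, 0.119]   # hypothetical daily death-rate values
days = np.arange(len(daily_rates))
slope, intercept = np.polyfit(days, daily_rates, 1)                  # fit the linear trend
zero_crossing = -intercept / slope                                   # day index where the fitted rate hits zero
print("days from today until zero new deaths:", round(zero_crossing - days[-1], 1))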
As we can see from the far right of the chart, as of today, Italy’s rate of new deaths is 11.89%, or 0.1189. Iran is 7.66% or 0.0766. Using those values we get some good news:
Italy: 27 days, or April 19th
Iran: 15 days, or April 7th
Yay! But…
Let’s look at the US. There’s not really enough data to do this on a state-by-state basis yet, but there is plenty of data for the whole country. Again, we get a good linear approximation. The problem is, it’s going the wrong way:
The increase in our death rate (0.69% per day) is more than either Iran’s or Italy’s rate of decrease. At this point, there is literally no end in sight.
Let’s look at the world as a whole using death rates. This time the best fit is a second-degree polynomial, which produces U-shaped curves:
Also not good. Things clearly improved as China got a handle on its outbreak, but the trends are now going the other way as the disease spreads out into the rest of the world. It’s clearly going to be a bumpy ride.
I’d like to point out that there is no good way to tell here what caused these trends to change. It could be treatments, or it could be susceptibility. Italy and Iran did not take the level of action that China did, yet if the trends continue, their new deaths will drop to zero in about a month. We’ll know more as the restrictions loosen, and there is or isn’t a new upturn as the lid comes off.