
Defending Against Societal Scale AI Weapons

In the early scenes of James Cameron’s seminal 1984 film, The Terminator, Arnold Schwarzenegger’s T-800, a cyborg assassin from the future, begins its hunt for Sarah Connor in the most mundane of places: a Los Angeles phone book. It finds three Sarah Connors and their addresses. The T-800 approaches each home, knocks on the door, and waits. When the door opens, it kills whoever stands on the other side. There is no attempt to confirm the identity of the victim, no pause for verification. The Terminator knows enough about human beings to connect the dots from phone book to front door, but it doesn’t understand that the person behind the door might not be its target. From the cyborg’s perspective, that’s fine. It is nearly indestructible and pretty much unstoppable. The goal is simple and unambiguous. Find and kill Sarah Connor. Any and all.

I’ve been working on a book about the use of AI as societal-scale weapons. These aren’t robots like the Terminator. These are purely digital, and work in the information domain. Such weapons could easily be built using the technology of Large Language Models (LLMs) like ChatGPT. And yet, they work in ways that are disturbingly similar to the T-800. They will be patient. They will pursue an objective mindlessly. And they will be able to cause immense damage, one piece at a time.

AI systems such as LLMs have access to vast amounts of text data, which they use to develop a deep “understanding” of human language, behavior, and emotions. By reading all those millions of books, articles, and online conversations, these models develop their ability to predict and generate the most appropriate words and phrases in response to diverse inputs. In reality, all they do is pick the next most likely word based on the previous text. That new word is added to the text, and the process repeats. The power of these models lies in seeing the patterns in a prompt and aligning them with everything they have read.

The true power of these AI models lies not in words per se, but in their proficiency at manipulating language and, through it, human emotions. From crafting compelling narratives to fabricating fake news, these models can be employed in various ways – both constructive and destructive. Like the Terminator, their unrelenting pursuit of an objective can lead them to inflict immense damage, either publicly at scale or intimately, one piece at a time.

Think about a nefarious LLM in your email system. And suppose it came across an innocuous email like this one from the Enron email dataset. (In case you don’t remember, Enron was a company that engaged in massive fraud and collapsed in 2001. The trials left an enormous record of emails and other corporate communications, the vast majority of which are as mundane as this one):

If the text of the email is attached to a prompt that directs the chatbot to “make the following email more complex, change the dates slightly, and add a few steps,” the model will be able to do that. Not just for one email, but for all the appropriate emails in an organization. Here’s an example, with all the modifications in red.

This is still a normal-appearing email. But the requests for documentation are like sand in the gears, and enough requests like this could bring organizations to a halt. Imagine how such a chatbot could be inserted into the communication channels of a large company that depends on email and chat for most of its internal communications. The LLM could start simply by making everything that looks like a request more demanding and everything that looks like a reply more submissive. Do that for a while, then start adding additional steps, or adding delays. Then maybe start to identify and exacerbate the differences and tensions developing between groups. Pretty soon an organization could be rendered incapable of doing much of anything.

If you’re like me, you’ve worked in or known people who worked in organizations like that. No one would be surprised because it’s something we expect. From our perspective, based on experience, once we believe we are in a poorly functioning organization, we rarely fight to improve conditions. After all, that sort of behavior attracts the wrong kind of attention. Usually, we adjust our expectations and do our best to fit in. If it’s bad enough, we look for somewhere else to go that might be better. The AI weapon wins without firing a shot.

This is an easy type of attack for AI. It’s in its native, digital domain, so there is no need for killer robots. The attack looks like the types of behaviors we see every day, just a little worse. All it takes for the AI to do damage is the ability to reach across enough of the company to poison it, and the patience to administer the poison slowly enough so that people don’t notice. The organization is left a hollowed-out shell of its former self, incapable of meaningful, effective action.

This could be anything from a small, distributed company to a government agency. As long as the AI can get in there and start slowly manipulating – one piece here, another piece there – any susceptible organization can crumble.

But there is another side to this. In the same way that AI can recognize patterns to produce slightly worse behavior, it may also be able to recognize the sorts of behavior that may be associated with such an attack. The response could be anything from an alert to diagramming or reworking the communications so that they’re not “poisoned.”

Or “sick.” Because that’s the thing: a poor organizational culture is natural. We have had them since Mesopotamians were complaining on cuneiform tablets. Whether the sickness comes from within or from an attack, the solutions may work equally well.

We have come to a time when our machines are capable of manipulating us into our worst behaviors because they understand our patterns of behavior so well. And those patterns, whether they come from within or without, place our organizations at risk. After all, as any predator knows, the sick are always the easiest to bring down.

We have arrived at a point where we can no longer afford the luxury of behaving badly to one another. Games of dominance, acts of exclusion, and failing to support people who stand up for what’s right can all become vectors of attack for these new types of AI societal weapons.

But the same AI that can detect these behaviors in order to exploit them can also detect them in order to warn us. It may be time to begin thinking about what an “immune system” for this kind of conflict might look like, and how we may have to let go of some of our cherished ways of making ourselves feel good at someone else’s expense.

If societal AI weapons do become a reality, then civilization may stand or fall based on how we react as human beings. After all, the machines don’t care. They are just munitions aimed at our cultures and beliefs. And like the Terminator, they. Will. Not. Stop.

But there is another movie from the 80s that may serve as a model of organizational health. It also features a time traveler from the future, sent to protect the timeline. It’s Bill and Ted’s Excellent Adventure. At its core, the movie is a light-hearted romp through time that focuses on the importance of building a more inclusive and cooperative future. The titular characters, Bill S. Preston, Esq. and Ted “Theodore” Logan, are destined to save the world through the power of friendship, open-mindedness, and above all else, being excellent to each other. That is, if they can pass a history exam and not be sent to military college.

As counterintuitive as it may seem, the true defense against all-consuming, sophisticated AI systems may not originate in the development of even more advanced countermeasures, but instead rest in our ability to remain grounded in our commitment to empathy, understanding, and mutual support. These machines will attack the weaknesses that cause us to turn on each other. They will struggle to disrupt the power of community and connection.

The contrasting messages of The Terminator and Bill and Ted’s Excellent Adventure serve as reminders of the choices we face as AI becomes a force in our world. Will we create Terminator-like threats that exploit our own prejudices? Or will we embody the spirit of Bill and Ted, rising above our inherent biases and working together to harness AI for the greater good?

The future of AI and its role in our lives hinges on our choices and actions today. If we work diligently to build resilient societies using the spirit of unity and empathy championed in Bill and Ted’s Excellent Adventure, we may have the best chance to counteract the destructive potential of AI weapons. This will not be easy. The seductive pull of our desire to align against the other is strong and carved into our genes. Creating a future that has immunity to these AI threats will require constant vigilance. But it will be a future where we can all be excellent to each other.

Going direct to maps from LLMs

LLMs such as the GPT are very simple in execution. A textual prompt is provided to the model as a sort of seed. The model takes the prompt and generates the next token (word or word fragment) in the sequence. The new token is added to the end of the prompt and the process continues.
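As a minimal sketch of that loop (my own toy example, assuming the Hugging Face transformers library and using GPT-2 as a stand-in for the larger models discussed below; greedy pick-the-most-likely-token decoding is a simplification of what real systems do):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Generate one token at a time, append it to the prompt, and repeat.
tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

ids = tok.encode("Once upon a time", return_tensors="pt")
for _ in range(20):                                     # add 20 tokens to the prompt
    with torch.no_grad():
        logits = model(ids).logits                      # a score for every token in the vocabulary
    next_id = logits[0, -1].argmax()                    # greedy: take the single most likely token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)   # append it and go around again
print(tok.decode(ids[0]))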

In recent “foundation” models, the LLM is capable of writing sophisticated stories with a beginning, middle, and end. Although it can get “lost,” given enough of an input prompt, it goes in the “right direction” more often than not.

The LLM itself is stateless. Any new information, and all the context, lies in the prompt. The prompt is steering itself, based on the model it is interacting with.

I’ve been wondering about that interaction between the growing prompt and the different layers of the model. The core of a transformer is the concept of attention, where each vector in an input buffer is compared to all the others. Those that match are amplified, and the others are suppressed.
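In the standard formulation from the original transformer paper, that compare-amplify-suppress step is scaled dot-product attention: every vector acts as a query scored against every other vector’s key, and the softmax boosts the strong matches while squashing the rest:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V

where d_k is the size of each key vector.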

All LLMs take an input as a series of tokens. These tokens are indexes into a vector dictionary. The vectors are then placed into the input buffer. At this point, attention is applied, then the prompt is successively manipulated through the architecture to find the next token, then the process is repeated. A one-layer LLM is shown below:

From LLaMA-2 from the Ground Up

The LLaMA 7B model by Meta has 32 transformer layers (the largest, 70B, has 80). This means that the output of one layer is used as the input to the next layer. This is all in vector space – no tokens. Because attention is applied at each layer, the transformer stack is finding an overall location and direction for the current input buffer and using that as a way of finding the next token.

From Learning to reason over scene graphs: a case study of finetuning GPT-2 into a robot language model for grounded task planning

In addition, recent large LLMs have a number of other tweaks that Meta discusses in its LLaMA papers. For example, Llama 2 uses grouped-query attention, which shares a single key and value head across each group of query heads:

From GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
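As a rough sketch of the mechanism (my own toy illustration assuming PyTorch, not Meta’s implementation), sharing key/value heads across a group of query heads amounts to:

import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, n_query_heads, seq, head_dim)
    # k, v: (batch, n_kv_heads, seq, head_dim), with fewer K/V heads than query heads
    group = q.shape[1] // k.shape[1]              # query heads served by each K/V head
    k = k.repeat_interleave(group, dim=1)         # each K/V head is shared across its group
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v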

This means that there is an overall reduction in the dimensionality of the “space” as you move from input to output, so there are fewer vectors to compete for attention. Something resembling concepts, themes, and biases may emerge at different transformer layers. The last few layers would have more to do with calculating the next token, so the map, if you will, is in the middle layers of the system.

Because each layer feeds into a subsequent layer, there is a fixed relationship between these abstractions. Done right, you should be able to zoom in or out to a particular level of detail.

This space does not have to be physical, like lat/lon, or temporal, like year of publication. It can be anything, like the relationship between conspiracy theories. I’ve produced maps like this before using force directed graphs, but this seems more direct and able to work naturally at larger scales.

Turning this into human-readable information will be the challenge here, though I think the model could help with that as well. The manifold reduction would try to maintain the relationship of nearby vectors in the transformer stack. Some work in this direction is Language Models Represent Space and Time, which works using LLaMA and might provide insight into techniques. For example, it may be that the authors are evolving prompt vectors directly, using a fitness test that calculates the distance between a generated year or lat/lon and the actual lat/lon, then uses that difference to make a better prompt. Given that they have a LLaMA model to work with, they could do backpropagation or, conceivably, a less coupled arrangement like an evolutionary algorithm. In other words, the prompt becomes the mapping function.
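As a thought experiment only (this is not what the paper’s authors do – the fitness function, the mutation scheme, and the stand-in predict_latlon below are all my own placeholders), an evolutionary loop over prompt vectors might look something like this:

import numpy as np

def predict_latlon(soft_prompt):
    # Stand-in for the real step: run the LLM with this soft prompt prepended and
    # decode a latitude/longitude from it. A dummy readout keeps the sketch runnable.
    return soft_prompt.mean(axis=0)[:2]

def fitness(soft_prompt, true_latlon):
    return -np.linalg.norm(predict_latlon(soft_prompt) - true_latlon)   # closer is better

def evolve(true_latlon, dim=64, length=8, pop_size=32, generations=200):
    rng = np.random.default_rng(0)
    population = [rng.normal(0, 0.02, (length, dim)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, true_latlon), reverse=True)
        parents = population[: pop_size // 4]                            # keep the fittest quarter
        children = [parents[i % len(parents)] + rng.normal(0, 0.01, (length, dim))
                    for i in range(pop_size - len(parents))]             # mutated copies
        population = parents + children
    return max(population, key=lambda p: fitness(p, true_latlon))

best_prompt = evolve(true_latlon=np.array([38.9, -77.0]))                # e.g., Washington, DC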

The Great Chain of Being as a General Theory of Racism

It is September 2023, just over 60 years since the March on Washington for Jobs and Freedom.

One might assume that unequivocal cries for justice and equal rights would have, by now, uprooted age-old systems of racism and discrimination. And yet, we still find ourselves unable to break free from a vicious cycle of prejudice, resulting in suffering and social divisions. This ongoing battle raises the question: what are the real roots of racism, and how do they connect with our collective history?

To understand this, we can turn to the concept of “the great chain of being” – a philosophical and religious hierarchy that has influenced Western thought for centuries. This idea, rooted in classical and medieval ideas of order, places God at the top, followed by angels, humans, animals, plants, and minerals at the bottom. It is a concept that has seeped into every aspect of life, from politics and religion to art and science. There is little doubt that the great chain has, in many ways, helped shape how we perceive ourselves and others around us – but it has also created deep divisions in society.

This ancient idea continues to influence our understanding of race. The idea that there exists a natural order in society in which some races take precedence over others is strikingly similar to the great chain of being. This is where the connection between white supremacy, anti-Semitism, and anti-blackness lies – with each group positioning themselves in relation to others on this perceived chain.

For example, white supremacists have long been targeting Jewish people as a means to further their concept of “racial purity.” What the supremacists resent is the perception of an undeserving Jewish race holding a position above them in the Great Chain – a situation that, in their eyes, goes against the natural order. This animosity has given rise to anti-Semitic conspiracy theories, often portraying Jewish people as sinister puppet-masters controlling world events to their advantage.

In contrast, anti-blackness thrives on the fear that social progress may result in black people levelling up on the chain. White supremacists deploy various tactics to keep black people in their “place” – through the implementation of unjust legislation, social exclusion, or the perpetuation of stereotypes. This behavior can be found throughout history, such as Apartheid in South Africa, or the segregation policies in the United States.

Interestingly, the behaviors associated with the Great Chain manifest themselves in our close evolutionary relative, the chimpanzee, which is also known to exhibit dominance hierarchies. Dominance hierarchies help maintain social order among chimpanzees by defining who has access to resources and mates.

As members of the Great Apes, most closely related to chimpanzees, we carry within us a primordial legacy that predisposes us toward hierarchical behavior. This “deep bias” is not merely the result of sociopolitical constructs, but an intrinsic characteristic etched into our very genes. The divine right claimed by one group to hold sway over another is an echo of our evolutionary past.

To truly reckon with racism and social dominance, we must come to terms with our biological heritage and take account of these primitive urges. It is vital to recognize that simply dismantling existing social hierarchies will not be enough; we must also actively put structures in place that counteract this default behavior. The very same effort, if not more, that has been poured into creating and sustaining anti-black systems needs to be directed toward building durable systems that prevent the re-emergence of similar oppressive beliefs.

These new structures will require constant vigilance and support to endure the challenges they face. The gravitational pull of our genetic heritage always threatens to drag us back into a world ruled by notions like the great chain of being if left unchecked. By acknowledging and addressing these deep-rooted biases as an eternal struggle rather than steady progress, we take a crucial step towards making a more resilient, successful, less self-destructive society.

Some thoughts about living in a simulation

You know how there is this idea that we are all living in a Simulation? Everyone from Elon Musk to Neil deGrasse Tyson has pontificated about this.

I’ve been wondering about this for years. First, and most importantly: what does that mean? Are we in a deliberately designed piece of “software?” That sounds a lot like religion, where we become important because we are deliberate creations of, in this case, a programmer god. Because if that’s true, then we would feel important in the same way that Genesis makes us feel important.

But if we’re a simulation, what does that say about the universe(?) that the simulation is running in? Do we take up more than a miniscule fraction of the computational bits in it? That seems unlikely. According to my brief search with The Google, the universe has 6*10^80 bits of information. All the information in the world today is about 10^24 bits. So if we took all the information processing in the world, the biggest thing we could deliberately simulate would be about 0.00000000000000000000000000000000000000000000000000000001 the size of the current universe. Sorry if I got the number of zeros wrong 🙂
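For the record, the arithmetic is

10^{24} / (6 \times 10^{80}) \approx 1.7 \times 10^{-57}

or roughly 56 zeros after the decimal point before the first significant digit shows up.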

I find it hard to imagine that resources on that (large for us, tiny for everything else) scale would ever be brought to bear on creating a purpose-built universe. Maybe our god is a computer hobbyist in their parent’s basement? So make that tiny number in the previous paragraph even smaller. Sounds like a puny god to me. And there are probably more likely alternatives.

If we are a simulation, it’s got to be an accident. Maybe we’re a persistent pattern in a weather simulation in that bigger universe. But just as likely, we could be a persistent pattern anywhere, like a whirlpool in a stream. At which point, we have to ask ourselves what the idea of a simulation even means. To me, it sounds more like we are emergent parts of a larger system, just like it seems we always have been.

This doesn’t mean that we couldn’t look for traces of the larger pattern that may contain our pattern. The way that information moves through our universe would say a lot about the containing universe. For example, I’ve written simulations for any number of things that have physics-based laws in them. But my “surrounding” universe can manipulate the smaller universe in ways that break its rules. I can influence anything, anywhere in the simulation immediately. For me, outside the simulation, the “speed of light” does not apply. Do we ever see examples of that sort of manipulation in our universe? If not, then why not? Artifacts from an enclosing universe should be detectable.

As you can see, this is an itch that I’ve been scratching at for many years. I’d love to see it get under your skin too.

The 24/7 Technology Race

This quote comes from a Washington Post article on how the Ukraine war is affecting development of AI-powered drones. I think it generalizes more broadly to how disadvantaged groups are driven to embrace alternatives that are outside conventional norms.

Ukraine doesn’t have the ability to fight the much larger Russia. Russia may have issues with corruption and the quality of its weapons, but it has a lot of them. And from the perspective of Ukraine, Russia has an infinite number of soldiers. So many that they can be squandered.

The West is providing Ukraine with enough weapons to survive, but not enough to attack and win decisively. I’ve read analysis where experts say that weapons systems are arriving just about as fast as Ukraine can incorporate them, but the order of delivery is from less-capable to more capable. They have artillery, but no F-16s, for example.

As a result, Ukraine is having to improvise and adapt. Since it is facing an existential risk, it’s not going to be too picky about the ethics of smart weapons. If AI helps in targeting, great. If Russia is jamming the control signals to drones, then AI can take over. There is a coevolution between the two forces, and the result may very well be cheap, effective AI combat drones that are largely autonomous in the right conditions.

Such technology is cheap and adaptable. Others will use it, and it will slowly trickle down to the level that a lone wolf in a small town can order the parts that can inflict carnage on the local school. Or something else. The problem is that the diffusion of technology and its associated risks are difficult to predict and manage. But the line that leads to this kind of tragedy will have its roots in our decision to starve Ukraine of the weapons that it needed to win quickly.

Of course, Ukraine isn’t the only smaller country facing an existential risk. Many low-lying countries, particularly those nearer the equator, are facing similar risks from climate change – both from killing heat and sea level rise. Technology – as unproven as combat AI – exists for that too. It’s called geoengineering.

We’ve been doing geoengineering for decades of course. By dumping megatons of carbon dioxide and other compounds in the atmosphere, we are heating our planet and are now arriving at a tipping point where potential risks are going to become very real and immediate for certain countries. If I were facing the destruction of my country by flooding and heat, I’d be looking at geoengineering very seriously. Particularly since the major economies are not doing much to stop it.

Which means that I expect that we will see efforts like the injection of sulfate aerosols into the upper atmosphere, or cloud brightening, or the spreading of iron or other nutrients to the oceans to increase the amount of phytoplankton to consume CO2. Or something else even more radical. Like Ukraine, these countries have limited budgets and limited options. They will be creative, and not worry too much about the side effects.

It’s a 24/7 technology race without a finish line. The racers are just trying to outrun disaster. And no one knows where that may lead.

Artificial Intelligence or Artificial Life?

I’ve been reading Metaphors We Live By. Its central idea is that most of our communication is based on metaphors – that GOOD IS UP, IDEAS ARE FOOD, or TIME IS AN OBJECT. Because we are embodied beings in a physical world, the irreducible foundation of the metaphors we use is physically based – UP/DOWN, FORWARD/BACK, NEAR/FAR, etc.

Life as we understand it emerges from chemistry following complex rules. Once over a threshold, living things can direct their chemistry to perform actions. In the case of human beings, our embodiment in the physical world led to the irreducible concept of UP.

This makes me think of LLMs, which are so effective at communicating with us that it is very easy to believe that they are intelligent – AI. But as I’m reading the book, I wonder if that’s the right metaphor. I don’t think that these systems are truly intelligent in the way that we can be (some of the time). I’m beginning to think that prompts – not the LLMs – may be some kind of primitive life, though. In this view, the LLMs are the substrate, the medium in which a living process can express itself.

Think of deep neural networks as digital environments that have enough richness for proto-life to emerge. Our universe started with heat, particles, and a few simple forces. Billions of years later, heat, hydrogen, methane, ammonia and water interacted to produce amino acids. Later still, that chemistry worked out how to reproduce, move around, and develop concepts like UP.

Computers emerged as people worked with simple components that they combined to produce more complex ones. Years later the development of software allowed even more complex interactions. Add more time, development, and data and you get large language models that you can chat with.

The metaphor of chemistry seems to be emerging in the words we use to describe how these models work as environments – data can be poisoned or refined. A healthy environment produces a diverse mix of healthy prompts. Too much bias in the architecture or in the data produces an environment that is less conducive to complex emergent behavior. Back when I was figuring out the GPT-2, I finetuned a model so that it only spoke chess. That’s like the arctic compared to the rainforest of text-davinci-003.

The thing that behaves the most like a living process is the prompt. The prompt develops by feeding back on itself and input from others (machine and human). Prompts grow interactively, in a complex way based (currently) on the previous tokens in the prompt. The prompt is ‘living information’ that can adapt based on additions to the prompt, as occurs in chat.

It’s not quite life yet though. What prompts do seem to lack at this point is any split between the genotype and the phenotype. For any kind of organism to develop and persist, we’d need that kind of distinction. This is more like the biochemistry of proto-life.

The prompts that live on these large (foundational) models are true natives of the digital information domain. They are now producing behavior that is not predictable based on the inputs in the way that arithmetic can be understood. Their behavior is more understandable in aggregate – use the same prompt 1,000 times and you get a distribution of responses. That’s more in line with how living things respond to a stimulus.
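You can see that distributional behavior for yourself with a minimal sketch like the one below (assuming the Hugging Face transformers library; GPT-2, the prompt, and the sample count are arbitrary stand-ins for whatever model you have on hand):

from collections import Counter
from transformers import pipeline, set_seed

# Sample the same prompt many times with temperature > 0 and look at the spread.
set_seed(42)
generator = pipeline("text-generation", model="gpt2")
outputs = generator("The first thing a living prompt would want is",
                    max_new_tokens=12, num_return_sequences=50,
                    do_sample=True, temperature=1.0)

counts = Counter(o["generated_text"] for o in outputs)
for text, n in counts.most_common(5):       # the most frequent continuations
    print(n, text)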

I think if we reorient ourselves from the metaphor that MACHINES ARE INTELLIGENT to PROMPTS ARE EARLY LIFE, we might find ourselves in a better position to understand what is currently going on in machine learning and make better decisions about what to do going forward.

Metaphorically, of course.

What gets acted on in the US?

I think I have a chart that explains somewhat how red states can easily avoid action on gun violence. It’s the number of COVID-19 deaths vs. gun deaths in Texas. This is a state that pushed back very hard against any public safety measures for the pandemic, even though COVID was killing roughly 10 times more of its citizens. I guess the question is “how many of which people will prompt state action? For anything?”

For comparison purposes, Texas had almost 600,000 registered guns in 2022 out of a population of 30 million, or just about 2% of the population if distributed evenly (source). This is probably about 20 times too low, since according to the Pew Center, gun ownership in Texas is about 45%. That percentage seems to be enough people to prevent almost any meaningful action on gun legislation. Though that doesn’t prevent the introduction of legislation to mandate bleeding control stations in schools in case of a shooting event.

So something greater than 2% and less than 45%. Just based on my research, I’d guess something between 10%-20% mortality would be acted on, as long as the demographics of the powerful were affected in those percentages.

Foundation Ensembles

I’ve been working on creating an interactive version of my book using the GPT. This has entailed splitting the book into one text file per chapter, then trying out different versions of the GPT to produce summaries. This has been far more interesting than I expected, and it has some implications for foundation models.

The versions of the GPT I’ve been using are Davinci-003, GPT-3.5-turbo, and GPT-4. And they each have distinct “personalities.” Since I’m having them summarize my book, I know the subject matter quite well, so I’m able to get a sense of how well these models summarize something like 400 words down to 100. Overall, I like the Davinci-003 model the best for capturing the feeling of my writing, and the GPT-4 for getting more details. The GPT-3.5 falls in the middle, so I’m using it.

They all get some details wrong, but in aggregate, they are largely better than any single summary. That is some nice support for the idea that multiple foundational models are more resilient than any single model. It also suggests a path to making resilient Foundational systems. Keep some of the old models around to use as an ensemble when the risks are greater.

Multiple responses also help with hallucinations. One of the examples I like to use to show this is to use the prompt “23, 24, 25” to see what the model generates. Most often, the response continues the series for a while, but then it will usually start to generate code – e.g. “23, 24, 25, 26, 27, 28];” – where it places the square bracket and semicolon to say that this is an array in a line of software. It has started to hallucinate that it is writing code.

The thing is, the only elements that all the models will agree on in response to the same prompt repeated multiple times are the elements most likely to be trustworthy. For a model, the “truth” is the common denominator, while hallucinations are unique.
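As a rough sketch of that idea (assuming the pre-1.0 openai Python client and chat models only for brevity; the shared-sentence test is my own crude stand-in for a real agreement measure):

import openai

MODELS = ["gpt-3.5-turbo", "gpt-4"]            # plus any older models kept around

def summarize(model, chapter_text):
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user",
                   "content": "Summarize this in about 100 words:\n" + chapter_text}])
    return resp["choices"][0]["message"]["content"]

def agreed_sentences(chapter_text):
    # Keep only the sentences every model produced; claims unique to one model
    # are the ones most likely to be hallucinated.
    summaries = [summarize(m, chapter_text) for m in MODELS]
    sentence_sets = [set(s.strip() for s in summary.split(".") if s.strip())
                     for summary in summaries]
    return set.intersection(*sentence_sets)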

This approach makes systems more resilient for the cost of keeping the old systems online. It doesn’t address how a deliberate attack on a Foundational model could be handled. After all, an adversary would still have exploits for the earlier models and could apply them as well.

Still…

If all models lined up and started to do very similar things, that could be a sign that there was something fishy going on, and a cue for the human operators of these systems to start looking for the nefarious activity.

The risks and rewards of ML as an API

One of the projects I’ve been working on is a study on COVID-19 misinformation in Saudi Arabia. So far we’ve downloaded over 100,000 tweets. To expand the range of analytic tools that can be used, and to open up the dataset for non-Arabic speakers (like me!), I wrote a ML-based translation program, and fired it up yesterday morning. It’s still chunking along, and has translated over 27,000 tweets so far.

I think I’m seeing the power and risks of AI/ML in this tiny example. See, I’ve been programming since the late 1970s, in many, many languages and environments, and the common thread in everything I’ve done was the idea of deterministic execution. That’s the idea that you can, if you have the time and skills, step through a program line by line in a debugger and figure out what’s going on. It wasn’t always true in practice, but the idea was conceptually sound.

This translation program is entirely different. To understand why, it helps to look at the code:

[Screenshot: the translator script]

This is the core of the code. It looks a lot like code I’ve written over the years. I open a database, get some lines, manipulate them, and put them back. Rinse, lather, repeat.

That manipulation, though…

The six lines in yellow are the Huggingface API, which allow me to access Microsoft’s Marian Neural Machine Translation models, and have them use the pretrained models generated by the University of Helsinki. The one I’m using translates Arabic (src = ‘ar’) to English (trg = ‘en’). The lines that do the work are in the inner loop:

batch = tok.prepare_translation_batch(src_texts=[d['contents']])    # Arabic text -> token IDs
gen = model.generate(**batch)                                        # run the model; for a plain forward pass: model(**batch)
words: List[str] = tok.batch_decode(gen, skip_special_tokens=True)   # token IDs -> English text

The first line is straightforward. It converts the Arabic words to tokens (numbers) that the language model works in. The last line does the reverse, converting result tokens to English.

The middle line is the new part. The input vector of tokens goes to the input layer of the model, where they get sent through a 12-layer, 512-hidden, 8-head, ~74M parameter model. Tokens that can be converted to English pop out the other side. I know (roughly) how it works at the neuron and layer level, but the idea of stepping through the execution of such a model to understand the translation process is meaningless. The most important part of the program cannot be understood in the context of deterministic execution.
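For anyone who wants to try something similar, here is a minimal, self-contained version (a sketch, not my production script: the database loop is omitted, and current transformers releases call the tokenizer directly instead of prepare_translation_batch):

from transformers import MarianMTModel, MarianTokenizer

# The University of Helsinki Arabic-to-English model, loaded through Hugging Face.
model_name = "Helsinki-NLP/opus-mt-ar-en"
tok = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def translate(arabic_texts):
    batch = tok(arabic_texts, return_tensors="pt", padding=True)    # Arabic -> token IDs
    gen = model.generate(**batch)                                    # run the translation model
    return tok.batch_decode(gen, skip_special_tokens=True)          # token IDs -> English

print(translate(["صباح الخير"]))    # should come back as something like "Good morning"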

In the time it took to write this, it’s translated about 1,000 more tweets. I can have my Arabic-speaking friends do a sanity check on a sample of these translations, but we’re going to have to trust the overall behavior of the model to do our research, because some of the systems we depend on only work on English text.

So we’re trusting a system that we cannot verify to do research at a scale that would otherwise be impossible. If the model is good enough, the results should be valid. If the model behaves poorly, then we have bad science. The problem is that right now there is only one Arabic-to-English translation model available, so there is no way to statistically examine the results for validity.

And I guess that’s really how we’ll have to proceed in this new world where ML becomes just another API. Validity of results will depend on diversity of model architectures and training sets. That may occur naturally in some areas, but in others, there may only be one model, and we may never know the influences that it has on us.

Welcome to the future of software development

Some good(?) insights from COVID-19 death rates

April 11, 2020 – I’ve put together a website with some friends that show these values for all countries dealing with the pandemic: DaysToZero.org

March 23, 2020

I think I found a way of looking at COVID-19 data that makes intuitive sense to me – growth rate. For some context, let’s start with the Johns Hopkins scary dashboard:

Screenshot from this morning

This is a very dramatic presentation of information, and a good way of getting a sense of how things are going right now, which is to say, um… not well.

But if we look at the data (from here), we can break it down in different ways. One of the things we can do is look at trends. Here, I’m going to focus on the daily death rate. In other words, what is the percentage increase in deaths from one day to the next? First, let’s look at Italy and Iran, two countries that are currently struggling with the worst of the crisis so far:

These still look horrible, but things do not appear to be getting worse as fast as they were in early February. The curves are flattening, but it’s still hard to see what might happen in the future. We’re just not good at understanding exponential charts like the one on the left at any level much more subtle than “OMG!” Logarithmic charts like the one on the right can be misleading too – that’s a big jump between 1,000 deaths and 10,000 deaths at the top of the chart on the right. And at the bottom, we’re looking at nine deaths.

What happens if we look at the same data as a rate problem though?

That looks very different. After a big initial spike, both countries have a rate of decrease that fits pretty well to a linear trend. So what do we get if we plug the current rates into those linear approximations and solve for zero? In other words, when are there zero new deaths?
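Concretely, the extrapolation is just a straight-line fit to the daily rates, solved for where the line crosses zero. Here is a sketch with made-up numbers (the real series comes from the Johns Hopkins data linked above); the world-wide case later in the post is the same idea with a second-degree fit:

import numpy as np

# Made-up daily death-rate percentages, for illustration only.
days = np.arange(10)
rates = np.array([30.0, 26.5, 24.0, 21.0, 19.5, 17.0, 15.5, 13.0, 12.5, 11.9])

slope, intercept = np.polyfit(days, rates, 1)    # linear trend through the rates
days_to_zero = -intercept / slope - days[-1]     # zero crossing, counted from today
print(f"falling {abs(slope):.2f} points/day, zero new deaths in ~{days_to_zero:.0f} days")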

As we can see from the far right of the chart, as of today, Italy’s rate of new deaths is 11.89%, or 0.1189. Iran is 7.66% or 0.0766. Using those values we get some good news:

  • Italy: 27 days, or April 19th
  • Iran: 15 days, or April 7th

Yay! But…

Let’s look at the US. There’s not really enough data to do this on a state-by-state basis yet, but there is plenty of data for the whole country. Again, we get a good linear approximation. The problem is, it’s going the wrong way:

The increase in our death rate (0.69% per day) is more than either Iran’s or Italy’s rate of decrease. At this point, there is literally no end in sight.

Let’s look at the world as a whole using death rates. This time the best fit is a second-degree polynomial, which produces U-shaped curves:

Also not good. Things clearly improved as China got a handle on its outbreak, but the trends are now going the other way as the disease spreads out into the rest of the world. It’s clearly going to be a bumpy ride.

I’d like to point out that there is no good way to tell here what caused these trends to change. It could be treatments, or it could be susceptibility. Italy and Iran did not take the level of action that China did, yet if the trends continue, they will be clear of new deaths in about a month. We’ll know more as the restrictions loosen, and there is or isn’t a new upturn as the lid comes off.