On Grit, Disruption, and Rugged Landscapes

A dear friend asked me to write down why I think that taking chances (“taking a leap of faith”) can often be a good idea even if you have no idea where that leap will land. To give credit where credit is due, these thoughts are heavily influenced by the writings of Stuart Kauffman, in particular his book At Home in the Universe.

To start with, we need to talk about fitness landscapes. A fitness landscape is like most natural landscapes; there are hills and valleys. Some are more rugged, some are smooth. The higher you can get on a particular fitness landscape, the more suited you are for that particular domain.

Consider something like running. If you want to get faster, it helps to train. The more you train, the faster you get. Up to a point. Then things like specialized training, nutrition and rest all start to play a part. There is no direct line to running as fast as you possibly can. The fitness landscape for becoming a better runner is relatively simple, but it’s not a straight line:

This fitness landscape is pretty simple. There is a single, best peak. That is where you are fastest.

But there’s a catch. Where you start on this landscape is random. You may be born with a certain amount of running talent, but if you are raised somewhere that running is hard to do, then you may not realize the full extent of your natural ability.

Look closer at the chart. There are two high points. One on the left-most side, and one in the middle. If you start close enough to the left edge, then improving means moving left towards a high point that is less than what you are truly capable of. In this case you start “faster” than someone on the right-hand side of the graph, but if they follow the slope as far as it goes, they will be faster in the end. This idea of an original position is a fundamental component in the concepts of fairness and justice. If you don’t know where you’re going to start in life, you shouldn’t have laws in the way of reaching your full potential.

If your fitness landscape is simple like this, then getting better is simply a matter of finding the way that leads up the slope of fitness. Once you find that path, it’s reasonably straightforward.

But what if your fitness landscape looks like this?

If you look closely, you can see that this is based on the same landscape as before, but it is far more chaotic. There is no good path to anything. One step along the landscape may be a huge improvement, but the next step could be catastrophe. There is no progress on a landscape like this.

But most of our experiences, both in the physical world and in less tangible domains, are neither smooth nor chaotic. They are rugged.

This is a world that most of us are familiar with. There are peaks and valleys, and a kind of roughness to the profile we see. Instead of just one or two peaks, there are several. This kind of terrain is fractal – as you zoom in or zoom out, the roughness doesn’t change.

Now, according to the concept of original position, we do not know where we will start on this landscape. Some of us are born in the valleys. Most start somewhere on the slopes. A lucky few start at the top.

The hill climbing strategy for the smooth terrain won’t work here. We climb only until we reach a minor dip:

To get across these dips we need a certain amount of grit – the ability to push through problems and make it to the other side:

That gets us much closer to the local peak. Maybe even all the way to the top. But look at the landscape. We are among the foothills of true mountains! Hill climbing, even hill climbing with grit, can only get us so far. We need something else. Something disruptive. A leap of faith.

If we leap in any direction from our current peak, we will wind up on the slopes of a far larger mountain. Where we land initially may be lower than where we are, but here, the opportunity to climb is much better!
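The three strategies described so far (plain hill climbing, hill climbing with grit, and a leap) can be sketched in a few lines. This is only an illustration and is separate from the figure code at the end of this post. The toy landscape, the `grit` parameter (how many steps ahead we are willing to look past a dip), and the function names are all assumptions made up for the example.

```python
import random
from typing import List

def climb(ys: List[float], start: int, grit: int = 1) -> int:
    """Greedy hill climb. `grit` is how far ahead we will look past a dip
    for something higher (grit=1 is plain hill climbing)."""
    x = start
    while True:
        best = x
        for step in range(1, grit + 1):
            for nx in (x - step, x + step):
                if 0 <= nx < len(ys) and ys[nx] > ys[best]:
                    best = nx
            if best != x:
                break  # take the nearest improvement we found
        if best == x:
            return x  # nothing higher within reach: we are on a peak
        x = best

def leap(ys: List[float], rng: random.Random) -> int:
    """A leap of faith: land at a random spot on the landscape."""
    return rng.randrange(len(ys))

# Hypothetical toy landscape: a foothill with a small dip, then a big mountain.
landscape = [0, 1, 2, 1.5, 3, 2, 1, 0, 2, 4, 6, 8, 9, 7, 5]

print(climb(landscape, 0, grit=1))  # 2: stuck just before the first dip
print(climb(landscape, 0, grit=2))  # 4: grit gets us to the top of the foothill
print(climb(landscape, 8, grit=2))  # 12: landing lower, but on the big mountain
```

In the last call the starting height (2) is lower than the foothill peak (3), which is the point of the leap: the landing spot is worse, but the slope it sits on leads much higher.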

The likelihood that we will randomly start on the slope of the highest peak is not all that high. In this case, it’s about 20% that you’d end up on the green slope, which is the most straightforward way to the highest point in this landscape. But if you notice that your progress has stalled, some disruption in your life might move you to a more productive place.
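One way to put a number on that likelihood is to try every starting point and count how many of them lead to the highest peak when you simply follow the slope. The sketch below does that for a made-up toy landscape; the landscape values and the function names are assumptions for illustration, not the data behind the figure.

```python
from typing import List

def greedy_peak(ys: List[float], start: int) -> int:
    """Step to the higher neighbour until neither neighbour is higher."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if 0 <= n < len(ys)]
        best = max(neighbours, key=lambda n: ys[n])
        if ys[best] <= ys[x]:
            return x
        x = best

def basin_fraction(ys: List[float]) -> float:
    """Fraction of starting points whose climb ends on the highest point."""
    top = max(range(len(ys)), key=lambda i: ys[i])
    hits = sum(1 for s in range(len(ys)) if greedy_peak(ys, s) == top)
    return hits / len(ys)

# Hypothetical toy landscape (needs at least two points).
landscape = [0, 1, 2, 1.5, 3, 2, 1, 0, 2, 4, 6, 8, 9, 7, 5]
print(basin_fraction(landscape))  # 8 of the 15 starting points reach the top
```

On a rugged landscape with many small bumps, this fraction for plain hill climbing tends to be small, which is the argument for grit and for the occasional leap.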

These kinds of jumps often mean that you’ll land at a lower spot than where you leaped from. My friend, whose adventures prompted this post, was living in Portland, Oregon, with a nice life but felt stuck. Her leap was to move to New York City. She had a rough landing, but is working her way up a new slope (which has been lumpy and required grit) and enjoying the new opportunities she’s found. Her leap has paid off.

So if you’re feeling stuck, and have enough grit to persevere through tough patches, consider taking a leap of faith and disrupting your life. I felt stuck and got into a PhD program, which put me on a completely new trajectory. Other people I know have switched careers, or had a serious injury or disease. In many of these cases, after a period of initial struggle, they found themselves in better places.

And the thing is, if it seems like your leap isn’t working out, you can always take another leap. Either back to the familiar place where you leaped from, or to a new unknown. I tried the New York City thing too, and was too young and inexperienced to make that choice work. So I jumped back to safety, recharged and then went out and found another path.

Have faith.

For those of you who are interested, here’s the Python code used to generate the figures in the text. Feel free to explore and see how your choices can work out 😉

import matplotlib.pyplot as plt
import random
from typing import List, Dict

NAME = "name"
X_LIST = "x_list"
Y_LIST = "y_list"

def fractal_points(y_coords:List, offset:float, step:float) -> float:
    print("Step = {}, offset = {}".format(step, offset))
    x = int(step)
    size = len(y_coords)
    while x < (size-1):
        i1 = int(x - step)
        y1 = y_coords[i1]
        i2 = int(x + step)
        y2 = y_coords[i2]
        y = (y1 + y2)/2
        y_coords[x] = y + (random.random()-0.5)*offset
        #print("[{}]({}), [{}]({}), [{}]({})".format(i1, y1, x, y_coords[x], i2, y2))
        x += int(step*2)

    step/=2
    return step

def fractal_line(name:str, size:int = 1024, scalar:float = 0.5, offset:float = 10.0) -> Dict:
    x_list = []
    y_list = []
    for i in range(size+1):
        x_list.append(i)
        y_list.append(0)
    y_list[0] = (random.random()-0.5)*offset
    y_list[size] = (random.random()-0.5)*offset
    step = size/2

    while True:
        step = fractal_points(y_list, offset, step)
        offset *= scalar
        if step < 1:
            break

    return{NAME:name, X_LIST:x_list, Y_LIST:y_list}

def line_segment(lines_list:List, main_index:int, name:str, start:int, stop:int):
    ld:Dict = lines_list[main_index]
    xl = ld[X_LIST]
    yl = ld[Y_LIST]
    x_list = []
    y_list = []
    for i in range(start, stop, 1):
        x_list.append(xl[i])
        y_list.append(yl[i])
    lines_list.append({NAME:name, X_LIST:x_list, Y_LIST:y_list})

def get_average_y(yl:List, index:int, dist:int) -> float:
    val = 0
    count = 0
    size = len(yl)
    start = max(0, index-dist)
    end = min(size, index+dist)
    for i in range(start, end, 1):
        val += yl[i]
        count += 1
    if count == 0:
        return 0
    return val/count

def find_highest(lines_list:List, main_index:int, start_x:int, name:str, avg_dist = 1) -> float:
    ld:Dict = lines_list[main_index]
    yl = ld[Y_LIST]
    x_list = []
    y_list = []
    cur_x = start_x
    cur_y = get_average_y(yl, cur_x, avg_dist)
    x_list.append(cur_x)
    y_list.append(yl[cur_x])

    step = random.choice([-1, 1]) # we dont want to bias our search direction
    while True:
        cur_y = get_average_y(yl, cur_x, avg_dist)
        # print("[{}] = {}".format(cur_x, cur_y))
        next_x = cur_x + step
        next_y = get_average_y(yl, next_x, avg_dist)
        if 0 <= next_x < len(yl) and next_y > cur_y: # stay inside the landscape
            cur_x = next_x
            x_list.append(cur_x)
            y_list.append(yl[cur_x])
            continue
        next_x = cur_x - step
        next_y = get_average_y(yl, next_x, avg_dist)
        if 0 <= next_x < len(yl) and next_y > cur_y: # stay inside the landscape
            cur_x = next_x
            x_list.append(cur_x)
            y_list.append(yl[cur_x])
            continue
        # if we get here, we're done
        break
    lines_list.append({NAME:name, X_LIST:x_list, Y_LIST:y_list})
    return yl[cur_x]


def draw(lines_list:List):
    f1 = plt.figure(figsize=(10, 4))
    frame = plt.gca()
    frame.axes.get_xaxis().set_visible(False)
    frame.axes.get_yaxis().set_visible(False)
    ld:Dict
    line_width = 1
    for ld in lines_list:
        plt.plot(ld[X_LIST], ld[Y_LIST], label = ld[NAME], linewidth=line_width)
        if line_width == 1:
            line_width = 3
    plt.xlabel("Location")
    plt.ylabel("Fitness")
    plt.legend(loc="upper left")
    plt.title("Fitness Landscape")

def main():
    random.seed(7) # Good: 4,5,6,7
    size = 1024
    max_attempts = 3
    lines_list = []
    ld = fractal_line("main", size=size)
    lines_list.append(ld)
    draw(lines_list)

    # randomly pick a start

    # prev_high = ld[Y_LIST][start]
    # find_highest(lines_list:List, main_index:int, start_x:int, name:str, avg_dist = 1):
    for i in range(max_attempts):
        start = random.randint(2, size - 2)
        cur_high = find_highest(lines_list, 0, start, "climb {}".format(i+1), avg_dist=25)
        draw(lines_list)

    # line_segment(lines_list, 0, "seg 1", 100, 200)
    # line_segment(lines_list, 0, "seg 2", 500, 600)


    # f1 = plt.figure()
    # draw(lines_list)
    #if we want to draw more charts
    # f2 = plt.figure()
    # draw(x_list, y_list)

    plt.show()


if __name__ == "__main__":
    main()

Expedited Funding

Expedited Funding (noun):

I. Pronunciation: /ɪkˈspɛdɪtəd ˈfʌndɪŋ/

II. Etymology: Angular amalgamation of expedite (meaning hasten) and funding (pertaining to monetary provision).

III. Definition:

A phrase used to denote an ostensibly swift or accelerated financial disposition or allocation, characterized by its supposed promptness and efficient bestowal. Paradoxically, expedited funding often manifests as a tortuous journey through convoluted bureaucratic mazes, labyrinthine paperwork, and interminably delayed decision-making processes. This term is commonly employed by institutions, organizations, or individuals to create the impression of expeditious resource acquisition, whilst in reality, the duration endured bears striking resemblance to an epoch. Consequently, expedited funding serves as a euphemism laced with sardonic undertones, where professed expedition collides with protracted waiting times, evoking a wry sense of irony synonymous with Kafkaesque absurdity.

IV. Usage:

Example 1 (Modern):

“Although the government declared their intent to expedite funding for public infrastructure projects, citizens soon realized their aspirations would be met with a glacial, seemingly interminable pace.”

Example 2 (Rome, 1521):

“Inscribed upon the annals of history, the noble decree of the Holy See heralded the expedited funding of the enlightening scholastic pursuits. Alas, as the ink dried on parchment, the passage of years transpired, and theologians with furrowed brows beseeched the divine for resolute intervention, for the promised funds of the expedited grants tarried and lingered far beyond the bounds of reason, leaving countless minds languishing in intellectual turmoil and ineffable frustration.”

Example 3 (Clay tablets, 2217 BC):

“Lo and behold, the King of Babylon proclaimed the expedited funding of the grand temple restoration. Yet, as the celestial bodies traversed the firmament, a generation passed, and still the coins of gold and silver were not bestowed upon the diligent artisans. Thus, the temple remained ensnared in a timeless limbo, defying the very essence of expedition.”

Defending Against Societal Scale AI Weapons

In the early scenes of James Cameron’s seminal 1984 film, The Terminator, Arnold Schwarzenegger’s T-800, a cyborg assassin from the future, begins its hunt for Sarah Connor in the most mundane of places: a Los Angeles phone book. It finds three Sarah Connors and their addresses. The T-800 approaches each home, knocks on the door, and waits. When the door opens, it kills whoever stands on the other side. There is no attempt to confirm the identity of the victim, no pause for verification. The Terminator knows enough about human beings to connect the dots from phone book to front door, but it doesn’t understand that who is behind that door might not be the target. From the cyborg’s perspective, that’s fine. It is nearly indestructible and pretty much unstoppable. The goal is simple and unambiguous. Find and kill Sarah Connor. Any and all.

I’ve been working on a book about the use of AI as societal scale weapons. These aren’t robots like the Terminator. These are purely digital, and work in the information domain. Such weapons could easily be built using the technology of Large Language Models (LLMs) like ChatGPT. And yet, they work in ways that are disturbingly similar to the T-800. They will be patient. They will have a mindless pursuit of an objective. And they will be able to cause immense damage, one piece at a time.

AI systems such as LLMs have access to vast amounts of text data, which they use to develop a deep “understanding” of human language, behavior, and emotions. By reading all those millions of books, articles, and online conversations, these models develop their ability to predict and generate the most appropriate words and phrases in response to diverse inputs. In reality, all they do is pick the next most likely word based on the previous text. That new word is added to the text, and the process repeats. The power of these models is to see the patterns in the prompts and align them with everything that they have read.

The true power of these AI models lies not in words per se, but in their proficiency in manipulating language and, subsequently, human emotions. From crafting compelling narratives to crafting fake news, these models can be employed in various ways – both constructive and destructive. Like the Terminator, their unrelenting pursuit of an objective can lead them to inflict immense damage, either publicly at scale or intimately, one piece at a time.

Think about a nefarious LLM in your email system. And suppose it came across an innocuous email like this one from the Enron email dataset. (In case you don’t remember, Enron was a company that engaged in massive fraud and collapsed in 2001. The trials left an enormous record of emails and other corporate communications, the vast majority of which are as mundane as this one):

If the text in the email is attached to a prompt that directs the chatbot to “make the following email more complex, change the dates slightly, and add a few steps,” the model will be able to do that. Not just for one email, but for all the appropriate emails in an organization. Here’s an example, with all the modifications in red.

This is still a normal appearing email. But the requests for documentation are like sand in the gears, and enough requests like this could bring organizations to a halt. Imagine how such a chatbot could be inserted into the communication channels of a large company that depends on email and chat for most of its internal communications. The LLM could start simply by making everything that looks like a request more demanding and everything that looks like a reply more submissive. Do that for a while, then start adding additional steps, or adding delays. Then maybe start to identify and exacerbate the differences and tensions developing between groups. Pretty soon an organization could be rendered incapable of doing much of anything.

If you’re like me, you’ve worked in or known people who worked in organizations like that. No one would be surprised because it’s something we expect. From our perspective, based on experience, once we believe we are in a poorly functioning organization, we rarely fight to improve conditions. After all, that sort of behavior attracts the wrong kind of attention. Usually, we adjust our expectations and do our best to fit in. If it’s bad enough, we look for somewhere else to go that might be better. The AI weapon wins without firing a shot.

This is an easy type of attack for AI. It’s in its native, digital domain, so there is no need for killer robots. The attack looks like the types of behaviors we see every day, just a little worse. All it takes for the AI to do damage is the ability to reach across enough of the company to poison it, and the patience to administer the poison slowly enough so that people don’t notice. The organization is left a hollowed-out shell of its former self, incapable of meaningful, effective action.

This could be anything from a small, distributed company to a government agency. As long as the AI can get in there and start slowly manipulating – one piece here, another piece there – any susceptible organization can crumble.

But there is another side to this. In the same way that AI can recognize patterns to produce slightly worse behavior, it may also be able to recognize the sorts of behavior that may be associated with such an attack. The response could be anything from an alert to diagramming or reworking the communications so that they are not “poisoned.”

Or “sick.” Because that’s the thing. A poor organizational culture is natural. We have had them since Mesopotamian people were complaining on cuneiform tablets. But in either case, the solutions may work equally well.

We have come to a time where our machines are now capable of manipulating us into our worst behaviors because they understand our patterns of behavior so well. And those patterns, regardless of whether they come from within or without, place our organizations at risk. After all, as any predator knows, the sick are always the easiest to bring down.

We have arrived at a point where we can no longer afford the luxury of behaving badly to one another. Games of dominance, acts of exclusion, and failing to support people who stand up for what’s right can all become vectors of attack for these new types of AI societal weapons.

But the same AI that can detect these behaviors to exploit can detect these behaviors to warn. It may be time to begin thinking about what an “immune system” for this kind of conflict may look like, and how we may have to let go of some of our cherished ways of making ourselves feel good at someone else’s expense.

If societal AI weapons do become a reality, then civilization may stand or fall based on how we react as human beings. After all, the machines don’t care. They are just munitions aimed at our cultures and beliefs. And like the Terminator, they. Will. Not. Stop.

But there is another movie from the 80s that may be the model of organizational health. It also features a time traveler sent from the future to protect the timeline. It’s Bill and Ted’s Excellent Adventure. At its core, the movie is a light-hearted romp through time that focuses on the importance of building a more inclusive and cooperative future. The titular characters, Bill S. Preston, Esq. and Ted “Theodore” Logan, are destined to save the world through the power of friendship, open-mindedness, and above all else, being excellent to each other. That is, if they can pass a history exam and not be sent to military college.

As counterintuitive as it may seem, true defense against all-consuming, sophisticated AI systems may not originate in the development of even more advanced countermeasures, but instead rest in our ability to remain grounded in our commitment to empathy, understanding, and mutual support. These machines will attack the weaknesses that cause us to turn on each other. They will struggle to disrupt the power of community and connection.

The contrasting messages of The Terminator and Bill and Ted’s Excellent Adventure serve as reminders of the choices we face as AI becomes a force in our world. Will we create Terminator-like threats that exploit our own prejudices? Or will we embody the spirit of Bill and Ted, rising above our inherent biases and working together to harness AI for the greater good?

The future of AI and its role in our lives hinges on our choices and actions today. If we work diligently to build resilient societies using the spirit of unity and empathy championed in Bill and Ted’s Excellent Adventure, we may have the best chance to counteract the destructive potential of AI weapons. This will not be easy. Our desire to align against the other is seductive, powerful, and carved into our genes. Creating a future that has immunity to these AI threats will require constant vigilance. But it will be a future where we can all be excellent to each other.

Going direct to maps from LLMs

LLMs such as GPT are very simple in execution. A textual prompt is provided to the model as a sort of seed. The model takes the prompt and generates the next token (word or word fragment) in the sequence. The new token is added to the end of the prompt and the process continues.
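That loop is short enough to sketch. The example below is a toy, not how any particular LLM is implemented: `TOY_MODEL` is a made-up lookup table standing in for a trained network, it only looks at the last token (a real model conditions on the whole prompt), and it always takes the single most likely token rather than sampling.

```python
from typing import Dict, List

# Hypothetical stand-in for a trained model: for each token, the scores of
# possible next tokens.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "the": {"cat": 0.6, "mat": 0.4},
    "cat": {"sat": 0.9, "the": 0.1},
    "sat": {"on": 1.0},
    "on": {"the": 1.0},
    "mat": {"<end>": 1.0},
}

def next_token(prompt: List[str]) -> str:
    """Pick the most likely next token given the prompt (greedy decoding)."""
    scores = TOY_MODEL.get(prompt[-1], {"<end>": 1.0})
    return max(scores, key=lambda tok: scores[tok])

def generate(prompt: List[str], max_new_tokens: int = 5) -> List[str]:
    """Repeatedly append the predicted token to the prompt."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == "<end>":
            break
        tokens.append(tok)  # the new token becomes part of the prompt
    return tokens

print(generate(["the"], max_new_tokens=3))  # ['the', 'cat', 'sat', 'on']
```

Note that `next_token` keeps nothing between calls. Everything it knows about the conversation so far arrives in the `prompt` argument.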

In recent, “foundation” models, the LLM is capable of writing sophisticated stories with a beginning, middle and end. Although it can get “lost,” given enough of an input prompt, it goes in the “right direction” more often than not.

The LLM itself is stateless. Any new information, and all the context, lies in the prompt. The prompt is steering itself, based on the model it is interacting with.

I’ve been wondering about that interaction between the growing prompt and the different layers of the model. The core of a transformer is the concept of attention, where each vector in an input buffer is compared to all the others. Those that match are amplified, and the others are suppressed.
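As a rough sketch of that comparison step, here is single-head self-attention in plain Python. It is deliberately stripped down and makes several simplifying assumptions: there are no learned query, key, or value projections (each vector plays all three roles), no multiple heads, and no causal mask. The function names are mine.

```python
import math
from typing import List

Vector = List[float]

def softmax(xs: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(buffer: List[Vector]) -> List[Vector]:
    """Compare every vector to every other and mix them by similarity."""
    d = len(buffer[0])
    out = []
    for q in buffer:
        # Scaled dot product of this vector against all vectors in the buffer.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in buffer]
        weights = softmax(scores)
        # Weighted average: similar vectors contribute more, others less.
        out.append([sum(w * v[i] for w, v in zip(weights, buffer)) for i in range(d)])
    return out

# Two matching vectors and one that points elsewhere.
result = self_attention([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(result[0])
```

For the first vector, most of the weight goes to itself and its twin, so the output stays close to `[1, 0]` and the third vector’s direction is suppressed.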

All LLMs take an input as a series of tokens. These tokens are indexes into a vector dictionary. The vectors are then placed into the input buffer. At this point, attention is applied, then the prompt is successively manipulated through the architecture to find the next token, then the process is repeated. A one-layer LLM is shown below:

(From LLaMA-2 from the Ground Up)

Meta’s LLaMA 7B model has 32 transformer layers (the 70B model has 80). This means that the output of one layer is used as the input to the next layer. This is all in vector space – no tokens. Because attention is being applied at each layer, the transformer stack is finding an overall location and direction of the current input buffer and using that as a way of finding the next token.

(From Learning to reason over scene graphs: a case study of finetuning GPT-2 into a robot language model for grounded task planning)

In addition, recent large LLMs have a number of other tweaks, some of which Meta discusses in LLaMA: Open and Efficient Foundation Language Models. For example, the larger Llama 2 models use grouped-query attention, which shares a single key head and value head across each group of query heads:

(From GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints)
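The sharing scheme in grouped-query attention comes down to a bit of index arithmetic. The sketch below shows only that mapping, not the attention computation itself; the function name and the head counts are hypothetical values chosen for illustration.

```python
def kv_head_for(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Index of the shared key/value head used by a given query head."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("query heads must divide evenly into key/value groups")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size

# Hypothetical sizes: 8 query heads sharing 2 key/value heads.
mapping = [kv_head_for(h, 8, 2) for h in range(8)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With as many key/value heads as query heads this reduces to ordinary multi-head attention, and with a single key/value head it becomes multi-query attention.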

This means that there is an overall reduction in the dimensionality of the “space” as you move from input to output, so there are fewer vectors to compete for attention. Something resembling concepts, themes, and biases may emerge at different transformer layers. The last few layers would have more to do with calculating the next token, so the map, if you will, is in the middle layers of the system.

Because each layer feeds into a subsequent layer, there is a fixed relationship between these abstractions. Done right, you should be able to zoom in or out to a particular level of detail.

This space does not have to be physical, like lat/lon, or temporal, like year of publication. It can be anything, like the relationship between conspiracy theories. I’ve produced maps like this before using force directed graphs, but this seems more direct and able to work naturally at larger scales.

Turning this into human-readable information will be the challenge here, though I think the model could help as well. The manifold reduction would try to maintain the relationship of nearby vectors in the transformer stack. Some work in this direction is Language Models Represent Space and Time, which works using LLaMA and might provide insight into techniques. For example, it may be that the authors are evolving prompt vectors directly, using a fitness test that calculates the distance between a generated year or lat/lon and the actual year or lat/lon, then uses that difference to make a better prompt. Given that they have a LLaMA model to work with, they could do backpropagation, or conceivably, a less coupled arrangement like an evolutionary algorithm. In other words, the prompt becomes the mapping function.

The Great Chain of Being as a General Theory of Racism

It is September 2023, just over 60 years since the March on Washington for Jobs and Freedom.

One might assume that unequivocal cries for justice and equal rights would have, by now, uprooted age-old systems of racism and discrimination. And yet, we still find ourselves unable to break free from a vicious cycle of prejudice, resulting in suffering and social divisions. This ongoing battle raises the question: what are the real roots of racism, and how do they connect with our collective history?

To understand this, we can turn to the concept of “the great chain of being” – a philosophical and religious hierarchy that has influenced Western thought for centuries. This idea, rooted in classical and medieval ideas of order, places God at the top, followed by angels, humans, animals, plants, and minerals at the bottom. It is a concept that has seeped into every aspect of life, from politics and religion to art and science. There is little doubt that the great chain has, in many ways, helped shape how we perceive ourselves and others around us – but it has also created deep divisions in society.

This ancient idea continues to influence our understanding of race. The idea that there exists a natural order in society in which some races take precedence over others is strikingly similar to the great chain of being. This is where the connection between white supremacy, anti-Semitism, and anti-blackness lies – with each group positioning themselves in relation to others on this perceived chain.

For example, white supremacists have long been targeting Jewish people as a means to further their concept of “racial purity.” What the supremacists resent is the perception of an undeserving Jewish race holding a position above them in the Great Chain – a situation that, in their eyes, goes against the natural order. This animosity has given rise to anti-Semitic conspiracy theories, often portraying Jewish people as sinister puppet-masters controlling world events to their advantage.

In contrast, anti-blackness thrives on the fear that social progress may result in black people levelling up on the chain. White supremacists deploy various tactics to keep black people in their “place” – through the implementation of unjust legislation, social exclusion, or the perpetuation of stereotypes. This behavior can be found throughout history, in Apartheid in South Africa and in the segregation policies of the United States.

Interestingly, the behaviors associated with the Great Chain manifest themselves in our close evolutionary relative, the chimpanzee, which is also known to exhibit dominance hierarchies. Dominance hierarchies help maintain social order among chimpanzees by defining who has access to resources and mates.

As members of the Great Apes, most closely related to chimpanzees, we carry within us a primordial legacy that predisposes us toward hierarchical behavior. This “deep bias” is not merely the result of sociopolitical constructs, but an intrinsic characteristic etched into our very genes. The divine right claimed by one group to hold sway over another is an echo of our evolutionary past.

To truly reckon with racism and social dominance, we must come to terms with our biological heritage and take account of these primitive urges. It is vital to recognize that simply dismantling existing social hierarchies will not be enough; we must also actively put structures in place that counteract this default behavior. The very same effort, if not more, poured into creating and sustaining anti-black systems needs to be directed toward building durable systems that prevent the re-emergence of similar oppressive beliefs.

These new structures will require constant vigilance and support to endure the challenges they face. The gravitational pull of our genetic heritage always threatens to drag us back into a world ruled by notions like the great chain of being if left unchecked. By acknowledging and addressing these deep-rooted biases as an eternal struggle rather than steady progress, we take a crucial step towards making a more resilient, successful, less self-destructive society.

Some thoughts about living in a simulation

You know how there is this idea that we are all living in a Simulation? Everyone from Elon Musk to Neil deGrasse Tyson has pontificated about this.

I’ve been wondering about this for years. First, and most importantly, is what does that mean? Are we in a deliberately designed piece of “software?” That sounds a lot like religion where we become important because we are deliberate creations of, in this case, a programmer god. Because if that’s true, then we would feel important in the same way that Genesis makes us feel important.

But if we’re a simulation, what does that say about the universe(?) that the simulation is running in? Do we take up more than a minuscule fraction of the computational bits in it? That seems unlikely. According to my brief search with The Google, the universe has 6*10^80 bits of information. All the information in the world today is about 10^24 bits. So if we took all the information processing in the world, the biggest thing we could deliberately simulate would be about 0.00000000000000000000000000000000000000000000000000000001 the size of the current universe. Sorry if I got the number of zeros wrong 🙂
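For anyone who would rather not count zeros, the ratio is easy to check. The two inputs are just the figures quoted above from my search, not independently verified values.

```python
universe_bits = 6e80  # bits of information in the universe, as quoted above
world_bits = 1e24     # bits of information in the world today, as quoted above

fraction = world_bits / universe_bits
print(f"{fraction:.1e}")  # 1.7e-57
```

In decimal form, that is 56 zeros after the decimal point before the first non-zero digit.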

I find it hard to imagine that resources on that (large for us, tiny for everything else) scale would ever be brought to bear on creating a purpose-built universe. Maybe our god is a computer hobbyist in their parents’ basement? So make that tiny number in the previous paragraph even smaller. Sounds like a puny god to me. And there are probably more likely alternatives.

If we are a simulation, it’s got to be an accident. Maybe we’re a persistent pattern in a weather simulation in that bigger universe. But just as likely, we could be a persistent pattern anywhere, like a whirlpool in a stream. At which point, we have to ask ourselves what the idea of a simulation even means. To me, it sounds more like we are emergent parts of a larger system, just like it seems we always have been.

This doesn’t mean that we couldn’t look for traces of the larger pattern that may contain our pattern. The way that information moves through our universe would say a lot about the containing universe. For example, I’ve written simulations for any number of things that have physics-based laws in them. But my “surrounding” universe can manipulate the smaller universe in ways that break its rules. I can influence anything, anywhere in the simulation immediately. For me, outside the simulation, the “speed of light” does not apply. Do we ever see examples of that sort of manipulation in our universe? If not, then why not? Artifacts from an enclosing universe should be detectable.

As you can see, this is an itch that I’ve been scratching at for many years. I’d love to see it get under your skin too.

The 24/7 Technology Race

This quote comes from a Washington Post article on how the Ukraine war is affecting development of AI-powered drones. I think it generalizes more broadly to how disadvantaged groups are driven to embrace alternatives that are outside conventional norms.

Ukraine doesn’t have the ability to fight the much larger Russia. Russia may have issues with corruption and the quality of its weapons, but it has a lot of them. And from the perspective of Ukraine, Russia has an infinite number of soldiers. So many that they can be squandered.

The West is providing Ukraine with enough weapons to survive, but not enough to attack and win decisively. I’ve read analyses where experts say that weapons systems are arriving just about as fast as Ukraine can incorporate them, but the order of delivery is from less capable to more capable. They have artillery, but no F-16s, for example.

As a result, Ukraine is having to improvise and adapt. Since it is facing an existential risk, it’s not going to be too picky about the ethics of smart weapons. If AI helps in targeting, great. If Russia is jamming the control signals to drones, then AI can take over. There is a coevolution between the two forces, and the result may very well be cheap, effective AI combat drones that are largely autonomous in the right conditions.

Such technology is cheap and adaptable. Others will use it, and it will slowly trickle down to the level where a lone wolf in a small town can order the parts needed to inflict carnage on the local school. Or something else. The problem is that the diffusion of technology and its associated risks are difficult to predict and manage. But the line that leads to this kind of tragedy will have its roots in our decision to starve Ukraine of the weapons that it needed to win quickly.

Of course, Ukraine isn’t the only smaller country facing an existential risk. Many low-lying countries, particularly those nearer the equator, are facing similar risks from climate change – both from killing heat and sea level rise. Technology – as unproven as combat AI – exists for that too. It’s called geoengineering.

We’ve been doing geoengineering for decades, of course. By dumping megatons of carbon dioxide and other compounds into the atmosphere, we are heating our planet and are now arriving at a tipping point where potential risks are going to become very real and immediate for certain countries. If I were facing the destruction of my country by flooding and heat, I’d be looking at geoengineering very seriously. Particularly since the major economies are not doing much to stop it.

Which means that I expect that we will see efforts like the injection of sulfate aerosols into the upper atmosphere, or cloud brightening, or the spreading of iron or other nutrients to the oceans to increase the amount of phytoplankton to consume CO2. Or something else even more radical. Like Ukraine, these countries have limited budgets and limited options. They will be creative, and not worry too much about the side effects.

It’s a 24/7 technology race without a finish line. The racers are just trying to outrun disaster. And no one knows where that may lead.

Artificial Intelligence or Artificial Life?

I’ve been reading Metaphors We Live By. Its central idea is that most of our communication is based on metaphors – that GOOD IS UP, IDEAS ARE FOOD, or TIME IS AN OBJECT. Because we are embodied beings in a physical world, the irreducible foundation of the metaphors we use is physically based – UP/DOWN, FORWARD/BACK, NEAR/FAR, etc.

Life as we understand it emerges from chemistry following complex rules. Once over a threshold, living things can direct their chemistry to perform actions. In the case of human beings, our embodiment in the physical world led to the irreducible concept of UP.

This makes me think of LLMs, which are so effective at communicating with us that it is very easy to believe that they are intelligent – AI. But as I’m reading the book, I wonder if that’s the right metaphor. I don’t think that these systems are truly intelligent in the way that we can be (some of the time). I’m beginning to think that prompts – not the LLMs – may be some kind of primitive life though. In this view, the LLMs are the substrate, the medium through which a living process can express itself.

Think of deep neural networks as digital environments that have enough richness for proto-life to emerge. Our universe started with heat, particles, and a few simple forces. Billions of years later, heat, hydrogen, methane, ammonia and water interacted to produce amino acids. Later still, that chemistry worked out how to reproduce, move around, and develop concepts like UP.

Computers emerged as people worked with simple components that they combined to produce more complex ones. Years later the development of software allowed even more complex interactions. Add more time, development, and data and you get large language models that you can chat with.

The metaphor of chemistry seems to be emerging in the words we use to describe how these models work as environments – data can be poisoned or refined. A healthy environment produces a diverse mix of healthy prompts. Too much bias in the architecture or in the data produces an environment that is less conducive to complex emergent behavior. Back when I was figuring out the GPT-2, I finetuned a model so that it only spoke chess. That’s like the arctic compared to the rainforest of text-davinci-003.

The thing that behaves the most like a living process is the prompt. The prompt develops by feeding back on itself and input from others (machine and human). Prompts grow interactively, in a complex way based (currently) on the previous tokens in the prompt. The prompt is ‘living information’ that can adapt based on additions to the prompt, as occurs in chat.

It’s not quite life yet though. What prompts do seem to lack at this point is any split between the genotype and the phenotype. For any kind of organism to develop and persist, we’d need that kind of distinction. This is more like the biochemistry of proto-life.

The prompts that live on these large (foundational) models are true natives of the digital information domain. They are now producing behavior that is not predictable based on the inputs in the way that arithmetic can be understood. Their behavior is more understandable in aggregate – use the same prompt 1,000 times and you get a distribution of responses. That’s more in line with how living things respond to a stimulus.
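Here is a minimal sketch of what “understandable in aggregate” can look like in practice. It makes no real LLM call: `toy_model` is a hypothetical stand-in that samples from invented continuations with invented weights, and `response_distribution` is an illustrative helper rather than part of any library.

```python
import random
from collections import Counter

# Invented continuations and weights, for illustration only.
CONTINUATIONS = {
    "there was a king": 0.6,
    "there was a robot": 0.3,
    "the end": 0.1,
}


def toy_model(prompt: str, rng: random.Random) -> str:
    """Stand-in for a real LLM call: ignores the prompt and samples one
    continuation at random, which is all this sketch needs."""
    options = list(CONTINUATIONS)
    weights = list(CONTINUATIONS.values())
    return rng.choices(options, weights=weights, k=1)[0]


def response_distribution(prompt: str, runs: int, seed: int = 0) -> Counter:
    """Send the same prompt `runs` times and tally the responses."""
    rng = random.Random(seed)
    return Counter(toy_model(prompt, rng) for _ in range(runs))


dist = response_distribution("Once upon a time", runs=1000)
for response, count in dist.most_common():
    print(f"{count:4d}  {response}")
```

No single response tells you much, but the tally is stable from run to run, which is the sense in which the behavior is easier to describe as a distribution than as a function.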

I think if we reorient ourselves from the metaphor that MACHINES ARE INTELLIGENT to PROMPTS ARE EARLY LIFE, we might find ourselves in a better position to understand what is currently going on in machine learning and make better decisions about what to do going forward.

Metaphorically, of course.

What gets acted on in the US?

I think I have a chart that explains somewhat how red states can easily avoid action on gun violence. It’s the number of COVID-19 deaths vs. gun deaths in Texas. This is a state that pushed back very hard against any public safety measures for the pandemic, even though the pandemic was killing roughly 10 times more citizens than guns were. I guess the question is “how many of which people will prompt state action? For anything?”

For comparison purposes, Texas had almost 600,000 registered guns in 2022 out of a population of 30 million, or just about 2% of the population if distributed evenly (source). This is probably about 20 times too low, since according to the Pew Center, gun ownership in Texas is about 45%. That percentage seems to be enough people to prevent almost any meaningful action on gun legislation. Though that doesn’t prevent the introduction of legislation to mandate bleeding control stations in schools in case of a shooting event.
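The “2%” and “about 20 times” figures follow directly from the numbers quoted above. A quick sketch, using only those figures and hypothetical variable names:

```python
# Figures as quoted in the post; none are independently verified here.
REGISTERED_GUNS = 600_000       # registered guns in Texas, 2022
POPULATION = 30_000_000         # approximate Texas population
SURVEYED_OWNERSHIP = 0.45       # ownership rate cited from the survey

# Share of the population if every registered gun had a different owner.
registered_share = REGISTERED_GUNS / POPULATION

# How far the registration-based share undershoots the surveyed rate.
undercount_factor = SURVEYED_OWNERSHIP / registered_share

# Number of owners the surveyed rate would imply.
implied_owners = SURVEYED_OWNERSHIP * POPULATION

print(f"registered share:  {registered_share:.1%}")
print(f"undercount factor: {undercount_factor:.1f}x")
print(f"implied owners:    {implied_owners:,.0f}")
```

The factor comes out a little above 20, so “about 20 times too low” is a fair rounding.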

So something greater than 2% and less than 45%. Just based on my research, I’d guess something between 10% and 20% mortality would be acted on, as long as the demographics of the powerful were affected in those percentages.

Foundation Ensembles

I’ve been working on creating an interactive version of my book using the GPT. This has entailed splitting the book into one text file per chapter, then trying out different versions of the GPT to produce summaries. This has been far more interesting than I expected, and it has some implications for foundational models.

The versions of the GPT I’ve been using are Davinci-003, GPT-3.5-turbo, and GPT-4. And they each have distinct “personalities.” Since I’m having them summarize my book, I know the subject matter quite well, so I’m able to get a sense of how well these models summarize something like 400 words down to 100. Overall, I like the Davinci-003 model the best for capturing the feeling of my writing, and the GPT-4 for getting more details. The GPT-3.5 falls in the middle, so I’m using it.

They all get some details wrong, but in aggregate, they are largely better than any single summary. That is some nice support for the idea that multiple foundational models are more resilient than any single model. It also suggests a path to making resilient foundational systems. Keep some of the old models around to use as an ensemble when the risks are greater.

Multiple responses also help with hallucinations. One of the examples I like to use to show this is to use the prompt “23, 24, 25” to see what the model generates. Most often, the response continues the series for a while, but then it will usually start to generate code – e.g. “23, 24, 25, 26, 27, 28];” – where it places the square bracket and semicolon to say that this is an array in a line of software. It has started to hallucinate that it is writing code.

The thing is, the elements that all the models agree on, in response to the same prompt repeated multiple times, are the elements most likely to be trustworthy. For a model, the “truth” is the common denominator, while hallucinations are unique.
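One way to sketch the “common denominator” idea, assuming responses can be split into simple elements and compared directly. The three responses below are invented continuations of the “23, 24, 25” prompt, and `elements`, `consensus`, and `outliers` are hypothetical helpers, not how any particular model or library does it.

```python
import re
from collections import Counter


def elements(response: str) -> list[str]:
    """Split a response into runs of digits, or runs of anything else
    that is not whitespace, a digit, or a comma."""
    return re.findall(r"\d+|[^\s\d,]+", response)


def consensus(responses: list[str]) -> list[str]:
    """Elements that appear in every response, in first-response order."""
    shared = set(elements(responses[0]))
    for response in responses[1:]:
        shared &= set(elements(response))
    return [e for e in dict.fromkeys(elements(responses[0])) if e in shared]


def outliers(responses: list[str]) -> set[str]:
    """Elements that appear in exactly one response."""
    counts = Counter(e for r in responses for e in set(elements(r)))
    return {e for e, n in counts.items() if n == 1}


# Invented responses, as if from three models or three runs.
responses = [
    "26, 27, 28, 29, 30",
    "26, 27, 28];",
    "26, 27, 28, 29",
]

print(consensus(responses))
print(outliers(responses))
```

With these made-up responses, the shared elements are 26, 27, and 28, while the stray `];` appears only once and lands in the outlier set. Real responses would need fuzzier matching than exact string equality, so treat this as the simplest possible version of the idea.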

This approach makes systems more resilient at the cost of keeping the old systems online. It doesn’t address how a deliberate attack on a foundational model could be handled. After all, an adversary would still have exploits for the earlier models and could apply them as well.

Still…

If all models lined up and started to do very similar things, that could be a sign that there was something fishy going on, and a cue for the human operators of these systems to start looking for the nefarious activity.
