Category Archives: Languages

An almost-from-scratch Python example of a simple neural network

Introduction

A rite of passage in understanding machine learning is writing your own network from scratch. This isn’t usually about making a better framework, it’s about figuring out what’s going on in all those frameworks. What follows is my contribution to this small but growing genre of programming literature.

The code is based on Andrew Trask’s Grokking Deep Learning (Github), which I’ll refer to as GDL below. No, I haven’t finished it, but this is the first major milestone, so I’m documenting it before I forget.

My background is that of a developer. I have been programming for a living since the 1980’s, mostly across the Object-Oriented landscape. I like classes and generalized solutions. It’s how I think, so this will be the framing that I use for this example.

There are two files: SimpleLayer.py, a class that handles the particulars of what a layer in a network needs to do, and simple_nn.py, a file that exercises that class by building a three-layer NN. We’ll walk through simple_nn.py first, which sets up and runs the network. Then we’ll walk through SimpleLayer, which handles training and backpropagation. At the bottom of the post are full code listings, which are also available on GitHub if you want to use this as a basis for experimentation.

simple_nn.py

Let’s start at the beginning:

import numpy as np
import matplotlib.pyplot as plt
import src.SimpleLayer as sl

One of the things that I try to do in these sort of exercises is to keep the amount of libraries to a minimum. For this I use two very vanilla imports, NumPy for math, and Matplotlib for diagrams of the weights changing over time.

Next, some global variables:

# variables ------------------------------------------
# The samples. Columns are the things we're sampling, rows are the samples
streetlights_array = np.array( [[ 1, 0, 1 ],
[ 0, 1, 1 ],
[ 0, 0, 1 ],
[ 1, 1, 1 ]])
num_streetlights = len(streetlights_array[0])
num_samples = len(streetlights_array)

# The data set we want to map to. Each entry in the array matches the corresponding streetlights_array row
walk_vs_stop_array = np.array([[1],
[1],
[0],
[0]])

Here is all the data about the lights. Each row is a sample. Each element in the row is a light. There are four rows in this set. They are matched to a classification of each row in the walk/stop array. These values come from GDL, where the premise is that you have a set of samples from three lights (streetlight_array), and a set of samples of actions that happen (walk_vs_stop_array).

lights

Figure 1: The lights and the behaviors from GDL

The goal is to train a network from the input streetlights  that produces the right walk/stop output. Now in a real network, we’d worry about overfitting and other related issues, but we’re going to ignore that here.

The next two variables are not in GDL. These are layer_array, which will contain the instances of the SimpleLayer class, and error_plot_mat, which will be used by pyplot to draw a chart of the error converging to zero. Or failing to, as the case may be.

# set up the dictionary that will store the numpy weight matrices
layer_array = []

error_plot_mat = [] # for drawing plots

There is one last bit of setup before we start doing things. There are three methods that will be used later in the program:

# Methods ---------------------------------------------
# activation function: sets all negative numbers to zero
# Otherwise returns x
def relu(x: np.array) -> np.array :
    return (x > 0) * x

# This is the derivative of the above relu function, since the derivative of 1x is 1
def relu2deriv(output: np.array) -> np.array:
    return 1.0*(output > 0) # returns 1 for input > 0
    # return 0 otherwise

# create a layer
def create_layer(layer_name: str, neuron_count: int, target: sl.SimpleLayer = None) -> 'SimpleLayer':
    layer = sl.SimpleLayer(layer_name, neuron_count, relu, relu2deriv, target)
    layer_array.append(layer)
    return layer

Let’s go through these one at a time. First, I’d like to say that as someone who likes compiling, strong typing, and all those components that keep me from doing dumb things, I am a fan of Python’s accommodation of typing. It could be better, but it helps:

  • def relu(x: np.array) -> np.array : 
    • This is an example of an activation function. It will be used in the layer to determine whether or not a value propagates through the neuron. In this case, all it does is clamp negative values to zero.
  • def relu2deriv(output: np.array) -> np.array:
    • This is the inverse of the above function, and returns a one (the slope of the line in relu()) if the value is greater than zero.
  • def create_layer(layer_name: str, neuron_count: int, target: sl.SimpleLayer = None) -> ‘SimpleLayer’:
    • This is what I use to create a layer. You pass in the name for your layer, how many neurons it has, and its ‘target’, or the layer below it. One of the things that I discovered when writing the SimpleLayer class is how intimately layers are connected. In this case, building any other layer than the last requires a target layer. This allows the weights that will manage the influence between the neurons in each layer to be set up properly. It could have just as easily been built from top to bottom, and pointed at the ‘source’ layer.
    • The other thing that this method does is to store the newly created layer in the layer_array, which makes experimenting with adding and deleting layers trivial.

Ok, let’s set up the layers in our network! Again, this is a reimplementation of the network built in GDL (chapter 6)

np.random.seed(0)

#set up the layers from last to first, so that there is a target layer
output = create_layer("output", 1)
output.set_neurons([1])

hidden = create_layer("hidden", 4, output)
hidden.set_neurons([1, 2, 3, 4])

input = create_layer("input", 3, hidden)
input.set_neurons([1, 2, 3])

for layer in reversed(layer_array):
    print ("--------------")
    print (layer.to_string())

First, we seed the random generator with a value so that these results are repeatable. It turns out that this network can be made to converge very fast or not to converge simply by picking a different seed. We’ll discuss what this implies later.

So what we’ve created is a stack of layers, built from bottom to top that looks like this:

  • Input layer (3 neurons): This is where the streetlights information will be loaded
  • Hidden layer (4 neurons): This layer mediates the interactions between the input and output layers, making these interactions nonlinear. It’s what allows deep neural networks to learn nonlinear, discontinuous functions from examples (And remember, that’s all that neural networks do. Though to be fair, it may be all our brains do, too…
  • Output layer (1 neuron): This is where the walk/stop values will be used to adjust the output that is generated starting with the original, random weights.

We then print out the contents of each layer. One quick note – I load the neurons up with sequential integers (e.g. [[1. 2. 3.]]). These values get overridden when the system is run, so it’s just a way to quickly verify the as-built neurons :

--------------
layer input: 
target = hidden
source = no source
neurons (row) = [[1. 2. 3.]]
weights (row) = 
[[-0.1526904   0.29178823 -0.12482558  0.783546  ]
 [ 0.92732552 -0.23311696  0.58345008  0.05778984]
 [ 0.13608912  0.85119328 -0.85792788 -0.8257414 ]]
--------------
layer hidden: 
target = output
source = input
neurons (row) = [[1. 2. 3. 4.]]
weights (row) = 
[[0.09762701]
 [0.43037873]
 [0.20552675]
 [0.08976637]]
--------------
layer output: 
target = no target
source = hidden
neurons (row) = [[1.]]
weights (row) = 
None

I think this output is pretty obvious, aside from the weights, so let’s look at them more closely. But first, a short digression.

Normally when I see diagrams and descriptions of connected layers of neurons, I usually see something like this:

FullyConnected

Figure 2: The typical neural network diagram

As you can see, each neuron in the input layer is connected to each neuron in the hidden layer and so on through to the output layer. And that’s nice conceptually, but as a developer, I have no understanding of the mechanics of what’s happening. Here’s what it really looks like. For clarity, only the interactions involving input neuron 1 and hidden neuron 4 are shown, but the process is identical:

FullyConnectedWeights

Figure 3: How the input and hidden layer are actually connected

In this case, we’re looking at the mapping between the input layer and the hidden layer. Each neuron in the input layer gets its own row of weights, let’s say [0.1, 0.2, 0.0, 0.5] for neuron one. If that neuron is set to “10”, then a value of 1.0 will go to hidden neuron 1, a value of 2 to hidden neuron 2, and a value of 5 to hidden neuron 4. This process is repeated for each neuron in the input layer, and the value is added to the associated hidden neuron.

That’s what we mean when we talk about fully connected layers. Everything is mediated through an adjacency matrix of weights. We’ll revisit this in more detail when we walk through SimpleLayer.  So in the listing above the two figures, the randomly initialized weights are organized so that the each neuron in the layer has its own row. Each entry in that row is the scalar value that the row’s neuron value will  be multiplied by as it is accumulated in the target’s neuron. Source neurons are the row component. Target neurons are the column component.

Next is the body of the program:

alpha = 0.2
iter = 0
max_iter = 1000
epsilon = 0.001
error = 2 * epsilon
while error > epsilon:
    error = 0
    for sample_index in range(num_samples):
        input.set_neurons(streetlights_array[sample_index])
        for layer in reversed(layer_array):
            layer.train()

        delta = output.calc_delta(walk_vs_stop_array[sample_index])
        sample_error = np.sum(delta ** 2)
        error += sample_error

        for layer in layer_array:
            layer.learn(alpha)

        # Gather data for the plots
        error_plot_mat.append([sample_error])
        # print("{}.{} Error = {:.5f}".format(iter, sample_index, sample_error))

    error /= num_samples
    if (iter % 10) == 0 :
        print("{} Error = {:.5f}".format(iter, error))
    iter += 1
    # stop even if we don't converge
    if iter > max_iter:
        break

Let’s talk about the local variables first. The first variable, alpha, is the learning rate that we pass in. It’s a scalar that limits the step size in the change in weights. The bigger the scalar, the more likely to overshoot the goal and go into oscillation around it. The smaller the goal, the longer the approach will take, but the greater the chance that it will stabilize. Like the seed we use to set the random number generator, the number of layers, and the number of neurons per layer, this is a hyperparameter. Some, like alpha, result in predictable behavior. Others, like seed, do not. There is a lot of this in deep learning, and you need to be careful about it. In particular, testing the resiliency of the solution by running it with a variety of nonlinear hyperparameters to see if the results are consistent is probably a good idea, though it sucks up compute resources.

The rest of the variables are used for loop control:

  • iter is the current count of times through the loop
  • max_iter is the maximum times we’ll run through the loop, even if we don’t converge
  • epsilon is the error threshold. If the error drops below that, we’re done.
  • error is the sum of the squares of all the output neurons (in this case, one). We initialize it to a value that gets us into the loop
while error > epsilon:
    error = 0
    sample_error_array = []
    for sample_index in range(num_samples):
        input.set_neurons(streetlights_array[sample_index])

This is the main loop. First, we’re going to loop until our error is small. Error is computed by sample, so we need to know what the average (or max – I use average here) error is for each iteration. We also want to save the individual errors by sample for later plotting.

Within each loop, we’re going to evaluate the input streetlights sample against the output walk/stop sample. The first step in this process is to set the input neurons. This is where the input.set_neurons([1, 2, 3]) that we did when we were creating the layers gets overridden. In the training, the output from this layer will overwrite the values in the next layer and so on.

for layer in reversed(layer_array):
    layer.train()

This is the training step. We’ll go into more detail when we walk through SimpleLayer, but for now not that we set through all the layers from the top to the bottom, in reversed order from how they were created and loaded into layer_array.

delta = output.calc_delta(walk_vs_stop_array[sample_index])
sample_error = np.sum(delta ** 2)
error += sample_error

This is where we calculate the array of deltas that are the difference between the goal of the walk/stop array and the output neurons. the error is the sum of the squares of all those deltas. SoS is nice because it’s always positive.

for layer in layer_array:
    layer.learn(alpha)

# Gather data for the plots
sample_error_array.append(sample_error)
# print("{}.{} Error = {:.5f}".format(iter, sample_index, sample_error))

Learning is done from bottom to top, using the deltas stored in the output layer. These are backpropagated through the layers, and the changes in the weights are scaled to 20% of the calculated values so we settle nicely.

We also gather the error data (for each streetlight-walk/stop sample) into a matrix that we can print out when we’re done. If we want to, we can print the error for each sample in the training. Some converge faster than others, but this is not the best way to see that.

error /= num_samples
if (iter % 10) == 0 :
    print("{} Error = {:.5f}".format(iter, error))
iter += 1
# stop even if we don't converge
if iter > max_iter:
    break

At the bottom of the loop, we calculate the average error over all the samples. We then see if we’ve been here too long and break if we are, regardless of whether we’ve converged or not. And lastly, this is how I like to print formatted strings in Python (essentially the same as “%.5f” in Java/C/etc).

Once the loop terminates, we need to see how well the network has learned. As I said earlier, in a real machine learning situation we would be careful about issues such as overfitting by, for example, training against one set of data and testing against another. But since this is a toy problem, so we are simply going to see how it did with the training data. I’ve added some explicit variables for clarity:

  • prediction: The contents of the single neuron in the output layer
  • observed: The value in the walk/stop array that we’re evaluating against
  • accuracy: how close did we get?
print("\n--------------evaluation")
for sample_index in range(len(streetlights_array)):
    input.set_neurons(streetlights_array[sample_index])
    for layer in reversed(layer_array):
        layer.train()
    prediction = float(output.neuron_row_array)
    observed = float(walk_vs_stop_array[sample_index])
    accuracy = 1.0 - abs(prediction - observed)
    print("sample {} - input: {} = pred: {:.3f} vs. actual:{} ({:.2f}% accuracy)".
          format(sample_index, input.neuron_row_array, prediction, observed, accuracy*100.0))

Since the network is already set up with weights, all we need to do is to see how well our inputs match to our outputs. All this means is to take a set of inputs and run them forward to the model. There will be no learning via backpropagation.

So let’s see how we did!

0 Error = 0.35238
10 Error = 0.29001
20 Error = 0.19074
30 Error = 0.12883
40 Error = 0.04666
50 Error = 0.00544

--------------evaluation
sample 0 - input: [[1. 0. 1.]] = pred: 0.978 vs. actual:1.0 (97.78% accuracy)
sample 1 - input: [[0. 1. 1.]] = pred: 1.000 vs. actual:1.0 (100.00% accuracy)
sample 2 - input: [[0. 0. 1.]] = pred: 0.037 vs. actual:0.0 (96.27% accuracy)
sample 3 - input: [[1. 1. 1.]] = pred: 0.000 vs. actual:0.0 (99.95% accuracy)

As you can see, the values converge in less than 60 iterations, and the predictions are quite close. For the second and fourth stoplight pattern, the results are basically exact (100% and 99.95%). That’s not bad for a bunch of random numbers and two simple rules.

These are the kinds of outputs that you get with heavyweight packages like Keras. It’s helpful (We trained successfully! Horay!). And these types of outputs make sense when models are huge – or even bigger toy problems like MNIST (which we will explore in a future post).

But this is toy code for a toy problem so we can show more than that. Being able to visualize what’s going on is very helpful. That’s why the error for each step has been saved in error_plot_mat.

Plotting data like this in Python is one of the joys of using the language. Here’s what it takes:

# plots ----------------------------------------------
fig_num = 1
f1 = plt.figure(fig_num)
plt.plot(error_plot_mat)
plt.title("error")
names = []
for i in range(num_samples):
        names.append("sample_{}".format(i))
names.append("average")
plt.legend(names)

for layer in reversed(layer_array):
    if layer.target != None:
        fig_num += 1
        layer.plot_weight_matrix(fig_num)

for layer in reversed(layer_array):
    fig_num += 1
    layer.plot_neuron_matrix(fig_num)

plt.show()

We are going to be creating a bunch of plots. One for the error, and then one for each set of neurons and their weights. We’ll get back to the layer plots when we’re walking through SimpleLayer, but here’s a plot of all the errors, by sample and average for the entire training session:

outputerror

Figure 4: Error for each sample

Some things worth noting are this is not a linear process. There are times where the learning process is pretty slow, particularly at the beginning in this example. The second observation is that zero error happens much sooner for some samples than others. The first sample with zero error happens around step 150 (iteration 37 or so of the main loop). If the exit condition were based on looking at one sample instead of the average of all the sample errors, the system could exit early. I had this happen when I was using sample_error rather than error in the exit condition. It took a while to figure out why some seed values behaved so differently from others….

And that ends the tour of the main loop. Next, we’ll look at how a layers interact to train and learn.

SimpleLayer

The previous section is roughly equivalent to a Keras, Torch, or other machine learning framework. You get an idea of the behavior of a system and how the construction affects the output, but the details of the implementation are hidden. In this section, we’re going to look at the creation of a layer in detail – the ways they are connected and the ways that they communicate. As with the walkthrough of the main loop, we’ll start with the construction of the layer, then the forward learning process, the training backpropagation process, and graph what’s going on.

Construction

As with simple_nn.py, SimpleLayer is written to have very few dependencies. I actually struggled with whether or not to write my own matrix math, but I think NumPy is pretty clear, and it would get distracting with all the additional code.

import numpy as np
import matplotlib.pyplot as plt
import types
import typing

There are some class-wide variables that we should describe:

class SimpleLayer:
    name = "unset"
    neuron_row_array = None
    neuron_col_array = None
    weight_row_mat = None
    weight_col_mat = None
    plot_mat = [] # for drawing plots
    num_neurons = 0
    delta = 0 # the 'movement' scalar
    target = None
    source = None
    activation_func = None
    derivative_func = None

In order of declaration, these are

  • name: the string name of the layer. Used in printing and surprisingly useful in debugging
  • neuron_row_array: the neurons in row form (i.e. [[n1, n2, n3, … , nN])
  • neuron_col_array: the transpose of neuron_row_array (i.e. [[n1], [n2], [n3], … ,[nN]]. We need the data in both forms for interactions between layers
  • weight_row_mat: the weights in row format, as above
  • weight_col_mat: the weights in column format, as above
  • weight_history_mat: where the weight data from each training pass is stored for plotting
  • neuron_history_mat: where the neuron data from each training pass is stored for plotting
  • num_neurons: the number of neurons in this layer
  • delta: the scalar that changes the size of the “step” this layer takes as it tries to converge on the goal. Passed in as alpha in simple_nn.py
  • target: the layer “below” this layer. May be NULL
  • source: the layer “above” this layer. May be NULL
  • activation_func: the function that controls the nonlinearity of the training process. Passed in as relu() from simple_nn 
  • derivative_func: the function used in backpropagation that is the derivative of the activation function. Passed in as relu2deriv() in simple_nn

Next is the initialization, which is done through the constructor:

# set up the layer with the number of neurons, the next layer in the sequence, and the activation/backprop functions
def __init__(self, name, num_neurons: int, activation_ptr: types.FunctionType, deriv_ptr: types.FunctionType, target: 'SimpleLayer' = None):
    self.reset()
    self.activation_func = activation_ptr
    self.derivative_func = deriv_ptr
    self.name = name
    self.num_neurons = num_neurons
    self.neuron_row_array = np.zeros((1, num_neurons))
    self.neuron_col_array = np.zeros((num_neurons, 1))
    # We only have weights if there is another layer below us
    for i in range(num_neurons):
        self.neuron_history_mat.append([])
    if(target != None):
        self.target = target
        target.source = self
        self.weight_row_mat = 2 * np.random.random((num_neurons, target.num_neurons)) - 1
        self.weight_col_mat = self.weight_row_mat.T

This takes the values supplied in the create_layer() method in simple_nn.py and bulds the layer. Once the local variables are set, the matricies of neurons are created.

If there is a target, the two layers are connected. What this means is that the source layer creates a numpy matrix that has as many rows as the source neurons and as many columns as the target neurons (See figure 3). This matrix is the weights that are used to uniquely distribute the value of each neuron in the source layer to each neuron in the target layer. As with the neurons, this is stored in row and column form.

Once each layer is set up, we are ready to begin the training process.

Training

Training a neural network is the process of take a set of input values and sending them through the entire network to get an output. We can compare that output to the desired value, and then adjust. Using the mechanism of a deep neural network allows us to build a system that can map many input values to a desired output value. In this case, we’re looking at three values in an array, but using exactly the same structure, we can increase the number of values to be the pixels in an image and the output to be the label for that image:

cfar-10

Figure 5: The CFAR-10 Dataset

That takes more layers and some other tricks, but the basic technique is the same.

Ok, back to three values in an array that represent some streetlights. To get this into the input layer, we use the set_neurons() method:

# Fill neurons with values
def set_neurons(self, val_list: typing.List):
    # print("cur = {}, input = {}".format(self.neuron_array, val_list))
    for i in range(0, len(val_list)):
        self.neuron_row_array[0][i] = val_list[i]
    self.neuron_col_array = self.neuron_row_array.T

The numpy neuron arrays are actually two-dimensional arrays that are one element deep. This supports numpy array math like dot product and transpose. That’s why the awkward syntax where we take the val_list and set the neurons to those values. We then take the transpose immediately so that I don’t have to wonder if it’s been done already.

The next step is to ripple the values through the network layers:

def train(self):
    # if not the bottom layer, we can record values for plotting
    if(self.target != None):
        self.weight_history_mat.append(self.nparray_to_list(self.weight_row_mat))

    # if we're not the top layer, propagate weights
    if self.source != None:
        src = self.source
        # set our neuron values as the dot product of the source neurons, and the source weights
        self.neuron_row_array = np.dot(src.neuron_row_array, src.weight_row_mat)

        # No activation function to output layer
        if(self.target != None):
            # Adjust the values based on the activation function. This introduces nonlinearity.
            # For example, the relu function clamps all negative values to zero
            self.neuron_row_array = self.activation_func(self.neuron_row_array)

        # Transpose the neuron array and save for learn()
        self.neuron_col_array = self.neuron_row_array.T

    # record values for plotting
    for i in range(self.num_neurons):
        self.neuron_history_mat[i].append(self.neuron_row_array[0][i])

We start to see how intimately the layers are connected in this method. We look to the target and source layers to adjust our behaviors and set values.

Since this is the top layer, we have no source. That means that record our weights for later plotting and we’re done. The layer below us will set its neurons based this layer’s weights and neurons, as handled in this line:

self.neuron_row_array = np.dot(src.neuron_row_array, src.weight_row_mat)

This is just the first step. If we’re not the bottom layer, we have to see if the neuron values make it past the activation function that we set in simple_nn.py:

# activation function: sets all negative numbers to zero
# Otherwise returns x
def relu(x: np.array) -> np.array :
    return (x > 0) * x

This is done with these lines:

# No activation function to output layer
if(self.target != None):
    # Adjust the values based on the activation function. This introduces nonlinearity.
    # For example, the relu function clamps all negative values to zero
    self.neuron_row_array = self.activation_func(self.neuron_row_array)

By running these same methods on each successive layer object, the streetlight values are slowly, and nonlinearly (in multi-layer networks, which is critical) modified to produce a single output. Unfortunately, that output is guaranteed to be wrong, since it’s based on multiplying the input values by a bunch of random values that we set up each layer with.

Time to fix that.

Learning

Back in simple_nn.py, between the train() and the learn() loops is this line:

delta = output.calc_delta(walk_vs_stop_array[sample_index])

The delta saves out the error for the plotting. The function sets up the values for the learning step:

def calc_delta(self, goal: np.array) -> float:
    self.delta = goal - self.neuron_row_array
    return self.delta

self.delta is a numpy array that stores the difference between the goal(s) and the current value. In this case, there is only one value, but this also works with multiple values. That’s another trick that gets used in training networks. For example, in handling the CIFAR images, there is an output neuron for each category (e.g. horse, automobile, truck, ship, etc.). In out toy example and in the CIFAR case, the goal is a one or zero in the output neuron(s). The delta is the difference between the computed value and the goal. That delta is what we will now backpropagate through the layers, from back to front. And that’s the learning process.

In learning, the basic goal is to adjust the weights that set this layer’s neurons (in this implementation, the source layer). This is done by backpropagating the error delta from this layer to the source layer. Since we only want to adjust the weights that participated in the training, we need to take the derivative of the activation function in train(). Again, the weight matrix is simply the source neurons times this layer’s neurons. For example, if the source layer had three neurons and this layer had four, then the (source) weight matrix would be 3*4 = 12 weights. The whole method is shown below.

def learn(self, alpha):
    if self.source != None:
        src = self.source
        delta_scalar = np.dot(self.delta, src.weight_col_mat)
        delta_threshold = self.derivative_func(src.neuron_row_array)
        src.delta = delta_scalar * delta_threshold
        mat = np.dot(src.neuron_col_array, self.delta)
        src.weight_row_mat += alpha * mat
        src.weight_col_mat = src.weight_row_mat.T

There’s a lot going on here, so let’s go through it slowly:

def learn(self, alpha):
    # if there is a layer above us
    if self.source != None:
        src = self.source

Since weights exist between neurons, we once more have the intimate relationship between this layer’s neurons and the layer above this layer. If there is no layer above us, there is literally nothing to do, which is why this test is first.

delta_scalar = np.dot(self.delta, src.weight_col_mat)

Next, we calculate the error delta scalar array, which is the amount the source layer needs to change (set initially in the output layer’s  calc_delta(), then rippled up through the layers), multiplied across the weights used to set this layer’s neurons (in the source).

delta_threshold = self.derivative_func(src.neuron_row_array)

In the train() process, we distributed the values in a non-linear way – any neuron value below zero was not distributed. (the relu() function from simple_nn.py). That process needs to be mirrored in the backpropagation process. There is always a matched pair of methods that make the core of a neural network – the activation function, and the derivative function.

src.delta = delta_scalar * delta_threshold

This is where the actual change for the source layer is calculated. It’s the product of the delta_scalar and the delta_threshold that we’ve just calculated. This is where the decision process of the derivitive_func() is scaled to the desired amount (the alpha value that we pass in from simple_nn.py). This value will be used when the learn() method is called for the source layer. Like I said, layers are intimately connected.

We now take the self.delta that was calculated in our target layer’s learn() method, and use it to adjust the weights in the source layer that will be used to set our neuron’s values on the next train() pass.

mat = np.dot(src.neuron_col_array, self.delta)
src.weight_row_mat += alpha * mat
src.weight_col_mat = src.weight_row_mat.T

This matrix (mat) contains the adjustments for the source layer’s weights. We want to add a fraction of these (or we won’t converge) values, so we multiply by alpha. The last step is simply making the transpose of the weight matrix.

And that’s pretty much the guts of this implementation. The important things to remember are:

  • Input and output layers are special cases. The neurons are explicitly set in the input layer and there is no activation or derivative function applied to the output neurons (no, I don’t know why yet. When I figure that out, I’ll explain why here)
  • In training, the current layer’s neuron’s values are set by multiplying the source neurons by the source weights.
  • In learning, the source layer’s weights are adjusted by the current layer’s deltas, but thresholded by the derivative of the source layer’s neurons

This is pretty complicated, and I’ve split out the steps so that it’s possible to step through the running code in the debugger and see what’s going on with the values. But that only gives a level of insight at a single step. how can we show the global behavior of a layer?

Graphing

We are going to graph both the changing value of the neurons and the evolving weights. The neurons are an easier problem so we’ll start there:

def plot_neuron_matrix(self, fig_num: int):
    title = "{} neuron history".format(self.name)
    plt.figure(fig_num)
    np_mat = np.array(self.neuron_history_mat)

    plt.plot(np_mat.T, '-o', linestyle=' ', ms=2)

    names = []
    for i in range(self.num_neurons):
        names.append("neuron {}".format(i))
    plt.legend(names)
    plt.title(title)

This method simply takes the history matrix (where we had a column for each time sample), turns it into a numpy array for easier manipulation and plotting, and plots the transpose (where each neuron’s history is a row). Because the neuron’s values change for each sample, the history of how they converge towards the final values doesn’t show up well with lines, so I set the drawing arguments to points ‘-o’, no line ‘ ‘ , with a point size of 2 pixels (ms=2):

Figure 6: Neuron Histories by layer (click to embiggen)

In the input layer, we see that the neurons are either one or zero, just as we set them. These values are then multiplied by the (initially random) weights and further adjusted by the activation function. Those values ripple through the hidden layer, where they are initially random overthe (0, 1) interval where they are then used to adjust the output neurons (which does not involve an activation function). Over time, you can see the system settle into a state where all neurons are either one or zero, depending on the inputs. So how do the weights achieve this?

Even in this toy system, there are still a lot of weights to keep track of, and I’m still working on a way of visualizing the process. I’m visualizing the weights instead of the neurons, because the weights are the “factors in the equation” that manipulate the “x” values to get a “y”. On other words, I’m watching how the “m” and “b” converge on their values in “y = mx + b”, rather than looking at a particular “x” or “y”.

The method that does this is plot_weight_matrix(), which assembles a chart for each set of weights and is called at the end of simple_nn.py:

def plot_weight_matrix(self, fig_num: int):
    var_name = "weight"
    title = "{} to {} {}".format(self.name, self.target.name, var_name)
    plt.figure(fig_num)
    np_mat = np.array(self.weight_history_mat)

    i = 0
    for row in np_mat.T:
        cstr = "C{}".format(i % self.num_neurons)
        plt.plot(row, linewidth = int(i / self.num_neurons)+1, color=cstr)
        i += 1

    names = []
    num_weights = self.num_neurons * self.target.num_neurons
    for i in range(num_weights):
        src_n = i % self.num_neurons
        targ_n = int(i/self.num_neurons)
        names.append("{} s:t[{}:{}]".format(var_name, src_n, targ_n))
    plt.legend(names)
    plt.title(title)

One of the reasons that I really like OO programming is that so much useful data is associated with the object. You don’t have to go looking for it, or scope things in peculiar ways. As a result, for example, generating the title is simply assembling some strings that I already have lying around.

title = "{} to {} {}".format(self.name, self.target.name, var_name)

The next important step is to get the data that we’ve been assembling in train() into a form that the plotting library likes. The data has been assembled in an list of lists, where each individual list is a snapshot of the weights at one step in the training process. I do it this way because of two reasons:

  1. I don’t know how many steps this process is going to take, and python lists handle dynamic memory allocation nicely.
  2. The weight matrix is a 2D NumPy array, and dealing with a series of matricies is something that PyPlot has no idea how to handle.

Here’s the line from train():

if(self.target != None):
    self.plot_mat.append(self.nparray_to_list(self.weight_row_mat))

PyPlot doesn’t really like to handle lists of lists, but it does know how to handle one big NumPy array, so we convert the list of lists to a matrix where the rows are the weights, and the columns are the timesteps:

np_mat = np.array(self.plot_mat)

At this point we could simply plot everything:

plt.plot(np_mat)

That produces a pretty chart:

input_to_hidden

Figure 7: First pass at drawing a lot of weights

But it’s pretty confusing. There are a lot of lines.  So I did two related things. I set line thickness to be a function of which target neuron and the color of the line to be a function of which source neuron. Using the same scheme, I built the a legend to indicate the source and target neurons that identify each weight, using the coordinates of the matrix – basically treating it as an adjacency matrix.

i = 0
for row in np_mat.T:
    cstr = "C{}".format(i % self.num_neurons)
    plt.plot(row, linewidth = int(i / self.num_neurons)+1, color=cstr)
    i += 1

names = []
num_weights = self.num_neurons * self.target.num_neurons
for i in range(num_weights):
    src_n = i % self.num_neurons
    targ_n = int(i/self.num_neurons)
    names.append("{} s:t[{}:{}]".format(var_name, src_n, targ_n))
plt.legend(names)
plt.title(title)

And that gives a chart that lets us examine what’s going on. All the blue lines are the weights that adjust the value coming from source neuron one, distributed over target neurons [0, 1, 2, 3]. All the thin lines are all the weights that set the value of target neuron one from source neurons [0, 1, 2]:

input2hidden

Figure 8: Weights between input and hidden layers

We now have a way to visualize the whole process inside the layers. Let’s see if we can learn anything by looking at how the neurons and weights coevolve over time.

Some final thoughts

I think the fundamental lesson here is one of gradient descent (or hill climbing if you prefer) from a random initial state to a stable set of values that will set the variables in a function. Once those values are found, the function can do it’s job, which in this case is taking a set of observations – ([ 1, 0, 1 ], [ 0, 1, 1 ], [ 0, 0, 1 ], [ 1, 1, 1 ]) and transforming them to a different set of values – ([1], [1], [0], [0]).

Figure 9: Weights influencing neurons

This is at its core stochastic, a mechanism for harnessing randomness by using rules. The weights and neurons exist in a constrained, multidimensional space. Much of this is fixed before a single iteration – the number of neurons and how they are arranged. The types of connections (activation and derivative functions). The initial value of the weights. Even the manner of input and the “fitness test” that determines the error that is measured. Within these constraints, the weights move slowly under multiple influences until they settle into places that they are no longer forced to move. That’s it.

Variations in this system can be used for all kinds of things, ranging from image recognition to generating words, but the basic process is always the same.

I hope this helped you to read as much it helped me to write!

Full code listings

For the most current versions, please use the GitHub repo, but these are up to date as of January 10, 2019

simple_nn.py

'''
Copyright 2019 Philip Feldman

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and 
associated documentation files (the "Software"), to deal in the Software without restriction, including 
without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 
copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the 
following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial 
portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT 
LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO 
EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR 
THE USE OR OTHER DEALINGS IN THE SOFTWARE.
'''

# based on https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter6%20-%20Intro%20to%20Backpropagation%20-%20Building%20Your%20First%20DEEP%20Neural%20Network.ipynb
import numpy as np
import matplotlib.pyplot as plt
import src.SimpleLayer as sl

# Methods ---------------------------------------------
# activation function: sets all negative numbers to zero
# Otherwise returns x
def relu(x: np.array) -> np.array :
    return (x > 0) * x

# This is the derivative of the above relu function, since the derivative of 1x is 1
def relu2deriv(output: np.array) -> np.array:
    return 1.0*(output > 0) # returns 1 for input > 0
    # return 0 otherwise

# create a layer
def create_layer(layer_name: str, neuron_count: int, target: sl.SimpleLayer = None) -> 'SimpleLayer':
    layer = sl.SimpleLayer(layer_name, neuron_count, relu, relu2deriv, target)
    layer_array.append(layer)
    return layer

# variables ------------------------------------------
np.random.seed(0)
alpha = 0.2
# the samples. Columns are the things we're sampling, rows are the samples
streetlights_array = np.array( [[ 1, 0, 1 ],
                                [ 0, 1, 1 ],
                                [ 0, 0, 1 ],
                                [ 1, 1, 1 ]])
num_streetlights = len(streetlights_array[0])
num_samples = len(streetlights_array)

# The data set we want to map to. Each entry in the array matches the corresponding streetlights_array row
walk_vs_stop_array = np.array([[1],
                               [1],
                               [0],
                               [0]])

# set up the dictionary that will store the numpy weight matrices
layer_array = []

error_plot_mat = [] # for drawing plots

#set up the layers from last to first, so that there is a target layer
output = create_layer("output", 1)
output.set_neurons([1])
''' # If we want to have four layers (two hidden), use this and comment out the other hidden code below
hidden2 = create_layer("hidden2", 2, output)
hidden2.set_neurons([1, 2])
hidden = create_layer("hidden", 4, hidden2)
hidden.set_neurons([1, 2, 3, 4])
'''
# If we want to have three layers (one hidden), use this and comment out the other hidden code above
hidden = create_layer("hidden", 4, output)
hidden.set_neurons([1, 2, 3, 4])

input = create_layer("input", 3, hidden)
input.set_neurons([1, 2, 3])

for layer in reversed(layer_array):
    print ("--------------")
    print (layer.to_string())

iter = 0
max_iter = 1000
epsilon = 0.001
error = 2 * epsilon
while error > epsilon:
    error = 0
    sample_error_array = []
    for sample_index in range(num_samples):
        input.set_neurons(streetlights_array[sample_index])
        for layer in reversed(layer_array):
            layer.train()

        delta = output.calc_delta(walk_vs_stop_array[sample_index])
        sample_error = np.sum(delta ** 2)
        error += sample_error

        for layer in layer_array:
            layer.learn(alpha)

        # Gather data for the plots
        sample_error_array.append(sample_error)
        # print("{}.{} Error = {:.5f}".format(iter, sample_index, sample_error))

    error /= num_samples
    sample_error_array.append(error)
    error_plot_mat.append(sample_error_array)
    if (iter % 10) == 0 :
        print("{} Error = {:.5f}".format(iter, error))
    iter += 1
    # stop even if we don't converge
    if iter > max_iter:
        break

print("\n--------------evaluation")
for sample_index in range(len(streetlights_array)):
    input.set_neurons(streetlights_array[sample_index])
    for layer in reversed(layer_array):
        layer.train()
    prediction = float(output.neuron_row_array)
    observed = float(walk_vs_stop_array[sample_index])
    accuracy = 1.0 - abs(prediction - observed)
    print("sample {} - input: {} = pred: {:.3f} vs. actual:{} ({:.2f}% accuracy)".
          format(sample_index, input.neuron_row_array, prediction, observed, accuracy*100.0))

# plots ----------------------------------------------
fig_num = 1
f1 = plt.figure(fig_num)
plt.plot(error_plot_mat)
plt.title("error")
names = []
for i in range(num_samples):
        names.append("sample_{}".format(i))
names.append("average")
plt.legend(names)

for layer in reversed(layer_array):
    if layer.target != None:
        fig_num += 1
        layer.plot_weight_matrix(fig_num)

for layer in reversed(layer_array):
    fig_num += 1
    layer.plot_neuron_matrix(fig_num)

plt.show()

SimpleLayer.py

'''
Copyright 2019 Philip Feldman

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and 
associated documentation files (the "Software"), to deal in the Software without restriction, including 
without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 
copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the 
following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial 
portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT 
LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO 
EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR 
THE USE OR OTHER DEALINGS IN THE SOFTWARE.
'''
# based on https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter6%20-%20Intro%20to%20Backpropagation%20-%20Building%20Your%20First%20DEEP%20Neural%20Network.ipynb
import numpy as np
import matplotlib.pyplot as plt
import types
import typing

# methods --------------------------------------------
class SimpleLayer:
    name = "unset"
    neuron_row_array = None
    neuron_col_array = None
    weight_row_mat = None
    weight_col_mat = None
    weight_history_mat = [] # for drawing plots
    neuron_history_mat = []
    num_neurons = 0
    delta = 0 # the amount to move the source layer
    target = None
    source = None
    activation_func = None
    derivative_func = None

    # set up the layer with the number of neurons, the next layer in the sequence, and the activation/backprop functions
    def __init__(self, name, num_neurons: int, activation_ptr: types.FunctionType, deriv_ptr: types.FunctionType, target: 'SimpleLayer' = None):
        self.reset()
        self.activation_func = activation_ptr
        self.derivative_func = deriv_ptr
        self.name = name
        self.num_neurons = num_neurons
        self.neuron_row_array = np.zeros((1, num_neurons))
        self.neuron_col_array = np.zeros((num_neurons, 1))
        # We only have weights if there is another layer below us
        for i in range(num_neurons):
            self.neuron_history_mat.append([])
        if(target != None):
            self.target = target
            target.source = self
            self.weight_row_mat = 2 * np.random.random((num_neurons, target.num_neurons)) - 1
            self.weight_col_mat = self.weight_row_mat.T

    def reset(self):
        self.name = "unset"
        self.target = None
        self.neuron_row_array = None
        self.neuron_col_array = None
        self.weight_row_mat = None
        self.weight_col_mat = None
        self.weight_history_mat = [] # for drawing plots
        self. neuron_history_mat = []
        self.num_neurons = 0
        self.delta = 0 # the amount to move the source layer
        self.target = None
        self.source = None

    # Fill neurons with values
    def set_neurons(self, val_list: typing.List):
        # print("cur = {}, input = {}".format(self.neuron_array, val_list))
        for i in range(0, len(val_list)):
            self.neuron_row_array[0][i] = val_list[i]
        self.neuron_col_array = self.neuron_row_array.T

    def get_plot_mat(self) -> typing.List:
        return self.weight_history_mat

    # In training, the basic goal is to set a value for the layer's neurons, based on the weights in the source layer mediated by an activation function.
    # This matrix is simply the source neurons times this layer's neurons. For example, if the source layer had three neurons and this layer had four, then
    # the (source) weight matrix would be 3*4 = 12 weights.
    def train(self):
        # if not the bottom layer, we can record values for plotting
        if(self.target != None):
            self.weight_history_mat.append(self.nparray_to_list(self.weight_row_mat))

        # if we're not the top layer, propagate weights
        if self.source != None:
            src = self.source
            # set our neuron values as the dot product of the source neurons, and the source weights
            self.neuron_row_array = np.dot(src.neuron_row_array, src.weight_row_mat)

            # No activation function to output layer
            if(self.target != None):
                # Adjust the values based on the activation function. This introduces nonlinearity.
                # For example, the relu function clamps all negative values to zero
                self.neuron_row_array = self.activation_func(self.neuron_row_array)

            # Transpose the neuron array and save for learn()
            self.neuron_col_array = self.neuron_row_array.T

        # record values for plotting
        for i in range(self.num_neurons):
            self.neuron_history_mat[i].append(self.neuron_row_array[0][i])


    # In learning, the basic goal is to adjust the weights that set this layer's neurons (in this implementation, the source layer). This is done
    # by backpropagating the error delta from this layer to the source layer. Since we only want to adjust the weights that participated in the
    # training, we need to take the derivative of the activation function in train(). Again, the weight matrix is simply the source neurons times
    # this layer's neurons. For example, if the source layer had three neurons and this layer had four, then the (source) weight matrix would be 3*4 = 12 weights.
    def learn(self, alpha):
        # if there is a layer above us
        if self.source != None:
            src = self.source

            # calculate the error delta scalar array, which is the amount this layer needs to change,
            # multiplied across the weights used to set this layer (in the source)
            delta_scalar = np.dot(self.delta, src.weight_col_mat)

            # determine the backpropagation distribution. In the case of Relu, it's just one or zero
            delta_threshold = self.derivative_func(src.neuron_row_array)

            # set the amount the source layer needs to change, based on this layer's delta distributed over the source
            # neurons
            src.delta = delta_scalar * delta_threshold

            # create the weight adjustment matrix by taking the dot product of the source layer's neurons (as columns) and the
            # scaled, thresholded  row of deltas based on this layer's error delta and the source's weight layer
            mat = np.dot(src.neuron_col_array, self.delta)

            # add some percentage of the weight adjustment matrix to the source weight matrix
            src.weight_row_mat += alpha * mat
            src.weight_col_mat = src.weight_row_mat.T

    # given one or more goals (that match the number of neurons in this layer), determine the delta that, when added to the
    # neurons, would reach that goal
    def calc_delta(self, goal: np.array) -> float:
        self.delta = goal - self.neuron_row_array
        return self.delta

    # helper function to turn a NumPy array to a Python list
    def nparray_to_list(self, vals: np.array) -> typing.List[float]:
        data = []
        for x in np.nditer(vals):
            data.append(float(x))
        return data

    def to_string(self):
        target_name = "no target"
        source_name = "no source"
        if self.target != None:
            target_name = self.target.name
        if self.source != None:
            source_name = self.source.name
        return "layer {}: \ntarget = {}\nsource = {}\nneurons (row) = {}\nweights (row) = \n{}".format(self.name, target_name, source_name, self.neuron_row_array, self.weight_row_mat)

    # create a line chart of the plot matrix that we've been building
    def plot_weight_matrix(self, fig_num: int):
        var_name = "weight"
        title = "{} to {} {}".format(self.name, self.target.name, var_name)
        plt.figure(fig_num)
        np_mat = np.array(self.weight_history_mat)

        i = 0
        for row in np_mat.T:
            cstr = "C{}".format(i % self.num_neurons)
            plt.plot(row, linewidth = int(i / self.num_neurons)+1, color=cstr)
            i += 1

        names = []
        num_weights = self.num_neurons * self.target.num_neurons
        for i in range(num_weights):
            src_n = i % self.num_neurons
            targ_n = int(i/self.num_neurons)
            names.append("{} s:t[{}:{}]".format(var_name, src_n, targ_n))
        plt.legend(names)
        plt.title(title)

    def plot_neuron_matrix(self, fig_num: int):
        title = "{} neuron history".format(self.name)
        plt.figure(fig_num)
        np_mat = np.array(self.neuron_history_mat)

        plt.plot(np_mat.T, '-o', linestyle=' ', ms=2)

        names = []
        for i in range(self.num_neurons):
            names.append("neuron {}".format(i))
        plt.legend(names)
        plt.title(title)

Sick echo chambers

Over the past year, I’ve been building a model that lets me look at how opinions evolve in belief space, much in the manner that flocks, herds and schools emerge in the wild.

CI_GP_Poster4a

Recently, I was Listening to BBC Business Daily this morning on Facebook vs Democracy:

  • Presenter Ed Butler hears a range of voices raising concern about the existential threat that social media could pose to democracy, including Ukrainian government official Dmytro Shymkiv, journalist Berit Anderson, tech investor Roger McNamee and internet pioneer Larry Smarr.

Roger McNamee and Larry Smarr in particular note how social media can be used to increase polarization based on emergent poles. In other words, “normal” opposing views can be amplified by attentive bad actors [page 24] with an eye towards causing generalized societal disruption.

My model explores emergent group interactions and I wondered if this adversarial herding in information space as it might work in my model.

These are the rough rules I started with:

  • Herders can teleport, since they are not emotionally invested in their belief space position and orientation
  • Herders appear like multiple individuals that may seem close and trustworthy, but they are actually a distant monolithic entity that is aware of a much larger belief space.
  • Herders amplify arbitrary pre-existing positions. The insight is that they are not herding in a direction, but to increase polarization
  • To add this to the model, I needed to do the following:
    • Make the size of the agent a function of the weight so we can see what’s going on
    • When in ‘herding mode’ the overall heading of the population is calculated, and the agent that is closest to that heading is selected to be amplified by our trolls/bot army.
    • The weight is increased to X, and the radius is increased to Y.
      • X represents AMPLIFICATION BY trolls, bots, etc.
      • A large Y means that the bots can swamp other, normally closer signals. This models the effect of a monolithic entity controlling thousands of bots across the belief space

Here’s a screenshot of the running simulation. There is an additional set of controls at the upper left that allow herding to be enables, and the weight of the influence to be set. In this case, the herding weight is 10. Though the screenshot shows one large agent shape, the amplified shape flits from agent to agent, always keeping closest to the average heading.

2017-10-28

The results are kind of scary. If I set the weight of the herder to 15, I can change the change the flocking behavior of the default to echo chamber.

  • Normal: No Herding
  • Herding weight set to 15, other options the same: HerdingWeight15

I did some additional tweaking to see if having highly-weighted herders ignore each other (they would be coordinated through C&C) would have any effect. It doesn’t. There is enough interaction through the regular populations to keep the alignment space reduced.

It looks like there is a ‘sick echo chamber’ pattern. If the borders are reflective, and the herding weight + influence radius is great enough, then a wall-hugging pattern will emerge.

The influence weight is sort of a credibility score. An agent that has a lot of followers, or says a lot of the things that I agree with has a lot of influence weight The range weight is reach.

Since a troll farm or botnet can be regarded as a single organization,  interacting with any one of the agents is really interacting with the root entity.  So a herding agent has high influence and high reach. The high reach explains the border hugging behavior.

It’s like there’s someone at the back of the stampede yelling YOUR’E GOING THE RIGHT WAY! KEEP AT IT! And they never go off the cliff because they are a swarm Or, it never goes of the cliff, because it manifests as a swarm.

A loud, distributed voice pointing in a bad direction means wall hugging. Note that there is some kind of floating point error that lets wall huggers creep off the edge.Edgecrawling

With a respawn border, we get the situation where the overall heading of the flock doesn’t change even as it gets destroyed as it goes over the border. Again, since the herding algorithm is looking at the overall population, it never crosses the border but influences all the respawned agents to head towards the same edge: DirectionPreserving

Who’d have thought that there could be something worse than runaway polarization?

Trustworthy News Model Assumptions

Modifications

  • 12.13.16: Initial post
  • 12:16:16: Added reference to proposal and explicitly discussed explorer and exploiter types.

A web version of my Google Docs dissertation proposal is here. Blame them for the formatting issues. The section this is building on is Section 5.3.1. A standalone description of this task is here.

The first part of my dissertation work is to develop an agent-based simulation that exhibits information bubble/antibubble behavior. Using Sen and Chakrabarti’s Sociophysics as my guide, I’m working up the specifics of the model. My framework is an application (JavaFX, because that’s what I’m using at work these days). It’s basically an empty framework with a trivial model that allows clustering based on similar attributes such as color: strawmanapp

Going forward, I need to clarify and defend the model, so I’m going to be listing the components here.

Agent assumptions

  • Agents get their information from global sources (news media). They have equal access, but visibility is restricted
    • Agents are Explorers or Exploiters (Which may be made up of Confirmers and Avoiders)
    • Agents have ‘budgets’ that they can allocate
    • Finding sources has a cost. Sources from the social network has a lower cost to access
    • Keeping a source is cheaper than getting a new one
    • For explorers, the cost of getting a new source is lower than an exploiter.
    • The ‘belief’ as a set of ‘statements’ appears to be valid
    • The collection of statements and the associated values create a position in an n-dimensional hilbert space of information. Position and velocity should be calculable.
    • Start at one dimension to reproduce prior opinion models

Network assumptions

  • There are two items that we are looking for.
    • The first is the network configuration over time. What nodes do agents connect to for their information.
    • The second is the content of that information. For that, we’ll probably need some dimensionality reduction, such as NMF (look for a post on implementing this later). This is where we look for echo chambers of information, as opposed to the agents participating in them
  • Adjustable to include scale-free, small world, and null configurations
  • What about loops? Feedback could be interesting, since a small group that is semi-isolated could form into a very loud bubble that could lower the cost of finding information. So a notion of volume might be needed that emerges from a set of agreeing agents. This could be attraction, though I think I like an economic approach more?
  • There is also a ‘freezing’ issue, where a stable state is reached where two cliques containing different states are lightly connected, but not enough that the neighbors in one clique can be convinced to change their opinion [Fig. 6.2, pg 135]

Measures

  • Residual Energy: The difference between the actual energy and the known energy of the perfectly-ordered ground state (full consensus).
  • Deviation from null network.
  • Clustering as per community detection (Girard et. al)

Implementation details

  • Able to be run multiple times with the same configuration but different seed
  • Outputs to… something. MySql or Excel probably
  • Visualization using t-SNE? Description plus Java implementation is here: https://lvdmaaten.github.io/tsne/

More to come as the model fleshes out.

 

JavaScript’s Gulf of Evaluation and Gulf of Execution

<rant>

It came to me yesterday why JavaScript development is such slow going for me. I think it’s because it feels like an OO language, but it’s really not. Particularly when using TypeScript, there are classes, ‘this’, inheritance, etc. The thing is that all these ‘artifacts’, for lack of a better term have been added to the language and aren’t there natively. That means that often, things that would behave as I expect them in a more ‘native’ OO language like Java (or Actionscript for that matter) don’t behave intuitively in JS. This means more time in the debugger saying things like “why isn’t that working?”.

Don Norman describes this as the Gulf of Evaluation and the Gulf of Execution. His canonical example is the settings for his freezer. Unless you know what the controls are affecting behind the panel, the odds of getting the temperature correct in the setup he describes are essentially random.

Now, consider probably the most canonical of the JS quirks, the behavior of this. It has scope only in the current function. Nest a function within a function and the scope of the parent ‘object’ doesn’t exist in the parent’s function. This is because in reality, there is no parent object, just a hierarchy of functions. But we have closure, so by setting ‘self = this’ in the parent, we get an approximation of the desired behavior. And now with fat arrow notation, this can indeed be nested (but Typescript won’t let you inherit a fat arrow method). But you have to know that, just like the mechanism behind the panel of Don Norman’s freezer.

The most recent thing that I had to deal with was the assembly of a hierarchical data provider object from a set of database calls. The structure is pretty straightforward:

dataProvider = {
    items:{
        type1:{
            item1:{...},
            item2:{...}
        },
        type2{
            item1:{...},
            item2:{...}
        },
        ....
    },
    associations:{[{...},{...},{...},...]}
}

I really thought I should be able declare these items on the fly, something like:

dataProvider = {};
dataProvider['items']['type']['item1'] = {...};
dataProvider['items']['type']['item2'] = {...};
dataProvider['associations'] = [];
dataProvider['associations'].push({...});
etc.

Instead, the ‘item1’ object gets created, but the [‘items’] object does not. I expected construction of the object to chain from tail to head like it does with execution (e.g. foo.bar().baz().etc). Instead I get a null object error, and a “why isn’t that working?” moment.

In the end, I had to write some code that created the original object and then before each item was added, to make sure that the appropriate property existed, otherwise create it and add it. So, instead of an hour or so of casual coding, this simple task mushroomed into an afternoon’s worth of careful development and testing.

Interestingly, I had to pick up PHP after a long absence, and for me at least, it behaves the way I expect OO languages to behave, with very consistent quirks that you have to learn once – I’m looking at you __construct()

So that’s why JS development takes too freakin’ long for my calcified OOP brain. But it’s also about why we should also be very careful when we try to make something look like something it’s not.

</rant>

Gotchas or Special Cases?

I’ve now been working in AngularTypeScript for a while, a few things have cropped up that are probably worth mentioning. Some things are just for clarity, others are because they had me confused for a while.

So without further adieu, my laundry list:

Structuring the main angular app class

I’ve decided that I like using the constructor, rather than instantiating the object and then calling a method. Mostly this is because in TypeScript, the arguments to the constructor aren’t defined in interfaces so it can vary naturally, and because it’s a bit less typing for the same result. Here’s the app I’m currently working on:

module AngularApp {
   // define how this application assembles.

   class QueryMain {
        serviceModule:ng.IModule;
        appModule:ng.IModule;

        constructor(angular:ng.IAngularStatic,
                        queryServicePtr:Function,
                        queryDirectivePtr:Function,
                        glNetDirectivePtr:Function,
                        infoDialogDirectivePtr:Function,
                        queryControllerPtr:Function){

            this.serviceModule = angular.module('phpConnection', []);
            this.serviceModule.service('QueryService', ['$http', queryServicePtr]);

            this.appModule = angular.module('rssApp', ['phpConnection', 'ngSanitize']);
            this.appModule.controller('MainCtrl', ['QueryService', '$timeout','$rootScope', queryControllerPtr]);
            this.appModule.directive('ngFeedPanel', ['$timeout','$rootScope', queryDirectivePtr]);
            this.appModule.directive('ngInfoDialog', ['$timeout','$rootScope', infoDialogDirectivePtr]);
            this.appModule.directive('ngNetworkWebgl', ['$timeout','$rootScope', glNetDirectivePtr]);
        }
   }


   // instantiate Angular with the components defined in the other files. Note
    // that weird things may happen with transclusion?

    new QueryMain(
        angular,
        PhpConnectionService.UrlAccess,
        new RssAppDirectives.ngFeedPanel().ctor,
        new WGLA2_dirtv.ngNetworkWebgl().ctor,
        new WGLA2_dirtv.ngInfoDialog().ctor,
        RssControllersModule.RssController
    )
}

There are really three patterns to note here. The first is that everything is that everything that the class QueryMain uses is passed in. No globals. And going along with that, please note that directives (and factories) have to be instantiated, so we pass in a pointer to a ctor() function, rather than the class as a whole. The last thing to mention is that there is no chaining. The modules are declared and then the components are added on individually rather than module().directive.controller() etc. The reason for this is that if you happen to declare, for example, a service that is needed by some later item. That service will not be visible if it’s chained with the declaration of the item. Line by line declarations appear to have more predictable results.

The wonderfulness of interfaces and a revisit of fatArrow = ():notation => {}

There seem to be times when it’s nicer to use fat arrow notation. In my experience, this pops up the most often in directives, though it can show up in promises as well. I’ve had some odd experiences where it seemed that ‘this’ doesn’t survive scope/closure changes into a called method. I can’t find any examples where I wasn’t able to make it work with conventional notation, but it’s worth covering anyway.

Since I’ve been working more with directives recently, we’ll use one of the directives used in the above code as an example. This one has had parts stripped out for clarity. By the way, as with every other class that is an Angular/Typescript component this extends my ATSBase class, which is covered here. :

export class ngFeedPanel extends NovettaUtils.ATSBase {
    private myDirective:ng.IDirective;
    private cobj:RssControllersModule.ICallbackPointers;

    constructor() {
        super();
    }
    private linkFn (scope:any, element:any, attrs:any) {
        this.cobj = scope.callbackObj;

        scope.nlpQuery = ():void => {
            var mobj:RssControllersModule.IDataResponse = scope.messageObj;
            this.cobj.setQueryString("");
            this.cobj.nlpQuery(mobj);
        };
    }

    public ctor(timeout:ng.ITimeoutService, rootscope:ng.IScope):ng.IDirective {

        if (!this.myDirective) {
            this.myDirective = {
                templateUrl: 'directives/rssFeed.html',
                restrict: 'AE',
                scope: {
                    messageObj: '=',
                    callbackObj: '='
                },
                link: this.linkFn
            }
        }
        return this.myDirective;
    }
}

If you’ve read any earlier posts, you’ll know that I like pointers. The ctor() method is used as a function pointer in the main app, and in turn the link: element of the ng.IDirective object points to the linkFn() method in this class.

Unlike interfaces for other languages, I’ve found interfaces most useful for handling the pattern of:

myObj = {
    thing1 : "thing1",
    thing2 : "thing2",
    thing3 : "thing3",
    thing4 : "thing4"
};

I hate those things. I’ll make some typo and not notice until the browser crashes in a test. And because there is often no context, I’ll look at the broken code and not see the problem (thingOne, not thing1, or something like that)

Typescript makes sure that doesn’t happen. This interface:

export interface ICallbackPointers{
    setQueryString : Function;
    addToQueryString : Function;
    rssQuery : Function;
    nlpQuery : Function;
}

Gets declared as a typed object:

export class RssController extends WGLA2_ctrl.Network3DCtrl{
   public callbacks:ICallbackPointers;
   ...

Instanced in the constructor (function pointers!):

this.callbacks = {
    setQueryString : this.setQueryString,
    addToQueryString : this.addToQueryString,
    rssQuery : this.googleNewsSubmit,
    nlpQuery : this.nlpQuerySubmit
};

Passed through the html:

<div class="resultsWrapper">
    <div ng-repeat="item in mc.itemDataArray track by $index">
        <ng-feed-panel message-obj="item" callback-obj="mc.callbacks"></ng-feed-panel>
    </div>
</div>

And used in the directive (also shown above as part of the linkFn()):

scope.nlpQuery = ():void => {
    var mobj:RssControllersModule.IDataResponse = scope.messageObj;
    this.cobj.setQueryString("");
    this.cobj.nlpQuery(mobj);
};

Never a chance for a mistake, but with all the power of ad-hoc object creation. Very cool.

Fat Arrow has been more mysterious. In the following I show two sets of code that use a service to get data from a source. In the fist example all the code is contained within a single class that extends ATSBase, which creates a fat arrow alias for each method. Nonetheless, without fat arrow, ‘this’ does not track back to the class:

public promiseCaller():void{
   this.promise = this.service.getQueries();
   this.promise.then(this.processData, this.errorData);
}

public processData = (data:any):void => {
   // 'this' is out of scope without fat arrow
   console.log("got data");
};

public errorData = (data:any):void => {
   // 'this' is out of scope without fat arrow
   alert("error getting data");
};

However, in the version shown below, ‘this’ is preservedwithin goodUserQuery() and errorResponse.

private goodUserQuery (response:any) {console.log("got data");}
private errorResponse (response:any) {alert("error getting data");}
public promiseCaller():void{
    this.queryService.submit(qstr, this.goodUserQuery, this.errorResponse);
}

It’s even preserved in the call to the queryService.submit call that takes place in a different service class, part of which is shown below:

public submit(query:string, goodResponse:any, errorResponse:any):any {
   return this.httpService(query).then(goodResponse, errorResponse);
}

So that’s odd.

The safe pattern appears to be that for small methods that I’m unlikely to extend, I’ll use fat arrow. Wrapping methods will extend just fine and will use the fat arrow functions inside. But I wouldn’t be able to extend the inner methods. Mostly, I think that’s fine and more likely to keep me out of trouble then accidentally typing ‘this’ when I should’ve typed ‘self’. So you basically have the choice:

scope.handleButtonPressFat = (strVal:string):void => {
   this.handleButtonPress(scope, strVal);
};
scope.handleButtonPress = function(strVal:string):void {
   self.handleButtonPress(scope, strVal);
};

But if you find yourself in a function that has been called out of a promise and closure isn’t working the way you think it should, try seeing if it can be fixed by using fat arrow.

TypeScript and AngularJS Unification?

Because of certain obscure reasons, AngularJS needs to do very particular things with “this”. Shortly after taking up TypeScript and trying to feed in Angular ‘objects’, I learned about how “Fat Arrow Notation” (FAN) allows for a clean interface with Angular. It’s covered in earlier posts, but the essence is

// Module that uses the angular controller, directive, factory and service defined above.
module AngularApp {
   // define how this application assembles.
   class AngularMain {
      serviceModule:ng.IModule;
      appModule:ng.IModule;

      public doCreate(angular:ng.IAngularStatic, tcontroller:Function, tservice:Function, tfactory:Function, tdirective:Function) {
         this.serviceModule = angular.module('globalsApp', [])
            .factory('GlobalsFactory', [tfactory])
            .service('GlobalsService', [tservice]);

         this.appModule = angular.module('simpleApp', ['globalsApp'])
            .controller('MainCtrl', ['GlobalsFactory', 'GlobalsService', '$timeout', tcontroller])
            .directive('testWidget', ['GlobalsService', tdirective]);
      }
   }
   // instantiate Angular with the components defined in the 'InheritApp' module above. Note that the directive and the factory
   // have to be instantiated before use.
   new AngularMain().doCreate(angular,
      InheritApp.TestController2,
      InheritApp.TestService,
      new InheritApp.TestFactory().ctor,
      new InheritApp.TestDirective().ctor);
}

Basically, the Angular parts that need to new() components (Controllers and Services) get the function pointer to the class, while components that depend on the object already being created (Directives and Factories) have a function pointer passed in that returns an object that in turn points to the innards of the class.

So I refactored all my webGL code to FAN and lo, all was good. I made good progress on building my shiny 3D charts.

Well, charts are pretty similar, so I wanted to take advantage of TypeScript’s inheritance, make a BaseChart class, which I would then extend to Area, Bar, Column, Scatter, etc. What I expected to be able to do was take a base method:

public fatArrowFunction = (arg:string):void => {
   alert("It's Parent fatArrowFunction("+arg+")");
};

And extend it:

public fatArrowFunction = (arg:string):void => {
   super.fatArrowFunction(arg)
   alert("It's Child fatArrowFunction("+arg+")");
};

“Um, no.”, said the compiler. “super() cannot be used with FAN”.

“WTF?” Said I.

It turns out that this is a known not-really-a-bug, that people who are combining Angular and TypeScript run into. After casting around for a bit, I found the fix as well:

class Base {
   constructor() {
      for (var p in this) {

         if (!Object.prototype.hasOwnProperty.call(this, p) && typeof this[p] == 'function') {
            var method = this[p];
            this[p] = () => {
               method.apply(this, arguments);
            };
            // (make a prototype method bound to the instance)
         }
      }
   }
}

Basically what this does is scan through the prototype list and set a bunch of fat arrow function pointers that point back to the prototype function. It seems that there are some people that complain, but as a developer who cut his teeth on C programming, I find function pointers kind of comforting. They become a kind of abbreviation of some big complex thing.

The problem is that the example doesn’t’ quite work, at least in the browsers I’m currently using (Chrome 41, IE 11, FF 36). Instead of pointing at their respective prototypes, all the pointers appear to reference the last item of the loop. And the behavior doesn’t show up well in debuggers. I had to print the contents of the function to see that the pointer named one thing was pointing at another. And this happened in a number of contexts. For example, this[fnName] = () => {this[‘__proto__’][fnName].apply(this, arguments);} gives the same problem.

After a few days of flailing and learning a lot, I went back to basics and tried setting the function pointers explicitly in the constructor of each class. It worked, and it wasn’t horrible. Then, and pretty much just for kicks, I added the base class back in with this method:

public setFunctionPointer(self:any, fnName:string):void{
   this[fnName] = function () {
      //console.log("calling ["+fnName+"]")
      return self['__proto__'][fnName].apply(self, arguments);
   };
}

And gave it a shot. And it worked! I was pleasantly surprised. And because I’m an eternal optimist, I added the loop back, but this time using the function call:

constructor() {
   var proto:Object = this["__proto__"];
   var methodName:string;

   for (var p in proto){
         methodName = p;
         if(methodName !== 'constructor'){
            this.setFunctionPointer(this, methodName);
         }
         //console.log("\t"+methodName+" ("+typeof proto[p]+")");
      }
}

And that, my droogs, worked.

I think it’s a prototype chaining issue, but I’m not sure how. In the non-working code, we’re basically setting this[fnName] = function () { this[fnName].apply(self, arguments)}. That should chain up to the prototype and work, but I don’t think it is. Rather, all the functions wind up chaining to the same place.

function Base() {
    var _this = this;
    for (var p in this) {
        if (!Object.prototype.hasOwnProperty.call(this, p) && typeof this[p] == 'function') {
            var method = this[p];
            this[p] = function () {
                method.apply(_this, arguments);
            };
        }
    }
}

On the other hand, look at the code generated when we use the function we get the following:

var ATSBase = (function () {
    function ATSBase() {
        var proto = this["__proto__"];
        var methodName;
        for (var p in proto) {
            methodName = p;
            if (methodName !== 'constructor') {
                this.setFunctionPointer(this, methodName);
            }
        }
    }
    
    ATSBase.prototype.setFunctionPointer = function (self, fnName) {
        this[fnName] = function () {
            //console.log("calling ["+fnName+"]")
            return self['__proto__'][fnName].apply(self, arguments);
        };
    };
    return ATSBase;
})();

Now, rather than starting at the root, the actual call is done in the prototype. I think this may cause the chain to start in the prototype object, but then again, looking at the code, I don’t see why that should be the case. One clear difference is the fact that in the first version, “this” can be in two closure states (this[p] = function (){method.apply(_this, arguments);};). So it could be closure is behaving in less than obvious ways.

Unfortunately, we are at the point in development where something works, so it’s time to move on. Maybe later after the codebase is more mature, I’ll come back and examine this further. You can explore a running version here.

Web Dev Jenga

Back in the distant past, some very smart people wrote a paper about the past, present and future of user interface software tools. In it they discuss the idea of a tool having a low threshold to learning and a high ceiling of capability. Inevitably, they say, we build tools that start with low threshold and slowly add capability until the high ceiling of capability is reached. Unfortunately, this (almost?) always means that the low threshold to learn is lost amid all the added complexity.

I have seen this happen with FORTRAN, C/C++, Java, JavaScript, and HTML. It’s a pain, but I think it’s inevitable. Interestingly, I think that if you keep things hard, they paradoxically stay simple. The difference between GL, OpenGL and WebGL is really not all that different. There was a big change with the introduction of shaders, but that’s one major shift in something like 20 years.

Now it’s happening to tools. It’s easy to write a quick tool that handles some aspect of development. If it has low threshold for learning and good utility, then it gets picked up and suddenly we have a new way of doing the same old thing. Maybe it’s better, but often it’s just different. The unfortunate result is now we have stacks of frameworks, languages and tools that we don’t understand well. The normal scenario is:

  • Have a confounding problem.
  • Ask Google/StackOverflow about it.
  • Try the responses that seem best until something works
  • Move on to the next confounding problem

As a professional developer, I only have so much time to drill down into things to obtain deep understanding. Many times, you have to trust. It’s faith-based coding, and it really reminds me of building a tower from Jenga blocks. We add and subtract things all the time. It’s a miracle that the thing stays up as often as they do.

My adventures with ‘thrangularJS’ has settled down to the point where I’m building reasonably complex pieces that need to be assembled in a particular order. The watcher in IntelliJ will compile TypeScript to JavaScript, but just in the context of that one file. If a change has been made in a TypeScript file, chances are that it will have to ramify through the project. This is one of those things that has to happen using compilers that doesn’t happen with interpreters.

So, I start to look at what the web development community is doing with dependency management these days. The answer seemed to be Grunt with a typescript task. This worked, but it compiled all the files even if only one needed to be changed. What I really wanted was a makefile. But the makefile needed to be triggered by a watcher. It turns out that there has been some thought on this, and it works in a clean way.

Since I’m on a windows box, I’m using GnuMake. It’s extremely stable, last updated in 2006 (when iPhones were introduced, I think). I’m also using Grunt, installed by npm. Not as stable as make, but it’s never done me wrong. Following the install of make, and Grunt the components have to be set up in the project directory:

npm init
npm install grunt --save-dev
npm install grunt-exec --save-dev
npm install grunt-contrib-watch --save-dev

Then we need a makefile. This is mine, and there are probably better ways to do it. But it’s clear (you can learn everything you ever wanted to know about make here):

CC = tsc 
CFLAGS=  --declaration --noImplicitAny --target ES5 --sourcemap

build: modules/AppMain.js

modules/AppMain.js : directives/WGLA2_directives.js controllers/WGLA1_controller.js \
   modules/AppMain.ts
   $(CC) $(CFLAGS) modules/AppMain.ts

classes/WebGlInterfaces.js : classes/WebGlInterfaces.ts
   $(CC) $(CFLAGS) classes/WebGlInterfaces.ts

classes/WebGlCanvasClasses.js : classes/WebGlInterfaces.js \
           classes/WebGlCanvasClasses.ts
   $(CC) $(CFLAGS) classes/WebGlCanvasClasses.ts

classes/WebGlComponentClasses.js : classes/WebGlInterfaces.js  \
   classes/WebGlComponentClasses.ts
   $(CC) $(CFLAGS) classes/WebGlComponentClasses.ts

controllers/WGLA1_controller.js : classes/WebGlInterfaces.js classes/WebGlComponentClasses.js classes/WebGlCanvasClasses.js \
   controllers/WGLA1_controller.ts
   $(CC) $(CFLAGS) controllers/WGLA1_controller.ts

directives/WGLA2_directives.js : classes/WebGlInterfaces.js classes/WebGlComponentClasses.js classes/WebGlCanvasClasses.js \
   directives/WGLA2_directives.ts
   $(CC) $(CFLAGS) directives/WGLA2_directives.ts

Last, we need a GruntFile.js to knit it all together:

module.exports = function (grunt) {
    //grunt.loadNpmTasks('grunt-ts');
    grunt.loadNpmTasks('grunt-contrib-watch');
    grunt.loadNpmTasks('grunt-exec');

    grunt.initConfig({
        pkg: grunt.file.readJSON('package.json'),

        exec: {
            make: {
                command: 'make build'
            }
        },
        watch: {
            files: ['**/*.ts', '!**/*.d.ts'],
            tasks:['exec:make'] //tasks: ['ts']
        }

    });

    grunt.registerTask('default', ['watch']);

}

And that’s it. Make is a little tricky to learn (high threshold), but has tremendous power and flexibility (high ceiling). Grunt is kept simple and obvious and can be swapped out easily if something better comes along. Other tasks can be added to make such as uglify, test, deploy, pretty much whatever you want. And it’s guaranteed to go in the right order.

It’s a good block to keep in your Jenga tower.