Dynamic Weight Matrix Neural Network
I have a neural network that takes in a random noise vector and maps it explicitly to an image.
If I were to do this with one training example, it would memorize that example.
If I increase the number of examples, it starts to unlearn the previous examples in order to learn the new ones.
Think of the equation of a straight line, y = mx + c,
where y is the image, x is the random noise vector, m is the weight matrix and c is the bias term.
So with the first example we can change m, the weight matrix, through training until we fit x to y.
If we took another training pair we would need to change m again, and so unlearn the first example.
We will have to do it another way.
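As a toy illustration of that memorization, here is a minimal numpy sketch (the sizes and learning rate are made up for the demo, not from the post): gradient descent on W and c until the single pair is fitted exactly. Fitting a second pair with the same W would drag the line away from the first, which is exactly the unlearning problem above.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_dim, image_dim = 8, 8            # toy sizes

x = rng.normal(size=noise_dim)         # the random noise vector
y = rng.uniform(size=image_dim)        # the target image, flattened

W = np.zeros((image_dim, noise_dim))   # m, the weight matrix
c = np.zeros(image_dim)                # the bias term

for _ in range(500):
    err = W @ x + c - y                # residual of y = mx + c
    W -= 0.1 * np.outer(err, x)        # gradient of squared error w.r.t. W
    c -= 0.1 * err                     # ... and w.r.t. c

print(np.allclose(W @ x + c, y, atol=1e-3))  # True: the pair is memorized
```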
This will involve having two weight matrices:
W, and a soft copy of W called W2.
W will be the template weight matrix, and W2 will be made by swapping around W's indices without affecting their memory allocation, since W2 is only a soft copy of W.
What this means is that on the graph we have a family of straight lines, whose gradients (m), or weights, reflect the different permutations possible on W.
We will have a line dedicated to each noise vector / image pair,
and now we can learn a 'single' weight matrix that will work for all training pairs,
because each pair will have its own line to approximate.
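One way to read this (my own sketch, not code from the post) is that W stays the only stored parameter, and every W2 is just W viewed through a permutation of its row indices:

```python
import numpy as np

W = np.arange(12.0).reshape(4, 3)   # the template weight matrix
perm = np.array([2, 0, 3, 1])       # one possible re-ordering of W's rows

x = np.array([1.0, 0.5, -1.0])

# W2 is "W seen through perm": no new weights are introduced, we simply
# read W's rows in a different order at multiply time.
print(W @ x)        # the template's line
print(W[perm] @ x)  # a permuted form's line: same entries, re-routed
```

(Numpy's fancy indexing does build a temporary array here, but the point stands: the only trainable numbers are W's entries, so a gradient through any permuted form lands back on the shared template.)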
In order to dedicate a weight matrix to an input, the form of W2 will depend on the particular numbers being entered. We therefore have a rule for moving from one weight matrix form to another. If, for example, the input at a neuron is 0.351, then we connect that neuron's first connection to the 0th neuron in the next layer,
its second connection to the 3rd neuron,
and its third connection to the 5th neuron in the next layer, and so on.
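Taken literally, the rule reads the digits of the incoming value as target indices. A sketch of that reading (the exact digit convention, and skipping repeated digits to keep the result a valid permutation, are my assumptions):

```python
def permutation_from_input(value, n_out):
    # Read the digits of the input (0.351 -> 0, 3, 5, 1, ...) as target
    # indices, skipping repeats so we end up with a valid permutation.
    digits = [int(ch) for ch in f"{value:.10f}" if ch.isdigit()]
    perm, seen = [], set()
    for d in digits:
        if d < n_out and d not in seen:
            perm.append(d)
            seen.add(d)
    for i in range(n_out):   # hand out whatever indices are left, in order
        if i not in seen:
            perm.append(i)
    return perm

print(permutation_from_input(0.351, 10))
# [0, 3, 5, 1, 2, 4, 6, 7, 8, 9] -- first connection to the 0th neuron,
# second to the 3rd, third to the 5th, matching the example above
```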
We now have a straight line for every possible noise vector.
Of course, moving one line will move all of the others, so we need a plan.
When we train each form of the weight matrix by gradient descent, we adjust each straight line so that its data point is near it.
We will also be averaging gradients between examples.
What this means is that when we have completed training, each training example will end up at much the same distance from its line.
This may not be optimal, however.
Depending on which noise vectors mapped to which images, we could start off with some of the images being very far from their lines,
and when we move the lines by averaging gradients we could end up with all of the data points sitting some distance from their lines.
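Here is how that averaged update could look in the linear case (again a sketch: fixed random permutations stand in for the digit-derived ones, and all sizes are arbitrary). Each pair is scored against its own permuted form of W, and every gradient is routed back onto, and averaged over, the one shared template:

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs, noise_dim, image_dim = 4, 10, 10

xs = rng.uniform(size=(n_pairs, noise_dim))   # noise vectors
ys = rng.uniform(size=(n_pairs, image_dim))   # target images

# one permutation per pair; in the full scheme this would come from
# the digits of the inputs, as sketched above
perms = [rng.permutation(image_dim) for _ in range(n_pairs)]

W = np.zeros((image_dim, noise_dim))
c = np.zeros(image_dim)

for _ in range(2000):
    gW, gc = np.zeros_like(W), np.zeros_like(c)
    for x, y, p in zip(xs, ys, perms):
        err = W[p] @ x + c - y        # this pair's own line
        gW[p] += np.outer(err, x)     # route the gradient back to the
        gc += err                     # template's rows (p has no repeats)
    W -= 0.05 * gW / n_pairs          # averaged gradient step
    c -= 0.05 * gc / n_pairs

for x, y, p in zip(xs, ys, perms):
    print(np.linalg.norm(W[p] @ x + c - y))  # each pair's distance to its line
```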
The key would be to add more depth to the network.
That will make it non-linear, and we can move each curve closer to its data point by using the second and third layers, and even more.
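A minimal sketch of that deeper version (my own toy construction, using ReLU): the permuted template sits in the first layer exactly as before, and the extra layer bends each pair's line into a curve that training can pull closer to its data point.

```python
import numpy as np

def forward(x, perm, W1, c1, W2, c2):
    # first layer: the per-input permuted form of the template W1
    h = np.maximum(0.0, W1[perm] @ x + c1)  # ReLU makes the map non-linear
    # second layer: an ordinary dense layer out to the image
    return W2 @ h + c2
```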