Updating Beliefs With Neural Samples

When people first imagined how to create intelligent machines, they reached for Logic. Valid deduction from pre-existing facts was seen as what separates the brute from the human. As AI developed, it became clear that classical logic, built on hard-coded rules, is too brittle in the face of real-world problems. Many researchers converged on the idea that the brain must cope with Uncertainty. The language of probability theory was recognized as a better model for building intelligent systems that would show adaptation and robustness in coping with Reality.

Bayesian statistics is an interpretation of probability that defines probability as a degree of Belief in the State of the World. Beliefs are not fixed, but can be updated in the face of real-world evidence. Bayes' theorem gives an optimal mathematical rule for calculating new beliefs, which we can also call World Models or Hypotheses: the probability that the model assigns to new data (the likelihood) is weighted by the probability of the model before seeing the new data (the prior), normalized by the probability of the data (the partition function, Z). The formula is:

P(hypothesis | data) = P(data | hypothesis) * P(hypothesis) / P(data)

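To make the update concrete, here is a minimal numerical sketch of Bayes' rule (my own toy example, not from the sources below): the posterior over a coin's bias after ten flips, computed on a discrete grid so that the partition function Z is just a sum.

```python
import numpy as np

# Discrete grid of hypotheses: possible values of a coin's bias.
hypotheses = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(hypotheses) / len(hypotheses)   # uniform prior belief

# Evidence: 7 heads out of 10 flips; likelihood of each hypothesis.
heads, flips = 7, 10
likelihood = hypotheses**heads * (1 - hypotheses)**(flips - heads)

# Bayes' theorem: posterior = likelihood * prior / Z.
Z = np.sum(likelihood * prior)        # the partition function
posterior = likelihood * prior / Z

map_estimate = hypotheses[np.argmax(posterior)]   # peaks at 0.7
```

On a grid this normalization is trivial; the point of the sampling approach below is that in high dimensions this sum becomes an intractable integral.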
This blog post is concerned with how to calculate this posterior using idealized spikes from a neuron, under the assumption that the neuron fires with a Poisson distribution (a pretty good model of cortical neuron firing, though see "Efficient codes and balanced networks" by Denève and Machens, Nature Neuroscience 2016, which argues that Poisson codes are not efficient). We will use an algorithm from the Markov Chain Monte Carlo family, more specifically the Metropolis-Hastings algorithm.

The computational bottleneck in calculating the posterior is the denominator: the probability of the data or evidence, also called the partition function (a term borrowed from statistical physics). It is a very high-dimensional integral, since calculating the probability of the data involves integrating over all possible hypotheses that could have given rise to that data. Here we will use random samples, represented as spikes, to model a posterior that is Gaussian (just for the heck of it), but the algorithm can be applied to any distribution that we know only up to a normalizing constant: because it works with ratios of probabilities, the constant cancels and never needs to be computed, so the same methodology works for an arbitrary posterior with a very complicated shape. Check out the blog posts this short post is based on: https://towardsdatascience.com/markov-chain-monte-carlo-in-python-44f7e609be98 and https://am207.github.io/2017/wiki/metropolishastings.html . I took code from the former and reposted it below. It uses a fixed parametric function, but this is just a small demonstration, so it will suffice for now.
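Because Metropolis-Hastings only ever evaluates ratios of the target density, the unknown normalizing constant divides out. A tiny sketch of this (my own, with an arbitrarily chosen Gaussian target whose constant has been dropped):

```python
import numpy as np

# Unnormalized posterior: a Gaussian with mean 5 and sd 2 whose
# 1/sqrt(2*pi*sigma^2) prefactor has been deliberately dropped.
def p_unnorm(x, mu=5.0, sigma=2.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# The algorithm only compares p(x*) / p(x_prev), so the missing
# constant Z would cancel even if we had bothered to compute it.
x_star, x_prev = 6.0, 4.5
ratio = p_unnorm(x_star) / p_unnorm(x_prev)
```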

We study a neuron whose time until the next spike, given that it has just spiked, follows an exponential distribution. This is what it looks like:

[Figure: two spike trains (red and blue); each spike is followed by an exponential distribution over the time of the next spike. Bottom panel: a function between the two spike trains.]
The red and blue dots are spikes, and the exponential squiggles are the probability that the next spike will happen at a given time. At each spike you get a fresh exponential distribution from which you can sample the time of the next spike. This model is Markovian, meaning that the timing of the next spike depends only on the previous spike, not on the full history. This is an idealization, as real neurons exhibit learning and adaptation. The bottom figure shows a function between two spike trains which needs to be integrated to get a distance measure between the two trains. We will not explore this aspect in this post. For an in-depth discussion see "The Poisson and Exponential Distribution" (https://neurophysics.ucsd.edu/courses/physics_171/exponential.pdf) and "Poisson Model of Spike Generation" (http://www.cns.nyu.edu/~david/handouts/poisson.pdf).
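This spike generator is easy to simulate: draw exponential inter-spike intervals one after another. A minimal sketch (my own illustration; the function name and the rate are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_spike_train(rate_hz, duration_s, rng):
    """Homogeneous Poisson spikes: exponential inter-spike intervals
    with mean 1/rate, each drawn afresh at the previous spike
    (the Markov property -- only the last spike matters)."""
    spike_times = []
    t = rng.exponential(1.0 / rate_hz)
    while t < duration_s:
        spike_times.append(t)
        t += rng.exponential(1.0 / rate_hz)
    return np.array(spike_times)

spikes = poisson_spike_train(rate_hz=20.0, duration_s=10.0, rng=rng)
mean_isi = np.diff(spikes).mean()   # should be close to 1/20 s
```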

I tried to modify the code in https://am207.github.io/2017/wiki/metropolishastings.html into a sampling algorithm that uses an exponential distribution as a proposal distribution, but failed miserably; I don't understand the Metropolis-Hastings algorithm well enough yet. In this post, I would instead like to think through the algorithm implemented in the AM-207 link, which uses a gamma distribution as a proposal distribution.

I would like to describe a neural implementation of the Metropolis-Hastings algorithm from the AM-207 link. By neural implementation, I mean something closer to a neuromorphic implementation, not a claim about how biological neurons might implement it. Neuromorphic engineering is a discipline that takes inspiration from neural systems to build massively parallel and ultra-efficient electronic computing devices. I am inspired by the images at http://sandamirskaya.eu/ . This neuromorphic engineer makes fascinating systems; when I saw her diagrams, I started thinking in a completely new way. For another neuromorphic engineer, see https://web.stanford.edu/group/brainsinsilicon/boahen.html

Anyway, the AM-207 link uses a Gamma distribution as the proposal, and the inter-spike interval distribution of neurons can be modelled with a Gamma distribution. So we can assume two neurons connected bidirectionally by two communication channels; it's the minimal circuit for implementing the algorithm. The first neuron generates spikes with Gamma-distributed inter-spike intervals and also communicates the result of a comparison computation to the second neuron. The second neuron receives the spikes, or samples. It has access to an un-normalized data distribution, and its task is to integrate the information coming from the first neuron to approximate that distribution. It decides whether to update the Markov-chain state of the first neuron (by deciding whether to fire, i.e. send a message down its communication channel).
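One exchange in this two-neuron circuit could be sketched as follows (a toy software sketch, not neuromorphic hardware; the shape parameter K, the Gaussian target, and all names are my own assumptions):

```python
import numpy as np
from math import gamma as gamma_fn

rng = np.random.default_rng(1)
K = 5.0   # shape of the gamma inter-spike-interval distribution

def gamma_pdf(y, mean):
    """Gamma density with shape K and mean `mean` (scale = mean / K)."""
    scale = mean / K
    return y ** (K - 1) * np.exp(-y / scale) / (gamma_fn(K) * scale ** K)

# First neuron: holds the chain state and fires a gamma-distributed sample.
state = 1.0
x_star = rng.gamma(K, state / K)    # a spike, i.e. a proposed sample

# Second neuron: knows the unnormalized target p and thresholds the ratio.
p = lambda x: np.exp(-0.5 * (x - 5.0) ** 2)     # Gaussian, Z dropped
pdfratio = p(x_star) / p(state)
proposalratio = gamma_pdf(state, x_star) / gamma_pdf(x_star, state)
accept = rng.uniform() < min(1.0, pdfratio * proposalratio)

# Feedback message: the first neuron updates its state only on accept.
if accept:
    state = x_star
```

Looping this exchange yields exactly the Metropolis-Hastings chain shown in the code below.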



This is the algorithm in code:
import numpy as np

def metropolis_hastings(p, q, qdraw, nsamp, xinit):
    samples = np.empty(nsamp)
    x_prev = xinit
    accepted = 0
    for i in range(nsamp):
        x_star = qdraw(x_prev)           # propose a new sample
        p_star = p(x_star)
        p_prev = p(x_prev)
        pdfratio = p_star / p_prev       # normalizing constant cancels here
        proposalratio = q(x_prev, x_star) / q(x_star, x_prev)
        if np.random.uniform() < min(1, pdfratio * proposalratio):
            samples[i] = x_star
            x_prev = x_star
            accepted += 1
        else:  # we always get a sample
            samples[i] = x_prev
    return samples, accepted

qdraw samples an inter-spike interval in the first neuron, and the proposal ratio is also communicated to the second neuron. The comparison np.random.uniform() < min(1, pdfratio*proposalratio) is a threshold operation in the second neuron; pdfratio is calculated in the second neuron when it receives a sample from the first neuron. If the threshold condition is satisfied, the second neuron sends a message back to the first neuron, telling it to update its inner state.

Thus, as the first neuron spikes, i.e. emits random samples in time, the second neuron matches those samples against its internal distribution (plus some randomness) and modulates the activity of the first neuron through a feedback connection.
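Putting the pieces together, here is a self-contained run of the sampler (the unnormalized Gaussian target, the gamma proposal with shape K and mean equal to the current state, and the seed are my own choices; the function from above is repeated so the snippet runs on its own):

```python
import numpy as np
from math import gamma as gamma_fn

rng = np.random.default_rng(42)
K = 5.0   # gamma shape; scale chosen so the proposal mean is the current state

def qdraw(x_prev):          # first neuron: fire the next sample
    return rng.gamma(K, x_prev / K)

def q(y, x):                # gamma proposal density q(y | x)
    scale = x / K
    return y ** (K - 1) * np.exp(-y / scale) / (gamma_fn(K) * scale ** K)

def p(x):                   # unnormalized target: Gaussian(5, 1), Z dropped
    return np.exp(-0.5 * (x - 5.0) ** 2)

def metropolis_hastings(p, q, qdraw, nsamp, xinit):
    samples = np.empty(nsamp)
    x_prev = xinit
    accepted = 0
    for i in range(nsamp):
        x_star = qdraw(x_prev)
        pdfratio = p(x_star) / p(x_prev)
        proposalratio = q(x_prev, x_star) / q(x_star, x_prev)
        if rng.uniform() < min(1, pdfratio * proposalratio):
            x_prev = x_star
            accepted += 1
        samples[i] = x_prev   # we always get a sample
    return samples, accepted / nsamp

samples, acc_rate = metropolis_hastings(p, q, qdraw, nsamp=20000, xinit=1.0)
post_burnin = samples[5000:]   # discard burn-in before reading off moments
```

After burn-in, the sample mean and standard deviation should sit near the target's 5 and 1, a quick sanity check that the two-neuron story really samples the right distribution.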

I like Neuromorphic Engineering:-)

I should say that this post is inspired by a wonderful paper from Wolfgang Maass' group, which presents a much more realistic model of neural Monte Carlo:-) ("Neural Dynamics as Sampling: A Model for Stochastic Computation in Recurrent Networks of Spiking Neurons", Buesing et al., 2011, PLOS Computational Biology).

Also, keep in mind that real networks with which the brain computes look more like this:-)


