I was up all night coding analyses against the IBL open data API. I even got a simple application working with MLflow (a machine learning experiment tracking library). I am working on communication subspaces, applying the methodology from this paper to IBL data: https://pubmed.ncbi.nlm.nih.gov/30770252/ I built a data class with functions that convert raw spike times into the format used by the paper. Read on for more spontaneous thoughts and remarks on the process. Here's the repo with the code I am developing: https://github.com/mariakesa/CommunicationSubspaces
So what's up with the data? Some observations: simultaneous recordings are assigned experiment IDs (eids). You can query the data. Brain areas are assigned numerical codes, which can be translated into abbreviations. There are a lot of brain area pairs to analyze, as the binarized co-measurement matrix across brain areas shows.
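For the curious, here is a minimal sketch of how a binarized co-measurement matrix like this can be put together. It assumes you already have a mapping from eid to the brain-area abbreviations recorded in that session; the `sessions` dict and the `co_measurement` helper are made up for illustration and are not taken from the repo or from the IBL API.

```python
from itertools import combinations

# Hypothetical input: eid -> brain-area abbreviations recorded in that session.
sessions = {
    "eid-a": ["VISp", "CA1", "LP"],
    "eid-b": ["CA1", "DG"],
    "eid-c": ["VISp", "LP"],
}

def co_measurement(sessions):
    """Return sorted area names and a binary matrix (1 = areas share a session)."""
    areas = sorted({a for recorded in sessions.values() for a in recorded})
    index = {a: i for i, a in enumerate(areas)}
    matrix = [[0] * len(areas) for _ in areas]
    for recorded in sessions.values():
        # Mark every unordered pair of distinct areas seen together.
        for a, b in combinations(sorted(set(recorded)), 2):
            matrix[index[a]][index[b]] = 1
            matrix[index[b]][index[a]] = 1
    return areas, matrix

areas, matrix = co_measurement(sessions)
for name, row in zip(areas, matrix):
    print(f"{name:>5}", row)
```

The diagonal is left at 0 here, since a pair of one area with itself is not an interesting regression target; that is a design choice of the sketch, not something the data dictates.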
I am fairly certain I want to make a Streamlit app out of my analyses. I was going to make a heatmap of brain area pairs that you could click to select regression analyses between two brain areas, but as you can see, the data is too vast. I have to think of a way to make the co-occurrence matrix more navigable. One question is whether I can use NMF (non-negative matrix factorization) on the co-occurrence matrix, for example.
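To make the NMF idea concrete, here is a rough sketch on a toy matrix. It uses plain NumPy with the standard multiplicative update rules instead of a library implementation, and the 5x5 matrix is made-up data with two obvious groups of areas (diagonal set to 1 for the toy). Whether real co-occurrence data splits this cleanly is an open question.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (n x m) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the squared-error (Frobenius) objective.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy binary co-occurrence matrix: areas 0-2 form one group, areas 3-4 another.
V = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=float)

W, H = nmf(V, rank=2)
groups = W.argmax(axis=1)          # dominant component for each area
error = np.linalg.norm(V - W @ H)  # reconstruction error
print(groups, round(float(error), 3))
```

If the components turn out to be interpretable, each one could become a smaller heatmap in the app: pick a component first, then click a pair within it.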
The Communication Subspace paper pre-processes the spike trains from single neurons by subtracting the PSTH (peri-stimulus time histogram) from all the trials and then squashing them together into a single trial-to-trial fluctuation vector (standard deviation). This procedure is done over trials with the same stimulus. The Brainbox library had some useful functions for performing these procedures, but the documentation was difficult to understand, and it took me a while to implement these preprocessing steps. After hours of work I still have some bugs: some arrays go to NaNs, probably because there is a division by zero somewhere (neurons that emit no spikes during the trials).
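Here is a minimal NumPy sketch of the preprocessing as I understand it, not the Brainbox functions and not the code in the repo. It assumes spikes are already binned into an array of shape (trials, neurons, bins), and it assumes the NaNs come from dividing by a zero standard deviation. The names `psth_residuals` and `safe_zscore` are made up for this example.

```python
import numpy as np

def psth_residuals(counts, stim_ids):
    """Subtract each stimulus condition's PSTH from the trials of that condition.

    counts:   (n_trials, n_neurons, n_bins) binned spike counts
    stim_ids: (n_trials,) stimulus label for each trial
    """
    counts = np.asarray(counts, dtype=float)
    stim_ids = np.asarray(stim_ids)
    residuals = np.empty_like(counts)
    for stim in np.unique(stim_ids):
        mask = stim_ids == stim
        psth = counts[mask].mean(axis=0)       # trial average, (n_neurons, n_bins)
        residuals[mask] = counts[mask] - psth  # trial-to-trial fluctuations
    return residuals

def safe_zscore(residuals):
    """Divide each neuron by its residual std; silent neurons stay at 0, not NaN."""
    std = residuals.std(axis=(0, 2), keepdims=True)  # one value per neuron
    out = np.zeros_like(residuals)
    np.divide(residuals, std, out=out, where=std > 0)
    return out

# Toy data: 6 trials, 3 neurons, 4 time bins; neuron 2 never fires.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(6, 3, 4)).astype(float)
counts[:, 2, :] = 0
stim_ids = np.array([0, 0, 0, 1, 1, 1])

res = psth_residuals(counts, stim_ids)
z = safe_zscore(res)
# Stack trials and bins into rows: one (observations x neurons) matrix for regression.
X = res.transpose(0, 2, 1).reshape(-1, res.shape[1])
```

The `where=std > 0` argument is what keeps the silent neuron from turning into NaNs. Dropping such neurons before the regression would be another reasonable option.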
It's humbling to realize that implementing all of my ideas will take a lot of time and patience, but I think the payoff will be great, because I am already acquiring in-demand programming skills by playing around with serious data. It is very likely that I will start contributing to these open source projects. Tiring stuff, but great fun :)