Strange Idea

Update: https://www.cell.com/neuron/fulltext/S0896-6273(20)30518-3

https://www.cell.com/neuron/fulltext/S0896-6273(10)01081-0

Here's a weird idea. Can I encode an EEG signal as an fMRI signal (or an NP signal as a CaIm signal)? Or the other way around?

I've been trying my hand at EEG data, and when I do analysis I want to squeeze the temporal data together into a single dimension (for example, using a short-time Fourier transform). But when I work with fMRI data, which has poor temporal resolution, I want to expand it in the temporal dimension.

You know what they say. The grass is always greener on the other side. 
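As a concrete anchor for the "squeeze" half of this, here is a minimal sketch of compressing one EEG channel into a time-frequency signature with a short-time Fourier transform; the sampling rate, window length, and data are placeholders.

```python
# Minimal sketch: squeeze an EEG channel's temporal structure into a
# time-frequency signature with an STFT. Sampling rate and window length
# are assumptions, and the data is random.
import numpy as np
from scipy.signal import stft

fs = 250                        # assumed EEG sampling rate in Hz
eeg = np.random.randn(10 * fs)  # placeholder: one channel, 10 s of data

f, t, Z = stft(eeg, fs=fs, nperseg=fs)  # 1 s windows
power = np.abs(Z) ** 2                  # (n_freqs, n_windows) spectrogram
signature = power.mean(axis=1)          # collapse time into one spectral vector
```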


Chat GPT:

Cross-modal models are neural networks that are trained to understand and relate different modalities of data, such as images, text, audio, and EEG/fMRI signals. These models can be used for a variety of tasks, such as image captioning, text-to-speech synthesis, and audio-visual scene understanding.

One common way to build cross-modal models is to use an encoder-decoder architecture, where one modality is encoded into a compact representation, and then decoded into another modality. For example, an image captioning model would take an image as input, pass it through an encoder CNN to extract features, and then pass those features through a decoder RNN to generate a sentence describing the image.

Another approach is to use a shared representation, where multiple modalities are projected into a common latent space and then used for downstream tasks. For example, a model trained to predict gaze direction from both EEG and fMRI signals could project both modalities into a shared latent space, where the similarity of the representations for the same task could be computed.

Cross-modal models have been used in a wide range of applications, such as multimedia retrieval, human-computer interaction, and cognitive science. However, building such models can be challenging due to the diversity of the data and the lack of aligned training data across modalities.
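One simple stand-in for the "shared latent space" idea above, sketched here with canonical correlation analysis (sklearn's CCA) rather than a neural encoder; all shapes and data are synthetic placeholders.

```python
# Hedged sketch: project EEG and fMRI feature vectors into a common
# latent space with CCA. Everything here is a synthetic placeholder.
import numpy as np
from sklearn.cross_decomposition import CCA

n_samples, n_eeg_feat, n_fmri_feat, n_latent = 200, 64, 500, 10
eeg_feats  = np.random.randn(n_samples, n_eeg_feat)   # e.g. STFT features
fmri_feats = np.random.randn(n_samples, n_fmri_feat)  # e.g. voxel responses

cca = CCA(n_components=n_latent)
eeg_z, fmri_z = cca.fit_transform(eeg_feats, fmri_feats)

# eeg_z and fmri_z live in the same n_latent-dimensional space, so
# per-sample cross-modal similarity can be computed directly.
similarity = np.sum(eeg_z * fmri_z, axis=1)
```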

---

Me: okay, so you could have a single saliency map and have two regressions on top of it, one predicting fMRI and another predicting EEG. It's a kind of multi-task learning. Maybe you could do something interesting with these two sets of regression weights?


Chat GPT: Yes, that's a possibility. The idea of using a shared saliency map as a way to navigate between modalities is an interesting one. The shared saliency map could be a lower-dimensional representation of the visual features that drive attention, and the regression weights on top of it could be used to predict brain activity in both modalities. One potential approach would be to use a convolutional neural network (CNN) to extract visual features from the images and generate a saliency map, and then use this saliency map as input to separate regression models for the fMRI and EEG data.

One interesting aspect of this approach is that the shared saliency map could be used to align the two modalities in a way that is not possible with raw data. For example, you could use the saliency map to identify regions of the brain that are activated in response to specific visual features, and then use this information to analyze the EEG and fMRI data in a more meaningful way.

Another possibility would be to use the multi-task learning framework to learn a shared representation that is useful for both the fMRI and EEG regression tasks. This shared representation could be used to analyze the relationship between the two modalities, and could also be useful for other applications such as decoding brain activity or identifying neural markers of specific cognitive processes.

It is worth noting that this is a very challenging task and there are a lot of open questions to be answered, but it is worth exploring this idea further.
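Here is a rough sketch of that two-head setup: one shared saliency representation with a separate linear regression per modality. The feature dimensions, ridge penalty, and data are all made up; the saliency features could equally come from a CNN or a DoG filter.

```python
# Rough sketch of multi-task regression on a shared saliency representation:
# one ridge head predicts fMRI responses, another predicts EEG features.
import numpy as np
from sklearn.linear_model import Ridge

n_images, n_saliency_feat = 300, 1024
n_voxels, n_eeg_feat = 5000, 64

saliency = np.random.randn(n_images, n_saliency_feat)  # shared saliency features
fmri     = np.random.randn(n_images, n_voxels)         # fMRI responses per image
eeg      = np.random.randn(n_images, n_eeg_feat)       # EEG (e.g. FFT) responses

fmri_head = Ridge(alpha=1.0).fit(saliency, fmri)
eeg_head  = Ridge(alpha=1.0).fit(saliency, eeg)

W_fmri = fmri_head.coef_   # (n_voxels, n_saliency_feat)
W_eeg  = eeg_head.coef_    # (n_eeg_feat, n_saliency_feat)
# Comparing or combining W_fmri and W_eeg is where the interesting part starts.
```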

---

Me: Well, the question is clear: can we learn more about feature-driven saliency and attention by combining fMRI and EEG data using a convnet? fMRI is good in the spatial dimension and EEG in the temporal dimension. Can we create synthetic data based on these two modalities that is dense in both space and time? We can do that with sparse sensing, so why not with this approach?


---

Me: You didn't understand my idea, GPT, but it's okay :-) What I mean is: can we use the temporal information in EEG, compressed into an FFT signature (we can expand it back out after learning regression weights), to expand fMRI data in the temporal dimension?

....

GPT:

Yes, that's similar to PCA in that you're using a linear combination of basis vectors to represent the data, but instead of using the principal components of the data as the basis vectors, you're using the regression weights learned from the saliency map.

In terms of dimensionalities, let's say that the EEG data has shape (n_samples, n_timepoints, n_channels) and the fMRI data has shape (n_samples, n_voxels, n_timepoints). After learning the regression weights, you would have two matrices, one for the EEG data with shape (n_timepoints, n_weights) and one for the fMRI data with shape (n_voxels, n_weights). If you multiply the fMRI weight matrix by the transpose of the EEG weight matrix, you would get a new matrix with shape (n_voxels, n_timepoints). This new matrix would be a dense representation of the fMRI data in the temporal dimension, using the temporal information from the EEG data.
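A quick shape check of that product (with the transpose written out), using random placeholder arrays:

```python
# Shape check for the weight-matrix product described above.
import numpy as np

n_timepoints, n_voxels, n_weights = 64, 5000, 1024
W_eeg  = np.random.randn(n_timepoints, n_weights)  # EEG regression weights
W_fmri = np.random.randn(n_voxels, n_weights)      # fMRI regression weights

dense_fmri = W_fmri @ W_eeg.T   # (n_voxels, n_timepoints)
print(dense_fmri.shape)         # -> (5000, 64)
```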

---

GPT: Thank you, I'm glad I could help! Is there anything else you need assistance with?

Me: I think I'm happy with the status right now. Let's keep talking a bit later, when I've thought about it more. For example, there is the question of removing noise from the two recordings by using information from each other.

GPT: Yes, that's a good idea. One approach could be to use the information from one modality to denoise the other modality. For example, if you have a high-dimensional EEG signal and a low-dimensional fMRI signal, you could use the fMRI signal to denoise the EEG signal by projecting it onto the subspace spanned by the fMRI signal. Another approach could be to use a joint denoising method that utilizes information from both modalities to remove noise. This could be done using techniques such as blind source separation or independent component analysis.
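One possible reading of "projecting onto the subspace spanned by the other modality" is a least-squares projection of each EEG sample onto a basis derived from the fMRI data; the sketch below uses purely illustrative shapes and random data.

```python
# Hedged sketch: denoise EEG by projecting each sample onto a small basis
# (here imagined as fMRI-derived components). All data is random.
import numpy as np

n_samples, n_eeg_feat, n_basis = 200, 64, 10
eeg   = np.random.randn(n_samples, n_eeg_feat)
basis = np.random.randn(n_basis, n_eeg_feat)   # e.g. fMRI-derived components

# Least-squares coefficients of each EEG sample in the basis, then reconstruct.
coefs, *_ = np.linalg.lstsq(basis.T, eeg.T, rcond=None)
eeg_denoised = (basis.T @ coefs).T             # (n_samples, n_eeg_feat)
```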

---

On comparing PCA and ICA as saliency encoders

There are several publicly available gaze tracking datasets that include images and fixation data. Some examples include:

You can use these datasets to evaluate the performance of different saliency map algorithms by comparing the predicted saliency maps to the ground-truth fixations. One common metric for evaluating saliency map performance is the area under the receiver operating characteristic curve (AUC-ROC).
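A bare-bones version of that AUC evaluation: treat the saliency values at fixated pixels as positives and values at randomly sampled pixels as negatives. Real benchmarks use more careful variants, and everything below is a synthetic placeholder.

```python
# Minimal AUC-style evaluation of a saliency map against fixations.
import numpy as np
from sklearn.metrics import roc_auc_score

h, w = 240, 320
saliency = np.random.rand(h, w)           # predicted saliency map (placeholder)
fix_y = np.random.randint(0, h, 50)       # fixation coordinates (placeholder)
fix_x = np.random.randint(0, w, 50)

pos = saliency[fix_y, fix_x]                          # values at fixations
neg = saliency[np.random.randint(0, h, 50),
               np.random.randint(0, w, 50)]           # values at random pixels

labels = np.concatenate([np.ones_like(pos), np.zeros_like(neg)])
scores = np.concatenate([pos, neg])
print(roc_auc_score(labels, scores))
```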

---

(Got the baseline model of attention)

It is possible to use PCA and ICA on the images directly and then take the difference of Gaussians (DoG) to create a saliency map. The DoG filter can be used to enhance the edges and fine details in the image, which can be useful for attention-related tasks. However, it is important to note that the results will be highly dependent on the specific parameters of the filter, such as the size and standard deviation of the Gaussian kernels. Additionally, neural networks such as convolutional neural networks (CNNs) have been shown to be highly effective at encoding image features and attention-related information, and may be able to achieve better performance than a simple DoG filter. It would be interesting to compare the performance of a DoG filter baseline model with a CNN-based model on attention-related tasks.
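For reference, a bare-bones difference-of-Gaussians baseline looks something like the sketch below: blur the image at two scales and take the absolute difference as a crude saliency map. The kernel widths are arbitrary and would need tuning.

```python
# Crude DoG saliency baseline on a placeholder grayscale image.
import numpy as np
from scipy.ndimage import gaussian_filter

image = np.random.rand(240, 320)          # placeholder grayscale image

fine   = gaussian_filter(image, sigma=2)
coarse = gaussian_filter(image, sigma=8)
dog = np.abs(fine - coarse)
saliency = (dog - dog.min()) / (dog.max() - dog.min() + 1e-9)  # normalize to [0, 1]
```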
