
Multi-modal learning from separate per-modality datasets

Problem Detail: 

Let's say I'm trying to recognize emotion from multi-modal sources, in this case video and audio. However, I only have one dataset for video emotion and a separate one for audio. Is it possible to train a deep learning classifier on each dataset separately? One way I can imagine doing this is with some sort of shared hidden units, as in the figure below:

[figure: network with modality-specific inputs feeding shared hidden units]
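The shared-hidden-units idea can be sketched as two modality-specific encoders that project into a common hidden space, with one classifier on top. This is a minimal NumPy forward-pass sketch, not any published model; all dimensions and weight initializations are hypothetical, and biases are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 64-d audio features, 128-d video features,
# a 32-unit shared hidden layer, and 6 emotion classes.
D_AUDIO, D_VIDEO, D_SHARED, N_CLASSES = 64, 128, 32, 6

# Modality-specific projection weights plus shared classifier weights.
W_audio = rng.normal(0, 0.1, (D_AUDIO, D_SHARED))
W_video = rng.normal(0, 0.1, (D_VIDEO, D_SHARED))
W_out = rng.normal(0, 0.1, (D_SHARED, N_CLASSES))

def relu(x):
    return np.maximum(x, 0)

def forward(audio=None, video=None):
    """Map whichever modalities are present into the shared space,
    average them, and classify. A missing modality is simply skipped."""
    parts = []
    if audio is not None:
        parts.append(relu(audio @ W_audio))
    if video is not None:
        parts.append(relu(video @ W_video))
    shared = np.mean(parts, axis=0)
    return shared @ W_out  # unnormalized class scores

# A batch from the audio-only dataset and a batch from the video-only
# dataset can each drive updates to their own encoder plus W_out.
audio_batch = rng.normal(size=(4, D_AUDIO))
video_batch = rng.normal(size=(4, D_VIDEO))
print(forward(audio=audio_batch).shape)  # (4, 6)
print(forward(video=video_batch).shape)  # (4, 6)
```

The point of the sketch is that each single-modality dataset touches only its own encoder and the shared top weights, so the two datasets never need to be paired example-by-example.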

However, in the publication this is taken from, these weights are trained simultaneously.

Are there examples of this type of multi-modal learning in the literature? I'm not sure what to look up, and all I can come up with myself is "something-something Deep Belief Network".

Asked By : Seanny123
Answered By : Seanny123

Back in 2012, this was accomplished by Srivastava and Salakhutdinov in "Learning Representations for Multimodal Data with Deep Belief Nets", where they combined textual and visual features using the following Deep Belief Network architecture:

[figure: multi-modal DBN]

Note the specially defined hidden layer that can be trained independently.
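The "trained independently" part corresponds to greedy layer-wise RBM pretraining: each modality's pathway can be trained on its own unimodal dataset before anything joint happens. Below is a minimal one-step contrastive divergence (CD-1) sketch in NumPy, with biases omitted; it illustrates the idea only and is not the exact procedure from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.05):
    """CD-1 for a Bernoulli RBM (mean-field reconstructions, no biases).
    A minimal sketch, not a faithful reimplementation of the paper."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        h_pos = sigmoid(data @ W)
        # Negative phase: reconstruct visibles, then hiddens again.
        v_neg = sigmoid(h_pos @ W.T)
        h_neg = sigmoid(v_neg @ W)
        W += lr * (data.T @ h_pos - v_neg.T @ h_neg) / len(data)
    return W

# Each modality's RBM is trained on its own dataset, independently.
audio_data = (rng.random((100, 64)) > 0.5).astype(float)
video_data = (rng.random((100, 128)) > 0.5).astype(float)
W_audio = train_rbm(audio_data, n_hidden=32)
W_video = train_rbm(video_data, n_hidden=32)

# A top-level joint layer would then be trained on the concatenated
# hidden representations, once some paired data is available.
h_audio = sigmoid(audio_data @ W_audio)
h_video = sigmoid(video_data @ W_video)
joint_input = np.concatenate([h_audio, h_video], axis=1)
print(joint_input.shape)  # (100, 64)
```

Note the concatenation at the end is purely illustrative here, since these two toy datasets are not actually paired; in practice the joint layer is where paired (or inferred) cross-modal data becomes necessary.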

More recently, in 2015, this was accomplished by Huang, Y.; Wang, W.; and Wang, L. in "Unconstrained Multimodal Multi-Label Learning", where they apply a conditional Restricted Boltzmann Machine to generating text labels for images:

[figure: conditional Restricted Boltzmann Machine]

Note that each observed variable, t for tags and m for images, is connected to h, but there is a built-in assumption that m is more frequently available than t.

One problem with both of these approaches is that I'm unclear on how well they scale.
