This is a page for uploading papers and short snippets that summarize each paper and sketch out ideas. Presently, it will be used by Pulkit and Mayur to explore ideas for representing 3D worlds.
Sensory-motor Discussion (June 30)
A general algorithm for transforming signals from the visual to the motor domain. The simple test case: sparse coding on spatio-temporal data as a robot moves around the world. The robot has gyros/accelerometers (referred to as motion readings) attached to it. We wish to establish correlations between some of the learnt dictionary elements and the motion readings. For example, a dictionary element might be a vertical bar moving from the center to the periphery, which would be positively correlated with forward motion. The things to investigate are:
1. The difference in the learnt dictionary elements with and without motion cues. (One simple way of incorporating motion cues is to treat motion as an extra dimension.) For example, we might not get dictionary elements that directly correlate with forward/backward motion unless we put in motion cues. It will be interesting to investigate such things.
2. Next, when a video is shown to the agent, it should be able to infer forward or backward motion. This can also be broken down into inferring/reconstructing both the image and the gyro/accelerometer information that would produce that particular spatio-temporal input.
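A minimal sketch of point 1's "motion as an extra dimension" idea, using synthetic data and scikit-learn's `MiniBatchDictionaryLearning`: the patch size, number of frames, and 6-channel gyro/accelerometer layout are assumptions for illustration, not part of any agreed setup.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)

# Assumed dimensions: 8x8-pixel patches over 5 frames, plus 6 motion
# readings (3-axis gyro + 3-axis accelerometer) per spatio-temporal patch.
n_patches, patch_dim, motion_dim = 500, 8 * 8 * 5, 6
video_patches = rng.standard_normal((n_patches, patch_dim))
motion_readings = rng.standard_normal((n_patches, motion_dim))

# Treat motion as extra dimensions: concatenate readings onto each patch.
joint_data = np.hstack([video_patches, motion_readings])

dico = MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0)
codes = dico.fit_transform(joint_data)

# Each learnt atom now has a visual part and a motion part; correlating the
# motion part with, say, a forward-velocity channel would pick out atoms
# like the "vertical bar moving to the periphery" example above.
atoms = dico.components_
visual_part, motion_part = atoms[:, :patch_dim], atoms[:, patch_dim:]
print(codes.shape, atoms.shape)
```

On real data, the patches would come from the robot's camera and the motion block from its sensors; the random arrays here only fix the shapes.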
To Do: Erick has a setup with a robot equipped with a gyro and some other motion sensors. Talk to Erick and get an update on the setup we have for our experiment. Further, we need to read Charles's papers on spatio-temporal sparse coding.
Explore the idea of labeling surfaces in conjunction with motor data. This should naturally lead to expanding the set of surface types, which can later be classified into objects and such. This work will differ from work in computer vision in that it will also use motor information to intrinsically help label the surfaces.
Sensory-motor Discussion (July 4th)
Learning joint representations using multi-sensory data (or motor + vision).
Encode with multi-sensory data and test with only vision (say), to show that dictionaries learnt over the joint feature space are better.
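One way the "encode jointly, test with vision only" step could be sketched, assuming the joint dictionary splits column-wise into a visual block and a motor block (all dimensions and parameters below are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, SparseCoder

rng = np.random.default_rng(1)

# Assumed joint data: 100 visual dims + 6 motor dims per sample.
vis_dim, mot_dim = 100, 6
X = rng.standard_normal((400, vis_dim + mot_dim))

dico = MiniBatchDictionaryLearning(n_components=24, alpha=0.5, random_state=0)
dico.fit(X)
D = dico.components_                      # (24, 106) joint atoms

# Test time: only vision is available. Sparse-code against the visual
# columns of the dictionary, then use the motor columns to fill in the
# missing motor block from the inferred coefficients.
D_vis, D_mot = D[:, :vis_dim], D[:, vis_dim:]
coder = SparseCoder(dictionary=D_vis, transform_algorithm="lasso_lars",
                    transform_alpha=0.1)
x_vis = rng.standard_normal((1, vis_dim))
code = coder.transform(x_vis)             # coefficients inferred from vision
mot_hat = code @ D_mot                    # reconstructed motor reading
print(code.shape, mot_hat.shape)
```

This is also the mechanism behind point 2 of the June 30 notes: reconstructing the gyro/accelerometer signal that should accompany a given visual input.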
Blurry (poor visual acuity) images of different textures/surfaces, along with data such as frictional coefficient, elasticity, etc. Learn two dictionaries: a) feature space of textures; b) feature space of textures + surface properties.
Testing: if the second dictionary wins, we are set.
Princeton object dataset.
Two dictionaries: a) RGBD dictionary (3D); b) RGB dictionary (2D).
At test time, use only RGB inputs (discarding depth) and see which dictionary buys you better classification.
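A possible evaluation harness for this comparison, on synthetic stand-in data rather than the actual Princeton dataset; the patch dimensions, dictionary sizes, and the use of logistic regression as the downstream classifier are all assumptions.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, SparseCoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Stand-in for RGB-D patches: 48 RGB dims + 16 depth dims, binary labels.
n, rgb_dim, d_dim = 600, 48, 16
X_rgbd = rng.standard_normal((n, rgb_dim + d_dim))
y = rng.integers(0, 2, n)

# a) dictionary learnt on RGB-D; b) dictionary learnt on RGB alone.
dic_a = MiniBatchDictionaryLearning(n_components=20, random_state=0).fit(X_rgbd)
dic_b = MiniBatchDictionaryLearning(n_components=20, random_state=0).fit(
    X_rgbd[:, :rgb_dim])

def rgb_codes(D):
    # At test time depth is discarded: code RGB inputs against the RGB
    # columns of the given dictionary.
    coder = SparseCoder(dictionary=D[:, :rgb_dim],
                        transform_algorithm="lasso_lars", transform_alpha=0.1)
    return coder.transform(X_rgbd[:, :rgb_dim])

for name, dic in [("RGBD dict", dic_a), ("RGB dict", dic_b)]:
    codes = rgb_codes(dic.components_)
    tr_c, te_c, tr_y, te_y = train_test_split(codes, y, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(tr_c, tr_y).score(te_c, te_y)
    print(name, round(acc, 2))
```

On random data both accuracies are near chance; the point is only the shape of the experiment, with real RGB-D patches swapped in for `X_rgbd`.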
One suggestion for learning the dictionary is to use a discriminative clustering algorithm.