Related: invariance theory (Poggio), No Free Lunch Theorems (Wolpert 2002), Constraint Learning
Motivation: labeled data is scarce and labelling is expensive. The idea is to supervise learning through constraints over the output space, derived from prior domain knowledge. Experiments in the paper use real-world and simulated computer vision tasks. Results show that the approach works, but that encoding prior knowledge in loss functions is difficult.
- f: x -> y
- g: y -> z
- x: input image
- y: height of pillow
- z: predicted height of pillow from laws of physics
Restricted to problems where the output is complex, e.g. the height of a pillow, rather than binary classification
Structure in the outputs is modelled by a weighted constraints function that penalizes output structures inconsistent with prior knowledge
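The general shape of that training signal can be sketched as follows (function names and the non-negativity constraint are illustrative, not from the paper): a weighted sum of constraint penalties replaces the usual label-based loss.

```python
import numpy as np

def weighted_constraint_loss(outputs, constraints, weights):
    """outputs: model predictions for a batch
    constraints: list of functions, each mapping outputs to a
                 non-negative violation score
    weights: importance weight for each constraint"""
    return sum(w * c(outputs) for c, w in zip(constraints, weights))

# Example constraint (assumed for illustration): heights can't be negative.
nonneg = lambda y: np.maximum(-y, 0.0).sum()
```

No labels appear anywhere: supervision comes entirely from how strongly the outputs violate the encoded prior knowledge.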
Pillow height example:
- A ConvNet with a scalar output predicts the height.
- Take a sequence of N images and the ConvNet's prediction for each image; this gives a sequence of heights at fixed time intervals.
- Fit a parabola with fixed curvature (determined by gravity, a = 9.81 m/s²) to those height predictions as a function of time.
- The training error is the absolute error between the ConvNet output and the fitted parabola, summed over the N samples.
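Because the curvature is fixed by gravity, only the initial height and velocity remain free, so the fit reduces to linear least squares. A minimal sketch (function name and frame interval `dt` are illustrative):

```python
import numpy as np

def parabola_fit_loss(heights, dt=0.1, g=9.81):
    """Fit y(t) = y0 + v0*t - 0.5*g*t^2 (curvature fixed by gravity)
    to the ConvNet's height predictions, then return the summed
    absolute error between predictions and the fitted curve."""
    t = np.arange(len(heights)) * dt
    # Move the known quadratic term to the target side; the remaining
    # model y0 + v0*t is linear in (y0, v0), so solve by least squares.
    target = heights + 0.5 * g * t**2
    A = np.stack([np.ones_like(t), t], axis=1)
    (y0, v0), *_ = np.linalg.lstsq(A, target, rcond=None)
    fitted = y0 + v0 * t - 0.5 * g * t**2
    return np.abs(heights - fitted).sum()
```

Predictions that already lie on a free-fall trajectory incur near-zero loss; deviations from any such parabola are penalized.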
Evaluation is done by measuring the correlation between predicted heights and ground truth, because recovering the actual height would require knowing the distance of the object from the camera.
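A sketch of why correlation is the right metric here (function name is illustrative): Pearson correlation is invariant to scale and offset, so the unknown camera-distance scale factor drops out.

```python
import numpy as np

def eval_correlation(pred, true):
    # Pearson correlation; invariant to affine rescaling of pred,
    # so the unknown pixel-to-metres factor does not matter.
    return np.corrcoef(pred, true)[0, 1]
```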
More experiments follow with more complex constraints.
Bottom line: the primary constraint is necessary but not sufficient — a network that ignores the input and always emits some valid trajectory also satisfies it, so additional terms are needed. The work fits into the broader area of constraint learning.
Own ideas:
- For labels, encode prior knowledge in a similarity matrix, e.g. for MNIST: 0 and 8 are similar, 5 and 8 are similar, 1 and 7 are similar.
- For physics, what seems to work well is learning the high-level features in parallel to the targets.
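The similarity-matrix idea could be sketched like this (everything below is hypothetical, not from the paper: the similarity weight 0.3 and the soft-label scheme are assumptions):

```python
import numpy as np

NUM_CLASSES = 10
S = np.eye(NUM_CLASSES)
for a, b in [(0, 8), (5, 8), (1, 7)]:  # similar digit pairs from the notes
    S[a, b] = S[b, a] = 0.3            # assumed similarity weight

def soft_labels(y):
    """Turn a hard label y into a soft target by distributing some
    mass onto similar classes, normalized to a valid distribution."""
    row = S[y]
    return row / row.sum()
```

Such soft targets could then replace one-hot labels in a cross-entropy loss, so that confusing an 8 with a 0 is penalized less than confusing it with a 1.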