Distributed Machine Learning

Traditional machine learning and computer vision algorithms, particularly those that exploit probabilistic and learning-based approaches, are often formulated in centralized settings. However, modern computational environments are increasingly characterized by networks of peer-to-peer connected devices, each with local data-processing capabilities. A number of distributed algorithms have been proposed to address problems such as calibration, pose estimation, tracking, and object and activity recognition in large camera networks.

One critical challenge in distributed data analysis is dealing with missing data. In camera networks, different nodes have access to only a partial set of data features because of varying camera views or object movement. For instance, object points used for Structure from Motion tasks may be visible only in some cameras and only in particular object poses. As a consequence, different nodes are frequently exposed to missing data. However, most current distributed data analysis methods are algebraic in nature and cannot seamlessly handle such missing data.
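To make the missing-data pattern concrete, the following is a minimal toy sketch (the camera counts, point counts, and visibility rate are illustrative assumptions, not from any real dataset): a Structure-from-Motion-style measurement matrix stacks the (x, y) image coordinates of each point across cameras, and entries for points not visible in a given camera are simply absent, here marked with NaN. Purely algebraic factorizations such as a plain SVD cannot operate on such a matrix directly.

```python
import numpy as np

# Hypothetical toy setup: 4 cameras observing 6 scene points; point j
# is visible in camera i only when mask[i, j] is True.
rng = np.random.default_rng(1)
n_cams, n_pts = 4, 6
mask = rng.random((n_cams, n_pts)) < 0.7            # per-camera visibility
obs = rng.standard_normal((2 * n_cams, n_pts))      # stacked (x, y) image coords

# Each camera contributes two rows (x and y), so the visibility mask
# is repeated row-wise; NaN marks the entries that were never observed.
W = np.where(np.repeat(mask, 2, axis=0), obs, np.nan)

fraction_missing = np.isnan(W).mean()
```

A probabilistic model can marginalize over the NaN entries during inference, whereas an algebraic factorization would require them to be imputed or the rows/columns discarded.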

In this line of work we investigate methods for the estimation and learning of generative probabilistic models in a distributed context where some sensor data may be missing. In particular, we have shown how traditional centralized models, such as probabilistic PCA (PPCA), missing-data PPCA, and Bayesian PCA (BPCA), can be learned when the data is distributed across a network of sensors. The goal is to show that the accuracy of the learned probabilistic structure and motion models rivals that of traditional centralized factorization methods, while handling challenging situations such as missing or noisy observations.
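The general flavor of such an approach can be sketched as follows. This is a minimal, simplified illustration, not the method described above: it runs the standard EM algorithm for PPCA (Tipping and Bishop) on complete data, where each node computes E-step sufficient statistics from its local data only, and the network-wide sums of those statistics, which in a real deployment would be obtained by a consensus protocol over the network, drive an identical M-step at every node. The function name and interface are hypothetical.

```python
import numpy as np

def distributed_ppca(local_datasets, q, n_iters=50, seed=0):
    """Sketch of distributed EM for PPCA: each node computes local E-step
    sufficient statistics; their network-wide sums (computed exactly here,
    standing in for a consensus protocol) drive an identical M-step."""
    rng = np.random.default_rng(seed)
    d = local_datasets[0].shape[1]
    N = sum(X.shape[0] for X in local_datasets)
    # Global mean via one aggregation round over local sums.
    mu = sum(X.sum(axis=0) for X in local_datasets) / N
    W = rng.standard_normal((d, q))
    sigma2 = 1.0
    for _ in range(n_iters):
        M = W.T @ W + sigma2 * np.eye(q)        # q x q posterior precision
        Minv = np.linalg.inv(M)
        S1 = np.zeros((d, q))                   # sum_n (x_n - mu) E[z_n]^T
        S2 = np.zeros((q, q))                   # sum_n E[z_n z_n^T]
        Sx = 0.0                                # sum_n ||x_n - mu||^2
        for X in local_datasets:                # each node's local E-step
            Xc = X - mu
            Ez = Xc @ W @ Minv                  # posterior means of latents
            S1 += Xc.T @ Ez
            S2 += X.shape[0] * sigma2 * Minv + Ez.T @ Ez
            Sx += np.sum(Xc ** 2)
        # In a networked setting, (S1, S2, Sx) would be aggregated by
        # consensus; with exact sums the M-step is identical at every node.
        W = S1 @ np.linalg.inv(S2)
        sigma2 = (Sx - 2.0 * np.trace(W.T @ S1)
                  + np.trace(S2 @ W.T @ W)) / (N * d)
    return W, sigma2, mu
```

Because the M-step depends on the data only through the aggregated statistics, every node ends up with the same model without ever exchanging raw observations; extending the E-step to marginalize missing entries is what the missing-data PPCA variants address.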