We describe a robust and efficient method for automatically matching and time-aligning electronic slides to videos of the corresponding presentations. Matching electronic slides to videos enables new methods for indexing, searching, and browsing videos in distance-learning applications. However, robust automatic matching is challenging because of varied frame composition, slide distortion, camera movement, low-quality video capture, and arbitrary slide sequences. Our fully automatic approach combines image-based matching of slides to video frames with a temporal model for slide changes and camera events. To address these challenges, we begin by extracting scale-invariant feature transform (SIFT) keypoints from both slides and video frames and matching them subject to a consistent projective transformation (homography) using random sample consensus (RANSAC). We use the initial set of matches to construct a background model and a binary classifier that separates video frames showing slides from those that do not. We then introduce a new matching scheme that exploits less distinctive SIFT keypoints, enabling us to handle more difficult images. Finally, we improve on purely visual matching by using the estimated matching probabilities in a hidden Markov model (HMM) that integrates temporal information and detected camera operations. Detailed quantitative experiments characterize each part of our approach and demonstrate an average accuracy of over 95% on 13 presentation videos.
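The HMM step can be illustrated with a minimal Viterbi decoder. This is a sketch under stated assumptions, not the method described above: each hidden state is a slide, the emission score of a frame is an assumed per-frame matching probability produced by the image-matching stage, and the transition model (a high probability of staying on the same slide, a smaller one of advancing to the next, the remainder spread over all other jumps) is hypothetical. Camera events and the "no slide" class are omitted. The function name `viterbi_slide_sequence` and the parameters `p_stay` and `p_next` are illustrative only.

```python
import numpy as np


def viterbi_slide_sequence(match_probs, p_stay=0.9, p_next=0.08):
    """Most likely slide index for each frame under a simple slide-change HMM.

    match_probs: (T, S) array; entry [t, s] is an assumed probability that
        frame t shows slide s (e.g. from SIFT/RANSAC matching).
    p_stay, p_next: hypothetical probabilities of staying on the current slide
        or advancing to the next one; the rest is spread over all other slides.
    """
    match_probs = np.asarray(match_probs, dtype=float)
    T, S = match_probs.shape

    # Transition matrix A[i, j] = P(slide j at frame t+1 | slide i at frame t).
    A = np.zeros((S, S))
    for i in range(S):
        A[i, i] = p_stay
        if i + 1 < S:
            A[i, i + 1] = p_next
        others = [j for j in range(S) if j not in (i, i + 1)]
        remaining = 1.0 - A[i].sum()
        if others:
            A[i, others] += remaining / len(others)
        else:
            A[i, i] += remaining  # no other slide to jump to

    eps = 1e-12  # avoid log(0)
    log_A = np.log(A + eps)
    log_B = np.log(match_probs + eps)

    # Forward pass: best log score of any path ending in each slide.
    log_delta = np.log(1.0 / S) + log_B[0]  # uniform prior over the first slide
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = log_delta[:, None] + log_A  # rows: previous slide, cols: next slide
        backptr[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + log_B[t]

    # Backward pass: follow the stored pointers from the best final slide.
    path = [int(log_delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


# Toy example: 7 frames, 3 slides; frame 2 is ambiguous between slides 0 and 1.
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05],
    [0.40, 0.50, 0.10],
    [0.05, 0.90, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
    [0.05, 0.05, 0.90],
])
print(viterbi_slide_sequence(probs))  # -> [0, 0, 1, 1, 1, 2, 2]
```

Because staying on a slide is far more likely than switching, a frame whose matching scores alone are weak tends to inherit the label of its neighbours, and isolated spurious matches are suppressed in the same way.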
In this article, we introduce a new method for estimating camera sensitivity functions from spectral power input and camera response data. We also show how the procedure can be extended to handle camera nonlinearities. Linearization is an important part of camera characterization, and we argue that it is best to fit the linearization and the sensor response functions jointly. We compare our method with several others, both on synthetic data and in the characterization of a real camera. All data used in this study are available online at www.cs.sfu.ca/~colour/data. © 2002 Wiley Periodicals, Inc. Col. Res. Appl.
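To make the estimation problem concrete, here is a minimal sketch of a generic baseline, not the article's method: a camera channel's response is modeled as the spectral power of the stimulus times the sensitivity, summed over sampled wavelengths, and the sensitivity is recovered by least squares with a second-difference smoothness penalty (Tikhonov regularization). The sketch assumes the responses are already linearized; it does not perform the joint linearization fit argued for above. The function name `estimate_sensitivity`, the `smoothness` weight, and the synthetic data are all hypothetical.

```python
import numpy as np


def estimate_sensitivity(spectra, responses, smoothness=1e-3):
    """Estimate one channel's spectral sensitivity by regularized least squares.

    spectra:    (N, W) spectral power of N stimuli sampled at W wavelengths
    responses:  (N,) linear (already linearized) responses of one channel
    smoothness: weight of a second-difference penalty favouring smooth curves
    Solves  min_s ||spectra @ s - responses||^2 + smoothness * ||D s||^2.
    """
    W = spectra.shape[1]
    # D has rows e_k - 2 e_{k+1} + e_{k+2}: a discrete second derivative.
    D = np.diff(np.eye(W), n=2, axis=0)
    lhs = spectra.T @ spectra + smoothness * (D.T @ D)
    rhs = spectra.T @ responses
    return np.linalg.solve(lhs, rhs)


# Synthetic check: a Gaussian-shaped sensitivity sampled every 10 nm.
wavelengths = np.arange(400, 701, 10)  # 31 samples, 400..700 nm
true_s = np.exp(-0.5 * ((wavelengths - 550) / 40.0) ** 2)
rng = np.random.default_rng(0)
spectra = rng.uniform(0.0, 1.0, size=(100, wavelengths.size))
responses = spectra @ true_s + rng.normal(0.0, 1e-3, size=100)

est_s = estimate_sensitivity(spectra, responses)
print(np.abs(est_s - true_s).max())  # small reconstruction error
```

The random stimuli here make the problem well conditioned. Real illuminant and surface spectra are smooth and low-dimensional, which is exactly when the regularization term (and a careful choice of its weight) matters most.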
We propose a new method to measure the "visualness" of concepts, that is, the extent to which concepts have visual characteristics. Knowing which concepts have visually discriminative power is important for image annotation, especially automatic image annotation by image recognition systems, since not all concepts are related to visual content. Our method performs probabilistic region selection on images labeled as concept "X" or "non-X" and computes an entropy measure that represents the visualness of the concept. In the experiments, we collected about forty thousand images for 150 concepts from the World Wide Web using Google Image Search. We examined which concepts are suitable for annotating image content. Copyright © 2005 ACM.
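The intuition behind an entropy-based visualness score can be sketched as follows. This is an assumed, simplified formulation, not necessarily the exact measure used above: region selection is taken as already done, each selected "X" region has been assigned to one of K visual clusters, and the score is the entropy of the resulting cluster histogram divided by log K. Regions concentrated in a few clusters give a low score (a visually consistent concept); regions scattered evenly give a score near 1. The function name `normalized_region_entropy` and the example assignments are hypothetical.

```python
import numpy as np


def normalized_region_entropy(cluster_ids, n_clusters):
    """Entropy of a concept's selected regions over visual clusters, scaled to [0, 1].

    cluster_ids: cluster index (0 .. n_clusters-1, n_clusters >= 2) of each
        region selected as belonging to concept "X" (selection assumed done).
    Low values mean the regions fall into few clusters, i.e. the concept
    looks consistent and is plausibly "visual"; values near 1 mean the
    regions are spread evenly across all clusters.
    """
    counts = np.bincount(np.asarray(cluster_ids), minlength=n_clusters)
    p = counts / counts.sum()
    p = p[p > 0]  # by convention 0 * log 0 = 0
    return float(-(p * np.log(p)).sum() / np.log(n_clusters))


# Hypothetical region assignments over 8 clusters.
concentrated = [0] * 90 + [1] * 10   # regions fall into two clusters
spread = list(range(8)) * 100        # regions scattered over every cluster
print(normalized_region_entropy(concentrated, 8))  # about 0.16
print(normalized_region_entropy(spread, 8))        # about 1.0
```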
We propose a top-down approach for understanding indoor scenes such as bedrooms and living rooms. These environments typically have the Manhattan world property: many surfaces are parallel to three principal ones. Further, the 3D geometry of the room and the objects within it can largely be approximated by non-overlapping simple structures such as single blocks (e.g., the room boundary), thin blocks (e.g., picture frames), and objects that are well modeled by single blocks (e.g., simple beds). We separately model the 3D geometry, the imaging process (camera parameters), and the edge likelihood to provide a generative statistical model for image data. We fit this model using data-driven Markov chain Monte Carlo (MCMC) sampling. We combine reversible-jump Metropolis-Hastings sampling for discrete changes in the model, such as the number of blocks, with stochastic dynamics to estimate continuous parameter values in a parameter space that includes block positions, block sizes, and camera parameters. We tested our approach on two datasets, evaluating by room-box pixel orientation. Despite using only bounding-box geometry and, in particular, not training on appearance, our method achieves results approaching those of other methods. We also introduce a new evaluation method for this domain based on ground-truth camera parameters, which we found to be more sensitive to the task of understanding scene geometry. © 2011 IEEE.
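The core accept/reject mechanism of Metropolis-Hastings can be shown with a minimal sketch. It is deliberately much simpler than the sampler described above: it uses a plain random-walk Gaussian proposal over a fixed-size continuous parameter vector, with no reversible-jump moves (so the number of blocks cannot change), no data-driven proposals, and no stochastic dynamics. The toy Gaussian posterior over a hypothetical (block width, block height) pair stands in for the real geometry, camera, and edge-likelihood model; all names are illustrative.

```python
import numpy as np


def metropolis_hastings(log_post, x0, n_samples=20000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings over a continuous parameter vector.

    log_post: function returning the unnormalized log posterior of a vector
    x0:       starting parameter vector
    step:     standard deviation of the Gaussian proposal
    Returns the chain (n_samples, dim) and the acceptance rate.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_samples, x.size))
    accepted = 0
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal(x.size)
        lp_new = log_post(proposal)
        # The Gaussian proposal is symmetric, so the Hastings correction cancels
        # and we accept with probability min(1, post(proposal) / post(x)).
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = proposal, lp_new
            accepted += 1
        chain[i] = x
    return chain, accepted / n_samples


# Hypothetical toy posterior over (block width, block height):
# independent Gaussians centred at (2.0, 1.0) with standard deviation 0.3.
target_mean = np.array([2.0, 1.0])


def toy_log_post(theta):
    return -0.5 * np.sum(((theta - target_mean) / 0.3) ** 2)


chain, acc_rate = metropolis_hastings(toy_log_post, x0=[0.0, 0.0])
posterior_mean = chain[2000:].mean(axis=0)  # discard burn-in samples
print(posterior_mean, acc_rate)
```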
We propose four probabilistic generative models for jointly modeling gene expression levels and Gene Ontology (GO) tags. Unlike previous approaches to using GO tags, the joint modeling framework allows the two sources of information to complement and reinforce each other. We fit our models to three time-course datasets collected to study biological processes, specifically blood vessel growth (angiogenesis) and mitotic cell cycles. The proposed models produce a joint clustering of genes and GO annotations. Different models group genes based on their GO tags and their behavior over the entire time course, within biological stages, or even at individual time points. We show how such models can be used for de novo estimation of biological stage boundaries. We also evaluate our models by their accuracy in predicting the biological stage of held-out samples. Our results suggest that the models usually perform better when GO tag information is included.
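The idea of letting expression data and GO tags reinforce each other can be sketched with one simple joint model, which is an assumption for illustration and not necessarily any of the four models above: a mixture in which each latent cluster emits both a Gaussian expression profile (diagonal covariance) and independent Bernoulli GO-tag indicators, fitted by expectation-maximization (EM). Because both terms enter each gene's cluster responsibility, either source can sharpen the clustering. The function name `fit_joint_mixture` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_joint_mixture(expr, tags, k, n_iter=100, seed=0, var_floor=1e-3):
    """EM for a joint mixture of expression profiles and GO tags.

    Each gene g belongs to one latent cluster c. Given c, its expression
    profile expr[g] is Gaussian with diagonal covariance, and each of its
    binary GO-tag indicators tags[g, m] is an independent Bernoulli draw.
    expr: (G, T) float array, tags: (G, M) 0/1 array.
    Returns hard cluster labels (G,) and soft responsibilities (G, k).
    """
    rng = np.random.default_rng(seed)
    G = expr.shape[0]
    resp = rng.dirichlet(np.ones(k), size=G)  # random soft initialization
    for _ in range(n_iter):
        # M-step: mixing weights, Gaussian parameters, Bernoulli tag rates.
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = resp.T @ expr / nk[:, None]
        var = np.maximum(resp.T @ expr**2 / nk[:, None] - mu**2, var_floor)
        theta = np.clip(resp.T @ tags / nk[:, None], 1e-6, 1 - 1e-6)

        # E-step: log P(cluster) + log P(expression | cluster) + log P(tags | cluster).
        diff = expr[:, None, :] - mu[None, :, :]  # (G, k, T)
        log_expr = -0.5 * (np.log(2 * np.pi * var)[None] + diff**2 / var[None]).sum(axis=2)
        log_tags = tags @ np.log(theta).T + (1 - tags) @ np.log(1 - theta).T
        log_p = np.log(pi)[None, :] + log_expr + log_tags
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1), resp


# Synthetic data: 30 "rising" genes tagged mostly with GO terms 0-1,
# 30 "falling" genes tagged mostly with GO terms 3-4.
rng = np.random.default_rng(1)
profile = np.linspace(-2.0, 2.0, 6)
expr = np.vstack([profile + rng.normal(0, 0.3, (30, 6)),
                  -profile + rng.normal(0, 0.3, (30, 6))])
rate_a = np.array([0.9, 0.9, 0.1, 0.1, 0.1])
rate_b = rate_a[::-1]
tags = np.vstack([rng.random((30, 5)) < rate_a,
                  rng.random((30, 5)) < rate_b]).astype(float)

labels, _ = fit_joint_mixture(expr, tags, k=2)
print(labels)
```

Cluster indices are arbitrary, so the check of interest is that each synthetic group ends up in a single cluster and the two groups in different ones. Models that cluster within biological stages or at individual time points would restrict which columns of `expr` each cluster explains.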