Jacobus J Barnard
Publications
Abstract:
In this paper we introduce a new method for determining the relationship between signal spectra and camera RGB which is required for many applications in color. We work with the standard camera model, which assumes that the response is linear. We also provide an example of how the fitting procedure can be augmented to include fitting for a previously estimated non-linearity. The basic idea of our method is to minimize squared error subject to linear constraints, which enforce positivity and range of the result. It is also possible to constrain the smoothness, but we have found that it is better to add a regularization expression to the objective function to promote smoothness. With this method, smoothness and error can be traded against each other without being restricted by arbitrary bounds. The method is easily implemented as it is an example of a quadratic programming problem, for which there are many software solutions available. In this paper we provide the results using this method and others to calibrate a Sony DXC-930 CCD color video camera. We find that the method gives low error, while delivering sensors which are smooth and physically realizable. Thus we find the method superior to methods which ignore any of these considerations.
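As a rough illustration of the fitting idea in the abstract above (squared error plus a smoothness regularization term, with a positivity constraint), the following is a minimal sketch and not the paper's implementation. It makes several assumptions: the only constraint is non-negativity, smoothness is measured by squared second differences, and a simple projected gradient loop stands in for a quadratic programming solver. The names `fit_sensor`, `objective`, `smooth_weight`, and the toy data are hypothetical.

```python
def objective(spectra, responses, sensor, smooth_weight):
    """Squared fitting error plus a second-difference smoothness penalty."""
    fit = sum((sum(a * s for a, s in zip(row, sensor)) - r) ** 2
              for row, r in zip(spectra, responses))
    rough = sum((sensor[j - 1] - 2 * sensor[j] + sensor[j + 1]) ** 2
                for j in range(1, len(sensor) - 1))
    return fit + smooth_weight * rough


def fit_sensor(spectra, responses, smooth_weight=0.01, steps=2000):
    """Projected gradient descent for one sensor curve, constrained to be >= 0.

    Assumption: this loop replaces a real QP solver purely for illustration.
    """
    n = len(spectra[0])
    sensor = [0.0] * n
    # Conservative step size from an upper bound on the gradient's Lipschitz constant.
    lipschitz = 2.0 * (sum(a * a for row in spectra for a in row)
                       + 16.0 * smooth_weight)
    step = 1.0 / lipschitz
    for _ in range(steps):
        residuals = [sum(a * s for a, s in zip(row, sensor)) - r
                     for row, r in zip(spectra, responses)]
        # Gradient of the squared-error term.
        grad = [2.0 * sum(row[j] * e for row, e in zip(spectra, residuals))
                for j in range(n)]
        # Gradient of the smoothness term (squared second differences).
        for j in range(1, n - 1):
            d = sensor[j - 1] - 2 * sensor[j] + sensor[j + 1]
            grad[j - 1] += 2.0 * smooth_weight * d
            grad[j] -= 4.0 * smooth_weight * d
            grad[j + 1] += 2.0 * smooth_weight * d
        # Gradient step, then projection onto the non-negative orthant.
        sensor = [max(0.0, s - step * g) for s, g in zip(sensor, grad)]
    return sensor


# Toy data (hypothetical): five wavelength samples, six input spectra.
true_sensor = [0.0, 0.5, 1.0, 0.5, 0.0]
spectra = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
spectra.append([1.0] * 5)
responses = [sum(a * s for a, s in zip(row, true_sensor)) for row in spectra]

estimate = fit_sensor(spectra, responses)
```

Raising `smooth_weight` trades fitting error for a smoother curve, which is the trade-off the abstract describes. In practice a dedicated quadratic programming or bounded least-squares routine would likely be preferable to this hand-written loop.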
Abstract:
We discuss a system for learning the semantics of image collections from image features and associated text, and explore its application to digital image libraries. We consider the nature of search and browsing, and argue that for many applications the two should be used together.
Abstract:
We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images.
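The abstract above mentions Metropolis-Hastings as one of the moves used to explore instance parameters. As a generic illustration only, here is a minimal random-walk Metropolis-Hastings sketch for a single scalar parameter. The target `toy_log_posterior` (a Gaussian centred at 2.0), the function name `metropolis_hastings`, and all parameter values are assumptions made for this example. The paper's actual imaging likelihood, trans-dimensional moves, and stochastic dynamics are not reproduced.

```python
import math
import random


def toy_log_posterior(x):
    """Hypothetical target: unnormalized log density of N(2.0, 0.5^2)."""
    return -0.5 * ((x - 2.0) / 0.5) ** 2


def metropolis_hastings(log_density, start, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p_old).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # The current state is recorded whether or not the move was accepted.
        samples.append(x)
    return samples


samples = metropolis_hastings(toy_log_posterior, start=0.0, n_samples=20000)
posterior_mean = sum(samples[2000:]) / len(samples[2000:])  # discard burn-in
```

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of target densities. Working in log space avoids numerical underflow when the densities are very small.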
Abstract:
We present an overview of a new paradigm for tackling long-standing computer vision problems. Specifically, our approach is to build statistical models which translate from visual representations (images) to semantic ones (associated text). As providing optimal text for training is difficult at best, we propose working with whatever associated text is available in large quantities. Examples include large image collections with keywords, museum image collections with descriptive text, news photos, and images on the web. In this paper we discuss how the translation approach can give a handle on difficult questions such as: What counts as an object? Which objects are easy to recognize and which are hard? Which objects are indistinguishable using our features? How can low-level vision processes, such as feature-based segmentation, be integrated with high-level processes such as grouping? We also summarize some of the models proposed for translating from visual information to text, and some of the methods used to evaluate their performance.
Abstract:
We extend a recently developed method for learning the semantics of image databases using text and pictures. We incorporate statistical natural language processing in order to deal with free text. We demonstrate the current system on a difficult dataset, namely 10,000 images of work from the Fine Arts Museum of San Francisco. The images include line drawings, paintings, and pictures of sculpture and ceramics. Many of the images have associated free text whose content varies greatly, from physical description to interpretation and mood. We use WordNet to provide semantic grouping information and to help disambiguate word senses, as well as to emphasize the hierarchical nature of semantic relationships. This allows us to impose a natural structure on the image collection that reflects semantics to a considerable degree. Our method produces a joint probability distribution for words and picture elements. We demonstrate that this distribution can be used (a) to provide illustrations for given captions and (b) to generate words for images outside the training set. Results from this annotation process yield a quantitative study of our method. Finally, our annotation process can be seen as a form of object recognizer that has been learned through a partially supervised process.