We present an overview of a new paradigm for tackling long-standing computer vision problems. Specifically, our approach is to build statistical models that translate from visual representations (images) to semantic ones (associated text). As providing optimal text for training is difficult at best, we propose working with whatever associated text is available in large quantities. Examples include large image collections with keywords, museum image collections with descriptive text, news photos, and images on the web. In this paper we discuss how the translation approach can give a handle on difficult questions such as: What counts as an object? Which objects are easy to recognize and which are hard? Which objects are indistinguishable using our features? How can low-level vision processes, such as feature-based segmentation, be integrated with high-level processes such as grouping? We also summarize some of the models proposed for translating from visual information to text, and some of the methods used to evaluate their performance.
We extend a recently developed method for learning the semantics of image databases using text and pictures. We incorporate statistical natural language processing in order to deal with free text. We demonstrate the current system on a difficult dataset, namely 10,000 images of work from the Fine Arts Museum of San Francisco. The images include line drawings, paintings, and pictures of sculpture and ceramics. Many of the images have associated free text whose content varies greatly, from physical description to interpretation and mood. We use WordNet to provide semantic grouping information and to help disambiguate word senses, as well as to emphasize the hierarchical nature of semantic relationships. This allows us to impose a natural structure on the image collection that reflects semantics to a considerable degree. Our method produces a joint probability distribution for words and picture elements. We demonstrate that this distribution can be used (a) to provide illustrations for given captions and (b) to generate words for images outside the training set. Results from this annotation process yield a quantitative study of our method. Finally, our annotation process can be seen as a form of object recognizer that has been learned through a partially supervised process.
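To make the idea of a joint distribution over words and picture elements concrete, here is a minimal Python sketch. It is not the hierarchical model the abstract describes: it assumes picture elements have already been reduced to discrete region labels (a hypothetical preprocessing step) and estimates the joint distribution from raw co-occurrence counts. The names `fit_joint` and `annotate` are illustrative only.

```python
from collections import Counter, defaultdict


def fit_joint(training):
    """Estimate P(word, region) from co-occurrence counts.

    training: iterable of (region_labels, words) pairs, one per image.
    Region labels are assumed to be discrete tokens (an assumption,
    not something the abstract specifies).
    """
    counts = Counter()
    for regions, words in training:
        for region in regions:
            for word in words:
                counts[(word, region)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def annotate(joint, regions, top_k=2):
    """Rank words for an image by summing P(word, region) over its regions."""
    scores = defaultdict(float)
    for (word, region), p in joint.items():
        if region in regions:
            scores[word] += p
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


training = [
    (["blue", "rough"], ["sky", "sea"]),
    (["blue"], ["sky"]),
    (["green"], ["grass"]),
]
joint = fit_joint(training)
print(annotate(joint, {"blue"}))
```

The same table can be read in the other direction (ranking regions for a given word), which corresponds loosely to the illustration use case mentioned in the abstract.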
PMID: 22917928; PMCID: PMC3529353; Abstract:
The actin-bundling protein fascin is a key mediator of tumor invasion and metastasis and its activity drives filopodia formation, cell-shape changes and cell migration. Small-molecule inhibitors of fascin block tumor metastasis in animal models. Conversely, fascin deficiency might underlie the pathogenesis of some developmental brain disorders. To identify fascin-pathway modulators we devised a cell-based assay for fascin function and used it in a bidirectional drug screen. The screen utilized cultured fascin-deficient mutant Drosophila neurons, whose neurite arbors manifest the 'filagree' phenotype. Taking a repurposing approach, we screened a library of 1040 known compounds, many of them FDA-approved drugs, for filagree modifiers. Based on scaffold distribution, molecular-fingerprint similarities, and chemical-space distribution, this library has high structural diversity, supporting its utility as a screening tool. We identified 34 fascin-pathway blockers (with potential anti-metastasis activity) and 48 fascin-pathway enhancers (with potential cognitive-enhancer activity). The structural diversity of the active compounds suggests multiple molecular targets. Comparisons of active and inactive compounds provided preliminary structure-activity relationship information. The screen also revealed diverse neurotoxic effects of other drugs, notably the 'beads-on-a-string' defect, which is induced solely by statins. Statin-induced neurotoxicity is enhanced by fascin deficiency. In summary, we provide evidence that primary neuron culture using a genetic model organism can be valuable for early-stage drug discovery and developmental neurotoxicity testing. Furthermore, we propose that, given an appropriate assay for target-pathway function, bidirectional screening for brain-development disorders and invasive cancers represents an efficient, multipurpose strategy for drug discovery. © 2012. Published by The Company of Biologists Ltd.
We present a statistical learning approach for finding recreational trails in aerial images. While the problem of recognizing relatively straight and well-defined roadways in digital images has been well studied in the literature, the more difficult problem of extracting trails has received no attention. However, trails and rough roads are less likely to be adequately mapped, and change more rapidly over time. Automated tools for finding trails will be useful to cartographers, recreational users and governments. In addition, the methods developed here are applicable to the more general problem of finding linear structure. Our approach combines local estimates for image pixel trail probabilities with the global constraint that such pixels must link together to form a path. For the local part, we present results using three classification techniques. To construct a global solution (a trail) from these probabilities, we propose a global cost function that includes both global probability and path length. We show that the addition of a length term significantly improves trail-finding ability. However, computing the optimal trail becomes intractable, as known dynamic programming methods do not apply. Thus we describe a new splitting heuristic based on Dijkstra's algorithm. We then further improve upon the results with a trail sampling scheme. We test our approach on 500 challenging images along the 2500-mile Continental Divide mountain bike trail, where assumptions prevalent in the road literature are violated. © 2008 IEEE.
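As a rough illustration of linking local pixel probabilities into a global path, the following Python sketch runs plain Dijkstra over a 4-connected grid of trail probabilities. The cost used here, -log p per pixel plus a constant per-step penalty `length_weight`, is an assumption chosen for illustration. Because it is additive, ordinary Dijkstra solves it exactly, so it is evidently simpler than the cost function in the abstract, and it is neither the splitting heuristic nor the sampling scheme the authors describe. The names `min_cost_path` and `length_weight` are hypothetical.

```python
import heapq
import math


def min_cost_path(prob, start, goal, length_weight=0.1):
    """Dijkstra over a 4-connected pixel grid.

    prob: list of rows of per-pixel trail probabilities in [0, 1].
    Entering pixel (r, c) costs -log(prob[r][c]) + length_weight
    (an illustrative additive cost, not the paper's).
    Returns (path, cost), where path is a list of (row, col) tuples.
    """
    rows, cols = len(prob), len(prob[0])
    eps = 1e-9  # guards against log(0)
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                step = -math.log(max(prob[nr][nc], eps)) + length_weight
                nd = d + step
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return None, math.inf
    # Walk predecessor links back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[goal]


grid = [
    [0.9, 0.1, 0.1],
    [0.9, 0.1, 0.1],
    [0.9, 0.9, 0.9],
]
path, cost = min_cost_path(grid, (0, 0), (2, 2))
print(path, round(cost, 4))
```

Raising `length_weight` favors shorter paths even when they cross lower-probability pixels, which is the trade-off a length term introduces.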
In recent years, research has been devoted to relevance feedback as an effective way to improve the performance of image similarity search. However, few methods using relevance feedback are currently available to perform relatively complex queries on large image databases. In the case of complex image queries, images with relevant concepts are often scattered across several visual regions in the feature space. A query must therefore be represented by multiple regions in the feature space, which makes it necessary to handle disjunctive queries. In this paper, we propose a new adaptive classification and cluster-merging method to find the multiple regions of a complex image query and their arbitrary shapes. Our method achieves the same high retrieval quality regardless of the shapes of query regions, since the measures used in our method are invariant under linear transformations. Extensive experiments show that the result of our method converges quickly to the user's true information need, and that the retrieval quality of our method is about 22% better in recall and 20% better in precision than that of the query expansion approach, and about 35% better in recall and about 31% better in precision than that of the query point movement approach, in MARS. © 2005 Elsevier Inc. All rights reserved.
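One way to picture a disjunctive, multi-region query is sketched below in Python. Each query region is summarized by a mean and a covariance matrix, and an image is scored by its distance to the nearest region. The squared Mahalanobis distance is used because it is a standard measure that is invariant under invertible linear transformations; the abstract does not name its measures, so this choice is an assumption, as are the restriction to 2-D features and the names `mahalanobis_sq`, `disjunctive_distance` and `rank`.

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance for 2-D points; cov is a 2x2 matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance must be invertible")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def disjunctive_distance(x, clusters):
    """Distance to a multi-region query: the nearest region decides."""
    return min(mahalanobis_sq(x, mean, cov) for mean, cov in clusters)


def rank(images, clusters):
    """Sort image names by increasing distance to the disjunctive query."""
    return sorted(images, key=lambda name: disjunctive_distance(images[name], clusters))


# Two hypothetical query regions: (mean, covariance).
clusters = [
    ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    ((10.0, 10.0), ((4.0, 0.0), (0.0, 4.0))),
]
images = {"a": (1.0, 0.0), "b": (10.0, 14.0), "c": (5.0, 5.0)}
print(rank(images, clusters))
```

Taking the minimum over regions is what makes the query disjunctive: an image is relevant if it is close to any one region, rather than to a single averaged query point.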