Automatically discovering the semantic structure of tagged visual data (e.g. web videos and images) is important for visual data analysis and interpretation, enabling machine intelligence to effectively process the fast-growing volume of multimedia data. However, this task is non-trivial because it requires jointly learning the underlying correlations between heterogeneous visual and tag data, and it is made more challenging by inherently sparse and incomplete tags. In this work, we develop a method for modelling the inherent concept structures of visual data based on a novel Hierarchical-Multi-Label Random Forest model capable of correlating structured visual and tag information, so as to interpret visual semantics more accurately, e.g. by disclosing meaningful visual groups that share high-level concepts and by recovering missing tags for individual visual data samples. Specifically, our model exploits hierarchically structured tags of varying semantic abstractness and multiple tag statistical correlations, in addition to modelling visual and tag interactions. As a result, our model discovers more accurate semantic correlations between textual tags and visual features, and ultimately provides a favourable interpretation of visual semantics even with highly sparse and incomplete tags. We demonstrate the advantages of our proposed approach in two fundamental applications, visual data clustering and missing tag completion, on benchmark video (i.e. TRECVID MED 2011) and image (i.e. NUS-WIDE) datasets.
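To make the missing tag completion setting concrete, the following is a minimal sketch using a generic multi-output random forest on synthetic data. It is not the paper's Hierarchical-Multi-Label Random Forest; the features, tags, and split sizes are invented stand-ins, and the sketch only illustrates the general idea of predicting a sample's missing binary tags jointly from its visual features.

```python
# Illustrative sketch only: a generic multi-label random forest used to
# recover missing tags from visual features. This is NOT the paper's
# HML-RF model; all data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic "visual features": 200 samples, 16 dimensions.
X = rng.normal(size=(200, 16))

# Synthetic binary tag matrix (2 tags), driven by the features so the
# forest has genuine signal to learn from.
Y = np.stack([(X[:, 0] > 0).astype(int),
              (X[:, 1] + X[:, 2] > 0).astype(int)], axis=1)

# Simulate incomplete annotation: tags are hidden for the last 50 samples.
train_idx, test_idx = np.arange(150), np.arange(150, 200)

# A multi-output forest predicts all tags jointly per tree, loosely
# analogous to exploiting correlations between tags.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X[train_idx], Y[train_idx])

# "Complete" the missing tags for the held-out samples.
Y_pred = forest.predict(X[test_idx])
accuracy = (Y_pred == Y[test_idx]).mean()
print(f"tag recovery accuracy: {accuracy:.2f}")
```

The point of the joint (multi-output) formulation, as opposed to training one independent classifier per tag, is that a single tree's splits must be useful for all tags at once, which implicitly couples correlated tags.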
- Visual semantic structure;
- Tag hierarchy;
- Tag correlation;
- Sparse tags;
- Incomplete tags;
- Data clustering;
- Missing tag completion;
- Random forest
© 2017 Published by Elsevier B.V.
Authors: Jingya Wang, Xiatian Zhu, Shaogang Gong
Publication: Artificial Intelligence (ScienceDirect) http://ift.tt/2qOZeaE