Paper in IEEE WACV (2017): “Complex Event Recognition from Images with Few Training Examples”

March 27th, 2017 Irfan Essa Posted in Computational Journalism, Computational Photography and Video, Computer Vision, PAMI/ICCV/CVPR/ECCV, Papers, Unaiza Ahsan No Comments »

Paper

  • U. Ahsan, C. Sun, J. Hays, and I. Essa (2017), “Complex Event Recognition from Images with Few Training Examples,” in IEEE Winter Conference on Applications of Computer Vision (WACV), 2017. [PDF] [arXiv] [BIBTEX]
    @InProceedings{    2017-Ahsan-CERFIWTE,
      arxiv    = {https://arxiv.org/abs/1701.04769},
      author  = {Unaiza Ahsan and Chen Sun and James Hays and Irfan
          Essa},
      booktitle  = {IEEE Winter Conference on Applications of Computer
          Vision (WACV)},
      month    = {March},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2017-Ahsan-CERFIWTE.pdf},
      title    = {Complex Event Recognition from Images with Few
          Training Examples},
      year    = {2017}
    }

Abstract

We propose to leverage concept-level representations for complex event recognition in photographs given limited training examples. We introduce a novel framework to discover event concept attributes from the web and use that to extract semantic features from images and classify them into social event categories with few training examples. Discovered concepts include a variety of objects, scenes, actions and event subtypes, leading to a discriminative and compact representation for event images. Web images are obtained for each discovered event concept and we use (pre-trained) CNN features to train concept classifiers. Extensive experiments on challenging event datasets demonstrate that our proposed method outperforms several baselines using deep CNN features directly in classifying images into events with limited training examples. We also demonstrate that our method achieves the best overall accuracy on a data set with unseen event categories using a single training example.

AddThis Social Bookmark Button

Paper (WACV 2016) “Discovering Picturesque Highlights from Egocentric Vacation Videos”

March 7th, 2016 Irfan Essa Posted in Computational Photography and Video, Computer Vision, Daniel Castro, PAMI/ICCV/CVPR/ECCV, Vinay Bettadapura No Comments »

Paper

  • D. Castro, V. Bettadapura, and I. Essa (2016), “Discovering Picturesque Highlights from Egocentric Vacation Video,” in Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV), 2016. [PDF] [WEBSITE] [arXiv] [BIBTEX]
    @InProceedings{    2016-Castro-DPHFEVV,
      arxiv    = {http://arxiv.org/abs/1601.04406},
      author  = {Daniel Castro and Vinay Bettadapura and Irfan
          Essa},
      booktitle  = {Proceedings of IEEE Winter Conference on
          Applications of Computer Vision (WACV)},
      month    = {March},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2016-Castro-DPHFEVV.pdf},
      title    = {Discovering Picturesque Highlights from Egocentric
          Vacation Video},
      url    = {http://www.cc.gatech.edu/cpl/projects/egocentrichighlights/},
      year    = {2016}
    }

Abstract

2016-Castro-DPHFEVVWe present an approach for identifying picturesque highlights from large amounts of egocentric video data. Given a set of egocentric videos captured over the course of a vacation, our method analyzes the videos and looks for images that have good picturesque and artistic properties. We introduce novel techniques to automatically determine aesthetic features such as composition, symmetry, and color vibrancy in egocentric videos and rank the video frames based on their photographic qualities to generate highlights. Our approach also uses contextual information such as GPS, when available, to assess the relative importance of each geographic location where the vacation videos were shot. Furthermore, we specifically leverage the properties of egocentric videos to improve our highlight detection. We demonstrate results on a new egocentric vacation dataset which includes 26.5 hours of videos taken over a 14-day vacation that spans many famous tourist destinations and also provide results from a user-study to access our results.

 

AddThis Social Bookmark Button

Paper in WACV (2015): “Semantic Instance Labeling Leveraging Hierarchical Segmentation”

January 6th, 2015 Irfan Essa Posted in Computer Vision, Henrik Christensen, PAMI/ICCV/CVPR/ECCV, Papers, Robotics, Steven Hickson No Comments »

Paper

  • S. Hickson, I. Essa, and H. Christensen (2015), “Semantic Instance Labeling Leveraging Hierarchical Segmentation,” in Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV), 2015. [PDF] [DOI] [BIBTEX]
    @InProceedings{    2015-Hickson-SILLHS,
      author  = {Steven Hickson and Irfan Essa and Henrik
          Christensen},
      booktitle  = {Proceedings of IEEE Winter Conference on
          Applications of Computer Vision (WACV)},
      doi    = {10.1109/WACV.2015.147},
      month    = {January},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2015-Hickson-SILLHS.pdf},
      publisher  = {IEEE Computer Society},
      title    = {Semantic Instance Labeling Leveraging Hierarchical
          Segmentation},
      year    = {2015}
    }

Abstract

Most of the approaches for indoor RGBD semantic labeling focus on using pixels or superpixels to train a classifier. In this paper, we implement a higher level segmentation using a hierarchy of superpixels to obtain a better segmentation for training our classifier. By focusing on meaningful segments that conform more directly to objects, regardless of size, we train a random forest of decision trees as a classifier using simple features such as the 3D size, LAB color histogram, width, height, and shape as specified by a histogram of surface normals. We test our method on the NYU V2 depth dataset, a challenging dataset of cluttered indoor environments. Our experiments using the NYU V2 depth dataset show that our method achieves state of the art results on both a general semantic labeling introduced by the dataset (floor, structure, furniture, and objects) and a more object specific semantic labeling. We show that training a classifier on a segmentation from a hierarchy of super pixels yields better results than training directly on super pixels, patches, or pixels as in previous work.2015-Hickson-SILLHS_pdf

AddThis Social Bookmark Button

Paper in IEEE WACV (2015): “Finding Temporally Consistent Occlusion Boundaries using Scene Layout”

January 6th, 2015 Irfan Essa Posted in Computational Photography and Video, Computer Vision, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers, S. Hussain Raza, Uncategorized No Comments »

Paper

  • S. H. Raza, A. Humayun, M. Grundmann, D. Anderson, and I. Essa (2015), “Finding Temporally Consistent Occlusion Boundaries using Scene Layout,” in Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV), 2015. [PDF] [DOI] [BIBTEX]
    @InProceedings{    2015-Raza-FTCOBUSL,
      author  = {Syed Hussain Raza and Ahmad Humayun and Matthias
          Grundmann and David Anderson and Irfan Essa},
      booktitle  = {Proceedings of IEEE Winter Conference on
          Applications of Computer Vision (WACV)},
      doi    = {10.1109/WACV.2015.141},
      month    = {January},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2015-Raza-FTCOBUSL.pdf},
      publisher  = {IEEE Computer Society},
      title    = {Finding Temporally Consistent Occlusion Boundaries
          using Scene Layout},
      year    = {2015}
    }

Abstract

We present an algorithm for finding temporally consistent occlusion boundaries in videos to support segmentation of dynamic scenes. We learn occlusion boundaries in a pairwise Markov random field (MRF) framework. We first estimate the probability of a spatiotemporal edge being an occlusion boundary by using appearance, flow, and geometric features. Next, we enforce occlusion boundary continuity in an MRF model by learning pairwise occlusion probabilities using a random forest. Then, we temporally smooth boundaries to remove temporal inconsistencies in occlusion boundary estimation. Our proposed framework provides an efficient approach for finding temporally consistent occlusion boundaries in video by utilizing causality, redundancy in videos, and semantic layout of the scene. We have developed a dataset with fully annotated ground-truth occlusion boundaries of over 30 videos (∼5000 frames). This dataset is used to evaluate temporal occlusion boundaries and provides a much-needed baseline for future studies. We perform experiments to demonstrate the role of scene layout, and temporal information for occlusion reasoning in video of dynamic scenes.

AddThis Social Bookmark Button

Paper in IEEE WACV (2015): “Leveraging Context to Support Automated Food Recognition in Restaurants”

January 6th, 2015 Irfan Essa Posted in Activity Recognition, Computer Vision, Edison Thomaz, First Person Computing, Gregory Abowd, Mobile Computing, PAMI/ICCV/CVPR/ECCV, Papers, Ubiquitous Computing, Uncategorized, Vinay Bettadapura No Comments »

Paper

  • V. Bettadapura, E. Thomaz, A. Parnami, G. Abowd, and I. Essa (2015), “Leveraging Context to Support Automated Food Recognition in Restaurants,” in Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV), 2015. [PDF] [WEBSITE] [DOI] [arXiv] [BIBTEX]
    @InProceedings{    2015-Bettadapura-LCSAFRR,
      arxiv    = {http://arxiv.org/abs/1510.02078},
      author  = {Vinay Bettadapura and Edison Thomaz and Aman
          Parnami and Gregory Abowd and Irfan Essa},
      booktitle  = {Proceedings of IEEE Winter Conference on
          Applications of Computer Vision (WACV)},
      doi    = {10.1109/WACV.2015.83},
      month    = {January},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2015-Bettadapura-LCSAFRR.pdf},
      publisher  = {IEEE Computer Society},
      title    = {Leveraging Context to Support Automated Food
          Recognition in Restaurants},
      url    = {http://www.vbettadapura.com/egocentric/food/},
      year    = {2015}
    }

 

Abstract

The pervasiveness of mobile cameras has resulted in a dramatic increase in food photos, which are pictures reflecting what people eat. In this paper, we study how taking pictures of what we eat in restaurants can be used for the purpose of automating food journaling. We propose to leverage the context of where the picture was taken, with additional information about the restaurant, available online, coupled with state-of-the-art computer vision techniques to recognize the food being consumed. To this end, we demonstrate image-based recognition of foods eaten in restaurants by training a classifier with images from restaurant’s online menu databases. We evaluate the performance of our system in unconstrained, real-world settings with food images taken in 10 restaurants across 5 different types of food (American, Indian, Italian, Mexican and Thai).food-poster

AddThis Social Bookmark Button