Paper in IEEE CVPR 2012: “Detecting Regions of Interest in Dynamic Scenes with Camera Motions”

June 16th, 2012 Irfan Essa Posted in Activity Recognition, Kihwan Kim, Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, PERSEAS, Visual Surveillance

Detecting Regions of Interest in Dynamic Scenes with Camera Motions

  • K. Kim, D. Lee, and I. Essa (2012), “Detecting Regions of Interest in Dynamic Scenes with Camera Motions,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
    @InProceedings{    2012-Kim-DRIDSWCM,
      author  = {Kihwan Kim and Dongryeol Lee and Irfan Essa},
      blog    = {},
      booktitle  = {Proceedings of IEEE Conference on Computer Vision
          and Pattern Recognition (CVPR)},
      doi    = {10.1109/CVPR.2012.6247809},
      pdf    = {},
      publisher  = {IEEE Computer Society},
      title    = {Detecting Regions of Interest in Dynamic Scenes
          with Camera Motions},
      url    = {},
      video    = {},
      year    = {2012}
    }


We present a method to detect the regions of interest in moving camera views of dynamic scenes with multiple moving objects. We start by extracting a global motion tendency that reflects the scene context by tracking movements of objects in the scene. We then use Gaussian process regression to represent the extracted motion tendency as a stochastic vector field. The generated stochastic field is robust to noise and can handle a video from an uncalibrated moving camera. We use the stochastic field for predicting important future regions of interest as the scene evolves dynamically.

We evaluate our approach on a variety of videos of team sports and compare the detected regions of interest to the camera motion generated by actual camera operators. Our experimental results demonstrate that our approach is computationally efficient and provides better predictions than previously proposed RBF-based approaches.
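To make the idea of a "stochastic vector field" concrete, here is a minimal sketch of Gaussian process regression over sparse motion vectors. It is an illustration under stated assumptions, not the paper's implementation: the squared-exponential kernel, the zero prior mean, the shared kernel for both velocity components, and every name and parameter value below (rbf_kernel, gpr_vector_field, length_scale, noise_var) are hypothetical choices made for this example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two sets of 2-D points (assumed kernel)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gpr_vector_field(positions, velocities, query,
                     length_scale=1.0, signal_var=1.0, noise_var=1e-2):
    """Return the posterior mean flow and the posterior variance at each query point.

    positions:  (n, 2) locations of tracked objects
    velocities: (n, 2) motion vectors observed at those locations
    query:      (m, 2) locations where the field is predicted
    """
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    query = np.asarray(query, dtype=float)

    # Covariance of the noisy observations and cross-covariance to the queries.
    K = rbf_kernel(positions, positions, length_scale, signal_var)
    K += noise_var * np.eye(len(positions))
    K_s = rbf_kernel(query, positions, length_scale, signal_var)

    # Cholesky-based solve; both velocity components share the same kernel.
    chol = np.linalg.cholesky(K)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, velocities))
    mean = K_s @ alpha

    # Posterior variance: low near observations, close to signal_var far away.
    v = np.linalg.solve(chol, K_s.T)
    var = np.maximum(signal_var - np.sum(v ** 2, axis=0), 0.0)
    return mean, var

# Four tracked objects moving in roughly the same direction.
positions = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
velocities = [[1.0, 0.5]] * 4
mean, var = gpr_vector_field(positions, velocities, [[0.5, 0.5], [10.0, 10.0]])
```

The variance is what makes the field stochastic: a query point surrounded by tracked objects gets a confident prediction, while a point far from any observation falls back to the zero prior with variance close to signal_var.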

Presented at: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2012, Providence, RI, June 16-21, 2012


At IWCV 2012: “Video Understanding: Extracting Content and Context from Video.”

May 24th, 2012 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Presentations, Visual Surveillance

Video Understanding: Extracting Content and Context from Video.

(Presentation at the International Workshop on Computer Vision 2012, Ortigia, Siracusa, Sicily, May 22-24, 2012.)

Irfan Essa


In this talk, I will describe various efforts aimed at extracting context and content from video. I will highlight some of our recent work in extracting spatio-temporal features and the related saliency information from video, which can be used to detect and localize regions of interest. Then I will describe approaches that use structured and unstructured representations to recognize complex, extended-time actions. I will also discuss the need for unsupervised activity discovery and the detection of anomalous activities in videos. I will show a variety of examples, including online videos, mobile videos, surveillance and home monitoring video, and sports videos. Finally, I will pose a series of questions and make observations about how we need to extend our current paradigms of video understanding to go beyond local spatio-temporal features and standard time-series and bag-of-words models.


Kihwan Kim’s Thesis Defense (2011): “Spatio-temporal Data Interpolation for Dynamic Scene Analysis”

December 6th, 2011 Irfan Essa Posted in Computational Photography and Video, Kihwan Kim, Modeling and Animation, Multimedia, PhD, Security, Visual Surveillance, WWW

Spatio-temporal Data Interpolation for Dynamic Scene Analysis

Kihwan Kim, PhD Candidate

School of Interactive Computing, College of Computing, Georgia Institute of Technology

Date: Tuesday, December 6, 2011

Time: 1:00 pm – 3:00 pm EST

Location: Technology Square Research Building (TSRB) Room 223


Analysis and visualization of dynamic scenes is often constrained by the amount of spatio-temporal information available from the environment. In most scenarios, we have to account for incomplete information and sparse motion data, requiring us to employ interpolation and approximation methods to fill in the missing information. Scattered data interpolation and approximation techniques have been widely used for solving the problem of completing surfaces and images with incomplete input data. We bring such approaches for data interpolation and approximation from limited sensors into the domain of analyzing and visualizing dynamic scenes. Data from dynamic scenes is subject to constraints due to the spatial layout of the scene and/or the configurations of video cameras in use. Such constraints include: (1) sparsely available cameras observing the scene, (2) limited field of view provided by the cameras in use, (3) incomplete motion at a specific moment, and (4) varying frame rates due to different exposures and resolutions.

In this thesis, we establish these forms of incompleteness in the scene as spatio-temporal uncertainties, and propose solutions for resolving the uncertainties by applying scattered data approximation to the spatio-temporal domain.

The main contributions of this research are as follows: First, we provide an efficient framework to visualize large-scale dynamic scenes from distributed static videos. Second, we adapt Radial Basis Function (RBF) interpolation to the spatio-temporal domain to generate a global motion tendency. The tendency, represented by a dense flow field, is used to optimally pan and tilt a video camera. Third, we propose a method to represent motion trajectories using stochastic vector fields. Gaussian Process Regression (GPR) is used to generate a dense vector field and the certainty of each vector in the field. The generated stochastic fields are used for recognizing motion patterns under varying frame rates and incompleteness of the input videos. Fourth, we show that the stochastic representation of a vector field can also be used for modeling the global tendency to detect regions of interest in dynamic scenes with camera motion. We evaluate and demonstrate our approaches in several applications for visualizing virtual cities, automating sports broadcasting, and recognizing traffic patterns in surveillance videos.
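The second contribution, turning sparse motion samples into a dense flow field with RBF interpolation, can be sketched in a few lines. This is an illustrative example under assumptions rather than the thesis code: the Gaussian basis function, the shape parameter epsilon, the purely spatial (not spatio-temporal) setting, and the helper names fit_rbf and eval_rbf are all hypothetical.

```python
import numpy as np

def _gaussian_basis(a, b, epsilon):
    """Gaussian radial basis evaluated between two point sets (assumed basis)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-(epsilon ** 2) * sq_dist)

def fit_rbf(points, values, epsilon=1.0):
    """Solve for weights so the interpolant passes through every sample."""
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    return np.linalg.solve(_gaussian_basis(points, points, epsilon), values)

def eval_rbf(points, weights, query, epsilon=1.0):
    """Evaluate the fitted interpolant at arbitrary query locations."""
    points = np.asarray(points, dtype=float)
    query = np.asarray(query, dtype=float)
    return _gaussian_basis(query, points, epsilon) @ weights

# Sparse motion samples: object positions and their velocity vectors.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.5, 0.5], [0.5, -0.5], [0.0, 1.0]]
weights = fit_rbf(points, values)

# Dense 5x5 flow field covering the unit square.
xs = np.linspace(0.0, 1.0, 5)
gx, gy = np.meshgrid(xs, xs)
grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
dense_flow = eval_rbf(points, weights, grid).reshape(5, 5, 2)
```

Unlike the GPR-based representation in the third contribution, this interpolant reproduces the samples exactly and carries no per-vector certainty, which is one way to read the motivation for moving to stochastic vector fields.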


Committee:

  • Prof. Irfan Essa (Advisor, School of Interactive Computing, Georgia Institute of Technology)
  • Prof. James M. Rehg (School of Interactive Computing, Georgia Institute of Technology)
  • Prof. Thad Starner (School of Interactive Computing, Georgia Institute of Technology)
  • Prof. Greg Turk (School of Interactive Computing, Georgia Institute of Technology)
  • Prof. Jessica K. Hodgins (Robotics Institute, Carnegie Mellon University, and Disney Research Pittsburgh)

In the News (2010): DARPA Awards Kitware a $13.8 Million Contract for Online Threat Detection and Forensic Analysis in Wide-Area Motion Imagery

September 2nd, 2010 Irfan Essa Posted in Activity Recognition, Grant Schindler, PERSEAS, Visual Surveillance

via Kitware – News: DARPA Awards Kitware a $13.8 Million Contract for Online Threat Detection and Forensic Analysis in Wide-Area Motion Imagery.

Kitware has received a $13,883,314 contract from the Defense Advanced Research Projects Agency (DARPA) to develop a software system capable of automatically and interactively discovering actionable intelligence from wide area motion imagery (WAMI) of complex urban, suburban, and rural environments.

The primary information elements in WAMI data are moving entities in the context of roads, buildings, and other scene features. These entities, while exploitable, often yield fragmented tracks in complex urban environments due to occlusions, stops, and other factors. Kitware’s software system will use algorithmic solutions to associate tracks and then identify and integrate local events to detect potential threats and perform forensic analysis.

The developed algorithms will form the basis of a software prototype called the Persistent Stare Exploitation and Analysis System (PerSEAS) that will significantly augment an end-user’s ability to discover novel intelligence using models of activities, normalcy, and context. Since the vast majority of events are normal and pose no threat, the models must cross-integrate singular events to discover relationships and anomalies that are indicative of suspicious behavior or match previously learned – or defined – threat activity.

The advanced PerSEAS system will markedly improve an analyst’s ability to handle burgeoning WAMI data and reduce the time required to perform many current exploitation tasks, greatly enhancing the military’s capability to use the data for forensic analysis and to issue timely threat alerts with a minimal number of false alarms.

Due to the complex, multi-disciplinary nature of the research, Kitware will partner with academic experts in the fields of computer vision, probabilistic reasoning, machine learning and other related domains. Phase I of the research is expected to be completed in two years.

The awarded contract will expand Kitware’s leadership in the fields of computer vision, video analysis, and advanced visualization software. The project will build upon our previous DARPA-sponsored research into content-based video retrieval on the VIRAT program; anomaly detection on the PANDA program; and the recognition of complex multi-agent activities in video.

To meet the PerSEAS program’s needs, Kitware has assembled a world-class team including four leading defense technology companies: Northrop Grumman Corporation; Honeywell Automation and Control Solutions Laboratories; Aptima, Inc.; and Navia, Inc. The team also includes multiple internationally renowned research institutions: the University of California, Berkeley; the Computer Vision Laboratory, University of Maryland; Rensselaer Polytechnic Institute; the Computer Vision Lab at the University of Central Florida; the School of Interactive Computing at Georgia Tech and its affiliated Center for Robotics & Intelligent Machines; and Columbia University.




June 1st, 2010 Irfan Essa Posted in Grant Schindler, PERSEAS, Visual Surveillance

Persistent Stare Exploitation and Analysis System (PerSEAS)

We are part of the team, led by Kitware Inc., working on the Defense Advanced Research Projects Agency (DARPA) Persistent Stare Exploitation and Analysis System (PerSEAS) program.

The Persistent Stare Exploitation and Analysis System (PerSEAS) program is developing the capability to automatically and interactively identify potential threats as they emerge based on the correlation of multiple disparate activities and events in wide area motion imagery (WAMI) and multi-INT data.  PerSEAS will enable new methods of threat hypothesis adjudication and forensic analysis through activity-based modeling and inferencing capabilities.

