PhD Thesis by Zahoor Zafrulla “Automatic recognition of American Sign Language Classifiers”

May 2nd, 2014 Irfan Essa Posted in Affective Computing, Behavioral Imaging, Face and Gesture, PhD, Thad Starner, Zahoor Zafrulla

Title: Automatic recognition of American Sign Language Classifiers

Zahoor Zafrulla
School of Interactive Computing
College of Computing
Georgia Institute of Technology


Dr. Thad Starner (Advisor, School of Interactive Computing, Georgia Tech)
Dr. Irfan Essa (Co-Advisor, School of Interactive Computing, Georgia Tech)
Dr. Jim Rehg (School of Interactive Computing, Georgia Tech)
Dr. Harley Hamilton (School of Interactive Computing, Georgia Tech)
Dr. Vassilis Athitsos (Computer Science and Engineering Department, University of Texas at Arlington)


Automatically recognizing classifier-based grammatical structures of American Sign Language (ASL) is a challenging problem. Classifiers in ASL utilize surrogate hand shapes for people or “classes” of objects and provide information about their location, movement, and appearance. In the past, researchers have focused on recognition of finger spelling, isolated signs, facial expressions, and interrogative words like WH-questions (e.g., Who, What, Where, and When). Challenging problems such as recognition of ASL sentences and classifier-based grammatical structures remain relatively unexplored in the field of ASL recognition.

One application of classifier recognition is in educational games that help young deaf children acquire language skills. Previous work developed CopyCat, an educational ASL game that engages children in a progressively more difficult expressive signing task as they advance through the game.

We have shown that by leveraging context we can use verification, in place of recognition, to boost machine performance for determining if the signed responses in an expressive signing task, like in the CopyCat game, are correct or incorrect. We have demonstrated that the quality of a machine verifier’s ability to identify the boundary of the signs can be improved by using a novel two-pass technique that combines signed input in both forward and reverse directions. Additionally, we have shown that we can reduce CopyCat’s dependency on custom manufactured hardware by using an off-the-shelf Microsoft Kinect depth camera to achieve similar verification performance. Finally, we show how we can extend our ability to recognize sign language by leveraging depth maps to develop a method using improved hand detection and hand shape classification to recognize selected classifier-based grammatical structures of ASL.
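The two-pass boundary technique can be illustrated with a minimal sketch (the function and score names here are illustrative assumptions, not the thesis implementation): given per-frame scores from a forward decode and from a decode of the reversed input, the refined sign boundary is taken where the combined evidence peaks.

```python
def refine_boundary(scores_fwd, scores_rev):
    """Combine forward- and reverse-pass evidence for a sign boundary.

    scores_fwd[t]: score that the sign ends at frame t (forward decode).
    scores_rev[t]: score that the sign starts at frame t when decoding
    the reversed input, re-indexed to the original frame order.
    Returns the frame index where the summed evidence is highest.
    """
    combined = [f + r for f, r in zip(scores_fwd, scores_rev)]
    return max(range(len(combined)), key=combined.__getitem__)
```

Summing the two passes rewards frames that both decoding directions agree on, which is the intuition behind combining signed input in forward and reverse directions.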


Atlanta Magazine Features, Thad Starner, “Magnifying glass”

March 3rd, 2014 Irfan Essa Posted in In The News, Thad Starner, Ubiquitous Computing

A wonderful write-up on my friend and colleague, Thad Starner, in Atlanta Magazine. Worth a read for sure.

“The guy with the computer on his face.” This would have been a fair description of Starner at almost any time over the past twenty years. He first built his own wearable computer with a head-mounted display in 1993, and has donned some version or another of the computer-eyepiece-Internet system most days since then. But over the previous year, something changed.

via Magnifying glass – Features – Atlanta Magazine.


Paper (2009): ICASSP “Learning Basic Units in American Sign Language using Discriminative Segmental Feature Selection”

February 4th, 2009 Irfan Essa Posted in 0205507, Face and Gesture, ICASSP, James Rehg, Machine Learning, Pei Yin, Thad Starner

Pei Yin, Thad Starner, Harley Hamilton, Irfan Essa, James M. Rehg (2009), “Learning Basic Units in American Sign Language using Discriminative Segmental Feature Selection” in IEEE International Conference on Acoustics, Speech, and Signal Processing 2009 (ICASSP 2009). Session: Spoken Language Understanding I, Tuesday, April 21, 11:00 – 13:00, Taipei, Taiwan.


The natural language for most deaf signers in the United States is American Sign Language (ASL). ASL has internal structure like spoken languages, and ASL linguists have introduced several phonemic models. The study of ASL phonemes is not only interesting to linguists, but also useful for scalability in recognition by machines. Since machine perception differs from human perception, this paper learns the basic units for ASL directly from data. Compared with previous studies, our approach computes a set of data-driven units (fenemes) discriminatively from the results of segmental feature selection. The learning iterates the following two steps: first, apply discriminative feature selection segmentally to the signs; then, tie the most similar temporal segments and re-train. Intuitively, the sign parts indistinguishable to machines are merged to form basic units, which we call ASL fenemes. Experiments on publicly available ASL recognition data show that the extracted data-driven fenemes are meaningful, and recognition using those fenemes achieves improved accuracy at reduced model complexity.
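As a rough illustration of the tying step (a hedged sketch over made-up segment prototypes, not the paper's actual procedure), the most similar temporal segments can be merged greedily until no pair falls within a similarity threshold:

```python
import itertools

def tie_similar_segments(segments, threshold):
    """Greedily merge the closest pair of segment prototypes until no
    pair is closer than `threshold`. Each prototype is a feature vector
    (a sequence of floats); merging a pair averages them, standing in
    for re-training a shared model on the tied segments."""
    protos = [list(s) for s in segments]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    while len(protos) > 1:
        i, j = min(itertools.combinations(range(len(protos)), 2),
                   key=lambda p: dist(protos[p[0]], protos[p[1]]))
        if dist(protos[i], protos[j]) > threshold:
            break  # no remaining pair is similar enough to tie
        merged = [(x + y) / 2 for x, y in zip(protos[i], protos[j])]
        protos = [p for k, p in enumerate(protos) if k not in (i, j)]
        protos.append(merged)
    return protos
```

The surviving prototypes play the role of fenemes: segments the machine cannot reliably distinguish end up sharing one unit.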


Paper: ISWC (2008) “Localization and 3D Reconstruction of Urban Scenes Using GPS”

September 28th, 2008 Irfan Essa Posted in ISWC, Kihwan Kim, Mobile Computing, Papers, Thad Starner

Kihwan Kim, Jay Summet, Thad Starner, Daniel Ashbrook, Mrunal Kapade, and Irfan Essa (2008), “Localization and 3D Reconstruction of Urban Scenes Using GPS,” in Proceedings of the IEEE International Symposium on Wearable Computers (ISWC) 2008 (To Appear). [PDF]



Using off-the-shelf Global Positioning System (GPS) units, we reconstruct buildings in 3D by exploiting the reduction in signal-to-noise ratio (SNR) that occurs when the buildings obstruct the line-of-sight between the moving units and the orbiting satellites. We measure the size and height of skyscrapers and automatically construct a density map representing the location of multiple buildings in an urban landscape. If deployed on a large scale, via a cellular service provider’s GPS-enabled mobile phones or GPS-tracked delivery vehicles, the system could provide an inexpensive means of continuously creating and updating 3D maps of urban environments.
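The core idea can be sketched in a few lines (with hypothetical readings, grid size, and threshold; the actual system is far more involved): GPS fixes where the SNR drops below a threshold vote for an obstruction at that location, and the votes accumulate into a coarse density map.

```python
def building_density(readings, threshold, grid=10):
    """Accumulate a coarse density map of likely building locations.

    `readings` is a list of (x, y, snr) tuples from a moving GPS unit,
    with x and y already mapped into [0, grid) grid coordinates. Cells
    where the SNR falls below `threshold` (suggesting the line-of-sight
    to a satellite is obstructed) receive a vote."""
    density = [[0] * grid for _ in range(grid)]
    for x, y, snr in readings:
        if snr < threshold:
            density[min(int(y), grid - 1)][min(int(x), grid - 1)] += 1
    return density
```

Cells with many votes across many passes and satellites would mark probable building footprints.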


Thesis David Minnen PhD (2008): “Unsupervised Discovery of Activity Primitives from Multivariate Sensor Data”

June 18th, 2008 Irfan Essa Posted in Activity Recognition, David Minnen, PhD, Thad Starner

Unsupervised Discovery of Activity Primitives from Multivariate Sensor Data



This research addresses the problem of temporal pattern discovery in real-valued, multivariate sensor data. Several algorithms were developed, and subsequent evaluation demonstrates that they can efficiently and accurately discover unknown recurring patterns in time series data taken from many different domains. Different data representations and motif models were investigated in order to design an algorithm with an improved balance between run-time and detection accuracy. The different data representations are used to quickly filter large data sets in order to detect potential patterns that form the basis of a more detailed analysis. The representations include global discretization, which can be efficiently analyzed using a suffix tree, local discretization with a corresponding random projection algorithm for locating similar pairs of subsequences, and a density-based detection method that operates on the original, real-valued data. In addition, a new variation of the multivariate motif discovery problem is proposed in which each pattern may span only a subset of the input features. An algorithm that can efficiently discover such “subdimensional” patterns was developed and evaluated. The discovery algorithms are evaluated by measuring the detection accuracy of discovered patterns relative to a set of expected patterns for each data set. The data sets used for evaluation are drawn from a variety of domains including speech, on-body inertial sensors, music, American Sign Language video, and GPS tracks.
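The random projection step mentioned above can be sketched as follows (a simplified, hedged illustration over already-discretized symbol strings, not the thesis implementation): repeatedly mask a few symbol positions and count how often pairs of words collide on the remaining positions; pairs that collide frequently are candidate near-matches despite a few symbol substitutions.

```python
import random
from collections import defaultdict

def candidate_pairs(words, num_trials=10, mask_size=2, seed=0):
    """Locate candidate pairs of similar discretized subsequences.

    In each trial, `mask_size` positions are dropped at random and the
    remaining positions form a bucket key; words sharing a key collide.
    Returns a dict mapping (i, j) index pairs to collision counts."""
    rng = random.Random(seed)
    length = len(words[0])
    counts = defaultdict(int)
    for _ in range(num_trials):
        keep = sorted(rng.sample(range(length), length - mask_size))
        buckets = defaultdict(list)
        for idx, word in enumerate(words):
            buckets[tuple(word[k] for k in keep)].append(idx)
        for group in buckets.values():
            for i in range(len(group)):
                for j in range(i + 1, len(group)):
                    counts[(group[i], group[j])] += 1
    return counts
```

High-count pairs would then seed the more detailed analysis on the original, real-valued data.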
