The work carried out during the action took place at the Infolab, within the Computer Science and Artificial Intelligence Lab (CSAIL) of the Massachusetts Institute of Technology (MIT), and at the Unit for Visually Impaired People (UVIP) of the Italian Institute of Technology (IIT). The following main results were achieved:
• Definition of the characteristics of the groups participating in the studies. Three groups of subjects (both children and adults) were selected: typically developing (TYP), with visual impairments (VI), and with Autism Spectrum Disorders (ASD). The aim was to characterize how sensory-perceptual issues (visual impairments) and cognitive issues (neurodevelopmental disorders such as ASD) influence the development of social skills, with a focus on visual attention patterns during social interactions.
• Design of a dataset of dynamic (video) visual social stimuli, suitable both for studies with human subjects and for training computer vision models. On the one hand, the dataset can be used to characterize the development of the human ability to understand social behaviors (specifically, to classify interactions between agents as help, hinder, or no interaction). On the other hand, it can be used to train computer vision models to classify social interactions.
• Collection of behavioral and eye-tracking datasets from subjects with visual impairments, subjects with autism spectrum disorders, and healthy subjects, both children and adults. These datasets contributed to: 1) characterizing how social skills develop in typically developing individuals and in individuals with sensory or cognitive issues, and 2) building a benchmark for training human-like attentive computational models able to predict the effect of perceptual and cognitive issues on visual social attention. Specifically, the datasets included: 1) an eye-tracking dataset on static images, collected during a preferential looking task (social versus object) from VI children (with either low visual acuity or nystagmus) and sighted children; 2) behavioral datasets of answers on the recognition of facial expressions with and without face masks, collected from sighted and VI children (including children with low visual acuity, nystagmus, and saccadic issues); 3) eye-tracking datasets (including behavioral answers) collected from sighted adults, children with VI, and children with ASD, who were asked to classify videos of social interactions (help, hinder, and no interaction) among simple agents performing goal-oriented actions.
• Development of deep learning models: 1) to predict the social salience of the visual input during classification of social interactions, based on human gaze data; 2) to augment the visual input based on the identified socially salient areas. The first application provided a baseline for developing human-like computational models of visual attention in the context of social interactions. The outputs of this research were intended to support the application of machine learning tools to the assessment and rehabilitation of sensory visual impairments, as well as of developmental disorders affecting social behaviors, such as ASD.