Cognitively-Motivated Deep Learning (2019-)

For decades the emphasis in our community has been on task-specific decoding performance rather than on creating models with good generalization power and, especially, good induction properties, i.e., models that can learn from one to five examples, much as humans do. My vision is to create cognitively-motivated representations (aka models) that radically depart from the unified metric space fallacy (aka the real-world bias) and respect macroscopic cognitive principles such as low dimensionality, hierarchy, abstraction, and a two-tier architecture (System 1 vs. System 2). Instead of following the popular path in representation modeling of adding these constraints as training tricks in deep neural nets or as regularization terms in autoencoder training, we propose a top-down hierarchical manifold representation that explicitly (by design) respects cognitive principles. In our recent work, we show that by creating and reasoning over an ensemble of sparse, low-dimensional subspaces we achieve human-like performance not only in decoding but also in the induction (learning) of lexical semantics.
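A minimal sketch of the core idea, assuming generic pretrained word embeddings and illustrative hyperparameters (the published method and its parameters differ): each member of the ensemble is a low-dimensional basis fitted on a small (sparse) semantic neighborhood, and word similarity is aggregated across the subspaces.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)

    def build_subspace_ensemble(E, n_subspaces=10, hood_size=50, dim=5):
        """E: (vocab, d) embedding matrix. Fit each low-dimensional subspace
        on the nearest neighbors of a randomly chosen anchor word."""
        norms = np.linalg.norm(E, axis=1) + 1e-9
        ensemble = []
        for _ in range(n_subspaces):
            anchor = rng.integers(len(E))
            sims = (E @ E[anchor]) / (norms * norms[anchor])
            hood = np.argsort(-sims)[:hood_size]  # local semantic neighborhood
            ensemble.append(PCA(n_components=dim).fit(E[hood]))
        return ensemble

    def ensemble_similarity(ensemble, x, y):
        """Aggregate cosine similarity of two word vectors across subspaces."""
        scores = []
        for pca in ensemble:
            px, py = pca.transform(np.stack([x, y]))
            scores.append(px @ py / (np.linalg.norm(px) * np.linalg.norm(py) + 1e-9))
        return float(np.mean(scores))

    # usage with random stand-in embeddings
    E = rng.standard_normal((1000, 300))
    ens = build_subspace_ensemble(E)
    print(ensemble_similarity(ens, E[1], E[2]))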

Natural Multiparty Dialogue Interaction (2021-2022)

While most task-oriented dialogue systems assume a conversation between the agent and a single user at a time, dialogue systems are increasingly expected to communicate with multiple users simultaneously who make decisions collaboratively. To facilitate the development of such systems, in collaboration with colleagues at Amazon we released the Multi-User MultiWOZ dataset: task-oriented dialogues among two users and one agent. Multiparty dialogues reflect interesting dynamics of collaborative decision-making in task-oriented scenarios, e.g., social chatter and deliberation. Supported by this data, we proposed the novel task of multi-user contextual query rewriting: rewriting a task-oriented chat between two users as a concise query that retains only task-relevant information and is directly consumable by the dialogue system. We demonstrated that in multi-user dialogues, using predicted rewrites substantially improves dialogue state tracking without modifying existing dialogue systems trained for single-user dialogues. Further, this method surpasses training a medium-sized model directly on multi-user dialogues and generalizes to unseen domains.
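Conceptually, the rewriting step is a standard sequence-to-sequence problem over a speaker-tagged chat. A minimal sketch using Hugging Face transformers, where the checkpoint path and input format are hypothetical placeholders (not our released models):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    MODEL = "path/to/finetuned-rewriter"  # hypothetical fine-tuned seq2seq model
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

    chat = [
        ("USER1", "Should we take the 6pm train to Cambridge?"),
        ("USER2", "Too early for me, let's leave after 7."),
        ("USER1", "Fine, book two tickets after 7pm then."),
    ]
    # flatten the two-user chat into a single speaker-tagged source string
    source = " ".join(f"<{spk}> {utt}" for spk, utt in chat)

    ids = tokenizer(source, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    # target style: "book two train tickets to cambridge leaving after 7pm"
    print(tokenizer.decode(out[0], skip_special_tokens=True))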

Behavioral Signals: Emotion and Behavioral Tracking in the Lab and in the Wild (2017-2021)

Despite significant progress, emotion AI remains a challenging R&D area, especially when technology is transferred from the academic lab to a startup industrial setting. In collaboration with the team at Behavioral Signals, we introduced a series of innovations towards building a general-purpose emotion AI conversational platform, the OliverAPI. Specifically, progress has been made in the areas of data imbalance, data sparseness, and data augmentation, as well as multimodal fusion using novel neural network architectures. A series of practical considerations have also been addressed, including data annotation, cultural/social biases in the data, scalability, and performance. These technologies have been applied to a wide range of real-world use cases, from conversational speech analysis to multimedia processing, mental health, and human-robot interaction.
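For flavor, a minimal late-fusion sketch in PyTorch of the kind of audio-text architecture involved; dimensions, layers, and class weights are illustrative assumptions, not the production OliverAPI models:

    import torch
    import torch.nn as nn

    class LateFusionEmotionNet(nn.Module):
        def __init__(self, audio_dim=88, text_dim=768, hidden=128, n_classes=4):
            super().__init__()
            self.audio = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
            self.text = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
            self.head = nn.Linear(2 * hidden, n_classes)  # concatenation fusion

        def forward(self, a, t):
            return self.head(torch.cat([self.audio(a), self.text(t)], dim=-1))

    model = LateFusionEmotionNet()
    logits = model(torch.randn(2, 88), torch.randn(2, 768))
    # class-weighted loss is one simple handle on label imbalance
    loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 2.0, 4.0]))
    print(loss_fn(logits, torch.tensor([0, 3])))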

BabyRobot: Child-Robot Communication (2016-2019)

I served as the technical coordinator of the EU-IST H2020 BabyRobot project. In BabyRobot we model human-robot communication as a three-step process: sharing attention, establishing common ground, and forming shared goals. Our main goal is to create robots that analyze and track human behavior over time in the context of their surroundings (situational awareness) using audio-visual monitoring, in order to establish common ground and intention-reading capabilities. In BabyRobot we focus on typically developing children and children on the autism spectrum, in order to define, implement, and evaluate child-robot interaction application scenarios for developing specific socio-affective, communication, and collaboration skills. Breakthroughs in core robotic technologies are needed to support this research, mainly in the areas of motion planning and control in constrained spaces, gestural kinematics, and sensorimotor learning and adaptation. The ambition of BabyRobot is to create robots that can establish communication protocols and form collaboration plans on the fly, with impact beyond the consumer and healthcare application markets addressed here.

SpeDial: Spoken Dialogue Analytics (2013-2015)

I was the coordinator of the EU-IST FP7 SpeDial project. SpeDial proposed a process for spoken dialogue service development, enhancement, and customization of deployed services, where data logs are analyzed and used to enhance the service in a semi-automated fashion. A set of mature technologies is used to: 1) identify hot-spots in the dialogue and propose alternative call-flow structures, 2) select among a list of prompts to reach target KPIs, 3) update grammars using transcribed service data, and 4) customize the application for specific user populations. The underlying technologies include affective modeling of spoken dialogue, call-flow/discourse analysis, machine translation, crowd-sourcing, grammar induction, and user modeling, all integrated in a service-doctoring platform that enhances deployed services. Our business model is quick deployment of a prototype service, followed by service enhancement using our platform. The reduced development time and time-to-market provide significant differentiation for SMEs in the speech services area, as well as for end users. The business opportunity is significant, especially given the consolidation of the speech services industry and the lack of major competition.
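A toy sketch of step 1, hot-spot identification from service logs; the log schema and flagging threshold are assumptions for illustration:

    from collections import defaultdict

    # (call_id, dialogue_state, outcome) with outcome in {"ok", "noinput", "nomatch"}
    logs = [
        ("c1", "get_date", "ok"), ("c1", "get_city", "nomatch"),
        ("c2", "get_city", "nomatch"), ("c2", "get_city", "noinput"),
        ("c3", "get_date", "ok"),
    ]

    counts = defaultdict(lambda: [0, 0])  # state -> [errors, turns]
    for _, state, outcome in logs:
        counts[state][0] += outcome != "ok"
        counts[state][1] += 1

    for state, (err, total) in counts.items():
        rate = err / total
        if rate > 0.3:  # assumed threshold for flagging a hot-spot
            print(f"hot-spot: {state} (error rate {rate:.0%})")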

BabyAffect: Affective and behavioral modeling of early lexicalizations of ASD and TD children (2014-2015)

I was the coordinator of the Greek SRT Aristeia II BabyAffect project (research excellence grant). The main scientific proposition behind BabyAffect is that the extra-lexical and extra-linguistic stream in child-caregiver communication, e.g., affect and communicative intent, is an important source of (often complementary) information that significantly enhances the lexical acquisition process in early childhood, both in terms of quality (e.g., semantic categorization ability) and quantity (rate of learning, vocabulary spurt). We set out to demonstrate this both experimentally, using statistical information extracted from audio-visual recordings of infants and their caregivers, and formally, using cognitive models of the lexical acquisition process based on parallel distributed processing models and semantic networks. The main goals of BabyAffect are: 1) to develop a computational model for early vocabulary development using multimodal data conveying emotions and communicative functions from typical and atypical populations; 2) to collect and make available to different disciplines (AI, psycholinguistics, developmental psychology, human language technology) a large amount of multimodal data from Greek-speaking children at the one-word stage in natural environments; and 3) to investigate the ability of typical and atypical children to express emotions and communicative functions through distinct acoustic patterns, in order to develop an automatic screening tool for detecting children with autism and language delay.

PortDial: Language Resources for Portable Multilingual Dialogue Systems (2012-2014)

I was the coordinator of the EU-IST FP7 PortDial project. PortDial applies grammar induction and semantic web technologies to the creation of domain-specific multilingual spoken dialogue system (SDS) resources, specifically linked-data ontologies and grammars. The main goal of PortDial is to design machine-aided methods for creating, cleaning up, and publishing multilingual domain ontologies and grammars for spoken dialogue system prototyping in various application domains. The project delivers a commercial platform for quick prototyping of interactive spoken dialogue applications in new domains and languages, together with multilingual collections of resources for specific application domains and a multilingual linked-data ontological corpus that can be freely used for SDS research and prototyping for non-commercial purposes. Through these contributions, the partners expect to save up to 50% of development time, significantly improve grammar coverage, and lower the barrier to entry for speech services prototyping by introducing data-populated ontologies and grammar induction. The application domains of PortDial include entertainment, travel, finance, and customer service.

CogniMuse: Multimodal Signal and Event Processing in Perception and Cognition (2012-2015)

I was a collaborator on the Greek SRT Aristeia research excellence grant CogniMuse. Motivated by the grand challenge of endowing computers with human-like abilities for multimodal sensory information processing, perception, and cognitive attention, CogniMuse undertakes fundamental research in modeling multisensory and sensory-semantic integration via a synergy between system theory, computational algorithms, and human cognition. It focuses on integrating three modalities (audio, vision, and text) toward detecting salient perceptual events and combining them with semantics to build higher-level stable events through controlled attention mechanisms. My main contribution to CogniMuse is on the text modality, as well as on the fusion of low-level (sensory) and high-level (semantic) information.
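A toy sketch of the fusion idea: per-modality saliency curves are combined into a single audio-visual-text track and the most salient frames are kept (weights and threshold are illustrative, not the project's models):

    import numpy as np

    rng = np.random.default_rng(0)
    T = 100  # video frames
    # stand-in saliency curves for the audio, visual, and text streams
    aud, vis, txt = (np.abs(rng.standard_normal(T)) for _ in range(3))
    w = np.array([0.4, 0.4, 0.2])  # assumed modality weights

    fused = w[0] * aud + w[1] * vis + w[2] * txt  # linear fusion
    salient = fused > np.percentile(fused, 80)    # keep the top-20% frames
    print("salient frames:", np.flatnonzero(salient)[:10])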

USC/ISI Collaborative Projects

I am involved in a variety of joint research efforts with my long-time collaborator Prof. Shri Narayanan at the SAIL lab at USC. These include: speech analysis and recognition of children's speech, analysis of the narratives of autistic children, the application of network-based DSMs and semantic-affective models to computational politics and the legal domain, as well as automatic analysis of movie content.

Past Projects: 2000-2010

  • I was the project coordinator for the Greek SRT PENED project on aerodynamic modeling of the vocal tract during speech production, which ran from 2006 to 2009. This was a three-way collaboration between our team at the Tech. Univ. of Crete and two groups at NTUA (fluid dynamics, speech processing). For some relevant results, see the joint work with Dr. Pirros Tsiakoulis in my AM-FM modulation-related publications.
  • I led the effort of the TSI-TUC team in the EU-IST FP6 Network of Excellence MUSCLE on multimedia understanding from 2004 to 2008. Notable outputs from this project include our collaborative work with NTUA on saliency-based movie summarization; see also the relevant publications here.
  • I led the effort of the TSI-TUC team in the EU-IST FP6 STREP project HIWIRE on robust speech recognition from 2004 to 2007. One of the main outputs of the project was the HIWIRE front-end, which reduced word error rate by 20% relative compared to the ETSI advanced standard front-end. For more details on the HIWIRE project and the HIWIRE database, see my robust ASR publications.
  • I was the principal investigator of the spoken dialogue team at Bell Labs, Lucent Technologies for the DARPA Communicator project on spoken dialogue systems (2000-2001). I worked together with Eric Fosler-Lussier, Egbert Ammicht, Jeff Kuo, and many others towards modular and generalizable information-seeking dialogue systems. For our papers on semantic and pragmatic processing, see the multimodal dialogue systems section of my publications.
  • I have also participated in various research efforts as a consultant, including the EU-FP6 FET project ASPI on speech inversion, the EU-FP7 STREP project DictaSign on sign language recognition, and the Greek SRT PENED project on multimodal interfaces.