In gesture recognition, one challenge that researchers and developers face is the need for recognition strategies that mediate between false positives and false negatives. In this paper, we examine bi-level thresholding, a recognition strategy that uses two thresholds: a tighter threshold limits false positives and recognition errors, and a looser threshold prevents repeated errors (false negatives) by analyzing movements in sequence. We first describe early observations that led to the development of the bi-level thresholding algorithm. Next, using a wizard-of-Oz recognizer, we hold recognition rates constant and compare fixed versus bi-level thresholding; we show that systems using bi-level thresholding result in significantly lower workload scores on the NASA-TLX and significantly lower accelerometer variance when performing gesture input. Finally, we examine the effect that bi-level thresholding has on a real-world data set of wrist and finger gestures, showing an ability to significantly improve measures of precision and recall. Overall, these results argue for the viability of bi-level thresholding as an effective technique for balancing between false positives, recognition errors, and false negatives.
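To make the two-threshold rule concrete, here is a minimal sketch of how a bi-level recognizer might gate a stream of per-movement confidence scores; the threshold values, the score range, and the two-in-a-row retry rule are illustrative assumptions, not the paper's exact algorithm:

```python
def bilevel_recognize(scores, tight=0.9, loose=0.7):
    """Accept a movement if it clears the tight threshold outright, or if
    two movements in sequence both clear the looser threshold (a likely
    retry after a near-miss, i.e., a false negative being repaired)."""
    accepted = []
    prev_near_miss = False
    for i, s in enumerate(scores):
        if s >= tight:
            accepted.append(i)           # confident single-movement match
            prev_near_miss = False
        elif s >= loose:
            if prev_near_miss:           # repeated near-miss: accept the retry
                accepted.append(i)
                prev_near_miss = False
            else:
                prev_near_miss = True    # remember the near-miss
        else:
            prev_near_miss = False       # a clear rejection resets the sequence
    return accepted

print(bilevel_recognize([0.95, 0.75, 0.80, 0.50]))  # -> [0, 2]
```

A fixed threshold of 0.9 would reject the second and third movements above; a fixed 0.7 would also fire on isolated borderline scores, which is exactly the false-positive/false-negative tension the bi-level strategy mediates.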
This paper reports on the development of capabilities for (on-screen) virtual agents and robots to support isolated older adults in their homes. A real-time architecture was developed to use a virtual agent or a robot interchangeably to interact via dialog and gesture with a human user. Users could interact with either agent on twelve different activities, including on-screen games and forms to complete. The paper reports on a pre-study that guided the choice of interaction activities. A month-long study with 44 adults between the ages of 55 and 91 assessed differences in the use of the robot and virtual agent.
Given the increasing popularity of customer service dialogue on Twitter, analysis of conversation data is essential to understand trends in customer and agent behavior for the purpose of automating customer service interactions. In this work, we develop a novel taxonomy of fine-grained "dialogue acts" frequently observed in customer service, showcasing acts that are more suited to the domain than the more generic existing taxonomies. Using a sequential SVM-HMM model, we model conversation flow, predicting the dialogue act of a given turn in real time, and showcase this using our "PredDial" portal. We characterize differences between customer and agent behavior in Twitter customer service conversations, and investigate the effect of testing our system on different customer service industries. Finally, we use a data-driven approach to predict important conversation outcomes: customer satisfaction, customer frustration, and overall problem resolution. We show that the type and location of certain dialogue acts in a conversation have a significant effect on the probability of desirable and undesirable outcomes, and present actionable rules based on our findings. We explore the correlations between different dialogue acts and conversation outcomes in detail through an actionable-rule discovery task, leveraging a state-of-the-art sequential rule mining algorithm and modeling each conversation as a sequence. The patterns and rules we derive can be used as guidelines for outcome-driven automated customer service platforms.
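As a sketch of the sequential prediction idea (not the authors' implementation), an SVM-HMM can be thought of as per-turn classifier scores acting as emissions, combined with act-to-act transition weights and decoded with Viterbi; the acts and all probabilities below are toy values:

```python
import numpy as np

def viterbi(emissions, transitions, init):
    """emissions: (T, K) per-turn log-scores over K dialogue acts;
    transitions: (K, K) log-weights for moving from act i to act j."""
    T, K = emissions.shape
    delta = init + emissions[0]                 # best log-score ending in each act
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + transitions     # candidate prev -> current scores
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + emissions[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):               # backtrack the best sequence
        path.append(int(back[t][path[-1]]))
    return path[::-1]

acts = ["greeting", "complaint", "answer"]
em = np.log([[0.7, 0.2, 0.1],    # turn 1: classifier favors "greeting"
             [0.3, 0.6, 0.1],    # turn 2: favors "complaint"
             [0.2, 0.3, 0.5]])   # turn 3: favors "answer"
tr = np.log([[0.1, 0.6, 0.3],
             [0.1, 0.3, 0.6],
             [0.2, 0.3, 0.5]])
print([acts[i] for i in viterbi(em, tr, np.log(np.ones(3) / 3))])
```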
Digital libraries and services enable users to access large amounts of data on demand. Yet, quality assessment of information encountered on the Internet remains an elusive open issue. For example, Wikipedia, one of the most visited platforms on the Web, hosts thousands of user-generated articles and undergoes 12 million edits/contributions per month. User-generated content is undoubtedly one of the keys to its success, but also a hindrance to good quality: contributions can be of poor quality because anyone, even anonymous users, can participate. Though Wikipedia has defined guidelines as to what makes the perfect article, authors find it difficult to assess whether their contributions comply with them, and reviewers cannot cope with the ever-growing number of articles pending review. Great efforts have been invested in algorithmic methods for automatic classification of Wikipedia articles (as featured or non-featured) and for quality flaw detection. However, little has been done to support quality assessment of user-generated content through interactive tools that combine automatic methods and human intelligence. We developed WikiLyzer, a Web toolkit comprising three interactive applications designed to assist (i) knowledge discovery experts in creating and testing metrics for quality measurement, (ii) Wikipedia users searching for good articles, and (iii) Wikipedia authors who need to identify weaknesses to improve a particular article. A design study sheds light on how experts can create complex quality metrics with our tool, while a user study reports on its usefulness for identifying high-quality content.
Collective urban mobility embodies residents' local insights on the city. Mobility practices of residents are produced from their spatial choices, which involve various considerations such as the atmosphere of destinations, distance, past experiences, and preferences. The advances in mobile computing and the rise of geo-social platforms have provided the means for capturing these mobility practices; however, interpreting residents' insights is challenging due to the scale and complexity of an urban environment and its unique context. In this paper, we present MobInsight, a framework for making localized interpretations of urban mobility that reflect various aspects of urbanism. MobInsight extracts a rich set of neighborhood features through holistic semantic aggregation, and models the mobility between all pairs of neighborhoods. We evaluate MobInsight with mobility data from Barcelona and demonstrate diverse localized and semantically rich interpretations.
Understanding a target audience's emotional responses to video advertisements is crucial to stakeholders. However, traditional methods for collecting such information are slow, expensive, and coarse-grained. We propose AttentiveVideo, an intelligent mobile interface with corresponding inference algorithms to monitor and quantify the effects of mobile video advertising in real time. AttentiveVideo employs a combination of implicit photoplethysmography (PPG) sensing and facial expression analysis (FEA) to predict viewers' attention, engagement, and sentiment when watching video advertisements on unmodified smartphones. In a 24-participant study, AttentiveVideo achieved good accuracy on a wide range of emotional measures (best accuracy = 73.4%, kappa = 0.46 across 9 measures). We also found that the PPG sensing channel and the FEA technique are complementary in both prediction accuracy and signal availability. These findings show the potential for both low-cost collection and deep understanding of emotional responses to mobile video advertisements.
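The implicit PPG channel relies on the fact that fingertips resting over a smartphone camera transmit subtle brightness changes with each heartbeat. As a hedged illustration of that signal path (the frame statistics, band limits, and sampling rate are assumptions, not AttentiveVideo's implementation):

```python
import numpy as np

def estimate_heart_rate(red_means, fps=30.0):
    """Estimate pulse rate from the mean red-channel intensity per frame."""
    signal = red_means - np.mean(red_means)       # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)        # plausible pulse: ~42-240 bpm
    peak = freqs[band][np.argmax(spectrum[band])]
    return peak * 60.0                            # Hz -> beats per minute

# Synthetic 10-second recording: a 1.2 Hz (72 bpm) pulse plus sensor noise.
t = np.arange(0, 10, 1 / 30.0)
fake = 0.5 * np.sin(2 * np.pi * 1.2 * t) + 0.05 * np.random.randn(len(t))
print(round(estimate_heart_rate(fake)))           # ~72
```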
Sleep is one of the most important aspects of healthy and active living. The right amount of sleep at the right time helps an individual protect their physical, mental, and cognitive health and maintain their quality of life. Sleep, the most durative of the Activities of Daily Living (ADL), has a major synergic influence on a person's functional, behavioral, and cognitive health. A deep understanding of sleep behavior and its relationship with physiological signals and contexts (such as eye or body movements) is necessary to design and develop a robust intelligent sleep monitoring system. In this paper, we propose an intelligent algorithm to detect the microscopic states of sleep, which fundamentally constitute the components of good and bad sleeping behavior and thus help shape a formative assessment of sleep quality. Our initial analysis includes the investigation of several classification techniques to identify and correlate the relationship of microscopic sleep states with overall sleep behavior. Subsequently, we also propose an online algorithm based on change point detection to process and classify the microscopic sleep states. We also develop a lightweight version of the proposed algorithm for real-time sleep monitoring, recognition, and assessment at scale. For a larger deployment of our proposed model across a community of individuals, we propose an active learning-based methodology to reduce the effort of ground truth data collection and labeling. Finally, we evaluate the performance of our proposed algorithms on real data traces, and demonstrate the efficacy of our models for detecting and assessing fine-grained sleep states beyond an individual.
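For the online change-point step, a simple streaming detector such as CUSUM illustrates the general mechanism: accumulate evidence that the signal has drifted from its recent level and raise an alarm at a likely state transition. This is a minimal sketch under assumed parameters, not the paper's algorithm:

```python
def cusum(stream, target, threshold=5.0, drift=0.5):
    """Yield indices where the cumulative deviation from `target` exceeds
    `threshold`, signaling a possible transition between sleep states."""
    pos, neg = 0.0, 0.0
    for i, x in enumerate(stream):
        pos = max(0.0, pos + (x - target) - drift)   # evidence of an upward shift
        neg = max(0.0, neg + (target - x) - drift)   # evidence of a downward shift
        if pos > threshold or neg > threshold:
            yield i
            pos, neg = 0.0, 0.0                      # restart after an alarm

signal = [0.1] * 20 + [2.0] * 20          # e.g., a jump in movement intensity
print(list(cusum(signal, target=0.1)))    # first alarm shortly after index 20
```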
Deep learning has emerged as a powerful tool for feature-driven labeling of datasets. However, for it to be effective, it requires a large and finely labeled training dataset. Precisely labeling a large training dataset is expensive, time-consuming, and error-prone. In this paper we present a visually-driven deep learning approach that starts with a coarsely-labeled training dataset, and iteratively refines the labeling through intuitive interactions that leverage the latent structures of the dataset. Our approach can be used to (a) alleviate the burden of intensive manual labeling that captures the fine nuances in a high-dimensional dataset by simple visual interactions, (b) replace a complicated (and therefore difficult to design) labeling algorithm by a simpler (but coarse) labeling algorithm supplemented by user interaction to refine the labeling, or (c) use low-dimensional features (such as the RGB colors) for coarse labeling and turn to higher-dimensional (hyperspectral) latent structures, that are progressively revealed by deep learning, for fine labeling. We validate our approach through use cases on three high-dimensional datasets.
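The iterative loop can be sketched with any off-the-shelf classifier standing in for the deep model: train on the current labels, surface the least-confident points, and let the user correct them through the visual interface (simulated here by an oracle). Everything below is a stand-in under stated assumptions, not the paper's approach:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

def refine_labels(X, labels, oracle, rounds=5, batch=20):
    """Retrain on current labels; ask the user about the least-confident points."""
    labels = labels.copy()
    for _ in range(rounds):
        model = LogisticRegression(max_iter=1000).fit(X, labels)
        confidence = model.predict_proba(X).max(axis=1)
        for i in np.argsort(confidence)[:batch]:     # most ambiguous points
            labels[i] = oracle(i)                    # user refines via the UI
    return labels

X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
y_coarse = y_true.copy()
flip = np.random.RandomState(0).choice(300, 60, replace=False)
y_coarse[flip] = np.random.RandomState(1).randint(0, 3, 60)  # coarse, noisy labels
refined = refine_labels(X, y_coarse, oracle=lambda i: y_true[i])
print((y_coarse == y_true).mean(), (refined == y_true).mean())  # agreement should improve
```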
The beyond-relevance objectives of recommender systems have been drawing more and more attention. For example, a diversity-enhanced interface has been shown to associate positively with overall levels of user satisfaction. However, little is known about how users adopt diversity-enhanced interfaces to accomplish various real-world tasks. In this paper, we present two attempts at creating a visual diversity-enhanced interface that presents recommendations beyond a simple ranked list. Our goal was to design a recommender system interface that helps users explore the different relevance prospects of recommended items in parallel and that stresses their diversity. Two within-subject user studies in the context of social recommendation at academic conferences were conducted to compare our visual interfaces. Results from our user studies show that the visual interfaces significantly reduced the exploration effort required for the given tasks and helped users perceive the recommendation diversity. We show that users examined a diverse set of recommended items while experiencing an improvement in overall satisfaction. Also, the users' subjective evaluations show significant improvement on many user-centric metrics. We discuss these experiences, which shed light on avenues for future interface designs.
Trying to understand a player's characteristics with regard to a computer game is a major line of research known as player modeling. The purpose of such player modeling is typically the adaptation of the game itself. We present two studies that extend player modeling into player profiling by trying to identify more abstract personality traits, such as need for cognition and self-esteem, through a player's in-game behavior. We present evidence that game mechanics that can be broadly adopted across game genres, such as hints and a player's self-evaluation at the end of a level, correlate with the aforementioned personality traits. We conclude by presenting future directions for research on this topic.
Design sketching is an important tool for designers and creative professionals to express their ideas and concepts in a visual medium. Because it is a critical and versatile skill for many different disciplines, courses on design sketching are sometimes taught in universities. Courses today predominantly rely on pen and paper; however, this traditional pedagogy is limited by the availability of human instructors who can provide personalized feedback. Using a stylus-based intelligent tutoring system called PerSketchTivity, we aim to mimic the feedback given by an instructor and assess student-drawn sketches to give students insight into the areas they need to improve. In order to provide effective feedback to users, it is important to identify which features of their sketches they should work on to improve their sketching ability. After consulting with several domain experts in sketching, we came up with an initial list of 22 different metrics that could potentially differentiate expert and novice sketches. We gathered over 2000 sketches from 20 novices and four experts for analysis. Seven metrics were shown to correlate significantly with the quality of expert sketches and provided insight into providing intelligent user feedback as well as an overall model of expert sketching ability.
Gone are the days of robots solely operating in isolation, without direct interaction with people. Rather, robots are increasingly being deployed in environments and roles that require complex social interaction with humans. The implementation of human-robot teams continues to increase as technology develops in tandem with the state of human-robot interaction (HRI) research. Trust, a major component of much human interaction, is an important facet of HRI. However, the ideas of trust repair and trust violations are understudied in the HRI literature. Trust repair is the activity of rebuilding trust after one party breaks the trust of another. These trust breaks are referred to as trust violations. As HRI becomes widespread, so will trust violations; as a result, a clear understanding of the process of HRI trust repair must be developed in order to ensure that a human-robot team can continue to perform well after trust is violated. Previous research on human-automation trust and human-human trust can serve as starting places for exploring trust repair in HRI. Although existing models of human-automation and human-human trust are helpful, they do not account for some of the complexities of building and maintaining trust in unique relationships between humans and robots. As such, the purpose of this paper is to provide a foundation for exploring human-robot trust repair by drawing upon prior work in the human-robot and human-human trust literature, concluding with recommendations for advancing this body of work.
This paper introduces Cartograph, a visualization system that harnesses the vast world knowledge encoded within Wikipedia to create thematic maps of almost any data. Cartograph extends previous systems that visualize non-spatial data using geographic approaches. While these systems required data with an existing semantic structure, Cartograph unlocks spatial visualization for a much larger variety of datasets by enhancing input datasets with semantic information extracted from Wikipedia. Cartograph's map embeddings use neural networks trained on Wikipedia article content and user navigation behavior. Using these embeddings, the system can reveal connections between points that are unrelated in the original data sets, but are related in meaning and therefore embedded close together on the map. We describe the design of the system and key challenges we encountered. We present findings from an exploratory user study and introduce a novel human-centered evaluation technique that can be used on a variety of scatterplot visualizations.
As pen-centric systems increase in the marketplace, they create a parallel need for learning analytics techniques based on dynamic writing. Recent empirical research has shown that signal-level features of dynamic handwriting, such as stroke distance, pressure, and duration, are adapted to conserve total energy expenditure as individuals consolidate expertise in a domain. The aim of this research was to examine how accurately three different machine learning algorithms could automatically classify students by their level of domain expertise, without conducting any written content analysis. Compared with an unsupervised classification accuracy of 71%, a hybrid approach that combined empirical-statistical guidance with machine learning consistently led to correctly classifying 79-92% of students by their expertise level. The hybrid approach also enabled deriving a causal understanding of the basis for prediction success, improved transparency, and a foundation for generalizing results. These findings open up opportunities to design new student-adaptive educational technologies based on individualized data for existing pen-centric systems.
Endowing animated virtual characters with emotionally expressive behaviors is paramount to improve the quality of the interactions between humans and virtual characters. Full-body motion, in particular its subtle kinematic variations, represents an effective way of conveying emotionally expressive content. However, before synthesizing expressive full-body movements, it is necessary to identify and understand what qualities of human motion are salient to the perception of emotions and how these qualities can be exploited when generating novel and equally expressive full-body movements. Based on previous studies, we argue that it is possible to perceive and generate expressive full-body movements from end-effector trajectories alone. Hence, end-effector trajectories define a reduced motion space that is adequate for the characterization of the expressive qualities of human motion and that is fitting for both the analysis and generation of emotionally expressive full-body movements. The purpose and main contribution of this work is the methodological framework we defined and used to assess the validity and applicability of end-effector trajectories for the perception and generation of expressive full-body movements. This framework consists of the creation of a motion capture database of expressive theatrical movements, the development of a motion synthesis system based on re-played or re-sampled trajectories and inverse kinematics, and two perceptual studies.
We present an intelligent virtual interviewer that engages with a user in a text-based conversation and automatically infers the user's personality traits. We investigate how the personality of a virtual interviewer, as well as the personality of a user inferred from a virtual interview, influences the user's trust in the virtual interviewer from two perspectives: the user's willingness to confide in, and to listen to, a virtual interviewer. We have developed two virtual interviewers with distinct personalities and deployed them in a series of real-world events. We present findings from four real-world deployments with completed interviews from 1280 users, including 606 actual job applicants. Notably, users are more willing to confide in and listen to a virtual interviewer with a serious, assertive personality in a high-stakes job interview. Moreover, users' personality traits, inferred from their chat text, along with the interview context, influence their perception of a virtual interviewer and their willingness to confide in and listen to it. Finally, we discuss the implications of our work for building hyper-personalized, intelligent agents based on user traits.
A significant fraction of information searches are motivated by the user's primary task, such as a document that the user is reading or writing that triggers the information need and search activity. An ideal search engine would be able to use information inferred from the primary task to retrieve useful information. Previous work has shown that many information retrieval activities depend on the primary task in which the retrieved information is to be used, but fairly little research has focused on methods that automatically learn informational intents from the primary task context. We study how the implicit primary task context can be used to model the user's search intent and to proactively retrieve relevant and useful information. Data comprising logs from a user study, in which users wrote an essay, demonstrate that users' search intents can be inferred from the task and that relevant and useful information can be proactively retrieved. Data from simulations with several datasets of different complexity show that the proposed approach of using primary task context generalizes to a variety of data. Our findings have implications for the design of proactive search systems that can infer users' search intent implicitly by monitoring users' primary task activities.
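One lightweight way to approximate this idea, offered as an illustration rather than the paper's model, is to treat the highest-weighted TF-IDF terms of the document being written as the implicit query for proactive retrieval:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# A background corpus to weight terms against (stand-in documents).
background = ["generic text about many everyday topics",
              "another unrelated background document"]
essay = ("The looser threshold reduces repeated recognition errors "
         "for wrist and finger gestures")

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(background + [essay])
weights = tfidf[len(background)].toarray().ravel()   # row for the essay
terms = vec.get_feature_names_out()
query = [terms[i] for i in weights.argsort()[::-1][:5]]
print(query)   # the top-weighted terms become the proactive query
```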
Drone navigation in complex environments poses many problems to teleoperators. Especially in 3D structures such as buildings or tunnels, viewpoints are often limited to the drone's current camera view, nearby objects can be collision hazards, and frequent occlusion can hinder accurate manipulation. To address these issues, we have developed a novel interface for teleoperation that provides a user with environment-adaptive viewpoints that are automatically configured to improve safety and smooth user operation. This real-time adaptive viewpoint system takes robot position, orientation, and 3D point-cloud information into account to modify the user's viewpoint and maximize visibility. Our prototype uses simultaneous localization and mapping (SLAM) based reconstruction with an omnidirectional camera, and we use the resulting models as well as simulations in a series of preliminary experiments testing navigation of various structures. Results suggest that automatic viewpoint generation can outperform first- and third-person view interfaces for virtual teleoperators in terms of ease of control and accuracy of robot operation.
Discovering the correlations among variables of air quality data is challenging because the correlation time-series are long-lasting, multi-faceted, and information-sparse. In this paper, we propose a novel visual representation, called the Time-Correlation Partitioning (TCP) tree, that compactly characterizes correlations of multiple air quality variables and their evolution. A TCP tree is generated by partitioning the information-theoretic correlation time-series into pieces with respect to the variable hierarchy and temporal variations, and reorganizing these pieces into a hierarchically nested structure. The visual exploration of a TCP tree provides a sparse data traversal of the correlation variations and a situation-aware analysis of correlations among variables. This can help meteorologists better understand the correlations among air quality variables. We demonstrate the efficiency of our approach in a real-world air quality investigation scenario.
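The raw material for a TCP tree is a correlation time-series per variable pair. A windowed Pearson correlation (substituting for the paper's information-theoretic measure; the window length and variables are illustrative) shows what gets partitioned:

```python
import numpy as np

def correlation_series(x, y, window=24):
    """Correlation between two air quality variables in each sliding window."""
    return np.array([np.corrcoef(x[t:t + window], y[t:t + window])[0, 1]
                     for t in range(len(x) - window + 1)])

hours = np.arange(24 * 14)                              # two weeks, hourly
pm25 = np.sin(hours / 24.0) + 0.1 * np.random.randn(len(hours))
no2 = 0.8 * pm25 + 0.3 * np.random.randn(len(hours))    # a correlated variable
series = correlation_series(pm25, no2)
# A TCP-style partition would split `series` where the correlation level
# shifts, then nest the pieces under the variable hierarchy.
print(series.min(), series.max())
```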
Symbolic motion planning for robots is the process of specifying and planning robot tasks in a discrete space, then carrying them out in a continuous space in a manner that preserves the discrete-level task specifications. Despite progress in symbolic motion planning, many challenges remain, including addressing scalability for multi-robot systems and improving solutions by incorporating human intelligence. In this paper, distributed symbolic motion planning for multi-robot systems is developed to address scalability. More specifically, compositional reasoning approaches are developed to decompose the global planning problem, and atomic propositions for observation, communication, and control are proposed to address inter-robot collision avoidance. To improve solution quality and adaptability, a dynamic, quantitative, and probabilistic human-to-robot trust model is developed to aid this decomposition. Furthermore, a trust-based real-time switching framework is proposed to switch between autonomous and manual motion planning for tradeoffs between task safety and efficiency. Deadlock- and livelock-free algorithms are designed to guarantee reachability of goals with a human in the loop. A set of non-trivial multi-robot simulations with direct human input and trust evaluation is provided, demonstrating the successful implementation of the trust-based multi-robot symbolic motion planning methods.
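As a toy illustration of how a dynamic trust estimate could drive the switching framework (the linear update, weights, and threshold are assumptions; the paper's model is probabilistic and richer):

```python
def update_trust(trust, robot_perf, human_perf,
                 alpha=0.8, beta=0.15, gamma=0.05):
    """Blend prior trust with recent robot and human task performance."""
    t = alpha * trust + beta * robot_perf + gamma * human_perf
    return min(max(t, 0.0), 1.0)                    # keep trust in [0, 1]

trust, mode = 0.6, "autonomous"
for perf in [0.9, 0.4, 0.2, 0.3, 0.8]:              # robot task outcomes over time
    trust = update_trust(trust, perf, human_perf=0.7)
    mode = "autonomous" if trust >= 0.5 else "manual"   # switching rule
    print(f"trust={trust:.2f}, planning mode: {mode}")
```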
Enticing passers-by to a focused interaction with a public display requires taking appropriate action depending on how much attention visitors are already paying to the display. Such a system might want to emit a strong signal that makes an inattentive visitor look or turn towards it, or choose to present the actual content in a way that indicates that a head-on looking visitor has been registered and is addressed individually (as opposed to a dumb system just playing a message in a loop). The challenge here is to reliably determine the attention of passers-by, considering both single persons and groups of visitors appearing simultaneously within the display's field of view. In this article, we present a model for estimating individual and collective human attention towards a focal stimulus and investigate different technical methods for measuring physical expressive features (i.e., the basis for deriving a person's attention). In an experimental setup, we evaluate three measuring techniques, a Support Vector Machine (SVM), a neural network using a Multilayer Perceptron (MLP), and a Finite State Machine (FSM), and compare the results to a manual reference classification. We carve out strengths and weaknesses and identify the most feasible measuring method with regard to recognition precision and practical application.
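A compressed sketch of how two of the three measuring techniques could be compared against a manual reference classification (synthetic stand-in features and labels; the FSM, being hand-crafted rather than trained, is omitted):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(200, 4)                           # e.g., gaze angle, distance, pose
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # manual reference: attentive or not

for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("MLP", MLPClassifier(hidden_layer_sizes=(16,),
                                        max_iter=2000, random_state=0))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean accuracy {acc:.2f} vs. the manual reference")
```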
Trust in automation has become a topic of intensive study over the past two decades. While the earliest trust experiments involved human interventions to correct failures/errors in automated control systems, a majority of subsequent studies have investigated information acquisition and analysis decision-aiding tasks, such as target detection, for which automation reliability is more easily manipulated. Despite the high level of international dependence on automation in industry and transport, almost all current studies have employed Western samples, primarily from the US. The present study addresses these gaps by running a large-sample experiment in three diverse cultures (US, Taiwan, and Turkey) using a trust-sensitive task consisting of both automated control and target detection subtasks. This paper presents results for the target detection subtask, for which reliability and task load were manipulated. The current experiments allow us to determine whether reported effects are universal or specific to Western culture, vary in baseline or magnitude, or differ across cultures. Results generally confirm consistent effects of the manipulations across the three cultures, as well as cultural differences in initial trust and variation in the effects of manipulations, consistent with 10 cultural hypotheses based on Hofstede's Cultural Dimensions and Leung and Cohen's theory of Cultural Syndromes. These results provide critical implications and insights for enhancing human trust in intelligent automation systems across cultures. Our paper presents the following contributions: First, to the best of our knowledge, this is the first set of studies that deals with cultural factors across all the cultural syndromes identified in the literature by comparing trust in Honor, Face, and Dignity cultures. Second, this is the first set of studies that uses a validated cross-cultural trust measure for measuring trust in automation. Third, our experiments are the first to study the dynamics of trust across cultures.
Convergent lines of evidence indicate that anthropomorphic robots are represented using neurocognitive mechanisms typically employed in social reasoning about other people. Relatedly, a growing literature documents that contexts of threat can exacerbate coalitional biases in social perceptions. Integrating these research programs, the present studies test whether cues of violent intergroup conflict modulate perceptions of the intelligence, emotional experience, or overall personhood of robots. In Studies 1 and 2, participants evaluated a large, bipedal all-terrain robot; in Study 3, participants evaluated a small, social robot with humanlike facial and vocal characteristics. Across all studies, cues of violent conflict caused significant decreases in perceived robotic personhood, and this shift was mediated by parallel reductions in emotional sympathy with the robot (with no significant effects of threat on attributions of intelligence). In addition, in Study 2, participants in the conflict condition estimated the large bipedal robot to be less effective in military combat, and this difference was mediated by the reduction in perceived robotic personhood. These results are discussed as they motivate future investigation into the links between threat, coalitional bias and human-robot interaction.
Sophisticated ubiquitous sensing systems are being used to measure motor ability in clinical settings. Intended to augment clinical decision-making, the interpretability of the underlying machine learning measurements becomes critical to their use. We explore how visualization can support the interpretability of machine learning measures through the case of Assess MS, a system to support the clinical assessment of Multiple Sclerosis. A substantial design challenge is to make the algorithm's decision-making process visible in a way that allows clinicians to integrate the algorithm's result into their own decision process. To this end, we present an iterative design research study that draws out the challenges of supporting interpretability in a real-world system. The key contribution of this paper is to illustrate that simply making the algorithmic decision-making process visible is not helpful in supporting clinicians in their own decision-making process: it disregards that people and algorithms make decisions in different ways. Instead, we propose that visualization can provide context for algorithmic decision-making, rendering observable a range of internal workings of the algorithm, from data quality issues to the web of relationships generated in the machine learning process.