Introduction to the Special Issue on Human-Centered Machine Learning
In gesture recognition, one challenge that researchers and developers face is the need for recognition strategies that mediate between false positives and false negatives. In this paper, we examine bi-level thresholding, a recognition strategy that uses two thresholds: a tighter threshold limits false positives and recognition errors, and a looser threshold prevents repeated errors (false negatives) by analyzing movements in sequence. We first describe early observations that led to the development of the bi-level thresholding algorithm. Next, using a wizard-of-Oz recognizer, we hold recognition rates constant and compare fixed versus bi-level thresholding; we show that systems using bi-level thresholding yield significantly lower workload scores on the NASA-TLX and significantly lower accelerometer variance when performing gesture input. Finally, we examine the effect that bi-level thresholding has on a real-world data set of wrist and finger gestures, showing an ability to significantly improve measures of precision and recall. Overall, these results argue for the viability of bi-level thresholding as an effective technique for balancing false positives, recognition errors, and false negatives.
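As a rough illustration of the idea (not the authors' exact algorithm), the following Python sketch applies two thresholds to a stream of per-attempt recognizer confidence scores: a gesture is accepted outright when it clears the tight threshold, and the looser threshold is applied only when the immediately preceding attempt was a near-miss. The threshold values and the near-miss rule are illustrative assumptions.

    def bilevel_threshold(scores, tight=0.9, loose=0.7):
        """Illustrative bi-level thresholding over per-attempt confidence scores.
        Accept immediately above the tight threshold; after a near-miss, apply
        the looser threshold so the user need not repeat the gesture again."""
        accepted, near_miss = [], False
        for i, s in enumerate(scores):
            if s >= tight:
                accepted.append(i)
                near_miss = False
            elif s >= loose:
                if near_miss:          # second consecutive near-miss: accept
                    accepted.append(i)
                    near_miss = False
                else:                  # remember the near-miss for the next attempt
                    near_miss = True
            else:
                near_miss = False
        return accepted

    print(bilevel_threshold([0.95, 0.75, 0.78, 0.40]))  # -> [0, 2]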
This paper reports on the development of capabilities for (on-screen) virtual agents and robots to support isolated older adults in their homes. A real-time architecture was developed to use a virtual agent or a robot interchangeably to interact via dialog and gesture with a human user. Users could interact with either agent across twelve different activities, some of which included on-screen games and forms to complete. The paper reports on a pre-study that guided the choice of interaction activities. A month-long study with 44 adults between the ages of 55 and 91 assessed differences in the use of the robot and virtual agent.
Given the increasing popularity of customer service dialogue on Twitter, analysis of conversation data is essential to understand trends in customer and agent behavior for the purpose of automating customer service interactions. In this work, we develop a novel taxonomy of fine-grained "dialogue acts" frequently observed in customer service, showcasing acts that are more suited to the domain than the more generic existing taxonomies. Using a sequential SVM-HMM model, we model conversation flow, predicting the dialogue act of a given turn in real time, and showcase this using our "PredDial" portal. We characterize differences between customer and agent behavior in Twitter customer service conversations, and investigate the effect of testing our system on different customer service industries. Finally, we use a data-driven approach to predict important conversation outcomes: customer satisfaction, customer frustration, and overall problem resolution. We show that the type and location of certain dialogue acts in a conversation have a significant effect on the probability of desirable and undesirable outcomes, and present actionable rules based on our findings. We explore the correlations between different dialogue acts and conversation outcomes in detail through an actionable-rule discovery task, leveraging a state-of-the-art sequential rule mining algorithm and modeling each conversation as a sequence. The patterns and rules we derive can be used as guidelines for outcome-driven automated customer service platforms.
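To make the prediction task concrete, here is a minimal, hypothetical sketch of turn-level dialogue act classification that encodes the speaker role and the previous act as flat features. It is a simplification for illustration, not the paper's sequential SVM-HMM, and the turns and act labels shown are invented examples.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Toy turns: (speaker, text, dialogue act). The act labels are hypothetical,
    # not the taxonomy developed in the paper.
    turns = [
        ("customer", "my order never arrived", "complaint"),
        ("agent", "sorry to hear that, can you send your order number?", "request_info"),
        ("customer", "sure, it is 12345", "give_info"),
        ("agent", "thanks, a replacement is on its way", "resolution"),
    ]

    # Encode each turn with its text plus simple context features (speaker role
    # and previous act), a flat approximation of sequential modeling.
    X = [f"{spk} prev={turns[i - 1][2] if i else 'START'} {txt}"
         for i, (spk, txt, _) in enumerate(turns)]
    y = [act for _, _, act in turns]

    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(X, y)
    print(model.predict(X))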
Machine learning (ML) has become increasingly influential to human society, yet the primary advancements and applications of ML are driven by research in only a few computational disciplines. Even applications that affect or analyze human behaviors and social structures are often developed with limited input from experts outside of computational fields. Social scientists, experts trained to examine and explain the complexity of human behavior and interactions in the world, have considerable expertise to contribute to the development of ML applications for human-generated data, and their analytic practices could benefit from more human-centered ML methods. In this work, we highlight some of the gaps in applying ML to social science research. Building upon content analysis of social media papers, a survey study, and interviews, we summarize the current use and challenges of ML in social sciences. Additionally, we utilize our experience designing a visual analytics tool for collaborative qualitative coding as a case study to illustrate how we might re-imagine the way ML could support workflows for social scientists. Finally, we propose three research directions to ground ML applications for social science with the ultimate goal of achieving truly human-centered machine learning.
Cognitive computing systems require human-labeled data for evaluation, and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the "cause" and "treat" relations, and how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure that account for ambiguity in both human and machine performance on this task.
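A simplified stand-in for such ambiguity-aware measures (not the exact CrowdTruth definitions) weights each example's contribution to precision and recall by a per-example agreement score, so ambiguous instances count less toward both credit and penalty:

    def weighted_prf(y_true, y_pred, weights):
        """Precision, recall, and F-measure where each example contributes in
        proportion to a weight, imagined here as a crowd agreement score."""
        tp = sum(w for t, p, w in zip(y_true, y_pred, weights) if t and p)
        fp = sum(w for t, p, w in zip(y_true, y_pred, weights) if not t and p)
        fn = sum(w for t, p, w in zip(y_true, y_pred, weights) if t and not p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # The third instance is ambiguous (low agreement) and is discounted.
    print(weighted_prf([1, 0, 1], [1, 1, 0], [0.9, 0.8, 0.3]))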
Collective urban mobility embodies the residents' local insights on the city. Mobility practices of the residents are produced from their spatial choices, which involve various considerations such as the atmosphere of destinations, distance, past experiences, and preferences. The advances in mobile computing and the rise of geo-social platforms have provided the means for capturing the mobility practices; however, interpreting the residents' insights is challenging due to the scale and complexity of an urban environment, and its unique context. In this paper, we present MobInsight, a framework for making localized interpretations of urban mobility that reflect various aspects of urbanism. MobInsight extracts a rich set of neighborhood features through holistic semantic aggregation, and models the mobility between all pairs of neighborhoods. We evaluate MobInsight with the mobility data of Barcelona and demonstrate diverse localized and semantically rich interpretations.
Sleep is among the most important aspects of healthy and active living. The right amount of sleep at the right time helps an individual protect their physical, mental, and cognitive health and maintain their quality of life. Sleep, the most durative of the Activities of Daily Living (ADL), has a major synergistic influence on a person's functional, behavioral, and cognitive health. A deep understanding of sleep behavior and its relationship with physiological signals and contexts (such as eye or body movements) is necessary to design and develop a robust intelligent sleep monitoring system. In this paper, we propose an intelligent algorithm to detect the microscopic states of sleep, which fundamentally constitute the components of good and bad sleeping behavior and thus help shape a formative assessment of sleep quality. Our initial analysis includes the investigation of several classification techniques to identify and correlate the relationship of microscopic sleep states with overall sleep behavior. Subsequently, we also propose an online algorithm based on change point detection to process and classify the microscopic sleep states. We also develop a lightweight version of the proposed algorithm for real-time sleep monitoring, recognition, and assessment at scale. For a larger deployment of our proposed model across a community of individuals, we propose an active learning based methodology to reduce the effort of ground truth data collection and labeling. Finally, we evaluate the performance of our proposed algorithms on real data traces, and demonstrate the efficacy of our models for detecting and assessing fine-grained sleep states beyond an individual.
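As a toy illustration of online change point detection on a physiological or movement signal (the paper's algorithm is more sophisticated; the window size and deviation factor here are placeholders):

    import numpy as np

    def online_changepoints(signal, window=20, k=3.0):
        """Flag a sample as a change point when it deviates from the running
        mean of the previous window by more than k standard deviations."""
        changes = []
        for i in range(window, len(signal)):
            ref = signal[i - window:i]
            mu, sigma = np.mean(ref), np.std(ref) + 1e-8
            if abs(signal[i] - mu) > k * sigma:
                changes.append(i)
        return changes

    # Synthetic movement-like trace with a shift halfway through.
    rng = np.random.default_rng(0)
    trace = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100)])
    print(online_changepoints(trace)[:3])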
Interactive Machine Learning (IML) seeks to complement human perception and intelligence by tightly integrating these strengths with the computational power and speed of computers. The interactive process is designed to involve input from the user but does not require the background knowledge or experience that might be necessary to work with more traditional machine learning techniques. Under the IML process, non-experts can apply their domain knowledge and insight over otherwise unwieldy datasets to find patterns of interest or develop complex data-driven applications. This process is co-adaptive in nature and relies on careful management of the interaction between human and machine. Design of the interface is fundamental to the success of this approach, yet there is a lack of consolidated principles on how such an interface should be implemented. This article presents a detailed review and characterization of Interactive Machine Learning from an interactive systems perspective. We propose and describe a structural and behavioural model of a generalized IML system and identify solution principles for building effective interfaces for IML. Where possible, these emergent solution principles are contextualized by reference to the broader human-computer interaction literature. Finally, we identify strands of user interface research key to unlocking more efficient and productive non-expert interactive machine learning applications.
Tagging of environmental audio events is essential for many tasks. However, finding sound events and labeling them within a long audio file is tedious and time-consuming. In cases where there is very little labeled data (e.g. a single labeled example), it is often not feasible to train an automatic labeler, because many techniques (e.g. deep learning) require a large number of human-labeled training examples. Also, fully automated labeling may not show sufficient agreement with human labeling for many uses. We describe a human-in-the-loop labeling approach that lets a single user greatly reduce the time required to label audio that is tediously long for a human (e.g. 20 hours), has target sounds that are sparse in the audio (10% or less of the audio contains the target), and has too few prior labeled examples (e.g. one) to train a state-of-the-art machine audio labeling system. In this work we describe an interactive sound annotator for this use case. Results from a human-subject study show our tool helped participants label all target sound events within a recording twice as fast as labeling them manually. We also present a method to decompose the overall performance of the proposed system into two key factors, interaction overhead and machine accuracy, by measuring each of them separately. These results indicate a future system should be able to speed labeling by as much as a factor of four.
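A back-of-the-envelope model (our illustration, not the paper's formulation) of how interaction overhead and machine accuracy jointly shape labeling effort: machine-found events only need confirmation, missed events must be labeled manually, and false positives each cost an extra confirmation; all times are placeholder values.

    def expected_labeling_time(n_events, recall, precision,
                               t_confirm=2.0, t_manual=15.0):
        """Estimate total labeling time (seconds) for a recording with n_events
        target sounds, given the machine labeler's recall and precision."""
        found = n_events * recall
        missed = n_events - found
        false_pos = found / precision - found if precision else 0.0
        return found * t_confirm + missed * t_manual + false_pos * t_confirm

    print(expected_labeling_time(n_events=200, recall=0.8, precision=0.6))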
Design sketching is an important tool for designers and creative professionals to express their ideas and concepts in a visual medium. Being a critical and versatile skill for many different disciplines, design sketching is sometimes taught in university courses. Courses today predominantly rely on pen and paper; however, this traditional pedagogy is limited by the availability of human instructors who can provide personalized feedback. Using a stylus-based intelligent tutoring system called PerSketchTivity, we aim to mimic the feedback given by an instructor and assess student-drawn sketches to give students insight into the areas they need to improve. In order to provide effective feedback to users, it is important to identify which features of their sketches they should work on to improve their sketching ability. After consulting with several domain experts in sketching, we developed an initial list of 22 different metrics that could potentially differentiate expert and novice sketches. We gathered over 2000 sketches from 20 novices and four experts for analysis. Seven metrics were shown to correlate significantly with the quality of expert sketches and informed both the design of intelligent user feedback and an overall model of expert sketching ability.
Gone are the days of robots solely operating in isolation, without direct interaction with people. Rather, robots are increasingly being deployed in environments and roles that require complex social interaction with humans. The implementation of human-robot teams continues to increase as technology develops in tandem with the state of human-robot interaction (HRI) research. Trust, a major component of much human interaction, is an important facet of HRI. However, the ideas of trust repair and trust violations are understudied in the HRI literature. Trust repair is the activity of rebuilding trust after one party breaks the trust of another. These trust breaks are referred to as trust violations. As HRI becomes widespread, so will trust violations; as a result, a clear understanding of the process of HRI trust repair must be developed in order to ensure that a human-robot team can continue to perform well after trust is violated. Previous research on human-automation trust and human-human trust can serve as starting places for exploring trust repair in HRI. Although existing models of human-automation and human-human trust are helpful, they do not account for some of the complexities of building and maintaining trust in unique relationships between humans and robots. As such, the purpose of this paper is to provide a foundation for exploring human-robot trust repair by drawing upon prior work in the human-robot and human-human trust literature, concluding with recommendations for advancing this body of work.
As pen-centric systems increase in the marketplace, they create a parallel need for learning analytic techniques based on dynamic writing. Recent empirical research has shown that signal-level features of dynamic handwriting, such as stroke distance, pressure, and duration, are adapted to conserve total energy expenditure as individuals consolidate expertise in a domain. The aim of this research was to examine how accurately three different machine learning algorithms could automatically classify students by their level of domain expertise, without conducting any written content analysis. Compared with an unsupervised classification accuracy of 71%, a hybrid approach that combined empirical-statistical guidance with machine learning consistently led to correctly classifying 79-92% of students by their expertise level. The hybrid approach also enabled a causal understanding of the basis for prediction success, improved transparency, and a foundation for generalizing results. These findings open up opportunities to design new student-adaptive educational technologies based on individualized data for existing pen-centric systems.
Endowing animated virtual characters with emotionally expressive behaviors is paramount to improving the quality of the interactions between humans and virtual characters. Full-body motion, in particular its subtle kinematic variations, represents an effective way of conveying emotionally expressive content. However, before synthesizing expressive full-body movements, it is necessary to identify and understand what qualities of human motion are salient to the perception of emotions and how these qualities can be exploited when generating novel and equally expressive full-body movements. Based on previous studies, we argue that it is possible to perceive and generate expressive full-body movements from end-effector trajectories alone. Hence, end-effector trajectories define a reduced motion space that is adequate for characterizing the expressive qualities of human motion and that is fitting for both the analysis and the generation of emotionally expressive full-body movements. The purpose and main contribution of this work is the methodological framework we defined and used to assess the validity and applicability of end-effector trajectories for the perception and generation of expressive full-body movements. This framework consists of the creation of a motion capture database of expressive theatrical movements, the development of a motion synthesis system based on re-played or re-sampled trajectories and inverse kinematics, and two perceptual studies.
People are not infallible, consistent "oracles": their confidence in decision-making may vary significantly between tasks and over time. We have previously reported the benefits of using an interface and algorithms that explicitly captured and exploited users' confidence: error rates were reduced by up to 50% for an industrial multi-class learning problem, and the number of interactions required in a design optimisation context was reduced by 33%. Having access to users' confidence judgements could significantly benefit intelligent interactive systems in industry, in areas such as intelligent tutoring systems, and in healthcare. There are many reasons for wanting to capture information about confidence implicitly. Some are ergonomic, but others are more 'social', such as wishing to understand (and possibly take account of) users' cognitive state without interrupting them. We investigate the hypothesis that users' confidence can be accurately predicted from measurements of their behaviour. Eye-tracking systems were used to capture users' gaze patterns as they undertook a series of visual decision tasks, after each of which they reported their confidence on a 5-point Likert scale. Subsequently, predictive models were built using "conventional" machine learning approaches on numerical summary features derived from users' behaviour. We also investigate the extent to which the deep learning paradigm can reduce the need to design features specific to each application, by creating "gazemaps" (visual representations of the trajectories and durations of users' gaze fixations) and then training deep convolutional networks on these images. Treating the prediction of user confidence as a two-class problem (confident/not confident), we attained classification accuracy of 88% for the scenario of new users on known tasks, and 87% for known users on new tasks. Considering confidence as an ordinal variable, we produced regression models with a mean absolute error of approximately 0.7 in both cases. Capturing just a simple subset of non-task-specific numerical features gave slightly worse, but still quite high, accuracy (e.g., MAE of approximately 1.0). Results obtained with gazemaps and convolutional networks are competitive, despite not having access to longer-term information about users and tasks, which was vital for the 'summary' feature sets. This suggests that the gazemap-based approach forms a viable, transferable alternative to hand-crafting features for each different application. These results provide significant evidence to confirm our hypothesis, and offer a way of substantially improving many interactive artificial intelligence applications via the addition of cheap, non-intrusive hardware and computationally cheap prediction algorithms.
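The gazemap idea can be sketched as rendering fixations into a small image whose pixel intensities accumulate fixation duration; the rendering below is an illustrative assumption, and the convolutional network trained on such images is not reproduced here.

    import numpy as np

    def gazemap(fixations, size=64):
        """Render gaze fixations (x, y in [0, 1], duration in seconds) as a
        grayscale image whose pixel intensity accumulates fixation duration."""
        img = np.zeros((size, size), dtype=np.float32)
        for x, y, dur in fixations:
            col = min(int(x * size), size - 1)
            row = min(int(y * size), size - 1)
            img[row, col] += dur
        return img / img.max() if img.max() > 0 else img

    demo = gazemap([(0.2, 0.3, 0.8), (0.21, 0.31, 0.4), (0.7, 0.6, 1.2)])
    print(demo.shape, demo.max())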
Symbolic motion planning for robots is the process of specifying and planning robot tasks in a discrete space, then carrying them out in a continuous space in a manner that preserves the discrete-level task specifications. Despite progress in symbolic motion planning, many challenges remain, including addressing scalability for multi-robot systems and improving solutions by incorporating human intelligence. In this paper, distributed symbolic motion planning for multi-robot systems is developed to address scalability. More specifically, compositional reasoning approaches are developed to decompose the global planning problem, and atomic propositions for observation, communication, and control are proposed to address inter-robot collision avoidance. To improve solution quality and adaptability, a dynamic, quantitative, and probabilistic human-to-robot trust model is developed to aid this decomposition. Furthermore, a trust-based real-time switching framework is proposed to switch between autonomous and manual motion planning for tradeoffs between task safety and efficiency. Deadlock- and livelock-free algorithms are designed to guarantee reachability of goals with a human-in-the-loop. A set of non-trivial multi-robot simulations with direct human input and trust evaluation is provided, demonstrating the successful implementation of the trust-based multi-robot symbolic motion planning methods.
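A toy version of trust-based switching (not the paper's probabilistic trust model) might update a scalar trust estimate from observed robot performance and faults, and choose the planning mode by thresholding it; the update rule and all parameters are illustrative.

    def update_trust(trust, performance, fault, alpha=0.1, beta=0.3):
        """Drift trust toward observed performance; drop it sharply on a fault."""
        trust = (1 - alpha) * trust + alpha * performance
        if fault:
            trust -= beta
        return min(max(trust, 0.0), 1.0)

    def planning_mode(trust, threshold=0.6):
        """Autonomous symbolic planning when trust is high, else manual planning."""
        return "autonomous" if trust >= threshold else "manual"

    t = 0.8
    for perf, fault in [(0.9, False), (0.4, True), (0.5, False)]:
        t = update_trust(t, perf, fault)
        print(round(t, 2), planning_mode(t))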
Technologies for sensing movement are expanding towards everyday use in virtual reality, gaming, and artistic practices. In this context, there is a need for methodologies and frameworks to help designers and users create meaningful movement experiences. Mapping through Interaction is a conceptual and computational framework for crafting sonic interactions from demonstrations of embodied associations between motion and sound. It draws upon existing literature emphasizing the importance of bodily experience in sound perception and cognition, and uses interactive machine learning to build the mapping iteratively from user demonstrations. We present a method for modeling the mapping between motion and sound parameter sequences using probability distributions. In particular, we examine Gaussian Mixture Regression and a hierarchical extension to Hidden Markov Regression for continuous movement recognition and sound parameter generation. We discuss the role and interpretation of the model parameters for user-centered interaction design. We review two applications of the approach where users can personalize hand gesture control strategies for continuous interaction with sound textures or vocalizations.
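To illustrate the regression step, the following sketch fits a Gaussian mixture on joint (motion, sound) frames and predicts sound parameters as the conditional expectation given motion, i.e., standard Gaussian Mixture Regression on synthetic data; the paper's system, including its hierarchical Hidden Markov extension and temporal modeling, is not reproduced here.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from scipy.stats import multivariate_normal

    # Fit a GMM on joint (motion, sound) frames; dimensions and data are synthetic.
    rng = np.random.default_rng(0)
    motion = rng.uniform(-1, 1, size=(500, 2))          # e.g. hand position
    sound = np.column_stack([np.sin(motion[:, 0] * 3),  # e.g. filter cutoff
                             motion[:, 1] ** 2])        # e.g. grain density
    joint = np.hstack([motion, sound])

    gmm = GaussianMixture(n_components=5, covariance_type="full",
                          random_state=0).fit(joint)
    dm = motion.shape[1]

    def gmr_predict(x):
        """Conditional mean E[sound | motion = x] under the fitted joint GMM."""
        weights = np.array([
            w * multivariate_normal.pdf(x, mu[:dm], cov[:dm, :dm])
            for w, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_)
        ])
        weights /= weights.sum()
        out = np.zeros(joint.shape[1] - dm)
        for w, mu, cov in zip(weights, gmm.means_, gmm.covariances_):
            cond = mu[dm:] + cov[dm:, :dm] @ np.linalg.solve(cov[:dm, :dm], x - mu[:dm])
            out += w * cond
        return out

    print(gmr_predict(np.array([0.3, -0.5])))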
Convergent lines of evidence indicate that anthropomorphic robots are represented using neurocognitive mechanisms typically employed in social reasoning about other people. Relatedly, a growing literature documents that contexts of threat can exacerbate coalitional biases in social perceptions. Integrating these research programs, the present studies test whether cues of violent intergroup conflict modulate perceptions of the intelligence, emotional experience, or overall personhood of robots. In Studies 1 and 2, participants evaluated a large, bipedal all-terrain robot; in Study 3, participants evaluated a small, social robot with humanlike facial and vocal characteristics. Across all studies, cues of violent conflict caused significant decreases in perceived robotic personhood, and this shift was mediated by parallel reductions in emotional sympathy with the robot (with no significant effects of threat on attributions of intelligence). In addition, in Study 2, participants in the conflict condition estimated the large bipedal robot to be less effective in military combat, and this difference was mediated by the reduction in perceived robotic personhood. These results are discussed as they motivate future investigation into the links between threat, coalitional bias and human-robot interaction.
Sophisticated ubiquitous sensing systems are being used to measure motor ability in clinical settings. Intended to augment clinical decision-making, the interpretability of the machine learning measurements underneath becomes critical to their use. We explore how visualization can support the interpretability of machine learning measures through the case of Assess MS, a system to support the clinical assessment of Multiple Sclerosis. A substantial design challenge is to make visible the algorithm's decision-making process in a way that allows clinicians to integrate the algorithm's result into their own decision process. To this end, we present an iterative design research study that draws out challenges of supporting interpretability in a real-world system. The key contribution of this paper is to illustrate that simply making the algorithmic decision-making process visible is not helpful in supporting clinicians in their own decision-making process; it disregards the fact that people and algorithms make decisions in different ways. Instead, we propose that visualization can provide context to algorithmic decision-making, rendering observable a range of internal workings of the algorithm, from data quality issues to the web of relationships generated in the machine learning process.