Today, intelligent machines interact and collaborate with humans in a way that demands a greater level of trust between human and machine. A first step towards building intelligent machines that are capable of building and maintaining trust with humans is the design of a sensor that will enable machines to estimate human trust level in real time. In this paper, two approaches for developing classifier-based empirical trust sensor models are presented that specifically use electroencephalography (EEG) and galvanic skin response (GSR) measurements. Human subject data collected from 45 participants is used for feature extraction, feature selection, classifier training, and model validation. The first approach considers a general set of psychophysiological features across all participants as the input variables and trains a classifier-based model for each participant, resulting in a trust sensor model based on the general feature set (i.e., a "general trust sensor model"). The second approach considers a customized feature set for each individual and trains a classifier-based model using that feature set, resulting in improved mean accuracy at the expense of increased training time. This work represents the first use of real-time psychophysiological measurements for the development of a human trust sensor. Implications of the work, in the context of trust management algorithm design for intelligent machines, are also discussed.
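The per-participant training described above can be sketched as follows. This is a minimal illustrative example, not the paper's actual pipeline: the feature matrix, trust labels, and choice of logistic regression are all assumptions standing in for whichever psychophysiological features and classifier the authors used.

```python
# Hypothetical sketch: train one trust/distrust classifier per participant
# on psychophysiological features. Data and feature semantics are synthetic
# stand-ins (e.g., columns might be EEG band power and GSR amplitude).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def train_trust_model(features, labels):
    """Fit a binary trust/distrust classifier for one participant."""
    model = LogisticRegression(max_iter=1000)
    model.fit(features, labels)
    return model

# Synthetic stand-in for one participant's feature matrix and trust labels.
X = rng.normal(size=(60, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

model = train_trust_model(X, y)
# Cross-validated accuracy estimates how well the sensor generalizes
# within this participant's data.
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
```

Under the paper's second approach, feature selection would additionally be run per participant before this fitting step, trading training time for accuracy.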
Cognitive computing systems require human-labeled data for evaluation, and often for training. The standard practice used in gathering this data minimizes disagreement between annotators, and we have found that this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the "cause" and "treat" relations, and on how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the level of quality of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure that account for ambiguity in both human and machine performance on this task.
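The weighted measures mentioned above can be illustrated with a small sketch. The weighting scheme here, with each example carrying an agreement weight in [0, 1] that scales its contribution to the counts, is an assumption for illustration and not necessarily the exact CrowdTruth formulation.

```python
# Illustrative sketch of ambiguity-weighted precision/recall/F-measure:
# each example carries a weight reflecting crowd agreement, so ambiguous
# examples (low weight) count less toward errors and successes alike.
def weighted_prf(y_true, y_pred, weights):
    tp = sum(w for t, p, w in zip(y_true, y_pred, weights) if t and p)
    fp = sum(w for t, p, w in zip(y_true, y_pred, weights) if not t and p)
    fn = sum(w for t, p, w in zip(y_true, y_pred, weights) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# The second example is a miss and the third a false alarm, but both are
# ambiguous (low weight), so they penalize the scores only mildly.
p, r, f = weighted_prf([1, 1, 0, 0], [1, 0, 1, 0], [1.0, 0.4, 0.2, 1.0])
```

With uniform weights of 1.0 this reduces to the standard unweighted definitions, which is a useful sanity check for any weighted variant.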
When searching on the web, results are often returned as lists of hundreds to thousands of items, making it difficult for users to understand or navigate the space of results. Research has demonstrated that using clustering to partition search results into coherent, topical clusters can aid in both exploration and discovery. Yet clusters generated by an algorithm for this purpose are often of poor quality and do not satisfy users. As a result, experts must manually evaluate and refine the clustered results for each search query, a process that does not scale to large numbers of search queries. In this work, we investigate using crowd-based human evaluation to inspect, evaluate, and improve clusters to create high-quality clustered search results at scale. We introduce a workflow that begins by using a collection of well-known clustering algorithms to produce a set of clustered search results for a given query. Then, we use crowd workers to holistically assess the quality of each clustered search result in order to find the best one. Finally, the workflow has the crowd spot and fix problems in the best result in order to produce a final output. We evaluate this workflow on 120 top search queries from the Google Play Store, some of which already have clustered search results produced through expert evaluation and refinement. Our evaluations demonstrate that the workflow is effective at reproducing the evaluation of expert judges and also improves clusters in a way that agrees with experts and crowds alike.
The advent of mobile health (mHealth) technologies challenges the capabilities of current visualizations, interactive tools, and algorithms. We present Chronodes, an interactive system that unifies data mining and human-centric visualization techniques to support explorative analysis of longitudinal mHealth data. Chronodes extracts and visualizes frequent event sequences that reveal chronological patterns across multiple participant timelines of mHealth data. It then combines novel interaction and visualization techniques to enable multi-focus event sequence analysis, which allows health researchers to interactively define, explore, and compare groups of participant behaviors using event sequence combinations. Through summarizing insights gained from a pilot study with 20 behavioral and biomedical health experts, we discuss Chronodes's efficacy and potential impact in the mHealth domain. Ultimately, we outline important open challenges in mHealth, and offer recommendations and design guidelines for future research. For a video demonstration of Chronodes, please refer to the provided video figure.
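The frequent event-sequence extraction at the core of the abstract above can be sketched minimally: count contiguous event n-grams across participant timelines and keep those that occur in enough timelines. This is a simplified stand-in for Chronodes's actual mining step, and the event names are invented for illustration.

```python
# Minimal sketch of frequent event-sequence extraction across participant
# timelines: a pattern's support is the number of participants whose
# timeline contains it at least once.
from collections import Counter

def frequent_sequences(timelines, length=2, min_support=2):
    counts = Counter()
    for timeline in timelines:
        seen = set()  # count each pattern at most once per participant
        for i in range(len(timeline) - length + 1):
            seen.add(tuple(timeline[i:i + length]))
        counts.update(seen)
    return {seq: n for seq, n in counts.items() if n >= min_support}

# Hypothetical mHealth event timelines for three participants.
timelines = [
    ["wake", "exercise", "meal", "sleep"],
    ["wake", "meal", "exercise", "meal"],
    ["wake", "exercise", "meal"],
]
patterns = frequent_sequences(timelines, length=2, min_support=2)
```

In a system like Chronodes, patterns such as these would then be visualized along the timelines so researchers can select pattern combinations to define and compare participant groups.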
Full-body human movement is characterized by fine-grained expressive qualities that humans can easily exhibit and recognize in others' movement. In sports (e.g., martial arts) as well as in performing arts (e.g., dance), the same sequence of movements can be performed in a wide range of ways characterized by different qualities, often in terms of subtle (spatial and temporal) perturbations of the movement. Even a non-expert observer can distinguish between a top-level and an average performance by a dancer or martial artist. The difference is not in the performed movements, which are the same in both cases, but in the quality of their performance. In this paper, we present a computational framework aiming at an automated approximate measure of movement quality in full-body physical activities. Starting from motion capture data, the framework computes low-level (e.g., a limb velocity) and high-level (e.g., synchronization between different limbs) movement features. Then, this vector of features is integrated to compute a value aiming at providing a quantitative assessment of movement quality, approximating the evaluation an external expert observer would give of the same sequence of movements. Next, a system representing a concrete implementation of the framework is proposed. Karate is adopted as a testbed. We selected two different katas (i.e., detailed choreographies of movements in karate), characterized by different overall attitude and expression (aggressiveness, meditation), and we asked seven athletes of varying age and experience to perform them. Motion capture data were collected from the performances and were analyzed with the system. The results of the automated analysis were compared with the scores given by fourteen karate experts who rated the same performances.
Results show that the movement quality scores computed by the system and the ratings given by the human observers are highly correlated (Pearson's correlation: r = 0.84, p = 0.001 and r = 0.75, p = 0.005).
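The agreement statistic reported above is straightforward to compute. The sketch below shows Pearson's r between system-computed quality scores and expert ratings; the numbers are synthetic placeholders, since the paper's r values came from the actual karate study data.

```python
# Pearson correlation between automated movement-quality scores and
# expert ratings (synthetic illustrative data, not the study's).
import numpy as np

def pearson_r(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

system_scores = [0.62, 0.71, 0.55, 0.80, 0.68]   # hypothetical system output
expert_ratings = [6.0, 7.5, 5.0, 8.5, 7.0]       # hypothetical mean ratings
r = pearson_r(system_scores, expert_ratings)
```

A value of r near 1 indicates that the ranking and relative spacing of performances by the system closely track those of the expert panel, which is the claim the reported r = 0.84 and r = 0.75 support.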
Research impact plays a critical role in evaluating the research quality and influence of a scholar, a journal, or a conference. Many researchers have attempted to quantify research impact by introducing different types of metrics based on citation data, such as h-index, citation count, and impact factor. These metrics are widely used in the academic community. However, quantitative metrics are highly aggregated in most cases and sometimes biased, which can result in the loss of impact details that are important for comprehensively understanding research impact. For example, in which research area does a researcher have the greatest impact? How does the research impact change over time? How do collaborators affect the research impact of an individual? Simple quantitative metrics can hardly help answer such questions, since more detailed exploration of the citation data is needed. Previous work on visualizing citation data usually only shows limited aspects of research impact and may suffer from other problems, including visual clutter and scalability issues. To fill this gap, we propose an interactive visualization tool, ImpactVis, for better exploration of research impact through citation data. Case studies and in-depth expert interviews are conducted to demonstrate the effectiveness of ImpactVis.
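The h-index cited above as an example of an aggregated metric has a precise definition that illustrates why such metrics discard detail: an author has index h if h of their papers have at least h citations each. A short sketch:

```python
# Compute the h-index from a list of per-paper citation counts:
# the largest h such that at least h papers have >= h citations each.
def h_index(citations):
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

h = h_index([10, 8, 5, 4, 3])
```

Note that very different citation profiles, across different areas, years, and collaborations, can collapse to the same h value, which is exactly the loss of detail that motivates tools like ImpactVis.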
User grouping in asynchronous online forums is a common phenomenon nowadays. People with similar backgrounds or shared interests like to get together in group discussions. As tens of thousands of archived conversational posts accumulate, challenges emerge for forum administrators and analysts to effectively explore user groups in large-volume threads and gain meaningful insight into hierarchical discussions. Simply identifying and comparing groups in a single thread are nontrivial tasks, as the number of users and posts increases with time and noise hampers the detection of user groups. Researchers in the data mining field have proposed a large body of algorithms for exploring user grouping; however, their results are not readily interpretable by laypeople. To address these problems, we present VisForum, a visual analytic system that allows people to interactively explore user groups in a forum. We worked closely with two educators who have released courses on MOOC platforms and compiled a list of design goals to guide our design. Following a set of design rationales and tasks, we design and implement a multi-coordinated interface as well as several novel glyphs with different granularities, i.e., a group glyph, a user glyph, and a set glyph. Accordingly, we propose the Group Detecting & Sorting Algorithm to reduce noise in a collection of posts, and employ the concept of `Forum-index' for end users to identify high-impact forum members. Two case studies using different real-world datasets demonstrate the usefulness of the system and the effectiveness of the novel glyph designs. Furthermore, we conduct an in-lab user study to demonstrate the usability of VisForum.
In 2015, the top ten largest amusement park corporations saw a combined annual attendance of over 400 million visitors. Daily attendance in some of the most popular theme parks in the world can average 44,000 visitors per day. These visitors ride attractions, shop for souvenirs, and dine at local establishments; however, a critical component of their visit is the overall park experience. This experience depends on the wait time for rides, the crowd flow in the park, and various other factors linked to crowd dynamics and human behavior. As such, better insight into visitor behavior can help theme parks devise competitive strategies for improved customer experience. Research into the use of attractions, facilities, and exhibits can be conducted, and as behavior profiles emerge, park operators can also identify anomalous behaviors of visitors, which can improve safety and operations. In this paper, we present a visual analytics framework for analyzing crowd dynamics in theme parks. Our proposed framework is designed to support behavioral analysis by summarizing patterns and detecting anomalies. We provide methodologies to link visitor movement data, communication data, and park infrastructure data. This combination of data sources enables a semantic analysis of who, what, when, and where, allowing analysts to explore visitor-visitor interactions and visitor-infrastructure interactions. Analysts can explore behaviors at the macro level through semantic trajectory clustering views for group behavior dynamics, as well as at the micro level using trajectory traces and a novel visitor network analysis view. We demonstrate the efficacy of our framework through two case studies of simulated theme park visitors.
Exploring high-dimensional data is challenging. Dimension reduction algorithms, such as weighted multidimensional scaling, support data exploration by projecting datasets to two dimensions for visualization. These projections can be explored through parametric interaction, tweaking underlying parameterizations, and observation-level interaction, directly interacting with the points within the projection. In this paper, we present the results of a controlled usability study determining the differences, advantages, and drawbacks among parametric interaction, observation-level interaction, and their combination. The study assesses both interaction techniques' effects on domain-specific high-dimensional data analyses performed by non-experts in statistical algorithms. This study is performed using Andromeda, a tool that enables both parametric and observation-level interaction to provide in-depth data exploration. The results indicate that the two forms of interaction serve different, but complementary, purposes in gaining insight through steerable dimension reduction algorithms.
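The parametric-interaction idea above can be made concrete with a small sketch: the user adjusts per-dimension weights, the data is rescaled accordingly, and the 2D projection is recomputed. This is a simplified stand-in, not Andromeda's actual weighted MDS implementation; it uses scikit-learn's metric MDS on weight-scaled data to approximate the same effect.

```python
# Sketch of steering a 2D projection via per-dimension weights:
# emphasized dimensions dominate the pairwise distances, so the layout
# reorganizes around them when the user changes the weights.
import numpy as np
from sklearn.manifold import MDS

def project(data, weights, seed=0):
    scaled = np.asarray(data) * np.asarray(weights)  # weight each dimension
    mds = MDS(n_components=2, random_state=seed)
    return mds.fit_transform(scaled)

rng = np.random.default_rng(1)
data = rng.normal(size=(30, 5))

# Parametric interaction: down-weighting dimensions 3-5 makes the layout
# reflect mainly the first two dimensions.
layout = project(data, weights=[1.0, 1.0, 0.1, 0.1, 0.1])
```

Observation-level interaction inverts this loop: the user drags points in the layout, and the system solves for the dimension weights that best explain the new arrangement before reprojecting.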
This paper presents a conceptual framework for human-robot trust that uses game theory to represent a definition of trust derived from social psychology. This conceptual framework generates several testable hypotheses related to human-robot trust. This paper examines these hypotheses and a series of experiments we have conducted, the results of which both support and conflict with our framework for trust. We also discuss the methodological challenges associated with investigating trust. The paper concludes with a description of important areas for future research on the topic of human-robot trust.