Graphical Adaptive Menus are Graphical User Interfaces menus whose items predicted of immediate usage can be automatically rendered in a prediction window. Rendering this prediction window is a key question for adaptivity to enable the end user to appropriately differentiate predicted items from normal ones and to select appropriate items consequently. Adaptivity for graphical menus has been more largely investigated for normal screens, such as desktops, than for small screens, like smartphones where real estate imposes severe rendering constraints. To this end, this paper explores a design space where Graphical Adaptive Menus are designed based on Bertins eight visual variables (i.e., position, size, shape, value, color, orientation, texture, and motion)and their combination by contrasting their rendering for small screens with respect to normal screens. Based on this design space, previously introduced techniques for graphical adaptive menus are revisited in terms of four properties (i.e. spatial, physical, format, and temporal stabilities) and discussed for both normal and small screens. The paper then reports on some experiments conducted for selected case studies and provides a set of usability guidelines useful for designers and practitioners to implement graphical adaptive menus
This paper reports on the development of capabilities for (on-screen) virtual agents and robots to support isolated older adults in their homes. A real-time architecture was developed to use a virtual agent or a robot interchangeably to interact via dialog and gesture with a human user. Users could interact with either agent on twelve different activities, some of which included on-screen games, and forms to complete. The paper reports on a pre-study that guided the choice of interaction activities. A month-long study with 44 adults between the ages of 55 and 91 assessed differences in the use of the robot and virtual agent.
Given the increasing popularity of customer service dialogue on Twitter, analysis of conversation data is essential to understand trends in customer and agent behavior for the purpose of automating customer service interactions. In this work, we develop a novel taxonomy of fine-grained "dialogue acts" frequently observed in customer service, showcasing acts that are more suited to the domain than the more generic existing taxonomies. Using a sequential SVM-HMM model, we model conversation flow, predicting the dialogue act of a given turn in real-time, and showcase this using our "PredDial" portal. We characterize differences between customer and agent behavior in Twitter customer service conversations, and investigate the effect of testing our system on different customer service industries. Finally, we use a data-driven approach to predict important conversation outcomes: customer satisfaction, customer frustration, and overall problem resolution. We show that the type and location of certain dialogue acts in a conversation have a significant effect on the probability of desirable and undesirable outcomes, and present actionable rules based on our findings. We explore the correlations between different dialogue acts and the outcome of the conversations in detail, using an actionable-rule discovery task by leveraging state-of-the-art sequential rule mining algorithm while modeling a set of conversations as a set of sequences. The patterns and rules we derive can be used as guidelines for outcome-driven automated customer service platforms.
Crowdsourcing is a common means of collecting image segmentation training data for use in a variety of computer vision applications. However, designing accurate crowd-powered image segmentation systems is challenging because defining object boundaries in an image requires significant fine motor skills and hand-eye coordination, which makes these tasks error-prone. Typically, special segmentation tools are created and then answers from multiple workers are aggregated to generate more accurate results. However, individual tool designs can bias how and where people make mistakes, resulting in shared errors that remain even after aggregation. In this paper, we introduce a novel crowdsourcing approach that leverages tool diversity as a means of improving aggregate crowd performance. Our idea is that given a diverse set of tools, answer aggregation done across tools can help improve collective performance by offsetting systematic biases induced by individual tools themselves. To demonstrate the effectiveness of the proposed approach, we design four different tools and present FourEyes, a crowd-powered image segmentation system that allows aggregation across different tools. Then, we conduct a series of studies that evaluate different aggregation conditions and show that using multiple tools can significantly improve aggregate accuracy. Furthermore, we investigate the design space of post processing for multi-tool aggregation in terms of correction mechanism. We introduce a novel region-based method for synthesizing more accurate bounds for image segmentation tasks through averaging surrounding annotations. In addition, we explore the effect of adjusting the threshold parameter of EM-based aggregation method. The result implies that not only the individual tool's design but also the correction mechanism can affect the performance of multi-tool aggregation. This article extends a work presented at ACM IUI 2018 by providing a novel region-based error correction method and additional in-depth evaluation of the proposed approach.
Digital libraries and services enable users to access large amounts of data on demand. Yet, quality assessment of information encountered on the Internet remains an elusive open issue. For example, Wikipedia, one of the most visited platforms on the Web, hosts thousands of user-generated articles and undergoes 12 million edits/contributions per month. User-generated content is undoubtedly one of the keys to its success, but also a hindrance to good quality: contributions can be of poor quality because anyone, even anonymous users, can participate. Though Wikipedia has defined guidelines as to what makes the perfect article, authors find it difficult to assert whether their contributions comply with them and reviewers cannot cope with the ever growing amount of articles pending review. Great efforts have been invested in algorithmic methods for automatic classification of Wikipedia articles (as featured or non-featured) and for quality flaw detection. However, little has been done to support quality assessment of user-generated content through interactive tools that combine automatic methods and human intelligence. We developed WikiLyzer, a Web toolkit comprising three interactive applications designed to assist (i) knowledge discovery experts in creating and testing metrics for quality measurement, (ii) Wikipedia users searching for good articles, and (iii) Wikipedia authors that need to identify weaknesses to improve a particular article. A design study sheds a light on how experts could create complex quality metrics with our tool, while a user study reports on its usefulness to identify high-quality content.
Creating scripts for tasks involving manipulating GUIs is hard even for programmers due to limitations on accessing to and interacting with applications widgets. For non-programmng users, it seemed impossible to create scripts for those tasks. To that end, we develop a system prototype which learns-by-demonstration called HILC (Help, It Looks Confusing). Users train HILC to synthesize a task script by demonstrating the task, which produces the needed screenshots and their corresponding mouse-keyboard signals. After the demonstration, the user answers follow-up questions. We propose a user-in-the-loop framework that learns to generate scripts of actions performed on visible elements of graphical applications. While pure programming-by-demonstration is still unrealistic, we use quantitative and qualitative experiments to show that non-programming users are willing and effective at answering follow-up queries posed by our system. Our models of events and appearance are surprisingly simple, but are combined effectively to cope with varying amounts of supervision. The best available baseline, Sikuli Slides, struggled with the majority of the tests in our user study experiments. The prototype with our proposed approach successfully helped users accomplish simple linear tasks, complicated tasks (monitoring, looping, and mixed), and tasks that span across multiple executables. Even when both systems could ultimately perform a task, ours was trained and refined by the user in less time.
Understanding a target audiences emotional responses to video advertisements is crucial to stakeholders. However, traditional methods for collecting such information are slow, expensive, and coarse-grained. We propose AttentiveVideo, an intelligent mobile interface with corresponding inference algorithms to monitor and quantify the effects of mobile video advertising in real time. AttentiveVideo employs a combination of implicit photoplethysmography (PPG) sensing and facial expression analysis (FEA) to predict viewers attention, engagement, and sentiment when watching video advertisements on unmodified smartphones. In a 24-participant study, AttentiveVideo achieved good accuracy on a wide range of emotional measures (the best accuracy = 73.4%, kappa = 0.46 across 9 measures). We also found that the PPG sensing channel and the FEA technique are complementary in both prediction accuracy and signal availability. These findings show the potential for both low-cost collection and deep understanding of emotional responses to mobile video advertisements.
Recent research has shown that reliable recognition of sign language words and phrases using user-friendly and non-invasive armbands is feasible and desirable. This work provides an analysis and implementation of including fingerspelling recognition(FR) in such systems, which is a much harder problem due to lack of distinctive hand movements. A novel algorithm called DyFAV (Dynamic Feature Selection and Voting) is proposed for this purpose that exploits the fact that fingerspelling has a finite corpus (26 letters for ASL). Detailed analysis of the algorithm used as well as comparisons with other traditional machine learning algorithms is provided. The system uses an independent multiple agent voting approach to identify letters with high accuracy. The independent voting of the agents ensures that the algorithm is highly parallelizable and thus recognition times can be kept low to suit real-time mobile applications. A thorough explanation and analysis is presented on results obtained on the ASL alphabet corpus for 9 people with limited training. An average recognition accuracy 95.36\% is reported and compared with recognition results from other machine learning techniques. This result is extended by including 6 new users with data collected under similar settings as the previous dataset. Furthermore, a feature selection schema using a subset of the sensors is proposed and the results are evaluated. The mobile, non-invasive, and real time nature of the technology is demonstrated by evaluating performance on various types of android phones and remote server configurations. A brief discussion of the UI is provided along with guidelines for best practices.
Deep learning has emerged as a powerful tool for feature-driven labeling of datasets. However, for it to be effective, it requires a large and finely-labeled training dataset. Precisely labeling a large training dataset is expensive, time consuming, and error-prone. In this paper we present a visually-driven deep learning approach that starts with a coarsely-labeled training dataset, and iteratively refines the labeling through intuitive interactions that leverage the latent structures of the dataset. Our approach can be used to (a) alleviate the burden of intensive manual labeling that captures the fine nuances in a high-dimensional dataset by simple visual interactions, (b) replace a complicated (and therefore difficult to design) labeling algorithm by a simpler (but coarse) labeling algorithm supplemented by user interaction to refine the labeling, or (c) use low-dimensional features (such as the RGB colors) for coarse labeling and turn to higher-dimensional (hyperspectral) latent structures, that are progressively revealed by deep learning, for fine labeling. We validate our approach through use cases on three high-dimensional datasets.
This work presents an extension of Thompson Sampling bandit policy for orchestrating the collection of base recommendation algorithms for e-commerce. We focus on the problem of item-to-item recommendations, for which multiple behavioral and attribute-based predictors are provided to an ensemble learner. In addition, we detail the construction of a personalized predictor based on k-Nearest Neighbors (kNN), with temporal decay capabilities and event weighting. We show how to adapt Thompson Sampling to realistic situations when neither action availability nor reward stationarity is guaranteed. Furthermore, we investigate the effects of priming the sampler with pre-set parameters of reward probability distributions by utilizing the product catalog and/or event history, when such information is available. We report our experimental results based on the analysis of three real-world e-commerce datasets.
The beyond-relevance objectives of recommender systems have been drawing more and more attention. For example, a diversity-enhanced interface has been shown to associate positively with overall levels of user satisfaction. However, little is known about how users adopt diversity-enhanced interfaces to accomplish various real-world tasks. In this paper, we present two attempts at creating a visual diversity-enhanced interface that presents recommendations beyond a simple ranked list. Our goal was to design a recommender system interface to help users explore the different relevance prospects of recommended items in parallel and to stress their diversity. Two within-subject user studies in the context of social recommendation at academic conferences were conducted to compare our visual interfaces. Results from our user study show that the visual interfaces significantly reduced the exploration efforts required for given tasks and helped users to perceive the recommendation diversity. We show that the users examined a diverse set of recommended items while experiencing an improvement in overall user satisfaction. Also, the users' subjective evaluations show significant improvement in many user-centric metrics. Experiences are discussed that shed light on avenues for future interface designs.
Trying to understand a players characteristics with regards to a computer game is a major line of research known as player modeling. The purpose of such player modeling is typically the adaptation of the game itself. We present two studies that extend player modeling into player profiling by trying to identify through a players in-game behavior more abstract personality traits such as the need for cognition and self-esteem. We present evidence that game mechanics that can be broadly adopted by several game genres, such as hints and a players self-evaluation at the end of a level, correlate with the aforementioned personality traits. We conclude by presenting future directions for research regarding this topic.
This paper introduces Cartograph, a visualization system that harnesses the vast world knowledge encoded within Wikipedia to create thematic maps of almost any data. Cartograph extends previous systems that visualize non-spatial data using geographic approaches. While these systems required data with an existing semantic structure, Cartograph unlocks spatial visualization for a much larger variety of datasets by enhancing input datasets with semantic information extracted from Wikipedia. Cartograph's map embeddings use neural networks trained on Wikipedia article content and user navigation behavior. Using these embeddings, the system can reveal connections between points that are unrelated in the original data sets, but are related in meaning and therefore embedded close together on the map. We describe the design of the system and key challenges we encountered. We present findings from an exploratory user study and introduce a novel human-centered evaluation technique that can be used on a variety of scatterplot visualizations.
As pen-centric systems increase in the marketplace, they create a parallel need for learning analytic techniques based on dynamic writing. Recent empirical research has shown that signal-level features of dynamic handwriting, such as stroke distance, pressure, and duration, are adapted to conserve total energy expenditure as individuals consolidate expertise in a domain. The aim of this research was to examine how accurately three different machine learning algorithms could automatically classify students by their level of domain expertise, without conducting any written content analysis. Compared with an unsupervised classification accuracy of 71%, a hybrid approach that combined empirical-statistical guidance of machine learning consistently led to correctly classifying 79-92% of students by their expertise level. The hybrid approach also enabled deriving a causal understanding of the basis for prediction success, improved transparency, and a foundation for generalizing results. These findings open up opportunities to design new student-adaptive educational technologies based on individualized data for existing pen-centric systems.
We present an intelligent virtual interviewer that engages with a user in a text-based conversation and automatically infers the users personality traits. We investigate how the personality of a virtual interviewer as well as the personality of a user inferred from a virtual interview influences the users trust in the virtual interviewer from two perspectives: the users willingness to confide in, and listen to, a virtual interviewer. We have developed two virtual interviewers with distinct personalities and deployed them in a series of real-world events. We present findings from four real-world deployments with completed interviews of 1280 users, including 606 actual job applicants. Notably, users are more willing to confide in and listen to a virtual interviewer with a serious, assertive personality in a high-stakes job interview. Moreover, users personality traits, inferred from their chat text, along with interview context, influence their perception of a virtual interviewer, and their willingness to confide in and listen to a virtual interviewer. Finally, we discuss the implications of our work on building hyper-personalized, intelligent agents based on user traits.
Even without speech recognition errors, robots may encounter difficulties interpreting natural language instructions. We report on a research framework developed for robustly handling miscommunication between people and robots in task-oriented spoken dialogue. We describe TeamTalk, a conversational interface to situated agents like robots that incorporates detection and recovery from the situated grounding problems of referential ambiguity and impossible actions. The current work investigates algorithms for spatial reasoning and nearest-neighbor learning to decide on recovery strategies that a virtual robot should use in different contexts, and evaluates this approach in a longitudinal study over six sessions for each of six participants. When the robot encounters a grounding problem, it looks back on its interaction history to consider how it resolved similar situations. The learning algorithm was trained initially on crowdsourced data but was supplemented by interactions from the study. We compare results collected with user-specific and general models, with user-specific models performing best on measures of dialogue efficiency. The overall contribution is an approach to incorporating additional information from situated context, namely a robot's path planner and its surroundings, to detect and recover from miscommunication using dialogue.
Drone navigation in complex environments poses many problems to teleoperators. Especially in 3D structures like buildings or tunnels, viewpoints are often limited to the drone's current camera view, nearby objects can be collision hazards, and frequent occlusion can hinder accurate manipulation. To address these issues, we have developed a novel interface for teleoperation that provides a user with environment-adaptive viewpoints that are automatically configured to improve safety and smooth user operation. This real-time adaptive viewpoint system takes robot position, orientation, and 3D pointcloud information into account to modify user-viewpoint to maximize visibility. Our prototype uses simultaneous localization and mapping (SLAM) based reconstruction with an omnidirectional camera and we use resulting models as well as simulations in a series of preliminary experiments testing navigation of various structures. Results suggest that automatic viewpoint generation can outperform first and third-person view interfaces for virtual teleoperators in terms of ease of control and accuracy of robot operation.
Social Signal Processing techniques have given the opportunity to analyze in-depth human behavior in social face-to-face interactions. With recent advancements, it is henceforth possible to use these techniques to augment social interactions, especially the human behavior in oral presentations. The goal of this paper is to train a computational model able to provide a relevant feedback to a public speaker concerning his coverbal communication. Hence, the role of this model is to augment the social intelligence of the orator and then the relevance of his presentation. To this end, we present an original interaction setting in which the speaker is equipped with only wearable devices. Several coverbal modalities have been extracted and automatically annotated namely speech volume, intonation, speech rate, eye gaze, hand gestures and body movements. An offline report was addressed to participants containing the performance scores on the overall modalities. In addition, a post-experiment study was conducted to collect participants opinions on many aspects of the studied interaction and the results were rather positive. Moreover, we annotated recommended feedbacks for each presentation session, and to retrieve these annotations, a Dynamic Bayesian Network model was trained using as inputs the multimodal performance scores. We will show that our assessment behavior models.
Discovering the correlations among variables of air quality data is challenging because the correlation time-series are long-lasting, multi-faceted, and information-sparse. In this paper, we propose a novel visual representation, called Time-Correlation Partitioning (TCP) tree that compactly characterizes correlations of multiple air quality variables and their evolutions. A TCP tree is generated by partitioning the information-theoretic correlation time-series into pieces with respect to the variable hierarchy and temporal variations, and reorganizing these pieces into a hierarchically nested structure. The visual exploration of a TCP tree provides a sparse data traversal of the correlation variations, and a situation-aware analysis of correlations among variables. This can help meteorologists understand the correlations among air quality variables better. We demonstrate the efficiency of our approach in a real-world air quality investigation scenario.
Symbolic motion planning for robots is the process of specifying and planning robot tasks in a discrete space, then carrying them out in a continuous space in a manner that preserves the discrete-level task specifications. Despite progress in symbolic motion planning, many challenges remain, including addressing scalability for multi-robot systems and improving solutions by incorporating human intelligence. In this paper, distributed symbolic motion planning for multi-robot systems is developed to address scalability. More specifically, compositional reasoning approaches are developed to decompose the global planning problem, and atomic propositions for observation, communication, and control are proposed to address inter-robot collision avoidance. To improve solution quality and adaptability, a dynamic, quantitative, and probabilistic human-to-robot trust model is developed to aid this decomposition. Furthermore, a trust-based real-time switching framework is proposed to switch between autonomous and manual motion planning for tradeoffs between task safety and efficiency. Deadlock- and livelock-free algorithms are designed to guarantee reachability of goals with a human-in-the-loop. A set of non-trivial multi-robot simulations with direct human input and trust evaluation are provided demonstrating the successful implementation of the trust-based multi-robot symbolic motion planning methods.
Enticing passers-by to a focused interaction with a public display requires taking appropriate action depending on how much attention visitors are already paying to the display. A suchlike system might want to emit a strong signal that makes the inattentive visitor look or turn towards it or choose to give the actual content in a way that indicates that a head-on looking visitor has been registered and is addressed individually (as opposed to a dumb system just playing a message in a loop). The challenge in this connection is to reliably determine the attention of passers-by both considering single persons and groups of visitors simultaneously appearing within the displays field of view. In this article, we present a model for estimating individual and collective human attention towards a focal stimulus and investigate different technical methods for measuring physical expressive features (i.e. the basis for deriving a persons attention). In the course of an experimental setup we compare a Support Vector Machine (SVM) as a measuring technique, a neural network using a Multilayer Perceptron (MLP) and a Finite State Machine (FSM) and compare the results to a manual reference classification. We carve out strengths and weaknesses and identify the most feasible measuring method with regard to precision of recognition and practical application.
Trust in automation has become a topic of intensive study over the past two decades. While the earliest trust experiments involved human interventions to correct failures/errors in automated control systems a majority of subsequent studies have investigated information acquisition and analysis decision aiding tasks such as target detection for which automation reliability is more easily manipulated. Despite the high level of international dependence on automation in industry and transport almost all current studies have employed Western samples primarily from the US. The present study addresses these gaps by running a large sample experiment in three (US, Taiwan and Turkey) diverse cultures using a trust sensitive task consisting of both automated control and target detection subtasks. This paper presents results for the target detection subtask for which reliability and task load were manipulated. The current experiments allow us to determine whether reported effects are universal or specific to Western culture, vary in baseline or magnitude, or differ across cultures. Results generally confirm consistent effects of manipulations across the three cultures as well as cultural differences in initial trust and variation in effects of manipulations consistent with 10 cultural hypotheses based on Hofstedes Cultural Dimensions and Leung and Cohens theory of Cultural Syndromes. These results provide critical implications and insights for enhancing human trust in intelligent automation systems across cultures. Our paper presents the following contributions: First, to the best of our knowledge, this is the first set of studies that deal with cultural factors across all the cultural syndromes identified in the literature by comparing trust in the Honor, Face, Dignity cultures. Second, this is the first set of studies that uses a validated cross-cultural trust measure for measuring trust in automation. Third, our experiments are the first to study the dynamics of trust across cultures
Convergent lines of evidence indicate that anthropomorphic robots are represented using neurocognitive mechanisms typically employed in social reasoning about other people. Relatedly, a growing literature documents that contexts of threat can exacerbate coalitional biases in social perceptions. Integrating these research programs, the present studies test whether cues of violent intergroup conflict modulate perceptions of the intelligence, emotional experience, or overall personhood of robots. In Studies 1 and 2, participants evaluated a large, bipedal all-terrain robot; in Study 3, participants evaluated a small, social robot with humanlike facial and vocal characteristics. Across all studies, cues of violent conflict caused significant decreases in perceived robotic personhood, and this shift was mediated by parallel reductions in emotional sympathy with the robot (with no significant effects of threat on attributions of intelligence). In addition, in Study 2, participants in the conflict condition estimated the large bipedal robot to be less effective in military combat, and this difference was mediated by the reduction in perceived robotic personhood. These results are discussed as they motivate future investigation into the links between threat, coalitional bias and human-robot interaction.