Crowdsourcing is a common means of collecting image segmentation training data for use in a variety of computer vision applications. However, designing accurate crowd-powered image segmentation systems is challenging because defining object boundaries in an image requires significant fine motor skills and hand-eye coordination, which makes these tasks error-prone. Typically, special segmentation tools are created and then answers from multiple workers are aggregated to generate more accurate results. However, individual tool designs can bias how and where people make mistakes, resulting in shared errors that remain even after aggregation. In this paper, we introduce a novel crowdsourcing approach that leverages tool diversity as a means of improving aggregate crowd performance. Our idea is that, given a diverse set of tools, answer aggregation done across tools can help improve collective performance by offsetting systematic biases induced by the individual tools themselves. To demonstrate the effectiveness of the proposed approach, we design four different tools and present FourEyes, a crowd-powered image segmentation system that allows aggregation across different tools. Then, we conduct a series of studies that evaluate different aggregation conditions and show that using multiple tools can significantly improve aggregate accuracy. Furthermore, we investigate the design space of post-processing for multi-tool aggregation in terms of correction mechanisms. We introduce a novel region-based method for synthesizing more accurate bounds for image segmentation tasks by averaging surrounding annotations. In addition, we explore the effect of adjusting the threshold parameter of the EM-based aggregation method. The results imply that not only an individual tool's design but also the correction mechanism can affect the performance of multi-tool aggregation.
This article extends work presented at ACM IUI 2018 by providing a novel region-based error correction method and additional in-depth evaluation of the proposed approach.
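The idea of aggregating answers across tools can be illustrated with a minimal sketch. It assumes each tool's annotation has already been rasterized into a binary mask of the same size, and it uses a plain pixel-wise majority vote rather than the EM-based or region-based methods described above; the function name `majority_vote` and the example masks are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # 1 = object pixel, 0 = background


def majority_vote(masks: List[Mask]) -> Mask:
    """Pixel-wise majority vote over equally sized binary masks."""
    if not masks:
        raise ValueError("need at least one mask")
    rows, cols = len(masks[0]), len(masks[0][0])
    threshold = len(masks) / 2  # a pixel is kept if more than half the masks mark it
    return [
        [1 if sum(m[r][c] for m in masks) > threshold else 0 for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical annotations of the same 2x3 image, each made with a different tool.
tool_a = [[1, 1, 0], [0, 1, 0]]
tool_b = [[1, 0, 0], [0, 1, 1]]
tool_c = [[1, 1, 0], [0, 0, 1]]

aggregated = majority_vote([tool_a, tool_b, tool_c])
print(aggregated)
```

In this toy case each tool makes a different single-pixel error, so the vote recovers a mask that none of the individual annotations produced on its own.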
Information-seeking tasks with learning or investigative purposes are usually referred to as exploratory search. Exploratory search unfolds as a dynamic process where the user, amidst navigation, trial-and-error and on-the-fly selections, gathers and organizes information (resources). A range of innovative interfaces with increased user control have been developed to support the exploratory search process. In this work we present our attempt to increase the power of exploratory search interfaces by using ideas of social search, i.e., leveraging information left by past users of information systems. Social search technologies are highly popular nowadays, especially for improving ranking. However, current approaches to social ranking do not allow users to decide to what extent social information should be taken into account for result ranking. This paper presents an interface that integrates social search functionality into an exploratory search system in a user-controlled way that is consistent with the nature of exploratory search. The interface incorporates control features that allow the user to (i) express information needs by selecting keywords and (ii) express preferences for incorporating social wisdom based on tag matching and user similarity. The interface promotes search transparency through color-coded stacked bars and rich tooltips. This work presents the full series of evaluations conducted to, first, assess the value of the social models in contexts independent of the user interface, in terms of objective and perceived accuracy. Then, in a study with the full-fledged system, we investigated system accuracy and subjective aspects with a structural model, which revealed that, when users actively interacted with all its control features, the hybrid system outperformed a baseline content-based-only tool and users were more satisfied.
In this paper we leverage a previously-developed data-driven approach to design novel privacy-setting interfaces for users of household IoT devices. The essence of this approach is to gather users' feedback on household IoT scenarios before developing the interface, which allows us to create a navigational structure that preemptively maximizes users' efficiency in expressing their privacy preferences, and develop a series of 'privacy profiles' that allow users to express a complex set of privacy preferences with the single click of a button. We expand upon the existing approach by proposing a more sophisticated translation of statistical results into interface design, and by extensively discussing and analyzing the trade-off between user-model parsimony and accuracy in developing privacy profiles and default settings.
In domains where users are exposed to large variations in visuo-spatial features among designs, they often spend excess time searching for common elements (features) on an interface. This article contributes individualised predictive models of visual search, and a computational approach to restructure layouts such that features on a new, unvisited interface can be found more quickly. We explore four principles, inspired by the human visual system (HVS), to predict expected positions of features and create individualised templates: (I) the interface with the highest frequency is chosen as the template; (II) the interface with the highest predicted recall probability (serial position curve) is chosen as the template; (III) the most probable locations for features across interfaces are chosen (visual statistical learning) to generate the template; (IV) based on a generative cognitive model, the most likely visual search locations for features are chosen (visual sampling modelling) to generate the template. Given a history of previously seen interfaces, we restructure the spatial layout of a new (unseen) interface with the goal of making its features more easily findable. The four HVS principles are implemented in Familiariser, a browser-based implementation that automatically restructures webpage layouts based on the visual history of the user. Evaluation with users provides first evidence favouring our approach.
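Principle (III) can be sketched as a simple frequency count. The sketch below assumes each previously seen interface is reduced to a mapping from feature name to a grid cell; this representation, the function name `most_probable_template`, and the example history are illustrative assumptions and not Familiariser's actual model.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Layout = Dict[str, Tuple[int, int]]  # feature name -> (column, row) grid cell


def most_probable_template(history: List[Layout]) -> Layout:
    """For each feature, pick the cell where it appeared most often in the history."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for layout in history:
        for feature, cell in layout.items():
            counts[feature][cell] += 1
    return {feature: c.most_common(1)[0][0] for feature, c in counts.items()}


# Hypothetical visual history of three previously visited pages.
history = [
    {"search": (2, 0), "login": (3, 0)},
    {"search": (2, 0), "login": (0, 0)},
    {"search": (1, 0), "login": (3, 0)},
]

template = most_probable_template(history)
print(template)
```

This sketch does not resolve conflicts where two features are assigned the same cell; a real layout restructuring step would need to handle that.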
Eating activity monitoring through wearable sensors can potentially enable interventions based on eating speed to mitigate the risks of critical healthcare problems such as obesity or diabetes. Eating actions are poly-componential gestures composed of sequential arrangements of three distinct components interspersed with gestures that may be unrelated to eating. This makes it extremely challenging to accurately identify eating actions. The primary reasons for the lack of acceptance of state-of-the-art eating action monitoring techniques include: i) the need to install wearable sensors that are cumbersome to wear or limit the mobility of the user, ii) the need for manual input from the user, and iii) poor accuracy if adequate manual input is not available. In this work, we propose a novel methodology, IDEA, which performs accurate eating action identification in eating episodes with an average F1-score of 0.92. IDEA uses only a single wrist-worn sensor and provides feedback on eating speed every 2 minutes without obtaining any manual input from the user.
Document clustering is a necessary step in various analytical and automated activities. When guided by the user, algorithms are tailored to imprint a perspective on the clustering process that reflects the user's understanding of the data set. While contributing his or her perspective, the user will also acquire a deeper understanding of the data set. To incorporate the user's perspective in the clustering process and, at the same time, effectively visualize document collections to enhance the user's sense-making of data, we propose a novel visual analytics system for interactive document clustering. We built our system on top of clustering algorithms that can adapt to the user's feedback. In the proposed system, the initial clustering is created based on the user-defined number of clusters and the selected clustering algorithm. A set of coordinated visualizations allows the examination of the data set and the results of the clustering. The visualization provides the user with highlights of individual documents and an understanding of the evolution of documents over the time period to which they relate. The users then interact with the process by changing the key-terms that drive it, according to their knowledge of the documents' domain. In key-term based interaction, the user assigns a set of key-terms to each target cluster to guide the clustering algorithm. We improved the clustering process with a novel algorithm for choosing seeds for the clustering algorithm. The results demonstrate that the system has considerably improved not only its precision but also its effectiveness in document-based decision making. A set of quantitative experiments and a user study have been conducted to show the advantages of the approach for document analytics based on clustering. We also report on the use of the framework in a real decision-making scenario that relates users' email discussions to decision making in improving patient care.
Results show that the framework is useful even for more complex data sets such as email conversations.
The beyond-relevance objectives of recommender systems have been drawing more and more attention. For example, a diversity-enhanced interface has been shown to associate positively with overall levels of user satisfaction. However, little is known about how users adopt diversity-enhanced interfaces to accomplish various real-world tasks. In this paper, we present two attempts at creating a visual diversity-enhanced interface that presents recommendations beyond a simple ranked list. Our goal was to design a recommender system interface to help users explore the different relevance prospects of recommended items in parallel and to stress their diversity. Two within-subject user studies in the context of social recommendation at academic conferences were conducted to compare our visual interfaces. Results from our user studies show that the visual interfaces significantly reduced the exploration effort required for given tasks and helped users to perceive the recommendation diversity. We show that the users examined a diverse set of recommended items while experiencing an improvement in overall user satisfaction. Also, the users' subjective evaluations show significant improvement in many user-centric metrics. Experiences are discussed that shed light on avenues for future interface designs.
Recent trends in computer-mediated communications (CMC) have not only led to expanded instant messaging through the use of images and videos, but have also expanded traditional text messaging with richer content in the form of visual communication markers (VCM) such as emoticons, emojis, and stickers. VCMs could prevent a potential loss of subtle emotional conversation in CMC, which is delivered by nonverbal cues that convey affective and emotional information. However, as the number of VCMs in the selection set grows, the problem of VCM entry needs to be addressed. Furthermore, conventional means of accessing VCMs continue to rely on input entry methods that are not directly and intimately tied to expressive nonverbal cues. In this work, we aim to address this issue by facilitating the use of an alternative form of VCM entry: hand gestures. To that end, we propose a user-defined hand gesture set that is highly representative of a number of VCMs and a two-stage hand gesture recognition system (trajectory-based, shape-based) that can identify these user-defined hand gestures with an accuracy of 82%. By developing such a system, we aim to allow people using low-bandwidth forms of CMCs to still enjoy their convenient and discreet properties, while also allowing them to experience more of the intimacy and expressiveness of higher-bandwidth online communication.
Towards User-Adaptive Visualizations: Comparing and Combining Eye-Tracking and Interaction Data for the Real-Time Prediction of User Cognitive Abilities
EventAction: A Visual Analytics Approach to Explainable Recommendation for Event Sequences
Even without speech recognition errors, robots may encounter difficulties interpreting natural language instructions. We report on a research framework developed for robustly handling miscommunication between people and robots in task-oriented spoken dialogue. We describe TeamTalk, a conversational interface to situated agents like robots that incorporates detection and recovery from the situated grounding problems of referential ambiguity and impossible actions. The current work investigates algorithms for spatial reasoning and nearest-neighbor learning to decide on recovery strategies that a virtual robot should use in different contexts, and evaluates this approach in a longitudinal study over six sessions for each of six participants. When the robot encounters a grounding problem, it looks back on its interaction history to consider how it resolved similar situations. The learning algorithm was trained initially on crowdsourced data but was supplemented by interactions from the study. We compare results collected with user-specific and general models, with user-specific models performing best on measures of dialogue efficiency. The overall contribution is an approach to incorporating additional information from situated context, namely a robot's path planner and its surroundings, to detect and recover from miscommunication using dialogue.
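The step in which the robot looks back on its interaction history to see how it resolved similar situations can be sketched as a 1-nearest-neighbor lookup. The feature vector (number of candidate referents, distance to the closest candidate), the strategy labels, and the function name `choose_strategy` are hypothetical and serve only as an illustration; they are not TeamTalk's actual features or strategies.

```python
import math
from typing import List, Tuple

# Each past episode: (features describing the situation, strategy that resolved it).
Episode = Tuple[Tuple[float, ...], str]


def choose_strategy(history: List[Episode], situation: Tuple[float, ...]) -> str:
    """Return the strategy used in the most similar past situation (1-nearest neighbor)."""
    if not history:
        raise ValueError("interaction history is empty")
    nearest = min(history, key=lambda ep: math.dist(ep[0], situation))
    return nearest[1]


# Hypothetical history: (number of candidate referents, distance to closest candidate).
history = [
    ((3.0, 1.0), "ask_which_one"),
    ((0.0, 5.0), "report_impossible"),
    ((2.0, 4.0), "list_options"),
]

print(choose_strategy(history, (3.0, 1.5)))
```

Appending each newly resolved situation to `history` would mirror the way the initial crowdsourced training data was supplemented by interactions from the study, and keeping one history per user would correspond to a user-specific model.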
When building a classifier in interactive machine learning (iML), human knowledge about the target class can be a powerful reference to make the classifier robust to unseen items. The main challenge lies in finding unlabeled items that can either help discover or refine concepts for which the current classifier has no corresponding features (i.e., it has feature blindness). Yet it is unrealistic to ask humans to come up with an exhaustive list of items, especially for rare concepts that are hard to recall. This article presents AnchorViz, an interactive visualization that facilitates the discovery of prediction errors and previously unseen concepts through human-driven semantic data exploration. By creating example-based or dictionary-based anchors representing concepts, users create a topology that (a) spreads data based on their similarity to the concepts, and (b) surfaces the prediction and label inconsistencies between data points that are semantically related. Once such inconsistencies and errors are discovered, users can encode the new information as labels or features, and interact with the retrained classifier to validate their actions in an iterative loop. We evaluated AnchorViz through two user studies. Our results show that AnchorViz helps users discover more prediction errors than stratified random and uncertainty sampling methods. Furthermore, during the beginning stages of a training task, an iML tool with AnchorViz can help users build classifiers comparable to the ones built with the same tool with uncertainty sampling and keyword search, but with fewer labels and more generalizable features. We discuss exploration strategies observed during the two studies and how AnchorViz supports discovering, labeling, and refining of concepts through a sensemaking loop.
Special Issue on Highlights of ACM Intelligent User Interface
The explanation interface has been recognized as important in recommender systems because it can allow users to better judge the relevance of recommendations to their preferences and hence make more informed decisions. In different product domains, the specific purpose of explanation can be different. For high-investment products (e.g., digital cameras, laptops), it is crucial to educate typical new buyers about product knowledge and consequently improve their preference certainty and decision quality. With this objective, we have developed a novel tradeoff-oriented explanation interface that takes into account sentiment features extracted from product reviews to generate recommendations and explanations in a category structure. In this manuscript, we report two user studies conducted on this interface. The first is an online user study (in both before-after and within-subjects setups) that compared our prototype system with the traditional one that considers only static specifications for explanation. The experimental results reveal that adding sentiment-based explanations can help increase users' product knowledge, preference certainty, perceived information usefulness, perceived recommendation transparency and quality, and purchase intention. Inspired by those findings, we performed a follow-up eye-tracking lab experiment to investigate in depth how users view information on the interface. This study shows that integrating sentiment features with static specifications in the tradeoff-oriented explanations prompted users not only to view more recommendations from various categories, but also to spend longer reading explanations. The results also suggest users' inherent information needs for sentiment features during product evaluation and decision making. Finally, we discuss the work's practical implications from three major aspects, i.e., new users, category interface, and explanation purpose.
Activity recognition is a core component of many intelligent and context-aware systems. We present a solution for discreetly and unobtrusively recognizing common work activities above a work surface without using cameras. We demonstrate our approach, which utilizes an RF-radar sensor mounted under the work surface, in three domains: recognizing work activities at a convenience-store counter, recognizing common office deskwork activities, and estimating the position of customers in a showroom environment. Our examples illustrate potential benefits for both post-hoc business analytics and real-time applications. Our solution was able to classify seven clerk activities with 94.9% accuracy using data collected in a lab environment and to recognize six common deskwork activities collected in real offices with 95.3% accuracy. Using two sensors simultaneously, we demonstrate coarse position estimation around a large surface with 95.4% accuracy. We show that using multiple projections of the RF signal leads to improved recognition accuracy. Finally, we show how smartwatches worn by users can be used to attribute an activity, recognized with the RF sensor, to a particular user in multi-user scenarios. We believe our solution can mitigate some of users' privacy concerns associated with cameras and is useful for a wide range of intelligent systems.