Graphical Adaptive Menus are Graphical User Interface (GUI) menus whose items predicted for immediate use can be automatically rendered in a prediction window. How this prediction window is rendered is a key question for adaptivity, because it must enable end users to differentiate predicted items from normal ones and to select appropriate items accordingly. Adaptivity for graphical menus has been investigated more extensively for normal screens, such as desktops, than for small screens, such as smartphones, where limited screen real estate imposes severe rendering constraints. To this end, this paper explores a design space in which Graphical Adaptive Menus are designed based on Bertin's eight visual variables (i.e., position, size, shape, value, color, orientation, texture, and motion) and their combinations, contrasting their rendering for small screens with their rendering for normal screens. Based on this design space, previously introduced techniques for graphical adaptive menus are revisited in terms of four properties (i.e., spatial, physical, format, and temporal stability) and discussed for both normal and small screens. The paper then reports on experiments conducted for selected case studies and provides a set of usability guidelines for designers and practitioners implementing graphical adaptive menus.
Crowdsourcing is a common means of collecting image segmentation training data for use in a variety of computer vision applications. However, designing accurate crowd-powered image segmentation systems is challenging because defining object boundaries in an image requires significant fine motor skill and hand-eye coordination, which makes these tasks error-prone. Typically, special segmentation tools are created, and answers from multiple workers are then aggregated to generate more accurate results. However, individual tool designs can bias how and where people make mistakes, resulting in shared errors that remain even after aggregation. In this paper, we introduce a novel crowdsourcing approach that leverages tool diversity as a means of improving aggregate crowd performance. Our idea is that, given a diverse set of tools, aggregating answers across tools can improve collective performance by offsetting the systematic biases induced by the individual tools themselves. To demonstrate the effectiveness of the proposed approach, we design four different tools and present FourEyes, a crowd-powered image segmentation system that allows aggregation across different tools. We then conduct a series of studies that evaluate different aggregation conditions and show that using multiple tools can significantly improve aggregate accuracy. Furthermore, we investigate the design space of post-processing for multi-tool aggregation in terms of correction mechanisms. We introduce a novel region-based method for synthesizing more accurate bounds for image segmentation tasks by averaging surrounding annotations. In addition, we explore the effect of adjusting the threshold parameter of an EM-based aggregation method. The results imply that not only the design of individual tools but also the correction mechanism can affect the performance of multi-tool aggregation.
This article extends work presented at ACM IUI 2018 by providing a novel region-based error correction method and an additional in-depth evaluation of the proposed approach.
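To make the idea of aggregating answers across tools concrete, here is a minimal sketch that fuses binary segmentation masks by pixel-wise voting with an adjustable threshold. It is a simpler stand-in, not the paper's EM-based method or its region-based correction; the function name `aggregate_masks`, the mask representation (rows of 0/1 pixel labels), and the default threshold of 0.5 are hypothetical choices for illustration.

```python
from typing import List

Mask = List[List[int]]  # hypothetical representation: rows of 0/1 pixel labels


def aggregate_masks(masks: List[Mask], threshold: float = 0.5) -> Mask:
    """Pixel-wise vote: a pixel is foreground if the fraction of workers
    marking it exceeds `threshold`. Masks may come from different tools."""
    if not masks:
        raise ValueError("need at least one mask")
    height, width = len(masks[0]), len(masks[0][0])
    result: Mask = []
    for y in range(height):
        row = []
        for x in range(width):
            votes = sum(m[y][x] for m in masks)
            row.append(1 if votes / len(masks) > threshold else 0)
        result.append(row)
    return result


# Tool A over-segments on the right, tool B under-segments on the left,
# and a third answer is accurate; voting across tools cancels both biases.
tool_a = [[0, 1, 1, 1]]
tool_b = [[0, 0, 1, 0]]
accurate = [[0, 1, 1, 0]]
print(aggregate_masks([tool_a, tool_b, accurate]))  # [[0, 1, 1, 0]]
```

Lowering `threshold` marks more pixels as foreground (higher recall, lower precision); this knob is only loosely analogous to the threshold parameter of the EM-based method mentioned above.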
Digital libraries and services enable users to access large amounts of data on demand. Yet quality assessment of information encountered on the Internet remains an elusive open issue. For example, Wikipedia, one of the most visited platforms on the Web, hosts thousands of user-generated articles and undergoes 12 million edits/contributions per month. User-generated content is undoubtedly one of the keys to its success, but also a hindrance to good quality: contributions can be of poor quality because anyone, even anonymous users, can participate. Though Wikipedia has defined guidelines as to what makes the perfect article, authors find it difficult to determine whether their contributions comply with them, and reviewers cannot cope with the ever-growing number of articles pending review. Great efforts have been invested in algorithmic methods for automatically classifying Wikipedia articles (as featured or non-featured) and for detecting quality flaws. However, little has been done to support quality assessment of user-generated content through interactive tools that combine automatic methods and human intelligence. We developed WikiLyzer, a Web toolkit comprising three interactive applications designed to assist (i) knowledge discovery experts in creating and testing metrics for quality measurement, (ii) Wikipedia users searching for good articles, and (iii) Wikipedia authors who need to identify weaknesses in order to improve a particular article. A design study sheds light on how experts could create complex quality metrics with our tool, while a user study reports on its usefulness for identifying high-quality content.
Information-seeking tasks with learning or investigative purposes are usually referred to as exploratory search. Exploratory search unfolds as a dynamic process in which the user, amid navigation, trial and error, and on-the-fly selections, gathers and organizes information (resources). A range of innovative interfaces with increased user control have been developed to support the exploratory search process. In this work, we present our attempt to increase the power of exploratory search interfaces by using ideas from social search, i.e., leveraging information left by past users of information systems. Social search technologies are highly popular nowadays, especially for improving ranking. However, current approaches to social ranking do not allow users to decide to what extent social information should be taken into account when ranking results. This paper presents an interface that integrates social search functionality into an exploratory search system in a user-controlled way that is consistent with the nature of exploratory search. The interface incorporates control features that allow the user to (i) express information needs by selecting keywords and (ii) express preferences for incorporating social wisdom based on tag matching and user similarity. The interface promotes search transparency through color-coded stacked bars and rich tooltips. This work presents the full series of evaluations conducted to, first, assess the value of the social models, in terms of objective and perceived accuracy, in contexts independent of the user interface. Then, in a study with the full-fledged system, we investigated system accuracy and subjective aspects with a structural model, which revealed that when users actively interacted with all of its control features, the hybrid system outperformed a baseline content-based-only tool and users were more satisfied.
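A user-controlled blend of content-based and social relevance can be illustrated with a weighted linear combination. This is a sketch under stated assumptions, not the paper's model: the weights `w_keywords`, `w_tags`, and `w_users` are hypothetical stand-ins for the interface's control features, and each per-source score is assumed to be pre-normalized to [0, 1]. Returning each source's contribution also shows how a color-coded stacked bar could be drawn from the same numbers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Item:
    name: str
    keyword_score: float  # content-based relevance to selected keywords, in [0, 1]
    tag_score: float      # social: tag matching with past users, in [0, 1]
    peer_score: float     # social: relevance among similar users, in [0, 1]


def rank(items: List[Item], w_keywords: float, w_tags: float,
         w_users: float) -> List[Tuple[str, float, Dict[str, float]]]:
    """Rank items by a user-weighted sum, keeping each source's contribution
    so it can be rendered as one segment of a stacked bar."""
    total_w = w_keywords + w_tags + w_users
    if total_w <= 0:
        raise ValueError("at least one weight must be positive")
    ranked = []
    for it in items:
        parts = {
            "keywords": w_keywords * it.keyword_score / total_w,
            "tags": w_tags * it.tag_score / total_w,
            "users": w_users * it.peer_score / total_w,
        }
        ranked.append((it.name, sum(parts.values()), parts))
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked


items = [Item("paper-a", 0.9, 0.1, 0.2), Item("paper-b", 0.4, 0.8, 0.9)]
# Content only: paper-a wins. Turning up the social controls flips the order.
print([r[0] for r in rank(items, 1.0, 0.0, 0.0)])  # ['paper-a', 'paper-b']
print([r[0] for r in rank(items, 1.0, 1.0, 1.0)])  # ['paper-b', 'paper-a']
```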
In this paper, we leverage a previously developed data-driven approach to design novel privacy-setting interfaces for users of household IoT devices. The essence of this approach is to gather users' feedback on household IoT scenarios before developing the interface. This allows us to create a navigational structure that preemptively maximizes users' efficiency in expressing their privacy preferences, and to develop a series of 'privacy profiles' that allow users to express a complex set of privacy preferences with a single click of a button. We expand upon the existing approach by proposing a more sophisticated translation of statistical results into interface design, and by extensively discussing and analyzing the trade-off between user-model parsimony and accuracy when developing privacy profiles and default settings.
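The parsimony/accuracy trade-off can be made concrete by scoring a candidate set of profiles against users' recorded preferences. In this sketch, which is an assumption and not the paper's procedure, each user's preferences and each profile are binary allow/deny vectors over the same scenarios; each user adopts the profile that agrees with them most, and accuracy is the average fraction of matched settings. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence

Prefs = Sequence[int]  # 1 = allow, 0 = deny, one entry per IoT scenario


def agreement(a: Prefs, b: Prefs) -> float:
    """Fraction of scenarios on which two allow/deny vectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def profile_accuracy(users: List[Prefs], profiles: List[Prefs]) -> float:
    """Average agreement when each user picks their best-matching profile
    (the 'single click'); adding a profile to the set can only raise this."""
    return sum(max(agreement(u, p) for p in profiles) for u in users) / len(users)


users = [
    (1, 1, 0, 0),
    (1, 1, 1, 0),
    (0, 0, 0, 0),
    (0, 0, 0, 1),
]
one_default = [(1, 1, 0, 0)]                 # most parsimonious: a single default
two_profiles = [(1, 1, 0, 0), (0, 0, 0, 0)]  # less parsimonious, more accurate
print(profile_accuracy(users, one_default))   # 0.625
print(profile_accuracy(users, two_profiles))  # 0.875
```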
Creating scripts for tasks that involve manipulating GUIs is hard, even for programmers, because of limitations in accessing and interacting with application widgets. For non-programming users, creating scripts for such tasks has seemed impossible. To that end, we developed HILC (Help, It Looks Confusing), a prototype system that learns by demonstration. Users train HILC to synthesize a task script by demonstrating the task, which produces the needed screenshots and their corresponding mouse and keyboard signals. After the demonstration, the user answers follow-up questions. We propose a user-in-the-loop framework that learns to generate scripts of actions performed on visible elements of graphical applications. While pure programming-by-demonstration is still unrealistic, we use quantitative and qualitative experiments to show that non-programming users are willing to answer, and effective at answering, the follow-up queries posed by our system. Our models of events and appearance are surprisingly simple, but they are combined effectively to cope with varying amounts of supervision. The best available baseline, Sikuli Slides, struggled with the majority of the tests in our user study experiments. The prototype with our proposed approach successfully helped users accomplish simple linear tasks, complicated tasks (monitoring, looping, and mixed), and tasks that span multiple executables. Even when both systems could ultimately perform a task, ours was trained and refined by the user in less time.
Understanding a target audience's emotional responses to video advertisements is crucial to stakeholders. However, traditional methods for collecting such information are slow, expensive, and coarse-grained. We propose AttentiveVideo, an intelligent mobile interface with corresponding inference algorithms to monitor and quantify the effects of mobile video advertising in real time. AttentiveVideo employs a combination of implicit photoplethysmography (PPG) sensing and facial expression analysis (FEA) to predict viewers' attention, engagement, and sentiment when watching video advertisements on unmodified smartphones. In a 24-participant study, AttentiveVideo achieved good accuracy on a wide range of emotional measures (best accuracy = 73.4%, kappa = 0.46 across 9 measures). We also found that the PPG sensing channel and the FEA technique are complementary in both prediction accuracy and signal availability. These findings show the potential for both low-cost collection and deep understanding of emotional responses to mobile video advertisements.
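Because the results are reported as both accuracy and Cohen's kappa, it may help to see how kappa discounts agreement that would occur by chance. The sketch below computes kappa for two label sequences (for example, predicted versus self-reported engagement); the labels are invented for illustration and are not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each side's label frequencies."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("sequences must be non-empty and equally long")
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1:
        return 1.0  # degenerate case: both sides use one identical label
    return (p_o - p_e) / (1 - p_e)


truth = ["high", "high", "low", "low"]
pred = ["high", "low", "low", "low"]
# 75% raw agreement, but half of that is expected by chance.
print(cohens_kappa(truth, pred))  # 0.5
```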
In domains where users are exposed to large variations in visuo-spatial features among designs, they often spend excess time searching for common elements (features) on an interface. This article contributes individualised predictive models of visual search, and a computational approach to restructure layouts such that features on a new, unvisited interface can be found more quickly. We explore four principles, inspired by the human visual system (HVS), to predict expected positions of features and create individualised templates: (I) the interface with the highest frequency is chosen as the template; (II) the interface with the highest predicted recall probability (serial position curve) is chosen as the template; (III) the most probable locations for features across interfaces are chosen (visual statistical learning) to generate the template; (IV) based on a generative cognitive model, the most likely visual search locations for features are chosen (visual sampling modelling) to generate the template. Given a history of previously seen interfaces, we restructure the spatial layout of a new (unseen) interface with the goal of making its features more easily findable. The four HVS principles are implemented in Familiariser, a browser-based tool that automatically restructures webpage layouts based on the visual history of the user. Evaluation with users provides initial evidence in favour of our approach.
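Principles (I) and (III) are simple enough to sketch directly. Here each previously visited interface is assumed to be a mapping from a feature name to a coarse grid cell; this representation, the function names, and the toy history are hypothetical. Principle I returns the most frequently seen layout, while principle III assembles a template from each feature's most frequent cell across the whole history.

```python
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[int, int]    # hypothetical coarse grid position (row, col)
Layout = Dict[str, Cell]  # feature name -> cell on one visited interface


def template_most_frequent(history: List[Layout]) -> Layout:
    """Principle I: the layout seen most often becomes the template."""
    keyed = Counter(tuple(sorted(layout.items())) for layout in history)
    best, _ = keyed.most_common(1)[0]
    return dict(best)


def template_statistical(history: List[Layout]) -> Layout:
    """Principle III: for each feature, pick the cell where it appeared most
    often across all visited interfaces (visual statistical learning)."""
    per_feature: Dict[str, Counter] = {}
    for layout in history:
        for feature, cell in layout.items():
            per_feature.setdefault(feature, Counter())[cell] += 1
    return {f: counts.most_common(1)[0][0] for f, counts in per_feature.items()}


history: List[Layout] = [
    {"search": (0, 2), "login": (0, 3)},
    {"search": (0, 2), "login": (0, 3)},
    {"search": (0, 1), "login": (2, 3), "cart": (1, 2)},
    {"search": (0, 1), "login": (1, 3), "cart": (1, 2)},
    {"search": (0, 1), "login": (0, 3), "cart": (2, 0)},
]
print(template_most_frequent(history))  # {'login': (0, 3), 'search': (0, 2)}
print(template_statistical(history))    # {'search': (0, 1), 'login': (0, 3), 'cart': (1, 2)}
```

The two templates differ: principle III can produce a layout the user never actually saw, whereas principle I always reuses a real one.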
Recent research has shown that reliable recognition of sign language words and phrases using user-friendly and non-invasive armbands is feasible and desirable. This work provides an analysis and implementation of fingerspelling recognition (FR) in such systems, which is a much harder problem due to the lack of distinctive hand movements. A novel algorithm called DyFAV (Dynamic Feature Selection and Voting) is proposed for this purpose; it exploits the fact that fingerspelling has a finite corpus (26 letters for ASL). A detailed analysis of the algorithm, as well as comparisons with other traditional machine learning algorithms, is provided. The system uses an independent multiple-agent voting approach to identify letters with high accuracy. Because the agents vote independently, the algorithm is highly parallelizable, and recognition times can thus be kept low enough for real-time mobile applications. A thorough explanation and analysis is presented of results obtained on the ASL alphabet corpus for 9 people with limited training. An average recognition accuracy of 95.36% is reported and compared with recognition results from other machine learning techniques. This result is extended by including 6 new users, with data collected under settings similar to those of the previous dataset. Furthermore, a feature selection scheme using a subset of the sensors is proposed and its results are evaluated. The mobile, non-invasive, and real-time nature of the technology is demonstrated by evaluating performance on various types of Android phones and remote server configurations. A brief discussion of the UI is provided, along with guidelines for best practices.
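The general pattern of independent agents voting on a letter can be sketched as follows; this is not DyFAV itself. Each hypothetical agent looks only at its own subset of sensor features and votes for the letter whose stored template is nearest on those features, and the letter with the most votes wins. Because agents share no state, each vote could be computed in parallel, which is the property the abstract highlights. The templates, feature indices, and function names are assumptions for illustration.

```python
from collections import Counter
from math import dist
from typing import Dict, List, Sequence

Templates = Dict[str, Sequence[float]]  # letter -> mean feature vector (hypothetical)


def agent_vote(sample: Sequence[float], templates: Templates,
               feature_idx: Sequence[int]) -> str:
    """One agent: nearest template using only its assigned features."""
    def project(v: Sequence[float]) -> List[float]:
        return [v[i] for i in feature_idx]
    return min(templates, key=lambda letter: dist(project(sample), project(templates[letter])))


def recognize(sample: Sequence[float], templates: Templates,
              agents: List[Sequence[int]]) -> str:
    """Majority vote over independent agents (each vote could run in parallel)."""
    votes = Counter(agent_vote(sample, templates, idx) for idx in agents)
    return votes.most_common(1)[0][0]


templates: Templates = {"A": (0.0, 0.0, 0.0), "B": (1.0, 1.0, 1.0), "C": (0.0, 1.0, 0.0)}
agents = [(0, 1), (1, 2), (0, 2)]
sample = (0.9, 0.9, 0.2)
# The agent on features (1, 2) votes "C", but the other two outvote it.
print(recognize(sample, templates, agents))  # B
```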
Deep learning has emerged as a powerful tool for feature-driven labeling of datasets. However, to be effective, it requires a large and finely labeled training dataset. Precisely labeling a large training dataset is expensive, time-consuming, and error-prone. In this paper, we present a visually driven deep learning approach that starts with a coarsely labeled training dataset and iteratively refines the labeling through intuitive interactions that leverage the latent structures of the dataset. Our approach can be used to (a) alleviate, through simple visual interactions, the burden of the intensive manual labeling needed to capture the fine nuances in a high-dimensional dataset, (b) replace a complicated (and therefore difficult to design) labeling algorithm with a simpler (but coarse) labeling algorithm supplemented by user interaction to refine the labeling, or (c) use low-dimensional features (such as RGB colors) for coarse labeling and turn to higher-dimensional (hyperspectral) latent structures, progressively revealed by deep learning, for fine labeling. We validate our approach through use cases on three high-dimensional datasets.
Document clustering is a necessary step in various analytical and automated activities. When guided by the user, algorithms are tailored to imprint a perspective on the clustering process that reflects the user's understanding of the data set. While contributing his or her perspective, the user also acquires a deeper understanding of the data set. To incorporate the user's perspective in the clustering process and, at the same time, effectively visualize document collections to enhance the user's sense-making of the data, we propose a novel visual analytics system for interactive document clustering. We built our system on top of clustering algorithms that can adapt to user feedback. In the proposed system, an initial clustering is created based on the user-defined number of clusters and the selected clustering algorithm. A set of coordinated visualizations allows the examination of the data set and the results of the clustering. The visualization highlights individual documents for the user and helps them understand how the documents evolve over the time period to which they relate. Users then interact with the process by changing the key-terms that drive it, according to their knowledge of the documents' domain. In key-term-based interaction, the user assigns a set of key-terms to each target cluster to guide the clustering algorithm. We improved the clustering process with a novel algorithm for choosing seeds for the clustering algorithm. The results demonstrate that the system considerably improves not only its precision but also its effectiveness in document-based decision making. A set of quantitative experiments and a user study were conducted to show the advantages of the approach for document analytics based on clustering. We also report on the use of the framework in a real decision-making scenario that relates users' email discussions to decisions on improving patient care.
Results show that the framework is useful even for more complex data sets such as email conversations.
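Key-term guidance can be illustrated by using the user's key-terms to seed a clustering algorithm. The sketch below is an assumption, not the paper's seed-selection algorithm: documents are bags of words, each cluster's initial centroid is the average term-frequency vector of the documents containing any of that cluster's key-terms, and documents are then reassigned by cosine similarity over a few k-means-style passes. All names and the sample documents are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors: List[Dict[str, float]]) -> Dict[str, float]:
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()} if vectors else {}


def keyterm_clustering(docs: List[str], key_terms: List[List[str]],
                       iterations: int = 5) -> List[int]:
    """Seed one centroid per cluster from documents containing its key-terms,
    then refine by assigning each document to its most similar centroid."""
    vecs = [dict(Counter(d.lower().split())) for d in docs]
    centroids = [mean_vector([v for v in vecs if any(t in v for t in terms)])
                 for terms in key_terms]
    labels: List[int] = []
    for _ in range(iterations):
        labels = [max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
                  for v in vecs]
        # Recompute centroids; keep the old one if a cluster became empty.
        centroids = [mean_vector([v for v, lab in zip(vecs, labels) if lab == c])
                     or centroids[c] for c in range(len(centroids))]
    return labels


docs = [
    "patient discharge delayed by pharmacy",
    "pharmacy stock of insulin low",
    "nurse staffing for night shift",
    "night shift nurse overtime request",
]
print(keyterm_clustering(docs, [["pharmacy"], ["nurse"]]))  # [0, 0, 1, 1]
```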
This work presents an extension of the Thompson Sampling bandit policy for orchestrating a collection of base recommendation algorithms for e-commerce. We focus on the problem of item-to-item recommendations, for which multiple behavioral and attribute-based predictors are provided to an ensemble learner. In addition, we detail the construction of a personalized predictor based on k-Nearest Neighbors (kNN), with temporal decay capabilities and event weighting. We show how to adapt Thompson Sampling to realistic situations in which neither action availability nor reward stationarity is guaranteed. Furthermore, we investigate the effects of priming the sampler with preset parameters of reward probability distributions derived from the product catalog and/or event history, when such information is available. We report experimental results based on the analysis of three real-world e-commerce datasets.
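A Beta-Bernoulli Thompson Sampling orchestrator that copes with the complications named above can be sketched as follows. This is an illustrative assumption, not the authors' extension: unavailable base recommenders are simply excluded from each draw, non-stationarity is handled by exponentially shrinking evidence toward each arm's prior with a hypothetical `decay` factor, and priming is modeled as per-arm (alpha, beta) pseudo-counts. The class name and the simulated click rates are also hypothetical.

```python
import random
from typing import Dict, Iterable, List, Optional, Tuple


class DiscountedThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over base recommenders ("arms")."""

    def __init__(self, arms: Iterable[str], decay: float = 0.99,
                 priors: Optional[Dict[str, Tuple[float, float]]] = None,
                 seed: Optional[int] = None) -> None:
        priors = priors or {}
        # Priming: per-arm (alpha, beta) pseudo-counts; default is uniform Beta(1, 1).
        self.prior: Dict[str, Tuple[float, float]] = {
            a: priors.get(a, (1.0, 1.0)) for a in arms}
        self.params: Dict[str, List[float]] = {a: list(p) for a, p in self.prior.items()}
        self.decay = decay
        self.rng = random.Random(seed)

    def select(self, available: Iterable[str]) -> str:
        """Draw a plausible success rate for each currently available arm and
        return the arm with the highest draw; unavailable arms are skipped."""
        candidates = [a for a in available if a in self.params]
        if not candidates:
            raise ValueError("no known arm is available")
        return max(candidates, key=lambda a: self.rng.betavariate(*self.params[a]))

    def update(self, arm: str, reward: int) -> None:
        """Shrink all evidence toward the priors (forgetting stale rewards),
        then add the new Bernoulli outcome, e.g. 1 = recommendation clicked."""
        for a, p in self.params.items():
            a0, b0 = self.prior[a]
            p[0] = a0 + self.decay * (p[0] - a0)
            p[1] = b0 + self.decay * (p[1] - b0)
        self.params[arm][0] += reward
        self.params[arm][1] += 1 - reward


sampler = DiscountedThompsonSampler(["knn", "popular"], decay=0.99, seed=7)
world = random.Random(42)
true_ctr = {"knn": 0.8, "popular": 0.2}  # made-up click-through rates
picks = {"knn": 0, "popular": 0}
for _ in range(500):
    arm = sampler.select(["knn", "popular"])
    picks[arm] += 1
    sampler.update(arm, int(world.random() < true_ctr[arm]))
print(picks)  # "knn" should be chosen far more often
```

Shrinking toward the prior rather than toward Beta(1, 1) keeps primed knowledge from being forgotten while still letting recent rewards dominate; whether that matches the paper's treatment is not stated in the abstract.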
The beyond-relevance objectives of recommender systems have been drawing increasing attention. For example, a diversity-enhanced interface has been shown to be positively associated with overall levels of user satisfaction. However, little is known about how users adopt diversity-enhanced interfaces to accomplish various real-world tasks. In this paper, we present two attempts at creating a visual diversity-enhanced interface that presents recommendations beyond a simple ranked list. Our goal was to design a recommender system interface that helps users explore the different relevance prospects of recommended items in parallel and that stresses their diversity. Two within-subject user studies in the context of social recommendation at academic conferences were conducted to compare our visual interfaces. Results from our user studies show that the visual interfaces significantly reduced the exploration effort required for given tasks and helped users perceive the diversity of the recommendations. We show that users examined a diverse set of recommended items while experiencing an improvement in overall user satisfaction. Also, the users' subjective evaluations show significant improvement in many user-centric metrics. We discuss experiences that shed light on avenues for future interface designs.
Recent trends in computer-mediated communication (CMC) have not only led to expanded instant messaging through the use of images and videos, but have also enriched traditional text messaging with visual communication markers (VCMs) such as emoticons, emojis, and stickers. VCMs could help prevent the potential loss, in CMC, of the subtle emotional content that conversation normally delivers through nonverbal cues conveying affective information. However, as the number of VCMs in the selection set grows, the problem of VCM entry needs to be addressed. Furthermore, conventional means of accessing VCMs continue to rely on input methods that are not directly and intimately tied to expressive nonverbal cues. In this work, we aim to address this issue by facilitating an alternative form of VCM entry: hand gestures. To that end, we propose a user-defined hand gesture set that is highly representative of a number of VCMs, and a two-stage hand gesture recognition system (trajectory-based and shape-based) that can identify these user-defined hand gestures with an accuracy of 82%. By developing such a system, we aim to allow people using low-bandwidth forms of CMC to still enjoy their convenient and discreet properties, while also experiencing more of the intimacy and expressiveness of higher-bandwidth online communication.
We present an intelligent virtual interviewer that engages with a user in a text-based conversation and automatically infers the user's personality traits. We investigate how the personality of a virtual interviewer, as well as the personality of a user inferred from a virtual interview, influences the user's trust in the virtual interviewer from two perspectives: the user's willingness to confide in, and listen to, a virtual interviewer. We have developed two virtual interviewers with distinct personalities and deployed them in a series of real-world events. We present findings from four real-world deployments with completed interviews of 1,280 users, including 606 actual job applicants. Notably, users are more willing to confide in and listen to a virtual interviewer with a serious, assertive personality in a high-stakes job interview. Moreover, users' personality traits, inferred from their chat text, along with the interview context, influence their perception of a virtual interviewer and their willingness to confide in and listen to it. Finally, we discuss the implications of our work for building hyper-personalized, intelligent agents based on user traits.
Even without speech recognition errors, robots may encounter difficulties interpreting natural language instructions. We report on a research framework developed for robustly handling miscommunication between people and robots in task-oriented spoken dialogue. We describe TeamTalk, a conversational interface to situated agents like robots that incorporates detection and recovery from the situated grounding problems of referential ambiguity and impossible actions. The current work investigates algorithms for spatial reasoning and nearest-neighbor learning to decide on recovery strategies that a virtual robot should use in different contexts, and evaluates this approach in a longitudinal study over six sessions for each of six participants. When the robot encounters a grounding problem, it looks back on its interaction history to consider how it resolved similar situations. The learning algorithm was trained initially on crowdsourced data but was supplemented by interactions from the study. We compare results collected with user-specific and general models, with user-specific models performing best on measures of dialogue efficiency. The overall contribution is an approach to incorporating additional information from situated context, namely a robot's path planner and its surroundings, to detect and recover from miscommunication using dialogue.
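The idea of looking back at interaction history can be sketched as k-nearest-neighbor retrieval over previously resolved grounding problems. The situation features (number of candidate referents, whether the path planner found a route, distance to the nearest candidate) and the strategy names below are hypothetical placeholders, not TeamTalk's actual feature set; the sketch returns the strategy used most often among the k most similar past situations.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

# (situation features, recovery strategy that resolved it): hypothetical schema
History = List[Tuple[Sequence[float], str]]


def choose_strategy(situation: Sequence[float], history: History, k: int = 3) -> str:
    """k-nearest-neighbor vote over previously resolved situations."""
    if not history:
        raise ValueError("empty interaction history")
    nearest = sorted(history, key=lambda h: dist(situation, h[0]))[:k]
    votes = Counter(strategy for _, strategy in nearest)
    return votes.most_common(1)[0][0]


history: History = [
    ((2, 1, 0.5), "ask-which-one"),        # two candidate referents, route exists
    ((3, 1, 0.4), "ask-which-one"),
    ((2, 1, 0.6), "ask-which-one"),
    ((1, 0, 2.0), "propose-alternative"),  # one referent, but no route
    ((1, 0, 1.5), "propose-alternative"),
]
print(choose_strategy((2, 1, 0.55), history))  # ask-which-one
print(choose_strategy((1, 0, 1.8), history))   # propose-alternative
```

Passing only one participant's history versus the pooled history would loosely mirror the user-specific versus general models compared in the study; in practice, features on different scales would also need normalizing before computing distances.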
When building a classifier in interactive machine learning (iML), human knowledge about the target class can be a powerful reference to make the classifier robust to unseen items. The main challenge lies in finding unlabeled items that can either help discover or refine concepts for which the current classifier has no corresponding features (i.e., it has feature blindness). Yet it is unrealistic to ask humans to come up with an exhaustive list of items, especially for rare concepts that are hard to recall. This article presents AnchorViz, an interactive visualization that facilitates the discovery of prediction errors and previously unseen concepts through human-driven semantic data exploration. By creating example-based or dictionary-based anchors representing concepts, users create a topology that (a) spreads data based on their similarity to the concepts, and (b) surfaces the prediction and label inconsistencies between data points that are semantically related. Once such inconsistencies and errors are discovered, users can encode the new information as labels or features, and interact with the retrained classifier to validate their actions in an iterative loop. We evaluated AnchorViz through two user studies. Our results show that AnchorViz helps users discover more prediction errors than stratified random and uncertainty sampling methods. Furthermore, during the beginning stages of a training task, an iML tool with AnchorViz can help users build classifiers comparable to the ones built with the same tool with uncertainty sampling and keyword search, but with fewer labels and more generalizable features. We discuss exploration strategies observed during the two studies and how AnchorViz supports discovering, labeling, and refining of concepts through a sensemaking loop.
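One way to picture how anchors "spread data based on their similarity to the concepts" is a RadViz-style layout: anchors sit evenly on a circle, and each item is placed at the similarity-weighted average of the anchor positions. This layout is an assumption for illustration, not necessarily AnchorViz's exact projection, and the similarity values are supplied directly here, whereas a real tool would compute them from example-based or dictionary-based anchors.

```python
from math import cos, pi, sin
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def anchor_positions(n_anchors: int, radius: float = 1.0) -> List[Point]:
    """Place anchors evenly around a circle."""
    return [(radius * cos(2 * pi * i / n_anchors), radius * sin(2 * pi * i / n_anchors))
            for i in range(n_anchors)]


def place_item(similarities: Sequence[float], anchors: Sequence[Point]) -> Point:
    """Similarity-weighted centroid of anchor positions; an item unrelated to
    every anchor (all similarities zero) stays at the center."""
    total = sum(similarities)
    if total == 0:
        return (0.0, 0.0)
    x = sum(s * ax for s, (ax, _) in zip(similarities, anchors)) / total
    y = sum(s * ay for s, (_, ay) in zip(similarities, anchors)) / total
    return (x, y)


anchors = anchor_positions(2)            # e.g. one concept anchor on each side
print(place_item([1.0, 0.0], anchors))   # at (or very near) the first anchor, (1, 0)
print(place_item([0.0, 0.0], anchors))   # (0.0, 0.0)
```

Under this layout, items similar to the same concept land near each other, so an item whose label or prediction disagrees with its neighbors becomes visually conspicuous.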
Drone navigation in complex environments poses many problems for teleoperators. Especially in 3D structures such as buildings or tunnels, viewpoints are often limited to the drone's current camera view, nearby objects can be collision hazards, and frequent occlusion can hinder accurate manipulation. To address these issues, we have developed a novel teleoperation interface that provides the user with environment-adaptive viewpoints, automatically configured to improve safety and smooth user operation. This real-time adaptive viewpoint system takes the robot's position, orientation, and 3D point-cloud information into account to modify the user's viewpoint so as to maximize visibility. Our prototype uses simultaneous localization and mapping (SLAM)-based reconstruction with an omnidirectional camera, and we use the resulting models, as well as simulations, in a series of preliminary experiments testing navigation of various structures. Results suggest that automatic viewpoint generation can outperform first- and third-person view interfaces for virtual teleoperators in terms of ease of control and accuracy of robot operation.
Social Signal Processing techniques have made it possible to analyze human behavior in social face-to-face interactions in depth. With recent advancements, these techniques can now also be used to augment social interactions, especially human behavior in oral presentations. The goal of this paper is to train a computational model able to provide relevant feedback to a public speaker concerning their coverbal communication. The role of this model is thus to augment the social intelligence of the orator and, in turn, the relevance of the presentation. To this end, we present an original interaction setting in which the speaker is equipped only with wearable devices. Several coverbal modalities were extracted and automatically annotated, namely speech volume, intonation, speech rate, eye gaze, hand gestures, and body movements. Participants were sent an offline report containing their performance scores for all modalities. In addition, a post-experiment study was conducted to collect participants' opinions on many aspects of the studied interaction, and the results were rather positive. Moreover, we annotated recommended feedback for each presentation session, and to retrieve these annotations, a Dynamic Bayesian Network model was trained using the multimodal performance scores as inputs. We will show that our assessment behavior models.
Discovering the correlations among variables of air quality data is challenging because the correlation time series are long-lasting, multi-faceted, and information-sparse. In this paper, we propose a novel visual representation, called the Time-Correlation Partitioning (TCP) tree, that compactly characterizes correlations among multiple air quality variables and their evolution. A TCP tree is generated by partitioning the information-theoretic correlation time series into pieces with respect to the variable hierarchy and temporal variations, and reorganizing these pieces into a hierarchically nested structure. Visual exploration of a TCP tree provides a sparse data traversal of the correlation variations and a situation-aware analysis of correlations among variables. This can help meteorologists better understand the correlations among air quality variables. We demonstrate the efficiency of our approach in a real-world air quality investigation scenario.
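An "information-theoretic correlation time series" can be illustrated by computing the mutual information between two discretized variables over consecutive windows, then cutting the resulting series wherever it jumps by more than a tolerance. Both the windowing and the cutting rule are simplifying assumptions for illustration, not the TCP-tree construction, which also partitions with respect to the variable hierarchy. The variable names and values are invented.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence, Tuple


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())


def mi_series(xs: Sequence[Hashable], ys: Sequence[Hashable], window: int) -> List[float]:
    """Mutual information over consecutive, non-overlapping windows."""
    return [mutual_information(xs[i:i + window], ys[i:i + window])
            for i in range(0, len(xs) - window + 1, window)]


def partition(series: Sequence[float], tol: float) -> List[Tuple[int, int]]:
    """Split into [start, end) pieces; cut where consecutive values differ by > tol."""
    pieces, start = [], 0
    for i in range(1, len(series)):
        if abs(series[i] - series[i - 1]) > tol:
            pieces.append((start, i))
            start = i
    pieces.append((start, len(series)))
    return pieces


# Discretized PM2.5 and humidity: coupled in the first half, unrelated in the second.
pm25 = ["lo", "hi"] * 4 + ["lo", "lo", "hi", "hi"] * 2
humidity = ["lo", "hi"] * 8
series = mi_series(pm25, humidity, window=4)
print(series)                  # [1.0, 1.0, 0.0, 0.0]
print(partition(series, 0.2))  # [(0, 2), (2, 4)]
```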
The explanation interface has been recognized as important in recommender systems because it can allow users to better judge the relevance of recommendations to their preferences and hence make more informed decisions. In different product domains, the specific purpose of explanation can differ. For high-investment products (e.g., digital cameras, laptops), how to educate typical new buyers about product knowledge, and consequently improve their preference certainty and decision quality, is crucial. With this objective, we have developed a novel tradeoff-oriented explanation interface that takes into account sentiment features extracted from product reviews to generate recommendations and explanations in a category structure. In this manuscript, we report two user studies conducted on this interface. The first is an online user study (in both before-after and within-subjects setups) that compared our prototype system with a traditional one that considers only static specifications for explanation. The experimental results reveal that adding sentiment-based explanations can help increase users' product knowledge, preference certainty, perceived information usefulness, perceived recommendation transparency and quality, and purchase intention. Inspired by those findings, we performed a follow-up eye-tracking lab experiment to investigate in depth how users view information on the interface. This study shows that integrating sentiment features with static specifications in the tradeoff-oriented explanations prompted users not only to view more recommendations from various categories, but also to spend longer reading explanations. The results also suggest users' inherent need for sentiment features during product evaluation and decision making. Finally, we discuss the work's practical implications from three major aspects, i.e., new users, category interface, and explanation purpose.
Activity recognition is a core component of many intelligent and context-aware systems. We present a solution for discreetly and unobtrusively recognizing common work activities above a work surface without using cameras. We demonstrate our approach, which uses an RF-radar sensor mounted under the work surface, in three domains: recognizing work activities at a convenience-store counter, recognizing common office deskwork activities, and estimating the position of customers in a showroom environment. Our examples illustrate potential benefits for both post-hoc business analytics and real-time applications. Our solution was able to classify seven clerk activities with 94.9% accuracy using data collected in a lab environment, and to recognize six common deskwork activities collected in real offices with 95.3% accuracy. Using two sensors simultaneously, we demonstrate coarse position estimation around a large surface with 95.4% accuracy. We show that using multiple projections of the RF signal leads to improved recognition accuracy. Finally, we show how smartwatches worn by users can be used to attribute an activity, recognized with the RF sensor, to a particular user in multi-user scenarios. We believe our solution can mitigate some of the privacy concerns users associate with cameras and is useful for a wide range of intelligent systems.
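The abstract does not detail how smartwatch data attributes an RF-recognized activity to a user. One plausible approach, sketched here purely as an assumption, is to correlate each watch's motion-intensity trace with the RF sensor's activity-intensity trace over the same time window and pick the best-correlated wearer. The function names, user names, and signals are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either trace is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sqrt(sum((x - ma) ** 2 for x in a))
    vb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0


def attribute_activity(rf_trace: Sequence[float],
                       watch_traces: Dict[str, Sequence[float]]) -> str:
    """Return the wearer whose wrist motion best tracks the RF activity signal."""
    return max(watch_traces, key=lambda user: pearson(rf_trace, watch_traces[user]))


rf = [0.1, 0.9, 0.8, 0.1, 0.7, 0.2]             # activity bursts seen by the radar
watches = {
    "clerk-1": [0.2, 0.8, 0.9, 0.2, 0.6, 0.1],  # moves in step with the bursts
    "clerk-2": [0.5, 0.4, 0.5, 0.6, 0.4, 0.5],  # mostly idle
}
print(attribute_activity(rf, watches))  # clerk-1
```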