
A Comparative Study of
Virtual Footwear Try-On in
Virtual and Augmented Reality
A user experience evaluation
of AR and VR footwear try-on systems and a real environment
Master's Thesis

OUTLINE
Introduction
Literature Review
Research Framework
Experimental Method
Results and Discussion
Conclusion

Abstract
Virtual reality and augmented reality (VR/AR) have been applied to virtual try-on of various products. Enabled by depth sensing technologies, commercial virtual try-on systems allow users to virtually wear apparel, footwear, and other wearable items. Very few studies have examined the ergonomic issues of current systems or evaluated how effectively the systems address them. To fill this gap, this research measures and compares the presence, usability, and user experience of three different try-on methods: real, VR, and AR. A group of subjects conducted a shoe try-on experiment, in which they evaluated three pairs of sneakers and determined which one they preferred. During the experiment, each subject’s gaze trajectory was recorded using an eye tracker. After the experiment, they completed a questionnaire and participated in a think-aloud procedure to express their thoughts about the try-on processes. The eye movement data show that the user’s visual attention in the real and AR try-on mainly concentrated on the shoes, whereas the gaze in the VR try-on shifted more toward the avatar’s legs. Real try-on shows significantly higher spatial presence and naturalness, and a lower negative emotional response, than the VR try-on. In addition, the naturalness of the AR try-on was higher than that of VR. The real environment received significantly higher experience scores, whereas no significant difference existed between the AR and VR try-on systems. The analysis also reveals the factors that negatively affect the user experience in the AR and VR try-on processes. We conclude that virtual try-on technologies are useful and complementary to actual try-on, as long as they provide satisfactory presence, usability, and user experience.
Introduction

Research Motivation
Buying shoes online is now a very familiar shopping method for the public, but users cannot immediately view the fitting effect through online shopping.
There is sometimes a gap between user expectation and perception when receiving the actual product.


Online Shopping :
Under Expectation


Traditional Method :
Limited Choices,
Time Consuming
Therefore, many people are still accustomed to buying shoes in physical stores. Although this eliminates the concerns about online shopping mentioned above, the process of repeatedly putting on and taking off shoes is time-consuming and tiring.
Repeatedly asking the clerk for a different shoe size can also strain the interaction with the clerk.


Lack of
Usability Evaluation
User Experience Evaluation
Rapid Development of
Virtual Try-on Technology
In the last decade, rapid technological progress and ever-increasing market competition have led to advances in the development of virtual try-on (VTO) technology in the fashion retail industry. This highly interactive technology is expected to create a novel shopping experience by allowing users to virtually wear garments, footwear, and facial accessories.
However, research on virtual footwear try-on is in the preliminary stage compared to that of garments or facial accessories. To the best of our knowledge, no study has analyzed users’ affective responses while evaluating footwear designs using VTO technology as well as their perceived differences from an actual try-on.
This gap motivated the present research. I want to explore whether users’ affective responses and user experience in the two virtual try-on systems are close to those in traditional shoe try-on. To fill this research gap, I conducted an experimental study that compares AR and VR applications designed for VTO with try-on in a real environment, using both physiological and psychological measures.
Literature Review

User Study of Virtual Garment Try-On




Yuan et al. conducted a user study to evaluate the perceptions of usability and usefulness across three virtual clothing try-on systems: (1) virtual clothes on an avatar, (2) virtual clothes on the user’s image, and (3) virtual clothes on the avatar blended with the user’s facial image. The study participants preferred the second system because it provided the highest usability, whereas the first system had the lowest usability. They concluded that displaying the user’s physical body is more appealing than using a virtual human model.

Tang et al. investigated sense of presence in AR and VR applications that represent a human user as an avatar. The focus was to determine whether replacing the user’s body with a virtual human model would reduce the perceived presence. In the study, the subjects talked through an HMD to a conversation partner who was a real person in the AR scenario and a virtual agent in the VR scenario. The results indicated that the subjects’ perceptions of spatial presence were significantly higher in the AR scenario.

Virtual Footwear Try-On


Eisert et al. developed a magic mirror that simulates the try-on process of customized shoes in real time. In this case, the user wears a pair of real shoes while using the try-on service, and the 3D positions of both shoes are estimated from the silhouette information from a single camera view captured by a motion tracking system.

Jimeno-Morenilla et al. reported new advances in 3D techniques in cinema, TV, and games applied to virtual footwear try-on. They developed an AR system using high-quality stereoscopic vision technology that allows users to visually check footwear models in real time. The system also allows users to intuitively interact with 3D scenes and manipulate shoe models via 3D gloves.


Affective Assessment in Virtual Environment
Most of the studies investigated whether different interaction means evoked similar affective responses and psychological states in users. An important issue is the users’ sense of presence in interactive environments. The main data collection method was graded questionnaires, and subjective ratings derived from the answers were the most popular dependent measures.


Brade et al. evaluated the sense of presence in a VR environment compared to a real environment in a geocaching game based on a between-subject comparison measured with the ITC-Sense of Presence Inventory (ITC-SOPI). The real environment showed significantly higher naturalness, whereas the virtual environment exhibited higher engagement and lower negative effects. Significant differences in usability measured by the system usability scale (SUS) were also observed across the two environments.
Based on the literature review above, few studies have evaluated users’ affective responses to virtual footwear try-on in VR and AR environments. Prior research has not yet confirmed whether the use of a real person or an avatar creates different user experiences and how these user experiences compare with that of actual try-on.

Research Framework

Figure 1 Research Framework
To address these issues, we conducted an experimental study to identify important factors that influence users’ satisfaction using three footwear try-on methods (VR, AR, and real environment) with both psychological and physiological measures. The focus of this study is not on the comprehensive assessment of virtual try-on in this regard. Our goal is to develop an understanding of the emotional response evoked by virtual footwear try-on methods from the analyses of well-accepted measures (see Figure 1):

Eye-tracking Behavior
Users’ eye movements were recorded using a wearable eye tracker, and the analysis of their gaze behavior revealed visual attention or distraction in each method.

Measurement of Sense of Presence, User Experience
A strong sense of presence—a feeling of ‘being there’ in the scene—is an integral part of VR/AR applications. As a validated tool in previous research, the ITC-SOPI is used to measure perceived presence in interactive virtual environments.

Usability Testing
Usability is another metric that is closely related to the user experience of a product or service [27]. It is particularly important to examine VTO tools, which are highly user-interactive, from this aspect. The SUS questionnaire was chosen for this purpose.

Experimental Method

Research Questions and Hypotheses
Ideally, VTO tools deliver both hedonic (perceived enjoyment) and utilitarian (perceived convenience, ease, and usefulness) values comparable to direct experience with real products. However, previous studies have reported mixed results on the positive impact of AR-based VTO tools on consumer evaluations. The effectiveness of existing VTO solutions remains questionable, and evidence is particularly lacking for footwear try-on. We propose the following research questions and hypotheses.

Research Hypotheses

Eye-tracking Behavior
H1-1: Subjects’ eye fixation rate to “shoes” is higher than for “legs” and “other areas” in all methods.
H1-2: Subjects’ eye fixation rate to “legs” is higher than for “other areas” in all methods.
H2-1: Subjects’ eye fixation rate to “shoes” in AR is higher than that in VR.
H2-2: Subjects’ eye fixation rate to “shoes” in a real environment is higher than that in VR.
H2-3: Subjects’ eye fixation rate to “legs” in VR is higher than that in AR.
H2-4: Subjects’ eye fixation rate to “legs” in VR is higher than that in a real environment.

Sense of Presence
H3-1: The real environment offers a higher sense of presence than the AR try-on.
H3-2: The AR try-on offers a higher sense of presence than VR.

User Experience
H4-1: The higher the level of presence perceived by the user, the higher the experience score.
H4-2: The lesser the impact of negative factors, the higher the degree of sense of presence.
H4-3: The lesser the impact of negative factors, the higher the experience score.

Hypotheses related to eye-tracking behavior
An eye tracker was used to understand the user’s visual attention and/or distraction in each method. The area of interest (AOI) was categorized into three regions: shoes, legs, and other areas. Users normally have the highest proportion of visual attention on shoes in actual try-on. It is speculated that they would concentrate on the shoe models presented with properly designed VTO tools. This leads to the assumption that the percentage of users’ eye fixations on shoes is the highest, and the percentage of eye fixations on their legs is higher than that for other areas. Therefore, we propose the following hypotheses:
H1-1: Subjects’ eye fixation rate to “shoes” is higher than for “legs” and “other areas” in all methods.
H1-2: Subjects’ eye fixation rate to “legs” is higher than for “other areas” in all methods.
However, the eye fixation rates for certain AOIs may differ across the three methods. In the AR try-on, virtual shoes are displayed on the user’s real legs, so the user should focus more on the appearance of the virtual shoes. In contrast, the user’s legs are represented by a virtual human model in the VR try-on. The degree to which the model’s body movement matches that of the real person may also capture the user’s attention during the try-on. Therefore, we propose the following hypotheses:
H2-1: Subjects’ eye fixation rate to “shoes” in AR is higher than that in VR.
H2-2: Subjects’ eye fixation rate to “shoes” in a real environment is higher than that in VR.
H2-3: Subjects’ eye fixation rate to “legs” in VR is higher than that in AR.
H2-4: Subjects’ eye fixation rate to “legs” in VR is higher than that in a real environment.
Hypotheses related to sense of presence
Participants who interact with virtual objects through a virtual body have a higher sense of presence than those using a traditional user interface, such as a computer mouse or keyboard. Waltemate et al. reported that the degree of personalization and individualization of users’ avatars significantly impacts perceived body ownership and presence in the context of social VR. The human model created for the VR try-on in this study was personalized to the user’s body size and appearance, but not to the user’s body movement. This limitation may reduce the degree of naturalness with which users regard the avatars as themselves. Thus, we propose the following hypotheses:
H3-1: The real environment offers a higher sense of presence than the AR try-on.
H3-2: The AR try-on offers a higher sense of presence than VR.
Hypotheses related to user experience
In Section 4.3, we describe potential factors that might negatively affect user satisfaction with the tested AR and VR applications. It is valuable to examine how perceived negative effects influence the overall user experience, as measured by an experience score. Thus, we propose the following hypotheses:
H4-1: The higher the level of presence perceived by the user, the higher the experience score.
H4-2: The lesser the impact of negative factors, the higher the degree of sense of presence.
H4-3: The lesser the impact of negative factors, the higher the experience score.


System Setting
The hypotheses described above were tested in a within-subject experiment, with the try-on method as the independent variable. Thirty college students aged 20–25 years, with an equal number of women and men, were recruited to complete footwear try-on tasks in three different environments. The participants had to select the best pair among three pairs of sneakers, thereby simulating a daily shopping process. The try-on lasted 45 s for each method. The order of the real, VR, and AR try-on was randomly determined for each subject.
The experiment comprised two parts: the main try-on test and the post-experimental assessments. As shown in Figure 2, the experiment was conducted in a quiet room with sufficient lighting and space for the participants to comfortably perform the try-on processes. The participants wore a portable wireless eye tracker that recorded their eye-movement data during the process. After the experiment, they performed a think-aloud procedure to express their thoughts about the three methods.

Experimental Procedure


Independent variables



Real environment
In the real try-on, three pairs of sneakers with different styles and sizes (39, 42, and 44) were prepared on site. Two different insoles and a heel sticker were provided so that the sneakers fit the feet of most of the subjects. The subjects focused on the visual appearance of the shoes and how they matched their body images during the try-on process. A 1.25-m-long mirror was set up so that they could see their whole body except the face.
AR virtual try-on application
The AR try-on application allows users to virtually try 3D shoe models in a live video without the need for markers or special sensors. The key to the try-on application is to automatically recognize and track the human foot in a real environment. This task is modeled as a 3D registration problem, in which an iterative closest point (ICP) algorithm is applied to best match the depth data captured by a depth camera (Kinect v2) to a predefined reference foot model. The figure shows continuous screenshots of the AR try-on process in a controlled environment.
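The registration idea above can be illustrated with a minimal point-to-point ICP sketch. This is not the thesis implementation: it works on 2D points rather than 3D Kinect depth data, uses brute-force nearest-neighbor search, and every name in it (`icp_2d`, `best_rigid_2d`, the toy foot outline) is a hypothetical choice made for illustration.

```python
import math

def nearest(p, cloud):
    """Return the point in `cloud` closest to `p` (brute-force search)."""
    return min(cloud, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    cxs = sum(p[0] for p in src) / n
    cys = sum(p[1] for p in src) / n
    cxd = sum(p[0] for p in dst) / n
    cyd = sum(p[1] for p in dst) / n
    s_dot = s_cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cxs, ys - cys      # centered source point
        bx, by = xd - cxd, yd - cyd      # centered target point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)   # optimal 2D rotation angle
    c, s = math.cos(theta), math.sin(theta)
    tx = cxd - (c * cxs - s * cys)
    ty = cyd - (s * cxs + c * cys)
    return theta, tx, ty

def transform(points, theta, tx, ty):
    """Rotate by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(model, scan, iterations=20):
    """Iteratively align `model` to `scan`; returns the moved model points."""
    current = list(model)
    for _ in range(iterations):
        matches = [nearest(p, scan) for p in current]          # correspondence step
        theta, tx, ty = best_rigid_2d(current, matches)        # alignment step
        current = transform(current, theta, tx, ty)
    return current

def mean_distance(a, b):
    """Mean Euclidean distance between points paired by index."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

# Toy reference outline (hypothetical) and a slightly moved copy standing in for sensor data.
reference = [(0, 0), (1, 0), (2, 0), (3, 0.5), (3, 1.5), (2, 2), (1, 2), (0, 1)]
scan = transform(reference, 0.05, 0.1, -0.05)
aligned = icp_2d(reference, scan)
residual = mean_distance(aligned, scan)
```

The loop alternates between finding correspondences and solving for the best rigid motion. As with ICP in general, the sketch converges only when the initial misalignment is small relative to the spacing of the points, which is the case in this toy example.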
VR virtual try-on application
An auto-rigging technique was applied to integrate a skeleton consisting of body joints and 3D meshes of the human model. The rigged model allows human motion to be generated by changing the coordinates of the body joints, which correspond to the user’s joints captured by Kinect v2. The user performed the try-on task while the virtual human model mimicked the same body motion on the screen. Controlling the virtual human model via the user’s body strengthens the sense of presence in VR, which may positively influence user experience.

Dependent variables




Four categories of dependent variables were recorded in each try-on:
- Four sense of presence factors (spatial presence, engagement, naturalness, and negative effects) measured with the ITC-SOPI questionnaire
- Usability evaluated with the SUS questionnaire
- An overall user experience score
- Eye movement data captured by an eye-tracking system
Sense of presence
The greater the sense of presence induced by the virtual environment, the more positive the user’s emotional response. We employed the ITC-SOPI questionnaire to assess the sense of presence. The questionnaire contains questions covering four factors: spatial presence, engagement, naturalness, and negative effects. A five-point Likert scale (1 = strongly disagree; 5 = strongly agree) was used to measure the users’ answers to each question.

Usability
As previously mentioned, SUS is an inexpensive but effective research tool commonly used for assessing the usability of a product, service, or environment. We used a 10-item SUS questionnaire to measure the perceived usability of AR/VR try-on applications. Each statement was rated on a five-point Likert scale. Participants graded the SUS questionnaire for the VR and AR try-on but not the real environment, which is not considered a product or system.
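For readers unfamiliar with SUS scoring, the sketch below shows the conventional way a 10-item, five-point SUS questionnaire is converted to a 0 to 100 score: odd-numbered items contribute (rating − 1), even-numbered items contribute (5 − rating), and the sum is multiplied by 2.5. That this exact procedure was used in the study is an assumption; the function name `sus_score` is hypothetical.

```python
def sus_score(ratings):
    """Convert ten 1-5 Likert ratings (item 1 first) to a 0-100 SUS score."""
    if len(ratings) != 10:
        raise ValueError("SUS needs exactly 10 item ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on a 1-5 scale")
    total = 0
    for item, rating in enumerate(ratings, start=1):
        if item % 2 == 1:
            total += rating - 1   # odd items are positively worded
        else:
            total += 5 - rating   # even items are negatively worded
    return total * 2.5

# A participant who answers "neutral" (3) everywhere lands in the middle of the scale.
neutral = sus_score([3] * 10)
```

Per-participant scores computed this way would then be averaged within each application to obtain the mean SUS values.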

User experience
In this study, we explored potential factors that may have a negative impact on user experience in different try-on environments. The technical limitations of the hardware and software involved in VR/AR might affect the performance of try-on functions, such as computational efficiency and visual quality. When users noticed a problem, they identified it and rated how satisfied they were with the overall try-on experience on a scale from 0 to 100. The average score for all users was regarded as the experience score.

Eye-tracking data: fixation rate
We used SMI Eye Tracking Glasses 2 Wireless (ETG 2 W), a wearable 60-Hz eye tracking system with full wireless control, to collect the gaze data. ETG 2 W is a head-mounted device that permits users to move gently while watching a screen. This capability makes the experimental setting closer to a real try-on experience. In this study, the AOIs chosen were shoes, legs, and other areas. Possible gaze changes across different try-on methods can be identified by comparing the distribution of visual attention between the three AOIs.
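As an illustration of how the distribution of visual attention across the three AOIs could be summarized, the sketch below computes the share of fixations that fall on each AOI. It assumes that every fixation has already been labeled with one AOI and that the fixation rate is count-based; the thesis may instead weight by fixation duration. The names `AOIS` and `fixation_ratios` are hypothetical.

```python
from collections import Counter

AOIS = ("shoes", "legs", "other")  # the three areas of interest used in the study

def fixation_ratios(labels):
    """Return the fraction of fixations falling on each AOI (count-based assumption)."""
    counts = Counter(labels)
    unknown = set(counts) - set(AOIS)
    if unknown:
        raise ValueError(f"unexpected AOI labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fixations recorded")
    return {aoi: counts[aoi] / total for aoi in AOIS}

# Made-up labels for one 45 s try-on, purely for illustration.
example = ["shoes"] * 6 + ["legs"] * 3 + ["other"] * 1
ratios = fixation_ratios(example)
```

Computing these ratios per subject and per try-on method yields the values that can then be compared within and between methods.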

Results and Discussion
In this section, we present the experimental results and then discuss them from the user experience viewpoint. Statistical analyses were conducted to compare the subjects’ gaze behavior, sense of presence, experience scores, and negative factors across the three methods. Moreover, a paired t-test was used to identify the difference in usability between the VR and AR try-on applications.

Differences in eye-tracking data within and between the methods

Table 1 lists the average eye-tracking data of all the subjects in the three methods. In the first analysis, we used ANOVA to determine whether the total eye fixation ratio differed between “shoes,” “legs,” and “other areas.” The results showed that the ratios of the different AOIs in each method were significantly different at the 5% significance level. Moreover, post-hoc analyses using Fisher’s LSD indicated that the subjects’ eye-fixation ratio to “shoes” was significantly higher than for “legs” and “other areas” in all methods (H1-1 is supported), whereas the eye-fixation ratio to “legs” was significantly higher than for “other areas” only in the VR try-on (H1-2 is only partially supported).

Table 2 lists the eye-tracking data of the same AOIs in the three methods. The ANOVA results showed significant differences in the fixation ratio of the same AOI between different methods. Post-hoc analyses using the LSD method showed the following findings. When a user is in the real and AR environments, the fixation ratio to “shoes” is significantly higher than when the user is in the VR environment (H2-1 and H2-2 are supported). In contrast, when a user performs a try-on in the VR application, the fixation ratio to “legs” is significantly higher than when the user is in the real environment and AR (H2-3 and H2-4 are supported).


In this regard, the users’ visual attention in the real and AR environments was mainly on “shoes,” whereas the fixation ratio to “legs” in VR was higher than that to “other areas.” The figure shows screenshots with a typical gaze pattern for each case. A circle represents a subject’s first fixation, and the circle size is proportional to the fixation duration during the process. According to the feedback obtained from the think-aloud session, subjects in VR may have paid more attention to “legs” because they failed to identify themselves with the avatar when watching its lower body, especially when it was moving. The proportion of the gaze on “legs” was significantly higher in the VR try-on owing to such distraction. Meanwhile, the scene created by placing shoe models on a real user has higher naturalness and fidelity. As a result, the visual attention of most users was mainly on “shoes” in the real environment and the AR try-on. The users’ familiarity with such try-on scenes helped them focus on the footwear try-on task.
Figure: A typical gaze behavior in (a) AR and (b) VR try-on


Differences in sense of presence

Table 3 lists the test results for each sense of presence factor on a five-point Likert scale. The ANOVA result indicates a significant difference in the total score among the three methods: F(2, 87) = 17.43, p < .001. Moreover, post-hoc analyses with the LSD method showed that sense of presence in the real environment was significantly higher than in AR and VR (H3-1 is supported), whereas sense of presence in AR was significantly higher than in VR (H3-2 is supported).
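The reported degrees of freedom, F(2, 87), match a one-way layout with k = 3 methods and N = 90 scores (30 subjects × 3 methods), since k − 1 = 2 and N − k = 87. The sketch below shows how such an F statistic is formed from between-group and within-group variability. It is an illustration only: the function name `one_way_anova` and the toy data are hypothetical, the p-value lookup is omitted, and whether the thesis analysis followed exactly this computation is an assumption.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variability of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy data (not from the study): three groups with means 2, 3, and 4.
f_toy, df1_toy, df2_toy = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

In the toy data, both the between-group and within-group sums of squares equal 6, so F = (6 / 2) / (6 / 6) = 3 with degrees of freedom (2, 6).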

This figure presents the averages of the four sense of presence factors in each method as bar charts. The spatial presence in the real environment was significantly higher than that in VR. This might be attributed to the user’s ability to directly move and interact with physical objects in the real environment, together with the provision of direct sensory stimulation. Moreover, naturalness in the real environment was significantly higher than that in both AR and VR, whereas naturalness in the AR try-on was significantly higher than that in VR. This might imply that users consider the content displayed by the AR application to be closer to that of the real environment than VR. The negative effect of presence in VR was higher than that in the real environment and AR. A possible reason is that the appearance of the avatar is not similar to the individual’s own body or it responds inappropriately to the user’s limb movement.

Differences in usability
Based on the interpretation of the SUS score, the usability levels of the AR try-on (mean = 67.25, standard deviation = 14.37) and the VR try-on (mean = 67.17, standard deviation = 14.44) were both classified as “ok,” close to “good.” This suggests that the participants did not find interacting with either virtual try-on application particularly effective. The result of the paired sample t-test between the two applications showed no significant difference in usability (t(29) = 0.029, p = .977).
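The paired comparison above has 29 degrees of freedom because each of the 30 subjects contributes one AR score and one VR score, and the test is run on the 30 within-subject differences. A minimal sketch of the statistic follows; the function name `paired_t` and the toy numbers are hypothetical, and the p-value lookup is omitted.

```python
import math

def paired_t(a, b):
    """Return (t, df) for a paired-sample t-test on equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance of differences
    if var_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_d / math.sqrt(var_d / n)
    return t_stat, n - 1

# Toy scores (not from the study): differences are 1, 1, 2, 2.
t_toy, df_toy = paired_t([1, 2, 3, 4], [0, 1, 1, 2])
```

A t value close to zero, as reported for the two applications, means that the average within-subject difference is small relative to its standard error.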

Differences in user experience score


The bar chart shows the average user experience score for each method. The real try-on had the highest experience score, while the score of the AR try-on was higher than that of VR. Significant differences across the three methods were also observed in the ANOVA results: F(2, 87) = 4.174, p = .019. We speculate that a higher spatial presence in a real environment may increase the experience score.
Negative effects refer to adverse psychological reactions to the environment. Thus, understanding how these effects are generated in the current VR and AR try-on may suggest further improvements for the VTO technology. In the think-aloud session, the participants were asked to describe system flaws that they noticed in each try-on method. These flaws can be classified into two categories: physical and psychological. The subjects did not wear an HMD in either virtual try-on application; that is, they interacted with VR and AR in a non-immersive manner. This minimized physical issues caused by the device, such as motion sickness and wear discomfort. Half of the subjects found the real try-on to be time-consuming and boring compared to the virtual try-on. Some of them also reported fatigue from repeatedly taking off and putting on shoes during the experiment.
The participants also pointed out numerous system imperfections that caught their attention during the VR and AR try-on.
The imperfections mentioned more frequently in AR are listed as follows, along with possible causes:
- Poor occlusion: This indicates the result of poor occlusion processing between shoe models and the human feet. The seams between them may appear incomplete and unnatural.
- System delay: This occurs when the shoe model fails to catch up with the user’s movement. Human foot tracking in images usually requires a high computational load, thus reducing the display speed of the try-on process.
- Limited viewing angle: Because of the camera specifications of Kinect v2, the AR try-on can only allow limited body movement during the try-on.
- Insufficient fidelity: This occurs when the rendering result of the shoe models is not sufficiently realistic or the visual quality of the real and virtual objects does not match well.
Factors that may have a negative impact on the user’s psychological state in the VR try-on were also reported. The major ones were a lack of self-identification with the avatar and motion simulation inaccuracy; that is, the avatar did not look like the user’s own body, or it responded inappropriately to the user’s limb movement.
Conclusion
The application of virtual try-on technology is expected to impact the retail fashion industry by creating new utilitarian values and shopping experiences. In addition to technical development, a critical issue is the effectiveness of existing virtual try-on methods from an affective perspective. This study used an experiment to investigate differences in sense of presence, usability, and user experience between actual and virtual try-on using both psychological and physiological measures. Subjects conducted try-on tests on three pairs of sneakers in a real environment and in two virtual try-on applications based on AR and VR. The gaze trajectory of each subject was recorded using an eye tracker during the experiment. After the experiment, they completed three types of questionnaires (SUS, ITC-SOPI, and user experience score) to reflect their psychological responses, and they participated in a think-aloud session to express their thoughts about the try-on processes. The analysis of the questionnaire data indicated that the AR try-on experience was closer to the real experience than the VR try-on experience was. The important findings are summarized as follows:

The gaze behavior and the level of sense of presence within the real and AR try-on were fairly consistent. There was no significant difference in spatial presence, engagement, and naturalness between the two systems. In contrast, users paid more attention to the avatar’s legs in the VR try-on, while they mainly concentrated on shoe models in the real environment and AR. In all the try-on methods, the lower body received more attention than the other body parts. Therefore, other body parts may be omitted from the display or shown in lower visual quality to save computational resources in the VTO.

The usability level in both VR and AR try-on was not high in the experiment, which is probably due to the negative factors induced by current hardware/software limitations. The major factors were “poor occlusion” in the AR and “lack of self-identification” and “motion simulation inaccuracy” in the VR. Improving these imperfections may enhance the usability of virtual footwear try-on technology.

The user experience score of the actual try-on was higher than that of VR and AR, while there was no significant difference between the two applications. This implies that most of the subjects were more satisfied with the real try-on than with the two virtual try-on applications.
