Zhao, X., Lai, M., Li, W., Li, S., Cascarano, P., Marfia, G. (2026). Bridging the Expressiveness Gap in Cross-Cultural Co-Creation Through AI-Augmented Extended Reality. Institute of Electrical and Electronics Engineers Inc. [10.1109/VRW70859.2026.00098].
Bridging the Expressiveness Gap in Cross-Cultural Co-Creation Through AI-Augmented Extended Reality
Zhao X.; Lai M.; Li W.; Li S.; Cascarano P.; Marfia G.
2026
Abstract
A fundamental challenge in cross-cultural co-creation is achieving shared understanding of tacit knowledge when visual and embodied expertise defies verbal description. Our previous work, CrossProbe, demonstrated that Artificial Intelligence (AI)-assisted image generation enables non-expert participants to express concepts visually through natural language. In a comparative study (N=30), it achieved significantly higher cross-cultural understanding and satisfaction scores than manual interviews and online questionnaires. Yet post-study analysis revealed critical limitations: 2D representations cannot convey spatial relationships, material qualities, or embodied knowledge. This position paper proposes extending CrossProbe's mechanisms into eXtended Reality (XR) environments through a multi-layered grounding framework that integrates linguistic flexibility, visual generation, spatial grounding, and embodied interaction. We articulate design principles for an inclusive and accessible implementation that democratizes spatial authoring for non-expert participants. This integration addresses expressiveness gaps that neither XR nor AI can close independently, enabling inclusive cross-cultural collaboration in social XR environments.



