What began to surface through this work was a tension already present within captioning practice.
Accessibility guidelines often prioritise clarity and legibility, which can discourage captions from introducing cultural context that is not visibly or audibly present in the film. Neutral labels such as [laughs] or [street ambience] are designed to remain broadly accessible. Yet these generic descriptions can sometimes flatten relational and cultural nuance, as if there were a single universal listener.
This raises a question about how listening itself is described.
Cultural recognition rarely appears through explicitly naming a culture. Instead, it often emerges through specificity: through attention to delivery, pacing, texture and social function. Listening is always situated within environments, relationships and habits of interpretation.
The Southeast Asian example of ‘555’ demonstrates that laughter is not interchangeable across contexts. In Thai digital culture, the number five is pronounced ha, so ‘555’ reads as laughter. Yet laughter itself can also carry different social meanings: softening hierarchy, signalling politeness or creating intimacy.
When captions reduce these moments to a single bracketed [laughs], relational meaning can disappear. At the same time, inserting cultural explanations where they are not present visually risks shifting captioning from description into interpretation.
The challenge, then, is not to make captions more explanatory but to rethink how listening is described.
A situated listening approach focuses on what the sound is doing rather than what culture it belongs to. Instead of relying on generic labels, captions can attend to delivery – a brief easing laugh, a restrained exhale, overlapping chatter, distant music leaking through a wall. These are not poetic interpretations but material qualities that place the viewer within a specific sonic environment.
This lens is not limited to Southeast Asian contexts.
Standard English captioning often assumes neutrality, flattening regional dialects, dry humour or class-coded sonic environments. A British coastal promenade, a rural pub or a London estate might all be reduced to [crowd noise] or [wind], even though their atmospheres differ significantly.
Situated captioning resists these universal categories by foregrounding specificity – not by naming culture directly but by allowing sonic environments to remain particular.
Relational listening therefore shifts captioning away from categorisation toward attentiveness. It asks practitioners to notice density and spatial texture – rain against metal roofs versus rain against leaves; distant television commentary leaking through a window; voices moving between languages in the background.
These descriptions remain grounded in what is audible, yet they acknowledge that sound is always experienced within social and cultural environments.
Seen this way, captions do not become more personal or expressive. Rather, they become more attentive.
The Southeast Asian examples function less as content to be replicated and more as a methodological lens: a reminder that neutrality often hides a dominant listening position. By expanding descriptive vocabulary while remaining faithful to the sonic moment, captioning can hold cultural specificity without turning accessibility into explanation.
This research by Celina Loh draws on Southeast Asian communal practices such as eating together as a starting point to explore how relational approaches might complement structural access frameworks in the UK arts and cultural sector. Beginning with shared meals, the work expands into collaborative listening, sound and facilitation practices, reflecting on how care and attentiveness circulate within creative environments.
At its centre is a simple question: how do people create conditions where others want to remain? Through workshops, shared meals and listening practices, the research explores how participants notice one another, adjust and begin to host each other. While access often takes shape through structures, this work considers how it is sustained in practice through the ways people respond and make space for one another and themselves.
Loh’s research is supported by the British Art Network (BAN). BAN is a Subject Specialist Network supported by Tate and the Paul Mellon Centre for Studies in British Art, with additional public funding provided by the National Lottery through Arts Council England.