The Human Speechome Project

We humans are born with innate predispositions to develop certain abilities, most of which require a learning period before they are fully acquired. Language development is arguably the prime example. Lexical acquisition is a complex process that takes place over the first few years of a child’s life. It relies on many external factors, of which linguistic input (the range and frequency of the words the child hears) is the most obvious, but not the only one.

Environment is crucial for learning in this case. During the learning process, the child’s natural day-to-day environment is made up not only of language but also of places, people, social situations and interactions. In his Human Speechome Project (HSP), Professor Deb Roy tried to take all of these dimensions of the language acquisition environment into account.

The HSP was launched in 2005 as a pilot project at the MIT Media Lab. Roy was first motivated by the fact that our understanding of language acquisition rested on sparse, fragmented data, because the available corpora were rather incomplete and under-sampled.

This basic problem led Roy to install eleven omnidirectional cameras and fourteen microphones in his own house and to record almost all of the first three years of his son’s life in his natural environment. As a result, he and his lab colleagues obtained 120,000 hours of audio and 90,000 hours of video to work with, covering an estimated 70% of the child’s waking hours.

Such a massive data set made the comprehensive study Roy had envisioned possible, but before the team could work with it, they had to organise it. The main challenge the project faced was transcribing and annotating the entire corpus. By 2012, using a semi-automatic tool, they had transcribed 80% of the Child Available Speech in the subset covering the 9 to 24 month age range.

In addition to speech, the HSP corpus includes video recordings of day-to-day life in Roy’s house. The ultimate aim of the MIT team was to understand, and to model computationally, how the child learnt a particular word by tracing it back to the contexts in which adults used it when speaking to him. To do so, they had to structure all the hours of video by identifying recurrent day-to-day activities in which the child took part (such as mealtime or bedtime). The results of this computational modelling showed that words uttered around the child in consistent activity contexts appear to be learnt earlier.

The implications of this research are many. Although for most individuals language acquisition is a smooth and steady process, some children struggle with it, whether for environmental or biological reasons. Identifying the regularities in home environments and understanding their importance is essential for understanding the mechanisms at work in this process and for addressing appropriately any problems that may emerge.

In addition, although the HSP studied only one child and could therefore be considered limited, the team is confident that it will guide further observational and experimental research in the field. They also expect that the data-mining methodology they developed will enable researchers to handle high-density audio-visual data sets and help them address other questions in the behavioural sciences.
