Automated interlinking of speech radio archives

Yves Raimond, Chris Lowis, BBC R&D
Linked Data on the Web, WWW, Lyon
16 April 2012

Tagging at the BBC

... only for resources already published on the Web site

The BBC archive

Cataloguing practices

Classify the entire archive?

Automated interlinking of speech audio

Automatically extract topics and identify them with Linked Data URIs

Automated interlinking workflow

Automated Speech Recognition

Enhanced Topic Vector Space model

Term identification

Vector space

Vector space

sim(C4, C5) = 0.247; sim(C4, C6) = 0.048
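
The similarity values above compare vectors in the topic vector space. As a rough illustration only, the sketch below assumes that sim is a cosine similarity over sparse weighted vectors. The slides do not give the exact formula or the weights, so the vectors here (c_a, c_b, c_c) and their dimension names are hypothetical and will not reproduce 0.247 or 0.048.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    # Only dimensions present in both vectors contribute to the dot product.
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical vectors: c_a and c_b point the same way, c_c shares no dimension.
c_a = {"dim1": 1.0, "dim2": 0.5}
c_b = {"dim1": 0.8, "dim2": 0.4}
c_c = {"dim3": 1.0}

sim_ab = cosine_similarity(c_a, c_b)
sim_ac = cosine_similarity(c_a, c_c)
```

Under this assumption, vectors that share heavily weighted dimensions score close to 1, and vectors with no dimension in common score 0.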

Disambiguation and ranking

Disambiguation

Pick the closest interpretation to the programme vector
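
A minimal sketch of this step, assuming that "closest" means highest cosine similarity between a candidate's vector and the programme vector. The function names, the example.org URIs and the vector contents are all hypothetical and are not taken from the talk.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(programme_vector, candidates):
    """Return the candidate URI whose vector is closest to the programme vector."""
    return max(candidates, key=lambda uri: cosine(programme_vector, candidates[uri]))


# Hypothetical programme vector and two possible interpretations of one term.
programme = {"politics": 0.9, "europe": 0.4}
candidates = {
    "http://example.org/resource/Paris": {"europe": 1.0, "politics": 0.3},
    "http://example.org/resource/Paris_(mythology)": {"mythology": 1.0},
}

best = disambiguate(programme, candidates)
```

Here the first candidate is chosen because it shares dimensions with the programme vector, while the second shares none.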

Ranking

Evaluation

TopN

Evaluation results

Other TopN results

The World Service archive

Processing the World Service archive

Some examples

Thank you!