The BBC World Service archive prototype

Towards an alternative approach to publishing large archives?

Yves Raimond, BBC R&D IRFS / @moustaki

The BBC Archive

Cataloguing the archive

In Our Time archive

BBC Four Army collection

Tagging programmes

Linked Data

An alternative approach

The World Service archive

Unlocking the archive through machine listening

Automated speech recognition

Automated transcripts

Automated tagging
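The slides do not describe the tagging algorithm itself; its code is on GitHub. As a rough illustration only, the sketch below shows one simple way to derive candidate tags from an automated transcript. It looks up word n-grams in a controlled vocabulary and ranks the matched concepts by frequency. The vocabulary mapping (label to concept identifier, e.g. Linked Data concept labels), the function name `spot_tags` and the `max_len` parameter are all hypothetical. The algorithm evaluated below is more sophisticated than plain frequency counting.

```python
import re
from collections import Counter


def spot_tags(transcript, vocabulary, max_len=3):
    """Hypothetical term spotter: rank vocabulary concepts found in a transcript.

    vocabulary maps lowercase labels (e.g. "berlin wall") to concept identifiers.
    Every word n-gram up to max_len words is looked up; concepts are ranked by
    how often one of their labels occurs.
    """
    # Automated transcripts are lowercase word streams, so tokenise crudely.
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter()
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            if i + n > len(words):
                break
            label = " ".join(words[i:i + n])
            if label in vocabulary:
                counts[vocabulary[label]] += 1
    # Most frequent concepts first.
    return [concept for concept, _ in counts.most_common()]
```

For example, `spot_tags("the cold war and the berlin wall berlin again", {"cold war": "Cold_War", "berlin": "Berlin", "berlin wall": "Berlin_Wall"})` ranks `Berlin` first, because it is matched twice.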

Example results

Automated tagging evaluation

  • Dataset of 132 programmes manually tagged
  • TopN measure
  • Random baseline: 0.0002
  • Our algorithm: 0.209
  • Next best: 0.195
  • Dataset and evaluation script available on our GitHub
  • Core algorithm available on GitHub
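The slide names a TopN measure but does not define it. The exact formula is in the evaluation script on GitHub. The sketch below is a hypothetical TopN-style ranking score, written only to show the general idea: manual tags that appear near the top of the automated ranking earn more credit. It assumes a geometric rank weight (`decay`) and normalisation by the best achievable score. Both choices, the function names and the parameters are illustrative assumptions, so the numbers it produces are not comparable with the 0.209 reported above.

```python
from statistics import mean


def topn_score(predicted, reference, n=10, decay=0.8):
    """Hypothetical TopN-style score for one programme.

    A manual (reference) tag found at 0-based rank r among the first n
    automated tags contributes decay ** r. The total is divided by the best
    achievable score, so a perfect ranking scores 1.0. Assumes the
    automated tags are distinct.
    """
    ref = set(reference)
    weights = [decay ** r for r in range(n)]
    hits = sum(w for w, tag in zip(weights, predicted[:n]) if tag in ref)
    # Best case: the top min(n, |reference|) slots are all manual tags.
    best = sum(weights[: min(n, len(ref))])
    return hits / best if best else 0.0


def evaluate(dataset, n=10):
    """dataset: list of (automated_tags, manual_tags) pairs, one per programme."""
    return mean(topn_score(pred, ref, n) for pred, ref in dataset)
```

Averaging per-programme scores, as `evaluate` does, gives every programme in a manually tagged dataset equal weight, whatever its number of tags.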

Processing archives in the cloud

Bootstrapping search and discovery


Data validation

Speaker segmentation

Crowd-sourcing speaker names

Propagating speaker names
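The slides only name this step. One plausible reading is that a name supplied by the crowd for one speaker segment is copied to other segments whose voices look similar enough. The sketch below assumes each speaker cluster is summarised by a fixed-length vector, and that a name propagates to an unnamed cluster when the cosine similarity to a named cluster reaches a threshold. The vector representation, the `threshold` value and the function names are hypothetical, and the sketch propagates only one hop, from directly named clusters.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def propagate_names(clusters, names, threshold=0.9):
    """Hypothetical one-hop name propagation.

    clusters: cluster id -> speaker vector (assumed representation).
    names: cluster id -> crowd-sourced speaker name.
    Each unnamed cluster takes the name of its most similar named cluster,
    provided the similarity is at least threshold.
    """
    result = dict(names)
    for cid, vec in clusters.items():
        if cid in names:
            continue
        best_name, best_sim = None, threshold
        for named_id, name in names.items():
            sim = cosine(vec, clusters[named_id])
            if sim >= best_sim:
                best_name, best_sim = name, sim
        if best_name is not None:
            result[cid] = best_name
    return result
```

The threshold trades coverage against accuracy. A lower value names more clusters but risks spreading a wrong name across the archive. The next step, evaluating speaker identification, measures exactly that risk.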

Evaluating speaker identification

User activity

Emerging shape of the archive

Visualising the archive

ClOud Marketplace for Multimedia Analysis


Thank you!

Photo credits: