The BBC World Service Archive experiment

Yves Raimond, BBC R&D IRFS / @moustaki

The BBC Archive

Publishing our archive

The World Service archive

The missing metadata

Machine listening

Automated speech recognition

Automated transcripts

Automated tagging

Example results

Processing archives in the cloud

Noise

Algorithms and people

http://worldservice.prototyping.bbc.co.uk

Data validation

Speaker segmentation

Crowd-sourcing speaker names

Speakerthon

Propagating speaker names

Evaluating speaker identification

Refining our models

User activity

How good is the data?

  • Tags are a large and sparse space
  • When is a tag correct?
  • When is a programme tagged completely?
  • How do you measure crowdsourced data?

Who does the work?

Emerging shape of the archive

Visualising the archive

Semantic Web Challenge 2013: first prize!

Code

ClOud Marketplace for Multimedia Analysis

Conclusion

Thank you!

Photo credits:

  • http://www.flickr.com/photos/andyarmstrong/4402416306/
  • http://www.flickr.com/photos/nicecupoftea/8579975238/
  • http://www.flickr.com/photos/11561957@N06/5202870020/
  • http://www.flickr.com/photos/hubmedia/2141860216/
  • http://www.flickr.com/photos/allison_mcdonald/7604871594
  • http://www.flickr.com/photos/aayars/4072755936/