The BBC Archive
Publishing our archive
The World Service archive

The missing metadata

Machine listening
Automated speech recognition

Automated transcripts
Automated tagging

Example results

Processing archives in the cloud
Noise
Algorithms and people

Data validation
Speaker segmentation
Crowd-sourcing speaker names
Speakerthon

Propagating speaker names
Evaluating speaker identification
Refining our models

User activity
How good is the data?
- Tags are a large and sparse space
- When is a tag correct?
- When is a programme tagged completely?
- How do you measure crowdsourced data?
Who does the work?

Emerging shape of the archive
Visualising the archive
Semantic Web Challenge 2013 - First prize!
ClOud Marketplace for Multimedia Analysis

Conclusion

Thank you!
Photo credits:
- http://www.flickr.com/photos/andyarmstrong/4402416306/
- http://www.flickr.com/photos/nicecupoftea/8579975238/
- http://www.flickr.com/photos/11561957@N06/5202870020/
- http://www.flickr.com/photos/hubmedia/2141860216/
- http://www.flickr.com/photos/allison_mcdonald/7604871594
- http://www.flickr.com/photos/aayars/4072755936/