Semantic Web at the BBC
Yves Raimond, BBC R&D
Europeana Plenary, Leuven, Belgium
14 June 2012
Radio since 1922
TV since 1930
On the Web since 1994
Programme support
1000 to 1500 programmes per day
Automated programme support
- BBC Programmes
- Aggregates and links data from various sources
- Permanent Web presence for all BBC programmes
- The Web site is the API
The BBC Programme Ontology
Publishing data
- RDF/XML, JSON, XML
- RDFa, Microdata
- Search engines
- External prototypes, new ideas for data sources, UX...
- Exchanging data between various systems at the BBC
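
A minimal sketch of how "the Web site is the API" could combine with the formats listed above: each programme page has one URL, and a machine-readable representation is reached by adding a format suffix. The base URL, the suffix convention, and the programme identifier below are assumptions made for illustration and may not match the live site.

```python
from urllib.parse import urljoin

# Assumption: illustrative base URL and suffix-per-format convention.
BASE = "https://www.bbc.co.uk/programmes/"
FORMATS = {"html": "", "json": ".json", "rdf": ".rdf", "xml": ".xml"}


def representation_url(pid: str, fmt: str = "html") -> str:
    """Return the URL of one representation of a programme page."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    return urljoin(BASE, pid + FORMATS[fmt])


if __name__ == "__main__":
    pid = "b0000000"  # hypothetical programme identifier
    for fmt in FORMATS:
        print(fmt, representation_url(pid, fmt))
```

The point of the design is that a client needs nothing beyond the page URL it already has: the same identifier serves people and programs.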
BBC Music, using the Web as a CMS
BBC Wildlife Finder, horizontal navigation
Tagging with URIs
- Open Data sources (DBpedia, Musicbrainz)
- Linking different domains
- Contextualising our content
- Aggregating our content
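
The aggregation idea above can be sketched as an index from tag URI to content items: because items from different domains share the same DBpedia URIs, one lookup returns content across all of them. The item identifiers, domain names, and tags are invented for illustration.

```python
from collections import defaultdict

# Hypothetical items from different domains, each tagged with DBpedia URIs.
ITEMS = [
    {"id": "programme:1", "domain": "programmes",
     "tags": {"http://dbpedia.org/resource/Lion"}},
    {"id": "clip:7", "domain": "wildlife",
     "tags": {"http://dbpedia.org/resource/Lion",
              "http://dbpedia.org/resource/Kenya"}},
    {"id": "artist:3", "domain": "music",
     "tags": {"http://dbpedia.org/resource/Kenya"}},
]


def aggregate_by_tag(items):
    """Map each tag URI to the ids of the items carrying it, in input order."""
    index = defaultdict(list)
    for item in items:
        for uri in sorted(item["tags"]):
            index[uri].append(item["id"])
    return dict(index)


if __name__ == "__main__":
    for uri, ids in aggregate_by_tag(ITEMS).items():
        print(uri, ids)
```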
Manual tagging
- Recent programmes
- Archived programmes around particular brands or topics
- What about the rest?
The BBC archive
ABC-IP: Automated interlinking of archived content
- BBC World Service archive
- 3 years of continuous audio, sparse metadata
- Topic interlinking (surfacing archive content on topic aggregations)
- Contributor interlinking
Topic interlinking for speech audio
- Transcription using CMU Sphinx
- Custom algorithm to identify topics from (noisy) transcripts
- Topics identified by DBpedia URIs
- Distributed using Amazon Web Services
- More details in our LDOW 2012 paper and our WWW'12 demo paper
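
To make the shape of the problem concrete, here is a deliberately naive sketch: it counts occurrences of known labels in a transcript and ranks the matching DBpedia URIs. This is not the custom algorithm mentioned above, which is described in the cited papers; the label table and the sample transcript are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical label-to-URI table; a real system would draw on DBpedia itself.
LABELS = {
    "london": "http://dbpedia.org/resource/London",
    "olympic games": "http://dbpedia.org/resource/Olympic_Games",
    "economy": "http://dbpedia.org/resource/Economy",
}


def rank_topics(transcript, labels=LABELS, top_n=2):
    """Rank topic URIs by how often their label occurs in the transcript."""
    # Normalise: lowercase, keep alphabetic tokens, single spaces.
    text = " ".join(re.findall(r"[a-z]+", transcript.lower()))
    counts = Counter()
    for label, uri in labels.items():
        hits = len(re.findall(r"\b" + re.escape(label) + r"\b", text))
        if hits:
            counts[uri] += hits
    return counts.most_common(top_n)


if __name__ == "__main__":
    # "gains" stands in for a recognition error on "games".
    noisy = ("the olympic games in london uh london prepares "
             "for the olympic gains and london")
    print(rank_topics(noisy))
```

Exact label matching misses the misrecognised "olympic gains", which is one reason noisy transcripts call for something more robust than this.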
Contributor interlinking
- Segmenting programmes by contributors
- Training models on individual contributors
- Fast matching of contributor models across programmes
- Cross-matching with third-party databases
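
The matching step above can be illustrated with a simplified sketch, assuming each contributor model is reduced to a single mean feature vector and two models match when their cosine similarity exceeds a threshold. The feature values, speaker names, and threshold are invented, and the brute-force comparison shown here is not the fast matching the slide refers to.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Average the feature vectors of one contributor's segments."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def match_contributors(models_a, models_b, threshold=0.95):
    """Return (name_a, name_b, score) for every pair above the threshold."""
    matches = []
    for name_a, vec_a in models_a.items():
        for name_b, vec_b in models_b.items():
            score = cosine(vec_a, vec_b)
            if score >= threshold:
                matches.append((name_a, name_b, round(score, 3)))
    return matches


if __name__ == "__main__":
    # Hypothetical per-segment features for two programmes.
    prog1 = {
        "p1/speaker_0": mean_vector([[1.0, 0.0, 0.2], [0.9, 0.1, 0.2]]),
        "p1/speaker_1": mean_vector([[0.0, 1.0, 0.5]]),
    }
    prog2 = {
        "p2/speaker_0": [0.1, 0.9, 0.5],
        "p2/speaker_1": [1.0, 0.05, 0.25],
    }
    for match in match_contributors(prog1, prog2):
        print(match)
```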
Enabling users to validate and change data
- Automatically generated data can be inaccurate
- Engaging users in trying to make that data better
- How do users modify data in the light of the behaviour of others?
- How can we define data accuracy?
- Do validation and correction lead to greater accuracy?