Semantic Web at the BBC

Yves Raimond, BBC R&D
Europeana Plenary, Leuven, Belgium
14 June 2012

Radio since 1922

BBC Eagle

TV since 1930

BBC Tuning Signal

On the Web since 1994

BBC 1994

Programme support

Doctor Who official site

1000 to 1500 programmes per day

BBC Color

Automated programme support

BBC Programmes
Aggregates and links data from various sources
Permanent Web presence for all BBC programmes
The Web site is the API

BBC Programmes

The BBC Programme Ontology

Publishing data

RDF/XML, JSON, XML
RDFa, Microdata
Search engines
External prototypes, new ideas for data sources, UX...
Exchanging data between various systems at the BBC
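
As a rough illustration of "the Web site is the API", the sketch below builds the URL of each representation of a programme page by appending a format suffix to the page URL. The base URL, the suffix convention and the identifier `p0000000` are assumptions made for this example, not details taken from the slides.

```python
# Minimal sketch: one programme page, several machine-readable representations.
# BASE, SUFFIXES and the example identifier are assumptions for illustration.
BASE = "http://www.bbc.co.uk/programmes"
SUFFIXES = {"html": "", "rdf": ".rdf", "json": ".json", "xml": ".xml"}


def representation_url(pid: str, fmt: str = "html") -> str:
    """Return the URL of one representation of the programme identified by pid."""
    if fmt not in SUFFIXES:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{BASE}/{pid}{SUFFIXES[fmt]}"


if __name__ == "__main__":
    # "p0000000" is a placeholder, not a real programme identifier.
    for fmt in SUFFIXES:
        print(fmt, representation_url("p0000000", fmt))
```

The point of the design is that a client needs no separate API endpoint: the same identifier that names the page also names its data.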

Other domains

BBC Music
BBC Wildlife Finder
BBC Food
BBC Sport (World Cup 2010, London 2012)

BBC Music, using the Web as a CMS

Tom Waits page on BBC Music

Aggregate Link

BBC Wildlife Finder, horizontal navigation

Horizontal navigation

Tagging with URIs

Open Data sources (DBpedia, MusicBrainz)
Linking different domains
Contextualising our content
Aggregating our content
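
A minimal sketch of why shared URIs help: if items from different domains are tagged with the same DBpedia URI, a topic aggregation is just a group-by on that URI. The item identifiers and tags below are hypothetical sample data, not BBC data.

```python
from collections import defaultdict

# Hypothetical items from different domains, each tagged with DBpedia URIs.
ITEMS = [
    {"id": "programmes/item-1", "tags": ["http://dbpedia.org/resource/Tom_Waits"]},
    {"id": "music/item-2", "tags": ["http://dbpedia.org/resource/Tom_Waits"]},
    {"id": "wildlife/item-3", "tags": ["http://dbpedia.org/resource/Lion"]},
    {"id": "programmes/item-4", "tags": ["http://dbpedia.org/resource/Lion"]},
]


def aggregate_by_tag(items):
    """Group item identifiers by tag URI, preserving input order."""
    index = defaultdict(list)
    for item in items:
        for uri in item["tags"]:
            index[uri].append(item["id"])
    return dict(index)


if __name__ == "__main__":
    for uri, ids in aggregate_by_tag(ITEMS).items():
        print(uri, ids)
```

Because the key is a URI from an open dataset, the same index can also be joined with external data about that URI to contextualise the content.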

Manual tagging

Recent programmes
Archived programmes around particular brands or topics
What about the rest?

The BBC archive

ABC-IP: Automated interlinking of archived content

BBC World Service archive
3 years of continuous audio, sparse metadata
Topic interlinking (surfacing archive content on topic aggregations)
Contributor interlinking

Topic interlinking for speech audio

Transcription using CMU Sphinx
Custom algorithm to identify topics from (noisy) transcripts
Topics identified by DBpedia URIs
Distributed using Amazon Web Services
More details in our LDOW 2012 paper and our WWW'12 demo paper
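
The slides do not describe the custom algorithm, so the following is only a toy stand-in: it spots known labels in a noisy transcript, maps them to DBpedia URIs, and keeps topics mentioned at least `min_count` times. The label table, function name and threshold are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical label-to-URI table; a real system would derive it from DBpedia.
LABELS = {
    "tom waits": "http://dbpedia.org/resource/Tom_Waits",
    "london": "http://dbpedia.org/resource/London",
    "world cup": "http://dbpedia.org/resource/FIFA_World_Cup",
}


def candidate_topics(transcript, labels=LABELS, max_words=2, min_count=2):
    """Return (uri, count) pairs for labels found in the transcript, most frequent first."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            if phrase in labels:
                counts[labels[phrase]] += 1
    # A frequency threshold is a crude way to discard one-off recognition errors.
    return [(uri, c) for uri, c in counts.most_common() if c >= min_count]


if __name__ == "__main__":
    noisy = "london calling uh london the world cup in london"
    print(candidate_topics(noisy))
```

Simple label spotting like this ignores ambiguity (many labels map to several URIs), which is one reason a dedicated algorithm is needed for noisy transcripts.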

Contributor interlinking

Segmenting programmes by contributors
Training models on individual contributors
Fast matching of contributor models across programmes
Cross-matching with third-party databases
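
The steps above can be sketched in a very simplified form. Here each speaker segment is assumed to be already reduced to a small feature vector, a "model" is just the mean of a speaker's vectors, and matching is a plain pairwise cosine comparison. None of this is the method used in the project; the vectors, names and threshold are made up for illustration, and the pairwise loop is not the fast matching the slide refers to.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length feature vectors into one speaker model."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speakers(models_a, models_b, threshold=0.95):
    """Return (speaker_a, speaker_b, score) for pairs scoring at or above the threshold."""
    matches = []
    for name_a, vec_a in models_a.items():
        for name_b, vec_b in models_b.items():
            score = cosine(vec_a, vec_b)
            if score >= threshold:
                matches.append((name_a, name_b, score))
    return sorted(matches, key=lambda m: m[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-segment features for two programmes.
    programme_1 = {
        "speaker-1": mean_vector([[1.0, 0.0, 0.2], [0.9, 0.1, 0.2]]),
        "speaker-2": mean_vector([[0.0, 1.0, 0.5]]),
    }
    programme_2 = {
        "speaker-A": [0.95, 0.05, 0.2],
        "speaker-B": [0.1, 0.2, 1.0],
    }
    for a, b, score in match_speakers(programme_1, programme_2):
        print(a, "matches", b)
```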

Enabling users to validate and change data

Automatically generated data can be inaccurate
Engaging users in trying to make that data better
How do users modify data in the light of the behaviour of others?
How can we define data accuracy?
Do validation and correction lead to greater accuracy?
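
One possible way to fold user feedback into automatically generated tags is sketched below: each user vote is +1 (agree) or -1 (disagree), and a tag is kept while its net score stays at or above a threshold. The voting scheme, names and threshold are assumptions for illustration, not the design used in the project.

```python
from collections import defaultdict


def tally(votes):
    """Sum +1/-1 votes per tag URI."""
    scores = defaultdict(int)
    for uri, vote in votes:
        scores[uri] += vote
    return dict(scores)


def accepted(auto_tags, votes, min_score=0):
    """Keep automatic tags whose net user score is at least min_score."""
    scores = tally(votes)
    # Tags nobody has voted on keep a score of 0 and are retained by default.
    return [uri for uri in auto_tags if scores.get(uri, 0) >= min_score]


if __name__ == "__main__":
    auto_tags = ["tag-a", "tag-b", "tag-c"]  # hypothetical tag identifiers
    votes = [("tag-a", 1), ("tag-b", -1), ("tag-b", -1), ("tag-a", 1)]
    print(accepted(auto_tags, votes))
```

A net score is only one candidate definition of accuracy; it says nothing about whether voters influence each other, which is the open question raised above.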

Thank you!