A. Hogan
Published: 2014-04-09
Total Pages: 344
Linked Data publishing has brought about a novel “Web of Data”: a wealth of diverse, interlinked, structured data published on the Web. These Linked Datasets are described using the Semantic Web standards and are openly available to all, produced by governments, businesses, communities and academia alike. However, the heterogeneity of such data – in terms of how resources are described and identified – poses major challenges to potential consumers. Herein, we examine use cases for pragmatic, lightweight reasoning techniques that leverage Web vocabularies (described in RDFS and OWL) to better integrate large-scale, diverse Linked Data corpora. We take a test corpus of 1.1 billion RDF statements collected from 4 million RDF Web documents and analyse the use of RDFS and OWL therein. We then detail and evaluate scalable and distributed techniques for applying rule-based materialisation to translate data between different vocabularies, and to resolve coreferent resources that refer to the same thing. We show how such techniques can be made robust in the face of noisy and often impudent Web data. We also examine a use case for incorporating a PageRank-style algorithm to rank the trustworthiness of facts produced by reasoning, subsequently using those ranks to fix formal contradictions in the data. All of our methods are validated against our real-world, large-scale, open-domain Linked Data evaluation corpus.
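The two reasoning tasks named above, translating data between vocabularies and resolving coreferent resources, can be illustrated on a toy example. What follows is a minimal Python sketch under stated assumptions, not the book's implementation. It naively forward-chains two RDFS entailment rules (rdfs7 for rdfs:subPropertyOf, rdfs9 for rdfs:subClassOf) to a fixpoint over an in-memory set of triples. It then merges resources linked by owl:sameAs using union-find. Triples are assumed to be plain string tuples with prefixed names. The term ex:acquaintedWith and the resources ex:alice, ex:bob and dbp:Bob are hypothetical. The scalability, distribution and robustness to noisy data described in the abstract are not attempted here.

```python
from typing import Dict, Set, Tuple

# A triple is (subject, predicate, object) using prefixed names (assumption).
Triple = Tuple[str, str, str]

SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"
SAMEAS = "owl:sameAs"


def materialise(triples: Set[Triple]) -> Set[Triple]:
    """Naively apply rdfs7 and rdfs9 until no new triples are produced."""
    closure = set(triples)
    while True:
        subprop = {(s, o) for s, p, o in closure if p == SUBPROP}
        subclass = {(s, o) for s, p, o in closure if p == SUBCLASS}
        new: Set[Triple] = set()
        for s, p, o in closure:
            # rdfs7: (p1 subPropertyOf p2), (s p1 o)  =>  (s p2 o)
            for p1, p2 in subprop:
                if p == p1:
                    new.add((s, p2, o))
            # rdfs9: (c1 subClassOf c2), (s type c1)  =>  (s type c2)
            if p == TYPE:
                for c1, c2 in subclass:
                    if o == c1:
                        new.add((s, TYPE, c2))
        if new <= closure:  # fixpoint reached
            return closure
        closure |= new


def canonicalise(triples: Set[Triple]) -> Set[Triple]:
    """Rewrite owl:sameAs-equivalent terms to one canonical identifier."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smallest root as canonical
            # (an arbitrary, deterministic choice made for this sketch).
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for s, p, o in triples:
        if p == SAMEAS:
            union(s, o)
    # Subjects and objects are rewritten; owl:sameAs triples are dropped.
    return {(find(s), p, find(o)) for s, p, o in triples if p != SAMEAS}


# Hypothetical example data mixing two vocabularies and two identifiers for Bob.
example: Set[Triple] = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("foaf:knows", SUBPROP, "ex:acquaintedWith"),
    ("ex:alice", TYPE, "foaf:Person"),
    ("foaf:Person", SUBCLASS, "ex:Agent"),
    ("ex:bob", SAMEAS, "dbp:Bob"),
}

if __name__ == "__main__":
    for t in sorted(canonicalise(materialise(example))):
        print(t)
```

Running materialisation first and canonicalisation second keeps the two steps independent in this toy setting. A real system would also need to handle owl:sameAs on classes and properties, and the interaction between the two steps.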