Wednesday 8 January 2014

Querying Wikipedia data with SPARQL


Wikipedia is a great and well-known resource for all kinds of information.  However, this information cannot readily be queried on the Wikipedia site itself, other than by search term.  DBpedia (http://dbpedia.org) is a community effort to make a structured set of Wikipedia data available for querying and analysis.

The query form at http://dbpedia.org/sparql is a convenient way to try out the kinds of data extraction that are possible.



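As a basic example, the following SPARQL query lists software products developed by companies, together with each product's subject and developing organisation.  The results are ordered by subject and organisation and limited to 500 rows.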
# The DBpedia endpoint predefines rdf:, but declaring it keeps the query portable.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?subject ?organisation ?product
WHERE
{
    ?organisation rdf:type <http://dbpedia.org/ontology/Company> .
    ?product <http://dbpedia.org/ontology/developer> ?organisation .
    ?product rdf:type <http://dbpedia.org/ontology/Software> .
    ?product <http://purl.org/dc/terms/subject> ?subject .
}
ORDER BY ?subject ?organisation
LIMIT 500
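
The same query can also be sent from a script rather than the web form. The following is a minimal Python sketch, not a definitive client: it assumes the endpoint follows the standard SPARQL protocol (the query passed as a `query` URL parameter, JSON results requested through the `Accept` header) and that the endpoint is reachable. The function names `build_request`, `rows_from_results` and `fetch_rows` are made up for this illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"

# The first query from this post, with the rdf: prefix declared explicitly.
QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?subject ?organisation ?product
WHERE
{
    ?organisation rdf:type <http://dbpedia.org/ontology/Company> .
    ?product <http://dbpedia.org/ontology/developer> ?organisation .
    ?product rdf:type <http://dbpedia.org/ontology/Software> .
    ?product <http://purl.org/dc/terms/subject> ?subject .
}
ORDER BY ?subject ?organisation
LIMIT 500
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request carrying the query as a URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Assumption: the server honours the standard SPARQL JSON results type.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    data = json.loads(payload)
    names = data["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in data["results"]["bindings"]
    ]


def fetch_rows(query):
    """Send the query over the network and return the flattened rows."""
    with urlopen(build_request(query), timeout=30) as response:
        return rows_from_results(response.read().decode("utf-8"))
```

Calling `fetch_rows(QUERY)` should then return one dictionary per result row, keyed by `subject`, `organisation` and `product`. Keeping the parsing separate from the network call means it can be checked against a saved response without contacting the endpoint.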


The following SPARQL query lists subjects for which there are at least 50 software products defined, and provides the count of products against each subject.
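# The DBpedia endpoint predefines rdf:, but declaring it keeps the query portable.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# SPARQL 1.1 requires the aggregate to be parenthesised and given an alias.
SELECT ?subject (COUNT(DISTINCT ?product) AS ?productCount)
WHERE
{
    ?organisation rdf:type <http://dbpedia.org/ontology/Company> .
    ?product <http://dbpedia.org/ontology/developer> ?organisation .
    ?product rdf:type <http://dbpedia.org/ontology/Software> .
    ?product <http://purl.org/dc/terms/subject> ?subject .
}
GROUP BY ?subject
HAVING (COUNT(DISTINCT ?product) >= 50)
ORDER BY ?subject
LIMIT 500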
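
To make the grouping step concrete, here is a small Python sketch of what GROUP BY, COUNT(DISTINCT ...) and HAVING do together, applied locally to (subject, product) pairs. It is an illustration of the aggregation only, not how the endpoint implements it; the function name `count_products_by_subject` and the sample pairs are invented for the example.

```python
from collections import defaultdict


def count_products_by_subject(pairs, minimum=50):
    """Count distinct products per subject, keeping subjects at or above `minimum`."""
    products = defaultdict(set)
    for subject, product in pairs:
        # A set mirrors COUNT(DISTINCT ?product): repeated pairs count once.
        products[subject].add(product)
    counts = {
        subject: len(found)
        for subject, found in products.items()
        if len(found) >= minimum  # the HAVING condition
    }
    # Sorting by key mirrors ORDER BY ?subject.
    return dict(sorted(counts.items()))


# Made-up sample data, with a low threshold so the effect is visible.
sample_pairs = [
    ("Category:Web_browsers", "ProductA"),
    ("Category:Web_browsers", "ProductB"),
    ("Category:Web_browsers", "ProductA"),  # duplicate, counted once
    ("Category:Text_editors", "ProductC"),
]
result = count_products_by_subject(sample_pairs, minimum=2)
```

With the sample data and a threshold of 2, only the subject with two distinct products survives the filter.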

The queries above are very simple examples; more complex queries can exploit the links between the data available at http://dbpedia.org.

I believe this capability opens up many interesting possibilities, and I have been using data extracted in this way in a small project.
