Online demo of the ClassRank algorithm
This web app is a prototype that computes online the ClassRank scores of a given RDF graph. A detailed overview of the ClassRank algorithm is offered below. Currently, this online prototype cannot handle big graphs. To experiment with big sources, we encourage you to download the source code of ClassRank and test it locally.
Your results will be shown in the text area labelled "Results". If an error occurs during the process, it will also be reported in that text area.
The results are given as a list of JSON objects. The objects represent the detected classes and are sorted by their ClassRank score (the element with the highest score goes first). Each object contains the following keys:
ClassRank is an unsupervised technique for discovering URIs that represent abstract concepts (classes) in a knowledge graph and for measuring their relevance within the source. ClassRank is primarily based on notions of graph centrality, as computed by the PageRank algorithm. ClassRank associates each class found with a score that has the following equivalent meanings:
We provide a graphical example of how ClassRank works:
The image shows some elements of a certain RDF graph. The rectangles represent URIs of instances, the clouds represent URIs of classes, and the arrows represent properties that link both kinds of elements: orange for dbo:governamentType and green for rdf:type. Let's say that we have applied PageRank (PR) to a graph that contains those elements and relations, and that we classify both dbo:governamentType and rdf:type as class-pointers. Each entity in the graph has been associated with a PR score. The ClassRank (CR) score of a class is obtained by aggregating the PR scores of its instances. Then, if Parliamentary system and Country do not have any instances other than the ones that appear in the picture, we would have:
The information brought by ClassRank may be helpful in several scenarios, including, but not limited to:
In order to discover class URIs, ClassRank explores the target graph looking for triples containing some special properties that we have called class-pointers. Essentially, a class-pointer is an RDF property which is expected to be used only to point to RDF classes.
ClassRank uses exclusively the information contained in the target graph, i.e., no third-party knowledge is used during the process. ClassRank can therefore be applied to KGs of any domain.
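The idea of class-pointers, together with the aggregation of PR scores described earlier, can be illustrated with a minimal Python sketch. It is not the code from the ClassRank repository: the function name class_rank, the result keys ("class", "score", "instances"), the min_instances parameter, the ex: URIs, and the PR values are all assumptions made up for illustration. The sketch also assumes that the PR scores have already been computed and that each instance counts once per class.

```python
from collections import defaultdict

# Properties treated as class-pointers in this toy example (taken from the text).
CLASS_POINTERS = {"rdf:type", "dbo:governamentType"}


def class_rank(triples, pr_scores, class_pointers, min_instances=1):
    """Aggregate the PR scores of instances into one score per class.

    triples: iterable of (subject, predicate, object) strings.
    pr_scores: dict mapping each entity to a precomputed PageRank score.
    class_pointers: set of properties assumed to point only to classes.
    min_instances: hypothetical threshold; classes with fewer instances
        are dropped (an assumption, not the documented threshold semantics).
    """
    instances = defaultdict(set)
    for subj, pred, obj in triples:
        if pred in class_pointers:
            # The object of a class-pointer is taken to be a class,
            # and the subject one of its instances.
            instances[obj].add(subj)

    results = []
    for cls, members in instances.items():
        if len(members) < min_instances:
            continue
        score = sum(pr_scores.get(member, 0.0) for member in members)
        results.append(
            {"class": cls, "score": score, "instances": sorted(members)}
        )
    # Highest score first, mirroring the ordering of the demo results.
    results.sort(key=lambda item: item["score"], reverse=True)
    return results


# Made-up graph and made-up PR scores, for illustration only.
triples = [
    ("ex:A", "rdf:type", "ex:Country"),
    ("ex:B", "rdf:type", "ex:Country"),
    ("ex:A", "dbo:governamentType", "ex:Parliamentary_system"),
    ("ex:A", "ex:borders", "ex:B"),  # not a class-pointer, so it is ignored
]
pr_scores = {"ex:A": 0.25, "ex:B": 0.125}

results = class_rank(triples, pr_scores, CLASS_POINTERS)
for item in results:
    print(item["class"], item["score"], item["instances"])
```

In this toy graph, ex:Country collects the scores of both instances, while ex:Parliamentary_system only collects the score of ex:A, so ex:Country is listed first.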
The damping factor α in PageRank defines the probability p = (1 - α) that a random surfer gets bored of moving through the graph by following links, at which point the surfer jumps to a random element. The thresholds θI and θC are used to ignore facts that occur rarely, which may mean that they are noise or that they have a non-significant presence in G. The algorithm returns three results:
Combined, the CR scores and the list and number of instances give the relevance of each class and allow us to analyze the source of that relevance.
We provide a formalization of the ClassRank algorithm:
We have implemented ClassRank in Python, and the source code is publicly available. Our version of ClassRank is still a prototype: more tests are needed, and some bugs (which do not affect the final score) have already been detected. It operates entirely in main memory to maintain some structures during the computations, so you may need a powerful machine to process huge graphs.
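To make the role of the damping factor α more concrete, the following is a minimal sketch of plain PageRank computed by power iteration in pure Python. It is a generic textbook formulation, not the ClassRank implementation: the function name pagerank, the iterations parameter, the uniform treatment of nodes without outgoing links, and the toy edge list are assumptions, and ClassRank may build the graph from RDF triples differently.

```python
def pagerank(edges, alpha=0.85, iterations=100):
    """Compute PageRank scores by power iteration.

    edges: list of (source, target) pairs describing a directed graph.
    alpha: damping factor; with probability (1 - alpha) the random
        surfer jumps to a random node instead of following a link.
    iterations: number of update rounds (a fixed count, for simplicity).
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing links is spread uniformly.
        dangling = sum(scores[v] for v in nodes if not out_links[v])
        new_scores = {
            v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes
        }
        for v in nodes:
            if out_links[v]:
                share = alpha * scores[v] / len(out_links[v])
                for w in out_links[v]:
                    new_scores[w] += share
        scores = new_scores
    return scores


# Toy graph: "c" is linked from both "a" and "b"; "b" has no incoming links.
edges = [("a", "c"), ("b", "c"), ("c", "a")]
scores = pagerank(edges, alpha=0.85)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

Node "b" receives no links, so its score comes only from random jumps, i.e. (1 - α) divided by the number of nodes. Lowering α increases that share and flattens the differences between nodes.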
Some examples and instructions are provided in the repository.
ClassRank has already been applied to big RDF sources. We provide access to the results of applying ClassRank to a Wikidata dump dated 2016/10/26. The link provided also gives access to an older implementation of ClassRank whose code is tightly coupled to Wikidata. If you are planning to test ClassRank, we recommend that you use the implementation that we have recently provided, which is designed to work with any kind of RDF graph.
We will soon provide the results of applying ClassRank to the English edition of DBpedia.