About
I forgot who it was, but someone once told me that the best and most successful open-source software projects start out of frustration. It’s hard to prove, yet it’s an interesting point of view. Our biggest frustration was the lack of decent open data, despite the European PSI directives, and even when data was fully opened, too little effort was put into supporting a community. At first, The DataTank was nothing more than an API which could get data from a remote website and return it as JSON or XML. All developers using the web service could correct errors and enhance the code that put everything in an app-readable format. Nowadays The DataTank is a project which solves a problem for three parties: dataset users such as data journalists or app developers, data owners, and academics studying the semantic web.
0. Open data
According to Wikipedia,
“Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. While not identical, open data has a similar ethos to those of other “Open” movements such as open source, open content, and open access. The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term “open data” itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as Data.gov.”
1. Data owners
If there is one important thing to know about data owners, or about the people who can make decisions about certain datasets, it’s that they’re not the people writing code. Once they have decided that opening some of their data is the right choice, they all run into the same questions: “How should we open up our data?”, “Once it is released, how will we get a return on this investment?” and, last but not least, “How do we make people use our data?”.
Without The DataTank, these questions can be answered in several ways. Experienced policymakers will probably decide to set up a triple store and add their data to the semantic web. Less experienced ones, which is not unusual for non-technical people, will advise their web team to add datasets to their homepage, and maybe their technical people will even develop an API. In many cases, however, policymakers have no idea what to do. With The DataTank we give them an easier solution: they only have to tell their IT department to install this software and add their datasets.
The DataTank does not only answer the first question. Once it has been set up, data owners have the right tools to monitor the usage of their data. Business intelligence becomes an easy task since the metadata (stats, feedback, documentation…) is made available through the same API. You can see a clear return on investment when third parties start to make use of your data, when you can see what people are looking for, and when you can collect feedback from a community.
The DataTank aims at lowering the bar to open up data, but that is of little use if no one ever uses the data. Therefore, The DataTank provides good error handling, an intuitive interface and good automatically generated documentation for every resource. By providing a framework that focuses on both publishing and discovering data, we hope The DataTank will become another raison d’être for open data.
2. App developers
Application developers are, first and foremost, people who build tools or applications that «use» datasets. They are not specialized in collecting data from different sources and providing their application with a decent data layer, since that’s not their core business.
Without The DataTank, app developers tend to set up quickly developed APIs which return exactly the output their applications need. This often means that error handling, statistics, a feedback mechanism and/or documentation are the first things to be left out.
With The DataTank, app developers have a decent web service. When they set up a DataTank instance on their own servers, they can use it as a proxy for any resource in any other DataTank set-up. This means that the developer remains in control of their own stats while the data owner keeps full control over the dataset: if something in the dataset changes, it will change in the app.
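The proxy idea can be illustrated with a minimal Python sketch. This is not The DataTank’s actual code or API: the class ProxyResource, the fetch_upstream callable and the sample data are hypothetical, and the upstream instance is simulated with an in-memory dictionary instead of a real HTTP call.

```python
from collections import Counter
from typing import Callable

# Hypothetical stand-in for an HTTP request to the data owner's instance.
Fetcher = Callable[[str], dict]


class ProxyResource:
    """Forwards every request upstream and records local usage statistics."""

    def __init__(self, fetch_upstream: Fetcher) -> None:
        self._fetch_upstream = fetch_upstream
        self.hits: Counter = Counter()  # statistics stay with the app developer

    def get(self, resource: str) -> dict:
        self.hits[resource] += 1
        # No local copy is kept, so upstream changes show up immediately.
        return self._fetch_upstream(resource)


# Simulated dataset managed by the data owner (illustrative data only).
upstream_data = {"stations": {"count": 2}}
proxy = ProxyResource(lambda name: dict(upstream_data[name]))

first = proxy.get("stations")
upstream_data["stations"]["count"] = 3  # the data owner updates the dataset
second = proxy.get("stations")
```

Because the sketch never stores the upstream response, the second call already reflects the owner’s change, while proxy.hits holds the request counts on the developer’s side.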
App developers will also want to combine data. For instance, if one dataset returns all points of interest close to X and another dataset returns all buses from our place to X, we might want to have this information in one response. The developer has two choices: use a SPARQL endpoint, or join on REST identifiers through our API.
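As a rough illustration of what joining two datasets on a shared identifier means, here is a small Python sketch. It runs on in-memory records and does not call The DataTank’s API; the function join_on_identifier, the field names (stop_id, poi, bus_line) and the sample records are all assumptions made for the example.

```python
from typing import Dict, List

Record = Dict[str, str]


def join_on_identifier(left: List[Record], right: List[Record], key: str) -> List[Record]:
    """Inner join: merge every pair of records that share the same value for key."""
    index: Dict[str, List[Record]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined: List[Record] = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Illustrative data: points of interest near a stop, and buses going to a stop.
points_of_interest = [
    {"stop_id": "X", "poi": "Museum"},
    {"stop_id": "X", "poi": "Park"},
    {"stop_id": "Y", "poi": "Library"},
]
buses = [
    {"stop_id": "X", "bus_line": "12"},
]

combined = join_on_identifier(points_of_interest, buses, "stop_id")
```

Only the records for stop X appear in combined, each carrying both the point of interest and the bus line, which is the single response described above.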
3. Academics
Besides being a flexible platform to open up data through a RESTful API, The DataTank also serves as a study on semantics. According to Tim Berners-Lee’s 5 star plan for Linked Open Data, data should be enriched with URIs after it has been published in an open format, in order to make the data machine readable. This could be done in a native way by using semantics. After the datasets have been enriched, a final step has to be taken in order to fulfil the fifth star. This step consists of interlinking the triple-enriched datasets to provide context.
The DataTank aims at providing a framework that allows people to complete the third, fourth and fifth star of Tim Berners-Lee’s 5 star plan, without obliging data owners to go beyond the third star (publishing data in an open format) themselves. It aims at providing this functionality independently of the data source someone wants to publish. Any user can choose at any point to enrich their datasets with triples, and anyone can help a dataset reach the fourth star by linking it to an ontology.
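To make the fourth and fifth star more concrete, the following Python sketch turns a plain record into triples and then links it to a resource in another dataset. It is an illustration under stated assumptions, not The DataTank’s implementation: BASE_URI, the example.org and example.com addresses, the functions to_triples and link, and the sample record are hypothetical. The rdfs:label and owl:sameAs property URIs are the standard W3C ones.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical base URI under which the data owner's records get identifiers.
BASE_URI = "http://example.org/stations/"
# Standard vocabulary terms (RDF Schema and OWL).
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_triples(record: Dict[str, object], id_field: str, mapping: Dict[str, str]) -> List[Triple]:
    """Fourth star: give the record a URI and map its fields to ontology properties."""
    subject = BASE_URI + str(record[id_field])
    return [
        (subject, mapping[field], str(value))
        for field, value in record.items()
        if field in mapping
    ]


def link(subject: str, other: str) -> Triple:
    """Fifth star: state that a resource in another dataset denotes the same thing."""
    return (subject, OWL_SAME_AS, other)


# Illustrative record as it might come out of an open-format (third star) file.
record = {"id": 7, "name": "Gent-Sint-Pieters"}
triples = to_triples(record, "id", {"name": RDFS_LABEL})
triples.append(link(BASE_URI + "7", "http://example.com/other/stations/7"))
```

The data owner only supplies the record; the field-to-property mapping and the link to the other dataset can be contributed later by anyone, which matches the division of work described above.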







