This is a summary about research conducted for the XBRL Challenge.
Statement of Purpose (description): The Financial Information Observation System (FIOS) provides a web-based Online Analytical Processing (OLAP) interface for multi-company analysis over an integrated dataset, consisting of XBRL data from the Securities and Exchange Commission (SEC) and the Federal Financial Institutions Examination Council (FFIEC) as well as open web data from Wikipedia/DBpedia and Freebase.
There is an ever-increasing amount of data available, which enterprises can leverage to their competitive advantage. While decision makers require an integrated view over information, relevant data typically originates from a wide variety of disparate sources.
With FIOS we show how to merge XBRL data (from SEC EDGAR and the FFIEC) with freely available Semantic Web data (from Wikipedia and Freebase). Our system uses the increasingly popular Linked Data principles for accessing and encoding data. We use standard open-source tools to transform SEC EDGAR and FFIEC data and make it openly available according to Linked Data principles, and to collect and integrate data from various sources. The FIOS user interface hides the underlying RDF data format of Linked Data from the user and bases interaction on intuitive, interactive operations over well-known OLAP data cubes. We have developed olap4ld to bridge the gap between Semantic Web datasets and OLAP user interfaces.
FIOS currently incorporates the following data sources:
The goal of FIOS is to demonstrate:
In the following we explain how FIOS addresses the requirements of the XBRL challenge.
Efforts towards enhanced data sharing within organisations face a cultural barrier to adoption, because the nature of information ownership changes. Our motivation is therefore to provide enhanced accessibility and transparency of linked information, allowing greater use to be made of that data.
Our demo shall demonstrate the potential of FIOS:
We have made available a screencast demonstrating a query in FIOS via YouTube.
We have also made a demonstration of FIOS available on the web.
The FIOS Saiku Demo can be accessed using username b and password b.
The OLAP interface consists of two parts:
For the demo we have prepared two data cubes, "SEC-Cube-Gross-Profit-Margin" and "SEC-FFIEC-Cube". As an example, we query for the "Sales revenue net" of specific companies. We select the Issuer dimension (the issuers of disclosures) and drag-and-drop the "Issuer root level" into "columns". We click on "Issuer root level" to select only "WELLPOINT, INC" and "WEYERHAEUSER CO". Then, we drag "Dtend root level" (the valid end date of disclosures) into "rows". The cells show "Cost of goods sold", which is selected for the query automatically because it is the first measure listed ("Sales revenue net" can be dragged to the filter to select it instead); if there are several disclosures per cell, the cells show the number of disclosures. We can add "Dtstart root level" (the valid start date) to "rows" to do a cross-join and drill further down.
FIOS' capability to integrate metadata from external sources can be seen by removing the Issuer root level and instead adding "Business business operation industry" to columns. Disclosures are then grouped by the category of companies as given by Freebase.
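The cell behaviour described above (a single value when exactly one disclosure matches, otherwise the number of disclosures) can be sketched in a few lines of Python. This is only an illustration, not the olap4ld or Saiku implementation; the function name `pivot`, the tuple layout, and all figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical disclosures: (issuer, dtend, measure, value). All figures are made up.
DISCLOSURES = [
    ("WELLPOINT, INC", "2009-12-31", "Sales revenue net", 100.0),
    ("WELLPOINT, INC", "2010-12-31", "Sales revenue net", 110.0),
    ("WEYERHAEUSER CO", "2009-12-31", "Sales revenue net", 50.0),
    ("WEYERHAEUSER CO", "2009-12-31", "Sales revenue net", 55.0),
    ("WEYERHAEUSER CO", "2009-12-31", "Cost of goods sold", 30.0),
]


def pivot(disclosures, measure, issuers):
    """Rows = Dtend, columns = Issuer, filtered to one measure."""
    cells = defaultdict(list)
    for issuer, dtend, m, value in disclosures:
        if m == measure and issuer in issuers:
            cells[(dtend, issuer)].append(value)
    # One disclosure in a cell: show its value. Several: show how many there are.
    return {
        key: ("value", vals[0]) if len(vals) == 1 else ("count", len(vals))
        for key, vals in cells.items()
    }


table = pivot(
    DISCLOSURES,
    measure="Sales revenue net",
    issuers={"WELLPOINT, INC", "WEYERHAEUSER CO"},
)
for (dtend, issuer), cell in sorted(table.items()):
    print(dtend, issuer, cell)
```

Tagging each cell as `"value"` or `"count"` is a choice made for this sketch so that a count cannot be mistaken for a disclosed amount.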
FIOS' capability to integrate different XBRL data is demonstrated by our second Data Cube, "SEC-FFIEC-Cube". Using the cube, disclosures of the SEC, as well as disclosures of the FFIEC can be queried. Select SEC-FFIEC-Cube in the drop down menu. Next, drag-and-drop the Issuer root level to columns and drag all measures into rows. Note that "Cost of goods sold" and "Sales revenue net" are measures from SEC filings, whereas automobile loan "137 RCONK" is a measure of an FFIEC filing. We see that each measure is disclosed by at least one company.
In summary, FIOS allows for OLAP over financial data; the data can originate from a single XBRL source, from multiple XBRL sources (such as SEC EDGAR and FFIEC), and from XBRL sources integrated with external data sources.
FIOS represents the instantiation of a dynamic extract-transform-load pipeline: data is collected from the data sources via an open-source crawler (LDSpider) and stored in an open-source database (triple store) for RDF data (OpenLink Virtuoso). Once the data is loaded into the triple store, Data Cubes are found automatically in the data or are defined manually by issuing queries on the triple store. The schema used for finding and defining data cubes in RDF is the RDF Data Cube vocabulary. We make the triple store available on the web for the FIOS UI and other applications.
In the following we briefly describe how XBRL data from the SEC has been modelled as Data Cubes and how it was possible to integrate the SEC data with other data sources. The following figure gives an overview of the modelling in a class diagram:
The SEC publishes large numbers of XBRL files as "interactive filings" on its website. The central elements of a filing are disclosures. In an example disclosure, a specific company discloses that in the first quarter of fiscal year 2009 it had a sales revenue net of 1,000,000 USD.
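To make the cube discovery step more concrete, the sketch below shows what finding data cubes via the RDF Data Cube vocabulary could look like. The SPARQL text uses terms that exist in the vocabulary (`qb:DataSet`, `qb:structure`, `qb:component`, `qb:dimension`, `qb:measure`), but the query itself, the `ex:` identifiers, and the in-memory triples are assumptions made for illustration; the queries actually used by FIOS are not given here.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A query of this shape could be sent to the triple store's SPARQL endpoint.
FIND_CUBES = """
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?cube ?dimension WHERE {
  ?cube a qb:DataSet ;
        qb:structure ?dsd .
  ?dsd qb:component ?component .
  ?component qb:dimension ?dimension .
}
"""

# Hypothetical triples standing in for the contents of the triple store.
TRIPLES = [
    ("ex:SEC-Cube", RDF_TYPE, QB + "DataSet"),
    ("ex:SEC-Cube", QB + "structure", "ex:sec-dsd"),
    ("ex:sec-dsd", QB + "component", "ex:c1"),
    ("ex:c1", QB + "dimension", "ex:issuer"),
    ("ex:sec-dsd", QB + "component", "ex:c2"),
    ("ex:c2", QB + "dimension", "ex:dtend"),
    ("ex:sec-dsd", QB + "component", "ex:c3"),
    ("ex:c3", QB + "measure", "ex:salesRevenueNet"),
]


def objects(triples, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def find_cubes(triples):
    """Mirror the graph pattern of FIND_CUBES over a plain list of triples."""
    cubes = {}
    for s, p, o in triples:
        if p == RDF_TYPE and o == QB + "DataSet":
            dims = []
            for dsd in objects(triples, s, QB + "structure"):
                for component in objects(triples, dsd, QB + "component"):
                    dims.extend(objects(triples, component, QB + "dimension"))
            cubes[s] = sorted(dims)
    return cubes


found = find_cubes(TRIPLES)
print(found)
```

In a real deployment the pattern matching would be done by the triple store; the hand-written matcher only keeps the example self-contained.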
Disclosures are uniquely identified by the following properties that we call dimensions:
A disclosure contains one or more values that are fully dependent on the dimensions, which are called measures. The meaning of a measure is given by attributes:
Based on this modelling of SEC XBRL data, the data could be integrated with other data sources: from Freebase we get higher-level information about issuers (operation industry), and from the FFIEC Linked Data Wrapper we receive additional XBRL filings with disclosures. Those disclosures share similar dimensions, e.g., issuer, but have different measures, e.g., about automobile loans (137 RCONK).
In FIOS, users interact with the data using common OLAP operations such as slice and dice: the component that enables this functionality is a middleware to connect Linked Data with standard OLAP clients.
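A minimal sketch of this integration, under stated assumptions: the dimensions used here (issuer, valid start date, valid end date) are the ones named in the demo walkthrough and may not be the complete set, and the class name, issuer names, values, and the `INDUSTRY` lookup standing in for Freebase are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Disclosure:
    issuer: str   # dimension: who disclosed
    dtstart: str  # dimension: valid start date
    dtend: str    # dimension: valid end date
    measure: str  # e.g. "Sales revenue net" (SEC) or "137 RCONK" (FFIEC)
    value: float
    source: str   # "SEC" or "FFIEC"


# Made-up example data; not taken from real filings.
sec = [
    Disclosure("EXAMPLE CORP", "2009-01-01", "2009-03-31",
               "Sales revenue net", 1_000_000.0, "SEC"),
]
ffiec = [
    Disclosure("EXAMPLE BANK", "2009-01-01", "2009-03-31",
               "137 RCONK", 250_000.0, "FFIEC"),
]

# Hypothetical stand-in for issuer metadata obtained from Freebase.
INDUSTRY = {"EXAMPLE CORP": "Manufacturing", "EXAMPLE BANK": "Banking"}


def integrate(*sources):
    """Merge disclosures from all sources and group them by issuer industry."""
    by_industry = defaultdict(list)
    for source in sources:
        for d in source:
            by_industry[INDUSTRY.get(d.issuer, "Unknown")].append(d)
    return dict(by_industry)


cube = integrate(sec, ffiec)
measures = {d.measure for group in cube.values() for d in group}
print(sorted(cube), sorted(measures))
```

Because both sources share the issuer dimension, a single lookup on the issuer is enough to attach the external industry label to SEC and FFIEC disclosures alike.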
We make the component, olap4ld, available as open-source.
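For readers unfamiliar with the two operations: slice fixes one dimension to a single member, and dice restricts one or more dimensions to subsets of members. The sketch below shows both on a toy cube. It is not olap4ld code; the function names, issuers, and figures are hypothetical.

```python
# Toy cube: each cell maps an (issuer, dtend) coordinate to a value. Figures are made up.
DIMENSIONS = ("issuer", "dtend")
CUBE = {
    ("A CORP", "2009-12-31"): 10.0,
    ("A CORP", "2010-12-31"): 12.0,
    ("B CORP", "2009-12-31"): 20.0,
    ("B CORP", "2010-12-31"): 24.0,
    ("C CORP", "2010-12-31"): 30.0,
}


def slice_cube(cube, dimension, member):
    """Fix one dimension to a single member and drop it from the coordinates."""
    i = DIMENSIONS.index(dimension)
    return {
        key[:i] + key[i + 1:]: value
        for key, value in cube.items()
        if key[i] == member
    }


def dice_cube(cube, **allowed):
    """Keep cells whose coordinates lie within the allowed members per dimension."""
    def keep(key):
        return all(
            key[DIMENSIONS.index(dim)] in members
            for dim, members in allowed.items()
        )
    return {key: value for key, value in cube.items() if keep(key)}


sliced = slice_cube(CUBE, "dtend", "2009-12-31")
diced = dice_cube(CUBE, issuer={"A CORP", "B CORP"}, dtend={"2010-12-31"})
print(sliced)
print(diced)
```

Slice reduces the dimensionality of the result (the fixed dimension disappears from the coordinates), whereas dice keeps all dimensions and only narrows their members.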
The following figure gives an overview of the software components of FIOS:
We have presented FIOS, a modular, extensible system for enabling OLAP analysis over XBRL and non-XBRL data integrated from several sources. FIOS leverages semantic technologies (in particular, Linked Data) and various open-source systems to achieve elaborate functionality. The use of semantic technologies allows for a "pay-as-you-go" strategy to incrementally integrate a broad variety of datasets. We have developed a crucial component, olap4ld, to bridge the gap between Semantic Web datasets and OLAP user interfaces. Future work includes support for the full set of common OLAP operations for analyses by information professionals, specifically dimension hierarchy drill-down, complex measures, aggregation functions and what-if scenarios.
Work on FIOS has been funded by the Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2); the EU FP7 Activity ICT-4-2.2 under Grant Agreement No. 248458, Multilingual Ontologies for Networked Knowledge (MONNET); the EU Network of Excellence PlanetData (ICT-NoE-257641) project; and the German Ministry of Education and Research (BMBF) within the SMART project (Ref. 02WM0800).