Financial Information Observation System (FIOS) Final Submission - XBRL Challenge

Andreas Harth
AIFB, Karlsruhe Institute of Technology, Germany
Benedikt Kämpgen
AIFB, Karlsruhe Institute of Technology, Germany
Sean O’Riain
DERI, National University of Ireland, Galway

This is a summary of the research conducted for the XBRL Challenge.

Statement of Purpose (description): The Financial Information Observation System (FIOS) provides a web-based Online Analytical Processing (OLAP) interface for multi-company analysis over an integrated dataset, consisting of XBRL data from the Securities and Exchange Commission (SEC) and the Federal Financial Institutions Examination Council (FFIEC) as well as open web data from Wikipedia/DBpedia and Freebase.

Introduction

There is an ever-increasing amount of data available, which enterprises can leverage to their competitive advantage. While decision makers require an integrated view over information, relevant data typically originates from a wide variety of disparate sources.

With FIOS we show how to merge XBRL data (from SEC EDGAR and the FFIEC) with freely available Semantic Web data (from Wikipedia and Freebase). Our system uses the increasingly popular Linked Data principles for accessing and encoding data. We use standard open-source tools to transform SEC EDGAR and FFIEC data and make it openly available according to Linked Data principles, and to collect and integrate data from various sources. The FIOS user interface hides the underlying RDF data format of Linked Data from the user and bases interaction on intuitive, interactive operations over well-known OLAP data cubes. We have developed olap4ld to bridge the gap between Semantic Web datasets and OLAP user interfaces.

FIOS currently incorporates the following data sources:

Features of FIOS

The goal of FIOS is to demonstrate:

In the following we explain how FIOS addresses the requirements of the XBRL challenge.

Improves access - enables investor stakeholder access to corporate data
With FIOS, stakeholders can gain a holistic view of data that was previously dispersed across multiple source locations and held in isolated data silos. FIOS provides access to seamlessly integrated data. Stakeholders can begin data analysis based on one source, such as XBRL data from the SEC, and traverse to data from other sources to "discover" and build a better information picture for their analysis activity, e.g., starting with some companies' financial statements or metrics, exploring them based on additional company information, and comparing discovered patterns with external statistics. For instance, if companies mentioned in SEC filings are linked to companies listed in Freebase, and since Freebase classifies companies according to industry sectors different from those of the SEC, users can select and view XBRL company data according to Freebase's sector taxonomy. FIOS's dynamic extract-transform-load pipeline can be re-run regularly to provide access to up-to-date data.
Usability - application quality, usability, and accessibility
As FIOS uses the multidimensional data model well known from OLAP decision support systems, users can leverage their existing knowledge of OLAP systems and spreadsheets to access corporate XBRL data. We use Saiku to demonstrate the features of FIOS since Saiku is open-source and web-based, and its interface supports functionality such as the selection of cubes, dimensions, and measures, as well as slice and dice, which are supported by most OLAP tools, e.g., Tableau, Palo, and JPivot.
Design - originality and creativity, cannot be drawn from an existing design
FIOS provides novel overall functionality via a unique combination of existing open-source offerings with an open-source implementation to connect OLAP interfaces to Semantic Web datasets (olap4ld). FIOS integrates different components using Semantic Web standards and technology to provide an interoperable common data space that supports agile information search and association with a wider range of data sources. Specifically, FIOS combines the power of semantic technologies with the ease-of-use of OLAP based user interfaces.
Extensibility - potential for further development and use
FIOS's component architecture is modular and can accommodate system extensibility through changes in its three main functionality areas of source inclusion, query enhancement and user interface. FIOS currently supports basic OLAP operations that provide analysis capabilities on data cubes. All components of FIOS are open-source.
Participation potential - engages and motivates target audience of investors and analysts
All data is web-accessible and freely available. The integrated information is also web-accessible and in turn interoperable and linkable with other data sources; the data can be accessed as a resource, analysis conducted, comments added and also easily shared. In this manner third-parties can link to and integrate data. If supported by the UI, FIOS also allows for queries, query results, and visualisations to be shared and discussed.

Efforts towards enhanced data sharing within organisations will encounter a cultural barrier to adoption as the nature of information ownership changes. The motivation is therefore to provide enhanced accessibility and transparency of linked information, allowing greater use to be made of that data.

Tutorial

Our demo illustrates the potential of FIOS:

  1. to query and analyse SEC XBRL data using OLAP
  2. to query and analyse SEC XBRL data according to background information not easily available otherwise, e.g., business information from external sources
  3. and to correlate and compare SEC XBRL data with other financial statistics, e.g., XBRL data from the Federal Financial Institutions Examination Council (FFIEC).

We have made available a screencast demonstrating a query in FIOS via YouTube. We have also made a demonstration of FIOS available on the web. The FIOS Saiku Demo can be accessed using username b and password b.

The OLAP interface consists of two parts:

For the demo we have prepared two data cubes, "SEC-Cube-Gross-Profit-Margin" and "SEC-FFIEC-Cube". As an example, we query for the "Sales revenue net" of specific companies. We select the Issuer dimension (the issuers of disclosures) and drag-and-drop the "Issuer root level" into "columns". We click on "Issuer root level" to select only "WELLPOINT, INC" and "WEYERHAEUSER CO". Then, we drag "Dtend root level" (the valid end date of disclosures) into "rows". We now see the "Cost of goods sold": as the first measure listed, it is selected for the query automatically ("Sales revenue net" can be dragged to the filter in order to be selected instead). If there are several disclosures per cell, we see the number of disclosures instead of a value. We can add "Dtstart root level" (the valid start date) to "rows" to do a cross-join and drill further down.
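The query steps above amount to a dice on the Issuer dimension, the selection of one measure, and a pivot of Dtend (rows) against Issuer (columns). The following is a minimal Python sketch of that logic, assuming disclosures are plain tuples. The function name `pivot`, the tuple layout, the dates, and all values are hypothetical placeholders; none of them come from FIOS or Saiku.

```python
from collections import defaultdict

# Hypothetical disclosures as (issuer, dtend, measure, value); numbers are placeholders.
DISCLOSURES = [
    ("WELLPOINT, INC", "2009-12-31", "Sales revenue net", 10.0),
    ("WELLPOINT, INC", "2010-12-31", "Sales revenue net", 12.0),
    ("WELLPOINT, INC", "2010-12-31", "Sales revenue net", 12.5),
    ("WEYERHAEUSER CO", "2009-12-31", "Sales revenue net", 7.0),
    ("WEYERHAEUSER CO", "2009-12-31", "Cost of goods sold", 5.0),
    ("OTHER ISSUER", "2009-12-31", "Sales revenue net", 1.0),
]


def pivot(disclosures, issuers, measure):
    """Dice on issuer, select one measure, key cells by (dtend row, issuer column)."""
    cells = defaultdict(list)
    for issuer, dtend, name, value in disclosures:
        if issuer in issuers and name == measure:
            cells[(dtend, issuer)].append(value)
    # One disclosure per cell: show its value. Several: show how many there are.
    return {
        key: {"value": vals[0]} if len(vals) == 1 else {"count": len(vals)}
        for key, vals in cells.items()
    }


table = pivot(DISCLOSURES, {"WELLPOINT, INC", "WEYERHAEUSER CO"}, "Sales revenue net")
for (dtend, issuer), cell in sorted(table.items()):
    print(dtend, issuer, cell)
```

Adding Dtstart to the rows would correspond to extending the cell key with a third component, which is what the cross-join in the UI does conceptually.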

FIOS's capability to integrate metadata from external sources can be seen when removing the "Issuer root level" and instead adding "Business business operation industry" to columns. Disclosures are then grouped by the industry category of the issuing companies as given by Freebase.
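Grouping by industry can be read as replacing each issuer with the industry it is linked to in the external source and then aggregating per group. The sketch below illustrates this with a hypothetical link table; the issuer and industry labels are placeholders, and the "(unlinked)" bucket for issuers without a link is an assumption of this sketch, not documented FIOS behaviour.

```python
from collections import Counter

# Hypothetical link table: SEC issuer -> industry label from the external source.
ISSUER_INDUSTRY = {
    "Issuer A": "Industry X",
    "Issuer B": "Industry X",
    "Issuer C": "Industry Y",
}

# Hypothetical disclosures as (issuer, measure) pairs.
DISCLOSURES = [
    ("Issuer A", "Sales revenue net"),
    ("Issuer B", "Sales revenue net"),
    ("Issuer C", "Sales revenue net"),
    ("Issuer D", "Sales revenue net"),  # no link to the external source
]


def disclosures_per_industry(disclosures, issuer_industry):
    """Replace the issuer by its industry and count disclosures per group."""
    return Counter(
        issuer_industry.get(issuer, "(unlinked)") for issuer, _measure in disclosures
    )


groups = disclosures_per_industry(DISCLOSURES, ISSUER_INDUSTRY)
print(dict(groups))
```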

FIOS's capability to integrate different XBRL data is demonstrated by our second data cube, "SEC-FFIEC-Cube". Using this cube, disclosures of both the SEC and the FFIEC can be queried. Select SEC-FFIEC-Cube in the drop-down menu. Next, drag-and-drop the "Issuer root level" to columns and drag all measures into rows. Note that "Cost of goods sold" and "Sales revenue net" are measures from SEC filings, whereas the automobile loan measure "137 RCONK" comes from an FFIEC filing. We see that each measure is disclosed by at least one company.

In summary, FIOS allows for OLAP over financial data; the data can originate from a single XBRL source, from multiple XBRL sources (such as SEC EDGAR and FFIEC), and from XBRL sources integrated with external data sources.

Implementation

FIOS represents the instantiation of a dynamic extract-transform-load pipeline: data is collected from the data sources via an open-source crawler (LDSpider) and stored in an open-source database for RDF data, i.e., a triple store (OpenLink Virtuoso). Once the data is loaded into the triple store, data cubes are either found automatically in the data or defined manually by issuing queries on the triple store. The schema used for finding and defining data cubes in RDF is the RDF Data Cube vocabulary. We make the triple store available on the web for the FIOS UI and other applications.
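Finding data cubes in a triple store can be pictured as a SPARQL query over the RDF Data Cube vocabulary, which describes a cube as a `qb:DataSet` linked to its structure definition via `qb:structure`. The sketch below only builds such a query and a GET URL using the `query` parameter of the SPARQL protocol; it sends nothing over the network. The endpoint URL and the function names are hypothetical, and this is not necessarily the query that FIOS or olap4ld issues.

```python
from urllib.parse import urlencode

# Namespace of the RDF Data Cube vocabulary.
QB = "http://purl.org/linked-data/cube#"


def cube_discovery_query():
    """SPARQL query listing every data cube together with its structure definition."""
    return (
        f"PREFIX qb: <{QB}>\n"
        "SELECT DISTINCT ?dataset ?dsd WHERE {\n"
        "  ?dataset a qb:DataSet ;\n"
        "           qb:structure ?dsd .\n"
        "}"
    )


def sparql_get_url(endpoint, query):
    """Encode the query as the 'query' parameter of a SPARQL protocol GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = cube_discovery_query()
# Hypothetical endpoint; replace with the address of an actual triple store.
url = sparql_get_url("http://example.org/sparql", query)
print(query)
print(url)
```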

In the following we briefly describe how XBRL data from the SEC has been modelled as data cubes and how the SEC data could be integrated with other data sources. The following figure gives an overview of the modelling in a class diagram:

Modelling in FIOS

The SEC publishes large numbers of XBRL files as "interactive filings" on its website. The central elements of a filing are disclosures. In an example disclosure, a specific company discloses that in the first quarter of the fiscal year 2009 it had a sales revenue net of 1,000,000 USD.

Disclosures are uniquely identified by the following properties that we call dimensions:

A disclosure contains one or more values that are fully dependent on the dimensions, which are called measures. The meaning of a measure is given by attributes:

Based on this modelling of SEC XBRL data, the data could be integrated with other data sources: from Freebase we get higher-level information about issuers (operation industry), and from the FFIEC Linked Data Wrapper we receive additional XBRL filings with disclosures. Those disclosures share similar dimensions, e.g., issuer, but have different measures, e.g., about automobile loans (137 RCONK).
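The modelling described here can be sketched as a small data structure: the dimension values form a key that identifies a disclosure, and the measures are values that depend on that key. The Python sketch below is an illustration under assumptions, not the FIOS schema: it uses only the three dimensions named in the tutorial (issuer, start date, end date), treats the unit as an example attribute, assumes calendar dates for the first quarter of 2009, and invents the class and function names as well as the FFIEC value.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dimensions:
    """Dimension values identifying one disclosure (only those named in the tutorial)."""
    issuer: str
    dtstart: str
    dtend: str


@dataclass
class Disclosure:
    dimensions: Dimensions
    measures: dict = field(default_factory=dict)  # measure name -> (value, unit)


def integrate(*sources):
    """Merge disclosures from several sources into one cube keyed by dimensions."""
    cube = {}
    for source in sources:
        for disclosure in source:
            cube.setdefault(disclosure.dimensions, {}).update(disclosure.measures)
    return cube


# The example disclosure from the text; the Q1 2009 calendar dates are an assumption.
q1 = Dimensions("Example Company", "2009-01-01", "2009-03-31")
sec = [Disclosure(q1, {"Sales revenue net": (1_000_000, "USD")})]
# Hypothetical FFIEC disclosure: same dimensions, different measure, placeholder value.
ffiec = [Disclosure(q1, {"137 RCONK": (0, "USD")})]

cube = integrate(sec, ffiec)
print(cube[q1])
```

Because both sources share the same dimension key, their measures end up side by side in one cell of the combined cube, which is what makes a joint cube over both sources possible.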

In FIOS, users interact with the data using common OLAP operations such as slice and dice. The component that enables this functionality is a middleware that connects Linked Data with standard OLAP clients. We make this component, olap4ld, available as open-source.

The following figure gives an overview of the software components of FIOS:

FIOS components

Conclusion

We have presented FIOS, a modular, extensible system for enabling OLAP analysis over XBRL and non-XBRL data integrated from several sources. FIOS leverages semantic technologies (in particular, Linked Data) and various open-source systems to achieve elaborate functionality. The use of semantic technologies allows for a "pay-as-you-go" strategy to incrementally integrate a broad variety of datasets. We have developed a crucial component, olap4ld, to bridge the gap between Semantic Web datasets and OLAP user interfaces. Future work includes support for the full set of common OLAP operations for analyses by information professionals, specifically dimension hierarchy drill-down, complex measures, aggregation functions, and what-if scenarios.

Acknowledgements

Work on FIOS has been funded by the Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2); the EU FP7 Activity ICT-4-2.2 under Grant Agreement No. 248458, Multilingual Ontologies for Networked Knowledge (MONNET); the EU Network of Excellence PlanetData (ICT-NoE-257641) project; and the German Ministry of Education and Research (BMBF) within the SMART project (Ref. 02WM0800).