LargeInstanceProcessing
Introduction
Several families of taxonomies have led to potentially large instances (e.g. more than a few tens of kilobytes, up to several gigabytes).
The taxonomies currently known to have this characteristic are:
- Taxonomy of Bank of Indonesia, for which an XBRL White paper has been published (http://www.xbrl.org/sites/xbrl.org/files/imce/lrg_instance_proc_indonesia.pdf);
- Solvency II taxonomies defined by EIOPA (European Insurance and Occupational Pensions Authority: https://eiopa.europa.eu);
- Basel III / CRD IV taxonomies, COREP and FINREP, defined by EBA (European Banking Authority: http://www.eba.europa.eu).
Note: the European taxonomies are intended to be used by all countries of the European Union, and more.
The size of these instances is typically due to lists of details for items such as loans, financial products or assets.
Tests have been made with such instances and have revealed difficulties.
The subject is tackled by XBRL International, in the Standards Board and the Best Practices Board, and the topic has been discussed at XBRL International conferences, notably the 25th XBRL Conference in Yokohama (2012):
- The Challenges of Processing Large Instances by Ashu BHATNAGAR (XBRL International) and Michal PIECHOCKI (BR-AG): http://archive.xbrl.org/25th/sites/25thconference.xbrl.org/files/TECH2Large%20instances%20session.pdf
- Large Instances Technology by Paul WARREN (CoreFiling): http://archive.xbrl.org/25th/sites/25thconference.xbrl.org/files/TECH2LargeInstances.pdf
The Specification working group is working on the subject:
- A Working Group Note has been published by XBRL International, mainly proposing to adopt a streaming solution and an adequate structure for XBRL instances: http://www.xbrl.org/WGN/large-instance-processing/WGN-2012-10-31/large-instance-processing-WGN-WGN-2012-10-31.html.
- An XBRL specification is being developed to specify additional rules for streaming XBRL instances: http://www.xbrl.org/Specification/streaming-extensions-module/PWD-2013-03-06/streaming-extensions-module-PWD-2013-03-06.html.
This Wiki is a forum where this topic can be freely discussed.
Types of issues
Several difficulties may arise at different stages of processing an instance, when:
- loading the taxonomy
- generating the instance
- signing the instance
- transmitting the instance
- parsing the instance
- validating the instance
- checking business rules
- reporting errors
- rendering the instance
Loading the taxonomy
In some cases, big instances correspond to big taxonomies.
When a Data Point Model is used (the case of highly dimensional taxonomies), instances are bigger than with moderately dimensional taxonomies, where some dimensional aspects are left implicit. This large set of dimensional elements also leads to a big taxonomy.
Sometimes it is necessary to split a taxonomy into several entry points to avoid too big a DTS; this was the case for the COREP taxonomy, which had to be split into four parts.
In the case of multi-lingual taxonomies, like the European ones, the existence of labels in several languages also inflates the size of the taxonomy. Care must be taken to load only the labels used in a given country (there are 24 official languages in the European Union, plus Norwegian and Icelandic).
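As an illustration, labels for unused languages can be filtered out when loading a label linkbase. The sketch below keeps only the label resources whose xml:lang belongs to a chosen set; the sample linkbase and the retained languages are illustrative assumptions, not part of any official taxonomy.

```python
import xml.etree.ElementTree as ET

# Qualified names used by XBRL label linkbases.
LINK_LABEL = "{http://www.xbrl.org/2003/linkbase}label"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def filter_labels(linkbase_xml, keep_langs):
    """Return the texts of label resources whose xml:lang is in keep_langs."""
    root = ET.fromstring(linkbase_xml)
    return [el.text for el in root.iter(LINK_LABEL)
            if el.get(XML_LANG) in keep_langs]
```

Applied to a linkbase carrying English, French and German labels with keep_langs={"en", "fr"}, only the English and French texts are retained; labels in other languages never enter the in-memory model.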
Generating the instance
The FRIS document puts constraints on the ordering of units and contexts, which should appear before the facts; this rule must be relaxed because it hinders the streaming of instances.
This aspect is covered by the Working Group Note.
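The relaxed ordering can be sketched as follows: a streaming writer emits each context just before the first fact that uses it, instead of collecting all contexts at the top of the document (which would require knowing them all in advance). The element syntax is deliberately simplified and is not full XBRL.

```python
def write_streaming(facts, out):
    """Write a simplified instance from an iterable of
    (context_id, context_xml, fact_xml) triples, emitting each
    context once, just before its first referencing fact."""
    written = set()
    out.write("<xbrl>\n")
    for context_id, context_xml, fact_xml in facts:
        if context_id not in written:   # first use: emit the context now
            out.write(context_xml + "\n")
            written.add(context_id)
        out.write(fact_xml + "\n")
    out.write("</xbrl>\n")
```

The key property is that the writer needs only constant memory per distinct context, and never has to buffer facts while waiting for their contexts.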
Signing the instance
Typically, supervisors request that transmitted instances be signed, to ensure integrity and non-repudiation.
Sometimes it is also necessary to encrypt the instance to ensure confidentiality.
Security tools may have size limitations, so adequate tools must be used.
It would also be possible to sign or encrypt a compressed file, but this would require a canonical compression algorithm.
Transmitting the instance
Sending a multi-gigabyte document may cause difficulty but should be possible (technologies exist to exchange video files of several gigabytes).
It is also possible to transmit a compressed file, which should be much smaller given the high compression ratio achievable on XML / XBRL files.
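A rough illustration of that compression ratio, using Python's standard gzip on a synthetic, highly repetitive instance (real instances compress less dramatically than this artificial example, but still substantially):

```python
import gzip

# Synthetic instance: many near-identical fact elements, the pattern
# that makes real large instances compress so well.
instance = (b'<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance">'
            + b'<p:Loan contextRef="c1" unitRef="u1">1000</p:Loan>' * 10_000
            + b'</xbrli:xbrl>')

packed = gzip.compress(instance)
ratio = len(instance) / len(packed)   # compression ratio achieved
```

Decompressing `packed` yields the original bytes unchanged, so compression is safe for transmission as long as both ends agree on the algorithm.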
Parsing the instance
This aspect is covered by the Working Group Note.
Validating the instance
In this section, validation means enforcement of the rules defined in XBRL 2.1 and XBRL Dimensions 1.0.
Regarding memory, such validation may be done fact by fact, with no need to keep already-processed facts in memory.
For dimensional validation, the context (or a representation of it) must be accessed; it is thus necessary to keep context-related information available.
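A minimal sketch of this approach, assuming (as the streaming Working Group Note proposes) that contexts appear before the facts that reference them; the validation callback is a placeholder, and a full implementation would also detach cleared facts from the tree root:

```python
import xml.etree.ElementTree as ET

XBRLI = "{http://www.xbrl.org/2003/instance}"

def stream_validate(stream, check_fact):
    """Parse an instance incrementally: contexts are retained for
    dimensional checks, each fact is checked and then discarded."""
    contexts = {}      # context id -> element, kept for the whole run
    fact_count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == XBRLI + "context":
            contexts[elem.get("id")] = elem           # keep contexts
        elif elem.get("contextRef") is not None:      # an XBRL fact
            check_fact(elem, contexts[elem.get("contextRef")])
            fact_count += 1
            elem.clear()   # the fact's content is no longer needed
    return fact_count
```

Memory use is then bounded by the number of distinct contexts rather than by the number of facts, which is what makes multi-gigabyte instances tractable.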
Checking business rules
Business checks are typically exercised through assertions (defined by the XBRL Formula specifications).
This is a difficult point for XBRL processors, which spend a lot of time on this task.
New XBRL specifications may be developed and used to optimise processing of large instances. For example, the Extensible Enumerations specification (http://www.xbrl.org/Specification/ext-enumeration/REC-2014-10-29/ext-enumeration-REC-2014-10-29.html) may be used to avoid having to check the validity of enumerations using assertions.
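A sketch of that idea: once the taxonomy itself provides the set of allowed domain members for an enumeration-typed concept, checking a fact reduces to a set lookup instead of a dedicated assertion per concept. The member QNames below are hypothetical, not taken from any published taxonomy.

```python
# Hypothetical set of allowed members, as would be derived from an
# Extensible Enumerations definition in the taxonomy.
ALLOWED_COUNTRY_MEMBERS = {"eg:CountryFR", "eg:CountryDE", "eg:CountryIT"}

def check_enumeration(fact_value, allowed):
    """True when the fact's value is one of the permitted members."""
    return fact_value in allowed
```

For a large instance, this replaces potentially thousands of assertion evaluations with constant-time lookups.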
Software providers may propose optimisations in the expression of formulas (for example, suppressing unneeded filters, factorising filters used several times, or putting expressions in variables).
Several optimisations may be considered (to be discussed).
Disposal of facts no longer needed
To process assertions, all information in the instance must be accessible, except for facts for which all assertion evaluations have already been fired. For example, a fact that binds alone to an assertion (e.g. A > 0) no longer needs to be accessible for that assertion once it has fired.
If, for each fact, a reference count is initialised with the number of possible assertion evaluations concerning that fact and decremented each time such an evaluation fires, it becomes possible to free the memory associated with the fact once the count reaches zero.
However, freeing the memory of a single fact has some drawbacks:
- given memory fragmentation, it may be suboptimal for languages that use a garbage collector, like Java or C#;
- computing the possible number of evaluations may be difficult, considering implicit filtering and fall-back values;
- the memory consumption may be lower, but the time taken to maintain the reference counts would increase the processing time.
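Despite these caveats, the mechanism itself is simple. A minimal sketch, assuming the expected evaluation counts are supplied by the caller (computing them is, as noted above, the difficult part):

```python
class FactStore:
    """Holds fact values, releasing each one when no assertion
    evaluation can still need it (reference-count reaches zero)."""

    def __init__(self):
        self._facts = {}   # fact id -> [value, remaining evaluations]

    def add(self, fact_id, value, expected_evaluations):
        self._facts[fact_id] = [value, expected_evaluations]

    def get(self, fact_id):
        return self._facts[fact_id][0]

    def evaluation_fired(self, fact_id):
        entry = self._facts[fact_id]
        entry[1] -= 1
        if entry[1] == 0:          # no evaluation still needs this fact
            del self._facts[fact_id]

    def __len__(self):
        return len(self._facts)
```

Whether the bookkeeping cost is worth the memory saved is exactly the trade-off listed above.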
Slicing the instance into reporting units
Some taxonomies, such as the European banking and insurance supervisory taxonomies defined by EBA, EIOPA and other supervisors in Europe, use the concept of a reporting unit. Reporting units allow:
- partial filing, to implement proportionality and materiality principles (small reporters file less than big ones, and only significant information is reported); and
- conditional triggering of business checks (aka XBRL assertions) corresponding to what has been reported, using "filing indicators". For taxonomies defined through templates, there is a correspondence between reporting units and templates.
Given a taxonomy, it is possible to determine which reporting unit(s) a fact belongs to, and to slice large instances into smaller chunks that are easier to process. Each chunk corresponds to a reporting unit, or to a set of reporting units if cross-reporting-unit assertions are defined over that set.
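A sketch of such slicing, assuming the taxonomy-derived mapping from concept to reporting unit is given as an input; reporting units linked by cross-reporting-unit assertions are merged into one chunk (a simple merge suffices here; union-find would scale better for chained groups):

```python
from collections import defaultdict

def slice_instance(facts, concept_to_unit, cross_unit_groups=()):
    """Partition (concept, value) facts into chunks keyed by
    frozensets of reporting units that must be processed together."""
    unit_to_group = {}
    for group in cross_unit_groups:      # units tied by cross-unit rules
        key = frozenset(group)
        for unit in group:
            unit_to_group[unit] = key
    chunks = defaultdict(list)
    for concept, value in facts:
        unit = concept_to_unit[concept]
        chunks[unit_to_group.get(unit, frozenset({unit}))].append((concept, value))
    return dict(chunks)
```

Each chunk can then be validated independently, which bounds the working set to one reporting unit (or one tied group) at a time.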
Reporting errors
Big instances may lead to a large number of errors. With some test instances, log files of several gigabytes were produced.
When processing large instances that may produce a large number of errors, it may be wise to restrict the log file to data only and use a rendering mechanism, such as XSLT, to present human-friendly error messages. Table linkbases may be used to present the errors in templates.
But this practice only limits the size of the log files and does not solve the problem. A mechanism stopping the process after a given number of errors has been reached may be implemented.
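A sketch of such a mechanism: the log keeps structured records only (rendering to human-friendly text being left to a later step, e.g. XSLT), and processing stops once a configurable limit is reached. The field names are illustrative.

```python
class TooManyErrors(Exception):
    """Raised to abort processing once the error limit is reached."""

class ErrorLog:
    def __init__(self, max_errors=1000):
        self.max_errors = max_errors
        self.errors = []               # structured records, not prose

    def report(self, rule_id, fact_id, detail):
        self.errors.append({"rule": rule_id, "fact": fact_id,
                            "detail": detail})
        if len(self.errors) >= self.max_errors:
            raise TooManyErrors("stopped after %d errors" % self.max_errors)
```

Keeping records structured also makes it cheap to group or deduplicate errors before rendering, which a gigabyte-scale prose log does not allow.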
Rendering the instance
Rendering large instances is also a challenge.
A large instance may exceed the capacity of spreadsheets and/or be impossible for humans to fathom as a whole. Dynamic rendering tools such as Web browsers or Business Intelligence technology may be used.
Big instances may also imply dynamicity in tables, i.e. variable numbers of columns, rows and/or sheets (corresponding to the x, y and z axes of the XBRL Table linkbase technology). These axes may correspond to dimensions with a large set of members or values, such as countries, currencies or assets. For some members, no facts may be reported, or the reported value may not be significant. For such "dynamic axes", it should be possible to present only the significant information in a given order; the significance and the collation order may be determined by the values of other axes.