Banks have expanded their roles from traditional banking into verticals such as Wealth Management and Proprietary Trading. In doing so, they have developed a need for sophisticated Market Data usage across various departments, often originating from similar feeds with minimal customisation. Each of these Market Data feeds needs cleansing and validation, which consumes resources from the bank's operations teams, resulting in time inefficiencies, manual errors and increased costs. There are also legacy systems that lack the capability to source the required data from multiple formats, such as CSV and XML files, and cannot interact directly with other systems. IT teams require considerable time, human and financial resources to maintain these Market Data systems. They currently lack agility in adapting to new and modified source and distribution data formats, resulting in direct negative business impact. Sourcing, storage, cleansing, enrichment and distribution of market data require agility and rapid time to market in order to be of direct business value to the continuously changing, sophisticated world of finance.

All of the above challenges are well known to banks; it is therefore no surprise that the Basel Committee on Banking Supervision formally identified them in its paper Principles for effective risk data aggregation and risk reporting, issued in January 2013. The Basel paper clearly specifies 14 principles that banks need to review and adopt as part of their effort to be compliant. Since the Global Financial Crisis (GFC), the focus on data has increased with data-centric regulatory examinations. These examinations require financial institutions to prove data quality, reconcile data and demonstrate traceability of risk data from source to final reports (www.deloitte.com-assets-riskdata_pdf_120913).

ApiroRates is a versatile, high-performance software platform that facilitates an organisation's ability to collect, validate, cleanse, enrich, audit, process and distribute market data gathered from a variety of sources, including automated systems and customisable specialty feeds. The qualified data is available for intraday or end-of-day distribution and analysis.

ApiroRates bridges the gap between business and technology and also provides the solution for Basel principles 2-6 (refer to Appendix A), specifically for Market Data.

Furthermore, a second paper, issued by the Basel Committee on Banking Supervision in December 2013 to document progress to date with respect to the implementation of these 14 principles, revealed that the least progress has been made with principles 2, 3 and 6, which is exactly where ApiroRates is specifically positioned for Market Data.

Market Data refers to indices, pricing and other trade-related information for financial instruments such as Bonds, Treasury Bills, Foreign Exchange, Equities and Commodities, as reported by trading venues such as stock exchanges and brokers on OTC platforms.


1. Sourcing data from multiple data feeds

2. Sourcing multiple data formats

3. Configurable transformation of data, e.g. XML, CSV, JSON

4. Configurable raw data violation rules (zero, negative)

5. Validation

6. Cleansing

7. Enrichment

8. Error reporting

9. Configurable market data aggregation algorithms (average)

10. Configurable aggregation violation rules, e.g. stale, threshold, minimum sources

11. Post aggregation cleansing rules (roll forward intra/EOD, re-source)

12. Post aggregation processors (corporate actions, fill gaps (bonds) against an index)

13. Holiday Calendars

14. Corporate actions

15. Corporate actions sourcing

16. Processing over 50,000 rates in a few seconds

17. Sourcing/storing/adjusting/distributing 10 years of historical data

18. Approval workflows

19. Audit

20. Configurable APIs

21. Requires only commodity hardware

22. Scheduling for sourcing and distributions

23. Distributions (manual/auto) to downstream systems

24. Distributions in various formats

25. Rate Collections

26. Advanced exports to support calculations, e.g. regression, correlation, yield curves

27. Advanced, flexible and powerful configurable algorithms.
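As a hedged illustration of items 4 and 5 above, a configurable raw-data violation rule (zero, negative) might look like the following sketch. The function and rule names and the rate structure are hypothetical, not the ApiroRates API:

```python
# Illustrative sketch of configurable raw-rate violation rules.
# Rule names and the rate dictionary structure are assumptions.

VIOLATION_RULES = {
    "zero": lambda value: value == 0,
    "negative": lambda value: value < 0,
}

def check_raw_rate(rate, fields, rules=("zero", "negative")):
    """Return a list of (field, rule) violations for the given raw rate."""
    violations = []
    for field in fields:
        value = rate.get(field)
        if value is None:
            continue
        for rule in rules:
            if VIOLATION_RULES[rule](value):
                violations.append((field, rule))
    return violations

raw = {"symbol": "TRS", "bid": 0.0, "ask": -101.5, "last": 101.0}
print(check_raw_rate(raw, ["bid", "ask", "last"]))
# [('bid', 'zero'), ('ask', 'negative')]
```

Because the rule set is data, new violation rules can be registered without touching the checking logic, which is the spirit of the configurability the list describes.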


1. Scenario set group maintenance

2. Risk Factor tagging in scenario set

3. Portfolio position file replication

4. Scenario set perturbation settings

5. Risk factor proxy tagging

6. Risk factor gap fill & back fill

7. History computation based on a proxy index

8. Back fill and gap fill based on proxy index

9. Daily historical scenario set of 252 data points

10. Weekly stress scenario set of selected date range of 252 data points

11. Daily historical scenario set for a selected asset class (RCC)

12. Rate drill down
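To make item 9 above concrete, a daily historical scenario set of 252 data points can be derived from 253 consecutive closing prices as 252 daily relative returns. This is a generic sketch of the idea, not the ApiroRates implementation:

```python
# Illustrative sketch: build a daily historical scenario set of
# 252 data points from a price history (names are assumptions).

def daily_return_scenarios(prices, points=252):
    """Return the last `points` daily relative returns from a price series."""
    if len(prices) < points + 1:
        raise ValueError("need at least points + 1 historical prices")
    window = prices[-(points + 1):]
    return [b / a - 1.0 for a, b in zip(window, window[1:])]

# 300 synthetic closing prices; the scenario set keeps the last 252 returns.
prices = [100.0 + 0.1 * i for i in range(300)]
scenarios = daily_return_scenarios(prices)
print(len(scenarios))  # 252
```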


The above diagram illustrates the recommended phased approach of deploying a new instance of ApiroRates.

Phase 1: The purpose of the first phase (30-90 days) is to deploy ApiroRates and ensure that all data feeds are configured and that daily/intraday/historical sourcing is in place for both manual and scheduled triggers. Data aggregation, validation, cleansing, enrichment, audit and distribution to downstream systems are established in such a way that the ApiroRates platform and its flexible, configurable algorithms morph into a system which complies not only with global regulatory standards but also with the local and internal rules and regulations constraining a specific bank.

Phase 2: It is recommended to allow the system to run in production for 1-3 months in order to undergo a number of monthly internal compliance and audit reviews aimed at fine-tuning the data aggregation, validation and enrichment algorithms to comply with the instructions of internal regulators and auditors; e.g. threshold algorithms may need to be adjusted. Phase 2 is important because, if risk calculations are based on inaccurately aggregated market data, this can lead to misleading results such as incorrect end-of-day positions and stress test scenario results.

Phase 3: The ApiroRates PLUS features are deployed in production once any lessons learned from Phase 2 have been taken into consideration. These features are specific to risk. They facilitate the efficient and effective generation of market data scenario sets, risk factor sets, stress testing sets, etc., in order to empower risk systems to perform their calculations based on an aggregated, validated, cleansed and enriched set of rates.


This section provides a number of examples for each asset class and the challenges currently faced by financial institutions. Its purpose is to reinforce the argument that, in order to achieve the compliance objectives for Market/Credit/Operational risk, we require aggregated, validated, cleansed and audited market rates.


Typical examples of the attributes maintained by ApiroRates for Equities are:

• Equity Spots: bid, ask, last, volume

• Equity Futures: settlement date, underlying spot price, future price, bid, ask, volume

• Equity Options: strike, bid, ask, last, underlying price, volume, option price, option type, option expiry date


• A bank is sourcing rates from Bloomberg for a specific equity. The rates management team is distributing historical rates to risk systems. After a share split, the rate symbols have changed and the downstream system does not have a way of mapping the old symbols to the new ones.

• Challenge: If data integrity is not managed properly the downstream risk systems will treat the new symbol as if it has no history.

• ApiroRates: Solves this problem by automatically adjusting all affected and new rates to ensure referential integrity.
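The adjustment described above can be sketched as follows. Assuming the corporate action carries the old symbol, the new symbol and the split ratio (all illustrative field names, not the ApiroRates data model), preserving referential integrity amounts to remapping and rescaling the history:

```python
# Hypothetical sketch: preserve price history across a share split
# combined with a symbol change. The corporate-action structure is
# an assumption for illustration only.

def apply_split(history, action):
    """Remap old-symbol history to the new symbol, dividing prices by
    the split ratio so downstream systems keep a continuous history."""
    old, new, ratio = action["old_symbol"], action["new_symbol"], action["ratio"]
    adjusted = {}
    for symbol, points in history.items():
        if symbol == old:
            points = [(date, price / ratio) for date, price in points]
            symbol = new
        adjusted[symbol] = points
    return adjusted

history = {"ABC": [("2024-01-02", 200.0), ("2024-01-03", 210.0)]}
split = {"old_symbol": "ABC", "new_symbol": "ABC.N", "ratio": 2.0}
print(apply_split(history, split))
# {'ABC.N': [('2024-01-02', 100.0), ('2024-01-03', 105.0)]}
```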


• A bank has a position against a specific equity. The rates management team is distributing historical rates to risk systems to run correlations or other calculations. The company in reference was in a trading halt for 5 days.

• Challenge: The next time rates are distributed to risk, there will be a 5 day gap in the historical values distributed.

• ApiroRates: Solves this problem by using configurable post-processors that fill these gaps using a specified proxy index, e.g. NIFTY.


• At the end of every month the bank needs to update the tenor and maturity of all equities it has positions against.

• Challenge: The product quality team has to manually update all equities at least once a month. This is a time-consuming, manual and error-prone approach that is difficult to audit.

• ApiroRates: All updates are automatically done by the system and referential integrity is maintained. All relevant attributes, including maturity, are automatically updated and the relevant audit entries are recorded.


• A stock is traded on multiple stock exchanges, e.g. TRS on both NSE and BSE.

• Challenge: A bank has a position against TRS. Which of the two exchanges' values should it use for the EOD position against TRS?

• ApiroRates: Will aggregate the two EOD values into one price. The out-of-the-box algorithm uses a median calculation, but this is fully configurable and different methods can be used depending on the instrument and the complexity of the asset class in reference.


• Example: When COAL India got listed on the market (assuming the equity dealing desk applied for this issue and was allotted some quantity).

• Challenge: The bank needs to closely monitor the Market Risk exposure associated with this position, but there is no history for this new rate.

• ApiroRates: Allows users to select a proxy index, such as a sector index or the country index (NIFTY), to automatically generate historical rates when the instrument is added to the system. One way this is achieved is by using the historical dataset of the proxy index and running a linear regression of daily returns to create historical rates for COAL India. This allows the system to generate scenario sets and, consequently, a VaR for a position that does not actually have real historical rates.
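The proxy-based history described above can be sketched as a two-step procedure: regress the stock's few observed daily returns on the proxy's returns to estimate a beta, then walk backward from the first observed price using beta-scaled proxy returns. This is an illustrative sketch under those assumptions, not the ApiroRates algorithm:

```python
# Sketch of proxy-index backfill: OLS beta on daily returns, then a
# backward walk from the first observed price. Illustrative only.

def ols_beta(x, y):
    """Ordinary-least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def backfill(first_price, proxy_prior_returns, beta):
    """Synthesise pre-listing prices, oldest first, by walking backward
    from the first observed price using beta-scaled proxy returns."""
    prices = [first_price]
    for r in reversed(proxy_prior_returns):
        prices.append(prices[-1] / (1.0 + beta * r))
    return list(reversed(prices))[:-1]  # drop the real first price

# A few observed daily returns since listing, alongside the proxy (e.g. NIFTY).
stock_returns = [0.010, -0.004, 0.006]
proxy_returns = [0.008, -0.002, 0.005]
beta = ols_beta(proxy_returns, stock_returns)
synthetic = backfill(245.0, [0.004, -0.003], beta)
print(len(synthetic))  # 2 synthetic pre-listing prices
```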


Typical examples of the attributes maintained by ApiroRates for Bonds and Treasury Bills are:

• Bonds Spot: bid, ask, last, coupon, rating, maturity, yield, face value

• Bond Futures: bid, ask, last, future expiry, future settlement date, volume

• Options on Bonds: strike, option price, option expiry date, ask, last, bid


• A risk team needs historical values to run simulations and calculations such as:

• Historical Simulation

• Monte Carlo

• Parametric – Analytic Method – Delta normal

• Challenge: During holidays that are non-trading days, there will be gaps in historical rates.

• ApiroRates: Will automatically fill in these gaps by using:

• Roll Forwards from previous days

• Calculations against a proxy index

• Interpolation

• Post aggregation processing using average of bid and ask
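Two of the gap-filling options listed above, roll-forward from previous days and linear interpolation, can be sketched as follows. Day indices stand in for business dates; the function name and series layout are assumptions for illustration:

```python
# Illustrative sketch of holiday gap filling: roll-forward and linear
# interpolation over missing day indices. Not the ApiroRates API.

def fill_gaps(series, method="roll_forward"):
    """series: list of (day_index, value) pairs with gaps in day_index."""
    filled = []
    for (d0, v0), (d1, v1) in zip(series, series[1:]):
        filled.append((d0, v0))
        for d in range(d0 + 1, d1):
            if method == "roll_forward":
                filled.append((d, v0))          # carry last value forward
            else:
                # linear interpolation between the surrounding points
                filled.append((d, v0 + (v1 - v0) * (d - d0) / (d1 - d0)))
    filled.append(series[-1])
    return filled

series = [(1, 100.0), (4, 106.0)]        # days 2 and 3 missing (holidays)
print(fill_gaps(series))                 # roll-forward: 100.0 on days 2, 3
print(fill_gaps(series, "interpolate"))  # 102.0 and 104.0 on days 2, 3
```

A proxy-index calculation, the other option above, would replace the inner loop with returns derived from the proxy's history over the same dates.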



A bank has taken a new exposure against an FX Futures contract. The bank needs to generate market, credit and liquidity risk reports using historical simulations, Monte Carlo simulations or other calculations, build a hypothetical P&L and compute VaR at portfolio level.

Challenge: It is currently difficult, manually intensive and error-prone to source historical rates because the data mostly resides in spreadsheets. Furthermore, it is challenging to include the new exposure and its calculations in a portfolio-level report.

ApiroRates: Has specific features for historical sourcing that make data immediately ready for distribution to downstream risk and other systems. Furthermore, it has the concept of Rate Collections, ensuring that the new exposure can be included and that a single distribution to a downstream risk system can cover all exposures belonging to the portfolio.


During crisis situations and requests to meet supervisory queries, risk teams and risk systems may require different sets of data at different times. They may require daily historical data points of one or more risk factors for ad hoc date ranges. It is extremely important for these data sets to be readily available, as well as aggregated, cleansed, accurate and audited; otherwise the risk calculations will be based on inaccurate data, which will lead to inaccurate risk reports.


• During supervisory reviews and queries, a risk system is required to run a stress test scenario for a specific date range for a number of risk factors. For example, we need to run a historical simulation considering the dates from 01 June 2007 to 01 June 2008.

• Challenge: The stress test scenario is likely to include a number of rates across different asset classes. This information currently sits in different systems or spreadsheets and has to be manually extracted and imported into risk systems to perform calculations.

• ApiroRates: Allows, in a single screen, the selection of multiple risk factors and a date range spanning 10 historical years (over 3,500 data points for each risk factor; there is practically no upper limit in the system). It also allows the distribution of this dataset to downstream risk systems in a predefined format such as CSV, XML or even the proprietary format of the downstream system. These distributions can be ad hoc and manually triggered, or scheduled for automatic distribution on a daily, weekly or monthly basis or any desired recurring cycle.
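One plausible shape for such a multi-factor, date-ranged CSV distribution is one row per date with one column per risk factor. The layout and function below are assumptions for illustration, not the ApiroRates export format:

```python
# Illustrative sketch: format a selection of risk factors over a date
# range as a CSV distribution. Column layout is an assumption.
import csv
import io

def to_csv(rates):
    """rates: {risk_factor: {date: value}} -> CSV text, one row per date."""
    factors = sorted(rates)
    dates = sorted({d for series in rates.values() for d in series})
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["date"] + factors)
    for d in dates:
        # blank cell where a factor has no value for that date
        writer.writerow([d] + [rates[f].get(d, "") for f in factors])
    return out.getvalue()

rates = {
    "AUD/USD": {"2007-06-01": 0.8301, "2007-06-04": 0.8322},
    "GOLD": {"2007-06-01": 659.1, "2007-06-04": 674.5},
}
print(to_csv(rates))
```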


• A risk analyst within the bank needs to calculate the spreads between multiple yield curves.

• Challenge: In order to do this, the individual needs to manually extract the information from numerous spreadsheets and feed this into a system to plot and analyze the yield curves.

• ApiroRates: The same screen described above can be used to select different risk factors, including multiple bonds, and distribute them as a single CSV file directly into the downstream system.


• A risk team requires 252 days of historical data in order to run a specific correlation equation. Furthermore, there is a need to use different shocks that will be used for the specified simulation.

• Challenge: This is another example where the whole process needs to be carried out manually using spreadsheets. This lacks agility, is difficult to audit, and there is no guarantee that the rates used have been validated and are accurate.

• ApiroRates: The same screen that is used to select multiple risk factors and a date range also allows the introduction of shocks for each risk factor just before distributing to risk.
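As a sketch of the shock step above, applying a per-risk-factor shock just before distribution could look like the following. The simple relative-bump semantics are an assumption for illustration, not the ApiroRates shock model:

```python
# Illustrative sketch: apply per-risk-factor shocks to a dataset just
# before distribution. Relative-bump semantics are an assumption.

def apply_shocks(dataset, shocks):
    """dataset: {factor: [values]}; shocks: {factor: relative shock}."""
    return {
        factor: [v * (1.0 + shocks.get(factor, 0.0)) for v in values]
        for factor, values in dataset.items()
    }

dataset = {"GOLD": [659.1, 674.5], "AUD/USD": [0.8301, 0.8322]}
shocked = apply_shocks(dataset, {"GOLD": 0.10})  # +10% shock on gold only
print([round(v, 2) for v in shocked["GOLD"]])    # [725.01, 741.95]
print(shocked["AUD/USD"])                        # unchanged
```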


Banks currently source and maintain attributes of market data such as bid, ask, close, etc. However, reference data is also used to classify this data. Faulty reference data can propagate errors systematically within the organisation (Deloitte paper). Examples of reference data are equity issuer, issuer name, group name and the Legal Entity Identifier (LEI), which is being embraced as a global standard.

Challenge: The current manual process, which involves considerable use of spreadsheets, forces product quality operations teams to keep reference data either in separate sheets or reference tables, or within the same spreadsheets as the rates. The second approach has the side effect of producing complex, hard-to-read spreadsheets which, consequently, may contain error-prone data and be hard to audit and to quality-assure.

ApiroRates: Has specific configurable properties assigned to each rate, ensuring the reference data is readily available for the rate it refers to. Furthermore, these properties can change at runtime, and the relevant audit entries are recorded when attributes are added, removed or modified for a rate.



Rate Lifecycle diagram and configurable algorithms

This diagram illustrates the lifecycle of a rate and a few examples of the configurable algorithms that are specified via scripting languages to adjust the overall system’s behavior at runtime with zero downtime.

ApiroRates has numerous configurable algorithms for different phases in the system such as validation, cleansing, aggregation, data enrichment and pre-distribution.

For example, the diagram above illustrates that the deployed instance has two Raw Violations (zero, negative) and one post source processor that enriches the rates by automatically calculating the mid-attribute, using the values of bid and ask. All these algorithms can be added, removed or modified and they can be applied to each and every rate and each and every attribute.

Examples of algorithm categories:

• Validation: Zero validation.

• Aggregation: Average calculation

• Data Enrichment: Mid-calculation or historical gap fill

• Pre distribution: Shocks for risk factors
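Two of the categories above, the mid-calculation enrichment and the average aggregation, can be sketched as small composable functions. The names and rate structure are illustrative, not the ApiroRates API:

```python
# Illustrative sketch of two algorithm categories: a post-source
# enrichment deriving mid from bid/ask, and an average aggregation
# across sources. Names and structures are assumptions.

def enrich_mid(rate):
    """Post-source processor: add mid = (bid + ask) / 2."""
    rate = dict(rate)  # leave the original raw rate untouched
    rate["mid"] = (rate["bid"] + rate["ask"]) / 2.0
    return rate

def aggregate_average(raw_rates, field="last"):
    """Aggregation: average one field across all raw sources."""
    values = [r[field] for r in raw_rates]
    return sum(values) / len(values)

sources = [
    enrich_mid({"source": "A", "bid": 99.0, "ask": 101.0, "last": 100.0}),
    enrich_mid({"source": "B", "bid": 98.0, "ask": 102.0, "last": 101.0}),
]
print(sources[0]["mid"])           # 100.0
print(aggregate_average(sources))  # 100.5
```

Because each stage is an independent function of a rate, stages of this kind can be added, removed or swapped per rate and per attribute, which is the configurability the diagram illustrates.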


ApiroRates was built using flexible and extensible data structures and processes. Making a system flexible, by necessity, implies loosening constraints and, therefore, compromising manageability. This may be considered a universal law not specific to ApiroRates: you rarely get something for nothing. The extra performance gains afforded by NoSQL systems, as an example, are explicitly won by bargaining against the tight consistency guarantees espoused by traditional relational databases, as outlined by what is termed ACID compliance. ApiroRates strikes a balance between the two goals of extensibility and manageability.

Much of the explosion in data volumes over recent years is of an unstructured form, and processing and analysis mechanisms have evolved in lockstep to cater to this; hence the proliferation of what are termed 'Post SQL' storage solutions. ApiroRates follows this trend.

In addition to being built on the leading open source NoSQL database, MongoDB, ApiroRates is pervaded by a common concept of scriptability. Both the implied extension points of the system (such as field validators) and its core internal components are implemented using a scriptable base abstraction, conferring customisability on all components of the system.

This scriptable abstraction allows reimplementation of components in any number of ways, including any JSR223 (Java Bean Scripting Framework) script (JavaScript/Ruby/Python etc.), a Spring bean, a precompiled Java class in a custom JAR file, or even Java source code provided as raw source. With the release of Java 8, an excellent choice of scripting language is the newly released Nashorn JavaScript engine, exhibiting performance well beyond that of its dated predecessor, Rhino.

Reasonable defaults are provided out of the box for core systems and extension points. However, from the outset of design, it has been explicitly assumed that we cannot foresee all specific client requirements. Furthermore, we recognise that different clients have conflicting requirements. Rather than pick winners, we have allowed for the enhancement, customisation or replacement of all core systems, possibly at runtime, without a formal system redeployment lifecycle. Obviously, the components should be tested in a development deployment first and mechanisms for automating the deployment through web services may be made to streamline changes away from error prone manual interaction with a graphical interface.
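The runtime-replaceability described above can be illustrated with a language-neutral sketch: a component is held as source text, compiled on demand, and swapped while the system runs. ApiroRates does this via JSR223 engines such as Nashorn; the Python analogue below is purely illustrative and not the ApiroRates interface:

```python
# Illustrative analogue of a runtime-scriptable component: the
# implementation lives as source text and can be hot-swapped with
# zero downtime. Not the ApiroRates API.

class ScriptableComponent:
    def __init__(self, source):
        self.replace(source)

    def replace(self, source):
        """Swap in a new implementation with no redeployment."""
        namespace = {}
        exec(source, namespace)  # compile the script into a namespace
        self._impl = namespace["validate"]

    def __call__(self, value):
        return self._impl(value)

validator = ScriptableComponent("def validate(v):\n    return v > 0\n")
print(validator(5.0))     # True

# Hot-swap the implementation at runtime:
validator.replace("def validate(v):\n    return 0 < v < 1000\n")
print(validator(5000.0))  # False
```

The same pattern, tested first in a development deployment as the text recommends, is what allows validators and even core services to change in production.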

The result of this customisability is that there should never be an instance where, for example, our authorisation or authentication systems are incompatible with client requirements. We can always provide a customised version or, in the worst case, a complete replacement specific to client requests. Furthermore, the change may be made in production, through a graphical interface or web service, with zero system downtime.

A distinction should be made between core services and normal extension points.

Although both are scriptable, the extension points are expected to be defined by clients. They are defined with a more stable interface, and the pipeline processing prevents errors from one bad extension point propagating throughout the system. An example of an extension point could be a validator. A badly implemented field validator, for example, will cause violations to occur in all rates that are configured to use the validator; it will not prevent other rates from being processed. Core services, on the other hand, comprise the base functionality of the system. They are composed against internal interfaces more likely to change between releases and require detailed knowledge of the internal workings of the system. An example of a core service would be the Aggregation Process Service. Written badly, it may prevent ANY useful system activity. Badly implemented core services have the ability to cause far greater system instability in a number of ways and are best implemented via consultation with, or deferred to, a qualified Apiro representative. The reimplementation of core services without consultation with Apiro may limit the assistance Apiro can provide in system support.

At a lower level, we allow clients to extend the application by providing their own JAR files and Spring bean definitions into the core of the system. Prudence should be urged, however. The eventing system enables easy messaging, and consideration should be given to treating a required enhancement primarily as an integration task rather than a development task.

Note that this scriptability abstraction does not necessarily imply inferior performance. Since we actually allow Java source code to be treated as a script, some implementation options will run at full native speed, as if they were compiled as Java source into the original application. All of these implementation options are hidden by a runtime reification process, which presents the components to the core system in the same manner, under a standard interface specific to each component type.

Another fundamental concept of ApiroRates is the eventing model. Many occurrences within the core of the system generate events. Of course, the event processing service itself, as well as most of the services that generate events, are scriptable components, so more events can be added if there is a system action we did not provide for. In addition, custom events may be defined, which may then be emitted by custom client scripts for actions completely unforeseeable by Apiro.

The eventing system makes ApiroRates an easy integration target. Where custom processing must be performed, or ApiroRates events must synchronise with existing systems, the eventing system allows this to happen. These base events are intra-system; however, custom event listeners may be created via the scriptable interface to effect any desirable behaviour. Events may be handled via scriptable implementations, which may then emit inter-system messages of any format through any arbitrary transport mechanism (SOAP/REST/JMS/custom TCP) to other enterprise systems. The fine granularity of the events allows synchronisation down to a fine level of detail. For example, an event listener may be created to be notified whenever a new aggregation for a rate is completed; this may be used to signal and synchronise external systems. Where the transport mechanism is a persistent queue, this channel may be lossless, providing guaranteed dispatch.
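The publish/subscribe shape of the eventing model above can be sketched as follows. The event name and payload are hypothetical; in ApiroRates the listener would be a scriptable component bridging to an external transport:

```python
# Illustrative sketch of the eventing model: listeners subscribe to
# named events; a listener could forward payloads to an external
# system. Event names and payloads are assumptions.

class EventBus:
    def __init__(self):
        self._listeners = {}

    def on(self, event, listener):
        """Subscribe a listener to a named event."""
        self._listeners.setdefault(event, []).append(listener)

    def emit(self, event, payload):
        """Deliver the payload to every listener of this event."""
        for listener in self._listeners.get(event, []):
            listener(payload)

bus = EventBus()
received = []
# e.g. notify an external system whenever a rate aggregation completes
bus.on("aggregation.completed", lambda p: received.append(p["symbol"]))
bus.emit("aggregation.completed", {"symbol": "TRS", "value": 101.2})
print(received)  # ['TRS']
```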

The ApiroRates graphical console was developed using state-of-the-art, modern web-based application development practices. The user interface is a client-side HTML5 application utilising a standard REST API to integrate with the standalone server. A WebSocket channel is also utilised for bidirectional real-time telemetry, readily observable when using the application. A sophisticated cached data model ensures exemplary performance: pagination of a rate dataset comprising hundreds of thousands of rates occurs in single-digit milliseconds. The application feels much like a thick client due to these latest browser features.

ApiroRates Engine Diagram


Data Dictionary – explicit data types consisting of a name and a data type (for example, OPEN_PRICE of type Decimal), aggregated to create rate schemas.

Rate Schema – an aggregated collection of Data Dictionary elements which place some level of constraint and consistency on rate data.

Raw Rate – raw data sourced from a data provider, processed into a standard format compliant with the associated Rate Schema.

Aggregated Rate – a rate created from multiple source origins, processed and validated to a higher level of trustworthiness compared to raw rates.

Historical Rate – as aggregated rates are collected, this cleansed data becomes valuable for historical analysis. A system lifecycle process keeps a record of aggregated rates as each day is closed off, in time generating a historical record of rates.

Rate Feed – a component that sources rate data from any origin and presents it to the system.

Raw Rate Sourcing Algorithm – the mechanism by which raw rates are sourced by the system, by invoking raw rate feeds and processing the results. This is a scriptable core component which may be enhanced or replaced at runtime for specific client needs.

Aggregation Algorithm – the algorithm whereby raw rates are aggregated to form a golden rate. This core component is scriptable and may be modified or re-implemented.

Field Validator – a standard mechanism to validate individual fields. This is a scriptable component; existing and new validators may be implemented or re-implemented at runtime.

Event Listener – code that is executed to react to system events. This is a scriptable component; existing and new listeners may be implemented or re-implemented at runtime.

Datasink – an abstract concept which takes a data payload and 'sinks' it. Examples are an FTP destination or an email address. This is a scriptable component interface.

Rate Collection – a collection of rates that may be emitted to a destination. An example could be a Bloomberg update. This is a scriptable extension point, and clients may provide their own implementations.

Rate Distribution – the rate data associated with a rate collection, formatted in a particular fashion, e.g. CSV, and emitted to a nominated Datasink. This is a standard extension point. New rate distributions are created via the user interface.

Cross Rate Collection – a tabulated historical collection normally used by risk personnel to seed MS Excel spreadsheets or other tools for analysis. An example could be the last 260 days of the price of gold, the AUD/USD exchange rate and the Australian All Ordinaries index, ordered in tabular form.

Cross Rate Distribution – the combination of a cross rate collection, a distribution format and a nominated Datasink.



Data architecture and IT infrastructure – A bank should design, build and maintain data architecture and IT infrastructure which fully supports its risk data aggregation capabilities and risk reporting practices not only in normal times but also during times of stress or crisis, while still meeting the other principles.


Accuracy and Integrity – A bank should be able to generate accurate and reliable risk data to meet normal and stress/crisis reporting accuracy requirements. Data should be aggregated on a largely automated basis so as to minimise the probability of errors.


Completeness – A bank should be able to capture and aggregate all material risk data across the banking group. Data should be available by business line, legal entity, asset type, industry, region and other groupings that permit identifying and reporting risk exposures, concentrations and emerging risks.


Timeliness – A bank should be able to generate aggregate and up-to-date risk data in a timely manner while also meeting the principles relating to accuracy and integrity, completeness and adaptability. The precise timing will depend upon the nature and potential volatility of the risk being measured, as well as its criticality to the overall risk profile of the bank. This timeliness should meet bank-established frequency requirements for normal and stress/crisis risk management reporting.


Adaptability – A bank should be able to generate aggregate risk data to meet a broad range of on-demand, ad hoc risk management reporting requests, including requests during crisis situations, requests due to changing internal needs and requests to meet supervisory queries.