Big Data in Finance

By Maik Sowinski, Daniel Prelipcean, Vladimir Cucu, and Madeleine Sadler


Big data can be defined as massive, cumulative amounts of data that can be analysed computationally, usually to find patterns and trends in human behaviour.  The concept of big data has become a steadily growing topic of discussion on an international level.  Huge data collections have been accumulated through internet use, credit card transactions, and many other methods in order to optimise a company’s profit, whether by personalised advertising or data sales.  While there are many positive aspects associated with data mining and the concept of big data, there are also controversies to consider in several areas where big data is utilised.

One such area is the financial sector.  The financial sector includes banks, the stock market, investment funds, insurance companies, and real estate, and it has been globalising at an exponential rate.  This massive shift to a global market has led to the need to revolutionise how companies interact with the consumer population.  Big data has provided many solutions as well as many problems.


Uses of Big Data in the Financial Sector:

Big decisions require big data:

Many companies report that large financial decisions are still often made on the basis of human judgement rather than machine algorithms.  Big investments, new product launches, and entry into new markets all come with inherent risk.  Recently, companies have realised that they can limit this risk with predictive analytics.  These data- and computer-based analyses allow companies to see how alternative choices would affect them before making them, giving these companies a competitive edge.  Such decisions also include credit-score data mining that determines whether or not to give loans to specific individuals.
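As a rough illustration of credit-score data mining, the sketch below scores a loan applicant with a logistic model.  The feature names, weights, and threshold are entirely invented for this example; a real lender would learn them from historical repayment data.

```python
import math

# Hypothetical weights a lender might have learned from past loan data.
WEIGHTS = {"credit_score": 0.01, "debt_to_income": -4.0, "years_employed": 0.15}
BIAS = -6.0

def repayment_probability(applicant):
    """Logistic model: estimated probability the applicant repays."""
    z = BIAS + sum(WEIGHTS[k] * applicant[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def approve(applicant, threshold=0.5):
    """Grant the loan when the predicted repayment probability is high enough."""
    return repayment_probability(applicant) >= threshold

applicant = {"credit_score": 720, "debt_to_income": 0.25, "years_employed": 5}
print(approve(applicant))  # → True
```

The point is not the specific numbers but the shape of the decision: a data-derived score replaces a loan officer's gut feeling with a reproducible threshold.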

Protecting customers and stakeholders:

Fraud detection systems are becoming increasingly reliant on collected user data such as geolocation, past purchases, and cluster information.  For financial service companies such as credit card issuers, these data logs are what allow them to distinguish between normal and fraudulent use in order to better protect their customers.
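A minimal sketch of the "normal vs. fraudulent" distinction, using only a customer's own purchase history and a simple z-score rule (real systems combine many more signals, such as geolocation and merchant category):

```python
import statistics

def flag_suspicious(history, new_amount, threshold=3.0):
    """Flag a transaction whose amount deviates strongly from the
    customer's own spending history (a simple z-score rule)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return new_amount != mean
    return abs(new_amount - mean) / stdev > threshold

history = [23.50, 41.00, 18.75, 36.20, 29.90, 44.10, 25.00]
print(flag_suspicious(history, 30.00))   # typical purchase → False
print(flag_suspicious(history, 950.00))  # unusually large purchase → True
```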

Regulations and rules:

Today’s standard regulations require that big swap companies report and turn over all information relevant to each swap trade.  All of the information that goes into these trades requires a large amount of data storage and collection.  This has become a huge opportunity for big data, as it provides an extremely efficient way for companies to save money on data collection while complying with these regulatory rules.

Finding your market:

Market segmentation is a crucial part of a company’s marketing strategy.  Companies must identify the part of the consumer population they are trying to appeal to and how to get them to buy.  By utilising data mining, companies are able to efficiently find patterns in consumer purchases and run ad campaigns that reach their specific audience.  This type of segmentation saves capital by eliminating advertising that does not appeal to buyers, as well as eliminating the human element of antiquated data collection methods.

Managing risk:

Every business comes with inherent risks.  It is the job of the finance branch to minimise these risks and maximise profits.  By using big data, companies are now able to predict and prevent loss as well as maximise gain.  By implementing centralised and integrated risk management data platforms, they are able to quickly adapt to new requirements and needs within the company’s internal framework.
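One common data-driven risk measure is Value-at-Risk.  A minimal historical-simulation sketch follows; the return series is invented, and the nearest-rank percentile used here is the crudest variant of the method.

```python
# Historical-simulation Value-at-Risk: the daily loss that was
# exceeded only (1 - confidence) of the time in the observed sample.
def historical_var(daily_returns, confidence=0.95):
    losses = sorted(-r for r in daily_returns)   # losses as positive numbers
    index = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[index]

returns = [0.01, -0.02, 0.005, -0.015, 0.03, -0.04, 0.002, -0.01, 0.02, -0.005]
print(historical_var(returns))  # → 0.04, i.e. a 4% one-day loss
```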

Personalising your product offerings:

What influences your customer to buy a certain product, and how do you obtain individual purchasing data?  Personal purchasing data can be collected through individual buying records.  Although this raises ethical concerns, it is the most effective way of providing customers with personalised options, products, and services, saving them time and money and bringing them closer to becoming lifetime loyal customers.

High Frequency Trading (HFT) opportunities:

High Frequency Trading is a computerised trading method that exploits fleeting market inefficiencies by executing large numbers of orders at very high speed.  While there have been documented cases of algorithms running incorrectly and causing substantial losses, big data is helping to turn this somewhat unreliable strategy into a hyper-intelligent trading system that could potentially make companies millions in virtually any asset that can be traded electronically.  With the integration of real-time analysis, computers are able to compare today’s trends and values with historical patterns to determine maximum profit.
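A heavily simplified sketch of "comparing live prices with recent patterns": a moving-average crossover signal.  The window sizes and price stream are invented, and real HFT systems operate on far richer data at microsecond scale; this only illustrates the streaming-comparison idea.

```python
from collections import deque

class CrossoverSignal:
    """Compare a fast and a slow moving average of incoming prices,
    a toy version of matching live values against recent history."""
    def __init__(self, fast=3, slow=5):
        self.fast_window = deque(maxlen=fast)
        self.slow_window = deque(maxlen=slow)

    def on_price(self, price):
        self.fast_window.append(price)
        self.slow_window.append(price)
        if len(self.slow_window) < self.slow_window.maxlen:
            return "hold"                      # not enough history yet
        fast_avg = sum(self.fast_window) / len(self.fast_window)
        slow_avg = sum(self.slow_window) / len(self.slow_window)
        if fast_avg > slow_avg:
            return "buy"
        if fast_avg < slow_avg:
            return "sell"
        return "hold"

signal = CrossoverSignal()
for price in [100, 101, 102, 103, 104, 103, 101, 99]:
    print(price, signal.on_price(price))
```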

Commodity Trading opportunities:

Strategically placed sensors can now send real-time information about certain commodities.  This information allows for better management of production when external variables such as global demand, price, deflationary/inflationary pressure, and regulation change; all leading to improved returns.

Investment advice and customer retention:

Many consumers seek help when looking to make investments.  Before the rise of big data, financial and investment advisors had to rely on their previous experience and learned knowledge to give consumers the most up-to-date information for making informed decisions; however, that advice was not nearly as accurate as it can be now.  Thanks to big data, advisors are equipped with real-time data, and in some cases automated big data algorithms can replace human advisors.


Key Drivers of Big Data in the Financial Sector:

Changing regulatory landscape:

Regulations in the financial sector have been overhauled considerably in recent years, and new regulations are on the way. Companies in the capital markets are reacting to these changes in regulations and preparing for future ones using Big Data programs.

Changing trading strategies:

The trading strategies used on the capital markets have become more complex, evolving from the pairs strategies of the 1980s to today’s gaming strategies. Furthermore, the strategies used today include more and more unstructured data, such as Twitter-based trading strategies.

Increasing data volume:

The data volume in the capital markets keeps growing. For years, companies have responded with ever more advanced technologies; now, as more and more data sources become available, more storage is needed as well.

Increasing complexity of processes:

As business processes become more and more complex, the amount of data they generate increases.

Stringent timelines:

Nowadays it is important to extract value from data as fast as possible. For today’s high frequency trading, the corresponding requirement is real-time execution.

Need for handling speed, agility and control:

Another main driver is the speed needed for retrieving, manipulating and analysing data in order to gain control over financial market dynamics.

Visibility of data:

It is important that everyone can see what they need to see, especially now that front and back offices are physically more distributed but more closely integrated from a business perspective.

Need for integration of siloed systems:

Investment banks accumulate certain risks. They need a central platform to connect the risks and the analytical tools in order to have data from disparate systems connected. Thereby, they can gain an accurate view of potential risks.

Big Technologies in the Financial Sector:

Data grids:

A data grid is an architecture or set of services that gives users the ability to access, modify and transfer extremely large amounts of geographically distributed data across a network of servers.

Compute grids:

These allow the user to take a computation, optionally split it into multiple parts, and execute them on different grid nodes in parallel with the benefit that the computation will perform faster. For example, one of the most common design patterns for parallel execution is MapReduce and some Compute Grid vendors are GridGain and JPPF. The “must have” Compute Grid features are (for their explanations see references):

  • Automatic Deployment
  • Topology Resolution
  • Collision Resolution
  • Load Balancing
  • Fail-over
  • CheckPoints
  • Grid Events
  • Node Metrics
  • Pluggability
  • Data Grid Integration
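The MapReduce pattern mentioned above can be sketched in a few lines.  On a real compute grid each map task would run on a separate node; here all three phases (map, shuffle, reduce) run locally on a made-up word-count workload, purely to show the split-and-combine structure.

```python
from collections import defaultdict

def map_phase(document):
    """Map task: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]

def shuffle(mapped):
    """Shuffle: group intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce task: combine each key's values into one result."""
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data in finance", "big decisions require big data"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(mapped))
print(counts["big"])  # → 3
```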


Massively Parallel Processing (MPP):

It is the coordinated processing of a program by multiple processors that work on different parts of the program, with each processor using its own operating system and memory. The processors usually communicate through a messaging interface over an “interconnected” arrangement of data paths, and in some implementations up to 200 or more processors can work on the same application, such as decision support systems and data warehouse apps. An MPP system is also known as a “loosely coupled” or “shared nothing” system.
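The "shared nothing" idea can be sketched without real hardware: each worker owns one partition of the data and computes on it independently, and results are combined only through explicit messages (the returned partial sums stand in for the messaging interface here).

```python
def partition(data, n_workers):
    """Split the data into n_workers disjoint partitions (shared nothing:
    no worker ever touches another worker's slice)."""
    return [data[i::n_workers] for i in range(n_workers)]

def worker(part):
    """Each 'processor' computes only on its own partition."""
    return sum(part)

data = list(range(1, 101))
partials = [worker(p) for p in partition(data, 4)]  # one result per worker
print(sum(partials))  # combine the partial results → 5050
```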

In-memory databases:

This technology puts the working set of data into the system memory, either completely, or partially. It also provides the opportunity to optimise the way data is managed compared to traditional databases on disk-based media. Thus, when all data is kept in memory, there is no need to deal with issues arising from the use of traditional spinning disks, and data can also be compressed and decompressed more easily, resulting in the opportunity to make space savings over the equivalent disk copy.
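Python's standard library makes the idea easy to demonstrate: sqlite3 with the special `:memory:` path keeps the entire database in RAM rather than on disk.  The table and prices below are invented.

```python
import sqlite3

# ":memory:" tells SQLite to hold the whole database in system memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, price REAL)")
conn.executemany("INSERT INTO trades VALUES (?, ?)",
                 [("DBK", 24.10), ("DBK", 24.35), ("SAP", 98.70)])
avg = conn.execute(
    "SELECT AVG(price) FROM trades WHERE symbol = 'DBK'").fetchone()[0]
print(round(avg, 3))  # → 24.225
```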



NoSQL databases:

A NoSQL database encompasses a wide variety of different database technologies that were developed in response to the demands of building modern applications. It provides a mechanism for storage and retrieval of data that is modelled in means other than the tabular relations used in relational databases. Compared to these, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:

  • Large volumes of rapidly changing structured, semi-structured and unstructured data
  • Agile sprints, quick schema iteration, and frequent code pushes
  • Object-oriented programming that is easy to use and flexible
  • Geographically distributed scale-out architecture instead of expensive, monolithic architecture
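The schemaless point can be illustrated with a toy document store: JSON documents keyed by id, with no fixed set of columns, in the spirit of NoSQL document databases.  This is a teaching sketch, not a real database (no indexing, persistence or distribution).

```python
import json

class DocumentStore:
    """A toy key-value document store: each document is free-form JSON,
    so different records may carry entirely different fields."""
    def __init__(self):
        self._docs = {}

    def put(self, doc_id, document):
        self._docs[doc_id] = json.dumps(document)

    def get(self, doc_id):
        return json.loads(self._docs[doc_id])

store = DocumentStore()
store.put("c1", {"name": "Alice", "accounts": ["checking"]})
store.put("c2", {"name": "Bob", "risk_score": 0.4})  # different fields: fine
print(store.get("c2")["risk_score"])  # → 0.4
```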

Specialized databases:

Databases containing public information or material not proprietary in nature commonly appear on the World Wide Web. Specialized databases are indexes that can be searched, much like the search engines. The main difference is that specialized databases are collections on particular subjects, holding information that one would often not be able to locate using a global WWW search engine.


Hadoop:

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware, providing massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs. Its features are:

  • Ability to store and process huge amounts of any kind of data, quickly.
  • Computing power.
  • Fault tolerance.
  • Flexibility.
  • Low cost.
  • Scalability.


Real World Example:

What do you do when you need to book your next flight home, buy the perfect gift for your wife as your anniversary approaches, and at the same time order a pair of socks to keep yourself warm in winter, all while being a busy professional with barely any free time during the week? However people managed this in the past, today the answer lies in a single word: credit cards, or more generally speaking, banks.

The top German bank ranked by total assets as of September 2015 is Deutsche Bank. It is a German global banking and financial services company with its headquarters in the Deutsche Bank Twin Towers in Frankfurt. It has more than 100,000 employees in over 70 countries, and has a large presence in Europe, the Americas, Asia-Pacific and the emerging markets. In 2009, Deutsche Bank was the largest foreign exchange dealer in the world with a market share of 21 percent. The bank offers financial products and services for corporate and institutional clients along with private and business clients. Deutsche Bank’s core business is investment banking, which represents 50% of equity, 75% of leverage assets and 50% of profits. Services include sales, trading, research and origination of debt and equity; mergers and acquisitions (M&A); risk management products, such as derivatives, corporate finance, wealth management, retail banking, fund management, and transaction banking.

Thus, thanks to the breadth of big data usage in banking services, including fraud detection, the security of customers’ personal data, customer retention and individualised product offerings, people like the one mentioned above can easily and reliably use a bank’s services to satisfy their personal needs.


References and Image Sources:

Big Data Enterprise Planning Infographic:

Introduction Round Table Photo:

Methods Graph:

Deutsche Bank:


Big Technologies in the Financial Sector References (one per each technology):

