Data Lake Design and Implementation

Client Needs & Objectives

As part of implementing a data lake for a Financial Crime department, our client needed help designing the framework and monitoring the progress of the implementation.

We faced multiple challenges:

  • Limited knowledge of the data held across the bank
  • Few subject matter / data experts
  • No common repository of conformed data

Our Approach

  • Align business and operations and define the business-benefit priority list
  • Build the business case and validate investment
  • Identify data needs and map the different data sources
  • Perform data quality tests
  • Design the Target Operating Model (mandates, processes, controls to set up, recruitment needs, etc.)
  • Monitor progress of the implementation:
    • Build a Hadoop data lake from all transactional / customer sources in the bank
    • Automate the discovery of the meaning / relationships in the data
    • Build conformed query datasets
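
As an illustration of what automating the discovery of relationships in the data can look like, the sketch below flags column pairs where almost all values of one column also appear in another, which hints at a join key. It is a minimal example under our own assumptions, not the tooling used on the engagement: the function names, the in-memory tables and the 0.9 threshold are all hypothetical.

```python
from itertools import permutations


def column_values(rows, column):
    """Return the set of distinct non-null values found in one column."""
    return {row[column] for row in rows if row.get(column) is not None}


def candidate_relationships(tables, threshold=0.9):
    """List (child, parent, coverage) where most child values occur in parent.

    `tables` maps a table name to a list of dict rows (hypothetical layout).
    High coverage suggests a foreign-key-like link; it is a candidate for
    review by a data expert, not proof of a relationship.
    """
    columns = [
        (name, col, column_values(rows, col))
        for name, rows in tables.items()
        for col in (rows[0].keys() if rows else [])
    ]
    results = []
    for (t1, c1, v1), (t2, c2, v2) in permutations(columns, 2):
        if t1 == t2 or not v1:
            continue  # skip same-table pairs and empty columns
        coverage = len(v1 & v2) / len(v1)
        if coverage >= threshold:
            results.append((f"{t1}.{c1}", f"{t2}.{c2}", round(coverage, 2)))
    return sorted(results)
```

On real volumes the same idea would run on samples or on the cluster rather than in memory, and the flagged pairs would still need validation by the few available data experts.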

Client Benefits & Main Results

Really good data scientists are hard to find and to retain

- Use data analysts (easier to find) but restrict what they can query (so-called query-focused datasets)

- Leave the harder analysis to the most experienced people

Automation is vital – volumes are impossible to manage otherwise

To gain useful, reliable business insight from later analysis, you need to know the provenance and quality of the data at the point of ingestion

- You can find multiple needles in multiple haystacks if you create the correct data foundations
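
One way to capture provenance and quality at the point of ingestion is to wrap every incoming batch in a small metadata envelope. The sketch below is illustrative only and rests on our own assumptions: the function name, the envelope fields and the completeness measure are hypothetical, not the design delivered to the client.

```python
import hashlib
import json
from datetime import datetime, timezone


def ingest_with_provenance(records, source_system, required_fields):
    """Wrap a batch with provenance and a basic quality metric at ingestion.

    `records` is a list of JSON-serialisable dicts. Completeness is the share
    of records in which every required field is present and non-empty.
    """
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields)
        for r in records
    )
    return {
        "source_system": source_system,                         # where the batch came from
        "ingested_at": datetime.now(timezone.utc).isoformat(),  # when it arrived
        "sha256": hashlib.sha256(payload).hexdigest(),          # fingerprint of the content
        "record_count": len(records),
        "completeness": complete / len(records) if records else 0.0,
        "records": records,
    }
```

Storing the envelope alongside the data lets an analyst check later which system a record came from and how complete the batch was before relying on it.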