<light-layout>
    <div class="__page-root">
        <div class="sections">
            <insight-header category="Big Data" name="Choreographing a Data Architecture"
                summary="For those working with Big Data, initial results were generally modest and technically harder to realize than expected.">
                <div class="half-image insight-image tinted"></div>
            </insight-header>
            <insight-pamphlet>
                <insight-sub-section class="subsection" name="Requirements of Modern Data Architecture"
                    content="For early adopters of Big Data, initial results were generally modest and technically harder to realize than expected. Then the industry brought machine learning into the process. In theory, this involved ingesting related but disparate data sources into a Data Lake, analyzing the data to uncover new ideas that delivered significant business value, and then providing simple, easy-to-use production tools that enable people to act on these insights.">
                </insight-sub-section>
                <insight-sub-section class="subsection"
                    content="However, fundamental challenges in working with data remain stubbornly unresolved today. These include, but are not limited to: incomplete data sets, poorly captured over time; inadequate data capture and storage mechanisms; the complex interleaving and cleaning of disparate, inconsistent, patchy and time-based data sources; the regular delivery of new or improved services and tools into production; sharing common information consistently, both within the organization and across industry supply chains; and establishing and maintaining trust across the entire infrastructure.">
                </insight-sub-section>
                <insight-sub-section class="subsection"
                    content="On top of all this, those working with data must accept that the ambition for a “single view of the truth” is unrealistic: data is simply a proxy for reality, and different business problems require different proxies. When we analyze data, we’re interpreting someone else’s recollections. The building blocks of data projects must be dynamic and evolve gradually, rather than being completely defined before any business impact can be delivered.">
                </insight-sub-section>
                <insight-sub-section class="subsection subsection-padding"
                    content="It is widely accepted that today’s approach to data warehouses is unnecessarily limiting, but there are practical ways to improve the data architecture and enable earlier, more dynamic delivery of innovative solutions. Here are seven examples of how to go about this:">
                </insight-sub-section>
                <insight-sub-section class="subsection subsection-padding" name="1. Minimize the complexity"
                    content="Minimize the complexity caused by building on, coexisting with or rationalizing current EDW (Enterprise Data Warehouse), MDM (Master Data Management) and ODS (Operational Data Store) tools and environments. While they impose significant limitations on the target data architecture we require, they are the starting point for modernization initiatives: they are where knowledge and expertise are concentrated, and they provide valuable tool sets.">
                </insight-sub-section>
                <insight-sub-section class="subsection subsection-padding" name="2. Business-led initiatives"
                    content="Business-led initiatives should flow from right to left: from business value back to the appropriate data sources, which is not usually the case today. The data architecture needs to support an ever-improving understanding of the data as the definition of the business initiative becomes clearer. Additionally, the same data must be applicable to a wide range of use cases, each of which may tolerate significantly different levels of accuracy and correctness.">
                </insight-sub-section>
                <insight-sub-section class="subsection subsection-padding" name="3. Knowledge capture"
                    content="Knowledge capture should probably take a use-case-led approach, prioritizing the “highest business-value, horizontal slice”. We need notation and tools that allow these discoveries to be recorded, linked and reused in a natural, quantitative and consistent way, so that they build into worthwhile business ontologies. While the notation and tooling are still emerging, it’s vital to build up the expertise to understand, combine and extend them, and to take advantage of the additional understanding they provide.">
                </insight-sub-section>
                <insight-sub-section class="subsection subsection-padding" name="4. The data architecture"
                    content="The data architecture needs to support continuous evolution of the system by providing improved automation capabilities. There are three key areas. Firstly, the data ingestion, preparation and storage phases need to be well layered, so that downstream changes don’t always require upstream processes to be re-run. Secondly, machine learning systems should be implemented with a complete rebuild capability, including model serialization and model serving at scale. Finally, components should be meticulously crafted and loosely coupled, so that they can be tested and deployed completely independently of each other.">
                </insight-sub-section>
                <insight-sub-section class="subsection subsection-padding" name="5. Capturing data"
                    content="Capturing data with an event-based architecture may be a better approach than the traditional, record-centric style built on relational databases. It allows changes in data values to be captured naturally and consistently over time, and it makes data sets easier to interleave, which is important to most machine learning models. This style of architecture may well become the new system of record for transactional business systems, because it naturally standardizes and simplifies the handling of increasingly onerous compliance requirements.">
                </insight-sub-section>
                <insight-sub-section class="subsection subsection-padding" name="6. Algorithm and hardware efficiency"
                    content="Algorithm and hardware efficiency has improved dramatically in recent years, and this trend looks set to accelerate. The data architecture must put us in a position to support future ideas and innovations with (hopefully) minimal compromise and rework.">
                </insight-sub-section>
                <insight-sub-section class="subsection" name="7. Ensure safety"
                    content="We need to ensure the safety of what is produced. This requires deep, standardized statistical diagnostic capabilities, backed by strong governance. We need to guard against data changing imperceptibly over time, which leads to stale or harmful models. Improved explanatory tooling is required to understand how models produce their insights, particularly with the latest deep learning techniques. Our systems need to recognize and warn of potential privacy and bias issues. The data formats and metadata management that underpin all of the above need to be decided up front and then committed to, and the loose-coupling mechanisms needed to support regular, reliable releases must be engineered with care. Beyond that, the areas above can be addressed largely independently of one another, at a pace that reflects the client’s specific priorities. This iterative approach avoids the traditional huge up-front investment that is slow to pay off, offers an unclear return and is difficult to measure.">
                </insight-sub-section>
                <insight-sub-section class="subsection"
                    content="We’ll cover these topics in more detail in future blog posts and insights. In the meantime, the advice above should provide a useful starting point for any organization that is exploring machine learning strategies and asking how its data architecture can be designed to deliver dynamic, innovative results.">
                </insight-sub-section>
            </insight-pamphlet>
        </div>
    </div>
</light-layout>
