𝐄𝐍𝐓𝐄𝐑𝐏𝐑𝐈𝐒𝐄 𝐃𝐀𝐓𝐀 𝐈𝐍𝐓𝐄𝐆𝐑𝐀𝐓𝐈𝐎𝐍 𝐏𝐑𝐈𝐌𝐄𝐑
There are several approaches to Enterprise Data Integration on existing enterprise landscapes, listed below in order of increasing cost to the business.
Higher levels bring better scalability and performance; on the other hand, complexity increases, along with setup and maintenance effort. As with anything, it's about hitting the optimal spot for your particular business scenario.
𝐍𝐨𝐭𝐞: organizations fortunate enough to design an enterprise architecture from the ground up (or completely revamp an existing one) should consider strategies like Data Virtualization or Data Lakes, which eliminate the need to physically move data and lead to the Single Source of Truth (SSOT) concept.
𝐋𝐞𝐯𝐞𝐥 1 - 𝐌𝐚𝐧𝐮𝐚𝐥
A person in a data manager role controls the integration, for example by using custom code or other low-level means (like files) to connect sources to targets.
Pros: Low cost, Flexibility.
Cons: Scalability issues, Error prone.
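To make the manual level concrete, here is a minimal sketch of the kind of hand-rolled "integration" code it implies: a script that reshapes a CSV export from one system into the shape another system expects. The file names and column choices are purely illustrative, not from any real landscape.

```python
import csv
import io

def copy_selected_columns(source_text, columns):
    """Hand-rolled integration step: keep only the columns the target expects.

    source_text: raw CSV text exported by the source system.
    columns: the header the target system's import expects.
    """
    reader = csv.DictReader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for row in reader:
        # Missing columns become empty strings; extra columns are dropped.
        writer.writerow({c: row.get(c, "") for c in columns})
    return out.getvalue()

# Hypothetical export with an internal column the target must not see.
sample = "id,name,internal_flag\n1,Acme,Y\n2,Globex,N\n"
print(copy_selected_columns(sample, ["id", "name"]))
```

This is exactly why Level 1 is error-prone: every schema change on either side means editing code like this by hand.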
𝐋𝐞𝐯𝐞𝐥 2 - 𝐀𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧
Using applications to directly access and manipulate data across various sources and targets. For example: SQL scripts, data import/export utilities, database replication, Data Virtualization, message brokers, event-driven architecture.
Pros: Simple, Reuses existing tools.
Cons: Scalability and Data Quality issues, Complex to setup and manage.
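As an illustration of application-level integration, the sketch below uses plain SQL through Python's standard `sqlite3` module to copy rows between two databases. In-memory SQLite databases stand in for the real source and target systems; in practice these would be two different servers reached by their own drivers.

```python
import sqlite3

# In-memory databases stand in for the real source and target systems.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.99), (2, 25.0)])

target.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

# Direct application-level copy: SELECT from the source, INSERT into the target.
rows = source.execute("SELECT id, amount FROM orders").fetchall()
target.executemany("INSERT INTO orders VALUES (?, ?)", rows)
target.commit()

print(target.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # prints 2
```

Note that nothing here validates or cleanses the data in flight, which is why this level tends to accumulate data quality issues as the number of source/target pairs grows.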
𝐋𝐞𝐯𝐞𝐥 3 - 𝐌𝐢𝐝𝐝𝐥𝐞𝐰𝐚𝐫𝐞
Using specialized software that connects applications and transfers data; optionally it can also transform and cleanse data along the way. For example Data Integration platforms, or tools implementing ETL, ELT, or CDC (Change Data Capture).
Pros: Scalable and performant, Unified interface to multiple sources and targets, Handles complex data transformations (mostly true for ETL tools).
Cons: Complex to setup and manage, Requires additional Software/Hardware, More expensive.
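The heart of what an ETL-style middleware tool runs can be sketched as three stages. The sketch below is a toy pipeline, not any real product's API; all names and the cleansing rules are illustrative assumptions.

```python
# Minimal ETL sketch: the three stages a middleware platform typically chains.

def extract(records):
    """Pull raw records from a source (here: an in-memory list)."""
    return list(records)

def transform(rows):
    """Cleanse and reshape: drop incomplete rows, normalize names."""
    return [
        {"id": r["id"], "name": r["name"].strip().title()}
        for r in rows
        if r.get("id") is not None and r.get("name")
    ]

def load(rows, target):
    """Write the transformed rows into the target store; return the row count."""
    target.extend(rows)
    return len(rows)

source_rows = [
    {"id": 1, "name": "  acme corp "},
    {"id": None, "name": "broken"},   # cleansed out by transform()
    {"id": 2, "name": "globex"},
]
warehouse = []
loaded = load(transform(extract(source_rows)), warehouse)
print(loaded, warehouse[0]["name"])  # prints: 2 Acme Corp
```

The value of a middleware platform is that it industrializes exactly this chain: scheduling, retries, monitoring, and connectors for many sources and targets, instead of per-pair scripts.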
𝐀𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧 level integration tends to work well in hybrid environments (on-prem + hybrid cloud).
𝐌𝐢𝐝𝐝𝐥𝐞𝐰𝐚𝐫𝐞 works well when integrating legacy with modern applications on either on-prem or private/public cloud, although availability may be limited. For example, ETL tools like Google Cloud Data Fusion, AWS Glue, and Azure Data Factory are only available on their respective public clouds.
There's no single best approach; it needs to be tailored to the specific scenario and requirements. Don't forget to read opinions from everyday users and try to ignore the marketing hype. When using a public cloud product, also factor in the usage cost, as it tends to add up quickly with large data transfers.