
How to Build Real-Time Data Pipelines From Traditional Systems to Modern Platforms

April 15, 2021

Precisely Editor

Most modern enterprises are running multiple complex systems, including a mix of on-premises and cloud software, often deployed in various locations around the world. Typically, organizations must grapple with a host of different data models and nomenclatures, as well as adjust to a constantly changing landscape in which new systems are brought online, outdated systems are retired, or corporate acquisitions or mergers call for the integration of existing ERP, CRM, or similar business management systems. Building real-time data pipelines can help address these challenges.

For established organizations, traditional systems such as the mainframe present a particular challenge. Very often, the businesses running such systems have invested heavily in existing code. Understandably, many in those organizations are hesitant to walk away from those established investments. After all, the process of recreating complex systems from scratch is risky, expensive, and potentially highly disruptive.

Yet systems like the mainframe often operate as functional silos. Integration is difficult given the mix of complex data formats and structures involved, including hierarchical IMS databases, VSAM files, and COBOL copybooks. Most integration tools are poorly suited to the data access challenges that are unique to mainframe environments.
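To make the format problem concrete, here is a minimal Python sketch of decoding one fixed-width record of the kind a COBOL copybook might describe. The record layout (a 10-byte EBCDIC name followed by a 3-byte COMP-3 packed-decimal amount with two implied decimal places), the function names, and the sample bytes are all assumptions made for illustration; they are not taken from any real system.

```python
from decimal import Decimal

def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a COMP-3 (packed decimal) field.

    Each byte holds two decimal digits, except the last byte,
    whose low nibble holds the sign.
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:  # 0xD marks negative; 0xC and 0xF mark positive or unsigned
        value = -value
    return Decimal(value).scaleb(-scale)  # apply the implied decimal point

def decode_record(record: bytes) -> dict:
    """Decode the hypothetical 13-byte layout described above."""
    name = record[0:10].decode("cp037").rstrip()   # cp037 is a common EBCDIC code page
    amount = unpack_comp3(record[10:13], scale=2)
    return {"name": name, "amount": amount}

# Build a sample record: "ACME" padded to 10 bytes in EBCDIC, then +123.45 packed.
sample = "ACME".ljust(10).encode("cp037") + bytes([0x12, 0x34, 0x5C])
print(decode_record(sample))
```

Even this small case needs a character-set conversion, nibble-level parsing, and knowledge of an implied decimal point that is stored nowhere in the data itself, which is why general-purpose integration tools often struggle here.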

As a result, organizations struggle to roll out innovations, and access to information may be limited. Connecting cloud-based systems to the mainframe can be cumbersome and time-consuming. Efforts to build out advanced analytics, AI, or machine learning may be hampered by integration bottlenecks that make it difficult to gain real-time visibility.

The real-time advantage

Real-time access to information is becoming increasingly important. In an age of connected devices, clickstream analysis, and the rapid flow of information, it is no longer acceptable to wait for an overnight update to get the analysis you need. Fraud detection in payment processing, for example, relies upon the early identification of anomalies and quick action to resolve questionable charges. Too many card declines will lead to customer dissatisfaction. Too few will result in fraud losses.

Monitoring critical systems, likewise, relies on real-time visibility into large volumes of data. In many cases, it is essential to identify potential problems quickly, especially when a rapid response may be necessary to avoid system failure.

To achieve real-time visibility, organizations must have tools that can keep data in sync without overloading networks or adversely affecting database performance. Integration tools must be robust enough to handle situations in which connectivity fails. This calls for automatic restart capability, with a guarantee that no data will be lost in the process.
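One common way to get restart-without-loss behavior is to advance a checkpoint only after each change has been applied, and to make applying a change idempotent so that a replayed change is harmless. The sketch below is a simplified in-memory illustration of that idea. The `replicate` function, the change-log structure, and the simulated failure are hypothetical and do not describe how any particular product works.

```python
class ConnectionLost(Exception):
    """Simulated network failure."""

def replicate(change_log, target, checkpoint, fail_at=None):
    """Apply every change after the last checkpointed sequence number.

    change_log: list of (seq, key, value) tuples ordered by seq.
    target:     dict standing in for the target table.
    checkpoint: dict holding 'last_seq', the last change known to be applied.
    fail_at:    optional seq at which to simulate a dropped connection.
    """
    for seq, key, value in change_log:
        if seq <= checkpoint["last_seq"]:
            continue  # already applied before the restart
        if seq == fail_at:
            raise ConnectionLost(f"connection dropped before change {seq}")
        target[key] = value            # idempotent upsert: replaying it is harmless
        checkpoint["last_seq"] = seq   # advance only after the write succeeds

change_log = [(1, "order-1", "NEW"), (2, "order-2", "NEW"), (3, "order-1", "SHIPPED")]
target, checkpoint = {}, {"last_seq": 0}

try:
    replicate(change_log, target, checkpoint, fail_at=3)
except ConnectionLost:
    pass  # a real pipeline would back off and retry here

replicate(change_log, target, checkpoint)  # the restart resumes at change 3
print(target, checkpoint)
```

Because the checkpoint never moves ahead of the data, a failure can at worst cause a change to be applied twice, never skipped. In a real deployment the checkpoint would live in durable storage rather than in memory.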


Key challenges for integrating with traditional on-premises systems

The primary challenge of integrating traditional systems with modern cloud environments arises from the complex data formats previously mentioned (IMS, VSAM, COBOL copybooks, etc.). Experienced mainframe programmers who understand these formats are getting harder to find, particularly as many of them are approaching retirement age. With too few newcomers to replace them, the talent shortage is growing more severe.

Integrating a wide array of modern systems, likewise, presents a challenge when it comes to hiring and retaining talent with the right mix of skills. Fortunately, the right enterprise-grade integration tools can encapsulate the complexities of data integration, streamlining and simplifying the entire integration process.

Given the diversity of systems that most enterprises must manage, it is important to select integration tools that allow for low-code or no-code integration. Adding new data sources and targets must be quick and easy, which calls for prebuilt connectors for multiple endpoints, including:

Streaming platforms such as Apache Kafka, Cloudera, and Amazon Kinesis
Relational databases
OLAP data warehouses
Big-data analytics platforms
Tools such as Hadoop, Hive, Snowflake, and others

A rich array of prebuilt connectors ensures that future needs can be met without excessive additional investments. Precisely’s “design once, deploy anywhere” capabilities provide for efficient use of scarce high-skilled IT resources.

A case study in innovation

A large European retailer and distributor of building materials was seeking to integrate and extend its core ERP system, which was running on an IBM AS/400. Unfortunately, the company encountered significant bottlenecks when trying to integrate its ERP system with external data sources and targets.

The company sought to gain agility and increase innovation by modernizing data access and presenting a unified platform with which new applications and services could be easily integrated.

Rather than scrapping its core business information system (that is, the ERP system running on an AS/400), the company decided to replicate its ERP data to a cloud-native database with real-time data pipelines. If it could meet the right conditions for performance, reliability, and security, this would provide the company with a common database that acts as a central repository for all the organization’s data.

More importantly, that database could serve as a central business platform for other systems needing to be tied into the company’s operations and analytic capabilities. Rather than rebuilding its ERP systems in a cloud-native environment, the company was able to build real-time data pipelines between its existing systems and the cloud. This preserved the company’s existing investments, prevented it from having to reengineer its core business systems and processes, and helped the company avoid the inevitable disruption associated with a major ERP implementation project.

The team built the streaming data pipelines using Precisely Connect CDC, feeding real-time ERP information to Apache Kafka, which replicated changes to its cloud-native data store. Just three hours after starting the setup process, the IT team was replicating live data from the company’s ERP system to the cloud in real time. Connect CDC dramatically shortened the time to value by making the process of connecting traditional and cloud data sources quick and painless.
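Conceptually, change data capture turns each insert, update, or delete on the source into an event, and a consumer applies those events in order to the target. The sketch below illustrates that flow with plain Python data structures. The event shape (`op`, `key`, `row`) and the sample rows are assumptions made for illustration; this is not the Connect CDC or Kafka message format, and a simple list stands in for the Kafka topic.

```python
import json

def apply_change(replica: dict, event: dict) -> None:
    """Apply one change event to the replica (a dict standing in for a table)."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]   # upsert the latest version of the row
    elif op == "delete":
        replica.pop(key, None)        # tolerate deletes for rows never seen
    else:
        raise ValueError(f"unknown operation: {op}")

# Hypothetical change events, serialized as they might travel through a topic.
topic = [json.dumps(e) for e in [
    {"op": "insert", "key": "1001", "row": {"sku": "TIMBER-2X4", "qty": 40}},
    {"op": "insert", "key": "1002", "row": {"sku": "CEMENT-25KG", "qty": 12}},
    {"op": "update", "key": "1001", "row": {"sku": "TIMBER-2X4", "qty": 35}},
    {"op": "delete", "key": "1002"},
]]

replica = {}
for message in topic:                 # events must be applied in source order
    apply_change(replica, json.loads(message))

print(replica)
```

Only the changes travel over the network, not full copies of the tables, which is what lets this style of replication keep a target current without placing a heavy load on the source system.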

Connect CDC also provided the company with the flexibility to scale up performance without increasing costs. When the company first began to replicate transactions from ERP to the cloud, the system was only handling about 200 messages per second. Simply by adding two lines of code to its Connect CDC configuration, the team increased that performance to 12,000 messages per second. The flexibility and scalability of Precisely Connect CDC proved to be a key element in the project’s success.

With Connect CDC, businesses have the power to build data pipelines and easily share application data across the enterprise. With the ability to quickly add new data sources and targets, Connect CDC gives enterprises the agility they need to roll out innovative new initiatives quickly and with confidence.

Watch our webcast, Streaming IBM i to Kafka for Next-Gen Use Cases, to learn how to unlock the potential of your IBM i data by creating data pipelines that integrate, transform, and deliver it to users when and where they need it.