Data Science in Healthcare: The Integration Imperative

Precisely Editor | March 11, 2021

Since early 2020, public health statistics have captured the attention of the public. Press reports provide daily updates on COVID-19 case counts, tests administered, hospitalizations, recoveries, and fatalities. For the first time, large numbers of ordinary citizens are seeking to better understand the progress of the disease, its primary risk factors, and our common fight against it. Cloud storage, data integration, and advances in artificial intelligence have propelled analytics to its current position as a primary tool in the fight for health and longevity.

For the scientific community, data analysis is nothing new; it has been a critical element in the scientific method for hundreds of years. However, our capacity to process information efficiently and effectively has accelerated dramatically. Big data analytics have opened up exciting new opportunities to understand disease mechanisms and discover new ways to treat injuries and illnesses. As medical providers, insurers, and pharmaceutical companies strive to improve the quality of patient care while reducing costs, big data has a lot to offer.

This opportunity has given rise to a new breed of healthcare organizations focused exclusively on collecting and analyzing medical information. One such organization, a leading healthcare and pharmaceutical data provider, has been working with medical data for nearly a decade, building impressive analytics capabilities and helping organizations throughout the healthcare value chain to deliver better patient outcomes with greater efficiency.

The company’s existing systems were built around a traditional Oracle relational database environment. Those databases were updated daily from the company’s legacy systems and external sources. The company’s data warehouse was, in turn, updated weekly from the Oracle databases. While that system was functional, it had a few drawbacks. Company leaders knew that unless they upgraded their systems to deliver greater speed, agility, and flexibility, they ran the risk of falling behind on the innovation curve.

Because the company’s existing systems relied upon a fixed schedule of daily and weekly updates, there was a considerable delay between the initial receipt of data and the time it eventually became available within the analytics platform. In a competitive environment where customers increasingly expect near real-time results, that presented a significant problem.
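To put a rough number on that delay, here is a minimal sketch in Python. It assumes the two refresh cycles described above (a daily load into the operational databases, then a weekly load into the warehouse) run on independent schedules and ignores the time the jobs themselves take; the function name is illustrative, and the article does not give the company's actual timings.

```python
from datetime import timedelta


def worst_case_latency(*refresh_intervals: timedelta) -> timedelta:
    """Upper bound on end-to-end delay for a chain of scheduled batch loads.

    A record that just misses one load waits a full interval for the next,
    so the worst case is the sum of the intervals (job run time ignored).
    """
    return sum(refresh_intervals, timedelta())


daily_database_load = timedelta(days=1)        # legacy systems -> databases
weekly_warehouse_refresh = timedelta(weeks=1)  # databases -> data warehouse

delay = worst_case_latency(daily_database_load, weekly_warehouse_refresh)
print(f"Worst-case wait before data reaches analysts: {delay.days} days")
```

Under those assumptions, a record that narrowly misses both loads could wait up to eight days before it is available for analysis.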

Also, the company’s data warehouse lacked flexibility. If a new type of analysis was to be performed, a new schema was required. That, in turn, meant that a database administrator would have to design and implement the change. Very often, the company experienced a backlog of such change requests, which resulted in further delays.

Symphony Health case study

Symphony Health provides data science for the healthcare industry. It built an efficient process that minimizes data latency, reduces costs, and delivers usable data to data scientists and results to customers. Now, instead of analysts waiting days, data is available for analysis within minutes of its arrival. Read the case study to learn more.

Achieving scalability and agility with Hadoop

The company decided to transition its data management and analytics processes to Apache Hadoop. In doing so, the company opened up a range of big data management and analytics capabilities that were previously unavailable to it.

The company also empowered its analysts to design and implement schemas directly within Hadoop, rather than relying on a database administrator to do the job for them. And because Apache Hadoop is open source, the company was able to dramatically reduce its costs, both for proprietary database technology and for the specialized hardware needed to run it. Instead, the company moved to industry-standard commodity hardware, significantly reducing storage costs, hardware costs, and software license fees.
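One common way this kind of analyst-driven flexibility is achieved on Hadoop-style platforms is "schema on read": raw records are stored as they arrive, and each analysis applies its own schema when it reads them. The article does not say this is the exact mechanism the company used, so treat the following Python sketch as an illustration only. The sample records, field names, and the read_with_schema helper are all hypothetical.

```python
import json

# Raw records kept exactly as they arrived (hypothetical sample data).
RAW_LINES = [
    '{"claim_id": "c1", "drug": "A", "qty": "30", "state": "PA"}',
    '{"claim_id": "c2", "drug": "B", "qty": "60"}',
]


def read_with_schema(lines, schema):
    """Project each raw record onto `schema` at read time.

    `schema` maps a field name to a converter such as int or str.
    Fields missing from a record come back as None.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            row[field] = convert(value) if value is not None else None
        rows.append(row)
    return rows


# Two analyses define two different schemas over the same stored data,
# with no change to how the data is stored.
quantity_view = read_with_schema(RAW_LINES, {"drug": str, "qty": int})
region_view = read_with_schema(RAW_LINES, {"claim_id": str, "state": str})
```

The point of the sketch is that adding a new type of analysis means writing a new schema mapping, not waiting for a change to the storage layer.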

However, a key barrier to success remained. The company’s existing data integration processes were too slow, largely because they required data to be processed on an edge node before loading. That approach was inefficient, and the company preferred to use the parallel processing capabilities of Hadoop.
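The difference between the two approaches can be sketched structurally in Python. This is not Hadoop or Precisely Connect code: the transform step is a hypothetical placeholder, and threads merely stand in for cluster worker nodes, so the sketch shows how the work is divided rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(record: str) -> str:
    """Hypothetical per-record cleanup step."""
    return record.strip().upper()


def process_on_one_node(records):
    """Edge-node style: a single worker handles every record in sequence."""
    return [transform(r) for r in records]


def process_in_partitions(records, partitions=4):
    """Cluster style: split the input into contiguous partitions and
    transform each partition independently, then stitch results together."""
    size = max(1, -(-len(records) // partitions))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        # map() returns partition results in input order.
        results = pool.map(process_on_one_node, chunks)
        return [row for chunk in results for row in chunk]


sample = [" a ", "b", " c", "d ", "e"]
same_output = process_in_partitions(sample, partitions=2) == process_on_one_node(sample)
```

Both functions produce the same output; what changes is that the partitioned version has no single node through which every record must pass before loading.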

The healthcare and pharmaceutical data provider chose to redesign its ETL processes around Precisely Connect and Hadoop. Connect has enabled the company to scale back its use of Oracle databases, resulting in substantially lower hardware and software costs. In addition, the company can now handle incoming data using Hadoop’s parallel processing capabilities. Connect license fees are lower than those of the legacy integration product the company was using, so switching resulted in even more cost savings.

Because Precisely Connect can integrate with multiple products and platforms, analysts at the healthcare and pharmaceutical data provider now have the flexibility to push data to analytics tools that might otherwise be unavailable to the company. For example, with prebuilt connectivity to Amazon Redshift, analysts can deliver results to their clients on AWS or other cloud platforms.

That flexibility can easily be extended to additional platforms, given Connect’s versatility and predefined integration capabilities for Cloudera, Databricks, Snowflake, Azure Synapse Analytics, Apache Spark, and more. Likewise, source data may reside in a multitude of different systems and databases, including mainframes, conventional relational database systems such as Oracle, SQL Server, MySQL, or DB2, and enterprise data warehouses such as Teradata, IBM Netezza, Vertica, or Greenplum.

The net result is that the healthcare and pharmaceutical data provider now has a robust “anything to anything” data integration platform that provides a clean visual mapping interface for designing and administering the ETL process. Precisely Connect gives the company speed, efficiency, accuracy, and flexibility. It is an essential tool in the collection of technologies that now enable the company to process large volumes of data in near real time.

The data integration imperative

With the arrival of robust big data analytics tools and data management platforms like Hadoop, Hive, and Snowflake, healthcare leaders understand that there is tremendous untapped value hidden within their data. However, the first step toward making sense of all that information is to bring it together under one roof and make it available for analysis.

By choosing an enterprise-grade integration platform like Precisely Connect to manage ETL processes, companies gain immediate access to multiple integration points and platforms, opening the door to fast, flexible integration and setting the stage for robust analytics and better-informed business decisions.

To learn more about how Precisely Connect is helping companies in the healthcare industry deliver better results faster, download the Symphony Health case study today.