Big Data 101: Dummy’s Guide to Batch vs. Streaming Data

November 16, 2022

Christopher Tozzi

Are you trying to understand big data and data analytics, but confused by the difference between stream processing and batch processing? If so, this article’s for you!

Batch processing	Stream processing
Data is collected over time	Data streams continuously
Once data is collected, it’s sent for processing	Data is processed piece-by-piece
Batch processing is lengthy and is meant for large quantities of information that aren’t time-sensitive	Stream processing is fast and is meant for information that’s needed immediately

Batch processing vs. stream processing

The distinction between batch processing and stream processing is one of the most fundamental principles within the big data world. There is no official definition of these two terms, but when most people use them, they mean the following:

Under the batch processing model, a set of data is collected over time, then fed into an analytics system. In other words, you collect a batch of information, then send it in for processing.
Under the streaming model, data is fed into analytics tools piece-by-piece. The processing is usually done in real time.
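
The two models can be illustrated with a minimal sketch. The sample amounts and the function names `process_batch` and `process_stream` are assumptions made up for illustration; they do not come from any particular analytics platform. The same total is computed either once, after all the data has been collected, or incrementally as each record arrives:

```python
from typing import Iterable, Iterator


def process_batch(records: list[float]) -> float:
    """Batch model: the whole data set is collected first, then processed once."""
    return sum(records)


def process_stream(records: Iterable[float]) -> Iterator[float]:
    """Streaming model: each record is processed as it arrives,
    yielding an up-to-date running total after every record."""
    total = 0.0
    for amount in records:
        total += amount
        yield total


# Hypothetical sample data: four transaction amounts.
amounts = [20.0, 5.0, 12.5, 2.5]

batch_total = process_batch(amounts)                   # one result, available at the end
running_totals = list(process_stream(iter(amounts)))   # one result per record
```

The batch function returns a single answer only after it has seen everything, while the streaming function has a usable answer after every record.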

Those are the basic definitions. To illustrate the concept better, let’s look at the reasons why you’d use batch processing or streaming, and examples of use cases for each one.

Batch processing purposes and use cases

Batch processing is most often used when dealing with very large amounts of data, and/or when data sources are legacy systems that are not capable of delivering data in streams.

Data generated on mainframes is a good example of data that, by default, is processed in batch form. Accessing and integrating mainframe data into modern analytics environments takes time, which in most cases makes it unfeasible to turn it into streaming data.

Image: Bills are processed in batches.

Batch processing works well when you don’t need real-time analytics results, and when processing large volumes of information matters more than getting results quickly. That said, batch processing is not a strict requirement for working with large amounts of data; data streams can involve “big” data, too.

Use cases for batch processing:

Payroll
Billing
Orders from customers

Stream processing purposes and use cases

Stream processing is key if you want analytics results in real time. By building data streams, you can feed data into analytics tools as soon as it is generated and get near-instant analytics results using platforms like Spark Streaming.

Stream processing is useful for tasks like fraud detection. If you stream-process transaction data, you can detect anomalies that signal fraud in real time, then stop fraudulent transactions before they are completed.
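
As a rough sketch of the idea, the example below flags a transaction as soon as it arrives if it is far larger than the same account's earlier transactions. The rule (a multiple of the running mean), the `Transaction` type, the `flag_suspicious` function, and the sample data are all illustrative assumptions; real fraud detection relies on much richer signals.

```python
from collections import defaultdict
from typing import Iterable, Iterator, NamedTuple


class Transaction(NamedTuple):
    account: str
    amount: float


def flag_suspicious(
    transactions: Iterable[Transaction],
    factor: float = 5.0,
    min_history: int = 3,
) -> Iterator[Transaction]:
    """Yield each transaction whose amount exceeds `factor` times the running
    mean of the same account's earlier, unflagged transactions. The decision
    is made per record, without waiting for the rest of the stream."""
    count: dict[str, int] = defaultdict(int)
    total: dict[str, float] = defaultdict(float)
    for tx in transactions:
        n = count[tx.account]
        if n >= min_history and tx.amount > factor * (total[tx.account] / n):
            yield tx   # a real system could hold the payment at this point
            continue   # keep flagged amounts out of the account's history
        count[tx.account] += 1
        total[tx.account] += tx.amount


# Hypothetical transaction stream.
stream = [
    Transaction("alice", 20.0),
    Transaction("alice", 25.0),
    Transaction("bob", 400.0),
    Transaction("alice", 15.0),
    Transaction("alice", 900.0),  # far above alice's running mean of 20.0
    Transaction("alice", 30.0),
]
flagged = list(flag_suspicious(stream))
```

Because the check runs per record, the suspicious payment is identified before any later transactions are read, which is what makes it possible to intervene in time.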

Use cases for stream processing:

Fraud detection
Social media sentiment analysis
Log monitoring
Analyzing customer behavior

Turning batch data into streaming data

As noted, the nature of your data sources plays a big role in defining whether the data is suited for batch or streaming processing.

That doesn’t mean, however, that there’s nothing you can do to turn batch data into streaming data to take advantage of real-time analytics. If you’re working with legacy data sources like mainframes, you can use a tool like Precisely Connect to automate the data access and integration process and turn your mainframe batch data into streaming data.

This can be very useful because by setting up streaming, you can do things with your data that would not be possible using batch processing. You can obtain faster results and react to problems or opportunities before it is too late to act on them.
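
To make the general idea concrete, here is a small sketch that reads a batch extract one record at a time so that downstream code can act on each row immediately. It does not show how Precisely Connect works; the CSV layout, the `stream_records` function, and the sample orders are assumptions made up for illustration. It also only replays an existing extract record by record, whereas a real integration would capture changes at the source as they occur.

```python
import csv
import io
from typing import Iterator, TextIO

# Hypothetical batch extract; in practice this would be a file
# exported from the legacy system.
BATCH_EXPORT = """order_id,amount
1001,19.99
1002,5.00
1003,42.50
"""


def stream_records(source: TextIO) -> Iterator[dict]:
    """Read a batch extract lazily, yielding one parsed record at a time
    instead of loading the whole file before processing begins."""
    for row in csv.DictReader(source):
        yield {"order_id": int(row["order_id"]), "amount": float(row["amount"])}


large_orders = []
for record in stream_records(io.StringIO(BATCH_EXPORT)):
    # Each record is handled as soon as it has been read.
    if record["amount"] > 10:
        large_orders.append(record["order_id"])
```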

Read our whitepaper Streaming Legacy Data for Real-Time Insights for more about stream processing.