Peter Wayner
Contributing writer

Data as a service: Top vendors offering data on tap

Feature
Apr 14, 2022

Enterprises looking to supplement their data-driven decisions or digital services with data they might not otherwise have on hand have a growing number of options to turn to.


With data-driven decisions and digital services at the center of most businesses these days, enterprises can never get enough data to fuel their operations. But not every bit of data that could benefit a business can be readily produced, cleansed, and analyzed by internal means. Enter data-as-a-service providers: entities that offer data on tap, for a fee, for your enterprise to use.

Who needs data as a service (DaaS)? Any enterprise that craves data and needs it to be trustworthy, loyal, helpful, or able to fill any of many other useful roles. Sometimes the data DaaS vendors offer comes from their own inner workings and business operations. Sometimes it comes from external, often open, sources that the vendor gathers together to help enterprises use data assets they might otherwise be unable to handle themselves.

DaaS offerings have been evolving for decades, but lately developers have recognized that a cloud model, with its flexible, usage-based pricing, could more readily help connect enterprises with data sources the vendors seek to monetize. And it’s not just about the data on offer itself. DaaS vendors can also improve the quality of data that an organization might otherwise gather itself by correcting errors or filling in gaps and even provide big blocks of data should you need more. In this way, DaaS providers can improve your homegrown data warehouse by cross-fertilizing it with other, curated sources.

The area is growing rapidly. Some DaaS vendors emphasize the ability of their tools to manage information, analyze the data, create reports, and support decision making. Others push the data itself, knowing that, as with being too rich or too thin, no one ever believes they have too much. Everyone is in the market for more information about their competitors, their customers, their internal operations, and the world at large.

Many of the tools also follow the current fashion of making development simpler and smarter. Low-code and no-code options make it easier for anyone to click a few buttons and produce a report or download a spreadsheet loaded with data, all without setting up an endless series of meetings with the developers. The companies also emphasize their connections to good artificial intelligence algorithms and data science options.

Here are a number of options available to help fulfill your DaaS needs.

Cloud providers

All the major cloud companies maintain a big collection of open data sets for their customers. In many cases, the data is free and provided as an incentive to use the provider’s computing services. The data is usually already converted to the provider’s own formats for easy integration with your code, and sometimes it is improved along the way. Data sets include many of the big government collections, such as weather data, as well as some surprises. Azure Open Datasets, for instance, includes census data and crime data as well as some data sets focused on understanding global climate change. AWS Open Data includes a variety of genomic data and the Common Crawl, a collection of 50 billion web pages. Google Cloud’s Datasets include patents, weather information, and also Google’s own data produced by tracking searches and web analytics.

Credit agencies

Three major companies — Experian, TransUnion, and Equifax — track how all of us borrow and pay back loans in an effort to compute scores that purport to measure how well we can be trusted in the future. In the past, the scores themselves were rather mysterious and hidden, but lately banks and credit card companies are sharing the scores directly with customers in an effort to encourage better behavior.

The credit agencies themselves aren’t content to work just with lenders. Equifax, for instance, wants to tackle bigger problems such as workforce management, fraud, identity theft, and marketing. Knowing how much people make and how they spend and pay back loans may help answer a variety of questions for industries as diverse as healthcare, automotive, manufacturing, and retail.

Now, the credit agencies are exploring new ways to deliver answers. Equifax Ignite, for instance, is a cloud-based tool that lets you analyze Equifax’s data without the personally sensitive information leaving Equifax’s machines. It produces sophisticated analytics under several layers of security and compliance.

Enigma

Tracking the growth and development of every small business in the world is not easy. Enigma gathers information from a variety of government agencies and open sources before mixing in anonymized transaction-level details provided by banks that issue credit and debit cards. If money talks, then understanding cash flows is the fastest way to understand the nature of a business.

HIRinfotech

Much of the information you might want is often already available on websites. HIRinfotech specializes in scraping it into databases and then analyzing it. The company collects data about pricing and products across dozens of industries such as travel and financial services. Enterprises can work directly with the data and reports or build similar ones using some of the robotic process automation (RPA) tools integrated with the scraped information.

Informatica

Marketing teams that need clean, up-to-date contact information can turn to Informatica to organize and update their contact lists. The company’s service combines verification and enrichment. First, addresses and phone numbers are double-checked against address databases and National Do-Not-Call databases. Then Informatica adds details from trusted sources of business and consumer data to build an enhanced dossier on each contact.
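The verify-then-enrich flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Informatica’s service or API: the `Contact` record, the reference sets, and the `trusted_source` lookup keyed by phone number are all hypothetical stand-ins for the address, Do-Not-Call, and business/consumer databases a real vendor would consult.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    # Hypothetical contact record; field names are illustrative only.
    name: str
    phone: str
    postal_code: str
    flags: list = field(default_factory=list)  # problems found during verification
    extra: dict = field(default_factory=dict)  # details added during enrichment


def verify(contact, valid_postal_codes, do_not_call):
    """Step 1: check the record against reference lists and flag problems."""
    if contact.postal_code not in valid_postal_codes:
        contact.flags.append("bad_postal_code")
    if contact.phone in do_not_call:
        contact.flags.append("do_not_call")
    return contact


def enrich(contact, trusted_source):
    """Step 2: add fields from a trusted source without overwriting existing ones."""
    for key, value in trusted_source.get(contact.phone, {}).items():
        contact.extra.setdefault(key, value)
    return contact


if __name__ == "__main__":
    c = Contact("Ada", "555-0100", "02139")
    verify(c, valid_postal_codes={"02139"}, do_not_call={"555-0199"})
    enrich(c, {"555-0100": {"employer": "Example Corp"}})
    print(c.flags, c.extra)  # [] {'employer': 'Example Corp'}
```

Enrichment uses `setdefault` so fields the team already holds are never overwritten by third-party data; a real pipeline would also need rules for resolving conflicts between sources.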

Oracle DaaS

Marketers craving better sales intelligence and opportunities to open up lines of communication are the primary targets for Oracle’s DaaS product. The DaaS database maintains up-to-date information on primary and secondary contacts for a wide range of businesses. Rather than leaving you to struggle to keep your Rolodex current, the tool imports new and updated names and contacts into your software. If you use other Oracle tools, such as Eloqua, the import path is already debugged.

Precisely

Developers who need information about places on a map and the people living there turn to Precisely. Its Demographics APIs, for instance, take an address or location and return a set of aggregated statistics about the people and households within the search radius. Residential and commercial real estate parcels are tracked with the Property API. Some enterprises use the data for real estate transactions and store-location planning, but others use the database to simplify the checkout process for online retail by finding an accurate address with the Typeahead lookup. The company also builds a collection of data processing tools to simplify the development of better analytics.
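To make “aggregated statistics within the search radius” concrete, here is a sketch of the underlying idea, assuming a hypothetical list of household records with coordinates, size, and income. It is not Precisely’s Demographics API; it simply filters households by great-circle (haversine) distance and returns only aggregates, never individual records.

```python
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def demographics_within(lat, lon, radius_km, households):
    """Return aggregate statistics for hypothetical household records inside the radius."""
    nearby = [h for h in households
              if haversine_km(lat, lon, h["lat"], h["lon"]) <= radius_km]
    if not nearby:
        return {"households": 0, "people": 0, "mean_income": None}
    # Only totals and averages leave this function, not the matching records.
    return {
        "households": len(nearby),
        "people": sum(h["size"] for h in nearby),
        "mean_income": mean(h["income"] for h in nearby),
    }
```

A production service would typically use a spatial index rather than scanning every household, and might decline to answer when too few households fall inside the radius.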

RTI

The US Census Bureau locks up responses for 72 years to protect the privacy of Americans, and that can be a long time to wait to do any data analysis. RTI has taken a different approach. Instead of delivering information about real individuals, it has created a synthetic data set that mimics the real data in many important ways. If there are 58 people in a block in the real census, you’ll find close to 58 entries in the synthetic data set, along with made-up details that try to approximate the real values. Anyone trying to analyze the census data can run their algorithms without worrying about personal data. The answers might not be exactly the same as using the real thing, but for many questions the answers will be close enough. And that’s better than waiting 72 years.
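RTI’s actual method isn’t described here, so the following is only a naive illustration of the idea under stated assumptions: for each block, generate roughly as many made-up records as the real count, with attributes drawn from block-level proportions (`age_distribution` is a hypothetical aggregate input). Real synthetic-data methods also try to preserve relationships between attributes, which this toy ignores.

```python
import random


def synthesize_block(real_count, age_distribution, rng, jitter=2):
    """Generate made-up person records for one census block.

    real_count: number of people the real census counts in the block.
    age_distribution: block-level share of residents per age bracket
        (an aggregate; no individual responses are used).
    jitter: maximum amount the synthetic count may differ from the real one.
    """
    # Keep the synthetic head count close to, but not exactly, the real one.
    n = max(0, real_count + rng.randint(-jitter, jitter))
    brackets = list(age_distribution)
    weights = [age_distribution[b] for b in brackets]
    # Each synthetic person gets an age bracket drawn from the block's proportions.
    ages = rng.choices(brackets, weights=weights, k=n)
    return [{"age_bracket": a} for a in ages]


rng = random.Random(2022)  # fixed seed so the example is reproducible
block = synthesize_block(58, {"0-17": 0.25, "18-64": 0.60, "65+": 0.15}, rng)
print(len(block))  # somewhere between 56 and 60
```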

Snowflake

Companies with data turn to Snowflake to store and analyze it instead of building their own infrastructure. The company offers a scalable, maintenance-free option that ingests structured and semi-structured data and then offers a variety of standard reporting and AI services. The Data Marketplace also enables users to buy and sell their data to improve the quality of insights through cross-fertilization. Some of the featured data sets include market research from MSCI or S&P Global and COVID epidemiological data from Knoema or Starschema. There is a wide array of data sets covering a diverse range of topics, from demographic studies to marketing, media, and sports, including fantasy football.

Streetlight Data

Organizations involved in city planning and designing transportation networks need to understand what residents are doing on city streets. Streetlight Data uses anonymized cell phone records and government sources to build a detailed model of when and where people move throughout the city. With Streetlight Data, enterprises can get accurate measurements of people flow without having to build out their own sensor networks.
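One common way to turn location records into people-flow measurements without exposing individuals is to count trips per origin, destination, and time window and suppress sparsely populated cells. The sketch below assumes hypothetical trip tuples with device identifiers already stripped and an arbitrary `min_count` threshold; it is not Streetlight Data’s pipeline.

```python
from collections import Counter


def hourly_flows(trips, min_count=5):
    """Count trips per (origin zone, destination zone, hour) and suppress small cells.

    trips: iterable of (origin_zone, destination_zone, hour) tuples that carry
        no device or person identifiers (a hypothetical, pre-anonymized feed).
    min_count: cells with fewer trips are dropped so rare journeys are harder
        to trace back to individuals; the threshold of 5 is arbitrary.
    """
    counts = Counter(trips)
    return {cell: n for cell, n in counts.items() if n >= min_count}


trips = [("Downtown", "Airport", 8)] * 6 + [("Downtown", "Harbor", 8)] * 2
print(hourly_flows(trips))  # {('Downtown', 'Airport', 8): 6}
```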

Synthesis AI

Generally, DaaS companies gather real information about the world. Synthesis AI, however, creates its data using the same kinds of 3D models and CGI techniques that power video games and Hollywood action movies. If you want to train your machine-vision routines, perhaps to build an autonomous car, you can generate as many test cases as you need. Perhaps your algorithm needs testing on a street full of drunk pedestrians at Mardi Gras? Or maybe a scene at twilight just after a theater lets everyone out? Or maybe you just worry about the ethical issues of working with video footage of real-life children? Synthetic data can be produced faster and more comprehensively than anything you can shoot with a film crew.


Peter Wayner is the author of more than 16 books on diverse topics, including open source software ("Free for All"), autonomous cars ("Future Ride"), privacy-enhanced computation ("Translucent Databases"), digital transactions ("Digital Cash"), and steganography ("Disappearing Cryptography").
