Blog > Data Enrichment > 4 Steps to Straightforward Data Enrichment in Snowflake Without Heavy Spatial Processing

4 Steps to Straightforward Data Enrichment in Snowflake Without Heavy Spatial Processing

Authors Photo Colin Mattison | June 10, 2022

Snowflake users leverage faster data processing and efficient storage every day while using solutions customized to their needs. With that in mind, a hot topic in data warehousing today is how to leverage location that is already present in business data. Considering our IoT devices and mobile phones, location is embedded in most data, but that has historically been left to those with specialized skills and systems for spatial processing, like Geographic Information Systems (GIS).

While some industries have recognized the strategic advantage of incorporating spatial analysis into their strategy, many are still early in their location data journey.  Additionally, these businesses may not have the in-house geospatial expertise and tools required to provide actionable information to data scientists or business analysts.  As the telecommunications and technology sectors have shown, the geographic coordinates our phones capture can be leveraged to deliver better consumer experiences and reduce costs.  The trick is to transform that raw data into something valuable and accessible.

Leveraging Precisely’s data enrichment portfolio directly within Snowflake can open the world of spatial processing to a broad range of users regardless of their geospatial experience.

spatial processing

Finding data you can trust – Available today in the Snowflake Data Marketplace

Traditionally it has been hard for businesses to source current, clean, and trusted spatial data while making it accessible to a broad segment of its employees. To make location data more accessible and usable, Precisely has teamed with Snowflake, providing fast and easy data enrichment with pre-processed spatial information. Using Precisely’s data enrichment portfolio, Snowflake users can access a variety of city and postal boundaries, points of interest, property attributes, and addresses while working in a familiar SQL environment.

How does this work?

Many organizations have customer addresses within their business systems. To streamline working with address data while ensuring accuracy, Precisely has assigned every address in the United States a unique and persistent identifier – we call it the PreciselyID. Let’s look at how a business can use this ID to inform business decisions by spatially enhancing its in-house addresses list.

Simplifying geospatial data enrichment – a real-world example

An insurance company has noted elevated costs and customer churn in their auto line of business in Oklahoma. As their customer data is stored in Snowflake, the insurer would like to identify those policyholders with automobiles living in areas that commonly experience severe hail events. Additionally, the company would like to identify customers in areas with limited access to repair facilities. These folks would have more trouble getting the damage repaired–especially at a fair price.

The goal is to identify policies with potentially high repair costs from severe hail events and low satisfaction rates due to long repair times. Through the insights gained from this analysis, the company can take steps to reduce costs and increase customer satisfaction.

Enrich in-house data with spatially referenced weather, property, and business content.

Using existing customer addresses we will add hailstorm severity, associated damage, and TORRO intensity scale from Precisely weather data, a to identify those policyholder locations with a high risk of severe hail damage. Simply use SQL directly in Snowflake and eliminate the need for complicated spatial processing.

Step 1

We will first use the company’s list of insured addresses and join them with Precisely’s Address Fabric. This will show which addresses are residential and will add the PreciselyID. (Assumption: The address list has been cleansed and validated. Do you need tools to help with this process? Precisely has best-in-class solutions to quickly clean up your data.)

SQL Statement

SELECT *

FROM PRECISELY.PUBLIC.ADDRESS_FABRIC_USA INNER JOIN CLIENT.PUBLIC.ADDRESS_FABRIC_USA.PBKEY

ON PRECISELY.PUBLIC.ADDRESS_FABRIC_USA.PBKEY = CLIENT.PUBLIC.ADDRESS_FABRIC_USA.PBKEY

WHERE STATE = ‘OK’ AND PROP_TYPE – ‘R’

Step 2

Now that the customer address list has been joined to Precisely’s Address Fabric, the PreciselyID can be leveraged to enrich those locations with additional content. Using the PreciselyID, we associate each address with historical weather information. Below you see an example of a “left join” to identify properties that have experienced previous severe hail events. The PreciselyID in both the customer data and Weather Data can now easily correlate detailed information from historical hail events. In this step, we will query Precisely’s Historical Weather Risk:

SQL Statement

SELECT *

FROM PRECISELY.PUBLIC.HISTORICAL_WEATHER_RISK_USA LEFT OUTER JOIN CLIENT.PUBLIC.ADDRESS_FABRIC_USA.PBKEY

ON PRECISELY.PUBLIC.HISTORICAL_WEATHER_RISK.PBKEY = CLIENT.PUBLIC.ADDRESS_FABRIC_USA.PBKEY

WHERE HAIL_RISKLEVEL = ‘High’;

Step 3

Property details can inform the relative risk associated with a policy. In this case, let’s find those policies with no garage available to protect an automobile from hail damage. This is achieved by another “left join” to Precisely’s Property Attributes file via the PreciselyID.

SQL Statement

SELECT*

FROM PRECISELY.PUBLIC.PROPERTY_ATTRIBUTES_ASSESSMENT_USA

WHERE PROP_GARAGE = 0;

Step 4

As mentioned previously, the analysts feel that high dissatisfaction rates could be associated with long wait times for repairs. To test this theory, it will be helpful to understand the availability of repair facilities relative to policyholder locations. Using Precisely’s spatially enabled business data, World Points of Interest Premium, access company data such as contact information, employee count, annual revenue, legal status, corporate hierarchy, industrial classification, and more. Eighty-four curated attributes offer a comprehensive, spatially referenced menu to choose from.

We will filter out the number of body shops enclosed by each ZIP Code.

SQL Statement

SELECT PRECISELY.PUBLIC.WORLD_POINTS_OF_INTEREST_PREMIUM_USA.POSTCODE,COUNT(SIC8)

FROM PRECISELY.PUBLIC.WORLD_POINTS_OF_INTEREST_PREMIUM_USA

WHERE SIC8 = ‘75320000’ AND STABB = ‘CA’

GROUP BY POSTCODE;

Results from these queries?

Through a few simple SQL queries with datasets available within the Snowflake Data Marketplace, the analysts were able to quickly identify customers at highest risk of auto damage and limited access to local repair shops. The business can incentivize these high-risk policyholders to protect their cars or face a price increase at renewal.

As illustrated through the process above, we can see how Precisely data enables Snowflake users to solve business problems by enriching a very common source of customer data, the policy address, with pre-processed spatial data. This offers data scientists and business analysts who are versed in SQL the ability to add powerful spatial content to their Snowflake workflows without the need for deep expertise in GIS.

Precisely offers a comprehensive data enrichment product portfolio and solutions for

Snowflake users can browse Precisely datasets in the Snowflake Data Marketplace and request free sample shares.