
Top 4 Takeaways from Snowflake Summit and Databricks Data + AI Summit

Tendü Yoğurtçu, PhD | August 8, 2022

Summer 2022 is off to a fantastic start, with two major partner events already complete: the Snowflake Summit in Las Vegas and the Databricks Data + AI Summit in San Francisco. These events are always highly inspiring and energizing, and what made both even more special this year was their return to an in-person format.

There’s really nothing like the impromptu conversations with customers and partners, and the whiteboarding with team members that these events afford. These were my first in-person conferences since the beginning of the pandemic, and I was reminded of just how much I missed those special kinds of opportunities.

The innovations are continuing to flow, and I couldn’t be more thrilled about how our partnerships with Snowflake and Databricks will enable us to bring these updates to our customers and propel them to new heights.

Both events left me and my fellow attendees with plenty to reflect on. For today, I’ve rounded up my top four takeaways to share with you.

Key takeaways from the Snowflake and Databricks summits

Takeaway #1: Accessing data from all origins, types, and volumes continues to be an exciting work in progress.

Innovation doesn’t happen without experimentation along the way, and data access and integration aren’t exempt from that rule. Each event strongly demonstrated that the wheels continue to turn in these areas.

Snowflake announced Unistore, with support for hybrid tables that can handle both transactional and analytical workloads. Databricks announced that Delta Lake will be fully open sourced, and that the new version also delivers better query performance.

Bringing analytical and operational workloads together, and integrating all data into these platforms, was a key theme throughout both events. Beyond structured and unstructured data, vendors are also expanding support for geospatial data.

The need for this is something we’re consistently seeing with our customers here at Precisely. Many organizations are integrating their critical data assets from mainframes, IBM i systems, on-premises data warehouses, and SAP into their cloud data platforms for advanced analytics. Then they enrich these data sets with location and point-of-interest data to accelerate and enhance business insights even further.

We’re excited that our joint customers with Snowflake and Databricks can benefit from Precisely’s real-time change data capture, from on-premises transactional data stores to cloud data warehouses, and use our hyper-accurate spatial analytics to make the best, most informed decisions possible.

Takeaway #2: Moving forward, it’s not your father’s data governance anymore.

Both Snowflake and Databricks made announcements around data governance. For Snowflake, the focus was on cell-level security, encryption at rest and in flight, and data masking. Databricks, on the other hand, was concentrated on data privacy, access controls, and updates to their Unity Catalog.

Why such a great emphasis on data security and privacy? In short: the growing complexity of compliance regulations. Federated data governance is gaining traction, giving more power to data teams. Interoperability will be critical as data travels across a broad set of platforms, with cataloging and lineage across these platforms most likely requiring multiple vendors.

Metadata is becoming the new big data challenge among our Precisely customers. Active metadata and identifying relevant data attributes are more important than ever. Some of our customers, for example, may have 50 million data attributes, yet care about fewer than a million of them – the attributes used downstream in the data pipeline, as part of their dashboards or in advanced analytics.
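Narrowing 50 million attributes down to the relevant few is essentially a lineage question: which attributes can reach a consumed endpoint such as a dashboard or model? As a minimal sketch of that idea (an illustrative example, not Precisely’s implementation – the graph shape and attribute names here are assumptions), a reverse-reachability walk over a lineage graph keeps only attributes with downstream consumers:

```python
from collections import deque

def relevant_attributes(lineage, consumed):
    """Walk upstream from consumed endpoints; keep every attribute that
    feeds them, directly or transitively."""
    # Invert the lineage graph: target -> set of source attributes
    upstream = {}
    for src, targets in lineage.items():
        for t in targets:
            upstream.setdefault(t, set()).add(src)
    keep, queue = set(consumed), deque(consumed)
    while queue:
        node = queue.popleft()
        for parent in upstream.get(node, ()):
            if parent not in keep:
                keep.add(parent)
                queue.append(parent)
    return keep

# Hypothetical lineage: attribute -> attributes derived from it
lineage = {
    "raw.orders": {"stg.orders"},
    "stg.orders": {"dash.revenue"},
    "raw.legacy_flag": set(),  # never consumed downstream, so dropped
}
print(relevant_attributes(lineage, {"dash.revenue"}))
# keeps raw.orders, stg.orders, dash.revenue; raw.legacy_flag is excluded
```

Everything unreachable from a consumed endpoint can then be deprioritized in cataloging and quality monitoring.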

We recently announced Data Observability capabilities within our new Precisely Data Integrity Suite. Data observability enables organizations to continuously monitor the health of their data pipelines, assess the downstream and upstream impact of any data drift, receive proactive alerts on anomalies, and get recommended actions. The Data Observability module runs natively in Snowflake or Databricks, and integrates with Precisely’s data governance and quality products.
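To make the anomaly-alerting idea concrete, here is a minimal sketch (an assumed, generic example – not the Data Observability API) of one common approach: flag a pipeline metric, such as a daily row count, when it drifts more than a few standard deviations from its recent history:

```python
from statistics import mean, stdev

def is_anomalous(history, latest, z_threshold=3.0):
    """Flag `latest` if it sits more than z_threshold standard
    deviations away from the mean of the historical values."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is anomalous
    return abs(latest - mu) / sigma > z_threshold

# Hypothetical daily row counts for one pipeline table
daily_rows = [10_100, 9_950, 10_230, 10_080, 9_990, 10_150]
print(is_anomalous(daily_rows, 10_060))  # within normal range -> False
print(is_anomalous(daily_rows, 2_400))   # sharp drop -> True
```

Production observability tools layer seasonality, schema checks, and freshness on top of this, but the core signal is the same: compare today’s metrics against their learned history.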

Takeaway #3: As data becomes a product, data collaboration and sharing becomes critical.

Every company ought to be a data company.

That’s why data collaboration and sharing are becoming more important than ever. This requires treating data as a product across lines of business, as well as integrating your organization’s internal data with external third-party vendor data.

The Snowflake Marketplace combines data and applications that operate on the data. The goal? To make data sharing and discovery easy, so that you can seamlessly collaborate, distribute, and monetize. Any Snowflake user will have the ability to access data from 1,200 data listings and 250 different providers.

The Databricks Marketplace is for data, data models, and data applications. It’s open for everyone, not just for Databricks customers. They have about 50 datasets available.

Precisely’s datasets, including wildfire data, demographics data, and more, are available on Snowflake Marketplace. We plan to leverage the marketplaces for both our rich set of data products, as well as applications like Spatial Analytics and Geo Addressing that operate on these datasets.

Takeaway #4: AI is the future of business.

Databricks built its Lakehouse platform with artificial intelligence (AI) workloads in mind, and continues to make announcements around new MLflow capabilities. Snowflake announced support for Python libraries for ML.

One Databricks keynote session that I found particularly insightful highlighted the intersection of data and AI. Hidden Door’s CEO, Hilary Mason, talked about the challenge of building data products when we don’t have a measure of quantitative correctness. She envisions that the next great data products will be new, creative experiences.

In that same discussion, Peter Norvig, author and AI visionary, emphasized the need to shift the focus for AI pipelines from data models to making the entire pipeline differentiable – i.e., curating the data, having the ability to recommend additions or ways of cleansing the data, making suggestions on how the data should be managed. Ultimately, it’s about building those tools to figure out the end-to-end data pipeline, rather than parts of it. He also touched upon the importance of ethical considerations.

Tying it together with data integrity

At Precisely, our foundation is rooted in data integrity and the ability to empower organizations to make better decisions with data they can trust. If your organization is focused on AI and machine learning (ML) initiatives moving forward, having reliable data with maximum accuracy, consistency, and context to feed those projects is critical.

That’s why our team is constantly listening to our customers’ feedback and finding new ways to innovate and solve evolving business challenges.

Our latest innovation was revealed at Trust ‘22, Precisely’s annual Data Integrity Summit. During this virtual event, we were thrilled to announce our new Data Integrity Suite, a set of seven interoperable modules that allow you to build trust in your data: Data Integration, Data Observability, Data Governance, Data Quality, Geo Addressing, Spatial Analytics, and Data Enrichment.

While data integrity is the destination, the Data Integrity Suite is how we get there. Through our partnerships with Snowflake and Databricks, we continue to help our customers derive value through data, and optimize data operations to get to the next level in their data integrity journey.

What does your own journey look like? Our team is here to help. Find out more about our Data Integrity Suite and contact us today to unlock new possibilities.