Trusted Generative AI using Precisely and AWS

August 15, 2023

Rochelle Grubbs

Welcome to an era where machines are breaking free from their traditional analytical roles to unleash boundless creativity and innovation. This groundbreaking paradigm shift has revolutionized artificial intelligence (AI), allowing machines to produce original, contextually relevant content that was once the exclusive domain of human creativity.

Generative AI drives this transformation, focusing on machines’ ability to generate meaningful and novel creations. Fueled by powerful machine learning (ML) models known as foundation models (FMs), generative AI taps into vast datasets and patterns to create outputs that mimic or even expand upon the knowledge they were trained on.

The implications of generative AI are vast, driving organizations of all sizes to explore and leverage FMs to transform their businesses and elevate the value they deliver to customers. The quest to harness the full potential of generative AI relies on finding high-performing FMs and trustworthy data to achieve outstanding results for diverse use cases.

In this blog, we’ll explore the transformative impact of generative AI, the crucial role of data integrity – and by extension, data enrichment – in powering these innovative systems, and how real-world applications in PropTech using Precisely and Amazon Web Services (AWS) together can help make generative AI actionable.

Data Integrity and AI

Trusting your data is the cornerstone of successful AI and ML initiatives, and data integrity is the key that unlocks its full potential. Data integrity means having accurate, consistent, and contextually relevant data – the kind of data you can confidently rely on to make informed decisions and drive your business forward.

Yet, achieving data integrity is a complex task, and many organizations struggle with the data challenges that stand in the way. Data often resides in isolated silos, grows stale over time, lacks standardization, may be riddled with duplicates, and cannot deliver location-based insights, all of which diminish its integrity and reliability.

Without data integrity, you risk compromising your AI and ML initiatives due to unreliable insights that do not fuel business value.

But with data integrity, the rewards are immense. Using a solution like the Precisely Data Integrity Suite to achieve and maintain data integrity, you gain more trustworthy and dependable AI results and can make confident data-driven decisions that help you grow the business, move quickly, reduce costs, and manage risk and compliance.

Data enrichment is an essential part of the data integrity journey. Enhancing data with additional information, like points of interest, property attributes, demographics, and risk information, increases the context and relevance of AI models’ outputs. This enrichment process involves various techniques, such as preprocessing, cleaning, and incorporating contextual embeddings.
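
To make the idea concrete, here is a minimal sketch of enrichment as a keyed join between property records and a lookup of additional attributes. The field names (property_id, walkability, crime_index) and the values are hypothetical illustrations, not the schema of any Precisely data product.

```python
from typing import Any

# Hypothetical base records, e.g. from a listings system.
properties = [
    {"property_id": "P-100", "address": "12 Elm St"},
    {"property_id": "P-200", "address": "98 Oak Ave"},
]

# Hypothetical enrichment attributes keyed by the same identifier.
enrichment = {
    "P-100": {"walkability": 82, "crime_index": 35},
}


def enrich(records: list[dict[str, Any]],
           attributes: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the records merged with any matching attributes.

    Records without a match are kept and flagged, so gaps stay visible
    instead of being silently dropped.
    """
    enriched = []
    for record in records:
        extra = attributes.get(record["property_id"])
        merged = {**record, **(extra or {}), "enriched": extra is not None}
        enriched.append(merged)
    return enriched


enriched_properties = enrich(properties, enrichment)
```

Flagging unmatched records, rather than discarding them, is one way to keep coverage gaps in the enrichment data visible to whoever reviews the dataset.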

Fine-tuning large language models on trusted third-party enrichment datasets allows them to learn from domain-specific patterns, making their outputs more accurate and relevant. Human review further ensures the accuracy and relevance of the dataset, addressing potential biases or errors in the training data and preventing misinformation, ethical concerns, security risks, and other negative implications.

Precisely and AWS prioritize data integrity to deliver reliable and accurate results, recognizing it as the secret ingredient that makes AI models truly powerful. When you thoughtfully leverage data enrichment as part of your overall data integrity strategy, you unlock the full potential of AI models, driving transformative solutions across your various domains.

Enhancing Customer Interactions in PropTech with Generative AI

In the fast-paced world of PropTech, data-driven decisions are the norm. But the constant stream of customer queries about properties and surrounding areas can be overwhelming. Powerful machine learning solutions can help tackle these challenges, and Amazon SageMaker JumpStart provides access to algorithms, models, and ML solutions so you can quickly get started.

With Amazon SageMaker JumpStart, ML practitioners can choose from a broad selection of publicly available foundation models. They can deploy foundation models to dedicated Amazon SageMaker instances from a network-isolated environment and customize models using SageMaker for model training and deployment.

However, large language models (LLMs) have certain limitations. Trained on general domain corpora, they might not be as effective on domain-specific tasks. A PropTech chatbot needs precise answers based on specific data rather than generic information. This is where retrieval augmented generation (RAG) comes into play.

RAG is a game-changer that combines the power of LLMs with external knowledge. By retrieving contextually relevant documents from outside the language model and supplying them alongside the prompt at query time, RAG grounds the model’s responses in that data and improves their accuracy.
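
The retrieve-then-prompt pattern can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the documents are invented, retrieval is a simple word-overlap score rather than a real search index, and the prompt template is hypothetical.

```python
import re
from typing import Optional

# Hypothetical context documents; a real system would hold enriched
# property data in a search index.
documents = [
    "12 Elm St has a walkability score of 82 and a low crime index.",
    "98 Oak Ave is a 20 minute commute from downtown by train.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str]) -> Optional[str]:
    """Return the document sharing the most tokens with the question."""
    q = tokenize(question)
    best = max(docs, key=lambda d: len(q & tokenize(d)), default=None)
    if best is None or not (q & tokenize(best)):
        return None  # nothing overlaps, so there is no usable context
    return best


def build_prompt(question: str, context: Optional[str]) -> str:
    """Place the retrieved context ahead of the question, if any was found."""
    if context is None:
        return f"Question: {question}\nAnswer:"
    return (
        "Answer using only the context below.\n"
        f"Context: {context}\n"
        f"Question: {question}\nAnswer:"
    )


question = "What is the walkability score of 12 Elm St?"
prompt = build_prompt(question, retrieve(question, documents))
```

The resulting prompt is what would be sent to the LLM; the model itself is unchanged, and only its input carries the external knowledge.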

PropTech companies can leverage LLMs with RAG to access even richer and more robust information about a property. By simply asking a chatbot a question, they can receive precise and up-to-date responses about property details, neighborhood safety, and demographics. This integration of RAG streamlines the process, resulting in the ability to serve customers faster, research property information more efficiently, and ultimately increase sales and profit.

This is another instance where data enrichment can be harnessed to enhance the value and quality of data used for generative AI models – and ultimately help produce greater context and relevance in the models’ outputs. Enriching data with additional attributes and variables, like points of interest and demographics, helps ensure accurate, contextually grounded responses to customer inquiries.

Together, data enrichment and RAG form a powerful pair that unleashes the full potential of generative AI – and they’re also crucial pillars in generative AI’s evolution. As the PropTech industry continues to leverage ML solutions, integrating these powerful techniques will revolutionize customer interactions, streamline research processes, and ultimately boost business outcomes.

How Precisely and AWS Together Can Help Make Generative AI Actionable

To illustrate the possibilities, we’ll look at a prototype developed by Precisely and AWS that uses the unique attributes of Precisely data products like Context Crime Index, Context Commuter Score, Context Demographics, Context Walkability, Community Link, and Market Link to feed a generative AI model.

The diagram below represents a reference architecture of how Precisely and AWS can deliver trusted generative AI solutions using LLMs with RAG.

How do you deliver a solution like the sample above?

Here’s a step-by-step guide to the process and interfaces between various Precisely and Amazon solutions that would make it possible.

1. Amazon CloudWatch is scheduled to invoke AWS Lambda at a set interval. This starts the Amazon EC2 instance with Automatic Data Downloader, which monitors changes and downloads reference data to the customer’s Amazon S3.
2. Precisely data is loaded into Amazon OpenSearch.
3. The user asks a question (prompt) to the generative AI application. This application makes an LLM data-aware by connecting it to Precisely data sources (context) hosted in Amazon OpenSearch.
4. The generative AI application executes a search query on Amazon OpenSearch using the user prompt.
5. If a match is found, Amazon OpenSearch returns the most relevant record to the AWS Generative AI SDK.
6. The generative AI application passes this record as context, along with the user prompt, to the large language model (LLM) hosted as an Amazon SageMaker endpoint.
7. The LLM processes the request from the generative AI application and returns the result.
8. The generative AI application gets the result from the LLM and sends it to the end user.
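
The query-time part of the steps above, from the user's question through the returned answer, could be sketched roughly as follows. This is not the prototype's actual code: the index field name, the prompt layout, and the two stand-in functions are assumptions for illustration. Only the search body and the response shape follow the standard OpenSearch query DSL.

```python
import json
from typing import Callable, Optional

# Hypothetical index field name; the real mapping depends on how the
# reference data was loaded.
TEXT_FIELD = "description"


def build_search_body(prompt: str) -> dict:
    """Build an OpenSearch query DSL body asking for the single best match."""
    return {"size": 1, "query": {"match": {TEXT_FIELD: prompt}}}


def top_hit(response: dict) -> Optional[dict]:
    """Extract the first hit's source document, or None if nothing matched."""
    hits = response.get("hits", {}).get("hits", [])
    return hits[0]["_source"] if hits else None


def answer(prompt: str,
           search: Callable[[dict], dict],
           generate: Callable[[str], str]) -> str:
    """Search with the prompt, attach the record as context, call the LLM."""
    record = top_hit(search(build_search_body(prompt)))
    context = json.dumps(record) if record is not None else "No matching record."
    llm_input = f"Context: {context}\nQuestion: {prompt}"
    return generate(llm_input)


# Stand-ins for the OpenSearch client and the SageMaker endpoint call.
def fake_search(body: dict) -> dict:
    return {"hits": {"hits": [
        {"_source": {"address": "12 Elm St", "walkability": 82}}
    ]}}


def fake_generate(llm_input: str) -> str:
    return f"[model output for] {llm_input}"


result = answer("How walkable is 12 Elm St?", fake_search, fake_generate)
```

In a deployment, search would wrap a call to the OpenSearch domain and generate would wrap a request to the SageMaker endpoint; the request and response payloads for the endpoint depend on the model that is deployed. Passing both in as functions keeps the orchestration testable without any AWS resources.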

Generative AI Marks a Significant Paradigm Shift

Generative AI enables machines to produce novel and contextually relevant content. With advancements in ML and the development of powerful generative AI models, we’ve witnessed a tremendous transformation in AI capabilities.

However, the true power of generative AI can only be fully realized with data integrity.

Trusting the data that feeds these models is crucial for delivering reliable, accurate results that empower organizations to make well-informed, data-driven decisions. Achieving data integrity is no easy feat, but with advancements in techniques, vast and diverse data availability, and cloud computing capabilities, we’re paving the way to ensure it’s possible for organizations like yours.

The PropTech industry is a prime example of how generative AI can revolutionize business processes. PropTech brokerages can provide customers with precise and up-to-date information by combining publicly available LLMs in Amazon SageMaker JumpStart with context derived from Precisely data products.

Data enrichment further enhances the context and relevance of the model’s outputs, enabling brokers and agents to serve customers faster and more efficiently, ultimately leading to increased sales and profit for the agency.

We’re living in the era of AI-driven innovation. As generative AI continues to evolve, the integration of data integrity with powerful data enrichment capabilities and RAG will play a pivotal role in unlocking the true potential of AI models across various industries. When you embrace these powerful techniques, you elevate your AI initiatives to new heights, delivering trustworthy and dependable results that propel your business toward success.

We encourage you to learn more by exploring SageMaker JumpStart and the Precisely Data Integrity Suite. Or build a solution of your own using the sample implementation provided in this post and a Precisely dataset relevant to your business.

This article has been co-authored by