AEP: Data Lake vs. Profile

When starting to work on Adobe Experience Platform, one of the key concepts to understand is how the data is stored and which data stores are used by the different AEP application services.

There are two data storage components: Data Lake and Profile.

All data ingested into AEP is stored in Data Lake. This is not the case for Profile: Schemas and Datasets need to be “Profile Enabled” prior to data ingestion for the data to also flow into Profile.
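The routing rule above can be pictured with a small toy model. This is only a sketch: `Dataset`, `Platform`, and `ingest` are hypothetical names invented for illustration and are not part of any Adobe SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for an AEP dataset (not an Adobe SDK type)."""
    name: str
    profile_enabled: bool = False


@dataclass
class Platform:
    """Toy model of the two stores: Data Lake and Profile."""
    data_lake: list = field(default_factory=list)
    profile: list = field(default_factory=list)

    def ingest(self, dataset: Dataset, record: dict) -> None:
        # Every ingested record lands in Data Lake.
        self.data_lake.append((dataset.name, record))
        # Only records from Profile-enabled datasets also flow into Profile.
        if dataset.profile_enabled:
            self.profile.append((dataset.name, record))


platform = Platform()
crm = Dataset("crm_customers")

platform.ingest(crm, {"email": "a@example.com"})  # before enabling: Data Lake only
crm.profile_enabled = True
platform.ingest(crm, {"email": "b@example.com"})  # after enabling: both stores

print(len(platform.data_lake), len(platform.profile))  # prints: 2 1
```

In this toy model, the record ingested before the Dataset was enabled never reaches Profile, which is why the enablement decision should be made before ingestion starts.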

Data flow when Profile is not Enabled on a Dataset (data only flows into Data Lake)
Data flow when Profile is Enabled on a Dataset (orange arrows show data flows into Profile)

Data Lake contains the full history of your data: your record-based data with all the changes that have occurred over time, as well as all of your event-based data, which flows into the system continually.

Profile Fragments are reconciled when Datasets are enabled for Profile.

In contrast, Profile (a.k.a. Real-Time Customer Profile) keeps only the most recent records and events, which allows it to remain agile in its delivery of profiles for real-time segmentation. When data reaches this stage, the Profile Fragments are unified by the Identity Service and Profile Service to form the most current view of a customer.
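As a rough illustration of how fragments might come together into one current view, here is a minimal sketch. It assumes that identity resolution has already mapped every fragment to a single person ID and that the most recent value simply wins; in AEP the actual behaviour depends on the identity graph and the configured merge policy. The fragment tuples and the `unify` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical fragments: (person_id, timestamp, attributes).
fragments = [
    ("cust-1", 1, {"email": "old@example.com", "city": "Leeds"}),
    ("cust-1", 3, {"email": "new@example.com"}),
    ("cust-2", 2, {"email": "b@example.com"}),
]


def unify(fragments):
    """Merge fragments per person, letting later timestamps overwrite earlier ones."""
    profiles = defaultdict(dict)
    # Apply fragments oldest first so the newest value is written last.
    for person_id, _timestamp, attributes in sorted(fragments, key=lambda f: f[1]):
        profiles[person_id].update(attributes)
    return dict(profiles)


profiles = unify(fragments)
print(profiles["cust-1"])  # prints: {'email': 'new@example.com', 'city': 'Leeds'}
```

The older fragment still contributes `city`, because no newer fragment overrides it, while `email` reflects the latest update.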

Application Services

Profile is a precious resource, so the data sent to it should be managed carefully, considering the use cases for each application service in scope.

Data Architects need to decide on Dataset partitioning at the design stage of the implementation. If incorrect data ever enters Profile, you can use the APIs to delete a dataset or batch from Profile.
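As a sketch of what such a clean-up call could look like, the snippet below only builds the HTTP request and does not send it. The endpoint path, the body keys (`dataSetId`, `batchId`), and the header names reflect my understanding of Adobe's Profile system jobs API and should be treated as assumptions to verify against the official API reference. The function name and all credential values are placeholders.

```python
import json
import urllib.request

# Assumed endpoint; confirm against Adobe's Real-Time Customer Profile API reference.
SYSTEM_JOBS_URL = "https://platform.adobe.io/data/core/ups/system/jobs"


def build_profile_delete_request(*, token, api_key, org_id, sandbox,
                                 dataset_id=None, batch_id=None):
    """Build (but do not send) a request to remove a dataset or batch from Profile."""
    if (dataset_id is None) == (batch_id is None):
        raise ValueError("Provide exactly one of dataset_id or batch_id")
    # Assumed body shape: one key naming the dataset or the batch to delete.
    if dataset_id is not None:
        body = {"dataSetId": dataset_id}
    else:
        body = {"batchId": batch_id}
    headers = {
        "Authorization": f"Bearer {token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        "x-sandbox-name": sandbox,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        SYSTEM_JOBS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholder values only; nothing is sent over the network here.
req = build_profile_delete_request(
    token="<ACCESS_TOKEN>", api_key="<API_KEY>", org_id="<ORG_ID>",
    sandbox="prod", dataset_id="<DATASET_ID>",
)
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to review exactly what would be deleted before anything is executed, for example with `urllib.request.urlopen(req)`.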

For example, imagine that we have a use case where Query Service powers a Power BI report. If any additional fields (not required for RTCDP segments) need to be added exclusively for the purpose of this report, then that data should live in a separate Dataset that is not enabled for Profile.

| Application Service | Primary Data Store |
| --- | --- |
| Customer Journey Analytics | Data Lake |
| Real-Time CDP (RTCDP) | Profile |
| Query Service | Data Lake |
| Journey Orchestration | Profile (Segments & Attributes) |
| Journey Optimizer | Profile (Segments & Attributes) |
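The table above can double as a simple design-time check: a Dataset only needs to be enabled for Profile if at least one of its consuming services reads from Profile. The sketch below encodes that rule; the `PRIMARY_STORE` mapping mirrors the table, while the helper name `should_enable_for_profile` is hypothetical.

```python
# Primary data store per application service, taken from the table above.
PRIMARY_STORE = {
    "Customer Journey Analytics": "Data Lake",
    "Real-Time CDP": "Profile",
    "Query Service": "Data Lake",
    "Journey Orchestration": "Profile",
    "Journey Optimizer": "Profile",
}


def should_enable_for_profile(consumers):
    """Return True if any consuming service uses Profile as its primary store."""
    return any(PRIMARY_STORE[service] == "Profile" for service in consumers)


# A Dataset used only for a Power BI report through Query Service.
print(should_enable_for_profile(["Query Service"]))  # prints: False

# A Dataset that also feeds RTCDP segments.
print(should_enable_for_profile(["Query Service", "Real-Time CDP"]))  # prints: True
```

This matches the Power BI example: fields needed only for reporting belong in a Dataset whose consumers all read from Data Lake.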



