Pecan's Glossary

Crack open the shell of predictive analytics with our nutty glossary, packed full of kernel knowledge for all your modeling needs!

Written by Ori Sagi

A

A/B test

A technique to compare two versions of a product, feature, or webpage to assess which one performs better. A/B tests are a type of experiment where two or more variations of a design are randomly shown to users. Data is collected on which design performs better. The goal of A/B testing is to identify changes that can improve a metric of interest.
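
As a sketch (in Python, with made-up visitor and conversion counts), here's one common way to check whether the difference between two variants is statistically meaningful, using a two-proportion z-test:

```python
# A minimal sketch of evaluating an A/B test on conversion rates.
# The counts below are made-up illustration data.
from math import sqrt

from scipy.stats import norm

conversions_a, visitors_a = 120, 2400   # variant A: 5.0% conversion
conversions_b, visitors_b = 156, 2400   # variant B: 6.5% conversion

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b

# Pooled two-proportion z-test: is B's conversion rate significantly higher?
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_b - p_a) / se
p_value = norm.sf(z)  # one-sided p-value

print(f"A: {p_a:.3f}, B: {p_b:.3f}, z = {z:.2f}, p = {p_value:.4f}")
```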

accuracy

In predictive analytics, accuracy is a measure of a predictive model's performance. It's usually expressed as a percentage, calculated by dividing the number of correct predictions by the total number of predictions made.
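
For example, a minimal sketch of the calculation on hypothetical predictions:

```python
# Accuracy = correct predictions / total predictions, on made-up labels.
actual    = ["churn", "stay", "stay", "churn", "stay", "stay"]
predicted = ["churn", "stay", "churn", "churn", "stay", "stay"]

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(f"Accuracy: {accuracy:.0%}")  # 5 of 6 correct -> 83%
```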

adstock

The accumulated effects of past marketing activities on current sales. Adstock is a measure of how much of the current sales are driven by past marketing efforts. This can be determined by measuring the current sales and factoring in the carryover and momentum effects of past campaigns.
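
One widely used formulation is geometric adstock, where each period carries over a fraction of the previous period's accumulated effect. A minimal sketch, with a made-up spend series and an arbitrary decay rate:

```python
# A minimal sketch of one common adstock formulation (geometric decay),
# using a hypothetical weekly ad-spend series and an example decay rate.
def geometric_adstock(spend, decay=0.5):
    """Carry over a fraction of each period's effect into the next."""
    adstocked = []
    carryover = 0.0
    for x in spend:
        carryover = x + decay * carryover
        adstocked.append(carryover)
    return adstocked

weekly_spend = [100, 0, 0, 50, 0]
print(geometric_adstock(weekly_spend))  # [100.0, 50.0, 25.0, 62.5, 31.25]
```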

advanced analytics

Advanced analytics is the use of sophisticated quantitative approaches to data that can not only reveal insights but also provide predictions and offer forecasts. These approaches usually include AI and machine learning, as well as statistical techniques. It can also include prescriptive approaches that offer specific recommendations. Advanced analytics is widely used in many industries, including insurance, e-commerce, retail, banking, healthcare, manufacturing, and more.

algorithm

In the data context, an algorithm is a set of instructions for a computer to bring in input data, manipulate it, perform calculations with it, and generate output. Algorithms used in data science offer preset methods for analyzing data, identifying patterns, and generating predictions. More than one algorithm can often be used to address a predictive analytics question, and choosing the right algorithm is an important part of the process.

analytics

Analytics is a business practice that uses descriptive and visualization techniques to gain insight into data; those insights can then be used to guide business decision-making. "Data analytics" as a term does not necessarily include predictive approaches. Instead, data analytics typically focuses on gaining a better understanding by using data from the past.

application programming interface (API)

An application programming interface, usually called an API, is a defined way for two applications or systems to communicate and share information. In the data world, you might use an API to retrieve data from a cloud server or to load a model's predictions into other business systems.
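
As an illustration, here's a minimal sketch of calling a REST API from Python; the endpoint, token, and parameters are hypothetical placeholders, not a real service:

```python
# A minimal sketch of calling a REST API from Python. The URL, token,
# and parameters below are hypothetical placeholders, not a real service.
import requests

response = requests.get(
    "https://api.example.com/v1/predictions",        # hypothetical endpoint
    headers={"Authorization": "Bearer YOUR_TOKEN"},  # placeholder credential
    params={"model_id": "churn_v2", "limit": 100},   # hypothetical parameters
    timeout=30,
)
response.raise_for_status()      # fail loudly on HTTP errors
predictions = response.json()    # parse the JSON payload
```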

artificial intelligence (AI)

Artificial intelligence (AI) refers to the development of computerized systems that can carry out tasks and perform actions that augment or take the place of human intelligence. Data science and machine learning represent just part of the study and development of AI.

attribute vs. feature

Attributes are the various data points or variables within a dataset. Features may be the same as attributes, or they may represent combinations of attributes or calculations done with attributes to generate new data points (i.e., through feature engineering).

attribution

The process of identifying and assigning credit to the various marketing touchpoints that contributed to a conversion or another business outcome, such as a sale or a lead. Attribution allows marketers to understand which marketing channels, campaigns, and tactics effectively drive conversions and to allocate their marketing budget and resources according to that success.

automated machine learning (AutoML)

Automated machine learning (AutoML) turns the process of building, training, and testing machine learning models into an automated routine that can evaluate hundreds or thousands of potential models, much more quickly than a human could.

B

business analytics

Business analytics is the use of data about a business's past activities, performance, or transactions to drive analyses that yield practical, useful insights for the business and its decisions.

business intelligence (BI)

Business intelligence (BI) includes gathering, storing, and analyzing business data, as well as using that analysis to inform the actions of the business.

C

calibration

In marketing mix modeling, calibration is the process of further refining a model to help it better match reality. Calibration is important to ensure that the model can be used to make accurate predictions and to identify the most effective marketing strategies. It also helps to identify sources of variability in the data and to validate the model's assumptions.

carryover

The effect that a marketing campaign has on sales after the campaign has ended. This can be positive or negative and can be caused by factors such as brand awareness or consumer sentiment.

customer acquisition cost (CAC) payback

Customer acquisition cost (CAC) is the total amount spent on acquiring a new customer, including all the sales and marketing efforts required to gain that customer; average CAC is calculated by dividing total acquisition expenses by the total number of customers acquired during a specific time period. CAC payback is the time it takes to earn back that cost, typically calculated by dividing CAC by the revenue (or gross margin) a customer generates per month. For example, if a customer costs $300 to acquire and contributes $25 in margin per month, the CAC payback period is 12 months. It's critical for companies to monitor and manage CAC payback effectively to ensure their customer acquisition is efficient.

churn detection

Churn detection is the process of identifying customers at risk of churning. A predictive model can look at data about past customers who churned and look for patterns in their behavior that preceded their churning. The model can then look at current customers and try to find similar patterns so you can take action to retain these customers. This approach can reduce customer attrition and boost customer retention.

churn prediction

Churn prediction involves building a predictive model based on past customer data. That model will help identify patterns in customer behavior that correlate with churn, allowing for the identification of those patterns in current customer data and an intervention to prevent churn. Ideally, you can boost customer retention with this kind of model, and reduce attrition.

classification model

Classification models predict a class or category for each row of data (e.g., for each customer). They analyze data that includes the known category for data from the past, and then can predict which category will best fit future data. For example, a classification model could predict that customers would be most likely to purchase product A, B, or C, given their past transaction history. Churn prediction may also be made with classification models.
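
A minimal sketch of this idea with scikit-learn, using a tiny made-up dataset of past customers and the products they purchased:

```python
# A minimal sketch of a classification model, on made-up customer data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Tiny illustrative dataset: past customers with a known purchased product.
df = pd.DataFrame({
    "n_orders":  [1, 5, 3, 8, 2, 7, 4, 6, 1, 9, 3, 5],
    "avg_spend": [20, 80, 45, 90, 25, 85, 50, 70, 15, 95, 40, 60],
    "purchased": list("AABBAABBAABB"),   # known category for each customer
})

X = df[["n_orders", "avg_spend"]]
y = df["purchased"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(model.predict(X_test))        # predicted class (product) per customer
print(model.score(X_test, y_test))  # accuracy on held-out rows
```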

conversion analytics

Conversion analytics is the analysis of data related to conversions to find useful insights. A conversion in the business context refers to a desired customer action, such as making a purchase or providing contact information. Classification models may be used for this purpose. Predictive analytics can help improve conversion rates by providing deeper insight into what affects customers' decisions and predicting which customers will respond to which offers.

cross-sell model

Cross-sell models are developed based on customer data and can identify which complementary products might most interest a specific customer. The goal of these models is to offer the right customer the right offer at the right time. Predictive analytics can use cross-sell models to improve conversion rates and generate greater revenue.

customer data analytics

Customer data analytics can include both descriptive and predictive approaches to analyzing customers' interactions with a business. Customer data analytics might address issues like churn detection and prediction, cross-sell and upsell, or customer lifetime value. The insights and predictions can be used to guide business decision-making and improve business outcomes.

customer data platform

A customer data platform (CDP) is a single system that holds and organizes all customer data from various sources. The CDP constructs customer profiles and makes that information available to other business technology systems, such as those used for marketing, sales, and customer service.

customer lifetime value (CLV or CLTV) prediction

Customer lifetime value (CLV) is the total amount of revenue a business can expect to take in from a specific customer over the entire time period that the customer is actively engaged with the company. Predicting CLV (sometimes called CLTV or pLTV) can highlight customers who might receive special offers and inform strategies to retain the most valuable customers.

D

data blending

Data is often stored in a variety of locations, from cloud databases to on-prem databases to Excel files. A frequent challenge of data projects is combining or "blending" all of those data sources. Many data platforms offer the ability to readily connect to different data sources and import various file types. Also known as data integration or data munging.

data clean rooms

Data clean rooms are isolated, secure locations used to store and combine aggregated, anonymized data from multiple sources. They provide additional privacy protection for individual-level data while also allowing marketers to match their first-party data to aggregated data from other sources.

data cleaning

Data typically needs some "cleaning" prior to being used in machine learning models. For example, an unusually large number may represent a data entry error, or it could be an outlier that's unusual but correct. Clean data is essential to high-quality machine learning models. Also known as data cleansing or data preprocessing.
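
As a sketch, one common way to flag suspicious values for review is the interquartile-range (IQR) rule; the amounts below are made up:

```python
# A minimal sketch of one common cleaning step: flagging suspicious
# outliers with the interquartile-range (IQR) rule, on made-up values.
import pandas as pd

amounts = pd.Series([25, 30, 28, 32, 27, 31, 2900])  # 2900: error or outlier?

q1, q3 = amounts.quantile([0.25, 0.75])
iqr = q3 - q1
suspicious = amounts[(amounts < q1 - 1.5 * iqr) | (amounts > q3 + 1.5 * iqr)]
print(suspicious)  # flagged for review; a human decides error vs. valid outlier
```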

data encoding

Data is not always in exactly the right format for predictive modeling. Some data, such as text, may need to be represented differently to be used in a mathematical model. The data encoding process ensures that all data are ready for use in a model.
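
For example, a minimal sketch of one common encoding step, one-hot encoding a categorical column with pandas:

```python
# A minimal sketch of one common encoding step: one-hot encoding a
# categorical column so it can be used in a mathematical model.
import pandas as pd

df = pd.DataFrame({"plan": ["basic", "pro", "basic", "enterprise"]})
encoded = pd.get_dummies(df, columns=["plan"])
print(encoded)
# Each plan value becomes its own indicator column:
# plan_basic, plan_enterprise, plan_pro
```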

data engineering

Data engineering includes setting up and maintaining systems for gathering and storing data, as well as constructing processes for retrieving data for use in predictive analytics and modeling. Data engineering has become a specialized job of its own at many data-driven companies.

data enrichment

Data enrichment involves integrating external data from trusted third-party sources into analytics in ways that complement a company's internal data. For example, demographic, weather, or public health data can enhance the performance of predictive models. The retrieval and integration of this data can be time-consuming and technically challenging if not handled through automation.

data leakage

Data leakage occurs when a machine learning model is trained with information about the target/outcome variable that it will not have when used in production. This typically occurs when a feature is included in the training dataset inappropriately. For example, if you want to predict whether a website visitor will purchase a product using their behavioral and demographic details, but accidentally include a feature reflecting their purchases in the training dataset, the model will "know" information about the visitor's future that will not be available when you use the model to make predictions about a new visitor. Additionally, the model will seem to perform unusually well because it has been provided information that directly correlates strongly with the target/outcome variable.
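
A conceptual sketch of that scenario, with made-up column names; the point is that the leaky column must be excluded from the features:

```python
# A conceptual sketch of data leakage, using made-up column names.
# "total_purchases" is only known AFTER the visit we're trying to predict,
# so including it would leak the outcome into the training data.
import pandas as pd

visits = pd.DataFrame({
    "pages_viewed":    [3, 12, 1, 8],
    "time_on_site":    [40, 300, 15, 200],
    "total_purchases": [0, 1, 0, 1],   # recorded after the outcome -- leaky!
    "purchased":       [0, 1, 0, 1],   # the target we want to predict
})

# Correct: build features only from information available at prediction time.
X = visits.drop(columns=["purchased", "total_purchases"])
y = visits["purchased"]
```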

data management platform

A software tool that helps businesses collect, store, and organize large amounts of data related to their customers and target audiences. This data can be used to create more targeted and personalized digital marketing campaigns across various channels, including email, social media, and the web.

data preparation

Data preparation is a blanket term that can include everything from combining data from different sources and dealing with outliers and missing data to making statistical adjustments and encoding data in the correct formats for predictive modeling. Although this process can be tedious and take a lot of time if conducted by hand, automated processes can build it into a predictive modeling workflow efficiently. Also known as data preprocessing.

data science

Data science combines statistics, computer science, scientific methods, and business knowledge to analyze, model, and predict using data. The data science toolkit can be used to analyze all kinds of data, from numerical to text to images. Ideally, the insights and predictions gained from data science are used to enhance business success.

data visualization

Data visualization is the communication of data trends and stories in a visual format, such as in a bar chart, line graph, timeline, word cloud, or even a map. "Data viz" should make it easy for the viewer to quickly identify important trends or patterns in data.

data wrangling

Data wrangling is a term sometimes used to encompass data blending and data cleansing, suggesting all the forms of manipulation that data might need to be ready for use in a predictive model. Also known as data preprocessing or data preparation.

decomposition

A technique used to identify the relative contributions of different marketing inputs or channels (i.e., the "marketing mix") to changes in sales or other key performance indicators (KPIs). Decomposition aims to understand how changes in different marketing activities, such as advertising spend and promotional activities, affect business performance.

deep learning

Deep learning is a specific area of data science that uses analytic methods based on human brain structure to analyze data and generate predictions. Specifically, this area focuses on algorithms called neural networks that have many layers, which is where the "deep" term comes from. Though especially widely used in areas like image, video, and text analysis, deep learning can be used for many predictive purposes.

demand forecasting

Demand forecasting involves trying to determine the likely future need for an item, based on historical data and analytics showing how much of it has been needed in the past. For example, a grocery store needs to know roughly how many loaves of bread to order each week, and can forecast how many will be needed based on prior demand. AI-based demand forecasting can save significant resources through accurate determination of needs. This kind of predictive demand forecasting has been adopted by varied industries, including manufacturing, retail, grocery, CPG, and more.

demand planning

Demand planning using AI and machine learning is the process of generating forecasts for demand and planning to satisfy that demand most efficiently using available resources. Supply chain demand planning increasingly uses predictive modeling to ensure products and services are allocated and promoted effectively to reduce costs, cut down on environmental impact, and provide greater profit. Data enrichment can bring data about external conditions, like weather or labor availability, into the predictive modeling process for greater accuracy.

descriptive analytics

Descriptive analytics is the analysis of data from the past in order to "describe" what has been happening in a business. It typically includes looking for trends or patterns in order to find meaningful insights. Calculating summary statistics, like a mean or median, and creating data visualizations, like scatter plots and bar charts, are frequently used in this kind of analysis. It typically does not include predictive modeling.

E

enrichment

The process of adding additional information or context to existing data in order to improve its value and usefulness. This can include adding new data points, such as demographic information or location data. Data enrichment aims to make the data more complete and accurate, and to enable more advanced analysis and decision-making.

exploratory data analysis (EDA)

Exploratory data analysis (EDA) is an initial stage in the predictive modeling process. In this stage, the analyst looks at statistics representing the distributions of each of the variables, looking for interesting patterns, relationships among variables, outliers, and potential data entry errors, as well as checking basic assumptions about the data. Graphs and other data visualizations are often part of this process.

F

feature engineering

Feature engineering is the process of manipulating and transforming raw data into forms that are more valuable in a predictive model. Datasets offer many options for creating new features, and deciding which ones to create and retain is considered a craft by data scientists. For example, you might have repeated transactions for each customer in your dataset that are actually more informative to predictive models if used to calculate an average transaction amount for each customer. In addition to creating the new features, new labels must be added to make the engineered data understandable to users and meaningful when the model is assessed.
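
A minimal sketch of exactly that example, using pandas to turn repeated transactions into an average transaction amount per customer:

```python
# A minimal sketch of the example above: turning repeated transactions
# into one engineered feature (average transaction amount per customer).
import pandas as pd

transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "amount":      [20.0, 40.0, 15.0, 25.0, 35.0, 100.0],
})

features = (
    transactions.groupby("customer_id")["amount"]
    .mean()
    .rename("avg_transaction_amount")   # a clear label for the new feature
    .reset_index()
)
print(features)
```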

feature selection

While it's great to have a lot of data, not every variable (aka feature) in your dataset will be equally informative in a predictive model. Typically you want to build models using the most valuable features, and omit those that offer less information for the predictions or that are redundant. Feature selection is the process of determining the value of each variable to the model and deciding which variables to keep in the model.

first-party data

First-party data is the data that a company collects itself instead of acquiring it from other sources. For example, data on visits to the company's own website, from newsletter subscribers, and from webinar attendees all can contribute to a robust first-party data repository.

I

imputation

Missing data can sometimes pose a problem for predictive modeling. A process called imputation will replace those missing data points with "best guesses." Depending on the reasons for the missing data, different methods can be selected for imputation. This process can also be automated to make data preparation easier.
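
A minimal sketch with scikit-learn, replacing missing values with each column's median:

```python
# A minimal sketch of imputation: replacing missing values with a
# "best guess" (here, the column median), on made-up data.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[25.0, 50000.0],
              [np.nan, 62000.0],
              [31.0, np.nan],
              [40.0, 58000.0]])

imputer = SimpleImputer(strategy="median")
X_imputed = imputer.fit_transform(X)   # NaNs replaced by each column's median
print(X_imputed)
```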

incrementality

The ability to measure the additional impact or value of a specific marketing campaign or strategy. It is used to determine the incremental impact of a particular campaign or treatment on a specific metric, such as sales or conversions, relative to a control group that did not receive the campaign or treatment. Incrementality can be measured through different types of tests, such as A/B tests or lift studies. Measuring incrementality allows marketers to understand the true impact of their campaigns and strategies on business outcomes, and make more informed decisions about where to allocate marketing resources.

L

lead scoring

Lead scoring is a method of predicting the chance a new lead (prospective customer) will become an actual customer. Each lead is assigned a score that reflects how likely they are to become a customer. Scores are assigned based on information about the lead, such as information they provide, behavioral data, firmographic data about their company, and other relevant data. These scores can be used to automate sales and marketing efforts and to prioritize high-scoring leads for faster or more tailored action. Lead scoring can lower customer acquisition costs, boost conversion rates, accelerate sales cycles, and improve alignment between sales and marketing strategies.

lifetime value (LTV) prediction

Lifetime value (LTV) is the total amount of revenue a business can expect to take in from a specific customer over the entire time period that the customer is actively engaged with the company. Predicting LTV (sometimes called CLV, CLTV, or pLTV for predictive LTV) can highlight customers who might receive special offers and inform strategies to retain the most valuable customers.

lift study

A method used in marketing to determine the effectiveness of a specific campaign or marketing strategy. It looks to measure the incremental impact of a particular campaign or treatment on a specific metric, such as sales or conversions.

lookalike modeling

Lookalike modeling is an approach that seeks to identify the behaviors, demographics, and other shared traits of your ideal customers. Using those ideal customers as a "seed set," a mathematical model can find other prospects or new customers with similar characteristics. You can then target these lookalike customers with outreach and messaging to help them become equally high-value in the long term.

M

machine learning (supervised and unsupervised)

Machine learning is an area of data science that helps computers learn in ways similar to human learning. Supervised machine learning methods use data from the past to find patterns that can inform a mathematical model. The model is refined until it does a good job of matching what happened in the past data. Then, the model can make predictions about the future using new data from the present time. Unsupervised machine learning looks for patterns in data where there isn't a clear pre-existing structure. For example, clustering is a form of unsupervised machine learning that tries to identify groups of similar items or people.
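
As a sketch of the unsupervised case, here's k-means clustering with scikit-learn grouping made-up customers into two clusters without any labels:

```python
# A minimal sketch of unsupervised learning: clustering customers into
# groups with k-means, with no pre-existing labels.
import numpy as np
from sklearn.cluster import KMeans

# Made-up customer data: [orders per year, average order value]
customers = np.array([[2, 20], [3, 25], [2, 22],
                      [20, 90], [22, 95], [19, 88]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)   # group index for each customer
print(labels)                            # e.g., [0 0 0 1 1 1]
```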

marketing mix modeling (MMM)

A statistical approach used to quantify the impact of various elements of a company's marketing strategy on sales and other key performance indicators (KPIs). MMM helps determine the most effective use of resources for marketing campaigns by analyzing and isolating the impact of different marketing channels and non-marketing variables.

marketing performance management

Marketing performance management includes a variety of services and technological tools that improve marketing teams' capabilities in using data, obtaining actionable insights, generating predictions, and generally improving marketing campaigns. This approach optimizes marketing efficiency and makes the best use of resources allocated to marketing.

MLOps

Machine learning operations, or MLOps, includes all the work that surrounds the machine learning model development process. It encompasses making sure data are available, providing access to data for analysts, integrating models into business workflows, and monitoring and updating models to ensure they are performing well. In large organizations, MLOps may require multiple people and/or teams. However, these processes can also often be largely automated as part of a predictive analytics platform.

mobile measurement partner (MMP)

A third-party service that helps mobile app and game developers track and analyze data on user engagement and revenue. MMPs usually offer analytics and reporting tools, including metrics on acquisition, engagement, retention, and revenue.

model

In the context of machine learning, a model is a specific instance or example of an algorithm that has been created based on a particular dataset and that can be used on new data to generate predictions or find patterns.

model drift

Predictive models can perform well at first, but their performance commonly degrades over time. For example, once a predictive model is implemented, the business changes it prompts may alter the outcomes that occur, so the model may need to be adjusted to fit the new reality: the relationships among the variables have changed, and the model has to be updated as well. To ensure the best ROI from predictive models, their performance should be monitored to catch model drift and adjust as required. Automated monitoring tools can help address this concern.

model training time

The time it takes to train a machine learning model varies. Some important factors in the time required include the quantity and complexity of the data, the specific algorithm being used for the model, and the computing power available for training. Simple models built on small datasets can be trained quickly on a typical laptop, while large datasets typical of many businesses will require more time and more computing capacity.

model training, validation, and testing

Training, validation, and testing are parts of the machine learning model-building process. Using historical data, the model is trained and "learns" to identify patterns and trends. The model is then validated by comparing its outputs to known outcomes for a second dataset. The model is evaluated on its ability to correctly identify, or get close to, the true outcome or target variable in that second dataset. This stage of evaluating the model allows its builder to see how well it is performing and to compare it to different versions of the model. Those different versions may use other predictive approaches or be set up in different ways that offer better or worse performance. Finally, the model is tested on a third set of data it has never seen before, allowing its builder to judge its likely performance when it is deployed.
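
A minimal sketch of the split itself, assuming a feature matrix X and target y are already prepared (the 60/20/20 proportions are an arbitrary example):

```python
# A minimal sketch of carving one historical dataset into training,
# validation, and test sets (60/20/20 here, an arbitrary example split).
# Assumes X (features) and y (target) have already been prepared.
from sklearn.model_selection import train_test_split

X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=0)

# Train on X_train, compare model versions on X_val,
# and judge the final model once on the never-seen X_test.
```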

momentum

The rate at which the effects of a campaign dissipate over time. A higher momentum indicates that the effects of a campaign will last longer, while a lower momentum indicates that the effects will fade more quickly.

multicollinearity

A situation where two or more independent variables in a model are highly correlated with each other, which makes it difficult to isolate the individual effect of each variable on the outcome.

N

neural network

Neural networks are used in predictive modeling. Their design is loosely based on the structure of the human brain: they consist of layers of algorithms that each carry out a specific operation on the data and pass the results to the next layer, until an output layer is reached and a final output or prediction is made. Neural networks can be used on many kinds of data, but complex networks with many layers are most often used in deep learning for challenging data like images, video, and text.

O

optimization

Broadly speaking, optimization is a process used to either maximize or minimize an output value by selecting the right input values. In data science, this process involves creating a mathematical model that can identify the right input values to reach a desired outcome. Examples of optimization might include marketing campaign optimization (i.e., allocating money to the right channels for the best results) and supply chain logistics (e.g., optimizing transportation options to maximize speed and sustainability). Machine learning models can be used for this kind of optimization.
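
As a toy sketch with SciPy, splitting a fixed budget between two channels whose (made-up) response curves have diminishing returns:

```python
# A minimal sketch of optimization with SciPy: split a fixed budget
# between two channels with diminishing (square-root) returns.
# The response curves here are made-up illustration functions.
from math import sqrt

from scipy.optimize import minimize_scalar

BUDGET = 100.0

def negative_return(spend_a):
    spend_b = BUDGET - spend_a
    return -(3 * sqrt(spend_a) + 2 * sqrt(spend_b))  # minimize the negative

result = minimize_scalar(negative_return, bounds=(0, BUDGET), method="bounded")
print(f"Channel A: {result.x:.1f}, Channel B: {BUDGET - result.x:.1f}")
```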

overfitting

Overfitting occurs when a machine learning model learns its training data too well, and then tries to apply a pattern too tightly defined by that training data to new data it encounters. The model will seem very accurate when evaluated on the training data, but it will not generalize well to new data and will perform poorly.
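
A minimal sketch of what that looks like in practice: an unconstrained decision tree fit to noisy synthetic data scores nearly perfectly on its training set but much worse on held-out data:

```python
# A minimal sketch of spotting overfitting: an unconstrained decision tree
# memorizes noisy synthetic training data and scores far worse on held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(tree.score(X_train, y_train))  # near 1.0 -- fits training data perfectly
print(tree.score(X_test, y_test))    # noticeably lower on unseen data
```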

P

pipeline

A set of processes that move data from one system to another, which often involves extracting data from a variety of sources, transforming it for use in other systems, and loading it into a target data store or system. Data pipelines can automate repetitive data integration tasks, such as moving data between systems, thereby enabling data to flow between different parts of an organization.
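
A toy sketch of such a pipeline in Python; the file, table, and column names are hypothetical placeholders:

```python
# A minimal sketch of a tiny data pipeline: extract from a CSV, transform,
# and load into a SQLite table. File, table, and column names are
# hypothetical placeholders.
import sqlite3

import pandas as pd

def run_pipeline():
    df = pd.read_csv("raw_orders.csv")                    # extract
    df["order_date"] = pd.to_datetime(df["order_date"])   # transform
    df = df[df["amount"] > 0]                             # drop bad rows
    with sqlite3.connect("warehouse.db") as conn:         # load
        df.to_sql("orders_clean", conn, if_exists="replace", index=False)

run_pipeline()
```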

pLTV (predicted customer lifetime value)

Predicted customer lifetime value (pLTV) is the total amount of revenue a business predicts it will receive from a specific customer over the entire time period that the customer is actively engaged with the company. pLTV (the underlying metric is sometimes called CLV, LTV, or CLTV) can highlight customers who should receive special offers and inform strategies to retain the most valuable customers.

precision

In predictive analytics, precision shows what proportion of a machine learning model's positive identifications were actually correct. As an example, imagine a machine learning model that is trained to recognize cats or dogs in photos. Its precision is based on how many photos actually contain a dog out of all the photos the model says contain a dog. Also known as positive predictive value. During model training, this is one of two primary metrics used to evaluate the model; the other is recall (see recall below). In that context, precision is the percentage of the held-out samples the model labeled "true" that were in fact true; an error here means an "in fact" false sample was incorrectly labeled as "true."
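
A minimal sketch of both metrics on hypothetical dog-photo predictions (1 = "contains a dog"):

```python
# A minimal sketch of precision (and recall, for contrast) on
# hypothetical dog-photo predictions: 1 = "contains a dog".
from sklearn.metrics import precision_score, recall_score

actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 1, 0]

# Precision: of the photos the model labeled "dog", how many really were?
print(precision_score(actual, predicted))  # 3 of 4 flagged -> 0.75
# Recall: of the photos that really contained dogs, how many were found?
print(recall_score(actual, predicted))     # 3 of 4 dogs found -> 0.75
```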

predictive analytics

Predictive analytics uses data, statistics, and machine learning techniques to build mathematical models that can generate predictions about things likely to happen in the future. Predictive models are trained to identify patterns and trends that are likely to recur. Predictive analytics is used in a wide variety of industries, including banking, insurance, retail, CPG, manufacturing, e-commerce, food and beverage, and more.

predictive marketing

Predictive marketing is the integration of predictive analytics and machine learning into marketing practices. Specifically, data science techniques can be used to understand and predict customer behavior, allowing marketers to proactively respond to customers' interests and behavior. Among many uses, predictive marketing can increase the success of upsell and cross-sell offers, improve conversion rates from email campaigns, and provide insights for campaign optimization.

prescriptive analytics

Prescriptive analytics is related to descriptive and predictive analytics, and can be considered a final step in a data-driven decision-making process. Specifically, prescriptive analytics guides the best course of action to be taken in a business situation, as informed by the data that has been analyzed and used in predictive modeling. Prescriptive analytics is used in many industries, including insurance, manufacturing, and human resources (i.e., people analytics).

R

recall

In predictive analytics, recall shows what proportion of the actually relevant cases were correctly identified by a machine learning model. As an example, consider a machine learning model that is trained to recognize cats or dogs in photos. Calculating its recall would be based on how often it actually recognized the dogs out of all the photos that truly contained dogs. Also known as sensitivity.

recency-frequency-monetary (RFM)

RFM analysis is a tool used in marketing to score customers based on three categories: the recency, frequency, and monetary value of their purchases. This approach lets companies see which customers have the highest likelihood of becoming ongoing purchasers, how revenue is distributed among new and established customers, and who their highest-value customers are.
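
A minimal sketch of computing the three RFM values per customer with pandas, on made-up transactions:

```python
# A minimal sketch of computing RFM values per customer with pandas,
# using made-up transaction data.
import pandas as pd

tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "date": pd.to_datetime(["2024-01-05", "2024-03-01", "2024-02-10",
                            "2024-01-20", "2024-02-25", "2024-03-10"]),
    "amount": [50, 30, 200, 20, 25, 30],
})

snapshot = tx["date"].max()
rfm = tx.groupby("customer_id").agg(
    recency_days=("date", lambda d: (snapshot - d.max()).days),
    frequency=("date", "count"),
    monetary=("amount", "sum"),
)
print(rfm)
```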

regression

Regression models are used in statistics and machine learning to represent the relationship among variables. These models can show the strength of their relationships and also offer insight into how the variables affect each other. In predictive analytics, regression models can be used to predict a value for an outcome variable, given the values for other variables that are related to it. For example, based on records of customer behavior, we can predict customers' lifetime value (also known as CLV, LTV, CLTV, or pLTV).
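
A minimal sketch of a regression model with scikit-learn, predicting a made-up spend figure from two behavioral features:

```python
# A minimal sketch of a regression model: predicting a customer's spend
# next year from made-up behavioral features.
import numpy as np
from sklearn.linear_model import LinearRegression

# [orders per year, average order value] -> spend next year (illustrative)
X = np.array([[2, 20], [5, 40], [3, 30], [8, 50], [6, 45], [10, 60]])
y = np.array([45, 210, 95, 420, 280, 610])

model = LinearRegression().fit(X, y)
print(model.predict([[4, 35]]))   # predicted spend for a new customer
```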

S

saturation

Refers to the point at which a specific marketing channel has reached its maximum effectiveness, and additional spending on that channel will not result in a significant increase in the desired outcomes. Channel saturation is often used as a metric to determine the optimal level of investment in a specific channel and to make budget allocation decisions.

seasonality

Refers to patterns that repeat over a specific period of time. These patterns can be observed in various types of data, such as customer transaction data, and are often related to recurring seasons or calendar events.

SHAP values

SHapley Additive exPlanations, aka SHAP values, quantify the effect of each feature in your model on the predictions generated by that model. The values are calculated by comparing the output of your model with each feature to its output without each feature. SHAP values can be calculated "globally" for all the model's features, taken as a whole; they also can be calculated "locally" for each individual prediction to show which features most affected that prediction.
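
A minimal sketch using the shap package, assuming `model` is an already-fitted tree-based model and `X` is its feature DataFrame:

```python
# A minimal sketch of computing SHAP values with the `shap` package,
# assuming `model` is a fitted tree-based model and `X` is its feature data.
import shap

explainer = shap.Explainer(model)    # e.g., a fitted random forest or XGBoost
shap_values = explainer(X)           # one row of SHAP values per prediction

shap.plots.bar(shap_values)           # "global" view: mean impact per feature
shap.plots.waterfall(shap_values[0])  # "local" view: one prediction explained
```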

SKAdNetwork

A method created by Apple for measuring mobile app advertising effectiveness. SKAdNetwork is intended to offer a more private, secure way for developers and advertisers to measure iOS ad campaign performance.

supply chain analytics

Supply chain analytics refers to using data about supply chain processes to garner insights into the processes and — in predictive analytics — to generate predictions that can be used to improve and streamline those processes. Predictive approaches can be used to try out different scenarios and select the best options, as well as to plan for potential situations that could arise. Predictive analytics applied to supply chain challenges can identify risks, find trends, and minimize costs. Supply chain decision-making that is informed by predictive analytics can also increase resiliency in unpredictable times.

T

target variable (or outcome variable)

The target variable in a machine learning model is the variable you want to predict. For example, you might want to predict whether a customer is likely to churn or not. Also known as the outcome variable, dependent variable, or target.

U

underfitting

Underfitting occurs when a machine learning model has not learned well from its training data and hasn't recognized a pattern that it can apply accurately to new data. The model may be too simple to capture meaningful patterns in the training data. Underfit models will perform poorly on training and test data.

upsell model

Upsell models are developed based on customer data and can identify which customers might be likely to buy a higher level or additional product or service. The goal of these models is to offer the right customer the right upsell offer at the right time. Predictive analytics can use upsell models to improve conversion rates and generate greater revenue.

V

validation

In marketing mix modeling (MMM), validation refers to the process of testing the accuracy and reliability of the model by comparing its predictions to actual historical data. This is done to ensure that the model is able to accurately predict the impact of different marketing mix variables on sales or other key performance indicators (KPIs).
