Sample report

Example data for a fictional estate. Connect your warehouse (BigQuery, Snowflake, Databricks, PostgreSQL, Redshift, or Fabric) to see yours. Read-only, ~2 minutes.

Sample AI Data Readiness Report

Acme Analytics (sample)

BigQuery · acme-analytics.warehouse · scan profile · Assessed May 20, 2026, 12:00 AM

Readiness Verdict · scan profile

Pilot only

The estate is active and typed, but unmasked PII across 4 tables and zero policy tags across all 31 columns make a production AI investment decision premature. A scoped internal triage is safe; external or model-training use is not.

Safe to ship now: An internal-only portfolio triage pilot using schema structure, type coverage (30/31 columns typed), and freshness signals (5/5 tables updated in last 7 days). No customer or payment data surfaces, no external data sharing, no model training on raw PII.

Gate to production

Apply BigQuery policy tags to all 31 columns (currently 0/31) to enable access governance before any AI pipeline ingests this estate.
Mask all 6 PII columns: email, first_name, last_name (customers), ip_address (events), shipping_address (orders), credit_card (stg_payments). Currently 0/6 masked; credit_card exposure is highest-severity.
Document the remaining 16 of 31 undocumented columns so AI feature pipelines can interpret column semantics without manual intervention.
Partition or cluster the 4 unoptimized tables (only 1/5 currently optimized) to prevent runaway scan costs as AI workloads scale query volume.

What we found

6 unprotected PII-candidate columns exposed to AI

customers.email, customers.first_name, customers.last_name, events.ip_address +2 more: readable verbatim by any pipeline built on this data. See the full inventory below.

Classification

0 of 31 columns with policy tags

Compliant · needs 50%

Column Masking

0 of 6 PII columns masked

Compliant · needs 50%

Access Optimization

1 of 5 tables partitioned/clustered

20%

Consumable · needs 50%

Semantic Documentation

15 of 31 columns documented

48%

Contextual · needs 50%

AI Readiness Score

49 / 100 · Data Aware

Factors Measured

scan profile · 4 of 4 factors

Requirements Passed

4/8

4 requirements need attention

Source

acme-analytics.warehouse

bigquery · scan profile

Assessed May 20, 2026

Factor Breakdown

Data Governance · PII Exposure

6 PII-candidate columns detected · 0 masked · 6 unprotected

Column	Table	Type	Status
email	customers	Email	Unprotected
first_name	customers	Name	Unprotected
last_name	customers	Name	Unprotected
ip_address	events	IP address	Unprotected
shipping_address	orders	Address	Unprotected
credit_card	stg_payments	Credit card	Unprotected

Detected by column-name pattern from metadata, so review for false positives. Masking status reflects policy tags / masking policies on the column.
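To make the detection method concrete, here is a minimal sketch of a metadata-only, name-pattern lookup. The pattern list is an assumption chosen to match this sample's six columns; it is not the scanner's actual rule set, and any match still needs human review for false positives.

```sql
-- Illustrative only: the regular expression below is a hypothetical pattern list.
-- Reads column metadata from INFORMATION_SCHEMA; no table rows are queried.
SELECT
  table_name,
  column_name,
  data_type
FROM `acme-analytics.warehouse.INFORMATION_SCHEMA.COLUMNS`
WHERE REGEXP_CONTAINS(
  LOWER(column_name),
  r'(email|first_name|last_name|ip_address|shipping_address|credit_card)'
)
ORDER BY table_name, ordinal_position;
```

A name match says nothing about what the column actually holds, which is why the inventory labels these columns as PII candidates.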

AI Recommendations

At an overall score of 49, this estate is live and well-structured: all 5 tables are fresh (updated within 7 days), 30/31 columns are explicitly typed, and 4/5 tables carry entity identifiers. But it fails on compliance entirely (0/31 columns tagged, 0/6 PII columns masked), which is the single gate blocking AI investment. The unmasked credit card, email, and address fields in active tables create GDPR exposure today and a documented gap against EU AI Act Article 10 obligations that become enforceable in 2027, enough to halt a serious production AI review. Until masking and classification are in place, the estate cannot be safely handed to an AI pipeline at scale.

Strengths

Perfect data currency: all 5 tables modified within the last 30 days and all 5 updated within the last 7 days. AI models trained or scored on this estate will not suffer from stale-data drift.
Near-complete schema typing: 30 of 31 columns are explicitly typed, giving AI feature pipelines reliable type inference with minimal pre-processing.
Strong entity identification: 4 of 5 tables declare identifiers, enabling join-based feature construction and entity-resolution across the estate without custom key logic.

Critical Gaps

Zero data classification: 0 of 31 columns carry policy tags, so no AI pipeline can enforce column-level access controls or data-lineage governance. Any model trained on this estate inherits undifferentiated access to PII and payment data.
Six unmasked PII columns across 4 tables (email, first_name, last_name, ip_address, shipping_address, credit_card). Credit card data in stg_payments exposed in plaintext creates immediate regulatory and reputational risk; any AI workload ingesting this estate becomes a PCI/GDPR liability.
16 of 31 columns undocumented (semantic_documentation score 48%, just below the 50% threshold). AI feature pipelines operating on undocumented or ambiguous columns will silently produce mis-labeled features, degrading model quality without a visible error signal.

Prioritized Actions

Mask all 6 PII columns immediately: credit_card (stg_payments), email/first_name/last_name (customers), ip_address (events), shipping_address (orders). Use BigQuery dynamic data masking before any AI pipeline or analyst touches this estate.

0 of 6 PII columns are currently masked. Any AI training job, feature store export, or portfolio-triage dashboard that reads these tables exposes credit card and personal identity data in plaintext, triggering PCI-DSS scope and GDPR Article 25 violations. This single gap can block the entire AI investment program pending a regulatory audit.

Compliant · Column Masking · ~+10 readiness pts

Copy the fix: Mask the unprotected PII columns

-- BigQuery masks via Data Catalog policy tags. Create a taxonomy + a PII policy tag
-- with a data-masking rule, then attach the tag to each unmasked column:
ALTER TABLE `acme-analytics.warehouse.customers`
  ALTER COLUMN email SET OPTIONS (policy_tags = ['<your-pii-policy-tag-resource-name>']);
ALTER TABLE `acme-analytics.warehouse.customers`
  ALTER COLUMN first_name SET OPTIONS (policy_tags = ['<your-pii-policy-tag-resource-name>']);
ALTER TABLE `acme-analytics.warehouse.customers`
  ALTER COLUMN last_name SET OPTIONS (policy_tags = ['<your-pii-policy-tag-resource-name>']);
ALTER TABLE `acme-analytics.warehouse.events`
  ALTER COLUMN ip_address SET OPTIONS (policy_tags = ['<your-pii-policy-tag-resource-name>']);
ALTER TABLE `acme-analytics.warehouse.orders`
  ALTER COLUMN shipping_address SET OPTIONS (policy_tags = ['<your-pii-policy-tag-resource-name>']);
ALTER TABLE `acme-analytics.warehouse.stg_payments`
  ALTER COLUMN credit_card SET OPTIONS (policy_tags = ['<your-pii-policy-tag-resource-name>']);

Create the taxonomy/policy tag + masking rule in BigQuery → Policy tags, then paste its resource name above.

Apply BigQuery policy tags to all 31 columns (currently 0/31 tagged), starting with the 6 PII-candidate columns, to enable column-level access control and data-lineage tracking across AI pipelines.

Without classification, every AI consumer (model training, feature pipelines, BI dashboards) has undifferentiated access to sensitive fields. The compliant factor scores 0; closing this gap is a prerequisite to demonstrating EU AI Act Article 10 data-management practices and to any enterprise data-sharing agreement that depends on this estate.

Compliant · Classification · ~+10 readiness pts

Copy the fix: Classify sensitive tables with policy tags / labels

-- Attach a sensitivity label to each table (repeat per table):
ALTER TABLE `acme-analytics.warehouse.<table>` SET OPTIONS (labels = [('data_classification', 'pii')]);

For enforced access control, use Data Catalog policy tags on the sensitive columns.

Document the 16 undocumented columns (currently 15/31 documented, 48% vs. 50% threshold). Prioritize columns in the customers, orders, and events tables that will feed AI feature pipelines first.

Sixteen ambiguous columns in a portfolio-triage AI model will produce mis-labeled features with no visible error: the model may appear to run while producing meaningless signals. This is also a direct gap against EU AI Act Article 10's requirement for documented data-management practices for high-risk AI.

Contextual · Semantic Documentation · ~+8 readiness pts

Copy the fix: Document the undocumented columns

-- Add a description to each undocumented column (repeat per column):
ALTER TABLE `acme-analytics.warehouse.<table>`
  ALTER COLUMN <column> SET OPTIONS (description = 'what this column holds and its unit/meaning');
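Before writing descriptions, it helps to list which columns still lack one. The query below is a sketch that assumes an empty or NULL description is what counts as "undocumented"; the report's exact rule may differ.

```sql
-- List top-level columns with no description (metadata only).
-- Assumption: "undocumented" means description is NULL or empty.
SELECT
  table_name,
  column_name
FROM `acme-analytics.warehouse.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
WHERE field_path = column_name            -- skip nested STRUCT fields
  AND (description IS NULL OR description = '')
ORDER BY table_name, column_name;
```

Each row returned is one column to feed into the ALTER COLUMN statement above.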

Partition or cluster the 4 unoptimized tables (only 1/5 currently partitioned/clustered) to control query costs before AI workloads scale scan volume.

At 1/5 tables optimized, full-table scans are the default access pattern. As AI feature pipelines and training jobs run repeated large queries, BigQuery on-demand costs will scale linearly with data volume, a predictable budget overrun that will surface in the first sprint of any serious AI build-out on this estate.

Consumable · Access Optimization · ~+6 readiness pts
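A possible fix for one table, sketched under assumptions: the column names event_timestamp and user_id are hypothetical (the sample does not list the schema of events), and the _partitioned suffix is just an illustrative name. An existing BigQuery table generally has to be recreated to change its partitioning, so the sketch copies the data into a new table.

```sql
-- Rebuild the table with partitioning and clustering (repeat per table).
-- Hypothetical columns: event_timestamp (TIMESTAMP) and user_id.
CREATE TABLE `acme-analytics.warehouse.events_partitioned`
PARTITION BY DATE(event_timestamp)
CLUSTER BY user_id
AS
SELECT *
FROM `acme-analytics.warehouse.events`;
```

After the copy, check that column descriptions and policy tags are present on the new table and reapply them if they are not, then repoint downstream queries before dropping the original.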

Declare an identifier for the 1 remaining unidentified table (4/5 tables have identifiers) to complete entity-resolution coverage across the estate.

The missing identifier on one table prevents reliable entity joins in AI feature construction. For a portfolio-triage workload, this means one table's signals cannot be linked to customer or order entities, creating a blind spot in any cross-entity investment signal the model tries to generate.

Contextual · Entity Identifier Declaration · ~+2 readiness pts
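One way to declare an identifier in BigQuery is an unenforced primary key constraint. This is a sketch: the sample does not say which table is missing its identifier or which column should serve as the key, so both are left as placeholders, and it is an assumption that the scan reads primary key constraints as identifier declarations.

```sql
-- Declare an identifier on the remaining table.
-- <table> and <id_column> are placeholders; BigQuery does not enforce uniqueness.
ALTER TABLE `acme-analytics.warehouse.<table>`
  ADD PRIMARY KEY (<id_column>) NOT ENFORCED;
```

Because the constraint is not enforced, confirm that the chosen column is actually unique and non-null before relying on it for joins.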

EU AI Act Article 10

Gaps

Readiness to produce data-governance evidence, not a certification of compliance

Evidence in place

Data currency is fully evidenced: all 5 tables updated within 7 days, supporting the Article 10 requirement that training and validation data be current and relevant to the intended purpose.
Schema type coverage of 30/31 columns and entity identifiers on 4/5 tables provide a structural basis for demonstrating data completeness and representativeness obligations.
Active modification signals on all 5 tables within 30 days support auditability of data-management lifecycle practices.

Gaps

Zero policy tags on all 31 columns means there is no documented data classification or provenance trail. Article 10(2) requires training data to be subject to data-governance and management practices; this estate cannot currently produce that evidence.
Six PII columns across 4 tables are unmasked and untagged. Article 10(2)(f) to (h) call for examination of possible biases and data gaps, with measures to address them; plaintext credit card and identity data in a training dataset would constitute a data-protection breach, not merely a readiness gap.
Sixteen of 31 columns are undocumented. Article 10(2)(b) and (c) cover data collection processes and data-preparation operations such as labelling; roughly half the column-level semantics are invisible, making it impossible to evidence that training data was 'relevant and representative' as Article 10(3) requires.

See this for your own data

Connect read-only and get your score, six-factor breakdown, and prioritized fixes. We only query INFORMATION_SCHEMA metadata, never your data.

Start free

A consultant's readiness assessment runs ~$50–100K and six weeks. This took minutes, and it's free to run on your own warehouse.
