MortarIQ

Sample report

Example data for a fictional estate. Connect your warehouse (BigQuery, Snowflake, Databricks, PostgreSQL, Redshift, or Fabric) to see yours. Read-only, ~2 minutes.


Sample AI Data Readiness Report

Acme Analytics (sample)

BigQuery · acme-analytics.warehouse · scan profile · Assessed May 20, 2026, 12:00 AM

Readiness Verdict · scan profile

Pilot only

The estate is active and well typed, but unmasked PII across 4 tables and zero policy tags across all 31 columns make a production AI investment decision premature. A scoped internal triage pilot is safe; external or model-training use is not.

Safe to ship now: An internal-only portfolio-triage pilot that uses schema structure, type coverage (30/31 columns typed), and freshness signals (5/5 tables updated in the last 7 days). The pilot surfaces no customer or payment data, shares no data externally, and trains no models on raw PII.

Gate to production

  • Apply BigQuery policy tags to all 31 columns (currently 0/31) to enable access governance before any AI pipeline ingests this estate.
  • Mask all 6 PII columns: email, first_name, last_name (customers), ip_address (events), shipping_address (orders), and credit_card (stg_payments). Currently 0/6 are masked; the credit_card exposure is the highest severity.
  • Document the 16 of 31 columns that still lack descriptions so AI feature pipelines can interpret column semantics without manual intervention.
  • Partition or cluster the 4 unoptimized tables (only 1/5 currently optimized) to prevent runaway scan costs as AI workloads scale query volume.

What we found

6 unprotected PII-candidate columns exposed to AI

customers.email, customers.first_name, customers.last_name, events.ip_address, orders.shipping_address, and stg_payments.credit_card are readable verbatim by any pipeline built on this data. See the full inventory below.

  • Classification: 0 of 31 columns with policy tags (0%). Compliant · needs 50%.
  • Column Masking: 0 of 6 PII columns masked (0%). Compliant · needs 50%.
  • Access Optimization: 1 of 5 tables partitioned/clustered (20%). Consumable · needs 50%.
  • Semantic Documentation: 15 of 31 columns documented (48%). Contextual · needs 50%.
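Each of these requirements is a simple ratio checked against a 50% threshold. As a sketch, assuming "needs 50%" means coverage of at least 50% (the report does not say whether the bound is inclusive), this standalone BigQuery query recomputes each percentage from the counts above. It also computes the smallest number of extra columns or tables needed to reach the threshold:

```sql
-- Counts are copied from this sample report; no warehouse tables are read.
-- Assumption: a requirement passes when passed / total >= 0.5.
SELECT
  requirement,
  passed,
  total,
  ROUND(100 * passed / total) AS coverage_pct,
  -- Smallest count that reaches 50%, minus what is already in place.
  GREATEST(CAST(CEIL(0.5 * total) AS INT64) - passed, 0) AS still_needed
FROM UNNEST(ARRAY<STRUCT<requirement STRING, passed INT64, total INT64>>[
  ('Classification', 0, 31),
  ('Column Masking', 0, 6),
  ('Access Optimization', 1, 5),
  ('Semantic Documentation', 15, 31)
])
ORDER BY still_needed;
```

Under that assumption, still_needed comes out as 16 for Classification, 3 for Column Masking, 2 for Access Optimization, and 1 for Semantic Documentation. Documenting a single further column (16 of 31, about 52%) would clear that threshold.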

AI Readiness Score

49/100 · Data Aware

Factors Measured

scan profile · 4 of 4 factors

Contextual 75 · Consumable 20 · Current 100 · Compliant 0
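The report does not state how these four factor scores roll up into the overall 49. One hypothesis is an unweighted mean, and it does reproduce the headline number. This is an assumption, not MortarIQ's documented formula:

```sql
-- Assumption: equal weights across the four measured factors.
-- Factor scores copied from this report: Contextual, Consumable, Current, Compliant.
SELECT ROUND(AVG(score)) AS overall_score  -- AVG = 48.75, which rounds to 49
FROM UNNEST([75, 20, 100, 0]) AS score;
```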

Requirements Passed

4/8

4 requirements need attention

Source

acme-analytics.warehouse

BigQuery · scan profile

Assessed May 20, 2026

Factor Breakdown

Data Governance · PII Exposure

6 PII-candidate columns detected · 0 masked · 6 unprotected

Column              Table          Type          Status
email               customers      Email         Unprotected
first_name          customers      Name          Unprotected
last_name           customers      Name          Unprotected
ip_address          events         IP address    Unprotected
shipping_address    orders         Address       Unprotected
credit_card         stg_payments   Credit card   Unprotected

Detected by column-name pattern from metadata, so review for false positives. Masking status reflects policy tags / masking policies on the column.
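Detection works on names alone, so you can approximate it with a read-only query against the dataset's INFORMATION_SCHEMA.COLUMNS view. This is a minimal sketch. The regular expression is an illustrative assumption, not MortarIQ's actual pattern set:

```sql
-- Flag columns whose names look like PII. Reads metadata only, never row data.
SELECT
  table_name,
  column_name,
  data_type
FROM `acme-analytics`.warehouse.INFORMATION_SCHEMA.COLUMNS
WHERE REGEXP_CONTAINS(
  LOWER(column_name),
  -- Illustrative patterns; "address" also covers ip_address and shipping_address.
  r'email|first_name|last_name|address|credit_card'
)
ORDER BY table_name, column_name;
```

Substring patterns like this cut both ways. `address` would also flag a hypothetical boolean such as `address_verified`, while a card number stored in a column with an unrelated name would slip through. That is why name-based results need human review.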

AI Recommendations

At an overall score of 49, this estate is live and well-structured: all 5 tables are fresh (updated within 7 days), 30/31 columns are explicitly typed, and 4/5 tables carry entity identifiers. But it fails on compliance entirely (0/31 columns tagged, 0/6 PII columns masked), which is the single gate blocking AI investment. The unmasked credit card, email, and address fields in active tables create regulatory exposure under GDPR and EU AI Act Article 10 that would halt any production AI initiative before it launched. Until masking and classification are in place, the estate cannot be safely handed to an AI pipeline at scale.

Strengths

  • Perfect data currency: all 5 tables were updated within the last 7 days (and therefore within the last 30). AI models trained or scored on this estate are unlikely to suffer from stale-data drift.
  • Near-complete schema typing: 30 of 31 columns are explicitly typed, giving AI feature pipelines reliable type inference with minimal pre-processing.
  • Strong entity identification: 4 of 5 tables declare identifiers, enabling join-based feature construction and entity resolution across the estate without custom key logic.
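The currency figures above come from table modification metadata. One way to spot-check them assumes read access to the dataset's legacy `__TABLES__` meta-table, a BigQuery feature that is not necessarily what MortarIQ queries:

```sql
-- last_modified_time in __TABLES__ is milliseconds since the Unix epoch.
SELECT
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified,
  TIMESTAMP_MILLIS(last_modified_time)
    >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AS updated_in_last_7_days
FROM `acme-analytics.warehouse.__TABLES__`
ORDER BY last_modified DESC;
```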

Critical Gaps

  • Zero data classification: 0 of 31 columns carry policy tags, so no AI pipeline can enforce access controls or data-lineage governance. Any model trained on this estate inherits undifferentiated access to PII and payment data.
  • Six unmasked PII columns across 4 tables (email, first_name, last_name, ip_address, shipping_address, credit_card). Credit card data in stg_payments is exposed in plaintext, creating immediate regulatory and reputational risk; any AI workload ingesting this estate becomes a PCI DSS and GDPR liability.
  • 16 of 31 columns undocumented (semantic_documentation score 48%, just below the 50% threshold). AI feature pipelines operating on undocumented or ambiguous columns can silently produce mislabeled features, degrading model quality without a visible error signal.

Prioritized Actions

1

Mask all 6 PII columns immediately: credit_card (stg_payments), email/first_name/last_name (customers), ip_address (events), shipping_address (orders). Use BigQuery dynamic data masking before any AI pipeline or analyst touches this estate.

0 of 6 PII columns are currently masked. Any AI training job, feature store export, or portfolio-triage dashboard that reads these tables exposes credit card and personal identity data in plaintext. That brings those systems into PCI DSS scope and risks violating GDPR Article 25 (data protection by design and by default). This single gap can block the entire AI investment program pending a regulatory audit.

compliant · Column Masking · ~+10 readiness pts
Fix: Mask the unprotected PII columns
-- BigQuery masks via Data Catalog policy tags. Create a taxonomy + a PII policy tag
-- with a data-masking rule, then attach the tag to each unmasked column:
ALTER TABLE `acme-analytics.warehouse.customers`
  ALTER COLUMN email SET OPTIONS (policy_tags = ['<your-pii-policy-tag-resource-name>']);
ALTER TABLE `acme-analytics.warehouse.customers`
  ALTER COLUMN first_name SET OPTIONS (policy_tags = ['<your-pii-policy-tag-resource-name>']);
ALTER TABLE `acme-analytics.warehouse.customers`
  ALTER COLUMN last_name SET OPTIONS (policy_tags = ['<your-pii-policy-tag-resource-name>']);
ALTER TABLE `acme-analytics.warehouse.events`
  ALTER COLUMN ip_address SET OPTIONS (policy_tags = ['<your-pii-policy-tag-resource-name>']);
ALTER TABLE `acme-analytics.warehouse.orders`
  ALTER COLUMN shipping_address SET OPTIONS (policy_tags = ['<your-pii-policy-tag-resource-name>']);
ALTER TABLE `acme-analytics.warehouse.stg_payments`
  ALTER COLUMN credit_card SET OPTIONS (policy_tags = ['<your-pii-policy-tag-resource-name>']);

Create the taxonomy, policy tag, and masking rule in BigQuery → Policy tags, then replace the placeholder above with the tag's resource name.

2

Apply BigQuery policy tags to all 31 columns (currently 0/31 tagged), starting with the 6 confirmed PII columns, to enable column-level access control and data-lineage tracking across AI pipelines.

Without classification, every AI consumer (model training, feature pipelines, BI dashboards) has undifferentiated access to sensitive fields. The Compliant factor scores 0. Closing this gap is a prerequisite for demonstrating EU AI Act Article 10 data-management practices and for any enterprise data-sharing agreement that depends on this estate.

compliant · Classification · ~+10 readiness pts
Fix: Classify sensitive tables with policy tags / labels
-- Attach a sensitivity label to each table (repeat per table):
ALTER TABLE `acme-analytics.warehouse.<table>` SET OPTIONS (labels = [('data_classification', 'pii')]);

Table labels are descriptive metadata and do not restrict access. For enforced access control, use Data Catalog policy tags on the sensitive columns.
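After you repeat the statement for each table, you can confirm which tables picked up the label. The query below reads the dataset's INFORMATION_SCHEMA.TABLE_OPTIONS view, which stores labels as a string-valued option. It checks table labels only, not column policy tags:

```sql
-- Tables whose labels option mentions data_classification, with the raw label list.
SELECT
  table_name,
  option_value AS labels
FROM `acme-analytics`.warehouse.INFORMATION_SCHEMA.TABLE_OPTIONS
WHERE option_name = 'labels'
  AND STRPOS(option_value, 'data_classification') > 0
ORDER BY table_name;
```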

3

Document the 16 undocumented columns (currently 15/31 documented, 48% vs. 50% threshold). Prioritize columns in the customers, orders, and events tables that will feed AI feature pipelines first.

Sixteen undocumented columns feeding a portfolio-triage AI model can produce mislabeled features with no visible error: the model may appear to run while producing meaningless signals. This is also a direct gap against EU AI Act Article 10's requirement for documented data-management practices for high-risk AI.

contextual · Semantic Documentation · ~+8 readiness pts
Fix: Document the undocumented columns
-- Add a description to each undocumented column (repeat per column):
ALTER TABLE `acme-analytics.warehouse.<table>`
  ALTER COLUMN <column> SET OPTIONS (description = 'what this column holds and its unit/meaning');
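The statement above must be repeated per column, so it helps to list the columns that still lack a description. This is a sketch against the dataset's INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view, which exposes column descriptions. It assumes that an empty or whitespace-only description counts as undocumented, and it sorts the customers, orders, and events tables first:

```sql
-- Top-level columns with no description. Nested STRUCT fields are skipped
-- because their field_path differs from column_name.
SELECT
  table_name,
  column_name,
  data_type
FROM `acme-analytics`.warehouse.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE field_path = column_name
  AND (description IS NULL OR TRIM(description) = '')
ORDER BY
  CASE WHEN table_name IN ('customers', 'orders', 'events') THEN 0 ELSE 1 END,
  table_name,
  column_name;
```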

4

Partition or cluster the 4 unoptimized tables (only 1/5 currently partitioned/clustered) to control query costs before AI workloads scale scan volume.

With only 1 of 5 tables optimized, full-table scans are the default access pattern. BigQuery on-demand pricing bills by bytes scanned, so as AI feature pipelines and training jobs run repeated large queries, costs grow in step with table size and query volume. The result is a predictable budget overrun, likely to surface in the first sprint of any serious AI build-out on this estate.

consumable · Access Optimization · ~+6 readiness pts

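The report gives no fix snippet for this action. A sketch in two steps: first, list base tables that have neither a partitioning column nor clustering columns, using the IS_PARTITIONING_COLUMN and CLUSTERING_ORDINAL_POSITION fields of INFORMATION_SCHEMA.COLUMNS. Second, rebuild one of them. The `events` table, its `event_ts` and `customer_id` columns, and the `events_partitioned` name are hypothetical choices. This report does not say which tables are unoptimized or what columns they contain:

```sql
-- Step 1: base tables with no partitioning column and no clustering columns.
SELECT table_name
FROM `acme-analytics`.warehouse.INFORMATION_SCHEMA.COLUMNS
WHERE table_name IN (
  SELECT table_name
  FROM `acme-analytics`.warehouse.INFORMATION_SCHEMA.TABLES
  WHERE table_type = 'BASE TABLE'
)
GROUP BY table_name
HAVING COUNTIF(is_partitioning_column = 'YES') = 0
   AND COUNTIF(clustering_ordinal_position IS NOT NULL) = 0;

-- Step 2 (hypothetical columns event_ts and customer_id): BigQuery cannot
-- partition an existing table in place, so write a partitioned, clustered
-- copy, validate it, then swap it in for the original.
CREATE TABLE `acme-analytics.warehouse.events_partitioned`
PARTITION BY DATE(event_ts)
CLUSTER BY customer_id
OPTIONS (require_partition_filter = TRUE)
AS
SELECT * FROM `acme-analytics.warehouse.events`;
```

`require_partition_filter` is optional. It makes BigQuery reject queries on the new table that do not filter on the partition column, a blunt but effective guard against accidental full scans from AI jobs.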
5

Declare an identifier for the one remaining table that lacks one (4/5 tables have identifiers) to complete entity-resolution coverage across the estate.

The missing identifier on one table prevents reliable entity joins in AI feature construction. For a portfolio-triage workload, this means one table's signals cannot be linked to customer or order entities, creating a blind spot in any cross-entity investment signal the model tries to generate.

contextual · Entity Identifier Declaration · ~+2 readiness pts
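The report does not name the table that lacks an identifier or say how identifiers are detected. The sketch below assumes that a declared BigQuery primary key counts as an identifier. It uses placeholders in the same style as the fixes above:

```sql
-- Placeholders: replace <table> and <id_column> with the unidentified table
-- and its candidate key. Neither is known from this report.

-- Step 1: BigQuery does not validate NOT ENFORCED keys, so check uniqueness
-- and nulls yourself before declaring one.
SELECT
  COUNT(*) AS row_count,
  COUNT(DISTINCT <id_column>) AS distinct_ids,
  COUNTIF(<id_column> IS NULL) AS null_ids
FROM `acme-analytics.warehouse.<table>`;

-- Step 2: run only if row_count = distinct_ids and null_ids = 0.
ALTER TABLE `acme-analytics.warehouse.<table>`
  ADD PRIMARY KEY (<id_column>) NOT ENFORCED;
```

BigQuery may use declared keys when optimizing queries. A key that is not actually unique can therefore lead to incorrect results, which is why step 1 comes first.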

EU AI Act Article 10

Status: Gaps

Readiness to produce data-governance evidence, not a certification of compliance

Evidence in place

  • Data currency is fully evidenced: all 5 tables were updated within 7 days, supporting Article 10's expectation that training, validation, and testing data be kept up to date and relevant to the intended purpose.
  • Schema type coverage of 30/31 columns and entity identifiers on 4/5 tables provide a structural basis for demonstrating data completeness and representativeness obligations.
  • Active modification signals on all 5 tables within 30 days support auditability of data-management lifecycle practices.

Gaps

  • Zero policy tags on all 31 columns means there is no documented data classification or provenance trail. Article 10(2) requires data-governance and management practices for training, validation, and testing data; this estate cannot currently produce that evidence.
  • Six PII columns across 4 tables are unmasked and untagged. Article 10(2)(f) to (h) require examining data for possible biases and identifying data gaps and shortcomings. Plaintext credit card and identity data in a training dataset would be a data-protection violation, not merely a readiness gap.
  • Sixteen of 31 columns are undocumented. Article 10(2)(b) and (c) cover data-collection processes and data-preparation operations such as annotation and labelling. With more than half of the column-level semantics invisible, the estate cannot evidence that training data is "relevant, sufficiently representative" as Article 10(3) requires.

See this for your own data

Connect read-only and get your score, six-factor breakdown, and prioritized fixes. We only query INFORMATION_SCHEMA metadata, never your data.


A consultant's readiness assessment runs ~$50–100K and six weeks. This took minutes, and it's free to run on your own warehouse.

© MortarIQ

All product names, logos, and brands are property of their respective owners and are used for identification purposes only.