All examples for Smart File Merger

Sales and Customer Data Merge

Merges quarterly sales data with customer master file using LEFT JOIN on customer_id. Produces revenue breakdowns by segment and region with data quality findings for unmatched records.

Input

What you provide to the skill

Merge sales_Q3.csv with customers.xlsx on customer_id to analyze revenue by customer segment and region.

Output

What the skill generates for you


MERGE REPORT - Sales Q3 Customer Segment Analysis

Generated: 2025-12-04 14:37:22
Analysis Goal: Analyze revenue by customer segment and region


INPUT FILES

File 1: sales_Q3.csv

  • Rows: 8,742
  • Columns: 6
  • Structure: transaction_id, customer_id, product_sku, sale_amount, sale_date, sales_rep_id
  • Key column: customer_id

File 2: customers.xlsx

  • Rows: 2,156
  • Columns: 7
  • Structure: customer_id, customer_name, email, phone, segment, region, join_date
  • Key column: customer_id

MERGE OPERATIONS

Operation 1: sales_Q3 + customers

  • Join key: customer_id
  • Join type: LEFT (retain all sales transactions, add customer attributes where available)
  • Matched records: 8,695 (99.5%)
  • Unmatched (sales only): 47 (0.5%)

Match breakdown:

  • Sales transactions with complete customer data: 8,695
  • Sales transactions missing customer details: 47 (customer_id not found in customers.xlsx)

FINAL DATASET

Output file: merged_sales_customer_Q3.csv

Dimensions:

  • Total rows: 8,742 (all Q3 sales retained)
  • Total columns: 12

Column structure:

transaction_id, customer_id, product_sku, sale_amount, sale_date, sales_rep_id,
customer_name, email, phone, segment, region, join_date

Revenue summary by segment:

  • Enterprise: $2,847,320 (42% of total) - 1,834 transactions
  • Mid-Market: $1,956,180 (29% of total) - 3,421 transactions
  • Small Business: $1,432,650 (21% of total) - 2,987 transactions
  • Unknown/Missing: $542,390 (8% of total) - 47 transactions

Revenue summary by region:

  • West: $2,234,670 (33%)
  • Northeast: $1,987,540 (29%)
  • South: $1,642,330 (24%)
  • Midwest: $921,450 (14%)
  • Unknown/Missing: $542,390 (from unmatched records)

DATA QUALITY FINDINGS

Critical Issues

  1. 47 orphaned sales transactions (0.5%)
    • customer_id values present in sales but not in customer master file
    • Total revenue impact: $542,390
    • Recommendation: Verify if these are test accounts, deleted customers, or data entry errors

Moderate Issues

  1. Email addresses missing: 127 customer records (5.9% of customer base)

    • Affects: 1,456 sales transactions
    • Impact: Reduced ability to perform email marketing analysis
  2. Phone numbers missing: 89 customer records (4.1% of customer base)

    • Affects: 982 sales transactions
    • Impact: Limited contact options for customer outreach

Data Observations

  1. Customer segments distribution:

    • Enterprise: 423 customers (19.6%) - Highest average transaction value ($1,553)
    • Mid-Market: 782 customers (36.3%) - Medium transaction value ($572)
    • Small Business: 951 customers (44.1%) - Lower transaction value ($480)
  2. Regional distribution:

    • West region shows highest average transaction value ($623/transaction)
    • Midwest shows lowest transaction volume but second-highest average value
  3. Duplicate detection:

    • No duplicate transaction_ids found
    • No duplicate customer_id entries in customer file
    • Clean dataset for analysis

RECOMMENDATIONS

Immediate Actions

  1. Investigate unmatched customer_ids: Review the 47 sales transactions with missing customer data

    • Check if customer_ids were entered incorrectly (typos, extra spaces)
    • Verify if customers were deleted after transactions occurred
    • Consider creating “Unknown Customer” placeholder records
  2. Data enrichment opportunities:

    • Update missing email addresses (127 customers) for marketing campaigns
    • Fill missing phone numbers (89 customers) for customer service outreach

Analysis Ready

  1. Dataset is ready for:

    • Revenue analysis by customer segment (Enterprise/Mid-Market/Small Business)
    • Geographic performance analysis across four regions
    • Customer lifetime value calculations (requires date-based aggregation)
    • Sales rep performance by segment and region
    • Product performance within customer segments
  2. Next steps for deeper analysis:

    • Join with product master file on product_sku for category-level analysis
    • Add sales rep details on sales_rep_id for commission calculations
    • Calculate customer retention metrics by comparing join_date to sale_date

TECHNICAL DETAILS

Merge configuration:

{
"files": [
{"path": "sales_Q3.csv", "name": "sales", "rows": 8742},
{"path": "customers.xlsx", "name": "customers", "rows": 2156}
],
"merges": [
{
"left": "sales",
"right": "customers",
"on": "customer_id",
"how": "left"
}
],
"output": "merged_sales_customer_Q3.csv"
}

Processing summary:

  • Load time: 1.2 seconds
  • Merge execution: 0.4 seconds
  • Data quality scan: 0.8 seconds
  • Total processing time: 2.4 seconds

Output delivered: merged_sales_customer_Q3.csv (8,742 rows × 12 columns)