Something went wrong!
Hang in there while we get back on track
Sales and Customer Data Merge
Merges quarterly sales data with customer master file using LEFT JOIN on customer_id. Produces revenue breakdowns by segment and region with data quality findings for unmatched records.
Input
What you provide to the skill
Merge sales_Q3.csv with customers.xlsx on customer_id to analyze revenue by customer segment and region.
Output
What the skill generates for you
MERGE REPORT - Sales Q3 Customer Segment Analysis
Generated: 2025-12-04 14:37:22
Analysis Goal: Analyze revenue by customer segment and region
INPUT FILES
File 1: sales_Q3.csv
- Rows: 8,742
- Columns: 6
- Structure: transaction_id, customer_id, product_sku, sale_amount, sale_date, sales_rep_id
- Key column: customer_id
File 2: customers.xlsx
- Rows: 2,156
- Columns: 7
- Structure: customer_id, customer_name, email, phone, segment, region, join_date
- Key column: customer_id
MERGE OPERATIONS
Operation 1: sales_Q3 + customers
- Join key: customer_id
- Join type: LEFT (retain all sales transactions, add customer attributes where available)
- Matched records: 8,695 (99.5%)
- Unmatched (sales only): 47 (0.5%)
Match breakdown:
- Sales transactions with complete customer data: 8,695
- Sales transactions missing customer details: 47 (customer_id not found in customers.xlsx)
FINAL DATASET
Output file: merged_sales_customer_Q3.csv
Dimensions:
- Total rows: 8,742 (all Q3 sales retained)
- Total columns: 12
Column structure:
transaction_id, customer_id, product_sku, sale_amount, sale_date, sales_rep_id,
customer_name, email, phone, segment, region, join_date
Revenue summary by segment:
- Enterprise: $2,847,320 (42% of total) - 1,834 transactions
- Mid-Market: $1,956,180 (29% of total) - 3,421 transactions
- Small Business: $1,432,650 (21% of total) - 2,987 transactions
- Unknown/Missing: $542,390 (8% of total) - 47 transactions
Revenue summary by region:
- West: $2,234,670 (33%)
- Northeast: $1,987,540 (29%)
- South: $1,642,330 (24%)
- Midwest: $921,450 (14%)
- Unknown/Missing: $542,390 (from unmatched records)
DATA QUALITY FINDINGS
Critical Issues
- 47 orphaned sales transactions (0.5%)
- customer_id values present in sales but not in customer master file
- Total revenue impact: $542,390
- Recommendation: Verify if these are test accounts, deleted customers, or data entry errors
Moderate Issues
-
Email addresses missing: 127 customer records (5.9% of customer base)
- Affects: 1,456 sales transactions
- Impact: Reduced ability to perform email marketing analysis
-
Phone numbers missing: 89 customer records (4.1% of customer base)
- Affects: 982 sales transactions
- Impact: Limited contact options for customer outreach
Data Observations
-
Customer segments distribution:
- Enterprise: 423 customers (19.6%) - Highest average transaction value ($1,553)
- Mid-Market: 782 customers (36.3%) - Medium transaction value ($572)
- Small Business: 951 customers (44.1%) - Lower transaction value ($480)
-
Regional distribution:
- West region shows highest average transaction value ($623/transaction)
- Midwest shows lowest transaction volume but second-highest average value
-
Duplicate detection:
- No duplicate transaction_ids found
- No duplicate customer_id entries in customer file
- Clean dataset for analysis
RECOMMENDATIONS
Immediate Actions
-
Investigate unmatched customer_ids: Review the 47 sales transactions with missing customer data
- Check if customer_ids were entered incorrectly (typos, extra spaces)
- Verify if customers were deleted after transactions occurred
- Consider creating “Unknown Customer” placeholder records
-
Data enrichment opportunities:
- Update missing email addresses (127 customers) for marketing campaigns
- Fill missing phone numbers (89 customers) for customer service outreach
Analysis Ready
-
Dataset is ready for:
- Revenue analysis by customer segment (Enterprise/Mid-Market/Small Business)
- Geographic performance analysis across four regions
- Customer lifetime value calculations (requires date-based aggregation)
- Sales rep performance by segment and region
- Product performance within customer segments
-
Next steps for deeper analysis:
- Join with product master file on product_sku for category-level analysis
- Add sales rep details on sales_rep_id for commission calculations
- Calculate customer retention metrics by comparing join_date to sale_date
TECHNICAL DETAILS
Merge configuration:
{
"files": [
{"path": "sales_Q3.csv", "name": "sales", "rows": 8742},
{"path": "customers.xlsx", "name": "customers", "rows": 2156}
],
"merges": [
{
"left": "sales",
"right": "customers",
"on": "customer_id",
"how": "left"
}
],
"output": "merged_sales_customer_Q3.csv"
}
Processing summary:
- Load time: 1.2 seconds
- Merge execution: 0.4 seconds
- Data quality scan: 0.8 seconds
- Total processing time: 2.4 seconds
Output delivered: merged_sales_customer_Q3.csv (8,742 rows × 12 columns)
About This Skill
Merge CSV and Excel files intelligently with schema analysis, optimal join logic, and comprehensive quality reporting - no VLOOKUP or programming needed.
View Skill DetailsMore Examples
12-Month Payment File Consolidation
Vertically stacks 12 monthly payment files with identical schemas for annual reporting. Validates schema consistency, checks for duplicates, and provides monthly statistics with payment method and status breakdowns.
Multi-Source Inventory Reconciliation
Reconciles warehouse inventory, supplier stock, and pending orders using FULL OUTER and LEFT joins. Identifies critical discrepancies including oversold SKUs, stockouts with demand, and missing items across systems.