All examples for Root Cause Analysis Framework Generator

Session Length Decline Investigation

An RCA framework for a 30% decrease in session length with minimal context provided. Demonstrates handling ambiguity, emphasizing data quality verification first, and distinguishing session-count issues from session-length issues.

Input

What you provide to the skill

Session length decreased by 30% over two weeks. What data should I investigate first?

Output

What the skill generates for you

=== ROOT CAUSE ANALYSIS FRAMEWORK ===
STEP 1: CLARIFYING QUESTIONS (Ask These First)
------------------------------------------------
1. SCOPE: Is the 30% decrease consistent across all user segments, or concentrated in specific cohorts (new vs. returning, power users vs. casual)?
- Why ask: Segment-specific drops point to onboarding issues, feature adoption problems, or cohort quality changes
2. TIMEFRAME: Did the drop happen suddenly (over 1-2 days) or gradually decline over the two weeks?
- Why ask: Sudden = likely deployment or external event; Gradual = product degradation or competitive shift
3. PLATFORM: Is this affecting all platforms equally (iOS, Android, web), or specific to one?
- Why ask: Platform-specific suggests technical issue, app version problem, or platform-specific UX change
4. GEOGRAPHY: Is this a global phenomenon or concentrated in specific regions/countries?
- Why ask: Regional drops suggest local competition, regulatory changes, or cultural factors
5. METRIC DEFINITION: Has the way session length is calculated changed? Any instrumentation updates or backend changes?
- Why ask: Data quality issues are common and easy to rule out—always verify measurement integrity first
6. RECENT CHANGES: What product changes, feature releases, or experiments went live in the past two weeks?
- Why ask: Temporal correlation helps identify causal candidates quickly
7. SESSION COUNT: Are users having the same number of sessions but each is shorter, or are they also having fewer sessions?
- Why ask: Different symptoms point to different root causes (engagement vs. friction vs. technical)
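Question 7 deserves a quick numeric check: total time per user decomposes into sessions per user × average session length, which immediately shows whether frequency, depth, or both dropped. A minimal sketch of that decomposition in pandas, assuming a session-level table with hypothetical columns user_id, duration_sec, and a before/after period label:
```python
import pandas as pd

def decompose_engagement(sessions: pd.DataFrame, period_col: str = "period") -> pd.DataFrame:
    """Split total engaged time per user into frequency (sessions per user)
    and depth (average session length) for each comparison period.

    Expects one row per session with hypothetical columns:
    user_id, duration_sec, and a period label such as 'before' / 'after'.
    """
    grouped = sessions.groupby(period_col).agg(
        users=("user_id", "nunique"),
        session_count=("user_id", "size"),
        total_minutes=("duration_sec", lambda s: s.sum() / 60),
    )
    grouped["sessions_per_user"] = grouped["session_count"] / grouped["users"]
    grouped["avg_session_min"] = grouped["total_minutes"] / grouped["session_count"]
    grouped["minutes_per_user"] = grouped["total_minutes"] / grouped["users"]
    return grouped


# Tiny illustrative example with made-up numbers
sessions = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 1, 2, 2],
    "duration_sec": [600, 500, 700, 650, 420, 380, 400],
    "period":       ["before"] * 4 + ["after"] * 3,
})
print(decompose_engagement(sessions))
```
If minutes_per_user fell mostly through avg_session_min, the problem is in-session depth; if sessions_per_user also fell, you are looking at a broader engagement or retention issue.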
STEP 2: MECE HYPOTHESIS FRAMEWORK
----------------------------------
CATEGORY A: INTERNAL - PRODUCT CHANGES (High Priority - Check First)
1. Recent feature launch degraded core engagement loop
Impact: High | Likelihood: High | Data: Feature adoption rates, session length by users exposed vs. not exposed to new feature
2. Content feed/recommendation algorithm change reduced relevance or quality
Impact: High | Likelihood: High | Data: Content engagement metrics (clicks, likes, time on content), algorithm version correlation
3. Navigation or information architecture change increased friction
Impact: High | Likelihood: Medium | Data: Navigation flow analysis, clicks to reach key features, user journey tracking
4. Notification frequency or timing changes reduced re-engagement within sessions
Impact: Medium | Likelihood: Medium | Data: In-session notification triggers, notification-driven actions per session
5. A/B test or experiment that inadvertently affects a larger population than intended, or leaks into the control group
Impact: High | Likelihood: Medium | Data: Experiment exposure logs, session length by experiment variant
CATEGORY B: INTERNAL - TECHNICAL/INFRASTRUCTURE (High Priority)
6. Performance degradation (slower load times, lag, increased latency)
Impact: High | Likelihood: High | Data: P95/P99 page load times, API response times, time-to-interactive metrics
7. Increased error rates or API failures causing feature unavailability
Impact: High | Likelihood: Medium | Data: Error rates by endpoint, failed requests per session, JavaScript console errors
8. Mobile app crashes or freezes causing premature session termination
Impact: High | Likelihood: Medium | Data: Crash rates, crash-free session rate, version-specific crash analysis
9. Memory leaks or resource issues causing app slowdown over session duration
Impact: Medium | Likelihood: Low | Data: Memory usage profiles, CPU utilization, session abandonment patterns
10. CDN or media loading issues (slow images/videos)
Impact: Medium | Likelihood: Medium | Data: CDN response times, media load success rates, buffering events
CATEGORY C: INTERNAL - CONTENT & SUPPLY QUALITY
11. Decline in content quality or relevance (user-generated or curated)
Impact: High | Likelihood: Medium | Data: Content creation rates, content engagement rates, content moderation removals
12. Reduced content supply or inventory availability
Impact: Medium | Likelihood: Medium | Data: Available content per user, content impression rates, "empty state" views
13. Increased spam, low-quality content, or safety issues
Impact: Medium | Likelihood: Low | Data: Content reports, spam flags, content moderation actions
CATEGORY D: EXTERNAL - COMPETITION & MARKET (Medium Priority)
14. Major competitor launched compelling feature or product
Impact: High | Likelihood: Medium | Data: Competitor app store rankings, social media sentiment, time spent benchmarks
15. Competitor running aggressive user acquisition or engagement campaign
Impact: Medium | Likelihood: Medium | Data: Competitive intelligence, market share trends, app download rankings
16. Industry-wide shift in user behavior or attention
Impact: Medium | Likelihood: Low | Data: Industry benchmarks, competitor data if available
CATEGORY E: EXTERNAL - SEASONALITY & EVENTS (Medium Priority)
17. Seasonal effect (end of holiday season, back to school/work, weather change)
Impact: Medium | Likelihood: High | Data: Historical year-over-year patterns for same two-week period
18. Major cultural, sporting, or news events capturing attention
Impact: Low | Likelihood: Medium | Data: News calendar, historical event correlation analysis
19. Platform-specific external factors (iOS update, Android version changes)
Impact: Medium | Likelihood: Medium | Data: OS version segmentation, correlation with OS update rollout
CATEGORY F: EXTERNAL - REGULATORY & POLICY
20. Privacy policy changes reducing data collection or personalization
Impact: Medium | Likelihood: Low | Data: Personalization effectiveness, recommendation diversity
21. App store policy changes affecting notifications or features
Impact: Low | Likelihood: Low | Data: Feature availability, notification permission rates
CATEGORY G: DATA QUALITY & MEASUREMENT (Check Early!)
22. Session timeout definition changed (e.g., timeout threshold shortened)
Impact: CRITICAL | Likelihood: Medium | Data: Session definition documentation, backend config changes
23. Instrumentation bug undercounting session duration
Impact: CRITICAL | Likelihood: Medium | Data: Related metrics sanity checks (page views per session, events per session)
24. Data pipeline processing change affecting aggregation
Impact: High | Likelihood: Low | Data: Data freshness, raw vs. aggregated data comparison
25. Client-side tracking implementation change
Impact: High | Likelihood: Low | Data: Event firing rates, tracking SDK version correlation
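Hypotheses #22-#25 are the cheapest to test, because they only require comparing session length against related per-session metrics around the suspected cutover date. A minimal sketch of that sanity check, assuming a daily rollup with hypothetical columns date, avg_session_min, events_per_session, and pageviews_per_session:
```python
import pandas as pd

def sanity_check_metrics(daily: pd.DataFrame, cutover: str) -> pd.DataFrame:
    """Compare the % change of session length against related per-session
    metrics around a suspected cutover date.

    Expects one row per day with hypothetical columns:
    date, avg_session_min, events_per_session, pageviews_per_session.
    """
    daily = daily.assign(date=pd.to_datetime(daily["date"]))
    before = daily[daily["date"] < cutover].mean(numeric_only=True)
    after = daily[daily["date"] >= cutover].mean(numeric_only=True)
    pct_change = (after - before) / before * 100
    return pd.DataFrame({"before": before, "after": after, "pct_change": pct_change})
```
If avg_session_min is down roughly 30% while events_per_session and pageviews_per_session are roughly flat, a measurement change (such as a shortened timeout threshold) is far more likely than a real behavior shift.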
STEP 3: PRIORITIZATION MATRIX
------------------------------
CHECK FIRST (High Impact × High Likelihood):
→ Data quality/measurement issues (#22, #23) - ALWAYS rule these out first—quick to check and surprisingly common
→ Performance degradation (#6) - Slow experiences directly reduce session length
→ Recent feature launch impact (#1) - Most common product-related cause
→ Algorithm/content feed changes (#2) - Directly affects engagement
→ Seasonal effects (#17) - Year-over-year comparison is quick to verify
CHECK SECOND (High Impact × Medium Likelihood OR Fast to Verify):
→ Technical issues: errors, crashes (#7, #8)
→ Navigation/UX friction (#3)
→ Content quality decline (#11)
→ Major competitor moves (#14)
→ CDN/media loading issues (#10)
CHECK THIRD (Lower Impact or Lower Likelihood):
→ Experiment exposure issues (#5)
→ Notification changes (#4)
→ Content supply issues (#12)
→ Platform external factors (#19)
→ All other hypotheses based on context clues
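The ordering above follows impact × likelihood × ease of verification. A minimal sketch of that scoring logic; the numeric weights and the abbreviated hypothesis list are illustrative assumptions, not part of the framework itself:
```python
# Illustrative weights and an abbreviated hypothesis list; adjust to your context.
IMPACT = {"critical": 4, "high": 3, "medium": 2, "low": 1}
LIKELIHOOD = {"high": 3, "medium": 2, "low": 1}
EASE = {"fast": 3, "moderate": 2, "slow": 1}  # how quickly the check can be run

hypotheses = [
    ("22. Session timeout definition changed", "critical", "medium", "fast"),
    ("6. Performance degradation",             "high",     "high",   "fast"),
    ("1. Recent feature launch impact",        "high",     "high",   "moderate"),
    ("17. Seasonal effect (YoY comparison)",   "medium",   "high",   "fast"),
    ("14. Competitor launch",                  "high",     "medium", "slow"),
]

def score(impact: str, likelihood: str, ease: str) -> int:
    return IMPACT[impact] * LIKELIHOOD[likelihood] * EASE[ease]

# Rank hypotheses from highest to lowest investigation priority
for name, impact, likelihood, ease in sorted(
    hypotheses, key=lambda h: score(*h[1:]), reverse=True
):
    print(f"{score(impact, likelihood, ease):>3}  {name}")
```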
STEP 4: STRATEGIC DATA REQUESTS
--------------------------------
1. RULE OUT DATA QUALITY FIRST (fast sanity checks):
"Before investigating behavior changes, let me verify data integrity:
- Has the session timeout definition or backend config changed in the past two weeks?
- Compare related metrics: Are page views per session, events per session, or actions per session also down 30%?
- Check raw event logs vs. aggregated metrics for discrepancies
- Timeline: Show me exactly when the drop started (daily granularity)"
2. SEGMENTATION (narrow scope quickly; see the sketch after this list):
"Show me session length trend segmented by:
- Platform (iOS vs. Android vs. web)
- User cohort (new users in past 2 weeks vs. returning users)
- Geography (top 5 countries)
- User activity level (power users vs. casual)
- Device type (high-end vs. low-end devices)"
3. TECHNICAL HEALTH (rule out infrastructure):
"Provide for the past 3 weeks:
- App load times (P50, P95, P99) by platform
- API response times for critical endpoints
- Error rates and types
- Crash-free session rate
- Session abandonment rate (sessions ending abruptly vs. gracefully)"
4. PRODUCT CHANGES CORRELATION (identify causal candidates):
"What changed in the past two weeks:
- Any feature releases or product launches? (with dates and rollout %)
- Any A/B tests started, modified, or ended?
- Any algorithm or recommendation system changes?
- Any changes to navigation, information architecture, or key user flows?
- Any notification or in-app messaging changes?"
5. USER BEHAVIOR PATTERNS (understand symptoms):
"For the past 3 weeks, show me:
- Average sessions per user (is frequency also down?)
- Key engagement metrics per session: page views, actions taken, content interactions
- Session ending patterns: What are users' last actions before ending sessions?
- Time to first meaningful action within session
- Bounce rate (single-action sessions)"
6. CONTENT & SUPPLY METRICS (if applicable):
"If content-driven product, provide:
- Content creation rate and volume
- Content engagement rate (% of viewed content that gets interaction)
- 'Empty state' or 'no content available' occurrence rate
- Content load success rates"
7. SEASONAL & EXTERNAL CONTEXT:
"Provide context:
- Historical data: What was session length during the same two-week period last year?
- Competitive context: Any major competitor launches or campaigns?
- Platform updates: Did iOS or Android release updates in this timeframe?"
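Requests 2 and 7 translate directly into two quick analyses: a daily session-length trend per segment, and a year-over-year comparison of the same calendar window. A minimal sketch in pandas, assuming a session-level table with hypothetical columns date, duration_sec, and a segment column such as platform, plus a pre-computed daily average series for the YoY check:
```python
import pandas as pd

def segmented_daily_trend(sessions: pd.DataFrame, segment_col: str) -> pd.DataFrame:
    """Daily average session length (minutes) per segment.

    Expects one row per session with hypothetical columns:
    date, duration_sec, plus whatever segment column is passed in
    (e.g. platform, cohort, country).
    """
    return (
        sessions.assign(date=pd.to_datetime(sessions["date"]).dt.date)
        .groupby(["date", segment_col])["duration_sec"]
        .mean()
        .div(60)
        .unstack(segment_col)
    )


def yoy_comparison(daily_avg: pd.Series, window_days: int = 14) -> pd.DataFrame:
    """Compare the latest window against the same calendar window last year.

    daily_avg: average session minutes per day, indexed by a DatetimeIndex.
    """
    daily_avg = daily_avg.sort_index()
    current = daily_avg.iloc[-window_days:]
    last_year = daily_avg.loc[
        current.index.min() - pd.DateOffset(years=1):
        current.index.max() - pd.DateOffset(years=1)
    ]
    return pd.DataFrame({
        "current_mean": [current.mean()],
        "last_year_mean": [last_year.mean()],
        "yoy_change_pct": [(current.mean() / last_year.mean() - 1) * 100],
    })
```
A drop confined to one platform or cohort points toward Categories A and B; a uniform drop that also shows up in last year's window points toward #17.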
STEP 5: INTERVIEW ANSWER STRUCTURE (Use This Flow)
---------------------------------------------------
1. CLARIFY (2-3 minutes):
"Before diving into potential root causes, let me ask a few clarifying questions to narrow the scope..."
[Ask top 5-7 questions from Step 1]
Example: "First, I want to understand if this is affecting all users equally or specific segments. Is the 30% drop consistent across new versus returning users, or all platforms? Also, did this happen suddenly or gradually? And crucially, has anything changed in how we calculate session length?"
2. FRAMEWORK (1 minute):
"I'll use a MECE framework to systematically explore potential root causes across Internal Product & Technical factors, External Market & Seasonality factors, and Data Quality. This ensures we cover all possibilities without overlap."
3. HYPOTHESES (5-7 minutes):
"Let me walk through the most likely causes organized by category..."
[Present top 10-12 hypotheses organized by category, focusing on highest priority]
Example structure:
"Starting with data quality—because this is critical to rule out first—we should verify whether the session timeout definition changed or if there's an instrumentation bug.
Next, internal product changes: If a feature launched or an algorithm changed in the past two weeks, this could directly impact engagement. I'd look at...
On the technical side, performance degradation is a major driver of shorter sessions. Slow load times or increased errors would manifest exactly as we're seeing...
Externally, we should check for seasonal effects by comparing year-over-year, and investigate whether a competitor launched something compelling..."
4. PRIORITIZATION (2-3 minutes):
"Given the 30% magnitude and two-week timeframe, here's how I'd prioritize investigation..."
"First, I'd immediately check data quality—verify the session definition hasn't changed and look at related metrics for consistency. This is quick and rules out false alarms.
Second, I'd investigate performance metrics. A 30% session length drop strongly correlates with performance degradation, and this is relatively easy to verify.
Third, I'd correlate with product changes. If anything shipped in the past two weeks, that's a prime suspect.
Fourth, I'd check year-over-year seasonal patterns to rule out expected variance.
The logic here is: impact × likelihood × ease of verification."
5. DATA REQUESTS (2-3 minutes):
"To test these hypotheses systematically, here's the specific data I'd request..."
[List your ordered data requests from Step 4, being very specific]
Example: "First, data integrity checks: show me if the session timeout config changed and compare page views per session to session length. Second, segmentation: break down session length by platform, user cohort, and geography to narrow where the problem lives. Third, technical health: P95 load times, error rates, crash rates..."
6. NEXT STEPS (1-2 minutes):
"Once we have this data, I'd drill into the top hypotheses systematically. If we find the root cause is a product change, I'd recommend rolling back or fixing it and validating with an A/B test. If it's technical, we'd prioritize the performance fix based on user impact. If it's external or seasonal, we'd adjust our expectations and potentially change our engagement strategy."
COMMON MISTAKES TO AVOID:
-------------------------
❌ Not ruling out data quality issues first—always verify measurement integrity
❌ Jumping to solutions before systematic diagnosis
❌ Only considering product changes—technical performance is often the culprit for session length
❌ Forgetting to check year-over-year for seasonal effects
❌ Asking vague data requests ("give me more data") instead of specific, prioritized requests
❌ Not segmenting—understanding which users are affected narrows root cause faster
❌ Missing the distinction between session count and session length (both can affect engagement)
❌ Not using a structured framework—ad-hoc thinking misses important categories
❌ Spending too much time on low-probability causes before checking obvious ones
❌ Ignoring performance metrics—slow experiences directly cause shorter sessions
❌ Not checking if users are having fewer sessions AND shorter sessions (compounded problem)
❌ Forgetting that 30% is a LARGE drop—this suggests something major, not a minor tweak