All examples for Root Cause Analysis Framework Generator

Session Length Decline Investigation

An RCA framework for a 30% decrease in session length with minimal context provided. Demonstrates handling ambiguity, emphasizing data quality verification first, and distinguishing session-count issues from session-length issues.

Input

What you provide to the skill

Session length decreased by 30% over two weeks. What data should I investigate first?

Output

What the skill generates for you

=== ROOT CAUSE ANALYSIS FRAMEWORK ===
STEP 1: CLARIFYING QUESTIONS (Ask These First)
------------------------------------------------
1. SCOPE: Is the 30% decrease consistent across all user segments, or concentrated in specific cohorts (new vs. returning, power users vs. casual)?
- Why ask: Segment-specific drops point to onboarding issues, feature adoption problems, or cohort quality changes
2. TIMEFRAME: Did the drop happen suddenly (over 1-2 days) or gradually decline over the two weeks?
- Why ask: Sudden = likely deployment or external event; Gradual = product degradation or competitive shift
3. PLATFORM: Is this affecting all platforms equally (iOS, Android, web), or specific to one?
- Why ask: Platform-specific suggests technical issue, app version problem, or platform-specific UX change
4. GEOGRAPHY: Is this a global phenomenon or concentrated in specific regions/countries?
- Why ask: Regional drops suggest local competition, regulatory changes, or cultural factors
5. METRIC DEFINITION: Has the way session length is calculated changed? Any instrumentation updates or backend changes?
- Why ask: Data quality issues are common and easy to rule out—always verify measurement integrity first
6. RECENT CHANGES: What product changes, feature releases, or experiments went live in the past two weeks?
- Why ask: Temporal correlation helps identify causal candidates quickly
7. SESSION COUNT: Are users having the same number of sessions but each is shorter, or are they also having fewer sessions?
- Why ask: Different symptoms point to different root causes (engagement vs. friction vs. technical)
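Question 7 deserves a quick numeric check: total time per user decomposes into sessions per user × average session length, which immediately shows whether frequency, depth, or both dropped. A minimal sketch of that decomposition in pandas, assuming a session-level table with hypothetical columns user_id, duration_sec, and a before/after period label:
```python
import pandas as pd

def decompose_engagement(sessions: pd.DataFrame, period_col: str = "period") -> pd.DataFrame:
    """Split total engaged time per user into frequency (sessions per user)
    and depth (average session length) for each comparison period.

    Expects one row per session with hypothetical columns:
    user_id, duration_sec, and a period label such as 'before' / 'after'.
    """
    grouped = sessions.groupby(period_col).agg(
        users=("user_id", "nunique"),
        session_count=("user_id", "size"),
        total_minutes=("duration_sec", lambda s: s.sum() / 60),
    )
    grouped["sessions_per_user"] = grouped["session_count"] / grouped["users"]
    grouped["avg_session_min"] = grouped["total_minutes"] / grouped["session_count"]
    grouped["minutes_per_user"] = grouped["total_minutes"] / grouped["users"]
    return grouped


# Tiny illustrative example with made-up numbers
sessions = pd.DataFrame({
    "user_id":      [1, 1, 2, 2, 1, 2, 2],
    "duration_sec": [600, 500, 700, 650, 420, 380, 400],
    "period":       ["before"] * 4 + ["after"] * 3,
})
print(decompose_engagement(sessions))
```
If minutes_per_user fell mostly through avg_session_min, the problem is in-session depth; if sessions_per_user also fell, you are looking at a broader engagement or retention issue.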
STEP 2: MECE HYPOTHESIS FRAMEWORK
----------------------------------
CATEGORY A: INTERNAL - PRODUCT CHANGES (High Priority - Check First)
1. Recent feature launch degraded core engagement loop
Impact: High | Likelihood: High | Data: Feature adoption rates, session length by users exposed vs. not exposed to new feature
2. Content feed/recommendation algorithm change reduced relevance or quality
Impact: High | Likelihood: High | Data: Content engagement metrics (clicks, likes, time on content), algorithm version correlation
3. Navigation or information architecture change increased friction
Impact: High | Likelihood: Medium | Data: Navigation flow analysis, clicks to reach key features, user journey tracking
4. Notification frequency or timing changes reduced re-engagement within sessions
Impact: Medium | Likelihood: Medium | Data: In-session notification triggers, notification-driven actions per session
5. A/B test or experiment that inadvertently affects a larger population than intended, or leaks into the control group
Impact: High | Likelihood: Medium | Data: Experiment exposure logs, session length by experiment variant
CATEGORY B: INTERNAL - TECHNICAL/INFRASTRUCTURE (High Priority)
6. Performance degradation (slower load times, lag, increased latency)
Impact: High | Likelihood: High | Data: P95/P99 page load times, API response times, time-to-interactive metrics
7. Increased error rates or API failures causing feature unavailability
Impact: High | Likelihood: Medium | Data: Error rates by endpoint, failed requests per session, JavaScript console errors
8. Mobile app crashes or freezes causing premature session termination
Impact: High | Likelihood: Medium | Data: Crash rates, crash-free session rate, version-specific crash analysis
9. Memory leaks or resource issues causing app slowdown over session duration
Impact: Medium | Likelihood: Low | Data: Memory usage profiles, CPU utilization, session abandonment patterns
10. CDN or media loading issues (slow images/videos)
Impact: Medium | Likelihood: Medium | Data: CDN response times, media load success rates, buffering events
CATEGORY C: INTERNAL - CONTENT & SUPPLY QUALITY
11. Decline in content quality or relevance (user-generated or curated)
Impact: High | Likelihood: Medium | Data: Content creation rates, content engagement rates, content moderation removals
12. Reduced content supply or inventory availability
Impact: Medium | Likelihood: Medium | Data: Available content per user, content impression rates, "empty state" views
13. Increased spam, low-quality content, or safety issues
Impact: Medium | Likelihood: Low | Data: Content reports, spam flags, content moderation actions
CATEGORY D: EXTERNAL - COMPETITION & MARKET (Medium Priority)
14. Major competitor launched compelling feature or product
Impact: High | Likelihood: Medium | Data: Competitor app store rankings, social media sentiment, time spent benchmarks
15. Competitor running aggressive user acquisition or engagement campaign
Impact: Medium | Likelihood: Medium | Data: Competitive intelligence, market share trends, app download rankings
16. Industry-wide shift in user behavior or attention
Impact: Medium | Likelihood: Low | Data: Industry benchmarks, competitor data if available
CATEGORY E: EXTERNAL - SEASONALITY & EVENTS (Medium Priority)
17. Seasonal effect (end of holiday season, back to school/work, weather change)
Impact: Medium | Likelihood: High | Data: Historical year-over-year patterns for same two-week period
18. Major cultural, sporting, or news events capturing attention
Impact: Low | Likelihood: Medium | Data: News calendar, historical event correlation analysis
19. Platform-specific external factors (iOS update, Android version changes)
Impact: Medium | Likelihood: Medium | Data: OS version segmentation, correlation with OS update rollout
CATEGORY F: EXTERNAL - REGULATORY & POLICY
20. Privacy policy changes reducing data collection or personalization
Impact: Medium | Likelihood: Low | Data: Personalization effectiveness, recommendation diversity
21. App store policy changes affecting notifications or features
Impact: Low | Likelihood: Low | Data: Feature availability, notification permission rates
CATEGORY G: DATA QUALITY & MEASUREMENT (Check Early!)
22. Session timeout definition changed (e.g., timeout threshold shortened)
Impact: CRITICAL | Likelihood: Medium | Data: Session definition documentation, backend config changes
23. Instrumentation bug undercounting session duration
Impact: CRITICAL | Likelihood: Medium | Data: Related metrics sanity checks (page views per session, events per session)
24. Data pipeline processing change affecting aggregation
Impact: High | Likelihood: Low | Data: Data freshness, raw vs. aggregated data comparison
25. Client-side tracking implementation change
Impact: High | Likelihood: Low | Data: Event firing rates, tracking SDK version correlation
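Hypotheses #22-#25 are the cheapest to test, because they only require comparing session length against related per-session metrics around the suspected cutover date. A minimal sketch of that sanity check, assuming a daily rollup with hypothetical columns date, avg_session_min, events_per_session, and pageviews_per_session:
```python
import pandas as pd

def sanity_check_metrics(daily: pd.DataFrame, cutover: str) -> pd.DataFrame:
    """Compare the % change of session length against related per-session
    metrics around a suspected cutover date.

    Expects one row per day with hypothetical columns:
    date, avg_session_min, events_per_session, pageviews_per_session.
    """
    daily = daily.assign(date=pd.to_datetime(daily["date"]))
    before = daily[daily["date"] < cutover].mean(numeric_only=True)
    after = daily[daily["date"] >= cutover].mean(numeric_only=True)
    pct_change = (after - before) / before * 100
    return pd.DataFrame({"before": before, "after": after, "pct_change": pct_change})
```
If avg_session_min is down roughly 30% while events_per_session and pageviews_per_session are roughly flat, a measurement change (such as a shortened timeout threshold) is far more likely than a real behavior shift.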
STEP 3: PRIORITIZATION MATRIX
------------------------------
CHECK FIRST (High Impact × High Likelihood):
→ Data quality/measurement issues (#22, #23) - ALWAYS rule these out first—quick to check and surprisingly common
→ Performance degradation (#6) - Slow experiences directly reduce session length
→ Recent feature launch impact (#1) - Most common product-related cause
→ Algorithm/content feed changes (#2) - Directly affects engagement
→ Seasonal effects (#17) - Year-over-year comparison is quick to verify
CHECK SECOND (High Impact × Medium Likelihood OR Fast to Verify):
→ Technical issues: errors, crashes (#7, #8)
→ Navigation/UX friction (#3)
→ Content quality decline (#11)
→ Major competitor moves (#14)
→ CDN/media loading issues (#10)
CHECK THIRD (Lower Impact or Lower Likelihood):
→ Experiment exposure issues (#5)
→ Notification changes (#4)
→ Content supply issues (#12)
→ Platform external factors (#19)
→ All other hypotheses based on context clues
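The ordering above follows impact × likelihood × ease of verification. A minimal sketch of that scoring logic; the numeric weights and the abbreviated hypothesis list are illustrative assumptions, not part of the framework itself:
```python
# Illustrative weights and an abbreviated hypothesis list; adjust to your context.
IMPACT = {"critical": 4, "high": 3, "medium": 2, "low": 1}
LIKELIHOOD = {"high": 3, "medium": 2, "low": 1}
EASE = {"fast": 3, "moderate": 2, "slow": 1}  # how quickly the check can be run

hypotheses = [
    ("22. Session timeout definition changed", "critical", "medium", "fast"),
    ("6. Performance degradation",             "high",     "high",   "fast"),
    ("1. Recent feature launch impact",        "high",     "high",   "moderate"),
    ("17. Seasonal effect (YoY comparison)",   "medium",   "high",   "fast"),
    ("14. Competitor launch",                  "high",     "medium", "slow"),
]

def score(impact: str, likelihood: str, ease: str) -> int:
    return IMPACT[impact] * LIKELIHOOD[likelihood] * EASE[ease]

# Rank hypotheses from highest to lowest investigation priority
for name, impact, likelihood, ease in sorted(
    hypotheses, key=lambda h: score(*h[1:]), reverse=True
):
    print(f"{score(impact, likelihood, ease):>3}  {name}")
```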
STEP 4: STRATEGIC DATA REQUESTS
--------------------------------
1. RULE OUT DATA QUALITY FIRST (fast sanity checks):
"Before investigating behavior changes, let me verify data integrity:
- Has the session timeout definition or backend config changed in the past two weeks?
- Compare related metrics: Are page views per session, events per session, or actions per session also down 30%?
- Check raw event logs vs. aggregated metrics for discrepancies
- Timeline: Show me exactly when the drop started (daily granularity)"
2. SEGMENTATION (narrow scope quickly; see the sketch after this list):
"Show me session length trend segmented by:
- Platform (iOS vs. Android vs. web)
- User cohort (new users in past 2 weeks vs. returning users)
- Geography (top 5 countries)
- User activity level (power users vs. casual)
- Device type (high-end vs. low-end devices)"
3. TECHNICAL HEALTH (rule out infrastructure):
"Provide for the past 3 weeks:
- App load times (P50, P95, P99) by platform
- API response times for critical endpoints
- Error rates and types
- Crash-free session rate
- Session abandonment rate (sessions ending abruptly vs. gracefully)"
4. PRODUCT CHANGES CORRELATION (identify causal candidates):
"What changed in the past two weeks:
- Any feature releases or product launches? (with dates and rollout %)
- Any A/B tests started, modified, or ended?
- Any algorithm or recommendation system changes?
- Any changes to navigation, information architecture, or key user flows?
- Any notification or in-app messaging changes?"
5. USER BEHAVIOR PATTERNS (understand symptoms):
"For the past 3 weeks, show me:
- Average sessions per user (is frequency also down?)
- Key engagement metrics per session: page views, actions taken, content interactions
- Session ending patterns: What are users' last actions before ending sessions?
- Time to first meaningful action within session
- Bounce rate (single-action sessions)"
6. CONTENT & SUPPLY METRICS (if applicable):
"If content-driven product, provide:
- Content creation rate and volume
- Content engagement rate (% of viewed content that gets interaction)
- 'Empty state' or 'no content available' occurrence rate
- Content load success rates"
7. SEASONAL & EXTERNAL CONTEXT:
"Provide context:
- Historical data: What was session length during the same two-week period last year?
- Competitive context: Any major competitor launches or campaigns?
- Platform updates: Did iOS or Android release updates in this timeframe?"
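Requests 2 and 7 translate directly into two quick analyses: a daily session-length trend per segment, and a year-over-year comparison of the same calendar window. A minimal sketch in pandas, assuming a session-level table with hypothetical columns date, duration_sec, and a segment column such as platform, plus a pre-computed daily average series for the YoY check:
```python
import pandas as pd

def segmented_daily_trend(sessions: pd.DataFrame, segment_col: str) -> pd.DataFrame:
    """Daily average session length (minutes) per segment.

    Expects one row per session with hypothetical columns:
    date, duration_sec, plus whatever segment column is passed in
    (e.g. platform, cohort, country).
    """
    return (
        sessions.assign(date=pd.to_datetime(sessions["date"]).dt.date)
        .groupby(["date", segment_col])["duration_sec"]
        .mean()
        .div(60)
        .unstack(segment_col)
    )


def yoy_comparison(daily_avg: pd.Series, window_days: int = 14) -> pd.DataFrame:
    """Compare the latest window against the same calendar window last year.

    daily_avg: average session minutes per day, indexed by a DatetimeIndex.
    """
    daily_avg = daily_avg.sort_index()
    current = daily_avg.iloc[-window_days:]
    last_year = daily_avg.loc[
        current.index.min() - pd.DateOffset(years=1):
        current.index.max() - pd.DateOffset(years=1)
    ]
    return pd.DataFrame({
        "current_mean": [current.mean()],
        "last_year_mean": [last_year.mean()],
        "yoy_change_pct": [(current.mean() / last_year.mean() - 1) * 100],
    })
```
A drop confined to one platform or cohort points toward Categories A and B; a uniform drop that also shows up in last year's window points toward #17.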
STEP 5: INTERVIEW ANSWER STRUCTURE (Use This Flow)
---------------------------------------------------
1. CLARIFY (2-3 minutes):
"Before diving into potential root causes, let me ask a few clarifying questions to narrow the scope..."
[Ask top 5-7 questions from Step 1]
Example: "First, I want to understand if this is affecting all users equally or specific segments. Is the 30% drop consistent across new versus returning users, or all platforms? Also, did this happen suddenly or gradually? And crucially, has anything changed in how we calculate session length?"
2. FRAMEWORK (1 minute):
"I'll use a MECE framework to systematically explore potential root causes across Internal Product & Technical factors, External Market & Seasonality factors, and Data Quality. This ensures we cover all possibilities without overlap."
3. HYPOTHESES (5-7 minutes):
"Let me walk through the most likely causes organized by category..."
[Present top 10-12 hypotheses organized by category, focusing on highest priority]
Example structure:
"Starting with data quality—because this is critical to rule out first—we should verify whether the session timeout definition changed or if there's an instrumentation bug.
Next, internal product changes: If a feature launched or an algorithm changed in the past two weeks, this could directly impact engagement. I'd look at...
On the technical side, performance degradation is a major driver of shorter sessions. Slow load times or increased errors would manifest exactly as we're seeing...
Externally, we should check for seasonal effects by comparing year-over-year, and investigate whether a competitor launched something compelling..."
4. PRIORITIZATION (2-3 minutes):
"Given the 30% magnitude and two-week timeframe, here's how I'd prioritize investigation..."
"First, I'd immediately check data quality—verify the session definition hasn't changed and look at related metrics for consistency. This is quick and rules out false alarms.
Second, I'd investigate performance metrics. A 30% session length drop strongly correlates with performance degradation, and this is relatively easy to verify.
Third, I'd correlate with product changes. If anything shipped in the past two weeks, that's a prime suspect.
Fourth, I'd check year-over-year seasonal patterns to rule out expected variance.
The logic here is: impact × likelihood × ease of verification."
5. DATA REQUESTS (2-3 minutes):
"To test these hypotheses systematically, here's the specific data I'd request..."
[List your ordered data requests from Step 4, being very specific]
Example: "First, data integrity checks: show me if the session timeout config changed and compare page views per session to session length. Second, segmentation: break down session length by platform, user cohort, and geography to narrow where the problem lives. Third, technical health: P95 load times, error rates, crash rates..."
6. NEXT STEPS (1-2 minutes):
"Once we have this data, I'd drill into the top hypotheses systematically. If we find the root cause is a product change, I'd recommend rolling back or fixing it and validating with an A/B test. If it's technical, we'd prioritize the performance fix based on user impact. If it's external or seasonal, we'd adjust our expectations and potentially change our engagement strategy."
COMMON MISTAKES TO AVOID:
-------------------------
❌ Not ruling out data quality issues first—always verify measurement integrity
❌ Jumping to solutions before systematic diagnosis
❌ Only considering product changes—technical performance is often the culprit for session length
❌ Forgetting to check year-over-year for seasonal effects
❌ Asking vague data requests ("give me more data") instead of specific, prioritized requests
❌ Not segmenting—understanding which users are affected narrows root cause faster
❌ Missing the distinction between session count and session length (both can affect engagement)
❌ Not using a structured framework—ad-hoc thinking misses important categories
❌ Spending too much time on low-probability causes before checking obvious ones
❌ Ignoring performance metrics—slow experiences directly cause shorter sessions
❌ Not checking if users are having fewer sessions AND shorter sessions (compounded problem)
❌ Forgetting that 30% is a LARGE drop—this suggests something major, not a minor tweak