SCHEMA DESIGN FEEDBACK

What You Did Well

✅ Core entity separation: You correctly identified the four main entities (users, posts, follows, likes) needed for a basic social media platform

✅ Relationship structure awareness: Using user_id foreign keys shows you understand that entities need to reference each other

✅ Standard social media patterns: Following the conventional approach of separate tables for follows and likes rather than embedding in posts

Areas to Consider

1. Missing Critical Fields

What you did: Mentioned tables and user_id FKs, but didn’t specify other key fields

The issue:

Users table needs authentication fields (email, password_hash, username)
Posts need content fields (text, media_urls, created_at)
Follows needs both follower_id and followed_id (not just user_id)
Likes needs both user_id AND post_id to represent the relationship

Better approach:

users
├─ id (PK)
├─ username (unique)
├─ email (unique)
├─ password_hash
├─ bio, profile_pic_url
├─ created_at

posts
├─ id (PK)
├─ user_id (FK → users)
├─ content (text)
├─ media_urls (array or separate media table)
├─ like_count (denormalized for performance)
├─ created_at

follows
├─ id (PK)
├─ follower_id (FK → users)
├─ followed_id (FK → users)
├─ created_at
└─ UNIQUE(follower_id, followed_id)

likes
├─ id (PK)
├─ user_id (FK → users)
├─ post_id (FK → posts)
├─ created_at
└─ UNIQUE(user_id, post_id)

PM Trade-off:
Denormalizing like_count on posts table trades storage/consistency for read speed—feed queries don’t need to COUNT() every time.

2. Feed Generation Strategy

What you did: Basic schema structure

Missing consideration:

How do you generate a user’s feed of posts from people they follow?
Naive approach: JOIN follows → posts → users (slow at scale)
Feed generation is the #1 performance bottleneck in social media apps

Better approach - Fan-out on write:

feed_items
├─ id (PK)
├─ user_id (FK → users)  // whose feed this appears in
├─ post_id (FK → posts)
├─ created_at
└─ INDEX(user_id, created_at DESC)

When user A posts, write to feed_items for all followers. Reading feed = simple query on user’s feed_items.

PM Trade-off:

Fan-out on write: Fast reads (critical for user experience), slower writes (acceptable delay)
Fan-out on read: Fast writes, slow reads (bad UX when opening feed)
Hybrid: Fan-out on write for users with <100k followers, on-demand aggregation for celebrities

3. Relationship Cardinality Clarity

What you did: “user_id FKs” in follows and likes

The issue:

Follows is a many-to-many self-referencing relationship on users
Requires TWO foreign keys: follower_id and followed_id, both pointing to users
Likes is a many-to-many between users and posts
Requires user_id and post_id

Better approach:
Be explicit about relationship types:

users → posts: one-to-many (one user creates many posts)
users → follows (as follower): many-to-many (many users follow many users)
users → likes: many-to-many (many users like many posts)
posts → likes: one-to-many (one post has many likes)

PM insight: In interviews, naming both sides of many-to-many relationships shows you understand data flows in both directions (e.g., “who follows me” vs “who do I follow”).

4. Comments and Nested Interactions

What you did: Posts, follows, likes only

Missing consideration:
Most social media apps need comments, which have their own complexity:

Comments on posts (potentially nested/threaded)
Likes on comments
Replies to comments

Better approach:

comments
├─ id (PK)
├─ post_id (FK → posts)
├─ user_id (FK → users)
├─ parent_comment_id (FK → comments, nullable for top-level)
├─ content (text)
├─ like_count (denormalized)
├─ created_at

PM Trade-off:
Threaded comments (parent_comment_id) increase engagement but complicate feed rendering and notification logic.

5. Indexing Strategy

What you did: Defined tables

Missing consideration:
Schema isn’t complete without indexes for common queries:

Critical indexes:

follows(follower_id) - “Who do I follow?”
follows(followed_id) - “Who follows me?”
posts(user_id, created_at DESC) - “User’s posts, newest first”
likes(post_id) - “All likes on a post”
likes(user_id, post_id) - “Has this user liked this post?” (for UI state)

PM insight: Indexes are a product decision—they speed up specific queries but slow writes. Index the queries your users run most often.

Improved Schema + Trade-offs

CORE DATA (PostgreSQL)
======================
users
├─ id (PK)
├─ username (UNIQUE)
├─ email (UNIQUE)
├─ password_hash
├─ bio, profile_pic_url
├─ follower_count, following_count (denormalized)
└─ created_at

posts
├─ id (PK)
├─ user_id (FK → users)
├─ content (text, max 280 chars for Twitter-style)
├─ media_urls (JSON array or separate media table)
├─ like_count, comment_count (denormalized)
├─ created_at
└─ INDEX(user_id, created_at DESC)

follows (many-to-many self-join)
├─ id (PK)
├─ follower_id (FK → users)
├─ followed_id (FK → users)
├─ created_at
├─ UNIQUE(follower_id, followed_id)
├─ INDEX(follower_id)
└─ INDEX(followed_id)

likes (many-to-many junction)
├─ id (PK)
├─ user_id (FK → users)
├─ post_id (FK → posts)
├─ created_at
├─ UNIQUE(user_id, post_id)
├─ INDEX(post_id)
└─ INDEX(user_id, post_id)

comments
├─ id (PK)
├─ post_id (FK → posts)
├─ user_id (FK → users)
├─ parent_comment_id (FK → comments, nullable)
├─ content (text)
├─ like_count (denormalized)
├─ created_at
└─ INDEX(post_id, created_at)

FEED OPTIMIZATION (Redis or PostgreSQL)
========================================
feed_items
├─ id (PK)
├─ user_id (FK → users)  // whose feed
├─ post_id (FK → posts)
├─ created_at
└─ INDEX(user_id, created_at DESC)

Trade-off Summary:

Denormalized counts (follower_count, like_count): Fast reads, eventual consistency on writes
Feed_items fan-out: Instant feed loads, write amplification for popular users
Composite indexes: Speed up common queries, increase storage by ~15-30%
Self-referencing follows table: Clean model, requires two indexes (follower, followed)

Key PM Insights

Social media is read-heavy: Users read feeds 100x more than they post. Optimize for reads (denormalize counts, pre-compute feeds) even if writes get slower.
Many-to-many requires junction tables: Follows and likes are both many-to-many relationships. Each needs a dedicated table with two foreign keys and a composite unique constraint.
Feed generation is the hardest problem: The naive JOIN approach doesn’t scale. Fan-out on write (pre-compute feeds) is the industry standard for good UX.
Indexes are product decisions: Every index speeds up specific queries but slows writes. Index what users do most (view feed, check likes) not admin operations (view all posts ever).

How to Discuss in Interviews

Good answer structure:

Start with core entities and relationships (show structured thinking)
Choose database type with reasoning (show trade-off awareness)
Identify scaling challenges and solutions (show production mindset)
Discuss one normalization trade-off (show nuance)

Example: “I’d start with four core tables: users, posts, follows, and likes. Follows is a many-to-many self-join on users with follower_id and followed_id. Likes is a many-to-many junction between users and posts. I’d use PostgreSQL for ACID guarantees and strong relationship modeling. For scaling, the biggest challenge is feed generation—joining follows to posts gets slow at scale. I’d implement fan-out on write by maintaining a feed_items table that pre-computes each user’s feed when someone they follow posts. This trades write speed for read speed, which is the right trade-off since users read feeds 100x more than they post. I’d also denormalize like_count onto posts to avoid expensive COUNTs, accepting eventual consistency for better UX.”

Social Media App Schema Evaluation

Input

Output