Data Models Lesson Plan
How you model data shapes everything downstream — queries, APIs, UIs, performance, team boundaries.
Lesson 1: Entities and Relationships
Section titled “Lesson 1: Entities and Relationships”Goal: Identify the nouns in a system and how they relate to each other.
Concepts
Section titled “Concepts”Every system has entities (things you track) and relationships (how they connect). List the nouns: users, orders, products. Then determine cardinality — 1:1, 1:N, or M:N. Drawing these out before writing code prevents expensive rework.
Exercises
Section titled “Exercises”-
Identify entities in a bookstore
Entities: Book, Author, Customer, Order, ReviewRelationships:Author --< Book (1:N)Customer --< Order (1:N)Order >--< Book (M:N)Customer --< Review (1:N)Book --< Review (1:N) -
Draw an ER diagram
erDiagramAUTHOR ||--o{ BOOK : writesCUSTOMER ||--o{ ORDER : placesORDER }o--o{ BOOK : containsCUSTOMER ||--o{ REVIEW : writesBOOK ||--o{ REVIEW : "is reviewed in" -
Spot the cardinality errors
“Each employee has exactly one department, each department has exactly one employee.” Explain why 1:1 is wrong here. Rewrite as 1:N. Identify a real case where 1:1 makes sense (e.g., user and user_profile).
Checkpoint
Section titled “Checkpoint”Model a different domain (gym membership, restaurant). List 5+ entities with cardinalities. Identify one M:N and explain how to resolve it with a join table.
Lesson 2: Relational Modeling
Section titled “Lesson 2: Relational Modeling”Goal: Normalize data into tables that avoid redundancy and anomalies.
Concepts
Section titled “Concepts”Normalization reduces duplication. 1NF: every cell holds one value. 2NF: non-key columns depend on the entire primary key. 3NF: no column depends on another non-key column. Over-normalizing creates excessive joins. Denormalize deliberately when read performance matters more than write consistency.
Exercises
Section titled “Exercises”-
Fix a 1NF violation
-- Bad: multiple phones in one columnCREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, phones TEXT);-- Fixed: separate tableCREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT);CREATE TABLE phones (id INTEGER PRIMARY KEY,contact_id INTEGER REFERENCES contacts(id),phone TEXT NOT NULL); -
Normalize to 3NF
-- Bad: city depends on zip, not on student_idCREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT,zip TEXT, city TEXT, state TEXT);-- Fixed: extract zip_codesCREATE TABLE zip_codes (zip TEXT PRIMARY KEY, city TEXT NOT NULL, state TEXT NOT NULL);CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT,zip TEXT REFERENCES zip_codes(zip)); -
Resolve an M:N relationship
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT);CREATE TABLE enrollments (student_id INTEGER REFERENCES students(id),course_id INTEGER REFERENCES courses(id),enrolled_at TEXT DEFAULT (datetime('now')),PRIMARY KEY (student_id, course_id));INSERT INTO students VALUES (1, 'Alice'), (2, 'Bob');INSERT INTO courses VALUES (1, 'Databases'), (2, 'Algorithms');INSERT INTO enrollments (student_id, course_id) VALUES (1, 1), (1, 2), (2, 1);SELECT s.name, c.title FROM enrollments eJOIN students s ON e.student_id = s.idJOIN courses c ON e.course_id = c.id;
Checkpoint
Section titled “Checkpoint”Take a flat CSV with repeated customer info in every row. Normalize it into 3NF with 3+ tables in SQLite.
Lesson 3: Schema Design in Practice
Section titled “Lesson 3: Schema Design in Practice”Goal: Design a real schema with constraints, indexes, and migrations.
Concepts
Section titled “Concepts”Good schemas encode business rules: NOT NULL prevents missing data, UNIQUE prevents duplicates, CHECK enforces ranges, FOREIGN KEY enforces relationships. Indexes support your access patterns — add them for columns you filter and join on.
Exercises
Section titled “Exercises”-
Design a task tracker schema
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE,name TEXT NOT NULL, created_at TEXT NOT NULL DEFAULT (datetime('now')));CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL,owner_id INTEGER NOT NULL REFERENCES users(id));CREATE TABLE tasks (id INTEGER PRIMARY KEY,project_id INTEGER NOT NULL REFERENCES projects(id),title TEXT NOT NULL,status TEXT NOT NULL DEFAULT 'todo' CHECK (status IN ('todo','in_progress','done')),assignee_id INTEGER REFERENCES users(id),due_date TEXT,created_at TEXT NOT NULL DEFAULT (datetime('now')));CREATE INDEX idx_tasks_project ON tasks(project_id);CREATE INDEX idx_tasks_assignee ON tasks(assignee_id); -
Write a migration
-- 002_add_task_priority.sqlALTER TABLE tasks ADD COLUMN priority TEXTDEFAULT 'medium' CHECK (priority IN ('low','medium','high','urgent'));UPDATE tasks SET priority = 'medium' WHERE priority IS NULL; -
Query and verify indexes
SELECT t.title, t.status, t.due_date, p.name AS projectFROM tasks t JOIN projects p ON t.project_id = p.idWHERE t.assignee_id = 1 AND t.status != 'done'ORDER BY t.due_date;EXPLAIN QUERY PLANSELECT * FROM tasks WHERE assignee_id = 1 AND status = 'todo';
Checkpoint
Section titled “Checkpoint”Load the schema into SQLite with sample data. Run EXPLAIN QUERY PLAN on three
queries. Add a labels feature using a junction table.
Lesson 4: Document Models
Section titled “Lesson 4: Document Models”Goal: Understand when to model data as documents instead of tables.
Concepts
Section titled “Concepts”Document models store nested structures — JSON, YAML, TOML. They excel when data is self-contained, schema varies across records, or you read entire objects at once. Embedding duplicates data but avoids joins. Referencing normalizes but requires multiple lookups.
Exercises
Section titled “Exercises”-
Model and query a blog post
{"id": "post-001","title": "Data Modeling Basics","author": { "id": "user-42", "name": "Alice" },"tags": ["databases", "design"],"comments": [{"user": "Bob","text": "Great post!","created_at": "2025-01-15T10:30:00Z"}]}Terminal window jq '.tags[]' post.json # Extract tagsjq '.comments[].user' post.json # Comment authorsjq '.comments | length' post.json # Count comments -
Compare embedding vs referencing
Embedded order (all data inline) vs referenced order (just IDs). Write three advantages and three disadvantages of each approach.
-
Query a collection with jq
Create
tasks.jsonwith 5 task objects (varying status, priority, assignee):Terminal window jq '[.[] | select(.priority == "high")]' tasks.jsonjq 'group_by(.status) | map({status: .[0].status, count: length})' tasks.jsonjq '[.[] | select(.assignee == "Alice")] | sort_by(.due_date)' tasks.json
Checkpoint
Section titled “Checkpoint”Model the Lesson 3 task tracker as JSON documents. Replicate the SQL queries in jq. Note which become easier and which become harder.
Lesson 5: Graph Models
Section titled “Lesson 5: Graph Models”Goal: Model data where relationships are the primary concern.
Concepts
Section titled “Concepts”Graphs store nodes (entities) and edges (relationships). Unlike relational joins that get expensive at depth, graphs traverse relationships efficiently. Use them when the key question is “how is X connected to Y?” — social networks, knowledge bases, dependency trees.
Exercises
Section titled “Exercises”-
Build a social graph as JSON
{"nodes": [{ "id": "alice", "role": "engineer" },{ "id": "bob", "role": "designer" },{ "id": "carol", "role": "engineer" }],"edges": [{ "from": "alice", "to": "bob", "type": "follows" },{ "from": "alice", "to": "carol", "type": "follows" },{ "from": "carol", "to": "alice", "type": "follows" }]}Terminal window # Who does Alice follow?jq '[.edges[] | select(.from == "alice") | .to]' graph.json# Who follows Carol?jq '[.edges[] | select(.to == "carol") | .from]' graph.json -
Model a dependency graph
Create nodes for
app, api, auth, db, cache, loggerwith directed edges. Find all direct dependencies ofapiwith jq. Note why recursive traversal is hard in jq but natural in a graph database. -
Store a graph in SQL
CREATE TABLE nodes (id TEXT PRIMARY KEY, label TEXT, properties TEXT);CREATE TABLE edges (id INTEGER PRIMARY KEY,from_node TEXT REFERENCES nodes(id),to_node TEXT REFERENCES nodes(id),type TEXT NOT NULL);-- Friends of friends (2-hop traversal)SELECT DISTINCT e2.to_node AS friend_of_friendFROM edges e1 JOIN edges e2 ON e1.to_node = e2.from_nodeWHERE e1.from_node = 'alice' AND e1.type = 'follows'AND e2.type = 'follows' AND e2.to_node != 'alice';
Checkpoint
Section titled “Checkpoint”Build a knowledge graph for a topic you know (10+ nodes, 15+ edges). Write a query finding all nodes within 2 hops of a starting node.
Lesson 6: Time and State
Section titled “Lesson 6: Time and State”Goal: Model data that changes over time without losing history.
Concepts
Section titled “Concepts”Most databases store current state only — UPDATE destroys the previous value. Event sourcing stores every change as an immutable event; current state derives from replaying events. Append-only logs (JSONL, Kafka) capture what happened and when, enabling audit trails and point-in-time queries.
Exercises
Section titled “Exercises”-
Build an append-only event log
Create
events.jsonlwith one JSON event per line:{"ts":"2025-01-01T09:00:00Z","type":"task.created","task_id":"t-1","title":"Write docs"}{"ts":"2025-01-01T10:30:00Z","type":"task.assigned","task_id":"t-1","assignee":"alice"}{"ts":"2025-01-01T14:00:00Z","type":"task.status_changed","task_id":"t-1","to":"in_progress"}{"ts":"2025-01-02T11:00:00Z","type":"task.status_changed","task_id":"t-1","to":"done"}Terminal window # Current status (last status_changed event)jq -s '[.[] | select(.type == "task.status_changed")] | last | .to' events.jsonl# Full timelinejq -s 'sort_by(.ts) | .[] | "\(.ts) \(.type)"' events.jsonl -
Temporal table in SQL
CREATE TABLE task_history (id INTEGER PRIMARY KEY, task_id TEXT NOT NULL,field TEXT NOT NULL, old_value TEXT, new_value TEXT NOT NULL,changed_at TEXT NOT NULL DEFAULT (datetime('now')),changed_by TEXT NOT NULL);-- Status at a specific timeSELECT new_value FROM task_historyWHERE task_id = 't-1' AND field = 'status'AND changed_at <= '2025-01-01T12:00:00Z'ORDER BY changed_at DESC LIMIT 1; -
Compare mutable vs immutable
-- Mutable: current balance onlyCREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL NOT NULL DEFAULT 0);-- Immutable: transaction logCREATE TABLE transactions (id INTEGER PRIMARY KEY, account_id TEXT NOT NULL,amount REAL NOT NULL, created_at TEXT NOT NULL DEFAULT (datetime('now')));-- Current balance: SELECT SUM(amount) FROM transactions WHERE account_id = 'acct-1';-- Balance at date: add AND created_at <= '2025-06-01'Write when you would choose each approach and why.
Checkpoint
Section titled “Checkpoint”Build an event log for a shopping cart (add, remove, change quantity). Replay it with jq to compute the final state. Add a point-in-time query.
Lesson 7: API Data Contracts
Section titled “Lesson 7: API Data Contracts”Goal: Design API schemas that evolve without breaking consumers.
Concepts
Section titled “Concepts”Internal schemas optimize for storage; API schemas optimize for consumers. Adding fields is safe. Removing or renaming fields breaks clients. Version your APIs or use additive-only changes. JSON Schema and OpenAPI formalize contracts so both sides know what to expect.
Exercises
Section titled “Exercises”-
Design a REST resource schema
{"type": "object","required": ["id", "title", "status", "project_id"],"properties": {"id": { "type": "string" },"title": { "type": "string" },"status": { "type": "string", "enum": ["todo", "in_progress", "done"] },"assignee": { "type": "object", "properties": { "id": {}, "name": {} } }}}Note:
assigneeis embedded (not just an ID) — optimize for the consumer, not the database. -
Classify schema changes
Given
{"id": "t-1", "title": "Write docs", "status": "todo"}, classify:- Add optional
priorityfield — SAFE - Rename
titletoname— BREAKING - Add
"blocked"to status enum — RISKY (clients may not handle it) - Change
idfrom string to integer — BREAKING
For each breaking change, write a migration strategy preserving backwards compatibility.
- Add optional
-
Validate with jq
Terminal window jq 'if .id == null then error("missing id")elif .title == null then error("missing title")elif (.status | IN("todo","in_progress","done") | not)then error("invalid status")else "valid" end' task.json
Checkpoint
Section titled “Checkpoint”Design request/response schemas for create-task and list-tasks endpoints. Write a jq validator for sample payloads.
Lesson 8: Choosing the Right Model
Section titled “Lesson 8: Choosing the Right Model”Goal: Apply a decision framework to pick the right data model for a problem.
Concepts
Section titled “Concepts”No single model fits all problems. Relational models handle structured data with complex cross-entity queries. Documents suit self-contained objects with variable schemas. Graphs shine when relationships drive queries. Event logs capture temporal state. Real systems combine multiple models — pick based on access patterns, not familiarity.
Exercises
Section titled “Exercises”-
Evaluate tradeoffs for three scenarios
A: E-commerce catalog -- products with varying attributes (books have ISBN,clothing has sizes). Customers browse and search. Orders reference products.B: Fraud detection -- tracks relationships between accounts, devices, IPs.Key question: "Is this account connected to known-fraud accounts?"C: Financial audit -- must record every change. Regulators need state at anypoint in time. No data may be deleted.Choose a primary model for each and justify in one paragraph.
-
Design a hybrid model
-- Relational coreCREATE TABLE projects (id TEXT PRIMARY KEY, name TEXT NOT NULL);CREATE TABLE tasks (id TEXT PRIMARY KEY, project_id TEXT REFERENCES projects(id),title TEXT NOT NULL, status TEXT NOT NULL,metadata TEXT DEFAULT '{}' -- JSON for flexible fields);-- Event log for auditCREATE TABLE events (id INTEGER PRIMARY KEY, entity_type TEXT, entity_id TEXT,event_type TEXT NOT NULL, payload TEXT NOT NULL,created_at TEXT NOT NULL DEFAULT (datetime('now')));-- Query JSON metadataSELECT id, title, json_extract(metadata, '$.sprint') AS sprintFROM tasks WHERE json_extract(metadata, '$.story_points') > 2; -
Build a decision checklist
Access patterns: Whole objects by ID --> DocumentCross-entity JOINs --> RelationalN-hop traversals --> GraphReplay / audit --> Event logSchema structure: Fixed, enforced --> RelationalVariable per record --> DocumentConsistency: Strong, ACID --> RelationalEventual OK --> Document / Event log
Checkpoint
Section titled “Checkpoint”Pick a real application you use daily. Identify its likely data model(s) and write a one-page analysis covering entities, access patterns, and consistency needs.
Practice Projects
Section titled “Practice Projects”Project 1: Task Tracker (Full Stack Model)
Section titled “Project 1: Task Tracker (Full Stack Model)”Design a task tracker with users, projects, tasks, labels, comments, and activity feed. Write the SQL schema, seed it, build 10 queries, and add an event log for audit.
Project 2: Recipe Book (Document Model)
Section titled “Project 2: Recipe Book (Document Model)”Model a recipe collection as JSON documents with ingredients, steps, tags, and ratings. Store 10 recipes. Write jq queries to search by ingredient, filter by tag, and find recipes under 30 minutes.
Project 3: Knowledge Graph
Section titled “Project 3: Knowledge Graph”Build a graph of technologies — languages, frameworks, databases. Model “uses,” “built-with,” and “depends-on” edges. Store in SQLite as nodes and edges. Find clusters and most-connected nodes.
Quick Reference
Section titled “Quick Reference”| Model | Best For | Watch Out For |
|---|---|---|
| Relation | Structured data, complex queries | Over-normalization, join costs |
| Document | Variable schema, self-contained | Data duplication, no joins |
| Graph | Relationship-heavy queries | Simple queries become verbose |
| Event | Audit trails, temporal queries | Replay cost, storage growth |
| Hybrid | Real-world systems, mixed needs | Operational complexity |
See Also
Section titled “See Also”- PostgreSQL Cheatsheet — Indexes, window functions, admin queries
- SQL Cheatsheet — Joins, CTEs, window functions
- Thinking — Mental models and systems thinking
- System Design
- API Design