Product Analytics

Event schema design: what every product team gets wrong

Naming, versioning, properties, and ownership. The event-taxonomy decisions that determine whether analytics are trustworthy six months from now.

Anthra AI Team · Engineering Team · 6 min read
Table of contents
  1. Mistake 1: No tracking plan
  2. The fix: a central tracking plan
  3. Mistake 2: Inconsistent naming
  4. The fix: pick a convention and enforce it
  5. Properties follow the same rules
  6. Mistake 3: Kitchen-sink properties
  7. The fix: standard property sets + event-specific
  8. Mistake 4: No schema versioning
  9. The fix: explicit versioning strategy
  10. Mistake 5: Weak ownership and review
  11. The fix: domain ownership model
  12. Mistake 6: Missing quality gates
  13. The fix: pre- and post-deploy validation
  14. Mistake 7: Mixing product facts with derived metrics
  15. A practical event design template
  16. 30-day remediation plan
  17. Closing

Every product analytics system we've audited has the same problem: the events are wrong. Not the infrastructure, not the tools — the events themselves. Inconsistent names, missing properties, silent schema changes, no owner.

Event schema is the foundation. When it's bad, every downstream analysis is suspect. Here's what we see teams get wrong, and the patterns that fix it.

Mistake 1: No tracking plan

Most teams don't have a tracking plan. Events get added ad hoc — an engineer decides PageViewed is a good event, six months later someone else adds page_view, and now you have two events measuring (almost) the same thing with different property names.

The fix: a central tracking plan

Maintain a single source of truth listing:

  • Event name (exactly as fired)
  • Description (what triggers it, in plain English)
  • Properties (name, type, required/optional, example values, description)
  • Owner (which team/person is responsible)
  • Platforms (web, iOS, Android, server)
  • Version (when the schema changed)

Store it in a Git repo, reviewed via PR. Tools like Avo, RudderStack Tracking Plans, or a plain CSV can all work. What matters is the discipline, not the tool.
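If your tracking plan lives in a Git repo, it helps to give entries a machine-readable shape so CI can validate against them. A minimal sketch in TypeScript — the field names and the `orderPlaced` example are illustrative, not a standard:

```typescript
// Illustrative shape for one tracking-plan entry; all names are assumptions.
type PropertyType = "string" | "number" | "boolean" | "timestamp";

interface PlanProperty {
  name: string;        // snake_case, as fired
  type: PropertyType;
  required: boolean;
  example: string;     // example value, stored as text
  description: string;
}

interface TrackingPlanEntry {
  event: string;                                  // exactly as fired
  description: string;                            // what triggers it, in plain English
  properties: PlanProperty[];
  owner: string;                                  // responsible team or person
  platforms: ("web" | "ios" | "android" | "server")[];
  version: number;                                // bumped when the schema changes
}

const orderPlaced: TrackingPlanEntry = {
  event: "order_placed",
  description: "Fired server-side when an order is successfully persisted.",
  properties: [
    { name: "order_id", type: "string", required: true, example: "ord_123", description: "Internal order ID" },
    { name: "order_total", type: "number", required: true, example: "49.99", description: "Order total in the order's currency" },
  ],
  owner: "billing-team",
  platforms: ["server"],
  version: 1,
};
```

The same shape serializes cleanly to JSON or YAML if you prefer a non-code plan file; the point is that every field in the bullet list above has a slot.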

If it's not in the tracking plan, it doesn't ship

Enforce via code review and CI. A PR that adds track('new_event', ...) without a corresponding tracking plan entry fails the build.
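One way to sketch that CI gate: scan source files for literal `track(...)` calls and diff the event names against the plan. The regex only catches string-literal event names — dynamically built names need a typed wrapper instead — so treat this as a starting point, not a complete check:

```typescript
// Sketch: fail CI when a track() call references an event missing from the plan.
// Only matches literal event names; dynamic calls escape this check.
const TRACK_CALL = /track\(\s*['"]([a-z0-9_]+)['"]/g;

function findUnplannedEvents(source: string, plannedEvents: Set<string>): string[] {
  const unplanned: string[] = [];
  for (const match of source.matchAll(TRACK_CALL)) {
    const eventName = match[1];
    if (!plannedEvents.has(eventName)) unplanned.push(eventName);
  }
  return unplanned;
}

const plan = new Set(["user_signed_up", "order_placed"]);
const offenders = findUnplannedEvents(
  `track('order_placed', {}); track('new_event', {});`,
  plan,
);
if (offenders.length > 0) {
  // In a real CI job this would print the offenders and exit non-zero.
  console.error(`Events missing from tracking plan: ${offenders.join(", ")}`);
}
```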

Mistake 2: Inconsistent naming

Without a convention, events drift:

  • Sign Up vs sign_up vs signup vs UserSignedUp
  • Added to Cart vs Cart Add vs AddedToCart
  • Button Clicked (which button?) vs Checkout Button Clicked vs checkout_click

Each variant splits your analytics data.

The fix: pick a convention and enforce it

Our recommendation:

  • Event names: past tense, describing what happened.
  • Case: snake_case on the wire, display-cased in the UI.
  • Grammar: noun_verbed or noun_verbed_object — subject first, then verb.

Examples:

  • user_signed_up
  • cart_item_added
  • order_placed
  • payment_failed

Why past tense? Events describe things that already happened. "User signed up" is an event; "Sign up" is an action. This distinction matters: you don't track intents, you track facts.

Why noun first? Easier to scan event lists grouped by subject.
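A convention only holds if something checks it. A deliberately naive lint, sketched below — it catches casing and single-word names but makes no attempt at real past-tense detection, which would need a word list:

```typescript
// Naive lint for the snake_case noun_verbed convention.
// snake_case with at least two words; uppercase, spaces, and
// single-word names like "signup" all fail.
const SNAKE_CASE_TWO_WORDS = /^[a-z][a-z0-9]*(_[a-z0-9]+)+$/;

function checkEventName(name: string): string[] {
  const problems: string[] = [];
  if (!SNAKE_CASE_TWO_WORDS.test(name)) {
    problems.push("must be snake_case with at least two words (noun_verbed)");
  }
  return problems;
}
```

Run it against the examples above: `user_signed_up` passes, while `Sign Up` and `signup` are flagged.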

Properties follow the same rules

  • snake_case
  • Clear types (use _at suffix for timestamps, _count for counts, _id for IDs)
  • Consistent units (duration_ms, not sometimes ms and sometimes seconds)
  • Boolean names are assertions: is_guest, not guest_status
💡 Name like you'll read it drunk at 2am

When you're debugging in six months, you'll thank yourself for names that are unambiguous without documentation. user_signed_up passes that test; sus does not.
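Property rules are lintable too. A rough sketch — the suffix rules here are heuristics layered on top of snake_case, not a complete type system:

```typescript
// Sketch: heuristic checks for the property conventions above.
function checkPropertyName(name: string, value: unknown): string[] {
  const problems: string[] = [];
  if (!/^[a-z][a-z0-9_]*$/.test(name)) {
    problems.push(`${name}: not snake_case`);
  }
  if (name.endsWith("_count") && typeof value !== "number") {
    problems.push(`${name}: _count properties should be numbers`);
  }
  if (typeof value === "boolean" && !/^(is|has|can)_/.test(name)) {
    problems.push(`${name}: boolean names should be assertions (is_*, has_*)`);
  }
  return problems;
}
```

`is_guest: true` passes; `guest_status: true` and `results_count: "5"` are both flagged.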

Mistake 3: Kitchen-sink properties

An order_placed event with 150 properties because "we might need it later." Now the analytics warehouse has 150 columns per order event, 95% of them rarely queried.

The fix: standard property sets + event-specific

Define a small set of standard context properties sent on every event:

  • user_id (or anonymous_id for pre-signup)
  • session_id
  • timestamp
  • platform (web, ios, android, server)
  • app_version
  • page_url (web) / screen_name (mobile)
  • referrer (web)
  • experiment_buckets (any active experiments)

These are context. Set them once (SDK level), forget about them.

Then each event has 3-8 event-specific properties that describe that event specifically:

  • order_placed: order_id, order_total, currency, payment_method
  • search_performed: query, results_count, latency_ms
  • feature_used: feature_name, entry_point, time_to_complete_ms

If a property is not used by analytics, experimentation, or operations, do not include it.
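The "set context once at the SDK level" idea can be sketched as a thin wrapper that merges standard context with the per-call properties. `sendToAnalytics` stands in for whatever SDK transport you actually use; the function names here are illustrative:

```typescript
// Sketch: merge standard context (set once) with event-specific
// properties (passed per call). Transport is stubbed out.
type Properties = Record<string, string | number | boolean>;

let context: Properties = {};

function initAnalytics(standardContext: Properties): void {
  context = { ...standardContext };
}

function track(event: string, props: Properties = {}): Properties {
  const payload = { ...context, ...props, timestamp: Date.now() };
  // sendToAnalytics(event, payload);  // hypothetical transport call
  return payload;
}

initAnalytics({ platform: "web", app_version: "1.2.0" });
const payload = track("order_placed", { order_id: "ord_1", order_total: 49.99 });
```

Call sites then only ever pass the 3-8 event-specific properties; context never leaks into product code.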

Mistake 4: No schema versioning

Teams often "just change" property meaning in place:

  • plan used to be "free" | "pro"
  • now it's "starter" | "growth" | "enterprise"
  • old dashboards silently break or become incomparable over time

The fix: explicit versioning strategy

Use one of these patterns consistently:

  1. Event version suffix (checkout_started_v2) for breaking changes.
  2. Schema version property (schema_version: 2) for managed evolution.
  3. Additive change policy for non-breaking additions only.

Document migration windows and deprecation dates so downstream consumers are never surprised.
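The additive change policy (pattern 3) is the easiest to automate: a CI check can diff a proposed schema against the current one and reject anything that isn't purely additive. A sketch, assuming a flat property-map representation of the schema:

```typescript
// Sketch: accept a schema change only if it is purely additive.
// A change is additive when no existing property is removed, retyped,
// or tightened, and any new property is optional.
type Schema = Record<string, { type: string; required: boolean }>;

function isAdditiveChange(oldSchema: Schema, newSchema: Schema): boolean {
  for (const [name, oldProp] of Object.entries(oldSchema)) {
    const newProp = newSchema[name];
    if (!newProp) return false;                              // property removed
    if (newProp.type !== oldProp.type) return false;         // property retyped
    if (newProp.required && !oldProp.required) return false; // requirement tightened
  }
  for (const [name, newProp] of Object.entries(newSchema)) {
    if (!(name in oldSchema) && newProp.required) return false; // new required field breaks old emitters
  }
  return true;
}
```

Anything that fails this check has to go through pattern 1 or 2 instead, with a documented migration window.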

Mistake 5: Weak ownership and review

If "analytics team" owns everything, nobody owns quality at the source.

The fix: domain ownership model

Assign event ownership by product domain:

  • onboarding team owns onboarding events
  • billing team owns payment/subscription events
  • growth team owns acquisition/activation events

Require schema-review signoff in pull requests for any tracking change.

Mistake 6: Missing quality gates

Many teams only discover bad events after dashboards look wrong.

The fix: pre- and post-deploy validation

Minimum gates:

  • compile-time typing for track() payloads where possible.
  • CI checks against tracking plan for required fields and types.
  • post-deploy monitors for null spikes, enum drift, and event drop rates.
  • daily anomaly checks on event volume by key funnels.

Treat tracking quality like API quality.
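One of the post-deploy monitors above — null-spike detection — can be sketched in a few lines: compare today's null rate for a property against a trailing baseline. The thresholds here are illustrative, not recommendations:

```typescript
// Sketch: flag a null-rate spike for one property by comparing today's
// rate against a trailing baseline. Thresholds are illustrative.
function nullRateSpiked(
  baselineNullRate: number, // e.g. trailing 14-day average
  todayNullRate: number,
  absoluteFloor = 0.02,     // ignore noise below 2%
  ratio = 3,                // alert when today is 3x baseline
): boolean {
  if (todayNullRate < absoluteFloor) return false;
  return todayNullRate > baselineNullRate * ratio;
}
```

The same compare-against-baseline shape works for enum drift (share of unknown values) and event drop rates (volume vs. baseline).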

Mistake 7: Mixing product facts with derived metrics

Raw events should describe what happened. Derived metrics should be computed downstream.

Bad:

  • is_power_user, ltv_bucket, health_score in raw events.

Better:

  • emit factual events and compute derived dimensions in dbt/semantic layer.

This keeps definitions centralized and prevents conflicting logic across clients.
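To make the split concrete, here is a sketch of a derived dimension computed downstream from factual events. The "10 events in 30 days" definition of a power user is made up for illustration; the point is that it lives in one place (a dbt model or semantic layer) instead of in every client:

```typescript
// Sketch: derive is_power_user downstream from factual events rather
// than emitting it from clients. The threshold is a made-up example.
interface FactEvent {
  user_id: string;
  event: string;
  timestamp: number; // epoch ms
}

function isPowerUser(events: FactEvent[], userId: string, now: number): boolean {
  const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;
  const recent = events.filter(
    (e) => e.user_id === userId && now - e.timestamp <= THIRTY_DAYS_MS,
  );
  return recent.length >= 10; // one definition, owned by the semantic layer
}
```

Changing the definition later is a one-line change here, versus a coordinated release across web, iOS, and Android if clients emitted the flag.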

A practical event design template

For each event, capture:

  1. Business question supported
  2. Trigger condition (exact)
  3. Actor and object
  4. Required properties
  5. Allowed enums and units
  6. Owner + approver
  7. Version and change history
  8. Downstream dashboards/models using it

If you cannot fill this out clearly, the event is not ready.

30-day remediation plan

If your current schema is messy, do not rewrite everything at once.

  • Week 1: define naming rules and tracking-plan format.
  • Week 2: clean top 20 events tied to critical KPIs.
  • Week 3: add CI checks and ownership metadata.
  • Week 4: backfill mapping tables and deprecate duplicates.

Ship quality improvements where decision risk is highest first.

Closing

Event schema design is not analytics bureaucracy. It is product decision infrastructure. Teams with disciplined schemas move faster because they trust their metrics and can run experiments without arguing about data quality.
