
Writing Specs

Specs are YAML files that declaratively describe what you want to measure and how to slice the data. There are three spec types: MetricSpec, SliceSpec, and SegmentSpec.

MetricSpec

A metric defines a single measurable quantity from a source table.

metric:
  name: ctr
  description: Click-through rate — ratio of clicks to impressions
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "SUM(clicks)"
  denominator: "SUM(impressions)"
  timestamp_col: date
  entities: [platform, campaign_type, country]

Fields

| Field | Required | Description |
|---|---|---|
| name | Yes | Unique identifier used in MetricCompute.compute() |
| source | Yes | Source URI; see Connectors for format |
| numerator | Yes | SQL expression containing an aggregate function call (SUM, AVG, COUNT, MIN, MAX) |
| timestamp_col | Yes | Column used for time_window filtering |
| denominator | No | SQL expression containing an aggregate function call. When present, the metric is computed as numerator / denominator (ratio). |
| entities | No | List of entity column names supported for by_entity disaggregation (e.g. [user_id, device_id]). Must be non-empty if provided. |
| description | No | Human-readable description |
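The rules in the field table can be checked mechanically before a spec is handed to the library. The following is a minimal sketch that works on a plain dict; `check_metric_fields` and `REQUIRED_FIELDS` are hypothetical names for illustration and are not part of the library's API.

```python
# Required fields as listed in the table above.
REQUIRED_FIELDS = ("name", "source", "numerator", "timestamp_col")


def check_metric_fields(metric: dict) -> list:
    """Hypothetical helper: return a list of problems (empty if none)."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if not metric.get(field)
    ]
    # entities is optional, but an explicitly empty list is not allowed.
    if "entities" in metric and not metric["entities"]:
        problems.append("entities must be non-empty if provided")
    return problems


ctr = {
    "name": "ctr",
    "source": "duckdb://ad_campaigns.duckdb/ad_campaigns",
    "numerator": "SUM(clicks)",
    "denominator": "SUM(impressions)",
    "timestamp_col": "date",
    "entities": ["platform", "campaign_type", "country"],
}
broken = {"name": "bad", "numerator": "SUM(x)", "entities": []}

print(check_metric_fields(ctr))     # []
print(check_metric_fields(broken))  # three problems, in table order
```

The library may report such errors differently; the sketch only restates the table as code.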

The aggregation type is inferred from the SQL function in numerator (and denominator). There is no separate aggregation field — write the aggregate directly in the expression.
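To make the inference rule concrete, here is a rough sketch of how an aggregation type could be read off the expressions with a regular expression. `infer_aggregation` is a hypothetical helper, not the library's implementation, and a regex is only an approximation of real SQL parsing.

```python
import re
from typing import Optional

# Matches a call to one of the aggregate functions named in the field table.
AGGREGATE_CALL = re.compile(r"\b(SUM|AVG|COUNT|MIN|MAX)\s*\(", re.IGNORECASE)


def infer_aggregation(numerator: str, denominator: Optional[str] = None) -> str:
    """Hypothetical helper: classify a metric by the aggregate it uses."""
    match = AGGREGATE_CALL.search(numerator)
    if match is None:
        raise ValueError(f"no aggregate function in numerator: {numerator!r}")
    if denominator is not None:
        # A denominator implies a ratio; it must also contain an aggregate.
        if AGGREGATE_CALL.search(denominator) is None:
            raise ValueError(f"no aggregate function in denominator: {denominator!r}")
        return "ratio"
    return match.group(1).lower()


print(infer_aggregation("SUM(revenue)"))                     # sum
print(infer_aggregation("COUNT(*)"))                         # count
print(infer_aggregation("SUM(clicks)", "SUM(impressions)"))  # ratio
```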

Aggregation types

Ratio is implied by the presence of a denominator. Both numerator and denominator must contain an aggregate function.

metric:
  name: ctr
  description: Click-through rate — ratio of clicks to impressions
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "SUM(clicks)"
  denominator: "SUM(impressions)"
  timestamp_col: date
  entities: [platform, campaign_type, country]

Sum:

metric:
  name: total_revenue
  description: Total revenue generated across all campaigns
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "SUM(revenue)"
  timestamp_col: date
  entities: [platform, campaign_type, country]

Average:

metric:
  name: avg_revenue
  description: Average revenue per campaign row
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "AVG(revenue)"
  timestamp_col: date
  entities: [platform, campaign_type, country]

Count:

metric:
  name: campaign_count
  description: Number of campaign rows
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "COUNT(*)"
  timestamp_col: date
  entities: [platform, campaign_type, country]

Max:

metric:
  name: max_revenue
  description: Peak revenue from a single campaign row
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "MAX(revenue)"
  timestamp_col: date
  entities: [platform, campaign_type, country]

Min:

metric:
  name: min_ad_spend
  description: Lowest ad spend entry
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "MIN(ad_spend)"
  timestamp_col: date
  entities: [platform, campaign_type, country]
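Because a ratio metric divides one aggregate by another, it is a ratio of sums and not an average of per-row ratios. The sketch below shows the difference on two made-up rows. It uses Python's built-in sqlite3 as a stand-in for DuckDB so that it runs without extra dependencies; the table contents are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_campaigns (clicks INTEGER, impressions INTEGER)")
conn.executemany(
    "INSERT INTO ad_campaigns VALUES (?, ?)",
    [(10, 100), (30, 1000)],  # made-up rows for illustration
)

numerator = "SUM(clicks)"
denominator = "SUM(impressions)"

# Ratio of aggregates: 40 clicks / 1100 impressions.
# The CAST avoids SQLite's integer division.
(ratio_of_sums,) = conn.execute(
    f"SELECT CAST({numerator} AS REAL) / {denominator} FROM ad_campaigns"
).fetchone()

# Average of per-row ratios: (0.10 + 0.03) / 2, a different quantity.
(mean_of_ratios,) = conn.execute(
    "SELECT AVG(CAST(clicks AS REAL) / impressions) FROM ad_campaigns"
).fetchone()

print(round(ratio_of_sums, 4))   # 0.0364
print(round(mean_of_ratios, 4))  # 0.065
```

If the per-row average is what you want, write it as a single AVG(...) numerator with no denominator instead.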

Entity columns

Use entities to declare the source-table columns that the metric can be disaggregated by. At compute time, pass by_entity to MetricCompute.compute() to choose which of the declared columns to group by.

metric:
  name: revenue
  source: duckdb://analytics.db/transactions
  numerator: "SUM(amount)"
  timestamp_col: event_ts
  entities: [user_id, device_id]   # supports per-user or per-device breakdown

A metric without entities can still be computed normally — it simply cannot be disaggregated by entity. See Computing Metrics for usage.
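Conceptually, disaggregating by an entity amounts to grouping the metric query by the chosen column. The following sketch illustrates that idea with sqlite3 as a stand-in database and made-up rows; `by_entity_sql` is a hypothetical helper and does not reflect the actual signature or internals of MetricCompute.compute().

```python
import sqlite3

metric = {
    "name": "revenue",
    "table": "transactions",  # stand-in for the table named in the source URI
    "numerator": "SUM(amount)",
    "entities": ["user_id", "device_id"],
}


def by_entity_sql(metric: dict, by_entity: str) -> str:
    """Hypothetical helper: build a per-entity query for a metric."""
    # Only columns declared in entities may be used for the breakdown.
    if by_entity not in metric.get("entities", []):
        raise ValueError(f"{by_entity!r} is not declared in entities")
    return (
        f"SELECT {by_entity}, {metric['numerator']} AS {metric['name']} "
        f"FROM {metric['table']} GROUP BY {by_entity} ORDER BY {by_entity}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (user_id TEXT, device_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("u1", "d1", 5.0), ("u1", "d2", 7.0), ("u2", "d2", 3.0)],  # made-up rows
)

per_user = conn.execute(by_entity_sql(metric, "user_id")).fetchall()
per_device = conn.execute(by_entity_sql(metric, "device_id")).fetchall()
print(per_user)    # [('u1', 12.0), ('u2', 3.0)]
print(per_device)  # [('d1', 5.0), ('d2', 10.0)]
```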


SliceSpec

A slice defines a breakdown dimension: a set of named filters applied to the metric query. The filters may be mutually exclusive or may overlap.

Leaf slice

A leaf slice defines the filter values directly:

slice:
  name: campaign_type
  description: Breakdown by ad campaign type
  values:
    - name: Search
      where: "campaign_type = 'Search'"
    - name: Display
      where: "campaign_type = 'Display'"
    - name: Video
      where: "campaign_type = 'Video'"
    - name: Shopping
      where: "campaign_type = 'Shopping'"
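One way to picture a leaf slice is as one filtered query per value. The sketch below applies each `where` clause to a SUM(revenue) metric, using sqlite3 as a stand-in database and made-up rows. How the library actually composes its queries is not shown on this page, so treat this as an illustration of the idea only.

```python
import sqlite3

# Two of the slice values from the example above, as plain dicts.
slice_values = [
    {"name": "Search", "where": "campaign_type = 'Search'"},
    {"name": "Display", "where": "campaign_type = 'Display'"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_campaigns (campaign_type TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO ad_campaigns VALUES (?, ?)",
    [("Search", 100.0), ("Search", 50.0), ("Display", 20.0), ("Video", 5.0)],
)

results = {}
for value in slice_values:
    # Each slice value contributes its own WHERE clause to the metric query.
    sql = f"SELECT SUM(revenue) FROM ad_campaigns WHERE {value['where']}"
    results[value["name"]] = conn.execute(sql).fetchone()[0]

print(results)  # {'Search': 150.0, 'Display': 20.0}
```

Rows that match none of the listed values (the Video row here) do not appear in any bucket, so the slice totals need not add up to the overall metric.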

Composite slice (cross-product)

A composite slice computes the cross-product of two or more leaf slices:

slice:
  name: campaign_type_x_geo
  description: Campaign type broken down by geography
  cross_product:
    - campaign_type
    - geo

Note

Composite slices cannot reference other composite slices (no nesting). All referenced slices must be loaded into the same SpecCache.
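A cross-product can be thought of as taking one value from each leaf slice and AND-ing their filters. The sketch below shows this with itertools.product. The `geo` values, the `cross` helper, and the " x " naming of combined values are assumptions made for illustration; this page references a geo slice without defining it, and the library's own naming may differ.

```python
from itertools import product

campaign_type = [
    {"name": "Search", "where": "campaign_type = 'Search'"},
    {"name": "Display", "where": "campaign_type = 'Display'"},
]
# Hypothetical geo slice values (not defined on this page).
geo = [
    {"name": "US", "where": "country = 'US'"},
    {"name": "DE", "where": "country = 'DE'"},
]


def cross(*leaf_slices):
    """Hypothetical helper: combine one value per leaf slice into an AND-ed filter."""
    combined = []
    for combo in product(*leaf_slices):
        combined.append({
            "name": " x ".join(v["name"] for v in combo),
            "where": " AND ".join(f"({v['where']})" for v in combo),
        })
    return combined


composite = cross(campaign_type, geo)
for value in composite:
    print(value["name"], "->", value["where"])
```

Two leaf slices with two values each yield four combined filters. The number of combinations grows multiplicatively with each additional slice.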

Fields

| Field | Required | Description |
|---|---|---|
| name | Yes | Unique identifier |
| values | Leaf only | List of {name, where} filter definitions |
| cross_product | Composite only | List of leaf slice names to cross |
| description | No | Human-readable description |

SegmentSpec

A segment is similar to a slice but includes a source field — it can filter on a different table than the metric. This is useful when the breakdown dimension lives in a separate table.

segment:
  name: platform
  description: Breakdown by advertising platform
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  values:
    - name: Google Ads
      where: "platform = 'Google Ads'"
    - name: Meta Ads
      where: "platform = 'Meta Ads'"
    - name: TikTok Ads
      where: "platform = 'TikTok Ads'"

Fields

| Field | Required | Description |
|---|---|---|
| name | Yes | Unique identifier |
| source | Yes | Source URI for this segment's table |
| values | Yes | List of {name, where} filter definitions |
| description | No | Human-readable description |

Loading specs

Use SpecCache to load all specs before computing:

from aitaem import SpecCache

# From directories (loads all *.yaml / *.yml files)
cache = SpecCache.from_yaml(
    metric_paths="metrics/",
    slice_paths="slices/",
    segment_paths="segments/",
)

# From individual files
cache = SpecCache.from_yaml(
    metric_paths=["metrics/ctr.yaml", "metrics/total_revenue.yaml"],
    slice_paths="slices/campaign_type.yaml",
)

# From YAML strings (useful for testing)
cache = SpecCache.from_string(
    metric_yaml="""
metric:
  name: ctr
  source: duckdb://:memory:/events
  numerator: "SUM(clicks)"
  denominator: "SUM(impressions)"
  timestamp_col: date
""",
)