Writing Specs

Specs are YAML files that declaratively describe what you want to measure and how to slice the data. There are three spec types: MetricSpec, SliceSpec, and SegmentSpec.

MetricSpec

A metric defines a single measurable quantity from a source table.

```yaml
metric:
  name: ctr
  description: Click-through rate — ratio of clicks to impressions
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "SUM(clicks)"
  denominator: "SUM(impressions)"
  timestamp_col: date
  entities: [platform, campaign_type, country]
```

Fields

| Field | Required | Description |
|---|---|---|
| `name` | Yes | Unique identifier used in `MetricCompute.compute()` |
| `source` | Yes | Source URI (see Connectors for the format) |
| `numerator` | Yes | SQL expression containing an aggregate function call (`SUM`, `AVG`, `COUNT`, `MIN`, `MAX`) |
| `timestamp_col` | Yes | Column used for `time_window` filtering |
| `denominator` | No | SQL expression containing an aggregate function call. When present, the metric is computed as `numerator / denominator` (ratio). |
| `entities` | No | List of entity column names supported for `by_entity` disaggregation (e.g. `[user_id, device_id]`). Must be non-empty if provided. |
| `description` | No | Human-readable description |
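The required-field and `entities` rules in the table can be checked mechanically. The sketch below validates a metric spec that has already been parsed from YAML into a Python dict. `validate_metric` is a hypothetical helper written for this page, not part of the aitaem API, and it covers only those two rules.

```python
from __future__ import annotations

# Required fields taken from the MetricSpec table above.
REQUIRED_METRIC_FIELDS = ("name", "source", "numerator", "timestamp_col")


def validate_metric(spec: dict) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means the spec passes)."""
    metric = spec.get("metric", {})
    problems = []
    for field in REQUIRED_METRIC_FIELDS:
        if not metric.get(field):
            problems.append(f"missing required field: {field}")
    # `entities` is optional, but if present it must be a non-empty list.
    if "entities" in metric:
        entities = metric["entities"]
        if not isinstance(entities, list) or not entities:
            problems.append("entities must be a non-empty list if provided")
    return problems


ctr_spec = {
    "metric": {
        "name": "ctr",
        "source": "duckdb://ad_campaigns.duckdb/ad_campaigns",
        "numerator": "SUM(clicks)",
        "denominator": "SUM(impressions)",
        "timestamp_col": "date",
        "entities": ["platform", "campaign_type", "country"],
    }
}
print(validate_metric(ctr_spec))  # []
print(validate_metric({"metric": {"name": "bad", "entities": []}}))
```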

The aggregation type is inferred from the aggregate function used in `numerator` (and in `denominator`, if present). There is no separate aggregation field; write the aggregate directly in the expression.
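A rough version of that inference can be written as a regular-expression search over the two expressions. This is a hypothetical sketch (`infer_aggregation` is not an aitaem function); a real implementation would likely parse the SQL, since a regex cannot tell a function call from a matching word inside a string literal or comment.

```python
from __future__ import annotations

import re

# Aggregate functions listed in the MetricSpec field table.
AGGREGATES = ("SUM", "AVG", "COUNT", "MIN", "MAX")
# An aggregate name as a whole word, followed by "(" (case-insensitive).
_AGG_CALL = re.compile(r"\b(" + "|".join(AGGREGATES) + r")\s*\(", re.IGNORECASE)


def infer_aggregation(numerator: str, denominator: str | None = None) -> str:
    """Hypothetical helper: return 'ratio', 'sum', 'avg', 'count', 'min' or 'max'."""
    num_match = _AGG_CALL.search(numerator)
    if num_match is None:
        raise ValueError(f"numerator has no aggregate call: {numerator!r}")
    if denominator is not None:
        # A denominator makes the metric a ratio, but it must aggregate too.
        if _AGG_CALL.search(denominator) is None:
            raise ValueError(f"denominator has no aggregate call: {denominator!r}")
        return "ratio"
    # Otherwise the first aggregate function in the numerator decides the type.
    return num_match.group(1).lower()


print(infer_aggregation("SUM(clicks)", "SUM(impressions)"))  # ratio
print(infer_aggregation("COUNT(*)"))  # count
print(infer_aggregation("max(revenue)"))  # max
```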

Aggregation types

One example follows for each aggregation type: ratio, sum, avg, count, max, and min.

Ratio

Ratio is implied by the presence of a `denominator`. Both `numerator` and `denominator` must contain an aggregate function.

```yaml
metric:
  name: ctr
  description: Click-through rate — ratio of clicks to impressions
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "SUM(clicks)"
  denominator: "SUM(impressions)"
  timestamp_col: date
  entities: [platform, campaign_type, country]
```
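To make the ratio semantics concrete: the value is the aggregated numerator divided by the aggregated denominator over the same filtered rows, not an average of per-row ratios. The sketch below uses Python's built-in `sqlite3` as a stand-in for DuckDB, a few made-up rows, and a hypothetical `build_metric_sql` helper. The exact SQL aitaem generates, and whether its time window is half-open as assumed here, are not documented on this page.

```python
import sqlite3


def build_metric_sql(numerator, denominator, table, timestamp_col):
    """Hypothetical helper: SQL for one metric over a half-open time window [start, end)."""
    # Multiplying by 1.0 forces float division in SQLite (integer / integer truncates).
    value = f"1.0 * {numerator} / {denominator}" if denominator else numerator
    return (
        f"SELECT {value} AS value FROM {table} "
        f"WHERE {timestamp_col} >= ? AND {timestamp_col} < ?"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_campaigns (date TEXT, clicks INTEGER, impressions INTEGER)")
# Made-up rows for illustration only; the February row falls outside the window.
conn.executemany(
    "INSERT INTO ad_campaigns VALUES (?, ?, ?)",
    [("2024-01-01", 10, 100), ("2024-01-02", 30, 900), ("2024-02-01", 5, 50)],
)

sql = build_metric_sql("SUM(clicks)", "SUM(impressions)", "ad_campaigns", "date")
(ctr,) = conn.execute(sql, ("2024-01-01", "2024-02-01")).fetchone()
print(ctr)  # 0.04, i.e. (10 + 30) / (100 + 900), not the mean of 0.1 and 0.033...
```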

Sum

```yaml
metric:
  name: total_revenue
  description: Total revenue generated across all campaigns
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "SUM(revenue)"
  timestamp_col: date
  entities: [platform, campaign_type, country]
```

Avg

```yaml
metric:
  name: avg_revenue
  description: Average revenue per campaign row
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "AVG(revenue)"
  timestamp_col: date
  entities: [platform, campaign_type, country]
```

Count

```yaml
metric:
  name: campaign_count
  description: Number of campaign rows
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "COUNT(*)"
  timestamp_col: date
  entities: [platform, campaign_type, country]
```

Max

```yaml
metric:
  name: max_revenue
  description: Peak revenue from a single campaign row
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "MAX(revenue)"
  timestamp_col: date
  entities: [platform, campaign_type, country]
```

Min

```yaml
metric:
  name: min_ad_spend
  description: Lowest ad spend entry
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  numerator: "MIN(ad_spend)"
  timestamp_col: date
  entities: [platform, campaign_type, country]
```

Entity columns

Use `entities` to declare which columns in the source table identify entities the metric can be disaggregated by. At compute time, pass `by_entity` to `MetricCompute.compute()` to choose which of those columns to group by.

```yaml
metric:
  name: revenue
  source: duckdb://analytics.db/transactions
  numerator: "SUM(amount)"
  timestamp_col: event_ts
  entities: [user_id, device_id]   # supports per-user or per-device breakdown
```

A metric without `entities` can still be computed normally; it just cannot be disaggregated by entity. See Computing Metrics for usage.
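Conceptually, disaggregating by an entity means grouping the metric by that column, and the column should be one of those declared in `entities`. The sketch below illustrates this with `sqlite3` and made-up rows. `compute_by_entity` is a hypothetical helper, not the `MetricCompute.compute()` implementation, and it skips time-window filtering for brevity.

```python
import sqlite3

# The `revenue` spec above, as a plain dict.
REVENUE = {
    "name": "revenue",
    "numerator": "SUM(amount)",
    "timestamp_col": "event_ts",  # unused here: no time-window filter in this sketch
    "entities": ["user_id", "device_id"],
}


def compute_by_entity(conn, metric, table, by_entity):
    """Hypothetical helper: one metric value per distinct value of `by_entity`."""
    if by_entity not in metric.get("entities", []):
        # Undeclared columns (or metrics with no entities) cannot be disaggregated.
        raise ValueError(f"{metric['name']} does not declare entity {by_entity!r}")
    sql = (
        f"SELECT {by_entity}, {metric['numerator']} AS value "
        f"FROM {table} GROUP BY {by_entity} ORDER BY {by_entity}"
    )
    return dict(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (event_ts TEXT, user_id TEXT, device_id TEXT, amount REAL)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", "u1", "d1", 5.0),
        ("2024-01-02", "u1", "d2", 7.5),
        ("2024-01-02", "u2", "d2", 3.0),
    ],
)
print(compute_by_entity(conn, REVENUE, "transactions", "user_id"))  # {'u1': 12.5, 'u2': 3.0}
print(compute_by_entity(conn, REVENUE, "transactions", "device_id"))  # {'d1': 5.0, 'd2': 10.5}
```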

SliceSpec

A slice defines a breakdown dimension: a set of named filters applied to the metric query. The filters are typically mutually exclusive, but overlapping filters are also allowed.

Leaf slice

A leaf slice defines the filter values directly:

```yaml
slice:
  name: campaign_type
  description: Breakdown by ad campaign type
  values:
    - name: Search
      where: "campaign_type = 'Search'"
    - name: Display
      where: "campaign_type = 'Display'"
    - name: Video
      where: "campaign_type = 'Video'"
    - name: Shopping
      where: "campaign_type = 'Shopping'"
```
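Each value of a leaf slice contributes its `where` clause as an extra filter on the metric query, so a slice with four values produces four filtered results. A minimal sketch of that idea, assuming the `where` strings are ANDed into the metric query's `WHERE` clause; the helper `compute_per_slice_value` and the rows are hypothetical, and `sqlite3` stands in for DuckDB.

```python
import sqlite3

# The campaign_type leaf slice above, trimmed to two values.
CAMPAIGN_TYPE = {
    "name": "campaign_type",
    "values": [
        {"name": "Search", "where": "campaign_type = 'Search'"},
        {"name": "Display", "where": "campaign_type = 'Display'"},
    ],
}


def compute_per_slice_value(conn, numerator, table, slice_spec):
    """Hypothetical helper: run the metric once per slice value, filtered by its `where`."""
    results = {}
    for value in slice_spec["values"]:
        # Parentheses keep a compound `where` from mixing with other conditions.
        sql = f"SELECT {numerator} FROM {table} WHERE ({value['where']})"
        results[value["name"]] = conn.execute(sql).fetchone()[0]
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_campaigns (campaign_type TEXT, clicks INTEGER)")
# Made-up rows; the Video row matches no value in this trimmed slice.
conn.executemany(
    "INSERT INTO ad_campaigns VALUES (?, ?)",
    [("Search", 10), ("Search", 20), ("Display", 4), ("Video", 8)],
)
per_type = compute_per_slice_value(conn, "SUM(clicks)", "ad_campaigns", CAMPAIGN_TYPE)
print(per_type)  # {'Search': 30, 'Display': 4}
```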

Composite slice (cross-product)

A composite slice computes the cross-product of two or more leaf slices:

```yaml
slice:
  name: campaign_type_x_geo
  description: Campaign type broken down by geography
  cross_product:
    - campaign_type
    - geo
```

Note

Composite slices cannot reference other composite slices (no nesting). All referenced slices must be loaded into the same SpecCache.
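A composite slice can be pictured as every combination of its leaf slices' values, with the matching `where` clauses joined by `AND`. The sketch below expands `campaign_type_x_geo` that way using `itertools.product` and enforces the no-nesting rule from the note. The `geo` values, the `expand_cross_product` helper, and the `" x "` naming of combined values are hypothetical; this page does not show the `geo` slice or how aitaem names combined values.

```python
from itertools import product

# Hypothetical registry of loaded slices; the `geo` values are invented for illustration.
SLICES = {
    "campaign_type": {"values": [
        {"name": "Search", "where": "campaign_type = 'Search'"},
        {"name": "Display", "where": "campaign_type = 'Display'"},
    ]},
    "geo": {"values": [
        {"name": "US", "where": "country = 'US'"},
        {"name": "DE", "where": "country = 'DE'"},
        {"name": "JP", "where": "country = 'JP'"},
    ]},
    "campaign_type_x_geo": {"cross_product": ["campaign_type", "geo"]},
}


def expand_cross_product(slices, name):
    """Hypothetical helper: list the combined {name, where} values of a composite slice."""
    leaf_values = []
    for ref in slices[name]["cross_product"]:
        referenced = slices[ref]  # KeyError if the slice was not loaded
        if "cross_product" in referenced:
            raise ValueError(f"{name} references composite slice {ref}; nesting is not allowed")
        leaf_values.append(referenced["values"])
    combined = []
    for combo in product(*leaf_values):
        combined.append({
            "name": " x ".join(v["name"] for v in combo),
            "where": " AND ".join(f"({v['where']})" for v in combo),
        })
    return combined


values = expand_cross_product(SLICES, "campaign_type_x_geo")
print(len(values))  # 6
print(values[0])
```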

Fields

| Field | Required | Description |
|---|---|---|
| `name` | Yes | Unique identifier |
| `values` | Leaf only | List of `{name, where}` filter definitions |
| `cross_product` | Composite only | List of leaf slice names to cross |
| `description` | No | Human-readable description |
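The Required column implies that a slice carries exactly one of `values` (leaf) or `cross_product` (composite). A hypothetical check for that rule is sketched below; `slice_kind` is not an aitaem function, and it takes the mapping under `slice:` after YAML parsing.

```python
def slice_kind(slice_spec: dict) -> str:
    """Hypothetical helper: classify a parsed `slice:` mapping as 'leaf' or 'composite'."""
    name = slice_spec.get("name")
    if not name:
        raise ValueError("slice is missing required field: name")
    has_values = "values" in slice_spec
    has_cross = "cross_product" in slice_spec
    if has_values == has_cross:
        # Either both fields are present or neither is.
        raise ValueError(f"slice {name!r} needs exactly one of values or cross_product")
    if has_values:
        for value in slice_spec["values"]:
            if "name" not in value or "where" not in value:
                raise ValueError(f"slice {name!r} has a value without name/where: {value!r}")
        return "leaf"
    # A composite crosses two or more leaf slices.
    if len(slice_spec["cross_product"]) < 2:
        raise ValueError(f"composite slice {name!r} must cross two or more slices")
    return "composite"


print(slice_kind({"name": "campaign_type_x_geo", "cross_product": ["campaign_type", "geo"]}))  # composite
```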

SegmentSpec

A segment is similar to a slice but also has a `source` field, so it can filter on a different table than the metric. This is useful when the breakdown dimension lives in a separate table.

```yaml
segment:
  name: platform
  description: Breakdown by advertising platform
  source: duckdb://ad_campaigns.duckdb/ad_campaigns
  values:
    - name: Google Ads
      where: "platform = 'Google Ads'"
    - name: Meta Ads
      where: "platform = 'Meta Ads'"
    - name: TikTok Ads
      where: "platform = 'TikTok Ads'"
```

Fields

| Field | Required | Description |
|---|---|---|
| `name` | Yes | Unique identifier |
| `source` | Yes | Source URI for this segment's table |
| `values` | Yes | List of `{name, where}` filter definitions |
| `description` | No | Human-readable description |
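This page does not say how aitaem relates a segment's table to the metric's table. Purely as an illustration, the sketch below assumes the two tables share a key column (`campaign_id`, hypothetical) and applies each segment value as a semi-join: keep metric rows whose key appears among the segment-table rows matching `where`. The tables, rows, the `table` key standing in for `source`, and `compute_per_segment_value` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: metric rows in one table, platform attributes in another.
conn.execute("CREATE TABLE spend (campaign_id INTEGER, ad_spend REAL)")
conn.execute("CREATE TABLE campaigns (campaign_id INTEGER, platform TEXT)")
conn.executemany("INSERT INTO spend VALUES (?, ?)", [(1, 100.0), (2, 40.0), (3, 60.0)])
conn.executemany(
    "INSERT INTO campaigns VALUES (?, ?)",
    [(1, "Google Ads"), (2, "Meta Ads"), (3, "Google Ads")],
)

PLATFORM_SEGMENT = {
    "name": "platform",
    "table": "campaigns",  # stands in for the segment's `source` URI
    "values": [
        {"name": "Google Ads", "where": "platform = 'Google Ads'"},
        {"name": "Meta Ads", "where": "platform = 'Meta Ads'"},
    ],
}


def compute_per_segment_value(conn, numerator, metric_table, segment, key="campaign_id"):
    """Hypothetical helper: filter metric rows by keys matched in the segment's table."""
    results = {}
    for value in segment["values"]:
        sql = (
            f"SELECT {numerator} FROM {metric_table} "
            f"WHERE {key} IN (SELECT {key} FROM {segment['table']} WHERE ({value['where']}))"
        )
        results[value["name"]] = conn.execute(sql).fetchone()[0]
    return results


per_platform = compute_per_segment_value(conn, "SUM(ad_spend)", "spend", PLATFORM_SEGMENT)
print(per_platform)  # {'Google Ads': 160.0, 'Meta Ads': 40.0}
```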

Loading specs

Use SpecCache to load all specs before computing:

```python
from aitaem import SpecCache

# From directories (loads all *.yaml / *.yml files)
cache = SpecCache.from_yaml(
    metric_paths="metrics/",
    slice_paths="slices/",
    segment_paths="segments/",
)

# From individual files
cache = SpecCache.from_yaml(
    metric_paths=["metrics/ctr.yaml", "metrics/total_revenue.yaml"],
    slice_paths="slices/campaign_type.yaml",
)

# From YAML strings (useful for testing)
cache = SpecCache.from_string(
    metric_yaml="""
metric:
  name: ctr
  source: duckdb://:memory:/events
  numerator: "SUM(clicks)"
  denominator: "SUM(impressions)"
  timestamp_col: date
""",
)
```
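The path arguments accept either a single path or a list, and the directory form loads every `*.yaml` / `*.yml` file. One way such arguments could be resolved is sketched below. `collect_spec_files` is hypothetical, not part of aitaem, and its choices (no recursion into subdirectories, sorted order) are assumptions rather than documented behavior.

```python
import tempfile
from pathlib import Path

# Extensions treated as spec files, per the directory comment above.
SPEC_SUFFIXES = {".yaml", ".yml"}


def collect_spec_files(paths):
    """Hypothetical helper: expand a path or a list of paths into spec file paths."""
    if isinstance(paths, (str, Path)):
        paths = [paths]
    files = []
    for path in map(Path, paths):
        if path.is_dir():
            # Assumption: only the top level is scanned, in sorted order.
            files.extend(
                sorted(f for f in path.iterdir() if f.is_file() and f.suffix in SPEC_SUFFIXES)
            )
        elif path.is_file():
            # Explicitly listed files are taken as given.
            files.append(path)
        else:
            raise FileNotFoundError(f"spec path not found: {path}")
    return files


# Usage: a throwaway directory with two spec files and one non-spec file.
with tempfile.TemporaryDirectory() as tmp:
    metrics_dir = Path(tmp, "metrics")
    metrics_dir.mkdir()
    for name in ("ctr.yaml", "total_revenue.yml", "notes.md"):
        (metrics_dir / name).write_text("")
    print([f.name for f in collect_spec_files(str(metrics_dir))])  # ['ctr.yaml', 'total_revenue.yml']
```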