Research data sources

Every source the research engine may draw on, named.

Fonteum Research is a long-form publishing surface, not a spun-content blog. A study is only as trustworthy as the sources it was built from. This page lists the public-data classes we use, the signals Fonteum generates and owns outright, and the restricted classes we do not use as durable research fields.

Each card lists what the source is usable for, what it is not usable for, the refresh cadence we treat as canonical, the known limitations a citing journalist needs to weigh, and the provenance posture (who owns the data and what may be republished).

Catalog version: v1.0
Last reviewed: 2026-04-30
Maintained by: Data Lead

01 · Public data

Federal datasets the research engine treats as primary.

Public-domain federal data, published by the U.S. Census Bureau, the Bureau of Labor Statistics, and the Bureau of Economic Analysis. Safe to redistribute. Safe to cite. The cadence is annual (BEA also publishes quarterly updates); studies scope their reference year explicitly.

Public data · federal
Census County Business Patterns
Usable for
- · County- and metro-level establishment counts by NAICS
- · Employment density per industry per geography
- · Establishment-size distribution (employees per business)
- · Year-over-year change in supply for a given industry footprint
Not usable for
- · Provider-level facts (no business names, no addresses)
- · Quality / reputation comparisons between providers
- · Real-time market conditions (CBP lags 18-24 months)
- · ZIP-code-level analysis (CBP ZIP-code table coverage is partial and noisy)
Refresh cadence
Annual; ~18-24 months behind reference year
Provenance posture
U.S. federal public-domain data; safe to redistribute and cite with attribution.
Limitations
- · NAICS-to-vertical mapping is imperfect; some Fonteum verticals span multiple NAICS codes
- · Suppression rules redact small-cell counts for privacy protection, so small markets can show as zero even where establishments exist
- · Establishment counts include very small operations that may not appear in directory data
Publisher: U.S. Census Bureau · CBP ↗
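The NAICS-mapping and suppression limitations on this card can be sketched in a few lines. This is a minimal illustration, not Fonteum's pipeline: the vertical name, the NAICS codes assigned to it, the row layout, and the use of `None` to mark a suppressed cell are all assumptions made for the example, and the counts are invented.

```python
# Hypothetical mapping: one Fonteum vertical spanning several NAICS codes.
VERTICAL_TO_NAICS = {
    "home-services": ["238210", "238220"],
}

# Hypothetical CBP-style rows: (county FIPS, NAICS code, establishment count).
# None marks a cell assumed to have been flagged as suppressed upstream.
ROWS = [
    ("06037", "238210", 1200),
    ("06037", "238220", 950),
    ("06003", "238210", None),
    ("06003", "238220", 4),
]


def vertical_establishments(rows, vertical):
    """Sum establishment counts per county across a vertical's NAICS codes.

    Returns {county: (total, complete)}. complete is False when any
    contributing cell was suppressed, so the total is only a lower bound.
    """
    codes = set(VERTICAL_TO_NAICS[vertical])
    out = {}
    for county, naics, count in rows:
        if naics not in codes:
            continue
        total, complete = out.get(county, (0, True))
        if count is None:
            complete = False  # suppressed: unknown, not zero
        else:
            total += count
        out[county] = (total, complete)
    return out


result = vertical_establishments(ROWS, "home-services")
```

Carrying the `complete` flag alongside the total keeps a suppressed small market from being reported as a confirmed zero.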
Public data · federal
BLS Occupational Employment and Wage Statistics
Usable for
- · Occupation-level employment counts at the metro level
- · Median and 25th/75th-percentile wages per occupation per metro
- · Occupation density per 1,000 employed (concentration by metro)
- · Wage-pressure context for service-supply analyses
Not usable for
- · Per-business wage data (occupation-level only)
- · Self-employed / independent contractors at scale (OEWS is establishment-based)
- · Rapid market shifts (annual update cadence)
Refresh cadence
Annual; May reference period, typically released the following spring
Provenance posture
U.S. federal public-domain data; safe to redistribute and cite with attribution.
Limitations
- · OEWS occupation taxonomy (SOC) does not map 1:1 to Fonteum vertical categories
- · Some metro definitions changed between vintages — comparisons across vintages must align CBSA codes
- · Self-employed practitioners are systematically under-represented
Publisher: U.S. Bureau of Labor Statistics · OEWS ↗
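As a rough sketch of the density measure and the cross-vintage caveat on this card: occupation density is occupation employment divided by total metro employment, scaled to 1,000, and a comparison across vintages is only meaningful for CBSA codes present in both. The function names and all figures below are hypothetical and are not real OEWS values.

```python
def density_per_1000(occupation_employment, total_employment):
    """Occupation employment per 1,000 jobs in the same metro."""
    if total_employment <= 0:
        raise ValueError("total_employment must be positive")
    return 1000 * occupation_employment / total_employment


def comparable_cbsas(vintage_a, vintage_b):
    """CBSA codes present in both vintages, safe to compare directly."""
    return sorted(set(vintage_a) & set(vintage_b))


# Hypothetical figures keyed by CBSA code (illustrative only).
VINTAGE_OLD = {
    "31080": {"occupation": 11_400, "total": 6_000_000},
    "99999": {"occupation": 500, "total": 250_000},  # code retired later
}
VINTAGE_NEW = {
    "31080": {"occupation": 12_000, "total": 6_000_000},
    "12420": {"occupation": 3_300, "total": 1_100_000},
}

densities = {
    cbsa: density_per_1000(m["occupation"], m["total"])
    for cbsa, m in VINTAGE_NEW.items()
}
shared = comparable_cbsas(VINTAGE_OLD, VINTAGE_NEW)
```

Metros that appear in only one vintage need their CBSA definitions reconciled by hand before any trend is reported.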
Public data · federal
BEA regional economic accounts
Usable for
- · Per-capita personal income by metro and county
- · Industry GDP contribution at metro level
- · Demand-side proxies (consumer expenditure context)
- · Regional disposable-income context for service-spend studies
Not usable for
- · Provider-level revenue or profitability
- · Sub-county geographies (BEA stops at the county level)
- · Real-time conditions (annual / quarterly cadence)
Refresh cadence
Annual + quarterly state/metro updates
Provenance posture
U.S. federal public-domain data; safe to redistribute and cite with attribution.
Limitations
- · BEA industry classifications use NAICS at varying levels of detail by table — joins must align on the published level
- · Regional definitions occasionally re-vintage; longitudinal joins must reconcile geography codes
Publisher: U.S. Bureau of Economic Analysis · Regional ↗

02 · Fonteum-owned signals

Signals generated by Fonteum's own systems.

Aggregate signals derived from the indexed dataset and the claim funnel. Fonteum owns them outright, they are safe to publish in aggregate, and their durability depends on Fonteum's schedule, not a third party's. Owner-submitted content remains bound by the submission terms; we publish aggregate metrics only, never anything that identifies an individual owner submission.

Fonteum-owned · directory coverage
Indexed listing dataset
Usable for
- · Per-vertical, per-city listing counts for the Fonteum indexed footprint
- · Coverage breadth — how many cities have at least one listed provider in a vertical
- · Network composition — share of total indexed dataset per vertical
Not usable for
- · Total U.S. market share (the indexed dataset is not a representative sample)
- · Estimates of revenue, employment, or company size
- · Demand-side measurements (the indexed dataset is supply-side only)
Refresh cadence
On scheduled snapshots (currently quarterly)
Provenance posture
Fonteum-owned signal generated by our own systems. Safe to redistribute aggregate counts with attribution.
Limitations
- · Listing counts describe what we have indexed, not the universe
- · Coverage skews toward verticals that shipped earlier in the network
- · City definitions follow the Fonteum city table, not Census CBSA boundaries
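The three usable measures on this card (listing counts, coverage breadth, network composition) reduce to simple counting. The sketch below assumes a hypothetical record shape of `(vertical, city)` pairs with invented values; it describes only what is in the list it is given, which mirrors the limitation that counts reflect the indexed footprint and not the universe.

```python
from collections import Counter, defaultdict

# Hypothetical listing records: (vertical, city), using Fonteum-style city names.
LISTINGS = [
    ("plumbing", "Austin"),
    ("plumbing", "Austin"),
    ("plumbing", "Denver"),
    ("dental", "Austin"),
]


def listing_counts(listings):
    """Listings per (vertical, city) pair."""
    return dict(Counter(listings))


def coverage_breadth(listings):
    """Number of distinct cities with at least one listing, per vertical."""
    cities = defaultdict(set)
    for vertical, city in listings:
        cities[vertical].add(city)
    return {vertical: len(found) for vertical, found in cities.items()}


def network_composition(listings):
    """Each vertical's share of all indexed listings (fractions sum to 1)."""
    counts = Counter(vertical for vertical, _ in listings)
    total = sum(counts.values())
    return {vertical: n / total for vertical, n in counts.items()}


counts = listing_counts(LISTINGS)
breadth = coverage_breadth(LISTINGS)
composition = network_composition(LISTINGS)
```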
Fonteum-owned · profile completeness
Profile-completeness signal
Usable for
- · % of provider records with phone, website, hours, services populated
- · Per-vertical quality benchmarks (the 6-of-6 rubric)
- · Trend lines on completeness over time as snapshots accumulate
Not usable for
- · Quality of the underlying business itself (completeness is a metadata signal, not a service-quality signal)
- · Owner satisfaction (a separate signal entirely)
Refresh cadence
On scheduled snapshots (currently quarterly)
Provenance posture
Fonteum-owned signal computed from the indexed dataset. Safe to publish.
Limitations
- · Field coverage depends on the source — owner-claimed listings have higher completeness than unclaimed ones
- · Completeness on unclaimed listings is bounded by what is durably importable from public sources
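A minimal sketch of the completeness percentage, assuming provider records are dictionaries and that a field counts as populated when it is non-empty. Only the four fields named on this card are scored; the remaining fields of the 6-of-6 rubric are not listed here, so the sketch does not guess at them. Record contents are invented.

```python
# The four fields named on this card; the full rubric is not reproduced here.
FIELDS = ("phone", "website", "hours", "services")


def is_populated(value):
    """True when a field holds something other than None, blank text, or an empty list."""
    if isinstance(value, str):
        return bool(value.strip())
    return bool(value)


def completeness_by_field(records, fields=FIELDS):
    """Percentage of records with each field populated."""
    if not records:
        return {field: 0.0 for field in fields}
    return {
        field: 100 * sum(is_populated(r.get(field)) for r in records) / len(records)
        for field in fields
    }


# Hypothetical records (illustrative only).
RECORDS = [
    {"phone": "555-0100", "website": "https://example.com", "hours": "9-5", "services": ["repair"]},
    {"phone": "555-0101", "website": "https://example.org", "hours": "", "services": ["install"]},
    {"phone": "555-0102", "website": None, "hours": None, "services": ["repair"]},
    {"phone": "  ", "services": ["inspection"]},
]

completeness = completeness_by_field(RECORDS)
```

Running the same function separately on claimed and unclaimed records would surface the gap described in the limitations.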
Fonteum-owned · claim status
Owner-claim status
Usable for
- · % of provider records claimed per vertical / city / state
- · Claim-funnel completion benchmarks
- · Longitudinal claim-rate change for the published categories
Not usable for
- · Personal information about claimants
- · Revenue or billing-tier breakdowns at a level finer than published aggregates
Refresh cadence
Continuous as claims close; aggregated on snapshot cadence
Provenance posture
Fonteum-owned signal. Owner-submitted content remains under the submission terms; aggregate claim metrics are safe to publish.
Limitations
- · Claim status reflects opt-in by the owner, not external attestation of any business attribute
- · Owner-submitted fields after claim are labelled separately from public-source fields in the dataset
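One way to picture the claim-rate and longitudinal measures: compute the claimed percentage per group in each snapshot, then report the change in percentage points for groups present in both. The record shape, the `claimed` flag, and the snapshot contents below are assumptions for illustration; no claimant information is involved, only a boolean per record.

```python
from collections import defaultdict


def claim_rate(records, key):
    """Percentage of records claimed, grouped by key (e.g. 'vertical')."""
    totals = defaultdict(int)
    claimed = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        if record["claimed"]:
            claimed[record[key]] += 1
    return {group: 100 * claimed[group] / totals[group] for group in totals}


def rate_change(earlier, later):
    """Percentage-point change for groups present in both snapshots."""
    return {g: later[g] - earlier[g] for g in earlier if g in later}


# Hypothetical snapshots (illustrative only).
SNAPSHOT_Q1 = [
    {"vertical": "plumbing", "claimed": True},
    {"vertical": "plumbing", "claimed": False},
    {"vertical": "plumbing", "claimed": False},
    {"vertical": "plumbing", "claimed": False},
    {"vertical": "dental", "claimed": True},
    {"vertical": "dental", "claimed": False},
]
SNAPSHOT_Q2 = [
    {"vertical": "plumbing", "claimed": True},
    {"vertical": "plumbing", "claimed": True},
    {"vertical": "plumbing", "claimed": False},
    {"vertical": "plumbing", "claimed": False},
    {"vertical": "dental", "claimed": True},
    {"vertical": "dental", "claimed": False},
]

q1 = claim_rate(SNAPSHOT_Q1, "vertical")
q2 = claim_rate(SNAPSHOT_Q2, "vertical")
change = rate_change(q1, q2)
```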
Fonteum-owned · safe crawlable fields
Safe crawlable business fields
Usable for
- · Categorical breakdowns derived from public, durably citable fields (industry, NAICS-equivalent, city/state)
- · Pattern analysis on business-name, address, and listing-shape attributes that are independently public-record
Not usable for
- · Republishing third-party content as Fonteum-owned
- · Reproducing reviews, photos, or descriptions sourced from restricted upstreams
Refresh cadence
Aligned with the indexed dataset snapshot cadence
Provenance posture
Owned by Fonteum's systems where the underlying field is independently public-record. See the data-provenance register for the field-by-field call.
Limitations
- · Whether a field is 'safe crawlable' depends on the source's terms; the data-provenance register is the canonical source of truth

03 · Restricted (not used as durable research source)

Classes we list publicly, then explicitly exclude.

Some data classes appear on listing pages under TTL-cached display terms but are not used as durable Fonteum-owned research inputs. Listing them here is a deliberate trust signal: the boundary is visible to a citing reader, and a future change in licensing posture is loud rather than silent.

Restricted · do not use as durable research source
Google Places / Google Business Profile fields
Usable for
- · Listing-level display under the existing TTL-cached terms (rating, review count, hours, photos)
- · Visitor-facing surfaces only — never as a durable research input
Not usable for
- · Durable Fonteum-owned research fields
- · Bulk redistribution
- · Aggregate research figures published as Fonteum-owned
- · Schema-level emission as first-party Fonteum aggregates
Refresh cadence
TTL-cached only; refreshed under Google Places terms
Provenance posture
Restricted under the Google Places / Business Profile terms. NOT used as a durable research source until licensing/provenance is resolved. Display only; never aggregate-published as Fonteum-owned.
Limitations
- · Google-restricted fields are subject to Google's terms — limited TTL, attribution, no bulk redistribution
- · Until the licensing/provenance posture is resolved, this source class is OFF-LIMITS for durable research outputs
- · The data-provenance register flags every Google-sourced field; research pipelines must exclude these classes by default
Publisher: Google Business Profile ↗
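The exclude-by-default rule can be sketched as an allow-list filter over a provenance lookup. The register contents, source-class labels, and field names below are hypothetical stand-ins; the actual data-provenance register is the source of truth for which class each field belongs to.

```python
# Hypothetical register: field name -> source class. Labels are assumptions.
REGISTER = {
    "name": "public-record",
    "city": "public-record",
    "claim_status": "fonteum-owned",
    "rating": "google-restricted",
    "review_count": "google-restricted",
}

# Source classes a research pipeline may keep as durable inputs.
ALLOWED_CLASSES = frozenset({"public-record", "fonteum-owned"})


def research_fields(record, register=REGISTER, allowed=ALLOWED_CLASSES):
    """Keep only fields whose registered source class is allowed.

    Fields missing from the register are dropped as well, so anything
    unclassified is excluded by default.
    """
    return {
        field: value
        for field, value in record.items()
        if register.get(field) in allowed
    }


# Hypothetical listing record (illustrative only).
listing = {
    "name": "Example Plumbing Co.",
    "city": "Austin",
    "claim_status": "claimed",
    "rating": 4.6,
    "review_count": 212,
    "unregistered_field": "x",
}
safe = research_fields(listing)
```

An allow-list is used here instead of a block-list so that a newly added field stays out of research outputs until the register classifies it.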

Field-by-field source assignments — including which Google-restricted fields appear on which listing surface and under which TTL — are in the public data-provenance register.