ownlisted
DATA · MAY 3, 2026
Data provenance · The data graph

The provider graph behind OwnListed.

Every displayed field needs a source, a date, and a display rule. This page is the public map of how OwnListed turns federal and state public records into source-cited provider data — with the limitations CMS itself documents.

Today the graph carries 12 registered sources, 574 per-business source matches, and 2,758 field-level provenance rows.

Provenance register
v1.1
Last reviewed
2026-05-03
Snapshot
2026-05-04

The data graph · one second

Public records → cited fields, in five gated stages.

Eleven public-record sources flow through five contracted stages into four output surfaces. Drop any stage and the field doesn’t render.

Sources

11 registered · grouped by family

  • Healthcare graph

    Provider identity + Medicare enrollment

    • CMS NPPES
    • CMS PECOS
  • Care quality

    CMS facility quality data

    • CMS Care Compare

What is in the graph today

Snapshot 2026-05-04
  • 113,537 providers tracked: Active rows in `businesses` across all 40 verticals
  • 40 active verticals: Each carries its own archetype + display contract
  • 12 registered sources: Federal + state public-record datasets · 5 Tier-2 profile-enrichment
  • 574 provider-source matches: Per-business source-record links recorded with confidence
  • 2,758 provenance field rows: Each row carries source + last-checked-date + value
  • 9 research snapshots: Tier-1 dated aggregates published to /research

These counters reflect the live state of the OwnListed Supabase schema. Each one is a real select count(*) on the named table — no invented numbers, no marketing inflation.

The pipeline, end to end

Seven stages from public source to displayed field.

Every fact OwnListed shows traces back through these seven stages. If a stage is missing — no source, no manifest, no ingestion run, no match, no provenance row, no display permission — the fact does not render.

  1. STAGE 01

    Public source

    An official, public-record dataset published by a federal or state agency: NPPES, CMS PECOS, FL DBPR, CMS Care Compare, BLS, HRSA, US Census.

    data.cms.gov · myfloridalicense.com · npiregistry.cms.hhs.gov
  2. STAGE 02

    Source pack

    A typed manifest for each public source. Names the slug, tier, allowed verticals + states, fetch method, ToS notes, refresh cadence, and field-level display rules.

    src/lib/data-ingestion/source-packs/*.ts
  3. STAGE 03

    Ingestion run

    A dated execution of a pilot script that downloads or queries the source. Records fetch method (rest-api / bulk-file / manual-csv), record count, and run notes.

    ingestion_runs
  4. STAGE 04

    Entity match

    Per-business link from a source record to a platform business, scored by Jaccard similarity over name + city + state + phone + ZIP. A match is written only at confidence ≥ 0.75; ambiguous cases are logged but not displayed.

    provider_source_matches · entity_match_log
  5. STAGE 05

    Field provenance

    One row per (business, source, field). Carries the source value, normalized value, last-checked date, ingestion-run pointer, and the display_allowed flag from the manifest.

    provider_field_provenance
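
The entity-match stage above can be sketched in TypeScript as follows. This is a minimal sketch, not OwnListed's implementation: the `MatchRecord` shape, the token-based Jaccard over the five fields, and the names `tokenize`, `jaccard`, and `decideMatch` are assumptions for illustration. Only the 0.75 write threshold and the 0.05 ambiguity margin come from this page.

```typescript
// Hypothetical record shape; the field names are assumptions for illustration.
interface MatchRecord {
  name: string;
  city: string;
  state: string;
  phone: string;
  zip: string;
}

const WRITE_THRESHOLD = 0.75;  // "Confidence ≥ 0.75 to write"
const AMBIGUITY_MARGIN = 0.05; // top two within 0.05, both at or above threshold

// Unique lowercase alphanumeric tokens across all five fields.
function tokenize(r: MatchRecord): string[] {
  const phoneDigits = r.phone.replace(/\D/g, "");
  const raw = [r.name, r.city, r.state, phoneDigits, r.zip].join(" ").toLowerCase();
  const unique: string[] = [];
  for (const t of raw.split(/[^a-z0-9]+/)) {
    if (t.length > 0 && unique.indexOf(t) === -1) unique.push(t);
  }
  return unique;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over two token lists without duplicates.
function jaccard(a: string[], b: string[]): number {
  const shared = a.filter((t) => b.indexOf(t) !== -1).length;
  const union = a.length + b.length - shared;
  return union === 0 ? 0 : shared / union;
}

type MatchOutcome =
  | { kind: "write"; candidateIndex: number; confidence: number }
  | {
      kind: "suppressed";
      reason: "low-confidence" | "ambiguous" | "no-candidates";
      confidence: number;
    };

// Score every candidate, then apply the threshold and the ambiguity rule.
function decideMatch(source: MatchRecord, candidates: MatchRecord[]): MatchOutcome {
  const src = tokenize(source);
  const scored = candidates
    .map((c, index) => ({ index, score: jaccard(src, tokenize(c)) }))
    .sort((x, y) => y.score - x.score);

  const best = scored[0];
  if (best === undefined) {
    return { kind: "suppressed", reason: "no-candidates", confidence: 0 };
  }
  if (best.score < WRITE_THRESHOLD) {
    return { kind: "suppressed", reason: "low-confidence", confidence: best.score };
  }
  const runnerUp = scored[1];
  if (
    runnerUp !== undefined &&
    runnerUp.score >= WRITE_THRESHOLD &&
    best.score - runnerUp.score <= AMBIGUITY_MARGIN
  ) {
    return { kind: "suppressed", reason: "ambiguous", confidence: best.score };
  }
  return { kind: "write", candidateIndex: best.index, confidence: best.score };
}
```

In this sketch a suppressed outcome would still be logged (the page names `entity_match_log` for that), while only a `write` outcome would produce a provenance row.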
The graph, by source family

Four source families. One provenance contract.

We organize OwnListed by the public datasets that back it — not by a flat list of 40 vertical domains. Every cluster ships with a registered source, an ingestion runner, and a published research asset.

Browse the full source library →

  • Cluster

    Healthcare graph

    NPPES + CMS PECOS link providers to the federal NPI registry and Medicare-billing-active records. The taxonomy + enrollment surface that source-backs dermatology and chiropractic profiles today.

    Source:CMS NPPES·Checked May 2026
    Source:CMS PECOS·Checked May 2026
Field-level provenance

Four real examples — one per source family.

Each card shows the field as it would render on the relevant OwnListed surface, sourced verbatim from the public dataset, with the “what it means” and “what it does not mean” framing CMS itself documents.

  • Healthcare graph

    National Provider Identifier

    1447884929
    Source:CMS NPPES·Checked May 2026
    What it means. CMS issued this 10-digit NPI to a dermatology practice in Phoenix, AZ in 2020 under the MOHS-Micrographic Surgery taxonomy. The NPI is the federal HIPAA identifier used on every Medicare claim and most commercial insurance claims.
    What it does not mean. The NPI is an identifier, not a quality endorsement. It does not certify board status, active clinical practice, malpractice history, or current patient acceptance. OwnListed does not independently rate, inspect, verify, endorse, or guarantee the provider.
    field_key: npi
Display rules

When a field is in the database but not on the page.

The same data that flows through stage 5 (field provenance) does not automatically flow through to stage 7 (display). Four rules govern which fields render, and where.

  1. Rule 1

    Why don't BLS / BEA / HRSA / Census numbers appear on individual provider profiles?

    Aggregate datasets (BLS Occupational Employment & Wage Statistics, BEA Regional Income, HRSA Health Professional Shortage Areas, US Census state population) describe markets, not individuals. Attaching a state-mean wage figure to one HVAC contractor's listing would imply a per-business signal the data doesn't carry. They render only on /research aggregate pages.

    How it's enforced: Source-pack manifest tier='tier1-research-only' + the data:bls-bea-not-in-profile-components launch gate.
  2. Rule 2

    Why don't low-confidence matches display?

    Each match between a public-record row and an OwnListed business is scored by Jaccard similarity over name + city + state + phone + ZIP. Confidence below 0.75 is logged in entity_match_log but never written to provider_field_provenance. Ambiguous matches (top two candidates within 0.05 of each other, both ≥ 0.75) are suppressed too: when CMS lists two providers under similar names in the same city and we can't disambiguate, no row is written for either.

    How it's enforced: Match-confidence threshold encoded in every source-pack manifest (confidence.threshold_high), mirrored to the entity_match_log audit trail (currently 2,001 logged attempts).
  3. Rule 3
Source classes

Five classes the dataset can fall into.

Every field on every page maps to one of five source classes. The class determines storage rights, refresh cadence, and what attribution must appear next to the value.

  • PUBLIC_RECORD

    Public record (federal / state)

    Sourced from federal or state public-record datasets. No copyright restriction. Safe to store and display indefinitely with attribution. Backs every Tier-2 source-pack.

    Examples: CMS NPPES NPI registry, CMS PECOS Medicare enrollment, CMS Care Compare, FL DBPR construction-license file, CSLB CPRA responses, AZ ROC public-records data.

  • RESEARCH_AGGREGATE

    Research aggregate (Tier-1)

    Sourced from federal aggregate datasets that describe markets, not individuals. Renders only on /research surfaces — never attached to per-business profiles.

    Examples: BLS Occupational Employment & Wage Statistics (OEWS), BLS QCEW, BEA Regional, HRSA HPSA, US Census state population.

  • OWNED

    Owned by OwnListed

    Generated or assigned by OwnListed's own systems. Full rights to store and display indefinitely.

    Examples: Listing identifiers, slugs, vertical IDs, claim flow timestamps, tier flags.

  • OWNER_SUBMITTED

    Owner submitted

    Provided by a logged-in business owner via the claim or owner-portal flow. Storage rights granted by the owner via the submission terms.

01 · Identity / NAP

Name, address, phone, website. These details are often independently available as public record where a business operates publicly, but on unclaimed listings the canonical source is Google Business Profiles.

Field | Source (unclaimed) | Source (claimed) | Retention | Refresh
Business name | Google (cached) | Owner submitted | Indefinite, with refresh | On scheduled snapshot
Address | Google (cached) | Owner submitted | Indefinite, with refresh | On scheduled snapshot
Phone | Google (cached) | Owner submitted | Indefinite, with refresh | On scheduled snapshot
Website | Google (cached) | Owner submitted | Indefinite, with refresh | On scheduled snapshot
Email | Not displayed | Owner submitted | Indefinite | Owner-driven
02 · Source-cited provenance fields

Per-business fields written via the §94/§104 provenance framework. Each row in provider_field_provenance carries source + last-checked date + display permission.

Field | Source (unclaimed) | Source (claimed) | Retention | Refresh
NPI (NPPES) | CMS NPPES (matched) | CMS NPPES (matched) | Indefinite, refreshed quarterly | Manifest refresh_cadence_days
Provider taxonomy code (NPPES) | CMS NPPES (matched) | CMS NPPES (matched) | Indefinite | Quarterly
Medicare PECOS enrollment | CMS PECOS (matched) | CMS PECOS (matched) | Indefinite | Monthly per CMS publish cadence
State contractor license # | FL DBPR / AZ ROC / CSLB (matched) | Same | Indefinite | Monthly to quarterly per source
CMS overall star rating (where applicable) | CMS Care Compare (matched) | Same | Indefinite | Quarterly
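
The "one row per (business, source, field)" contract and the manifest-driven refresh can be illustrated with a short TypeScript sketch. The `ProvenanceRow` property names, the in-memory store, and the helpers `rowKey`, `upsert`, and `isStale` are assumptions for illustration; the real table is `provider_field_provenance`, and its column names are not given on this page.

```typescript
// Hypothetical row shape modelled on the columns this page describes.
interface ProvenanceRow {
  businessId: string;
  sourceSlug: string;
  fieldKey: string;
  sourceValue: string;
  normalizedValue: string;
  lastCheckedDate: string; // ISO date, e.g. "2026-05-04"
  ingestionRunId: string;
  displayAllowed: boolean;
}

// Composite key: one row per (business, source, field).
function rowKey(r: Pick<ProvenanceRow, "businessId" | "sourceSlug" | "fieldKey">): string {
  return [r.businessId, r.sourceSlug, r.fieldKey].join("|");
}

// A later row for the same key replaces the earlier one instead of duplicating it.
function upsert(store: Record<string, ProvenanceRow>, row: ProvenanceRow): void {
  store[rowKey(row)] = row;
}

// A row is due for refresh once its age exceeds the manifest's refresh_cadence_days.
function isStale(row: ProvenanceRow, refreshCadenceDays: number, today: string): boolean {
  const msPerDay = 24 * 60 * 60 * 1000;
  const ageMs = Date.parse(today) - Date.parse(row.lastCheckedDate);
  return ageMs > refreshCadenceDays * msPerDay;
}
```

Keying on the triple means a refresh overwrites the previous value and last-checked date for that field rather than accumulating rows, which is one plausible reading of the contract above.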
03 · Reputation signals

Rating + review count. Sourced from Google. Aggregated only at the per-business level — never aggregated to the directory level.

Field | Source (unclaimed) | Source (claimed) | Retention | Refresh
Rating | Google (cached) | Google (cached) | TTL-cached, refreshed on snapshot | Weekly for highly-rated segment; on scheduled snapshot otherwise
Review count | Google (cached) | Google (cached) | TTL-cached, refreshed on snapshot | Weekly for highly-rated segment; on scheduled snapshot otherwise
Individual review text or author | Not stored | Not stored | n/a | n/a
04 · OwnListed-owned state

Internal flags, identifiers, claim history, billing tier. Generated by OwnListed; never shared with third parties without consent.

Field | Source (unclaimed) | Source (claimed) | Retention | Refresh
Listing identifier (UUID, slug) | Owned by OwnListed | Owned by OwnListed | Indefinite | On rename only
Claim status, tier, billing | n/a | Owned by OwnListed | Indefinite while claimed | On owner action
Created / updated timestamps | Owned by OwnListed | Owned by OwnListed | Indefinite | Auto
What OwnListed does NOT claim

We cite source-backed facts. We do not offer pay-to-rank placement, issue trust badges, invent ratings, award providers, or claim license credentials we cannot trace.

Every disclaimer in the homepage “What OwnListed does NOT claim” block applies on every /data-provenance, /research, and per-business surface. OwnListed does not independently rate, inspect, verify, endorse, or guarantee any provider; we cite CMS, NPPES, FL DBPR, and other public-record sources that already measure them.

See also
  • Sources → The full source library — every dataset OwnListed cites, with tier, refresh cadence, fields used, and limitations.
  • Methodology → Network-wide sourcing, refresh cadence, and corrections policy.
  • Home Health methodology → The §118-§119 staged-vertical doctrine for CMS Care Compare data.
  • Research → Tier-1 dated aggregates published from the source graph.
  • Editorial policy → Independence, sourcing, conflicts, corrections, retractions.
  • Corrections log → Every accepted correction, dated, with the cause named.
  • Data & press kit → Cite our data, request a custom export, or reach press.

An independent research organization studying the local economy.



© 2026 OWNLISTED RESEARCH · DATA SNAPSHOT MAY 3, 2026 · BUILT WITH CARE

  • STAGE 06

    Display permission

    Field-level gate on the manifest. display_allowed:false fields are captured to provenance for traceability but never rendered. Some are write-locked pending operator copy review.

    manifest.fields[].display_allowed
  • STAGE 07

    Vertical / research surface

    Per-business displays on the listing detail page (Tier-2) or state-level aggregates on the /research hub (Tier-1). Aggregates never attach to individual profiles.

    src/app/v/* · /research/* · research_snapshots
  • The pipeline lives in code. Source-pack manifests at src/lib/data-ingestion/source-packs/; framework helpers at src/lib/data-ingestion/; pilot scripts at scripts/research/.
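
A source-pack manifest and its field-level display gate might look roughly like the TypeScript below. This is a hedged sketch: the interface and property names are assumptions modelled on what this page lists (slug, tier, allowed verticals + states, fetch method, ToS notes, refresh cadence, `confidence.threshold_high`, `fields[].display_allowed`). The slug, the `tier2-profile` tier name, the vertical slugs, and the cadence value are placeholders, not values from the real manifests.

```typescript
// "tier1-research-only" appears on this page; "tier2-profile" is an assumed name.
type Tier = "tier1-research-only" | "tier2-profile";
type FetchMethod = "rest-api" | "bulk-file" | "manual-csv";

interface FieldRule {
  field_key: string;
  display_allowed: boolean;
}

// Hypothetical manifest shape; the real type lives in the source-packs directory.
interface SourcePackManifest {
  slug: string;
  tier: Tier;
  allowed_verticals: string[];
  allowed_states: string[];
  fetch_method: FetchMethod;
  tos_notes: string;
  refresh_cadence_days: number;
  confidence: { threshold_high: number };
  fields: FieldRule[];
}

// Illustrative manifest. The field keys are the ones this page names;
// every other value is a placeholder.
const flDbprExample: SourcePackManifest = {
  slug: "fl-dbpr",
  tier: "tier2-profile",
  allowed_verticals: ["hvac", "roofers", "plumbers", "gc", "pool-builders"],
  allowed_states: ["FL"],
  fetch_method: "bulk-file",
  tos_notes: "Public record; cite the agency publication.",
  refresh_cadence_days: 30,
  confidence: { threshold_high: 0.75 },
  fields: [
    { field_key: "state_contractor_license_number", display_allowed: true },
    { field_key: "state_contractor_bond_amount", display_allowed: false },
    { field_key: "state_contractor_disciplinary_history", display_allowed: false },
  ],
};

// Stage 06 gate: only fields explicitly allowed by the manifest may render.
function displayableFieldKeys(manifest: SourcePackManifest): string[] {
  return manifest.fields
    .filter((f) => f.display_allowed)
    .map((f) => f.field_key);
}
```

Under these assumptions, `displayableFieldKeys(flDbprExample)` would return only the license-number key, while the two write-locked fields stay captured in provenance but off the page.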

  • NPI + taxonomy on every matched provider
  • Medicare-fee-for-service enrollment status
  • §113 source-family expansion candidate verticals
  • See the dermatology study →
  • Cluster

    Trades graph

    Florida DBPR is the platform's bulk-CSV-published state-licensing source. Five contractor verticals carry state license numbers + classifications + statuses cited from the agency's own publication.

    Source:FL DBPR·Checked May 2026
    Source:AZ ROC·Checked May 2026
    Source:CSLB (CA)·Checked May 2026
    • State license number, classification, status, expire date
    • 5 contractor verticals: HVAC, roofers, plumbers, GC, pool-builders
    • FL DBPR is a Florida Chapter 119, F.S. (Sunshine Law) publishing source
    See the HVAC study →
  • Cluster

    Care graph

    CMS Care Compare publishes facility-quality data for nursing homes, home health, hospice, and dialysis. Four research artifacts live; first Tier-2 vertical (home health) is staged.

    Source:CMS Care Compare·Snapshot May 2026
    • CMS overall star ratings (1.0–5.0) where CMS publishes them
    • 4 research snapshots: nursing-home, dialysis, home-health, hospice
    • Tier-2 home-health vertical readiness scored 37/45 (§118)
    See the Care Compare research →
  • Cluster

    Research graph

    BLS OEWS, BEA Regional, HRSA HPSA, and US Census state population. Tier-1 research-only sources that contextualize every Tier-2 vertical with employment, income, shortage-area, and per-capita data.

    Source:BLS OEWS·Snapshot May 2026
    Source:HRSA HPSA·Snapshot May 2026
    Source:US Census·Snapshot May 2026
    • Per-capita density, wage context, shortage-area context
    • Cited inside every Sprint-1 study and the Care Compare snapshots
    • Never attached to individual provider profiles
    Browse the research hub →
  • Healthcare graph

    Medicare-billing-active

    Yes — active in PECOS
    Source:CMS PECOS·Checked May 2026
    What it means. CMS PECOS publishes a monthly snapshot of providers actively enrolled in Medicare fee-for-service. A provider listed here is currently authorized to bill Medicare for the specialty + state on the row.
    What it does not mean. PECOS-active does not measure quality, panel size, or wait times. Providers can be high-quality and not enrolled in Medicare. Providers can also be enrolled in Medicare and have a temporarily suspended billing status that the snapshot doesn't reflect.
    field_key: pecos_enrollment_active
  • Trades graph

    Florida state contractor license

    CAC1814337 · Active
    Source:FL DBPR·Checked May 2026
    What it means. Florida DBPR's bulk-published Construction Industry Licensing Board file lists this license as Current. The license authorizes the holder to perform CAC-class HVAC work in Florida. Published under § 119.01(2)(b), F.S. (Sunshine Law).
    What it does not mean. A current state license does not measure workmanship quality, customer satisfaction, or insurance coverage. Bond / workers-comp / insurance / discipline fields exist in adjacent FL DBPR products but are write-locked on OwnListed pending operator copy review.
    field_key: state_contractor_license_number
  • Care graph

    CMS overall star rating (nursing-home example)

    3.4 ★ avg (Arkansas) — published by CMS
    Source:CMS Care Compare·Snapshot May 2026
    What it means. CMS publishes a 1-5 overall star rating per Medicare/Medicaid-certified nursing home, built from health-inspection + staffing + quality-measure components per the CMS technical methodology. State means are descriptive of the publishing snapshot.
    What it does not mean. Rating differences between facilities can reflect measurement variation as much as quality variation. The rating is one signal — it does not substitute for clinical judgment, family-care decisions, or in-person evaluation. State-level aggregates never attach to individual facility profiles on OwnListed.
    field_key: cms_overall_star_rating
  • Rule 3

    Why are some fields write-locked even when CMS publishes them?

    Fields with high brand-risk (Special Focus Facility status, fines, payment denials, abuse-icon flags, individual quality measures, risk-standardized readmission rates) are captured to provider_field_provenance for traceability, so every cell is preserved, but their display_allowed flag stays false until operator copy review approves a renderable framing. The field exists in our database; it does not yet exist on a profile page.

    How it's enforced: Per-field display_allowed: false on the source-pack manifest. Examples: cms_special_focus_status, cms_total_health_deficiencies, cms_abuse_icon, state_contractor_bond_amount, state_contractor_disciplinary_history.
  • Rule 4

    Why do we show "missing" or "unknown" instead of a best guess?

    If the source dataset doesn't carry a value, we don't infer one. CMS NPPES doesn't publish a phone number for every NPI. CMS Care Compare leaves 35.8% of home-health agencies unrated. CMS Hospice General Information has no overall star rating. In every case the public surface shows the explicit absence — not a model-imputed guess — and the methodology page documents why.

    How it's enforced: Manifest field-level rules + per-page methodology disclaimers + the §95 / §114 'unrated ≠ low-quality' doctrine sentence in every CMS-cited UI.
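
Taken together, the display rules amount to a small decision function. The TypeScript below is a sketch under stated assumptions: the `Surface` names, the `FieldInput` shape, the `tier2-profile` tier name, and the absence label text are all hypothetical. It only illustrates the order of checks (research-only tier, write lock, explicit absence) and is not OwnListed's renderer.

```typescript
type Surface = "profile" | "research";

// Hypothetical input: what a renderer would know about one field.
interface FieldInput {
  tier: "tier1-research-only" | "tier2-profile"; // second name is assumed
  displayAllowed: boolean;
  value: string | null; // null means the source carries no value
}

type Rendered =
  | { kind: "value"; text: string }
  | { kind: "absent"; text: string } // explicit absence, never an imputed guess
  | { kind: "hidden" };              // nothing is rendered at all

// Placeholder label; the actual wording on OwnListed surfaces is not given here.
const ABSENT_LABEL = "Not published by source";

function renderField(input: FieldInput, surface: Surface): Rendered {
  // Rule 1: aggregate (Tier-1) data never attaches to an individual profile.
  if (input.tier === "tier1-research-only" && surface === "profile") {
    return { kind: "hidden" };
  }
  // Rule 3: write-locked fields stay in provenance but do not render.
  if (!input.displayAllowed) {
    return { kind: "hidden" };
  }
  // Rule 4: a missing source value is shown as missing, not inferred.
  if (input.value === null || input.value.trim() === "") {
    return { kind: "absent", text: ABSENT_LABEL };
  }
  return { kind: "value", text: input.value };
}
```

Rule 2 is not modelled here because, per the text, low-confidence and ambiguous matches never reach the provenance table, so the renderer would never see them.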
  • Examples (OWNER_SUBMITTED): Owner-managed business description, services list (after claim), photos uploaded via owner portal, hours edits after claim.

  • GOOGLE_RESTRICTED

    Google Business Profile (cached)

    Sourced from Google Business Profiles. Subject to the Google Places terms — time-limited cache, attribution requirements, no bulk redistribution. Refreshed on a documented cadence and purged when its TTL expires.

    Examples: Rating, review count, hours, services menu (when from Google's category list), and the source of name / address / phone / website on listings that have not yet been claimed.
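
The five source classes lend themselves to an exhaustive lookup table. The TypeScript below is a minimal sketch: the `ClassPolicy` shape and the helper names are assumptions, and it encodes only the two properties this page states outright (research-only rendering and time-limited caching). Where the page is silent, the value is marked as an assumption in a comment.

```typescript
type SourceClass =
  | "PUBLIC_RECORD"
  | "RESEARCH_AGGREGATE"
  | "OWNED"
  | "OWNER_SUBMITTED"
  | "GOOGLE_RESTRICTED";

// Hypothetical policy shape for illustration.
interface ClassPolicy {
  researchOnly: boolean; // renders only on /research surfaces
  ttlLimited: boolean;   // cached copy must be purged when its TTL expires
}

// Record<SourceClass, ...> makes the compiler reject a missing class.
const CLASS_POLICY: Record<SourceClass, ClassPolicy> = {
  PUBLIC_RECORD: { researchOnly: false, ttlLimited: false },
  // The page does not state retention for aggregates; ttlLimited=false is assumed.
  RESEARCH_AGGREGATE: { researchOnly: true, ttlLimited: false },
  OWNED: { researchOnly: false, ttlLimited: false },
  OWNER_SUBMITTED: { researchOnly: false, ttlLimited: false },
  GOOGLE_RESTRICTED: { researchOnly: false, ttlLimited: true },
};

// Research-only classes never attach to per-business profiles.
function canAttachToProfile(sourceClass: SourceClass): boolean {
  return !CLASS_POLICY[sourceClass].researchOnly;
}

// The TTL length is a caller-supplied placeholder; the page does not give one.
function mustPurge(
  sourceClass: SourceClass,
  cachedAtMs: number,
  nowMs: number,
  ttlMs: number
): boolean {
  return CLASS_POLICY[sourceClass].ttlLimited && nowMs - cachedAtMs >= ttlMs;
}
```

A lookup keyed by the class name keeps storage and display decisions in one place, so adding a sixth class would fail to compile until its policy is filled in.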

  • Trades graph

    State contractor licenses

    • FL DBPR
    • AZ ROC
    • CSLB (CA)
  • Federal context

    Tier-1 aggregates (research only)

    • BLS OEWS
    • HRSA HPSA
    • BEA Regional
    • US Census
  • Pipeline

    5 contracted stages · all enforced in code

    1. 01

      Source pack

      Typed manifest + display rules

    2. 02

      Ingestion run

      Dated execution · record count

    3. 03

      Entity match

      Confidence ≥ 0.75 · ambiguous suppressed

    4. 04

      Field provenance

      1 row per (entity × source × field)

    5. 05

      Display permission

      Field-level gate from manifest


    Surfaces

    4 output surfaces · public

    • Profile cards: Per-business sourced fields with last-checked date
    • Research snapshots: State-level aggregates · published & dated
    • Coverage surfaces: Per-vertical / per-state coverage map
    • Data exports: CSV / JSON dataset catalog
    • 11 sources registered
    • 113,548 providers tracked
    • 17 research snapshots
    • 5 source families

    Legend

    • Tier-2 · live
    • Tier-1 · research-only
    • Pending records request

    Source nodes link to /sources · pipeline detail at seven-stage breakdown.