Every displayed field needs a source, a date, and a display rule. This page is the public map of how OwnListed turns federal and state public records into source-cited provider data — with the limitations CMS itself documents.
Today the graph carries registered sources, per-business source matches, and field-level provenance rows.
The data graph · one second
Eleven public-record sources flow through five contracted stages into four output surfaces. Drop any stage and the field doesn’t render.
Sources
11 registered · grouped by family
Healthcare graph
Provider identity + Medicare enrollment
Care quality
CMS facility quality data
These counters reflect the live state of the OwnListed Supabase schema. Each one is a real select count(*) on the named table — no invented numbers, no marketing inflation.
Every fact OwnListed shows traces back through these seven stages. If a stage is missing — no source, no manifest, no ingestion run, no match, no provenance row, no display permission — the fact does not render.
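The all-or-nothing rule above can be sketched as a simple check over the links a fact depends on. This is a minimal illustration, not OwnListed's actual code: the `FactChain` shape, its field names, and the `missingLinks` / `canRender` helpers are all hypothetical.

```typescript
// Hypothetical record of the links a fact needs before it can render.
// Field names are illustrative, not OwnListed's actual schema.
interface FactChain {
  sourceSlug?: string;
  manifestSlug?: string;
  ingestionRunId?: string;
  matchConfidence?: number;
  provenanceRowId?: string;
  displayAllowed?: boolean;
}

// Returns the names of the links that are absent, in pipeline order.
function missingLinks(chain: FactChain): string[] {
  const missing: string[] = [];
  if (!chain.sourceSlug) missing.push("source");
  if (!chain.manifestSlug) missing.push("manifest");
  if (!chain.ingestionRunId) missing.push("ingestion run");
  if (chain.matchConfidence === undefined) missing.push("match");
  if (!chain.provenanceRowId) missing.push("provenance row");
  if (chain.displayAllowed !== true) missing.push("display permission");
  return missing;
}

// A fact renders only when no link is missing.
function canRender(chain: FactChain): boolean {
  return missingLinks(chain).length === 0;
}

// Made-up example values.
const completeChain: FactChain = {
  sourceSlug: "example-source",
  manifestSlug: "example-source-pack",
  ingestionRunId: "run-001",
  matchConfidence: 0.9,
  provenanceRowId: "row-001",
  displayAllowed: true,
};

// Same chain with no ingestion run recorded.
const chainWithoutRun: FactChain = {
  sourceSlug: "example-source",
  manifestSlug: "example-source-pack",
  matchConfidence: 0.9,
  provenanceRowId: "row-001",
  displayAllowed: true,
};
```

Returning the list of missing links, rather than a bare boolean, makes it possible to report which stage blocked a field.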
An official, public-record dataset published by a federal or state agency: NPPES, CMS PECOS, FL DBPR, CMS Care Compare, BLS, HRSA, US Census.
data.cms.gov · myfloridalicense.com · npiregistry.cms.hhs.gov
A typed manifest for each public source. Names the slug, tier, allowed verticals + states, fetch method, ToS notes, refresh cadence, and field-level display rules.
src/lib/data-ingestion/source-packs/*.ts
A dated execution of a pilot script that downloads or queries the source. Records fetch method (rest-api / bulk-file / manual-csv), record count, and run notes.
ingestion_runs
Per-business link from a source record to a platform business via name + city + state + phone + ZIP Jaccard scoring. Confidence ≥ 0.75 to write; ambiguous cases logged but not displayed.
provider_source_matches · entity_match_log
One row per (business, source, field). Carries the source value, normalized value, last-checked date, ingestion-run pointer, and the display_allowed flag from the manifest.
provider_field_provenance
We organize OwnListed by the public datasets that back it, not by a flat list of 40 vertical domains. Every cluster ships with a registered source, an ingestion runner, and a published research asset.
NPPES + CMS PECOS link providers to the federal NPI registry and Medicare-billing-active records. The taxonomy + enrollment surface that source-backs dermatology and chiropractic profiles today.
Each card shows the field as it would render on the relevant OwnListed surface, sourced verbatim from the public dataset, with the “what it means” and “what it does not mean” framing CMS itself documents.
The same data that flows through stage 5 (field provenance) does not automatically flow through to stage 7 (display). Four rules govern which fields render, and where.
Aggregate datasets (BLS Occupational Employment & Wage Statistics, BEA Regional Income, HRSA Health Professional Shortage Areas, US Census state population) describe markets, not individuals. Attaching a state-mean wage figure to one HVAC contractor's listing would imply a per-business signal the data doesn't carry. They render only on /research aggregate pages.
Each match between a public-record row and an OwnListed business is scored by Jaccard similarity over name + city + state + phone + ZIP. A match with confidence below 0.75 is logged in entity_match_log but never written to provider_field_provenance. Ambiguous matches (top two candidates within 0.05 of each other and both ≥ 0.75) are suppressed too: when CMS lists two providers under similar names in the same city and we can't disambiguate, no row is written for either.
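The matching rule can be sketched as follows. The page does not say how the five fields are tokenized or combined, so this sketch assumes a single token-set Jaccard over all of them; `MatchFields`, `tokenize`, `jaccard`, and `decideMatch` are hypothetical names, and treating "within 0.05" as inclusive is also an assumption. Only the 0.75 threshold and the 0.05 margin come from the text.

```typescript
// Hypothetical input shape; the real matcher's fields may differ.
interface MatchFields {
  name: string;
  city: string;
  state: string;
  phone: string;
  zip: string;
}

type MatchDecision = "write" | "log_only" | "suppress_ambiguous";

const WRITE_THRESHOLD = 0.75;  // from the page
const AMBIGUITY_MARGIN = 0.05; // from the page

// Lowercase, split on non-alphanumerics, and de-duplicate (assumed tokenization).
function tokenize(r: MatchFields): string[] {
  const raw = [r.name, r.city, r.state, r.phone, r.zip].join(" ").toLowerCase();
  const out: string[] = [];
  raw.split(/[^a-z0-9]+/).forEach((t) => {
    if (t.length > 0 && out.indexOf(t) === -1) out.push(t);
  });
  return out;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over de-duplicated token lists.
function jaccard(a: string[], b: string[]): number {
  const intersection = a.filter((t) => b.indexOf(t) !== -1).length;
  const union = a.length + b.length - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Applies the threshold and ambiguity rules to one business's candidate scores.
function decideMatch(candidateScores: number[]): MatchDecision {
  const sorted = candidateScores.slice().sort((x, y) => y - x);
  const top = sorted[0];
  const second = sorted[1];
  if (top === undefined || top < WRITE_THRESHOLD) return "log_only";
  if (
    second !== undefined &&
    second >= WRITE_THRESHOLD &&
    top - second <= AMBIGUITY_MARGIN
  ) {
    return "suppress_ambiguous";
  }
  return "write";
}

// Made-up record for illustration.
const sampleRecord: MatchFields = {
  name: "Sunrise Dermatology",
  city: "Miami",
  state: "FL",
  phone: "305-555-0100",
  zip: "33101",
};
```

With these rules, two strong candidates scoring 0.90 and 0.88 produce no provenance row, while 0.90 against 0.60 writes the stronger match.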
Every field on every page maps to one of five source classes. The class determines storage rights, refresh cadence, and what attribution must appear next to the value.
Public record (federal / state)
Sourced from federal or state public-record datasets. No copyright restriction. Safe to store and display indefinitely with attribution. Backs every Tier-2 source-pack.
Examples: CMS NPPES NPI registry, CMS PECOS Medicare enrollment, CMS Care Compare, FL DBPR construction-license file, CSLB CPRA responses, AZ ROC public-records data.
Research aggregate (Tier-1)
Sourced from federal aggregate datasets that describe markets, not individuals. Renders only on /research surfaces — never attached to per-business profiles.
Examples: BLS Occupational Employment & Wage Statistics (OEWS), BLS QCEW, BEA Regional, HRSA HPSA, US Census state population.
Owned by OwnListed
Generated or assigned by OwnListed's own systems. Full rights to store and display indefinitely.
Examples: Listing identifiers, slugs, vertical IDs, claim flow timestamps, tier flags.
Owner submitted
Provided by a logged-in business owner via the claim or owner-portal flow. Storage rights granted by the owner via the submission terms.
Name, address, phone, website. Independently public-record where the business is operating publicly, but the canonical source on unclaimed listings is Google Business Profiles.
| Field | Source (unclaimed) | Source (claimed) | Retention | Refresh |
|---|---|---|---|---|
| Business name | Google (cached) | Owner submitted | Indefinite, with refresh | On scheduled snapshot |
| Address | Google (cached) | Owner submitted | Indefinite, with refresh | On scheduled snapshot |
| Phone | Google (cached) | Owner submitted | Indefinite, with refresh | On scheduled snapshot |
| Website | Google (cached) | Owner submitted | Indefinite, with refresh | On scheduled snapshot |
| | Not displayed | Owner submitted | Indefinite | Owner-driven |
Per-business fields written via the §94/§104 provenance framework. Each row in provider_field_provenance carries source + last-checked date + display permission.
| Field | Source (unclaimed) | Source (claimed) | Retention | Refresh |
|---|---|---|---|---|
| NPI (NPPES) | CMS NPPES (matched) | CMS NPPES (matched) | Indefinite, refreshed quarterly | Manifest refresh_cadence_days |
| Provider taxonomy code (NPPES) | CMS NPPES (matched) | CMS NPPES (matched) | Indefinite | Quarterly |
| Medicare PECOS enrollment | CMS PECOS (matched) | CMS PECOS (matched) | Indefinite | Monthly per CMS publish cadence |
| State contractor license # | FL DBPR / AZ ROC / CSLB (matched) | Same | Indefinite | Monthly to quarterly per source |
| CMS overall star rating (where applicable) | CMS Care Compare (matched) | Same | Indefinite | Quarterly |
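A refresh check driven by the manifest's refresh_cadence_days might look like the sketch below. The `isDueForRefresh` helper is hypothetical, and treating "quarterly" as 90 days is an assumption made only for the example.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// refreshCadenceDays stands in for the manifest's refresh_cadence_days value.
// Returns true when the row's last-checked date is at least one cadence old.
function isDueForRefresh(
  lastChecked: Date,
  refreshCadenceDays: number,
  now: Date
): boolean {
  const ageDays = (now.getTime() - lastChecked.getTime()) / MS_PER_DAY;
  return ageDays >= refreshCadenceDays;
}

// Illustrative dates, built in UTC so the arithmetic is timezone-independent.
const lastCheckedExample = new Date(Date.UTC(2024, 0, 1)); // 1 Jan 2024
const sixtyDaysLater = new Date(Date.UTC(2024, 2, 1));     // 1 Mar 2024
const ninetyOneDaysLater = new Date(Date.UTC(2024, 3, 1)); // 1 Apr 2024
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.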
Rating + review count. Sourced from Google. Aggregated only at the per-business level — never aggregated to the directory level.
| Field | Source (unclaimed) | Source (claimed) | Retention | Refresh |
|---|---|---|---|---|
| Rating | Google (cached) | Google (cached) | TTL-cached, refreshed on snapshot | Weekly for highly-rated segment; on scheduled snapshot otherwise |
| Review count | Google (cached) | Google (cached) | TTL-cached, refreshed on snapshot | Weekly for highly-rated segment; on scheduled snapshot otherwise |
| Individual review text or author | Not stored | Not stored | n/a | n/a |
Internal flags, identifiers, claim history, billing tier. Generated by OwnListed; never shared with third parties without consent.
| Field | Source (unclaimed) | Source (claimed) | Retention | Refresh |
|---|---|---|---|---|
| Listing identifier (UUID, slug) | Owned by OwnListed | Owned by OwnListed | Indefinite | On rename only |
| Claim status, tier, billing | n/a | Owned by OwnListed | Indefinite while claimed | On owner action |
| Created / updated timestamps | Owned by OwnListed | Owned by OwnListed | Indefinite | Auto |
Every disclaimer in the homepage “What OwnListed does NOT claim” block applies on every /data-provenance, /research, and per-business surface. OwnListed does not independently rate, inspect, verify, endorse, or guarantee any provider — we cite the CMS, NPPES, FL DBPR, and other public-record sources that already measure them.
Field-level gate on the manifest. display_allowed:false fields are captured to provenance for traceability but never rendered. Some are write-locked pending operator copy review.
manifest.fields[].display_allowed
Per-business displays on the listing detail page (Tier-2) or state-level aggregates on the /research hub (Tier-1). Aggregates never attach to individual profiles.
src/app/v/* · /research/* · research_snapshots
The pipeline lives in code. Source-pack manifests at src/lib/data-ingestion/source-packs/; framework helpers at src/lib/data-ingestion/; pilot scripts at scripts/research/.
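For orientation, a source-pack manifest could be typed roughly as below. This is a sketch based only on the attributes this page lists (slug, tier, allowed verticals + states, fetch method, ToS notes, refresh cadence, field-level display rules). Apart from `display_allowed`, `refresh_cadence_days`, and the three fetch-method strings, every property name and every example value is an assumption, not the real manifest format.

```typescript
type FetchMethod = "rest-api" | "bulk-file" | "manual-csv";

// Field-level display rule; display_allowed is the gate named on this page.
interface FieldRule {
  name: string;
  display_allowed: boolean;
}

// Hypothetical manifest shape; property names are illustrative.
interface SourcePackManifest {
  slug: string;
  tier: 1 | 2;
  allowed_verticals: string[];
  allowed_states: string[];
  fetch_method: FetchMethod;
  tos_notes: string;
  refresh_cadence_days: number;
  fields: FieldRule[];
}

// Made-up manifest for illustration; none of these values describe a real source pack.
const exampleManifest: SourcePackManifest = {
  slug: "example-state-license-file",
  tier: 2,
  allowed_verticals: ["hvac"],
  allowed_states: ["FL"],
  fetch_method: "bulk-file",
  tos_notes: "Illustrative placeholder, not a real terms summary.",
  refresh_cadence_days: 30,
  fields: [
    { name: "license_number", display_allowed: true },
    { name: "disciplinary_note", display_allowed: false },
  ],
};

// True when the manifest permits this vertical in this state.
function manifestCovers(
  manifest: SourcePackManifest,
  vertical: string,
  state: string
): boolean {
  return (
    manifest.allowed_verticals.indexOf(vertical) !== -1 &&
    manifest.allowed_states.indexOf(state) !== -1
  );
}
```

Typing `fetch_method` as a string union means an unsupported method fails at compile time rather than during an ingestion run.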
Florida DBPR is the platform's bulk-CSV-published state-licensing source. Five contractor verticals carry state license numbers + classifications + statuses cited from the agency's own publication.
CMS Care Compare publishes facility-quality data for nursing homes, home health, hospice, and dialysis. Four research artifacts live; first Tier-2 vertical (home health) is staged.
BLS OEWS, BEA Regional, HRSA HPSA, and US Census state population. Tier-1 research-only sources that contextualize every Tier-2 vertical with employment, income, shortage-area, and per-capita data.
Fields with high brand-risk (Special Focus Facility status, fines, payment denials, abuse-icon flags, individual quality measures, risk-standardized readmission rates) are captured to provider_field_provenance for traceability — every cell is preserved — but their display_allowed flag stays false until operator copy review approves a renderable framing. The field exists in our database; it does not yet exist on a profile page.
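The capture-but-do-not-render behavior can be sketched as a partition over provenance rows. The `ProvenanceRow` shape, the field names, and the sample values are hypothetical; only the idea of a per-field display flag comes from this page.

```typescript
// Hypothetical, simplified view of a provider_field_provenance row.
interface ProvenanceRow {
  businessId: string;
  source: string;
  field: string;
  sourceValue: string;
  displayAllowed: boolean;
}

interface GateResult {
  rendered: ProvenanceRow[];     // eligible to appear on a profile page
  capturedOnly: ProvenanceRow[]; // stored for traceability, never rendered
}

// Splits rows by the display flag without dropping any of them.
function splitByDisplayGate(rows: ProvenanceRow[]): GateResult {
  return {
    rendered: rows.filter((r) => r.displayAllowed),
    capturedOnly: rows.filter((r) => !r.displayAllowed),
  };
}

// Made-up rows for illustration.
const sampleRows: ProvenanceRow[] = [
  {
    businessId: "biz-1",
    source: "example-source",
    field: "overall_star_rating",
    sourceValue: "4",
    displayAllowed: true,
  },
  {
    businessId: "biz-1",
    source: "example-source",
    field: "special_focus_facility",
    sourceValue: "N",
    displayAllowed: false,
  },
];

const gated = splitByDisplayGate(sampleRows);
```

Both halves are kept, so every captured cell stays traceable even while it is withheld from display.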
If the source dataset doesn't carry a value, we don't infer one. CMS NPPES doesn't publish a phone number for every NPI. CMS Care Compare leaves 35.8% of home-health agencies unrated. CMS Hospice General Information has no overall star rating. In every case the public surface shows the explicit absence — not a model-imputed guess — and the methodology page documents why.
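One way to make an explicit absence a first-class outcome is a discriminated union, so the rendering code cannot quietly substitute a guess. This is a sketch under assumptions: `DisplayCell` and `toDisplayCell` are hypothetical names, and treating an empty string as "not published" is an assumption.

```typescript
// A cell either carries the source's value or states that the source publishes none.
type DisplayCell =
  | { kind: "value"; text: string; source: string }
  | { kind: "not_published"; source: string };

// Maps a raw source value to a display cell; never infers a replacement value.
function toDisplayCell(sourceValue: string | null, source: string): DisplayCell {
  if (sourceValue === null || sourceValue.trim() === "") {
    return { kind: "not_published", source: source };
  }
  return { kind: "value", text: sourceValue, source: source };
}

// Made-up inputs for illustration.
const unratedCell = toDisplayCell(null, "CMS Care Compare");
const ratedCell = toDisplayCell("4", "CMS Care Compare");
```

Because the absent case has no `text` property, a renderer has to handle it separately and cannot fall through to a default value.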
Examples: Owner-managed business description, services list (after claim), photos uploaded via owner portal, hours edits after claim.
Google Business Profile (cached)
Sourced from Google Business Profiles. Subject to the Google Places terms — time-limited cache, attribution requirements, no bulk redistribution. Refreshed on a documented cadence and purged when its TTL expires.
Examples: Rating, review count, hours, services menu (when from Google's category list), and the source of name / address / phone / website on listings that have not yet been claimed.
Federal context
Tier-1 aggregates (research only)
Pipeline
5 contracted stages · all enforced in code
Source pack
Typed manifest + display rules
Ingestion run
Dated execution · record count
Entity match
Confidence ≥ 0.75 · ambiguous suppressed
Field provenance
1 row per (entity × source × field)
Display permission
Field-level gate from manifest
11
Sources registered
113,548
Providers tracked
17
Research snapshots
5
Source families
Legend
Source nodes link to /sources · pipeline detail at seven-stage breakdown.