Two pipelines, one dashboard: how LawSignals tracks bills and news together
LawSignals runs two independent data pipelines that converge into a single category feed. Bills from scrapers across all 50 states. News from AI-driven semantic matching. Here is the architecture and why it matters for confident legislative monitoring.

The interesting architectural choice in LawSignals is not the AI matching, the BYOK key handling, or the per-state coverage. It is the decision to run two independent pipelines that converge at one layer.
That choice — bill ingestion as one pipeline, news intelligence as another, both writing into the same practice-area schema — is what makes the rest of the product work. This post walks through the architecture, why it is built this way, and what it means for users.
Why two pipelines, not one
Bills and news look superficially similar. Both are content. Both arrive over time. Both are filtered against user interests.
In every other dimension that matters, they are opposite:
| Dimension | Bills | News |
|---|---|---|
| Structure | Highly structured (sponsor, committee, version, action) | Unstructured prose |
| Source | Known set of state legislatures and Congress | Arbitrary RSS, APIs, sites |
| Schema stability | Stable within sessions | None to speak of |
| Cadence | Predictable per source | Continuous, bursty |
| Identification | Authoritative bill numbers | No canonical identifiers |
| Failure mode | Schema drift on site redesigns | Source removal, paywalls, bot detection |
| Matching technique | Keyword and metadata work fine | Semantic matching is required |
A monolithic pipeline that handles both ends up doing both badly. Either bill ingestion gets contaminated with semantic-matching machinery it does not need, or news ingestion gets squeezed into a structured schema it does not fit.
The right answer is two pipelines that share one downstream contract: write into the practice-area schema. Each pipeline can use the techniques appropriate to its data shape without forcing the other to compromise.
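To make that contract concrete, here is a minimal sketch of the shared write path, assuming hypothetical record and store names; this is illustrative, not LawSignals' production code:

```python
from dataclasses import dataclass
from datetime import datetime

# Minimal sketch of the shared downstream contract. Record shape and
# store API are illustrative assumptions, not the real schema.

@dataclass
class PracticeAreaRecord:
    kind: str                     # "bill" or "article"
    practice_area_ids: list[str]  # zero or more matched areas
    payload: dict                 # pipeline-specific fields
    source: str                   # provenance: where it came from
    fetched_at: datetime          # provenance: when it was fetched

def write_record(store, record: PracticeAreaRecord) -> None:
    # The only thing the two pipelines share: one schema, one write path.
    store.insert("practice_area_records", record)
```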
Pipeline one: the bill scraper
The bill scraper pipeline ingests legislative data from all 50 states, DC, and the US Congress.
Sources. Three tiers, used in combination:
- Tier 1 direct APIs for California, Texas, New York, Illinois, Florida, and Congress
- Tier 2 normalized aggregator feeds (Open States, LegiScan) for daily-cadence states
- Tier 3 dedicated scrapers for states with HTML-only or inconsistent feeds
What it captures. Per bill: metadata, sponsors, committee assignments, full text at every version, the action timeline (introduction, referral, hearing, committee vote, floor vote, executive action), and committee hearing schedules where legislatures publish them.
Cadence. Per state, per signal. The top five states refresh at minute granularity for status changes. Mid-tier states refresh hourly to daily. The cadence is labeled per state in the dashboard, not hidden behind a uniform “real-time” claim.
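A per-state cadence table might look like the following sketch; the intervals shown are illustrative stand-ins for the product's actual configuration:

```python
from datetime import timedelta

# Illustrative per-state cadence table; real intervals are product
# configuration and not shown here.

REFRESH_CADENCE = {
    "CA": timedelta(minutes=1),  # top tier: minute granularity
    "TX": timedelta(minutes=1),
    "OH": timedelta(hours=1),    # mid tier: hourly
    "WY": timedelta(days=1),     # long tail: daily
}

def next_fetch(state: str, last_fetch):
    # Default to daily for states without an explicit entry.
    return last_fetch + REFRESH_CADENCE.get(state, timedelta(days=1))
```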
Matching. Keyword and metadata matching against your tracked categories. If you track “AI regulation,” every bill in any state legislature whose title or text mentions the relevant terms gets surfaced.
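The bill-side matching can stay deliberately simple. A minimal sketch, assuming a hypothetical term set per tracked category:

```python
# Minimal keyword pass. The term set and the bill field names are
# illustrative assumptions.

def matches_category(title: str, text: str, terms: set[str]) -> bool:
    haystack = f"{title}\n{text}".lower()
    return any(term.lower() in haystack for term in terms)

ai_regulation_terms = {"artificial intelligence", "automated decision-making", "algorithmic"}
# surfaced = matches_category(bill.title, bill.full_text, ai_regulation_terms)
```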
Failure detection. Schema-drift monitors run continuously. When a state legislature redesigns its site, our scrapers fail loudly, not silently. Coverage gaps trigger pages, not next-week emails.
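A drift check of this kind can be as blunt as validating required fields and raising when they vanish. A sketch, with assumed field names:

```python
# Sketch of a loud drift check: missing fields raise instead of writing
# a partial record. Field names are assumptions.

REQUIRED_FIELDS = {"bill_number", "title", "sponsors", "status", "last_action"}

class SchemaDrift(Exception):
    """Raised so the failure pages on-call rather than passing silently."""

def validate_scraped_bill(record: dict, state: str) -> dict:
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise SchemaDrift(f"{state}: scraper output missing {sorted(missing)}")
    return record
```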
Output. Structured records into the practice-area schema. Each record carries provenance (source, fetch timestamp, scraper version) so downstream consumers can reason about freshness.
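Illustratively, a record with provenance might carry fields like these (names and values are assumptions, not the production schema):

```python
# Illustrative shape of a structured bill record with provenance, so a
# downstream consumer can reason about freshness.
record = {
    "bill_number": "AB 123",
    "state": "CA",
    "status": "referred_to_committee",
    "provenance": {
        "source": "ca_legislature_api",
        "fetched_at": "2025-05-01T14:03:22Z",
        "scraper_version": "ca-scraper/3.2.1",
    },
}
```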
Pipeline two: news intelligence
The news intelligence pipeline ingests unstructured news content and matches it semantically against tracked bills and practice areas.
Sources. Default coverage of the major legal and policy trade press. Customer-configurable: add RSS feeds, news APIs, regulator press release pages, industry blogs, anything reachable.
What it captures. Per article: title, body, publication, author, date, URL. Layout and ads stripped. Quotes and structure preserved.
Embedding. Each article is converted into a semantic vector. Bills are converted with the same model, on a separate cadence. Vectors live in a per-tenant index.
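A sketch of the embedding step, where `embed_text` stands in for whatever model a tenant has configured and `tenant_index` for the per-tenant vector store; all names are assumptions:

```python
# Sketch of the embedding step. embed_text is a stand-in for the
# tenant-configured model (BYOK: the call runs on the customer's key),
# and tenant_index for the per-tenant vector store.

def embed_and_index_article(article: dict, embed_text, tenant_index) -> None:
    vector = embed_text(f"{article['title']}\n{article['body']}")
    tenant_index.upsert(id=article["url"], vector=vector, kind="article")

# Bills go through the same embed_text on their own cadence, so article
# and bill vectors are comparable in one space.
```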
Matching. Articles are scored against bills and against practice-area descriptions. Above a confidence threshold, matches are written. Below the threshold, the article can still attach to a category when its vector falls within that category’s neighborhood.
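The two-stage decision can be sketched as follows, with illustrative thresholds and cosine similarity standing in for the actual scoring function:

```python
import numpy as np

# Sketch of the two-stage routing decision. Thresholds, cosine scoring,
# and the candidate dicts are illustrative assumptions.

BILL_MATCH_THRESHOLD = 0.80    # above this, write an article-bill match
CATEGORY_NEIGHBORHOOD = 0.65   # fallback: attach to a nearby category

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def route_article(article_vec, bill_vecs: dict, category_vecs: dict):
    # Assumes non-empty candidate sets for brevity.
    best_bill, bill_score = max(
        ((bid, cosine(article_vec, v)) for bid, v in bill_vecs.items()),
        key=lambda kv: kv[1],
    )
    if bill_score >= BILL_MATCH_THRESHOLD:
        return ("bill_match", best_bill, bill_score)
    best_cat, cat_score = max(
        ((cid, cosine(article_vec, v)) for cid, v in category_vecs.items()),
        key=lambda kv: kv[1],
    )
    if cat_score >= CATEGORY_NEIGHBORHOOD:
        return ("category_only", best_cat, cat_score)
    return ("no_match", None, 0.0)
```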
BYOK AI. Embedding runs on the customer’s API key by default. Article and bill text are processed through the customer tenant. This matters for regulated industries.
Failure detection. Source health monitors detect when a feed stops, when a site adds bot detection, when content quality drops. Failed sources alert customer admins without blocking the rest of the pipeline.
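A per-source health check might look like this sketch; the staleness window and alert hook are assumptions:

```python
from datetime import datetime, timedelta, timezone

# Sketch of a per-source health check. A failing source alerts customer
# admins without blocking the rest of the pipeline.

STALE_AFTER = timedelta(hours=24)

def check_source(source: dict, alert_admin) -> None:
    now = datetime.now(timezone.utc)
    if now - source["last_success"] > STALE_AFTER:
        alert_admin(f"{source['name']}: no successful fetch in 24h")
    elif source.get("last_http_status") in (403, 429):
        alert_admin(f"{source['name']}: possible bot detection or rate limiting")
```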
Output. News records into the same practice-area schema, with explicit links to matched bills where confidence is high.
The two pipelines run on different infrastructure, different deploy cadences, and different on-call rotations. Coupling them tightly would make every change in one require regression testing in the other. Loose coupling is what lets us iterate on the AI matching layer without risking bill ingestion reliability.
The convergence layer
Both pipelines write into the same practice-area schema. That schema is the contract. It looks roughly like this (a code sketch follows the list):
- Practice area: durable description of what the customer cares about, plus the keywords and embedding used for matching
- Bill: structured legislative record, attached to zero or more practice areas
- Article: news record, attached to zero or more practice areas, optionally linked to specific bills
- Event: a discrete signal (status change, new match, hearing scheduled, news linked) that drives alerts
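As Python dataclasses, the contract might be sketched like this; names and types are illustrative, not the production schema:

```python
from dataclasses import dataclass, field

# Rough sketch of the convergence contract described above.

@dataclass
class PracticeArea:
    id: str
    description: str
    keywords: list[str]       # used by the bill pipeline's keyword pass
    embedding: list[float]    # used by the news pipeline's semantic pass

@dataclass
class Bill:
    id: str
    practice_area_ids: list[str]   # zero or more
    # structured legislative fields elided

@dataclass
class Article:
    id: str
    practice_area_ids: list[str]   # zero or more
    linked_bill_ids: list[str] = field(default_factory=list)  # optional

@dataclass
class Event:
    kind: str        # "status_change" | "new_match" | "hearing_scheduled" | "news_linked"
    subject_id: str  # the bill or article the event concerns
```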
The dashboard, the alert engine, the export layer, and the search index all consume the schema. None of them know which pipeline produced a given record. They do not need to.
This is what loose coupling buys: the matching pipeline can swap embedding models, the bill pipeline can rewrite a state’s scraper, and nothing else has to change.
What this means for users
A user-facing consequence of the architecture: the feed for a practice area is genuinely unified. You do not see a “bills tab” and a “news tab.” You see a chronological feed of every signal that matched the practice area, with bills and news interleaved by recency.
A bill drops, and the recent news coverage about its underlying topic is already attached. A news article publishes, and it appears in the feed even if no bill exists yet to attach it to. When the bill eventually drops, the article gets linked retroactively.
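Retroactive linking falls out of the vector index naturally. A sketch, with an assumed index API and threshold:

```python
# Sketch of retroactive linking: when a bill finally arrives, articles
# already in the tenant's vector index get checked and linked. The index
# and store APIs, and the threshold, are illustrative assumptions.

def on_new_bill(bill_id: str, bill_vec, tenant_index, store, threshold: float = 0.80):
    for hit in tenant_index.search(bill_vec, top_k=50, kind="article"):
        if hit.score >= threshold:
            store.link_article_to_bill(article_id=hit.id, bill_id=bill_id)
            store.emit_event(kind="news_linked", subject_id=bill_id)
```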
This is the workflow most teams want and few products deliver, because most products were built as bill trackers that bolted news on later (or as news readers that bolted bills on later). The two-pipeline-one-schema architecture is the way to get to the unified feed without compromising either input.
If you are evaluating tracking products, ask the vendor to describe how their bill data and news data are joined. The answer tells you whether the unified feed you saw in the demo is real or a screenshot.
Why we did not build one pipeline
We considered the unified pipeline first. It is conceptually cleaner. It has fewer moving parts. It would have shipped faster.
It would also have been worse for users. The constraints that bill ingestion needs (schema fidelity, low-latency status detection, per-state failure isolation) are different from the constraints that news ingestion needs (semantic flexibility, source health monitoring, vector index management). Forcing both into the same pipeline would have meant a worse bill experience to support news, or a worse news experience to support bills.
Splitting them was a slower build. It produced a better product. We are not the first team to make this trade. Most mature data platforms eventually arrive at “specialized ingestion, unified storage, unified access” because the alternative is a slow death by compromise.
Where LawSignals fits
LawSignals runs the two-pipeline architecture in production for legal and compliance teams across all 50 states and Congress. Bills, news, semantic matching, BYOK AI, all converging on a per-practice-area feed.
If you are evaluating tracking platforms, book a demo and we will walk through the architecture with your specific practice areas configured against both pipelines.
