Two pipelines, one dashboard: how LawSignals tracks bills and news together
LawSignals runs two independent data pipelines that converge into a single category feed. Bills from scrapers across all 50 states. News from AI-driven semantic matching. Here is the architecture and why it matters for confident legislative monitoring.

The interesting architectural choice in LawSignals is not the AI matching, the BYOK key handling, or the per-state coverage. It is the decision to run two independent pipelines that converge at one layer.
That choice — bill ingestion as one pipeline, news intelligence as another, both writing into the same practice-area schema — is what makes the rest of the product work. This post walks through the architecture, why it is built this way, and what it means for users.
Why two pipelines, not one
Bills and news look superficially similar. Both are content. Both arrive over time. Both are filtered against user interests.
In every other dimension that matters, they are opposite:
| Dimension | Bills | News |
|---|---|---|
| Structure | Highly structured (sponsor, committee, version, action) | Unstructured prose |
| Source | Known set of state legislatures and Congress | Arbitrary RSS, APIs, sites |
| Schema stability | Stable within sessions | None to speak of |
| Cadence | Predictable per source | Continuous, bursty |
| Identification | Authoritative bill numbers | No canonical identifiers |
| Failure mode | Schema drift on site redesigns | Source removal, paywalls, bot detection |
| Matching technique | Keyword and metadata work fine | Semantic matching is required |
A monolithic pipeline that handles both ends up doing both badly. Either bill ingestion gets contaminated with semantic-matching machinery it does not need, or news ingestion gets squeezed into a structured schema it does not fit.
The right answer is two pipelines that share one downstream contract: write into the practice-area schema. Each pipeline can use the techniques appropriate to its data shape without forcing the other to compromise.
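To make that contract concrete, here is a minimal sketch of the shared write path, assuming hypothetical record and store names; this is illustrative, not LawSignals' production code:

```python
from dataclasses import dataclass
from datetime import datetime

# Minimal sketch of the shared downstream contract. Record shape and
# store API are illustrative assumptions, not the real schema.

@dataclass
class PracticeAreaRecord:
    kind: str                     # "bill" or "article"
    practice_area_ids: list[str]  # zero or more matched areas
    payload: dict                 # pipeline-specific fields
    source: str                   # provenance: where it came from
    fetched_at: datetime          # provenance: when it was fetched

def write_record(store, record: PracticeAreaRecord) -> None:
    # The only thing the two pipelines share: one schema, one write path.
    store.insert("practice_area_records", record)
```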
Pipeline one: the bill scraper
The bill scraper pipeline ingests legislative data from all 50 states, DC, and the US Congress.
Sources. Three tiers, used in combination:
- Tier 1 direct APIs for California, Texas, New York, Illinois, Florida, and Congress
- Tier 2 normalized aggregator feeds (Open States, LegiScan) for daily-cadence states
- Tier 3 dedicated scrapers for states with HTML-only or inconsistent feeds
What it captures. Per bill: metadata, sponsors, committee assignments, full text at every version, the action timeline (introduction, referral, hearing, committee vote, floor vote, executive action), and committee hearing schedules where legislatures publish them.
Cadence. Per state, per signal. The top five states refresh at minute granularity for status changes. Mid-tier states refresh hourly to daily. The cadence is labeled per state in the dashboard, not hidden behind a uniform “real-time” claim.
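A per-state cadence table might look like the following sketch; the intervals shown are illustrative stand-ins for the product's actual configuration:

```python
from datetime import timedelta

# Illustrative per-state cadence table; real intervals are product
# configuration and not shown here.

REFRESH_CADENCE = {
    "CA": timedelta(minutes=1),  # top tier: minute granularity
    "TX": timedelta(minutes=1),
    "OH": timedelta(hours=1),    # mid tier: hourly
    "WY": timedelta(days=1),     # long tail: daily
}

def next_fetch(state: str, last_fetch):
    # Default to daily for states without an explicit entry.
    return last_fetch + REFRESH_CADENCE.get(state, timedelta(days=1))
```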
Matching. Keyword and metadata matching against your tracked categories. If you track “AI regulation,” every bill in any state legislature whose title or text mentions the relevant terms gets surfaced.
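The bill-side matching can stay deliberately simple. A minimal sketch, assuming a hypothetical term set per tracked category:

```python
# Minimal keyword pass. The term set and the bill field names are
# illustrative assumptions.

def matches_category(title: str, text: str, terms: set[str]) -> bool:
    haystack = f"{title}\n{text}".lower()
    return any(term.lower() in haystack for term in terms)

ai_regulation_terms = {"artificial intelligence", "automated decision-making", "algorithmic"}
# surfaced = matches_category(bill.title, bill.full_text, ai_regulation_terms)
```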
Failure detection. Schema-drift monitors run continuously. When a state legislature redesigns its site, our scrapers fail loudly, not silently. Coverage gaps trigger pages, not next-week emails.
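A drift check of this kind can be as blunt as validating required fields and raising when they vanish. A sketch, with assumed field names:

```python
# Sketch of a loud drift check: missing fields raise instead of writing
# a partial record. Field names are assumptions.

REQUIRED_FIELDS = {"bill_number", "title", "sponsors", "status", "last_action"}

class SchemaDrift(Exception):
    """Raised so the failure pages on-call rather than passing silently."""

def validate_scraped_bill(record: dict, state: str) -> dict:
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise SchemaDrift(f"{state}: scraper output missing {sorted(missing)}")
    return record
```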
Output. Structured records into the practice-area schema. Each record carries provenance (source, fetch timestamp, scraper version) so downstream consumers can reason about freshness.
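Illustratively, a record with provenance might carry fields like these (names and values are assumptions, not the production schema):

```python
# Illustrative shape of a structured bill record with provenance, so a
# downstream consumer can reason about freshness.
record = {
    "bill_number": "AB 123",
    "state": "CA",
    "status": "referred_to_committee",
    "provenance": {
        "source": "ca_legislature_api",
        "fetched_at": "2025-05-01T14:03:22Z",
        "scraper_version": "ca-scraper/3.2.1",
    },
}
```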
Pipeline two: news intelligence
The news intelligence pipeline ingests unstructured news content and matches it semantically against tracked bills and practice areas.
Sources. Default coverage of the major legal and policy trade press. Customer-configurable: add RSS feeds, news APIs, regulator press release pages, industry blogs, anything reachable.
What it captures. Per article: title, body, publication, author, date, URL. Layout and ads stripped. Quotes and structure preserved.
Embedding. Each article is converted into a semantic vector. Bills are converted with the same model, on a separate cadence. Vectors live in a per-tenant index.
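A sketch of the embedding step, where `embed_text` stands in for whatever model a tenant has configured and `tenant_index` for the per-tenant vector store; all names are assumptions:

```python
# Sketch of the embedding step. embed_text is a stand-in for the
# tenant-configured model (BYOK: the call runs on the customer's key),
# and tenant_index for the per-tenant vector store.

def embed_and_index_article(article: dict, embed_text, tenant_index) -> None:
    vector = embed_text(f"{article['title']}\n{article['body']}")
    tenant_index.upsert(id=article["url"], vector=vector, kind="article")

# Bills go through the same embed_text on their own cadence, so article
# and bill vectors are comparable in one space.
```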
Matching. Articles are scored against bills and against practice-area descriptions. Above a confidence threshold, matches are written. Below the threshold, the article can still attach to a category when its vector falls within that category’s neighborhood.
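The two-stage decision can be sketched as follows, with illustrative thresholds and cosine similarity standing in for the actual scoring function:

```python
import numpy as np

# Sketch of the two-stage routing decision. Thresholds, cosine scoring,
# and the candidate dicts are illustrative assumptions.

BILL_MATCH_THRESHOLD = 0.80    # above this, write an article-bill match
CATEGORY_NEIGHBORHOOD = 0.65   # fallback: attach to a nearby category

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def route_article(article_vec, bill_vecs: dict, category_vecs: dict):
    # Assumes non-empty candidate sets for brevity.
    best_bill, bill_score = max(
        ((bid, cosine(article_vec, v)) for bid, v in bill_vecs.items()),
        key=lambda kv: kv[1],
    )
    if bill_score >= BILL_MATCH_THRESHOLD:
        return ("bill_match", best_bill, bill_score)
    best_cat, cat_score = max(
        ((cid, cosine(article_vec, v)) for cid, v in category_vecs.items()),
        key=lambda kv: kv[1],
    )
    if cat_score >= CATEGORY_NEIGHBORHOOD:
        return ("category_only", best_cat, cat_score)
    return ("no_match", None, 0.0)
```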
BYOK AI. Embedding runs on the customer’s API key by default. Article and bill text are processed through the customer tenant. This matters for regulated industries.
Failure detection. Source health monitors detect when a feed stops, when a site adds bot detection, when content quality drops. Failed sources alert customer admins without blocking the rest of the pipeline.
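A per-source health check might look like this sketch; the staleness window and alert hook are assumptions:

```python
from datetime import datetime, timedelta, timezone

# Sketch of a per-source health check. A failing source alerts customer
# admins without blocking the rest of the pipeline.

STALE_AFTER = timedelta(hours=24)

def check_source(source: dict, alert_admin) -> None:
    now = datetime.now(timezone.utc)
    if now - source["last_success"] > STALE_AFTER:
        alert_admin(f"{source['name']}: no successful fetch in 24h")
    elif source.get("last_http_status") in (403, 429):
        alert_admin(f"{source['name']}: possible bot detection or rate limiting")
```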
Output. News records into the same practice-area schema, with explicit links to matched bills where confidence is high.
The two pipelines run on different infrastructure, different deploy cadences, and different on-call rotations. Coupling them tightly would make every change in one require regression testing in the other. Loose coupling is what lets us iterate on the AI matching layer without risking bill ingestion reliability.
The convergence layer
Both pipelines write into the same practice-area schema. That schema is the contract. It looks roughly like this (a code sketch follows the list):
- Practice area: durable description of what the customer cares about, plus the keywords and embedding used for matching
- Bill: structured legislative record, attached to zero or more practice areas
- Article: news record, attached to zero or more practice areas, optionally linked to specific bills
- Event: a discrete signal (status change, new match, hearing scheduled, news linked) that drives alerts
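As Python dataclasses, the contract might be sketched like this; names and types are illustrative, not the production schema:

```python
from dataclasses import dataclass, field

# Rough sketch of the convergence contract described above.

@dataclass
class PracticeArea:
    id: str
    description: str
    keywords: list[str]       # used by the bill pipeline's keyword pass
    embedding: list[float]    # used by the news pipeline's semantic pass

@dataclass
class Bill:
    id: str
    practice_area_ids: list[str]   # zero or more
    # structured legislative fields elided

@dataclass
class Article:
    id: str
    practice_area_ids: list[str]   # zero or more
    linked_bill_ids: list[str] = field(default_factory=list)  # optional

@dataclass
class Event:
    kind: str        # "status_change" | "new_match" | "hearing_scheduled" | "news_linked"
    subject_id: str  # the bill or article the event concerns
```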
The dashboard, the alert engine, the export layer, and the search index all consume the schema. None of them know which pipeline produced a given record. They do not need to.
This is what loose coupling buys: the matching pipeline can swap embedding models, the bill pipeline can rewrite a state’s scraper, and nothing else has to change.
What this means for users
A user-facing consequence of the architecture: the feed for a practice area is genuinely unified. You do not see a “bills tab” and a “news tab.” You see a chronological feed of every signal that matched the practice area, with bills and news interleaved by recency.
A bill drops, and the recent news coverage about its underlying topic is already attached. A news article publishes, and it appears in the feed even if no bill exists yet to attach it to. When the bill eventually drops, the article gets linked retroactively.
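Retroactive linking falls out of the vector index naturally. A sketch, with an assumed index API and threshold:

```python
# Sketch of retroactive linking: when a bill finally arrives, articles
# already in the tenant's vector index get checked and linked. The index
# and store APIs, and the threshold, are illustrative assumptions.

def on_new_bill(bill_id: str, bill_vec, tenant_index, store, threshold: float = 0.80):
    for hit in tenant_index.search(bill_vec, top_k=50, kind="article"):
        if hit.score >= threshold:
            store.link_article_to_bill(article_id=hit.id, bill_id=bill_id)
            store.emit_event(kind="news_linked", subject_id=bill_id)
```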
This is the workflow most teams want and few products deliver, because most products were built as bill trackers that bolted news on later (or as news readers that bolted bills on later). The two-pipeline-one-schema architecture is the way to get to the unified feed without compromising either input.
If you are evaluating tracking products, ask the vendor to describe how their bill data and news data are joined. The answer tells you whether the unified feed you saw in the demo is real or a screenshot.
Why we did not build one pipeline
We considered the unified pipeline first. It is conceptually cleaner. It has fewer moving parts. It would have shipped faster.
It would also have been worse for users. The constraints that bill ingestion needs (schema fidelity, low-latency status detection, per-state failure isolation) are different from the constraints that news ingestion needs (semantic flexibility, source health monitoring, vector index management). Forcing both into the same pipeline would have meant a worse bill experience to support news, or a worse news experience to support bills.
Splitting them was a slower build. It produced a better product. We are not the first team to make this trade. Most mature data platforms eventually arrive at “specialized ingestion, unified storage, unified access” because the alternative is a slow death by compromise.
Where LawSignals fits
LawSignals runs the two-pipeline architecture in production for legal and compliance teams across all 50 states and Congress. Bills, news, semantic matching, BYOK AI, all converging on a per-practice-area feed.
If you are evaluating tracking platforms, book a demo and we will walk through the architecture with your specific practice areas configured against both pipelines.
