Hi! I've written crawlers for about a dozen municipal hosting platforms, and you can learn the basics from our "How" page: https://civic.band/how.html
The short answer is: there's no common API for any of these sites, and even the ones that do have an API are sometimes misconfigured. That's why I wrote all the scrapers by hand.
Granicus, it turns out, is six providers in a trench coat. IQM2, NovusAgenda, Legistar, Granicus, PriveGov, and CivicClerk are all Granicus products that share absolutely zero APIs that I've found, and a city running one of them is no guarantee it has any of the others.
Legistar and CivicClerk have actual APIs, which is nice, although it's extremely easy for the City Clerk's staff to trip up and make the Legistar API unusable.
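For the platforms that do expose an API, the crawl can be a simple paging loop. Below is a minimal sketch, assuming a Legistar client's events endpoint lives at `https://webapi.legistar.com/v1/{client}/events` and honors OData-style `$top`/`$skip` parameters; the page size, the `fetch` hook, and the error check are illustrative assumptions, not how CivicBand's crawlers actually work.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Assumed URL pattern for Legistar's public Web API; {client} is the city's slug.
LEGISTAR_BASE = "https://webapi.legistar.com/v1/{client}/events"
PAGE_SIZE = 1000  # assumption: the largest page the server will return


def fetch_json(url: str) -> list:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def legistar_events(
    client: str,
    fetch: Callable[[str], list] = fetch_json,
    page_size: int = PAGE_SIZE,
) -> Iterator[dict]:
    """Yield every event for a Legistar client by paging with $top/$skip."""
    skip = 0
    while True:
        # safe="$" keeps the OData parameter names readable in the URL.
        query = urllib.parse.urlencode({"$top": page_size, "$skip": skip}, safe="$")
        url = LEGISTAR_BASE.format(client=client) + "?" + query
        page = fetch(url)
        if not isinstance(page, list):
            # A misconfigured instance may answer with an error object instead of rows.
            raise ValueError(f"unexpected response from {url!r}: {page!r}")
        yield from page
        if len(page) < page_size:
            return  # a short page means there is nothing left to fetch
        skip += page_size
```

Passing `fetch` in as a parameter keeps the paging logic testable without network access, and the explicit type check turns a broken instance into a loud failure rather than an empty result.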
My experiments with using LLMs to write crawlers for these have been extremely mixed; they're good at getting the first page of data and less good at following weird pagination trails or follow-on requests.
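The part that tends to go wrong is the loop after the first page. Here is a minimal sketch of a crawler that follows "next" links to the end, makes a follow-on request for each listed item, and refuses to spin forever on a pagination cycle. The page shape (`items`, `next`, `detail_url`) is hypothetical; every real platform needs its own parsing.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: {"items": [{"detail_url": ...} or inline data], "next": url or None}
Page = dict


def crawl(
    start_url: str,
    fetch: Callable[[str], Page],
    max_pages: int = 500,
) -> Iterator[dict]:
    """Follow "next" links to the end, issuing a follow-on request per listed item."""
    seen: set[str] = set()
    url: Optional[str] = start_url
    while url is not None:
        if url in seen:
            raise RuntimeError(f"pagination loop detected at {url!r}")
        if len(seen) >= max_pages:
            raise RuntimeError(f"gave up after {max_pages} pages")
        seen.add(url)
        page = fetch(url)
        for item in page.get("items", []):
            # Follow-on request: listings often only link to the actual document.
            detail_url = item.get("detail_url")
            yield fetch(detail_url) if detail_url else item
        url = page.get("next")
```

Raising on a repeated URL or an implausible page count surfaces the "weird pagination trail" cases as errors instead of silently truncated or endless crawls.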
All of this led me to build CivicBand (which tracks every municipality I can get my hands on) and CivicObserver (which provides generalized full-text search alerting for municipalities via email, Mastodon, Bluesky, and Slack webhooks).
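To make "full-text search alerting" concrete, here is a minimal sketch of matching a saved search term against newly scraped minutes and building a Slack incoming-webhook POST (a JSON body with a `text` field). It is not CivicObserver's implementation: the fixed-width context window, the function names, and the webhook URL you would pass in are all assumptions for illustration. The naive window is also a small demonstration of why search context is hard.

```python
import json
import re
import urllib.request


def find_matches(text: str, term: str, context: int = 60) -> list[str]:
    """Return a snippet of surrounding text for each case-insensitive hit of term."""
    snippets = []
    for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        # Collapse OCR line breaks and runs of spaces so the snippet reads cleanly.
        snippets.append(" ".join(text[start:end].split()))
    return snippets


def slack_request(
    webhook_url: str, place: str, term: str, snippets: list[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST to a Slack incoming webhook."""
    lines = [f'New match for "{term}" in {place}:'] + [f"> ...{s}..." for s in snippets]
    body = json.dumps({"text": "\n".join(lines)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then `urllib.request.urlopen(req)`. A fixed character window can cut an agenda item in half or miss that the match sits inside an unrelated item, which is exactly the context problem with OCR'd minutes.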
Yeah, don't get me wrong, they all suck ass, but it's good to know there's one common set of things to scrape to get you lots and lots of cities. Those both sound like very cool projects!
I _think_ (but am not actually certain) we're monitoring more municipal agencies at CivicBand, but I know some of the folks at MuckRock and the work they're doing is absolutely critical.
The question would ultimately get settled in court, I think, but a DA who was feeling cop-aligned and vicious could try to ding you for interfering with police operations by _not_ allowing your plate to get scanned.
These statutes typically aren't written with police enforceability in mind: they criminalize "doing something" rather than "having something installed," and a cop typically isn't going to be around, let alone watching, when you move past statically installed ALPR cameras.
Search context is legitimately hard, especially since this is unstructured text data that (in my experience building CivicBand) needs to be OCR'd, not parsed, for best results.
You might be terrified by the number of municipalities that still post PDFs of scans of printouts of their minutes, which were originally Word documents, and round and round we go.
Part of why I haven't guaranteed results with CivicObserver is how hard search context is. Maybe making this an MCP helps, but I'm not actually sure it does.
This is super important work, and is kind of why I built https://civic.band and https://civic.observer, which are generalized tools for monitoring civic governments. (You can search for anything, not just ALPR.)
This is incredible, great work and will definitely be using and sharing this!
Where in the repos can we find the plugin/scraper for a given municipality, so we can help contribute when one seems to be broken? It looks like the last meetings and agendas scraped for Cook County are from March/April of this year.
- Are there standardized APIs each municipality provides, or do you go through the tedious task of building a per-municipality crawling tool?
- How often do you refresh the data? I checked one city: it has meeting minutes until 6/17, but the official website has more recent minutes (up to 12/2 at least).
- There is absolutely no standardized API for nearly any of this. I build generalized crawlers when I can, and custom crawlers when I need to.
- Can you let me know which city? The crawlers run for every municipality at least once a day, so that's probably a bug.
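The "generalized when I can, custom when I need to" split and the daily run can be sketched as a small dispatch table: one crawler per hosting platform, per-site overrides for the odd ones out, and a check that flags sites whose newest scraped meeting looks too old (the kind of gap described above). Every name here and the 45-day threshold are hypothetical; some bodies meet rarely enough that a fixed threshold would need tuning per site.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Optional

# Hypothetical crawler signature: given a site, return the dates of meetings it found.
Crawler = Callable[["Municipality"], list[date]]


@dataclass
class Municipality:
    slug: str
    platform: str  # e.g. "legistar" or "civicclerk"


GENERIC: dict[str, Crawler] = {}  # one generalized crawler per hosting platform
CUSTOM: dict[str, Crawler] = {}   # hand-written per-site overrides, keyed by slug


def crawler_for(site: Municipality) -> Crawler:
    """Prefer a custom crawler for this site, else the platform's generic one."""
    if site.slug in CUSTOM:
        return CUSTOM[site.slug]
    if site.platform in GENERIC:
        return GENERIC[site.platform]
    raise LookupError(f"no crawler for {site.slug} ({site.platform})")


def daily_run(
    sites: list[Municipality],
    today: date,
    max_age: timedelta = timedelta(days=45),
) -> list[str]:
    """Crawl every site; return slugs that failed or whose newest meeting is stale."""
    flagged = []
    for site in sites:
        try:
            dates = crawler_for(site)(site)
        except Exception:
            flagged.append(site.slug)  # a crash is as suspicious as stale data
            continue
        newest: Optional[date] = max(dates, default=None)
        if newest is None or today - newest > max_age:
            flagged.append(site.slug)
    return flagged
```

Flagging stale results, not just crashes, matters because a crawler that keeps "succeeding" against a site whose layout changed can quietly stop finding new minutes.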
We track City Councils, Boards of Supervisors, really any municipality we can get our hands on. I'm very open to how to make this better!