Can move example to a folder later.
|
I like this PR - it'll get this data out of email and into a standard format in MDS, but it's a small enough change from the current behavior that it's reasonable for cities to begin requiring operators to adopt it in the near future. It would be a happy outcome indeed if, through that adoption, we get operator feedback (from folks besides Spin, thank you @dirkdk and @joshuaandrewjohnson1!). |
|
We should clarify that the k=10 requirement applies to both trips and unique riders. We don't want data shared about a single rider who takes a daily trip, particularly when that rider is a member of a Special Group. |
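To make the proposed rule concrete, here is a minimal sketch of how a provider might apply the threshold to both counts. The function name, the `-1` redaction marker, and the choice to suppress both counts together are assumptions for illustration, not wording from the spec.

```python
K_MIN = 10      # k-value discussed in this thread
REDACTED = -1   # hypothetical marker for a suppressed count

def redact_counts(trip_count, rider_count, k=K_MIN):
    """Return (trip_count, rider_count), suppressing both if either is below k."""
    if trip_count < k or rider_count < k:
        return REDACTED, REDACTED
    return trip_count, rider_count

# One rider taking a daily trip: 30 trips in the month, but only 1 unique rider.
print(redact_counts(30, 1))    # (-1, -1)
print(redact_counts(120, 45))  # (120, 45)
```

Checking only the trip count would let the first case through, which is exactly the single-rider situation described above.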
|
I want to raise one concern about this sentence:
In practice, I worry that this could lead to situations where a small and/or sensitive geography inadvertently becomes part of a report about Special Groups. For example, the city might have a no parking zone around a single public park that is the home of political protests. Clarifying the k-value requirement as per the above would help address this, but it still violates the principle of data minimization. Longer term, I believe the fix for this is allowing the Policy API to describe data sharing requirements through a new Policy Rule type. Shorter term, I wonder if we should modify this to say that the Geographies in Reports should be agreed upon by the City and the Operator? |
I wonder if this is a real concern though? For one, we're talking about a monthly report - so even in the case of a tiny park / geography, you're still not getting any better temporal granularity. The protest happened over a weekend? Great, it's drowned by the rest of the month. The protest happened over the entire month? Great, that's a lot of data aggregation! If there were fewer than 10 riders and/or trips in that geometry over the month, you get no data anyway. We touched on the idea of the "City and Provider should discuss which geographies are returned"; it can definitely work, but it also doesn't scale as well on either side as things change. I like your longer-term vision for Policy models of data sharing agreements. As this is beta (and Geography itself is only now starting to pick up some momentum), we wanted to keep it simple. |
Yes, I will add that clarification, thanks! |
|
@thekaveman Agreed, the concern here is principled (Data Minimization) rather than real world. I gave a pretty poor example, should have been more clear on that point. I'm OK with leaving it as-is as long as we think it will be OK to make a breaking change in the future that limits Reports to only explicitly requested geographies. I believe that as a beta feature that shouldn't be a problem. |
I think keeping it as simple as possible for this version is best, which means not having to communicate which geographies to return ahead of time. This way the entire /reports can be created without any needed communication about the details between the city and provider. I also agree with @thekaveman that in this case the risk is very minimal, since counts in geographies are redacted when the values are low over a month period. It's more data, and possibly some geographies that are not needed, but the effort to calculate and return these is less than the effort/time to communicate and agree on the list of IDs. I also like the idea of modeling data sharing agreements. This idea came up in a WG call by @whereissean - a sort of metadata MDS JSON that communicates all the optional parts of MDS that a city would like providers to support: endpoints, optional fields, IDs (like geography for reports in this case), beta features (like GDEs), optional location data, etc. This would make it clear what is required and what is not. A good proposal for 2.0! |
|
@schnuerle My thoughts exactly. I went ahead and filed #608 to track this. |
Explain pull request
This is a beta implementation of the #569 special groups aggregate metrics issue. It adds a /reports endpoint within the Provider API that handles returning monthly trip counts of special groups, in this case just low income groups.
The response served up is a pre-generated CSV file of all data.
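As a rough illustration of what pre-generating such a file could look like, the sketch below aggregates one month of trips into CSV rows using Python's standard `csv` module. The column names, the input tuple layout, and the `build_report_csv` helper are all hypothetical and not taken from this PR; suppression of low counts is left out to keep the example short.

```python
import csv
import io
from collections import defaultdict

def build_report_csv(trips, month_start):
    """Aggregate (geography_id, special_group_type, rider_id) tuples into CSV text."""
    trip_counts = defaultdict(int)
    riders = defaultdict(set)
    for geography_id, group, rider_id in trips:
        key = (geography_id, group)
        trip_counts[key] += 1          # every tuple is one trip
        riders[key].add(rider_id)      # set keeps riders unique

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Hypothetical header; the real column set is defined by the spec.
    writer.writerow(["start_date", "special_group_type", "geography_id",
                     "trip_count", "rider_count"])
    for geography_id, group in sorted(trip_counts):
        key = (geography_id, group)
        writer.writerow([month_start, group, geography_id,
                         trip_counts[key], len(riders[key])])
    return out.getvalue()

sample = [
    ("geo-a", "low_income", "r1"),
    ("geo-a", "low_income", "r1"),
    ("geo-b", "low_income", "r2"),
]
print(build_report_csv(sample, "2024-01-01"))
```

Because the file is built once per month, the endpoint can serve it as a static response with no per-request computation.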
Is this a breaking change
Impacted Spec
Which spec(s) will this pull request impact?
provider
Additional context
See PR #606 for a version that is dynamic and not static like this one.