More precise newsflash coordinates (using Waze data) by sahare92 · Pull Request #1552 · data-for-change/anyway

sahare92 · 2020-10-20T19:26:42Z

Added a simple logic to find the related waze accident alert using information from the newsflash
(Currently it selects a waze accident that occurred around the area of the newsflash accident , up to WAZE_ALERT_NEWSFLASH_DELTA_IN_HOURS hours before the newsflash's time.

It currently only populates the newsflash.waze_alert property, without modifying the lat/lon attributes according to the waze accident (this should be updated after we test it manually, and make sure it indeed improves the precision)

codecov · 2020-10-20T19:33:48Z

Codecov Report

Merging #1552 into dev will increase coverage by 0.97%.
The diff coverage is 28.20%.

@@            Coverage Diff             @@
##              dev    #1552      +/-   ##
==========================================
+ Coverage   50.38%   51.35%   +0.97%     
==========================================
  Files          62       62              
  Lines        5724     6048     +324     
==========================================
+ Hits         2884     3106     +222     
- Misses       2840     2942     +102

Flag	Coverage Δ
unittests	`51.35% <28.20%> (+0.97%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
anyway/parsers/injured_around_schools.py	`0.00% <0.00%> (ø)`
anyway/parsers/utils.py	`10.00% <12.50%> (+10.00%)`	⬆️
anyway/parsers/location_extraction.py	`26.88% <37.50%> (-41.91%)`	⬇️
tests/test_news_flash.py	`90.90% <50.00%> (-6.29%)`	⬇️
anyway/parsers/__init__.py	`100.00% <100.00%> (ø)`
anyway/parsers/news_flash_db_adapter.py	`60.86% <0.00%> (-8.70%)`	⬇️
tests/test_infographic_api.py	`50.71% <0.00%> (+0.35%)`	⬆️
anyway/infographics_utils.py	`87.21% <0.00%> (+6.67%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0fd4f86...c9ef932. Read the comment docs.

elazarg

Nice work! I have added some comments, but they are mostly minor suggestions.

I am not entirely sure the best way of adding SQL rows for tests is using this "add, commit; use; delete, commit" pattern; I'm not an expert, but I think it could be done using something like "add; use; rollback" pattern. Maybe @ziv17 would like to comment on this.

ziv17 · 2020-10-21T03:44:20Z

You can mock the db to return what you need for the test. Using mock you get much faster tests, and it is easier to test for boundary conditions. The tests can run the tests directly from the IDE.

…hub.com/sahare92/anyway into better-precision-news-flash-coordinates

sahare92 · 2020-10-21T12:35:08Z

@ziv17 I think there's a benefit to run the query on the db (instead of mocking), because this way we can test that the generated query is working, and actually returns the expected WazeAlert from the db.
but I can mock the db response if you think it'll be better. WDYT?

elazarg · 2020-10-21T12:53:38Z

I agree with Ziv. Mocking is generally better, and it's how other tests are written. If you want to test querying the DB, that would be a different test.

elazarg · 2020-10-21T12:55:05Z

+        _delete_waze_alert(waze_alert.id)
+
+
+def _create_waze_accident_alert():


Now that you're using the context manager, there's no need to write these two functions separately. Just inline them.

elazarg · 2020-10-21T12:59:20Z

+        .filter(WazeAlert.alert_type == "ACCIDENT")
+        .filter(
+            WazeAlert.created_at.between(
+                newsflash.date - timedelta(hours=WAZE_ALERT_NEWSFLASH_DELTA_IN_HOURS),


Since this constant is only used inside the function, it's should probably be local constant. This way the occasional reader doesn't have to wonder where else it is used.

Also, it can be a timedelta, making it slightly more type-safe, and making this expression a single line.

ziv17 · 2020-10-21T14:39:45Z

@ziv17 I think there's a benefit to run the query on the db (instead of mocking), because this way we can test that the generated query is working, and actually returns the expected WazeAlert from the db.
but I can mock the db response if you think it'll be better. WDYT?

We do need both end-to-end tests and tests to cover the logic. Covering the logic with end-to-end will be slow, unstable, and cumbersome. The best practice is small amount of end-to-end and and reach coverage with unit tests (that use mocking)

Mano3 · 2020-10-24T06:32:22Z

Hi @sahare92 , great work!
I'm mostly concerned about the algorithmic aspect.
How were the distances per resolution picked?
How was the time delta picked?
Why taking the first waze alert in the bounding box instead of the closest?
Also, there is no need to query Waze alert from the news flash datetime minus delta until now. News flash will probably appear after the event is already finished so at most it should query up until news flash datetime + delta.
Was this logic checked on a large chunk of existing news flash?
I wish to know some statistics and results of manual review of how this logic fares on a large amount of data before merging.
Thanks.

sahare92 · 2020-11-04T17:05:48Z

@Mano3 thanks,

How were the distances per resolution picked?
How was the time delta picked?

The distances by resolution were arbitrarily picked for now, until we'll learn what the best values are.
The same for time delta, and bounding box

Why taking the first waze alert in the bounding box instead of the closest?

We didn't take the closest, because it's not necessarily the right one (for example if the resolution is "עיר" then the geo_location we got from google is just the center of the city, so choosing the closest waze accident doesn't mean it's the right one..but mostly we had hoped that in real data, most times we won't get more than one matching waze accidents (in the time range we will decide to query)

Also, there is no need to query Waze alert from the news flash datetime minus delta until now. News flash will probably appear after the event is already finished so at most it should query up until news flash datetime + delta.

Good point, I'll fix that 👍

Was this logic checked on a large chunk of existing news flash? .......

No, we didn't test it yet, after talking with @atalyaalon, we thought it'll be a good idea to test this with previous data (hopefully soon, when waze data backfill will be fixed)
Do you have an idea to test this, other then going manually over the data and see if the chosen waze location was accurate according to the newsflash description? if not, then I guess that's what I'll do once waze data backfill is in

sahare92 added 4 commits October 20, 2020 19:07

added logic skeleton

02e0f38

Merge branch 'dev' into better-precision-news-flash-coordinates

08ddb7a

added mechanism to get the related waze accident

55a4bee

black lint

570db17

atalyaalon requested review from Mano3 and elazarg October 20, 2020 19:49

Merge branch 'dev' into better-precision-news-flash-coordinates

17ac527

elazarg reviewed Oct 20, 2020

View reviewed changes

Comment thread anyway/parsers/location_extraction.py

Comment thread anyway/parsers/utils.py

Comment thread anyway/parsers/utils.py Outdated

Comment thread tests/test_news_flash.py Outdated

sahare92 added 2 commits October 21, 2020 15:30

fix review stuff

dd2b36a

Merge branch 'better-precision-news-flash-coordinates' of https://git…

6d74a58

…hub.com/sahare92/anyway into better-precision-news-flash-coordinates

elazarg reviewed Oct 21, 2020

View reviewed changes

sahare92 added 2 commits October 22, 2020 11:20

change test to unit test

be81f6e

black lint

f7b383f

atalyaalon mentioned this pull request Nov 1, 2020

Intersect data from Waze with news flash accidents #1189

Open

fix waze alerts query time range

c9ef932

atalyaalon marked this pull request as draft March 31, 2022 19:51

		_delete_waze_alert(waze_alert.id)


		def _create_waze_accident_alert():

Conversation

sahare92 commented Oct 20, 2020

Uh oh!

codecov Bot commented Oct 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

elazarg left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ziv17 commented Oct 21, 2020 • edited by elazarg Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sahare92 commented Oct 21, 2020

Uh oh!

elazarg commented Oct 21, 2020

Uh oh!

elazarg Oct 21, 2020

Choose a reason for hiding this comment

Uh oh!

elazarg Oct 21, 2020

Choose a reason for hiding this comment

Uh oh!

elazarg Oct 21, 2020

Choose a reason for hiding this comment

Uh oh!

ziv17 commented Oct 21, 2020

Uh oh!

Mano3 commented Oct 24, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sahare92 commented Nov 4, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

codecov Bot commented Oct 20, 2020 •

edited

Loading

ziv17 commented Oct 21, 2020 •

edited by elazarg

Loading

Mano3 commented Oct 24, 2020 •

edited

Loading