Skip to content

More precise newsflash coordinates (using Waze data)#1552

Draft
sahare92 wants to merge 10 commits intodata-for-change:devfrom
sahare92:better-precision-news-flash-coordinates
Draft

More precise newsflash coordinates (using Waze data)#1552
sahare92 wants to merge 10 commits intodata-for-change:devfrom
sahare92:better-precision-news-flash-coordinates

Conversation

@sahare92
Copy link
Copy Markdown

Added a simple logic to find the related waze accident alert using information from the newsflash
(Currently it selects a waze accident that occurred around the area of the newsflash accident , up to WAZE_ALERT_NEWSFLASH_DELTA_IN_HOURS hours before the newsflash's time.

It currently only populates the newsflash.waze_alert property, without modifying the lat/lon attributes according to the waze accident (this should be updated after we test it manually, and make sure it indeed improves the precision)

@codecov
Copy link
Copy Markdown

codecov Bot commented Oct 20, 2020

Codecov Report

Merging #1552 into dev will increase coverage by 0.97%.
The diff coverage is 28.20%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev    #1552      +/-   ##
==========================================
+ Coverage   50.38%   51.35%   +0.97%     
==========================================
  Files          62       62              
  Lines        5724     6048     +324     
==========================================
+ Hits         2884     3106     +222     
- Misses       2840     2942     +102     
Flag Coverage Δ
unittests 51.35% <28.20%> (+0.97%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
anyway/parsers/injured_around_schools.py 0.00% <0.00%> (ø)
anyway/parsers/utils.py 10.00% <12.50%> (+10.00%) ⬆️
anyway/parsers/location_extraction.py 26.88% <37.50%> (-41.91%) ⬇️
tests/test_news_flash.py 90.90% <50.00%> (-6.29%) ⬇️
anyway/parsers/__init__.py 100.00% <100.00%> (ø)
anyway/parsers/news_flash_db_adapter.py 60.86% <0.00%> (-8.70%) ⬇️
tests/test_infographic_api.py 50.71% <0.00%> (+0.35%) ⬆️
anyway/infographics_utils.py 87.21% <0.00%> (+6.67%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0fd4f86...c9ef932. Read the comment docs.

@atalyaalon atalyaalon requested review from Mano3 and elazarg October 20, 2020 19:49
Copy link
Copy Markdown

@elazarg elazarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I have added some comments, but they are mostly minor suggestions.

I am not entirely sure the best way of adding SQL rows for tests is using this "add, commit; use; delete, commit" pattern; I'm not an expert, but I think it could be done using something like "add; use; rollback" pattern. Maybe @ziv17 would like to comment on this.

Comment thread anyway/parsers/location_extraction.py
Comment thread anyway/parsers/utils.py
Comment thread anyway/parsers/utils.py Outdated
Comment thread tests/test_news_flash.py Outdated
@ziv17
Copy link
Copy Markdown
Collaborator

ziv17 commented Oct 21, 2020

You can mock the db to return what you need for the test. Using mock you get much faster tests, and it is easier to test for boundary conditions. The tests can run the tests directly from the IDE.

@sahare92
Copy link
Copy Markdown
Author

@ziv17 I think there's a benefit to run the query on the db (instead of mocking), because this way we can test that the generated query is working, and actually returns the expected WazeAlert from the db.
but I can mock the db response if you think it'll be better. WDYT?

@elazarg
Copy link
Copy Markdown

elazarg commented Oct 21, 2020

I agree with Ziv. Mocking is generally better, and it's how other tests are written. If you want to test querying the DB, that would be a different test.

Comment thread tests/test_news_flash.py Outdated
_delete_waze_alert(waze_alert.id)


def _create_waze_accident_alert():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you're using the context manager, there's no need to write these two functions separately. Just inline them.

Comment thread anyway/parsers/location_extraction.py Outdated
.filter(WazeAlert.alert_type == "ACCIDENT")
.filter(
WazeAlert.created_at.between(
newsflash.date - timedelta(hours=WAZE_ALERT_NEWSFLASH_DELTA_IN_HOURS),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this constant is only used inside the function, it's should probably be local constant. This way the occasional reader doesn't have to wonder where else it is used.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it can be a timedelta, making it slightly more type-safe, and making this expression a single line.

@ziv17
Copy link
Copy Markdown
Collaborator

ziv17 commented Oct 21, 2020

@ziv17 I think there's a benefit to run the query on the db (instead of mocking), because this way we can test that the generated query is working, and actually returns the expected WazeAlert from the db.
but I can mock the db response if you think it'll be better. WDYT?

We do need both end-to-end tests and tests to cover the logic. Covering the logic with end-to-end will be slow, unstable, and cumbersome. The best practice is small amount of end-to-end and and reach coverage with unit tests (that use mocking)

@Mano3
Copy link
Copy Markdown
Collaborator

Mano3 commented Oct 24, 2020

Hi @sahare92 , great work!
I'm mostly concerned about the algorithmic aspect.
How were the distances per resolution picked?
How was the time delta picked?
Why taking the first waze alert in the bounding box instead of the closest?
Also, there is no need to query Waze alert from the news flash datetime minus delta until now. News flash will probably appear after the event is already finished so at most it should query up until news flash datetime + delta.
Was this logic checked on a large chunk of existing news flash?
I wish to know some statistics and results of manual review of how this logic fares on a large amount of data before merging.
Thanks.

@sahare92
Copy link
Copy Markdown
Author

sahare92 commented Nov 4, 2020

@Mano3 thanks,

How were the distances per resolution picked?
How was the time delta picked?

  • The distances by resolution were arbitrarily picked for now, until we'll learn what the best values are.
  • The same for time delta, and bounding box

Why taking the first waze alert in the bounding box instead of the closest?

  • We didn't take the closest, because it's not necessarily the right one (for example if the resolution is "עיר" then the geo_location we got from google is just the center of the city, so choosing the closest waze accident doesn't mean it's the right one..but mostly we had hoped that in real data, most times we won't get more than one matching waze accidents (in the time range we will decide to query)

Also, there is no need to query Waze alert from the news flash datetime minus delta until now. News flash will probably appear after the event is already finished so at most it should query up until news flash datetime + delta.

  • Good point, I'll fix that 👍

Was this logic checked on a large chunk of existing news flash? .......

  • No, we didn't test it yet, after talking with @atalyaalon, we thought it'll be a good idea to test this with previous data (hopefully soon, when waze data backfill will be fixed)
    Do you have an idea to test this, other then going manually over the data and see if the chosen waze location was accurate according to the newsflash description? if not, then I guess that's what I'll do once waze data backfill is in

@atalyaalon atalyaalon marked this pull request as draft March 31, 2022 19:51
@atalyaalon atalyaalon marked this pull request as draft March 31, 2022 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants