xtas/tests/test_corenlp.py (4 changes: 2 additions & 2 deletions)
@@ -61,7 +61,7 @@ def test_lemmatize_unicode():
annotators=['tokenize', 'ssplit', 'pos', 'lemma'])
saf = stanford_to_saf(lines)
assert_equal({t['lemma'] for t in saf['tokens']},
-                 {'Cesar', 'hit', 'Hovik'})
+                 {'Cesar', 'hit', 'Hovik'}) #
Member Author: Why is the order reversed here relative to the input?

Collaborator: {...} is set literal syntax, so the order is irrelevant; set equality compares membership only. Judging by the stanford_to_saf output, the order shouldn't matter anyway, because each token carries an offset.
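The set semantics the collaborator describes can be verified in isolation, with plain Python and no xtas at all:

```python
# A set literal or comprehension is unordered: equality compares
# membership only, never the order elements were written or inserted.
tokens = [{'lemma': 'Hovik'}, {'lemma': 'hit'}, {'lemma': 'Cesar'}]
lemmas = {t['lemma'] for t in tokens}        # set comprehension
assert lemmas == {'Cesar', 'hit', 'Hovik'}   # passes regardless of order
assert {'a', 'b'} == {'b', 'a'}              # order never matters for sets
```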



def test_ner():
@@ -92,7 +92,7 @@ def test_parse():
assert_equal(corefs, {tuple(sorted(['John', 'himself']))})


-def test_multiple_sentences():
+def test_multiple_sentences(): #
Member Author: This seems to be testing CoreNLP more than xtas; maybe a bit too extensive.

Collaborator: It's also checking whether stanford_to_saf preserves the information it gets from CoreNLP.

_check_corenlp()
p = parse("John lives in Amsterdam. He works in London")
saf = stanford_to_saf(p)
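The preservation check the collaborator describes can be sketched against a hand-built stand-in for the converter's output. Note the token field names ('word', 'sentence') are assumptions for illustration, not the verified stanford_to_saf schema:

```python
# Stand-in SAF dict for "John lives in Amsterdam. He works in London";
# field names are assumed, not taken from real stanford_to_saf output.
saf = {'tokens': [
    {'word': 'John', 'sentence': 1},
    {'word': 'lives', 'sentence': 1},
    {'word': 'in', 'sentence': 1},
    {'word': 'Amsterdam', 'sentence': 1},
    {'word': 'He', 'sentence': 2},
    {'word': 'works', 'sentence': 2},
    {'word': 'in', 'sentence': 2},
    {'word': 'London', 'sentence': 2},
]}
# Both input sentences should survive the conversion...
assert {t['sentence'] for t in saf['tokens']} == {1, 2}
# ...and each sentence should keep its own tokens, in order.
sent2 = [t['word'] for t in saf['tokens'] if t['sentence'] == 2]
assert sent2 == ['He', 'works', 'in', 'London']
```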
xtas/tests/test_es.py (2 changes: 1 addition & 1 deletion)
@@ -86,7 +86,7 @@ def test_query_batch():
assert_equal(set(b), {"test", "test2"})


-def test_store_get_result():
+def test_store_get_result(): #
Member Author: Should we test deleting something, or does that never happen?

"test whether results can be stored and retrieved"
from xtas.tasks.es import (
store_single,
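The deletion concern boils down to a round-trip pattern. A dict-backed toy store can illustrate what such a test would cover; these function names are illustrative only, not the xtas.tasks.es interface:

```python
# Toy in-memory store standing in for the ES-backed result store;
# the function names here are hypothetical, not the xtas API.
_store = {}

def store_result(doc_id, task, result):
    _store[(doc_id, task)] = result

def get_result(doc_id, task):
    return _store.get((doc_id, task))

def delete_result(doc_id, task):
    _store.pop((doc_id, task), None)

store_result('doc1', 'tokenize', ['a', 'b'])
assert get_result('doc1', 'tokenize') == ['a', 'b']  # store/get round-trip
delete_result('doc1', 'tokenize')
assert get_result('doc1', 'tokenize') is None        # deletion covered too
```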
xtas/tests/test_frog.py (2 changes: 1 addition & 1 deletion)
@@ -94,4 +94,4 @@ def test_frog_task():
assert_equal(tokens[0]['lemma'], 'dit')
saf = frog("dit is een test", output='saf')
assert_equal(len(saf['tokens']), 4)
-    assert_equal(saf['header']['processed'][0]['module'], 'frog')
+    assert_equal(saf['header']['processed'][0]['module'], 'frog') #
Member Author: Should we spot-check the output as well, like for the other output formats, to be a bit more sure that it worked as intended?

xtas/tests/test_pipeline.py (2 changes: 1 addition & 1 deletion)
@@ -44,7 +44,7 @@ def test_pipeline():
s = "cats are furry"
expected = [('cats', 'NNS'), ('are', 'VBP'), ('furry', 'JJ')]
result = pos_tag(tokenize(s), 'nltk')
-    assert_equal(result, expected)
+    assert_equal(result, expected) #
Member Author: First a check without the pipeline, then with, so you can easily see whether the pipeline is at fault or not. Nice!

with eager_celery():
# do we get correct result from pipeline?
r = pipeline(s, [{"module": tokenize},
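The baseline-then-pipeline pattern the author praises generalizes well. A toy version (these tokenize/pos_tag/pipeline stubs are stand-ins, not the xtas implementations) shows why the first assertion isolates failures:

```python
# Toy stand-ins for the xtas functions, to show the testing pattern:
# establish a baseline with the steps called directly, then assert the
# composed pipeline reproduces it. If only the second check fails, the
# pipeline machinery (not the steps) is at fault.
def tokenize(s):
    return s.split()

def pos_tag(tokens):
    return [(t, 'TAG') for t in tokens]  # toy tagger

def pipeline(data, steps):
    for step in steps:
        data = step(data)
    return data

s = "cats are furry"
expected = pos_tag(tokenize(s))           # baseline: steps run directly
assert pipeline(s, [tokenize, pos_tag]) == expected
```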