Approxdata #2213

marqh · 2016-10-25T15:16:10Z

Replace the cmlApproxData pattern, which compares results against whole arrays stored in the source tree (with massive file sizes), with a pattern that compares against array statistics instead.
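A minimal sketch of the "compare statistics instead of whole arrays" idea follows. The helper names `array_stats` and `assert_stats_close`, and the particular statistics recorded, are hypothetical choices for illustration. They are not the helpers added by this PR.

```python
import numpy as np


def array_stats(arr):
    """Summarise an array (masked or not) by a handful of statistics.

    Hypothetical helper: this set of statistics is an assumption for
    illustration, not necessarily what the Iris tests record.
    """
    arr = np.ma.asarray(arr)
    return {
        "shape": arr.shape,
        "masked_count": int(np.ma.count_masked(arr)),
        "mean": float(arr.mean()),
        "std": float(arr.std()),
        "min": float(arr.min()),
        "max": float(arr.max()),
    }


def assert_stats_close(arr, expected, rtol=1e-6):
    """Hypothetical helper: check an array against a stored stats record."""
    actual = array_stats(arr)
    # A record loaded from JSON would hold the shape as a list, so normalise it.
    assert actual["shape"] == tuple(expected["shape"]), "shape differs"
    assert actual["masked_count"] == expected["masked_count"], "mask differs"
    for key in ("mean", "std", "min", "max"):
        np.testing.assert_allclose(actual[key], expected[key], rtol=rtol, err_msg=key)


# Usage: in a real test the record would be read from a small file in the
# source tree instead of being computed on the spot.
data = np.ma.masked_array(
    np.arange(12, dtype=np.float32).reshape(3, 4),
    mask=[[False, False, False, True]] * 3,
)
record = array_stats(data)
assert_stats_close(data, record)  # passes: identical data
```

The stored record is a few numbers per result rather than a full array, which is where the file-size saving would come from. The trade-off is that a statistics check cannot localise which element changed.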

marqh · 2016-10-25T19:55:55Z

Hi @pp-mo

So, there are some epically large tolerances in here, required to get the tests passing.

It is worth comparing some results that assertCMLArrayAlmostEqual happily lets pass at the moment.
For example, it states that the following two arrays are 'close enough':

iris.tests.test_cdm.TestCubeCollapsed.test_multi_d

In an environment with numpy 1.11 (amongst other packages):

masked_array(data = [368.6925354003906 368.68853759765625 368.6865234375 368.6839599609375
 368.6794128417969 368.6705322265625 368.6591796875 368.6497802734375
 368.6462097167969 368.6452941894531 368.6461181640625 368.64453125
 368.6423645019531 368.6401062011719 368.6358642578125 368.6296081542969
 368.6223449707031 368.6148376464844 368.60845947265625 368.60198974609375
 368.595947265625 368.5941467285156 368.5929870605469 368.59637451171875
 368.599609375 368.60394287109375 368.607421875 368.6073913574219
 368.6022644042969 368.59613037109375 368.5901794433594 368.58563232421875
 368.5825500488281 368.58251953125 368.5830078125 368.5833435058594
 368.580810546875 368.5801086425781 368.5786437988281 368.575439453125
 368.57763671875 368.5855407714844 368.5899658203125 368.5853576660156
 368.57391357421875 368.56060791015625 368.5539245605469 368.55755615234375
 368.56439208984375 368.57098388671875 368.5701904296875 368.5653381347656
 368.5633239746094 368.5679931640625 368.57464599609375 368.5804138183594
 368.58209228515625 368.58203125 368.5873718261719 368.5945739746094
 368.6002502441406 368.5993347167969 368.5931396484375 368.5860595703125
 368.58367919921875 368.58673095703125 368.5877990722656 368.5860900878906
 368.5823059082031 368.5749206542969 368.5688171386719 368.5647277832031
 368.5616760253906 368.5543212890625 368.55047607421875 368.5524597167969
 368.55877685546875 368.5687561035156 368.5752258300781 368.5714416503906
 368.56304931640625 368.5572814941406 368.5528869628906 368.54913330078125
 368.54144287109375 368.5326232910156 368.52569580078125 368.5216979980469
 368.5147399902344 368.5064392089844 368.4972229003906 368.4885559082031
 368.4785461425781 368.46881103515625 368.4627380371094 368.46173095703125
 368.4648742675781 368.4660339355469 368.4701843261719 368.4712219238281],
             mask = [False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False],
       fill_value = 1e+20)

In a Travis-like environment with numpy 1.10 (amongst other packages):

masked_array(data = [368.6860656738281 368.68182373046875 368.6794128417969 368.6761779785156
 368.6722106933594 368.6630859375 368.6531066894531 368.6439208984375
 368.6392822265625 368.6369934082031 368.63836669921875 368.6362609863281
 368.6347351074219 368.63232421875 368.62811279296875 368.6224670410156
 368.6158447265625 368.6090087890625 368.6016540527344 368.59521484375
 368.58953857421875 368.5875244140625 368.58795166015625 368.5911560058594
 368.59222412109375 368.5946350097656 368.5977783203125 368.59661865234375
 368.5915222167969 368.5853271484375 368.5790100097656 368.5736999511719
 368.571533203125 368.57135009765625 368.5711364746094 368.5721740722656
 368.5697021484375 368.5675048828125 368.56591796875 368.5644836425781
 368.5673828125 368.5765380859375 368.58074951171875 368.5750427246094
 368.5634460449219 368.54986572265625 368.54437255859375 368.5507507324219
 368.5570068359375 368.5628356933594 368.5613708496094 368.555419921875
 368.55438232421875 368.5602722167969 368.5677185058594 368.5730895996094
 368.5743103027344 368.5738525390625 368.5794677734375 368.5854797363281
 368.59307861328125 368.5924377441406 368.5858154296875 368.5787048339844
 368.57586669921875 368.5794372558594 368.58087158203125 368.5800476074219
 368.5772705078125 368.5704040527344 368.56427001953125 368.5600891113281
 368.55609130859375 368.5489501953125 368.5447998046875 368.54718017578125
 368.5545654296875 368.56390380859375 368.5704345703125 368.56768798828125
 368.5593566894531 368.5531005859375 368.5479431152344 368.5452575683594
 368.5391540527344 368.531494140625 368.5265808105469 368.52398681640625
 368.51812744140625 368.5088195800781 368.4977111816406 368.4886779785156
 368.4778137207031 368.4678955078125 368.4626770019531 368.46124267578125
 368.4656066894531 368.4667053222656 368.4692077636719 368.46917724609375],
             mask = [False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False],
       fill_value = 1e+20)

Those are numerically quite different (the first elements already differ by about 0.006), but our current tests say these differences are all fine (see the liberal use of decimal=1).
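To put decimal=1 in perspective, the sketch below runs numpy's own `np.testing.assert_array_almost_equal` on the first four values of each array above. It assumes that the `decimal` argument used in the Iris tests behaves like numpy's, which checks `abs(desired - actual) < 1.5 * 10**(-decimal)`. We have not traced exactly how assertCMLArrayAlmostEqual forwards it.

```python
import numpy as np

# First four values of each array quoted above.
np111 = np.array([368.6925354003906, 368.68853759765625, 368.6865234375, 368.6839599609375])
np110 = np.array([368.6860656738281, 368.68182373046875, 368.6794128417969, 368.6761779785156])

max_diff = float(np.max(np.abs(np111 - np110)))
print("max abs difference:", max_diff)  # roughly 0.0078

# numpy passes when abs(desired - actual) < 1.5 * 10**(-decimal),
# so decimal=1 accepts element differences of up to 0.15.
outcome = {}
for decimal in (1, 2, 3):
    try:
        np.testing.assert_array_almost_equal(np111, np110, decimal=decimal)
        outcome[decimal] = "pass"
    except AssertionError:
        outcome[decimal] = "fail"
print(outcome)  # {1: 'pass', 2: 'pass', 3: 'fail'}
```

So differences of this size would only start to fail at decimal=3.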

I don't understand what in my dependency tree causes these calculation differences, but my PR hasn't changed the behaviour at all; it merely alters the testing strategy.

pp-mo · 2016-10-26T16:21:05Z

@marqh, regarding the large tolerances and "what in my dependency tree causes these calculation differences":

I've been looking into this, and it is definitely a numpy bug, and a tricksy one at that.
Here's my minimal smoking gun:

from __future__ import print_function

import numpy as np

d2r3 = np.array(
    [[ 297.84573364,  297.82681274,  297.80654907],
     [ 297.83410645,  297.81118774,  297.8013916 ],
     [ 297.8309021 ,  297.80072021,  297.78295898],
     [ 297.83047485,  297.79318237,  297.76339722],
     [ 297.834198  ,  297.78555298,  297.75787354],
     [ 297.83639526,  297.78527832,  297.75598145],
     [ 297.84155273,  297.80130005,  297.77584839],
     [ 297.84802246,  297.8062439 ,  297.78305054],
     [ 297.85061646,  297.80612183,  297.77722168],
     [ 297.83703613,  297.79611206,  297.76531982]], dtype=np.float32)

#
# Use this in a transposed form to provoke the error (in numpy 1.11).
# We suspect this is because the transposed result has ".flags.C_CONTIGUOUS = False".
#
d2 = d2r3.transpose()
md2 = np.ma.asarray(d2)

def mtst(a):
    # Average along the last axis, then take the standard deviation of those averages.
    res = np.std(np.average(a, axis=-1))
    print(type(a), res)

mtst(d2)
mtst(md2)

The output in numpy 1.11 is:

<type 'numpy.ndarray'> 0.0255115
<class 'numpy.ma.core.MaskedArray'> 0.025462

In numpy 1.10, both calls print the first value (which is the correct one).
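One way to probe the C_CONTIGUOUS suspicion is to repeat the same calculation on a C-contiguous copy of the transposed data and compare. The sketch below does that for both plain and masked arrays. We have not run it against numpy 1.11 here. If the suspicion is right, the masked result on the transposed view should be the odd one out on 1.11, while on 1.10 all four should agree.

```python
import numpy as np

# Same 10x3 float32 data as d2r3 in the snippet above.
d2r3 = np.array(
    [[297.84573364, 297.82681274, 297.80654907],
     [297.83410645, 297.81118774, 297.8013916],
     [297.8309021, 297.80072021, 297.78295898],
     [297.83047485, 297.79318237, 297.76339722],
     [297.834198, 297.78555298, 297.75787354],
     [297.83639526, 297.78527832, 297.75598145],
     [297.84155273, 297.80130005, 297.77584839],
     [297.84802246, 297.8062439, 297.78305054],
     [297.85061646, 297.80612183, 297.77722168],
     [297.83703613, 297.79611206, 297.76531982]], dtype=np.float32)

d2 = d2r3.transpose()            # a view with swapped strides: not C-contiguous
d2_c = np.ascontiguousarray(d2)  # same values, copied into C order

print(d2.flags.c_contiguous, d2_c.flags.c_contiguous)  # False True

results = {}
for layout, arr in [("transposed", d2), ("contiguous", d2_c)]:
    for kind, wrap in [("ndarray", np.asarray), ("masked", np.ma.asarray)]:
        results[(layout, kind)] = float(np.std(np.average(wrap(arr), axis=-1)))
        print(layout, kind, results[(layout, kind)])
```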

pp-mo · 2016-10-26T16:21:24Z

Regarding "a numpy bug":

I'm chasing this up now.

pp-mo · 2016-10-28T16:54:31Z

Regarding "a numpy bug":

Not a bug as such, just a different calculation method in some cases, which gives different results (though also definitely 'less correct' ones).
See: #2224 (comment)
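As a generic illustration of how the calculation method alone changes float32 answers, the sketch below sums the same float32 values in two ways. It is not the specific numpy code path discussed in #2224, which we have not confirmed. The first way is strict left-to-right accumulation. The second is pairwise summation. NumPy's documentation for `np.sum` notes that its more accurate partial pairwise summation is only used when summing along the fast axis in memory, which is at least consistent with the contiguity suspicion in the snippet above. The helper names and the data are hypothetical.

```python
import math

import numpy as np


def naive_sum32(arr):
    """Hypothetical helper: add float32 values left to right, rounding after every step."""
    total = np.float32(0.0)
    for v in arr:
        total = np.float32(total + v)
    return total


def pairwise_sum32(arr, block=8):
    """Hypothetical helper: recursive pairwise summation in float32 (small blocks summed naively)."""
    if len(arr) <= block:
        return naive_sum32(arr)
    mid = len(arr) // 2
    return np.float32(pairwise_sum32(arr[:mid], block) + pairwise_sum32(arr[mid:], block))


# Hypothetical data: many float32 values of the same magnitude as the test results.
values = np.full(100000, 368.6, dtype=np.float32)

exact_mean = math.fsum(float(v) for v in values) / len(values)  # correctly rounded reference
naive_mean = float(naive_sum32(values)) / len(values)
pairwise_mean = float(pairwise_sum32(values)) / len(values)

print(exact_mean, naive_mean, pairwise_mean)
```

The naive mean drifts visibly from the reference, while the pairwise mean stays much closer. Both are legitimate float32 calculations, which is the sense in which one result is "less correct" without either being a bug.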

marqh added 2 commits October 24, 2016 20:15

removed npy (7f84e7d)

check stats assertArrayAllClose (4688642)

marqh added the Status: Work in Progress label Oct 25, 2016

marqh added 6 commits October 25, 2016 15:44

tweaks (ec32726)

tweaks (d762cd8)

linear_circular results differ (mask) (7a224f2)

rtol=1 (c384031)

rtol for decimal (563c601)

Update test_analysis.py (79e5738)

marqh changed the base branch from master to v1.11.x October 26, 2016 07:36

pp-mo merged commit bb9d169 into SciTools:v1.11.x Oct 26, 2016

pp-mo removed the Status: Work in Progress label Oct 26, 2016

QuLogic added this to the v1.11 milestone Oct 26, 2016

marqh mentioned this pull request Nov 4, 2016

Perserve dtype of source cube with area weighted regridder #2203

Merged

3 participants
