add nan corr tests #247

raybellwaves · 2021-01-16T04:50:01Z

Closes #246

refactoring

Ignore the name of the branch. I worked on adding NaNs to the correlation tests.

I in fact did a little bit of work on #245, but this doesn't close that.

I took out the has_nan parameter from the tests and instead moved the NaN cases to a separate test, where I could use a_nan from conftest.py. Unfortunately this means duplicating most of the code in the test function, but from reading some pytest docs, they recommend using fixtures in conftest.py as function inputs. I'm not sure whether we should create another function inside test_deterministic or even move some code upstream to conftest.py to avoid the duplication.

aaronspring · 2021-01-16T21:22:15Z

EDIT: Not too much related to this PR.
We use pytest-lazy-fixture in climpred to parametrize fixtures.
If it is only about chunking or setting a few NaNs, you can parametrize the input as xr.Dataset, xr.DataArray, chunked, or has_nans and call a function depending on the input.

aaronspring

I am doubting the purpose of these tests now. Aren’t we essentially only testing whether the xs.metric function does the same as the np_deterministic function? Is there a reason why this should not work? Isn’t this all independent of chunking and nans?

This feedback is not too much about your changes to the code but rather what the tests check, which is not useful IMO. I would rather like to see tests checking that nans change the result for skipna True or False. Maybe this would then go into another file though...
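
A minimal sketch of the kind of test described above, checking that a NaN changes the result depending on skipna. The pearson_r helper here is a hypothetical NumPy stand-in written for illustration, not the xskillscore implementation, and the input values are made up.

```python
import numpy as np


def pearson_r(a, b, skipna=False):
    """Hypothetical stand-in for a correlation metric with a skipna switch."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if skipna:
        # Drop every pair in which either value is NaN.
        valid = ~(np.isnan(a) | np.isnan(b))
        a, b = a[valid], b[valid]
    a_anom = a - a.mean()
    b_anom = b - b.mean()
    return (a_anom * b_anom).sum() / np.sqrt((a_anom**2).sum() * (b_anom**2).sum())


def test_nan_changes_result_depending_on_skipna():
    a = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
    b = np.array([2.0, 1.0, 3.0, 5.0, 4.0])

    # skipna=False: a single NaN propagates into the result.
    assert np.isnan(pearson_r(a, b, skipna=False))

    # skipna=True: the result is finite and equals the correlation
    # computed on the pairwise-complete values only.
    keep = ~np.isnan(a)
    expected = np.corrcoef(a[keep], b[keep])[0, 1]
    assert np.isclose(pearson_r(a, b, skipna=True), expected)
```

A test of this shape fails if skipna is silently ignored, which a pure "same result as the NumPy function" comparison would not necessarily catch.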

aaronspring · 2021-01-16T21:28:51Z

xskillscore/tests/test_deterministic.py

+@pytest.mark.parametrize("dim", AXES)
+@pytest.mark.parametrize("weight_bool", [True, False])
+@pytest.mark.parametrize("skipna", [True, False])
+def test_distance_metrics_xr_nan(


This test could be parametrized with a lazy fixture.

aaronspring · 2021-01-16T21:31:08Z

xskillscore/tests/test_deterministic.py

 ):
    """Test whether distance metrics for xarray functions can be lazy when
-    chunked by using dask and give same results."""
+    chunked by using dask and give same results as np array."""


Yes, the test name was misleading. We don't really check whether the result is also chunked, as we often do when a test function is named _dask.

aaronspring · 2021-01-16T21:33:16Z

xskillscore/tests/test_deterministic.py

+    else:
+        expected = metric(a.load(), b.load(), dim, _weights, skipna=skipna)
+    assert expected.chunks is None
+    assert_allclose(actual.compute(), expected)


We don't even need all the computes and loads. Assert triggers that on its own...
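
A small sketch of that point, assuming assert_allclose is xarray.testing.assert_allclose and that dask is installed: the assertion accepts a lazy, dask-backed result directly and evaluates it when comparing. The array contents and the mean reduction are made up for illustration and are not one of the metrics under test.

```python
import numpy as np
import xarray as xr
from xarray.testing import assert_allclose

# Illustrative data only; these are not the fixtures from conftest.py.
a = xr.DataArray(np.arange(24.0).reshape(2, 3, 4), dims=["time", "lat", "lon"])
a_dask = a.chunk({"time": 1})  # same values, backed by a dask array

expected = a.mean("time")     # eager, numpy-backed
actual = a_dask.mean("time")  # lazy, dask-backed

assert actual.chunks is not None  # the result is still lazy
assert expected.chunks is None    # the numpy-backed result has no chunks

# No explicit .compute() or .load(): the comparison evaluates the dask graph.
assert_allclose(actual, expected)
```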

aaronspring · 2021-01-16T21:34:23Z

xskillscore/tests/test_deterministic.py

+@pytest.mark.parametrize("weight_bool", [True, False])
+@pytest.mark.parametrize("skipna", [True, False])
+@pytest.mark.parametrize("has_nan", [True])
+def test_correlation_metrics_xr_dask_nan(


Try parametrizing with a lazy fixture.

aaronspring · 2021-01-16T21:34:48Z

xskillscore/tests/test_deterministic.py

+@pytest.mark.parametrize("dim", AXES)
+@pytest.mark.parametrize("weight_bool", [True, False])
+@pytest.mark.parametrize("skipna", [True, False])
+@pytest.mark.parametrize("has_nan", [True])


Only one here

aaronspring · 2021-01-16T21:40:26Z

xskillscore/tests/test_deterministic.py

+@pytest.mark.parametrize("metrics", distance_metrics)
+@pytest.mark.parametrize("dim", AXES)
+@pytest.mark.parametrize("weight_bool", [True, False])
+def test_distance_metrics_xr(a, b, dim, weight_bool, weights, metrics):


To me, the test names here were all misleading. The docstrings reflect what the tests do, but not the titles...

aaronspring · 2021-01-16T21:44:16Z

xskillscore/tests/test_deterministic.py

+def test_correlation_metrics_xr_nan(
+    a_nan, b_nan, dim, weight_bool, weights, metrics, skipna
+):
+    """Test whether correlation metric for xarray functions (from


I am doubting the purpose of these tests now. Aren’t we essentially only testing whether the xs.metric function does the same as the np_deterministic function? Is there a reason why this should not work? Isn’t this all independent of chunking and nans?

raybellwaves · 2021-01-17T03:53:59Z

Removed the extra tests. There's probably a bigger question about the purpose of the tests etc., but I think this PR is ok for getting the correlation tests in line with the distance tests. I'm not familiar with pytest-lazy-fixture.

aaronspring · 2021-01-17T08:47:08Z

xskillscore/tests/test_deterministic.py

-def test_correlation_metrics_xr(a, b, dim, weight_bool, weights, metrics):
+@pytest.mark.parametrize("skipna", [True, False])
+@pytest.mark.parametrize("has_nan", [True, False])
+def test_correlation_metrics_ufunc_same_np(


How to work with lazy fixtures:

Add pytest-lazy-fixture to the env.

Pass fixtures to pytest.mark.parametrize: https://github.com/TvoroG/pytest-lazy-fixture. See https://github.com/pangeo-data/climpred/blob/fdec3af0aabac42de1787aecc0fcbef64a4800f7/climpred/tests/test_bootstrap.py#L227 for an example.

This can be done here for this test and for ufunc_dask_np.

aaronspring · 2021-01-17T09:39:41Z

xskillscore/tests/conftest.py

+
+
+@pytest.fixture
+def a_nan_land(a):


Can we call this fixed mask and the other random nan?

aaronspring · 2021-01-17T09:41:27Z

xskillscore/tests/test_deterministic.py

+    if _weights is not None:
+        _weights = _weights.load()
+    if metric in temporal_only_metrics:
+        expected = metric(a.load(), b.load(), dim, skipna=skipna)


Compute not needed IMO

aaronspring · 2021-01-17T09:41:53Z

xskillscore/tests/test_deterministic.py

+        expected = metric(a.load(), b.load(), dim, skipna=skipna)
+    else:
+        expected = metric(a.load(), b.load(), dim, _weights, skipna=skipna)
+    assert expected.chunks is None


Is that assert needed for the docstring description? IMO not

aaronspring · 2021-01-17T09:43:37Z

xskillscore/tests/test_deterministic.py

+    # check that chunks for chunk inputs
+    assert actual.chunks is not None
+    if _weights is not None:
+        _weights = _weights.load()


Compute / load is now only needed because a, b and weights must either all be chunked or all be unchunked.

aaronspring · 2021-01-17T13:58:55Z

xskillscore/tests/test_deterministic.py

    np_deterministic.py)."""
+    if has_nan:


Didn’t you want to use a_nan from conftest here?

raybellwaves · 2021-01-18T03:18:32Z

Thanks for creating #248

One thing I learned here is that we have a function that is named test_distance_metrics_xr_dask. In actual fact it's testing that you get the same result if the DataArray is made up of a dask array compared to if the DataArray is made up of a numpy array. So I renamed it test_distance_metrics_daskda_same_npda.

A couple of things are left.

Yes, I would like to use a_rand_nan when we set "has_nan" = True. I'm unsure about calling something from conftest.py instead of a function, i.e. I know you can do def test(a, ...) or def test(a_rand_nan, ...), but I'm not sure how to do

def test(a, ...):
    if has_nan:
        a = a_rand_nan(a)

without getting NameError: name 'a_rand_nan' is not defined
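
One way around the NameError, sketched under assumptions: fixtures are injected by argument name and are not callable names inside a test body, so the NaN-masking logic can live in a plain helper function that both a fixture and a has_nan branch reuse. mask_random_nans, its parameters, and the array contents are hypothetical and do not come from the xskillscore test suite.

```python
import numpy as np
import pytest


def mask_random_nans(values, fraction=0.25, seed=42):
    """Hypothetical helper: return a copy with a fixed share of entries set to NaN."""
    rng = np.random.default_rng(seed)
    masked = np.array(values, dtype=float, copy=True)
    # Always mask at least one entry so the NaN case is never empty.
    n_nan = max(1, int(fraction * masked.size))
    idx = rng.choice(masked.size, size=n_nan, replace=False)
    masked.flat[idx] = np.nan
    return masked


@pytest.fixture
def a():
    return np.arange(12.0).reshape(3, 4)


@pytest.fixture
def a_rand_nan(a):
    # The fixture is a thin wrapper around the plain helper.
    return mask_random_nans(a)


@pytest.mark.parametrize("has_nan", [True, False])
def test_has_nan_switch(a, has_nan):
    if has_nan:
        # Call the helper (a normal function), not the fixture.
        a = mask_random_nans(a)
    assert np.isnan(a).any() == has_nan
```

With this layout a test can either request a_rand_nan as an argument or branch on has_nan, and both paths share the same masking code.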

Would you mind making a code suggestion on the pytest-lazy-fixture usage: https://haacked.com/archive/2019/06/03/suggested-changes/

aaronspring · 2021-01-18T11:08:12Z

One thing I learned here is that we have a function that is named test_distance_metrics_xr_dask. In actual fact it's testing that you get the same result if the DataArray is made up of a dask array compared to if the DataArray is made up of a numpy array. So I renamed it test_distance_metrics_daskda_same_npda.

I see what you were doing. I was criticizing the purpose of both of these tests.

aaronspring · 2021-01-18T11:11:44Z

Yes, I would like to use a_rand_nan when we set "has_nan" = True. I'm unsure about calling something from conftest.py instead of a function, i.e. I know you can do def test(a, ...) or def test(a_rand_nan, ...), but I'm not sure how to do

We either use functions inside the test to modify inputs, or we list fixtures as test function arguments with pytest-lazy-fixture. But these different approaches shouldn't mix. I will provide a code suggestion.

aaronspring · 2021-01-18T15:50:32Z

xskillscore/tests/test_deterministic.py

+@pytest.mark.parametrize("metrics", correlation_metrics)
+@pytest.mark.parametrize("dim", AXES)
+@pytest.mark.parametrize("weight_bool", [True, False])
+@pytest.mark.parametrize("skipna", [True, False])
+@pytest.mark.parametrize("has_nan", [True, False])
+def test_correlation_metrics_daskda_same_npda(
+    a_dask, b_dask, dim, weight_bool, weights_dask, metrics, skipna, has_nan
+):
+    """Test whether correlation metric for xarray functions can be lazy when
+    chunked by using dask and give same results as DataArray with numpy array."""
+    a = a_dask.copy()
+    b = b_dask.copy()


Suggested change

-@pytest.mark.parametrize("metrics", correlation_metrics)
-@pytest.mark.parametrize("dim", AXES)
-@pytest.mark.parametrize("weight_bool", [True, False])
-@pytest.mark.parametrize("skipna", [True, False])
-@pytest.mark.parametrize("has_nan", [True, False])
-def test_correlation_metrics_daskda_same_npda(
-    a_dask, b_dask, dim, weight_bool, weights_dask, metrics, skipna, has_nan
-):
-    """Test whether correlation metric for xarray functions can be lazy when
-    chunked by using dask and give same results as DataArray with numpy array."""
-    a = a_dask.copy()
-    b = b_dask.copy()
+@pytest.mark.parametrize(
+    "a2, b2",
+    [
+        (
+            pytest.lazy_fixture("a_dask"),
+            pytest.lazy_fixture("b_dask"),
+        ),
+        (
+            pytest.lazy_fixture("a"),
+            pytest.lazy_fixture("b"),
+        ),
+        (
+            pytest.lazy_fixture("a_nan"),
+            pytest.lazy_fixture("b_nan"),
+        ),
+    ],
+)
+@pytest.mark.parametrize("metrics", correlation_metrics)
+@pytest.mark.parametrize("dim", AXES)
+@pytest.mark.parametrize("weight_bool", [True, False])
+@pytest.mark.parametrize("skipna", [True, False])
+def test_correlation_metrics_daskda_same_npda(
+    a2, b2, dim, weight_bool, weights_dask, metrics, skipna, has_nan
+):
+    """Test whether correlation metric for xarray functions can be lazy when
+    chunked by using dask and give same results as DataArray with numpy array."""
+    a = a2.copy()
+    b = b2.copy()

requires: pip install pytest-lazy-fixture

Maybe also rename a to a_1d so we can use a as the input name in the parametrized tests; a2 and a cannot both have the same name in this example.

This example with a_nan is probably not better than has_nan [True, False]. has_nan has the advantage that the small function changes the code. On the other hand, you could also add a_nan_dask to the list of inputs, but this list would keep growing and growing...

There will be no 100% clean solution in which everything matches...
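
As an alternative that needs no plugin, pytest's built-in request.getfixturevalue can resolve a fixture by name inside the test, so fixture names can be listed in parametrize. This is a different technique from the pytest.lazy_fixture suggestion above and is shown only as a sketch; the builder functions, fixture bodies, and array contents are hypothetical.

```python
import numpy as np
import pytest


def make_a():
    """Hypothetical builder for a small NaN-free input."""
    return np.arange(12.0).reshape(3, 4)


def make_a_nan():
    """Hypothetical builder for the same input with one NaN."""
    a = make_a()
    a[0, 0] = np.nan
    return a


@pytest.fixture
def a():
    return make_a()


@pytest.fixture
def a_nan():
    return make_a_nan()


@pytest.mark.parametrize("a_name, expect_nan", [("a", False), ("a_nan", True)])
def test_input_variants(a_name, expect_nan, request):
    # Look the fixture up by name at run time instead of listing it
    # as a test argument.
    values = request.getfixturevalue(a_name)
    assert values.shape == (3, 4)
    assert bool(np.isnan(values).any()) == expect_nan
```

The trade-off is similar to the one discussed above: every new input variant adds another entry to the parametrize list.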

raybellwaves · 2021-01-19T02:27:43Z

I believe this PR is good to go. This closes the linked issue, with a little bit of renaming to make further test development easier (or even standardized). I've opted not to use pytest-lazy-fixture, but thank you for teaching me about it. I think it's very useful, but as you mention, I would prefer to have one method of testing per file.

add nan corr tests

d1e66e2

raybellwaves requested a review from aaronspring January 16, 2021 04:51

aaronspring requested changes Jan 16, 2021

revert extra tests

f4e8b9f

aaronspring reviewed Jan 17, 2021

aaronspring mentioned this pull request Jan 17, 2021

Purpose of testing np_deterministic and ufunc #248

Open

aaronspring reviewed Jan 17, 2021

raybellwaves added 2 commits January 17, 2021 21:50

rename functions

db9ec06

rename dask test

d47bb25

aaronspring reviewed Jan 18, 2021

Ray Bell added 2 commits January 18, 2021 21:19

rename in conftest

d878855

update changelog

c880299

aaronspring approved these changes Jan 19, 2021

raybellwaves merged commit 7e2ed96 into xarray-contrib:master Jan 19, 2021

raybellwaves deleted the use-nan-from-conftest branch January 19, 2021 14:26
