Multi-GPU support with dask #179

Open · Intron7 wants to merge 83 commits into main
Conversation

Member

@Intron7 commented Apr 25, 2024

This adds dask support.

Functions to add:

  • calculate_qc_metrics
  • normalize_total
  • log1p
  • highly_variable_genes with seurat and cell_ranger
  • scale
  • PCA
  • neighbors

@Intron7 marked this pull request as draft April 25, 2024 13:08
@Intron7 changed the title from "add first functions" to "Multi-GPU support with dask" Apr 30, 2024
@Intron7 added the run-gpu-ci label May 3, 2024
@Intron7 marked this pull request as ready for review May 13, 2024 14:27
Member

These need docstrings. What do they do?

def _cov_kernel(dtype):
    return cuda_kernel_factory(cov_kernel_str, (dtype,), "cov_kernel")

def _gramm_kernel_csr(dtype):
    return cuda_kernel_factory(gramm_kernel_csr, (dtype,), "gramm_kernel_csr")

def _copy_kernel(dtype):
    return cuda_kernel_factory(copy_kernel, (dtype,), "copy_kernel")

Member Author

They're internal functions that handle dtype for CUDA kernels. They don't need docstrings.
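
For illustration, a one-line docstring in the style being asked for might read like this (a sketch; the wording is an assumption based on the reply above, not the PR's code):

def _cov_kernel(dtype):
    """Return the covariance CUDA kernel specialized to ``dtype``."""
    return cuda_kernel_factory(cov_kernel_str, (dtype,), "cov_kernel")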

)
return gram_matrix[None, ...] # need new axis for summing

n_blocks = len(x.to_delayed().ravel())
Member

@flying-sheep commented Sep 30, 2024

Why not x.blocks.size?

Member Author

@ilan-gold why did we do this?


simply did not know this existed!
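
For context, both spellings count a dask array's blocks; a minimal NumPy-backed sketch:

import dask.array as da

x = da.ones((100, 10), chunks=(25, 10))  # 4 row blocks, 1 column block

assert len(x.to_delayed().ravel()) == 4  # builds delayed objects just to count them
assert x.blocks.size == 4                # reads the chunk metadata directly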

@Intron7 marked this pull request as ready for review October 1, 2024 08:48
Member Author

@Intron7 commented Oct 1, 2024

There will be a separate PR for updating the docstrings and a tutorial.

Comment on lines +432 to +436
adata_subset = adata[adata.obs[batch_key] == batch].copy()

calculate_qc_metrics(adata_subset, layer=layer)
filt = adata_subset.var["n_cells_by_counts"].to_numpy() > 0
- adata_subset = adata_subset[:, filt]
+ adata_subset = adata_subset[:, filt].copy()


Why copy here? Seems like there should be a more efficient way to do this.
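
One possible shape for that more efficient way (a sketch, not the PR's code; it assumes AnnData's private _inplace_subset_var, which scanpy's own filter functions use, is acceptable here):

adata_subset = adata[adata.obs[batch_key] == batch].copy()
calculate_qc_metrics(adata_subset, layer=layer)
filt = adata_subset.var["n_cells_by_counts"].to_numpy() > 0
adata_subset._inplace_subset_var(filt)  # filters columns in place, no second copy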

Comment on lines +85 to +88
if isinstance(X, sparse.csr_matrix):
    return _normalize_total_csr(X, target_sum)
elif isinstance(X, DaskArray):
    return _normalize_total_dask(X, target_sum)


@flying-sheep just so you're aware when reviewing this, this is why we can't use single dispatch anywhere: sphinx-doc/sphinx#10591


possible that this is no longer relevant, i didn't look into it too hard
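
For reference, the functools.singledispatch form that the linked sphinx issue rules out would look roughly like this (a sketch; the _normalize_total_* helpers are the ones from the diff):

from functools import singledispatch


@singledispatch
def _normalize_total(X, target_sum):
    raise NotImplementedError(f"Cannot normalize {type(X)}")


@_normalize_total.register
def _(X: sparse.csr_matrix, target_sum):
    return _normalize_total_csr(X, target_sum)


@_normalize_total.register
def _(X: DaskArray, target_sum):
    return _normalize_total_dask(X, target_sum)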



def _normalize_total(X: cp.ndarray, target_sum: int):


X type is wrong here

src/rapids_singlecell/preprocessing/_normalize.py (outdated; resolved)
    chunks=(X.chunksize[0],),
    drop_axis=1,
)
counts_per_cell = target_sum_chunk_matrices.compute()


I believe we have an implementation in scanpy that makes this lazy as well:

https://github.com/scverse/scanpy/blob/be99b230fa84e077f5167979bc9f6dacc4ad0d41/src/scanpy/preprocessing/_normalization.py#L34-L48

probably worth trying out since this is fairly expensive (i.e., requires a full pass over the data)

Member Author

I'll investigate if that's a solution for me as well.
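
A lazy variant along the lines of the linked scanpy code might look like this (a sketch; it assumes X is a CSR-backed dask array chunked along cells only, and the helper name is hypothetical):

def _counts_per_cell_lazy(X):
    # keep per-cell sums as a lazy 1D dask array instead of .compute()-ing eagerly
    return X.map_blocks(
        lambda block: cp.asarray(block.sum(axis=1)).ravel(),
        chunks=(X.chunks[0],),
        drop_axis=1,
        meta=cp.array([], dtype=X.dtype),
    )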

Comment on lines 304 to 324
blocks = X.to_delayed().ravel()
cell_blocks = [
    da.from_delayed(
        __qc_calc_1(block),
        shape=(2, X.chunks[0][ind]),
        dtype=X.dtype,
        meta=cp.array([]),
    )
    for ind, block in enumerate(blocks)
]

blocks = X.to_delayed().ravel()
gene_blocks = [
    da.from_delayed(
        __qc_calc_2(block),
        shape=(2, X.shape[1]),
        dtype=X.dtype,
        meta=cp.array([]),
    )
    for ind, block in enumerate(blocks)
]


can't we map_blocks here now that we are vstack-ing?
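
The map_blocks version being suggested could be sketched like this (assuming X is chunked along the cell axis only and __qc_calc_1 returns a (2, block_rows) array per block):

cell_results = X.map_blocks(
    __qc_calc_1,
    new_axis=0,
    drop_axis=1,
    chunks=((2,), X.chunks[0]),
    dtype=X.dtype,
    meta=cp.array([]),
)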

)


def _first_pass_qc(X):


types

Member

and needs a more descriptive name. what does it do?

)


def _second_pass_qc(X, mask):


types

Member

and a more descriptive name



@with_cupy_rmm
def _second_pass_qc_dask(X, mask):


types

Member

@flying-sheep left a comment

this needs to be heavily deduplicated, and it needs a few more comments, especially for the kernels and some very short variable names.

also you should reorganize:

  • why are _pca and _sparse_pca sibling modules? the latter should be a submodule of the former.
  • why _sparse_pca._sparse_pca? that should probably be _sparse_pca._cupy or _sparse_pca._mem or so.

src/rapids_singlecell/_compat.py (outdated; resolved)
src/rapids_singlecell/_compat.py (outdated; resolved)
Comment on lines +291 to +301
if isinstance(X, DaskArray):
    if isinstance(X._meta, cp.ndarray):
        X = X.map_blocks(lambda X: cp.expm1(X), meta=_meta_dense(X.dtype))
    elif isinstance(X._meta, csr_matrix):
        X = X.map_blocks(lambda X: X.expm1(), meta=_meta_sparse(X.dtype))
else:
    X = X.copy()
    if issparse(X):
        X = X.expm1()
    else:
        X = cp.expm1(X)
Member

should probably be wrapped into something like def expm1(X: DaskArray | csr_matrix | np.ndarray): instead of just having this inline.

also definitely needs an else branch for uncovered cases
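
A wrapper along those lines, with the missing else branches, might look like this (a sketch; _meta_dense/_meta_sparse are the helpers used in the diff):

def expm1(X: DaskArray | csr_matrix | cp.ndarray) -> DaskArray | csr_matrix | cp.ndarray:
    if isinstance(X, DaskArray):
        if isinstance(X._meta, cp.ndarray):
            return X.map_blocks(cp.expm1, meta=_meta_dense(X.dtype))
        if isinstance(X._meta, csr_matrix):
            return X.map_blocks(lambda b: b.expm1(), meta=_meta_sparse(X.dtype))
        raise TypeError(f"Unsupported dask meta type: {type(X._meta)}")
    if issparse(X):
        return X.expm1()
    if isinstance(X, cp.ndarray):
        return cp.expm1(X)
    raise TypeError(f"Unsupported type: {type(X)}")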

from ._kernels._norm_kernel import _mul_csr

mul_kernel = _mul_csr(X.dtype)
mul_kernel.compile()
Member

why do you .compile() here and not above?

Why not move these kernel .compile() calls inside? You could use functools.cache/lru_cache to reuse the compiled versions.

Member Author

CuPy takes care of caching and loading internally. It's just important that it's there; it has massive performance implications.

Member

You mean not calling it results in really bad performance, whereas calling it multiple times has no noticeable impact?

Then it should most likely go inside the wrapper (here _mul_csr) so it's impossible to forget at the call site.
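
The cached wrapper described here could be sketched as follows (mul_kernel_str is an assumed name for the kernel's template string):

from functools import lru_cache


@lru_cache
def _mul_csr(dtype):
    kernel = cuda_kernel_factory(mul_kernel_str, (dtype,), "mul_kernel")
    kernel.compile()  # compiled once per dtype; call sites can no longer forget it
    return kernel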

Comment on lines +131 to +145
elif isinstance(X._meta, cp.ndarray):
    from ._kernels._norm_kernel import _mul_dense

    mul_kernel = _mul_dense(X.dtype)
    mul_kernel.compile()

    def __mul(X_part):
        mul_kernel(
            (math.ceil(X_part.shape[0] / 128),),
            (128,),
            (X_part, X_part.shape[0], X_part.shape[1], int(target_sum)),
        )
        return X_part

    X = X.map_blocks(__mul, meta=_meta_dense(X.dtype))
Member

this branch is almost identical to the above. You should probably do:

if not isinstance(X._meta, (cp.ndarray, sparse.csr_matrix)):
    raise ValueError(f"Cannot normalize {type(X)}")
...
def mul(X_part, rename_me_1, rename_me_2): ...
...
__mul = partial(mul, rename_me_1=..., rename_me_2=...) if isinstance(X._meta, cp.ndarray) else partial(mul, rename_me_1=..., rename_me_2=...)
return X.map_blocks(...)

Member Author

I really don't see how this is supposed to work.

Member

like here: #179 (comment)

Identify the parts that differ between the two if branches and assign them to variables; pull the identical parts out of the branches, where they can use the variables you define in the branches.

If all that's left is two branches, and each branch contains nothing but variable assignments, replace the if statement with a ternary (foo, bar = (..., ...) if condition else (..., ...)).
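
Applied to the two normalize branches, the pattern might come out roughly like this (a sketch; the argument tuple for the sparse case is an assumption):

from functools import partial

if isinstance(X._meta, sparse.csr_matrix):
    kernel = _mul_csr(X.dtype)
    args_for = lambda part: (part.indptr, part.data, part.shape[0], int(target_sum))
elif isinstance(X._meta, cp.ndarray):
    kernel = _mul_dense(X.dtype)
    args_for = lambda part: (part, part.shape[0], part.shape[1], int(target_sum))
else:
    raise ValueError(f"Cannot normalize {type(X._meta)}")


def _mul(X_part, kernel, args_for):
    kernel((math.ceil(X_part.shape[0] / 128),), (128,), args_for(X_part))
    return X_part


X = X.map_blocks(partial(_mul, kernel=kernel, args_for=args_for), meta=X._meta)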

Member Author

I don't really think that this is a good idea. I know that you don't like this, but I think it is easier to maintain.

Member

looks like all these tests are copy and pasted and should be deduplicated using parametrize

Comment on lines 15 to 37
def test_normalize_sparse(client):
    adata = pbmc3k()
    sc.pp.filter_cells(adata, min_genes=100)
    sc.pp.filter_genes(adata, min_cells=3)
    dask_data = adata.copy()
    dask_data.X = as_sparse_cupy_dask_array(dask_data.X)
    adata.X = cusparse.csr_matrix(adata.X)
    rsc.pp.normalize_total(adata)
    rsc.pp.normalize_total(dask_data)
    cp.testing.assert_allclose(adata.X.toarray(), dask_data.X.compute().toarray())


def test_normalize_dense(client):
    adata = pbmc3k()
    sc.pp.filter_cells(adata, min_genes=100)
    sc.pp.filter_genes(adata, min_cells=3)
    dask_data = adata.copy()
    dask_data.X = as_dense_cupy_dask_array(dask_data.X)
    adata.X = cp.array(adata.X.toarray())
    rsc.pp.normalize_total(adata)
    rsc.pp.normalize_total(dask_data)
    cp.testing.assert_allclose(adata.X, dask_data.X.compute())

Member

deduplicate using parametrize

Comment on lines 39 to 62
def test_log1p_sparse(client):
    adata = pbmc3k()
    sc.pp.filter_cells(adata, min_genes=100)
    sc.pp.filter_genes(adata, min_cells=3)
    sc.pp.normalize_total(adata)
    dask_data = adata.copy()
    dask_data.X = as_sparse_cupy_dask_array(dask_data.X)
    adata.X = cusparse.csr_matrix(adata.X)
    rsc.pp.log1p(adata)
    rsc.pp.log1p(dask_data)
    cp.testing.assert_allclose(adata.X.toarray(), dask_data.X.compute().toarray())


def test_log1p_dense(client):
    adata = pbmc3k()
    sc.pp.filter_cells(adata, min_genes=100)
    sc.pp.filter_genes(adata, min_cells=3)
    sc.pp.normalize_total(adata)
    dask_data = adata.copy()
    dask_data.X = as_dense_cupy_dask_array(dask_data.X)
    adata.X = cp.array(adata.X.toarray())
    rsc.pp.log1p(adata)
    rsc.pp.log1p(dask_data)
    cp.testing.assert_allclose(adata.X, dask_data.X.compute())
Member

deduplicate using parametrize
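
A parametrized version of such a pair might look like this (a sketch based on the normalize tests above; the conversion helpers are the ones the tests already use):

import pytest


def _to_dense(X):
    return X.toarray() if cusparse.issparse(X) else X


@pytest.mark.parametrize(
    ("to_dask", "to_gpu"),
    [
        pytest.param(as_sparse_cupy_dask_array, cusparse.csr_matrix, id="sparse"),
        pytest.param(as_dense_cupy_dask_array, lambda X: cp.array(X.toarray()), id="dense"),
    ],
)
def test_normalize(client, to_dask, to_gpu):
    adata = pbmc3k()
    sc.pp.filter_cells(adata, min_genes=100)
    sc.pp.filter_genes(adata, min_cells=3)
    dask_data = adata.copy()
    dask_data.X = to_dask(dask_data.X)
    adata.X = to_gpu(adata.X)
    rsc.pp.normalize_total(adata)
    rsc.pp.normalize_total(dask_data)
    cp.testing.assert_allclose(_to_dense(adata.X), _to_dense(dask_data.X.compute()))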

Member

deduplicate using parametrize

Member

deduplicate using parametrize

@github-actions bot removed the run-gpu-ci label Oct 22, 2024