Additional Labels for Immgen Data #68

robertamezquita · 2019-12-02T22:56:59Z

Proposing an additional layer of annotations for the T cell population. Splitting on CD4 vs CD8 would be first and foremost (looks like it's already mostly doable by grepping on T.4 or T.8 from the fine labels). Adding the different subsets would be next; again, most of this can be recovered from the names already.

In case this is useful, leaving this here for consideration. Obviously way too many different ways to cut this data to really have it all in a single object and make everybody happy.

To add this to the current immgen dataset, below is some code. Please excuse the tidyverse coding.

library(SingleR)
library(tidyverse)

immgen <- ImmGenData()

manual <- tribble(
    ~label.manual, ~label.fine,
    "CD4 Naive", c("T cells (T.4NVE)", "T cells (T.4NVE44-49D-11A-)", "T cells (T.4Nve)"),
    "CD4 Effector", "T cells (T.4EFF49D+11A+.D8.LCMV)",
    "CD4 Memory", c("T cells (T.4MEM)", "T cells (T.4MEM44H62L)",
                    "T cells (T.4MEM49D+11A+.D30.LCMV)", "T cells (T.4Mem)"),
    "CD8 Naive", c("T cells (T.8NVE)", "T cells (T.8NVE.OT1)", "T cells (T.8Nve)"),
    "CD8 Effector", c("T cells (T.8EFF.OT1.D10.LISOVA)", "T cells (T.8EFF.OT1.D10LIS)",
                      "T cells (T.8EFF.OT1.D8.LISOVA)", "T cells (T.8EFF.OT1.D8.VSVOVA)",
                      "T cells (T.8EFF.OT1.D8LISO)"),
    "CD8 Memory", c("T cells (T.8MEM)", "T cells (T.8MEM.OT1.D100.LISOVA)",
                    "T cells (T.8MEM.OT1.D106.VSVOVA)", "T cells (T.8MEM.OT1.D45.LISOVA)",
                    "T cells (T.8Mem)"),
    "Treg", "T cells (T.Tregs)"
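) %>%
    ## tribble() stores the multi-value cells as a list column; expand it to
    ## one row per fine label so the lookups below match on plain strings
    unnest(label.fine)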

manual.vec <- manual$label.manual
names(manual.vec) <- manual$label.fine


## Filter based on manual new annotation
immgen.tc <- immgen[, immgen$label.fine %in% manual$label.fine]

## Append new label
immgen.tc$label.manual <- manual.vec[immgen.tc$label.fine]
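The grep-based split mentioned at the top of this comment can be sketched in base R as below. This is a minimal illustration on a handful of fine labels copied from the table above, not the full ImmGen label set; the patterns and the `annot` data frame are assumptions made for the example, and the real label set likely contains other T.4/T.8 entries that these simple patterns would leave unclassified.

```r
## A few fine labels taken from the manual table above.
fine <- c(
    "T cells (T.4NVE)",
    "T cells (T.4Mem)",
    "T cells (T.8EFF.OT1.D8.VSVOVA)",
    "T cells (T.8MEM)",
    "T cells (T.Tregs)"
)

## Lineage: match the literal prefix right after the opening parenthesis.
lineage <- rep(NA_character_, length(fine))
lineage[grepl("(T.4", fine, fixed = TRUE)] <- "CD4"
lineage[grepl("(T.8", fine, fixed = TRUE)] <- "CD8"

## Subset: the labels mix upper and mixed case (NVE/Nve, MEM/Mem).
state <- rep(NA_character_, length(fine))
state[grepl("NVE", fine, ignore.case = TRUE)] <- "Naive"
state[grepl("EFF", fine, ignore.case = TRUE)] <- "Effector"
state[grepl("MEM", fine, ignore.case = TRUE)] <- "Memory"

## Labels matching neither pattern (e.g. Tregs) stay NA for manual curation.
annot <- data.frame(label.fine = fine, lineage = lineage, state = state)
annot
```

Anything left as NA still needs a hand-written entry, which is why the explicit table above remains the safer route.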

anoronh4 · 2020-02-09T18:15:08Z

I actually think better curation of the entire ImmGen data set may be helpful. For example, there exists:

Epithelial cells | Epithelial cells (Ep.5wk.MEC.Sca1+)
Epithelial cells | Epithelial cells (Ep.5wk.MEChi)
Epithelial cells | Epithelial cells (Ep.5wk.MEClo)
Epithelial cells | Epithelial cells (Ep.8wk.CEC.Sca1+)
Epithelial cells | Epithelial cells (Ep.8wk.CEChi)
Epithelial cells | Epithelial cells (Ep.8wk.MEChi)
Epithelial cells | Epithelial cells (Ep.8wk.MEClo)

in the fine label category of ImmGenData. I don't think most people using this package have much use for time points (8 wk vs 5wk), and information such as "Ep.8wk.MEChi" is too obscure for me to figure out what it is and relate it to my own dataset. An intermediate data layer or better curation of one level would be extremely helpful.

That being said, this issue is most apparent to me for ImmGenData. MonacoImmuneData, for example, has much more helpful fine categories (but is not mouse, so it doesn't help me).

LTLA · 2020-02-10T03:22:21Z

@anoronh4 Funny you say that, because - thanks to the efforts of @j-andrews7 - the latest version of SingleR has Cell Ontology mappings returned for all labels in ImmGenData(). This can be used to adjust the labels to any desired resolution by traversing the ontology tree - in principle, at least. Perhaps @vjcitn may have some comments/code on how one might do so in practice via ontoProc.

vjcitn · 2020-02-10T12:53:15Z

@anoronh4 -- the colData()$label.ont has Cell Ontology mappings. You can check the Slack discussion around https://community-bioc.slack.com/archives/CE8AB163W/p1580737521140000 to see some relevant concepts. I don't see uptake of the subset_descendants and common_classes methods discussed there so have not pursued it further; I need to update the ontoProc vignette to deal with the label.ont fields but AFAIK there is no commitment to use that name or define methods to retrieve ontology tags for samples.

LTLA · 2020-02-11T07:51:08Z

@vjcitn:

I will add the common_classes example to the SingleR vignette.
I don't recall us discussing subset_descendants?
I wonder what would be an easy interface for users to tune the desired granularity of these terms.

LTLA · 2020-02-12T07:04:28Z

Right. Having poked around, I think onto_plot2 may be close to what we need to close this issue.

To restate the problem: the user has a bunch of terms near the tips of the ontology DAG. They want to scale back the granularity of these terms to something that is broader. I propose the following workflow:

1. User uses onto_plot2() to visualize the relationships between the available terms. This does, however, require some pruning of the current visualization; there are far too many terms and the plot is very crowded (try using it on the ImmGenData terms). I would like an option to limit the graph to the observed terms, the MRCA of those terms and the MRCA of the MRCAs.
2. User chooses some MRCAs that represent their desired granularity.
3. User supplies these MRCAs to another function that remaps each descendant term in label.ont to its MRCA. No remapping is done if the MRCA is not listed, in which case the existing labels are assumed to be satisfactory. Some care is required to handle cases where a term is a descendant of two MRCAs - I guess the whole concept of a MRCA doesn't really work here.
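The remapping step could look roughly like the base R sketch below. It is only an illustration of the idea: the toy `parents` list uses made-up term names rather than real Cell Ontology identifiers, and `ancestors()` and `roll_back()` are hypothetical helpers, not functions from SingleR or ontoProc. A term that descends from more than one chosen ancestor is set to NA here, which is just one possible way of handling the ambiguous case.

```r
## Toy ontology (hypothetical terms): each term maps to its direct parents.
parents <- list(
    "cell" = character(0),
    "T cell" = "cell",
    "CD4 T cell" = "T cell",
    "CD8 T cell" = "T cell",
    "memory T cell" = "T cell",
    "naive CD4 T cell" = "CD4 T cell",
    "memory CD4 T cell" = c("CD4 T cell", "memory T cell"),
    "naive CD8 T cell" = "CD8 T cell"
)

## All ancestors of a term, found by walking up the parent links.
ancestors <- function(term, parents) {
    found <- character(0)
    todo <- parents[[term]]
    while (length(todo) > 0) {
        current <- todo[1]
        todo <- todo[-1]
        if (!current %in% found) {
            found <- c(found, current)
            todo <- c(todo, parents[[current]])
        }
    }
    found
}

## Roll each term back to the chosen ancestor it descends from.
## Terms with no chosen ancestor are kept as-is; terms with several
## chosen ancestors are ambiguous and become NA.
roll_back <- function(terms, chosen, parents) {
    vapply(terms, function(term) {
        if (term %in% chosen) return(term)
        hits <- intersect(ancestors(term, parents), chosen)
        if (length(hits) == 0) {
            term
        } else if (length(hits) == 1) {
            hits
        } else {
            NA_character_
        }
    }, character(1), USE.NAMES = FALSE)
}

observed <- c("naive CD4 T cell", "memory CD4 T cell", "naive CD8 T cell")
rolled <- roll_back(observed, chosen = c("CD4 T cell", "memory T cell"), parents)
rolled
```

In this toy case the naive CD4 term rolls back to "CD4 T cell", the naive CD8 term is left alone because none of its ancestors were chosen, and the memory CD4 term comes back as NA because it sits under both chosen ancestors.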

vjcitn · 2020-02-12T11:06:11Z

I see -- by the way, I didn't know that MRCA = most recent common ancestor. These are the lines from onto_plot2 that will help to carry this out:

    pl = ontologyPlot::onto_plot(ont, terms2use, ...)
    gnel = make_graphNEL_from_ontology_plot(pl) # defined in ontoProc

Once we have the gnel (terms2use here should be inclusive) we can make subgraphs as you wish. If interactive visualization is important we might need to move beyond Rgraphviz but I am not clear on the most appropriate option.

LTLA · 2020-02-13T06:06:21Z

Having tried this, I don't think it's reasonable to expect people to poke through the plot:

library(ontoProc)
library(ontologyPlot)
library(SingleR)

cl <- getCellOnto()
imm <- ImmGenData()
pl <- ontologyPlot::onto_plot(cl, imm$label.ont)

The graph is too large, the words are too small and you can't easily copy and paste the terms. I think the plot would be all right to look at for an overview but not as the frontline tool for the details.

After some more thought, one possible option is to have a function that takes a set of terms and then simply prints out a data.frame of all internal nodes that are MRCAs (with some plain-English annotation in the other columns, plus some statistics about how many children are present). The user can then easily examine the internal nodes that provide a biological resolution they are happy with; after this choice is made, it is then straightforward to have a function to roll back terms to their parents.
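A rough base R sketch of such a summary function is below. It makes several simplifying assumptions: the ontology is a toy `parents` list with made-up term names, `ancestors()` and `summarize_ancestors()` are hypothetical helpers rather than existing SingleR or ontoProc functions, and instead of computing strict MRCAs it lists every ancestor shared by at least two observed terms, along with how many observed terms fall under it.

```r
## Toy ontology (hypothetical terms): each term maps to its direct parents.
parents <- list(
    "cell" = character(0),
    "T cell" = "cell",
    "CD4 T cell" = "T cell",
    "CD8 T cell" = "T cell",
    "memory T cell" = "T cell",
    "naive CD4 T cell" = "CD4 T cell",
    "memory CD4 T cell" = c("CD4 T cell", "memory T cell"),
    "naive CD8 T cell" = "CD8 T cell"
)

## All ancestors of a term, found by walking up the parent links.
ancestors <- function(term, parents) {
    found <- character(0)
    todo <- parents[[term]]
    while (length(todo) > 0) {
        current <- todo[1]
        todo <- todo[-1]
        if (!current %in% found) {
            found <- c(found, current)
            todo <- c(todo, parents[[current]])
        }
    }
    found
}

## One row per internal node that covers at least two observed terms.
summarize_ancestors <- function(observed, parents) {
    anc <- lapply(observed, ancestors, parents = parents)
    counts <- table(unlist(anc))
    out <- data.frame(
        term = names(counts),
        n.observed = as.integer(counts),
        stringsAsFactors = FALSE
    )
    out <- out[out$n.observed >= 2, , drop = FALSE]
    ## Broadest groupings first, ties broken alphabetically.
    out <- out[order(-out$n.observed, out$term), , drop = FALSE]
    rownames(out) <- NULL
    out
}

observed <- c("naive CD4 T cell", "memory CD4 T cell", "naive CD8 T cell")
out <- summarize_ancestors(observed, parents)
out
```

With real Cell Ontology terms, a further column holding the human-readable term name would supply the plain-English annotation; the toy terms here are already readable, so that column is omitted.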

namit-k · 2020-03-13T13:52:42Z

Along the lines of the comments here, working with ImmGen has been difficult because of naming conventions like "Ep.8wk.MEChi", which add unnecessary detail (the 8 wk time point) to the base "Epithelial cells" annotation. I have code that cleans up all of the ImmGen labels into meaningful, easy-to-understand annotations. I am happy to share the code or create a pull request, if there is interest.

marencc · 2023-11-30T10:35:22Z

@namit-k Hi! I am using the ImmGen labels. Could you please provide the code that cleans up the labels into meaningful annotations? Many thanks!

LTLA mentioned this issue Jan 18, 2020

Mapping labels to standardized cell ontology #84

Merged

phoebee-h mentioned this issue Aug 24, 2021

Using cell ontology with SingleR() #199

Closed
