
Discussion: draft of a minimal "developer" base image #346

Open · wants to merge 28 commits into master

Conversation


@sjfleming commented Aug 5, 2022

This is a discussion of the general strategy for resolving #333.

Some preliminary discussion (design doc) is here:
https://docs.google.com/document/d/1b9uXanA3uCxUJoKktczxFXpJ1o3TYeXnaM0PGvbAczM/edit#


What has been done:

  • This adds a folder called ./terra-jupyter-dev-base to the repo that contains two files:
    • Dockerfile
    • build_docker.sh: you can run this script to build the image, which is called terra-jupyter-dev-base:0.0.1

Results so far:

  • A (seemingly) working image built on ubuntu:20.04 that can successfully run a Jupyter notebook server using the command docker run --rm -it -p 8888:8000 terra-jupyter-dev-base:0.0.1
  • Image size is 1.77 GB

Testing:

  • Very little...
    • No integration testing to see if this actually works in Terra
    • No testing to see if the scripts function (the scripts that do welder-related things? and the custom Jupyter extensions?)
  • If I run docker run --rm -it -p 8888:8000 terra-jupyter-dev-base:0.0.1, then I can
    • Successfully connect to the notebook server from a browser
    • See my python3 kernel, which points to the correct python3 install in the image
    • Successfully run simple commands in the Jupyter notebook, i.e. the kernel works
  • No actual tests have been added to the repo: I don't understand enough about the repo to do this
    • But tests should be exactly the same as for terra-jupyter-base?

@sjfleming
Author

Anybody please feel free to jump in with any comments or advice
@Qi77Qi @SHuang-Broad

&& apt-get update && apt-get install -yq --no-install-recommends \
sudo \
&& sudo -i \
echo "deb http://security.ubuntu.com/ubuntu/ bionic main" >> /etc/apt/sources.list \


bionic is the codename for 18.04. Is this supposed to be Focal Fossa?

Author

Good point: I am not sure about this. I copied it directly from terra-jupyter-base. I'm honestly not sure what the base operating system actually is in terra-jupyter-base, since it uses a Google deep learning base image, gcr.io/deeplearning-platform-release/tf-gpu.2-7, and I can't seem to find the details.

The only thing I know is that, if I remove these steps from the build, it doesn't work. These commands seem to be necessary for a successful install of libexempi3 and libv8-3.14-dev (which in turn are only needed in order to install the Python packages firecloud and terra-notebook-utils).
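To make the dependency chain concrete, the shape of it in the Dockerfile is roughly the following (a paraphrase for illustration, not the literal lines):

# Sketch: the bionic repo line makes libexempi3 / libv8-3.14-dev installable,
# which in turn lets pip build firecloud and terra-notebook-utils.
RUN echo "deb http://security.ubuntu.com/ubuntu/ bionic main" >> /etc/apt/sources.list \
 && apt-get update \
 && apt-get install -yq --no-install-recommends libexempi3 libv8-3.14-dev \
 && pip3 install firecloud terra-notebook-utils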


Oh, I didn't mean to delete these. I mean for 20.04, shouldn't there be a new name other than bionic?

Author

Right, yeah, you could be right. I'm not sure. Somehow this works as-is. Is it maybe a workaround to install bionic packages in later Ubuntu releases? Or maybe it does need to be updated from bionic to focal here. I would need to ask the person who wrote this part :)


I believe it's now focal replacing bionic for 20.04.

# Install jupyter and some necessary python packages
&& pip3 -V \
# For gcloud alpha storage support.
&& pip3 install google-crc32c --target /usr/lib/google-cloud-sdk/lib/third_party \


I'm a little concerned about mixing pip and conda, but I know this existed before this PR.

# When we upgraded from jupyter 5.7.8 to 6.1.1, we broke terminal button on terra-ui.
# Hence, make sure to manually test out "launch terminal" button (the button in the green bar next to start and stop buttons)
# to make sure we don't accidentally break it every time we upgrade notebook version until we figure out an automation test for this
&& pip3 install notebook \


Do we want to pin down the versions? And/or have a requirements.txt?

Contributor

requirements.txt would be a nice addition!

Author

Probably makes sense to pin them
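Something like this pattern, maybe (the file name matches the suggestion above; the actual pins would still need to be chosen):

# requirements.txt would hold one pinned entry per package, e.g. notebook==<chosen version>,
# and the Dockerfile would install from it:
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install --no-cache-dir -r /tmp/requirements.txt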

@Qi77Qi
Collaborator

Qi77Qi commented Aug 5, 2022

@sjfleming
Author

@Qi77Qi is that something other than these?
https://github.com/sjfleming/terra-docker/blob/186a8b7225a04a2a704f26bbb939fdf4bd3e03b8/terra-jupyter-dev-base/Dockerfile#L140-L143

(I didn't see any other scripts being added to terra-jupyter-base)

@jdcanas
Contributor

jdcanas commented Aug 5, 2022

You should be able to push the image to GCR or the like, and specify it as a custom image to test it in Terra.
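For example, something along these lines (the registry path here is a placeholder):

docker tag terra-jupyter-dev-base:0.0.1 us.gcr.io/<your-project>/terra-jupyter-dev-base:0.0.1
docker push us.gcr.io/<your-project>/terra-jupyter-dev-base:0.0.1

and then point the custom image field at that tag when creating the Cloud Environment.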

[screenshot]

@sjfleming
Author

@jdcanas Do you think I'm ready to try that out? Is that the next step?

@sjfleming
Author

@jdcanas Okay I did try it out ... I created a Terra Compute Environment using this image.

  • It does work:
    • I can run cells / do compute in a notebook
    • I can save the notebook as normal, and it gets written to the google bucket associated with the workspace
      • I see messages about auto-saving
    • I can pip install packages from cells in the Jupyter notebook, and it seems to work fine
    • I can stop the environment, restart it, open a saved notebook, and work with it
    • It behaves like any Terra notebook environment as far as I can tell

[screenshot]

The Google bucket for the workspace, in the notebooks folder:
[screenshot]

If anybody else wants to test out the image (without having to build it themselves), it is currently hosted here:

us.gcr.io/broad-dsde-methods/terra-jupyter-dev-base:0.0.1

although I can't promise I won't overwrite it after more changes and testing.

@SHuang-Broad

SHuang-Broad commented Aug 5, 2022

If what @sjfleming experimented with above is allowed, then this section of the README needs to be updated as well:

https://github.com/databiosphere/terra-docker#terra-base-images

[screenshot of the terra-base-images section of the README]

@sjfleming
Author

Okay so I figured, while I'm at it, let's make an equivalent base image that enables GPU functionality and has all the CUDA stuff pre-installed. This might be out of scope for this current PR, but I'm going to include the Dockerfile here in case anyone wants to look at it. The size of the image is 2.63 GB.

I have hosted it here for the time being, if anyone wants to play with it

us.gcr.io/broad-dsde-methods/terra-jupyter-gpu-dev-base:0.0.1

Tested:

  • If I do nvidia-docker run --rm -it --entrypoint /bin/bash terra-jupyter-gpu-dev-base:0.0.1 and then, inside the container, I run conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch to install pytorch, then I can do python -c "import torch; print(torch.cuda.is_available())" and I get True
    • I can also successfully run nvidia-smi
    • This means the GPU is ready to roll
  • Also made a Terra Cloud Environment successfully (with a GPU), and all the Jupyter stuff still works as above
    • nvidia-smi also works in Terra, meaning the GPU is good to go

[screenshot]

I always build GPU-enabled images from the NVIDIA Docker image catalog (e.g. FROM nvcr.io/nvidia/cuda:11.3.1-base-ubuntu20.04). They're great, and they're way smaller than the Google deep learning images. And this way, neither PyTorch nor TensorFlow is pre-installed. People can build an image on top of this that installs either of those things, but they won't have stuff they don't need.


NOTE: the two new images in this PR are identical except for the first FROM line. This could easily be refactored so that there's only one Dockerfile and two separate build_docker.sh files. (It only works because the NVIDIA base image is also built on ubuntu:20.04.)
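For example, roughly (just a sketch of the idea; the actual refactor might differ):

# Single shared Dockerfile, parameterized on the base image
ARG BASE_IMAGE=ubuntu:20.04
FROM ${BASE_IMAGE}
# ...everything else identical...

and then each build_docker.sh passes its own base, e.g.

docker build --build-arg BASE_IMAGE=nvcr.io/nvidia/cuda:11.3.1-base-ubuntu20.04 -t terra-jupyter-gpu-dev-base:0.0.1 .

(build context and remaining flags as in the existing build_docker.sh).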

@Qi77Qi
Collaborator

Qi77Qi commented Aug 9, 2022

@Qi77Qi is that something other than these?
https://github.com/sjfleming/terra-docker/blob/186a8b7225a04a2a704f26bbb939fdf4bd3e03b8/terra-jupyter-dev-base/Dockerfile#L140-L143

Is that actually working? You don't have a scripts dir in terra-jupyter-dev-base to begin with...

Regardless, though, I think we're not planning to advertise these images to other users any time soon (if at all)... so it's really just for your own use case for now. If things are working for you, I don't mind merging them as-is (adding a comment like "this image is not officially supported, use at your own risk" might be good in case others start using them). Bear in mind, it's additional work to add automation for building these 2 images, so you'll need to build them manually until we figure out whether this is something we want to support.

@sjfleming
Author

Hi @Qi77Qi, yes, the reason I don't need to have the scripts copied into the same terra-jupyter-dev-base directory is that in the build script build_docker.sh, I am setting the build context to the terra-jupyter-base directory, which contains the scripts, so Docker can COPY them during the build:
https://github.com/sjfleming/terra-docker/blob/c5891655b6821b00e70e77e5b28f01c287bdfd71/terra-jupyter-dev-base/build_docker.sh#L3
It's the ../terra-jupyter-base part at the end that sets the build context.
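(In docker build terms, the effective invocation is something along the lines of

docker build -f Dockerfile -t terra-jupyter-dev-base:0.0.1 ../terra-jupyter-base

run from inside terra-jupyter-dev-base/; the exact flags in the script may differ. The point is just that the final argument, the build context, is ../terra-jupyter-base, so COPY statements resolve against terra-jupyter-base/.)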

That sounds good to me! I'm sure building manually is fine for now. If anybody else ends up using these images in the future, then maybe automation could be revisited.

@Qi77Qi
Collaborator

Qi77Qi commented Aug 10, 2022

Ah, I see... that makes sense. Whenever you're ready, feel free to mark this as non-draft.

@sjfleming
Author

Okay, thanks Qi! I will do a little bit of cleanup:

  • refactor the GPU image to use the non-GPU Dockerfile with an input argument for the base image (since they are currently identical)
  • use two requirements files for the Python package installs, and pin all the package versions (one possible layout is sketched after this list)
    • requirements.txt with all the "normal" Python packages
    • requirements_gcc.txt with the Python packages that require gcc in order to build (firecloud and terra-notebook-utils)
      • also explore whether we can get around the gcc requirement in the first place by doing a conda install instead of a pip install
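One way the requirements split might look in the Dockerfile (a sketch under the assumptions above, not a finished design):

COPY requirements.txt requirements_gcc.txt /tmp/
# "normal" packages, no compiler needed
RUN pip3 install --no-cache-dir -r /tmp/requirements.txt
# packages needing gcc to build (firecloud, terra-notebook-utils):
# install gcc temporarily and remove it afterwards to keep the image small
RUN apt-get update \
 && apt-get install -yq --no-install-recommends gcc \
 && pip3 install --no-cache-dir -r /tmp/requirements_gcc.txt \
 && apt-get remove -yq gcc \
 && apt-get autoremove -yq \
 && rm -rf /var/lib/apt/lists/*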

@sjfleming
Author

Turns out I was not including the Google Cloud SDK, but I realize that's 100% necessary. (I'm guessing it's part of the TensorFlow base image, so I didn't realize I was missing it at first, since it's not explicit in the "base" Dockerfile.)

That will increase the size of the image a bit.

@sjfleming
Author

I am a bit confused about how cloud file syncing (syncing the notebook .ipynb file to the project's Google bucket) was working if my image did not have gsutil. I guess there's another image running on the VM that takes care of that kind of copy operation?

@sjfleming
Author

With gsutil, the non-GPU image is now 1.96 GB.

Feel free to experiment with it here (I've renamed it):

us.gcr.io/broad-dsde-methods/terra-jupyter-minimal-base:0.0.1

@sjfleming
Author

And the GPU-enabled image is 2.02 GB.

It is hosted here temporarily:

us.gcr.io/broad-dsde-methods/terra-jupyter-minimal-gpu-base:0.0.1

(Of course this doesn't have PyTorch or TensorFlow installed, which is why it seems small.)

@sjfleming
Author

I installed gsutil to the home directory... that turned out to be a bad idea, since I think everything in there gets clobbered when the VM gets set up. Let me fix that.

@sjfleming
Author

It seems like $PATH is getting overwritten by some step in this process. When I run an image locally, I see the $PATH I have specified in my docker build:

/opt/gcloud/google-cloud-sdk/bin:/opt/conda/bin:/home/jupyter/.local/bin:/home/jupyter/packages/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

But then when I actually use the image to start a Cloud Environment in Terra, I see something different. I always see the exact same $PATH, regardless of my Dockerfile, and it is the same for official images as well, like the GATK image:

/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/jupyter/.local/bin:/home/jupyter/packages/bin

@Qi77Qi is this expected behavior?

(I'm trying to fix a gsutil command not found error I am seeing in a Cloud Environment, and it seems to come down to the above problem. For now, I will try to sidestep this by installing somewhere on that fixed PATH, if I can figure out how)
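One possible way to sidestep it, sketched here only as an idea: leave the SDK where it is and symlink its binaries into a directory that is part of the PATH Terra sets, e.g.

RUN ln -s /opt/gcloud/google-cloud-sdk/bin/gcloud /usr/local/bin/gcloud \
 && ln -s /opt/gcloud/google-cloud-sdk/bin/gsutil /usr/local/bin/gsutil

since /usr/local/bin survives in that fixed PATH.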

@sjfleming
Author

Tested the images, and they seem to work, but I might have found one issue:
when I try

import os

os.environ['WORKSPACE_BUCKET']

I see '.'
while for other Cloud Environments I have built previously with an official Terra image, I see something more reasonable that's an actual bucket path.

Where / how does the WORKSPACE_BUCKET environment variable get set?

Other environment variables all seem to be okay; for instance

os.environ['WORKSPACE_NAMESPACE']

returns 'bayer-pcl-cell-imaging', which is correct.
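(For anyone comparing environments, a quick way to dump all the workspace-related variables in a notebook cell:

import os
print({k: v for k, v in os.environ.items() if k.startswith('WORKSPACE')})

which should show WORKSPACE_BUCKET, WORKSPACE_NAME, WORKSPACE_NAMESPACE, etc., if they are set.)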


This is now ready for review @Qi77Qi @jdcanas

There are two new images included in this PR:

  • terra-jupyter-minimal-base
  • terra-jupyter-minimal-gpu-base

Each can be built using the build_docker.sh script in its respective folder.

@sjfleming marked this pull request as ready for review September 1, 2022 20:52
@thouis

thouis commented Jun 21, 2023

Did this development die, or is it continuing elsewhere? Being able to start with a minimal Terra install for custom images would be quite useful.

@sjfleming
Author

Hey @thouis , it certainly didn’t die from my perspective. But it looks like we haven’t made any progress toward merging this PR or making the minimal base images “official”. I’d still love to see it happen. Maybe your interest will help revitalize the effort.

@sjfleming
Author

If you want, feel free to try

us.gcr.io/broad-dsde-methods/terra-jupyter-minimal-gpu-base:0.0.1

from this PR. That’s what I’m using now as the base for images I build for Terra. (As with any custom image, I find it’s necessary to extend the default “timeout” for Cloud Environment creation to much longer than 10 mins.)

The only lingering problem is this (#346 (comment)) but I’ve been able to work around it. I think I would need help from the experts to solve the issue.

@thouis

thouis commented Jun 22, 2023 via email

@sjfleming
Author

Hi Ray, I would also like to update this to CUDA 11.8 and Python 3.8, since Python 3.8 is now needed for PyTorch 2.0+, which is the ecosystem I work in. It's been quite a while since I touched this.

If you have made those changes, I'd be happy to merge them into my branch if you wanted them to become part of this PR. Or I could incorporate the changes on my side if that's easier. Was it just a matter of using a different base image and then using a different miniconda version? And everything else still worked?

So the comment above is, I believe, an isolated issue that doesn't indicate deeper concerns. But that's me guessing! I've tested out other stuff pretty extensively, just by using it on Terra, and everything seems to work just fine. I think what's going on is that somewhere deep in the Leonardo or Welder codebase (which I'm just not familiar with at all), the environment variables in any image run on Terra are being overwritten, so that users can access things like the workspace name and the bucket. I have no idea why it's not working as expected here, and only for the bucket environment variable. The workspace name environment variable still works! It really does seem to be just the bucket. It does bug me, though, but I can't figure it out myself.

Somebody from the Interactive Analysis DSP team please correct me if I'm wrong! @rtitle do you know how these environment variables get set?

@thouis

thouis commented Jun 22, 2023

My changes were:

  • Use these instead:
    nvcr.io/nvidia/cuda:11.8.0-base-ubuntu20.04
    Miniconda3-py38_23.3.1-0-Linux-x86_64.sh
  • For cromshell, I had to change to this (the tags have changed?):
    git clone --branch 2.0.0 --single-branch https://github.com/broadinstitute/cromshell.git $HOME/cromshell
  • Also, just after installing Miniconda, I install mamba (a much faster drop-in replacement for conda):
    conda install -y mamba -c conda-forge
  • And I added this at the end (because mamba was complaining about compatible versions):
    RUN pip3 install -U requests
  • Then I was able to use this to get torch:
    RUN mamba install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia

@sjfleming
Author

sjfleming commented Jun 22, 2023

Okay thanks! I'll update this. I'm conflicted as to whether to pre-install mamba. I have heard people like it a lot. Then again, I want to stick to the philosophy of having this be as "minimal" as possible.

But it probably doesn't hurt to have it.

@thouis

thouis commented Jun 22, 2023 via email

sjfleming and others added 4 commits August 30, 2023 17:13
* Python 3.10 and suggested updates from Ray Jones; install mamba

* Bump to version 0.0.2

* Update google-cloud-cli version

* Install compiled crcmod

* Update package versions and install libarchive for mamba
@sjfleming
Author

Alright @thouis, I have taken your advice; thank you for it. And actually, I went ahead and upgraded to Python 3.10, since that's what all the other Terra Docker images are using now (I think). I also installed mamba. Package versions across the board are now pinned to more up-to-date versions as well.

The images are currently hosted here, if anyone is interested:

us.gcr.io/broad-dsde-methods/terra-jupyter-minimal-base:0.0.2
us.gcr.io/broad-dsde-methods/terra-jupyter-minimal-gpu-base:0.0.2

sizes have increased slightly:

us.gcr.io/broad-dsde-methods/terra-jupyter-minimal-gpu-base   0.0.2       1.94GB
us.gcr.io/broad-dsde-methods/terra-jupyter-minimal-base       0.0.2       1.8 GB

These images can successfully create a Terra Cloud Environment that seems to function normally (still barring the comment above about missing the environment variable WORKSPACE_BUCKET).

@sjfleming
Author

sjfleming commented Aug 31, 2023

Compressed sizes:

us.gcr.io/broad-dsde-methods/terra-jupyter-minimal-base:0.0.2         674.4 MB
us.gcr.io/broad-dsde-methods/terra-jupyter-minimal-gpu-base:0.0.2     724.8 MB

@nick-youngblut

Any status updates on merging terra-jupyter-minimal-base? My group also needs Python 3.8 (as with @thouis) for some specific bfx software.

If anything, we could probably modify us.gcr.io/broad-dsde-methods/terra-jupyter-minimal-base:0.0.2 from this PR in order to use Python 3.8 instead of Python 3.10, but it would be great if there were more support for each major Python version.

Also, it was a bit of a treasure hunt to find this PR, since this info isn't in the Terra-Bio docs, AFAIK.

@sjfleming
Author

@nick-youngblut I think the official effort toward making this happen has moved to this PR:
#463
