1. Request a BioHPC account at https://biohpc.cornell.edu/lab/lab.aspx
2. Add or remove users from the mongo-db access list
Any user on the access list can add or remove users:
[root@cbsujohnson ~]# docker1 access list
Container UserID
infant-mongo-db ad596
infant-mongo-db sw835
infant-mongo-db ap689
[root@cbsujohnson ~]#
To add or remove users, use the command:
[root@cbsujohnson docker]# docker1 access
docker1 access list
prints current access list
docker1 access remove <container name> <user id>
Removes access for <user id> to <container name>
docker1 access add <container name> <user id>
Adds access for <user id> to <container name>
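To add several users at once, the add command can be wrapped in a loop. This is a minimal sketch, not a BioHPC-provided tool: grant_access, CONTAINER, and DRY_RUN are illustrative names, and netid1 and netid2 are placeholder user IDs. With DRY_RUN=1 it only prints the commands it would run.

```shell
# Sketch only: CONTAINER, DRY_RUN, and grant_access are illustrative names.
CONTAINER="infant-mongo-db"
DRY_RUN=1   # set to 0 to actually call docker1

grant_access() {
    local user
    for user in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            # Print the command instead of running it.
            echo "docker1 access add $CONTAINER $user"
        else
            docker1 access add "$CONTAINER" "$user"
        fi
    done
}

# Placeholder user IDs; replace with real ones.
grant_access netid1 netid2
```

Check the printed commands first, then set DRY_RUN=0 and confirm the result with docker1 access list.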
3. Steps to get access to the data pipeline
a. at your computer terminal, type: ssh yournetid@cbsujohnson.biohpc.cornell.edu
b. enter your password
c. type: screen (enter screen session)
d. run the following command:
docker1 run \
--volume "/workdir/yournetid/get_colors:/home/jovyan/work" \
-p 8031:8888 \
--env "GRANT_SUDO=yes" \
--env "PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
--env "DEBIAN_FRONTEND=noninteractive" \
--env "CONDA_DIR=/opt/conda" \
--env "SHELL=/bin/bash" \
--env "NB_USER=jovyan" \
--env "NB_UID=1000" \
--env "NB_GID=100" \
--env "LC_ALL=en_US.UTF-8" \
--env "LANG=en_US.UTF-8" \
--env "LANGUAGE=en_US.UTF-8" \
--env "HOME=/home/jovyan" \
--env "XDG_CACHE_HOME=/home/jovyan/.cache/" \
--label "maintainer"="Jupyter Project " \
--interactive \
"jupyter/tensorflow-notebook:20221209" \
"start-notebook.sh"
If you lack permission to edit or run files after opening the Jupyter Notebook,
run this command before running the docker1 command:
chmod -R a+rwX /workdir/yournetid/get_colors
The output will include a URL with a token, like this: http://127.0.0.1:8888/lab?token=c7007fb9f1a6aea3c4685328c35741c968d2e24e8821ce2d
Copy the part after token= and use it in place of XXX in:
http://cbsujohnson.biohpc.cornell.edu:8031/lab?token=XXX
In this case, it would be:
http://cbsujohnson.biohpc.cornell.edu:8031/lab?token=c7007fb9f1a6aea3c4685328c35741c968d2e24e8821ce2d
Open this link in your browser to see the Jupyter Notebook page.
e. press Ctrl-a, release, then press d (detach from the screen session)
Because the notebook runs inside a screen session, it should keep running after you disconnect (assuming it doesn't crash) and should only need to be restarted when the server is rebooted. Using the screen command is good practice for any command that you don't want killed if you get disconnected from the server.
f. the next time you log in, type: screen -r (resume the session)
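The URL rewrite in step d can also be scripted. The following is a minimal sketch, assuming the server name and port 8031 used above; make_server_url is a hypothetical helper name, not part of docker1 or Jupyter.

```shell
# Sketch only: SERVER and PORT mirror the values used in this guide.
# Change PORT if you mapped a different one with -p.
SERVER="cbsujohnson.biohpc.cornell.edu"
PORT=8031

make_server_url() {
    local local_url="$1"
    # Keep only the text that follows "token=".
    local token="${local_url##*token=}"
    echo "http://${SERVER}:${PORT}/lab?token=${token}"
}

# Paste the URL printed by the notebook as the argument.
make_server_url "http://127.0.0.1:8888/lab?token=c7007fb9f1a6aea3c4685328c35741c968d2e24e8821ce2d"
```

The function only does string substitution, so it does not check that the token is still valid or that the notebook is running.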
4. Steps to run the data pipeline in Jupyter Notebook
a. Run the pipeline:
Run extract_colors_pipeline.py in run_extract_colors_pipeline.ipynb
b. Store the results in the output_colors folder to Backend Database:
Run store_backend.py in run_store_backend.ipynb
5. More on the screen command and providing access for other team members
a. More details on the screen command are in this pdf:
https://biohpc.cornell.edu/lab/doc/Linux_exercise_part2.pdf
b. Sharing with other team members:
- They can visit the same URL as you, but they will be working on the exact same files. If multiple people work on the same notebook at the same time, they can overwrite each other's work or interfere with each other's sessions.
- If you prefer that each lab member has their own Jupyter Notebook environment, they can run the same 'docker1 run' command as in step 3d, with a different port number (if you are using 8031, anything else between 8009 and 8039 should work) and their own NetID in place of yours. Then contact BioHPC to fix the ownership of the files, as that requires admin privileges.
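Only two flags of the docker1 command in step 3d differ per user: the --volume path and the host port in -p. The sketch below prints those two flags for a given NetID and port and rejects ports outside the 8009-8039 range quoted above. per_user_flags is a hypothetical helper name and ab123 is a placeholder NetID; the script does not check whether the port is already in use on the server.

```shell
# Sketch only: prints the two per-user flags for the docker1 command in step 3d.
per_user_flags() {
    local netid="$1"
    local port="$2"
    # Reject empty or non-numeric ports.
    case "$port" in
        ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
    esac
    # The range is the one quoted in this guide.
    if [ "$port" -lt 8009 ] || [ "$port" -gt 8039 ]; then
        echo "port $port is outside 8009-8039" >&2
        return 1
    fi
    echo "--volume \"/workdir/${netid}/get_colors:/home/jovyan/work\" -p ${port}:8888"
}

# Placeholder NetID and port.
per_user_flags ab123 8032
```

The remaining flags of the command stay the same for every user.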
6. Steps to edit and run the data pipeline in the Jupyter Notebook
- Edit in the extract_colors_pipeline.py file
- Execute %run extract_colors_pipeline.py in extract_colors_pipeline.ipynb
7. Steps to edit and run the data pipeline in your computer terminal:
(in case the Jupyter Notebook does not work)
a. cd /workdir/yournetid/get_colors
b. vim /local/workdir/yournetid/get_colors/extract_colors_pipeline.py
c. search mode (find the place you want to edit): type / followed by the search text, then press Enter
d. edit mode: type i (insert)
e. save and quit: press Esc, type :wq (colon, then wq), then press Enter
f. to run: type python /local/workdir/yournetid/get_colors/extract_colors_pipeline.py
PS: All the steps above require access to the Cornell network. If you are outside Cornell University, connect through the VPN first: https://it.cornell.edu/cuvpn
Google Doc version of this documentation: https://docs.google.com/document/d/10mJzJUvUKSlWoK7e-n8yJ-766NdSm0LfIWmDjoRTBQ8/edit