Access infant-mongo-db & Data Pipeline (Updated Version)

1. Request a BioHPC account at https://biohpc.cornell.edu/lab/lab.aspx

2. Add or remove users from the mongo-db access list

Any user on the access list can add or remove users. To view the current list:

[root@cbsujohnson ~]# docker1 access list

Container UserID

infant-mongo-db ad596

infant-mongo-db sw835

infant-mongo-db ap689

[root@cbsujohnson ~]#

To add or remove users, use the command:

[root@cbsujohnson docker]# docker1 access

docker1 access list

Prints the current access list

docker1 access remove <container name> <user id>

Removes access for <user id> to <container name>

docker1 access add <container name> <user id>

Adds access for <user id> to <container name>
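
As a minimal sketch, several users could be added in one pass by looping over their IDs. The container name and the "docker1 access add" syntax come from the listing above; the grant_access function, the DRY_RUN variable, and the example IDs abc123 and xyz789 are hypothetical. With DRY_RUN left at its default of 1, the commands are only printed so they can be reviewed before anything changes.

```shell
#!/bin/sh
# Sketch only: grant_access, DRY_RUN, and the example IDs are hypothetical.
CONTAINER="infant-mongo-db"

grant_access() {
    # Print (or run) one "docker1 access add" command per user ID argument.
    for user_id in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "docker1 access add ${CONTAINER} ${user_id}"
        else
            docker1 access add "${CONTAINER}" "${user_id}"
        fi
    done
}

# Dry run: prints the two commands without executing them.
grant_access abc123 xyz789
```

Setting DRY_RUN=0 in the environment would make the function run the commands instead of printing them.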

3. Steps to get access to the data pipeline

a. at your computer terminal, type: ssh yournetid@cbsujohnson.biohpc.cornell.edu

b. enter your password

c. type: screen (enter screen session)

d. run the following command:

docker1 run \

--volume "/workdir/yournetid/get_colors:/home/jovyan/work" \

-p 8031:8888 \

--env "GRANT_SUDO=yes" \

--env "PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \

--env "DEBIAN_FRONTEND=noninteractive" \

--env "CONDA_DIR=/opt/conda" \

--env "SHELL=/bin/bash" \

--env "NB_USER=jovyan" \

--env "NB_UID=1000" \

--env "NB_GID=100" \

--env "LC_ALL=en_US.UTF-8" \

--env "LANG=en_US.UTF-8" \

--env "LANGUAGE=en_US.UTF-8" \

--env "HOME=/home/jovyan" \

--env "XDG_CACHE_HOME=/home/jovyan/.cache/" \

--label "maintainer"="Jupyter Project " \

--interactive \

"jupyter/tensorflow-notebook:20221209" \

"start-notebook.sh"

If you don't have permission to edit or run files after opening the Jupyter Notebook, run this command before running the docker1 command:

chmod -R a+rwX /workdir/yournetid/get_colors

The output will include a URL containing a token, like this: http://127.0.0.1:8888/lab?token=c7007fb9f1a6aea3c4685328c35741c968d2e24e8821ce2d

Copy the token (the part after token=) into:

http://cbsujohnson.biohpc.cornell.edu:8031/lab?token=XXX

In this case, it would be:

http://cbsujohnson.biohpc.cornell.edu:8031/lab?token=c7007fb9f1a6aea3c4685328c35741c968d2e24e8821ce2d

Copy this link into your browser to open the Jupyter Notebook page.

e. press Ctrl-a, release, then press d (detach from the screen session)

Because the notebook runs inside a screen session, it should keep running after you disconnect, and should only need to be restarted when the server is rebooted or the notebook crashes. Using the screen command is good practice for any command that you don't want killed if you get disconnected from the server.

f. the next time you log in, type: screen -r (resume the session)
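
For step d above, the browser link can be built from the URL that the notebook prints, as in the following minimal sketch. The host name and the example token are taken from the steps above; the notebook_url function and its two arguments are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch only: notebook_url is a hypothetical helper name.
HOST="cbsujohnson.biohpc.cornell.edu"

notebook_url() {
    # $1: URL printed by the notebook server
    # $2: host port mapped with -p in the docker1 command (8031 above)
    token="${1##*token=}"    # keep only the text after "token="
    echo "http://${HOST}:${2}/lab?token=${token}"
}

notebook_url "http://127.0.0.1:8888/lab?token=c7007fb9f1a6aea3c4685328c35741c968d2e24e8821ce2d" 8031
```

The port argument must match the first number in the -p option (8031 in -p 8031:8888), not the 8888 shown in the printed URL.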

4. Steps to run the data pipeline in Jupyter Notebook

a. Run the pipeline:

Run extract_colors_pipeline.py in run_extract_colors_pipeline.ipynb

b. Store the results from the output_colors folder in the backend database:

Run store_backend.py in run_store_backend.ipynb

5. More on the screen command and giving access to other team members

a. More details on the screen command are in this pdf:

https://biohpc.cornell.edu/lab/doc/Linux_exercise_part2.pdf

b. Sharing with other team members:

Team members can visit the same URL as you, but they will be working on exactly the same files. If several people work on the same notebook at the same time, they can overwrite each other's work or interfere with each other's sessions.
If you prefer each lab member to have their own Jupyter Notebook environment, they can run the same 'docker1 run' command as you, but with a different port number (if you are using 8031, anything else between 8009 and 8039 should work) and with their own NetID in place of yours. Then contact BioHPC to fix the ownership of the files, since that requires admin privileges.
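
The per-user changes described above (a different port and a different NetID) could be sketched as follows. This is only an illustration: the print_run_lines function and the example NetID abc123 are hypothetical, and the function prints the two lines of the docker1 command that differ per user rather than running anything. The port range is the one given in the text above.

```shell
#!/bin/sh
# Sketch only: print_run_lines is a hypothetical helper; it prints and never runs docker1.
print_run_lines() {
    netid="$1"
    port="$2"
    # Reject anything that is not a plain number.
    case "$port" in
        ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
    esac
    # Port range taken from the note above (8009-8039).
    if [ "$port" -lt 8009 ] || [ "$port" -gt 8039 ]; then
        echo "port $port is outside 8009-8039" >&2
        return 1
    fi
    # The two lines of the docker1 run command that change for each user.
    printf '%s\n' \
        "--volume \"/workdir/${netid}/get_colors:/home/jovyan/work\" \\" \
        "-p ${port}:8888 \\"
}

print_run_lines abc123 8032
```

The remaining lines of the command stay the same as in step 3d. The sketch does not check whether another user has already taken the port.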

6. Steps to edit and run the data pipeline in the Jupyter Notebook

a. Edit the extract_colors_pipeline.py file

b. Execute %run extract_colors_pipeline.py in extract_colors_pipeline.ipynb

7. Steps to edit and run the data pipeline in your computer terminal:

(in case the Jupyter Notebook does not work)

a. cd /workdir/yournetid/get_colors

b. vim /local/workdir/yournetid/get_colors/extract_colors_pipeline.py

c. search mode (to find the place you want to edit): type / followed by the search text, then hit enter

d. edit mode: type i (insert)

e. save and quit: press Esc, type :wq (a colon, then wq), then hit enter

f. to run: type python /local/workdir/yournetid/get_colors/extract_colors_pipeline.py

PS: All the steps above require access to the Cornell network. If you are outside Cornell University, connect through the VPN: https://it.cornell.edu/cuvpn

Google Doc version of this documentation: https://docs.google.com/document/d/10mJzJUvUKSlWoK7e-n8yJ-766NdSm0LfIWmDjoRTBQ8/edit
