diff --git a/README.md b/README.md
index fd58623..59386fa 100644
--- a/README.md
+++ b/README.md
@@ -131,14 +131,16 @@ Then, launch Firefox web browser using the configured profile. Open `localhost:8
 
 ## Launch a Ray Cluster in Batch Mode
 
-1. Add execution permissions to `start-ray-worker.sh`
+Let us [estimate the value of π](https://docs.ray.io/en/releases-2.8.0/ray-core/examples/monte_carlo_pi.html) as an example application.
+
+**Step 1.** Add execution permissions to `start-ray-worker.sh`
 
 ```bash
 cd deployment_scripts
 chmod +x start-ray-worker.sh
 ```
 
-2. Submit a job to launch the head and worker nodes.
+**Step 2.** Submit a job to launch the head and worker nodes.
 
 You must modify the following lines in `submit-ray-job.sh`:
 - Line 3 changes the cluster size. The default configuration launches a 3 node cluster.
@@ -155,4 +157,6 @@ qstat -anw # Q: Queued, R: Running, E: Ending
 ls -l # list files after the job finishes
 cat ray-job.o... # inspect the output file
 cat ray-job.e... # inspect the error file
-```
\ No newline at end of file
+```
+
+If you need to delete the job, use `qdel `. If this doesn't work, use the `-W force` option: `qdel -W force `
\ No newline at end of file
diff --git a/deployment_scripts/submit-ray-job.pbs b/deployment_scripts/submit-ray-job.pbs
index e0b6caa..0083d64 100644
--- a/deployment_scripts/submit-ray-job.pbs
+++ b/deployment_scripts/submit-ray-job.pbs
@@ -40,7 +40,7 @@ ray start --disable-usage-stats \
 export NUM_NODES=$(sort $PBS_NODEFILE |uniq | wc -l)
 
 for ((i=1;i<$NUM_NODES;i++)); do
-    pbsdsh -n $i -- bash -l -c "'$DEPLOYMENT_SCRIPTS/ray-start-worker.sh' '$WS_DIR' '$ENV_ARCHIVE' '$RAY_ADDRESS' '$REDIS_PASSWORD' '$OBJECT_STORE_MEMORY'" &
+    pbsdsh -n $i -- bash -l -c "'$DEPLOYMENT_SCRIPTS/start-ray-worker.sh' '$WS_DIR' '$ENV_ARCHIVE' '$RAY_ADDRESS' '$REDIS_PASSWORD' '$OBJECT_STORE_MEMORY'" &
 done
 
 python3 $PYTHON_FILE
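The worker loop in `submit-ray-job.pbs` counts the unique hosts in `$PBS_NODEFILE` and then launches `start-ray-worker.sh` for indices `1` through `NUM_NODES - 1`. Index 0 is skipped, presumably because the head node is started by the `ray start` call earlier in the script. The following is a minimal Python sketch of that counting and indexing logic only. The function name `worker_indices` and the sample host names are hypothetical, and the sketch does not call `pbsdsh` or Ray.

```python
def worker_indices(nodefile_lines):
    """Mirror `sort $PBS_NODEFILE | uniq | wc -l` and the
    `for ((i=1; i<NUM_NODES; i++))` loop from the PBS script."""
    # A set keeps one entry per host, like `sort | uniq`.
    # Blank lines are ignored here (an assumption; a real node file has none).
    hosts = {line.strip() for line in nodefile_lines if line.strip()}
    num_nodes = len(hosts)
    # Indices handed to `pbsdsh -n`; index 0 is left out of the loop.
    return num_nodes, list(range(1, num_nodes))


# Hypothetical node file contents: three hosts, some listed more than once.
sample = ["node01", "node01", "node02", "node03", "node03"]
num_nodes, workers = worker_indices(sample)
print(num_nodes, workers)
```

With a single-host allocation the list of worker indices is empty, so the loop body never runs and only the head node is started.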