multiple node test successful

Kerem Kayabay 2024-01-05 16:25:02 +01:00
parent ef8058ea34
commit 5a8bf27936
2 changed files with 8 additions and 4 deletions

@@ -131,14 +131,16 @@ Then, launch Firefox web browser using the configured profile. Open `localhost:8
 ## Launch a Ray Cluster in Batch Mode
-1. Add execution permissions to `start-ray-worker.sh`
+Let us [estimate the value of π](https://docs.ray.io/en/releases-2.8.0/ray-core/examples/monte_carlo_pi.html) as an example application.
+**Step 1.** Add execution permissions to `start-ray-worker.sh`
 ```bash
 cd deployment_scripts
 chmod +x start-ray-worker.sh
 ```
-2. Submit a job to launch the head and worker nodes.
+**Step 2.** Submit a job to launch the head and worker nodes.
 You must modify the following lines in `submit-ray-job.sh`:
 - Line 3 changes the cluster size. The default configuration launches a 3-node cluster.
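Step 1 is needed because the head job runs `start-ray-worker.sh` on each worker node by its path (the `pbsdsh` loop in the second changed file), so the file must carry the execute bit. The following is a minimal pre-flight sketch, assuming it is run from the repository root; the function name `check_worker_script` is hypothetical and not part of the repository:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the repository): confirm the
# worker script exists and is executable before submitting the job, since
# the head node invokes it by path through pbsdsh.
check_worker_script() {
  local script="$1"
  if [[ ! -f "$script" ]]; then
    echo "missing: $script" >&2
    return 1
  fi
  if [[ ! -x "$script" ]]; then
    echo "not executable (run: chmod +x $script)" >&2
    return 2
  fi
  echo "ok: $script"
}
```

For example, `check_worker_script deployment_scripts/start-ray-worker.sh` before submitting the job returns 1 if the file is missing and 2 if the execute bit is not set.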
@@ -155,4 +157,6 @@ qstat -anw # Q: Queued, R: Running, E: Ending
 ls -l # list files after the job finishes
 cat ray-job.o... # inspect the output file
 cat ray-job.e... # inspect the error file
-```
+```
+
+If you need to delete the job, use `qdel <job-id>`. If this doesn't work, use the `-W force` option: `qdel -W force <job-id>`.
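The two-stage deletion described in the new README line can be wrapped in a small helper. This is a sketch, not part of the repository; `delete_job` is a hypothetical name, and it assumes a PBS `qdel` that accepts `-W force`, as the README text does:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): try a normal qdel first
# and fall back to `qdel -W force` only if the plain deletion fails.
delete_job() {
  local job_id="$1"
  if [[ -z "$job_id" ]]; then
    echo "usage: delete_job <job-id>" >&2
    return 64
  fi
  if qdel "$job_id"; then
    echo "deleted $job_id"
  else
    echo "qdel failed; forcing deletion of $job_id" >&2
    qdel -W force "$job_id"   # exit status of the forced attempt is returned
  fi
}
```

Forcing only after a failed plain `qdel` keeps the gentler path as the default, which matches the order the README recommends.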

@@ -40,7 +40,7 @@ ray start --disable-usage-stats \
 export NUM_NODES=$(sort $PBS_NODEFILE |uniq | wc -l)
 for ((i=1;i<$NUM_NODES;i++)); do
-pbsdsh -n $i -- bash -l -c "'$DEPLOYMENT_SCRIPTS/ray-start-worker.sh' '$WS_DIR' '$ENV_ARCHIVE' '$RAY_ADDRESS' '$REDIS_PASSWORD' '$OBJECT_STORE_MEMORY'" &
+pbsdsh -n $i -- bash -l -c "'$DEPLOYMENT_SCRIPTS/start-ray-worker.sh' '$WS_DIR' '$ENV_ARCHIVE' '$RAY_ADDRESS' '$REDIS_PASSWORD' '$OBJECT_STORE_MEMORY'" &
 done
 python3 $PYTHON_FILE
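The worker loop relies on two details: `$PBS_NODEFILE` can list the same host more than once (for example, once per requested process), so `sort | uniq | wc -l` counts distinct hosts, and the loop starts at index 1, presumably because index 0 is the node that already runs the Ray head. The sketch below reproduces that logic against a made-up nodefile, printing the indices instead of calling `pbsdsh`; `count_nodes`, `worker_indices`, and the hostnames are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch of the node-counting logic, run against a fake nodefile.
# The hostnames are made up; in a real PBS job, $PBS_NODEFILE is set by
# the scheduler.
count_nodes() {
  local n
  n=$(sort "$1" | uniq | wc -l)
  echo "$((n))"   # arithmetic expansion drops the padding some wc builds add
}

# Print the node indices that would receive a worker: every index except 0,
# assumed here to be the head node where the job script itself runs.
worker_indices() {
  local n="$1" i
  for ((i = 1; i < n; i++)); do
    echo "$i"
  done
}

nodefile=$(mktemp)
printf '%s\n' node-a node-a node-b node-c node-b > "$nodefile"

NUM_NODES=$(count_nodes "$nodefile")
echo "NUM_NODES=$NUM_NODES"
worker_indices "$NUM_NODES"
rm -f "$nodefile"
```

With the sample file this prints `NUM_NODES=3` followed by `1` and `2`. `sort -u` would give the same count in one step; the sketch keeps `sort | uniq` to mirror the script.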