Common Issues and Solutions for the SLURM Scheduler
Once you have been using the cluster for a while, the most time-consuming problems are usually not about how to submit jobs. They are jobs that queue abnormally, mismatched resource requests, environment initialization failures, and scripts that show no errors but still will not run.
Keep this material handy as a starting point for troubleshooting. When a problem comes up, first check the common symptoms and how to handle them. This is often faster than repeated trial and error, and it makes it easier to tell whether the issue lies in the script, the environment, or the resource request itself.