Large compute jobs (the jobs option)

Splitting your input into multiple parts

Given your fasta files in infile/*.fasta, first run step 1 (the generation of the index files):

perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -step=1 infile/*.fasta

Then each of the following commands generates one tenth of step 2 (the RBH generation):

perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -jobs=1/10 -step=2 infile/*.fasta
perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -jobs=2/10 -step=2 infile/*.fasta
perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -jobs=3/10 -step=2 infile/*.fasta
perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -jobs=4/10 -step=2 infile/*.fasta
perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -jobs=5/10 -step=2 infile/*.fasta
perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -jobs=6/10 -step=2 infile/*.fasta
perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -jobs=7/10 -step=2 infile/*.fasta
perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -jobs=8/10 -step=2 infile/*.fasta
perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -jobs=9/10 -step=2 infile/*.fasta
perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -jobs=10/10 -step=2 infile/*.fasta
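If you run all parts on a single machine, the ten calls can also be written as a loop (a minimal sketch; the parts run one after another here, submit them to a scheduler as shown further below for real parallelism):

for i in $(seq 1 10); do
    perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -jobs=$i/10 -step=2 infile/*.fasta
done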

!! Don't use the -keep option here, as it only dramatically increases the I/O usage !!

These commands can be run on different machines at different times. The output blast-graphs are then numbered by job:

test.blast-graph_1_10, test.blast-graph_2_10, ..., test.blast-graph_10_10
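Before starting step 3 you can check that all parts are present (a minimal sketch; for the example above the count should be 10):

ls test.blast-graph_*_10 | wc -l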

What to do with the test.blast-graph_1_10, test.blast-graph_2_10, ... files?

Proteinortho can use the part files for step 3 without any problem (just pass the same -project= name as in step 2). So for the example above, use

perl /home/klemmp/proteinortho-master/proteinortho.pl -project=test -step=3

Or you can simply concatenate the test.blast-graph_1_10, test.blast-graph_2_10, ... files into a single blast-graph file.
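A minimal sketch of the concatenation (assuming, as stated above, that the part files can be merged as-is; the glob picks up all ten parts of the example):

cat test.blast-graph_*_10 > test.blast-graph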

qsub (MARC2) script

A qsub script for deploying one job (on the MARC2 cluster) with 64 cores, working on the first tenth (jobs=1/10) of the input files:

#$ -S /bin/bash
#$ -e /home/klemmp/sge
#$ -o /home/klemmp/sge
#$ -l h_rt=200000
#$ -l h_vmem=2G
#$ -pe orte_sl64* 64
#$ -cwd
#$ -N q_test

. /etc/profile.d/modules.sh
module purge
module load gcc/6.3.0

# set the project name and the search program before submitting
# (test and mmseqsp are just example values from this page)
projectname=test
p=mmseqsp

/usr/bin/time -f "%e,%M" perl /home/klemmp/proteinortho-master/proteinortho.pl -jobs=1/10 -project=$projectname -step=2 -cpus=64 -binpath=/home/klemmp/bin -p=$p infile/*.fasta -tmp=$TMPDIR >"/scratch/klemmp/stdout" 2>"/scratch/klemmp/stderr"
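To submit the script (assuming it is saved as q_test, matching the -N name above), call qsub; with a full 64-core job it helps to add -R y (slot reservation), as the generator script below does:

qsub -R y q_test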

bash script for generating qsub (MARC2) scripts

A bash script for deploying 10 distinct jobs (on the MARC2 cluster), each with 64 cores, working on the same input files but on different parts:

#!/bin/bash

p=mmseqsp
cores=64
projectname="$p"_ob"$cores"
infile=/scratch/klemmp/fasta_2017
numofjobs=10

mkdir /scratch/klemmp/$projectname
cd /scratch/klemmp/$projectname

for i in `seq 1 $numofjobs`
do
echo "#\$ -S /bin/bash
#\$ -e /home/klemmp/sge
#\$ -o /home/klemmp/sge
#\$ -l h_rt=200000
#\$ -l h_vmem=2G
#\$ -pe orte_sl$cores* $cores
#\$ -cwd
#\$ -N q"$i"_"$p"

. /etc/profile.d/modules.sh
module purge
module load gcc/6.3.0

mkdir $i
cd $i

/usr/bin/time -f \"%e,%M\" perl /home/klemmp/proteinortho-master/proteinortho.pl -jobs=$i/$numofjobs -project=$projectname -step=2 -cpus=$cores -binpath=/home/klemmp/bin -p=$p $infile/*.fasta -tmp=\$TMPDIR >\"/scratch/klemmp/$projectname.$i.stdout\" 2>\"/scratch/klemmp/$projectname.$i.stderr\"
">q_$i

if [ $cores -eq 64 ]; then
        qsub -R y q_$i
else
        qsub q_$i
fi
done
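Saved for example as deploy.sh (the file name is arbitrary, not from the original), a single call writes the job scripts q_1 ... q_10 into /scratch/klemmp/$projectname and submits each of them, with -R y (slot reservation) for the full 64-core case:

bash deploy.sh

Each generated job works in its own subdirectory ($i), so the parts do not interfere with each other's intermediate files.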