Cluster

Table of Contents

add_input_file
assign_cluster_work
construct_work_for_sim
get_work
Job Broker
open_output_file
open_staging_file
register_worker
setup_sim_job

Job Broker

Perl code for job broker functionality.

A broker instance maintains a database of jobs. Each job has a description
of some sort (heh), including a field defining the type of job. The plan
at this point is not to generalize the job type management, but rather to
include a job type in the job table and use a separate table to hold
the individual pieces of work for each of the job types (sim-computation, scopmap, etc.).

A complication that might occur is that we wish to be able to allocate a piece
of work with a single query: "get me the next available piece of work and mark
that piece as being worked on". If jobs are kept in multiple tables, this becomes
difficult. We may be able to get around this by keeping a single work table
that refers, for each piece of work, to the work type and identifier for that
workpiece.

register_worker

Register this worker with the task manager. It will return a worker
ID for use in future calls. We pass the cluster name here as well,
and the cluster ID is also returned.

get_work

Retrieve the next piece of work.

We first want to retrieve work that is part of a job that does not have any (remaining)
cluster work items. If there is no such work, attempt to retrieve a piece of
cluster work. If there is none, return a wait code.

We begin the process by creating two lists. First, a list of jobs that have cluster work to be done
(either available or currently being worked upon) on MY_CLUSTER_ID (the ID of the cluster that
the worker in question belongs to):

Second, a list of jobs that have noncluster work to do:

We can now define our policy. The ORDER BY clauses, together with the
sequential allocation of job and work identifiers by the
auto-incrementing table keys, enforce a FIFO ordering on jobs and work
units. We choose work by picking the lowest-numbered job between the
two lists.

In other words, if we have a job with noncluster work available with a
lower jobid than another job with cluster work available, we will
allocate first to the lower job.

Similarly, if there is cluster work ready to be done for the cluster
the current worker is part of, we prefer doing that over working on
another job that has work to be done but a larger jobid.

Note that the two queries above do not take into account work
that is currently in progress. We must account for the following case:

If there is no cluster work available for a particular job, and there is
noncluster work available for that job, we must check that the cluster
work for that job is actually finished:

We require that the count above be zero; otherwise we cannot allocate
work out of SOMEJOB. If additional jobs are
available, it is then possible to allocate work out of them, following
the rules above.

We can use grouping to determine this status in fewer queries:

This gives us the complete status of cluster work for my cluster id.

setup_sim_job

Set up for a new similarity computation.

We are given an NR filename, a fasta input filename, a chunk size,
and an optional BLAST threshold.

We create a new sim_job record for this job, and create a spool directory
for it. The NR and fasta input are copied to the spool directory.

The fasta is carved up into chunk_size-sized blocks of sequences. js_work and js_sim_work
records are created for each of the blocks.

assign_cluster_work

Assign a piece of cluster work from job $job_id to worker $worker_id.

Construct the work assignment to be returned from a get_work
request. This routine is given all of the particulars for a piece
of work, including the name of the worktype. We attempt to create
the return by invoking $self->construct_work_for_TYPE.

(Better design likely needed for this, but this is proof-of-principle code.)

$actual_work_id is the work_id that has the type-specific work attached. This
will be different from $work_id in the case of cluster work, where a base
piece of work is snapshotted for each cluster; the type-specific work information
remains attached to the base work.

construct_work_for_sim

Job-specific work return method.

open_staging_file

Open a filehandle to the file we are staging to a client.

We are given the job_id and work_id that define the file. If there are any
problems, return undef (or die if there is an error message).

open_output_file

Open a filehandle to the file we are writing as output from a worker.

We are given the job_id and work_id that define the file. If there are any
problems, return undef (or die if there is an error message).

add_input_file

Add a file to the set of files to be staged for input to a job. Returns the file id.
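
Example: get_work allocation policy

The allocation policy described under get_work can be sketched in plain Perl, without a database. This is only one reading of the rules given there, not the broker's actual code. The choose_work routine, the %status hash, and its field names (cluster_avail, cluster_in_progress, noncluster_avail) are hypothetical stand-ins for whatever the grouped status query returns for the worker's cluster.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the grouped status query: for each
# job id, counts of cluster work (for this worker's cluster) and of
# noncluster work. The field names are illustrative, not the broker's schema.
sub choose_work {
    my ($status) = @_;

    # Ascending job id gives the FIFO ordering over jobs.
    for my $job_id (sort { $a <=> $b } keys %$status) {
        my $s = $status->{$job_id};

        # Cluster work for this worker's cluster is ready: take it.
        return +{ kind => 'cluster', job_id => $job_id }
            if ($s->{cluster_avail} // 0) > 0;

        # Noncluster work may be handed out only once no cluster work
        # for the job is still in progress.
        if (($s->{noncluster_avail} // 0) > 0) {
            return +{ kind => 'noncluster', job_id => $job_id }
                if ($s->{cluster_in_progress} // 0) == 0;
        }
        # Otherwise this job is blocked or empty; try the next job.
    }

    # Nothing can be allocated right now.
    return +{ kind => 'wait' };
}

my %status = (
    7  => { cluster_avail => 0, cluster_in_progress => 2, noncluster_avail => 5 },
    9  => { cluster_avail => 0, cluster_in_progress => 0, noncluster_avail => 3 },
    12 => { cluster_avail => 4, cluster_in_progress => 0, noncluster_avail => 0 },
);

my $pick = choose_work(\%status);
print "$pick->{kind} ", ($pick->{job_id} // '-'), "\n";   # prints "noncluster 9"
```

Job 7 is skipped because its cluster work is still in progress, so the noncluster work of job 9 is allocated ahead of the cluster work of job 12. In the real broker the selection and the "mark as being worked on" update would presumably need to happen atomically; the sketch ignores that.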