Package gridmap

Grid Map provides wrappers that simplify submitting jobs to a cluster and collecting their results, in a more 'pythonic' fashion.


Authors:
Christian Widmer, Cheng Soon Ong, Dan Blanchard (dblanchard@ets.org)

Version: 0.9.5

Submodules
  • gridmap.data: This module provides all of the data-related functions for gridmap.
  • gridmap.job: This module provides wrappers that simplify submission and collection of jobs, in a more 'pythonic' fashion.
  • gridmap.runner: This module executes pickled jobs on the cluster.

Classes
  Job
Central entity that wraps a function and its data.
Functions
  process_jobs(jobs, temp_dir=u'/scratch/', wait=True, white_list=None, quiet=True)
Take a list of jobs and process them on the cluster.
  grid_map(f, args_list, cleanup=True, mem_free=u'1G', name=u'gridmap_job', num_slots=1, temp_dir=u'/scratch/', white_list=None, queue='nlp.q', quiet=True)
Maps a function onto the cluster.
  pg_map(f, args_list, cleanup=True, mem_free=u'1G', name=u'gridmap_job', num_slots=1, temp_dir=u'/scratch/', white_list=None, queue='nlp.q', quiet=True)
Deprecated: renamed to grid_map.
Variables
  USE_MEM_FREE = False
Whether your cluster supports specifying how much memory a job will use via mem_free. Can be overridden by setting the GRID_MAP_USE_MEM_FREE environment variable.
  DEFAULT_QUEUE = 'nlp.q'
The default job scheduling queue to use; can be overridden by setting the GRID_MAP_DEFAULT_QUEUE environment variable.
  REDIS_PORT = 7272
The port of the Redis server to use; can be overridden by setting the GRID_MAP_REDIS_PORT environment variable.
  REDIS_DB = 2
The index of the database to select on the Redis server; can be overridden by setting the GRID_MAP_REDIS_DB environment variable.
  MAX_TRIES = 10
Maximum number of times to try to get the output of a job from the Redis database before giving up and assuming the job died before writing its output; can be overridden by setting the GRID_MAP_MAX_TRIES environment variable.
  SLEEP_TIME = 3
Number of seconds to sleep between attempts to retrieve job output from the Redis database; can be overridden by setting the GRID_MAP_SLEEP_TIME environment variable.
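
The interaction between the environment overrides and the MAX_TRIES / SLEEP_TIME retry settings can be sketched as follows. This is an illustration only, not gridmap's actual source: the helper names env_setting and fetch_with_retries are hypothetical, the integer parsing is an assumption, and fetch stands in for whatever lookup is performed against the Redis database.

```python
import os
import time


def env_setting(name, default, cast=int):
    """Return cast(os.environ[name]) if the variable is set, else default."""
    raw = os.environ.get(name)
    return default if raw is None else cast(raw)


# Defaults copied from the variable list above; the parsing is an assumption.
REDIS_PORT = env_setting('GRID_MAP_REDIS_PORT', 7272)
MAX_TRIES = env_setting('GRID_MAP_MAX_TRIES', 10)
SLEEP_TIME = env_setting('GRID_MAP_SLEEP_TIME', 3)


def fetch_with_retries(fetch, max_tries=MAX_TRIES, sleep_time=SLEEP_TIME):
    """Call fetch() until it returns something other than None.

    `fetch` is a hypothetical stand-in for a Redis lookup. Returns None if
    every attempt comes back empty, i.e. the job is assumed to have died
    before writing its output.
    """
    for attempt in range(max_tries):
        result = fetch()
        if result is not None:
            return result
        # Sleep between attempts, but not after the final one.
        if attempt < max_tries - 1:
            time.sleep(sleep_time)
    return None
```

With the defaults above, such a loop would wait at most 9 × 3 = 27 seconds in total (sleeping between, but not after, the 10 attempts) before giving up on a job; raising GRID_MAP_MAX_TRIES or GRID_MAP_SLEEP_TIME in the environment would extend that window.
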
Function Details

process_jobs(jobs, temp_dir=u'/scratch/', wait=True, white_list=None, quiet=True)

Take a list of jobs and process them on the cluster.

Parameters:
  • jobs (list of Job) - The jobs to process on the cluster.
  • temp_dir (basestring) - Local temporary directory for storing output for an individual job.
  • wait (bool) - Whether to wait for the jobs to finish. (Should only be False if the function you are running does not return anything.)
  • white_list (list of basestring) - If specified, limit the nodes used to only those in the list.
  • quiet (bool) - When True, do not output information about the jobs that have been submitted.

grid_map(f, args_list, cleanup=True, mem_free=u'1G', name=u'gridmap_job', num_slots=1, temp_dir=u'/scratch/', white_list=None, queue='nlp.q', quiet=True)

Maps a function onto the cluster.

Parameters:
  • f (function) - The function to map on args_list
  • args_list (list) - List of arguments to pass to f
  • cleanup (bool) - Whether to remove the stdout and stderr temporary files for each job when it is done. (They are left in place if there is an error.)
  • mem_free (basestring) - Estimate of how much memory each job will need (for scheduling). (Ignored unless USE_MEM_FREE is true; it is False by default because our cluster does not have that setting enabled.)
  • name (basestring) - Base name to give each job (a number is appended to the end).
  • num_slots (int) - Number of slots each job should use.
  • temp_dir (basestring) - Local temporary directory for storing output for an individual job.
  • white_list (list of basestring) - If specified, limit nodes used to only those in list.
  • queue (basestring) - The SGE queue to use for scheduling.
  • quiet (bool) - When true, do not output information about the jobs that have been submitted.
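
A minimal usage sketch, based only on the signature and parameter list above. It assumes that each element of args_list is passed to f as a single argument and that grid_map returns the results in the same order as args_list (neither is stated on this page). The function names word_count and count_on_cluster and the job name 'wc_job' are made up for illustration.

```python
def word_count(line):
    """Module-level function, so it can be pickled and sent to a node."""
    return len(line.split())


def count_on_cluster(lines):
    """Run word_count over `lines` on the cluster (needs gridmap and SGE)."""
    # Imported lazily so this sketch can be loaded on a machine without gridmap.
    from gridmap import grid_map
    # Keyword names come from the signature above; the values are examples.
    return grid_map(word_count, lines, name='wc_job', num_slots=1,
                    temp_dir='/scratch/', quiet=False)
```

Calling count_on_cluster(['a b c', 'd e']) would submit one job per line; nothing is submitted until the function is called.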

Note: This can only be used with picklable functions (i.e., those that are defined at the module or class level).
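
One way to check this restriction before submitting anything is to try pickling the function locally. The helper is_picklable below is a hypothetical name, not part of gridmap; it uses only the standard pickle module.

```python
import os.path
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        # PicklingError, AttributeError or TypeError, depending on the
        # Python version and the kind of object.
        return False
    return True


print(is_picklable(os.path.basename))  # module-level function: True
print(is_picklable(lambda x: x + 1))   # anonymous function: False
```

Functions are pickled by reference (module name plus function name), which is why a lambda or a function defined inside another function cannot be sent to a cluster node this way.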

pg_map(f, args_list, cleanup=True, mem_free=u'1G', name=u'gridmap_job', num_slots=1, temp_dir=u'/scratch/', white_list=None, queue='nlp.q', quiet=True)

Parameters:
  • f (function) - The function to map on args_list
  • args_list (list) - List of arguments to pass to f
  • cleanup (bool) - Whether to remove the stdout and stderr temporary files for each job when it is done. (They are left in place if there is an error.)
  • mem_free (basestring) - Estimate of how much memory each job will need (for scheduling). (Ignored unless USE_MEM_FREE is true; it is False by default because our cluster does not have that setting enabled.)
  • name (basestring) - Base name to give each job (a number is appended to the end).
  • num_slots (int) - Number of slots each job should use.
  • temp_dir (basestring) - Local temporary directory for storing output for an individual job.
  • white_list (list of basestring) - If specified, limit nodes used to only those in list.
  • queue (basestring) - The SGE queue to use for scheduling.
  • quiet (bool) - When true, do not output information about the jobs that have been submitted.

Deprecated: This function has been renamed to grid_map; use grid_map instead.