Recall that the spec of a FireWork completely bootstraps a job and determines what will run. One of the major tasks as a FireWorks user is to decide how to structure your spec. We suggest you keep the following in mind:
Note
You can also put input data needed by your FireTasks within the _tasks section of your spec. For example, the ScriptTask we explored defined the script input parameter within the _tasks section (see the introductory tutorial). Generally, this technique makes querying on your parameters more difficult and can lead to repeated input data if many FireTasks need to access the same data. Its advantage is that it prevents namespace collisions between parameters used in different FireTasks, or when running the same FireTask multiple times. For example, you might have multiple ScriptTask instances in your FireWork, and each needs its own distinct script rather than a global parameter. The default FireTasks built into FireWorks generally allow you to choose how you want to do things; for example, see the _use_global_spec option in the ScriptTask documentation.
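The trade-off described in this note can be sketched with two spec layouts written as plain Python dictionaries. This is only an illustration: the `_fw_name` value, the exact placement of `_use_global_spec`, and the `root_parameters` helper are assumptions made for this example, so check the ScriptTask documentation for the precise keys.

```python
import json

# Layout A: each parameter lives inside the task's own entry in _tasks.
# Every ScriptTask carries its own "script", so two tasks cannot collide.
spec_task_local = {
    "_tasks": [
        {"_fw_name": "ScriptTask", "script": "echo 'step 1' >> out.txt"},
        {"_fw_name": "ScriptTask", "script": "echo 'step 2' >> out.txt"},
    ]
}

# Layout B: the parameter lives at the root of the spec, and the task is
# asked to read it from there (key name assumed from the note above).
spec_global = {
    "script": "echo 'step 1' >> out.txt",
    "_tasks": [
        {"_fw_name": "ScriptTask", "_use_global_spec": True},
    ],
}


def root_parameters(spec):
    """Hypothetical helper: list user parameters stored at the spec root.

    Keys starting with an underscore are treated as reserved here.
    """
    return sorted(key for key in spec if not key.startswith("_"))


print(root_parameters(spec_task_local))  # no root-level parameters
print(root_parameters(spec_global))      # "script" sits at the root
print(json.dumps(spec_global, indent=2))
```

In layout A the script values are nested inside a list, which is what makes them harder to query; in layout B there is a single root-level `script`, which is easy to query but can only hold one value for all tasks.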
Imagine that each of your workloads involves 6 computing steps. There are many ways you could map your workload into a Workflow:
A simplified version of the problem with 2 computing steps is shown below:
We saw an example of the “multi FireTask, single FW” solution in the FireTask tutorial and the “single FireTask, multi FW” solution in the Creating Workflows tutorial. Which is better?
In general, using fewer FireWorks is simpler to implement, but less powerful. We suggest that you start by trying to map your workload into a single FireWork with multiple tasks to maximize simplicity. Then, use the following table to help decide which parts of your workload would benefit from being encapsulated in their own FireWork, and which parts would be cumbersome to split up.
Multiple FireTasks w/in one FireWork | Multiple FireWorks |
---|---|
all jobs executed in the same directory, on the same machine | jobs executed in different directories, perhaps on different machines (custom) |
cannot rerun individual task (e.g. on failure); must rerun all tasks in FW | can rerun individual FW; each FireWork acts as a “checkpoint” |
all tasks receive the same spec. Cannot modify spec between tasks | each FW can have its own spec; dynamic changes can be passed between FW through FWAction |
Cannot branch workflows; tasks are executed serially, in a linear sequence | run any workflow shape, e.g. diamond-shaped branching for parallelization |
Cannot dynamically change WF between tasks | Can change workflow between FWs through FWAction |
Cannot get statistics like runtime for individual tasks within a FW | Statistics like runtime recorded per FW |
When running on a queue, all tasks must be within the same batch job, so all tasks must finish within the walltime (bad if you have walltime restrictions) | Each FW can have its own batch job. Splitting workload into multiple FW can reduce walltime needed for each FW |
Cannot duplicate check each FireTask individually | Each FW within a workflow can be checked for duplication with past runs |
What you might notice is that the FireWorks codebase treats the “FireWork” object as the atomic computing job. Jobs are launched, tracked, restarted, and duplicate-checked at the FireWork level. FireTasks are a convenience that lets you avoid some of the overhead associated with having many FireWorks, especially when you want to run many tasks in the same directory and on the same machine. However, few features are available at the FireTask level.
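To make the "atomic job" idea concrete, here is a minimal sketch of the 2-step problem mapped both ways, using plain Python dictionaries rather than the FireWorks classes. The `fws`/`links` layout is a simplified assumption, and `steps_to_rerun` is a hypothetical helper that only models the rule that reruns happen per FireWork; it is not part of the library.

```python
# The two computing steps, written as ScriptTask-style entries.
step1 = {"_fw_name": "ScriptTask", "script": "echo 'step 1'"}
step2 = {"_fw_name": "ScriptTask", "script": "echo 'step 2'"}

# Mapping 1: multiple FireTasks inside a single FireWork.
multi_task_single_fw = {
    "fws": [{"fw_id": 1, "spec": {"_tasks": [step1, step2]}}],
    "links": {},
}

# Mapping 2: one FireTask per FireWork; FW 1 is the parent of FW 2.
single_task_multi_fw = {
    "fws": [
        {"fw_id": 1, "spec": {"_tasks": [step1]}},
        {"fw_id": 2, "spec": {"_tasks": [step2]}},
    ],
    "links": {1: [2]},
}


def steps_to_rerun(workflow, failed_script):
    """Return every script that must run again if `failed_script` fails.

    Reruns are modeled at the FireWork level, so all tasks that share a
    FireWork with the failed task are repeated together.
    """
    for fw in workflow["fws"]:
        scripts = [task["script"] for task in fw["spec"]["_tasks"]]
        if failed_script in scripts:
            return scripts
    return []


# Step 2 fails: mapping 1 repeats both steps, mapping 2 repeats only step 2.
print(steps_to_rerun(multi_task_single_fw, step2["script"]))
print(steps_to_rerun(single_task_multi_fw, step2["script"]))
```

The second mapping costs an extra FireWork and a link, but the first step then acts as a checkpoint that does not need to be repeated.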
You’ve made it to the end of the workflows tutorial! By now you should have a good feeling for the basic operation of FireWorks and the types of automation it allows. However, it is certainly not the end of the story. Job priorities, duplicate job detection, and running through queues are just some of the features we haven’t discussed in the core tutorial.
If you haven’t already set up Worker computing resources to execute your jobs, you might do that now by following the Worker tutorial. Otherwise, you might return to the home page and choose what topic to pursue next.