BRTE Standards

Introduction

The Batch RunTime Environment (BRTE) defines a set of standards and interfaces to which all batch jobs associated with UIS-maintained production applications must conform. It specifies the locations of resources as well as the programming standards that must be followed. It is designed to interact very closely with SPDE.

Goals

The BRTE strives to accomplish the following goals:

Enable developers to quickly develop batch jobs through the use of standards and re-usable code.
Enable consistent support from operations and production services staff.
Enable the consistent and effective execution of application business logic.
Enhance flexibility of application deployment.

Overview

BRTE delivers most of these goals through three main components: a set of programming standards, consistent resources provided through an abstract system, and a library of functions for common tasks.

The programming standards specify a general program flow and guidelines for how programs should execute. They include some requirements that all programs must follow. In addition, they specify a minimal standard of documentation that must be met when the code is deployed in the environment. Example programs exist to demonstrate these standards, and these examples can be copied and used as a skeleton for new batch programs.

A set of resources, such as standard places to find or work on files, is defined through environment variables. Whenever the application needs to access a resource, it must do so through these environment variables. These variables can be thought of as representing an abstract system in which the batch job is executed. By changing the values of these environment variables, one can make the batch job operate in an arbitrary physical system. This enhances the flexibility of the program and enables better development and testing standards.

The third component of the BRTE is the set of re-usable code libraries. These libraries provide common services such as logging, error handling, event notification, and connections to other services. These modules are available for the two languages in which most batch jobs are written: ksh and perl. These libraries allow developers to develop programs more quickly and make the programs easier to support.

Standards

The following standards have been defined to ensure that all programs are supportable both by operations staff and by other developers, whether they work in the same application or across applications.

Program flow

All batch scripts should be defined as a series of transactional steps. Each step should represent a unit of business logic. Except in the most trivial cases, the actual business logic will not be implemented in the ksh script itself, but in some other fashion: a compiled program, perl script, or SQL statement, for instance.

The batch script makes an initial call to the BRTE libraries which initializes the BRTE environment. The batch script will consist mainly of functions defining each step. Then there is a call to a function that determines the order of function evaluation and then evaluates them in that order. Then cleanup work is performed.

The call to /opt/brte/batch_modules initializes several variables that are used by the programmer directly and indirectly via BRTE. In addition, the library initializes some runtime files such as the log and process ID files.

The steps are written as functions. An executor function actually calls each function in order. This is accomplished by a naming standard for the function steps. The function step names need to be of the form STEP?? where the ?? represents a number to denote the sequence in which this step is to be executed. Step numbers 01 to 1000 are supported.

If an error occurs during the step, then appropriate error codes need to be returned. This may be done via the error handling module or may be done explicitly in the function. It is legal for the error handler to exit the program without returning to the executor function.

The following skeleton sample ksh program illustrates the basic steps required. (The sample is for ksh, but a very similar approach can be used for perl programs.)

   #!/bin/sh
   . batch_modules

   # Trivial step to put something in the log.

   function STEP10 {
      print_to_log "Executing first step"
      # by default, the exit status of the last command executed
      # becomes the return code of the function.
   }

   # Trivial step to compress a file; we will have executor
   # handle our error if the directory mydir does not exist
   function STEP20 {
      if [ -d mydir ]; then
         compress mydir/myfile
      else
         # explicitly return an error
         return 8
      fi
   }

   # evaluate our functions in the specified order
   executor

   # the programmer can put any cleanup code here

   # call the BRTE function to do final cleanup and exit
   end_run 0

The program first sources the standard BRTE library, batch_modules, which defines the BRTE functions. (Apart from print_to_log, executor, and end_run, this example does not illustrate the use of those functions.) Next the program defines the steps of the business logic as individual functions. Finally the function executor is called. This function calls the step functions in order.

A step can be designated restartable by the programmer. This means that the program can be restarted in that step directly, provided any preceding steps have executed successfully.

If the environment variable STARTSTEP contains an integer value corresponding to a valid function step, executor will start with that step and proceed from there. This allows for a restart procedure: since each step is a unit of work, one can restart a program at the designated step by setting the environment variable before starting the program.
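
The ordering and restart behaviour described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not the BRTE implementation: sketch_executor is a hypothetical stand-in for executor, it looks only for two-digit step names (STEP01 to STEP99), and it uses POSIX function syntax so that it runs under any sh as well as ksh.

```shell
#!/bin/sh
# Hypothetical stand-in for the BRTE executor (the real one comes from
# batch_modules). Runs STEPnn functions in ascending order, starting at
# STARTSTEP when it is set, and stops at the first non-zero return code.
sketch_executor() {
    # expr forces decimal, so a value such as 08 is not read as octal.
    _sx_n=$(expr "${STARTSTEP:-1}" + 0)
    while [ "$_sx_n" -le 99 ]; do
        _sx_step=$(printf 'STEP%02d' "$_sx_n")
        # Only call step names that are actually defined.
        if command -v "$_sx_step" >/dev/null 2>&1; then
            "$_sx_step" || return $?
        fi
        _sx_n=$((_sx_n + 1))
    done
    return 0
}

# Three trivial steps that record the order in which they ran.
RAN=""
STEP10() { RAN="$RAN 10"; }
STEP20() { RAN="$RAN 20"; }
STEP30() { RAN="$RAN 30"; }

# Restart at step 20: STEP10 is skipped.
STARTSTEP=20
sketch_executor
echo "steps run:$RAN"    # prints "steps run: 20 30"
```

In a real job the variable would be set in the environment before the script starts, for example STARTSTEP=20 followed by the script name on the same command line.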

While steps can be designated as restartable, there is no automatic roll-back capability. The programmer must either provide roll-back in the step's definition, or the step must assume that it can completely redo its work and that the preceding step completed successfully.

Invoking batch_modules begins all required logging and prep work. executor is responsible for calling the steps in the proper order and calling the error handler if a step returns a non-zero value. Any batch_modules function will call the error handling routines when appropriate. The function end_run is responsible for final cleanup and exit of the batch script.

Additional helper functions are perfectly legal. However, any business logic must be executed within the scope of a step. It is possible for multiple steps to use the same helper function.

Error handling

By calling batch_modules, the batch script automatically has a set of functions defined for it for standard activities such as loading tables, reconciling, and executing SQL commands. These functions all perform appropriate error handling and logging.

However, if the developer needs to execute commands or capabilities not supported by BRTE, then the developer is still responsible for appropriate error handling and logging. In this case the developer can still call a generic function, do_command. This function takes a command with all appropriate arguments as one string. It will execute the command and redirect output to the log. It also assumes that the command will return a non-zero value if any errors are detected.
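
The behaviour described for do_command (one command string in, output redirected to the log, non-zero status on error) can be illustrated with a minimal sketch. The function name sketch_do_command and the log path are hypothetical; the real do_command is provided by batch_modules and its internals are not shown in this document.

```shell
#!/bin/sh
# Hypothetical log location for this sketch; under BRTE the log is the
# file named by SCRIPTLOG.
SKETCH_LOG=${TMPDIR:-/tmp}/brte_sketch_docmd.$$.log

# Hypothetical stand-in for do_command: one string in, output to the log,
# the command's exit status out.
sketch_do_command() {
    if [ $# -ne 1 ]; then
        echo "sketch_do_command: expected one command string" >&2
        return 2
    fi
    echo "running: $1" >>"$SKETCH_LOG"
    sh -c "$1" >>"$SKETCH_LOG" 2>&1
    _sdc_rc=$?
    if [ "$_sdc_rc" -ne 0 ]; then
        echo "failed with status $_sdc_rc: $1" >>"$SKETCH_LOG"
    fi
    return "$_sdc_rc"
}

# A step that relies on the wrapper and reports failure to the executor.
STEP10() {
    sketch_do_command "echo extract complete" || return 8
}

STEP10
echo "STEP10 returned $?"    # prints "STEP10 returned 0"
```

Because the sketch runs the string through sh -c, the command executes in a child shell and cannot change variables in the calling script.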

If the developer wants even finer control of the command, the developer can access the error handling functions directly. However, it will still be the responsibility of the developer to make sure that all logging and error handling is done.

Manipulating the logs directly or re-implementing the error handling routines is strongly discouraged. These functions offer consistency across programs for support and re-usability. In addition, some global behavior changes can be made if all programs use the same interface. A developer who re-implements the modules may find his/her program breaking in the future. In many cases, manipulating the logs could also leave the automated scheduling package unable to provide accurate information. If additional functionality is required, it is better for the developer to work with the proper support staff to add the functionality to BRTE.

Script and program location

Each application should obtain a System code of three characters or fewer, assigned by the IARS Business Naming group. This code is also needed for database and table naming and for other standards. Once the System code is obtained, the developer must request that the Security group create a system account, system group, and home directory for the application. The developer will need to let Security know which UIS employees are to be members of the system group for access purposes. Batch scripts are executed under this system account. While development is typically done under a developer's account, SPDE coordinates the access by the system group and deployment into the system account.

Once the system home directory is created, the developer should create the following directories under it: batch, restart, and redschedule. The batch directory contains the application scripts used for business processes; the restart and redschedule directories are used mainly in the test and production environments. To organize the actual programs and processes that the scripts execute, you may also create sub-directories under the batch directory. The restart directory holds scripts that need to be rerun after a job has abended. The redschedule directory holds scripts that are to run once for a special purpose.

At the appropriate time the Security group needs to be contacted to make sure the environment is properly defined for the test and production environments. For instance the Security group will create the system home directories for both test and production environment. Developers will create the required sub-directories indicated above for the test and development environments. The Security group along with Production Services will create those sub-directories for the production environment.

Each system will have a source control environment assigned to it. This is the role of SPDE. All development of application batch code and scripts should be done in this environment. This becomes the coordination point for migrating code from development to test to production. It provides for proper version control so that quality is preserved. Application code and associated elements such as DDL should not be stored or developed outside of the source development environment.

Batch scheduling and process requirements

All batch jobs should be written such that they do not need to run every day. The business logic of the application should allow the batch jobs to retroactively catch up on data that was processed in the application, or fed by other processes, on prior days. Although removing a job from the schedule, or a job failing to run for a particular day or days, should be the exception rather than the rule, the design should allow for that situation.

There should be a defined schedule established for each batch job. Once that schedule is defined, it becomes the standard for the application. Schedules should not change in an ad hoc fashion. For example, if a particular job is scheduled to run Monday through Saturday, that is the established schedule for the process, and there should not be requests to periodically pull the job from the schedule on Tuesday. If the job should not run on Tuesdays, its established schedule should be Monday and Wednesday through Saturday.

Testing of Batch Processes

When batch processes are tested, the test execution should replicate production execution as closely as possible. This is not always necessary while items are still being developed, but for system tests and for testing interfaces to other applications, mimicking the production setup is critical to validating your application. For example, when testing a feed from your system to another application, you should use your test environment and the other system's test environment to validate all aspects of the process. This includes the actual methods used for passing the data between systems, so that you verify how the process will work in a production setting, including the protocol and physical platforms to be used. Extracting the data from your system, loading it onto your desktop machine, and then passing it to the receiving system would not replicate the system integration you would use in production.

Event notification

In the case of an error or unexpected event, BRTE provides a mechanism for notification. The event notification includes information about which program is sending the notification and when.

Exactly who is notified and how is handled by assigning the script an error class. The programmer does this within the program. The programmer is required to set an initial error class. The programmer can change the error class for each step. This allows the programmer to decide if certain points in the program execution are more or less critical than others.

The table below denotes the various valid error class values and their meaning.

ERRCLASS Value	Description	Paged	Emailed
000	Condition clear. No action or notification required.	N/A	N/A
002	Warning condition. No action required. Notification is required.		$EMAIL_LIST
004	Warning condition. No action required. Notification is required. Standard notification is to send mail to operations.		OPERATORS, $EMAIL_LIST
006	Warning condition. No action required. Notification is required. Standard notification is to send mail to operations.		OPERATORS, $EMAIL_LIST
008	Critical condition. Action and notification is required. Standard notification is mail to operations as well as page operations.	OPERATORS, $PAGER_LIST	OPERATORS, $EMAIL_LIST
009	Critical default condition. The programmer should set an error class for the script. Failure to do so will result in the script having this error class. It is treated as a critical error like 008.	OPERATORS, $PAGER_LIST	OPERATORS, $EMAIL_LIST

In addition to the error classes listed above, there is special built-in error handling for the event that BRTE detects internal errors such as an invalid error class value. This special error handling is known as a panic event. BRTE will panic when fundamental problems are detected, such as the wrong number of arguments to a BRTE function. A panic will try to mail operators, page operators, page, mail to the system account, and print to standard output.
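
As an illustration of the table above, the following sketch maps an error class to its notification targets and shows a step raising the class for a critical section. The helper sketch_notify_targets is hypothetical, OPERATORS is the placeholder used in the table rather than a real address, and setting ERRCLASS by plain assignment is an assumption; the actual notification is performed by the BRTE modules.

```shell
#!/bin/sh
# Hypothetical helper: map an error class to the notification targets
# listed in the error class table. OPERATORS is the placeholder used by
# that table, not a real address.
sketch_notify_targets() {
    case "$1" in
        000)     echo "page= email=" ;;
        002)     echo "page= email=$EMAIL_LIST" ;;
        004|006) echo "page= email=OPERATORS,$EMAIL_LIST" ;;
        008|009) echo "page=OPERATORS,$PAGER_LIST email=OPERATORS,$EMAIL_LIST" ;;
        *)       echo "panic: invalid error class '$1'" >&2
                 return 1 ;;
    esac
}

# Example values only.
EMAIL_LIST="dev1@example.org"
PAGER_LIST="oncall@example.org"

# Assumption: the error class is changed by plain assignment.
ERRCLASS=004          # initial class for the script
STEP10() {
    ERRCLASS=008      # this step is critical
    sketch_notify_targets "$ERRCLASS"
}

STEP10    # prints "page=OPERATORS,oncall@example.org email=OPERATORS,dev1@example.org"
```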

Documentation

Every batch script must have proper documentation. This documentation comes in three forms:

Overall batch script documentation (internal to processes).
Step documentation (internal to processes).
Operator documentation (external to processes).

Overall program documentation

This documentation takes the form of a header for the script. It should include the following components:

Script Name: (Must be unique to the application.)
Description: (Explanation of the function of this particular process.)
Constraints: (Specify assigned values and declared procedures.)
Successors: (Identify any processes that must execute after this process is complete.)
Predecessors: (Identify any processes that must complete successfully prior to the execution of this particular process.)
Schedule: (Frequency of this particular job, i.e. days of the week, day of the month, time, etc.)
Inputs: (List files or tables that this program expects to use.)
Outputs: (List files or tables that this process produces.)
Parameters: (List the set of parameters that the program may accept. If they are given on the command line, specify which have defaults and which are mandatory. If a configuration or parameter file is required, list its location and format. If environment variables outside the standard BRTE variables can affect the execution of the program, they must be listed along with their effect.)
Exit Status: (Define exit status values and descriptions.)

Step documentation

Each step should have additional information at the top of the function definition. This should include the following information:

Description: (Describe the function of this particular step.)
Restartable: (Indicate whether this step is a safe restartable point for the program.)
Restart assumptions: (List the assumptions that this step makes in case the program is restarted in this step.)
Failure scenarios: (List likely reason why this step may have failed and suggestions for restarting.)
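
A step documented according to this list might look like the following sketch. The step itself, the file names, and the wording of each entry are hypothetical and serve only to show where the four items go; POSIX function syntax is used so that the sketch runs under any sh.

```shell
#!/bin/sh
# Hypothetical input for the demonstration; a real step would use the
# BRTE-provided directories instead of a path under /tmp.
EXTRACT=${TMPDIR:-/tmp}/brte_sketch_extract.$$
printf '3,carol\n1,alice\n2,bob\n' >"$EXTRACT"

# Description:         Sort the extract by its first field so that the
#                      load step can process it sequentially.
# Restartable:         Yes.
# Restart assumptions: The unsorted extract still exists and is complete.
#                      Any partial sorted file from a failed run is
#                      overwritten.
# Failure scenarios:   Extract missing (rerun the extract step first);
#                      filesystem full (free space, then restart here).
STEP20() {
    [ -f "$EXTRACT" ] || return 8
    sort -t, -k1,1n -o "$EXTRACT.sorted" "$EXTRACT" || return 8
}

STEP20
echo "STEP20 returned $?"    # prints "STEP20 returned 0"
```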

Operator Documentation

This is external documentation that is submitted to Production Services. The following is a list of the items that this documentation should include:

Flowchart: (Shows process flow within the application as well as between applications.)
Schedule: (Job names, days of the week, month or year when expected to run, and time.)
Predecessors and Successors: (Jobs expected to precede or follow this process's execution.)
Node/Platform: (Specify the node(s)/platform(s) on which the program is expected to execute.)

Naming Standards

All batch scripts need to conform to a common naming standard to make support easier. All batch script names are lower case. The name needs to be unique within the application. Naming standards apply to batch scripts only. Directories, other than those specified by BRTE, can have any name.

Every batch script should start with the 2-3 character application identifier. (Examples are ps, mms, dss, etc.) The application identifier is followed by an underscore, '_'.

Script names should not exceed 40 characters. The names should be descriptive but use standard abbreviations where possible. Staff are encouraged to keep the names as short as possible. For jobs that perform loads or extracts, send data to another application, or process data from another application, use the words load, extract, out (data being sent from your application to another), and in (processing data from another system) at the end of the process name, before the language identifier. This helps support staff know that there are other jobs or processes, run before or after a particular process, to check if a problem has occurred.

Each program name should end with an identifier that denotes the language. The language identifier extension should be preceded by a period, '.'. Binary objects that do not require an interpreter or run-time compiler should not have a language identifier extension. The recognized identifiers at this time are:

Language	Identifiers
perl	pl
ksh	sh
SQL (transact-sql and PL/SQL)	sql
SQR	sqr
awk	awk
tcl	tcl

Each program should also have descriptive segments of the name in action-object form. Examples of actions are load, extract, and build. Examples of objects are table names, file names, or processes such as Oracle or Sybase servers.
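
The mechanical parts of these naming rules (lower case, a 2-3 character application identifier followed by an underscore, at most 40 characters, a recognized language identifier) can be checked with a short function. sketch_check_name is a hypothetical helper and not part of BRTE; it assumes the identifier consists of lower-case letters and digits, and it does not attempt to judge whether the name is descriptive or in action-object form.

```shell
#!/bin/sh
# Hypothetical helper: print "ok" and return 0 when a script name meets
# the mechanical naming rules, otherwise print the reason and return 1.
sketch_check_name() {
    _cn=$1
    if [ "${#_cn}" -gt 40 ]; then
        echo "longer than 40 characters"
        return 1
    fi
    case "$_cn" in
        *[[:upper:]]*) echo "must be lower case"; return 1 ;;
    esac
    # Application identifier: 2 or 3 characters followed by an underscore.
    case "$_cn" in
        [[:lower:]][[:lower:][:digit:]]_?*) ;;
        [[:lower:]][[:lower:][:digit:]][[:lower:][:digit:]]_?*) ;;
        *) echo "missing application identifier and underscore"; return 1 ;;
    esac
    # Language identifiers from the table of recognized identifiers.
    case "$_cn" in
        *.pl|*.sh|*.sql|*.sqr|*.awk|*.tcl) ;;
        *) echo "unrecognized language identifier"; return 1 ;;
    esac
    echo "ok"
}

sketch_check_name foo_process_input_tables.sh    # prints "ok"
sketch_check_name Foo_reports.sh                 # prints "must be lower case"
```

Binary objects, which carry no language identifier, would need to be exempted from the last check.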

Reconciliation

Batch processes passing data between applications should contain reconciliation. Reconciliation is a method to validate that the data sent was also the data received. The reconciliation can come in two forms: total record counts and, for data that contains dollar amounts, validation that the total dollars sent equal the total dollars received.
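
A minimal sketch of both forms of reconciliation follows. It assumes a comma-separated file with the dollar amount in the second field and no header line; the function names and file names are hypothetical, and BRTE's own reconciliation functions in batch_modules are not shown here.

```shell
#!/bin/sh
# Summarize a comma-separated file as "<record count> <dollar total>".
# The layout (amount in field 2, no header line) is an assumption.
sketch_summary() {
    awk -F, '{ n++; total += $2 } END { printf "%d %.2f\n", n, total }' "$1"
}

# Return 0 when both files agree on count and total, 8 otherwise.
sketch_reconcile() {
    _sr_sent=$(sketch_summary "$1") || return 8
    _sr_recv=$(sketch_summary "$2") || return 8
    echo "sent: $_sr_sent  received: $_sr_recv"
    [ "$_sr_sent" = "$_sr_recv" ] || return 8
}

# Demonstration data: the received file is an exact copy of the sent file.
demo=${TMPDIR:-/tmp}/brte_sketch_recon.$$
mkdir -p "$demo"
printf '1001,25.00\n1002,10.50\n1003,4.25\n' >"$demo/sent.csv"
cp "$demo/sent.csv" "$demo/received.csv"

sketch_reconcile "$demo/sent.csv" "$demo/received.csv"
echo "reconcile returned $?"    # prints "reconcile returned 0"
```

awk sums in floating point, so for large files a production check would more safely total the amounts in whole cents.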

Abstract system definition

BRTE works by defining an abstract system definition (ASD). That is to say, environment variables are used to define resources that are used by all batch jobs running. These resources are defined in such a way that for most situations, the developer can assume his/her programs are the only ones running on the node. The developer need only be concerned with contention between programs within the application.

Application Environment

The ASD is built in layers. Each subsequent layer relies on the definitions in the layer before it. There has to be a base layer of definitions though. This base layer is the brte_profile file for the application production/test ID. The brte_profile is invoked by the call to batch_modules. The file is located in the home directory for the application ID. It is best referenced by $HOME/brte_profile.

System ID's

Every system has at least three system ID's and a group assigned to it. The system ID's and group names are based upon the system's 2-3 character identifier, i.e. ps, dss, mms, etc. The ID's are of the form XXXdev, XXXtest, XXXprod and the group name is just XXX. Each of these ID's has its own home directory. The src ID is used by the Automated Integrated Testing Service (AITS). The test and prod ID's though can actually have BRTE implementations. If they do then certain requirements will have to be met by the home directories.

System Defined

The root environment variable that must be defined is the SYSTEMNAME variable. This contains the 2-3 character application identifier mentioned above. This is set in $HOME/brte_profile of the system ID. This variable is used to create other meaningful variables defined in additional layers. It is also used by the batch_modules to perform basic functions in the correct way. The SYSTEMNAME variable is marked immutable at definition time.

Additional environment variables that are application specific should be assigned here as well. These are left to the design of the application owner. Some programs in the application may not use these particular variables. The developer must make a decision about whether variables of limited use would be best defined in the $HOME/brte_profile or in each program. The developer should consider the fact that defining additional variables to the environment is inexpensive and can always be overridden by the program. The following are some typical application specific variables, some of which are required by certain BRTE functions. Others are required by certain tools or utilities.

Variable Name	Description
FPATH	Path to search for autoload functions in the ksh language.
SYSTEMNAME	Required by BRTE. Denotes the application identifier. Used by nearly all other layers of BRTE.
ORACLE_HOME	Denotes the base home directory for the Oracle instance that the application uses. Required for some functions in batch_modules.
SYBASE	This denotes the default location to find Sybase interface file and various utilities. Required for some functions in batch_modules.
SQRDIR	The default location of SQR utilities. This variable is used by SQR utilities.
PERL5LIB	Specifies location of additional perl modules that the program may use.
SYSBIN	Location of application specific binaries. The files could actually be links to older versions of the programs that are still on the system but are considered non-standard versions.

Environment definitions

Environment definitions form the next layer of the BRTE. These are BRTE definitions that exist for all applications. However, the values assigned to each variable will likely be different for each application. For instance, $WORKDIR could contain the value '/staging/dss' for the DSS application but '/staging/mms' for the MMS application. The following is a list of the BRTE environment variables:

Variable Name	Description	Read Only
BRTEBIN	The location of BRTE runtime utilities such as batch_modules.	YES
CURTIME	This denotes the current time. Its value will likely change throughout the execution of the program since it is used heavily by the batch_modules.
EMAIL_LIST	The comma delimited list of additional e-mail recipients to be notified if an error occurs.
ERRCLASS	This denotes the level of severity if an error occurs with this program. Currently there are four levels - 002 Warning, 004 Warning, 006 Warning, and 008 Error. This should be reset by the developer. By default it will be 009 Undefined.
ERRMESG	This contains a default message to be communicated in the event of an error. The developer is encouraged to change this and in fact must change it when specifying an operator or pager notification method.
EXECTIME	This is an immutable variable denoting the original time that this script was executed.	YES
FPATH	The path to search for autoload functions. BRTE prepends its own directory to any FPATH created in the brte_profile.	YES
LOGDIR	This is a directory for program log output. Each job run of the program should create its own log in this location. This is done automatically by the batch_modules. The files older than 31 days are deleted from this location. (Files are maintained on backup tapes for 6 months after they are deleted from this location.)	YES
OPMAIL	Operators mail address	YES
OPPAGE	Operators pager address.	YES
PAGER_LIST	The comma delimited list of additional pager addresses to be notified if an error occurs.
PATH	The default path searched for commands that do not have an absolute path specified. This can be redefined by the developer within the program.
PROGRAMDIR	Location of the batch programs for the application. This can actually be the root of a whole directory tree. It is up to the application developer to decide how complex this tree needs to be.
REDSCHED	This is the directory from which redschedule programs are executed. In addition, modified copies of scripts used for restarts should be placed in this directory. Files older than 30 days in this directory are deleted, but are retained in the backup system for another 90 days.	YES
SCRIPT_NAME	This is an immutable variable denoting the base name of the program. It contains no reference to the directory in which the program resides.	YES
SCRIPTLOG	This is an immutable variable containing the fully qualified name of the log for the current execution of the program. The file name is of the form: program_name:YYYYMMDD:HH:MM:SS:PID	YES
SCRIPTPID	Process ID of the script as it is currently running.	YES
SECUREDIR	Location of security maintained files.	YES
STAGINGDIR	This is a directory for input or output from the program that is for other steps or programs. Files older than seven days are deleted from this location.	YES
STATUS	This immutable variable denotes whether the program is executing in a production or test environment. The code in batch_modules is actually different between production and test systems when it comes to assigning this value.	YES
WORKDIR	This is a directory to which an application can do temporary IO. Files older than 24 hours are deleted from this location. Generally only IO needed within a step should occur here.
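
The following sketch shows a step that reaches its working and staging space only through the variables in the table above, so the same code runs wherever those variables point. The fallback paths, the file names, and the step itself are assumptions made so that the sketch also runs outside BRTE.

```shell
#!/bin/sh
# Inside BRTE these variables are already set. The fallbacks below are
# assumptions that let the sketch run on any machine.
work=${WORKDIR:-${TMPDIR:-/tmp}/brte_sketch_work.$$}
stage=${STAGINGDIR:-${TMPDIR:-/tmp}/brte_sketch_stage.$$}
mkdir -p "$work" "$stage" || exit 8

# Build the file in the step-local work area, then publish the finished
# file to the staging area for the next step or program.
STEP10() {
    _s10_tmp="$work/accounts.$$.tmp"
    printf '1001,25.00\n1002,10.50\n' >"$_s10_tmp" || return 8
    mv "$_s10_tmp" "$stage/accounts.dat" || return 8
}

STEP10
echo "STEP10 returned $?"    # prints "STEP10 returned 0"
```

Pointing the two variables at scratch directories before the run is enough to exercise the step on a development machine without touching the shared locations.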

Critical locations

Several critical locations are defined by the BRTE. These include the locations of batch_modules components, logging directories, and other programs. The specific locations are listed here; that is, this is the implementation version of the ASD. This is for informational purposes only. All references from within programs should be made via the corresponding variables, which are listed with each location.

Programs

Here are the major program locations and the variables to access them.

Location	Variable Name	Description
$HOME/batch	$PROGRAMDIR	Location of the batch programs used by the application. This can be the root of a whole directory tree of programs used by the application.
$HOME/bin	$SYSBIN	Location of application specific binaries. The files could actually be links to older versions of the programs that are still on the system but are considered non-standard versions.
/opt/brte	$BRTEBIN	The location of BRTE runtime utilities such as batch_modules.

Re-usable modules

A set of re-usable modules have been written to assist developers. These increase support, reduce development time, and increase overall stability of the applications. In BRTE these are invoked by the inclusion of the batch_modules file. This file defines the functions that are usable by any program running in the BRTE. The batch_modules also does some other basic setup things, so it should only be called once by a program. Consult the BRTE Program Modules documentation for specific details of these functions. Below is a general listing of what modules are available.

Environment Variables - Many of the BRTE variables are initialized in this module. It contains such things as the location of working directories and such.
Error Handling - This module contains various functions to check return codes automatically for other functions or programs. It dispatches errors and utilizes the Notification Module to alert people of the problems.
Logging - This module contains functions to assist with the creation and updating of the log. This is used extensively by other modules.
Connection Module - This module contains many different functions for connecting to various services to extract or load files or information.
Signals - This module defines a set of signal handlers so that the programs can take appropriate action if they are killed, suspended, resumed, etc.
Other miscellaneous modules - There are a few other modules that implement miscellaneous functions such as time stamping, monitoring other programs, controlling execution, etc.

Logging

All logging should be accessed through the logging module and, where necessary, the SCRIPTLOG variable. This variable points to a file located in the /var/brte/$SYSTEMNAME/ directory. Any log files older than 31 days are automatically removed. However, the ADSM backup system will maintain the files for an additional 6 months after the files have been deleted from the system.

The directory /var/brte is actually the mount point for a filesystem. Runaway logs may interfere with other executing programs by filling up the logging filesystem. If this becomes a significant problem, there are a couple of ways of imposing restrictions on particular application ID's to prevent them from filling up this space. In any case, the filesystem is sized adequately for even heavy log usage.

Working storage

There are two types of working storage referenced by WORKDIR and STAGEDIR.

WORKDIR points to /tmp. Files older than 24 hours are deleted from this directory. This should be used for IO required within a step of a program. This directory is not backed up.

STAGEDIR points to /staging/$SYSTEMNAME. Generally, step-to-step IO or program-to-program IO should use this space. Files older than 7 days are deleted from this location. Files deleted from here are maintained for up to 30 days in the backup system.

Security

For connection to other systems, authentication mechanisms need to be employed. This is currently done by creating files inside a secure location that contains the authentication information. These files are only readable by the system ID's. They are located in the directory /opt/sa_forms/pass/. Note this directory may not be executable, i.e. one may not be able to list the files in this directory even as the application ID. Generally the developers will not access these files directly. They will be accessed by the connection modules.

Scheduling

The developers must pass to Production Services the particular scheduling requirements for the job using the TNG Job Scheduling Request Form or spreadsheet. All jobs will be scheduled through Unicenter TNG. http://www.indiana.edu/~ucschmgt/iu-only/tngjsrf.html

Stopping

Need to be able to stop jobs. This means we should write a tool that can walk a process tree and kill from the bottom up. It should execute trappable kills first so that programs have an opportunity to quit gracefully. Also need a way to walk the process tree right into Sybase or Oracle.

Restart

Restart procedures for batch jobs in Unicenter/TNG:

In production, developers have write access to the restart directory.

Follow these steps to restart a script or set of scripts in the production environment:

Sign on to the production node as yourself and stay signed on as yourself.
Copy the scripts to be restarted from the batch to the restart directory using the following command: cp /home/local/applbatchid/batch/scriptname /home/local/applbatchid/restart/scriptname. If you need to change a program that is called by the script you are restarting, you must also copy the program into the restart directory. Depending on how your scripts and programs are structured, changes to the pathnames may be required in your calling script or your called programs. When you copy the scripts and programs to the restart directory, make sure the group execute bit is set on each file so that the batch id will be able to execute the script. If it is not, enter chmod 754 scriptname at your Unix prompt. Below is an example of the permission bits set the correct way:

-rwxr-xr-- 1 cstine sis 1458 Sep 20 20:43 sisadm_sy_iumcsqldel.sh
Create a file called restart.config with one line for each script that will be restarted. Type vi restart.config at the Unix prompt and place an entry in this new file for each script being restarted based on the examples below:

We want to restart two scripts, foo_process_input_tables.sh and foo_reports.sh.

For foo_process_input_tables.sh, we want to start at step 20 and continue from there.

Sample restart.config file for the above example:

foo_reports.sh

STARTSTEP=20 foo_process_input_tables.sh
Call operations and have them demand in the application restart job. The standard for the restart jobs is systemnamerestart. For example, fisrestart, sisrestart.
If the restart job finishes successfully then the operators will CANCEL the original job that abended so that the schedule can continue.

Restart rules:

a. All files in restart will be copied to an audit location and executed from there.

b. The restart job will copy the restart.config file to restart.backup and then delete the restart.config file. This will allow additional restarts to be run and avoid duplicate restarts at the same time for the same system code. The owner of the restart.config file can be used to identify who is using the restart script if another developer is attempting to use it at the same time.

c. The restart job has its own log in /var/brte/ITSO which developers can review in case problems occur with the restart.

d. Scripts will have their own logs in /var/brte/systemname location.

e. Parameters can be specified in the line of the restart.config file.

f. Scripts are run in sequential order as specified in the restart.config file.

g. Scripts cannot be run in parallel during a restart.

h. Each script must return a zero value on exit or the restart job will abend with an error 8 at that point.

i. You must copy any scripts or programs from the restart directory by 1:00 PM the following day if you want to keep the changes made. The restart directory is automatically cleaned out at 1:15 each day by a job in Unicenter/TNG.
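
Rules b, e, f, and h above can be illustrated with a small sketch of how a restart job might process restart.config. This is not the actual Unicenter/TNG restart job: sketch_restart_job is hypothetical, it skips the audit copy of rule a, it does no logging, and the two demonstration scripts are generated only so that the example is self-contained.

```shell
#!/bin/sh
# Hypothetical stand-in for the restart job. It does not copy files to an
# audit location (rule a) and does no logging; it only shows rules b, e,
# f and h.
sketch_restart_job() {
    _rj_dir=$1
    [ -f "$_rj_dir/restart.config" ] || return 8
    # Rule b: keep a backup, then remove the config so it is not run twice.
    cp "$_rj_dir/restart.config" "$_rj_dir/restart.backup" || return 8
    rm "$_rj_dir/restart.config" || return 8
    # Rules e and f: run each line, with its parameters, in file order.
    while IFS= read -r _rj_line; do
        [ -n "$_rj_line" ] || continue
        ( cd "$_rj_dir" && PATH=".:$PATH" sh -c "$_rj_line" ) </dev/null
        # Rule h: a non-zero exit abends the restart job with error 8.
        [ $? -eq 0 ] || return 8
    done <"$_rj_dir/restart.backup"
    return 0
}

# Demonstration: two generated scripts and the sample restart.config.
demo=${TMPDIR:-/tmp}/brte_sketch_restart.$$
mkdir -p "$demo"
for s in foo_reports.sh foo_process_input_tables.sh; do
    printf '#!/bin/sh\necho "%s STARTSTEP=${STARTSTEP:-unset}" >>ran.log\n' "$s" >"$demo/$s"
    chmod 754 "$demo/$s"
done
printf 'foo_reports.sh\n\nSTARTSTEP=20 foo_process_input_tables.sh\n' >"$demo/restart.config"

sketch_restart_job "$demo"
echo "restart job returned $?"
cat "$demo/ran.log"
```

After the run, ran.log lists the two scripts in the order given in restart.config, and only foo_process_input_tables.sh sees STARTSTEP=20.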

Special Jobs

RED SCHEDULE is used for a one-time special process, such as a batch execution to correct a problem or update files outside of the normal process. It is not intended to become a regularly scheduled production job. In production BRTE, these jobs will be moved into the $REDSCHED directory by Production Services. They will then be executed by Production Services under the production application ID. Developers can test execution of these programs in the $REDSCHED directory for the test BRTE. The $REDSCHED directory is application specific. Files are automatically deleted from this directory the next morning.