Introduction
The Batch RunTime Environment (BRTE) defines a set of standards and interfaces to which all batch jobs associated with UIS maintained production applications must conform. It specifies the locations of resources as well as the programming standards that must be followed. It is designed to interact very closely with SPDE.

Goals
The BRTE strives to accomplish the following goals:
BRTE delivers most of these goals through three main components: a set of programming standards, consistent resources provided through an abstract system, and a library of functions for common tasks.

The programming standards specify a general program flow and guidelines for how programs should execute. They include some requirements that all programs must follow. In addition, they specify a minimal standard of documentation that must be met when the code is deployed in the environment. Example programs exist to demonstrate these standards, and they can be copied and used as a skeleton for new batch programs.

A set of resources, such as standard places to find or work on files, is defined through environment variables. Whenever the application needs to access resources, it must do so through these environment variables. The variables can be thought of as representing an abstract system in which the batch job executes. By changing their values, one can make the batch job operate in an arbitrary physical system. This makes the program more flexible and supports better development and testing practices.

The third component of the BRTE is the set of re-usable code libraries. These libraries provide common services such as logging, error handling, event notification, and connections to other services. The modules are available for the two languages in which most batch jobs are written, ksh and perl. They let developers write programs more quickly and make the programs easier to support.

Standards
The following standards have been defined to ensure that all programs are supportable both by operations staff and by other developers, whether they work in the same application or across applications.

Program flow
All batch scripts should be defined as a series of transactional steps. Each step should represent a unit of business logic.
Except in the most trivial cases, the actual business logic will not be implemented in the ksh, but will be implemented in some other fashion - a compiled program, perl script, or SQL statement for instance. The batch script makes an initial call to the BRTE libraries which initializes the BRTE environment. The batch script will consist mainly of functions defining each step. Then there is a call to a function that determines the order of function evaluation and then evaluates them in that order. Then cleanup work is performed.
The call to the
The steps are written as functions. An
If an error occurs during the step, then appropriate error codes need to be returned. This may be done via the
error handling module or may be done explicitly in the function. It is legal for the error handler to exit the
program without returning to the The following skeleton sample ksh program illustrates the basic steps required. (The sample is for ksh, but a very similar approach can be used for perl programs.)
#!/bin/sh
. batch_modules
The program first calls in a standard BRTE library that contains BRTE functions. (This example does not illustrate the
use of those functions.) Next the program defines the steps of the business logic as individual functions. Finally the
function A step can be designated restartable by the programmer. This means that the program can be restarted directly at that step, provided all preceding steps have executed successfully.
If the environment variable STARTSTEP contains an integer value corresponding to a valid function step,
While steps can be designated as restartable, there is no automatic roll-back capability. The programmer must either provide roll-back in the step's definition, or the step must assume that it can completely redo its work and that the preceding step completed successfully.
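The restart behaviour described above can be illustrated with a minimal sketch. It is not the BRTE implementation: the step names (step1, step2, step3) and the run_steps function are hypothetical, and the sketch does not model which steps are designated restartable. It only assumes, as stated above, that STARTSTEP holds an integer naming the step at which execution should begin.

```shell
#!/bin/sh
# Sketch only: run_steps and the step names are hypothetical stand-ins,
# not the functions provided by the BRTE libraries.

# Each step wraps one unit of business logic and returns non-zero on error.
step1() { echo "step1: extract"; }
step2() { echo "step2: load"; }
step3() { echo "step3: report"; }

# run_steps N: run step1..stepN in order, beginning at STARTSTEP if it is set.
run_steps() {
    total=$1
    start=${STARTSTEP:-1}
    # Reject a STARTSTEP that is not a plain integer.
    case $start in
        ''|*[!0-9]*) echo "invalid STARTSTEP: $start" >&2; return 2 ;;
    esac
    # Reject a STARTSTEP that does not name an existing step.
    if [ "$start" -lt 1 ] || [ "$start" -gt "$total" ]; then
        echo "STARTSTEP out of range: $start" >&2
        return 2
    fi
    i=$start
    while [ "$i" -le "$total" ]; do
        # Stop at the first failing step; there is no automatic roll-back.
        "step$i" || { echo "step$i failed" >&2; return 1; }
        i=$((i + 1))
    done
    return 0
}
```

With STARTSTEP unset, `run_steps 3` runs all three steps in order. With STARTSTEP set to 3, only step3 runs, on the assumption that the earlier steps already completed successfully.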
Invoking Additional helper functions are perfectly legal. However, any business logic must be executed within the scope of a step. It is possible for multiple steps to use the same helper function.

Error handling
By calling the However, if the developer needs to execute commands or capabilities not supported by BRTE, the developer is still responsible for appropriate error handling and logging. In this case the developer can still call a generic function, do_command. This function takes a command with all appropriate arguments as one string. It executes the command and redirects its output to the log. It assumes that the command returns a non-zero value if any errors are detected.

If the developer wants even finer control of the command, the developer can access the error handling functions directly. However, it remains the developer's responsibility to make sure that all logging and error handling is done.

Manipulating the logs directly or re-implementing the error handling routines is strongly discouraged. These functions offer consistency across programs for support and re-usability. In addition, some global behavior changes can be made if all programs use the same interface. A developer who re-implements the modules may find the program breaking in the future. In many cases, manipulating the logs could also leave the automated scheduling package unable to provide accurate information. If additional functionality is required, it is better for the developer to work with the proper support staff to add the functionality to BRTE.

Script and program location
Each application should obtain a System code of three characters or fewer, assigned by the IARS Business Naming group. This code is also needed for database and table naming and for other standards. Once the System code is obtained, the developer must request that the Security group create a system account, system group, and home directory for the application. The developer will need to tell Security which UIS employees are to be members of the system group for access purposes. Batch scripts are executed under this system account.
While development is typically done under a developer's account, SPDE coordinates access by the system group and deployment into the system account. Once the system home directory is created, the developer should create the following directories under it: batch, restart, and redschedule. The batch directory contains the application scripts used for business processes; the restart and redschedule directories are used mainly in the test and production environments. To organize the actual programs and processes that a script executes, you may also create sub-directories under the batch directory. The restart directory holds scripts that need to be rerun after the script's job abended. The redschedule directory holds scripts that are to run once for a special purpose.

At the appropriate time, the Security group needs to be contacted to make sure the test and production environments are properly defined. For instance, the Security group will create the system home directories for both the test and production environments. Developers will create the required sub-directories indicated above for the test and development environments. The Security group, along with Production Services, will create those sub-directories for the production environment.

Each system will have a source control environment assigned to it. This is the role of SPDE. All development of application batch code and scripts should be done in this environment. It becomes the coordination point for migrating code from development to test to production, and it provides version control so that quality is preserved. Application code and associated elements such as DDL should not be stored or developed outside of the source development environment.

Batch scheduling and process requirements
All batch jobs should be written so that they do not need to run each and every day.
The business logic of the application should allow the batch jobs to retroactively catch up on data that was processed in the application or fed by other processes on prior days. Although removing a job from the schedule, or failing to run it for a day or more, should be the exception rather than the rule, the design should allow for that situation.

There should be a defined schedule established for each batch job. Once that schedule is defined, it becomes the standard for the application. Schedules should not change in an ad hoc fashion. Example: a particular job is scheduled to run Monday through Saturday; this is the established schedule for the process. There should not be requests to periodically pull the job from the schedule on Tuesday. If the job should not run on Tuesdays, its established schedule should be Monday and Wednesday through Saturday.

Testing of Batch Processes
When the time comes to test batch processes, the test execution should replicate production execution as closely as possible. This is not always feasible while items are being developed, but for system tests and for testing interfaces to other applications, mimicking the production setup is critical to validating your application. For example, when testing a feed from your system to another application, you should use your test environment and the other system's test environment to validate all aspects of the process. This includes the actual methods used for passing the data between systems, and verifying how that process would work in a production setting, including the protocol and physical platforms to be used. Extracting the data from your system, loading it on your desktop machine, and then passing the data to the receiving system would not replicate the real system integration you would use in production.
Event notification
In the case of an error or unexpected event, BRTE provides a notification mechanism. The notification includes information about which program is sending it and when. Exactly who is notified, and how, is determined by assigning the script an error class. The programmer does this within the program. The programmer is required to set an initial error class and can change the error class for each step. This allows the programmer to decide whether certain points in the program execution are more or less critical than others. The table below lists the valid error class values and their meanings.
In addition to the error classes listed above, there is special built-in error handling for the case where BRTE detects internal errors, such as an invalid error class value. This special error handling is known as a panic event. BRTE will panic when fundamental problems are detected, such as the wrong number of arguments to a BRTE function. A panic will try to mail operators, page operators, page, mail to the system account and print to standard output.

Documentation
Every batch script must have proper documentation. This documentation comes in three forms:
This documentation takes the form of a header for the script. It should include the following components:
Each step should have additional information at the top of the function definition. This should include the following information:
This is external documentation that is submitted to Production Services. The following is a list of the items that this documentation should include:
All batch scripts need to conform to a common naming standard to make support easier. All batch script names are lower case. The name needs to be unique within the application. Naming standards apply to batch scripts only; directories, other than those specified by BRTE, can have any name.

Every batch script name should start with the 2-3 character application identifier (examples are ps, mms, and dss), followed by an underscore, '_'. Script names should not exceed 40 characters. The names should be descriptive but use standard abbreviations where possible. Staff are encouraged to keep the names as short as possible.

For jobs that perform loads or extracts, send data to another application, or process data from another application, use the word load, extract, out (data being sent from your application to another), or in (processing data from another system) at the end of the process name, before the language identifier. This helps support staff know that there are other jobs or processes, run before or after a particular process, to check if a problem has occurred.

Each program name should end with an identifier that denotes the language. The language identifier extension should be preceded by a period, '.'. Binary objects that do not require an interpreter or run-time compiler should not have a language identifier extension. The recognized identifiers at this time are:
Each program should also have descriptive segments of the name in action-object form. Examples of actions are load, extract, and build. Examples of objects are table names, file names, or processes such as Oracle or Sybase servers.

Reconciliation
Batch processes passing data between applications should contain reconciliation. Reconciliation is a method to validate that the data sent was also the data received. Reconciliation can come in two forms: total record counts and, for data that contains dollar amounts, validation of the total dollars sent and received.

Abstract system definition
BRTE works by defining an abstract system definition (ASD). That is to say, environment variables are used to define the resources used by all running batch jobs. These resources are defined in such a way that, in most situations, the developer can assume his/her programs are the only ones running on the node. The developer need only be concerned with contention between programs within the application.

Application Environment
The ASD is built in layers. Each subsequent layer relies on the definitions in the layer before it, so there has to be a base layer of definitions. This base layer is the brte_profile file for the application production/test ID. The brte_profile is invoked by the call to batch_modules. The file is located in the home directory of the application ID and is best referenced as $HOME/brte_profile.

System ID's
Every system has at least three system ID's and a group assigned to it. The system ID's and group name are based upon the system's 2-3 character identifier, e.g. ps, dss, mms. The ID's are of the form XXXdev, XXXtest, and XXXprod, and the group name is just XXX. Each of these ID's has its own home directory. The src ID is used by the Automated Integrated Testing Service (AITS). The test and prod ID's, though, can actually have BRTE implementations. If they do, certain requirements will have to be met by their home directories.
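As a small illustration of the ID scheme, the sketch below derives the three system ID's and the group name from an application identifier. The system_ids function is hypothetical and not part of BRTE, and the restriction to lowercase letters is an assumption based on the examples given.

```shell
#!/bin/sh
# Sketch only: system_ids is a hypothetical helper, not part of BRTE.
# Prints the dev, test and prod IDs and the group for an application identifier.
system_ids() {
    sys=$1
    # Accept only a 2-3 character identifier such as ps, dss or mms.
    # Requiring lowercase letters is an assumption drawn from the examples.
    case $sys in
        [a-z][a-z]|[a-z][a-z][a-z]) ;;
        *) echo "invalid system identifier: $sys" >&2; return 1 ;;
    esac
    echo "${sys}dev ${sys}test ${sys}prod group=${sys}"
}
```

For example, `system_ids dss` prints `dssdev dsstest dssprod group=dss`.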
System Defined
The root environment variable that must be defined is the SYSTEMNAME variable. It contains the 2-3 character application identifier mentioned above and is set in $HOME/brte_profile of the system ID. This variable is used to create other meaningful variables defined in additional layers. It is also used by the batch_modules to perform basic functions in the correct way. The SYSTEMNAME variable is marked immutable at definition time.

Additional application-specific environment variables should be assigned here as well. These are left to the design of the application owner. Some programs in the application may not use these particular variables. The developer must decide whether variables of limited use are best defined in $HOME/brte_profile or in each program. The developer should consider that defining additional variables in the environment is inexpensive and that they can always be overridden by the program. The following are some typical application-specific variables. Some are required by certain BRTE functions; others are required by certain tools or utilities.
Environment definitions form the next layer of the BRTE. These are BRTE definitions that exist for all applications. However, the values assigned to each variable will likely differ between applications. For instance, $WORKDIR could contain the value '/staging/dss' for the DSS application but '/staging/mms' for the MMS application. The following is a list of the BRTE environment variables:
Several critical locations are defined by the BRTE. These include the locations of batch_module components, logging directories, and other programs. The specific locations are listed here; that is, this is the implementation version of the ASD. It is for informational purposes only. All references from within programs should be made via the corresponding variables. The representative variables are listed with each location.

Programs
Here are the major program locations and the variables to access them.
A set of re-usable modules has been written to assist developers. These improve supportability, reduce development time, and increase the overall stability of the applications. In BRTE they are made available by including the batch_modules file. This file defines the functions that are usable by any program running in the BRTE. The batch_modules file also performs some other basic setup, so it should be called only once by a program. Consult the BRTE Program Modules documentation for specific details of these functions. Below is a general listing of the available modules.
All logging should be done through the logging module and, where necessary, the SCRIPTLOG variable. This variable points to a file located in the /var/brte/$SYSTEMNAME/ directory. Log files older than 31 days are automatically removed. However, the ADSM backup system maintains the files for an additional 6 months after they have been deleted from the system. The directory /var/brte is actually the mount point for a filesystem. Runaway logs may interfere with other programs by filling up the logging filesystem. If this becomes a significant problem, there are a couple of ways of imposing restrictions on particular application ID's to prevent them from filling up this space. In any case, the filesystem is sized adequately for even heavy log usage.

Working storage
There are two types of working storage, referenced by WORKDIR and STAGEDIR. WORKDIR points to /tmp. Files older than 24 hours are deleted from this directory. It should be used for IO required within a step of a program. This directory is not backed up. STAGEDIR points to /staging/$SYSTEMNAME. Generally, step-to-step IO or program-to-program IO should use this space. Files older than 7 days are deleted from this location. Files deleted from here are maintained for up to 30 days in the backup system.

Security
For connections to other systems, authentication mechanisms need to be employed. This is currently done by creating files that contain the authentication information inside a secure location. These files are readable only by the system ID's. They are located in the directory /opt/sa_forms/pass/. Note that this directory may not be executable, i.e. one may not be able to list the files in this directory even as the application ID. Generally, developers will not access these files directly; they will be accessed by the connection modules.
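To illustrate the working storage conventions above, the following sketch writes a file in WORKDIR while a step is producing it and then moves the finished file to STAGEDIR for a later step or program. The stage_output helper is hypothetical and not a BRTE module; the /tmp fallbacks are assumptions so the sketch can run outside BRTE, where both variables would already be set.

```shell
#!/bin/sh
# Sketch only: stage_output is a hypothetical helper, not a BRTE module.
# WORKDIR and STAGEDIR are normally supplied by the BRTE environment; the
# fallbacks below are assumptions so the sketch runs outside BRTE.
: "${WORKDIR:=/tmp}"
: "${STAGEDIR:=/tmp}"

# stage_output NAME: read data on standard input into a scratch file in
# WORKDIR (IO within a step), then move the finished file into STAGEDIR
# (step-to-step or program-to-program IO).
stage_output() {
    name=$1
    scratch="$WORKDIR/$name.$$"
    cat > "$scratch" || return 1
    mv "$scratch" "$STAGEDIR/$name"
}
```

A step could then end with something like `sort input.dat | stage_output sorted.dat`, and the next step would read `$STAGEDIR/sorted.dat`. Referencing the locations only through the variables keeps the script independent of the physical paths.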
Scheduling
The developers must pass the scheduling requirements for the job to Production Services using the TNG Job Scheduling Request Form or spreadsheet. All jobs will be scheduled through Unicenter TNG. http://www.indiana.edu/~ucschmgt/iu-only/tngjsrf.html

Stopping
Need to be able to stop jobs. This means we should write a tool that can walk a process tree and kill from the bottom up. It should execute trappable kills first so that programs have an opportunity to quit gracefully. Also need a way to walk the process tree right into Sybase or Oracle.

Restart
Restart procedures for batch jobs in Unicenter/TNG: In production, developers have write access to the restart directory. Follow these steps to restart a script or set of scripts in the production environment: