Distributable Package With Basic Functionality
Objective
Create an installable Python package to replace the various standalone shell scripts and notebooks that currently form the basis of GPFS log processing.
Description
Develop a package that combines the separate scripts and notebooks into a more approachable and flexible pipeline for processing GPFS logs. The current layout creates symlinks to executable files that users must add to their PATH in order to run them. This is needlessly complicated from a design perspective, and the processing and plotting still rely on manually run Jupyter notebooks. Instead, the code should become an easily installable package with a modular Python interface for interactive (REPL) use, paired with CLI commands that run full pipelines end to end. This will accomplish three things:
- The processing stream becomes modular and approachable: individual pipeline stages can be reused in auxiliary jobs without modification
- The code is easier to distribute once it is available from a package repository
- The code becomes easier to maintain
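A minimal sketch of the intended split between the Python interface and the CLI follows. Every name in it (`split_lines`, `count_records`, `run_pipeline`, the `gpfs-pipeline` program name) is hypothetical, and the "aggregation" is a toy record count standing in for the real parquet work. The point is the layering: each stage is a plain function that can be imported from the REPL or reused in an auxiliary job, and the CLI is a thin wrapper that only parses arguments and calls the full pipeline.

```python
from __future__ import annotations

import argparse
from pathlib import Path
from typing import Iterable, Sequence


# Stage 1 (hypothetical): split raw log lines into chunks of at most N lines.
def split_lines(lines: Iterable[str], lines_per_chunk: int) -> list[list[str]]:
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    chunks: list[list[str]] = []
    for line in lines:
        # Start a new chunk when there is none yet or the last one is full.
        if not chunks or len(chunks[-1]) == lines_per_chunk:
            chunks.append([])
        chunks[-1].append(line)
    return chunks


# Stage 2 (hypothetical, toy): an "aggregation" that just counts records.
def count_records(chunks: Iterable[list[str]]) -> int:
    return sum(len(chunk) for chunk in chunks)


# Full pipeline: importable from the REPL and also what the CLI calls.
def run_pipeline(lines: Iterable[str], lines_per_chunk: int = 1000) -> int:
    return count_records(split_lines(lines, lines_per_chunk))


# Thin CLI wrapper: parse arguments, open the file, delegate to the API.
def main(argv: Sequence[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog="gpfs-pipeline")
    parser.add_argument("log", type=Path, help="raw GPFS log file")
    parser.add_argument("--lines-per-chunk", type=int, default=1000)
    args = parser.parse_args(argv)
    with args.log.open() as fh:
        print(run_pipeline(fh, args.lines_per_chunk))
    return 0
```

From the REPL one would call `run_pipeline(...)` or a single stage directly; from a shell, the same pipeline would run as `gpfs-pipeline path/to/log` once a console-script entry point pointing at `main` is declared in the package metadata (also an assumption).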
Key Deliverables
- Package installable from the GitLab package registry
- CI/CD pipeline that automates building, deploying, and versioning the package
- Package functionality:
  - Split and convert raw GPFS logs into a parquet dataset
    - Both steps should be executable either within the current job or by automatically submitting batch jobs
  - Automated aggregation of the parquet dataset
    - Includes automatically detecting which compute backend to use based on data size, available hardware, and user input
    - Switching compute backends transparently changes how the aggregation is implemented without affecting the results
  - CLI for convenience functions, such as running a full aggregation with automatic backend detection and exporting the results to a file
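For the requirement that splitting and conversion run either in the current job or as automatically submitted batch jobs, one option is a single entry point that either runs the stage directly or wraps the identical command in a batch submission. This sketch assumes a Slurm scheduler (`sbatch --wrap`) and a hypothetical `gpfs_pipeline` module runnable with `python -m`; neither is specified in this issue, and the function names are illustrative only.

```python
from __future__ import annotations

import shlex
import subprocess
import sys
from typing import List, Sequence

# Hypothetical module name for the package's CLI (assumption).
MODULE = "gpfs_pipeline"


def local_command(stage_argv: Sequence[str]) -> List[str]:
    """Command that runs a pipeline stage inside the current job."""
    return [sys.executable, "-m", MODULE, *stage_argv]


def sbatch_command(
    stage_argv: Sequence[str], job_name: str, time_limit: str = "01:00:00"
) -> List[str]:
    """The same stage wrapped in a batch job (assumes Slurm's sbatch)."""
    wrapped = shlex.join(local_command(stage_argv))
    return ["sbatch", f"--job-name={job_name}", f"--time={time_limit}", f"--wrap={wrapped}"]


def run_stage(
    stage_argv: Sequence[str],
    submit: bool,
    job_name: str = "gpfs-stage",
    dry_run: bool = False,
) -> List[str]:
    """Run locally or submit a batch job; return the command used."""
    cmd = sbatch_command(stage_argv, job_name) if submit else local_command(stage_argv)
    if not dry_run:
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, check=True)
    return cmd
```

Because both paths build the same underlying command, a stage behaves identically whether it runs in place or in a batch job; `dry_run=True` returns the command without executing it, which is useful for logging and testing.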
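The backend-detection and transparency requirements for aggregation could look roughly like the sketch below. The three backend categories, the 50% memory threshold, and the priority order (explicit user choice first, then in-memory if the data fits, then GPU if one is available, otherwise chunked CPU) are assumptions for illustration; the issue does not name specific backends or libraries. The toy aggregation (a sum of byte counts) shows the key property: every backend returns the same result and only the execution strategy differs.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Dict, Iterable, List, Optional


class Backend(Enum):
    """Hypothetical backend categories (not named in the issue)."""
    IN_MEMORY = "in-memory"  # whole dataset fits comfortably in RAM
    CHUNKED = "chunked"      # stream the dataset in pieces on CPU
    GPU = "gpu"              # offload to an accelerator


def choose_backend(
    data_bytes: int,
    mem_bytes: int,
    n_gpus: int,
    requested: Optional[str] = None,
    mem_fraction: float = 0.5,
) -> Backend:
    """Pick a backend from data size, available hardware, and user input."""
    if requested is not None:
        # Explicit user input always wins.
        return Backend(requested)
    if data_bytes <= mem_fraction * mem_bytes:
        return Backend.IN_MEMORY
    if n_gpus > 0:
        return Backend.GPU
    return Backend.CHUNKED


def _total_in_memory(sizes: Iterable[int]) -> int:
    # Materialize everything at once, then reduce.
    return sum(list(sizes))


def _total_chunked(sizes: Iterable[int], chunk: int = 2) -> int:
    # Reduce piece by piece: same answer, bounded memory.
    total, buf = 0, []  # type: int, List[int]
    for size in sizes:
        buf.append(size)
        if len(buf) == chunk:
            total += sum(buf)
            buf.clear()
    return total + sum(buf)


# Placeholder: the GPU entry reuses the chunked code; a real GPU
# implementation would be registered here instead.
AGGREGATORS: Dict[Backend, Callable[[Iterable[int]], int]] = {
    Backend.IN_MEMORY: _total_in_memory,
    Backend.CHUNKED: _total_chunked,
    Backend.GPU: _total_chunked,
}


def total_bytes(sizes: Iterable[int], backend: Backend) -> int:
    """Callers see one function; the backend only changes how it runs."""
    return AGGREGATORS[backend](sizes)
```

In practice `mem_bytes` and `n_gpus` would be detected at runtime (for example, physical memory via `os.sysconf` on Linux) rather than passed in by hand, with `requested` exposed as a CLI option.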
Submitting Policy Jobs
The functionality for submitting the initial policy runs will not be included in the package because those commands require sudo privileges. Instead, it will be moved to a separate repository and included here as a sub-project.