---
gitea: none
include_toc: true
---
# PerfSPEC Learning Phase
Based on the [PerfSPEC: Performance Profiling-based Proactive Security Policy Enforcement for Containers](https://ieeexplore.ieee.org/document/10577533) paper presented in [1], this repository contains the source files used to generate and process the data.
- Main reference: the [PerfSPEC reference document](PerfSPEC.pdf), provided as a [white paper](https://en.wikipedia.org/wiki/White_paper)
- [Presentation in Spanish](presentacion.pdf)
- [How to install](https://repo.jesusperez.pro/jesus/perfspec-learning/src/branch/main/install.md) covers the basic environment, tools, and recommendations.

<div style="margin: auto">
<a target="_blank" href="perfspec-learning/src/branch/main/presentacion.pdf"><img src="imgs/perfSPEC-learning.png" width="800"></a>
</div>

__PerfSPEC__
> [!IMPORTANT]
> With `PerfSPEC`, [security policies](https://en.wikipedia.org/wiki/Security_policy) can be managed and monitored in **proactive** mode by using <u>ranking</u>, <u>learning</u>, and <u>profiles</u> to account for safety, performance, and resource costs.

It has three phases:
- Ranking
- Learning
- Runtime

This repository focuses on the __Learning__ phase, with attention to:
- Loading and processing event log information
- Predictive learning model

Additional documents are also available:
- [Quick start](installation.md) and installation
- [Intro](intro.md): why and what is done
- [About](about.md): goals and experiences
- [Presentation in Spanish](presentacion.pdf): slides explaining the process and environment
> [!NOTE]
> The __event data collection__ in `raw-audit-logs.log.xz` is considered realistic and representative for simulating administrative operations.
## Files
### Data
- `raw-audit-logs.log` contains raw Kubernetes audit logs collected using the `audit-policy.yaml` audit policy.
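
This is a minimal standard-library sketch for unpacking the compressed log and taking a first look at it. It assumes the Kubernetes audit log backend's JSON Lines output, with one audit `Event` object per line and each object carrying a `verb` field. `decompress_xz` and `count_verbs` are hypothetical helpers, not functions from this repository.

```python
import json
import lzma
import shutil
from collections import Counter
from pathlib import Path


def decompress_xz(src: Path, dst: Path) -> None:
    """Stream-decompress an .xz file so the whole log never sits in memory."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    with lzma.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def count_verbs(log_path: Path) -> Counter:
    """Count audit events per verb, assuming one JSON event per line."""
    counts: Counter = Counter()
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            counts[event.get("verb", "<none>")] += 1
    return counts


# Hypothetical usage with the paths listed in this README:
# decompress_xz(Path("raw-audit-logs.log.xz"), Path("data/raw-audit-logs.log"))
# print(count_verbs(Path("data/raw-audit-logs.log")).most_common(5))
```

Streaming through `shutil.copyfileobj` keeps memory use flat, which helps because audit logs grow quickly.
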
### Layout
Tools are organized into directories:
- [Collect](collect)
- [Process](process)
- [Learning](learning)

Content structure overview, with notes:
<pre>
├── PerfSPEC.pdf              Reference document
├── README.md
├── about.md
├── actions_distribution.pdf  Generated actions distribution
├── collect                   Log collection scripts
├── data                      Extracted from the compressed archive
├── data_sample.tar.xz        Compressed archive with 'data'
├── imgs
├── full_content_layout.md    Full content layout
├── html                      HTML downloads of the notebooks
├── install.md                Installation notes
├── intro.md
├── learning
├── models                    Extracted from the compressed archive
├── models_sample.tar.xz      Compressed archive with 'models'
├── presentacion.pdf          Presentation slides
└── raw-audit-logs.log.xz     Main raw logs file
</pre>
A [full directory layout](full_content_layout.md) is available.

Because some tasks can be implemented in [Python](https://python.org) or [Rust](https://www.rust-lang.org/), each task directory contains (or will contain) one subdirectory per programming language.

Each `task/programming-language` directory uses a common __data__ directory, where the processing output files are generated.
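
Under this convention, a script stored at `<task>/<language>/<script>.py` could reach the shared directory by walking two levels up from its own path. This sketch assumes exactly that two-level layout. `shared_data_dir` is a hypothetical helper; the repository's own scripts take their default paths from the settings in `lib_perfspec.py`.

```python
from __future__ import annotations

from pathlib import Path


def shared_data_dir(script_path: str | Path) -> Path:
    """Return <repo-root>/data for a script at <repo-root>/<task>/<language>/<script>.py."""
    script = Path(script_path).resolve()
    # parents[0] is <task>/<language>, parents[1] is <task>, parents[2] is <repo-root>
    return script.parents[2] / "data"


# Hypothetical usage from inside a script:
# main_log = shared_data_dir(__file__) / "main-audit-logs.log"
```

Deriving the path from `__file__` rather than the current working directory makes the result independent of where the script is launched from.
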
## Collect data
If you wish to [collect](collect) your own dataset, several source files may help:

- `collect/audit-policy.yaml` is the audit policy for capturing [Kubernetes](https://kubernetes.io/) event logs; other resources are also required, such as [admission controllers](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/).
- `collect/collect.py` is a script that triggers the installation and uninstallation of Helm charts from public repositories.
- `collect/helm-charts.json` is a backup of the Helm charts used at the time of the collection.
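
The install/uninstall cycle that `collect/collect.py` drives can be sketched as follows. This is not the script's code. The sketch assumes a hypothetical chart entry shape (`release`, `chart`, `repo`) for `helm-charts.json` and a hypothetical `perfspec-collect` namespace, and it only uses standard `helm install` / `helm uninstall` options. With `dry_run=True` (the default) it only returns the commands without running them.

```python
from __future__ import annotations

import json
import subprocess
from typing import Iterable


def load_charts(path: str) -> list[dict]:
    """Read chart entries from a JSON file (hypothetical shape: a list of objects)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def helm_commands(chart: dict, namespace: str = "perfspec-collect") -> list[list[str]]:
    """Build the install/uninstall command pair for one chart entry."""
    install = [
        "helm", "install", chart["release"], chart["chart"],
        "--repo", chart["repo"],
        "--namespace", namespace, "--create-namespace", "--wait",
    ]
    uninstall = ["helm", "uninstall", chart["release"], "--namespace", namespace]
    return [install, uninstall]


def run_cycle(charts: Iterable[dict], dry_run: bool = True) -> list[list[str]]:
    """Install then uninstall each chart; return every command in execution order."""
    executed: list[list[str]] = []
    for chart in charts:
        for cmd in helm_commands(chart):
            executed.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=False)  # keep going if one chart fails
    return executed
```

Taking each chart through a full install and uninstall is meant to make the API server record creation and deletion events for many resource kinds, which the audit policy can then capture.
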
## Process data
- `data/raw-audit-logs.log`: raw logs captured from the services
- `data/main-audit-logs.log`: fixed and cleaned logs
- `data/actions-dataset-audit.txt`: source content for the learning models
- `data/actions_distribution.pdf`: autogenerated graph of the actions and events distribution
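
The step from raw logs to an actions dataset can be pictured with a sketch like the one below. It drops blank or malformed lines (the cleaning step) and turns each remaining event into an action token. The `<verb>_<resource>` token format is an assumption for illustration only. The real format of `actions-dataset-audit.txt` is whatever `prepare_perfspec.py` writes.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Iterable, Iterator


def clean_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed audit events, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or corrupted entry
        if isinstance(event, dict) and "verb" in event:
            yield event


def to_action(event: dict) -> str:
    """Hypothetical action token '<verb>_<resource>', e.g. 'create_pods'."""
    resource = (event.get("objectRef") or {}).get("resource", "none")
    return f"{event['verb']}_{resource}"


def actions_and_distribution(lines: Iterable[str]) -> tuple[list[str], Counter]:
    """Return the ordered action sequence and its per-action counts."""
    actions = [to_action(e) for e in clean_events(lines)]
    return actions, Counter(actions)
```

The returned `Counter` holds the kind of per-action counts that a distribution chart such as `actions_distribution.pdf` plots.
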
## Data Models
> [!CAUTION]
> The following names and paths are defaults and can be changed:
> - by modifying the [settings](learning/python/lib_perfspec.py)
> - on the <u>command line</u> when running in script mode (add **--help** for more info)

`models/checkpoints` is where files are stored during the learning process:
<pre>
├── checkpoints
│   ├── model_at_epoch_175.keras
│   └── model_at_epoch_185.keras
</pre>
- `models/perfSPEC_model.keras`: the model generated by default
- `models/history.json`: model history with stats
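
To resume from a checkpoint or review a run, one might select the most recent checkpoint and the best epoch recorded in `history.json`. This sketch makes two assumptions. First, `history.json` stores a Keras `History.history` mapping from each metric name to a list of per-epoch values; `val_loss` exists only when validation data was used. Second, checkpoint names follow the `model_at_epoch_<N>.keras` pattern shown above. The helper names are hypothetical.

```python
from __future__ import annotations

import json
import re
from pathlib import Path

_EPOCH_RE = re.compile(r"model_at_epoch_(\d+)\.keras$")


def latest_checkpoint(names: list[str]) -> str | None:
    """Pick the checkpoint with the highest epoch number (numeric, not lexicographic)."""
    numbered = [(int(m.group(1)), n) for n in names if (m := _EPOCH_RE.search(n))]
    return max(numbered)[1] if numbered else None


def best_epoch(history: dict[str, list[float]], metric: str = "val_loss") -> int:
    """Return the 1-based epoch with the lowest value of `metric`."""
    values = history[metric]
    return min(range(len(values)), key=values.__getitem__) + 1


def load_history(path: Path) -> dict[str, list[float]]:
    """Read the saved metric history from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Epoch numbers are compared as integers, so `model_at_epoch_95.keras` does not outrank `model_at_epoch_185.keras` the way it would in a plain string sort. The selected file can then be loaded with `keras.models.load_model`.
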
## Learning Notebooks
- [lib_perfspec.py](learning/python/lib_perfspec.py): main library with settings
- [prepare_perfspec.py](learning/python/prepare_perfspec.py): prepares data from raw logs into source content for the learning models
- [train_perfspec.py](learning/python/train_perfspec.py): trains the model from the data
- [run_perfspec.py](learning/python/run_perfspec.py): runs and checks predictions
- [model_perfspec.py](learning/python/model_perfspec.py): inspects and reviews the generated models

<small>`__pycache__` is generated when running Python and is ignored by git.</small>
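
The notebooks define the actual model. As a hedged illustration of how a sequence of actions can be framed for next-action prediction, the sketch below maps action tokens to integer ids. It then pairs each window of `window` consecutive ids with the id that follows it. The window length, the id encoding, and both helper names are assumptions, not taken from `train_perfspec.py`.

```python
from __future__ import annotations


def encode(actions: list[str]) -> tuple[list[int], dict[str, int]]:
    """Map each distinct action to an integer id, in order of first appearance."""
    vocab: dict[str, int] = {}
    # setdefault assigns len(vocab) only when the action is new
    ids = [vocab.setdefault(a, len(vocab)) for a in actions]
    return ids, vocab


def sliding_windows(ids: list[int], window: int) -> tuple[list[list[int]], list[int]]:
    """Pair every run of `window` ids with the id that immediately follows it."""
    if window < 1:
        raise ValueError("window must be >= 1")
    starts = range(len(ids) - window)  # empty when the sequence is too short
    inputs = [ids[i:i + window] for i in starts]
    targets = [ids[i + window] for i in starts]
    return inputs, targets
```

The resulting lists can be converted to NumPy arrays before being passed to a Keras model's `fit`.
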
## HTML Notebooks
Notebooks downloaded as HTML, with code (no data is included in this mode, only output):
- [prepare_perfspec.html](html/prepare_perfspec.html): prepares data from raw logs into source content for the learning models
- [model_perfspec.html](html/model_perfspec.html): inspects and reviews the generated models
## Reference
[1] H. Kermabon-Bobinnec et al., "PerfSPEC: Performance Profiling-based Proactive Security Policy Enforcement for Containers," in *IEEE Transactions on Dependable and Secure Computing*, doi: [10.1109/TDSC.2024.3420712](https://ieeexplore.ieee.org/document/10577533).