---
gitea: none
include_toc: true
---

# PerfSPEC Learning Phase

Based on [PerfSPEC: Performance Profiling-based Proactive Security Policy Enforcement for Containers](https://ieeexplore.ieee.org/document/10577533), presented in [1], this repository contains the source files used to generate and process data.

- Main reference: [PerfSPEC reference document](PerfSPEC.pdf), available as a [white paper](https://en.wikipedia.org/wiki/White_paper)
- [Presentation in Spanish](presentacion.pdf)
- [How to install](https://repo.jesusperez.pro/jesus/perfspec-learning/src/branch/main/install.md) covers the basic environment, tools, and recommendations.
__PerfSPEC__

> [!IMPORTANT]
> With `PerfSPEC`, [security policies](https://en.wikipedia.org/wiki/Security_policy) can be managed and watched in **proactive** mode, using ranking, learning, and profiles to balance safety, performance, and resource costs.

It has three phases:

- Ranking
- Learning
- Runtime

This repository focuses on the __Learning__ phase, with attention on:

- Event logs: loading and processing
- The predictive learning model

There are additional documents:

- [Quick start](installation.md) and installation
- [Intro](intro.md) about why and what is done
- [About](about.md) goals and experiences
- [Presentation in Spanish](presentacion.pdf) slides explaining the process and environment

> [!NOTE]
> The __event data collection__ in `raw-audit-logs.log.xz` is considered realistic and representative for simulating administrative operations.

## Files

### Data

- `raw-audit-logs.log` contains raw Kubernetes audit logs collected using the `audit-policy.yaml` audit policy.

### Layout

Tools are distributed across directories:

- [Collect](collect)
- [Process](process)
- [Learning](learning)

Content structure overview with notes:
```
├── PerfSPEC.pdf              Reference document
├── README.md
├── about.md
├── actions_distribution.pdf  Generated actions distribution
├── collect                   Log collection scripts
├── data                      Extracted from compressed archive
├── data_sample.tar.xz        Compressed archive with 'data'
├── imgs
├── full_content_layout.md    Full content layout
├── html                      HTML downloads of notebooks
├── install.md                Installation notes
├── intro.md
├── learning
├── models                    Extracted from compressed archive
├── models_sample.tar.xz      Compressed archive with 'models'
├── presentacion.pdf          Presentation slides
└── raw-audit-logs.log.xz     Main raw logs file
```
A [full directory layout](full_content_layout.md) is available.
As some tasks can be implemented in [Python](https://python.org) or [Rust](https://www.rust-lang.org/), there are (or will be) per-language subdirectories inside each task directory.
Each `task/programming-language` directory uses a common __data__ directory where processing output files are generated.
## Collect data
If you wish to [collect](collect) your own dataset, there are several source files that might help:
- `collect/audit-policy.yaml` configures [Kubernetes](https://kubernetes.io/) event log capture; other resources, such as [admission controllers](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/), are also required.
- `collect/collect.py` is a script to trigger the installation and uninstallation of public Helm repositories.
- `collect/helm-charts.json` is a backup of Helm charts used at the time of the collection.
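The collection loop can be sketched as a thin wrapper around the Helm CLI: each install/uninstall cycle makes the API server emit audit events. This is an illustrative sketch, not the actual `collect.py` implementation; the function name and dry-run flag are assumptions.

```python
import subprocess

def helm_cycle(release: str, chart: str, namespace: str = "default", dry_run: bool = True):
    """Install, then uninstall, a Helm chart so the API server emits audit events.

    Sketch only: the real collect.py iterates over many public charts
    (see helm-charts.json); names and flow here are illustrative.
    """
    install = ["helm", "install", release, chart, "--namespace", namespace]
    uninstall = ["helm", "uninstall", release, "--namespace", namespace]
    if dry_run:
        # Return the commands instead of executing them (no cluster needed).
        return install, uninstall
    subprocess.run(install, check=True)
    subprocess.run(uninstall, check=True)
    return install, uninstall

# Example (dry run):
cmds = helm_cycle("demo", "bitnami/nginx")
print(cmds[0])  # ['helm', 'install', 'demo', 'bitnami/nginx', '--namespace', 'default']
```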
## Process data
- `data/raw-audit-logs.log`: raw logs captured from services
- `data/main-audit-logs.log`: fixed and cleaned data logs
- `data/actions-dataset-audit.txt`: source content for the learning models
- `data/actions_distribution.pdf`: auto-generated graph of the actions and events distribution
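The step from raw audit logs to the actions dataset boils down to extracting an action label (verb plus resource) from each JSON audit event. A minimal sketch, assuming the usual Kubernetes audit event shape (`verb`, `objectRef.resource`); the real extraction in the process scripts may keep more fields:

```python
import json

def extract_action(line: str):
    """Extract a 'verb:resource' action label from one Kubernetes audit log entry.

    Sketch only: assumes the standard audit Event JSON layout; returns None
    for malformed or incomplete entries.
    """
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    verb = event.get("verb", "")
    resource = event.get("objectRef", {}).get("resource", "")
    return f"{verb}:{resource}" if verb and resource else None

sample = '{"kind":"Event","verb":"create","objectRef":{"resource":"pods","namespace":"default"}}'
print(extract_action(sample))  # create:pods
```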
## Data Models
> [!CAUTION]
> These are the default file names and paths; they can be changed:
> - by modifying [settings](learning/python/lib_perfspec.py)
> - via the command line when running in script mode. Add **--help** for more info
`models/checkpoints` is where checkpoint files are stored during the learning process:

```
checkpoints
├── model_at_epoch_175.keras
└── model_at_epoch_185.keras
```
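Given that naming scheme, resuming from the most recent checkpoint is a matter of parsing the epoch number out of each file name. A sketch under the assumption that checkpoints always follow the `model_at_epoch_<N>.keras` pattern shown above:

```python
import re
from pathlib import Path

def latest_checkpoint(ckpt_dir: str):
    """Return the checkpoint Path with the highest epoch number, or None.

    Assumes the 'model_at_epoch_<N>.keras' naming pattern; other .keras
    files in the directory are ignored.
    """
    pattern = re.compile(r"model_at_epoch_(\d+)\.keras")
    best, best_epoch = None, -1
    for path in Path(ckpt_dir).glob("*.keras"):
        match = pattern.fullmatch(path.name)
        if match and int(match.group(1)) > best_epoch:
            best, best_epoch = path, int(match.group(1))
    return best
```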
- `models/perfSPEC_model.keras`: the model generated by default
- `models/history.json`: the model training history with stats
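A quick way to inspect `history.json` is to report the last value of each metric. This sketch assumes the file holds a Keras-style history dump (a mapping from metric name to a per-epoch list of values); the actual format written by the training script may differ:

```python
import json

def summarize_history(history_json: str):
    """Return the final value of each metric from a Keras-style history dump.

    Assumes {"metric": [per-epoch values, ...], ...}; format is an assumption.
    """
    history = json.loads(history_json)
    return {metric: values[-1] for metric, values in history.items() if values}

# Illustrative data, not real training output:
sample = '{"loss": [1.2, 0.8, 0.5], "accuracy": [0.4, 0.6, 0.7]}'
print(summarize_history(sample))  # {'loss': 0.5, 'accuracy': 0.7}
```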
## Learning Notebooks
- [lib_perfspec.py](learning/python/lib_perfspec.py): main library with settings
- [prepare_perfspec.py](learning/python/prepare_perfspec.py): prepares data from raw logs into source material for the learning models
- [train_perfspec.py](learning/python/train_perfspec.py): trains the model from data
- [run_perfspec.py](learning/python/run_perfspec.py): runs/checks predictions
- [model_perfspec.py](learning/python/model_perfspec.py): inspects/reviews generated models
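The prepare/train split above follows the usual next-event prediction setup: the ordered action list is encoded as integers and cut into sliding windows, each paired with the action that follows it. A minimal sketch of that windowing step, with an assumed window size; the real pipeline in `prepare_perfspec.py`/`train_perfspec.py` may differ:

```python
def make_windows(actions, window=3):
    """Encode an ordered action list and build (input window, next action) pairs.

    Sketch of the standard sequence-prediction preprocessing; vocabulary
    and window size here are illustrative choices.
    """
    # Assign each distinct action a stable integer id.
    vocab = {action: i for i, action in enumerate(sorted(set(actions)))}
    encoded = [vocab[action] for action in actions]
    # Each training pair: the last `window` actions, then the one that follows.
    pairs = [(encoded[i:i + window], encoded[i + window])
             for i in range(len(encoded) - window)]
    return vocab, pairs

# Illustrative action sequence:
actions = ["create:pods", "get:pods", "delete:pods", "create:pods", "get:pods"]
vocab, pairs = make_windows(actions, window=2)
print(pairs[0])  # ([0, 2], 1)
```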
`__pycache__` is created by Python execution and is ignored by git.
## HTML Notebooks
Notebooks downloaded as HTML with code (no data is included in this mode, only output):
- [prepare_perfspec.html](html/prepare_perfspec.html): prepares data from raw logs into source material for the learning models
- [model_perfspec.html](html/model_perfspec.html): inspects/reviews generated models
## Reference
[1]: [H. Kermabon-Bobinnec et al., "PerfSPEC: Performance Profiling-based Proactive Security Policy Enforcement for Containers," in IEEE Transactions on Dependable and Secure Computing, doi: 10.1109/TDSC.2024.3420712.](https://ieeexplore.ieee.org/document/10577533)