PerfSPEC Learning Phase
Latest commit 2858c188d3 by Jesús Pérez Lorenzo (2025-02-03 10:37:32 +00:00): chore: Fix title


This repository is based on the paper PerfSPEC: Performance Profiling-based Proactive Security Policy Enforcement for Containers, presented in [1]. It contains the source files used to generate and process the data.

PerfSPEC

Important

With PerfSPEC, security policies can be managed and monitored proactively. It uses ranking, learning and profiling of safety, performance and resource costs.

It has three phases:

  • Ranking
  • Learning
  • Runtime

This repository focuses on the Learning phase, with attention to:

  • Event logs: loading and processing information
  • Predictive learning model

Additional documents are available for this repository:

Note

The event data collected in raw-audit-logs.log.xz is considered realistic and representative enough to simulate administrative operations.

Files

Data

  • raw-audit-logs.log contains raw Kubernetes audit logs collected using the audit-policy.yaml audit policy.
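The raw log can be read straight from the compressed archive. The following is a minimal sketch that assumes one JSON audit event per line, which is the default json format of the Kubernetes log audit backend. The function name read_audit_events is hypothetical and is not part of this repository. Blank or malformed lines are skipped rather than treated as fatal.

```python
import json
import lzma
from typing import Iterator


def read_audit_events(path: str) -> Iterator[dict]:
    """Yield audit events from an xz-compressed JSON Lines log file.

    Assumes one JSON object per line (the Kubernetes log backend's
    default json format). Blank or malformed lines are skipped.
    """
    with lzma.open(path, mode="rt", encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate truncated or corrupted lines
            if isinstance(event, dict):
                yield event
```

For example, `Counter(e.get("verb") for e in read_audit_events("raw-audit-logs.log.xz"))` (with `Counter` from `collections`) gives a quick overview of the request verbs in the file.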

Layout

Tools are organized into directories:

Content structure overview with notes

    ├── PerfSPEC.pdf                   Reference document
    ├── README.md
    ├── about.md
    ├── actions_distribution.pdf       Generated actions distribution
    ├── collect                        Log collection scripts
    ├── data                           Extracted from compressed archive
    ├── data_sample.tar.xz             Compressed archive with 'data'
    ├── imgs
    ├── full_content_layout.md         Full content layout
    ├── html                           HTML downloads of notebooks
    ├── install.md                     Installation notes
    ├── intro.md
    ├── learning
    ├── models                         Extracted from compressed archive
    ├── models_sample.tar.xz           Compressed archive with 'models'
    ├── presentacion.pdf               Presentation slides
    └── raw-audit-logs.log.xz          Main raw logs file
 

A full directory layout is available in full_content_layout.md.

Because some tasks can be implemented in Python or Rust, each task directory contains (or will contain) a subdirectory for each programming language, for example learning/python.

All tasks and programming languages share a common data directory, where the processing output files are generated.

Collect data

If you wish to collect your own dataset, there are several source files that might help:

  • collect/audit-policy.yaml is the audit policy used to capture Kubernetes event logs. Other resources, such as admission controllers, are also required.
  • collect/collect.py is a script to trigger the installation and uninstallation of public Helm repositories.
  • collect/helm-charts.json is a backup of Helm charts used at the time of the collection.
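The logic of collect.py is not reproduced here. The following is a hedged sketch of an install/uninstall cycle driven by a chart list. It assumes a hypothetical helm-charts.json schema: a list of objects with repo_name, repo_url and chart keys. The release naming scheme is also made up. The commands helm repo add, helm install and helm uninstall are standard Helm CLI commands. With dry_run=True the sketch only prints the commands it would run.

```python
import json
import subprocess
from typing import List


def helm_commands(chart: dict, release: str) -> List[List[str]]:
    """Build the Helm CLI calls for one install/uninstall cycle.

    `chart` follows a hypothetical schema: {"repo_name", "repo_url", "chart"}.
    """
    chart_ref = f"{chart['repo_name']}/{chart['chart']}"
    return [
        ["helm", "repo", "add", chart["repo_name"], chart["repo_url"]],
        ["helm", "install", release, chart_ref],
        ["helm", "uninstall", release],
    ]


def run_cycle(charts_file: str, dry_run: bool = True) -> None:
    """Install and then uninstall every chart listed in charts_file."""
    with open(charts_file, encoding="utf-8") as handle:
        charts = json.load(handle)
    for index, chart in enumerate(charts):
        release = f"perfspec-{index}"  # hypothetical release naming
        for cmd in helm_commands(chart, release):
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)  # stop on the first failure
```

Each install and uninstall produces API server requests, and those requests are what the audit policy captures as events.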

Process data

data/raw-audit-logs.log Raw logs captured from services

data/main-audit-logs.log Fixed and cleaned log data

data/actions-dataset-audit.txt Source content for learning models

data/actions_distribution.pdf Autogenerated chart of the distribution of actions and events
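These files form a pipeline that runs from raw events to learning input. The following is a minimal sketch of such a pipeline. Its rules are assumptions and are not taken from prepare_perfspec.py. A line counts as clean if it is valid JSON at stage ResponseComplete with an unseen auditID. Each event is encoded as a hypothetical verb:resource token. The second function computes the counts and shares that a chart like actions_distribution.pdf could be drawn from. The plotting itself is left out.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def to_actions(lines: Iterable[str]) -> List[str]:
    """Turn raw audit log lines into a time-ordered list of action tokens."""
    seen_ids = set()
    events = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop malformed lines
        if not isinstance(event, dict) or event.get("stage") != "ResponseComplete":
            continue  # keep a single stage per request
        audit_id = event.get("auditID")
        if audit_id is not None:
            if audit_id in seen_ids:
                continue  # drop duplicated events
            seen_ids.add(audit_id)
        events.append(event)
    # Assumes all timestamps share one fixed-width UTC format, so string order is time order.
    events.sort(key=lambda e: e.get("requestReceivedTimestamp", ""))
    return [
        f"{e.get('verb', '?')}:{(e.get('objectRef') or {}).get('resource', '?')}"
        for e in events
    ]


def action_distribution(actions: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (action, count, share) tuples, most frequent first."""
    counts = Counter(actions)
    total = sum(counts.values())
    return [(action, count, count / total) for action, count in counts.most_common()]
```

Writing `"\n".join(to_actions(lines))` to a text file gives one action per line. This is a plausible, but unconfirmed, shape for a file like actions-dataset-audit.txt.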

Data Models

Caution

These are default file names and paths. They can be changed:

  • by modifying the settings
  • by command-line options when running in script mode (add --help for more information)
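The exact settings and flags of the scripts are listed by --help and are not reproduced here. As an illustration only, the following sketch uses Python's argparse to show the general pattern in which a command-line option overrides a default path. The flag names --data-dir, --model, --history and --checkpoints are hypothetical. The default paths are the ones named in this README.

```python
import argparse
from pathlib import Path
from typing import List, Optional

# Default paths named in this README; the flag names below are hypothetical.
DEFAULT_DATA_DIR = Path("data")
DEFAULT_MODEL = Path("models/perfSPEC_model.keras")
DEFAULT_HISTORY = Path("models/history.json")
DEFAULT_CHECKPOINTS = Path("models/checkpoints")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line options; anything not given keeps its default."""
    parser = argparse.ArgumentParser(description="PerfSPEC learning task (sketch)")
    parser.add_argument("--data-dir", type=Path, default=DEFAULT_DATA_DIR,
                        help="directory where processing output files are written")
    parser.add_argument("--model", type=Path, default=DEFAULT_MODEL,
                        help="path of the generated model")
    parser.add_argument("--history", type=Path, default=DEFAULT_HISTORY,
                        help="path of the model history file")
    parser.add_argument("--checkpoints", type=Path, default=DEFAULT_CHECKPOINTS,
                        help="directory for epoch checkpoints")
    return parser.parse_args(argv)
```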

models/checkpoints is where checkpoint files are stored during the learning process:

    └── checkpoints
        ├── model_at_epoch_175.keras
        └── model_at_epoch_185.keras
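The epoch number in these file names determines which checkpoint is the most recent. A Keras ModelCheckpoint callback produces names like these when given a filepath pattern such as model_at_epoch_{epoch}.keras. Whether train_perfspec.py works this way is an assumption. The following sketch picks the latest checkpoint. It compares epochs numerically, because a plain string sort would place model_at_epoch_99 after model_at_epoch_185. The helper name latest_checkpoint is hypothetical.

```python
import re
from typing import Iterable, Optional

# Matches names such as "model_at_epoch_185.keras".
CHECKPOINT_RE = re.compile(r"^model_at_epoch_(\d+)\.keras$")


def latest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the checkpoint name with the highest epoch number, or None."""
    best_epoch, best_name = -1, None
    for name in names:
        match = CHECKPOINT_RE.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_name = int(match.group(1)), name
    return best_name
```

For example, `latest_checkpoint(p.name for p in Path("models/checkpoints").iterdir())` (with `Path` from `pathlib`) ignores unrelated files in the directory.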

models/perfSPEC_model.keras is the model generated by default.

models/history.json is the model history with statistics.
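This README does not specify the format of history.json. The following sketch assumes it holds the dictionary from a Keras History object (History.history), which maps each metric name, such as loss or val_loss, to one value per epoch. Under that assumption, the best epoch for a metric can be found in a few lines. The function names are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def load_history(path: str) -> Dict[str, List[float]]:
    """Load a metric-name -> per-epoch-values mapping from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def best_epoch(history: Dict[str, List[float]], metric: str = "val_loss",
               higher_is_better: bool = False) -> Tuple[int, float]:
    """Return the 1-based epoch number and value of the best `metric` entry."""
    values = history[metric]
    pick = max if higher_is_better else min
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]
```

Loss-style metrics use the default (lowest is best). Accuracy-style metrics need higher_is_better=True.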

Learning Notebooks

lib_perfspec.py Main library with settings

prepare_perfspec.py Prepare data from raw to source for learning models

train_perfspec.py To train the model from data

run_perfspec.py To run/check predictions

model_perfspec.py To inspect / review generated models
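This README does not describe the inputs of the predictive model. The following sketch assumes, hypothetically, that the model learns to predict the next action from the preceding ones. Under that assumption, training pairs could be cut as sliding windows over the action sequence. The sketch encodes actions as integer ids and builds (window, next action) pairs. The window length, the id encoding and the function names are all assumptions, not the settings of train_perfspec.py.

```python
from typing import Dict, List, Sequence, Tuple


def build_vocabulary(actions: Sequence[str]) -> Dict[str, int]:
    """Assign each distinct action an integer id in order of first appearance."""
    vocab: Dict[str, int] = {}
    for action in actions:
        vocab.setdefault(action, len(vocab))
    return vocab


def make_windows(actions: Sequence[str], window: int
                 ) -> Tuple[List[List[int]], List[int], Dict[str, int]]:
    """Build (previous `window` ids, next id) training pairs from a sequence."""
    if window < 1:
        raise ValueError("window must be at least 1")
    vocab = build_vocabulary(actions)
    ids = [vocab[action] for action in actions]
    inputs: List[List[int]] = []
    targets: List[int] = []
    for start in range(len(ids) - window):
        inputs.append(ids[start:start + window])
        targets.append(ids[start + window])
    return inputs, targets, vocab
```

To check predictions, the returned vocabulary can be inverted to map a predicted id back to an action name.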

__pycache__ is created by Python at execution time and is ignored by git.

HTML Notebooks

Notebooks downloaded as HTML with code (no data is included in this mode, only output):

prepare_perfspec.html Prepare data from raw to source for learning models

model_perfspec.html To inspect / review generated models

Reference

[1]: H. Kermabon-Bobinnec et al., "PerfSPEC: Performance Profiling-based Proactive Security Policy Enforcement for Containers," in IEEE Transactions on Dependable and Secure Computing, doi: 10.1109/TDSC.2024.3420712.