memory fixes

Roger Oriol
2026-02-02 20:47:09 +01:00
parent b6284bec1f
commit aa4793dd51
11 changed files with 459 additions and 1 deletion

@@ -0,0 +1,195 @@
# Raspberry Pi Node Scheduling Fix - Implementation Guide
## Problem Summary
Your Raspberry Pi node (4GB RAM) keeps crashing because memory-hungry applications are being scheduled on it instead of on nodes with more capacity.
## Root Causes Identified
1. **High-memory applications without node targeting:**
- n8n PostgreSQL: 2-4Gi memory requirements
- Minecraft server: 1-4Gi memory requirements
- OpenWebUI: 1-2Gi memory requirements
- Phoenix services: 512Mi-2Gi memory requirements
- Jellyfin: 512Mi-2Gi memory requirements
2. **Missing node selectors:** Only the Gitea services use a node selector (targeting ARM64); every other workload can land on any node, including the Pi
3. **No taints/tolerations:** Raspberry Pi node isn't protected from heavy workloads
4. **Resource limits missing:** Some applications set no memory limits, so a single pod can keep consuming memory until the node runs out
## Solution Applied
### Modified Files with Node Selectors (Prevent RPi Scheduling)
**Updated these manifests to add a `nodeSelector` of `hardware: high-memory` to the pod template:**
1. `/n8n/postgres-deployment.yaml` - PostgreSQL (2-4Gi memory)
2. `/minecraft-server/ss.yaml` - Minecraft server (1-4Gi memory)
3. `/openwebui/openwebui.yaml` - OpenWebUI (1-2Gi memory)
4. `/phoenix/phoenix-statefulset.yaml` - Phoenix app (512Mi-2Gi memory)
5. `/phoenix/postgres-statefulset.yaml` - Phoenix PostgreSQL (256Mi-1Gi memory)
6. `/jellyfin/jellyfin.yaml` - Jellyfin media server (512Mi-2Gi memory)
7. `/monitoring/prometheus-deployment.yaml` - Prometheus (512Mi-1Gi memory)
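Before applying anything, a text search can confirm that each of the seven manifests above actually carries the selector. This is a minimal sketch: `check_manifests` is a hypothetical helper, and because it uses plain `grep` rather than a YAML parser, it only checks that a `hardware: high-memory` line appears somewhere in each file. Treat a pass as a hint, not proof.

```bash
# Hypothetical helper: report manifests that lack the high-memory node selector.
# Plain-text grep, not a YAML parser: it only checks that a
# "hardware: high-memory" line appears somewhere in each file.
check_manifests() {
  local f missing=0
  for f in "$@"; do
    if ! grep -Eq '^[[:space:]]*hardware:[[:space:]]*high-memory' "$f"; then
      echo "MISSING nodeSelector: $f"
      missing=1
    fi
  done
  # Non-zero exit status means at least one file still needs the selector
  return "$missing"
}
```

Run it from the repository root, passing the seven paths listed above. The function prints one `MISSING` line for each file that lacks the selector.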
### Implementation Steps
#### Step 1: Label and Taint Your Nodes
```bash
# 1. Identify your nodes
kubectl get nodes -o wide
# 2. Label your powerful nodes
kubectl label nodes <powerful-node-1> hardware=high-memory
kubectl label nodes <powerful-node-2> hardware=high-memory
# 3. Label your Raspberry Pi node
kubectl label nodes <raspberry-pi-node> hardware=low-memory
kubectl label nodes <raspberry-pi-node> node-type=raspberry-pi
# 4. Taint the Raspberry Pi to prevent most workloads
kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule
```
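If you run Step 1 a second time, it fails: `kubectl label` and `kubectl taint` refuse to change an existing key unless you pass `--overwrite`. Below is a minimal idempotent sketch that assumes the label and taint names above. `setup_nodes` and the `KUBECTL` variable are hypothetical helpers, not part of the repository. Setting `KUBECTL=echo` turns the function into a dry run that only prints the commands.

```bash
# KUBECTL is a hypothetical hook: leave it unset to run kubectl, or set
# KUBECTL=echo to print the commands without touching the cluster.
KUBECTL="${KUBECTL:-kubectl}"

# setup_nodes PI_NODE HIGH_MEMORY_NODE...
# --overwrite lets re-runs succeed when the labels/taint already exist.
setup_nodes() {
  local pi_node="$1" node
  shift
  for node in "$@"; do
    $KUBECTL label nodes "$node" hardware=high-memory --overwrite
  done
  $KUBECTL label nodes "$pi_node" hardware=low-memory node-type=raspberry-pi --overwrite
  $KUBECTL taint nodes "$pi_node" node-type=raspberry-pi:NoSchedule --overwrite
}
```

For example, `KUBECTL=echo setup_nodes <raspberry-pi-node> <powerful-node-1> <powerful-node-2>` prints the four commands. Run it again without `KUBECTL=echo` to apply them.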
#### Step 2: Apply Updated Manifests
```bash
# Apply all updated manifests
kubectl apply -f n8n/postgres-deployment.yaml
kubectl apply -f minecraft-server/ss.yaml
kubectl apply -f openwebui/openwebui.yaml
kubectl apply -f phoenix/phoenix-statefulset.yaml
kubectl apply -f phoenix/postgres-statefulset.yaml
kubectl apply -f jellyfin/jellyfin.yaml
kubectl apply -f monitoring/prometheus-deployment.yaml
```
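If one manifest contains a mistake, the separate `kubectl apply` calls above leave the cluster partly updated. One option is to validate every file with `kubectl apply --dry-run=server` first, and apply only when all of them pass. This sketch assumes the API server is reachable for the dry runs. `apply_manifests` and `KUBECTL` are hypothetical names.

```bash
# KUBECTL is a hypothetical hook so the function can be dry-run or tested.
KUBECTL="${KUBECTL:-kubectl}"

# Validate every manifest against the API server first; apply only if all pass.
apply_manifests() {
  local f
  for f in "$@"; do
    if ! $KUBECTL apply --dry-run=server -f "$f" >/dev/null; then
      echo "dry run failed for $f; nothing applied" >&2
      return 1
    fi
  done
  for f in "$@"; do
    $KUBECTL apply -f "$f" || return 1
  done
}
```

Pass the same seven paths as in the block above. If any dry run fails, the function stops before applying anything.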
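#### Step 3: Force Reschedule Existing Pods
A `NoSchedule` taint only affects new scheduling decisions; it does not evict pods already running on the Raspberry Pi. Applying a changed pod template in Step 2 normally triggers a rollout by itself, but deleting the pods guarantees that every replica is recreated under the new constraints.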
```bash
# Delete existing pods to force rescheduling on correct nodes
kubectl delete pods -n n8n -l service=postgres-n8n
kubectl delete pods -n minecraft -l app=minecraft-server
kubectl delete pods -l app=open-webui
kubectl delete pods -n phoenix -l app=phoenix
kubectl delete pods -n phoenix -l app=postgres
kubectl delete pods -n jellyfin -l app=jellyfin
kubectl delete pods -n monitoring -l app=prometheus
```
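The label selectors above delete every replica, wherever it runs. A more targeted sketch deletes only the matching pods that are currently on the Raspberry Pi, which it finds with a `spec.nodeName` field selector. `HEAVY_PATTERN` is an assumption built from the app names in this guide, so adjust it to your actual pod names. The function names and `KUBECTL` are hypothetical.

```bash
KUBECTL="${KUBECTL:-kubectl}"
# Assumed pod-name pattern for the memory-heavy apps in this guide
HEAVY_PATTERN='postgres|minecraft|phoenix|jellyfin|prometheus|open-webui'

# Print "namespace name" for pods on node $1 whose name matches regex $2.
pods_on_node() {
  local node="$1" pattern="$2"
  $KUBECTL get pods --all-namespaces --no-headers \
    --field-selector "spec.nodeName=${node}" \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name |
    awk -v re="$pattern" '$2 ~ re { print $1, $2 }'
}

# Delete only heavy pods running on the given node so they reschedule elsewhere.
reschedule_heavy_pods() {
  local node="$1" ns name
  pods_on_node "$node" "$HEAVY_PATTERN" | while read -r ns name; do
    $KUBECTL delete pod -n "$ns" "$name" </dev/null
  done
}
```

Call it as `reschedule_heavy_pods <raspberry-pi-node>`. For Deployments and StatefulSets, `kubectl rollout restart deployment/<name> -n <namespace>` (or `statefulset/<name>`) is a gentler alternative that replaces pods according to the workload's update strategy.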
#### Step 4: Verify Pod Scheduling
```bash
# Check where pods are scheduled
kubectl get pods -o wide --all-namespaces | grep -E "(n8n|minecraft|open-webui|phoenix|jellyfin|prometheus)"
# Verify node resource usage
kubectl top nodes
# Check events for scheduling issues
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -20
```
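`kubectl get pods` only shows a snapshot. To block until the rescheduled pods are actually Ready, you can run `kubectl wait` against the same namespaces and selectors used in Step 3. This sketch assumes the OpenWebUI pods live in the `default` namespace, because Step 3 deletes them without `-n`. `TARGETS`, `wait_for_targets`, and `KUBECTL` are hypothetical names.

```bash
KUBECTL="${KUBECTL:-kubectl}"

# namespace|label-selector pairs from Step 3 ("default" for OpenWebUI is an assumption)
TARGETS=(
  "n8n|service=postgres-n8n"
  "minecraft|app=minecraft-server"
  "default|app=open-webui"
  "phoenix|app=phoenix"
  "phoenix|app=postgres"
  "jellyfin|app=jellyfin"
  "monitoring|app=prometheus"
)

# Wait for each target's pods to become Ready; report the ones that don't.
wait_for_targets() {
  local timeout="${1:-180s}" entry ns sel failed=0
  for entry in "${TARGETS[@]}"; do
    ns="${entry%%|*}"
    sel="${entry#*|}"
    if ! $KUBECTL wait --for=condition=Ready pod -l "$sel" -n "$ns" --timeout="$timeout"; then
      echo "not ready: $ns/$sel" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Any target reported as not ready is a good candidate for the Troubleshooting steps below.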
### Optional: Add Tolerations for Lightweight Services
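For services that CAN run on the Raspberry Pi, add a toleration for the taint. The toleration only *permits* scheduling on the Pi; the preferred node affinity in the example is what makes the scheduler favor it: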
```yaml
# Example for Pi-hole (good candidate for RPi)
spec:
  template:
    spec:
      tolerations:
      - key: "node-type"
        operator: "Equal"
        value: "raspberry-pi"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node-type
                operator: In
                values: ["raspberry-pi"]
```
**Good candidates for Raspberry Pi:**
- Pi-hole (DNS filtering)
- Home Assistant (IoT hub)
- Fava (lightweight accounting)
- Vaultwarden (password manager)
- Glance (dashboard)
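If you don't want to edit a service's manifest yet, you can add the same toleration to a live Deployment with a JSON patch. This is a sketch: `allow_on_pi` and `KUBECTL` are hypothetical, and the namespace is a parameter because the source does not give one. A JSON-patch `add` on an existing member replaces it, so this single entry overwrites any tolerations already on the pod template.

```bash
KUBECTL="${KUBECTL:-kubectl}"

# JSON patch that sets the Pi toleration on a Deployment's pod template.
# "add" on an existing member replaces it: prior tolerations are overwritten.
PI_TOLERATION_PATCH='[{"op":"add","path":"/spec/template/spec/tolerations","value":[{"key":"node-type","operator":"Equal","value":"raspberry-pi","effect":"NoSchedule"}]}]'

# allow_on_pi DEPLOYMENT NAMESPACE
allow_on_pi() {
  local deployment="$1" namespace="$2"
  $KUBECTL patch deployment "$deployment" -n "$namespace" --type=json -p "$PI_TOLERATION_PATCH"
}
```

After patching, the live object no longer matches the manifest in the repository, so add the toleration there as well.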
### Monitoring and Validation
#### Check Resource Usage
```bash
# Monitor node resource consumption
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=memory
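# Check pod distribution across nodes (custom-columns avoids the column shift
# that happens when RESTARTS shows a value like "2 (3h ago)")
kubectl get pods --all-namespaces --no-headers -o custom-columns=NODE:.spec.nodeName | sort | uniq -c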
```
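To turn `kubectl top nodes` into a yes/no signal, the sketch below flags nodes whose memory usage is at or above a threshold. It assumes metrics-server is running and that `kubectl top nodes --no-headers` prints the usual five columns: name, CPU cores, CPU %, memory bytes, memory %. `flag_memory_pressure` and the 80% default are illustrative choices. Nodes reporting `<unknown>` are skipped.

```bash
KUBECTL="${KUBECTL:-kubectl}"

# Print "node MEMORY%" for nodes at or above the threshold (default 80).
flag_memory_pressure() {
  local threshold="${1:-80}"
  $KUBECTL top nodes --no-headers | awk -v t="$threshold" '
    { pct = $5; sub(/%$/, "", pct) }
    pct ~ /^[0-9]+$/ && pct + 0 >= t + 0 { print $1, $5 }'
}
```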
#### Verify Scheduling Constraints
```bash
# Check node labels and taints
kubectl get nodes --show-labels
kubectl describe nodes | grep -E "(Name:|Taints:|Labels:)"
# Verify no high-memory pods on RPi
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<raspberry-pi-node-name>
```
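The `--show-labels` output gets long. A narrower check focuses on the one thing that protects the Pi: whether its taint is present. This sketch prints each taint as `key=value:effect` through JSONPath and looks for an exact match. `has_pi_taint` and `KUBECTL` are hypothetical names.

```bash
KUBECTL="${KUBECTL:-kubectl}"

# Succeeds if node $1 carries the node-type=raspberry-pi:NoSchedule taint.
has_pi_taint() {
  $KUBECTL get node "$1" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}' |
    grep -qx 'node-type=raspberry-pi:NoSchedule'
}
```

For example, `has_pi_taint <raspberry-pi-node-name> && echo protected || echo "taint missing"`.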
## Troubleshooting
### If Pods Stay Pending
```bash
# Check why pods can't be scheduled
kubectl describe pod <pending-pod-name> -n <namespace>
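# Common issues:
# - Node doesn't have required labels
# - Resource requests too high for available nodes
# - Every node that fits is tainted and the pod has no matching toleration
# - PersistentVolume pinned to another node (K3s local-path volumes have node
#   affinity), reported as a volume node affinity conflict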
```
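When several pods are stuck, describing them one by one is slow. This sketch lists every Pending pod together with its most recent `FailedScheduling` event message. It uses field selectors on pods (`status.phase`) and on events (`involvedObject.name`, `reason`). `explain_pending` and `KUBECTL` are hypothetical names.

```bash
KUBECTL="${KUBECTL:-kubectl}"

# For every Pending pod, print "namespace/name: <latest FailedScheduling message>".
explain_pending() {
  local ns name msg
  $KUBECTL get pods --all-namespaces --no-headers \
    --field-selector=status.phase=Pending \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name |
  while read -r ns name; do
    msg=$($KUBECTL get events -n "$ns" --no-headers --sort-by=.lastTimestamp \
      --field-selector "involvedObject.name=${name},reason=FailedScheduling" \
      -o custom-columns=MSG:.message </dev/null | tail -n 1)
    echo "${ns}/${name}: ${msg:-no FailedScheduling event found}"
  done
}
```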
### If You Need to Rollback
```bash
# Remove node selectors from manifests and reapply
# Remove taints from Raspberry Pi
kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule-
# Remove labels if needed
kubectl label nodes <node-name> hardware-
kubectl label nodes <node-name> node-type-
```
## Expected Results
After implementation:
1. **High-resource applications** will only schedule on powerful nodes
2. **Raspberry Pi node** will be protected from resource-heavy workloads
3. **Cluster stability** will improve with proper resource distribution
4. **Pi node crashes** should stop occurring
5. **Lightweight services** can still run on Pi (with tolerations)
## Architecture Summary
```
┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐
│ Powerful          │  │ Powerful          │  │ Raspberry Pi      │
│ Node 1            │  │ Node 2            │  │ Node (4GB)        │
│                   │  │                   │  │                   │
│ • n8n Postgres    │  │ • Minecraft       │  │ • Pi-hole         │
│ • Phoenix         │  │ • OpenWebUI       │  │ • Glance          │
│ • Jellyfin        │  │ • Prometheus      │  │ • Fava            │
│ • Grafana         │  │ • Other apps      │  │ • Vaultwarden     │
│                   │  │                   │  │ • Home Assistant  │
└───────────────────┘  └───────────────────┘  └───────────────────┘
hardware=high-memory   hardware=high-memory   hardware=low-memory
                                              TAINTED (protected)
```
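The Raspberry Pi is now protected while remaining available for lightweight services that benefit from running on it, such as Pi-hole and Home Assistant. The scheduler still decides which of the two powerful nodes each heavy workload lands on; the diagram shows one possible distribution.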

@@ -38,6 +38,9 @@ spec:
      labels:
        app: jellyfin
    spec:
      # Prevent scheduling on Raspberry Pi due to high resource requirements (512Mi-2Gi memory, 500m-2000m CPU)
      nodeSelector:
        hardware: high-memory
      containers:
      - name: jellyfin
        image: jellyfin/jellyfin:latest

@@ -13,6 +13,9 @@ spec:
      labels:
        app: minecraft-server
    spec:
      # Prevent scheduling on Raspberry Pi due to high resource requirements (1Gi-4Gi memory, 1-2 CPU)
      nodeSelector:
        hardware: high-memory
      containers:
      - name: minecraft-server
        image: itzg/minecraft-server:latest # Or specific version if needed

@@ -15,6 +15,9 @@ spec:
      labels:
        app: prometheus
    spec:
      # Prevent scheduling on Raspberry Pi due to resource requirements (512Mi-1Gi memory, 500m-1000m CPU)
      nodeSelector:
        hardware: high-memory
      serviceAccountName: prometheus
      containers:
      - name: prometheus

@@ -20,6 +20,9 @@ spec:
      labels:
        service: postgres-n8n
    spec:
      # Prevent scheduling on Raspberry Pi due to high memory requirements (2-4Gi)
      nodeSelector:
        hardware: high-memory
      containers:
      - image: postgres:18
        name: postgres

@@ -0,0 +1,45 @@
# Node Management Commands for Raspberry Pi Scheduling Issues
## 1. Taint the Raspberry Pi Node (Recommended Approach)
```bash
# Find your Raspberry Pi node name
kubectl get nodes -o wide
# Taint the Raspberry Pi node to prevent scheduling (except for tolerating pods)
kubectl taint nodes <raspberry-pi-node-name> node-type=raspberry-pi:NoSchedule
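# Alternative: use a more descriptive taint. If you choose this one, change the
# tolerations to key "hardware" / value "low-memory" so they still match it.
kubectl taint nodes <raspberry-pi-node-name> hardware=low-memory:NoSchedule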
```
## 2. Label Nodes for Better Targeting
```bash
# Label your Raspberry Pi node
kubectl label nodes <raspberry-pi-node-name> node-type=raspberry-pi
kubectl label nodes <raspberry-pi-node-name> hardware=low-memory
# Label your more powerful nodes
kubectl label nodes <powerful-node-1> node-type=worker
kubectl label nodes <powerful-node-1> hardware=high-memory
kubectl label nodes <powerful-node-2> node-type=worker
kubectl label nodes <powerful-node-2> hardware=high-memory
```
## 3. Verify Node Configuration
```bash
# Check node labels and taints
kubectl describe nodes
# See which nodes have what resources available
kubectl describe nodes | grep -A 5 "Allocatable"
```
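Here is a more compact alternative to `kubectl describe nodes`. The `-L` flag adds the given labels as columns, and a custom column shows each node's taints. `show_node_roles` and `KUBECTL` are hypothetical wrappers; you can also run the two `kubectl` commands inside directly.

```bash
KUBECTL="${KUBECTL:-kubectl}"

# One row per node with the scheduling labels as columns, then each node's taints.
show_node_roles() {
  $KUBECTL get nodes -L hardware,node-type
  $KUBECTL get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
}
```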
## 4. Remove Taint if Needed
```bash
# Remove the taint if you need to rollback
kubectl taint nodes <raspberry-pi-node-name> node-type=raspberry-pi:NoSchedule-
```

@@ -25,6 +25,9 @@ spec:
      labels:
        app: open-webui
    spec:
      # Prevent scheduling on Raspberry Pi due to high resource requirements (1Gi-2Gi memory, 1-2 CPU)
      nodeSelector:
        hardware: high-memory
      volumes:
      - name: webui-data
        persistentVolumeClaim:

@@ -42,6 +42,9 @@ spec:
      labels:
        app: phoenix
    spec:
      # Prevent scheduling on Raspberry Pi due to high resource requirements (512Mi-2Gi memory, 500m-2000m CPU)
      nodeSelector:
        hardware: high-memory
      initContainers:
      - name: wait-for-postgres
        image: busybox:1.36

@@ -33,6 +33,9 @@ spec:
      labels:
        app: postgres
    spec:
      # Prevent scheduling on Raspberry Pi due to resource requirements (256Mi-1Gi memory, 250m-1000m CPU)
      nodeSelector:
        hardware: high-memory
      containers:
      - name: postgres
        image: postgres:16

@@ -0,0 +1,79 @@
# Examples of tolerations for services that SHOULD run on Raspberry Pi
# These services have low resource requirements and can benefit from Pi-specific features
# These are partial fragments, not complete manifests (no selector or containers):
# merge the tolerations/affinity blocks into each service's existing Deployment.
# 1. Pi-hole - Perfect for Raspberry Pi (DNS filtering, network service)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pihole
spec:
  template:
    spec:
      # Allow scheduling on Raspberry Pi
      tolerations:
      - key: "node-type"
        operator: "Equal"
        value: "raspberry-pi"
        effect: "NoSchedule"
      # Prefer Raspberry Pi for network services
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node-type
                operator: In
                values: ["raspberry-pi"]
# 2. Home Assistant - May benefit from running on Pi for local device access
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: home-assistant
  namespace: home-assistant
spec:
  template:
    spec:
      # Allow scheduling on Raspberry Pi (good for IoT hub role)
      tolerations:
      - key: "node-type"
        operator: "Equal"
        value: "raspberry-pi"
        effect: "NoSchedule"
      # Prefer Raspberry Pi for home automation
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 80
            preference:
              matchExpressions:
              - key: node-type
                operator: In
                values: ["raspberry-pi"]
# 3. Lightweight services (Fava, Vaultwarden, Glance)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lightweight-service-example
spec:
  template:
    spec:
      # Allow scheduling on Raspberry Pi for lightweight workloads
      tolerations:
      - key: "node-type"
        operator: "Equal"
        value: "raspberry-pi"
        effect: "NoSchedule"
      # No preference - let scheduler decide based on resource availability
      containers:
      - name: <container-name>
        # Resources are set per container, not on the pod spec
        resources:
          requests:
            memory: "64Mi"
            cpu: "50m"
          limits:
            memory: "512Mi"
            cpu: "500m"

validate-scheduling.sh Executable file

@@ -0,0 +1,118 @@
#!/bin/bash
# Raspberry Pi K3s Scheduling Validation Script
# Run this to check your cluster configuration and pod distribution
echo "=== Kubernetes Node Analysis ==="
echo
echo "1. Node Overview:"
kubectl get nodes -o wide
echo
echo "2. Node Resource Capacity:"
kubectl describe nodes | grep -A 5 "Allocatable:"
echo
echo "3. Node Labels and Taints:"
kubectl get nodes --show-labels
echo
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
echo
echo "=== Pod Distribution Analysis ==="
echo
echo "4. High-Resource Pods Location:"
echo "Checking where memory-intensive applications are scheduled..."
echo
echo "n8n PostgreSQL pods:"
kubectl get pods -n n8n -o wide | grep postgres || echo "No n8n postgres pods found"
echo
echo "Minecraft server pods:"
kubectl get pods -n minecraft -o wide || echo "No minecraft pods found"
echo
echo "OpenWebUI pods:"
kubectl get pods -o wide | grep open-webui || echo "No OpenWebUI pods found"
echo
echo "Phoenix pods:"
kubectl get pods -n phoenix -o wide || echo "No Phoenix pods found"
echo
echo "Jellyfin pods:"
kubectl get pods -n jellyfin -o wide || echo "No Jellyfin pods found"
echo
echo "Prometheus pods:"
kubectl get pods -n monitoring -o wide | grep prometheus || echo "No Prometheus pods found"
echo
echo "=== Resource Usage ==="
echo
echo "5. Current Node Resource Usage:"
kubectl top nodes 2>/dev/null || echo "Metrics server not available - install with: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml"
echo
echo "6. Top Memory-Consuming Pods:"
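# Capture first: with "| head ... || echo", the fallback would never run
if TOP_PODS=$(kubectl top pods --all-namespaces --sort-by=memory 2>/dev/null); then
  echo "$TOP_PODS" | head -10
else
  echo "Metrics server not available"
fi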
echo
echo "=== Pod Events (Recent Issues) ==="
echo
echo "7. Recent Pod Scheduling Events:"
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | grep -E "(Failed|Error|Warning)" | tail -10
echo
echo "=== Validation Summary ==="
echo
# Count pods per node
echo "8. Pod Distribution Per Node:"
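printf "%-24s| %s\n" "Node" "Pod Count"
echo "------------------------|---------"
# custom-columns avoids the column shift caused by RESTARTS values like "2 (3h ago)"
kubectl get pods --all-namespaces --no-headers -o custom-columns=NODE:.spec.nodeName | sort | uniq -c | awk '{printf "%-24s| %s\n", $2, $1}'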
echo
echo "=== Recommendations ==="
echo
# Check if any high-resource pods are on wrong nodes
echo "9. Checking for Potential Issues:"
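# Get Raspberry Pi node name: prefer the node-type=raspberry-pi label from the guide,
# then fall back to the first node labeled kubernetes.io/arch=arm64
RPI_NODE=$(kubectl get nodes -l node-type=raspberry-pi -o name | head -1)
if [ -z "$RPI_NODE" ]; then
  RPI_NODE=$(kubectl get nodes -l kubernetes.io/arch=arm64 -o name | head -1)
fi
RPI_NODE=${RPI_NODE#node/}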
if [ -n "$RPI_NODE" ]; then
echo "Detected Raspberry Pi node: $RPI_NODE"
# Check if high-resource pods are on RPi
HIGH_MEM_PODS=$(kubectl get pods --all-namespaces -o wide --no-headers --field-selector spec.nodeName="$RPI_NODE" | grep -E "(postgres|minecraft|phoenix|jellyfin|prometheus|open-webui)")
if [ -n "$HIGH_MEM_PODS" ]; then
echo "⚠️ WARNING: High-resource pods found on Raspberry Pi node:"
echo "$HIGH_MEM_PODS"
echo
echo "These pods should be moved to more powerful nodes."
else
echo "✅ Good: No high-resource pods detected on Raspberry Pi node."
fi
else
echo "Could not auto-detect Raspberry Pi node. Please check manually."
fi
echo
echo "=== Next Steps ==="
echo
echo "If you see high-resource pods on your Raspberry Pi node:"
echo "1. Apply the node labels: kubectl label nodes <powerful-node> hardware=high-memory"
echo "2. Apply the taint: kubectl taint nodes <rpi-node> node-type=raspberry-pi:NoSchedule"
echo "3. Apply updated manifests with nodeSelectors"
echo "4. Delete problematic pods to force rescheduling"
echo
echo "See RASPBERRY_PI_SCHEDULING_FIX.md for detailed instructions."