DevNation Day 2020 - Knative Backstage (how autoscaler actually works)


inspirit941 2022. 6. 10. 22:50

Knative BackStage - how autoscaler Actually works

Speaker: Paul Morie, Serverless Engineering team at Red Hat


"Serverless" usually brings to mind Function as a Service or AWS Lambda.

Knative Serving: scales applications on demand.
Knative Eventing: works with events emitted by different sources.


Knative Serving

The ingress is pluggable. (Kourier is a project created by Red Hat's 3scale team.)
Key components
- Service: the high-level container that manages the other resources. Changing the Service spec changes the settings of the two components below it, Configuration and Route.
  - Configuration: creates a new Revision whenever the Service spec changes.
  - Revision: an immutable snapshot.
  - Route: directs traffic. Responsible for traffic splitting and rollouts.
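The relationship between these resources can be sketched as a small in-memory model. This is only an illustration, not Knative's API: the class names mirror the resource names, but the methods (`apply`, `split`, `update`), the revision naming scheme, and the image strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """Immutable snapshot: frozen=True rejects attribute changes after creation."""
    name: str
    image: str


@dataclass
class Configuration:
    """Creates a new Revision every time a new spec is applied."""
    service_name: str
    revisions: list = field(default_factory=list)

    def apply(self, image: str) -> Revision:
        # Hypothetical naming scheme: <service>-00001, <service>-00002, ...
        name = f"{self.service_name}-{len(self.revisions) + 1:05d}"
        rev = Revision(name, image)
        self.revisions.append(rev)
        return rev


@dataclass
class Route:
    """Maps revision names to traffic percentages."""
    traffic: dict = field(default_factory=dict)

    def split(self, shares: dict) -> None:
        if sum(shares.values()) != 100:
            raise ValueError("traffic percentages must add up to 100")
        self.traffic = dict(shares)


class Service:
    """High-level object: a spec change flows into Configuration and Route."""

    def __init__(self, name: str, image: str):
        self.configuration = Configuration(name)
        self.route = Route()
        self.update(image)

    def update(self, image: str) -> Revision:
        rev = self.configuration.apply(image)   # new immutable Revision
        self.route.split({rev.name: 100})       # send all traffic to the latest
        return rev
```

For example, `Service("hello", "example/hello:v1")` followed by `update("example/hello:v2")` leaves two revisions in the configuration, and `route.split({"hello-00001": 90, "hello-00002": 10})` models a gradual rollout.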


Knative Serving supports scale to zero.

This works because of Knative's autoscaling system, which consists of four components.

Autoscaler
- Collects and receives metrics over WebSocket.
- Makes the scaling decisions.
- Sends requests to the Kubernetes API to change the replica count.
SKS (ServerlessService)
- An abstraction over Kubernetes Services that controls the data flow into the revisions.
  - Decides whether traffic goes straight to the revision (Serve mode) or to the activator (Proxy mode).
Activator
- The component that traffic passes through when it arrives for a revision scaled to zero.
- Performs capacity-aware load balancing to handle bursts.
Queue Proxy
- A sidecar attached to the user pod. Collects metrics and passes them to the autoscaler.
- Acts as a queue when requests pile up.
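To make the Queue Proxy's two roles concrete (queueing and metric collection), here is a minimal sketch. It is not the real queue-proxy: the class name, the `container_concurrency` parameter, and the metric field names are assumptions chosen for illustration.

```python
from collections import deque


class QueueProxySketch:
    """Toy sidecar: forwards up to `container_concurrency` requests, queues the rest."""

    def __init__(self, container_concurrency: int):
        self.limit = container_concurrency  # hypothetical per-pod concurrency limit
        self.in_flight = 0                  # requests currently in the user container
        self.queued = deque()               # requests waiting for a free slot

    def receive(self, request: str) -> bool:
        """Return True if the request was forwarded, False if it was queued."""
        if self.in_flight < self.limit:
            self.in_flight += 1
            return True
        self.queued.append(request)
        return False

    def finish(self) -> None:
        """One request completed; the oldest queued request takes over its slot."""
        if self.queued:
            self.queued.popleft()           # in_flight stays the same
        elif self.in_flight > 0:
            self.in_flight -= 1

    def metrics(self) -> dict:
        """Snapshot that a sidecar like this could report to the autoscaler."""
        return {"in_flight": self.in_flight, "queued": len(self.queued)}
```

With a limit of 2, a third concurrent request is queued, and the `queued` value in `metrics()` is the kind of signal the autoscaler can react to.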


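As of 2020, Knative does not use the Kubernetes Horizontal Pod Autoscaler (HPA) as its default autoscaler.

HPA does not yet support scale to zero at the Generally Available (GA) level.
- It can be forced by tuning custom metrics, but I see that as a setup that only works in special situations.
HPA decides scale based on CPU / memory usage, so it also needs an additional component, the metrics server.
The community considers Knative's own autoscaler (KPA) easier to follow and maintain than the HPA approach.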


Traffic enters through the ingress and is forwarded to the ServerlessService (SKS).

When the revision is scaled to zero, the ServerlessService is in Proxy mode. That is, the activator receives the traffic first and has to wait until the revision scales up.

The SKS forwards the traffic to the activator, which holds the incoming requests in a buffer.

The activator sends a "scale up" request to the autoscaler.

The autoscaler asks the Kubernetes API server to scale up the Deployment that belongs to the revision whose pods need to come up.

Once the Deployment's pods are up and healthy, the activator forwards the buffered traffic to the queue-proxy of the user pod.

After this sequence, the SKS switches from Proxy mode to Serve mode.
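The scale-from-zero sequence above can be sketched as a tiny state machine. Everything here is a simplification for illustration: `RevisionSim`, `handle`, `scale_up`, and `pods_ready` are hypothetical names, and one object stands in for the SKS, activator, autoscaler, and Deployment together.

```python
from enum import Enum


class Mode(Enum):
    PROXY = "proxy"  # SKS sends traffic to the activator
    SERVE = "serve"  # SKS sends traffic straight to the revision's pods


class RevisionSim:
    """Toy model of one revision going from zero replicas to serving."""

    def __init__(self):
        self.replicas = 0
        self.mode = Mode.PROXY
        self.buffer = []   # requests held by the activator
        self.served = []   # requests delivered to queue-proxy / user pod

    def handle(self, request: str) -> None:
        if self.mode is Mode.SERVE:
            self.served.append(request)   # direct path, activator not involved
            return
        self.buffer.append(request)       # activator buffers the request
        self.scale_up()                   # activator -> autoscaler -> k8s API

    def scale_up(self) -> None:
        # Stand-in for the autoscaler changing the Deployment's replica count.
        if self.replicas == 0:
            self.replicas = 1

    def pods_ready(self) -> None:
        # Called once the pods are healthy: flush the buffer, then switch mode.
        if self.replicas == 0:
            raise RuntimeError("no pods have been requested yet")
        self.served.extend(self.buffer)
        self.buffer.clear()
        self.mode = Mode.SERVE
```

Requests sent before `pods_ready()` end up in `buffer`. Afterwards they are flushed in arrival order and later requests skip the buffer entirely.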

When a new request arrives while the SKS is in Serve mode, the flow is as follows.

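When traffic enters through the ingress and reaches the SKS, the SKS forwards it to the user pod instead of the activator.

As the request volume grows, the increase shows up in the metrics collected by queue-proxy.
- The autoscaler collects the number of incoming requests, how long they take, how many requests are queued, and so on.

When the number of queued / buffered requests rises, the Decider, one of the autoscaler's internal components, makes the scale-up decision.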
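The talk notes do not spell out how the Decider computes the replica count. A common way to express this kind of decision is to divide the observed concurrency by a per-pod target and round up. The sketch below assumes that rule. The function name, its parameters, and the default bounds are hypothetical and are not Knative's actual implementation.

```python
import math


def desired_replicas(observed_concurrency: float,
                     target_per_pod: float,
                     min_scale: int = 0,
                     max_scale: int = 10) -> int:
    """Assumed rule: enough pods so that each handles at most `target_per_pod`."""
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    want = math.ceil(observed_concurrency / target_per_pod)
    # Clamp to the configured bounds (min_scale=0 allows scale to zero).
    return max(min_scale, min(max_scale, want))
```

Under these assumptions, 25 concurrent requests with a target of 10 per pod gives 3 replicas, and zero observed concurrency gives 0 replicas unless `min_scale` is raised.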
