Serverless Practitioners Summit 2019 - How Knative Uses Concurrency and RPS

inspirit941 2023. 3. 26. 12:55

How Knative Uses Concurrency and Rps (Requests per Second) For Autoscaling

This talk is from 2019, so some details differ from Knative as of this writing (2023).
Talk page: https://spsna19.sched.com/event/Wb35/how-knative-uses-concurrency-and-rps-requests-per-second-for-autoscaling-tara-gu-ibm

Knative product overview

Knative: a product that runs on top of k8s together with a service mesh and supports deploying and operating serverless applications / functions.

Facilitates and simplifies configuring applications running on k8s.
Provides a transformation from higher-level application definitions to lower-level k8s resources.

Configuration: the current state of the app.
Revision: a snapshot of the app at a certain point in time.
Routes: determine which revision an incoming request is sent to.

Knative Request-based Autoscaler (KPA)

Knative supports two kinds of autoscaling: resource-based and request-based. This talk covers request-based scaling.

If there is no incoming traffic for a certain period, the pod count is scaled down to 0.

Handle bursty traffic

When traffic bursts are predictable

You can set minScale and maxScale.
- minScale: prevents cold starts
- maxScale: caps the maximum number of instances
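
To make the effect of the two bounds concrete, here is a minimal Python sketch of how a desired pod count could be clamped. The function name `clamp_pod_count` is hypothetical, and treating `max_scale == 0` as "no upper limit" is an assumption of this sketch, not something stated in the talk.

```python
def clamp_pod_count(desired: int, min_scale: int = 0, max_scale: int = 0) -> int:
    """Clamp the autoscaler's desired pod count to [min_scale, max_scale].

    Assumption: max_scale == 0 means "no upper limit".
    """
    if desired < min_scale:
        # minScale keeps instances warm, which avoids cold starts.
        return min_scale
    if max_scale > 0 and desired > max_scale:
        # maxScale caps how many instances may run.
        return max_scale
    return desired


print(clamp_pod_count(0, min_scale=1, max_scale=10))   # 1
print(clamp_pod_count(25, min_scale=1, max_scale=10))  # 10
print(clamp_pod_count(4, min_scale=1, max_scale=10))   # 4
```

With minScale set to 1, the service never drops to zero pods, so the first request after an idle period does not pay the cold-start cost.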

When traffic bursts are unpredictable

You can specify the expected level of concurrent requests.
The example uses only an annotation, but this can also be set through a configmap.

How do you set a hard limit on the requests a single container can receive?

Use the containerConcurrency option. It sets the maximum number of concurrent requests a container can receive.
Requests beyond the maximum are queued.
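
The hard limit can be pictured with a small Python sketch that splits incoming requests into those served immediately and those queued. The function `split_requests` is hypothetical, and treating `container_concurrency == 0` as "no hard limit" is an assumption of this sketch.

```python
from typing import Tuple


def split_requests(incoming: int, container_concurrency: int) -> Tuple[int, int]:
    """Return (served_now, queued) for a single container.

    Assumption: container_concurrency == 0 means no hard limit is enforced.
    """
    if container_concurrency == 0:
        return incoming, 0
    served_now = min(incoming, container_concurrency)
    # Anything above the hard limit waits in the queue.
    return served_now, incoming - served_now


print(split_requests(15, 10))  # (10, 5)
print(split_requests(8, 10))   # (8, 0)
```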

Other options the Autoscaler uses to determine the exact target value

The actual value depends on the containerConcurrency value.
- container-concurrency-target-default: applies when containerConcurrency is 0. The value is 100.
- container-concurrency-target-percentage: when containerConcurrency is set, the threshold used to judge that a container is in a stable state.
  - The default is 70. In other words, the autoscaler scales so that each container receives 70% of its containerConcurrency.
  - For example, if containerConcurrency is 10, the autoscaler scales so that each pod receives 7 concurrent requests on average.
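
The rule above can be written out as a short Python sketch. It follows the description in these notes rather than the Knative source code. The function names are hypothetical, and the pod-count formula (total observed concurrency divided by the per-pod target, rounded up) is an assumption added for illustration.

```python
import math


def target_concurrency(container_concurrency: int,
                       target_default: float = 100.0,
                       target_percentage: float = 70.0) -> float:
    """Per-pod concurrency target, as described in these notes."""
    if container_concurrency == 0:
        # No hard limit configured: fall back to container-concurrency-target-default.
        return target_default
    # Hard limit configured: aim for a percentage of it.
    return container_concurrency * target_percentage / 100.0


def desired_pods(observed_concurrency: float, target: float) -> int:
    """Assumed formula: enough pods so each one stays at or below the target."""
    return math.ceil(observed_concurrency / target)


print(target_concurrency(10))   # 7.0
print(target_concurrency(0))    # 100.0
print(desired_pods(35, 7.0))    # 5
```

Keeping the target below the hard limit leaves headroom, so a small spike can be absorbed before requests start to queue.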

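How the Knative Autoscaler samples traffic: windows.

In the stable state, it uses a large window to measure concurrent traffic (or RPS).
- The length of the stable window is configurable.
When the measured concurrency (or RPS) exceeds a threshold, it enters panic mode: it switches to a small window and reacts quickly.

For example, the default KPA configuration is as follows.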

stable-window: 60s (60 seconds)
panic-window-percentage: 10 (10%)
panic-threshold-percentage: 200 (200%)

The panic window is computed as follows.

stable-window * (panic-window-percentage / 100)
- Plugging in the example values: 60s * (10 / 100) = 6s (6 seconds)

The default targetConcurrency is 100. The panic threshold used to detect panic is computed as follows.

target-concurrency * (panic-threshold-percentage / 100)
- In the example, panic-threshold = 100 * (200 / 100) = 200
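
The two formulas above translate directly into Python. The function names below are hypothetical; the arithmetic is exactly what the notes state.

```python
def panic_window_seconds(stable_window_s: float, panic_window_percentage: float) -> float:
    """stable-window * (panic-window-percentage / 100)"""
    return stable_window_s * panic_window_percentage / 100.0


def panic_threshold(target_concurrency: float, panic_threshold_percentage: float) -> float:
    """target-concurrency * (panic-threshold-percentage / 100)"""
    return target_concurrency * panic_threshold_percentage / 100.0


# Default KPA configuration from the notes.
print(panic_window_seconds(60, 10))  # 6.0
print(panic_threshold(100, 200))     # 200.0
```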

In other words, the Knative Autoscaler (KPA)

decides whether to scale based on the average concurrency over the stable window (60s);
enters panic mode and scales more aggressively when the average concurrency over the panic window (6s) exceeds the panic threshold (200).
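
As a rough illustration of this two-window decision, the Python sketch below averages per-tick concurrency samples over both windows and picks a mode. It is a simplification under stated assumptions, not the actual KPA implementation: the function `evaluate` is hypothetical, and one sample per 2-second tick is assumed (the collection interval mentioned elsewhere in these notes).

```python
from typing import List, Tuple


def average(samples: List[float]) -> float:
    return sum(samples) / len(samples) if samples else 0.0


def evaluate(samples: List[float],
             tick_s: float = 2.0,
             stable_window_s: float = 60.0,
             panic_window_s: float = 6.0,
             panic_threshold: float = 200.0) -> Tuple[str, float]:
    """Return (mode, concurrency value used for the scaling decision).

    samples: one concurrency reading per tick, oldest first.
    """
    stable_n = max(1, int(stable_window_s / tick_s))
    panic_n = max(1, int(panic_window_s / tick_s))
    stable_avg = average(samples[-stable_n:])
    panic_avg = average(samples[-panic_n:])
    if panic_avg > panic_threshold:
        # Short-window average is above the threshold: react on the short window.
        return "panic", panic_avg
    return "stable", stable_avg


calm = [50] * 30               # 60s of steady traffic
burst = [50] * 27 + [300] * 3  # last 6s spike to 300
print(evaluate(calm))   # ('stable', 50.0)
print(evaluate(burst))  # ('panic', 300.0)
```

In the burst case the 60-second average is only 75, so a stable-window-only autoscaler would barely react; the 6-second window is what catches the spike.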

excessBurstCapacity: (how much spare capacity we have) - (configured target burst capacity)

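It is used so that the autoscaler can temporarily raise the targetConcurrency value to improve the application's responsiveness.
It determines how far the concurrent request load a container is expected to handle is temporarily raised before a scale-up decision is made (that is, how long the existing instances should hold out before scaling up).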

Example

container-concurrency-target-default: 100 (no concurrency value is defined separately)
excessBurstCapacity: 10

In this case

the autoscaler's threshold for scaling out becomes 110 rather than the default concurrent traffic value of 100 (default concurrency + excessBurstCapacity).
Once traffic exceeds 110, scale-out proceeds according to the scaling rules.

The option itself is disabled by default (it is set to -1).

If the value is too high, the application gets overloaded.
If the value is too low, unnecessary scaling operations occur.
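
The example can be restated as a tiny Python sketch. It encodes only the interpretation given in these notes (threshold = target concurrency + excessBurstCapacity, with a negative value meaning the option is disabled); the function name `scale_out_threshold` is hypothetical, and this is not taken from the Knative source.

```python
def scale_out_threshold(target_concurrency: float, excess_burst_capacity: float) -> float:
    """Concurrency level above which scale-out starts, per these notes.

    Assumption: a negative excess_burst_capacity (e.g. -1) means "disabled".
    """
    if excess_burst_capacity < 0:
        return target_concurrency
    return target_concurrency + excess_burst_capacity


print(scale_out_threshold(100, 10))  # 110
print(scale_out_threshold(100, -1))  # 100
```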

How to scale to zero as fast as we can?

With the default settings, scaling down to zero takes 90 seconds.

stable window size: 60
scale-to-zero-grace-period: 30

If no traffic arrives for 60 seconds, the autoscaler waits for the additional 30-second grace period and then scales down to zero.
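
The 90-second figure is simply the sum of the two settings. A minimal Python sketch, with hypothetical function names:

```python
def seconds_until_scale_to_zero(stable_window_s: float = 60.0,
                                grace_period_s: float = 30.0) -> float:
    """Idle time required before pods are scaled down to zero."""
    return stable_window_s + grace_period_s


def should_scale_to_zero(idle_s: float,
                         stable_window_s: float = 60.0,
                         grace_period_s: float = 30.0) -> bool:
    """True once the idle time covers the stable window plus the grace period."""
    return idle_s >= seconds_until_scale_to_zero(stable_window_s, grace_period_s)


print(seconds_until_scale_to_zero())  # 90.0
print(should_scale_to_zero(75))       # False
print(should_scale_to_zero(90))       # True
```

Shrinking either setting shortens the time to zero, which is why the stable window size becomes the next thing to tune.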

A personal question: is the cold-start time when scaling up from zero also counted within the stable window?

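The stable window size can be adjusted, but it cannot go below 6s.

The autoscaler collects concurrent traffic every 2 seconds. So the smaller the window, the more likely it is that some traffic is missed while incoming traffic is aggregated and sampled.
The panic window must be smaller than the stable window and, given the metric collection interval, must be at least (2 times * minimal tick) = 4s.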
- But this pushes the panic window up to around 80% of the stable window size (4s out of 6s is already about 67%), so it is not really a recommended setting.
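
These constraints can be checked mechanically. The Python sketch below is a hypothetical validator built only from the numbers in these notes (2s tick, 6s minimum stable window, panic window of at least two ticks); it is not Knative's own validation code.

```python
from typing import List

TICK_S = 2.0               # metric collection interval from the notes
MIN_STABLE_WINDOW_S = 6.0  # lower bound on the stable window from the notes


def check_windows(stable_window_s: float, panic_window_percentage: float) -> List[str]:
    """Return a list of problems; an empty list means the settings pass."""
    problems = []
    panic_window_s = stable_window_s * panic_window_percentage / 100.0
    if stable_window_s < MIN_STABLE_WINDOW_S:
        problems.append("stable window is below 6s")
    if panic_window_s < 2 * TICK_S:
        problems.append("panic window is below 2 ticks (4s)")
    if panic_window_s >= stable_window_s:
        problems.append("panic window is not smaller than the stable window")
    return problems


print(check_windows(60, 10))  # []
print(check_windows(6, 10))   # ['panic window is below 2 ticks (4s)']
print(check_windows(6, 80))   # []
```

With the minimum 6s stable window, the default 10% panic window (0.6s) fails the check, so the percentage has to be raised substantially for the settings to pass.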