Scenario: Hybrid (A/B + SLOs) testing and progressive traffic shift of KFServing models
Hybrid (A/B + SLOs) testing combines A/B or A/B/n testing, which compares versions using a reward metric, with SLO validation, which checks each version against objectives. Among the versions that satisfy the objectives, the version that performs best on the reward metric is the winner. In this tutorial, you will:
Perform hybrid (A/B + SLOs) testing.
Specify user-engagement as the reward metric.
Specify latency-based and error-rate-based objectives, with metric data provided by Prometheus.
Combine hybrid (A/B + SLOs) testing with progressive traffic shift. Iter8 will progressively shift traffic towards the winner and promote it at the end as depicted below.
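To make the traffic shift concrete: the experiment in this tutorial points at the `weight` fields of an Istio VirtualService named `routing-rule` in the `default` namespace (`.spec.http[0].route[0].weight` for the baseline and `.spec.http[0].route[1].weight` for the candidate). A minimal sketch of what such a resource might look like follows. The `hosts` entry, the destination hosts, and the starting weights are assumptions for illustration only and are not taken from this tutorial.

```yaml
# Hypothetical sketch of the VirtualService whose weights Iter8 adjusts.
# Only the name, namespace, and the two weight field paths come from the
# experiment; every host and starting weight below is an assumed placeholder.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: routing-rule
  namespace: default
spec:
  hosts:
  - example.com                          # assumption: external host name
  http:
  - route:
    - destination:
        host: baseline.example.internal  # assumption: host serving flowers-v1
      weight: 100                        # .spec.http[0].route[0].weight
    - destination:
        host: candidate.example.internal # assumption: host serving flowers-v2
      weight: 0                          # .spec.http[0].route[1].weight
```

During a progressive experiment, the two weights would be expected to move away from their starting values as evidence accumulates in favor of one version.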
apiVersion: v1
kind: Namespace
metadata:
  name: iter8-kfserving
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: user-engagement
  namespace: iter8-kfserving
spec:
  mock:
  - name: flowers-v1
    level: "15.0"
  - name: flowers-v2
    level: "20.0"
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: 95th-percentile-tail-latency
  namespace: iter8-kfserving
spec:
  description: 95th percentile tail latency
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      histogram_quantile(0.95, sum(rate(revision_app_request_latencies_bucket{namespace_name='$ns'}[${elapsedTime}s])) by (le))
  provider: prometheus
  sampleSize: iter8-kfserving/request-count
  type: Gauge
  units: milliseconds
  urlTemplate: http://prometheus-operated.iter8-system:9090/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: error-count
  namespace: iter8-kfserving
spec:
  description: Number of error responses
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      sum(increase(revision_app_request_latencies_count{response_code_class!='2xx',namespace_name='$ns'}[${elapsedTime}s])) or on() vector(0)
  provider: prometheus
  type: Counter
  urlTemplate: http://prometheus-operated.iter8-system:9090/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: error-rate
  namespace: iter8-kfserving
spec:
  description: Fraction of requests with error responses
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      (sum(increase(revision_app_request_latencies_count{response_code_class!='2xx',namespace_name='$ns'}[${elapsedTime}s])) or on() vector(0)) / (sum(increase(revision_app_request_latencies_count{namespace_name='$ns'}[${elapsedTime}s])) or on() vector(0))
  provider: prometheus
  sampleSize: iter8-kfserving/request-count
  type: Gauge
  urlTemplate: http://prometheus-operated.iter8-system:9090/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: mean-latency
  namespace: iter8-kfserving
spec:
  description: Mean latency
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      (sum(increase(revision_app_request_latencies_sum{namespace_name='$ns'}[${elapsedTime}s])) or on() vector(0)) / (sum(increase(revision_app_request_latencies_count{namespace_name='$ns'}[${elapsedTime}s])) or on() vector(0))
  provider: prometheus
  sampleSize: iter8-kfserving/request-count
  type: Gauge
  units: milliseconds
  urlTemplate: http://prometheus-operated.iter8-system:9090/api/v1/query
---
apiVersion: iter8.tools/v2alpha2
kind: Metric
metadata:
  name: request-count
  namespace: iter8-kfserving
spec:
  description: Number of requests
  jqExpression: .data.result[0].value[1] | tonumber
  params:
  - name: query
    value: |
      sum(increase(revision_app_request_latencies_count{namespace_name='$ns'}[${elapsedTime}s])) or on() vector(0)
  provider: prometheus
  type: Counter
  urlTemplate: http://prometheus-operated.iter8-system:9090/api/v1/query
apiVersion: iter8.tools/v2alpha2
kind: Experiment
metadata:
  name: hybrid-exp
spec:
  target: flowers
  strategy:
    testingPattern: A/B
    deploymentPattern: Progressive
    actions:
      # when the experiment completes, promote the winning version using kubectl apply
      finish:
      - if: CandidateWon()
        run: "kubectl apply -f https://raw.githubusercontent.com/iter8-tools/iter8/master/samples/kfserving/quickstart/promote-v2.yaml"
      - if: not CandidateWon()
        run: "kubectl apply -f https://raw.githubusercontent.com/iter8-tools/iter8/master/samples/kfserving/quickstart/promote-v1.yaml"
  criteria:
    rewards: # Business rewards
    - metric: iter8-kfserving/user-engagement
      preferredDirection: High # maximize user engagement
  duration:
    intervalSeconds: 5
    iterationsPerLoop: 5
  versionInfo:
    # information about model versions used in this experiment
    baseline:
      name: flowers-v1
      weightObjRef:
        apiVersion: networking.istio.io/v1alpha3
        kind: VirtualService
        name: routing-rule
        namespace: default
        fieldPath: .spec.http[0].route[0].weight
      variables:
      - name: ns
        value: ns-baseline
    candidates:
    - name: flowers-v2
      weightObjRef:
        apiVersion: networking.istio.io/v1alpha3
        kind: VirtualService
        name: routing-rule
        namespace: default
        fieldPath: .spec.http[0].route[1].weight
      variables:
      - name: ns
        value: ns-candidate
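The experiment above lists only a reward under `criteria`, while the tutorial also calls for latency and error-rate objectives. As a hedged sketch, the SLO side might be expressed by adding an `objectives` list next to `rewards`. This assumes the v2alpha2 Experiment schema accepts `objectives` entries with `metric` and `upperLimit` fields. The limit values shown are illustrative placeholders, not values taken from this tutorial.

```yaml
# Sketch of a criteria section with both rewards and objectives.
# Assumption: the `objectives`, `metric`, and `upperLimit` field names follow
# the iter8.tools/v2alpha2 Experiment schema. All limits are placeholders.
criteria:
  rewards: # Business rewards
  - metric: iter8-kfserving/user-engagement
    preferredDirection: High # maximize user engagement
  objectives: # SLOs each version must satisfy to be eligible as the winner
  - metric: iter8-kfserving/mean-latency
    upperLimit: 1500   # placeholder limit, in milliseconds
  - metric: iter8-kfserving/95th-percentile-tail-latency
    upperLimit: 2000   # placeholder limit, in milliseconds
  - metric: iter8-kfserving/error-rate
    upperLimit: "0.01" # placeholder limit, as a fraction of requests
```

With a section like this, a version that exceeds any limit would be excluded from winning, and the reward metric would decide among the remaining versions.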