Custom Metrics¶
Custom Iter8 metrics enable you to use data from any database for evaluating app/ML model versions within Iter8 experiments. This document describes how you can define custom Iter8 metrics and (optionally) supply authentication information that may be required by the metrics provider.
Metric providers differ in the following aspects.
- HTTP request authentication method: no authentication, basic auth, API keys, or bearer token
- HTTP request method: GET or POST
- Format of HTTP parameters and/or JSON body used while querying them
- Format of the JSON response returned by the provider
- The logic used by Iter8 to extract the metric value from the JSON response
The examples in this document focus on Prometheus, New Relic, Sysdig, and Elastic. However, the principles illustrated here will enable you to use metrics from any provider in experiments.
Metrics with/without auth¶
Note: Metrics are defined by you, the Iter8 end-user.
Prometheus does not support any authentication mechanism out-of-the-box. However, Prometheus can be set up in conjunction with a reverse proxy, which in turn can support HTTP request authentication, as described here.
The following is an example of an Iter8 metric with Prometheus as the provider. This example assumes that Prometheus can be queried by Iter8 without any authentication.
Suppose Prometheus is set up to enforce basic auth with the following credentials:
username: produser
password: t0p-secret
You can enable Iter8 to query this Prometheus instance as follows.
- Create secret: Create a Kubernetes secret that contains the authentication information. In particular, this secret needs to have the username and password fields in the data section with correct values.
  kubectl create secret generic promcredentials -n myns --from-literal=username=produser --from-literal=password=t0p-secret
- Create RBAC rule: Provide the required permissions for Iter8 to read this secret. The service account iter8-analytics in the iter8-system namespace will have permissions to read secrets in the myns namespace.
  kubectl create rolebinding iter8-cred --clusterrole=iter8-secret-reader-analytics --serviceaccount=iter8-system:iter8-analytics -n myns
- Define metric: When defining the metric, ensure that the authType field is set to Basic and the appropriate secret is referenced.
Brief explanation of the request-count metric
- Prometheus enables metric queries using HTTP GET requests. GET is the default value for the method field of an Iter8 metric. This field is optional; it is omitted in the definition of request-count, and defaulted to GET.
- Iter8 will query Prometheus during each iteration of the experiment. In each iteration, Iter8 will use n HTTP queries, one per version, to fetch metric values, where n is the number of versions in the experiment (see footnote 2).
- The HTTP query used by Iter8 contains a single query parameter named query as required by Prometheus. The value of this parameter is derived by substituting the placeholders in the value string.
- The jqExpression enables Iter8 to extract the metric value from the JSON response returned by Prometheus.
- The urlTemplate field provides the URL of the Prometheus service.
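As a rough illustration of the points above, the following Python sketch builds (but does not send) the kind of HTTP GET request described: a single parameter named query, plus a basic auth header derived from the credentials stored in the secret. The URL http://prometheus.example.com/api/v1/query and the helper name build_prometheus_request are assumptions made for this example; Iter8's own implementation may differ.

```python
import base64
import urllib.parse
import urllib.request

# Query string taken from the substituted example later in this document.
PROMQL = (
    "sum(increase(revision_app_request_latencies_count"
    "{service_name='current',usergroup!~\"wakanda\"}[600s])) or on() vector(0)"
)


def build_prometheus_request(url, query, username=None, password=None):
    """Build (but do not send) a GET request with a single `query` parameter."""
    full_url = url + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(full_url, method="GET")
    if username is not None and password is not None:
        # HTTP basic auth: base64("username:password") in the Authorization header.
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical URL; the credentials are the example values used above.
req = build_prometheus_request(
    "http://prometheus.example.com/api/v1/query",
    PROMQL,
    username="produser",
    password="t0p-secret",
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Omitting username and password produces the unauthenticated variant of the same request.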
New Relic uses API Keys to authenticate requests as documented here. The API key may be directly embedded within the Iter8 metric, or supplied as part of a Kubernetes secret.
The following is an example of an Iter8 metric with New Relic as the provider. In this example, t0p-secret-api-key is the New Relic API key.
Suppose your New Relic API key is t0p-secret-api-key; you wish to store this API key in a Kubernetes secret, and reference this secret in an Iter8 metric. You can do so as follows.
- Create secret: Create a Kubernetes secret containing the API key.
  kubectl create secret generic nrcredentials -n myns --from-literal=mykey=t0p-secret-api-key
  The above secret contains a data field named mykey whose value is the API key. The data field name (which can be any string of your choice) will be used in Step 3 below as a placeholder.
- Create RBAC rule: Provide the required permissions for Iter8 to read this secret. The service account iter8-analytics in the iter8-system namespace will have permissions to read secrets in the myns namespace.
  kubectl create rolebinding iter8-cred --clusterrole=iter8-secret-reader-analytics --serviceaccount=iter8-system:iter8-analytics -n myns
- Define metric: When defining the metric, ensure that the authType field is set to APIKey and the appropriate secret is referenced. In the headerTemplates field, include X-Query-Key as the name of a header field (as required by New Relic). The value for this header field is a templated string. Iter8 will substitute the placeholder ${mykey} at query time, by looking up the referenced secret named nrcredentials in the myns namespace.
Brief explanation of the name-count metric
- New Relic enables metric queries using either HTTP GET or POST requests. GET is the default value for the method field of an Iter8 metric. This field is optional; it is omitted in the definition of name-count, and defaulted to GET.
- Iter8 will query New Relic during each iteration of the experiment. In each iteration, Iter8 will use n HTTP queries, one per version, to fetch metric values, where n is the number of versions in the experiment (see footnote 2).
- The HTTP query used by Iter8 contains a single query parameter named nrql as required by New Relic. The value of this parameter is derived by substituting the placeholders in its value string.
- The jqExpression enables Iter8 to extract the metric value from the JSON response returned by New Relic.
- The urlTemplate field provides the URL of the New Relic service.
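The header substitution described in the Define metric step can be sketched as follows. This is a minimal illustration that uses Python's string.Template as a stand-in for whatever templating Iter8 uses internally; the render_headers helper and the dictionary layout are assumptions made for the example, and the secret data mirrors the nrcredentials secret created above.

```python
from string import Template

# Header templates as described above: header name -> templated value.
header_templates = {"X-Query-Key": "${mykey}"}

# Decoded contents of the (example) nrcredentials secret.
secret_data = {"mykey": "t0p-secret-api-key"}


def render_headers(templates, secret):
    """Substitute ${...} placeholders in each header value using the secret data."""
    return {
        name: Template(value).substitute(secret)
        for name, value in templates.items()
    }


headers = render_headers(header_templates, secret_data)
print(headers)
```

Because the placeholder name is simply the data field name in the secret, renaming mykey only requires changing the secret and the template together.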
The Sysdig data API accepts HTTP POST requests and uses a bearer token for authentication as documented here. The bearer token may be directly embedded within the Iter8 metric, or supplied as part of a Kubernetes secret.
The following is an example of an Iter8 metric with Sysdig as the provider. In this example, 87654321-1234-1234-1234-123456789012 is the Sysdig bearer token (also referred to as access key by Sysdig).
Suppose your Sysdig token is 87654321-1234-1234-1234-123456789012; you wish to store this token in a Kubernetes secret, and reference this secret in an Iter8 metric. You can do so as follows.
- Create secret: Create a Kubernetes secret containing the token.
  kubectl create secret generic sdcredentials -n myns --from-literal=token=87654321-1234-1234-1234-123456789012
  The above secret contains a data field named token whose value is the Sysdig token. The data field name (which can be any string of your choice) will be used in Step 3 below as a placeholder.
- Create RBAC rule: Provide the required permissions for Iter8 to read this secret. The service account iter8-analytics in the iter8-system namespace will have permissions to read secrets in the myns namespace.
  kubectl create rolebinding iter8-cred --clusterrole=iter8-secret-reader-analytics --serviceaccount=iter8-system:iter8-analytics -n myns
- Define metric: When defining the metric, ensure that the authType field is set to Bearer and the appropriate secret is referenced. In the headerTemplates field, include the Authorization header field (as required by Sysdig). The value for this header field is a templated string. Iter8 will substitute the placeholder ${token} at query time, by looking up the referenced secret named sdcredentials in the myns namespace.
Brief explanation of the cpu-utilization metric
- Sysdig enables metric queries using POST requests; hence, the method field of the Iter8 metric is set to POST.
- Iter8 will query Sysdig during each iteration of the experiment. In each iteration, Iter8 will use n HTTP queries, one per version, to fetch metric values, where n is the number of versions in the experiment (see footnote 2).
- The HTTP query used by Iter8 contains a JSON body as required by Sysdig. This JSON body is derived by substituting the placeholders in the body template.
- The jqExpression enables Iter8 to extract the metric value from the JSON response returned by Sysdig.
- The urlTemplate field provides the URL of the Sysdig service.
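To make the POST variant concrete, here is a hedged Python sketch that builds (without sending) a request carrying a JSON body and a bearer token header rendered from a template. The URL https://sysdig.example.com/api/data, the body fields, and the helper name build_post_request are all hypothetical and are not taken from the Sysdig API; only the token value and the Bearer ${token} pattern come from the text above.

```python
import json
import urllib.request
from string import Template


def build_post_request(url, body, header_templates, secret):
    """Build (but do not send) a POST request with a JSON body and templated headers."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(url, data=data, method="POST")
    request.add_header("Content-Type", "application/json")
    for name, value in header_templates.items():
        # Placeholders such as ${token} are filled in from the secret data.
        request.add_header(name, Template(value).substitute(secret))
    return request


# Hypothetical body: the field names are placeholders for illustration only.
body = {"elapsedSeconds": 600, "revision": "sample-app-v1"}

req = build_post_request(
    "https://sysdig.example.com/api/data",  # hypothetical URL
    body,
    {"Authorization": "Bearer ${token}"},
    {"token": "87654321-1234-1234-1234-123456789012"},
)
print(req.get_method())
print(req.get_header("Authorization"))
```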
The Elasticsearch REST API accepts HTTP GET or POST requests and uses basic authentication as documented here. Suppose Elasticsearch is set up to enforce basic auth with the following credentials:
username: produser
password: t0p-secret
You can then enable Iter8 to query the Elasticsearch service as follows.
- Create secret: Create a Kubernetes secret that contains the authentication information. In particular, this secret needs to have the username and password fields in the data section with correct values.
  kubectl create secret generic elasticcredentials -n myns --from-literal=username=produser --from-literal=password=t0p-secret
- Create RBAC rule: Provide the required permissions for Iter8 to read this secret. The service account iter8-analytics in the iter8-system namespace will have permissions to read secrets in the myns namespace.
  kubectl create rolebinding iter8-cred --clusterrole=iter8-secret-reader-analytics --serviceaccount=iter8-system:iter8-analytics -n myns
- Define metric: When defining the metric, ensure that the authType field is set to Basic and the appropriate secret is referenced.
Brief explanation of the average sales metric
- Elastic enables metric queries using GET or POST requests. In the Elastic example, the method field of the Iter8 metric is set to POST.
- Iter8 will query Elastic during each iteration of the experiment. In each iteration, Iter8 will use n HTTP queries, one per version, to fetch metric values, where n is the number of versions in the experiment (see footnote 2).
- The HTTP query used by Iter8 contains a JSON body as required by Elastic. This JSON body is derived by substituting the placeholders in the body template.
- The jqExpression enables Iter8 to extract the metric value from the JSON response returned by Elastic.
- The urlTemplate field provides the URL of the Elastic service.
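The body-template substitution mentioned above can be sketched like this. The template below is a made-up example (its field names are not copied from the metric definition), and string.Template is used as an assumed stand-in for Iter8's templating. Parsing the rendered text with json.loads is a cheap way to confirm that substitution still leaves a well-formed JSON body.

```python
import json
from string import Template

# Hypothetical body template; the field names are illustrative only.
body_template = """
{
  "query": {
    "bool": {
      "must": [
        {"match": {"version": "${revision}"}},
        {"range": {"timestamp": {"gte": "now-${elapsedTime}s"}}}
      ]
    }
  }
}
"""


def render_body(template, values):
    """Substitute placeholders, then parse to confirm the result is valid JSON."""
    rendered = Template(template).substitute(values)
    return json.loads(rendered)


body = render_body(body_template, {"revision": "sample-app-v1", "elapsedTime": 600})
print(json.dumps(body, indent=2))
```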
Placeholder substitution¶
Note: This step is automated by Iter8.
Iter8 will substitute placeholders in the metric query based on the time elapsed since the start of the experiment, and information associated with each version in the experiment.
Suppose the metrics defined above are referenced within an experiment as follows. Further, suppose this experiment has started, Iter8 is about to do an iteration of this experiment, and the time elapsed since the start of the experiment is 600 seconds.
For the sample experiment above, Iter8 will use two HTTP(S) queries to fetch metric values, one for the baseline version, and another for the candidate version.
For the Prometheus metric, consider the baseline version. Iter8 will send an HTTP(S) request with a single parameter named query whose value equals:
sum(increase(revision_app_request_latencies_count{service_name='current',usergroup!~"wakanda"}[600s])) or on() vector(0)
For the New Relic metric, consider the baseline version. Iter8 will send an HTTP(S) request with a single parameter named nrql whose value equals:
SELECT count(appName) FROM PageView WHERE revisionName='sample-app-v1' SINCE 600 seconds ago
For the Sysdig metric, consider the baseline version. Iter8 will send an HTTP(S) request whose JSON body is the metric's body template with its placeholders substituted.
For the Elastic metric, consider the baseline version. Iter8 will likewise send an HTTP(S) request whose JSON body is the metric's body template with its placeholders substituted.
The placeholder $elapsedTime has been substituted with 600, which is the time elapsed since the start of the experiment. The other placeholders have been substituted based on the versionInfo field of the baseline version in the experiment. Iter8 builds and sends an HTTP request in a similar manner for the candidate version as well.
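The substitution just described can be reproduced with a short Python sketch. The query template below is an assumption: it is reconstructed from the substituted Prometheus query shown above rather than copied from the metric definition, and the variable names name and userfilter are hypothetical versionInfo entries. Python's string.Template stands in for Iter8's own substitution logic.

```python
from string import Template

# Assumed template, reconstructed from the substituted query shown above.
query_template = Template(
    "sum(increase(revision_app_request_latencies_count"
    "{service_name='${name}',${userfilter}}[${elapsedTime}s])) or on() vector(0)"
)

# Hypothetical versionInfo variables for the baseline version.
baseline_variables = {"name": "current", "userfilter": 'usergroup!~"wakanda"'}


def render_query(template, variables, elapsed_seconds):
    """Fill in version-specific variables plus the elapsed time in seconds."""
    return template.substitute(variables, elapsedTime=int(elapsed_seconds))


print(render_query(query_template, baseline_variables, 600))
```

Calling render_query again with the candidate's variables gives the second query of the iteration.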
JSON response¶
Note: This step is handled by the metrics provider.
The metrics provider is expected to respond to Iter8's HTTP request with a JSON object. The format of this JSON object is defined by the provider.
The format of the Prometheus JSON response is defined here. A sample Prometheus response is as follows.
The format of the New Relic JSON response is discussed here. A sample New Relic response is as follows.
The format of the Sysdig JSON response is discussed here. A sample Sysdig response is as follows.
The format of the Elastic JSON response is discussed here. A sample Elastic response is as follows.
Processing the JSON response¶
Note: This step is automated by Iter8.
Iter8 uses jq to extract the metric value from the JSON response of the provider. The jqExpression used by Iter8 is supplied as part of the metric definition. When the jqExpression is applied to the JSON response, it is expected to yield a number.
Consider the jqExpression defined in the sample Prometheus metric. Let us apply it to the sample JSON response from Prometheus.
echo '{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "value": [1556823494.744, "21.7639"]
      }
    ]
  }
}' | jq ".data.result[0].value[1] | tonumber"
Executing the above command yields 21.7639, a number, as required by Iter8.
Consider the jqExpression defined in the sample New Relic metric. Let us apply it to the sample JSON response from New Relic.
echo '{
  "results": [
    {
      "count": 80275388
    }
  ],
  "metadata": {
    "eventTypes": [
      "PageView"
    ],
    "eventType": "PageView",
    "openEnded": true,
    "beginTime": "2014-08-03T19:00:00Z",
    "endTime": "2017-01-18T23:18:41Z",
    "beginTimeMillis": 1407092400000,
    "endTimeMillis": 1484781521198,
    "rawSince": "'2014-08-04 00:00:00+0500'",
    "rawUntil": "`now`",
    "rawCompareWith": "",
    "clippedTimeWindows": {
      "Browser": {
        "beginTimeMillis": 1483571921198,
        "endTimeMillis": 1484781521198,
        "retentionMillis": 1209600000
      }
    },
    "messages": [],
    "contents": [
      {
        "function": "count",
        "attribute": "appName",
        "simple": true
      }
    ]
  }
}' | jq ".results[0].count | tonumber"
Executing the above command yields 80275388, a number, as required by Iter8.
Consider the jqExpression defined in the sample Sysdig metric. Let us apply it to the sample JSON response from Sysdig.
echo '{
  "data": [
    {
      "t": 1582756200,
      "d": [
        6.481
      ]
    }
  ],
  "start": 1582755600,
  "end": 1582756200
}' | jq ".data[0].d[0] | tonumber"
Executing the above command yields 6.481, a number, as required by Iter8.
Consider the jqExpression defined in the sample Elastic metric. Let us apply it to the sample JSON response from Elastic.
echo '{
  "aggregations": {
    "items_to_sell": {
      "doc_count": 3,
      "avg_sales": { "value": 128.33333333333334 }
    }
  }
}' | jq ".aggregations.items_to_sell.avg_sales.value | tonumber"
Executing the above command yields 128.33333333333334, a number, as required by Iter8.
Note: The shell commands above are for illustration only. Iter8 uses Python bindings for jq to evaluate the jqExpression.
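For readers without jq at hand, the Prometheus extraction can be mirrored with the Python standard library alone. This is only an equivalent of the expression ".data.result[0].value[1] | tonumber" written as plain indexing; it is not how Iter8 evaluates jqExpression.

```python
import json

# Sample Prometheus response from the example above.
response = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"value": [1556823494.744, "21.7639"]}
    ]
  }
}
""")

# .data.result[0].value[1] selects the string "21.7639";
# float() plays the role of jq's tonumber.
value = float(response["data"]["result"][0]["value"][1])
print(value)
```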
Error handling¶
Note: This step is automated by Iter8.
Errors may occur during Iter8's metric queries due to a number of reasons (for example, due to an invalid jqExpression supplied within the metric). If Iter8 encounters errors during its attempt to retrieve metric values, Iter8 will mark the respective metric as unavailable.
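One way to picture this behavior is a small extraction helper that never raises and instead reports a missing value. The sketch below is an assumption about the general pattern, not Iter8's code: the extract_metric name, the list-based path (used here in place of a jq expression), and the use of None to represent "unavailable" are all choices made for the example.

```python
import json


def extract_metric(raw_response, path):
    """Walk `path` through the parsed JSON; return None if anything goes wrong."""
    try:
        node = json.loads(raw_response)
        for key in path:
            node = node[key]
        return float(node)
    except (ValueError, KeyError, IndexError, TypeError):
        # Malformed JSON, a missing field, or a non-numeric result:
        # treat the metric as unavailable instead of failing.
        return None


good = '{"data": [{"t": 1582756200, "d": [6.481]}]}'
print(extract_metric(good, ["data", 0, "d", 0]))     # 6.481
print(extract_metric(good, ["data", 0, "missing"]))  # None
print(extract_metric("not json", ["data"]))          # None
```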
1. Iter8 can be used with any provider that can receive an HTTP request and respond with a JSON object containing the metrics information. Documentation requests and contributions (PRs) are welcome for providers not listed here.
2. In a conformance experiment, n = 1. In canary and A/B experiments, n = 2. In A/B/n experiments, n > 2.