One disadvantage of using TTL based health check is the high network
traffic between Consul agent (either between servers, or between server
and client).
In order for the services considered alive by Consul, microservices must
send an update TTL to Consul every n seconds (currently 30 seconds).
Here is the explanation about TTL check from Consul documentation [1]
Time to Live (TTL) - These checks retain their last known state for a
given TTL. The state of the check must be updated periodically over
the HTTP interface. If an external system fails to update the status
within a given TTL, the check is set to the failed state. This
mechanism, conceptually similar to a dead man's switch, relies on the
application to directly report its health. For example, a healthy app
can periodically PUT a status update to the HTTP endpoint; if the app
fails, the TTL will expire and the health check enters a critical
state. The endpoints used to update health information for a given
check are the pass endpoint and the fail endpoint. TTL checks also
persist their last known status to disk. This allows the Consul agent
to restore the last known status of the check across restarts.
Persisted check status is valid through the end of the TTL from the
time of the last check.
Hint:
TTL checks also persist their last known status to disk. This allows
the Consul agent to restore the last known status of the check
across restarts.
When microservices update the TTL, Consul will write to disk. Writing to
disk means all other slaves need to replicate it, which means master need
to inform other standby Consul to pull the new catalog. Hence, the
increased traffic.
More information about this issue can be viewed at Consul mailing list [2].
[1] https://www.consul.io/docs/agent/checks.html
[2] https://groups.google.com/forum/#!topic/consul-tool/84h7qmCCpjg