Let's keep this point simple.
We monitor servers, databases, caches, apps and other parts of our ecosystem. We have alerting in place.
People who are responsible for a given part have access to its monitoring and alerting.
Where a team can manage it, we try to be ready to fix problems 24/7.