Services sometimes go offline. When this happens, users like to know that the issue has been detected and that engineers are working to resolve the problem. Does Glowforge have a status website? I’m thinking of something like http://status.aws.amazon.com
This question was inspired by the following thread:
No, although we’ve discussed it and it’s in the hopper. I don’t believe that we’ve had an outage since we started production - if we start failing you on that front we’ll need to bump it up the priority list!
We still don’t have any indication of a common cause for the problems a few people saw recently, so if we had a status page, it would have shown clear.
It’s good to hear that this is in the hopper. When users are troubleshooting a service that uses the cloud, it’s helpful to have a page that indicates that the cloud services are healthy, and that the root cause is likely somewhere else.
Our service was experiencing errors for a few hours last week and our customer definitely appreciated both the indication that there was a problem and the note that everything was healthy once the event was resolved.
Oh, and if I can make a suggestion, make the page itself use the same cluster, so if there is some horrible internet segmentation event or something you can see that your GF can’t get to the cloud, or whatever.
Hmm, I’m not entirely sure what you mean by cluster here but you generally want your status page to be as independent as possible of your service so that it can correctly show the service status when the service or service infra is down. The health check should happen between the status site and the service, not between the user and the status site.
The status page itself is generally fed by a combination of internal monitors and heuristics and external geo-distributed service checks.
Established inexpensive services provide all this so it’s not really a big deal to wire up.
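To make the "internal monitors plus external geo-distributed checks" idea concrete, here is a rough sketch of how a status page might fold both kinds of signal into one headline state. Everything in it is an assumption for illustration (the `CheckResult` shape, the source names, the 50% outage threshold); it is not how Glowforge or any particular status vendor does it.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPERATIONAL = "operational"
    DEGRADED = "degraded"
    OUTAGE = "outage"


@dataclass
class CheckResult:
    source: str    # hypothetical label, e.g. "internal:api" or "external:eu"
    healthy: bool


def overall_status(internal, external, outage_fraction=0.5):
    """Combine internal monitors and external probes into one page status.

    outage_fraction is an assumed threshold: if at least that share of the
    external probes fail, the service is reported as down.
    """
    internal = list(internal)
    external = list(external)
    results = internal + external
    if not results:
        # No data is not the same as healthy; flag it so someone looks.
        return Status.DEGRADED
    failed_external = sum(1 for r in external if not r.healthy)
    if external and failed_external / len(external) >= outage_fraction:
        return Status.OUTAGE
    if any(not r.healthy for r in results):
        return Status.DEGRADED
    return Status.OPERATIONAL
```

The point of weighting the external probes is the one made above: the check runs between the status site and the service, so a single user's broken connection never flips the page to red.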
Actually, we do the exact opposite of what you are suggesting as the ultimate check. I have had numerous IT “solutions” tell me the health of my cloud servers (what I was terming a cluster, since the cloud is just someone else’s server cluster) was “excellent”, which was great because the monitor was able to actually contact our servers (and yes, the servers were up), while the rest of us were segmented off or whatever. By having an actual on-cluster status app, that connection is tested too.
The advantage is that the ultimate “we are down” signal is that you can’t reach anything (as in, I ping health.glowforge.com or whatever and get no connection). Yes, by hosting your connection app on someone else’s server off network, you may get the same effect, but I have had numerous cloud vendors, in a pinch, suddenly move my apps around onto AWS or whatever during a migration, and suddenly I’m back on the same network.
I think the use case here is a bit different. For a typical enterprise app, the user connection is fully in support scope and for a consumer application it isn’t really. So for GF some x% of customers (or just one) not being able to reach the service does not mean the service is down.
Also, I think that for many users, being able to concretely see a status page that says something like “Glowforge print service is unavailable. Staff are aware and actively working on the problem” is more effective (and will shunt more support requests) than an HTTP 500 or equivalent. It’s also a place to post status updates when the core services are down (not that that would ever happen…)
An interesting thing would be to use all the installed GFs as telemetry devices and post status to a third-party service. This would generate the ‘as the user sees it’ type of status I think you are looking for.
I really like the idea of being able to see some anonymized and/or obfuscated data that shows the state of the Glowforge services from the perspective of the Glowforge devices. This would be a nice complement to the “canaries” that Glowforge would build and operate which constantly test the Glowforge web services and update the status page as appropriate.
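As a rough illustration of what that anonymized, device-side view could look like, here is a minimal sketch that rolls up connectivity reports by region. The report format (a region label plus a reached/not-reached flag, with no device ID) and the `min_devices` cutoff are assumptions made up for this example, not anything Glowforge has described.

```python
from collections import defaultdict


def summarize_reports(reports, min_devices=20):
    """Return {region: share of devices that reached the service}.

    reports: iterable of (region, reached_service) pairs with no device IDs.
    Regions with fewer than min_devices reports are left out, so a published
    number never describes just a handful of identifiable machines.
    """
    totals = defaultdict(lambda: [0, 0])  # region -> [successes, total]
    for region, reached in reports:
        totals[region][1] += 1
        if reached:
            totals[region][0] += 1

    summary = {}
    for region, (successes, total) in totals.items():
        if total < min_devices:
            continue  # too few devices to publish
        summary[region] = round(successes / total, 2)
    return summary
```

A dip in one region while the canaries stay green would point at a network problem between users and the service, which is the case the on-cluster argument earlier in the thread was worried about.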
I trust @Dan and the rest of the Glowforge engineering team to build the right solution for all of us. Dan and others on the team have plenty of experience working with and for major cloud platforms.