We’re Cameron, Trung and Matt from API Tracker (https://www.apitracker.com). We make tools to help with using third-party APIs in production.
When software teams integrate with APIs, they often run into outages, network issues, interface changes or even bugs that cause unexpected behavior in the rest of their system. These problems are hard to predict and prepare for, so most teams don’t deal with them until there's an outage and they have to do an emergency build to add logging and get to a root cause.
This is what happened to us. Trung and I are both software engineers and we spent a lot of time and energy trying to make our API integrations robust and reliable in production. We found ourselves instrumenting all our API calls so we could know how many calls we were making, how long they were taking and whether they were failing. We set up alerts for errors and latency increases and integrated with PagerDuty. We wrote retry logic with exponential backoff. We wrote failover from one API provider to another. At the end of it all we had built a lot of tooling that required maintenance and wasn’t even applied uniformly across all of our integrations.
After building all this infrastructure we realized that many other teams are reinventing the same wheel.
To solve this problem we built an API proxy that takes requests and relays them to the API provider. By proxying this traffic we are able to instrument each call to measure latency, record status codes, headers and bodies, and add reliability features like automatic retry with exponential backoff. From there we can monitor and alert on issues and provide a searchable call log for debugging and auditability.
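As a rough illustration of what instrumenting a call and retrying with exponential backoff can look like, here is a minimal Python sketch. It is not our proxy code: `call_with_retry`, `CallRecord`, the default delays and the choice to retry only on 5xx responses and connection errors are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CallRecord:
    """One entry in a hypothetical call log."""
    attempt: int
    status: Optional[int]  # None when the call failed before a response arrived
    latency_s: float


def call_with_retry(
    send: Callable[[], int],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[int], List[CallRecord]]:
    """Call send() until it returns a non-5xx status or attempts run out.

    send is any zero-argument callable that performs the upstream request
    and returns its HTTP status code. Every attempt is timed and recorded.
    """
    records: List[CallRecord] = []
    status: Optional[int] = None
    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        try:
            status = send()
        except ConnectionError:
            status = None  # treated as retryable in this sketch
        records.append(CallRecord(attempt, status, time.monotonic() - start))

        if status is not None and status < 500:
            return status, records  # success or a client error: do not retry
        if attempt < max_attempts:
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    return status, records
```

A production version would usually add jitter to the delays and only retry requests that are safe to repeat.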
Because we are asking teams to run their mission-critical API calls through us, we knew we had to build a highly available and scalable proxy architecture. We’ve done this by designing a proxy that can be distributed across multiple regions and clouds. We are currently running on AWS. AWS Global Accelerator lets us use AWS's private network backbone to quickly get traffic to our proxies, which run behind AWS Network Load Balancers. While this helps us ensure resilience against infrastructure outages, we also need to protect against self-inflicted wounds like bugs and bad deployments. On each release we bring up a new set of proxy instances, deploy the code, and run our full test suite to make sure that each instance is able to proxy requests correctly. Only once all instances are healthy do they start going into the load balancer.
For companies with more stringent needs we support on-premise installations as well as a client-side SDK that can do instrumentation without the proxy.
Today we offer the service as a subscription. We hope to make it easy for teams to get visibility and control across all their integrations without having to build it themselves. This includes:
- Detailed logging on all of their third-party API calls
- Monitoring and alerting for increased latency and error rates
- Reliability features like automatic retry, circuit breaker and request queueing
- Rate limit and quota monitoring
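For readers who haven't used one, the circuit breaker idea from the list above can be sketched in a few lines of Python. This is an illustration only, not our implementation: the `CircuitBreaker` class, its thresholds and the injected `clock` are assumptions for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                # Fail fast instead of hitting a provider that is known to be down.
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each provider in its own breaker keeps one failing integration from tying up calls to the others.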
We would love to hear from the community how you are managing your API integrations. Our story is a result of our experiences and how we dealt with them, but we know the HN community has seen it all, so please tell us about problems you’ve had and how you dealt with them. Leave a comment or send us an email at [email protected]. Looking forward to the discussion!