I’ve written about installing and using Graphite, and it’s a really great tool for measuring all kinds of metrics. Most of the guides online don’t touch on the security aspects of this setup, and there is at least one thing I thought was worth writing about.
How are we measuring
Metrics we gather from our applications have the following characteristics / requirements:
- We want to gather lots of data over time.
- Any single data-point isn’t significant on its own. Only in aggregate.
- Measuring is important, but not if it slows down our application in any way.
Graphite uses a stats collector / listener called Carbon. In a typical scenario Carbon listens on a TCP port, and clients report stats by connecting to it. Carbon stores the stats in its database (Whisper), which Graphite then queries to display the information.
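To make this concrete, here is a rough sketch of a client reporting one stat straight to Carbon over TCP. It assumes Carbon's plaintext listener, which takes one `path value timestamp` line per metric and conventionally runs on port 2003. The host name and metric name are placeholders I made up, and the helper functions are mine, not part of Graphite.

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one metric as a plaintext line: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_carbon(path, value, host="monitoring.example.com", port=2003,
                   timeout=2.0):
    """Open a TCP connection, send one metric line, then close.

    The host is a hypothetical name; 2003 is the usual plaintext port.
    The connect can block for up to `timeout` seconds if Carbon or the
    network is unhealthy.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(carbon_line(path, value).encode("ascii"))
```

Calling `send_to_carbon("app.logins", 1)` from inside a request handler means the request waits on that connect and send, which matters for the requirements listed above.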
Given the characteristics above, it’s easy to see why using Carbon to collect our data might not be the ideal choice. Why?
- Carbon requires leaving a TCP connection open. If the carbon server, the network connection or anything along the path breaks, we not only stop gathering monitoring data, we can also slow down our application.
- Making sure a connection is ‘alive’ requires techniques such as connection-pooling and generally is quite resource-intensive.
- TCP has overhead that might not be necessary and slows things down.
So a fire-and-forget mechanism is much better for this purpose.
This is exactly the problem Statsd is trying to solve. Statsd listens on a UDP port (not TCP), aggregates/buffers metrics over a short period of time, and then forwards them to carbon all at once.
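The buffering step is easy to picture with a toy example. The sketch below is my own simplification, not statsd's actual code. It only understands plain counters in the `name:value|c` format and emits one `name total timestamp` line per metric on flush. Real statsd also handles timers, gauges, sample rates and so on. The class and method names are made up for illustration.

```python
from collections import defaultdict


class CounterBuffer:
    """Toy stand-in for statsd's buffering (illustrative only)."""

    def __init__(self):
        self.counts = defaultdict(int)

    def handle(self, packet):
        """Add one 'name:value|c' counter message to the buffer."""
        name, rest = packet.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type == "c":  # anything but a plain counter is ignored here
            self.counts[name] += int(value)

    def flush(self, timestamp):
        """Return one line per metric, sorted by name, and reset the buffer."""
        lines = [f"{name} {total} {timestamp}"
                 for name, total in sorted(self.counts.items())]
        self.counts.clear()
        return lines
```

Feeding it `"app.logins:1|c"` twice and `"app.errors:2|c"` once, then calling `flush(1700000000)`, returns `["app.errors 2 1700000000", "app.logins 2 1700000000"]`. That is three incoming messages collapsed into two lines for carbon.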
This means that your application is completely decoupled from your monitoring. If any component is down, the app doesn’t need to know about it. It will simply send UDP messages into the void. All you lose is those messages.
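Here is roughly what "fire-and-forget" looks like from the application's side. It's a minimal sketch that assumes statsd's counter format (`name:value|c`) and its default UDP port, 8125. The function names and the metric name are my own.

```python
import socket


def statsd_counter(name, value=1):
    """Format a counter in statsd's 'name:value|c' wire format."""
    return f"{name}:{value}|c".encode("ascii")


def fire_and_forget(payload, host="127.0.0.1", port=8125):
    """Send one UDP datagram without waiting for, or learning about, delivery."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    except OSError:
        pass  # monitoring must never break the app, so the stat is just dropped
    finally:
        sock.close()
```

A call like `fire_and_forget(statsd_counter("app.logins"))` involves no handshake, no connection to keep alive and no reply to wait for. If nothing is listening, the datagram simply disappears.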
No free meals
So what’s the catch? As always, nothing comes for free. We know we might lose some data, but that’s acceptable. So what else might we compromise on?
We finally get down to the reason for writing this post in the first place. It’s a really subtle point, but somehow I haven’t seen it mentioned explicitly anywhere else.
Location, Location, Location
The one thing that I never see mentioned is where to place your statsd server in relation to carbon. As programmers we live by the DRY (Don’t Repeat Yourself) principle. If a component is used many times, don’t copy&paste it all over the place. Instead just load and use it once.
So the natural instinct is to place statsd just in front of carbon, which in turn sits in front of our whisper database, on the monitoring server.
We then tell all our apps, running on different servers, to fire those small UDP messages at our monitoring server. Therein lies the problem. UDP is connectionless: there is no handshake, so the receiver has no way to verify that a packet really came from the source address it claims. That makes UDP susceptible to spoofing attacks, which are much harder to pull off with TCP. It’s very easy to send fake statsd messages to your monitoring server, pretending they are coming from the IP address(es) of your apps.
This can completely obscure your real stats, or create a rather easy denial-of-service by overwhelming your monitoring server with fake stats.
So can we use a firewall to block these attacks? Probably not. The only reliable firewall rule would be based on source IP addresses. However, if these are easy to spoof, your firewall isn’t useful.
It can however be very useful if your requests are TCP-based!
So the solution is terribly simple. Place a statsd collector on each server you run your app on, and let it forward to carbon on the remote monitoring server over TCP. This way, you can filter which app servers are allowed to connect to carbon based on their source IP address. Spoofing the source IP of a TCP connection is much harder.
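In practice this filter belongs in your firewall rather than in code, but the logic it applies is easy to sketch. The example below assumes a made-up list of app-server addresses, and the function names are mine. The important part is that the peer address of an accepted TCP connection has already completed a handshake, so checking it actually means something.

```python
import ipaddress

# Hypothetical app-server addresses; replace with your own.
ALLOWED_SOURCES = [
    ipaddress.ip_network("10.0.1.0/24"),
    ipaddress.ip_network("10.0.2.15/32"),
]


def is_allowed(peer_ip, allowed=ALLOWED_SOURCES):
    """Return True if the peer address falls inside an allowed network."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in allowed)


def accept_if_allowed(server_sock, allowed=ALLOWED_SOURCES):
    """Accept one TCP connection; close it right away unless the peer is allowed."""
    conn, peer = server_sock.accept()
    if not is_allowed(peer[0], allowed):
        conn.close()
        return None
    return conn
```

On the app servers, the application keeps sending its UDP messages to statsd on `127.0.0.1`, so those packets never cross the network at all.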
There are very few drawbacks to this solution. Yes, you’d need to deploy more instances of statsd. But statsd is very lightweight anyway and won’t take many resources.
This is far from a huge security hole. I would definitely classify it as low when considering the bigger picture. The likelihood of it being exploited and the potential gain from exploiting it are in most cases negligible. Nevertheless, how much it matters depends on your exposure/profile, and it’s something people should be aware of.
I’ve only scratched the surface of the risks to your graphite/carbon/statsd setup. There are of course many other considerations and potential issues. Some are probably worse than spoofed statsd packets. For example, access to your graphite server (which when installed isn’t even password-protected by default!). I might try to cover some of those aspects in future posts. For now, I felt it was important to make one point clear about the correct placement of your statsd server.