Just to let people know: I edited the post a bit, as some people suggested, and the question was also posted on the Azure CDN forum here. The reason I am posting it on StackOverflow as well is to reach a wider audience, in the hope that people who have faced the same problem can provide valuable solutions or feedback. As far as I know there is currently no solution to this problem, yet it affects any business that uses a CDN to deliver its content. I am open to editing this question further, but I would ask people not to downvote it just because it sounds like a rant. It is not a rant, and I can guarantee it affects thousands of businesses out there and costs people thousands of dollars a year, whether they know it or not.
So here is the situation. Say I am building an image gallery site and I would like to use the Azure CDN to deliver my content. On the Azure backend, the CDN pulls content from an Azure storage account. The CDN is fast and powerful, but it appears to be fairly insecure when it comes to preventing someone from pulling content in very large volumes and leaving the owner with a huge bandwidth bill. Let me show you what I mean.
So, last night I decided to write a simple console application that downloads a single image from my soon-to-be image gallery website in a for {} loop. The code is below:
    using System;
    using System.Drawing;
    using System.Drawing.Imaging;
    using System.IO;
    using System.Net;

    namespace RefererSpoofer
    {
        class Program
        {
            static void Main(string[] args)
            {
                for (int x = 0; x < 1000; x++)
                {
                    string myUri = "http://myazurecdnendpoint.azureedge.net/mystoragecontainer/DSC00580_1536x1152.jpg";

                    // Spoof the Referer header so the request looks like it came from my own site
                    HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(myUri);
                    myHttpWebRequest.Referer = "www.mywebsite.com";

                    using (HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse())
                    using (Stream response = myHttpWebResponse.GetResponseStream())
                    {
                        // Save each downloaded copy to disk, numbered by loop iteration
                        Image image = Image.FromStream(response);
                        image.Save(string.Format("D:\\Downloads\\image{0}.jpeg", x), ImageFormat.Jpeg);
                    }
                }

                Console.ReadKey();
            }
        }
    }
This console application makes 1,000 ultra-fast consecutive requests to the image file hosted on my Azure CDN endpoint and saves each copy to the D:\Downloads folder on my PC, with each file name corresponding to the for loop iteration value, i.e. image0.jpeg, image1.jpeg, etc.
So what happened? After about 1 minute it had cost me 140 MB of bandwidth. On the Premium CDN, priced at $0.17/GB, do the math: 0.14 GB per minute * 60 minutes * 24 hours * 30 days * $0.17/GB = $1,028.16 in bandwidth costs, just from someone (a competitor, say) running a single request loop against a single image for a month in order to jeopardize my site. I think you can see where I am going with this... there will be thousands of images on my website, in high resolution; by the way, the image I used in this example was only 140 KB in size. And these requests can come from anonymous proxies, etc.
So my question is: what can be done to prevent someone from abusing the CDN's publicly open endpoint? Obviously you cannot stay in business paying $5,000 or $20,000 for bandwidth generated by malicious requests.
Now, the Azure Premium CDN has an advanced rules engine that can filter requests based on the Referer header and respond with a 403 error if the Referer does not match your site. But the Referer can be spoofed, as in the code example above, and the CDN will still happily serve the request (I tested this with the Referer spoof). This sucks; many people use the Referer to prevent hotlinking, but when it comes to bandwidth abuse, what good is it if the Referer can be faked with a single line of code?
A few ideas I had for preventing this kind of abuse and the huge bandwidth bills:
* Both solutions will require action from the CDN:
When a request for content arrives at the CDN, the CDN could make a callback to the client's server, passing a) the requesting user's IP address and b) the requested CDN URI. The client's server would then check how many times that URI has been requested from that particular IP, and if the client-side logic sees that it was requested, say, 100 times within the last minute, that would clearly signal abuse, since browsers cache images while malicious requests do not. The CDN would then simply be told "do not serve" for that particular request. This would not be an ideal solution, since the additional callback to the client's infrastructure would add a small delay, but it is definitely better than being stuck with a bill that could rival the savings in your bank account.
The better solution: a built-in limit on the number of times a file can be served by the CDN per IP within a given time period. For example, for the image file above, it would help if the CDN could be configured to serve no more than, say, 50 requests for that image per IP over a 10-minute interval. If abuse is detected, the CDN could then, for a client-defined period, either a) serve a 403 for the specific abused URI, or b) serve a 403 for all URIs requested from the offending IP address. All times and thresholds should be left configurable by the client. This would definitely help. There is no callback here, which saves time. The downside is that the CDN would have to track URI / IP address / hit counts. A rough sketch of the kind of per-IP / per-URI counter I have in mind follows below.
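To illustrate what I mean by tracking hits per IP and URI (whether on the CDN edge or on your own server, as in the callback idea above), here is a minimal sketch of a fixed-window hit counter. HitCounter, IsAbusive and the 50 requests / 10 minutes threshold are purely my own illustration, not any existing CDN or Azure API:

    using System;
    using System.Collections.Generic;

    // Minimal sketch of a per-(IP, URI) hit counter using a fixed time window.
    // A real implementation would also need eviction of stale entries and
    // shared storage (e.g. a cache) if it runs on more than one machine.
    public class HitCounter
    {
        private class Entry
        {
            public DateTime WindowStart;
            public int Count;
        }

        private readonly int _maxHits;
        private readonly TimeSpan _window;
        private readonly Dictionary<string, Entry> _hits = new Dictionary<string, Entry>();
        private readonly object _lock = new object();

        public HitCounter(int maxHits, TimeSpan window)
        {
            _maxHits = maxHits;
            _window = window;
        }

        // Returns true when this request pushes the (ip, uri) pair over the threshold.
        public bool IsAbusive(string ip, string uri)
        {
            string key = ip + "|" + uri;
            DateTime now = DateTime.UtcNow;

            lock (_lock)
            {
                Entry entry;
                if (!_hits.TryGetValue(key, out entry) || now - entry.WindowStart > _window)
                {
                    // First hit, or the previous window has expired: start counting again.
                    entry = new Entry { WindowStart = now, Count = 0 };
                    _hits[key] = entry;
                }

                entry.Count++;
                return entry.Count > _maxHits;
            }
        }
    }

Usage would then be something like: create a counter with new HitCounter(50, TimeSpan.FromMinutes(10)) and respond with a 403 whenever IsAbusive(clientIp, requestUri) returns true.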
What solutions will NOT work:
Signed URLs will not work, because the signature query-string parameter is different every time, so browsers keep re-requesting the data, effectively destroying the browser cache for images.
Having a SAS (Shared Access Signature) on the Azure blob will not work either, because a) the URI is different every time and b) there is no limit on how many times a blob can be requested once a SAS has been issued. So the abuse scenario is still possible.
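For anyone unfamiliar with SAS, here is roughly what generating one looks like with the classic Azure Storage client library (Microsoft.WindowsAzure.Storage); the connection string, container and blob names are placeholders. Every URL you hand out this way carries a different query string, and nothing limits how often the holder can hit it before it expires:

    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    class SasExample
    {
        static void Main()
        {
            // Placeholder connection string / names, just for illustration.
            CloudStorageAccount account = CloudStorageAccount.Parse("<your-storage-connection-string>");
            CloudBlobClient client = account.CreateCloudBlobClient();
            CloudBlobContainer container = client.GetContainerReference("mystoragecontainer");
            CloudBlockBlob blob = container.GetBlockBlobReference("DSC00580_1536x1152.jpg");

            // Read-only SAS valid for 10 minutes.
            var policy = new SharedAccessBlobPolicy
            {
                Permissions = SharedAccessBlobPermissions.Read,
                SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddMinutes(10)
            };

            // The resulting URL changes on every call (different expiry/signature),
            // which defeats browser caching, and the blob can still be requested
            // an unlimited number of times while the signature is valid.
            string sasUrl = blob.Uri + blob.GetSharedAccessSignature(policy);
            Console.WriteLine(sasUrl);
        }
    }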
Checking your logs and simply banning IPs will not work either. Yesterday I tested this type of abuse through an anonymous proxy, and it worked like a charm: I switched IPs within seconds and continued the abuse (of my own content, for testing). So this goes nowhere unless you have a nanny watching the logs around the clock.
Solutions that would work but are not really feasible:
Filtering requests on your own web server. This would, of course, be the best way to control the problem: track the request count per IP and simply stop serving the content when abuse is detected. But then you lose the huge benefit of a CDN, namely delivering your content to clients over an ultra-fast, optimized network, and on top of that your own servers slow down from pushing out all those bytes for large files such as images. A sketch of what such a server-side filter could look like follows below.
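Just to make the idea concrete, here is a rough sketch of that server-side filtering as a classic ASP.NET generic handler, reusing the hypothetical HitCounter from the earlier sketch; the query-string parameter, image folder and thresholds are made up for illustration, and there is no input validation here:

    using System;
    using System.Web;

    // Rough sketch: serve images through your own handler instead of the CDN,
    // rejecting requests once an IP exceeds the threshold for a given image.
    public class ImageHandler : IHttpHandler
    {
        // Hypothetical counter from the earlier sketch: 50 hits per image per IP per 10 minutes.
        private static readonly HitCounter Counter = new HitCounter(50, TimeSpan.FromMinutes(10));

        public void ProcessRequest(HttpContext context)
        {
            string ip = context.Request.UserHostAddress;
            string imageName = context.Request.QueryString["name"]; // e.g. DSC00580_1536x1152.jpg

            if (Counter.IsAbusive(ip, imageName))
            {
                // Abuse detected: refuse to serve the bytes.
                context.Response.StatusCode = 403;
                return;
            }

            context.Response.ContentType = "image/jpeg";
            context.Response.WriteFile(context.Server.MapPath("~/images/" + imageName));
        }

        public bool IsReusable { get { return true; } }
    }

Of course, this routes all image traffic through your own servers, which is exactly the cost described above.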
Just bite the bullet and not worry about it. Well... then you know the pothole that will bend your wheel is just down the road, so no, it is not a comfortable option to live with.
With all of the above said, the Azure Premium CDN offering with its custom rules engine might have a solution hidden somewhere, but with very poor documentation and no examples you are left guessing at how to protect yourself, which is why I am writing this post. Has anyone dealt with this kind of problem, and how did you solve it?
Any suggestions are welcome; I am quite frustrated by this.
Thanks for reading.