REST in WCF – Part X – Supporting Caching and Conditional GET
So far in this series (click here for an index of the complete series, as well as supporting screencasts), I have illustrated how to develop both a LO-REST, AJAX-Friendly service, as well as HI-REST services adhering to the unified API of HTTP. In the very first post, I touched on some aspects of REST, but I haven’t spent much time on the benefits of following a RESTful architectural style. I made mention of the fact that RESTful services follow the "way of the web". As it turns out, this proves to be quite powerful.
The first 2 sentences of Section 13 of the HTTP 1.1 Specification highlight this point quite well: "HTTP is typically used for distributed information systems, where performance can be improved by the use of response caches. The HTTP/1.1 protocol includes a number of elements intended to make caching work as well as possible." What can be gleaned from this is that HTTP, the underlying protocol used by the web, is explicit about how to support caching responses. Clients (such as browsers), proxies and web servers all participate in caching responses, providing the scalability required by applications running on the web. RESTful service architectures seek to take advantage of this infrastructure for their services.
Why Not SOAP?
You might be thinking to yourself: traditional web services run over HTTP, wouldn’t they take advantage of this caching infrastructure, as well? The answer lies in how SOAP was designed. The authors of SOAP had more than HTTP in mind when they built the specification. One of the goals of SOAP was to be able to pass SOAP messages over differing protocols. That dream is alive today… just take a look at WCF services running over TCP, Named Pipes or MSMQ. In order to make this work, SOAP acts in a protocol independent manner. Take the following SOAP service using the basicHttpBinding:
I used SvcUtil to create a client proxy for this service in a console application and called SayHello. Below you will see a screenshot of Fiddler, showing the HTTP traffic for this call:
I underlined some key aspects of this call. First of all, notice that, although our intent is to fetch some information, we have issued an HTTP POST. More about that later. How did I know what our intention was? I had to look in the body of the message. I underlined where the method name is embedded in the body of the message. In SOAP services, the intention of the call is the method. Lastly, notice that the scope of what I want to fetch is also embedded in the message. Is this a bad thing? Not at all. Embedding this information in the message makes SOAP protocol independent. The same message could be sent over TCP and work all the same. However, that very nature is what precludes this SOAP service from easily taking advantage of the web’s caching infrastructure. Keep this information in mind while you read the next few sections and it should become clear.
GET (and HEAD) is/are special in HTTP
HTTP GET and HEAD are considered special. The HTTP specification is explicit that unless the server prohibits caching, GET and HEAD shouldn’t have any side effects that would adversely affect the caching of the response (GETs and HEADs with querystring parameters are an exception). In other words, these two verbs are set up for caching. Section 9.1.1 of the HTTP 1.1 Specification states "In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested."
Section 9 of the HTTP 1.1 Specification lends further credence to the idea that GET is special with regard to caching. Within this section it is clear that responses to PUT and DELETE are not cacheable. Further, by default, responses to POST are not cacheable. However, the response to an HTTP GET is cacheable as long as it meets the caching requirements outlined in Section 13 of the HTTP 1.1 Specification.
Supporting Client Caching with a REST Service
As you may have gathered from the previous sections, caching is prevalent on the web. When you are authoring RESTful services, you need to be mindful of this. It is in your best interest to take advantage of caching, where possible, while giving the clients of your service appropriate hints on whether or not to cache and for how long. In this example, I will use IE as my client, as it is both prevalent and exhibits typical behavior. I will start by illustrating what the behavior is if you do not provide any caching information.
Below is the default implementation of my GetWine service.
As you can see, there are no cache-related HTTP headers set in the implementation. I am again going to use Fiddler to view the traffic over HTTP, as I issue 3 consecutive GET requests to this service from IE. Below are screenshots of IE, as well as Fiddler after 3 requests:
As you can see, only one request went to the server. Given no other information from the original response, IE cached it. The 2 subsequent requests were sourced from this cache. I have included the raw request and response below for reference:
In certain cases, you may want to direct your client not to cache any responses. For example, the data may be updated very often, or your intolerance for stale data may preclude you from allowing responses to be cached. In this case, you need only add one or two HTTP headers to let your clients know not to cache. The first header to set is Cache-Control, with a value of no-cache. To cover your bases, it is probably best to set a second header, Pragma, to no-cache as well. This will cover you for HTTP 1.0 caches that do not implement Cache-Control.
WCF offers a helper class called WebOperationContext that provides convenient access to the properties of the response (and the request, as well). You can use this class to set the Cache-Control and Pragma HTTP headers. See the updated implementation code below:
I used IE to again make 3 successive calls. Below you can see the Fiddler screenshots:
You will notice that this time all 3 requests were sent to the server. The response from the original (and each subsequent) call shows the Cache-Control and Pragma headers. The IE client rightfully abided by these directives and did not cache the response.
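To make the effect of these two headers concrete, here is a minimal, language-neutral sketch in Python (not the WCF implementation from this post). The function names are hypothetical, and the client-side check is a deliberate simplification: real caches apply many more rules than this.

```python
def no_cache_headers():
    """Response headers asking caches not to reuse the response."""
    return {
        "Cache-Control": "no-cache",  # understood by HTTP/1.1 caches
        "Pragma": "no-cache",         # fallback for HTTP/1.0 caches
    }


def may_serve_from_cache(response_headers):
    """Simplified client-side check (an assumption, not a full HTTP cache).

    Returns False when the response carries a no-cache or no-store
    directive, or the HTTP/1.0 Pragma: no-cache header.
    """
    cache_control = response_headers.get("Cache-Control", "")
    directives = {d.strip().lower() for d in cache_control.split(",") if d.strip()}
    pragma = response_headers.get("Pragma", "").strip().lower()
    if "no-cache" in directives or "no-store" in directives:
        return False
    return pragma != "no-cache"
```

With these headers on every response, a well-behaved client goes back to the server each time, which matches the three requests seen in Fiddler above.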
In the previous code, we told the client not to cache. While in certain cases this is the right thing to do, it does adversely affect the scalability of our solution. Perhaps we want to let our clients cache the response, but we want to give them a hint of how long the cache is likely to be good for. We can do this by setting an Expires header to a date and time. A date and time in the past essentially invalidates the cache. A date and time in the future provides guidance to the client that their cache is likely good until that date and time.
In the code below I set the Expires HTTP header to a time of 10 seconds in the future (GetFormattedCacheTime is simply a helper method I wrote):
In this demo, I make 3 successive calls within 10 seconds. I then wait 12 seconds and make another 3 successive calls. Here are the Fiddler screenshots:
You will notice that, although I made 6 calls, only 2 made it to the server and 4 were sourced from cache. I made 3 calls before the cache expired. The fourth call went to the server and re-established the cache for 10 more seconds.
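The same behavior can be reproduced with a small simulation. This is a hedged sketch in Python rather than WCF code: `expires_header` only plays the role that a helper like GetFormattedCacheTime would, and `ExpiringClientCache` and `get_wine` are hypothetical stand-ins for the browser cache and the service. The toy cache looks at nothing but the Expires header.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(now, seconds):
    """Format an Expires value `seconds` after `now` as an HTTP-date (GMT)."""
    return format_datetime(now + timedelta(seconds=seconds), usegmt=True)


class ExpiringClientCache:
    """Toy single-entry client cache that honours only the Expires header."""

    def __init__(self, fetch):
        self.fetch = fetch      # callable(now) -> (headers, body); the "server"
        self.entry = None       # (expires_at, body) once something is cached
        self.server_hits = 0

    def get(self, now):
        if self.entry is not None and now < self.entry[0]:
            return self.entry[1]            # still fresh: nothing is sent
        headers, body = self.fetch(now)     # missing or expired: ask the server
        self.server_hits += 1
        self.entry = (parsedate_to_datetime(headers["Expires"]), body)
        return body


def get_wine(now):
    # Hypothetical stand-in for the service: responses are good for 10 seconds.
    return {"Expires": expires_header(now, 10)}, "wine #1"


start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
cache = ExpiringClientCache(get_wine)
for offset in (0, 1, 2, 14, 15, 16):  # 3 calls, a 12 second wait, 3 more calls
    cache.get(start + timedelta(seconds=offset))
```

After the loop, `cache.server_hits` is 2: the first and fourth calls reach the server, and the other four are answered from the cache.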
You might have noticed the use of some traditionally weak words in the description of the previous example. I noted that we want to give the client "a hint of how long the cache is likely to be good for". The key words here are hint and likely. We are not guaranteeing that the cache will be good. Rather, we are stating that it will likely be good. The only way to ensure that the cache is good is to make a call to the server and check. However, if we make a call to the server, what is the benefit of the cache? We can look to some of the verbiage in Section 13 of the HTTP 1.1 Specification for the answer. The second paragraph begins with: "Caching would be useless if it did not significantly improve performance. The goal of caching in HTTP/1.1 is to eliminate the need to send requests in many cases, and to eliminate the need to send full responses in many other cases."
As you might have gathered, a safer, albeit less performant, solution is to issue what is known as a Conditional GET. In the case where the client cannot tolerate a stale cache and a "hint" is not good enough, it may issue a conditional request to the server. If the cached copy is still valid, the server will send a 304 "Not Modified" response with no entity body. The client can then safely source the data from its cache.
Another scenario is the case where the client abides by the caching hint, but the cache is expired. As opposed to simply throwing away the cache, the client can issue a Conditional GET to see if the cache is still good.
I am going to discuss a Conditional GET from the perspective of authoring a service that supports Conditional GET. There are 2 HTTP headers that are commonly used for Conditional GETs: If-None-Match and If-Modified-Since. As a service author, if you want to support Conditional GET, you should support both of these headers.
The If-None-Match header set by the client contains an entity tag that the server can compare against its own entity tag to see if the data has changed. Essentially, an entity tag represents the state of the entity at a point in time. Some folks use a hash of the data as their entity tag, while others maintain a GUID for each state of the data. In the second case, every time the data is updated a new GUID is generated to represent that state. The server checks the provided If-None-Match header value against its entity tag and if they are the same, it sends a 304 "Not Modified" with no entity body. If there is no match, the server sends the full response (including the entity body).
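Both entity tag strategies, and the comparison the server performs, can be sketched briefly. This is an illustrative Python sketch, not the code behind this post: every name in it is hypothetical, SHA-256 is just one possible hash, and the matcher ignores weak validators (the W/ prefix) and assumes tags contain no commas.

```python
import hashlib
import uuid


def etag_from_content(body: bytes) -> str:
    """Hash-based entity tag: identical bytes always produce the same tag."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def new_guid_etag() -> str:
    """GUID-based entity tag: a fresh, unique tag for each state of the data."""
    return '"' + str(uuid.uuid4()) + '"'


class VersionedWine:
    """Entity that regenerates its GUID tag whenever it is updated."""

    def __init__(self, name):
        self.name = name
        self.etag = new_guid_etag()

    def rename(self, name):
        self.name = name
        self.etag = new_guid_etag()  # new state, so a new tag


def if_none_match_matches(header_value: str, current_etag: str) -> bool:
    """True when the If-None-Match value lists the current tag (or is '*')."""
    candidates = [tag.strip() for tag in header_value.split(",")]
    return "*" in candidates or current_etag in candidates
```

One design difference worth noting: with a hash, data that is changed and then changed back ends up with its original tag again, whereas the GUID approach hands out a new tag on every update regardless of content.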
The If-Modified-Since header set by the client contains the date that the data in the client cache was last updated. The server checks this date against the date it has for when the data was last modified. If the data has not been modified since the date the server is provided, the server sends a 304 "Not Modified" with no entity body. If the data has been modified since that date, the server sends the full response (including the entity body).
You are now probably asking yourself where the client gets the data it sends in the If-Modified-Since and If-None-Match headers. Before I answer that, one thing needs to be clear: a Conditional GET can only be issued if a previous request was made. This makes sense in that we are calling the server to see if our cache is valid. We would have to have received a response previously that we could cache. With that said, the client headers are sourced from the server’s response to the previous request. The client would use the ETag header from the response for the If-None-Match header and the Last-Modified header for the If-Modified-Since header.
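That mapping is mechanical, as a tiny Python sketch shows. The function name and the sample header values are made up for illustration; only the header names come from HTTP.

```python
def conditional_headers(previous_response_headers):
    """Build Conditional GET request headers from an earlier response."""
    headers = {}
    if "ETag" in previous_response_headers:
        headers["If-None-Match"] = previous_response_headers["ETag"]
    if "Last-Modified" in previous_response_headers:
        headers["If-Modified-Since"] = previous_response_headers["Last-Modified"]
    return headers


# Hypothetical headers from a previous 200 response.
previous = {
    "ETag": '"3f2c9a"',
    "Last-Modified": "Mon, 01 Jan 2024 12:00:00 GMT",
}
request_headers = conditional_headers(previous)
```

If there was no previous response, there is nothing to copy, the result is empty, and the request is just an ordinary GET.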
The following is a step-by-step on how the whole process works:
- The client makes the original request to the server with no special HTTP headers
- The server sends a 200 response code along with both an ETag and a Last-Modified HTTP header
- The client makes a subsequent request to the server, passing the value from the ETag as an If-None-Match header and the value from the Last-Modified for the If-Modified-Since header.
- The server checks to see if the data has been modified since the date provided in the If-Modified-Since header. If it has, the server sends the full response, complete with the entity body.
- The server checks to see if the If-None-Match header matches its entity tag. If it does not match, the server sends the full response, complete with the entity body. If it does match, the server sends a 304 "Not Modified" status code to the client and suppresses the entity body. The client is thus notified that it can source the data from its cache.
Let’s take a look at how we can support this with code. Here is the updated code:
As you can see, for every request, the service sets an ETag and Last-Modified header. You will also notice that the service calls 2 helper methods to check the headers against the server data. Here are the helper methods:
The helper methods simply set the SuppressEntityBody property on the OutgoingWebResponseContext and set the HTTP status code to 304 "Not Modified" when their check passes, that is, when the ETags match or the data has not been updated since the date provided in the HTTP header.
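For readers who want to see the decision logic in isolation, here is a language-neutral sketch in Python. It is not the WCF code from this post: `respond` and its parameters are hypothetical, and it follows the step-by-step description given earlier, returning the full response if either check shows a change and a 304 otherwise. HTTP dates only carry whole seconds, so the sketch drops microseconds before comparing, and it assumes the incoming date is well formed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(request_headers, body, etag, last_modified):
    """Return (status, headers, body) for a GET, honouring conditional headers.

    `last_modified` must be a timezone-aware UTC datetime.
    """
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    # No conditional headers at all: an ordinary GET.
    if if_none_match is None and if_modified_since is None:
        return 200, headers, body

    # Entity tags differ: the data changed, so send the full response.
    if if_none_match is not None and if_none_match != etag:
        return 200, headers, body

    # Modified after the client's date: send the full response.
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        if last_modified.replace(microsecond=0) > since:
            return 200, headers, body

    # Nothing changed: 304 "Not Modified" with the entity body suppressed.
    return 304, headers, None


modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
status, first_headers, _ = respond({}, "wine #1", '"v1"', modified)
conditional = {
    "If-None-Match": first_headers["ETag"],
    "If-Modified-Since": first_headers["Last-Modified"],
}
revalidated = respond(conditional, "wine #1", '"v1"', modified)
```

Here `status` is 200 for the first request, and `revalidated` carries a 304 with no body because neither the tag nor the date has changed.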
You may be wondering how I implemented the ETag and LastModifiedDate for the Wine entity in the first place. What I chose to do is to maintain them as columns within the database table. I chose to use a trigger to maintain them. Essentially if a record is inserted the ETag and LastModifiedDate are set. If a record is updated, a new ETag is generated and the LastModifiedDate is reset. Here is a screenshot of the table and the trigger:
Let’s run our same example again, using Fiddler to illustrate the traffic. I will again make 3 successive calls, wait 12 seconds and make 3 more successive calls. Remember that I still have the Expires header set to 10 seconds in the future. Here is the HTTP traffic:
The first request and response looked like this:
Notice that the response to the original request includes a Last-Modified header and an ETag. After the cache went stale (the 12 second wait), the request and response looked like this:
Notice that the If-Modified-Since and If-None-Match headers were sent. If you look at the previous response, you will note that the values match between the ETag and If-None-Match and between the Last-Modified and the If-Modified-Since. Because the data was not modified, the dates matched, as well as the entity tags. The service then rightfully returned a 304 and suppressed the entity body. IE then was free to source the data from its cache. Also note that the cache was then re-established as fresh for another 10 seconds.
I hope this post was helpful in understanding how and why to support caching and Conditional GETs in your RESTful services with WCF. It is important to also note that you can take advantage of the ASP.NET caching infrastructure from your WCF service. You simply need to turn on ASP.NET compatibility mode. Specifically, all the caching-specific APIs at HttpContext.Current.Response can be used from your WCF service.