Demystifying The Code

Working with Azure Table Storage from PHP

Introduction

Windows Azure Table Storage is a non-relational structured storage system in the cloud that offers massive scalability, durability and high availability.  The service is exposed with a RESTful API.  As such, it is easily consumable from a variety of platforms, including PHP.  In this post, I will illustrate how to consume Azure Table Storage via the RESTful API.  It is important to note that on July 31, 2009 we will reach Milestone 2 on the PHP SDK for Windows Azure.  Milestone 2 focuses on support for Azure Table Storage.  Accordingly, in all of the code you see in this blog post and the accompanying screencasts (Part I and Part II) illustrate accessing Azure Table Storage the “hard way”.  Much of the work I had to do by hand in the accompanying example here will be taken care of for you by the SDK.  That said, let’s dig in…

 

PHP Interop Series

This is part of a small series of blog posts and screencasts that I am doing on interoperability between PHP and the Microsoft stack.  My goal is to illustrate some features of the MS stack that PHP developers can take advantage of, as well as to illustrate how it is done.

 

Watch the Screencasts

I have created 2 screencasts that accompany this blog post that you can watch.  You can view them here:

 

What is Windows Azure Table Storage

As stated in the intro, Windows Azure Table Storage is a non-relational structured storage system in the cloud that offers massive scalability, durability and high availability.  While the goal of this post is not to dig deep into what Azure Table Storage is, I think it is best to at least provide a high-level overview of what it is and why you should care.  Looking at the first sentence in this paragraph, there are 4 major points that need to be covered: 1) non-relational storage system, 2) in the cloud, 3) massive scalability and 4) durability, high availability.

Non-Relational Structured Storage System

It is important to note that while we refer to this storage abstraction as tables, they do not conform to tables in the traditional sense.  To start with, the container for tables is not a database, rather a Storage Account.  You establish this storage account in the Azure Services Developer Portal (see this blog post for full details).  The Storage Account acts as the parent namespace for Table Storage, therefore you will not have any naming collisions with others that have similarly named tables.

A second major difference is that Azure tables do not conform to the traditional rows and columns structure.  Azure tables contain entities.  I think of entities as property bags.  These entities contain 3 or more properties which are typed name-value pairs.  I say 3 or more properties because every entity in Azure Tables must contain a PartitionKey, a RowKey and a TimeStamp property (I will go into a bit more detail on the role of these properties later in this post).  An interesting byproduct of this design is that tables are completely schema-less.  For instance, you are not precluded from adding 2 entities into the same table that have completely different schemas.  I do not want to address the inevitable debate that surfaces whenever this statement comes up, but suffice it to say that it is up to the developer to protect the schema in their tables.  The figure below illustrates how it is structured.

image

A third major difference is that there are no relational capabilities in Windows Azure Tables.  Many people look at these differences (between this structured storage system and a traditional RDBMS system) and focus on what is missing.  The simple reality is that Azure Table Storage is what it is… which is a structured storage system that provides massive scalability.

In the Cloud

Windows Azure is a cloud platform.  The data in Windows Azure Table Storage is stored in Microsoft Data Centers around the world and are accessible via HTTP.

Massive Scalability

The typical means of scaling databases is to scale up – buy bigger and bigger boxes.  However, after a while you run out of either bigger boxes or money.  At that point you have to scale out.  In the database world, this process is know as horizontal partitioning or sharding.  In RDBMS tables, this means that differing rows of data are stored on different physical partitions.  For example, I might store a subset of customers and their orders on one partition and another subset on another (perhaps customers with last names A-M on one and N-Z on another – or perhaps customers in Europe on one, customers in Asia on another …).

You might correctly infer that the more flexible you are in your ability to move data from one partition to another, the more scalability your system will have.  For instance, at one extreme, if all of your data must be on the same partition, you can only scale up (not out) and when you reach that threshold, you can scale no more.  At the other extreme, if you are able to store each record / entity on a differing partition, you have infinite scalability.  Now don’t go getting excited.  When data is stored across different partitions, it is quite costly to query it.  You are forced to query each partition independently.  As a result, you need to take special care to ensure that data that you want to query together remains on the same partition, while still allowing for the scalability required.

As mentioned earlier, every entity in an Azure Table is required to have a PartitionKey property.  This PartitionKey provides you with the control to determine which records must be stored on the same partition.  Windows Azure is free to move Entities with different PartitionKeys to differing nodes when it needs to, for scaling or load balancing purposes.  If you think about it, that is pretty cool.  You don’t have to worry about moving the data about.  Azure will handle that for you.  However, you do need to realize that if you query data across PartionKeys, it will be rather expensive.  It is the classic give-take.

Durability, High Availability

Data in Windows Azure is replicated at least 3 times.  Further, as described above, Windows Azure will move data about for load balancing purposes (as well as for scaling).

 

The REST API

I published a blog post a month or so ago that described ad nauseum the REST API for Windows Azure Table Storage.  I would urge you to take a look at that post, or at least watch the first screencast that accompanies this post.  The understanding of the REST API, as well as the message signing process was key to me to be able to put together the following code.

 

Writing the PHP Implementation

Caveat (again)

I realize I have stated this before, but I want to mention again that on July 31, 2009, we will be releasing Milestone 2 of the PHP SDK for Windows Azure.  This milestone will include code to greatly ease the process of working with Azure Table Storage.  Much of the following code will be unnecessary.  That said, I think it is very cool to illustrate accessing Table Storage from PHP without the SDK.  It illustrates the benefits of exposing an open RESTful API.

Wine.php

In my code, I want to work with classes.  To that end, I first created a Wine class as seen here:

<?php
        class Wine
        {
            public $ShortWineName = "";
            public $WineLabelUrl = "";
            public $BottlePrice = 0.00;
            public $Description = "";
            public $Vintage = "";
            public $WineID = 0;

            public $PartitionKey = "";
            public $RowKey = "";
            public $TimeStamp;

            public function __construct()
            {

            }
        }
?>        
PHPTableStorageHelper.php

This class is a little helper I wrote.  It performs the following functions: 1) Calculates the appropriate URI for my Table (and page) 2) Signs the message, 3) Creates the HTTP Headers for the Request (including the Authorization header containing the signed message, 4) Issues the Request using CURL, 5) Parses the Response HTTP Headers and 6) Returns the Reponse Body.

GetResponse Method Signature
    public static function GetResponse($account_name,
             $shared_key,
            $table_name,
            $pageSize,
            &$nextPartition,
            &$nextRow){

        //Get the base uri
        $service_uri = self::getUri($account_name, $table_name);

        //Limit the results by the pageSize
        $query = '';
        if ($pageSize > 0)
            $query = '?$top='.$pageSize;

        //Add a filter for paging (if the partition and row were passed
        if (strlen($nextPartition) > 0 && strlen($nextRow) > 0)
        {
            //See if we have already created a query
            if (strlen($query) > 0)
                $query .= '&';
            else
                $query .= '?';

            //Add the NextPartitionKey and NextRowKey QueryString parameters
            $query .= 'NextPartitionKey'.'='.rawurlencode($nextPartition);
            $query .= '&NextRowKey'.'='.rawurlencode($nextRow);
        }

        $service_uri .= $query;

        //Get the date (GMT)
        $date = self::GetGMTDate();

        //Get the signature
        $signed_message = self::GetSignature($date, $account_name, $table_name, $shared_key);

        //Get the http headers array
        $headers = self::GetHttpHeaders($date, $signed_message);

        $session = curl_init($service_uri);
        curl_setopt($session, CURLOPT_HEADER, 1);

        curl_setopt($session, CURLOPT_HTTPHEADER, $headers);
        curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
        $response = curl_exec($session)
            or die('Curl error: ' . curl_error($session));

        //Split the response into headers and the actual response body
        list( $outheaders, $body )= explode( "\r\n\r\n", $response, 2 );

        //Split the outheaders into an array and set the ref arguments
        $outheadersArray = self::http_parse_headers($outheaders);
        if (array_key_exists('x-ms-continuation-NextPartitionKey', $outheadersArray))
            $nextPartition = $outheadersArray['x-ms-continuation-NextPartitionKey'];
        else
            $nextPartition = "";
        if (array_key_exists('x-ms-continuation-NextRowKey', $outheadersArray))
            $nextRow = $outheadersArray['x-ms-continuation-NextRowKey'];
        else
            $nextRow = "";

        curl_close($session);

        return $body;
    }

The core of this helper class is the GetResponse method.  This method takes in the account name containing the table to query, the table name, the shared key for use in signing the method and some arguments used for paging.  I realize that this method doesn’t allow you to really query a table, rather just return pages of records, but that is ok for the purposes of this post.

Calculate the appropriate URI

The class declares a constant used as a template for the URI to our table.  It is as follows:

    const uriTemplate = "http://%s.table.core.windows.net/%s";

The first placeholder (%s) in the template is replaced with the account name, while the second is replaced with the table.  The first line of code in the method simply does this replacement:

    $service_uri = self::getUri($account_name, $table_name);

The next bit of code simply adds a few querystring parameters for paging:

    //Limit the results by the pageSize
    $query = '';
    if ($pageSize > 0)
        $query = '?$top='.$pageSize;

    //Add a filter for paging (if the partition and row were passed
    if (strlen($nextPartition) > 0 && strlen($nextRow) > 0)
    {
        //See if we have already created a query
        if (strlen($query) > 0)
            $query .= '&';
        else
            $query .= '?';

        //Add the NextPartitionKey and NextRowKey QueryString parameters
        $query .= 'NextPartitionKey'.'='.rawurlencode($nextPartition);
        $query .= '&NextRowKey'.'='.rawurlencode($nextRow);
    }

    $service_uri .= $query;
Sign the message

The next 2 lines of code get the signed message that is to be used for the Authorization HTTP header.

    //Get the date (GMT)
    $date = self::GetGMTDate();

    //Get the signature
    $signed_message = self::GetSignature($date, $account_name, $table_name, $shared_key);

Firstly, we get the current date in GMT format.  Next we call a helper method that creates a string to sign and signs it with the private key.  Here is that helper method:

    public static function GetSignature($date, $account_name, $table_name, $key){

        $message_to_sign = "$date\n/$account_name/$table_name";
        $signed_message = base64_encode(hash_hmac('sha256',
                                                  $message_to_sign,
                                                  base64_decode($key),true));

        return "SharedKeyLite $account_name:".$signed_message;
    }
Create the HTTP Headers

The next line after returning the signature simply calls a helper method to return an array containing the HTTP headers:

    //Get the http headers array
    $headers = self::GetHttpHeaders($date, $signed_message);

Here is the helper method:

    public static function GetHttpHeaders($date, $signed_message){
            return array("x-ms-date:$date",
                         "Authorization:$signed_message",
                         "Accept:application/atom+xml,
                         application/xml");
    }
Issue the Request

The next few lines use CURL to issue the HTTP GET Request to the URI we calculated, passing the HTTP headers

    $session = curl_init($service_uri);
    curl_setopt($session, CURLOPT_HEADER, 1);

    curl_setopt($session, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($session)
        or die('Curl error: ' . curl_error($session));
Parse the Response HTTP Headers

The next few lines parse the HTTP Headers from the response.  CURL doesn’t give you access to an array of headers.  You have to do this yourself.  We have to in order to parse out the continuation headers that are returned.  These provide a pointer to us for paging.  Remember earlier when we added the querystring parameters?  We used the values passed here from a previous Response.

    //Split the response into headers and the actual response body
    list( $outheaders, $body )= explode( "\r\n\r\n", $response, 2 );

    //Split the outheaders into an array and set the ref arguments
    $outheadersArray = self::http_parse_headers($outheaders);
    if (array_key_exists('x-ms-continuation-NextPartitionKey', $outheadersArray))
        $nextPartition = $outheadersArray['x-ms-continuation-NextPartitionKey'];
    else
        $nextPartition = "";
    if (array_key_exists('x-ms-continuation-NextRowKey', $outheadersArray))
        $nextRow = $outheadersArray['x-ms-continuation-NextRowKey'];
    else
        $nextRow = "";

Here is the helper method http_parse_headers:

    static function http_parse_headers($headers=false){
        if($headers === false){
            return false;
        }

        $headers = str_replace("\r", "", $headers);
        $headers = explode("\n",$headers);

        foreach($headers as $value){
            $header = explode(": ", $value);

            if(count($header) == 1){
                $headerdata['status'] = $header[0];
            }
            elseif($header[0] && $header[1]){
                $headerdata[$header[0]] = $header[1];
            }
        }

        return $headerdata;
    }
Return the Response Body

The last 2 lines simply clean up and return the response body.

    curl_close($session);

   
    return $body;

WineContext.php

This class calls our helper class above, parses the ATOM Formatted response and returns a collection of the Wine class.  I’m not going to go into any detail, as it is pretty straightforward:

    public static function GetWinePage($table_name, $pageSize, &$nextPartition, &$nextRow)
    {
        include_once('config.php');

        $wines = array();
        $response = PHPTableStorageHelper::GetResponse($accountName, $sharedKey, $table_name, 
	$pageSize, $nextPartition, $nextRow);

        //Load the xml into simplexml and register the namespaces
        //referenced in the xml
        $dom = new SimpleXMLElement($response);

        //Create vars for the namespaces.
        //Calls to children takes the namespace (if exists)
        $metadataNS = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";
        $dataservicesNS = "http://schemas.microsoft.com/ado/2007/08/dataservices";

        //Iterate over the entity nodes
        foreach($dom->entry as $entry) {
            //Get a reference to the properties node
            $properties = $entry->content->children($metadataNS)->children($dataservicesNS);

            //Create the wine
            $wine = new Wine();
            $wine->WineID = (int)($properties->WineID);
            $wine->ShortWineName = (string)($properties->ShortWineName);
            $wine->Description = (string)($properties->Description);
            $wine->BottlePrice = (float)($properties->BottlePrice);
            $wine->Vintage = (string)($properties->Vintage);
            $wine->WineLabelUrl =(string)($properties->WineLabelUrl);

            //Add the wine to the array
            $wines[] = $wine;
        }

        return $wines;
    }

Catalog.php

Now on to consuming our classes.  I’m not going to get into all of the paging logic and UI code.  At the end of this post is a link where you can download all of the files.  The paging code should be pretty self-explanatory.  I do want to show you the code that calls our context and writes the HTML, though.  Firstly, I included the context and the wine classes:

<?php
    include('Wine.php');
    include('WineContext.php');
?>

Next, at the appropriate place in the HTML, I call our context and return the array of wines.  I then iterate over the results, outputting the appropriate HTML.

<?php

 

  //Get the data
  $table_name="Wines";

  $wines=CohoStorageData::GetWinePage($table_name,2,$nextPartition,$nextRow);

  //Add the next row returned from the service to the array   $nextRowArray[] =$nextRow;

  //Iterate over the results   foreach ($wines as $wine)   {


        echo'<tr>';

      echo   '<td>';

      echo      '<table width="100%" border="0" align="center" cellpadding="8" cellspacing="0" class="thinblueline" style="background-color: White;">';

      echo           '<tr>';

      echo               '<td valign="top">';

      echo                   '<img src="'.$wine->WineLabelUrl .'" />';

      echo               '</td>';

      echo               '<td valign="top">';

      echo                   '<strong>';

      echo                        $wine->ShortWineName;

      echo                   '</strong>';

      echo                   '<br />';

      echo                   '<span>';

      echo                        $wine->Description;

      echo                   '</span>';

      echo                   '<br />';

      echo                   '<br />';

      echo               '</td>';

      echo               '<td width="80" align="right" valign="top">';

      echo                   '<strong>';

      echo                       '$'.number_format($wine->BottlePrice,2,'.',' ');

      echo                   '</strong>';

      echo                   '<br />';

      echo                   '<br />';

      echo                   '<img id="ImageButton1" productid="'.$wine->WineID .'" cssclass="noUnderline" onclick="ItemSelected" src="images/selectbutton.gif" />';

      echo               '</td>';

      echo           '</tr>';

      echo       '</table>';

      echo     '</td>';

      echo'</tr>';                                                }
?>

Get the code

You can download the code here.  Don’t forget to add the appropriate account name and key in config.php.  Also, this code has no provisions for creating the tables or adding the records.  If you want direction on that, please follow this blog series.  It will illustrate how to do it (in .NET not PHP).  Otherwise, you can wait until July 31 (or shortly thereafter) and I will illustrate these features in the SDK for PHP.

Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!

Demystifying The Code