Compare two XML files and output the difference with PHP

How to compare two XML files and email yourself the difference

Estimated reading time: 7 minutes

Sometimes product data feeds work perfectly, but most of the time they don’t. In this post we will discuss how you can compare two XML files, an old one and a new one, and output the difference between the two files. Finally, we will email the results to ourselves at 8 am every day.

Possible use case: You run an online store and need to use a data feed from a supplier. The data feed is in XML format and removes the product from the feed when it goes out of stock. This becomes problematic when you have different feeds running from different suppliers. You don’t want to remove every product on your website that does not appear in the feed, but you need to know which products from that supplier have been taken out of stock.

In this post I will use new.xml to refer to the newest XML file and old.xml to refer to the previous XML file.

Workflow
Step 1: Delete old.xml
Step 2: Rename new.xml to old.xml
Step 3: Get the latest XML file from the supplier and call it new.xml
Step 4: Make an array from old.xml with a unique identifier, usually the product barcode or SKU.
Step 5: Make an array from new.xml with the same unique identifier we used in the step above.
Step 6: Loop through the old products array and see if each unique identifier exists in the array of new products. If it does not, the product must have been taken out of stock.
Step 7: Loop through the new products array and see if each unique identifier exists in the array of old products. If it does not, the product must be new in stock.
Step 8: Send yourself an email with the out of stock products and the products that are new in stock.

The first three steps will be done with a cron job, so we will start at step 4.
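Before going through the steps one by one, here is a minimal sketch of steps 4 to 7 using PHP's built-in array_diff(). It is not the code used in the rest of this post, just another way to see the idea. The sample XML and the collectIds() helper are made up for illustration, and the Product and Id element names are assumed to match your feed.

```php
<?php
// Made-up sample feeds; the Product/Id element names follow this post.
$oldXml = simplexml_load_string(
    '<Products><Product><Id>1</Id></Product><Product><Id>2</Id></Product></Products>'
);
$newXml = simplexml_load_string(
    '<Products><Product><Id>2</Id></Product><Product><Id>3</Id></Product></Products>'
);

// Hypothetical helper: collect the ids of a feed as a flat list of integers (steps 4 and 5).
function collectIds(SimpleXMLElement $xml)
{
    $ids = array();
    foreach ($xml->Product as $product) {
        $ids[] = (int) $product->Id;
    }
    return $ids;
}

$oldIds = collectIds($oldXml);
$newIds = collectIds($newXml);

// In old but not in new: taken out of stock (step 6).
$outOfStock = array_values(array_diff($oldIds, $newIds));
// In new but not in old: new in stock (step 7).
$newInStock = array_values(array_diff($newIds, $oldIds));

// With the sample data, $outOfStock holds the id 1 and $newInStock holds the id 3.
print_r($outOfStock);
print_r($newInStock);
```

The loops used in the rest of this post do the same comparison, but they keep the full product at hand so that the name and barcode can be printed.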

Make an array of products from old.xml

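<?php
if (file_exists('old.xml')) {
    $oldxml = simplexml_load_file('old.xml');
    //simplexml_load_file returns false if the file is not valid XML
    if ($oldxml === false) {
        die("Old XML could not be parsed");
    }
    //Load all the old product IDs into an array
    $oldproductsarray = array();
    foreach ($oldxml->Product as $oldp) {
        //Add any other fields you need later to this array
        $productdata = array(
            'id' => (int)$oldp->Id
        );
        $oldproductsarray[] = $productdata;
    }
} else {
    die("Old XML does not exist");
}
?>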

The above PHP code first checks whether the file exists. If it does, it loads the file into a SimpleXML object called $oldxml; otherwise the script dies. Each entry in the multidimensional $oldproductsarray holds the Id element of one product from the XML file, stored under the key ‘id’. You will need to change this part to suit your XML file. Make sure you collect all the data that you will need later in your script.
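If you do need more than the id later on, one option is to key the array by product id, so that checking for a product is a single isset() call. This is a minimal sketch with made-up sample data; the $oldproducts name is hypothetical, and the Name and Barcode elements are assumed to exist in your feed as they do in the examples later in this post.

```php
<?php
// Made-up sample data; element names (Product, Id, Name, Barcode) follow this post.
$oldxml = simplexml_load_string(
    '<Products>'
    . '<Product><Id>10</Id><Name>Kettle</Name><Barcode>600123</Barcode></Product>'
    . '<Product><Id>11</Id><Name>Toaster</Name><Barcode>600456</Barcode></Product>'
    . '</Products>'
);

// Hypothetical array keyed by product id.
$oldproducts = array();
foreach ($oldxml->Product as $oldp) {
    // Cast every value: SimpleXML returns objects, not plain strings or integers.
    $oldproducts[(int) $oldp->Id] = array(
        'name'    => (string) $oldp->Name,
        'barcode' => (string) $oldp->Barcode,
    );
}

// Check for a product by id and read its stored data.
var_dump(isset($oldproducts[11]));
echo $oldproducts[10]['name'], "\n";
```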

Make an array of products from new.xml

<?php
if (file_exists('new.xml')) {
    $newxml = simplexml_load_file('new.xml');
    //simplexml_load_file returns false if the file is not valid XML
    if ($newxml === false) {
        die("New XML file could not be parsed");
    }
    //Load the new product IDs into an array
    $newproductsarray = array();
    foreach ($newxml->Product as $newprod) {
        $newproductsarray[] = (int)$newprod->Id;
    }
} else {
    die("New XML file does not exist");
}
?>

The above script loads the id of each product in new.xml into $newproductsarray. Unlike $oldproductsarray, this is a flat array of integer ids with no ‘id’ key.

Define a Bootstrap table 

<div class="table-responsive">
    <table class="table table-bordered">
        <tr>
            <th>Status</th>
            <th>Title</th>
            <th>Barcode</th>
        </tr>

We want to be able to display the data for testing purposes, making sure we get everything right before sending the email. An HTML table is the perfect way to display the data and Bootstrap makes it look pretty. The above snippet of HTML defines a Bootstrap table with some headings. The table is left open here because the loops below add its rows; remember to add the closing table and div tags after them.

Check to see if products are out of stock

<?php
//Loop through the old products
foreach($oldxml->Product as $oldproduct) {
    //Store the current product's id in $x as an integer
    $x = (int)$oldproduct->Id;
    //Test to see if the current product's Id is in the array of new products
    //if it is not then it must be out of stock
    if(!(in_array($x, $newproductsarray, true))) {
        echo '<tr class="danger">';
        echo '<td>Out of stock</td>';
        //Escape the values so characters like & and < do not break the HTML
        echo '<td>' . htmlspecialchars((string)$oldproduct->Name) . '</td>';
        echo '<td>' . htmlspecialchars((string)$oldproduct->Barcode) . '</td>';
        echo '</tr>';
    }
}
unset($oldproduct);//Clear the loop variable now that the loop has finished
?>

In the above PHP we loop through the $oldxml object that we created earlier and define a variable called $x, which holds the product’s id for the current iteration of the loop. We then test whether the value of $x is missing from the array of new products. If it is missing, the supplier must have taken that product out of the XML file, so it is out of stock. We then make a new table row for the product and echo out some data that will help us identify it.
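One detail worth checking in your own script: SimpleXML returns objects rather than plain values, so how in_array() compares them depends on the types involved. The sketch below uses made-up data to show that casting the id to the same type as the array values allows a strict comparison. The variable names are only for illustration.

```php
<?php
// Made-up single product; the Id element name follows this post.
$product = simplexml_load_string('<Product><Id>42</Id></Product>');
$newIds = array(41, 42); // integers, like the values in $newproductsarray

// The raw value is an object, not an integer.
$isObject = $product->Id instanceof SimpleXMLElement;

// Cast to int, then use strict mode (third argument) so the types must match too.
$foundAsInt = in_array((int) $product->Id, $newIds, true);

// The string "42" is not strictly equal to the integer 42.
$foundAsString = in_array((string) $product->Id, $newIds, true);

var_dump($isObject, $foundAsInt, $foundAsString);
```

Only the integer cast finds the id in strict mode, which is why it helps to cast the same way when you build the array and when you search it.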

Check to see if products are back in stock
This snippet of code is similar to the above code, but in reverse.

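<?php
//Collect the ids from the multidimensional array of old products into a flat array
//(array_column needs PHP 5.5 or newer)
$oldids = array_column($oldproductsarray, 'id');
//Loop through new products
foreach($newxml->Product as $newproduct){
    //Store the current product's Id in $y as an integer
    $y = (int)$newproduct->Id;
    //Test to see if the current product's Id is in the array of old product ids
    //if it is not it must be back in stock
    if(!(in_array($y, $oldids, true))) {
        echo '<tr class="success">';
        echo '<td>In stock</td>';
        //Escape the values so characters like & and < do not break the HTML
        echo '<td>' . htmlspecialchars((string)$newproduct->Name) . '</td>';
        echo '<td>' . htmlspecialchars((string)$newproduct->Barcode) . '</td>';
        echo '</tr>';
    }
}
unset($newproduct);//Clear the loop variable now that the loop has finished
?>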

Make a proper HTML document
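Our PHP file doesn’t have its HTML head or body definition yet, so let’s add that. This goes at the top of the file, before the table. Remember to close the container div, body and html tags at the very end of the file.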

<html>
<head>
    <title>Feeds = $$$$</title>
    <!-- Latest compiled and minified CSS -->
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css">
    <!-- Optional theme -->
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap-theme.min.css">
</head>
<body>
<div class="container-fluid">

Make sure you have put your PHP tags around all PHP code, or you will just output the code and not the results.

Test your output
Test the output of the script to make sure you are getting the right results. You should see a table that first outputs all the out of stock products in rows that have the ‘danger’ class, making them red. Then, all the products that are new in stock will be displayed in rows that have the ‘success’ class, making them green.

Send yourself an email with updates
We will now send ourselves an email with updates on the products. We can send an email every day at a certain time, or we can send it once a week. How often you send the email will depend on how much time you have to check products on your website.

<?php
//Define the recipient of the email (placeholder address, replace it with your own)
$to = "you@example.com";
//Define the email subject
$subject = "Your email subject";
$message = '<html><body>';
$message .= '<div class="table-responsive">
    <table style="width:100%;max-width:100%;">
        <tr>
            <th>Status</th>
            <th>Title</th>
            <th>Barcode</th>
        </tr>';
//Collect the ids from the multidimensional array of old products into a flat array
//(array_column needs PHP 5.5 or newer)
$oldids = array_column($oldproductsarray, 'id');

//Loop through old products
foreach($oldxml->Product as $oldproduct) {
    //Store the current product's id in $x as an integer
    $x = (int)$oldproduct->Id;
    //Test to see if the current product's Id is in the array of new products
    //if it is not then it must be out of stock
    if(!(in_array($x, $newproductsarray, true))) {
        $message .= '<tr style="background-color: #f2dede;">';
        $message .= '<td>Out of stock</td>';
        $message .= '<td>' . htmlspecialchars((string)$oldproduct->Name) . '</td>';
        $message .= '<td>' . htmlspecialchars((string)$oldproduct->Barcode) . '</td>';
        $message .= '</tr>';
    }
}
unset($oldproduct);//Clear the loop variable now that the loop has finished

//Loop through new products
foreach($newxml->Product as $newproduct){
    //Store the current product's Id in $y as an integer
    $y = (int)$newproduct->Id;
    //Test to see if the current product's Id is in the array of old product ids
    //if it is not it must be back in stock
    if(!(in_array($y, $oldids, true))) {
        $message .= '<tr style="background-color: #dff0d8;">';
        $message .= '<td>In stock</td>';
        $message .= '<td>' . htmlspecialchars((string)$newproduct->Name) . '</td>';
        $message .= '<td>' . htmlspecialchars((string)$newproduct->Barcode) . '</td>';
        $message .= '</tr>';
    }
}
unset($newproduct);//Clear the loop variable now that the loop has finished
//Close the table, the wrapping div and the document in the reverse order they were opened
$message .= '</table></div>';
$message .= '</body></html>';

//Define some headers for the email
$headers = "MIME-Version: 1.0" . "\r\n";
$headers .= "Content-type:text/html;charset=UTF-8" . "\r\n";
//Placeholder sender address, replace it with an address on your own domain
$headers .= 'From: <noreply@example.com>' . "\r\n";

//Send and test if the email sent
if(mail($to,$subject,$message,$headers)){
    echo "Email sent";
}else{
    echo "Email could not be sent";
}

?>

You will need to modify the above PHP so that it matches your arrays and XML files. If you have questions about what you will need to modify, please ask below. Remember that I will need information about your XML file, so please do not just ask “How do I modify the email?”.
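The rows for the web page and the rows for the email are built by almost identical loops. One way to avoid maintaining both is a small helper that returns the rows as a string, which you can then echo on the page or append to $message. This is only a sketch: buildRows() and its parameters are hypothetical names that are not part of the script above, and the sample data is made up.

```php
<?php
// Hypothetical helper: returns one table row per product whose id is NOT in $otherIds.
function buildRows(SimpleXMLElement $xml, array $otherIds, $label, $style)
{
    $rows = '';
    foreach ($xml->Product as $product) {
        if (!in_array((int) $product->Id, $otherIds, true)) {
            $rows .= '<tr style="' . $style . '">'
                . '<td>' . $label . '</td>'
                // Escape the values so characters like & and < do not break the HTML.
                . '<td>' . htmlspecialchars((string) $product->Name) . '</td>'
                . '<td>' . htmlspecialchars((string) $product->Barcode) . '</td>'
                . '</tr>';
        }
    }
    return $rows;
}

// Made-up sample feed; element names follow this post.
$oldxml = simplexml_load_string(
    '<Products>'
    . '<Product><Id>1</Id><Name>Cup &amp; Saucer</Name><Barcode>111</Barcode></Product>'
    . '<Product><Id>2</Id><Name>Plate</Name><Barcode>222</Barcode></Product>'
    . '</Products>'
);
$newIds = array(2); // ids still present in the new feed

// Product 1 is missing from the new feed, so it is the only row returned.
$rows = buildRows($oldxml, $newIds, 'Out of stock', 'background-color: #f2dede;');
echo $rows, "\n";
```

Calling the helper a second time with the new feed, the old ids and the ‘In stock’ label would produce the green rows.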

Automating the process

We don’t really want to move files around, fetch new files and load the web page every single day. We need the script to send us an email without any more work from us. For this step you will need shell access to your web server and some familiarity with the Linux command line.

Create a script
Make a new file called update.script in the same folder as your project files.

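#!/bin/sh
# Run from the project folder so the relative paths below also work under cron
cd /path-to-your-script/ || exit 1
rm -f old.xml
mv new.xml old.xml
# -O saves the download as new.xml
wget -O new.xml http://dta-feed-URL.com
php5 ./index.php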

The above script first removes old.xml, then renames new.xml to old.xml, then downloads the latest XML file and saves it as new.xml. Finally, the script executes your PHP file that does all the magic. Make the script executable (chmod +x update.script) so that cron can run it.
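One thing to watch with this approach is a failed download: the old file has already been replaced by the time you find out that the new one is missing or broken. If you would rather handle the rotation in PHP, the sketch below downloads to a temporary file first and only rotates the files when the download parses as XML. The rotateFeed() function, the temporary file name and the demo folder are all hypothetical, and the demo copies local files instead of a real feed URL.

```php
<?php
// Hypothetical helper: download to a temp file, and rotate only if it parses as XML.
function rotateFeed($url, $dir)
{
    $tmp = $dir . '/new.xml.tmp';
    // copy() accepts URLs when allow_url_fopen is enabled; it also works on local paths.
    if (!@copy($url, $tmp)) {
        return false;
    }
    libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    if (simplexml_load_file($tmp) === false) {
        unlink($tmp); // broken download: leave old.xml and new.xml untouched
        return false;
    }
    if (file_exists($dir . '/new.xml')) {
        rename($dir . '/new.xml', $dir . '/old.xml'); // replaces any existing old.xml
    }
    return rename($tmp, $dir . '/new.xml');
}

// Demo in a throwaway folder, using local files in place of the supplier's URL.
$dir = sys_get_temp_dir() . '/feed_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/new.xml', '<Products><Product><Id>1</Id></Product></Products>');
file_put_contents($dir . '/source.xml', '<Products><Product><Id>2</Id></Product></Products>');
file_put_contents($dir . '/broken.xml', '<Products><Product>');

$ok  = rotateFeed($dir . '/source.xml', $dir); // valid XML: files are rotated
$bad = rotateFeed($dir . '/broken.xml', $dir); // invalid XML: nothing changes
var_dump($ok, $bad);
```

After the two calls, old.xml holds the product with id 1 and new.xml holds the product with id 2, because the broken download was rejected.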

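To run this script at our scheduled time, we will make a cron job that looks something like the line below. As written it runs at 8 am from Monday to Friday; change 1-5 to * if you want it to run every day.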

0 8 * * 1-5 /path-to-your-script/update.script

To make sure you get your crontab right, you can use the Crontab Generator.

You will now have a script that compares two XML files and sends you an email with the difference between the two files.
