Cover V11, I02

Article

feb2002.tar

Deleting Temporary Files Created by Web Sites Using PHP

Bruno Pedro

Many Web sites use temporary files for one reason or another. Sometimes they generate images, send them to the browser, and then remove them. What if the user clicks on the "STOP" button? The script will abort and all those temporary files will become permanent. You could delete them manually, or periodically with crontab, or you could automate this task using PHP. The solution presented in this article takes advantage of PHP's ability to execute a specific task whenever a script terminates.

Traditional Solutions

Manual Removal

Typically, these temporary Web files are saved in a specific directory. Sometimes there is even an algorithm that generates the location and name of these files. The easiest solution is to manually remove the unwanted files. This solution, although easy, is very time consuming. In UNIX, we could delete all files on a given directory in the following way:

$ rm /some_dir/*
Periodic Removal

The second solution, and perhaps the most used one, is the creation of a process that is executed periodically to remove the files. In UNIX, the easiest way to accomplish this goal is with crontab. The following crontab entry illustrates a removal that would execute everyday at 00:00:

# remove all files at /some_dir
00 00 * * * root rm /some_dir/*
However, the chosen timing might not be enough to keep the amount of files and occupied space within a reasonable limit. Furthermore, at any given moment, there might not be files to delete, making the use of resources pointless. It would be great if there were some way to know whether the script was aborted and take the necessary measures.

The PHP Solution

Removal at the End of the Script

In PHP, this functionality is called register_shutdown_function(). Through this function, you can call a specific function anytime the script terminates. There is only one input argument: the name of the function to be executed at the end of the script. That function cannot have input arguments, so it must be carefully designed. As the function is executed upon script termination, there is no way of producing output.

In this way, you can direct the script execution to a specific function whenever it is terminated. This is a good solution when you don't care whether the script was aborted or terminated normally. It is also a good solution when you need to delete a file at the end of the script. As an example, I will analyze a script that first generates a file with a random name and then shows its contents on the browser. To begin, here is the script without the added functionality:

<?php

// create a random file and fill it with custom data
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
$fname = $rand.".txt";
$file = fopen($fname,"w");
fputs($file,"Test data");
fclose($file);

// simulate time consumption
for ($i=0;$i<1000000;$i++) {
   srand ((float) microtime() * 1000000);
   $rand = rand(1000,10000);
}

// send something to the browser
for ($i=0;$i<100;$i++) {
   srand ((float) microtime() * 1000000);
   $rand = rand(1000,10000);
   echo $rand."<br>\n";
}

// delete the file
unlink($fname);

?>
In this case, the file is removed at the end of the script. If, for any reason, the script aborts, the file is not removed. Test this script by clicking on the "STOP" button of your browser some seconds after you start it. To guarantee that the file is removed, we must implement the following code:

<?php

// this is the function responsible for deleting the unwanted file
function cleanexit() {

   // we need to use the variable that holds the file name
   global $fname;

   // delete the file
   // (here we have to use the complete path)
   unlink("/some_dir/".$fname);
}

// this is how we tell PHP to run the cleanexit() 
// function every time the script ends
register_shutdown_function("cleanexit");

// create a random file and fill it with custom data
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
$fname = $rand.".txt";
$file = fopen($fname,"w");
fputs($file,"Test data");
fclose($file);

// simulate time consumption
for ($i=0;$i<1000000;$i++) {
   srand ((float) microtime() * 1000000);
   $rand = rand(1000,10000);
}

// send something to the browser
for ($i=0;$i<100;$i++) {
   srand ((float) microtime() * 1000000);
   $rand = rand(1000,10000);
   echo $rand."<br>\n";
}

?>
The changes are obvious. The code responsible for file removal is inside the cleanexit() function. This function becomes associated to the script termination event through the register_shutdown_function(). Inside the cleanexit() function, there is no way to get environment variables and settings, so you must use the complete path when referring to the file.

Conditional Removal

What if we only want to delete the file when the script is aborted? There is a PHP function that indicates the connection status: connection_status() (obviously). This function has no arguments and returns a bit-mask of the following values: 0 - NORMAL, 1 - ABORTED, 2 - TIMEOUT. The easiest way to use this function is to call it from within the cleanexit() function to check the connection status. This is the ideal solution when you want to distinguish between the normal status (NORMAL) and any other abnormal status (ABORTED or TIMEOUT).

There are two other functions that allow you to do an isolated verification of the connection status: connection_aborted() and connection_timeout(). These functions take no arguments and return TRUE or FALSE depending on the status. Sometimes coding becomes easier to understand when using these functions instead of the previous one. As an example, check the following piece of code:

<?php

// this is the function responsible for deleting the unwanted file
function cleanexit() {

   // we need to use the variable that holds the file name
   global $fname;

   // check if user aborted the connection
   if (connection_aborted())

      // delete the file
      // (here we have to use the complete path)
      unlink("/some_dir/".$fname);
}

// this is how we tell PHP to run the cleanexit() function every 
// time the script ends
register_shutdown_function("cleanexit");

// create a random file and fill it with custom data
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
$fname = $rand.".txt";
$file = fopen($fname,"w");
fputs($file,"Test data");
fclose($file);

// simulate time consumption
for ($i=0;$i<1000000;$i++) {
   srand ((float) microtime() * 1000000);
   $rand = rand(1000,10000);
}

// send something to the browser
for ($i=0;$i<100;$i++) {
   srand ((float) microtime() * 1000000);
   $rand = rand(1000,10000);
   echo $rand."<br>\n";
}

?>
This time the file is removed only if the connection is aborted.

Avoiding the Problem (Instead of Fixing It)

Another way to solve the problem of scripts terminating early and leaving temporary files around is to force the script to execute until the end, even if the user stops it or if it takes too long to finish. This is not the best solution, but I'll present it so you know all the options. The following sections discuss:

  • Ignoring a connection break
  • Changing the execution time limit

Of course, an option that forces the script to execute until the end will only dispose of temporary files if you add the necessary cleanup code at the end of the script.

Ignoring Connection Break

Let's consider the connection break first (what happens when, for instance, the user clicks on the "STOP" button of the browser). We want to ignore that action and keep running until the very end of the script. To do that, we use the ignore_user_abort() PHP function. This function can be used in two different ways: a) without arguments it returns the previous status (whether we are or aren't ignoring the connection break); b) when we want to change the status of this behavior we can use the argument with the values TRUE or FALSE. In the following example, we change the behavior so that the script simply ignores the fact that the user has broken the connection.

<?php

// ignore user abort
ignore_user_abort(TRUE);

// create a random file and fill it with custom data
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
$fname = $rand.".txt";
$file = fopen($fname,"w");
fputs($file,"Test data");
fclose($file);

// simulate time consumption
for ($i=0;$i<1000000;$i++) {
   srand ((float) microtime() * 1000000);
   $rand = rand(1000,10000);
}

// send something to the browser
for ($i=0;$i<100;$i++) {
   srand ((float) microtime() * 1000000);
   $rand = rand(1000,10000);
   echo $rand."<br>\n";
}

// delete the file
unlink($fname);

?>
Even if the user presses the "STOP" button, the script will execute until its last line of code and will remove the file.

Changing the Execution Time Limit

There is yet another important issue: time limitations. By default, PHP allows scripts to execute for 30 seconds, or the value that is in the max_execution_time configuration variable. If a script takes too long to finish, it may terminate before the end because it exceeds the time limitation. To avoid this scenario, you can change that limit by using the set_time_limit() function.

This function can only be executed if PHP is not running on safe mode and should be called at the very beginning of the script. When the function is called, time limit is reset and the script ends when that limit is reached. Note that only script execution time is accounted for. If we call any external scripts or utilities, that spent time is not taken into account.

We could change the limit to an incredibly high value but it still might not be enough. If we don't know exactly how long the script is going to take to finish, we can use the input argument value 0 to indicate an infinite time limit. This is a dangerous choice because the script can run forever and consume precious resources. Careful testing and debugging must be done before using this feature. Consider the following example:

<?php

// ignore user abort
ignore_user_abort(true);

// set time limit to infinite
set_time_limit(0);

// create a random file and fill it with custom data
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
$fname = $rand.".txt";
$file = fopen($fname,"w");
fputs($file,"Test data");
fclose($file);

// simulate time consumption
for ($i=0;$i<1000000;$i++) {
   srand ((float) microtime() * 1000000);
   $rand = rand(1000,10000);
}

// send something to the browser
for ($i=0;$i<100;$i++) {
   srand ((float) microtime() * 1000000);
   $rand = rand(1000,10000);
   echo $rand."<br>\n";
}

// delete the file
unlink($fname);

?>
Conclusion

There are many ways to deal with temporary files using PHP. Choose the best solution for your specific case, and always consider resources and how you might need to modify the scripts. In the future, I hope to see many commercial Web sites using PHP's abilities, freeing administrators from such tedious tasks such as "deleting temporary files".

Bruno Pedro, co-founder and manager of ethernet Ida., is a Systems Engineer with 10-years experience in database-related applications. He was an early adopter of Linux and has been using open source technology since then. He has been developing applications for the Internet since 1995.