Deleting Temporary Files Created by Web Sites Using PHP
Bruno Pedro
Many Web sites use temporary files for one reason or another.
Sometimes they generate images, send them to the browser, and then
remove them. What if the user clicks the "STOP" button?
The script may abort before it reaches its cleanup code, and those
temporary files stay behind. You could delete them manually, or
periodically with cron, or you could automate this task using PHP. The solution
presented in this article takes advantage of PHP's ability
to execute a specific task whenever a script terminates.
Traditional Solutions
Manual Removal
Typically, these temporary Web files are saved in a specific directory.
Sometimes there is even an algorithm that generates the location
and name of these files. The easiest solution is to manually remove
the unwanted files. This solution, although easy, is very time-consuming.
In UNIX, we could delete all files in a given directory in the following
way:
$ rm /some_dir/*
Periodic Removal
The second solution, and perhaps the most used one, is the creation
of a process that is executed periodically to remove the files.
In UNIX, the easiest way to accomplish this goal is with cron.
The following entry, written in the format of the system-wide
/etc/crontab (which includes a user field, here root), removes the
files every day at 00:00:
# remove all files at /some_dir
00 00 * * * root rm /some_dir/*
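A blind rm can also delete a file that a running script is still using. As a hedged alternative, the cron entry could run a small PHP script that removes only files older than a chosen age. This is a minimal sketch (PHP 7+), assuming the temporary files sit directly in one directory; the function name remove_stale_files() and the threshold are illustrative, not part of the original setup.

```php
<?php
// Hypothetical cleanup helper for a cron job: delete regular files in $dir
// whose last modification is older than $maxAge seconds.
// Returns the number of files removed.
function remove_stale_files(string $dir, int $maxAge): int
{
    $removed = 0;
    $cutoff = time() - $maxAge;
    // glob() returns false on error, so fall back to an empty list
    foreach (glob(rtrim($dir, '/') . '/*') ?: [] as $path) {
        // skip subdirectories and recently modified files (possibly in use)
        if (is_file($path) && filemtime($path) < $cutoff && unlink($path)) {
            $removed++;
        }
    }
    return $removed;
}
```

The script run by cron would then call, for example, remove_stale_files('/some_dir', 3600) to clear files untouched for an hour; where that script lives is up to you.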
However, the chosen schedule might not be frequent enough to keep the number
of files and the disk space they occupy within a reasonable limit. Furthermore,
at any given moment there might be no files to delete, so the job consumes
resources for nothing. It would be better if there were some way
to know whether the script was aborted and take the necessary measures.
The PHP Solution
Removal at the End of the Script
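PHP provides this functionality through register_shutdown_function().
With it, you can have a specific function called whenever the script
terminates. Its first argument is the function to execute at the end
of the script (in this article, its name as a string). In current PHP
versions, any further arguments are passed to that function when it
runs; otherwise the function must find what it needs, such as the name
of the file to delete, on its own, so it must be carefully designed.
Because it runs while the script is finishing, possibly after the user
has already disconnected, it should not count on the browser receiving
any output it produces.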
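In current PHP versions (7.1 or later is assumed here), the file name can be handed to the shutdown function as an extra argument of register_shutdown_function(), which avoids a global variable. A minimal sketch; the helper remove_temp_file() and the demo file name are hypothetical.

```php
<?php
// Hypothetical helper: delete the file if it still exists.
function remove_temp_file(string $path): void
{
    // the script may already have deleted the file itself
    if (is_file($path)) {
        unlink($path);
    }
}

// Hypothetical temporary file, built with its complete path.
$fname = sys_get_temp_dir() . '/demo_' . getmypid() . '.txt';
file_put_contents($fname, 'Test data');

// Arguments after the callback are passed to it at shutdown,
// so remove_temp_file($fname) runs when the script terminates.
register_shutdown_function('remove_temp_file', $fname);
```

An anonymous function works as well, for example register_shutdown_function(function () use ($fname) { remove_temp_file($fname); });.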
Using register_shutdown_function(), you can direct the script execution
to a specific function whenever it terminates. This is a good solution
when you don't care whether the script was aborted or terminated
normally, and when you need to delete a file at the end of the
script. As an example, I will analyze a script that first generates
a file with a random name, then spends some time working and sends
output to the browser. To begin, here is the script without the
added functionality:
<?php
// create a random file and fill it with custom data
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
$fname = $rand.".txt";
$file = fopen($fname,"w");
fputs($file,"Test data");
fclose($file);
// simulate time consumption
for ($i=0;$i<1000000;$i++) {
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
}
// send something to the browser
for ($i=0;$i<100;$i++) {
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
echo $rand."<br>\n";
}
// delete the file
unlink($fname);
?>
In this case, the file is removed at the end of the script. If, for
any reason, the script aborts first, the file is not removed. Test this
script by clicking the "STOP" button of your browser a few seconds
after you start it; note that PHP generally notices the abort only when
it next tries to send output to the browser. To make sure the file is
removed whether the script finishes or is aborted, we can register a
shutdown function:
<?php
// this is the function responsible for deleting the unwanted file
function cleanexit() {
// we need to use the variable that holds the file name
global $fname;
// delete the file
// ($fname already holds the complete path; the working
// directory may differ while shutdown functions run)
unlink($fname);
}
// this is how we tell PHP to run the cleanexit()
// function every time the script ends
register_shutdown_function("cleanexit");
// create a random file and fill it with custom data
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
$fname = "/some_dir/".$rand.".txt";
$file = fopen($fname,"w");
fputs($file,"Test data");
fclose($file);
// simulate time consumption
for ($i=0;$i<1000000;$i++) {
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
}
// send something to the browser
for ($i=0;$i<100;$i++) {
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
echo $rand."<br>\n";
}
?>
The changes are small. The code responsible for removing the file
is now inside the cleanexit() function, which register_shutdown_function()
associates with script termination. While shutdown functions run,
the working directory is not guaranteed to be the script's own (the
PHP manual notes that it can change under some web servers, such as
Apache), so the file must be referred to by its complete path.
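One way to obtain a complete path is tempnam(), which creates a file with a unique name and returns its full path (or false on failure). It also avoids two requests colliding on the same number drawn from rand(1000,10000). A minimal sketch (PHP 7+), assuming the system temporary directory is acceptable; create_temp_file() and the 'web' prefix are illustrative names.

```php
<?php
// Hypothetical helper: create a uniquely named temporary file holding $data
// and return its complete path, safe to use inside a shutdown function.
function create_temp_file(string $data): string
{
    // tempnam() creates the file itself, so no other request can claim the name
    $path = tempnam(sys_get_temp_dir(), 'web');
    if ($path === false) {
        throw new RuntimeException('could not create a temporary file');
    }
    file_put_contents($path, $data);
    return $path;
}
```

The returned path can be stored in $fname and deleted from cleanexit() as before.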
Conditional Removal
What if we only want to delete the file when the script is aborted?
There is a PHP function that indicates the connection status: connection_status().
This function takes no arguments and returns a bit-mask of the following
values: 0 (CONNECTION_NORMAL), 1 (CONNECTION_ABORTED) and 2 (CONNECTION_TIMEOUT).
The easiest way to use it is to call it from within the cleanexit()
function to check the connection status. This is the ideal solution
when you want to distinguish between the normal status (NORMAL)
and any abnormal status (ABORTED, TIMEOUT, or both at once).
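Because the value is a bit-mask, the check is best written as a bitwise AND against the predefined constants rather than a comparison with one value. A minimal sketch (PHP 7.1+); ended_abnormally() is a hypothetical helper, and this cleanexit() variant takes the path as an argument instead of using a global.

```php
<?php
// Hypothetical helper: true when the status bit-mask reports an abort,
// a timeout, or both; false only for CONNECTION_NORMAL (0).
function ended_abnormally(int $status): bool
{
    return ($status & (CONNECTION_ABORTED | CONNECTION_TIMEOUT)) !== 0;
}

// Variant of cleanexit() that receives the file's complete path and
// deletes the file only after an abnormal termination.
function cleanexit(string $path): void
{
    if (ended_abnormally(connection_status()) && is_file($path)) {
        unlink($path);
    }
}
```

It would be registered with register_shutdown_function('cleanexit', $fname);.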
There are also functions that check a single condition:
connection_aborted() and connection_timeout(). The latter existed only
in early PHP 4 releases and has since been removed; in current PHP,
test connection_status() & CONNECTION_TIMEOUT instead.
connection_aborted() takes no arguments and returns 1 if the client
has disconnected and 0 otherwise. Sometimes code becomes easier to
understand when using this function instead of connection_status().
As an example, check the following piece of code:
<?php
// this is the function responsible for deleting the unwanted file
function cleanexit() {
// we need to use the variable that holds the file name
global $fname;
// check if user aborted the connection
if (connection_aborted())
// delete the file
// ($fname already holds the complete path)
unlink($fname);
}
// this is how we tell PHP to run the cleanexit() function every
// time the script ends
register_shutdown_function("cleanexit");
// create a random file and fill it with custom data
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
$fname = "/some_dir/".$rand.".txt";
$file = fopen($fname,"w");
fputs($file,"Test data");
fclose($file);
// simulate time consumption
for ($i=0;$i<1000000;$i++) {
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
}
// send something to the browser
for ($i=0;$i<100;$i++) {
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
echo $rand."<br>\n";
}
?>
This time the file is removed only if the connection is aborted.
Avoiding the Problem (Instead of Fixing It)
Another way to solve the problem of scripts terminating early
and leaving temporary files around is to force the script to execute
until the end, even if the user stops it or if it takes too long
to finish. This is not the best solution, but I'll present
it so you know all the options. The following sections discuss:
- Ignoring a connection break
- Changing the execution time limit
Of course, an option that forces the script to execute until the
end will only dispose of temporary files if you add the necessary
cleanup code at the end of the script.
Ignoring Connection Break
Let's consider the connection break first (what happens when,
for instance, the user clicks on the "STOP" button of
the browser). We want to ignore that action and keep running until
the very end of the script. To do that, we use the ignore_user_abort()
PHP function. This function can be used in two different ways: a)
called without arguments, it returns the current setting (whether or
not we are ignoring a connection break); b) called with TRUE or FALSE,
it changes the setting and returns the previous one. In the following
example, we change the behavior so that the script simply ignores the
fact that the user has broken the connection.
<?php
// ignore user abort
ignore_user_abort(TRUE);
// create a random file and fill it with custom data
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
$fname = $rand.".txt";
$file = fopen($fname,"w");
fputs($file,"Test data");
fclose($file);
// simulate time consumption
for ($i=0;$i<1000000;$i++) {
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
}
// send something to the browser
for ($i=0;$i<100;$i++) {
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
echo $rand."<br>\n";
}
// delete the file
unlink($fname);
?>
Even if the user presses the "STOP" button, the script will
execute until its last line of code and will remove the file,
provided it does not exceed the execution time limit.
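Since ignore_user_abort() returns the previous setting when it changes it, a script can ignore aborts only around its critical work and then restore the old behavior. A sketch under that assumption (PHP 7+); run_ignoring_abort() is a hypothetical name.

```php
<?php
// Hypothetical helper: run $work with user aborts ignored, then restore
// whatever ignore_user_abort setting was in effect before.
function run_ignoring_abort(callable $work)
{
    // with an argument, ignore_user_abort() returns the previous setting (0 or 1)
    $previous = ignore_user_abort(true);
    try {
        return $work();
    } finally {
        // restore the earlier behavior even if $work throws
        ignore_user_abort((bool) $previous);
    }
}
```

For instance, the file creation and deletion from the example above could be wrapped in a closure passed to run_ignoring_abort(), leaving the rest of the script abortable as usual.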
Changing the Execution Time Limit
There is yet another important issue: time limits. PHP stops a script
once it has run for the number of seconds set in the max_execution_time
configuration directive (30 by default; the command-line version of PHP
uses 0, meaning no limit). A script that takes too long may therefore
terminate before it reaches its cleanup code. To avoid this scenario,
you can change that limit by using the set_time_limit() function.
In old PHP versions running in safe mode, set_time_limit() has no
effect (safe mode was removed in PHP 5.4). Each call restarts the
timeout counter from zero, so the script may then run for the new
number of seconds from the point of the call; calling it at the very
beginning of the script is the simplest choice. Note that only the
script's own execution time is counted: on most systems, time spent
in external programs, system calls, database queries and stream
operations is not (on Windows, the measured time is real time).
We could raise the limit to a very high value, but it still might
not be enough. If we don't know how long the script is going to take
to finish, we can pass 0 to set_time_limit(), which removes the time
limit altogether. This is a dangerous choice because the script can
run forever and consume precious resources. Careful testing and
debugging must be done before using this feature.
Consider the following example:
<?php
// ignore user abort
ignore_user_abort(true);
// set time limit to infinite
set_time_limit(0);
// create a random file and fill it with custom data
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
$fname = $rand.".txt";
$file = fopen($fname,"w");
fputs($file,"Test data");
fclose($file);
// simulate time consumption
for ($i=0;$i<1000000;$i++) {
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
}
// send something to the browser
for ($i=0;$i<100;$i++) {
srand ((float) microtime() * 1000000);
$rand = rand(1000,10000);
echo $rand."<br>\n";
}
// delete the file
unlink($fname);
?>
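A more cautious alternative to set_time_limit(0) relies on the fact that each call to set_time_limit() restarts the timeout counter: give every unit of work its own budget, so a long job can finish while a single stuck step is still stopped. A sketch (PHP 7+); process_with_budget() and the per-item budget are illustrative.

```php
<?php
// Hypothetical helper: apply $work to each item, giving each call its own
// time budget instead of removing the limit with set_time_limit(0).
function process_with_budget(array $items, callable $work, int $secondsPerItem): array
{
    // read the configured limit first: set_time_limit() changes max_execution_time
    $original = (int) ini_get('max_execution_time');
    $results = [];
    foreach ($items as $key => $item) {
        // restart the counter: this item may use up to $secondsPerItem seconds
        set_time_limit($secondsPerItem);
        $results[$key] = $work($item);
    }
    // put the original limit back (this restarts the counter once more)
    set_time_limit($original);
    return $results;
}
```

The budget should be chosen from measurements of a typical item, with some margin.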
Conclusion
There are many ways to deal with temporary files using PHP. Choose
the best solution for your specific case, and always consider resources
and how you might need to modify the scripts. In the future, I hope
to see many commercial Web sites using these PHP features, freeing
administrators from tedious tasks such as deleting temporary files.
Bruno Pedro, co-founder and manager of ethernet Lda., is a
Systems Engineer with 10 years of experience in database-related applications.
He was an early adopter of Linux and has been using open source
technology since then. He has been developing applications for the
Internet since 1995.