I need to create a web page that lets a user select and upload a text file (e.g. CSV) and have that file imported into a backend MySQL database.
This is all fine, and I have PHP code that will do this: accept the file upload, parse it, and save it to the database.
The problem is that I need a reliable method that will allow the user to upload very large files, potentially around 30-80 MB, without timing out or failing during the upload or the save to the database.
Other than increasing the upload limit and timeout settings, is there any other advice on how I can ensure a consistently reliable method to ensure all data is uploaded?
The dataset changes daily, and all of the previous day's data in the database needs to be cleared out and replaced with the new day's data, so it is vital that no data is lost during the new upload.
I would welcome advice on how best to achieve this with a high degree of confidence that the data will be uploaded reliably and successfully, and on how best to clear out the previous day's data. Ideally, the new data would be verified as fully uploaded before the old data is cleared out and replaced.
All suggestions welcome!
There is a Stack Overflow question confirming GolezTrol's answer that the PHP timeout doesn't begin until after the upload is finished: PHP file upload affected or not by max_input_time?
As for actually processing the data: if the file is in CSV, tab-delimited, or a similar format that matches the table structure, the fastest method is MySQL's LOAD DATA statement: https://dev.mysql.com/doc/refman/5.5/en/loading-tables.html
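As a rough sketch (the table name, column layout, and file path here are assumptions, not from the question), an import of a comma-delimited file with a header row might look like:

```sql
-- Bulk-load a CSV file into an assumed table named daily_data.
-- LOCAL reads the file from the client side; adjust path and
-- delimiters to match the actual export format.
LOAD DATA LOCAL INFILE '/tmp/upload.csv'
INTO TABLE daily_data
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```

LOAD DATA is typically an order of magnitude faster than issuing one INSERT per row, which matters for files in the 30-80 MB range.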
For replacing the data and making sure the import was successful before deleting the old data, the best answer to the question MySQL: Truncate Table within Transaction? describes inserting the data into a new table, verifying the new data, and then renaming the tables to swap them. After that you can safely delete the old data. This seems like the cleanest way to accomplish it (keeping in mind that TRUNCATE is not transaction-safe).
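A minimal sketch of that swap, assuming a live table named daily_data (the names are illustrative):

```sql
-- Build the new day's table alongside the live one.
CREATE TABLE daily_data_new LIKE daily_data;

-- ... import the uploaded file into daily_data_new, then verify it,
-- e.g. compare SELECT COUNT(*) FROM daily_data_new against the
-- expected row count before proceeding.

-- RENAME TABLE swaps both tables in a single atomic operation,
-- so readers never see an empty or half-loaded table.
RENAME TABLE daily_data TO daily_data_old,
             daily_data_new TO daily_data;

-- Only after the swap succeeds, discard yesterday's data.
DROP TABLE daily_data_old;
```

If verification fails, you simply drop daily_data_new and the live table is untouched, which satisfies the "check before clearing out" requirement.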
I think the timeout doesn't count while the file is being uploaded. The file data is part of the request, and the timeout only starts once the request has been fully received.
Then, instead of parsing the file immediately in the request, you can store it and parse it with a cron job after the upload completes.
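A minimal sketch of that setup (the paths and script name are assumptions): the upload script only moves the file into an incoming directory, and a cron entry runs the importer on a schedule:

```shell
# crontab entry: run a (hypothetical) PHP import script every 5 minutes,
# appending its output to a log for troubleshooting
*/5 * * * * /usr/bin/php /var/www/scripts/import_uploads.php >> /var/log/import.log 2>&1
```

This decouples the long-running database import from the HTTP request entirely, so the web server's timeout settings no longer constrain the import.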
Look into uploaders such as Fine Uploader that chunk your uploads and allow resuming if the connection breaks (instead of starting from the beginning each time).
The other upside of an uploader like Fine Uploader is that the user can see how much longer the upload will take.
If you think the import after the upload might be a problem in terms of processing time and user feedback, I would suggest making a "worker" that runs in the background and lets the users know when their data is ready. That way you can also have many users uploading their files at any time, and process them when processor power or disk time is available.