You have a processing job which dumps a bunch of output files in a directory. A downstream job uses these files as input: it sees the new directory and pulls in all the files it contains.
Because S3 was not strongly consistent, the downstream job could see an arbitrary subset of the files for a short while after they were created, not just the oldest ones. This could cause your job to silently skip input files unless you provided some sort of manifest of everything expected in that batch. So you'd load the manifest, then keep retrying until all the input files showed up on whatever S3 node you were hitting.
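The retry loop described above can be sketched roughly like this. `list_keys` stands in for whatever listing call you'd actually use (e.g. a wrapper around `s3.list_objects_v2`); here it's a hypothetical callable, and the example simulates eventual consistency with a fake listing that reveals one more file per call:

```python
import time

def wait_for_manifest_files(list_keys, manifest, timeout_s=300, poll_s=5):
    """Poll until every key named in the manifest is visible, or time out.

    list_keys: callable returning the currently visible keys (hypothetical,
    would wrap an S3 listing in practice).
    """
    deadline = time.monotonic() + timeout_s
    missing = set(manifest)
    while True:
        missing -= set(list_keys())
        if not missing:
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still missing {len(missing)} files")
        time.sleep(poll_s)

# Simulated eventually consistent store: each listing reveals one more file.
visible = []
pending = ["batch/part-0000", "batch/part-0001", "batch/part-0002"]

def fake_list():
    if pending:
        visible.append(pending.pop(0))
    return list(visible)

ok = wait_for_manifest_files(
    fake_list,
    ["batch/part-0000", "batch/part-0001", "batch/part-0002"],
    poll_s=0,
)
```

The key point is that the loop terminates on the manifest, not on whatever the listing happens to return, so a partial listing just means another iteration rather than a silently incomplete batch.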
If this is an issue, one workaround is to wait X seconds after a file has been created before processing it, giving the listing time to become consistent.
A better idea would be to trigger a Lambda on each S3 object-created event, which either processes the file directly or pushes it onto an SQS queue, with each queued message then processed by another Lambda.
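A minimal sketch of the SQS-fed Lambda in that pipeline, assuming the standard S3-to-SQS-to-Lambda event shape (each SQS record's body is a JSON-encoded S3 event notification); `process_object` is a hypothetical placeholder for the real work:

```python
import json

def handler(event, context=None):
    """Lambda entry point for an SQS-triggered function.

    Each SQS record wraps an S3 event notification; pull out the
    bucket/key pairs so each new file can be processed individually.
    """
    processed = []
    for sqs_record in event["Records"]:
        s3_event = json.loads(sqs_record["body"])
        for rec in s3_event.get("Records", []):
            bucket = rec["s3"]["bucket"]["name"]
            key = rec["s3"]["object"]["key"]
            # process_object(bucket, key)  # real processing would go here
            processed.append((bucket, key))
    return processed

# Example invocation with a hand-built event in the S3->SQS shape.
sample_event = {"Records": [{"body": json.dumps({"Records": [
    {"s3": {"bucket": {"name": "my-bucket"},
            "object": {"key": "batch/part-0000"}}}
]})}]}
result = handler(sample_event)
```

Since each file arrives as its own event, nothing ever depends on a consistent directory listing, which is what made this pattern attractive before S3 became strongly consistent.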
2. If you just take the maximum recorded delay, add say 20% padding, and wait that long before processing every dataset, the blanket wait could be detrimental to performance.
The example I gave happened to the team I was on in 2017/2018. We had thousands of files totaling terabytes of data in a given batch. The 90th percentile time to consistency was in the low tens of seconds; the 99th percentile was measured in minutes. The manifest-plus-retry approach avoids having to put a sleep(5 minutes) in front of every batch just to cover the 1% of cases.