Lane is a Senior Software Engineer for ThreeWill. He is a strong technology expert with a focus on programming, network and hardware design, and requirements and capacity planning. He has an exceptional combination of technical and communication skills.
For a project, we recently had to implement a mechanism for performing bulk data operations. Essentially, the customer wanted an end user to be able to upload an Excel file with a bunch of tabular data in it, which we would then process and insert into SQL Server. We knew the processing and inserting could be very time consuming (we estimated ~30 seconds/row x 10,000 max rows = 300,000 seconds, or roughly 3.5 days), so we wanted something very robust. Simply doing the work inside the web application's pool wasn't an option; IIS would time us out. For the on-premises implementation, we just used a Windows service that watched a folder for file events and processed each file the user uploaded. Easy peasy lemon squeezy.
However, Azure works differently. To get the equivalent of a Windows service in Azure, we needed either a Worker Role or an Azure Service Fabric stateless service. Neither was appealing, because both run on VMs that consume (and bill for) resources around the clock, even when they are idle. This bulk import process would be used once, maybe twice, in the initial phases of an engagement and then never again for months (or maybe even years), so we would be paying for capacity while nothing was happening.
Our first idea was to use an Azure Function App. A Function let us listen to an Azure BLOB storage account for a file being written and then react accordingly. However, we quickly realized that Functions on the Consumption plan have a 5-minute default timeout (configurable up to 10 minutes; I believe Functions running on a dedicated App Service plan with Always On don't have this limitation), and our job was likely to run for days.
The next couple of ideas we tried were very complicated, and in the end none of them worked. That's when we stumbled across Azure Batch. Batch is a way to run very long-running operations, much like Worker Roles. However, unlike Worker Roles, you only pay for Batch while a job is running (more precisely, you pay for the pool's compute nodes while they exist). The Azure team was even nice enough to publish sample projects on GitHub that show how to wire up a job, start it, monitor it, and eventually delete it when it's done (plus all kinds of other very helpful things). Even better, we learned we could set up a Function App that triggers off a file being written to BLOB storage and starts the batch job. The Function doesn't have to wait for the job to complete, so the 5-minute timeout no longer limits us. Better still, when setting up a job, you can specify how many compute nodes its pool should have and what VM size (and therefore how many cores) each node gets. That lets us “preflight” the job by looking at how many records we will be importing and scale the job accordingly. Very slick!
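The “preflight” sizing can be as simple as dividing the total estimated work by the time window you're willing to wait. The sketch below is hypothetical: the ImportPreflight class, its method names, and the target-window and node-cap parameters are made up for illustration and are not part of the Batch SDK. It assumes rows are independent and can be split evenly across compute nodes, with each node working through its share one row at a time.

```csharp
using System;

// Hypothetical preflight helper (not part of any Azure SDK). Assumes rows are independent
// and split evenly across nodes, with each node processing its share one row at a time.
public static class ImportPreflight
{
    // Wall-clock hours to process rowCount rows spread over nodeCount nodes.
    public static double EstimatedHours(int rowCount, double secondsPerRow, int nodeCount = 1)
    {
        if (nodeCount < 1) throw new ArgumentOutOfRangeException(nameof(nodeCount));
        return rowCount * secondsPerRow / nodeCount / 3600.0;
    }

    // Smallest node count that finishes within targetHours, clamped to [1, maxNodes].
    public static int NodesNeeded(int rowCount, double secondsPerRow, double targetHours, int maxNodes)
    {
        if (targetHours <= 0) throw new ArgumentOutOfRangeException(nameof(targetHours));
        if (maxNodes < 1) throw new ArgumentOutOfRangeException(nameof(maxNodes));
        double singleNodeHours = EstimatedHours(rowCount, secondsPerRow);
        int nodes = (int)Math.Ceiling(singleNodeHours / targetHours);
        return Math.Max(1, Math.Min(nodes, maxNodes));
    }
}
```

With the worst case described above (10,000 rows at ~30 seconds/row), EstimatedHours(10000, 30) comes to about 83 hours on a single node, and NodesNeeded(10000, 30, 24, 20) asks for 4 nodes to finish within a day. In practice the per-row time would come from measurement, and the result would feed whatever pool size you specify when creating the job.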
It feels weird writing a blog post without much code, so here are some steps that should point you in the right direction:
- Create a Batch Service account from the Azure Portal.
- Grab one of the sample projects from the Azure Batch samples on GitHub mentioned above. I used HelloWorld as a starting place.
- Open Main.cs and comment out the Console.ReadLine call at the end of Main() (~line 38). Then, in HelloWorldAsync(), comment out the call to WaitForJobAndPrintOutputAsync (~line 71) and all the code in the finally block (~lines 75-80). We just want HelloWorld to submit the job and exit, not wait for it to finish. Save and compile everything.
- Make sure your Azure-specific settings (keys, URL, etc.) are saved in AccountSettings.settings in the Common project.
- Create a new Function App. Add a BlobTrigger-CSharp function and add the following code to its Run method (you will also need to add ‘using System.Diagnostics;’ at the top of the file):
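```csharp
// Launch the HelloWorld console app uploaded via Kudu; it submits the Batch job and exits.
using (Process process = new Process())
{
    process.StartInfo.FileName = @"D:\home\site\wwwroot\BlobTriggerCSharp1\HelloWorld\HelloWorld.exe";
    process.StartInfo.Arguments = "";
    process.StartInfo.UseShellExecute = false;
    process.StartInfo.RedirectStandardOutput = true;
    process.StartInfo.RedirectStandardError = true;
    process.Start();
    // Read stderr asynchronously so a full stderr pipe can't deadlock the stdout read.
    var errTask = process.StandardError.ReadToEndAsync();
    string output = process.StandardOutput.ReadToEnd();
    process.WaitForExit();
    log.Info(output);
    string err = errTask.Result;
    if (!string.IsNullOrEmpty(err)) log.Error(err);
}
```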
- Open the Function App settings and launch Kudu. Use Kudu to upload the contents of HelloWorld's Debug output folder into a HelloWorld folder inside the function's folder, so the EXE ends up at the path used in the code above.
- Configure the function's BLOB trigger with whatever storage account and path you like.
- If needed, correct the path to the EXE above.
- Open Azure Storage Explorer and upload a file to the storage account and path you configured for the BLOB trigger.
You should now see your HelloWorld job under Jobs in your Batch account in the Azure Portal. Note that because we removed the code from HelloWorld that deletes the job once it's done, you will need to add that logic somewhere else in your workflow (maybe another BLOB trigger that kicks off another Azure Function).
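If that cleanup runs from a separate trigger, it needs to know which job to delete. One hypothetical approach (not something the HelloWorld sample does) is to derive the job ID deterministically from the uploaded blob's name, so both the submitting Function and the cleanup Function can compute it. The JobIds class, its FromBlobName method, and the “import-” prefix are made up for illustration; the sketch assumes Batch's job ID rules (letters, digits, hyphens, and underscores only, at most 64 characters).

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: derive a deterministic Batch job ID from the uploaded blob's name so the
// Function that submits the job and a later cleanup Function compute the same ID.
// Assumes Batch's job ID rules: letters, digits, hyphens, and underscores only; at most 64 chars.
public static class JobIds
{
    private const string Prefix = "import-"; // made-up prefix for illustration
    private const int MaxLength = 64;
    private const int HashChars = 8;

    public static string FromBlobName(string blobName)
    {
        if (string.IsNullOrEmpty(blobName))
            throw new ArgumentException("Blob name is required.", nameof(blobName));

        // Replace every character Batch doesn't allow in an ID with a hyphen.
        string safe = Regex.Replace(blobName, "[^A-Za-z0-9_-]", "-");

        // A short hash of the original name keeps IDs distinct when sanitizing or
        // truncating maps two different blob names to the same text.
        string hash;
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(blobName));
            hash = BitConverter.ToString(digest, 0, HashChars / 2).Replace("-", "").ToLowerInvariant();
        }

        // Leave room for the prefix, a separator, and the hash.
        int room = MaxLength - Prefix.Length - 1 - HashChars;
        if (safe.Length > room) safe = safe.Substring(0, room);

        return $"{Prefix}{safe}-{hash}";
    }
}
```

This assumes HelloWorld is changed to accept the ID (for example, through the Arguments string the Function passes) instead of generating its own. A cleanup Function could then compute the same ID and pass it to the Batch .NET client's JobOperations.DeleteJobAsync.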
And that’s it. An on-demand, scalable batch processing mechanism that works (and costs us money) only when we want it to.