Introduction
You may encounter situations where you need to calculate the checksum of a file or stream that is being transmitted across the wire. Transmitting file streams is common nowadays, and in that scenario you need to ensure the data has not been corrupted in transit. To do so, the receiving end recalculates the checksum with the same algorithm and compares it with the checksum computed by the sender. If the two values differ, the data was corrupted.
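As a minimal sketch of that receiving-end check, assuming SHA-256 and a hex-encoded checksum sent alongside the data, the receiver recomputes the hash over the bytes it received and compares the two values. `ChecksumVerifier`, `ComputeSha256Hex`, and `IsIntact` are hypothetical names used only for illustration.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper for illustration: recomputes a SHA-256 checksum on the
// receiving end and compares it with the checksum the sender transmitted.
public static class ChecksumVerifier
{
    // Returns the SHA-256 hash of the data as an uppercase hex string without dashes.
    public static string ComputeSha256Hex(byte[] data)
    {
        using (SHA256 sha256 = SHA256.Create())
        {
            byte[] hash = sha256.ComputeHash(data);
            return BitConverter.ToString(hash).Replace("-", string.Empty);
        }
    }

    // True when the recomputed checksum matches the expected one.
    // The comparison ignores case, so "AB12..." and "ab12..." are treated as equal.
    public static bool IsIntact(byte[] receivedData, string expectedChecksum)
    {
        string actual = ComputeSha256Hex(receivedData);
        return string.Equals(actual, expectedChecksum, StringComparison.OrdinalIgnoreCase);
    }
}
```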
Scenario
Let me take the same scenario I explained in my previous post, Download large files as chunks and upload them into BLOB, in which we downloaded a large file and transmitted it as a stream in chunks. Plenty of articles on the web explain how to calculate the checksum of a full file stream. Here we'll look at how to calculate the checksum chunk by chunk and accumulate the result at the end.
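To show the idea in isolation, here is a minimal sketch, assuming the data is available as any readable `Stream` (a `MemoryStream` works for testing) rather than an HTTP response. Each chunk is fed to `TransformBlock` as it is read, and an empty `TransformFinalBlock` call finishes the hash, so the result should equal what `ComputeHash` returns for the whole stream without the whole stream ever being held in memory. `ChunkedChecksum` and `ComputeInChunks` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical sketch: hash a stream chunk by chunk with TransformBlock and
// finish with TransformFinalBlock, so only one chunk is in memory at a time.
public static class ChunkedChecksum
{
    public static byte[] ComputeInChunks(Stream stream, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        using (SHA256 sha256 = SHA256.Create())
        {
            byte[] buffer = new byte[chunkSize];
            int bytesRead;

            // Stream.Read returns 0 once the end of the stream is reached.
            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Add this chunk to the running hash; a null output buffer
                // means the input is not copied anywhere.
                sha256.TransformBlock(buffer, 0, bytesRead, null, 0);
            }

            // Finalize with an empty block; Hash is only available after this call.
            sha256.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
            return sha256.Hash;
        }
    }
}
```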
Tips
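HashAlgorithm.TransformBlock and HashAlgorithm.TransformFinalBlock will help you achieve this. Call TransformBlock for each chunk as it is read, then call TransformFinalBlock once to complete the computation; the Hash property is only available after that final call.
Snippet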
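The sketch below, which uses hypothetical class and method names, contrasts two ways of finishing the computation: passing the last chunk itself to `TransformFinalBlock`, or passing every chunk to `TransformBlock` and then calling `TransformFinalBlock` with an empty array. Both should produce the same SHA-256 value; the second form also covers the case where there are no chunks at all, such as a zero-length file.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical sketch comparing two ways to finish an incremental hash:
// (a) pass the last chunk to TransformFinalBlock, or
// (b) pass every chunk to TransformBlock and finish with an empty final block.
public static class TransformBlockDemo
{
    static string ToHex(byte[] hash) => BitConverter.ToString(hash).Replace("-", string.Empty);

    // (a) The last chunk goes to TransformFinalBlock.
    public static string FinalBlockWithData(byte[] first, byte[] last)
    {
        using (SHA256 sha256 = SHA256.Create())
        {
            sha256.TransformBlock(first, 0, first.Length, null, 0);
            sha256.TransformFinalBlock(last, 0, last.Length);
            return ToHex(sha256.Hash);
        }
    }

    // (b) All chunks go to TransformBlock; an empty block finalizes the hash.
    public static string FinalBlockEmpty(byte[] first, byte[] last)
    {
        using (SHA256 sha256 = SHA256.Create())
        {
            sha256.TransformBlock(first, 0, first.Length, null, 0);
            sha256.TransformBlock(last, 0, last.Length, null, 0);
            sha256.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
            return ToHex(sha256.Hash);
        }
    }
}
```

With `"hello "` and `"world"` as the two chunks, both methods should return the same value that `SHA256.ComputeHash` gives for `"hello world"`.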
```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Threading.Tasks;
using Microsoft.Azure; // CloudConfigurationManager (Microsoft.WindowsAzure.ConfigurationManager package).

public class LargeFileProcessor
{
    /// <summary>
    /// Logger instance. ILogger and Logger are custom types trimmed for brevity.
    /// </summary>
    private ILogger logger = new Logger();

    /// <summary>
    /// Download a large file in chunks and upload each chunk into BLOB storage.
    /// </summary>
    public async Task ProcessLargeFile()
    {
        // Trimmed for brevity.
        string urlToDownload = CloudConfigurationManager.GetSetting("DownloadURL"); // Provide a valid URL from which the large file can be downloaded.
        Stopwatch stopwatch = Stopwatch.StartNew();

        try
        {
            using (HttpClient httpClient = new HttpClient())
            {
                var httpRequestMessage = new HttpRequestMessage(HttpMethod.Get, new Uri(urlToDownload))
                {
                    // To avoid the error 'An existing connection was forcibly closed by the remote host', use HTTP/1.0 instead of HTTP/1.1.
                    Version = HttpVersion.Version10
                };

                using (HttpResponseMessage response = await httpClient.SendAsync(httpRequestMessage, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false))
                using (Stream stream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
                using (SHA256 sha256 = SHA256.Create())
                {
                    const int pageSizeInBytes = 104857600; // 100 MB, the maximum BLOB chunk size at the time of writing.

                    // Read the total file size from the Content-Length header (throws if the header is missing).
                    long bytesRemaining = response.Content.Headers.ContentLength.Value;

                    while (bytesRemaining > 0)
                    {
                        var bytesToCopy = (int)Math.Min(bytesRemaining, pageSizeInBytes);
                        var bytesToSend = new byte[bytesToCopy];
                        var bytesCountRead = await ReadStreamAndAccumulate(stream, bytesToSend, bytesToCopy);

                        // bytesCountRead is 0 when there are no more bytes to read from the stream.
                        // Fail instead of looping forever if the stream ends before Content-Length bytes arrive.
                        if (bytesCountRead == 0)
                        {
                            throw new IOException($"Stream ended with {bytesRemaining} bytes still expected.");
                        }

                        bytesRemaining -= bytesCountRead;

                        // Add this chunk to the running checksum (the chunk upload is trimmed for brevity).
                        sha256.TransformBlock(bytesToSend, 0, bytesCountRead, null, 0);
                    }

                    // Finalize the checksum. An empty final block also works for a zero-length file,
                    // where the loop never runs.
                    sha256.TransformFinalBlock(Array.Empty<byte>(), 0, 0);

                    var checksum = BitConverter.ToString(sha256.Hash).Replace("-", string.Empty);
                    this.logger.WriteLine($"Hash value is: {checksum}");
                }
            }
        }
        catch (Exception ex)
        {
            this.logger.WriteLine(ex.Message);
            throw;
        }
        finally
        {
            stopwatch.Stop();
            this.logger.WriteLine($"Execution time in mins: {stopwatch.Elapsed.TotalMinutes}");
        }
    }

    /// <summary>
    /// Read the stream and accumulate until the specified number of bytes has been copied.
    /// </summary>
    /// <param name="stream">Stream to be read from.</param>
    /// <param name="bytesToSend">Target byte array that holds the bytes read.</param>
    /// <param name="bytesCountToCopy">The number of bytes to be copied.</param>
    /// <returns>The number of bytes read.</returns>
    private async Task<int> ReadStreamAndAccumulate(Stream stream, byte[] bytesToSend, int bytesCountToCopy)
    {
        // Minimal version; the full implementation is in the GitHub repo.
        // A single read may return fewer bytes than requested, so keep reading
        // until the buffer is full or the stream is exhausted.
        int totalBytesRead = 0;
        while (totalBytesRead < bytesCountToCopy)
        {
            int bytesRead = await ReadStreamWithRetry(stream, bytesToSend, bytesCountToCopy - totalBytesRead, totalBytesRead);
            if (bytesRead == 0)
            {
                break;
            }

            totalBytesRead += bytesRead;
        }

        return totalBytesRead;
    }

    /// <summary>
    /// Reads the stream with retry when failed.
    /// </summary>
    /// <param name="stream">Stream to be read from.</param>
    /// <param name="bytesToSend">Target byte array that holds the bytes read.</param>
    /// <param name="bytesCountToCopy">The number of bytes to be copied.</param>
    /// <param name="offset">The byte offset in buffer at which to begin writing data from the stream.</param>
    /// <returns>The number of bytes read.</returns>
    private async Task<int> ReadStreamWithRetry(Stream stream, byte[] bytesToSend, int bytesCountToCopy, int offset)
    {
        // Retry logic trimmed for brevity; this placeholder performs a single read without retrying.
        return await stream.ReadAsync(bytesToSend, offset, bytesCountToCopy).ConfigureAwait(false);
    }
}
```
In the snippet above I've trimmed the code down to a minimal version that focuses on calculating the checksum for each chunk read. You can find the full source code in my GitHub repo here.
References
- http://peterkellner.net/2010/11/24/efficiently-generating-sha256-checksum-for-files-using-csharp/
- https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.hashalgorithm.transformblock?view=netframework-4.7.1#System_Security_Cryptography_HashAlgorithm_TransformBlock_System_Byte___System_Int32_System_Int32_System_Byte___System_Int32_
- https://docs.microsoft.com/en-us/dotnet/api/system.security.cryptography.hashalgorithm.transformfinalblock?view=netframework-4.7.1#System_Security_Cryptography_HashAlgorithm_TransformFinalBlock_System_Byte___System_Int32_System_Int32_