Sunday, February 18, 2018

Calculating Hash values / checksum for files while we read them as Streams in chunks

Introduction

You may run into situations where you need to calculate the checksum of a file or stream while transmitting it across the wire. Transferring files as streams is common nowadays, and you need to ensure the data has not been corrupted in transit. To do that, the receiving end recalculates the checksum with the same algorithm and compares it with the value computed by the sender.

Scenario

Let me take the same scenario I explained in my previous post, Download large files as chunks and upload them into BLOB, in which we downloaded a large file and transmitted it as a stream in chunks. There are a lot of articles on the web that explain how to calculate the checksum for a full file stream. The snippet below instead calculates the checksum chunk by chunk and accumulates the result at the end.


Tips

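HashAlgorithm.TransformBlock and HashAlgorithm.TransformFinalBlock will help you achieve this. Call TransformBlock for every chunk except the last, call TransformFinalBlock for the last chunk, and then read the result from the Hash property.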
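
To see the two calls in isolation, here is a minimal sketch that hashes an in-memory stream in small chunks and compares the result with a one-shot ComputeHash over the same bytes. The class name ChunkedHashDemo, the helper ComputeSha256InChunks, and the 8-byte chunk size are assumptions made up for this illustration; they are not part of the snippet in this post.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class ChunkedHashDemo
{
    // Hypothetical helper: hashes a stream by reading it in chunks of chunkSize bytes.
    public static string ComputeSha256InChunks(Stream stream, int chunkSize)
    {
        using (var sha256 = SHA256.Create())
        {
            var buffer = new byte[chunkSize];
            int bytesRead;

            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Feed each chunk into the running hash. No output copy is needed, so the output buffer is null.
                sha256.TransformBlock(buffer, 0, bytesRead, null, 0);
            }

            // Finalize with an empty block once the stream is exhausted.
            sha256.TransformFinalBlock(buffer, 0, 0);

            return BitConverter.ToString(sha256.Hash).Replace("-", string.Empty);
        }
    }

    public static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("The quick brown fox jumps over the lazy dog");

        string chunked;
        using (var stream = new MemoryStream(data))
        {
            chunked = ComputeSha256InChunks(stream, 8); // Deliberately tiny chunks.
        }

        string whole;
        using (var sha256 = SHA256.Create())
        {
            whole = BitConverter.ToString(sha256.ComputeHash(data)).Replace("-", string.Empty);
        }

        // Both approaches hash the same bytes, so this prints True.
        Console.WriteLine(chunked == whole);
    }
}
```

This sketch finalizes with an empty block after the loop, which is one way to avoid having to know in advance which chunk is the last one.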

Snippet

    public class LargeFileProcessor
    {
        /// <summary>
        /// Logger instance.
        /// </summary>
        private ILogger logger = new Logger();

        /// <summary>
        /// Download Large File as chunk and upload as chunk into BLOB.
        /// </summary>
        public async Task ProcessLargeFile()
        {
            // Trimmed for brevity.

            string urlToDownload = CloudConfigurationManager.GetSetting("DownloadURL"); // Provide valid URL from where the large file can be downloaded.

            Stopwatch stopwatch = Stopwatch.StartNew();

            try
            {
                using (HttpClient httpClient = new HttpClient())
                {
                    var httpRequestMessage = new HttpRequestMessage(HttpMethod.Get, new Uri(urlToDownload))
                    {
                        // To avoid error related to 'An existing connection was forcibly closed by the remote host'. Use Http1.0 instead of Http1.1.
                        Version = HttpVersion.Version10
                    };

                    using (HttpResponseMessage response = await httpClient.SendAsync(httpRequestMessage, HttpCompletionOption.ResponseHeadersRead).ConfigureAwait(false))
                    {
                        using (Stream stream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
                        {
                            const int pageSizeInBytes = 104857600; // 100 MB, the maximum Blob chunk size as of now.

                            // Dispose the hash algorithm once the checksum has been read.
                            using (var sha256 = SHA256.Create())
                            {
                                // Read the total file size from the header. Value throws if the server did not send Content-Length.
                                var bytesRemaining = response.Content.Headers.ContentLength.Value;

                                while (bytesRemaining > 0)
                                {
                                    var bytesToCopy = (int)Math.Min(bytesRemaining, pageSizeInBytes);
                                    var bytesToSend = new byte[bytesToCopy];

                                    var bytesCountRead = await ReadStreamAndAccumulate(stream, bytesToSend, bytesToCopy).ConfigureAwait(false);

                                    // bytesCountRead is 0 when there are no more bytes to read from the stream.
                                    // Stop here instead of looping forever if the stream ends before Content-Length bytes arrive.
                                    if (bytesCountRead == 0)
                                    {
                                        throw new EndOfStreamException($"Stream ended with {bytesRemaining} bytes still expected.");
                                    }

                                    bytesRemaining -= bytesCountRead;

                                    // Calculate the checksum value: the last chunk finalizes the hash, every other chunk only updates it.
                                    if (bytesRemaining <= 0)
                                    {
                                        sha256.TransformFinalBlock(bytesToSend, 0, bytesCountRead);
                                    }
                                    else
                                    {
                                        sha256.TransformBlock(bytesToSend, 0, bytesCountRead, bytesToSend, 0);
                                    }
                                }

                                // Hash is only available after TransformFinalBlock has been called.
                                var checksum = BitConverter.ToString(sha256.Hash).Replace("-", string.Empty);
                                this.logger.WriteLine($"Hash value is : {checksum}");
                            }
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                this.logger.WriteLine(ex.Message);
                throw;
            }
            finally
            {
                stopwatch.Stop();
                this.logger.WriteLine($"Execution time in mins: {stopwatch.Elapsed.TotalMinutes}");
            }
        }

        /// <summary>
        /// Read the stream and accumulate till it reaches the number of bytes specified to copy.
        /// </summary>
        /// <param name="stream">Stream to be read from.</param>
        /// <param name="bytesToSend">Target byte array that holds the bytes read.</param>
        /// <param name="bytesCountToCopy">The number of bytes to be copied.</param>
        /// <returns>The number of bytes read.</returns>
        private async Task<int> ReadStreamAndAccumulate(Stream stream, byte[] bytesToSend, int bytesCountToCopy)
        {
            // Trimmed for brevity.
        }

        /// <summary>
        /// Reads the stream with retry when failed. 
        /// </summary>
        /// <param name="stream">Stream to be read from.</param>
        /// <param name="bytesToSend">Target byte array that holds the bytes read.</param>
        /// <param name="bytesCountToCopy">The number of bytes to be copied.</param>
        /// <param name="offset">The byte offset in buffer at which to begin writing data from the stream.</param>
        /// <returns>The number of bytes read.</returns>
        private async Task<int> ReadStreamWithRetry(Stream stream, byte[] bytesToSend, int bytesCountToCopy, int offset)
        {
            // Trimmed for brevity.
        }
    }


In the snippet above I've trimmed the code down to the minimum needed to show how to calculate the checksum while reading the stream in chunks. You can find the full source code in my GitHub repo here.
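
Since the body of ReadStreamAndAccumulate is trimmed above, here is a rough sketch of what a "read until the buffer is full" helper can look like. It is a hypothetical stand-in, not the code from the repo: the names StreamReadSketch and ReadUntilFullAsync are made up for illustration, and it leaves out the retry handling that ReadStreamWithRetry suggests. The loop is needed because Stream.ReadAsync may return fewer bytes than requested, which is common with network streams.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class StreamReadSketch
{
    // Hypothetical helper: keeps reading until count bytes are in the buffer or the stream ends.
    public static async Task<int> ReadUntilFullAsync(Stream stream, byte[] buffer, int count)
    {
        int totalRead = 0;

        while (totalRead < count)
        {
            // A single read may deliver only part of what was asked for.
            int read = await stream.ReadAsync(buffer, totalRead, count - totalRead).ConfigureAwait(false);

            if (read == 0)
            {
                break; // End of stream reached before the buffer was full.
            }

            totalRead += read;
        }

        return totalRead;
    }

    public static void Main()
    {
        var source = new MemoryStream(new byte[] { 1, 2, 3, 4 });
        var buffer = new byte[10];

        // Asks for 10 bytes from a 4-byte stream, so only 4 are read.
        int read = ReadUntilFullAsync(source, buffer, buffer.Length).GetAwaiter().GetResult();
        Console.WriteLine(read);
    }
}
```

A helper like this returns 0 only when the stream has no more data, which is the behavior the chunk loop in the snippet relies on.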

Creative Commons License
This work by Tito is licensed under a Creative Commons Attribution 3.0 Unported License.