MSDN Magazine - February 2008 - (Page 133)

Stream Pipeline
STEPHEN TOUB

Q: In my application, I’m encrypting and compressing quite a bit of data. As these are computationally intensive operations, I was expecting to see 100 percent CPU utilization in Task Manager, but I noticed that on my dual-core machine, it’s topping out at around 50 percent. I’m assuming this is because only one core is being used, which is a shame given that the process takes a non-trivial amount of time to run. Is there any way I can get this encryption and compression process to use both processors? I’m using CryptoStream and GZipStream from the Microsoft® .NET Framework.

A: First, just to make sure, you’re doing the compression operation before the encryption operation, correct? If not, change it if you can. Good encryption will generate relatively uncompressible data. If you switch the order of the operations so that you first compress and then encrypt, not only should you end up with a smaller file, but the encryption will most likely take less time as it’ll be operating on less data. As an example of this, I downloaded the text to War and Peace from the Gutenberg Project (www.gutenberg.org) and ran it through the two orderings. Encrypting (with RijndaelManaged and the default key size) and then compressing resulted in a data stream 250 percent larger than the one generated by compressing and then encrypting, and it took 50 percent longer to execute.

Now, to answer your actual question: yes, there are several approaches you could take here. The first would be to parallelize the actual compression and encryption operations. You probably don’t want to (and shouldn’t) re-implement the functionality in GZipStream and CryptoStream, so, until the .NET Framework team parallelizes them for you, you’ll want an alternate solution.

If you don’t care about the actual output format (for example, you need compression, but you don’t care that it actually adheres to the gzip standard), you could chunk your input and process each chunk in parallel. For example, on your dual-core machine, you could divide in half the input byte array being passed to your GZipStream currently and, instead, process one half with one GZipStream and the other half with another GZipStream. You can then save these out to your output file one after the other. Note, though, that you’ll probably need to include some header information around your output so that your decompression process will adequately be able to determine where one GZipStream ends and the next begins. GZipStream currently buffers data as it reads from the input stream, so it may end up consuming more data from the input stream than it actually needs, which means you’d need to reset the position in the input stream after the first GZipStream completed its decompression.

One of the nice things about the chunking approach is that you can get the process to scale relatively well to more than two cores, as you can simply create as many chunks as are necessary to saturate your processors. But with only two cores, and with two operations (compression and encryption), a more practical solution for you at this point would probably be to create a parallel stream pipeline. You can do one operation on one processor at the same time that the other operation is working on the other processor. Now, obviously they can’t be working on the same data at the same time, as you’d end up with two outputs (one compressed and one encrypted) that would be relatively useless to you given the problem you’re trying to solve (one output both compressed and encrypted). Instead, however, you can mimic what you’re probably doing today, where using the decorator pattern you pass the output from one stream as the input to the next (shown in Figure 1):

    using (CryptoStream encrypt = new CryptoStream(
        output, transform, CryptoStreamMode.Write))
    using (GZipStream compress = new GZipStream(
        encrypt, CompressionMode.Compress, true))
        CopyStream(input, compress);

    static void CopyStream(Stream input, Stream output){
        byte[] buffer = new byte[0x1000];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            output.Write(buffer, 0, read);
    }

Figure 1 Passing Data through Streams

Here, the CopyStream method is copying from the input stream into the compression stream. When the compression stream has compressed a buffer of data, it in turn writes that data out as the input into the encryption stream. And similarly, when the encryption stream has completed a buffer of data, it in turn writes that out to the output stream. This is known as a pipeline.

Parallelizing a pipeline is a very natural concept that manifests frequently in the “real world.” Consider a group of people sending out invitation letters. One person is in charge of folding the letters to fit in an envelope, one person is in charge of putting the folded
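
As a rough illustration of the chunking approach described in the answer above, the sketch below compresses each chunk on its own thread and writes a small header (a chunk count, then a length before each compressed chunk) so the reader knows where one GZipStream's data ends and the next begins. The ChunkedGZip class, its method names, and the header layout are all assumptions made for this example; they are not part of the .NET Framework or of the article's own code. Error handling is deliberately omitted.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Threading;

// Hypothetical helper for illustration only; the on-disk layout is an assumption:
// [int32 chunkCount] then, per chunk, [int32 compressedLength][compressed bytes].
static class ChunkedGZip
{
    public static void Compress(byte[] input, Stream output, int chunkCount)
    {
        if (chunkCount < 1) throw new ArgumentOutOfRangeException("chunkCount");
        int chunkSize = (input.Length + chunkCount - 1) / chunkCount;
        byte[][] compressed = new byte[chunkCount][];
        Thread[] workers = new Thread[chunkCount];

        for (int i = 0; i < chunkCount; i++)
        {
            // Copy loop state into locals so each thread captures its own values.
            int index = i;
            int offset = Math.Min(index * chunkSize, input.Length);
            int count = Math.Min(chunkSize, input.Length - offset);
            workers[index] = new Thread(() =>
            {
                using (MemoryStream buffer = new MemoryStream())
                {
                    // Dispose the GZipStream first so all compressed data is flushed.
                    using (GZipStream gzip = new GZipStream(
                        buffer, CompressionMode.Compress, true))
                        gzip.Write(input, offset, count);
                    compressed[index] = buffer.ToArray();
                }
            });
            workers[index].Start();
        }
        foreach (Thread worker in workers) worker.Join();

        // Write the chunks sequentially, in their original order.
        BinaryWriter writer = new BinaryWriter(output);
        writer.Write(chunkCount);
        foreach (byte[] chunk in compressed)
        {
            writer.Write(chunk.Length);
            writer.Write(chunk);
        }
        writer.Flush();
    }

    public static byte[] Decompress(Stream input)
    {
        BinaryReader reader = new BinaryReader(input);
        int chunkCount = reader.ReadInt32();
        using (MemoryStream result = new MemoryStream())
        {
            byte[] buffer = new byte[0x1000];
            for (int i = 0; i < chunkCount; i++)
            {
                // Each chunk is read into its own buffer, so GZipStream's
                // read-ahead cannot consume bytes belonging to the next chunk.
                byte[] chunk = reader.ReadBytes(reader.ReadInt32());
                using (GZipStream gzip = new GZipStream(
                    new MemoryStream(chunk), CompressionMode.Decompress))
                {
                    int read;
                    while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
                        result.Write(buffer, 0, read);
                }
            }
            return result.ToArray();
        }
    }
}
```

Calling ChunkedGZip.Compress(data, file, Environment.ProcessorCount) would use one chunk per core. Because each chunk is compressed independently, the result is likely to be somewhat larger than a single GZipStream pass over the same data, and the output is not a standard gzip file.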