Dr. Dobb's Journal - November 2007 - (Page 30)

Core Technology

Distributed Computing: Windows and Linux
A cluster for analyzing hundreds of gigabytes of data

by Mohamed Abo El-Fotouh and Klaus Diepold

Mohamed is a Ph.D. student at Munich University of Technology. Klaus is the head of the Institute for Data Processing at Munich University of Technology. They can be contacted at mohamed@tum.de and kldi@tum.de, respectively.

In this article, we discuss a distributed system in which a single Windows machine controls a Linux cluster. We implemented this system as part of an investigation into the security of block ciphers. In particular, we were analyzing how suitable block ciphers are as random number generators. To this end, we used the NIST statistical suite for testing the randomness of specific data sets (csrc.nist.gov/rng). For each data set, we performed 188 different statistical tests, measuring the percentage of the tests each cipher passes. The two main parameters we varied were the keys and the plaintext.

To generate and analyze the data, we built “System1” under Windows on a single machine. This system could easily process the megabytes of generated data our analysis required. Before long, however, the number and size of the data sets increased dramatically (as we decided to study the randomness of block cipher modes of operation), and System1 could no longer analyze the data in a reasonable time. Consequently, we built “System2,” a Linux-based cluster that works in parallel on several machines to analyze hundreds of gigabytes of data; see Figure 1.

The Windows-based System1 consists of the following programs:

• The Generator, written in Visual C++, which generates statistical data sets.
• The NIST statistical suite, written in C, which processes the data sets produced by the Generator.
• The Extractor, written in Visual Basic 6, which extracts and summarizes the results produced by the NIST statistical suite.

For its part, System2 works as follows:

• Data sets are generated with the Generator on the Windows machine.
• These data sets are then transferred to a shared hard-disk partition on the Linux cluster.
• The NIST statistical suite processes the data sets on the Linux cluster.
• The results are then transferred back to the Windows machine.
• Finally, the data are extracted and summarized by the Extractor on the Windows machine.

Our biggest challenges were getting the System2 programs to communicate with each other and automating the system as much as possible. To accomplish this, we had to modify the existing programs and write new ones. In a nutshell, we decided to divide System2 into three parts:

• Generating data sets and transmitting them to the Linux cluster.
• Processing data on the Linux cluster.
• Transmitting results to the Windows machine for analysis.

Once we had generated the data using the Generator, our next problem was how to transfer it to the Linux cluster. We tried several techniques, including generating all data sets and then transferring them via the Samba client. The problem was that we had to wait until all the data was generated (which could take a couple of days) and then start the Samba client from a Linux emulator under Windows (Cygwin). Clearly, this didn’t suit our needs.

In another attempt, we started WinSCP during generation of the data sets using the command “Keep the remote directory up to date,” and raised the priority of WinSCP above that of the Generator so that data would be transferred as soon as it was created. The problem with this approach was that we ran out of space on the Windows machine, as all the
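To make the measurement described at the start of the article concrete (188 statistical tests per data set, reported as the percentage passed), here is a minimal Python sketch. It is not the authors' Extractor. The function name, the layout of the results, and the 0.01 significance threshold are all assumptions made for illustration, and the p-values are invented placeholders.

```python
from typing import Mapping

# Assumed significance level; the article does not state which one was used.
ALPHA = 0.01


def pass_percentage(p_values: Mapping[str, float], alpha: float = ALPHA) -> float:
    """Return the percentage of tests whose p-value is at least alpha."""
    if not p_values:
        raise ValueError("no test results supplied")
    passed = sum(1 for p in p_values.values() if p >= alpha)
    return 100.0 * passed / len(p_values)


# Hypothetical results for one data set: 188 tests, the last 4 of which fail.
results = {f"test_{i:03d}": 0.5 for i in range(184)}
results.update({f"test_{i:03d}": 0.001 for i in range(184, 188)})

print(f"{pass_percentage(results):.2f}% of {len(results)} tests passed")
```

Keeping the threshold as a parameter lets the same function be reused if a test counts as passed under a different criterion than the one assumed here.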