3 - Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching

Published online by Cambridge University Press:  05 December 2012

Antonino Tumeo
Affiliation:
Pacific Northwest National Laboratory
Oreste Villa
Affiliation:
Pacific Northwest National Laboratory
Daniel Chavarría-Miranda
Affiliation:
Pacific Northwest National Laboratory
Ian Gorton
Affiliation:
Pacific Northwest National Laboratory, Washington
Deborah K. Gracio
Affiliation:
Pacific Northwest National Laboratory, Washington

Summary

Introduction

Data-intensive applications have special characteristics that in many cases prevent them from executing well on traditional cache-based processors. They can have highly irregular access patterns with very little locality, which do not match the expectations of automatically controlled caches. In other cases, such as when they process streaming data, they have no temporal locality at all and only limited spatial locality, which reduces the effectiveness of caches.

We present an application-driven study of several architectures that are suitable for data-intensive algorithms. Our chosen application is high-speed string matching, which exhibits two key properties of data-intensive codes: highly irregular access patterns and high-speed streaming data. Irregular access patterns appear in string matching when traversing graph-based representations of the pattern dictionaries being used. String matching is typically used in cybersecurity applications to scan incoming network traffic or files for the presence of signatures (such as specific sequences of symbols), which may relate to attack patterns, viruses, or other malware.

String Matching

String matching algorithms detect the presence of one or more known symbol sequences inside the analyzed data sets. Besides their well-known application to databases and text processing, they are the basis of several other critical, real-world applications. String matching algorithms are key components of DNA and protein sequencing; data mining; security systems, such as Intrusion Detection Systems (IDS) for Networks (NIDS), Applications (APIDS), Protocols (PIDS), or Systems (host-based IDS [HIDS]); anti-virus software; and machine learning problems.
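As an illustration of matching several known symbol sequences in a single pass over streaming input, the following is a minimal Python sketch of the classic Aho-Corasick automaton. This summary does not say which algorithm the chapter uses, so the choice of Aho-Corasick is an assumption, and the names `build_automaton` and `search` are hypothetical. The trie plus failure links is one example of a graph-based dictionary representation: each input symbol triggers a data-dependent jump between states, which is the kind of irregular access pattern described in the Introduction.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie edges, failure links, and per-state outputs (illustrative sketch)."""
    goto = [{}]  # goto[state][symbol] -> next state
    fail = [0]   # fail[state] -> state for the longest proper suffix that is a trie prefix
    out = [[]]   # out[state] -> patterns that end when this state is reached

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: breadth-first traversal to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0 (root)
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Scan text once and return (start_index, pattern) for every match."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or the root is reached.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))
```

The input is consumed one symbol at a time and never revisited, so the scan itself has a streaming character, while the state tables are accessed in an order that depends entirely on the data. A dictionary-of-dictionaries layout is used here for brevity; a performance-oriented implementation would likely use a dense transition table instead.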

Type: Chapter
Information: Data-Intensive Computing: Architectures, Algorithms, and Applications, pp. 24-47
Publisher: Cambridge University Press
Print publication year: 2012
