The DRC RegEx100B™ engine delivers 100,000,000,000 (100 billion) character matches per second. Character patterns are specified using standard POSIX regular expressions. Actionable intelligence is delivered in real time on unstructured, unindexed text.

Background

With over 10 trillion SMS messages sent in 2012 alone, people are rapidly shifting P2P communications from conventional semi-structured mechanisms, like email, to unstructured, casual messaging. In social messaging, shorthand and misspellings are common. Regular expressions are a well-known and heavily used way to specify the character patterns needed to detect key phrases.

Solution

The DRC RegEx100B analyzer addresses this need by performing up to 100 billion character matches per second on real-time data. Up to 1,000 character patterns are compared in parallel against input text that can be streamed in at 100 million characters per second. Patterns are specified using standard POSIX regular expressions. Users can also update the RegEx engine with new patterns in real time with no perceivable delay or data loss. Thus, an unlimited number of character patterns can be used on the same data set. Matched patterns are output to the user in real time as they are found. Indexing is not required.
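The behaviour described above (many patterns applied to each incoming message, matches emitted as they are found, patterns updated while data keeps flowing) can be sketched in software. This is only an illustration, not the RegEx100B engine or its interface: the `PatternSet` class and its method names are hypothetical, the patterns run one after another rather than in parallel, and Python's `re` module is not a POSIX matcher (for example, it picks the first matching alternative rather than the longest). The example patterns are chosen so that they mean the same thing in both syntaxes.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple


class PatternSet:
    """Hypothetical software stand-in for one engine's pattern table."""

    MAX_PARALLEL = 1000  # per-engine parallel pattern limit quoted in the text

    def __init__(self) -> None:
        self._compiled: Dict[str, re.Pattern] = {}

    def update(self, name: str, expression: str) -> None:
        """Add or replace a named pattern; it applies from the next message on."""
        is_new = name not in self._compiled
        if is_new and len(self._compiled) >= self.MAX_PARALLEL:
            raise ValueError("pattern table is full")
        self._compiled[name] = re.compile(expression)

    def scan(self, messages: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
        """Yield (message_index, pattern_name, matched_text) as matches are found."""
        for index, text in enumerate(messages):
            # Snapshot the table so update() may be called between messages.
            for name, regex in list(self._compiled.items()):
                for match in regex.finditer(text):
                    yield index, name, match.group()


patterns = PatternSet()
patterns.update("later", r"l8r|later")          # shorthand and full spelling
patterns.update("great", r"gr(8|eat)")
patterns.update("phone", r"[0-9]{3}-[0-9]{4}")

messages = ["c u l8r", "that was gr8, call 555-0100", "nothing here"]
for hit in patterns.scan(messages):
    print(hit)
```

Because `scan` is a generator, no index is built and nothing is buffered beyond the current message, and a caller can invoke `update` between messages to change the pattern set without restarting the scan.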


Dense packaging

Each RegEx100B server contains up to four RegEx engines packaged in a 1U rack-mountable configuration.

Highly flexible expression engine

Each analyzer engine can be individually configured to handle hundreds of search strings of variable length simultaneously.

Very low latency

Because incoming data is analyzed in real time, microsecond response times are achieved.

Multi-byte support

Each expression can consist of multi-byte characters, enabling all languages to be supported.

Massive Scalability

Clustered RegEx analyzers enable tens of thousands of expressions, petabytes of data and thousands of users to be supported simultaneously.

Ultra-low energy consumption

Each RegEx server with 4 analyzer engines requires less than 400 watts of power.

Cloud ready

Analyzers can be cloud based.
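To illustrate the multi-byte support listed above, the sketch below matches a pattern that mixes Chinese text with messaging shorthand and shows how the same message occupies a different number of bytes in 8-, 16- and 32-bit encodings. It assumes those widths correspond to something like UTF-8, UTF-16 and UTF-32, which the text does not state, and it uses Python's `re` module rather than a POSIX matcher. The function name `encoded_sizes` is hypothetical.

```python
import re

# One message mixing Chinese text with English messaging shorthand.
text = "会议改到明天 see u 2moro"

# A single expression covering three spellings of "tomorrow".
pattern = re.compile("明天|2moro|tomorrow")
matches = [m.group() for m in pattern.finditer(text)]


def encoded_sizes(s: str) -> dict:
    """Return the byte length of s in three encodings (assumed, not from the spec)."""
    return {
        "utf-8": len(s.encode("utf-8")),
        "utf-16": len(s.encode("utf-16-le")),  # -le avoids a byte-order mark
        "utf-32": len(s.encode("utf-32-le")),
    }


print(matches)
print(len(text), encoded_sizes(text))
```

The pattern is written once in terms of characters. Only the byte representation of the input changes with the encoding.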


Specification (per analyzer engine):

Performance

Character matches / second: 100,000,000,000
Detection latency: 10^-5 s (10 µs)

Character patterns

Patterns executed in parallel: 1,000
Maximum pattern length: 64K characters
Number of patterns: unlimited

Input Data

Character encoding: 8-, 16-, or 32-bit
Data throughput: 100 MB/s
Maximum length: unlimited

Each 1U server contains up to 4 analyzer engines.

1U servers can be clustered.
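The headline figures follow from one another by simple arithmetic, which the sketch below makes explicit: 1,000 parallel patterns times 100 million characters per second gives 100 billion character matches per second. The sizing helper rests on an assumption the text does not spell out, namely that a cluster splits a large pattern set into slices of 1,000 patterns per engine while every engine sees the full input stream. The function names are hypothetical, and the energy figure is only an upper bound derived from the "less than 400 watts" statement.

```python
import math

# Figures quoted in the specification.
PATTERNS_PER_ENGINE = 1_000
CHARS_PER_SECOND = 100_000_000
ENGINES_PER_SERVER = 4
MAX_SERVER_WATTS = 400


def matches_per_second(patterns: int = PATTERNS_PER_ENGINE,
                       chars_per_second: int = CHARS_PER_SECOND) -> int:
    """Every pattern is compared against every incoming character."""
    return patterns * chars_per_second


def servers_needed(total_patterns: int) -> int:
    """Assumed sizing rule: one 1,000-pattern slice per engine, 4 engines per 1U."""
    engines = math.ceil(total_patterns / PATTERNS_PER_ENGINE)
    return math.ceil(engines / ENGINES_PER_SERVER)


def max_joules_per_match() -> float:
    """Upper bound on energy per character match for a fully loaded server."""
    return MAX_SERVER_WATTS / (ENGINES_PER_SERVER * matches_per_second())


print(matches_per_second())
print(servers_needed(20_000))
print(max_joules_per_match())
```

Under these assumptions, 20,000 simultaneous expressions would need 20 engines, or five 1U servers, and a fully loaded server spends at most about one nanojoule per character match.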