Web Application Defense: Bayesian Attack Analysis


Regular Expressions for Input Validation

If your web application's defensive strategy against injection attacks relies solely on blacklist regular expressions for input validation, it is only a matter of time before an attacker finds an evasion.  Want proof?  Check out our SQL Injection Challenge post mortem.  To be clear, regular expressions do have value for input validation in either of two models:
  • Positive Security Model (Whitelisting) - web application developers (Builders) define what is acceptable input and deny anything that does not match.  An example is the OWASP Validation Regex Repository.
  • Negative Security Model (Blacklisting) - web application defenders (Defenders) define filters to detect and block known malicious input.  An example is the OWASP ModSecurity Core Rule Set.
Organizations get into trouble when they do not use a positive security model and instead rely only on a negative security model to try to block attacks.
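To make the distinction concrete, here is a minimal Python sketch that compares the two models on a single form field.  The SSN-style whitelist pattern is hypothetical, chosen to fit the txtSocialScurityNo parameter that appears later in this post; the blacklist is only the union/select expression from CRS rule 959047 (discussed below), taken in isolation rather than the full rule set.

```python
import re

# Positive model: a hypothetical whitelist describing the only acceptable
# format for an SSN-style field (NNN-NN-NNNN).
SSN_WHITELIST = re.compile(r"\d{3}-\d{2}-\d{4}")

# Negative model: a single blacklist signature (the union/select pattern
# used by CRS rule 959047), applied on its own for illustration.
UNION_SELECT_BLACKLIST = re.compile(r"\bunion\b.{1,100}?\bselect\b", re.IGNORECASE)


def positive_model_accepts(value: str) -> bool:
    """Accept only values that fully match the whitelist."""
    return SSN_WHITELIST.fullmatch(value) is not None


def negative_model_accepts(value: str) -> bool:
    """Accept anything the blacklist signature does not flag."""
    return UNION_SELECT_BLACKLIST.search(value) is None


if __name__ == "__main__":
    for value in ["123-12-9045", "123-12-9045' or '2' < '5' ;--"]:
        print(repr(value), positive_model_accepts(value), negative_model_accepts(value))
```

The injected value is rejected by the whitelist but slips past this one blacklist signature.  The real CRS layers many signatures (the debug log later in this post shows rule 950901 catching this exact payload), which is why attackers have to iterate to find an evasion.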

Blacklist Filter Evasion Analysis

While conducting our SQL Injection Challenge post mortem analysis, we identified the following common methodology used by attackers:
  • Use DAST tools to identify injection points.  Tools included:
    • Arachni
    • Sqlmap
    • Havij
    • Netsparker
  • Manual testing to develop a working evasion payload.  This consisted of an iterative process of trial and error:
    1. Send SQL payload and inspect DB error response
    2. Use obfuscation techniques (such as Sqlmap's Tamper Scripts)
    3. Send attack and observe the response
    4. Repeat steps 2 and 3
Here is an example of the iterative testing process:
div 1 union%23%0Aselect 1,2,current_user
div 1 union%23foo*/*bar%0Aselect 1,2,current_user
div 1 union%23foofoofoofoo*/*bar%0Aselect 1,2,current_user
div 1 union%23foofoofoofoofoofoofoofoofoofoo*/*bar%0Aselect 1,2,current_user
…
div 1 union%23foofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoo*/*bar%0Aselect 1,2,current_user
The final payload successfully bypassed the following OWASP ModSecurity CRS SQL Injection rule because the rule only matches when 1 to 100 characters separate the union and select keywords, and the padding pushed that distance past 100:
SecRule REQUEST_FILENAME|ARGS_NAMES|ARGS|XML:/* \
"\bunion\b.{1,100}?\bselect\b" \
"phase:2,rev:'2.2.0',capture,t:none,t:urlDecodeUni,t:htmlEntityDecode,t:lowercase, t:replaceComments,t:compressWhiteSpace,ctl:auditLogParts=+E,block,msg:'SQL Injection Attack',id:'959047',tag:'WEB_ATTACK/SQL_INJECTION',tag:'WASCTC/WASC-19',tag:'OWASP_TOP_10/A1',tag:'OWASP_AppSensor/CIE1',tag:'PCI/6.5.2',logdata:'%{TX.0}',severity:'2',setvar:'tx.msg=%{rule.msg}',setvar:tx.sql_injection_score=+%{tx.critical_anomaly_score},setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},setvar:tx.%{rule.id}-WEB_ATTACK/SQL_INJECTION-%{matched_var_name}=%{tx.0}"
How long does this trial-and-error process last?  It depends on the skill level of the attacker and on whether detailed SQL error messages are returned (if they are not, it turns into a blind SQL injection attack, which usually takes more time).  Here is a quick table listing some time-to-evasion statistics from the SQL Injection Challenge:

[Image: table of time-to-evasion statistics from the SQL Injection Challenge]
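The padding attack against rule 959047 can be reproduced offline.  The Python sketch below approximates the rule's normalization with URL decoding, lowercasing and whitespace compression only.  It does not emulate t:htmlEntityDecode or t:replaceComments, so it is a simplification rather than a faithful ModSecurity emulation.  It then searches for the smallest number of "foo" repetitions at which the union/select pattern stops matching.  The payload template mirrors the trial-and-error examples above; the helper names are illustrative.

```python
import re
from typing import Optional
from urllib.parse import unquote

# The union/select pattern from CRS rule 959047.
RULE = re.compile(r"\bunion\b.{1,100}?\bselect\b")


def normalize(payload: str) -> str:
    """Rough stand-in for t:urlDecodeUni, t:lowercase and t:compressWhiteSpace."""
    decoded = unquote(payload).lower()
    return re.sub(r"\s+", " ", decoded)


def rule_matches(payload: str) -> bool:
    return RULE.search(normalize(payload)) is not None


def padded_payload(repeats: int) -> str:
    """Build the padded payload shape used in the trial-and-error examples."""
    return "div 1 union%23" + "foo" * repeats + "*/*bar%0Aselect 1,2,current_user"


def first_evading_repeat(limit: int = 100) -> Optional[int]:
    """Smallest padding count for which the rule no longer matches."""
    for repeats in range(1, limit + 1):
        if not rule_matches(padded_payload(repeats)):
            return repeats
    return None


if __name__ == "__main__":
    print(rule_matches("div 1 union%23%0Aselect 1,2,current_user"))  # True
    print(first_evading_repeat())
```

Under this simplified model, the gap between the keywords is 3n + 8 characters for n repetitions, so the rule stops matching at 31 repetitions (101 characters), consistent with the roughly 100 characters of padding in the final payload above.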

Blacklist Filter Evasion Conclusion

  • Blacklist filtering alone will only slow down determined attackers
  • Attackers need to try many permutations to identify a working filter evasion
  • The OWASP ModSecurity Core Rule Set’s blacklist SQLi signatures caught several hundred attempts before an evasion was found

New Questions

  • How can we use this methodology to our advantage?
  • RegEx detection is binary
    • The operator either matched or it didn’t
    • Need a method of detecting attack probability
  • What detection technique can we use other than regular expressions?

Using Bayesian Analysis

Bayesian analysis has achieved great results in anti-spam efforts for email.  Why can’t we use the same detection logic for HTTP data?  Conceptually, we need to map each part of the email-based Bayesian approach to its HTTP equivalent:
  • Data Source
    • Email – OS-level text files
    • HTTP – text taken directly from the HTTP transaction
  • Data Format
    • Email – MIME headers + email body
    • HTTP – URI + request headers + parameters
  • Data Classification
    • Non-malicious HTTP request = HAM
    • HTTP attack payloads = SPAM
Conceptually, we should be able to analyze HTTP request traffic using Bayesian analysis to identify an attack probability.  Now we just need to figure out what Bayesian tool to use and how to pass live HTTP data to it!
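As a minimal illustration of turning HAM/SPAM training into an attack probability, here is a toy multinomial naive Bayes classifier in Python.  It is deliberately not OSBF (no sparse bigrams, no confidence factor); it only shows the idea of training two classes on whitespace tokens and computing P(SPAM | payload).  The training strings are taken from examples in this post, and the class and method names are hypothetical.

```python
import math
from collections import Counter


class ToyBayes:
    """Multinomial naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self) -> None:
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.docs = {"ham": 0, "spam": 0}

    def train(self, label: str, text: str) -> None:
        self.counts[label].update(text.split())
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        """Return P(spam | text); both classes must have been trained."""
        vocab = set(self.counts["ham"]) | set(self.counts["spam"])
        total_docs = self.docs["ham"] + self.docs["spam"]
        log_scores = {}
        for label, counts in self.counts.items():
            # Log prior plus the smoothed log likelihood of every token.
            score = math.log(self.docs[label] / total_docs)
            denom = sum(counts.values()) + len(vocab)
            for token in text.split():
                score += math.log((counts[token] + 1) / denom)
            log_scores[label] = score
        # Convert log scores to probabilities without overflow.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / sum(weights.values())


if __name__ == "__main__":
    clf = ToyBayes()
    clf.train("spam", "12'UNION/*!00909SELECT 1,2,3,4,5,6,7,8,9 --")
    clf.train("spam", "123-12-9045' or '2' < '5' ;--")
    clf.train("ham", "this is just normal text.")
    clf.train("ham", "123 Someplace Dr.")
    print(clf.spam_probability("1' or '2' < '5' ;--"))
    print(clf.spam_probability("just normal text."))
```

With this tiny training set, the unseen SQLi variant scores about 0.96 SPAM and the normal sentence about 0.10, instead of a binary match/no-match answer.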

OSBF-Lua + ModSecurity's Lua API = Win

In order to extend ModSecurity's capabilities, we can use the flexible Lua API to add inspection logic.  After some searching on the inter-webs, I was able to find the following Lua packages for Bayesian analysis:
OSBF-Lua by Fidelis Assis - http://osbf-lua.luaforge.net/
  • Orthogonal Sparse Bigrams with Confidence Factor (OSBF)
  • Uses whitespace characters for tokenization (which means that meta-characters such as quotes and comment markers stay inside the tokens and are factored in)
  • Very fast
  • Accurate classifiers
Moonfilter by Christian Siefkes - http://www.siefkes.net/software/moonfilter/
  • Wrapper script for OSBF
  • Integrates with ModSecurity’s Lua API
Once you have installed moonfilter.lua, edit the file and remove "local" from the last line of the exported configuration variables so that it reads "text = nil", as in the final line below:
----- Exported configuration variables ---------------------------

-- Minimum absolute pR a correct classification must get not to 
-- trigger a reinforcement.
threshold = 20
-- Number of buckets in the database. The minimum value 
-- recommended for production is 94321.
buckets = 94321
-- Maximum text size, 0 means full document (default). A 
-- reasonable value might be 500000 (half a megabyte).
max_text_size = 0
-- Minimum probability ratio over the classes a feature must have 
-- not to be ignored. 1 means ignore nothing (default).
min_p_ratio = 1
-- Token delimiters, in addition to whitespace. None by default, 
-- could be set e.g. to ".@:/".
delimiters = ""
-- Whether text should be wrapped around (by re-appending the 
-- first 4 tokens after the last).
wrap_around = true
-- The directory where class database files are stored. Defaults 
-- to the current working directory (empty string). Note that the 
-- directory name MUST end in a path separator (typically '/' or 
-- '\', depending on your OS) in all other cases. Changing this 
-- value will only affect future calls to the |classes| command; 
-- it won't change the location of currently active classes.
classdir = ""
-- The text to classify/train as a string -- can be set explicitly 
-- if desired
text = nil
Removing "local" makes text a global variable, which allows us to pass HTTP payload data directly from ModSecurity.  The moonrunner script is very useful for managing your HAM/SPAM training files.  Here is an example session that creates the initial HAM/SPAM training DB files:
# ./moonrunner.lua
classes /var/log/httpd/spam /var/log/httpd/ham
classes ok
create
create ok
stats /var/log/httpd/spam
stats ok: "-- Statistics for /var/log/httpd/spam.cfc\
Database version:                    OSBF-Bayes\
Total buckets in database:                94321\
Buckets used (%):                           0.0\
Trainings:                                    0\
Bucket size (bytes):                         12\
Header size (bytes):                       4092\
Number of chains:                             0\
Max chain len (buckets):                      0\
Average chain length (buckets):               0\
\
The classes, create and stats lines are the commands entered; the lines containing "ok" are moonrunner's responses.  As you can see, these commands create the ham/spam classification files (ham.cfc and spam.cfc) under the normal Apache logging directory.  Make sure that these files have read/write permissions for the Apache user:
# ls -l *.cfc
-rw------- 1 root root 1135948 Feb 18 14:42 ham.cfc
-rw------- 1 root root 1135948 Feb 18 14:43 spam.cfc
# chown apache:apache *.cfc
# ls -l *.cfc
-rw------- 1 apache apache 1135948 Feb 18 14:42 ham.cfc
-rw------- 1 apache apache 1135948 Feb 18 14:43 spam.cfc
You can also conduct ham/spam training manually from within moonrunner if you wish.  Here is an example demonstrating SQLi SPAM training:
readuntil <EOF>
12'UNION/*!00909SELECT 1,2,3,4,5,6,7,8,9 --
<EOF>
readuntil ok
train /var/log/httpd/spam
Invoking classify for ''
train ok: misclassified=false reinforced=true
stats /var/log/httpd/spam
stats ok: "-- Statistics for /var/log/httpd/spam.cfc\
Database version:                    OSBF-Bayes\
Total buckets in database:                94321\
Buckets used (%):                           0.0\
Trainings:                                    1\
Bucket size (bytes):                         12\
Header size (bytes):                       4092\
Number of chains:                            32\
Max chain len (buckets):                      1\
Average chain length (buckets):               1\
You can also train on some normal, non-malicious text as ham:
readuntil <EOF>
this is just normal text.
<EOF>
readuntil ok
train /var/log/httpd/ham
Reusing stored result for ''
train ok: misclassified=true reinforced=false
stats /var/log/httpd/ham
stats ok: "-- Statistics for /var/log/httpd/ham.cfc\
Database version:                    OSBF-Bayes\
Total buckets in database:                94321\
Buckets used (%):                           0.0\
Trainings:                                    1\
Bucket size (bytes):                         12\
Header size (bytes):                       4092\
Number of chains:                            43\
Max chain len (buckets):                      1\
Average chain length (buckets):               1\
After some training, you can then attempt to classify unknown data:
readuntil <EOF>
1'UNION/*!0SELECT user,2,3,4,5,6,7,8,9/*!0from/*!0mysql.user/*-
<EOF>
readuntil ok
classify
classify ok: prob=0.73998695843754 probs=[ 0.73998695843754 0.26001304156246 ] class=/var/log/httpd/spam pR=0.26799507117831 reinforce=true
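When scripting around moonrunner, it can help to pull the fields out of a classify response.  The Python sketch below parses the line format shown in this session; the regular expression is written against these sample lines only, not against a documented output grammar, so treat it as an assumption.  It also checks the reinforcement rule described in the moonfilter configuration comments (a correct classification whose absolute pR is below threshold, 20 in the configuration above, triggers reinforcement), which is consistent with reinforce=true for pR=0.268.  The function names are hypothetical.

```python
import re
from typing import Any, Dict

CLASSIFY_RE = re.compile(
    r"prob=(?P<prob>[\d.]+)\s+probs=\[(?P<probs>[^\]]*)\]\s+"
    r"class=(?P<cls>\S+)\s+pR=(?P<pr>-?[\d.]+)\s+reinforce=(?P<reinforce>\w+)"
)


def parse_classify(line: str) -> Dict[str, Any]:
    """Extract the fields of a 'classify ok:' line printed by moonrunner."""
    match = CLASSIFY_RE.search(line)
    if match is None:
        raise ValueError(f"not a classify response: {line!r}")
    return {
        "prob": float(match["prob"]),
        "probs": [float(p) for p in match["probs"].split()],
        "class": match["cls"],
        "pR": float(match["pr"]),
        "reinforce": match["reinforce"] == "true",
    }


def needs_reinforcement(p_r: float, threshold: float = 20.0) -> bool:
    """Mirror of the moonfilter threshold comment: |pR| below threshold reinforces."""
    return abs(p_r) < threshold


if __name__ == "__main__":
    sample = ("classify ok: prob=0.73998695843754 probs=[ 0.73998695843754 "
              "0.26001304156246 ] class=/var/log/httpd/spam pR=0.26799507117831 "
              "reinforce=true")
    result = parse_classify(sample)
    print(result["class"], result["prob"], needs_reinforcement(result["pR"]))
```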
Now that we have demonstrated using moonrunner to train/classify data as ham/spam, we next need to hook this into ModSecurity and the OWASP CRS.

Theory of Operation

The theory of operation is that we want regular, non-malicious users to help train our classifiers on HAM data.  This is achieved by checking the CRS anomaly score: if it is 0, we extract the payload data and train OSBF's HAM classifier.  On the flip side, if an attacker starts sending SQLi attacks, the OWASP CRS will identify these initial attacks and their payloads are used to train OSBF's SPAM classifier.  Here is a visual representation:
[Figure: HAM/SPAM training workflow driven by the OWASP CRS anomaly score]
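The decision flow can be summarized as a small Python function.  This is a simplified model, written for illustration, of what the three rules in the Bayesian Analysis rules file do together: the function name and parameters are hypothetical, and bayes_flags_attack stands in for whatever verdict bayes_check_spam.lua reaches.

```python
from typing import List


def plan_actions(crs_anomaly_score: int, classify_enabled: bool,
                 bayes_flags_attack: bool = False) -> List[str]:
    """Simplified model of the Bayesian training/classification flow."""
    if crs_anomaly_score > 0:
        # A CRS regex rule caught the payload: use it to train the SPAM class.
        return ["train_spam"]
    actions: List[str] = []
    if classify_enabled:
        # Only requests with no CRS alerts are sent to the classifier.
        actions.append("classify")
        if bayes_flags_attack:
            # The Bayesian rule blocks and raises the anomaly score, so the
            # request is never trained as HAM.
            actions.append("block")
            return actions
    # Nothing flagged the request: its payloads are trained as HAM.
    actions.append("train_ham")
    return actions
```

Keeping the classify step disabled at first means every clean request simply feeds the HAM class while CRS alerts feed the SPAM class.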
We just added a new Bayesian Analysis rules file to the experimental_rules directory of the OWASP ModSecurity CRS GitHub repository with the following contents:
SecRule TX:'/^\\\d.*WEB_ATTACK/' ".*" "phase:2,t:none,log,pass,logdata:'%{tx.bayes_msg}',exec:lua/bayes_train_spam.lua"

SecRuleScript lua/bayes_check_spam.lua "phase:2,t:none,block,msg:'Bayesian Analysis Detects Probable Attack.',logdata:'Score: %{tx.bayes_score}',severity:'2',tag:'WEB_ATTACK/SQL_INJECTION',tag:'WASCTC/WASC-19',tag:'OWASP_TOP_10/A1',tag:'OWASP_AppSensor/CIE1',tag:'PCI/6.5.2',setvar:'tx.msg=%{rule.msg}',setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},setvar:tx.%{rule.id}-WEB_ATTACK/BAYESIAN-%{matched_var_name}=%{tx.0}"

SecRule &TX:ANOMALY_SCORE "@eq 0" "phase:5,t:none,log,pass,logdata:'%{tx.bayes_msg}',exec:lua/bayes_train_ham.lua"
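The spam-training rule uses a regular expression as its variable selector, so it selects every TX variable whose name starts with a digit and contains WEB_ATTACK.  CRS rules create such names through setvar actions like tx.%{rule.id}-WEB_ATTACK/SQL_INJECTION-%{matched_var_name} (see rule 959047 earlier in this post).  This Python sketch applies what we assume is the intended pattern, ^\d.*WEB_ATTACK, once the configuration-file escaping is removed, to a hypothetical set of TX names; the first name matches the one that appears in the spam-training debug log in this post.

```python
import re
from typing import Iterable, List

# Assumed intent of the selector TX:'/^\\\d.*WEB_ATTACK/' once the
# configuration-file escaping is removed.
SPAM_TX_NAME = re.compile(r"^\d.*WEB_ATTACK")


def spam_training_targets(tx_names: Iterable[str]) -> List[str]:
    """Return the TX variable names the spam-training rule would select."""
    return [name for name in tx_names if SPAM_TX_NAME.search(name)]


if __name__ == "__main__":
    names = [
        "950901-WEB_ATTACK/SQL_INJECTION-ARGS:txtSocialScurityNo",
        "anomaly_score",
        "sql_injection_score",
        "msg",
    ]
    print(spam_training_targets(names))
```

Bookkeeping variables such as anomaly_score are skipped, so only payloads that a CRS attack rule actually matched end up in the SPAM class.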
For initial deployment, it is probably best to comment out the SecRuleScript line until you have let the ham/spam training scripts run for a while and conducted some training.  As an example, if a normal user were to submit a non-malicious web form that did not increase the CRS anomaly score, here is what the debug log output from the bayes_train_ham.lua script would look like:
Lua: Executing script: lua/bayes_train_ham.lua
 Arg Name: ARGS:txtFirstName and Arg Value: Bob.
 Arg Name: ARGS:txtLastName and Arg Value: Smith.
 Arg Name: ARGS:txtSocialScurityNo and Arg Value: 123-12-9045.
 Arg Name: ARGS:txtDOB and Arg Value: 1958-12-12.
 Arg Name: ARGS:txtAddress and Arg Value: 123 Someplace Dr..
 Arg Name: ARGS:txtCity and Arg Value: Fairfax.
 Arg Name: ARGS:drpState and Arg Value: VA.
 Arg Name: ARGS:txtTelephoneNo and Arg Value: 703-794-2222.
 Arg Name: ARGS:txtEmail and Arg Value: bob.smith@mail.com.
 Arg Name: ARGS:txtAnnualIncome and Arg Value: $90,000.
 Arg Name: ARGS:drpLoanType and Arg Value: Car.
 Arg Name: ARGS:sendbutton1 and Arg Value: Submit.
 Low Bayesian Score: . Training payloads as non-malicious.
 Setting variable: tx.bayes_msg=Training payload as ham: Submit.
 Set variable "tx.bayes_msg" to "Training payload as ham: Submit."
Lua: Script completed in 5647 usec, returning: Training payloads as non-malicious: Submit..
Resolved macro %{tx.bayes_msg} to: Training payload as ham: Submit
Warning. Operator EQ matched 0 at TX. [file "/etc/httpd/crs/base_rules/modsecurity_crs_48_bayes_analysis.conf"
On the flip side, if an attacker starts sending SQLi attacks, the bayes_train_spam.lua script would process data like this:
Lua: Executing script: lua/bayes_train_spam.lua
 Set variable "MATCHED_VARS:950901-WEB_ATTACK/SQL_INJECTION-ARGS:txtSocialScurityNo" value "123-12-9045' or '2' < '5' ;--" size 29 to collection.
 Arg Name: MATCHED_VARS:950901-WEB_ATTACK/SQL_INJECTION-ARGS:txtSocialScurityNo and Arg Value: 123-12-9045' or '2' < '5' ;--.
 Train Results: {misclassified=false,reinforced=true}.
 Setting variable: tx.bayes_msg=Completed Bayesian SPAM Training on Payload: 123-12-9045' or '2' < '5' ;--.
 Set variable "tx.bayes_msg" to "Completed Bayesian SPAM Training on Payload: 123-12-9045' or '2' < '5' ;--.".
 Lua: Script completed in 2571 usec, returning: Completed Bayesian SPAM Training on Payload: 123-12-9045' or '2' < '5' ;--..
 Resolved macro %{tx.bayes_msg} to: Completed Bayesian SPAM Training on Payload: 123-12-9045' or '2' < '5' ;--.
 Warning. Pattern match ".*" at TX:950901-WEB_ATTACK/SQL_INJECTION-ARGS:txtSocialScurityNo. [file "/etc/httpd/crs/base_rules/modsecurity_crs_48_bayes_analysis.conf"
After training has run for a period of time, you can enable the SecRuleScript directive to run a classify check against requests that did not trigger any OWASP CRS alerts.  The idea is that even if an attacker finds a payload that evades the regular expressions, that final payload is likely to be similar enough to the earlier payloads that were caught (and trained as SPAM) for the Bayesian analysis to catch it.  For example, if we were to resend the example evasion payload shown at the beginning of this blog post, it would now trigger this Bayesian alert:
[Thu Sep 20 10:56:18 2012] [error] [client 72.192.214.223] ModSecurity: Warning. Bayesian Analaysis Alert for ARGS:err with payload: 
"Invalid Login: div 1 union#foofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoo*/*bar\nselect 1,2,current_user" [file "/etc/httpd/modsecurity.d/crs/base_rules/modsecurity_crs_48_bayes_analysis.conf"] [line "3"] 
[msg "Bayesian Analysis Detects Probable Attack."] 
[data "Score: {prob=0.98743609498952,probs={0.98743609498952,0.01256390501048},class=\\x22/var/log/httpd/spam\\x22,pR=1.1182767689974,reinforce=true}"] [severity "CRITICAL"] [tag "WEB_ATTACK/SQL_INJECTION"] [tag "WASCTC/WASC-19"] [tag "OWASP_TOP_10/A1"] [tag "OWASP_AppSensor/CIE1"] [tag "PCI/6.5.2"] [hostname "www.modsecurity.org"] [uri "/zero.webappsecurity.com/banklogin.asp"] [unique_id "KfQb7sCo8AoAAC33KoQAAAAK"]
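Why would the padded payload still look familiar to the classifier?  A rough Python sketch: URL-decode the payloads (ModSecurity hands already-decoded ARGS values to the Lua scripts, which unquote imitates here), split them on whitespace, and check how many tokens of the evasion payload were already seen in earlier attempts that the CRS caught.  Shared single tokens are only a crude proxy for OSBF's bigram features, and the 33 repetitions of "foo" are an assumption standing in for the final payload's padding.

```python
from typing import List, Set
from urllib.parse import unquote


def tokens(payload: str) -> List[str]:
    """URL-decode, then split on whitespace (as a whitespace tokenizer would)."""
    return unquote(payload).split()


# Earlier attempts from the trial-and-error sequence, assumed caught and trained.
caught = [
    "div 1 union%23%0Aselect 1,2,current_user",
    "div 1 union%23foo*/*bar%0Aselect 1,2,current_user",
    "div 1 union%23foofoofoofoo*/*bar%0Aselect 1,2,current_user",
]
# Final evasion payload; 33 repetitions of "foo" is an assumed padding length.
evasion = "div 1 union%23" + "foo" * 33 + "*/*bar%0Aselect 1,2,current_user"

seen: Set[str] = set()
for payload in caught:
    seen.update(tokens(payload))

evasion_tokens = tokens(evasion)
shared = [t for t in evasion_tokens if t in seen]
print(shared)
print(f"{len(shared)}/{len(evasion_tokens)} tokens already seen")
```

Only the padded union token is new; the surrounding tokens (and bigrams built from them) match earlier SPAM training, which is one intuition for why the classifier rated this request as a probable attack.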

Conclusion

This concept is considered experimental, as we need more ModSecurity users to test it.  If you are interested in testing, please download the latest CRS from GitHub.  If you have any questions, comments or recommendations, please use the OWASP ModSecurity CRS mailing list.
