
November 21, 2010

Understanding and using skipfish

Skipfish, my open source web application security scanner, is now about eight months old - and, across more than 70 releases, has undergone a number of substantial changes. While I maintain detailed documentation and a short troubleshooting guide, it seems appropriate to share some additional hints on how to get the most out of this tool, should you be inclined to try it out.

A word on design goals (and what they really mean)

While skipfish tries to scratch quite a few itches, its primary goals - and the areas where it hopefully stands out - are:
  • Raw speed: I am constantly frustrated by the performance and memory footprint of many of the open source and commercial scanners and brute-force tools I had to work with. Skipfish tries hard to improve this - and to my knowledge, has by far the fastest HTTP and content analysis engine out there.

    This does not mean that skipfish scans always take the least amount of time, compared to other tools; it simply means that I can cram a whole lot more functionality, and get much better coverage, without making the assessment unreasonably long.

    In cases where the server is the bottleneck, this can obviously backfire; but when dealing with slow targets, you can configure the scanner to get a reduced coverage - roughly comparable to a more traditional tool.

  • Unique brute-force capabilities: the performance of the scanner allowed me to incorporate an extensive ${keyword}.${extension} brute-force functionality similar to that of DirBuster - coupled with highly customized, hand-picked dictionaries, and a unique auto-learning feature that builds an adaptive, target-specific dictionary based on site content analysis.

    Most other scanners simply can't afford to let you do this in any meaningful way - and when they do, the dictionaries you can use with them are much less sophisticated, and far more request-intensive. I consider this functionality to be one of the more important assets of the tool - but you are certainly not forced to use it where impractical. The brute-force testing features are completely optional, and can be turned off to improve scan times by as much as 500-fold.

  • High quality security checks: most scanners employ fairly naive security logic - for example, to test cross-site scripting, they may attempt injecting <script>alert(1)</script>; to detect directory traversal, they may try ../../../../../etc/passwd; and to test SQL injection, they may attempt supplying technology-specific code and look for equally technology-specific output strings.

    Needless to say, all these checks have many painfully simple failure modes: the XSS check will result in a false negative when the input is partly escaped - or appears inside an HTML comment; traversal checks will fail if the application always appends a fixed extension to the input string, or is running within a chroot() jail; and SQL injection logic will break when dealing with an unfamiliar backend or an uncommon application framework.

    To address this, skipfish puts emphasis on well-crafted probes, and on testing for behavioral patterns, rather than signatures. For example, when testing for string-based SQL injection, we compare the results of passing '"original_value, \'\"original_value, and \\'\\"original_value. When the first response is similar to the third one, but different from the second one - we can, with a pretty high confidence, say that there is an underlying query injection vulnerability (even if query results can't be observed directly). Interestingly, this check is versatile enough to do a pretty good job detecting eval()-related vulnerabilities in PHP, and injection bugs in many other non-SQL query languages.

    Similarly, when probing for file inclusion, the scanner will try to compare the output of original_value, ./original_value, ../original_value, and .../original_value; if the first two cases result in similar output, and the two remaining ones result in a different outcome, we are probably dealing with a traversal problem. The redundancy helps rule out differences that can be attributed to input validation - and hey, that check also triggers on many remote file inclusion vectors in PHP.

    For XSS, skipfish does depend on content analysis - but instead of the standard practice of throwing the entire XSS cheatsheet at the target, it injects a complex string that is guaranteed to break out of many different parsing modes (and much less likely to fail with crude XSS filters in place): -->">'>'", followed by a uniquely numbered tag. The unique identifier enables stored XSS detection later on, and is also interpreted in a special way inside <script> or <style> blocks.

    There are many other design decisions along these lines; I believe they have a profound impact on the ability to detect real-world security problems - although paradoxically, they make the scanner perform poorly with simulated vulnerabilities in demo sites.

  • Coverage of more nuanced problems: most web application assessment tools simply pay no attention to the security risks caused by subtle MIME type or character set mismatches - and the awareness that these problems may lead to exploitable XSS vectors is very low within the security community.

    Skipfish makes a point of noticing these and many other significant security issues usually neglected by other tools - such as caching intent mismatches, mixed content issues, XSSI, third-party scripts, cross-site request forgery, and so forth.

    Quite a few people complained about "pointless" or "odd" warnings for problems thought to be non-security issues. Rest assured, most of these cases could be shown to be exploitable. When in doubt, don't hesitate to ping me for a second opinion!

  • Adaptive scanning for real-world applications: many scanners don't handle complex, mixed technology sites particularly well; a common headache is dealing with closely related requests being handled by different application backends with different URL semantics, error messages, or even case-sensitivity. Another hard problem is recognizing obscure 404 behaviors, unusual parameter passing conventions, redirection patterns, content duplication, and so forth. All this often requires repeated, painstaking configuration tweaks.

    While skipfish certainly is not perfect - no scanner can be - the code is designed to be able to cope with these scenarios exceptionally well. This is achieved chiefly by not relying on directory, file, or error message signatures - and instead, carrying out adaptive probes for every new fuzzed location; and quickly recognizing crawl tree branches that look very much alike. While heuristics can fail in unexpected ways, I think this approach is of immense value.

  • Sleek reports with very little noise: skipfish generally does not complain about highly non-specific "vulnerabilities" commonly reported by other scanners; for example, it does not pay special attention to every non-httponly cookie, to every password form with autocomplete enabled, or to framework version or system path disclosure patterns on various pages. This means that in practice, auditors will see fewer issues in a skipfish report than in the output of most other assessment tools - and this is not a bug.

    Rest assured, the interactive report produced after a scan includes summary sections where the auditor can review all password forms, cookies, and so forth if necessary - but the assumption is that human evaluation can't and should not be substituted here.

While skipfish certainly isn't perfect, these are the core properties I care about - and try to continually improve.
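
To make the behavioral checks described above a bit more concrete, here is a minimal Python sketch of the two comparisons outlined for query injection and file inclusion. It is not skipfish's actual code: the fetch callback (parameter value in, response body out), the similar() helper, and the 0.9 threshold are all assumptions made for illustration, and a real scanner would need a far more robust notion of "similar" pages.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical callback: takes a parameter value, returns the response body.
Fetch = Callable[[str], str]

BACKSLASH = "\\"


def similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """Crude stand-in for a page similarity test; the threshold is arbitrary."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def looks_like_query_injection(fetch: Fetch, value: str) -> bool:
    """Compare responses to bare, singly escaped, and doubly escaped quotes."""
    bare = fetch("'" + '"' + value)
    escaped = fetch(BACKSLASH + "'" + BACKSLASH + '"' + value)
    double_escaped = fetch(BACKSLASH * 2 + "'" + BACKSLASH * 2 + '"' + value)
    # Suspicious pattern: first response resembles the third, but not the second.
    return similar(bare, double_escaped) and not similar(bare, escaped)


def looks_like_traversal(fetch: Fetch, value: str) -> bool:
    """Compare responses to value, ./value, ../value, and .../value."""
    base = fetch(value)
    same_dir = fetch("./" + value)
    parent_dir = fetch("../" + value)
    bogus_dir = fetch(".../" + value)
    # Suspicious pattern: the first two agree, and the remaining two both differ.
    return (
        similar(base, same_dir)
        and not similar(base, parent_dir)
        and not similar(base, bogus_dir)
    )
```

The point of the redundancy is visible in the return statements: a single differing response proves very little, but a specific pattern of agreement and disagreement across several closely related probes is much harder to explain by input validation alone.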

The most important setting: dictionary modes

The single most misunderstood - and important - feature of skipfish is its dictionary management model. Quite simply, getting this part wrong can easily ruin your scans.

I encourage you to have a look at the recently revamped dictionaries/README-FIRST file, which explains the basics of dictionary management in greater detail; but at the very minimum, you should be aware of the following choice:

  • Absolutely no brute-force: in this mode, skipfish performs an orderly crawl of the target site, and behaves similarly to other basic scanners. The mode is not recommended in most cases due to limited coverage - resources such as /admin/ or /index.php.old may not be discovered - but is blazing fast. To use it, try:
    ./skipfish -W /dev/null -LV [...other options...]
  • Lightweight brute-force: in this mode, the scanner will only try fuzzing the file name (/admin/), or the extension (/index.php.old), but never both at the same time (/backup.tgz will typically not be hit). The cost of doing so is about 1,700 requests per fuzzed location. To use this mode, try:
    cp dictionaries/complete.wl dictionary.wl
    ./skipfish -W dictionary.wl -Y [...other options...]
    This mode is the preferred way of limiting scan time where fully-fledged brute-force testing is not feasible.

  • Normal dictionary brute-force: in this mode, the scanner will test all the possible file name and extension pairs (i.e., /backup.tgz will be discovered, too). The mode is significantly slower, but offers superior coverage - and should be your default pick in most cases. To enable it, try:
    cp dictionaries/minimal.wl dictionary.wl
    ./skipfish -W dictionary.wl [...other options...]
    The cost of this mode is about 50,000 requests per fuzzed location. You can replace minimal.wl with medium.wl or complete.wl for even better coverage, but at the expense of a 2x to 3x increase in scan time; see dictionaries/README-FIRST for an explanation of the difference between these files.
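
As a back-of-the-envelope aid for choosing between these modes, the sketch below turns the per-location request costs quoted above into a rough time estimate. The number of fuzzed locations and the sustained request rate are made-up inputs, not measurements, and the estimate covers brute-force requests only (crawling and security checks come on top of that).

```python
# Approximate brute-force cost per fuzzed location, as quoted in the text.
REQUESTS_PER_LOCATION = {
    "lightweight (-Y)": 1_700,
    "normal (minimal.wl)": 50_000,
}


def estimate_hours(locations: int, requests_per_location: int,
                   requests_per_second: float) -> float:
    """Rough brute-force time in hours; ignores crawl and security check traffic."""
    total_requests = locations * requests_per_location
    return total_requests / requests_per_second / 3600


if __name__ == "__main__":
    # Hypothetical target: 200 fuzzed locations, 500 requests per second.
    for mode, cost in REQUESTS_PER_LOCATION.items():
        print(f"{mode}: {estimate_hours(200, cost, 500):.2f} hours")
```

With these hypothetical figures, the lightweight mode works out to roughly 11 minutes of brute-force traffic, and the normal mode to about 5.6 hours; on a slow server, the gap is what decides which mode is practical.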

Other options you need to know about

Skipfish requires relatively little configuration, but make no mistake: it is not a point-and-click tool. You should definitely review the documentation to understand the operation of rudimentary options such as -C (use cookie authentication), -I (only crawl matching URLs), -X (exclude matching URLs), -D (define domain scope), -m (limit simultaneous connections), and so forth.

Here is a list of some of the more useful but under-appreciated options that you should consider using in your work:

  • Limit crawl tree fanout: options -c (immediate children limit) and -x (total descendant limit) allow you to fine-tune scan coverage for very large sites where -I and -X options are impractical to use. The non-deterministic crawl probability setting (-p) may also be helpful there.

  • More SSL checks: some sites care about getting SSL right; specify -M to warn about dangerous mixed content scenarios and insecure password forms.

  • Spoof another browser: use -b to convincingly pretend to be MSIE, Firefox, or an iPhone. This option does not merely change the User-Agent string, but also ensures that other headers have the right syntax and ordering.

  • Do not accept new cookies: on cookie-authenticated sites that do not maintain session state on the server side, accidental logout can be prevented without the need to carefully specify -X locations: adding -N simply instructs skipfish to ignore all attempts to delete or modify -C cookies.

  • Trust another domain: skipfish complains about dangerous script inclusion and content embedding from third-party domains, and can optionally warn you about outgoing links. To minimize noise, use -B to identify any domains you trust, and therefore, want to exclude from these checks.

  • Reduce memory footprint: to generate scan reports, the scanner keeps samples of all retrieved documents in memory. On some large, multimedia-heavy sites, this may consume a lot of RAM. In these cases, -e may be used to purge binary (non-ASCII) content without impacting report quality appreciably.

  • Be dumb: by specifying -O, skipfish can be instructed not to analyze returned HTML to extract links - turning it into a fast, purely brute-force tool.


Skipfish has bugs, features that are yet to be implemented, and web frameworks it hasn't encountered yet. If you run into any trouble, please check out this doc, but also do not hesitate to ping me directly. Your feedback is the only way this tool can be improved.
