Learning Program Semantics for Vulnerability Detection via Vulnerability-Specific Inter-procedural Slicing

Paper: https://dl.acm.org/doi/10.1145/3611643.3616351

Code: https://sites.google.com/view/snapvuln

Overview

SnapVuln根据漏洞种类进行过程间切片,针对每种漏洞训练子模型,进行漏洞检测及漏洞种类预测。

Challenges

  1. single-function detection: Devign, REVEAL (TSE 2021), Draper (ICMLA 2018)
    • 漏洞可能跨越多个函数(callee functions)
  2. 从vulnerable program points开始进行切片 (e.g., dangerous API calls / variables)
    • noise:无法区分sink / source,并且包含所有与vulnerable points相关的statement。
    • 由于错误执行顺序导致的漏洞语义:使用DDG / PDG切片,没有考虑CFG(程序执行顺序)。e.g., UAF。

Motivation

  1. 应该引入inter-procedural analysis用于获取更完整的漏洞语义
  2. 应该设计slicing algorithms用于获取精确的漏洞语义
    1. 没有考虑CFG,无法反映错误执行顺序导致的漏洞语义(e.g., UAF)。
    2. 包含与漏洞无关的statements:部分与漏洞无关的语句也依赖于vulnerability-related statements。

Approach - SnapVuln

  1. 从source code提取由PDG, CFG, CG (call graph) 结合的inter-procedural graph (IG)。
  2. 根据漏洞种类设计slicing algorithms用于获取漏洞语义,识别source & sink并进一步对IG进行操作,获得相应漏洞种类的IG subgraph。
    • 本文针对6种常见漏洞类型设计了用于C/C++的切片算法
  3. 对每种漏洞种类使用一个submodel学习相应subgraph的代码表示信息
  4. 将所有submodel对各类型漏洞的预测结果集成,得到综合预测结果。

Evaluation

  • RQ2: The Completeness of Vulnerability Semantics
    • 不使用切片时,比较在single-function (intra)与muti-function (inter)上,SnapVuln的检测效果。
  • RQ3: The Precision of Vulnerability Semantics
    • 比较使用不同的切片策略(vuldeepecker-DDG, sysever-PDG, deepwukong-PDG with API / arithmetic operation as criterion),SnapVuln的检测效果。

Limitation

  • Vulnerability types. “Vulnerabilities in real-world codebases are diverse, and the slicing algorithms in SnapVuln may not cover every single case.”
  • Sample size. 难以有效处理有大量代码的large program

code: (1) https://sites.google.com/view/snapvuln (2) https://github.com/wubozhi/SnapVuln

SnapVuln为各漏洞类型制定的切片策略

基于SnapVuln对漏洞类型的扩展:https://sites.google.com/view/snapvuln/basic-tutorial-for-extention

  • buffer overflow
    • pattern: stack-based (arrays) & heap-based (pointers)
    • source: allocation of arrays & pointers
    • sink: function calls / assignments to manipulate pointers / arrays
      • function calls: strcpy, memcpy, etc.
      • assignment: assigns value to memory that pointer points to or array with index
    • slicing: backward slicing on PDG: sink (write to array / pointers)->all sources
  • memory leak
    • pattern: does not su ciently track and release memory after it has been used
    • source: 1) call standard function (i.e., malloc, calloc), 2) call wraper functions (e.g., “cifs_buf_get”, “AcquireQuantumInfo”)
      • wrapper functions: analyzing the call graph of the program
      • cifs_buf_get (call) mempool_alloc (call) pool->alloc
    • sink: the end of the program
    • slicing: forward slicing on PDG: source (assign to poiners & invoke callee) -> sink (program end)
  • null pointer dereference
    • pattern: accesses a pointer that expects to be valid but is NULL instead
    • source: 1) call standard function, 2) call wraper functions
    • sink: first statement using the pointer
    • slicing: criterion - assign return of callee to pointer, 1) forward slicing on PDG, 2) forward slicing on CFG, 3) compute first statement
  • integer overflow
    • pattern: an integer value is incremented after calculation to a value that is too large to store in the associated representation
    • source: assign value to a variable (int, long, short, etc.)
    • sink: arithmetic operations
    • slicing: backward slicing on PDG: sink (arithmetic operators) -> sources
  • use after free
    • pattern: allocated heap memory used after release
    • source: 1) call standard function, 2) call wraper functions
    • sink: first statement using the pointer after the free operation
      • hard to identify free operation -> last statement uses the pointer
    • slicing: criterion - assign to pointers & invoke callee, 1) forward slicing on PDG, 2) forward slicing on CFG, 3) compute last statement
  • double free
    • pattern: the same heap pointer is released twice or more in a program
    • source: 1) call standard function, 2) call wraper functions
    • sink: two or more free operations on pointers -> last statement uses the pointer
    • slicing:  forward slicing on PDG: source (assign to pointers & invoke callee) -> sink (last statement uses the pointer)