Commit 8dba5609 authored by Sophie Brun

Imported Upstream version 0.3

parent 846ea3d4
-----------------------------------------------
peepdf 0.3 r235, 2014-06-09
-----------------------------------------------
* New features:
- Added descriptive titles for the vulns found
- Added detection of CVE-2013-2729 (Adobe Reader BMP/RLE heap corruption)
- Added support for more than one script block in objects containing Javascript (e.g. XFA objects)
- Updated colorama to version 0.3.1 (2014-04-19)
- Added detection of CVE-2013-3346 (ToolButton Use-After-Free)
- Added command "js_vars" to show the variables defined in the Javascript context and their content
- Added command "js_jjdecode" to decode Javascript code using the jjencode algorithm (Thanks to Nahuel Riva @crackinglandia)
- Added static detection for CVE-2010-0188
- Added detection for CoolType.dll SING uniqueName vulnerability (CVE-2010-2883). Better late than never ;p
- Added new command "vtcheck" to check for detection on VirusTotal (API key included)
- Added option to avoid automatic Javascript analysis (useful with endless loops)
- Added PyV8 as Javascript engine and removed Spidermonkey (Windows issues).
* Fixes:
- Fixed bug when encrypting/decrypting hexadecimal objects (Thanks to Timo Hirvonen for the feedback)
- Fixed silly bug related to abbreviated PDF Filters
- Fixed bug related to the GNU readline function not handling colorized prompts correctly
- Fixed log_output function, which was storing the previous command's output instead of the current one
- Fixed bug in PDFStream to show the stream content when the stream dictionary is empty (Thanks to Nahuel Riva)
- Fixed Issue 12, related to bad JS code parsing due to HTML entities in the XFA form (Thanks to robomotic)
- Fixed Issue 10 related to bad error handling in the PDFFile.decrypt() method
- Fixed Issue 9, related to an uncaught exception when PyV8 is not installed
- Fixed bug in do_metadata() when objects contain /Metadata but they are not really Metadata objects
* Others
- Removed the old redirection method using the "set" command; it is redundant now that shell-like redirection (>, >>, $>, $>>) is available
* Known issues
- There is a problem with the readline module in Mac OS X (it uses editline instead of GNU readline): colorized prompts are not handled correctly.
-----------------------------------------------
peepdf Black Hat Vegas (0.2 r156), 2012-07-25
-----------------------------------------------
......
......@@ -3,7 +3,7 @@
# http://peepdf.eternal-todo.com
# By Jose Miguel Esparza <jesparza AT eternal-todo.com>
#
# Copyright (C) 2012 Jose Miguel Esparza
# Copyright (C) 2011-2014 Jose Miguel Esparza
#
# This file is part of peepdf.
#
......@@ -25,83 +25,85 @@
This module contains some functions to analyse Javascript code inside the PDF file
'''
import sys, re , os, jsbeautifier
from PDFUtils import unescapeHTMLEntities
import sys, re , os, jsbeautifier, traceback
from PDFUtils import unescapeHTMLEntities, escapeString
try:
from spidermonkey import Runtime
import PyV8
JS_MODULE = True
class Global(PyV8.JSClass):
evalCode = ''
def evalOverride(self, expression):
self.evalCode += '\n\n// New evaluated code\n' + expression
return
except:
JS_MODULE = False
errorsFile = 'errors.txt'
newLine = os.linesep
reJSscript = '<script[^>]*?contentType\s*?=\s*?[\'"]application/x-javascript[\'"][^>]*?>(.*?)</script>'
preDefinedCode = 'var app = this;'
def analyseJS(code):
def analyseJS(code, context = None, manualAnalysis = False):
'''
Search for obfuscated functions in the Javascript code
Hooks the eval function and search for obfuscated elements in the Javascript code
@param code: The Javascript code (string)
@return: List with analysis information of the Javascript code: [JSCode,unescapedBytes,urlsFound], where JSCode is a list with the several stages Javascript code, unescapedBytes is a list with the parameters of unescape functions, and urlsFound is a list with the URLs found in the unescaped bytes.
@return: List with analysis information of the Javascript code: [JSCode,unescapedBytes,urlsFound,errors,context], where
JSCode is a list with the several stages Javascript code,
unescapedBytes is a list with the parameters of unescape functions,
urlsFound is a list with the URLs found in the unescaped bytes,
errors is a list of errors,
context is the context of execution of the Javascript code.
'''
error = ''
errors = []
JSCode = []
unescapedBytes = []
urlsFound = []
oldStdErr = sys.stderr
errorFile = open('jserror.log','wb')
sys.stderr = errorFile
try:
scriptCode = re.findall(reJSscript, code, re.DOTALL | re.IGNORECASE)
if scriptCode != []:
code = scriptCode[0]
code = unescapeHTMLEntities(code)
scriptElements = re.findall(reJSscript, code, re.DOTALL | re.IGNORECASE)
if scriptElements != []:
code = ''
for scriptElement in scriptElements:
code += scriptElement + '\n\n'
code = jsbeautifier.beautify(code)
JSCode.append(code)
if code != None and JS_MODULE:
r = Runtime()
context = r.new_context()
if code != None and JS_MODULE and not manualAnalysis:
if context == None:
context = PyV8.JSContext(Global())
context.enter()
# Hooking the eval function
context.eval('eval=evalOverride')
#context.eval(preDefinedCode)
while True:
evalFunctionsData = searchObfuscatedFunctions(code, 'eval')
originalElement = code
for evalFunctionData in evalFunctionsData:
if not evalFunctionData[2]:
modifiedCode = evalFunctionData[1][0].replace(evalFunctionData[0],'return')
code = originalElement.replace(evalFunctionData[1][0],modifiedCode)
else:
code = originalElement.replace(evalFunctionData[1][0],evalFunctionData[1][1]+';')
originalCode = code
try:
executedJS = context.eval_script(code)
if executedJS == None:
raise exception
break
except:
if evalFunctionData[2]:
modifiedCode = evalFunctionData[1][0].replace(evalFunctionData[0],'return')
code = originalElement.replace(evalFunctionData[1][0],modifiedCode)
else:
code = originalElement.replace(evalFunctionData[1][0],evalFunctionData[1][1]+';')
try:
executedJS = context.eval_script(code)
if executedJS == None:
raise exception
except:
code = originalElement
continue
else:
break
if executedJS != originalElement and executedJS != None and executedJS != '':
code = executedJS
context.eval(code)
evalCode = context.eval('evalCode')
evalCode = jsbeautifier.beautify(evalCode)
if evalCode != '' and evalCode != code:
code = evalCode
JSCode.append(code)
else:
break
except:
error = str(sys.exc_info()[1])
open('jserror.log','ab').write(error + newLine)
errors.append(error)
break
if code != None:
if code != '':
escapedVars = re.findall('(\w*?)\s*?=\s*?(unescape\((.*?)\))', code, re.DOTALL)
for var in escapedVars:
bytes = var[2]
if bytes.find('+') != -1:
if bytes.find('+') != -1 or bytes.find('%') == -1:
varContent = getVarContent(code, bytes)
if len(varContent) > 150:
ret = unescape(varContent)
......@@ -126,20 +128,13 @@ def analyseJS(code):
if url not in urlsFound:
urlsFound.append(url)
except:
errors.append('Unknown error!!')
traceback.print_exc(file=open(errorsFile,'a'))
errors.append('Unexpected error in the JSAnalysis module!!')
finally:
errorFile.close()
sys.stderr = oldStdErr
errorFileContent = open('jserror.log','rb').read()
if errorFileContent != '' and errorFileContent.find('JavaScript error') != -1:
lines = errorFileContent.split(newLine)
for line in lines:
if line.find('JavaScript error') != -1 and line not in errors:
errors.append(line)
for js in JSCode:
if js == None or js == '':
JSCode.remove(js)
return [JSCode,unescapedBytes,urlsFound,errors]
return [JSCode,unescapedBytes,urlsFound,errors,context]
def getVarContent(jsCode, varContent):
'''
......@@ -159,6 +154,7 @@ def getVarContent(jsCode, varContent):
if re.match('["\'].*?["\']', part, re.DOTALL):
clearBytes += part[1:-1]
else:
part = escapeString(part)
varContent = re.findall(part + '\s*?=\s*?(.*?)[,;]', jsCode, re.DOTALL)
if varContent != []:
clearBytes += getVarContent(jsCode, varContent[0])
......@@ -235,7 +231,10 @@ def unescape(escapedBytes, unicode = True):
else:
unicodePadding = ''
try:
if escapedBytes.find('%u') != -1 or escapedBytes.find('%U') != -1 or escapedBytes.find('%') != -1:
if escapedBytes.lower().find('%u') != -1 or escapedBytes.lower().find('\u') != -1 or escapedBytes.find('%') != -1:
if escapedBytes.lower().find('\u') != -1:
splitBytes = escapedBytes.split('\\')
else:
splitBytes = escapedBytes.split('%')
for i in range(len(splitBytes)):
splitByte = splitBytes[i]
......
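The analyseJS() hunks above change how script blocks are pulled out of XFA content: HTML entities are unescaped first, and every matching <script> element is concatenated instead of only the first one. A minimal Python 3 sketch of just that step follows. It assumes the standard html.unescape as a stand-in for peepdf's unescapeHTMLEntities, and the function name extract_script_blocks is hypothetical, not part of peepdf.

```python
import html
import re

# Same pattern as reJSscript in the diff, written as a raw string.
RE_JS_SCRIPT = re.compile(
    r"<script[^>]*?contentType\s*?=\s*?['\"]application/x-javascript['\"][^>]*?>(.*?)</script>",
    re.DOTALL | re.IGNORECASE,
)


def extract_script_blocks(code):
    """Return the Javascript found in all matching <script> elements.

    Hypothetical helper: html.unescape stands in for unescapeHTMLEntities.
    If no <script> element matches, the unescaped input is returned as is.
    """
    code = html.unescape(code)
    blocks = RE_JS_SCRIPT.findall(code)
    if not blocks:
        return code
    # Join every block, separated by blank lines, as the new loop does.
    return "".join(block + "\n\n" for block in blocks)


if __name__ == "__main__":
    xfa = (
        '<script contentType="application/x-javascript">var a = 1;</script>'
        "<script contentType='application/x-javascript'>var b = &quot;x&quot;;</script>"
    )
    print(extract_script_blocks(xfa))
```

With the previous code only "var a = 1;" would have reached the analysis, and the second block would have been dropped.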
......@@ -3,7 +3,7 @@
# http://peepdf.eternal-todo.com
# By Jose Miguel Esparza <jesparza AT eternal-todo.com>
#
# Copyright (C) 2012 Jose Miguel Esparza
# Copyright (C) 2011-2014 Jose Miguel Esparza
#
# This file is part of peepdf.
#
......@@ -55,7 +55,7 @@ def computeEncryptionKey(password, dictOwnerPass, dictUserPass, dictOE, dictUE,
password = password[:32]
elif lenPass < 32:
password += paddingString[:32-lenPass]
md5input = password + dictOwnerPass + struct.pack('<i',int(pElement)) + fileID
md5input = password + dictOwnerPass + struct.pack('<I',abs(int(pElement))) + fileID
if revision > 3 and not encryptMetadata:
md5input += '\xFF'*4
key = hashlib.md5(md5input).digest()
......@@ -257,6 +257,9 @@ def isOwnerPass(password, dictO, dictU, computedUserPass, keyLength, revision):
dictO = RC4(dictO,newKey)
counter -= 1
userPass = dictO
else:
# Is it possible??
userPass = ''
return isUserPass(userPass, computedUserPass, dictU, revision)
def RC4(data, key):
......
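The computeEncryptionKey() hunk above swaps struct.pack('<i', int(pElement)) for struct.pack('<I', abs(int(pElement))). A plausible motivation (an assumption, the commit does not say) is that '<i' raises struct.error when /P is written as an unsigned value above 2**31-1. Note, though, that abs() produces different bytes than the two's complement form for a negative /P, while the PDF key algorithm treats /P as a 32-bit unsigned number passed low-order byte first. The Python 3 sketch below shows one alternative that accepts both spellings by masking to 32 bits. It is not what the commit does, and pack_permissions is a hypothetical name.

```python
import struct


def pack_permissions(p_element):
    """Pack a /P value as 4 little-endian bytes, viewed as unsigned 32-bit.

    Hypothetical helper: masking keeps the two's complement bit pattern of
    negative values and also accepts values already written as unsigned.
    """
    return struct.pack("<I", int(p_element) & 0xFFFFFFFF)


if __name__ == "__main__":
    signed = -1340                 # /P as a signed integer
    unsigned = 2 ** 32 - 1340      # the same bit pattern written as unsigned
    print(pack_permissions(signed) == pack_permissions(unsigned))
    # The old code agrees with the masked form for negative values ...
    print(struct.pack("<i", signed) == pack_permissions(signed))
    # ... while abs() yields a different byte string.
    print(struct.pack("<I", abs(signed)) == pack_permissions(signed))
```

The three comparisons print True, True and False, so the masked form matches the pre-change output for negative values and differs from the abs() form.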
......@@ -3,7 +3,7 @@
# http://peepdf.eternal-todo.com
# By Jose Miguel Esparza <jesparza AT eternal-todo.com>
#
# Copyright (C) 2012 Jose Miguel Esparza
# Copyright (C) 2011-2014 Jose Miguel Esparza
#
# This file is part of peepdf.
#
......@@ -69,26 +69,28 @@ def decodeStream(stream, filter, parameters = {}):
@param parameters: List of PDFObjects containing the parameters for the filter
@return: A tuple (status,statusContent), where statusContent is the decoded stream in case status = 0 or an error in case status = -1
'''
if filter == '/ASCIIHexDecode' or filter == 'AHx':
if filter == '/ASCIIHexDecode' or filter == '/AHx':
ret = asciiHexDecode(stream)
elif filter == '/ASCII85Decode' or filter == 'A85':
elif filter == '/ASCII85Decode' or filter == '/A85':
ret = ascii85Decode(stream)
elif filter == '/LZWDecode' or filter == 'LZW':
elif filter == '/LZWDecode' or filter == '/LZW':
ret = lzwDecode(stream, parameters)
elif filter == '/FlateDecode' or filter == 'Fl':
elif filter == '/FlateDecode' or filter == '/Fl':
ret = flateDecode(stream, parameters)
elif filter == '/RunLengthDecode' or filter == 'RL':
elif filter == '/RunLengthDecode' or filter == '/RL':
ret = runLengthDecode(stream)
elif filter == '/CCITTFaxDecode' or filter == 'CCF':
elif filter == '/CCITTFaxDecode' or filter == '/CCF':
ret = ccittFaxDecode(stream, parameters)
elif filter == '/JBIG2Decode':
ret = jbig2Decode(stream, parameters)
elif filter == '/DCTDecode' or filter == 'DCT':
elif filter == '/DCTDecode' or filter == '/DCT':
ret = dctDecode(stream, parameters)
elif filter == '/JPXDecode':
ret = jpxDecode(stream)
elif filter == '/Crypt':
ret = crypt(stream, parameters)
else:
ret = (-1, 'Unknown filter "%s"' % filter)
return ret
def encodeStream(stream, filter, parameters = {}):
......@@ -120,6 +122,8 @@ def encodeStream(stream, filter, parameters = {}):
ret = jpxEncode(stream)
elif filter == '/Crypt':
ret = crypt(stream, parameters)
else:
ret = (-1, 'Unknown filter "%s"' % filter)
return ret
'''
......
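The decodeStream() fix above adds the missing leading slash to the abbreviated filter names and returns an explicit error for unknown filters. The same behaviour can be sketched with a lookup table, which keeps the abbreviations in one place. This is a Python 3 illustration only. The names FILTER_ABBREVIATIONS, normalize_filter_name and decode_stream, and the decoders argument, are hypothetical and not peepdf's API.

```python
# Abbreviated filter names and their full forms, as compared in the hunk above.
FILTER_ABBREVIATIONS = {
    "/AHx": "/ASCIIHexDecode",
    "/A85": "/ASCII85Decode",
    "/LZW": "/LZWDecode",
    "/Fl": "/FlateDecode",
    "/RL": "/RunLengthDecode",
    "/CCF": "/CCITTFaxDecode",
    "/DCT": "/DCTDecode",
}


def normalize_filter_name(name):
    """Map an abbreviated filter name to its full form; pass others through."""
    return FILTER_ABBREVIATIONS.get(name, name)


def decode_stream(stream, filter_name, decoders):
    """Dispatch to a decoder and return a (status, content) tuple.

    decoders is a hypothetical dict from full filter name to a callable
    that itself returns (status, content), mirroring the diff's convention.
    """
    decoder = decoders.get(normalize_filter_name(filter_name))
    if decoder is None:
        return (-1, 'Unknown filter "%s"' % filter_name)
    return decoder(stream)


if __name__ == "__main__":
    decoders = {"/ASCIIHexDecode": lambda s: (0, bytes.fromhex(s))}
    print(decode_stream("414243", "/AHx", decoders))  # abbreviated, with slash
    print(decode_stream("414243", "AHx", decoders))   # old slash-less spelling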
......@@ -3,7 +3,7 @@
# http://peepdf.eternal-todo.com
# By Jose Miguel Esparza <jesparza AT eternal-todo.com>
#
# Copyright (C) 2012 Jose Miguel Esparza
# Copyright (C) 2011-2014 Jose Miguel Esparza
#
# This file is part of peepdf.
#
......@@ -25,7 +25,7 @@
Module with some misc functions
'''
import os,re,htmlentitydefs
import os, re, htmlentitydefs, json, urllib, urllib2
def clearScreen():
'''
......@@ -414,3 +414,26 @@ def unescapeString(string):
unescapedValue += string[i]
i += 1
return unescapedValue
def vtcheck(md5, vtKey):
'''
Function to check a hash on VirusTotal and get the report summary
@param md5: The MD5 to check (hexdigest)
@param vtKey: The VirusTotal API key needed to perform the request
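@return: A tuple (status,statusContent), where statusContent is a dictionary with the result of the request in case status = 0 or an error in case status = -1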
'''
vtUrl = 'https://www.virustotal.com/vtapi/v2/file/report'
parameters = {'resource':md5,'apikey':vtKey}
try:
data = urllib.urlencode(parameters)
req = urllib2.Request(vtUrl, data)
response = urllib2.urlopen(req)
jsonResponse = response.read()
except:
return (-1, 'The request to VirusTotal has not been successful')
try:
jsonDict = json.loads(jsonResponse)
except:
return (-1, 'An error has occurred while parsing the JSON response from VirusTotal')
return (0, jsonDict)
\ No newline at end of file
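vtcheck() above returns the parsed JSON report, and the DTD change later in this commit adds a detection element with rate and report_link children. How a report might be reduced to those two values can be sketched as follows (Python 3). The field names response_code, positives, total and permalink are what the VirusTotal public API v2 is understood to return and should be treated as assumptions here. summarize_vt_report is a hypothetical helper, not peepdf code.

```python
import json


def summarize_vt_report(json_response):
    """Reduce a VirusTotal report (JSON text) to a rate and a report link.

    Returns (status, content) in the style of vtcheck(): content is a dict
    when status is 0 and an error message when status is -1.
    Field names are assumptions about the VirusTotal v2 report format.
    """
    try:
        report = json.loads(json_response)
    except (ValueError, TypeError):
        return (-1, "An error has occurred while parsing the JSON response")
    if not isinstance(report, dict):
        return (-1, "Unexpected JSON structure in the response")
    positives = report.get("positives")
    total = report.get("total")
    # response_code != 1 is taken to mean the hash is not known (assumption).
    if report.get("response_code") != 1 or positives is None or total is None:
        return (0, {"rate": None, "report_link": None})
    rate = "%d/%d" % (positives, total)
    return (0, {"rate": rate, "report_link": report.get("permalink")})


if __name__ == "__main__":
    sample = (
        '{"response_code": 1, "positives": 12, "total": 54, '
        '"permalink": "https://example.invalid/report"}'
    )
    print(summarize_vt_report(sample))
```

Keeping the parsing separate from the HTTP request makes this part easy to exercise without a network connection or an API key.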
......@@ -6,9 +6,9 @@ http://twitter.com/peepdf
** Dependencies **
- In order to analyse Javascript code "python-spidermonkey" is needed:
- In order to analyse Javascript code "PyV8" is needed:
http://code.google.com/p/python-spidermonkey
http://code.google.com/p/pyv8/
- The "sctest" command is a wrapper of "sctest" (libemu). Besides libemu pylibemu is used and must be installed:
......
Pending tasks:
- User manual
- Add detection of more exploits/vulns
- Documentation of methods in PDFCore.py
- Add the rest of supported stream filters (better testing of existent)
- Automatic analysis of embedded PDF files
- Add AES to the encryption implementation
- Improve the automatic Javascript analysis, getting code from other parts of the documents (getAnnots, etc)
- GUI
- ActionScript analysis?
\ No newline at end of file
- ...
\ No newline at end of file
......@@ -3,7 +3,7 @@
# http://peepdf.eternal-todo.com
# By Jose Miguel Esparza <jesparza AT eternal-todo.com>
#
# Copyright (C) 2012 Jose Miguel Esparza
# Copyright (C) 2012-2014 Jose Miguel Esparza
#
# This file is part of peepdf.
#
......
......@@ -3,7 +3,7 @@
# http://peepdf.eternal-todo.com
# By Jose Miguel Esparza <jesparza AT eternal-todo.com>
#
# Copyright (C) 2012 Jose Miguel Esparza
# Copyright (C) 2012-2014 Jose Miguel Esparza
#
# This file is part of peepdf.
#
......
......@@ -7,7 +7,7 @@
<!ELEMENT date ( #PCDATA ) >
<!ELEMENT basic ( filename, md5, sha1, sha256, size, version, binary, linearized, encrypted, updates, num_objects, num_streams, comments, errors ) >
<!ELEMENT basic ( filename, md5, sha1, sha256, size, detection, pdf_version, binary, linearized, encrypted, updates, num_objects, num_streams, comments, errors ) >
<!ELEMENT filename ( #PCDATA ) >
......@@ -19,6 +19,12 @@
<!ELEMENT size ( #PCDATA ) >
<!ELEMENT detection ( rate?, report_link? ) >
<!ELEMENT rate ( #PCDATA ) >
<!ELEMENT report_link ( #PCDATA ) >
<!ELEMENT pdf_version ( #PCDATA ) >
<!ELEMENT binary EMPTY >
......@@ -51,7 +57,7 @@
<!ELEMENT advanced ( version* ) >
<!ELEMENT version ( catalog , info , objects , streams , js_objects, suspicious_elements , suspicious_urls ) >
<!ELEMENT version ( catalog, info, objects, streams ,js_objects, suspicious_elements, suspicious_urls ) >
<!ATTLIST version num NMTOKEN #REQUIRED >
<!ATTLIST version type ( original | update ) #REQUIRED >
......