Internal data formats
Sources of tracker data
- https://etip.exodus-privacy.eu.org
- https://reports.exodus-privacy.eu.org/en/trackers/ - manually compiled list of Android trackers
- https://better.fyi tracker list - https://source.ind.ie/better/site/tree/master/content/trackers
- https://github.com/jawz101/potentialTrackers - meant as a feeder for Exodus ETIP
- https://github.com/AdAway/adaway.github.io/blob/master/hosts.txt
- https://github.com/YalePrivacyLab/tracker-profiles
- #14 - DuckDuckGo Tracker Radar
- https://github.com/mozilla-services/shavar-prod-lists - Disconnect's tracker list
- EFF Privacy Badger heuristics
- https://github.com/topics/adblock-list - overview of tracking/adblock lists repos
- Virustotal's Website/domain scanning engines & datasets
- Virustotal's YARA rule repositories
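Several of the lists above, such as the AdAway hosts.txt, are published in hosts-file format. A minimal sketch of turning such a file into a set of domains follows. It assumes the usual "address name [name ...]" layout with `#` comments; the function name `parse_hosts` and the sample entries are made up for illustration and are not taken from any of the listed projects.

```python
def parse_hosts(text):
    """Return the set of domains mapped to a blocking address in hosts-format text."""
    domains = set()
    for line in text.splitlines():
        # Drop full-line and inline comments, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address without a host name carries no information
        address, names = fields[0], fields[1:]
        # Assumption: blocklists point unwanted hosts at one of these addresses.
        if address not in ("0.0.0.0", "127.0.0.1"):
            continue
        for name in names:
            if name != "localhost":
                domains.add(name.lower())
    return domains


# Hypothetical sample entries, not copied from any real list.
sample = """\
# sample entries
127.0.0.1 localhost
127.0.0.1 ads.example.com
0.0.0.0 Tracker.example.net  # inline comment
"""

print(sorted(parse_hosts(sample)))
```

Lists in other formats (JSON, adblock filter syntax) would each need their own parser before they can be merged into one domain set.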
Sources of APKs
- https://www.apklab.io/ - massive APK collection that provides data about the APKs, with very limited download options.
- https://androzoo.uni.lu/apksearch
- list of libraries included in APKs in izzysoft repo: https://gitlab.com/IzzyOnDroid/repo/-/tree/master/libs
- Coronavirus apps
Potentially useful tools
- LibScout - detect SDKs/libraries with their version in binary APKs
- https://github.com/cryptax/droidlysis - cryptax's (aXelle's) tool: "DroidLysis is a property extractor for Android apps". See also her talk at hack.lu 2019
- https://github.com/avast/apkparser - faster manifest/resources parser
- https://github.com/avast/apkverifier - faster APK signature verifier
- https://github.com/rednaga/APKiD - "In addition to detecting packers, obfuscators, and other weird stuff, it can also identify if an app was compiled by the standard Android compilers or dexlib"
- https://github.com/facebook/redex - "taking advantage of Redex allows us to normalise the applications prior to analysis"
- https://github.com/kaitai-io/kaitai_struct_formats/blob/master/executable/dex.ksy - generic binary struct parser tool
- https://github.com/autonomousapps/dependency-analysis-android-gradle-plugin - Produce a report of unused direct dependencies and used transitive dependencies.
- https://github.com/armijnhemel/binaryanalysis-ng - NLnet-funded project; it's a framework for unpacking files recursively and running checks on the unpacked files
- https://github.com/jedisct1/ipgrep - ipgrep extracts possibly obfuscated host names and IP addresses from text, resolves host names, and prints them, sorted by ASN.
- https://github.com/plum-umd/redexer - infer with which parameters the app uses certain permissions (we name this feature RefineDroid)
- https://github.com/ytliu/apk-static-xref - statically generate a cross-reference graph (XRG) of a component (e.g., Service) of an Android APK file
- https://github.com/dorneanu/smalisca - Static Code analysis tool that generates call graphs
- https://github.com/U039b/android_permissions_harvester - for finding which permissions are used based on method calls
- https://github.com/stricaud/faup - Fast URL decoder library
- Virustotal's File characterization tools & datasets
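As a rough illustration of how output from tools like these could be combined with a tracker domain list, the sketch below pulls host-like strings out of text (for example, strings dumped from an APK) and checks them against known tracker domains by suffix. It is not how any of the listed tools work internally; the regular expression, the function names, and the `tracker.example` domain are assumptions made for this example.

```python
import re

# Simplified host name pattern: dot-separated labels ending in an alphabetic TLD.
# It will also match things like Java package names, so expect false positives.
HOST_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE,
)


def extract_hosts(text):
    """Return the set of lower-cased host-like strings found in text."""
    return {match.group(0).lower() for match in HOST_RE.finditer(text)}


def matches_tracker(host, tracker_domains):
    """True if host equals, or is a subdomain of, any domain in tracker_domains."""
    parts = host.split(".")
    return any(".".join(parts[i:]) in tracker_domains for i in range(len(parts)))


# Hypothetical tracker list and input string.
tracker_domains = {"tracker.example"}
dumped = 'const-string v0, "https://sdk.tracker.example/v1/collect"'

for host in sorted(extract_hosts(dumped)):
    print(host, matches_tracker(host, tracker_domains))
```

Matching on whole-label suffixes rather than plain substrings avoids flagging unrelated hosts such as `nottracker.example`.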
Related research
- https://www.intel.com/content/www/us/en/artificial-intelligence/documents/stamina-deep-learning-for-malware-protection-whitepaper.html - STAMINA. Work by Intel and Microsoft to use CNNs on binary files for malware classification.
- https://blogs.uni-paderborn.de/sse/tools/susi/ - SuSi is a tool for the fully automated classification and categorization of Android sources and sinks.
- https://www.guardsquare.com/en/products/dexguard - enhanced APK obfuscator; its use is detectable and could be a sign that trackers are being hidden
- AdGraph: A Graph-Based Approach to Ad and Tracker Blocking - https://brave.com/brave-proposes-a-machine-learning-approach-for-ad-blocking/
- links & research papers related to Machine Learning applied to source code (MLonCode)
- https://lilicoding.github.io/papers/li2015iccta.pdf - IccTA: Detecting Inter-Component Privacy Leaks in Android Apps
- https://privacyinternational.org/long-read/3226/buying-smart-phone-cheap-privacy-might-be-price-you-have-pay
- An Analysis of Pre-installed Android Software - IMDEA/Narseo
- Execution Enhanced Static Detection of Android Privacy Leakage Hidden by Dynamic Class Loading - (BroadcastReceiver) detection and other related things
- 50 Ways to Leak Your Data: An Exploration of Apps’ Circumvention of the Android Permissions System
- Finding (partial) code clones at method level in Android programs without access to source code to detect copyright infringements or security issues - matching repeated code snippets in sets of binaries
Articles, rants, blogs, papers and other publications
- Bruce Schneier on facial recognition, surveillance, tracking
- F-Droid blog post announcing Tracking the Trackers
Related examples
- Brave Browser's privacy preserving product analytics - https://github.com/brave/brave-browser/wiki/P3A
- example F-Droid RFP issue where people are working through identifying trackers fdroid/rfp#786 (comment 145548688)
- testInstrumentationRunner - code example of Android tracking in action: https://github.com/android/android-test/blob/master/runner/android_junit_runner/java/androidx/test/internal/runner/tracker/AnalyticsBasedUsageTracker.java#L234
Meta-stuff
- On finding the right metrics/score - this well-written blog post warns against choosing the all-too-obvious, easy metric for whatever we want to achieve with ML. Metrics will be gamed.
Meeting Notes
- 2020-05-07
- 2020-03-25
- 2020-03-11
- 2020-03-04
- 2020-02-19
- 2020-02-12
- 2020-02-05
- 2020-01-29
- 2020-01-22
- 2020-01-15