Commit 5bb00500 by Marek Vavrusa

initial import

parents
Copyright (c) 2016, Marek Vavrusa <marek@vavrusa.com>
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
\ No newline at end of file
# LuaJIT to extended BPF compiler
*Disclaimer: this is still work-in-progress, see "Current state"*
Why? BPF allows you to execute a small sandboxed programs directly in kernel, that can talk back to userspace over shared maps. Since the programs are small and verified, they are guaranteed not to crash or lock the kernel. That's fantastic not only as a performance introspection tool with tracepoints and probes, but also for low-latency packet filtering, load-balancing, IPS and a ton of other purposes.
However, it's not dead simple to use as Brendan Greggs puts it:
> it's hard to use via its assembly or C interface. The challenge attracts me, but it can be a brutal experience, especially if you write eBPF assembly directly (eg, see `bpf_insn_prog[]` from sock_example.c; I've yet to code one of these from scratch that compiles). The C interface is better (see other examples in [samples/bpf](https://github.com/torvalds/linux/tree/master/samples/bpf)), but it's still laborious and difficult to use.
Now it possible to write Lua functions and compile them transparently to BPF byte code, here's the same socket example:
```lua
local bpf = require('bpf')
local map = bpf.map('array', 256)
-- Kernel-space part of the program
local prog = assert(bpf(function ()
local proto = pkt.ip.proto -- Get byte (ip.proto) from frame at [23]
xadd(map[proto], 1) -- Increment packet count
end))
-- User-space part of the program
local S = require('ljsyscall')
local sock = assert(bpf.socket('lo', prog))
for i=1,10 do
local icmp, udp, tcp = map[1], map[17], map[6]
print('TCP', tcp, 'UDP', udp, 'ICMP', icmp, 'packets')
S.sleep(1)
end
```
Similarly, the [bcc][bcc] project uses LLVM rewriter to compile C code with BPF-specific extensions to BPF bytecode. This project takes a function in Lua, decodes its bytecode and compiles it into BPF. What's the difference? luajit-bpf integrates seamlessly with existing code (and access existing Lua upvalues), no user/kernel-space separations, ELF walking and no embedded C code.
The other application of BPF programs is attaching to probes for [perf event tracing][tracing]. That means you can trace events inside the kernel (or user-space), and then collect results - for example histogram of `sendto()` latency, off-cpu time stack traces, syscall latency, and so on. While kernel probes and perf events have unstable ABI, with a dynamic language we can create and use proper type based on the tracepoint ABI on runtime.
Runtime automatically recognizes memory that needs a helper to be accessed. The type casts denote source of the memory, for example the [bashreadline][bashreadline] example that prints entered bash commands from all running shells:
```lua
local ffi = require('ffi')
local bpf = require('bpf')
-- Kernel-space part of the program
local prog = bpf(function (ptregs)
local req = ffi.cast('struct pt_regs', ptregs) -- Cast to pt_regs, specialized type.
local line = ffi.new('char [40]') -- Create a 40 byte buffer on stack
ffi.copy(line, ffi.cast('char *', req.ax)) -- Cast `ax` to string pointer and copy to buffer
print('%s\n', line) -- Print to trace_pipe
end)
local probe = assert(bpf.uprobe('/bin/bash:readline', prog, true, -1, 0))
-- User-space part of the program
local ok, err = pcall(function()
local log = bpf.tracelog()
print(' TASK-PID CPU# TIMESTAMP FUNCTION')
print(' | | | | |')
while true do
print(log:read()) -- Tail trace_pipe log
end
end)
```
Where cast to `struct pt_regs` flags the source of data as probe arguments, which means any pointer derived
from this structure points to kernel and a helper is needed to access it. Casting `req.ax` to pointer is then required for `ffi.copy` semantics, otherwise it would be treated as `u64` and only it's value would be
copied.
## Installation
```bash
$ luarocks install luajit-bpf
```
## Examples
See `examples` directory.
### Helpers
* `print(...)` is a wrapper for `bpf_trace_printk`, the output is captured in `cat /sys/kernel/debug/tracing/trace_pipe`
* `bit.*` library **is** supported (`lshift, rshift, arshift, bnot, band, bor, bxor`)
* `math.*` library *partially* supported (`log2, log, log10`)
* `ffi.cast()` is implemented
* `ffi.new(...)` allocates memory on stack, initializers are NYI
* `ffi.copy(...)` copies memory (possibly using helpers) between stack/kernel/registers
* `ntoh(x[, width])` - convert from network to host byte order.
* `hton(x[, width])` - convert from host to network byte order.
* `xadd(dst, inc)` - exclusive add, a synchronous `*dst += b` if Lua had `+=` operator
Below is a list of BPF-specific helpers:
* `time()` - return current monotonic time in nanoseconds (uses `bpf_ktime_get_ns`)
* `cpu()` - return current CPU number (uses `bpf_get_smp_processor_id`)
* `pid_tgid()` - return caller `tgid << 32 | pid` (uses `bpf_get_current_pid_tgid`)
* `uid_gid()` - return caller `gid << 32 | uid` (uses `bpf_get_current_uid_gid`)
## Current state
* Not all LuaJIT bytecode opcodes are supported *(notable mentions below)*
* Closures `UCLO` will probably never be supported, although you can use upvalues inside compiled function.
* Type narrowing is opportunistic and sticky from right to left (thus `u8 <- u64 + u8`); numbers are 64-bit by default, but 64-bit immediate loads are not supported (e.g. `local x = map[ffi.cast('uint64_t', 1000)]`)
* Tail calls `CALLT`, and iterators `ITERI` are NYI (as of now)
* Arbitrary ctype **is** supported both for map keys and values
* Basic optimisations like: constant propagation, partial DCE, liveness analysis and speculative register allocation are implement, but there's no control flow analysis yet. This means the compiler has the visibility when things are used and dead-stores occur, but there's no rewriter pass to eliminate them.
* No register sub-allocations, no aggressive use of caller-saved `R1-5`, no aggressive narrowing (this would require variable range assertions and variable relationships)
* `bpf_perf_event_output()` is NYI
* Slices with not 1/2/4/8 length are NYI (requires allocating a memory on stack and using pointer type)
[bcc]: https://github.com/iovisor/bcc
[tracing]: http://www.brendangregg.com/blog/2016-03-05/linux-bpf-superpowers.html
[bashreadline]: http://www.brendangregg.com/blog/2016-02-08/linux-ebpf-bcc-uprobes.html
This diff is collapsed. Click to expand it.
local ffi = require('ffi')
local M = {}
ffi.cdef [[
struct bpf {
/* Instruction classes */
static const int LD = 0x00;
static const int LDX = 0x01;
static const int ST = 0x02;
static const int STX = 0x03;
static const int ALU = 0x04;
static const int JMP = 0x05;
static const int ALU64 = 0x07;
/* ld/ldx fields */
static const int W = 0x00;
static const int H = 0x08;
static const int B = 0x10;
static const int ABS = 0x20;
static const int IND = 0x40;
static const int MEM = 0x60;
static const int LEN = 0x80;
static const int MSH = 0xa0;
/* alu/jmp fields */
static const int ADD = 0x00;
static const int SUB = 0x10;
static const int MUL = 0x20;
static const int DIV = 0x30;
static const int OR = 0x40;
static const int AND = 0x50;
static const int LSH = 0x60;
static const int RSH = 0x70;
static const int NEG = 0x80;
static const int MOD = 0x90;
static const int XOR = 0xa0;
static const int JA = 0x00;
static const int JEQ = 0x10;
static const int JGT = 0x20;
static const int JGE = 0x30;
static const int JSET = 0x40;
static const int K = 0x00;
static const int X = 0x08;
static const int JNE = 0x50; /* jump != */
static const int JSGT = 0x60; /* SGT is signed '>', GT in x86 */
static const int JSGE = 0x70; /* SGE is signed '>=', GE in x86 */
static const int CALL = 0x80; /* function call */
static const int EXIT = 0x90; /* function return */
/* ld/ldx fields */
static const int DW = 0x18; /* double word */
static const int XADD = 0xc0; /* exclusive add */
/* alu/jmp fields */
static const int MOV = 0xb0; /* mov reg to reg */
static const int ARSH = 0xc0; /* sign extending arithmetic shift right */
/* change endianness of a register */
static const int END = 0xd0; /* flags for endianness conversion: */
static const int TO_LE = 0x00; /* convert to little-endian */
static const int TO_BE = 0x08; /* convert to big-endian */
/* misc */
static const int PSEUDO_MAP_FD = 0x01;
};
/* eBPF commands */
struct bpf_cmd {
static const int MAP_CREATE = 0;
static const int MAP_LOOKUP_ELEM = 1;
static const int MAP_UPDATE_ELEM = 2;
static const int MAP_DELETE_ELEM = 3;
static const int MAP_GET_NEXT_KEY = 4;
static const int PROG_LOAD = 5;
static const int OBJ_PIN = 6;
static const int OBJ_GET = 7;
};
/* eBPF program types */
struct bpf_prog {
static const int UNSPEC = 0;
static const int SOCKET_FILTER = 1;
static const int KPROBE = 2;
static const int SCHED_CLS = 3;
static const int SCHED_ACT = 4;
};
/* eBPF helpers */
struct bpf_func_id {
static const int unspec = 0;
static const int map_lookup_elem = 1;
static const int map_update_elem = 2;
static const int map_delete_elem = 3;
static const int probe_read = 4;
static const int ktime_get_ns = 5;
static const int trace_printk = 6;
static const int get_prandom_u32 = 7;
static const int get_smp_processor_id = 8;
static const int skb_store_bytes = 9;
static const int l3_csum_replace = 10;
static const int l4_csum_replace = 11;
static const int tail_call = 12;
static const int clone_redirect = 13;
static const int get_current_pid_tgid = 14;
static const int get_current_uid_gid = 15;
static const int get_current_comm = 16;
static const int get_cgroup_classid = 17;
static const int skb_vlan_push = 18;
static const int skb_vlan_pop = 19;
static const int skb_get_tunnel_key = 20;
static const int skb_set_tunnel_key = 21;
static const int perf_event_read = 22;
static const int redirect = 23;
static const int get_route_realm = 24;
static const int perf_event_output = 25;
static const int skb_load_bytes = 26;
};
]]
-- Compatibility: ljsyscall doesn't have support for BPF syscall
local S = require('syscall')
if not S.bpf then
function S.bpf () error("ljsyscall doesn't support bpf(), must be updated") end
end
-- Reflect cdata type
function M.typename(v)
if not v or type(v) ~= 'cdata' then return nil end
return string.match(tostring(ffi.typeof(v)), '<([^>]+)')
end
-- Reflect if cdata type can be pointer (accepts array or pointer)
function M.isptr(v, noarray)
local ctname = M.typename(v)
if ctname then
ctname = string.sub(ctname, -1)
ctname = ctname == '*' or (not noarray and ctname == ']')
end
return ctname
end
return M
\ No newline at end of file
-- This is a tiny wrapper over libelf to extract load address
-- and offsets of dynamic symbols
local S = require('syscall')
local ffi = require('ffi')
ffi.cdef [[
/* Type for a 16-bit quantity. */
typedef uint16_t Elf32_Half;
typedef uint16_t Elf64_Half;
/* Types for signed and unsigned 32-bit quantities. */
typedef uint32_t Elf32_Word;
typedef int32_t Elf32_Sword;
typedef uint32_t Elf64_Word;
typedef int32_t Elf64_Sword;
/* Types for signed and unsigned 64-bit quantities. */
typedef uint64_t Elf32_Xword;
typedef int64_t Elf32_Sxword;
typedef uint64_t Elf64_Xword;
typedef int64_t Elf64_Sxword;
/* Type of addresses. */
typedef uint32_t Elf32_Addr;
typedef uint64_t Elf64_Addr;
/* Type of file offsets. */
typedef uint32_t Elf32_Off;
typedef uint64_t Elf64_Off;
/* Type for section indices, which are 16-bit quantities. */
typedef uint16_t Elf32_Section;
typedef uint16_t Elf64_Section;
/* Constants */
struct Elf_Cmd
{
static const int READ = 1;
static const int RDWR = 2;
static const int WRITE = 3;
static const int CLR = 4;
static const int SET = 5;
static const int FDDONE = 6;
static const int FDREAD = 7;
static const int READ_MMAP = 8;
static const int RDWR_MMAP = 9;
static const int WRITE_MMAP =10;
static const int READ_MMAP_PRIVATE =11;
static const int EMPTY =12;
static const int NUM =13;
};
/* Descriptor for the ELF file. */
typedef struct Elf Elf;
/* Descriptor for ELF file section. */
typedef struct Elf_Scn Elf_Scn;
/* Container type for metatable */
struct Elf_object { int fd; Elf *elf; };
/* Program segment header. */
typedef struct
{
Elf64_Word p_type; /* Segment type */
Elf64_Word p_flags; /* Segment flags */
Elf64_Off p_offset; /* Segment file offset */
Elf64_Addr p_vaddr; /* Segment virtual address */
Elf64_Addr p_paddr; /* Segment physical address */
Elf64_Xword p_filesz; /* Segment size in file */
Elf64_Xword p_memsz; /* Segment size in memory */
Elf64_Xword p_align; /* Segment alignment */
} Elf64_Phdr;
typedef Elf64_Phdr GElf_Phdr;
/* Section header. */
typedef struct
{
Elf64_Word sh_name; /* Section name (string tbl index) */
Elf64_Word sh_type; /* Section type */
Elf64_Xword sh_flags; /* Section flags */
Elf64_Addr sh_addr; /* Section virtual addr at execution */
Elf64_Off sh_offset; /* Section file offset */
Elf64_Xword sh_size; /* Section size in bytes */
Elf64_Word sh_link; /* Link to another section */
Elf64_Word sh_info; /* Additional section information */
Elf64_Xword sh_addralign; /* Section alignment */
Elf64_Xword sh_entsize; /* Entry size if section holds table */
} Elf64_Shdr;
typedef Elf64_Shdr GElf_Shdr;
/* Descriptor for data to be converted to or from memory format. */
typedef struct
{
void *d_buf; /* Pointer to the actual data. */
int d_type; /* Type of this piece of data. */
unsigned int d_version; /* ELF version. */
size_t d_size; /* Size in bytes. */
uint64_t d_off; /* Offset into section. */
size_t d_align; /* Alignment in section. */
} Elf_Data;
/* Symbol table entry. */
typedef struct
{
Elf64_Word st_name; /* Symbol name (string tbl index) */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility */
Elf64_Section st_shndx; /* Section index */
Elf64_Addr st_value; /* Symbol value */
Elf64_Xword st_size; /* Symbol size */
} Elf64_Sym;
typedef Elf64_Sym GElf_Sym;
/* Coordinate ELF library and application versions. */
unsigned int elf_version (unsigned int __version);
/* Return descriptor for ELF file to work according to CMD. */
Elf *elf_begin (int __fildes, int __cmd, Elf *__ref);
/* Free resources allocated for ELF. */
int elf_end (Elf *__elf);
/* Get the number of program headers in the ELF file. If the file uses
more headers than can be represented in the e_phnum field of the ELF
header the information from the sh_info field in the zeroth section
header is used. */
int elf_getphdrnum (Elf *__elf, size_t *__dst);
/* Retrieve program header table entry. */
GElf_Phdr *gelf_getphdr (Elf *__elf, int __ndx, GElf_Phdr *__dst);
/* Retrieve section header. */
GElf_Shdr *gelf_getshdr (Elf_Scn *__scn, GElf_Shdr *__dst);
/* Retrieve symbol information from the symbol table at the given index. */
GElf_Sym *gelf_getsym (Elf_Data *__data, int __ndx, GElf_Sym *__dst);
/* Get section with next section index. */
Elf_Scn *elf_nextscn (Elf *__elf, Elf_Scn *__scn);
/* Get data from section while translating from file representation
to memory representation. */
Elf_Data *elf_getdata (Elf_Scn *__scn, Elf_Data *__data);
/* Return pointer to string at OFFSET in section INDEX. */
char *elf_strptr (Elf *__elf, size_t __index, size_t __offset);
]]
local elf = ffi.load('elf')
local EV = { NONE=0, CURRENT=1, NUM=2 }
local PT = { NULL=0, LOAD=1, DYNAMIC=2, INTERP=3, NOTE=4, SHLIB=5, PHDR=6, TLS=7, NUM=8 }
local SHT = { NULL=0, PROGBITS=1, SYMTAB=2, STRTAB=3, RELA=4, HASH=5, DYNAMIC=6, NOTE=7,
NOBITS=8, REL=9, SHLIB=10, DYNSYM=11, INIT_ARRAY=14, FINI_ARRAY=15, PREINIT_ARRAY=16,
GROUP=17, SYMTAB_SHNDX=18, NUM=19 }
local ELF_C = ffi.new('struct Elf_Cmd')
local M = {}
-- Metatable for ELF object
ffi.metatype('struct Elf_object', {
__gc = function (t) t:close() end,
__index = {
close = function (t)
if t.elf ~= nil then
elf.elf_end(t.elf)
S.close(t.fd)
t.elf = nil
end
end,
-- Load library load address
loadaddr = function(t)
local phnum = ffi.new('size_t [1]')
if elf.elf_getphdrnum(t.elf, phnum) == nil then
return nil, 'cannot get phdrnum'
end
local header = ffi.new('GElf_Phdr [1]')
for i = 0, tonumber(phnum[0])-1 do
if elf.gelf_getphdr(t.elf, i, header) ~= nil
and header[0].p_type == PT.LOAD then
return header[0].p_vaddr
end
end
end,
-- Resolve symbol address
resolve = function (t, k)
local section = elf.elf_nextscn(t.elf, nil)
while section ~= nil do
local header = ffi.new('GElf_Shdr [1]')
if elf.gelf_getshdr(section, header) ~= nil then
if header[0].sh_type == SHT.SYMTAB or header[0].sh_type == SHT.DYNSYM then
local data = elf.elf_getdata(section, data)
while data ~= nil do
if data.d_size % header[0].sh_entsize > 0 then
return nil, 'bad section header entity size'
end
local symcount = tonumber(data.d_size / header[0].sh_entsize)
local sym = ffi.new('GElf_Sym [1]')
for i = 0, symcount - 1 do
if elf.gelf_getsym(data, i, sym) ~= nil then
local name = elf.elf_strptr(t.elf, header[0].sh_link, sym[0].st_name)
if name ~= nil then
if k == ffi.string(name) then
return sym[0]
end
end
end
end
data = elf.elf_getdata(section, data)
end
end
end
section = elf.elf_nextscn(t.elf, section)
end
end,
}
})
-- Open an ELF object
function M.open(path)
if elf.elf_version(EV.CURRENT) == EV.NONE then
return nil, 'bad version'
end
local fd, err = S.open(path, 'rdonly')
if not fd then return nil, err end
local pt = ffi.new('Elf *')
pt = elf.elf_begin(fd:getfd(), ELF_C.READ, pt)
if not pt then
fd:close()
return nil, 'cannot open elf object'
end
return ffi.new('struct Elf_object', fd:nogc():getfd(), pt)
end
return M
\ No newline at end of file
local jutil = require("jit.util")
local bit = require('bit')
local shr, band = bit.rshift, bit.band
-- Decode LuaJIT 2.0 Byte Format
-- Reference: http://wiki.luajit.org/Bytecode-2.0
-- Thanks to LJ, we get code in portable bytecode with constants folded, basic
-- virtual registers allocated etc.
-- No SSA IR, type inference or advanced optimizations because the code wasn't traced yet.
local function decode_ins(func, pc)
local ins, m = jutil.funcbc(func, pc)
if not ins then return nil end
local op, ma, mb, mc = band(ins, 0xff), band(m, 7), band(m, 15*8), band(m, 15*128)
local a, b, c, d = band(shr(ins, 8), 0xff), nil, nil, shr(ins, 16)
if mb ~= 0 then
d = band(d, 0xff)
b = shr(ins, 24)
end
if ma == 5 then -- BCMuv
a = jutil.funcuvname(func, a)
end
if mc == 13*128 then -- BCMjump
c = pc+d-0x7fff
elseif mc == 9*128 then -- BCMint
c = jutil.funck(func, d)
elseif mc == 10*128 then -- BCMstr
c = jutil.funck(func, -d-1)
elseif mc == 5*128 then -- BCMuv
c = jutil.funcuvname(func, d)
end
return pc, op, a, b, c, d
end
-- Decoder closure
local function decoder(func)
local pc = 0
return function ()
pc = pc + 1
return decode_ins(func, pc)
end
end
-- Hexdump generated code
local function dump(func)
return require('jit.bc').dump(func)
end
return {
decode = decode_ins,
decoder = decoder,
dump = dump,
funcinfo = function (...) return jutil.funcinfo(...) end,
}
\ No newline at end of file
This diff is collapsed. Click to expand it.
-- This example program measures latency of block device operations and plots it
-- in a histogram. It is similar to BPF example:
-- https://github.com/torvalds/linux/blob/master/samples/bpf/tracex3_kern.c
local ffi = require('ffi')
local bpf = require('bpf')
local S = require('syscall')
-- Shared part of the program
local bins = 100
local map = bpf.map('hash', 512, ffi.typeof('uint64_t'), ffi.typeof('uint64_t'))
local lat_map = bpf.map('array', bins)
-- Kernel-space part of the program
local trace_start = assert(bpf(function (ptregs)
local req = ffi.cast('struct pt_regs', ptregs)
map[req.parm1] = time()
end))
local trace_end = assert(bpf(function (ptregs)
local req = ffi.cast('struct pt_regs', ptregs)
-- The lines below are computing index
-- using log10(x)*10 = log2(x)*10/log2(10) = log2(x)*3
-- index = 29 ~ 1 usec
-- index = 59 ~ 1 msec
-- index = 89 ~ 1 sec
-- index = 99 ~ 10sec or more
local delta = time() - map[req.parm1]
local index = 3 * math.log2(delta)
if index >= bins then
index = bins-1
end
xadd(lat_map[index], 1)
return true
end))
local probes = {
bpf.kprobe('myprobe:blk_start_request', trace_start, false, -1, 0),
bpf.kprobe('myprobe2:blk_account_io_completion', trace_end, false, -1, 0),
}
-- User-space part of the program
pcall(function()
local counter = 0
local sym = {' ',' ','.','.','*','*','o','o','O','O','#','#'}
while true do
-- Print header once in a while
if counter % 50 == 0 then
print('|1us |10us |100us |1ms |10ms |100ms |1s |10s')
counter = 0
end
counter = counter + 1
-- Collect all events
local hist, events = {}, 0
for i=29,bins-1 do
local v = tonumber(lat_map[i] or 0)
if v > 0 then
hist[i] = hist[i] or 0 + v
events = events + v
end
end
-- Print histogram symbols based on relative frequency
local s = ''
for i=29,bins-1 do
if hist[i] then
local c = math.ceil((hist[i] / (events + 1)) * #sym)
s = s .. sym[c]
else s = s .. ' ' end
end
print(s .. string.format(' ; %d events', events))
S.sleep(1)
end
end)
\ No newline at end of file
-- Simple tracing example that executes a program on
-- return from sys_write() and tracks the number of hits
local ffi = require('ffi')
local bpf = require('bpf')
local S = require('syscall')
-- Shared part of the program
local map = bpf.map('array', 1)
-- Kernel-space part of the program
local probe = assert(bpf.kprobe('myprobe:sys_write', bpf(function (ptregs)
xadd(map[0], 1)
end), true))
-- User-space part of the program
pcall(function()
for i=1,10 do
print('hits: ', tonumber(map[0]))
S.sleep(1)
end
end)
probe:close()
-- Simple parsing example of UDP/DNS that counts frequency of QTYPEs.
-- It shows how to parse packet variable-length packet structures.
local ffi = require("ffi")
local bpf = require("bpf")
local S = require("syscall")
-- Shared part of the program
local map = assert(bpf.map('array', 256))
-- Kernel-space part of the program
local prog = bpf.socket('lo', bpf(function (skb)
local ip = pkt.ip -- Accept only UDP messages
if ip.proto ~= c.ip.proto_udp then return false end
local udp = ip.udp -- Only messages >12 octets (DNS header)
if udp.length < 12 then return false end
-- Unroll QNAME (up to 2 labels)
udp = udp.data + 12
local label = udp[0]
if label > 0 then
udp = udp + label + 1
label = udp[0]
if label > 0 then
udp = udp + label + 1
end
end
-- Track QTYPE (low types)
if udp[0] == 0 then
local qtype = udp[2] -- Low octet from QTYPE
xadd(map[qtype], 1)
end
end))
-- User-space part of the program
for i=1,10 do
for k,v in map.pairs,map,0 do
v = tonumber(v)
if v > 0 then
print(string.format('TYPE%d: %d', k, v))
end
end
S.sleep(1)
end
\ No newline at end of file
-- Simple parsing example of TCP/HTTP that counts frequency of types of requests
-- and shows more complicated pattern matching constructions and slices.
-- Rewrite of a BCC example:
-- https://github.com/iovisor/bcc/blob/master/examples/networking/http_filter/http-parse-simple.c
local ffi = require("ffi")
local bpf = require("bpf")
local S = require("syscall")
-- Shared part of the program
local map = bpf.map('hash', 64)
-- Kernel-space part of the program
local prog = bpf(function (skb)
-- Only ingress so we don't count twice on loopback
if skb.ingress_ifindex == 0 then return end
local data = pkt.ip.tcp.data -- Get TCP protocol dissector
-- Continue only if we have 7 bytes of TCP data
if data + 7 > skb.len then return end
-- Fetch 4 bytes of TCP data and compare
local h = data(0, 4)
if h == 'HTTP' or h == 'GET ' or
h == 'POST' or h == 'PUT ' or
h == 'HEAD' or h == 'DELE' then
-- If hash key doesn't exist, create it
-- otherwise increment counter
local v = map[h]
if not v then map[h] = 1
else xadd(v, 1)
end
end
end)
bpf.dump(prog)
bpf.socket('lo', prog)
-- User-space part of the program
for i=1,10 do
local strkey = ffi.new('uint32_t [1]')
local s = ''
for k,v in map.pairs,map,0 do
strkey[0] = bpf.ntoh(k)
s = s..string.format('%s %d ', ffi.string(strkey, 4):match '^%s*(.-)%s*$', tonumber(v))
end
if #s > 0 then print(s..'messages') end
S.sleep(1)
end
\ No newline at end of file
-- This program looks at IP, UDP and ICMP packets and
-- increments counter for each packet of given type seen
-- Rewrite of https://github.com/torvalds/linux/blob/master/samples/bpf/sock_example.c
local ffi = require("ffi")
local bpf = require("bpf")
local S = require("syscall")
-- Shared part of the program
local map = bpf.map('hash', 256)
map[1], map[6], map[17] = 0, 0, 0
-- Kernel-space part of the program
bpf.socket('lo', bpf(function ()
local proto = pkt.ip.proto -- Get byte (ip.proto) from frame at [23]
xadd(map[proto], 1) -- Atomic `map[proto] += 1`
end))
-- User-space part of the program
for i=1,10 do