chunker: use tokenizer to determine chunk size limits
The following discussion from !13 (merged) should be addressed:
- [ ] @jshobrook1 started a discussion: (+2 comments) What embedding model are we planning to use? Any chance we can use the tokenizer to ensure chunks don't exceed a token limit instead of a character limit? It would be safer to guarantee that a chunk will never exceed a token limit.
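
One possible approach, as a minimal sketch only: since the embedding model isn't decided yet, this assumes a Hugging Face tokenizer, and the model name, `chunk_by_tokens` helper, and `max_tokens` parameter are illustrative placeholders rather than the chunker's actual interface. The idea is to encode the text once, split the token IDs into windows of at most `max_tokens`, and decode each window back to text, so no chunk can exceed the token limit regardless of character length.

```python
# Hypothetical sketch: token-based chunking with a Hugging Face tokenizer.
# The model name and helper name are assumptions, not the project's actual API.
from transformers import AutoTokenizer


def chunk_by_tokens(
    text: str,
    max_tokens: int = 512,
    model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
) -> list[str]:
    """Split `text` into chunks whose token count never exceeds `max_tokens`."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Encode once without special tokens so the count reflects raw content only.
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = []
    for start in range(0, len(token_ids), max_tokens):
        window = token_ids[start:start + max_tokens]
        # Decode each window back to text; each chunk is guaranteed to fit the limit.
        chunks.append(tokenizer.decode(window))
    return chunks
```

Note that slicing on raw token boundaries can cut mid-word or mid-sentence; in practice we would probably still split on sentence or paragraph boundaries and only use the tokenizer to count tokens per candidate chunk, but the hard token cap above is what guarantees the limit is never exceeded.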