chunker: use tokenizer to determine chunk size limits

The following discussion from !13 (merged) should be addressed:

  • @jshobrook1 started a discussion: (+2 comments)

    What embedding model are we planning to use? Any chance we can use its tokenizer to enforce a token limit on chunks rather than a character limit? That would guarantee a chunk never exceeds the model's token limit.
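
One way to do this, sketched below under the assumption of an OpenAI-style embedding model and the `tiktoken` tokenizer (both assumptions, since the model hasn't been decided yet), is to measure and split on token windows instead of character counts. The function name `chunk_by_tokens`, the `cl100k_base` encoding, and the 512-token default are placeholders, not the chunker's actual API:

```python
import tiktoken  # assumed tokenizer; swap in whichever embedding model's tokenizer we pick


def chunk_by_tokens(text: str, max_tokens: int = 512, encoding_name: str = "cl100k_base") -> list[str]:
    """Split text into chunks that are guaranteed to stay within max_tokens."""
    enc = tiktoken.get_encoding(encoding_name)
    tokens = enc.encode(text)

    chunks = []
    # Slice the token stream into fixed-size windows, then decode each window back to text.
    for start in range(0, len(tokens), max_tokens):
        window = tokens[start:start + max_tokens]
        chunks.append(enc.decode(window))
    return chunks
```

In practice the chunker would likely keep its existing sentence/paragraph boundary logic and only use the tokenizer for the length check, falling back to a hard token split like the one above when a single segment is itself over the limit; splitting mid-token can otherwise cut through words or multi-byte characters.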