Merged
Changes from 1 commit
6 changes: 6 additions & 0 deletions internal/tokenizer/tokenize.go
@@ -1,3 +1,6 @@
// Package tokenizer implements file tokenization used by the enry file
// classifier. This package is an implementation detail of enry and should not
// be imported by other packages.
package tokenizer

import (
@@ -8,6 +11,9 @@ import (

const byteLimit = 100000

// Tokenize returns classification tokens from content. The tokens returned
// should match what the Linguist library returns. At most the first 100KB of
// content are tokenized.
func Tokenize(content []byte) []string {
if len(content) > byteLimit {
content = content[:byteLimit]