Doc-Textify is a TypeScript library and command-line tool that extracts and cleans text from various document formats.
- Multi-format support:
  - Microsoft Word (.docx)
  - PowerPoint (.pptx)
  - Excel (.xlsx)
  - OpenOffice/LibreOffice (.odt, .odp, .ods)
  - PDF (.pdf)
  - Plain text (.txt)
  - HTML (.html, .htm)
- Content cleaning: removes extra whitespace, handles custom line delimiters.
- Configurable options: set newline delimiter, minimum characters to extract, and toggle error logging.

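To make the cleaning and option bullets above more concrete, here is a minimal sketch of what such a cleaning step could look like. It is not doc-textify's actual implementation: the `cleanText` function, the `CleanOptions` interface, and the assumed meaning of `minCharsToExtract` (return an empty string when the cleaned text is shorter than the threshold) are all hypothetical and only borrow the option names from this README.

```typescript
// Hypothetical illustration only: NOT doc-textify's real implementation.
interface CleanOptions {
  newlineDelimiter: string  // string placed between output lines
  minCharsToExtract: number // assumed: shorter results yield ''; 0 disables the check
}

function cleanText(raw: string, options: Partial<CleanOptions> = {}): string {
  const { newlineDelimiter = '\n', minCharsToExtract = 0 } = options
  const lines = raw
    .split(/\r\n|\r|\n/)                              // accept any line-ending style
    .map(line => line.replace(/[ \t]+/g, ' ').trim()) // collapse runs of spaces and tabs
    .filter(line => line.length > 0)                  // drop empty lines
  const text = lines.join(newlineDelimiter)
  // Assumed threshold behaviour: too-short content is discarded.
  return text.length >= minCharsToExtract ? text : ''
}

// prints "Hello world | second line"
console.log(cleanText('  Hello   world \r\n\r\n  second\tline  ', { newlineDelimiter: ' | ' }))
```

The sketch keeps the delimiter and the threshold as plain optional parameters so the defaults match the values listed in this README (`'\n'` and `0`).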
Install the package and import it in your project:

    npm install doc-textify --save

    import { docTextify } from 'doc-textify'

    // async/await version
    try {
      const text = await docTextify('path/to/file.pdf')
      console.log(text)
    } catch (err) {
      console.error(err)
    }

    // or promise version
    docTextify('path/to/file.pdf')
      .then(text => console.log(text))
      .catch(err => console.error(err))

Default options:

    try {
      const text = await docTextify('path/to/file.pdf', {
        newlineDelimiter: '\n', // delimiter used in the output content
        minCharsToExtract: 0, // number of characters required to output the content; 0 disables the check
        outputErrorToConsole: true // log errors to the console
      })
    } catch (err) {
      console.error(err)
    }

If you prefer a ready-made command, the doc-textify CLI wraps the same functionality:

Global install to use the doc-textify command anywhere:

    npm install -g doc-textify

Or install locally:

    npm install doc-textify --save

Usage:

    doc-textify <path/to/document> [options]

| Option | Description | Default |
|---|---|---|
| -n, --newlineDelimiter | Line delimiter to insert | "\n" |
| -m, --minCharsToExtract | Minimum number of characters to extract | 0 (disabled) |
| -h, --help | Display help message | - |

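
If you want to drive the CLI from a Node script rather than a shell, the options in the table above map naturally onto an argument list. The sketch below is one way to do that, assuming the `doc-textify` command is installed globally and writes the extracted text to standard output (as the shell redirection in this README suggests). The `CliOptions` interface and the `buildArgs` and `runDocTextify` helpers are hypothetical names introduced for illustration; they are not part of the package.

```typescript
import { execFile } from 'node:child_process'

// Hypothetical option bag mirroring the -n and -m flags from the table.
interface CliOptions {
  newlineDelimiter?: string
  minCharsToExtract?: number
}

// Hypothetical helper: turns a file path and options into CLI arguments.
function buildArgs(file: string, options: CliOptions = {}): string[] {
  const args = [file]
  if (options.newlineDelimiter !== undefined) args.push('-n', options.newlineDelimiter)
  if (options.minCharsToExtract !== undefined) args.push('-m', String(options.minCharsToExtract))
  return args
}

// Hypothetical helper: runs the command and resolves with whatever it
// printed to stdout. Assumes `doc-textify` is available on the PATH.
function runDocTextify(file: string, options: CliOptions = {}): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile('doc-textify', buildArgs(file, options), (error, stdout) => {
      if (error) reject(error)
      else resolve(stdout)
    })
  })
}
```

A call such as `runDocTextify('document.docx', { minCharsToExtract: 20 })` would then resolve with the extracted text. Using `execFile` with an argument array avoids shell quoting issues; how the CLI interprets escape sequences passed to `-n` is not covered by this sketch.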

Example:

    doc-textify document.docx -n "\r\n" -m 20 > output.txt

To build and test from source:

    git clone https://github.com/johaven/doc-textify.git
    cd doc-textify
    npm install
    npm run build # outputs compiled files into /dist
    npm run test # test parsing

To contribute:

- Fork the repository
- Create a branch: git checkout -b feature/my-feature
- Commit your changes: git commit -m "Add my feature"
- Push to your branch: git push origin feature/my-feature
- Open a Pull Request

This project is licensed under the MIT License.