Merged
41 changes: 41 additions & 0 deletions .github/workflows/test.yaml
@@ -0,0 +1,41 @@
name: Run test

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  run-test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ["3.6", "3.7", "3.8", "3.9", "3.10"]
      fail-fast: false

    steps:
      - uses: actions/checkout@v2

      # see https://github.com/microsoft/setup-msbuild
      - name: Add msbuild to PATH
        if: startsWith(matrix.os, 'windows')
        uses: microsoft/[email protected]

      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install kaldialign
        shell: bash
        run: |
          python3 setup.py install --verbose

      - name: Test
        shell: bash
        run: |
          python3 ./tests/test_align.py
38 changes: 38 additions & 0 deletions CMakeLists.txt
@@ -0,0 +1,38 @@
if("x${CMAKE_SOURCE_DIR}" STREQUAL "x${CMAKE_BINARY_DIR}")
  message(FATAL_ERROR "\
In-source build is not a good practice.
Please use:
  mkdir build
  cd build
  cmake ..
to build this project"
  )
endif()

cmake_minimum_required(VERSION 3.8 FATAL_ERROR)

project(kaldialign CXX)

set(KALDIALIGN_VERSION "0.3")

if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE Release)
endif()

list(APPEND CMAKE_MODULE_PATH ${CMAKE_SOURCE_DIR}/cmake)
include(pybind11)

pybind11_add_module(_kaldialign
  ./extensions/kaldi_align.cpp
  ./extensions/kaldialign.cpp
)

if(UNIX AND NOT APPLE)
  target_link_libraries(_kaldialign PUBLIC ${PYTHON_LIBRARY})
elseif(WIN32)
  target_link_libraries(_kaldialign PUBLIC ${PYTHON_LIBRARIES})
endif()

install(TARGETS _kaldialign
  DESTINATION ../
)
28 changes: 24 additions & 4 deletions README.md
@@ -1,6 +1,26 @@
# kaldialign

A small package that exposes edit distance computation functions from [Kaldi](https://github.com/kaldi-asr/kaldi). It uses the original Kaldi code and wraps it using pybind11.

## Installation

```bash
pip install --verbose kaldialign
```

or

```bash
pip install --verbose -U git+https://github.com/pzelasko/kaldialign.git
```

or

```bash
git clone https://github.com/pzelasko/kaldialign.git
cd kaldialign
python3 setup.py install --verbose
```

## Examples

@@ -13,10 +33,10 @@ EPS = '*'
a = ['a', 'b', 'c']
b = ['a', 's', 'x', 'c']
ali = align(a, b, EPS)
assert ali == [('a', 'a'), ('b', 's'), (EPS, 'x'), ('c', 'c')]
```

- `edit_distance(seq1, seq2)` - used to obtain the total edit distance, as well as the number of insertions, deletions and substitutions.

```python
from kaldialign import edit_distance
@@ -34,4 +54,4 @@ assert results == {

## Motivation

The need for this arose from the fact that practically all implementations of the Levenshtein distance have slight differences, making it impossible to use a scoring tool other than Kaldi and get the same error rate results. This package copies code from Kaldi directly and wraps it using pybind11, avoiding the issue altogether.
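The claim that Levenshtein implementations differ can be made concrete with a small experiment. Two implementations can agree on the total distance yet report different insertion/deletion/substitution counts, purely because they break ties differently when backtracing through the DP table. The sketch below is a plain-Python, unit-cost Levenshtein distance. It is *not* Kaldi's code, and `levenshtein_counts` and its `prefer` parameter are hypothetical names used only for illustration:

```python
from typing import Dict, Sequence


def levenshtein_counts(ref: Sequence[str], hyp: Sequence[str],
                       prefer: str = "sub") -> Dict[str, int]:
    """Unit-cost Levenshtein distance plus insertion/deletion/substitution counts.

    `prefer` (hypothetical parameter) only decides which operation wins when
    several predecessors are tied during the backtrace: "sub" tries
    match/substitution first, "indel" tries deletion and insertion first.
    The total is the same either way; the breakdown may differ.
    """
    if prefer not in ("sub", "indel"):
        raise ValueError("prefer must be 'sub' or 'indel'")
    n, m = len(ref), len(hyp)
    # d[i][j] holds the edit distance between ref[:i] and hyp[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = int(ref[i - 1] != hyp[j - 1])
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match or substitution

    # Walk back from the bottom-right cell, always stepping to a predecessor
    # that lies on some optimal path; `order` only breaks ties.
    counts = {"ins": 0, "del": 0, "sub": 0}
    order = ("diag", "del", "ins") if prefer == "sub" else ("del", "ins", "diag")
    i, j = n, m
    while i > 0 or j > 0:
        for op in order:
            if op == "diag" and i > 0 and j > 0:
                cost = int(ref[i - 1] != hyp[j - 1])
                if d[i][j] == d[i - 1][j - 1] + cost:
                    counts["sub"] += cost  # a match adds 0
                    i, j = i - 1, j - 1
                    break
            elif op == "del" and i > 0 and d[i][j] == d[i - 1][j] + 1:
                counts["del"] += 1
                i -= 1
                break
            elif op == "ins" and j > 0 and d[i][j] == d[i][j - 1] + 1:
                counts["ins"] += 1
                j -= 1
                break
    counts["total"] = counts["ins"] + counts["del"] + counts["sub"]
    return counts


if __name__ == "__main__":
    ref, hyp = ["x", "y"], ["y", "z"]
    print(levenshtein_counts(ref, hyp, prefer="sub"))    # {'ins': 0, 'del': 0, 'sub': 2, 'total': 2}
    print(levenshtein_counts(ref, hyp, prefer="indel"))  # {'ins': 1, 'del': 1, 'sub': 0, 'total': 2}
```

Both runs report a total of 2, but one scores the pair as two substitutions and the other as one insertion plus one deletion. This sketch does not reproduce Kaldi's own tie-breaking order; it only shows why reusing Kaldi's code is the reliable way to match its breakdown.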