Releases: data-prep-kit/data-prep-kit

v1.1.7

11 Feb 15:51
3997119

What's Changed

Full Changelog: v1.1.6...v1.1.7

v.1.1.6

14 Nov 16:11
488bc94

What's Changed

New Contributors

Full Changelog: v1.1.5...v1.1.6

v1.1.5

02 Oct 15:20
10f1144

What's Changed

New Contributors

Full Changelog: v1.1.4...v1.1.5

v1.1.4

16 Sep 01:39
18e2650

What's Changed

  • preparing for a new release by @swith005 in #1425
  • [bug] Extend tokenization to process list of strings by @matouma in #1433
  • Updating code to put -1 when text content is empty by @santoshborse in #1424
  • fix additional secrets by @roytman in #1431
  • updated filter transform to return empty table with original schema r… by @swith005 in #1434
  • Avoid latest release of polars 1.33 that is breaking the code by @touma-I in #1439
  • added support for binary transforms and data in chain, updated tests, readme, … by @swith005 in #1429
  • Drop lower bound on boto3 dependency. by @revit13 in #1437
  • updated logging to remove access and secret key from config if present by @swith005 in #1440
  • adding dev1 release for regression testing by @swith005 in #1441
  • preparing for a new release (1.1.4) by @swith005 in #1442

Full Changelog: v1.1.3...v1.1.4
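
To illustrate the kind of change described in #1440 (keeping the access and secret key out of logged configuration), here is a minimal sketch. It is not the project's actual code: the function name `redact_config`, the key names in `SENSITIVE_KEYS`, and the sample config layout are all assumptions made for this example.

```python
# Hypothetical key names; the real config keys used by the project may differ.
SENSITIVE_KEYS = {"access_key", "secret_key"}


def redact_config(config: dict) -> dict:
    """Return a copy of `config` with credential entries removed.

    Nested dictionaries are handled recursively; the input is not modified.
    """
    redacted = {}
    for key, value in config.items():
        if key in SENSITIVE_KEYS:
            continue  # drop the credential entirely rather than masking it
        if isinstance(value, dict):
            redacted[key] = redact_config(value)
        else:
            redacted[key] = value
    return redacted


# Example config (made up for illustration).
sample = {
    "input_folder": "in",
    "s3_cred": {"access_key": "A", "secret_key": "S", "url": "u"},
}
safe_to_log = redact_config(sample)
```

Building a filtered copy, rather than deleting keys in place, means the original config can still be used to open connections after the log line has been written.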

v1.1.3

18 Aug 21:11
9d676fb

What's Changed

New Contributors

Full Changelog: v.1.1.2...v1.1.3

v.1.1.2

03 Jul 16:29
ffe67d4

What's Changed

v1.1.1

09 May 17:44
1a3c7c6

What's Changed

v1.1.0

09 Mar 19:24
8e45994

What's Changed

New Contributors

Full Changelog: v1.0.0...v1.1.0

v1.0.0

09 Mar 16:28
d3ea57f

What's Changed

  • Debug issue with Ededup kfp v1 failing in fork by @matouma in #877
  • Transforms 1.0.0a0 refactored language transforms by @matouma in #879
  • Restructure Html2Parquet with its own dpk_ namespace by @touma-I in #809
  • Restructure Pdf2Parquet with its own dpk_ namespace by @touma-I in #813
  • Restructure text_encoder with its own dpk_ namespace by @touma-I in #826
  • refactored doc quality transform as its own module with its own dpk_ namespace by @touma-I in #854
  • Refactor doc_id with its own dpk_ module name by @touma-I in #860
  • first cut at refactoring with own dpk_lang_id name space by @touma-I in #864
  • Refactor hap transform as its own dpk_ module by @touma-I in #866
  • Refactor tokenization transform as its own dpk_tokenization module by @touma-I in #869
  • Refactored ededup with own dpk_ededup namespace by @matouma in #878
  • pull changes from fork to main repo by @matouma in #892
  • FDedup Refactored as its own dpk_ module by @matouma in #893
  • refactor tokenization transform as its own named dpk_ module by @matouma in #886
  • Fix transforms 1.0 alpha release so it uses docid to generate int id required by fdedup by @matouma in #899
  • Added checkmarks for Code Profile in the README by @shahrokhDaijavad in #894
  • Initial Pass at Similarity Transform by @AnLiGentile in #897
  • Refactored filter transform as its own dpk_filter named module by @matouma in #900
  • PII data file by @PoojaHolkar in #828
  • Enhance header cleanser module with multi-processing and timeout by @takuyagt in #849
  • Relax requirements on pandas and requests by @touma-I in #901
  • Add image_pull_secrets parameter to add_settings_to_comp for kfp v2 by @revit13 in #915
  • Fixing the broken links in the main README file by @shahrokhDaijavad in #917
  • Update README.md for the Similarity transform by @shahrokhDaijavad in #911
  • Update README.md for the filter transform by @shahrokhDaijavad in #919
  • Refactoring code profiler transform to new pythonic code layout #913 by @pankajskku in #916
  • Adding support for c sharp by @pankajskku in #926
  • Added TRANSFROM_NAME to docker build arg by @matouma in #929
  • Updated readme and added notebook by @yash-kalathiya in #845
  • fix: updated broken links and paths in kfp v2 documentation by @juancappi in #907
  • Support the case where an arbitrary user id runs the ray docker images by @revit13 in #934
  • Deleting obsolete notebooks by @shahrokhDaijavad in #933
  • Fix path issues when running superworkflow pipeline sample for kfp v2 by @revit13 in #935
  • added missing ray notebooks for doc_quality and filter by @matouma in #927
  • Deleting 3 "run first .." notebooks from the example folder and the links to them from the main README file + new notebook by @shahrokhDaijavad in #938
  • Remove DOCKER_REMOTE_IMAGE from .make.defaults by @touma-I in #890
  • Update KFP_DOCKER_VERSION. by @revit13 in #937
  • Update Readme.md removing confusion on version 0.2.3 vs 1.0.0 by @matouma in #939
  • Remove KFP_DOCKER_VERSION. by @revit13 in #943
  • README update for Similarity Transform by @AnLiGentile in #944
  • Adding rules to the semantic rule set by @pankajskku in #941
  • Refactoring pii_redactor as its own dpk_ named module by @matouma in #895
  • Relax fasttext requirements >=0.9.2 by @matouma in #950
  • Cleanup documentation for 1.0.0 by @touma-I in #945
  • Fixed sample notebook location for html2parquet by @sujee in #948
  • refactor noop transform to use dpk_ structures by @daw3rd in #951
  • refactored profiler transform by @matouma in #966
  • initial refactoring resize by @matouma in #960
  • Updated Resources page per latest DPK announcement by @agoyal26 in #961
  • Cut-off release for refactored language transforms by @matouma in #967

New Contributors

Full Changelog: v0.2.3...v1.0.0

v0.2.3

17 Dec 12:18
9e1b281

What's Changed

  • Fuzzy dedup by @Kibnelson in #699
  • Doc Quality Transform: update readme and add sample notebook by @dtsuzuku-ibm in #790
  • Fix for inability to read some parquet files (issue #816) by @daw3rd in #817
  • Updated Resources webpage with latest talks and links by @agoyal26 in #846
  • HAP transform: Update README.md and add sample notebook by @ian-cho in #821
  • publish transforms==0.2.3.dev0 pre-release to pypi with dependency on toolkit==0.2.2 by @touma-I in #837
  • Semantic profiler and report generation module integration by @pankajskku in #824
  • Update doc for doc_id and ededup to follow template in issue #753 by @cmadam in #836
  • Update README.md for check-marking the table with Python and Spark versions of fdedup by @shahrokhDaijavad in #855
  • Added links to example notebooks - issue #848 fix by @cmadam in #861
  • Hap score - Example Notebook by @AishaDarga in #840
  • Simplified fix for issue 803 by @cmadam in #839
  • Html rag 1 -- Crawl a website / process HTML / run RAG queries by @sujee in #838
  • fix usage of pandas 2.1.x by @dolfim-ibm in #867
  • Bug fix for Agda language in code profiler transform by @pankajskku in #865
  • Release 0.2.3.dev1 per Constantin's request by @touma-I in #875
  • Create pre-release wheels for code_profiler using transform 0.2.3.dev1 and toolkit 0.2.3.dev0 by @touma-I in #857
  • Grant non-root users the necessary permissions to the ray directory by @revit13 in #881
  • Start of a new release cycle with 1.0.0 by @matouma in #885

New Contributors

Full Changelog: v0.2.2...v0.2.3