We have scraped and compiled a corpus of 3k+ Indian legal judgments and their parallel summaries.
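
Load the dataset with the Hugging Face `datasets` library:

    import pandas as pd
    from datasets import load_dataset

    # Download the ILC dataset and convert each split to a DataFrame
    dataset = load_dataset("d0r1h/ILC")
    train_set = pd.DataFrame(dataset['train'])
    test_set = pd.DataFrame(dataset['test'])

Clone the repository and install the dependencies:

    git clone https://github.com/d0r1h/ILC.git
    cd ILC
    pip install -r requirement.txt

Summarizing using Extractive approach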
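
For intuition, an extractive summarizer selects a few existing sentences from the judgment rather than generating new text. The sketch below is a deliberately simple word-frequency scorer written for illustration only. It is not the code in `Code/Models/extractive.py`, and the function names `split_sentences` and `extractive_summary` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, sentence_count=3):
    """Return the `sentence_count` highest-scoring sentences in document order.

    A sentence's score is the average document-level frequency of its words.
    This is an illustrative heuristic, not one of the benchmarked algorithms.
    """
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if not tokens:
            return 0.0
        return sum(freq[t] for t in tokens) / len(tokens)

    # Rank sentence indices by score, keep the top ones, restore original order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:sentence_count])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    doc = (
        "The court heard the appeal. The appeal was dismissed by the court. "
        "Costs were not awarded. Lunch was nice."
    )
    print(extractive_summary(doc, sentence_count=2))
```

The `sentence_count` parameter plays the same role as the `--sentence_count` flag passed to the repository script in the command that follows. Because no stop-word filtering is applied here, frequent function words influence the ranking; graph-based methods such as TextRank address this differently.
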

    !python Code/Models/extractive.py \
        --output_dir dir_name \
        --text_column text \
        --summary_column summary \
        --data_file data.csv \
        --sentence_count 3

Training LED using Abstractive approach

    !python Code/Models/led_summarization.py \
        --model_name allenai/led-base-16384 \
        --text_column Case \
        --summary_column Summary \
        --max_input_length 8192 \
        --max_output_length 600 \
        --batch_size 2 \
        --num_beams 2 \
        --output_dir output_dir_name

Inference on test-set using led-base-ilc model

| Notebook | Colab |
|---|---|
| led-base-ilc |
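
Outside the notebook, inference with the fine-tuned checkpoint might look roughly like the sketch below. It uses the standard `transformers` seq2seq API. The Hub identifier `d0r1h/led-base-ilc` is an assumption based on the model name above, the length and beam settings are copied from the training command, and placing global attention on the first token is the usual LED convention rather than something this README confirms. The function name `summarize` is hypothetical.

```python
def summarize(
    text,
    model_id="d0r1h/led-base-ilc",  # assumed Hub ID, not confirmed by this README
    max_input_length=8192,
    max_output_length=600,
    num_beams=2,
):
    """Generate an abstractive summary for one judgment (illustrative sketch)."""
    # Heavy dependencies are imported lazily so the module loads without them
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(
        text,
        max_length=max_input_length,
        truncation=True,
        return_tensors="pt",
    )

    # LED expects global attention on at least the first token
    global_attention_mask = torch.zeros_like(inputs["input_ids"])
    global_attention_mask[:, 0] = 1

    with torch.no_grad():
        summary_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            global_attention_mask=global_attention_mask,
            max_length=max_output_length,
            num_beams=num_beams,
        )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Calling `summarize(test_set["Case"][0])` would download the checkpoint on first use, assuming the test split exposes the `Case` column named in the training command.
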
The following results were obtained on the test set with transformer-based models and extractive methods:

| Algorithm / model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| Extractive | | | |
| SumBasic | 15.69 | 6.02 | 14.48 |
| LSA | 21.20 | 7.37 | 19.76 |
| KLSum | 21.40 | 10.19 | 19.66 |
| LexRank | 33.09 | 16.81 | 22.99 |
| TextRank | 34.54 | 18.10 | 31.11 |
| Abstractive | | | |
| LedBase | 4.31 | 1.08 | 4.11 |
| Led-ilc | 42.24 | 23.18 | 39.30 |
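
To make the metric columns concrete, here is a simplified sketch of ROUGE-N and ROUGE-L as F1 scores over whitespace tokens. It is an illustration, not the scorer used to produce the table: standard ROUGE packages add stemming and other normalization, and whether the table reports F1 (and on a 0 to 100 scale) is an assumption. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall, 0.0 when there is no overlap."""
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, candidate, n=1):
    """ROUGE-N F1 from clipped n-gram overlap."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # multiset intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return f1(lcs_length(ref, cand), len(cand), len(ref))


if __name__ == "__main__":
    reference = "the appeal was dismissed"
    candidate = "the appeal was allowed"
    print(rouge_n(reference, candidate, 1))
    print(rouge_n(reference, candidate, 2))
    print(rouge_l(reference, candidate))
```
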
