Commit 693b53d

Merge branch 'main' into HideLord-main
2 parents: 63c5a13 + 1413931

17 files changed: +322 -156 lines
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+name: "Bug report"
+description: Report a bug
+labels: [ "bug" ]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thanks for taking the time to fill out this bug report!
+  - type: textarea
+    id: bug-description
+    attributes:
+      label: Describe the bug
+      description: A clear and concise description of what the bug is.
+      placeholder: Bug description
+    validations:
+      required: true
+  - type: checkboxes
+    attributes:
+      label: Is there an existing issue for this?
+      description: Please search to see if an issue already exists for the issue you encountered.
+      options:
+        - label: I have searched the existing issues
+          required: true
+  - type: textarea
+    id: reproduction
+    attributes:
+      label: Reproduction
+      description: Please provide the steps necessary to reproduce your issue.
+      placeholder: Reproduction
+    validations:
+      required: true
+  - type: textarea
+    id: screenshot
+    attributes:
+      label: Screenshot
+      description: "If possible, please include screenshot(s) so that we can understand what the issue is."
+  - type: textarea
+    id: logs
+    attributes:
+      label: Logs
+      description: "Please include the full stacktrace of the errors you get in the command-line (if any)."
+      render: shell
+    validations:
+      required: true
+  - type: textarea
+    id: system-info
+    attributes:
+      label: System Info
+      description: "Please share your system info with us: operating system, GPU brand, and GPU model. If you are using a Google Colab notebook, mention that instead."
+      render: shell
+      placeholder:
+    validations:
+      required: true
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+---
+name: Feature request
+about: Suggest an improvement or new feature for the web UI
+title: ''
+labels: 'enhancement'
+assignees: ''
+
+---
+
+**Description**
+
+A clear and concise description of what you want to be implemented.
+
+**Additional Context**
+
+If applicable, please provide any extra information, external links, or screenshots that could be useful.

.github/dependabot.yml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# To get started with Dependabot version updates, you'll need to specify which
+# package ecosystems to update and where the package manifests are located.
+# Please see the documentation for all configuration options:
+# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
+
+version: 2
+updates:
+  - package-ecosystem: "pip" # See documentation for possible values
+    directory: "/" # Location of package manifests
+    schedule:
+      interval: "weekly"

.github/workflows/stale.yml

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+name: Close inactive issues
+on:
+  schedule:
+    - cron: "10 23 * * *"
+
+jobs:
+  close-issues:
+    runs-on: ubuntu-latest
+    permissions:
+      issues: write
+      pull-requests: write
+    steps:
+      - uses: actions/stale@v5
+        with:
+          stale-issue-message: ""
+          close-issue-message: "This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, you can reopen it (if you are the author) or leave a comment below."
+          days-before-issue-stale: 30
+          days-before-issue-close: 0
+          stale-issue-label: "stale"
+          days-before-pr-stale: -1
+          days-before-pr-close: -1
+          repo-token: ${{ secrets.GITHUB_TOKEN }}

README.md

Lines changed: 6 additions & 3 deletions
@@ -60,7 +60,9 @@ pip3 install torch torchvision torchaudio --extra-index-url https://download.pyt
 conda install pytorch torchvision torchaudio git -c pytorch
 ```
 
-See also: [Installation instructions for human beings](https://github.com/oobabooga/text-generation-webui/wiki/Installation-instructions-for-human-beings).
+> **Note**
+> 1. If you are on Windows, it may be easier to run the commands above in a WSL environment. The performance may also be better.
+> 2. For a more detailed, user-contributed guide, see: [Installation instructions for human beings](https://github.com/oobabooga/text-generation-webui/wiki/Installation-instructions-for-human-beings).
 
 ## Installation option 2: one-click installers
 
@@ -140,8 +142,9 @@ Optionally, you can use the following command-line flags:
 | `--cai-chat` | Launch the web UI in chat mode with a style similar to Character.AI's. If the file `img_bot.png` or `img_bot.jpg` exists in the same folder as server.py, this image will be used as the bot's profile picture. Similarly, `img_me.png` or `img_me.jpg` will be used as your profile picture. |
 | `--cpu` | Use the CPU to generate text.|
 | `--load-in-8bit` | Load the model with 8-bit precision.|
-| `--load-in-4bit` | Load the model with 4-bit precision. Currently only works with LLaMA.|
-| `--gptq-bits GPTQ_BITS` | Load a pre-quantized model with specified precision. 2, 3, 4 and 8 (bit) are supported. Currently only works with LLaMA. |
+| `--load-in-4bit` | DEPRECATED: use `--gptq-bits 4` instead. |
+| `--gptq-bits GPTQ_BITS` | Load a pre-quantized model with specified precision. 2, 3, 4 and 8 (bit) are supported. Currently only works with LLaMA and OPT. |
+| `--gptq-model-type MODEL_TYPE` | Model type of pre-quantized model. Currently only LLaMa and OPT are supported. |
 | `--bf16` | Load the model with bfloat16 precision. Requires NVIDIA Ampere GPU. |
 | `--auto-devices` | Automatically split the model across the available GPU(s) and CPU.|
 | `--disk` | If the model is too large for your GPU(s) and CPU combined, send the remaining layers to the disk. |
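
The table above deprecates `--load-in-4bit` in favor of `--gptq-bits 4`. Below is a minimal sketch of how such a deprecation mapping can be handled at argument-parsing time; the flag names come from the table, but the parser code itself is illustrative and not the repository's actual `server.py`/`shared.py`:

```python
import argparse

parser = argparse.ArgumentParser()
# Flag names taken from the table above; defaults here are illustrative.
parser.add_argument('--load-in-4bit', action='store_true',
                    help='DEPRECATED: use --gptq-bits 4 instead.')
parser.add_argument('--gptq-bits', type=int, default=0,
                    help='Load a pre-quantized model with the given precision (2, 3, 4 or 8 bits).')
parser.add_argument('--gptq-model-type', type=str, default=None,
                    help="Model type of the pre-quantized model ('llama' or 'opt').")
args = parser.parse_args()

# Map the deprecated flag onto the new one so existing launch commands keep working.
if args.load_in_4bit:
    print('Warning: --load-in-4bit is deprecated; using --gptq-bits 4 instead.')
    args.gptq_bits = 4
```

Keeping the old flag as an alias for `--gptq-bits 4` avoids breaking existing launch scripts while steering users toward the more general flag.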

api-example-stream.py

Lines changed: 2 additions & 0 deletions
@@ -26,6 +26,7 @@ async def run(context):
         'top_p': 0.9,
         'typical_p': 1,
         'repetition_penalty': 1.05,
+        'encoder_repetition_penalty': 1.0,
         'top_k': 0,
         'min_length': 0,
         'no_repeat_ngram_size': 0,
@@ -59,6 +60,7 @@ async def run(context):
             params['top_p'],
             params['typical_p'],
             params['repetition_penalty'],
+            params['encoder_repetition_penalty'],
             params['top_k'],
             params['min_length'],
             params['no_repeat_ngram_size'],

api-example.py

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,7 @@
     'top_p': 0.9,
     'typical_p': 1,
     'repetition_penalty': 1.05,
+    'encoder_repetition_penalty': 1.0,
     'top_k': 0,
     'min_length': 0,
     'no_repeat_ngram_size': 0,
@@ -45,6 +46,7 @@
         params['top_p'],
         params['typical_p'],
         params['repetition_penalty'],
+        params['encoder_repetition_penalty'],
         params['top_k'],
         params['min_length'],
         params['no_repeat_ngram_size'],
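
Both `api-example.py` and `api-example-stream.py` gain the new `encoder_repetition_penalty` setting. Here is a condensed sketch of how the two edited pieces of each script relate; most keys are omitted and `build_payload` is a hypothetical helper, not part of either script. The settings live in a `params` dict and are sent to the server as a positional list, so the new entry must occupy the same slot in the list that the server expects.

```python
# Abbreviated generation settings, mirroring the diff above.
params = {
    'top_p': 0.9,
    'typical_p': 1,
    'repetition_penalty': 1.05,
    'encoder_repetition_penalty': 1.0,  # new in this commit
    'top_k': 0,
    'min_length': 0,
    'no_repeat_ngram_size': 0,
}

def build_payload(params):
    # The server receives the settings positionally, so this ordering must
    # match the server side. The real scripts pass additional entries before
    # and after the ones shown here.
    return [
        params['top_p'],
        params['typical_p'],
        params['repetition_penalty'],
        params['encoder_repetition_penalty'],
        params['top_k'],
        params['min_length'],
        params['no_repeat_ngram_size'],
    ]
```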

extensions/gallery/script.py

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ def generate_html():
     return container_html
 
 def ui():
-    with gr.Accordion("Character gallery"):
+    with gr.Accordion("Character gallery", open=False):
         update = gr.Button("Refresh")
         gallery = gr.HTML(value=generate_html())
         update.click(generate_html, [], gallery)

extensions/silero_tts/script.py

Lines changed: 2 additions & 0 deletions
@@ -81,6 +81,7 @@ def input_modifier(string):
     if (shared.args.chat or shared.args.cai_chat) and len(shared.history['internal']) > 0:
         shared.history['visible'][-1] = [shared.history['visible'][-1][0], shared.history['visible'][-1][1].replace('controls autoplay>','controls>')]
 
+    shared.processing_message = "*Is recording a voice message...*"
     return string
 
 def output_modifier(string):
@@ -119,6 +120,7 @@ def output_modifier(string):
     if params['show_text']:
         string += f'\n\n{original_string}'
 
+    shared.processing_message = "*Is typing...*"
     return string
 
 def bot_prefix_modifier(string):

modules/quantized_LLaMA.py renamed to modules/GPTQ_loader.py

Lines changed: 25 additions & 13 deletions
@@ -7,28 +7,40 @@
 import modules.shared as shared
 
 sys.path.insert(0, str(Path("repositories/GPTQ-for-LLaMa")))
-from llama import load_quant
+import llama
+import opt
 
 
-# 4-bit LLaMA
-def load_quantized_LLaMA(model_name):
-    if shared.args.load_in_4bit:
-        bits = 4
+def load_quantized(model_name):
+    if not shared.args.gptq_model_type:
+        # Try to determine model type from model name
+        model_type = model_name.split('-')[0].lower()
+        if model_type not in ('llama', 'opt'):
+            print("Can't determine model type from model name. Please specify it manually using --gptq-model-type "
+                  "argument")
+            exit()
     else:
-        bits = shared.args.gptq_bits
+        model_type = shared.args.gptq_model_type.lower()
+
+    if model_type == 'llama':
+        load_quant = llama.load_quant
+    elif model_type == 'opt':
+        load_quant = opt.load_quant
+    else:
+        print("Unknown pre-quantized model type specified. Only 'llama' and 'opt' are supported")
+        exit()
 
     path_to_model = Path(f'models/{model_name}')
-    pt_model = ''
     if path_to_model.name.lower().startswith('llama-7b'):
-        pt_model = f'llama-7b-{bits}bit.pt'
+        pt_model = f'llama-7b-{shared.args.gptq_bits}bit.pt'
     elif path_to_model.name.lower().startswith('llama-13b'):
-        pt_model = f'llama-13b-{bits}bit.pt'
+        pt_model = f'llama-13b-{shared.args.gptq_bits}bit.pt'
     elif path_to_model.name.lower().startswith('llama-30b'):
-        pt_model = f'llama-30b-{bits}bit.pt'
+        pt_model = f'llama-30b-{shared.args.gptq_bits}bit.pt'
    elif path_to_model.name.lower().startswith('llama-65b'):
-        pt_model = f'llama-65b-{bits}bit.pt'
+        pt_model = f'llama-65b-{shared.args.gptq_bits}bit.pt'
     else:
-        pt_model = f'{model_name}-{bits}bit.pt'
+        pt_model = f'{model_name}-{shared.args.gptq_bits}bit.pt'
 
     # Try to find the .pt both in models/ and in the subfolder
     pt_path = None
@@ -40,7 +52,7 @@ def load_quantized_LLaMA(model_name):
         print(f"Could not find {pt_model}, exiting...")
         exit()
 
-    model = load_quant(str(path_to_model), str(pt_path), bits)
+    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
 
     # Multiple GPUs or GPU+CPU
     if shared.args.gpu_memory:
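
The renamed loader infers the model type from the model folder name when `--gptq-model-type` is not given, and derives the expected `.pt` checkpoint name from `--gptq-bits`. Below is a small sketch of those two conventions as standalone functions; the helper names are made up for illustration, but the rules are the ones in the diff above.

```python
def infer_model_type(model_name):
    """Guess 'llama' or 'opt' from the model folder name, e.g. 'opt-6.7b' -> 'opt'."""
    model_type = model_name.split('-')[0].lower()
    if model_type not in ('llama', 'opt'):
        raise ValueError("Can't determine model type from the name; "
                         "pass --gptq-model-type explicitly.")
    return model_type

def expected_checkpoint(model_name, gptq_bits):
    """Name of the .pt file the loader looks for in models/ or in the model's subfolder."""
    name = model_name.lower()
    for size in ('7b', '13b', '30b', '65b'):
        if name.startswith(f'llama-{size}'):
            return f'llama-{size}-{gptq_bits}bit.pt'
    return f'{model_name}-{gptq_bits}bit.pt'

# Example: a 4-bit OPT model stored under models/opt-6.7b would be expected to
# ship a checkpoint named opt-6.7b-4bit.pt.
print(infer_model_type('opt-6.7b'))        # opt
print(expected_checkpoint('opt-6.7b', 4))  # opt-6.7b-4bit.pt
```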
