Prerequisites
Describe your issue
import random
import matplotlib.pyplot as plt
# Toy data to sample from
sample = range(1, 101)
# Wanted number of elements (-n)
number = 10
# Calculate the proportion based on the wanted number and the total number
proportion = number / len(sample)
# Init dictionary to count the number of times each element was selected
element_count = {element: 0 for element in range(1, 101)}
# Iterate a million times
for _ in range(1_000_000):
    # Counter for the number of selected elements in each iteration
    added_element_counter = 0
    for element in sample:
        if random.random() <= proportion:
            element_count[element] += 1
            added_element_counter += 1
            # Stop this pass once the wanted number of elements is reached
            if number == added_element_counter:
                break
plt.bar(element_count.keys(), height=element_count.values())
plt.show()
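
The skew in this bar chart can also be computed exactly instead of simulated. In the Python port, the element at position i is only examined if fewer than `number` of the i elements before it were selected, so its chance of being picked is the proportion multiplied by a binomial tail probability. The following is a minimal sketch under that reading of the port (not of seqkit itself); the function name `inclusion_probabilities` is made up for illustration.

```python
from math import comb


def inclusion_probabilities(total, number):
    """Exact pick probability per position for a single pass that
    stops as soon as `number` elements have been selected."""
    p = number / total
    probs = []
    for i in range(total):
        # Position i is reached only if fewer than `number` of the
        # i preceding elements were selected: P(Binomial(i, p) < number).
        reached = sum(
            comb(i, k) * p**k * (1 - p) ** (i - k)
            for k in range(min(i, number - 1) + 1)
        )
        probs.append(p * reached)
    return probs


probs = inclusion_probabilities(total=100, number=10)
print(f"first position: {probs[0]:.4f}")
print(f"last position:  {probs[-1]:.4f}")
```

The first `number` positions are always reached, so they keep the full proportion; every later position is reached less often, which matches the downward slope of the simulated counts.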

A solution could be to shuffle the records before iterating over them:
import random
import matplotlib.pyplot as plt
# Toy data to sample from (defined up front so len(sample) is available below)
sample = list(range(1, 101))
# Wanted number of elements (-n)
number = 10
# Calculate the proportion based on the wanted number and the total number
proportion = number / len(sample)
# Init dictionary to count the number of times each element was selected
element_count = {element: 0 for element in range(1, 101)}
for _ in range(1_000_000):
    # Shuffle the list each iteration
    random.shuffle(sample)
    # Counter for the number of selected elements in each iteration
    added_element_counter = 0
    for element in sample:
        if random.random() <= proportion:
            element_count[element] += 1
            added_element_counter += 1
            # Stop this pass once the wanted number of elements is reached
            if number == added_element_counter:
                break
plt.bar(element_count.keys(), height=element_count.values())
plt.xlabel('Element')
plt.ylabel('Number of times element was picked')
plt.tight_layout()
plt.show()
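
For a quick check without plotting, the two variants can be compared by how often the first and the last element are picked. This is only a sketch: the helper `count_picks`, its parameters, and the fixed seed are assumptions made for the illustration, and it uses far fewer iterations than the scripts above so that it runs in a few seconds.

```python
import random


def count_picks(total, number, trials, shuffle, seed=0):
    """Count how often each element is selected over `trials` passes."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    proportion = number / total
    items = list(range(1, total + 1))
    counts = dict.fromkeys(items, 0)
    for _ in range(trials):
        if shuffle:
            # Randomize the order so no element is favoured by position
            rng.shuffle(items)
        picked = 0
        for item in items:
            if rng.random() <= proportion:
                counts[item] += 1
                picked += 1
                # Stop this pass once the wanted number is reached
                if picked == number:
                    break
    return counts


for shuffle in (False, True):
    counts = count_picks(total=100, number=10, trials=10_000, shuffle=shuffle)
    print(f"shuffle={shuffle}: first={counts[1]}, last={counts[100]}")
```

Without shuffling, element 1 should be picked roughly twice as often as element 100; with shuffling, the two counts should come out close to each other.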

Prerequisites
seqkit version
Describe your issue
When using seqkit sample -n, the first sequences in the file have a higher chance of being picked than later sequences.
Porting your logic:
seqkit/seqkit/cmd/sample.go
Lines 155 to 163 in 22f71ff
to Python: