Closed
Labels: is-bug, nf-performance, nf-security
Description
When you try to read the content stream of the attached PDF, PyPDF2 enters an infinite loop. This is probably a security issue, because it could be used to mount a denial-of-service attack on applications that use PyPDF2.
The cause is that the final while loop in `ContentStream._readInlineImage` terminates only when it finds the `EI` token. It never checks whether the stream has already ended. Triggering the bug is therefore as simple as adding a (broken) inline image that has no `EI` token at all, as the attached PDF does.
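Below is a minimal sketch of a bounded scan that fails fast instead of hanging. It assumes the loop reads one byte at a time with `stream.read(1)`, which returns `b""` once the stream is exhausted. A loop that only looks for `EI` would keep appending that empty result forever. The function name `read_inline_image_data`, the `InlineImageError` class, and the simplified rule "`EI` ends the data only when followed by whitespace or end of stream" are all hypothetical. This is not PyPDF2's actual `_readInlineImage` code or the fix that was eventually merged.

```python
from io import BytesIO


class InlineImageError(ValueError):
    """Hypothetical error for an inline image that has no EI terminator."""


# PDF whitespace characters (space, tab, LF, CR, form feed, NUL).
WHITESPACE = b" \t\n\r\x0c\x00"


def read_inline_image_data(stream):
    """Return inline-image bytes up to (not including) the 'EI' operator.

    Simplified assumption: 'EI' is the terminator only when it is followed
    by whitespace or by the end of the stream.
    """
    data = bytearray()
    while True:
        tok = stream.read(1)
        if not tok:
            # read(1) returns b"" at end of stream; without this check a
            # stream lacking 'EI' would make the loop spin forever.
            raise InlineImageError("stream ended before EI operator")
        if tok == b"E":
            tok2 = stream.read(1)
            if tok2 == b"I":
                nxt = stream.read(1)
                if not nxt or nxt in WHITESPACE:
                    return bytes(data)
                # 'EI' was part of the image data: keep it, rescan nxt.
                stream.seek(-1, 1)
                data += b"EI"
                continue
            if tok2:
                stream.seek(-1, 1)  # rescan tok2 on the next iteration
        data += tok
```

With the end-of-stream check in place, a truncated image such as `read_inline_image_data(BytesIO(b"broken image data"))` raises `InlineImageError` immediately instead of hanging the caller.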
You can see the infinite loop by running this test script with the attached PDF:
```python
import sys

from PyPDF2 import PdfFileReader
from PyPDF2.pdf import ContentStream

# Pass the path to the attached PDF as the first argument.
with open(sys.argv[1], 'rb') as f:
    pdf = PdfFileReader(f, strict=False)
    for page in pdf.pages:
        # Parse the page's content stream and print each inline image's size.
        contentstream = ContentStream(page.getContents(), pdf)
        for operands, command in contentstream.operations:
            if command == b'INLINE IMAGE':
                data = operands['data']
                print(len(data))
```

I will soon prepare a pull request that fixes this issue.