PDFs: Rise, Decline, and Revival — JavaScript Sandboxing in Modern PDF Libraries

Name: PDFs: Rise, Decline, and Revival — JavaScript Sandboxing in Modern PDF Libraries
Uploaded: 2025-04-16
Duration: 42 min 23 s

BSides Sofia · 42:23 · 65 views · Published 2025-04

Speakers

Luigi Gubello

Tags

Category: Technical

Style: Talk

Mentioned in this talk

Tools used

Adobe Acrobat, Adobe Reader, Firefox, Foxit Reader, Google Chrome, PDF.js, PDFium, PDFTron

Service

Dropbox

About this talk

PDFs have evolved from desktop clients (Adobe, Foxit) to browser-based and cloud rendering via libraries like PDF.js, PDFium, and PDFTron. This talk traces how the shift to client-side PDF processing changed the attack surface, examining real vulnerabilities in JavaScript sandboxes alongside widespread false-positive CVEs where legitimate PDF features (like alert popups) are misclassified as XSS.

Original YouTube description

by Luigi Gubello. PDFs: rise, decline, and revival. A journey across how we have changed our way of viewing and editing PDF files by moving from offline clients to online services, and how this is changing the role of PDF files as attack vectors. A talk on how we have moved from desktop clients (Adobe, etc.) to browsers and online services to render, view, edit, and sign PDF files, and how this has changed the role of PDFs in attacks and exploitation. From the false-positive vulnerabilities (CVE-2020-26505, CVE-2023-0108, CVE-2023-5873, and other CVEs that were not vulnerabilities) to vulnerabilities in client-side PDF SDKs.

Transcript [en]

Hello. Okay, I hope that by the end of this talk the title will be clear to everyone in this room. Who am I? Well, I am a security researcher. I now work at Pitch, a Berlin startup, but previously I worked at Smallpdf, a Swiss-based startup that offers an online, in-cloud editing tool for PDFs, which are still a thing after 30 years. So let's start from PDF in web applications, because PDF has an interesting story. It was created as a proprietary specification by Adobe 30 years ago, in 1993, and they released this format, the Portable Document Format, and a reader, Acrobat Reader, now renamed Adobe

Reader, I guess, that allowed people to open and read static documents, or at least a sort of virtual version of a paper document, on a computer. Five or six years later, in '99, Adobe had the smart idea to introduce JavaScript into the PDF specification, version 1.3. A very smart idea, because it helped at least some of us to have careers and find a job in the future. But it is also interesting because JavaScript had been invented just four years before, in '95. So in just four years Adobe was able to introduce into the specification an entire language that can run in a PDF file. In 2001 Foxit Reader version one

was released, and it was, and still is, the major alternative to Adobe Reader in desktop environments. And in 2008 PDF became an open standard. So now there is an ISO standard, there is an association, there are companies that work to update the ISO standard. Now we have PDF 2.0, released I guess two years ago, no, three years ago, and it's no longer proprietary to Adobe. Of course, opening PDF to the community, to the enterprise, to everyone helped companies to create their own readers and to improve them, and so in 2010 Google

Chrome version 8 became the first browser to natively support rendering PDFs inside the browser. Before this, to open a PDF, for example an email attachment, you needed to download it and open it on the desktop using a client. Now we were moving to a different approach: you can open PDFs and view them directly in the browser, and so the browser became your PDF viewer, your PDF client. Interestingly, the first version of PDFium, the core system that runs in Chrome to render PDFs, was based on Foxit technology, which was proprietary. In 2011, Mozilla released its own

system to render PDFs in the browser, PDF.js. And there is a big, very important difference between PDFium and earlier PDF readers like Adobe and Foxit on one side and PDF.js on the other. The previous technologies were written in C or C++. PDF.js is entirely written in JavaScript. So Firefox is able to render PDF files in a JavaScript-based reader. The project is open source and was designed and is maintained by Mozilla. One year later, the PDFTron WebViewer was released. PDFTron, whose name is now Apryse, is one of the other competitors in the PDF market. It is 25 years old. It's based in Canada

and it's famous because they acquired a lot of old but good and solid PDF technology, for example the open-source project iText. But also because they released this viewer that runs in the client and allows you to open and edit PDFs directly in the client. This means that you don't need to process the PDF in the back end, for example; you can do a lot of manipulation and editing directly in the client, delegating the computation cost to the customer or the final user. And in 2014 Google released PDFium as an open source project. So the first CVE, of the vulnerabilities that I want to present,

is 7 years old right now, and it is the first vulnerability found in PDF.js. So it was fun. The vulnerability was reported using Bugzilla, the open-source, self-hosted Mozilla platform for reporting bugs, not just vulnerabilities, and it was reported by Wladimir Palant. I don't know if you already know this person, but he is the original author of Adblock Plus, so he was quite good at writing JavaScript code. He found a cross-site scripting vulnerability, the first one. An interesting note about this vulnerability: if you open the PDF.js project on GitHub and go to the Security tab, you can find just one CVE, found in 2024.

And this vulnerability, the one that is 7 years old, is not in the GitHub Security Advisory database. The issue was a function, a PostScript calculator function, that didn't properly sanitize user input, allowing, of course, JavaScript injection. This user input was an array, a numeric array, but the PDF.js code just checked that the input was an array and didn't check every item to be sure that it was a number. We know that at least some researchers or bug bounty hunters reported this vulnerability in at least two different bug bounty programs. It is also interesting that they reported it in 2020, about two years after the CVE was

released and fixed by Mozilla, which also shows how hard it can be to update an npm package, especially if you don't expect risk from the file, because a PDF opened in the browser should not be so risky, at least for a company. The reports were made to Slack and Nextcloud. So let's see the payload. This is the original payload provided by the author, and we can see that we have an object. I don't know if you are familiar with PDF: you can open a PDF like normal people, using Adobe or a similar reader, or with Vim if you are crazy enough. And this is an object, and

the object is a sort of container for instructions, and this instruction is just a function, and it works like a function. So there is a domain and there is a range: the domain is the input and the range is the output, and the function should just help to define where a specific input is mapped in the output. These are two arrays, two numeric arrays with just two items each. But the researcher saw that, when injecting an item that is not just a number, the code was still processed by PDF.js.
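The missing per-item check described above can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions, not the PDF.js source: `looksLikeArray`, `isNumberArray`, and the sample domain values are hypothetical names and data chosen for the example, assuming the parser hands over an untyped array read from the PDF object.

```typescript
// Hypothetical helpers; none of these names are taken from PDF.js.

// The weak check described in the talk: it only asks "is this an array?".
function looksLikeArray(value: unknown): value is unknown[] {
  return Array.isArray(value);
}

// A stricter check: expected length, and every item is a finite number.
function isNumberArray(value: unknown, length: number): value is number[] {
  return (
    Array.isArray(value) &&
    value.length === length &&
    value.every((item) => typeof item === "number" && isFinite(item))
  );
}

// Assumed inputs: a well-formed domain, and one carrying a string item.
const goodDomain: unknown = [0, 1];
const badDomain: unknown = [0, "1); injected("];

const weakAcceptsBad = looksLikeArray(badDomain); // true: the string slips through
const strictAcceptsBad = isNumberArray(badDomain, 2); // false
const strictAcceptsGood = isNumberArray(goodDomain, 2); // true
```

The point of the stricter guard is that an "array of numbers" is only trustworthy once each element has been checked, not just the container.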

We can also see that the JavaScript injection, so the cross-site scripting in PDF.js, is not related to the embedded JavaScript feature that a PDF file can have. And this is interesting because JavaScript in PDF is often described as a high-risk feature, and for a reason. But for a JavaScript-based PDF viewer every input can be a risk, so you shouldn't focus only on the embedded JavaScript. So let's now see another vulnerability, the CVE from 2024. This one is more popular; it is well known for two reasons. The first one is that it is

more recent. The second reason is that it is in a popular package, still PDF.js, it is in the Security tab of the project on GitHub, and the blog post written by the author who found the vulnerability is a great post, so I really recommend it. Let's start with a quote from the blog post by Thomas Rinsma, the security engineer who found this cross-site scripting in PDF.js, 7 years after the first and only other cross-site scripting vulnerability in the library: you may be surprised to hear that this bug is not related to the PDF format's JavaScript scripting functionality. Because, I mean,

in our field, in security, in the previous readers, the desktop-oriented clients, a lot of vulnerabilities were based on JavaScript abuse, or on logical issues, because they implemented methods that were unsafe by design, or on memory-corruption vulnerabilities, because the languages used to write Foxit and Adobe were memory unsafe. So this is the second CVE for PDF.js, seven years later, and it is still a cross-site scripting. Of course, if you have a client-side library, cross-site scripting is one of the most critical vulnerabilities you can have. It is technically remote code execution in the context of the library, in a very popular package, because in

2025 this package has 3.5 million downloads per week on npm. So it is used in open source projects, it is used in production, and it is used by Firefox to render PDFs. So the attack surface is quite big. The issue: the FontMatrix array was not properly sanitized, allowing JavaScript injection through compileGlyph. What is interesting in this description is that the problem is still an array, a user input not properly sanitized, an array that is considered trusted input. I mean, it was an error, a bug by the maintainers of PDF.js, but

we can see that this CVE is quite similar to the previous one. Let's check the payload used. We still have an object. This time we have a font: when it's not DNS's fault, it's a font's fault. And we have the FontMatrix. It is another array of numbers, and again the researcher tried to inject code inside an item of the array. We can also see that they break out of the JavaScript code to inject arbitrary JavaScript code. Let's compare the two vulnerabilities. After six or seven years they are pretty much the same, and they abuse the same

code. Yes, maybe it has changed a little over the years, but it is the same function. So the author, I mean the finder of the 2024 vulnerability, understood that if they can inject arbitrary content here in the command, they can break the JavaScript code in the best scenario, and inject and run arbitrary JavaScript in the worst scenario. We can see that the font matrix doesn't have any check: it is just assumed to be an array, there is the .slice() call to take the items, and then it is passed to the command, injected directly as JavaScript code. This is why here we have the

parenthesis, and the comma helps to break the JavaScript code and inject our line. It is interesting to see that in 2018 the previous researcher, the person who reported the first vulnerability, wrote that it would send a malicious glyph to the viewer, a glyph that is then used to produce JavaScript code without further validation. And this is exactly what happened seven years later. So what this researcher probably did was study the only previous vulnerability in PDF.js, understand how PDF.js handles the transformation into JavaScript with a list of safe commands, and try to inject an arbitrary command, exploiting exactly

the same vulnerability. I also recommend the report written in Bugzilla by the first researcher (the link is on the previous slide), because it helps you to really understand that this vulnerability had technically already been found seven years ago, but was probably not fully understood by the maintainers, because it is not easy to understand the entire context of such a complex library. So what we have seen until now is PDF.js, which is definitely the most popular open-source library to render PDFs in the client, so in the website front end. This library has only two CVEs, they are both cross-site scripting, and the code was injected using a similar methodology.
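The FontMatrix injection discussed above comes down to building JavaScript source text from array items that were never checked. A minimal TypeScript sketch of that pattern follows. It is not the PDF.js code: the function names, the `c.transform(...)` command string, the payload, and the default matrix are assumptions made for illustration.

```typescript
// Hypothetical sketch; names and the generated command are illustrative only.

// Unsafe: items are spliced into generated source text without a type check.
function buildTransformUnsafe(fontMatrix: unknown[]): string {
  return "c.transform(" + fontMatrix.slice(0, 6).join(",") + ");";
}

// Assumed fallback matrix used when the input is not six finite numbers.
const DEFAULT_FONT_MATRIX: number[] = [0.001, 0, 0, 0.001, 0, 0];

// Safer: validate every item before it can reach the generated source.
function buildTransformSafe(fontMatrix: unknown): string {
  const valid =
    Array.isArray(fontMatrix) &&
    fontMatrix.length === 6 &&
    fontMatrix.every((n) => typeof n === "number" && isFinite(n));
  const matrix: number[] = valid ? (fontMatrix as number[]) : DEFAULT_FONT_MATRIX;
  return "c.transform(" + matrix.join(",") + ");";
}

// Illustrative payload: the last item closes the call and appends a statement.
const payload: unknown[] = [1, 2, 3, 4, 5, "0); alert('foobar'"];

const unsafeSource = buildTransformUnsafe(payload);
// -> "c.transform(1,2,3,4,5,0); alert('foobar');"
const safeSource = buildTransformSafe(payload);
// -> "c.transform(0.001,0,0,0.001,0,0);"
```

In the unsafe variant the closing parenthesis inside the string item ends the call early, which is the role the talk attributes to the parenthesis and the comma in the payload.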

And it is not based on the JavaScript supported by the PDF format. In addition, we need to know that PDF.js implemented JavaScript just four years ago, in 2021 if I remember correctly, and it is a very, very sandboxed JavaScript API; a lot of the functions defined in the specification are not implemented because the Mozilla security team decided they were not safe enough. What I want to present now is another vulnerability, this time in the Apryse editor. Apryse is a company that provides proprietary technology and wants to be a competitor of Adobe. The difference between PDF.js, for example, or PDFium in Chrome,

and the Apryse product is that PDF.js wants to offer the user just a solution to view and render PDFs in the browser. You don't want to download Acrobat Reader; you want to view the PDF attached to an email without leaving the browser, and Mozilla offers this solution. It is a similar situation for Chrome: you have a PDF reader inside the browser, so you don't need to download or install a third-party integration, because years ago, with the first versions of Chrome, you needed to install a plug-in by Adobe. Apryse, instead, wants to offer a client solution for developers. So the target is different: it is not the final user, it is

the developer who wants to build a client-oriented solution to view, render, fill out, whatever, PDFs, and their goal is to be aligned with the Adobe standard, the Adobe specification. Why? Because enterprises, and especially public administrations, have a lot of PDFs. It is a mess. They can have a form, they can have a function, you can calculate a value in the form. And so they started to create this feature parity with the Adobe editor in JavaScript. It has been adopted by a lot of enterprises. It is used in Dropbox, it is used in Smallpdf, iLovePDF, Lumin PDF. So every

cloud-based PDF company uses Apryse, because if you can operate on the PDF in the client, you can save costs on your AWS or other cloud infrastructure. You delegate to the client operations that until 15 years ago you needed to do in the back end. It is popular enough to be a paid package on npm. And this is a CVE, a cross-site scripting, based on the JavaScript feature in PDF. So Apryse WebViewer supports some of the JavaScript functions defined in the PDF specification, but to do this you need to find a solution, or try, to mitigate the risk of arbitrary injection. So they

have a sandbox to restrict arbitrary code, but it is a sandbox in a JavaScript codebase, written in JavaScript. So let's see, this is the payload. What do we have here? We have an action, and the action is JavaScript code. When you find an object in a PDF file that has /JavaScript and /JS, it means that there is an action in the PDF file that runs JavaScript code. And we can see console.println, with a command inside each console.println, until the last line, where we have window.confirm(document.cookie). Okay, console.println is not a JavaScript web

API. So if you try to run it in your browser's web development console, you get an error. If you want to print something to the console, you need to use console.log. But console.println is the console.log of the Acrobat JavaScript API. So how does this payload work? Well, let's step back. This payload works because we know that the sandbox written by Apryse is written in JavaScript. So our goal is to bypass JavaScript code that tries to sandbox our own JavaScript code. There is an old Medium blog post by a security researcher who wrote this sentence. This is probably

actually the title of the Medium blog post: building a sandbox in pure JavaScript is a fool's errand. And it is technically true. The Google team tried for years to write a sandbox in AngularJS, and they failed over and over, because every time there was a security researcher who was able to bypass the JavaScript sandbox written by the Google developers. In addition, to write a sandbox in JavaScript you will probably use an unsafe function to run the code that you want to run. So you will use eval: you have parsed the code to remove unsafe lines, but you still have eval, which means that if the attacker is able to send the

right string, one that you were not able to parse, the code can run. So how does the Apryse code work? There is a parser, of course, because to render a PDF every library, every editor, every client parses the PDF in order to render it correctly. This is also why the same PDF can appear differently in two different viewers: even if PDF has a specification, the codebase that displays the PDF can implement it in different ways for various reasons, and this means that you can get a different rendering of the same file. It is very common with printers, for example. So what we have here, we have a

JavaScript code that parses the objects in the PDF. It also parses the embedded JavaScript code, because Apryse supports it. So it finds the object with the JavaScript code and then starts to map every line, every Acrobat JavaScript API, to a custom function or to a function that is supported by the JavaScript web API. This means that console.clear is still console.clear, and println, even if the code is obfuscated (it is a very big obfuscated JavaScript file), is simply mapped to console.log, the classic console.log. And this happens for all the Acrobat JavaScript APIs that you can have, or at

least the supported ones. Then, after they have converted the Acrobat JavaScript object to JavaScript web code, this output is passed to a JavaScript sandbox where they literally add a list, a long list. I truncated it here, but 80% of the sandbox is a list of variables that are defined as undefined, undefined, undefined. Why do they do this? Because they want to be sure that even if you try to inject arbitrary code, you cannot run it, because they have previously undefined it. But there is a problem in this sandbox. After they have created this safe output, they pass it to the

eval command to run the JavaScript code. But it is hard; honestly, I don't know if it is even possible. If you have any good sample of a JavaScript sandbox written in JavaScript that has no cross-site scripting or injection vulnerability, please send it to me, because I'm very curious. But it's very, very hard to write a JavaScript sandbox that cannot be bypassed. And now you can understand the payload better. console.println is the Acrobat JavaScript function, but delete, as in delete window, is something that they didn't remove or block in their sandbox. This allows me to delete the

undefined values defined for window, for confirm, and for document. Once they are deleted, the three names, which are JavaScript web API functions and objects (window, confirm, document), come back to their original purpose, so I can use them again. And then the sandbox runs eval and converts println into console.log. So if you open your web console while this malicious PDF is in the Apryse WebViewer, you can see true, true, true in the console log, which means that we were able to run the delete correctly, and then there is the popup. So we said that in PDF.js the previous two vulnerabilities were not

based on the Acrobat JavaScript functions. But PDF.js and also Chrome support a very restricted set of the Acrobat JavaScript API. So why are they safe? Or rather, are they safe, or safe enough? This is the question. Well, yes, they are safe enough. We cannot say that they are 100% safe, but they are safe enough because they adopted quite a different solution to mitigate the risk. The first point is that in PDF.js and PDFium the set of supported Acrobat JavaScript functions, or APIs, is very, very small. The second is that in 2021 the Mozilla security team wrote a blog post where

they explain how they implemented a sandbox to support a restricted set of APIs. Of course they evaluated JavaScript eval as a solution, but it was quite simple to see why they decided not to proceed in this way: it is the best way to let untrusted code run, so eval is unsafe by design. So how is PDF.js able to run Acrobat JavaScript without using a sandbox written in JavaScript? Well, they adopted two solutions. If you run PDF.js in Firefox, they use the same sandbox system that they have for extensions: an extension in the browser should not be able to run arbitrary JavaScript code without the user's consent.
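To make concrete why a "shadow everything with undefined, then eval" sandbox is fragile, here is a minimal TypeScript sketch of the bypass idea described earlier in the talk. It is not the Apryse code: `runInNaiveSandbox`, the `hostObjects` object, and the `cookie` name are hypothetical, and the sketch assumes the shadowed names are deletable properties that sit in front of the real ones.

```typescript
// Hypothetical sketch; this is not how any real product is implemented.
type Scope = Record<string, unknown>;

function runInNaiveSandbox(code: string, hostObjects: Scope): unknown {
  // Shadow layer: each sensitive name is redefined as undefined,
  // in front of the real value that still lives on the prototype.
  const shadow: Scope = Object.create(hostObjects);
  for (const name of Object.keys(hostObjects)) {
    shadow[name] = undefined;
  }
  // The Function body is plain (non-strict) JavaScript text, so `with`
  // and a direct eval are allowed inside it.
  const runner = new Function("scope", "code", "with (scope) { return eval(code); }");
  return runner(shadow, code);
}

// Assumed sensitive value standing in for document.cookie.
const hostObjects: Scope = { cookie: "session=secret" };

// Normal use: the name is shadowed, so the script sees undefined.
const blocked = runInNaiveSandbox("cookie", hostObjects); // undefined

// Bypass: deleting the shadow exposes the original value again.
const bypassed = runInNaiveSandbox("delete cookie; cookie", hostObjects); // "session=secret"
```

Because the untrusted code and the sandbox share one JavaScript realm, a single operator the sandbox author did not anticipate is enough to undo the shadowing, which is the argument for moving the interpreter out of that realm altogether.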

And the PDF.js natively supported by Mozilla uses the same extension sandbox, so it cannot access the DOM. If you try to print the origin using console.println, you probably cannot, but if you can, maybe by editing the code and rebuilding Firefox, you can see that the origin of the page is not the URL but the extension that runs the sandbox. And for the PDF.js that everyone uses as an npm package, for example, so as a client solution? Well, they created a sandbox in WASM using QuickJS. So all the JavaScript code that you can run in PDF.js runs in an isolated WASM sandbox. And yes, you can try to hack

QuickJS, for example, but it is interesting because QuickJS is not written in JavaScript; it's written in C or C++, sorry, I don't remember now. But this means that to find a vulnerability in the sandbox used by PDF.js to run arbitrary JavaScript code, you probably need to start studying C and C++ and find issues in memory-unsafe languages. So we have increased the difficulty, of course: moving from a JavaScript codebase to a C-family codebase definitely increases the complexity of finding just an XSS. There is another library, a proprietary library, Nutrient, if I remember correctly the

name, that uses the WASM solution to isolate JavaScript, but instead of using QuickJS as the sandbox they use a different one, Duktape. And it is interesting because you can find some open or old issues in Duktape where you were able to generate an error in the sandbox, so in the Duktape code, not in the JavaScript code, that can crash Nutrient, but with no injection. The last CVE that I want to present is another cross-site scripting vulnerability. This time it is an unsanitized input injected directly into the DOM. It's not based on a JavaScript object. It is based

on an annotation object. It is more similar to the PDF.js vulnerabilities. But it is interesting for two reasons. The first one is that I was looking for an alternative way to inject JavaScript code in order to bypass the content security policy in Dropbox. I was able to find a way to bypass the content security policy, but it was not enough, because Dropbox runs the Apryse WebViewer in a sandbox domain. Good choice, Dropbox. Let's see the payload used. It's quite basic. We have an annotation, and this is the name of the file, this is the text in the form, and the /Rect defines the dimensions

of the form. So if you open the PDF, you just find a form that is prefilled with this text. The original policy by Dropbox was this one: a content security policy in a meta tag, and in script-src you can see 'self' and 'unsafe-inline'. Why 'unsafe-inline'? Because to run that specific version of Apryse you need to have 'unsafe-inline'. Fun fact: if you open the Apryse documentation, they recommend that you implement a content security policy. For the latest version you don't need 'unsafe-inline', but if you want to support the Acrobat JavaScript API, you need to add to your content

security policy, and it is written in their documentation, 'unsafe-inline' and 'unsafe-eval', which shows you, just by reading the content security policy, that Apryse uses an eval function to run the code in the JavaScript object. So a meta CSP, or at least this one, has two benefits for an attacker. The first one is that there is 'unsafe-inline', which means that you can inject code if you find a vulnerable input, and this was how it worked. The second one is that a CSP in a meta tag doesn't support reporting. This means that the Dropbox security team didn't have visibility on my attempts.
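The reporting gap mentioned above can be illustrated with a small TypeScript helper that lists which directives of a policy would be dropped when the policy is delivered through a meta tag. This is a sketch under stated assumptions: the helper names and the sample policy string are hypothetical (the actual Dropbox policy is not reproduced here), and the ignored list reflects the Content Security Policy specification's rule that report-uri, frame-ancestors, and sandbox are not honored in a meta element.

```typescript
// Directives that are ignored when a policy arrives via <meta http-equiv>.
const IGNORED_IN_META: string[] = ["report-uri", "frame-ancestors", "sandbox"];

// Split a policy string into its lower-cased directive names.
function directiveNames(policy: string): string[] {
  return policy
    .split(";")
    .map((part) => (part.trim().split(/\s+/)[0] ?? "").toLowerCase())
    .filter((name) => name.length > 0);
}

// Which directives of this policy would silently have no effect in a meta tag?
function ignoredInMeta(policy: string): string[] {
  return directiveNames(policy).filter((name) => IGNORED_IN_META.indexOf(name) !== -1);
}

// Hypothetical policy, similar in shape to the one described in the talk.
const samplePolicy = "script-src 'self' 'unsafe-inline'; report-uri /csp-report";

const lostDirectives = ignoredInMeta(samplePolicy); // ["report-uri"]
```

Sending the same policy as a response header keeps the reporting directive active, so blocked attempts become visible to the team that owns the policy.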

They could not see that I was trying to bypass their security policy. And this is the main concern when you implement a content security policy in a meta tag. It's better to have it in the header for this reason, because you can track the errors. Now that we have seen four different CVEs in two different packages, and we have seen that three out of four were not based on the JavaScript API, not based on app.alert, we can see that PDFs are still complex for developers but also for security researchers, and we have a lot of vulnerabilities in the MITRE database that are not real vulnerabilities. When you open a PDF file and you see an

alert popup where there is written 1 or XSS, it is a legitimate feature, because PDF.js and PDFium, but also Apryse, support the JavaScript alert feature, which is technically safe because it should not have access to the DOM and cannot access cookies. But it is so common to write reports, for example for cross-site scripting vulnerabilities, where you just show a popup, that a lot of researchers and a lot of developers who don't know how PDF works don't see a clear reason for that alert to exist, and so they consider it a vulnerability. But it's not.

This list is a list of CVEs that I stopped at 2020, because otherwise it was too long, and honestly I skipped a lot of them because there were too many. But this is a list of 10 or 11 CVEs like this, reported by independent researchers in open source projects. If you open the links you can see that there was a discussion afterwards, but the CVE still exists; it has not been removed. You can also have big companies that report vulnerabilities that are not vulnerabilities. This is just because JavaScript in PDF is supported but sandboxed, so it's not a vulnerability. So, some recommendations if you are a bug bounty hunter, to avoid submitting this kind of report

where technically you haven't found a vulnerability. The first one is: understand the context. Where is the PDF rendered, in the website, so by the client-side code, or by your browser? Because, for example, if you use an object tag, the PDF is not rendered by any library. It's rendered directly by the browser, because Chrome and Firefox, but also Safari, support PDF viewing. This means that if you use two different browsers, for example Chrome and Firefox, and the PDF viewer layout is different, you are probably not opening that PDF in the web application; it's your browser that is opening

that PDF, so the issue cannot be in the website. The second is the payload that you use: app.alert(1) or app.alert('XSS') is not enough. Yes, you can run an alert, a popup in your browser, but do you have access to the DOM? Can you print a cookie, for example? If you cannot print a cookie, you probably don't have access to local storage or to cookies, and you cannot run any privileged action. So how can it be a vulnerability if you can just print a 1? But there are also some recommendations for developers and maintainers, open source maintainers for example, or bug bounty managers. So if you receive this kind

of report, maybe it's better to ask for more details. For example, how can you have received a report of a cross-site scripting vulnerability in PDF viewing if your codebase has no library to render PDFs? Where is the vulnerability in your code? You are probably not vulnerable. And the second one is about what you ask of the researcher. I mean, if you read a bug bounty program description, they often ask for SQL injection, remote code execution, cross-site scripting. Yes, it's true, you want to describe which kinds of vulnerabilities are important for you, and you want to describe the scope and the purpose. But maybe it's better to ask for a flag. Don't ask for an SQL

injection. Ask them to print the first line of the database. If they can print the first line of the database, the first row, they have probably found an SQL injection. Ask them to print a cookie. If they can print a cookie, they have a cross-site scripting vulnerability. In this way you can avoid a lot of these reports, which are just garbage in a database that we cannot trust right now, because a lot of people study previous vulnerabilities to understand what a vulnerability is, and they now think that a popup in a PDF is a vulnerability, but usually it's not. Mitigation: if you really do use a PDF library to render your

PDFs in the web application that you wrote, how can you mitigate the risk? Well, you have some options, of course. The first one is a good content security policy. A modern content security policy based on 'strict-dynamic' plus hashes can probably mitigate almost all the attempts; for sure it could mitigate the two CVEs in PDF.js and also one in Apryse. Then an isolated domain: if you work in a company that offers a service, you have to isolate these packages. The PDF renderer can be in a subdomain, in a frame, I mean, something that helps you to isolate it. Oh, it's over. And disable the supported JavaScript code. Thanks, and if you have any questions, I'm sorry for the

Mhm.
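As a rough illustration of the first mitigation mentioned at the end of the talk, the following TypeScript sketch assembles a hash-based policy with 'strict-dynamic', intended to be sent as a response header. The function name, the placeholder hash, and the report endpoint are assumptions for the example; a real deployment would compute the SHA-256 hash of each trusted inline script and tune the directives to the application.

```typescript
// Hypothetical helper: builds a strict, hash-based CSP header value.
function buildStrictCsp(scriptHashes: string[], reportUri: string): string {
  // Each trusted inline script is allowed by the base64 SHA-256 of its content.
  const sources = scriptHashes.map((hash) => "'sha256-" + hash + "'");
  return [
    // No 'unsafe-inline' and no 'unsafe-eval': injected inline code is refused.
    "script-src " + sources.concat(["'strict-dynamic'"]).join(" "),
    "object-src 'none'",
    "base-uri 'none'",
    // Reporting only works when the policy is delivered as a header.
    "report-uri " + reportUri,
  ].join("; ");
}

// Placeholder hash value and endpoint, not real data.
const cspHeaderValue = buildStrictCsp(["EXAMPLEHASH="], "/csp-report");
```

A policy of this shape would not fit a viewer configuration that requires 'unsafe-inline' or 'unsafe-eval', which is one more reason to serve such a viewer from an isolated domain or frame.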
