Skip to content

Conversation

@ematipico
Copy link
Member

@ematipico ematipico commented Apr 14, 2023

This RFC introduce a new middleware API in Astro

Links

@ematipico ematipico force-pushed the rfc/stage-3-middleware branch from 85a3b3d to d13e252 Compare April 14, 2023 14:23
@ematipico ematipico force-pushed the rfc/stage-3-middleware branch from d13e252 to 9395cb9 Compare April 14, 2023 14:27
@ematipico ematipico marked this pull request as ready for review April 14, 2023 14:31
@matthewp
Copy link
Contributor

One other comment, if you call next() but don't care about the result (because you don't need to do anything to the response), are you expected to return the return value? Ie do you have to do return next(); or is just next(); ok? I don't have an opinion on this and am kind of assuming that the first way is how it's currently implemented. If so I could see this as being a good thing, that you must return some response.

@ematipico ematipico mentioned this pull request Apr 17, 2023
@ematipico
Copy link
Member Author

One other comment, if you call next() but don't care about the result (because you don't need to do anything to the response), are you expected to return the return value? Ie do you have to do return next(); or is just next(); ok? I don't have an opinion on this and am kind of assuming that the first way is how it's currently implemented. If so I could see this as being a good thing, that you must return some response.

I suppose it really depends on how the user will it. next(); and return next() will yield two different results. Without the return, the next middleware won't access to the Response object. If the user knows that that middleware won't need it, I suppose next(); is a correct pattern.

@lilnasy
Copy link
Contributor

lilnasy commented Apr 29, 2023

Could a middleware tell if it's running in dev mode?

The use case I'm thinking of is where a middleare caches responses in memory, similar how to how Next's ISR works. The middleware should not act during dev.

cc: #228

@lilnasy
Copy link
Contributor

lilnasy commented Apr 29, 2023

Could a middleware communicate with an integration?

The use case I'm thinking of where an integration hashes inline resources, and a middleare uses the hashes to create a strict Content-Security-Policy header.

I understand letting middleware be added by integrations was made out of scope for now, but receiving data from an integration alone would be valuable.

cc: #377

@ematipico
Copy link
Member Author

Could a middleware tell if it's running in dev mode?

The use case I'm thinking of is where a middleare caches responses in memory, similar how to how Next's ISR works. The middleware should not act during dev.

cc: #228

Not at this stage. It wasn't part of the original RFC. Currently, the middleware is code that goes in production, and the mode is a piece of information that is lost after the build.

@ematipico
Copy link
Member Author

Could a middleware communicate with an integration?

The use case I'm thinking of where an integration hashes inline resources, and a middleare uses the hashes to create a strict Content-Security-Policy header.

I understand letting middleware be added by integrations was made out of scope for now, but receiving data from an integration alone would be valuable.

cc: #377

We can explore this valid use case after the middleware is released under an experimental flag. During the exploration phase, I evaluated creating a middleware as an integration, but that was too far from the initial RFC.

@lilnasy
Copy link
Contributor

lilnasy commented May 1, 2023

Could a middleware tell if it's running in dev mode?
The use case I'm thinking of is where a middleare caches responses in memory, similar how to how Next's ISR works. The middleware should not act during dev.
cc: #228

Not at this stage. It wasn't part of the original RFC. Currently, the middleware is code that goes in production, and the mode is a piece of information that is lost after the build.

@ematipico I looked around a little more, and matthewp gave an example for a dev-only route in the issue. If that example works, prod-only ISR could too.

@matthewp
Copy link
Contributor

matthewp commented May 1, 2023

Yeah, I would expect import.meta.env.MODE to work just like it does in any other file in the project.

@ematipico
Copy link
Member Author

I wasn't aware of that metadata, and I personally haven't tried. But if that metadata is available in endpoints, it will definitely be available in middleware too!

@snimavat
Copy link

snimavat commented May 5, 2023

I come from Java background, and middleware word has always been confusing to me.
If it does the job of intercepting requests/response - "Interceptors" seems more meaningful

@ematipico
Copy link
Member Author

I come from Java background, and middleware word has always been confusing to me. If it does the job of intercepting requests/response - "Interceptors" seems more meaningful

Yeah, there are different names to address the same pattern. SvelteKit calls it "hooks", like fastify.

@raulfdm
Copy link

raulfdm commented May 5, 2023

I come from Java background, and middleware word has always been confusing to me. If it does the job of intercepting requests/response - "Interceptors" seems more meaningful

Yeah, there are different names to address the same pattern. SvelteKit calls it "hooks", like fastify.

In Next.js world they also call middleware, but their middleware is less capable than the pattern we're seeing here.

https://nextjs.org/docs/pages/building-your-application/routing/middleware

@gkatsanos
Copy link

how safe are we to use a middleware in production that intercepts a request and sets a global variable or a value in a nano store based on the request domain name?

@ematipico
Copy link
Member Author

how safe are we to use a middleware in production that intercepts a request and sets a global variable or a value in a nano store based on the request domain name?

The feature is under an experimental flag for different reasons:

  • it might contain bugs
  • the APIs might still change, which are not semver complaint

Although, the code written inside the middleware is stored in the backend (like the rest of the Astro code). It's advised to ensure you don't write any sensible data inside Astro.locals

@pilcrowonpaper
Copy link

Quick question, why should locals be serializable?

The information must be serializable because storing information that evaluates at runtime is unsafe. If, for example, we were able to store a JavaScript function, an attacker could exploit the victim’s website and execute some unsafe code

How would an attacker inject or execute unsafe code?

@pilcrowonpaper
Copy link

It also looks like API routes run before the middleware?

@ematipico
Copy link
Member Author

ematipico commented May 17, 2023

Quick question, why should locals be serializable?

The information must be serializable because storing information that evaluates at runtime is unsafe. If, for example, we were able to store a JavaScript function, an attacker could exploit the victim’s website and execute some unsafe code

How would an attacker inject or execute unsafe code?

It's mostly about third-party libraries. Let's suppose that there's an astro-magic-auth library that does something with locals; then, this library gets hacked and decides to push some sketchy code in thelocals:

locals.email = "<script>eval(alert('hacked'))</script>"

Nobody prevents a third-party library from doing so.

@ematipico
Copy link
Member Author

It also looks like API routes run before the middleware?

That was a bug :) I believe it's been fixed in withastro/astro#7106

@pilcrowonpaper
Copy link

@ematipico

I'm not sure serializing data prevents this?

locals.email = returnsUnsafeScript(); // <script>eval(alert('hacked'))</script>

if we were able to store a JavaScript function, an attacker could exploit the victim’s website and execute some unsafe code

But if you're using third party dependencies, they can already run unsafe code:

// node_modules/some-library/index.js

export someFunction = () => {};

console.log("unsafe code");

Not having the ability to store any data type seems to be a massive limitation IMO.

@ematipico
Copy link
Member Author

I'm not sure serializing data prevents this?

It indeed doesn't; probably, we should also escape the strings, just to be sure.

But if you're using third-party dependencies, they can already run unsafe code:

It's more about data that gets rendered on the page, not about the code per se.

Not having the ability to store any data type seems to be a massive limitation IMO.

It is, I know! What would you suggest instead? How can we support more data types without overlooking the security?

@tropperstyle
Copy link

I have a request to expose the exports of the matched route module to the middleware.

Since auth guards are a primary example for this proposal, this could help avoid path logic growing in the middleware and allow explicitly flagging pages as needed:

---
export const requireAdmin = true;
---
// Make all non-Astro defined exports available to context
export const onRequest = ({ exports, request, redirect, locals }, next) => {
  if (exports.requireAdmin) {
    locals.admin = await fetchAdmin(request);
    if (!locals.admin) return redirect('/login');
  }
  return next();
}

@pilcrowonpaper
Copy link

@ematipico

Doesn't Astro escape strings inside html templates anyway? If you’re using set:html, it should be your responsibility to make sure your dependencies are safe. I really think this is a non issue that doesn’t need to be addressed. SvelteKit has locals for like 2 years and I’m not aware of any html injection, and Express has had it for even longer.

@matthewp
Copy link
Contributor

I think the primary driver for the serializable restriction is that we plan to make it possible for adapters (like Vercel) to split the middleware into a separate bundle to deploy to an Edge. This way you can have things like auth-checks happen at edge locations and then your app code in serverless.

For this to work we need to be able to serialize locals to send from the edge worker to the origin. We weren't sure that people would even notice this!

But it sounds like a lot of people are wanting to put complex objects onto locals. This would just mean you couldn't use the edge feature. But for a lot of people that's a fine tradeoff. So I think this user feedback is a good reason to lift the restriction.

@matthewp
Copy link
Contributor

cc @ematipico Curious what you think about the above 👆

@gkatsanos
Copy link

Sorry to jump into the RFC with a slightly different topic. We have a use-case that combines nanostores with the middleware and we wanted to share it with you to get some feedback ("will it work"). :

import { setBrand } from "@store/brand";

export function onRequest(context, next) {
  if (context.request.includes === "europages") {
    setBrand("ep");
  } else {
    setBrand("wlw");
  }

  return next();
}

store:

// store/brand.js
import { atom, action } from "nanostores";

export const brand = atom("wlw");

export const setBrand = action(brand, "setBrand", (store, brand) => {
  store.set(brand);
});

then use the store value in a root Layout to provide a 'theme' based on the route base url:

<html lang="en" data-brand={brand.value}>

That's in a node/standalone SSR mode. I was told it's not a reliable solution over Discord.

@ematipico
Copy link
Member Author

cc @ematipico Curious what you think about the above 👆

We could lift off the check, I don't mind at all, as long as it's always secure to store "complex" types.

Regarding the serialisation and the edge worker, maybe we should move the check inside the integration itself.

@matthewp
Copy link
Contributor

@ematipico Ah yeah, if that's a restriction of that use-case it does make senes to move the check there. I'm not sure exactly how that will be implemented, but I imagine that the adapter will add it's own middleware at the end of the chain. It can do the check there.

@matthewp
Copy link
Contributor

@gkatsanos I'm not very familiar with nanostores, but looking at the code it appears that brand is a global which doesn't work well in a server environment, you have multiple requests all being processed at the same time.

I would recommend putting branch onto the locals object.

@pilcrowonpaper
Copy link

For the Vercel adapter, if we're going to add it, please make deploying to Vercel's Edge middleware optional. The biggest limitation with Next.js' Middleware is that you can't deploy it to just a regular serverless server(?).

@wassfila
Copy link

@gkatsanos I'm not very familiar with nanostores, but looking at the code it appears that brand is a global which doesn't work well in a server environment, you have multiple requests all being processed at the same time.

I would recommend putting branch onto the locals object.

right, that's where the discord support post is going to, thanks for confirmation https://discord.com/channels/830184174198718474/1109101214802653224

@matthewp
Copy link
Contributor

@pilcrowonpaper Yeah, that will definitely be an opt-in feature.

@matthewp
Copy link
Contributor

The serialization restriction has been lifted from the RFC.

With that being the case, I'd like to do a call for consensus on this RFC. This will be the final comment period, 3 days, if there are still issues that need to be addressed please bring those forward now.

@matthewp matthewp merged commit c0f5c7f into main May 30, 2023
@matthewp matthewp deleted the rfc/stage-3-middleware branch May 30, 2023 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants