Handling File Uploads in GraphQL
GraphQL was not designed with file uploads in mind. While it is technically possible to implement them, doing so requires extending the transport layer and introduces several security and reliability risks.
This guide explains why file uploads via GraphQL are problematic and presents safer alternatives.
Why uploads are challenging
The GraphQL specification is transport-agnostic and serialization-agnostic (though HTTP and JSON are the most prevalent combination seen in the community). GraphQL was designed to work with relatively small requests from clients, and was not designed with binary data in mind.
File uploads, by contrast, typically carry binary data such as images and PDFs, which many encodings, including JSON, cannot represent directly. One option is to encode the binary data within the existing encoding (e.g. as a base64-encoded string inside the JSON), but this is inefficient and unsuitable for larger files because it does not easily support streamed processing. Instead, multipart/form-data is a common choice for transferring binary data, but it brings its own set of complexities.
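To put a rough number on the base64 overhead: every 3 bytes of input become 4 output characters, so the encoded string is about a third larger than the file itself, before counting the memory needed to hold the whole JSON body at once. A minimal sketch of that arithmetic (the function name and the 10 MiB file size are illustrative only):

```typescript
// Length of the padded base64 encoding of `byteLength` bytes:
// each group of up to 3 input bytes becomes 4 output characters.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const fileSize = 10 * 1024 * 1024; // a 10 MiB image
const encodedSize = base64Length(fileSize);
const overheadPercent = ((encodedSize - fileSize) / fileSize) * 100;

console.log(encodedSize);                // 13981016
console.log(overheadPercent.toFixed(1)); // "33.3"
```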
Supporting uploads over GraphQL usually involves adopting community conventions, the most prevalent of which is the GraphQL multipart request specification. This specification has been successfully implemented in many languages and frameworks, but anyone implementing it must take great care not to introduce security or reliability problems.
Risks to be aware of
Memory exhaustion from repeated variables
GraphQL operations allow the same variable to be referenced multiple times. If a file upload variable is reused, the underlying stream may be read multiple times or drained prematurely. This can result in incorrect behavior or memory exhaustion.
A safe practice is to use trusted documents or a validation rule to ensure each upload variable is referenced exactly once.
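As a rough illustration of the "exactly once" rule, the sketch below counts how often each upload variable is used in an operation. It makes several assumptions: the function names are hypothetical, the document contains a single operation with no fragments, and a plain text scan stands in for the AST walk that a real validation rule in your GraphQL library would perform.

```typescript
// Count how often each upload variable is *used* in an operation.
// Assumption: one operation, no fragments; a text scan replaces an AST walk.
function countUploadVariableUses(
  operation: string,
  uploadVariables: string[],
): Record<string, number> {
  // Drop block strings, string literals and comments so that a "$file"
  // appearing inside them is not counted.
  const stripped = operation
    .replace(/"""[\s\S]*?"""/g, "")
    .replace(/"(?:[^"\\\n]|\\.)*"/g, "")
    .replace(/#[^\n]*/g, "");
  const uses: Record<string, number> = {};
  for (const variable of uploadVariables) {
    const pattern = new RegExp("\\$" + variable + "(?![_0-9A-Za-z])", "g");
    const occurrences = (stripped.match(pattern) ?? []).length;
    // One occurrence is the declaration in the operation header.
    uses[variable] = Math.max(0, occurrences - 1);
  }
  return uses;
}

// Reject the operation unless every upload variable is used exactly once.
function assertUploadsUsedOnce(operation: string, uploadVariables: string[]): void {
  const uses = countUploadVariableUses(operation, uploadVariables);
  for (const variable of uploadVariables) {
    if (uses[variable] !== 1) {
      throw new Error(
        `Upload variable $${variable} must be used exactly once, found ${uses[variable]}`,
      );
    }
  }
}
```

With trusted documents, a check like this can run once at build time instead of on every request.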
Stream leaks on failed operations
GraphQL executes in phases: validation, then execution. If validation fails or an authorization check terminates execution early, uploaded file streams may never be consumed. If your server buffers or retains these streams, this can cause memory leaks.
To avoid this, ensure that all streams are terminated when the request finishes, whether or not they were consumed in resolvers. An alternative is to write incoming files to temporary storage immediately and pass references (such as filenames) into resolvers. Ensure this storage is cleaned up after the request completes, regardless of success or failure.
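One way to guarantee that streams are always terminated is to wrap request handling in a try/finally block. The sketch below is an illustration under assumptions: `UploadStream` and `withUploadCleanup` are hypothetical names, and the interface only requires a `destroy()` method, which Node.js readable streams provide.

```typescript
// Hypothetical minimal shape of an upload stream. Node.js Readable streams
// have a compatible destroy() method.
interface UploadStream {
  destroy(): void;
}

// Run the request, then destroy every upload stream, whether the request
// succeeded, failed validation, or threw part-way through execution.
async function withUploadCleanup<T>(
  uploads: UploadStream[],
  handleRequest: () => Promise<T>,
): Promise<T> {
  try {
    return await handleRequest();
  } finally {
    for (const upload of uploads) {
      try {
        upload.destroy();
      } catch (cleanupError) {
        // Keep going: one failing stream must not block cleanup of the rest.
      }
    }
  }
}
```

The same wrapper shape works for the temporary-storage variant: replace `destroy()` with a call that deletes the temporary file.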
Cross-Site Request Forgery (CSRF)
A POST request with a multipart/form-data body qualifies as a “simple” request under CORS and does not trigger a preflight check. Without explicit CSRF protection, your GraphQL server may unknowingly accept uploads from malicious origins.
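One common form of explicit protection is to require a custom request header on any request whose content type would otherwise skip the preflight, since a cross-origin page cannot attach such a header without the browser first asking the server for permission. The sketch below assumes lower-cased header keys with single string values, and the header name `x-graphql-preflight` is made up for illustration.

```typescript
// Content types a browser may send cross-origin without a CORS preflight.
const SIMPLE_CONTENT_TYPES = [
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
];

// Hypothetical header name; any non-safelisted header forces a preflight.
const PREFLIGHT_HEADER = "x-graphql-preflight";

// Assumption: `headers` has lower-cased keys and single string values.
function passesCsrfCheck(headers: Record<string, string | undefined>): boolean {
  const raw = headers["content-type"] ?? "";
  const semicolon = raw.indexOf(";");
  // Ignore parameters such as "; boundary=...".
  const contentType = (semicolon === -1 ? raw : raw.slice(0, semicolon))
    .trim()
    .toLowerCase();
  const isSimple =
    contentType === "" || SIMPLE_CONTENT_TYPES.indexOf(contentType) !== -1;
  // Other content types (e.g. application/json) already trigger a preflight.
  if (!isSimple) return true;
  // Simple requests must carry the custom header to be accepted.
  return headers[PREFLIGHT_HEADER] !== undefined;
}
```

Legitimate clients then need to send the header on every upload request, and the server's CORS configuration decides which origins are allowed to do so.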
Oversized or excess payloads
Attackers may submit very large uploads or include extraneous files under unused variable names. Servers that accept and buffer these can be overwhelmed.
Enforce request size caps and reject any files not explicitly referenced in the map field of the multipart payload.
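The checks above can be sketched as two small functions: one that parses the map field (a JSON object from file field names to the variable paths they fill, as in the GraphQL multipart request specification) and one that is called for each incoming file part. The function names and the limits are hypothetical and should be tuned for your application.

```typescript
// Hypothetical limits.
const MAX_FILES = 5;
const MAX_FILE_BYTES = 10 * 1024 * 1024;

// Parse the `map` field, e.g. {"0": ["variables.file"]}, and validate its shape.
function parseUploadMap(mapJson: string): Record<string, string[]> {
  const parsed: unknown = JSON.parse(mapJson);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("map must be a JSON object");
  }
  const entries = parsed as Record<string, unknown>;
  const result: Record<string, string[]> = {};
  for (const key of Object.keys(entries)) {
    const paths = entries[key];
    if (
      !Array.isArray(paths) ||
      paths.length === 0 ||
      !paths.every((p) => typeof p === "string")
    ) {
      throw new Error(`map entry "${key}" must be a non-empty array of paths`);
    }
    result[key] = paths as string[];
  }
  if (Object.keys(result).length > MAX_FILES) {
    throw new Error("too many files in map");
  }
  return result;
}

// Call for each file part as it streams in; `bytesSoFar` is the running size.
function checkFilePart(
  uploadMap: Record<string, string[]>,
  fieldName: string,
  bytesSoFar: number,
): void {
  if (!Object.prototype.hasOwnProperty.call(uploadMap, fieldName)) {
    throw new Error(`file field "${fieldName}" is not referenced in map`);
  }
  if (bytesSoFar > MAX_FILE_BYTES) {
    throw new Error(`file field "${fieldName}" exceeds the size limit`);
  }
}
```

Checking the running byte count while streaming, rather than after the upload finishes, lets the server abort an oversized file before it has been buffered or stored.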
Untrusted file metadata
Information such as file names, MIME types, and contents should never be trusted. To mitigate risk:
- Sanitize filenames to prevent path traversal or injection issues.
- Sniff file types independently of declared MIME types, and reject mismatches.
- Validate file contents. Be aware of format-specific exploits like zip bombs or maliciously crafted PDFs.
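The first two mitigations can be sketched as follows. This is a minimal illustration, not a complete defense: the function names are hypothetical, the signature table covers only three formats, and matching leading bytes says nothing about whether the rest of the file is well formed.

```typescript
// Keep only the final path segment and a conservative character set.
function sanitizeFilename(original: string): string {
  const base = original.split(/[\\/]/).pop() ?? "";
  const cleaned = base.replace(/[^A-Za-z0-9._-]/g, "_").replace(/^\.+/, "");
  return cleaned.slice(0, 100) || "upload";
}

// Leading "magic bytes" of a few common formats.
const SIGNATURES: { mime: string; bytes: number[] }[] = [
  { mime: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
  { mime: "application/pdf", bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
];

// Detect the type from the first bytes of the file, ignoring client claims.
function sniffMimeType(head: Uint8Array): string | null {
  for (const signature of SIGNATURES) {
    if (
      head.length >= signature.bytes.length &&
      signature.bytes.every((byte, index) => head[index] === byte)
    ) {
      return signature.mime;
    }
  }
  return null;
}

// Accept only when the sniffed type is known and equals the declared type.
function matchesDeclaredType(head: Uint8Array, declaredMime: string): boolean {
  const sniffed = sniffMimeType(head);
  return sniffed !== null && sniffed === declaredMime.toLowerCase();
}
```

For example, `sanitizeFilename("../../etc/passwd")` yields `passwd`, and a file declared as `image/png` whose first bytes are `%PDF-` is rejected.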
Recommendation: Use signed URLs
The most secure and scalable approach is to avoid uploading files through GraphQL entirely. Instead:
1. Use a GraphQL mutation to request a signed upload URL from your storage provider (e.g., Amazon S3).
2. Upload the file directly from the client using that URL.
3. Submit a second mutation to associate the uploaded file with your application’s data (or use an automatically triggered process, such as AWS Lambda, to do the same).
You should ensure that uploaded files are only retained for a short period until they are processed, so that an attacker who completes only steps 1 and 2 cannot exhaust your storage. When processing the file upload (step 3), move the file to more permanent storage as appropriate.
This separates responsibilities cleanly, protects your server from binary data handling, and aligns with best practices for modern web architecture.
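The server side of the signed URL flow might look roughly like the sketch below. Everything named here is an assumption made for illustration: the schema, the mutation names, the `tmp/` and `avatars/` key prefixes, the 300-second expiry, and the `FileStorage` interface, which stands in for whatever SDK your storage provider offers.

```typescript
// Hypothetical schema for the two mutations (steps 1 and 3).
const typeDefs = /* GraphQL */ `
  type UploadTicket {
    uploadUrl: String!
    fileKey: String!
  }
  type Mutation {
    createUploadUrl(contentType: String!): UploadTicket!
    attachAvatar(fileKey: String!): Boolean!
  }
`;

// Hypothetical storage abstraction; a real one would wrap the provider's SDK.
interface FileStorage {
  signUploadUrl(key: string, contentType: string, expiresInSeconds: number): Promise<string>;
  moveToPermanent(temporaryKey: string, permanentKey: string): Promise<void>;
}

function createResolvers(storage: FileStorage, newKey: () => string) {
  return {
    Mutation: {
      // Step 1: hand out a short-lived URL that points at temporary storage.
      createUploadUrl: async (_parent: unknown, args: { contentType: string }) => {
        const fileKey = `tmp/${newKey()}`;
        const uploadUrl = await storage.signUploadUrl(fileKey, args.contentType, 300);
        return { uploadUrl, fileKey };
      },
      // Step 3: promote the uploaded object and link it to application data.
      attachAvatar: async (_parent: unknown, args: { fileKey: string }) => {
        if (!/^tmp\/[A-Za-z0-9-]+$/.test(args.fileKey)) {
          throw new Error("Unknown file key");
        }
        const permanentKey = args.fileKey.replace(/^tmp\//, "avatars/");
        await storage.moveToPermanent(args.fileKey, permanentKey);
        return true;
      },
    },
  };
}
```

Step 2 happens entirely outside GraphQL: the client sends the file to `uploadUrl` with a plain HTTP request. Anything left under the temporary prefix can be expired by a storage lifecycle rule, which covers uploads that never reach step 3.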
If you still choose to support uploads
If your application truly requires file uploads through GraphQL, proceed with caution. At a minimum, you should:
- Use a well-maintained implementation of the GraphQL multipart request spec.
- Enforce a rule that upload variables are only referenced once.
- Stream uploads to disk or cloud storage; avoid buffering them in memory.
- Ensure that streams are always terminated when the request ends, whether or not they were consumed.
- Apply strict request size limits and validate all fields.
- Treat file names, types, and contents as untrusted data.