This project is a RESTful API that allows users to upload files, stores them in AWS S3, and triggers a process to extract metadata using an AWS Lambda function. The extracted metadata is stored in DynamoDB and can be retrieved via an API endpoint.
- Node.js (v18 or higher)
- AWS CLI (configured with appropriate permissions)
- Terraform (v1.0 or higher)
The deployment workflow involves an initial two-step process due to the need for the Lambda zip file to be uploaded to the S3 bucket before it can be configured as part of the infrastructure. For subsequent updates, additional manual steps are required.
-
Clone the Repository:
git clone https://github.com/romulogatto/opstream_challenge.git cd opstream_challenge -
Install Dependencies:
npm install
-
First-Time Workflow:
-
Partial Terraform Deployment:
cd terraform terraform init terraform apply -target=aws_s3_bucket.file_upload_bucket -target=aws_dynamodb_table.metadata_table -target=aws_iam_role.lambda_roleTerraform will output the S3 bucket name where the Lambda zip file should be uploaded.
-
Package and Upload Lambda Function:
cd lambda/metadataExtractor zip -r metadataExtractor.zip index.js node_modules aws s3 cp metadataExtractor.zip s3://<your-s3-bucket-name>/lambda/metadataExtractor.zip
-
Complete Terraform Deployment:
terraform apply
-
-
Subsequent Lambda Updates:
- When updating the Lambda code:
- Repackage and upload the Lambda function as described in Step 3.
- Manually redeploy the Lambda function via the AWS Management Console to apply the updates.
- When updating the Lambda code:
-
Run the Server:
npm start
-
Endpoint:
POST /api/files/upload -
Request:
- Form-data:
file: The file to upload (required).- Any additional fields in the request body will be included as metadata.
- Example:
curl -X POST http://localhost:3000/api/files/upload \ -F "file=@/path/to/file.pdf" \ -F "author=John Doe" \ -F "expirationDate=2025-12-31"
- Form-data:
-
Response:
- Success:
{ "fileId": "unique-file-id" } - Error:
{ "error": "File upload failed" }
- Success:
- Endpoint:
GET /api/metadata/:fileId - Response:
- Success:
{ "expirationDate": "2025-12-31", "uploadDate": "2025-01-10T21:14:49.273Z", "metadata": { "pages": 2, "textPreview": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed ullamcorper ante eget nulla sagittis blandit. Cras commodo elit a justo ultrices mollis. Duis tempor dictum nisi, ac lobortis ex aliquam id. Nunc sollicit", "fileSize": 151112, "fileType": "application/pdf" }, "fileName": "LoremIpsumDolor.pdf", "fileId": "13af05cd-5143-4e67-a0dc-763e5c6e972e", "author": "John Doe" } - Error:
{ "error": "Metadata not found" }
- Success:
To run all tests:
npm test- Ensure that Terraform has been applied before running integration tests, as these require the AWS infrastructure to be active.
- Unit tests will work independently of the AWS infrastructure.
-
Two-Step Terraform Workflow:
- The initial deployment requires manual steps for uploading the Lambda zip file and completing the configuration.
-
Lambda Redeployment:
- Subsequent updates to the Lambda code require manual redeployment through the AWS Console.
-
Enhanced Error Handling:
- Improve logging and retry mechanisms for better fault tolerance when interacting with AWS services.
-
CI/CD Integration:
- Automate the packaging, uploading, and redeployment of the Lambda function using CI/CD pipelines.
-
Scalability:
- Optimize the metadata extraction process to handle larger file sizes and more complex file types.