Project Brief

Project description for the hackathon

Tour Boat

A Data Lake Explorer web app and management toolset

Hackathon participants will create and populate a small (10-100 MB) data lake in the cloud with flat files and/or semi-structured data, where individual files are in the kilobyte to megabyte range. They will then build tools and user interfaces for exploring, manipulating, and visualizing the data.

Core Features:

  • A web app facilitating the exploration of the files/folders in the data lake

  • Tools for managing the data, e.g. extracting it from its original sources and storing it in a useful format (CSV, JSON, etc.)
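As one illustration of a tool that stores data "in a useful format", the Python sketch below flattens a JSON array of (possibly nested) objects into CSV. The function names, the dotted column-naming scheme, and the choice to store lists as JSON text are assumptions for illustration, not project requirements.

```python
import csv
import io
import json
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts into dotted column names, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            # Lists have no natural single-column shape; keep them as JSON text.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of objects into CSV text, one row per object."""
    rows = [flatten(r) for r in json.loads(json_text)]
    # Union of all column names in first-seen order, so sparse records still fit.
    columns: list[str] = []
    for row in rows:
        for col in row:
            if col not in columns:
                columns.append(col)
    out = io.StringIO()
    # Missing columns in a row are written as empty fields (DictWriter's default restval).
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = '[{"id": 1, "loc": {"lat": 45.5, "lon": -122.6}}, {"id": 2, "tags": ["a", "b"]}]'
print(json_records_to_csv(sample))
```

Collecting the union of columns up front trades a second pass over the rows for a CSV whose header covers every record, which keeps the output loadable by grid and SQL tools even when records are sparse.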

Potential Features:

Additional features and capabilities of the finished product will depend on the interests and skills of the participants, but could include:

  • Map spatial data from the data lake using Mapbox

  • Allow uploads into the data lake via drag-and-drop in the UI

  • Implement authentication for the web app using Auth0

  • Automate data processing using S3+Lambda (run code automatically when a file is dropped into an S3 bucket)

  • View file contents from the data lake using AG Grid

  • Visualize file contents from the data lake using D3

  • Build backend processing tools to convert custom data (e.g. log files) to CSV for inclusion in the data lake

  • Allow SQL queries over files from the data lake using AlaSQL

  • Manage cloud resources as code using the Serverless Framework (Serverless.com)
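The S3+Lambda and log-to-CSV ideas combine naturally: a function triggered by an S3 upload reads the new log file, converts it to CSV, and writes the result back to the lake. The Python sketch below assumes the standard S3 event notification payload, a hypothetical log format (timestamp, upper-case level, message), and a hypothetical `processed/` output prefix; all function names are illustrative.

```python
import csv
import io
import re
from urllib.parse import unquote_plus

# Hypothetical log format: "2024-01-31T12:00:00Z LEVEL free-text message"
LOG_LINE = re.compile(r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def objects_from_s3_event(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def log_to_csv(log_text: str) -> str:
    """Convert log lines to CSV; lines that don't match the assumed format are skipped."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "level", "message"])
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            writer.writerow([match["timestamp"], match["level"], match["message"]])
    return out.getvalue()


def handler(event, context):
    """Lambda entry point sketch: read each uploaded log, write a CSV alongside it."""
    import boto3  # bundled with the AWS Lambda Python runtime; imported lazily here

    s3 = boto3.client("s3")
    for bucket, key in objects_from_s3_event(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        # Hypothetical output location; see the note on trigger filtering below.
        s3.put_object(
            Bucket=bucket,
            Key=f"processed/{key}.csv",
            Body=log_to_csv(body).encode("utf-8"),
        )
```

If the CSV is written to the same bucket, the S3 trigger should be scoped with a prefix filter (for example, only keys under `raw/`) so that the converted files do not re-trigger the function. With the Serverless Framework, the trigger can be declared as an `s3` event in `serverless.yml`.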

Project rules:

  • All code and required resources will be managed and stored in a GitHub repo; participants must be able to use basic Git functionality

  • Participants will produce basic documentation and instructions for all code and tools they create. Documentation will be written in Markdown files stored in the code repository and should be sufficient for others to understand and maintain everything produced. Any diagrams or illustrations should be created with draw.io, saved in the repo as .png files with the diagram data embedded (so they can be reopened and edited in draw.io later), and linked from the Markdown files.
