Creating jobs is at the heart of marvin. If you have not previously set up projects in Google Cloud Platform, you will first need to create projects with which your jobs can then be associated. Please see the marvin. requirements page, which guides you through the process of setting up projects in Google Cloud Platform.
Get set up in 6 easy steps
Start by adding a Google project
Manage files (Google Cloud Storage buckets) and upload data
Manage databases (Google BigQuery datasets) and create tables with ease
Schedules (jobs) - marvin.’s automated ingestion engine for Google BigQuery
Explore other cool marvin. options, such as predictions, jobs history and queries
STEP 1 - START FROM PROJECTS
Depending on whether or not you log in with a Gmail account, you will be able to add projects from Google Cloud Platform on the first screen. This page lets you manage your projects; only projects added to this list can be used within marvin. Note: project usage (costs) is billed separately by Google unless you have a full service agreement with Bimotics (contact sales for details).
1A. Authenticate with an account that has Google access to the projects.
1B. Select the projects you want marvin. to manage. You can add or remove projects from this list. Please note that removing a project from marvin. does not delete the project in Google. Selecting a project allows other non-Google users to manage jobs or assets on your behalf. Only account owners can add or remove projects.
STEP 2 - MANAGE FILES OR GOOGLE CLOUD STORAGE BUCKETS
This task is optional. It lets you review the buckets and files within the marvin. environment and preview files before they are imported (ingested) into BigQuery.
2A. Buckets: marvin. lets you import files and move them into separate buckets, so you can keep track of processed files vs. files still in the queue. Google allows you to configure Durable Reduced Availability (DRA) buckets for files that marvin. has already processed. DRA storage is appropriate for applications that are particularly cost-sensitive, or for which some unavailability is acceptable. If you want to leverage DRA, create a bucket, which we refer to as "history".
2B. Create and delete buckets. You can also upload multiple files, or an entire folder, into a bucket. Uploading a folder recreates the same folder structure within the bucket in Google Cloud Storage. You also have the option to compress files as they are uploaded: simply select the “Do you want to Gzip uploading files?” option, and every uploaded file will be compressed automatically.
2C. Preview files. The preview shows only the first section of a file, but it lets you see the file's content. Note that this works only for text, JSON or CSV files. The preview does NOT support modifying or changing the data in the files.
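To make the folder-upload and Gzip options concrete, here is a minimal local sketch. It assumes that a folder upload maps each file's path relative to the folder onto an object name, and that the Gzip option compresses each file's bytes before upload; the helper name `prepare_folder_upload` and the `.gz` suffix are hypothetical, and the sketch only prepares (object name, bytes) pairs rather than calling Google Cloud Storage.

```python
import gzip
from pathlib import Path


def prepare_folder_upload(folder: str, gzip_files: bool = False) -> dict[str, bytes]:
    """Map object names (mirroring the local folder structure) to the bytes
    that would be uploaded. Hypothetical helper for illustration only."""
    root = Path(folder)
    objects: dict[str, bytes] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Object names use forward slashes, so "2020/customers.csv"
        # recreates the folder hierarchy inside the bucket.
        name = path.relative_to(root).as_posix()
        data = path.read_bytes()
        if gzip_files:
            # Assumption: compressed objects get a ".gz" suffix.
            name += ".gz"
            data = gzip.compress(data)
        objects[name] = data
    return objects
```

Because Cloud Storage has a flat namespace, the "folders" inside a bucket are simply the slash-separated prefixes of object names, which is why the relative path is all that needs to be preserved.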
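A read-only preview of this kind can be sketched as reading only the first few lines of a file. This is an illustration, not marvin.'s implementation: it assumes "text, JSON or CSV" is decided by file extension and that "first section" means the first N lines; the helper name `preview_file` is hypothetical.

```python
from pathlib import Path

# Assumption: previewable files are recognized by extension.
PREVIEWABLE = {".txt", ".json", ".csv"}


def preview_file(path: str, max_lines: int = 10) -> list[str]:
    """Return up to max_lines lines from the start of a text-like file.
    The file is opened read-only and never modified."""
    p = Path(path)
    if p.suffix.lower() not in PREVIEWABLE:
        raise ValueError(f"preview not supported for '{p.name}'")
    lines: list[str] = []
    with p.open("r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            # Stop early so large files are never read in full.
            if len(lines) >= max_lines:
                break
            lines.append(line.rstrip("\n"))
    return lines
```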
STEP 3 - MANAGE DATABASE OR GOOGLE BIGQUERY DATASETS AND TABLES
This task is optional, but it lets you review the datasets and tables within the marvin. environment, preview tables and schemas, and even run simple queries on BigQuery.
3A. Tables: Create empty tables from a simple screen. These tables can receive the data loaded into the buckets. You can also delete tables here.
3B. Datasets: View datasets and the tables within each dataset, including each table's schema and related information. In addition, marvin. lets you delete a dataset, as long as it contains no tables or views.
3C. Data Preview: Preview the data within tables and/or datasets. Only the first few records are displayed.
3D. Data Load: You can load data from Google Cloud Storage in two ways. The first is a fully automated approach in which multiple files are ingested via jobs (see Step 4). The second is this feature, which ingests a single file from Google Cloud Storage into an existing table within a dataset. This feature requires the file to come from the same project as the destination table.
STEP 4 - SCHEDULES (JOBS) - MARVIN.’S AUTOMATED INGESTION ENGINE FOR GOOGLE BIGQUERY
Automate the ingestion of files from Google Cloud Storage into Google BigQuery. marvin. jobs assume that the customer has a project in Google Cloud Platform (see System Requirements to set up projects within Google), that the project has the Google Cloud Storage and Google BigQuery APIs enabled, and that billing is properly configured.
Select a source bucket where the files will be available for ingestion. Then specify a filename wildcard that identifies the files to be ingested. This lets you keep multiple kinds of files, such as customers, orders, products and leads, in the same bucket, ready for ingestion. Next, select a destination dataset and table. The job can either append the data to the table or replace the table's contents.
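The exact wildcard syntax marvin. accepts is not specified here. Assuming shell-style patterns (as implemented by Python's standard `fnmatch` module), a wildcard per job picks one kind of file out of a shared bucket. The object names and the helper `match_wildcard` are illustrative.

```python
from fnmatch import fnmatchcase

# Illustrative object names sharing one bucket.
bucket_files = [
    "customers_2020-01.csv",
    "customers_2020-02.csv",
    "orders_2020-01.csv",
    "products.json",
]


def match_wildcard(names: list[str], pattern: str) -> list[str]:
    """Return the object names a job with this wildcard would pick up."""
    # fnmatchcase is case-sensitive, like Cloud Storage object names.
    return [n for n in names if fnmatchcase(n, pattern)]


print(match_wildcard(bucket_files, "customers_*.csv"))
# → ['customers_2020-01.csv', 'customers_2020-02.csv']
```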
Schemas are not required, because jobs use the destination table's schema (the files in the bucket should match the table schema, unless the format is JSON). marvin. runs the ingestion pipelines every 5 minutes, seeking out new files that match your chosen parameters, and marks any new files for ingestion. As a precaution against duplicates, marvin. never imports the same filename twice: it keeps a record of all previously imported files and flags them as imported. marvin. then imports all the new files in parallel. Once the files are imported, marvin. can move them into a different bucket, for example a history bucket. Jobs are defined per table: many files can go into one table, but if you want to import data into multiple tables, you need to set up one job per table. All limits are managed by Google, since marvin. performs the ingestions through the Google BigQuery API. Please review the detailed job setup section for a deep dive on each feature.
NOTE: marvin. requires a monthly subscription to use this service. See the plans for more information. You can manage the subscription, payments and payment history within marvin.
4A. Jobs: Create, edit and delete jobs from a single screen. This screen provides high-level information about each job, including the last time it was executed.
4B. Creating jobs in marvin. is easy with a simple wizard, which guides you through the fields required to configure an automated job:
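The ingestion cycle described above (find new matching files, skip names already imported, load the rest in parallel, then optionally move them to a history bucket) can be modeled in memory. This is a minimal sketch under stated assumptions, not marvin.'s code: buckets are plain dicts of {object name: text}, a table is a list of rows, `load_file` is a hypothetical stand-in for a BigQuery load job, and "replace" is assumed to swap the table contents for the files found in one cycle.

```python
from concurrent.futures import ThreadPoolExecutor
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Set

Rows = List[List[str]]


def load_file(content: str) -> Rows:
    """Hypothetical stand-in for a BigQuery load: split comma-separated text."""
    return [line.split(",") for line in content.splitlines() if line]


def run_cycle(source: Dict[str, str], imported: Set[str], table: Rows,
              pattern: str, write_disposition: str = "append",
              history: Optional[Dict[str, str]] = None) -> List[str]:
    """Run one polling cycle of a single job (one job feeds one table)."""
    if write_disposition not in ("append", "replace"):
        raise ValueError("write disposition must be 'append' or 'replace'")
    # 1. Find files matching the wildcard whose names were never imported.
    new_files = sorted(name for name in source
                       if fnmatchcase(name, pattern) and name not in imported)
    if not new_files:
        return []
    # 2. Load all new files in parallel (here: parse them on a thread pool).
    with ThreadPoolExecutor() as pool:
        loaded = list(pool.map(lambda name: load_file(source[name]), new_files))
    # 3. Assumption: "replace" swaps the table contents for this cycle's files.
    if write_disposition == "replace":
        table.clear()
    for rows in loaded:
        table.extend(rows)
    # 4. Flag names as imported; optionally move the files to a history bucket.
    for name in new_files:
        imported.add(name)
        if history is not None:
            history[name] = source.pop(name)
    return new_files
```

Keeping the set of imported names separate from the buckets is what lets a re-uploaded file with an old name be skipped, even after the original copy has been moved to history.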
Job Title - Choose a unique name to easily reference your job
Select Project - Select one of the projects added in Step 1
Select Cloud Storage Bucket - Select the bucket where the files to be ingested are stored
File Wildcard - Enter a pattern that uniquely identifies the set of files to be ingested
Select BigQuery Dataset - Select the destination dataset
Select Write Disposition - Select either append or replace
Select BigQuery Table - Select the destination table based on the dataset previously selected
Format - Select the file format: CSV or JSON. Files can be gzip-compressed
Skip Leading Rows - Select the number of rows to skip at the start of each file (CSV format only)
Frequency in Minutes - Select the interval at which marvin. runs. Remember that marvin. checks around the clock for new files at this interval.
Number of Retries - How many times BigQuery should retry loading this file before the job fails.
Delimiter - In the case of CSV files, select a delimiter (usually a comma).
History Folder - Select a bucket to which processed (ingested) files are moved once ingestion is complete. It is good practice to use separate buckets for new files and ingested files. Also consider setting up the history bucket as DRA to reduce storage costs.
4C. Editing jobs in marvin. is easy. This screen shows all the elements of a job and lets users change some of them, such as frequency, rows to skip and format. If the bucket or table needs to change, delete the job and create a new one.
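The wizard fields above map naturally onto a small configuration record. The sketch below is a hypothetical model, not marvin.'s actual schema: field names and defaults are illustrative, and the validation rules are assumptions drawn from the field descriptions (append or replace, CSV or JSON, CSV-only options, a separate history bucket).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobConfig:
    """Hypothetical model of the job wizard fields (names are illustrative)."""
    title: str
    project: str
    bucket: str
    file_wildcard: str
    dataset: str
    table: str
    write_disposition: str = "append"   # "append" or "replace"
    file_format: str = "CSV"            # "CSV" or "JSON"; files may also be gzipped
    skip_leading_rows: int = 0          # only meaningful for CSV
    frequency_minutes: int = 5          # illustrative default
    retries: int = 3                    # illustrative default
    delimiter: str = ","                # only meaningful for CSV
    history_bucket: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if the fields are inconsistent."""
        if self.write_disposition not in ("append", "replace"):
            raise ValueError("write disposition must be 'append' or 'replace'")
        if self.file_format not in ("CSV", "JSON"):
            raise ValueError("format must be 'CSV' or 'JSON'")
        if self.file_format == "JSON" and self.skip_leading_rows:
            raise ValueError("skip leading rows applies to CSV files only")
        # Assumption: a single-character delimiter.
        if self.file_format == "CSV" and len(self.delimiter) != 1:
            raise ValueError("CSV delimiter must be a single character")
        if self.skip_leading_rows < 0 or self.retries < 0:
            raise ValueError("rows to skip and retries cannot be negative")
        if self.frequency_minutes < 1:
            raise ValueError("frequency must be at least one minute")
        # Assumption: the guide only calls a separate history bucket good
        # practice; this sketch enforces it.
        if self.history_bucket is not None and self.history_bucket == self.bucket:
            raise ValueError("history bucket should differ from the source bucket")
```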
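The edit rule above (tune some settings in place, but delete and recreate the job to change its bucket or table) can be sketched as an allow-list of editable fields. The exact editable set and the `Job` and `edit_job` names are assumptions for illustration; the guide lists frequency, rows to skip and format as examples.

```python
from dataclasses import dataclass, replace

# Assumption: the set of fields that can be edited in place.
EDITABLE = {"frequency_minutes", "skip_leading_rows", "file_format", "delimiter", "retries"}


@dataclass(frozen=True)
class Job:
    """Hypothetical, trimmed-down job record."""
    title: str
    bucket: str
    dataset: str
    table: str
    file_format: str = "CSV"
    skip_leading_rows: int = 0
    frequency_minutes: int = 5
    retries: int = 3
    delimiter: str = ","


def edit_job(job: Job, **changes: object) -> Job:
    """Return an edited copy, refusing changes that need delete-and-recreate."""
    locked = set(changes) - EDITABLE
    if locked:
        raise ValueError(f"cannot edit {sorted(locked)}; delete the job and create a new one")
    return replace(job, **changes)
```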
STEP 5 - REVIEW JOBS HISTORY AND STATUS
This task is optional. It lets you review the historical status of jobs and their execution against Google BigQuery, including jobs, queries and errors. Just select a project, and all related messages will be listed on this page.
5A. History Preview: Review the details of the jobs under the selected project. Select a single job, and this page will show you detailed information about it.
STEP 6 - MANAGE USERS
This task is optional, but it lets you add users, either Google accounts or non-Google accounts. Additional users have limited access: they cannot set up projects or jobs. They can use buckets to import files, databases to review data, and history to stay informed. Only account owners (customers) can configure jobs, because jobs require a subscription plan.
6A. Adding or removing users in marvin. is easy. Using a simple form, you send an invitation by email, and the user confirms their access.
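Steps 1 and 6 describe two kinds of accounts with different capabilities. A minimal role check, covering only the capabilities this guide lists, might look like the sketch below; the role and action names are hypothetical, not marvin.'s internal identifiers.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"   # the customer holding the subscription
    USER = "user"     # an invited Google or non-Google account


# Capabilities as described in Steps 1 and 6; action names are hypothetical.
PERMISSIONS = {
    Role.OWNER: {"manage_projects", "configure_jobs",
                 "use_buckets", "review_data", "view_history"},
    Role.USER: {"use_buckets", "review_data", "view_history"},
}


def can(role: Role, action: str) -> bool:
    """Return True if the role may perform the action."""
    return action in PERMISSIONS[role]
```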