Invoice Service

Purpose

The Invoice Service automates the end-to-end process of invoice management. It acts as a bridge between external email providers (Gmail) and internal data storage. Its primary goal is to ensure that every invoice received via email is automatically downloaded, parsed, stored, and logged in a central Google Sheet for accounting purposes.

High-Level Architecture

This service runs as a Docker container (service-invoice) and exposes an HTTP API on port 8002.

  • Upstream: Triggered by the Frontend or scheduled tasks.
  • Downstream:
      • Gmail API: For searching and retrieving emails.
      • Google Sheets API: For updating the accounting ledger.
      • Postgres: Stores invoice metadata and processing status.
      • Parfumdreams: A specific external vendor integration for downloading invoices directly from their portal.
  • Siblings: Shares the backend_logging module with other services.

Database Schema

The service manages several key entities in the enterprisedb database:

GoogleCredentials (google_credentials table)

Stores OAuth2 tokens and client configuration for Google APIs.

  • Key Fields: client_id, client_secret, refresh_token, scopes, active.
  • Purpose: Allows the service to act on behalf of a user to read emails and edit sheets.

SheetSettings (sheet_settings table)

Configures how data is mapped to the target Google Sheet.

  • Key Fields: spreadsheet_id, sheet_id.
  • Column Mappings:
      • date_column (e.g., "A")
      • order_place_column (e.g., "C")
      • invoice_number_column (e.g., "J")
      • confirmation_column (e.g., "D")
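
A minimal sketch of how such a column mapping could be turned into cell references for one sheet row. The SheetSettings dataclass, the cells_for_row helper, and the invoice dictionary keys are hypothetical names for illustration; only the four column fields and their example letters come from the table description above.

```python
from dataclasses import dataclass


@dataclass
class SheetSettings:
    """Column letters mirroring the example values listed above."""
    date_column: str = "A"
    order_place_column: str = "C"
    invoice_number_column: str = "J"
    confirmation_column: str = "D"


def cells_for_row(settings: SheetSettings, row: int, invoice: dict) -> dict:
    """Map invoice values to A1-style cell references for a single row.

    The invoice keys are assumptions made for this sketch.
    """
    return {
        f"{settings.date_column}{row}": invoice["date"],
        f"{settings.order_place_column}{row}": invoice["order_place"],
        f"{settings.invoice_number_column}{row}": invoice["invoice_number"],
        f"{settings.confirmation_column}{row}": invoice["confirmation"],
    }


cells = cells_for_row(
    SheetSettings(),
    row=12,
    invoice={
        "date": "2024-03-05",
        "order_place": "Parfumdreams",
        "invoice_number": "INV-1001",
        "confirmation": "yes",
    },
)
```

Keeping the letters in configuration rather than in code means the ledger layout can change without touching the sync logic.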

Invoice (invoices table)

Represents a processed invoice.

  • Key Fields: invoice_number, date, amount, sender, file_path.

Application Structure

  • app.py: Entry point. Initializes FastAPI, CORS, and includes routers.
  • routers/:
      • gmail_router.py: Manages Gmail settings and search configurations.
      • gmail_utils_router.py: Handles the heavy lifting of downloading and processing emails.
      • google_router.py: Handles OAuth2 authentication flow.
      • sheets_router.py: Manages Google Sheets interactions.
  • Services/:
      • gmail.py: Core logic for interacting with the Gmail API.
      • invoice_manager.py: Database abstraction for invoice CRUD operations.
      • Parfumdreams/: Specialized module for scraping invoices from the Parfumdreams portal using Selenium.

FastAPI Routers

1. Gmail Integration (/invoice/gmail)

  • GET /settings: Retrieves configured search queries and folder mappings.
  • POST /util/download/by_label: Triggers the batch download of invoices for a specific year/month.

2. Google Auth (/invoice/google)

  • GET /login: Initiates the OAuth2 flow.
  • GET /oauth/callback: Handles the redirect from Google and exchanges the code for tokens.
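
As an illustration of what the /login step has to produce, the sketch below builds a Google OAuth2 authorization URL with the standard library. It is not the service's actual code: the build_authorization_url helper, the placeholder client ID, the redirect URI, and the chosen scopes are assumptions. The access_type=offline parameter is what asks Google to issue a refresh token, which matches the refresh_token field stored in google_credentials.

```python
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_authorization_url(client_id, redirect_uri, scopes, state):
    """Return the URL the user is redirected to in order to grant access."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",      # authorization-code flow
        "scope": " ".join(scopes),    # space-separated scope list
        "access_type": "offline",     # request a refresh token
        "prompt": "consent",
        "state": state,               # echoed back to the callback
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


# All argument values below are placeholders for illustration.
login_url = build_authorization_url(
    client_id="example-client-id.apps.googleusercontent.com",
    redirect_uri="http://localhost:8002/invoice/google/oauth/callback",
    scopes=[
        "https://www.googleapis.com/auth/gmail.readonly",
        "https://www.googleapis.com/auth/spreadsheets",
    ],
    state="random-state-token",
)
```

The callback endpoint would then exchange the code query parameter it receives for tokens and should compare the returned state with the one issued here.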

3. Sheets Integration (/invoice/sheets)

  • POST /sync: Forces a synchronization between the local database and the Google Sheet.

Key Workflows

1. Invoice Download & Processing

This is the core loop of the service. It searches Gmail for specific labels (e.g., "rechnungen-2024"), downloads attachments, and parses metadata.

graph TD
    User["User or Scheduler"] -->|Trigger Download| API["FastAPI Endpoint"]
    API -->|Auth Check| GmailMgr["Gmail Manager"]
    GmailMgr -->|Search Emails| GmailAPI["Gmail API"]
    GmailAPI -->|Return Messages| GmailMgr
    GmailMgr -->|Loop Messages| Processor["Email Processor"]
    Processor -->|Extract Attachments| PDF["PDF Files"]
    Processor -->|Parse Metadata| Meta["Metadata (Sender, Date, Amount)"]
    Meta -->|Save| DB[("Postgres")]
    PDF -->|Save| FS["File System"]
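
The "Extract Attachments" step of this workflow can be sketched as a recursive walk over a message payload. The dictionary layout (parts, filename, body.attachmentId) follows the message resource returned by the Gmail API; the find_pdf_attachments helper and the sample payload are hypothetical and not taken from the service's code.

```python
def find_pdf_attachments(payload: dict) -> list:
    """Collect PDF attachments from a Gmail message payload, including nested parts."""
    found = []
    filename = payload.get("filename") or ""
    attachment_id = payload.get("body", {}).get("attachmentId")
    # Only parts with a file name and an attachment ID are real attachments.
    if filename.lower().endswith(".pdf") and attachment_id:
        found.append({"filename": filename, "attachment_id": attachment_id})
    # Multipart messages nest further parts, so recurse into each one.
    for part in payload.get("parts", []):
        found.extend(find_pdf_attachments(part))
    return found


# Hand-written sample payload for illustration only.
sample_payload = {
    "mimeType": "multipart/mixed",
    "filename": "",
    "parts": [
        {"mimeType": "text/plain", "filename": "", "body": {"size": 120}},
        {
            "mimeType": "application/pdf",
            "filename": "Rechnung-1001.PDF",
            "body": {"attachmentId": "att-1", "size": 48213},
        },
        {
            "mimeType": "multipart/related",
            "filename": "",
            "parts": [
                {
                    "mimeType": "application/pdf",
                    "filename": "invoice.pdf",
                    "body": {"attachmentId": "att-2", "size": 1024},
                }
            ],
        },
    ],
}

attachments = find_pdf_attachments(sample_payload)
```

Each returned attachment ID would then be fetched in a separate API call and the bytes written to the file system, while the parsed metadata goes to Postgres.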

2. Special Case: Parfumdreams

For the vendor "Parfumdreams", invoices are not always attached to emails. The system uses a specialized scraper to log in to their portal and download the invoice.

graph TD
    System -->|Identify Parfumdreams Email| Logic[Special Case Logic]
    Logic -->|Init Selenium| Scraper[Parfumdreams Manager]
    Scraper -->|Login| Portal[Parfumdreams Website]
    Portal -->|Navigate to Orders| Scraper
    Scraper -->|Download PDF| FS[File System]
    FS -->|Update Record| DB[(Postgres)]

Technical Details

Metadata Sanitization

The service passes email header data through a sanitization layer (sanitize_metadata) to handle inconsistent formats.

  • Regex: [A-Za-z0-9_.+-]+@[A-Za-z0-9-]+\.[A-Za-z0-9-.]+
  • Logic: Recursively searches dictionaries, lists, and strings to find valid email addresses, discarding garbage characters often found in raw MIME headers.
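
The recursive search described above could look roughly like the following. This is a sketch under assumptions, not the actual sanitize_metadata implementation: the find_emails helper and the sample headers are hypothetical, and only the regular expression is taken from the documentation.

```python
import re

# Pattern copied from the documentation above.
EMAIL_RE = re.compile(r"[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+\.[A-Za-z0-9-.]+")


def find_emails(value) -> list:
    """Return every email address found in nested dicts, lists, and strings."""
    if isinstance(value, str):
        return EMAIL_RE.findall(value)
    if isinstance(value, dict):
        return [email for v in value.values() for email in find_emails(v)]
    if isinstance(value, (list, tuple)):
        return [email for item in value for email in find_emails(item)]
    # Numbers, None, and other types cannot contain an address.
    return []


# Sample headers invented for illustration.
raw_headers = {
    "From": '"Billing" <billing@shop.example.com>',
    "Reply-To": ["=?utf-8?q?x?= <a.b+c@mail.example.org>;", None],
    "X-Count": 3,
}

emails = find_emails(raw_headers)
```

Because the final character class of the pattern also allows dots and hyphens, an address directly followed by a sentence-ending period would be captured with that trailing period, so stripping it afterwards may be worthwhile.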

Selenium Grid Fallback

The Parfumdreams_Manager connects to a remote Selenium Grid container (selenium:4444).

  • Session Management: It includes a cleanup_all_selenium_sessions method that queries the Grid's status API and kills orphaned sessions to prevent resource exhaustion.
  • Browser: Uses Chrome in headless mode.
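
A rough sketch of a session cleanup along these lines, using only the standard library. It assumes the status payload shape of Selenium Grid 4 (value.nodes[].slots[].session.sessionId) and removes sessions with the WebDriver DELETE /session/{id} command. The function names are hypothetical, and unlike the real cleanup_all_selenium_sessions method this version does not try to tell orphaned sessions from live ones.

```python
import json
import urllib.request

GRID_URL = "http://selenium:4444"  # host and port from the section above


def active_session_ids(status: dict) -> list:
    """Extract session IDs from a Grid status payload (assumed Grid 4 layout)."""
    ids = []
    for node in status.get("value", {}).get("nodes", []):
        for slot in node.get("slots", []):
            session = slot.get("session")  # None when the slot is free
            if session:
                ids.append(session["sessionId"])
    return ids


def cleanup_sessions(grid_url: str = GRID_URL) -> list:
    """Delete every session the Grid reports and return the deleted IDs."""
    with urllib.request.urlopen(f"{grid_url}/status", timeout=10) as resp:
        status = json.load(resp)
    ids = active_session_ids(status)
    for session_id in ids:
        request = urllib.request.Request(
            f"{grid_url}/session/{session_id}", method="DELETE"
        )
        urllib.request.urlopen(request, timeout=10).close()
    return ids


# Hand-written status payload for illustration; no Grid is contacted here.
sample_status = {
    "value": {
        "ready": True,
        "nodes": [
            {"slots": [{"session": None}, {"session": {"sessionId": "abc123"}}]},
            {"slots": [{"session": {"sessionId": "def456"}}]},
        ],
    }
}
session_ids = active_session_ids(sample_status)
```

Separating the parsing from the HTTP calls keeps the payload handling testable without a running Grid.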

Month-Based Batching

To avoid hitting API rate limits or exhausting memory, the download process (download_all_invoices_via_gmail) is chunked by month. This allows granular control over historical data imports.
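
One way to picture the month-based chunking is a generator that yields one date window per month, each of which becomes its own Gmail search. The month_windows and gmail_query helpers are hypothetical, and combining a label with the after: and before: search operators is an assumption about how a month could be selected, not a description of download_all_invoices_via_gmail.

```python
from datetime import date


def month_windows(start: date, end: date):
    """Yield (first_day, first_day_of_next_month) for every month from start to end."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        if month == 12:
            next_year, next_month = year + 1, 1
        else:
            next_year, next_month = year, month + 1
        yield date(year, month, 1), date(next_year, next_month, 1)
        year, month = next_year, next_month


def gmail_query(label: str, first: date, following: date) -> str:
    """Build a Gmail search string limited to one month window."""
    return f"label:{label} after:{first:%Y/%m/%d} before:{following:%Y/%m/%d}"


windows = list(month_windows(date(2023, 11, 15), date(2024, 2, 1)))
queries = [gmail_query("rechnungen-2024", a, b) for a, b in windows]
```

Processing one window at a time bounds how many messages are held in memory and makes it easy to retry a single failed month. How messages exactly on a window boundary are treated depends on Gmail's date semantics and should be checked before relying on it.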