Invoice Service
Purpose
The Invoice Service automates the end-to-end process of invoice management. It acts as a bridge between external email providers (Gmail) and internal data storage. Its primary goal is to ensure that every invoice received via email is automatically downloaded, parsed, stored, and logged in a central Google Sheet for accounting purposes.
High-Level Architecture
This service runs as a Docker container (service-invoice) and exposes an HTTP API on port 8002.
- Upstream: Triggered by the Frontend or scheduled tasks.
- Downstream:
  - Gmail API: For searching and retrieving emails.
  - Google Sheets API: For updating the accounting ledger.
  - Postgres: Stores invoice metadata and processing status.
  - Parfumdreams: A specific external vendor integration for downloading invoices directly from their portal.
- Siblings: Shares the backend_logging module with other services.
Database Schema
The service manages several key entities in the enterprisedb database:
GoogleCredentials (google_credentials table)
Stores OAuth2 tokens and client configuration for Google APIs.
- Key Fields: client_id, client_secret, refresh_token, scopes, active.
- Purpose: Allows the service to act on behalf of a user to read emails and edit sheets.
SheetSettings (sheet_settings table)
Configures how data is mapped to the target Google Sheet.
- Key Fields: spreadsheet_id, sheet_id.
- Column Mappings:
  - date_column (e.g., "A")
  - order_place_column (e.g., "C")
  - invoice_number_column (e.g., "J")
  - confirmation_column (e.g., "D")
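The column mappings above can be read as "put this field in that column of the row". A minimal sketch of that idea follows, assuming the settings are available as a plain dictionary keyed by the field names listed above; the helpers column_index and build_row are hypothetical names for illustration, not functions from the service.

```python
from typing import Dict, List


def column_index(letter: str) -> int:
    """Convert a spreadsheet column letter ("A", "J", "AA") to a 0-based index."""
    index = 0
    for char in letter.strip().upper():
        if not "A" <= char <= "Z":
            raise ValueError(f"invalid column letter: {letter!r}")
        index = index * 26 + (ord(char) - ord("A") + 1)
    if index == 0:
        raise ValueError("empty column letter")
    return index - 1


def build_row(settings: Dict[str, str], values: Dict[str, str]) -> List[str]:
    """Place each value at the position named by its "<field>_column" setting.

    Hypothetical helper: the real service may write cells individually instead.
    """
    if not values:
        return []
    positions = {field: column_index(settings[f"{field}_column"]) for field in values}
    row = [""] * (max(positions.values()) + 1)
    for field, position in positions.items():
        row[position] = values[field]
    return row
```

With the example letters above, the invoice number lands in the tenth cell (column J) and unmapped columns stay empty.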
Invoice (invoices table)
Represents a processed invoice.
- Key Fields: invoice_number, date, amount, sender, file_path.
Application Structure
- app.py: Entry point. Initializes FastAPI, CORS, and includes routers.
- routers/:
  - gmail_router.py: Manages Gmail settings and search configurations.
  - gmail_utils_router.py: Handles the heavy lifting of downloading and processing emails.
  - google_router.py: Handles OAuth2 authentication flow.
  - sheets_router.py: Manages Google Sheets interactions.
- Services/:
  - gmail.py: Core logic for interacting with the Gmail API.
  - invoice_manager.py: Database abstraction for invoice CRUD operations.
  - Parfumdreams/: Specialized module for scraping invoices from the Parfumdreams portal using Selenium.
FastAPI Routers
1. Gmail Integration (/invoice/gmail)
- GET /settings: Retrieves configured search queries and folder mappings.
- POST /util/download/by_label: Triggers the batch download of invoices for a specific year/month.
2. Google Auth (/invoice/google)
- GET /login: Initiates the OAuth2 flow.
- GET /oauth/callback: Handles the redirect from Google and exchanges the code for tokens.
3. Sheets Integration (/invoice/sheets)
- POST /sync: Forces a synchronization between the local database and the Google Sheet.
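As a rough illustration of how a client might address these endpoints, the sketch below builds (but does not send) two requests with the standard library. Several details are assumptions rather than documented behaviour: the host name localhost, the idea that each route is the router prefix joined with the path listed above, and the query parameter names year and month.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the container's port 8002 is reachable on localhost.
BASE_URL = "http://localhost:8002"


def build_download_request(year: int, month: int) -> Request:
    """Request for the batch download endpoint.

    Hypothetical parameters: the source only says the download is
    "for a specific year/month"; the names below are guesses.
    """
    query = urlencode({"year": year, "month": month})
    url = f"{BASE_URL}/invoice/gmail/util/download/by_label?{query}"
    return Request(url, method="POST")


def build_sync_request() -> Request:
    """Request that asks the service to sync the database with the sheet."""
    return Request(f"{BASE_URL}/invoice/sheets/sync", method="POST")
```

Passing either object to urllib.request.urlopen would send it; the service's own request schema should be checked first, since the parameters may be expected in a JSON body instead.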
Key Workflows
1. Invoice Download & Processing
This is the core loop of the service. It searches Gmail for specific labels (e.g., "rechnungen-2024"), downloads attachments, and parses metadata.
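The loop described here can be sketched as three steps: search, download, parse. In the sketch below, label:NAME and has:attachment are standard Gmail search operators, while the label prefix default, the function names, and the three injected callables are hypothetical stand-ins for the service's actual Gmail and parsing code.

```python
from typing import Callable, Dict, Iterable, List


def label_for_year(year: int, prefix: str = "rechnungen") -> str:
    """Build a label name such as "rechnungen-2024" (prefix is an assumption)."""
    return f"{prefix}-{year}"


def build_search_query(year: int, attachments_only: bool = True) -> str:
    """Compose a Gmail search string for one yearly invoice label."""
    parts = [f"label:{label_for_year(year)}"]
    if attachments_only:
        parts.append("has:attachment")
    return " ".join(parts)


def process_label(
    year: int,
    search: Callable[[str], Iterable[str]],    # query -> message ids
    download: Callable[[str], Iterable[str]],  # message id -> saved file paths
    parse: Callable[[str], Dict[str, str]],    # file path -> invoice metadata
) -> List[Dict[str, str]]:
    """Search, download every attachment, and parse metadata for each file."""
    invoices = []
    for message_id in search(build_search_query(year)):
        for path in download(message_id):
            invoices.append(parse(path))
    return invoices
```

Injecting the three steps as callables keeps the loop testable without a Gmail connection; the real service may structure this differently.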
2. Special Case: Parfumdreams
For the vendor "Parfumdreams", invoices are not always attached to emails. The system uses a specialized scraper to log in to their portal and download the invoice.
Technical Details
Metadata Sanitization
The service implements a robust sanitization layer (sanitize_metadata) to handle inconsistent data formats from email headers.
- Regex: [A-Za-z0-9_.+-]+@[A-Za-z0-9-]+\.[A-Za-z0-9-.]+
- Logic: Recursively searches dictionaries, lists, and strings to find valid email addresses, discarding garbage characters often found in raw MIME headers.
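The recursive search can be sketched as follows, using the regex quoted above. This is not the service's sanitize_metadata; the function name find_emails and its return shape (a flat list in traversal order) are assumptions made for illustration.

```python
import re
from typing import Any, List

# The pattern documented for the sanitization layer.
EMAIL_RE = re.compile(r"[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+\.[A-Za-z0-9-.]+")


def find_emails(value: Any) -> List[str]:
    """Collect email addresses from arbitrarily nested dicts, lists, and strings."""
    if isinstance(value, str):
        # Surrounding quotes, angle brackets, and line endings are dropped
        # because only the matched address is returned.
        return EMAIL_RE.findall(value)
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, (list, tuple)):
        children = list(value)
    else:
        return []  # numbers, None, and other types carry no address
    found: List[str] = []
    for child in children:
        found.extend(find_emails(child))
    return found
```

For example, a header dictionary holding "Shop" <billing@example.com> under one key and a list with a folded Reply-To line under another yields just the two bare addresses.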
Selenium Grid Fallback
The Parfumdreams_Manager connects to a remote Selenium Grid container (selenium:4444).
- Session Management: It includes a cleanup_all_selenium_sessions method that queries the Grid's status API and kills orphaned sessions to prevent resource exhaustion.
- Browser: Uses Chrome in headless mode.
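A session cleanup of this kind might look like the sketch below. It assumes a Selenium Grid 4 style /status payload, in which each node lists slots and an occupied slot carries a session object with a sessionId, and it ends sessions with the WebDriver DELETE /session/{id} call. The function names are hypothetical, and the sketch treats every active session as a cleanup candidate because the source does not say how orphaned sessions are identified.

```python
import json
from typing import Any, Dict, List
from urllib.request import Request, urlopen

GRID_URL = "http://selenium:4444"  # host and port as given in the text above


def active_session_ids(status: Dict[str, Any]) -> List[str]:
    """Extract session ids from a Grid status payload (assumed Grid 4 shape)."""
    session_ids = []
    for node in status.get("value", {}).get("nodes", []):
        for slot in node.get("slots", []):
            session = slot.get("session")  # None when the slot is free
            if session and session.get("sessionId"):
                session_ids.append(session["sessionId"])
    return session_ids


def cleanup_sessions(grid_url: str = GRID_URL, timeout: float = 10.0) -> List[str]:
    """Query the Grid status and delete every session it reports."""
    with urlopen(f"{grid_url}/status", timeout=timeout) as response:
        status = json.load(response)
    closed = []
    for session_id in active_session_ids(status):
        request = Request(f"{grid_url}/session/{session_id}", method="DELETE")
        with urlopen(request, timeout=timeout):
            closed.append(session_id)
    return closed
```

Separating the payload parsing from the network calls makes the parsing step easy to check against a captured status response.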
Month-Based Batching
To avoid hitting API rate limits or exhausting memory, the download process (download_all_invoices_via_gmail) is chunked by month. This allows granular control over historical data imports.
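One way to express month-sized chunks is to generate a date window per month and turn each window into Gmail's after: and before: search operators. The sketch below illustrates that approach; it is not the body of download_all_invoices_via_gmail, the helper names are hypothetical, and how Gmail treats the boundary dates and time zones should be verified before relying on it.

```python
from datetime import date
from typing import Iterator, Tuple


def month_windows(year: int) -> Iterator[Tuple[date, date]]:
    """Yield (first day of month, first day of next month) for each month."""
    for month in range(1, 13):
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        yield start, end


def gmail_date_filter(start: date, end: date) -> str:
    """Format one window using Gmail's YYYY/MM/DD date operators."""
    return f"after:{start:%Y/%m/%d} before:{end:%Y/%m/%d}"
```

Each filter string can be appended to a label query, so that one request covers a single month and a failed month can be retried on its own.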