Condensing some information so that others can benefit from my experience with importing CSVs into GnuCash.
Having several banks means some difficulty in getting a proper overview of your finances. Hence, I use GnuCash to import all transactions and get a good overview of my finances.
Getting the data into GnuCash is quite tedious though. Banks don’t typically provide API access. So every once in a while, I log on to each bank, one by one, and download
exported transactions. Sometimes, these are provided as CSVs. Sometimes, they are provided as Excel (.xlsx
) files. Sometimes, the UI will say it’s an Excel file but what
you end up downloading is a CSV file.
No matter. Pandas can handle both. Pandas is awesome. You should use it. But I digress.
Recommendations
- Pre-format your data. Do not take your CSV straight from your bank to GnuCash. Massage the data a bit first.
- When possible, use multi-split CSV files. This will save you a lot of time and swearing instead of relying on GnuCash’s bayesian matching algorithm.
- Generate as few CSV files as possible for import into GnuCash. Also a time saver.
Pre-formatting
You should generate two files, one for single transactions, one for multi-split transactions.
Single transactions specify a single account for the transaction, while multi-split transactions specify multiple accounts associated with the transaction.
GnuCash requires the entire file to be multi-split or not multi-split. You can’t mix and match in a single file.
Single transactions
You want to format your data so that it corresponds to how GnuCash wants it. These are the minimum columns that you’ll want to have in your CSV:
- Account, the account name in GnuCash to import the transaction into
- Date, in yyyy-mm-dd format
- Description, the text that describes the transaction
- Amount, the amount of the transaction
These columns will let you smoothly import transactions into GnuCash.
Multi-split transactions
Use these columns to import multi-split transactions:
- TransactionId, a unique identifier for the transaction
- Account, the account name in GnuCash to import the transaction into
- Date, in yyyy-mm-dd format
- Description, the text that describes the transaction
- Amount, the amount of the transaction
- Value, the amount of the transaction in the original currency (I’ll explain why this is important later)
- Commodity/Currency, the currency or commodity of the transaction
- Price/Rate, the exchange rate used for the transaction
Why use multi-split transactions?
Several reasons:
- A certain description is always a certain kind of Expense
- You are transferring money between your own accounts, like between Savings accounts
- You are buying/selling stocks
- Your are convering between currencies are part of the transaction
It all boils down to: you know where the money came from and where it is going. Let’s just get that into the CSV.
Use transactionIds to match multi-split transactions
Multi-split transactions are just several transactions listed in the same file. GnuCash will group these transactions based on them having the same TransactionId
.
So related transactions must share the same TransactionId
.
The simplest case if for two transactions to be related. One is the debit, one is the credit.
But it is easy to extend this to more. Consider a mortgage payment. Three transactions will be involved:
- TX1: Money leaving your bank account
- TX2: Paying the interest
- TX3: Paying down the principal
All three transactions will have the same TransactionId
. The Values (not necessarily amounts) will also sum to zero.
What is Value and what is Amount?
We have to consider a currency exchange for this to make sense.
- TX1: 100 USD leaves account A
- TX2: 90 EUR arrives in account B
In this case the transactions will look like this:
TransactionId | Account | Amount | Value | Commodity | Price |
---|---|---|---|---|---|
123 | Assets:AccountA | -100 | -100 | CURRENCY:USD | 1 |
123 | Assets:AccountB | 90 | 100 | CURRENCY:EUR | 100/90 |
What tripped me up was that the price is always expressed as the source currency divided by the destination currency. Which seems opposite to me…
The Amount
is the amount in the account currency. The Value
is the amount in the “source” currency.
So as you can see, Amount
doesn’t necessarily sum to zero, but Value
should always do so.
Show me some code
I keep it simple, and work with Dicts. Because Pandas can handle lists of dicts to generate a dataframe which it can then write to a CSV.
Given you’ve parsed your raw CSV, you can create a transaction with:
def make_single_transaction(
transaction_id: str,
account: str,
date: str,
description: str,
amount: float,
commodity: str | None = None,
balance: float | None = None,
**kwargs,
) -> dict[str, str | float | None]:
return {
"TransactionId": transaction_id,
"Account": account,
"Date": date,
"Description": description,
"Amount": amount,
"Value": amount,
"Commodity/Currency": commodity,
"Price/Rate": 1,
"Balance": balance,
**kwargs,
}
Similarly for multi-split transactions, not much different:
def make_multisplit_transaction(
transaction_id: str,
account: str,
date: str,
description: str,
amount: float,
value: float,
commodity: str | None,
price: float | None = None,
balance: float | None = None,
) -> dict[str, str | float | None]:
price = price if price is not None else 1
return {
"TransactionId": transaction_id,
"Account": account,
"Date": date,
"Description": description,
"Amount": amount,
"Value": value,
"Commodity/Currency": commodity,
"Price/Rate": 1 if price is None else price,
"Balance": balance,
}
You can generate a transaction ID by combining the account name and all columns in your raw CSV row to get a reasonably unique identifier. I like to generate UUIDs from this because it looks nicer.
import uuid
transaction_id = uuid.uuid5(uuid.NAMESPACE_DNS, f"{account}{date}{description}{amount}")
If you have more columns which makes the row more unique, stick em in there. If you have a balance column, that should hopefully change for each transaction.
Now if you have a transaction, and you know where the money is going, you can use this function to generate a matching transaction:
def make_matching_transaction(
transaction: dict[str, str | float | None],
transfer_account: str,
amount: float | None = None,
) -> dict[str, str | float | None]:
return make_multisplit_transaction(
transaction_id=transaction["TransactionId"],
account=transfer_account,
date=transaction["Date"],
description=transaction["Description"],
amount= amount if amount is not None else -transaction["Amount"],
value=amount if amount is not None else -transaction["Value"],
commodity=transaction["Commodity/Currency"],
price=transaction["Price/Rate"],
)
Use it like this:
tx1 = make_single_transaction(
transaction_id=transaction_id,
account="Assets:Bank:Card",
date=date,
description=description,
amount=amount,
)
tx2 = make_matching_transaction(
transaction=tx1,
transfer_account="Expenses:Groceries",
)
And here is a utility function to generate a currency exchange:
def make_currency_exchange_transaction(
transaction_id: str,
src_account: str,
dest_account: str,
date: str,
description: str,
src_amount: float,
dest_amount: float,
src_currency: str,
src_balance: float | None = None,
dest_balance: float | None = None,
) -> list[dict[str, str | float | None]]:
if src_amount > 0:
raise ValueError("src_amount should be negative")
if dest_amount < 0:
raise ValueError("dest_amount should be positive")
src = make_multisplit_transaction(
transaction_id=transaction_id,
account=src_account,
date=date,
description=description,
amount=src_amount,
value=src_amount,
price=1,
commodity=src_currency,
balance=src_balance,
)
dest = make_multisplit_transaction(
transaction_id=transaction_id,
account=dest_account,
date=date,
description=description,
amount=dest_amount,
value=-src_amount,
price=abs(dest_amount) / abs(src_amount),
commodity=src_currency,
balance=dest_balance,
)
return [src, dest]
And writing to file a simple matter of:
import pandas as pd
def write_to_file(self, list_of_txns: list[dict], output_dir: str):
if len(list_of_txns) > 0:
output_file = os.path.join(output_dir, f"gnucash_txns.csv")
with open(output_file, "w") as f:
df = pd.DataFrame(list_of_txns)
df.to_csv(f, index=False)