update instructions for TER, OGC, Performance Fees

Blade He 2024-12-02 11:45:19 -06:00
parent 276ff93a1d
commit 352886ade2
4 changed files with 84 additions and 2 deletions

@@ -35,6 +35,7 @@
"Ratio de gastos totales"
],
"german": [
"TER inkl. Performance-Fee in % **)",
"Gesamtgebühren",
"Kostenpauschale",
"Gesamtkostenquote",
@@ -242,6 +243,7 @@
"Gastos corrientes en porcentaje 3)"
],
"german": [
"Ongoing Charges inkl.Performance-Fee in % **)",
"Laufende Kosten",
"Laufende Kosten in Prozent",
"Laufende Kosten 1)",
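The new German keywords are presumably matched against column headers extracted from PDF text. A minimal sketch of such matching, assuming a whitespace- and case-insensitive substring test; `KEYWORDS`, `normalize` and `match_datapoint` are hypothetical names, not the repository's code. Normalizing whitespace matters here because the OGC keyword is written "inkl.Performance-Fee" without a space, while the extracted headers in the examples break across lines after "inkl.".

```python
import re
from typing import Optional

# Hypothetical subset of the German keyword lists shown above.
KEYWORDS = {
    "ter": ["TER inkl. Performance-Fee in % **)", "Gesamtkostenquote"],
    "ogc": ["Ongoing Charges inkl.Performance-Fee in % **)", "Laufende Kosten"],
}


def normalize(text: str) -> str:
    """Lower-case and drop all whitespace, including newlines from PDF extraction."""
    return re.sub(r"\s+", "", text).lower()


def match_datapoint(header: str) -> Optional[str]:
    """Return the first datapoint whose keyword occurs in the header, else None."""
    norm_header = normalize(header)
    for datapoint, keywords in KEYWORDS.items():
        if any(normalize(keyword) in norm_header for keyword in keywords):
            return datapoint
    return None
```

With this normalization, "Ongoing Charges inkl. \nPerformance-Fee in % **)" and "Ongoing Charges inkl.Performance-Fee in % **)" map to the same key, and an "exkl." header matches neither list.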

@@ -68,12 +68,19 @@
"- With \"TER with Performance Fee\" and \"Fund TER\", pick up the values from \"TER with Performance Fee\".",
"- With \"TER (en %) (with performance)\" and \"TER(en %) (without performance)\", pick up the values from \"TER (en %) (with performance)\".",
"- With \"TER including Performance Fee\" and \"TER excluding Performance Fee\", pick up the values from \"TER including Performance Fee\".",
"- With \"TER inkl. Performance-Fee in % **)\" and \"TER exkl. Performance-Fee in % **)\", pick up the values from \"TER inkl. Performance-Fee in % **)\".",
"- With \"TER inkl. Performance-Fee in % **)\" and \"TER inkl. Performance-Fee in % (inkl. Zielfonds)\", pick up the values from \"TER inkl. Performance-Fee in % **)\".",
"- With both \"Synthetic TER\" and \"Fund TER\", if \"Synthetic TER\" has value(s), pick up the value(s) from \"Synthetic TER\"; otherwise, pick up the value(s) from \"Fund TER\".",
"- With both \"Net TER\" and \"Capped Expense Ratio\", the priority is \"Net TER\": exclude the \"Capped Expense Ratio\" column and only pick up the values from \"Net TER\".",
"- With \"Gross TER\", \"Waiver\", \"Net TER\", \"Capped Expense Ratio\" as column titles, pick up the values from \"Net TER\".",
"- If \"Gross TER\" exists as a column title, please ignore that column.",
"Please ignore TER values that exclude performance fees."
],
"ogc": [
"If there are multiple OGC value columns, here are the priority rules:",
"- With \"Ongoing Charges inkl. Performance-Fee in % **)\" and \"Ongoing Charges exkl. Performance-Fee in % **)\", pick up the values from \"Ongoing Charges inkl. Performance-Fee in % **)\".",
"- With \"Ongoing Charges inkl. Performance-Fee in % **)\" and \"Ongoing Charges inkl. Performance-Fee in % (inkl. Zielfonds)\", pick up the values from \"Ongoing Charges inkl. Performance-Fee in % **)\"."
],
"performance_fee": [
"The performance fees should not be the rates at which the performance fees are calculated."
]
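The TER priority rules are written for an LLM, but the same pairwise preferences can also be applied deterministically, for example to cross-check extracted values. A minimal sketch, assuming each share-class row is already parsed into a dict from column title to value (None for an empty cell); `PREFERRED_OVER` and `select_ter` are hypothetical names, and the ("Net TER", "Waiver") pair is an interpretation of the Gross TER / Waiver / Net TER / Capped Expense Ratio rule rather than a rule stated verbatim.

```python
from typing import Dict, Optional

# Hypothetical encoding of a subset of the pairwise rules: (column kept, column dropped).
# ("Net TER", "Waiver") interprets the Gross TER / Waiver / Net TER / Capped Expense
# Ratio rule; it is not spelled out as a pair in the prompt.
PREFERRED_OVER = [
    ("TER with Performance Fee", "Fund TER"),
    ("TER including Performance Fee", "TER excluding Performance Fee"),
    ("TER inkl. Performance-Fee in % **)", "TER exkl. Performance-Fee in % **)"),
    ("TER inkl. Performance-Fee in % **)", "TER inkl. Performance-Fee in % (inkl. Zielfonds)"),
    ("Net TER", "Capped Expense Ratio"),
    ("Net TER", "Waiver"),
]


def select_ter(row: Dict[str, Optional[float]]) -> Optional[float]:
    """Pick the TER for one share-class row; return None if the rules leave it ambiguous."""
    # "Gross TER" is ignored whenever it appears as a column title.
    cols = {title: value for title, value in row.items() if title != "Gross TER"}
    # Remove every column that loses a pairwise comparison against a column present.
    for kept, dropped in PREFERRED_OVER:
        if kept in cols and dropped in cols:
            del cols[dropped]
    # Synthetic TER wins when it has a value; otherwise fall back to Fund TER.
    if "Synthetic TER" in cols and "Fund TER" in cols:
        del cols["Fund TER" if cols["Synthetic TER"] is not None else "Synthetic TER"]
    values = [value for value in cols.values() if value is not None]
    return values[0] if len(values) == 1 else None
```

Returning None instead of guessing when more than one candidate survives keeps unhandled column combinations visible rather than silently picking one.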
@@ -162,12 +169,20 @@
"The performance fees should be:",
"TER including performance fees - TER excluding performance fees.",
"Here is the example:",
"Example 1:",
"-----Example Start-----",
"GAMAX FUNDS FCP\\nClass\\nTER (excluding Performance Fees)\\nTER (including Performance Fees)\\nGAMAX FUNDS - ASIA PACIFIC\\nA\\n2.07%\\n2.07%\\n",
"-----Example End-----",
"The output should be:",
"{\"data\": [{\"fund name\": \"GAMAX FUNDS - ASIA PACIFIC\", \"share name\": \"A\", \"ter\": 2.07, \"performance_fee\": 0}]}",
"The performance fees value is TER (including Performance Fees) - TER (excluding Performance Fees) = 2.07 - 2.07 = 0",
"Example 2:",
"-----Example Start-----",
"D/S Strategie ausgewogen\\nErgänzende Angaben für Anleger in der Schweiz zum 31. Dezember 2020 (ungeprüft)\\nFonds\\nTER exkl. \\nPerformance-Fee in % **)\\nTER inkl. \\nPerformance-Fee in % **)\\nTER inkl. \\nPerformance-Fee in % (inkl. Zielfonds)\\n1,15\\n1,63\\n1,15\\n1,63\\nTER exkl.\\nPerformance-Fee in % (inkl. Zielfonds)",
"-----Example End-----",
"The output should be:",
"{\"data\": [{\"fund name\": \"D/S Strategie ausgewogen\", \"ter\": 1.15, \"performance_fee\": 0}]}",
"The performance fees value is TER inkl. Performance-Fee in % **) - TER exkl. Performance-Fee in % **) = 1,15 - 1,15 = 0",
"Case 2:",
"If a table has three value columns: \"TER including performance fees\", \"TER excluding performance fees\", \"Performance fees\", ",
"The value in the \"Performance fees\" column should be \"-\", because \"TER including performance fees\" - \"TER excluding performance fees\" = 0, ",
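Both examples apply the rule "TER including performance fees - TER excluding performance fees", and Example 2 gives the values as German decimals ("1,15"). A minimal sketch of that arithmetic, assuming values arrive as strings such as "2.07%" or "1,15" with no thousands separators; `parse_percent` and `performance_fee` are hypothetical helpers. It subtracts with `decimal.Decimal` because plain binary floats do not yield exactly 0.08 for 1.58 - 1.50.

```python
from decimal import Decimal


def parse_percent(raw: str) -> Decimal:
    """Parse '2.07%', '1,15' or '1.50 %' into a Decimal; assumes no thousands separators."""
    return Decimal(raw.strip().rstrip("%").strip().replace(",", "."))


def performance_fee(incl: str, excl: str) -> float:
    """Performance fee = value including performance fees - value excluding them."""
    return float(parse_percent(incl) - parse_percent(excl))


print(performance_fee("2.07%", "2.07%"))  # GAMAX example: 0.0
print(performance_fee("1,15", "1,15"))    # D/S Strategie ausgewogen example: 0.0
```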
@@ -184,6 +199,30 @@
"The output should be:",
"{\"data\": [{\"fund name\": \"Pictet Corto Europe Long Short\", \"share name\": \"P EUR\", \"ter\": 1.98, \"performance_fee\": 0}]}"
]
},
{
"title": "Performance fees are part of OGC:",
"contents": [
"If both \"Ongoing Charges including Performance Fee\" and \"Ongoing Charges excluding Performance Fee\" exist,",
"the OGC should be \"Ongoing Charges including Performance Fee\".",
"The performance fees should be:",
"Ongoing Charges including Performance Fee - Ongoing Charges excluding Performance Fee.",
"Here is the example:",
"Example 1:",
"-----Example Start-----",
"GAMAX FUNDS FCP\\nClass\\nOngoing Charges (excluding Performance Fees)\\nOngoing Charges (including Performance Fees)\\nFund 1\\nShare A\\n1.50%\\n1.58%\\n",
"-----Example End-----",
"The output should be:",
"{\"data\": [{\"fund name\": \"Fund 1\", \"share name\": \"Share A\", \"ogc\": 1.58, \"performance_fee\": 0.08}]}",
"The performance fees value is Ongoing Charges including Performance Fee - Ongoing Charges excluding Performance Fee = 1.58 - 1.50 = 0.08",
"Example 2:",
"-----Example Start-----",
"Fund1\\nOngoing Charges exkl. \\nPerformance-Fee in % **)\\nOngoing Charges exkl. \\nPerformance-Fee in % (inkl. Zielfonds)\\nOngoing Charges inkl. \\nPerformance-Fee in % **)\\nOngoing Charges inkl. \\nPerformance-Fee in % (inkl. Zielfonds)\\n1,15\\n1,35\\n1,20\\n1,35\\n",
"-----Example End-----",
"The output should be:",
"{\"data\": [{\"fund name\": \"Fund1\", \"ogc\": 1.20, \"performance_fee\": 0.05}]}",
"The performance fees value is Ongoing Charges inkl. Performance-Fee in % **) - Ongoing Charges exkl. Performance-Fee in % **) = 1.20 - 1.15 = 0.05"
]
}
]
},
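The OGC examples use the same `{"data": [...]}` shape as the TER ones, with "ogc" in place of "ter" and no "share name" when the table has no share-class column (Example 2). A sketch of serializing one row in that shape with the standard `json` module; `build_output` is a hypothetical helper, and the incl./excl. inputs are assumed to be already-parsed Decimals.

```python
import json
from decimal import Decimal
from typing import Dict, Optional


def build_output(fund: str, share: Optional[str], datapoint: str,
                 incl: Decimal, excl: Decimal) -> str:
    """Serialize one row in the {"data": [...]} shape used by the prompt examples."""
    record: Dict[str, object] = {"fund name": fund}
    if share is not None:  # e.g. the "Fund1" example has no share-class column
        record["share name"] = share
    record[datapoint] = float(incl)  # OGC/TER is the value including performance fees
    record["performance_fee"] = float(incl - excl)
    return json.dumps({"data": [record]}, ensure_ascii=False)


print(build_output("Fund 1", "Share A", "ogc", Decimal("1.58"), Decimal("1.50")))
# {"data": [{"fund name": "Fund 1", "share name": "Share A", "ogc": 1.58, "performance_fee": 0.08}]}
```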

@@ -1194,12 +1194,12 @@ if __name__ == "__main__":
"534535767"
]
special_doc_id_list = check_db_mapping_doc_id_list
# special_doc_id_list = ["334584772"] special_doc_id_list = ["334584772"]
output_mapping_child_folder = r"/data/emea_ar/output/mapping_data/docs/"
output_mapping_total_folder = r"/data/emea_ar/output/mapping_data/total/"
re_run_extract_data = True
re_run_mapping_data = True
force_save_total_data = True force_save_total_data = False
calculate_metrics = False
extract_ways = ["text"]
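In this commit the run is narrowed to one document by uncommenting `special_doc_id_list = ["334584772"]`, and `force_save_total_data` is flipped from True to False, which means editing the `__main__` block for each run. A hedged sketch of exposing the same toggles as command-line flags with `argparse` (Python 3.9+ for `BooleanOptionalAction`); the flag names mirror the variables and the defaults mirror the new side of the diff, but the parser itself is hypothetical and not part of the repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the hard-coded toggles in the __main__ block."""
    parser = argparse.ArgumentParser(description="EMEA AR extraction/mapping run (sketch)")
    # Repeatable; None means "use the full document list" (check_db_mapping_doc_id_list).
    parser.add_argument("--doc-id", dest="doc_ids", action="append", default=None)
    parser.add_argument("--output-mapping-child-folder",
                        default="/data/emea_ar/output/mapping_data/docs/")
    parser.add_argument("--output-mapping-total-folder",
                        default="/data/emea_ar/output/mapping_data/total/")
    # Each boolean also gets a --no-... variant from BooleanOptionalAction.
    parser.add_argument("--re-run-extract-data", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--re-run-mapping-data", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--force-save-total-data", action=argparse.BooleanOptionalAction, default=False)
    parser.add_argument("--calculate-metrics", action=argparse.BooleanOptionalAction, default=False)
    # default=None avoids argparse appending to a shared default list; fall back to ["text"].
    parser.add_argument("--extract-way", dest="extract_ways", action="append", default=None)
    return parser
```

A single-document debug run would then be `--doc-id 334584772 --no-force-save-total-data` instead of a source edit, with `args.extract_ways or ["text"]` restoring the current default.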

@@ -572,6 +572,47 @@
"list(document_mapping[\"ShareClassName\"].unique())"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pymupdf4llm"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Processing ./data/emea_ar/pdf/501380553.pdf...\n",
"[ ] (0/47[ ] ( 1/47[= ] ( 2/4[== ] ( 3/47[=== ] ( 4/4[==== ] ( 5/47[===== ] ( 6/47[===== ] ( 7/4[====== ] ( 8/47[======= ] ( 9/4[======== ] (10/47[========= ] (11/4[========== ] (12/47[=========== ] (13/47[=========== ] (14/4[============ ] (15/47[============= ] (16/4[============== ] (17/47[=============== ] (18/4[================ ] (19/47[================= ] (20/47[================= ] (21/4[================== ] (22/47[=================== ] (23/4[==================== ] (24/47[===================== ] (25/4[====================== ] (26/4[====================== ] (27/47[======================= ] (28/4[======================== ] (29/47[========================= ] (30/4[========================== ] (31/47[=========================== ] (32/4[============================ ] (33/4[============================ ] (34/47[============================= ] (35/4[============================== ] (36/47[=============================== ] (37/4[================================ ] (38/47[================================= ] (39/4[================================== ] (40/4[================================== ] (41/47[=================================== ] (42/4[==================================== ] (43/47[===================================== ] (44/4[====================================== ] (45/47[======================================= ] (46/47[========================================] (47/47]\n"
]
},
{
"data": {
"text/plain": [
"107851"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"md_text = pymupdf4llm.to_markdown(r\"./data/emea_ar/pdf/501380553.pdf\")\n",
"\n",
"# now work with the markdown text, e.g. store as a UTF8-encoded file\n",
"import pathlib\n",
"pathlib.Path(r\"./data/emea_ar/output/markdown/501380553.md\").write_bytes(md_text.encode())"
]
},
{
"cell_type": "code",
"execution_count": null,
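The notebook cell converts a single report, `501380553.pdf`, and `write_bytes` returns the number of bytes written (107851). To do the same for a whole folder, a sketch along these lines should work, assuming the `./data/emea_ar/pdf` and `./data/emea_ar/output/markdown` layout from the cell; `markdown_path_for` and `convert_folder` are hypothetical helpers. `pymupdf4llm.to_markdown` takes a file path and returns a Markdown string, as in the cell, and `Path.write_text(..., encoding="utf-8")` stores the same UTF-8 bytes as `write_bytes(md_text.encode())`.

```python
from pathlib import Path
from typing import List


def markdown_path_for(pdf_path: Path, output_dir: Path) -> Path:
    """Map .../<doc_id>.pdf to <output_dir>/<doc_id>.md."""
    return output_dir / f"{pdf_path.stem}.md"


def convert_folder(pdf_dir: str = "./data/emea_ar/pdf",
                   output_dir: str = "./data/emea_ar/output/markdown") -> List[Path]:
    """Convert every PDF in pdf_dir to Markdown, skipping files already converted."""
    import pymupdf4llm  # imported lazily so the path helper works without the package

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        target = markdown_path_for(pdf_path, out)
        if target.exists():
            continue  # keep earlier conversions; delete the .md to force a re-run
        md_text = pymupdf4llm.to_markdown(str(pdf_path))
        target.write_text(md_text, encoding="utf-8")
        written.append(target)
    return written
```

Skipping existing `.md` files is a design choice for long batches (the 47-page report above already takes a while); drop the `exists()` check if every run should regenerate the Markdown.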