Commit Graph

90 Commits

Author SHA1 Message Date
Blade He d79b05885d optimize prompts for TOR 2024-12-06 14:50:34 -06:00
Blade He a25991e2bb 1. Set TOR reported name priority
2. Optimize investment mapping logic
2024-12-06 09:54:43 -06:00
Blade He 95c386911c Clean fund name after getting response from ChatGPT 2024-12-04 22:08:09 -06:00
Blade He 70362b554f Fix issue for "The last fund name of previous PDF page" logic:
If the current page's fund name starts with "The last fund name of previous PDF page" and is followed by more content, remove the "The last fund name of previous PDF page" prefix.
2024-12-04 16:57:52 -06:00
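A minimal sketch of the cleanup described in this commit, assuming the extracted fund name is a plain string. The function name `strip_carried_prefix` and the separator handling are hypothetical and not taken from the repository:

```python
PREFIX = "The last fund name of previous PDF page"


def strip_carried_prefix(fund_name: str) -> str:
    """Drop the carry-over prefix only when more content follows it."""
    text = fund_name.strip()
    if text.startswith(PREFIX):
        # Remove the prefix plus any separator characters right after it
        # (assumed separators: space, colon, newline).
        rest = text[len(PREFIX):].lstrip(" :\n")
        if rest:
            return rest
    # No prefix, or nothing after it: leave the value unchanged.
    return text
```

With this sketch, a value consisting of the prefix alone is returned as-is, since there is nothing below it to keep.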
Blade He 36fbaa946e Add a statement when carrying over the last fund name of the previous PDF page:
The last fund name of previous PDF page:
page_text = f"\nThe last fund name of previous PDF page: {previous_page_fund_name}\n{page_text}"
2024-12-03 11:50:31 -06:00
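The f-string in this commit can be wrapped in a small helper. This is a sketch only: the function name `add_previous_fund_name` and the guard for a missing fund name are assumptions, while the inserted statement is copied from the commit message:

```python
from typing import Optional


def add_previous_fund_name(page_text: str, previous_page_fund_name: Optional[str]) -> str:
    """Prefix the page text with the last fund name seen on the previous page."""
    if not previous_page_fund_name:
        # Assumed behaviour: nothing to carry over, keep the page text unchanged.
        return page_text
    return f"\nThe last fund name of previous PDF page: {previous_page_fund_name}\n{page_text}"
```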
Blade He a11a99fdc3 1. Optimize instructions: do not fetch data that is stated with "up to".
2. Add exception handler in function.
2024-12-03 11:27:28 -06:00
Blade He bc32860f87 remove_abundant_data 2024-12-02 17:16:56 -06:00
Blade He c146497052 optimize share feature judgment logic:
accumulation with capitalisation and institutional
income with distribution

Document: 337293427
2024-12-02 13:11:49 -06:00
Blade He 352886ade2 update instructions for TER, OGC, Performance Fees 2024-12-02 11:45:19 -06:00
Blade He 276ff93a1d Optimize drilldown algorithm
Share class names with currency
Reason
The currency in the document is not next to the share name.
Solution
If no relevant text can be found in the PDF page contents and the last word of the share class name is a currency, remove the currency from the share class name and try again.
After implementing this solution, recall improved from 95% to 96%.
Can't find relevant text in the current PDF page text
Reason
The relevant text may be on the previous page, so the previous page text should be merged into the current page.
Solution
Get the previous page text and search it for the relevant value.
After implementing this solution, recall improved from 96% to 98%.
2024-11-26 16:35:07 -06:00
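The two fallbacks in this commit can be sketched as a single lookup. Everything named here is hypothetical (`locate_share_class`, the short `CURRENCIES` set, plain substring search); the repository's drilldown code may work differently:

```python
from typing import Optional, Tuple

# Assumed, deliberately short currency list for illustration.
CURRENCIES = {"USD", "EUR", "GBP", "CHF", "JPY"}


def locate_share_class(
    name: str, page_text: str, previous_page_text: str = ""
) -> Optional[Tuple[str, str]]:
    """Return (matched_text, source_page) or None, trying looser searches in turn."""
    candidates = [name]
    words = name.split()
    # Fallback 1: if the last word is a currency, also try the name without it.
    if len(words) > 1 and words[-1].upper() in CURRENCIES:
        candidates.append(" ".join(words[:-1]))
    # Fallback 2: search the current page first, then the previous page.
    for source, text in (("current", page_text), ("previous", previous_page_text)):
        for candidate in candidates:
            if candidate in text:
                return candidate, source
    return None
```

For example, `locate_share_class("Class A EUR", "Class A 1.5% EUR")` finds `"Class A"` on the current page after the currency is dropped.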
Blade He a09778d9d1 Create EMEA AR API code file.
Optimize annotation list for drilldown.
2024-11-26 11:24:29 -06:00
Blade He fb356fce76 1. optimize drilldown algorithm
2. support calculating drilldown recall metrics
2024-11-25 15:11:03 -06:00
Blade He 78fb283130 update python libraries 2024-11-25 11:11:02 -06:00
Blade He fc80093557 optimize investment mapping 2024-11-22 14:54:52 -06:00
Blade He f1c0290588 Optimize investment mapping algorithm.
1. Get the proper currency if multiple currencies exist in the share name, e.g. CHF EUR
2. The default currency should be based on the scenario: USD or EUR.
3. Special character removal should be based on \W instead of [^a-zA-Z0-9\s]
2024-11-21 16:36:58 -06:00
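Point 3 matters for multilingual names: in Python 3, `\W` on a `str` pattern is Unicode-aware, so accented letters survive, whereas `[^a-zA-Z0-9\s]` strips them. The sketch below contrasts the two. The function names are hypothetical, and replacing with a space before collapsing whitespace is an assumption made here because `\W` also matches whitespace:

```python
import re


def clean_ascii_only(name: str) -> str:
    """Old approach: drops every character outside ASCII letters, digits, whitespace."""
    return re.sub(r"[^a-zA-Z0-9\s]", "", name)


def clean_word_chars(name: str) -> str:
    """\\W-based approach: keeps Unicode letters and digits (and underscores)."""
    # \W also matches whitespace, so substitute a space and collapse runs afterwards.
    spaced = re.sub(r"\W", " ", name)
    return re.sub(r"\s+", " ", spaced).strip()
```

A name with accented letters and parentheses loses its accented letters under the first function but keeps them under the second.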
Blade He 5b9f9416de 1. Update for mapping multilingual share class names.
2. Optimize currency retrieval logic
2024-11-21 11:37:58 -06:00
Blade He 843bbbd13f Dynamically load instructions for multilingual documents. 2024-11-20 17:00:22 -06:00
Blade He 067d89e0f9 Add datapoint_reportedname.json for dynamically loading reported names based on document language. 2024-11-19 16:49:15 -06:00
Blade He 8223ca9a5c a little change 2024-11-18 16:13:24 -06:00
Blade He a42c0b5c2b optimize retrieve fund instructions 2024-11-13 10:25:08 -06:00
Blade He 7a41b03634 1. optimize instructions for fund name
2. optimize drilldown logic
2024-11-12 17:01:10 -06:00
Blade He c2d2e54670 "total match" logic for single-word values needs to consider the "\n" character scenario 2024-11-12 11:40:19 -06:00
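One plausible reading of "total match" is a whole-value match delimited by whitespace, where a newline counts as a delimiter just like a space. The sketch below illustrates that reading only; the function name `is_total_match` and the whitespace-boundary rule are assumptions, not the repository's logic:

```python
import re


def is_total_match(value: str, text: str) -> bool:
    """True if `value` appears in `text` bounded by whitespace or the text edges."""
    # (?<!\S) and (?!\S) treat "\n", spaces, and string boundaries alike.
    pattern = rf"(?<!\S){re.escape(value)}(?!\S)"
    return re.search(pattern, text) is not None
```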
Blade He 5b67bd332b optimize drilldown algorithm 2024-11-12 11:20:38 -06:00
Blade He c6c3e99d3e integrate pdf drilldown logic to pdf_util.py 2024-11-11 16:34:25 -06:00
Blade He c34e2e960e optimize drilldown algorithm 2024-11-08 15:00:34 -06:00
Blade He 81f855f725 support drilldown data to PDF 2024-11-08 11:22:35 -06:00
Blade He 0349033eaf update for more statistics methods 2024-11-06 16:39:42 -06:00
Blade He 81a424b00d Support replacing share class names in the database with more readable forms.
Example document: 532422720
M&G European Credit Investment Fund A CHFH Acc -> M&G European Credit Investment Fund A CHF H Accumulation

M&G European Credit Investment Fund A CHFHInc -> M&G European Credit Investment Fund A CHF H Income

M&G European High Yield Credit Investment Fund E GBPHedgedAcc -> M&G European High Yield Credit Investment Fund E GBP Hedged Accumulation
2024-11-05 11:14:56 -06:00
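The three examples above can be reproduced with a few regular expressions. This is a sketch built only from those examples: the currency subset, the abbreviation table, and the function name `make_readable` are assumptions, and real data likely needs more rules:

```python
import re

CURRENCIES = ("CHF", "GBP", "EUR", "USD")  # assumed subset
ABBREVIATIONS = {"Acc": "Accumulation", "Inc": "Income"}  # assumed table

_CCY = "|".join(CURRENCIES)


def make_readable(name: str) -> str:
    """Expand a compact share class name into a more readable one."""
    # 1. Split a currency code from letters glued to it: "CHFH" -> "CHF H".
    name = re.sub(rf"\b({_CCY})(?=[A-Za-z])", r"\1 ", name)
    # 2. Split a trailing Acc/Inc from the letters before it: "HInc" -> "H Inc".
    name = re.sub(r"(?<=[A-Za-z])(Acc|Inc)$", r" \1", name)
    # 3. Expand the trailing abbreviation: "Acc" -> "Accumulation".
    return re.sub(r"\b(Acc|Inc)$", lambda m: ABBREVIATIONS[m.group(1)], name)
```

The currency match is case-sensitive, so "European" is not split at "Eur".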
Blade He 2645d528b1 support output data point reported name 2024-10-29 16:47:45 -05:00
Blade He 9d453c9fae a few small updates 2024-10-28 15:15:55 -05:00
Blade He fa763f4f14 1. optimize instructions
2. optimize mapping algorithm
2024-10-24 16:24:21 -05:00
Blade He 53dadf61f4 optimize keywords/instructions for special-case documents. 2024-10-23 16:56:43 -05:00
Blade He 171f3b6d1f optimize for OGC data extraction. 2024-10-23 16:07:54 -05:00
Blade He 03365227b9 optimize instructions 2024-10-21 11:04:53 -05:00
Blade He 3f2bb38208 Resolve issue where the first records have only a share class name but no fund name (in previous page text). 2024-10-16 16:55:32 -05:00
Blade He f166e73362 optimize data extraction algorithm: if a numeric cost value can't be found in the PDF page text, extract the data with Vision ChatGPT 2024-10-15 15:57:54 -05:00
Blade He 8b651f374c optimize instructions 2024-10-14 09:12:05 -05:00
Blade He df66489c5f support this scenario: fund and share have the same name. 2024-10-11 13:14:04 -05:00
Blade He 92a26cd262 optimize configuration 2024-10-11 12:16:34 -05:00
Blade He 17284c74f0 optimize for investment mapping: share feature logic 2024-10-09 14:07:07 -05:00
Blade He 04a2409c58 optimize investment mapping algorithm 2024-10-08 23:53:55 -05:00
Blade He aa2c2332ae optimize for more cases 2024-10-08 17:16:01 -05:00
Blade He 8bd6008425 refactor code 2024-10-07 10:34:13 -05:00
Blade He b18c48efeb A little change 2024-10-03 16:31:16 -05:00
Blade He f0dd7f9e89 Consider cases with multiple share short names. 2024-10-02 17:25:25 -05:00
Blade He edb90c718e Optimize mapping algorithm
Some share class names have multiple short names, e.g.
CPR Invest Global Disruptive Opportunities Class I sw EUR - Acc
The short names are I and sw
The purpose is to support getting all short names from a share class name.
2024-10-02 15:08:26 -05:00
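A rough sketch of how all short names could be collected from the example above. The heuristic used here (take the tokens after the word "Class" until a currency or "-" appears) is an assumption for illustration, as are the function name `get_short_names` and the currency subset:

```python
from typing import List

CURRENCIES = {"USD", "EUR", "GBP", "CHF"}  # assumed subset


def get_short_names(share_class_name: str) -> List[str]:
    """Collect every short name that follows the word "Class"."""
    tokens = share_class_name.split()
    if "Class" not in tokens:
        return []
    short_names = []
    for token in tokens[tokens.index("Class") + 1:]:
        # Stop at the first currency or separator (assumed end of the short names).
        if token.upper() in CURRENCIES or token == "-":
            break
        short_names.append(token)
    return short_names
```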
Blade He 3bb13947af Optimize mapping algorithm:
If multiple currencies exist in the fund/share name and USD is one of them, remove USD.
Fix the issue with splitting words that have no space between them.
If there is no currency in the share class name, try to get the currency from the document mapping entry that has the same fund name and the same short share class name.
2024-10-02 13:25:08 -05:00
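The currency rules in this commit can be sketched as follows. The function name `pick_currency`, the currency subset, and the `fallback` parameter (standing in for the currency found in the document mapping) are all hypothetical:

```python
from typing import Optional

CURRENCIES = {"USD", "EUR", "GBP", "CHF"}  # assumed subset


def pick_currency(name: str, fallback: Optional[str] = None) -> Optional[str]:
    """Choose one currency for a fund/share name, or fall back to a mapped one."""
    # Collect distinct currency tokens in order of appearance.
    found = list(dict.fromkeys(t for t in name.split() if t in CURRENCIES))
    # With several currencies present, USD is dropped in favour of the others.
    if len(found) > 1 and "USD" in found:
        found.remove("USD")
    if found:
        return found[0]
    # No currency in the name: use the one taken from the document mapping, if any.
    return fallback
```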
Blade He f06355e0c8 optimize mapping algorithm: check whether "-" is used to connect share names 2024-10-02 11:38:11 -05:00
Blade He 035f028155 optimize mapping algorithm 2024-10-01 16:46:59 -05:00
Blade He 3adbd7631a optimize mapping algorithm 2024-10-01 15:31:15 -05:00