Blade He
201a809ffa
comment remove_abundant_data function
2025-01-06 15:27:43 -06:00
Blade He
c335992ced
update requirements.txt
2025-01-06 13:56:09 -06:00
Blade He
9348e32caa
support more performance fee keywords
2025-01-06 13:14:20 -06:00
Blade He
65e752e25a
Implement merge_output_data function; whether to output in this format depends on confirmation with the data/developer teams
2024-12-18 09:19:55 -06:00
Blade He
309bb714f6
Fix issue with parsing data via the Vision function.
2024-12-11 16:49:04 -06:00
Blade He
d673a99e21
Switch back to extracting data directly from the image stream, instead of first getting text from the image stream and then extracting data from that text.
Reason: the quality of the text extracted from the image stream is not good enough.
2024-12-10 16:17:47 -06:00
Blade He
f71e2968cc
simplify code
2024-12-09 22:24:40 -06:00
Blade He
75ea5e70de
1. Support fetching data from garbled-text (messy-code) pages via the ChatGPT4o Vision function.
2. Multilingual share features configuration
2024-12-09 17:47:42 -06:00
Blade He
d96f77fe00
Split share class names when multiple share classes appear on the same line
2024-12-06 16:31:42 -06:00
Blade He
d79b05885d
optimize prompts for TOR
2024-12-06 14:50:34 -06:00
Blade He
a25991e2bb
1. Set TOR reported name priority
2. Optimize investment mapping logic
2024-12-06 09:54:43 -06:00
Blade He
95c386911c
Clean fund name after getting response from ChatGPT
2024-12-04 22:08:09 -06:00
Blade He
70362b554f
Fix issue in "The last fund name of previous PDF page" logic:
If the current page fund name starts with "The last fund name of previous PDF page" and has more content below it, then remove "The last fund name of previous PDF page".
2024-12-04 16:57:52 -06:00
Blade He
36fbaa946e
Add a statement when passing the last fund name of the previous PDF page to the current page:
page_text = f"\nThe last fund name of previous PDF page: {previous_page_fund_name}\n{page_text}"
2024-12-03 11:50:31 -06:00
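The statement in this commit can be wrapped in a small helper. This is a minimal sketch, not the repository's code: the function name `add_previous_fund_context` and the "skip when there is no previous fund name" guard are assumptions; only the f-string itself comes from the commit message.

```python
from typing import Optional


def add_previous_fund_context(page_text: str,
                              previous_page_fund_name: Optional[str]) -> str:
    """Prefix the page text with the last fund name seen on the previous page.

    Hypothetical helper: only the f-string below is taken from the commit.
    """
    # Assumption: when nothing was carried over, leave the page text untouched.
    if not previous_page_fund_name:
        return page_text
    return (
        f"\nThe last fund name of previous PDF page: "
        f"{previous_page_fund_name}\n{page_text}"
    )
```

The prefix gives the model a fund name to attach to share classes that appear at the top of a page before any fund heading.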
Blade He
a11a99fdc3
1. Optimize instructions: do not fetch data that is qualified by an "up to" statement.
2. Add exception handler in function.
2024-12-03 11:27:28 -06:00
Blade He
bc32860f87
remove_abundant_data
2024-12-02 17:16:56 -06:00
Blade He
c146497052
optimize share feature judgment logic:
accumulation with capitalisation and institutional
income with distribution
Document: 337293427
2024-12-02 13:11:49 -06:00
Blade He
352886ade2
update instructions for TER, OGC, Performance Fees
2024-12-02 11:45:19 -06:00
Blade He
276ff93a1d
Optimize drilldown algorithm
Share class names with currency
Reason: the currency in the document is not next to the share name.
Solution: if no relevant text can be found in the PDF page contents and the last word of the share class name is a currency, remove the currency from the share class name and try again.
After implementing this solution, recall rose from 95% to 96%.
Can't find relevant text in the current PDF page text
Reason: the text may come from the previous page.
Solution: get the previous page and search it for the relevant value.
After implementing this solution, recall rose from 96% to 98%.
2024-11-26 16:35:07 -06:00
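The two fallbacks described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the actual drilldown code: `CURRENCIES`, `strip_trailing_currency` and `locate` are hypothetical names, matching is reduced to a plain substring test, and pages are represented as a list of page texts.

```python
from typing import List, Optional, Tuple

# Illustrative subset only; the real currency list is not shown in the log.
CURRENCIES = {"USD", "EUR", "GBP", "CHF", "JPY"}


def strip_trailing_currency(name: str) -> str:
    """Drop the last word of a share class name if it is a currency code."""
    words = name.split()
    if len(words) > 1 and words[-1].upper() in CURRENCIES:
        return " ".join(words[:-1])
    return name


def locate(name: str, pages: List[str], page_index: int) -> Optional[Tuple[int, str]]:
    """Return (page index, matched text), or None if nothing is found.

    Order of attempts: full name on the current page, name without its
    trailing currency on the current page, then the same two attempts
    on the previous page.
    """
    candidates = [name]
    stripped = strip_trailing_currency(name)
    if stripped != name:
        candidates.append(stripped)
    for idx in (page_index, page_index - 1):
        if idx < 0:
            continue  # the first page has no previous page
        for candidate in candidates:
            if candidate in pages[idx]:
                return idx, candidate
    return None
```

For example, with `pages = ["Fund Alpha Class A\nTER 1.2%", "Class B 0.9% USD"]`, `locate("Class B USD", pages, 1)` matches `"Class B"` on page 1 after the currency is stripped, and `locate("Class A EUR", pages, 1)` falls back to page 0.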
Blade He
a09778d9d1
Create EMEA AR API code file.
Optimize annotation list for drilldown.
2024-11-26 11:24:29 -06:00
Blade He
fb356fce76
1. optimize drilldown algorithm
2. support calculating drilldown recall metrics
2024-11-25 15:11:03 -06:00
Blade He
78fb283130
update python libraries
2024-11-25 11:11:02 -06:00
Blade He
fc80093557
optimize investment mapping
2024-11-22 14:54:52 -06:00
Blade He
f1c0290588
Optimize investment mapping algorithm.
1. Get the proper currency if multiple currencies exist in the share name, e.g. CHF EUR
2. The default currency should be based on the scenario: USD or EUR.
3. Removing special chars should be based on \W, instead of [^a-zA-Z0-9\s]
2024-11-21 16:36:58 -06:00
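Point 3 matters for multilingual names. A small sketch of the difference, assuming Python's `re` module on `str` input (where `\W` is Unicode-aware by default); the helper names are hypothetical:

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


def clean_ascii(name: str) -> str:
    """Old approach: anything outside ASCII letters, digits and whitespace is replaced."""
    return normalize(re.sub(r"[^a-zA-Z0-9\s]", " ", name))


def clean_unicode(name: str) -> str:
    """New approach: replace non-word characters only.

    For str patterns, \\w covers Unicode letters and digits plus the
    underscore, so accented letters survive while punctuation is replaced.
    """
    return normalize(re.sub(r"\W", " ", name))


print(clean_ascii("Europäische Aktien (EUR)"))    # Europ ische Aktien EUR
print(clean_unicode("Europäische Aktien (EUR)"))  # Europäische Aktien EUR
```

Note that `\W` does not match the underscore, so underscores are kept by the second variant.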
Blade He
5b9f9416de
1. Update mapping of multilingual share class names.
2. Optimize the logic for getting the currency
2024-11-21 11:37:58 -06:00
Blade He
843bbbd13f
Dynamically load instructions for multilingual documents.
2024-11-20 17:00:22 -06:00
Blade He
067d89e0f9
Add datapoint_reportedname.json for dynamically loading reported names based on document language.
2024-11-19 16:49:15 -06:00
Blade He
8223ca9a5c
a little change
2024-11-18 16:13:24 -06:00
Blade He
a42c0b5c2b
optimize retrieve fund instructions
2024-11-13 10:25:08 -06:00
Blade He
7a41b03634
1. optimize instructions for fund name
2. optimize drilldown logic
2024-11-12 17:01:10 -06:00
Blade He
c2d2e54670
"total match" logic for single-word values needs to consider the "\n" char scenario
2024-11-12 11:40:19 -06:00
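The commit does not show how the "total match" is implemented. One way to read it, sketched here as an assumption, is that a single-word value must match a whole whitespace-delimited token, with "\n" counting as a delimiter just like a space. The function name `is_total_match` is hypothetical.

```python
import re


def is_total_match(value: str, page_text: str) -> bool:
    """True if `value` appears in `page_text` as a complete token.

    The lookarounds require whitespace (space, tab, "\\n") or the text
    boundary on both sides, so "1.2" does not match inside "1.25".
    """
    pattern = rf"(?<!\S){re.escape(value)}(?!\S)"
    return re.search(pattern, page_text) is not None
```

With this reading, `is_total_match("1.2", "TER\n1.2\nOGC")` is true because the value sits alone on its own line, while `is_total_match("1.2", "TER\n1.25\nOGC")` is false.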
Blade He
5b67bd332b
optimize drilldown algorithm
2024-11-12 11:20:38 -06:00
Blade He
c6c3e99d3e
integrate pdf drilldown logic to pdf_util.py
2024-11-11 16:34:25 -06:00
Blade He
c34e2e960e
optimize drilldown algorithm
2024-11-08 15:00:34 -06:00
Blade He
81f855f725
support drilldown data to PDF
2024-11-08 11:22:35 -06:00
Blade He
0349033eaf
update for more statistics methods
2024-11-06 16:39:42 -06:00
Blade He
81a424b00d
Support replacing share class names in the database with more readable ones.
Examples from document 532422720:
M&G European Credit Investment Fund A CHFH Acc -> M&G European Credit Investment Fund A CHF H Accumulation
M&G European Credit Investment Fund A CHFHInc -> M&G European Credit Investment Fund A CHF H Income
M&G European High Yield Credit Investment Fund E GBPHedgedAcc -> M&G European High Yield Credit Investment Fund E GBP Hedged Accumulation
2024-11-05 11:14:56 -06:00
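The three examples in this commit can be reproduced with a single regular expression. This is a sketch fitted to those examples only; the currency list, the `FEATURES` mapping and the name `readable_share_name` are assumptions, and the real replacement rules may be broader.

```python
import re

# Abbreviation -> readable form (assumed mapping, taken from the examples).
FEATURES = {"Acc": "Accumulation", "Inc": "Income"}

# Currency, optional hedge marker, optional space, feature abbreviation.
PATTERN = re.compile(r"\b(CHF|EUR|GBP|USD)(Hedged|H)?\s*(Acc|Inc)\b")


def readable_share_name(name: str) -> str:
    """Split fused currency/hedge/feature tokens and expand the feature."""

    def expand(match: "re.Match[str]") -> str:
        currency, hedge, feature = match.groups()
        parts = [currency]
        if hedge:
            parts.append(hedge)
        parts.append(FEATURES[feature])
        return " ".join(parts)

    return PATTERN.sub(expand, name)
```

The trailing `\b` keeps the function idempotent: an already expanded "Accumulation" or "Income" is not matched again.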
Blade He
2645d528b1
support output data point reported name
2024-10-29 16:47:45 -05:00
Blade He
9d453c9fae
a few small updates
2024-10-28 15:15:55 -05:00
Blade He
fa763f4f14
1. optimize instructions
2. optimize mapping algorithm
2024-10-24 16:24:21 -05:00
Blade He
53dadf61f4
optimize keywords/instructions for special-case documents.
2024-10-23 16:56:43 -05:00
Blade He
171f3b6d1f
optimize for OGC data extraction.
2024-10-23 16:07:54 -05:00
Blade He
03365227b9
optimize instructions
2024-10-21 11:04:53 -05:00
Blade He
3f2bb38208
Resolve issue where the first records have only a share class name but no fund name (the fund name is in the previous page text).
2024-10-16 16:55:32 -05:00
Blade He
f166e73362
optimize data extraction algorithm: if no numeric cost value can be found in the PDF page text, then extract data via Vision ChatGPT
2024-10-15 15:57:54 -05:00
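The fallback in this commit could look roughly like the sketch below. Everything here is an assumption for illustration: the pattern used to decide whether the page text holds a numeric cost value, and the two callables `extract_from_text` and `extract_from_image`, which stand in for the text-based and Vision-based extractors that the log does not show.

```python
import re
from typing import Callable, List

# Assumed heuristic: a decimal number or a percentage counts as a cost value.
COST_VALUE = re.compile(r"\d+[.,]\d+|\d+\s*%")


def extract_page_data(page_text: str,
                      extract_from_text: Callable[[str], List[dict]],
                      extract_from_image: Callable[[], List[dict]]) -> List[dict]:
    """Use the text extractor when the page text has a numeric cost value,
    otherwise fall back to the image-based (Vision) extractor."""
    if COST_VALUE.search(page_text):
        return extract_from_text(page_text)
    return extract_from_image()
```

Passing the extractors in as callables keeps the routing decision testable without calling any external service.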
Blade He
8b651f374c
optimize instructions
2024-10-14 09:12:05 -05:00
Blade He
df66489c5f
support this scenario: fund and share have the same name.
2024-10-11 13:14:04 -05:00
Blade He
92a26cd262
optimize configuration
2024-10-11 12:16:34 -05:00
Blade He
17284c74f0
optimize for investment mapping: share feature logic
2024-10-09 14:07:07 -05:00
Blade He
04a2409c58
optimize investment mapping algorithm
2024-10-08 23:53:55 -05:00