Commit Graph

69 Commits

Author SHA1 Message Date
Blade He d3be711859 optimize administration fees instructions 2025-02-28 22:12:18 -06:00
Blade He d4bc3aba4e optimize for management fees 2025-02-28 16:55:33 -06:00
Blade He d0295995d8 support checking whether the next page contains a table with the same structure as the current page.
If yes, run the data extraction pipeline on the next page.
2025-02-27 23:08:57 -06:00
Blade He d0128d6279 1. optimize for administration fees.
2. optimize for management fees
2025-02-27 17:36:41 -06:00
Blade He 543cab74e1 1. get production name
2. if a data point is associated with a production name, set the relevant data point value(s) on each fund/share
2025-02-27 12:07:49 -06:00
Blade He 70079d176e Support removing duplicated values, keeping only the latest ones. 2025-02-26 17:05:58 -06:00
Blade He f467945cd4 support benchmark name data extraction 2025-02-26 10:05:46 -06:00
Blade He 357bb6d580 1. support dynamically showing fund-level data examples.
2. optimize for minimum_initial_investment data point
2025-02-25 10:35:53 -06:00
Blade He 75ea383354 support identifying the AUS prospectus document category: MIS or Super 2025-02-24 15:08:15 -06:00
Blade He bb6862b179 update a little 2025-02-19 14:32:08 -06:00
Blade He 705933bbdd optimized for phase 2 data 2025-02-18 18:52:26 -06:00
Blade He 01e2a0e38d add configuration for datapoints data types
update configuration for minimum initial investment
support applying the value to all funds for minimum initial investment
2025-02-05 12:08:12 -06:00
Blade He a8810519f8 optimize instructions configuration
optimize drilldown part logic
2025-02-04 15:29:24 -06:00
Blade He 47c41e492f 1. only get name mapping data from document mapping
2. Compare name mapping metrics between Ravi's and mine.
2025-01-27 12:29:49 -06:00
Blade He 350550d1b0 fix issue for removing item from list 2025-01-21 17:24:05 -06:00
Blade He e2b9bcbdbc initial abbreviation configurations 2025-01-21 17:09:45 -06:00
Blade He b15d260a58 migrate name mapping algorithm from Ravi 2025-01-21 16:55:08 -06:00
Blade He d41fae3dba prepare for 100 multi-funds document samples 2025-01-17 16:26:31 -06:00
Blade He f10ff8ee33 update for deployment 2025-01-16 20:34:43 -06:00
Blade He 9f0e77a11e support loading configurations by the doc_source parameter 2025-01-16 11:17:48 -06:00
Blade He acc30d4b72 if getting text via the PDF-to-HTML API fails, fall back to getting text via pymupdf. 2025-01-15 18:36:02 -06:00
Blade He a89aa9c4de support fetching data from Prospectus 2025-01-14 16:21:48 -06:00
Blade He 0a867dcf07 complete configuration for AUS Prospectus 2025-01-07 16:25:13 -06:00
Blade He 201a809ffa comment remove_abundant_data function 2025-01-06 15:27:43 -06:00
Blade He 309bb714f6 fix issue for parsing data via Vision Function. 2024-12-11 16:49:04 -06:00
Blade He d673a99e21 switch back to extracting data from the image stream directly, instead of first getting text from the image stream and then extracting data from that text.
The reason is: the quality of the text obtained from the image stream is not good enough.
2024-12-10 16:17:47 -06:00
Blade He f71e2968cc simplify code 2024-12-09 22:24:40 -06:00
Blade He 75ea5e70de 1. support fetching data from pages with garbled text via the ChatGPT4o Vision function.
2. multilingual share features configuration
2024-12-09 17:47:42 -06:00
Blade He d96f77fe00 Split share class names when multiple share classes appear on the same line 2024-12-06 16:31:42 -06:00
Blade He a25991e2bb 1. Set TOR reported name priority
2. Optimize investment mapping logic
2024-12-06 09:54:43 -06:00
Blade He 95c386911c Clean fund name after getting response from ChatGPT 2024-12-04 22:08:09 -06:00
Blade He 70362b554f Fix issue for "The last fund name of previous PDF page" logic:
If the current page's fund name starts with "The last fund name of previous PDF page" and has more content below it, then remove "The last fund name of previous PDF page".
2024-12-04 16:57:52 -06:00
Blade He 36fbaa946e Add the statement when transferring the last fund name of previous PDF page:
The last fund name of previous PDF page:
page_text = f"\nThe last fund name of previous PDF page: {previous_page_fund_name}\n{page_text}"
2024-12-03 11:50:31 -06:00
Blade He a11a99fdc3 1. Optimize instructions: do not fetch data that carries an "up to" statement.
2. Add exception handler in function.
2024-12-03 11:27:28 -06:00
Blade He bc32860f87 remove_abundant_data 2024-12-02 17:16:56 -06:00
Blade He 843bbbd13f dynamic loading instructions for multilingual. 2024-11-20 17:00:22 -06:00
Blade He 2645d528b1 support output data point reported name 2024-10-29 16:47:45 -05:00
Blade He 9d453c9fae minor updates 2024-10-28 15:15:55 -05:00
Blade He 3f2bb38208 Resolve issue where the first records have only a share class name and no fund name (the fund name is in the previous page text). 2024-10-16 16:55:32 -05:00
Blade He f166e73362 optimize data extraction algorithm: if no numeric cost value can be found in the PDF page text, then extract data via Vision ChatGPT 2024-10-15 15:57:54 -05:00
Blade He df66489c5f support this scenario: fund and share have the same name. 2024-10-11 13:14:04 -05:00
Blade He 17284c74f0 optimize for investment mapping: share feature logic 2024-10-09 14:07:07 -05:00
Blade He 04a2409c58 optimize investment mapping algorithm 2024-10-08 23:53:55 -05:00
Blade He aa2c2332ae optimize for more cases 2024-10-08 17:16:01 -05:00
Blade He d92053a16e optimize mapping metrics algorithm 2024-10-01 12:19:45 -05:00
Blade He 18174bf1cf optimize mapping: choose the proper candidate mapping list. 2024-10-01 11:35:29 -05:00
Blade He 60a26377e5 optimize investment mapping algorithm 2024-09-30 16:32:56 -05:00
Blade He 3aa596ea33 optimize mapping logic 2024-09-27 16:39:56 -05:00
Blade He 39cd53dc33 support calculating mapping metrics based on document investment mapping in database 2024-09-27 13:20:50 -05:00
Blade He 598e2ab820 investment mapping: optimize for currency logic 2024-09-25 17:28:22 -05:00