Commit Graph

202 Commits

Author SHA1 Message Date
blade ea81197bcd update for apply ALI QWEN as Demo 2025-11-11 13:33:57 +08:00
blade 255752c848 add mini_main.py 2025-11-10 16:55:55 +08:00
Blade He 37cf06a394 Confirm span pages calculation, the management fee and costs page only with management_fee_and_costs and management_fee datapoints 2025-04-03 18:08:27 -05:00
Blade He f333cc30f5 1. fit the scenario when document type is not 1 or 4, 5
2. support the scenario:
"investment fees and costs including performance" statement in performance fee data page, instead of in management fee and costs data page.
2025-04-03 17:06:43 -05:00
Blade He 4b896f4460 update latest metrics based on optimized matching algorithm
Support: 2663 -> 2773
percentage of Share Matched: 80.08 -> 82.59

F1: 0.956 -> 0.943
2025-04-02 20:39:31 -05:00
Blade He 427a379b3b 1. support re-calling the ChatGPT API to match non-matched predicted fund/share names
2. If the document has fewer than 3 funds, skip the production name judgment logic
2025-04-02 16:34:41 -05:00
Blade He 4cee95db9a fix issue for post actions 2025-03-31 22:04:31 -05:00
Blade He 50e51e0894 recover main.py 2025-03-31 17:16:05 -05:00
Blade He a42033f848 Merge branch 'aus_prospectus_ravi' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi 2025-03-31 17:09:06 -05:00
Blade He 984c686bf3 support separating table and page data that have specific biz rules 2025-03-31 17:08:49 -05:00
Russell Spence ac6332ad46 Merge branch 'aus_prospectus_ravi' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi 2025-03-28 08:36:58 -05:00
Russell Spence f7b9652c75 gitignore a virtual environment directory 2025-03-28 08:36:44 -05:00
Blade He 355b145cf7 If total_annual_dollar_based_charges is found and is divisible by 52 or 12,
then set the fund name and share name to the document production name
2025-03-28 01:33:33 -05:00
Blade He 46f86b124b update instructions fund name section structure 2025-03-28 00:51:51 -05:00
Blade He 8a5723c150 optimize for Entry Fee/ Nil Entry case 2025-03-27 21:10:33 -05:00
Blade He d925992326 1. Support the keywords of complex special cases to be regex
2. Support set sub-datapoints list to complex special cases node.
3. Simplify the common management fee and costs instructions.
4. Add markdown title characters: ## or ### to instructions.
2025-03-27 16:00:19 -05:00
Blade He dc560e1e01 update metrics 2025-03-26 23:14:28 -05:00
Blade He ff2325c72d 1. fix issue for assign values based on production name
2. optimize instructions for extract non-necessary data by Cost of Product message
2025-03-26 18:58:45 -05:00
Blade He 8ad472fb39 UPDATE metrics code file 2025-03-24 18:00:53 -05:00
Blade He dd1f8f76ae update for metrics 2025-03-24 17:12:13 -05:00
Blade He 4edc4b4768 clean code 2025-03-24 17:10:16 -05:00
Blade He 9be6d1296d update benchmark check logic 2025-03-19 00:52:25 -05:00
Blade He 5ba39a394b 1. keep fund/ share db list before applying LLM
2. add key words for interposed_vehicle_performance_fee_cost
2025-03-18 22:15:31 -05:00
Blade He c71936c5ff 1. optimize benchmark_name instructions
2. a document may contain multiple identical raw fund names, so do not remove entries from unmatched_db_list when a relevant raw fund/share name is matched.
Otherwise, some raw names cannot be matched to a db name.
2025-03-18 17:22:21 -05:00
Blade He 0cea2e501b For AUS Prospectus, skip calling Vision ChatGPT when the page contents have no numeric text or possibly contain garbled characters.
(But keep this logic for EMEA LUX AR, because of some special provider cases in that market's documents.)
2025-03-18 14:15:43 -05:00
Ravi Maheshwari 6614972849 Raw Code added to identify benchmark names 2025-03-18 18:57:08 +05:30
Ravi Maheshwari 0ad17e338e Code added to save anomalies 2025-03-18 17:56:50 +05:30
Ravi Maheshwari 2817490652 Code added to save anomalies 2025-03-18 17:54:33 +05:30
Ravi Maheshwari ad371f6584 Changed Performance matrix code to get all anomalies to analyze and Prompt to get better accuracy 2025-03-18 16:43:55 +05:30
Blade He b3941ee4b3 update instructions for total_annual_dollar_based_charges 2025-03-17 15:07:02 -05:00
Ravi Maheshwari 0ce604021c updating gitignore 2025-03-17 18:52:49 +05:30
Ravi Maheshwari dc9180ca1b Rollback file name 2025-03-17 17:09:48 +05:30
Ravi Maheshwari af3d1222a6 Changes done for Bugfix: 1. SSL issue
2. Ignore Example Tables
3. Performance fee
2025-03-17 17:07:08 +05:30
Blade He dd15c1c48e Optimize for benchmark name 2025-03-14 11:51:10 -05:00
Blade He bceff71fa4 Set re-run parameters to be True
re_run_extract_data = True
re_run_mapping_data = True
force_save_total_data = True
2025-03-14 04:03:22 -05:00
Blade He 0f65537478 optimize instructions for minimum_initial_investment 2025-03-14 04:02:15 -05:00
Blade He f539340d04 1. optimize instructions
Only load the relevant fund name for investment objective, instead of the full page text with the most recent investment objective
2. Exclude tables with only one numeric column: Cost Product
2025-03-14 01:04:51 -05:00
Blade He 551f754379 Fix issue when saving data extraction data 2025-03-13 18:36:04 -05:00
Blade He a48af9ddf0 A. Metrics score
Blade's updates
1. Set the secondary key to be the share class name, instead of the fund name
2. Exclude data points whose support is 0 when calculating the metrics
3. Add the message list to store the error message
4. Support saving metrics/error messages to an Excel file
5. Support statistics for different document lists
6. Set F1-Score to the first column in the metrics table
B. Optimize instructions for benchmark_name
2025-03-13 17:52:06 -05:00
Blade He a090b5cc9e 1. the metrics key should be the share class name: sec_name
2. support outputting metrics data as an Excel file
3. Optimize instructions for performance_fee_costs
2025-03-13 11:53:27 -05:00
Ravi Maheshwari 1f6b781b12 Merge branches 'aus_prospectus_ravi' and 'aus_prospectus_ravi' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi 2025-03-13 17:34:35 +05:30
Ravi Maheshwari 97da7e4961 Added code to identify anomaly cases and performance matrix and updated for pdf downloading code 2025-03-13 17:31:54 +05:30
Blade He fd2430082c optimize instructions for management_fee_and_costs and buy_spread, sell_spread 2025-03-13 02:59:19 -05:00
Ravi Maheshwari 336fd9a24f Merge branch 'aus_prospectus_ravi' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi 2025-03-13 11:39:30 +05:30
Blade He fb5dda2170 1. optimize performance_fee_costs prompts
2. support calculating metrics with zero treated as equal to empty
2025-03-12 23:45:52 -05:00
Blade He c2c0b33015 align fund name based on production name
optimize performance relevant prompts
2025-03-12 21:52:00 -05:00
Blade He 6f17c2253c optimize instructions for document 412778803 2025-03-12 17:24:39 -05:00
Blade He 765772e5a8 optimize performance_fee_costs by document 391080133 2025-03-12 14:45:48 -05:00
Ravi Maheshwari 76fbb7c071 Merge branches 'aus_prospectus_ravi' and 'aus_prospectus_ravi' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi 2025-03-12 14:16:27 +05:30
Blade He c7c36dbdd2 1. update performance_fee name to performance_fee_costs
2. support extract data for total_annual_dollar_based_charges
2025-03-11 17:15:39 -05:00