blade
255752c848
add mini_main.py
2025-11-10 16:55:55 +08:00
Blade He
4b896f4460
update latest metrics based on optimized matching algorithm
Support: 2663 -> 2773
percentage of Share Matched: 80.08 -> 82.59
F1: 0.956 -> 0.943
2025-04-02 20:39:31 -05:00
Blade He
427a379b3b
1. support calling the ChatGPT API again to match predicted fund/share names that remain unmatched
2. If the document contains fewer than 3 funds, skip the production name judgment logic
2025-04-02 16:34:41 -05:00
Blade He
4cee95db9a
fix issue for post actions
2025-03-31 22:04:31 -05:00
Blade He
984c686bf3
support separating table and page data that have specific business rules
2025-03-31 17:08:49 -05:00
Blade He
355b145cf7
If total_annual_dollar_based_charges is found and is divisible by 52 or 12,
then set the fund name and share name to the document production name
2025-03-28 01:33:33 -05:00
Blade He
46f86b124b
update the structure of the fund name section in the instructions
2025-03-28 00:51:51 -05:00
Blade He
d925992326
1. Support regex keywords for complex special cases
2. Support setting a sub-datapoints list on the complex special cases node.
3. Simplify the common management fee and costs instructions.
4. Add markdown title characters: ## or ### to instructions.
2025-03-27 16:00:19 -05:00
Blade He
dc560e1e01
update metrics
2025-03-26 23:14:28 -05:00
Blade He
ff2325c72d
1. fix issue with assigning values based on production name
2. optimize instructions for extracting non-necessary data by Cost of Product message
2025-03-26 18:58:45 -05:00
Blade He
8ad472fb39
UPDATE metrics code file
2025-03-24 18:00:53 -05:00
Blade He
dd1f8f76ae
update for metrics
2025-03-24 17:12:13 -05:00
Blade He
4edc4b4768
clean code
2025-03-24 17:10:16 -05:00
Blade He
9be6d1296d
update benchmark check logic
2025-03-19 00:52:25 -05:00
Blade He
5ba39a394b
1. keep the fund/share db list before applying the LLM
2. add keywords for interposed_vehicle_performance_fee_cost
2025-03-18 22:15:31 -05:00
Blade He
c71936c5ff
1. optimize benchmark_name instructions
2. Because a document may contain the same raw fund name multiple times, do not prune unmatched_db_list when a relevant raw fund/share name is matched.
Otherwise, some raw names can no longer be matched to a db name.
2025-03-18 17:22:21 -05:00
Blade He
0cea2e501b
For AUS Prospectus, skip calling Vision ChatGPT when the page content has no numeric text or appears to be garbled.
(But keep this logic for EMEA LUX AR, because of some special provider cases in that market's documents.)
2025-03-18 14:15:43 -05:00
Ravi Maheshwari
0ad17e338e
Code added to save anomalies
2025-03-18 17:56:50 +05:30
Ravi Maheshwari
2817490652
Code added to save anomalies
2025-03-18 17:54:33 +05:30
Ravi Maheshwari
ad371f6584
Changed performance matrix code to collect all anomalies for analysis, and changed the prompt to get better accuracy
2025-03-18 16:43:55 +05:30
Blade He
b3941ee4b3
update instructions for total_annual_dollar_based_charges
2025-03-17 15:07:02 -05:00
Blade He
dd15c1c48e
Optimize for benchmark name
2025-03-14 11:51:10 -05:00
Blade He
0f65537478
optimize instructions for minimum_initial_investment
2025-03-14 04:02:15 -05:00
Blade He
f539340d04
1. optimize instructions
Only load relevant fund name for investment objective, instead of full page text with the most recent investment objective
2. Exclude tables that have only one numeric column: Cost Product
2025-03-14 01:04:51 -05:00
Blade He
a48af9ddf0
A. Metrics score
Blade's updates
1. Set the secondary key to be the share class name, instead of the fund name
2. Exclude data points whose support is 0 when calculating the metrics
3. Add a message list to store error messages
4. Support saving metrics/error messages to an Excel file
5. Support statistics for different document lists
6. Set F1-Score to the first column in the metrics table
B. Optimize instructions for benchmark_name
2025-03-13 17:52:06 -05:00
Blade He
a090b5cc9e
1. the metrics key should be the share class name: sec_name
2. support outputting metrics data as an Excel file
3. Optimize instructions for performance_fee_costs
2025-03-13 11:53:27 -05:00
Ravi Maheshwari
97da7e4961
Added code to identify anomaly cases and the performance matrix, and updated the PDF downloading code
2025-03-13 17:31:54 +05:30