Commit Graph

  • 7b0c825a39 fix issue aus_prospectus_ravi blade 2025-11-12 14:07:55 +0800
  • ea81197bcd update for apply ALI QWEN as Demo blade 2025-11-11 13:33:57 +0800
  • 255752c848 add mini_main.py blade 2025-11-10 16:55:55 +0800
  • 37cf06a394 Confirm span pages calculation, the management fee and costs page only with management_fee_and_costs and management_fee datapoints Blade He 2025-04-03 18:08:27 -0500
  • f333cc30f5 1. fit the scenario when document type is not 1 or 4, 5 2. support the scenario: "investment fees and costs including performance" statement in performance fee data page, instead of in management fee and costs data page. Blade He 2025-04-03 17:06:43 -0500
  • 4b896f4460 update latest metrics based on optimized matching algorithm. Support: 2663 -> 2773; percentage of Share Matched: 80.08 -> 82.59 Blade He 2025-04-02 20:39:31 -0500
  • 427a379b3b 1. support re-call ChatGPT API to match non-matched prediction fund/ share names 2. If document fund amount less than 3, cancel the production name judgment logic Blade He 2025-04-02 16:34:41 -0500
  • 4cee95db9a fix issue for post actions Blade He 2025-03-31 22:04:31 -0500
  • 50e51e0894 recover main.py Blade He 2025-03-31 17:16:05 -0500
  • a42033f848 Merge branch 'aus_prospectus_ravi' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi Blade He 2025-03-31 17:09:06 -0500
  • 984c686bf3 support separate tables and pages data which with specific biz rules Blade He 2025-03-31 17:08:49 -0500
  • ac6332ad46 Merge branch 'aus_prospectus_ravi' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi Russell Spence 2025-03-28 08:36:58 -0500
  • f7b9652c75 gitignore a virtual environment directory Russell Spence 2025-03-28 08:36:44 -0500
  • 355b145cf7 If total_annual_dollar_based_charges is found and is divisible by 52 or 12, set the fund name and share name to the document production name Blade He 2025-03-28 01:33:33 -0500
  • 46f86b124b update instructions fund name section structure Blade He 2025-03-28 00:51:51 -0500
  • 8a5723c150 optimize for Entry Fee/ Nil Entry case Blade He 2025-03-27 21:10:33 -0500
  • d925992326 1. Support the keywords of complex special cases to be regex 2. Support set sub-datapoints list to complex special cases node. 3. Simplify the common management fee and costs instructions. 4. Add markdown title characters: ## or ### to instructions. Blade He 2025-03-27 16:00:19 -0500
  • dc560e1e01 update metrics Blade He 2025-03-26 23:14:28 -0500
  • ff2325c72d 1. fix issue for assign values based on production name 2. optimize instructions for extract non-necessary data by Cost of Product message Blade He 2025-03-26 18:58:45 -0500
  • 8ad472fb39 UPDATE metrics code file Blade He 2025-03-24 18:00:53 -0500
  • dd1f8f76ae update for metrics Blade He 2025-03-24 17:12:13 -0500
  • 4edc4b4768 clean code Blade He 2025-03-24 17:10:16 -0500
  • 9be6d1296d update benchmark check logic Blade He 2025-03-19 00:52:25 -0500
  • 5ba39a394b 1. keep fund/ share db list before applying LLM 2. add key words for interposed_vehicle_performance_fee_cost Blade He 2025-03-18 22:15:31 -0500
  • c71936c5ff 1. optimize benchmark_name instructions 2. a document may contain the same raw fund name multiple times, so do not remove entries from unmatched_db_list when a raw fund/ share name is matched; otherwise some raw names cannot be matched to a db name. Blade He 2025-03-18 17:22:21 -0500
  • 0cea2e501b For AUS Prospectus, skip calling Vision ChatGPT when the page contents have no numeric text or contain garbled text. (But keep this logic for EMEA LUX AR, because of some special provider cases in this market's documents.) Blade He 2025-03-18 14:15:43 -0500
  • 6614972849 Raw Code added to identify benchmark names Ravi Maheshwari 2025-03-18 18:57:08 +0530
  • 0ad17e338e Code added to save anomalies Ravi Maheshwari 2025-03-18 17:56:50 +0530
  • 2817490652 Code added to save anomalies Ravi Maheshwari 2025-03-18 17:54:33 +0530
  • ad371f6584 Changed Performance matrix code to get all anomalies to analyze and Prompt to get better accuracy Ravi Maheshwari 2025-03-18 16:43:55 +0530
  • b3941ee4b3 update instructions for total_annual_dollar_based_charges Blade He 2025-03-17 15:07:02 -0500
  • 0ce604021c updating gitignore Ravi Maheshwari 2025-03-17 18:52:49 +0530
  • dc9180ca1b Rollback file name Ravi Maheshwari 2025-03-17 17:09:48 +0530
  • af3d1222a6 Changes done for Bugfix: 1. SSL issue 2. Ignore Example Tables 3. Performance fee Ravi Maheshwari 2025-03-17 17:07:08 +0530
  • dd15c1c48e Optimize for benchmark name Blade He 2025-03-14 11:51:10 -0500
  • bceff71fa4 Set re-run parameters to True: re_run_extract_data = True, re_run_mapping_data = True, force_save_total_data = True Blade He 2025-03-14 04:03:22 -0500
  • 0f65537478 optimize instructions for minimum_initial_investment Blade He 2025-03-14 04:02:15 -0500
  • f539340d04 1. optimize instructions: only load the relevant fund name for investment objective, instead of the full page text with the most recent investment objective 2. Exclude tables with only one numeric column: Cost Product Blade He 2025-03-14 01:04:51 -0500
  • 551f754379 Fix issue when saving data extraction data Blade He 2025-03-13 18:36:04 -0500
  • a48af9ddf0 A. Metrics score, Blade's updates: 1. Set the secondary key to be the share class name, instead of the fund name 2. Exclude data points whose support is 0 when calculating the metrics 3. Add the message list to store the error message 4. Support saving metrics/ error messages to an Excel file 5. Support statistics for different document lists 6. Set F1-Score as the first column in the metrics table B. Optimize instructions for benchmark_name Blade He 2025-03-13 17:52:06 -0500
  • a090b5cc9e 1. metrics's key should be share class name: sec_name 2. support output metrics data as Excel file 3. Optimize instructions for performance_fee_costs Blade He 2025-03-13 11:53:27 -0500
  • 1f6b781b12 Merge branches 'aus_prospectus_ravi' and 'aus_prospectus_ravi' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi Ravi Maheshwari 2025-03-13 17:34:35 +0530
  • 97da7e4961 Added code to identify anomaly cases and performance matrix and updated for pdf downloading code Ravi Maheshwari 2025-03-13 17:31:54 +0530
  • fd2430082c optimize instructions for management_fee_and_costs and buy_spread, sell_spread Blade He 2025-03-13 02:59:19 -0500
  • 336fd9a24f Merge branch 'aus_prospectus_ravi' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi Ravi Maheshwari 2025-03-13 11:39:30 +0530
  • fb5dda2170 1. optimize performance_fee_costs prompts 2. support calculate metrics by zero equal with empty Blade He 2025-03-12 23:45:52 -0500
  • c2c0b33015 align fund name based on production name; optimize performance-relevant prompts Blade He 2025-03-12 21:52:00 -0500
  • 6f17c2253c optimize instructions for document 412778803 Blade He 2025-03-12 17:24:39 -0500
  • 765772e5a8 optimize performance_fee_costs by document 391080133 Blade He 2025-03-12 14:45:48 -0500
  • 76fbb7c071 Merge branches 'aus_prospectus_ravi' and 'aus_prospectus_ravi' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi Ravi Maheshwari 2025-03-12 14:16:27 +0530
  • c7c36dbdd2 1. update performance_fee name to performance_fee_costs 2. support extract data for total_annual_dollar_based_charges Blade He 2025-03-11 17:15:39 -0500
  • b7506c78f3 Add API code file Blade He 2025-03-10 16:00:17 -0500
  • e9f6383258 apply configuration file to replace disorder table header contents Blade He 2025-03-10 11:09:00 -0500
  • 2548606ccc a little change Blade He 2025-03-10 08:20:01 -0500
  • 604ab326a7 a little change Blade He 2025-03-08 21:50:44 -0600
  • 4ee762963e optimized for management_fee_and_costs and administration_fees Blade He 2025-03-08 21:40:00 -0600
  • fa2dede454 optimize for management_fee_and_costs and management_fee Blade He 2025-03-07 18:38:36 -0600
  • 2cd4f5f787 Supplement provider information to ground truth data; calculate metrics based on providers; integrate "merge" data algorithm for AUS Prospectus final outputs Blade He 2025-03-07 15:02:12 -0600
  • 52515fc152 1. simplify management_fee_and_costs instructions 2. optimize management_fee_and_costs instructions 3. resolve the issues for complex scenarios: need sum management_fee, recoverable_expenses, indirect_costs as management_fee_and_costs Blade He 2025-03-06 17:27:18 -0600
  • c4ed65770d Try to support more complex management_fee_and_costs scenarios; support calculating metrics for all data points Blade He 2025-03-05 17:21:13 -0600
  • cd7e09757d check in calc_metrics to repo. Blade He 2025-03-05 09:57:02 -0600
  • fdcb4b2ec0 Merge branch 'main' of https://msstash.morningstar.com/scm/dc/dc-ml-emea-ar into aus_prospectus_ravi Ravi Maheshwari 2025-03-05 12:01:12 +0530
  • d00820c14d update AUS Prospectus data point configurations Blade He 2025-03-04 16:52:06 -0600
  • f4b4d00f58 optimize instructions for management fee and costs. support dynamic loading complex instructions by keywords Blade He 2025-03-04 08:32:55 -0600
  • d3be711859 optimize administration fees instructions Blade He 2025-02-28 22:12:18 -0600
  • d4bc3aba4e optimize for management fees Blade He 2025-02-28 16:55:33 -0600
  • d0295995d8 support checking whether the next page contains a table with the same structure as the current page. If yes, run the data extraction pipeline on the next page. Blade He 2025-02-27 23:08:57 -0600
  • d0128d6279 1. optimize for administration fees. 2. optimize for management fees Blade He 2025-02-27 17:36:41 -0600
  • 543cab74e1 1. get production name 2. if some data point with production name, set each fund/ share with relevant data point value(s) Blade He 2025-02-27 12:07:49 -0600
  • 412692e1c4 update keywords for management fee and costs Blade He 2025-02-27 08:34:46 -0600
  • 70079d176e Support remove duplicated values to keep the values to be the latest ones. Blade He 2025-02-26 17:05:58 -0600
  • f467945cd4 support benchmark name data extraction Blade He 2025-02-26 10:05:46 -0600
  • 357bb6d580 1. support dynamic show fund level data examples. 2. optimize for minimum_initial_investment data point Blade He 2025-02-25 10:35:53 -0600
  • e60e1fd546 move configuration files for all datapoints to "all_datapoints" folder Blade He 2025-02-24 15:23:16 -0600
  • 590f7e2249 1. backup data points configurations 2. simplify data points configurations for important 11 data points. Blade He 2025-02-24 15:21:32 -0600
  • 75ea383354 support identify aus prospectus document category: MIS or Super Blade He 2025-02-24 15:08:15 -0600
  • f7d53acdde support get sqlpass api by configuration main Blade He 2025-02-19 14:37:21 -0600
  • bb6862b179 update a little Blade He 2025-02-19 14:32:08 -0600
  • 705933bbdd optimized for phase 2 data Blade He 2025-02-18 18:52:26 -0600
  • 353bc28599 update a little Blade He 2025-02-11 11:49:53 -0600
  • 01e2a0e38d add configuration for datapoints data types; update configuration for minimum initial investment; support applying the value to all funds for minimum initial investment Blade He 2025-02-05 12:08:12 -0600
  • a8810519f8 optimize instructions configuration; optimize drilldown part logic Blade He 2025-02-04 15:29:24 -0600
  • f9ef4cec96 update sql_query cache file store location. Cache for at most 5 days, then clean from local disk. Blade He 2025-01-31 10:59:54 -0600
  • 7f37f3532f switch example document Blade He 2025-01-27 14:59:26 -0600
  • 6f831e241c Merge branch 'aus_prospectus_ravi' Blade He 2025-01-27 12:32:42 -0600
  • 41f8c307ff a little change Blade He 2025-01-27 12:32:36 -0600
  • 47c41e492f 1. only get name mapping data from document mapping 2. Compare name mapping metrics between Ravi's and mine. Blade He 2025-01-27 12:29:49 -0600
  • d9b0bed39a a little change Blade He 2025-01-22 09:57:42 -0600
  • 350550d1b0 fix issue for removing item from list Blade He 2025-01-21 17:24:05 -0600
  • e2b9bcbdbc initial abbreviation configurations Blade He 2025-01-21 17:09:45 -0600
  • b15d260a58 migrate name mapping algorithm from Ravi Blade He 2025-01-21 16:55:08 -0600
  • d41fae3dba prepare for 100 multi-funds document samples Blade He 2025-01-17 16:26:31 -0600
  • b93a8d55e8 update for output data as template Blade He 2025-01-17 11:41:58 -0600
  • f10ff8ee33 update for deployment Blade He 2025-01-16 20:34:43 -0600
  • fb4a6402f0 support output merged data format Blade He 2025-01-16 16:31:04 -0600
  • 2eace81f51 support more configurable parts Blade He 2025-01-16 13:54:45 -0600
  • db0827435b supplement EMEA AR configuration files Blade He 2025-01-16 11:30:44 -0600
  • 9f0e77a11e support load configurations by doc_source parameter Blade He 2025-01-16 11:17:48 -0600
  • acc30d4b72 if fail to get text by pdf to html API, then try to get text by pymupdf. Blade He 2025-01-15 18:36:02 -0600
  • ace0ac2674 a little change Blade He 2025-01-15 18:22:08 -0600
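
Note on acc30d4b72: the fallback described in that commit (try the PDF-to-HTML API first, then fall back to local extraction with pymupdf) might look roughly like the sketch below. This is not the repository's code. The function name extract_text_with_fallback and the two callables are hypothetical stand-ins: primary would wrap the PDF-to-HTML API call and fallback would wrap a pymupdf-based reader.

```python
from typing import Callable, Optional


def extract_text_with_fallback(
    pdf_path: str,
    primary: Callable[[str], Optional[str]],
    fallback: Callable[[str], str],
) -> str:
    """Return text from `primary`; use `fallback` if it raises or yields nothing.

    Both callables are assumptions for illustration: `primary` stands in for
    the PDF-to-HTML API client, `fallback` for a local pymupdf-based extractor.
    """
    try:
        text = primary(pdf_path)
    except Exception:
        # Any failure of the primary extractor triggers the fallback path.
        text = None

    # Treat None, empty, or whitespace-only output as a failed extraction.
    if text and text.strip():
        return text
    return fallback(pdf_path)
```

Passing the extractors in as callables keeps the fallback decision separate from either backend, so the rule can be exercised without network access or a PDF library installed.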